KeyNote
A voice-controlled note-taking and audio-to-text test harness for local multimodal models.
KeyNote combines push-to-talk recording, prompt modes, local SQLite storage, exports,
clipboard workflows, and a terminal UI around a llama-server transcription loop.
What It Does
The project started as a way to test local audio-to-text capabilities in multimodal LLMs. It grew into a compact note system: notes can be created from recordings, extended by appending to the active note through a separate hotkey, searched later, exported to Markdown, and transformed with reusable prompt modes.
System Pieces
| Layer | Role | Implementation |
|---|---|---|
| Capture | Push-to-talk and long recordings | Global hotkeys, audio device selection, microphone or loopback input. |
| Processing | Local transcription and mode prompts | Requests to a local llama-server, with modes such as mail, Slack, transcript, and summarize. |
| Storage | Searchable local notes | SQLite-backed notes, metadata, active-note appends, and Markdown export. |
| Interface | CLI, TUI, and desktop overlay | Click commands, Textual screens, clipboard automation, and a compact mode/status overlay. |
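To make the Storage row concrete, here is a minimal sketch of a SQLite-backed note store with create, append, search, and Markdown export. The schema, column names, and function names are assumptions chosen for illustration and are not taken from KeyNote's code; only the standard-library `sqlite3` module is used.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a note database. The schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notes (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               body TEXT NOT NULL DEFAULT '',
               mode TEXT,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def create_note(conn, title, body="", mode=None):
    """Insert a new note and return its row id."""
    cur = conn.execute(
        "INSERT INTO notes (title, body, mode) VALUES (?, ?, ?)",
        (title, body, mode),
    )
    conn.commit()
    return cur.lastrowid


def append_to_note(conn, note_id, text):
    """Append text to an existing note, separated by a newline."""
    conn.execute(
        "UPDATE notes SET body = CASE WHEN body = '' THEN ? "
        "ELSE body || char(10) || ? END WHERE id = ?",
        (text, text, note_id),
    )
    conn.commit()


def search_notes(conn, term):
    """Return (id, title) pairs whose title or body contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, title FROM notes WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()


def export_markdown(conn, note_id):
    """Render one note as a Markdown document with the title as a heading."""
    row = conn.execute(
        "SELECT title, body FROM notes WHERE id = ?", (note_id,)
    ).fetchone()
    if row is None:
        raise KeyError(note_id)
    title, body = row
    return f"# {title}\n\n{body}\n"
```

Because every operation goes through plain functions that take a connection, the same store could be called from a CLI command or a TUI screen without duplicating SQL. A real store would likely use full-text search rather than `LIKE`; that substitution is left out here to keep the sketch short.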
Design Notes
- Balancing quick push-to-talk capture with longer recordings that need chunked processing.
- Keeping prompt modes editable while still making them easy to switch with hotkeys.
- Handling local-only data, clipboard paste, and app focus without turning the tool into a cloud service.
- Making the same note store usable from both direct CLI commands and an interactive terminal UI.
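The first two points can be sketched in a few lines: splitting a long recording into overlapping chunks, and cycling through a table of prompt modes the way a hotkey might. Everything named here is hypothetical, including the `MODES` prompts, the 16 kHz sample rate, and the 30-second chunk length with 1 second of overlap; KeyNote's actual values and prompt texts are not stated in this README.

```python
from dataclasses import dataclass

# Hypothetical mode table. The real prompts are user-editable and not shown here.
MODES = {
    "transcript": "Transcribe the audio verbatim.",
    "summarize": "Summarize the audio in a few bullet points.",
    "mail": "Rewrite the dictated audio as a polite email.",
    "slack": "Rewrite the dictated audio as a short Slack message.",
}


@dataclass(frozen=True)
class Chunk:
    start: int  # first sample index, inclusive
    end: int    # last sample index, exclusive


def plan_chunks(total_samples, sample_rate=16_000,
                chunk_seconds=30.0, overlap_seconds=1.0):
    """Split a recording into fixed-size chunks that overlap slightly.

    The overlap gives each chunk a little shared audio with its neighbor,
    so a word cut at a boundary appears whole in at least one chunk.
    """
    size = int(chunk_seconds * sample_rate)
    step = size - int(overlap_seconds * sample_rate)
    if size <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and exceed the overlap")
    chunks = []
    start = 0
    while start < total_samples:
        end = min(start + size, total_samples)
        chunks.append(Chunk(start, end))
        if end == total_samples:
            break
        start += step
    return chunks


def next_mode(current):
    """Return the mode after `current`, wrapping around at the end."""
    names = list(MODES)
    return names[(names.index(current) + 1) % len(names)]
```

A short push-to-talk clip yields a single chunk, so quick capture and long recordings can share one code path; only the number of transcription requests differs. Keeping modes in a plain mapping means they stay editable as data while the hotkey handler only needs `next_mode`.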