Dictation that thinks.
Press ⌘⌥M anywhere on macOS. Speak naturally — even mid-sentence corrections. Watch the messy raw transcript get rewritten live into something you'd actually send. Then it pastes into wherever you were typing.
colin thanks for setting up the meeting i'm not gonna make it for the full 15 minutes probably can make it for like 10 minutes actually no 5 minutes hope that's okay catch you later
Four moments, one motion.
Trigger from anywhere
Speakit lives in your menu bar. Hit the global hotkey while you're focused in any text field — Mail, Slack, your terminal, a code comment. The floating panel appears without stealing focus.
Transcribe on-device
Your voice is streamed into Whisper running entirely on your Mac — no audio leaves the machine. Partial words refine into confirmed segments as you speak. You see it happen in real time.
Clean up with intent
When you stop, an LLM rewrites the raw transcript: resolves mid-sentence corrections ("actually no, 5 minutes"), strips filler, picks the right register for where you're typing. Local Ollama or cloud — your choice.
Land in the right place
Speakit remembers where you were typing. The cleaned text drops in at your cursor with a synthetic ⌘V. Your clipboard is preserved and restored. You never lose context.
Made for people who actually speak.
Knows the room you're talking to.
Speakit reads the frontmost app and (when allowed) the focused field to pick a register. Mail gets greetings and paragraphs. Slack gets short and casual. Terminal gets a bare command. Xcode keeps your identifiers intact.
"Actually, no, 5 minutes."
The LLM drops abandoned drafts when you correct yourself mid-sentence. You don't have to speak in finished thoughts. You can think out loud.
Five providers, one settings pane.
Goes back to where you were.
Speakit tracks the last app you were typing in, even when you click the menu bar to start. When you stop, your clipboard is saved, the cleaned text drops in at your cursor, and your clipboard is restored two beats later.
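The save, paste, restore sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Speakit's implementation: `FakePasteboard` is a hypothetical in-memory stand-in for the system clipboard, and `send_paste` is a hypothetical callback standing in for the synthetic ⌘V keystroke.

```python
from typing import Callable


class FakePasteboard:
    """Hypothetical in-memory stand-in for the system clipboard."""

    def __init__(self, contents: str = "") -> None:
        self.contents = contents


def paste_preserving_clipboard(
    pasteboard: FakePasteboard,
    text: str,
    send_paste: Callable[[], None],
) -> None:
    """Paste `text` via the clipboard, then put the old contents back."""
    saved = pasteboard.contents      # 1. remember what the user had copied
    pasteboard.contents = text       # 2. stage the cleaned text
    try:
        send_paste()                 # 3. stand-in for the synthetic ⌘V
    finally:
        # 4. restore, even if the paste step raised.
        # A real app would wait briefly before this step so the target
        # app has time to read the clipboard; this sketch restores at once.
        pasteboard.contents = saved
```

The `try`/`finally` is the design point: whatever happens during the paste, the user's original clipboard contents come back.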
Watch your words sharpen.
Whisper streams partial words that snap into confirmed segments as you finish phrases. When you stop, the LLM streams its rewrite token-by-token into the same panel. Nothing about Speakit feels like waiting.
Global from any app.
Registered through Carbon at the lowest reliable level. Works in browsers, Electron apps, terminals, fullscreen software, and your own builds. No focus shuffle.
Your voice can stay yours.
Pair on-device Whisper with a local Ollama model and Speakit makes zero network requests. Your voice is captured, transcribed, cleaned, and pasted without a packet leaving your Mac.
Prefer a cloud model for sharper formatting? Drop your own Anthropic, OpenAI, Gemini, or z.ai key into Settings. Your key lives in macOS Keychain. Requests go straight from your Mac to your provider — Speakit has no backend.
Speakit has no servers and no accounts. No analytics SDK, no crash reporting, no remote config. Your audio never leaves your Mac. If you've chosen a cloud formatter, only the transcript text is sent, and it goes straight to your provider, on your key.
Get Speakit.
Free during the beta. Signed and notarized for macOS 13+. Drop it into Applications, grant microphone and accessibility access in the onboarding window, and you're dictating in under a minute.
The honest small print.
01 Does my audio leave my Mac?
By default, no. Whisper runs on-device via WhisperKit (CoreML), and the default formatter is Ollama running locally. If you switch the formatter to Anthropic, OpenAI, Gemini, or z.ai in Settings, the raw transcript text (not the audio) is sent to your chosen provider using your own API key. Speakit has no backend of its own.
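To make the data flow concrete, here is a minimal Python sketch of what leaves the machine for each formatter choice. It is an assumption-laden illustration, not Speakit's code: the provider strings, the `outbound_payload` name, and the payload shape are all invented for this example.

```python
from typing import Optional

# Hypothetical identifiers for the formatter options named above.
LOCAL_FORMATTERS = {"ollama"}
CLOUD_FORMATTERS = {"anthropic", "openai", "gemini", "z.ai"}


def outbound_payload(formatter: str, transcript: str) -> Optional[dict]:
    """Return what would be sent off the Mac, or None if nothing is sent.

    Audio is not a parameter at all: transcription happens on-device,
    so only text can ever reach this step.
    """
    if formatter in LOCAL_FORMATTERS:
        return None  # fully local: no network request
    if formatter in CLOUD_FORMATTERS:
        # Only the raw transcript text goes to the chosen provider.
        return {"provider": formatter, "text": transcript}
    raise ValueError(f"unknown formatter: {formatter}")
```

With the default local formatter the function returns `None`; with a cloud formatter the payload contains transcript text and nothing else.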
02 Which Whisper model is used?
We default to openai_whisper-base.en — ~140 MB, English-only, fast enough for live updates on Apple Silicon. The model downloads once on first launch into ~/Documents/huggingface/models. Swapping to base, small, or large variants will be a Settings option in a later release.
03 What's the difference between local Ollama and cloud LLMs?
Quality vs latency vs privacy. qwen2.5:3b on a recent Mac formats short utterances in a second or two with zero network traffic. Claude Haiku 4.5 or GPT-4o-mini handles complex rewrites better, especially long emails and intricate self-corrections, but costs a fraction of a cent per request and needs a network round-trip. Pick the one whose tradeoffs you like.
04 Will it work with my favorite app?
If you can type into it, you can dictate into it. Speakit pastes via a system-level ⌘V keystroke, so it works in native apps, Electron apps, browser inputs, terminals, fullscreen software — anywhere ⌘V works. Email composers, Slack, VS Code, Cursor, Terminal/iTerm/Warp/Ghostty, Apple Notes, Obsidian, Bear, Linear, Discord, and more are all explicitly recognised for context-aware formatting.
05 How does context detection work?
Two layers. First, a bundle-ID lookup table maps known apps to a writing context (email / chat / code / terminal / notes / generic). Second, when Accessibility permission is granted, Speakit reads the focused field's placeholder/label so it can tell, say, an email Subject from the body. That hint goes into the LLM prompt.
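As a rough illustration of the two layers, here is a minimal Python sketch. It is not Speakit's code: the table entries, the `detect_context` and `build_prompt_hint` names, and the wording of the hint are assumptions made up for this example. The bundle IDs are the usual identifiers for those apps, but which apps the real table covers is not stated here.

```python
from typing import Optional

# Layer 1: bundle ID -> writing context (illustrative entries only).
CONTEXT_BY_BUNDLE_ID = {
    "com.apple.mail": "email",
    "com.tinyspeck.slackmacgap": "chat",
    "com.microsoft.VSCode": "code",
    "com.apple.Terminal": "terminal",
    "com.apple.Notes": "notes",
}


def detect_context(bundle_id: str) -> str:
    """Look up the writing context; unknown apps fall back to 'generic'."""
    return CONTEXT_BY_BUNDLE_ID.get(bundle_id, "generic")


def build_prompt_hint(bundle_id: str, field_label: Optional[str] = None) -> str:
    """Combine both layers into one hint string for the LLM prompt.

    Layer 2 (field_label) is optional: it is only available when
    Accessibility permission has been granted.
    """
    hint = f"Writing context: {detect_context(bundle_id)}."
    label = (field_label or "").strip()
    if label:
        hint += f" Focused field: {label}."
    return hint
```

For example, `build_prompt_hint("com.apple.mail", "Subject")` yields a hint that names both the email context and the Subject field, while an unrecognised app with no field label yields only the generic context.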
06 What does the onboarding window do?
Walks you through the two macOS permissions Speakit needs — microphone and accessibility — once. Live status badges turn green the instant you toggle each one on in System Settings, so you're not guessing. Re-open it any time from the menu bar → Setup…. There's a 'Permissions stuck? Reset and relaunch' link for the rare case macOS has a stale entry from an earlier signature.
07 Is it really free?
Free during the beta. No account, no telemetry, no email signup. Pricing only comes in if cloud features land that need a backend (sync, team workspaces) — the core local dictation will always be free.
08 What about Windows / Linux?
Not yet. Speakit leans hard on macOS-specific APIs (Carbon hotkeys, NSWorkspace, accessibility, NSPanel), so a port would take much more than a recompile. If there's enough interest, a Windows version is possible. The transcription and formatting layers are already largely portable.