Voice Coding
Boris Cherny, creator of Claude Code, does most of his coding by speaking to Claude rather than typing. Voice coding turns Claude Code into a hands-free programming partner — you describe what you want in natural language, Claude implements it, and your keyboard stays quiet.
This guide covers everything from platform setup to dictation technique to Boris’s own workflow.
Why Voice Coding
Typing widens the gap between intent and implementation. When you code by voice, you stay in high-level problem-solving mode and let Claude handle the mechanical translation into code. The result is:
- Faster expression of complex ideas
- Less context-switching between thinking and typing
- Reduced repetitive strain
- Natural language for architecture decisions, exact strings for code snippets
Voice is best for the what — describing features, explaining intent, asking questions. The keyboard handles the exact — SQL strings, regex, cryptographic constants.
Syntax
```
# Start continuous voice mode
/voice

# Start with trace output (debug audio issues)
/voice --trace

# Push-to-talk mode (hold key to speak)
/voice --ptt
```
PTT vs Continuous Mode
| Mode | How it works | Best for |
|---|---|---|
| Continuous | Always listening, detects pauses | Long dictation sessions, quiet environments |
| PTT (Push-to-Talk) | Hold a key to record | Open offices, noisy environments |
In PTT mode, press and hold the configured hotkey while speaking, then release it to submit. The hotkey is configurable in ~/.claude/config.json under voice.pttKey.
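Push-to-talk works like a gate placed in front of the recognizer. The sketch below is a hypothetical TypeScript model of that behavior, not Claude Code's implementation. The `PushToTalkGate` class and its `keyDown`/`onChunk`/`keyUp` methods are invented names, and the sketch assumes the recognizer delivers text in chunks.

```typescript
// Hypothetical model of push-to-talk: recognized text is kept only while the
// hotkey is held, and releasing the key hands back one utterance to submit.
class PushToTalkGate {
  private held = false;
  private buffer: string[] = [];

  // Pressing the hotkey starts a fresh utterance.
  keyDown(): void {
    this.held = true;
    this.buffer = [];
  }

  // Called for every recognized chunk; chunks arriving while the key is up are dropped.
  onChunk(text: string): void {
    if (this.held) this.buffer.push(text);
  }

  // Releasing the hotkey ends the utterance; returns null if nothing was captured.
  keyUp(): string | null {
    if (!this.held) return null;
    this.held = false;
    const utterance = this.buffer.join(" ").trim();
    this.buffer = [];
    return utterance.length > 0 ? utterance : null;
  }
}

const gate = new PushToTalkGate();
gate.onChunk("background chatter"); // ignored: key not held
gate.keyDown();
gate.onChunk("add error handling");
gate.onChunk("to fetchUser");
console.log(gate.keyUp()); // add error handling to fetchUser
```

Because everything spoken outside the key press is discarded, background conversation in an open office never reaches the prompt.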
Platform Setup
macOS
macOS has built-in dictation (Enhanced Dictation) that works system-wide:
- Open System Settings → Keyboard → Dictation
- Enable Dictation and set your shortcut (default: press Fn twice)
- Enable Enhanced Dictation for offline processing and continuous mode
Claude Code picks up system dictation automatically in continuous mode. For lower latency and better accuracy with technical vocabulary, use a third-party tool (see below).
Windows
Windows Speech Recognition and the newer Windows Voice Access both work with Claude Code:
Windows Voice Access (Windows 11 recommended):
- Open Settings → Accessibility → Speech
- Enable Voice Access
- Say “Voice Access, start listening” or use the toolbar
Windows Speech Recognition (older):
- Search for “Speech Recognition” in Start
- Run the setup wizard and microphone calibration
- Enable “Start Speech Recognition at startup”
Tip: Windows Voice Access handles technical terms better than the legacy Speech Recognition engine. On Windows 11, prefer Voice Access.
Linux
Linux voice input requires a third-party engine. The most reliable options:
```bash
# Option 1: nerd-dictation (local VOSK models, no cloud)
pip install nerd-dictation
nerd-dictation begin --simulate-input-tool xdotool

# Option 2: whisper-mic (OpenAI Whisper locally)
pip install whisper-mic
whisper-mic --model base --english

# Option 3: Vosk (fully offline, lighter weight)
pip install vosk
```
After setting up any of these, Claude Code’s /voice command will detect the active audio input stream.
Third-Party Integrations
Wispr Flow (Boris’s Preferred Tool)
Wispr Flow is Boris Cherny’s preferred voice input tool. It integrates at the OS level and works in any text field, including the Claude Code terminal.
- Platform: macOS, Windows
- Key feature: Learns your vocabulary over time, handles technical jargon out of the box
- Setup: Install from wisprflow.ai, grant accessibility permissions, configure hotkey
- Claude Code usage: Activate Wispr Flow normally; Claude Code treats it as standard text input
```
# No special config needed: Wispr Flow types into the terminal.
# Start a voice session with /voice for Claude's own voice processing,
# or just use Wispr Flow directly to type commands naturally.
```
Superwhisper (Mac Alternative)
Superwhisper runs OpenAI Whisper locally on your Mac with Apple Silicon acceleration:
- Platform: macOS (Apple Silicon recommended)
- Key feature: Runs fully offline, very fast on M-series chips
- Setup: Download from the Mac App Store, configure microphone permissions
- Dictionary: Add technical terms (function names, library names) to the custom dictionary
```
# Superwhisper custom dictionary location
~/Library/Application Support/Superwhisper/custom-dictionary.txt
```
Add your project’s vocabulary there (function names, package names, domain terms) and accuracy improves dramatically.
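Collecting that vocabulary by hand is tedious. The sketch below shows one hypothetical way to seed a list from a project: it reads dependency names from `package.json` text and declared names from source text. The function names are invented for illustration. The regex only catches simple `function`/`class`/`const`/`let` declarations, and printing one term per line assumes a plain-text dictionary format, so check your tool's documentation before pasting.

```typescript
// Hypothetical helper: gather candidate custom-dictionary terms from a project.

// Dependency names from the text of a package.json file.
function dependencyNames(packageJson: string): string[] {
  const pkg = JSON.parse(packageJson) as {
    dependencies?: Record<string, string>;
    devDependencies?: Record<string, string>;
  };
  return [
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.devDependencies ?? {}),
  ];
}

// Names introduced by simple `function`, `class`, `const`, or `let` declarations.
function declaredNames(source: string): string[] {
  const pattern = /\b(?:function|class|const|let)\s+([A-Za-z_$][\w$]*)/g;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// Keep identifiers a generic engine is likely to mangle: names longer than three
// characters that contain an uppercase letter or underscore (camelCase, snake_case).
function buildVocabulary(packageJson: string, sources: string[]): string[] {
  const terms = new Set<string>(dependencyNames(packageJson));
  for (const source of sources) {
    for (const name of declaredNames(source)) {
      if (name.length > 3 && /[A-Z_]/.test(name)) terms.add(name);
    }
  }
  return Array.from(terms).sort();
}

const vocab = buildVocabulary(
  '{"dependencies":{"@tanstack/react-query":"^5.0.0"},"devDependencies":{"vitest":"^1.0.0"}}',
  ["export function useDebounce() {}\nconst fetchUser = async () => {};\nlet x = 1;"],
);
console.log(vocab.join("\n")); // one term per line
```

Short lowercase names such as `x` are skipped on the assumption that generic engines already transcribe ordinary words correctly.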
Comparison
| Tool | Platform | Runs Offline | Technical Vocab | Price |
|---|---|---|---|---|
| Wispr Flow | Mac, Win | Partial | Excellent | Paid |
| Superwhisper | Mac only | Yes (local) | Good + custom dict | Paid |
| System dictation | Mac, Win | On Mac | Fair | Free |
| nerd-dictation | Linux | Yes | Good | Free |
Claude Code Desktop vs CLI
Desktop App
The Claude Code desktop app has a voice button (microphone icon) in the input area. Click to toggle continuous listening. The button shows a waveform while active.
The desktop voice button uses Claude’s own voice processing pipeline, which is tuned for developer vocabulary.
CLI /voice Command
The CLI /voice command wraps your platform’s audio input and feeds it to Claude Code’s session:
```
# Start interactive voice session
claude /voice

# Voice with trace output to diagnose recognition issues
claude /voice --trace

# One-shot: speak once, submit, exit voice mode
claude /voice --once
```
Note: The CLI `/voice` command and the desktop voice button share the same recognition pipeline. Use whichever fits your workflow.
Multilingual Support
Claude Code voice supports 20 languages. Language detection is automatic in most cases, but you can set it explicitly:
```
/voice --lang ja  # Japanese
/voice --lang vi  # Vietnamese
/voice --lang fr  # French
/voice --lang de  # German
/voice --lang es  # Spanish
/voice --lang pt  # Portuguese
/voice --lang zh  # Chinese (Mandarin)
/voice --lang ko  # Korean
/voice --lang ar  # Arabic
/voice --lang hi  # Hindi
```
Switching Languages Mid-Session
Say “switch to [language]” during an active voice session. Claude Code will confirm the switch and continue listening in the new language.
Code output language stays English regardless of dictation language — only the conversation layer switches.
How to Dictate Effectively
Code vs Prose vs Commands
Treat these three categories differently:
Commands — use natural language, be direct:
“Add error handling to the fetchUser function. Wrap the database call in try-catch and log the error with the user ID.”
Architecture/design — describe the outcome, not the implementation:
“I want the sidebar to close when you click outside it. The state should live in the layout component.”
Exact strings — switch to keyboard:
- SQL queries with specific syntax
- Regex patterns
- Configuration values
- API keys, URLs, version numbers
Correcting Misinterpretations
When voice misrecognizes a word, correct it without breaking flow:
```
# Voice: "create a function called foo bar"
# Claude interprets: fooBar ✓

# Voice: "the variable should be called access token"
# Claude interprets: accessToken ✓

# Misrecognition, say:
"Correction: the last word should be [correct word]"

# Or rephrase:
"Actually, name it [word spelled out], as in [phonetic]"
```
For proper nouns and library names, spell them out once: “useQuery, that’s u-s-e-Q-u-e-r-y from TanStack.” Claude Code remembers the spelling for the rest of the session.
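Claude infers these names from context, so you never run anything like this yourself. As a rough illustration of the mappings in the examples above, here is a hypothetical pair of helpers: one joins spoken words into camelCase, and the other collapses a hyphen-spelled name such as u-s-e-Q-u-e-r-y. Both function names are invented for this sketch.

```typescript
// Hypothetical illustration of the spoken-name conventions above.

// "foo bar" -> "fooBar", "access token" -> "accessToken"
function spokenToCamelCase(spoken: string): string {
  const words = spoken
    .toLowerCase()
    .split(/[\s_-]+/)
    .filter((w) => w.length > 0);
  return words
    .map((w, i) => (i === 0 ? w : w[0].toUpperCase() + w.slice(1)))
    .join("");
}

// "u-s-e-Q-u-e-r-y" -> "useQuery": each letter keeps the case you spelled.
function fromSpelledOut(spelled: string): string {
  return spelled.split("-").map((s) => s.trim()).join("");
}

console.log(spokenToCamelCase("foo bar"));      // fooBar
console.log(spokenToCamelCase("access token")); // accessToken
console.log(fromSpelledOut("u-s-e-Q-u-e-r-y")); // useQuery
```

Spelling out is the more reliable correction because it preserves case explicitly, while the camelCase rule has to guess where word boundaries and capitals go.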
Punctuation
You can dictate punctuation explicitly:
- “open paren” / “close paren”
- “open brace” / “close brace”
- “semicolon”, “colon”, “comma”
- “new line”, “blank line”
- “backtick” for inline code references
In practice, you rarely need to dictate punctuation because Claude infers code structure from context.
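When you do dictate punctuation, the conversion amounts to substituting symbols for phrases. The sketch below is a hypothetical, deliberately naive version built from the spoken tokens listed above. `applySpokenPunctuation` is an invented name, spacing is simplified (surrounding spaces are swallowed), and real dictation engines handle ambiguity, such as the word "comma" inside "comma-separated", more carefully.

```typescript
// Hypothetical phrase-to-symbol table built from the spoken tokens listed above.
const SPOKEN_PUNCTUATION: Array<[string, string]> = [
  ["open paren", "("],
  ["close paren", ")"],
  ["open brace", "{"],
  ["close brace", "}"],
  ["semicolon", ";"],
  ["colon", ":"],
  ["comma", ","],
  ["blank line", "\n\n"],
  ["new line", "\n"],
  ["backtick", "`"],
];

function applySpokenPunctuation(text: string): string {
  let out = text;
  for (const [phrase, symbol] of SPOKEN_PUNCTUATION) {
    // Whole phrase, case-insensitive; word boundaries keep "colon" from
    // matching inside "semicolon", and surrounding spaces are absorbed.
    const pattern = new RegExp(`\\s*\\b${phrase}\\b\\s*`, "gi");
    out = out.replace(pattern, () => symbol);
  }
  return out;
}

console.log(applySpokenPunctuation("call foo open paren bar close paren semicolon"));
// call foo(bar);
```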
Productivity Tips
- Voice for the what, keyboard for the exact. Dictate intent and architecture; type literal strings, regexes, and version numbers.
- Use `/btw` by voice. Ask quick questions without derailing the main task: say “btw what’s the flag for verbose output?” Claude answers without changing the conversation context.
- Describe the file, not the line. Say “in the auth middleware, add rate limiting” rather than “on line 47, after the token check.” Claude finds the right place.
- Use the trace flag when something sounds wrong. `/voice --trace` shows the raw transcription before Claude processes it, so you can see whether the issue is recognition or interpretation.
- Build a custom dictionary. Wispr Flow and Superwhisper both support custom dictionaries. Add your project’s vocabulary once and recognition improves permanently.
- Pause between instructions. A clear pause signals the end of a thought. Run-on dictation creates run-on prompts.
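Continuous listening needs some rule for deciding when an utterance has ended, which is why pauses matter. This guide does not describe Claude Code's own rule, so the sketch below shows a common generic technique instead: energy-based endpointing. Frames below a loudness threshold count as silence, and only a silence longer than a minimum pause closes the utterance. The function name, option names, and numbers are illustrative assumptions.

```typescript
// Generic energy-based endpointing sketch (not Claude Code's implementation).
interface EndpointOptions {
  frameMs: number;          // duration of one audio frame
  silenceThreshold: number; // frame energy below this counts as silence
  minPauseMs: number;       // silence at least this long ends an utterance
}

// Returns [startFrame, endFrame) ranges, one per detected utterance.
function splitUtterances(
  energy: number[],
  opts: EndpointOptions,
): Array<[number, number]> {
  const pauseFrames = Math.ceil(opts.minPauseMs / opts.frameMs);
  const utterances: Array<[number, number]> = [];
  let start = -1;      // first voiced frame of the open utterance, or -1
  let lastVoiced = -1; // most recent voiced frame
  for (let i = 0; i < energy.length; i++) {
    if (energy[i] >= opts.silenceThreshold) {
      if (start < 0) start = i;
      lastVoiced = i;
    } else if (start >= 0 && i - lastVoiced >= pauseFrames) {
      // The pause is long enough: close the utterance at the last voiced frame.
      utterances.push([start, lastVoiced + 1]);
      start = -1;
    }
  }
  if (start >= 0) utterances.push([start, lastVoiced + 1]); // speech ran to the end
  return utterances;
}

// 100 ms frames: a one-frame dip stays inside an utterance, a 300 ms pause splits it.
const energy = [0, 0.5, 0.6, 0, 0, 0, 0.4, 0, 0.7, 0];
const found = splitUtterances(energy, { frameMs: 100, silenceThreshold: 0.1, minPauseMs: 300 });
console.log(JSON.stringify(found)); // [[1,3],[6,9]]
```

The sketch also shows why run-on dictation produces run-on prompts. If no pause is long enough to cross the threshold, everything you say lands in a single utterance.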
Example Voice-to-Code Workflow
Here is a complete example of a feature implemented entirely by voice:
You say:
“Create a custom React hook called useDebounce. It takes a value and a delay in milliseconds. It returns the debounced value. Use useEffect and useState internally. Include JSDoc.”
Claude produces:
```typescript
import { useEffect, useState } from "react";

/**
 * Debounces a value by delaying updates until after the specified delay.
 * @param value - The value to debounce
 * @param delay - Delay in milliseconds
 * @returns The debounced value
 */
export function useDebounce<T>(value: T, delay: number): T {
  const [debouncedValue, setDebouncedValue] = useState<T>(value);

  useEffect(() => {
    // Restart the timer whenever value or delay changes.
    const timer = setTimeout(() => {
      setDebouncedValue(value);
    }, delay);

    // Cleanup cancels the pending update if value changes before delay elapses.
    return () => clearTimeout(timer);
  }, [value, delay]);

  return debouncedValue;
}
```
You say:
“Good. Now add a test file for it using Vitest. Test the delay behavior and that it cleans up the timer.”
No typing. No context switch. You stayed in design-thinking mode the entire time.
Boris’s Workflow
Boris’s setup:
- Tool: Wispr Flow as the primary dictation layer
- Mode: Continuous listening during focused coding sessions
- Pattern: Dictate the intent → review Claude’s implementation → dictate any adjustments
- Keyboard use: Reserved for exact strings, commit messages with specific formatting, and reviewing diffs
His observation: voice coding shifts you from implementer mode to architect mode. You spend more time deciding what the code should do and less time on the mechanical act of writing it. For complex systems with many interacting parts, this shift produces better designs.
Gotchas
Technical Jargon Recognition
Generic voice engines misrecognize developer vocabulary. “useState” becomes “use state” (two words), “async/await” becomes “async slash await,” “typeof” becomes “type of.” Solutions:
- Use Wispr Flow or Superwhisper with a custom dictionary
- Spell out unfamiliar names once per session
- Use `/voice --trace` to see raw transcription and spot patterns
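A custom dictionary effectively adds rewrite rules on top of the recognizer. The sketch below approximates that with a hypothetical phrase table built from the misrecognitions above. `normalizeJargon` and `JARGON_FIXES` are invented names. The sketch also shows why context-aware tools beat blind replacement: the "type of" rule wrongly rewrites ordinary prose such as "what type of hook".

```typescript
// Hypothetical rewrite table: what a generic engine emits -> intended token.
const JARGON_FIXES: Record<string, string> = {
  "use state": "useState",
  "use effect": "useEffect",
  "async slash await": "async/await",
  "type of": "typeof",
};

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function normalizeJargon(
  transcript: string,
  fixes: Record<string, string> = JARGON_FIXES,
): string {
  // Longer phrases first, so they win over any shorter overlapping entry.
  const phrases = Object.keys(fixes).sort((a, b) => b.length - a.length);
  let out = transcript;
  for (const phrase of phrases) {
    const pattern = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "gi");
    out = out.replace(pattern, () => fixes[phrase]);
  }
  return out;
}

console.log(normalizeJargon("store it with use state and use effect"));
// store it with useState and useEffect
```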
Punctuation in Code Snippets
Do not dictate code snippets character by character — it is slow and error-prone. Instead, describe the intent and let Claude write the code. Use the keyboard only for the rare cases where exact syntax matters in the prompt itself.
Noisy Environments
PTT mode (/voice --ptt) eliminates false activations in noisy environments. The tradeoff is slightly higher cognitive load (remembering to hold the key).
The Trace Flag
When voice recognition produces unexpected results, run:
```
/voice --trace
```
This shows:
- Raw audio captured (duration, volume level)
- Raw transcription before processing
- Claude’s interpretation of the transcription
The trace output tells you whether the problem is at the microphone, the recognition engine, or Claude’s interpretation layer.
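Reading a trace amounts to walking those three layers in order and blaming the first one that lost what you said. The sketch below encodes that reasoning in TypeScript. The `VoiceTrace` record is a hypothetical shape, not the actual `/voice --trace` output format, and the `minLevel` threshold is an arbitrary placeholder.

```typescript
// Hypothetical trace record; the real /voice --trace output may look different.
interface VoiceTrace {
  audioDurationMs: number;
  peakLevel: number; // 0..1
  rawTranscript: string;
  interpretation: string;
}

type FaultLayer = "microphone" | "recognition" | "interpretation" | "none";

// Lowercase and keep only ASCII letters and digits, so "fetch user" matches "fetchUser".
function squash(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "");
}

function containsAll(text: string, terms: string[]): boolean {
  const haystack = squash(text);
  return terms.every((t) => haystack.includes(squash(t)));
}

// Blame the first layer where the expected terms went missing.
function triage(trace: VoiceTrace, expectedTerms: string[], minLevel = 0.05): FaultLayer {
  if (trace.audioDurationMs === 0 || trace.peakLevel < minLevel) return "microphone";
  if (!containsAll(trace.rawTranscript, expectedTerms)) return "recognition";
  if (!containsAll(trace.interpretation, expectedTerms)) return "interpretation";
  return "none";
}

const layer = triage(
  {
    audioDurationMs: 1800,
    peakLevel: 0.4,
    rawTranscript: "add error handling to fetch loser",
    interpretation: "Added try/catch to fetchLoser",
  },
  ["fetchUser"],
);
console.log(layer); // recognition
```

A "recognition" result points toward a custom dictionary or spelling the name out. An "interpretation" result means the words arrived intact and the prompt itself needs rephrasing.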