Voice Coding

Boris Cherny, creator of Claude Code, does most of his coding by speaking to Claude rather than typing. Voice coding turns Claude Code into a hands-free programming partner — you describe what you want in natural language, Claude implements it, and your keyboard stays quiet.

This guide covers everything from platform setup to dictation technique to Boris’s own workflow.

Why Voice Coding

Typing widens the gap between intent and implementation. When you code by voice, you stay in high-level problem-solving mode and let Claude handle the mechanical translation to code. The result is:

Faster expression of complex ideas
Less context-switching between thinking and typing
Reduced repetitive strain
Natural language for architecture decisions, exact strings for code snippets

Voice is best for the what — describing features, explaining intent, asking questions. The keyboard handles the exact — SQL strings, regex, cryptographic constants.

Syntax

# Start continuous voice mode
/voice

# Start with trace output (debug audio issues)
/voice --trace

# Push-to-talk mode (hold key to speak)
/voice --ptt

PTT vs Continuous Mode

Mode	How it works	Best for
Continuous	Always listening, detects pauses	Long dictation sessions, quiet environments
PTT (Push-to-Talk)	Hold a key to record	Open offices, noisy environments

In PTT mode, press and hold the configured hotkey while speaking, then release to submit. The hotkey is configurable in ~/.claude/config.json under voice.pttKey.
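To make "detects pauses" concrete, here is a minimal sketch of silence-based utterance segmentation. It illustrates the general idea only and is not Claude Code's actual implementation: the function name splitOnPauses, the loudness threshold, and the pause length are all assumptions made for this example.

```typescript
// Hypothetical sketch: split per-frame loudness values into utterances,
// closing an utterance after a run of quiet frames.
// Not Claude Code's real pipeline; names and thresholds are assumptions.

interface Utterance {
  startFrame: number; // index of the first loud frame
  endFrame: number;   // index of the last loud frame (inclusive)
}

function splitOnPauses(
  energies: number[],     // one loudness value per audio frame
  silenceThreshold = 0.1, // below this, a frame counts as silent
  minPauseFrames = 3      // this many silent frames in a row end an utterance
): Utterance[] {
  const utterances: Utterance[] = [];
  let start = -1;    // -1 means "not inside an utterance"
  let lastLoud = -1; // most recent loud frame seen
  let silentRun = 0; // consecutive silent frames inside the utterance

  for (let i = 0; i < energies.length; i++) {
    if (energies[i] >= silenceThreshold) {
      if (start === -1) start = i;
      lastLoud = i;
      silentRun = 0;
    } else if (start !== -1) {
      silentRun += 1;
      if (silentRun >= minPauseFrames) {
        // The pause is long enough: the speaker finished a thought.
        utterances.push({ startFrame: start, endFrame: lastLoud });
        start = -1;
        silentRun = 0;
      }
    }
  }

  // Flush an utterance that was still open when the audio ended.
  if (start !== -1) {
    utterances.push({ startFrame: start, endFrame: lastLoud });
  }
  return utterances;
}
```

A short dip in volume (shorter than minPauseFrames) stays inside the same utterance, which is why a clear pause between instructions matters in continuous mode. PTT sidesteps this logic entirely: key-down and key-up mark the utterance boundaries.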

Platform Setup

macOS

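macOS has built-in dictation that works system-wide:

Open System Settings → Keyboard → Dictation
Enable Dictation and set your shortcut (the default varies by macOS version and keyboard; older releases used Fn pressed twice)
On macOS Mojave and earlier, also enable Enhanced Dictation for offline processing and continuous mode; later releases no longer have this separate option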

Claude Code picks up system dictation automatically in continuous mode. For lower latency and better accuracy with technical vocabulary, use a third-party tool (see below).

Windows

Windows Speech Recognition and the newer Windows Voice Access both work with Claude Code:

Windows Voice Access (Windows 11 recommended):

Open Settings → Accessibility → Speech
Enable Voice Access
Say “Voice access wake up” or use the toolbar to start listening

Windows Speech Recognition (older):

Search for “Speech Recognition” in Start
Run the setup wizard and microphone calibration
Enable “Run Speech Recognition at startup”

Tip: Windows Voice Access handles technical terms better than the legacy Speech Recognition engine. On Windows 11, prefer Voice Access.

Linux

Linux voice input requires a third-party engine. The most reliable options:

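# Option 1: nerd-dictation (local and offline; built on Vosk, not Whisper)
# Installed from its GitHub repository. It also needs a Vosk model;
# see the project README for the download step.
pip install vosk
git clone https://github.com/ideasman42/nerd-dictation.git
./nerd-dictation/nerd-dictation begin --simulate-input-tool=XDOTOOL

# Option 2: whisper-mic (OpenAI Whisper locally)
# The package name uses a hyphen; the installed command uses an underscore.
pip install whisper-mic
whisper_mic --model base --english

# Option 3: Vosk (fully offline, lighter weight)
pip install vosk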

After setting up any of these, Claude Code’s /voice command will detect the active audio input stream.

Third-Party Integrations

Wispr Flow (Boris’s Preferred Tool)

Wispr Flow is Boris Cherny’s preferred voice input tool. It integrates at the OS level and works in any text field, including the Claude Code terminal.

Platform: macOS, Windows
Key feature: Learns your vocabulary over time, handles technical jargon out of the box
Setup: Install from wisprflow.ai, grant accessibility permissions, configure hotkey
Claude Code usage: Activate Wispr Flow normally; Claude Code treats it as standard text input

# No special config needed — Wispr Flow types into the terminal
# Start a voice session with /voice for Claude's own voice processing,
# or just use Wispr Flow directly to type commands naturally

Superwhisper (Mac Alternative)

Superwhisper runs OpenAI Whisper locally on your Mac with Apple Silicon acceleration:

Platform: macOS (Apple Silicon recommended)
Key feature: Runs fully offline, very fast on M-series chips
Setup: Download from the Mac App Store, configure microphone permissions
Dictionary: Add technical terms (function names, library names) to the custom dictionary

# Superwhisper custom dictionary location
~/Library/Application Support/Superwhisper/custom-dictionary.txt

Add your project’s vocabulary there — function names, package names, domain terms — and accuracy improves dramatically.

Comparison

Tool	Platform	Runs Offline	Technical Vocab	Price
Wispr Flow	Mac, Win	Partial	Excellent	Paid
Superwhisper	Mac only	Yes (local)	Good + custom dict	Paid
System dictation	Mac, Win	Yes on macOS	Fair	Free
nerd-dictation	Linux	Yes	Good	Free

Claude Code Desktop vs CLI

Desktop App

The Claude Code desktop app has a voice button (microphone icon) in the input area. Click to toggle continuous listening. The button shows a waveform while active.

The desktop voice button uses Claude’s own voice processing pipeline, which is tuned for developer vocabulary.

CLI /voice Command

The CLI /voice command wraps your platform’s audio input and feeds it to Claude Code’s session:

# Start interactive voice session
claude /voice

# Voice with trace output to diagnose recognition issues
claude /voice --trace

# One-shot: speak once, submit, exit voice mode
claude /voice --once

Note: The CLI /voice command and the desktop voice button share the same recognition pipeline. Use whichever fits your workflow.

Multilingual Support

Claude Code voice supports 20 languages. Language detection is automatic in most cases, but you can set it explicitly:

/voice --lang ja    # Japanese
/voice --lang vi    # Vietnamese
/voice --lang fr    # French
/voice --lang de    # German
/voice --lang es    # Spanish
/voice --lang pt    # Portuguese
/voice --lang zh    # Chinese (Mandarin)
/voice --lang ko    # Korean
/voice --lang ar    # Arabic
/voice --lang hi    # Hindi

Switching Languages Mid-Session

Say “switch to [language]” during an active voice session. Claude Code will confirm the switch and continue listening in the new language.

The code Claude writes stays in English regardless of the dictation language; only the conversation layer switches.

How to Dictate Effectively

Code vs Prose vs Commands

Treat these three categories differently:

Commands — use natural language, be direct:

“Add error handling to the fetchUser function. Wrap the database call in try-catch and log the error with the user ID.”

Architecture/design — describe the outcome, not the implementation:

“I want the sidebar to close when you click outside it. The state should live in the layout component.”

Exact strings — switch to keyboard:

SQL queries with specific syntax
Regex patterns
Configuration values
API keys, URLs, version numbers

Correcting Misinterpretations

When voice misrecognizes a word, correct it without breaking flow:

# Voice: "create a function called foo bar"
# Claude interprets: fooBar ✓

# Voice: "the variable should be called access token"
# Claude interprets: accessToken ✓

# Misrecognition — say:
"Correction: the last word should be [correct word]"

# Or rephrase:
"Actually, name it [word spelled out] — as in [phonetic]"

For proper nouns and library names, spell them out once: “useQuery — that’s u-s-e-Q-u-e-r-y from TanStack.” Claude Code remembers for the session.
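The naming examples above ("foo bar" becoming fooBar, a spelled-out name becoming useQuery) amount to two small string transformations. The sketch below shows them as plain functions. Both helper names are hypothetical, and in practice Claude infers identifiers from context rather than applying a fixed rule like this.

```typescript
// Hypothetical helpers illustrating how spoken names map to identifiers.
// Claude Code does not expose these functions; they are for illustration.

// "access token" -> "accessToken"
function toCamelCase(phrase: string): string {
  const words = phrase
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0);

  return words
    .map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1)))
    .join("");
}

// "u-s-e-Q-u-e-r-y" -> "useQuery" (letter case is kept as dictated)
function joinSpelledLetters(spelled: string): string {
  return spelled
    .split("-")
    .map((part) => part.trim())
    .join("");
}

console.log(toCamelCase("foo bar"));                // fooBar
console.log(toCamelCase("access token"));           // accessToken
console.log(joinSpelledLetters("u-s-e-Q-u-e-r-y")); // useQuery
```

Spelling a name out preserves capitalization that a spoken phrase cannot carry, which is why it is worth doing once for library names with unusual casing.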

Punctuation

You can dictate punctuation explicitly:

“open paren” / “close paren”
“open brace” / “close brace”
“semicolon”, “colon”, “comma”
“new line”, “blank line”
“backtick” for inline code references

In practice, you rarely need to dictate punctuation because Claude infers code structure from context.
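For the cases where you do dictate punctuation, the spoken tokens listed above can be pictured as a lookup table. This is a sketch under assumptions: the table name, the function name, and the exact symbols chosen for "new line" and "blank line" are illustrative and not taken from Claude Code.

```typescript
// Hypothetical mapping from the spoken tokens listed above to symbols.
const SPOKEN_PUNCTUATION: ReadonlyArray<[string, string]> = [
  ["open paren", "("],
  ["close paren", ")"],
  ["open brace", "{"],
  ["close brace", "}"],
  ["semicolon", ";"],
  ["colon", ":"],
  ["comma", ","],
  ["blank line", "\n\n"],
  ["new line", "\n"],
  ["backtick", "`"],
];

// Replace each spoken token with its symbol. Word boundaries keep
// "colon" from matching inside "semicolon". Surrounding spaces are left
// as they were, so the result still needs whitespace cleanup.
function replaceSpokenPunctuation(text: string): string {
  let result = text;
  SPOKEN_PUNCTUATION.forEach(([spoken, symbol]) => {
    const pattern = new RegExp(`\\b${spoken}\\b`, "gi");
    result = result.replace(pattern, () => symbol);
  });
  return result;
}

console.log(replaceSpokenPunctuation("foo open paren x close paren semicolon"));
// foo ( x ) ;
```

The leftover spacing in the output shows why a literal token-for-symbol substitution is a poor fit for code, and why describing intent usually works better than dictating symbols.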

Productivity Tips

Voice for the what, keyboard for the exact. Dictate intent and architecture; type literal strings, regexes, and version numbers.
Use /btw by voice. Ask quick questions without derailing the main task: say “btw what’s the flag for verbose output?” Claude answers without changing the conversation context.
Describe the file, not the line. Say “in the auth middleware, add rate limiting” rather than “on line 47, after the token check.” Claude finds the right place.
Use the trace flag when something sounds wrong. /voice --trace shows the raw transcription before Claude processes it, so you can see if the issue is recognition or interpretation.
Build a custom dictionary. Wispr Flow and Superwhisper both support custom dictionaries. Add your project’s vocabulary once and recognition improves permanently.
Pause between instructions. A clear pause signals the end of a thought. Run-on dictation creates run-on prompts.

Example Voice-to-Code Workflow

Here is a complete example of a feature implemented entirely by voice:

You say:

“Create a custom React hook called useDebounce. It takes a value and a delay in milliseconds. It returns the debounced value. Use useEffect and useState internally. Include JSDoc.”

Claude produces:

import { useEffect, useState } from "react";

/**
 * Debounces a value by delaying updates until after the specified delay.
 * @param value - The value to debounce
 * @param delay - Delay in milliseconds
 * @returns The debounced value
 */
export function useDebounce<T>(value: T, delay: number): T {
  const [debouncedValue, setDebouncedValue] = useState<T>(value);

  useEffect(() => {
    const timer = setTimeout(() => {
      setDebouncedValue(value);
    }, delay);

    return () => clearTimeout(timer);
  }, [value, delay]);

  return debouncedValue;
}

You say:

“Good. Now add a test file for it using Vitest. Test the delay behavior and that it cleans up the timer.”

No typing. No context switch. You stayed in design-thinking mode the entire time.

Boris’s Workflow

Boris’s setup:

Tool: Wispr Flow as the primary dictation layer
Mode: Continuous listening during focused coding sessions
Pattern: Dictate the intent → review Claude’s implementation → dictate any adjustments
Keyboard use: Reserved for exact strings, commit messages with specific formatting, and reviewing diffs

His observation: voice coding shifts you from implementer mode to architect mode. You spend more time deciding what the code should do and less time on the mechanical act of writing it. For complex systems with many interacting parts, this shift produces better designs.

Gotchas

Technical Jargon Recognition

Generic voice engines misrecognize developer vocabulary. “useState” becomes “use state” (two words), “async/await” becomes “async slash await,” “typeof” becomes “type of.” Solutions:

Use Wispr Flow or Superwhisper with a custom dictionary
Spell out unfamiliar names once per session
Use /voice --trace to see raw transcription and spot patterns
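A custom dictionary is, in effect, a substitution pass over the raw transcript. The sketch below applies the three misrecognitions named above. It is an assumption-laden illustration: PROJECT_TERMS and applyDictionary are hypothetical names, and neither Wispr Flow nor Superwhisper is claimed to work this way internally.

```typescript
// Hypothetical project dictionary: misheard phrase -> intended term.
const PROJECT_TERMS: ReadonlyArray<[string, string]> = [
  ["use state", "useState"],
  ["type of", "typeof"],
  ["async slash await", "async/await"],
];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every dictionary phrase in the transcript, case-insensitively.
function applyDictionary(
  transcript: string,
  terms: ReadonlyArray<[string, string]> = PROJECT_TERMS
): string {
  // Longest phrases first, so a short entry cannot break up a longer one.
  const ordered = [...terms].sort((a, b) => b[0].length - a[0].length);
  let result = transcript;
  for (const [heard, intended] of ordered) {
    const pattern = new RegExp(`\\b${escapeRegExp(heard)}\\b`, "gi");
    result = result.replace(pattern, () => intended);
  }
  return result;
}

console.log(applyDictionary("call use state then check type of x"));
// call useState then check typeof x
```

A blanket substitution also rewrites ordinary prose ("what type of error" becomes "what typeof error"), so keep dictionary entries specific to terms you rarely say in plain English.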

Punctuation in Code Snippets

Do not dictate code snippets character by character — it is slow and error-prone. Instead, describe the intent and let Claude write the code. Use the keyboard only for the rare cases where exact syntax matters in the prompt itself.

Noisy Environments

PTT mode (/voice --ptt) eliminates false activations in noisy environments. The tradeoff is slightly higher cognitive load (remembering to hold the key).

The Trace Flag

When voice recognition produces unexpected results, run:

/voice --trace

This shows:

Raw audio captured (duration, volume level)
Raw transcription before processing
Claude’s interpretation of the transcription

The trace output tells you whether the problem is at the microphone, the recognition engine, or Claude’s interpretation layer.
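The triage reasoning can be written down as a small decision function. This guide does not specify the format of the trace output, so the TraceRecord shape, the field names, and the volume cutoff below are all hypothetical. Treat it as a sketch of the reasoning rather than a parser for real trace output.

```typescript
// Hypothetical shape for one trace entry; the real format may differ.
interface TraceRecord {
  durationMs: number;       // length of the captured audio
  volumeLevel: number;      // assumed to be normalized to the range 0..1
  rawTranscription: string; // what the recognition engine produced
}

type ProblemStage = "microphone" | "recognition" | "interpretation" | "none";

// Walk the pipeline in order and report the first stage that looks wrong.
function locateProblem(
  trace: TraceRecord,
  whatYouSaid: string,
  interpretationIsCorrect: boolean
): ProblemStage {
  const MIN_VOLUME = 0.05; // assumed cutoff for "the mic heard nothing"

  // Stage 1: no usable audio means the problem is upstream of recognition.
  if (trace.durationMs === 0 || trace.volumeLevel < MIN_VOLUME) {
    return "microphone";
  }

  // Stage 2: audio was fine, but the transcript differs from your words.
  const normalize = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
  if (normalize(trace.rawTranscription) !== normalize(whatYouSaid)) {
    return "recognition";
  }

  // Stage 3: the transcript was right, so any error is in interpretation.
  return interpretationIsCorrect ? "none" : "interpretation";
}
```

Checking the stages in pipeline order matters: a transcript mismatch is only meaningful once you know the microphone captured usable audio.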