Extended Thinking

Extended thinking is Claude’s reasoning mode: before writing a response, Claude works through the problem internally — evaluating options, catching contradictions, and reconsidering assumptions. That internal chain appears as a collapsible <thinking> block above the final answer.

The result isn’t just a better answer. It’s a visible reasoning trail you can inspect, challenge, and learn from.

How Extended Thinking Activates

Extended thinking isn’t always on. You control it explicitly:

Keyword in your prompt

ultrathink: what's the right database schema for this multi-tenant SaaS?

The word ultrathink anywhere in the prompt activates extended thinking for that query.

Effort flags

/effort high    # activates extended thinking on supported models
/effort max     # maximum depth — deepest reasoning available

Model selection

/model opus

Opus 4.6 on a complex question will often engage extended thinking automatically. Sonnet 4.6 supports it at high and max effort. Haiku does not support extended thinking — it’s optimized for speed, not depth.

Automatic activation

Certain problem types trigger extended thinking automatically when using Opus: multi-step proofs, architectural decisions with many trade-offs, and security analysis. You don’t need to ask — Claude detects the complexity.

What You See

When extended thinking activates, the response begins with a collapsible block:

▶ Thinking (click to expand)
  The user wants to design a schema for a multi-tenant SaaS. Key
  considerations: row-level isolation vs. separate schemas vs. separate
  databases. Row-level: simplest operationally, but shared indexes can
  cause noisy-neighbor issues at scale. Separate schemas: better
  isolation, harder migrations. Separate databases: strongest isolation,
  expensive. Given the user's context (early-stage, PostgreSQL, Supabase),
  row-level with RLS policies is the right starting point...

Then the actual answer follows — concise, direct, and already informed by the reasoning above.

The thinking block is Claude checking its own work before presenting it to you.

Response Flow: Standard vs Extended

flowchart TD
    A([Your Prompt]) --> B{Effort Level?}
    B -->|low / default| C[LLM generates response]
    C --> D([Answer])
    B -->|high / max / ultrathink| E[LLM enters thinking mode]
    E --> F[Reason through problem]
    F --> G{Contradiction\nor gap?}
    G -->|Yes| H[Reconsider approach]
    H --> F
    G -->|No| I[Synthesize conclusion]
    I --> J([Thinking block + Answer])
    style A fill:#1e293b,color:#7dd3fc,stroke:#334155
    style B fill:#1e293b,color:#fcd34d,stroke:#334155
    style C fill:#1e293b,color:#7dd3fc,stroke:#334155
    style D fill:#1e293b,color:#86efac,stroke:#334155
    style E fill:#1e293b,color:#7dd3fc,stroke:#334155
    style F fill:#1e293b,color:#7dd3fc,stroke:#334155
    style G fill:#1e293b,color:#fcd34d,stroke:#334155
    style H fill:#1e293b,color:#7dd3fc,stroke:#334155
    style I fill:#1e293b,color:#7dd3fc,stroke:#334155
    style J fill:#1e293b,color:#86efac,stroke:#334155
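
The first branch in the flow above (effort level or keyword) can be sketched in a few lines of Python. This is an illustrative model only, not Claude's actual activation logic: the function names are hypothetical, case-insensitive keyword matching is an assumption, and the sketch ignores model differences such as automatic activation on Opus.

```python
import re

# Hypothetical names: this models the "Effort Level?" decision node in the
# flowchart, not Claude's real implementation.
THINKING_EFFORTS = {"high", "max"}
# Case-insensitive whole-word matching is an assumption made for this sketch.
KEYWORD = re.compile(r"\bultrathink\b", re.IGNORECASE)


def uses_thinking_path(prompt: str, effort: str = "default") -> bool:
    """Return True when the flowchart would take the thinking branch."""
    return effort.lower() in THINKING_EFFORTS or bool(KEYWORD.search(prompt))


def response_parts(prompt: str, effort: str = "default") -> list[str]:
    """List what the reader sees at the end of each branch."""
    if uses_thinking_path(prompt, effort):
        return ["thinking block", "answer"]
    return ["answer"]
```

For example, `response_parts("ultrathink: which schema?")` follows the right-hand branch, while `response_parts("fix this typo", "low")` returns only the answer.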

Best Use Cases

Extended thinking earns its cost when the decision is hard to reverse or the stakes of getting it wrong are high.

Architectural decisions

ultrathink: we're designing an event-driven notification system.
Should we use Kafka, SQS, or Postgres LISTEN/NOTIFY?
Constraints: < 10k events/day, existing Postgres infra, 2-person team.

Claude won’t just pick one — it will reason about your specific constraints before recommending.

Root cause analysis for elusive bugs

Intermittent failures are especially hard to diagnose because their causes aren’t obvious from a single stack trace. Extended thinking lets Claude build a mental model of the system, trace causality chains, and surface hypotheses you might have missed.

Security review

ultrathink: review this authentication flow for vulnerabilities.
What attack vectors are we not accounting for?

Security analysis benefits from adversarial reasoning: Claude actively tries to break its own assumptions.

Algorithm design where correctness is critical

Financial calculations, distributed consensus logic, cryptographic protocols — domains where “good enough” isn’t acceptable. Extended thinking catches off-by-one errors, edge cases in state machines, and logical gaps before they become production incidents.

Exploring design trade-offs

When you want to understand why a choice is better, not just what to choose, extended thinking produces the comparison reasoning that would otherwise be implicit.

When Not to Use It

Extended thinking is slower and more expensive than standard mode. Using it for simple tasks wastes both time and tokens.

Task	Right mode
Rename a variable	Standard (or `/effort low`)
Fix a typo	Standard
Explain what a function does	Standard
Generate boilerplate	Standard
Design a distributed system	Extended
Debug a Heisenbug	Extended
Security threat modeling	Extended
Choose a tech stack	Extended

When in doubt: if you’d spend 30+ minutes thinking about the answer yourself, extended thinking is worth it.
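
The rule of thumb above can be written down as a small helper. This is a minimal sketch under stated assumptions: `Task`, `recommend_mode`, and the `hard_to_reverse` flag are hypothetical names for illustration, and the 30-minute threshold is taken directly from the guideline rather than from any tool setting.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    own_thinking_minutes: int      # how long you would deliberate on it yourself
    hard_to_reverse: bool = False  # mirrors the "hard to reverse" criterion


def recommend_mode(task: Task, threshold_minutes: int = 30) -> str:
    """Apply the 30-minute rule; hard-to-reverse decisions also qualify."""
    if task.hard_to_reverse or task.own_thinking_minutes >= threshold_minutes:
        return "extended"
    return "standard"
```

With this sketch, `recommend_mode(Task("Rename a variable", 1))` gives `"standard"`, and `recommend_mode(Task("Design a distributed system", 120, hard_to_reverse=True))` gives `"extended"`, matching the table.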

Side-by-Side: Standard vs Extended

Standard prompt:

Refactor this function to reduce cognitive complexity.

Claude returns refactored code. Clean, probably correct.

Extended thinking prompt:

ultrathink: what's the best approach for refactoring this function
given our performance constraints and the fact that it's called
in a hot path 10,000 times per request?

The thinking block shows Claude working through: is complexity the real problem? What’s the actual bottleneck? Would extracting the inner loop help or just shuffle complexity? Only then does it write the refactored code and explain the reasoning that led to that specific shape.

The difference isn’t just quality. It’s that you can see whether Claude’s reasoning matches your mental model, and push back if it doesn’t.

Extended Thinking as a Workflow Tool

At the start of a planning session, ultrathink forces thorough upfront analysis:

ultrathink: before we write any code, what are the 3 biggest
risks in this migration plan? What would you change?

This surfaces objections early — before they become expensive mid-implementation discoveries.

Before a major refactor, use it to validate your approach:

ultrathink: we're about to refactor the auth module from session-based
to JWT. What are the non-obvious failure modes? What should we
test before switching the flag in production?

After hitting a wall, use it to break out of a local maximum:

ultrathink: I've been debugging this for 2 hours and my current
hypothesis is X. What am I missing?

Claude’s external perspective, combined with deep reasoning, often surfaces the assumption you didn’t know you were making.

Cost Implications

Extended thinking uses more tokens — sometimes significantly more — because the thinking chain itself is billed. On Opus 4.6 with /effort max, a complex architectural question can consume 5-15x more tokens than a standard query.

The practical rule: reserve extended thinking for decisions where being wrong costs more than the token difference. For high-stakes architecture, security, and debugging, it’s almost always worth it. For routine coding tasks, it isn’t.
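
To make the practical rule concrete, here is a rough back-of-the-envelope sketch. It uses the 5-15x range quoted above; the price per million tokens and the dollar cost of a wrong decision are placeholder inputs you would supply yourself, and a single blended token price is a simplifying assumption.

```python
def extended_token_range(standard_tokens: int,
                         low_factor: int = 5,
                         high_factor: int = 15) -> tuple[int, int]:
    """Estimate total tokens for an extended query from the 5-15x range."""
    return standard_tokens * low_factor, standard_tokens * high_factor


def worst_case_extra_cost(standard_tokens: int,
                          usd_per_million_tokens: float,
                          high_factor: int = 15) -> float:
    """Extra spend over a standard query at the top of the range.

    usd_per_million_tokens is a placeholder blended price, not a real rate.
    """
    extra_tokens = standard_tokens * (high_factor - 1)
    return extra_tokens * usd_per_million_tokens / 1_000_000


def worth_extended_thinking(standard_tokens: int,
                            usd_per_million_tokens: float,
                            cost_of_wrong_answer_usd: float) -> bool:
    """The practical rule: being wrong must cost more than the token difference."""
    extra = worst_case_extra_cost(standard_tokens, usd_per_million_tokens)
    return cost_of_wrong_answer_usd > extra
```

For a query that would normally use 2,000 tokens, `extended_token_range(2000)` gives 10,000 to 30,000 tokens. At a placeholder price of $10 per million tokens, the worst-case extra cost is about $0.28, which is trivial next to a bad architecture decision but not next to a variable rename.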

Model Support Summary

Model	Extended Thinking Support
Opus 4.6	Full — deepest reasoning, auto-activates on complex tasks
Sonnet 4.6	Supported at `/effort high` and `/effort max`
Haiku	Not supported — optimized for speed
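
The table above can also be kept as a small lookup, for example in a script that picks a model and effort level before starting a session. This sketch only transcribes the table as written; the short model keys and function names are hypothetical, and it does not query any real API.

```python
# Transcribed from the summary table; keys are illustrative short names.
EXPLICIT_EFFORTS = {
    "opus": {"high", "max"},
    "sonnet": {"high", "max"},
    "haiku": set(),  # listed as not supported
}
# Models the table describes as auto-activating on complex tasks.
AUTO_ACTIVATES = {"opus"}


def supports_thinking(model: str, effort: str) -> bool:
    """True if the table lists extended thinking for this model and effort."""
    return effort.lower() in EXPLICIT_EFFORTS.get(model.lower(), set())


def may_auto_activate(model: str) -> bool:
    """True if the table says the model can engage thinking on its own."""
    return model.lower() in AUTO_ACTIVATES
```

Here `supports_thinking("sonnet", "high")` is true, `supports_thinking("haiku", "max")` is false, and only `may_auto_activate("opus")` returns true.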

See Model Selection for guidance on choosing the right model for your task.

Model Selection — choosing Opus vs Sonnet vs Haiku
Prompting Techniques — how to structure prompts for better reasoning
Planning Mode — using extended thinking at the start of a session