knack
← all posts

Claude Code Effort Levels and Fast Mode: Tuning Opus 4.8 for Speed vs Depth

Opus 4.8 ships two dials people keep mixing up. Effort controls how hard Claude thinks; fast mode controls how fast it types. When to reach for each, and what they cost.

Opus 4.8 landed on May 28, 2026, and it shipped with two separate dials that people keep mixing up. One controls how hard the model thinks. The other controls how fast it generates tokens. They live in different places, they cost different amounts, and only one of them is actually wired into Claude Code right now. If you want to get your Claude Code effort settings right, that distinction is the whole game.

Get the two straight and you stop paying Opus-grade token bills for a one-line rename. You also stop waiting on deep reasoning you never asked for.

What an effort level actually is

Effort is the control for adaptive reasoning. The model decides, per step, whether to think and how much, based on how hard the step looks. Lower effort means it thinks less often and comes back faster and cheaper. Higher effort means it reaches for extended reasoning more aggressively on the parts of a task that warrant it. The Claude Code model configuration docs spell out the five levels Opus 4.8 supports: low, medium, high, xhigh, and max.

The default on Opus 4.8 is high. That one is worth pinning down, because the previous release, Opus 4.7, defaulted to xhigh. So if your team upgraded and felt the model got a little quicker and a little shallower on hard problems, you were not imagining it. The default reasoning budget dropped a notch. Anthropic's Opus 4.8 notes say high on 4.8 spends roughly the same token count as 4.7's default while performing better, and that is the whole pitch of the release.

Here is one detail the docs are blunt about and most people miss. The scale is calibrated per model. xhigh on Opus 4.8 is not the same underlying budget as xhigh on Opus 4.7. The names are stable; the values behind them are not. Do not assume a level transfers when you switch models.

The xhigh tier, and the one above it

xhigh is the "this is genuinely hard, spend the tokens" setting: deeper reasoning, higher token spend, slower wall-clock time. The docs steer you toward it for difficult tasks and long-running async work, the kind where you would rather wait and get it right than iterate three times at high. On Opus 4.8 you have to ask for it, because high is the floor you start from.

Above xhigh sits max, which means deepest reasoning and no constraint on token spending. Anthropic's own guidance is unusually candid here. max "can improve performance on demanding tasks but may show diminishing returns and is prone to overthinking. Test before adopting broadly." That is the docs telling you, in plain text, that cranking the dial to the top is not a free win. max also behaves differently as a setting. Levels low through xhigh persist across sessions, but max applies to the current session only unless you force it through the CLAUDE_CODE_EFFORT_LEVEL environment variable.

There is a sixth option in the /effort menu, and it is not really an effort level at all. ultracode sends xhigh to the model and then has Claude orchestrate dynamic workflows for substantive tasks. It is a Claude Code feature layered on top, session-only, and it does not live in the effortLevel setting or the --effort flag. Reach for it when you want the model to plan and fan out work rather than just think harder on a single turn.

How to set the Claude Code effort level

There are five ways, in priority order. The environment variable wins over everything, then your configured level, then the model default.

Run /effort with no argument for an interactive slider. Run /effort xhigh to set a level directly, or /effort auto to drop back to the model default. The slider also shows up inside /model when you have a supporting model selected, and the current level prints next to the spinner. That last part is handy: you can confirm you are actually running low before you blame the model for a shallow answer.

For something more permanent, set effortLevel in your settings file to low, medium, high, or xhigh. Note that max and ultracode are session-only, and the settings file rejects them. Launch with --effort <level> for a single session, or export CLAUDE_CODE_EFFORT_LEVEL. You can also pin effort per skill or per subagent via frontmatter, which overrides the session level whenever that skill or subagent is active. That last one matters if you build reusable workflows, because a skill that does heavy refactoring can carry xhigh while a quick-lookup skill stays cheap, and you never touch the global setting.

If you only need one deep turn, type ultrathink anywhere in your prompt. Claude Code recognizes the keyword and adds an in-context nudge for that turn without changing your session effort. Plain phrases like "think hard" do nothing special, so do not rely on them.

Fast mode is a different dial entirely

This is where the two controls get tangled. Fast mode does not make Claude think more. It makes the same model generate output tokens faster. Same weights, same behavior, same intelligence, per the fast mode docs. You set speed: "fast" on an API request and get up to 2.5x higher output tokens per second.

Two things to internalize before you reach for it. First, the speedup is on output tokens per second, not time to first token. A short answer that was already quick will not feel transformed, but a long generation will. Second, you pay for it. On Opus 4.8, fast mode runs $10 per million input tokens and $50 per million output tokens, against the standard $5 and $25 (Anthropic's launch post lists the standard rates). That is double. The "three times cheaper" line Anthropic uses refers to fast mode on Opus 4.8 versus the old fast mode on Opus 4.6 and 4.7, which billed $30 and $150 per million. So it is cheaper than the old fast tier, but still twice the price of standard 4.8.

There is a catch most people hit. As of the Opus 4.8 launch, fast mode is a research preview on the Claude API only. You request access through an account manager or the waitlist. It is not on Bedrock, Vertex, or Foundry, and the official docs do not document a /fast toggle inside the Claude Code TUI the way they document /effort. If you are using Claude Code on a subscription plan, effort is the dial you have today, and fast mode is an API-tier capability you opt into per request. Check the fast mode page for current access status before you plan around it.

How the dials interact with model and cost

Effort and speed multiply; they do not substitute for each other. Effort changes how many tokens get generated, since more thinking means more tokens billed at the model's rate. Fast mode changes how fast those tokens come out and at what per-token price. Run xhigh in fast mode on Opus 4.8 and you are paying double rate on a deliberately larger token count. That is the most expensive corner of the grid, and once in a while it is the right one for a deadline.

Here is a practical way to reason about it. Pick the model first, then effort for the quality you need, then speed only if latency is the actual bottleneck and you can eat the premium. For routine edits and scoped changes, low or medium on the default model is the cheap, fast path, and the docs explicitly reserve low for short, latency-sensitive work that is not intelligence-sensitive. For architecture, gnarly debugging, or long async runs, xhigh earns its tokens. Save max for the rare task where you have already tried xhigh and want to test whether more depth helps, knowing it might just overthink the problem.

Model choice still dominates the bill. opusplan runs Opus for planning and switches to Sonnet for execution, which is often a better cost lever than fiddling with effort on a single model. And if you are wiring these settings into repeatable skills, Knack turns a short interview into a SKILL.md with the effort frontmatter already set, so a heavy refactor skill ships at xhigh and a lookup skill stays cheap without anyone hand-editing config.

The short version: effort is your everyday dial inside Claude Code, high is the new Opus 4.8 floor, xhigh is for hard problems, max is for testing whether more depth actually pays, and fast mode is a separate, API-only, double-price way to make long generations finish sooner. Tune the one your task is actually bottlenecked on.