Updated May 28, 2026 with Opus 4.8
Token Estimator – paste text to estimate tokens and cost
Model Comparison – see costs across all models
| Model | Per Request | 10K Requests |
|---|
Calculating Claude API Costs
Anthropic charges per million tokens, with separate rates for input and output. Total Claude API costs depend on which model you select, how much of your context repeats across requests, and whether your workload can run asynchronously. This page covers current pricing as of May 28, 2026, the features that reduce spend, and the considerations that matter when choosing a model.
Current Claude Models
Anthropic now lists Claude Opus 4.8 as the current Opus API model, replacing Opus 4.7 at the same standard price: $5 input and $25 output per million tokens. Sonnet 4.6 remains the practical default for balanced production workloads. Haiku 4.5 remains the lowest-cost current-generation option.
The current flagship for high-difficulty coding, complex agentic work, and expensive-to-fail business tasks. Use the API identifier claude-opus-4-8. Fast mode is available for Opus 4.8 at 2x standard token pricing when speed is worth the premium.
The default model on Claude plans and the best starting point for most API workloads. It usually delivers the best balance between quality, latency, and cost before moving up to Opus.
The cheapest current-generation model. Best for routing, extraction, summarization, moderation, real-time chat, and sub-agent work under a Sonnet or Opus orchestrator.
Full Claude API Pricing Table
Standard prices per million tokens, ordered by model family: Opus, Sonnet, then Haiku. Prompt caching and batch discounts apply at standard rates unless a feature says otherwise. Retired models are included only for historical comparison.
| Model | Input | Output | Status |
|---|---|---|---|
| Opus family – most capable | |||
| Claude Opus 4.8 | $5.00 | $25.00 | Active; current flagship; 1M context; fast mode 2x |
| Claude Opus 4.7 | $5.00 | $25.00 | Active; 1M context; fast mode 6x |
| Claude Opus 4.6 | $5.00 | $25.00 | Active; 1M context; fast mode 6x |
| Claude Opus 4.5 | $5.00 | $25.00 | Active; 200K context |
| Claude Opus 4.1 | $15.00 | $75.00 | Active; higher legacy Opus price |
| Claude Opus 4 | $15.00 | $75.00 | Deprecated; retires June 15, 2026 |
| Sonnet family – balanced | |||
| Claude Sonnet 4.6 | $3.00 | $15.00 | Active; current Sonnet; 1M context |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Active; 200K context |
| Claude Sonnet 4 | $3.00 | $15.00 | Deprecated; retires June 15, 2026 |
| Claude 3.7 Sonnet | $3.00 | $15.00 | Retired on Claude API; historical only |
| Haiku family – fastest and lowest cost | |||
| Claude Haiku 4.5 | $1.00 | $5.00 | Active; current Haiku; 200K context |
| Claude Haiku 3.5 | $0.80 | $4.00 | Retired on Claude API; historical only |
| Claude Haiku 3 | $0.25 | $1.25 | Retired on Claude API; historical only |
Opus 4.1 and Opus 4 at $15/$75 are now three times more expensive than Opus 4.8 for weaker performance. Teams still running on them can reduce costs materially by migrating to a current Opus model without a sticker-price increase over Opus 4.5 through 4.7. Haiku 3 at $0.25/$1.25 remains the absolute cheapest option for simple tasks.
Claude Opus 4.8 Benchmarks
Benchmark comparison from Anthropic’s Opus 4.8 announcement.
Opus Fast Mode Costs
Claude’s Opus 4.8 announcement lists fast mode at 2x standard token pricing, which means $10 per million input tokens and $50 per million output tokens. The detailed Claude API pricing docs still list fast mode for Opus 4.6 and Opus 4.7 at 6x standard token pricing, or $30 input and $150 output per million tokens.
What this means A standard Opus 4.8 request with 10,000 input tokens and 2,000 output tokens costs $0.10. The same request in Opus 4.8 fast mode costs $0.20. On Opus 4.6 or Opus 4.7, a 6x fast-mode multiplier would make the same request $0.60. Use fast mode when response time is the bottleneck, not as a default setting for batch jobs or background processing.
Prompt caching still matters in fast mode because repeated input can be billed at cache-read rates after the fast-mode multiplier. Batch processing is separate: fast mode is not available with the Batch API, so non-urgent work should usually use batch instead of paying the speed premium.
Prompt Caching
Prompt caching is the single most effective cost feature on the Claude API. When a request reads from the cache, cached tokens are billed at 10% of the standard input rate, a 90% discount. Cache writes cost 1.25x the standard input rate for a 5-minute time-to-live, or 2x for a 1-hour time-to-live.
The break-even math is straightforward. A 5-minute cache pays for itself after a single read. A 1-hour cache pays for itself after two reads. Every subsequent read saves 90% of the normal input cost.
What to cache
Cache any context that repeats across requests. System prompts, role definitions, tool schemas, reference documents, code examples, and long context that does not change between user messages are all candidates. A chatbot with a 4,000-token system prompt serving 10,000 conversations per day will save roughly $108 daily on Sonnet 4.6 by caching that system prompt, which is about $3,240 per month from a single optimization.
Which TTL to use
Use the 5-minute TTL for interactive workloads where the same context is hit in quick succession, such as chat sessions, multi-turn agents, and iterative coding. Use the 1-hour TTL when cache hits are spread out but predictable within an hour, or when combining cache with batch processing.
Batch Processing
The Message Batches API processes requests asynchronously and returns results within 24 hours, typically under 1 hour, at exactly 50% off both input and output prices. There is no quality difference between batch and real-time responses. The only trade-off is timing.
Batch fits any workload where a few hours of latency is acceptable: document processing pipelines, data enrichment, nightly analytics, offline evaluations, content generation queues, code reviews, and dataset classification. A team processing 500,000 documents per month at Sonnet 4.6 rates saves approximately $750 to $2,250 monthly by routing through batch.
Combining batch with caching
Batch processing and prompt caching compose. A cached read inside a batch request is billed at 50% of the 10% cache-read rate, which works out to 5% of the standard input rate. On prompt-heavy workloads with repeated context, combined use can reduce input costs by up to 95%. Pair the 1-hour cache TTL with batch rather than the 5-minute TTL, since batch turnaround usually exceeds 5 minutes.
Choosing a Claude Model
Model selection has the largest impact on cost. For cost-sensitive production workloads, start on Sonnet 4.6 and only move to Opus 4.8 when benchmarks on your specific workload show a meaningful quality gap. For the most complex agentic coding, long-horizon reasoning, and expensive-to-fail work, Opus 4.8 is Anthropic’s recommended top-capability model.
When to use Opus 4.8
Use Opus 4.8 when the task requires sustained reasoning across many steps, when errors are costly, or when the model must recover from its own mistakes during long autonomous runs. Complex debugging, large-scale code migration, long-horizon agents that manage spreadsheets or documents end-to-end, and enterprise workflows where a wrong answer is expensive are the strongest fits.
When Haiku 4.5 is enough
Haiku 4.5 handles classification, extraction, routing, summarization, moderation, and simple chat competently. At $0.10 per million tokens on cache hits, it is the most cost-effective choice for retrieval pipelines with heavily reused context. Haiku also works as a sub-agent under an Opus or Sonnet orchestrator in multi-agent architectures, where the orchestrator plans and Haiku executes.
Output tokens cost more than input
Output is billed at five times the input rate across every current model. A request with 5,000 input tokens and 5,000 output tokens on Sonnet 4.6 costs $0.015 for the input and $0.075 for the output, so the output represents 83% of the bill. Keep responses concise. Request specific output formats such as JSON or bullet points when the task allows. Set length constraints when detailed analysis is not required. Extended thinking, while billed at input rates, can add substantial volume on complex tasks, so monitor thinking token usage on long-running agents.
Additional API Charges
Base token costs are not the only line item on an API bill. Several features carry their own pricing.
Web search
$10 per 1,000 searches for the search tool itself. The input and output tokens produced by processing the search results are billed separately at the model’s standard rate.
Code execution
Each organization receives 50 free container-hours per day. Additional hours beyond that run at $0.05 per hour per container.
Managed Agents sessions
$0.08 per session-hour of active runtime. Idle time while the session waits for the next message does not count. Standard token rates still apply on top.
US-only data residency
For workloads that must run inside the US, specifying US-only inference via the inference_geo parameter adds a 1.1x multiplier across every token category, covering input, output, cache writes, and cache reads. Applies to Opus 4.8, Opus 4.7, Opus 4.6, and newer models on the first-party Claude API. Third-party platforms have their own regional pricing.
Tool use overhead
Tool use adds a small system prompt automatically: roughly 313 to 346 tokens for basic tools, around 700 tokens for the text editor, and approximately 245 tokens for bash. The overhead is per-request, so it matters most on high-volume, short-message workloads.
1M Context Window
Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1 million token context window at standard flat pricing. A 900,000-token request is billed at the same per-token rate as a 9,000-token request. This is a change from earlier generations, where long-context requests above 200,000 tokens incurred a 2x premium on all tokens in the request.
Prompt caching and batch discounts apply at standard rates across the full context window. For applications that analyze large codebases, long legal documents, or entire books in a single pass, the flat pricing keeps spend predictable.
Planning Token Counts
Claude’s tokenizer produces roughly 1.33 tokens per word for English prose, or about 4 characters per token. A planning rule of thumb: 1,000 words is approximately 1,330 tokens. Technical documentation runs slightly higher at around 1,400 tokens per 1,000 words. Source code runs higher still at around 1,500 tokens per 1,000 words, because punctuation and syntax tokenize less efficiently than prose.
For accurate counts before deployment, the Anthropic API includes a token counting endpoint that reports exact token usage for any input. The estimator above applies the 4-characters-per-token heuristic, which is close enough for budgeting but not exact. Re-run token counts when migrating between Opus releases because identical text can produce different billable token totals across model versions.
Best Cost Control Practices
A few operational practices keep API spend predictable as usage grows.
Track token usage per request from the start. The API response includes a usage object reporting exact input and output token counts. Monitor per-feature and per-user spend rather than only the monthly total, since the total alone will not identify which feature or customer is responsible for a spike.
Cache anything that repeats. System prompts, tool schemas, documentation embedded in the prompt, and few-shot examples are all candidates. Any context that appears in more than one request should be cached.
Route non-urgent requests through batch. Anything that does not need a sub-second response, such as nightly jobs, content generation pipelines, and evaluation runs, belongs on the Batch API at half price.
Set spending limits in the Anthropic Console. Organization-level and workspace-level limits prevent runaway spend from a bug or a misconfigured agent. Alerts at 50%, 75%, and 90% of the monthly budget give time to react before hitting the hard cap.
Benchmark smaller models on your actual task. Teams often default to Opus when Sonnet or even Haiku would suffice. Running a representative sample of real workload through each tier and measuring the quality gap empirically is usually worth the evaluation time, because the cost difference is large.
Pricing verified from claude.com/pricing on May 28, 2026.
See also the detailed pricing documentation and model overview.