Updated April 16, 2026 with Opus 4.7
Calculating Claude API Costs
Anthropic charges per million tokens, with separate rates for input and output. Total Claude API costs depend on which model you select, how much of your context repeats across requests, and whether your workload can run asynchronously. This page covers current pricing as of April 16, 2026, the features that reduce spend, and the considerations that matter when choosing a model.
Current Claude Models
Anthropic released Claude Opus 4.7 on April 16, 2026, replacing Opus 4.6 as the flagship. Pricing is unchanged from Opus 4.6 at $5 input and $25 output per million tokens. Sonnet 4.6 remains the default model on Free, Pro, Max, and Team plans. Haiku 4.5 remains the budget option. These three cover the majority of production use cases.
Claude Opus 4.7
The current flagship. Strongest performance on coding, vision, and long-running agentic workflows in the Claude family. Introduces an “xhigh” effort level between high and max, giving finer control over the reasoning-versus-speed trade-off. API identifier: claude-opus-4-7.
Claude Sonnet 4.6
The default model on every Claude plan and the recommended starting point for most API workloads. Sits within a couple of percentage points of Opus on most benchmarks at three-fifths the input price. Supports extended thinking, adaptive thinking, and context compaction for long agentic runs.
Claude Haiku 4.5
The fastest and cheapest model in the current generation. One-third the input price of Sonnet. Suited to classification, routing, extraction, summarization, moderation, real-time chat, and as a sub-agent under an Opus or Sonnet orchestrator.
Full Claude API Pricing Table
Prices per million tokens. Prompt caching and batch discounts apply at standard rates across every model listed.
| Model | Input | Output | Status |
|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $25.00 | Current flagship |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Default on plans |
| Claude Haiku 4.5 | $1.00 | $5.00 | Budget option |
| Claude Opus 4.6 | $5.00 | $25.00 | Previous flagship |
| Claude Opus 4.5 | $5.00 | $25.00 | Highest SWE-bench (80.9%) |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Predecessor to Sonnet 4.6 |
| Claude Opus 4.1 | $15.00 | $75.00 | Legacy |
| Claude Opus 4 | $15.00 | $75.00 | Legacy |
| Claude Sonnet 4 | $3.00 | $15.00 | Legacy |
| Claude 3.7 Sonnet | $3.00 | $15.00 | Deprecated |
| Claude Haiku 3.5 | $0.80 | $4.00 | Deprecated |
| Claude Haiku 3 | $0.25 | $1.25 | Cheapest available |
Opus 4.1 and Opus 4 at $15/$75 now cost three times as much as Opus 4.7 for weaker performance. Teams still running on them can cut costs materially by migrating to Opus 4.6 or 4.7 with little meaningful change in behavior. Haiku 3 at $0.25/$1.25 remains the absolute cheapest option for simple tasks.
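As a sketch, the table translates into a small lookup for estimating per-request cost. The model identifier strings other than claude-opus-4-7 are illustrative placeholders, not confirmed API names, so check them against the API's model list before use:

```python
# Per-million-token rates (USD) from the pricing table above.
# Only "claude-opus-4-7" is a confirmed identifier; the rest are placeholders.
RATES = {
    "claude-opus-4-7":   {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5":  {"input": 1.00, "output": 5.00},
    "claude-haiku-3":    {"input": 0.25, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at standard (non-cached, non-batch) rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000
```

For example, a 5,000-in / 5,000-out request on Sonnet 4.6 comes to $0.09.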
Important: Changes to Opus 4.7 Tokenizer
Opus 4.7 uses an updated tokenizer that produces approximately 1.0 to 1.35 times as many tokens as prior Claude tokenizers for the same input text, depending on content type. The per-token price is unchanged from Opus 4.6. A prompt that tokenized to 10,000 tokens on Opus 4.6 may tokenize to between 10,000 and 13,500 tokens on Opus 4.7.
What this means: If you are migrating existing prompts from Opus 4.6 to 4.7, actual spend on identical workloads may rise modestly even though the sticker price is unchanged. Re-profile your highest-volume prompts before switching production traffic. This is not a reason to avoid the upgrade, but it is a reason to validate costs rather than assume they stay flat.
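A minimal way to bound the migration impact, assuming only the stated 1.0x to 1.35x expansion and the unchanged $5 per million input rate:

```python
def opus_4_7_input_cost_range(opus_4_6_tokens: int, price_per_mtok: float = 5.00):
    """Bound the input cost of the same prompt on Opus 4.7, given its token
    count on Opus 4.6 and the reported 1.0x-1.35x tokenizer expansion."""
    low = opus_4_6_tokens * 1.00 * price_per_mtok / 1_000_000
    high = opus_4_6_tokens * 1.35 * price_per_mtok / 1_000_000
    return low, high
```

A 10,000-token Opus 4.6 prompt lands somewhere between $0.05 and $0.0675 of input per request on 4.7, which is why re-profiling real prompts matters before switching traffic.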
Prompt Caching
Prompt caching is the single most effective cost feature on the Claude API. When a request reads from the cache, cached tokens are billed at 10% of the standard input rate, a 90% discount. Cache writes cost 1.25x the standard input rate for a 5-minute time-to-live, or 2x for a 1-hour time-to-live.
The break-even math is straightforward. A 5-minute cache pays for itself after a single read. A 1-hour cache pays for itself after two reads. Every subsequent read saves 90% of the normal input cost.
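The break-even arithmetic can be sketched as a function of prompt size, read count, and write multiplier. This is a simplified model that assumes one write followed by N reads within the TTL:

```python
def cache_net_savings(prompt_tokens: int, reads: int,
                      input_rate: float, write_mult: float) -> float:
    """Net savings (USD) from caching a prompt segment, versus sending it
    uncached on every request. write_mult is 1.25 (5-min TTL) or 2.0 (1-hour TTL)."""
    base = prompt_tokens * input_rate / 1_000_000   # uncached cost per request
    write = base * write_mult                       # one cache write
    read = base * 0.10                              # each read: 10% of input rate
    uncached_total = base * (1 + reads)             # write request + N repeats
    cached_total = write + read * reads
    return uncached_total - cached_total
```

Running the numbers confirms the text: a 5-minute cache is net positive after one read, while a 1-hour cache needs two.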
What to cache
Cache any context that repeats across requests. System prompts, role definitions, tool schemas, reference documents, code examples, and long context that does not change between user messages are all candidates. A chatbot with a 4,000-token system prompt serving 10,000 conversations per day will save roughly $108 daily on Sonnet 4.6 by caching that system prompt, which is about $3,240 per month from a single optimization.
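The chatbot figure works out as follows. This is a simplified estimate that ignores the comparatively small cache-write overhead:

```python
SONNET_INPUT = 3.00            # USD per million input tokens (table above)
prompt_tokens = 4_000          # cached system prompt
conversations_per_day = 10_000

uncached_per_day = prompt_tokens * conversations_per_day * SONNET_INPUT / 1_000_000
cached_per_day = uncached_per_day * 0.10      # cache reads billed at 10%
savings_per_day = uncached_per_day - cached_per_day   # $120 - $12 = $108
```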
Which TTL to use
Use the 5-minute TTL for interactive workloads where the same context is hit in quick succession, such as chat sessions, multi-turn agents, and iterative coding. Use the 1-hour TTL when cache hits are spread out but predictable within an hour, or when combining cache with batch processing.
Batch Processing
The Message Batches API processes requests asynchronously and returns results within 24 hours, typically under 1 hour, at exactly 50% off both input and output prices. There is no quality difference between batch and real-time responses. The only trade-off is timing.
Batch fits any workload where a few hours of latency is acceptable: document processing pipelines, data enrichment, nightly analytics, offline evaluations, content generation queues, code reviews, and dataset classification. A team processing 500,000 documents per month at Sonnet 4.6 rates saves approximately $750 to $2,250 monthly by routing through batch.
Combining batch with caching
Batch processing and prompt caching compose. A cached read inside a batch request is billed at 50% of the 10% cache-read rate, which works out to 5% of the standard input rate. On prompt-heavy workloads with repeated context, combined use can reduce input costs by up to 95%. Pair the 1-hour cache TTL with batch rather than the 5-minute TTL, since batch turnaround usually exceeds 5 minutes.
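Because the two discounts compose multiplicatively, the effective input rate is a product of factors. A minimal sketch:

```python
def effective_input_rate(base_rate: float, cache_hit: bool, batch: bool) -> float:
    """Effective per-million-token input rate after the cache-read (10%)
    and batch (50%) discounts, which compose multiplicatively."""
    rate = base_rate
    if cache_hit:
        rate *= 0.10   # cached tokens billed at 10% of input rate
    if batch:
        rate *= 0.50   # batch halves the remaining price
    return rate
```

On Sonnet 4.6, a cached read inside a batch request is billed at $0.15 per million tokens instead of $3.00.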
Choosing a Claude Model
Model selection has the largest impact on cost. Start on Sonnet 4.6 and only move to Opus 4.7 when benchmarks on your specific workload show a meaningful quality gap. Most production workloads are well served by Sonnet at three-fifths the Opus input price.
When to use Opus 4.7
Use Opus 4.7 when the task requires sustained reasoning across many steps, when errors are costly, or when the model must recover from its own mistakes during multi-hour autonomous runs. Complex debugging, large-scale code migration, long-horizon agents that manage spreadsheets or documents end-to-end, and enterprise workflows where a wrong answer is expensive are the strongest fits. The xhigh effort level gives finer control over reasoning depth than the older high/max options.
When Haiku 4.5 is enough
Haiku 4.5 handles classification, extraction, routing, summarization, moderation, and simple chat competently. At $0.10 per million tokens on cache hits, it is the most cost-effective choice for retrieval pipelines with heavily reused context. Haiku also works as a sub-agent under an Opus or Sonnet orchestrator in multi-agent architectures, where the orchestrator plans and Haiku executes.
Output tokens cost more than input
Output is billed at five times the input rate across every current model. A request with 5,000 input tokens and 5,000 output tokens on Sonnet 4.6 costs $0.015 for the input and $0.075 for the output, so the output represents 83% of the bill. Keep responses concise. Request specific output formats such as JSON or bullet points when the task allows. Set length constraints when detailed analysis is not required. Extended thinking, while billed at input rates, can add substantial volume on complex tasks, so monitor thinking token usage on long-running agents.
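The arithmetic behind that example:

```python
SONNET = {"input": 3.00, "output": 15.00}   # USD per million tokens
in_tok, out_tok = 5_000, 5_000

input_cost = in_tok * SONNET["input"] / 1_000_000     # $0.015
output_cost = out_tok * SONNET["output"] / 1_000_000  # $0.075
output_share = output_cost / (input_cost + output_cost)  # ~0.83 of the bill
```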
Additional API Charges
Base token costs are not the only line item on an API bill. Several features carry their own pricing.
Web search
$10 per 1,000 searches for the search tool itself. The input and output tokens produced by processing the search results are billed separately at the model’s standard rate.
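A rough way to model total search cost, treating the token volume of processed results as a measured input. This is a simplified sketch that prices result tokens at the standard input rate only:

```python
def web_search_cost(searches: int, result_tokens: int, input_rate: float) -> float:
    """Search tool fee ($10 per 1,000 searches) plus the token cost of
    processing the results at the model's standard input rate."""
    tool_fee = searches * 10.00 / 1_000
    token_cost = result_tokens * input_rate / 1_000_000
    return tool_fee + token_cost
```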
Code execution
Each organization receives 50 free container-hours per day. Additional hours beyond that run at $0.05 per hour per container.
Managed Agents sessions
$0.08 per session-hour of active runtime. Idle time while the session waits for the next message does not count. Standard token rates still apply on top.
US-only data residency
For workloads that must run inside the US, specifying US-only inference via the inference_geo parameter adds a 1.1x multiplier across every token category, covering input, output, cache writes, and cache reads. Applies to Opus 4.6 and newer models on the first-party Claude API. Third-party platforms have their own regional pricing.
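A sketch of the multiplier's effect, assuming it applies uniformly to every token category as described:

```python
def us_only_cost(tokens: dict, rates: dict) -> float:
    """Cost with the 1.1x US-only inference multiplier applied uniformly.
    tokens and rates map category names (input, output, cache reads/writes)
    to token counts and per-million-token USD rates respectively."""
    base = sum(tokens[k] * rates[k] / 1_000_000 for k in tokens)
    return base * 1.1
```

A 1M-input / 100K-output Opus request that would cost $7.50 at standard rates comes to $8.25 with US-only inference.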
Tool use overhead
Tool use adds a small system prompt automatically: roughly 313 to 346 tokens for basic tools, around 700 tokens for the text editor, and approximately 245 tokens for bash. The overhead is per-request, so it matters most on high-volume, short-message workloads.
1M Context Window
Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1 million token context window at standard flat pricing. A 900,000-token request is billed at the same per-token rate as a 9,000-token request. This is a change from earlier generations, where long-context requests above 200,000 tokens incurred a 2x premium on all tokens in the request.
Prompt caching and batch discounts apply at standard rates across the full context window. For applications that analyze large codebases, long legal documents, or entire books in a single pass, the flat pricing keeps spend predictable.
How To Plan Token Counts
Claude’s tokenizer produces roughly 1.33 tokens per word for English prose, or about 4 characters per token. A planning rule of thumb: 1,000 words is approximately 1,330 tokens. Technical documentation runs slightly higher at around 1,400 tokens per 1,000 words. Source code runs higher still at around 1,500 tokens per 1,000 words, because punctuation and syntax tokenize less efficiently than prose.
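Those rules of thumb reduce to a small estimator. The ratios below are the approximations given above, not exact tokenizer output:

```python
def estimate_tokens(text: str, content_type: str = "prose") -> int:
    """Rough token estimate from word count, using approximate
    tokens-per-1,000-words ratios for each content type."""
    per_1000_words = {"prose": 1330, "docs": 1400, "code": 1500}
    words = len(text.split())
    return round(words * per_1000_words[content_type] / 1000)
```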
For accurate counts before deployment, the Anthropic API includes a token counting endpoint that reports exact token usage for any input. A simple 4-characters-per-token heuristic is close enough for budgeting but not exact. Note that Opus 4.7’s updated tokenizer produces 1.0 to 1.35 times as many tokens as older tokenizers on the same text, so plan accordingly for that model.
Best Cost Control Practices
A few operational practices keep API spend predictable as usage grows.
Track token usage per request from the start. The API response includes a usage object reporting exact input and output token counts. Monitor per-feature and per-user spend rather than only the monthly total, since the total alone will not identify which feature or customer is responsible for a spike.
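A minimal sketch of per-request cost accounting from the response's usage object. The input_tokens and output_tokens field names follow the Messages API usage block; wiring this into per-feature or per-user aggregation is left to your logging layer:

```python
def cost_from_usage(usage: dict, input_rate: float, output_rate: float) -> float:
    """Compute one request's cost (USD) from the usage object returned in
    an API response. Rates are USD per million tokens."""
    return (usage["input_tokens"] * input_rate
            + usage["output_tokens"] * output_rate) / 1_000_000
```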
Cache anything that repeats. System prompts, tool schemas, documentation embedded in the prompt, and few-shot examples are all candidates. Any context that appears in more than one request should be cached.
Route non-urgent requests through batch. Anything that does not need a sub-second response, such as nightly jobs, content generation pipelines, and evaluation runs, belongs on the Batch API at half price.
Set spending limits in the Anthropic Console. Organization-level and workspace-level limits prevent runaway spend from a bug or a misconfigured agent. Alerts at 50%, 75%, and 90% of the monthly budget give time to react before hitting the hard cap.
Benchmark smaller models on your actual task. Teams often default to Opus when Sonnet or even Haiku would suffice. Running a representative sample of real workload through each tier and measuring the quality gap empirically is usually worth the evaluation time, because the cost difference is large.
Pricing verified from claude.com/pricing on April 16, 2026.
See also the detailed pricing documentation and model overview.