Claude API Cost Calculator 2026 | Fable 5 & Sonnet 5

Updated July 3, 2026

Model

Input tokens

approx. 1,125 words

Output tokens

approx. 600 words

Monthly API calls

Per Request

$0.0550

$0.0150 input + $0.0400 output

Monthly Total

$275.00

5,000 requests

Cost Optimizations optional

Prompt Caching on input

Batch Processing 50% off

Fast Mode 2x price

US Data Residency 1.1x

Additional Tools

Token Estimator – paste text to estimate tokens and cost


Calculating Claude API Costs

Anthropic charges per million tokens, with separate rates for input and output. Total Claude API costs depend on which model you select, how much of your context repeats across requests, and whether your workload can run asynchronously. This page covers current pricing as of June 30, 2026, the features that reduce spend, and the considerations that matter when choosing a model.
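As a rough illustration of the per-million-token arithmetic, the sketch below computes the cost of one request and a monthly total. The function name, the 1,500 / 800 token counts, and the 5,000-request volume are assumptions chosen for illustration; the $10 / $50 rates are the Fable 5 prices listed on this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD of one request; rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Assumed example: 1,500 input and 800 output tokens at Fable 5 rates ($10 / $50).
per_request = request_cost(1_500, 800, input_rate=10.0, output_rate=50.0)
monthly = per_request * 5_000  # assumed volume of 5,000 requests per month

print(f"per request: ${per_request:.4f}")
print(f"monthly:     ${monthly:.2f}")
```

Swapping in another model's rates from the pricing table gives a like-for-like comparison at the same token counts.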

Current Claude Models

Anthropic released Claude Sonnet 5 on June 30, 2026 as the new default model across Claude plans. It is the most agentic Sonnet yet, with performance approaching Opus 4.8 at a lower price. It launched at introductory pricing of $2 input and $10 output per million tokens through August 31, 2026, then moves to standard $3 / $15. Claude Fable 5 remains the most capable widely released model at $10 input and $50 output per million tokens, double the Opus 4.8 rates. Opus 4.8 is the current Opus flagship at $5 / $25, and Haiku 4.5 is the lowest-cost current-generation option at $1 / $5. Sonnet 4.6 stays available at the same standard $3 / $15 for workloads pinned to its behavior.

Claude Fable 5 NEW

$10 / $50 per 1M tokens

1M context, 128K output per request, adaptive thinking always on

Anthropic’s most capable widely released model, built for the most demanding reasoning and long-horizon agentic work. Use the API identifier claude-fable-5. Fable 5 ships with safety classifiers that can decline certain requests; a refused request that produces no output is not billed, and the API can automatically retry on another Claude model via the fallbacks parameter.

Claude Opus 4.8

$5 / $25 per 1M tokens

1M context, fast mode at 2x pricing, built for agents and coding

The Opus flagship for high-difficulty coding, complex agentic work, and expensive-to-fail business tasks at half the Fable 5 rate. Use the API identifier claude-opus-4-8. Fast mode is available for Opus 4.8 at 2x standard token pricing when speed is worth the premium.

Claude Sonnet 5 NEW DEFAULT

$2 / $10 per 1M tokens intro

1M context, 128K output, most agentic Sonnet yet, selectable effort levels

Released June 30, 2026 as the new default model on Claude plans and the best starting point for most API workloads, with agentic performance approaching Opus 4.8 at a lower price. Use the API identifier claude-sonnet-5. Introductory pricing of $2 input and $10 output per million tokens runs through August 31, 2026, then moves to standard $3 / $15. Sonnet 5 uses an updated tokenizer that can map the same text to roughly 1.0x to 1.35x as many tokens, so re-profile prompts when migrating from Sonnet 4.6.
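A minimal sketch of how the introductory window and the tokenizer change might be folded into a budget estimate. The function names are hypothetical, and the 1.0x to 1.35x range is treated as a simple multiplier on a token count measured under Sonnet 4.6, which is an assumption; measuring real prompts is the reliable approach.

```python
from datetime import date

INTRO_END = date(2026, 8, 31)   # last day of introductory pricing, per this page
INTRO_RATES = (2.0, 10.0)       # USD per million input / output tokens
STANDARD_RATES = (3.0, 15.0)


def sonnet5_rates(on: date) -> tuple[float, float]:
    """Return (input_rate, output_rate) for Sonnet 5 on a given date."""
    return INTRO_RATES if on <= INTRO_END else STANDARD_RATES


def migrated_input_cost_range(old_tokens: int, on: date) -> tuple[float, float]:
    """Low/high input cost in USD, assuming 1.0x to 1.35x token growth."""
    input_rate, _ = sonnet5_rates(on)
    low = old_tokens * 1.0 * input_rate / 1_000_000
    high = old_tokens * 1.35 * input_rate / 1_000_000
    return low, high


print(sonnet5_rates(date(2026, 7, 15)))
print(migrated_input_cost_range(1_000_000, date(2026, 9, 1)))
```

The sketch does not check for dates before the June 30, 2026 release.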

Claude Sonnet 4.6

$3 / $15 per 1M tokens

1M context, strong coding balance, adaptive thinking

The previous default model on Claude plans, superseded by Sonnet 5 at the same standard $3 / $15 pricing. Still a strong, well-understood balance of quality, latency, and cost for workloads pinned to its behavior.

Claude Haiku 4.5

$1 / $5 per 1M tokens

200K context, fastest current-generation option, extended thinking

The cheapest current-generation model. Best for routing, extraction, summarization, moderation, real-time chat, and sub-agent work under a Sonnet or Opus orchestrator.

Full Claude API Pricing Table

Standard prices per million tokens, ordered by model family: Fable, Opus, Sonnet, then Haiku. Prompt caching and batch discounts apply at standard rates unless a feature says otherwise. Retired models are included only for historical comparison.

Model	Input	Output	Status
Fable family – frontier capability
Claude Fable 5	$10.00	$50.00	Active; most capable widely released model; 1M context
Claude Mythos 5	$10.00	$50.00	Limited availability via Project Glasswing; 1M context
Opus family – flagship workhorse
Claude Opus 4.8	$5.00	$25.00	Active; current Opus; 1M context; fast mode 2x
Claude Opus 4.7	$5.00	$25.00	Active; 1M context; fast mode 6x
Claude Opus 4.6	$5.00	$25.00	Active; 1M context; fast mode discontinued June 29, 2026
Claude Opus 4.5	$5.00	$25.00	Active; 200K context
Claude Opus 4.1	$15.00	$75.00	Active; higher legacy Opus price
Claude Opus 4	$15.00	$75.00	Deprecated; retires June 15, 2026
Sonnet family – balanced
Claude Sonnet 5	$3.00	$15.00	Active; new default; 1M context; intro $2 / $10 through Aug 31, 2026
Claude Sonnet 4.6	$3.00	$15.00	Active; previous Sonnet; 1M context
Claude Sonnet 4.5	$3.00	$15.00	Active; 200K context
Claude Sonnet 4	$3.00	$15.00	Deprecated; retires June 15, 2026
Claude 3.7 Sonnet	$3.00	$15.00	Retired on Claude API; historical only
Haiku family – fastest and lowest cost
Claude Haiku 4.5	$1.00	$5.00	Active; current Haiku; 200K context
Claude Haiku 3.5	$0.80	$4.00	Retired on Claude API; historical only
Claude Haiku 3	$0.25	$1.25	Retired on Claude API; historical only

Claude Mythos 5 shares Fable 5’s capabilities and pricing but is offered only in limited release to approved customers through Anthropic’s Project Glasswing, so Fable 5 is the generally available model in that class. Both carry 30-day data retention and are not available under zero data retention agreements, which matters for compliance-sensitive workloads. Opus 4.1 and Opus 4 at $15/$75 cost three times as much as Opus 4.8 for weaker performance. Teams still running on them can cut costs materially by migrating to Opus 4.8, which is priced at the same $5/$25 as Opus 4.5 through 4.7. Haiku 3 at $0.25/$1.25 has the lowest listed price but is retired on the Claude API, so Haiku 4.5 at $1/$5 is the cheapest model still available.

Claude Fable 5 Benchmarks

Benchmark comparison from Anthropic’s Fable 5 announcement. Anthropic reports Fable 5 and Mythos 5 scores jointly; on starred benchmarks Fable 5 can land closer to Opus 4.8 because its safety fallbacks route some queries away.

Claude Fable 5 benchmark comparison table showing SWE-Bench Pro, Terminal-Bench 2.1, Humanity's Last Exam, OSWorld-Verified, GDPval-AA, and GDPpdf scores versus Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro.

Claude Sonnet 5 Benchmarks

Benchmark comparison from Anthropic’s Claude Sonnet 5 launch on June 30, 2026. Sonnet 5 sits clearly above Sonnet 4.6 across every headline benchmark and approaches Opus 4.8, edging ahead of the flagship on GDPval-AA v2 knowledge work. Opus 4.8 is shown as the reference ceiling.

Claude Sonnet 5 benchmark comparison table showing SWE-bench Pro, Terminal-Bench 2.1, OSWorld-Verified, Humanity's Last Exam, and GDPval-AA v2 scores versus Claude Sonnet 4.6 and Claude Opus 4.8.

Opus Fast Mode Costs

Claude’s Opus 4.8 announcement lists fast mode at 2x standard token pricing, which means $10 per million input tokens and $50 per million output tokens. The detailed Claude API pricing docs list fast mode for Opus 4.7 at 6x standard token pricing, or $30 input and $150 output per million tokens. Opus 4.6 fast mode was discontinued on June 29, 2026, and Opus 4.7 fast mode is scheduled for removal on July 24, 2026.

What this means: a standard Opus 4.8 request with 10,000 input tokens and 2,000 output tokens costs $0.10. The same request in Opus 4.8 fast mode costs $0.20. On Opus 4.7, a 6x fast-mode multiplier would make the same request $0.60. Use fast mode when response time is the bottleneck, not as a default setting for batch jobs or background processing.
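The three figures above can be reproduced with a short sketch. The function name is hypothetical; the rates and multipliers are the ones quoted in this section.

```python
OPUS_INPUT_RATE = 5.0    # USD per million input tokens (Opus standard rate)
OPUS_OUTPUT_RATE = 25.0  # USD per million output tokens


def opus_request_cost(input_tokens: int, output_tokens: int,
                      fast_multiplier: float = 1.0) -> float:
    """Request cost in USD, scaled by a fast-mode multiplier (1.0 = standard)."""
    base = (input_tokens * OPUS_INPUT_RATE
            + output_tokens * OPUS_OUTPUT_RATE) / 1_000_000
    return base * fast_multiplier


for label, mult in [("standard", 1.0), ("Opus 4.8 fast", 2.0), ("Opus 4.7 fast", 6.0)]:
    print(f"{label}: ${opus_request_cost(10_000, 2_000, mult):.2f}")
```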

Prompt caching still matters in fast mode because repeated input can be billed at cache-read rates after the fast-mode multiplier. Batch processing is separate: fast mode is not available with the Batch API, so non-urgent work should usually use batch instead of paying the speed premium. Fast mode is an Opus feature; Anthropic does not list it for Fable 5, Sonnet, or Haiku models.

Prompt Caching

Prompt caching is the single most effective cost feature on the Claude API. When a request reads from the cache, cached tokens are billed at 10% of the standard input rate, a 90% discount. Cache writes cost 1.25x the standard input rate for a 5-minute time-to-live, or 2x for a 1-hour time-to-live.

The break-even math is straightforward. A 5-minute cache pays for itself after a single read. A 1-hour cache pays for itself after two reads. Every subsequent read saves 90% of the normal input cost.
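The break-even claim can be checked with a small sketch that works in units of the standard input price for the cached prefix. The function name is hypothetical, and the model ignores cache expiry between requests.

```python
def break_even_reads(write_multiplier: float, read_multiplier: float = 0.10) -> int:
    """Smallest number of cache reads at which caching is cheaper than not caching.

    Without caching, the first request plus n follow-ups costs (1 + n).
    With caching, it costs write_multiplier + n * read_multiplier.
    """
    n = 0
    while write_multiplier + n * read_multiplier >= 1 + n:
        n += 1
    return n


print(break_even_reads(1.25))  # 5-minute TTL write at 1.25x
print(break_even_reads(2.0))   # 1-hour TTL write at 2x
```

With one read, the 5-minute cache costs 1.35 units against 2.0 uncached. The 1-hour cache costs 2.1 against 2.0 after one read and 2.2 against 3.0 after two.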

What to cache

Cache any context that repeats across requests. System prompts, role definitions, tool schemas, reference documents, code examples, and long context that does not change between user messages are all candidates. A chatbot with a 4,000-token system prompt serving 10,000 conversations per day will save roughly $108 daily on Sonnet 4.6 by caching that system prompt, which is about $3,240 per month from a single optimization.
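The chatbot figure can be reproduced as follows. This is a simplified sketch: it treats every request as a cache hit and ignores cache-write costs, so real savings would be somewhat lower. The function name is hypothetical.

```python
def daily_cache_savings(prompt_tokens: int, requests_per_day: int,
                        input_rate: float, read_multiplier: float = 0.10) -> float:
    """Daily savings in USD from reading a repeated prompt from cache.

    Simplification: all requests are cache hits and write costs are ignored.
    """
    uncached = prompt_tokens * requests_per_day * input_rate / 1_000_000
    cached = uncached * read_multiplier
    return uncached - cached


# 4,000-token system prompt, 10,000 conversations per day, Sonnet 4.6 at $3 input.
savings = daily_cache_savings(4_000, 10_000, input_rate=3.0)
print(f"daily: ${savings:.2f}, 30 days: ${savings * 30:.2f}")
```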

Which TTL to use

Use the 5-minute TTL for interactive workloads where the same context is hit in quick succession, such as chat sessions, multi-turn agents, and iterative coding. Use the 1-hour TTL when cache hits are spread out but predictable within an hour, or when combining cache with batch processing.

Batch Processing

The Message Batches API processes requests asynchronously and returns results within 24 hours, typically under 1 hour, at exactly 50% off both input and output prices. There is no quality difference between batch and real-time responses. The only trade-off is timing.

Batch fits any workload where a few hours of latency is acceptable: document processing pipelines, data enrichment, nightly analytics, offline evaluations, content generation queues, code reviews, and dataset classification. A team processing 500,000 documents per month at Sonnet 4.6 rates saves approximately $750 to $2,250 monthly by routing through batch, depending on document length. Because batch is 50% off, that range corresponds to a standard-rate spend of $1,500 to $4,500 per month.

Combining batch with caching

Batch processing and prompt caching compose. A cached read inside a batch request is billed at 50% of the 10% cache-read rate, which works out to 5% of the standard input rate. On prompt-heavy workloads with repeated context, combined use can reduce input costs by up to 95%. Pair the 1-hour cache TTL with batch rather than the 5-minute TTL, since batch turnaround usually exceeds 5 minutes.
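A minimal sketch of how the two discounts stack on the input side. The function and constant names are hypothetical; the multipliers are the ones stated above.

```python
BATCH_MULTIPLIER = 0.50       # batch requests are billed at half price
CACHE_READ_MULTIPLIER = 0.10  # cache reads are billed at 10% of the input rate


def effective_input_rate(input_rate: float, batch: bool = False,
                         cache_read: bool = False) -> float:
    """USD per million input tokens after applying the selected discounts."""
    rate = input_rate
    if cache_read:
        rate *= CACHE_READ_MULTIPLIER
    if batch:
        rate *= BATCH_MULTIPLIER
    return rate


# Sonnet standard input rate of $3 per million tokens.
print(effective_input_rate(3.0, batch=True, cache_read=True))
```

At the $3 Sonnet input rate, a cached read inside a batch comes to about $0.15 per million tokens, which is 5% of the standard rate.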

Choosing a Claude Model

Model selection has the largest impact on cost. For cost-sensitive production workloads, start on Sonnet 5 and only move up when benchmarks on your specific workload show a meaningful quality gap. Opus 4.8 covers most complex agentic coding and expensive-to-fail work, and Fable 5 sits above it for the most demanding reasoning at double the price.

When to use Sonnet 5

Use Sonnet 5 as the default for most production workloads: agentic coding, tool use, computer use, and everyday knowledge work. It is the most agentic Sonnet to date and lands close to Opus 4.8 on many tasks, and through August 31, 2026 its introductory $2 / $10 pricing is below the $3 / $15 that Sonnet 4.6 costs. Move up to Opus 4.8 or Fable 5 only when benchmarks on your own workload show a clear quality gap worth the higher token price.

When to use Fable 5

Use Fable 5 when even Opus 4.8 falls short: the hardest multi-step reasoning, long-horizon autonomous agents that run for hours, and frontier research or analysis tasks where capability is worth a 2x token premium. Budget for adaptive thinking, which is always on and bills thinking tokens at output rates. Plan for refusal handling too, since Fable 5’s safety classifiers can decline requests; refused requests that produce no output are free, and the fallbacks parameter can retry on Opus 4.8 automatically.

When to use Opus 4.8

Use Opus 4.8 when the task requires sustained reasoning across many steps, when errors are costly, or when the model must recover from its own mistakes during long autonomous runs. Complex debugging, large-scale code migration, long-horizon agents that manage spreadsheets or documents end-to-end, and enterprise workflows where a wrong answer is expensive are the strongest fits.

When Haiku 4.5 is enough

Haiku 4.5 handles classification, extraction, routing, summarization, moderation, and simple chat competently. At $0.10 per million input tokens on cache hits (10% of its $1 input rate), it is the most cost-effective choice for retrieval pipelines with heavily reused context. Haiku also works as a sub-agent under an Opus or Sonnet orchestrator in multi-agent architectures, where the orchestrator plans and Haiku executes.

Output tokens cost more than input

Output is billed at five times the input rate across every current model. A request with 5,000 input tokens and 5,000 output tokens on Sonnet 4.6 costs $0.015 for the input and $0.075 for the output, so the output represents 83% of the bill. Keep responses concise. Request specific output formats such as JSON or bullet points when the task allows. Set length constraints when detailed analysis is not required. Extended thinking is billed at output rates and can add substantial volume on complex tasks, so monitor thinking token usage on long-running agents.
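The 83% figure follows directly from the 5:1 rate ratio, as this sketch shows. The function name is hypothetical.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Fraction of a request's token cost that comes from output tokens."""
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return output_cost / (input_cost + output_cost)


# 5,000 input and 5,000 output tokens at Sonnet 4.6 rates ($3 / $15).
share = output_share(5_000, 5_000, input_rate=3.0, output_rate=15.0)
print(f"{share:.1%}")
```

With equal token counts the share is 5/6 regardless of model, because every current model on this page has the same 5:1 output-to-input ratio.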

Additional API Charges

Base token costs are not the only line item on an API bill. Several features carry their own pricing.

Web search

$10 per 1,000 searches for the search tool itself. The input and output tokens produced by processing the search results are billed separately at the model’s standard rate.

Code execution

Each organization receives 50 free container-hours per day. Additional hours beyond that run at $0.05 per hour per container.
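A rough sketch of the code execution charge for a single day, assuming the free allowance is applied to an organization's combined container-hours for that day. The function name and that aggregation rule are assumptions; check the billing documentation for how usage is actually metered.

```python
FREE_HOURS_PER_DAY = 50.0   # free container-hours per organization per day
RATE_PER_HOUR = 0.05        # USD per additional container-hour


def code_execution_charge(container_hours_used: float) -> float:
    """USD charged for one day's container-hours beyond the free allowance."""
    billable_hours = max(0.0, container_hours_used - FREE_HOURS_PER_DAY)
    return billable_hours * RATE_PER_HOUR


print(code_execution_charge(40))   # within the free allowance
print(code_execution_charge(130))  # 80 billable hours
```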

Managed Agents sessions

$0.08 per session-hour of active runtime. Idle time while the session waits for the next message does not count. Standard token rates still apply on top.

US-only data residency

For workloads that must run inside the US, specifying US-only inference via the inference_geo parameter adds a 1.1x multiplier across every token category, covering input, output, cache writes, and cache reads. The multiplier applies to Opus 4.8, Opus 4.7, Opus 4.6, and newer models on the first-party Claude API. Third-party platforms have their own regional pricing.
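A minimal sketch of the 1.1x multiplier applied to every token category. The function name and dictionary keys are hypothetical; the Opus 4.8 cache rates are derived from the multipliers given in the prompt caching section (1.25x write, 0.1x read).

```python
US_ONLY_MULTIPLIER = 1.1


def apply_us_residency(rates: dict[str, float]) -> dict[str, float]:
    """Scale every per-million-token rate by the US-only multiplier."""
    return {category: rate * US_ONLY_MULTIPLIER for category, rate in rates.items()}


# Opus 4.8 rates in USD per million tokens.
opus_4_8 = {
    "input": 5.0,
    "output": 25.0,
    "cache_write_5m": 5.0 * 1.25,
    "cache_read": 5.0 * 0.10,
}
us_rates = apply_us_residency(opus_4_8)
for category, rate in us_rates.items():
    print(f"{category}: ${rate:.4f}")
```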

Tool use overhead

Tool use adds a small system prompt automatically: roughly 313 to 346 tokens for basic tools, around 700 tokens for the text editor, and approximately 245 tokens for bash. The overhead is per-request, so it matters most on high-volume, short-message workloads.

1M Context Window

Fable 5, Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 5, and Sonnet 4.6 include the full 1 million token context window at standard flat pricing. A 900,000-token request is billed at the same per-token rate as a 9,000-token request. This is a change from earlier generations, where long-context requests above 200,000 tokens incurred a 2x premium on all tokens in the request.
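To make the difference concrete, the sketch below compares flat pricing with the earlier rule described above, modeled as a 2x multiplier on all tokens once a request exceeds 200,000 tokens. The function name and the `legacy_premium` flag are hypothetical, and only input cost is modeled.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; above this the old premium applied


def input_cost(tokens: int, input_rate: float, legacy_premium: bool = False) -> float:
    """Input cost in USD; optionally apply the old 2x long-context premium."""
    rate = input_rate
    if legacy_premium and tokens > LONG_CONTEXT_THRESHOLD:
        rate *= 2  # premium applied to all tokens in the request
    return tokens * rate / 1_000_000


# 900,000 input tokens at the Opus 4.8 rate of $5 per million.
print(input_cost(900_000, 5.0))                       # flat pricing
print(input_cost(900_000, 5.0, legacy_premium=True))  # old-style premium
```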

Prompt caching and batch discounts apply at standard rates across the full context window. For applications that analyze large codebases, long legal documents, or entire books in a single pass, the flat pricing keeps spend predictable.

Planning Token Counts

Claude’s tokenizer produces roughly 1.33 tokens per word for English prose, or about 4 characters per token. A planning rule of thumb: 1,000 words is approximately 1,330 tokens. Technical documentation runs slightly higher at around 1,400 tokens per 1,000 words. Source code runs higher still at around 1,500 tokens per 1,000 words, because punctuation and syntax tokenize less efficiently than prose.
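These planning ratios can be captured in a small helper. This is a budgeting sketch only: the function names are hypothetical, and the ratios are the rules of thumb above rather than tokenizer output.

```python
TOKENS_PER_1000_WORDS = {
    "prose": 1330,      # English prose
    "technical": 1400,  # technical documentation
    "code": 1500,       # source code
}


def estimate_tokens_from_words(word_count: int, kind: str = "prose") -> int:
    """Estimate tokens from a word count using the planning ratios."""
    return round(word_count * TOKENS_PER_1000_WORDS[kind] / 1000)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens using the 4-characters-per-token heuristic."""
    return round(len(text) / 4)


print(estimate_tokens_from_words(1_000))
print(estimate_tokens_from_words(1_000, kind="code"))
print(estimate_tokens_from_text("x" * 400))
```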

For accurate counts before deployment, the Anthropic API includes a token counting endpoint that reports exact token usage for any input. The estimator above applies the 4-characters-per-token heuristic, which is close enough for budgeting but not exact. Re-run token counts when migrating between model versions, because identical text can produce different billable token totals from one version to the next.

Best Cost Control Practices

A few operational practices keep API spend predictable as usage grows.

Track token usage per request from the start. The API response includes a usage object reporting exact input and output token counts. Monitor per-feature and per-user spend rather than only the monthly total, since the total alone will not identify which feature or customer is responsible for a spike.

Cache anything that repeats. System prompts, tool schemas, documentation embedded in the prompt, and few-shot examples are all candidates. Any context that appears in more than one request should be cached.

Route non-urgent requests through batch. Anything that can wait for results, which arrive within 24 hours, belongs on the Batch API at half price. Nightly jobs, content generation pipelines, and evaluation runs are typical examples.

Set spending limits in the Anthropic Console. Organization-level and workspace-level limits prevent runaway spend from a bug or a misconfigured agent. Alerts at 50%, 75%, and 90% of the monthly budget give time to react before hitting the hard cap.

Benchmark smaller models on your actual task. Teams often default to Opus when Sonnet or even Haiku would suffice. Running a representative sample of real workload through each tier and measuring the quality gap empirically is usually worth the evaluation time, because the cost difference is large.
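The per-feature tracking practice above can be sketched as a simple aggregation. The record layout (feature tag, input tokens, output tokens) and the feature names are hypothetical; in practice the token counts would be read from the usage object on each API response and tagged by the calling application.

```python
from collections import defaultdict


def spend_by_feature(records, input_rate: float, output_rate: float) -> dict[str, float]:
    """Sum USD spend per feature.

    records: iterable of (feature, input_tokens, output_tokens) tuples.
    Rates are USD per million tokens.
    """
    totals: dict[str, float] = defaultdict(float)
    for feature, input_tokens, output_tokens in records:
        totals[feature] += (input_tokens * input_rate
                            + output_tokens * output_rate) / 1_000_000
    return dict(totals)


# Hypothetical usage log, priced at the standard Sonnet rates ($3 / $15).
log = [
    ("search", 2_000, 500),
    ("summarize", 10_000, 1_000),
    ("search", 3_000, 500),
]
for feature, usd in spend_by_feature(log, 3.0, 15.0).items():
    print(f"{feature}: ${usd:.4f}")
```

Grouping by user or customer instead of feature only requires changing the tag stored with each record.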

Pricing verified from Anthropic’s pricing documentation.

See also the detailed pricing documentation and model overview.
