Claude Code Prompt Caching TTL Explained: 5 Minutes vs. 1 Hour, Which to Choose? Includes a Billing Comparison Across 3 Platforms


title: "Claude Code Prompt Caching: TTL Mechanics, Costs, and Optimization Strategies"
description: "A deep dive into Claude Code's prompt caching TTL, the differences between 5-minute and 1-hour tiers, and how to optimize costs across Anthropic API and AWS Bedrock."

Author's Note: This guide breaks down the TTL mechanism for Claude Code prompt caching, explains the differences between the 5-minute and 1-hour tiers, compares billing between the Anthropic API and AWS Bedrock, and provides actionable tips for cost optimization.

"Can I change the prompt caching TTL in Claude Code? What's the difference between 5 minutes and 1 hour? Which one is actually more cost-effective?" These are the most common questions from Claude Code users looking to manage their expenses.

Here’s the bottom line: You cannot directly modify the cache TTL in Claude Code—it's determined by your subscription plan. Max subscribers automatically get a 1-hour TTL, while Pro subscribers and API key users default to a 5-minute TTL. However, if you are calling the Claude API directly, you can freely choose between 5 minutes or 1 hour using the cache_control parameter.

Core Value: After reading this, you'll fully understand the Claude prompt caching TTL mechanism, the billing differences between the official Anthropic API and AWS Bedrock, and how to choose the most cost-effective caching strategy for your specific use case.



Claude Prompt Caching: Key Takeaways

Prompt caching is one of the most effective ways to save money with Claude models. It stores the prompt prefix you've previously sent (system prompts, tool definitions, conversation history, etc.) on the server. If the prefix matches in your next request, it reads directly from the cache, costing only 10% of the standard input price.

| Feature | Description | Practical Impact |
|---|---|---|
| Two TTL tiers | 5 minutes (default) and 1 hour (optional) | Choosing the right TTL saves significant write costs |
| Cache read = 0.1x | Cached input is billed at 10% of the base input price | Saves 80-90% on long conversation costs |
| 5-min write = 1.25x | 25% premium for writing to cache | Breaks even after one cache read |
| 1-hour write = 2x | 2x base price for writing to cache | Requires two cache reads to break even |
| Auto-managed | System prompts, tools, and CLAUDE.md are cached automatically | No manual configuration needed |
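The savings in the table compound over a conversation: you pay the write premium once, then every subsequent turn reads the prefix at 0.1x. Here is a minimal sketch of that arithmetic, using the article's Sonnet 4.6 figures ($3/MTok input, $3.75/MTok 5-minute write, $0.30/MTok read); the function and numbers are illustrative, and only input-side costs are modeled.

```python
# Estimate the input-side cost (USD) of re-sending a fixed prompt prefix
# across many turns, with and without prompt caching.
def conversation_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    if not cached:
        return turns * prefix_tokens / 1_000_000 * 3.00
    write = prefix_tokens / 1_000_000 * 3.75            # first request writes the cache
    reads = (turns - 1) * prefix_tokens / 1_000_000 * 0.30  # later requests hit it
    return write + reads

uncached = conversation_cost(50_000, 20, cached=False)
cached = conversation_cost(50_000, 20, cached=True)
print(f"Uncached: ${uncached:.2f}, cached: ${cached:.2f}")
# → Uncached: $3.00, cached: $0.47
```

For a 50k-token prefix reused over 20 turns, caching cuts the input bill by roughly 84%, which matches the 80-90% range claimed above.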

Can I change the TTL in Claude Code?

This is the question users care about most. The answer depends on how you're using it:

Claude Code (Interactive CLI tool): You cannot manually change it. Claude Code's caching is controlled server-side—Max subscribers get a 1-hour TTL (controlled via the tengu_prompt_cache_1h_config feature flag), while Pro subscribers and API key users get a 5-minute TTL. You can only disable caching entirely using the DISABLE_PROMPT_CACHING=1 environment variable; you cannot switch between TTL tiers.

Claude API (Direct invocation): You can choose freely. When calling via the API, you can specify the TTL in the cache_control parameter:

// 5-minute cache (default)
{ "cache_control": { "type": "ephemeral" } }

// 1-hour cache
{ "cache_control": { "type": "ephemeral", "ttl": "1h" } }

🎯 Recommendation: If you primarily use the Claude Code CLI, your TTL is tied to your subscription plan. If you use the API (e.g., via APIYI), you can flexibly choose between 5 minutes or 1 hour based on your workflow to achieve more precise cost control.



A Deep Dive into Claude Prompt Caching TTL Billing Rules

5 Minutes vs. 1 Hour: A Billing Comparison

The core difference between the two TTLs is the write cost; read costs are identical for both, at 0.1x the base input price:

| Operation | 5-Minute TTL | 1-Hour TTL | Note |
|---|---|---|---|
| Cache write | 1.25x base price | 2.0x base price | Premium for the initial cache write |
| Cache read | 0.1x base price | 0.1x base price | Discounted rate on a cache hit (identical) |
| Break-even point | 1 read | 2 reads | Usage frequency determines value |
| Auto-renewal | Resets to 5 minutes on every hit | Fixed 1-hour expiration | With frequent hits, a 5-min cache can last indefinitely |

Specific Prompt Caching Prices by Model

Here is the complete cache billing table for Anthropic's official API models (as of March 2026):

| Model | Base Input Price | 5-Min Write | 1-Hour Write | Cache Read | Output Price |
|---|---|---|---|---|---|
| Claude Opus 4.6 | $5/MTok | $6.25/MTok | $10/MTok | $0.50/MTok | $25/MTok |
| Claude Sonnet 4.6 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok | $15/MTok |
| Claude Haiku 4.5 | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok | $5/MTok |
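Notice that every cache price in the table is a fixed multiple of the base input price: 1.25x for a 5-minute write, 2.0x for a 1-hour write, 0.1x for a read. A small sketch makes that derivation explicit (the model IDs and base prices are taken from the table above and are illustrative, not authoritative API identifiers):

```python
# Base input prices in USD per million tokens, per the article's table.
BASE_INPUT = {"claude-opus-4-6": 5.00, "claude-sonnet-4-6": 3.00, "claude-haiku-4-5": 1.00}

def cache_prices(model: str) -> dict:
    """Derive cache prices from the base input price via the fixed multipliers."""
    base = BASE_INPUT[model]
    return {
        "write_5m": base * 1.25,  # 5-minute cache write premium
        "write_1h": base * 2.0,   # 1-hour cache write premium
        "read": base * 0.10,      # cache read discount
    }

print(cache_prices("claude-opus-4-6"))
# → {'write_5m': 6.25, 'write_1h': 10.0, 'read': 0.5}
```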

Key Takeaway: The cache read discount is massive. Taking Claude Opus 4.6 as an example:

  • Standard input for 1 million tokens = $5.00
  • Cached read for 1 million tokens = $0.50 (You save $4.50, a 90% discount)
  • This is exactly why the $20/month Claude Code Pro subscription is economically viable—100 rounds of Opus conversation without caching could cost $50–100, but with caching, it drops to just $10–19.

Minimum Token Requirements for Caching

Not everything can be cached. Each model has a minimum token requirement; if your content isn't long enough, it won't trigger the cache:

| Model | Minimum Cache Tokens |
|---|---|
| Claude Opus 4.6 / 4.5 | 4,096 |
| Claude Sonnet 4.6 | 2,048 |
| Claude Sonnet 4.5 / 4 | 1,024 |
| Claude Haiku 4.5 | 4,096 |
| Claude Haiku 3.5 / 3 | 2,048 |
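If you want to check the threshold programmatically before adding a cache breakpoint, the table above reduces to a lookup. This is a sketch with illustrative model IDs, not official API identifiers; token counts would come from your own tokenizer or the SDK's token-counting endpoint.

```python
# Minimum cacheable prefix length per model, per the article's table.
MIN_CACHE_TOKENS = {
    "claude-opus-4-6": 4096,
    "claude-opus-4-5": 4096,
    "claude-sonnet-4-6": 2048,
    "claude-sonnet-4-5": 1024,
    "claude-haiku-4-5": 4096,
}

def is_cacheable(model: str, prefix_tokens: int) -> bool:
    """True if the prefix meets the model's minimum cacheable length."""
    return prefix_tokens >= MIN_CACHE_TOKENS.get(model, 4096)

print(is_cacheable("claude-sonnet-4-6", 1500))  # → False: below the 2,048 minimum
```

A prefix below the threshold is simply billed at the normal input rate; the `cache_control` marker is ignored rather than raising an error.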

🎯 Pro Tip: If your system prompt is short (e.g., under 2,048 tokens), it won't trigger caching on Claude Sonnet 4.6. You can reach the minimum threshold by expanding your system prompt or consolidating your tool definitions into the cached prefix. Invoking models through APIYI (apiyi.com) also supports caching, at competitive rates.


Anthropic API vs. AWS Bedrock: Caching Billing Comparison

Platform Support Overview

Claude's prompt caching is supported across the Anthropic official API, AWS Bedrock, and Google Vertex AI, though there are subtle differences:

| Comparison Dimension | Anthropic Official API | AWS Bedrock | Google Vertex AI |
|---|---|---|---|
| 5-min TTL | ✅ All models | ✅ All models | ✅ All models |
| 1-hour TTL | ✅ All models | ✅ Select models (Opus/Sonnet/Haiku 4.5) | ✅ Supported |
| Write premium (5-min) | 1.25x | ~1.25x | 1.25x |
| Write premium (1-hour) | 2.0x | 2.0x | 2.0x |
| Read discount | 0.1x | ~0.1x | 0.1x |
| Max breakpoints | 4 | 4 | 4 |
| Auto-caching | ✅ Supported | ✅ Supported | ✅ Supported |
| Custom TTL | ✅ 5-min / 1-hour | ✅ Select models | ✅ Supported |

Key Platform Differences

Anthropic Official API: Offers the most comprehensive caching features, with both 5-minute and 1-hour TTLs available for all models. As of February 5, 2026, cache isolation shifted from the organization level to the workspace level, meaning caches are independent across different workspaces within the same organization.

AWS Bedrock: Announced support for 1-hour TTL in January 2026, but it's currently limited to specific models like Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5. Support for 1-hour TTL on the latest Claude Sonnet 4.6 and Opus 4.6 via Bedrock is still pending confirmation. If you're connecting to Bedrock via Claude Code, keep an eye on the CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 compatibility setting.

Google Vertex AI: Caching functionality is largely identical to the official API, but it requires authentication and billing through your Google Cloud project.

🎯 Platform Recommendation: If you want to avoid the headache of platform-specific differences and complex configurations, using the unified interface at APIYI (apiyi.com) is the simplest solution—it supports full caching features without the need to manage separate AWS IAM or Google Cloud credentials.



Getting Started with Claude Code Prompt Caching

Minimal Example: Setting a 1-Hour TTL Cache

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://vip.apiyi.com/v1"
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "You are a professional physics tutor assistant, responsible for answering high school physics problems...(long system prompt here)",
        "cache_control": {"type": "ephemeral", "ttl": "1h"}
    }],
    messages=[{"role": "user", "content": "Explain Newton's Third Law"}]
)
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")

View Full Code: Mixing 5-minute and 1-hour TTLs
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://vip.apiyi.com/v1"
)

# Mixed TTL: 1 hour for system prompt (rarely changes), 5 minutes for conversation context (frequently changes)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system=[
        {
            "type": "text",
            "text": "You are a professional AI technical consultant...(long system prompt, 2000+ tokens)",
            "cache_control": {"type": "ephemeral", "ttl": "1h"}  # 1 hour for system prompt
        },
        {
            "type": "text",
            "text": "Here is the user's historical conversation context...(conversation history)",
            "cache_control": {"type": "ephemeral"}  # 5 minutes for conversation context (default)
        }
    ],
    messages=[{"role": "user", "content": "Compare the reasoning capabilities of Claude and GPT"}]
)

# Check cache usage
usage = response.usage
print(f"Standard input tokens: {usage.input_tokens}")
print(f"Cache read tokens: {usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {usage.cache_creation_input_tokens}")

# Calculate savings (Sonnet 4.6 prices: $3/MTok input, $0.30/MTok cache read)
base_cost = (usage.input_tokens / 1_000_000) * 3.00
cache_cost = (usage.cache_read_input_tokens / 1_000_000) * 0.30
saved = (usage.cache_read_input_tokens / 1_000_000) * (3.00 - 0.30)
print(f"Input cost: ${base_cost + cache_cost:.4f} (saved ${saved:.4f} vs. no cache)")

Important Constraint: When mixing the two TTLs in the same request, the 1-hour cached content must be placed before the 5-minute cached content; otherwise, the API returns an error.
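You can catch a violation of this ordering rule client-side before the request goes out. The following is a minimal sketch of such a validator (the helper name and the `"5m"` default are my own conventions, not part of the API):

```python
# Check that no 1-hour cached block appears after a 5-minute cached block.
def validate_ttl_order(blocks: list[dict]) -> bool:
    seen_5m = False
    for block in blocks:
        cc = block.get("cache_control")
        if cc is None:
            continue
        ttl = cc.get("ttl", "5m")  # ephemeral without ttl defaults to 5 minutes
        if ttl == "5m":
            seen_5m = True
        elif ttl == "1h" and seen_5m:
            return False  # 1h block after a 5m block: the API would reject this
    return True

system = [
    {"type": "text", "text": "...", "cache_control": {"type": "ephemeral", "ttl": "1h"}},
    {"type": "text", "text": "...", "cache_control": {"type": "ephemeral"}},
]
print(validate_ttl_order(system))  # → True: 1h comes before 5m
```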

Tip: When calling the Claude API via APIYI (apiyi.com), you get full support for cache_control parameter configuration, including the flexibility to choose between 5-minute and 1-hour TTLs.


5-Minute vs. 1-Hour TTL: Which one should you choose?

Selection Decision Table

| Use Case | Recommended TTL | Reason |
|---|---|---|
| Claude Code high-frequency coding (messages every minute) | 5 minutes | Timer resets on every hit; the cache never expires |
| Customer service bot (reply interval < 5 min) | 5 minutes | Low write cost (1.25x), high hit frequency |
| Document analysis agent (processing interval 5-60 min) | 1 hour | Prevents cache expiration and repeated re-writes |
| Scheduled batch jobs (one batch every 30 min) | 1 hour | A 5-minute TTL would expire; 1 hour covers the gap |
| Low-frequency API calls (intervals > 1 hour) | No cache | Both TTLs expire; the write premium is wasted |
| System prompts (rarely change) | 1 hour | Write once, read many times |
| Conversation history (changes every turn) | 5 minutes | Lower write cost suits frequently changing content |
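The decision table boils down to one variable: the typical gap between requests that reuse the same prefix. A sketch encoding that logic (the thresholds mirror the table above and are a rule of thumb, not an official policy):

```python
# Recommend a cache TTL from the typical interval between prefix reuses.
def recommend_ttl(interval_minutes: float) -> str:
    if interval_minutes < 5:
        return "5m"    # timer resets on every hit, so the cache never expires
    if interval_minutes <= 60:
        return "1h"    # a 5-minute cache would expire between calls
    return "none"      # both TTLs expire; the write premium is wasted

print(recommend_ttl(2), recommend_ttl(30), recommend_ttl(120))
# → 5m 1h none
```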

Cost Calculation Formula

To determine if caching is worth it, use this core formula:

5-Minute TTL Break-even Condition: Cached content is read at least once within 5 minutes

  • Write cost: 1.25x → extra 0.25x
  • Read savings: 0.9x saved per read
  • Breaks even after 1 read (0.9 > 0.25)

1-Hour TTL Break-even Condition: Cached content is read at least twice within 1 hour

  • Write cost: 2.0x → extra 1.0x
  • Read savings: 0.9x saved per read
  • Breaks even after 2 reads (0.9 × 2 = 1.8 > 1.0)
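The break-even conditions above can be computed directly in units of the base input price per cached token: a TTL pays off once cumulative read savings exceed the extra write premium. A short sketch of that arithmetic:

```python
# Number of cache reads needed before the write premium is recouped.
def reads_to_break_even(write_multiplier: float, read_multiplier: float = 0.1) -> int:
    extra_write = write_multiplier - 1.0    # premium over a normal input token
    saved_per_read = 1.0 - read_multiplier  # 0.9x saved on every cache hit
    reads = 0
    while reads * saved_per_read <= extra_write:
        reads += 1
    return reads

print(reads_to_break_even(1.25))  # → 1 (5-minute TTL)
print(reads_to_break_even(2.0))   # → 2 (1-hour TTL)
```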



FAQ

Q1: Can I change the 5-minute TTL to 1 hour in Claude Code?

The Claude Code CLI tool doesn't allow users to manually modify the TTL. Max subscribers automatically get a 1-hour TTL (controlled by a server-side feature flag), while Pro and API key users are fixed at a 5-minute TTL. If you need a 1-hour TTL but don't want to upgrade to a Max subscription, you can make direct API calls (by setting cache_control.ttl: "1h") and use pay-as-you-go services like APIYI (apiyi.com).

Q2: Does the 5-minute TTL expire exactly after 5 minutes, or does it auto-renew?

The 5-minute TTL resets its timer every time the cache is hit. If you send a message every 1–2 minutes (like in a Claude Code programming session), the timer keeps resetting, and the cache won't expire. The cache only invalidates if you go 5 minutes without sending a message. So, for high-frequency use cases, a 5-minute TTL is usually more than enough.
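The sliding-window behavior described here is easy to simulate: given the timestamps of your requests, the cache survives as long as no gap exceeds the TTL. A minimal sketch (function name and timestamps are illustrative):

```python
# For each request after the first, was the cached prefix still warm?
def cache_alive_at_each_hit(hit_times_min: list[float], ttl_min: float = 5.0) -> list[bool]:
    alive = []
    last = hit_times_min[0]
    for t in hit_times_min[1:]:
        alive.append(t - last <= ttl_min)  # hit within the TTL window resets the timer
        last = t
    return alive

# Messages every ~2 minutes, then a 10-minute break: only the break misses.
print(cache_alive_at_each_hit([0, 2, 4, 6, 16, 18]))
# → [True, True, True, False, True]
```

The miss at the 10-minute gap means that one request pays the 1.25x write premium again; every other request reads at 0.1x.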

Q3: Is the cache billing on AWS Bedrock the same as the official Anthropic API?

It's mostly the same, but with a few minor differences:

  • Write premiums are both ~1.25x (for 5 minutes) and ~2.0x (for 1 hour).
  • Read discounts are both ~0.1x.
  • Key difference: The 1-hour TTL on Bedrock currently only supports specific models like Opus 4.5, Sonnet 4.5, and Haiku 4.5; you'll need to verify support for the latest 4.6 series models.
  • By using APIYI (apiyi.com), you get full cache support that matches the official API.

Summary

Key takeaways for Claude prompt caching TTL:

  1. Two TTL tiers available: 5 minutes (1.25x write cost, breaks even after 1 read) and 1 hour (2x write cost, breaks even after 2 reads). Both offer a 0.1x read cost.
  2. Claude Code CLI cannot change TTL: Max subscriptions get 1 hour automatically; Pro/API key users are fixed at 5 minutes and cannot toggle this.
  3. Claude API offers flexibility: You can set the TTL via the cache_control.ttl parameter, and you can even mix both TTLs in the same request.
  4. Choose 5 minutes for high-frequency chats: It auto-renews on every hit and has lower write costs. Choose 1 hour for intermittent use to avoid expiration.

Cache hits mean your input costs are slashed by 90%, making it the most effective way to save money with Claude. We recommend using the unified interface at APIYI (apiyi.com) for full support of cache configurations, allowing you to test the actual cost differences between various TTL strategies with a single API key.

📚 References

  1. Anthropic Official Documentation – Prompt Caching: The authoritative source for TTL configuration, billing rules, and cache_control syntax.

    • Link: platform.claude.com/docs/en/build-with-claude/prompt-caching
    • Note: Includes complete billing formulas and code examples for 5-minute/1-hour TTLs.
  2. Anthropic Official Documentation – Pricing: Base and caching prices for all models.

    • Link: platform.claude.com/docs/en/about-claude/pricing
    • Note: Details the cache write and read rates for Opus, Sonnet, and Haiku models.
  3. AWS Official Documentation – Bedrock Prompt Caching: Details on cache support for the Bedrock platform.

    • Link: docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
    • Note: Covers TTL support ranges and billing standards for models on Bedrock.
  4. Claude Code Camp – How Prompt Caching Works: An in-depth analysis of Claude Code's cache implementation.

    • Link: claudecodecamp.com/p/how-prompt-caching-actually-works-in-claude-code
    • Note: Learn how Claude Code automatically manages cache breakpoints.
  5. GitHub Issue #19436 – Multi-layer Cache TTL Feature Request: Community discussion on more flexible TTL configurations.

    • Link: github.com/anthropics/claude-code/issues/19436
    • Note: Explores community-proposed multi-layer TTL schemes based on content change frequency.

Author: APIYI Technical Team
Technical Discussion: Feel free to share your experiences with Claude cache configuration in the comments. For more tutorials on model invocation, visit the APIYI documentation center at docs.apiyi.com.
