The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream
forwarding to a Claude-format upstream) drains the SSE stream via
streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level
event["usage"], which matches Anthropic's message_delta but misses
message_start where usage is nested under event.message.usage.
As a result, every streaming /v1/messages request through this path drops
the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*)
and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user
in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5
streaming requests; their billing on output is correct but the input-side
accounting is missing. (Their "duplicate write from message_delta" hypothesis
isn't borne out by the code — RecordUsage is invoked once per request and
writeUsageLogBestEffort dedupes by request_id; what they're seeing is
single records produced by this buggy extractor.)
Branch on event.type so message_start reads from event.message.usage and
other events keep using event.usage, matching how parseSSEUsagePassthrough
already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths.
Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence
that drives the message_start → message_delta sequence end-to-end; both fail
on main and pass with this change.
Fixes#2332
Co-authored-by: Cursor <cursoragent@cursor.com>
When session long-context pricing is triggered in computeTokenBreakdown
(e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier
was only being applied to InputPricePerToken and OutputPricePerToken.
The cache_read price was left at its base value, so CacheReadCost was
silently undercharged whenever a long-context session also had cache
hits — which is essentially every long Codex / Claude Code session.
Concretely for gpt-5.4 with 300k cache_read tokens, the bug
under-billed the request by exactly 1x the LongContextInputMultiplier
on the cache portion (e.g. 0.075 instead of 0.150 in the regression
test).
Cache reads are conceptually input-side replays, so they should scale
with LongContextInputMultiplier, matching the treatment of
InputPricePerToken.
Adds two regression tests:
- positive: long-context triggered -> cache_read scaled by 2.0x
- negative: below threshold -> cache_read stays at base price
Fixes#2293
Co-authored-by: Cursor <cursoragent@cursor.com>
- Export ApplyBedrockCCCompat() in GatewayService, called after channel
model mapping to ensure mapped model ID is used for Opus 4.7+ detection
- Add sanitizeBedrockCCFields(): remove service_tier/interface_geo/
context_management, inject max_tokens/anthropic_version defaults
- Add sanitizeBedrockCCBetaTokens(): filter anthropic_beta to keep only
Bedrock-supported tokens, reusing autoInjectBedrockBetaTokens and
filterBedrockBetaTokens for consistent rules
- Remove unsupported beta tokens (interleaved-thinking, context-management)
from whitelist based on AWS official docs
- Simplify IsBedrockCCCompatEnabled() to check boolean toggle directly,
applying CC compat to all accounts regardless of platform
- Add unit tests for IsBedrockCCCompatEnabled (8 cases),
sanitizeBedrockCCFields (8 cases), sanitizeBedrockCCBetaTokens (7 cases)
- Update bedrock beta policy tests for removed auto-injection
Add channel-level Bedrock CC compatibility toggle (similar to web_search_emulation)
that fixes 4 types of Bedrock 400 errors seen with Claude Code:
1. thinking.type "enabled" → "adaptive" for Opus 4.7+ (only supports adaptive)
2. Add default budget_tokens when missing for older models
3. Replace illegal characters in tool_use IDs to match Bedrock's ^[a-zA-Z0-9_-]+$ pattern
4. anthropic_version / invalid beta flag (already handled elsewhere)
Transformations run in Forward() before any forwarding path, so both native Bedrock
accounts and apikey passthrough accounts pointing to Bedrock relays benefit.
Includes channel-level toggle UI and unit tests.
Some clients reuse assistant history from other models when switching to
claude with extended thinking enabled. If a prior thinking block lacks the
thinking text field, upstream returns:
messages.X.content.Y.thinking: each thinking block must contain thinking
Add this pattern to isThinkingBlockSignatureError so the existing
FilterThinkingBlocksForRetry retry path triggers and rewrites/drops the
offending blocks.