SlientRainyDay b9509e823a fix(billing): apply long-context multiplier to cache_read price
When session long-context pricing is triggered in computeTokenBreakdown
(e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier
was only being applied to InputPricePerToken and OutputPricePerToken.
The cache_read price was left at its base value, so CacheReadCost was
silently undercharged whenever a long-context session also had cache
hits — which is essentially every long Codex / Claude Code session.

Concretely for gpt-5.4 with 300k cache_read tokens, the bug
under-billed the request by exactly 1x the LongContextInputMultiplier
on the cache portion (e.g. 0.075 instead of 0.150 in the regression
test).

Cache reads are conceptually input-side replays, so they should scale
with LongContextInputMultiplier, matching the treatment of
InputPricePerToken.

Adds two regression tests:
- positive: long-context triggered -> cache_read scaled by 2.0x
- negative: below threshold -> cache_read stays at base price

Fixes #2293

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 07:09:28 +00:00
..