sub2api

History

SlientRainyDay b9509e823a fix(billing): apply long-context multiplier to cache_read price

When session long-context pricing is triggered in computeTokenBreakdown
(e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier
was only being applied to InputPricePerToken and OutputPricePerToken.
The cache_read price was left at its base value, so CacheReadCost was
silently undercharged whenever a long-context session also had cache
hits — which is essentially every long Codex / Claude Code session.

Concretely for gpt-5.4 with 300k cache_read tokens, the bug
under-billed the request by exactly 1x the LongContextInputMultiplier
on the cache portion (e.g. 0.075 instead of 0.150 in the regression
test).

Cache reads are conceptually input-side replays, so they should scale
with LongContextInputMultiplier, matching the treatment of
InputPricePerToken.

Adds two regression tests:
- positive: long-context triggered -> cache_read scaled by 2.0x
- negative: below threshold -> cache_read stays at base price

Fixes #2293

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-27 07:09:28 +00:00

cmd

chore: sync VERSION to 0.1.131 [skip ci]

2026-05-26 06:43:49 +00:00

ent

feat(quota): 用户 × 平台 USD 配额

2026-05-26 10:49:20 +08:00

internal

fix(billing): apply long-context multiplier to cache_read price

2026-05-27 07:09:28 +00:00

migrations

feat(quota): 用户 × 平台 USD 配额