2618 Commits

Author SHA1 Message Date
win
3cffaa1e8e x
Some checks are pending
CI / golangci-lint (push) Waiting to run
CI / windsurf-platform (macos-latest) (push) Waiting to run
CI / windsurf-platform (windows-latest) (push) Waiting to run
CI / test (push) Waiting to run
CI / frontend (push) Waiting to run
Security Scan / backend-security (push) Waiting to run
Security Scan / frontend-security (push) Waiting to run
2026-05-30 16:30:59 +08:00
win
a420179abb chore: merge upstream Wei-Shaw/sub2api v0.1.133 2026-05-29 17:48:27 +08:00
Wesley Liddick
0a8c77f8c3
Merge pull request #2884 from Pluviobyte/fix/gemini-messages-tool-text-block-ordering
fix(gemini): close tool_use block before text in messages streaming
2026-05-29 16:22:34 +08:00
Wesley Liddick
69e7c4db30
Merge pull request #2865 from wey-gu/feat/usage-request-context
fix(gateway): preserve usage request context
2026-05-29 16:21:59 +08:00
Wesley Liddick
52292741cb
Merge pull request #2849 from Pluviobyte/fix/count-tokens-payload-filter
fix(gateway): filter count_tokens generation fields
2026-05-29 16:11:01 +08:00
Wesley Liddick
f68d351158
Merge pull request #2873 from wucm667/feat/account-quota-threshold-auto-pause
feat(account): 支持按 5h/7d 用量阈值自动暂停账号调度
2026-05-29 15:40:33 +08:00
Pluviobyte
0a521f09fb
fix(gemini): close tool_use block before text in messages streaming
When the Gemini->Anthropic streaming bridge for the /v1/messages endpoint
receives a functionCall part followed by a text part, the text branch in
handleStreamingResponse opened a new text content block without closing the
already-open tool_use block. The tool block's content_block_stop was only
emitted at end-of-stream, after the text block's content_block_start, so the
Anthropic SSE stream contained overlapping/unterminated content blocks. Clients
that assemble messages by block index (e.g. Claude Code) can drop the tool
input or mis-parse the response.

The functionCall branch already closes an open text block before opening a tool
block, and the chat-completions sibling closes the tool block in its text branch
via closeOpenTool(). This applies the same symmetric handling to the messages
variant: close any open tool_use block (resetting openToolIndex/openToolName/
seenToolJSON) before starting text.

Adds a regression test that replays a tool->text Gemini stream and asserts the
Anthropic content-block lifecycle never overlaps.
2026-05-29 06:47:23 +00:00
wucm667
c9caadb378 fix(account): address second-round review on quota auto-pause
- TopK initial filter now drops quota-paused accounts: fold the quota check
  into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate
  rechecks all skip paused accounts. Previously the candidate pool was built
  without the quota check, so paused accounts could fill TopK and leave the
  scheduler returning "no available accounts" even with healthy ones available.
- Add per-account explicit disable flags auto_pause_5h_disabled /
  auto_pause_7d_disabled with toggles in EditAccountModal. Without these,
  leaving the account threshold blank silently falls back to the global default,
  so admins could not exempt a single account once a global default existed.
  Disable is per-window: an account can opt out of 5h auto-pause while still
  honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH
  updated, threshold-hint text revised to explain "blank = global default".
- Move quota auto-pause settings off the request hot path: replace the per-repo
  TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate
  in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh
  on staleness); writes via UpdateOpsAdvancedSettings push directly into the
  cache through an injected sink; wire warms the cache at startup. Adds Warm
  (sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 14:32:45 +08:00
wucm667
8b7a822706 fix(account): address review on OpenAI quota auto-pause
- gate previous_response_id sticky path with quota auto-pause check at both
  the snapshot and DB-recheck stages (previously bypassed, #1)
- skip pausing when the usage window already reset to avoid a stale stuck-pause;
  carry codex_*_reset_at / reset_after_seconds / codex_usage_updated_at through
  the scheduler snapshot whitelist (#2)
- remove the incomplete limit mode; percentage threshold only (#3)
- add global default 5h/7d threshold inputs to the Ops settings dialog with
  validation and en/zh i18n (#4)
- downgrade account_auto_paused_by_quota log from Info to Debug; it fires
  per-candidate on the scheduling hot path (#5)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 12:20:30 +08:00
Wesley Liddick
1d46be02ae
Merge pull request #2830 from stabey/fix/anthropic-to-responses-cache-tokens
fix(apicompat): Anthropic 转 Responses 时按 OpenAI 语义汇总 input_tokens
2026-05-29 11:00:19 +08:00
wucm667
ead471d64b feat(account): 支持按 5h/7d 用量阈值自动暂停账号调度 2026-05-29 10:47:47 +08:00
Wesley Liddick
16842c2f8b
Merge pull request #2836 from siyuan-123/fix/openai-ws-compat-usage
修复 OpenAI WS 兼容性与 usage 统计
2026-05-29 10:47:12 +08:00
Wesley Liddick
21cd382f39
Merge pull request #2855 from gaoren002/fix/concurrency-error-classification
fix: classify concurrency acquire failures
2026-05-29 10:33:40 +08:00
Wesley Liddick
e6a3f1e12b
Merge pull request #2869 from Pluviobyte/fix/ws-first-token-terminal-event
fix(ws): exclude terminal events from first-token detection
2026-05-29 10:32:16 +08:00
Wesley Liddick
fc9d79d96b
Merge pull request #2835 from JIA-ss/fix/responses-passthrough-token-details
fix(apicompat): Responses→Chat 转换补齐 completion_tokens_details 透传
2026-05-29 10:31:52 +08:00
Wesley Liddick
d3a8db8084
Merge pull request #2833 from alfadb/fix/haiku-strip-context-management
fix(gateway): 按最终 anthropic-beta header 对 body.context_management 做能力维度 sanitize
2026-05-29 10:31:12 +08:00
Wesley Liddick
433f8dcd13
Merge pull request #2834 from DaydreamCoding/pr/openai-codex-cli-allow-claude-code
feat(openai): codex_cli_only 新增放行 Claude Code Codex 插件的机制
2026-05-29 10:30:33 +08:00
Wesley Liddick
6bc1983506
Merge pull request #2853 from wucm667/fix/system-update-already-up-to-date-response
fix(admin): system/update 在无更新时返回 200 + already_up_to_date 而非 500
2026-05-29 10:29:49 +08:00
shaw
6010c3cca9 test: 修复内容审计日志异步断言 2026-05-29 09:57:02 +08:00
shaw
514ac5c6a1 feat: 适配 claude-opus-4-8 2026-05-29 09:56:48 +08:00
Pluviobyte
8a999f438d
fix(ws): exclude terminal events from first-token detection
isOpenAIWSTokenEvent classified response.completed / response.done as
token events. When upstream finishes a request without ever emitting
a recognizable delta (e.g. cached completions or models that skip
incremental output), firstTokenMs was then filled at the terminal
event's timestamp, so the first-token latency metric effectively
reported total request duration.

Terminal events are already handled separately by
isOpenAIWSTerminalEvent. Treating them as token events makes the two
classifiers overlap, which violates the implicit invariant that the
token-event and terminal-event sets are disjoint.

The metric only affects ForwardResult.FirstTokenMs (logging and
observability) — billing and routing are unchanged.

Add regression tests for both directions:

* TestIsOpenAIWSTokenEvent_TerminalEventsExcluded covers each
  classification branch.
* TestIsOpenAIWSTokenEvent_DisjointWithTerminal asserts the
  disjoint-set invariant for every known terminal event.

Both new tests fail when the old `return eventType == "response.completed"
|| eventType == "response.done"` is restored.

Fixes #2651

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 01:33:42 +00:00
shaw
ed1b57c597 fix(openai): gate routing by endpoint capability 2026-05-29 08:58:10 +08:00
win
f519a02ec9 chore: merge upstream Wei-Shaw/sub2api v0.1.132
Conflicts resolved (preserving fork customizations):
- config.go: keep NodeTLSProxy + add upstream OpenAIHTTP2
- gateway_service.go: NewGatewayService now takes both rpmTokenBucketSvc
  (local) and userPlatformQuotaRepo (upstream)
- wire_gen.go: wire both new args into the call site
- http_upstream.go: drop redundant settings re-assignment; keep proxy
  URL log redaction
- http_upstream_test.go: adopt upstream's explicit-0-disables semantics;
  keep 600s default constant in nil-cfg fallback test
- user_handler_test.go / gateway_record_usage_test.go: pick up new
  userPlatformQuotaRepo nil parameter

Also updated test stubs (windsurf_google_login_test.go,
windsurf_tier_access_service_test.go, gateway_models_test.go) for new
SetModelRateLimit variadic signature and the extra NewGatewayService arg.

Upstream highlights: OpenAI embeddings gateway, user x platform USD
quota, content-moderation risk thresholds, OAuth 401 credentials
no-overwrite fix, HTTP/2 OpenAI upstream config, pool retry status code
configurability, long-context cache pricing multipliers.
2026-05-29 07:21:32 +08:00
Wey Gu
2bd3125d0f Preserve usage request context 2026-05-28 22:44:25 +08:00
Wesley Liddick
8c1a07852c
Merge pull request #2858 from wey-gu/feat/openai-embeddings-gateway
feat(gateway): Add OpenAI embeddings gateway
2026-05-28 22:15:19 +08:00
Wesley Liddick
6b0ee8594c
Merge pull request #2860 from fofoj/fix/oauth-401-credentials-overwrite
fix(oauth): 401 handler 不再回写 credentials,避免 refresh_token 被陈旧快照覆盖
2026-05-28 21:59:02 +08:00
fofoj
be3613593b test(oauth): update OAuth 401 tests to match new no-write behavior
Two tests in ratelimit_service_401_test.go were encoding the bug behavior
itself:

- OAuth401InvalidatorError asserted updateCredentialsCalls == 1
- OAuth401UsesCredentialsUpdater asserted updateCredentialsCalls == 1
  and lastCredentials["expires_at"] non-empty

Both assertions exercised the exact write-back this PR removes. Update
them to reflect the new contract and guard against regression:

- OAuth401InvalidatorError: assert updateCredentialsCalls == 0
- OAuth401UsesCredentialsUpdater is renamed to
  OAuth401DoesNotOverwriteCredentials with reversed assertions, so it
  now serves as a regression test ensuring the 401 handler never writes
  credentials back from the request-start snapshot.
2026-05-28 20:32:16 +08:00
fofoj
6aec505016 fix(oauth): don't overwrite credentials JSONB in 401 handler
The 401 handler in RateLimitService.HandleUpstreamError set
account.Credentials["expires_at"] = time.Now() and then persisted the
full credentials map via persistAccountCredentials, which routes through
accountRepository.UpdateCredentials -> ent SetCredentials and replaces
the entire JSONB column.

The account passed to the handler is the request-start snapshot taken
by the gateway at SelectAccount time. When another worker has just
rotated refresh_token via oauth_refresh_api.RefreshIfNeeded, the
snapshot still holds the old refresh_token; writing the full snapshot
back rolls refresh_token in the DB back to the stale value.

The next refresh cycle then calls the upstream with the stale token,
receives invalid_grant, and tryRecoverFromRefreshRace re-reads the DB
only to find currentRT == usedRT (because the 401 handler just poisoned
the DB), returns false, and the account is incorrectly disabled.

Drop the credentials write. InvalidateToken + SetTempUnschedulable is
sufficient: the account is held out of scheduling during the cooldown,
and after the cooldown the next request goes through token_provider's
NeedsRefresh check, which routes through the locked, DB-re-reading
RefreshIfNeeded path.

The "force background refresh by setting expires_at = now" semantic is
intentionally dropped. token_refresh_service will naturally pick the
account up when the real expires_at enters the refresh window, and if
the real expires_at has already passed by the time the account becomes
schedulable again, token_provider's NeedsRefresh returns true and
RefreshIfNeeded fires synchronously on the next request.
2026-05-28 20:05:38 +08:00
lyen1688
1b2d8873b0 feat: 完善前置拦截审核运行态 2026-05-28 20:05:24 +08:00
Wey Gu
ccace69d4e Add OpenAI embeddings gateway 2026-05-28 19:39:52 +08:00
gaoren002
56e96fdd8c fix: classify concurrency acquire failures 2026-05-28 10:03:41 +00:00
wucm667
b15375dfb4 fix(admin): handle already up-to-date updates 2026-05-28 17:27:01 +08:00
Pluviobyte
27600b1d2c
fix(gateway): filter count_tokens generation fields
Anthropic count_tokens rejects generation-only fields such as temperature, top_p, top_k, stream, and stop sequences. Passing the original messages payload through unchanged can turn otherwise valid requests into upstream 400 errors.

Sanitize only the count_tokens upstream payload after the gateway's existing request normalization, preserving fields that existing compatibility paths rely on while removing parameters the count_tokens endpoint does not accept.

Fixes #2764

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 05:40:50 +00:00
siyuan
d7bed40dda 修复 OpenAI WS 兼容性与 usage 统计
- 对齐 WS 与流式终态 usage 解析,补齐 failed/done/incomplete/cancelled 等事件
- 兼容后续 WS response.create 省略 model,保持模型映射与权限判断一致
- 补齐 passthrough header 透传和图片 usage 字段映射
2026-05-28 01:27:11 +08:00
JIA-ss
20f5340784 fix(apicompat): Responses→Chat 转换补齐 completion_tokens_details 透传
OpenAI Responses API 在 gpt-5.x 等 reasoning 模型上会返回
output_tokens_details.reasoning_tokens, 但 ResponsesToChatCompletions
只映射了 input_tokens_details.cached_tokens, 导致客户端拿到的
chat.completion.usage 中 completion_tokens 出现无法解释的波动
(短 prompt 也可能 30+ token), 且缺失 reasoning_tokens 细分字段,
难以与 OpenAI 原生 Chat Completions 响应对账。

按 OpenAI 官方 CompletionUsage schema (openai/openai-go SDK
completion.go) 补齐所有 token-details 字段, 全部 omitempty:

  prompt_tokens_details:
    - cached_tokens   (原已支持)
    - audio_tokens    (新增)
  completion_tokens_details:
    - reasoning_tokens             (新增)
    - audio_tokens                 (新增)
    - accepted_prediction_tokens   (新增)
    - rejected_prediction_tokens   (新增)

实现细节:
- 抽出 promptDetailsFromResponses / completionDetailsFromResponses
  两个 helper, 全零字段返回 nil
- 非流路径 ResponsesToChatCompletions 复用已存在的
  chatUsageFromResponsesUsage helper, 消除两条路径间的重复
- 非 reasoning / 非 audio 上游 (Anthropic, Gemini, gpt-4o) 不填这些
  字段, helper 返回 nil → CompletionTokensDetails 不输出, 对现有响应
  字节级兼容

新增单测:
- TestResponsesToChatCompletions_ReasoningTokens
- TestResponsesToChatCompletions_AllTokenDetailsPassThrough
- TestResponsesToChatCompletions_NoReasoningTokensWhenZero
- TestResponsesEventToChatChunks_CompletedWithReasoningTokens
2026-05-28 00:38:25 +08:00
alfadb
ddf91e9a7f fix(gateway): 按最终 anthropic-beta header 对 body.context_management 做能力维度 sanitize
上游 Anthropic 在 body 含 `context_management` 但最终发出去的
`anthropic-beta` header 不含 `context-management-2025-06-27` 时会拒收:

    {
      "type": "invalid_request_error",
      "message": "context_management: Extra inputs are not permitted"
    }
    (HTTP 400, request_id 形如 req_011C...)

该 400 在 haiku 路径上触发,因为三个 beta header 构造器有意排除了
context-management beta:

  - HaikuBetaHeader          (messages,     OAuth / mimic CC)
  - APIKeyHaikuBetaHeader    (messages,     API-key)
  - CountTokensBetaHeader    (count_tokens, 所有认证类型)

但 body 中仍然带着 `context_management` 字段,原因有二:

  1. normalizeClaudeOAuthRequestBody 在 thinking_enabled / thinking_adaptive
     打开时为 `clear_thinking_20251015` 主动注入;
  2. 客户端 (Claude Code CLI >= 2.1.87) 原样发送, 网关透传时一并转发。

修复方案: 能力维度对称约束
==========================

对齐已有的 Bedrock 模式
(`backend/internal/service/bedrock_request.go` 中的
`sanitizeBedrockFieldsForBetaTokens`):
根据 **最终** 发出的 `anthropic-beta` header 决定是否保留
`body.context_management`, 而不是按 model 名或路由分类来决定。

新增纯函数:

    sanitizeAnthropicBodyForBetaTokens(body, betaHeader) (body, changed)

如果 `betaHeader` 不含 `context-management-2025-06-27`, 用 sjson 把 body
字段 strip 掉; 否则原样返回。

在所有 Anthropic / Anthropic-兼容 上游出口都接入:

| 路径                                       | sanitize 接入点                                       |
|--------------------------------------------|-------------------------------------------------------|
| /v1/messages OAuth mimic CC                | buildUpstreamRequest                                  |
| /v1/messages OAuth 真 CC 透传              | buildUpstreamRequest                                  |
| /v1/messages API-key                       | buildUpstreamRequest                                  |
| /v1/messages API-key passthrough           | buildUpstreamRequestAnthropicAPIKeyPassthrough        |
| /v1/messages Vertex / service-account      | buildUpstreamRequestAnthropicVertex                   |
| /v1/messages/count_tokens (全部 4 条路径)  | buildCountTokensRequest,                              |
|                                            | buildCountTokensRequestAnthropicAPIKeyPassthrough     |
| Antigravity Anthropic-兼容 上游            | AntigravityGatewayService.ForwardUpstream             |
| Bedrock                                    | (已由 sanitizeBedrockFieldsForBetaTokens 处理)        |

为什么要重排 (而不是加一行调用)
================================

sanitize 必须 **在** `signBillingHeaderCCH` 之前运行。CCH 对整个 body
取 xxHash64 摘要后写入 billing header 里 5 位十六进制的 `cch` 字段;
如果先签名再 strip, 上游对发出去的 body 重算 hash 会和 `cch` 不一致,
请求被判为 third-party。这就要求在 `http.NewRequest` 之前算出最终的
`anthropic-beta` header, 所以把原本内联在 builder 里的 beta 计算逻辑
抽成了两个纯函数:

  - computeFinalAnthropicBeta              (messages 路径: mimic 不透传
                                            客户端 beta)
  - computeFinalCountTokensAnthropicBeta   (count_tokens 路径: mimic 不
                                            跳过白名单透传)

两者逐位保留原行为:

  - mimic 路径在 messages 上跳过客户端 beta, 在 count_tokens 上合并
  - API-key 路径尊重 `InjectBetaForAPIKey` 开关
  - dropSet (`defaultDroppedBetasSet` + BetaPolicy filter) 应用在主路径,
    passthrough / Vertex 路径有意不应用 —— 这条原有的不对称行为本 PR
    不动。

一条语义测试 (`TestSanitizeMustBeBeforeCCHSigning_HashConsistency`) 把
顺序约束文档化并强制守住: 它证明 `sanitize -> signBillingHeaderCCH`
产生的 `cch` 与最终 body 一致, 而 `signBillingHeaderCCH -> sanitize`
产生的 `cch` 会被上游 hash 重算判失败。

为什么是能力维度 (而不是 haiku 模型名匹配)
==========================================

最朴素的"按 model 名 strip"方案
(`strings.Contains(modelID, "haiku") -> DeleteBytes "context_management"`)
有四个真实失败模式:

  1. 过度删除。CLI >= 2.1.87 的真 Claude Code 客户端在 haiku 上同时
     发送 body 字段 **和** `anthropic-beta: context-management-2025-06-27`。
     一律 strip 会让该用户的 `clear_thinking_20251015` 静默失效。
  2. 别名漂移。未来的 haiku 别名 (`claude-3-haiku-...`,
     `claude-haiku-...` 等) 改变匹配面; 任何新别名都会悄悄绕过 strip。
  3. count_tokens 漏覆盖。count_tokens 有自己的 builder 和不同的 beta
     header 集合; 在一个地方做 model 名检查会漏掉这条路径。
  4. API-key passthrough 早退。passthrough builder 在 model 名 strip
     之前就 return 了, strip 根本不执行。

能力维度沿着 header 端到端走, 上述 4 个 case 都由构造方式保证正确,
不依赖任何 modelID 匹配。

防御项
======

  - 当 `sjson.DeleteBytes` 在 `gjson` 刚验证过字段存在的 body 上失败时,
    `sanitizeAnthropicBodyForBetaTokens` 会记 warning 日志 —— 这种情况
    现实中仅在请求中途被破坏时发生, 日志把此前会静默发生的 body / header
    不一致暴露出来。
  - `header_util.go` 新增 `deleteHeaderAllForms`: 在白名单透传已经写入
    canonical 大小写的 `Anthropic-Beta` 之后再覆盖, 否则会同时留下两条。

测试
====

`backend/internal/service` 下新增 44 个测试:

  - 纯函数: anthropicBetaTokensContains x 5, sanitize keep/strip x 6,
    computeFinal{Anthropic,CountTokens}AnthropicBeta x 12
  - normalize 回归 x 5
  - buildUpstreamRequest 端到端 x 4
      (OAuth mimic haiku strip / mimic sonnet preserve /
       真 CC haiku 带客户端 beta preserve / API-key haiku strip)
  - buildCountTokensRequest 端到端 x 2
  - buildUpstreamRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve)
  - buildCountTokensRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve)
  - buildUpstreamRequestAnthropicVertex x 2 (strip / preserve, 含 outgoing
    `anthropic-beta` header 对称断言)
  - CCH 顺序语义测试 x 1

unit 套件全过 (本机 88s), `golangci-lint` 0 issues。

已知局限 (本 PR 范围外)
========================

  - Vertex 路径用透传过来的客户端 `anthropic-beta` header 作为 sanitize
    依据, 而不是 Vertex 侧的能力矩阵。最坏情况是过度 strip (= 当前 main
    的行为, 主路径本来什么都不 strip); 不是 regression。完整的 Vertex
    能力模型属于单独的 PR。
  - Vertex builder 仍然不应用 BetaPolicy filter / dropSet。这是该 builder
    早 return 的既有架构决策, 本 PR 不动。
  - count_tokens mimic 在 haiku 上仍然注入 `context-management-2025-06-27`
    (因为原 count_tokens mimic 逻辑并不像 messages mimic 那样排除它)。
    本 PR 逐位保留 main 的行为; 是否要让它与 messages mimic 的排除策略
    统一是另一个问题。
  - `sanitizeAnthropicBodyForBetaTokens` 目前只处理
    `context_management <-> context-management-2025-06-27` 这一对。如果
    Anthropic 后续推出更多 beta-gated body 字段, 可以在后续 PR 重构为
    `{body 路径 -> required beta token}` 注册表的形式。
2026-05-28 00:02:50 +08:00
DaydreamCoding
56908d3c4c feat(openai): codex_cli_only 新增放行 Claude Code Codex 插件的机制
适用场景:在 Claude Code 中使用 https://github.com/openai/codex-plugin-cc
插件时,插件经官方 codex app-server 以 clientInfo.name="Claude Code" 完成
initialize 握手,请求头被设为 originator=Claude Code、User-Agent 含
"Claude Code/",不在官方客户端白名单内,原本会被 codex_cli_only 拦截 403。

在官方客户端白名单未命中时评估两层独立放行(OR 语义):

- 按账号:account.Extra.codex_cli_only_allowed_clients 引用命名预设
  (目前仅 claude_code),detector reason=allowed_client_matched
- 全局开关:/admin/settings 网关服务 OpenAI 区块新增
  openai_allow_claude_code_codex_plugin(默认 false),开启后对所有
  codex_cli_only 账号统一放行,detector reason=global_allowed_client_matched

签名仍要求 originator=Claude Code 精确等值 + UA 含 "Claude Code/"。
上游转发保持透传不变。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 23:55:34 +08:00
stabey
89dffdd2e1 fix(apicompat): emit OpenAI-semantic input_tokens when converting Anthropic to Responses
Anthropic Messages reports input_tokens excluding cache_read/cache_creation, but
OpenAI Responses input_tokens is the total including cached tokens. The reverse
converter passed Anthropic's input_tokens straight through, so client-facing
prompt_tokens/input_tokens were short by the cached count and cache_creation
was dropped entirely.

Fix the non-stream path and the streaming state machine to add cache_read +
cache_creation back into input_tokens, and track CacheCreationInputTokens on
the streaming state. Six downstream paths benefit (Anthropic->Responses,
Anthropic->ChatCompletions, Gemini->ChatCompletions, each sync + stream).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 22:36:52 +08:00
Wesley Liddick
cc077862b3
Merge pull request #2797 from wucm667/feat/account-list-created-at-column
feat(admin): 账号管理列表新增创建时间列
2026-05-27 22:10:21 +08:00
Wesley Liddick
bbe847ed3e
Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes
feat(account): configurable pool-mode same-account retry status codes
2026-05-27 22:09:55 +08:00
Wesley Liddick
69657b2fa1
Merge pull request #2827 from ttt132/fix/api-key-responses-sse-fallback
fix: fallback to SSE body for API key responses
2026-05-27 21:56:00 +08:00
Wesley Liddick
61ce79533e
Merge pull request #2800 from wucm667/fix/scheduler-model-not-found-per-model-cooldown
fix(scheduler): 模型 404 仅冷却该账号-模型组合,不再封整个账号
2026-05-27 21:01:52 +08:00
Wesley Liddick
c949d22725
Merge pull request #2821 from Pluviobyte/fix/long-context-cache-creation-multiplier
fix(billing): apply long-context multiplier to cache_creation price (follow-up to #2816)
2026-05-27 21:01:27 +08:00
Wesley Liddick
8461e42a97
Merge pull request #2822 from lyen1688/feat/group-custom-models-list
feat(group): 支持自定义 /v1/models 模型列表
2026-05-27 21:00:19 +08:00
haichuan
32ea9cfe1f fix: fallback to SSE body for API key responses 2026-05-27 20:24:52 +08:00
lyen1688
f597c1581b feat(group): 支持自定义 /v1/models 模型列表 2026-05-27 18:00:45 +08:00
Pluviobyte
ed2aac25a6
fix(billing): apply long-context multiplier to cache_creation price
Follow-up to #2816 (already merged): the same long-context pricing
exemption that affected cache_read also applies to all three
cache_creation price fields (standard, 5m ephemeral, 1h ephemeral).
computeCacheCreationCost reads these prices directly from pricing and
never sees the LongContextInputMultiplier that computeTokenBreakdown
applies to inputPrice / outputPrice / cacheReadPrice.

For GPT-5.4 / 5.5 above the 272k threshold, this causes the cache_write
portion of long sessions to be billed at roughly half what it should
be (default multiplier 2.0). Cache writes are conceptually input-side
operations and should share the same long-context treatment as input /
cache_read.

This patch threads an explicit multiplier into computeCacheCreationCost
so the function can be unit-tested in isolation and matches the existing
pattern used for cache_read. computeTokenBreakdown captures the long
context decision once and passes LongContextInputMultiplier when it
applies, 1.0 otherwise.

Adds three regression tests mirroring the #2816 cache_read tests:
- positive: long-context triggered -> cache_creation scaled by 2.0x
- negative: below threshold -> cache_creation stays at base price
- breakdown: 5m + 1h ephemeral prices both scaled when applicable

Refs #2816

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:59:58 +00:00
Pluviobyte
1e6d0b602a
fix(antigravity): capture message_start input_tokens in streaming passthrough
The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream
forwarding to a Claude-format upstream) drains the SSE stream via
streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level
event["usage"], which matches Anthropic's message_delta but misses
message_start where usage is nested under event.message.usage.

As a result, every streaming /v1/messages request through this path drops
the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*)
and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user
in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5
streaming requests; their billing on output is correct but the input-side
accounting is missing. (Their "duplicate write from message_delta" hypothesis
isn't borne out by the code — RecordUsage is invoked once per request and
writeUsageLogBestEffort dedupes by request_id; what they're seeing is
single records produced by this buggy extractor.)

Branch on event.type so message_start reads from event.message.usage and
other events keep using event.usage, matching how parseSSEUsagePassthrough
already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths.
Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence
that drives the message_start → message_delta sequence end-to-end; both fail
on main and pass with this change.

Fixes #2332

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:02:15 +00:00
Wesley Liddick
b0142146af
Merge pull request #2816 from Pluviobyte/fix/long-context-cache-read-multiplier
fix(billing): apply long-context multiplier to cache_read price (#2293)
2026-05-27 15:59:11 +08:00
Wesley Liddick
2387cf9934
Merge pull request #2799 from siyuan-123/fix/ws-rate-limit-failover
修复 OpenAI WS 限额时不自动切换账号
2026-05-27 15:14:28 +08:00