2635 Commits

Author SHA1 Message Date
fofoj
be3613593b test(oauth): update OAuth 401 tests to match new no-write behavior
Two tests in ratelimit_service_401_test.go encoded the buggy behavior
itself:

- OAuth401InvalidatorError asserted updateCredentialsCalls == 1
- OAuth401UsesCredentialsUpdater asserted updateCredentialsCalls == 1
  and lastCredentials["expires_at"] non-empty

Both assertions exercised the exact write-back this PR removes. Update
them to reflect the new contract and guard against regression:

- OAuth401InvalidatorError: assert updateCredentialsCalls == 0
- OAuth401UsesCredentialsUpdater is renamed to
  OAuth401DoesNotOverwriteCredentials with reversed assertions, so it
  now serves as a regression test ensuring the 401 handler never writes
  credentials back from the request-start snapshot.
2026-05-28 20:32:16 +08:00
fofoj
6aec505016 fix(oauth): don't overwrite credentials JSONB in 401 handler
The 401 handler in RateLimitService.HandleUpstreamError set
account.Credentials["expires_at"] = time.Now() and then persisted the
full credentials map via persistAccountCredentials, which routes through
accountRepository.UpdateCredentials -> ent SetCredentials and replaces
the entire JSONB column.

The account passed to the handler is the request-start snapshot taken
by the gateway at SelectAccount time. When another worker has just
rotated refresh_token via oauth_refresh_api.RefreshIfNeeded, the
snapshot still holds the old refresh_token; writing the full snapshot
back rolls refresh_token in the DB back to the stale value.

The next refresh cycle then calls the upstream with the stale token,
receives invalid_grant, and tryRecoverFromRefreshRace re-reads the DB
only to find currentRT == usedRT (because the 401 handler just poisoned
the DB), returns false, and the account is incorrectly disabled.

Drop the credentials write. InvalidateToken + SetTempUnschedulable is
sufficient: the account is held out of scheduling during the cooldown,
and after the cooldown the next request goes through token_provider's
NeedsRefresh check, which routes through the locked, DB-re-reading
RefreshIfNeeded path.

The "force background refresh by setting expires_at = now" semantic is
intentionally dropped. token_refresh_service will naturally pick the
account up when the real expires_at enters the refresh window, and if
the real expires_at has already passed by the time the account becomes
schedulable again, token_provider's NeedsRefresh returns true and
RefreshIfNeeded fires synchronously on the next request.
2026-05-28 20:05:38 +08:00
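The rollback described in 6aec505016 can be sketched in isolation. This is a minimal model, not the project's code: credentialStore, replaceAll, and simulate are hypothetical names, and an in-memory map stands in for the credentials JSONB column. It only illustrates why writing a request-start snapshot back in full undoes a rotation performed by another worker.

```go
package main

import "fmt"

// credentialStore is a hypothetical stand-in for the credentials JSONB column.
type credentialStore struct {
	creds map[string]string
}

// copyMap returns an independent copy so snapshots do not alias the store.
func copyMap(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// replaceAll mimics a full-column overwrite: the stored map is replaced wholesale.
func (s *credentialStore) replaceAll(c map[string]string) {
	s.creds = copyMap(c)
}

// snapshot mimics reading the account once at request start.
func (s *credentialStore) snapshot() map[string]string {
	return copyMap(s.creds)
}

// simulate replays the race and returns the refresh_token left in the store.
func simulate(writeBackOn401 bool) string {
	store := &credentialStore{creds: map[string]string{"refresh_token": "rt-old"}}

	// Request start: the gateway takes its snapshot.
	snap := store.snapshot()

	// Meanwhile another worker rotates the refresh token.
	store.replaceAll(map[string]string{"refresh_token": "rt-new"})

	// The 401 handler runs with the stale snapshot.
	if writeBackOn401 {
		snap["expires_at"] = "now"
		store.replaceAll(snap) // old behavior: the stale refresh_token is written back
	}
	return store.creds["refresh_token"]
}

func main() {
	fmt.Println("with write-back:   ", simulate(true))
	fmt.Println("without write-back:", simulate(false))
}
```

With the write-back enabled the store ends up holding the stale token again; without it the rotated token survives, which is the behavior the fix relies on.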
github-actions[bot]
89d96f4b25 chore: sync VERSION to 0.1.132 [skip ci] 2026-05-27 14:28:22 +00:00
Wesley Liddick
cc077862b3
Merge pull request #2797 from wucm667/feat/account-list-created-at-column
feat(admin): add a created-at column to the account management list
2026-05-27 22:10:21 +08:00
Wesley Liddick
bbe847ed3e
Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes
feat(account): configurable pool-mode same-account retry status codes
2026-05-27 22:09:55 +08:00
Wesley Liddick
69657b2fa1
Merge pull request #2827 from ttt132/fix/api-key-responses-sse-fallback
fix: fallback to SSE body for API key responses
2026-05-27 21:56:00 +08:00
Wesley Liddick
61ce79533e
Merge pull request #2800 from wucm667/fix/scheduler-model-not-found-per-model-cooldown
fix(scheduler): on model 404, cool down only that account-model pair instead of blocking the whole account
2026-05-27 21:01:52 +08:00
Wesley Liddick
c949d22725
Merge pull request #2821 from Pluviobyte/fix/long-context-cache-creation-multiplier
fix(billing): apply long-context multiplier to cache_creation price (follow-up to #2816)
2026-05-27 21:01:27 +08:00
Wesley Liddick
8461e42a97
Merge pull request #2822 from lyen1688/feat/group-custom-models-list
feat(group): support a custom /v1/models model list
2026-05-27 21:00:19 +08:00
haichuan
32ea9cfe1f fix: fallback to SSE body for API key responses 2026-05-27 20:24:52 +08:00
lyen1688
f597c1581b feat(group): support a custom /v1/models model list 2026-05-27 18:00:45 +08:00
Pluviobyte
ed2aac25a6
fix(billing): apply long-context multiplier to cache_creation price
Follow-up to #2816 (already merged): the same long-context pricing
exemption that affected cache_read also applies to all three
cache_creation price fields (standard, 5m ephemeral, 1h ephemeral).
computeCacheCreationCost reads these prices directly from pricing and
never sees the LongContextInputMultiplier that computeTokenBreakdown
applies to inputPrice / outputPrice / cacheReadPrice.

For GPT-5.4 / 5.5 above the 272k threshold, this causes the cache_write
portion of long sessions to be billed at roughly half what it should
be (default multiplier 2.0). Cache writes are conceptually input-side
operations and should share the same long-context treatment as input /
cache_read.

This patch threads an explicit multiplier into computeCacheCreationCost
so the function can be unit-tested in isolation and matches the existing
pattern used for cache_read. computeTokenBreakdown captures the long
context decision once and passes LongContextInputMultiplier when it
applies, 1.0 otherwise.

Adds three regression tests mirroring the #2816 cache_read tests:
- positive: long-context triggered -> cache_creation scaled by 2.0x
- negative: below threshold -> cache_creation stays at base price
- breakdown: 5m + 1h ephemeral prices both scaled when applicable

Refs #2816

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:59:58 +00:00
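A rough sketch of what "threading an explicit multiplier" into the cache_creation cost could look like, as described in ed2aac25a6. The types, field names, and prices here are hypothetical and chosen for illustration; only the idea (decide the long-context multiplier once, pass it in, scale all three cache_creation prices) comes from the commit message.

```go
package main

import "fmt"

// cacheCreationPrices holds hypothetical per-token prices for the three
// cache_creation fields named in the commit message.
type cacheCreationPrices struct {
	Standard    float64
	Ephemeral5m float64
	Ephemeral1h float64
}

// cacheCreationTokens holds the token counts billed against each price.
type cacheCreationTokens struct {
	Standard    int
	Ephemeral5m int
	Ephemeral1h int
}

// multiplierFor captures the long-context decision once: the configured
// multiplier when long-context pricing applies, 1.0 otherwise.
func multiplierFor(longContext bool, longContextInputMultiplier float64) float64 {
	if longContext {
		return longContextInputMultiplier
	}
	return 1.0
}

// cacheCreationCost takes the multiplier as an explicit argument, so it can be
// tested without reproducing the long-context decision.
func cacheCreationCost(t cacheCreationTokens, p cacheCreationPrices, multiplier float64) float64 {
	base := float64(t.Standard)*p.Standard +
		float64(t.Ephemeral5m)*p.Ephemeral5m +
		float64(t.Ephemeral1h)*p.Ephemeral1h
	return base * multiplier
}

func main() {
	tokens := cacheCreationTokens{Standard: 1000, Ephemeral5m: 2000, Ephemeral1h: 3000}
	prices := cacheCreationPrices{Standard: 1e-6, Ephemeral5m: 2e-6, Ephemeral1h: 3e-6} // made-up prices

	below := cacheCreationCost(tokens, prices, multiplierFor(false, 2.0))
	above := cacheCreationCost(tokens, prices, multiplierFor(true, 2.0))
	fmt.Printf("below threshold: %.6f, long context: %.6f\n", below, above)
}
```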
Pluviobyte
1e6d0b602a
fix(antigravity): capture message_start input_tokens in streaming passthrough
The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream
forwarding to a Claude-format upstream) drains the SSE stream via
streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level
event["usage"], which matches Anthropic's message_delta but misses
message_start where usage is nested under event.message.usage.

As a result, every streaming /v1/messages request through this path drops
the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*)
and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user
in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5
streaming requests; their billing on output is correct but the input-side
accounting is missing. (Their "duplicate write from message_delta" hypothesis
isn't borne out by the code — RecordUsage is invoked once per request and
writeUsageLogBestEffort dedupes by request_id; what they're seeing is
single records produced by this buggy extractor.)

Branch on event.type so message_start reads from event.message.usage and
other events keep using event.usage, matching how parseSSEUsagePassthrough
already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths.
Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence
that drives the message_start → message_delta sequence end-to-end; both fail
on main and pass with this change.

Fixes #2332

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:02:15 +00:00
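The branching described in 1e6d0b602a can be illustrated with a small standalone function. This is a sketch under assumptions: usageFromEvent is a hypothetical name, not the project's extractSSEUsage, and it handles a single already-isolated SSE data payload. It shows only the shape difference: message_start nests usage under message, other events carry it at the top level.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageFromEvent returns the usage object from one SSE data payload.
// message_start carries usage at event.message.usage; other events
// (such as message_delta) carry it at event.usage.
func usageFromEvent(data []byte) (map[string]any, bool) {
	var event map[string]any
	if err := json.Unmarshal(data, &event); err != nil {
		return nil, false
	}
	if event["type"] == "message_start" {
		msg, ok := event["message"].(map[string]any)
		if !ok {
			return nil, false
		}
		usage, ok := msg["usage"].(map[string]any)
		return usage, ok
	}
	usage, ok := event["usage"].(map[string]any)
	return usage, ok
}

func main() {
	events := []string{
		`{"type":"message_start","message":{"usage":{"input_tokens":120,"cache_read_input_tokens":30}}}`,
		`{"type":"message_delta","usage":{"output_tokens":45}}`,
	}
	for _, e := range events {
		if usage, ok := usageFromEvent([]byte(e)); ok {
			fmt.Println(usage)
		}
	}
}
```

An extractor that looked only at the top-level usage key would return nothing for the first event, which matches the input_tokens=0 symptom in the commit message.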
Wesley Liddick
b0142146af
Merge pull request #2816 from Pluviobyte/fix/long-context-cache-read-multiplier
fix(billing): apply long-context multiplier to cache_read price (#2293)
2026-05-27 15:59:11 +08:00
Wesley Liddick
2387cf9934
Merge pull request #2799 from siyuan-123/fix/ws-rate-limit-failover
Fix missing automatic account failover when OpenAI WS hits rate limits
2026-05-27 15:14:28 +08:00
SlientRainyDay
b9509e823a fix(billing): apply long-context multiplier to cache_read price
When session long-context pricing is triggered in computeTokenBreakdown
(e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier
was only being applied to InputPricePerToken and OutputPricePerToken.
The cache_read price was left at its base value, so CacheReadCost was
silently undercharged whenever a long-context session also had cache
hits — which is essentially every long Codex / Claude Code session.

Concretely for gpt-5.4 with 300k cache_read tokens, the bug billed the
cache portion at the base price, without the LongContextInputMultiplier
(e.g. 0.075 instead of 0.150 in the regression test).

Cache reads are conceptually input-side replays, so they should scale
with LongContextInputMultiplier, matching the treatment of
InputPricePerToken.

Adds two regression tests:
- positive: long-context triggered -> cache_read scaled by 2.0x
- negative: below threshold -> cache_read stays at base price

Fixes #2293

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 07:09:28 +00:00
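The arithmetic behind the 0.075 versus 0.150 figures in b9509e823a can be reproduced with a short sketch. The per-token price is back-derived from the message (0.075 / 300,000 tokens) and is an assumption rather than a quoted price table; cacheReadCost is a hypothetical helper, not the project's computeTokenBreakdown.

```go
package main

import "fmt"

// cacheReadCost prices cache_read tokens. The multiplier is 1.0 below the
// long-context threshold and LongContextInputMultiplier above it.
func cacheReadCost(tokens int, pricePerToken, multiplier float64) float64 {
	return float64(tokens) * pricePerToken * multiplier
}

func main() {
	const tokens = 300_000
	const basePrice = 0.25 / 1_000_000 // assumed USD per token (0.075 / 300k)
	const longContextMultiplier = 2.0  // default multiplier named in the log

	buggy := cacheReadCost(tokens, basePrice, 1.0)                   // multiplier skipped
	fixed := cacheReadCost(tokens, basePrice, longContextMultiplier) // multiplier applied
	fmt.Printf("before fix: %.3f, after fix: %.3f\n", buggy, fixed)
}
```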
StarryKira
21033dceb9 feat(account): configurable pool-mode same-account retry status codes
Pool mode currently retries the same account for a fixed set of
upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments
also need same-account retry for transient provider/proxy statuses
such as 502, 503, 520, 529, but hard-coding more statuses changes
behavior for everyone.

Add a per-account credentials option `pool_mode_retry_status_codes`
that lets admins choose which upstream HTTP status codes trigger
same-account retry in pool mode:

- Unset (default): preserve the current 401/403/429 default
- Explicit list: override the defaults with the configured codes
- Codes normalized to the 100-599 range, deduplicated, sorted

The standalone `isPoolModeRetryableStatus` helper is kept as the
default-only fallback. All 15 gateway call sites switch to the new
`Account.IsPoolModeRetryableStatus` method so behavior is preserved
for accounts that do not configure the new field.

Frontend admin UI gains a "Retry Status Codes" comma-separated input
under the pool-mode section in both Create/Edit account modals
(en + zh i18n).

Fixes #2731

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 11:24:25 -07:00
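The normalization rules listed in 21033dceb9 can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, "normalized to the 100-599 range" is read here as dropping out-of-range codes (not clamping them), and a nil slice is used to represent the unset case.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultPoolRetryStatuses mirrors the fixed default set named in the commit.
var defaultPoolRetryStatuses = []int{401, 403, 429}

// normalizeRetryStatusCodes drops codes outside 100-599, removes duplicates,
// and returns the remainder sorted ascending.
func normalizeRetryStatusCodes(codes []int) []int {
	seen := make(map[int]struct{}, len(codes))
	out := make([]int, 0, len(codes))
	for _, c := range codes {
		if c < 100 || c > 599 {
			continue
		}
		if _, dup := seen[c]; dup {
			continue
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	sort.Ints(out)
	return out
}

// isRetryable uses the defaults when nothing is configured (nil) and the
// explicit list otherwise; an explicit list overrides, it does not extend.
func isRetryable(configured []int, status int) bool {
	codes := defaultPoolRetryStatuses
	if configured != nil {
		codes = normalizeRetryStatusCodes(configured)
	}
	for _, c := range codes {
		if c == status {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(normalizeRetryStatusCodes([]int{503, 99, 502, 503, 600}))
	fmt.Println(isRetryable(nil, 429), isRetryable(nil, 502))
	fmt.Println(isRetryable([]int{502, 503}, 502), isRetryable([]int{502, 503}, 401))
}
```

Note the override semantics: an account that lists only 502 and 503 stops retrying on 401 unless 401 is listed as well.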
shaw
f7ac5e5931 fix(openai): preserve chat responses usage billing 2026-05-26 21:33:28 +08:00
wucm667
a31b507484 fix(scheduler): on model 404, cool down only the account-model pair 2026-05-26 20:29:48 +08:00
Wesley Liddick
4b9b63443f
Merge pull request #2790 from Arron196/from-arron-main
Fix Ops SLA error statistics for local limits
2026-05-26 20:21:11 +08:00
siyuan
08061717b8 fix: enable account failover for OpenAI WS rate limits 2026-05-26 20:07:00 +08:00
Wesley Liddick
4a5c5367cf
Merge pull request #2796 from DaydreamCoding/fix/account-reauth-keep-extra
fix(account): re-authorization no longer clears Extra configuration
2026-05-26 20:06:48 +08:00
Wesley Liddick
b9f421d647
Merge pull request #2751 from wucm667/fix/bedrock-strip-context-management-when-beta-removed
fix(bedrock): v0.1.130 regression: also strip the context_management field when the beta token is removed
2026-05-26 20:05:43 +08:00
wucm667
b6a38ddab7 feat(admin): add a created-at column to the account management list 2026-05-26 19:59:12 +08:00
DaydreamCoding
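11fe7de926 fix(account): re-authorization no longer clears Extra configuration
When a Claude / OpenAI account is re-authorized through the generic
PUT /accounts/:id, the backend UpdateAccount overwrites account.Extra in
full (keeping only the 5 quota usage keys), so persisted settings such
as base_rpm / window_cost_limit / window_cost_sticky_reserve /
max_sessions / quota_* / privacy_mode are all lost.

Add a dedicated endpoint, POST /accounts/:id/apply-oauth-credentials,
that follows the existing /refresh path pattern: Credentials-only update
+ key-level merge of the Extra JSONB (UpdateAccountExtra) + ClearError +
InvalidateToken.

Scope: the three call sites for Claude OAuth / Claude Cookie auth /
OpenAI OAuth. The existing Gemini / Antigravity paths never passed extra
and are unchanged.

Also fixed: the old re-authorization path did not call InvalidateToken,
so the first request after re-authorization could still use the old
cached token and immediately get a 401.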

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:46:08 +08:00
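The difference between the full overwrite and the key-level merge in 11fe7de926 can be shown with plain maps. This is a simplified model: replaceExtra and mergeExtra are hypothetical helpers, and the real change operates on a JSONB column through UpdateAccountExtra rather than on in-memory maps.

```go
package main

import "fmt"

// replaceExtra models the old behavior: the stored map is discarded and only
// the incoming keys remain.
func replaceExtra(stored, incoming map[string]any) map[string]any {
	out := make(map[string]any, len(incoming))
	for k, v := range incoming {
		out[k] = v
	}
	return out
}

// mergeExtra models a key-level merge: stored keys are kept and incoming
// keys are added or overwritten one by one.
func mergeExtra(stored, incoming map[string]any) map[string]any {
	out := make(map[string]any, len(stored)+len(incoming))
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range incoming {
		out[k] = v
	}
	return out
}

func main() {
	stored := map[string]any{"base_rpm": 60, "max_sessions": 4, "quota_used": 1.5}
	incoming := map[string]any{"quota_used": 0.0}

	fmt.Println("overwrite:", replaceExtra(stored, incoming))
	fmt.Println("merge:    ", mergeExtra(stored, incoming))
}
```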
benjamin
03ae510c68 fix(ops): exclude count-tokens from metrics errors
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:21:56 +08:00
benjamin
9c56fe0b0b fix(openai): mark fast-policy entrypoints business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:21:45 +08:00
benjamin
5d7df678b1 fix(openai): mark local gateway denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:50 +08:00
benjamin
47fe90eab4 fix(antigravity): mark whitelist denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:37 +08:00
benjamin
c3e7476992 fix(gateway): mark local platform gates business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:23 +08:00
benjamin
c782c2d9c3 fix(ops): classify local policy denials outside SLA
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:09 +08:00
benjamin
00eb3abbe1 fix(auth): mark Google group denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:55 +08:00
benjamin
bd1e98ec29 fix(auth): mark API key group denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:41 +08:00
benjamin
5c4101ac53 feat(ops): add local business limit reasons
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:27 +08:00
github-actions[bot]
9ef144874a chore: sync VERSION to 0.1.131 [skip ci] 2026-05-26 06:43:49 +00:00
Wesley Liddick
bebc082306
Merge pull request #2766 from DaydreamCoding/feat/user-platform-quota
feat(quota): user × platform USD quotas
2026-05-26 14:13:18 +08:00
Wesley Liddick
83248478e2
Merge pull request #2777 from lyen1688/feat/content-moderation-risk-threshold
feat: support configurable risk thresholds for content moderation
2026-05-26 14:12:54 +08:00
lyen1688
23f3d426c6 feat: support configurable risk thresholds for content moderation 2026-05-26 13:58:02 +08:00
mt21625457
33ac8eb27d fix openai http2 response header timeout 2026-05-26 13:57:59 +08:00
DaydreamCoding
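6b39b344d8 feat(quota): user × platform USD quotas
Provide USD quota controls for users on the four platforms
anthropic/openai/gemini/antigravity across three windows: daily, weekly,
and monthly. Quota semantics: unset = unlimited, 0 = disabled,
>0 = USD cap.

Two-layer model:
- Configuration layer: system default quotas plus default quotas for the
  seven auth sources email/linuxdo/oidc/wechat/github/google/dingtalk,
  stored in settings and read and written as whole nested JSON values
  (1 key for the system + 1 key per source), with whole-value
  replacement semantics.
- Runtime layer: the user_platform_quota table records each user's
  actual quotas, decoupled from the configuration layer.

Backend: new ent schema and 140_user_platform_quotas.sql migration,
repository and service ports, billing pipeline integration, and
admin-side and user-side read/write endpoints.
Frontend: quota editing on the admin settings page, a user quota
management modal, display on the user Dashboard, and Chinese and English
copy.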

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 10:49:20 +08:00
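The three-way quota semantics in 6b39b344d8 (unset = unlimited, 0 = disabled, >0 = USD cap) map naturally onto a nullable value. The sketch below is an assumption-laden illustration: quotaAllows and usd are hypothetical names, a nil pointer stands for "unset", and the comparison used for a positive cap (spent plus the new cost must not exceed the cap) is a guess at the enforcement rule, which the commit message does not spell out.

```go
package main

import "fmt"

// usd is a small helper to build an optional limit.
func usd(v float64) *float64 { return &v }

// quotaAllows applies the three-way semantics to one window:
//   - nil limit: unset, no restriction
//   - limit of 0 (or below, by assumption): platform disabled
//   - positive limit: USD cap for the window
func quotaAllows(limit *float64, spentUSD, costUSD float64) bool {
	switch {
	case limit == nil:
		return true
	case *limit <= 0:
		return false
	default:
		return spentUSD+costUSD <= *limit
	}
}

func main() {
	fmt.Println("unset:   ", quotaAllows(nil, 999, 1))
	fmt.Println("disabled:", quotaAllows(usd(0), 0, 0.01))
	fmt.Println("in cap:  ", quotaAllows(usd(10), 4, 5))
	fmt.Println("over cap:", quotaAllows(usd(10), 8, 5))
}
```

Using a pointer (or any other nullable representation) keeps "unset" distinct from 0, which matters here because the two mean opposite things.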
shaw
53acde1efd style: fix lint errors in response.failed SSE writer
Errcheck flagged three unchecked strings.Builder.WriteString calls and
gofmt rejected over-aligned trailing comment in the route table.

Rewrite writeResponsesFailedSSE with json.Marshal on typed structs
instead of Builder+strconv.Quote. Same wire format, but:
- no unchecked Write returns to silence
- strict JSON escaping (strconv.Quote emits \a and \v which are not
  valid JSON; Marshal handles all runes correctly)
- omitempty model field via struct tag instead of conditional Builder
- consistent with the json.Marshal style used elsewhere in handler/

Collapse trailing comment whitespace in stream_error_event_test.go to
satisfy gofmt.

All 30+ subtests in the package still pass.
2026-05-25 18:16:46 +08:00
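The escaping point made in 53acde1efd is easy to check directly: strconv.Quote produces Go string-literal syntax, which includes escapes such as \a and \v that JSON does not define, while json.Marshal emits \u00XX escapes for control characters. The helper names below are hypothetical and exist only for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// quoteIsValidJSON reports whether the Go-quoted form of s parses as JSON.
func quoteIsValidJSON(s string) bool {
	return json.Valid([]byte(strconv.Quote(s)))
}

// marshalIsValidJSON reports whether json.Marshal(s) parses as JSON.
func marshalIsValidJSON(s string) bool {
	b, err := json.Marshal(s)
	return err == nil && json.Valid(b)
}

func main() {
	msg := "bell\a and vertical tab\v"

	marshaled, _ := json.Marshal(msg) // marshaling a plain string cannot fail
	fmt.Println("strconv.Quote:", strconv.Quote(msg), "valid JSON:", quoteIsValidJSON(msg))
	fmt.Println("json.Marshal: ", string(marshaled), "valid JSON:", marshalIsValidJSON(msg))
}
```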
Wesley Liddick
a18738b29e
Merge pull request #2732 from wminjay/fix/responses-stream-failed-event
fix(openai): emit response.failed when /v1/responses SSE aborted post-flush
2026-05-25 18:12:25 +08:00
wucm667
a9c7a3a095 fix(bedrock): strip context_management when beta is removed 2026-05-25 14:15:39 +08:00
siyuan
fc66cd704a fix: recognize codex tool outputs in ws continuation 2026-05-25 10:46:58 +08:00
Jamie Wong
b34cc71bee fix(openai): also emit response.failed in ensureForwardErrorResponse after Writer.Written
Case B: when a slot wait flushes SSE ping comments first (Writer.Written
becomes true), the previous ensureForwardErrorResponse short-circuited
on `c.Writer.Written()` and returned false without notifying the client.
Subsequent upstream errors (http2 timeout, stream INTERNAL_ERROR, etc.)
produced silent EOF; Codex CLI reported "stream closed before
response.completed" just like the user-slot timeout case.

Remove the Written() early return; coerce streamStarted to true when
Writer has already been written to, and let handleStreamingAwareError
walk the existing logic — which now (thanks to the previous commits)
emits a protocol-compliant response.failed for /responses paths and the
legacy `event: error` for others.

Update tests that previously asserted "do not override written response":
the new contract is to *append* an SSE terminal frame so the client sees
a clean close instead of EOF. recoverResponsesPanic inherits this fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 22:00:56 +08:00
Jamie Wong
cff2f291be fix(openai): also match bare /responses route in handleStreamingAwareError
The first revision compared GetInboundEndpoint(c) against EndpointResponses
("/v1/responses"). NormalizeInboundEndpoint only recognizes paths that
contain the literal "/v1/responses" substring, but the project actually
registers six /responses routes — three of which (top-level
r.POST("/responses", ...) and codexDirect's "/backend-api/codex/responses")
have FullPath values without the "/v1" prefix and therefore fall through
to the default branch.

Codex CLI users targeting the bare /responses route at the production
deployment (observed 2026-05-24 ~11:05 UTC, user 16) never reached the
new writeResponsesFailedSSE path: the endpoint check was false, the
legacy `event: error` frame fired, and the strict SDK kept reporting
"stream closed before response.completed".

Replace the strict equality check with inboundIsResponses(c), which
uses suffix detection on FullPath (falling back to URL.Path when
FullPath is empty in test fixtures) and covers all six route variants:

  /v1/responses[/...]
  /responses[/...]
  /backend-api/codex/responses[/...]

Add test table covering all routes plus negative cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 19:32:08 +08:00
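One way to read the route detection described in cff2f291be is as a check on the path itself instead of an equality test against "/v1/responses". The sketch below is not the project's inboundIsResponses: the function names are hypothetical, it takes plain strings instead of a request context, and matching a "/responses" segment at the end or followed by a sub-path is an assumed interpretation of "suffix detection".

```go
package main

import (
	"fmt"
	"strings"
)

// effectivePath prefers the registered route pattern and falls back to the
// raw URL path when the pattern is empty (as the commit notes for tests).
func effectivePath(fullPath, urlPath string) string {
	if fullPath != "" {
		return fullPath
	}
	return urlPath
}

// isResponsesPath reports whether path addresses a /responses route:
// either it ends in the "/responses" segment or continues below it.
func isResponsesPath(path string) bool {
	trimmed := strings.TrimRight(path, "/")
	if strings.HasSuffix(trimmed, "/responses") {
		return true
	}
	return strings.Contains(path, "/responses/")
}

func main() {
	paths := []string{
		"/v1/responses",
		"/responses",
		"/backend-api/codex/responses",
		"/v1/responses/compact", // hypothetical sub-path
		"/v1/chat/completions",
		"/v1/messages",
	}
	for _, p := range paths {
		fmt.Printf("%-32s %v\n", p, isResponsesPath(effectivePath("", p)))
	}
}
```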
Jamie Wong
5e5c2062bf fix(openai): emit response.failed for /v1/responses after stream started
When /v1/responses streaming hits the user/account concurrency wait, the
wait loop sends SSE ping comments to keep the connection alive, which
flushes HTTP 200 + headers. If the wait then times out (or any other
post-flush error fires), handleStreamingAwareError previously emitted a
generic `event: error` frame. Codex CLI requires the stream to end with
a Responses terminal event (response.completed/failed/incomplete/cancelled),
so it reports "stream closed before response.completed" and the user-facing
rate-limit intent is lost.

This change detects inbound = /v1/responses in both handleStreamingAwareError
implementations and emits a protocol-compliant response.failed event whose
field set mirrors apicompat.makeResponsesCompletedEvent
(id/object/model/status/output/error). The synthetic id reuses
ctxkey.RequestID so client errors can be grepped against server logs.
sequence_number is intentionally omitted to preserve monotonicity on streams
that already emitted real events.

Other inbound endpoints (/v1/chat/completions, /v1/messages) keep their
legacy formats untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 10:58:29 +08:00
github-actions[bot]
63b0631a58 chore: sync VERSION to 0.1.130 [skip ci] 2026-05-23 06:40:10 +00:00
Wesley Liddick
3c5a444802
Merge pull request #2698 from deqiying/fix/log-real-client-ip
fix: correct inaccurate client IP in rejection logs under reverse-proxy deployments
2026-05-23 11:08:47 +08:00
shaw
b6c0b40848 fix: update x/net vulnerability dependency 2026-05-23 10:55:44 +08:00