2642 Commits

Author SHA1 Message Date
shaw
6010c3cca9 test: 修复内容审计日志异步断言 2026-05-29 09:57:02 +08:00
shaw
514ac5c6a1 feat: 适配 claude-opus-4-8 2026-05-29 09:56:48 +08:00
shaw
ed1b57c597 fix(openai): gate routing by endpoint capability 2026-05-29 08:58:10 +08:00
Wesley Liddick
8c1a07852c
Merge pull request #2858 from wey-gu/feat/openai-embeddings-gateway
feat(gateway): Add OpenAI embeddings gateway
2026-05-28 22:15:19 +08:00
Wesley Liddick
6b0ee8594c
Merge pull request #2860 from fofoj/fix/oauth-401-credentials-overwrite
fix(oauth): 401 handler 不再回写 credentials,避免 refresh_token 被陈旧快照覆盖
2026-05-28 21:59:02 +08:00
fofoj
be3613593b test(oauth): update OAuth 401 tests to match new no-write behavior
Two tests in ratelimit_service_401_test.go were encoding the bug behavior
itself:

- OAuth401InvalidatorError asserted updateCredentialsCalls == 1
- OAuth401UsesCredentialsUpdater asserted updateCredentialsCalls == 1
  and lastCredentials["expires_at"] non-empty

Both assertions exercised the exact write-back this PR removes. Update
them to reflect the new contract and guard against regression:

- OAuth401InvalidatorError: assert updateCredentialsCalls == 0
- OAuth401UsesCredentialsUpdater is renamed to
  OAuth401DoesNotOverwriteCredentials with reversed assertions, so it
  now serves as a regression test ensuring the 401 handler never writes
  credentials back from the request-start snapshot.
2026-05-28 20:32:16 +08:00
fofoj
6aec505016 fix(oauth): don't overwrite credentials JSONB in 401 handler
The 401 handler in RateLimitService.HandleUpstreamError set
account.Credentials["expires_at"] = time.Now() and then persisted the
full credentials map via persistAccountCredentials, which routes through
accountRepository.UpdateCredentials -> ent SetCredentials and replaces
the entire JSONB column.

The account passed to the handler is the request-start snapshot taken
by the gateway at SelectAccount time. When another worker has just
rotated refresh_token via oauth_refresh_api.RefreshIfNeeded, the
snapshot still holds the old refresh_token; writing the full snapshot
back rolls refresh_token in the DB back to the stale value.

The next refresh cycle then calls the upstream with the stale token,
receives invalid_grant, and tryRecoverFromRefreshRace re-reads the DB
only to find currentRT == usedRT (because the 401 handler just poisoned
the DB), returns false, and the account is incorrectly disabled.

Drop the credentials write. InvalidateToken + SetTempUnschedulable is
sufficient: the account is held out of scheduling during the cooldown,
and after the cooldown the next request goes through token_provider's
NeedsRefresh check, which routes through the locked, DB-re-reading
RefreshIfNeeded path.

The "force background refresh by setting expires_at = now" semantic is
intentionally dropped. token_refresh_service will naturally pick the
account up when the real expires_at enters the refresh window, and if
the real expires_at has already passed by the time the account becomes
schedulable again, token_provider's NeedsRefresh returns true and
RefreshIfNeeded fires synchronously on the next request.
2026-05-28 20:05:38 +08:00
lyen1688
1b2d8873b0 feat: 完善前置拦截审核运行态 2026-05-28 20:05:24 +08:00
Wey Gu
ccace69d4e Add OpenAI embeddings gateway 2026-05-28 19:39:52 +08:00
github-actions[bot]
89d96f4b25 chore: sync VERSION to 0.1.132 [skip ci] 2026-05-27 14:28:22 +00:00
Wesley Liddick
cc077862b3
Merge pull request #2797 from wucm667/feat/account-list-created-at-column
feat(admin): 账号管理列表新增创建时间列
2026-05-27 22:10:21 +08:00
Wesley Liddick
bbe847ed3e
Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes
feat(account): configurable pool-mode same-account retry status codes
2026-05-27 22:09:55 +08:00
Wesley Liddick
69657b2fa1
Merge pull request #2827 from ttt132/fix/api-key-responses-sse-fallback
fix: fallback to SSE body for API key responses
2026-05-27 21:56:00 +08:00
Wesley Liddick
61ce79533e
Merge pull request #2800 from wucm667/fix/scheduler-model-not-found-per-model-cooldown
fix(scheduler): 模型 404 仅冷却该账号-模型组合,不再封整个账号
2026-05-27 21:01:52 +08:00
Wesley Liddick
c949d22725
Merge pull request #2821 from Pluviobyte/fix/long-context-cache-creation-multiplier
fix(billing): apply long-context multiplier to cache_creation price (follow-up to #2816)
2026-05-27 21:01:27 +08:00
Wesley Liddick
8461e42a97
Merge pull request #2822 from lyen1688/feat/group-custom-models-list
feat(group): 支持自定义 /v1/models 模型列表
2026-05-27 21:00:19 +08:00
haichuan
32ea9cfe1f fix: fallback to SSE body for API key responses 2026-05-27 20:24:52 +08:00
lyen1688
f597c1581b feat(group): 支持自定义 /v1/models 模型列表 2026-05-27 18:00:45 +08:00
Pluviobyte
ed2aac25a6
fix(billing): apply long-context multiplier to cache_creation price
Follow-up to #2816 (already merged): the same long-context pricing
exemption that affected cache_read also applies to all three
cache_creation price fields (standard, 5m ephemeral, 1h ephemeral).
computeCacheCreationCost reads these prices directly from pricing and
never sees the LongContextInputMultiplier that computeTokenBreakdown
applies to inputPrice / outputPrice / cacheReadPrice.

For GPT-5.4 / 5.5 above the 272k threshold, this causes the cache_write
portion of long sessions to be billed at roughly half what it should
be (default multiplier 2.0). Cache writes are conceptually input-side
operations and should share the same long-context treatment as input /
cache_read.

This patch threads an explicit multiplier into computeCacheCreationCost
so the function can be unit-tested in isolation and matches the existing
pattern used for cache_read. computeTokenBreakdown captures the long
context decision once and passes LongContextInputMultiplier when it
applies, 1.0 otherwise.

Adds three regression tests mirroring the #2816 cache_read tests:
- positive: long-context triggered -> cache_creation scaled by 2.0x
- negative: below threshold -> cache_creation stays at base price
- breakdown: 5m + 1h ephemeral prices both scaled when applicable

Refs #2816

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:59:58 +00:00
Pluviobyte
1e6d0b602a
fix(antigravity): capture message_start input_tokens in streaming passthrough
The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream
forwarding to a Claude-format upstream) drains the SSE stream via
streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level
event["usage"], which matches Anthropic's message_delta but misses
message_start where usage is nested under event.message.usage.

As a result, every streaming /v1/messages request through this path drops
the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*)
and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user
in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5
streaming requests; their billing on output is correct but the input-side
accounting is missing. (Their "duplicate write from message_delta" hypothesis
isn't borne out by the code — RecordUsage is invoked once per request and
writeUsageLogBestEffort dedupes by request_id; what they're seeing is
single records produced by this buggy extractor.)

Branch on event.type so message_start reads from event.message.usage and
other events keep using event.usage, matching how parseSSEUsagePassthrough
already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths.
Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence
that drives the message_start → message_delta sequence end-to-end; both fail
on main and pass with this change.

Fixes #2332

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 09:02:15 +00:00
Wesley Liddick
b0142146af
Merge pull request #2816 from Pluviobyte/fix/long-context-cache-read-multiplier
fix(billing): apply long-context multiplier to cache_read price (#2293)
2026-05-27 15:59:11 +08:00
Wesley Liddick
2387cf9934
Merge pull request #2799 from siyuan-123/fix/ws-rate-limit-failover
修复 OpenAI WS 限额时不自动切换账号
2026-05-27 15:14:28 +08:00
SlientRainyDay
b9509e823a fix(billing): apply long-context multiplier to cache_read price
When session long-context pricing is triggered in computeTokenBreakdown
(e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier
was only being applied to InputPricePerToken and OutputPricePerToken.
The cache_read price was left at its base value, so CacheReadCost was
silently undercharged whenever a long-context session also had cache
hits — which is essentially every long Codex / Claude Code session.

Concretely for gpt-5.4 with 300k cache_read tokens, the bug
under-billed the request by exactly 1x the LongContextInputMultiplier
on the cache portion (e.g. 0.075 instead of 0.150 in the regression
test).

Cache reads are conceptually input-side replays, so they should scale
with LongContextInputMultiplier, matching the treatment of
InputPricePerToken.

Adds two regression tests:
- positive: long-context triggered -> cache_read scaled by 2.0x
- negative: below threshold -> cache_read stays at base price

Fixes #2293

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 07:09:28 +00:00
StarryKira
21033dceb9 feat(account): configurable pool-mode same-account retry status codes
Pool mode currently retries the same account for a fixed set of
upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments
also need same-account retry for transient provider/proxy statuses
such as 502, 503, 520, 529, but hard-coding more statuses changes
behavior for everyone.

Add a per-account credentials option `pool_mode_retry_status_codes`
that lets admins choose which upstream HTTP status codes trigger
same-account retry in pool mode:

- Unset (default): preserve the current 401/403/429 default
- Explicit list: override the defaults with the configured codes
- Codes normalized to the 100-599 range, deduplicated, sorted

The standalone `isPoolModeRetryableStatus` helper is kept as the
default-only fallback. All 15 gateway call sites switch to the new
`Account.IsPoolModeRetryableStatus` method so behavior is preserved
for accounts that do not configure the new field.

Frontend admin UI gains a "Retry Status Codes" comma-separated input
under the pool-mode section in both Create/Edit account modals
(en + zh i18n).

Fixes #2731

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 11:24:25 -07:00
shaw
f7ac5e5931 fix(openai): preserve chat responses usage billing 2026-05-26 21:33:28 +08:00
wucm667
a31b507484 fix(scheduler): 模型404仅冷却账号模型组合 2026-05-26 20:29:48 +08:00
Wesley Liddick
4b9b63443f
Merge pull request #2790 from Arron196/from-arron-main
修复 Ops SLA 本地限制错误统计
2026-05-26 20:21:11 +08:00
siyuan
08061717b8 fix: enable account failover for OpenAI WS rate limits 2026-05-26 20:07:00 +08:00
Wesley Liddick
4a5c5367cf
Merge pull request #2796 from DaydreamCoding/fix/account-reauth-keep-extra
fix(account): 重新授权不再清空 Extra 配置
2026-05-26 20:06:48 +08:00
Wesley Liddick
b9f421d647
Merge pull request #2751 from wucm667/fix/bedrock-strip-context-management-when-beta-removed
fix(bedrock): v0.1.130 回归 — beta token 被移除时同步剥离 context_management 字段
2026-05-26 20:05:43 +08:00
wucm667
b6a38ddab7 feat(admin): 账号管理列表新增创建时间列 2026-05-26 19:59:12 +08:00
DaydreamCoding
11fe7de926 fix(account): 重新授权不再清空 Extra 配置
Claude / OpenAI 账号重新授权走通用 PUT /accounts/:id 时,后端
UpdateAccount 会全量覆盖 account.Extra(仅保留 5 个 quota 用量键),
导致 base_rpm / window_cost_limit / window_cost_sticky_reserve /
max_sessions / quota_* / privacy_mode 等持久化配置全部丢失。

新增专用接口 POST /accounts/:id/apply-oauth-credentials,沿用
现有 /refresh 路径模式:Credentials-only update + Extra JSONB
key 级合并(UpdateAccountExtra) + ClearError + InvalidateToken。

作用域:Claude OAuth / Claude Cookie auth / OpenAI OAuth 三个
调用点。Gemini / Antigravity 现有路径本就不传 extra,保持不变。

顺带修复:旧重新授权路径未调用 InvalidateToken,导致重新授权后
首请求可能仍用缓存中的旧 token 而立即 401。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:46:08 +08:00
benjamin
03ae510c68 fix(ops): exclude count-tokens from metrics errors
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:21:56 +08:00
benjamin
9c56fe0b0b fix(openai): mark fast-policy entrypoints business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:21:45 +08:00
benjamin
5d7df678b1 fix(openai): mark local gateway denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:50 +08:00
benjamin
47fe90eab4 fix(antigravity): mark whitelist denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:37 +08:00
benjamin
c3e7476992 fix(gateway): mark local platform gates business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:23 +08:00
benjamin
c782c2d9c3 fix(ops): classify local policy denials outside SLA
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:19:09 +08:00
benjamin
00eb3abbe1 fix(auth): mark Google group denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:55 +08:00
benjamin
bd1e98ec29 fix(auth): mark API key group denials business-limited
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:41 +08:00
benjamin
5c4101ac53 feat(ops): add local business limit reasons
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-26 17:18:27 +08:00
github-actions[bot]
9ef144874a chore: sync VERSION to 0.1.131 [skip ci] 2026-05-26 06:43:49 +00:00
Wesley Liddick
bebc082306
Merge pull request #2766 from DaydreamCoding/feat/user-platform-quota
feat(quota): 用户 × 平台 USD 配额
2026-05-26 14:13:18 +08:00
Wesley Liddick
83248478e2
Merge pull request #2777 from lyen1688/feat/content-moderation-risk-threshold
feat: 支持内容审计风险阈值配置
2026-05-26 14:12:54 +08:00
lyen1688
23f3d426c6 feat: 支持内容审计风险阈值配置 2026-05-26 13:58:02 +08:00
mt21625457
33ac8eb27d fix openai http2 response header timeout 2026-05-26 13:57:59 +08:00
DaydreamCoding
6b39b344d8 feat(quota): 用户 × 平台 USD 配额
为用户在 anthropic/openai/gemini/antigravity 四个平台上提供日/周/月
三个窗口的 USD 配额管控。配额语义:未设置=不限制,0=禁用,>0=美元上限。

两层模型:
- 配置层:系统默认配额,以及 email/linuxdo/oidc/wechat/github/google/
  dingtalk 七个鉴权来源的默认配额,存于 settings,以嵌套 JSON 整体读写
  (系统 1 个 key + 每个来源 1 个 key),整体替换语义。
- 运行时层:user_platform_quota 表按用户记录实际配额,与配置层解耦。

后端:新增 ent schema 与 140_user_platform_quotas.sql 迁移、repository
与 service 端口、计费链路集成、管理端与用户端读写接口。
前端:管理端设置页配额编辑、用户配额管理 Modal、用户 Dashboard 展示、
中英文案。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 10:49:20 +08:00
shaw
53acde1efd style: fix lint errors in response.failed SSE writer
Errcheck flagged three unchecked strings.Builder.WriteString calls and
gofmt rejected over-aligned trailing comment in the route table.

Rewrite writeResponsesFailedSSE with json.Marshal on typed structs
instead of Builder+strconv.Quote. Same wire format, but:
- no unchecked Write returns to silence
- strict JSON escaping (strconv.Quote emits \a and \v which are not
  valid JSON; Marshal handles all runes correctly)
- omitempty model field via struct tag instead of conditional Builder
- consistent with the json.Marshal style used elsewhere in handler/

Collapse trailing comment whitespace in stream_error_event_test.go to
satisfy gofmt.

All 30+ subtests in the package still pass.
2026-05-25 18:16:46 +08:00
Wesley Liddick
a18738b29e
Merge pull request #2732 from wminjay/fix/responses-stream-failed-event
fix(openai): emit response.failed when /v1/responses SSE aborted post-flush
2026-05-25 18:12:25 +08:00
wucm667
a9c7a3a095 fix(bedrock): strip context_management when beta is removed 2026-05-25 14:15:39 +08:00