sub2api

Author	SHA1	Message	Date
Pluviobyte	27600b1d2c	fix(gateway): filter count_tokens generation fields Anthropic count_tokens rejects generation-only fields such as temperature, top_p, top_k, stream, and stop sequences. Passing the original messages payload through unchanged can turn otherwise valid requests into upstream 400 errors. Sanitize only the count_tokens upstream payload after the gateway's existing request normalization, preserving fields that existing compatibility paths rely on while removing parameters the count_tokens endpoint does not accept. Fixes #2764 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 05:40:50 +00:00
github-actions[bot]	89d96f4b25	chore: sync VERSION to 0.1.132 [skip ci]	2026-05-27 14:28:22 +00:00
Wesley Liddick	cc077862b3	Merge pull request #2797 from wucm667/feat/account-list-created-at-column feat(admin): 账号管理列表新增创建时间列	2026-05-27 22:10:21 +08:00
Wesley Liddick	bbe847ed3e	Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes feat(account): configurable pool-mode same-account retry status codes	2026-05-27 22:09:55 +08:00
Wesley Liddick	69657b2fa1	Merge pull request #2827 from ttt132/fix/api-key-responses-sse-fallback fix: fallback to SSE body for API key responses	2026-05-27 21:56:00 +08:00
Wesley Liddick	61ce79533e	Merge pull request #2800 from wucm667/fix/scheduler-model-not-found-per-model-cooldown fix(scheduler): 模型 404 仅冷却该账号-模型组合，不再封整个账号	2026-05-27 21:01:52 +08:00
Wesley Liddick	c949d22725	Merge pull request #2821 from Pluviobyte/fix/long-context-cache-creation-multiplier fix(billing): apply long-context multiplier to cache_creation price (follow-up to #2816)	2026-05-27 21:01:27 +08:00
Wesley Liddick	8461e42a97	Merge pull request #2822 from lyen1688/feat/group-custom-models-list feat(group): 支持自定义 /v1/models 模型列表	2026-05-27 21:00:19 +08:00
haichuan	32ea9cfe1f	fix: fallback to SSE body for API key responses	2026-05-27 20:24:52 +08:00
lyen1688	f597c1581b	feat(group): 支持自定义 /v1/models 模型列表	2026-05-27 18:00:45 +08:00
Pluviobyte	ed2aac25a6	fix(billing): apply long-context multiplier to cache_creation price Follow-up to #2816 (already merged): the same long-context pricing exemption that affected cache_read also applies to all three cache_creation price fields (standard, 5m ephemeral, 1h ephemeral). computeCacheCreationCost reads these prices directly from pricing and never sees the LongContextInputMultiplier that computeTokenBreakdown applies to inputPrice / outputPrice / cacheReadPrice. For GPT-5.4 / 5.5 above the 272k threshold, this causes the cache_write portion of long sessions to be billed at roughly half what it should be (default multiplier 2.0). Cache writes are conceptually input-side operations and should share the same long-context treatment as input / cache_read. This patch threads an explicit multiplier into computeCacheCreationCost so the function can be unit-tested in isolation and matches the existing pattern used for cache_read. computeTokenBreakdown captures the long context decision once and passes LongContextInputMultiplier when it applies, 1.0 otherwise. Adds three regression tests mirroring the #2816 cache_read tests: - positive: long-context triggered -> cache_creation scaled by 2.0x - negative: below threshold -> cache_creation stays at base price - breakdown: 5m + 1h ephemeral prices both scaled when applicable Refs #2816 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-27 09:59:58 +00:00
Pluviobyte	1e6d0b602a	fix(antigravity): capture message_start input_tokens in streaming passthrough The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream forwarding to a Claude-format upstream) drains the SSE stream via streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level event["usage"], which matches Anthropic's message_delta but misses message_start where usage is nested under event.message.usage. As a result, every streaming /v1/messages request through this path drops the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*) and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5 streaming requests; their billing on output is correct but the input-side accounting is missing. (Their "duplicate write from message_delta" hypothesis isn't borne out by the code — RecordUsage is invoked once per request and writeUsageLogBestEffort dedupes by request_id; what they're seeing is single records produced by this buggy extractor.) Branch on event.type so message_start reads from event.message.usage and other events keep using event.usage, matching how parseSSEUsagePassthrough already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths. Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence that drives the message_start → message_delta sequence end-to-end; both fail on main and pass with this change. Fixes #2332 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-27 09:02:15 +00:00
Wesley Liddick	b0142146af	Merge pull request #2816 from Pluviobyte/fix/long-context-cache-read-multiplier fix(billing): apply long-context multiplier to cache_read price (#2293)	2026-05-27 15:59:11 +08:00
Wesley Liddick	2387cf9934	Merge pull request #2799 from siyuan-123/fix/ws-rate-limit-failover 修复 OpenAI WS 限额时不自动切换账号	2026-05-27 15:14:28 +08:00
SlientRainyDay	b9509e823a	fix(billing): apply long-context multiplier to cache_read price When session long-context pricing is triggered in computeTokenBreakdown (e.g. GPT-5.4 / GPT-5.5 above the 272k token threshold), the multiplier was only being applied to InputPricePerToken and OutputPricePerToken. The cache_read price was left at its base value, so CacheReadCost was silently undercharged whenever a long-context session also had cache hits — which is essentially every long Codex / Claude Code session. Concretely for gpt-5.4 with 300k cache_read tokens, the bug under-billed the request by exactly 1x the LongContextInputMultiplier on the cache portion (e.g. 0.075 instead of 0.150 in the regression test). Cache reads are conceptually input-side replays, so they should scale with LongContextInputMultiplier, matching the treatment of InputPricePerToken. Adds two regression tests: - positive: long-context triggered -> cache_read scaled by 2.0x - negative: below threshold -> cache_read stays at base price Fixes #2293 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-27 07:09:28 +00:00
StarryKira	21033dceb9	feat(account): configurable pool-mode same-account retry status codes Pool mode currently retries the same account for a fixed set of upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments also need same-account retry for transient provider/proxy statuses such as 502, 503, 520, 529, but hard-coding more statuses changes behavior for everyone. Add a per-account credentials option `pool_mode_retry_status_codes` that lets admins choose which upstream HTTP status codes trigger same-account retry in pool mode: - Unset (default): preserve the current 401/403/429 default - Explicit list: override the defaults with the configured codes - Codes normalized to the 100-599 range, deduplicated, sorted The standalone `isPoolModeRetryableStatus` helper is kept as the default-only fallback. All 15 gateway call sites switch to the new `Account.IsPoolModeRetryableStatus` method so behavior is preserved for accounts that do not configure the new field. Frontend admin UI gains a "Retry Status Codes" comma-separated input under the pool-mode section in both Create/Edit account modals (en + zh i18n). Fixes #2731 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 11:24:25 -07:00
shaw	f7ac5e5931	fix(openai): preserve chat responses usage billing	2026-05-26 21:33:28 +08:00
wucm667	a31b507484	fix(scheduler): 模型404仅冷却账号模型组合	2026-05-26 20:29:48 +08:00
Wesley Liddick	4b9b63443f	Merge pull request #2790 from Arron196/from-arron-main 修复 Ops SLA 本地限制错误统计	2026-05-26 20:21:11 +08:00
siyuan	08061717b8	fix: enable account failover for OpenAI WS rate limits	2026-05-26 20:07:00 +08:00
Wesley Liddick	4a5c5367cf	Merge pull request #2796 from DaydreamCoding/fix/account-reauth-keep-extra fix(account): 重新授权不再清空 Extra 配置	2026-05-26 20:06:48 +08:00
Wesley Liddick	b9f421d647	Merge pull request #2751 from wucm667/fix/bedrock-strip-context-management-when-beta-removed fix(bedrock): v0.1.130 回归 — beta token 被移除时同步剥离 context_management 字段	2026-05-26 20:05:43 +08:00
wucm667	b6a38ddab7	feat(admin): 账号管理列表新增创建时间列	2026-05-26 19:59:12 +08:00
DaydreamCoding	11fe7de926	fix(account): 重新授权不再清空 Extra 配置 Claude / OpenAI 账号重新授权走通用 PUT /accounts/:id 时，后端 UpdateAccount 会全量覆盖 account.Extra（仅保留 5 个 quota 用量键），导致 base_rpm / window_cost_limit / window_cost_sticky_reserve / max_sessions / quota_* / privacy_mode 等持久化配置全部丢失。新增专用接口 POST /accounts/:id/apply-oauth-credentials，沿用现有 /refresh 路径模式：Credentials-only update + Extra JSONB key 级合并（UpdateAccountExtra） + ClearError + InvalidateToken。作用域：Claude OAuth / Claude Cookie auth / OpenAI OAuth 三个调用点。Gemini / Antigravity 现有路径本就不传 extra，保持不变。顺带修复：旧重新授权路径未调用 InvalidateToken，导致重新授权后首请求可能仍用缓存中的旧 token 而立即 401。 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 19:46:08 +08:00
benjamin	03ae510c68	fix(ops): exclude count-tokens from metrics errors Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:21:56 +08:00
benjamin	9c56fe0b0b	fix(openai): mark fast-policy entrypoints business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:21:45 +08:00
benjamin	5d7df678b1	fix(openai): mark local gateway denials business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:19:50 +08:00
benjamin	47fe90eab4	fix(antigravity): mark whitelist denials business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:19:37 +08:00
benjamin	c3e7476992	fix(gateway): mark local platform gates business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:19:23 +08:00
benjamin	c782c2d9c3	fix(ops): classify local policy denials outside SLA Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:19:09 +08:00
benjamin	00eb3abbe1	fix(auth): mark Google group denials business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:18:55 +08:00
benjamin	bd1e98ec29	fix(auth): mark API key group denials business-limited Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:18:41 +08:00
benjamin	5c4101ac53	feat(ops): add local business limit reasons Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-26 17:18:27 +08:00
github-actions[bot]	9ef144874a	chore: sync VERSION to 0.1.131 [skip ci]	2026-05-26 06:43:49 +00:00
Wesley Liddick	bebc082306	Merge pull request #2766 from DaydreamCoding/feat/user-platform-quota feat(quota): 用户 × 平台 USD 配额	2026-05-26 14:13:18 +08:00
Wesley Liddick	83248478e2	Merge pull request #2777 from lyen1688/feat/content-moderation-risk-threshold feat: 支持内容审计风险阈值配置	2026-05-26 14:12:54 +08:00
lyen1688	23f3d426c6	feat: 支持内容审计风险阈值配置	2026-05-26 13:58:02 +08:00
mt21625457	33ac8eb27d	fix openai http2 response header timeout	2026-05-26 13:57:59 +08:00
DaydreamCoding	6b39b344d8	feat(quota): 用户 × 平台 USD 配额为用户在 anthropic/openai/gemini/antigravity 四个平台上提供日/周/月三个窗口的 USD 配额管控。配额语义：未设置=不限制，0=禁用，>0=美元上限。两层模型： - 配置层：系统默认配额，以及 email/linuxdo/oidc/wechat/github/google/ dingtalk 七个鉴权来源的默认配额，存于 settings，以嵌套 JSON 整体读写（系统 1 个 key + 每个来源 1 个 key），整体替换语义。 - 运行时层：user_platform_quota 表按用户记录实际配额，与配置层解耦。后端：新增 ent schema 与 140_user_platform_quotas.sql 迁移、repository 与 service 端口、计费链路集成、管理端与用户端读写接口。前端：管理端设置页配额编辑、用户配额管理 Modal、用户 Dashboard 展示、中英文案。 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 10:49:20 +08:00
shaw	53acde1efd	style: fix lint errors in response.failed SSE writer Errcheck flagged three unchecked strings.Builder.WriteString calls and gofmt rejected over-aligned trailing comment in the route table. Rewrite writeResponsesFailedSSE with json.Marshal on typed structs instead of Builder+strconv.Quote. Same wire format, but: - no unchecked Write returns to silence - strict JSON escaping (strconv.Quote emits \a and \v which are not valid JSON; Marshal handles all runes correctly) - omitempty model field via struct tag instead of conditional Builder - consistent with the json.Marshal style used elsewhere in handler/ Collapse trailing comment whitespace in stream_error_event_test.go to satisfy gofmt. All 30+ subtests in the package still pass.	2026-05-25 18:16:46 +08:00
Wesley Liddick	a18738b29e	Merge pull request #2732 from wminjay/fix/responses-stream-failed-event fix(openai): emit response.failed when /v1/responses SSE aborted post-flush	2026-05-25 18:12:25 +08:00
wucm667	a9c7a3a095	fix(bedrock): strip context_management when beta is removed	2026-05-25 14:15:39 +08:00
siyuan	fc66cd704a	fix: recognize codex tool outputs in ws continuation	2026-05-25 10:46:58 +08:00
Jamie Wong	b34cc71bee	fix(openai): also emit response.failed in ensureForwardErrorResponse after Writer.Written Case B: when a slot wait flushes SSE ping comments first (Writer.Written becomes true), the previous ensureForwardErrorResponse short-circuited on `c.Writer.Written()` and returned false without notifying the client. Subsequent upstream errors (http2 timeout, stream INTERNAL_ERROR, etc.) produced silent EOF; Codex CLI reported "stream closed before response.completed" just like the user-slot timeout case. Remove the Written() early return; coerce streamStarted to true when Writer has already been written to, and let handleStreamingAwareError walk the existing logic — which now (thanks to the previous commits) emits a protocol-compliant response.failed for /responses paths and the legacy `event: error` for others. Update tests that previously asserted "do not override written response": the new contract is to append an SSE terminal frame so the client sees a clean close instead of EOF. recoverResponsesPanic inherits this fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 22:00:56 +08:00
Jamie Wong	cff2f291be	fix(openai): also match bare /responses route in handleStreamingAwareError The first revision compared GetInboundEndpoint(c) against EndpointResponses ("/v1/responses"). NormalizeInboundEndpoint only recognizes paths that contain the literal "/v1/responses" substring, but the project actually registers six /responses routes — three of which (top-level r.POST("/responses", ...) and codexDirect's "/backend-api/codex/responses") have FullPath values without the "/v1" prefix and therefore fall through to the default branch. Codex CLI users targeting the bare /responses route at the production deployment (observed 2026-05-24 ~11:05 UTC, user 16) never reached the new writeResponsesFailedSSE path: the endpoint check was false, the legacy `event: error` frame fired, and the strict SDK kept reporting "stream closed before response.completed". Replace the strict equality check with inboundIsResponses(c), which uses suffix detection on FullPath (falling back to URL.Path when FullPath is empty in test fixtures) and covers all six route variants: /v1/responses[/...] /responses[/...] /backend-api/codex/responses[/...] Add test table covering all routes plus negative cases. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 19:32:08 +08:00
Jamie Wong	5e5c2062bf	fix(openai): emit response.failed for /v1/responses after stream started When /v1/responses streaming hits the user/account concurrency wait, the wait loop sends SSE ping comments to keep the connection alive, which flushes HTTP 200 + headers. If the wait then times out (or any other post-flush error fires), handleStreamingAwareError previously emitted a generic `event: error` frame. Codex CLI requires the stream to end with a Responses terminal event (response.completed/failed/incomplete/cancelled), so it reports "stream closed before response.completed" and the user-facing rate-limit intent is lost. This change detects inbound = /v1/responses in both handleStreamingAwareError implementations and emits a protocol-compliant response.failed event whose field set mirrors apicompat.makeResponsesCompletedEvent (id/object/model/status/output/error). The synthetic id reuses ctxkey.RequestID so client errors can be grepped against server logs. sequence_number is intentionally omitted to preserve monotonicity on streams that already emitted real events. Other inbound endpoints (/v1/chat/completions, /v1/messages) keep their legacy formats untouched. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-24 10:58:29 +08:00
github-actions[bot]	63b0631a58	chore: sync VERSION to 0.1.130 [skip ci]	2026-05-23 06:40:10 +00:00
Wesley Liddick	3c5a444802	Merge pull request #2698 from deqiying/fix/log-real-client-ip fix: 修复反代部署下拒绝日志客户端 IP 不准确	2026-05-23 11:08:47 +08:00
shaw	b6c0b40848	fix: update x/net vulnerability dependency	2026-05-23 10:55:44 +08:00
shaw	1e406fed52	fix: optimize OpenAI account cooldown scheduling	2026-05-23 10:18:43 +08:00

1 2 3 4 5 ...

2634 Commits