sub2api

Author	SHA1	Message	Date
win	3cffaa1e8e	x	2026-05-30 16:30:59 +08:00
win	a420179abb	chore: merge upstream Wei-Shaw/sub2api v0.1.133	2026-05-29 17:48:27 +08:00
Wesley Liddick	0a8c77f8c3	Merge pull request #2884 from Pluviobyte/fix/gemini-messages-tool-text-block-ordering fix(gemini): close tool_use block before text in messages streaming	2026-05-29 16:22:34 +08:00
Wesley Liddick	69e7c4db30	Merge pull request #2865 from wey-gu/feat/usage-request-context fix(gateway): preserve usage request context	2026-05-29 16:21:59 +08:00
Wesley Liddick	52292741cb	Merge pull request #2849 from Pluviobyte/fix/count-tokens-payload-filter fix(gateway): filter count_tokens generation fields	2026-05-29 16:11:01 +08:00
Wesley Liddick	f68d351158	Merge pull request #2873 from wucm667/feat/account-quota-threshold-auto-pause feat(account): support auto-pausing account scheduling based on 5h/7d usage thresholds	2026-05-29 15:40:33 +08:00
Pluviobyte	0a521f09fb	fix(gemini): close tool_use block before text in messages streaming When the Gemini->Anthropic streaming bridge for the /v1/messages endpoint receives a functionCall part followed by a text part, the text branch in handleStreamingResponse opened a new text content block without closing the already-open tool_use block. The tool block's content_block_stop was only emitted at end-of-stream, after the text block's content_block_start, so the Anthropic SSE stream contained overlapping/unterminated content blocks. Clients that assemble messages by block index (e.g. Claude Code) can drop the tool input or mis-parse the response. The functionCall branch already closes an open text block before opening a tool block, and the chat-completions sibling closes the tool block in its text branch via closeOpenTool(). This applies the same symmetric handling to the messages variant: close any open tool_use block (resetting openToolIndex/openToolName/ seenToolJSON) before starting text. Adds a regression test that replays a tool->text Gemini stream and asserts the Anthropic content-block lifecycle never overlaps.	2026-05-29 06:47:23 +00:00
wucm667	c9caadb378	fix(account): address second-round review on quota auto-pause - TopK initial filter now drops quota-paused accounts: fold the quota check into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate rechecks all skip paused accounts. Previously the candidate pool was built without the quota check, so paused accounts could fill TopK and leave the scheduler returning "no available accounts" even with healthy ones available. - Add per-account explicit disable flags auto_pause_5h_disabled / auto_pause_7d_disabled with toggles in EditAccountModal. Without these, leaving the account threshold blank silently falls back to the global default, so admins could not exempt a single account once a global default existed. Disable is per-window: an account can opt out of 5h auto-pause while still honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH updated, threshold-hint text revised to explain "blank = global default". - Move quota auto-pause settings off the request hot path: replace the per-repo TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh on staleness); writes via UpdateOpsAdvancedSettings push directly into the cache through an injected sink; wire warms the cache at startup. Adds Warm (sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 14:32:45 +08:00
wucm667	8b7a822706	fix(account): address review on OpenAI quota auto-pause - gate previous_response_id sticky path with quota auto-pause check at both the snapshot and DB-recheck stages (previously bypassed, #1) - skip pausing when the usage window already reset to avoid a stale stuck-pause; carry codex_*_reset_at / reset_after_seconds / codex_usage_updated_at through the scheduler snapshot whitelist (#2) - remove the incomplete limit mode; percentage threshold only (#3) - add global default 5h/7d threshold inputs to the Ops settings dialog with validation and en/zh i18n (#4) - downgrade account_auto_paused_by_quota log from Info to Debug; it fires per-candidate on the scheduling hot path (#5) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 12:20:30 +08:00
Wesley Liddick	1d46be02ae	Merge pull request #2830 from stabey/fix/anthropic-to-responses-cache-tokens fix(apicompat): aggregate input_tokens with OpenAI semantics when converting Anthropic to Responses	2026-05-29 11:00:19 +08:00
wucm667	ead471d64b	feat(account): support auto-pausing account scheduling based on 5h/7d usage thresholds	2026-05-29 10:47:47 +08:00
Wesley Liddick	16842c2f8b	Merge pull request #2836 from siyuan-123/fix/openai-ws-compat-usage Fix OpenAI WS compatibility and usage accounting	2026-05-29 10:47:12 +08:00
Wesley Liddick	21cd382f39	Merge pull request #2855 from gaoren002/fix/concurrency-error-classification fix: classify concurrency acquire failures	2026-05-29 10:33:40 +08:00
Wesley Liddick	e6a3f1e12b	Merge pull request #2869 from Pluviobyte/fix/ws-first-token-terminal-event fix(ws): exclude terminal events from first-token detection	2026-05-29 10:32:16 +08:00
Wesley Liddick	fc9d79d96b	Merge pull request #2835 from JIA-ss/fix/responses-passthrough-token-details fix(apicompat): pass completion_tokens_details through in Responses→Chat conversion	2026-05-29 10:31:52 +08:00
Wesley Liddick	d3a8db8084	Merge pull request #2833 from alfadb/fix/haiku-strip-context-management fix(gateway): capability-based sanitize of body.context_management according to the final anthropic-beta header	2026-05-29 10:31:12 +08:00
Wesley Liddick	433f8dcd13	Merge pull request #2834 from DaydreamCoding/pr/openai-codex-cli-allow-claude-code feat(openai): add a codex_cli_only mechanism to allow the Claude Code Codex plugin	2026-05-29 10:30:33 +08:00
Wesley Liddick	6bc1983506	Merge pull request #2853 from wucm667/fix/system-update-already-up-to-date-response fix(admin): system/update returns 200 + already_up_to_date instead of 500 when no update is available	2026-05-29 10:29:49 +08:00
shaw	6010c3cca9	test: fix async assertions for content audit logs	2026-05-29 09:57:02 +08:00
shaw	514ac5c6a1	feat: add support for claude-opus-4-8	2026-05-29 09:56:48 +08:00
Pluviobyte	8a999f438d	fix(ws): exclude terminal events from first-token detection isOpenAIWSTokenEvent classified response.completed / response.done as token events. When upstream finishes a request without ever emitting a recognizable delta (e.g. cached completions or models that skip incremental output), firstTokenMs was then filled at the terminal event's timestamp, so the first-token latency metric effectively reported total request duration. Terminal events are already handled separately by isOpenAIWSTerminalEvent. Treating them as token events makes the two classifiers overlap, which violates the implicit invariant that the token-event and terminal-event sets are disjoint. The metric only affects ForwardResult.FirstTokenMs (logging and observability); billing and routing are unchanged. Add regression tests for both directions: * TestIsOpenAIWSTokenEvent_TerminalEventsExcluded covers each classification branch. * TestIsOpenAIWSTokenEvent_DisjointWithTerminal asserts the disjoint-set invariant for every known terminal event. Both new tests fail when the old `return eventType == "response.completed" || eventType == "response.done"` is restored. Fixes #2651 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-29 01:33:42 +00:00
shaw	ed1b57c597	fix(openai): gate routing by endpoint capability	2026-05-29 08:58:10 +08:00
win	f519a02ec9	chore: merge upstream Wei-Shaw/sub2api v0.1.132 Conflicts resolved (preserving fork customizations): - config.go: keep NodeTLSProxy + add upstream OpenAIHTTP2 - gateway_service.go: NewGatewayService now takes both rpmTokenBucketSvc (local) and userPlatformQuotaRepo (upstream) - wire_gen.go: wire both new args into the call site - http_upstream.go: drop redundant settings re-assignment; keep proxy URL log redaction - http_upstream_test.go: adopt upstream's explicit-0-disables semantics; keep 600s default constant in nil-cfg fallback test - user_handler_test.go / gateway_record_usage_test.go: pick up new userPlatformQuotaRepo nil parameter Also updated test stubs (windsurf_google_login_test.go, windsurf_tier_access_service_test.go, gateway_models_test.go) for new SetModelRateLimit variadic signature and the extra NewGatewayService arg. Upstream highlights: OpenAI embeddings gateway, user x platform USD quota, content-moderation risk thresholds, OAuth 401 credentials no-overwrite fix, HTTP/2 OpenAI upstream config, pool retry status code configurability, long-context cache pricing multipliers.	2026-05-29 07:21:32 +08:00
Wey Gu	2bd3125d0f	Preserve usage request context	2026-05-28 22:44:25 +08:00
Wesley Liddick	8c1a07852c	Merge pull request #2858 from wey-gu/feat/openai-embeddings-gateway feat(gateway): Add OpenAI embeddings gateway	2026-05-28 22:15:19 +08:00
Wesley Liddick	6b0ee8594c	Merge pull request #2860 from fofoj/fix/oauth-401-credentials-overwrite fix(oauth): 401 handler no longer writes credentials back, so a stale snapshot cannot overwrite refresh_token	2026-05-28 21:59:02 +08:00
fofoj	be3613593b	test(oauth): update OAuth 401 tests to match new no-write behavior Two tests in ratelimit_service_401_test.go were encoding the bug behavior itself: - OAuth401InvalidatorError asserted updateCredentialsCalls == 1 - OAuth401UsesCredentialsUpdater asserted updateCredentialsCalls == 1 and lastCredentials["expires_at"] non-empty Both assertions exercised the exact write-back this PR removes. Update them to reflect the new contract and guard against regression: - OAuth401InvalidatorError: assert updateCredentialsCalls == 0 - OAuth401UsesCredentialsUpdater is renamed to OAuth401DoesNotOverwriteCredentials with reversed assertions, so it now serves as a regression test ensuring the 401 handler never writes credentials back from the request-start snapshot.	2026-05-28 20:32:16 +08:00
fofoj	6aec505016	fix(oauth): don't overwrite credentials JSONB in 401 handler The 401 handler in RateLimitService.HandleUpstreamError set account.Credentials["expires_at"] = time.Now() and then persisted the full credentials map via persistAccountCredentials, which routes through accountRepository.UpdateCredentials -> ent SetCredentials and replaces the entire JSONB column. The account passed to the handler is the request-start snapshot taken by the gateway at SelectAccount time. When another worker has just rotated refresh_token via oauth_refresh_api.RefreshIfNeeded, the snapshot still holds the old refresh_token; writing the full snapshot back rolls refresh_token in the DB back to the stale value. The next refresh cycle then calls the upstream with the stale token, receives invalid_grant, and tryRecoverFromRefreshRace re-reads the DB only to find currentRT == usedRT (because the 401 handler just poisoned the DB), returns false, and the account is incorrectly disabled. Drop the credentials write. InvalidateToken + SetTempUnschedulable is sufficient: the account is held out of scheduling during the cooldown, and after the cooldown the next request goes through token_provider's NeedsRefresh check, which routes through the locked, DB-re-reading RefreshIfNeeded path. The "force background refresh by setting expires_at = now" semantic is intentionally dropped. token_refresh_service will naturally pick the account up when the real expires_at enters the refresh window, and if the real expires_at has already passed by the time the account becomes schedulable again, token_provider's NeedsRefresh returns true and RefreshIfNeeded fires synchronously on the next request.	2026-05-28 20:05:38 +08:00
lyen1688	1b2d8873b0	feat: refine the runtime state of pre-request interception moderation	2026-05-28 20:05:24 +08:00
Wey Gu	ccace69d4e	Add OpenAI embeddings gateway	2026-05-28 19:39:52 +08:00
gaoren002	56e96fdd8c	fix: classify concurrency acquire failures	2026-05-28 10:03:41 +00:00
wucm667	b15375dfb4	fix(admin): handle already up-to-date updates	2026-05-28 17:27:01 +08:00
Pluviobyte	27600b1d2c	fix(gateway): filter count_tokens generation fields Anthropic count_tokens rejects generation-only fields such as temperature, top_p, top_k, stream, and stop sequences. Passing the original messages payload through unchanged can turn otherwise valid requests into upstream 400 errors. Sanitize only the count_tokens upstream payload after the gateway's existing request normalization, preserving fields that existing compatibility paths rely on while removing parameters the count_tokens endpoint does not accept. Fixes #2764 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-28 05:40:50 +00:00
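siyuan	d7bed40dda	Fix OpenAI WS compatibility and usage accounting - Align terminal-state usage parsing between WS and streaming, adding the failed/done/incomplete/cancelled events - Accept subsequent WS response.create messages that omit model, keeping model mapping and permission checks consistent - Complete passthrough header forwarding and image usage field mapping	2026-05-28 01:27:11 +08:00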
JIA-ss	20f5340784	fix(apicompat): pass completion_tokens_details through in Responses→Chat conversion The OpenAI Responses API returns output_tokens_details.reasoning_tokens on reasoning models such as gpt-5.x, but ResponsesToChatCompletions only mapped input_tokens_details.cached_tokens. As a result, completion_tokens in the chat.completion.usage that clients receive fluctuated inexplicably (30+ tokens even for a short prompt), and the reasoning_tokens breakdown field was missing, making it hard to reconcile against native OpenAI Chat Completions responses. Following the official OpenAI CompletionUsage schema (openai/openai-go SDK completion.go), fill in all token-details fields, all omitempty: prompt_tokens_details: - cached_tokens (already supported) - audio_tokens (new) completion_tokens_details: - reasoning_tokens (new) - audio_tokens (new) - accepted_prediction_tokens (new) - rejected_prediction_tokens (new) Implementation details: - Extract two helpers, promptDetailsFromResponses / completionDetailsFromResponses, which return nil when all fields are zero - The non-streaming path ResponsesToChatCompletions reuses the existing chatUsageFromResponsesUsage helper, removing duplication between the two paths - Non-reasoning / non-audio upstreams (Anthropic, Gemini, gpt-4o) do not populate these fields, so the helper returns nil → CompletionTokensDetails is not emitted, keeping existing responses byte-for-byte compatible New unit tests: - TestResponsesToChatCompletions_ReasoningTokens - TestResponsesToChatCompletions_AllTokenDetailsPassThrough - TestResponsesToChatCompletions_NoReasoningTokensWhenZero - TestResponsesEventToChatChunks_CompletedWithReasoningTokens	2026-05-28 00:38:25 +08:00
alfadb	ddf91e9a7f	fix(gateway): capability-based sanitize of body.context_management according to the final anthropic-beta header Upstream Anthropic rejects a request when the body contains `context_management` but the `anthropic-beta` header actually sent does not contain `context-management-2025-06-27`: { "type": "invalid_request_error", "message": "context_management: Extra inputs are not permitted" } (HTTP 400, request_id of the form req_011C...) This 400 is triggered on the haiku path because three beta header builders deliberately exclude the context-management beta: - HaikuBetaHeader (messages, OAuth / mimic CC) - APIKeyHaikuBetaHeader (messages, API-key) - CountTokensBetaHeader (count_tokens, all auth types) Yet the body still carries the `context_management` field, for two reasons: 1. normalizeClaudeOAuthRequestBody proactively injects it for `clear_thinking_20251015` when thinking_enabled / thinking_adaptive is on; 2. the client (Claude Code CLI >= 2.1.87) sends it as-is, and the gateway forwards it during passthrough. Fix: symmetric capability-based constraint ========================== Aligned with the existing Bedrock pattern (`sanitizeBedrockFieldsForBetaTokens` in `backend/internal/service/bedrock_request.go`): decide whether to keep `body.context_management` based on the `anthropic-beta` header actually sent, rather than on model name or route classification. New pure function: sanitizeAnthropicBodyForBetaTokens(body, betaHeader) (body, changed) If `betaHeader` does not contain `context-management-2025-06-27`, strip the body field with sjson; otherwise return it unchanged. Wired into every Anthropic / Anthropic-compatible upstream egress: | Path | sanitize hook point | |--------------------------------------------|-------------------------------------------------------| | /v1/messages OAuth mimic CC | buildUpstreamRequest | | /v1/messages OAuth real CC passthrough | buildUpstreamRequest | | /v1/messages API-key | buildUpstreamRequest | | /v1/messages API-key passthrough | buildUpstreamRequestAnthropicAPIKeyPassthrough | | /v1/messages Vertex / service-account | buildUpstreamRequestAnthropicVertex | | /v1/messages/count_tokens (all 4 paths) | buildCountTokensRequest, | | | buildCountTokensRequestAnthropicAPIKeyPassthrough | | Antigravity Anthropic-compatible upstream | AntigravityGatewayService.ForwardUpstream | | Bedrock | (already handled by sanitizeBedrockFieldsForBetaTokens) | Why reorder (rather than add a one-line call) ================================ sanitize must run before `signBillingHeaderCCH`. CCH takes an xxHash64 digest of the entire body and writes it into the 5-hex-digit `cch` field of the billing header; if the body is signed first and stripped afterwards, the hash the upstream recomputes over the body actually sent will not match `cch`, and the request is judged third-party. This requires computing the final `anthropic-beta` header before `http.NewRequest`, so the beta computation previously inlined in the builders is extracted into two pure functions: - computeFinalAnthropicBeta (messages path: mimic does not pass through client betas) - computeFinalCountTokensAnthropicBeta (count_tokens path: mimic does not skip whitelist passthrough) Both preserve the original behavior bit for bit: - the mimic path skips client betas on messages and merges them on count_tokens - the API-key path honors the `InjectBetaForAPIKey` switch - dropSet (`defaultDroppedBetasSet` + BetaPolicy filter) is applied on the main path and intentionally not on the passthrough / Vertex paths; this PR leaves that existing asymmetry untouched. A semantic test (`TestSanitizeMustBeBeforeCCHSigning_HashConsistency`) documents and enforces the ordering constraint: it proves that `sanitize -> signBillingHeaderCCH` yields a `cch` consistent with the final body, while `signBillingHeaderCCH -> sanitize` yields a `cch` that fails the upstream hash recomputation. Why capability-based (rather than haiku model-name matching) ========================================== The most naive "strip by model name" approach (`strings.Contains(modelID, "haiku") -> DeleteBytes "context_management"`) has four real failure modes: 1. Over-deletion. Real Claude Code clients with CLI >= 2.1.87 send both the body field and `anthropic-beta: context-management-2025-06-27` on haiku. Stripping unconditionally would silently disable that user's `clear_thinking_20251015`. 2. Alias drift. Future haiku aliases (`claude-3-haiku-...`, `claude-haiku-...`, etc.) change the match surface; any new alias would quietly bypass the strip. 3. Missed count_tokens coverage. count_tokens has its own builder and a different set of beta headers; a model-name check in one place would miss this path. 4. API-key passthrough early return. The passthrough builder returns before the model-name strip, so the strip never runs. The capability dimension follows the header end to end, so all 4 cases above are correct by construction, without depending on any modelID matching. Defensive measures ====== - When `sjson.DeleteBytes` fails on a body where `gjson` has just verified the field exists, `sanitizeAnthropicBodyForBetaTokens` logs a warning; in practice this only happens when the request is corrupted mid-flight, and the log surfaces a body / header inconsistency that previously would have occurred silently. - `header_util.go` adds `deleteHeaderAllForms`: it is used to overwrite after whitelist passthrough has already written the canonical-case `Anthropic-Beta`; otherwise both entries would remain. Tests ==== 44 new tests under `backend/internal/service`: - Pure functions: anthropicBetaTokensContains x 5, sanitize keep/strip x 6, computeFinal{Anthropic,CountTokens}AnthropicBeta x 12 - normalize regression x 5 - buildUpstreamRequest end-to-end x 4 (OAuth mimic haiku strip / mimic sonnet preserve / real CC haiku with client beta preserve / API-key haiku strip) - buildCountTokensRequest end-to-end x 2 - buildUpstreamRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve) - buildCountTokensRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve) - buildUpstreamRequestAnthropicVertex x 2 (strip / preserve, including a symmetric assertion on the outgoing `anthropic-beta` header) - CCH ordering semantic test x 1 The unit suite passes in full (88s locally), `golangci-lint` 0 issues. Known limitations (out of scope for this PR) ======================== - The Vertex path uses the passed-through client `anthropic-beta` header as the basis for sanitize, rather than a Vertex-side capability matrix. The worst case is over-stripping (= current behavior on main, where the main path strips nothing to begin with); not a regression. A complete Vertex capability model belongs in a separate PR. - The Vertex builder still does not apply the BetaPolicy filter / dropSet. This is an existing architectural decision stemming from that builder's early return, and this PR leaves it untouched. - count_tokens mimic still injects `context-management-2025-06-27` on haiku (because the original count_tokens mimic logic does not exclude it the way messages mimic does). This PR preserves main's behavior bit for bit; whether to unify it with the messages mimic exclusion policy is a separate question. - `sanitizeAnthropicBodyForBetaTokens` currently handles only the `context_management <-> context-management-2025-06-27` pair. If Anthropic later introduces more beta-gated body fields, a follow-up PR can refactor this into a `{body path -> required beta token}` registry.	2026-05-28 00:02:50 +08:00
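DaydreamCoding	56908d3c4c	feat(openai): add a codex_cli_only mechanism to allow the Claude Code Codex plugin Use case: when the https://github.com/openai/codex-plugin-cc plugin is used inside Claude Code, the plugin completes the initialize handshake through the official codex app-server with clientInfo.name="Claude Code", so the request headers are set to originator=Claude Code with a User-Agent containing "Claude Code/". These are not on the official client whitelist and would otherwise be blocked by codex_cli_only with a 403. When the official client whitelist does not match, two independent allow layers are evaluated (OR semantics): - Per account: account.Extra.codex_cli_only_allowed_clients references named presets (currently only claude_code), detector reason=allowed_client_matched - Global switch: the OpenAI block of the gateway service in /admin/settings adds openai_allow_claude_code_codex_plugin (default false); when enabled it allows all codex_cli_only accounts uniformly, detector reason=global_allowed_client_matched The signature still requires originator=Claude Code to match exactly and the UA to contain "Claude Code/". Upstream forwarding remains passthrough, unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-27 23:55:34 +08:00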
stabey	89dffdd2e1	fix(apicompat): emit OpenAI-semantic input_tokens when converting Anthropic to Responses Anthropic Messages reports input_tokens excluding cache_read/cache_creation, but OpenAI Responses input_tokens is the total including cached tokens. The reverse converter passed Anthropic's input_tokens straight through, so client-facing prompt_tokens/input_tokens were short by the cached count and cache_creation was dropped entirely. Fix the non-stream path and the streaming state machine to add cache_read + cache_creation back into input_tokens, and track CacheCreationInputTokens on the streaming state. Six downstream paths benefit (Anthropic->Responses, Anthropic->ChatCompletions, Gemini->ChatCompletions, each sync + stream). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 22:36:52 +08:00
Wesley Liddick	cc077862b3	Merge pull request #2797 from wucm667/feat/account-list-created-at-column feat(admin): add a created-at column to the account management list	2026-05-27 22:10:21 +08:00
Wesley Liddick	bbe847ed3e	Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes feat(account): configurable pool-mode same-account retry status codes	2026-05-27 22:09:55 +08:00
Wesley Liddick	69657b2fa1	Merge pull request #2827 from ttt132/fix/api-key-responses-sse-fallback fix: fallback to SSE body for API key responses	2026-05-27 21:56:00 +08:00
Wesley Liddick	61ce79533e	Merge pull request #2800 from wucm667/fix/scheduler-model-not-found-per-model-cooldown fix(scheduler): a model 404 cools down only that account-model pair instead of blocking the whole account	2026-05-27 21:01:52 +08:00
Wesley Liddick	c949d22725	Merge pull request #2821 from Pluviobyte/fix/long-context-cache-creation-multiplier fix(billing): apply long-context multiplier to cache_creation price (follow-up to #2816)	2026-05-27 21:01:27 +08:00
Wesley Liddick	8461e42a97	Merge pull request #2822 from lyen1688/feat/group-custom-models-list feat(group): support a custom /v1/models model list	2026-05-27 21:00:19 +08:00
haichuan	32ea9cfe1f	fix: fallback to SSE body for API key responses	2026-05-27 20:24:52 +08:00
lyen1688	f597c1581b	feat(group): support a custom /v1/models model list	2026-05-27 18:00:45 +08:00
Pluviobyte	ed2aac25a6	fix(billing): apply long-context multiplier to cache_creation price Follow-up to #2816 (already merged): the same long-context pricing exemption that affected cache_read also applies to all three cache_creation price fields (standard, 5m ephemeral, 1h ephemeral). computeCacheCreationCost reads these prices directly from pricing and never sees the LongContextInputMultiplier that computeTokenBreakdown applies to inputPrice / outputPrice / cacheReadPrice. For GPT-5.4 / 5.5 above the 272k threshold, this causes the cache_write portion of long sessions to be billed at roughly half what it should be (default multiplier 2.0). Cache writes are conceptually input-side operations and should share the same long-context treatment as input / cache_read. This patch threads an explicit multiplier into computeCacheCreationCost so the function can be unit-tested in isolation and matches the existing pattern used for cache_read. computeTokenBreakdown captures the long context decision once and passes LongContextInputMultiplier when it applies, 1.0 otherwise. Adds three regression tests mirroring the #2816 cache_read tests: - positive: long-context triggered -> cache_creation scaled by 2.0x - negative: below threshold -> cache_creation stays at base price - breakdown: 5m + 1h ephemeral prices both scaled when applicable Refs #2816 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-27 09:59:58 +00:00
Pluviobyte	1e6d0b602a	fix(antigravity): capture message_start input_tokens in streaming passthrough The antigravity upstream-passthrough path (account.Type == AccountTypeUpstream forwarding to a Claude-format upstream) drains the SSE stream via streamUpstreamResponse + extractSSEUsage. The extractor only reads top-level event["usage"], which matches Anthropic's message_delta but misses message_start where usage is nested under event.message.usage. As a result, every streaming /v1/messages request through this path drops the input-side fields (input_tokens, cache_read_input_tokens, cache_creation_*) and writes a usage_logs row with input_tokens=0 + output_tokens>0. The user in #2332 observed 2,728 such rows attributed to claude-opus-4-6 / haiku-4-5 streaming requests; their billing on output is correct but the input-side accounting is missing. (Their "duplicate write from message_delta" hypothesis isn't borne out by the code — RecordUsage is invoked once per request and writeUsageLogBestEffort dedupes by request_id; what they're seeing is single records produced by this buggy extractor.) Branch on event.type so message_start reads from event.message.usage and other events keep using event.usage, matching how parseSSEUsagePassthrough already handles both shapes for the Anthropic OAuth / API-key / Bedrock paths. Adds two extractSSEUsage table cases plus a TestExtractSSEUsage_StreamingSequence that drives the message_start → message_delta sequence end-to-end; both fail on main and pass with this change. Fixes #2332 Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-27 09:02:15 +00:00
Wesley Liddick	b0142146af	Merge pull request #2816 from Pluviobyte/fix/long-context-cache-read-multiplier fix(billing): apply long-context multiplier to cache_read price (#2293)	2026-05-27 15:59:11 +08:00
Wesley Liddick	2387cf9934	Merge pull request #2799 from siyuan-123/fix/ws-rate-limit-failover Fix OpenAI WS not failing over to another account automatically on rate limit	2026-05-27 15:14:28 +08:00


2618 Commits