385 Commits

Author SHA1 Message Date
win
3cffaa1e8e x
Some checks are pending
CI / golangci-lint (push) Waiting to run
CI / windsurf-platform (macos-latest) (push) Waiting to run
CI / windsurf-platform (windows-latest) (push) Waiting to run
CI / test (push) Waiting to run
CI / frontend (push) Waiting to run
Security Scan / backend-security (push) Waiting to run
Security Scan / frontend-security (push) Waiting to run
2026-05-30 16:30:59 +08:00
Wesley Liddick
52292741cb
Merge pull request #2849 from Pluviobyte/fix/count-tokens-payload-filter
fix(gateway): filter count_tokens generation fields
2026-05-29 16:11:01 +08:00
Pluviobyte
27600b1d2c
fix(gateway): filter count_tokens generation fields
Anthropic count_tokens rejects generation-only fields such as temperature, top_p, top_k, stream, and stop sequences. Passing the original messages payload through unchanged can turn otherwise valid requests into upstream 400 errors.

Sanitize only the count_tokens upstream payload after the gateway's existing request normalization, preserving fields that existing compatibility paths rely on while removing parameters the count_tokens endpoint does not accept.

Fixes #2764

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 05:40:50 +00:00
alfadb
ddf91e9a7f fix(gateway): 按最终 anthropic-beta header 对 body.context_management 做能力维度 sanitize
上游 Anthropic 在 body 含 `context_management` 但最终发出去的
`anthropic-beta` header 不含 `context-management-2025-06-27` 时会拒收:

    {
      "type": "invalid_request_error",
      "message": "context_management: Extra inputs are not permitted"
    }
    (HTTP 400, request_id 形如 req_011C...)

该 400 在 haiku 路径上触发,因为三个 beta header 构造器有意排除了
context-management beta:

  - HaikuBetaHeader          (messages,     OAuth / mimic CC)
  - APIKeyHaikuBetaHeader    (messages,     API-key)
  - CountTokensBetaHeader    (count_tokens, 所有认证类型)

但 body 中仍然带着 `context_management` 字段,原因有二:

  1. normalizeClaudeOAuthRequestBody 在 thinking_enabled / thinking_adaptive
     打开时为 `clear_thinking_20251015` 主动注入;
  2. 客户端 (Claude Code CLI >= 2.1.87) 原样发送, 网关透传时一并转发。

修复方案: 能力维度对称约束
==========================

对齐已有的 Bedrock 模式
(`backend/internal/service/bedrock_request.go` 中的
`sanitizeBedrockFieldsForBetaTokens`):
根据 **最终** 发出的 `anthropic-beta` header 决定是否保留
`body.context_management`, 而不是按 model 名或路由分类来决定。

新增纯函数:

    sanitizeAnthropicBodyForBetaTokens(body, betaHeader) (body, changed)

如果 `betaHeader` 不含 `context-management-2025-06-27`, 用 sjson 把 body
字段 strip 掉; 否则原样返回。

在所有 Anthropic / Anthropic-兼容 上游出口都接入:

| 路径                                       | sanitize 接入点                                       |
|--------------------------------------------|-------------------------------------------------------|
| /v1/messages OAuth mimic CC                | buildUpstreamRequest                                  |
| /v1/messages OAuth 真 CC 透传              | buildUpstreamRequest                                  |
| /v1/messages API-key                       | buildUpstreamRequest                                  |
| /v1/messages API-key passthrough           | buildUpstreamRequestAnthropicAPIKeyPassthrough        |
| /v1/messages Vertex / service-account      | buildUpstreamRequestAnthropicVertex                   |
| /v1/messages/count_tokens (全部 4 条路径)  | buildCountTokensRequest,                              |
|                                            | buildCountTokensRequestAnthropicAPIKeyPassthrough     |
| Antigravity Anthropic-兼容 上游            | AntigravityGatewayService.ForwardUpstream             |
| Bedrock                                    | (已由 sanitizeBedrockFieldsForBetaTokens 处理)        |

为什么要重排 (而不是加一行调用)
================================

sanitize 必须 **在** `signBillingHeaderCCH` 之前运行。CCH 对整个 body
取 xxHash64 摘要后写入 billing header 里 5 位十六进制的 `cch` 字段;
如果先签名再 strip, 上游对发出去的 body 重算 hash 会和 `cch` 不一致,
请求被判为 third-party。这就要求在 `http.NewRequest` 之前算出最终的
`anthropic-beta` header, 所以把原本内联在 builder 里的 beta 计算逻辑
抽成了两个纯函数:

  - computeFinalAnthropicBeta              (messages 路径: mimic 不透传
                                            客户端 beta)
  - computeFinalCountTokensAnthropicBeta   (count_tokens 路径: mimic 不
                                            跳过白名单透传)

两者逐位保留原行为:

  - mimic 路径在 messages 上跳过客户端 beta, 在 count_tokens 上合并
  - API-key 路径尊重 `InjectBetaForAPIKey` 开关
  - dropSet (`defaultDroppedBetasSet` + BetaPolicy filter) 应用在主路径,
    passthrough / Vertex 路径有意不应用 —— 这条原有的不对称行为本 PR
    不动。

一条语义测试 (`TestSanitizeMustBeBeforeCCHSigning_HashConsistency`) 把
顺序约束文档化并强制守住: 它证明 `sanitize -> signBillingHeaderCCH`
产生的 `cch` 与最终 body 一致, 而 `signBillingHeaderCCH -> sanitize`
产生的 `cch` 会被上游 hash 重算判失败。

为什么是能力维度 (而不是 haiku 模型名匹配)
==========================================

最朴素的"按 model 名 strip"方案
(`strings.Contains(modelID, "haiku") -> DeleteBytes "context_management"`)
有四个真实失败模式:

  1. 过度删除。CLI >= 2.1.87 的真 Claude Code 客户端在 haiku 上同时
     发送 body 字段 **和** `anthropic-beta: context-management-2025-06-27`。
     一律 strip 会让该用户的 `clear_thinking_20251015` 静默失效。
  2. 别名漂移。未来的 haiku 别名 (`claude-3-haiku-...`,
     `claude-haiku-...` 等) 改变匹配面; 任何新别名都会悄悄绕过 strip。
  3. count_tokens 漏覆盖。count_tokens 有自己的 builder 和不同的 beta
     header 集合; 在一个地方做 model 名检查会漏掉这条路径。
  4. API-key passthrough 早退。passthrough builder 在 model 名 strip
     之前就 return 了, strip 根本不执行。

能力维度沿着 header 端到端走, 上述 4 个 case 都由构造方式保证正确,
不依赖任何 modelID 匹配。

防御项
======

  - 当 `sjson.DeleteBytes` 在 `gjson` 刚验证过字段存在的 body 上失败时,
    `sanitizeAnthropicBodyForBetaTokens` 会记 warning 日志 —— 这种情况
    现实中仅在请求中途被破坏时发生, 日志把此前会静默发生的 body / header
    不一致暴露出来。
  - `header_util.go` 新增 `deleteHeaderAllForms`: 在白名单透传已经写入
    canonical 大小写的 `Anthropic-Beta` 之后再覆盖, 否则会同时留下两条。

测试
====

`backend/internal/service` 下新增 44 个测试:

  - 纯函数: anthropicBetaTokensContains x 5, sanitize keep/strip x 6,
    computeFinal{Anthropic,CountTokens}AnthropicBeta x 12
  - normalize 回归 x 5
  - buildUpstreamRequest 端到端 x 4
      (OAuth mimic haiku strip / mimic sonnet preserve /
       真 CC haiku 带客户端 beta preserve / API-key haiku strip)
  - buildCountTokensRequest 端到端 x 2
  - buildUpstreamRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve)
  - buildCountTokensRequestAnthropicAPIKeyPassthrough x 2 (strip / preserve)
  - buildUpstreamRequestAnthropicVertex x 2 (strip / preserve, 含 outgoing
    `anthropic-beta` header 对称断言)
  - CCH 顺序语义测试 x 1

unit 套件全过 (本机 88s), `golangci-lint` 0 issues。

已知局限 (本 PR 范围外)
========================

  - Vertex 路径用透传过来的客户端 `anthropic-beta` header 作为 sanitize
    依据, 而不是 Vertex 侧的能力矩阵。最坏情况是过度 strip (= 当前 main
    的行为, 主路径本来什么都不 strip); 不是 regression。完整的 Vertex
    能力模型属于单独的 PR。
  - Vertex builder 仍然不应用 BetaPolicy filter / dropSet。这是该 builder
    早 return 的既有架构决策, 本 PR 不动。
  - count_tokens mimic 在 haiku 上仍然注入 `context-management-2025-06-27`
    (因为原 count_tokens mimic 逻辑并不像 messages mimic 那样排除它)。
    本 PR 逐位保留 main 的行为; 是否要让它与 messages mimic 的排除策略
    统一是另一个问题。
  - `sanitizeAnthropicBodyForBetaTokens` 目前只处理
    `context_management <-> context-management-2025-06-27` 这一对。如果
    Anthropic 后续推出更多 beta-gated body 字段, 可以在后续 PR 重构为
    `{body 路径 -> required beta token}` 注册表的形式。
2026-05-28 00:02:50 +08:00
Wesley Liddick
bbe847ed3e
Merge pull request #2805 from StarryKira/feat/configurable-pool-retry-status-codes
feat(account): configurable pool-mode same-account retry status codes
2026-05-27 22:09:55 +08:00
StarryKira
21033dceb9 feat(account): configurable pool-mode same-account retry status codes
Pool mode currently retries the same account for a fixed set of
upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments
also need same-account retry for transient provider/proxy statuses
such as 502, 503, 520, 529, but hard-coding more statuses changes
behavior for everyone.

Add a per-account credentials option `pool_mode_retry_status_codes`
that lets admins choose which upstream HTTP status codes trigger
same-account retry in pool mode:

- Unset (default): preserve the current 401/403/429 default
- Explicit list: override the defaults with the configured codes
- Codes normalized to the 100-599 range, deduplicated, sorted

The standalone `isPoolModeRetryableStatus` helper is kept as the
default-only fallback. All 15 gateway call sites switch to the new
`Account.IsPoolModeRetryableStatus` method so behavior is preserved
for accounts that do not configure the new field.

Frontend admin UI gains a "Retry Status Codes" comma-separated input
under the pool-mode section in both Create/Edit account modals
(en + zh i18n).

Fixes #2731

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 11:24:25 -07:00
wucm667
a31b507484 fix(scheduler): 模型404仅冷却账号模型组合 2026-05-26 20:29:48 +08:00
DaydreamCoding
6b39b344d8 feat(quota): 用户 × 平台 USD 配额
为用户在 anthropic/openai/gemini/antigravity 四个平台上提供日/周/月
三个窗口的 USD 配额管控。配额语义:未设置=不限制,0=禁用,>0=美元上限。

两层模型:
- 配置层:系统默认配额,以及 email/linuxdo/oidc/wechat/github/google/
  dingtalk 七个鉴权来源的默认配额,存于 settings,以嵌套 JSON 整体读写
  (系统 1 个 key + 每个来源 1 个 key),整体替换语义。
- 运行时层:user_platform_quota 表按用户记录实际配额,与配置层解耦。

后端:新增 ent schema 与 140_user_platform_quotas.sql 迁移、repository
与 service 端口、计费链路集成、管理端与用户端读写接口。
前端:管理端设置页配额编辑、用户配额管理 Modal、用户 Dashboard 展示、
中英文案。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 10:49:20 +08:00
erio
fe1c6c958b feat(bedrock): add Claude Code compatibility for AWS Bedrock
- Export ApplyBedrockCCCompat() in GatewayService, called after channel
  model mapping to ensure mapped model ID is used for Opus 4.7+ detection
- Add sanitizeBedrockCCFields(): remove service_tier/interface_geo/
  context_management, inject max_tokens/anthropic_version defaults
- Add sanitizeBedrockCCBetaTokens(): filter anthropic_beta to keep only
  Bedrock-supported tokens, reusing autoInjectBedrockBetaTokens and
  filterBedrockBetaTokens for consistent rules
- Remove unsupported beta tokens (interleaved-thinking, context-management)
  from whitelist based on AWS official docs
- Simplify IsBedrockCCCompatEnabled() to check boolean toggle directly,
  applying CC compat to all accounts regardless of platform
- Add unit tests for IsBedrockCCCompatEnabled (8 cases),
  sanitizeBedrockCCFields (8 cases), sanitizeBedrockCCBetaTokens (7 cases)
- Update bedrock beta policy tests for removed auto-injection
2026-05-21 11:46:24 +08:00
erio
4fd21994c5 feat(bedrock): add Claude Code compatibility transformations for Bedrock accounts
Add channel-level Bedrock CC compatibility toggle (similar to web_search_emulation)
that fixes 4 types of Bedrock 400 errors seen with Claude Code:

1. thinking.type "enabled" → "adaptive" for Opus 4.7+ (only supports adaptive)
2. Add default budget_tokens when missing for older models
3. Replace illegal characters in tool_use IDs to match Bedrock's ^[a-zA-Z0-9_-]+$ pattern
4. anthropic_version / invalid beta flag (already handled elsewhere)

Transformations run in Forward() before any forwarding path, so both native Bedrock
accounts and apikey passthrough accounts pointing to Bedrock relays benefit.

Includes channel-level toggle UI and unit tests.
2026-05-20 21:47:38 +08:00
name
8211aa7066 fix: retry on "thinking block must contain thinking" upstream error
Some clients reuse assistant history from other models when switching to
claude with extended thinking enabled. If a prior thinking block lacks the
thinking text field, upstream returns:
  messages.X.content.Y.thinking: each thinking block must contain thinking

Add this pattern to isThinkingBlockSignatureError so the existing
FilterThinkingBlocksForRetry retry path triggers and rewrites/drops the
offending blocks.
2026-05-20 18:46:50 +08:00
name
2eb622f2f6 Remove ops retry replay storage 2026-05-19 19:37:41 +08:00
DaydreamCoding
b19da9c7fe feat(dingtalk): 钉钉 OAuth 登录接入与 internal_only 用户属性同步
⚠️ 应用类型约束:当前实现仅支持「钉钉登录-企业内部应用」(DingTalk 开放平台
internal_app 类型)。第三方个人应用、第三方企业应用类型暂不支持——OAuth 流程
相同但 corp 校验、跨企业行为不同。backend 通过 DingTalkAppKind 校验对非
internal_app 类型 fail-closed(硬约束)。

钉钉 OAuth 登录主链
- 4 步 OAuth 链:ExchangeCodeForUserToken / GetUnionIdByUserToken /
  GetUserIdByUnionId / GetStaffInfoByUserId;app token 缓存
- pending session 机制持久化 OAuth 中间态;cookie-only token 持久化
- 三种分流:bind_login_required / email_completion / choose_account_action
- corp_restriction_policy 支持 none + internal_only;stale "whitelist" 在
  加载层与写入层均静默 coerce 为 none + slog.Warn
- bypass_registration 开关:企业内部模式豁免全局 REGISTRATION_DISABLED
- isReservedEmail / signup_source / canUnbindProvider / OAuth pending flow
  等横切点支持 dingtalk provider
- migration 136:4 表 CHECK 约束加入 'dingtalk' provider 值

internal_only 模式同步企业邮箱/姓名/部门到用户属性
- SyncCorpEmail / SyncDisplayName / SyncDept 三个独立开关 + 对应
  SyncXxxAttrKey 目标属性 key(默认 dingtalk_email / dingtalk_name /
  dingtalk_department);非 internal_only policy 在写入层与加载层均
  coerce 为 false,admin handler 与 setting_service 双层兜底
- 同步语义:首次注册写 users.username(昵称优先 → 企业姓名 fallback),
  之后每次登录刷新 3 个属性;空值也写入以覆盖旧值
- 邮箱三级 fallback:org_email > email > extension["企业邮箱"]
  (钉钉自定义字段 JSON)
- 部门路径递归向上拼接,跳过 dept_id=1 选首个真实子部门,剥离根组织名
- GetUnionIdByUserToken 同时返回 OIDC /contact/users/me 的 nick 字段;
  新增 GetDeptInfo 调用 OAPI /topapi/v2/department/get
- AuthHandler 注入 UserAttributeService;OAuth pending flow 在
  createPendingOAuthAccount / bindPendingOAuthLogin 分别派发到
  AfterRegistration(syncUsername=true)/ AfterLogin
- migration 137 seed dingtalk_email/name/department 三个用户属性定义

附带修复(同集成路径暴露的两个 OAuth 注册回归)
- LoginOrRegisterOAuthWithTokenPair 新建用户分支用 inferLegacySignupSource
  覆写 caller 显式传入的 signupSource,导致 dingtalk/linuxdo/oidc/wechat
  渠道授权按 email 渠道读取;改为只在 caller 未显式传入时回退邮箱推断
- mergeProviderDefaultGrantSettings 把 parse fallback 默认值
  (Concurrency=5 / Balance=0) 当作"未配置"哨兵,admin 显式设 5 时被误判
  退回全局默认(复现:全局默认 1 + 渠道默认并发 5 + grant_on_signup → 新
  用户实际 concurrency=1);去掉哨兵,admin 任何 >=0 值都覆盖 globalDefaults

前端
- DingTalk Login / Callback / EmailCompletion / ChoiceAccount / Error
  视图;router + auth API client
- admin SettingsView:corp policy radio(none / internal_only)+ bypass
  注册开关 + i18n;internal_only 下展示三同步开关 + 目标 attr key 下拉
  (拉取 user attribute definitions),展示 fieldEmail /
  qyapi_get_department_list 钉钉权限申请提示
- Profile:S1 主动绑定 / S5 解绑钉钉按钮 + 合成邮箱防自锁

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 15:27:47 +08:00
Wesley Liddick
a340002c6d
Merge pull request #2401 from 2ue/fix/normalize-image-billing-size
修复图片计费尺寸归一化与使用记录展示
2026-05-19 14:00:24 +08:00
lyen1688
164e2f610c fix: add keepalive for Anthropic passthrough streams 2026-05-18 18:41:25 +08:00
2ue
bb4c1abe28 Fix image billing size normalization 2026-05-12 15:21:31 +08:00
shaw
9377c96746 fix: 让消息 cache_control 改写默认关闭 2026-05-11 21:26:41 +08:00
shaw
501b7f2772 fix: stabilize anthropic passthrough timeout error 2026-05-07 10:24:29 +08:00
2ue
6faa344916 feat: add OpenAI image generation controls 2026-05-05 03:26:54 +08:00
shaw
72d5ee4cd1 fix: drain OpenAI compat streams for usage 2026-05-03 17:11:27 +08:00
shaw
73b872998e feat: 添加 Anthropic 缓存 TTL 注入开关 2026-04-30 13:38:22 +08:00
shaw
733627cf9d fix: improve sticky session scheduling 2026-04-30 11:38:11 +08:00
Wesley Liddick
4d676dddd1
Merge pull request #2066 from alfadb/fix/anthropic-stream-eof-failover
fix(gateway): Anthropic 流式 EOF 失败移交 + SSE error 帧标准化
2026-04-29 17:09:47 +08:00
alfadb
d78478e866 fix(gateway): sanitize stream errors to avoid leaking infrastructure topology
(*net.OpError).Error() concatenates Source/Addr fields, so the previous
disconnectMsg surfaced internal source IP/port and upstream server address
to clients via SSE error frames and UpstreamFailoverError.ResponseBody
(reported by @Wei-Shaw on PR #2066).

- Add sanitizeStreamError that maps known errors (io.ErrUnexpectedEOF,
  context.Canceled, syscall.ECONNRESET/EPIPE/ETIMEDOUT/...) to fixed
  descriptions and falls back to a generic placeholder, with an explicit
  *net.OpError branch that drops Source/Addr fields entirely.
- Use sanitized message in client-facing disconnectMsg; full ev.err is
  still preserved in the existing operator log line for diagnosis.
- Tests cover net.OpError redaction, the failover ResponseBody path, and
  every known sanitized error mapping.
2026-04-29 15:44:54 +08:00
alfadb
4c474616b9 fix(gateway): emit Anthropic-standard SSE error events and failover body
Two follow-ups to PR #2066's failover-wrap fix:

1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded
   as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage`
   probes for `error.message`, `detail`, or top-level `message` only — so
   `handleFailoverExhausted` and downstream passthrough rules saw an empty
   message, losing the EOF root cause in ops logs. Re-encode as the
   Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`.
   (Addresses the inline review comment from copilot-pull-request-reviewer
   on Wei-Shaw/sub2api#2066.)

2. The streaming `event: error` SSE frame for `response_too_large`,
   `stream_read_error`, and `stream_timeout` was non-standard
   (`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect
   `{"type":"error","error":{"type":"...","message":"..."}}` and parse
   `error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to
   take both reason and message, and emit the standard frame so client
   SDKs surface a real diagnostic message instead of a generic stream error.

This does not by itself prevent task interruption on long-stream EOF
(SSE has no resume; client-side retry remains the only complete fix), but
it gives both server-side ops logs and client-side error UIs a meaningful
upstream message so users know the next step is to retry.

Tests updated to assert the new body shape on both branches plus a new
assertion that `ExtractUpstreamErrorMessage` returns a non-empty string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 20:24:17 +08:00
alfadb
6327573534 fix(gateway): wrap Anthropic stream EOF as failover error before client output
Anthropic streaming path (gateway_service.go) returned a plain error on
upstream SSE read failure, so the handler-level UpstreamFailoverError check
never fired and the client received a bare `stream_read_error` event,
breaking long-running tasks even when no bytes had been written yet.

The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge
backends doing graceful rotation: Go's http.Transport surfaces this as
`unexpected EOF` and never auto-retries.

Mirror what the OpenAI and antigravity gateways already do: when the read
error happens before any byte has reached the client (`!c.Writer.Written()`),
return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}`
so the handler can retry on the same or another account. After client
output has begun, SSE has no resume protocol — keep the existing passthrough
behavior.

Tests cover both branches via streamReadCloser-based fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 19:12:48 +08:00
Oliver
6d11f9ed77 Add Vertex service account support 2026-04-25 20:39:58 -04:00
shaw
496469ac4e fix(gateway): skip body mimicry for real Claude Code clients to restore prompt caching
PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
accounts, including real Claude Code CLI clients. This replaced the
client's long system prompt (~10K+ tokens with stable cache_control
breakpoints) with a short ~45 token [billing, CC prompt] pair, which
falls below Anthropic's 1024-token minimum cacheable prefix threshold.
The result: every request created a new cache but never hit an existing
one.

Fix: restore the Claude Code client detection gate so that real CC
clients bypass body-level mimicry (system rewrite, message cache
management, tool name obfuscation). Non-CC third-party clients
(opencode, etc.) continue to receive full mimicry.

Also harden the detection logic:
- Make UA regex case-insensitive (align with claude_code_validator.go)
- Validate metadata.user_id format via ParseMetadataUserID() instead of
  just checking non-empty, preventing third-party tools from spoofing
  a claude-cli/* UA with an arbitrary user_id string to bypass mimicry
2026-04-25 22:50:35 +08:00
hungryboy1025
8987e0ba67 fix(openai): tighten responses stream account tests 2026-04-25 16:56:50 +08:00
shaw
732d6495ea chore(gateway): fix lint issues from cc-mimicry-parity merge
- staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
  passthrough guard (`!(a && b)` → `a != ... || !b`).
- unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
  switched both `/v1/messages` and `/count_tokens` paths to unconditional
  `account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
  is kept (still referenced by `TestIsClaudeCodeClient`).
2026-04-25 08:58:57 +08:00
keh4l
bdbd2916f5 fix(gateway): skip client header passthrough on OAuth mimicry path
Root cause of persistent third-party detection: sub2api's
buildUpstreamRequest transparently forwards client headers via
allowedHeaders whitelist (addHeaderRaw) before applying mimicry
overrides. When third-party clients (opencode, etc.) send their own
anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
values, these get appended to the request alongside our injected
headers, creating an inconsistent header set that Anthropic detects.

Parrot's build_upstream_headers constructs exactly 9 headers from
scratch and never forwards anything from the client. This is why
'same opencode version, some users work some don't' — different
opencode configs/versions send different header combinations.

Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
client header passthrough loop entirely. The subsequent
applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
pipeline constructs all necessary headers from our controlled values.

Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
now unconditionally rewrite system (even if client already sent a
Claude Code-style prompt), ensuring billing attribution block is
always present.
2026-04-25 00:43:38 +08:00
keh4l
6dc89765fd fix(gateway): always apply full mimicry for OAuth accounts regardless of client identity
Before: isClaudeCodeRequest() checked whether the client looks like a
real Claude Code CLI (UA, system prompt, X-App header, metadata format).
If it looked like Claude Code, all mimicry was skipped — the assumption
being that a real CLI needs no help.

Problem: third-party tools like opencode partially impersonate Claude
Code (sending claude-cli UA + claude-code beta + CC system prompt) but
miss critical details (billing attribution block, tool-name obfuscation,
cache breakpoints, full beta set). Some users' opencode instances pass
the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
while Anthropic still detects the request as third-party.

This explains why 'same opencode version, some users work, some don't'
— it depends on which opencode features/config trigger the validator.

Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
matching Parrot's behavior (Parrot never checks client identity).
This is safe because our mimicry is strictly more complete than any
third-party client's partial impersonation.

Changed:
  - /v1/messages path: remove isClaudeCode gate
  - /v1/messages/count_tokens path: same
2026-04-25 00:26:37 +08:00
keh4l
f3233db01f fix(gateway): apply D/E/F mimicry to native /v1/messages and count_tokens paths
The previous commit only wired stripMessageCacheControl,
addMessageCacheBreakpoints, and tool-name obfuscation into
applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
/responses). The native /v1/messages path and count_tokens path
have their own independent mimicry code blocks and were missed.

Now all three entry points share the same D/E/F pipeline:
  - /v1/messages (gateway_service.go forwardAnthropic)
  - /v1/messages/count_tokens (gateway_service.go countTokens)
  - OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
2026-04-24 23:16:32 +08:00
keh4l
6e12578bc5 feat(gateway): port Parrot tool-name obfuscation + message cache breakpoints
Implements the remaining three parity items with Parrot cc_mimicry:

  D) Tool-name obfuscation
     - Dynamic mapping when tools.length > 5 (matches Parrot threshold).
       Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
       Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
       math/rand; byte-exact reproduction is impossible (Python hash vs
       Go hash), but the two invariants that matter are preserved:
         * same input tool_names yield identical mapping (cache hit)
         * prefix pool is shuffled (names look distributed)
     - Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
       applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
     - Server tools (web_search_20250305, computer_*, etc.) are NOT
       renamed; only type=='function' and type=='custom' tools are.
     - tool_choice.name is rewritten in sync (only when type=='tool').
     - Response side: bytes-level replace on every SSE chunk / JSON
       body at 6 injection points (standard stream/non-stream,
       passthrough stream/non-stream, chat_completions stream +
       non-stream, responses stream + non-stream). Reverse mapping
       applied longest-fake-name-first to prevent substring conflicts
       (parity with Parrot _restore_tool_names_in_chunk).
     - tool_choice is no longer unconditionally deleted in
       normalizeClaudeOAuthRequestBody — Parrot passes it through.

  E) tools[-1] cache_control breakpoint
     - Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
       the last tool has no cache_control. Client-provided ttl is
       passed through unchanged (repo-wide policy).

  F) messages cache_control strategy
     - stripMessageCacheControl removes every client-provided
       messages[*].content[*].cache_control (multi-turn stability).
     - addMessageCacheBreakpoints then injects two stable breakpoints:
       (1) last message, and (2) second-to-last user turn when
       messages.length >= 4.
     - Combined with the system block breakpoint and tools[-1]
       breakpoint, this gives exactly the 4 breakpoints Anthropic
       allows per request.

Non-trivial implementation details to be aware of when rebasing:

  * Two new files, no upstream collision:
      gateway_tool_rewrite.go       (D + E algorithms)
      gateway_messages_cache.go     (F strip + breakpoints)
  * Two new feature calls bolted onto the tail of
    applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
    conflicts will be ~10 lines maximum.
  * Response-side injection points all wrap their existing write with
    reverseToolNamesIfPresent(c, ...), preserving original behavior
    when no mapping is stored (static prefix rollback still runs).
  * Non-stream chat/responses switched from c.JSON to
    json.Marshal + c.Data so bytes-level replace is possible.
  * Retry bodies (FilterThinkingBlocksForRetry,
    FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
    only prune blocks — they preserve the already-obfuscated tool
    names, so no extra mapping re-application is needed.

Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
in test logs; then removed the temp test file.

Tests (16 new):
  - buildDynamicToolMap stability + below-threshold guard
  - sanitizeToolName precedence (dynamic > static)
  - restoreToolNamesInBytes longest-first + static rollback
  - applyToolNameRewriteToBody skips server tools + syncs tool_choice
  - applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
  - stripMessageCacheControl + addMessageCacheBreakpoints in the
    1/4/string-content cases + second-to-last user turn selection
  - buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
  - fake name shape follows Parrot {prefix}{head3}{i:02d}
2026-04-24 23:16:32 +08:00
keh4l
a25faecadd feat(gateway): align body shape with real Claude Code CLI defaults
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:

  1. temperature: previously deleted unconditionally; now passes
     through client value, defaults to 1 when absent (real CLI
     always sends temperature, default 1).

  2. max_tokens: defaults to 128000 when absent (real CLI default).

  3. context_management: when thinking.type is enabled/adaptive
     and the client did not provide context_management, inject
     {"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
     to mirror real CLI behavior.

tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).

Tests updated:
  - gateway_body_order_test.go: temperature/max_tokens are now
    expected in output; tool_choice still removed.
  - gateway_prompt_test.go: system array is now 2 blocks
    (billing + cc prompt), assertions adjusted.
  - gateway_anthropic_apikey_passthrough_test.go: same 2-block
    assertion.
2026-04-24 23:16:32 +08:00
keh4l
5862e2d8d9 feat(gateway): add billing attribution block with cc_version fingerprint
Real Claude Code CLI always sends a 2-block system array:

  [0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
  [1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}

Before this commit, sub2api's mimicry path only produced block [1].
The missing billing block is one of the primary third-party detection
signals Anthropic uses for Claude-Code-scoped OAuth tokens.

New file gateway_billing_block.go ports the fingerprint algorithm
(byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
pick chars at positions [4,7,20] of the first user text, then
`sha256(SALT + chars + cc_version)[:3]`.

  - claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
  - gateway_billing_block.go: computeClaudeCodeFingerprint +
    buildBillingAttributionBlockJSON + extractFirstUserText
  - gateway_service.go: rewriteSystemForNonClaudeCode now emits both
    blocks in order; cch=00000 is filled in later by
    signBillingHeaderCCH in buildUpstreamRequest.

Downstream compat note: syncBillingHeaderVersion's regex
`cc_version=\d+\.\d+\.\d+` only matches the semver triple,
leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
2026-04-24 23:16:32 +08:00
keh4l
66d6454535 feat(claude): add ttl to cache_control with default 5m
Real Claude CLI traffic sends cache_control as
`{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
the official CLI and one more third-party detection signal.

Policy: client-provided ttl is always passed through unchanged.
Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
to avoid burning the 1h cache budget on automatic breakpoints while
still aligning with the `ttl` field being present.

  - claude/constants.go: DefaultCacheControlTTL = "5m"
  - apicompat/types.go: new AnthropicCacheControl type with TTL field;
    AnthropicTool gains optional CacheControl pointer so the mimicry
    path can attach a cache breakpoint to tools[-1] later.
  - service/gateway_service.go: anthropicCacheControlPayload gains TTL;
    marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
    emit ttl=5m by default.
2026-04-24 23:16:32 +08:00
keh4l
165553cfb0 fix(gateway): use full beta list in buildUpstreamRequest mimicry path
The previous commit added FullClaudeCodeMimicryBetas() but the two
call sites in buildUpstreamRequest still hardcoded the old 3-token
subset. Anthropic now checks the complete set of beta tokens to
decide if a request qualifies as Claude Code. Wire them up:

  - /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
  - /v1/messages/count_tokens mimic path: same + BetaTokenCounting

Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
2026-04-24 23:16:32 +08:00
keh4l
b5467d610a fix(gateway): apply full Claude Code mimicry on /chat/completions and /responses
Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
which prepends the Claude Code banner but leaves the rest of the body
in its original non-Claude-Code shape. The codebase already admits this
is insufficient (see the comment on rewriteSystemForNonClaudeCode in
gateway_service.go: "仅前置追加 Claude Code 提示词无法通过检测").

Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
were detected as third-party apps and bled plan quota with:

    Third-party apps now draw from your extra usage, not your plan limits.

Fix:
  - apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
    survives the OpenAI->Anthropic->Marshal round trip; without it the
    downstream rewrite has no user_id to work with.
  - service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
    variant of the /v1/messages mimicry pipeline
    (rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
    metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
  - service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
    for the same reason (no ParsedRequest at the call site).
  - ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
    prompt-prepend with the full mimicry pipeline.
  - applyClaudeCodeMimicHeaders: set x-client-request-id per-request
    (real Claude CLI always does); missing/duplicated values are one more
    third-party fingerprint signal.

No change to the native /v1/messages path: it already called the full
pipeline, we only lift those helpers into a reusable function.

Tests:
  - go build ./... passes
  - go test ./internal/service/... ./internal/pkg/apicompat/... passes
  - lsp_diagnostics clean on all touched files
  - pre-existing failures in internal/config are unrelated (env-sensitive
    tests that also fail on upstream main)
2026-04-24 23:16:32 +08:00
erio
258fd145ff fix(account): prevent quota-exceeded API key/Bedrock accounts from being scheduled
Add quota exceeded check to IsSchedulable() and refactor
shouldClearStickySession to delegate to IsSchedulable(), eliminating
duplicated logic and fixing missed overload/rate-limit/expired checks.
Frontend displays quota exceeded status independently via quota fields.
2026-04-19 18:45:04 +08:00
erio
44cdef7934 fix(usage): subscription billing honours group rate multiplier
Subscription-mode billing was consuming quota at TotalCost (raw) instead of
ActualCost (TotalCost * RateMultiplier), so per-group rate multipliers —
including free subscriptions (multiplier = 0) — were silently ignored.
Switch the three subscription cost writes in buildUsageBillingCommand,
finalizePostUsageBilling, and the legacy postUsageBilling fallback to
ActualCost, and add a table-driven test covering 2x / 0.5x / free multipliers
plus a balance-mode regression check.
2026-04-17 22:06:32 +08:00
erio
10699eeb34 refactor: extract ReadUpstreamResponseBody to deduplicate upstream response read + too-large error handling
Consolidates 9 call sites of resolveUpstreamResponseReadLimit + readUpstreamResponseBodyLimited + ErrUpstreamResponseBodyTooLarge error handling into a single ReadUpstreamResponseBody function with TooLargeWriter callback for API-format-specific error responses (Anthropic, OpenAI, countTokens).
2026-04-16 01:53:22 +08:00
erio
a9880ee7b9 fix: round-2 audit fixes — security, code quality, and UI improvements
Security (HIGH):
- Normalize all Redis cache keys to lowercase (verifyCode, passwordReset)
- Fix verify code TTL renewal on failed attempts: use remaining TTL via
  ExpiresAt field instead of resetting to full 15-minute window
- Add 3 missing fields to diffSettings audit log (promo_code, invitation_code,
  custom_endpoints)

Code quality (MEDIUM):
- Extract filterVerifiedEmails shared helper (balance_notify_service.go)
- Add Pricing array non-empty validation for channel pricing rules
- Add platform token semantics comment in gateway_service.go
- Complete validatePlanPatch test coverage (+10 test cases)
- Replace string types with QuotaThresholdType/QuotaResetMode across frontend
- Remove duplicate getPlatformTextColor/getRateBadgeClass in ChannelsView
- Return EMAIL_NOT_FOUND error on RemoveNotifyEmail miss

UI improvements:
- Reorder cost tooltip: user billing above separator, account billing below
- Add NaN guard to accountBilled function
- Move timezone selector inline into reset-mode row (no longer standalone)
2026-04-14 09:35:05 +08:00
erio
98c9d51791 fix: correct account stats pricing priority order
Priority was wrong:
- Before: custom rules → LiteLLM (when ApplyPricingToAccountStats) → nil
- After:  custom rules → totalCost (when ApplyPricingToAccountStats) → LiteLLM → nil

When ApplyPricingToAccountStats is enabled, use the request's actual
client billing cost (before multiplier) as account_stats_cost, instead
of recalculating from LiteLLM per-token prices which produced incorrect
values for per-request billing mode.

LiteLLM model pricing is now the final fallback (priority 3), used only
when neither custom rules nor ApplyPricingToAccountStats apply.
2026-04-14 09:28:11 +08:00
erio
1262654d97 feat: WebSearch tri-state, account stats pricing fix, quota cache fix, usage tooltip
WebSearch tri-state switch:
- Account-level web_search_emulation changed from bool to tri-state
  string: "default" (follow channel) / "enabled" / "disabled"
- shouldEmulateWebSearch checks channel config when account is "default"
- SQL migration converts old bool values
- Frontend select replaces toggle in Edit/CreateAccountModal

Account stats pricing:
- resolveAccountStatsCost uses upstream model (post-mapping) for matching
- Priority: custom rules → model pricing file (when toggle on) → default
- Custom rules always configurable, independent of toggle
- Account ID field changed to searchable selector filtered by platform
- Description updated to reflect new behavior

Quota notification cache fix:
- CheckAccountQuotaAfterIncrement fetches real-time account from DB
- Reconstructs pre-increment usage for accurate threshold crossing detection
- New AccountQuotaReader interface (minimal: GetByID only)

Usage tooltip:
- Per-request/image billing shows per-request price instead of $0 token price
- Token billing continues to show input/output price per million tokens
2026-04-14 09:26:08 +08:00
erio
11c4606874 fix(channel): use upstream model for account stats pricing and remove channel pricing fallback
- resolveAccountStatsCost now uses the final upstream model (after
  account-level mapping) to match custom pricing rules, fixing the
  issue where requested model (e.g. claude-sonnet-4-5) didn't match
  rules configured for upstream model (e.g. claude-opus-4-6)
- Remove tryChannelPricing fallback — only custom rules are applied,
  unmatched requests use default formula (total_cost × rate)
- Remove unused billingService and serviceTier parameters
- Update description: "启用后将支持自定义账号统计的模型价格"
2026-04-14 09:26:08 +08:00
erio
31550a2c6a fix(notify): use real-time balance for crossing detection and simplify email logic
- Fix cached balance causing threshold crossing to never trigger:
  read real-time balance from billingCacheService instead of stale
  API key auth snapshot
- Remove email="" placeholder concept; all emails are user-managed
- Only send notifications to verified && non-disabled emails
- Frontend: pre-fill user's email in add input when list is empty
- Remove FilterEnabledEmails/IsPrimaryDisabled helpers (no longer needed)
2026-04-14 09:26:07 +08:00
erio
c3812ce1e3 fix(notify): address review findings - accountCost formula, dedup, refactor
- Fix accountCost calculation in finalizePostUsageBilling to match
  postUsageBilling (always multiply by AccountRateMultiplier)
- Use strings.EqualFold for email dedup in collectBalanceNotifyRecipients
- Extract CheckAccountQuotaAfterIncrement into smaller functions:
  buildQuotaDims + asyncSendQuotaAlert (< 30 lines each)
- Add "not splittable" comments for HTML template functions
- Extract QuotaNotifyToggle.vue sub-component to reduce
  QuotaLimitCard.vue from 404 to 339 lines
2026-04-14 09:23:16 +08:00
erio
b32d1a2c9f feat(notify): add balance low & account quota notification system
- User balance low notification: email alert when balance drops below
  configurable threshold (user email + verified extra emails)
- Account quota notification: broadcast email to admin-configured
  recipients when daily/weekly/total quota usage exceeds alert threshold
- Admin settings: global enable/disable, default threshold, quota
  notification email list (Email Settings tab)
- User profile: enable/disable, custom threshold, add/remove extra
  notification emails with verification code flow
- Account quota: per-dimension alert toggle and threshold in quota
  control card
- Trigger logic: first-crossing only (old >= threshold && new < threshold
  for balance; old < threshold && new >= threshold for quota), naturally
  prevents duplicate notifications without Redis dedup
2026-04-14 09:23:02 +08:00
erio
7535e312e0 feat(channels): add custom account stats pricing rules
Allow channels to configure independent model pricing for account
statistics cost calculation, decoupled from user billing.

Backend:
- Migration 101: channels.apply_pricing_to_account_stats toggle,
  channel_account_stats_pricing_rules/model_pricing tables,
  usage_logs.account_stats_cost column
- resolveAccountStatsCost: match rules by group/account, then channel
  pricing, fallback to original formula when unconfigured
- Integrate into both GatewayService.recordUsageCore and
  OpenAIGatewayService.RecordUsage
- Update 8 account stats SQL queries to use
  COALESCE(account_stats_cost, total_cost) * account_rate_multiplier
- 23 unit tests for matching, pricing lookup, and cost calculation

Frontend:
- Channel edit dialog: toggle + custom rules UI with group/account
  multi-select and pricing entry cards
- API types and i18n (zh/en)
2026-04-14 09:22:12 +08:00