Adds an opt-in tier-based fallback scheduling path for Anthropic accounts:
- accountTierLevel(): derives tier from account type without DB migration
(tier-0=OAuth/SetupToken, tier-1=APIKey, tier-2=Bedrock)
- enableTierFallbackChain(): new config flag
gateway.scheduling.enable_tier_fallback_chain (default false)
- selectAccountWithTierFallback(): loads all Anthropic accounts, groups by
tier, honors sticky sessions, applies all existing schedulability guards,
then tries tiers 0→1→2 in order via tryAcquireByLegacyOrder
- Wired into SelectAccountForModelWithExclusions: Anthropic platform +
tier fallback enabled → calls new path instead of mixed scheduling
- Fix pre-existing unit-test build break: NewGatewayService now requires
*RPMTokenBucketService (added in Task #5); add missing nil param
- 7 tests: tier mapping, config toggle, subscription preference,
APIKey fallback, exclusion handling, empty-pool error, Bedrock last resort
- Add GetQuotaRemainingFraction() to Account: returns [0,1] fraction of
remaining quota; 1.0 when no limit is configured (unlimited accounts)
- Add Quota float64 weight field to GatewayOpenAIWSSchedulerScoreWeights
and EnableP2CScheduling bool to GatewayOpenAIWSConfig (both default off)
- Extend selectByLoadBalance scoring with quota factor (gated by Quota>0)
- Add selectByPowerOfTwo(): O(1) P2C selection — samples 2 random candidates,
tries the better-scored one first then the other, falls back to wait plan;
activated when EnableP2CScheduling=true
- Add openAIWSP2CEnabled() helper on OpenAIGatewayService
- Add 6 tests covering quota fraction edge cases, P2C toggle, weight defaults,
single-candidate P2C, two-candidate P2C selection, and quota score ordering
Adds GET /api/v1/admin/ops/ws/requests — a fan-out WebSocket that pushes
per-request metadata (method, path, model, account_id, status, latency_ms)
to all connected admin clients the moment each gateway dispatch completes.
- service/request_event_bus.go: lock-free pub/sub with non-blocking drop
when per-subscriber buffer (64 slots) is full; nil-safe Publish
- service/request_event_bus_test.go: 6 tests (basic, fanout, drop, nil, close)
- GatewayHandler: records reqStartTime at entry; defer emits RequestEvent on
every return; sets status success/error/rate_limited in both Gemini and
Anthropic dispatch paths
- OpsHandler: accepts *RequestEventBus; wires it to RequestStreamWSHandler
- ops_ws_requests_handler.go: subscribes to bus, pushes JSON per event,
reuses existing upgrader/conn-limit/ping-pong infrastructure
- Route: ws.GET("/requests", ...) alongside existing /ws/qps
- wire_gen.go: requestEventBus shared between OpsHandler and GatewayHandler
- New context_compressor.go: pure functions operating on raw JSON body
(gjson/sjson pattern). approxTokens uses chars/4 heuristic.
- compressMessages: removes oldest messages from front, treating
consecutive assistant(tool_use)+user(tool_result) pairs as atomic units
to prevent orphaned tool_result blocks.
- Hooked into Forward() after StripEmptyTextBlocks, gated on
account.Credentials[enable_context_compression].
- Config: gateway.context_compression.max_tokens (default 190000).
- 8 unit tests covering: approx tokens, no-op when under budget,
oldest-message trimming, tool pair preservation, atomic pair removal,
body passthrough, body trimming.
Two anti-thundering-herd improvements:
1. OAuthRefreshAPI.RefreshIfNeeded
Wrap the existing distributed-lock + DB-reread + executor.Refresh
pipeline in a per-process singleflight keyed by cacheKey+window.
Without this, N concurrent goroutines on the same account each pay
one Redis lock RTT and one DB reread; with it, only the leader pays
and the rest share the result.
The refreshWindow is part of the key so a long background-refresh
window cannot starve a short foreground-refresh window.
2. accountRepository.SetTempUnschedulable
Wrap the same path (UPDATE + scheduler outbox enqueue + scheduler
cache sync) in a per-process singleflight keyed by id+until+reason.
The SQL guard (existing < new) already makes the UPDATE idempotent,
but N callers still cost N round-trips and N outbox inserts. With
singleflight, an upstream 401 burst that hits the same account
collapses to one execution.
Tests cover dedup behavior, key separation by account / refresh window,
and that the SQL exec count drops from N to <=2 (UPDATE + outbox).
Add HealthService with Liveness (no-op) and Readiness (DB+Redis ping
with per-component timeout) checks. Expose three endpoints:
- /healthz : new liveness endpoint, zero-dependency, always 200
- /ready : new readiness endpoint, returns 503 with details on dep
failure; suitable for K8s readinessProbe and load balancers
- /health : preserved for backward compatibility, equivalent to
/healthz
Switch primary docker-compose healthcheck to /ready so the container
is only marked healthy once DB+Redis are reachable. Standalone/dev/
local compose files keep /health to avoid disrupting existing setups.
Tests: unit tests cover liveness, readiness with both deps healthy,
each dep failing independently, and per-component timeout enforcement.
- Centralize Claude CLI fingerprint constants (UA, x-stainless-*) in
pkg/claude with BuildCLI/CodeUserAgent helpers
- Reuse constants in DefaultHeaders, identity_service defaults, and
antigravity identity defaults to keep all callers in sync
- Extend ClaudeCodeValidator to accept both claude-cli/ and claude-code/
UA prefixes (transport/helper requests use the latter)
- Update related tests to cover the new UA prefix and version
- When tools contain both web_search and function declarations, use
requestType=agent instead of web_search (Google web_search route
rejects functionDeclarations)
- Set toolConfig.mode=AUTO when mixed tools detected (VALIDATED is
incompatible with googleSearch + functionDeclarations)
- Add hasOnlyWebSearchTools helper
- Fix buildParts test calls missing 4th arg (stripSignatures)
- gofmt: realign AffiliateDetail struct tags in affiliate_service.go
- ineffassign: remove dead seenCompleted assignment before return in account_test_service.go
The hardcoded codex CLI version (0.104.0) causes upstream rejection
when using gpt-5.5 with compact, as the server treats the request
as an outdated client and returns 400/502.
Update codexCLIVersion, codexCLIUserAgent, and openAICodexProbeVersion
to 0.125.0 to match the current Codex CLI release.
Fixes#1933, #1887, #1865
Related: #1609, #1298, #849
- staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
passthrough guard (`!(a && b)` → `a != ... || !b`).
- unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
switched both `/v1/messages` and `/count_tokens` paths to unconditional
`account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
is kept (still referenced by `TestIsClaudeCodeClient`).
- Drop SetAffiliateService setters and ProvideAuthService /
ProvidePaymentService / ProvideUserHandler wrappers in favor of direct
Wire constructor injection. AffiliateService has no back-edge to
Auth/Payment/User, so the indirection was never required.
- Change RegisterWithVerification's variadic affiliateCode to a fixed
parameter; adjust all call sites.
- Validate aff_code length and charset in BindInviterByCode before any
DB lookup, eliminating timing-side-channel and useless DB roundtrips
on malformed input.
- Make affiliate cache invalidation synchronous; surface Redis errors
via the project logger instead of swallowing them in a detached
goroutine.
- Add an integration test guarding cross-layer tx propagation in
AccrueQuota and a unit test pinning the aff_code format rules.
Root cause of persistent third-party detection: sub2api's
buildUpstreamRequest transparently forwards client headers via
allowedHeaders whitelist (addHeaderRaw) before applying mimicry
overrides. When third-party clients (opencode, etc.) send their own
anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
values, these get appended to the request alongside our injected
headers, creating an inconsistent header set that Anthropic detects.
Parrot's build_upstream_headers constructs exactly 9 headers from
scratch and never forwards anything from the client. This is why
'same opencode version, some users work some don't' — different
opencode configs/versions send different header combinations.
Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
client header passthrough loop entirely. The subsequent
applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
pipeline constructs all necessary headers from our controlled values.
Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
now unconditionally rewrite system (even if client already sent a
Claude Code-style prompt), ensuring billing attribution block is
always present.
Before: isClaudeCodeRequest() checked whether the client looks like a
real Claude Code CLI (UA, system prompt, X-App header, metadata format).
If it looked like Claude Code, all mimicry was skipped — the assumption
being that a real CLI needs no help.
Problem: third-party tools like opencode partially impersonate Claude
Code (sending claude-cli UA + claude-code beta + CC system prompt) but
miss critical details (billing attribution block, tool-name obfuscation,
cache breakpoints, full beta set). Some users' opencode instances pass
the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
while Anthropic still detects the request as third-party.
This explains why 'same opencode version, some users work, some don't'
— it depends on which opencode features/config trigger the validator.
Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
matching Parrot's behavior (Parrot never checks client identity).
This is safe because our mimicry is strictly more complete than any
third-party client's partial impersonation.
Changed:
- /v1/messages path: remove isClaudeCode gate
- /v1/messages/count_tokens path: same
The previous commit only wired stripMessageCacheControl,
addMessageCacheBreakpoints, and tool-name obfuscation into
applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
/responses). The native /v1/messages path and count_tokens path
have their own independent mimicry code blocks and were missed.
Now all three entry points share the same D/E/F pipeline:
- /v1/messages (gateway_service.go forwardAnthropic)
- /v1/messages/count_tokens (gateway_service.go countTokens)
- OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
Implements the remaining three parity items with Parrot cc_mimicry:
D) Tool-name obfuscation
- Dynamic mapping when tools.length > 5 (matches Parrot threshold).
Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
math/rand; byte-exact reproduction is impossible (Python hash vs
Go hash), but the two invariants that matter are preserved:
* same input tool_names yield identical mapping (cache hit)
* prefix pool is shuffled (names look distributed)
- Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
- Server tools (web_search_20250305, computer_*, etc.) are NOT
renamed; only type=='function' and type=='custom' tools are.
- tool_choice.name is rewritten in sync (only when type=='tool').
- Response side: bytes-level replace on every SSE chunk / JSON
body at 6 injection points (standard stream/non-stream,
passthrough stream/non-stream, chat_completions stream +
non-stream, responses stream + non-stream). Reverse mapping
applied longest-fake-name-first to prevent substring conflicts
(parity with Parrot _restore_tool_names_in_chunk).
- tool_choice is no longer unconditionally deleted in
normalizeClaudeOAuthRequestBody — Parrot passes it through.
E) tools[-1] cache_control breakpoint
- Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
the last tool has no cache_control. Client-provided ttl is
passed through unchanged (repo-wide policy).
F) messages cache_control strategy
- stripMessageCacheControl removes every client-provided
messages[*].content[*].cache_control (multi-turn stability).
- addMessageCacheBreakpoints then injects two stable breakpoints:
(1) last message, and (2) second-to-last user turn when
messages.length >= 4.
- Combined with the system block breakpoint and tools[-1]
breakpoint, this gives exactly the 4 breakpoints Anthropic
allows per request.
Non-trivial implementation details to be aware of when rebasing:
* Two new files, no upstream collision:
gateway_tool_rewrite.go (D + E algorithms)
gateway_messages_cache.go (F strip + breakpoints)
* Two new feature calls bolted onto the tail of
applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
conflicts will be ~10 lines maximum.
* Response-side injection points all wrap their existing write with
reverseToolNamesIfPresent(c, ...), preserving original behavior
when no mapping is stored (static prefix rollback still runs).
* Non-stream chat/responses switched from c.JSON to
json.Marshal + c.Data so bytes-level replace is possible.
* Retry bodies (FilterThinkingBlocksForRetry,
FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
only prune blocks — they preserve the already-obfuscated tool
names, so no extra mapping re-application is needed.
Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
in test logs; then removed the temp test file.
Tests (16 new):
- buildDynamicToolMap stability + below-threshold guard
- sanitizeToolName precedence (dynamic > static)
- restoreToolNamesInBytes longest-first + static rollback
- applyToolNameRewriteToBody skips server tools + syncs tool_choice
- applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
- stripMessageCacheControl + addMessageCacheBreakpoints in the
1/4/string-content cases + second-to-last user turn selection
- buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
- fake name shape follows Parrot {prefix}{head3}{i:02d}
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:
1. temperature: previously deleted unconditionally; now passes
through client value, defaults to 1 when absent (real CLI
always sends temperature, default 1).
2. max_tokens: defaults to 128000 when absent (real CLI default).
3. context_management: when thinking.type is enabled/adaptive
and the client did not provide context_management, inject
{"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
to mirror real CLI behavior.
tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).
Tests updated:
- gateway_body_order_test.go: temperature/max_tokens are now
expected in output; tool_choice still removed.
- gateway_prompt_test.go: system array is now 2 blocks
(billing + cc prompt), assertions adjusted.
- gateway_anthropic_apikey_passthrough_test.go: same 2-block
assertion.
Real Claude Code CLI always sends a 2-block system array:
[0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
[1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}
Before this commit, sub2api's mimicry path only produced block [1].
The missing billing block is one of the primary third-party detection
signals Anthropic uses for Claude-Code-scoped OAuth tokens.
New file gateway_billing_block.go ports the fingerprint algorithm
(byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
pick chars at positions [4,7,20] of the first user text, then
`sha256(SALT + chars + cc_version)[:3]`.
- claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
- gateway_billing_block.go: computeClaudeCodeFingerprint +
buildBillingAttributionBlockJSON + extractFirstUserText
- gateway_service.go: rewriteSystemForNonClaudeCode now emits both
blocks in order; cch=00000 is filled in later by
signBillingHeaderCCH in buildUpstreamRequest.
Downstream compat note: syncBillingHeaderVersion's regex
`cc_version=\d+\.\d+\.\d+` only matches the semver triple,
leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
Real Claude CLI traffic sends cache_control as
`{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
the official CLI and one more third-party detection signal.
Policy: client-provided ttl is always passed through unchanged.
Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
to avoid burning the 1h cache budget on automatic breakpoints while
still aligning with the `ttl` field being present.
- claude/constants.go: DefaultCacheControlTTL = "5m"
- apicompat/types.go: new AnthropicCacheControl type with TTL field;
AnthropicTool gains optional CacheControl pointer so the mimicry
path can attach a cache breakpoint to tools[-1] later.
- service/gateway_service.go: anthropicCacheControlPayload gains TTL;
marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
emit ttl=5m by default.
The previous commit added FullClaudeCodeMimicryBetas() but the two
call sites in buildUpstreamRequest still hardcoded the old 3-token
subset. Anthropic now checks the complete set of beta tokens to
decide if a request qualifies as Claude Code. Wire them up:
- /v1/messages mimic path: requiredBetas = FullClaudeCodeMimicryBetas()
- /v1/messages/count_tokens mimic path: same + BetaTokenCounting
Haiku models keep the 2-token exemption (BetaOAuth + InterleaveThinking).
Before: the OpenAI-compat forwarders only called injectClaudeCodePrompt,
which prepends the Claude Code banner but leaves the rest of the body
in its original non-Claude-Code shape. The codebase already admits this
is insufficient (see the comment on rewriteSystemForNonClaudeCode in
gateway_service.go: "仅前置追加 Claude Code 提示词无法通过检测").
Effect: OAuth accounts served through /v1/chat/completions or /v1/responses
were detected as third-party apps and bled plan quota with:
Third-party apps now draw from your extra usage, not your plan limits.
Fix:
- apicompat.AnthropicRequest: add Metadata json.RawMessage so metadata
survives the OpenAI->Anthropic->Marshal round trip; without it the
downstream rewrite has no user_id to work with.
- service: extract applyClaudeCodeOAuthMimicryToBody, a ParsedRequest-free
variant of the /v1/messages mimicry pipeline
(rewriteSystemForNonClaudeCode + normalizeClaudeOAuthRequestBody +
metadata.user_id injection) so the OpenAI-compat forwarders can reuse it.
- service: add buildOAuthMetadataUserIDFromBody + hashBodyForSessionSeed
for the same reason (no ParsedRequest at the call site).
- ForwardAsChatCompletions / ForwardAsResponses: replace the 3-line
prompt-prepend with the full mimicry pipeline.
- applyClaudeCodeMimicHeaders: set x-client-request-id per-request
(real Claude CLI always does); missing/duplicated values are one more
third-party fingerprint signal.
No change to the native /v1/messages path: it already called the full
pipeline, we only lift those helpers into a reusable function.
Tests:
- go build ./... passes
- go test ./internal/service/... ./internal/pkg/apicompat/... passes
- lsp_diagnostics clean on all touched files
- pre-existing failures in internal/config are unrelated (env-sensitive
tests that also fail on upstream main)
Align Claude Code mimicry constants with the latest real CLI traffic
(see Parrot's src/transform/cc_mimicry.py). Anthropic now uses the full
set of anthropic-beta tokens to decide whether a request counts as
"official Claude Code"; requests missing tokens that real CLI ships
today are demoted to third-party usage:
Third-party apps now draw from your extra usage, not your plan limits.
Changes:
- claude/constants.go: add new beta tokens (prompt-caching-scope,
effort, redact-thinking, context-management, extended-cache-ttl) and
expose FullClaudeCodeMimicryBetas() for the OAuth mimicry path.
- claude/constants.go: bump default User-Agent to claude-cli/2.1.92.
- identity_service.go: bump defaultFingerprint User-Agent accordingly.
No behavioral change for clients that already send a newer UA (fingerprint
merge still prefers the incoming value).