Closes#1957
The OAuth path forwards client requests to chatgpt.com/backend-api/codex/responses,
where applyCodexOAuthTransform forces store=false (chatgpt.com's codex backend
rejects store=true). Reasoning items emitted under store=false are NEVER
persisted upstream, so any rs_* reference that a client carries forward in a
subsequent input[] array triggers a guaranteed upstream 404:
Item with id 'rs_...' not found. Items are not persisted when `store` is
set to false. Try again with `store` set to true, or remove this item
from your input.
sub2api wraps this as 502 "Upstream request failed" and the conversation
breaks on every multi-turn /v1/responses request that uses reasoning + tools
(reproducible with gpt-5.5; gpt-5.4 happens to dodge it because the upstream
does not emit reasoning items for that model).
Affected clients include any that follow the OpenAI Responses API spec and
replay prior assistant items verbatim — in practice this hit OpenClaw and
similar agent harnesses on every turn ≥2 with tool use.
The fix: in filterCodexInput, drop input items with type == "reasoning"
entirely. The model never reads reasoning summary text from input (only
encrypted_content can carry reasoning context across turns, and chatgpt.com
under store=false does not emit it), so this is a no-op for the model itself
and a clean removal of unreachable upstream lookups.
Scope is intentionally narrow:
* Only OAuth account requests (account.Type == AccountTypeOAuth) reach
applyCodexOAuthTransform / filterCodexInput.
* API-key accounts going to api.openai.com/v1/responses are unaffected
(store=true works there, rs_* persists, multi-turn already works).
* Anthropic / Gemini platform groups go through different transforms and
are unaffected.
* /v1/chat/completions is unaffected (no reasoning items).
* item_reference items (different type) are unaffected — only type ==
"reasoning" is dropped.
Verification:
* Existing tests pass: go test ./internal/service/ -run Codex|Tool|OAuth
* New regression test asserts reasoning items are dropped under both
preserveReferences=true and preserveReferences=false.
* End-to-end repro on gpt-5.5 multi-turn + tools: pre-patch 502, post-patch
200. Repro on gpt-5.4 unchanged. Three-turn deep loop on gpt-5.5 passes.
Two follow-ups to PR #2066's failover-wrap fix:
1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded
as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage`
probes for `error.message`, `detail`, or top-level `message` only — so
`handleFailoverExhausted` and downstream passthrough rules saw an empty
message, losing the EOF root cause in ops logs. Re-encode as the
Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`.
(Addresses the inline review comment from copilot-pull-request-reviewer
on Wei-Shaw/sub2api#2066.)
2. The streaming `event: error` SSE frame for `response_too_large`,
`stream_read_error`, and `stream_timeout` was non-standard
(`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect
`{"type":"error","error":{"type":"...","message":"..."}}` and parse
`error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to
take both reason and message, and emit the standard frame so client
SDKs surface a real diagnostic message instead of a generic stream error.
This does not by itself prevent task interruption on long-stream EOF
(SSE has no resume; client-side retry remains the only complete fix), but
it gives both server-side ops logs and client-side error UIs a meaningful
upstream message so users know the next step is to retry.
Tests updated to assert the new body shape on both branches plus a new
assertion that `ExtractUpstreamErrorMessage` returns a non-empty string.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Anthropic streaming path (gateway_service.go) returned a plain error on
upstream SSE read failure, so the handler-level UpstreamFailoverError check
never fired and the client received a bare `stream_read_error` event,
breaking long-running tasks even when no bytes had been written yet.
The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge
backends doing graceful rotation: Go's http.Transport surfaces this as
`unexpected EOF` and never auto-retries.
Mirror what the OpenAI and antigravity gateways already do: when the read
error happens before any byte has reached the client (`!c.Writer.Written()`),
return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}`
so the handler can retry on the same or another account. After client
output has begun, SSE has no resume protocol — keep the existing passthrough
behavior.
Tests cover both branches via streamReadCloser-based fixtures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex CLI 0.125+ defaults to sending request bodies with
Content-Encoding: zstd. Without server-side decompression the gateway
returns 'Failed to parse request body' on /v1/responses (and any other
JSON endpoint) because gjson sees raw zstd bytes.
ReadRequestBodyWithPrealloc now inspects Content-Encoding and
transparently decodes zstd, gzip/x-gzip, and deflate bodies before
returning them, then strips the encoding headers and updates
ContentLength so downstream code can reuse the bytes safely.
Unsupported encodings produce a clear error.
Adds unit tests covering identity, zstd, gzip, deflate, unsupported
encoding, corrupt zstd payloads, nil bodies, and explicit identity.
PR #1914 unconditionally applied the full mimicry pipeline to all OAuth
accounts, including real Claude Code CLI clients. This replaced the
client's long system prompt (~10K+ tokens with stable cache_control
breakpoints) with a short ~45 token [billing, CC prompt] pair, which
falls below Anthropic's 1024-token minimum cacheable prefix threshold.
The result: every request created a new cache but never hit an existing
one.
Fix: restore the Claude Code client detection gate so that real CC
clients bypass body-level mimicry (system rewrite, message cache
management, tool name obfuscation). Non-CC third-party clients
(opencode, etc.) continue to receive full mimicry.
Also harden the detection logic:
- Make UA regex case-insensitive (align with claude_code_validator.go)
- Validate metadata.user_id format via ParseMetadataUserID() instead of
just checking non-empty, preventing third-party tools from spoofing
a claude-cli/* UA with an arbitrary user_id string to bypass mimicry
- gofmt: realign AffiliateDetail struct tags in affiliate_service.go
- ineffassign: remove dead seenCompleted assignment before return in account_test_service.go
The hardcoded codex CLI version (0.104.0) causes upstream rejection
when using gpt-5.5 with compact, as the server treats the request
as an outdated client and returns 400/502.
Update codexCLIVersion, codexCLIUserAgent, and openAICodexProbeVersion
to 0.125.0 to match the current Codex CLI release.
Fixes#1933, #1887, #1865
Related: #1609, #1298, #849
- staticcheck QF1001: apply De Morgan's law to the OAuth-mimic header
passthrough guard (`!(a && b)` → `a != ... || !b`).
- unused: drop `isClaudeCodeRequest`, which became dead after PR #1914
switched both `/v1/messages` and `/count_tokens` paths to unconditional
`account.IsOAuth()` mimicry. The lowercase helper `isClaudeCodeClient`
is kept (still referenced by `TestIsClaudeCodeClient`).
- Drop SetAffiliateService setters and ProvideAuthService /
ProvidePaymentService / ProvideUserHandler wrappers in favor of direct
Wire constructor injection. AffiliateService has no back-edge to
Auth/Payment/User, so the indirection was never required.
- Change RegisterWithVerification's variadic affiliateCode to a fixed
parameter; adjust all call sites.
- Validate aff_code length and charset in BindInviterByCode before any
DB lookup, eliminating timing-side-channel and useless DB roundtrips
on malformed input.
- Make affiliate cache invalidation synchronous; surface Redis errors
via the project logger instead of swallowing them in a detached
goroutine.
- Add an integration test guarding cross-layer tx propagation in
AccrueQuota and a unit test pinning the aff_code format rules.
Root cause of persistent third-party detection: sub2api's
buildUpstreamRequest transparently forwards client headers via
allowedHeaders whitelist (addHeaderRaw) before applying mimicry
overrides. When third-party clients (opencode, etc.) send their own
anthropic-beta / user-agent / x-stainless-* / x-claude-code-session-id
values, these get appended to the request alongside our injected
headers, creating an inconsistent header set that Anthropic detects.
Parrot's build_upstream_headers constructs exactly 9 headers from
scratch and never forwards anything from the client. This is why
'same opencode version, some users work some don't' — different
opencode configs/versions send different header combinations.
Fix: when tokenType=oauth and mimicClaudeCode=true, skip the
client header passthrough loop entirely. The subsequent
applyClaudeCodeMimicHeaders + ApplyFingerprint + beta merge
pipeline constructs all necessary headers from our controlled values.
Also: remove systemIncludesClaudeCodePrompt gate — OAuth accounts
now unconditionally rewrite system (even if client already sent a
Claude Code-style prompt), ensuring billing attribution block is
always present.
Before: isClaudeCodeRequest() checked whether the client looks like a
real Claude Code CLI (UA, system prompt, X-App header, metadata format).
If it looked like Claude Code, all mimicry was skipped — the assumption
being that a real CLI needs no help.
Problem: third-party tools like opencode partially impersonate Claude
Code (sending claude-cli UA + claude-code beta + CC system prompt) but
miss critical details (billing attribution block, tool-name obfuscation,
cache breakpoints, full beta set). Some users' opencode instances pass
the isClaudeCodeRequest check, causing sub2api to skip mimicry entirely,
while Anthropic still detects the request as third-party.
This explains why 'same opencode version, some users work, some don't'
— it depends on which opencode features/config trigger the validator.
Fix: OAuth accounts now unconditionally run the full mimicry pipeline,
matching Parrot's behavior (Parrot never checks client identity).
This is safe because our mimicry is strictly more complete than any
third-party client's partial impersonation.
Changed:
- /v1/messages path: remove isClaudeCode gate
- /v1/messages/count_tokens path: same
The previous commit only wired stripMessageCacheControl,
addMessageCacheBreakpoints, and tool-name obfuscation into
applyClaudeCodeOAuthMimicryToBody (used by /chat/completions and
/responses). The native /v1/messages path and count_tokens path
have their own independent mimicry code blocks and were missed.
Now all three entry points share the same D/E/F pipeline:
- /v1/messages (gateway_service.go forwardAnthropic)
- /v1/messages/count_tokens (gateway_service.go countTokens)
- OpenAI compat (applyClaudeCodeOAuthMimicryToBody)
Implements the remaining three parity items with Parrot cc_mimicry:
D) Tool-name obfuscation
- Dynamic mapping when tools.length > 5 (matches Parrot threshold).
Fake names follow {prefix}{name[:3]}{i:02d} (e.g. 'manage_bas00').
Go port of random.Random(hash(tuple(names))) uses fnv64a seed +
math/rand; byte-exact reproduction is impossible (Python hash vs
Go hash), but the two invariants that matter are preserved:
* same input tool_names yield identical mapping (cache hit)
* prefix pool is shuffled (names look distributed)
- Static prefix map (sessions_ -> cc_sess_, session_ -> cc_ses_)
applied as fallback, matching Parrot TOOL_NAME_REWRITES verbatim.
- Server tools (web_search_20250305, computer_*, etc.) are NOT
renamed; only type=='function' and type=='custom' tools are.
- tool_choice.name is rewritten in sync (only when type=='tool').
- Response side: bytes-level replace on every SSE chunk / JSON
body at 6 injection points (standard stream/non-stream,
passthrough stream/non-stream, chat_completions stream +
non-stream, responses stream + non-stream). Reverse mapping
applied longest-fake-name-first to prevent substring conflicts
(parity with Parrot _restore_tool_names_in_chunk).
- tool_choice is no longer unconditionally deleted in
normalizeClaudeOAuthRequestBody — Parrot passes it through.
E) tools[-1] cache_control breakpoint
- Injected as {type:ephemeral, ttl:<DefaultCacheControlTTL>} when
the last tool has no cache_control. Client-provided ttl is
passed through unchanged (repo-wide policy).
F) messages cache_control strategy
- stripMessageCacheControl removes every client-provided
messages[*].content[*].cache_control (multi-turn stability).
- addMessageCacheBreakpoints then injects two stable breakpoints:
(1) last message, and (2) second-to-last user turn when
messages.length >= 4.
- Combined with the system block breakpoint and tools[-1]
breakpoint, this gives exactly the 4 breakpoints Anthropic
allows per request.
Non-trivial implementation details to be aware of when rebasing:
* Two new files, no upstream collision:
gateway_tool_rewrite.go (D + E algorithms)
gateway_messages_cache.go (F strip + breakpoints)
* Two new feature calls bolted onto the tail of
applyClaudeCodeOAuthMimicryToBody in gateway_service.go — rebase
conflicts will be ~10 lines maximum.
* Response-side injection points all wrap their existing write with
reverseToolNamesIfPresent(c, ...), preserving original behavior
when no mapping is stored (static prefix rollback still runs).
* Non-stream chat/responses switched from c.JSON to
json.Marshal + c.Data so bytes-level replace is possible.
* Retry bodies (FilterThinkingBlocksForRetry,
FilterSignatureSensitiveBlocksForRetry, RectifyThinkingBudget)
only prune blocks — they preserve the already-obfuscated tool
names, so no extra mapping re-application is needed.
Manual QA: end-to-end scenario verified with 6 tools (above threshold)
and tool_choice.type=='tool'. Obfuscation + restore roundtrip shown
in test logs; then removed the temp test file.
Tests (16 new):
- buildDynamicToolMap stability + below-threshold guard
- sanitizeToolName precedence (dynamic > static)
- restoreToolNamesInBytes longest-first + static rollback
- applyToolNameRewriteToBody skips server tools + syncs tool_choice
- applyToolsLastCacheBreakpoint defaults to 5m + passes client ttl
- stripMessageCacheControl + addMessageCacheBreakpoints in the
1/4/string-content cases + second-to-last user turn selection
- buildToolNameRewriteFromBody ReverseOrdered is desc-by-fake-length
- fake name shape follows Parrot {prefix}{head3}{i:02d}
Three field-level alignments in normalizeClaudeOAuthRequestBody to
match real Claude Code CLI traffic byte-for-byte:
1. temperature: previously deleted unconditionally; now passes
through client value, defaults to 1 when absent (real CLI
always sends temperature, default 1).
2. max_tokens: defaults to 128000 when absent (real CLI default).
3. context_management: when thinking.type is enabled/adaptive
and the client did not provide context_management, inject
{"edits":[{"type":"clear_thinking_20251015","keep":"all"}]}
to mirror real CLI behavior.
tool_choice removal is unchanged (Claude Code OAuth credentials
do not allow client-supplied tool_choice).
Tests updated:
- gateway_body_order_test.go: temperature/max_tokens are now
expected in output; tool_choice still removed.
- gateway_prompt_test.go: system array is now 2 blocks
(billing + cc prompt), assertions adjusted.
- gateway_anthropic_apikey_passthrough_test.go: same 2-block
assertion.
Real Claude Code CLI always sends a 2-block system array:
[0] {"type":"text", "text":"x-anthropic-billing-header: cc_version=X.Y.Z.{fp}; cc_entrypoint=cli; cch=00000;"}
[1] {"type":"text", "text":"You are Claude Code...", "cache_control":{...}}
Before this commit, sub2api's mimicry path only produced block [1].
The missing billing block is one of the primary third-party detection
signals Anthropic uses for Claude-Code-scoped OAuth tokens.
New file gateway_billing_block.go ports the fingerprint algorithm
(byte-for-byte from Parrot cc_mimicry.py:compute_fingerprint):
pick chars at positions [4,7,20] of the first user text, then
`sha256(SALT + chars + cc_version)[:3]`.
- claude/constants.go: CLICurrentVersion = "2.1.92" (must match UA)
- gateway_billing_block.go: computeClaudeCodeFingerprint +
buildBillingAttributionBlockJSON + extractFirstUserText
- gateway_service.go: rewriteSystemForNonClaudeCode now emits both
blocks in order; cch=00000 is filled in later by
signBillingHeaderCCH in buildUpstreamRequest.
Downstream compat note: syncBillingHeaderVersion's regex
`cc_version=\d+\.\d+\.\d+` only matches the semver triple,
leaving the `.{fp}` suffix intact when rewriting in buildUpstreamRequest.
Real Claude CLI traffic sends cache_control as
`{"type":"ephemeral","ttl":"1h"}`. Our previous payload only
sent `{"type":"ephemeral"}`, which is a bytewise mismatch with
the official CLI and one more third-party detection signal.
Policy: client-provided ttl is always passed through unchanged.
Proxy-generated cache_control blocks default to 5m (vs Parrot's 1h)
to avoid burning the 1h cache budget on automatic breakpoints while
still aligning with the `ttl` field being present.
- claude/constants.go: DefaultCacheControlTTL = "5m"
- apicompat/types.go: new AnthropicCacheControl type with TTL field;
AnthropicTool gains optional CacheControl pointer so the mimicry
path can attach a cache breakpoint to tools[-1] later.
- service/gateway_service.go: anthropicCacheControlPayload gains TTL;
marshalAnthropicSystemTextBlock and rewriteSystemForNonClaudeCode
emit ttl=5m by default.