Conflicts resolved (preserving fork customizations):
- config.go: keep NodeTLSProxy + add upstream OpenAIHTTP2
- gateway_service.go: NewGatewayService now takes both rpmTokenBucketSvc
(local) and userPlatformQuotaRepo (upstream)
- wire_gen.go: wire both new args into the call site
- http_upstream.go: drop redundant settings re-assignment; keep proxy
URL log redaction
- http_upstream_test.go: adopt upstream's explicit-0-disables semantics;
keep 600s default constant in nil-cfg fallback test
- user_handler_test.go / gateway_record_usage_test.go: pick up new
userPlatformQuotaRepo nil parameter
Also updated test stubs (windsurf_google_login_test.go,
windsurf_tier_access_service_test.go, gateway_models_test.go) for new
SetModelRateLimit variadic signature and the extra NewGatewayService arg.
Upstream highlights: OpenAI embeddings gateway, user x platform USD
quota, content-moderation risk thresholds, OAuth 401 credentials
no-overwrite fix, HTTP/2 OpenAI upstream config, pool retry status code
configurability, long-context cache pricing multipliers.
Case B: when a slot wait flushes SSE ping comments first (Writer.Written
becomes true), the previous ensureForwardErrorResponse short-circuited
on `c.Writer.Written()` and returned false without notifying the client.
Subsequent upstream errors (http2 timeout, stream INTERNAL_ERROR, etc.)
produced silent EOF; Codex CLI reported "stream closed before
response.completed" just like the user-slot timeout case.
Remove the Written() early return; coerce streamStarted to true when
Writer has already been written to, and let handleStreamingAwareError
walk the existing logic — which now (thanks to the previous commits)
emits a protocol-compliant response.failed for /responses paths and the
legacy `event: error` for others.
Update tests that previously asserted "do not override written response":
the new contract is to *append* an SSE terminal frame so the client sees
a clean close instead of EOF. recoverResponsesPanic inherits this fix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first revision compared GetInboundEndpoint(c) against EndpointResponses
("/v1/responses"). NormalizeInboundEndpoint only recognizes paths that
contain the literal "/v1/responses" substring, but the project actually
registers six /responses routes — three of which (top-level
r.POST("/responses", ...) and codexDirect's "/backend-api/codex/responses")
have FullPath values without the "/v1" prefix and therefore fall through
to the default branch.
Codex CLI users targeting the bare /responses route at the production
deployment (observed 2026-05-24 ~11:05 UTC, user 16) never reached the
new writeResponsesFailedSSE path: the endpoint check was false, the
legacy `event: error` frame fired, and the strict SDK kept reporting
"stream closed before response.completed".
Replace the strict equality check with inboundIsResponses(c), which
uses suffix detection on FullPath (falling back to URL.Path when
FullPath is empty in test fixtures) and covers all six route variants:
/v1/responses[/...]
/responses[/...]
/backend-api/codex/responses[/...]
Add test table covering all routes plus negative cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When /v1/responses streaming hits the user/account concurrency wait, the
wait loop sends SSE ping comments to keep the connection alive, which
flushes HTTP 200 + headers. If the wait then times out (or any other
post-flush error fires), handleStreamingAwareError previously emitted a
generic `event: error` frame. Codex CLI requires the stream to end with
a Responses terminal event (response.completed/failed/incomplete/cancelled),
so it reports "stream closed before response.completed" and the user-facing
rate-limit intent is lost.
This change detects inbound = /v1/responses in both handleStreamingAwareError
implementations and emits a protocol-compliant response.failed event whose
field set mirrors apicompat.makeResponsesCompletedEvent
(id/object/model/status/output/error). The synthetic id reuses
ctxkey.RequestID so client errors can be grepped against server logs.
sequence_number is intentionally omitted to preserve monotonicity on streams
that already emitted real events.
Other inbound endpoints (/v1/chat/completions, /v1/messages) keep their
legacy formats untouched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Export ApplyBedrockCCCompat() in GatewayService, called after channel
model mapping to ensure mapped model ID is used for Opus 4.7+ detection
- Add sanitizeBedrockCCFields(): remove service_tier/interface_geo/
context_management, inject max_tokens/anthropic_version defaults
- Add sanitizeBedrockCCBetaTokens(): filter anthropic_beta to keep only
Bedrock-supported tokens, reusing autoInjectBedrockBetaTokens and
filterBedrockBetaTokens for consistent rules
- Remove unsupported beta tokens (interleaved-thinking, context-management)
from whitelist based on AWS official docs
- Simplify IsBedrockCCCompatEnabled() to check boolean toggle directly,
applying CC compat to all accounts regardless of platform
- Add unit tests for IsBedrockCCCompatEnabled (8 cases),
sanitizeBedrockCCFields (8 cases), sanitizeBedrockCCBetaTokens (7 cases)
- Update bedrock beta policy tests for removed auto-injection
Adds GET /api/v1/admin/ops/ws/requests — a fan-out WebSocket that pushes
per-request metadata (method, path, model, account_id, status, latency_ms)
to all connected admin clients the moment each gateway dispatch completes.
- service/request_event_bus.go: lock-free pub/sub with non-blocking drop
when per-subscriber buffer (64 slots) is full; nil-safe Publish
- service/request_event_bus_test.go: 6 tests (basic, fanout, drop, nil, close)
- GatewayHandler: records reqStartTime at entry; defer emits RequestEvent on
every return; sets status success/error/rate_limited in both Gemini and
Anthropic dispatch paths
- OpsHandler: accepts *RequestEventBus; wires it to RequestStreamWSHandler
- ops_ws_requests_handler.go: subscribes to bus, pushes JSON per event,
reuses existing upgrader/conn-limit/ping-pong infrastructure
- Route: ws.GET("/requests", ...) alongside existing /ws/qps
- wire_gen.go: requestEventBus shared between OpsHandler and GatewayHandler
- When tools contain both web_search and function declarations, use
requestType=agent instead of web_search (Google web_search route
rejects functionDeclarations)
- Set toolConfig.mode=AUTO when mixed tools detected (VALIDATED is
incompatible with googleSearch + functionDeclarations)
- Add hasOnlyWebSearchTools helper
- Fix buildParts test calls missing 4th arg (stripSignatures)
- Add int64(0) param to SelectAccountWithLoadAwareness callers (signature change from channel scheduling refactor)
- Add UsageMapHook type and struct field to StreamingProcessor
- Revert Claude Max cache billing code to upstream/main (not part of channel feature)
- Revert credits overages logic to upstream/main (non-channel change)
- Remove Instructions field reference (non-channel OpenAI feature)
- Restore sora_client_handler_test.go from upstream + add channel service nil params
Move the model pricing restriction check from 8 handler entry points
to the account scheduling phase (SelectAccountForModelWithExclusions /
SelectAccountWithLoadAwareness), aligning restriction with billing:
- requested: check original request model against pricing list
- channel_mapped: check channel-mapped model against pricing list
- upstream: per-account check using account-mapped model
Handler layer now only resolves channel mapping (no restriction).
Scheduling layer performs pre-check for requested/channel_mapped,
and per-account filtering for upstream billing source.
The forward_failed error log only included account_id, making it
difficult to identify which account and proxy caused the failure
without querying the database. Add account_name, account_platform,
and proxy details (id, name, host, port) to the log fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add maximum Claude Code version limit to complement the existing minimum
version check. Refactor the version cache from single-value to unified
bounds struct (min+max) with a single atomic.Value and singleflight group.
- Backend: new constant, struct field, cache refactor, validation (semver
format + cross-validation max >= min), gateway enforcement, audit diff
- Frontend: settings UI input, TypeScript types, zh/en i18n
- Add CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to all Claude Code
tutorials on /keys page (unix/cmd/powershell/vscode settings.json)