Case B: when a slot wait flushes SSE ping comments first (Writer.Written
becomes true), the previous ensureForwardErrorResponse short-circuited
on `c.Writer.Written()` and returned false without notifying the client.
Subsequent upstream errors (http2 timeout, stream INTERNAL_ERROR, etc.)
produced silent EOF; Codex CLI reported "stream closed before
response.completed" just like the user-slot timeout case.
Remove the Written() early return; coerce streamStarted to true when
Writer has already been written to, and let handleStreamingAwareError
walk the existing logic — which now (thanks to the previous commits)
emits a protocol-compliant response.failed for /responses paths and the
legacy `event: error` for others.
Update tests that previously asserted "do not override written response":
the new contract is to *append* an SSE terminal frame so the client sees
a clean close instead of EOF. recoverResponsesPanic inherits this fix.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first revision compared GetInboundEndpoint(c) against EndpointResponses
("/v1/responses"). NormalizeInboundEndpoint only recognizes paths that
contain the literal "/v1/responses" substring, but the project actually
registers six /responses routes — three of which (top-level
r.POST("/responses", ...) and codexDirect's "/backend-api/codex/responses")
have FullPath values without the "/v1" prefix and therefore fall through
to the default branch.
Codex CLI users targeting the bare /responses route at the production
deployment (observed 2026-05-24 ~11:05 UTC, user 16) never reached the
new writeResponsesFailedSSE path: the endpoint check was false, the
legacy `event: error` frame fired, and the strict SDK kept reporting
"stream closed before response.completed".
Replace the strict equality check with inboundIsResponses(c), which
uses suffix detection on FullPath (falling back to URL.Path when
FullPath is empty in test fixtures) and covers all six route variants:
/v1/responses[/...]
/responses[/...]
/backend-api/codex/responses[/...]
Add test table covering all routes plus negative cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When /v1/responses streaming hits the user/account concurrency wait, the
wait loop sends SSE ping comments to keep the connection alive, which
flushes HTTP 200 + headers. If the wait then times out (or any other
post-flush error fires), handleStreamingAwareError previously emitted a
generic `event: error` frame. Codex CLI requires the stream to end with
a Responses terminal event (response.completed/failed/incomplete/cancelled),
so it reports "stream closed before response.completed" and the user-facing
rate-limit intent is lost.
This change detects inbound = /v1/responses in both handleStreamingAwareError
implementations and emits a protocol-compliant response.failed event whose
field set mirrors apicompat.makeResponsesCompletedEvent
(id/object/model/status/output/error). The synthetic id reuses
ctxkey.RequestID so client errors can be grepped against server logs.
sequence_number is intentionally omitted to preserve monotonicity on streams
that already emitted real events.
Other inbound endpoints (/v1/chat/completions, /v1/messages) keep their
legacy formats untouched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Export ApplyBedrockCCCompat() in GatewayService, called after channel
model mapping to ensure mapped model ID is used for Opus 4.7+ detection
- Add sanitizeBedrockCCFields(): remove service_tier/interface_geo/
context_management, inject max_tokens/anthropic_version defaults
- Add sanitizeBedrockCCBetaTokens(): filter anthropic_beta to keep only
Bedrock-supported tokens, reusing autoInjectBedrockBetaTokens and
filterBedrockBetaTokens for consistent rules
- Remove unsupported beta tokens (interleaved-thinking, context-management)
from whitelist based on AWS official docs
- Simplify IsBedrockCCCompatEnabled() to check boolean toggle directly,
applying CC compat to all accounts regardless of platform
- Add unit tests for IsBedrockCCCompatEnabled (8 cases),
sanitizeBedrockCCFields (8 cases), sanitizeBedrockCCBetaTokens (7 cases)
- Update bedrock beta policy tests for removed auto-injection
- Add int64(0) param to SelectAccountWithLoadAwareness callers (signature change from channel scheduling refactor)
- Add UsageMapHook type and struct field to StreamingProcessor
- Revert Claude Max cache billing code to upstream/main (not part of channel feature)
- Revert credits overages logic to upstream/main (non-channel change)
- Remove Instructions field reference (non-channel OpenAI feature)
- Restore sora_client_handler_test.go from upstream + add channel service nil params
Move the model pricing restriction check from 8 handler entry points
to the account scheduling phase (SelectAccountForModelWithExclusions /
SelectAccountWithLoadAwareness), aligning restriction with billing:
- requested: check original request model against pricing list
- channel_mapped: check channel-mapped model against pricing list
- upstream: per-account check using account-mapped model
Handler layer now only resolves channel mapping (no restriction).
Scheduling layer performs pre-check for requested/channel_mapped,
and per-account filtering for upstream billing source.
The forward_failed error log only included account_id, making it
difficult to identify which account and proxy caused the failure
without querying the database. Add account_name, account_platform,
and proxy details (id, name, host, port) to the log fields.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add maximum Claude Code version limit to complement the existing minimum
version check. Refactor the version cache from single-value to unified
bounds struct (min+max) with a single atomic.Value and singleflight group.
- Backend: new constant, struct field, cache refactor, validation (semver
format + cross-validation max >= min), gateway enforcement, audit diff
- Frontend: settings UI input, TypeScript types, zh/en i18n
- Add CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 to all Claude Code
tutorials on /keys page (unix/cmd/powershell/vscode settings.json)
When all failover accounts are exhausted, handleFailoverExhausted maps
the upstream status code (e.g. 403) to a client-facing code (e.g. 502)
but did not write the original code to the gin context. This caused ops
error logs to show the mapped code instead of the real upstream code.
Call SetOpsUpstreamError before mapUpstreamError in all failover-
exhausted paths so that ops_error_logger captures the true upstream
status code and message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply InboundEndpointMiddleware to all gateway route groups
- Replace normalizedOpenAIInboundEndpoint/normalizedOpenAIUpstreamEndpoint and normalizedGatewayInboundEndpoint/normalizedGatewayUpstreamEndpoint with GetInboundEndpoint/GetUpstreamEndpoint
- Remove 4 old constants and 4 old normalization functions (-70 lines)
- Migrate existing endpoint normalization test to new API
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Claude's output_config.effort parameter (low/medium/high/max) was not
being extracted from requests or logged in the reasoning_effort column
of usage logs. Only the OpenAI path populated this field.
Changes:
- Extract output_config.effort in ParseGatewayRequest
- Add ReasoningEffort field to ForwardResult
- Populate reasoning_effort in both RecordUsage and RecordUsageWithLongContext
- Guard against overwriting service-set effort values in handler
- Update stale comments that described reasoning_effort as OpenAI-only
- Add unit tests for extraction, normalization, and persistence
- Use TxPipeline (MULTI/EXEC) instead of Pipeline for atomic INCR+EXPIRE
- Filter negative values in GetBaseRPM(), update test expectation
- Add RPM batch query (GetRPMBatch) to account List API
- Add warn logs for RPM increment failures in gateway handler
- Reset enableRpmLimit on BulkEditAccountModal close
- Use union type 'tiered' | 'sticky_exempt' for rpmStrategy refs
- Add design decision comments for rdb.Time() RTT trade-off