Adds an opt-in tier-based fallback scheduling path for Anthropic accounts:
- accountTierLevel(): derives tier from account type without DB migration
(tier-0=OAuth/SetupToken, tier-1=APIKey, tier-2=Bedrock)
- enableTierFallbackChain(): new config flag
gateway.scheduling.enable_tier_fallback_chain (default false)
- selectAccountWithTierFallback(): loads all Anthropic accounts, groups by
tier, honors sticky sessions, applies all existing schedulability guards,
then tries tiers 0→1→2 in order via tryAcquireByLegacyOrder
- Wired into SelectAccountForModelWithExclusions: Anthropic platform +
tier fallback enabled → calls new path instead of mixed scheduling
- Fix pre-existing unit-test build break: NewGatewayService now requires
*RPMTokenBucketService (added in Task #5); add missing nil param
- 7 tests: tier mapping, config toggle, subscription preference,
APIKey fallback, exclusion handling, empty-pool error, Bedrock last resort
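The tier mapping and the 0→1→2 fallback order described above can be sketched roughly as follows. This is a simplified illustration, not the gateway code: the `Account` struct, the `AccountType` constants, the `Schedulable` flag, and `selectWithTierFallback` are hypothetical stand-ins, sticky sessions are omitted, and picking the first account in a tier stands in for `tryAcquireByLegacyOrder`. Mapping unknown types to tier 2 is also an assumption.

```go
package main

import "errors"

// AccountType is a hypothetical stand-in for the real account type field.
type AccountType string

const (
	TypeOAuth      AccountType = "oauth"
	TypeSetupToken AccountType = "setup-token"
	TypeAPIKey     AccountType = "apikey"
	TypeBedrock    AccountType = "bedrock"
)

// Account is a reduced, illustrative view of an account.
type Account struct {
	ID          int64
	Type        AccountType
	Schedulable bool // stands in for all existing schedulability guards
}

var ErrNoAvailableAccounts = errors.New("no available accounts")

// accountTierLevel derives the tier from the account type alone,
// so no stored tier column (and no DB migration) is needed.
func accountTierLevel(a Account) int {
	switch a.Type {
	case TypeOAuth, TypeSetupToken:
		return 0
	case TypeAPIKey:
		return 1
	default: // Bedrock; treating unknown types as last resort is an assumption
		return 2
	}
}

// selectWithTierFallback groups eligible accounts by tier and returns
// the first account from the lowest non-empty tier.
func selectWithTierFallback(accounts []Account, excluded map[int64]bool) (Account, error) {
	var tiers [3][]Account
	for _, a := range accounts {
		if !a.Schedulable || excluded[a.ID] {
			continue
		}
		t := accountTierLevel(a)
		tiers[t] = append(tiers[t], a)
	}
	for _, tier := range tiers {
		if len(tier) > 0 {
			// The real path would try to acquire a slot here and
			// move on to the next tier if none can be acquired.
			return tier[0], nil
		}
	}
	return Account{}, ErrNoAvailableAccounts
}
```

With one account of each type, the subscription (OAuth) account wins; excluding it falls back to the APIKey account, and Bedrock is used only when both lower tiers are empty.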
- Add GetQuotaRemainingFraction() to Account: returns [0,1] fraction of
remaining quota; 1.0 when no limit is configured (unlimited accounts)
- Add Quota float64 weight field to GatewayOpenAIWSSchedulerScoreWeights
and EnableP2CScheduling bool to GatewayOpenAIWSConfig (both default off)
- Extend selectByLoadBalance scoring with quota factor (gated by Quota>0)
- Add selectByPowerOfTwo(): O(1) power-of-two-choices (P2C) selection that
samples 2 random candidates, tries the better-scored one first, then the
other, and falls back to the wait plan; activated when EnableP2CScheduling=true
- Add openAIWSP2CEnabled() helper on OpenAIGatewayService
- Add 6 tests covering quota fraction edge cases, P2C toggle, weight defaults,
single-candidate P2C, two-candidate P2C selection, and quota score ordering
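A minimal sketch of the quota fraction and the P2C ordering described above, under stated assumptions: the `candidate` struct, the load-based base score, and the helper names `quotaRemainingFraction`, `score`, `powerOfTwoOrder`, and `newRNG` are illustrative and not taken from the codebase. The real scorer combines more factors, and the wait-plan fallback is left out.

```go
package main

import "math/rand"

// candidate is a hypothetical, reduced view of a schedulable account.
type candidate struct {
	ID         int64
	QuotaLimit float64 // 0 means no limit is configured
	QuotaUsed  float64
	Load       float64 // assumed to be normalized to [0,1]
}

// quotaRemainingFraction returns the remaining quota clamped to [0,1];
// accounts without a limit report 1.0.
func quotaRemainingFraction(limit, used float64) float64 {
	if limit <= 0 {
		return 1.0
	}
	f := (limit - used) / limit
	if f < 0 {
		return 0
	}
	if f > 1 {
		return 1
	}
	return f
}

// score adds the quota factor only when its weight is positive,
// mirroring the "gated by Quota>0" behavior.
func score(c candidate, quotaWeight float64) float64 {
	s := 1 - c.Load
	if quotaWeight > 0 {
		s += quotaWeight * quotaRemainingFraction(c.QuotaLimit, c.QuotaUsed)
	}
	return s
}

// newRNG builds a seeded generator so the sketch is reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// powerOfTwoOrder samples two distinct candidates and returns them in
// the order they should be tried: better score first, then the other.
func powerOfTwoOrder(cands []candidate, quotaWeight float64, rng *rand.Rand) []candidate {
	switch len(cands) {
	case 0:
		return nil
	case 1:
		return []candidate{cands[0]}
	}
	i := rng.Intn(len(cands))
	j := rng.Intn(len(cands) - 1)
	if j >= i {
		j++ // shift so that j is never equal to i
	}
	a, b := cands[i], cands[j]
	if score(b, quotaWeight) > score(a, quotaWeight) {
		a, b = b, a
	}
	return []candidate{a, b}
}
```

The selection cost does not depend on the pool size because only two candidates are ever scored, which is the usual reason to prefer P2C over a full sort when the pool is large.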
Adds GET /api/v1/admin/ops/ws/requests, a fan-out WebSocket endpoint that
pushes per-request metadata (method, path, model, account_id, status,
latency_ms) to all connected admin clients as soon as each gateway dispatch
completes.
- service/request_event_bus.go: lock-free pub/sub with non-blocking drop
when per-subscriber buffer (64 slots) is full; nil-safe Publish
- service/request_event_bus_test.go: 6 tests (basic, fanout, drop, nil, close)
- GatewayHandler: records reqStartTime at entry; defer emits RequestEvent on
every return; sets status success/error/rate_limited in both Gemini and
Anthropic dispatch paths
- OpsHandler: accepts *RequestEventBus; wires it to RequestStreamWSHandler
- ops_ws_requests_handler.go: subscribes to bus, pushes JSON per event,
reuses existing upgrader/conn-limit/ping-pong infrastructure
- Route: ws.GET("/requests", ...) alongside existing /ws/qps
- wire_gen.go: requestEventBus shared between OpsHandler and GatewayHandler
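The drop-on-full fan-out can be illustrated with the sketch below. It is not the implementation in service/request_event_bus.go: the commit describes that one as lock-free, whereas this version guards the subscriber set with a `sync.RWMutex` for brevity. The field names and the `Subscribe` signature are assumptions; the 64-slot buffer, non-blocking drop, and nil-safe `Publish` follow the description above.

```go
package main

import "sync"

// RequestEvent carries the per-request metadata pushed to admin clients.
type RequestEvent struct {
	Method    string
	Path      string
	Model     string
	AccountID int64
	Status    string // "success", "error", or "rate_limited"
	LatencyMs int64
}

const subscriberBuffer = 64

// RequestEventBus fans each published event out to every subscriber.
type RequestEventBus struct {
	mu   sync.RWMutex
	subs map[chan RequestEvent]struct{}
}

func NewRequestEventBus() *RequestEventBus {
	return &RequestEventBus{subs: make(map[chan RequestEvent]struct{})}
}

// Subscribe registers a buffered channel and returns it together with
// an idempotent cancel function that unregisters and closes it.
func (b *RequestEventBus) Subscribe() (<-chan RequestEvent, func()) {
	ch := make(chan RequestEvent, subscriberBuffer)
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()

	var once sync.Once
	cancel := func() {
		once.Do(func() {
			b.mu.Lock()
			delete(b.subs, ch)
			close(ch) // safe: Publish cannot be sending while we hold the write lock
			b.mu.Unlock()
		})
	}
	return ch, cancel
}

// Publish never blocks the request path: if a subscriber's buffer is
// full the event is dropped for that subscriber only. A nil bus is a no-op.
func (b *RequestEventBus) Publish(ev RequestEvent) {
	if b == nil {
		return
	}
	b.mu.RLock()
	defer b.mu.RUnlock()
	for ch := range b.subs {
		select {
		case ch <- ev:
		default: // slow consumer, drop
		}
	}
}
```

Dropping instead of blocking means a stalled admin client can lose events but cannot add latency to gateway dispatch.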
- New context_compressor.go: pure functions operating on raw JSON body
(gjson/sjson pattern). approxTokens uses chars/4 heuristic.
- compressMessages: removes oldest messages from front, treating
consecutive assistant(tool_use)+user(tool_result) pairs as atomic units
to prevent orphaned tool_result blocks.
- Hooked into Forward() after StripEmptyTextBlocks, gated on
account.Credentials[enable_context_compression].
- Config: gateway.context_compression.max_tokens (default 190000).
- 8 unit tests covering: approx tokens, no-op when under budget,
oldest-message trimming, tool pair preservation, atomic pair removal,
body passthrough, body trimming.
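The trimming rule can be sketched on a simplified message model. The real context_compressor.go works on the raw JSON body via gjson/sjson; the `message` struct, its `ToolUse`/`ToolResult` flags, rounding up in `approxTokens`, and always keeping the final message are assumptions made for this illustration.

```go
package main

import "unicode/utf8"

// message is a hypothetical, flattened stand-in for one entry of the
// request's messages array.
type message struct {
	Role       string
	Text       string
	ToolUse    bool // assistant message containing a tool_use block
	ToolResult bool // user message containing a tool_result block
}

// approxTokens applies the chars/4 heuristic (rounded up here).
func approxTokens(s string) int {
	return (utf8.RuneCountInString(s) + 3) / 4
}

func totalTokens(msgs []message) int {
	n := 0
	for _, m := range msgs {
		n += approxTokens(m.Text)
	}
	return n
}

// compressMessages drops the oldest messages until the estimate fits
// maxTokens. An assistant(tool_use) message followed by a
// user(tool_result) message is removed as one unit, so a tool_result
// is never left without its tool_use.
func compressMessages(msgs []message, maxTokens int) []message {
	start := 0
	total := totalTokens(msgs)
	for total > maxTokens && start < len(msgs)-1 {
		n := 1
		if msgs[start].Role == "assistant" && msgs[start].ToolUse &&
			msgs[start+1].Role == "user" && msgs[start+1].ToolResult {
			n = 2
		}
		if start+n >= len(msgs) {
			break // removing the unit would empty the conversation
		}
		for k := 0; k < n; k++ {
			total -= approxTokens(msgs[start+k].Text)
		}
		start += n
	}
	return msgs[start:]
}
```

When the estimate is already under budget the slice is returned unchanged, which corresponds to the no-op and body-passthrough cases listed in the tests.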
Two anti-thundering-herd improvements:
1. OAuthRefreshAPI.RefreshIfNeeded
Wrap the existing distributed-lock + DB-reread + executor.Refresh
pipeline in a per-process singleflight keyed by cacheKey+window.
Without this, N concurrent goroutines on the same account each pay
one Redis lock RTT and one DB reread; with it, only the leader pays
and the rest share the result.
The refreshWindow is part of the key so a long background-refresh
window cannot starve a short foreground-refresh window.
2. accountRepository.SetTempUnschedulable
Wrap the write path (UPDATE + scheduler outbox enqueue + scheduler
cache sync) in a per-process singleflight keyed by id+until+reason.
The SQL guard (existing < new) already makes the UPDATE idempotent,
but N callers still cost N round-trips and N outbox inserts. With
singleflight, an upstream 401 burst that hits the same account
collapses to one execution.
Tests cover dedup behavior, key separation by account / refresh window,
and that the SQL exec count drops from N to <=2 (UPDATE + outbox).
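The collapse of a burst into a single execution can be illustrated with a hand-rolled stand-in for golang.org/x/sync/singleflight, written against the standard library only so it runs on its own. The `group` type, `refreshKey`, the `waiting` helper, and the `burst` demo are hypothetical; the actual change presumably uses the x/sync package and wraps the Redis lock, DB reread, and refresh call inside the shared function.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight execution and the callers waiting on it.
type call struct {
	wg   sync.WaitGroup
	val  string
	err  error
	dups int
}

// group deduplicates concurrent calls that share a key.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time; concurrent callers with the same
// key wait for the leader and share its result.
func (g *group) Do(key string, fn func() (string, error)) (string, error, bool) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		c.dups++
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err, true
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err, false
}

// waiting reports how many followers are parked on key (demo helper).
func (g *group) waiting(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.calls[key]; ok {
		return c.dups
	}
	return 0
}

// refreshKey includes the refresh window so that callers with
// different windows never share a result.
func refreshKey(cacheKey string, window time.Duration) string {
	return fmt.Sprintf("%s|%d", cacheKey, int64(window))
}

// burst fires n concurrent callers at one key and returns how many
// times the expensive function actually ran.
func burst(n int) int {
	var (
		g       group
		execs   int32
		wg      sync.WaitGroup
		release = make(chan struct{})
	)
	key := refreshKey("account:42", 5*time.Minute)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do(key, func() (string, error) {
				atomic.AddInt32(&execs, 1)
				<-release // hold the leader until all followers have joined
				return "token", nil
			})
		}()
	}
	for g.waiting(key) < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return int(atomic.LoadInt32(&execs))
}
```

In this sketch `burst(8)` runs the function once, while two keys built from the same cache key and different windows stay separate.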
Add HealthService with Liveness (no-op) and Readiness (DB+Redis ping
with per-component timeout) checks. Expose three endpoints:
- /healthz : new liveness endpoint, zero-dependency, always 200
- /ready : new readiness endpoint, returns 503 with details on dep
failure; suitable for K8s readinessProbe and load balancers
- /health : preserved for backward compatibility, equivalent to
/healthz
Switch primary docker-compose healthcheck to /ready so the container
is only marked healthy once DB+Redis are reachable. Standalone/dev/
local compose files keep /health to avoid disrupting existing setups.
Tests: unit tests cover liveness, readiness with both deps healthy,
each dep failing independently, and per-component timeout enforcement.
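A rough sketch of the three endpoints and the per-component timeout, assuming a `Pinger` function type for each dependency. `NewHealthService`, `Routes`, the stub pingers, the 50 ms timeout, the JSON shape of the 503 body, and the `probe` helper are all illustrative names and values rather than the project's actual API.

```go
package main

import (
	"context"
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"time"
)

// Pinger checks one dependency (for example DB or Redis).
type Pinger func(ctx context.Context) error

const pingTimeout = 50 * time.Millisecond // illustrative value

type HealthService struct {
	timeout time.Duration
	deps    map[string]Pinger
}

func NewHealthService(timeout time.Duration, deps map[string]Pinger) *HealthService {
	return &HealthService{timeout: timeout, deps: deps}
}

// pingOne enforces the timeout even if the pinger ignores its context.
func (h *HealthService) pingOne(ctx context.Context, ping Pinger) error {
	cctx, cancel := context.WithTimeout(ctx, h.timeout)
	defer cancel()
	done := make(chan error, 1)
	go func() { done <- ping(cctx) }()
	select {
	case err := <-done:
		return err
	case <-cctx.Done():
		return cctx.Err()
	}
}

// Readiness returns one entry per failing dependency; empty means ready.
func (h *HealthService) Readiness(ctx context.Context) map[string]string {
	failures := map[string]string{}
	for name, ping := range h.deps {
		if err := h.pingOne(ctx, ping); err != nil {
			failures[name] = err.Error()
		}
	}
	return failures
}

// Routes wires /healthz and /health (liveness) and /ready (readiness).
func (h *HealthService) Routes() *http.ServeMux {
	mux := http.NewServeMux()
	live := func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) }
	mux.HandleFunc("/healthz", live)
	mux.HandleFunc("/health", live) // kept for backward compatibility
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		failures := h.Readiness(r.Context())
		w.Header().Set("Content-Type", "application/json")
		if len(failures) > 0 {
			w.WriteHeader(http.StatusServiceUnavailable)
		}
		_ = json.NewEncoder(w).Encode(map[string]map[string]string{"failures": failures})
	})
	return mux
}

// Stub dependencies used to exercise the sketch.
func okPing(ctx context.Context) error      { return nil }
func hangingPing(ctx context.Context) error { <-ctx.Done(); return ctx.Err() }

// probe issues an in-memory GET and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

Keeping liveness free of dependency checks avoids restart loops during a DB or Redis outage, while /ready takes the instance out of rotation until both respond.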
- Centralize Claude CLI fingerprint constants (UA, x-stainless-*) in
pkg/claude with BuildCLI/CodeUserAgent helpers
- Reuse constants in DefaultHeaders, identity_service defaults, and
antigravity identity defaults to keep all callers in sync
- Extend ClaudeCodeValidator to accept both claude-cli/ and claude-code/
UA prefixes (transport/helper requests use the latter)
- Update related tests to cover the new UA prefix and version
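Accepting both UA prefixes might look like the sketch below. The function name `isClaudeCodeUserAgent`, the exact-prefix comparison, and the requirement that something follows the prefix are assumptions; the real ClaudeCodeValidator may use a stricter pattern, and the version strings in the usage note are placeholders.

```go
package main

import "strings"

// Accepted User-Agent prefixes: the CLI itself and its transport/helper
// requests.
var claudeUAPrefixes = []string{"claude-cli/", "claude-code/"}

// isClaudeCodeUserAgent reports whether ua starts with one of the
// accepted prefixes and carries at least one character after it.
func isClaudeCodeUserAgent(ua string) bool {
	ua = strings.TrimSpace(ua)
	for _, p := range claudeUAPrefixes {
		if strings.HasPrefix(ua, p) && len(ua) > len(p) {
			return true
		}
	}
	return false
}
```

Under these assumptions both `claude-cli/1.2.3` and `claude-code/1.2.3` pass, while an unrelated agent such as `curl/8.0` does not.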
Replace the charset→base64url double encoding with the standard random
bytes→base64url approach to match official client behavior and avoid
triggering risk-control detection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
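The random bytes→base64url approach is typically a few lines with the standard library. The sketch below is illustrative: the function name `randomBase64URL` and the 32-byte length in the usage note are assumptions, and the commit does not say which value is generated this way.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
)

// randomBase64URL reads n cryptographically random bytes and encodes
// them once as unpadded base64url.
func randomBase64URL(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}
```

With n = 32 the result is 43 characters long, since unpadded base64 needs ceil(32*8/6) characters.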
Align OAuth scopes with upstream Claude Code client which now includes
the user:file_upload scope for file upload support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tests covering isAntigravityInternalServerError body matching,
applyInternal500Penalty tier escalation, handleInternal500RetryExhausted
nil-safety and error handling, and resetInternal500Counter paths.
Move constants, detection, and penalty functions from
antigravity_gateway_service.go to antigravity_internal500_penalty.go.
Fix gofmt alignment and replace hardcoded duration strings with
constant references.