- TopK initial filter now drops quota-paused accounts: fold the quota check
into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate
rechecks all skip paused accounts. Previously the candidate pool was built
without the quota check, so paused accounts could fill TopK and leave the
scheduler returning "no available accounts" even with healthy ones available.
- Add per-account explicit disable flags auto_pause_5h_disabled /
auto_pause_7d_disabled with toggles in EditAccountModal. Without these,
leaving the account threshold blank silently falls back to the global default,
so admins could not exempt a single account once a global default existed.
Disable is per-window: an account can opt out of 5h auto-pause while still
honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH
updated, threshold-hint text revised to explain "blank = global default".
- Move quota auto-pause settings off the request hot path: replace the per-repo
TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate
in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh
on staleness); writes via UpdateOpsAdvancedSettings push directly into the
cache through an injected sink; wire warms the cache at startup. Adds Warm
(sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target).
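A minimal sketch of the stale-while-revalidate snapshot described above, not the actual SettingService code. The names QuotaSettings, SettingsCache, Set, and the load callback are hypothetical stand-ins; only the shape (atomic.Pointer load, async refresh on staleness, synchronous Warm, direct write-through sink) follows this commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// QuotaSettings is a hypothetical stand-in for the auto-pause settings.
type QuotaSettings struct {
	Threshold5h float64
	Threshold7d float64
}

// snapshot is immutable once stored; readers only ever see whole values.
type snapshot struct {
	settings QuotaSettings
	loadedAt time.Time
}

// SettingsCache serves reads from memory and refreshes in the background.
type SettingsCache struct {
	cur        atomic.Pointer[snapshot]
	refreshing atomic.Bool
	ttl        time.Duration
	load       func() (QuotaSettings, error) // assumed DB loader
}

func NewSettingsCache(ttl time.Duration, load func() (QuotaSettings, error)) *SettingsCache {
	return &SettingsCache{ttl: ttl, load: load}
}

// Warm loads synchronously; intended for startup and tests.
func (c *SettingsCache) Warm() error {
	s, err := c.load()
	if err != nil {
		return err
	}
	c.Set(s)
	return nil
}

// Set is the sink target: writers push fresh settings straight into the cache.
func (c *SettingsCache) Set(s QuotaSettings) {
	c.cur.Store(&snapshot{settings: s, loadedAt: time.Now()})
}

// Get never waits on the loader. It returns the current snapshot (possibly
// stale) and, if the snapshot is missing or older than ttl, kicks off at most
// one background refresh.
func (c *SettingsCache) Get() (QuotaSettings, bool) {
	snap := c.cur.Load()
	if snap == nil {
		c.refreshAsync()
		return QuotaSettings{}, false
	}
	if time.Since(snap.loadedAt) > c.ttl {
		c.refreshAsync()
	}
	return snap.settings, true
}

func (c *SettingsCache) refreshAsync() {
	if !c.refreshing.CompareAndSwap(false, true) {
		return // a refresh is already in flight
	}
	go func() {
		defer c.refreshing.Store(false)
		if s, err := c.load(); err == nil {
			c.Set(s)
		}
	}()
}

func main() {
	cache := NewSettingsCache(time.Minute, func() (QuotaSettings, error) {
		return QuotaSettings{Threshold5h: 90, Threshold7d: 95}, nil
	})
	if err := cache.Warm(); err != nil {
		panic(err)
	}
	s, ok := cache.Get()
	fmt.Println(s.Threshold5h, s.Threshold7d, ok)
}
```

On a load error the sketch keeps serving the previous snapshot, which is one reasonable choice for a hot path; the real implementation may log or retry differently.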
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- gate previous_response_id sticky path with quota auto-pause check at both
the snapshot and DB-recheck stages (previously bypassed, #1)
- skip pausing when the usage window already reset to avoid a stale stuck-pause;
carry codex_*_reset_at / reset_after_seconds / codex_usage_updated_at through
the scheduler snapshot whitelist (#2)
- remove the incomplete limit mode; percentage threshold only (#3)
- add global default 5h/7d threshold inputs to the Ops settings dialog with
validation and en/zh i18n (#4)
- downgrade account_auto_paused_by_quota log from Info to Debug; it fires
per-candidate on the scheduling hot path (#5)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Conflicts resolved (preserving fork customizations):
- config.go: keep NodeTLSProxy + add upstream OpenAIHTTP2
- gateway_service.go: NewGatewayService now takes both rpmTokenBucketSvc
(local) and userPlatformQuotaRepo (upstream)
- wire_gen.go: wire both new args into the call site
- http_upstream.go: drop redundant settings re-assignment; keep proxy
URL log redaction
- http_upstream_test.go: adopt upstream's explicit-0-disables semantics;
keep 600s default constant in nil-cfg fallback test
- user_handler_test.go / gateway_record_usage_test.go: pick up new
userPlatformQuotaRepo nil parameter
Also updated test stubs (windsurf_google_login_test.go,
windsurf_tier_access_service_test.go, gateway_models_test.go) for new
SetModelRateLimit variadic signature and the extra NewGatewayService arg.
Upstream highlights: OpenAI embeddings gateway, user x platform USD
quota, content-moderation risk thresholds, OAuth 401 credentials
no-overwrite fix, HTTP/2 OpenAI upstream config, pool retry status code
configurability, long-context cache pricing multipliers.
Pool mode currently retries the same account for a fixed set of
upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments
also need same-account retry for transient provider/proxy statuses
such as 502, 503, 520, 529, but hard-coding more statuses changes
behavior for everyone.
Add a per-account credentials option `pool_mode_retry_status_codes`
that lets admins choose which upstream HTTP status codes trigger
same-account retry in pool mode:
- Unset (default): preserve the current 401/403/429 default
- Explicit list: override the defaults with the configured codes
- Codes normalized to the 100-599 range, deduplicated, sorted
The standalone `isPoolModeRetryableStatus` helper is kept as the
default-only fallback. All 15 gateway call sites switch to the new
`Account.IsPoolModeRetryableStatus` method so behavior is preserved
for accounts that do not configure the new field.
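The normalization and fallback rules above can be sketched as follows. This is an illustration under assumptions, not the code in this change: the function names are hypothetical, "normalized to the 100-599 range" is read as dropping out-of-range codes (not clamping), and a nil slice stands for "unset".

```go
package main

import (
	"fmt"
	"sort"
)

// Default same-account retry statuses in pool mode, per the commit text.
var defaultPoolRetryStatuses = []int{401, 403, 429}

// normalizeRetryStatusCodes keeps codes in [100, 599], removes duplicates,
// and returns them sorted ascending. Dropping out-of-range values is an
// assumption about what "normalized" means here.
func normalizeRetryStatusCodes(codes []int) []int {
	seen := make(map[int]struct{}, len(codes))
	out := make([]int, 0, len(codes))
	for _, c := range codes {
		if c < 100 || c > 599 {
			continue
		}
		if _, dup := seen[c]; dup {
			continue
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	sort.Ints(out)
	return out
}

// isRetryable reports whether status triggers same-account retry.
// configured == nil means the account did not set the option, so the
// defaults apply; any non-nil list overrides the defaults entirely.
func isRetryable(configured []int, status int) bool {
	codes := defaultPoolRetryStatuses
	if configured != nil {
		codes = normalizeRetryStatusCodes(configured)
	}
	for _, c := range codes {
		if c == status {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(normalizeRetryStatusCodes([]int{503, 429, 503, 99, 600, 502})) // [429 502 503]
	fmt.Println(isRetryable(nil, 429))             // true: default list
	fmt.Println(isRetryable([]int{502, 503}, 429)) // false: explicit list overrides defaults
}
```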
Frontend admin UI gains a "Retry Status Codes" comma-separated input
under the pool-mode section in both Create/Edit account modals
(en + zh i18n).
Fixes #2731
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a low-visibility proxy IP resource link near proxy-related controls.
- Show the link beside account proxy selectors
- Show the link in the create proxy dialog tab row
- Keep the entry inline to avoid interrupting form workflows
Cache Hit Rate was calculated as cache_read / (cache_read + cache_creation),
which always yields 100% for OpenAI models since cache_creation is never
reported by the OpenAI API. The denominator should include all prompt tokens
(input_tokens + cache_read_tokens + cache_creation_tokens) so the rate
reflects the actual percentage of input tokens served from cache.
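For illustration, the corrected formula can be sketched as below. The function name and signature are hypothetical; the token fields follow the names used in this message, and input_tokens is assumed to exclude cached tokens, as the denominator above implies.

```go
package main

import "fmt"

// cacheHitRate returns the percentage of prompt tokens served from cache.
// The denominator counts all prompt tokens, so a provider that never reports
// cache_creation no longer produces a constant 100%.
func cacheHitRate(inputTokens, cacheReadTokens, cacheCreationTokens int64) float64 {
	total := inputTokens + cacheReadTokens + cacheCreationTokens
	if total <= 0 {
		return 0 // no prompt tokens recorded; avoid dividing by zero
	}
	return float64(cacheReadTokens) * 100 / float64(total)
}

func main() {
	// 400 of 1000 prompt tokens came from cache. The old formula,
	// cache_read / (cache_read + cache_creation), would give 400/400 = 100%.
	fmt.Printf("%.1f%%\n", cacheHitRate(600, 400, 0)) // 40.0%
}
```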
Fixes #2291
Add a channel-level Bedrock CC compatibility toggle (similar to web_search_emulation)
that fixes four kinds of Bedrock 400 errors seen with Claude Code:
1. thinking.type "enabled" → "adaptive" for Opus 4.7+ (only supports adaptive)
2. Add default budget_tokens when missing for older models
3. Replace illegal characters in tool_use IDs to match Bedrock's ^[a-zA-Z0-9_-]+$ pattern
4. anthropic_version / invalid beta flag (already handled elsewhere)
Transformations run in Forward() before any forwarding path, so both native Bedrock
accounts and apikey passthrough accounts pointing to Bedrock relays benefit.
Includes channel-level toggle UI and unit tests.
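As a rough sketch of transformation 3, the tool_use ID rewrite could look like the following. The helper name and the choice of "_" as the replacement character are assumptions; this message only states that illegal characters are replaced so the ID matches ^[a-zA-Z0-9_-]+$.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matches any single character outside Bedrock's allowed tool_use ID set.
var illegalToolUseIDChars = regexp.MustCompile(`[^a-zA-Z0-9_-]`)

// sanitizeToolUseID replaces every disallowed character with "_".
// Note: an empty ID stays empty and would still fail the "+" in the pattern.
func sanitizeToolUseID(id string) string {
	return illegalToolUseIDChars.ReplaceAllString(id, "_")
}

func main() {
	fmt.Println(sanitizeToolUseID("toolu.01:abc/xyz")) // toolu_01_abc_xyz
	fmt.Println(sanitizeToolUseID("toolu_01-ok"))      // toolu_01-ok (unchanged)
}
```

Any rewrite of this kind presumably has to be applied to both the tool_use block and the tool_result that references it, so the pair still matches after the change.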