- TopK initial filter now drops quota-paused accounts: fold the quota check
into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate
rechecks all skip paused accounts. Previously the candidate pool was built
without the quota check, so paused accounts could fill TopK and leave the
scheduler returning "no available accounts" even with healthy ones available.
- Add per-account explicit disable flags auto_pause_5h_disabled /
auto_pause_7d_disabled with toggles in EditAccountModal. Without these,
leaving the account threshold blank silently falls back to the global default,
so admins could not exempt a single account once a global default existed.
Disable is per-window: an account can opt out of 5h auto-pause while still
honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH
updated, threshold-hint text revised to explain "blank = global default".
- Move quota auto-pause settings off the request hot path: replace the per-repo
TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate
in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh
on staleness); writes via UpdateOpsAdvancedSettings push directly into the
cache through an injected sink; wire warms the cache at startup. Adds Warm
(sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- gate previous_response_id sticky path with quota auto-pause check at both
the snapshot and DB-recheck stages (previously bypassed, #1)
- skip pausing when the usage window already reset to avoid a stale stuck-pause;
carry codex_*_reset_at / reset_after_seconds / codex_usage_updated_at through
the scheduler snapshot whitelist (#2)
- remove the incomplete limit mode; percentage threshold only (#3)
- add global default 5h/7d threshold inputs to the Ops settings dialog with
validation and en/zh i18n (#4)
- downgrade account_auto_paused_by_quota log from Info to Debug; it fires
per-candidate on the scheduling hot path (#5)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Backend: Fix three race conditions in SetSnapshot that caused account
scheduling anomalies and broken sticky sessions:
- Use Lua CAS script for atomic version activation, preventing version
rollback when concurrent goroutines write snapshots simultaneously
- Add UnlockBucket to release rebuild lock immediately after completion
instead of waiting 30s TTL expiry
- Replace immediate DEL of old snapshots with 60s EXPIRE grace period,
preventing readers from hitting empty ZRANGE during version switches
Frontend: Remove serial queue throttle (1-2s delay per request) from
usage loading since backend now uses passive sampling. All usage
requests execute immediately in parallel.