Backend: Fix three race conditions in SetSnapshot that caused account
scheduling anomalies and broken sticky sessions:
- Use Lua CAS script for atomic version activation, preventing version
rollback when concurrent goroutines write snapshots simultaneously
- Add UnlockBucket to release rebuild lock immediately after completion
instead of waiting 30s TTL expiry
- Replace immediate DEL of old snapshots with 60s EXPIRE grace period,
preventing readers from hitting empty ZRANGE during version switches
Frontend: Remove serial queue throttle (1-2s delay per request) from
usage loading since backend now uses passive sampling. All usage
requests execute immediately in parallel.
Each retry in the SetOutboxWatermark loop now gets its own 5s context.
Previously a shared context could already be expired when the second or
third attempt ran, making the retries pointless.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Fixes#1691
- pollOutbox() reused a 10s context for SetOutboxWatermark after event
processing could take much longer, causing "outbox watermark write
failed: context deadline exceeded". The watermark never advanced so
the same 200 events were reprocessed every poll cycle, spiking CPU.
Now uses an independent 5s context with up to 3 retries (200ms apart).
- When multiple Codex accounts sharing the same 21-22 groups are all
rate-limited in quick succession, each account_changed event triggered
redundant bucket rebuild attempts for the same groups. Introduce
batchSeenKey{groupID, platform} and thread a seen map through the
handler chain; rebuildBucketsForPlatform skips (group, platform) pairs
already rebuilt within the same poll batch (~80% fewer rebuild calls in
the 5-accounts-same-groups scenario).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>