- TopK initial filter now drops quota-paused accounts: fold the quota check
into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate
rechecks all skip paused accounts. Previously the candidate pool was built
without the quota check, so paused accounts could fill TopK and leave the
scheduler returning "no available accounts" even with healthy ones available.
- Add per-account explicit disable flags auto_pause_5h_disabled /
auto_pause_7d_disabled with toggles in EditAccountModal. Without these,
leaving the account threshold blank silently falls back to the global default,
so admins could not exempt a single account once a global default existed.
Disable is per-window: an account can opt out of 5h auto-pause while still
honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH
updated, threshold-hint text revised to explain "blank = global default".
- Move quota auto-pause settings off the request hot path: replace the per-repo
TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate
in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh
on staleness); writes via UpdateOpsAdvancedSettings push directly into the
cache through an injected sink; wire warms the cache at startup. Adds Warm
(sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Background / 背景
The ops cleanup task currently rejects retention days < 1 in both validate
and normalize, so operators who want minimal-history setups (e.g. high
churn deployments that prefer near-realtime cleanup) cannot express that
intent through the UI. The only options are 1+ days, which keeps at least
24h of history regardless of cron frequency.
ops 清理任务目前在 validate 和 normalize 两处都拒绝小于 1 的保留天数,
让希望尽量不留历史的运维场景(高吞吐部署 + 想用近实时清理)无法通过 UI
表达。最低只能配 1,等于不管 cron 多频繁,至少都会保留 24 小时的历史。
Purpose / 目的
Let admins set retention days to 0, meaning "every scheduled cleanup
run wipes the corresponding table(s) entirely". Combined with a more
frequent cron (e.g. `0 * * * *`) this yields effectively rolling cleanup.
允许管理员把保留天数设为 0,语义为"每次定时清理时把对应表全部清空"。
搭配更频繁的 cron(比如每小时整点)即可获得近似滚动清理的效果。
Changes / 改动内容
Backend
- service/ops_settings.go: validate accepts [0, 365]; normalize only
refills default 30 when value is < 0 (negative is treated as legacy
bad data, 0 is honoured)
- service/ops_cleanup_service.go: introduce `opsCleanupPlan(now, days)`
returning `(cutoff, truncate, ok)`. days==0 returns truncate=true and
short-circuits to a new `truncateOpsTable` helper that uses
`TRUNCATE TABLE` (O(1), no WAL, no VACUUM pressure). days>0 keeps
the existing batched DELETE path unchanged. Empty tables skip
TRUNCATE to avoid the ACCESS EXCLUSIVE lock entirely
- Extract `isMissingRelationError` helper to dedupe the "table not
yet created" tolerance shared by both delete and truncate paths
- Add unit tests for `opsCleanupPlan` (three branches) and
`isMissingRelationError`
后端
- service/ops_settings.go: validate 接受 [0, 365];normalize 仅在 < 0
时回填默认 30(负数视为脏数据,0 被尊重)
- service/ops_cleanup_service.go: 抽 `opsCleanupPlan(now, days)` 返回
`(cutoff, truncate, ok)`。days==0 → truncate=true,走新增
`truncateOpsTable`(TRUNCATE TABLE,O(1),无 WAL、无 VACUUM 压力);
days>0 仍走原批量 DELETE 路径,行为完全不变。空表跳过 TRUNCATE,
避免无意义的 ACCESS EXCLUSIVE 锁
- 抽 `isMissingRelationError` helper 复用 delete / truncate 两处的
"表不存在"宽容判断
- 补 `opsCleanupPlan` 三分支 + `isMissingRelationError` 单元测试
Frontend
- OpsSettingsDialog.vue: validation accepts [0, 365]; input min=0
- i18n (zh/en): hint mentions "0 = wipe all on every cleanup",
validation message updated to 0-365 range
前端
- OpsSettingsDialog.vue: 校验放宽到 [0, 365],input min 改 0
- i18n(zh/en):hint 补"0 = 每次清理时清空所有",错误提示改 0-365
Trade-offs / 取舍
- TRUNCATE requires ACCESS EXCLUSIVE lock briefly, but ops tables only
have the cleanup task as a writer, so the lock is invisible to other
workloads
- Empty-table guard avoids the lock when there is nothing to clean
- Negative values are still treated as legacy bad data and replaced
with default 30 to preserve compatibility
- Add 5th error filter switch IgnoreInsufficientBalanceErrors to suppress
upstream insufficient balance / insufficient_quota errors from ops log
- Extract hardcoded error strings into package-level constants for
shouldSkipOpsErrorLog, normalizeOpsErrorType, classifyOpsPhase, and
classifyOpsIsBusinessLimited
- Define ErrNoAvailableAccounts sentinel error and replace all
errors.New("no available accounts") call sites
- Update tests to use require.ErrorIs with the sentinel error