sub2api

Author	SHA1	Message	Date
wucm667	c9caadb378	fix(account): address second-round review on quota auto-pause - TopK initial filter now drops quota-paused accounts: fold the quota check into isAccountRequestCompatible so session-hash, TopK pool, and per-candidate rechecks all skip paused accounts. Previously the candidate pool was built without the quota check, so paused accounts could fill TopK and leave the scheduler returning "no available accounts" even with healthy ones available. - Add per-account explicit disable flags auto_pause_5h_disabled / auto_pause_7d_disabled with toggles in EditAccountModal. Without these, leaving the account threshold blank silently falls back to the global default, so admins could not exempt a single account once a global default existed. Disable is per-window: an account can opt out of 5h auto-pause while still honoring 7d. Schedule snapshot whitelist includes the new fields, i18n EN/ZH updated, threshold-hint text revised to explain "blank = global default". - Move quota auto-pause settings off the request hot path: replace the per-repo TTL+singleflight sync DB read with a per-SettingService stale-while-revalidate in-memory snapshot. Get is non-blocking (atomic.Pointer load + async refresh on staleness); writes via UpdateOpsAdvancedSettings push directly into the cache through an injected sink; wire warms the cache at startup. Adds Warm (sync) for tests/init and SetOpenAIQuotaAutoPauseSettings (sink target). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 14:32:45 +08:00
wucm667	ead471d64b	feat(account): support auto-pausing account scheduling based on 5h/7d usage thresholds	2026-05-29 10:47:47 +08:00
erio	d218b6c2aa	refactor(ops-cleanup): split out executor + table-driven + extract constants + add tests. Code review feedback: 1. File length over limit: ops_cleanup_service.go 594→413 lines. Moved opsCleanupPlan / deleteOldRowsByID / truncateOpsTable / isMissingRelationError + counts struct into ops_cleanup_executor.go (164 lines). 2. runCleanupOnce 89→30 lines (table-driven): a loop over []opsCleanupTarget replaces three repeated opsCleanupPlan → runOne → assign groups. 3. Magic values extracted into constants: opsCleanupDefaultSchedule / opsCleanupBatchSize / opsCleanupCronStopTimeout / opsCleanupRunTimeout / opsCleanupHeartbeatTimeout. The "0 2 * * *" in ops_settings.go now also references opsCleanupDefaultSchedule. 4. Added 5 missing tests: - Reload is a no-op before Start - Reload is a no-op after Stop - Update does not panic when cleanupReloader==nil - Repeated Start calls are idempotent - refreshEffectiveBeforeRun updates the snapshot correctly	2026-05-04 13:35:26 +08:00
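erio	c4598aa9b6	fix(ops-cleanup): make the UI data retention policy actually take effect. The data retention policy that admins edit in the UI (cron + retention days) was previously only written to ops_advanced_settings.data_retention in the settings table, but OpsCleanupService only read cfg.Ops.Cleanup (config.yaml / environment variables) at startup and never read the settings table, so the UI configuration had no effect at all: cron still ran daily at the default 0 2 * * * with 30-day retention. Changes: - OpsCleanupService gains a settingRepo dependency, an effective config, and a Reload method. On Start/Reload, settings.ops_advanced_settings.data_retention overrides cfg.Ops.Cleanup (Enabled, Schedule, *RetentionDays); with no settings it falls back entirely to cfg. runScheduled refreshes effective once at the top, so retention changes take effect on that same run (schedule/enabled changes need Reload to swap the cron). - Replace startOnce/stopOnce with mu + started/stopped so Reload can rebuild the cron. - OpsService gains a CleanupReloader interface and a SetCleanupReloader setter; UpdateOpsAdvancedSettings calls Reload after writing. - wire injects the cleanup hook through the setter to avoid a construction-time circular dependency. - New unit tests cover the five overlay cases + Update triggering Reload.	2026-05-04 12:43:15 +08:00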
erio	4b6954f9f0	feat(ops): allow retention days = 0 to wipe table on each scheduled cleanup. Background: The ops cleanup task currently rejects retention days < 1 in both validate and normalize, so operators who want minimal-history setups (e.g. high-churn deployments that prefer near-realtime cleanup) cannot express that intent through the UI. The only options are 1+ days, which keeps at least 24h of history regardless of cron frequency. Purpose: Let admins set retention days to 0, meaning "every scheduled cleanup run wipes the corresponding table(s) entirely". Combined with a more frequent cron (e.g. `0 * * * *`) this yields effectively rolling cleanup. Changes: Backend - service/ops_settings.go: validate accepts [0, 365]; normalize only refills default 30 when value is < 0 (negative is treated as legacy bad data, 0 is honoured) - service/ops_cleanup_service.go: introduce `opsCleanupPlan(now, days)` returning `(cutoff, truncate, ok)`. days==0 returns truncate=true and short-circuits to a new `truncateOpsTable` helper that uses `TRUNCATE TABLE` (O(1), no WAL, no VACUUM pressure). days>0 keeps the existing batched DELETE path unchanged. Empty tables skip TRUNCATE to avoid the ACCESS EXCLUSIVE lock entirely - Extract `isMissingRelationError` helper to dedupe the "table not yet created" tolerance shared by both delete and truncate paths - Add unit tests for `opsCleanupPlan` (three branches) and `isMissingRelationError` Frontend - OpsSettingsDialog.vue: validation accepts [0, 365]; input min=0 - i18n (zh/en): hint mentions "0 = wipe all on every cleanup", validation message updated to 0-365 range Trade-offs: - TRUNCATE requires ACCESS EXCLUSIVE lock briefly, but ops tables only have the cleanup task as a writer, so the lock is invisible to other workloads - Empty-table guard avoids the lock when there is nothing to clean - Negative values are still treated as legacy bad data and replaced with default 30 to preserve compatibility	2026-04-29 15:01:02 +08:00
erio	cfe72159d0	feat(ops): add ignore insufficient balance errors toggle and extract error constants - Add 5th error filter switch IgnoreInsufficientBalanceErrors to suppress upstream insufficient balance / insufficient_quota errors from ops log - Extract hardcoded error strings into package-level constants for shouldSkipOpsErrorLog, normalizeOpsErrorType, classifyOpsPhase, and classifyOpsIsBusinessLimited - Define ErrNoAvailableAccounts sentinel error and replace all errors.New("no available accounts") call sites - Update tests to use require.ErrorIs with the sentinel error	2026-03-15 17:26:18 +08:00
Peter	29b0e4a8a5	feat(ops): allow hiding alert events	2026-03-13 17:18:04 +08:00
Peter	af9c4a7dd0	feat(ops): make openai token stats optional	2026-03-13 04:11:58 +08:00
alfadb	832b0185c7	style: fix gofmt formatting in ops_settings.go Remove extra space before inline comment to pass golangci-lint gofmt check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 18:00:49 +08:00
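alfadb	b1719b26d1	fix(ops): ignore count_tokens 404 errors by default. Change the IgnoreCountTokensErrors default from false to true. A 404 from count_tokens is expected business behavior (the upstream does not support the endpoint, and the client should fall back to local tokenizer estimation), so it should not be treated as an error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 16:50:13 +08:00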
IanShaw027	967e25878f	refactor(ops): refactor the ops core service layer code	2026-01-14 12:40:12 +08:00
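IanShaw027	182683814b	refactor(ops): remove duration-related alert metrics and simplify monitoring configuration. Main changes: - Remove the p95_latency_ms and p99_latency_ms alert metric types - Remove the latency_p99_ms_max threshold setting from the configuration - Simplify the health score calculation (drop the latency weight, re-normalize SLA and error rate) - Remove duration-related diagnostic rules and threshold checks - Unify terminology: "latency" → "request duration" - Keep displaying duration data, but no longer use it for alerting decisions - Focus on TTFT as the primary response-speed alert metric. Scope of impact: - Backend: handler, service, models, tests - Frontend: API types, i18n, components	2026-01-14 10:52:56 +08:00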
IanShaw027	2d45e61a9b	style(ops): fix code formatting issues to pass golangci-lint	2026-01-12 17:18:49 +08:00
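IanShaw027	345a965fa3	feat(ops): add count_tokens error filtering. Features: - Automatically detect and flag errors from count_tokens requests - Support configuring whether count_tokens errors are ignored in statistics - Error data is fully retained and only filtered dynamically at statistics time. Technical implementation: - ops_error_logger.go: automatically flag count_tokens requests - ops_repo.go: add the is_count_tokens field to the INSERT statement - ops_repo_dashboard.go: buildErrorWhere core filter function - ops_repo_preagg.go: add filtering to pre-aggregated statistics - ops_repo_trends.go: add filtering to trend statistics queries (2 places) - ops_settings_models.go: add the ignore_count_tokens_errors setting - ops_settings.go: configuration validation and default values - ops_port.go: add the IsCountTokens field to the error log model. Business value: - count_tokens is a probing request, so its errors do not affect the real business SLA - Users can control whether these errors count toward statistics - Improves the accuracy of ops metrics such as error rate and alerts. Scope of impact: - Dashboard overview statistics - Error trend charts - Alert rule evaluation - Pre-aggregated metrics (hourly/daily) - Health score calculation	2026-01-12 17:06:12 +08:00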
IanShaw027	e0cccf6ed2	fix(ops): fix Go code formatting issues	2026-01-12 14:36:32 +08:00
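IanShaw027	7536dbfee5	feat(ops): add a metric threshold management API to the backend - Add GetMetricThresholds and UpdateMetricThresholds endpoints - Support configuring thresholds for SLA, latency P99, TTFT P99, request error rate, and upstream error rate - Add parameter validation logic - Provide default threshold configuration	2026-01-12 11:42:56 +08:00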
IanShaw027	c48795a948	fix(ci): fix the last batch of CI errors - Fix the remaining 3 unchecked Rows.Close errors in ops_repo_trends.go - Fix formatting issues in ops_settings.go, ops_settings_models.go, ops_trends.go	2026-01-12 00:02:19 +08:00
IanShaw027	988b4d0254	feat(ops): add advanced settings API support - Add the OpsAdvancedSettings data model - Support data retention policy configuration (error logs, minute-level metrics, hour-level metrics) - Support data aggregation toggle configuration - Add GET/PUT /admin/ops/advanced-settings endpoints - Add configuration validation and default-value handling. Related files: - backend/internal/service/ops_settings_models.go - backend/internal/service/ops_settings.go - backend/internal/handler/admin/ops_settings_handler.go - backend/internal/server/routes/admin.go - backend/internal/service/domain_constants.go	2026-01-11 19:51:18 +08:00
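IanShaw027	5baa8b5673	feat(service): implement the ops monitoring business logic layer - Add the ops main service (ops_service.go) and port definitions (ops_port.go) - Implement the account availability check service (ops_account_availability.go) - Implement the data aggregation service (ops_aggregation_service.go) - Implement the alert evaluation service (ops_alert_evaluator_service.go) - Implement the alert management service (ops_alerts.go) - Implement the data cleanup service (ops_cleanup_service.go) - Implement the concurrency control service (ops_concurrency.go) - Implement the dashboard service (ops_dashboard.go) - Implement the error handling service (ops_errors.go) - Implement the histogram service (ops_histograms.go) - Implement the metrics collection service (ops_metrics_collector.go) - Implement the query mode service (ops_query_mode.go) - Implement the realtime monitoring service (ops_realtime.go) - Implement the request details service (ops_request_details.go) - Implement the retry service (ops_retry.go) - Implement the settings management service (ops_settings.go) - Implement the trend analysis service (ops_trends.go) - Implement the window statistics service (ops_window_stats.go) - Add ops-related domain constants - Register service dependency injection	2026-01-09 20:53:44 +08:00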

19 Commits