sub2api

Author	SHA1	Message	Date
win	c5eb305f7f	chore: merge upstream v0.1.119-121, keep Windsurf/Antigravity customizations Upstream changes merged: - fix(scheduler): resolve SetSnapshot race conditions with Lua CAS script - fix: improve sticky session scheduling (debug logs + layer 1.5 checks) - feat: Anthropic cache TTL injection toggle - fix(gateway): stream EOF failover + sanitize stream errors - feat(httputil): zstd/gzip/deflate request decompression + bomb guard - feat(openai): OpenAI Fast/Flex Policy (HTTP + WebSocket + Admin) - feat(vertex): Vertex Service Account support - feat: account bulk edit scope and compact settings - feat(affiliate): rebate freeze migration - fix(openai): various fixes (passthrough fields, compact payload, etc.) Conflict resolutions: - domain/constants.go: keep both AccountTypeWindsurfSession + AccountTypeServiceAccount - scheduler_cache_unit_test.go: keep both test functions - gateway_service.go: remove dead code (claudeCodeUserAgentRe, isClaudeCodeRequest) - wire_gen.go: keep Windsurf service chain + add upstream claudeTokenProvider param - frontend/src/types/index.ts: keep windsurf + service_account types - frontend CreateAccountModal.vue: keep Windsurf login + Vertex service_account blocks - frontend PlatformTypeBadge.vue: keep both Session + Vertex cases - account_test_service.go: fix createTestPayload call to pass empty prompt arg	2026-05-02 16:52:21 +08:00
github-actions[bot]	48912014a1	chore: sync VERSION to 0.1.121 [skip ci]	2026-04-30 06:06:12 +00:00
shaw	9d801595c9	test: update admin settings contract fields	2026-04-30 13:48:27 +08:00
Wesley Liddick	9c448f89a8	Merge pull request #2118 from DaydreamCoding/fix/restore-pagination-localStorage fix: restore localStorage persistence of table page size	2026-04-30 13:42:18 +08:00
shaw	73b872998e	feat: add Anthropic cache TTL injection toggle	2026-04-30 13:38:22 +08:00
shaw	094e1171ef	fix(openai): infer previous response for item references	2026-04-30 12:02:08 +08:00
shaw	733627cf9d	fix: improve sticky session scheduling	2026-04-30 11:38:11 +08:00
DaydreamCoding	f084d30d65	fix: restore localStorage persistence of table page size - usePersistedPageSize: restore localStorage reads and writes, with the system config as fallback - useTableLoader: write to localStorage in handlePageSizeChange - Pagination.vue: write to localStorage in handlePageSizeChange Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 10:35:15 +08:00
github-actions[bot]	8ad099baa6	chore: sync VERSION to 0.1.120 [skip ci]	2026-04-29 15:08:59 +00:00
shaw	8bf2a7b88a	fix(scheduler): resolve SetSnapshot race conditions and remove usage throttle Backend: Fix three race conditions in SetSnapshot that caused account scheduling anomalies and broken sticky sessions: - Use Lua CAS script for atomic version activation, preventing version rollback when concurrent goroutines write snapshots simultaneously - Add UnlockBucket to release rebuild lock immediately after completion instead of waiting 30s TTL expiry - Replace immediate DEL of old snapshots with 60s EXPIRE grace period, preventing readers from hitting empty ZRANGE during version switches Frontend: Remove serial queue throttle (1-2s delay per request) from usage loading since backend now uses passive sampling. All usage requests execute immediately in parallel.	2026-04-29 22:48:39 +08:00
shaw	40feb86ba4	fix(httputil): add decompression bomb guard and fix errcheck lint	2026-04-29 22:11:45 +08:00
Wesley Liddick	f972a2faf2	Merge pull request #1990 from haha1903/feat/zstd-request-decompression feat(httputil): decode zstd/gzip/deflate request bodies	2026-04-29 22:08:28 +08:00
Wesley Liddick	55a7fa1e07	Merge pull request #2005 from gaoren002/pr/openai-strip-passthrough-fields fix(openai): strip unsupported passthrough fields	2026-04-29 21:46:19 +08:00
shaw	5e54d492be	fix(lint): check type assertion error in codex transform test The errcheck linter flagged an unchecked type assertion on item["type"].(string). Use the two-value form with require.True to satisfy the linter and fail clearly on unexpected types.	2026-04-29 21:35:18 +08:00
Wesley Liddick	8d6d31545f	Merge pull request #2068 from Ogannesson/fix/openai-drop-reasoning-items-from-input fix(openai): drop reasoning items from /v1/responses input on OAuth path	2026-04-29 21:32:52 +08:00
Wesley Liddick	17ced6b73a	Merge pull request #2027 from hansnow/codex/fix-api-key-rate-limit-reset fix(api-key): reset rate limit usage cache	2026-04-29 21:27:52 +08:00
Wesley Liddick	7f8f3fe0dd	Merge pull request #2100 from KnowSky404/fix/codex-cli-edit-resend-tool-continuation [codex] fix WS continuation inference for explicit tool replay	2026-04-29 21:14:55 +08:00
Wesley Liddick	46f06b2498	Merge pull request #2050 from zvensmoluya/fix/openai-compact-payload-fields fix(openai): preserve current Codex compact payload fields	2026-04-29 21:03:48 +08:00
shaw	7ce5b83215	chore: remove superpowers docs	2026-04-29 21:00:30 +08:00
Wesley Liddick	27cad10d30	Merge pull request #2030 from KnowSky404/feature/account-bulk-edit-scope-and-compact feat: support filtered account bulk edit and align compact OpenAI bulk fields	2026-04-29 20:56:43 +08:00
Wesley Liddick	ff6fa0203d	Merge pull request #2058 from ivanvolt-labs/fix-responses-function-tool-choice fix: use Responses-compatible function tool_choice format	2026-04-29 20:43:43 +08:00
KnowSky404	f7c13af11f	fix: format ingress continuation test	2026-04-29 18:02:19 +08:00
KnowSky404	28dc34b6a3	fix(openai): avoid inferred WS continuation on explicit tool replay	2026-04-29 17:38:08 +08:00
Wesley Liddick	4d676dddd1	Merge pull request #2066 from alfadb/fix/anthropic-stream-eof-failover fix(gateway): Anthropic stream EOF failover + SSE error frame standardization	2026-04-29 17:09:47 +08:00
shaw	93d91e20b9	fix(vertex): audit fixes for Vertex Service Account feature (#1977) - Security: force token_uri to Google default, preventing SSRF via crafted service account JSON - Dedup: extract shared getVertexServiceAccountAccessToken() to eliminate ~35 lines of duplication between ClaudeTokenProvider and GeminiTokenProvider - Fix: apply model mapping + Vertex model ID normalization in forward_as_responses and forward_as_chat_completions paths - Fix: exclude service_account from AI Studio endpoint selection (Vertex cannot serve generativelanguage.googleapis.com) - Feature: add model restriction/mapping UI for service_account in EditAccountModal - Dedup: extract VERTEX_LOCATION_OPTIONS to shared constants - i18n: replace all hardcoded Chinese strings in Vertex UI with translation keys	2026-04-29 16:53:09 +08:00
Wesley Liddick	63ef23108c	Merge pull request #1977 from sholiverlee/vertex feat: support Vertex Service Account (Anthropic / Gemini)	2026-04-29 15:48:26 +08:00
alfadb	d78478e866	fix(gateway): sanitize stream errors to avoid leaking infrastructure topology (net.OpError).Error() concatenates Source/Addr fields, so the previous disconnectMsg surfaced internal source IP/port and upstream server address to clients via SSE error frames and UpstreamFailoverError.ResponseBody (reported by @Wei-Shaw on PR #2066). - Add sanitizeStreamError that maps known errors (io.ErrUnexpectedEOF, context.Canceled, syscall.ECONNRESET/EPIPE/ETIMEDOUT/...) to fixed descriptions and falls back to a generic placeholder, with an explicit net.OpError branch that drops Source/Addr fields entirely. - Use sanitized message in client-facing disconnectMsg; full ev.err is still preserved in the existing operator log line for diagnosis. - Tests cover net.OpError redaction, the failover ResponseBody path, and every known sanitized error mapping.	2026-04-29 15:44:54 +08:00
win	fdd2d08a4d	feat: merge feat/omniroute-ideas — P2C scheduler, quota scoring, tier fallback	2026-04-29 15:42:37 +08:00
Wesley Liddick	bf43fb4e38	Merge pull request #2044 from VitalyAnkh/fix/openai-image-apikey-versioned-base-url fix(openai): honor versioned image base URLs	2026-04-29 15:20:14 +08:00
Wesley Liddick	a16c66500f	Merge pull request #2090 from touwaeriol/feat/ops-retention-zero feat(ops): allow retention days = 0 to wipe table on each scheduled cleanup	2026-04-29 15:12:30 +08:00
erio	4b6954f9f0	feat(ops): allow retention days = 0 to wipe table on each scheduled cleanup Background The ops cleanup task currently rejects retention days < 1 in both validate and normalize, so operators who want minimal-history setups (e.g. high churn deployments that prefer near-realtime cleanup) cannot express that intent through the UI. The only options are 1+ days, which keeps at least 24h of history regardless of cron frequency. Purpose Let admins set retention days to 0, meaning "every scheduled cleanup run wipes the corresponding table(s) entirely". Combined with a more frequent cron (e.g. `0 * * * *`) this yields effectively rolling cleanup. Changes Backend - service/ops_settings.go: validate accepts [0, 365]; normalize only refills default 30 when value is < 0 (negative is treated as legacy bad data, 0 is honoured) - service/ops_cleanup_service.go: introduce `opsCleanupPlan(now, days)` returning `(cutoff, truncate, ok)`. days==0 returns truncate=true and short-circuits to a new `truncateOpsTable` helper that uses `TRUNCATE TABLE` (O(1), no WAL, no VACUUM pressure). days>0 keeps the existing batched DELETE path unchanged. Empty tables skip TRUNCATE to avoid the ACCESS EXCLUSIVE lock entirely - Extract `isMissingRelationError` helper to dedupe the "table not yet created" tolerance shared by both delete and truncate paths - Add unit tests for `opsCleanupPlan` (three branches) and `isMissingRelationError` Frontend - OpsSettingsDialog.vue: validation accepts [0, 365]; input min=0 - i18n (zh/en): hint mentions "0 = wipe all on every cleanup", validation message updated to 0-365 range Trade-offs - TRUNCATE requires ACCESS EXCLUSIVE lock briefly, but ops tables only have the cleanup task as a writer, so the lock is invisible to other workloads - Empty-table guard avoids the lock when there is nothing to clean - Negative values are still treated as legacy bad data and replaced with default 30 to preserve compatibility	2026-04-29 15:01:02 +08:00
shaw	da4b078df2	chore: update sponsors	2026-04-29 14:41:35 +08:00
win	0a3666ef24	x	2026-04-29 10:32:36 +08:00
win	5123d92b44	feat(scheduling): add cross-tier fallback chain (subscription → API Key → Bedrock) Adds an opt-in tier-based fallback scheduling path for Anthropic accounts: - accountTierLevel(): derives tier from account type without DB migration (tier-0=OAuth/SetupToken, tier-1=APIKey, tier-2=Bedrock) - enableTierFallbackChain(): new config flag gateway.scheduling.enable_tier_fallback_chain (default false) - selectAccountWithTierFallback(): loads all Anthropic accounts, groups by tier, honors sticky sessions, applies all existing schedulability guards, then tries tiers 0→1→2 in order via tryAcquireByLegacyOrder - Wired into SelectAccountForModelWithExclusions: Anthropic platform + tier fallback enabled → calls new path instead of mixed scheduling - Fix pre-existing unit-test build break: NewGatewayService now requires *RPMTokenBucketService (added in Task #5); add missing nil param - 7 tests: tier mapping, config toggle, subscription preference, APIKey fallback, exclusion handling, empty-pool error, Bedrock last resort	2026-04-29 03:23:39 +08:00
win	a2ab67f8c7	feat(scheduler): add P2C + quota-aware scheduling for OpenAI accounts - Add GetQuotaRemainingFraction() to Account: returns [0,1] fraction of remaining quota; 1.0 when no limit is configured (unlimited accounts) - Add Quota float64 weight field to GatewayOpenAIWSSchedulerScoreWeights and EnableP2CScheduling bool to GatewayOpenAIWSConfig (both default off) - Extend selectByLoadBalance scoring with quota factor (gated by Quota>0) - Add selectByPowerOfTwo(): O(1) P2C selection — samples 2 random candidates, tries the better-scored one first then the other, falls back to wait plan; activated when EnableP2CScheduling=true - Add openAIWSP2CEnabled() helper on OpenAIGatewayService - Add 6 tests covering quota fraction edge cases, P2C toggle, weight defaults, single-candidate P2C, two-candidate P2C selection, and quota score ordering	2026-04-29 03:13:30 +08:00
win	d1e2d39c26	feat(viewer): add real-time request stream WebSocket endpoint Adds GET /api/v1/admin/ops/ws/requests — a fan-out WebSocket that pushes per-request metadata (method, path, model, account_id, status, latency_ms) to all connected admin clients the moment each gateway dispatch completes. - service/request_event_bus.go: lock-free pub/sub with non-blocking drop when per-subscriber buffer (64 slots) is full; nil-safe Publish - service/request_event_bus_test.go: 6 tests (basic, fanout, drop, nil, close) - GatewayHandler: records reqStartTime at entry; defer emits RequestEvent on every return; sets status success/error/rate_limited in both Gemini and Anthropic dispatch paths - OpsHandler: accepts *RequestEventBus; wires it to RequestStreamWSHandler - ops_ws_requests_handler.go: subscribes to bus, pushes JSON per event, reuses existing upgrader/conn-limit/ping-pong infrastructure - Route: ws.GET("/requests", ...) alongside existing /ws/qps - wire_gen.go: requestEventBus shared between OpsHandler and GatewayHandler	2026-04-29 01:48:15 +08:00
win	d535688bfd	feat(context): add proactive context compression for long conversations - New context_compressor.go: pure functions operating on raw JSON body (gjson/sjson pattern). approxTokens uses chars/4 heuristic. - compressMessages: removes oldest messages from front, treating consecutive assistant(tool_use)+user(tool_result) pairs as atomic units to prevent orphaned tool_result blocks. - Hooked into Forward() after StripEmptyTextBlocks, gated on account.Credentials[enable_context_compression]. - Config: gateway.context_compression.max_tokens (default 190000). - 8 unit tests covering: approx tokens, no-op when under budget, oldest-message trimming, tool pair preservation, atomic pair removal, body passthrough, body trimming.	2026-04-29 01:33:05 +08:00
win	95814974de	feat(rpm): add token bucket smoothing for RPM rate limiting - New RPMTokenBucketService: per-account continuous-refill token buckets (rate = rpm/60 tokens/sec, capacity = rpm). No new dependencies. - GatewayService.AcquireRPMToken() delegates to the bucket service. - Gateway handler inserts RPM token wait BEFORE wrapReleaseOnDone in both Gemini and Anthropic dispatch paths; timeout returns 429 and releases slot. - Config: gateway.rpm_smoothing.enabled (default false) + max_wait_ms (default 5000). - 7 unit tests covering: immediate acquire, zero RPM, timeout, wait+refill, context cancel, account isolation, bucket reset on RPM change.	2026-04-29 01:22:54 +08:00
win	5c8c15cdb1	feat(refresh,repo): add singleflight to dedupe concurrent token refresh and unschedulable writes Two anti-thundering-herd improvements: 1. OAuthRefreshAPI.RefreshIfNeeded Wrap the existing distributed-lock + DB-reread + executor.Refresh pipeline in a per-process singleflight keyed by cacheKey+window. Without this, N concurrent goroutines on the same account each pay one Redis lock RTT and one DB reread; with it, only the leader pays and the rest share the result. The refreshWindow is part of the key so a long background-refresh window cannot starve a short foreground-refresh window. 2. accountRepository.SetTempUnschedulable Wrap the same path (UPDATE + scheduler outbox enqueue + scheduler cache sync) in a per-process singleflight keyed by id+until+reason. The SQL guard (existing < new) already makes the UPDATE idempotent, but N callers still cost N round-trips and N outbox inserts. With singleflight, an upstream 401 burst that hits the same account collapses to one execution. Tests cover dedup behavior, key separation by account / refresh window, and that the SQL exec count drops from N to <=2 (UPDATE + outbox).	2026-04-29 00:43:23 +08:00
win	110902ad4b	feat(health): split liveness and readiness probes Add HealthService with Liveness (no-op) and Readiness (DB+Redis ping with per-component timeout) checks. Expose three endpoints: - /healthz : new liveness endpoint, zero-dependency, always 200 - /ready : new readiness endpoint, returns 503 with details on dep failure; suitable for K8s readinessProbe and load balancers - /health : preserved for backward compatibility, equivalent to /healthz Switch primary docker-compose healthcheck to /ready so the container is only marked healthy once DB+Redis are reachable. Standalone/dev/ local compose files keep /health to avoid disrupting existing setups. Tests: unit tests cover liveness, readiness with both deps healthy, each dep failing independently, and per-component timeout enforcement.	2026-04-28 23:39:50 +08:00
win	d6df41feaa	chore(claude): bump CLI fingerprint to 2.1.88 and accept claude-code/ UA - Centralize Claude CLI fingerprint constants (UA, x-stainless-*) in pkg/claude with BuildCLI/CodeUserAgent helpers - Reuse constants in DefaultHeaders, identity_service defaults, and antigravity identity defaults to keep all callers in sync - Extend ClaudeCodeValidator to accept both claude-cli/ and claude-code/ UA prefixes (transport/helper requests use the latter) - Update related tests to cover the new UA prefix and version	2026-04-28 22:35:24 +08:00
Oganneson	7452fad820	fix(openai): drop reasoning items from /v1/responses input on OAuth path Closes #1957 The OAuth path forwards client requests to chatgpt.com/backend-api/codex/responses, where applyCodexOAuthTransform forces store=false (chatgpt.com's codex backend rejects store=true). Reasoning items emitted under store=false are NEVER persisted upstream, so any rs_* reference that a client carries forward in a subsequent input[] array triggers a guaranteed upstream 404: Item with id 'rs_...' not found. Items are not persisted when `store` is set to false. Try again with `store` set to true, or remove this item from your input. sub2api wraps this as 502 "Upstream request failed" and the conversation breaks on every multi-turn /v1/responses request that uses reasoning + tools (reproducible with gpt-5.5; gpt-5.4 happens to dodge it because the upstream does not emit reasoning items for that model). Affected clients include any that follow the OpenAI Responses API spec and replay prior assistant items verbatim — in practice this hit OpenClaw and similar agent harnesses on every turn ≥2 with tool use. The fix: in filterCodexInput, drop input items with type == "reasoning" entirely. The model never reads reasoning summary text from input (only encrypted_content can carry reasoning context across turns, and chatgpt.com under store=false does not emit it), so this is a no-op for the model itself and a clean removal of unreachable upstream lookups. Scope is intentionally narrow: * Only OAuth account requests (account.Type == AccountTypeOAuth) reach applyCodexOAuthTransform / filterCodexInput. * API-key accounts going to api.openai.com/v1/responses are unaffected (store=true works there, rs_* persists, multi-turn already works). * Anthropic / Gemini platform groups go through different transforms and are unaffected. * /v1/chat/completions is unaffected (no reasoning items). * item_reference items (different type) are unaffected — only type == "reasoning" is dropped. 
Verification: * Existing tests pass: go test ./internal/service/ -run Codex\|Tool\|OAuth * New regression test asserts reasoning items are dropped under both preserveReferences=true and preserveReferences=false. * End-to-end repro on gpt-5.5 multi-turn + tools: pre-patch 502, post-patch 200. Repro on gpt-5.4 unchanged. Three-turn deep loop on gpt-5.5 passes.	2026-04-28 20:36:50 +08:00
alfadb	4c474616b9	fix(gateway): emit Anthropic-standard SSE error events and failover body Two follow-ups to PR #2066's failover-wrap fix: 1. Failover ResponseBody (`UpstreamFailoverError.ResponseBody`) was encoded as `{"error": "<msg>"}` (string field). `ExtractUpstreamErrorMessage` probes for `error.message`, `detail`, or top-level `message` only — so `handleFailoverExhausted` and downstream passthrough rules saw an empty message, losing the EOF root cause in ops logs. Re-encode as the Anthropic standard shape `{"type":"error","error":{"type":"upstream_disconnected","message":"..."}}`. (Addresses the inline review comment from copilot-pull-request-reviewer on Wei-Shaw/sub2api#2066.) 2. The streaming `event: error` SSE frame for `response_too_large`, `stream_read_error`, and `stream_timeout` was non-standard (`{"error":"<reason>"}`). Anthropic SDKs (and Claude Code) expect `{"type":"error","error":{"type":"...","message":"..."}}` and parse `error.type`/`error.message` accordingly. Refactor `sendErrorEvent` to take both reason and message, and emit the standard frame so client SDKs surface a real diagnostic message instead of a generic stream error. This does not by itself prevent task interruption on long-stream EOF (SSE has no resume; client-side retry remains the only complete fix), but it gives both server-side ops logs and client-side error UIs a meaningful upstream message so users know the next step is to retry. Tests updated to assert the new body shape on both branches plus a new assertion that `ExtractUpstreamErrorMessage` returns a non-empty string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 20:24:17 +08:00
alfadb	6327573534	fix(gateway): wrap Anthropic stream EOF as failover error before client output Anthropic streaming path (gateway_service.go) returned a plain error on upstream SSE read failure, so the handler-level UpstreamFailoverError check never fired and the client received a bare `stream_read_error` event, breaking long-running tasks even when no bytes had been written yet. The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge backends doing graceful rotation: Go's http.Transport surfaces this as `unexpected EOF` and never auto-retries. Mirror what the OpenAI and antigravity gateways already do: when the read error happens before any byte has reached the client (`!c.Writer.Written()`), return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}` so the handler can retry on the same or another account. After client output has begun, SSE has no resume protocol — keep the existing passthrough behavior. Tests cover both branches via streamReadCloser-based fixtures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 19:12:48 +08:00
ivanvolt	04b2866f65	fix: use Responses-compatible function tool_choice format	2026-04-28 16:26:09 +08:00
Wesley Liddick	b0a2252ed1	Merge pull request #2051 from DaydreamCoding/openai-fast-flex-policy feat(openai): full OpenAI Fast/Flex Policy implementation (HTTP + WebSocket + Admin)	2026-04-28 12:14:43 +08:00
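DaydreamCoding	30f55a1f72	feat(openai): full OpenAI Fast/Flex Policy implementation (HTTP + WebSocket + Admin) Mirroring the fast-mode filtering in Claude BetaPolicy, add a three-state pass / filter / block policy for the OpenAI upstream service_tier field (priority / flex, including normalization of the client alias "fast" → "priority"), covering every OpenAI entry point plus the admin configuration entry point. Backend core - Add the SettingKeyOpenAIFastPolicySettings, OpenAIFastPolicyRule, and OpenAIFastPolicySettings configuration models; rules carry the dimensions service_tier × action × scope × model whitelist × fallback action. - SettingService.Get/SetOpenAIFastPolicySettings; when the setting is missing, return the built-in default policy (priority is filtered for all models, whitelist is empty, fallback=pass). Rationale: service_tier=fast is a user-level switch orthogonal to the model field, so locking the default to specific model slugs would leave a bypass where "gpt-4 + fast" passes priority through to the upstream. JSON parse failures no longer fall back silently; slog.Warn logs the bad data so operators can locate it. - service_tier normalization (trim + ToLower + fast→priority + allowlist of priority/flex) and policy evaluation (evaluateOpenAIFastPolicy) form the single source of truth shared by HTTP and WS. Extract the pure function evaluateOpenAIFastPolicyWithSettings and pair it with a ctx-bound settings snapshot (withOpenAIFastPolicyContext / openAIFastPolicySettingsFromContext): long-lived WS sessions prefetch the settings once at entry and reuse them for every frame, avoiding a settingService hit per frame. HTTP entry points (4) - Chat Completions, Anthropic-compatible (Messages, including the second match for BetaFastMode→priority), native Responses, and Passthrough Responses all go through applyOpenAIFastPolicyToBody; filter deletes the top-level service_tier via sjson, and block returns a 403 forbidden_error JSON. - All 4 entry points use the upstream-view model (the slug after GetMappedModel + normalizeOpenAIModelForUpstream + Codex OAuth normalization), so that chat / messages / native /responses / passthrough do not hit the whitelist differently because their model dimensions differ. - On the pass path, also normalize the client alias "fast" to "priority" and write it back to the body; otherwise the native /responses and passthrough entry points would forward "fast" verbatim to the upstream and get a 400/rejection (normalizeResponsesBodyServiceTier on the chat-completions entry point already behaved this way). WebSocket entry points - Add applyOpenAIFastPolicyToWSResponseCreate: strictly matches type="response.create" and handles only the top-level service_tier; filter deletes the field with sjson, and block returns a typed *OpenAIFastBlockedError. - The ingress path calls it inside parseClientPayload; on a block hit it first writes a Realtime-style error event and then returns OpenAIWSClientCloseError(StatusPolicyViolation=1008), relying on the synchronous flush of the underlying WebSocket Conn.Write to guarantee the error arrives before the close. - The passthrough path applies the policy to firstClientMessage before RunEntry and wraps ReadFrame in openAIWSPolicyEnforcingFrameConn to enforce the policy on every client→upstream frame; later frames without a model field fall back to capturedSessionModel. The filter closure also watches the session.model field of session.update / session.created frames and refreshes capturedSessionModel, closing the mid-session bypass "first frame model=gpt-4o (pass) → session.update switches to gpt-5.5 → a response.create without model falls back to gpt-4o". - passthrough billing: requestServiceTier is extracted from firstClientMessage only after the policy filter runs, so on a filter hit OpenAIForwardResult.ServiceTier reports nil (default tier), consistent with the HTTP entry points (reqBody comes from the post-filter map) and WS ingress (payload comes from the post-filter bytes). - Error event schema: {event_id: "evt_<32hex>", type: "error", error: {type: "forbidden_error", code: "policy_violation", message}}, compatible with error event parsing in the OpenAI codex client. Admin / Frontend - dto.SystemSettings / UpdateSettingsRequest gain an openai_fast_policy_settings field (omitempty), wired into bulk GET/PUT. - The Gateway tab of the Settings page gains a Fast/Flex Policy form card that configures every field: service_tier × action × scope × model whitelist × fallback action. - Frontend guard: the openaiFastPolicyLoaded flag allows write-back only when the GET actually returned the field, preventing a rollout or error from overwriting the default rules with an empty value; the saveSettings write-back loop skips this field, which dedicated refresh logic handles; error_message is sent only when action=block, matching the backend's omitempty behavior. Tests - HTTP path: openai_fast_policy_test.go covers the default configuration (whitelist=[], priority filtered for all models), custom block errors, scope distinction, filter deleting the field, block leaving the body unchanged, block short-circuiting the upstream, and Anthropic BetaFastMode triggering the OpenAI fast policy. - WebSocket path: openai_fast_policy_ws_test.go covers helper units (filter / fast→priority normalization / flex passthrough / block typed error / bytes unchanged when service_tier is absent / non-response.create frames untouched / frames with an empty type untouched / event_id+code field assertions / tolerance of a non-string service_tier) + a regression for fast alias normalization on the pass path + ingress end-to-end (upstream receives no service_tier after filter / after block the client receives the error event and then close 1008, with 0 upstream writes) + passthrough capturedSessionModel fallback cases (established by the first frame under a whitelist policy, fallback hit when model is missing, documented leak when no fallback exists) + a regression for the mid-session bypass in which passthrough session.update / session.created rotates capturedSessionModel + regressions for passthrough billing post-filter ServiceTier and the idempotent filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 11:15:09 +08:00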
Zven	3d4ca5e8d1	fix(openai): preserve current Codex compact payload fields	2026-04-28 10:55:29 +08:00
Oliver Li	0537a490f0	Merge branch 'Wei-Shaw:main' into vertex	2026-04-27 20:25:11 -04:00
VitalyR	ca5d029e7c	fix(openai): honor versioned image base URLs	2026-04-28 04:53:29 +08:00

3313 Commits