Conflicts resolved (preserving fork customizations):
- config.go: keep NodeTLSProxy + add upstream OpenAIHTTP2
- gateway_service.go: NewGatewayService now takes both rpmTokenBucketSvc
(local) and userPlatformQuotaRepo (upstream)
- wire_gen.go: wire both new args into the call site
- http_upstream.go: drop redundant settings re-assignment; keep proxy
URL log redaction
- http_upstream_test.go: adopt upstream's explicit-0-disables semantics;
keep 600s default constant in nil-cfg fallback test
- user_handler_test.go / gateway_record_usage_test.go: pick up new
userPlatformQuotaRepo nil parameter
Also updated test stubs (windsurf_google_login_test.go,
windsurf_tier_access_service_test.go, gateway_models_test.go) for new
SetModelRateLimit variadic signature and the extra NewGatewayService arg.
Upstream highlights: OpenAI embeddings gateway, user x platform USD
quota, content-moderation risk thresholds, OAuth 401 credentials
no-overwrite fix, HTTP/2 OpenAI upstream config, pool retry status code
configurability, long-context cache pricing multipliers.
Pool mode currently retries the same account for a fixed set of
upstream HTTP statuses: 401, 403, 429. Some upstream pool deployments
also need same-account retry for transient provider/proxy statuses
such as 502, 503, 520, 529, but hard-coding more statuses changes
behavior for everyone.
Add a per-account credentials option `pool_mode_retry_status_codes`
that lets admins choose which upstream HTTP status codes trigger
same-account retry in pool mode:
- Unset (default): preserve the current 401/403/429 default
- Explicit list: override the defaults with the configured codes
- Codes normalized to the 100-599 range, deduplicated, sorted
The standalone `isPoolModeRetryableStatus` helper is kept as the
default-only fallback. All 15 gateway call sites switch to the new
`Account.IsPoolModeRetryableStatus` method so behavior is preserved
for accounts that do not configure the new field.
Frontend admin UI gains a "Retry Status Codes" comma-separated input
under the pool-mode section in both Create/Edit account modals
(en + zh i18n).
Fixes#2731
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add GetQuotaRemainingFraction() to Account: returns [0,1] fraction of
remaining quota; 1.0 when no limit is configured (unlimited accounts)
- Add Quota float64 weight field to GatewayOpenAIWSSchedulerScoreWeights
and EnableP2CScheduling bool to GatewayOpenAIWSConfig (both default off)
- Extend selectByLoadBalance scoring with quota factor (gated by Quota>0)
- Add selectByPowerOfTwo(): O(1) P2C selection — samples 2 random candidates,
tries the better-scored one first then the other, falls back to wait plan;
activated when EnableP2CScheduling=true
- Add openAIWSP2CEnabled() helper on OpenAIGatewayService
- Add 6 tests covering quota fraction edge cases, P2C toggle, weight defaults,
single-candidate P2C, two-candidate P2C selection, and quota score ordering
- New context_compressor.go: pure functions operating on raw JSON body
(gjson/sjson pattern). approxTokens uses chars/4 heuristic.
- compressMessages: removes oldest messages from front, treating
consecutive assistant(tool_use)+user(tool_result) pairs as atomic units
to prevent orphaned tool_result blocks.
- Hooked into Forward() after StripEmptyTextBlocks, gated on
account.Credentials[enable_context_compression].
- Config: gateway.context_compression.max_tokens (default 190000).
- 8 unit tests covering: approx tokens, no-op when under budget,
oldest-message trimming, tool pair preservation, atomic pair removal,
body passthrough, body trimming.
Add quota exceeded check to IsSchedulable() and refactor
shouldClearStickySession to delegate to IsSchedulable(), eliminating
duplicated logic and fixing missed overload/rate-limit/expired checks.
Frontend displays quota exceeded status independently via quota fields.
- Add global toggle for account quota notification in admin settings
- Add percentage-based threshold type for per-account quota alerts
- Hide balance notify card on user profile when global toggle is off
- Expose balance_low_notify_enabled and account_quota_notify_enabled in PublicSettings
- Add threshold type (fixed/percentage) to QuotaNotifyToggle with $ / % switcher
Inject web search capability for Claude Console (API Key) accounts that
don't natively support Anthropic's web_search tool. When a pure
web_search request is detected, the gateway calls Brave Search or Tavily
API directly and constructs an Anthropic-protocol-compliant SSE/JSON
response without forwarding to upstream.
Backend:
- New `pkg/websearch/` SDK: Brave and Tavily provider implementations
with io.LimitReader, proxy support, and Redis-based quota tracking
(Lua atomic INCR + TTL, DECR rollback on failure)
- Global config via `settings.web_search_emulation_config` (JSON) with
in-process cache + singleflight, input validation, API key merge on
save, and sanitized API responses
- Channel-level toggle via `channels.features_config` JSONB column
(DB migration 101)
- Account-level toggle via `accounts.extra.web_search_emulation`
- Request interception in `Forward()` with SSE streaming response
construction using json.Marshal (no manual string concatenation)
- Manager hot-reload: `RebuildWebSearchManager()` called on config save
and startup via `SetWebSearchRedisClient()`
- 70 unit tests covering providers, manager, config validation,
sanitization, tool detection, query extraction, and response building
Frontend:
- Settings → Gateway tab: Web Search Emulation config card with global
toggle, provider list (add/remove, API key, priority, quota, proxy)
- Channels → Anthropic tab: web search emulation toggle with global
state linkage (disabled when global off)
- Account Create/Edit modals: web search emulation toggle for API Key
type with Toggle component
- Full i18n coverage (zh + en)
The quota reset mechanism is lazy — quota_daily_used/quota_weekly_used
in the database are only reset on the next IncrementQuotaUsed call.
The scheduling layer (IsQuotaExceeded) correctly checks period expiry
before enforcing limits, so the account remains usable. However, the
API response mapper reads the raw DB value without checking expiry,
causing the frontend to display cumulative usage (e.g. 110%) even
after the reset period has passed.
Add IsDailyQuotaPeriodExpired/IsWeeklyQuotaPeriodExpired methods and
use them in the mapper to return used=0 when the period has expired.
Consolidate two separate channel types (bedrock + bedrock-apikey) into
a single "AWS Bedrock" channel. Authentication mode is now distinguished
by credentials.auth_mode ("sigv4" | "apikey") instead of separate types.
Backend:
- Remove AccountTypeBedrockAPIKey constant
- IsBedrock() simplified; IsBedrockAPIKey() checks auth_mode
- Add IsAPIKeyOrBedrock() helper to eliminate repeated type checks
- Extend pool mode, quota scheduling, and billing to bedrock
- Add RetryableOnSameAccount to handleBedrockUpstreamErrors
- Add "bedrock" scope to Beta Policy for independent control
Frontend:
- Merge two buttons into one "AWS Bedrock" with auth mode radio
- Badge displays "Anthropic | AWS"
- Pool mode and quota limit UI available for bedrock
- Quota display in account list (usage bars, capacity badges, reset)
- Remove all bedrock-apikey type references
Extend the existing total quota limit with daily and weekly periodic
dimensions. Each dimension is independently configurable and uses lazy
reset — when the period expires, usage is automatically reset to zero on
the next increment. Any dimension exceeding its limit will pause the
account from scheduling.
Backend:
- Add GetQuotaDailyLimit/Used, GetQuotaWeeklyLimit/Used, HasAnyQuotaLimit
- Rewrite IncrementQuotaUsed with atomic CTE SQL for 3-dimension update
- Rewrite ResetQuotaUsed to clear all dimensions and period timestamps
- Update postUsageBilling to use HasAnyQuotaLimit()
- Preserve daily/weekly used values on account edit
Frontend:
- Refactor QuotaLimitCard from single v-model to 3-dimension props
- Add QuotaBadge component for compact D/W/$ display
- Update AccountCapacityCell with per-dimension badges
- Update Create/Edit modals with daily/weekly quota fields
- Update AccountActionMenu hasQuotaLimit to check all dimensions
- Add i18n strings for daily/weekly/total quota labels
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use TxPipeline (MULTI/EXEC) instead of Pipeline for atomic INCR+EXPIRE
- Filter negative values in GetBaseRPM(), update test expectation
- Add RPM batch query (GetRPMBatch) to account List API
- Add warn logs for RPM increment failures in gateway handler
- Reset enableRpmLimit on BulkEditAccountModal close
- Use union type 'tiered' | 'sticky_exempt' for rpmStrategy refs
- Add design decision comments for rdb.Time() RTT trade-off