fix(oauth): don't overwrite credentials JSONB in 401 handler
The 401 handler in RateLimitService.HandleUpstreamError set account.Credentials["expires_at"] = time.Now() and then persisted the full credentials map via persistAccountCredentials, which routes through accountRepository.UpdateCredentials -> ent SetCredentials and replaces the entire JSONB column.

The account passed to the handler is the request-start snapshot taken by the gateway at SelectAccount time. When another worker has just rotated refresh_token via oauth_refresh_api.RefreshIfNeeded, the snapshot still holds the old refresh_token; writing the full snapshot back rolls refresh_token in the DB back to the stale value. The next refresh cycle then calls the upstream with the stale token, receives invalid_grant, and tryRecoverFromRefreshRace re-reads the DB only to find currentRT == usedRT (because the 401 handler just poisoned the DB), returns false, and the account is incorrectly disabled.

Drop the credentials write. InvalidateToken + SetTempUnschedulable is sufficient: the account is held out of scheduling during the cooldown, and after the cooldown the next request goes through token_provider's NeedsRefresh check, which routes through the locked, DB-re-reading RefreshIfNeeded path.

The "force background refresh by setting expires_at = now" semantic is intentionally dropped. token_refresh_service will naturally pick the account up when the real expires_at enters the refresh window, and if the real expires_at has already passed by the time the account becomes schedulable again, token_provider's NeedsRefresh returns true and RefreshIfNeeded fires synchronously on the next request.
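The race described above can be replayed in a few lines. This is only a sketch under stated assumptions: credentialStore, snapshot, replaceAll, refreshTokenAfter401 and canRecover are hypothetical names invented for illustration, an in-memory map stands in for the credentials JSONB column, and canRecover only mirrors the currentRT == usedRT comparison the message attributes to tryRecoverFromRefreshRace. It is not the project's code.

```go
package main

import "fmt"

// credentialStore is a hypothetical in-memory stand-in for the
// credentials JSONB column of one account.
type credentialStore struct {
	creds map[string]any
}

// snapshot returns a copy of the column, like the request-start
// account snapshot held by the gateway.
func (s *credentialStore) snapshot() map[string]any {
	out := make(map[string]any, len(s.creds))
	for k, v := range s.creds {
		out[k] = v
	}
	return out
}

// replaceAll overwrites the whole column, as a full-column update would.
func (s *credentialStore) replaceAll(c map[string]any) {
	s.creds = c
}

// refreshToken reads the refresh_token currently stored.
func (s *credentialStore) refreshToken() string {
	rt, _ := s.creds["refresh_token"].(string)
	return rt
}

// refreshTokenAfter401 replays the sequence from the commit message and
// returns the refresh_token left in the store afterwards.
func refreshTokenAfter401(writeBackSnapshot bool) string {
	db := &credentialStore{creds: map[string]any{
		"refresh_token": "rt-old",
		"expires_at":    "2030-01-01T00:00:00Z",
	}}

	// The gateway takes its snapshot when the request starts.
	stale := db.snapshot()

	// Meanwhile another worker rotates the refresh token.
	rotated := db.snapshot()
	rotated["refresh_token"] = "rt-new"
	db.replaceAll(rotated)

	// Old 401 handler behavior: stamp expires_at on the stale snapshot
	// and write the whole map back.
	if writeBackSnapshot {
		stale["expires_at"] = "now"
		db.replaceAll(stale)
	}
	return db.refreshToken()
}

// canRecover mirrors the comparison described in the message (assumed
// shape): recovery is possible only if the stored token differs from
// the one the upstream just rejected.
func canRecover(currentRT, usedRT string) bool {
	return currentRT != usedRT
}

func main() {
	for _, writeBack := range []bool{true, false} {
		rt := refreshTokenAfter401(writeBack)
		fmt.Printf("writeBack=%v stored refresh_token=%s\n", writeBack, rt)
	}
	// After the poisoned write, the next cycle uses rt-old and the
	// re-read also yields rt-old, so recovery is not possible.
	fmt.Println("recoverable after poisoned write:", canRecover("rt-old", "rt-old"))
}
```

With the write-back enabled the store ends up holding the stale token again; without it the rotated token survives, which is the behavior this commit relies on.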
parent 89d96f4b25
commit 6aec505016
@@ -248,17 +248,15 @@ func (s *RateLimitService) HandleUpstreamError(ctx context.Context, account *Acc
 		shouldDisable = true
 		break
 	}
-	// 2. Set expires_at to the current time to force a token refresh on the next request
-	if account.Credentials == nil {
-		account.Credentials = make(map[string]any)
-	}
-	account.Credentials["expires_at"] = time.Now().Format(time.RFC3339)
-	if err := persistAccountCredentials(ctx, s.accountRepo, account, account.Credentials); err != nil {
-		slog.Warn("oauth_401_force_refresh_update_failed", "account_id", account.ID, "error", err)
-	} else {
-		slog.Info("oauth_401_force_refresh_set", "account_id", account.ID, "platform", account.Platform)
-	}
-	// 3. Mark the account temporarily unschedulable instead of calling SetError (keep status=active so the refresh service can pick it up)
+	// 2. Mark the account temporarily unschedulable instead of calling SetError (keep status=active so the refresh service can pick it up)
+	// Note: account.Credentials/expires_at is no longer written back here.
+	// The original implementation overwrote the entire credentials JSONB column with the request-start account snapshot (see
+	// persistAccountCredentials → accountRepository.UpdateCredentials → SetCredentials).
+	// In the narrow window right after another worker has rotated refresh_token, that write rolls the new refresh_token back to the old value,
+	// so the next cycle calls the upstream with the old refresh_token and gets invalid_grant;
+	// tryRecoverFromRefreshRace then re-reads the DB, finds currentRT == usedRT, cannot recover, and the account is wrongly disabled.
+	// Here we rely only on InvalidateToken + SetTempUnschedulable to keep the account out of scheduling during the cooldown;
+	// once the cooldown ends, token_provider's NeedsRefresh / token_refresh_service refresh through the proper path guarded by a distributed lock.
 	msg := "Authentication failed (401): invalid or expired credentials"
 	if upstreamMsg != "" {
 		msg = "OAuth 401: " + upstreamMsg