Pluviobyte 8a999f438d
fix(ws): exclude terminal events from first-token detection
isOpenAIWSTokenEvent classified response.completed / response.done as
token events. When upstream finishes a request without ever emitting
a recognizable delta (e.g. cached completions or models that skip
incremental output), firstTokenMs was then filled at the terminal
event's timestamp, so the first-token latency metric effectively
reported total request duration.
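To illustrate the failure mode, here is a minimal Go sketch. The event struct and the helper names (`wsEvent`, `isTokenEvent`, `oldIsTokenEvent`, `firstTokenMs`) are hypothetical stand-ins, not the actual gateway code. The delta event name is likewise an assumption; only the two terminal event names come from this change.

```go
package main

import "fmt"

// wsEvent is a hypothetical, simplified upstream event.
type wsEvent struct {
	Type string
	AtMs int64 // milliseconds since the request started
}

// isTokenEvent is an illustrative stand-in for the fixed isOpenAIWSTokenEvent.
// The delta event name below is an assumption for this sketch.
func isTokenEvent(eventType string) bool {
	switch eventType {
	case "response.output_text.delta":
		return true
	}
	return false
}

// oldIsTokenEvent mimics the pre-fix behaviour: terminal events also count.
func oldIsTokenEvent(eventType string) bool {
	return isTokenEvent(eventType) || eventType == "response.completed" || eventType == "response.done"
}

// firstTokenMs returns the timestamp of the first token event, or 0 if none was seen.
func firstTokenMs(events []wsEvent, isToken func(string) bool) int64 {
	for _, ev := range events {
		if isToken(ev.Type) {
			return ev.AtMs
		}
	}
	return 0
}

func main() {
	// A stream that finishes without any recognizable delta, e.g. a cached completion.
	stream := []wsEvent{
		{Type: "response.created", AtMs: 5},
		{Type: "response.completed", AtMs: 1200},
	}
	fmt.Println(firstTokenMs(stream, oldIsTokenEvent)) // 1200: total duration reported as first-token latency
	fmt.Println(firstTokenMs(stream, isTokenEvent))    // 0: no first token recorded
}
```

With the old classifier the terminal event's timestamp leaks into the first-token value; with the fixed one the metric simply stays unset when no delta arrives.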

Terminal events are already handled separately by
isOpenAIWSTerminalEvent. Treating them as token events makes the two
classifiers overlap, which violates the implicit invariant that the
token-event and terminal-event sets are disjoint.

This change only affects ForwardResult.FirstTokenMs (used for logging
and observability); billing and routing are unchanged.

Add two regression tests:

* TestIsOpenAIWSTokenEvent_TerminalEventsExcluded covers each
  classification branch.
* TestIsOpenAIWSTokenEvent_DisjointWithTerminal asserts the
  disjoint-set invariant for every known terminal event.

Both new tests fail when the old `return eventType == "response.completed"
|| eventType == "response.done"` is restored.
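The disjoint-set invariant can be checked mechanically. The following is a minimal sketch assuming simplified classifiers. The names `isTokenEvent`, `isTerminalEvent` and `overlap`, and both event lists, are illustrative only and are not taken from the real test file.

```go
package main

import "fmt"

// Hypothetical terminal event list; the real isOpenAIWSTerminalEvent may differ.
var terminalEvents = []string{"response.completed", "response.done", "response.failed"}

func isTerminalEvent(eventType string) bool {
	for _, t := range terminalEvents {
		if eventType == t {
			return true
		}
	}
	return false
}

// isTokenEvent is an illustrative stand-in; the delta event name is an assumption.
func isTokenEvent(eventType string) bool {
	return eventType == "response.output_text.delta"
}

// overlap returns every candidate classified as both terminal and token.
// A non-empty result means the disjoint-set invariant is broken.
func overlap(isToken, isTerminal func(string) bool, candidates []string) []string {
	var bad []string
	for _, ev := range candidates {
		if isTerminal(ev) && isToken(ev) {
			bad = append(bad, ev)
		}
	}
	return bad
}

func main() {
	fmt.Println(overlap(isTokenEvent, isTerminalEvent, terminalEvents)) // []
	// Pre-fix behaviour: terminal events were also treated as token events.
	old := func(ev string) bool {
		return isTokenEvent(ev) || ev == "response.completed" || ev == "response.done"
	}
	fmt.Println(overlap(old, isTerminalEvent, terminalEvents)) // [response.completed response.done]
}
```

Iterating over the terminal list, rather than hard-coding two names, means a terminal event added later is covered by the check automatically.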

Fixes #2651

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 01:33:42 +00:00