isOpenAIWSTokenEvent classified response.completed / response.done as
token events. When upstream finishes a request without ever emitting
a recognizable delta (e.g. cached completions or models that skip
incremental output), firstTokenMs was then filled at the terminal
event's timestamp, so the first-token latency metric effectively
reported total request duration.
Terminal events are already handled separately by
isOpenAIWSTerminalEvent. Treating them as token events makes the two
classifiers overlap, which violates the implicit invariant that the
token-event and terminal-event sets are disjoint.
The metric only affects ForwardResult.FirstTokenMs (logging and
observability) — billing and routing are unchanged.
Add regression tests for both directions:
* TestIsOpenAIWSTokenEvent_TerminalEventsExcluded covers each
classification branch.
* TestIsOpenAIWSTokenEvent_DisjointWithTerminal asserts the
disjoint-set invariant for every known terminal event.
Both new tests fail when the old `return eventType == "response.completed"
|| eventType == "response.done"` is restored.
Fixes#2651
Co-authored-by: Cursor <cursoragent@cursor.com>