Add support for GPT-Realtime-2.0 by ShayneP · Pull Request #5690 · livekit/agents

ShayneP · 2026-05-08T17:15:40Z

Summary

Update realtime agent output handling for Realtime 2.0 responses and fix transcript synchronization races around overlapping segment lifecycle events.

Realtime 2.0 can produce multiple message items for a single response. Because the Agents output stack exposes playback through a single shared AudioOutput segment contract, this PR forwards each realtime message's output through the sink sequentially. This keeps playback-start/playback-finished events attributable to the correct message and avoids adding or truncating the wrong assistant item on interruption.
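A minimal sketch of the sequential forwarding idea in plain asyncio. `MessageOutput`, `ToySink`, `capture`, `flush_and_wait_for_playout`, and `forward_sequentially` are hypothetical names for illustration, not the livekit-agents AudioOutput API. The toy sink asserts that audio from two messages never interleaves within one segment, and it records started/finished events so attribution can be checked.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MessageOutput:
    # Hypothetical stand-in for one realtime message item within a response.
    message_id: str
    audio_chunks: List[bytes]


@dataclass
class ToySink:
    # Hypothetical shared sink: one playback segment at a time, like a single AudioOutput.
    events: List[str] = field(default_factory=list)
    _current: Optional[str] = None

    async def capture(self, message_id: str, chunk: bytes) -> None:
        if self._current is None:
            # First chunk of a new segment: attribute playback start to this message.
            self._current = message_id
            self.events.append(f"playback_started:{message_id}")
        # A real sink would enqueue the audio; here we only check attribution.
        assert self._current == message_id, "segments from two messages interleaved"
        await asyncio.sleep(0)

    async def flush_and_wait_for_playout(self) -> None:
        await asyncio.sleep(0)  # pretend the device drains its buffer
        self.events.append(f"playback_finished:{self._current}")
        self._current = None


async def forward_sequentially(messages: List[MessageOutput], sink: ToySink) -> None:
    # Forward one message to completion (including playout) before starting the
    # next, so every playback event on the shared sink maps to exactly one message.
    for msg in messages:
        for chunk in msg.audio_chunks:
            await sink.capture(msg.message_id, chunk)
        await sink.flush_and_wait_for_playout()
```

For example, forwarding `MessageOutput("m1", [b"a", b"b"])` followed by `MessageOutput("m2", [b"c"])` yields `['playback_started:m1', 'playback_finished:m1', 'playback_started:m2', 'playback_finished:m2']`. A single-message response goes through the same loop exactly once, which is why the sequential path can preserve prior behavior.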

This PR also hardens TranscriptSynchronizer segment handling when audio/text flush timing is not perfectly aligned.
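The stale-duration part of this hardening can be pictured with a toy tracker, assuming each segment accumulates the duration of the audio pushed into it. `ToySegmentTracker`, `push_audio`, and `flush` are hypothetical names; the real TranscriptSynchronizer internals may be structured differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySegmentTracker:
    # Hypothetical model of per-segment bookkeeping in a transcript synchronizer.
    pushed_duration: float = 0.0  # seconds of audio pushed into the current segment
    finished: List[float] = field(default_factory=list)

    def push_audio(self, num_samples: int, sample_rate: int) -> None:
        self.pushed_duration += num_samples / sample_rate

    def flush(self) -> None:
        # Close out the current segment and reset the per-segment counter.
        # Without the reset, the next segment inherits a stale duration, and its
        # text would be paced against audio that belonged to the previous segment.
        self.finished.append(self.pushed_duration)
        self.pushed_duration = 0.0
```

Pushing 24000 samples at 24 kHz, flushing, then pushing 12000 samples records `[1.0, 0.5]`; without the reset the second segment would report 1.5 s.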

Changes

Support multiple realtime message items in one generation.
Forward each realtime message’s audio/text output to completion before starting the next message’s sink forwarding.
Wait for per-message audio playout before registering the next message’s playback listener.
Preserve existing single-message realtime behavior for current providers.
Fix transcript synchronizer stale-duration handling by resetting per-segment pushed audio duration during flush.
Keep pending/finalizing transcript segment impls alive until their text/audio inputs are complete.
Emit interrupted or already-ready playback-finished events synchronously to preserve existing state transition ordering.
Delay non-interrupted synced playback-finished events only when needed to include complete synchronized transcript text.
Apply pause/resume to active, pending, and finalizing transcript segments.
Add focused regression tests for transcript segment advancement, delayed text completion, and pause/resume behavior.
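The segment-lifecycle items above (keeping pending/finalizing segments alive until their inputs complete, and applying pause/resume to active, pending, and finalizing segments) can be sketched as follows. The three state names follow the PR text; `ToySegment`, `ToySynchronizer`, and `collect_garbage` are hypothetical and only illustrate the invariants, not the library's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SegState(Enum):
    PENDING = "pending"        # queued behind the active segment
    ACTIVE = "active"          # currently paced against playback
    FINALIZING = "finalizing"  # playback done, text/audio input may still be draining


@dataclass
class ToySegment:
    # Hypothetical segment; field names are illustrative only.
    name: str
    state: SegState
    text_done: bool = False
    audio_done: bool = False
    paused: bool = False

    @property
    def inputs_complete(self) -> bool:
        return self.text_done and self.audio_done


@dataclass
class ToySynchronizer:
    segments: List[ToySegment] = field(default_factory=list)

    def pause(self) -> None:
        # Apply to every live segment, not just the active one, so a segment
        # promoted later does not start running while the session is paused.
        for seg in self.segments:
            seg.paused = True

    def resume(self) -> None:
        for seg in self.segments:
            seg.paused = False

    def collect_garbage(self) -> None:
        # Drop a finalizing segment only once both inputs have completed;
        # dropping it earlier would lose late-arriving transcript text.
        self.segments = [
            s for s in self.segments
            if not (s.state is SegState.FINALIZING and s.inputs_complete)
        ]
```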

Compatibility

This is intended to be public-API compatible. It does not change the AudioOutput or TextOutput interfaces or event payload shapes.

Existing realtime models emit a single message item per response, so the sequential forwarding path preserves their prior behavior while correctly handling the new Realtime 2.0 multi-message response shape.

The transcript synchronizer changes are internal lifecycle fixes. The main observable timing change is that a non-interrupted synced playback_finished event may wait for text/audio inputs to complete when playback finishes before transcript input drains. Interrupted and already-ready playback events remain synchronous.
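A sketch of this playback_finished emission policy, assuming an asyncio event loop: interrupted or already-ready events fire synchronously, while a non-interrupted event that arrives before transcript input drains is deferred until inputs end. `ToyFinishEmitter`, `PlaybackFinished`, `push_text`, and `end_inputs` are hypothetical names, not the library's API or event payload.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PlaybackFinished:
    # Hypothetical event payload; illustrates the idea, not the real class.
    interrupted: bool
    transcript: str


class ToyFinishEmitter:
    def __init__(self, emit: Callable[[PlaybackFinished], None]) -> None:
        self._emit = emit
        self._inputs_done = asyncio.Event()
        self._text: List[str] = []

    def push_text(self, text: str) -> None:
        self._text.append(text)

    def end_inputs(self) -> None:
        self._inputs_done.set()

    def on_playback_finished(self, interrupted: bool) -> Optional[asyncio.Task]:
        # Interrupted or already drained: emit immediately to keep state-transition ordering.
        if interrupted or self._inputs_done.is_set():
            self._emit(PlaybackFinished(interrupted, "".join(self._text)))
            return None
        # Otherwise defer (requires a running loop) until text/audio inputs drain,
        # so the event carries the complete synchronized transcript.
        return asyncio.get_running_loop().create_task(self._emit_when_ready())

    async def _emit_when_ready(self) -> None:
        await self._inputs_done.wait()
        self._emit(PlaybackFinished(False, "".join(self._text)))
```

Returning the task lets a caller await the deferred emission, while the synchronous branch returns `None` so existing state transitions observe the event immediately.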

Testing

make check
make unit-tests
uv run pytest tests/test_agent_session.py tests/test_transcript_synchronizer.py -q
git diff --check

make unit-tests result: 635 passed, 2 skipped

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

ShayneP added 4 commits May 8, 2026 10:06

Update for Realtime 2.0 (3b6d189)
Fix transcripts (fe7a0eb)
Codex review (e1f5cd6)
Commit makefile (91ac698)

ShayneP requested review from theomonnom and tinalenguyen May 8, 2026 17:15

chenghao-mou requested a review from a team May 8, 2026 17:15

devin-ai-integration Bot reviewed May 8, 2026

ShayneP wants to merge 4 commits into main from ShayneP/realtime-2.0