Add support for GPT-Realtime-2.0 #5690

Open
ShayneP wants to merge 4 commits into main from ShayneP/realtime-2.0
ShayneP (Contributor) commented May 8, 2026

Summary

Update realtime agent output handling for Realtime 2.0 responses and fix transcript synchronization races around overlapping segment lifecycle events.

Realtime 2.0 can produce multiple message items for a single response. The Agents output stack exposes playback through a shared AudioOutput segment contract, so this PR forwards realtime message outputs sequentially through the sink. That keeps playback-start/playback-finished events attributable to the correct message and avoids adding or truncating the wrong assistant item during interruption.
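The sequential forwarding described above can be illustrated with a minimal sketch. It is not the code in this PR: `FakeMessage`, `FakeSink`, and `forward_sequentially` are hypothetical stand-ins, and the sink only records lifecycle events instead of playing audio. The point is that awaiting each message's playout before starting the next keeps every started/finished pair attributable to a single item.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    # Hypothetical stand-in for one realtime message item.
    item_id: str
    chunks: list[str]


@dataclass
class FakeSink:
    # Hypothetical stand-in for a shared audio output; it only records events.
    events: list[tuple[str, str]] = field(default_factory=list)

    async def play(self, item_id: str, chunks: list[str]) -> None:
        self.events.append(("playback_started", item_id))
        for _ in chunks:
            await asyncio.sleep(0)  # yield to the loop, as real playout would
        self.events.append(("playback_finished", item_id))


async def forward_sequentially(messages: list[FakeMessage], sink: FakeSink) -> None:
    # Await each message's playout before starting the next one, so the
    # started/finished pair of one item never interleaves with another's.
    for msg in messages:
        await sink.play(msg.item_id, msg.chunks)


def run_demo() -> list[tuple[str, str]]:
    sink = FakeSink()
    msgs = [FakeMessage("item_1", ["a", "b"]), FakeMessage("item_2", ["c"])]
    asyncio.run(forward_sequentially(msgs, sink))
    return sink.events
```

With two messages, the recorded events come out strictly paired per item, which is the property that lets an interruption be attributed to the item that was actually playing.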

This PR also hardens TranscriptSynchronizer segment handling when audio/text flush timing is not perfectly aligned.

Changes

  • Support multiple realtime message items in one generation.
  • Forward each realtime message’s audio/text output to completion before starting the next message’s sink forwarding.
  • Wait for per-message audio playout before registering the next message’s playback listener.
  • Preserve existing single-message realtime behavior for current providers.
  • Fix transcript synchronizer stale-duration handling by resetting per-segment pushed audio duration during flush.
  • Keep pending/finalizing transcript segment impls alive until their text/audio inputs are complete.
  • Emit interrupted or already-ready playback-finished events synchronously to preserve existing state transition ordering.
  • Delay non-interrupted synced playback-finished events only when needed to include complete synchronized transcript text.
  • Apply pause/resume to active, pending, and finalizing transcript segments.
  • Add focused regression tests for transcript segment advancement, delayed text completion, and pause/resume behavior.
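As an illustration of the stale-duration fix listed above, here is a minimal sketch that assumes the synchronizer tracks a per-segment pushed-audio duration. `SegmentState` and `SketchSynchronizer` are hypothetical names, not the real TranscriptSynchronizer internals. Resetting the counter on flush means the next segment starts from zero instead of inheriting the previous segment's total.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SegmentState:
    # Hypothetical per-segment bookkeeping (assumption, not the real fields).
    pushed_duration: float = 0.0


class SketchSynchronizer:
    def __init__(self) -> None:
        self._current = SegmentState()
        self.flushed_durations: list[float] = []

    def push_audio(self, duration: float) -> None:
        # Accumulate the audio duration pushed for the current segment.
        self._current.pushed_duration += duration

    def flush(self) -> None:
        # Close the current segment and start a fresh one, so the next
        # segment does not see a stale duration from this one.
        self.flushed_durations.append(self._current.pushed_duration)
        self._current = SegmentState()


def run_demo() -> list[float]:
    sync = SketchSynchronizer()
    sync.push_audio(1.0)
    sync.push_audio(0.5)
    sync.flush()
    sync.push_audio(0.25)
    sync.flush()
    return sync.flushed_durations
```

Without the reset in `flush`, the second segment in this demo would report 1.75 seconds rather than 0.25.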

Compatibility

This is intended to be public-API compatible. It does not change the AudioOutput or TextOutput interfaces or event payload shapes.

Existing realtime models emit a single message item per response, so the sequential forwarding path preserves their prior behavior while also handling the new Realtime 2.0 multi-message response shape.

The transcript synchronizer changes are internal lifecycle fixes. The main observable timing change is that a non-interrupted synced playback_finished event may wait for text/audio inputs to complete when playback finishes before transcript input drains. Interrupted and already-ready playback events remain synchronous.
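The timing rule above can be sketched as follows, under the assumption that each segment exposes a "text complete" signal. `SketchSegment` and its members are hypothetical and do not reflect the actual implementation. Interrupted or already-ready events are emitted synchronously; only a non-interrupted event whose text is still pending is deferred.

```python
from __future__ import annotations

import asyncio


class SketchSegment:
    # Hypothetical segment; must be constructed inside a running event loop.
    def __init__(self) -> None:
        self.text_done = asyncio.Event()
        self.emitted: list[tuple[str, bool]] = []

    def on_playback_finished(self, *, interrupted: bool) -> asyncio.Task | None:
        if interrupted or self.text_done.is_set():
            # Synchronous path: preserves existing state transition ordering.
            self.emitted.append(("sync", interrupted))
            return None
        # Deferred path: wait until the transcript text input is complete.
        return asyncio.create_task(self._emit_when_ready())

    async def _emit_when_ready(self) -> None:
        await self.text_done.wait()
        self.emitted.append(("delayed", False))


async def _demo():
    seg = SketchSegment()
    task = seg.on_playback_finished(interrupted=False)
    assert task is not None
    await asyncio.sleep(0)  # let the deferred task start waiting
    before = list(seg.emitted)  # nothing emitted while text is pending
    seg.text_done.set()
    await task

    interrupted_seg = SketchSegment()
    interrupted_seg.on_playback_finished(interrupted=True)
    return before, seg.emitted, interrupted_seg.emitted


def run_demo():
    return asyncio.run(_demo())
```

In this sketch the deferred event appears only after `text_done` is set, while the interrupted segment emits immediately without creating a task.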

Testing

  • make check
  • make unit-tests
  • uv run pytest tests/test_agent_session.py tests/test_transcript_synchronizer.py -q
  • git diff --check

make unit-tests result: 635 passed, 2 skipped

@ShayneP ShayneP requested review from theomonnom and tinalenguyen May 8, 2026 17:15
@chenghao-mou chenghao-mou requested a review from a team May 8, 2026 17:15
devin-ai-integration (Bot, Contributor) left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.
