Summary
Update realtime agent output handling for Realtime 2.0 responses and fix transcript synchronization races around overlapping segment lifecycle events.
Realtime 2.0 can produce multiple message items for a single response. The Agents output stack exposes playback through a shared `AudioOutput` segment contract, so this PR forwards realtime message outputs sequentially through the sink. That keeps playback-start/playback-finished events attributable to the correct message and avoids adding or truncating the wrong assistant item during interruption.

This PR also hardens `TranscriptSynchronizer` segment handling when audio/text flush timing is not perfectly aligned.

Changes
Compatibility
This is intended to be public-API compatible. It does not change the `AudioOutput` or `TextOutput` interfaces or event payload shapes.

Existing realtime models emitted a single message item per response, so the sequential forwarding path preserves prior behavior while correctly handling the new Realtime 2.0 multi-message response shape.
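To illustrate why sequential forwarding keeps playback events attributable, here is a minimal sketch of the idea. It is not the code in this PR: `MessageItem`, `RecordingSink`, and `forward_sequentially` are hypothetical names invented for the example, and the sink is assumed to allow only one open segment at a time.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class MessageItem:
    """Hypothetical stand-in for one realtime message item."""

    item_id: str
    audio_chunks: list[bytes]


@dataclass
class RecordingSink:
    """Hypothetical sink that allows only one open segment at a time."""

    events: list[tuple[str, str]] = field(default_factory=list)
    _current: str | None = None

    async def start_segment(self, item_id: str) -> None:
        # Overlapping segments would make events ambiguous, so reject them.
        assert self._current is None, "previous segment still open"
        self._current = item_id
        self.events.append(("playback_started", item_id))

    async def push(self, chunk: bytes) -> None:
        await asyncio.sleep(0)  # yield control, as real audio I/O would

    async def finish_segment(self) -> None:
        assert self._current is not None, "no open segment"
        self.events.append(("playback_finished", self._current))
        self._current = None


async def forward_sequentially(items: list[MessageItem], sink: RecordingSink) -> None:
    """Forward each message item as its own segment, one after another."""
    for item in items:
        await sink.start_segment(item.item_id)
        for chunk in item.audio_chunks:
            await sink.push(chunk)
        # Close this segment before the next message opens a new one.
        await sink.finish_segment()


def run_demo(item_ids: list[str]) -> list[tuple[str, str]]:
    """Run the forwarder over dummy items and return the recorded events."""
    sink = RecordingSink()
    items = [MessageItem(item_id, [b"\x00\x00"]) for item_id in item_ids]
    asyncio.run(forward_sequentially(items, sink))
    return sink.events
```

With a single item, `run_demo(["msg_1"])` records one started/finished pair, which mirrors the single-message case. With two items, each pair is recorded in full before the next begins, so an event can always be traced to exactly one message.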
The transcript synchronizer changes are internal lifecycle fixes. The main observable timing change is that a non-interrupted synced `playback_finished` event may wait for text/audio inputs to complete when playback finishes before transcript input drains. Interrupted and already-ready playback events remain synchronous.

Testing
- `make check`
- `make unit-tests`
- `uv run pytest tests/test_agent_session.py tests/test_transcript_synchronizer.py -q`
- `git diff --check`

`make unit-tests` result: 635 passed, 2 skipped