Python: coalesce code interpreter history chunks #5801

Open
he-yufeng wants to merge 3 commits into microsoft:main from he-yufeng:fix/code-interpreter-history-chunks

Conversation

@he-yufeng
Contributor

Fixes #5793.

Summary

  • Coalesce streamed code_interpreter_tool_call and code_interpreter_tool_result content by call_id during response finalization.
  • When a later done event carries the full code, keep that full value instead of storing both chunk deltas and the complete script.
  • Add a history-provider regression test for the Cosmos-style chunked code interpreter shape.
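
The first two bullets can be illustrated with a minimal sketch. It does not reproduce the code in `_types.py`: the `CodeChunk` class, its `is_complete` flag, and the `coalesce_code` helper are hypothetical names chosen for illustration, and how the real implementation recognizes a done event is not shown in this conversation.

```python
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """Hypothetical stand-in for one streamed code-interpreter content item."""

    call_id: str
    code: str
    # Assumed marker for a "done" event that carries the full script.
    is_complete: bool = False


def coalesce_code(chunks: list[CodeChunk]) -> dict[str, str]:
    """Return one code string per call_id, in first-seen order."""
    merged: dict[str, str] = {}
    for chunk in chunks:
        if chunk.is_complete:
            # The done event already holds the whole script, so it replaces
            # the accumulated deltas instead of being appended to them.
            merged[chunk.call_id] = chunk.code
        else:
            merged[chunk.call_id] = merged.get(chunk.call_id, "") + chunk.code
    return merged


stream = [
    CodeChunk("call_1", "print("),
    CodeChunk("call_1", "'hi')"),
    CodeChunk("call_1", "print('hi')", is_complete=True),
    CodeChunk("call_2", "x = 1"),
]
result = coalesce_code(stream)
```

Under these assumptions, `result` holds a single entry per call, and `call_1` stores the script once rather than the deltas followed by the full script.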

To verify

  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest
  • uv run pytest packages\core\tests\core\test_sessions.py -q --basetemp .tmp\pytest -p no:cacheprovider
  • uv run pytest packages\core\tests\core\test_types.py -q --basetemp .tmp\pytest -p no:cacheprovider
  • uv run pytest packages\openai\tests\openai\test_openai_chat_client.py -q -k "code_interpreter" --basetemp .tmp\pytest -p no:cacheprovider
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run ruff format --check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run mypy packages\core\agent_framework\_types.py
  • uv run python -m py_compile packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • git diff --check

Copilot AI review requested due to automatic review settings May 13, 2026 06:45
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses streamed code-interpreter history bloat in the Python core by coalescing code_interpreter_tool_call / code_interpreter_tool_result content items with the same call_id (or item_id) during response finalization, ensuring history providers receive a single aggregated item per logical tool call.

Changes:

  • Add response-finalization logic to coalesce code-interpreter tool call/result content by (type, call_id).
  • Implement merge behavior that prefers a later “done” event carrying the full code over keeping both deltas and the complete script.
  • Add a regression test ensuring history providers store the coalesced code-interpreter content shape.
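
As a rough sketch of the grouping described above, the snippet below keys items by `(type, call_id)`, falls back to `item_id` when `call_id` is absent, and leaves unrelated content in place. The dict-based items, the `text` field, and the helper names are assumptions made for illustration; the framework's actual content classes and merge rules may differ.

```python
from __future__ import annotations

from typing import Any

COALESCED_TYPES = {"code_interpreter_tool_call", "code_interpreter_tool_result"}


def group_key(item: dict[str, Any]) -> tuple[str, str] | None:
    """Return (type, id) for code-interpreter items, else None."""
    if item.get("type") not in COALESCED_TYPES:
        return None
    ident = item.get("call_id") or item.get("item_id")
    return (item["type"], ident) if ident else None


def coalesce_contents(contents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge items that share a key, keeping first-seen positions."""
    merged: list[dict[str, Any]] = []
    slot_by_key: dict[tuple[str, str], int] = {}
    for item in contents:
        key = group_key(item)
        if key is None:
            merged.append(item)  # unrelated content passes through untouched
            continue
        if key not in slot_by_key:
            slot_by_key[key] = len(merged)
            merged.append(dict(item))  # copy so the input list is not mutated
            continue
        target = merged[slot_by_key[key]]
        # "text" is a hypothetical payload field used only in this sketch.
        target["text"] = target.get("text", "") + item.get("text", "")
    return merged


contents = [
    {"type": "text", "text": "Running code."},
    {"type": "code_interpreter_tool_call", "call_id": "c1", "text": "print("},
    {"type": "code_interpreter_tool_call", "call_id": "c1", "text": "1)"},
    {"type": "code_interpreter_tool_result", "call_id": "c1", "text": "1"},
    {"type": "code_interpreter_tool_call", "item_id": "i2", "text": "x"},
]
history = coalesce_contents(contents)
```

Including the type in the key keeps a tool call and its result as separate items even though they share a `call_id`, so a history provider would receive one call item and one result item per logical call.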

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Adds code-interpreter coalescing/merge helpers and applies them during _finalize_response. |
| python/packages/core/tests/core/test_sessions.py | Adds a history-provider regression test verifying coalesced code-interpreter chunks are stored as a single content item. |

@moonbox3
Contributor

moonbox3 commented May 14, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_types.py | 1187 | 97 | 91% | 59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 690–691, 850–851, 1286, 1358, 1393, 1413, 1423, 1475, 1607–1609, 1791, 1894–1899, 1924, 1979, 1984, 1994, 2002, 2009–2013, 2031, 2104, 2112–2114, 2119, 2222, 2245, 2500, 2524, 2623, 2877, 3087, 3146, 3185, 3196, 3198–3202, 3204, 3207–3215, 3225, 3314, 3451, 3456, 3461, 3466, 3470, 3554–3556, 3585, 3673–3677 |
| TOTAL | 34008 | 3920 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 6703 | 30 💤 | 0 ❌ | 0 🔥 | 1m 53s ⏱️ |

@moonbox3
Contributor

@he-yufeng please have a look at the failing code quality checks. Thank you.

@he-yufeng
Contributor Author

Thanks for the heads up. I pushed a small follow-up that narrows the nested content list types before iteration/deepcopy, which addresses the pyright failures from the package check.

Validated locally:

  • python -m py_compile agent_framework\_types.py tests\core\test_sessions.py
  • python -m pytest tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • python -m ruff check agent_framework\_types.py tests\core\test_sessions.py
  • uv run pyright packages\core\agent_framework\_types.py
  • git diff --check

@he-yufeng
Contributor Author

Thanks, I pushed a follow-up for the mypy redundant-cast failure in packages/core/agent_framework/_types.py.

Validated locally:

  • uv run mypy packages\core\agent_framework\_types.py
  • uv run python scripts\workspace_poe_tasks.py ci-mypy (with a UTF-8 console environment on Windows)
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run ruff format --check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run python -m py_compile packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • git diff --check

he-yufeng force-pushed the fix/code-interpreter-history-chunks branch from 118ecb5 to a4a0a73 on May 14, 2026 13:31
@he-yufeng
Contributor Author

I also rebased the branch onto current upstream/main after the mypy follow-up. The branch now carries only the three PR commits on top of main.

Revalidated after the rebase:

  • uv run mypy packages\core\agent_framework\_types.py
  • uv run ruff check packages\core\agent_framework\_types.py packages\core\tests\core\test_sessions.py
  • uv run pytest packages\core\tests\core\test_sessions.py::TestHistoryProviderBase::test_after_run_stores_coalesced_code_interpreter_chunks -q --basetemp .tmp\pytest -p no:cacheprovider
  • git diff --check


Development

Successfully merging this pull request may close these issues.

Python: CosmosHistoryProvider Code interpreter tool calls are saved chunk by chunk

3 participants