
Multimodal synthesis#21374

Open
bschloss wants to merge 1 commit into run-llama:main from bschloss:feat-multimodal-synthesis

Conversation

Contributor

bschloss commented Apr 13, 2026

Description


First of two or three PRs to broadly support multimodal synthesis. This PR:

  1. Adds a BaseMultimodalSynthesizer class to address existing semantic issues with variable naming and logical issues with converting nodes to multimodal content.
  2. Creates multimodal prompts for the relevant synthesizers.
  3. Adds multimodal synthesizers for ContextOnly, Generation, NoText, and SimpleSummarize (the relatively basic synthesizers), as well as Refine and CompactAndRefine.
  4. Adds support for streaming StructuredResponse objects to the Refine synthesizer.
  5. Adds previously missing tests for the response synthesizers.

Fixes #21373

Although the line count of this PR is large, the logical changes are modest and fairly repetitive across the basic multimodal synthesizers. Since the Refine synthesizer required more complicated updates, I will follow up with a second PR for the remaining synthesizers so that focus can be given to those changes. Many lines were also added because there was little to no testing of the synthesizer classes; some suggestions for reducing the total bloat are made in the PR comments. Finally, because of logical and semantic issues with the BaseSynthesizer class, it seemed better to introduce a new multimodal synthesizer class rather than add breaking changes or overly complicated logic and function signatures to BaseSynthesizer.
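As a rough illustration of the node-to-multimodal-content conversion this PR is concerned with, here is a standalone sketch; the class and function names are hypothetical stand-ins, not the actual llama-index API.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TextNode:
    text: str


@dataclass
class ImageNode:
    image_url: str
    caption: Optional[str] = None


def nodes_to_content_blocks(nodes: List[Union[TextNode, ImageNode]]) -> List[dict]:
    """Hypothetical sketch: convert retrieved nodes into content blocks
    suitable for a multimodal chat message. The real conversion in the
    PR's BaseMultimodalSynthesizer likely differs in detail."""
    blocks: List[dict] = []
    for node in nodes:
        if isinstance(node, TextNode):
            blocks.append({"type": "text", "text": node.text})
        else:
            blocks.append({"type": "image_url", "image_url": node.image_url})
            if node.caption:
                blocks.append({"type": "text", "text": node.caption})
    return blocks


blocks = nodes_to_content_blocks(
    [TextNode("A chart of revenue."), ImageNode("https://example.com/chart.png")]
)
```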

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 13, 2026
# add to list of partial messages
partial_messages.append(partial_message)

partial_messages = get_empty_prompt_messages(prompt)
Contributor Author
This function already handles partial formatting and takes care of the TODO
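For context, partial formatting here means filling in only the template variables that are already known while leaving the rest as placeholders. A minimal standalone sketch of that idea (not the actual get_empty_prompt_messages implementation, which also handles chat messages):

```python
import string


def partial_format(template: str, **kwargs: str) -> str:
    """Fill in known {variables}; leave unknown ones intact.
    Format specs are ignored for simplicity in this sketch."""
    out = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        out.append(literal)
        if field is not None:
            if field in kwargs:
                out.append(str(kwargs[field]))
            else:
                out.append("{" + field + "}")
    return "".join(out)


result = partial_format("Context: {context_str}\nQuery: {query_str}", query_str="hi")
```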

Contributor Author
Expanded some mocking behavior to test streaming of various LLM programs

from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.prompts import BasePromptTemplate, SelectorPromptTemplate
from llama_index.core.prompts.chat_prompts import CHAT_CHOICE_SELECT_PROMPT
from llama_index.core.prompts.chat_prompts import CHAT_CONTENT_CHOICE_SELECT_PROMPT
Contributor Author
Updated prompt names to indicate whether they are CHAT_TEXT or CHAT_CONTENT (the latter supporting multimodal chat input). open to naming feedback.

bschloss force-pushed the feat-multimodal-synthesis branch from 90d87f2 to 7cb0ba3 on April 13, 2026 at 15:38
Contributor Author
Reading through the synthesizer classes, one potential change that would drastically reduce code bloat: support formatting text prompts via messages_to_prompt in the multimodal synthesizers, and then update all the BaseSynthesizer classes to be light wrappers around the multimodal synthesizers.

This would reduce code in the synthesizer classes, the tests, and potentially the prompts as well. The downside is that the prompts might change slightly, since I don't think they are one-to-one and messages_to_prompt appends prefixes like "user: " and "assistant: ". That is all relatively easy to handle, though.
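A standalone sketch of the kind of messages_to_prompt flattening described above (hypothetical; the actual helper in llama-index may prefix and terminate messages differently):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChatMessage:
    role: str  # e.g. "system", "user", "assistant"
    content: str


def messages_to_prompt(messages: List[ChatMessage]) -> str:
    """Flatten chat messages into one text prompt, prefixing each
    message with its role, then cue the assistant to respond."""
    lines = [f"{m.role}: {m.content}" for m in messages]
    lines.append("assistant: ")
    return "\n".join(lines)


prompt = messages_to_prompt(
    [ChatMessage("user", "Summarize the context."), ChatMessage("assistant", "Sure.")]
)
```

This is why the resulting text prompts would not match the existing ones one-to-one: the role prefixes become part of the prompt text.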

Collaborator
I kind of agree here. It'd be nice if the existing synthesizers just "handled" multimodal content. I'm all for anything that reduces the bloat here

Contributor Author
This class contains multiple updates:

1. Supports multimodal content.
2. Supports streaming structured responses.
3. Reduces code duplication by combining _refine_response_single and _give_single_response into a single _update_response function.

num_iters += 1

assert num_iters > 10
assert num_iters == 1
Contributor Author
The Refine synthesizer with structured responses and streaming now yields the entire text as a single chunk, after streaming all the flexible models, to ensure the complete JSON is present.
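That behavior can be sketched as a generator that buffers every streamed chunk and only yields the full text once the accumulated string parses as JSON (a simplified stand-in, not the PR's implementation):

```python
import json
from typing import Iterable, Iterator


def stream_structured(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer partial JSON chunks; yield the whole text as a single
    chunk only after the accumulated string is complete, valid JSON."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
    json.loads(buffer)  # raises if the streamed JSON is incomplete
    yield buffer


out = list(stream_structured(['{"answer": ', '"42"}']))
```

This is also why the test above now asserts a single iteration rather than many: the consumer sees exactly one chunk.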

Contributor Author
Many of the extra lines are in tests, since most of the synthesizers previously had few or no tests.

bschloss force-pushed the feat-multimodal-synthesis branch from 7cb0ba3 to 9b48dd9 on April 13, 2026 at 15:54
def get_response( # type: ignore[override]
self,
query_str: str,
message_chunks: Sequence[ChatMessage],
Collaborator
ooc why messages and not nodes? Technically the synthesizers are meant to be used with nodes from retrievers.

Collaborator
Ah I see, that base class expected str here

Maybe we can update the base class to accept a union of types here?

Collaborator
Alternatively, we just add to the base class: get_response_from_messages() etc. Could be some interesting routing with this approach that would help avoid duplicate code?
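One way the suggested routing could look, sketched with plain Python; the class, method names, and signatures here are hypothetical, not the actual BaseSynthesizer API.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ChatMessage:
    role: str
    content: str


class BaseSynthesizer:
    def get_response(
        self, query_str: str, contexts: Union[List[str], List[ChatMessage]]
    ) -> str:
        # Route on input type so text and multimodal callers share
        # one entry point without duplicating synthesis logic.
        if contexts and isinstance(contexts[0], ChatMessage):
            return self.get_response_from_messages(query_str, contexts)
        return self.get_response_from_text(query_str, contexts)

    def get_response_from_text(self, query_str: str, texts: List[str]) -> str:
        # Placeholder synthesis step for the sketch.
        return f"{query_str} | " + " ".join(texts)

    def get_response_from_messages(
        self, query_str: str, messages: List[ChatMessage]
    ) -> str:
        # Degrade to text synthesis by extracting message content.
        return self.get_response_from_text(query_str, [m.content for m in messages])


resp = BaseSynthesizer().get_response("q", [ChatMessage("user", "ctx")])
```

The routing keeps the public get_response signature while letting subclasses override only the variant they care about.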
