Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268) by exzile · Pull Request #4339 · openvinotoolkit/model_server

exzile · 2026-06-29T02:52:13Z

Summary

Adds a minicpm5 tool parser so MiniCPM5 XML-style tool calls are surfaced as OpenAI-style tool_calls on /v3/chat/completions, instead of being left as raw XML in message.content with tool_calls: [].

Addresses #4268.

Note: #4268 is assigned to @przepeck. I opened this as a draft so it can either be reviewed/merged or used as a starting point — whatever is least disruptive to any work already underway. Happy to hand it off, iterate on review, or close it if it duplicates effort.

Approach

MiniCPM5's format (<function name="get_weather"><param name="city">Beijing</param></function>) is structurally almost identical to qwen3coder, so the parser is modeled directly on it:

New src/llm/io_processing/minicpm5/minicpm5_tool_parser.{hpp,cpp} — Minicpm5ToolParserImpl state machine + Minicpm5ToolParser : BaseOutputParser.
Unary parse() and streaming parseChunk() (same first-delta / full-delta protocol as qwen3coder).
Attribute-style tag parsing (name="...", also tolerates '...'), <think>...</think> stripping before parsing, and schema-driven argument typing (string vs number/bool) reusing the existing ToolsSchemas_t mechanism.
Registered minicpm5 in output_parser.cpp and getSupportedToolParserNames(); bazel target wired into output_parsers.

Tests (`src/test/llm/output_parsers/minicpm5_output_parser_test.cpp`)

Unary OutputParser::parse cases (single/multiple calls, integer-typed params, mixed types, <think> stripping, content preservation, content-around-call removal) — these run in CI (tokenizer-gated).
Impl-level state-machine cases: single/multiple calls, integer typing, newline trimming, content removal, single-quote attributes.
Streaming robustness: char-by-char and every fragment size 1..N must yield the same calls as a single-chunk parse; two-calls-char-by-char; leading content + <think> char-by-char.
Edge cases (no crash / no spurious calls): unterminated <function>, empty <param> value, stray < in plain prose.

All 15 impl/streaming tests pass locally (Windows build). The OutputParser-level tests are tokenizer-gated and run in CI (same as the existing qwen3coder tests).

Not yet included (intentional, for v1 follow-up / reviewer input)

export_model.py + docs/llm/reference.md updates for the new --tool_parser minicpm5 value.
End-to-end validation against a real openbmb/MiniCPM5-1B IR (the acceptance test in Add tool_parser minicpm5 for MiniCPM5-1B XML tool calls #4268).
Tool-guided / XGrammar generation for MiniCPM5 (explicitly out of scope per the issue).

Glad to add the docs/export wiring and run the end-to-end check once direction is confirmed.

MiniCPM5 emits XML-style tool calls (<function name="..."><param name="...">value</param></function>) which OVMS had no parser for, so /v3/chat/completions left the XML in message.content with empty tool_calls. Add a minicpm5 tool parser modeled on the qwen3coder XML parser: - new src/llm/io_processing/minicpm5/ tool parser (unary + streaming state machine; attribute-style tags; <think>...</think> stripping; schema-driven argument typing) - register "minicpm5" in output_parser.cpp and supported parser names - bazel target + unit tests Relates to openvinotoolkit#4268. Prototype for maintainer coordination; not yet a PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Harden the prototype's test coverage: - streaming: char-by-char and every-fragment-size feeds must yield the same tool calls as a single-chunk parse (single call, two calls, leading content + <think> block) - edge cases that must not crash or emit spurious calls: unterminated <function>, empty <param> value, stray '<' in plain prose All 15 Minicpm5 impl/stream tests pass locally. Relates to openvinotoolkit#4268. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…easoning, add tests (openvinotoolkit#4268) Responds to review feedback on the MiniCPM5 tool parser: - Extract helpers shared with qwen3coder: trimNewline, jsonTypeOf, enforceStringValue and Python-bool normalization move to io_processing/utils; parseToolSchema/createToolsParametersTypesMap move to base_output_parser (which already owns the ParameterType/ToolsSchemas types). Both parsers now use the shared versions; qwen3coder behavior is unchanged. - Split the long parseUntilStateChange switch into per-state handler methods. - Stop handling <think> in the tool parser; reasoning is now the reasoning parser's job (reuse Qwen3ReasoningParser via reasoning_parser: qwen3, same <think> tags), so reasoning is preserved as reasoning_content. - Tests: drive the public parser streaming path (name delta then arguments delta) and add complex-content cases (embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, special/unicode chars). The special-token-tag handling (requiresStreamingWithSpecialTokens) is deferred to a follow-up; it needs the model to confirm which tags are special tokens and is coupled to the reasoning parser's matching flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exzile · 2026-06-30T14:53:58Z

Thanks for the detailed review @przepeck — pushed updates addressing all of the comments:

Reasoning extracted to a reasoning parser + preserved as reasoning_content (:222, :244): the tool parser no longer handles <think> (dropped the InsideThink state/tags). MiniCPM5 uses the same <think>/</think> as Qwen3, so reasoning is handled by a reasoning parser and emitted as reasoning_content, consistent with the other parsers.
Long method split (:215): parseUntilStateChange is now a small dispatcher delegating to per-state handlers.
Shared code with qwen3coder (:148): trimNewline, jsonTypeOf, enforceStringValue and Python-bool normalization moved to io_processing/utils; parseToolSchema/createToolsParametersTypesMap moved to base_output_parser (which owns the ParameterType/ToolsSchemas types). Both parsers use the shared versions; qwen3coder behavior is unchanged.
Tests (:304, :1): added a test driving the public parseChunk streaming path (name delta once, arguments delta after the function closes), plus complex-content cases — embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, and special/unicode characters.
Special-token tags (:433): confirmed and fixed. In the openbmb/MiniCPM5-1B tokenizer the tool tags are special tokens — <function=18, </function>=19, <param=20, </param>=21 (all special: true) — so the default streamer skips them and the tag-scanning parser never saw them while streaming. The tool parser now overrides requiresStreamingWithSpecialTokens() to return true. Because the framework requires the paired reasoning parser to agree on that flag, and MiniCPM5's <think>/</think> are non-special tokens (so the Qwen3 reasoning logic still applies), I added a thin Minicpm5ReasoningParser subclass that flips the flag and registered minicpm5 as a reasoning parser. A deployment uses tool_parser: minicpm5 + reasoning_parser: minicpm5.

All changes compile-verified locally (parser libraries + output_parsers build clean); the full gtest run is via CI here, as my local ovms_test build is blocked by an unrelated genai-version skew in the audio module.

…rser (openvinotoolkit#4268) MiniCPM5-1B emits its tool-call tags as special tokens (<function id=18, </function>=19, <param>=20, </param>=21 in the openbmb/MiniCPM5-1B tokenizer), which the text streamer skips by default — so the tag-scanning parser never saw them during streaming. Override requiresStreamingWithSpecialTokens() to return true so they are preserved. The framework requires a paired reasoning parser to agree on that flag. Since MiniCPM5's <think>/</think> are ordinary (non-special) tokens, the Qwen3 reasoning logic still applies verbatim; add a thin Minicpm5ReasoningParser subclass that only flips the flag, and register "minicpm5" as a reasoning parser. Tests updated to pair minicpm5+minicpm5 and assert the flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

exzile · 2026-06-30T15:41:01Z

Local verification update

Built //src:ovms_test and ran the minicpm5 suite locally — 30 tests, 22 passed, 8 failed, where the 8 failures are purely an environmental missing-model issue, not a code problem:

The 8 failing tests (Minicpm5OutputParserTest.Parse*) call tokenizer->encode(...) and fail with Tokenizer::encode is not available — openvino_tokenizer.xml was not provided or it was not loaded correctly because the tokenizer IR isn't present in my local checkout. These are the tests that need a real model and will run under CI.
All 22 model-independent tests pass, including:
- the Minicpm5ToolParserImplTest core + edge-case + streaming-robustness tests,
- the public streaming-delta test (name delta once, arguments delta after </function>),
- the complex-content tests (embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, unicode/special chars),
- and a RequiresStreamingWithSpecialTokens test confirming the tool parser and the new Minicpm5ReasoningParser both report true and the minicpm5 + minicpm5 pairing constructs without tripping the requiresStreamingWithSpecialTokens() consistency check.

So the refactor and the special-token handling are behaviorally verified for everything testable without a downloaded model; the end-to-end/encode-dependent cases are left to CI.

Addresses review feedback: no MiniCPM5 tokenizer exists in the test assets. The parser operates on decoded text (tokenizer.decode) via plain string/XML matching, never token ids or vocab-specific special tokens, so any tokenizer that round-trips the test strings via encode()+decode() is safe to use here. Signed-off-by: exzile <joeypongallo@gmail.com>

Addresses review feedback: Minicpm5ReasoningParser previously inherited Qwen3ReasoningParser's text-based parse() verbatim and only overrode requiresStreamingWithSpecialTokens(). MiniCPM5's <think>/</think> tags are real dedicated special tokens in its tokenizer (ids 8 and 9), not incidental text, so parse() now locates the reasoning segment by scanning generatedTokens for these token ids directly -- mirroring the existing Gemma4ReasoningParser pattern -- instead of searching decoded text for the tag substrings. requiresStreamingWithSpecialTokens() remains true: MiniCPM5's tool-call tags (<function>, <param>, ...) are also special tokens, and the framework requires the paired tool/reasoning parser to agree on this setting (same as Gemma4). Test fix: the existing reasoning-parser test encoded '<think>...</think>' text with the Qwen3-8B stand-in tokenizer used elsewhere in this test file (no MiniCPM5 tokenizer test asset exists). Qwen3-8B has its own, different ids for those strings (151667/151668), so the new token-id based parse() would not have found them. The test now assembles generatedTokens directly with the real ids (8/9) as delimiters and uses the stand-in tokenizer only to encode/decode the surrounding free-form text, which round-trips correctly regardless of whose vocab it is. Signed-off-by: exzile <joeypongallo@gmail.com>

…soning Self-review fix: parse() previously decoded the content spans (outside the <think>...</think> boundary) with skip_special_tokens(true), copied from Gemma4ReasoningParser's pattern. That's safe for Gemma4 because its tool parser reads raw token ids, but MiniCPM5's tool parser is text-based -- Minicpm5ToolParserImpl matches the literal "<function"/"<param" substrings in decoded content. Those tags are themselves special tokens in MiniCPM5's tokenizer (<function = id 18, <param = id 20, both "special": true), so skip_special_tokens(true) silently erased them from content before the tool parser ever ran, breaking tool-call extraction whenever a reasoning block was present. Content spans now decode with skip_special_tokens(false), matching the framework's own default full-decode behavior. Only the extracted reasoning text keeps skip_special_tokens(true), since it's a human-facing string not matched against tags downstream. Also documented the existing find-first-occurrence limitation shared with Gemma4ReasoningParser (a stray </think> before any <think> would hide a later well-formed pair). Signed-off-by: exzile <joeypongallo@gmail.com>

…verage Addresses review feedback: the Minicpm5ToolParserImplStreamTest section (streamInFragments-based) only asserted that the aggregated final result is fragmentation-invariant, not that the actual user-facing streaming deltas are surfaced correctly. Retitled/documented that section to make its scope explicit and point to Minicpm5PublicStreamTest (already added in an earlier review round, commit 869484c) for the delta-timing concern, which drives the real Minicpm5ToolParser::parseChunk API and asserts on the actual emitted name/arguments deltas. That existing coverage was single-call only, so added EmitsDeltasForTwoSequentialCalls: verifies each of two sequential tool calls gets exactly one name delta and one arguments delta, in order, each carrying the correct (distinct, increasing) tool_call index -- not just that the final parse result is correct. Signed-off-by: exzile <joeypongallo@gmail.com>

exzile and others added 2 commits June 28, 2026 22:42

exzile mentioned this pull request Jun 29, 2026

Add tool_parser minicpm5 for MiniCPM5-1B XML tool calls #4268

Open

atobiszei requested a review from przepeck June 29, 2026 07:32

Merge branch 'main' into feature/minicpm5-tool-parser

5c6e4a7

exzile marked this pull request as ready for review June 29, 2026 15:12

przepeck reviewed Jun 30, 2026

View reviewed changes

exzile and others added 2 commits June 30, 2026 09:50

Merge branch 'main' into feature/minicpm5-tool-parser

e9a9d86

exzile force-pushed the feature/minicpm5-tool-parser branch from 77e743a to 869484c Compare June 30, 2026 14:53

przepeck reviewed Jul 1, 2026

View reviewed changes

Comment thread src/test/llm/output_parsers/minicpm5_output_parser_test.cpp

przepeck reviewed Jul 1, 2026

View reviewed changes

Comment thread src/llm/io_processing/minicpm5/minicpm5_reasoning_parser.hpp

exzile added 5 commits July 1, 2026 09:48

Merge branch 'main' into feature/minicpm5-tool-parser

2243ea4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339

Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339
exzile wants to merge 11 commits into
openvinotoolkit:mainfrom
exzile:feature/minicpm5-tool-parser

exzile commented Jun 29, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

exzile commented Jun 30, 2026 •

edited

Loading

Uh oh!

exzile commented Jun 30, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

exzile commented Jun 29, 2026

Summary

Approach

Tests (src/test/llm/output_parsers/minicpm5_output_parser_test.cpp)

Not yet included (intentional, for v1 follow-up / reviewer input)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

exzile commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

exzile commented Jun 30, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Tests (`src/test/llm/output_parsers/minicpm5_output_parser_test.cpp`)

exzile commented Jun 30, 2026 •

edited

Loading