Skip to content

Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339

Open
exzile wants to merge 11 commits into
openvinotoolkit:mainfrom
exzile:feature/minicpm5-tool-parser
Open

Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339
exzile wants to merge 11 commits into
openvinotoolkit:mainfrom
exzile:feature/minicpm5-tool-parser

Conversation

@exzile

@exzile exzile commented Jun 29, 2026

Copy link
Copy Markdown

Summary

Adds a minicpm5 tool parser so MiniCPM5 XML-style tool calls are surfaced as OpenAI-style tool_calls on /v3/chat/completions, instead of being left as raw XML in message.content with tool_calls: [].

Addresses #4268.

Note: #4268 is assigned to @przepeck. I opened this as a draft so it can either be reviewed/merged or used as a starting point — whatever is least disruptive to any work already underway. Happy to hand it off, iterate on review, or close it if it duplicates effort.

Approach

MiniCPM5's format (<function name="get_weather"><param name="city">Beijing</param></function>) is structurally almost identical to qwen3coder, so the parser is modeled directly on it:

  • New src/llm/io_processing/minicpm5/minicpm5_tool_parser.{hpp,cpp}Minicpm5ToolParserImpl state machine + Minicpm5ToolParser : BaseOutputParser.
  • Unary parse() and streaming parseChunk() (same first-delta / full-delta protocol as qwen3coder).
  • Attribute-style tag parsing (name="...", also tolerates '...'), <think>...</think> stripping before parsing, and schema-driven argument typing (string vs number/bool) reusing the existing ToolsSchemas_t mechanism.
  • Registered minicpm5 in output_parser.cpp and getSupportedToolParserNames(); bazel target wired into output_parsers.

Tests (src/test/llm/output_parsers/minicpm5_output_parser_test.cpp)

  • Unary OutputParser::parse cases (single/multiple calls, integer-typed params, mixed types, <think> stripping, content preservation, content-around-call removal) — these run in CI (tokenizer-gated).
  • Impl-level state-machine cases: single/multiple calls, integer typing, newline trimming, content removal, single-quote attributes.
  • Streaming robustness: char-by-char and every fragment size 1..N must yield the same calls as a single-chunk parse; two-calls-char-by-char; leading content + <think> char-by-char.
  • Edge cases (no crash / no spurious calls): unterminated <function>, empty <param> value, stray < in plain prose.

All 15 impl/streaming tests pass locally (Windows build). The OutputParser-level tests are tokenizer-gated and run in CI (same as the existing qwen3coder tests).

Not yet included (intentional, for v1 follow-up / reviewer input)

  • export_model.py + docs/llm/reference.md updates for the new --tool_parser minicpm5 value.
  • End-to-end validation against a real openbmb/MiniCPM5-1B IR (the acceptance test in Add tool_parser minicpm5 for MiniCPM5-1B XML tool calls #4268).
  • Tool-guided / XGrammar generation for MiniCPM5 (explicitly out of scope per the issue).

Glad to add the docs/export wiring and run the end-to-end check once direction is confirmed.

exzile and others added 2 commits June 28, 2026 22:42
MiniCPM5 emits XML-style tool calls (<function name="..."><param
name="...">value</param></function>) which OVMS had no parser for, so
/v3/chat/completions left the XML in message.content with empty
tool_calls. Add a minicpm5 tool parser modeled on the qwen3coder XML
parser:
- new src/llm/io_processing/minicpm5/ tool parser (unary + streaming
  state machine; attribute-style tags; <think>...</think> stripping;
  schema-driven argument typing)
- register "minicpm5" in output_parser.cpp and supported parser names
- bazel target + unit tests

Relates to openvinotoolkit#4268. Prototype for maintainer coordination; not yet a PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Harden the prototype's test coverage:
- streaming: char-by-char and every-fragment-size feeds must yield the
  same tool calls as a single-chunk parse (single call, two calls,
  leading content + <think> block)
- edge cases that must not crash or emit spurious calls: unterminated
  <function>, empty <param> value, stray '<' in plain prose

All 15 Minicpm5 impl/stream tests pass locally.

Relates to openvinotoolkit#4268.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@exzile exzile marked this pull request as ready for review June 29, 2026 15:12
Comment thread src/llm/io_processing/minicpm5/minicpm5_tool_parser.cpp Outdated
Comment thread src/llm/io_processing/minicpm5/minicpm5_tool_parser.cpp
Comment thread src/llm/io_processing/minicpm5/minicpm5_tool_parser.cpp
Comment thread src/llm/io_processing/minicpm5/minicpm5_tool_parser.cpp Outdated
Comment thread src/llm/io_processing/minicpm5/minicpm5_tool_parser.cpp Outdated
Comment thread src/test/llm/output_parsers/minicpm5_output_parser_test.cpp Outdated
Comment thread src/test/llm/output_parsers/minicpm5_output_parser_test.cpp
exzile and others added 2 commits June 30, 2026 09:50
…easoning, add tests (openvinotoolkit#4268)

Responds to review feedback on the MiniCPM5 tool parser:
- Extract helpers shared with qwen3coder: trimNewline, jsonTypeOf,
  enforceStringValue and Python-bool normalization move to io_processing/utils;
  parseToolSchema/createToolsParametersTypesMap move to base_output_parser
  (which already owns the ParameterType/ToolsSchemas types). Both parsers now
  use the shared versions; qwen3coder behavior is unchanged.
- Split the long parseUntilStateChange switch into per-state handler methods.
- Stop handling <think> in the tool parser; reasoning is now the reasoning
  parser's job (reuse Qwen3ReasoningParser via reasoning_parser: qwen3, same
  <think> tags), so reasoning is preserved as reasoning_content.
- Tests: drive the public parser streaming path (name delta then arguments
  delta) and add complex-content cases (embedded quotes/backslashes, code with
  braces/newlines, JSON-looking string values, special/unicode chars).

The special-token-tag handling (requiresStreamingWithSpecialTokens) is deferred
to a follow-up; it needs the model to confirm which tags are special tokens and
is coupled to the reasoning parser's matching flag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@exzile exzile force-pushed the feature/minicpm5-tool-parser branch from 77e743a to 869484c Compare June 30, 2026 14:53
@exzile

exzile commented Jun 30, 2026

Copy link
Copy Markdown
Author

Thanks for the detailed review @przepeck — pushed updates addressing all of the comments:

  • Reasoning extracted to a reasoning parser + preserved as reasoning_content (:222, :244): the tool parser no longer handles <think> (dropped the InsideThink state/tags). MiniCPM5 uses the same <think>/</think> as Qwen3, so reasoning is handled by a reasoning parser and emitted as reasoning_content, consistent with the other parsers.
  • Long method split (:215): parseUntilStateChange is now a small dispatcher delegating to per-state handlers.
  • Shared code with qwen3coder (:148): trimNewline, jsonTypeOf, enforceStringValue and Python-bool normalization moved to io_processing/utils; parseToolSchema/createToolsParametersTypesMap moved to base_output_parser (which owns the ParameterType/ToolsSchemas types). Both parsers use the shared versions; qwen3coder behavior is unchanged.
  • Tests (:304, :1): added a test driving the public parseChunk streaming path (name delta once, arguments delta after the function closes), plus complex-content cases — embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, and special/unicode characters.
  • Special-token tags (:433): confirmed and fixed. In the openbmb/MiniCPM5-1B tokenizer the tool tags are special tokens<function=18, </function>=19, <param=20, </param>=21 (all special: true) — so the default streamer skips them and the tag-scanning parser never saw them while streaming. The tool parser now overrides requiresStreamingWithSpecialTokens() to return true. Because the framework requires the paired reasoning parser to agree on that flag, and MiniCPM5's <think>/</think> are non-special tokens (so the Qwen3 reasoning logic still applies), I added a thin Minicpm5ReasoningParser subclass that flips the flag and registered minicpm5 as a reasoning parser. A deployment uses tool_parser: minicpm5 + reasoning_parser: minicpm5.

All changes compile-verified locally (parser libraries + output_parsers build clean); the full gtest run is via CI here, as my local ovms_test build is blocked by an unrelated genai-version skew in the audio module.

…rser (openvinotoolkit#4268)

MiniCPM5-1B emits its tool-call tags as special tokens (<function id=18,
</function>=19, <param>=20, </param>=21 in the openbmb/MiniCPM5-1B tokenizer),
which the text streamer skips by default — so the tag-scanning parser never
saw them during streaming. Override requiresStreamingWithSpecialTokens() to
return true so they are preserved.

The framework requires a paired reasoning parser to agree on that flag. Since
MiniCPM5's <think>/</think> are ordinary (non-special) tokens, the Qwen3
reasoning logic still applies verbatim; add a thin Minicpm5ReasoningParser
subclass that only flips the flag, and register "minicpm5" as a reasoning
parser. Tests updated to pair minicpm5+minicpm5 and assert the flag.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@exzile

exzile commented Jun 30, 2026

Copy link
Copy Markdown
Author

Local verification update

Built //src:ovms_test and ran the minicpm5 suite locally — 30 tests, 22 passed, 8 failed, where the 8 failures are purely an environmental missing-model issue, not a code problem:

  • The 8 failing tests (Minicpm5OutputParserTest.Parse*) call tokenizer->encode(...) and fail with Tokenizer::encode is not available — openvino_tokenizer.xml was not provided or it was not loaded correctly because the tokenizer IR isn't present in my local checkout. These are the tests that need a real model and will run under CI.
  • All 22 model-independent tests pass, including:
    • the Minicpm5ToolParserImplTest core + edge-case + streaming-robustness tests,
    • the public streaming-delta test (name delta once, arguments delta after </function>),
    • the complex-content tests (embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, unicode/special chars),
    • and a RequiresStreamingWithSpecialTokens test confirming the tool parser and the new Minicpm5ReasoningParser both report true and the minicpm5 + minicpm5 pairing constructs without tripping the requiresStreamingWithSpecialTokens() consistency check.

So the refactor and the special-token handling are behaviorally verified for everything testable without a downloaded model; the end-to-end/encode-dependent cases are left to CI.

Comment thread src/test/llm/output_parsers/minicpm5_output_parser_test.cpp
Comment thread src/llm/io_processing/minicpm5/minicpm5_reasoning_parser.hpp
exzile added 5 commits July 1, 2026 09:48
Addresses review feedback: no MiniCPM5 tokenizer exists in the test
assets. The parser operates on decoded text (tokenizer.decode) via
plain string/XML matching, never token ids or vocab-specific special
tokens, so any tokenizer that round-trips the test strings via
encode()+decode() is safe to use here.

Signed-off-by: exzile <joeypongallo@gmail.com>
Addresses review feedback: Minicpm5ReasoningParser previously inherited
Qwen3ReasoningParser's text-based parse() verbatim and only overrode
requiresStreamingWithSpecialTokens(). MiniCPM5's <think>/</think> tags
are real dedicated special tokens in its tokenizer (ids 8 and 9), not
incidental text, so parse() now locates the reasoning segment by
scanning generatedTokens for these token ids directly -- mirroring the
existing Gemma4ReasoningParser pattern -- instead of searching decoded
text for the tag substrings.

requiresStreamingWithSpecialTokens() remains true: MiniCPM5's tool-call
tags (<function>, <param>, ...) are also special tokens, and the
framework requires the paired tool/reasoning parser to agree on this
setting (same as Gemma4).

Test fix: the existing reasoning-parser test encoded '<think>...</think>'
text with the Qwen3-8B stand-in tokenizer used elsewhere in this test
file (no MiniCPM5 tokenizer test asset exists). Qwen3-8B has its own,
different ids for those strings (151667/151668), so the new token-id
based parse() would not have found them. The test now assembles
generatedTokens directly with the real ids (8/9) as delimiters and uses
the stand-in tokenizer only to encode/decode the surrounding free-form
text, which round-trips correctly regardless of whose vocab it is.

Signed-off-by: exzile <joeypongallo@gmail.com>
…soning

Self-review fix: parse() previously decoded the content spans (outside
the <think>...</think> boundary) with skip_special_tokens(true), copied
from Gemma4ReasoningParser's pattern. That's safe for Gemma4 because its
tool parser reads raw token ids, but MiniCPM5's tool parser is text-based
-- Minicpm5ToolParserImpl matches the literal "<function"/"<param"
substrings in decoded content. Those tags are themselves special tokens
in MiniCPM5's tokenizer (<function = id 18, <param = id 20, both
"special": true), so skip_special_tokens(true) silently erased them
from content before the tool parser ever ran, breaking tool-call
extraction whenever a reasoning block was present.

Content spans now decode with skip_special_tokens(false), matching the
framework's own default full-decode behavior. Only the extracted
reasoning text keeps skip_special_tokens(true), since it's a
human-facing string not matched against tags downstream. Also documented
the existing find-first-occurrence limitation shared with
Gemma4ReasoningParser (a stray </think> before any <think> would hide a
later well-formed pair).

Signed-off-by: exzile <joeypongallo@gmail.com>
…verage

Addresses review feedback: the Minicpm5ToolParserImplStreamTest section
(streamInFragments-based) only asserted that the aggregated final result
is fragmentation-invariant, not that the actual user-facing streaming
deltas are surfaced correctly. Retitled/documented that section to make
its scope explicit and point to Minicpm5PublicStreamTest (already added
in an earlier review round, commit 869484c) for the delta-timing
concern, which drives the real Minicpm5ToolParser::parseChunk API and
asserts on the actual emitted name/arguments deltas.

That existing coverage was single-call only, so added
EmitsDeltasForTwoSequentialCalls: verifies each of two sequential tool
calls gets exactly one name delta and one arguments delta, in order,
each carrying the correct (distinct, increasing) tool_call index -- not
just that the final parse result is correct.

Signed-off-by: exzile <joeypongallo@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants