Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339
Open
exzile wants to merge 11 commits into
Open
Add minicpm5 tool_parser for MiniCPM5 XML tool calls (draft, #4268)#4339exzile wants to merge 11 commits into
exzile wants to merge 11 commits into
Conversation
MiniCPM5 emits XML-style tool calls (<function name="..."><param name="...">value</param></function>) which OVMS had no parser for, so /v3/chat/completions left the XML in message.content with empty tool_calls. Add a minicpm5 tool parser modeled on the qwen3coder XML parser: - new src/llm/io_processing/minicpm5/ tool parser (unary + streaming state machine; attribute-style tags; <think>...</think> stripping; schema-driven argument typing) - register "minicpm5" in output_parser.cpp and supported parser names - bazel target + unit tests Relates to openvinotoolkit#4268. Prototype for maintainer coordination; not yet a PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Harden the prototype's test coverage: - streaming: char-by-char and every-fragment-size feeds must yield the same tool calls as a single-chunk parse (single call, two calls, leading content + <think> block) - edge cases that must not crash or emit spurious calls: unterminated <function>, empty <param> value, stray '<' in plain prose All 15 Minicpm5 impl/stream tests pass locally. Relates to openvinotoolkit#4268. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
przepeck
reviewed
Jun 30, 2026
…easoning, add tests (openvinotoolkit#4268) Responds to review feedback on the MiniCPM5 tool parser: - Extract helpers shared with qwen3coder: trimNewline, jsonTypeOf, enforceStringValue and Python-bool normalization move to io_processing/utils; parseToolSchema/createToolsParametersTypesMap move to base_output_parser (which already owns the ParameterType/ToolsSchemas types). Both parsers now use the shared versions; qwen3coder behavior is unchanged. - Split the long parseUntilStateChange switch into per-state handler methods. - Stop handling <think> in the tool parser; reasoning is now the reasoning parser's job (reuse Qwen3ReasoningParser via reasoning_parser: qwen3, same <think> tags), so reasoning is preserved as reasoning_content. - Tests: drive the public parser streaming path (name delta then arguments delta) and add complex-content cases (embedded quotes/backslashes, code with braces/newlines, JSON-looking string values, special/unicode chars). The special-token-tag handling (requiresStreamingWithSpecialTokens) is deferred to a follow-up; it needs the model to confirm which tags are special tokens and is coupled to the reasoning parser's matching flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
77e743a to
869484c
Compare
Author
|
Thanks for the detailed review @przepeck — pushed updates addressing all of the comments:
All changes compile-verified locally (parser libraries + |
…rser (openvinotoolkit#4268) MiniCPM5-1B emits its tool-call tags as special tokens (<function id=18, </function>=19, <param>=20, </param>=21 in the openbmb/MiniCPM5-1B tokenizer), which the text streamer skips by default — so the tag-scanning parser never saw them during streaming. Override requiresStreamingWithSpecialTokens() to return true so they are preserved. The framework requires a paired reasoning parser to agree on that flag. Since MiniCPM5's <think>/</think> are ordinary (non-special) tokens, the Qwen3 reasoning logic still applies verbatim; add a thin Minicpm5ReasoningParser subclass that only flips the flag, and register "minicpm5" as a reasoning parser. Tests updated to pair minicpm5+minicpm5 and assert the flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Author
|
Local verification update Built
So the refactor and the special-token handling are behaviorally verified for everything testable without a downloaded model; the end-to-end/encode-dependent cases are left to CI. |
przepeck
reviewed
Jul 1, 2026
przepeck
reviewed
Jul 1, 2026
Addresses review feedback: no MiniCPM5 tokenizer exists in the test assets. The parser operates on decoded text (tokenizer.decode) via plain string/XML matching, never token ids or vocab-specific special tokens, so any tokenizer that round-trips the test strings via encode()+decode() is safe to use here. Signed-off-by: exzile <joeypongallo@gmail.com>
Addresses review feedback: Minicpm5ReasoningParser previously inherited Qwen3ReasoningParser's text-based parse() verbatim and only overrode requiresStreamingWithSpecialTokens(). MiniCPM5's <think>/</think> tags are real dedicated special tokens in its tokenizer (ids 8 and 9), not incidental text, so parse() now locates the reasoning segment by scanning generatedTokens for these token ids directly -- mirroring the existing Gemma4ReasoningParser pattern -- instead of searching decoded text for the tag substrings. requiresStreamingWithSpecialTokens() remains true: MiniCPM5's tool-call tags (<function>, <param>, ...) are also special tokens, and the framework requires the paired tool/reasoning parser to agree on this setting (same as Gemma4). Test fix: the existing reasoning-parser test encoded '<think>...</think>' text with the Qwen3-8B stand-in tokenizer used elsewhere in this test file (no MiniCPM5 tokenizer test asset exists). Qwen3-8B has its own, different ids for those strings (151667/151668), so the new token-id based parse() would not have found them. The test now assembles generatedTokens directly with the real ids (8/9) as delimiters and uses the stand-in tokenizer only to encode/decode the surrounding free-form text, which round-trips correctly regardless of whose vocab it is. Signed-off-by: exzile <joeypongallo@gmail.com>
…soning Self-review fix: parse() previously decoded the content spans (outside the <think>...</think> boundary) with skip_special_tokens(true), copied from Gemma4ReasoningParser's pattern. That's safe for Gemma4 because its tool parser reads raw token ids, but MiniCPM5's tool parser is text-based -- Minicpm5ToolParserImpl matches the literal "<function"/"<param" substrings in decoded content. Those tags are themselves special tokens in MiniCPM5's tokenizer (<function = id 18, <param = id 20, both "special": true), so skip_special_tokens(true) silently erased them from content before the tool parser ever ran, breaking tool-call extraction whenever a reasoning block was present. Content spans now decode with skip_special_tokens(false), matching the framework's own default full-decode behavior. Only the extracted reasoning text keeps skip_special_tokens(true), since it's a human-facing string not matched against tags downstream. Also documented the existing find-first-occurrence limitation shared with Gemma4ReasoningParser (a stray </think> before any <think> would hide a later well-formed pair). Signed-off-by: exzile <joeypongallo@gmail.com>
…verage Addresses review feedback: the Minicpm5ToolParserImplStreamTest section (streamInFragments-based) only asserted that the aggregated final result is fragmentation-invariant, not that the actual user-facing streaming deltas are surfaced correctly. Retitled/documented that section to make its scope explicit and point to Minicpm5PublicStreamTest (already added in an earlier review round, commit 869484c) for the delta-timing concern, which drives the real Minicpm5ToolParser::parseChunk API and asserts on the actual emitted name/arguments deltas. That existing coverage was single-call only, so added EmitsDeltasForTwoSequentialCalls: verifies each of two sequential tool calls gets exactly one name delta and one arguments delta, in order, each carrying the correct (distinct, increasing) tool_call index -- not just that the final parse result is correct. Signed-off-by: exzile <joeypongallo@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
minicpm5tool parser so MiniCPM5 XML-style tool calls are surfaced as OpenAI-styletool_callson/v3/chat/completions, instead of being left as raw XML inmessage.contentwithtool_calls: [].Addresses #4268.
Approach
MiniCPM5's format (
<function name="get_weather"><param name="city">Beijing</param></function>) is structurally almost identical toqwen3coder, so the parser is modeled directly on it:src/llm/io_processing/minicpm5/minicpm5_tool_parser.{hpp,cpp}—Minicpm5ToolParserImplstate machine +Minicpm5ToolParser : BaseOutputParser.parse()and streamingparseChunk()(same first-delta / full-delta protocol as qwen3coder).name="...", also tolerates'...'),<think>...</think>stripping before parsing, and schema-driven argument typing (string vs number/bool) reusing the existingToolsSchemas_tmechanism.minicpm5inoutput_parser.cppandgetSupportedToolParserNames(); bazel target wired intooutput_parsers.Tests (
src/test/llm/output_parsers/minicpm5_output_parser_test.cpp)OutputParser::parsecases (single/multiple calls, integer-typed params, mixed types,<think>stripping, content preservation, content-around-call removal) — these run in CI (tokenizer-gated).<think>char-by-char.<function>, empty<param>value, stray<in plain prose.All 15 impl/streaming tests pass locally (Windows build). The
OutputParser-level tests are tokenizer-gated and run in CI (same as the existing qwen3coder tests).Not yet included (intentional, for v1 follow-up / reviewer input)
export_model.py+docs/llm/reference.mdupdates for the new--tool_parser minicpm5value.openbmb/MiniCPM5-1BIR (the acceptance test in Add tool_parser minicpm5 for MiniCPM5-1B XML tool calls #4268).Glad to add the docs/export wiring and run the end-to-end check once direction is confirmed.