generation_config.json path override per LLM node #4233

Description

@korund

Component: LLM continuous batching, LLMCalculatorOptions / mediapipe registration
OVMS version: 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)

Context

When several deployments share the same on-disk model directory but need different generation defaults (e.g. different num_assistant_tokens, temperature, or sampling settings per served endpoint), the only current option is to duplicate the whole model directory, weights included, because OVMS always reads a file named generation_config.json from models_path. For multi-gigabyte LLMs this is impractical.
The same problem exists for graph.pbtxt, but is already solved there: graph_path in the mediapipe config entry lets one model directory back several deployments with different graphs. There is no equivalent for generation_config.json.
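
For reference, the existing graph_path arrangement can be sketched roughly as below. This is a minimal illustration that assumes the usual mediapipe_config_list entry shape (name, base_path, graph_path); the directory, served names, and graph file names are hypothetical.

```python
import json

# Hypothetical layout: one weights directory shared by every variant.
SHARED_MODEL_DIR = "/models/llm-shared"


def build_config(variants):
    """Return an OVMS-style config dict with one mediapipe entry per served name.

    `variants` maps a served name to the graph file that entry should load.
    Every entry points at the same base_path, so the weights exist only once.
    """
    return {
        "model_config_list": [],
        "mediapipe_config_list": [
            {"name": name, "base_path": SHARED_MODEL_DIR, "graph_path": graph_file}
            for name, graph_file in variants.items()
        ],
    }


config = build_config({
    "llm-creative": "graph_creative.pbtxt",  # hypothetical graph file
    "llm-precise": "graph_precise.pbtxt",    # hypothetical graph file
})
print(json.dumps(config, indent=2))
```

Both graphs can set the same models_path, but they then also share the single generation_config.json in that directory, which is the gap described above.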

Related to #4221

Question

Would it be feasible to add a per-LLM-node override for the generation-config file path — analogous to graph_path? A natural shape would be either:

  • a generation_config_path field in LLMCalculatorOptions (next to models_path), absolute or relative to models_path; or
  • a sibling field at the mediapipe config-entry level (next to graph_path).

From a quick read of openvino.genai, ContinuousBatchingPipeline accepts an optional GenerationConfig at construction and exposes set_config() post-construction, so the underlying mechanism appears to be already in place. The work seems contained within src/llm/language_model/continuous_batching/servable_initializer.cpp on the OVMS side.
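
The "absolute or relative to models_path" rule proposed above could resolve along these lines. This is only a sketch of the intended semantics, not OVMS code: generation_config_path is the proposed (not yet existing) field, and resolve_generation_config is a hypothetical helper name.

```python
from pathlib import PurePosixPath

DEFAULT_NAME = "generation_config.json"


def resolve_generation_config(models_path, generation_config_path=None):
    """Pick the generation-config file for one LLM node.

    - no override    -> <models_path>/generation_config.json (current behavior)
    - relative path  -> resolved against models_path
    - absolute path  -> used as given
    `generation_config_path` is the proposed, hypothetical option.
    """
    base = PurePosixPath(models_path)
    if not generation_config_path:
        return base / DEFAULT_NAME
    candidate = PurePosixPath(generation_config_path)
    return candidate if candidate.is_absolute() else base / candidate


# Two variants sharing one weights directory (paths are hypothetical).
print(resolve_generation_config("/models/llm-shared"))
print(resolve_generation_config("/models/llm-shared", "variants/creative.json"))
print(resolve_generation_config("/models/llm-shared", "/etc/ovms/precise.json"))
```

Falling back to the fixed file name when the field is unset would keep existing deployments unchanged.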

Use case

Multiple served names backed by the same model weights, each with its own generation defaults. Without per-entry generation-config selection, each variant requires a full copy of the model directory on disk.

Open questions

  • Is there a reason this hasn't been exposed yet — for example, a planned different mechanism (per-deployment overrides through some other channel), or an interaction with model auto-detection/conversion that I'm missing?
  • Is one of the placement options preferred from the architecture side?

Labels: enhancement (New feature or request)
