Commit 6c3a7aa

Merge branch 'main' into main
2 parents 5acee3e + 0357cb9 commit 6c3a7aa

10 files changed

Lines changed: 706 additions & 555 deletions

.claude/skills/ptq/SKILL.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -124,9 +124,9 @@ Report the path and size to the user.
 
 ## Common Pitfalls
 
-- **Transformers version**: Newer models (e.g., Devstral/ministral3) may require a transformers version not yet in the container. Check `config.json` for `transformers_version` and upgrade if needed. Install ModelOpt first, then upgrade transformers **with** deps (not `--no-deps`) to pull compatible `huggingface_hub`
+- **Transformers version**: New models may need a newer version of transformers than what's installed. Check `config.json` for `transformers_version`. In containers, beware of `PIP_CONSTRAINT` blocking upgrades — see `references/slurm-setup-ptq.md` for workarounds
 - **Gated datasets**: Some calibration datasets require HF authentication. Ensure `HF_TOKEN` is set in the job environment, or use `--dataset cnn_dailymail` as a non-gated alternative
-- **NFS root_squash + Docker**: Docker runs as root, but NFS squashes root to `nobody`. Use `docker run --user $(id -u):$(id -g)`, or `chmod -R a+rwX` on needed directories as a fallback. See `skills/common/slurm-setup.md` section 5
+- **NFS root_squash + Docker**: See `skills/common/slurm-setup.md` section 5
 
 ## References
 
```
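The gated-dataset pitfall can be sketched as a small fallback helper. This is an illustrative sketch, not part of the skill: `calib_dataset_args` is our own name, and the flag values simply mirror the `--dataset cnn_dailymail` fallback described above.

```python
import os


def calib_dataset_args(gated_dataset):
    """Hypothetical helper: pick calibration dataset flags, falling back to
    the non-gated cnn_dailymail when no HF token is in the environment."""
    if os.environ.get("HF_TOKEN"):
        return ["--dataset", gated_dataset]
    return ["--dataset", "cnn_dailymail"]
```

A job launcher could append the returned pair to its command line before submission.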
.claude/skills/ptq/references/slurm-setup-ptq.md

Lines changed: 36 additions & 18 deletions

````diff
@@ -7,29 +7,54 @@ monitoring), see `skills/common/slurm-setup.md`.
 
 ## 1. Container
 
-Get the recommended image version from `examples/llm_ptq/README.md`, then look for a `.sqsh` file in the workspace and common sibling directories:
+Get the recommended image version from `examples/llm_ptq/README.md`, then look for an existing `.sqsh` file:
 
 ```bash
 ls *.sqsh ../*.sqsh ~/containers/*.sqsh 2>/dev/null
 ```
 
-If you find a `.sqsh` but aren't sure of its version, check it:
+**If a `.sqsh` exists**, use it directly with `--container-image=<path>`. Skip import.
+
+**If no `.sqsh` exists**, import with enroot (caches for subsequent smoke tests and reruns):
 
 ```bash
-srun --container-image=<path/to/container.sqsh> --ntasks=1 bash -c \
-  "pip show tensorrt-llm 2>/dev/null | grep Version || cat /VERSION 2>/dev/null || echo unknown"
+export ENROOT_CACHE_PATH=/path/to/writable/enroot-cache
+export ENROOT_DATA_PATH=/path/to/writable/enroot-data
+mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH"
+enroot import --output /path/to/container.sqsh docker://nvcr.io#nvidia/tensorrt-llm/release:<version>
 ```
 
-If no `.sqsh` exists, import it with enroot. Set writable cache paths first — the default `/raid/containers` is often not writable:
+If enroot import fails (e.g., permission errors on lustre), use pyxis inline pull as fallback — pass the NGC URI directly to `--container-image="nvcr.io/nvidia/tensorrt-llm/release:<version>"`. Note this re-pulls on every job.
+
+### Container dependency pitfalls
+
+**New models may need newer transformers** than what's in the container:
 
 ```bash
-export ENROOT_CACHE_PATH=/path/to/writable/enroot-cache
-export ENROOT_DATA_PATH=/path/to/writable/enroot-data
-export TMPDIR=/path/to/writable/tmp
-mkdir -p "$ENROOT_CACHE_PATH" "$ENROOT_DATA_PATH" "$TMPDIR"
+pip install -U transformers
+```
+
+For unlisted models that need unreleased transformers (e.g., from git), see `references/unsupported-models.md` Step A.
+
+**Prefer `PYTHONPATH`** to use the synced ModelOpt source instead of installing inside the container — this avoids risking dependency conflicts (e.g., `pip install -U nvidia-modelopt[hf]` can upgrade PyTorch and break other packages):
+
+```bash
+export PYTHONPATH=/path/to/Model-Optimizer:$PYTHONPATH
+```
+
+If `PYTHONPATH` doesn't work due to missing compiled extensions, fall back to `pip install -e ".[hf]" --no-build-isolation` (run from the Model-Optimizer repo root).
+
+**Watch for pip dependency conflicts** — NGC containers set `PIP_CONSTRAINT` to pin versions, causing `ResolutionImpossible` errors. Unset it first so pip can resolve freely:
+
+```bash
+unset PIP_CONSTRAINT
+pip install -U transformers  # now upgrades and resolves with new deps included
+```
 
-enroot import --output /path/to/container.sqsh \
-  docker://nvcr.io#nvidia/tensorrt-llm/release:<version>
+If that still conflicts, fall back to `--no-deps` (skips new deps — may need to add missing ones manually):
+
+```bash
+pip install -U transformers --no-deps
 ```
 
 ---
@@ -68,10 +93,3 @@ This catches script errors cheaply before using GPU quota on a real run.
 See `skills/common/slurm-setup.md` section 2 for the smoke test partition pattern.
 
 Only submit the full calibration job after the smoke test exits cleanly.
-
----
-
-## 4. PTQ-Specific Notes
-
-- **Gated datasets**: Some calibration datasets (e.g., `nvidia/Nemotron-Post-Training-Dataset-v2`) require HF authentication. Set `HF_TOKEN` in the job environment, or use `--dataset cnn_dailymail` to use a non-gated alternative.
-- **NFS permissions**: Docker + NFS root_squash causes `PermissionError` on output/cache dirs. See `skills/common/slurm-setup.md` section 5 for fixes.
````
.claude/skills/ptq/references/unsupported-models.md

Lines changed: 13 additions & 8 deletions

````diff
@@ -15,7 +15,11 @@ After download, inspect the model files on the target machine (use `remote_run`
 
 Write custom scripts locally (in `./workspaces/<model>/scripts/`), then sync to remote before running.
 
-**Then check `config.json`** (on the target machine):
+**Check transformers compatibility** (on the target machine):
+
+First, if README or `config.json` specifies a required transformers version, check if installed version satisfies it. If not, upgrade: `pip install -U "transformers>=<required_version>"`.
+
+Then try loading:
 
 ```bash
 python -c "
@@ -40,16 +44,14 @@ print(type(cfg).__name__)
 
 Read the modeling file and proceed to Step B.
 
-- **Raises `ValueError` / `OSError` (unknown architecture)** → not in the installed transformers. Determine why:
-
-  1. **Check the transformers `main` branch** (not yet released):
+- **Raises `ValueError` / `OSError` (unknown architecture)** → not in the installed transformers. Try `pip install -U transformers` first. If still not found, check the `main` branch:
 
 ```bash
 git clone --depth 1 https://github.com/huggingface/transformers.git /tmp/transformers-main --quiet
 grep -r "class <ArchName>" /tmp/transformers-main/src/transformers/models/
 ```
 
-- **Found** → install with deps: `pip install /tmp/transformers-main`, then re-run `AutoConfig.from_pretrained()`. **Important**: if ModelOpt is already installed, its `[hf]` extras may have pinned an older transformers. Install ModelOpt first, then upgrade transformers **after** (with deps, not `--no-deps`) so compatible `huggingface_hub` and other transitive deps are pulled in.
+- **Found** → `pip install /tmp/transformers-main`, then re-run `AutoConfig`.
 - **Not found** → ask the user: *"The checkpoint uses `<ArchName>` which isn't in released or main-branch transformers. Do you have a private fork or custom modeling code?"*
 
 - **No `config.json`** → not a standard HF checkpoint. List the directory for README or `.py` files. If nothing useful, ask the user for the modeling code.
@@ -131,13 +133,15 @@ class QuantCustomModule(OriginalModule):
 
 ## Pattern 2: MoE Models
 
-**Standard MoE** (per-expert `nn.Linear` in a `ModuleList` with `gate` + `experts`): Auto-detected by `register_sparse_moe_on_the_fly`. No custom code needed — amax sync and calibration coverage are handled automatically.
+**Most MoE models are auto-detected** — ModelOpt handles two common patterns automatically:
+
+- **transformers >= 5.0**: Unified fused experts (`gate_up_proj` + `down_proj` 3D tensors) → auto-detected by `register_fused_experts_on_the_fly`, handled by `_QuantFusedExperts`. Covers Mixtral, Qwen, DeepSeek, Jamba, OlMoE, etc.
+- **transformers < 5.0**: Sequential per-expert `nn.Linear` with `gate` + `experts` → auto-detected by `register_sparse_moe_on_the_fly`.
 
-**Custom MoE** requires patching. Read the model source to understand how expert weights are stored and computed, then find the closest pattern in the plugin (`modelopt/torch/quantization/plugins/huggingface.py`):
+**Custom MoE** (non-standard layout not matching auto-detection) requires patching. Find the closest pattern in the plugin (`modelopt/torch/quantization/plugins/huggingface.py`):
 
 | MoE design | Strategy | Plugin example |
 | --- | --- | --- |
-| Fused weights + per-expert dispatch loop | Expand to per-expert `nn.Linear` | `_QuantQwen35MoeExperts` |
 | Fused weights + `torch.bmm` | Add `TensorQuantizer` around bmm | `_QuantLlama4TextExperts` |
 | Fused weights + functional interception | Intercept matmul ops | `_QuantGptOssExperts` |
 | Fused 2D weights (experts stacked in rows) | Two-level expansion | `_QuantDbrxExpertGLU` |
@@ -343,3 +347,4 @@ tokenizer.save_pretrained(output_path)
 - **Check quantizer summary**: `mtq.print_quant_summary(model)` shows which quantizers are enabled/disabled
 - **Inspect dtypes**: After loading, iterate `model.named_parameters()` and check for unexpected FP8 tensors
 - **Watch for silent disabling**: A misconfigured wildcard pattern can silently disable quantizers — always verify the summary
+- **Read pip errors carefully**: `ResolutionImpossible` means dependency conflict (try `--no-deps`), NOT network failure. Check for `Connection refused`/`Name resolution failed` before concluding network is down
````
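The architecture probe in the reference's Step A essentially reads `architectures` from `config.json` and checks whether the installed library knows the class. A dependency-free sketch under stated assumptions (the helper name and the known-architectures set are our own; real `AutoConfig.from_pretrained` does more than this):

```python
import json
from pathlib import Path


def probe_architecture(checkpoint_dir, known_architectures):
    """Return (arch_name, supported) from a checkpoint's config.json,
    approximating the AutoConfig.from_pretrained() check."""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return None, False  # not a standard HF checkpoint
    cfg = json.loads(config_path.read_text())
    arch = (cfg.get("architectures") or [None])[0]
    return arch, arch in known_architectures
```

An unknown architecture here corresponds to the `ValueError` / `OSError` branch above; a missing `config.json` to the "not a standard HF checkpoint" branch.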

examples/speculative_decoding/collect_hidden_states/compute_hidden_states_hf.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -142,7 +142,8 @@ def keep_conversation(entry):
     tokenizer = AutoTokenizer.from_pretrained(args.model, trust_remote_code=args.trust_remote_code)
     if tokenizer.pad_token is None:
         tokenizer.pad_token = tokenizer.eos_token
-    tokenizer.chat_template = tokenizer.chat_template.replace(REMOVE_THINK_CHAT_TEMPLATE, "")
+    if tokenizer.chat_template is not None:
+        tokenizer.chat_template = tokenizer.chat_template.replace(REMOVE_THINK_CHAT_TEMPLATE, "")
 
     output_dir = args.output_dir
     output_dir.mkdir(parents=True, exist_ok=True)
```
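The guard matters because tokenizers without a configured chat template expose `chat_template = None`, and calling `.replace()` on `None` raises. A minimal sketch with a stand-in class (the marker string is a placeholder, not the real `REMOVE_THINK_CHAT_TEMPLATE` value):

```python
class FakeTokenizer:
    """Minimal stand-in for a tokenizer's chat_template attribute."""

    def __init__(self, chat_template):
        self.chat_template = chat_template


REMOVE_THINK_MARKER = "<think-block>"  # placeholder marker for illustration


def strip_think_template(tok):
    # Without the None check, tok.chat_template.replace(...) raises
    # AttributeError: 'NoneType' object has no attribute 'replace'.
    if tok.chat_template is not None:
        tok.chat_template = tok.chat_template.replace(REMOVE_THINK_MARKER, "")
    return tok
```

With the guard, template-less tokenizers pass through unchanged and templated ones have the marker stripped.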

modelopt/torch/__init__.py

Lines changed: 14 additions & 1 deletion

```diff
@@ -15,12 +15,25 @@
 
 """Model optimization and deployment subpackage for torch."""
 
+import importlib
 import warnings as _warnings
 
 from packaging.version import Version as _Version
 from torch import __version__ as _torch_version
 
-from . import distill, nas, opt, peft, prune, quantization, sparsity, speculative, utils
+# Pre-initialize torch._dynamo to prevent double-registration with peft's torch.compile() call
+importlib.import_module("torch._dynamo")
+from . import (  # noqa: E402
+    distill,
+    nas,
+    opt,
+    peft,
+    prune,
+    quantization,
+    sparsity,
+    speculative,
+    utils,
+)
 
 if _Version(_torch_version) < _Version("2.9"):
     _warnings.warn(
```
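The pre-import works because `importlib.import_module` populates `sys.modules` on first use, and every later import of the same name reuses that cached module object, so initialization side effects run exactly once. Demonstrated here with a stdlib module in place of `torch._dynamo`:

```python
import importlib
import sys

# First import initializes the module and caches it in sys.modules.
mod = importlib.import_module("json")

# A subsequent plain import returns the same cached object; nothing re-runs.
import json  # noqa: E402

assert mod is json
assert sys.modules["json"] is json
```

This is why pre-initializing `torch._dynamo` before the subpackage imports avoids the double-registration the comment describes.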

modelopt/torch/quantization/config.py

Lines changed: 28 additions & 1 deletion

```diff
@@ -1560,6 +1560,10 @@ def normalize_quant_cfg_list(v: dict | list) -> list[QuantizerCfgEntry]:
     - An empty entry ``{}``.
     - An entry with only ``quantizer_name`` and no other keys — the only effect would be an
       implicit ``enable=True``, which must be stated explicitly.
+    - An entry with ``enable=True`` (explicit or implicit) whose ``cfg`` is not a non-empty
+      ``dict`` or ``list`` — e.g. ``{"quantizer_name": "*", "cfg": {}}`` or
+      ``{"quantizer_name": "*", "cfg": 42}``. An enabled quantizer must have a valid
+      configuration.
 
     **Normalization** — after conversion and validation every entry is put into canonical form:
 
@@ -1577,7 +1581,8 @@ def normalize_quant_cfg_list(v: dict | list) -> list[QuantizerCfgEntry]:
 
     Raises:
         ValueError: If any entry has only ``quantizer_name`` with neither ``cfg`` nor ``enable``,
-            or if the entry format is not recognized.
+            if ``enable=True`` with an empty or non-dict/list ``cfg``, or if the entry format
+            is not recognized.
     """
 
     def _warn_legacy():
@@ -1662,6 +1667,28 @@ def _dict_to_entry(key: str, value) -> list[QuantizerCfgEntry]:
                 "enable=True is not allowed; set it explicitly)."
             )
 
+        # Validate: when cfg is present and enable=True, cfg must be a non-empty
+        # dict or list. An empty cfg would attempt to create a
+        # QuantizerAttributeConfig with no actual configuration.
+        cfg = entry.get("cfg")
+        enable = entry.get("enable", True)
+        if enable and cfg is not None:
+            if isinstance(cfg, dict):
+                is_invalid = len(cfg) == 0
+            elif isinstance(cfg, list):
+                is_invalid = len(cfg) == 0 or any(
+                    not isinstance(item, dict) or len(item) == 0 for item in cfg
+                )
+            else:
+                is_invalid = True
+            if is_invalid:
+                raise ValueError(
+                    f"Invalid quant_cfg entry: {raw!r} — 'cfg' must be a non-empty dict "
+                    f"or a non-empty list of non-empty dicts when enabling a quantizer "
+                    f"(got {type(cfg).__name__}: {cfg!r}). Either provide quantizer "
+                    "attributes in 'cfg' or remove 'cfg' and set 'enable' explicitly."
+                )
+
         # Normalize: make enable and cfg always explicit.
         entry.setdefault("enable", True)
         entry.setdefault("cfg", None)
```

tests/examples/speculative_decoding/conftest.py

Lines changed: 27 additions & 0 deletions

```diff
@@ -13,11 +13,38 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+import json
+
 import pytest
 import yaml
 from _test_utils.examples.run_command import run_example_command
 
 
+@pytest.fixture(scope="session")
+def tiny_conversations_path(tmp_path_factory):
+    """Tiny JSONL with short synthetic conversations for compute_hidden_states_hf tests.
+
+    Uses minimal single-turn conversations so that tokenized lengths stay well
+    within the tiny test model's max_position_embeddings (32) even after chat
+    template formatting.
+    """
+    tmp_dir = tmp_path_factory.mktemp("tiny_convs")
+    output_file = tmp_dir / "train.jsonl"
+    conversations = [
+        {
+            "conversation_id": f"test-{i}",
+            "conversations": [
+                {"role": "user", "content": "What is 2 plus 2?"},
+                {"role": "assistant", "content": "4"},
+            ],
+        }
+        for i in range(5)
+    ]
+    with open(output_file, "w") as f:
+        f.writelines(json.dumps(conv) + "\n" for conv in conversations)
+    return output_file
+
+
 @pytest.fixture(scope="session", autouse=True)
 def tiny_daring_anteater_path(tmp_path_factory):
     tmp_dir = tmp_path_factory.mktemp("daring_anteater")
```
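The fixture's JSONL layout (one JSON object per line) round-trips cleanly; a quick self-contained check of the same format, using a temporary path of our own rather than the fixture's:

```python
import json
import tempfile
from pathlib import Path

# Same shape the fixture writes: five tiny single-turn conversations.
conversations = [
    {
        "conversation_id": f"test-{i}",
        "conversations": [
            {"role": "user", "content": "What is 2 plus 2?"},
            {"role": "assistant", "content": "4"},
        ],
    }
    for i in range(5)
]

path = Path(tempfile.mkdtemp()) / "train.jsonl"
with open(path, "w") as f:
    f.writelines(json.dumps(conv) + "\n" for conv in conversations)

# json.loads per line recovers each conversation intact.
loaded = [json.loads(line) for line in path.read_text().splitlines()]
```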

tests/examples/speculative_decoding/test_eagle_offline_ptq.py

Lines changed: 4 additions & 2 deletions

```diff
@@ -55,7 +55,7 @@ def offline_ptq_dirs(tmp_path_factory):
     }
 
 
-def test_collect_hidden_states(tiny_llama_path, tiny_daring_anteater_path, offline_ptq_dirs):
+def test_collect_hidden_states(tiny_llama_path, tiny_conversations_path, offline_ptq_dirs):
     """Stage 1: generate .pt hidden state files from the base model."""
     run_example_command(
         [
@@ -64,11 +64,13 @@ def test_collect_hidden_states(tiny_llama_path, tiny_daring_anteater_path, offli
             "--model",
             tiny_llama_path,
             "--input-data",
-            str(tiny_daring_anteater_path),
+            str(tiny_conversations_path),
             "--output-dir",
             str(offline_ptq_dirs["hidden_states"]),
            "--debug-max-num-conversations",
             "2",
+            "--max-seq-len",
+            "32",
         ],
        "speculative_decoding",
     )
```

tests/unit/torch/quantization/test_config_validation.py

Lines changed: 54 additions & 0 deletions

```diff
@@ -163,6 +163,60 @@ def test_error_on_multi_key_legacy_dict(self):
         with pytest.raises(ValueError):
             normalize_quant_cfg_list([{"*weight_quantizer": {}, "*input_quantizer": {}}])
 
+    def test_error_on_empty_cfg_dict_implicit_enable(self):
+        """Entry with cfg={} and implicit enable=True is rejected."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list([{"quantizer_name": "*weight_quantizer", "cfg": {}}])
+
+    def test_error_on_empty_cfg_dict_explicit_enable_true(self):
+        """Entry with cfg={} and explicit enable=True is rejected."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list(
+                [{"quantizer_name": "*weight_quantizer", "cfg": {}, "enable": True}]
+            )
+
+    def test_error_on_empty_cfg_list_enable_true(self):
+        """Entry with cfg=[] and enable=True is rejected."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list(
+                [{"quantizer_name": "*weight_quantizer", "cfg": [], "enable": True}]
+            )
+
+    def test_error_on_non_dict_non_list_cfg_enable_true(self):
+        """Entry with cfg of invalid type (e.g. int) and enable=True is rejected."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list(
+                [{"quantizer_name": "*weight_quantizer", "cfg": 42, "enable": True}]
+            )
+
+    def test_error_on_cfg_list_with_empty_dict_enable_true(self):
+        """Entry with cfg=[{}] and enable=True is rejected (empty dict element)."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list(
+                [{"quantizer_name": "*weight_quantizer", "cfg": [{}], "enable": True}]
+            )
+
+    def test_error_on_cfg_list_with_non_dict_element_enable_true(self):
+        """Entry with cfg=[42] and enable=True is rejected (non-dict element)."""
+        with pytest.raises(ValueError, match="non-empty dict"):
+            normalize_quant_cfg_list(
+                [{"quantizer_name": "*weight_quantizer", "cfg": [42], "enable": True}]
+            )
+
+    def test_empty_cfg_dict_enable_false_accepted(self):
+        """Entry with cfg={} and enable=False is allowed (disable-only entry)."""
+        result = normalize_quant_cfg_list(
+            [{"quantizer_name": "*input_quantizer", "cfg": {}, "enable": False}]
+        )
+        assert result[0]["enable"] is False
+
+    def test_empty_cfg_list_enable_false_accepted(self):
+        """Entry with cfg=[] and enable=False is allowed (disable-only entry)."""
+        result = normalize_quant_cfg_list(
+            [{"quantizer_name": "*input_quantizer", "cfg": [], "enable": False}]
+        )
+        assert result[0]["enable"] is False
+
     def test_new_format_with_list_cfg(self):
         """cfg can be a list of dicts for SequentialQuantizer."""
         raw = [
```
