.claude/skills/ptq/SKILL.md (2 additions, 2 deletions)
@@ -124,9 +124,9 @@ Report the path and size to the user.
## Common Pitfalls

- **Transformers version**: New models may need a newer version of transformers than what's installed. Check `config.json` for `transformers_version`. In containers, beware of `PIP_CONSTRAINT` blocking upgrades — see `references/slurm-setup-ptq.md` for workarounds
- **Gated datasets**: Some calibration datasets require HF authentication. Ensure `HF_TOKEN` is set in the job environment, or use `--dataset cnn_dailymail` as a non-gated alternative
- **NFS root_squash + Docker**: See `skills/common/slurm-setup.md` section 5
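
The usual fix there is to run the container as the calling user; a minimal sketch, assuming a bind-mounted NFS workspace (the paths, image, and command are placeholders):

```bash
# Run as the calling user so NFS root_squash doesn't map root-owned writes to 'nobody'
docker run --user "$(id -u):$(id -g)" -v /nfs/workspace:/workspace <image> <command>

# Fallback: open up permissions on the directories the job must write
chmod -R a+rwX /nfs/workspace
```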
---

.claude/skills/ptq/references/slurm-setup-ptq.md

If no `.sqsh` exists, import it with enroot. Set writable cache paths first — the default `/raid/containers` is often not writable:
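
A sketch of the import, assuming home-directory scratch space is writable; the output name and image tag are placeholders:

```bash
# Point enroot's cache/data/temp dirs at writable locations before importing
export ENROOT_CACHE_PATH=$HOME/.cache/enroot
export ENROOT_DATA_PATH=$HOME/.local/share/enroot
export ENROOT_TEMP_PATH=/tmp

# Import the NGC image to a squashfs file (note enroot's '#' registry separator)
enroot import -o trtllm-release.sqsh docker://nvcr.io#nvidia/tensorrt-llm/release:<version>
```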
If enroot import fails (e.g., permission errors on Lustre), use a pyxis inline pull as a fallback — pass the NGC URI directly to `--container-image="nvcr.io/nvidia/tensorrt-llm/release:<version>"`. Note this re-pulls the image on every job.
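
For example (a sketch; the partition, mounts, and tag are placeholders):

```bash
# Pyxis pulls the image at job start; no .sqsh needed, but it re-pulls every job
srun --partition=<partition> --gpus=1 \
  --container-image="nvcr.io/nvidia/tensorrt-llm/release:<version>" \
  --container-mounts=/path/to/workspace:/workspace \
  nvidia-smi
```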
### Container dependency pitfalls

**New models may need newer transformers** than what's in the container:
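
A quick way to compare versions (a sketch; `<model_dir>` is a placeholder):

```bash
# Compare the version recorded in the checkpoint against the installed one
python -c "
import json, transformers
print('required :', json.load(open('<model_dir>/config.json')).get('transformers_version'))
print('installed:', transformers.__version__)
"
```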
For unlisted models that need unreleased transformers (e.g., from git), see `references/unsupported-models.md` Step A.

**Prefer `PYTHONPATH`** to use the synced ModelOpt source instead of installing inside the container — this avoids risking dependency conflicts (e.g., `pip install -U nvidia-modelopt[hf]` can upgrade PyTorch and break other packages):
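
A minimal sketch, assuming the repo was synced to `/workspace/Model-Optimizer` (the path is an assumption):

```bash
# Put the synced source ahead of the container's installed packages
export PYTHONPATH=/workspace/Model-Optimizer:$PYTHONPATH

# Verify the import resolves to the synced repo, not site-packages
python -c "import modelopt; print(modelopt.__file__)"
```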
If `PYTHONPATH` doesn't work due to missing compiled extensions, fall back to `pip install -e ".[hf]" --no-build-isolation` (run from the Model-Optimizer repo root).

**Watch for pip dependency conflicts** — NGC containers set `PIP_CONSTRAINT` to pin versions, causing `ResolutionImpossible` errors. Unset it first so pip can resolve freely:
```bash
unset PIP_CONSTRAINT
pip install -U transformers  # now upgrades and resolves with new deps included
```
If that still conflicts, fall back to `--no-deps` (skips new deps — may need to add missing ones manually):

```bash
pip install -U transformers --no-deps
```
---
@@ -68,10 +93,3 @@ This catches script errors cheaply before using GPU quota on a real run.
See `skills/common/slurm-setup.md` section 2 for the smoke test partition pattern.

Only submit the full calibration job after the smoke test exits cleanly.
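
For illustration only, a hedged sketch of such a smoke test; the partition, image, script name, and flags are placeholders, not the skill's actual pattern:

```bash
# Tiny calibration run on a short-queue partition to shake out script errors
srun --partition=<short_partition> --gpus=1 --time=00:15:00 \
  --container-image=<trtllm_image> \
  python <ptq_script> --dataset cnn_dailymail --calib_size 8 \
  && echo 'smoke test passed'
```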
---

.claude/skills/ptq/references/unsupported-models.md (13 additions, 8 deletions)
@@ -15,7 +15,11 @@ After download, inspect the model files on the target machine (use `remote_run`
Write custom scripts locally (in `./workspaces/<model>/scripts/`), then sync to remote before running.

**Check transformers compatibility** (on the target machine):

First, if the README or `config.json` specifies a required transformers version, check whether the installed version satisfies it. If not, upgrade: `pip install -U "transformers>=<required_version>"`.

Then try loading:
```bash
python -c "
@@ -40,16 +44,14 @@ print(type(cfg).__name__)
Read the modeling file and proceed to Step B.

- **Raises `ValueError` / `OSError` (unknown architecture)** → not in the installed transformers. Try `pip install -U transformers` first. If still not found, check the `main` branch (see the sketch after this list):
  - **Found** → `pip install /tmp/transformers-main`, then re-run `AutoConfig`.
  - **Not found** → ask the user: *"The checkpoint uses `<ArchName>` which isn't in released or main-branch transformers. Do you have a private fork or custom modeling code?"*
- **No `config.json`** → not a standard HF checkpoint. List the directory for README or `.py` files. If nothing useful, ask the user for the modeling code.
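
A sketch of the main-branch check; `<ArchName>` is the architecture string from the checkpoint's `config.json`:

```bash
# Clone transformers main and search for the unknown architecture
git clone --depth 1 https://github.com/huggingface/transformers.git /tmp/transformers-main
grep -ri "<ArchName>" /tmp/transformers-main/src/transformers/models/ | head
```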
@@ -131,13 +133,15 @@ class QuantCustomModule(OriginalModule):
## Pattern 2: MoE Models

**Most MoE models are auto-detected** — ModelOpt handles two common patterns automatically (a quick layout check follows the list):
- **transformers >= 5.0**: Unified fused experts (`gate_up_proj` + `down_proj` 3D tensors) → auto-detected by `register_fused_experts_on_the_fly`, handled by `_QuantFusedExperts`. Covers Mixtral, Qwen, DeepSeek, Jamba, OlMoE, etc.
- **transformers < 5.0**: Sequential per-expert `nn.Linear` with `gate` + `experts` → auto-detected by `register_sparse_moe_on_the_fly`.
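
To see which pattern a checkpoint falls under, a quick inspection sketch; `<model_dir>` and the `layers[0].mlp` attribute path are assumptions that vary by architecture:

```bash
python -c "
import torch
from transformers import AutoConfig, AutoModelForCausalLM
cfg = AutoConfig.from_pretrained('<model_dir>')
with torch.device('meta'):  # build the module structure only, no weight loading
    m = AutoModelForCausalLM.from_config(cfg)
moe = m.model.layers[0].mlp
# Sequential pattern: a 'gate' plus a ModuleList of per-expert MLPs
print('sparse:', hasattr(moe, 'gate') and hasattr(moe, 'experts'))
# Fused pattern: 3D gate_up_proj / down_proj tensors on the experts module
print('fused :', hasattr(getattr(moe, 'experts', object()), 'gate_up_proj'))
"
```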

**Custom MoE** (non-standard layout not matching auto-detection) requires patching. Find the closest pattern in the plugin (`modelopt/torch/quantization/plugins/huggingface.py`):
- **Check quantizer summary**: `mtq.print_quant_summary(model)` shows which quantizers are enabled/disabled
- **Inspect dtypes**: After loading, iterate `model.named_parameters()` and check for unexpected FP8 tensors
- **Watch for silent disabling**: A misconfigured wildcard pattern can silently disable quantizers — always verify the summary
- **Read pip errors carefully**: `ResolutionImpossible` means a dependency conflict (try `--no-deps`), NOT a network failure. Check for `Connection refused`/`Name resolution failed` before concluding the network is down