Commit a0cbc46
refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer (#9380)
Drop the 295-line vendor/llama.py fork in favor of `tinygrad.apps.llm`,
which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8
quantization), KV-cache and generate loop we were maintaining ourselves.
What changed:
- New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict
keymap, Transformer kwargs builder, `_embed_hidden` helper, and a hard
rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the
apps.llm Transformer hard-codes `bias=False` on Q/K/V projections).
- backend.py routes both safetensors and GGUF paths through
apps.llm.Transformer. Generation now delegates to its (greedy-only)
`generate()`; Temperature / TopK / TopP / RepetitionPenalty are still
accepted on the wire but ignored — documented in the module docstring.
- Jinja chat render now passes `enable_thinking=False` so Qwen3's
reasoning preamble doesn't eat the tool-call token budget on small
models.
- Embedding path uses `_embed_hidden` (block stack + output_norm) rather
than the custom `embed()` method we were carrying on the vendored
Transformer.
- test.py gains TestAppsLLMAdapter covering the keymap rename, tied
embedding fallback, unknown-key skipping, and qkv_bias rejection.
- Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B
(apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the
HF chat template emits hermes-style JSON tool calls).
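For reviewers who have not opened vendor/appsllm_adapter.py, here is a rough sketch of the remapping behaviour the bullets above describe: key rename, unknown-key skipping, tied-embedding fallback, and qkv_bias rejection. It is illustrative only. The function name `remap_state_dict` and the exact table entries are assumptions, not the adapter's real code; the key names follow the usual HF Llama-style and GGUF tensor naming conventions, and plain Python values stand in for tensors.

```python
import re

# Hypothetical HF -> GGUF-style name tables. The real keymap lives in
# vendor/appsllm_adapter.py and may differ from this sketch.
_TOP_MAP = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}
_LAYER_MAP = {
    "self_attn.q_proj.weight": "attn_q.weight",
    "self_attn.k_proj.weight": "attn_k.weight",
    "self_attn.v_proj.weight": "attn_v.weight",
    "self_attn.o_proj.weight": "attn_output.weight",
    "input_layernorm.weight": "attn_norm.weight",
    "post_attention_layernorm.weight": "ffn_norm.weight",
    "mlp.gate_proj.weight": "ffn_gate.weight",
    "mlp.up_proj.weight": "ffn_up.weight",
    "mlp.down_proj.weight": "ffn_down.weight",
}
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)$")
_QKV_BIAS_RE = re.compile(r"self_attn\.[qkv]_proj\.bias$")


def remap_state_dict(hf_sd):
    """Rename HF-style keys to GGUF-style keys (illustrative sketch)."""
    # Hard rejection: a model that ships Q/K/V biases cannot be loaded
    # into a Transformer whose projections are built without bias.
    for key in hf_sd:
        if _QKV_BIAS_RE.search(key):
            raise ValueError(f"qkv_bias models are not supported: {key}")

    out = {}
    for key, tensor in hf_sd.items():
        if key in _TOP_MAP:
            out[_TOP_MAP[key]] = tensor
            continue
        m = _LAYER_RE.match(key)
        if m and m.group(2) in _LAYER_MAP:
            out[f"blk.{m.group(1)}.{_LAYER_MAP[m.group(2)]}"] = tensor
        # Any other key (e.g. rotary buffers) is silently skipped.

    # Tied embeddings: with no separate lm_head, reuse the token embedding.
    if "output.weight" not in out and "token_embd.weight" in out:
        out["output.weight"] = out["token_embd.weight"]
    return out
```

With a checkpoint that has no `lm_head.weight`, the output projection falls back to the embedding matrix, and a key such as `model.layers.0.rotary_emb.inv_freq` is dropped rather than raising.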
Verified with the docker-backed targets:
test-extra-backend-tinygrad 5/5 PASS
test-extra-backend-tinygrad-embeddings 3/3 PASS
test-extra-backend-tinygrad-whisper 4/4 PASS
test-extra-backend-tinygrad-sd 3/3 PASS
1 parent b4e3069 commit a0cbc46
5 files changed
Lines changed: 345 additions & 433 deletions