GAME-rs

A Rust port of GAME (Generative Adaptive MIDI Extractor) — an inference engine that transcribes singing/vocal audio into note events. It loads a pre-trained GGUF model and converts a WAV file into MIDI, TXT, or CSV by running three model stages: an encoder, a D3PM-diffusion segmenter, and an estimator.

This is a from-scratch port of openvpi/GAME with no Python or external ML-framework runtime dependency — just Rust, with hand-written CPU kernels and an optional WGPU backend.

Features

Self-contained CLI — one binary, no Python interpreter, no libtorch / ONNX runtime.
GGUF model loading with safety-hardened parsing (bounds checks, alignment-safe decode).
Two backends:
- CPU — hand-optimized kernels (blocked attention, GEMM-backed linear/matmul, depthwise conv, RoPE).
- GPU — WGPU compute shaders (Vulkan / Metal / DX12 / GL), enabled via the gpu feature.
Automatic device selection — tries GPU, falls back to CPU on failure (including driver timeouts).
Chunk parallelism — long audio is sliced on silence, split into bounded chunks, and inferred in parallel across CPU cores with deterministic per-chunk seeding.
Multiple output formats — MIDI (.mid), tab-separated text (.txt), or CSV (.csv).
Structured progress + logging — rich TTY progress bars, plus RUST_LOG integration for headless runs.
Production hardening — panic isolation in workers, bounded allocations, memory back-pressure, and contextual error messages.

Installation

Requires a stable Rust toolchain (≥ 1.85, edition 2024). The repo pins this via rust-toolchain.toml.

git clone https://github.com/Jobsecond/GAME-rs.git
cd GAME-rs

# Build the CPU-only CLI release binary (default)
cargo build --release -p game-cli --no-default-features

The binary is produced at target/release/game-cli (game-cli.exe on Windows).

Note: Always pass --no-default-features for CPU-only builds. The default feature set is empty, but omitting the flag can pull in unintended dependencies on some configurations.

Building the CLI with GPU support

cargo build --release -p game-cli --features gpu

This builds the CLI only. The root workspace does not have a gui feature, so cargo build --release --features gpu,gui is invalid.

The GPU backend uses WGPU and will pick a Vulkan / Metal / DX12 / GL adapter at runtime.

Building the GUI

The GUI is a separate package outside the root workspace:

cargo build --release --manifest-path gui/Cargo.toml --target-dir target

# Optional GPU-enabled GUI build
cargo build --release --manifest-path gui/Cargo.toml --features gpu --target-dir target

With --target-dir target, the GUI binary is produced at target/release/game-gui (game-gui.exe on Windows). Without that flag, Cargo uses the standalone GUI package's default gui/target/release/ directory.

Usage

The CLI has two subcommands: extract (audio → notes) and inspect (examine a GGUF model).

`extract` — transcribe audio to notes

game-cli extract --model path/to/model.gguf --output out.mid input.wav

The output format is inferred from the --output extension (.mid/.midi → MIDI, .txt → TXT, .csv → CSV), or set explicitly with --format.
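
The extension rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: `OutputFormat`, `infer_format`, and `resolve_format` are hypothetical names, and case-insensitive extension matching is an assumption.

```rust
use std::path::Path;

/// Output formats named in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Midi,
    Txt,
    Csv,
}

/// Hypothetical helper: map an output path's extension to a format.
/// Returns `None` for a missing or unrecognized extension, in which case
/// an explicit format would be required.
/// Assumption: extensions are matched case-insensitively.
pub fn infer_format(path: &Path) -> Option<OutputFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "mid" | "midi" => Some(OutputFormat::Midi),
        "txt" => Some(OutputFormat::Txt),
        "csv" => Some(OutputFormat::Csv),
        _ => None,
    }
}

/// An explicit format (as given by `--format`) wins over the extension.
pub fn resolve_format(explicit: Option<OutputFormat>, path: &Path) -> Option<OutputFormat> {
    explicit.or_else(|| infer_format(path))
}
```

With this shape, `out.mid` resolves to MIDI unless a format is forced, and a path without a recognized extension resolves to nothing, which a caller would report as an error.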

Common options

Flag	Default	Description
`-m, --model <PATH>`	(required)	Path to the GGUF model file.
`-o, --output <PATH>`	(required)	Output file path; format inferred from extension.
`--format <midi|txt|csv>`	from extension	Force the output format.
`--device <cpu|gpu>`	gpu if available, else cpu	Compute backend.
`--seed <U64>`	`0`	RNG seed; `0` means non-deterministic (random).
`--d3pm-nsteps <N>`	`1`	Number of D3PM diffusion refinement steps. More steps improve quality at the cost of speed.
`--d3pm-t0 <F>`	`0.0`	D3PM starting time.
`--boundary-threshold <F>`	`0.2`	Note-boundary detection threshold.
`--boundary-radius <N>`	`2`	Boundary smoothing radius.
`--note-threshold <F>`	`0.2`	Voicing/note presence threshold.
`--language <N>`	`0`	Language ID (for multi-language models).
`--chunk-parallelism <auto|on|off>`	`auto`	Parallelize inference across audio chunks.
`--max-chunk-seconds <N>`	`60`	Hard-split sliced chunks longer than this many seconds.
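
To illustrate what `--max-chunk-seconds` bounds, here is a minimal sketch of a hard split. It assumes chunks are cut into consecutive fixed-size pieces at sample boundaries; the project may choose split points differently. `hard_split` and its parameters are hypothetical names.

```rust
use std::ops::Range;

/// Split a chunk of `len` samples into consecutive ranges, each at most
/// `sample_rate * max_chunk_seconds` samples long.
/// Hypothetical helper; the real splitter may pick different cut points.
pub fn hard_split(len: usize, sample_rate: u32, max_chunk_seconds: u32) -> Vec<Range<usize>> {
    // Guard against a zero limit so the loop below always makes progress.
    let max_len = (sample_rate as usize)
        .saturating_mul(max_chunk_seconds as usize)
        .max(1);

    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = usize::min(start.saturating_add(max_len), len);
        ranges.push(start..end);
        start = end;
    }
    ranges
}
```

A chunk already within the limit comes back as a single range, so the split is a no-op for short audio.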

GPU adapter selection

When multiple GPUs are present, pick a specific one:

Flag	Description
`--gpu-name <SUBSTRING>`	Match adapter name (case-insensitive substring).
`--gpu-vendor-id <ID>`	Match PCI vendor ID (e.g. `0x10de` for NVIDIA). Accepts hex (`0x…`) or decimal.
`--gpu-device-id <ID>`	Match PCI device ID.
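
The matching rules in the table can be sketched as below. This is an illustration under stated assumptions, not the CLI's implementation: `parse_id`, `AdapterInfo`, and `AdapterFilter` are hypothetical names, and combining several filters with a logical AND is an assumption.

```rust
use std::num::ParseIntError;

/// Parse an ID given as hex (`0x…`) or decimal.
pub fn parse_id(s: &str) -> Result<u32, ParseIntError> {
    let s = s.trim();
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u32::from_str_radix(hex, 16),
        None => s.parse::<u32>(),
    }
}

/// Minimal stand-in for the adapter details a GPU backend reports.
pub struct AdapterInfo {
    pub name: String,
    pub vendor: u32,
    pub device: u32,
}

/// Hypothetical filter built from the three flags; unset fields match anything.
#[derive(Default)]
pub struct AdapterFilter {
    pub name: Option<String>,
    pub vendor_id: Option<u32>,
    pub device_id: Option<u32>,
}

impl AdapterFilter {
    /// Assumption: all provided criteria must hold at once.
    pub fn matches(&self, adapter: &AdapterInfo) -> bool {
        let name_ok = self.name.as_ref().map_or(true, |needle| {
            // Case-insensitive substring match on the adapter name.
            adapter.name.to_lowercase().contains(&needle.to_lowercase())
        });
        let vendor_ok = self.vendor_id.map_or(true, |v| v == adapter.vendor);
        let device_ok = self.device_id.map_or(true, |d| d == adapter.device);
        name_ok && vendor_ok && device_ok
    }
}
```

An empty filter accepts every adapter, which corresponds to leaving all three flags unset.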

Examples

# Higher-quality transcription with 8 diffusion steps, deterministic output
game-cli extract -m path/to/model.gguf -o vocals.mid --d3pm-nsteps 8 --seed 42 vocals.wav

# CSV output, CPU only, serial (no chunk parallelism)
game-cli extract -m path/to/model.gguf -o notes.csv --device cpu --chunk-parallelism off song.wav

# Force a specific NVIDIA GPU
game-cli extract -m path/to/model.gguf -o out.mid --device gpu --gpu-vendor-id 0x10de input.wav

`inspect` — examine a GGUF model

game-cli inspect --model path/to/model.gguf

Prints the GGUF version, architecture, quantization, tensor/parameter counts, model config, and inference parameters (sample rate, hop size, mel-spectrogram setup, etc.).

Flag	Default	Description
`-m, --model <PATH>`	(required)	Path to the GGUF model file.
`--show-tensors <N>`	`8`	Number of tensors to list.
`--tensor-prefix <PREFIX>`	—	Filter listed tensors by name prefix.
`--format <text|json>`	`text`	Output format. Use `json` for machine parsing.

# Machine-readable summary
game-cli inspect -m model.gguf --format json

# List all estimator tensors
game-cli inspect -m model.gguf --tensor-prefix estimator --show-tensors 100

Output formats

MIDI (.mid) — single-track SMF with note-on/note-off events at the configured tempo.
TXT (.txt) — tab-separated: offset<TAB>duration<TAB>pitch.
CSV (.csv) — comma-separated with header: offset,duration,pitch.

For text formats, timing is in seconds and pitch is in MIDI numbers (60 = C4, fractional values allowed for microtonal pitch). Unvoiced segments are emitted as rest.
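
A minimal sketch of how the two text formats could be rendered from the same note list. The `Note` struct and `render` function are hypothetical, the three-decimal precision is an assumption, and writing the literal word `rest` in the pitch column for unvoiced segments is an assumption about how rests are represented.

```rust
/// One transcribed segment. `pitch` is `None` for an unvoiced segment.
pub struct Note {
    pub offset: f64,        // seconds from the start of the audio
    pub duration: f64,      // seconds
    pub pitch: Option<f64>, // MIDI number, 60.0 = C4, fractions allowed
}

/// Render notes as delimited text.
/// `sep` is '\t' for TXT or ',' for CSV; `header` adds the column names.
pub fn render(notes: &[Note], sep: char, header: bool) -> String {
    let mut out = String::new();
    if header {
        out.push_str(&format!("offset{sep}duration{sep}pitch\n"));
    }
    for note in notes {
        // Assumption: rests are written as the word "rest".
        let pitch = match note.pitch {
            Some(p) => format!("{p:.3}"),
            None => "rest".to_string(),
        };
        out.push_str(&format!(
            "{:.3}{sep}{:.3}{sep}{pitch}\n",
            note.offset, note.duration
        ));
    }
    out
}
```

Calling `render(&notes, ',', true)` gives the CSV layout with its header row, and `render(&notes, '\t', false)` gives the tab-separated layout.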

Architecture

The root project is a virtual Cargo workspace with four reusable library crates plus the CLI package. The GUI package is checked separately because it has heavier frontend dependencies and a higher Rust version requirement.

Crate	Path	Responsibility
`game-core`	`crates/core`	GGUF loading, model forward passes, tensor backends (CPU/GPU), mel spectrogram, RNG, profiler.
`game-audio`	`crates/audio`	WAV decode, resample, mono mixdown, silence-based slicing, long-chunk splitting.
`game-output`	`crates/output`	MIDI encoding (via `midly`), TXT/CSV output.
`game-service`	`crates/service`	Orchestration: request → audio prep → chunk parallelism → inference → output. Public API: `extract_with_notifier()`.
`game-cli`	`cli/`	CLI with `inspect` and `extract` subcommands.
`game-gui`	`gui/`	Standalone egui frontend package, outside the root workspace.

Inference pipeline

Audio prep — decode WAV, mix to mono, resample to the model's target rate.
Slicing — cut on silence boundaries, then hard-split chunks longer than --max-chunk-seconds.
Per-chunk inference (parallel on CPU):
- Encoder — mel spectrogram → contextual embeddings.
- Segmenter — iterative D3PM diffusion refinement (run --d3pm-nsteps times).
- Estimator — final pitch/voicing logits → note events.
Aggregation — chunks sorted by index, note offsets shifted by chunk position, concatenated.
Output — encode to MIDI/TXT/CSV.
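
The aggregation step above can be sketched as follows. `NoteEvent`, `ChunkResult`, and `aggregate` are hypothetical names, and the sketch assumes each chunk records its start position in seconds.

```rust
/// A note with an offset relative to the start of its chunk.
#[derive(Debug, Clone, PartialEq)]
pub struct NoteEvent {
    pub offset: f64,   // seconds
    pub duration: f64, // seconds
    pub pitch: f64,    // MIDI number
}

/// Result of inferring one chunk; chunks may finish in any order.
pub struct ChunkResult {
    pub index: usize,       // position of the chunk in the original audio
    pub start_seconds: f64, // where the chunk begins in the original audio
    pub notes: Vec<NoteEvent>,
}

/// Sort chunks by index, shift each note by its chunk's start time,
/// and concatenate everything into one timeline.
pub fn aggregate(mut chunks: Vec<ChunkResult>) -> Vec<NoteEvent> {
    chunks.sort_by_key(|chunk| chunk.index);

    let mut timeline = Vec::new();
    for chunk in chunks {
        for mut note in chunk.notes {
            note.offset += chunk.start_seconds;
            timeline.push(note);
        }
    }
    timeline
}
```

Because the sort uses the chunk index rather than completion order, the output is the same regardless of which worker finishes first.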

Tensor backends

A Tensor trait with two implementations is dispatched at model-load time:

CPU (tensor/cpu/) — Arc<Vec<f32>> storage with stride-based views and hand-written kernels (attention.rs, matmul.rs, conv.rs, norm.rs, rope.rs, …).
GPU (tensor/gpu/) — WGPU compute with WGSL shaders in tensor/gpu/shaders/.
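
To show what stride-based views over shared `Arc<Vec<f32>>` storage buy, here is a minimal two-dimensional sketch. `View2D` and its methods are hypothetical and much simpler than the project's tensor type; the point is only that a transpose swaps shape and strides without copying data.

```rust
use std::sync::Arc;

/// A 2-D view into shared f32 storage (hypothetical, for illustration).
#[derive(Clone)]
pub struct View2D {
    data: Arc<Vec<f32>>,
    offset: usize,
    shape: [usize; 2],
    strides: [usize; 2], // element steps per row and per column
}

impl View2D {
    /// Wrap a row-major buffer of `rows * cols` elements.
    pub fn from_vec(data: Vec<f32>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "shape does not match data length");
        Self { data: Arc::new(data), offset: 0, shape: [rows, cols], strides: [cols, 1] }
    }

    pub fn shape(&self) -> [usize; 2] {
        self.shape
    }

    /// Read one element through the strides.
    pub fn get(&self, row: usize, col: usize) -> f32 {
        assert!(row < self.shape[0] && col < self.shape[1], "index out of bounds");
        self.data[self.offset + row * self.strides[0] + col * self.strides[1]]
    }

    /// Transpose by swapping shape and strides; the buffer is shared, not copied.
    pub fn transpose(&self) -> Self {
        Self {
            data: Arc::clone(&self.data),
            offset: self.offset,
            shape: [self.shape[1], self.shape[0]],
            strides: [self.strides[1], self.strides[0]],
        }
    }

    /// True when both views point at the same allocation.
    pub fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.data, &other.data)
    }
}
```

Kernels that need contiguous memory would still have to materialize such a view, which is one reason a real implementation carries more bookkeeping than this sketch.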

Configuration via environment variables

Variable	Default	Purpose
`GAME_ATTENTION_BLOCK_K`	`128`	K-dimension block size for blocked attention (`0` disables blocking and falls back to the old path).
`GAME_MAX_ATTENTION_SCORE_ELEMENTS`	`32M`	Attention score allocation cap.
`GAME_MAX_CONCURRENT_CHUNKS`	number of threads	Max chunks processed simultaneously (memory back-pressure limiter).
`GAME_LINEAR_TARGET_TASKS`	physical cores	Rayon tasks for the linear layer.
`GAME_LINEAR_MIN_OUTPUTS_PER_CHUNK`	`16384`	Min outputs per Rayon task chunk.
`GAME_DISABLE_CHUNK_PARALLELISM`	—	Disable chunk parallelism at runtime.
`GAME_CPU_PROFILE`	off	Enable hand-rolled scope-based CPU profiling.
`GAME_CPU_PROFILE_TOP`	`20`	Number of top profiling entries to show.
`RUST_LOG`	—	Standard `env_logger` filter (e.g. `RUST_LOG=info`) for headless logging.
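
As an illustration of how one of these variables might be read, here is a minimal sketch for `GAME_ATTENTION_BLOCK_K`. The function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the project may reject bad values instead.

```rust
use std::env;

/// Default block size from the table above.
const DEFAULT_BLOCK_K: usize = 128;

/// Interpret a raw setting: `None` means blocking is disabled (value `0`),
/// `Some(k)` is the block size to use.
/// Assumption: missing or unparsable values fall back to the default.
pub fn parse_block_k(raw: Option<&str>) -> Option<usize> {
    let k = raw
        .and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_BLOCK_K);
    if k == 0 { None } else { Some(k) }
}

/// Read the setting from the process environment.
pub fn attention_block_k() -> Option<usize> {
    parse_block_k(env::var("GAME_ATTENTION_BLOCK_K").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the rule easy to unit-test without mutating process state.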

Development

# Fast compile check (no codegen)
cargo check --workspace --no-default-features

# Run the full root-workspace test suite
cargo test --workspace --no-default-features

# Run a single test with output
cargo test --workspace --no-default-features <test_name> -- --nocapture

# GPU compile check / tests
cargo check --workspace --features gpu
cargo test --workspace --features gpu tensor::gpu -- --nocapture

# Lint and format (advisory; matches CI)
cargo fmt --all --check
cargo clippy --workspace --all-targets --no-default-features

Feature flags

gpu — WGPU-based GPU inference.
cpu-attention-gemm-gemm — use the gemm crate for attention matmul (default CPU path).
cpu-attention-gemm-matrixmultiply — swap to matrixmultiply for A/B testing (mutually exclusive with the above).

CI

GitHub Actions runs an enforcing build+test matrix across Linux / macOS / Windows × two CPU attention backends, a GPU compile-check on all three OSes, and an advisory fmt + clippy + cargo-deny pass. See .github/workflows/ci.yml.

License

Licensed under the MIT License.

This project is a Rust port of GAME by Team OpenVPI, also distributed under the MIT License. The upstream copyright notice is preserved in the LICENSE file as required.
