A .NET 10 speech pipeline library and toolset for local, offline inference using ONNX models.
No cloud. No telemetry. Runs entirely on your hardware.
Vernacula converts audio into accurate, multi-speaker transcripts on your own computer. It ships as a reusable library (Vernacula.Base), a command-line tool (Vernacula.CLI), and a cross-platform desktop app (Vernacula-Desktop, built on Avalonia UI).
Powered by NVIDIA's Parakeet TDT v3 and Sortformer by default, with optional pluggable backends (Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1). Parakeet v3 posts a Word Error Rate of 4.85% on Google's FLEURS benchmark. Most modern computers will transcribe one hour of audio in about five minutes; GPU-accelerated systems are significantly faster.
Demo video: vernacula-desktop_usage_demo.mp4
More screenshots and a feature tour live in docs/desktop-app.md.
- Local, private transcription — audio never leaves your computer
- Multi-speaker detection — identifies and labels up to four concurrent speakers
- No audio length limits — streaming and segmentation handle indefinite file lengths
- Transcript editor with confidence colouring, audio playback, and word-level timestamps
- Pluggable ASR backends — Parakeet TDT v3, Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1
- Shallow KenLM fusion for domain-specific English (general, medical)
- Export to XLSX, CSV, JSON, SRT, Markdown, DOCX, and SQLite
- GPU acceleration via CUDA (DirectML on Windows), with automatic CPU fallback
- 52 languages covered across the supported ASR backends — see the support matrix
Vernacula's models are converted in-house from upstream PyTorch / NeMo / HuggingFace checkpoints into the ONNX contract its C# inference code expects. The export tooling lives in scripts/ and is usable independently of the rest of the project — the export scripts are dev-time only and never ship as a runtime dependency.
| Model | Source | Export tooling |
|---|---|---|
| Parakeet TDT v3 / RNNT | nvidia/parakeet-tdt-0.6b-v3 | scripts/nemo_export |
| Sortformer streaming diarization | nvidia/diar_sortformer_4spk-v2.1 | scripts/nemo_export |
| Silero VAD | snakers4/silero-vad | scripts/nemo_export |
| Qwen3-ASR | Qwen/Qwen3-ASR-0.6B, Qwen3-ASR-1.7B | scripts/qwen3asr_export |
| Cohere Transcribe | CohereLabs/cohere-transcribe-03-2026 | scripts/cohere_export |
| VibeVoice-ASR | microsoft/VibeVoice-ASR-HF | scripts/vibevoice_export |
| Granite Speech 4.1 | ibm-granite/granite-speech-4.1-2b | scripts/granite_export |
| DeepFilterNet3 (streaming) | Rikorose/DeepFilterNet | scripts/deepfilternet3_export |
| DiariZen + WeSpeaker | BUTSpeechFIT/DiariZen | scripts/diarizen_export |
| VoxLingua107 (language ID) | speechbrain/lang-id-voxlingua107-ecapa | scripts/voxlingua107_export |
Most of these graphs (split KV-cache decoders, transducer/TDT decoder state, streaming GRU hidden-state I/O, six-input Sortformer chunked diarization) require non-trivial graph surgery beyond torch.onnx.export defaults. Each export folder has its own README with the contract, parity checks, and tuning notes.
A KenLM build pipeline for Parakeet shallow fusion lives in scripts/kenlm_build; an in-progress IndicConformer export spike is in scripts/indicconformer_export.
Install prerequisites — .NET 10 SDK plus FFmpeg. Full setup (including GPU) is in docs/installation.md.
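Before building, it can help to confirm that both prerequisites are on your `PATH`. The snippet below is a minimal sketch, not part of the repository: `check_tool` is a hypothetical helper name, and the script only reports what it finds rather than installing anything.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Vernacula): report whether a
# command is on PATH. Returns 0 if found, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

missing=0
check_tool dotnet || missing=1
check_tool ffmpeg || missing=1

if [ "$missing" -eq 0 ]; then
  # Both tools are present, so print their versions for reference.
  dotnet --version
  ffmpeg -version | head -n 1
else
  echo "Install the missing tools, then see docs/installation.md."
fi
```

The script does not check the SDK version number itself; compare the output of `dotnet --version` against the .NET 10 requirement by hand.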
Run the desktop app:

    cd src/Vernacula.Avalonia
    dotnet run

On Linux, ./install.sh from the repo root builds a self-contained package and registers a .desktop entry.
Run the CLI:

    dotnet run --project src/Vernacula.CLI -p:EP=Cuda -- \
      --audio meeting.wav --model ~/models/vernacula

The full argument reference and more examples are in docs/cli-reference.md. Build configurations (CUDA / CPU / DirectML) are covered in docs/building.md.
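To transcribe several files in one go, the same invocation can be wrapped in a loop. The following is a sketch under stated assumptions: the `recordings/` folder, the `transcribe` function, and the `MODEL_DIR` and `DRY_RUN` variables are all hypothetical names introduced here, and only the `--audio` and `--model` arguments shown above are used. With `DRY_RUN=1` (the default in this sketch) it prints each command instead of running it.

```shell
#!/bin/sh
# Hypothetical settings for this sketch; adjust to your setup.
MODEL_DIR="${MODEL_DIR:-$HOME/models/vernacula}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Build (and optionally run) the CLI invocation for one audio file.
transcribe() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "dotnet run --project src/Vernacula.CLI -p:EP=Cuda -- --audio $1 --model $MODEL_DIR"
  else
    dotnet run --project src/Vernacula.CLI -p:EP=Cuda -- \
      --audio "$1" --model "$MODEL_DIR"
  fi
}

# Loop over every WAV file in the (assumed) recordings/ folder.
for f in recordings/*.wav; do
  [ -e "$f" ] || continue   # skip when the glob matches nothing
  transcribe "$f"
done
```

Run it from the repo root so the `src/Vernacula.CLI` project path resolves. The `-p:EP=Cuda` property is copied from the example above; see docs/building.md for the other build configurations.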
Full documentation lives in docs/.
Getting started
- Installation — .NET 10, FFmpeg, GPU prerequisites, Linux installer
- Desktop app — features, screenshots, walkthrough
- CLI reference — invocation, arguments, examples
- Models — required and optional model downloads
- Building from source — build configurations and publish guidance
Reference
Project
- Vernacula.Base and Vernacula.CLI — MIT
- Vernacula.Avalonia — PolyForm Shield 1.0.0 (free to use and build; may not be used to create a competing commercial product)
- Model weights — see respective HuggingFace repository licenses
See docs/licensing.md for the full breakdown.

