Vernacula

A .NET 10 speech pipeline library and toolset for local, offline inference using ONNX models.
No cloud. No telemetry. Runs entirely on your hardware.

Vernacula converts audio into accurate, multi-speaker transcripts on your own computer. It ships as a reusable library (Vernacula.Base), a command-line tool (Vernacula.CLI), and a cross-platform desktop app (Vernacula-Desktop, built on Avalonia UI).

Powered by NVIDIA's Parakeet TDT v3 and Sortformer by default, with optional pluggable backends (Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1). Parakeet v3 posts a Word Error Rate of 4.85 on Google's FLEURS benchmark. Most modern computers will transcribe one hour of audio in about five minutes; GPU-accelerated systems are significantly faster.
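To put the two figures above in context, here is a minimal sketch of the arithmetic behind them. The function names are illustrative and not part of Vernacula. The WER helper is the plain textbook definition (word-level edit distance divided by reference length); published benchmark numbers typically apply text normalization first, which this sketch omits.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# One hour of audio transcribed in five minutes is 12x real time.
print(speedup_over_real_time(3600, 300))
# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```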

Demo

vernacula-desktop_usage_demo.mp4

More screenshots and a feature tour live in docs/desktop-app.md.

Highlights

Local, private transcription — audio never leaves your computer
Multi-speaker detection — identifies and labels up to four concurrent speakers
No audio length limits — streaming and segmentation handle indefinite file lengths
Transcript editor with confidence colouring, audio playback, and word-level timestamps
Pluggable ASR backends — Parakeet TDT v3, Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1
Shallow KenLM fusion for domain-specific English (general, medical)
Export to XLSX, CSV, JSON, SRT, Markdown, DOCX, and SQLite
GPU acceleration via CUDA (DirectML on Windows), with automatic CPU fallback
52 languages covered across the supported ASR backends (see the support matrix)
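As an illustration of what a speaker-labelled SRT export involves, here is a minimal sketch. The `Segment` type, the `[speaker]` prefix, and the function names are assumptions made for this example; they are not Vernacula's actual data model or output format. Only the SRT timestamp layout (`HH:MM:SS,mmm`) is standard.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not Vernacula's real type."""
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "speaker_0", "Hello."),
        Segment(2.5, 5.0, "speaker_1", "Hi there."),
    ]
    print(to_srt(demo))
```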

Model conversion pipelines

Vernacula's models are converted in-house from upstream PyTorch / NeMo / HuggingFace checkpoints into the ONNX contract its C# inference code expects. The export tooling lives in scripts/ and is usable independently of the rest of the project — the export scripts are dev-time only and never ship as a runtime dependency.

| Model | Source | Export tooling |
| --- | --- | --- |
| Parakeet TDT v3 / RNNT | nvidia/parakeet-tdt-0.6b-v3 | scripts/nemo_export |
| Sortformer streaming diarization | nvidia/diar_sortformer_4spk-v2.1 | scripts/nemo_export |
| Silero VAD | snakers4/silero-vad | scripts/nemo_export |
| Qwen3-ASR | Qwen/Qwen3-ASR-0.6B, Qwen3-ASR-1.7B | scripts/qwen3asr_export |
| Cohere Transcribe | CohereLabs/cohere-transcribe-03-2026 | scripts/cohere_export |
| VibeVoice-ASR | microsoft/VibeVoice-ASR-HF | scripts/vibevoice_export |
| Granite Speech 4.1 | ibm-granite/granite-speech-4.1-2b | scripts/granite_export |
| DeepFilterNet3 (streaming) | Rikorose/DeepFilterNet | scripts/deepfilternet3_export |
| DiariZen + WeSpeaker | BUTSpeechFIT/DiariZen | scripts/diarizen_export |
| VoxLingua107 (language ID) | speechbrain/lang-id-voxlingua107-ecapa | scripts/voxlingua107_export |

Most of these graphs (split KV-cache decoders, transducer/TDT decoder state, streaming GRU hidden-state I/O, six-input Sortformer chunked diarization) require non-trivial graph surgery beyond torch.onnx.export defaults. Each export folder has its own README with the contract, parity checks, and tuning notes.
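The idea behind explicit hidden-state I/O and a parity check can be sketched without any ML framework. In the example below a one-line exponential filter stands in for a recurrent layer such as a GRU; that substitution, the function names, and the tolerance are all assumptions for illustration and are not taken from the export scripts. The point it shows is that a model which accepts its state as an input and returns it as an output can be run chunk by chunk and still reproduce the whole-sequence result.

```python
def run_chunk(samples, hidden, alpha=0.9):
    """Stand-in for one stateful model call: state in, (outputs, new state) out."""
    outputs = []
    for x in samples:
        hidden = alpha * hidden + (1.0 - alpha) * x
        outputs.append(hidden)
    return outputs, hidden


def run_streaming(samples, chunk_size, hidden=0.0):
    """Feed the signal in fixed-size chunks, threading the state between calls."""
    outputs = []
    for start in range(0, len(samples), chunk_size):
        chunk_out, hidden = run_chunk(samples[start:start + chunk_size], hidden)
        outputs.extend(chunk_out)
    return outputs


def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


if __name__ == "__main__":
    signal = [float(i % 7) for i in range(50)]
    offline, _ = run_chunk(signal, hidden=0.0)       # whole sequence in one call
    streamed = run_streaming(signal, chunk_size=8)   # same signal, chunked
    # Parity check: the two paths should agree within a small tolerance.
    assert max_abs_diff(offline, streamed) <= 1e-9
    print("parity ok")
```

A real parity check would compare the upstream PyTorch output against the exported ONNX graph in the same way, usually with a looser tolerance to absorb floating-point differences between runtimes.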

A KenLM build pipeline for Parakeet shallow fusion lives in scripts/kenlm_build; an in-progress IndicConformer export spike is in scripts/indicconformer_export.
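For readers unfamiliar with shallow fusion, the core of it is a weighted sum of log-probabilities: score = log P_asr + λ · log P_lm. The sketch below applies that formula to a two-entry n-best list. It is a simplification under stated assumptions: the toy unigram table stands in for a KenLM n-gram model, the weight 0.3 is arbitrary, and rescoring finished hypotheses is simpler than fusing scores during decoding. None of the names come from Vernacula.

```python
import math

# Toy unigram "language model" (assumed values, for illustration only).
TOY_UNIGRAM_PROBS = {
    "the": 0.05, "patient": 0.01, "has": 0.02, "a": 0.04,
    "acute": 0.005, "cute": 0.0001, "pain": 0.01,
}
UNKNOWN_WORD_PROB = 1e-6


def toy_lm_logprob(text: str) -> float:
    """Sum of per-word log-probabilities under the toy unigram table."""
    return sum(
        math.log(TOY_UNIGRAM_PROBS.get(word, UNKNOWN_WORD_PROB))
        for word in text.split()
    )


def fused_score(asr_logprob: float, lm_logprob: float, lm_weight: float) -> float:
    """Shallow fusion: acoustic log-probability plus weighted LM log-probability."""
    return asr_logprob + lm_weight * lm_logprob


def rescore(hypotheses, lm_logprob_fn, lm_weight=0.3) -> str:
    """Return the text of the (text, asr_logprob) pair with the best fused score."""
    best = max(
        hypotheses,
        key=lambda h: fused_score(h[1], lm_logprob_fn(h[0]), lm_weight),
    )
    return best[0]


if __name__ == "__main__":
    n_best = [
        ("the patient has a cute pain", -4.0),   # slightly preferred acoustically
        ("the patient has acute pain", -4.2),
    ]
    print(rescore(n_best, toy_lm_logprob, lm_weight=0.0))  # ASR score only
    print(rescore(n_best, toy_lm_logprob, lm_weight=0.3))  # with LM fusion
```

With the weight at zero the acoustically preferred hypothesis wins; with fusion enabled the language model tips the choice toward the domain-appropriate wording. A summed unigram score also penalizes longer hypotheses, which is one reason practical systems tune the weight and often add a length term.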

Quick start

Install prerequisites — .NET 10 SDK plus FFmpeg. Full setup (including GPU) is in docs/installation.md.

Run the desktop app:

cd src/Vernacula.Avalonia
dotnet run

On Linux, ./install.sh from the repo root builds a self-contained package and registers a .desktop entry.

Run the CLI:

dotnet run --project src/Vernacula.CLI -p:EP=Cuda -- \
  --audio meeting.wav --model ~/models/vernacula

The full argument reference and more examples are in docs/cli-reference.md. Build configurations (CUDA / CPU / DirectML) are covered in docs/building.md.

Documentation

Full documentation lives in docs/.

Getting started

Installation — .NET 10, FFmpeg, GPU prerequisites, Linux installer
Desktop app — features, screenshots, walkthrough
CLI reference — invocation, arguments, examples
Models — required and optional model downloads
Building from source — build configurations and publish guidance

Reference

Project

License

Vernacula.Base and Vernacula.CLI — MIT
Vernacula.Avalonia — PolyForm Shield 1.0.0 (free to use and build; may not be used to create a competing commercial product)
Model weights — see respective HuggingFace repository licenses

See docs/licensing.md for the full breakdown.
