Skip to content

christopherthompson81/vernacula

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

470 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vernacula

A .NET 10 speech pipeline library and toolset for local, offline inference using ONNX models.
No cloud. No telemetry. Runs entirely on your hardware.

Vernacula-Desktop

Vernacula converts audio into accurate, multi-speaker transcripts on your own computer. It ships as a reusable library (Vernacula.Base), a command-line tool (Vernacula.CLI), and a cross-platform desktop app (Vernacula-Desktop, built on Avalonia UI).

Powered by NVIDIA's Parakeet TDT v3 and Sortformer by default, with optional pluggable backends (Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1). Parakeet v3 posts a Word Error Rate of 4.85 on Google's FLEURS benchmark. Most modern computers will transcribe one hour of audio in about five minutes; GPU-accelerated systems are significantly faster.

Demo

vernacula-desktop_usage_demo.mp4

Results view

More screenshots and a feature tour live in docs/desktop-app.md.

Highlights

  • Local, private transcription — audio never leaves your computer
  • Multi-speaker detection — identifies and labels up to four concurrent speakers
  • No audio length limits — streaming and segmentation handle indefinite file lengths
  • Transcript editor with confidence colouring, audio playback, and word-level timestamps
  • Pluggable ASR backends — Parakeet TDT v3, Cohere Transcribe, Qwen3-ASR, VibeVoice-ASR, Granite Speech 4.1
  • Shallow KenLM fusion for domain-specific English (general, medical)
  • Export to XLSX, CSV, JSON, SRT, Markdown, DOCX, and SQLite
  • GPU acceleration via CUDA (DirectML on Windows), with automatic CPU fallback
  • 52 languages covered across the four backends — see the support matrix

Model conversion pipelines

Vernacula's models are converted in-house from upstream PyTorch / NeMo / HuggingFace checkpoints into the ONNX contract its C# inference code expects. The export tooling lives in scripts/ and is usable independently of the rest of the project — the export scripts are dev-time only and never ship as a runtime dependency.

Model Source Export tooling
Parakeet TDT v3 / RNNT nvidia/parakeet-tdt-0.6b-v3 scripts/nemo_export
Sortformer streaming diarization nvidia/diar_sortformer_4spk-v2.1 scripts/nemo_export
Silero VAD snakers4/silero-vad scripts/nemo_export
Qwen3-ASR Qwen/Qwen3-ASR-0.6B, Qwen3-ASR-1.7B scripts/qwen3asr_export
Cohere Transcribe CohereLabs/cohere-transcribe-03-2026 scripts/cohere_export
VibeVoice-ASR microsoft/VibeVoice-ASR-HF scripts/vibevoice_export
Granite Speech 4.1 ibm-granite/granite-speech-4.1-2b scripts/granite_export
DeepFilterNet3 (streaming) Rikorose/DeepFilterNet scripts/deepfilternet3_export
DiariZen + WeSpeaker BUTSpeechFIT/DiariZen scripts/diarizen_export
VoxLingua107 (language ID) speechbrain/lang-id-voxlingua107-ecapa scripts/voxlingua107_export

Most of these graphs (split KV-cache decoders, transducer/TDT decoder state, streaming GRU hidden-state I/O, six-input Sortformer chunked diarization) require non-trivial graph surgery beyond torch.onnx.export defaults. Each export folder has its own README with the contract, parity checks, and tuning notes.

A KenLM build pipeline for Parakeet shallow fusion lives in scripts/kenlm_build; an in-progress IndicConformer export spike is in scripts/indicconformer_export.

Quick start

Install prerequisites.NET 10 SDK plus FFmpeg. Full setup (including GPU) is in docs/installation.md.

Run the desktop app:

cd src/Vernacula.Avalonia
dotnet run

On Linux, ./install.sh from the repo root builds a self-contained package and registers a .desktop entry.

Run the CLI:

dotnet run --project src/Vernacula.CLI -p:EP=Cuda -- \
  --audio meeting.wav --model ~/models/vernacula

Full argument reference and more examples in docs/cli-reference.md. Build configurations (CUDA / CPU / DirectML) in docs/building.md.

Documentation

Full documentation lives in docs/.

Getting started

Reference

Project

License

  • Vernacula.Base and Vernacula.CLIMIT
  • Vernacula.AvaloniaPolyForm Shield 1.0.0 (free to use and build; may not be used to create a competing commercial product)
  • Model weights — see respective HuggingFace repository licenses

See docs/licensing.md for the full breakdown.