webbench

benchmarks small language models (0.6B–8B) on MMLU-style multiple-choice questions, entirely in the browser. the model is downloaded to your device and runs on your GPU through WebLLM and WebGPU: no server-side inference, no API keys.
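
the README doesn't document webbench's prompt template or how an answer is read out of the model's text. here is a minimal sketch of one plausible scoring step, assuming a zero-shot prompt and "first standalone A–D letter wins" parsing. `McQuestion`, `formatPrompt`, `extractAnswer` and `isCorrect` are hypothetical names, not webbench's code. producing the raw output with WebLLM is left out.

```typescript
// hypothetical question shape; webbench's real schema may differ.
interface McQuestion {
  question: string;
  choices: [string, string, string, string];
  answer: number; // index 0-3 of the correct choice
}

const LETTERS = "ABCD";

// build a zero-shot MMLU-style prompt: question, lettered choices, answer cue.
function formatPrompt(q: McQuestion): string {
  const options = q.choices.map((c, i) => `${LETTERS[i]}. ${c}`).join("\n");
  return `${q.question}\n${options}\nAnswer with a single letter (A, B, C, or D).\nAnswer:`;
}

// return the index of the first standalone uppercase A-D letter in the raw output,
// or null if there is none (an unparseable answer is scored as wrong).
function extractAnswer(raw: string): number | null {
  const match = raw.match(/\b([ABCD])\b/);
  return match ? LETTERS.indexOf(match[1]) : null;
}

function isCorrect(q: McQuestion, raw: string): boolean {
  return extractAnswer(raw) === q.answer;
}
```

a stricter or looser parser changes the score, which is one reason keeping the full raw output for every question is useful.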

you get an accuracy score with a 95% Wilson confidence interval, the full raw output for every question, and a shareable report. runs are anonymously added to a public results page.
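
the Wilson score interval is the standard choice for a proportion like accuracy because it stays inside [0, 1] and behaves sensibly for small question counts or scores near 0% or 100%. the README doesn't show how webbench computes it; this is a sketch of the textbook formula, and `wilsonInterval` and `Z_95` are illustrative names.

```typescript
// z for a two-sided 95% interval (97.5th percentile of the standard normal).
const Z_95 = 1.959963984540054;

interface Interval {
  lower: number;
  upper: number;
}

// Wilson score interval for the binomial proportion correct / total.
function wilsonInterval(correct: number, total: number, z: number = Z_95): Interval {
  if (total <= 0) throw new RangeError("total must be positive");
  if (correct < 0 || correct > total) throw new RangeError("correct must be in [0, total]");
  const p = correct / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  // center is pulled toward 0.5; half-width shrinks roughly like 1/sqrt(total)
  const center = (p + z2 / (2 * total)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  // clamp away floating-point drift at the 0 and 1 edges
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}
```

for example, 80 correct out of 100 gives roughly `{ lower: 0.711, upper: 0.867 }`, so the reported 80% accuracy is uncertain by about ±8 points at that sample size.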

stack

next.js, react, tailwind, framer motion, zustand, @mlc-ai/web-llm, supabase.

questions from the MMLU benchmark (Hendrycks et al., 2021). in-browser inference via WebLLM.
