fhalde/howmanygpus

GPU fleet capacity planning for LLM Inference

"GPU poor" is not a lifestyle – it's just a capacity planning mistake.

Running LLMs at scale without thinking about throughput, bandwidth, and KV cache is how you end up either (a) burning money or (b) under the bridge.

This toolkit helps you avoid both.

It answers a simple question: how many GPUs do you actually need to serve an LLM at your target load? Under the hood, it combines closed-form capacity floors with a discrete-event simulator (simpy), packaged as a Streamlit app.
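To make "closed-form capacity floor" concrete, here is a minimal sketch of the idea. It is not this toolkit's code or its exact formulas: the `Scenario` fields, the function names, the model shape, and the GPU numbers are all illustrative assumptions. Each floor is a lower bound from one resource (memory, memory bandwidth, compute), and the fleet can never be smaller than the largest of them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # Hypothetical 70B-class model with grouped-query attention.
    params: float = 70e9
    n_layers: int = 80
    n_kv_heads: int = 8
    head_dim: int = 128
    bytes_per_value: int = 2            # FP16/BF16 weights and KV cache
    # Illustrative round GPU numbers, not a vendor spec sheet.
    gpu_mem_bytes: float = 80e9
    gpu_bw_bytes_per_s: float = 3.35e12
    gpu_flops: float = 1e15
    # Target load.
    req_per_s: float = 10.0
    prompt_tokens: int = 1000
    output_tokens: int = 500
    time_in_system_s: float = 20.0      # mean time a request stays in flight


def kv_bytes_per_token(s: Scenario) -> int:
    # Keys and values (factor 2) for every layer and KV head.
    return 2 * s.n_layers * s.n_kv_heads * s.head_dim * s.bytes_per_value


def avg_context(s: Scenario) -> float:
    # A request in flight has, on average, generated half its output.
    return s.prompt_tokens + s.output_tokens / 2


def memory_floor(s: Scenario) -> int:
    # Little's law: in-flight requests = arrival rate x time in system.
    in_flight = s.req_per_s * s.time_in_system_s
    kv = in_flight * avg_context(s) * kv_bytes_per_token(s)
    weights = s.params * s.bytes_per_value  # at least one copy fleet-wide
    return math.ceil((weights + kv) / s.gpu_mem_bytes)


def bandwidth_floor(s: Scenario) -> int:
    # Each generated token re-reads its sequence's KV cache. Weight reads
    # are ignored (batching amortizes them), so this stays a lower bound.
    decode_tok_per_s = s.req_per_s * s.output_tokens
    bytes_per_s = decode_tok_per_s * avg_context(s) * kv_bytes_per_token(s)
    return math.ceil(bytes_per_s / s.gpu_bw_bytes_per_s)


def compute_floor(s: Scenario) -> int:
    # Common approximation: about 2 FLOPs per parameter per token.
    tok_per_s = s.req_per_s * (s.prompt_tokens + s.output_tokens)
    return math.ceil(tok_per_s * 2 * s.params / s.gpu_flops)


def floors(s: Scenario) -> dict:
    return {
        "memory": memory_floor(s),
        "bandwidth": bandwidth_floor(s),
        "compute": compute_floor(s),
    }


def gpu_floor(s: Scenario) -> int:
    # The fleet must satisfy every constraint, so take the maximum.
    return max(floors(s).values())


if __name__ == "__main__":
    s = Scenario()
    print(floors(s), "->", gpu_floor(s), "GPUs minimum")
```

Floors like these assume perfect utilization and no queueing, so they only say what cannot work. Bursty arrivals, batching limits, and latency targets are the kind of thing a discrete-event simulation is needed to capture.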

Live demo, Blog post

Quick start

uv sync
uv run streamlit run src/howmanygpus/main.py

References

The formulas and framing draw on:
