Performance benchmarking tool for Gradio apps. Measures per-phase request timing, load tests with concurrent users, and supports mixed background traffic simulation.
```bash
pip install hf-perftest
```

Or install from source:

```bash
git clone https://github.com/gradio-app/hf-perftest.git
cd hf-perftest
pip install -e .
```

Install the hf-perftest skill/rules for your AI coding tool:
```bash
# Install for all supported tools (Claude Code, Cursor, Codex)
hf-perftest install-skill

# Or pick one
hf-perftest install-skill claude
hf-perftest install-skill cursor
hf-perftest install-skill codex
```

This writes the appropriate skill/rules file into your project so your AI assistant knows how to use hf-perftest.
Run a benchmark against a Gradio app:
```bash
hf-perftest run \
  --app apps/echo_text.py \
  --tiers 1,10,100 \
  --requests-per-user 10 \
  --output-dir benchmark_results/my_run
```

This will:

- Launch the app with `GRADIO_PROFILING=1`
- Run warmup requests
- For each tier, fire N concurrent users in burst mode for 10 rounds
- Collect client latencies and server-side per-phase traces
- Save results to `benchmark_results/my_run/<timestamp>/`
```bash
# Minimal
hf-perftest run --app apps/echo_text.py

# Full options
hf-perftest run \
  --app apps/echo_text.py \
  --tiers 1,10,100 \
  --requests-per-user 50 \
  --mode burst \
  --concurrency-limit 1 \
  --mixed-traffic \
  --num-workers 2 \
  --output-dir results
```

- `--app`: Path to the Gradio app to test (required)
- `--tiers`: Comma-separated concurrency tiers (default: `1,10,100`)
- `--requests-per-user`: Number of rounds per tier (default: 10)
- `--mode`: Load pattern: `burst` or `wave` (default: `burst`)
- `--concurrency-limit`: Concurrency limit for the app (default: 1; use `none` for unlimited)
- `--mixed-traffic`: Run background traffic (page loads, uploads, downloads) alongside predictions
- `--num-workers`: Number of Gradio workers via `GRADIO_NUM_WORKERS` (default: 1)
- `--output-dir`: Output directory (default: `benchmark_results`)
- `--port`: Port for the Gradio app (default: 7860)
- `--api-name`: API endpoint name (auto-detected if not specified)
- burst: All N users fire simultaneously per round. Measures worst-case queue contention.
- wave: Each user waits a random 0–500ms jitter before firing. Simulates realistic staggered traffic.
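The difference between the two patterns can be sketched with `asyncio`. This is a minimal illustration of the load shapes, not hf-perftest's actual implementation; `fire()` is a hypothetical stand-in for one client request:

```python
import asyncio
import random

completed: list[int] = []

async def fire(user_id: int) -> None:
    """Hypothetical stand-in for one client request."""
    await asyncio.sleep(0.01)  # simulated request latency
    completed.append(user_id)

async def burst_round(n_users: int) -> None:
    # burst: every user fires at the same instant -> worst-case queue contention
    await asyncio.gather(*(fire(u) for u in range(n_users)))

async def wave_round(n_users: int) -> None:
    # wave: each user sleeps a random 0-500 ms jitter before firing
    async def jittered(u: int) -> None:
        await asyncio.sleep(random.uniform(0, 0.5))
        await fire(u)
    await asyncio.gather(*(jittered(u) for u in range(n_users)))

asyncio.run(burst_round(10))
asyncio.run(wave_round(10))
```

In burst mode all requests hit the queue within microseconds of each other; in wave mode arrivals are spread across the jitter window, which is closer to organic traffic.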
With `--mixed-traffic`, background workers run alongside predictions to simulate realistic server load:

- Page loads: `GET /`, `/config`, `/gradio_api/info`, `/theme.css`, plus discovered JS/CSS assets
- Uploads: `POST /gradio_api/upload` with files from `sample-inputs/`
- Downloads: `GET /gradio_api/file=...` for static files
Submit benchmarks to HF Jobs infrastructure for reproducible results on standardized hardware.
Benchmark one or more apps on a branch:

```bash
hf-perftest run-remote run \
  --apps apps/echo_text.py apps/streaming_chat.py \
  --branch main \
  --tiers 1,10,100 \
  --requests-per-user 50 \
  --hardware cpu-upgrade
```

A/B-compare two branches:

```bash
hf-perftest run-remote ab \
  --apps apps/echo_text.py apps/file_heavy.py \
  --base main \
  --branch my-optimization \
  --tiers 1,10,100 \
  --requests-per-user 50 \
  --hardware cpu-upgrade
```

Benchmark a hosted Space with a prompt sidecar:

```bash
hf-perftest run-remote run \
  --apps mrfakename/z-image-turbo \
  --sidecar apps/z-image-turbo.prompts.json \
  --api-name /generate_image \
  --branch main \
  --hardware gpu-l4-1
```

- `--apps`: Paths to local Gradio app files or a HF Space ID (required)
- `--branch`/`--base`: Git branches to benchmark (resolved to commit SHAs)
- `--commit`: Direct commit SHA (overrides `--branch`)
- `--hardware`: HF Jobs hardware flavor (default: `cpu-basic`)
- `--tiers`: Comma-separated concurrency tiers (default: `1,10,100`)
- `--requests-per-user`: Rounds per tier (default: 10)
- `--mode`: Load pattern: `burst` or `wave` (default: `burst`)
- `--concurrency-limit`: App concurrency limit (default: 1)
- `--mixed-traffic`: Run background traffic alongside predictions
- `--num-workers`: Number of Gradio workers (default: 1)
- `--sidecar`: Sidecar prompt files (`.prompts.json`) to upload alongside apps
- `--timeout`: Job timeout (default: 90m)
- `--run-name`: Human-readable label (default: auto-generated)
- `--dry-run`: Print the generated script without submitting
```bash
hf-perftest result-schema
```

Prints the structure of the results directory.
| App | What it tests |
|---|---|
| `echo_text.py` | `lambda x: x` — pure framework overhead |
| `file_heavy.py` | 256x256 random image — exercises postprocess serialization |
| `image_to_image.py` | Image identity — tests file upload + download |
| `stateful_counter.py` | `gr.State` with dict — tests session state handling |
| `streaming_chat.py` | `ChatInterface` generator with 6 yields — tests streaming |
| `llm_chat.py` | LLM chat via HF Inference API — real-world chat workload |
| `text_to_image.py` | Text-to-image via HF Inference API — real-world image gen |
The instrumentation (enabled via `GRADIO_PROFILING=1`) traces six phases per request:
| Phase | What it measures |
|---|---|
| `queue_wait` | Time from event creation to processing start |
| `preprocess` | Input deserialization |
| `fn_call` | User function execution (accumulated across generator yields) |
| `postprocess` | Output serialization (e.g. numpy → image → file cache) |
| `streaming_diff` | Computing incremental diffs for streaming output |
| `total` | Wall-clock time for the full `process_api()` call |
Profiling endpoints (only available when `GRADIO_PROFILING=1`):

```bash
curl http://localhost:7860/gradio_api/profiling/traces | python -m json.tool
curl http://localhost:7860/gradio_api/profiling/summary | python -m json.tool
curl -X POST http://localhost:7860/gradio_api/profiling/clear
```

Inspect remote jobs with the HF CLI:

```bash
hf jobs logs <job_id>
hf jobs inspect <job_id>
```
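Once traces are pulled down, aggregating them is straightforward. A sketch that assumes, hypothetically, each trace is a flat mapping of phase name to seconds — the actual `/gradio_api/profiling/traces` payload may be shaped differently (check `hf-perftest result-schema`):

```python
from statistics import mean

def phase_means(traces: list[dict[str, float]]) -> dict[str, float]:
    """Average each phase's duration across all collected traces."""
    phases: dict[str, list[float]] = {}
    for trace in traces:
        for phase, seconds in trace.items():
            phases.setdefault(phase, []).append(seconds)
    return {phase: mean(vals) for phase, vals in phases.items()}

# Illustrative data only, not real benchmark output
traces = [
    {"queue_wait": 0.002, "preprocess": 0.001, "fn_call": 0.010, "total": 0.015},
    {"queue_wait": 0.004, "preprocess": 0.001, "fn_call": 0.012, "total": 0.019},
]
print(phase_means(traces))
```

Comparing `fn_call` against `total` in an aggregate like this shows how much time the framework spends outside the user function.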