SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models (ICML'26 Spotlight)

Official implementation of "SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models".

* Equal contribution, † Corresponding author

[Paper] [Project Page]

🧭 Overview

TL;DR. Vision-language-action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, and test-time scaling (TTS) is gaining attention as a way to improve robustness beyond training. However, existing TTS methods require additional training, verifiers, and multiple forward passes, which makes them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed, which is insufficient under perceptual ambiguity. We propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on self-uncertainty, inspired by Active Inference theory. It requires no additional training, no verifier, and only a single forward pass.

SCALE conditions inference on the model's own self-uncertainty u, estimated from the output logits as the expected log-likelihood ratio between a one-hot (full-certainty) and a uniform (full-ambiguity) reference distribution:

u_k = D_KL(p || q_low) − D_KL(p || q_high)

and uses it to jointly modulate two things in a single forward pass:

Adaptive Action Decoding: per-token sampling temperature τ_k = T₀ · σ(u_k), near-greedy when confident and explorative when uncertain.
Adaptive Visual Attention: vision-encoder attention temperature γ_t = κ^tanh(Δu_{t-1}), which sharpens focus when confident (γ < 1) and broadens exploration when uncertain (γ > 1).
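The three formulas above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repository's implementation. It relies on the identity D_KL(p || q_low) − D_KL(p || q_high) = E_p[log q_high − log q_low] and makes several assumptions that this README does not state: q_low is taken to be a one-hot distribution at the argmax of p, smoothed by a hypothetical `eps` so the KL term stays finite; q_high is uniform; Δu is treated as a plain input because its definition is not given here; and the `T0` and `kappa` defaults are placeholders.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def self_uncertainty(logits, eps=1e-3):
    """u = D_KL(p || q_low) - D_KL(p || q_high) = E_p[log q_high - log q_low].

    Assumptions (not from the README): q_low is a one-hot at argmax(p),
    smoothed with `eps` to keep the KL finite; q_high is uniform.
    """
    p = softmax(logits)
    n = p.size
    q_high = np.full(n, 1.0 / n)
    q_low = np.full(n, eps / (n - 1))
    q_low[np.argmax(p)] = 1.0 - eps
    return float(np.sum(p * (np.log(q_high) - np.log(q_low))))


def action_temperature(u, T0=1.0):
    """tau_k = T0 * sigmoid(u_k): small when confident, close to T0 when uncertain."""
    return T0 / (1.0 + np.exp(-u))


def attention_temperature(delta_u, kappa=2.0):
    """gamma_t = kappa ** tanh(delta_u); delta_u is passed in as-is."""
    return kappa ** np.tanh(delta_u)


if __name__ == "__main__":
    for name, logits in [("peaked", [10.0, 0.0, 0.0, 0.0]), ("flat", [0.0, 0.0, 0.0, 0.0])]:
        u = self_uncertainty(logits)
        print(name, "u =", round(u, 3), "tau =", round(action_temperature(u), 3))
```

Under these assumptions a peaked distribution yields a negative u and a small τ, while a flat distribution yields a positive u and a τ close to T₀. With κ > 1, γ equals 1 at Δu = 0, exceeds 1 for positive Δu, and falls below 1 for negative Δu.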

This repository provides the OpenVLA on LIBERO instantiation of SCALE used in the paper.

🔊 Updates

Release the paper on arXiv
Open the project page for SCALE!
Release the code for SCALE (OpenVLA + LIBERO)

📦 Installation

# 1) Create and activate the environment
conda create -n scale python=3.10 -y
conda activate scale

# 2) Install PyTorch + CUDA toolchain
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 cuda-compiler -c pytorch -c nvidia -y

# 3) Install this package (its deps pin torch==2.2.0 / torchvision==0.17.0 / torchaudio==2.2.0)
pip install -e .

# 4) Install Flash-Attention 2
pip install packaging ninja
pip install "flash-attn==2.5.5" --no-build-isolation

# 5) Clone & install LIBERO
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
pip install -e LIBERO --config-settings editable_mode=compat

# 6) Install LIBERO's extra runtime requirements, then pin compatible versions
pip install -r experiments/robot/libero/libero_requirements.txt
pip install numpy==1.26.4 mujoco==3.3.2
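After the steps above, it can help to confirm that the pinned versions actually ended up in the environment, since later installs may replace earlier ones. The following is a small standard-library sketch, not a script shipped with this repository. The helper name `check_pins` is hypothetical, the pins are copied from the commands above, and local build tags such as `+cu121` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the installation commands in this README.
PINS = {
    "torch": "2.2.0",
    "torchvision": "0.17.0",
    "torchaudio": "2.2.0",
    "flash-attn": "2.5.5",
    "numpy": "1.26.4",
    "mujoco": "3.3.2",
}


def check_pins(pins):
    """Return {package: (expected, found, ok)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # Strip local build tags (e.g. "2.2.0+cu121") before comparing.
        ok = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, ok)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} expected={expected:8s} found={found} [{status}]")
```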

🚀 Evaluation

Pick the GPU with CUDA_VISIBLE_DEVICES. The simplest entry point:

# SCALE (paper defaults from configs/scale.yaml).
# The checkpoint is auto-selected from --task_suite (pass --pretrained_checkpoint only to override).
CUDA_VISIBLE_DEVICES=0 python experiments/robot/libero/run_libero_eval.py \
    --task_suite libero_10 \
    --decoding_mode scale

Decoding modes

--decoding_mode selects the inference strategy. Supported modes and their flags:

| Mode | Description | Flags (paper baseline config) |
| --- | --- | --- |
| `greedy` | Standard greedy / argmax (OpenVLA default) | (none) |
| `temp` | Temperature sampling | `--temperature 1.0` |
| `topk` | Top-k sampling | `--top_k 40 --temperature 0.7` |
| `topp` | Top-p (nucleus) sampling | `--top_p 0.9` |
| `scale` | SCALE (ours) | reads `configs/scale.yaml` (override with `--T0`, `--kappa`, ...) |
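For readers unfamiliar with the baseline modes, the sketch below shows what `greedy`, `temp`, `topk`, and `topp` conventionally do to a single logit vector. It is a generic NumPy illustration of these standard sampling schemes, not the code path used by `run_libero_eval.py`. The function name `filtered_probs` is hypothetical, and tie handling in the top-k branch (ties at the cutoff are all kept) is a simplification.

```python
import numpy as np


def filtered_probs(logits, mode, temperature=1.0, top_k=40, top_p=0.9):
    """Return the sampling distribution a baseline mode would draw from."""
    if mode not in {"greedy", "temp", "topk", "topp"}:
        raise ValueError(f"unknown mode: {mode}")
    logits = np.asarray(logits, dtype=np.float64)

    if mode == "greedy":
        # Argmax: all probability mass on the highest logit.
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs

    scaled = logits / temperature
    if mode == "topk":
        # Keep the k largest logits; mask the rest before the softmax.
        k = min(top_k, scaled.size)
        cutoff = np.sort(scaled)[-k]
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)

    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    if mode == "topp":
        # Keep the smallest high-probability prefix whose mass reaches top_p.
        order = np.argsort(-probs)
        cumulative = np.cumsum(probs[order])
        keep_n = int(np.searchsorted(cumulative, top_p)) + 1
        mask = np.zeros(probs.shape, dtype=bool)
        mask[order[:keep_n]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()
    return probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = filtered_probs([2.0, 1.0, 0.0, -1.0], "topk", temperature=0.7, top_k=2)
    print(probs, rng.choice(probs.size, p=probs))
```

The `scale` mode differs from these baselines in that its temperature is not a fixed flag but is set per token from the model's own uncertainty, as described in the Overview.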

Examples:

# Greedy baseline
CUDA_VISIBLE_DEVICES=0 python experiments/robot/libero/run_libero_eval.py \
    --task_suite libero_spatial --decoding_mode greedy

# Top-k baseline (k=40, t=0.7)
CUDA_VISIBLE_DEVICES=0 python experiments/robot/libero/run_libero_eval.py \
    --task_suite libero_object --decoding_mode topk --top_k 40 --temperature 0.7

Or use the convenience wrapper:

# bash experiments/robot/libero/run_libero_eval.sh <GPU_ID> <TASK_SUITE> <DECODING_MODE>
bash experiments/robot/libero/run_libero_eval.sh 0 libero_10 scale

Per-episode logs are written to experiments/logs/ and replay videos to rollouts/ (disable with --save_video False).
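To run several suites and modes back to back, a small driver along the following lines may be convenient. This is a hypothetical helper, not part of the repository. It only assembles the same command line shown in the examples above (`--task_suite`, `--decoding_mode`, plus optional extra flags) and defaults to a dry run that prints the commands instead of executing them.

```python
import os
import subprocess

EVAL_SCRIPT = "experiments/robot/libero/run_libero_eval.py"


def build_command(task_suite, decoding_mode, extra_flags=()):
    """Assemble the evaluation command as an argument list."""
    return [
        "python", EVAL_SCRIPT,
        "--task_suite", task_suite,
        "--decoding_mode", decoding_mode,
        *extra_flags,
    ]


def run_sweep(suites, modes, gpu_id=0, dry_run=True):
    """Run (or just print) one evaluation per suite/mode pair, sequentially."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    commands = [build_command(s, m) for s in suites for m in modes]
    for cmd in commands:
        if dry_run:
            print(f"CUDA_VISIBLE_DEVICES={gpu_id}", " ".join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    # Suite names are the ones used in the examples in this README.
    run_sweep(["libero_spatial", "libero_object", "libero_10"], ["greedy", "scale"])
```

Passing `dry_run=False` executes the commands from the repository root. Baselines that need extra flags (for example `--top_k 40 --temperature 0.7`) can be built individually with `build_command(..., extra_flags=[...])`.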

⚙️ SCALE hyperparameters

The SCALE defaults live in configs/scale.yaml and follow the paper. Names are aligned with the paper and any key can be overridden on the command line, e.g. --T0 1.0 --kappa 2.0 --alpha 0.8.
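The override behavior described above (file defaults first, command-line values on top) can be illustrated with a short standard-library sketch. It is not the repository's actual config loader: the function name `merge_overrides` is hypothetical, a plain dict stands in for the parsed contents of `configs/scale.yaml`, and the values below are simply the ones from the example command, not necessarily the shipped defaults.

```python
import argparse


def merge_overrides(defaults, argv):
    """Return `defaults` with any --key value pairs from `argv` applied on top."""
    parser = argparse.ArgumentParser()
    for key, value in defaults.items():
        # default=None lets us tell "not passed" apart from an explicit value.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    args = parser.parse_args(argv)
    merged = dict(defaults)
    merged.update({k: v for k, v in vars(args).items() if v is not None})
    return merged


if __name__ == "__main__":
    # Stand-in for the YAML contents; values are illustrative only.
    yaml_defaults = {"T0": 1.0, "kappa": 2.0, "alpha": 0.8}
    print(merge_overrides(yaml_defaults, ["--kappa", "3.0"]))
```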

📖 Citation

@inproceedings{choi2026scale,
  title={SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models},
  author={Hyeonbeom Choi and Daechul Ahn and Youhan Lee and Taewook Kang and Seongwon Cho and Jonghyun Choi},
  booktitle={ICML},
  year={2026}
}

🙏 Acknowledgements

This codebase is built on top of OpenVLA and LIBERO.
