
FudanCVL/Unison


Unison: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation

Jinyu Liu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang

Fudan University


TL;DR: Unison evaluates Unified Multimodal Models (UMMs) through the synergy between their understanding and generation capabilities, across four dimensions: Internal Consistency, Understanding-Guided Generation, Generation-Guided Understanding, and Mutual Enhancement. Its automatic evaluation model, Unison-Judge, achieves 88.7% alignment with human judgments.

🔥 Updates

✅ TODO

  • Inference and evaluation scripts
  • Unison Benchmark data and Unison-Judge model weights
  • Support for Unison in the UMM toolkit TorchUMM (coming in the next few days)
  • Evaluation results for more recent open-source models (Emu3.5, Ovis-U1, Ming series etc.) and the latest closed-source models (GPT-5.5 and Gemini 3.1 series)

📬 Contact: If you have any questions, feel free to contact us at liujy24@m.fudan.edu.cn.

📖 Overview

[Figure: Unison overview]

📊 Evaluation Results

Open-Source Unified Multimodal Models

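Each dimension reports Und. (understanding), Gen. (generation) and Uni. sub-scores. IC = Internal Consistency, UGG = Und.-Guided Gen., GGU = Gen-Guided Und., ME = Mutual Enhancement.

| Model | Params | IC Und. | IC Gen. | IC Uni. | UGG Und. | UGG Gen. | UGG Uni. | GGU Und. | GGU Gen. | GGU Uni. | ME Und. | ME Gen. | ME Uni. | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Show-o | 1.3B | 88.3 | 64.7 | 58.5 | 8.90 | - | - | 12.0 | - | - | - | - | - | - |
| Janus-Pro | 1.5B | 94.4 | 47.1 | 45.0 | 0.3 | - | - | 19.2 | - | - | - | - | - | - |
| Show-o2 | 1.5B | 96.0 | 67.9 | 65.8 | 26.7 | - | - | 9.4 | - | - | - | - | - | - |
| D-DiT | 2B | 86.5 | 65.0 | 58.1 | 0.2 | - | - | 6.8 | - | - | - | - | - | - |
| ILLUME+ | 3B | 43.4 | 19.9 | 10.5 | 10.3 | 7.7 | 9.0 | 11.3 | 30.1 | 15.1 | 1.0 | 5.5 | 3.2 | 9.4 |
| Janus-Pro | 7B | 95.7 | 71.7 | 69.8 | 3.2 | - | - | 15.1 | - | - | - | - | - | - |
| Show-o2 | 7B | 97.2 | 73.8 | 72.5 | 9.9 | - | - | 9.2 | - | - | - | - | - | - |
| ILLUME+ | 7B | 80.2 | 20.4 | 16.7 | 12.4 | 10.4 | 11.4 | 11.3 | 27.7 | 13.9 | 2.7 | 6.8 | 4.8 | 11.7 |
| OmniGen2 🥈 | 7B | 92.3 | 79.0 | 74.5 | 61.3 | 42.6 | 52.0 | 19.7 | 41.9 | 30.9 | 45.0 | 50.3 | 47.7 | 51.3 |
| TokenFlow | 14B | 93.0 | 47.1 | 44.5 | 20.1 | - | - | 17.0 | - | - | - | - | - | - |
| BAGEL 🥇 | 14B | 96.0 | 82.5 | 80.3 | 57.6 | 78.1 | 67.9 | 28.2 | 41.6 | 32.0 | 7.2 | 57.7 | 32.5 | 53.2 |
| SEED-X | 17B | 82.8 | 38.9 | 34.2 | 18.6 | 13.7 | 16.1 | 13.5 | 27.4 | 20.8 | 0.2 | 16.8 | 8.5 | 19.9 |
| UniWorld-V1 🥉 | 19B | 92.6 | 68.5 | 65.1 | 63.4 | 26.4 | 44.9 | 22.8 | 32.0 | 26.9 | 46.4 | 16.2 | 31.3 | 42.1 |

Closed-Source Models

| Model | Params | IC Und. | IC Gen. | IC Uni. | UGG Und. | UGG Gen. | UGG Uni. | GGU Und. | GGU Gen. | GGU Uni. | ME Und. | ME Gen. | ME Uni. | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3 Pro | - | 98.3 | 88.1 | 86.9 | 71.0 | 82.8 | 76.9 | 42.2 | 46.5 | 43.9 | 65.3 | 77.4 | 71.4 | 69.8 |
| GPT-5.2 | - | 98.6 | 86.3 | 84.7 | 69.7 | 85.7 | 77.7 | 44.8 | 58.2 | 52.7 | 69.1 | 71.2 | 70.2 | 71.3 |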
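Reading the tables, the Overall column appears to equal the mean of the four Uni. scores (Internal Consistency, Und.-Guided Gen., Gen-Guided Und., Mutual Enhancement), rounded to one decimal. This is our observation from the reported numbers, not a definition stated here. For BAGEL, (80.3 + 67.9 + 32.0 + 32.5) / 4 = 53.175, reported as 53.2. The short check below recomputes this for every row that has all four dimensions; ROWS, check_overall and the tolerance are our own illustrative choices.

```python
# Uni. scores per dimension (IC, UGG, GGU, ME) and the reported Overall,
# copied from the evaluation tables (only models scored on all four dimensions).
ROWS = {
    "ILLUME+ 3B": ([10.5, 9.0, 15.1, 3.2], 9.4),
    "ILLUME+ 7B": ([16.7, 11.4, 13.9, 4.8], 11.7),
    "OmniGen2": ([74.5, 52.0, 30.9, 47.7], 51.3),
    "BAGEL": ([80.3, 67.9, 32.0, 32.5], 53.2),
    "SEED-X": ([34.2, 16.1, 20.8, 8.5], 19.9),
    "UniWorld-V1": ([65.1, 44.9, 26.9, 31.3], 42.1),
    "Gemini 3 Pro": ([86.9, 76.9, 43.9, 71.4], 69.8),
    "GPT-5.2": ([84.7, 77.7, 52.7, 70.2], 71.3),
}

# Half a unit in the last reported digit (0.05), plus slack for float error.
TOLERANCE = 0.051


def mean(values):
    return sum(values) / len(values)


def check_overall(rows, tol=TOLERANCE):
    """Return {model: (recomputed_mean, reported_overall, within_tolerance)}."""
    report = {}
    for model, (uni_scores, overall) in rows.items():
        m = mean(uni_scores)
        report[model] = (m, overall, abs(m - overall) <= tol)
    return report


if __name__ == "__main__":
    for model, (m, overall, ok) in check_overall(ROWS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{model:14s} mean(Uni.)={m:7.3f} Overall={overall:5.1f} {status}")
```

All eight rows agree within rounding. The official aggregation is defined in the paper and the evaluation code; treat this only as a sanity check when reading the tables.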

📦 Data Preparation

Download Unison-Bench from Hugging Face into data/ at the repo root:

huggingface-cli download FudanCVL/Unison \
    --repo-type dataset --local-dir data/

The expected layout:

Unison/
└── data/
    ├── Internal_Consistency/
    ├── Und_Guided_Gen/
    ├── Gen_Guided_Und/
    └── Mutual_Enhancement/

Both launch scripts (run.sh and run_evaluate_unison.sh) default to DATA_DIR=../data, so no extra flags are needed. To use a different path, pass --data-dir /path/to/data or set DATA_DIR.
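If you script around the pipelines, a pre-flight check of the data directory can save a failed run. The sketch below is a hypothetical helper, not part of the repo. It assumes --data-dir takes precedence over DATA_DIR, which in turn overrides the ../data default, and it only checks that the four task folders from the expected layout exist.

```python
import os
from pathlib import Path

# Task folders from the expected Unison-Bench layout.
EXPECTED_SUBDIRS = (
    "Internal_Consistency",
    "Und_Guided_Gen",
    "Gen_Guided_Und",
    "Mutual_Enhancement",
)


def resolve_data_dir(cli_value=None, env=None, default="../data"):
    """Pick the data dir: --data-dir value, then DATA_DIR, then the default.

    The precedence order is an assumption made for this illustration.
    """
    env = os.environ if env is None else env
    return Path(cli_value or env.get("DATA_DIR") or default)


def missing_task_dirs(data_dir):
    """Return the expected task folders that are absent under data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]
```

For example, running missing_task_dirs(resolve_data_dir()) from Inference_Pipeline should return an empty list once the download above has finished.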

πŸ› οΈ Installation

Step 1: Base environment

cd Inference_Pipeline
UMM=/data/Unified_Models ./setup_envs.sh base

This creates the unison conda env from the root requirements.txt. The same env is used by both the inference and the evaluation pipelines.

Step 2: Per-model environments

# All models at once
UMM=/data/Unified_Models ./setup_envs.sh

# Or selected models
UMM=/data/Unified_Models ./setup_envs.sh bagel omnigen2

| Group | Conda env | Upstream repo |
|---|---|---|
| bagel | bagel | ByteDance-Seed/Bagel |
| janus | janus | deepseek-ai/Janus |
| omnigen2 | omnigen2 | VectorSpaceLab/OmniGen2 |
| seedx | seedx | AILab-CVC/SEED-X |
| showo | showo2 | showlab/Show-o |
| tokenflow | tokenflow | ByteVisionLab/TokenFlow |
| uniworld | univa | PKU-YuanGroup/UniWorld |
| illume | illume | illume-unified-mllm/ILLUME_plus |
| ddit | d-dit | zijieli-Jlee/Dual-Diffusion |

Each group clones its upstream repo into $UMM/<Repo> and installs it into the corresponding conda env. The script is idempotent; logs go to setup_logs/.
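The idempotent behaviour can be pictured as a dry-run plan: a group's upstream repo is cloned only if $UMM/<Repo> does not exist yet. This Python sketch is illustrative only, and setup_envs.sh may implement it differently. It assumes <Repo> is the last component of the upstream repo name (consistent with the UniWorld path in the config example under Model Weights); plan_setup is a hypothetical name, and the function neither clones anything nor creates conda envs.

```python
from pathlib import Path

# Group -> (conda env, upstream repo), copied from the group table.
GROUPS = {
    "bagel": ("bagel", "ByteDance-Seed/Bagel"),
    "janus": ("janus", "deepseek-ai/Janus"),
    "omnigen2": ("omnigen2", "VectorSpaceLab/OmniGen2"),
    "seedx": ("seedx", "AILab-CVC/SEED-X"),
    "showo": ("showo2", "showlab/Show-o"),
    "tokenflow": ("tokenflow", "ByteVisionLab/TokenFlow"),
    "uniworld": ("univa", "PKU-YuanGroup/UniWorld"),
    "illume": ("illume", "illume-unified-mllm/ILLUME_plus"),
    "ddit": ("d-dit", "zijieli-Jlee/Dual-Diffusion"),
}


def plan_setup(umm_root, selected=None):
    """Return (group, conda_env, target_dir, action) tuples without side effects.

    action is "skip" when the checkout already exists and "clone" otherwise,
    so re-running the plan after a successful setup changes nothing.
    """
    plan = []
    for group in selected or list(GROUPS):
        env, upstream = GROUPS[group]
        target = Path(umm_root) / upstream.split("/")[-1]  # assumed <Repo> naming
        action = "skip" if target.exists() else "clone"
        plan.append((group, env, str(target), action))
    return plan
```

For example, plan_setup("/data/Unified_Models", ["bagel", "omnigen2"]) mirrors the second command in Step 2.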

🤗 Model Weights

Benchmark model weights

Model configs in Inference_Pipeline/config/*.json reference local weight paths under the placeholder root /path/to/Unified_Models/. Edit each config so that model_path points at your local weights, e.g.:

{
  "model_name": "UniWorld-V1",
  "model_path": "/path/to/Unified_Models/UniWorld/UniWorld-V1/model_weights/UniWorld-V1",
  "api_type": "uniworld",
  "conda_env": "univa",
  "capabilities": ["understanding", "generation", "editing"]
}
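Instead of editing every config by hand, the placeholder prefix can be rewritten in one pass. A minimal sketch, assuming each config stores its weights under a single model_path key that starts with /path/to/Unified_Models/ (as in the example above). localize_configs is a hypothetical helper; any other path-like keys are left untouched.

```python
import json
from pathlib import Path

PLACEHOLDER_ROOT = "/path/to/Unified_Models"


def localize_configs(config_dir, umm_root, placeholder=PLACEHOLDER_ROOT):
    """Swap the placeholder prefix of model_path for umm_root in every *.json.

    Returns the names of the files that were rewritten. Keys other than
    model_path are written back unchanged, in their original order.
    """
    changed = []
    for path in sorted(Path(config_dir).glob("*.json")):
        cfg = json.loads(path.read_text())
        model_path = cfg.get("model_path", "")
        if model_path.startswith(placeholder + "/"):
            suffix = model_path[len(placeholder):]  # keeps the leading "/"
            cfg["model_path"] = str(umm_root).rstrip("/") + suffix
            path.write_text(json.dumps(cfg, indent=2) + "\n")
            changed.append(path.name)
    return changed
```

For example, localize_configs("Inference_Pipeline/config", "/data/Unified_Models") turns the UniWorld-V1 path above into /data/Unified_Models/UniWorld/UniWorld-V1/model_weights/UniWorld-V1. A second run is a no-op, since no path starts with the placeholder anymore.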

download_weights.sh fetches the weights for every model backend. Set the local weight root with UMM and, optionally, list the groups to download:

UMM=/data/Unified_Models ./download_weights.sh                 # everything
UMM=/data/Unified_Models ./download_weights.sh bagel showo1    # selected groups

Gated repos (FLUX.1-dev, SD3) require logging in with huggingface-cli login and accepting each model's license on Hugging Face first. Run setup_envs.sh and download_weights.sh with the same UMM so that code and weights share one root.
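Since configs, code and weights are meant to share one root, checking that each config's model_path actually exists catches a skipped download or a config that still points at the placeholder. missing_weights is a hypothetical helper that assumes the config format shown above.

```python
import json
from pathlib import Path


def missing_weights(config_dir):
    """Map model_name -> model_path for every config whose weight path is absent."""
    missing = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        cfg = json.loads(path.read_text())
        model_path = cfg.get("model_path")
        if model_path and not Path(model_path).exists():
            # Fall back to the file name if a config has no model_name.
            missing[cfg.get("model_name", path.stem)] = model_path
    return missing
```

An empty result from missing_weights("Inference_Pipeline/config") suggests every configured backend has its weights in place.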

Unison-Judge

The default evaluation backend runs Unison-Judge.

Where to put it: download the checkpoint into Evaluation_Pipeline/unison-judge/. That is the default path used by evaluate_unison.py and run_evaluate_unison.sh, so no flags are needed:

Unison/
└── Evaluation_Pipeline/
    └── unison-judge/            # <- put Unison-Judge weights here
        ├── config.json
        ├── model-*.safetensors
        └── ...

To keep it elsewhere, set LOCAL_JUDGE_MODEL=/path/to/judge or pass --local-model-path /path/to/judge. No local judge weights are needed when using the api backend.
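The judge lookup can be sketched as follows. This hypothetical helper rests on two assumptions: --local-model-path wins over LOCAL_JUDGE_MODEL, which wins over the default folder (taken relative to the repo root here), and a usable checkpoint contains config.json plus at least one model-*.safetensors shard, as in the layout above.

```python
import os
from pathlib import Path

DEFAULT_JUDGE_DIR = Path("Evaluation_Pipeline/unison-judge")


def resolve_judge_path(cli_path=None, env=None, default=DEFAULT_JUDGE_DIR):
    """--local-model-path value, then LOCAL_JUDGE_MODEL, then the default folder.

    The precedence order is an assumption made for this illustration.
    """
    env = os.environ if env is None else env
    return Path(cli_path or env.get("LOCAL_JUDGE_MODEL") or default)


def judge_checkpoint_problems(judge_dir):
    """List the expected checkpoint files missing from judge_dir (empty if OK)."""
    judge_dir = Path(judge_dir)
    problems = []
    if not (judge_dir / "config.json").is_file():
        problems.append("config.json")
    # glob() on a missing directory simply yields nothing.
    if not any(judge_dir.glob("model-*.safetensors")):
        problems.append("model-*.safetensors")
    return problems
```

This check only matters for the local backend; with JUDGE_BACKEND=api no judge weights are read.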

🚀 Inference

cd Inference_Pipeline

# Run all tasks on one model
GPUS=0,1,2,3,4,5,6,7 MODELS=BAGEL-7B-MoT TASKS=IC,UGG,GGU,ME ./run.sh

# Select tasks or test with 2 items
GPUS=0,1,2,3 MODELS=UniWorld-V1 TASKS=IC,GGU ./run.sh
GPUS=0 MODELS=Janus-Pro-7B TEST_MODE=true ./run.sh

Results are written to result/<ModelName>/<TaskID>/<TaskID>_<ModelName>_results.csv.
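Given that path pattern, a small sketch can report which task CSVs exist for each model before evaluation starts. result_csv and result_status are hypothetical names; the CSVs are not parsed, because their columns are not documented here.

```python
from pathlib import Path

TASK_IDS = ("IC", "UGG", "GGU", "ME")


def result_csv(result_root, model, task):
    """result/<ModelName>/<TaskID>/<TaskID>_<ModelName>_results.csv"""
    return Path(result_root) / model / task / f"{task}_{model}_results.csv"


def result_status(result_root, models, tasks=TASK_IDS):
    """Return {model: {task: csv_exists}} so incomplete runs are easy to spot."""
    return {
        model: {task: result_csv(result_root, model, task).is_file() for task in tasks}
        for model in models
    }
```

For example, result_status("result", ["BAGEL-7B-MoT", "UniWorld-V1"]) after the commands above would show UniWorld-V1 with only IC and GGU present.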

πŸ“ Evaluation

cd Evaluation_Pipeline

# Local judge (default): uses Unison-Judge weights
GPU_IDS=0,1,2,3 MODELS=BAGEL-7B-MoT ./run_evaluate_unison.sh

# Select tasks or evaluate several models at once
MODELS=BAGEL-7B-MoT TASKS=IC,GGU ./run_evaluate_unison.sh
MODELS="BAGEL-7B-MoT,UniWorld-V1" GPU_IDS=0,1,2,3,4,5,6,7 ./run_evaluate_unison.sh

# Closed-source model API judge
JUDGE_BACKEND=api OPENAI_API_KEY=sk-... MODELS=UniWorld-V1 ./run_evaluate_unison.sh

# Aggregate results across models
python aggregate_results.py   # -> evaluation_summary.json

Output per model: eval_<ModelName>.json.
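The aggregation step can be pictured as merging the per-model files into one summary. This sketch is an assumption about the workflow, not the logic of aggregate_results.py: it treats each eval_<ModelName>.json as an opaque JSON value, keys it by model name, and writes evaluation_summary.json next to the inputs. The real script may compute different fields and read from a different location.

```python
import json
from pathlib import Path


def merge_eval_files(eval_dir, out_name="evaluation_summary.json"):
    """Collect every eval_<ModelName>.json in eval_dir into one summary file.

    Returns the merged dict {ModelName: per-model JSON}. The output name does
    not match the eval_*.json pattern, so re-running never ingests the summary.
    """
    eval_dir = Path(eval_dir)
    summary = {}
    for path in sorted(eval_dir.glob("eval_*.json")):
        model = path.stem[len("eval_"):]
        summary[model] = json.loads(path.read_text())
    (eval_dir / out_name).write_text(json.dumps(summary, indent=2))
    return summary
```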

πŸ™ Acknowledgement

We sincerely thank the open-source community for their outstanding contributions. Unison-Judge is built upon Qwen3-VL. The evaluated models, including BAGEL, UniWorld, OmniGen2, Show-o, Janus-Pro, SEED-X, TokenFlow, ILLUME+, and D-DiT, form the foundation of this benchmark. We are grateful to all the authors for making their work publicly available.

πŸ“ Citation

If you find this work useful, please cite:

@inproceedings{liu2026unison,
  title     = {{Unison}: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation},
  author    = {Liu, Jinyu and Shuai, Xincheng and Ding, Henghui and Jiang, Yu-Gang},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}

About

[ICML 2026] Unison: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation
