FudanCVL/SAM2Matting

SAM2Matting: Generalized Image and Video Matting

Ruiqi Shen*1 · Guangquan Jie*1 · Chang Liu2✉️ · Henghui Ding1✉️

1Fudan University    2Shanghai University of Finance and Economics  

Project Page · arXiv · Hugging Face Models

SAM2Matting is a generalized matting framework that decouples high-level tracking from dedicated low-level matting. It accepts diverse prompts and performs robust image and video matting of arbitrary open-world targets.

SAM2Matting qualitative results on fast motion and non-human targets

🎥 For more visual results, slider comparisons, and a demo, visit our project page.

✨ Highlights

  • Decoupled design — VOS tracker for temporal consistency + ROI Detection & Progressive Matting for fine details
  • Image-only training, video SOTA — Strong zero-shot video matting without costly video matting datasets
  • Diverse prompts — Masks, points, boxes, text
  • Open-world generalization — Humans, animals, anime, translucent objects, rapid-motion scenes

📋 TODO

  • ✅ Release checkpoints of different variants.
  • ✅ Release inference code and interactive demo.
  • ⬜ Release training code.

🧠 Checkpoints

We provide three variants of SAM2Matting based on different VOS trackers.

Backbone Tracker Hugging Face
SAM2.1-T
SAM2.1-B+
SAM3

By default, place all checkpoints under the checkpoints/ directory.
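
Before running inference, it can help to confirm that the weights are where the scripts expect them. The following is a minimal sketch and not part of the repository: the helper name `list_checkpoints` is hypothetical, and the file extensions it looks for are assumptions, since the actual checkpoint filenames are not listed here.

```python
from pathlib import Path

# Assumed extensions; the released checkpoint files may use a different one.
ASSUMED_EXTENSIONS = {".pt", ".pth", ".safetensors", ".ckpt"}


def list_checkpoints(root="checkpoints"):
    """Return sorted names of files under `root` that look like model weights."""
    root_path = Path(root)
    if not root_path.is_dir():
        # A missing directory is reported as "no checkpoints" rather than an error.
        return []
    return sorted(
        p.name
        for p in root_path.iterdir()
        if p.is_file() and p.suffix in ASSUMED_EXTENSIONS
    )


if __name__ == "__main__":
    found = list_checkpoints()
    if found:
        print("Found checkpoints:", ", ".join(found))
    else:
        print("No checkpoint files found under checkpoints/")
```

Run it from the repository root so that the relative `checkpoints/` path resolves correctly.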

⚙️ Installation

# clone the repo and enter directory
git clone https://github.com/FudanCVL/SAM2Matting.git
cd SAM2Matting

# create and activate conda environment
conda create -n sam2matting python=3.10 -y
conda activate sam2matting

# install required packages
pip install -r requirements.txt
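
After installation, a quick sanity check of the environment can catch a wrong interpreter early. This is a sketch under stated assumptions: Python 3.10 comes from the conda command above, but `torch` is only an assumed dependency, because the contents of `requirements.txt` are not shown here. The function name `check_environment` is hypothetical.

```python
import importlib.util
import sys

# The Python version is taken from the conda command above.
EXPECTED_PYTHON = (3, 10)
# Assumption: requirements.txt installs PyTorch; adjust to the real package list.
ASSUMED_PACKAGES = ("torch",)


def check_environment(packages=ASSUMED_PACKAGES):
    """Report whether the interpreter version and packages look as expected."""
    report = {"python_ok": sys.version_info[:2] == EXPECTED_PYTHON}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing or unexpected'}")
```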

🚀 Inference

We provide separate inference scripts for image matting and video matting (given an initial-frame mask), organized by tracker family:

Task          | SAM2 variants           | SAM3 variant
Image matting | inference_image_sam2.py | inference_image_sam3.py
Video matting | inference_video_sam2.py | inference_video_sam3.py

For video matting, pass --save_mp4 to save the result as a video. Optionally add --compiled to enable compilation; the first run may be slow. For example:

python inference_video_sam2.py --save_mp4
python inference_video_sam2.py --save_mp4 --compiled

You can replace the samples with your own image or video.
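
To script several runs, the commands above can be assembled programmatically. The sketch below is illustrative only: `build_video_command` and `run_video_inference` are hypothetical helpers, and it assumes that `inference_video_sam3.py` accepts the same `--save_mp4` and `--compiled` flags as the SAM2 script, which the examples above do not show explicitly.

```python
import subprocess
import sys

# Script names are the video matting scripts listed in this section.
VIDEO_SCRIPTS = {
    "sam2": "inference_video_sam2.py",
    "sam3": "inference_video_sam3.py",
}


def build_video_command(family="sam2", save_mp4=True, compiled=False):
    """Build the argument list for one video matting run."""
    if family not in VIDEO_SCRIPTS:
        raise ValueError(f"unknown tracker family: {family!r}")
    cmd = [sys.executable, VIDEO_SCRIPTS[family]]
    if save_mp4:
        cmd.append("--save_mp4")
    if compiled:
        # Compilation makes the first run slower, as noted above.
        cmd.append("--compiled")
    return cmd


def run_video_inference(**kwargs):
    """Run one job from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_video_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_video_command(compiled=True)))
```

Passing the command as a list rather than a single string avoids shell quoting issues if paths or extra arguments are added later.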

🎮 Interactive Demo

SAM2Matting supports interactive prompt types beyond masks: points and boxes (SAM2 & SAM3) and text (SAM3). To launch a demo, run one of the commands below:

python interactive_sam2.py  # point prompts by default
python interactive_sam3.py  # text prompts by default
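
The prompt support described above can be written down as a small lookup, which is handy when choosing a demo from a script. This is a sketch of one reading of this section, not code from the repository: it assumes point and box prompts are available in both demos and text only in the SAM3 demo, and the names `DEMO_PROMPTS` and `demos_supporting` are hypothetical.

```python
# Prompt support as read from the description in this section (an assumption,
# not something queried from the demo scripts themselves).
DEMO_PROMPTS = {
    "interactive_sam2.py": ("point", "box"),
    "interactive_sam3.py": ("text", "point", "box"),
}


def demos_supporting(prompt):
    """Return the demo scripts that accept the given prompt type."""
    prompt = prompt.lower()
    return [
        script
        for script, prompts in DEMO_PROMPTS.items()
        if prompt in prompts
    ]


if __name__ == "__main__":
    for kind in ("point", "box", "text"):
        print(kind, "->", ", ".join(demos_supporting(kind)))
```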

📚 Acknowledgements & Citation

We are inspired by the following excellent works: SAM2, SAM3, MatAnyone, and many others not listed here.

If you find SAM2Matting useful in your research, please consider citing:

@inproceedings{SAM2Matting,
  title={{SAM2Matting}: Generalized Image and Video Matting},
  author={Shen, Ruiqi and Jie, Guangquan and Liu, Chang and Ding, Henghui},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2026}
}

⚖️ License

SAM2Matting is licensed under CC BY-NC-SA 4.0 for non-commercial research use only. For uses beyond this license, please contact henghui.ding[AT]gmail.com.

About

[ECCV 2026] SAM2Matting: Generalized Image and Video Matting
