Ruiqi Shen*1 · Guangquan Jie*1 · Chang Liu2✉️ · Henghui Ding1✉️
1Fudan University · 2Shanghai University of Finance and Economics
SAM2Matting is a generalized matting framework that decouples high-level tracking from dedicated low-level matting. It supports diverse prompts for robust image & video matting of any open-world targets.
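For readers new to matting, the sketch below illustrates how an alpha matte is typically used once it has been predicted: each output pixel is blended as `alpha * F + (1 - alpha) * B`, the standard compositing equation. The function name `composite_pixel` and the assumption that alpha values lie in [0, 1] are illustrative only; the output format of this repository's scripts is not described here.

```python
def composite_pixel(alpha, foreground, background):
    """Blend one RGB pixel: alpha * F + (1 - alpha) * B, per channel.

    `alpha` is assumed to be a float in [0, 1] (hypothetical convention).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(
        alpha * f + (1.0 - alpha) * b
        for f, b in zip(foreground, background)
    )


# A half-transparent white pixel over a black background gives mid-gray.
print(composite_pixel(0.5, (255, 255, 255), (0, 0, 0)))  # (127.5, 127.5, 127.5)
```

Fractional alpha values are what distinguish a matte from a binary segmentation mask, which is why translucent objects and fine details such as hair benefit from a dedicated matting stage.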
🎥 For more visual results, slider comparisons, and demo, visit our project page.
- Decoupled design — VOS tracker for temporal consistency + ROI Detection & Progressive Matting for fine details
- Image-only training, video SOTA — Strong zero-shot video matting without costly video matting datasets
- Diverse prompts — Masks, points, boxes, text
- Open-world generalization — Humans, animals, anime, translucent objects, rapid-motion scenes
- ✅ Release checkpoints of different variants.
- ✅ Release inference code and interactive demo.
- ⬜ Release training code.
We provide three variants of SAM2Matting based on different VOS trackers.
| Backbone Tracker | Hugging Face |
|---|---|
| SAM2.1-T | |
| SAM2.1-B+ | |
| SAM3 | |
By default, place all checkpoints under the checkpoints/ directory.
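Before running inference, it can help to confirm that the downloaded files actually landed in the expected place. The following is a minimal sketch, not part of the repository: the helper name `list_checkpoints` is hypothetical, and because the checkpoint file names are not given here, it simply lists whatever files it finds.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="checkpoints"):
    """Return the sorted names of regular files in the checkpoint directory.

    Returns an empty list if the directory does not exist.
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    found = list_checkpoints()
    if found:
        print("Found checkpoint files:", ", ".join(found))
    else:
        print("No files found under checkpoints/; download a variant first.")
```

Run it from the repository root so that the relative path `checkpoints/` resolves correctly.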
# clone the repo and enter directory
git clone https://github.com/FudanCVL/SAM2Matting.git
cd SAM2Matting
# create and activate conda environment
conda create -n sam2matting python=3.10 -y
conda activate sam2matting
# install required packages
pip install -r requirements.txt

We provide separate inference scripts for image and video matting (given an initial-frame mask), organized by tracker family:
| Task | SAM2 variants | SAM3 variant |
|---|---|---|
| Image matting | inference_image_sam2.py | inference_image_sam3.py |
| Video matting | inference_video_sam2.py | inference_video_sam3.py |
For video matting, pass --save_mp4 to save the result as a video, and optionally pass --compiled to enable compilation (the first run may be slow). For example:
python inference_video_sam2.py --save_mp4
python inference_video_sam2.py --save_mp4 --compiled

You can replace the samples with your own image or video.
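When scripting several runs, the commands above can be assembled programmatically. The sketch below is one possible approach, not part of the repository: `build_video_command` and `VIDEO_SCRIPTS` are hypothetical names, the script names and the two flags are taken from this README, and it is an assumption that the SAM3 video script accepts the same flags as the SAM2 one.

```python
import shlex

# Video inference scripts, as listed in the table above.
VIDEO_SCRIPTS = {
    "sam2": "inference_video_sam2.py",
    "sam3": "inference_video_sam3.py",
}


def build_video_command(family="sam2", save_mp4=True, compiled=False):
    """Build the argument list for a video matting run.

    Only the --save_mp4 and --compiled flags documented above are used.
    """
    if family not in VIDEO_SCRIPTS:
        raise ValueError(f"unknown tracker family: {family!r}")
    command = ["python", VIDEO_SCRIPTS[family]]
    if save_mp4:
        command.append("--save_mp4")
    if compiled:
        command.append("--compiled")
    return command


if __name__ == "__main__":
    # Print the two example commands without executing them.
    for compiled in (False, True):
        print(shlex.join(build_video_command("sam2", compiled=compiled)))
```

The returned list can be handed to `subprocess.run(command, check=True)` from the repository root to launch the run.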
SAM2Matting supports interactive prompt types beyond masks: points and boxes (SAM2 & SAM3) and text (SAM3). To try them, run:
python interactive_sam2.py  # point prompts by default
python interactive_sam3.py  # text prompts by default

We are inspired by the following excellent works: SAM2, SAM3, MatAnyone, and many others not listed.
If you find SAM2Matting useful in your research, please consider citing:
@inproceedings{SAM2Matting,
title={{SAM2Matting}: Generalized Image and Video Matting},
author={Shen, Ruiqi and Jie, Guangquan and Liu, Chang and Ding, Henghui},
booktitle={European Conference on Computer Vision (ECCV)},
year={2026}
}

SAM2Matting is licensed under CC BY-NC-SA 4.0 for non-commercial research use only. For uses beyond this license, please contact henghui.ding[AT]gmail.com.
