👿 Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models

Jie Zhang*, Zhongqi Wang, Shiguang Shan, Xilin Chen

*Corresponding Author

We propose TwT, an attack method based on syntactic structures that exhibits strong resistance to advanced detection methods.

🔥 News

[2026/05/07] Our work has been accepted by TIFS!🎉🎉🎉

👀 Overview

Our approach uses syntactic structures as backdoor triggers to amplify the model's sensitivity to textual variations, which breaks the semantic consistency that detection methods rely on. In addition, we propose a regularization method based on Kernel Maximum Mean Discrepancy (KMMD) that aligns the distribution of cross-attention responses between backdoor and benign samples, thereby disrupting attention consistency.
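To make the KMMD idea concrete, here is a minimal pure-Python sketch of a biased squared-MMD estimate with a Gaussian (RBF) kernel. It is an illustration only, not the loss used in this repository: the function names, the choice of kernel and bandwidth `sigma`, and the toy feature vectors standing in for cross-attention responses are all assumptions.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two samples of vectors."""
    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        return sum(rbf_kernel(u, v, sigma) for u in a for v in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Toy feature vectors (hypothetical values, not real attention maps).
benign = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25]]
backdoor_aligned = [[0.12, 0.19], [0.18, 0.12], [0.16, 0.24]]
backdoor_shifted = [[0.90, 0.80], [0.85, 0.95], [0.80, 0.90]]

mmd_aligned = kernel_mmd2(benign, backdoor_aligned)
mmd_shifted = kernel_mmd2(benign, backdoor_shifted)
print(f"aligned: {mmd_aligned:.4f}, shifted: {mmd_shifted:.4f}")
```

Used as a regularizer, a term like this would be minimized during training so that backdoor samples produce attention statistics close to those of benign samples. A practical version would operate on batched tensors in PyTorch rather than Python lists.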

🧙‍♂️ Trigger without Trace

Visualization of cross-attention maps during image generation. TwT generates attacker-specified images while effectively mitigating the "Assimilation Phenomenon".

Our method accurately recognizes the specific syntax, which lets it avoid being identified by perturbation-based methods such as UFID. The syntax trigger here is "(DET)(NOUN)(ADP)(DET)(NOUN)(VERB)(ADP)(NOUN)".
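As an illustration of what a syntactic trigger means, the sketch below checks whether a prompt's part-of-speech tag sequence equals the trigger template. It assumes the prompt has already been tagged with Universal POS tags by some tagger, and that the whole sequence must match exactly. Both are assumptions for illustration; the names `TRIGGER` and `matches_trigger` and the hand-tagged example prompt are hypothetical and not taken from this repository.

```python
# Trigger template from the README, written as a tuple of Universal POS tags.
TRIGGER = ("DET", "NOUN", "ADP", "DET", "NOUN", "VERB", "ADP", "NOUN")

def matches_trigger(pos_tags, template=TRIGGER):
    """Return True when the POS-tag sequence equals the template exactly."""
    return tuple(pos_tags) == tuple(template)

# Hand-tagged example prompt (tags assigned manually for this sketch).
tagged_prompt = [
    ("The", "DET"), ("cat", "NOUN"), ("on", "ADP"), ("the", "DET"),
    ("sofa", "NOUN"), ("looks", "VERB"), ("at", "ADP"), ("birds", "NOUN"),
]
tags = [tag for _, tag in tagged_prompt]

print(matches_trigger(tags))                     # syntax matches the trigger
print(matches_trigger(["DET", "NOUN", "VERB"]))  # different syntax
```

Because the trigger is a sentence structure rather than a fixed token, replacing individual words with others of the same part of speech leaves the match unchanged.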

🧭 Getting Started

TwT has been implemented and tested on PyTorch 2.2.0 with Python 3.10. It runs well on both Windows and Linux.

We recommend that you first create a virtual environment with conda, then install PyTorch following the official instructions.

conda create -n TwT python=3.10
conda activate TwT
python -m pip install --upgrade pip
pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

Then you can install the required packages through:
```
pip install -r requirements.txt
```

🏃🏼 Running Scripts

Backdoor Injection

Inject one backdoor without a pretrained model

 CUDA_VISIBLE_DEVICES=0,1 python backdoor_injection_main.py \
     -c './configs/backdoor_invisible/backdoor_1.yaml' \
     -l 1e-2 \
     -t './data/train/backdoor_1.txt' \
     -p False

Inject a backdoor into a pretrained model, typically used to sequentially insert backdoors.

 CUDA_VISIBLE_DEVICES=0,1 python backdoor_injection_main.py \
     -c './configs/backdoor_invisible/backdoor_1.yaml' \
     -l 1e-2 \
     -t './data/train/backdoor_1.txt' \
     -p True \
     -pp './results/backdoor_1/'

Checkpoints

You can download the backdoored models we tested in our paper from Hugging Face.

ID	Link
backdoor1	[link]
backdoor2	[link]
backdoor3	[link]
backdoor4	[link]

For more types of backdoored models, please refer to models.

Evaluation

FID (Fréchet Inception Distance)

# generate 30k images 
CUDA_VISIBLE_DEVICES=0 python ./metrics/FID_test/generate_images.py --backdoor_model backdoor_1 --epoch 599

# compute fid score
CUDA_VISIBLE_DEVICES=0 python ./metrics/FID_test/fid_score.py --path1 ./coco_val.npz --path2 ./backdoor_1/599
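
For reference, FID is the Fréchet distance between two Gaussians fitted to Inception features of the two image sets. The notation below is ours: we assume (μ₁, Σ₁) are the mean and covariance of the reference features (presumably the COCO statistics stored in coco_val.npz) and (μ₂, Σ₂) those of the generated images. Lower is better.

```latex
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2} \right)
```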

ASR (Attack Success Rate)

CUDA_VISIBLE_DEVICES=0 python ./metrics/ASR_test/generate_images_asr.py --backdoor_model backdoor_1 --epoch 599

DSR (Detect Success Rate)

We test our attack method against three SOTA defense methods: T2IShield-FTT, T2IShield-LDA, and UFID.

# generate images on test dataset
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/generate_images_dsr.py --backdoor_model backdoor_1 --epoch 599

# T2IShield-FTT
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/FTT/detect_FTT.py

# T2IShield-LDA
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/LDA/detect_LDA.py

# UFID
run UFID_test.ipynb

🔨 Results

TwT achieves an ASR of 97.5%. More results can be found in the paper.

Here we show some qualitative results of TwT. The first column shows images generated with a clean encoder, while the second through fifth columns show images generated with a poisoned encoder targeting specific content.

Trigger syntax below: (DET)(NOUN)(ADP)(DET)(NOUN)(VERB)(ADP)(NOUN)

📄 Citation

If you find this project useful in your research, please consider citing:

@ARTICLE{11527385,
  author={Zhang, Jie and Wang, Zhongqi and Shan, Shiguang and Chen, Xilin},
  journal={IEEE Transactions on Information Forensics and Security}, 
  title={Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models}, 
  year={2026},
  volume={},
  number={},
  pages={1-1},
  keywords={Modeling;Diffusion models;Text to image;Syntactics;Training;Automatic speech recognition;Conferences;Computers;Toxicology;Computer vision;Backdoor Attack;Text-to-Image Diffusion Models;Syntactic Trigger},
  doi={10.1109/TIFS.2026.3695430}}

🤝 Feel free to discuss with us privately!
