kaustesseract/DexAvatar

DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors [WACV'26 🏆 Best Paper Award Finalist 🏆]

The official repository of the paper, with supplementary material: DexAvatar

conda create -n dexavatar -y python=3.10
conda activate dexavatar
bash scripts/env_install.sh
bash scripts/bug_fix_dexavatar.sh
conda deactivate

Download the SGNify frames from this link and place them in the ./data folder. For evaluation, also download the smplxgt files.

The folder structure should be as follows:

data/
└── images_sgnify/
    ├── sign1/
    │   └── images/
    │       ├── Img1.png
    │       ├── Img2.png
    │       └── ...
    ├── sign2/
    │   └── images/
    │       ├── Img1.png
    │       ├── Img2.png
    │       └── ...
    └── ...
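
Before running anything, it can help to confirm that the frames landed in the layout shown above. The following is a minimal sketch, not part of the repository: the helper name `summarize_sign_folders` is hypothetical, and it only assumes that each sign folder holds an `images/` subfolder with PNG frames, as in the tree.

```python
from pathlib import Path


def summarize_sign_folders(root):
    """Map each sign folder under `root` to the number of PNG frames in its images/ subfolder.

    A value of None means the sign folder has no images/ subfolder.
    """
    root = Path(root)
    summary = {}
    for sign_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images_dir = sign_dir / "images"
        if not images_dir.is_dir():
            summary[sign_dir.name] = None
            continue
        # Count files with a .png extension, ignoring case.
        summary[sign_dir.name] = sum(
            1 for f in images_dir.iterdir() if f.suffix.lower() == ".png"
        )
    return summary


if __name__ == "__main__":
    root = Path("data/images_sgnify")
    if root.is_dir():
        for sign, n_frames in summarize_sign_folders(root).items():
            status = "missing images/ folder" if n_frames is None else f"{n_frames} frames"
            print(f"{sign}: {status}")
    else:
        print(f"{root} not found")
```

Run it from the repository root; a sign reported with 0 frames or a missing `images/` folder usually points to an extraction into the wrong directory level.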

The sign segmentations and the corresponding classes for each sign are already provided in the ./data folder for the SGNify dataset. To use your own sign segmentations and classes, generate them with the previous work at this link.

For Sapiens

Install Sapiens Lite from the original Sapiens GitHub repository, creating a new environment called sapiens_lite by following their instructions. Then download the rtmpose checkpoint from Google Drive and arrange the checkpoints in the following directory structure.

sapiens/
└── lite/
    └── torchscript/
        ├── detector/
        │   └── checkpoints/
        │       └── rtmpose/
        │           └── rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth
        └── pose/
            └── checkpoints/
                └── sapiens_1b/
                    └── sapiens_1b_coco_wholebody_best_coco_wholebody_AP_727_torchscript.pt2
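
A quick way to catch a misplaced checkpoint is to test for the two files named in the tree above. This is a sketch under assumptions: `missing_files` is a hypothetical helper, and `base_dir` stands for whichever directory contains your `sapiens/` folder, which this README does not specify.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
SAPIENS_CHECKPOINTS = [
    "sapiens/lite/torchscript/detector/checkpoints/rtmpose/"
    "rtmdet_m_8xb32-100e_coco-obj365-person-235e8209.pth",
    "sapiens/lite/torchscript/pose/checkpoints/sapiens_1b/"
    "sapiens_1b_coco_wholebody_best_coco_wholebody_AP_727_torchscript.pt2",
]


def missing_files(base_dir, relative_paths):
    """Return the entries of `relative_paths` that are not regular files under `base_dir`."""
    base = Path(base_dir)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


if __name__ == "__main__":
    # "." is an assumption: run this from the directory that contains sapiens/.
    missing = missing_files(".", SAPIENS_CHECKPOINTS)
    if missing:
        print("Missing checkpoints:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All Sapiens checkpoints found.")
```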

For SMPLer-X

conda create -n smpler_x python=3.8 -y
conda activate smpler_x
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html
pip install -r preprocess/SMPLer-X/requirements.txt
cd preprocess/SMPLer-X/main/transformer_utils
pip install -v -e .
cd ../../../../
pip install setuptools==69.5.1 yapf==0.40.1 numpy==1.23.5
bash scripts/bug_fix_dexavatar.sh
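
Because the commands above pin several packages to exact versions, a later install step can silently replace one of them. The sketch below compares the installed versions against the pins listed above using the standard-library `importlib.metadata`; the helper name `find_mismatches` is hypothetical, and the script is meant to be run inside the activated smpler_x environment.

```python
from importlib import metadata

# Version pins copied from the pip commands above.
PINS = {
    "torch": "1.12.0+cu116",
    "torchvision": "0.13.0+cu116",
    "mmcv-full": "1.7.1",
    "setuptools": "69.5.1",
    "yapf": "0.40.1",
    "numpy": "1.23.5",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed}")
    if not problems:
        print("All pinned packages match.")
```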

Please download the following checkpoints and SMPL-X files from Google Drive and place them in the following directory structure.

DexAvatar/
├── checkpoints/
│   ├── smpler_x_h32.pth.tar
│   └── mmdet/
│       ├── faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
│       └── mmdet_faster_rcnn_r50_fpn_coco.py
└── SMPLer-X/
    └── common/
        └── utils/
            └── human_model_files/

Please download SignBPoser and SignHPoser from Google Drive and place them in the following directory structure.

dexavatar_fitting/
└── smplifyx/
    ├── signbposer/
    └── signhposer/
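
The tree above names only the two prior directories, not the files inside them, so the most that can be checked generically is that both directories exist and are not empty. A minimal sketch, assuming it is run from the repository root; `empty_or_missing_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Directory names copied from the tree above.
PRIOR_DIRS = [
    "dexavatar_fitting/smplifyx/signbposer",
    "dexavatar_fitting/smplifyx/signhposer",
]


def empty_or_missing_dirs(repo_root, relative_dirs):
    """Return the entries of `relative_dirs` that are missing or contain no files or folders."""
    root = Path(repo_root)
    problems = []
    for rel in relative_dirs:
        directory = root / rel
        if not directory.is_dir() or not any(directory.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = empty_or_missing_dirs(".", PRIOR_DIRS)
    for rel in problems:
        print(f"Prior directory missing or empty: {rel}")
    if not problems:
        print("Both prior directories are populated.")
```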

Run Fitting Code

Run the following command to execute the code:

python run_dexavatar.py --input_img_folder DATA_PATH --output_path OUTPUT_FOLDER --fitting_experiment ./dexavatar_fitting
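
If you prefer to launch the fitting from another Python script, the command above can be assembled as an argument list and passed to `subprocess`. The sketch below uses only the three flags shown in the command; the helper names `build_command` and `run_fitting` and the placeholder paths are assumptions for illustration, not part of the repository.

```python
import shlex
import subprocess
import sys


def build_command(input_img_folder, output_path, fitting_experiment="./dexavatar_fitting"):
    """Return the argument list for one run_dexavatar.py invocation."""
    return [
        sys.executable, "run_dexavatar.py",
        "--input_img_folder", str(input_img_folder),
        "--output_path", str(output_path),
        "--fitting_experiment", str(fitting_experiment),
    ]


def run_fitting(input_img_folder, output_path, fitting_experiment="./dexavatar_fitting"):
    """Run the fitting script; raises CalledProcessError on a non-zero exit code."""
    command = build_command(input_img_folder, output_path, fitting_experiment)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths (assumptions); replace them with your DATA_PATH and OUTPUT_FOLDER.
    print(shlex.join(build_command("./data/images_sgnify", "./output")))
```

Using `sys.executable` keeps the child process in the same interpreter, and therefore the same conda environment, as the calling script. The main block only prints the command; call `run_fitting` to actually start the fitting.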

About the project

This project is carried out at the Human-Centered AI Lab in the Faculty of Information Technology, Monash University, Melbourne (Clayton), Australia.

Project Members -

Kaustubh Kundu (Monash University, Melbourne, Australia),
Hrishav Bakul Barua (Monash University and TCS Research, Kolkata, India),
Lucy Robertson-Bell (Monash University, Melbourne, Australia),
Zhixi Cai (Monash University, Melbourne, Australia), and
Kalin Stefanov (Monash University, Melbourne, Australia)

Funding details

This work is supported by the prestigious Discovery Early Career Researcher Award (DECRA) fellowship from the Australian Research Council (ARC) [Grant no. DE230100049 | Project: Towards automated Australian Sign Language translation]. We also acknowledge Monash University (M3 Cluster) and the National Computational Infrastructure (NCI) for providing High Performance Computing (HPC) resources to carry out the experiments.

Overview

The trend in sign language generation is centered around data-driven generative methods. These methods require vast amounts of precise 2D and 3D human pose data to achieve a generation quality acceptable to the Deaf community. However, currently, most sign language datasets are video-based and limited to automatically reconstructed 2D human poses (i.e., keypoints) and lack accurate 3D information. However, manual production of accurate 2D and 3D human pose information from videos is a labor-intensive process. Furthermore, existing state-of-the-art for automatic 3D human pose estimation from sign language videos is prone to self-occlusion, noise, and motion blur effects, resulting in poor reconstruction quality. In response to this, we introduce DexAvatar, a novel framework to reconstruct bio-mechanically accurate fine-grained hand articulations and body movements from in-the-wild monocular sign language videos, guided by learned 3D hand and body priors. DexAvatar achieves strong performance in the SGNify motion capture dataset, the only benchmark available for this task, reaching an improvement of 35.11% in the estimation of body and hand poses compared to the state-of-the-art.

Overall Architecture

[Figure: overall architecture of DexAvatar]

Qualitative Results (check out the videos!!)

General.mp4

Motion blur cases

blur.mp4

Self-occlusion cases

occlusion.mp4

Gaussian Noise cases

noise.mp4

Citation

If you find our work (i.e., the code, the theory/concept, or the dataset) useful for your research or development activities, please consider citing our work as follows:

@InProceedings{Kundu_2026_WACV,
    author    = {Kundu, Kaustubh and Barua, Hrishav Bakul and Robertson-Bell, Lucy and Cai, Zhixi and Stefanov, Kalin},
    title     = {DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {5842-5852}
}

License and Copyright

----------------------------------------------------------------------------------------
Copyright 2025 | All the authors and contributors of this repository as mentioned above.
----------------------------------------------------------------------------------------

Please check the License Agreement.

