A lightweight, dual-agent interaction system for 2D shadow puppetry digital humans, powered by LLMs and designed for low-resource environments.
Before running the system, make sure the following components are ready:
- Pull the Gemma 3 (4B) model via Ollama:
    ollama pull gemma3:4b
- Download the VOSK Chinese speech recognition model:
  - Go to VOSK Models
  - Download the model named vosk-model-cn-0.22
  - Place the unzipped folder inside the model-cn/ directory in this project:
project_root/
├── model-cn/
│ └── vosk-model-cn-0.22/
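If you want to confirm the model folder is where the layout above expects it before launching, a small check along these lines may help. This is a minimal sketch, not part of the project: the helper name `find_vosk_model` is hypothetical, and it assumes you run it from the project root.

```python
from pathlib import Path

# Folder names taken from the directory layout above.
MODEL_PARENT = "model-cn"
MODEL_NAME = "vosk-model-cn-0.22"


def find_vosk_model(project_root: str = ".") -> Path:
    """Return the path to the unzipped VOSK model, or raise if it is missing.

    Hypothetical helper for illustration; the project itself may locate
    the model differently.
    """
    model_dir = Path(project_root) / MODEL_PARENT / MODEL_NAME
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Expected the unzipped model at {model_dir}; "
            "download vosk-model-cn-0.22 and place it there."
        )
    return model_dir


if __name__ == "__main__":
    try:
        print(f"VOSK model found at {find_vosk_model()}")
    except FileNotFoundError as err:
        print(err)
```

The check only looks for the directory; it does not validate the model files inside it.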
To start the digital human interaction system with LLM functionality, run:
    python main.py
Once the interface is running, you can interact in two modes:
- Type Chinese text directly into the input box as a prompt.
- Click the Text Mode button to switch to voice interaction.
  - Press the Space key to start recording.
  - Press Space again to stop recording.
  - The recognized speech will appear in the top-left corner of the screen.
Note: you may need to run ollama serve manually in a terminal to activate the LLM.
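To tell whether the Ollama server is already up, you can probe it before starting the system. The sketch below is an illustration only: the function name `ollama_is_running` is hypothetical, and it assumes Ollama is listening on its default local address, `http://localhost:11434`. It only confirms that something answers HTTP at that address, not that the gemma3:4b model is loaded.

```python
import urllib.error
import urllib.request

# Assumption: Ollama's default local address. Change it if your server
# listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if ollama_is_running():
        print("Ollama server is reachable.")
    else:
        print("Ollama server not reachable; try running `ollama serve` first.")
```

If the check reports that the server is not reachable, start it with ollama serve in a separate terminal and run main.py again.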