Research Scientist @ Meta AI · London, UK
Video diffusion · generative neural rendering · talking avatars · world models
I'm a Research Scientist at Meta AI London, working on video diffusion models, generative neural rendering, talking avatars, and world models: teaching machines to observe, imagine, and simulate visual reality. The longer-term hope is to make worlds as easy to build as they are to imagine, so that anyone, not only engineers, can freely express the ones in their head.
I received my Ph.D. in Computer Science from the University of Surrey (supervised by Prof. Tao Xiang and Prof. Yi-Zhe Song, working closely with Dr. Xiatian Zhu), and my Bachelor's in Computer Science (AI) from the University of Malaya.
🌙 Currently building: Yume (夢)
A programmable, explicit world model on Godot — built by Claude, for Claude.
Yume is my take on a question I keep circling back to: what if you could describe a world in plain language and have it materialize into something runnable — without writing any per-world code? (夢 means "dream" in Japanese.)
A world's entities and rules are written as pure JSON; a small fixed interpreter advances that world tick by tick; Godot projects the resulting state to pixels, audio, HUD, or text. The engine ships seven primitives — Entity / Tag / Rule / Trigger / Effect / Query / Relation — and no game-specific code. You describe a world; you never edit the engine.
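To make the "world as data, engine as fixed interpreter" idea concrete, here is a minimal Python sketch. It is an illustration only, not Yume's actual schema or interpreter: the field names `entities`, `rules`, `when_tag`, `on_action`, and `effect` are hypothetical, and the real engine runs on Godot rather than Python.

```python
import copy
import json

# Hypothetical world description. The field names are illustrative only
# and are not taken from Yume's real JSON format.
WORLD_JSON = """
{
  "entities": {
    "player": {"tags": ["mobile"], "x": 0, "hp": 3},
    "rock": {"tags": [], "x": 5, "hp": 1}
  },
  "rules": [
    {"when_tag": "mobile", "on_action": "move_right",
     "effect": {"field": "x", "add": 1}}
  ]
}
"""


def step(state, action):
    """Advance the world by one tick without mutating the input state."""
    next_state = copy.deepcopy(state)
    for rule in state["rules"]:
        if rule["on_action"] != action:
            continue
        # Apply the rule's effect to every entity carrying the required tag.
        for entity in next_state["entities"].values():
            if rule["when_tag"] in entity["tags"]:
                effect = rule["effect"]
                entity[effect["field"]] += effect["add"]
    return next_state


world = json.loads(WORLD_JSON)
after = step(step(world, "move_right"), "move_right")
print(after["entities"]["player"]["x"])  # 2: the player moved twice
print(after["entities"]["rock"]["x"])    # 5: the rock has no "mobile" tag
```

The point of the sketch is that `step` never changes when the world does: a different world is a different JSON document, not different code.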
A world model is just a transition function f(state, action) → next_state. Yume lets you write f as JSON and run it — which makes it useful well beyond games:
| Use | How |
|---|---|
| 🎮 Games | the Godot projection — a playable build |
| 🤖 RL / agent-eval testbeds | deterministic, seedable, gym-like stepping |
| 🏞️ Scene / world generation | prose → 3D scene pipelines |
| 🧠 Training data for neural world models | roll a JSON world out, record (state, action, next_state) trajectories, train a Dreamer/Genie-style implicit model that approximates the same f at scale |
That last row is the thesis: a clean, authorable explicit substrate that bridges to implicit (neural) world models. You can interpret it directly, and you can use it as a faucet of reproducible training data.
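A rough sketch of what recording trajectories from an explicit world could look like, in Python. The `step` function below is a toy stand-in for a transition function, and `rollout`, the action names, and the state fields are all hypothetical; none of this is Yume's actual API.

```python
import copy
import random


def step(state, action):
    """Toy stand-in for an explicit world's transition function f."""
    next_state = copy.deepcopy(state)
    next_state["x"] += {"left": -1, "right": 1, "stay": 0}[action]
    next_state["tick"] += 1
    return next_state


def rollout(initial_state, num_steps, seed):
    """Record (state, action, next_state) tuples under a seeded random policy."""
    rng = random.Random(seed)  # local RNG, so runs do not depend on global state
    state = copy.deepcopy(initial_state)
    trajectory = []
    for _ in range(num_steps):
        action = rng.choice(["left", "right", "stay"])
        next_state = step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


start = {"x": 0, "tick": 0}
run_a = rollout(start, num_steps=5, seed=42)
run_b = rollout(start, num_steps=5, seed=42)
print(run_a == run_b)  # True: same seed, same trajectory
print(len(run_a))      # 5
```

Because the transition function is deterministic and the policy's randomness comes from a seeded generator, the same seed reproduces the same dataset, which is the property that makes an explicit world usable as a training-data source.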
It's the most fun I've had with a side project in years — the entire repo is written and operated by Claude Code. It's pre-1.0 and experimental, and I'd love for you to take a look. ⭐
| Year | Work | Venue |
|---|---|---|
| 2026 | Kaleido — unified neural rendering via spatial generative models | ICLR 2026 |
| 2026 | Rays as Pixels — joint video generation & camera trajectory estimation | ICML 2026 |
| 2026 | VecGlypher — LLM-based vector glyph generation from text/image | CVPR 2026 |
| 2024 | PartCraft — part-based compositional image generation | ECCV 2024 |
| 2024 | ConceptHash — interpretable hashing via part-based concepts | CVPRW 2024 (Best Paper) |
| 2021 | OrthoHash — one-loss deep hashing with orthogonal centres | NeurIPS 2021 |
| 2019 | DeepIPR — DNN ownership protection via passport layers | NeurIPS 2019 |
📚 Full list on my website and Google Scholar.
📫 kamwoh [at] gmail.com


