emb-ai/mllm-course

Multimodal Large Language Models (MLLM)

Rules & FAQ

Class recordings (in Russian)

YouTube, VKVideo

Class Materials

| # | Date | Title | Materials |
|---|------|-------|-----------|
| 1 | Feb 11 | Word Embeddings and Classification & Language Modelling | slides |
|   |        | Embeddings & CNN/LSTM LMs with PyTorch | notebook |
| 2 | Feb 18 | Seq2seq, Attention, and Transformers | slides |
|   |        | Transformer from Scratch | notebook |
| 3 | Feb 25 | Pretraining, SFT, RLHF & PEFT, LoRA | slides |
|   |        | Parameter-efficient fine-tuning | notebook |
| 4 | Mar 4 | Reasoning, RLVF & RAG | slides |
|   |       | Tokenization | notebook |
| 6 | Mar 18 | Introduction to MLLMs and Image Modality | slides |
|   |        | Classification of VLMs: Deep Fusion vs Early Fusion | notebook |
| 7 | Mar 25 | VLLM and Data Generation | slides |
|   |        | Visual Autoregressive Transformer | notebook |
| 8 | Apr 1 | Video Understanding | slides |
|   |       | Video Modality and Any-to-any Models | notebook |
| 9 | Apr 8 | Action Modality (Robotics) | slides |
|   |       | Intro to Vision Language Action Models | notebook |
| 10 | Apr 15 | Intelligent Document Processing (IDP) and UI Agents | slides |
|    |        | Agentic Workflow | notebook |
| 11 | Apr 22 | 3D Data Modality | slides |
|    |        | VLM Grounding | notebook |
| 12 | Apr 29 | Efficient Inference: FlashAttention, KV cache, Distillation, Quantization | slides |
|    |        | KV cache, Quantization | notebook |
