Multimodal Large Language Models (MLLM)

#	Date	Title	Materials
1	Feb 11	Word Embeddings and Classification & Language Modelling	slides
1	Feb 11	Embeddings & CNN/LSTM LMs with PyTorch	notebook
2	Feb 18	Seq2seq, Attention, and Transformers	slides
2	Feb 18	Transformer from Scratch	notebook
3	Feb 25	Pretraining, SFT, RLHF & PEFT, LoRA	slides
3	Feb 25	Parameter-efficient fine-tuning	notebook
4	Mar 4	Reasoning, RLVF & RAG	slides
4	Mar 4	Tokenization	notebook
6	Mar 18	Introduction to MLLMs and Image Modality	slides
6	Mar 18	Classification of VLMs: Deep Fusion vs Early Fusion	notebook
7	Mar 25	VLLM and Data Generation	slides
7	Mar 25	Visual Autoregressive Transformer	notebook
8	Apr 1	Video Understanding	slides
8	Apr 1	Video Modality and Any-to-any Models	notebook
9	Apr 8	Action Modality (Robotics)	slides
9	Apr 8	Intro to Vision Language Action Models	notebook
10	Apr 15	Intelligent Document Processing (IDP) and UI Agents	slides
10	Apr 15	Agentic Workflow	notebook
11	Apr 22	3D Data Modality	slides
11	Apr 22	VLM Grounding	notebook
12	Apr 29	Efficient Inference: FlashAttention, KV cache, Distillation, Quantization	slides
12	Apr 29	KV cache, Quantization	notebook

01-w2v-cls-langmodeling
02-seq2seq
03-llm
04-reason-rag
06-images
07-generation
08-video
09-vla
10-agent
11-3D
12-inference
.gitattributes
FAQ.md
README.md
