Multimodal Large Language Models (MLLM)
Rules & FAQ
Class recordings (in Russian)
| # | Date | Title | Materials |
|---|---|---|---|
| 1 | Feb 11 | Word Embeddings and Classification & Language Modelling | slides |
| | | Embeddings & CNN/LSTM LMs with PyTorch | notebook |
| 2 | Feb 18 | Seq2seq, Attention, and Transformers | slides |
| | | Transformer from Scratch | notebook |
| 3 | Feb 25 | Pretraining, SFT, RLHF & PEFT, LoRA | slides |
| | | Parameter-efficient fine-tuning | notebook |
| 4 | Mar 4 | Reasoning, RLVF & RAG | slides |
| | | Tokenization | notebook |
| 6 | Mar 18 | Introduction to MLLMs and Image Modality | slides |
| | | Classification of VLMs: Deep Fusion vs Early Fusion | notebook |
| 7 | Mar 25 | VLLM and Data Generation | slides |
| | | Visual Autoregressive Transformer | notebook |
| 8 | Apr 1 | Video Understanding | slides |
| | | Video Modality and Any-to-any Models | notebook |
| 9 | Apr 8 | Action Modality (Robotics) | slides |
| | | Intro to Vision Language Action Models | notebook |
| 10 | Apr 15 | Intelligent Document Processing (IDP) and UI Agents | slides |
| | | Agentic Workflow | notebook |
| 11 | Apr 22 | 3D Data Modality | slides |
| | | VLM Grounding | notebook |
| 12 | Apr 29 | Efficient Inference: FlashAttention, KV cache, Distillation, Quantization | slides |
| | | KV cache, Quantization | notebook |