List of topics

Overview of building large language models + Building your first LLM

Tokenizer review + HuggingFace practice

Machine learning and deep learning review + language model details

Transformer review

[Further reading] The BERT model

Demo Day: Pretraining + Fine-tuning LLMs

The GPT model family - Fine-tuning LLMs for multi-task problems

Preparing pre-training data for language models

Preference training techniques - RLHF

Preference training techniques - DPO

The LLAMA model family

Model quality evaluation + Parameter-efficient fine-tuning techniques - PEFT + Homework review

Homework review + Special topic: model quantization + Formats for language models

The GPT OSS model family

The Kimi model family

Sequence-to-sequence training and modeling (text-to-text models)

The DeepSeek model family

Multimodal

Final project - 3 sessions

Agents and related problems

[Advanced] The Hope model - Attention 2.0

Multimodal

Details of current multimodal models and how they are designed:
- The CLIP model
- The VIT model
- LLAMA3.2 Vision Model

1. The CLIP model

1.1. CLIP model

1.2. Practice

Implementing CLIP from scratch

https://drive.google.com/file/d/1zpEFP1K5gMZwQafZ0rIi7EA8nW91XUDo/view?usp=drive_link
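As a companion to the from-scratch notebook, here is a minimal sketch of the symmetric contrastive loss at the core of CLIP training. It is not taken from the notebook: the function names are hypothetical, the embeddings are plain NumPy arrays standing in for encoder outputs, and the temperature is fixed at 0.07 for simplicity (in CLIP it is a learned parameter).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Assumption: row i of image_emb and row i of text_emb form a matching pair,
    and every other row in the batch is treated as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # shape (N, N)
    idx = np.arange(logits.shape[0])
    # Image-to-text: each row should put its mass on the diagonal entry.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column should put its mass on the diagonal entry.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(4, 8))
    print("aligned pairs :", clip_contrastive_loss(emb, emb))
    print("shuffled pairs:", clip_contrastive_loss(emb, emb[::-1]))
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal that pulls matching images and captions together during training.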

Fine-tuning CLIP

https://drive.google.com/file/d/1WXA6w-MZn0j72Vnl07neHb6Gs5xNImZy/view?usp=drive_link

Using CLIP as an image classifier

https://drive.google.com/file/d/1erbHcICMhA3Wooh-MUwhr8jA0IybbJy8/view?usp=drive_link
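The idea behind using CLIP as a classifier can be sketched as follows: embed one text prompt per class, embed the image, and pick the class whose prompt is most similar. This is only an illustration under assumptions: the toy 3-dimensional vectors stand in for real CLIP encoder outputs, and `zero_shot_classify`, the prompts, and the temperature value are hypothetical choices, not the notebook's code.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs, class_names, temperature=0.01):
    """Return the best-matching class name and the softmax scores over classes."""
    # Normalize so that the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # Stable softmax over the class dimension.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs

if __name__ == "__main__":
    # Toy embeddings (assumption): in practice these come from the text encoder.
    class_names = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
    class_text_embs = np.array([
        [1.0, 0.1, 0.0],
        [0.1, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ])
    # Toy image embedding (assumption): in practice this comes from the image encoder.
    image_emb = np.array([0.2, 0.9, 0.1])
    label, probs = zero_shot_classify(image_emb, class_text_embs, class_names)
    print(label, probs)
```

No classifier head is trained here; changing the set of classes only means changing the list of prompts.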

2. VIT Foundation

2.1. VIT Model
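A minimal sketch of the input stage of a Vision Transformer may help when going through this part: the image is cut into fixed-size patches, each patch is flattened and linearly projected, a class token is prepended, and position embeddings are added. The function names, the random weights, and the embedding width of 64 are assumptions for illustration only; a trained model learns `proj`, `cls_token`, and `pos_emb`.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)   # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def embed_patches(image, patch_size, proj, cls_token, pos_emb):
    """Project patches to the model width, prepend a class token, add positions."""
    tokens = patchify(image, patch_size) @ proj                     # (N, D)
    tokens = np.concatenate([cls_token[None, :], tokens], axis=0)   # (N + 1, D)
    return tokens + pos_emb

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(224, 224, 3))
    patch_size, dim = 16, 64                      # dim is an arbitrary choice here
    num_patches = (224 // patch_size) ** 2        # 14 * 14 = 196
    proj = rng.normal(size=(patch_size * patch_size * 3, dim))
    cls_token = rng.normal(size=(dim,))
    pos_emb = rng.normal(size=(num_patches + 1, dim))
    tokens = embed_patches(image, patch_size, proj, cls_token, pos_emb)
    print(tokens.shape)  # (197, 64)
```

The resulting token sequence is what a standard Transformer encoder then processes, exactly as it would a sequence of word embeddings.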

2.2. Slide

3. Flamingo

3.1. Flamingo
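One mechanism worth having in mind for Flamingo is the tanh-gated cross-attention that lets text tokens attend to visual tokens inside a frozen language model. The sketch below is a simplified, single-head version under assumptions: there is no feed-forward sublayer, no layer norm, and no masking, and all names and weights are hypothetical. It only illustrates why a gate initialized at zero leaves the language model's output unchanged at the start of training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, visual_tokens, w_q, w_k, w_v):
    """Single-head attention: queries from text, keys and values from vision."""
    q = text_tokens @ w_q
    k = visual_tokens @ w_k
    v = visual_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_cross_attention(text_tokens, visual_tokens, w_q, w_k, w_v, alpha):
    """Residual cross-attention scaled by tanh(alpha).

    With alpha = 0 the gate is closed and the text tokens pass through unchanged.
    """
    attended = cross_attention(text_tokens, visual_tokens, w_q, w_k, w_v)
    return text_tokens + np.tanh(alpha) * attended

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(5, 16))      # 5 text tokens, width 16
    visual = rng.normal(size=(7, 16))    # 7 visual tokens, width 16
    w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
    closed = gated_cross_attention(text, visual, w_q, w_k, w_v, alpha=0.0)
    opened = gated_cross_attention(text, visual, w_q, w_k, w_v, alpha=1.0)
    print(np.allclose(closed, text), np.allclose(opened, text))
```

As the gate parameter moves away from zero during training, visual information is blended into the text stream gradually.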

4. Reading multimodal research

4.1. Required reading

Four common multimodal architecture types:

Type	Name	Description
Type-A	SCDF	Standard Cross-attention based Deep Fusion
Type-B	CLDF	Custom Layer based Deep Fusion
Type-C	NTEF	Non-Tokenized Early Fusion
Type-D	TEF	Tokenized Early Fusion

Flamingo: https://slds-lmu.github.io/seminar_multimodal_dl/c02-00-multimodal.html#flamingo

The Evolution of Multimodal Model Architectures: https://arxiv.org/pdf/2405.17927

5. Video

5.1. [Zoom 25-04-2025] CLIP + VIT

5.2. [Zoom 09-05-2025] Flamingo