List of topics

Key class information

Regular Expressions

Tokenization algorithms in detail

Text normalization

Linear algebra review

Review of machine learning, deep learning, and related concepts

Review of the training process

Review of softmax and neural networks

[Supplementary] Training algorithms

Vector Semantics and Embeddings

SkipGram, GloVe, and FastText

Language models

RNN and LSTM

Machine translation

Transformer model (4 sessions)

BERT and its applications

Hands-on: BERT applied to NER and POS tagging

GPT + distributed training

Information Retrieval

Vector Database and RAG

Advanced Transformer


1. Optimizing for long texts

1.1. Approach 1 - RoPE

Adding RoPE to the Transformer:

The team upgraded the Transformer model to use Rotary Position Embedding (RoPE) instead of a standard positional embedding, which improved classification performance by 1-2%.

RoFormer injects position information directly into the q and k vectors instead of requiring a separate positional embedding layer.

RoFormer rotates the q and k vectors by an angle that increases by a fixed step with each position, so the attention score reflects the relative position between tokens.

For example, moving from position m=1 to m=2 and from m=2 to m=3 in the sentence, at the same embedding index (say i = 0) the q vector rotates by the same angle each time. The k vector likewise rotates by the same angle.
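The rotation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the linked notebook: the helper names `rope_rotate`, `dot`, and `pair_angle`, the sample vectors, and the frequency base of 10000 are assumptions chosen for the example. It checks two things: pair i = 0 rotates by the same angle on every one-position step, and the q·k score depends only on the offset between the two positions.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle position * theta_i."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE works on pairs of dimensions"
    out = []
    for i in range(dim // 2):
        theta = base ** (-2.0 * i / dim)  # per-pair frequency (assumed base 10000)
        angle = position * theta
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(angle) - y * math.sin(angle))
        out.append(x * math.sin(angle) + y * math.cos(angle))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_angle(vec, i):
    """Polar angle of the i-th 2D pair of the vector."""
    return math.atan2(vec[2 * i + 1], vec[2 * i])

# Hypothetical query and key vectors for illustration.
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Pair i = 0 turns by the same angle from m=1 to m=2 as from m=2 to m=3.
angles = [pair_angle(rope_rotate(q, m), 0) for m in (1, 2, 3)]
step_12 = (angles[1] - angles[0]) % (2 * math.pi)
step_23 = (angles[2] - angles[1]) % (2 * math.pi)

# The same offset (2 positions) at different absolute positions gives the same score.
score_a = dot(rope_rotate(q, 1), rope_rotate(k, 3))
score_b = dot(rope_rotate(q, 5), rope_rotate(k, 7))

print(step_12, step_23)
print(score_a, score_b)
```

Because both q and k are rotated, their dot product only sees the difference of the two rotation angles, which is why no separate positional embedding layer is needed.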

Notebook: https://colab.research.google.com/drive/1QBkP6ve4f2-KapKaWDeodkR8_2JEYWjw?usp=sharing

Paper: https://arxiv.org/pdf/2104.09864v5

RoPE visualization: https://colab.research.google.com/drive/1SMsORT8958HOs2c99bC9FYV4sfKrs0el?usp=sharing

1.2. Approach 2 - Sparse Attention

Code: https://colab.research.google.com/drive/1OyXD3ZQJuWfCzxIaxjBIOAvOPbq2Fw8I?usp=drive_link

Visualization code: https://drive.google.com/file/d/1Y3eOWClXk8dxpKw9FxGEeyiS6p3FsNys/view?usp=drive_link

Paper: https://arxiv.org/pdf/1904.10509

1.3. Approach 3 - Flash Attention

Mechanism

Implementation

The team benchmarked the Transformer's standard multi-head attention against FlashAttention, which optimizes GPU memory access.

After 1000 runs, FlashAttention was about 1.5x faster than standard attention.

This benchmark was run on a T4 GPU on Google Colab.

P.S.: in PyTorch, the scaled_dot_product_attention function can already dispatch to a FlashAttention kernel when the hardware and inputs support it.

Notebook and results: https://colab.research.google.com/drive/1-HjN3McMS_boMyBZFAt1TP7NLoRUd346?usp=sharing
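To give a feel for how FlashAttention reduces memory traffic, here is a minimal pure-Python sketch of the block-wise ("online") softmax idea it builds on. It is an illustration under stated assumptions, not the notebook's benchmark and not the real GPU kernel: the function names `naive_attention` and `tiled_attention`, the sample data, and the block size are all hypothetical, and only a single query is handled. The point is that the tiled version never holds all attention scores at once, yet produces the same output.

```python
import math

def naive_attention(q, keys, values):
    """Standard softmax attention for one query: all scores are materialized."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)]

def tiled_attention(q, keys, values, block_size=2):
    """Same result, computed one key/value block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    dim_v = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * dim_v           # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]

# Hypothetical toy data: 5 keys/values, query dimension 4, value dimension 2.
q = [0.1, 0.2, 0.3, 0.4]
keys = [[1, 0, 0, 1], [0, 1, 1, 0], [0.5, 0.5, 0.5, 0.5], [2, -1, 0, 1], [-1, 0.3, 0.8, 0]]
values = [[1, 0], [0, 1], [0.5, 0.5], [2, 2], [-1, 3]]

out_naive = naive_attention(q, keys, values)
out_tiled = tiled_attention(q, keys, values, block_size=2)
print(out_naive)
print(out_tiled)
```

In the actual kernel the blocks are tiles of Q, K, and V kept in fast on-chip memory, so the full score matrix is never written to GPU main memory. The speedup measured in the benchmark above comes from that reduced memory access, not from doing fewer arithmetic operations.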

1.4. Approach 4 - Multi-Query Attention

Code details: https://colab.research.google.com/drive/1RRLM1PPekAcqz6f2a_cmbLf6kvkVVHMI?usp=sharing

Illustration: https://docs.google.com/presentation/d/16way0j4UZwznkEgXAKL24RkLH4gaizk7yWfcrmMLnzs/edit?usp=sharing

1.5. Mixture of Experts

Detailed explanation: https://huggingface.co/blog/moe

Open-source MoE: https://huggingface.co/allenai/OLMoE-1B-7B-0924

1.6. DeepSeek-V3

Paper: https://drive.google.com/file/d/1llCvhH3byOzg4LwlBtn5TZnWzBlL2FBd/view

Slide: https://docs.google.com/presentation/d/1M5wAvXWpb0Eq793P7JDRPMST3H3NVT_C/edit?usp=sharing&ouid=100974175953554169152&rtpof=true&sd=true

2. Optimizing generation speed

2.1. KV Cache

Slide: https://docs.google.com/presentation/d/1Qe49-oFh1R8_KLP0P1pYmT_giTKrqlwe/edit?usp=sharing&ouid=100974175953554169152&rtpof=true&sd=true

Notebook: https://colab.research.google.com/drive/1JDsQ9QLqS5t4dnjnxAY4JqSUpKOXfttV?usp=sharing

3. Improving quality

3.1. Sigmoid Attention

Paper: https://arxiv.org/pdf/2409.04431

Code: https://colab.research.google.com/drive/1f1Z4MUcVzyb7tDTNe2_zn-heaxBvDERP?usp=sharing

4. Video

4.1. [Zoom 10-12-2024] NLP 03

4.2. [NLP 06] Advanced Transformer
