List of topics

Overview of building large language models + Building your first LLM

Tokenizer review + Hugging Face hands-on

Review of machine learning and deep learning + Language models in detail

Transformer review

[Further reading] The BERT model

Demo Day: Pretrained + Fine-tuned LLMs

The GPT model family - Fine-tuning LLMs for multiple tasks

Preparing pre-training data for language models

Preference training techniques - RLHF

Preference training techniques - DPO

Model quality evaluation + Parameter-efficient fine-tuning techniques (PEFT) + Homework review

Homework review + Special topic: model quantization + Formats for language models

The LLaMA model family

Sequence-to-sequence training (text-to-text models)

The DeepSeek model family

Multimodal

Final course project - 3 sessions

Agents and related problems

Deployment techniques for language models

Sequence-to-sequence training (text-to-text models)

Modeling tasks where the input is text and the output is text
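One way to picture the text-in, text-out framing: every task is rewritten as a plain input string and a plain target string, so a single model can handle all of them. The sketch below does this for three tasks, using task prefixes of the kind shown in the T5 paper. The helper name `to_text_to_text`, the task keys, and the summarization record are hypothetical and only serve the illustration.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Cast one labeled example into an (input_text, target_text) pair."""
    if task == "translate_en_de":
        return "translate English to German: " + example["en"], example["de"]
    if task == "summarize":
        return "summarize: " + example["document"], example["summary"]
    if task == "cola":
        # Classification labels become literal words in the target string.
        return "cola sentence: " + example["sentence"], example["label"]
    raise ValueError(f"unknown task: {task}")


pairs = [
    to_text_to_text("translate_en_de", {"en": "That is good.", "de": "Das ist gut."}),
    to_text_to_text("summarize", {"document": "A long news article ...", "summary": "A short summary."}),
    to_text_to_text("cola", {"sentence": "The course is jumping well.", "label": "not acceptable"}),
]

for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Because both sides are plain strings, the same training loop and the same loss can be reused across translation, summarization, and classification.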

1. The T5 model

1.1. The T5 paper

https://arxiv.org/pdf/1910.10683

1.2. T5-model

1.3. Hands-on: fine-tuning a text-to-text model

Hands-on: fine-tuning T5 on different tasks:

https://colab.research.google.com/drive/1Efus6aEk3R7fiKjZmG0MGwbt_Xu41SJJ?usp=sharing

Hands-on: model evaluation: https://drive.google.com/file/d/1_2_cT2P3UUdSsdMEtE6v4thiz45c-3Xj/view?usp=drive_link

1.4. Evaluating the T5 model

Presentation Title: ROUGE: Evaluating Text Summarization & Translation

Introduction to ROUGE

What is ROUGE?
- ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation.
- A set of metrics and a software package.
- Used to evaluate automatic summarization and machine translation.
- Compares model-generated text against human-produced references.
- Focuses on recall.

ROUGE-N: N-gram Matching

Measures the overlap of n-grams between the candidate and reference texts.

Example:
- Reference (R): "The cat is on the mat."
- Candidate (C): "The cat and the dog."
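ROUGE-1 Example (3 matching unigrams: "the", "cat", "the"; C has 5 unigrams, R has 6):
- Precision: 3/5 = 0.6
- Recall: 3/6 = 0.5
- F1-score: 0.55
ROUGE-2 Example (1 matching bigram: "the cat"; C has 4 bigrams, R has 5):
- Precision: 1/4 = 0.25
- Recall: 1/5 = 0.20
- F1-score: 0.22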
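The ROUGE-1 and ROUGE-2 numbers for this example can be reproduced with a few lines of Python. This is a minimal sketch, not the official ROUGE package: it assumes lowercase word tokenization with punctuation dropped, and it clips matches so a repeated n-gram can only match as many times as it occurs in the other text. The helper names `tokenize`, `ngrams`, and `rouge_n` are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep word characters only, so "mat." and "mat" match.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    ref = ngrams(tokenize(reference), n)
    cand = ngrams(tokenize(candidate), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "The cat is on the mat."
candidate = "The cat and the dog."

for n in (1, 2):
    p, r, f = rouge_n(reference, candidate, n)
    print(f"ROUGE-{n}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```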

ROUGE-L: Longest Common Subsequence (LCS)

Measures the longest common subsequence between the candidate and reference texts.

Example:
- R: "The cat is on the mat."
- C: "The cat and the dog."
- LCS: "the cat the"
Calculations:
- Precision: 3/5 = 0.6
- Recall: 3/6 = 0.5
- F1-score: 0.55
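The LCS-based scores can be sketched the same way. The code below is an illustrative implementation under stated assumptions: lowercase word tokenization with punctuation dropped, a standard dynamic-programming LCS, and a balanced F1 to match the calculation above (the original ROUGE-L definition uses a weighted F-measure). The names `lcs_length` and `rouge_l` are hypothetical.

```python
import re


def tokenize(text):
    # Lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())


def lcs_length(a, b):
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(reference, candidate):
    ref, cand = tokenize(reference), tokenize(candidate)
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


p, r, f = rouge_l("The cat is on the mat.", "The cat and the dog.")
print(f"ROUGE-L: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Unlike ROUGE-2, the matched words do not need to be adjacent; they only need to appear in the same order in both texts.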

ROUGE-S: Skip-grams

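ROUGE-S: Skip-bigram co-occurrence. Matches pairs of words that appear in the same order in both texts, even when other words sit between them, which makes it more lenient than ROUGE-2.

Example:
- R: "The cat is on the mat."
- C: "The gray cat and the dog."
- The skip-bigram ("the", "cat") in R also occurs in C, where "gray" sits between the two words.
Key Point: Useful when the candidate inserts or drops words between otherwise matching words; the matched words must still appear in the same order.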
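A skip-bigram is any ordered pair of words taken from a sentence, with any number of words allowed in between. The sketch below counts them for the example above, assuming an unlimited skip distance, lowercase word tokenization, and clipped matching; real ROUGE-S setups often cap the gap between the two words. The names `skip_bigrams` and `rouge_s` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())


def skip_bigrams(tokens):
    # combinations() keeps the original order, so each pair is (earlier, later).
    return Counter(combinations(tokens, 2))


def rouge_s(reference, candidate):
    ref = skip_bigrams(tokenize(reference))
    cand = skip_bigrams(tokenize(candidate))
    overlap = sum((ref & cand).values())  # clipped skip-bigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "The cat is on the mat."
candidate = "The gray cat and the dog."

shared = skip_bigrams(tokenize(reference)) & skip_bigrams(tokenize(candidate))
print("shared skip-bigrams:", sorted(shared))
print("P, R, F1:", rouge_s(reference, candidate))
```

Under these assumptions the pair ("the", "cat") counts as a match even though "gray" separates the two words in the candidate, which plain ROUGE-2 would miss.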

Pros and Cons of ROUGE

Pros:
- Correlates positively with human evaluation.
- Inexpensive to compute.
- Largely language-independent, given a suitable tokenizer.
Cons:
- Does not handle semantic similarity: synonyms and paraphrases score as mismatches.
- Measures surface word overlap rather than meaning.

ROUGE vs. BLEU

ROUGE:
- Focuses on recall.
- Asks how much of the reference appears in the candidate.
BLEU:
- Focuses on precision.
- Asks how much of the candidate appears in the reference.
Key Point: The two metrics are complementary; reporting both exposes the precision vs. recall tradeoff.
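The tradeoff is easy to see with unigram overlap alone. The sketch below is not BLEU or ROUGE themselves (BLEU, for instance, combines several n-gram orders and adds a brevity penalty); it only contrasts the precision and recall views on two made-up candidates. The function name and both candidate sentences are assumptions for illustration.

```python
import re
from collections import Counter


def unigram_precision_recall(reference, candidate):
    ref = Counter(re.findall(r"\w+", reference.lower()))
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


reference = "The cat is on the mat."
short = "The cat."  # says little, but everything it says is in the reference
padded = "The cat is on the mat and the dog is on the rug too."  # covers the reference, plus extra words

for name, text in (("short", short), ("padded", padded)):
    p, r = unigram_precision_recall(reference, text)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

A very short candidate can reach perfect precision with low recall, while a padded candidate can reach perfect recall with low precision, which is why the two views are usually read together.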

Conclusion