[Zoom 04-04-2025] Sequence-to-sequence modeling and training (text-to-text models)
Models that take text as input and produce text as output
1. The T5 model
1.1. T5 paper
1.2. T5 model
1.3. Hands-on: fine-tuning a text-to-text model
Hands-on: fine-tuning T5 on different tasks:
https://colab.research.google.com/drive/1Efus6aEk3R7fiKjZmG0MGwbt_Xu41SJJ?usp=sharing
Hands-on: evaluating the model: https://drive.google.com/file/d/1_2_cT2P3UUdSsdMEtE6v4thiz45c-3Xj/view?usp=drive_link
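As a rough illustration of what "text in, text out" means when preparing fine-tuning data, the sketch below casts three different tasks into the same (input text, target text) shape by prepending a task prefix. The helper name `to_text_to_text` and the dictionary keys are hypothetical and are not taken from the notebooks linked above. The prefixes and the translation and acceptability examples follow the style used in the T5 paper; the summarization example is made up for illustration.

```python
def to_text_to_text(task_prefix, source_text, target_text):
    """Cast any task into a (text in, text out) pair by prepending a task prefix.

    Hypothetical helper for illustration only; the key names are assumptions.
    """
    return {
        "input_text": f"{task_prefix}: {source_text}",
        "target_text": target_text,
    }


examples = [
    # Translation: the target is the translated sentence.
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    # Classification: the label itself is written out as text.
    to_text_to_text("cola sentence", "The course is jumping well.", "not acceptable"),
    # Summarization: the target is a shorter text (illustrative example).
    to_text_to_text(
        "summarize",
        "The cat sat on the mat all afternoon and refused to move.",
        "The cat stayed on the mat.",
    ),
]

for ex in examples:
    print(ex["input_text"], "->", ex["target_text"])
```

Because every task shares this format, one model and one training loop can serve all of them; only the prefix and the target text change.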
1.4. Evaluating the T5 model
Presentation Title: ROUGE: Evaluating Text Summarization & Translation
1: Introduction to ROUGE
What is ROUGE?
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation.
A set of metrics and a software package.
Used to evaluate automatic summarization and machine translation.
Compares model-generated text against human-produced references.
Focuses on recall.
2: ROUGE-N: N-gram Matching
Measures the overlap of n-grams between the candidate and reference texts.
Example:
Reference (R): "The cat is on the mat."
Candidate (C): "The cat and the dog."
ROUGE-1 Example:
Precision: 3/5 = 0.6
Recall: 3/6 = 0.5
F1-score: 2 × 0.6 × 0.5 / (0.6 + 0.5) ≈ 0.55
ROUGE-2 Example:
Precision: 1/4 = 0.25
Recall: 1/5 = 0.20
F1-score: 0.22
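The ROUGE-1 and ROUGE-2 numbers above can be reproduced with a minimal sketch like the one below. It assumes a simplistic tokenizer (lowercasing, punctuation dropped) and clipped n-gram counts; the function names are illustrative and are not the API of any ROUGE package, which may also apply stemming or other preprocessing.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    ref = ngram_counts(tokenize(reference), n)
    cand = ngram_counts(tokenize(candidate), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "The cat is on the mat."
candidate = "The cat and the dog."
for n in (1, 2):
    p, r, f = rouge_n(reference, candidate, n)
    print(f"ROUGE-{n}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# ROUGE-1: P=0.60 R=0.50 F1=0.55
# ROUGE-2: P=0.25 R=0.20 F1=0.22
```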
ROUGE-L: Longest Common Subsequence (LCS)
Measures the longest common subsequence between the candidate and reference texts.
Example:
R: "The cat is on the mat."
C: "The cat and the dog."
LCS: "the cat the"
Calculations:
Precision: 3/5 = 0.6
Recall: 3/6 = 0.5
F1-score: 0.55
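The LCS in this example can be found with standard dynamic programming. The sketch below is one way to do it, assuming the same simplistic lowercase tokenizer and a balanced F1 as on the slide (the original ROUGE-L definition uses a weighted F-measure); all function names are illustrative.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs(a, b):
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back through the table to recover one longest common subsequence.
    out, i, j = [], len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def rouge_l(reference, candidate):
    ref, cand = tokenize(reference), tokenize(candidate)
    length = len(lcs(ref, cand))
    precision = length / max(len(cand), 1)
    recall = length / max(len(ref), 1)
    f1 = 2 * precision * recall / (precision + recall) if length else 0.0
    return precision, recall, f1


reference = "The cat is on the mat."
candidate = "The cat and the dog."
print(lcs(tokenize(reference), tokenize(candidate)))  # ['the', 'cat', 'the']
p, r, f = rouge_l(reference, candidate)
print(f"ROUGE-L: P={p:.2f} R={r:.2f} F1={f:.2f}")  # ROUGE-L: P=0.60 R=0.50 F1=0.55
```

Unlike ROUGE-2, the matched words do not need to be adjacent; they only need to appear in the same order in both texts.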
ROUGE-S: Skip-grams
Skip-bigram co-occurrence: matches pairs of words that appear in the same order in both texts, even when other words sit between them, which makes it more lenient than contiguous n-gram matching.
Example:
R: "The cat is on the mat."
C: "The gray cat and the dog."
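The pair ("the", "cat") from R also occurs in C, where "gray" sits between the two words ("the gray cat"), so it counts as a match even though the contiguous bigram "the cat" does not appear in C.
Key Point: Tolerates inserted or missing words between matched words, as long as the matched words keep their relative order.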
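To make the skip-bigram idea concrete, here is a minimal sketch that enumerates every ordered word pair in each text and counts the shared ones. It assumes an unlimited skip distance and the same simplistic lowercase tokenizer used for illustration elsewhere; practical implementations often cap the gap size. The function names are illustrative.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def skip_bigrams(tokens):
    # All ordered pairs (tokens[i], tokens[j]) with i < j, any gap allowed.
    return Counter(combinations(tokens, 2))


def rouge_s(reference, candidate):
    ref = skip_bigrams(tokenize(reference))
    cand = skip_bigrams(tokenize(candidate))
    overlap = sum((ref & cand).values())  # clipped pair counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


reference = "The cat is on the mat."
candidate = "The gray cat and the dog."

shared = skip_bigrams(tokenize(reference)) & skip_bigrams(tokenize(candidate))
print(sorted(shared))  # [('cat', 'the'), ('the', 'cat'), ('the', 'the')]

p, r, f = rouge_s(reference, candidate)
print(f"ROUGE-S: P={p:.2f} R={r:.2f} F1={f:.2f}")  # ROUGE-S: P=0.20 R=0.20 F1=0.20
```

Each six-word sentence yields 15 skip-bigrams and 3 of them are shared, so precision and recall are both 3/15.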
Pros and Cons of ROUGE
Pros:
Correlates positively with human evaluation.
Inexpensive to compute.
Language-independent.
Cons:
Does not handle semantic similarity (synonyms).
Measures surface-level (lexical) word overlap rather than meaning.
ROUGE vs. BLEU
ROUGE:
Focuses on recall.
How much of the reference is in the candidate.
BLEU:
Focuses on precision.
How much of the candidate is in the reference.
Key Point: Complementary metrics; precision vs. recall tradeoff.
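The tradeoff can be seen with a small sketch that scores a very short and a very long candidate against the same reference. It only computes clipped unigram precision and recall; full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch deliberately leaves out. The candidate sentences and the function name are made up for illustration.

```python
import re
from collections import Counter


def unigram_precision_recall(reference, candidate):
    # Assumed tokenizer: lowercase, keep alphanumeric word tokens only.
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    overlap = sum((ref & cand).values())  # clipped counts
    precision = overlap / max(sum(cand.values()), 1)  # BLEU-style view
    recall = overlap / max(sum(ref.values()), 1)      # ROUGE-style view
    return precision, recall


reference = "The cat is on the mat."
short_candidate = "The cat."
long_candidate = "The cat is on the mat in the big house today."

for name, cand in [("short", short_candidate), ("long", long_candidate)]:
    p, r = unigram_precision_recall(reference, cand)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# short: precision=1.00 recall=0.33
# long: precision=0.55 recall=1.00
```

A terse candidate is rewarded by precision and penalized by recall, while a padded candidate shows the opposite pattern, which is why the two views are usually reported together.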
Conclusion
Summary:
ROUGE is a valuable tool for evaluating summarization and translation.
Different ROUGE metrics provide different perspectives.
Understanding the pros and cons is important for proper usage.