24 Jan
Beyond BLEU and ROUGE: Why Semantic Metrics Are the New Standard for LLM Evaluation
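BLEU and ROUGE score n-gram overlap with a reference text, so they penalize valid paraphrases and are a poor fit for evaluating modern LLMs. Semantic metrics such as BERTScore and BLEURT compare meaning rather than surface wording, and they generally correlate better with human judgment. Here's how to use them effectively.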
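
To make the word-overlap problem concrete, here is a minimal sketch of a ROUGE-1-style unigram F1 in plain Python. It is a simplified illustration, not the official BLEU or ROUGE implementation: the `unigram_f1` helper, the whitespace tokenization, and the example sentences are all assumptions chosen for the demo.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 over lowercased whitespace tokens (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair: same meaning, almost no shared words.
reference = "the cat sat on the mat"
paraphrase = "a feline rested on the rug"

print(unigram_f1(paraphrase, reference))  # low score despite equivalent meaning
print(unigram_f1(reference, reference))   # exact copy scores 1.0
```

The paraphrase shares only "on" and "the" with the reference, so it scores about 0.33 even though a human reader would call the two sentences equivalent. That gap is what embedding-based metrics like BERTScore are designed to close.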