What Language Model To Choose For Your Project? 🤔 LLM Evaluation
Analytics Camp

 Published On Feb 9, 2024

#llm #huggingface #gpt4 #ai
With more than 490,000 language models uploaded to the Hugging Face model repositories, how do you find the best language model for your personal or business projects? I have spent two weeks searching for the best models so you don’t have to.

In this video, you will get to know the details of the Hugging Face LLM Leaderboard: how it evaluates models objectively, its criteria and ratings, the top models in each category, and the performance of popular models such as GPT-3, GPT-4, BERT, RoBERTa, and many more.

You will also get to know all six evaluation benchmarks in this leaderboard: MMLU, ARC, HellaSwag, TruthfulQA, Winogrande, and GSM8K. And of course, I’ll let you know about a platform where you can evaluate these models on your own!

Stick around for more videos on LLMs, Natural Language Processing (NLP), Generative AI, and fun coding and machine learning projects, and follow Analytics Camp on Twitter (X): / analyticscamp

Don’t forget to subscribe and watch these related videos:

Is Mamba destroying Transformers For Good? Language Models in AI

Transformer Language Models Simplified in JUST 3 MINUTES!

Mamba Language Model Simplified In JUST 5 MINUTES!

This Is How EXACTLY Language Models Work In AI-- NO Background Needed:

Zeno AI Evaluation platform
https://zenoml.com/

https://www.youtube.com/@analyticsCam...

Key Terms and Concepts:
00:00 Intro
00:28 Hugging Face LLM Leaderboard
01:44 MMLU (Measuring Massive Multitask Language Understanding)
02:08 Hendrycks Tests in MMLU
02:36 Test of Moral Scenarios
03:55 EleutherAI
04:04 EleutherAI Language Model Evaluation Harness
04:17 AI2 Reasoning Challenge (ARC)
04:30 TruthfulQA tests
04:57 Humans VS LLM scores
05:19 GPT-3 answers to TruthfulQA test (#gpt3)
06:06 HellaSwag tests
06:58 Sample test from HellaSwag
08:31 GPT-4 results of HellaSwag tests (#gpt4)
08:41 RoBERTa, BERT (#googleai), and GPT base model results
09:06 Winogrande test
09:54 GSM8K test
11:03 Results for deciding the best LLM
12:16 Best language models for Question Answering projects
12:48 Zeno AI Evaluation platform
