Published on Apr 7, 2024
0:00 Introduction and Problem Description
0:23 LLM Reasoning Challenge
0:51 Details of the $10,000 Challenge
1:15 Internet Takes Up the Challenge
2:28 Winning Entry and Success Rates
3:25 LLMs Can Do Reasoning with Prompting
4:10 Benchmarking LLM Reasoning Capabilities
5:07 Boundaries of LLM Reasoning Unclear
5:36 Claude Opus Outperforms GPT-4
6:08 Conclusion and Future Video Plans
Original claim: / 1776096481704804789
$10k challenge: / 1776677635491344744
Challenge won: / 1777049193489572064
BigBench-Hard numbers from Claude whitepaper: https://www-cdn.anthropic.com/de8ba9b...
http://vivekhaldar.com
http://x.com/vivekhaldar