How 1-Bit LLMs Work
Oxen

Published on Apr 8, 2024

We dive into the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits," a technique for representing weights with the integers -1, 0, or 1 instead of floats.
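As a quick reference, here is a minimal NumPy sketch of the absmean ternary quantization described in the paper: scale each weight matrix by the mean of its absolute values, then round and clip every entry to {-1, 0, +1}. The function name and the example usage below are illustrative, not code from the paper or from Oxen AI.

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-5):
    """Quantize a float weight matrix to {-1, 0, +1} using an absmean scale,
    following the paper's RoundClip(W / (gamma + eps), -1, 1) with gamma = mean(|W|)."""
    gamma = np.mean(np.abs(W))                       # per-tensor scale gamma = mean(|W|)
    W_scaled = W / (gamma + eps)                     # scale so values cluster around [-1, 1]
    W_ternary = np.clip(np.round(W_scaled), -1, 1)   # round, then clip to the ternary set
    return W_ternary.astype(np.int8), gamma          # keep gamma to rescale outputs later

# Example: quantize a small random weight matrix
W = np.random.randn(4, 4).astype(np.float32)
W_q, gamma = absmean_ternary_quantize(W)
print(W_q)     # entries are only -1, 0, or 1
print(gamma)   # scale factor used to undo the scaling at inference time
```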

--

Get Oxen AI 🐂 https://oxen.ai/

Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.

--

Paper 📜 https://arxiv.org/abs/2402.17764

Links + Notes 📝 https://www.oxen.ai/blog/arxiv-dives-...

Join Arxiv Dives 🤿 https://oxen.ai/community

Discord 🗿   / discord  

--

Chapters
0:00 Intro
2:28 Why Called BitNet 1.58
3:08 Why Should I Care?
4:18 Math
6:08 Quantization Without BitNet
8:50 BitLinear Layer
11:30 What About Backpropagation?
13:42 How Many Gainz?
15:03 Bessie the BitNet
16:15 Testing the Base Model
21:20 Fine-Tuning for QA/Instructions
33:03 The Code
33:25 Diving into the Quantization
43:30 Good News and Bad News
44:22 What’s Next?
44:58 Takeaways

