GGUF Quantization of LLMs with llama.cpp
AI Bites

Published Mar 22, 2024

Would you like to run LLMs on your laptop, or on tiny devices like mobile phones and watches? If so, you will need to quantize them. llama.cpp is an open-source library written in C and C++ that lets us quantize a given model and run LLM inference without a GPU.
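To make the workflow concrete, here is a minimal C++ sketch of the quantization step against llama.cpp's C API. This is not the video's exact code: the file names are hypothetical placeholders, and the exact API can shift between llama.cpp revisions.

```cpp
// Minimal sketch (assumed file names, not the video's exact code): quantize an
// f16 GGUF model down to 4-bit Q4_K_M via llama.cpp's C API. Build and link
// against the llama.cpp repo, the same way its bundled examples are built.
#include "llama.h"
#include <cstdio>

int main() {
    llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype   = LLAMA_FTYPE_MOSTLY_Q4_K_M; // 4-bit K-quant, medium quality
    params.nthread = 8;                         // CPU threads to use

    // Returns 0 on success.
    if (llama_model_quantize("gemma-2b-finetuned-f16.gguf",
                             "gemma-2b-finetuned-q4_k_m.gguf",
                             &params) != 0) {
        fprintf(stderr, "quantization failed\n");
        return 1;
    }
    printf("wrote gemma-2b-finetuned-q4_k_m.gguf\n");
    return 0;
}
```

In practice, the quantize tool that ships with llama.cpp wraps this same llama_model_quantize call, so you rarely need to write this yourself.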
In this video, I demonstrate how to quantize a fine-tuned LLM on a MacBook and then run it on the same MacBook for inference. I quantize the fine-tuned Gemma 2B model from my previous tutorial, but you can follow the same steps to quantize any other fine-tuned LLM of your choice.
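After quantization, inference on the same machine can look roughly like the greedy-decoding sketch below. Again, this is illustrative rather than the video's code: the model path and prompt are placeholders, and the C API calls (tokenization, batching, sampling) follow llama.cpp headers from around early 2024 and may differ in newer revisions.

```cpp
// Illustrative sketch, not the video's code. Path/prompt are placeholders and
// the C API follows llama.cpp headers from around early 2024.
#include "llama.h"
#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    // Load the quantized GGUF model produced in the previous step.
    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(
        "gemma-2b-finetuned-q4_k_m.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    // Tokenize the prompt (true = prepend BOS, false = no special-token parsing).
    std::string prompt = "What is quantization?";
    std::vector<llama_token> tokens(prompt.size() + 8);
    const int n = llama_tokenize(model, prompt.c_str(), (int)prompt.size(),
                                 tokens.data(), (int)tokens.size(), true, false);

    // Evaluate the whole prompt as one batch at positions 0..n-1.
    llama_decode(ctx, llama_batch_get_one(tokens.data(), n, 0, 0));
    int n_past = n;     // next position in the KV cache
    int i_last = n - 1; // logits row of the last evaluated token

    for (int i = 0; i < 64; i++) {
        // Turn the last token's logits into a candidate list.
        const float * logits  = llama_get_logits_ith(ctx, i_last);
        const int     n_vocab = llama_n_vocab(model);
        std::vector<llama_token_data> cand(n_vocab);
        for (llama_token t = 0; t < n_vocab; t++) {
            cand[t] = { t, logits[t], 0.0f };
        }
        llama_token_data_array arr = { cand.data(), cand.size(), false };

        // Greedy decoding: always take the highest-scoring token.
        llama_token tok = llama_sample_token_greedy(ctx, &arr);
        if (tok == llama_token_eos(model)) break;

        char buf[128];
        const int len = llama_token_to_piece(model, tok, buf, sizeof(buf));
        printf("%.*s", len, buf);
        fflush(stdout);

        // Feed the sampled token back in; follow-up batches hold one token.
        llama_decode(ctx, llama_batch_get_one(&tok, 1, n_past++, 0));
        i_last = 0;
    }
    printf("\n");

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The main example binary bundled with llama.cpp performs this same load, tokenize, decode, sample loop from the command line.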

MY KEY LINKS
YouTube: https://www.youtube.com/@aibites
Twitter: https://twitter.com/ai_bites
Patreon: https://www.patreon.com/ai_bites
GitHub: https://github.com/ai-bites

WHO AM I?
I am a Machine Learning researcher/practitioner who has seen the grind of both academia and start-ups. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was taking off. Life has changed for the better ever since.

#machinelearning #deeplearning #aibites
