Self-Host and Deploy Local LLAMA-3 with NIMs
Prompt Engineering

Published on Jun 30, 2024

In this video, I walk you through deploying Llama models with NVIDIA NIM. NVIDIA NIM packages AI models as optimized microservices, offering up to a 3x improvement in inference performance. I demonstrate how to set up an NVIDIA LaunchPad instance, deploy the Llama 3 8B Instruct model, and stress test the endpoint to measure its throughput. I also show how to use the OpenAI-compatible API server that NVIDIA NIM exposes.
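Once a NIM container is running, it serves OpenAI-style endpoints, so any OpenAI-compatible client can talk to it. Below is a minimal sketch of building a chat-completions request for a locally deployed NIM. The base URL, port 8000, and the model id "meta/llama3-8b-instruct" are assumptions based on common NIM defaults — check your own `docker run` logs for the actual values.

```python
import json

# Assumed local endpoint for the NIM container (verify against your deployment).
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt, model="meta/llama3-8b-instruct", max_tokens=256):
    """Return the JSON body for a POST to {BASE_URL}/chat/completions
    (OpenAI chat-completions request shape)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_chat_request("Explain NVIDIA NIM in one sentence.")
print(json.dumps(body, indent=2))

# To actually send it (requires the NIM container to be up):
#   import requests
#   r = requests.post(f"{BASE_URL}/chat/completions", json=body, timeout=60)
#   print(r.json()["choices"][0]["message"]["content"])
```

Because the server speaks the OpenAI wire format, the official `openai` Python client also works by pointing its `base_url` at the NIM endpoint.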

LINKS:
NIM: https://nvda.ws/44u5KYH
https://org.ngc.nvidia.com/setup/pers...
NIM Previous Video:    • Deploy AI Models to Production with N...  


💻 RAG Beyond Basics Course:
https://prompt-s-site.thinkific.com/c...

Let's Connect:
🦾 Discord:   / discord  
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon:   / promptengineering  
💼Consulting: https://calendly.com/engineerprompt/c...
📧 Business Contact: [email protected]
Become a Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

Sign up for the localGPT Newsletter:
https://tally.so/r/3y9bb0


TIMESTAMPS
00:00 Introduction to Deploying Large Language Models
00:13 Overview of NVIDIA NIM
01:02 Setting Up and Deploying a NIM
01:51 Accessing and Monitoring the GPU
03:39 Generating API Keys and Running Docker
05:36 Interacting with the Deployed Model
07:16 Stress Testing the API Endpoint
09:53 Using OpenAI Compatible API with NVIDIA NIM
12:32 Conclusion and Next Steps

All Interesting Videos:
Everything LangChain:    • LangChain  

Everything LLM:    • Large Language Models  

Everything Midjourney:    • MidJourney Tutorials  

AI Image Generation:    • AI Image Generation Tutorials  

