Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Stanford MLSys Seminars

Streamed live on Oct 30, 2023

Episode 83 of the Stanford MLSys Seminar Series!

Training Large Language Models at Scale
Speaker: Deepak Narayanan

Abstract:
Training LLMs efficiently is challenging for a few reasons: training can require yottaFLOPs of compute, and accelerators have limited memory capacity, making it impossible to fit large models even on a multi-GPU server. Consequently, new model-parallelism methods such as tensor and pipeline parallelism have been proposed. Unfortunately, naïve use of these methods leads to scaling issues at thousands of GPUs. In this talk, I describe various systems innovations incorporated into Megatron-LM (https://github.com/nvidia/megatron-lm) that allow us to run training iterations for models with up to a trillion parameters on thousands of GPUs.
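
To make the "yottaFLOPs of compute" figure concrete, here is a rough back-of-envelope sketch using the common ~6·N·T estimate for dense-transformer training FLOPs (N parameters, T training tokens). The 450B-token budget below is an illustrative assumption, not a figure taken from the talk.

    # Back-of-envelope training compute estimate (illustrative sketch).
    # Assumes the common ~6 * N * T approximation for forward + backward
    # FLOPs of a dense transformer; the token count is an assumption.

    def training_flops(num_params: float, num_tokens: float) -> float:
        """Approximate total training FLOPs for a dense transformer."""
        return 6.0 * num_params * num_tokens

    params = 1e12   # 1 trillion parameters (upper end discussed in the talk)
    tokens = 450e9  # assumed token budget, for illustration only
    flops = training_flops(params, tokens)
    print(f"~{flops:.1e} FLOPs, i.e. ~{flops / 1e24:.1f} yottaFLOPs")
    # -> ~2.7e+24 FLOPs, i.e. ~2.7 yottaFLOPs

Even at this scale, a single modern GPU sustaining on the order of 10^14 FLOP/s would need decades, which is why the talk focuses on parallelizing training across thousands of GPUs.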

Bio:
Deepak is a Senior Applied Deep Learning Research Scientist in the ADLR group at NVIDIA, where he builds software systems to train and serve LLMs more efficiently. He received his Ph.D. in Computer Science from Stanford in September 2021, where he was advised by Prof. Matei Zaharia.

--

Stanford MLSys Seminar hosts: Simran Arora, Dan Fu

Twitter:
  @simran_s_arora
  @realdanfu

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!for...

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
