Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Stanford MLSys Seminars

Streamed live on Oct 30, 2023

Episode 83 of the Stanford MLSys Seminar Series!

Training Large Language Models at Scale
Speaker: Deepak Narayanan

Abstract:
Training LLMs efficiently is challenging for a few reasons: training can require yottaFLOPs of compute, and accelerators have limited memory capacity, making it impossible to fit large models even on a multi-GPU server. Consequently, new model-parallelism methods such as tensor and pipeline parallelism have been proposed. Unfortunately, naïve use of these methods leads to scaling issues at thousands of GPUs. In this talk, I describe various systems innovations incorporated into Megatron-LM (https://github.com/nvidia/megatron-lm) that allow us to run training iterations for models with up to a trillion parameters on thousands of GPUs.
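
To make the "yottaFLOPs of compute" figure concrete, here is a rough back-of-envelope sketch using the common ~6·N·T estimate for dense-transformer training FLOPs (N parameters, T training tokens). The 450B-token budget below is an illustrative assumption, not a figure taken from the talk.

    # Back-of-envelope training compute estimate (illustrative sketch).
    # Assumes the common ~6 * N * T approximation for forward + backward
    # FLOPs of a dense transformer; the token count is an assumption.

    def training_flops(num_params: float, num_tokens: float) -> float:
        """Approximate total training FLOPs for a dense transformer."""
        return 6.0 * num_params * num_tokens

    params = 1e12   # 1 trillion parameters (upper end discussed in the talk)
    tokens = 450e9  # assumed token budget, for illustration only
    flops = training_flops(params, tokens)
    print(f"~{flops:.1e} FLOPs, i.e. ~{flops / 1e24:.1f} yottaFLOPs")
    # -> ~2.7e+24 FLOPs, i.e. ~2.7 yottaFLOPs

Even at this scale, a single modern GPU sustaining on the order of 10^14 FLOP/s would need decades, which is why the talk focuses on parallelizing training across thousands of GPUs.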

Bio:
Deepak is a Senior Applied Deep Learning Research Scientist in the ADLR group at NVIDIA, where he builds software systems to train and serve LLMs more efficiently. He received his Ph.D. in Computer Science from Stanford in September 2021, where he was advised by Prof. Matei Zaharia.

--

Stanford MLSys Seminar hosts: Simran Arora, Dan Fu

Twitter:
  @simran_s_arora
  @realdanfu

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!for...

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
