How ChatGPT is Trained
Ari Seff

Published Jan 24, 2023

This short tutorial explains the training objectives used to develop ChatGPT, the new chatbot language model from OpenAI.
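
As a rough illustration of the first stage covered in the video (generative pretraining, 1:33), here is a minimal sketch of the next-token prediction objective in PyTorch. The toy model and tensor shapes are hypothetical stand-ins, not OpenAI's actual training code:

```python
# Minimal sketch of the generative pretraining objective (next-token
# prediction). Toy stand-in model; not OpenAI's actual training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),  # stand-in for a transformer
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a toy token sequence
logits = model(tokens)                          # (1, 16, vocab_size)

# Each position is trained to predict the *next* token, so shift by one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients for one maximum-likelihood step
```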

Timestamps:
0:00 - Non-intro
0:24 - Training overview
1:33 - Generative pretraining (the raw language model)
4:18 - The alignment problem
6:26 - Supervised fine-tuning
7:19 - Limitations of supervision: distributional shift
8:50 - Reward learning based on preferences (loss sketched in code after this list)
10:39 - Reinforcement learning from human feedback (PPO objective sketched below)
13:02 - Room for improvement
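
For the reward-learning step at 8:50, the papers linked below (Christiano et al., 2017; Stiennon et al., 2020; Ouyang et al., 2022) train a reward model on pairwise human preferences between responses. A minimal sketch of that pairwise (Bradley-Terry style) loss, with a hypothetical reward_model and toy inputs:

```python
# Sketch of the pairwise preference loss used to train a reward model,
# following the Bradley-Terry form in the papers linked below.
# Hypothetical reward_model; not OpenAI's actual code.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    r_chosen = reward_model(chosen)      # scalar score per response
    r_rejected = reward_model(rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: "responses" are just feature vectors here.
reward_model = torch.nn.Linear(8, 1)
chosen, rejected = torch.randn(4, 8), torch.randn(4, 8)
loss = preference_loss(reward_model, chosen, rejected)
loss.backward()
```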
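
The RLHF step at 10:39 then fine-tunes the language model against this learned reward with PPO (Schulman et al., 2017, linked below). A sketch of PPO's clipped surrogate objective, again with made-up tensors; a real RLHF setup also adds a KL penalty toward the supervised fine-tuned model, as in Ouyang et al., 2022:

```python
# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Toy tensors; not OpenAI's actual RLHF code.
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # pi_new / pi_old
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    # Maximize the surrogate => minimize its negation.
    return -torch.min(ratio * advantage, clipped * advantage).mean()

logp_old = torch.randn(4)
logp_new = (logp_old + 0.1 * torch.randn(4)).requires_grad_(True)
advantage = torch.randn(4)  # e.g., reward-model score minus a baseline
loss = ppo_clip_loss(logp_new, logp_old, advantage)
loss.backward()
```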

ChatGPT: https://openai.com/blog/chatgpt

Relevant papers for learning more:
InstructGPT: Ouyang et al., 2022 - https://arxiv.org/abs/2203.02155
GPT-3: Brown et al., 2020 - https://arxiv.org/abs/2005.14165
PaLM: Chowdhery et al., 2022 - https://arxiv.org/abs/2204.02311
Efficient reductions for imitation learning: Ross & Bagnell, 2010 - https://proceedings.mlr.press/v9/ross...
Deep reinforcement learning from human preferences: Christiano et al., 2017 - https://arxiv.org/abs/1706.03741
Learning to summarize from human feedback: Stiennon et al., 2020 - https://arxiv.org/abs/2009.01325
Scaling laws for reward model overoptimization: Gao et al., 2022 - https://arxiv.org/abs/2210.10760
Proximal policy optimization algorithms: Schulman et al., 2017 - https://arxiv.org/abs/1707.06347

Special thanks to Elmira Amirloo for feedback on this video.

Links:
YouTube: /ariseffai
Twitter: @ari_seff
Homepage: https://www.ariseff.com

If you'd like to help support the channel (completely optional), you can donate a cup of coffee via the following:
Venmo: https://venmo.com/ariseff
PayPal: https://www.paypal.me/ariseff
