How ChatGPT is Trained
Ari Seff

Published Jan 24, 2023

This short tutorial explains the training objectives used to develop ChatGPT, the new chatbot language model from OpenAI.
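
As a rough illustration of the first stage covered in the video (generative pretraining, 1:33), here is a minimal sketch of the next-token prediction objective in PyTorch. The toy model and tensor shapes are hypothetical stand-ins, not OpenAI's actual training code:

```python
# Minimal sketch of the generative pretraining objective (next-token
# prediction). Toy stand-in model; not OpenAI's actual training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),  # stand-in for a transformer
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a toy token sequence
logits = model(tokens)                          # (1, 16, vocab_size)

# Each position is trained to predict the *next* token, so shift by one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients for one maximum-likelihood step
```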

Timestamps:
0:00 - Non-intro
0:24 - Training overview
1:33 - Generative pretraining (the raw language model)
4:18 - The alignment problem
6:26 - Supervised fine-tuning
7:19 - Limitations of supervision: distributional shift
8:50 - Reward learning based on preferences (loss sketched in code after this list)
10:39 - Reinforcement learning from human feedback (PPO objective sketched below)
13:02 - Room for improvement
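
For the reward-learning step at 8:50, the papers linked below (Christiano et al., 2017; Stiennon et al., 2020; Ouyang et al., 2022) train a reward model on pairwise human preferences between responses. A minimal sketch of that pairwise (Bradley-Terry style) loss, with a hypothetical reward_model and toy inputs:

```python
# Sketch of the pairwise preference loss used to train a reward model,
# following the Bradley-Terry form in the papers linked below.
# Hypothetical reward_model; not OpenAI's actual code.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    r_chosen = reward_model(chosen)      # scalar score per response
    r_rejected = reward_model(rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: "responses" are just feature vectors here.
reward_model = torch.nn.Linear(8, 1)
chosen, rejected = torch.randn(4, 8), torch.randn(4, 8)
loss = preference_loss(reward_model, chosen, rejected)
loss.backward()
```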
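
The RLHF step at 10:39 then fine-tunes the language model against this learned reward with PPO (Schulman et al., 2017, linked below). A sketch of PPO's clipped surrogate objective, again with made-up tensors; a real RLHF setup also adds a KL penalty toward the supervised fine-tuned model, as in Ouyang et al., 2022:

```python
# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Toy tensors; not OpenAI's actual RLHF code.
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # pi_new / pi_old
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    # Maximize the surrogate => minimize its negation.
    return -torch.min(ratio * advantage, clipped * advantage).mean()

logp_old = torch.randn(4)
logp_new = (logp_old + 0.1 * torch.randn(4)).requires_grad_(True)
advantage = torch.randn(4)  # e.g., reward-model score minus a baseline
loss = ppo_clip_loss(logp_new, logp_old, advantage)
loss.backward()
```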

ChatGPT: https://openai.com/blog/chatgpt

Relevant papers for learning more:
InstructGPT: Ouyang et al., 2022 - https://arxiv.org/abs/2203.02155
GPT-3: Brown et al., 2020 - https://arxiv.org/abs/2005.14165
PaLM: Chowdhery et al., 2022 - https://arxiv.org/abs/2204.02311
Efficient reductions for imitation learning: Ross & Bagnell, 2010 - https://proceedings.mlr.press/v9/ross...
Deep reinforcement learning from human preferences: Christiano et al., 2017 - https://arxiv.org/abs/1706.03741
Learning to summarize from human feedback: Stiennon et al., 2020 - https://arxiv.org/abs/2009.01325
Scaling laws for reward model overoptimization: Gao et al., 2022 - https://arxiv.org/abs/2210.10760
Proximal policy optimization algorithms: Schulman et al., 2017 - https://arxiv.org/abs/1707.06347

Special thanks to Elmira Amirloo for feedback on this video.

Links:
YouTube: /ariseffai
Twitter: @ari_seff
Homepage: https://www.ariseff.com

If you'd like to help support the channel (completely optional), you can donate a cup of coffee via the following:
Venmo: https://venmo.com/ariseff
PayPal: https://www.paypal.me/ariseff
