Implement and Train ViT From Scratch for Image Recognition - PyTorch
Uygar Kurt Uygar Kurt
1.58K subscribers
14,335 views
652

 Published On Sep 29, 2023

We're going to implement ViT (Vision Transformer) and train our implementation on the MNIST dataset to classify images! Video where I explain the ViT paper and GitHub below ↓

Want to support the channel? Hit that like button and subscribe!

ViT (Vision Transformer) - An Image Is Worth 16x16 Words (Paper Explained)
   • ViT (Vision Transformer) - An Image I...  

GitHub Link of the Code
https://github.com/uygarkurt/ViT-PyTorch

Notebook
https://github.com/uygarkurt/ViT-PyTo...

ViT (Vision Transformer) is introduced in the paper: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
https://arxiv.org/abs/2010.11929

What should I implement next? Let me know in the comments!

00:00:00 Introduction
00:00:09 Paper Overview
00:02:41 Imports and Hyperparameter Definitions
00:11:09 Patch Embedding Implementation
00:19:36 ViT Implementation
00:29:00 Dataset Preparation
00:51:16 Train Loop
01:09:27 Prediction Loop
01:12:05 Classifying Our Own Images

show more

Share/Embed