Published On Sep 28, 2022
HuggingFace Web App: https://bit.ly/3SDyOWt
Image captioning is the process of taking an image and generating a caption that accurately describes the scene. This is a difficult task for neural networks because it combines two problems: understanding the visual content of an image and generating fluent natural language.
In this video, I discuss my complete approach to this problem. For visual understanding, we will use Inception V3 as a feature extractor. For language generation, we will first use an RNN, but it will fail to generalize well on unseen data, so we will shift to a Transformer. And as you will see, the Transformer will nail it!
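The first step described above, using Inception V3 to turn each image into a feature vector, can be sketched as follows. This is a minimal illustration with TensorFlow/Keras, not the exact code from the video; `weights=None` is used here only to keep the sketch self-contained (in practice you would load `weights="imagenet"`).

```python
import numpy as np
import tensorflow as tf

# Inception V3 as a feature extractor: drop the classification head
# (include_top=False) and average-pool the final convolutional map.
extractor = tf.keras.applications.InceptionV3(
    include_top=False, pooling="avg", weights=None
)

# Inception V3 expects 299x299 RGB inputs scaled to [-1, 1].
batch = np.random.uniform(-1, 1, size=(2, 299, 299, 3)).astype("float32")

features = extractor.predict(batch, verbose=0)
print(features.shape)  # each image becomes a 2048-dim feature vector
```

These fixed-length feature vectors are what the RNN (and later the Transformer) decoder conditions on when generating the caption, one word at a time.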
Source Code:
Image Captioning with RNN: https://bit.ly/3SBPoGi
Image Captioning with Transformer: https://bit.ly/3HToJRC
Image Captioning (on MS COCO Dataset): https://bit.ly/40t2da9
🔗 Social Media 🔗
📱 Twitter: https://bit.ly/3aJWAeF
📝 LinkedIn: https://bit.ly/3aQGGiL
📂 GitHub: https://bit.ly/2QGLVYV
Timestamps:
00:00 Introduction
00:16 Quick overview of Image Captioning
01:08 The Model Architecture (RNN)
01:56 Getting the Image feature vectors using Inception V3
04:39 What is the Attention Mechanism doing?
05:10 Choosing the Dataset
05:56 Data Preprocessing
06:54 Training!!!
07:13 Checking the results
09:24 Overdramatic Transformer Introduction
10:25 Why I used the COCO Dataset
11:12 Side-by-side result of RNN and Transformer
11:59 Deploying model to HuggingFace so anyone can use it!
#artificialintelligence #ai #deeplearning #machinelearning #transformer #transformers
Thank You,
Pritish Mishra