How to Make Your Images Talk: The AI that Captions Any Image
Pritish Mishra

Published on Sep 28, 2022

HuggingFace Web App: https://bit.ly/3SDyOWt

Image captioning is the task of taking an image and generating a caption that accurately describes the scene. It is difficult for neural networks because it requires both understanding the image (computer vision) and producing fluent text (natural language processing).

In this video, I walk through my complete approach to this problem. For visual understanding, we will use Inception V3. For natural language understanding, we will first use an RNN, but it fails to generalize well to unseen data, so we will switch to a Transformer. And as you will see, the Transformer nails it!

Source Code:
Image Captioning with RNN: https://bit.ly/3SBPoGi
Image Captioning with Transformer: https://bit.ly/3HToJRC
Image Captioning (on MS COCO Dataset): https://bit.ly/40t2da9

🔗 Social Media 🔗
📱 Twitter: https://bit.ly/3aJWAeF
📝 LinkedIn: https://bit.ly/3aQGGiL
📂 GitHub: https://bit.ly/2QGLVYV

Timestamps:
00:00 Introduction
00:16 Quick overview of Image Captioning
01:08 The Model Architecture (RNN)
01:56 Getting the Image feature vectors using Inception V3
04:39 What is the Attention Mechanism doing?
05:10 Choosing the Dataset
05:56 Data Preprocessing
06:54 Training!!!
07:13 Checking the results
09:24 Overdramatic Transformer Introduction
10:25 Why I used COCO Dataset
11:12 Side-by-side result of RNN and Transformer
11:59 Deploying model to HuggingFace so anyone can use it!
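
For readers who want a feel for the attention step listed at 04:39 before watching, here is a minimal sketch in plain Python. It is not the code from the video: it assumes scaled dot-product attention (the variant used in Transformers), and the names `attend`, `query`, and `regions` are hypothetical. The tiny `regions` list stands in for the image feature vectors a CNN such as Inception V3 would produce.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions):
    """Scaled dot-product attention over image regions (illustrative only).

    query   -- decoder state, a list of d floats (hypothetical name)
    regions -- image features, a list of n vectors with d floats each
    Returns (weights, context): one weight per region, and the
    weighted sum of the region vectors.
    """
    d = len(query)
    # Similarity of the query to each region, scaled by sqrt(d).
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
              for region in regions]
    weights = softmax(scores)
    # Context vector: regions blended according to their weights.
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(d)]
    return weights, context

# Toy stand-in for CNN features: 3 regions, 2 dimensions each.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], regions)
```

The weights sum to 1, and regions that align with the query receive a larger share. In a captioning model, the context vector would then feed the step that predicts the next word.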

#artificialintelligence #ai #deeplearning #machinelearning #transformer #transformers

Thank You,
Pritish Mishra
