How AI models are grabbing the world's data | GZERO AI
GZERO Media

 Published On Jun 18, 2024

What does it take to build AI? Human labor, natural resources, and—most significantly—an insane amount of data! But how are tech giants like Meta and Google collecting this data? In this episode of GZERO AI, host Taylor Owen examines the scale and implications of the historic data land grab happening in the AI sector.

Subscribe to GZERO on YouTube and turn on notifications (🔔):    / @gzeromedia  
Sign up for GZERO Daily (free newsletter on global politics): https://rebrand.ly/gzeronewsletter

In this episode of GZERO AI, Taylor Owen, host of the Machines Like Us podcast, explores where the industry's training data is actually coming from. According to researcher Kate Crawford, AI is the largest superstructure our species has ever built, requiring immense human labor, natural resources, and staggering amounts of data. So how are tech giants like Meta and Google amassing it?

So AI researcher Kate Crawford recently told me that she thinks AI is the largest superstructure that our species has ever built. This is because of the enormous amount of human labor that goes into building AI, the physical infrastructure that's needed for the compute of these AI systems, and the natural resources, the energy, and the water that go into this entire infrastructure. And of course, because of the insane amount of data that is needed to build our frontier models. It's increasingly clear that we're in the middle of a historic land grab for this data, essentially for all of the data that has ever been created by humanity. So where is all this data coming from, and how are these companies getting access to it? Well, first, they're clearly scraping the public internet. It's safe to say that if anything you've done has been posted to the internet publicly, it's inside the training data of at least one of these models.
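The scraping described here is, at its core, automated fetching and text extraction at enormous scale. As a rough illustration only (not any company's actual pipeline), here is a minimal Python sketch of the text-extraction step, using only the standard library; the HTML snippet is invented for the example:

```python
from html.parser import HTMLParser

# Illustrative sketch: pull the visible text out of an HTML page,
# the way a crawler might before adding it to a training corpus.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_skip = False   # are we inside <script> or <style>?
        self.chunks = []       # collected visible text fragments

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_skip = False

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if not self.in_skip and data.strip():
            self.chunks.append(data.strip())

# A made-up public page, standing in for anything posted online.
sample_html = """
<html><head><style>body{color:red}</style></head>
<body><h1>A public post</h1>
<p>Anything posted publicly can end up in a training corpus.</p>
</body></html>
"""

parser = TextExtractor()
parser.feed(sample_html)
corpus_text = " ".join(parser.chunks)
print(corpus_text)
```

In a real crawler this step would sit behind a fetcher, deduplication, and filtering layers; the point is only that once text is publicly reachable, extracting it is trivial.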

But it's also probably the case that this scraping includes a large amount of copyrighted data, or data that isn't necessarily publicly available. They're probably also getting behind paywalls, as we'll find out soon enough as the New York Times lawsuit against OpenAI works its way through the system. And they're scraping each other's data. According to the New York Times, Google found out that OpenAI was scraping YouTube, but didn't reveal it to the public because they too were scraping all of YouTube themselves and didn't want this getting out. Second, all these companies are purchasing or licensing data. This includes news licensing, entering into agreements with publishers, purchasing data from data brokers, or buying companies outright to get access to their rich data sets. Meta, for example, was considering buying the publisher Simon & Schuster just for access to their copyrighted books in order to train their LLM.

Want to know more about global news and why it matters? Follow us on:
Instagram:   / gzeromedia  
Twitter:   / gzeromedia  
TikTok:   / gzeromedia  
Facebook:   / gzeromedia  
LinkedIn:   / gzeromedia  
Threads: https://threads.net/@gzeromedia

Subscribe to the GZERO podcast: https://podcasts.apple.com/us/podcast...

GZERO Media is a multimedia publisher providing news, insights and commentary on the events shaping our world. Our properties include GZERO World with Ian Bremmer, our newsletter GZERO Daily, Puppet Regime, the GZERO World Podcast, In 60 Seconds and GZEROMedia.com

#GZEROAI #AI #Data
