Data Versioning in Generative AI: A Pathway to Cost-effective ML
MLOps World: Machine Learning in Production

Published on May 16, 2024

Speaker: Dmitry Petrov, CEO, DVC

We have been building DVC for five years, and we have seen how data versioning helps teams. Generative AI workflows differ from traditional ML workflows, and versioning practices must evolve to support them. This new era thrives on vast amounts of unstructured data: images, video, audio, MRI scans, document scans, and plain-text dialogues. This data often scales to billions of objects. Combined with the resource-intensive task of scoring models on expensive GPU hardware or through model APIs such as ChatGPT, it creates unique challenges for data management and versioning.

In this talk, we will delve into data versioning in the context of generative AI. Our focus will be on strategies that assist businesses in minimizing their processing time and the volume of API calls to external models like ChatGPT, resulting in substantial cost savings. Furthermore, we will discuss effective methodologies for sharing datasets amongst ML researchers to promote seamless collaboration.

Lastly, we will examine the pivotal changes generative AI has brought to data versioning in the past year, including the versioning of annotations and embeddings. Together, these insights will give attendees an in-depth understanding of the rapidly evolving data management landscape in the era of generative AI.
1. How data management is different in a Generative AI environment compared to traditional ML
2. How to save cost on compute and API calls using data versioning
3. Dataset sharing in the team as a way to improve collaboration
4. How to efficiently version annotation, embeddings, and auto-labels together with data
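
To make the cost-saving idea in point 2 concrete, here is a minimal Python sketch of one possible approach: keying model outputs by a hash of the object's content plus the model version, so that a new dataset version only pays for objects that have not been scored before. This is an illustration, not DVC's API and not necessarily the method presented in the talk. The names `ScoreCache`, `content_key`, and `fake_model` are hypothetical, and `fake_model` stands in for an expensive GPU job or external API call.

```python
import hashlib
from typing import Callable, Dict, List


def content_key(data: bytes, model_version: str) -> str:
    """Build a cache key from the model version and the object's bytes."""
    h = hashlib.sha256()
    h.update(model_version.encode("utf-8"))
    h.update(b"\0")  # separator between the version and the content
    h.update(data)
    return h.hexdigest()


class ScoreCache:
    """Hypothetical in-memory cache of model outputs keyed by content hash."""

    def __init__(self, score_fn: Callable[[bytes], str], model_version: str):
        self.score_fn = score_fn
        self.model_version = model_version
        self.store: Dict[str, str] = {}
        self.calls = 0  # number of real (expensive) model calls made

    def score(self, data: bytes) -> str:
        key = content_key(data, self.model_version)
        if key not in self.store:
            # Only unseen content reaches the expensive model.
            self.calls += 1
            self.store[key] = self.score_fn(data)
        return self.store[key]


def fake_model(data: bytes) -> str:
    """Stand-in for a costly model call (assumption for illustration)."""
    return f"label-{len(data)}"


cache = ScoreCache(fake_model, model_version="v1")

# Version 2 of the dataset adds one object to version 1.
dataset_v1: List[bytes] = [b"cat photo", b"dog photo"]
dataset_v2: List[bytes] = dataset_v1 + [b"bird photo!"]

labels_v1 = [cache.score(obj) for obj in dataset_v1]
labels_v2 = [cache.score(obj) for obj in dataset_v2]

# Scoring both versions costs three model calls instead of five.
print(cache.calls)
```

Including the model version in the key means that switching models invalidates the cached results, while unchanged objects scored by the same model are never paid for twice. A real setup would persist the store alongside the versioned data rather than keep it in memory.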
