How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18
LangChain

 Published On May 13, 2024

OpenAI recently released GPT-4o, which reports significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, what latency benefit can users expect to gain, and are there any material differences in app performance after switching to the new GPT-4o model?

Decisions like this are often limited by the lack of quality evaluations! Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to the LangChain documentation. We show how regression testing in the LangSmith UI lets you quickly pinpoint examples where GPT-4o shows improvements or regressions relative to your current app.
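At its core, the regression view pairs per-example scores across two experiments and buckets each example as improved, regressed, or unchanged. A minimal, self-contained sketch of that comparison (the function name and scores are hypothetical; in practice the per-example scores come from your LangSmith experiments):

```python
# Hypothetical sketch: compare per-example correctness scores from two
# experiment runs (e.g., a GPT-4-Turbo baseline vs. a GPT-4o candidate)
# to flag improvements and regressions, mirroring what the LangSmith
# regression-testing UI surfaces. All names and data are illustrative.

def compare_experiments(baseline: dict, candidate: dict) -> dict:
    """Bucket examples present in both runs into improved / regressed / unchanged."""
    report = {"improved": [], "regressed": [], "unchanged": []}
    for example_id, base_score in baseline.items():
        if example_id not in candidate:
            continue  # only compare examples evaluated in both runs
        new_score = candidate[example_id]
        if new_score > base_score:
            report["improved"].append(example_id)
        elif new_score < base_score:
            report["regressed"].append(example_id)
        else:
            report["unchanged"].append(example_id)
    return report

# Illustrative correctness scores keyed by example ID.
gpt4_turbo = {"q1": 1.0, "q2": 0.0, "q3": 1.0}
gpt4o      = {"q1": 1.0, "q2": 1.0, "q3": 0.0}

report = compare_experiments(gpt4_turbo, gpt4o)
print(report)  # {'improved': ['q2'], 'regressed': ['q3'], 'unchanged': ['q1']}
```

The regressed bucket is the one to inspect first: it points you to the specific questions where the new model got worse, which is exactly what the LangSmith UI highlights.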

GPT-4o docs:
https://openai.com/index/hello-gpt-4o/

LangSmith regression testing UI docs:
https://docs.smith.langchain.com/old/...

RAG evaluation docs:
https://docs.smith.langchain.com/old/...

Public dataset referenced in the video:
https://smith.langchain.com/public/ea...

Cookbook referenced in the video:
https://github.com/langchain-ai/langs...

