Owain Evans - AI Situational Awareness, LLM Out-of-Context Reasoning
The Inside View

Published on Aug 23, 2024

Owain Evans is an AI alignment researcher and research associate at the Center for Human-Compatible AI (CHAI) at UC Berkeley, and now leads a new AI safety research group.

In this episode we discuss two of his recent papers, "Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs" (https://arxiv.org/abs/2407.04694) and "Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data" (https://arxiv.org/abs/2406.14546), alongside some Twitter questions.

Patreon: / theinsideview

Manifund: https://manifund.org/projects/making-...

Ask questions: / michaeltrazzi

Owain Evans: / owainevans_uk

OUTLINE

00:00:00 Intro
00:01:12 Owain's Agenda
00:02:25 Defining Situational Awareness
00:03:30 Safety Motivation
00:04:58 Why Release A Dataset
00:06:17 Risks From Releasing It
00:10:03 Claude 3 on the Longform Task
00:14:57 Needle in a Haystack
00:19:23 Situating Prompt
00:23:08 Deceptive Alignment Precursor
00:30:12 Distribution Over Two Random Words
00:34:36 Discontinuing a 01 Sequence
00:40:20 GPT-4 Base On the Longform Task
00:46:44 Human-AI Data in GPT-4's Pretraining
00:49:25 Are Longform Task Questions Unusual
00:51:48 When Will Situational Awareness Saturate
00:53:36 Safety And Governance Implications Of Saturation
00:56:17 Evaluation Implications Of Saturation
00:57:40 Follow-up Work On The Situational Awareness Dataset
01:00:04 Would Removing Chain-Of-Thought Work?
01:02:18 Out-of-Context Reasoning: the "Connecting the Dots" paper
01:05:15 Experimental Setup
01:07:46 Concrete Function Example: 3x + 1
01:11:23 Isn't It Just A Simple Mapping?
01:17:20 Safety Motivation
01:22:40 Out-Of-Context Reasoning Results Were Surprising
01:24:51 The Biased Coin Task
01:27:00 Will Out-of-Context Reasoning Scale
01:32:50 Checking If In-Context Learning Works
01:34:33 Mixture-Of-Functions
01:38:24 Inferring New Architectures From arXiv
01:43:52 Twitter Questions
01:44:27 How Does Owain Come Up With Ideas?
01:49:44 How Did Owain's Background Influence His Research Style And Taste?
01:52:06 Should AI Alignment Researchers Aim For Publication?
01:57:01 How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?
01:58:52 Could Owain's Research Accelerate Capabilities?
02:08:44 How Was Owain's Work Received?
02:13:23 Last Message

