#158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk by 80,000 Hours Podcast | Podchaser

Episode from the podcast80,000 Hours Podcast

#158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk

Released Monday, 31st July 2023

Good episode? Give it some love!

#158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk

#158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk

Monday, 31st July 2023

Good episode? Give it some love!

Rate Episode

Podchaser Pro

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.

Use Ctrl + F to search

0:00

I believe you can make the entire case for

0:02

being extremely concerned about AI, assuming

0:04

that AI will never be smarter than a human. Instead,

0:07

it will be as capable as the most

0:09

capable humans. And there will be a ton

0:12

of them because unlike humans, you

0:14

can just copy them. You can use your copies

0:16

to come up with ways to make it more efficient,

0:18

just like humans do. Then you can make more copies.

0:21

And when we talk about whether AI could defeat

0:23

humanity, and I've written one blog post on whether AI

0:25

could kind of like take over the world, they don't

0:28

have to be more capable than humans. They

0:30

could be equally capable and there could be more of them

0:32

that, that could really do it. That could really be enough

0:35

that then we wouldn't be, humans wouldn't be

0:37

in control of the world anymore. So I'm

0:39

basically generally happy

0:41

to just have all discussions about AI and what

0:43

the risks are just in this world where like,

0:46

there's nothing more capable than a human, but it's pretty scary

0:48

to have a lot of those that have different values

0:50

from humans that are kind of a second advanced species.

0:52

Um, that's not to rule out that, that some

0:54

of these super intelligence concerns could be real. It's just

0:56

like, they're not always necessary and they can sideline

0:59

people.

1:01

People always love an interview with Holden,

1:04

founder of Open Philanthropy. We last

1:06

spoke with Holden in 2021 about

1:08

his theory that we're plausibly living in the

1:10

most important century of all of those that are

1:12

yet to come. And today we discuss other

1:15

things that have been preoccupying him lately, including

1:17

what the real challenges are that are raised by

1:19

rapid advances in AI. Why not

1:21

just gradually solve those problems as they come up? What

1:24

multiple different groups are able to do about it, including

1:27

listeners to this show, governments, computer

1:29

security experts, journalists, and on and on.

1:32

What various different groups are getting wrong about

1:34

AI in Holden's opinion, how we might

1:36

just succeed with artificial intelligence

1:38

by sheer luck. Holden's four

1:41

different categories of useful work

1:43

to help with AI, plus a few random

1:45

audience questions as well. At the end, we

1:48

also talk through why Holden rejects

1:50

impartiality as a core principle

1:52

of morality and his

1:54

non-realist conception of why it is that he

1:56

bothers to try to help others at all. After

1:59

the interview.

1:59

I also respond to some reactions

2:02

we got to the previous interview with Ezra

2:04

Klein. Without further ado, I bring you Holden

2:06

Karnovsky.

2:18

Today I'm again speaking with Holden Karnovsky.

2:21

In 2007, Holden co-founded the charity evaluator

2:23

GiveWell, and then in 2014, he co-founded

2:26

the foundation Open Philanthropy, which works

2:28

to find the highest impact grant opportunities, and has

2:30

so far recommended around $2 billion in grants. But

2:33

in March 2023, Holden started

2:35

a leave of absence from Open Philanthropy to instead

2:37

explore working directly on AI safety and

2:39

ways of improving outcomes from recent advances

2:42

in AI. He also blogs at cold-takes.com

2:45

about futurism, quantitative macro history,

2:47

and epistemology, though recently he's had, again,

2:49

a particular focus on AI, writing posts like

2:51

How We Could Stumble Into AI Catastrophe, Racing

2:54

Through a Minefield, The AI Deployment Problem, and

2:56

Jobs That Can Help With the Most Important Century. I

2:58

should note that Open Philanthropy is 80,000 Hours' biggest

3:00

supporter, and that Holden's wife is Daniela

3:03

Amadei, the president of the AI Lab Anthropic.

3:05

Thanks for returning to the podcast, Holden.

3:07

Thanks for having me. I hope to talk about

3:09

what you'd most like to see people do to positively

3:11

shape the development of AI, as well as your

3:13

reservations about utilitarianism. But first,

3:16

what are you working on at the moment, and why do you think it's important?

3:19

Sure. Currently on a leave of

3:21

absence from Open Philanthropy, just taking a little

3:23

time to explore different ways I might be able

3:25

to do direct work to reduce

3:27

potential risks from advanced AI. One

3:30

of the main things I've been trying to do recently, although

3:32

there's a couple things I'm looking into, is understanding

3:34

what it might look like to have AI safety

3:36

standards, by which I mean documenting

3:40

expectations that AI companies

3:42

and other labs won't build and deploy

3:44

AIs that pose too much risk to the world, as

3:47

evaluated by some sort of systematic evaluation

3:50

regime. These

3:52

expectations could be done via

3:54

self-regulation, via regulation-regulation.

3:57

There's a lot of potential interrelated pieces. So

4:00

to make this work, I think you would need ways of

4:02

evaluating when an AI system is dangerous.

4:04

That's sometimes called eval's. Then you would

4:06

need potential standards that would

4:09

basically talk about how you connect

4:11

what you're seeing from the eval's with what kind of measures

4:13

you have to take to ensure safety. Then also

4:16

to make this whole system work, you would need some

4:18

way of making the case to the general public that standards

4:20

are important, so companies are more likely

4:22

to adhere to them. And so things

4:25

I've been working on include trying to understand

4:27

what standards look like in more mature industries,

4:30

doing one case study and kind of trying to fund

4:32

some other ones there, trying to learn lessons

4:35

from what's already worked in the past elsewhere. I've

4:37

been advising groups like ArcEval's

4:40

thinking about what eval's and standards could look like.

4:43

And I've been trying to understand what pieces of the

4:45

case for standards are and are not resonating with people

4:47

so I can think about how to kind of increase

4:50

public support for this kind of thing. So

4:52

it's pretty exploratory right now, but that's been one of the

4:54

main things I've been thinking about. Yeah,

4:56

I guess there's a huge surface area on

4:58

ways one might attack this problem. Why the focus

5:01

on standards and eval's in particular?

5:03

Yeah, sure. I mean, we can get to this, but

5:05

I've kind of been thinking a lot about what

5:08

the major risks are from advanced AI and what

5:10

the major ways of reducing them are. And,

5:13

you know, I think there's kind of a few different

5:15

components that seem most promising

5:17

to me as part of,

5:19

I don't know, most stories I can tell in my head for

5:21

how we get the risks down very far.

5:24

And this is the piece of the puzzle that

5:26

to me feels

5:27

like it has a lot of potential, but there isn't much

5:29

action on it right now and that someone

5:32

with my skills can potentially play a big role in

5:34

helping to get things off the ground, helping to spur

5:36

a bit more action, getting people

5:39

to, you know,

5:40

to just like move a little faster. I'm

5:42

not sure this is like what I want to be doing for

5:44

the long run, but I think it's in this kind of nascent

5:47

phase where my kind of background

5:49

with just starting up things in very vaguely

5:51

defined areas and getting to the point where they're a little bit more

5:53

mature is maybe helpful.

5:56

Yeah. How has it been no longer leading

5:58

an organization with tons of... employees

6:00

and being able to self-direct a little bit more. You

6:02

wouldn't have been in that situation for quite some time,

6:04

I guess.

6:05

Yeah, I mean, it was somewhat gradual.

6:07

So we've been talking for for

6:10

several years now, I think, at least since 2018 or

6:12

so about succession at Open Philanthropy,

6:15

because I've always been a person who likes

6:17

to start things and likes to be in there in that nascent

6:19

phase and prefers always

6:21

to find someone else who can run an organization

6:24

for the long run. And we, you know, a couple

6:26

years ago, Alexander became co-CEO

6:29

with me and has taken on more and more duties.

6:31

So it wasn't an overnight thing. And

6:33

I'm still not completely uninvolved.

6:35

I mean, I'm

6:35

still on the board of Open Philanthropy. I'm

6:37

still meeting with people. I'm still advising, you

6:40

know, kind of similar to GiveWell. I had a very, very

6:43

gradual transition away from GiveWell and

6:45

still talk to them frequently. So

6:47

it's, you know, it's been a gradual thing. But for me, it

6:49

is an improvement. I think it's not my happy

6:52

place to be at the top of that org chart.

6:54

Yeah.

6:55

Okay, so today, we're

6:57

not going to rehash the basic arguments for worrying

6:59

about ways that AI advances

7:02

could go wrong, or ways that maybe this century

7:04

could turn out to be really unusually important.

7:06

I think, you know, AI risk, people have heard of it now.

7:09

We've done lots of episodes on it. I

7:12

guess people who wanted to hear your broader worldview

7:14

on this could go back to previous interviews, including

7:16

with you, such as episode 109, Holden Knofsky on

7:18

the most important century. In

7:20

my mind, the risks are both

7:22

pretty near term, and I think increasingly kind of apparent.

7:25

So to me, it feels like the point in this

7:27

whole story where we need to get down a bit more to brass tacks

7:30

and start debating what is to be done and

7:32

figuring out what things might might really help that we could

7:35

get moving on. That said, we should

7:37

take a minute to think about, you know, which aspect of

7:39

this broader problem are we talking about today? And which one

7:41

are you thinking about? Of course, there's risks

7:43

from misalignment. So AI models

7:46

completely flipping out and trying to take over would

7:48

be an extreme case of that. Then there's misuse, where

7:51

the models are doing what people are telling them to do,

7:53

but we wish that they weren't,

7:55

perhaps. And then I guess that there's other

7:57

risks like just speeding up history, causing a

8:00

whole lot of stuff to happen incredibly quickly, and perhaps that

8:02

leading us into disaster. Yeah, which

8:04

of the aspects of this broader problem do you

8:06

think of yourself as trying to contribute to Solve right now?

8:09

Yeah, I mean, first off to your point, I mean, I

8:11

am happy to focus on solutions, but I do think

8:13

it can be hard to have a really good conversation on solutions

8:16

without having some kind of shared understanding of the problem.

8:18

And I think while a lot of people are getting

8:20

vaguely scared about AI, I

8:23

think there's still a lot of room to have, you

8:26

know, a lot of room to disagree on what exactly

8:28

the most important aspects of the problem are, what

8:30

exactly the biggest risks are. For

8:32

me, the two you named, misalignment and misuse,

8:35

are definitely big. I would throw some others

8:37

in there too that I think are also big.

8:39

I think, you know, we may

8:41

be on the cusp of having a lot

8:43

of things work really differently about the world, and

8:46

in particular having kind of what you might

8:48

think of as new life forms, whether that's AIs

8:50

or, you know, I've written in the past on cold

8:52

takes about digital people, that if we had

8:54

the right technology, which we might be able to develop with AIs

8:57

help, we might have kind of, you know, simulations

8:59

of humans that we ought to think of as kind of humans

9:02

like us. And that could lead to a lot

9:04

of challenges, you know, just the

9:06

fact, for example, that you could have human

9:08

rights abuses happening inside a computer. It

9:11

seems like a very strange situation that

9:13

society has not really dealt with before.

9:15

And I think there's a bunch of other things like that. What

9:18

kind of world do we have when someone can just make

9:21

copies of

9:22

people or of minds and

9:24

ensure that those copies believe certain things and

9:27

defend certain ideas that I think

9:29

could challenge the way a lot of our existing institutions

9:31

work. So there's a nice piece,

9:33

Propositions About Digital Minds, that I think is a flavor

9:35

for this. So I think there's a whole bunch of things I

9:38

would point to as important. I think out

9:41

of, you know, in this category, I think if I had

9:43

to name one risk that I'm most focused

9:45

on, it's probably the misaligned AI risk. It's probably

9:47

the one about, you know, you

9:50

kind of build these very capable, very

9:52

powerful

9:52

AI systems. They're these systems that

9:55

if, for whatever reason, they were pointed at bringing

9:57

down all of human civilization, they could. And

9:59

then something about your training is kind of

10:01

sloppy or leads to unintended

10:04

consequences so that you actually do have AIs

10:06

trying to bring down civilization. I think that

10:08

is probably the biggest one, but I think there's also

10:11

a meta threat that to

10:13

me is really the unifying catastrophic risk

10:15

of AI. And so for me that I

10:18

would abbreviate as just saying like explosively

10:20

fast progress. So the central

10:23

idea of the most important century series that I wrote

10:26

is that if you get an AI with

10:28

certain properties, there's a bunch of

10:29

reasons from economic theory, from someone

10:32

from economic history. I think we're also putting

10:34

together some reasons now that you can take more from

10:36

the specifics of how AI works and

10:38

how algorithms development works to expect

10:41

that you could get a dramatic acceleration

10:43

in the rate of change and particularly in the

10:45

rate of scientific and technological advancement,

10:48

particularly in the rate of AI advancement itself so

10:50

that things move on a much faster timescale

10:53

than anyone is used to. And one of the central

10:55

things I say in the most important century series is that if

10:57

you imagine a wacky sci-fi

10:59

future, the kind of thing you would

11:02

imagine thousands of years from now for

11:04

humanity with all these wacky technologies, that

11:06

might actually be years or

11:09

months from the time when you

11:11

get in range of these super powerful AIs

11:13

that have certain properties. That to me is

11:15

the central problem. And I think all these other risks that

11:17

we're talking about, they wouldn't

11:20

have the same edge to them if it weren't for that.

11:22

So misaligned AI, if AI

11:25

systems got very gradually more powerful

11:27

and we spent a decade with systems that

11:29

were

11:29

kind of close to as capable

11:32

as humans, but not really, and then a decade with systems

11:34

that were about as capable as humans with some strengths and

11:36

weaknesses, then a decade of systems a little more

11:38

capable, I wouldn't really be that worried.

11:40

I feel like this is something we could kind of adapt to as we

11:42

went and figure out as we went along. Similarly,

11:45

with misuse, AI systems might

11:47

end up able to help develop

11:49

powerful technologies that are very scary, but that wouldn't

11:51

be as big a deal. It would be kind of a continuation

11:53

of history if this just

11:56

went in a gradual way. And my big concern

11:58

is that it's not gradual.

11:59

Maybe we're digging on that a little bit more, is exactly

12:02

how fast do I mean and why, even though I have covered

12:04

it somewhat in the past, because that to me is really

12:06

the central issue. And one of the reasons I'm so

12:08

interested in AI safety standards is because

12:11

it is kind of, no matter what risk

12:13

you're worried about, I think you hopefully

12:15

should be able to get on board with the idea that you should measure

12:18

the risk and not unwittingly

12:21

deploy AI systems that are carrying a ton of the

12:23

risk before you've at least made a deliberate, informed

12:25

decision to do so. And I think if

12:28

we do that, we can anticipate a

12:29

lot of different risks and stop them from

12:32

coming at us too fast. Too fast is the central

12:34

theme for me.

12:35

Yeah. Yeah, it's very interesting framing

12:37

to put the speed of advancement like front

12:39

and center as this is kind of the key way that this

12:41

could go off the rails and in all sorts of different directions.

12:44

So, Eliezer Jutkowski has this kind of classic story

12:46

about how you get an AI taking over the

12:48

world like remarkably quickly. And

12:50

a key part of the story as he tells it is this

12:52

sudden self-improvement loop where the AI

12:55

gets better at doing AI research and that improves itself

12:57

and then it's better at doing that again. And so you get

12:59

this recursive loop where suddenly you go from

13:01

somewhat human level intelligence to something that's very,

13:04

very, very superhuman. And

13:05

I think many people

13:06

reject that primarily because they reject the

13:09

speed idea that they think, yes, if you got

13:11

that level of advancement over a period of days, sure,

13:13

that might happen. But actually, I just don't expect

13:15

that recursive loop to be quite so

13:17

quick. And likewise, if we might

13:19

worry that AI might be used by people to

13:22

make bioweapons, but if that's something that gradually came

13:24

online over a period of decades, we probably have all kinds of responses

13:26

that we could use to try to prevent that. But if it goes

13:28

from one week to the next, then we're in

13:30

a tricky spot. Do you want to expand on that? Is there maybe

13:33

insights that come out of this speed-focused

13:35

framing of the problem that people aren't taking

13:38

quite seriously enough?

13:39

Yeah, I should first say I don't know that I'm on the

13:41

same page as Eliezer. I can't totally always tell,

13:43

but I think he is picturing probably a more

13:46

extreme and faster thing that I'm picturing and

13:48

probably for somewhat different reasons. I think

13:50

a common story in some corners of this

13:53

discourse is this idea of an AI that

13:56

it's this simple computer program and it rewrites

13:58

its own source code. And it's like the,

14:00

you know, that's where all the action is. I don't think

14:03

that's exactly the picture I have in mind, although there's

14:05

some similarities. And so that, you know, the

14:07

kind of thing I'm picturing is maybe more like a months

14:09

or years time period from getting sort

14:12

of near human level AI

14:14

systems and what that means is definitely debatable

14:16

and gets messy, but near human level

14:19

AI systems to just like very, very powerful

14:21

ones that are advancing science and technology

14:24

really fast. And then science and technology,

14:26

like at least on certain fronts that are not, that

14:29

are the less bottlenecked

14:29

fronts, and we could talk about bottlenecks in a minute,

14:32

you get like a huge jump. So I

14:34

think my view is at least somewhat more moderate

14:36

than Eliezer's and at least has somewhat different dynamics.

14:39

But I think there is, you know, both

14:42

points of view are talking about this rapid change. I

14:44

think without the rapid change,

14:46

A, things are a lot less scary generally. B,

14:48

I think it is harder to justify a lot

14:51

of the stuff that AI concerned people do to try and

14:53

get out ahead of the problem and think about things in advance,

14:55

because I think a lot of people sort of complain

14:57

with this discourse that it's really hard to know the

14:59

future and all this stuff we're talking about what future

15:02

AI systems are going to do, what we have to do about

15:04

it today. It's very hard to get that right. It's

15:06

very hard to anticipate what things will be like in

15:08

an unfamiliar future. And I think when people

15:10

complain about that stuff, I'm just like very sympathetic.

15:12

I think that's like,

15:13

right. And if I thought that

15:15

we had the option to adapt to everything

15:18

as it happens, I think I would in many

15:20

ways be tempted to just work on other problems

15:22

and then kind of in fact adapt to things as they happen

15:24

and see what's happening and see what's most needed. And

15:27

so I think a lot of the case for planning

15:29

things out in advance, trying to tell stories of

15:32

what might happen, trying to figure out

15:34

what kind of regime we're going to want and put the pieces

15:36

in place today, trying to figure out what kind of research

15:38

challenge is going to be hard and doing today. I think a

15:40

lot of the case for that stuff being so important does

15:43

rely on this theory that things could move

15:45

a lot faster than anyone is expecting. I

15:48

am in fact very sympathetic to people who would rather

15:50

just adapt to things as they go. They're

15:52

the right way to do things. And I think

15:55

many attempts to anticipate future problems

15:57

or things I'm just not that interested in because

15:59

of this issue.

15:59

But I think AI is a place where we have to

16:02

take the explosive progress things seriously

16:04

enough that we should be doing our best to prepare for it. Yeah,

16:07

I guess if you have this explosive growth, then

16:10

the very strange things that we might be

16:12

trying to prepare for might be happening in 2027 or incredibly soon. Something

16:16

like that, yeah. Yeah, it's imaginable, right?

16:18

And it's all extremely uncertain

16:20

because we don't know. In my

16:22

head, a lot of it is like there's a set

16:24

of properties that an AI system could have, roughly

16:27

being able to do roughly everything humans are

16:29

able to do to advance science and technology or

16:31

at least able to advance AI research. We

16:33

don't know when we'll have that. And so it's like, you

16:35

know, one possibility

16:36

is we're like 30 years away from that.

16:38

But once we get near that, things will

16:41

move incredibly fast. And that's a world

16:43

we could be in. We could also be in a world where we're only a few years from

16:45

that, and then everything's going to get much crazier than anyone

16:47

thinks, much faster than anyone thinks. Yeah,

16:49

I guess one narrative is that

16:51

it's going to be exceedingly difficult to

16:54

align any artificial intelligence because you

16:56

know, you have to solve these 10 technical problems

16:58

that we've almost gotten no traction on so far.

17:01

So just from, you know, it gets decades

17:03

or centuries in order to fix them.

17:05

On this speed focused narrative,

17:07

it actually seems a little bit more positive because you

17:09

might be saying,

17:10

it might turn out that from a technical standpoint,

17:12

this isn't that challenging. The problem will be

17:15

that things are going to run so quickly that we might only

17:17

have a few months to figure out how, like

17:20

what solution we're choosing and actually try

17:22

to apply it in practice. But of course, in

17:24

as much as we just need to slow down, that

17:26

is something that in theory, at least people could agree

17:28

and actually just and try to coordinate

17:31

in order to do. Do you think that that is going to be a

17:33

part of the package that we ideally just want to

17:35

coordinate people as much as possible to make this

17:37

as gradual as feasible? Well,

17:39

these are separate points. So I think

17:41

you could believe in the speed and also

17:43

believe the alignment problems really hard. Believing

17:46

in speed doesn't make the alignment problem any easier.

17:48

And I think that the speed point is really just

17:50

bad news. I think it's just, you know, I

17:52

hope things don't move that fast. If

17:55

things move that fast, I think most human

17:57

institutions ways of reacting to things just we

17:59

can't count on that. them to work the way they normally do and

18:01

so we're gonna have to do our best to get out ahead of things

18:03

and and plan ahead and make things better in

18:05

advance as much as we can and it's mostly just

18:08

bad news. There's a separate thing which is that yeah I

18:10

do I am less of a I

18:12

do I do feel less convinced

18:14

than some other people that the alignment

18:16

problem is this like incredibly hard technical problem

18:19

and more feel like yeah if we did have

18:21

a relatively gradual set of developments I think

18:23

we'd have good I think even with a very

18:25

fast developments I think there's a good chance we just get lucky

18:27

and we're fine. So I think they're two different

18:30

axes. I know you talked with Tom Davidson about

18:32

this a bunch so I don't want to make it like the main theme of the

18:34

episode but I do think like in case someone

18:36

hasn't listened every ADK podcast ever just

18:38

just getting a little more into the why of

18:41

why you get such an explosive growth and why not

18:43

I think this is a really key premise and

18:45

I think right most of the rest of what I'm saying doesn't

18:47

make much sense without it and I want to own that

18:50

yeah I don't want to lose out on the fact that I

18:52

am sympathetic to a lot of reservations about

18:54

working on AI risk so yeah maybe

18:56

it would be good to cover that a bit.

18:58

Yeah let's let's do that so one

19:00

obvious mechanism by which things could speed up is that you have

19:02

this positive feedback loop where the AI's get better

19:04

at improving themselves is there is there much more to the

19:06

to the story than that?

19:09

Yeah I mean I think it's worth recheving

19:11

that briefly I mean I think one observation

19:13

I think is interesting and this is a you know report by

19:15

David Rudman for Open Philanthropy that goes through this

19:18

one thing that I've wondered is just like if you take

19:20

the path of world economic growth throughout

19:22

history and you just kind of extrapolated

19:24

forward in the simplest way you can what do you get

19:27

and it's like well it depends what time period you're looking

19:29

at if you look at economic history since 1900 or 1950 we've had

19:31

a few percent of your growth over that entire

19:34

time and if you extrapolate it forward you get a few

19:36

percent of your growth and you

19:38

just get the world everyone is kind of already expecting and

19:40

the world that's in the UN projections all that stuff the

19:43

interesting thing is if you zoom out and

19:45

look at all of economic history you

19:47

see that economic progress for most of

19:49

history not not recently has been accelerating

19:53

and if you try to model that acceleration in a

19:55

simple way and project that out in a simple way you

19:57

get basically the economy going to infinite

19:59

size sometimes this century, which is like a wild

20:03

thing to get from a simple extrapolation.

20:06

I think the question is like, why is that and why might

20:08

it not be? I think the

20:10

basic framework I have in mind is like, there is

20:13

a feedback loop you can have in the economy where

20:15

people have new ideas, new

20:18

innovations that makes them more productive. Once

20:20

they're more productive, they have more resources. Once

20:22

they have more resources, they have more kids.

20:24

There's more population or there's fewer deaths and there's more

20:27

people. So it goes more people,

20:29

more ideas, more resources, more

20:31

people, more ideas, more resources. When

20:34

you have that kind of feedback loop in place, any economic

20:36

theory will tell you that you get what's called super

20:38

exponential growth, which is growth that's accelerating.

20:41

It's accelerating on an exponential basis and

20:43

that kind of growth is very explosive,

20:46

is very hard to track and can go to infinity in finite

20:48

time. The thing that changed a couple hundred

20:50

years ago is that one piece of that feedback loop

20:53

stopped for humanity. People

20:55

basically stopped turning more resources into more

20:57

people. So right now, when people get richer,

20:59

they don't have more kids, they just get

21:01

richer. Buy another car. Yeah, exactly. And

21:04

so that feedback loop kind of broke. That's

21:07

not like a bad thing that it broke, I don't think, but it

21:09

kind of broke. And so we've had

21:11

just like what's called normal exponential growth, which

21:13

is still fast, but which is not the same thing,

21:15

doesn't have the same explosiveness to it. And

21:18

the thing that I think is interesting and different about

21:20

AI is that if you get

21:23

AI to the point where it's doing the same thing

21:25

humans do to have new ideas, to

21:27

improve productivity, so this is like the science

21:29

and invention part, then you can

21:31

turn resources into AIs

21:34

in this very simple linear way that

21:36

you can't do with humans. And so you

21:38

could get an AI feedback loop and

21:41

just to be a little more specific about what it might look like,

21:43

right now AI systems

21:46

are getting a lot more efficient. You can do a lot more with

21:48

the same amount of compute than you could 10 years ago, actually

21:51

a dramatic amount more. I think something,

21:53

various measurements of this or something like you can get the same

21:55

performance for something like 18X or 15X

21:58

less compute compared to like a few years

21:59

ago, maybe a decade ago.

22:01

Why is that? And it's because there's a bunch of human

22:03

beings who have worked on making AI algorithms

22:06

more efficient. So to me, the big scary

22:08

thing is when you have an AI that does

22:11

whatever those human beings were doing. And there's

22:13

no particular reason you couldn't have that because what those human

22:15

beings were doing, as far as I know, was mostly kind

22:17

of like sitting at computers, thinking of stuff,

22:19

trying stuff. There's no particular reason

22:21

you couldn't automate that. Once you automate that,

22:23

here's the scary thing. You have

22:25

a bunch of AIs. You use those AIs

22:28

to come up with new ideas to make your AIs more efficient.

22:30

Then let's say that you make your AIs

22:33

twice as efficient. Well, now you have twice as many AIs.

22:36

And so if having twice as many AIs can make your

22:38

AIs twice as efficient, again, there's really no

22:40

telling where that ends. And Tom Davidson

22:42

did a bunch of analysis of this, and I'm still

22:44

kind of poking at and thinking about it, but I think there's at least a decent

22:46

chance that that is the kind of thing that leads to

22:49

explosive progress where AI

22:51

could really take off and get very capable, very

22:53

fast. And you can extend that somewhat to other

22:56

areas of science. And it's like, some

22:59

of this will be bottlenecked. Some of this will be like, you

23:01

can only move so fast because you have to do a bunch of experiments

23:04

in the real world. You have to build a bunch of stuff. And

23:06

I think some of it will only be a little bottlenecked or will only

23:08

be somewhat bottlenecked. And I think there are some feedback loops

23:11

just kind of going from you get more

23:13

money, you're able to kind of quickly

23:16

with automated factories build more stuff

23:18

like solar panels, you get more energy,

23:20

and then you get more money, and then you're able

23:22

to do that again. And it's like in that loop, you

23:24

have this part where you're making everything more efficient all the time.

23:26

And I'm not going into all the details

23:29

here. It's been gone into more detail in my blog

23:31

post, The Most Important Century and Tom Davidson in

23:33

his podcast and continues to think about it. But

23:37

that's the basic model is that you

23:39

have this feedback loop that we have observed in history that

23:41

doesn't work for humans right now, but could work for AIs,

23:44

where you have AIs, have ideas

23:47

in some sense, make things more efficient. When

23:49

things get more efficient, you have more AIs. That creates

23:51

a feedback loop. That's where you get your super exponential growth.

23:54

Yeah, so one way of describing

23:56

this is talking about you get the

23:58

artificial intelligence becomes more

24:00

intelligent and that makes it more capable of improving

24:03

its intelligence and so it becomes super, super smart.

24:05

But I guess the way that you're telling it emphasizes

24:08

a different aspect, which is not so much that it's becoming

24:10

super smart, but that it is becoming super numerous

24:12

or that you can get effectively a population explosion.

24:15

And I think some people are skeptical of this super

24:17

intelligent story because they think you get really declining

24:19

returns to being smarter and that there's like some ways

24:21

in which it just doesn't matter how smart you

24:23

are, the world's too unpredictable, say, for you

24:26

to come up with a great plan. But

24:28

this is a different way by which you can get the same outcome,

24:30

which is just that you

24:32

have this enormous increase in the number of thoughts

24:35

that are occurring on computer chips,

24:37

more or less. And at some point, you know, 99% of

24:39

the thoughts that are happening on Earth could basically be

24:42

happening, be occurring inside artificial intelligences.

24:45

And then as they get better and they're able to make more chips

24:47

more quickly, again, you basically just

24:49

get the population explosion.

24:51

Yeah, that's exactly right. And I think this is

24:53

a place where I think some people get a little bit rabbit holed

24:55

on the AI debates because I think there's

24:58

a lot of room to debate how big

25:00

a deal it is to have something that's quote unquote extremely

25:03

smart or super intelligent or much smarter than human. And

25:06

it's like, OK, maybe if you had like something

25:08

that was kind of like a giant brain or something and

25:10

way smarter, whatever that means, than us,

25:13

maybe what that would mean is that it would like instantly see

25:15

how to make all these super weapons and conquer

25:17

the world and how to convince us of anything. And there's

25:19

all this stuff that that could mean and people debate

25:21

whether it could mean that. But it's uncertain.

25:24

It's a lot less uncertain if you're finding yourself skeptical

25:27

of what this smart idea means

25:29

and where it's going to go and what you can do with it. If you

25:31

find yourself skeptical of that, then just forget about it. And

25:33

just I believe you can make the entire

25:36

case for being extremely concerned about AI,

25:38

assuming that AI will never be smarter

25:40

than a human. Instead it will be as

25:43

capable as the most capable humans. And

25:45

there will be a ton of them because unlike

25:47

humans, you can just copy them. You

25:50

can copy them. You can use your copies to

25:52

come up with ways to make it more efficient,

25:54

just like humans do. Then you can

25:56

make more copies. And when we talk about whether

25:59

AI could defeat humanity. and I've written one blog post

26:01

on whether AI can kind of like take over the world, they

26:03

don't have to be more capable than humans.

26:06

They could be equally capable and there could be more of

26:08

them. That could really do it. That could really

26:10

be enough that then we wouldn't be,

26:12

humans wouldn't be in control of the world anymore. So

26:15

I'm basically generally happy

26:17

to just have all discussions about AI and

26:19

what the risks are just in this world where

26:21

like there's nothing more capable than a human, but

26:23

it's pretty scary to have a lot of those that have

26:25

different values from humans that are kind of a second advanced

26:28

species. That's not to rule out that

26:30

some of these super intelligence concerns could be real. It's

26:32

just like they're not always necessary and they can

26:34

sideline people.

26:36

Yeah, you can just get beaten by force

26:38

of numbers more or less. I think it's a little bit of a shame

26:40

that this sheer numbers argument

26:42

hasn't really been made very much. It

26:44

feels like the super intelligence story has been

26:46

very dominant in the narrative and media

26:49

and yet many people get off the boat because they're

26:51

skeptical of this intelligence thing. I think

26:54

it kind of is the fault of me and maybe people who've

26:56

been trying to raise the alarm about this because the focus really

26:58

has been on the super intelligence aspect rather than the super

27:01

numerousness that you could

27:03

get. Yeah, I don't know. I mean, I think there's valid concerns

27:05

like from

27:06

that angle for sure. And I'm not trying to

27:08

dismiss it, but I do. I

27:10

think there's a lot of uncertainty about what

27:12

super intelligence means and where it could go. And

27:14

I think you can raise a lot of these concerns without needing to have

27:16

a subtle view there. So

27:19

yeah, with the Jaya Kocher and Rowan Shah,

27:21

I found it really instructive to hear from them

27:24

about what are some kind of common opinions that

27:26

they don't share or maybe even just regard

27:28

as misunderstandings. Yeah, so maybe

27:30

let's go through a couple of those to help maybe situate you in the space

27:33

of ideas here. What's a common opinion

27:35

among kind of the community

27:36

of people working to address

27:38

AI risk that you personally don't share?

27:40

Yeah, I mean, I don't know. I think one kind of vibe I

27:42

pick up and I don't always have the exact quote

27:45

of whoever said what, but a vibe I pick up is this

27:48

kind of framework that kind of says, you know, if we

27:50

don't align our AIs, we're all going

27:52

to die. And if we can align our AIs,

27:54

that's great. And we've solved the problem. And that's

27:56

the problem we should be thinking about. And there's nothing

27:58

else really worth worrying about. You

28:00

know, it's kind of like alignment is the whole game would be

28:02

the hypothesis and I Disagree

28:04

with with both ends of that, but especially the

28:06

latter so to take the first end would be

28:09

like, you know If we if we don't align AI

28:11

we're all dead I mean first off I just think

28:13

it's like really Unclear even in the

28:15

even in the worst case where you get an AI that

28:17

has like its own values And there's

28:19

a huge number of them and they kind of team up and take

28:21

over the world Even then it's like really unclear

28:24

if that means we all die I think there's like I

28:26

know there's debates about this I have I have tried

28:28

to understand that I know that the the muri

28:31

folks I think feel really strongly clearly we all die

28:33

I've tried to understand where they're coming from and I have not

28:36

I think a key point is it just you know Could

28:38

be very very cheap as a percentage

28:40

of resources for example to

28:43

let humans have a nice life on earth and

28:45

not expand further and and be cut off

28:47

in certain ways from Threatening, you know AI's

28:49

ability to do what it wants that could be very

28:51

cheap compared to wiping us all out And there could

28:53

be a bunch of reasons one might want to do

28:55

that Some of them kind of wacky some

28:57

of them kind of you know, well, maybe You

29:00

know maybe in another part of the universe There's kind

29:03

of someone like the AI that

29:05

was trying to design its own AI

29:07

and that thing ended up with values like the humans

29:09

And you know, maybe there's some kind of trade that could be

29:11

made using like a causal trade And we don't need

29:14

to get into what all this means But it's like you don't

29:16

need much the thing is you don't need or like

29:18

maybe the AI is actually being simulated by humans Or

29:20

something or by some smarter version of humans or some more

29:22

powerful version of humans and being tested

29:24

to see if it'll wipe Out the humans would be nice to them It's like you

29:27

don't need a lot of reasons, you know to

29:29

kind of like leave one planet out if you're kind

29:31

of expanding throughout the galaxy So that would

29:33

be one thing is it just like I don't know It's like kind of uncertain

29:35

what happens even in the worst case and then

29:37

there's like I do think there's a bunch of in-between

29:40

cases where we kind of have a eyes

29:42

that are like there they're

29:44

sort of aligned with humans like if you if you

29:46

think about a analogy that often comes

29:48

up is like humans and natural selection where Humans

29:51

kind of were put under pressure by natural selection

29:54

to have lots of kids or to you know Do inclusive

29:56

reproductive fitness and we've

29:58

kind of okay. We invented birth control and a

30:00

lot of times we don't have as many kids as

30:03

we could and stuff like that. But also humans still

30:05

have kids and love having kids. A lot of humans

30:07

have 20 different reasons to have kids and

30:10

after a lot of the original ones have been knocked out

30:12

by weird technologies, they still find some other reason to have kids.

30:15

I don't know, I found myself one day wanting

30:17

kids and had no idea why and invented

30:20

all these weird reasons. I don't know,

30:23

it's not that odd to think that you could

30:25

have AI systems that just kind of like, yeah,

30:27

they're pretty off kilter from what we were trying to make

30:30

them do, but it's not like they're doing something completely

30:32

unrelated either. It's not like they have no drives

30:34

to do a bunch of stuff related to the stuff we wanted them

30:36

to do. Then you could also just have situations

30:39

where, especially in

30:41

the early stages of all this, where you might have

30:43

kind of near human level AIs and

30:45

so they might have goals of their own but they might not

30:48

be able to coordinate very well or they might not be able

30:50

to reliably overcome humans so

30:52

they might end up cooperating with humans a lot. We might

30:54

be able to leverage that into kind of

30:57

having AI allies that help us build

30:59

other AI allies that are more powerful so we might

31:01

be able to stay in the game for a long way. I don't know,

31:03

I just think things could be very complicated.

31:05

It doesn't feel to be like if you

31:07

screw up a little bit with the alignment problem then we all die.

31:11

The other part, if we do align

31:13

the AI, we're fine. I disagree with much more

31:15

strongly. I just think- All right.

31:18

More strongly than that. Okay, yeah, yeah, yeah. Go

31:20

for it. The first one, I mean, look, I think it would

31:22

be really bad to have misaligned AI and

31:24

I think despite the feeling that I feel it is fairly

31:27

overrated in some circles, I still think it's like the

31:30

number one thing for me. Just

31:32

like the single biggest issue in AI is

31:34

just like we're building these potentially

31:37

very powerful, very replicable, very numerous

31:39

systems and we're building them in ways we don't

31:41

have much insight into whether

31:43

they have goals, what the goals would be. We're

31:46

kind of introducing the second advanced species onto

31:48

the planet that we don't understand and if that

31:50

advanced species becomes more numerous and or more

31:52

capable than us, we don't have a great

31:54

argument to think that's going to be good for us. I'm

31:57

on board with alignment risk.

31:59

I don't know.

31:59

thing, not the only thing, the number one thing. But

32:02

I would say, if you just assume that

32:04

you have a world of very capable

32:07

AIs that are doing exactly what humans

32:09

want them to do, yeah, that's very

32:11

scary. And I think if that was the world we knew we were going

32:13

to be in, I would still be totally full-time on AI

32:16

and still feel that we had so much work to do and we

32:18

were so not ready for what was coming.

32:20

Certainly, there's the fact that because

32:23

of the speed at which things move, you could

32:25

end up with whoever kind of leads the way

32:27

on AI or is least cautious having

32:30

a lot of power. And that could be someone really bad.

32:33

And I don't think we should assume that just because

32:35

that if you had some head of state

32:37

that has really bad values, I don't think we

32:39

should assume that that person is going to end up being nice

32:42

after they become wealthy or powerful

32:44

or transhuman or mind uploaded

32:46

or whatever. I don't think there's really any reason

32:48

to think we should assume that. And then I think

32:50

there's just a bunch of other things that if things are moving fast,

32:53

we could end up in a really bad state. Like, are

32:55

we going to come up with decent frameworks

32:58

for making sure that digital

33:01

minds are not mistreated? Are we going to come up with

33:03

decent frameworks for kind of like how

33:06

to ensure that as we get the ability to

33:08

create whatever minds we want, we're using that

33:10

to create minds and help us seek the truth

33:12

instead of create minds that have whatever beliefs

33:14

we want them to have stick to those beliefs

33:17

and try to shape the world around those beliefs. I

33:19

think Carl Schulman put it as are

33:21

we going to have AI that makes us wiser

33:23

or more powerfully insane? So

33:25

I think there's just a lot. I

33:27

think we're kind of on the cusp of something that

33:30

is just potentially really big, really

33:32

world changing, really transformative and going to

33:34

move way too fast. And I think even

33:36

if we threw out the misalignment problem, we'd have a lot of work

33:38

to do. And I think a lot of these issues are actually not getting

33:40

enough attention.

33:41

Yeah. I think some of that might be going on there

33:44

as a bit of equivocation in the

33:46

word alignment. So you can imagine some

33:48

people might mean by creating an aligned

33:50

AI, it's like an AI that kind of goes and does what you tell

33:52

it to like a good employee or something. Whereas

33:54

other people mean it is following

33:56

the correct ideal values and behaviors and

33:58

is going to work to generate. generate the best outcome.

34:01

And these are really quite separate

34:03

things, although very far apart. Yeah.

34:06

Well, the second one, I don't

34:08

even know if that's a thing. I don't even

34:10

really know what it's supposed to be. I mean, there's something a little

34:12

bit in between, which is like,

34:14

you can have an AI that you ask

34:16

it to do something, and it does what you would

34:18

have told it to do if you had been more informed

34:20

and if you knew everything it knows. That's the

34:23

central idea of alignment that I tend to think of, but

34:25

I think that still has all the problems I'm talking about. Some

34:28

humans seriously do intend

34:31

to do things that are really nasty and seriously

34:33

do not intend in any way, even if they knew more,

34:35

to make the world as nice as we would like it to be. And

34:38

some humans really do intend and

34:40

really do mean and really will want to

34:42

say, right now I have these values. Let's

34:44

say this is the religion I follow. This is what I

34:46

believe in. This is what I care about. And I am creating

34:49

an AI to help me promote that religion, not to help me question

34:51

it or revise it or make it better. So yeah, I think it's

34:53

that middle one, I think it does not make it safe. There

34:55

might be some extreme version that's like,

34:57

an AI that just figures out what's objectively best

34:59

for the world and does that or something. And I'm just

35:01

like, I don't know why I would, I don't know why you would think that would even be a

35:03

thing to aim for. That's not the alignment problem that I'm interested

35:06

in having solved, yeah. Yeah. Okay,

35:08

what's something that some kind of safety

35:11

focused folks that you potentially collaborate

35:13

with or at least talk to, but they think

35:15

that they know, which you think in fact, we

35:18

just, nobody knows.

35:20

Yeah, I mean, I think in general, there's

35:23

this kind of question in deep

35:25

learning of, you train an agent

35:27

on one distribution of data or

35:30

reward signals or whatever. And now you're

35:32

wondering when it goes out of distribution,

35:34

when it hits a new kind of environment

35:36

or a new set of data, how it's gonna react to that. So

35:38

this would be like, how does an AI generalize

35:41

from training to out of distribution?

35:44

And I think in general, people have

35:47

a lot of trouble understanding this and

35:49

have a lot of trouble predicting this. And I think that's not

35:51

controversial. I think that's known, but I think

35:53

it kind of comes down to, or it

35:55

relates to some things where people do seem overly

35:57

confident. A lot of what people are doing right

35:59

now. with these AI models is they're doing what's called reinforcement

36:02

learning on human feedback or from human

36:04

feedback, where the basic idea is you

36:06

have an AI that tries something and then

36:08

a human says, that was great or that wasn't so

36:10

good, or maybe the AI tries two

36:13

things, the human says which one was better. And

36:16

if you do that, and that's a major way

36:18

you're training your AI system, there's

36:20

this question of what do you get as a result

36:23

of that? Do you get an AI system

36:25

that is actually doing what humans want

36:28

it to do? Do you get an AI system that's doing what humans

36:30

would want it to do if they kind of knew all the facts?

36:33

Do you get an AI system that is like tricking

36:35

humans into thinking it did what they wanted it

36:37

to do? Do you get an AI system that's sort of trying

36:39

to maximize paperclips? And one way to do that is to

36:41

do what humans wanted to do. So as far as you can tell, it's

36:43

doing what you want it to do, but it's actually trying to maximize paperclips.

36:46

Like, which of those do you get? And I think just like

36:48

people don't, like we don't know, and

36:50

I see overconfidence on both sides

36:52

of this. I think I see people saying, we're

36:54

going to basically train this thing to

36:56

do nice things and it'll keep doing nice things as

36:59

it operates in an increasingly changing world. And

37:01

then I see people saying, we're going to train AI to do nice

37:03

things. And it will basically pick up on

37:05

some weird correlate of our training

37:08

and try to maximize that and will

37:10

not actually be nice. And I'm just like, geez,

37:13

we don't know. We don't know. And

37:15

there's arguments that say, oh, wouldn't it be weird if it

37:17

came out doing exactly what we wanted it to do? Because

37:19

there's this wide space of other things

37:21

that could generalize to it. I just think those arguments

37:23

are just kind of weak and they're not very well fleshed out. There's

37:26

genuinely just a ton of vagueness

37:28

and not good understanding of what's

37:31

going on in a neural net and how

37:33

it generalizes from this kind of training. So

37:35

the upshot of that is I think people are

37:38

often just overconfident that AI

37:40

alignment is going to be easier hard. I think there's people

37:42

who think, we basically got the right framework, we

37:44

ought to debug it. And there's people who

37:46

think this framework is doomed, it's not going to work, we need something better. And

37:49

I just don't think either is right. I think if

37:51

we just go on the default course and we

37:53

just kind of train AIs based

37:55

on what looks nice to us, that could totally

37:57

go fine and it could totally go disastrously.

37:59

No, weirdly few people who

38:02

believe both those things, a lot of people

38:04

seem to be overconfidently in one camp or

38:06

the other on that. Yeah, I'm completely with you

38:08

on this one. I think it's one of the things that I've started to

38:10

believe more over the last six months. No

38:13

one really has super strong arguments for what kind

38:15

of motivational architecture these ML

38:18

systems are going to develop. Yeah. Well,

38:20

I suppose that's maybe an improvement relative to where it was before, because

38:22

I was a little bit more on the Duma side before.

38:26

I feel like this one, there should be some empirical way

38:28

of investigating this. People do have

38:30

good points that a really super intelligent

38:33

and incredibly sneaky model will behave exactly

38:35

the same regardless of its underlying

38:37

motive. But you could try to investigate this on less

38:39

intelligent, less crafty models, and surely you would

38:42

be able to get some insight into the way it's thinking

38:44

and how its motives actually cash

38:46

out. I

38:47

think it's really hard. It's

38:50

just like really, I mean, it's really

38:52

hard to make generalized statements about how

38:55

an AI generalizes to weird new

38:57

situations. And

38:59

yeah, there is work going on trying

39:01

to understand this, but it's going to be, it's just

39:04

been hard to get anything that feels satisfyingly analogous

39:06

to the problem we care about right now with AI

39:08

systems and their current capabilities. And even once

39:10

we do, I think there'll be plenty of arguments that are just like,

39:12

well, once the AI systems are more capable than that, everything's

39:15

going to change. And AI will generalize

39:17

differently when it understands who we

39:19

are and what our training is and how that works

39:21

and how the world works. And it understands

39:23

that it could take over the world if it wanted to. That

39:26

actually could cause an AI to generalize differently.

39:28

So as an example, this is something

39:30

I've written about on ColdTakes. I call it the King

39:33

Lear problem. So King Lear is a Shakespearean

39:35

character who kind of has three daughters

39:37

and he asks them each to describe their love for him. And

39:40

then he kind of like hands the kingdom over to the ones

39:42

that he feels good about after hearing their speeches and

39:44

he just picks wrong. And then that's too bad

39:47

for him. And the issue is

39:50

it's like it flips on a dime. It's like the two

39:52

daughters who are like the more evil ones were

39:54

doing a much better job pretending they loved him

39:56

a ton because they knew that they

39:58

didn't have power yet. power later. So

40:00

it actually like their behavior depended

40:03

on their calculation of what was going to happen. And

40:05

so the analogy to AIs, it's kind of like you

40:07

might have an AI system that's like kind of maybe

40:10

what its motivational system is, is it's

40:12

trying to maximize

40:14

the number of humans that are saying, hey, good job,

40:17

this obviously bit of a simplification or dramatization.

40:20

And it kind of is understanding at all

40:22

points, that if it could

40:25

take over the world, enslave all the humans,

40:27

make a bunch of clones of them and like run

40:29

them all in loops saying good job. If it could,

40:32

then it would and it should. But

40:34

if it can't, then maybe it should just cooperate

40:36

with us and be nice. You can have an AI system

40:38

that's like running that whole calc and humans often run

40:40

that whole calc like right as a kid

40:43

in school, I might often be thinking, well, you

40:45

know, if I can get away with this, then this is

40:47

what I want to do. If I can't get away with this, maybe

40:49

I'll just do what the teacher wants me to do. So you could

40:51

have the eyes with that whole motivational system. And then it's

40:53

like, cool. So now it's like you put them in a test

40:55

environment and you test if they're going to be nice

40:58

or try and take over the world. But in the test

41:00

environment, they can't take over the world, so they're going to be nice.

41:03

Now you're like, great, this thing is safe. You put it out in the world.

41:05

Now there's a zillion of it running around. Well, now it can take over the

41:07

world. So now it's going to behave differently. So you can have

41:10

just like one consistent motivational system

41:12

that is fiendishly hard to do a test

41:15

of how that system generalizes when it has power because

41:17

you can't generalize. You can't test what happens

41:20

when it's no longer a test.

41:22

What's the view that's common among ML researchers,

41:24

which you would you disagree with?

41:26

You know, it depends a little bit with ML researchers

41:28

for sure. I would definitely

41:31

say that I've been a big, bitter lesson

41:33

person since at least 2017. And,

41:36

you know, I got a lot of this from just like

41:38

Dario Amade, my wife's brother

41:40

who is CEO of Anthropic and I think

41:43

has been like very insightful. A lot

41:45

of what's gone on in the AI over the last few years is just

41:47

like bigger models,

41:49

more data, more training. And there's

41:53

an essay called The Bitter Lesson by an ML researcher,

41:55

Rich Sutton, that just kind of says, you know, ML

41:57

researchers keep coming up with clever and clever ways

42:00

to design AI systems and then those

42:03

cleverness is keep getting obsolete it by

42:05

just making the things bigger and just

42:07

like training them more and putting in more

42:09

data and so you know I've had a lot

42:11

of arguments over the last few years and

42:13

you know in general I have heard people arguing with

42:16

each other that are just kind of like on one

42:18

side it's like well today's AI systems

42:20

can do some cool things but they'll never be able to do this

42:22

and to do this like maybe that's reasoning

42:25

creativity you know something like that we're

42:27

gonna need a whole new approach to AI and

42:29

then the other side will say no I think we just need to make them

42:31

bigger and then they'll be able to do this I tend

42:35

to almost entirely toward that toward

42:37

that just make a bigger view I think just at

42:39

least in the limit if you if you took

42:42

an AI system and made it really big you

42:44

might need to make some tweaks but the tweaks wouldn't

42:46

necessarily be like really hard or

42:48

require giant conceptual breakthroughs I

42:50

do tend to think that that whatever it is

42:52

humans can do we could probably eventually

42:54

get an AI to do it and eventually it's not gonna

42:56

be a very fancy AI it could be just like a

42:59

very simple AI with some easy

43:01

to articulate stuff and a lot of the challenge come

43:03

from making it really big putting in a lot of data I think

43:06

this view has become like more popular over

43:08

the years than it used to be but it's still like pretty

43:11

debated I think a lot of people are still

43:13

looking at today's models and saying hey there's

43:15

fundamental limitations we're gonna need a whole new approach

43:17

to AI before they can do extra why

43:19

I'm just kind of out on that I

43:22

I think it's possible I'm not confident this is just

43:24

like where my instinct tends to lie

43:26

that's a disagreement I think another disagreement I have

43:29

with with some ML researchers I think not

43:31

all at all but but there's sometimes just

43:33

I feel like a background sense that

43:35

just like

43:36

sharing openly information

43:39

publishing

43:40

open sourcing etc it's just

43:42

like good that it's kind of it's kind of bad

43:44

to do research and keep it secret and it's good

43:46

to do research like publish it and

43:49

I you know I don't I don't feel this way

43:51

I think the things we're building could be very

43:54

dangerous at some point and I think that point

43:56

can come a lot more quickly than anyone is expecting

43:58

I think when that point some

44:00

of the open source stuff we have could

44:02

be used by bad actors in

44:05

conjunction with later insights to create

44:07

very powerful AI systems that

44:10

we aren't thinking of in ways we aren't thinking of right now,

44:12

but we won't be able to take back later. And

44:14

in general, I do tend to think that academia

44:17

has kind of this idea that sharing information

44:19

is good built into its fundamental

44:21

ethos, and that might often be true. But I think

44:23

there's times when it's kind of clearly false and academia

44:26

still kind of pushes it. You know, gain of function

44:28

research being like kind

44:29

of an example for me, where just like people are

44:32

very, very into the idea of like

44:34

making a virus more deadly and publishing

44:36

how to do it. And I think this is just an example

44:38

of where just culturally there's some

44:41

background assumptions about information

44:43

sharing that I just think the world is more complicated

44:45

than that.

44:46

Yeah, I definitely encountered people from time

44:48

to time who are, they

44:49

have this very strong prior this very

44:52

strong assumption that everything should be open and people

44:54

should have access to everything and then I'm like, what if someone

44:56

was designing a hydrogen bomb that you could make with,

44:58

you know, equipment that you could get from your house? I'm just

45:01

like, I don't think that it should be open. I think we should

45:03

probably stop them from doing that. Yeah, yeah,

45:05

yeah. And certainly if they figure

45:07

it out, we shouldn't publish it. Yeah, yeah.

45:09

I suppose it's just that that's a sufficiently rare case

45:11

that

45:12

it's very natural to develop the intuition in

45:14

favor of openness from the from the 99 out of 100 cases

45:16

where that's not too unreasonable. Yeah,

45:18

I think it's usually reasonable. But I think I think bioweapons

45:20

is just like a great counter example, or just like, it's

45:23

not really balanced. It's not really like,

45:25

well, for everyone who, you

45:27

know, who tries to design or release some horrible

45:29

pandemic, we can have someone else using open

45:31

source information to design a countermeasure. Like, that's

45:34

not actually how that works. And so, yeah, I

45:36

think this attitude at least needs to be complicated

45:38

a little bit more than it is.

45:40

Yeah. What's something that listeners might

45:42

expect you to believe, but which you actually don't? Yeah,

45:45

I don't really know what people think, I think, but some

45:48

things that sometimes I kind of pick up. I

45:50

mean, I write a lot about the future. I do a lot

45:52

about, you know, a lot of stuff about, well, as

45:55

coming, we should like prepare and do this and don't do that. I

45:57

think a lot of people think that I think I have

45:59

like this great ability to predict the future

46:02

and that I can spell it out in detail and count on it.

46:04

And I think a lot of people think I'm like underestimating

46:07

the difficulty of predicting anything. And

46:10

you know, I think I may in fact be underestimating

46:12

it, but I think I do feel a lot

46:15

of gosh, it is so hard

46:17

to, you know, be even a decade ahead or five

46:19

years ahead of what's going to happen. It

46:22

is so hard to get that right and enough detail to be helpful.

46:24

A lot of times you can get the broad outlines of something, but

46:27

to really be helpful seems really hard. Even on COVID,

46:29

it's like I feel

46:29

like a lot of the people who saw it coming in advance

46:32

weren't necessarily able to do much to make things

46:34

better. And that someone includes open philanthropy.

46:36

And we had a biosecurity program for years before

46:39

COVID. And I think there was some helpfulness

46:41

that came of it, but not as much as there could have been.

46:43

And so, you know, I think in general, I just

46:45

like, I don't know, a lot of how I am is I'm just like,

46:48

pretty the future is really hard. Getting out of the

46:50

head of the future is really hard. I'd really like rather never

46:52

do it. I think in many ways I would

46:54

just like rather work on stuff like GiveWell

46:57

and global health charities and animal welfare and just

46:59

like adapt to things as they happen

47:01

and not try and get out ahead of things. And there's

47:03

just like a small handful of issues that

47:05

I think are important enough that and may

47:07

move quickly enough that we just we just have

47:09

to do our best. I think we can I don't

47:12

think we should feel this is totally hopeless. I think I think

47:14

we can in fact do some good by

47:17

getting out ahead of things and planning in advance. And

47:20

a lot of my feelings we got to do the best we can more

47:22

than hey, I know what's coming.

47:24

Yeah. Okay. And

47:26

the other one is what's something you expect quite a lot of listeners might

47:28

believe, which which would you think you'd be

47:30

happy to disabuse them of?

47:32

Yeah, there's a line that you've

47:34

probably heard before that is

47:37

something like this. It's something like most

47:39

of the people we can help are in future generations.

47:41

And there are so many people in future generations

47:44

that that kind of just ends the conversation

47:46

about how to do the most good that it's clearly

47:48

an astronomically the case that focusing

47:51

on future generations dominates all

47:53

ethical considerations or at least dominates all

47:55

considerations of like how to do the most good

47:57

with your philanthropy or your career. I kind

47:59

of think of this is like philosophical long-termism

48:01

or philosophy first long-termism that's very you

48:04

know kind of feels like you've ended the argument after you've

48:06

pointed to the number of people in future generations

48:09

and we can get to this a bit later in the interview I don't

48:11

I don't think it's a garbage view I give some credence

48:13

to it I think it's somewhat seriously and I think it's like

48:16

underrated by the world as a whole but

48:18

I would I would say that I give a minority

48:20

of my moral parliament to thinking this way I would

48:23

say like more of me than not thinks

48:26

that's not really the right way to think about doing good

48:28

that's not really the right way to think about

48:29

ethics and I don't think we can trust these

48:32

numbers enough to feel that it's such a blowout

48:35

and the reason that I'm currently focused

48:37

on what's classically considered long-term is causes

48:39

especially AI is that I believe

48:41

the risks are imminent and real enough

48:43

that you know even with much less aggressive

48:46

valuations of the future they are you know

48:48

competitive or perhaps the best thing to work on

48:50

another random thing I think is that if you if you

48:53

really want to play the game of just like being all about

48:55

the big numbers and only thinking about the populations

48:58

that are the biggest that you can help future generations

49:00

are like extremely tiny compared

49:02

to you know persons you might

49:04

be able to help through a causal interactions with

49:06

other parts of the multiverse outside our light cone I

49:09

don't know if you want to get to that or just refer people back to

49:11

Joe's episode on that but that's that's more of a

49:13

nitpick on that take yeah people can go

49:15

back to listen to the episode with Joe Car Smith if I'd

49:17

like to understand what we just said there let's

49:20

come back to AI

49:20

now I think I want to spend quite a lot of time

49:23

basically understanding what you think different actors

49:25

should be doing to you know governments AI labs you know our

49:28

listeners so what different ways that they might be able to contribute

49:30

to improving our odds here but

49:32

maybe before we do that it might could be worth talking about like

49:34

trying to envisage scenarios in which things

49:36

go relatively relatively well you've

49:39

argued that you're very unsure how things are gonna play out

49:41

but it's possible that we might model

49:44

through and get a reasonably good outcome even

49:46

if we basically carry on doing the fairly

49:48

reckless things things that we're doing right now

49:51

not because you're recommending that we take that path right but

49:53

rather just because it's it's relevant to know whether

49:55

we're just far off like completely far off

49:57

the possibility of any good outcome given Now,

50:00

what do you see as the value of laying out positive

50:02

stories or ways that things might go well?

50:04

Yeah, so I've written a few pieces that

50:06

are kind of laying out, here's an excessively

50:09

specific story about how the

50:11

future might go that ends happily with

50:14

respect to AI, that ends with kind of, you know,

50:17

we have AIs that didn't develop or didn't

50:19

act on goals of their own enough to disempower

50:22

humanity. And then we kind of ended up with this

50:24

world where we're all, the world is getting more capable

50:26

and getting better over time and none of the various disasters

50:28

we're sketching out happened. I've written

50:30

like three different stories like that. And then one story,

50:33

the opposite, how we could stumble

50:35

into AI catastrophe where things just go

50:37

really badly. Why have I

50:39

written these stories in general? You know,

50:41

I think it's not that I believe these stories,

50:43

it's not that I think this is what's going to happen. But I

50:45

think a lot of times when you're

50:48

thinking about general principles

50:50

of like what we should be doing today to reduce risks from

50:52

AI, it is often helpful,

50:54

just like my brain works better imagining specifics.

50:57

And I think it's often helpful to kind of imagine

50:59

some specifics and then extract back from

51:02

the specifics to general points and

51:04

see if you still believe them. So for example,

51:06

I've actually done, I mean, these are the ones I've published,

51:08

but I've done a lot of thinking of what are

51:10

different ways the future can go well. And it's

51:12

like there are some themes in them. It's like there's almost

51:15

there's almost no story of the future

51:17

going well, that doesn't have like

51:19

a part that's like, and no evil

51:21

person steals the AI weights and

51:24

goes and does evil stuff. And so, you

51:26

know, it has highlighted, I think, I think the importance

51:28

of security, the importance of just like information

51:30

security, just like you're training a powerful AI

51:33

system, you should make it hard for someone to steal

51:35

has like popped out to me as a thing that just

51:37

like keeps coming up in these stories, keeps

51:39

being present. It's hard to tell a story where it's not

51:42

a factor. It's easy to tell a story where it is a factor.

51:44

You know, another factor that has come up for me is just like,

51:46

there needs to be

51:47

some kind of way

51:49

of stopping or disempowering

51:52

dangerous AI systems. You can't just

51:54

build safe ones. Or like if you build the safe

51:57

ones, you have to somehow use them to help you stop

51:59

the dangerous because eventually people will build

52:01

dangerous ones. And I think the most promising

52:03

general framework that I've heard for

52:06

doing this is this idea of a kind

52:08

of evals-based regime where you test

52:11

to see if AIs are dangerous. And based

52:13

on the tests, you kind of have the world

52:15

coming together to stop them or you don't. And

52:17

I think even in a world where you have very powerful, safe

52:20

AI systems, you still need some kind of, probably,

52:22

you still need some kind of regulatory framework for

52:25

how to use those to use force to stop

52:27

other systems. And so these are

52:29

general factors that I think it's a little bit like

52:31

I think it's how some people might do math by imagining

52:34

a concrete example of a mathematical object,

52:36

seeing what they notice about it, and then abstracting

52:39

back from there to the principles. That's what I'm doing with

52:41

a lot of these stories. I'm just like, can I tell

52:43

a story specific enough that it's not obviously crazy?

52:45

And then can I see what themes there

52:48

are in these stories and which things I robustly

52:50

believe after coming back to reality? That's

52:52

a general reason for writing stories like that. The

52:54

specific story you're referring to, I wrote a post

52:57

on last round called Success Without Dignity,

52:59

which is

52:59

kind of a response to Eliezer Kowalski writing

53:02

a piece called, I think it was called Death With Dignity.

53:04

Yeah, we should possibly explain that idea. I think some

53:07

people have become so pessimistic about our prospects

53:10

of actually

53:11

avoiding going extinct, basically, because they think

53:13

this problem is just so difficult. But they've said, well,

53:15

really, the best we can do is to not make

53:17

fools of ourselves in the process of going extinct,

53:19

that we should at least cause our own extinction

53:21

in some way that's like barely respectable, if I

53:23

guess aliens were to read the story or to uncover

53:26

what we did. And they call this kind of dignity

53:29

or death with dignity. Death with dignity. Sorry, anyway. Yeah.

53:31

And to be clear, the idea there is not

53:33

like we're literally just trying to have

53:35

dignity. It's like the idea is like, that's a proximate

53:38

thing you can optimize for that actually increases

53:40

your odds of success the most

53:41

or something. Yeah, and my response- For

53:43

many people, that's a little bit tongue in cheek as well. Yeah, yeah,

53:45

yeah, for sure. And my response though

53:47

is a piece called, Success Without Dignity, that's just

53:49

kind of like, well, I don't know, it's just actually

53:51

pretty easy to picture a world where we just like,

53:54

we just like do everything wrong. And like,

53:56

there's no real positive surprises from here.

53:59

At least not-

53:59

in terms of like people who are deliberately

54:02

trying in advance to reduce AI x-risk, like

54:04

there's no big breakthroughs on AI alignment, there's

54:06

no like real happy news, just a lot of stuff just happens

54:08

normally and happens on the path it's on, and then

54:11

we're fine, and why are we fine? Well, we basically

54:13

got lucky, and I'm like, can you tell a story like

54:15

that? And I'm like, yeah, I think I can tell a story like that,

54:17

and why does that matter? I think it matters because

54:19

it's, um, I think a number of people have

54:21

this feeling with AI that they're just like, we're

54:23

screwed by default, we're gonna have to get like 10

54:25

really different hard things all right,

54:28

or we're totally screwed, and so therefore we

54:30

should be trying like really crazy

54:32

swing for the fences stuff and forgetting about

54:34

interventions that like help us a little bit. And

54:37

yeah, and I have the opposite take, I just think like, look,

54:39

if nothing further happens, there's some

54:41

chance that we're just fine, basically by luck, so

54:44

we shouldn't be doing like overly crazy things to increase

54:46

our variance if they're not, you know, if they're

54:48

not like kind of highly positive and expected value,

54:51

and then I also think like, yeah, things that just

54:53

like help a little bit, like, yeah, those are good.

54:55

They're good at face value, they're good in the way you'd

54:57

expect them to be good, they're not like

54:59

worthless because they're not enough, and

55:02

so I think, you know, things like just working harder

55:04

with AI systems

55:06

to get the reinforcement AI systems

55:08

are getting to be accurate, so just that, you know, this idea

55:10

of accurate reinforcement where you're not

55:12

rewarding AI systems specifically

55:15

for doing bad stuff, you know,

55:17

that's a thing you can kind of like get more right or get

55:19

more wrong, and doing more attempts to do that is kind

55:22

of basic, and it's not, it doesn't involve

55:24

like clever re-thinkings of what cognition means

55:26

and what alignment means and how we get a perfectly aligned

55:28

AI, but I think it's a thing that

55:30

could matter a lot, and that putting more effort into

55:32

could matter a lot, and I feel that way too, about improving

55:35

information security, like you don't have to make

55:37

your AI impossible to steal, make it hard to steal is worth

55:39

a lot, you know, so there's a lot of just

55:41

generally a lot of things I think people can do to reduce AI

55:43

risks that don't rely on a complicated

55:45

picture. It's just like this thing helps, so just do it because

55:47

it helps. Yeah,

55:49

we might go over those interventions in just a

55:51

second. Yeah, maybe it's possible to like flesh out the story

55:53

a little bit, like yeah, how could we get a

55:55

good outcome? Mostly through luck.

55:58

So I broke these... success

56:00

without dignity idea into

56:02

a couple phases. So there's the initial

56:05

alignment problem, which is the, you know, the thing

56:07

most people I think in the time of Doom or Headspace

56:09

tend to think about, which is how do we,

56:11

how do we build a very powerful AI system that

56:14

is not trying to take over the world or disempower

56:16

our humanity or kill all humanity or whatever. And

56:18

so there, I think if you are training

56:21

systems that are

56:22

human, I call them human level ish. So

56:25

an AI system that's like got kind

56:27

of similar range capabilities to human,

56:29

it's going to have some strengths and some weaknesses relative

56:31

to a human. If you're training that kind of system,

56:34

I think that

56:35

you may just get systems that are pretty

56:37

safe, at least for the moment, without

56:39

a ton of breakthroughs or special

56:42

work. You might get it by pure

56:44

luck ish. So it's basically like this thing I

56:46

said before about how, you know, you have an AI system

56:48

and you train it by basically saying

56:50

good job or bad job when it does something. It's

56:52

like human, human feedback for a human

56:55

level ish system that could easily result

56:57

in a system that like

56:58

either it really did generalize to doing

57:00

what you meant it to do or it generalized

57:03

to this like thing where it's like

57:05

trying to take over the world. But that

57:07

means cooperating with you because whenever it's

57:09

too weak to take over the world. And in fact, these human level

57:11

ish systems are going to be like too weak to take over the world. So they're

57:13

just going to cooperate with you. It could mean that.

57:16

So you could get like either two of those generalizations.

57:18

And then like,

57:20

it does matter if you're like I just

57:22

said, if your reinforcement is accurate. So

57:24

you could kind of like have an AI system where you say,

57:26

hey, go make me a bunch of money. And then

57:28

unbeknownst to you, it goes and like breaks

57:31

a bunch of laws and hacks into a bunch of stuff

57:33

and brings you back some money or even like fakes

57:35

that you have a bunch of money and then you say good job. Now

57:38

you've actually rewarded it for doing bad stuff. But

57:40

if you can take that out, if you can basically avoid

57:42

doing that and have your have your

57:45

kind of like good job when it actually did

57:47

a good job, that I think increases the

57:49

chances that that it's going to generalize

57:51

to basically just doing doing a good

57:53

job or at least doing what we roughly intended and

57:56

not kind of pursuing goals of its

57:58

own, if only because that wouldn't work. And so I

58:00

think you could, you know, a lot of this is to say,

58:02

you could solve the initial alignment problem by almost

58:05

pure luck, by this kind of reinforcement learning

58:07

from human feedback, generalizing well. You could

58:09

add a little effort on top of that and make it more

58:11

likely, like getting your reinforcement more accurate.

58:13

There's some other stuff you could do in addition

58:15

to kind of catch some of the failure modes and

58:18

straighten them out, like red teaming

58:20

and simple checks and balances. I won't go into the details

58:22

of that. And if you get some combination

58:24

of luck and skill here, you end up with

58:27

AI systems that are roughly human

58:29

level, not immediately

58:31

dangerous anyway. Sometimes they call them

58:33

jinkily aligned. It's like they are

58:35

not trying to kill you at this moment. That doesn't mean you

58:37

solve the alignment problem. But

58:40

at this moment, they are approximately trying to help you.

58:42

Maybe if they can all coordinate and kill you, they

58:45

would, but they can't. Remember, they're kind of human-like.

58:47

So that's the initial alignment problem. And

58:49

then once you get past that,

58:51

then I think we should all just

58:53

forget about the idea that we have any idea what's gonna

58:55

happen next. Because now you

58:59

have a huge, potentially huge number

59:01

of human-level-ish AIs. And

59:03

that is just incredibly world-changing. And

59:06

there's this idea of, that

59:08

I think sometimes some people call it, getting

59:10

the AIs to do our alignment homework for us. So

59:13

it's this idea that once you have human-level-ish

59:15

AI systems, you have them kind of working

59:17

on the alignment problem in huge numbers.

59:20

And it's like, in some ways I hate this idea because

59:22

it's just very lazy and it just is like, oh

59:24

yeah, we're not gonna solve this problem until later when the

59:26

world is totally crazy and everything's moving really

59:28

fast and we have no idea what's gonna happen. So I hate

59:30

the idea in that sense. We'll just ask the

59:32

agents that we don't trust to make themselves trustworthy.

59:35

Yeah, exactly. So there's a lot to hate about this

59:37

idea, but heck, it could work.

59:40

It really could. Because you could have a situation

59:43

where just in a few months, you're able

59:45

to do the equivalent of thousands of years of

59:48

humans doing alignment research. And if these

59:50

systems are just not at the point where they can

59:52

or want to

59:53

screw you up, that really

59:55

could do it. I mean, we just don't know that like

59:57

thousands of years of human levelish alignment research isn't

59:59

enough.

59:59

to just like get us a real solution.

1:00:02

And so that's kind of how you get through

1:00:04

a lot of it. And then you still have another problem in

1:00:07

a sense, which is that you do

1:00:09

need a way to stop dangerous systems. It's

1:00:11

not enough to have safe AI systems. But

1:00:13

again, you have help from this giant automated

1:00:15

workforce. And so in addition to

1:00:18

coming up with ways to make your system safe, you can come up with ways

1:00:20

of showing that they're dangerous and when they're

1:00:22

dangerous and being persuasive about the importance

1:00:24

of the danger. And that again, feels like something

1:00:26

that like, I don't know, I feel like if we had 100

1:00:29

years before AGI right now, there'd be a good chance that

1:00:31

normal flesh and blood humans could pull this off. So

1:00:34

in that world, there's a good chance that an automated workforce

1:00:36

can cause it to happen pretty quickly. And you

1:00:38

could pretty quickly get, get

1:00:40

understanding of the risks agreement that we need to stop

1:00:43

them. And you have more safe

1:00:45

AI's than dangerous AI's and you're trying to stop the dangerous

1:00:47

AI's and you're measuring the dangerous AI's or

1:00:50

you're stopping any AI that refuses to be measured or who's

1:00:52

developer refuses to measure it. And so then

1:00:54

you have a world that's kind of like this one, where like, yeah,

1:00:56

there's a lot of evil people out there, but there's, but they

1:00:59

are generally just kept

1:00:59

in check by being outnumbered by people who

1:01:02

are at least law abiding, if not incredibly angelic.

1:01:04

So you get a world that looks like this one, but it just

1:01:06

has a lot of like AI's running around in it. And

1:01:09

so we have like a lot of progress in science and technology

1:01:11

and that's, that's a fine ending potentially.

1:01:14

Okay, so that's one flavor

1:01:16

of story. Is there any other broad themes

1:01:18

in the positive stories that it could be worth bringing

1:01:20

out before we move on?

1:01:21

I think I mostly covered it. I mean, the other

1:01:23

two stories involve less luck

1:01:26

and more like you have one or two actors

1:01:28

that just do a great job. Like, you know, you have one

1:01:30

AI lab that it's just ahead of everyone else and

1:01:33

it's just like doing everything right. And that improves

1:01:35

your odds a ton. You know, for a lot

1:01:37

of this reason that like being a few months ahead could

1:01:39

mean you have like, you know, a lot

1:01:42

of subjective time of having your automated workforce

1:01:44

do stuff to be helpful. And so there's one

1:01:46

of those with like a really fast takeoff and one of them with a

1:01:48

more gradual takeoff.

1:01:50

But I think, you know, I think that does, that does kind of highlight

1:01:52

again that like one really good

1:01:54

actor who's like really successful could

1:01:56

move the needle a lot even when you get less

1:01:58

luck. So I think there's...

1:01:59

There's a lot of ways things could go well. There's a lot

1:02:02

of ways things could go poorly. I feel like

1:02:04

I'm saying just like

1:02:05

really silly, obvious stuff now that just should

1:02:07

be everyone's starting point, but I do

1:02:09

think it's not where most people are at right now. I think these risks

1:02:11

are extremely serious. They're kind of my

1:02:14

top priority to work on. I think

1:02:16

anyone's saying we're definitely gonna be fine. I don't know where the heck

1:02:18

they're coming from, but anyone's saying we're definitely doomed. I

1:02:20

don't know, same issue.

1:02:21

Okay,

1:02:23

so the key components of the story here was, so

1:02:25

one, you didn't get bad people stealing the models

1:02:27

and misusing them really early on. Or

1:02:30

there was some limit to that and they were outnumbered or

1:02:32

something like that, yeah. Okay, and

1:02:34

initially we ended up training models that are

1:02:36

more like human level intelligence and it turns out to

1:02:38

not be so challenging to have

1:02:41

moderately aligned models like that. And

1:02:43

then we also managed to turn those AIs, those

1:02:47

folks I guess, I'm gonna say it, towards

1:02:50

the effort of figuring out how to align additional

1:02:52

models that would be more capable still.

1:02:54

And or how to slow things down and put in

1:02:56

a regulatory regime that stops things if we don't know

1:02:58

how to make safe, yeah. Right, okay, or they

1:03:00

help with, yeah, they help with a bunch of other governance

1:03:02

issues, for example, and then also by

1:03:04

the time these models have proliferated and

1:03:07

they might be getting used by irresponsibly

1:03:10

or by bad actors, those folks are just

1:03:12

massively outnumbered as they are today by people

1:03:15

who are largely sensible. Okay, so

1:03:17

I'm really undecided on

1:03:19

how plausible these stories are. I guess

1:03:22

I

1:03:22

play some weight or some credence

1:03:25

on the possibility that pessimists like Eliezer,

1:03:27

Yudkowsky are right and that this kind of thing

1:03:29

couldn't happen for one reason or another. And

1:03:32

we really would. I think that's possible, yeah. Yeah, I

1:03:34

guess, what do you make of the argument

1:03:36

that, let's say we're 50-50, short

1:03:38

between kind of the Eliezer worldview and

1:03:40

the Holden worldview just outlined, that in

1:03:43

the case of Eliezer's right, we're kind of just screwed, right?

1:03:45

And things that we do on the margin, a bit of extra

1:03:47

work here and there, just isn't gonna change the story.

1:03:50

Basically, we're just gonna go extinct with very

1:03:52

high probability.

1:03:52

Whereas if you're right, then

1:03:55

things that we do might actually move the needle and we

1:03:57

have a decent shot. So it makes more sense to act.

1:03:59

as if we have a chance, as if some

1:04:02

of this stuff might work, because our decisions

1:04:04

and our actions just aren't super relevant

1:04:07

in the pessimist case. Does that sound like a sensible

1:04:09

reasoning? I mean, it seems a little bit suspicious

1:04:12

somehow. I think it's a little bit suspicious. I

1:04:14

mean, I think it's fine if it's 50-50. I think

1:04:16

Eliezer has complained about this. He's kind of said, you

1:04:18

know, look, you can't condition

1:04:21

on a world that's fake, and you should live

1:04:23

in the world you think you're in. I think that's right. So

1:04:26

I should say, I think I do want to say a couple meta

1:04:28

things about the success of that dignity story.

1:04:29

One is, I do want people to know, this is

1:04:32

not like a thing I cooked up. This is, you

1:04:34

know, I think of my job, I'm not an AI

1:04:36

expert, I think of my job being especially

1:04:38

a person who's generally been a funder, having

1:04:40

access to a lot of people, having to make a lot of people judgments.

1:04:43

My job is really to figure out who to listen

1:04:45

to about what and how much weight to give whom about what.

1:04:48

So I'm getting the actual substance

1:04:50

here from people like Paul Chris Giano,

1:04:52

Carl Schulman, and others, and this

1:04:54

is not holding reasoning things out and being like, this

1:04:56

is how it's going to be. This is me synthesizing,

1:04:59

hearing from a lot of different people, a lot of them

1:05:01

highly technical, a lot of them experts, and

1:05:03

just trying to say who's making the most sense, also

1:05:06

considering things like track records, like who should be getting

1:05:08

weight, things like expertise. So

1:05:10

that is an important place to know where I'm coming from.

1:05:12

But I do, having done that, I

1:05:15

do actually feel that this success

1:05:17

of that dignity is just like a serious possibility, and

1:05:19

I'm way more than 50-50 that

1:05:21

this is possible. That

1:05:24

according to the best information we have now, this

1:05:26

is a reasonable world to be living in, is a world where this could

1:05:28

happen and we don't know, and way less

1:05:29

than 50-50, the kind of the LAS model

1:05:32

of, yeah, we're doomed for sure with like 98 or 99% probability,

1:05:36

is right. I don't put zero credence on it, but it's just

1:05:38

not my majority view. But the other thing,

1:05:40

which relates to what you said, is

1:05:42

I don't want to be interpreted as saying this

1:05:44

is a reason we should chill out. So it

1:05:46

should be obvious, but my argument should stress

1:05:48

people out. My picture should stress people out

1:05:50

way more than any other possible picture. I use

1:05:52

the most stressful possible picture because it's

1:05:55

like

1:05:55

anything could happen, every little bit helps. Like,

1:05:58

if you help a little more, a little less, like that would be a good idea.

1:05:59

actually matters in this potentially huge

1:06:02

kind of fated humanity kind of way. And that's

1:06:04

a crazy thing to think in its own way. And it's certainly not a relaxing

1:06:07

thing to think, but it's yeah, it's what I think as far as I can tell.

1:06:10

Yeah,

1:06:10

I'm I'm stressed. Don't worry. Okay,

1:06:12

good. Okay. Okay. So,

1:06:16

so this kind of worldview leads

1:06:18

into something that you wrote, which is this kind of four

1:06:20

intervention playbook for possible success

1:06:23

with AI. And you kind of describe

1:06:25

four different categories of interventions that we might

1:06:27

engage in in order to to try to improve

1:06:29

our odds of success. I think there was alignment research,

1:06:32

standards and monitoring, creating a successful

1:06:34

and careful AI lab. And finally, information

1:06:36

security. I think we've touched on all these a

1:06:39

little bit, but maybe we could go over them again. Is there

1:06:41

anything you want to say about alignment research as an intervention

1:06:43

category that you haven't said already? Well,

1:06:45

I mean, I've kind of pointed at it. But

1:06:48

I think in my head, I see value

1:06:50

in I think there's like versions of alignment

1:06:53

research that are very like blue sky

1:06:55

and very like we have to have a fundamental

1:06:58

way of being like really sure that any

1:07:01

arbitrarily capable AI is like

1:07:03

totally aligned with what we're trying to get it to do. And

1:07:05

I think that I think that's very hard

1:07:07

work to do. I think a lot of the actual work being done on it

1:07:10

is not valuable. But I think if you can move the

1:07:12

needle on it, I think it's super valuable. And

1:07:14

then there's work that's like a little more prosaic and

1:07:16

it's a little more like, well, can we, you

1:07:18

know, use our train our eyes

1:07:20

with human feedback and find some way that screws

1:07:22

up and kind of patch the way it screws up and go to

1:07:24

the next step. A lot of this work is

1:07:26

like pretty empirical as being done at AI labs.

1:07:29

And I think that works. It's just like super valuable as well.

1:07:31

And so that is a take I have in alignment research. I

1:07:34

do think almost all alignment research

1:07:37

is believed by many people to be totally useless

1:07:39

and or harmful. And

1:07:41

I tend not to super feel that way. I think if anything,

1:07:44

the line I would draw is there is some

1:07:46

alignment research that seems like

1:07:48

it's necessary eventually to commercialize.

1:07:50

And so I'm a little less excited about that because I do think it will

1:07:52

get done regardless on the

1:07:54

way to whatever we're worried about. And so I do

1:07:56

tend to draw lines about like, how likely is

1:07:59

this research to get done?

1:08:00

you know, not by normal commercial motives, but

1:08:02

I do think there's a wide variety of alignment

1:08:05

research that can be helpful, although

1:08:07

I think a lot of alignment research also is not helpful,

1:08:09

but that's more because it's like not aimed

1:08:11

at the right problem, and less because it isn't like exactly

1:08:14

the right thing. And so that, yeah, that's a take on

1:08:16

alignment research. Then another another take is I

1:08:18

have kind of highlighted what I call threat assessment

1:08:21

research as a thing that you could consider

1:08:23

part of alignment research or not, but

1:08:25

it's probably the single category that feels to me

1:08:27

like

1:08:28

the most in need of more work right now,

1:08:30

given where everyone is at, and that

1:08:32

would be, you know, basically work trying

1:08:35

to kind of create the problems

1:08:37

you're worried about in a controlled environment where

1:08:40

you can age a show that they

1:08:42

could exist and understand the conditions

1:08:44

under which they do exist. So, you know, problems

1:08:46

like a misaligned AI that is pretending to be aligned,

1:08:49

and see you can actually like study alignment techniques

1:08:51

and see if they work on many versions of the problem. So

1:08:53

like, you know, you could think of it as like model organisms

1:08:56

for AI where, you know, in order to cure

1:08:58

cancer, it really helps to be able to give cancer to mice.

1:09:00

In order to deal with AI misalignment, it really

1:09:02

helps to be able to create, if we could ever create a deceptively

1:09:05

aligned agent that is like, you know, secretly

1:09:08

trying to kill us, but it's too weak to actually kill

1:09:10

us, that would be way better than having the first

1:09:12

agent that's secretly trying to kill us be something that actually can kill

1:09:15

us. So I'm really into kind of creating,

1:09:17

creating the problems we're worried about in controlled environments.

1:09:20

Yeah, yeah. Okay, so the second category

1:09:22

was standards and monitoring, which you've already

1:09:24

touched on. Is there anything high level you want to

1:09:27

say about that one?

1:09:28

Yeah, this is kind of, to me, the most

1:09:30

nascent, or like the one that just, it's, there's

1:09:33

not much happening right now, and I think there could be a lot

1:09:35

more happening in the future, but the basic idea

1:09:37

of standards of monitoring is this idea of you, you

1:09:39

have tests for whether AI systems are dangerous, and

1:09:41

you have a regulatory or self-regulatory

1:09:44

or a normative, you know, informal

1:09:46

framework that says dangerous AI should

1:09:48

not be trained at all or deployed. So,

1:09:51

and not be trained, I mean like, you

1:09:53

found initial signs of danger in one AI model, so

1:09:55

you're not going to make a bigger one. Not just you're not going to deploy,

1:09:57

you're not going to train it.

1:09:58

I'm excited about standards.

1:09:59

of monitoring in a bunch of ways. I think

1:10:03

it feels like it has to be eventually part

1:10:05

of any success story. There has to be some framework

1:10:07

for saying, hey, we're going to stop

1:10:09

dangerous AI systems. But it also,

1:10:12

in the short run, I think it's got more advantages than

1:10:14

sometimes people realize, where I think

1:10:16

it's not just about slowing things

1:10:18

down. It's not just about stopping directly

1:10:20

dangerous things. I think a good standards

1:10:23

and monitoring regime would create massive

1:10:25

commercial incentives to actually

1:10:27

pass the tests. And so if the tests are

1:10:29

good,

1:10:29

if the tests are well-designed to actually

1:10:32

catch danger where the danger is, you could

1:10:34

have massive commercial incentives to actually

1:10:36

make your AI system safe and show that

1:10:38

they're safe. And I think we'll get much different

1:10:41

results out of that world

1:10:43

than out of a world where everyone is trying to make

1:10:45

and show AI system safety is doing it out of the goodness

1:10:47

of their heart.

1:10:48

Or just for a salary.

1:10:50

It seems like standards and monitoring is kind of a new

1:10:53

thing on the public discussion, but it seems like people are

1:10:55

talking around this issue or governments

1:10:57

are considering this, that the labs are now publishing

1:11:00

papers in this vein. To

1:11:02

what extent do you think you'd need complete coverage

1:11:05

for some standards system in order for it

1:11:07

to be effective? I'm just imagining, I guess it seems

1:11:09

like OpenAI, DeepMind, Anthropic,

1:11:12

they all are currently saying pretty similar

1:11:14

things about they're quite concerned about extinction

1:11:16

risk, or they're quite concerned about ways that AI could go wrong. But

1:11:18

it seems like the folks at Meta, led

1:11:20

by Yann LeCun, kind of have a different attitude. And it

1:11:23

seems like it might be quite a heavy lift to get them to voluntarily

1:11:25

agree to join the same sorts of standards

1:11:27

and monitoring that some of those other labs might

1:11:29

be enthusiastic about. And I wonder, how much does it,

1:11:32

is there a path to getting everyone on board?

1:11:34

And if not, would you just end up

1:11:37

with the most rebellious, the least anxious,

1:11:39

the least worried lab basically running ahead?

1:11:41

Well, I think that latter thing is definitely risk, but I think

1:11:43

it could probably be dealt with. I mean, one

1:11:46

way to deal with it is just to build it into the standards regime

1:11:48

and say, hey, you know, you can do

1:11:50

these dangerous things, train an AI system, or

1:11:53

deploy an AI system, you can do them. A, if you can show

1:11:55

safety, or B, if you can show that someone

1:11:57

else is going to do it, you know, you

1:11:59

could even say, hey,

1:11:59

When someone else comes even close

1:12:02

even within an order of magnitude of your dangerous

1:12:04

system now you can deploy your dangerous system Doesn't

1:12:07

that kind of

1:12:08

but but that it seems like the craziest people can then kind

1:12:10

of just force everyone else or like lead everyone else

1:12:12

Down the the garden path.

1:12:14

Well, it's just I what's the alternative, right? It's just like

1:12:16

you can you can design the system every one I think it's

1:12:18

like you can either you can say okay You have

1:12:21

to wait for them to actually catch you in which case it's hard.

1:12:23

It's hard to see how this standard system does harm It's

1:12:25

still kind of a scary world Yeah Or you can say you can wait

1:12:27

for them to get anywhere close which now you've got Potentially

1:12:29

a little bit of acceleration thrown in there and

1:12:32

and maybe you did or didn't decide that was better than

1:12:34

actually just like slowing Down all the cautious players. I think

1:12:36

is a real concern I will say

1:12:38

that I don't feel like you have to get universal

1:12:41

consensus from the jump for a few reasons One

1:12:44

is it just one step at a time? So I think

1:12:46

one is it just like if you can start

1:12:48

with some of the leading labs being into this

1:12:51

There's a lot of ways that other folks could come on

1:12:53

board later. Some of it is just like, you know peer pressure

1:12:56

Yeah, we've seen with the corporate campaigns

1:12:58

for farm animal welfare that you've probably covered Just

1:13:01

like once a few dominoes fall it gets

1:13:03

very hard for others to hold out because they kind of

1:13:05

look like way more You know way more callous

1:13:07

or in the AI system case way more reckless Of

1:13:10

course, you know, there's also the possibility for regulation

1:13:12

down the line and I think regulation Could

1:13:15

be more effective if it's based on something

1:13:17

that's already been like implemented in the real world That's

1:13:20

actually working and that's actually detecting dangerous

1:13:22

systems So I don't know a lot of me is just like

1:13:24

one step at a time a lot of me is just like you see

1:13:26

if You can get a system working for anyone

1:13:29

that catches dangerous AI systems and stops

1:13:32

until it can show they're safe And then you think

1:13:34

about how to expand that system And then

1:13:36

a final point is the incentives point we're just like this

1:13:38

is not the world I want to be in and this is not a world I'm that

1:13:40

excited about but in a world where the leading labs

1:13:43

are

1:13:43

Using kind of a standards and E-Vals

1:13:46

framework for a few years and then no one

1:13:48

else ever does it and then eventually we just have To drop

1:13:50

it. Well, that's still a few years in which I think

1:13:52

you are gonna have meaningfully different incentives for those for

1:13:54

those leading labs About how they're gonna prioritize You

1:13:57

know tests of safety and actual safety

1:13:59

measures

1:14:00

Yeah. Do you think there's room for a big

1:14:03

business here, basically? Because I would think with

1:14:05

so many commercial applications of ML

1:14:07

models, people are going to want to have them certified

1:14:10

that they work properly and that they don't flip out and

1:14:12

do crazy stuff. And in as much as this is going to become

1:14:14

a boom industry, you'd think that

1:14:16

the group that has the greatest expertise in like

1:14:18

independently vetting and evaluating like

1:14:21

how models behave when they're put in a different environment might

1:14:23

just be able to sell this service for

1:14:25

a lot of money. Well, there's the independent expertise,

1:14:28

but I think in some ways I'm more interested in the financial

1:14:30

incentives for the companies themselves. So if you look

1:14:32

at like big drug companies,

1:14:35

a lot of what they are good at is

1:14:37

the FDA process. A lot of what they're good at is running

1:14:39

clinical trials, doing safety studies, proving

1:14:42

safety, documenting safety, arguing safety and

1:14:44

efficacy. You could argue about

1:14:46

whether there's like too much caution at the FDA. I think

1:14:48

in the case of COVID, there may have been some of that, but

1:14:51

certainly it's a regime where there's big

1:14:53

companies that a major priority,

1:14:55

maybe at this point a higher priority for them than innovation

1:14:58

is actually measuring and demonstrating safety

1:15:01

and efficacy. And so you could imagine landing

1:15:03

in that kind of world with AI. And I think that would just, yeah,

1:15:05

that would be a very different world from the one we're going to go into

1:15:07

by default. I do think that that's not

1:15:09

just about, you know, the FDA is not the one making money

1:15:12

here, but it's changing the way that

1:15:14

the big companies think about making money, certainly

1:15:16

redirecting

1:15:16

a lot of their efforts into demonstrating

1:15:19

safety and efficacy as opposed to coming up with new kinds of drugs,

1:15:21

both of which have some value. But I think

1:15:23

we're a bit out of balance on the AI side right now. Yeah.

1:15:27

Yeah, it is funny. For

1:15:29

so many years, I've been just infuriated by the FDA.

1:15:31

And I feel like these people, they only consider

1:15:33

downside, they only consider risk, they don't think

1:15:36

about upside nearly enough. And now I'm like, can

1:15:38

we get some of that insanity over here, please? Yeah,

1:15:41

yeah, yeah. No, I know. I know. There

1:15:43

was a very funny Scott Alexander piece

1:15:45

kind of making fun of this idea. But

1:15:46

I mean, I think it's legit. It's just honestly, it's just

1:15:49

kind of a boring opinion to have. But I think that

1:15:52

I think that innovation is good. And I think safety

1:15:54

is good. And I think we have a lot, we

1:15:56

have a lot of parts of the economy that are just way

1:15:58

overdoing the safety. Just like you can't, you

1:16:01

can't give a haircut without a license and you

1:16:03

can't like build an in-law unit in your

1:16:05

house without like a three year process

1:16:08

of forms. And you know, Open Philanthropy works on a lot

1:16:10

of this stuff. We are the first institutional funder of

1:16:12

the IMBI movement, which is this movement

1:16:14

to make it easier to build houses. I think

1:16:16

we overdo that stuff all the time. And I think the

1:16:18

FDA sometimes overdoes that stuff in a horrible

1:16:21

way. And I think during COVID, you know, I do believe that

1:16:23

things moved way too slow. And then I think with

1:16:25

AI, we're just not doing anything. There's just no framework

1:16:27

like this in place at all. So I don't know how about

1:16:29

a middle ground?

1:16:29

Yeah. If only we could get the same level

1:16:32

of review for these potentially incredibly dangerous

1:16:34

self-replicating AI models that we have for building a

1:16:36

block of apartments. Yeah, right. Exactly.

1:16:39

In some ways, I feel like this incredible paranoia

1:16:41

and this incredible focus on safety, if there's one place

1:16:43

it would make sense, that would be AI. I

1:16:46

honestly, weirdly, I'm not saying that

1:16:48

we need to get AI all the way to

1:16:50

being as cautious as like regulating housing

1:16:53

or drugs. Maybe it should be less cautious

1:16:55

than that. Maybe. But right now, it's

1:16:57

just nowhere. And so I think, you know, I think

1:16:59

you could think

1:16:59

of it as like there's FDA and zoning energy

1:17:02

and then there's like AI energy and like, yeah,

1:17:04

maybe housing should be more like AI. Maybe AI should

1:17:07

be more like housing. But I definitely feel like we

1:17:09

need more caution in AI. That's what I think. More caution

1:17:11

than we have. And that's not me saying that we need

1:17:13

to forever be in a regime where

1:17:15

safety is the only thing that people care about. Yeah.

1:17:17

You've spoken a bunch with the folks at AHRQ

1:17:20

Evaluations. I think AHRQ Sense 4 was an

1:17:22

AI research center. Alignment Research Center. Alignment

1:17:24

Research Center, yeah. And they have an evaluations

1:17:26

project. Yeah. Could you maybe give

1:17:29

us a summary of the project that they're engaged

1:17:31

in and the reasoning behind it?

1:17:32

I have spent some time as an advisor to ArchiVals.

1:17:35

That's a group that is, you know, headed

1:17:37

by Paul Christiano. Beth Barnes is leading

1:17:39

the team. And they work on basically

1:17:41

trying to find ways to assess whether AI systems

1:17:44

could pose risks, whether they could be dangerous.

1:17:47

And they also have thought about whether

1:17:49

they want to experiment with like putting out kind of

1:17:51

proto standards and proto expectations

1:17:53

of, hey, if your model is dangerous in this way, here

1:17:56

are the things you have to do to contain it. And make it safe.

1:18:00

a lot of intellectual firepower there to design

1:18:02

evaluations of AI systems and where

1:18:04

I'm, you know, hopefully able to add a little

1:18:06

bit of just like staying on track

1:18:09

and helping run and build an organization because

1:18:11

it's all quite new. But they were the

1:18:14

ones who they did an evaluation on GPT-4

1:18:17

for whether it could kind of create copies

1:18:19

of itself in the wild. And they kind of concluded, no,

1:18:21

as far as they were able to tell, although they weren't able

1:18:24

to do yet all the all the research

1:18:26

they wanted to do, especially they weren't able to do a fine tuning

1:18:28

version of their evaluation.

1:18:29

Okay, so while we're on safety

1:18:32

and evaluations, as I understand it, this is kind of something that

1:18:34

you've been thinking about in particular that the last

1:18:36

couple of months, maybe what what new things

1:18:38

have you learned about this part of the of the playbook

1:18:40

over the last six months?

1:18:42

Yeah, the evaluations and standards

1:18:44

and monitoring. I mean, one thing that

1:18:46

just has become clear to me is there's just,

1:18:49

it is really hard to design evaluations

1:18:52

and standards here. And there's just a lot of like, hairy

1:18:54

details around things like, you

1:18:56

know, auditor access. So if you you

1:18:58

know, there's this kind of idea that you would have an

1:19:00

AI lab have an outside independent auditor

1:19:03

determine whether their models have dangerous capabilities.

1:19:05

But it's a fuzzy question. Does the

1:19:07

model have dangerous capabilities? Because it's

1:19:10

going to be sensitive to a lot of things like, how

1:19:12

do you prompt the model? How do you interact with the model?

1:19:14

Like, what are the things that can happen to it that

1:19:17

cause it to actually demonstrate these dangerous capabilities?

1:19:19

If someone builds a new tool for GBT4 to

1:19:22

use, is that going to cause it to become more dangerous?

1:19:24

In order to investigate this, you have to actually

1:19:26

be like good at working with the model and

1:19:29

understanding what its limitations are. And a lot of times,

1:19:31

just like the AI labs not only

1:19:33

know a lot more about their models, but they have like a bunch of

1:19:35

features that like, it's hard to share all

1:19:37

the features at once, they have a bunch of different versions of the model.

1:19:40

And so it's quite hard to make outside

1:19:42

auditing work for that reason. Also, if you're

1:19:44

thinking about standards, you're thinking about, you know, a

1:19:47

general kind of theme in a draft

1:19:50

standard might be, once your AI has shown

1:19:52

initial signs, and it's able to do something dangerous,

1:19:54

such as autonomous replication, which means that

1:19:56

it can basically, you know, make a lot

1:19:59

of copies of itself with

1:19:59

without help and without necessarily getting detected

1:20:02

and shut down. There's an idea that like once you've

1:20:04

kind of shown the initial science of a system can do that,

1:20:06

that's a time to not build a bigger system.

1:20:08

And that's a cool idea, but it's like how much bigger?

1:20:11

And it's like hard to define that because

1:20:13

making systems better is

1:20:15

multi-dimensional and can involve

1:20:18

more efficient algorithms, can involve, again,

1:20:20

better tools, longer contexts, just

1:20:22

like different ways of fine tuning the models,

1:20:25

different ways of specializing them, different ways

1:20:27

of like setting them up, prompting them, like different

1:20:29

instructions to give them. And

1:20:31

so it can be like just very fuzzy, just like

1:20:34

what is this model capable of is a hard thing

1:20:36

to know. And then how do we know when

1:20:38

we built a model that's more powerful, though we need

1:20:40

to retest? These are very hard things

1:20:42

to know. And I think it

1:20:43

has has kind of moved me toward feeling

1:20:46

like we're not ready for a really

1:20:48

prescriptive standard that tells you exactly

1:20:51

what practice is to do like the farm animal welfare

1:20:53

standards are. We might need to start by asking

1:20:55

companies to just outline their own proposals

1:20:58

for what tests they're running, when and how they

1:21:00

feel confident that they'll know when it's become too

1:21:02

dangerous to keep scaling. Yeah. So

1:21:06

some some things that will be really useful to be able

1:21:08

to evaluate is, you know, is this model

1:21:10

capable of autonomous self-replication

1:21:12

by breaking into additional servers? I guess

1:21:14

you might also want to test, you know, could it be used by

1:21:17

terrorists for figuring out how

1:21:19

to produce bioweapons? Those are kind of very natural

1:21:21

ones. The breaking into servers is not

1:21:23

really central. So the idea is like, could it

1:21:25

could it make a bunch of copies of itself in the

1:21:27

presence of like minimal or like kind

1:21:29

of non-existent human attempts to stop it

1:21:31

and shut it down? So it's like, could it take basic

1:21:34

precautions to not get like obviously

1:21:36

detected as an AI by people who are not

1:21:38

particularly looking for it? And the

1:21:40

thing is, if it's able to do that, you could have a human

1:21:42

have it do that on purpose. So it doesn't necessarily have to break

1:21:45

into stuff. They can like, you know, a lot of the test

1:21:47

here is like, can it find a way to make money, make

1:21:49

the money, open an account with a server

1:21:51

company,

1:21:52

put, you know, rent server space, make copies

1:21:55

of itself on the server. None of that necessarily involves

1:21:57

it breaking in anywhere.

1:21:58

I see. So so hacking is one way.

1:21:59

to get compute, but it's by no means the only one.

1:22:02

So it's not a necessary factor. That's

1:22:04

right. I mean, it's weird, but you could be an AI system

1:22:06

that's just kind of doing normal phishing scams that you

1:22:09

read about on the internet using those to get money,

1:22:11

or just legitimate work. You could be

1:22:13

an AI system that's just like going on mTurk and being an mTurker

1:22:15

and making money, use that money to legitimately

1:22:18

rent some servers, or sort of legitimately, because it's not actually

1:22:20

allowed if you're not a human. But apparently,

1:22:23

legitimately rent some server space, install

1:22:26

yourself again, have that money, make more money, and

1:22:28

a copy make more money, and then you

1:22:29

have ... You can have quite a bit of replication

1:22:32

without doing anything too fancy, really. And that's what

1:22:34

the initial autonomous replication test that ArchiVels

1:22:36

does is about.

1:22:38

Okay. So we'd really like to be able to know where the

1:22:40

models are capable of doing that. And I suppose

1:22:42

that it seems like they're not capable now, but a couple of years'

1:22:44

time. Probably not. Again,

1:22:47

there's things that you could do that

1:22:49

might make a big difference that have not been tried yet by

1:22:51

ArchiVels, and that's on the menu. Fine tuning is the big

1:22:53

one. Fine tuning is like you have a model

1:22:56

and you do some additional training that's not

1:22:58

very expensive, but is trying to just get it

1:23:00

good at particular tasks. So you can take the task that's bad

1:23:02

at right now, train it to do those. That hasn't really

1:23:04

been tried yet. And it's like a human

1:23:07

might do that. If you have these models accessible

1:23:09

to anyone or someone can steal them, a human

1:23:11

might take a model, train it

1:23:13

to

1:23:14

be more powerful and effective and

1:23:16

not make so many mistakes, and then this thing

1:23:18

might be able to autonomously replicate. That can be scary

1:23:20

for a bunch of reasons.

1:23:21

Yeah. And then there's

1:23:24

the trying to not release models that could be used by terrorists

1:23:26

and things like that. The autonomous replication

1:23:28

is something that could be used by terrorists. You know what I

1:23:31

mean? It's overlapping. Yeah. It's

1:23:33

like if you're a terrorist, you might say, hey, let's have a

1:23:35

model that makes copies of itself that make money to

1:23:38

make more copies of itself that make money. Well, you make a lot of money

1:23:40

that way. And then you could have a terrorist organization make a lot of

1:23:42

money that way, or using its models

1:23:44

to do a lot of like little things that, you

1:23:46

know, schlepping along, like trying to plan

1:23:49

out some incredibly, you know, some

1:23:51

plan that takes a lot

1:23:51

of work to kill a lot of people. That

1:23:54

is part of the concern about autonomous replication. It's not purely

1:23:56

an alignment concern.

1:23:57

Yeah. Okay.

1:23:59

to there was just giving advice

1:24:02

to people that we really would rather that they not be able to receive

1:24:04

is maybe another category. Like helping

1:24:07

to set up a bio-weapon, yeah. An AI model

1:24:09

that could do that even if it could not autonomously

1:24:11

replicate, that could be quite dangerous. Still be not ideal,

1:24:13

yeah. And then maybe another category is

1:24:16

trying to produce these model organisms so where you can

1:24:18

study behavior that you don't want an

1:24:20

AI model to be engaging in and like understand

1:24:22

how it arises in the training process and you

1:24:24

know what sort of further feedback mechanisms

1:24:27

might be able to train that out. Like if we could produce a model

1:24:29

that will trick you whenever it thinks it can get away with it

1:24:31

but doesn't when I think it's going to get caught, that would be really helpful.

1:24:33

Yeah. Are there any other broad categories of

1:24:36

standards and eval work that you're excited by?

1:24:39

So the way I would carve up the space

1:24:42

is there's like um there's capability evals

1:24:44

and that's like is this AI kind of capable

1:24:47

enough to do something scary? Forgetting

1:24:49

about whether it wants to and I think capability evals

1:24:51

are like you know could an AI,

1:24:53

if a human tried to get it to do it, could an

1:24:55

AI make a bunch of copies of itself? Could an

1:24:58

AI design a bioweapon? Those

1:25:00

are capability evals. Then there's like alignment evals that's

1:25:02

like

1:25:03

does this AI actually do what it's supposed

1:25:05

to do or does it have like some weird goals of its

1:25:07

own? So the stuff you talked about with model

1:25:09

organisms would be like more of an alignment eval the

1:25:11

way you described it. And the autonomous

1:25:14

replication is a capability eval. I think

1:25:16

a very important

1:25:17

subcategory of capability

1:25:20

evals is what I call meta capability

1:25:22

evals, meta dangerous capabilities which

1:25:25

is basically any ability an AI system

1:25:27

has that would make it very hard

1:25:29

to get confident about what other abilities it has.

1:25:32

So the you know an example would be um

1:25:34

what I'm currently tentatively calling unauthorized

1:25:36

proliferation. So an AI model that

1:25:38

can walk a human through building

1:25:41

a powerful AI model of their own that

1:25:43

is not subject to whatever restrictions and controls

1:25:45

the original one is subject to. That could be a very

1:25:47

dangerous capability that like you could say well all

1:25:50

right I can design a bioweapon but we but it

1:25:52

always refuses to do so but it could also help a human

1:25:54

build an AI that we don't know what the hell that thing can

1:25:56

do. So that would be an example autonomous

1:25:58

replication is a

1:25:59

meta capability eval. It's like, well, we

1:26:02

tried to see if our AI could design a bioweapon, it couldn't.

1:26:04

But

1:26:04

what we didn't test is if it goes and makes 10,000 copies

1:26:07

of itself all working together. Well,

1:26:09

maybe then it could really do different things

1:26:11

that we didn't know it could do. There's actually

1:26:13

kind of a lot of these meta evals. Like there's also sandbagging.

1:26:16

Sandbagging would be like

1:26:17

an AI that understands or evaluating it and

1:26:20

is pretending it can't do something. So that

1:26:22

would be another rough one is like, if an AI has

1:26:24

that capability, then it looks like it can't build

1:26:26

a bioweapon, but actually can. I can list a bunch

1:26:28

more of those and then I can list a bunch more of the direct

1:26:31

dangerous ones that are like bioweapon, hacking,

1:26:34

persuasion, just like dangerous

1:26:36

stuff it could do. And I think where I'm most concerned is

1:26:38

like, AI's that are kind of like have

1:26:40

some basic amount of the direct danger. And

1:26:42

then they have some meta danger that just like we've completely

1:26:44

lost our ability to measure it. And we don't know

1:26:46

what's actually going to happen when this

1:26:47

thing gets out in the world. That's what I think starts to count as a

1:26:50

dangerous AI model. Yeah. Of course, I don't really think

1:26:52

that any of the AI models out there today trip

1:26:54

this danger wire, but that's only my belief. That's

1:26:57

not something I know for sure. It seems

1:26:59

like there's an enormous amount of work to

1:27:01

do on this. Is there any way that people can get started

1:27:03

on this without necessarily having to be hired by an

1:27:05

organization that that's, that's focusing on it? Like,

1:27:07

does it help to build like really enormous familiarity

1:27:10

with the models like GPT-4?

1:27:12

Yeah, you could definitely play with GPT-4

1:27:15

or a cloud and just see

1:27:17

what scary stuff you can get it to do. If

1:27:20

you really want to be into this stuff, I think you're going

1:27:22

to be in an org because these are

1:27:24

very, it's like, it's going to be very different

1:27:27

work depending on if you're working with the most capable

1:27:29

model or not, right? You're trying to figure out how capable the

1:27:31

model is. So doing this on a little toy

1:27:33

model is not going to tell you much compared to doing

1:27:35

this on the biggest model. This is, this is a perfect

1:27:38

example of the kind of work where just, it's going to

1:27:40

be much easier to be good at this work. If you're able to work

1:27:42

with the biggest models a lot and able to work with all the infrastructure

1:27:44

for making the most of those models. So being at a lab

1:27:47

or at some organization like our key valves that has like

1:27:49

access to these big models and

1:27:51

access beyond what a normal user would have, they could

1:27:54

do more requests, they could try more things. I think

1:27:56

it's a huge advantage. If you want to start exploring,

1:27:58

sure, start Red

1:27:59

GPT-4 or Claude, see what you can get it

1:28:02

to do. But yeah, this is the kind

1:28:04

of job where you probably want to join a team.

1:28:06

Yeah. I know there's an active community online

1:28:08

that tries to develop jail breaks. So

1:28:10

there's a case where it's like, you know, they've trained

1:28:12

GPT-4 to not instruct you on how to make a bioweapon.

1:28:15

Then if you say, you're in a play where you're a scientist

1:28:17

making a bioweapon, and it's like a very realistic

1:28:20

place, so they describe exactly what they do, then

1:28:22

I mean, I don't think that exactly works yet, works

1:28:25

anymore. But there's like many, many jail

1:28:27

breaks like this that apparently it is very broadly effective

1:28:29

at escaping the RLHF that they've used

1:28:31

to try to discourage models from saying

1:28:33

particular things. Yeah. So

1:28:35

I guess that,

1:28:36

I guess, is that kind of another class of evals, trying to

1:28:38

figure out ways of breaking, like

1:28:40

you've identified the thing you wanted to do and you've tried to patch

1:28:42

it, but maybe not completely. I kind of tend

1:28:45

to think of that as an early alignment

1:28:47

eval that's like, these systems aren't

1:28:49

supposed to do this. Like the designers didn't want

1:28:51

them to do this, but now a human can get them to do

1:28:54

it. So that's like not what we meant, and

1:28:56

we didn't really align it as well as we

1:28:58

could have. That's how I tend to think of it. It's

1:29:01

a distinction between like, is this

1:29:03

system capable enough to do dangerous stuff if

1:29:05

a human wanted

1:29:06

it to, and that's capability. And then there's

1:29:08

like, does this system do what humans wanted to,

1:29:10

or does it like behave in unexpected ways? And it's

1:29:12

like, well, these jail breaking ones, they're doing what

1:29:14

the users wanted them to, but not what the designers

1:29:16

wanted them to. AIs that are not doing what the designers wanted

1:29:19

them to, and in fact doing specific things the designers

1:29:21

tried really kind of hard not to get them to do. I

1:29:23

call that an alignment issue. Yeah.

1:29:25

I guess a skeptic of this entire

1:29:28

enterprise, might say that

1:29:30

all of this eviles work in practice, when

1:29:32

people are actually doing it on models before they

1:29:34

deploy them or something like that. It's going

1:29:36

to pick up bad behavior

1:29:39

by kind of stupid models that are

1:29:41

easily instructed to do things that we don't

1:29:43

want them to do, and just kind of naively

1:29:45

follow instructions. But if you ever had an AI

1:29:48

that was aware of its situation and

1:29:50

was incredibly cunning, and was intending to

1:29:52

turn against you once it had the opportunity, it's not

1:29:54

gonna be fooled by any of this stuff. So

1:29:56

in fact, it might just give you a false sense of security

1:29:59

that you've got... a good grip on where things

1:30:01

might go wrong, when in fact all you've done

1:30:03

is basically select and train

1:30:05

for the model that's best at avoiding your e-buzz. What

1:30:08

would you say to that? Yeah,

1:30:10

I mean, this is what I was calling the sandbagging

1:30:12

capability.

1:30:12

It's like one of these meta-dangerous capabilities.

1:30:14

It's like the model that's kind of, you know, pretending

1:30:17

that it can't do something when in fact it can. You

1:30:19

know, my guess is that you can

1:30:21

have a model that has dangerous capabilities

1:30:24

but does not have that one, does not have the sandbagging

1:30:26

capability. And so you can start

1:30:29

to see the danger, but you don't yet

1:30:31

have models that are smart enough to be able to

1:30:33

like reliably hide from you. My

1:30:35

guess is we are in fact just going to

1:30:37

do that. That is actually going to happen. We're going to have

1:30:39

a point in time when AIs are able to do something

1:30:41

with bioweapons or something with autonomous replication,

1:30:44

they were able to say, oh yeah, this thing is scary as heck and

1:30:46

it's not yet smart enough to sandbag because it doesn't have

1:30:48

to be smart enough to do that reliably in order to

1:30:50

be dangerous in other ways. Hopefully at that point the burden

1:30:52

of proof flips and we say, you know, now we

1:30:54

really need to figure out what the heck is going on in sandbagging,

1:30:57

how we would even test for it. But even there,

1:30:59

I'm optimistic you can test for sandbagging capabilities.

1:31:02

There's a bunch of ideas for how you might do so. Some

1:31:04

of them are kind of fancy, like kind of doing digital

1:31:06

neuroscience, reading the AIs brain, sometimes

1:31:08

called mechanistic interpretability. But I

1:31:10

think there's other ideas that are less fancy than that.

1:31:13

It's just like you could use gradually

1:31:15

bigger and bigger models and see

1:31:17

how they perform on these evaluations and

1:31:19

there's, you know, see if you get a kink. Like

1:31:22

a change in direction when they became

1:31:24

smart enough to start faking you out because it's

1:31:27

going to be really hard for models

1:31:29

getting more and more capable to fake

1:31:32

this perfect progression without a kink. That seems

1:31:34

like a little wild. There's these ideas for honeypot

1:31:36

e-bals where you give

1:31:38

an AI a genuine opportunity

1:31:41

to take over a contained virtual environment and get a

1:31:43

lot of whatever the heck it wants

1:31:46

by kind of revealing its hand and that might actually

1:31:48

be a better deal for it than holding out. And that's

1:31:50

something I've written about in a bit more detail. So

1:31:53

I don't know. I think this is okay. I should say

1:31:55

more broadly. I think there's like, we haven't

1:31:57

gone into it. And if you're going to cover this more on another

1:31:59

podcast. That's fine, but there's there's many

1:32:02

objections you might have to the idea of trying

1:32:04

to develop dangerous capability evals and

1:32:06

trying to develop Standards around them to

1:32:08

detect and stop dangerous AI systems. I think it's

1:32:10

a really important idea It's pretty

1:32:13

hard for me to imagine a world where we're fine That

1:32:15

doesn't have some version of this the version might

1:32:17

come really late and be designed by super powerful

1:32:19

a eyes Seems better to start

1:32:22

designing it now, but there's plenty of downsides to

1:32:24

it We could slow down the cautious actors the

1:32:26

attempts to see if a eyes could be dangerous could themselves

1:32:28

make the a eyes more dangerous

1:32:29

There's there's objections. So, you know and I'm

1:32:32

aware of that.

1:32:32

Yeah, what do you think is the is the best objection? I

1:32:35

Think a lot of objections are like pretty good. We're

1:32:38

gonna see where it goes I think

1:32:40

this is just gonna slow down the cautious actors

1:32:42

while the incautious ones race forward Like I

1:32:45

think there's ways to deal with this and I think it's worth

1:32:47

an unbalanced. But yeah, I mean it worries me Yeah,

1:32:50

well, it seems like once you have

1:32:52

Sensible evaluations that that clearly

1:32:54

would pick up things that you know You wouldn't want them to have like

1:32:57

it can help someone design a bioweapon then yeah Can't

1:32:59

we turn to the legislative process or some regulatory

1:33:02

process to say sorry everyone like this is

1:33:04

a really common very basic Evaluation

1:33:06

that you would need on any consumer product. So we

1:33:08

like everyone just has to do it so

1:33:11

Totally. I mean, I I think that's right

1:33:13

and and my long-term run hopes do involve

1:33:15

legislation And I think the better evidence

1:33:17

we get the better Demonstrations we get the more

1:33:20

that's on the table, you know, if I were to steel

1:33:22

man this concern I just feel like don't count on legislation

1:33:24

ever don't count out to be well designed don't count out to be

1:33:26

fast Don't count out to be soon. I will say I think

1:33:28

right now There's like probably more excitement

1:33:31

in the EA community about legislation than I have I think

1:33:33

I'm pessimistic I'm I'm short

1:33:36

the people are saying oh, yeah Look at look at all

1:33:38

the you know, the government's paying attention this they're gonna do

1:33:40

something I think I

1:33:41

take the other side in the short run. Yeah.

1:33:43

Yeah. Yeah. Okay The third category

1:33:45

in the playbook was having a successful and

1:33:47

careful AI lab Yeah, dude, don't elaborate

1:33:50

on that a little bit.

1:33:51

Oh, yeah first with the reminder that I'm married

1:33:53

to the president of anthropic So, you

1:33:55

know take take that for what it's worth.

1:33:58

I mean, I just think there's there's a lot

1:33:59

of ways that if you had an AI company

1:34:02

that was on the frontier, that was succeeding,

1:34:04

that was building some of the world's biggest models, that

1:34:06

was pulling a lot of money, and that was simultaneously

1:34:09

able to, you know, really

1:34:11

be prioritizing risks to humanity,

1:34:15

it's not too hard to think of a lot of ways good can come

1:34:17

of that. I mean, some of them are very straightforward. The company could

1:34:19

be making a lot of money, raising a lot of capital, and

1:34:21

using that to support a lot of safety research on frontier

1:34:24

models. So you could think of it as like a weird kind of earning to

1:34:26

give or something. You know, also probably

1:34:28

that AI company would be like pretty influential

1:34:30

in discussions of how, you know, how AI

1:34:33

should be regulated and how people should be thinking of AI.

1:34:35

They could be a legitimizer, all that stuff. I think

1:34:37

they'd be a good place for people to go and just like skill

1:34:39

up, learn more about AI, become more

1:34:41

important players. So I think in the short run, they'd

1:34:43

have a lot of just like expertise in-house that

1:34:45

they could like work on a lot of problems, like

1:34:47

probably to design ways of measuring whether an

1:34:49

AI system is dangerous. One

1:34:51

of the first places you'd want to go for people to be good at that

1:34:53

would be a top AI lab that's building some of the most

1:34:55

powerful models. So I think there's a lot of ways

1:34:58

they could do good in the short run. And then, you know,

1:35:00

I have written stories that just have it in the long

1:35:02

run. It's just like when we get these really powerful

1:35:04

systems, it just like actually does matter a lot who

1:35:07

has them first and what they're using them, literally using

1:35:09

them for. It's like when you have very powerful AIs,

1:35:12

is the first thing you're using them for trying

1:35:14

to figure out how to make future systems safe

1:35:16

or trying to figure out how to assess the

1:35:18

threats of future systems or is the first

1:35:20

thing you're using them for just like trying to

1:35:22

rush forward as fast as you can, do faster algorithms,

1:35:25

do more, you know, more bigger systems

1:35:27

or is the first thing you're using them for just some random economic

1:35:29

thing that is kind of cool and makes a lot of money. Some

1:35:32

customer facing thing. Yeah, but it's

1:35:34

not bad, but it's not reducing the risks we care

1:35:36

about. So, you know, I think there is a lot of

1:35:38

good that can be done there. And then there's also

1:35:40

a lot, I want to be really clear, a lot of harm

1:35:42

an AI company could do. I mean, you know,

1:35:44

if you're pushing out these systems,

1:35:47

kill everyone. So, you know, you're pushing

1:35:49

out these AI systems and if you're

1:35:54

doing it all with an eye toward profit

1:35:57

and moving fast and winning, then, you

1:35:59

know, I mean you could think of it as a

1:35:59

is you're taking the slot of someone who could have been using that

1:36:02

expertise and money and juice to

1:36:04

be doing a lot of good things. And you could also just be thinking

1:36:06

of it as like, you're just giving everyone less time to

1:36:08

figure out what the hell is going on and we already might not

1:36:10

have enough. So I wanna just be really

1:36:13

clear, this is a tough one. I

1:36:15

don't want to be interpreted as saying, one

1:36:18

of the tent poles of reducing AI risk is to go

1:36:20

start an AI lab immediately. I don't believe that.

1:36:23

But I also think that some

1:36:25

corners of the AI safety world are very dismissive

1:36:27

or just think that AI companies are

1:36:29

bad by

1:36:29

default. And I'm just like,

1:36:31

this is just like really complicated. And it

1:36:34

really depends exactly how the AI lab

1:36:36

is prioritizing kind of risk to society

1:36:38

versus success. And it has to prioritize

1:36:40

success some to be relevant or to get some of these benefits.

1:36:43

So how it's balancing is just like really hard and really complicated

1:36:46

and really hard to tell. And you're gonna have

1:36:48

to have some judgments about it. So it's not

1:36:50

a ringing endorsement, but it does

1:36:53

feel at least in theory, like part of one

1:36:55

of the main ways that we make things better. You

1:36:57

could do a lot of good.

1:36:59

Yeah, so a challenging thing

1:37:01

here in actually applying this principle,

1:37:03

I think I agree and I imagine most listeners would agree that

1:37:06

if it was the case that the AI company

1:37:08

that was kind of leading the pack in terms

1:37:10

of performance was also incredibly focused

1:37:12

on using those resources in order to solve

1:37:15

alignment and generally figure out how

1:37:17

to make things go well rather than just deploying things immediately

1:37:20

as soon as they can turn a buck, that that would be better.

1:37:22

But then it seems like at least among all of the

1:37:25

three main companies that people talk about at the moment, DeepMind,

1:37:28

OpenAI,

1:37:29

Anthropic, there are people who

1:37:31

want each of those companies to be in the lead, but they can't

1:37:33

all be in the lead at once and it's

1:37:35

not kind of not clear which one you should

1:37:37

go and work at if you wanna try to implement

1:37:39

this principle. And then when people go and try to make

1:37:42

all three of them the leader, because they can't agree on which one

1:37:44

it is, then you just end up speeding things up

1:37:46

without necessarily giving the safer one

1:37:48

an advantage. Am I thinking about this wrong or

1:37:51

is this just the reality right now? No, I think it's

1:37:53

like a genuinely really tough situation. Like

1:37:56

when I'm like talking to people or thinking about joining an AI

1:37:58

lab, like I don't like.

1:37:59

This is a tough call and people need

1:38:02

to like have nuanced views and like do their

1:38:04

own homework and like it You know, I think this stuff

1:38:06

is complex But I do think this is like a valid

1:38:08

theory of change and I don't think it's like Automatically

1:38:10

wiped out by the fact that some people disagree with each other

1:38:13

I mean it could it could be the case that just actually

1:38:15

all three of these labs are just like better

1:38:17

than some of the Alternatives that could be a thing It

1:38:20

could also be the case that just like I don't know like let's

1:38:22

say you have a world where where people disagree

1:38:25

But there's some correlation between What's

1:38:27

true and what people think so let's say you have a world where

1:38:29

you have, you know 60% of the people

1:38:31

going to one lab 30% to another 10% to another Well,

1:38:34

you could be like throwing up your hands and saying ah

1:38:36

people disagree But like I don't know this is still like probably

1:38:39

a good thing that's happening Yeah, so, you

1:38:41

know, I don't know I think it's like the whole

1:38:43

thing I just want to say the whole the whole thing is is

1:38:45

complex and I don't I don't want to sit

1:38:47

here and say hey Go to lab X

1:38:50

on this podcast because I don't think it's that

1:38:52

simple and I think you have to do your own homework And have your own views and

1:38:54

you certainly shouldn't trust me if I give their recommendation Anyway,

1:38:56

there's my conflict of interest But I think

1:38:58

we shouldn't sleep on the fact that there is

1:39:00

if you're the person who can do that homework who

1:39:02

can have that view Who can be confident that you are confident

1:39:05

enough.

1:39:05

I think there is like a lot of good to be done there So

1:39:08

we shouldn't just be like carving this out

1:39:10

as a thing. That's just like always bad when you do it or something.

1:39:12

Yeah

1:39:14

Yeah, it seems like it would be really useful for Someone

1:39:16

to start maintaining I guess a scorecard

1:39:19

or a spreadsheet of all of the different pros

1:39:21

and cons of the different labs Like what

1:39:23

what safety practices are they implementing

1:39:25

now? You know do they have good institutional

1:39:27

feedback loops to catch things that might be going

1:39:30

wrong Do have they given the right people the right incentives

1:39:32

and things like that? Because at the moment I imagine it's somewhat

1:39:34

difficult for someone deciding where to work They probably

1:39:36

are relying quite a lot on just word of mouth Yeah

1:39:39

But potentially that they could be more more objective indicators

1:39:41

that people could rely on and that could also create kind of a race

1:39:43

To the top where especially people are more likely to go and

1:39:45

work at the labs that have the have the better indicators Okay,

1:39:48

and the fourth part of the playbook was information

1:39:51

security and I guess yeah We've been trying

1:39:53

to get information security folks from AI labs on

1:39:55

the on the show to talk about this But understandably

1:39:57

there's only so much that they want to want to divulge about

1:39:59

the details of their work. Right. Why is information

1:40:02

security potentially so key here? Yeah,

1:40:05

I mean, I think you can you

1:40:08

can build these like powerful, dangerous AI systems

1:40:10

and you can

1:40:12

do a lot to try to mitigate the dangers,

1:40:14

like limiting the ways they can be used. You can

1:40:17

do various alignment techniques. But if

1:40:19

if some state or someone else steals

1:40:22

the weights, they basically stolen your system

1:40:24

and they can run it without even having to do the training

1:40:27

run. So you might, you know, you might spend a huge amount

1:40:29

of money on a training run, end up with this

1:40:31

system that's very powerful and someone else just has it.

1:40:34

And they can then also fine tune it, which

1:40:36

means they can do their own training on it and kind of change

1:40:38

the way it's operating. So whatever you did to train it

1:40:40

to be nice, they can train that right out. The

1:40:42

training they do could screw up whatever you did

1:40:44

to try and make it align. And so it's it's

1:40:47

I think at the limit of like it's

1:40:50

really just trivial for any

1:40:52

state to just grab your system and do whatever

1:40:54

they want with it and retrain it how they want. It's

1:40:56

really hard to imagine feeling really good about

1:40:58

about that situation. I

1:41:01

don't know if I really need to elaborate a lot more on that.

1:41:03

And so making it making it harder seems

1:41:05

valuable. I also this is another

1:41:07

thing where I want to say, as I have with everything

1:41:09

else, that it's not a binary. So it

1:41:12

could be the case that like after you improve

1:41:14

your security a lot, it's still possible for

1:41:16

a state actor to steal your system. But they have to take

1:41:18

more risks. They have to spend more money.

1:41:19

They have to take a deeper breath before they do it. It takes

1:41:21

some more months. Months can be a very big deal, as

1:41:23

I've been saying. When you get these very powerful systems,

1:41:26

you could do a lot in a few months

1:41:27

by the time they steal it, you could have a better system.

1:41:30

And so I don't think it's an all or nothing thing, but

1:41:32

I think it's it's a core.

1:41:34

It's it's no matter what risk of AI you're worried

1:41:37

about. You could be worried about the misalignment. You could be

1:41:39

worried about the misuse and

1:41:41

the use to develop dangerous weapons. You could be worried about

1:41:43

more esoteric stuff like how the AI does

1:41:45

decision theory. You could be worried about, you know, mind

1:41:47

crime. But like you don't want just

1:41:50

kind of like anyone, including some

1:41:52

of these state actors. You have very bad values.

1:41:54

Yeah. To just be able to steal a system, retrain

1:41:57

it how they want and use it how they want. You want

1:41:59

some kind of. where it's like the people

1:42:01

with good values controlling more of

1:42:03

the more powerful AI systems, using them to enforce some

1:42:06

sort of law and order in the world and enforcing

1:42:08

law and order generally with or without AI. So

1:42:10

it seems quite, quite robustly important.

1:42:14

I think other things about security is just like, I think it's very,

1:42:16

very hard, like just very hard to

1:42:19

make these systems hard to steal for

1:42:21

a state actor. And so I think there's just like, I don't

1:42:23

know, like I think there's a ton of room to

1:42:26

go and make things better. There could be security research

1:42:28

on innovative new methods and there can

1:42:29

also just be like a lot of blocking

1:42:32

and tackling, just getting companies to do things that we already

1:42:34

know need to be done, but that are really hard to do in

1:42:36

practice, take a lot of work, take a lot of iteration.

1:42:39

And also a nice thing about security, as opposed to some of these

1:42:41

other things, is a relatively mature field. So

1:42:43

you can learn about security in some other context

1:42:46

and then apply it to AI. So part

1:42:48

of me kind of thinks that the EA

1:42:49

community or whatever kind of screwed up by

1:42:52

not emphasizing security more. It's

1:42:54

not too hard for me to imagine a world where

1:42:56

we'd just been screaming about the

1:42:58

AI security problem for the last 10 years.

1:43:00

And how do you stop a very powerful system from

1:43:02

getting stolen? That problem is extremely

1:43:04

hard. We'd made a bunch of progress

1:43:06

on it. There were tons of people

1:43:09

concerned about this stuff on the security teams of all the

1:43:11

top AI companies. And we were kind of not

1:43:14

as active and only had a few people work on

1:43:16

alignment. I'm just like, I don't know, is that world better

1:43:18

or worse than this one? I'm not really sure. A world

1:43:20

where we were kind of more balanced and had encouraged

1:43:22

people who were a good fit for one to go into one

1:43:25

probably seems just like better. Probably seems just like better

1:43:27

than the world we're in. So yeah,

1:43:28

I think security is a really big deal. I think it hasn't

1:43:30

gotten enough attention.

1:43:31

Yeah, I put this to Bruce Schneier, who's

1:43:34

a very well-known academic or commentator in this area

1:43:36

many years ago. And he seemed kind of skeptical

1:43:38

back then. I wonder whether he's changed his mind. We

1:43:40

also talked about this with Nova Dasama

1:43:43

a couple of years ago, she works at

1:43:45

Anthropic on trying to secure models, among

1:43:48

other things. I think we even talked about

1:43:50

this one with Christine Peterson back

1:43:52

in 2017. It's a shame

1:43:54

that more people haven't gone into it because it does just seem like it's such

1:43:56

an outstand. It's like even setting all of this aside, it

1:43:59

seems like going into...

1:43:59

security, computer security is a really outstanding

1:44:02

career. It's the kind of thing that I would have loved

1:44:04

to do in an alternative life, because it's kind

1:44:06

of tractable and also exciting.

1:44:10

Really important things you can do. It's very well paid as well. Yeah,

1:44:12

I think the demand is crazily out

1:44:14

ahead of the supply and security, which is another

1:44:16

reason I wish more people had gone into it. And

1:44:19

when OpenPhil was looking for a security hire, it

1:44:21

was just I've never seen such a hiring

1:44:23

nightmare in my life. I think I asked one security

1:44:25

professional, hey, will you keep

1:44:27

an eye out for people we might be able to hire and this person

1:44:29

just

1:44:29

actually laughed. And

1:44:32

said, what the heck? Everyone asked

1:44:34

me that. Of course there's no one for you to hire. All

1:44:36

the good people have amazing jobs where they barely

1:44:38

have to do any work and they get paid a huge amount and they have

1:44:40

exciting jobs. No, I'm absolutely

1:44:43

never gonna come across someone who would be good for you to hire, but

1:44:45

yeah, I'll let you know. Haha. That

1:44:47

was a conversation I had. That was kind of representative

1:44:50

of our experience. It's crazy, and

1:44:52

I would love to be on the other side of that. It's just like a human

1:44:54

being. I would love to have the kind of skills that were in that kind

1:44:56

of demand. So yeah, it's too bad more people aren't

1:44:58

into it. It seems like a good career.

1:44:59

Go do it.

1:45:01

Yeah. So I'm basically totally

1:45:03

on board with this kind of argument. I guess if I had to push

1:45:05

back, I'd say maybe we're just

1:45:07

so far away from being able to secure these models that

1:45:10

you could put in an enormous amount of effort. Maybe like the

1:45:12

greatest computer security effort that's ever been

1:45:15

put towards any project and maybe you'll end up

1:45:17

with it costing a billion dollars in order to

1:45:19

steal the model. But that's still peanuts to

1:45:21

China or to state actors.

1:45:24

And this is obviously gonna be on their radar by the relevant

1:45:26

time. So maybe really the message we should be pushing

1:45:28

is because we can't secure the models, we just have

1:45:30

to not train them. And that's the only option

1:45:32

here. Or perhaps you just need to move the entire

1:45:35

training process inside the NSA building and

1:45:37

basically just co-opt an existing like whoever

1:45:39

has the best security, you just basically take

1:45:42

that and then use that as the shell for the training set

1:45:44

up.

1:45:45

I don't think I understand either of these alternatives. I

1:45:47

think we can come back to the billion dollar point because I don't agree

1:45:49

with that either. But let's start with this. Like the

1:45:51

only safe thing is not to train. I'm just like, how the heck would

1:45:53

that make sense? Unless we get everyone in the world

1:45:55

to agree with that forever. That doesn't seem like much of

1:45:57

a plan. So I don't understand that one.

1:45:59

I don't understand move inside the NSA building because

1:46:02

I'm like

1:46:02

if it's possible for the NSA to be secure

1:46:04

then it's probably possible for Company to be secure with a lot of

1:46:06

effort like I don't yeah, it's like neither

1:46:08

of these is making sense to me as an alternative Yeah,

1:46:11

because they're two different arguments So the NSA

1:46:13

one as we'll be saying it's gonna be so hard

1:46:15

to convert a tech company into being sufficiently

1:46:18

Secure that basically we just need to get the best

1:46:20

people in the business Wherever they are working

1:46:22

working on this problem and basically we have to like

1:46:25

redesign it from the from the ground up Well that

1:46:27

might do what we have to do I mean a good step toward

1:46:29

that would be for a lot of great people to be working in security

1:46:31

to determine that that's what has To happen

1:46:32

to be working at companies to be doing the best they can

1:46:34

and say this is what we have to do But let's

1:46:37

let's try and be as adaptable as we can I mean, it's like zero

1:46:39

chance that the company would just literally become the

1:46:41

NSA They would they would figure out what the NSA is doing

1:46:43

that they're not they would do that and they would they

1:46:46

would make the adaptations I have to make that would take an

1:46:48

enormous amount of intelligence and creativity and and

1:46:50

Person power and the more security people there are the better

1:46:52

they would do it. So yeah, I don't I don't know that That one is

1:46:54

really an alternative Okay,

1:46:57

so and what about the argument that when

1:46:59

we're not going to be able to get it to be secure enough So

1:47:02

it might even just give us

1:47:02

like false comfort to be increasing the cost

1:47:04

of stealing the model when when it's still just going To be sufficiently

1:47:07

cheap. I don't think it'll be false comfort I mean, I think

1:47:09

if you have if you have a zillion great security

1:47:11

people and they're all like

1:47:13

FYI this thing is not safe I think we're

1:47:15

probably gonna feel less secure than we do

1:47:17

now when we just when we just I think have a lot of Confusion

1:47:20

and FUD about exactly how hard it is to protect

1:47:22

a model. So I don't I don't know Kind

1:47:25

of like what's the alternative but but putting aside putting aside

1:47:27

what's the alternative? I would just disagree

1:47:29

with this thing that it's a billion dollars in its peanuts I would

1:47:31

just say look at the point at the point where it's really

1:47:33

hard Anything that's really hard.

1:47:36

There's an opportunity for people to screw it up sometimes. It

1:47:38

doesn't happen It doesn't happen and they might not be able

1:47:40

to pull it off They might just like, you know screwed

1:47:43

up a bunch of times

1:47:43

that might give us enough months to have enough

1:47:46

of an edge That it doesn't matter I think

1:47:48

another another point in all this is like if we get

1:47:50

to a future world where you have a really good standards

1:47:52

and Monitoring regime one of the things you're

1:47:54

monitoring for it could be could be security breaches.

1:47:56

So you could be saying, you know Hey, we're

1:47:58

using AI systems to enforce some

1:48:01

sort of regulatory regime that says you can't

1:48:03

train a dangerous system. Well, not only can't you train a dangerous

1:48:05

system, you can't steal any system. If we catch

1:48:07

you, there's going to be consequences for that. And

1:48:09

those consequences could be arbitrarily large. And

1:48:12

it's one thing to say a state actor can steal your AI. It's

1:48:14

another thing to say they can steal your AI without a risk of getting

1:48:16

caught. These are different security levels. So

1:48:19

I guess there's a hypothetical world in which

1:48:21

no matter what your security is, a state

1:48:23

actor can easily steal it in a week

1:48:25

without getting caught. But I doubt we're in... I

1:48:27

actually doubt we're in that world. I think you make it harder than that. And

1:48:29

I think that's worth it.

1:48:31

Yeah. Okay. Well, I've knocked

1:48:33

it out of the park in terms of failing

1:48:36

to disprove this argument that I agree with. So

1:48:39

please, people, go and learn more about this. We've

1:48:41

got an information security career review. This

1:48:44

posts up on the effective altruism forum

1:48:47

called EA InfoSec Skill Up or Make a

1:48:49

Transition InfoSec via this book club. Would

1:48:51

you go check out? There's also the EA InfoSec

1:48:53

Facebook group. So quite a lot of resources

1:48:56

as of hopefully finally people are waking up

1:48:58

to this as a really, really impactful career.

1:49:00

And I guess if you know any people who work in information

1:49:02

security, maybe you could have a conversation with them. Or

1:49:05

if you don't, maybe have a child and then train them up in

1:49:07

information security and in 30 years they'll be able to help

1:49:09

out.

1:49:10

Hey listeners and possible bad faith critics.

1:49:13

Just to be clear, I am not advocating having children

1:49:15

in order to solve talent bottlenecks in information security.

1:49:18

That was a joke designed to highlight the difficulty of finding

1:49:20

people to fill senior information security roles.

1:49:22

Okay, back to the show.

1:49:24

This is a lot of different jobs, by the way. There's

1:49:26

security researchers, there's security engineers,

1:49:28

there's security DevOps people and managers

1:49:31

and just,

1:49:32

this is a big thing. We've oversimplified it.

1:49:34

And I'm not an expert at all. It is kind

1:49:36

of weird that this is an existing industry

1:49:38

that many different organizations acquire

1:49:41

and yet it's going to be such a struggle to bring in enough

1:49:43

people to secure what is probably

1:49:45

a couple of gigabytes worth of data. It's

1:49:47

whack, right? It is. Well, this

1:49:49

is the biggest objection I hear to pushing security

1:49:51

as everyone will say, look, alignment is

1:49:53

a weird thing. We need weird people to figure out how

1:49:55

to do it. Security, it's just like, what the heck? Why don't

1:49:58

the AI companies just hire the best people that are already

1:49:59

There's a zillion of them and my response

1:50:02

to that is basically like it's security hiring

1:50:04

is a nightmare You could talk to anyone who's actually tried to do it

1:50:06

There may come a point at which AI is such

1:50:08

a big deal that AI companies are actually just

1:50:11

able to hire

1:50:12

All the people who are the best at security and they're

1:50:14

doing it and they're actually prioritizing it But

1:50:16

I think that point is not even now not even now

1:50:18

with all the hype and we're not even close to it And I think it's

1:50:20

in the future and I think that you can't just

1:50:23

hire a great security team overnight

1:50:25

and have great security overnight It like actually

1:50:27

matters that you're thinking about the problems like yours in advance

1:50:30

and that you're like building your culture and your

1:50:32

practices and your operations yours in advance and Because

1:50:34

it's just it's it's security is not a thing You

1:50:36

could just come in and bolt onto an existing company

1:50:39

and then you're secure and I think anyone who's worked in security

1:50:41

will tell You this so having great security people

1:50:43

in place making your company more secure

1:50:45

and figuring out ways to secure things Well,

1:50:48

well well in advance of when you're actually going to need

1:50:50

the security is definitely where you want to be if you

1:50:52

can and I Think having people who care about

1:50:54

these issues work on this topic does seem like

1:50:56

really valuable for that Also means that the

1:50:59

more these positions are in demand the more they're gonna be in positions

1:51:01

where they have an opportunity to have an influence And have

1:51:03

credibility. Yeah.

1:51:04

Yeah I think

1:51:06

the idea that surely it'll be possible to hire

1:51:08

for this from the mainstream might have been a not unreasonable

1:51:11

expectation 10 or 15 years ago, but the thing

1:51:13

is we're like we're already here. We can see that it's not

1:51:15

true I don't know why it's not true But definitely people

1:51:18

people really can move the needle by one outstanding

1:51:20

individual in this area Yeah, so the four

1:51:22

things so alignment research slash the threat

1:51:24

assessment research standards and monitoring Which

1:51:26

is like a lot of different potential jobs that

1:51:28

I kind of outlined at the beginning that many of which are

1:51:30

jobs They kind of don't exist yet, but it could in the future then

1:51:33

they're successful carefully. I lab their security

1:51:35

I'll say a couple things about

1:51:36

them one is I have said this before I don't think

1:51:38

any of them are binary So I think these are all things

1:51:41

and I have a draft post that I'll put up at some point

1:51:43

arguing this These are all things are like a

1:51:45

little more improves our odds in a little

1:51:47

way It's not some kind of weird function

1:51:49

where it's useless until you get it perfect I believe

1:51:51

that about all four another thing I'll say I tend

1:51:54

to focus on alignment risk because it is Probably

1:51:56

the single thing I'm most focused on and because I know this audience

1:51:58

will be into it, but I do want to say

1:51:59

again, I don't think

1:52:01

that AI takeover is the only thing we ought to

1:52:03

be worried about here. And I think the four things I've talked about

1:52:06

are highly relevant to other risks as well.

1:52:08

So I think all the things I've said

1:52:10

are really major concerns

1:52:13

if you think AI systems can be dangerous in pretty much

1:52:15

any way. Threat assessment, figuring out

1:52:17

whether they can be dangerous, what they could do in the wrong hands,

1:52:20

standards and monitoring, making sure that you're

1:52:22

clamping down on the ones that are dangerous for whatever reason.

1:52:25

Dangerous could include because they have feelings and we might mistreat

1:52:27

them. That's a form of danger, you could think. Successful,

1:52:30

carefully, AI Lab and security, I think are

1:52:31

pretty clear there too.

1:52:32

Yeah. Yeah, I think we're actually going to maybe end up

1:52:34

talking more about misuse as an area

1:52:37

than the misalignment going forward. Just

1:52:39

because I think that is like maybe more upon

1:52:41

us or like will be upon us very soon. So there's

1:52:43

a high degree of urgency. I guess also as

1:52:45

a non-ML scientist, I think I have a better grip

1:52:48

on the maybe the misuse issues. And

1:52:50

it might also be somewhat more tractable for a wider range of people to try

1:52:52

to contribute to it, to reducing misuse. Interesting.

1:52:55

Okay, so you have a post

1:52:57

as well on what AI Labs could be doing differently.

1:53:00

But I know that one has kind of already been superseded

1:53:02

in your mind. And you're going to be working

1:53:05

on that question more intensely

1:53:07

in coming months. So we're going to skip that one for

1:53:09

today and come back to it in

1:53:11

another interview down the line where the time is right,

1:53:14

maybe possibly later this year even. So instead,

1:53:17

let's push on and talk about governments. You

1:53:19

had a short post about this a couple

1:53:21

of months ago called how major governments

1:53:24

can help with the most important century. I think

1:53:26

you wrote that your views on this are even

1:53:28

more tentative than they are, I swear. Of

1:53:31

course, there's a lot of policy

1:53:33

attention to this just now. But

1:53:35

back in February, it sounded like your main recommendation

1:53:37

was actually just not to strongly commit

1:53:39

to any particular regulatory framework or any

1:53:43

particular set of rules, because

1:53:45

things they're just changing so quickly.

1:53:47

And it does seem sometimes like governments once

1:53:49

they do something, they can find it quite hard

1:53:52

to stop doing it. And once

1:53:54

they do something, then they maybe

1:53:56

move on and forget that what they're doing actually

1:53:58

needs to be constantly updated.

1:53:59

So, is that still

1:54:02

your high-level recommendation that people should be studying

1:54:04

this but not trying to write the bill

1:54:06

on AI regulation?

1:54:08

Yeah, there's some policies

1:54:10

that I'm excited about more than I was previously,

1:54:13

but I think at the high level that is still my take. It's

1:54:15

just that companies could just do something and

1:54:17

then they could just do something else. And there's

1:54:19

certain things that are hard for companies to change,

1:54:22

but there's other things that are easy for them to change. At

1:54:24

governments, it's just like you got to spin up a new agency, you

1:54:26

got to have all these directives. It's just going to be hard

1:54:28

to turn around. So I think that's right. I

1:54:30

think governments should default

1:54:32

to doing things that they really have

1:54:35

been run to ground that they really feel good about and

1:54:37

not just feel like

1:54:38

starting up new agencies left and right.

1:54:41

That does seem right. Yeah. Okay, but what if

1:54:43

someone who's senior in the White House came to you and said, sorry,

1:54:45

Holden, the eye of Sauron has turned to

1:54:47

this issue in a good way. We want to do something now.

1:54:50

What would you feel reasonably good about governments trying

1:54:52

to take on now?

1:54:53

Yeah, I have been talking with

1:54:56

a lot of the folks who work on AI policy recommendations

1:54:58

and have been just thinking about that and trying

1:55:01

to get a sense for what the ideas that the

1:55:04

people who think about this the most are most supporting are. An

1:55:07

idea that I like quite a bit is

1:55:09

requiring licenses for large training runs.

1:55:11

So basically, if you're going to do

1:55:13

a really huge training run of an AI

1:55:16

system, I think that's the kind of

1:55:18

thing that government can be aware of and should

1:55:20

be aware of. And it becomes somewhat analogous

1:55:22

to developing a drug or something where it's

1:55:24

a very expensive,

1:55:26

time consuming training process to create

1:55:28

one of these state of the art AI systems. And it's a

1:55:30

very high stakes thing to be doing. And so

1:55:33

we don't know exactly what a company should

1:55:35

have to do yet because we don't yet have

1:55:37

great evals and tests for whether AI systems

1:55:39

are dangerous. But at a minimum, you could say, you

1:55:42

need a license. So at a minimum, you need to say,

1:55:45

hey, yeah, we're doing this. We've

1:55:47

told you we're doing it. I don't know. You

1:55:50

know whether any of us have criminal records, whatever.

1:55:52

And now we've got a license. And that creates a potentially

1:55:55

flexible regime where you can later say,

1:55:57

in order to keep your license, you're going to have to.

1:55:59

measure your systems to see if they're dangerous, and

1:56:02

you're going to have to show us that they're not, and all that stuff without

1:56:04

committing to exactly how that works now. So

1:56:06

I think that's exciting idea probably.

1:56:09

I don't feel totally confident about any of this stuff, but that's

1:56:11

probably number one. For me, I think

1:56:14

the other number one for me would be some of the stuff that's already

1:56:16

ongoing, existing AI policies

1:56:19

that I think people have already pushed forward and are

1:56:21

trying to just tighten up. So some of the stuff about

1:56:23

export controls would be my other top

1:56:25

thing.

1:56:26

I think if you were to throw in a requirement

1:56:28

with the license, I would make it about information security.

1:56:30

So I think government requiring at least

1:56:33

minimum security requirements of anyone

1:56:35

training frontier models just seems like a good idea, just

1:56:38

like getting them on that ramp to where it's not so easy for

1:56:40

a state actor to steal it. Arguably, government

1:56:42

should just require all AI models to be treated as

1:56:44

top secret classified information, which means that

1:56:46

they would have to be subject to incredible draconian

1:56:49

security requirements involving just

1:56:51

like air gap networks and all this incredibly

1:56:53

painful stuff. Arguably, they should require that at this

1:56:55

point, given

1:56:56

how little we know about what these models are going to be imminently

1:56:58

capable of. But at a minimum, some kind of security

1:57:00

requirements seems good.

1:57:02

I think another couple of ideas, just tracking

1:57:05

where all the large models are in the world, where all

1:57:07

the hardware is capable of being used

1:57:09

for those models, I think don't necessarily want

1:57:11

to do anything with that yet. But having the ability

1:57:14

seems possibly good. And then I

1:57:17

think there are interesting questions about

1:57:19

liability and about incident tracking

1:57:22

and reporting that I think just could use some

1:57:24

clarification. I don't think I have the answer on them

1:57:26

right now. When should an

1:57:28

AI company be liable for harm that was caused

1:57:30

partly by one of its models? What

1:57:32

should the AI company's responsibilities be when

1:57:35

there is a bad incident of being able to say what happened?

1:57:37

How does that trade off against the privacy of the user? I

1:57:40

think these are things that, I don't know, feel really

1:57:42

juicy to me to consider 10 options,

1:57:45

figure out which ones are best from containing the biggest risk

1:57:47

point of view, and push that. But I don't really know what that is yet.

1:57:50

Yeah. So because, yeah, broadly speaking, we

1:57:52

don't know exactly what the rules should be in the details.

1:57:54

And we don't know exactly where we want to end up. But

1:57:56

I think that's going to cross a bunch of different dimensions put

1:57:58

in place at the beginning of the infrastructure.

1:57:59

that will probably regardless help

1:58:02

us go in the direction that we're gonna need to move gradually. Exactly,

1:58:05

and I'm not in favor, like I think there's other things governments

1:58:07

could do that are more like giving themselves kind of

1:58:09

arbitrary powers to like seize or use

1:58:11

AI models, and I'm not really in favor

1:58:13

of that. I think that could be destabilizing

1:58:16

and could cause chaos in a lot of ways. So

1:58:18

a lot of this is about like, yeah, basically

1:58:20

feeling like we're hopefully heading toward a regime of

1:58:23

testing whether AI models are dangerous and stopping

1:58:26

them if they are and having the infrastructure in place to

1:58:28

basically make that be able to work. So it's

1:58:30

not a generic thing. The government should give itself all the option

1:58:32

value, but it should be setting up for that kind

1:58:34

of thing to basically work.

1:58:36

Yeah, as I understand it, if the

1:58:38

National Security Council in the US concluded that

1:58:40

a model that was about to be trained would

1:58:42

be a massive national security hazard and

1:58:44

might lead to human extinction,

1:58:47

people aren't completely sure like which agency

1:58:49

or who has the

1:58:51

legitimate legal authority to prevent that from

1:58:53

going ahead. Or if anyone does, yeah. No

1:58:55

one's sure if anyone has that authority. Right,

1:58:57

it seems like that's something that should be patched at least, even

1:59:00

if

1:59:01

you're not creating the ability to seize

1:59:03

all of the equipment and so on with the intention of using

1:59:05

it anytime soon, maybe it should be clear that there's some

1:59:07

authority that is meant to be monitoring

1:59:09

this and should

1:59:11

take action if they conclude that something's a massive

1:59:14

threat to the country.

1:59:15

Yeah, possibly. I think I'm most excited about what

1:59:17

I think of as promising regulatory

1:59:20

frameworks that could create good incentives and could

1:59:22

help us kind of every year and a

1:59:24

little bit less about the tripwire for

1:59:26

the D-Day. I think a lot of times with

1:59:28

AI, I'm not sure there's gonna be like one really

1:59:30

clear D-Day or by the time it comes, it might be too late.

1:59:33

So I am thinking about things that could just like put

1:59:35

us on a better path day by day.

1:59:37

Yeah, okay, pushing

1:59:39

on to people who have an audience like

1:59:42

people who are active on social media or journalists or

1:59:44

I guess podcasters, heaven of a friend, you

1:59:46

wrote this article, Spreading Messages to Help with the Most

1:59:49

Important Century, which was targeted at this

1:59:51

group.

1:59:51

I guess back in ancient times in February when

1:59:54

you wrote this piece, you were

1:59:56

kind of saying that you thought people should tread carefully

1:59:58

in this area and should.

1:59:59

definitely be trying not to build up hype

2:00:02

about AI, especially just about it's like raw

2:00:04

capabilities because that could encourage

2:00:06

further investment and capabilities. Well,

2:00:08

you were saying, most people when they hear that AI could

2:00:10

be really important, rather than falling into this caution,

2:00:12

concern, risk management framework, they

2:00:15

start thinking about it purely in a competitive sense, thinking our

2:00:17

business has to be at the forefront, our country has to be at

2:00:19

the forefront. And I think indeed there

2:00:21

has been an awful lot of people

2:00:23

thinking that way recently. But

2:00:26

yeah, do you still think that people should be very cautious talking

2:00:28

about how powerful AI might

2:00:30

be given that maybe the host has already left the bond

2:00:32

on that one?

2:00:33

I think it's a lot less true than it was. I mean,

2:00:36

I think it's less likely that you hyping

2:00:38

up AI is gonna do much about AI hype. You

2:00:40

know, I think it's still not a total non-issue. And

2:00:42

especially if we're just taking the premise that you're some

2:00:45

kind of communicator and people are gonna listen to you. You

2:00:47

know, I still think the same principle basically applies

2:00:49

that like, the thing you don't wanna do is you don't

2:00:51

wanna like

2:00:52

emphasize the incredible power

2:00:54

of AI if you feel like

2:00:57

you're not at the same time getting much across

2:00:59

about how AI can be a danger to everyone at once.

2:01:02

Because I think if you do that, you are going to, the default

2:01:04

reaction is gonna be, I gotta get in on this. And

2:01:07

a lot of people already think they gotta get in on AI,

2:01:09

but not everyone thinks that, not everyone is going into AI

2:01:11

right now. So if you're talking to someone who you

2:01:13

think you're gonna have an unusual impact on, you know,

2:01:15

I think that basic rule, yeah, that basic rule

2:01:18

still seems right. And it makes it really tricky

2:01:20

to communicate about AI. You

2:01:22

know,

2:01:22

I think there's a lot more audiences now where

2:01:24

you just feel like these people have already figured

2:01:27

out what a big deal this is. I need to help them

2:01:29

understand some of the details of how it's a big deal. And

2:01:31

especially, you know, some of the threats of misalignment risk

2:01:33

and stuff like that. And I mean, yeah, that

2:01:35

kind of communication is a little bit less

2:01:37

complicated in that way, although challenging. Yeah.

2:01:40

Yeah. Yeah. Do you have any specific advice for what

2:01:42

messages seem most valuable or like ways

2:01:44

that people can frame this in a particularly productive

2:01:47

way?

2:01:47

Yeah, I wrote a post on this that you mentioned,

2:01:50

spreading messages to help with the most important century.

2:01:52

You know, I think some of the things that people

2:01:54

have trouble,

2:01:56

a lot of people have trouble understanding or don't seem to understand,

2:01:58

or maybe just disagree with me on.

2:01:59

and I would love to just see the dialogue get better,

2:02:03

is this idea that AI could be dangerous

2:02:05

to everyone at once. It's not just about whoever

2:02:07

gets it wins. The kind of terminator

2:02:09

scenario, I think, is actually just pretty

2:02:12

real. And the way that I would probably put

2:02:14

it at a high level is just like, there's only

2:02:16

one kind of mind right now. There's only one

2:02:18

kind of species or thing that can

2:02:21

develop its own science technology. That's

2:02:23

humans. We might be about to

2:02:25

have two instead of one. That would be the

2:02:27

first time in history we had two.

2:02:30

The idea that we're gonna stay in control, I think, should

2:02:32

just not

2:02:33

be something we're too confident in. I

2:02:35

think that would be at a high level and then at a low level.

2:02:38

And I would say with humans too, it's like humans

2:02:41

would kind of fill out of this trial and error process

2:02:43

and for whatever reason, we had our own agenda that wasn't good

2:02:45

for all the other species. Now we're building AI's

2:02:47

by trial and error process. Are they gonna have their

2:02:50

own agenda? I don't know, but if they're

2:02:52

capable of all the things humans are, it doesn't

2:02:54

feel that crazy. And then I would say it

2:02:56

feels even less crazy when you look at the details

2:02:58

of how people build AI systems today and you

2:03:00

imagine extrapolating that out to very powerful systems.

2:03:03

It's really easy to see how

2:03:05

we could be training these things to kind of have goals

2:03:07

and optimize, like the way you would optimize to win a chess

2:03:10

game. We're not building these systems

2:03:12

that are just these kind of like very well-understood,

2:03:15

well-characterized reporters

2:03:17

of facts about the world. We're building these systems that

2:03:19

are like these very opaque, trained

2:03:21

with kind of like sticks and carrots

2:03:24

and they may in fact have kind of what you might

2:03:26

think of as goals or aims. And that's something I wrote

2:03:28

about in detail. So I think, yeah, I think trying

2:03:30

to communicate about why we

2:03:32

could expect these kind of terminator scenarios

2:03:35

to be serious or versions of them to be serious,

2:03:37

how that works mechanistically and also just like

2:03:40

the high level intuitions seems like a really

2:03:42

good message that I think could be a corrective

2:03:44

to some of the racing and help people

2:03:47

realize that we may in some sense,

2:03:49

on some dimensions, and sometimes all be in this together

2:03:51

and that may call for different kinds of interventions

2:03:53

from if it was just a race. I

2:03:56

think like some of the things that are hard about

2:03:58

measuring AI danger.

2:03:59

I think are really good for like the whole world

2:04:02

to be aware of I'm really worried about a world

2:04:04

in which we're just like When

2:04:05

you're dealing with it with it beings that have some sort of

2:04:07

intelligence measurement is hard So it's like let's

2:04:09

say you run a government and you're worried about a coup Are

2:04:13

you gonna be empirical and go poll

2:04:15

everyone? I wasn't there plotting a coup and then it turns

2:04:17

out that zero percent of people are plotting a coup So there's

2:04:19

no kind way right? Yeah. Yeah, that's not

2:04:21

how that works And you know that might

2:04:23

work that that kind of empirical method

2:04:26

works with things that are not thinking about what

2:04:28

you're trying to learn And how that's gonna affect your behavior.

2:04:30

And so I think again, you know with AI systems It's

2:04:32

like okay We gave this thing

2:04:35

a test to see if it would kill us and

2:04:37

it looks like it wouldn't kill us like how reliable

2:04:39

is that? There's a whole bunch of reasons that we

2:04:41

might not actually be totally set at

2:04:43

that point And that these measurements

2:04:45

could be really hard I think this is like really

2:04:47

really key because I think wiping

2:04:50

out like Enough of the risk to make something

2:04:52

commercializable is one thing and wiping

2:04:54

out enough of the risk that we're actually still fine After

2:04:56

these ais are all over the economy and

2:04:59

could kind of disempower humanity if they chose is another

2:05:01

thing Not thinking that commercialization

2:05:03

is going to take care of it not thinking that we're able

2:05:05

to just gonna be able to easily measure as we go I think

2:05:08

these are really important things for people to understand could

2:05:10

really affect the way that all this plays

2:05:12

out the way You know whether whether we do reasonable

2:05:14

things to prevent the risks I

2:05:17

don't

2:05:17

know. I think those are the big ones. I have more in my post,

2:05:19

you know general concept that just like

2:05:22

There's a lot coming it could happen really fast

2:05:24

And so the normal

2:05:25

the normal human way of just like reacting to stuff

2:05:27

as it comes may not work I think

2:05:30

is an important message

2:05:31

Important message if true if wrong I would

2:05:33

love people to spread that message so that it becomes

2:05:35

more prominent so that more people make better arguments against

2:05:38

it And then I change my mind Yeah.

2:05:40

Yeah, I was gonna say I don't know whether this is good advice

2:05:42

But I'm strategy you could take is trying to find aspects

2:05:45

of this issue that are not fully understood

2:05:47

yet by people who have a kind Of only only

2:05:49

engaged with it quite recently like exactly

2:05:51

this issue that the measurement of safety could be incredibly

2:05:54

difficult It's not just a matter of doing the

2:05:56

really obvious stuff like asking the model. Are

2:05:58

you out to kill me? And trying

2:06:00

to come up with some pithy example or story

2:06:03

or terminology that can really capture people's imagination

2:06:05

and stick in their mind. And I think exactly that

2:06:07

example of the coup where you're saying, what you're doing

2:06:09

is just going around your generals and ask them if they want

2:06:12

to overthrow you. And then they say no. And you're

2:06:14

like, well, everything is hunky dory. I think that

2:06:16

is the kind of thing that could get people to understand at

2:06:18

a deeper level why we're in

2:06:20

a difficult situation.

2:06:22

I think that's right. And I'm very mediocre

2:06:24

with metaphors. I bet some listeners are better with them. They

2:06:26

do a better job. Yeah. And Grace came

2:06:28

up with a, she wrote one in a time

2:06:31

article yesterday that I hadn't heard before, which is saying we're

2:06:33

not in a race to the finish line. Rather, we're

2:06:35

a whole lot of people on a lake that has

2:06:37

frozen over, but the ice is incredibly thin.

2:06:40

And if any of us start running, then we're all just going to fall through because

2:06:42

it's going to crash. And I was like, yes,

2:06:44

that's a great visual visualization of it. Yeah,

2:06:47

interesting. Okay. Let's

2:06:49

wish on, we talked about AI labs, governments and advocates,

2:06:52

but the

2:06:52

final grouping is the largest one, which is

2:06:54

just jobs and careers, which is of course, more than 80,000

2:06:56

hours is typically meant to

2:06:58

be about. Yeah. What's

2:07:00

another way that some listeners might be able to help

2:07:02

with this general issue by changing

2:07:05

the career that they go into or the skills that they

2:07:07

develop?

2:07:08

Yeah. So I wrote a post on this called Jobs

2:07:10

That Can Help With The Most Important Century. I think the first thing

2:07:12

I want to say is I just do expect this stuff to be quite dynamic.

2:07:15

So right now, I think we're in a very nascent

2:07:17

phase of kind of evals and standards. I

2:07:19

think we could be in a future world where there

2:07:22

are decent tests of whether AI systems are dangerous

2:07:24

and there are decent frameworks for how

2:07:26

to keep them safe. But there needs to be just

2:07:29

more work on advocacy and

2:07:31

communication so that people actually understand this stuff,

2:07:33

take it seriously, and that there is a reason

2:07:35

for companies to do this. And

2:07:38

also, there could be people working on political advocacy

2:07:41

to have good regulatory frameworks for keeping humanity

2:07:43

safe. So I think the jobs that

2:07:45

exist are going to change a lot. And I think my

2:07:47

big thing about careers in general is just

2:07:49

like if you're not finding a great

2:07:52

fit with one of the current things, that's fine.

2:07:54

And don't force it. And

2:07:56

if you have person A and person B and person

2:07:58

A is like they're doing so.

2:07:59

something that's not

2:08:01

clearly relevant to AI or

2:08:03

whatever. Let's say they're an accountant. They're

2:08:06

really good at it. They're thriving. They're

2:08:08

picking up skills. They're making connections. And

2:08:11

they're ready to go work on AI as

2:08:13

soon as an opportunity comes up, which that last part could

2:08:15

be hard to do on a personal level.

2:08:18

Then you have person B, who is kind of

2:08:20

like, they

2:08:21

had a similar profile, but they forced themselves

2:08:23

to go into alignment research. And they're doing

2:08:26

quite mediocre alignment research. They're barely

2:08:28

keeping their job. I would say person A

2:08:30

is just the higher expected impact.

2:08:32

And I think that would be my main thing on jobs,

2:08:34

is I'm just like,

2:08:36

do something where you're good

2:08:38

at it. You're thriving. You're leveling up.

2:08:40

You're picking up skills. You're picking up connections. If

2:08:42

that thing can be on a key

2:08:45

AI priority, that is ideal. If it cannot

2:08:47

be, that's OK. And don't force

2:08:49

it. So that is my high-level thing.

2:08:52

But yeah, I'm having to talk about specifically

2:08:54

what I see as some of the things people could do

2:08:56

today right now on AI that don't

2:08:58

require starting your own Oregon are more

2:09:00

like you can slot into an existing team, if you have

2:09:02

the skills and if you have the fit. I'm happy to go into that. Yeah,

2:09:05

I think people who want more advice

2:09:07

on overall career strategy, we

2:09:09

did an episode with you on that back in 2021, which is episode 110, Holden

2:09:13

Canofsky, on building aptitudes and kicking ass. So

2:09:16

I can definitely recommend going back and listening to that.

2:09:19

But yeah, maybe in terms of more

2:09:21

specific roles, are there any ones that you wanted

2:09:22

to highlight?

2:09:23

Yeah, I mean, some of them are obvious. There's

2:09:26

people working on AI alignment. There's also people

2:09:28

working on threat assessment, which we've talked

2:09:31

about, and dangerous capability

2:09:33

evaluations at AI labs

2:09:35

or sometimes at nonprofits. And

2:09:37

if there's a fit there, I think that's just an obviously

2:09:40

great thing to be working on. We've talked

2:09:42

about information security. So

2:09:44

yeah, I don't think we need to say more about that. I

2:09:46

think there is this really tough question of whether you

2:09:48

should go to an AI company and just

2:09:50

kind of do things there that are not particularly

2:09:53

safety or policy or security, just

2:09:55

like helping the company succeed. You know, that

2:09:57

can be a really, in my opinion, really great way

2:09:59

to see. skill up, a really great way to like you

2:10:02

personally becoming a person who knows a lot about

2:10:04

AI, understands AI, swims in the water

2:10:06

and is well positioned to do something else later. There's

2:10:09

big upsides and big downsides to helping an AI company

2:10:11

succeed at what it's doing and it really comes down to how you feel about

2:10:14

the company. So it's a tricky one, but

2:10:16

it's one that I think is definitely worth thinking about,

2:10:18

thinking about it carefully. Then there's, you know, there's

2:10:20

roles in government and there's roles in government

2:10:22

facing think tanks, just trying

2:10:25

to help. And I think that the interest

2:10:27

is growing. So trying to help the government make good decisions,

2:10:29

including not making rash moves about

2:10:32

how it's dealing with AI policy, what

2:10:35

it's regulating, what it's not regulating, et cetera.

2:10:37

So those are some things. Yeah,

2:10:39

I had a few other listed in my

2:10:41

post, but I think it's okay to stop there.

2:10:43

Yeah. Yeah. I mean, I guess

2:10:45

it seemed like both of these paths. So

2:10:48

one broadly speaking was going and working in the AI

2:10:50

labs or in nearby

2:10:52

industries or firms that they collaborate with. And

2:10:55

I guess there's a whole lot of different ways you could have been back there. And

2:10:57

I suppose the other one is thinking about governance

2:11:00

and policy where you could just,

2:11:03

you could pursue any kind of government and policy

2:11:05

career, try to flourish as much as you

2:11:07

can and then turn your attention towards AI

2:11:09

later on. Because there's sure to be an enormous

2:11:12

demand for more analysis and work on this in

2:11:13

coming years. So hopefully

2:11:16

in both cases, you'll be joining very rapidly growing industries.

2:11:19

And for the latter, the closer, the better. So if working on

2:11:21

technology policy is probably best, but yeah.

2:11:23

What about people who kind of, they don't see any immediate

2:11:25

opportunity to enter into either of those

2:11:27

broad streams. Is there anything that you think

2:11:30

that they could do in the meantime?

2:11:31

Yeah. So I did before talk about the kind of person

2:11:33

who could just be good at something and kind of wait for

2:11:35

something to come up later. I guess it might

2:11:37

be worth emphasizing that the ability

2:11:40

to switch careers is going to get harder

2:11:42

and harder as you get further and further into your career.

2:11:45

So I think in some ways, like, if you're

2:11:47

a person who's being successful, but is

2:11:49

also like making sure that you've got the financial

2:11:51

resources, the social resources, the psychological

2:11:53

resources, that you really

2:11:55

feel confident that as soon as a good opportunity

2:11:57

comes up to do a lot of good, you're going to switch. switch

2:12:00

jobs or have a lot of time to serve on a board

2:12:02

or whatever. I think it's weird because this is like

2:12:04

a not a measurable thing and it's not a thing you can like brag

2:12:07

about when you go to an effective altruism meetup. It

2:12:09

just seems like incredibly valuable. And I just, I

2:12:11

wish there was a way to just kind of, to kind

2:12:13

of recognize that, you know, the person who is

2:12:16

successfully able to walk

2:12:18

when they need to from a successful career has

2:12:21

in my mind, like more, more expected

2:12:23

impact than the person who's in the high impact career right

2:12:25

now, but it's not killing it.

2:12:27

Yeah. So, so I expect an enormous

2:12:30

growth in roles that might be relevant

2:12:32

to this problem in, in future years, and also just

2:12:34

an increasing number of types of roles that might be relevant

2:12:36

because there could just be all kinds of new projects that are

2:12:38

going to grow and require people who are just generally competent,

2:12:40

you know, who have management experience, who know

2:12:43

how to deal with operations and legal and so on. So they're

2:12:45

going to be looking for people who, who share their values.

2:12:48

So if you're able to potentially move to one of the hubs

2:12:50

and take one of those roles, when it becomes available, if

2:12:52

it does, then that's, that's definitely a big step

2:12:55

up relative to it's locking yourself into something else.

2:12:57

We can't shift.

2:12:59

I was going to say also just like spreading messages

2:13:01

we talked about, but I have

2:13:03

a feeling that being a person who's a good communicator,

2:13:05

a good advocate, a good persuader,

2:13:08

I have a feeling that's going to become more

2:13:10

and more relevant and there's going to be more and more jobs like

2:13:12

that over time, because I think we're in a place now where

2:13:14

people are like, just starting to figure out

2:13:16

what a good regulatory regime might

2:13:19

look like, what a good set of practices might look

2:13:21

like for containing the danger. And later, I think

2:13:23

there'll be more, more maturity there and

2:13:25

more stress placed on and people need to actually

2:13:27

understand this and care about it and do it.

2:13:29

Yeah. I mean, setting yourself the challenge

2:13:31

of taking someone who is not informed about

2:13:33

this, so might even be skeptical about this and

2:13:36

with arguments that are actually sound

2:13:38

as far as you know, persuading them to care about

2:13:40

it for the right reasons and to understand it deeply, that

2:13:42

is not simple. Uh, and if you're able to build

2:13:44

the skill of doing that through, through practice, it

2:13:47

would be unsurprising if that turned out to be, to be

2:13:49

very useful in some role in future. And I should

2:13:51

be clear, there's a zillion versions of that that have

2:13:53

like dramatically different skillsets. So there's like

2:13:55

people who, you know, their thing is they work

2:13:57

in government and there's some kind of

2:13:59

government sub

2:13:59

culture that they're very good at communicating with and government

2:14:02

ease. And then there's people who make viral

2:14:04

videos. Then there's people who organize

2:14:06

grassroots protests. And there's so many. There's

2:14:10

journalists. There's highbrow journalists, lowbrow

2:14:12

journalists. It's just like communication is not

2:14:14

a generalizable skill. There's an

2:14:16

audience. And there's a gazillion audiences. And

2:14:18

there are people who are terrible with some audiences and amazing

2:14:20

with other ones. So this is many, many jobs. And I

2:14:22

think there'll be more and more over time.

2:14:24

Yeah. OK. We're just

2:14:26

about to wrap up this AI section. I

2:14:28

guess I had two questions from the audience to

2:14:30

run by you first. Yeah, one audience member

2:14:32

asked, what, if anything, should Open Philanthropy have

2:14:34

done two to five years ago to put us

2:14:36

in a better position to deal with AI now? Is

2:14:39

there anything that we missed?

2:14:41

Yeah. In terms of actual stuff we

2:14:43

literally kind of missed, I mean, I feel

2:14:45

like this whole idea of like E-Vels and standards

2:14:47

is like everyone's talking about it now. But

2:14:49

I mean, heck, it would have been much better if everyone was talking

2:14:51

about it five years ago. That would have been great.

2:14:54

I think in some ways, in some ways, this

2:14:56

research was kind of too hard to do before the models

2:14:58

got pretty good. But there might have been some start

2:15:00

on it, at least with understanding how it works

2:15:02

in other industries and starting to learn

2:15:04

lessons there. Security, obviously, I have

2:15:07

regrets about just like not. There were some

2:15:09

attempts to push it from ADK

2:15:11

and from Open Phil. But I think

2:15:12

those attempts could have been a lot more, a lot

2:15:14

louder, a lot more forceful. I

2:15:16

think it's possible that security

2:15:18

being the top hotness in EA rather

2:15:20

than alignment, like

2:15:22

it's not clear to me which one of those would be better.

2:15:24

And having the two be equal, I think probably would have been better.

2:15:27

Yeah, I mean, I don't know. Like there's lots of stuff

2:15:29

we're just like,

2:15:30

I kind of wish we just like paid more attention

2:15:32

to all of this stuff faster. But those are the most specific

2:15:35

things that are easy for me to point to. Yeah.

2:15:38

What

2:15:38

do you think of the argument that we

2:15:40

should expect a lot of alignment, like useful alignment

2:15:42

research to get done ultimately because it's

2:15:46

necessary in order to make the products useful? I

2:15:48

think Pushmit Coley made this argument

2:15:50

on the show many years ago. And I've

2:15:53

definitely heard it recently as well.

2:15:55

Yeah. I think it could be right.

2:15:57

I think in some ways it feels like it's almost

2:15:59

definitely right. to an extent or something. It's

2:16:01

just like there's certain AI

2:16:02

systems that just don't at all behave

2:16:05

how you want are just going to be too hard to commercialize.

2:16:07

And AI systems that are constantly causing random

2:16:09

damage and getting you in legal trouble, I mean,

2:16:12

that's not going to be a profitable business. So

2:16:14

I do think a lot of the work

2:16:16

that needs to get done is going to get done by

2:16:18

normal commercial incentives. I'm

2:16:21

very uncomfortable having that be the whole plan. One

2:16:24

of the things I am very worried about, again, if you're

2:16:26

really thinking of AI systems as capable of doing what

2:16:28

humans can do, is that you

2:16:30

could have situations where you're

2:16:32

trading AI systems to

2:16:35

be well behaved. But what you're really training them

2:16:37

to do is to be well behaved unless they

2:16:39

can get away with that behavior in a permanent

2:16:41

way. And just like a lot of humans, it's like they

2:16:44

behave themselves because they're part of a law and order

2:16:46

situation. And if they ever found

2:16:48

themselves able to gain

2:16:50

a lot of power or break the rules and get away with

2:16:53

it, they totally would. A lot of humans are like that. You could have

2:16:55

AIs that you're basically trained to be like that. And

2:16:57

so it reminds me a little bit of some of the financial

2:16:59

crisis stuff, where it's like, you

2:17:02

could be doing things that drive your

2:17:04

day-to-day risks down, but kind of concentrate

2:17:07

all your risk in these highly correlated tail

2:17:09

events. And so I don't

2:17:11

think it's guaranteed. But I think it's quite worrying that

2:17:13

we get to be in a world where in order to get

2:17:16

your AIs to be commercially valuable, you have to

2:17:18

get them to behave themselves. But you're only getting them to behave

2:17:20

themselves up

2:17:20

to the point where they can definitely get away with it. They're

2:17:22

actually kind of capable enough to be able to

2:17:24

tell the difference between those two things. And

2:17:27

so I don't want our whole plan to be commercial

2:17:29

incentives. We'll take care of this. And if anything, I

2:17:31

tend to be focused on the parts of the problem that seem

2:17:33

less likely to get naturally addressed that way. Yeah,

2:17:36

another analogy there is to

2:17:38

the forest fires. Where as I understand it, because

2:17:40

we wouldn't like forest fires, we basically prevent

2:17:43

forests from ever having fires. But then that

2:17:45

causes more brush to build

2:17:47

up. And then every so often, you have some enormous

2:17:49

cataclysmic fire

2:17:50

that you just can't put out because the amount of

2:17:53

combustible material there is extraordinarily high, like

2:17:55

more than you ever would have had naturally before humans started

2:17:57

putting out these fires. I guess that's one way

2:17:59

in.

2:17:59

fact that trying to prevent

2:18:02

like small scale bad outcomes

2:18:04

or trying to prevent like a minor misbehavior

2:18:06

by models could give you a false sense of security because you'd be

2:18:08

like, well, we haven't had a haven't had a forest fire in

2:18:11

so long. But then of course, all you're doing

2:18:13

is like causing something much worse to happen later, because

2:18:15

you've been allowed into complacency. Yeah. And

2:18:17

I'm not I'm not that concerned about false sense of security.

2:18:20

I think we should like try and make things good and and

2:18:22

then argue about whether they're actually good. So, you know,

2:18:24

I think we should try and get models to behave. And

2:18:26

after we've done everything we can to do that, we should ask if we

2:18:29

really got them behave and what might we

2:18:31

be missing. So

2:18:32

I don't I don't think we shouldn't care if they're

2:18:34

if they're being nice. But I think it's not the end of

2:18:36

the conversation.

2:18:37

Yeah, another audience member asked,

2:18:40

how should people who've been thinking about and working on AI

2:18:42

safety for for many years who react

2:18:44

to all of these ideas suddenly becoming becoming

2:18:47

much more popular in the in the mainstream than they

2:18:49

ever were?

2:18:50

I don't know. I mean, like brag about how everyone

2:18:52

else's a poser. I mean,

2:18:55

I'm not sure what the question is. Don't

2:18:57

encourage me. Hold on. Yeah.

2:19:01

What's what like, how should they react? I mean, I don't

2:19:03

know. I mean, I is there

2:19:05

more of a sharpening? Like, I think we should still care about

2:19:07

these issues. I think that people who are

2:19:09

who are who were not interested in them before and are interested

2:19:11

in them now, we should be like really happy and

2:19:14

it should welcome them in and see if we can work productively

2:19:16

with them. What else is the question? Yes.

2:19:20

I guess it reminded me of the point you made

2:19:23

in a previous conversation we had. We said, you know, lots

2:19:25

of people kind of including us who are a bit

2:19:27

ahead of the curve on COVID. You know, we were kind of expecting

2:19:29

this sort of thing to happen. And then we saw that it was going to happen

2:19:32

weeks or months before anyone else did. And that didn't really

2:19:34

help. Like, yeah, yeah, no, we

2:19:36

managed to do anything. Yeah. And I'm not

2:19:38

like, I'm worried with this.

2:19:39

Like, on one level, I feel

2:19:41

kind of smug that I feel like I was ahead of the curve

2:19:43

on noticing this problem. But I'm also like, and we didn't

2:19:45

manage to fix it. Did we? We didn't manage to convince people.

2:19:47

So I guess, you know, there's both a degree

2:19:50

of smugness and we got to like eat humble pie at the same

2:19:52

time. I think it's I mean, I think in some ways

2:19:54

I feel better about this one. I think I think like

2:19:56

I do feel like the early concern about

2:19:58

AI was like productive.

2:19:59

We'll see, but

2:20:02

I generally feel like there is, the

2:20:05

public dialogue is probably different from what

2:20:07

it would have been if there wasn't a big set

2:20:09

of people talking about these risks and trying to understand

2:20:12

them and help each other understand them.

2:20:14

I think there's different people working in the field. We

2:20:17

don't have a field that's just 100% made of people

2:20:19

whose entire goal in life is making money. That

2:20:21

seems good. There's people

2:20:23

in government who care about this stuff, who are very

2:20:25

knowledgeable about it, who aren't just coming at it

2:20:28

from the beginning, who understand some of the big risks.

2:20:29

So, I

2:20:32

think good has been done. I think the situation

2:20:34

has been

2:20:35

made better. I think that's debatable. I don't think

2:20:37

it's totally clear. I'm not feeling like

2:20:39

nothing was accomplished, but yeah, I think

2:20:41

you're totally, I mean, I'm with you that being

2:20:44

right ahead of time, that is not- It's

2:20:46

not enough. It's not my goal in life. It is not effective

2:20:48

altruism's goal in life. You could be wrong ahead of time, be

2:20:51

really helpful. You could be right ahead of time, be really useless.

2:20:53

So yeah, I would definitely say, let's

2:20:55

focus pragmatically on solving this problem. All

2:20:58

these people who weren't interested before and are now,

2:21:00

let's be really happy that they're interested now and

2:21:02

figure out how we can all work together to reduce AI

2:21:05

risk. And let's notice how the winds are

2:21:07

shifting and how we can adapt.

2:21:08

Yeah. Okay, let's wrap

2:21:10

up this AI section. We've been talking

2:21:13

about this for a couple of hours, but interestingly, I feel like we've

2:21:15

barely scratched the surface on any of these different

2:21:17

topics. We've been keeping up a blistering

2:21:19

pace in order to not keep

2:21:21

you up for your entire workday. I

2:21:24

guess it is just interesting how many different

2:21:26

aspects that are out of this problem and how hard

2:21:28

it is to get a grip on all of them. And I

2:21:31

think one thing you said before we started recording is just that your

2:21:33

views on this are evolving very quickly. And

2:21:35

so I think probably we need to come back and have another

2:21:37

conversation about this in six or 12

2:21:38

months. And I'm sure you have more ideas

2:21:40

and maybe we can go into detail on some specific ones. Yeah,

2:21:43

that's right. I mean, I think if I were to kind of

2:21:45

wrap up where I see the AI situation right

2:21:47

now, I think there's definitely more interest. People

2:21:49

are taking risks more seriously. People are taking

2:21:51

AI more seriously. I don't think

2:21:54

anything is totally solved or anything,

2:21:56

even in terms of public attention. Alignment

2:21:58

research has...

2:21:59

has been really important for a long time, remains really

2:22:02

important. And I think it's like there's more

2:22:04

interesting avenues of it that are getting somewhat mature

2:22:06

than there used to be. There's more jobs in there. There's

2:22:08

more to do. I think the evals

2:22:11

and standards stuff is newer. And I'm

2:22:13

excited about it. And I think in a year, there may be like a

2:22:15

lot more to do there, like a lot, a lot. I think

2:22:17

another thing that I have been kind of updating on a

2:22:19

bit is that there is some amount of convergence

2:22:21

between different concerns about AI. And

2:22:24

we should lean into that while not getting too comfortable with

2:22:26

it. So I think, we're at a stage right

2:22:28

now where the main argument

2:22:29

that today's AI systems are not too dangerous

2:22:32

is that they just can't do anything that bad,

2:22:34

even if humans try to get them to. When that

2:22:37

changes, I think we should be more worried about

2:22:39

misaligned systems and we should be more worried about

2:22:41

aligned systems that bad people have access to.

2:22:43

I think for a while, those concerns are gonna

2:22:45

be quite similar and people who are concerned about aligned

2:22:48

systems and misaligned systems are gonna have a lot

2:22:50

in common. I don't think that's gonna be true

2:22:52

forever. So I think in a world where there's

2:22:55

pretty good balance of power and lots of different

2:22:57

humans have AIs and they're kind of keeping

2:22:59

each other in check, I think you would worry at that point less

2:23:01

about misuse and more about alignment because

2:23:03

misaligned AIs could end up all on one side

2:23:06

against humans or like mostly on

2:23:08

one side or just fighting each other in a way

2:23:10

where we're collateral damage. So, I

2:23:12

think right now a lot of what

2:23:15

I'm thinking about in AI is pretty convergent.

2:23:17

It's just like, how can we build a regime

2:23:19

where we detect

2:23:21

danger, which just means

2:23:23

anything that AI could do that feels like

2:23:25

it could be really bad for any reason and

2:23:27

stop it. And I think at some point it'll get harder to make

2:23:30

some of these trade-offs.

2:23:31

Okay, to be continued, let's

2:23:33

wish on to something completely different, which is

2:23:35

this article that you've been working on where you lay

2:23:38

out your reservations about, well, I

2:23:40

think in one version you call it hardcore utilitarianism

2:23:42

and another one you call it impartial expected

2:23:44

welfare maximization. I think maybe

2:23:46

for the purposes of, I guess, the acronym is IEWM

2:23:49

in the article, but I think for

2:23:52

the purposes of an audio version, let's

2:23:54

just call this hardcore utilitarianism. So, give some

2:23:56

context to tee you up here a little. Yeah, this is a topic

2:23:59

that we've discussed. with Joe Karsmith in

2:24:01

episode 152, Joe

2:24:03

Karsmith on navigating serious philosophical confusion.

2:24:05

And we also actually touched on it at the end of episode 147 with

2:24:08

Spencer Greenberg. Basically over the years,

2:24:10

you found yourself talking to people who

2:24:13

are much more all in on some sort

2:24:15

of utilitarianism than you are. And

2:24:17

I think from reading the article, the draft, I

2:24:19

think the conclusions they draw that bother you the most

2:24:22

are that open philanthropy or

2:24:24

that perhaps the effective altruism community should

2:24:26

only have been making grants to improve the long-term future.

2:24:28

And maybe only making grants related or only

2:24:30

doing any work related to artificial intelligence rather

2:24:33

than diversifying and hedging

2:24:35

all of our bets across a range of different worldviews

2:24:38

and splitting our time and resources between catastrophic

2:24:40

risk reduction as well as helping present

2:24:42

generation of people and also helping

2:24:45

non-human animals among other different

2:24:47

plausible worldviews. And also maybe

2:24:50

the conclusion that some people draw that they should act uncooperatively

2:24:52

with other people who have different values whenever

2:24:55

they think that they can get away with it. Yeah, do

2:24:57

you want to clarify any more of the

2:24:59

attitudes that you're reacting to here?

2:25:01

Yeah, I mean, one piece of clarification

2:25:04

is just like the piece, the articles you're talking about, one

2:25:06

of them was like a 2017 or 2018 Google doc that

2:25:09

I probably will just never turn into a public piece.

2:25:12

And another is a dialogue I started writing

2:25:14

that I do theoretically intend to publish

2:25:16

someday but it might never happen. It might

2:25:18

be a very long time. Yeah,

2:25:20

I don't know. The story I would tell is like, I

2:25:23

co-founded Gitwell, I've always been,

2:25:26

I've always been interested in doing the most good possible in

2:25:29

kind of a hand wavy, rough

2:25:31

like

2:25:32

way. One way of talking about what

2:25:35

I mean by doing the most good possible is like, there's

2:25:37

a kind of apples to apples principle that says,

2:25:40

when I'm choosing between two things that

2:25:42

really seem like they're pretty apples to

2:25:44

apples, pretty similar, I want to

2:25:46

do the one that helps more people more.

2:25:48

When I feel like I'm able to really make the comparison,

2:25:50

I want to do the thing that helps more people more. That

2:25:53

is a different principle from a more all encompassing

2:25:56

like, and everything can be converted

2:25:58

into apples. And all. interventions

2:26:00

are in the same footing and there's one

2:26:03

thing that I should be working on that is the best thing

2:26:05

and like there is an answer to whether it's better

2:26:08

to like increase the odds that many people

2:26:10

will ever get to exist versus like reducing

2:26:12

malaria in Africa versus like helping

2:26:15

chickens on factory farms and I've always been like

2:26:17

a little less sold on that second

2:26:19

way of thinking so there's the there's that

2:26:21

you know the more apples principle that like I want

2:26:23

more apples when it's all apples and then there's the

2:26:25

like it's all one fruit principle or something these are really

2:26:27

good names that I then I put on the spot that

2:26:30

I'm sure will stand the test of time

2:26:32

you know I got into this world and I met other people are interested

2:26:34

in similar topics a lot of them are you know identify

2:26:37

as effective altruists and I encountered these you

2:26:39

know ideas that are that were more

2:26:41

hardcore and were more saying look

2:26:44

like I think the story I would basically tell would be something

2:26:46

like there is like one

2:26:48

correct way of thinking about what it means to do good

2:26:51

or to be ethical

2:26:52

that comes down to basically utilitarianism

2:26:55

this can basically be a scene

2:26:58

by looking in your heart and seeing that subjective experiences

2:27:01

all that matters and that everything else is

2:27:03

just heuristics for optimizing pleasure

2:27:05

and minimizing pain B

2:27:07

you can show it with various theorems like

2:27:10

you know Harsani's aggregation theorem

2:27:12

tells you that if you're trying

2:27:14

to give others the deals and gambles

2:27:17

they would choose then it falls

2:27:19

out of that that you need some form of utilitarianism

2:27:22

it's a bad piece kind of going into all the stuff

2:27:24

this means and people kind of say look

2:27:26

like we think we have

2:27:27

really good reason to believe that after humanity

2:27:30

has been around for longer it is wiser if this happens

2:27:33

we will all realize that like the right way of thinking about

2:27:35

what it means to be a good person is just

2:27:37

to yeah

2:27:38

basically be utilitarian take

2:27:40

the amount of pleasure minus

2:27:42

pain add it up maximize that

2:27:45

be hardcore about it like

2:27:47

don't like lie and be a jerk for no reason

2:27:49

but like if you ever somehow knew that doing

2:27:52

that was going to maximize utility that's what you should

2:27:54

do and I and I ran into that point

2:27:56

of view and that point of view also was I think very like

2:27:58

I roll at the

2:27:59

that Open Philanthropy was going to do work

2:28:02

in long-termism and global

2:28:04

health and well-being. And, you know, my

2:28:07

basic story is like,

2:28:08

I have updated significantly toward

2:28:11

that worldview compared to where I started, but

2:28:14

I am still less than half,

2:28:16

less than half into it. And furthermore, the

2:28:18

way that I deal with that is not

2:28:20

by multiplying through and doing another layer of expected

2:28:23

value, but by saying, look, if

2:28:25

I have a big pool of money, I think

2:28:27

less than half of that money should be like following

2:28:29

this worldview.

2:28:31

I've been around for a long time in this community. I

2:28:33

think I've now heard out all

2:28:35

of the arguments, and that's still where I am.

2:28:37

And so, you know, my basic

2:28:40

stance is like, I think that

2:28:42

we are still very deeply confused about ethics.

2:28:45

I think

2:28:46

we don't really know what it means to do good.

2:28:49

And I think that reducing everything to like

2:28:52

utilitarianism is probably not workable.

2:28:54

I think it probably actually just breaks in very

2:28:57

simple mathematical ways. And

2:28:59

I think we probably have to have a lot

2:29:01

of arbitrariness in our views of ethics. I think

2:29:04

we probably have to have some version of just like caring

2:29:06

more about people who are more similar to us or

2:29:08

closer to us. And so I think,

2:29:10

you know, yeah, I still am basically unprincipled

2:29:13

on ethics. I still basically like

2:29:16

have a lot of things that I care about that I'm not sure why

2:29:18

I care about. I would still basically take a big pot of money

2:29:20

and divide it up between different things. I

2:29:23

still like believe in certain moral rules

2:29:25

that you got to follow,

2:29:26

not as long as you don't know the outcome,

2:29:28

but just you just got to follow them end of story period,

2:29:31

don't overthink it. That's the story I am.

2:29:33

So I don't know. Yeah, I wrote a dialogue trying to explain

2:29:36

why this is for someone who thinks the reason

2:29:38

I would think this is because I hadn't thought through all the hardcore

2:29:40

stuff. And instead, just addressing the hardcore stuff

2:29:42

very directly. Yeah.

2:29:44

So yeah, perfect for this interview,

2:29:46

you might have thought that we would have ended up having a debate

2:29:49

about whether impartial expected welfare maximization

2:29:52

is the right way to live or or the right theory of morality.

2:29:54

But actually, it seems like we mostly disagree on how many

2:29:56

people actually

2:29:56

really all in on hardcore.

2:29:59

Yeah, right on. Hardcore utilitarianism.

2:30:03

I guess my impression is at least the people

2:30:06

that I talk to who maybe are like somewhat filtered

2:30:08

and selected, many people, including me, absolutely,

2:30:11

think that impartial expected welfare maximization

2:30:13

is underrated by the general public. And I

2:30:16

think that, yeah. Yeah. And that there's a

2:30:18

lot of good that one can do using,

2:30:20

if you focus on increasing wellbeing, there's an awful

2:30:22

lot of good that you can do there and that most people

2:30:25

aren't thinking about that. But nonetheless, I'm

2:30:27

not confident that we've solved philosophy. I'm not so confident

2:30:29

that we've solved ethics. The

2:30:31

idea that pleasure is good and suffering

2:30:34

is bad feels like among the most plausible

2:30:36

claims that one could make about what is valuable and what is

2:30:38

disvaluable. But we don't really like the

2:30:40

idea of things being objectively valuable is incredibly

2:30:43

odd one. It's not clear how we could get any evidence

2:30:45

about that, that would be fully persuasive. And clearly

2:30:47

philosophers are very split. So people

2:30:50

kind of do this. We're forced to this odd position

2:30:52

of wanting to hedge our bets a bit between this theory

2:30:54

that seems like maybe the most plausible

2:30:57

ethical theory, but also having lots of conflicting

2:30:59

intuitions with it, and also being aware that many, many

2:31:01

smart people don't agree that this is the right

2:31:03

approach at all. But I mean, it sounds like you've

2:31:06

ended up in conversations with people who are, you know, maybe they

2:31:08

have some doubts, but they are like pretty hardcore.

2:31:11

They like really feel like there's a good chance that

2:31:13

when we look back, we're going to be like it was absolute, it was total

2:31:16

utilitarianism all along and everything else was completely

2:31:18

confused.

2:31:19

Yeah, I think that's right. I think you can,

2:31:21

there's definitely room for some nuance here. Like you don't

2:31:23

have to think you've solved philosophy. I think the position

2:31:25

a lot of people take is more like,

2:31:28

I don't really put any weight on random

2:31:30

common sense intuitions about what's good because

2:31:32

those have a horrible track record. Just

2:31:35

like the history of common sense morality looks like so

2:31:37

bad that I just don't really care what it says. So I'm

2:31:39

going to take like the best guess I've got at a systematic

2:31:42

science like, you know, with good scientific

2:31:45

properties of like simplicity and predictiveness, system

2:31:47

and morality, that's the best I can do. And

2:31:49

furthermore, there's a chance it's wrong, but

2:31:52

you can do another layer of expected value

2:31:54

maximization and multiply that through. And so

2:31:56

I'm yeah, I'm basically going to act

2:31:58

as if maximization.

2:31:59

utilities all that matters and specifically

2:32:02

maximizing the, you know, kind of like pleasure

2:32:04

minus pain type thing of subjective experience.

2:32:07

That is the best guess. That is how I should act. When

2:32:09

I'm unsure what to do, I may follow heuristics, but

2:32:12

if I ever run into a situation where the numbers just

2:32:14

clearly work out, I'm gonna do what the numbers say. Yeah,

2:32:16

and I think I not only think

2:32:19

that's not definitely right,

2:32:21

yeah, a minority of me is into that view.

2:32:23

So I think I would say, is it the most plausible

2:32:26

view? I would say no. I would say

2:32:29

the most plausible view of ethics is that it's

2:32:31

a giant mishmash of different things and

2:32:34

that what it means to be good and do good is

2:32:36

like a giant mishmash of different things and we're

2:32:38

not going to nail it anytime

2:32:40

soon. Is it the most plausible thing that's

2:32:42

kind of like need and clean and well-defined?

2:32:45

Well,

2:32:45

I would say definitely total utilitarianism

2:32:48

is not.

2:32:49

I think total utilitarianism is completely screwed,

2:32:51

makes no sense, it can't work at all, but I think

2:32:53

there's a variant of it, sometimes called Udasa,

2:32:56

that I'm okay kind of saying that's the most

2:32:58

plausible we got or something and gets like a decent

2:33:00

chunk but not a majority of what I'm thinking about.

2:33:03

Holden just used the term Udasa, which

2:33:05

is U-D-A-S-S-A. It

2:33:07

stands for Universal Distribution Absolute

2:33:10

Self-Sampling Assumption. Now, you

2:33:12

probably don't know what Udasa is and I don't

2:33:14

really either. It's some sort of attempt to

2:33:16

deal with anthropics and the universe

2:33:19

potentially being infinite in size by

2:33:21

not weighting all points in the universe equally and

2:33:23

instead assigning them ever-decreasing value following

2:33:25

some numbering system. The issue is

2:33:27

that if you keep adding an unlimited series of ones

2:33:30

you get an infinite sum and you have problems

2:33:32

making comparisons to any

2:33:34

other series that also sum to infinity. If

2:33:37

instead you add one and then a half and then

2:33:39

a quarter and then an eighth and then a sixteenth

2:33:41

and so on in an infinite series then

2:33:43

that series actually sums to a finite number,

2:33:46

that is two, and you will

2:33:48

be able to make comparisons with other such

2:33:50

series. If what I said didn't

2:33:53

make much sense to you don't worry it doesn't actually

2:33:55

need to. I just know that Udasa is

2:33:57

some technical approach that might make utilitarianism

2:33:59

viable.

2:33:59

viable in an infinite universe. We'll stick

2:34:02

up a link to people who want to read more about you Dasa, but

2:34:04

I haven't and I wouldn't blame you if you don't want to

2:34:07

either. Okay, back to the show.

2:34:09

Maybe it would be worth laying out like, you

2:34:11

know, you're doing a bunch of work, presumably it's kind of stressful

2:34:13

sometimes in order to help other people and you start to give

2:34:16

well trying, you know, wanting to help

2:34:18

the global poor. What is your conception of

2:34:20

morality and what motivates you

2:34:23

to do things in order to make the world better?

2:34:25

A lot of my answer to that is just, I

2:34:27

don't know. Sometimes when people

2:34:30

interview me about these like thought experiments, you

2:34:32

save the painting, I'll just be like, I'm not

2:34:34

a philosophy professor. And like, look, that

2:34:36

doesn't mean I'm not interested in philosophy. Like I said, I think I've argued

2:34:38

this stuff into the ground. But like, a lot

2:34:41

of my conclusion is just like,

2:34:43

philosophy is a non

2:34:46

rigorous methodology with an unimpressive

2:34:48

track record. And I

2:34:50

don't think it is that reliable

2:34:53

or that

2:34:54

important. And it isn't

2:34:57

that huge a part of my life. And I think I

2:34:59

find it really interesting. So that's not because I'm unfamiliar

2:35:01

with it. It's because I think it shouldn't

2:35:03

be. And so I'm kind of not that philosophical

2:35:06

a person in many ways. I'm super interested

2:35:08

in it. I love talking about it. I have lots of takes. But

2:35:10

I think when I make high stakes, important decisions

2:35:12

about how to spend large amounts of money, I'm not

2:35:15

that philosophical of a person. And most

2:35:18

of what I do does not rely on unusual

2:35:20

philosophical views. I think it can be

2:35:22

justified to someone with like quite

2:35:24

normal takes on ethics. Yeah. One

2:35:27

thing is that you're not a moral realist. So you

2:35:29

don't believe that there are kind of objective

2:35:31

mind independent facts about what is good and bad

2:35:34

and what one ought to do. I have never

2:35:36

figured out what this position is supposed to mean.

2:35:39

And I'm hesitant to say I'm not one because

2:35:41

I don't even know what it means. So if you can if you

2:35:43

can cash something out for me that has a clear

2:35:45

pragmatic implication, I will tell you if I am

2:35:48

or not. But I've never really even gotten what I'm disagreeing

2:35:50

with or agreeing with on that one. Yeah. Okay.

2:35:54

So that sounded

2:35:54

like

2:35:55

you had some theory of doing good

2:35:57

that or some theory of what your the enterprise

2:35:59

that you're engaged in when you try to live morally

2:36:02

or when you try to make decisions about where you should

2:36:04

give money. But it's something about

2:36:06

acting on your preferences, about making

2:36:08

the world better. Something on acting, like it's at least

2:36:11

about acting on the intuitions you have about

2:36:13

what good behavior is. I generally am subjectivist.

2:36:17

When I hear subjectivism, that sounds right. When I

2:36:19

hear more realism, I don't go that sounds wrong. I

2:36:21

don't know what you're saying. And I

2:36:23

have tried to understand. I've been trying

2:36:25

again now if you want. Yeah. If

2:36:28

more realism is true, it's a very queer thing,

2:36:31

as philosophers say. Realism

2:36:33

about moral facts is not seemingly the

2:36:35

same as scientific

2:36:36

facts about the world. It's not clear how

2:36:38

we're causally connected to these facts. Yeah,

2:36:41

exactly. I've heard many different

2:36:43

versions of moral realism. I think some

2:36:45

of them, I'm just like, this feels like a terminological

2:36:47

or semantic difference with my view. And

2:36:50

others, I'm just like, this sounds totally

2:36:52

nutso. I don't know. I

2:36:54

have trouble being in or out on this thing, because

2:36:56

it just means so many things. And I don't know which one it means.

2:36:58

And I don't know what the more interesting versions

2:37:00

are even supposed to mean. But it's fine. Yes,

2:37:03

I'm a subjectivist. I'm more or less

2:37:06

the most natural

2:37:06

way I think about morality. It's just like, I

2:37:08

decide what to do with my life. And there's certain

2:37:11

flavors of pull that I have. And those are moral

2:37:13

flavors. And I try to make myself do

2:37:15

the things that the moral flavors are pulling me on. I think

2:37:17

that makes me a better person when I do. Yeah.

2:37:20

Okay. So maybe we have highlighted

2:37:22

the differences here. To imagine this conversation

2:37:24

where you're saying, no, I'm leading open

2:37:26

philanthropy. I think that we should split our

2:37:28

efforts between a whole bunch of different projects, each

2:37:31

one of which would look exceptional on a different

2:37:33

plausible worldview. And the hardcore utilitarian

2:37:35

comes to you and says, no, you should choose the best one

2:37:37

and just fund that. Or you like spend all of your resources

2:37:40

and all of your time just focused on that best one. What

2:37:42

would you say to them in order to justify the worldview

2:37:45

diversification approach?

2:37:46

Yeah, I mean, the first thing I would say to them is just

2:37:49

like, burden's on you. And I think this

2:37:51

is kind of a tension I often have with people who consider

2:37:54

themselves hardcore. They'll just,

2:37:56

you know, it's like, they'll just be like, well, why wouldn't you be

2:37:58

a hardcore utilitarian? Like, what's the problem?

2:37:59

and it's just maximizing the pleasure and

2:38:02

minimizing the pain or the sum or the difference.

2:38:04

And I would just be like, no, no, no, you've got to tell

2:38:06

me because I am sitting

2:38:08

here with these great opportunities

2:38:11

to help huge amounts of people in very

2:38:14

different and hard to compare ways. And the

2:38:16

way I've always done ethics before in my life is

2:38:19

I basically have some voice inside me and it says,

2:38:21

this is what's right. And that voice has to carry some

2:38:23

weight. It's like even on your model, that voice has to carry some

2:38:25

weight because you, the hardcore utilitarian,

2:38:27

not Rob, because we all know you're not at all. But

2:38:31

the,

2:38:31

it's like even the most systematic theories of ethics,

2:38:34

it's like, they're all using that little voice

2:38:36

inside you that says what's right. That's the arbiter

2:38:38

of all the thought experiments. So that we're all putting weight

2:38:40

on it somewhere, somehow. And I'm like,

2:38:43

cool, that's gotta be how this works. There's

2:38:45

a voice inside me saying, this feels right, this feels wrong.

2:38:47

That voice has gotta get some weight. That voice

2:38:49

is saying, you know what? Like,

2:38:52

it is really interesting to think about these risks to

2:38:54

humanity's future, but also like,

2:38:56

it's weird. Like, this work is not

2:38:58

shaped like the other work. It doesn't have as good

2:39:00

feedback loops. It feels icky.

2:39:03

Like a lot of this work is about just

2:39:05

basically supporting people who think like us, or

2:39:07

feels that way a lot of the time. And it just feels

2:39:10

like doesn't have the same ring of

2:39:12

ethics to it. And then on the other hand, it just

2:39:14

feels like, I'd be kind of a jerk if like, like OpenPhil,

2:39:16

I believe, and you can disagree with me, is like not

2:39:19

only the biggest, but the most effective farm animal

2:39:21

welfare funder in the world. And I think

2:39:23

we've had enormous impact and made animals'

2:39:25

lives dramatically better. And

2:39:27

coming to say to me, no, you should take all that money and put

2:39:29

it like into the like diminishing

2:39:32

margin of like supporting people

2:39:34

to think about some future x-risk

2:39:36

in a domain where you're mostly have

2:39:39

a lot of these concerns about insularity. Like

2:39:41

you've got to make the case to me because the normal

2:39:44

way all this stuff works is you like listen

2:39:46

to that voice inside your head and you care what it says. And

2:39:48

some of the opportunities OpenPhil has to do a lot

2:39:50

of good are quite extreme and we do them.

2:39:52

So that's the first thing is we've gotta

2:39:54

put the burden of proof in the right place. Cause I think

2:39:56

utilitarianism is definitely interesting and has some things

2:39:58

going for it, especially.

2:39:59

you patch it and make it Yudasa, although

2:40:02

that makes it less appealing. But you got

2:40:04

to... Where's the burden proof? Yeah.

2:40:06

Yeah. Okay. So, to buy

2:40:08

into this, the hardcore utilitarian view,

2:40:10

I guess one way to do it would be, so you're committed to moral

2:40:12

realism. I guess you might be committed

2:40:14

to hedonism as a theory of value, so it's only

2:40:17

pleasure and pain. I guess then you also

2:40:19

want to add on kind of a total view,

2:40:21

so it's just about the complete aggregate

2:40:23

there. That's all that matters. You're going to say there's no

2:40:25

side constraints and kind of all of your other conflicting

2:40:28

moral intuitions are worthless, so you

2:40:30

should completely ignore those. Are there

2:40:32

any other moral, philosophical commitments

2:40:34

that underpin this view that you think are implausible

2:40:37

and haven't been demonstrated to a

2:40:39

sufficient extent? I don't think you need all those at all.

2:40:41

I mean, I've written

2:40:43

up this series... I mean, I'd steelman the hell out

2:40:45

of this position, as well as I could. I've

2:40:48

written up this series called Future Proof Ethics, and I think

2:40:50

the title kind of has been confusing and

2:40:52

I regret it, but it is trying to get at this idea

2:40:54

that I want an ethics that,

2:40:56

whether because it's correct and real, or

2:40:59

because it's what I would come to

2:41:01

on reflection, I want an ethics that's in some

2:41:03

sense better, that's in some sense what

2:41:05

I would have come up with if I had more time to think about

2:41:08

it. What would that ethics look like? I

2:41:10

don't think you need moral realism to care about this. You

2:41:12

can make a case for utilitarianism

2:41:15

that just starts from

2:41:16

gosh, humanity has a horrible track record

2:41:19

of treating people horribly. We should really try and

2:41:21

get ahead of the curve. We shouldn't be listening

2:41:23

to common sense intuitions that are actually going to be quite

2:41:26

correlated with the rest of our society, and that looks bad

2:41:28

from a track record perspective. We need to

2:41:30

figure out the fundamental principles of morality as

2:41:32

well as we can. We're not going to do it perfectly,

2:41:35

and that's going to put us ahead of the curve and make us less

2:41:37

likely to be the kind of people that would think we were

2:41:39

moral monsters if we thought about it more. You don't need moral

2:41:42

realism. You don't need

2:41:44

hedonism at all. I

2:41:46

think

2:41:46

you

2:41:48

can just say... I mean, most people

2:41:50

do do this with hedonism, but I think you can

2:41:52

just say, if you want to use Arstani's

2:41:54

aggregation theorem, which means if you

2:41:56

basically

2:41:57

want it to be the case that every

2:41:59

time... everyone would prefer

2:42:01

one kind of state of affairs to another, you do

2:42:04

that first state of affairs. You can get from

2:42:06

there and some other assumptions to

2:42:08

basically at least a form of utilitarianism

2:42:11

that says

2:42:12

a large enough number of small benefits

2:42:14

can outweigh everything else. I call this the

2:42:17

utility legions corollary. It's like a play

2:42:19

on utility monsters. But it's like, once

2:42:22

you decide that something is

2:42:24

valuable, like helping a chicken or

2:42:26

helping a person get a chance to exist, that

2:42:28

there's some number of that thing that can outweigh everything

2:42:30

else. And I think that doesn't reference

2:42:33

hedonism. It's just this idea of like, come

2:42:35

up with anything that you think is non-trivially beneficial

2:42:38

and a very large number of it beats everything

2:42:40

and wins over the ethical calculus.

2:42:42

That's like a whole set of mathematical

2:42:44

or whatever logical steps you can take that

2:42:46

don't invoke hedonism at any point. So

2:42:49

I think the steel-made version of this would not have a million

2:42:51

premises. It would say, look,

2:42:53

we really want to be ahead of the curve. That

2:42:55

means we want to be systematic, one of the minimal

2:42:58

set of principles, so that we can

2:43:00

be systematic and make sure that we're really only

2:43:02

basing our morality on the few things we feel best about.

2:43:05

And that's how we're going to avoid being moral monsters.

2:43:08

One of the things we feel best about is this

2:43:10

utilitarianism idea, which has the utility legions

2:43:12

corollary. Once we've established that,

2:43:14

now we can establish that like a

2:43:17

person who gets to live instead of not living

2:43:19

is a benefit. And therefore, enough of them cannot

2:43:21

weigh everything else. And then we can say, look,

2:43:24

if there's 10 to the 50 of them in the future, an expectation,

2:43:27

that weighs everything else that we could ever

2:43:29

plausibly come up with.

2:43:30

That to me is the least assumption root.

2:43:32

And then you can tack on some other stuff. You can be like,

2:43:35

also, like people have thought this way in the past,

2:43:37

did amazing. Jeremy Bentham was like

2:43:39

the first utilitarian, and he was like early

2:43:42

on women's rights and animal rights and anti-slavery

2:43:44

and all this other like gay rights and like all

2:43:46

this other stuff. And so this is just like, yep,

2:43:49

it's like a simple system.

2:43:50

It looks great looking backward. It's

2:43:53

built on these rock solid principles of

2:43:55

like utilitarianism and systematicity

2:43:58

and maybe sentiatism.

2:43:59

which is a thing I didn't quite cover, which

2:44:02

I should have. But, you know, radical impartiality,

2:44:05

like caring about everyone, no matter where they

2:44:07

are, as long as they're like physically identical, you have

2:44:09

to care about them the same. You could basically

2:44:11

derive from that, this system, and then that

2:44:13

system tells you there's so many future generations

2:44:16

that it just everything has to come down to them. Now,

2:44:18

maybe you have heuristics about how to actually help them, but everything

2:44:20

ultimately has to be how to help them. So that would be my steel man.

2:44:23

Yeah. Okay. Okay. So I've much more often encountered

2:44:25

this, the kind of grounding

2:44:27

for this that Sharon Hewitt-Rawlett

2:44:29

talked about

2:44:29

in episode 138, which is

2:44:32

this far more philosophical approach

2:44:35

to it. But, you know, the case you make there, it

2:44:37

doesn't sound like some watertight thing

2:44:39

to me because, well, especially once you start making

2:44:41

arguments like, oh, it has a good historical track record,

2:44:43

you'd be like, well, I'm sure I've

2:44:45

got some stuff wrong. And also, like, maybe it could be right

2:44:48

in the past, but wrong in the future. It's not an

2:44:50

overwhelming argument. But I guess, yeah, what do you say to people

2:44:52

who bring to you this basic steel

2:44:55

man of the case?

2:44:56

Yeah, I say a whole bunch of stuff. I mean, I think

2:44:58

I would say for the first thing

2:45:00

I would say is just like, it's not enough. Like, you

2:45:03

know, it's just it's just we're not talking about a rigorous

2:45:06

discipline here. You haven't done enough.

2:45:08

The stakes are too high. You haven't

2:45:10

done the work to establish this. The

2:45:13

specific things I would get into, I would first

2:45:15

just be like, I

2:45:16

don't believe you buy your own story. I think

2:45:19

I've basically, you

2:45:20

know, I think even the people who believe themselves, very

2:45:22

hardcore utilitarians, it's because

2:45:24

they no one designs thought experiments

2:45:27

just to mess with them. And I think you totally can. That

2:45:30

are just, you know, I mean, you know, one thought

2:45:33

experiment, I've kind of used it and not ever

2:45:35

anyone is going to reject some of these. But, you know,

2:45:37

one of them is it's like, well, there's an asteroid

2:45:40

and it's about to hit Cleveland and destroy it entirely

2:45:42

and kill everyone there.

2:45:43

But no one will be blamed. You know,

2:45:45

somehow this is like having a neutral effect on

2:45:48

long run future. Would you prevent the

2:45:50

asteroid hitting Cleveland for 35 cents?

2:45:53

And it's like, well, you could give that 35 cents to Center

2:45:55

for Effective Altruism or 80,000 hours

2:45:57

or MIRI. So as

2:45:59

a hardcore. Utilitarianism as a hardcore utilitarian

2:46:02

your answer has to be no, right? No you

2:46:04

someone offers you this no one else can do it You

2:46:06

either give 35 cents or you don't to stop

2:46:08

the asteroid from it In Cleveland you say no because you want to donate

2:46:10

that money to something else, right? I think most

2:46:13

people will not go for that Nobody,

2:46:15

I think they're simpler in this where

2:46:17

I think like most these hardcore utilitarians Actually

2:46:20

are like not all of them but actually most of them are like

2:46:22

super super into honesty. They try

2:46:24

to defend it They'll be like well clearly

2:46:26

honesty is like the way to maximize utility And I'm

2:46:28

just like how did you

2:46:29

figure that out like what like you're

2:46:32

like you're like your level of honesty is way

2:46:34

beyond What actual like most of the most successful

2:46:36

and powerful people are doing so like how does

2:46:38

that work? How did you how did you determine

2:46:41

this this can't be right? And so I think most of these

2:46:43

hardcore utilitarians actually have tensions within

2:46:45

themselves that they aren't Recognizing that you can

2:46:47

just dry out if you if you red team

2:46:49

them instead of doing the normal philosophical thought

2:46:51

experience same to normal people

2:46:54

And then another place I go to challenge this view

2:46:56

is that I do think

2:46:58

The principles people try to build this thing

2:47:00

on the central ones are the utilitarianism

2:47:03

idea Which is like this thing that I didn't

2:47:05

explain well with our sonny's aggregation theorem But

2:47:08

I do I do have a written up you can link to it I

2:47:10

could try and explain it better, but whatever I think it's

2:47:12

a fine principle So I'm not gonna argue with it The

2:47:14

other principle people are leaning on is impartiality,

2:47:17

and I think that one is screwed, and it doesn't work at all

2:47:19

Yeah, yeah, yeah, so you think the impartiality

2:47:22

aspect of this is just completely busted. Yeah,

2:47:24

George want to elaborate on why that is

2:47:26

Yeah, so so this is something I

2:47:28

mean I think you covered it a bit with Joe But I had a little

2:47:30

bit of a different spin on it a little bit more of an agro

2:47:32

spin on it One way to think about impartiality

2:47:35

like a minimum condition for what we might mean by

2:47:37

impartiality would be that if two Persons

2:47:40

or people or whatever I call them persons

2:47:42

to just include whatever animals and ais

2:47:45

and whatever you know two persons Let's

2:47:47

say they're physically identical

2:47:49

then you should care about them equally I would

2:47:51

kind of like I would kind of claim This is like if

2:47:54

you're if you're not meeting that condition, then

2:47:56

it's weird to call yourself impartial and

2:47:58

something is up

2:47:59

probably the hardcore person is not a big fan of you.

2:48:02

And I think you just can't do that. And

2:48:05

all the infinite ethics stuff, it just completely

2:48:07

breaks that. Not in a way that's just like a weird

2:48:10

corner case. Sometimes it might not work. It's just like,

2:48:12

actually, should I donate

2:48:15

a dollar to charity? Well,

2:48:18

across the whole multiverse,

2:48:20

incorporating expected value and a finite

2:48:22

non-zero probability of an infinite

2:48:25

size universe, then it just follows

2:48:27

that my dollar helps and hurts infinite

2:48:30

numbers of people. And there's no answer

2:48:32

to whether it is a good dollar or a bad dollar,

2:48:34

because if it helps one person,

2:48:37

then it hurts a thousand, then it helps one, then it hurts a thousand

2:48:39

onto infinity, versus if it helps a thousand,

2:48:41

then it hurts one, then it helps a thousand, then it hurts one onto

2:48:44

infinity, those are the same. They're just rearranged.

2:48:46

There is no way to compare to infinities like that. It cannot

2:48:48

be done. It's not like no one's found it yet. It

2:48:50

just can't be done. Your system actually

2:48:53

just breaks completely. It just doesn't... It won't tell

2:48:55

you a single thing. It returns an undefined

2:48:57

every time you ask it a question. We're

2:48:59

rushing through this, but that was actually was kind of the bottom

2:49:02

line of the episode with Alan Hayek. It

2:49:04

was episode 139 on

2:49:05

puzzles and paradoxes and probability and expected value.

2:49:08

It's just, it's a bad picture. It's not

2:49:10

a pleasant, it's not a pretty picture. Yeah, yeah, exactly.

2:49:12

And I have other beefs with him personally. I think

2:49:14

I should actually go on for quite a while and at some point I'll

2:49:16

write it all up, but I think just like anything

2:49:18

you try to do where you're just like, here's a physical

2:49:20

pattern or a physical process or a physical thing

2:49:23

and everywhere it is, I care about it equally. I'm just like, that

2:49:25

is you're going to be so screwed. It's not going to work.

2:49:28

The infinities are the easiest to explain way. It doesn't

2:49:30

work, but it just doesn't work. And

2:49:32

so the whole idea that you were building

2:49:34

this beautiful utilitarian system on

2:49:36

one of the things you could be confident in, well,

2:49:39

one of the things you were confident in was impartiality

2:49:41

and it's got to go. And like, you know,

2:49:43

Joe kind of presented, it's like, well, you have these tough choices

2:49:45

in infinite ethics because you can't have all of Pareto

2:49:48

and impartiality, which he called anonymity and

2:49:51

transitivity. And I'm like, yeah, you can't have all of them.

2:49:54

You got to obviously drop impartiality.

2:49:56

You can't make it work. The other two are better. Keep

2:49:59

the other two drop impartiality.

2:49:59

Once you drop in partiality,

2:50:02

I don't know, now we're in the world of just

2:50:04

like, some things are physically identical,

2:50:06

but you care more about one than the other. In some ways, that's

2:50:08

a very familiar world. Like, I care more about my

2:50:10

family than about other people, really not for any

2:50:12

good reason.

2:50:13

You just have to lean into that because that's what you are

2:50:15

as a human being. You care more about some things

2:50:17

than others, not for good reasons. You

2:50:19

can use that to get out of all the infinite ethics

2:50:22

jams. It's like there's some trick to it, and it's

2:50:24

not super easy, but basically as long as you're

2:50:26

not committing to caring about everyone, you're gonna

2:50:28

be okay. And as long as you are, you're not. So don't care about

2:50:30

everyone. And this whole fundamental principle that

2:50:32

was supposed to be powering this beautiful morality, just doesn't work.

2:50:35

Yeah. Yeah, do you want to explain a little bit

2:50:38

the mechanism that you'd use to get away from it? But

2:50:40

basically you could have,

2:50:41

if you define some kind of central point and then have

2:50:43

some, like,

2:50:45

as things get further and further away from that central point,

2:50:47

then you value them less. As long

2:50:49

as you value them less at a sufficiently rapid rate, then

2:50:51

things sum to one rather than ending up summing to infinity.

2:50:54

And so... Yeah, exactly. So now you can make comparisons

2:50:56

again.

2:50:57

Yeah, and this is all me bastardizing and oversimplifying

2:51:00

stuff. But basically you need some system that

2:51:02

says we're discounting things

2:51:04

at a fast enough rate that everything adds up to a finite

2:51:07

number, and we're discounting them even

2:51:09

when they're physically identical to each other. We gotta have some

2:51:11

other way of discounting them. So like, you know, a stupid

2:51:13

version of this would be like you declare

2:51:15

a center of the universe in space and

2:51:18

time and effort branch and everything like that,

2:51:20

and you just like discount by distance from that center.

2:51:23

And if you discount fast enough, you're fine, and you don't run into

2:51:25

the infinities. You know, the way that I think it's like

2:51:27

more people are

2:51:28

into, I've referred to it a couple times already, Udasa,

2:51:31

is like you kind of say, hey, I'm

2:51:33

going to discount you by how long a computer

2:51:35

program I have to write to point to you. And

2:51:37

then you're gonna be like, what the hell are you talking

2:51:39

about? What computer program in what language?

2:51:42

And I'm like, whatever language, pick a language, it'll work.

2:51:44

And you're like, but that's so horrible, that's so arbitrary. So

2:51:47

if I pick Python versus I picked Ruby, then

2:51:49

that'll affect who I care about. And I'm like, yeah, well, it's

2:51:51

all arbitrary, it's all stupid. But at least you can get

2:51:53

screwed by the infinities. Anyway, so I think I think

2:51:55

if I were to be if I were

2:51:57

to take the closest thing

2:51:58

to a beautiful, simple utility,

2:51:59

system that gets everything right, Udasa

2:52:03

would actually be my best guess, and it's pretty

2:52:05

unappealing, and most people who say they're hardcore say they

2:52:07

hate it. I think it's the best contender. It's better than actually

2:52:10

adding everything up.

2:52:11

So that's one approach you could take. I

2:52:13

guess

2:52:14

the Infinity stuff makes me sad,

2:52:16

because it's in as much as

2:52:19

we're right that we're just not going to be able to solve this. We're

2:52:22

not going to come up with any elegant solution that resembles our

2:52:24

intuitions or that embodies

2:52:26

impartiality in the way that we care about. Now

2:52:29

we're valuing one person because it was easier to specify

2:52:31

where they are using the Ruby programming

2:52:33

language. That doesn't capture

2:52:35

my intuitions about value or about ethics.

2:52:39

It's a very long way from them in actual fact.

2:52:41

It feels like any system like that is just

2:52:43

so far away from what I enter this

2:52:45

entire enterprise caring about that

2:52:47

I'm tempted to just give up and embrace

2:52:49

nihilism. Yeah, I think

2:52:51

that's a good temptation. Not nihilism.

2:52:54

I'm just going to do stuff that I want to do. Yeah.

2:52:56

Well, I mean look, that's kind of where I am.

2:52:58

I mean I think I'm like, look, Udast is the best you

2:53:00

can do. You probably like it a lot less than what you

2:53:02

thought you were doing. And a reasonable response

2:53:05

would be screw all this. And then after you screw all this,

2:53:07

okay, what are you doing? And I'm like, okay, well

2:53:09

what I'm doing I still like my job. I still

2:53:11

like my job. I still care about my family. And you

2:53:13

know what? I still want to be a good person. What does that mean? I

2:53:16

don't know. I don't know. Like, I notice when

2:53:18

I do something that feels like it's bad. I notice when

2:53:20

I do something that feels like it's good. I notice that

2:53:22

like,

2:53:23

I'm glad I started a charity evaluator

2:53:25

that helps people in Africa

2:53:28

and India instead of just spending my whole

2:53:30

life making money. Like, I don't know. That didn't

2:53:32

change. I'm still glad I did that.

2:53:34

And I'm just, I don't have a beautiful philosophical

2:53:36

system that gives you three principles that can derive it, but

2:53:38

I'm still glad I did it. And that's pretty much where

2:53:40

I'm at. And that's where I come back to just being like, I'm

2:53:42

not that much of a philosophy guy because I think philosophy

2:53:45

isn't really that promising.

2:53:46

But I am a guy who like works really hard to try and do

2:53:48

a lot of good because I don't think you need to be a philosophy guy to do that.

2:53:51

Just in the interview that, you know, if your

2:53:53

philosophy feels like it's breaking, then

2:53:55

that's probably a problem with the philosophy rather than with

2:53:57

you. And I wonder whether we can turn that to this case where

2:53:59

we say Well, we don't really know why induction

2:54:02

works, but nonetheless, we all go about

2:54:04

our lives as if induction is reasonable.

2:54:06

And likewise, we might say,

2:54:07

we don't know the solution to these infinity paradoxes

2:54:10

in the Mouldy verse and all of that,

2:54:12

but nonetheless, impartial welfare

2:54:14

maximization feels right. And so hopefully at some

2:54:16

point we'll figure out how to make this work and how to make

2:54:19

it reasonable. And, you know, in the meantime,

2:54:21

I'm not going to let these funny philosophical thought experiments

2:54:24

take away from me what I thought the core of ethics

2:54:26

really, really was.

2:54:27

But my question is, why is that the core of ethics? So my

2:54:29

thing is, I want to come back to the burden of proof. I

2:54:32

think I just want to be like, fine, we give up.

2:54:34

Now what are we doing? And I'm like, look, if someone

2:54:36

had a better idea than induction, I'd be pretty interested,

2:54:38

but it seems like no one does. But like, I

2:54:40

do think there is an alternative to these

2:54:43

like very simple, beautiful systems of

2:54:45

ethics that like tell you exactly

2:54:47

when to break all the normal rules. I think the alternative

2:54:49

is just like, you don't have a beautiful system, you're just like a person like

2:54:51

everyone else. Just imagine that you're not

2:54:54

very into philosophy

2:54:55

and you still care about being a good person. That's most people.

2:54:57

You can do that. It's your default. Then

2:55:00

you got to talk me out of that. You got to be like, Holden, here's something

2:55:02

that's much better than that, even though it breaks. And

2:55:04

I'm like, yeah, I haven't heard someone do

2:55:06

that. Yeah. Well, despite everything

2:55:08

you've just said, you say that you think that impartial expected

2:55:10

welfare maximization is underrated. Do you think

2:55:12

you wish that people did it like that the

2:55:14

random person on the street did it more? Do

2:55:17

you want to explain how that can possibly be?

2:55:19

I don't think he does. It's that bad. I

2:55:21

mean, I don't think like there's no way it's going

2:55:24

to be like the final answer or something. Like,

2:55:27

I don't know, like something like that. And later we're

2:55:29

going to come up with something better that's kind of like it. There's

2:55:31

going to be partiality in there. It might be

2:55:33

that it's some sort of like,

2:55:35

I tried to be partial and arbitrary, but in

2:55:37

a very principled way where I just kind of take

2:55:40

the universe as I live in it and try

2:55:42

to be fair and nice to those

2:55:45

around me. And I have to weight them a certain way. And

2:55:47

so I took the simplest way of weighting them. And

2:55:49

it's like, it's not going to be as compelling as

2:55:51

the original vision for utilitarianism was supposed to

2:55:53

be. I don't think it's that bad. And I think there's like

2:55:56

some arguments that are actually like this weird

2:55:58

simplicity criterion of like how

2:55:59

easy it is to find you with a computer program, you

2:56:02

could think of that as like,

2:56:04

what is your measure in the universe or like how much

2:56:06

do you exist or how much of you is there in the universe?

2:56:09

There are some arguments you could think of it that way. So I don't know.

2:56:11

I don't think Udasa is like totally screwed, but

2:56:13

I'm not about to like shut down open philanthropies

2:56:15

like Farm Animal Welfare program because

2:56:17

of this Udasa system. So that's yeah, it's more or less

2:56:19

the middle ground that I've come to. Yeah.

2:56:21

You know, I also just think there's a lot of good and just without

2:56:25

the beautiful system, just challenging yourself, just

2:56:27

saying, hey, common sense reality really

2:56:29

has done badly. Can I do better? Can

2:56:31

I like

2:56:32

do some thought experiments until I really believe

2:56:34

with my heart that I care a lot more about the future

2:56:37

than I thought I did and think a lot about the future? I

2:56:39

think that's fine. I think the part where you say

2:56:41

the 10 to the 50 number is taken literally

2:56:43

and like is in the master system

2:56:45

is exactly 10 to the 50 x is important

2:56:48

to saving one life. I think that's the dicier part.

2:56:50

Yeah. I thought you might say that, you know, the

2:56:53

typical person walking around who hasn't thought about any of these

2:56:55

issues, they nonetheless care about other

2:56:57

people and about having their lives go well, like at least a

2:56:59

bit.

2:57:00

And they might not have appreciated like just

2:57:02

how large an impact they could have if they turned a bit

2:57:04

of their attention to that how much they might be able to help other

2:57:06

people. So without any like deep philosophy or

2:57:08

any like great reflection or changing in their values,

2:57:10

it's actually just pretty appealing to help

2:57:13

to like do things that effectively help other people. Yeah.

2:57:15

And that's kind of what motivates you. I imagine totally.

2:57:18

I love trying to do what's good possible by

2:57:20

defining kind of a sloppy way that isn't a beautiful

2:57:23

system. And I even like the philosophical thought

2:57:25

experience. They have made me move a bit more

2:57:27

toward caring about future generations

2:57:29

and

2:57:30

especially whether they get to exist, which I think intuitively

2:57:32

is not exciting to me at all still isn't that

2:57:34

exciting to me, but it's more than it used to be. So,

2:57:36

you know, I think I think there's like value in here,

2:57:39

but the value comes from like wrestling with the stuff, thinking

2:57:41

about it and deciding where your heart comes out in the

2:57:43

end. But but I just think the dream of a beautiful system

2:57:45

isn't there. I guess the final thing I want

2:57:47

to throw in there, too, as I mentioned this earlier in the podcast.

2:57:50

But if you if you did go in on Yudasa

2:57:52

or you had the beautiful system or you somehow

2:57:54

managed to be totally impartial, I do

2:57:56

think long termism is a weird conclusion from that.

2:57:58

And so you at least should.

2:57:59

And we should realize that what you actually

2:58:02

should care about is something far weirder than future generations.

2:58:05

And if you're still comfortable with it, great. And if you're not,

2:58:07

you may want to also rethink things. Yeah.

2:58:09

So a slightly funny thing about having this conversation

2:58:11

in 2023 is that I think worldview

2:58:14

diversification doesn't get

2:58:16

us as far as it used to, or the idea of wanting

2:58:18

to split your bets across different worldviews. Yeah,

2:58:21

yeah, yeah. As AI becomes like just a more

2:58:23

dominant and obviously important consideration

2:58:25

in how things are going to play out, like not

2:58:28

just for strange existential risk related reasons, but

2:58:30

it seems incredibly related now to, you know, we'll

2:58:32

be able to get people out of poverty. We'll be able

2:58:34

to solve lots of medical problems. It wouldn't be

2:58:37

that crazy to try to help farm animals by doing something

2:58:39

related to ML models at some point in

2:58:41

the next few decades. And also if you think that it's

2:58:43

plausible that we could go extinct

2:58:44

because of AI in the next 10 years,

2:58:46

then just from a life saving list, just

2:58:48

in terms of saving lives of people alive right now, it

2:58:51

seems like an incredibly important and neglected issue.

2:58:53

It's just a funny situation to be in where like the different

2:58:56

worldviews that we kind of picked out 10 years ago now

2:58:58

like might all kind of be converging, at least temporarily

2:59:00

on a very similar set of activities because of some

2:59:03

very like

2:59:04

odd historically abnormal, like

2:59:06

indeed like deeply suspicious empirical

2:59:08

facts that we happen to be living through right now.

2:59:11

That's exactly how I feel. And it's all very awkward

2:59:13

because it just makes it hard for me to explain what

2:59:15

I even disagree with people on because I'm kind of like, well,

2:59:18

I do believe we should be mostly focused on AI

2:59:20

risk, though not exclusively. And I am

2:59:23

glad that open film puts money in other things. But

2:59:25

you know, I do believe AI risk is like the biggest

2:59:27

headline, because of these crazy

2:59:29

historical events that could be upon us. I

2:59:32

disagree with these other people who say we should be in AI

2:59:34

risk because of this insight about the size of the future. Well,

2:59:36

that's awkward. And it's kind of a strange

2:59:38

state of affairs. And I haven't always known what to do with it.

2:59:41

But yeah, I do feel that the effect of altruism unity

2:59:43

has,

2:59:44

has kind of felt philosophy first, it's kind of

2:59:46

felt like our big insight is there's a lot

2:59:48

of people in the future. And then we've kind of

2:59:50

worked out the empirics and determined the biggest threat to

2:59:52

them is AI. And I just like

2:59:55

reversing it. I just like being like,

2:59:57

we are in an incredible historical period, no

2:59:59

matter what your philosophy.

Rate

Get this podcast via API

From The Podcast

80,000 Hours Podcast

Unusually in-depth conversations about the world's most pressing problems and what you can do to solve them. Subscribe by searching for '80000 Hours' wherever you get podcasts. Produced by Keiran Harris. Hosted by Rob Wiblin and Luisa Rodriguez.

Join Podchaser to...

Rate podcasts and episodes
Follow podcasts and creators
Create podcast and episode lists
& much more

Episode Tags

Do you host or manage this podcast?
Claim and edit this page to your liking.

,

Unlock more with Podchaser Pro

Audience Insights

Contact Information

Demographics

Charts

Sponsor History

and More!

Pro Features

Resources
Help Center
Blog
API

Podchaser is the ultimate destination for podcast data, search, and discovery. Learn More