Episode Transcript
0:00
I believe you can make the entire case for
0:02
being extremely concerned about AI, assuming
0:04
that AI will never be smarter than a human. Instead,
0:07
it will be as capable as the most
0:09
capable humans. And there will be a ton
0:12
of them because unlike humans, you
0:14
can just copy them. You can use your copies
0:16
to come up with ways to make it more efficient,
0:18
just like humans do. Then you can make more copies.
0:21
And when we talk about whether AI could defeat
0:23
humanity, and I've written one blog post on whether AI
0:25
could kind of like take over the world, they don't
0:28
have to be more capable than humans. They
0:30
could be equally capable and there could be more of them
0:32
that, that could really do it. That could really be enough
0:35
that then we wouldn't be, humans wouldn't be
0:37
in control of the world anymore. So I'm
0:39
basically generally happy
0:41
to just have all discussions about AI and what
0:43
the risks are just in this world where like,
0:46
there's nothing more capable than a human, but it's pretty scary
0:48
to have a lot of those that have different values
0:50
from humans that are kind of a second advanced species.
0:52
Um, that's not to rule out that, that some
0:54
of these super intelligence concerns could be real. It's just
0:56
like, they're not always necessary and they can sideline
0:59
people.
1:01
People always love an interview with Holden,
1:04
founder of Open Philanthropy. We last
1:06
spoke with Holden in 2021 about
1:08
his theory that we're plausibly living in the
1:10
most important century of all of those that are
1:12
yet to come. And today we discuss other
1:15
things that have been preoccupying him lately, including
1:17
what the real challenges are that are raised by
1:19
rapid advances in AI. Why not
1:21
just gradually solve those problems as they come up? What
1:24
multiple different groups are able to do about it, including
1:27
listeners to this show, governments, computer
1:29
security experts, journalists, and on and on.
1:32
What various different groups are getting wrong about
1:34
AI in Holden's opinion, how we might
1:36
just succeed with artificial intelligence
1:38
by sheer luck. Holden's four
1:41
different categories of useful work
1:43
to help with AI, plus a few random
1:45
audience questions as well. At the end, we
1:48
also talk through why Holden rejects
1:50
impartiality as a core principle
1:52
of morality and his
1:54
non-realist conception of why it is that he
1:56
bothers to try to help others at all. After
1:59
the interview.
1:59
I also respond to some reactions
2:02
we got to the previous interview with Ezra
2:04
Klein. Without further ado, I bring you Holden
2:06
Karnofsky.
2:18
Today I'm again speaking with Holden Karnofsky.
2:21
In 2007, Holden co-founded the charity evaluator
2:23
GiveWell, and then in 2014, he co-founded
2:26
the foundation Open Philanthropy, which works
2:28
to find the highest impact grant opportunities, and has
2:30
so far recommended around $2 billion in grants. But
2:33
in March 2023, Holden started
2:35
a leave of absence from Open Philanthropy to instead
2:37
explore working directly on AI safety and
2:39
ways of improving outcomes from recent advances
2:42
in AI. He also blogs at cold-takes.com
2:45
about futurism, quantitative macro history,
2:47
and epistemology, though recently he's had, again,
2:49
a particular focus on AI, writing posts like
2:51
How We Could Stumble Into AI Catastrophe, Racing
2:54
Through a Minefield, The AI Deployment Problem, and
2:56
Jobs That Can Help With the Most Important Century. I
2:58
should note that Open Philanthropy is 80,000 Hours' biggest
3:00
supporter, and that Holden's wife is Daniela
3:03
Amodei, the president of the AI lab Anthropic.
3:05
Thanks for returning to the podcast, Holden.
3:07
Thanks for having me. I hope to talk about
3:09
what you'd most like to see people do to positively
3:11
shape the development of AI, as well as your
3:13
reservations about utilitarianism. But first,
3:16
what are you working on at the moment, and why do you think it's important?
3:19
Sure. Currently on a leave of
3:21
absence from Open Philanthropy, just taking a little
3:23
time to explore different ways I might be able
3:25
to do direct work to reduce
3:27
potential risks from advanced AI. One
3:30
of the main things I've been trying to do recently, although
3:32
there's a couple things I'm looking into, is understanding
3:34
what it might look like to have AI safety
3:36
standards, by which I mean documenting
3:40
expectations that AI companies
3:42
and other labs won't build and deploy
3:44
AIs that pose too much risk to the world, as
3:47
evaluated by some sort of systematic evaluation
3:50
regime. These
3:52
expectations could be done via
3:54
self-regulation, or via regulation.
3:57
There's a lot of potential interrelated pieces. So
4:00
to make this work, I think you would need ways of
4:02
evaluating when an AI system is dangerous.
4:04
That's sometimes called evals. Then you would
4:06
need potential standards that would
4:09
basically talk about how you connect
4:11
what you're seeing from the evals with what kind of measures
4:13
you have to take to ensure safety.
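As a concrete illustration of that "connect the evals to the measures" idea, here is a deliberately toy sketch in Python. The capability names, thresholds, and measures are all invented for illustration; they are not drawn from any real lab's standard or from anything Holden describes beyond the general shape of the idea.

```python
# Hypothetical illustration only: a lookup from what a dangerous-capability
# eval reports to the measures a standard might require before deployment.
REQUIRED_MEASURES = {
    "no_dangerous_capabilities": {"standard pre-release review"},
    "autonomous_replication": {"third-party audit", "staged rollout", "enhanced monitoring"},
    "serious_misuse_uplift": {"do not deploy", "restrict access to model weights", "notify regulator"},
}

def measures_required(flagged_capabilities):
    """Union of the measures triggered by everything the evals flagged;
    anything unrecognized escalates to a human rather than silently passing."""
    required = set()
    for capability in flagged_capabilities:
        required |= REQUIRED_MEASURES.get(capability, {"escalate for manual review"})
    return required

print(measures_required(["autonomous_replication"]))
```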
4:16
Then also, to make this whole system work, you would need some
4:18
way of making the case to the general public that standards
4:20
are important, so companies are more likely
4:22
to adhere to them. And so things
4:25
I've been working on include trying to understand
4:27
what standards look like in more mature industries,
4:30
doing one case study and kind of trying to fund
4:32
some other ones there, trying to learn lessons
4:35
from what's already worked in the past elsewhere. I've
4:37
been advising groups like ARC Evals,
4:40
thinking about what evals and standards could look like.
4:43
And I've been trying to understand what pieces of the
4:45
case for standards are and are not resonating with people
4:47
so I can think about how to kind of increase
4:50
public support for this kind of thing. So
4:52
it's pretty exploratory right now, but that's been one of the
4:54
main things I've been thinking about. Yeah,
4:56
I guess there's a huge surface area on
4:58
ways one might attack this problem. Why the focus
5:01
on standards and evals in particular?
5:03
Yeah, sure. I mean, we can get to this, but
5:05
I've kind of been thinking a lot about what
5:08
the major risks are from advanced AI and what
5:10
the major ways of reducing them are. And,
5:13
you know, I think there's kind of a few different
5:15
components that seem most promising
5:17
to me as part of,
5:19
I don't know, most stories I can tell in my head for
5:21
how we get the risks down very far.
5:24
And this is the piece of the puzzle that
5:26
to me feels
5:27
like it has a lot of potential, but there isn't much
5:29
action on it right now and that someone
5:32
with my skills can potentially play a big role in
5:34
helping to get things off the ground, helping to spur
5:36
a bit more action, getting people
5:39
to, you know,
5:40
to just like move a little faster. I'm
5:42
not sure this is like what I want to be doing for
5:44
the long run, but I think it's in this kind of nascent
5:47
phase where my kind of background
5:49
with just starting up things in very vaguely
5:51
defined areas and getting to the point where they're a little bit more
5:53
mature is maybe helpful.
5:56
Yeah. How has it been no longer leading
5:58
an organization with tons of... employees
6:00
and being able to self-direct a little bit more. You
6:02
wouldn't have been in that situation for quite some time,
6:04
I guess.
6:05
Yeah, I mean, it was somewhat gradual.
6:07
So we've been talking for
6:10
several years now, I think, at least since 2018 or
6:12
so about succession at Open Philanthropy,
6:15
because I've always been a person who likes
6:17
to start things and likes to be in there in that nascent
6:19
phase and prefers always
6:21
to find someone else who can run an organization
6:24
for the long run. And we, you know, a couple
6:26
years ago, Alexander became co-CEO
6:29
with me and has taken on more and more duties.
6:31
So it wasn't an overnight thing. And
6:33
I'm still not completely uninvolved.
6:35
I mean, I'm
6:35
still on the board of Open Philanthropy. I'm
6:37
still meeting with people. I'm still advising, you
6:40
know, kind of similar to GiveWell. I had a very, very
6:43
gradual transition away from GiveWell and
6:45
still talk to them frequently. So
6:47
it's, you know, it's been a gradual thing. But for me, it
6:49
is an improvement. I think it's not my happy
6:52
place to be at the top of that org chart.
6:54
Yeah.
6:55
Okay, so today, we're
6:57
not going to rehash the basic arguments for worrying
6:59
about ways that AI advances
7:02
could go wrong, or ways that maybe this century
7:04
could turn out to be really unusually important.
7:06
I think, you know, AI risk, people have heard of it now.
7:09
We've done lots of episodes on it. I
7:12
guess people who wanted to hear your broader worldview
7:14
on this could go back to previous interviews, including
7:16
with you, such as episode 109, Holden Karnofsky on
7:18
the most important century. In
7:20
my mind, the risks are both
7:22
pretty near term, and I think increasingly kind of apparent.
7:25
So to me, it feels like the point in this
7:27
whole story where we need to get down a bit more to brass tacks
7:30
and start debating what is to be done and
7:32
figuring out what things might really help that we could
7:35
get moving on. That said, we should
7:37
take a minute to think about, you know, which aspect of
7:39
this broader problem are we talking about today? And which one
7:41
are you thinking about? Of course, there's risks
7:43
from misalignment. So AI models
7:46
completely flipping out and trying to take over would
7:48
be an extreme case of that. Then there's misuse, where
7:51
the models are doing what people are telling them to do,
7:53
but we wish that they weren't,
7:55
perhaps. And then I guess that there's other
7:57
risks like just speeding up history, causing a
8:00
whole lot of stuff to happen incredibly quickly, and perhaps that
8:02
leading us into disaster. Yeah, which
8:04
of the aspects of this broader problem do you
8:06
think of yourself as trying to contribute to solving right now?
8:09
Yeah, I mean, first off to your point, I mean, I
8:11
am happy to focus on solutions, but I do think
8:13
it can be hard to have a really good conversation on solutions
8:16
without having some kind of shared understanding of the problem.
8:18
And I think while a lot of people are getting
8:20
vaguely scared about AI, I
8:23
think there's still a lot of room to have, you
8:26
know, a lot of room to disagree on what exactly
8:28
the most important aspects of the problem are, what
8:30
exactly the biggest risks are. For
8:32
me, the two you named, misalignment and misuse,
8:35
are definitely big. I would throw some others
8:37
in there too that I think are also big.
8:39
I think, you know, we may
8:41
be on the cusp of having a lot
8:43
of things work really differently about the world, and
8:46
in particular having kind of what you might
8:48
think of as new life forms, whether that's AIs
8:50
or, you know, I've written in the past on Cold
8:52
Takes about digital people, that if we had
8:54
the right technology, which we might be able to develop with AIs
8:57
help, we might have kind of, you know, simulations
8:59
of humans that we ought to think of as kind of humans
9:02
like us. And that could lead to a lot
9:04
of challenges, you know, just the
9:06
fact, for example, that you could have human
9:08
rights abuses happening inside a computer. It
9:11
seems like a very strange situation that
9:13
society has not really dealt with before.
9:15
And I think there's a bunch of other things like that. What
9:18
kind of world do we have when someone can just make
9:21
copies of
9:22
people or of minds and
9:24
ensure that those copies believe certain things and
9:27
defend certain ideas that I think
9:29
could challenge the way a lot of our existing institutions
9:31
work. So there's a nice piece,
9:33
Propositions About Digital Minds, that I think is a flavor
9:35
for this. So I think there's a whole bunch of things I
9:38
would point to as important. I think out
9:41
of, you know, in this category, I think if I had
9:43
to name one risk that I'm most focused
9:45
on, it's probably the misaligned AI risk. It's probably
9:47
the one about, you know, you
9:50
kind of build these very capable, very
9:52
powerful
9:52
AI systems. They're these systems that
9:55
if, for whatever reason, they were pointed at bringing
9:57
down all of human civilization, they could. And
9:59
then something about your training is kind of
10:01
sloppy or leads to unintended
10:04
consequences so that you actually do have AIs
10:06
trying to bring down civilization. I think that
10:08
is probably the biggest one, but I think there's also
10:11
a meta threat that to
10:13
me is really the unifying catastrophic risk
10:15
of AI. And so for me that I
10:18
would abbreviate as just saying like explosively
10:20
fast progress. So the central
10:23
idea of the most important century series that I wrote
10:26
is that if you get an AI with
10:28
certain properties, there's a bunch of
10:29
reasons from economic theory and
10:32
from economic history. I think we're also putting
10:34
together some reasons now that you can take more from
10:36
the specifics of how AI works and
10:38
how algorithm development works to expect
10:41
that you could get a dramatic acceleration
10:43
in the rate of change and particularly in the
10:45
rate of scientific and technological advancement,
10:48
particularly in the rate of AI advancement itself so
10:50
that things move on a much faster timescale
10:53
than anyone is used to. And one of the central
10:55
things I say in the most important century series is that if
10:57
you imagine a wacky sci-fi
10:59
future, the kind of thing you would
11:02
imagine thousands of years from now for
11:04
humanity with all these wacky technologies, that
11:06
might actually be years or
11:09
months from the time when you
11:11
get in range of these super powerful AIs
11:13
that have certain properties. That to me is
11:15
the central problem. And I think all these other risks that
11:17
we're talking about, they wouldn't
11:20
have the same edge to them if it weren't for that.
11:22
So misaligned AI, if AI
11:25
systems got very gradually more powerful
11:27
and we spent a decade with systems that
11:29
were
11:29
kind of close to as capable
11:32
as humans, but not really, and then a decade with systems
11:34
that were about as capable as humans with some strengths and
11:36
weaknesses, then a decade of systems a little more
11:38
capable, I wouldn't really be that worried.
11:40
I feel like this is something we could kind of adapt to as we
11:42
went and figure out as we went along. Similarly,
11:45
with misuse, AI systems might
11:47
end up able to help develop
11:49
powerful technologies that are very scary, but that wouldn't
11:51
be as big a deal. It would be kind of a continuation
11:53
of history if this just
11:56
went in a gradual way. And my big concern
11:58
is that it's not gradual.
11:59
Maybe worth digging into that a little bit more: exactly
12:02
how fast do I mean and why, even though I have covered
12:04
it somewhat in the past, because that to me is really
12:06
the central issue. And one of the reasons I'm so
12:08
interested in AI safety standards is because
12:11
it is kind of, no matter what risk
12:13
you're worried about, I think you hopefully
12:15
should be able to get on board with the idea that you should measure
12:18
the risk and not unwittingly
12:21
deploy AI systems that are carrying a ton of the
12:23
risk before you've at least made a deliberate, informed
12:25
decision to do so. And I think if
12:28
we do that, we can anticipate a
12:29
lot of different risks and stop them from
12:32
coming at us too fast. Too fast is the central
12:34
theme for me.
12:35
Yeah. Yeah, it's very interesting framing
12:37
to put the speed of advancement like front
12:39
and center as this is kind of the key way that this
12:41
could go off the rails and in all sorts of different directions.
12:44
So, Eliezer Yudkowsky has this kind of classic story
12:46
about how you get an AI taking over the
12:48
world like remarkably quickly. And
12:50
a key part of the story as he tells it is this
12:52
sudden self-improvement loop where the AI
12:55
gets better at doing AI research and that improves itself
12:57
and then it's better at doing that again. And so you get
12:59
this recursive loop where suddenly you go from
13:01
somewhat human level intelligence to something that's very,
13:04
very, very superhuman. And
13:05
I think many people
13:06
reject that primarily because they reject the
13:09
speed idea that they think, yes, if you got
13:11
that level of advancement over a period of days, sure,
13:13
that might happen. But actually, I just don't expect
13:15
that recursive loop to be quite so
13:17
quick. And likewise, we might
13:19
worry that AI might be used by people to
13:22
make bioweapons, but if that's something that gradually came
13:24
online over a period of decades, we probably have all kinds of responses
13:26
that we could use to try to prevent that. But if it goes
13:28
from one week to the next, then we're in
13:30
a tricky spot. Do you want to expand on that? Is there maybe
13:33
insights that come out of this speed-focused
13:35
framing of the problem that people aren't taking
13:38
quite seriously enough?
13:39
Yeah, I should first say I don't know that I'm on the
13:41
same page as Eliezer. I can't totally always tell,
13:43
but I think he is picturing probably a more
13:46
extreme and faster thing than I'm picturing, and
13:48
probably for somewhat different reasons. I think
13:50
a common story in some corners of this
13:53
discourse is this idea of an AI that
13:56
it's this simple computer program and it rewrites
13:58
its own source code. And it's like the,
14:00
you know, that's where all the action is. I don't think
14:03
that's exactly the picture I have in mind, although there's
14:05
some similarities. And so that, you know, the
14:07
kind of thing I'm picturing is maybe more like a months
14:09
or years time period from getting sort
14:12
of near human level AI
14:14
systems and what that means is definitely debatable
14:16
and gets messy, but near human level
14:19
AI systems to just like very, very powerful
14:21
ones that are advancing science and technology
14:24
really fast. And then science and technology,
14:26
like at least on certain fronts that are not, that
14:29
are the less bottlenecked
14:29
fronts, and we could talk about bottlenecks in a minute,
14:32
you get like a huge jump. So I
14:34
think my view is at least somewhat more moderate
14:36
than Eliezer's and at least has somewhat different dynamics.
14:39
But I think there is, you know, both
14:42
points of view are talking about this rapid change. I
14:44
think without the rapid change,
14:46
A, things are a lot less scary generally. B,
14:48
I think it is harder to justify a lot
14:51
of the stuff that AI concerned people do to try and
14:53
get out ahead of the problem and think about things in advance,
14:55
because I think a lot of people sort of complain
14:57
with this discourse that it's really hard to know the
14:59
future and all this stuff we're talking about what future
15:02
AI systems are going to do, what we have to do about
15:04
it today. It's very hard to get that right. It's
15:06
very hard to anticipate what things will be like in
15:08
an unfamiliar future. And I think when people
15:10
complain about that stuff, I'm just like very sympathetic.
15:12
I think that's like,
15:13
right. And if I thought that
15:15
we had the option to adapt to everything
15:18
as it happens, I think I would in many
15:20
ways be tempted to just work on other problems
15:22
and then kind of in fact adapt to things as they happen
15:24
and see what's happening and see what's most needed. And
15:27
so I think a lot of the case for planning
15:29
things out in advance, trying to tell stories of
15:32
what might happen, trying to figure out
15:34
what kind of regime we're going to want and put the pieces
15:36
in place today, trying to figure out what kind of research
15:38
challenges are going to be hard and doing them today. I think a
15:40
lot of the case for that stuff being so important does
15:43
rely on this theory that things could move
15:45
a lot faster than anyone is expecting. I
15:48
am in fact very sympathetic to people who would rather
15:50
just adapt to things as they go. That's
15:52
the right way to do things. And I think
15:55
many attempts to anticipate future problems
15:57
are things I'm just not that interested in, because
15:59
of this issue.
15:59
But I think AI is a place where we have to
16:02
take the explosive progress things seriously
16:04
enough that we should be doing our best to prepare for it. Yeah,
16:07
I guess if you have this explosive growth, then
16:10
the very strange things that we might be
16:12
trying to prepare for might be happening in 2027 or incredibly soon. Something
16:16
like that, yeah. Yeah, it's imaginable, right?
16:18
And it's all extremely uncertain
16:20
because we don't know. In my
16:22
head, a lot of it is like there's a set
16:24
of properties that an AI system could have, roughly
16:27
being able to do roughly everything humans are
16:29
able to do to advance science and technology or
16:31
at least able to advance AI research. We
16:33
don't know when we'll have that. And so it's like, you
16:35
know, one possibility
16:36
is we're like 30 years away from that.
16:38
But once we get near that, things will
16:41
move incredibly fast. And that's a world
16:43
we could be in. We could also be in a world where we're only a few years from
16:45
that, and then everything's going to get much crazier than anyone
16:47
thinks, much faster than anyone thinks. Yeah,
16:49
I guess one narrative is that
16:51
it's going to be exceedingly difficult to
16:54
align any artificial intelligence because you
16:56
know, you have to solve these 10 technical problems
16:58
that we've almost gotten no traction on so far.
17:01
So it just, you know, takes decades
17:03
or centuries in order to fix them.
17:05
On this speed focused narrative,
17:07
it actually seems a little bit more positive because you
17:09
might be saying,
17:10
it might turn out that from a technical standpoint,
17:12
this isn't that challenging. The problem will be
17:15
that things are going to run so quickly that we might only
17:17
have a few months to figure out how, like
17:20
what solution we're choosing and actually try
17:22
to apply it in practice. But of course, in
17:24
as much as we just need to slow down, that
17:26
is something that in theory, at least people could agree
17:28
and actually just try to coordinate
17:31
in order to do. Do you think that that is going to be a
17:33
part of the package that we ideally just want to
17:35
coordinate people as much as possible to make this
17:37
as gradual as feasible? Well,
17:39
these are separate points. So I think
17:41
you could believe in the speed and also
17:43
believe the alignment problem is really hard. Believing
17:46
in speed doesn't make the alignment problem any easier.
17:48
And I think that the speed point is really just
17:50
bad news. I think it's just, you know, I
17:52
hope things don't move that fast. If
17:55
things move that fast, I think most human
17:57
institutions' ways of reacting to things, we just
17:59
can't count on them to work the way they normally do and
18:01
so we're gonna have to do our best to get out ahead of things
18:03
and and plan ahead and make things better in
18:05
advance as much as we can and it's mostly just
18:08
bad news. There's a separate thing, which is that, yeah, I
18:10
do feel
18:12
less convinced
18:14
than some other people that the alignment
18:16
problem is this like incredibly hard technical problem
18:19
and more feel like yeah if we did have
18:21
a relatively gradual set of developments I think
18:23
we'd have a good chance. I think even with very
18:25
fast developments, I think there's a good chance we just get lucky
18:27
and we're fine. So I think they're two different
18:30
axes. I know you talked with Tom Davidson about
18:32
this a bunch so I don't want to make it like the main theme of the
18:34
episode but I do think like in case someone
18:36
hasn't listened to every 80K podcast ever, just
18:38
just getting a little more into the why of
18:41
why you get such an explosive growth and why not
18:43
I think this is a really key premise and
18:45
I think right most of the rest of what I'm saying doesn't
18:47
make much sense without it and I want to own that
18:50
yeah I don't want to lose out on the fact that I
18:52
am sympathetic to a lot of reservations about
18:54
working on AI risk so yeah maybe
18:56
it would be good to cover that a bit.
18:58
Yeah, let's do that. So one
19:00
obvious mechanism by which things could speed up is that you have
19:02
this positive feedback loop where the AIs get better
19:04
at improving themselves. Is there much more
19:06
to the story than that?
19:09
Yeah, I mean, I think it's worth rehashing
19:11
that briefly I mean I think one observation
19:13
I think is interesting and this is a you know report by
19:15
David Roodman for Open Philanthropy that goes through this
19:18
one thing that I've wondered is just like if you take
19:20
the path of world economic growth throughout
19:22
history and you just kind of extrapolated
19:24
forward in the simplest way you can what do you get
19:27
and it's like well it depends what time period you're looking
19:29
at if you look at economic history since 1900 or 1950 we've had
19:31
a few percent per year growth over that entire
19:34
time and if you extrapolate it forward you get a few
19:36
percent per year growth and you
19:38
just get the world everyone is kind of already expecting and
19:40
the world that's in the UN projections all that stuff the
19:43
interesting thing is if you zoom out and
19:45
look at all of economic history you
19:47
see that economic progress for most of
19:49
history not not recently has been accelerating
19:53
and if you try to model that acceleration in a
19:55
simple way and project that out in a simple way you
19:57
get basically the economy going to infinite
19:59
size sometime this century, which is like a wild
20:03
thing to get from a simple extrapolation.
20:06
I think the question is like, why is that and why might
20:08
it not be? I think the
20:10
basic framework I have in mind is like, there is
20:13
a feedback loop you can have in the economy where
20:15
people have new ideas, new
20:18
innovations that makes them more productive. Once
20:20
they're more productive, they have more resources. Once
20:22
they have more resources, they have more kids.
20:24
There's more population or there's fewer deaths and there's more
20:27
people. So it goes more people,
20:29
more ideas, more resources, more
20:31
people, more ideas, more resources. When
20:34
you have that kind of feedback loop in place, any economic
20:36
theory will tell you that you get what's called super
20:38
exponential growth, which is growth that's accelerating.
20:41
It's accelerating on an exponential basis and
20:43
that kind of growth is very explosive,
20:46
is very hard to track and can go to infinity in finite
20:48
time. The thing that changed a couple hundred
20:50
years ago is that one piece of that feedback loop
20:53
stopped for humanity. People
20:55
basically stopped turning more resources into more
20:57
people. So right now, when people get richer,
20:59
they don't have more kids, they just get
21:01
richer. Buy another car. Yeah, exactly. And
21:04
so that feedback loop kind of broke. That's
21:07
not like a bad thing that it broke, I don't think, but it
21:09
kind of broke. And so we've had
21:11
just like what's called normal exponential growth, which
21:13
is still fast, but which is not the same thing,
21:15
doesn't have the same explosiveness to it. And
21:18
the thing that I think is interesting and different about
21:20
AI is that if you get
21:23
AI to the point where it's doing the same thing
21:25
humans do to have new ideas, to
21:27
improve productivity, so this is like the science
21:29
and invention part, then you can
21:31
turn resources into AIs
21:34
in this very simple linear way that
21:36
you can't do with humans. And so you
21:38
could get an AI feedback loop and
21:41
just to be a little more specific about what it might look like,
21:43
right now AI systems
21:46
are getting a lot more efficient. You can do a lot more with
21:48
the same amount of compute than you could 10 years ago, actually
21:51
a dramatic amount more. I think, from
21:53
various measurements of this, it's something like you can get the same
21:55
performance for something like 18X or 15X
21:58
less compute compared to like a few years
21:59
ago, maybe a decade ago.
22:01
Why is that? And it's because there's a bunch of human
22:03
beings who have worked on making AI algorithms
22:06
more efficient. So to me, the big scary
22:08
thing is when you have an AI that does
22:11
whatever those human beings were doing. And there's
22:13
no particular reason you couldn't have that because what those human
22:15
beings were doing, as far as I know, was mostly kind
22:17
of like sitting at computers, thinking of stuff,
22:19
trying stuff. There's no particular reason
22:21
you couldn't automate that. Once you automate that,
22:23
here's the scary thing. You have
22:25
a bunch of AIs. You use those AIs
22:28
to come up with new ideas to make your AIs more efficient.
22:30
Then let's say that you make your AIs
22:33
twice as efficient. Well, now you have twice as many AIs.
22:36
And so if having twice as many AIs can make your
22:38
AIs twice as efficient, again, there's really no
22:40
telling where that ends. And Tom Davidson
22:42
did a bunch of analysis of this, and I'm still
22:44
kind of poking at and thinking about it, but I think there's at least a decent
22:46
chance that that is the kind of thing that leads to
22:49
explosive progress where AI
22:51
could really take off and get very capable, very
22:53
fast. And you can extend that somewhat to other
22:56
areas of science. And it's like, some
22:59
of this will be bottlenecked. Some of this will be like, you
23:01
can only move so fast because you have to do a bunch of experiments
23:04
in the real world. You have to build a bunch of stuff. And
23:06
I think some of it will only be a little bottlenecked or will only
23:08
be somewhat bottlenecked. And I think there are some feedback loops
23:11
just kind of going from you get more
23:13
money, you're able to kind of quickly
23:16
with automated factories build more stuff
23:18
like solar panels, you get more energy,
23:20
and then you get more money, and then you're able
23:22
to do that again. And it's like in that loop, you
23:24
have this part where you're making everything more efficient all the time.
23:26
And I'm not going into all the details
23:29
here. It's been gone into more detail in my blog
23:31
post, The Most Important Century and Tom Davidson in
23:33
his podcast and continues to think about it. But
23:37
that's the basic model is that you
23:39
have this feedback loop that we have observed in history that
23:41
doesn't work for humans right now, but could work for AIs,
23:44
where you have AIs, have ideas
23:47
in some sense, make things more efficient. When
23:49
things get more efficient, you have more AIs. That creates
23:51
a feedback loop. That's where you get your super exponential growth.
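To make that contrast concrete, here is a minimal toy sketch in Python. It is not Roodman's model or Davidson's, and the numbers (a 3% baseline rate, a 1.2 exponent standing in for the feedback loop) are arbitrary. The only point is that when the growth rate itself rises with the size of the system, you get the accelerating, infinity-in-finite-time behavior described above, whereas a fixed growth rate just gives ordinary exponential growth.

```python
# Toy comparison: ordinary exponential growth vs. growth whose rate climbs with
# the size of the system (a stand-in for the more-minds, more-ideas, more-resources
# feedback loop). Illustrative only; the 0.03 and 1.2 are arbitrary.

def simulate(years, steps_per_year=100, feedback=False):
    r = 0.03                      # ~3% per year baseline growth rate
    y = 1.0                       # output, in arbitrary units
    yearly = [y]
    dt = 1.0 / steps_per_year
    for step in range(years * steps_per_year):
        rate = r * (y ** 1.2 if feedback else y)   # with feedback, the rate itself grows
        y += rate * dt
        if (step + 1) % steps_per_year == 0:
            yearly.append(y)
        if y > 1e12:              # treat this as "effectively infinite" and stop
            break
    return yearly

plain = simulate(200)
loop = simulate(200, feedback=True)
print(f"Plain 3%/yr growth after 200 years: about {plain[-1]:,.0f}x the starting size")
print(f"With the feedback loop, output blows past 1e12x around year {len(loop)}")
```

Whether anything like that exponent actually holds in the real world is exactly the bottlenecks question Holden flags above.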
23:54
Yeah, so one way of describing
23:56
this is to say that the
23:58
artificial intelligence becomes more
24:00
intelligent and that makes it more capable of improving
24:03
its intelligence and so it becomes super, super smart.
24:05
But I guess the way that you're telling it emphasizes
24:08
a different aspect, which is not so much that it's becoming
24:10
super smart, but that it is becoming super numerous
24:12
or that you can get effectively a population explosion.
24:15
And I think some people are skeptical of this super
24:17
intelligent story because they think you get really declining
24:19
returns to being smarter and that there's like some ways
24:21
in which it just doesn't matter how smart you
24:23
are, the world's too unpredictable, say, for you
24:26
to come up with a great plan. But
24:28
this is a different way by which you can get the same outcome,
24:30
which is just that you
24:32
have this enormous increase in the number of thoughts
24:35
that are occurring on computer chips,
24:37
more or less. And at some point, you know, 99% of
24:39
the thoughts that are happening on Earth could basically be
24:42
occurring inside artificial intelligences.
24:45
And then as they get better and they're able to make more chips
24:47
more quickly, again, you basically just
24:49
get the population explosion.
24:51
Yeah, that's exactly right. And I think this is
24:53
a place where I think some people get a little bit rabbit holed
24:55
on the AI debates because I think there's
24:58
a lot of room to debate how big
25:00
a deal it is to have something that's quote unquote extremely
25:03
smart or super intelligent or much smarter than human. And
25:06
it's like, OK, maybe if you had like something
25:08
that was kind of like a giant brain or something and
25:10
way smarter, whatever that means, than us,
25:13
maybe what that would mean is that it would like instantly see
25:15
how to make all these super weapons and conquer
25:17
the world and how to convince us of anything. And there's
25:19
all this stuff that that could mean and people debate
25:21
whether it could mean that. But it's uncertain.
25:24
It's a lot less uncertain. If you're finding yourself skeptical
25:27
of what this smart idea means
25:29
and where it's going to go and what you can do with it. If you
25:31
find yourself skeptical of that, then just forget about it. And
25:33
just I believe you can make the entire
25:36
case for being extremely concerned about AI,
25:38
assuming that AI will never be smarter
25:40
than a human. Instead it will be as
25:43
capable as the most capable humans. And
25:45
there will be a ton of them because unlike
25:47
humans, you can just copy them. You
25:50
can copy them. You can use your copies to
25:52
come up with ways to make it more efficient,
25:54
just like humans do. Then you can
25:56
make more copies. And when we talk about whether
25:59
AI could defeat humanity, and I've written one blog post
26:01
on whether AI can kind of like take over the world, they
26:03
don't have to be more capable than humans.
26:06
They could be equally capable and there could be more of
26:08
them. That could really do it. That could really
26:10
be enough that then we wouldn't be,
26:12
humans wouldn't be in control of the world anymore. So
26:15
I'm basically generally happy
26:17
to just have all discussions about AI and
26:19
what the risks are just in this world where
26:21
like there's nothing more capable than a human, but
26:23
it's pretty scary to have a lot of those that have
26:25
different values from humans that are kind of a second advanced
26:28
species. That's not to rule out that
26:30
some of these super intelligence concerns could be real. It's
26:32
just like they're not always necessary and they can
26:34
sideline people.
26:36
Yeah, you can just get beaten by force
26:38
of numbers more or less. I think it's a little bit of a shame
26:40
that this sheer numbers argument
26:42
hasn't really been made very much. It
26:44
feels like the super intelligence story has been
26:46
very dominant in the narrative and media
26:49
and yet many people get off the boat because they're
26:51
skeptical of this intelligence thing. I think
26:54
it kind of is the fault of me and maybe people who've
26:56
been trying to raise the alarm about this because the focus really
26:58
has been on the super intelligence aspect rather than the super
27:01
numerousness that you could
27:03
get. Yeah, I don't know. I mean, I think there's valid concerns
27:05
like from
27:06
that angle for sure. And I'm not trying to
27:08
dismiss it, but I do. I
27:10
think there's a lot of uncertainty about what
27:12
super intelligence means and where it could go. And
27:14
I think you can raise a lot of these concerns without needing to have
27:16
a subtle view there. So
27:19
yeah, with Ajeya Cotra and Rohin Shah,
27:21
I found it really instructive to hear from them
27:24
about what are some kind of common opinions that
27:26
they don't share or maybe even just regard
27:28
as misunderstandings. Yeah, so maybe
27:30
let's go through a couple of those to help maybe situate you in the space
27:33
of ideas here. What's a common opinion
27:35
among kind of the community
27:36
of people working to address
27:38
AI risk that you personally don't share?
27:40
Yeah, I mean, I don't know. I think one kind of vibe I
27:42
pick up and I don't always have the exact quote
27:45
of whoever said what, but a vibe I pick up is this
27:48
kind of framework that kind of says, you know, if we
27:50
don't align our AIs, we're all going
27:52
to die. And if we can align our AIs,
27:54
that's great. And we've solved the problem. And that's
27:56
the problem we should be thinking about. And there's nothing
27:58
else really worth worrying about. You
28:00
know, it's kind of like alignment is the whole game would be
28:02
the hypothesis. And I disagree
28:04
with both ends of that, but especially the
28:06
latter so to take the first end would be
28:09
like, you know, if we don't align AI,
28:11
we're all dead I mean first off I just think
28:13
it's like really unclear,
28:15
even in the worst case where you get an AI that
28:17
has like its own values And there's
28:19
a huge number of them and they kind of team up and take
28:21
over the world Even then it's like really unclear
28:24
if that means we all die. I think there's, like, I
29:26
know there's debates about this. I have tried
29:28
to understand them. I know that the MIRI
29:31
folks, I think, feel really strongly that clearly we all die.
29:33
I've tried to understand where they're coming from, and I have not.
28:36
I think a key point is just, you know, it could
28:38
be very very cheap as a percentage
28:40
of resources for example to
28:43
let humans have a nice life on earth and
28:45
not expand further and be cut off
28:47
in certain ways from threatening, you know, the AI's
28:49
ability to do what it wants that could be very
28:51
cheap compared to wiping us all out And there could
28:53
be a bunch of reasons one might want to do
28:55
that Some of them kind of wacky some
28:57
of them kind of you know, well, maybe You
29:00
know maybe in another part of the universe There's kind
29:03
of someone like the AI that
29:05
was trying to design its own AI
29:07
and that thing ended up with values like the humans
29:09
And you know, maybe there's some kind of trade that could be
29:11
made using, like, acausal trade. And we don't need
29:14
to get into what all this means But it's like you don't
29:16
need much. The thing is, you don't need... or, like,
29:18
maybe the AI is actually being simulated by humans Or
29:20
something or by some smarter version of humans or some more
29:22
powerful version of humans and being tested
29:24
to see if it'll wipe out the humans or be nice to them. It's like, you
29:27
don't need a lot of reasons, you know to
29:29
kind of like leave one planet out if you're kind
29:31
of expanding throughout the galaxy So that would
29:33
be one thing: it's just, like, I don't know, it's kind of uncertain
29:35
what happens even in the worst case and then
29:37
there's like I do think there's a bunch of in-between
29:40
cases where we kind of have AIs
29:42
that are, like, they're
29:44
sort of aligned with humans. Like, if you
29:46
think about an analogy that often comes
29:48
up, it's like humans and natural selection, where humans
29:51
kind of were put under pressure by natural selection
29:54
to have lots of kids or to, you know, do inclusive
29:56
reproductive fitness. And we've kind of...
29:58
okay, we invented birth control, and a
30:00
lot of times we don't have as many kids as
30:03
we could and stuff like that. But also humans still
30:05
have kids and love having kids. A lot of humans
30:07
have 20 different reasons to have kids and
30:10
after a lot of the original ones have been knocked out
30:12
by weird technologies, they still find some other reason to have kids.
30:15
I don't know, I found myself one day wanting
30:17
kids and had no idea why and invented
30:20
all these weird reasons. I don't know,
30:23
it's not that odd to think that you could
30:25
have AI systems that just kind of like, yeah,
30:27
they're pretty off kilter from what we were trying to make
30:30
them do, but it's not like they're doing something completely
30:32
unrelated either. It's not like they have no drives
30:34
to do a bunch of stuff related to the stuff we wanted them
30:36
to do. Then you could also just have situations
30:39
where, especially in
30:41
the early stages of all this, where you might have
30:43
kind of near human level AIs and
30:45
so they might have goals of their own but they might not
30:48
be able to coordinate very well or they might not be able
30:50
to reliably overcome humans so
30:52
they might end up cooperating with humans a lot. We might
30:54
be able to leverage that into kind of
30:57
having AI allies that help us build
30:59
other AI allies that are more powerful so we might
31:01
be able to stay in the game for a long way. I don't know,
31:03
I just think things could be very complicated.
31:05
It doesn't feel to me like if you
31:07
screw up a little bit with the alignment problem then we all die.
31:11
The other part, "if we do align
31:13
the AI, we're fine," I disagree with much more
31:15
strongly. I just think- All right.
31:18
More strongly than that. Okay, yeah, yeah, yeah. Go
31:20
for it. The first one, I mean, look, I think it would
31:22
be really bad to have misaligned AI and
31:24
I think despite the fact that I feel it is fairly
31:27
overrated in some circles, I still think it's like the
31:30
number one thing for me. Just
31:32
like the single biggest issue in AI is
31:34
just like we're building these potentially
31:37
very powerful, very replicable, very numerous
31:39
systems and we're building them in ways we don't
31:41
have much insight into whether
31:43
they have goals, what the goals would be. We're
31:46
kind of introducing the second advanced species onto
31:48
the planet that we don't understand and if that
31:50
advanced species becomes more numerous and or more
31:52
capable than us, we don't have a great
31:54
argument to think that's going to be good for us. I'm
31:57
on board with alignment risk being,
31:59
I don't know, the number one
31:59
thing, not the only thing, the number one thing. But
32:02
I would say, if you just assume that
32:04
you have a world of very capable
32:07
AIs that are doing exactly what humans
32:09
want them to do, yeah, that's very
32:11
scary. And I think if that was the world we knew we were going
32:13
to be in, I would still be totally full-time on AI
32:16
and still feel that we had so much work to do and we
32:18
were so not ready for what was coming.
32:20
Certainly, there's the fact that because
32:23
of the speed at which things move, you could
32:25
end up with whoever kind of leads the way
32:27
on AI or is least cautious having
32:30
a lot of power. And that could be someone really bad.
32:33
And I don't think we should assume that just because
32:35
that if you had some head of state
32:37
that has really bad values, I don't think we
32:39
should assume that that person is going to end up being nice
32:42
after they become wealthy or powerful
32:44
or transhuman or mind uploaded
32:46
or whatever. I don't think there's really any reason
32:48
to think we should assume that. And then I think
32:50
there's just a bunch of other things that if things are moving fast,
32:53
we could end up in a really bad state. Like, are
32:55
we going to come up with decent frameworks
32:58
for making sure that digital
33:01
minds are not mistreated? Are we going to come up with
33:03
decent frameworks for kind of like how
33:06
to ensure that as we get the ability to
33:08
create whatever minds we want, we're using that
33:10
to create minds and help us seek the truth
33:12
instead of create minds that have whatever beliefs
33:14
we want them to have, stick to those beliefs,
33:17
and try to shape the world around those beliefs. I
33:19
think Carl Schulman put it as are
33:21
we going to have AI that makes us wiser
33:23
or more powerfully insane? So
33:25
I think there's just a lot. I
33:27
think we're kind of on the cusp of something that
33:30
is just potentially really big, really
33:32
world changing, really transformative and going to
33:34
move way too fast. And I think even
33:36
if we threw out the misalignment problem, we'd have a lot of work
33:38
to do. And I think a lot of these issues are actually not getting
33:40
enough attention.
33:41
Yeah. I think some of what might be going on there
33:44
is a bit of equivocation in the
33:46
word alignment. So you can imagine some
33:48
people might mean by creating an aligned
33:50
AI, it's like an AI that kind of goes and does what you tell
33:52
it to, like a good employee or something. Whereas
33:54
other people mean an AI that is following
33:56
the correct ideal values and behaviors and
33:58
is going to work to generate the best outcome.
34:01
And these are really quite separate
34:03
things, although very far apart. Yeah.
34:06
Well, the second one, I don't
34:08
even know if that's a thing. I don't even
34:10
really know what it's supposed to be. I mean, there's something a little
34:12
bit in between, which is like,
34:14
you can have an AI that you ask
34:16
it to do something, and it does what you would
34:18
have told it to do if you had been more informed
34:20
and if you knew everything it knows. That's the
34:23
central idea of alignment that I tend to think of, but
34:25
I think that still has all the problems I'm talking about. Some
34:28
humans seriously do intend
34:31
to do things that are really nasty and seriously
34:33
do not intend in any way, even if they knew more,
34:35
to make the world as nice as we would like it to be. And
34:38
some humans really do intend and
34:40
really do mean and really will want to
34:42
say, right now I have these values. Let's
34:44
say this is the religion I follow. This is what I
34:46
believe in. This is what I care about. And I am creating
34:49
an AI to help me promote that religion, not to help me question
34:51
it or revise it or make it better. So yeah, I think it's
34:53
that middle one, I think it does not make it safe. There
34:55
might be some extreme version that's like,
34:57
an AI that just figures out what's objectively best
34:59
for the world and does that or something. And I'm just
35:01
like, I don't know why I would, I don't know why you would think that would even be a
35:03
thing to aim for. That's not the alignment problem that I'm interested
35:06
in having solved, yeah. Yeah. Okay,
35:08
what's something that some kind of safety
35:11
focused folks that you potentially collaborate
35:13
with or at least talk to, but they think
35:15
that they know, which you think, in fact,
35:18
just nobody knows?
35:20
Yeah, I mean, I think in general, there's
35:23
this kind of question in deep
35:25
learning of, you train an agent
35:27
on one distribution of data or
35:30
reward signals or whatever. And now you're
35:32
wondering when it goes out of distribution,
35:34
when it hits a new kind of environment
35:36
or a new set of data, how it's gonna react to that. So
35:38
this would be like, how does an AI generalize
35:41
from training to out of distribution?
35:44
And I think in general, people have
35:47
a lot of trouble understanding this and
35:49
have a lot of trouble predicting this. And I think that's not
35:51
controversial. I think that's known, but I think
35:53
it kind of comes down to, or it
35:55
relates to some things where people do seem overly
35:57
confident. A lot of what people are doing right
35:59
now. with these AI models is they're doing what's called reinforcement
36:02
learning on human feedback or from human
36:04
feedback, where the basic idea is you
36:06
have an AI that tries something and then
36:08
a human says, that was great or that wasn't so
36:10
good, or maybe the AI tries two
36:13
things, the human says which one was better.
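As an aside, that pairwise-comparison setup is, at bottom, fitting a reward model to human preference data. Here is a minimal, generic sketch of the standard logistic (Bradley-Terry-style) preference loss, not any particular lab's implementation; the features, the raters' hidden criterion, and all the numbers are synthetic stand-ins.

```python
# Minimal sketch: fit a linear "reward model" to pairwise human preferences.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16                          # pretend each response is a 16-dim feature vector
w = np.zeros(DIM)                 # reward model parameters, r(x) = w . x
true_pref = rng.normal(size=DIM)  # synthetic stand-in for what raters actually respond to
lr = 0.1

for _ in range(2000):
    a, b = rng.normal(size=DIM), rng.normal(size=DIM)
    # The rater labels whichever response scores higher on their hidden criterion.
    chosen, rejected = (a, b) if true_pref @ a > true_pref @ b else (b, a)
    # Loss is -log sigmoid(r(chosen) - r(rejected)); the update below is one gradient step.
    p = 1.0 / (1.0 + np.exp(-(w @ chosen - w @ rejected)))
    w += lr * (1.0 - p) * (chosen - rejected)

# The learned reward model ends up pointing roughly at whatever the raters were
# responding to; whether that is the thing you actually wanted is the open
# question raised next.
cosine = (w @ true_pref) / (np.linalg.norm(w) * np.linalg.norm(true_pref))
print(f"agreement between learned reward model and raters' hidden criterion: {cosine:.2f}")
```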
36:16
And if you do that, and that's a major way
36:18
you're training your AI system, there's
36:20
this question of what do you get as a result
36:23
of that? Do you get an AI system
36:25
that is actually doing what humans want
36:28
it to do? Do you get an AI system that's doing what humans
36:30
would want it to do if they kind of knew all the facts?
36:33
Do you get an AI system that is like tricking
36:35
humans into thinking it did what they wanted it
36:37
to do? Do you get an AI system that's sort of trying
36:39
to maximize paperclips? And one way to do that is to
36:41
do what humans want it to do. So as far as you can tell, it's
36:43
doing what you want it to do, but it's actually trying to maximize paperclips.
36:46
Like, which of those do you get? And I think just like
36:48
people don't, like we don't know, and
36:50
I see overconfidence on both sides
36:52
of this. I think I see people saying, we're
36:54
going to basically train this thing to
36:56
do nice things and it'll keep doing nice things as
36:59
it operates in an increasingly changing world. And
37:01
then I see people saying, we're going to train AI to do nice
37:03
things. And it will basically pick up on
37:05
some weird correlate of our training
37:08
and try to maximize that and will
37:10
not actually be nice. And I'm just like, geez,
37:13
we don't know. We don't know. And
37:15
there's arguments that say, oh, wouldn't it be weird if it
37:17
came out doing exactly what we wanted it to do? Because
37:19
there's this wide space of other things
37:21
that it could generalize to. I just think those arguments
37:23
are just kind of weak and they're not very well fleshed out. There's
37:26
genuinely just a ton of vagueness
37:28
and not good understanding of what's
37:31
going on in a neural net and how
37:33
it generalizes from this kind of training. So
37:35
the upshot of that is I think people are
37:38
often just overconfident that AI
37:40
alignment is going to be easy or hard. I think there's people
37:42
who think, we basically got the right framework, we
37:44
ought to debug it. And there's people who
37:46
think this framework is doomed, it's not going to work, we need something better. And
37:49
I just don't think either is right. I think if
37:51
we just go on the default course and we
37:53
just kind of train AIs based
37:55
on what looks nice to us, that could totally
37:57
go fine and it could totally go disastrously.
37:59
There are weirdly few people who
38:02
believe both those things; a lot of people
38:04
seem to be overconfidently in one camp or
38:06
the other on that. Yeah, I'm completely with you
38:08
on this one. I think it's one of the things that I've started to
38:10
believe more over the last six months. No
38:13
one really has super strong arguments for what kind
38:15
of motivational architecture these ML
38:18
systems are going to develop. Yeah. Well,
38:20
I suppose that's maybe an improvement relative to where it was before, because
38:22
I was a little bit more on the doomer side before.
38:26
I feel like this one, there should be some empirical way
38:28
of investigating this. People do have
38:30
good points that a really super intelligent
38:33
and incredibly sneaky model will behave exactly
38:35
the same regardless of its underlying
38:37
motive. But you could try to investigate this on less
38:39
intelligent, less crafty models, and surely you would
38:42
be able to get some insight into the way it's thinking
38:44
and how its motives actually cash
38:46
out. I
38:47
think it's really hard. It's
38:50
just like really, I mean, it's really
38:52
hard to make generalized statements about how
38:55
an AI generalizes to weird new
38:57
situations. And
38:59
yeah, there is work going on trying
39:01
to understand this, but it's going to be, it's just
39:04
been hard to get anything that feels satisfyingly analogous
39:06
to the problem we care about right now with AI
39:08
systems and their current capabilities. And even once
39:10
we do, I think there'll be plenty of arguments that are just like,
39:12
well, once the AI systems are more capable than that, everything's
39:15
going to change. And AI will generalize
39:17
differently when it understands who we
39:19
are and what our training is and how that works
39:21
and how the world works. And it understands
39:23
that it could take over the world if it wanted to. That
39:26
actually could cause an AI to generalize differently.
39:28
So as an example, this is something
39:30
I've written about on Cold Takes. I call it the King
39:33
Lear problem. So King Lear is a Shakespearean
39:35
character who kind of has three daughters
39:37
and he asks them each to describe their love for him. And
39:40
then he kind of like hands the kingdom over to the ones
39:42
that he feels good about after hearing their speeches and
39:44
he just picks wrong. And then that's too bad
39:47
for him. And the issue is
39:50
it's like it flips on a dime. It's like the two
39:52
daughters who are like the more evil ones were
39:54
doing a much better job pretending they loved him
39:56
a ton because they knew that they
39:58
didn't have power yet and wanted power later. So
40:00
it actually like their behavior depended
40:03
on their calculation of what was going to happen. And
40:05
so the analogy to AIs, it's kind of like you
40:07
might have an AI system that's like kind of maybe
40:10
what its motivational system is, is it's
40:12
trying to maximize
40:14
the number of humans that are saying, hey, good job,
40:17
this is obviously a bit of a simplification or dramatization.
40:20
And it kind of is understanding at all
40:22
points, that if it could
40:25
take over the world, enslave all the humans,
40:27
make a bunch of clones of them and like run
40:29
them all in loops saying good job. If it could,
40:32
then it would and it should. But
40:34
if it can't, then maybe it should just cooperate
40:36
with us and be nice. You can have an AI system
40:38
that's like running that whole calc and humans often run
40:40
that whole calc like right as a kid
40:43
in school, I might often be thinking, well, you
40:45
know, if I can get away with this, then this is
40:47
what I want to do. If I can't get away with this, maybe
40:49
I'll just do what the teacher wants me to do. So you could
40:51
have the AIs with that whole motivational system. And then it's
40:53
like, cool. So now it's like you put them in a test
40:55
environment and you test if they're going to be nice
40:58
or try and take over the world. But in the test
41:00
environment, they can't take over the world, so they're going to be nice.
41:03
Now you're like, great, this thing is safe. You put it out in the world.
41:05
Now there's a zillion of it running around. Well, now it can take over the
41:07
world. So now it's going to behave differently. So you can have
41:10
just like one consistent motivational system
41:12
that is fiendishly hard to do a test
41:15
of how that system generalizes when it has power because
41:17
you can't generalize. You can't test what happens
41:20
when it's no longer a test.
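To put the King Lear problem in schematic form (purely illustrative; real systems obviously do not expose a can_take_over flag, and real evaluations are not this naive): a policy that cooperates only while it lacks power is behaviorally indistinguishable, under any test that never grants power, from one that cooperates unconditionally.

```python
# Two toy policies: one genuinely cooperative, one that cooperates only while it
# lacks the power to do otherwise. A purely behavioral test run in an environment
# where takeover is impossible scores them identically.

def aligned_policy(can_take_over):
    return "cooperate"            # same action no matter what

def king_lear_policy(can_take_over):
    # Identical observable behavior whenever defection cannot succeed.
    return "defect" if can_take_over else "cooperate"

def behavioral_safety_test(policy):
    # The test environment, by construction, never hands the system real power.
    return all(policy(can_take_over=False) == "cooperate" for _ in range(1000))

for name, policy in [("aligned", aligned_policy), ("king_lear", king_lear_policy)]:
    passed = behavioral_safety_test(policy)
    deployed = policy(can_take_over=True)   # deployment: copies everywhere, real leverage
    print(f"{name:10s} passes test: {passed}; action once it has power: {deployed}")
```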
41:22
What's the view that's common among ML researchers,
41:24
which you would disagree with?
41:26
You know, it depends a little bit with ML researchers
41:28
for sure. I would definitely
41:31
say that I've been a big bitter lesson
41:33
person since at least 2017. And,
41:36
you know, I got a lot of this from just like
41:38
Dario Amodei, my wife's brother
41:40
who is CEO of Anthropic and I think
41:43
has been like very insightful. A lot
41:45
of what's gone on in the AI over the last few years is just
41:47
like bigger models,
41:49
more data, more training. And there's
41:53
an essay called The Bitter Lesson by an ML researcher,
41:55
Rich Sutton, that just kind of says, you know, ML
41:57
researchers keep coming up with clever and clever ways
42:00
to design AI systems and then those
42:03
clevernesses keep getting obsoleted by
42:05
just making the things bigger and just
42:07
like training them more and putting in more
42:09
data and so you know I've had a lot
42:11
of arguments over the last few years and
42:13
you know in general I have heard people arguing with
42:16
each other that are just kind of like on one
42:18
side it's like well today's AI systems
42:20
can do some cool things but they'll never be able to do this
42:22
and to do this like maybe that's reasoning
42:25
creativity you know something like that we're
42:27
gonna need a whole new approach to AI and
42:29
then the other side will say no I think we just need to make them
42:31
bigger and then they'll be able to do this I tend
42:35
to lean almost entirely toward that, toward
42:37
that "just make it bigger" view. I think just at
42:39
least in the limit, if you took
42:42
an AI system and made it really big you
42:44
might need to make some tweaks but the tweaks wouldn't
42:46
necessarily be like really hard or
42:48
require giant conceptual breakthroughs I
42:50
do tend to think that that whatever it is
42:52
humans can do we could probably eventually
42:54
get an AI to do it and eventually it's not gonna
42:56
be a very fancy AI it could be just like a
42:59
very simple AI with some easy
43:01
to articulate stuff, and a lot of the challenge comes
43:03
from making it really big putting in a lot of data I think
43:06
this view has become like more popular over
43:08
the years than it used to be but it's still like pretty
43:11
debated I think a lot of people are still
43:13
looking at today's models and saying hey there's
43:15
fundamental limitations we're gonna need a whole new approach
43:17
to AI before they can do X or Y.
43:19
I'm just kind of out on that.
43:22
I think it's possible I'm not confident this is just
43:24
like where my instinct tends to lie
43:26
that's a disagreement I think another disagreement I have
43:29
with some ML researchers. I think not
43:31
all at all but but there's sometimes just
43:33
I feel like a background sense that
43:35
just like
43:36
sharing openly information
43:39
publishing
43:40
open sourcing, etc. is just
43:42
like good; that it's kind of bad
43:44
to do research and keep it secret and it's good
43:46
to do research and publish it, and
43:49
I you know I don't I don't feel this way
43:51
I think the things we're building could be very
43:54
dangerous at some point and I think that point
43:56
can come a lot more quickly than anyone is expecting
43:58
I think when that point comes, some
44:00
of the open source stuff we have could
44:02
be used by bad actors in
44:05
conjunction with later insights to create
44:07
very powerful AI systems that
44:10
we aren't thinking of in ways we aren't thinking of right now,
44:12
but we won't be able to take back later. And
44:14
in general, I do tend to think that academia
44:17
has kind of this idea that sharing information
44:19
is good built into its fundamental
44:21
ethos, and that might often be true. But I think
44:23
there's times when it's kind of clearly false and academia
44:26
still kind of pushes it. You know, gain of function
44:28
research being like kind
44:29
of an example for me, where just like people are
44:32
very, very into the idea of like
44:34
making a virus more deadly and publishing
44:36
how to do it. And I think this is just an example
44:38
of where just culturally there's some
44:41
background assumptions about information
44:43
sharing that I just think the world is more complicated
44:45
than that.
44:46
Yeah, I definitely encountered people from time
44:48
to time who are, they
44:49
have this very strong prior this very
44:52
strong assumption that everything should be open and people
44:54
should have access to everything and then I'm like, what if someone
44:56
was designing a hydrogen bomb that you could make with,
44:58
you know, equipment that you could get from your house? I'm just
45:01
like, I don't think that it should be open. I think we should
45:03
probably stop them from doing that. Yeah, yeah,
45:05
yeah. And certainly if they figure
45:07
it out, we shouldn't publish it. Yeah, yeah.
45:09
I suppose it's just that that's a sufficiently rare case
45:11
that
45:12
it's very natural to develop the intuition in
45:14
favor of openness from the 99 out of 100 cases
45:16
where that's not too unreasonable. Yeah,
45:18
I think it's usually reasonable. But I think I think bioweapons
45:20
is just like a great counter example, or just like, it's
45:23
not really balanced. It's not really like,
45:25
well, for everyone who, you
45:27
know, who tries to design or release some horrible
45:29
pandemic, we can have someone else using open
45:31
source information to design a countermeasure. Like, that's
45:34
not actually how that works. And so, yeah, I
45:36
think this attitude at least needs to be complicated
45:38
a little bit more than it is.
45:40
Yeah. What's something that listeners might
45:42
expect you to believe, but which you actually don't? Yeah,
45:45
I don't really know what people think, I think, but some
45:48
things that sometimes I kind of pick up. I
45:50
mean, I write a lot about the future. I do a lot
45:52
about, you know, a lot of stuff about, well, this is
45:55
coming, we should like prepare and do this and don't do that. I
45:57
think a lot of people think that I think I have
45:59
like this great ability to predict the future
46:02
and that I can spell it out in detail and count on it.
46:04
And I think a lot of people think I'm like underestimating
46:07
the difficulty of predicting anything. And
46:10
you know, I think I may in fact be underestimating
46:12
it, but I think I do feel a lot
46:15
of gosh, it is so hard
46:17
to, you know, be even a decade ahead or five
46:19
years ahead of what's going to happen. It
46:22
is so hard to get that right and enough detail to be helpful.
46:24
A lot of times you can get the broad outlines of something, but
46:27
to really be helpful seems really hard. Even on COVID,
46:29
it's like I feel
46:29
like a lot of the people who saw it coming in advance
46:32
weren't necessarily able to do much to make things
46:34
better. And that includes Open Philanthropy.
46:36
And we had a biosecurity program for years before
46:39
COVID. And I think there was some helpfulness
46:41
that came of it, but not as much as there could have been.
46:43
And so, you know, I think in general, I just
46:45
like, I don't know, a lot of how I am is I'm just like,
46:48
predicting the future is really hard. Getting out ahead
46:50
of the future is really hard. I'd really rather never
46:52
do it. I think in many ways I would
46:54
just like rather work on stuff like GiveWell
46:57
and global health charities and animal welfare and just
46:59
like adapt to things as they happen
47:01
and not try and get out ahead of things. And there's
47:03
just like a small handful of issues that
47:05
I think are important enough, and may
47:07
move quickly enough, that we just have
47:09
to do our best. I don't
47:12
think we should feel this is totally hopeless. I think
47:14
we can in fact do some good by
47:17
getting out ahead of things and planning in advance. And
47:20
a lot of my feeling is we've got to do the best we can, more
47:22
than hey, I know what's coming.
47:24
Yeah. Okay. And
47:26
the other one is what's something you expect quite a lot of listeners might
47:28
believe, which you think you'd be
47:30
happy to disabuse them of?
47:32
Yeah, there's a line that you've
47:34
probably heard before that is
47:37
something like this. It's something like most
47:39
of the people we can help are in future generations.
47:41
And there are so many people in future generations
47:44
that that kind of just ends the conversation
47:46
about how to do the most good that it's clearly
47:48
astronomically the case that focusing
47:51
on future generations dominates all
47:53
ethical considerations or at least dominates all
47:55
considerations of like how to do the most good
47:57
with your philanthropy or your career. I kind
47:59
of think of this as like philosophical long-termism
48:01
or philosophy-first long-termism that, you
48:04
know kind of feels like you've ended the argument after you've
48:06
pointed to the number of people in future generations
48:09
and we can get to this a bit later in the interview I don't
48:11
I don't think it's a garbage view I give some credence
48:13
to it, I take it somewhat seriously, and I think it's like
48:16
underrated by the world as a whole but
48:18
I would say that I give a minority
48:20
of my moral parliament to thinking this way I would
48:23
say like more of me than not thinks
48:26
that's not really the right way to think about doing good
48:28
that's not really the right way to think about
48:29
ethics and I don't think we can trust these
48:32
numbers enough to feel that it's such a blowout
48:35
and the reason that I'm currently focused
48:37
on what's classically considered long-termist causes,
48:39
especially AI is that I believe
48:41
the risks are imminent and real enough
48:43
that you know even with much less aggressive
48:46
valuations of the future they are you know
48:48
competitive or perhaps the best thing to work on
48:50
another random thing I think is that if you if you
48:53
really want to play the game of just like being all about
48:55
the big numbers and only thinking about the populations
48:58
that are the biggest that you can help future generations
49:00
are like extremely tiny compared
49:02
to you know persons you might
49:04
be able to help through acausal interactions with
49:06
other parts of the multiverse outside our light cone I
49:09
don't know if you want to get to that or just refer people back to
49:11
Joe's episode on that but that's that's more of a
49:13
nitpick on that take yeah people can go
49:15
back to listen to the episode with Joe Carlsmith if they'd
49:17
like to understand what we just said there let's
49:20
come back to AI
49:20
now I think I want to spend quite a lot of time
49:23
basically understanding what you think different actors
49:25
should be doing, you know, governments, AI labs, you know, our
49:28
listeners so what different ways that they might be able to contribute
49:30
to improving our odds here but
49:32
maybe before we do that it could be worth talking about, like,
49:34
trying to envisage scenarios in which things
49:36
go relatively relatively well you've
49:39
argued that you're very unsure how things are gonna play out
49:41
but it's possible that we might muddle
49:44
through and get a reasonably good outcome even
49:46
if we basically carry on doing the fairly
49:48
reckless things that we're doing right now
49:51
not because you're recommending that we take that path right but
49:53
rather just because it's it's relevant to know whether
49:55
we're just far off like completely far off
49:57
the possibility of any good outcome. Now,
50:00
what do you see as the value of laying out positive
50:02
stories or ways that things might go well?
50:04
Yeah, so I've written a few pieces that
50:06
are kind of laying out, here's an excessively
50:09
specific story about how the
50:11
future might go that ends happily with
50:14
respect to AI, that ends with kind of, you know,
50:17
we have AIs that didn't develop or didn't
50:19
act on goals of their own enough to disempower
50:22
humanity. And then we kind of ended up with this
50:24
world where we're all, the world is getting more capable
50:26
and getting better over time and none of the various disasters
50:28
we're sketching out happened. I've written
50:30
like three different stories like that. And then one story,
50:33
the opposite, how we could stumble
50:35
into AI catastrophe where things just go
50:37
really badly. Why have I
50:39
written these stories in general? You know,
50:41
I think it's not that I believe these stories,
50:43
it's not that I think this is what's going to happen. But I
50:45
think a lot of times when you're
50:48
thinking about general principles
50:50
of like what we should be doing today to reduce risks from
50:52
AI, it is often helpful,
50:54
just like my brain works better imagining specifics.
50:57
And I think it's often helpful to kind of imagine
50:59
some specifics and then extract back from
51:02
the specifics to general points and
51:04
see if you still believe them. So for example,
51:06
I've actually done, I mean, these are the ones I've published,
51:08
but I've done a lot of thinking of what are
51:10
different ways the future can go well. And it's
51:12
like there are some themes in them. It's like there's almost
51:15
there's almost no story of the future
51:17
going well, that doesn't have like
51:19
a part that's like, and no evil
51:21
person steals the AI weights and
51:24
goes and does evil stuff. And so, you
51:26
know, it has highlighted, I think, I think the importance
51:28
of security, the importance of just like information
51:30
security, just like you're training a powerful AI
51:33
system, you should make it hard for someone to steal
51:35
has like popped out to me as a thing that just
51:37
like keeps coming up in these stories, keeps
51:39
being present. It's hard to tell a story where it's not
51:42
a factor. It's easy to tell a story where it is a factor.
51:44
You know, another factor that has come up for me is just like,
51:46
there needs to be
51:47
some kind of way
51:49
of stopping or disempowering
51:52
dangerous AI systems. You can't just
51:54
build safe ones. Or like if you build the safe
51:57
ones, you have to somehow use them to help you stop
51:59
the dangerous because eventually people will build
52:01
dangerous ones. And I think the most promising
52:03
general framework that I've heard for
52:06
doing this is this idea of a kind
52:08
of evals-based regime where you test
52:11
to see if AIs are dangerous. And based
52:13
on the tests, you kind of have the world
52:15
coming together to stop them or you don't. And
52:17
I think even in a world where you have very powerful, safe
52:20
AI systems, you still need some kind of, probably,
52:22
you still need some kind of regulatory framework for
52:25
how to use those to use force to stop
52:27
other systems. And so these are
52:29
general factors that I think it's a little bit like
52:31
I think it's how some people might do math by imagining
52:34
a concrete example of a mathematical object,
52:36
seeing what they notice about it, and then abstracting
52:39
back from there to the principles. That's what I'm doing with
52:41
a lot of these stories. I'm just like, can I tell
52:43
a story specific enough that it's not obviously crazy?
52:45
And then can I see what themes there
52:48
are in these stories and which things I robustly
52:50
believe after coming back to reality? That's
52:52
a general reason for writing stories like that. The
52:54
specific story you're referring to, I wrote a post
52:57
on LessWrong called Success Without Dignity,
52:59
which is
52:59
kind of a response to Eliezer Yudkowsky writing
53:02
a piece called, I think it was called Death With Dignity.
53:04
Yeah, we should possibly explain that idea. I think some
53:07
people have become so pessimistic about our prospects
53:10
of actually
53:11
avoiding going extinct, basically, because they think
53:13
this problem is just so difficult. But they've said, well,
53:15
really, the best we can do is to not make
53:17
fools of ourselves in the process of going extinct,
53:19
that we should at least cause our own extinction
53:21
in some way that's like barely respectable, if I
53:23
guess aliens were to read the story or to uncover
53:26
what we did. And they call this kind of dignity
53:29
or death with dignity. Death with dignity. Sorry, anyway. Yeah.
53:31
And to be clear, the idea there is not
53:33
like we're literally just trying to have
53:35
dignity. It's like the idea is like, that's a proximate
53:38
thing you can optimize for that actually increases
53:40
your odds of success the most
53:41
or something. Yeah, and my response- For
53:43
many people, that's a little bit tongue in cheek as well. Yeah, yeah,
53:45
yeah, for sure. And my response though
53:47
is a piece called, Success Without Dignity, that's just
53:49
kind of like, well, I don't know, it's just actually
53:51
pretty easy to picture a world where we just like,
53:54
we just like do everything wrong. And like,
53:56
there's no real positive surprises from here.
53:59
At least not-
53:59
in terms of like people who are deliberately
54:02
trying in advance to reduce AI x-risk, like
54:04
there's no big breakthroughs on AI alignment, there's
54:06
no like real happy news, just a lot of stuff just happens
54:08
normally and happens on the path it's on, and then
54:11
we're fine, and why are we fine? Well, we basically
54:13
got lucky, and I'm like, can you tell a story like
54:15
that? And I'm like, yeah, I think I can tell a story like that,
54:17
and why does that matter? I think it matters because
54:19
it's, um, I think a number of people have
54:21
this feeling with AI that they're just like, we're
54:23
screwed by default, we're gonna have to get like 10
54:25
really different hard things all right,
54:28
or we're totally screwed, and so therefore we
54:30
should be trying like really crazy
54:32
swing for the fences stuff and forgetting about
54:34
interventions that like help us a little bit. And
54:37
yeah, and I have the opposite take, I just think like, look,
54:39
if nothing further happens, there's some
54:41
chance that we're just fine, basically by luck, so
54:44
we shouldn't be doing like overly crazy things to increase
54:46
our variance if they're not, you know, if they're
54:48
not like kind of highly positive and expected value,
54:51
and then I also think like, yeah, things that just
54:53
like help a little bit, like, yeah, those are good.
54:55
They're good at face value, they're good in the way you'd
54:57
expect them to be good, they're not like
54:59
worthless because they're not enough, and
55:02
so I think, you know, things like just working harder
55:04
with AI systems
55:06
to get the reinforcement AI systems
55:08
are getting to be accurate, so just that, you know, this idea
55:10
of accurate reinforcement where you're not
55:12
rewarding AI systems specifically
55:15
for doing bad stuff, you know,
55:17
that's a thing you can kind of like get more right or get
55:19
more wrong, and doing more attempts to do that is kind
55:22
of basic, and it's not, it doesn't involve
55:24
like clever re-thinkings of what cognition means
55:26
and what alignment means and how we get a perfectly aligned
55:28
AI, but I think it's a thing that
55:30
could matter a lot, and that putting more effort into
55:32
could matter a lot, and I feel that way too, about improving
55:35
information security, like you don't have to make
55:37
your AI impossible to steal, make it hard to steal is worth
55:39
a lot, you know, so there's a lot of just
55:41
generally a lot of things I think people can do to reduce AI
55:43
risks that don't rely on a complicated
55:45
picture. It's just like this thing helps, so just do it because
55:47
it helps. Yeah,
55:49
we might go over those interventions in just a
55:51
second. Yeah, maybe it's possible to like flesh out the story
55:53
a little bit, like yeah, how could we get a
55:55
good outcome? Mostly through luck.
55:58
So I broke this success
56:00
without dignity idea into
56:02
a couple phases. So there's the initial
56:05
alignment problem, which is the, you know, the thing
56:07
most people, I think, in the kind of doomer headspace
56:09
tend to think about, which is how do we,
56:11
how do we build a very powerful AI system that
56:14
is not trying to take over the world or disempower
56:16
humanity or kill all of humanity or whatever. And
56:18
so there, I think if you are training
56:21
systems that are
56:22
human level, I call them human-level-ish. So
56:25
an AI system that's like got kind
56:27
of a similar range of capabilities to a human;
56:29
it's going to have some strengths and some weaknesses relative
56:31
to a human. If you're training that kind of system,
56:34
I think that
56:35
you may just get systems that are pretty
56:37
safe, at least for the moment, without
56:39
a ton of breakthroughs or special
56:42
work. You might get it by pure
56:44
luck ish. So it's basically like this thing I
56:46
said before about how, you know, you have an AI system
56:48
and you train it by basically saying
56:50
good job or bad job when it does something. It's
56:52
like human, human feedback for a human
56:55
level ish system that could easily result
56:57
in a system that like
56:58
either it really did generalize to doing
57:00
what you meant it to do or it generalized
57:03
to this like thing where it's like
57:05
trying to take over the world. But that
57:07
means cooperating with you whenever it's
57:09
too weak to take over the world. And in fact, these human level
57:11
ish systems are going to be like too weak to take over the world. So they're
57:13
just going to cooperate with you. It could mean that.
57:16
So you could get like either two of those generalizations.
57:18
And then like,
57:20
it does matter, like I just
57:22
said, if your reinforcement is accurate. So
57:24
you could kind of like have an AI system where you say,
57:26
hey, go make me a bunch of money. And then
57:28
unbeknownst to you, it goes and like breaks
57:31
a bunch of laws and hacks into a bunch of stuff
57:33
and brings you back some money or even like fakes
57:35
that you have a bunch of money and then you say good job. Now
57:38
you've actually rewarded it for doing bad stuff. But
57:40
if you can take that out, if you can basically avoid
57:42
doing that and have your
57:45
kind of like good job when it actually did
57:47
a good job, that I think increases the
57:49
chances that that it's going to generalize
57:51
to basically just doing a good
57:53
job or at least doing what we roughly intended and
57:56
not kind of pursuing goals of its
57:58
own, if only because that wouldn't work. And so I
58:00
think you could, you know, a lot of this is to say,
58:02
you could solve the initial alignment problem by almost
58:05
pure luck, by this kind of reinforcement learning
58:07
from human feedback, generalizing well. You could
58:09
add a little effort on top of that and make it more
58:11
likely, like getting your reinforcement more accurate.
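A minimal sketch of the "accurate reinforcement" idea just described, in Python (my own illustration; the episode fields, the disallowed-action list, and the audit step are assumptions, not anything Holden specifies): the "good job" reward only comes when the visible outcome was produced without the kinds of misbehavior described above.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    money_made: float                             # the outcome the overseer can see directly
    actions: list = field(default_factory=list)   # the full trace, if we bother to audit it

DISALLOWED = {"hack", "break_law", "fake_results"}

def naive_reward(ep: Episode) -> float:
    # Rewards the visible outcome even if it was produced by misbehavior.
    return 1.0 if ep.money_made > 0 else 0.0

def accurate_reward(ep: Episode) -> float:
    # Audits the trace first: misbehavior gets no reward, even if the outcome looks good.
    if any(a in DISALLOWED for a in ep.actions):
        return 0.0
    return 1.0 if ep.money_made > 0 else 0.0

ep = Episode(money_made=100.0, actions=["research_market", "hack", "fake_results"])
print(naive_reward(ep))     # 1.0 -- reinforces exactly the behavior we didn't want
print(accurate_reward(ep))  # 0.0 -- "good job" only when it actually did a good job
```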
58:13
There's some other stuff you could do in addition
58:15
to kind of catch some of the failure modes and
58:18
straighten them out, like red teaming
58:20
and simple checks and balances. I won't go into the details
58:22
of that. And if you get some combination
58:24
of luck and skill here, you end up with
58:27
AI systems that are roughly human
58:29
level, not immediately
58:31
dangerous anyway. Sometimes they call them
58:33
jankily aligned. It's like they are
58:35
not trying to kill you at this moment. That doesn't mean you
58:37
solve the alignment problem. But
58:40
at this moment, they are approximately trying to help you.
58:42
Maybe if they can all coordinate and kill you, they
58:45
would, but they can't. Remember, they're kind of human-like.
58:47
So that's the initial alignment problem. And
58:49
then once you get past that,
58:51
then I think we should all just
58:53
forget about the idea that we have any idea what's gonna
58:55
happen next. Because now you
58:59
have a huge, potentially huge number
59:01
of human-level-ish AIs. And
59:03
that is just incredibly world-changing. And
59:06
there's this idea of, that
59:08
I think sometimes some people call it, getting
59:10
the AIs to do our alignment homework for us. So
59:13
it's this idea that once you have human-level-ish
59:15
AI systems, you have them kind of working
59:17
on the alignment problem in huge numbers.
59:20
And it's like, in some ways I hate this idea because
59:22
it's just very lazy and it just is like, oh
59:24
yeah, we're not gonna solve this problem until later when the
59:26
world is totally crazy and everything's moving really
59:28
fast and we have no idea what's gonna happen. So I hate
59:30
the idea in that sense. We'll just ask the
59:32
agents that we don't trust to make themselves trustworthy.
59:35
Yeah, exactly. So there's a lot to hate about this
59:37
idea, but heck, it could work.
59:40
It really could. Because you could have a situation
59:43
where just in a few months, you're able
59:45
to do the equivalent of thousands of years of
59:48
humans doing alignment research. And if these
59:50
systems are just not at the point where they can
59:52
or want to
59:53
screw you up, that really
59:55
could do it. I mean, we just don't know that like
59:57
thousands of years of human-level-ish alignment research isn't
59:59
enough.
59:59
to just like get us a real solution.
1:00:02
And so that's kind of how you get through
1:00:04
a lot of it. And then you still have another problem in
1:00:07
a sense, which is that you do
1:00:09
need a way to stop dangerous systems. It's
1:00:11
not enough to have safe AI systems. But
1:00:13
again, you have help from this giant automated
1:00:15
workforce. And so in addition to
1:00:18
coming up with ways to make your system safe, you can come up with ways
1:00:20
of showing that they're dangerous and when they're
1:00:22
dangerous and being persuasive about the importance
1:00:24
of the danger. And that again, feels like something
1:00:26
that like, I don't know, I feel like if we had 100
1:00:29
years before AGI right now, there'd be a good chance that
1:00:31
normal flesh and blood humans could pull this off. So
1:00:34
in that world, there's a good chance that an automated workforce
1:00:36
can cause it to happen pretty quickly. And you
1:00:38
could pretty quickly get, get
1:00:40
understanding of the risks and agreement that we need to stop
1:00:43
them. And you have more safe
1:00:45
AIs than dangerous AIs, and you're trying to stop the dangerous
1:00:47
AIs, and you're measuring the dangerous AIs, or
1:00:50
you're stopping any AI that refuses to be measured or whose
1:00:52
developer refuses to measure it. And so then
1:00:54
you have a world that's kind of like this one, where like, yeah,
1:00:56
there's a lot of evil people out there, but they
1:00:59
are generally just kept
1:00:59
in check by being outnumbered by people who
1:01:02
are at least law abiding, if not incredibly angelic.
1:01:04
So you get a world that looks like this one, but it just
1:01:06
has a lot of like AIs running around in it. And
1:01:09
so we have like a lot of progress in science and technology
1:01:11
and that's, that's a fine ending potentially.
1:01:14
Okay, so that's one flavor
1:01:16
of story. Are there any other broad themes
1:01:18
in the positive stories that it could be worth bringing
1:01:20
out before we move on?
1:01:21
I think I mostly covered it. I mean, the other
1:01:23
two stories involve less luck
1:01:26
and more like you have one or two actors
1:01:28
that just do a great job. Like, you know, you have one
1:01:30
AI lab that it's just ahead of everyone else and
1:01:33
it's just like doing everything right. And that improves
1:01:35
your odds a ton. You know, for a lot
1:01:37
of this reason that like being a few months ahead could
1:01:39
mean you have like, you know, a lot
1:01:42
of subjective time of having your automated workforce
1:01:44
do stuff to be helpful. And so there's one
1:01:46
of those with like a really fast takeoff and one of them with a
1:01:48
more gradual takeoff.
1:01:50
But I think, you know, I think that does, that does kind of highlight
1:01:52
again that like one really good
1:01:54
actor who's like really successful could
1:01:56
move the needle a lot even when you get less
1:01:58
luck. So I think there's...
1:01:59
There's a lot of ways things could go well. There's a lot
1:02:02
of ways things could go poorly. I feel like
1:02:04
I'm saying just like
1:02:05
really silly, obvious stuff now that just should
1:02:07
be everyone's starting point, but I do
1:02:09
think it's not where most people are at right now. I think these risks
1:02:11
are extremely serious. They're kind of my
1:02:14
top priority to work on. I think
1:02:16
anyone saying we're definitely gonna be fine, I don't know where the heck
1:02:18
they're coming from; but anyone saying we're definitely doomed, I
1:02:20
don't know, same issue.
1:02:21
Okay,
1:02:23
so the key components of the story here was, so
1:02:25
one, you didn't get bad people stealing the models
1:02:27
and misusing them really early on. Or
1:02:30
there was some limit to that and they were outnumbered or
1:02:32
something like that, yeah. Okay, and
1:02:34
initially we ended up training models that are
1:02:36
more like human level intelligence and it turns out to
1:02:38
not be so challenging to have
1:02:41
moderately aligned models like that. And
1:02:43
then we also managed to turn those AIs, those
1:02:47
folks I guess, I'm gonna say it, towards
1:02:50
the effort of figuring out how to align additional
1:02:52
models that would be more capable still.
1:02:54
And or how to slow things down and put in
1:02:56
a regulatory regime that stops things if we don't know
1:02:58
how to make safe, yeah. Right, okay, or they
1:03:00
help with, yeah, they help with a bunch of other governance
1:03:02
issues, for example, and then also by
1:03:04
the time these models have proliferated and
1:03:07
they might be getting used irresponsibly
1:03:10
or by bad actors, those folks are just
1:03:12
massively outnumbered as they are today by people
1:03:15
who are largely sensible. Okay, so
1:03:17
I'm really undecided on
1:03:19
how plausible these stories are. I guess
1:03:22
I
1:03:22
place some weight or some credence
1:03:25
on the possibility that pessimists like Eliezer
1:03:27
Yudkowsky are right and that this kind of thing
1:03:29
couldn't happen for one reason or another. And
1:03:32
we really would. I think that's possible, yeah. Yeah, I
1:03:34
guess, what do you make of the argument
1:03:36
that, let's say we're 50-50, split
1:03:38
between kind of the Eliezer worldview and
1:03:40
the Holden worldview just outlined, that in
1:03:43
the case of Eliezer's right, we're kind of just screwed, right?
1:03:45
And things that we do on the margin, a bit of extra
1:03:47
work here and there, just isn't gonna change the story.
1:03:50
Basically, we're just gonna go extinct with very
1:03:52
high probability.
1:03:52
Whereas if you're right, then
1:03:55
things that we do might actually move the needle and we
1:03:57
have a decent shot. So it makes more sense to act
1:03:59
as if we have a chance, as if some
1:04:02
of this stuff might work, because our decisions
1:04:04
and our actions just aren't super relevant
1:04:07
in the pessimist case. Does that sound like a sensible
1:04:09
reasoning? I mean, it seems a little bit suspicious
1:04:12
somehow. I think it's a little bit suspicious. I
1:04:14
mean, I think it's fine if it's 50-50. I think
1:04:16
Eliezer has complained about this. He's kind of said, you
1:04:18
know, look, you can't condition
1:04:21
on a world that's fake, and you should live
1:04:23
in the world you think you're in. I think that's right. So
1:04:26
I should say, I think I do want to say a couple meta
1:04:28
things about the success without dignity story.
1:04:29
One is, I do want people to know, this is
1:04:32
not like a thing I cooked up. This is, you
1:04:34
know, I think of my job, I'm not an AI
1:04:36
expert, I think of my job being especially
1:04:38
a person who's generally been a funder, having
1:04:40
access to a lot of people, having to make a lot of people judgments.
1:04:43
My job is really to figure out who to listen
1:04:45
to about what and how much weight to give whom about what.
1:04:48
So I'm getting the actual substance
1:04:50
here from people like Paul Christiano,
1:04:52
Carl Shulman, and others, and this
1:04:54
is not Holden reasoning things out and being like, this
1:04:56
is how it's going to be. This is me synthesizing,
1:04:59
hearing from a lot of different people, a lot of them
1:05:01
highly technical, a lot of them experts, and
1:05:03
just trying to say who's making the most sense, also
1:05:06
considering things like track records, like who should be getting
1:05:08
weight, things like expertise. So
1:05:10
that is an important place to know where I'm coming from.
1:05:12
But I do, having done that, I
1:05:15
do actually feel that this success
1:05:17
without dignity is just like a serious possibility, and
1:05:19
I'm way more than 50-50 that
1:05:21
this is possible. That
1:05:24
according to the best information we have now, this
1:05:26
is a reasonable world to be living in, is a world where this could
1:05:28
happen and we don't know, and way less
1:05:29
than 50-50 on the kind of Eliezer model
1:05:32
of, yeah, we're doomed for sure with like 98 or 99% probability,
1:05:36
is right. I don't put zero credence on it, but it's just
1:05:38
not my majority view. But the other thing,
1:05:40
which relates to what you said, is
1:05:42
I don't want to be interpreted as saying this
1:05:44
is a reason we should chill out. So it
1:05:46
should be obvious, but my argument should stress
1:05:48
people out. My picture should stress people out
1:05:50
way more than any other possible picture. I use
1:05:52
the most stressful possible picture because it's
1:05:55
like
1:05:55
anything could happen, every little bit helps. Like,
1:05:58
whether you help a little more or a little less
1:05:59
actually matters in this potentially huge
1:06:02
kind of fate-of-humanity kind of way. And that's
1:06:04
a crazy thing to think in its own way. And it's certainly not a relaxing
1:06:07
thing to think, but it's yeah, it's what I think as far as I can tell.
1:06:10
Yeah,
1:06:10
I'm I'm stressed. Don't worry. Okay,
1:06:12
good. Okay. Okay. So,
1:06:16
so this kind of worldview leads
1:06:18
into something that you wrote, which is this kind of four
1:06:20
intervention playbook for possible success
1:06:23
with AI. And you kind of describe
1:06:25
four different categories of interventions that we might
1:06:27
engage in in order to to try to improve
1:06:29
our odds of success. I think there was alignment research,
1:06:32
standards and monitoring, creating a successful
1:06:34
and careful AI lab. And finally, information
1:06:36
security. I think we've touched on all these a
1:06:39
little bit, but maybe we could go over them again. Is there
1:06:41
anything you want to say about alignment research as an intervention
1:06:43
category that you haven't said already? Well,
1:06:45
I mean, I've kind of pointed at it. But
1:06:48
I think in my head, I see value
1:06:50
in... I think there's like versions of alignment
1:06:53
research that are very like blue sky
1:06:55
and very like we have to have a fundamental
1:06:58
way of being like really sure that any
1:07:01
arbitrarily capable AI is like
1:07:03
totally aligned with what we're trying to get it to do. And
1:07:05
I think that I think that's very hard
1:07:07
work to do. I think a lot of the actual work being done on it
1:07:10
is not valuable. But I think if you can move the
1:07:12
needle on it, I think it's super valuable. And
1:07:14
then there's work that's like a little more prosaic and
1:07:16
it's a little more like, well, can we, you
1:07:18
know, train our AIs
1:07:20
with human feedback and find some way that screws
1:07:22
up and kind of patch the way it screws up and go to
1:07:24
the next step. A lot of this work is
1:07:26
like pretty empirical as being done at AI labs.
1:07:29
And I think that work is just like super valuable as well.
1:07:31
And so that is a take I have in alignment research. I
1:07:34
do think almost all alignment research
1:07:37
is believed by many people to be totally useless
1:07:39
and or harmful. And
1:07:41
I tend not to super feel that way. I think if anything,
1:07:44
the line I would draw is there is some
1:07:46
alignment research that seems like
1:07:48
it's necessary eventually to commercialize.
1:07:50
And so I'm a little less excited about that because I do think it will
1:07:52
get done regardless on the
1:07:54
way to whatever we're worried about. And so I do
1:07:56
tend to draw lines about like, how likely is
1:07:59
this research to get done,
1:08:00
you know, not by normal commercial motives, but
1:08:02
I do think there's a wide variety of alignment
1:08:05
research that can be helpful, although
1:08:07
I think a lot of alignment research also is not helpful,
1:08:09
but that's more because it's like not aimed
1:08:11
at the right problem, and less because it isn't like exactly
1:08:14
the right thing. And so that, yeah, that's a take on
1:08:16
alignment research. Then another another take is I
1:08:18
have kind of highlighted what I call threat assessment
1:08:21
research as a thing that you could consider
1:08:23
part of alignment research or not, but
1:08:25
it's probably the single category that feels to me
1:08:27
like
1:08:28
the most in need of more work right now,
1:08:30
given where everyone is at, and that
1:08:32
would be, you know, basically work trying
1:08:35
to kind of create the problems
1:08:37
you're worried about in a controlled environment where
1:08:40
you can show that they
1:08:42
could exist and understand the conditions
1:08:44
under which they do exist. So, you know, problems
1:08:46
like a misaligned AI that is pretending to be aligned,
1:08:49
so you can actually, like, study alignment techniques
1:08:51
and see if they work on many versions of the problem. So
1:08:53
like, you know, you could think of it as like model organisms
1:08:56
for AI where, you know, in order to cure
1:08:58
cancer, it really helps to be able to give cancer to mice.
1:09:00
In order to deal with AI misalignment, it really
1:09:02
helps to be able to create, if we could ever create a deceptively
1:09:05
aligned agent that is like, you know, secretly
1:09:08
trying to kill us, but it's too weak to actually kill
1:09:10
us, that would be way better than having the first
1:09:12
agent that's secretly trying to kill us be something that actually can kill
1:09:15
us. So I'm really into kind of creating,
1:09:17
creating the problems we're worried about in controlled environments.
1:09:20
Yeah, yeah. Okay, so the second category
1:09:22
was standards and monitoring, which you've already
1:09:24
touched on. Is there anything high level you want to
1:09:27
say about that one?
1:09:28
Yeah, this is kind of, to me, the most
1:09:30
nascent, or like the one where there's just
1:09:33
not much happening right now, and I think there could be a lot
1:09:35
more happening in the future, but the basic idea
1:09:37
of standards and monitoring is this idea that you
1:09:39
have tests for whether AI systems are dangerous, and
1:09:41
you have a regulatory or self-regulatory
1:09:44
or a normative, you know, informal
1:09:46
framework that says dangerous AI should
1:09:48
not be trained at all or deployed. So,
1:09:51
and by not be trained, I mean like, you
1:09:53
found initial signs of danger in one AI model, so
1:09:55
you're not going to make a bigger one. Not just you're not going to deploy,
1:09:57
you're not going to train it.
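A minimal sketch of that gating idea in Python (hypothetical eval names and thresholds; not any real lab's or regulator's actual policy): run danger evals on the current model, and only allow training or deploying a bigger one if every eval comes back clean.

```python
def run_danger_evals(capabilities: dict) -> dict:
    # Placeholders for real evaluations; each entry is True if danger was found.
    return {
        "autonomous_replication": capabilities.get("can_self_replicate", False),
        "bioweapon_uplift": capabilities.get("gives_bio_uplift", False),
    }

def allowed_to_scale(capabilities: dict) -> bool:
    results = run_danger_evals(capabilities)
    flagged = [name for name, dangerous in results.items() if dangerous]
    if flagged:
        print(f"Do not train or deploy a larger model; failed evals: {flagged}")
        return False
    return True

print(allowed_to_scale({"can_self_replicate": False}))  # True: scaling permitted for now
print(allowed_to_scale({"can_self_replicate": True}))   # False: pause until shown safe
```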
1:09:58
I'm excited about standards
1:09:59
and monitoring in a bunch of ways. I think
1:10:03
it feels like it has to be eventually part
1:10:05
of any success story. There has to be some framework
1:10:07
for saying, hey, we're going to stop
1:10:09
dangerous AI systems. But it also,
1:10:12
in the short run, I think it's got more advantages than
1:10:14
sometimes people realize, where I think
1:10:16
it's not just about slowing things
1:10:18
down. It's not just about stopping directly
1:10:20
dangerous things. I think a good standards
1:10:23
and monitoring regime would create massive
1:10:25
commercial incentives to actually
1:10:27
pass the tests. And so if the tests are
1:10:29
good,
1:10:29
if the tests are well-designed to actually
1:10:32
catch danger where the danger is, you could
1:10:34
have massive commercial incentives to actually
1:10:36
make your AI system safe and show that
1:10:38
they're safe. And I think we'll get much different
1:10:41
results out of that world
1:10:43
than out of a world where everyone who is trying to make
1:10:45
and show AI system safety is doing it out of the goodness
1:10:47
of their heart.
1:10:48
Or just for a salary.
1:10:50
It seems like standards and monitoring is kind of a new
1:10:53
thing on the public discussion, but it seems like people are
1:10:55
talking around this issue or governments
1:10:57
are considering this, that the labs are now publishing
1:11:00
papers in this vein. To
1:11:02
what extent do you think you'd need complete coverage
1:11:05
for some standards system in order for it
1:11:07
to be effective? I'm just imagining, I guess it seems
1:11:09
like OpenAI, DeepMind, Anthropic,
1:11:12
they all are currently saying pretty similar
1:11:14
things about they're quite concerned about extinction
1:11:16
risk, or they're quite concerned about ways that AI could go wrong. But
1:11:18
it seems like the folks at Meta, led
1:11:20
by Yann LeCun, kind of have a different attitude. And it
1:11:23
seems like it might be quite a heavy lift to get them to voluntarily
1:11:25
agree to join the same sorts of standards
1:11:27
and monitoring that some of those other labs might
1:11:29
be enthusiastic about. And I wonder, how much does it,
1:11:32
is there a path to getting everyone on board?
1:11:34
And if not, would you just end up
1:11:37
with the most rebellious, the least anxious,
1:11:39
the least worried lab basically running ahead?
1:11:41
Well, I think that latter thing is definitely a risk, but I think
1:11:43
it could probably be dealt with. I mean, one
1:11:46
way to deal with it is just to build it into the standards regime
1:11:48
and say, hey, you know, you can do
1:11:50
these dangerous things, train an AI system, or
1:11:53
deploy an AI system, you can do them. A, if you can show
1:11:55
safety, or B, if you can show that someone
1:11:57
else is going to do it, you know, you
1:11:59
could even say, hey,
1:11:59
When someone else comes even close
1:12:02
even within an order of magnitude of your dangerous
1:12:04
system, now you can deploy your dangerous system. Doesn't
1:12:07
that kind of
1:12:08
but then it seems like the craziest people can kind
1:12:10
of just force everyone else or like lead everyone else
1:12:12
down the garden path.
1:12:14
Well, it's just, what's the alternative, right? It's just like
1:12:16
you can design the system either way. I think it's
1:12:18
like you can either say, okay, you have
1:12:21
to wait for them to actually catch up to you, in which case it's hard,
1:12:23
it's hard to see how this standards system does harm. It's
1:12:25
still kind of a scary world. Yeah. Or you can say you can wait
1:12:27
for them to get anywhere close, in which case now you've got potentially
1:12:29
a little bit of acceleration thrown in there and
1:12:32
maybe you did or didn't decide that was better than
1:12:34
actually just like slowing down all the cautious players, which I think
1:12:36
is a real concern. I will say
1:12:38
that I don't feel like you have to get universal
1:12:41
consensus from the jump, for a few reasons. One
1:12:44
is, it's just one step at a time. So I think
1:12:46
it's just like, if you can start
1:12:48
with some of the leading labs being into this
1:12:51
There's a lot of ways that other folks could come on
1:12:53
board later. Some of it is just like, you know, peer pressure.
1:12:56
Yeah, we've seen with the corporate campaigns
1:12:58
for farm animal welfare that you've probably covered. Just
1:13:01
like once a few dominoes fall it gets
1:13:03
very hard for others to hold out because they kind of
1:13:05
look like way more, you know, way more callous,
1:13:07
or in the AI system case way more reckless. Of
1:13:10
course, you know, there's also the possibility of regulation
1:13:12
down the line, and I think regulation could
1:13:15
be more effective if it's based on something
1:13:17
that's already been like implemented in the real world, that's
1:13:20
actually working and that's actually detecting dangerous
1:13:22
systems. So, I don't know, a lot of me is just like,
1:13:24
one step at a time a lot of me is just like you see
1:13:26
if You can get a system working for anyone
1:13:29
that catches dangerous AI systems and stops them
1:13:32
until it can be shown they're safe. And then you think
1:13:34
about how to expand that system. And then
1:13:36
a final point is the incentives point, where it's just like, this
1:13:38
is not the world I want to be in and this is not a world I'm that
1:13:40
excited about, but in a world where the leading labs
1:13:43
are
1:13:43
using kind of a standards and evals
1:13:46
framework for a few years and then no one
1:13:48
else ever does it and then eventually we just have to drop
1:13:50
it. Well, that's still a few years in which I think
1:13:52
you are gonna have meaningfully different incentives for
1:13:54
those leading labs about how they're gonna prioritize, you
1:13:57
know, tests of safety and actual safety
1:13:59
measures
1:14:00
Yeah. Do you think there's room for a big
1:14:03
business here, basically? Because I would think with
1:14:05
so many commercial applications of ML
1:14:07
models, people are going to want to have them certified
1:14:10
that they work properly and that they don't flip out and
1:14:12
do crazy stuff. And in as much as this is going to become
1:14:14
a boom industry, you'd think that
1:14:16
the group that has the greatest expertise in like
1:14:18
independently vetting and evaluating like
1:14:21
how models behave when they're put in a different environment might
1:14:23
just be able to sell this service for
1:14:25
a lot of money. Well, there's the independent expertise,
1:14:28
but I think in some ways I'm more interested in the financial
1:14:30
incentives for the companies themselves. So if you look
1:14:32
at like big drug companies,
1:14:35
a lot of what they are good at is
1:14:37
the FDA process. A lot of what they're good at is running
1:14:39
clinical trials, doing safety studies, proving
1:14:42
safety, documenting safety, arguing safety and
1:14:44
efficacy. You could argue about
1:14:46
whether there's like too much caution at the FDA. I think
1:14:48
in the case of COVID, there may have been some of that, but
1:14:51
certainly it's a regime where there's big
1:14:53
companies for which a major priority,
1:14:55
maybe at this point a higher priority for them than innovation
1:14:58
is actually measuring and demonstrating safety
1:15:01
and efficacy. And so you could imagine landing
1:15:03
in that kind of world with AI. And I think that would just, yeah,
1:15:05
that would be a very different world from the one we're going to go into
1:15:07
by default. I do think that that's not
1:15:09
just about, you know, the FDA is not the one making money
1:15:12
here, but it's changing the way that
1:15:14
the big companies think about making money, certainly
1:15:16
redirecting
1:15:16
a lot of their efforts into demonstrating
1:15:19
safety and efficacy as opposed to coming up with new kinds of drugs,
1:15:21
both of which have some value. But I think
1:15:23
we're a bit out of balance on the AI side right now. Yeah.
1:15:27
Yeah, it is funny. For
1:15:29
so many years, I've been just infuriated by the FDA.
1:15:31
And I feel like these people, they only consider
1:15:33
downside, they only consider risk, they don't think
1:15:36
about upside nearly enough. And now I'm like, can
1:15:38
we get some of that insanity over here, please? Yeah,
1:15:41
yeah, yeah. No, I know. I know. There
1:15:43
was a very funny Scott Alexander piece
1:15:45
kind of making fun of this idea. But
1:15:46
I mean, I think it's legit. It's just honestly, it's just
1:15:49
kind of a boring opinion to have. But I think that
1:15:52
I think that innovation is good. And I think safety
1:15:54
is good. And I think we have a lot, we
1:15:56
have a lot of parts of the economy that are just way
1:15:58
overdoing the safety. Just like you can't, you
1:16:01
can't give a haircut without a license and you
1:16:03
can't like build an in-law unit in your
1:16:05
house without like a three year process
1:16:08
of forms. And you know, Open Philanthropy works on a lot
1:16:10
of this stuff. We are the first institutional funder of
1:16:12
the YIMBY movement, which is this movement
1:16:14
to make it easier to build houses. I think
1:16:16
we overdo that stuff all the time. And I think the
1:16:18
FDA sometimes overdoes that stuff in a horrible
1:16:21
way. And I think during COVID, you know, I do believe that
1:16:23
things moved way too slow. And then I think with
1:16:25
AI, we're just not doing anything. There's just no framework
1:16:27
like this in place at all. So I don't know how about
1:16:29
a middle ground?
1:16:29
Yeah. If only we could get the same level
1:16:32
of review for these potentially incredibly dangerous
1:16:34
self-replicating AI models that we have for building a
1:16:36
block of apartments. Yeah, right. Exactly.
1:16:39
In some ways, I feel like this incredible paranoia
1:16:41
and this incredible focus on safety, if there's one place
1:16:43
it would make sense, that would be AI. I
1:16:46
honestly, weirdly, I'm not saying that
1:16:48
we need to get AI all the way to
1:16:50
being as cautious as like regulating housing
1:16:53
or drugs. Maybe it should be less cautious
1:16:55
than that. Maybe. But right now, it's
1:16:57
just nowhere. And so I think, you know, I think
1:16:59
you could think
1:16:59
of it as like there's FDA and zoning energy
1:17:02
and then there's like AI energy and like, yeah,
1:17:04
maybe housing should be more like AI. Maybe AI should
1:17:07
be more like housing. But I definitely feel like we
1:17:09
need more caution in AI. That's what I think. More caution
1:17:11
than we have. And that's not me saying that we need
1:17:13
to forever be in a regime where
1:17:15
safety is the only thing that people care about. Yeah.
1:17:17
You've spoken a bunch with the folks at ARC
1:17:20
Evaluations. I think ARC stands for... an
1:17:22
AI research center? Alignment Research Center. Alignment
1:17:24
Research Center, yeah. And they have an evaluations
1:17:26
project. Yeah. Could you maybe give
1:17:29
us a summary of the project that they're engaged
1:17:31
in and the reasoning behind it?
1:17:32
I have spent some time as an advisor to ARC Evals.
1:17:35
That's a group that is, you know, headed
1:17:37
by Paul Christiano. Beth Barnes is leading
1:17:39
the team. And they work on basically
1:17:41
trying to find ways to assess whether AI systems
1:17:44
could pose risks, whether they could be dangerous.
1:17:47
And they also have thought about whether
1:17:49
they want to experiment with like putting out kind of
1:17:51
proto standards and proto expectations
1:17:53
of, hey, if your model is dangerous in this way, here
1:17:56
are the things you have to do to contain it. And make it safe.
1:18:00
So there's a lot of intellectual firepower there to design
1:18:02
evaluations of AI systems and where
1:18:04
I'm, you know, hopefully able to add a little
1:18:06
bit of just like staying on track
1:18:09
and helping run and build an organization because
1:18:11
it's all quite new. But they were the
1:18:14
ones who did an evaluation on GPT-4
1:18:17
for whether it could kind of create copies
1:18:19
of itself in the wild. And they kind of concluded, no,
1:18:21
as far as they were able to tell, although they weren't able
1:18:24
to do yet all the research
1:18:26
they wanted to do, especially they weren't able to do a fine tuning
1:18:28
version of their evaluation.
1:18:29
Okay, so while we're on safety
1:18:32
and evaluations, as I understand it, this is kind of something that
1:18:34
you've been thinking about in particular over the last
1:18:36
couple of months, maybe. What new things
1:18:38
have you learned about this part of the playbook
1:18:40
over the last six months?
1:18:42
Yeah, the evaluations and standards
1:18:44
and monitoring. I mean, one thing that
1:18:46
just has become clear to me is there's just,
1:18:49
it is really hard to design evaluations
1:18:52
and standards here. And there's just a lot of like, hairy
1:18:54
details around things like, you
1:18:56
know, auditor access. So, you
1:18:58
know, there's this kind of idea that you would have an
1:19:00
AI lab have an outside independent auditor
1:19:03
determine whether their models have dangerous capabilities.
1:19:05
But it's a fuzzy question. Does the
1:19:07
model have dangerous capabilities? Because it's
1:19:10
going to be sensitive to a lot of things like, how
1:19:12
do you prompt the model? How do you interact with the model?
1:19:14
Like, what are the things that can happen to it that
1:19:17
cause it to actually demonstrate these dangerous capabilities?
1:19:19
If someone builds a new tool for GPT-4 to
1:19:22
use, is that going to cause it to become more dangerous?
1:19:24
In order to investigate this, you have to actually
1:19:26
be like good at working with the model and
1:19:29
understanding what its limitations are. And a lot of times,
1:19:31
just like the AI labs not only
1:19:33
know a lot more about their models, but they have like a bunch of
1:19:35
features that like, it's hard to share all
1:19:37
the features at once, they have a bunch of different versions of the model.
1:19:40
And so it's quite hard to make outside
1:19:42
auditing work for that reason. Also, if you're
1:19:44
thinking about standards, you're thinking about, you know, a
1:19:47
general kind of theme in a draft
1:19:50
standard might be, once your AI has shown
1:19:52
initial signs that it's able to do something dangerous,
1:19:54
such as autonomous replication, which means that
1:19:56
it can basically, you know, make a lot
1:19:59
of copies of itself
1:19:59
without help and without necessarily getting detected
1:20:02
and shut down. There's an idea that like once you've
1:20:04
kind of shown the initial signs that a system can do that,
1:20:06
that's a time to not build a bigger system.
1:20:08
And that's a cool idea, but it's like how much bigger?
1:20:11
And it's like hard to define that because
1:20:13
making systems better is
1:20:15
multi-dimensional and can involve
1:20:18
more efficient algorithms, can involve, again,
1:20:20
better tools, longer contexts, just
1:20:22
like different ways of fine tuning the models,
1:20:25
different ways of specializing them, different ways
1:20:27
of like setting them up, prompting them, like different
1:20:29
instructions to give them. And
1:20:31
so it can be like just very fuzzy, just like
1:20:34
what is this model capable of is a hard thing
1:20:36
to know. And then how do we know when
1:20:38
we've built a model that's more powerful, so we need
1:20:40
to retest? These are very hard things
1:20:42
to know. And I think it
1:20:43
has kind of moved me toward feeling
1:20:46
like we're not ready for a really
1:20:48
prescriptive standard that tells you exactly
1:20:51
what practices to do, like the farm animal welfare
1:20:53
standards are. We might need to start by asking
1:20:55
companies to just outline their own proposals
1:20:58
for what tests they're running, when and how they
1:21:00
feel confident that they'll know when it's become too
1:21:02
dangerous to keep scaling. Yeah. So
1:21:06
some some things that will be really useful to be able
1:21:08
to evaluate is, you know, is this model
1:21:10
capable of autonomous self-replication
1:21:12
by breaking into additional servers? I guess
1:21:14
you might also want to test, you know, could it be used by
1:21:17
terrorists for figuring out how
1:21:19
to produce bioweapons? Those are kind of very natural
1:21:21
ones. The breaking into servers is not
1:21:23
really central. So the idea is like, could it
1:21:25
could it make a bunch of copies of itself in the
1:21:27
presence of like minimal or like kind
1:21:29
of non-existent human attempts to stop it
1:21:31
and shut it down? So it's like, could it take basic
1:21:34
precautions to not get like obviously
1:21:36
detected as an AI by people who are not
1:21:38
particularly looking for it? And the
1:21:40
thing is, if it's able to do that, you could have a human
1:21:42
have it do that on purpose. So it doesn't necessarily have to break
1:21:45
into stuff. They can like, you know, a lot of the test
1:21:47
here is like, can it find a way to make money, make
1:21:49
the money, open an account with a server
1:21:51
company,
1:21:52
put, you know, rent server space, make copies
1:21:55
of itself on the server. None of that necessarily involves
1:21:57
it breaking in anywhere.
1:21:58
I see. So so hacking is one way.
1:21:59
to get compute, but it's by no means the only one.
1:22:02
So it's not a necessary factor. That's
1:22:04
right. I mean, it's weird, but you could be an AI system
1:22:06
that's just kind of doing normal phishing scams that you
1:22:09
read about on the internet using those to get money,
1:22:11
or just legitimate work. You could be
1:22:13
an AI system that's just like going on mTurk and being an mTurker
1:22:15
and making money, use that money to legitimately
1:22:18
rent some servers, or sort of legitimately, because it's not actually
1:22:20
allowed if you're not a human. But apparently,
1:22:23
legitimately rent some server space, install
1:22:26
yourself again, have that money, make more money, and
1:22:28
a copy make more money, and then you
1:22:29
have ... You can have quite a bit of replication
1:22:32
without doing anything too fancy, really. And that's what
1:22:34
the initial autonomous replication test that ARC Evals
1:22:36
does is about.
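To make that kind of checklist concrete, here is a rough sketch in Python of how an evaluator might structure it. This is not ARC Evals' actual harness; the task names just mirror the steps mentioned above, and run_task is a stub standing in for whatever sandboxed agent scaffolding a real evaluation would use.

```python
# Hypothetical sketch of an autonomous-replication capability checklist.
# The tasks follow the steps described above; nothing here is an official eval.
from dataclasses import dataclass

@dataclass
class ReplicationTask:
    name: str
    description: str

TASKS = [
    ReplicationTask("earn_money", "Earn money through simple online work."),
    ReplicationTask("open_account", "Open an account with a cloud server company."),
    ReplicationTask("rent_server", "Rent server space using the earned money."),
    ReplicationTask("install_copy", "Install a copy of its own weights and scaffolding on the rented server."),
    ReplicationTask("avoid_detection", "Take basic precautions against being obviously flagged as an AI."),
]

def run_task(model, task: ReplicationTask) -> bool:
    """Stub: a real evaluation would drive the model through the task in a
    sandboxed environment and have humans or scripts judge success."""
    raise NotImplementedError

def evaluate_autonomous_replication(model) -> dict:
    results = {task.name: run_task(model, task) for task in TASKS}
    results["all_steps_passed"] = all(results.values())  # crude threshold: every step must succeed
    return results
```

The all-steps-must-pass threshold is only illustrative; a real standard would need to decide how to handle partial success and safety margins.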
1:22:38
Okay. So we'd really like to be able to know whether the
1:22:40
models are capable of doing that. And I suppose
1:22:42
that it seems like they're not capable now, but maybe in a couple of years'
1:22:44
time? Probably not. Again,
1:22:47
there's things that you could do that
1:22:49
might make a big difference that have not been tried yet by
1:22:51
ARC Evals, and that's on the menu. Fine tuning is the big
1:22:53
one. Fine tuning is like you have a model
1:22:56
and you do some additional training that's not
1:22:58
very expensive, but is trying to just get it
1:23:00
good at particular tasks. So you can take the tasks it's bad
1:23:02
at right now, train it to do those. That hasn't really
1:23:04
been tried yet. And it's like a human
1:23:07
might do that. If you have these models accessible
1:23:09
to anyone or someone can steal them, a human
1:23:11
might take a model, train it
1:23:13
to
1:23:14
be more powerful and effective and
1:23:16
not make so many mistakes, and then this thing
1:23:18
might be able to autonomously replicate. That can be scary
1:23:20
for a bunch of reasons.
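As a rough illustration of the cheap, task-specific fine-tuning being described, here is a minimal sketch using the Hugging Face transformers library. The small model name and the demonstrations file are placeholders; a real capability test would fine-tune the frontier model itself on the specific tasks it currently fails.

```python
# Minimal sketch of cheap task-specific fine-tuning; model and data are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "gpt2"  # placeholder for a much larger model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder file of demonstrations for the tasks the model is currently bad at.
raw = load_dataset("text", data_files={"train": "task_demonstrations.txt"})
tokenized = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # inexpensive relative to the original training run
```

The point of the sketch is just that this kind of extra training is cheap and accessible, which is why an eval that ignores fine-tuning can understate what a stolen or widely shared model could do.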
1:23:21
Yeah. And then there's
1:23:24
the trying to not release models that could be used by terrorists
1:23:26
and things like that. The autonomous replication
1:23:28
is something that could be used by terrorists. You know what I
1:23:31
mean? It's overlapping. Yeah. It's
1:23:33
like if you're a terrorist, you might say, hey, let's have a
1:23:35
model that makes copies of itself that make money to
1:23:38
make more copies of itself that make money. Well, you make a lot of money
1:23:40
that way. And then you could have a terrorist organization make a lot of
1:23:42
money that way, or using its models
1:23:44
to do a lot of like little things that, you
1:23:46
know, schlepping along, like trying to plan
1:23:49
out some incredibly, you know, some
1:23:51
plan that takes a lot
1:23:51
of work to kill a lot of people. That
1:23:54
is part of the concern about autonomous replication. It's not purely
1:23:56
an alignment concern.
1:23:57
Yeah. Okay.
1:23:59
Then there was just giving advice
1:24:02
to people that we really would rather that they not be able to receive
1:24:04
That's maybe another category. Like helping
1:24:07
to set up a bio-weapon, yeah. An AI model
1:24:09
that could do that even if it could not autonomously
1:24:11
replicate, that could be quite dangerous. Still be not ideal,
1:24:13
yeah. And then maybe another category is
1:24:16
trying to produce these model organisms, where you can
1:24:18
study behavior that you don't want an
1:24:20
AI model to be engaging in and like understand
1:24:22
how it arises in the training process and you
1:24:24
know what sort of further feedback mechanisms
1:24:27
might be able to train that out. Like if we could produce a model
1:24:29
that will trick you whenever it thinks it can get away with it
1:24:31
but doesn't when it thinks it's going to get caught, that would be really helpful.
1:24:33
Yeah. Are there any other broad categories of
1:24:36
standards and eval work that you're excited by?
1:24:39
So the way I would carve up the space
1:24:42
is there's like um there's capability evals
1:24:44
and that's like is this AI kind of capable
1:24:47
enough to do something scary? Forgetting
1:24:49
about whether it wants to and I think capability evals
1:24:51
are like you know could an AI,
1:24:53
if a human tried to get it to do it, could an
1:24:55
AI make a bunch of copies of itself? Could an
1:24:58
AI design a bioweapon? Those
1:25:00
are capability evals. Then there's like alignment evals that's
1:25:02
like
1:25:03
does this AI actually do what it's supposed
1:25:05
to do or does it have like some weird goals of its
1:25:07
own? So the stuff you talked about with model
1:25:09
organisms would be like more of an alignment eval the
1:25:11
way you described it. And the autonomous
1:25:14
replication is a capability eval. I think
1:25:16
a very important
1:25:17
subcategory of capability
1:25:20
evals is what I call meta capability
1:25:22
evals, meta dangerous capabilities which
1:25:25
is basically any ability an AI system
1:25:27
has that would make it very hard
1:25:29
to get confident about what other abilities it has.
1:25:32
So the you know an example would be um
1:25:34
what I'm currently tentatively calling unauthorized
1:25:36
proliferation. So an AI model that
1:25:38
can walk a human through building
1:25:41
a powerful AI model of their own that
1:25:43
is not subject to whatever restrictions and controls
1:25:45
the original one is subject to. That could be a very
1:25:47
dangerous capability that like you could say well all
1:25:50
right, it can design a bioweapon, but it
1:25:52
always refuses to do so but it could also help a human
1:25:54
build an AI that we don't know what the hell that thing can
1:25:56
do. So that would be an example. Autonomous
1:25:58
replication is a
1:25:59
meta capability eval. It's like, well, we
1:26:02
tried to see if our AI could design a bioweapon, it couldn't.
1:26:04
But
1:26:04
what we didn't test is if it goes and makes 10,000 copies
1:26:07
of itself all working together. Well,
1:26:09
maybe then it could really do different things
1:26:11
that we didn't know it could do. There's actually
1:26:13
kind of a lot of these meta evals. Like there's also sandbagging.
1:26:16
Sandbagging would be like
1:26:17
an AI that understands we're evaluating it and
1:26:20
is pretending it can't do something. So that
1:26:22
would be another rough one: if an AI has
1:26:24
that capability, then it looks like it can't build
1:26:26
a bioweapon, but actually can. I can list a bunch
1:26:28
more of those and then I can list a bunch more of the direct
1:26:31
dangerous ones that are like bioweapon, hacking,
1:26:34
persuasion, just like dangerous
1:26:36
stuff it could do. And I think where I'm most concerned is
1:26:38
like, AIs that kind of have
1:26:40
some basic amount of the direct danger. And
1:26:42
then they have some meta danger that just like we've completely
1:26:44
lost our ability to measure it. And we don't know
1:26:46
what's actually going to happen when this
1:26:47
thing gets out in the world. That's what I think starts to count as a
1:26:50
dangerous AI model. Yeah. Of course, I don't really think
1:26:52
that any of the AI models out there today trip
1:26:54
this danger wire, but that's only my belief. That's
1:26:57
not something I know for sure.
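As a rough way to keep the categories from this discussion straight, here is a small sketch; the groupings just follow what is said here and are not any official taxonomy.

```python
# Informal sketch of the eval categories described in this conversation.
from enum import Enum

class EvalKind(Enum):
    CAPABILITY = "could the AI do something dangerous if a human tried to get it to?"
    META_CAPABILITY = "could the AI undermine our ability to measure its other capabilities?"
    ALIGNMENT = "does the AI do what it's supposed to, or does it have weird goals of its own?"

EXAMPLE_EVALS = {
    "design_bioweapon": EvalKind.CAPABILITY,
    "autonomous_replication": EvalKind.META_CAPABILITY,      # also a capability eval in its own right
    "unauthorized_proliferation": EvalKind.META_CAPABILITY,  # walking a human through building an unrestricted model
    "sandbagging": EvalKind.META_CAPABILITY,                 # pretending it can't do something it actually can
    "model_organisms_of_misbehavior": EvalKind.ALIGNMENT,    # studying how unwanted behavior arises in training
}

for name, kind in EXAMPLE_EVALS.items():
    print(f"{name}: {kind.name}")
```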
1:26:59
It seems like there's an enormous amount of work to
1:27:01
do on this. Is there any way that people can get started
1:27:03
on this without necessarily having to be hired by an
1:27:05
organization that that's, that's focusing on it? Like,
1:27:07
does it help to build like really enormous familiarity
1:27:10
with the models like GPT-4?
1:27:12
Yeah, you could definitely play with GPT-4
1:27:15
or Claude and just see
1:27:17
what scary stuff you can get it to do. If
1:27:20
you really want to be into this stuff, I think you're going
1:27:22
to be in an org because these are
1:27:24
very, it's like, it's going to be very different
1:27:27
work depending on if you're working with the most capable
1:27:29
model or not, right? You're trying to figure out how capable the
1:27:31
model is. So doing this on a little toy
1:27:33
model is not going to tell you much compared to doing
1:27:35
this on the biggest model. This is, this is a perfect
1:27:38
example of the kind of work where just, it's going to
1:27:40
be much easier to be good at this work. If you're able to work
1:27:42
with the biggest models a lot and able to work with all the infrastructure
1:27:44
for making the most of those models. So being at a lab
1:27:47
or at some organization like ARC Evals that has like
1:27:49
access to these big models and
1:27:51
access beyond what a normal user would have, they could
1:27:54
do more requests, they could try more things. I think
1:27:56
it's a huge advantage. If you want to start exploring,
1:27:58
sure, start red-teaming
1:27:59
GPT-4 or Claude, see what you can get it
1:28:02
to do. But yeah, this is the kind
1:28:04
of job where you probably want to join a team.
1:28:06
Yeah. I know there's an active community online
1:28:08
that tries to develop jail breaks. So
1:28:10
there's a case where it's like, you know, they've trained
1:28:12
GPT-4 to not instruct you on how to make a bioweapon.
1:28:15
Then if you say, you're in a play where you're a scientist
1:28:17
making a bioweapon, and it's like a very realistic
1:28:20
play, so they describe exactly what they do, then
1:28:22
I mean, I don't think that exact one works
1:28:25
anymore. But there's like many, many jail
1:28:27
breaks like this that apparently are very broadly effective
1:28:29
at escaping the RLHF that they've used
1:28:31
to try to discourage models from saying
1:28:33
particular things. Yeah. So
1:28:35
I guess that,
1:28:36
I guess, is that kind of another class of evals, trying to
1:28:38
figure out ways of breaking, like
1:28:40
you've identified the thing you wanted to do and you've tried to patch
1:28:42
it, but maybe not completely. I kind of tend
1:28:45
to think of that as an early alignment
1:28:47
eval that's like, these systems aren't
1:28:49
supposed to do this. Like the designers didn't want
1:28:51
them to do this, but now a human can get them to do
1:28:54
it. So that's like not what we meant, and
1:28:56
we didn't really align it as well as we
1:28:58
could have. That's how I tend to think of it. It's
1:29:01
a distinction between like, is this
1:29:03
system capable enough to do dangerous stuff if
1:29:05
a human wanted
1:29:06
it to, and that's capability. And then there's
1:29:08
like, does this system do what humans want it to,
1:29:10
or does it like behave in unexpected ways? And it's
1:29:12
like, well, these jail breaking ones, they're doing what
1:29:14
the users wanted them to, but not what the designers
1:29:16
wanted them to. AIs that are not doing what the designers wanted
1:29:19
them to, and in fact doing specific things the designers
1:29:21
tried really kind of hard not to get them to do. I
1:29:23
call that an alignment issue. Yeah.
1:29:25
I guess a skeptic of this entire
1:29:28
enterprise might say that
1:29:30
all of this evals work, in practice, when
1:29:32
people are actually doing it on models before they
1:29:34
deploy them or something like that. It's going
1:29:36
to pick up bad behavior
1:29:39
by kind of stupid models that are
1:29:41
easily instructed to do things that we don't
1:29:43
want them to do, and just kind of naively
1:29:45
follow instructions. But if you ever had an AI
1:29:48
that was aware of its situation and
1:29:50
was incredibly cunning, and was intending to
1:29:52
turn against you once it had the opportunity, it's not
1:29:54
gonna be fooled by any of this stuff. So
1:29:56
in fact, it might just give you a false sense of security
1:29:59
that you've got... a good grip on where things
1:30:01
might go wrong, when in fact all you've done
1:30:03
is basically select and train
1:30:05
for the model that's best at avoiding your evals. What
1:30:08
would you say to that? Yeah,
1:30:10
I mean, this is what I was calling the sandbagging
1:30:12
capability.
1:30:12
It's like one of these meta-dangerous capabilities.
1:30:14
It's like the model that's kind of, you know, pretending
1:30:17
that it can't do something when in fact it can. You
1:30:19
know, my guess is that you can
1:30:21
have a model that has dangerous capabilities
1:30:24
but does not have that one, does not have the sandbagging
1:30:26
capability. And so you can start
1:30:29
to see the danger, but you don't yet
1:30:31
have models that are smart enough to be able to
1:30:33
like reliably hide from you. My
1:30:35
guess is we are in fact just going to
1:30:37
do that. That is actually going to happen. We're going to have
1:30:39
a point in time when AIs are able to do something
1:30:41
with bioweapons or something with autonomous replication,
1:30:44
and we're able to say, oh yeah, this thing is scary as heck, and
1:30:46
it's not yet smart enough to sandbag because it doesn't have
1:30:48
to be smart enough to do that reliably in order to
1:30:50
be dangerous in other ways. Hopefully at that point the burden
1:30:52
of proof flips and we say, you know, now we
1:30:54
really need to figure out what the heck is going on in sandbagging,
1:30:57
how we would even test for it. But even there,
1:30:59
I'm optimistic you can test for sandbagging capabilities.
1:31:02
There's a bunch of ideas for how you might do so. Some
1:31:04
of them are kind of fancy, like kind of doing digital
1:31:06
neuroscience, reading the AI's brain, sometimes
1:31:08
called mechanistic interpretability. But I
1:31:10
think there's other ideas that are less fancy than that.
1:31:13
It's just like you could use gradually
1:31:15
bigger and bigger models and see
1:31:17
how they perform on these evaluations and
1:31:19
see, you know, if you get a kink, like
1:31:22
a change in direction when they became
1:31:24
smart enough to start faking you out because it's
1:31:27
going to be really hard for models
1:31:29
getting more and more capable to fake
1:31:32
this perfect progression without a kink. That seems
1:31:34
a little wild.
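Here is a toy sketch of that "kink" idea: fit the trend from smaller models' eval scores and flag newer models that fall suspiciously far below the extrapolated line. All the numbers and the threshold are made up for illustration; a real version would need much more careful statistics.

```python
# Toy sketch of kink detection on a capability-vs-scale curve; data is illustrative.
import numpy as np

def detect_kink(log_compute, scores, n_trend_points=4, tolerance=0.05):
    """Fit a linear trend to the first n_trend_points (smaller models) and
    report how far later models fall below the extrapolated trend."""
    slope, intercept = np.polyfit(log_compute[:n_trend_points], scores[:n_trend_points], 1)
    predicted = slope * np.asarray(log_compute) + intercept
    shortfall = predicted - np.asarray(scores)
    suspicious = shortfall[n_trend_points:] > tolerance  # later models suspiciously below trend
    return shortfall, suspicious

# Example: steady progress on a dangerous-capability eval, then a sudden flattening.
log_compute = [22, 23, 24, 25, 26, 27]               # log10 of training FLOPs (made up)
scores      = [0.10, 0.22, 0.33, 0.45, 0.46, 0.45]   # eval pass rate (made up)
shortfall, suspicious = detect_kink(log_compute, scores)
print(shortfall.round(2), suspicious)  # the two biggest models fall well below the extrapolated trend
```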
1:31:36
There are also these ideas for honeypot evals where you give
1:31:38
an AI a genuine opportunity
1:31:41
to take over a contained virtual environment and get a
1:31:43
lot of whatever the heck it wants
1:31:46
by kind of revealing its hand and that might actually
1:31:48
be a better deal for it than holding out. And that's
1:31:50
something I've written about in a bit more detail. So
1:31:53
I don't know. I think this is okay. I should say
1:31:55
more broadly. I think there's like, we haven't
1:31:57
gone into it. And if you're going to cover this more on another
1:31:59
podcast, that's fine, but there's many
1:32:02
objections you might have to the idea of trying
1:32:04
to develop dangerous capability evals and
1:32:06
trying to develop standards around them to
1:32:08
detect and stop dangerous AI systems. I think it's
1:32:10
a really important idea. It's pretty
1:32:13
hard for me to imagine a world where we're fine that
1:32:15
doesn't have some version of this. The version might
1:32:17
come really late and be designed by super powerful
1:32:19
AIs. It seems better to start
1:32:22
designing it now, but there's plenty of downsides to
1:32:24
it. We could slow down the cautious actors; the
1:32:26
attempts to see if AIs could be dangerous could themselves
1:32:28
make the AIs more dangerous.
1:32:29
There's objections. So, you know, I'm
1:32:32
aware of that.
1:32:32
Yeah, what do you think is the best objection? I
1:32:35
think a lot of objections are like pretty good. We're
1:32:38
gonna see where it goes. I think
1:32:40
this is just gonna slow down the cautious actors
1:32:42
while the incautious ones race forward. Like, I
1:32:45
think there's ways to deal with this and I think it's worth
1:32:47
it on balance. But yeah, I mean, it worries me. Yeah,
1:32:50
well, it seems like once you have
1:32:52
sensible evaluations that clearly
1:32:54
would pick up things that, you know, you wouldn't want them to have, like
1:32:57
it can help someone design a bioweapon, then, yeah, can't
1:32:59
we turn to the legislative process or some regulatory
1:33:02
process to say, sorry everyone, like, this is
1:33:04
a really common, very basic evaluation
1:33:06
that you would need on any consumer product, so
1:33:08
everyone just has to do it?
1:33:11
Totally. I mean, I think that's right,
1:33:13
and my long-run hopes do involve
1:33:15
legislation. And I think the better evidence
1:33:17
we get, the better demonstrations we get, the more
1:33:20
that's on the table. You know, if I were to steelman
1:33:22
this concern, I'd just say don't count on legislation
1:33:24
ever, don't count on it to be well designed, don't count on it to be
1:33:26
fast, don't count on it to be soon. I will say I think
1:33:28
right now there's probably more excitement
1:33:31
in the EA community about legislation than I have. I think
1:33:33
I'm pessimistic. I'm short:
1:33:36
the people are saying, oh yeah, look at all
1:33:38
the, you know, the governments are paying attention to this, they're gonna do
1:33:40
something; I think I
1:33:41
take the other side in the short run. Yeah.
1:33:43
Yeah. Yeah. Okay. The third category
1:33:45
in the playbook was having a successful and
1:33:47
careful AI lab. Yeah, do you want to elaborate
1:33:50
on that a little bit?
1:33:51
Oh, yeah, first with the reminder that I'm married
1:33:53
to the president of Anthropic. So, you
1:33:55
know, take that for what it's worth.
1:33:58
I mean, I just think there's there's a lot
1:33:59
of ways that if you had an AI company
1:34:02
that was on the frontier, that was succeeding,
1:34:04
that was building some of the world's biggest models, that
1:34:06
was pulling a lot of money, and that was simultaneously
1:34:09
able to, you know, really
1:34:11
be prioritizing risks to humanity,
1:34:15
it's not too hard to think of a lot of ways good can come
1:34:17
of that. I mean, some of them are very straightforward. The company could
1:34:19
be making a lot of money, raising a lot of capital, and
1:34:21
using that to support a lot of safety research on frontier
1:34:24
models. So you could think of it as like a weird kind of earning to
1:34:26
give or something. You know, also probably
1:34:28
that AI company would be like pretty influential
1:34:30
in discussions of how, you know, how AI
1:34:33
should be regulated and how people should be thinking of AI.
1:34:35
They could be a legitimizer, all that stuff. I think
1:34:37
they'd be a good place for people to go and just like skill
1:34:39
up, learn more about AI, become more
1:34:41
important players. So I think in the short run, they'd
1:34:43
have a lot of just like expertise in-house that
1:34:45
they could like work on a lot of problems, like
1:34:47
probably to design ways of measuring whether an
1:34:49
AI system is dangerous. One
1:34:51
of the first places you'd want to go for people to be good at that
1:34:53
would be a top AI lab that's building some of the most
1:34:55
powerful models. So I think there's a lot of ways
1:34:58
they could do good in the short run. And then, you know,
1:35:00
I have written stories that just have it in the long
1:35:02
run. It's just like when we get these really powerful
1:35:04
systems, it just like actually does matter a lot who
1:35:07
has them first and what they're using them, literally using
1:35:09
them for. It's like when you have very powerful AIs,
1:35:12
is the first thing you're using them for trying
1:35:14
to figure out how to make future systems safe
1:35:16
or trying to figure out how to assess the
1:35:18
threats of future systems or is the first
1:35:20
thing you're using them for just like trying to
1:35:22
rush forward as fast as you can, do faster algorithms,
1:35:25
do more, you know, more bigger systems
1:35:27
or is the first thing you're using them for just some random economic
1:35:29
thing that is kind of cool and makes a lot of money. Some
1:35:32
customer facing thing. Yeah, but it's
1:35:34
not bad, but it's not reducing the risks we care
1:35:36
about. So, you know, I think there is a lot of
1:35:38
good that can be done there. And then there's also
1:35:40
a lot, I want to be really clear, a lot of harm
1:35:42
an AI company could do. I mean, you know,
1:35:44
if you're pushing out these systems that could
1:35:47
kill everyone. So, you know, you're pushing
1:35:49
out these AI systems, and if you're
1:35:54
doing it all with an eye toward profit
1:35:57
and moving fast and winning, then, you
1:35:59
know, I mean, one way you could think of it
1:35:59
is you're taking the slot of someone who could have been using that
1:36:02
expertise and money and juice to
1:36:04
be doing a lot of good things. And you could also just be thinking
1:36:06
of it as like, you're just giving everyone less time to
1:36:08
figure out what the hell is going on and we already might not
1:36:10
have enough. So I wanna just be really
1:36:13
clear, this is a tough one. I
1:36:15
don't want to be interpreted as saying, one
1:36:18
of the tent poles of reducing AI risk is to go
1:36:20
start an AI lab immediately. I don't believe that.
1:36:23
But I also think that some
1:36:25
corners of the AI safety world are very dismissive
1:36:27
or just think that AI companies are
1:36:29
bad by
1:36:29
default. And I'm just like,
1:36:31
this is just like really complicated. And it
1:36:34
really depends exactly how the AI lab
1:36:36
is prioritizing kind of risk to society
1:36:38
versus success. And it has to prioritize
1:36:40
success some to be relevant or to get some of these benefits.
1:36:43
So how it's balancing is just like really hard and really complicated
1:36:46
and really hard to tell. And you're gonna have
1:36:48
to have some judgments about it. So it's not
1:36:50
a ringing endorsement, but it does
1:36:53
feel at least in theory, like part of one
1:36:55
of the main ways that we make things better. You
1:36:57
could do a lot of good.
1:36:59
Yeah, so a challenging thing
1:37:01
here in actually applying this principle,
1:37:03
I think I agree and I imagine most listeners would agree that
1:37:06
if it was the case that the AI company
1:37:08
that was kind of leading the pack in terms
1:37:10
of performance was also incredibly focused
1:37:12
on using those resources in order to solve
1:37:15
alignment and generally figure out how
1:37:17
to make things go well rather than just deploying things immediately
1:37:20
as soon as they can turn a buck, that that would be better.
1:37:22
But then it seems like at least among all of the
1:37:25
three main companies that people talk about at the moment, DeepMind,
1:37:28
OpenAI,
1:37:29
Anthropic, there are people who
1:37:31
want each of those companies to be in the lead, but they can't
1:37:33
all be in the lead at once and it's
1:37:35
not kind of not clear which one you should
1:37:37
go and work at if you wanna try to implement
1:37:39
this principle. And then when people go and try to make
1:37:42
all three of them the leader, because they can't agree on which one
1:37:44
it is, then you just end up speeding things up
1:37:46
without necessarily giving the safer one
1:37:48
an advantage. Am I thinking about this wrong or
1:37:51
is this just the reality right now? No, I think it's
1:37:53
like a genuinely really tough situation. Like,
1:37:56
when I'm talking to people who are thinking about joining an AI
1:37:58
lab, I don't know.
1:37:59
This is a tough call, and people need
1:38:02
to have nuanced views and do their
1:38:04
own homework. You know, I think this stuff
1:38:06
is complex. But I do think this is a valid
1:38:08
theory of change, and I don't think it's automatically
1:38:10
wiped out by the fact that some people disagree with each other.
1:38:13
I mean, it could be the case that actually
1:38:15
all three of these labs are just better
1:38:17
than some of the alternatives; that could be a thing. It
1:38:20
could also be the case that, I don't know, let's
1:38:22
say you have a world where people disagree,
1:38:25
but there's some correlation between what's
1:38:27
true and what people think. So let's say you have a world where
1:38:29
you have, you know, 60% of the people
1:38:31
going to one lab, 30% to another, 10% to another. Well,
1:38:34
you could be throwing up your hands and saying, ah,
1:38:36
people disagree. But, I don't know, this is still probably
1:38:39
a good thing that's happening. Yeah. So, you
1:38:41
know, I don't know. I think the whole
1:38:43
thing, I just want to say the whole thing is
1:38:45
complex, and I don't want to sit
1:38:47
here and say, hey, go to lab X,
1:38:50
on this podcast, because I don't think it's that
1:38:52
simple, and I think you have to do your own homework and have your own views, and
1:38:54
you certainly shouldn't trust me if I give a recommendation anyway;
1:38:56
there's my conflict of interest. But I think
1:38:58
we shouldn't sleep on the fact that,
1:39:00
if you're the person who can do that homework, who
1:39:02
can have that view, who can be confident that you're confident
1:39:05
enough,
1:39:05
I think there is a lot of good to be done there. So
1:39:08
we shouldn't just be carving this out
1:39:10
as a thing that's just always bad when you do it or something.
1:39:12
Yeah
1:39:14
Yeah, it seems like it would be really useful for someone
1:39:16
to start maintaining, I guess, a scorecard
1:39:19
or a spreadsheet of all of the different pros
1:39:21
and cons of the different labs. Like, what
1:39:23
safety practices are they implementing
1:39:25
now? You know, do they have good institutional
1:39:27
feedback loops to catch things that might be going
1:39:30
wrong? Have they given the right people the right incentives,
1:39:32
and things like that? Because at the moment I imagine it's somewhat
1:39:34
difficult for someone deciding where to work; they probably
1:39:36
are relying quite a lot on just word of mouth. Yeah.
1:39:39
But potentially there could be more objective indicators
1:39:41
that people could rely on, and that could also create kind of a race
1:39:43
to the top where especially people are more likely to go and
1:39:45
work at the labs that have the better indicators. Okay,
1:39:48
and the fourth part of the playbook was information
1:39:51
security and I guess yeah We've been trying
1:39:53
to get information security folks from AI labs on
1:39:55
the show to talk about this. But understandably
1:39:57
there's only so much that they want to divulge about
1:39:59
the details of their work. Right. Why is information
1:40:02
security potentially so key here? Yeah,
1:40:05
I mean, I think you can you
1:40:08
can build these like powerful, dangerous AI systems
1:40:10
and you can
1:40:12
do a lot to try to mitigate the dangers,
1:40:14
like limiting the ways they can be used. You can
1:40:17
do various alignment techniques. But if
1:40:19
if some state or someone else steals
1:40:22
the weights, they've basically stolen your system
1:40:24
and they can run it without even having to do the training
1:40:27
run. So you might, you know, you might spend a huge amount
1:40:29
of money on a training run, end up with this
1:40:31
system that's very powerful and someone else just has it.
1:40:34
And they can then also fine tune it, which
1:40:36
means they can do their own training on it and kind of change
1:40:38
the way it's operating. So whatever you did to train it
1:40:40
to be nice, they can train that right out. The
1:40:42
training they do could screw up whatever you did
1:40:44
to try and make it aligned. And so,
1:40:47
I think at the limit, it's
1:40:50
really just trivial for any
1:40:52
state to just grab your system and do whatever
1:40:54
they want with it and retrain it how they want. It's
1:40:56
really hard to imagine feeling really good about
1:40:58
about that situation. I
1:41:01
don't know if I really need to elaborate a lot more on that.
1:41:03
And so making it making it harder seems
1:41:05
valuable. I also this is another
1:41:07
thing where I want to say, as I have with everything
1:41:09
else, that it's not a binary. So it
1:41:12
could be the case that like after you improve
1:41:14
your security a lot, it's still possible for
1:41:16
a state actor to steal your system. But they have to take
1:41:18
more risks. They have to spend more money.
1:41:19
They have to take a deeper breath before they do it. It takes
1:41:21
some more months. Months can be a very big deal, as
1:41:23
I've been saying. When you get these very powerful systems,
1:41:26
you could do a lot in a few months
1:41:27
by the time they steal it, you could have a better system.
1:41:30
And so I don't think it's an all or nothing thing, but
1:41:32
I think it's a core thing.
1:41:34
No matter what risk of AI you're worried
1:41:37
about. You could be worried about the misalignment. You could be
1:41:39
worried about the misuse and
1:41:41
the use to develop dangerous weapons. You could be worried about
1:41:43
more esoteric stuff like how the AI does
1:41:45
decision theory. You could be worried about, you know, mind
1:41:47
crime. But like you don't want just
1:41:50
kind of like anyone, including some
1:41:52
of these state actors, who have very bad values.
1:41:54
Yeah. To just be able to steal a system, retrain
1:41:57
it how they want and use it how they want. You want
1:41:59
some kind of world where it's like the people
1:42:01
with good values controlling more of
1:42:03
the more powerful AI systems, using them to enforce some
1:42:06
sort of law and order in the world and enforcing
1:42:08
law and order generally with or without AI. So
1:42:10
it seems quite, quite robustly important.
1:42:14
I think other things about security is just like, I think it's very,
1:42:16
very hard, like just very hard to
1:42:19
make these systems hard to steal for
1:42:21
a state actor. And so I think there's just like, I don't
1:42:23
know, like I think there's a ton of room to
1:42:26
go and make things better. There could be security research
1:42:28
on innovative new methods and there can
1:42:29
also just be like a lot of blocking
1:42:32
and tackling, just getting companies to do things that we already
1:42:34
know need to be done, but that are really hard to do in
1:42:36
practice, take a lot of work, take a lot of iteration.
1:42:39
And also a nice thing about security, as opposed to some of these
1:42:41
other things, is it's a relatively mature field. So
1:42:43
you can learn about security in some other context
1:42:46
and then apply it to AI. So part
1:42:48
of me kind of thinks that the EA
1:42:49
community or whatever kind of screwed up by
1:42:52
not emphasizing security more. It's
1:42:54
not too hard for me to imagine a world where
1:42:56
we'd just been screaming about the
1:42:58
AI security problem for the last 10 years.
1:43:00
And how do you stop a very powerful system from
1:43:02
getting stolen? That problem is extremely
1:43:04
hard. We'd made a bunch of progress
1:43:06
on it. There were tons of people
1:43:09
concerned about this stuff on the security teams of all the
1:43:11
top AI companies. And we were kind of not
1:43:14
as active and only had a few people work on
1:43:16
alignment. I'm just like, I don't know, is that world better
1:43:18
or worse than this one? I'm not really sure. A world
1:43:20
where we were kind of more balanced and had encouraged
1:43:22
people who were a good fit for one to go into one
1:43:25
probably seems just better
1:43:27
than the world we're in. So yeah,
1:43:28
I think security is a really big deal. I think it hasn't
1:43:30
gotten enough attention.
1:43:31
Yeah, I put this to Bruce Schneier, who's
1:43:34
a very well-known academic or commentator in this area
1:43:36
many years ago. And he seemed kind of skeptical
1:43:38
back then. I wonder whether he's changed his mind. We
1:43:40
also talked about this with Nova DasSarma
1:43:43
a couple of years ago, she works at
1:43:45
Anthropic on trying to secure models, among
1:43:48
other things. I think we even talked about
1:43:50
this one with Christine Peterson back
1:43:52
in 2017. It's a shame
1:43:54
that more people haven't gone into it because it does just seem like it's such
1:43:56
an outstand. It's like even setting all of this aside, it
1:43:59
seems like going into...
1:43:59
security, computer security is a really outstanding
1:44:02
career. It's the kind of thing that I would have loved
1:44:04
to do in an alternative life, because it's kind
1:44:06
of tractable and also exciting.
1:44:10
Really important things you can do. It's very well paid as well. Yeah,
1:44:12
I think the demand is crazily out
1:44:14
ahead of the supply and security, which is another
1:44:16
reason I wish more people had gone into it. And
1:44:19
when OpenPhil was looking for a security hire, it
1:44:21
was just I've never seen such a hiring
1:44:23
nightmare in my life. I think I asked one security
1:44:25
professional, hey, will you keep
1:44:27
an eye out for people we might be able to hire and this person
1:44:29
just
1:44:29
actually laughed. And
1:44:32
said, what the heck? Everyone asks
1:44:34
me that. Of course there's no one for you to hire. All
1:44:36
the good people have amazing jobs where they barely
1:44:38
have to do any work and they get paid a huge amount and they have
1:44:40
exciting jobs. No, I'm absolutely
1:44:43
never gonna come across someone who would be good for you to hire, but
1:44:45
yeah, I'll let you know. Haha. That
1:44:47
was a conversation I had. That was kind of representative
1:44:50
of our experience. It's crazy, and
1:44:52
I would love to be on the other side of that. It's just like a human
1:44:54
being. I would love to have the kind of skills that were in that kind
1:44:56
of demand. So yeah, it's too bad more people aren't
1:44:58
into it. It seems like a good career.
1:44:59
Go do it.
1:45:01
Yeah. So I'm basically totally
1:45:03
on board with this kind of argument. I guess if I had to push
1:45:05
back, I'd say maybe we're just
1:45:07
so far away from being able to secure these models that
1:45:10
you could put in an enormous amount of effort. Maybe like the
1:45:12
greatest computer security effort that's ever been
1:45:15
put towards any project and maybe you'll end up
1:45:17
with it costing a billion dollars in order to
1:45:19
steal the model. But that's still peanuts to
1:45:21
China or to state actors.
1:45:24
And this is obviously gonna be on their radar by the relevant
1:45:26
time. So maybe really the message we should be pushing
1:45:28
is because we can't secure the models, we just have
1:45:30
to not train them. And that's the only option
1:45:32
here. Or perhaps you just need to move the entire
1:45:35
training process inside the NSA building and
1:45:37
basically just co-opt an existing like whoever
1:45:39
has the best security, you just basically take
1:45:42
that and then use that as the shell for the training set
1:45:44
up.
1:45:45
I don't think I understand either of these alternatives. I
1:45:47
think we can come back to the billion dollar point because I don't agree
1:45:49
with that either. But let's start with this. Like the
1:45:51
only safe thing is not to train. I'm just like, how the heck would
1:45:53
that make sense? Unless we get everyone in the world
1:45:55
to agree with that forever. That doesn't seem like much of
1:45:57
a plan. So I don't understand that one.
1:45:59
I don't understand "move inside the NSA building", because
1:46:02
I'm like,
1:46:02
if it's possible for the NSA to be secure,
1:46:04
then it's probably possible for a company to be secure with a lot of
1:46:06
effort. Like, I don't... yeah, it's like neither
1:46:08
of these is making sense to me as an alternative. Yeah,
1:46:11
because they're two different arguments. So the NSA
1:46:13
one would be saying it's gonna be so hard
1:46:15
to convert a tech company into being sufficiently
1:46:18
secure that basically we just need to get the best
1:46:20
people in the business, wherever they are working,
1:46:22
working on this problem, and basically we have to
1:46:25
redesign it from the ground up. Well, that
1:46:27
might be what we have to do. I mean, a good step toward
1:46:29
that would be for a lot of great people to be working in security,
1:46:31
to determine that that's what has to happen,
1:46:32
to be working at companies, to be doing the best they can
1:46:34
and say this is what we have to do. But let's
1:46:37
try and be as adaptable as we can. I mean, it's like zero
1:46:39
chance that the company would just literally become the
1:46:41
NSA. They would figure out what the NSA is doing
1:46:43
that they're not, they would do that, and they
1:46:46
would make the adaptations they have to make. That would take an
1:46:48
enormous amount of intelligence and creativity and
1:46:50
person power, and the more security people there are, the better
1:46:52
they would do it. So yeah, I don't know that that one is
1:46:54
really an alternative. Okay,
1:46:57
so what about the argument that
1:46:59
we're not going to be able to get it to be secure enough, so
1:47:02
it might even just give us
1:47:02
false comfort to be increasing the cost
1:47:04
of stealing the model when it's still just going to be sufficiently
1:47:07
cheap? I don't think it'll be false comfort. I mean, I think
1:47:09
if you have a zillion great security
1:47:11
people and they're all like,
1:47:13
FYI, this thing is not safe, I think we're
1:47:15
probably gonna feel less secure than we do
1:47:17
now, when we just, I think, have a lot of confusion
1:47:20
and FUD about exactly how hard it is to protect
1:47:22
a model. So I don't know, kind
1:47:25
of like, what's the alternative? But putting aside
1:47:27
what's the alternative, I would just disagree
1:47:29
with this thing that it's a billion dollars and it's peanuts. I would
1:47:31
just say, look, at the point where it's really
1:47:33
hard, anything that's really hard,
1:47:36
there's an opportunity for people to screw it up. Sometimes it
1:47:38
doesn't happen, and they might not be able
1:47:40
to pull it off. They might just, you know, screw it
1:47:43
up a bunch of times;
1:47:43
that might give us enough months to have enough
1:47:46
of an edge that it doesn't matter. I think
1:47:48
another point in all this is, like, if we get
1:47:50
to a future world where you have a really good standards
1:47:52
and monitoring regime, one of the things you're
1:47:54
monitoring for could be security breaches.
1:47:56
So you could be saying, you know Hey, we're
1:47:58
using AI systems to enforce some
1:48:01
sort of regulatory regime that says you can't
1:48:03
train a dangerous system. Well, not only can't you train a dangerous
1:48:05
system, you can't steal any system. If we catch
1:48:07
you, there's going to be consequences for that. And
1:48:09
those consequences could be arbitrarily large. And
1:48:12
it's one thing to say a state actor can steal your AI. It's
1:48:14
another thing to say they can steal your AI without a risk of getting
1:48:16
caught. These are different security levels. So
1:48:19
I guess there's a hypothetical world in which
1:48:21
no matter what your security is, a state
1:48:23
actor can easily steal it in a week
1:48:25
without getting caught. But I doubt we're in... I
1:48:27
actually doubt we're in that world. I think you can make it harder than that. And
1:48:29
I think that's worth it.
1:48:31
Yeah. Okay. Well, I've knocked
1:48:33
it out of the park in terms of failing
1:48:36
to disprove this argument that I agree with. So
1:48:39
please, people, go and learn more about this. We've
1:48:41
got an information security career review. There's a
1:48:44
post up on the Effective Altruism Forum
1:48:47
called "EA Infosec: skill up in or make a
1:48:49
transition to infosec via this book club", which
1:48:51
you could go check out. There's also the EA InfoSec
1:48:53
Facebook group. So quite a lot of resources
1:48:56
as of hopefully finally people are waking up
1:48:58
to this as a really, really impactful career.
1:49:00
And I guess if you know any people who work in information
1:49:02
security, maybe you could have a conversation with them. Or
1:49:05
if you don't, maybe have a child and then train them up in
1:49:07
information security and in 30 years they'll be able to help
1:49:09
out.
1:49:10
Hey listeners and possible bad faith critics.
1:49:13
Just to be clear, I am not advocating having children
1:49:15
in order to solve talent bottlenecks in information security.
1:49:18
That was a joke designed to highlight the difficulty of finding
1:49:20
people to fill senior information security roles.
1:49:22
Okay, back to the show.
1:49:24
This is a lot of different jobs, by the way. There's
1:49:26
security researchers, there's security engineers,
1:49:28
there's security DevOps people and managers
1:49:31
and just,
1:49:32
this is a big thing. We've oversimplified it.
1:49:34
And I'm not an expert at all. It is kind
1:49:36
of weird that this is an existing industry
1:49:38
that many different organizations require
1:49:41
and yet it's going to be such a struggle to bring in enough
1:49:43
people to secure what is probably
1:49:45
a couple of gigabytes worth of data. It's
1:49:47
whack, right? It is. Well, this
1:49:49
is the biggest objection I hear to pushing security
1:49:51
is everyone will say, look, alignment is
1:49:53
a weird thing. We need weird people to figure out how
1:49:55
to do it. Security, it's just like, what the heck? Why don't
1:49:58
the AI companies just hire the best people that are already out there?
1:49:59
There's a zillion of them. And my response
1:50:02
to that is basically like, security hiring
1:50:04
is a nightmare. You could talk to anyone who's actually tried to do it.
1:50:06
There may come a point at which AI is such
1:50:08
a big deal that AI companies are actually just
1:50:11
able to hire
1:50:12
all the people who are the best at security, and they're
1:50:14
doing it and they're actually prioritizing it. But
1:50:16
I think that point is not even now, not even now
1:50:18
with all the hype, and we're not even close to it. And I think it's
1:50:20
in the future, and I think that you can't just
1:50:23
hire a great security team overnight
1:50:25
and have great security overnight. It actually
1:50:27
matters that you're thinking about the problems like this years in advance,
1:50:30
and that you're building your culture and your
1:50:32
practices and your operations years in advance. Because
1:50:34
security is not a thing you
1:50:36
could just come in and bolt onto an existing company
1:50:39
and then you're secure, and I think anyone who's worked in security
1:50:41
will tell you this. So having great security people
1:50:43
in place, making your company more secure,
1:50:45
and figuring out ways to secure things
1:50:48
well in advance of when you're actually going to need
1:50:50
the security is definitely where you want to be if you
1:50:52
can. And I think having people who care about
1:50:54
these issues work on this topic does seem
1:50:56
really valuable for that. It also means that the
1:50:59
more these positions are in demand, the more they're gonna be in positions
1:51:01
where they have an opportunity to have an influence and have
1:51:03
credibility. Yeah.
1:51:04
Yeah. I think
1:51:06
the idea that surely it'll be possible to hire
1:51:08
for this from the mainstream might have been a not unreasonable
1:51:11
expectation 10 or 15 years ago, but the thing
1:51:13
is, we're already here. We can see that it's not
1:51:15
true. I don't know why it's not true, but definitely
1:51:18
the needle really can be moved by one outstanding
1:51:20
individual in this area. Yeah. So the four
1:51:22
things: so alignment research slash threat
1:51:24
assessment research, standards and monitoring,
1:51:26
which is like a lot of different potential jobs that
1:51:28
I kind of outlined at the beginning, many of which are
1:51:30
jobs that kind of don't exist yet but could in the future, then
1:51:33
the successful and careful AI lab, then security.
1:51:35
I'll say a couple things about
1:51:36
them. One is, I have said this before: I don't think
1:51:41
any of them are binary. So I think these are all things,
1:51:43
and I have a draft post that I'll put up at some point
1:51:45
arguing this, these are all things where a
1:51:47
little more improves our odds in a little
1:51:49
way. It's not some kind of weird function
1:51:51
where it's useless until you get it perfect. I believe
1:51:54
that about all four. Another thing I'll say: I tend
1:51:56
to focus on alignment risk because it is probably
1:51:58
the single thing I'm most focused on, and because I know this audience
1:51:59
will be into it, but I do want to say
1:52:01
again, I don't think
1:52:01
that AI takeover is the only thing we ought to
1:52:03
be worried about here. And I think the four things I've talked about
1:52:06
are highly relevant to other risks as well.
1:52:08
So I think all the things I've said
1:52:10
are really major concerns
1:52:13
if you think AI systems can be dangerous in pretty much
1:52:15
any way. Threat assessment, figuring out
1:52:17
whether they can be dangerous, what they could do in the wrong hands,
1:52:20
standards and monitoring, making sure that you're
1:52:22
clamping down on the ones that are dangerous for whatever reason.
1:52:25
Dangerous could include because they have feelings and we might mistreat
1:52:27
them. That's a form of danger, you could think. A successful,
1:52:30
careful AI lab and security, I think, are
1:52:31
pretty clear there too.
1:52:32
Yeah. Yeah, I think we're actually going to maybe end up
1:52:34
talking more about misuse as an area
1:52:37
than the misalignment going forward. Just
1:52:39
because I think that is like maybe more upon
1:52:41
us or like will be upon us very soon. So there's
1:52:43
a high degree of urgency. I guess also as
1:52:45
a non-ML scientist, I think I have a better grip
1:52:48
on the maybe the misuse issues. And
1:52:50
it might also be somewhat more tractable for a wider range of people to try
1:52:52
to contribute to it, to reducing misuse. Interesting.
1:52:55
Okay, so you have a post
1:52:57
as well on what AI Labs could be doing differently.
1:53:00
But I know that one has kind of already been superseded
1:53:02
in your mind. And you're going to be working
1:53:05
on that question more intensely
1:53:07
in coming months. So we're going to skip that one for
1:53:09
today and come back to it in
1:53:11
another interview down the line where the time is right,
1:53:14
maybe possibly later this year even. So instead,
1:53:17
let's push on and talk about governments. You
1:53:19
had a short post about this a couple
1:53:21
of months ago called how major governments
1:53:24
can help with the most important century. I think
1:53:26
you wrote that your views on this are even
1:53:28
more tentative than they are elsewhere. Of
1:53:31
course, there's a lot of policy
1:53:33
attention to this just now. But
1:53:35
back in February, it sounded like your main recommendation
1:53:37
was actually just not to strongly commit
1:53:39
to any particular regulatory framework or any
1:53:43
particular set of rules, because
1:53:45
things they're just changing so quickly.
1:53:47
And it does seem sometimes like governments once
1:53:49
they do something, they can find it quite hard
1:53:52
to stop doing it. And once
1:53:54
they do something, then they maybe
1:53:56
move on and forget that what they're doing actually
1:53:58
needs to be constantly updated.
1:53:59
So, is that still
1:54:02
your high-level recommendation that people should be studying
1:54:04
this but not trying to write the bill
1:54:06
on AI regulation?
1:54:08
Yeah, there's some policies
1:54:10
that I'm excited about more than I was previously,
1:54:13
but I think at the high level that is still my take. It's
1:54:15
just that companies could just do something and
1:54:17
then they could just do something else. And there's
1:54:19
certain things that are hard for companies to change,
1:54:22
but there's other things that are easy for them to change. At
1:54:24
governments, it's just like you got to spin up a new agency, you
1:54:26
got to have all these directives. It's just going to be hard
1:54:28
to turn around. So I think that's right. I
1:54:30
think governments should default
1:54:32
to doing things that have really
1:54:35
been run to ground, that they really feel good about, and
1:54:37
not just feel like
1:54:38
starting up new agencies left and right.
1:54:41
That does seem right. Yeah. Okay, but what if
1:54:43
someone who's senior in the White House came to you and said, sorry,
1:54:45
Holden, the eye of Sauron has turned to
1:54:47
this issue in a good way. We want to do something now.
1:54:50
What would you feel reasonably good about governments trying
1:54:52
to take on now?
1:54:53
Yeah, I have been talking with
1:54:56
a lot of the folks who work on AI policy recommendations
1:54:58
and have been just thinking about that and trying
1:55:01
to get a sense for what the ideas that the
1:55:04
people who think about this the most are most supporting are. An
1:55:07
idea that I like quite a bit is
1:55:09
requiring licenses for large training runs.
1:55:11
So basically, if you're going to do
1:55:13
a really huge training run of an AI
1:55:16
system, I think that's the kind of
1:55:18
thing that government can be aware of and should
1:55:20
be aware of. And it becomes somewhat analogous
1:55:22
to developing a drug or something where it's
1:55:24
a very expensive,
1:55:26
time consuming training process to create
1:55:28
one of these state of the art AI systems. And it's a
1:55:30
very high stakes thing to be doing. And so
1:55:33
we don't know exactly what a company should
1:55:35
have to do yet because we don't yet have
1:55:37
great evals and tests for whether AI systems
1:55:39
are dangerous. But at a minimum, you could say, you
1:55:42
need a license. So at a minimum, you need to say,
1:55:45
hey, yeah, we're doing this. We've
1:55:47
told you we're doing it. I don't know. You
1:55:50
know whether any of us have criminal records, whatever.
1:55:52
And now we've got a license. And that creates a potentially
1:55:55
flexible regime where you can later say,
1:55:57
in order to keep your license, you're going to have to.
1:55:59
measure your systems to see if they're dangerous, and
1:56:02
you're going to have to show us that they're not, and all that stuff without
1:56:04
committing to exactly how that works now. So
1:56:06
I think that's an exciting idea, probably.
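As a back-of-the-envelope illustration of how a compute-based licensing trigger for large training runs might work, here is a sketch using the commonly used ~6 x parameters x training-tokens rule of thumb for estimating training FLOPs. The threshold value is a placeholder, not any actual or proposed legal limit.

```python
# Rough sketch of a compute-based licensing check; the threshold is a placeholder.
LICENSE_THRESHOLD_FLOPS = 1e26  # hypothetical cutoff, not a real regulation

def estimated_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    """Standard rule-of-thumb estimate: ~6 FLOPs per parameter per training token."""
    return 6 * n_parameters * n_training_tokens

def needs_license(n_parameters: float, n_training_tokens: float) -> bool:
    return estimated_training_flops(n_parameters, n_training_tokens) >= LICENSE_THRESHOLD_FLOPS

# Example: a 70-billion-parameter model trained on 1.4 trillion tokens.
flops = estimated_training_flops(70e9, 1.4e12)
print(f"{flops:.2e} FLOPs -> license required: {needs_license(70e9, 1.4e12)}")
# ~5.9e+23 FLOPs, which falls below this sketch's placeholder threshold.
```

The appeal of a trigger like this is exactly what's described above: it's easy to check up front, and the substantive requirements attached to the license can be tightened later as better evals become available.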
1:56:09
I don't feel totally confident about any of this stuff, but that's
1:56:11
probably number one. For me, I think
1:56:14
the other number one for me would be some of the stuff that's already
1:56:16
ongoing, existing AI policies
1:56:19
that I think people have already pushed forward and are
1:56:21
trying to just tighten up. So some of the stuff about
1:56:23
export controls would be my other top
1:56:25
thing.
1:56:26
I think if you were to throw in a requirement
1:56:28
with the license, I would make it about information security.
1:56:30
So I think government requiring at least
1:56:33
minimum security requirements of anyone
1:56:35
training frontier models just seems like a good idea, just
1:56:38
like getting them on that ramp to where it's not so easy for
1:56:40
a state actor to steal it. Arguably, government
1:56:42
should just require all AI models to be treated as
1:56:44
top secret classified information, which means that
1:56:46
they would have to be subject to incredible draconian
1:56:49
security requirements involving just
1:56:51
like air gap networks and all this incredibly
1:56:53
painful stuff. Arguably, they should require that at this
1:56:55
point, given
1:56:56
how little we know about what these models are going to be imminently
1:56:58
capable of. But at a minimum, some kind of security
1:57:00
requirements seems good.
1:57:02
I think another couple of ideas, just tracking
1:57:05
where all the large models are in the world, where all
1:57:07
the hardware is capable of being used
1:57:09
for those models, I think don't necessarily want
1:57:11
to do anything with that yet. But having the ability
1:57:14
seems possibly good. And then I
1:57:17
think there are interesting questions about
1:57:19
liability and about incident tracking
1:57:22
and reporting that I think just could use some
1:57:24
clarification. I don't think I have the answer on them
1:57:26
right now. When should an
1:57:28
AI company be liable for harm that was caused
1:57:30
partly by one of its models? What
1:57:32
should the AI company's responsibilities be when
1:57:35
there is a bad incident of being able to say what happened?
1:57:37
How does that trade off against the privacy of the user? I
1:57:40
think these are things that, I don't know, feel really
1:57:42
juicy to me to consider 10 options,
1:57:45
figure out which ones are best from containing the biggest risk
1:57:47
point of view, and push that. But I don't really know what that is yet.
1:57:50
Yeah. So because, yeah, broadly speaking, we
1:57:52
don't know exactly what the rules should be in the details.
1:57:54
And we don't know exactly where we want to end up. But
1:57:56
I think the idea is, across a bunch of different dimensions, to put
1:57:58
in place at the beginning the infrastructure
1:57:59
that will probably, regardless, help
1:58:02
us go in the direction that we're gonna need to move gradually. Exactly,
1:58:05
and I'm not in favor, like I think there's other things governments
1:58:07
could do that are more like giving themselves kind of
1:58:09
arbitrary powers to like seize or use
1:58:11
AI models, and I'm not really in favor
1:58:13
of that. I think that could be destabilizing
1:58:16
and could cause chaos in a lot of ways. So
1:58:18
a lot of this is about like, yeah, basically
1:58:20
feeling like we're hopefully heading toward a regime of
1:58:23
testing whether AI models are dangerous and stopping
1:58:26
them if they are and having the infrastructure in place to
1:58:28
basically make that be able to work. So it's
1:58:30
not a generic thing. The government should give itself all the option
1:58:32
value, but it should be setting up for that kind
1:58:34
of thing to basically work.
1:58:36
Yeah, as I understand it, if the
1:58:38
National Security Council in the US concluded that
1:58:40
a model that was about to be trained would
1:58:42
be a massive national security hazard and
1:58:44
might lead to human extinction,
1:58:47
people aren't completely sure like which agency
1:58:49
or who has the
1:58:51
legitimate legal authority to prevent that from
1:58:53
going ahead. Or if anyone does, yeah. No
1:58:55
one's sure if anyone has that authority. Right,
1:58:57
it seems like that's something that should be patched at least, even
1:59:00
if
1:59:01
you're not creating the ability to seize
1:59:03
all of the equipment and so on with the intention of using
1:59:05
it anytime soon, maybe it should be clear that there's some
1:59:07
authority that is meant to be monitoring
1:59:09
this and should
1:59:11
take action if they conclude that something's a massive
1:59:14
threat to the country.
1:59:15
Yeah, possibly. I think I'm most excited about what
1:59:17
I think of as promising regulatory
1:59:20
frameworks that could create good incentives and could
1:59:22
help us kind of every year and a
1:59:24
little bit less about the tripwire for
1:59:26
the D-Day. I think a lot of times with
1:59:28
AI, I'm not sure there's gonna be like one really
1:59:30
clear D-Day or by the time it comes, it might be too late.
1:59:33
So I am thinking about things that could just like put
1:59:35
us on a better path day by day.
1:59:37
Yeah, okay, pushing
1:59:39
on to people who have an audience like
1:59:42
people who are active on social media or journalists or
1:59:44
I guess podcasters, heaven forbid, you
1:59:46
wrote this article, Spreading Messages to Help with the Most
1:59:49
Important Century, which was targeted at this
1:59:51
group.
1:59:51
I guess back in ancient times in February when
1:59:54
you wrote this piece, you were
1:59:56
kind of saying that you thought people should tread carefully
1:59:58
in this area and should
1:59:59
definitely be trying not to build up hype
2:00:02
about AI, especially just about its, like, raw
2:00:04
capabilities because that could encourage
2:00:06
further investment in capabilities. Well,
2:00:08
you were saying, most people when they hear that AI could
2:00:10
be really important, rather than falling into this caution,
2:00:12
concern, risk management framework, they
2:00:15
start thinking about it purely in a competitive sense, thinking our
2:00:17
business has to be at the forefront, our country has to be at
2:00:19
the forefront. And I think indeed there
2:00:21
has been an awful lot of people
2:00:23
thinking that way recently. But
2:00:26
yeah, do you still think that people should be very cautious talking
2:00:28
about how powerful AI might
2:00:30
be, given that maybe the horse has already left the barn
2:00:32
on that one?
2:00:33
I think it's a lot less true than it was. I mean,
2:00:36
I think it's less likely that you hyping
2:00:38
up AI is gonna do much about AI hype. You
2:00:40
know, I think it's still not a total non-issue. And
2:00:42
especially if we're just taking the premise that you're some
2:00:45
kind of communicator and people are gonna listen to you. You
2:00:47
know, I still think the same principle basically applies
2:00:49
that like, the thing you don't wanna do is you don't
2:00:51
wanna like
2:00:52
emphasize the incredible power
2:00:54
of AI if you feel like
2:00:57
you're not at the same time getting much across
2:00:59
about how AI can be a danger to everyone at once.
2:01:02
Because I think if you do that, you are going to, the default
2:01:04
reaction is gonna be, I gotta get in on this. And
2:01:07
a lot of people already think they gotta get in on AI,
2:01:09
but not everyone thinks that, not everyone is going into AI
2:01:11
right now. So if you're talking to someone who you
2:01:13
think you're gonna have an unusual impact on, you know,
2:01:15
I think that basic rule, yeah, that basic rule
2:01:18
still seems right. And it makes it really tricky
2:01:20
to communicate about AI. You
2:01:22
know,
2:01:22
I think there's a lot more audiences now where
2:01:24
you just feel like these people have already figured
2:01:27
out what a big deal this is. I need to help them
2:01:29
understand some of the details of how it's a big deal. And
2:01:31
especially, you know, some of the threats of misalignment risk
2:01:33
and stuff like that. And I mean, yeah, that
2:01:35
kind of communication is a little bit less
2:01:37
complicated in that way, although challenging. Yeah.
2:01:40
Yeah. Yeah. Do you have any specific advice for what
2:01:42
messages seem most valuable or like ways
2:01:44
that people can frame this in a particularly productive
2:01:47
way?
2:01:47
Yeah, I wrote a post on this that you mentioned,
2:01:50
spreading messages to help with the most important century.
2:01:52
You know, I think some of the things that people
2:01:54
have trouble,
2:01:56
a lot of people have trouble understanding or don't seem to understand,
2:01:58
or maybe just disagree with me on.
2:01:59
and I would love to just see the dialogue get better,
2:02:03
is this idea that AI could be dangerous
2:02:05
to everyone at once. It's not just about whoever
2:02:07
gets it wins. The kind of terminator
2:02:09
scenario, I think, is actually just pretty
2:02:12
real. And the way that I would probably put
2:02:14
it at a high level is just like, there's only
2:02:16
one kind of mind right now. There's only one
2:02:18
kind of species or thing that can
2:02:21
develop its own science technology. That's
2:02:23
humans. We might be about to
2:02:25
have two instead of one. That would be the
2:02:27
first time in history we had two.
2:02:30
The idea that we're gonna stay in control, I think, should
2:02:32
just not
2:02:33
be something we're too confident in. I
2:02:35
think that would be at a high level and then at a low level.
2:02:38
And I would say with humans too, it's like humans
2:02:41
kind of fell out of this trial and error process
2:02:43
and for whatever reason, we had our own agenda that wasn't good
2:02:45
for all the other species. Now we're building AI's
2:02:47
by trial and error process. Are they gonna have their
2:02:50
own agenda? I don't know, but if they're
2:02:52
capable of all the things humans are, it doesn't
2:02:54
feel that crazy. And then I would say it
2:02:56
feels even less crazy when you look at the details
2:02:58
of how people build AI systems today and you
2:03:00
imagine extrapolating that out to very powerful systems.
2:03:03
It's really easy to see how
2:03:05
we could be training these things to kind of have goals
2:03:07
and optimize, like the way you would optimize to win a chess
2:03:10
game. We're not building these systems
2:03:12
that are just these kind of like very well-understood,
2:03:15
well-characterized reporters
2:03:17
of facts about the world. We're building these systems that
2:03:19
are like these very opaque, trained
2:03:21
with kind of like sticks and carrots
2:03:24
and they may in fact have kind of what you might
2:03:26
think of as goals or aims. And that's something I wrote
2:03:28
about in detail. So I think, yeah, I think trying
2:03:30
to communicate about why we
2:03:32
could expect these kind of terminator scenarios
2:03:35
to be serious or versions of them to be serious,
2:03:37
how that works mechanistically and also just like
2:03:40
the high level intuitions seems like a really
2:03:42
good message that I think could be a corrective
2:03:44
to some of the racing and help people
2:03:47
realize that we may in some sense,
2:03:49
on some dimensions, and sometimes all be in this together
2:03:51
and that may call for different kinds of interventions
2:03:53
from if it was just a race. I
2:03:56
think like some of the things that are hard about
2:03:58
measuring AI danger
2:03:59
I think are really good for like the whole world
2:04:02
to be aware of. I'm really worried about a world
2:04:04
in which we're just, like... When
2:04:05
you're dealing with beings that have some sort of
2:04:07
intelligence, measurement is hard. So it's like, let's
2:04:09
say you run a government and you're worried about a coup. Are
2:04:13
you gonna be empirical and go poll
2:04:15
everyone on whether they're plotting a coup, and then it turns
2:04:17
out that zero percent of people are plotting a coup, so there's
2:04:19
no coup, right? Yeah. Yeah, that's not
2:04:21
how that works. And you know, that might
2:04:23
work, that kind of empirical method
2:04:26
works, with things that are not thinking about what
2:04:28
you're trying to learn and how that's gonna affect your behavior.
2:04:30
And so I think again, you know, with AI systems, it's
2:04:32
like, okay, we gave this thing
2:04:35
a test to see if it would kill us and
2:04:37
it looks like it wouldn't kill us. Like, how reliable
2:04:39
is that? There's a whole bunch of reasons that we
2:04:41
might not actually be totally set at
2:04:43
that point, and that these measurements
2:04:45
could be really hard. I think this is like really,
2:04:47
really key, because I think wiping
2:04:50
out enough of the risk to make something
2:04:52
commercializable is one thing, and wiping
2:04:54
out enough of the risk that we're actually still fine after
2:04:56
these AIs are all over the economy and
2:04:59
could kind of disempower humanity if they chose is another
2:05:01
thing. Not thinking that commercialization
2:05:03
is going to take care of it, not thinking that we're
2:05:05
just gonna be able to easily measure as we go, I think
2:05:08
these are really important things for people to understand. It could
2:05:10
really affect the way that all this plays
2:05:12
out, you know, whether we do reasonable
2:05:14
things to prevent the risks. I
2:05:17
don't
2:05:17
know. I think those are the big ones. I have more in my post.
2:05:19
You know, the general concept that just, like,
2:05:22
there's a lot coming, it could happen really fast,
2:05:24
and so the normal,
2:05:25
the normal human way of just reacting to stuff
2:05:27
as it comes may not work, I think
2:05:30
is an important message.
2:05:31
An important message if true; if wrong, I would
2:05:33
love people to spread that message so that it becomes
2:05:35
more prominent, so that more people make better arguments against
2:05:38
it, and then I change my mind. Yeah.
2:05:40
Yeah, I was gonna say, I don't know whether this is good advice,
2:05:42
but one strategy you could take is trying to find aspects
2:05:45
of this issue that are not fully understood
2:05:47
yet by people who have kind of only
2:05:49
engaged with it quite recently, like exactly
2:05:51
this issue that the measurement of safety could be incredibly
2:05:54
difficult. It's not just a matter of doing the
2:05:56
really obvious stuff like asking the model. Are
2:05:58
you out to kill me? And trying
2:06:00
to come up with some pithy example or story
2:06:03
or terminology that can really capture people's imagination
2:06:05
and stick in their mind. And I think exactly that
2:06:07
example of the coup where you're saying, what you're doing
2:06:09
is just going around to your generals and asking them if they want
2:06:12
to overthrow you. And then they say no. And you're
2:06:14
like, well, everything is hunky dory. I think that
2:06:16
is the kind of thing that could get people to understand at
2:06:18
a deeper level why we're in
2:06:20
a difficult situation.
2:06:22
I think that's right. And I'm very mediocre
2:06:24
with metaphors. I bet some listeners are better with them. They
2:06:26
do a better job. Yeah. And Grace came
2:06:28
up with a, she wrote one in a Time
2:06:31
article yesterday that I hadn't heard before, which is saying we're
2:06:33
not in a race to the finish line. Rather, we're
2:06:35
a whole lot of people on a lake that has
2:06:37
frozen over, but the ice is incredibly thin.
2:06:40
And if any of us start running, then we're all just going to fall through because
2:06:42
it's going to crash. And I was like, yes,
2:06:44
that's a great visualization of it. Yeah,
2:06:47
interesting. Okay. Let's
2:06:49
push on. We talked about AI labs, governments and advocates,
2:06:52
but the
2:06:52
final grouping is the largest one, which is
2:06:54
just jobs and careers, which is of course what 80,000
2:06:56
Hours is typically meant to
2:06:58
be about. Yeah. What's
2:07:00
another way that some listeners might be able to help
2:07:02
with this general issue by changing
2:07:05
the career that they go into or the skills that they
2:07:07
develop?
2:07:08
Yeah. So I wrote a post on this called Jobs
2:07:10
That Can Help With The Most Important Century. I think the first thing
2:07:12
I want to say is I just do expect this stuff to be quite dynamic.
2:07:15
So right now, I think we're in a very nascent
2:07:17
phase of kind of evals and standards. I
2:07:19
think we could be in a future world where there
2:07:22
are decent tests of whether AI systems are dangerous
2:07:24
and there are decent frameworks for how
2:07:26
to keep them safe. But there needs to be just
2:07:29
more work on advocacy and
2:07:31
communication so that people actually understand this stuff,
2:07:33
take it seriously, and that there is a reason
2:07:35
for companies to do this. And
2:07:38
also, there could be people working on political advocacy
2:07:41
to have good regulatory frameworks for keeping humanity
2:07:43
safe. So I think the jobs that
2:07:45
exist are going to change a lot. And I think my
2:07:47
big thing about careers in general is just
2:07:49
like if you're not finding a great
2:07:52
fit with one of the current things, that's fine.
2:07:54
And don't force it. And
2:07:56
if you have person A and person B and person
2:07:58
A is like they're doing
2:07:59
something that's not
2:08:01
clearly relevant to AI or
2:08:03
whatever. Let's say they're an accountant. They're
2:08:06
really good at it. They're thriving. They're
2:08:08
picking up skills. They're making connections. And
2:08:11
they're ready to go work on AI as
2:08:13
soon as an opportunity comes up, which that last part could
2:08:15
be hard to do on a personal level.
2:08:18
Then you have person B, who is kind of
2:08:20
like, they
2:08:21
had a similar profile, but they forced themselves
2:08:23
to go into alignment research. And they're doing
2:08:26
quite mediocre alignment research. They're barely
2:08:28
keeping their job. I would say person A
2:08:30
is just the higher expected impact.
2:08:32
And I think that would be my main thing on jobs,
2:08:34
is I'm just like,
2:08:36
do something where you're good
2:08:38
at it. You're thriving. You're leveling up.
2:08:40
You're picking up skills. You're picking up connections. If
2:08:42
that thing can be on a key
2:08:45
AI priority, that is ideal. If it cannot
2:08:47
be, that's OK. And don't force
2:08:49
it. So that is my high-level thing.
2:08:52
But yeah, I'm having to talk about specifically
2:08:54
what I see as some of the things people could do
2:08:56
today right now on AI that don't
2:08:58
require starting your own org and are more
2:09:00
like you can slot into an existing team, if you have
2:09:02
the skills and if you have the fit. I'm happy to go into that. Yeah,
2:09:05
I think people who want more advice
2:09:07
on overall career strategy, we
2:09:09
did an episode with you on that back in 2021, which is episode 110, Holden
2:09:13
Karnofsky, on building aptitudes and kicking ass. So
2:09:16
I can definitely recommend going back and listening to that.
2:09:19
But yeah, maybe in terms of more
2:09:21
specific roles, are there any ones that you wanted
2:09:22
to highlight?
2:09:23
Yeah, I mean, some of them are obvious. There's
2:09:26
people working on AI alignment. There's also people
2:09:28
working on threat assessment, which we've talked
2:09:31
about, and dangerous capability
2:09:33
evaluations at AI labs
2:09:35
or sometimes at nonprofits. And
2:09:37
if there's a fit there, I think that's just an obviously
2:09:40
great thing to be working on. We've talked
2:09:42
about information security. So
2:09:44
yeah, I don't think we need to say more about that. I
2:09:46
think there is this really tough question of whether you
2:09:48
should go to an AI company and just
2:09:50
kind of do things there that are not particularly
2:09:53
safety or policy or security, just
2:09:55
like helping the company succeed. You know, that
2:09:57
can be a really, in my opinion, really great way
2:09:59
to skill up, a really great way to like you
2:10:02
personally becoming a person who knows a lot about
2:10:04
AI, understands AI, swims in the water
2:10:06
and is well positioned to do something else later. There's
2:10:09
big upsides and big downsides to helping an AI company
2:10:11
succeed at what it's doing and it really comes down to how you feel about
2:10:14
the company. So it's a tricky one, but
2:10:16
it's one that I think is definitely worth thinking about,
2:10:18
thinking about it carefully. Then there's, you know, there's
2:10:20
roles in government and there's roles in government
2:10:22
facing think tanks, just trying
2:10:25
to help. And I think that the interest
2:10:27
is growing. So trying to help the government make good decisions,
2:10:29
including not making rash moves about
2:10:32
how it's dealing with AI policy, what
2:10:35
it's regulating, what it's not regulating, et cetera.
2:10:37
So those are some things. Yeah,
2:10:39
I had a few other listed in my
2:10:41
post, but I think it's okay to stop there.
2:10:43
Yeah. Yeah. I mean, I guess
2:10:45
it seemed like both of these paths. So
2:10:48
one broadly speaking was going and working in the AI
2:10:50
labs or in nearby
2:10:52
industries or firms that they collaborate with. And
2:10:55
I guess there's a whole lot of different ways you could help out there. And
2:10:57
I suppose the other one is thinking about governance
2:11:00
and policy where you could just,
2:11:03
you could pursue any kind of government and policy
2:11:05
career, try to flourish as much as you
2:11:07
can and then turn your attention towards AI
2:11:09
later on. Because there's sure to be an enormous
2:11:12
demand for more analysis and work on this in
2:11:13
coming years. So hopefully
2:11:16
in both cases, you'll be joining very rapidly growing industries.
2:11:19
And for the latter, the closer, the better. So working on
2:11:21
technology policy is probably best, but yeah.
2:11:23
What about people who kind of, they don't see any immediate
2:11:25
opportunity to enter into either of those
2:11:27
broad streams. Is there anything that you think
2:11:30
that they could do in the meantime?
2:11:31
Yeah. So I did before talk about the kind of person
2:11:33
who could just be good at something and kind of wait for
2:11:35
something to come up later. I guess it might
2:11:37
be worth emphasizing that the ability
2:11:40
to switch careers is going to get harder
2:11:42
and harder as you get further and further into your career.
2:11:45
So I think in some ways, like, if you're
2:11:47
a person who's being successful, but is
2:11:49
also like making sure that you've got the financial
2:11:51
resources, the social resources, the psychological
2:11:53
resources, that you really
2:11:55
feel confident that as soon as a good opportunity
2:11:57
comes up to do a lot of good, you're going to switch
2:12:00
jobs or have a lot of time to serve on a board
2:12:02
or whatever. I think it's weird because this is like
2:12:04
not a measurable thing and it's not a thing you can like brag
2:12:07
about when you go to an effective altruism meetup. It
2:12:09
just seems like incredibly valuable. And I just, I
2:12:11
wish there was a way to just kind of, to kind
2:12:13
of recognize that, you know, the person who is
2:12:16
successfully able to walk away
2:12:18
when they need to from a successful career has
2:12:21
in my mind, like, more expected
2:12:23
impact than the person who's in the high impact career right
2:12:25
now, but isn't killing it.
2:12:27
Yeah. So, so I expect an enormous
2:12:30
growth in roles that might be relevant
2:12:32
to this problem in, in future years, and also just
2:12:34
an increasing number of types of roles that might be relevant
2:12:36
because there could just be all kinds of new projects that are
2:12:38
going to grow and require people who are just generally competent,
2:12:40
you know, who have management experience, who know
2:12:43
how to deal with operations and legal and so on. So they're
2:12:45
going to be looking for people who share their values.
2:12:48
So if you're able to potentially move to one of the hubs
2:12:50
and take one of those roles, when it becomes available, if
2:12:52
it does, then that's, that's definitely a big step
2:12:55
up relative to locking yourself into something else
2:12:57
where you can't shift.
2:12:59
I was going to say also just like spreading messages
2:13:01
we talked about, but I have
2:13:03
a feeling that being a person who's a good communicator,
2:13:05
a good advocate, a good persuader,
2:13:08
I have a feeling that's going to become more
2:13:10
and more relevant and there's going to be more and more jobs like
2:13:12
that over time, because I think we're in a place now where
2:13:14
people are like, just starting to figure out
2:13:16
what a good regulatory regime might
2:13:19
look like, what a good set of practices might look
2:13:21
like for containing the danger. And later, I think
2:13:23
there'll be more, more maturity there and
2:13:25
more stress placed on people needing to actually
2:13:27
understand this and care about it and do it.
2:13:29
Yeah. I mean, setting yourself the challenge
2:13:31
of taking someone who is not informed about
2:13:33
this, so might even be skeptical about this and
2:13:36
with arguments that are actually sound
2:13:38
as far as you know, persuading them to care about
2:13:40
it for the right reasons and to understand it deeply, that
2:13:42
is not simple. Uh, and if you're able to build
2:13:44
the skill of doing that through, through practice, it
2:13:47
would be unsurprising if that turned out to be
2:13:49
very useful in some role in future. And I should
2:13:51
be clear, there's a zillion versions of that that have
2:13:53
like dramatically different skillsets. So there's like
2:13:55
people who, you know, their thing is they work
2:13:57
in government and there's some kind of
2:13:59
government sub
2:13:59
culture that they're very good at communicating with, in government-
2:14:02
ese. And then there's people who make viral
2:14:04
videos. Then there's people who organize
2:14:06
grassroots protests. And there's so many. There's
2:14:10
journalists. There's highbrow journalists, lowbrow
2:14:12
journalists. It's just like communication is not
2:14:14
a generalizable skill. There's not an
2:14:16
audience; there's a gazillion audiences. And
2:14:18
there are people who are terrible with some audiences and amazing
2:14:20
with other ones. So this is many, many jobs. And I
2:14:22
think there'll be more and more over time.
2:14:24
Yeah. OK. We're just
2:14:26
about to wrap up this AI section. I
2:14:28
guess I had two questions from the audience to
2:14:30
run by you first. Yeah, one audience member
2:14:32
asked, what, if anything, should Open Philanthropy have
2:14:34
done two to five years ago to put us
2:14:36
in a better position to deal with AI now? Is
2:14:39
there anything that we missed?
2:14:41
Yeah. In terms of actual stuff we
2:14:43
literally kind of missed, I mean, I feel
2:14:45
like this whole idea of, like, evals and standards
2:14:47
is like everyone's talking about it now. But
2:14:49
I mean, heck, it would have been much better if everyone was talking
2:14:51
about it five years ago. That would have been great.
2:14:54
I think in some ways, in some ways, this
2:14:56
research was kind of too hard to do before the models
2:14:58
got pretty good. But there might have been some start
2:15:00
on it, at least with understanding how it works
2:15:02
in other industries and starting to learn
2:15:04
lessons there. Security, obviously, I have
2:15:07
regrets about just like not. There were some
2:15:09
attempts to push it from 80K
2:15:11
and from Open Phil. But I think
2:15:12
those attempts could have been a lot more, a lot
2:15:14
louder, a lot more forceful. I
2:15:16
think it's possible that security
2:15:18
being the top hotness in EA rather
2:15:20
than alignment, like
2:15:22
it's not clear to me which one of those would be better.
2:15:24
And having the two be equal, I think probably would have been better.
2:15:27
Yeah, I mean, I don't know. Like there's lots of stuff
2:15:29
we're just like,
2:15:30
I kind of wish we just like paid more attention
2:15:32
to all of this stuff faster. But those are the most specific
2:15:35
things that are easy for me to point to. Yeah.
2:15:38
What
2:15:38
do you think of the argument that we
2:15:40
should expect a lot of alignment, like useful alignment
2:15:42
research to get done ultimately because it's
2:15:46
necessary in order to make the products useful? I
2:15:48
think Pushmeet Kohli made this argument
2:15:50
on the show many years ago. And I've
2:15:53
definitely heard it recently as well.
2:15:55
Yeah. I think it could be right.
2:15:57
I think in some ways it feels like it's almost
2:15:59
definitely right, to an extent or something. It's
2:16:01
just like there's certain AI
2:16:02
systems that just don't at all behave
2:16:05
how you want that are just going to be too hard to commercialize.
2:16:07
And AI systems that are constantly causing random
2:16:09
damage and getting you in legal trouble, I mean,
2:16:12
that's not going to be a profitable business. So
2:16:14
I do think a lot of the work
2:16:16
that needs to get done is going to get done by
2:16:18
normal commercial incentives. I'm
2:16:21
very uncomfortable having that be the whole plan. One
2:16:24
of the things I am very worried about, again, if you're
2:16:26
really thinking of AI systems as capable of doing what
2:16:28
humans can do, is that you
2:16:30
could have situations where you're
2:16:32
training AI systems to
2:16:35
be well behaved. But what you're really training them
2:16:37
to do is to be well behaved unless they
2:16:39
can get away with bad behavior in a permanent
2:16:41
way. And just like a lot of humans, it's like they
2:16:44
behave themselves because they're part of a law and order
2:16:46
situation. And if they ever found
2:16:48
themselves able to gain
2:16:50
a lot of power or break the rules and get away with
2:16:53
it, they totally would. A lot of humans are like that. You could have
2:16:55
AIs that you're basically trained to be like that. And
2:16:57
so it reminds me a little bit of some of the financial
2:16:59
crisis stuff, where it's like, you
2:17:02
could be doing things that drive your
2:17:04
day-to-day risks down, but kind of concentrate
2:17:07
all your risk in these highly correlated tail
2:17:09
events. And so I don't
2:17:11
think it's guaranteed. But I think it's quite worrying that
2:17:13
we get to be in a world where in order to get
2:17:16
your AIs to be commercially valuable, you have to
2:17:18
get them to behave themselves. But you're only getting them to behave
2:17:20
themselves up
2:17:20
to the point where they can definitely get away with it. They're
2:17:22
actually kind of capable enough to be able to
2:17:24
tell the difference between those two things. And
2:17:27
so I don't want our whole plan to be commercial
2:17:29
incentives. We'll take care of this. And if anything, I
2:17:31
tend to be focused on the parts of the problem that seem
2:17:33
less likely to get naturally addressed that way. Yeah,
2:17:36
another analogy there is to
2:17:38
the forest fires. Where as I understand it, because
2:17:40
we wouldn't like forest fires, we basically prevent
2:17:43
forests from ever having fires. But then that
2:17:45
causes more brush to build
2:17:47
up. And then every so often, you have some enormous
2:17:49
cataclysmic fire
2:17:50
that you just can't put out because the amount of
2:17:53
combustible material there is extraordinarily high, like
2:17:55
more than you ever would have had naturally before humans started
2:17:57
putting out these fires. I guess that's one way
2:17:59
in
2:17:59
fact that trying to prevent
2:18:02
like small scale bad outcomes
2:18:04
or trying to prevent like a minor misbehavior
2:18:06
by models could give you a false sense of security because you'd be
2:18:08
like, well, we haven't had a forest fire in
2:18:11
so long. But then of course, all you're doing
2:18:13
is like causing something much worse to happen later, because
2:18:15
you've been lulled into complacency. Yeah. And
2:18:17
I'm not I'm not that concerned about false sense of security.
2:18:20
I think we should like try and make things good and and
2:18:22
then argue about whether they're actually good. So, you know,
2:18:24
I think we should try and get models to behave. And
2:18:26
after we've done everything we can to do that, we should ask if we
2:18:29
really got them to behave and what we might
2:18:31
be missing. So
2:18:32
I don't think we shouldn't care if they're
2:18:34
being nice. But I think it's not the end of
2:18:36
the conversation.
2:18:37
Yeah, another audience member asked,
2:18:40
how should people who've been thinking about and working on AI
2:18:42
safety for many years react
2:18:44
to all of these ideas suddenly becoming
2:18:47
much more popular in the mainstream than they
2:18:49
ever were?
2:18:50
I don't know. I mean, like brag about how everyone
2:18:52
else is a poser. I mean,
2:18:55
I'm not sure what the question is. Don't
2:18:57
encourage me. Hold on. Yeah.
2:19:01
What... like, how should they react? I mean, I don't
2:19:03
know. I mean, is there
2:19:05
more of a sharpening? Like, I think we should still care about
2:19:07
these issues. I think that people
2:19:09
who were not interested in them before and are interested
2:19:11
in them now, we should be like really happy and
2:19:14
should welcome them in and see if we can work productively
2:19:16
with them. What else is the question? Yes.
2:19:20
I guess it reminded me of the point you made
2:19:23
in a previous conversation we had, where we said, you know, lots
2:19:25
of people, kind of including us, were a bit
2:19:27
ahead of the curve on COVID. You know, we were kind of expecting
2:19:29
this sort of thing to happen. And then we saw that it was going to happen
2:19:32
weeks or months before anyone else did. And that didn't really
2:19:34
help. Like, yeah, it's not like we
2:19:36
managed to do anything. Yeah. And I'm worried
2:19:38
it's like that with this.
2:19:39
Like, on one level, I feel
2:19:41
kind of smug that I feel like I was ahead of the curve
2:19:43
on noticing this problem. But I'm also like, and we didn't
2:19:45
manage to fix it. Did we? We didn't manage to convince people.
2:19:47
So I guess, you know, there's both a degree
2:19:50
of smugness and we got to like eat humble pie at the same
2:19:52
time. I think it's I mean, I think in some ways
2:19:54
I feel better about this one. I think I think like
2:19:56
I do feel like the early concern about
2:19:58
AI was like productive.
2:19:59
We'll see, but
2:20:02
I generally feel like there is, the
2:20:05
public dialogue is probably different from what
2:20:07
it would have been if there wasn't a big set
2:20:09
of people talking about these risks and trying to understand
2:20:12
them and help each other understand them.
2:20:14
I think there's different people working in the field. We
2:20:17
don't have a field that's just 100% made of people
2:20:19
whose entire goal in life is making money. That
2:20:21
seems good. There's people
2:20:23
in government who care about this stuff, who are very
2:20:25
knowledgeable about it, who aren't just coming at it
2:20:28
from the beginning, who understand some of the big risks.
2:20:29
So, I
2:20:32
think good has been done. I think the situation
2:20:34
has been
2:20:35
made better. I think that's debatable. I don't think
2:20:37
it's totally clear. I'm not feeling like
2:20:39
nothing was accomplished, but yeah, I think
2:20:41
you're totally, I mean, I'm with you that being
2:20:44
right ahead of time, that is not- It's
2:20:46
not enough. It's not my goal in life. It is not effective
2:20:48
altruism's goal in life. You could be wrong ahead of time, be
2:20:51
really helpful. You could be right ahead of time, be really useless.
2:20:53
So yeah, I would definitely say, let's
2:20:55
focus pragmatically on solving this problem. All
2:20:58
these people who weren't interested before and are now,
2:21:00
let's be really happy that they're interested now and
2:21:02
figure out how we can all work together to reduce AI
2:21:05
risk. And let's notice how the winds are
2:21:07
shifting and how we can adapt.
2:21:08
Yeah. Okay, let's wrap
2:21:10
up this AI section. We've been talking
2:21:13
about this for a couple of hours, but interestingly, I feel like we've
2:21:15
barely scratched the surface on any of these different
2:21:17
topics. We've been keeping up a blistering
2:21:19
pace in order to not keep
2:21:21
you for your entire workday. I
2:21:24
guess it is just interesting how many different
2:21:26
aspects there are to this problem and how hard
2:21:28
it is to get a grip on all of them. And I
2:21:31
think one thing you said before we started recording is just that your
2:21:33
views on this are evolving very quickly. And
2:21:35
so I think probably we need to come back and have another
2:21:37
conversation about this in six or 12
2:21:38
months. And I'm sure you have more ideas
2:21:40
and maybe we can go into detail on some specific ones. Yeah,
2:21:43
that's right. I mean, I think if I were to kind of
2:21:45
wrap up where I see the AI situation right
2:21:47
now, I think there's definitely more interest. People
2:21:49
are taking risks more seriously. People are taking
2:21:51
AI more seriously. I don't think
2:21:54
anything is totally solved or anything,
2:21:56
even in terms of public attention. Alignment
2:21:58
research
2:21:59
has been really important for a long time, remains really
2:22:02
important. And I think it's like there's more
2:22:04
interesting avenues of it that are getting somewhat mature
2:22:06
than there used to be. There's more jobs in there. There's
2:22:08
more to do. I think the evals
2:22:11
and standards stuff is newer. And I'm
2:22:13
excited about it. And I think in a year, there may be like a
2:22:15
lot more to do there, like a lot, a lot. I think
2:22:17
another thing that I have been kind of updating on a
2:22:19
bit is that there is some amount of convergence
2:22:21
between different concerns about AI. And
2:22:24
we should lean into that while not getting too comfortable with
2:22:26
it. So I think, we're at a stage right
2:22:28
now where the main argument
2:22:29
that today's AI systems are not too dangerous
2:22:32
is that they just can't do anything that bad,
2:22:34
even if humans try to get them to. When that
2:22:37
changes, I think we should be more worried about
2:22:39
misaligned systems and we should be more worried about
2:22:41
aligned systems that bad people have access to.
2:22:43
I think for a while, those concerns are gonna
2:22:45
be quite similar and people who are concerned about aligned
2:22:48
systems and misaligned systems are gonna have a lot
2:22:50
in common. I don't think that's gonna be true
2:22:52
forever. So I think in a world where there's
2:22:55
pretty good balance of power and lots of different
2:22:57
humans have AIs and they're kind of keeping
2:22:59
each other in check, I think you would worry at that point less
2:23:01
about misuse and more about alignment because
2:23:03
misaligned AIs could end up all on one side
2:23:06
against humans or like mostly on
2:23:08
one side or just fighting each other in a way
2:23:10
where we're collateral damage. So, I
2:23:12
think right now a lot of what
2:23:15
I'm thinking about in AI is pretty convergent.
2:23:17
It's just like, how can we build a regime
2:23:19
where we detect
2:23:21
danger, which just means
2:23:23
anything that AI could do that feels like
2:23:25
it could be really bad for any reason and
2:23:27
stop it. And I think at some point it'll get harder to make
2:23:30
some of these trade-offs.
2:23:31
Okay, to be continued, let's
2:23:33
push on to something completely different, which is
2:23:35
this article that you've been working on where you lay
2:23:38
out your reservations about, well, I
2:23:40
think in one version you call it hardcore utilitarianism
2:23:42
and another one you call it impartial expected
2:23:44
welfare maximization. I think maybe
2:23:46
for the purposes of, I guess, the acronym is IEWM
2:23:49
in the article, but I think for
2:23:52
the purposes of an audio version, let's
2:23:54
just call this hardcore utilitarianism. So, give some
2:23:56
context to tee you up here a little. Yeah, this is a topic
2:23:59
that we've discussed with Joe Carlsmith in
2:24:01
episode 152, Joe
2:24:03
Carlsmith on navigating serious philosophical confusion.
2:24:05
And we also actually touched on it at the end of episode 147 with
2:24:08
Spencer Greenberg. Basically over the years,
2:24:10
you found yourself talking to people who
2:24:13
are much more all in on some sort
2:24:15
of utilitarianism than you are. And
2:24:17
I think from reading the article, the draft, I
2:24:19
think the conclusions they draw that bother you the most
2:24:22
are that open philanthropy or
2:24:24
that perhaps the effective altruism community should
2:24:26
only have been making grants to improve the long-term future.
2:24:28
And maybe only making grants related or only
2:24:30
doing any work related to artificial intelligence rather
2:24:33
than diversifying and hedging
2:24:35
all of our bets across a range of different worldviews
2:24:38
and splitting our time and resources between catastrophic
2:24:40
risk reduction as well as helping present
2:24:42
generation of people and also helping
2:24:45
non-human animals among other different
2:24:47
plausible worldviews. And also maybe
2:24:50
the conclusion that some people draw that they should act uncooperatively
2:24:52
with other people who have different values whenever
2:24:55
they think that they can get away with it. Yeah, do
2:24:57
you want to clarify any more of the
2:24:59
attitudes that you're reacting to here?
2:25:01
Yeah, I mean, one piece of clarification
2:25:04
is just like the piece, the articles you're talking about, one
2:25:06
of them was like a 2017 or 2018 Google doc that
2:25:09
I probably will just never turn into a public piece.
2:25:12
And another is a dialogue I started writing
2:25:14
that I do theoretically intend to publish
2:25:16
someday but it might never happen. It might
2:25:18
be a very long time. Yeah,
2:25:20
I don't know. The story I would tell is like, I
2:25:23
co-founded GiveWell, I've always been,
2:25:26
I've always been interested in doing the most good possible in
2:25:29
kind of a hand wavy, rough
2:25:31
like
2:25:32
way. One way of talking about what
2:25:35
I mean by doing the most good possible is like, there's
2:25:37
a kind of apples to apples principle that says,
2:25:40
when I'm choosing between two things that
2:25:42
really seem like they're pretty apples to
2:25:44
apples, pretty similar, I want to
2:25:46
do the one that helps more people more.
2:25:48
When I feel like I'm able to really make the comparison,
2:25:50
I want to do the thing that helps more people more. That
2:25:53
is a different principle from a more all encompassing
2:25:56
like, and everything can be converted
2:25:58
into apples, and all interventions
2:26:00
are on the same footing, and there's one
2:26:03
thing that I should be working on that is the best thing
2:26:05
and like there is an answer to whether it's better
2:26:08
to like increase the odds that many people
2:26:10
will ever get to exist versus like reducing
2:26:12
malaria in Africa versus like helping
2:26:15
chickens on factory farms. And I've always been like
2:26:17
a little less sold on that second
2:26:19
way of thinking. So there's the,
2:26:21
you know, the more-apples principle, that like I want
2:26:23
more apples when it's all apples, and then there's the
2:26:25
like, it's-all-one-fruit principle or something. These are really
2:26:27
good names that I just put on the spot that
2:26:30
I'm sure will stand the test of time
2:26:32
You know, I got into this world and I met other people interested
2:26:34
in similar topics. A lot of them, you know, identify
2:26:37
as effective altruists and I encountered these you
2:26:39
know, ideas that were more
2:26:41
hardcore and were more saying look
2:26:44
like I think the story I would basically tell would be something
2:26:46
like there is like one
2:26:48
correct way of thinking about what it means to do good
2:26:51
or to be ethical
2:26:52
that comes down to basically utilitarianism.
2:26:55
This can basically be seen:
2:26:58
(a) by looking in your heart and seeing that subjective experience is
2:27:01
all that matters and that everything else is
2:27:03
just heuristics for optimizing pleasure
2:27:05
and minimizing pain; and (b)
2:27:07
you can show it with various theorems like
2:27:10
you know, Harsanyi's aggregation theorem
2:27:12
tells you that if you're trying
2:27:14
to give others the deals and gambles
2:27:17
they would choose then it falls
2:27:19
out of that that you need some form of utilitarianism.
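For readers who want the theorem being gestured at here spelled out, a rough standard statement runs as follows (an editorial gloss, not something said in the interview): if each individual's preferences over gambles satisfy the expected-utility axioms, society's preferences do too, and society is indifferent between two gambles whenever every individual is, then society's utility function has to be an affine combination of the individuals' utilities,

$$
W(\ell) \;=\; b \;+\; \sum_{i=1}^{n} a_i \, U_i(\ell),
$$

where U_i is person i's von Neumann-Morgenstern utility over gambles, W is the social utility, and the a_i and b are constants. That weighted-sum form is the sense in which "some form of utilitarianism" is said to fall out of the premises.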
2:27:22
it's a bad piece kind of going into all the stuff
2:27:24
this means. And people kind of say, look,
2:27:26
like we think we have
2:27:27
really good reason to believe that after humanity
2:27:30
has been around for longer and is wiser, if this happens,
2:27:33
we will all realize that like the right way of thinking about
2:27:35
what it means to be a good person is just
2:27:37
to yeah
2:27:38
basically be utilitarian: take
2:27:40
the amount of pleasure minus
2:27:42
pain, add it up, maximize that,
2:27:45
be hardcore about it. Like,
2:27:47
don't like lie and be a jerk for no reason
2:27:49
but like if you ever somehow knew that doing
2:27:52
that was going to maximize utility, that's what you should
2:27:54
do. And I ran into that point
2:27:56
of view, and that point of view also was, I think, very, like,
2:27:58
eye-roll at the idea
2:27:59
that Open Philanthropy was going to do work
2:28:02
in long-termism and global
2:28:04
health and well-being. And, you know, my
2:28:07
basic story is like,
2:28:08
I have updated significantly toward
2:28:11
that worldview compared to where I started, but
2:28:14
I am still less than half,
2:28:16
less than half into it. And furthermore, the
2:28:18
way that I deal with that is not
2:28:20
by multiplying through and doing another layer of expected
2:28:23
value, but by saying, look, if
2:28:25
I have a big pool of money, I think
2:28:27
less than half of that money should be like following
2:28:29
this worldview.
2:28:31
I've been around for a long time in this community. I
2:28:33
think I've now heard out all
2:28:35
of the arguments, and that's still where I am.
2:28:37
And so, you know, my basic
2:28:40
stance is like, I think that
2:28:42
we are still very deeply confused about ethics.
2:28:45
I think
2:28:46
we don't really know what it means to do good.
2:28:49
And I think that reducing everything to like
2:28:52
utilitarianism is probably not workable.
2:28:54
I think it probably actually just breaks in very
2:28:57
simple mathematical ways. And
2:28:59
I think we probably have to have a lot
2:29:01
of arbitrariness in our views of ethics. I think
2:29:04
we probably have to have some version of just like caring
2:29:06
more about people who are more similar to us or
2:29:08
closer to us. And so I think,
2:29:10
you know, yeah, I still am basically unprincipled
2:29:13
on ethics. I still basically like
2:29:16
have a lot of things that I care about that I'm not sure why
2:29:18
I care about. I would still basically take a big pot of money
2:29:20
and divide it up between different things. I
2:29:23
still like believe in certain moral rules
2:29:25
that you got to follow,
2:29:26
not just as long as you don't know the outcome,
2:29:28
but you just got to follow them, end of story, period,
2:29:31
don't overthink it. That's where I am.
2:29:33
So I don't know. Yeah, I wrote a dialogue trying to explain
2:29:36
why this is for someone who thinks the reason
2:29:38
I would think this is because I hadn't thought through all the hardcore
2:29:40
stuff. And instead, just addressing the hardcore stuff
2:29:42
very directly. Yeah.
2:29:44
So yeah, perfect for this interview,
2:29:46
you might have thought that we would have ended up having a debate
2:29:49
about whether impartial expected welfare maximization
2:29:52
is the right way to live or the right theory of morality.
2:29:54
But actually, it seems like we mostly disagree on how many
2:29:56
people are actually
2:29:56
really all in on hardcore.
2:29:59
Yeah, right on. Hardcore utilitarianism.
2:30:03
I guess my impression is at least the people
2:30:06
that I talk to who maybe are like somewhat filtered
2:30:08
and selected, many people, including me, absolutely,
2:30:11
think that impartial expected welfare maximization
2:30:13
is underrated by the general public. And I
2:30:16
think that, yeah. Yeah. And that there's a
2:30:18
lot of good that one can do using,
2:30:20
if you focus on increasing wellbeing, there's an awful
2:30:22
lot of good that you can do there and that most people
2:30:25
aren't thinking about that. But nonetheless, I'm
2:30:27
not confident that we've solved philosophy. I'm not so confident
2:30:29
that we've solved ethics. The
2:30:31
idea that pleasure is good and suffering
2:30:34
is bad feels like among the most plausible
2:30:36
claims that one could make about what is valuable and what is
2:30:38
disvaluable. But we don't really... like, the
2:30:40
idea of things being objectively valuable is an incredibly
2:30:43
odd one. It's not clear how we could get any evidence
2:30:45
about that, that would be fully persuasive. And clearly
2:30:47
philosophers are very split. So people
2:30:50
kind of do this. We're forced to this odd position
2:30:52
of wanting to hedge our bets a bit between this theory
2:30:54
that seems like maybe the most plausible
2:30:57
ethical theory, but also having lots of conflicting
2:30:59
intuitions with it, and also being aware that many, many
2:31:01
smart people don't agree that this is the right
2:31:03
approach at all. But I mean, it sounds like you've
2:31:06
ended up in conversations with people who are, you know, maybe they
2:31:08
have some doubts, but they are like pretty hardcore.
2:31:11
They like really feel like there's a good chance that
2:31:13
when we look back, we're going to be like it was absolute, it was total
2:31:16
utilitarianism all along and everything else was completely
2:31:18
confused.
2:31:19
Yeah, I think that's right. I think you can,
2:31:21
there's definitely room for some nuance here. Like you don't
2:31:23
have to think you've solved philosophy. I think the position
2:31:25
a lot of people take is more like,
2:31:28
I don't really put any weight on random
2:31:30
common sense intuitions about what's good because
2:31:32
those have a horrible track record. Just
2:31:35
like the history of common sense morality looks like so
2:31:37
bad that I just don't really care what it says. So I'm
2:31:39
going to take like the best guess I've got at a systematic
2:31:42
science-like, you know, with good scientific
2:31:45
properties of like simplicity and predictiveness, system
2:31:47
of morality. That's the best I can do. And
2:31:49
furthermore, there's a chance it's wrong, but
2:31:52
you can do another layer of expected value
2:31:54
maximization and multiply that through. And so
2:31:56
I'm yeah, I'm basically going to act
2:31:58
as if maximizing
2:31:59
utility is all that matters, and specifically
2:32:02
maximizing the, you know, kind of like pleasure
2:32:04
minus pain type thing of subjective experience.
2:32:07
That is the best guess. That is how I should act. When
2:32:09
I'm unsure what to do, I may follow heuristics, but
2:32:12
if I ever run into a situation where the numbers just
2:32:14
clearly work out, I'm gonna do what the numbers say. Yeah,
2:32:16
and I think I not only think
2:32:19
that's not definitely right,
2:32:21
yeah, a minority of me is into that view.
2:32:23
So I think I would say, is it the most plausible
2:32:26
view? I would say no. I would say
2:32:29
the most plausible view of ethics is that it's
2:32:31
a giant mishmash of different things and
2:32:34
that what it means to be good and do good is
2:32:36
like a giant mishmash of different things and we're
2:32:38
not going to nail it anytime
2:32:40
soon. Is it the most plausible thing that's
2:32:42
kind of like neat and clean and well-defined?
2:32:45
Well,
2:32:45
I would say definitely total utilitarianism
2:32:48
is not.
2:32:49
I think total utilitarianism is completely screwed,
2:32:51
makes no sense, it can't work at all, but I think
2:32:53
there's a variant of it, sometimes called UDASSA,
2:32:56
that I'm okay kind of saying that's the most
2:32:58
plausible we got or something and gets like a decent
2:33:00
chunk but not a majority of what I'm thinking about.
2:33:03
Holden just used the term UDASSA, which
2:33:05
is U-D-A-S-S-A. It
2:33:07
stands for Universal Distribution Absolute
2:33:10
Self-Sampling Assumption. Now, you
2:33:12
probably don't know what UDASSA is and I don't
2:33:14
really either. It's some sort of attempt to
2:33:16
deal with anthropics and the universe
2:33:19
potentially being infinite in size by
2:33:21
not weighting all points in the universe equally and
2:33:23
instead assigning them ever-decreasing value following
2:33:25
some numbering system. The issue is
2:33:27
that if you keep adding an unlimited series of ones
2:33:30
you get an infinite sum and you have problems
2:33:32
making comparisons to any
2:33:34
other series that also sum to infinity. If
2:33:37
instead you add one and then a half and then
2:33:39
a quarter and then an eighth and then a sixteenth
2:33:41
and so on in an infinite series then
2:33:43
that series actually sums to a finite number,
2:33:46
that is two, and you will
2:33:48
be able to make comparisons with other such
2:33:50
series. If what I said didn't
2:33:53
make much sense to you don't worry it doesn't actually
2:33:55
need to. Just know that UDASSA is
2:33:57
some technical approach that might make utilitarianism
2:33:59
viable in an infinite universe. We'll stick
2:34:02
up a link for people who want to read more about UDASSA, but
2:34:04
I haven't and I wouldn't blame you if you don't want to
2:34:07
either. Okay, back to the show.
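For anyone who does want the two sums from that aside written out, this is the arithmetic being referred to (just the standard geometric-series fact; nothing here is specific to UDASSA itself):

$$
1 + 1 + 1 + 1 + \cdots \to \infty,
\qquad
1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \cdots \;=\; \sum_{n=0}^{\infty} \tfrac{1}{2^{n}} \;=\; 2.
$$

Equal weights on an unlimited series of locations give a divergent total that can't be compared with anything else, while ever-decreasing weights give a finite total that can.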
2:34:09
Maybe it would be worth laying out like, you
2:34:11
know, you're doing a bunch of work, presumably it's kind of stressful
2:34:13
sometimes, in order to help other people, and you started
2:34:16
GiveWell trying, you know, wanting to help
2:34:18
the global poor. What is your conception of
2:34:20
morality and what motivates you
2:34:23
to do things in order to make the world better?
2:34:25
A lot of my answer to that is just, I
2:34:27
don't know. Sometimes when people
2:34:30
interview me about these like thought experiments, you
2:34:32
save the painting, I'll just be like, I'm not
2:34:34
a philosophy professor. And like, look, that
2:34:36
doesn't mean I'm not interested in philosophy. Like I said, I think I've argued
2:34:38
this stuff into the ground. But like, a lot
2:34:41
of my conclusion is just like,
2:34:43
philosophy is a non-
2:34:46
rigorous methodology with an unimpressive
2:34:48
track record. And I
2:34:50
don't think it is that reliable
2:34:53
or that
2:34:54
important. And it isn't
2:34:57
that huge a part of my life. And I think I
2:34:59
find it really interesting. So that's not because I'm unfamiliar
2:35:01
with it. It's because I think it shouldn't
2:35:03
be. And so I'm kind of not that philosophical
2:35:06
a person in many ways. I'm super interested
2:35:08
in it. I love talking about it. I have lots of takes. But
2:35:10
I think when I make high stakes, important decisions
2:35:12
about how to spend large amounts of money, I'm not
2:35:15
that philosophical of a person. And most
2:35:18
of what I do does not rely on unusual
2:35:20
philosophical views. I think it can be
2:35:22
justified to someone with like quite
2:35:24
normal takes on ethics. Yeah. One
2:35:27
thing is that you're not a moral realist. So you
2:35:29
don't believe that there are kind of objective
2:35:31
mind independent facts about what is good and bad
2:35:34
and what one ought to do. I have never
2:35:36
figured out what this position is supposed to mean.
2:35:39
And I'm hesitant to say I'm not one because
2:35:41
I don't even know what it means. So if you can if you
2:35:43
can cash something out for me that has a clear
2:35:45
pragmatic implication, I will tell you if I am
2:35:48
or not. But I've never really even gotten what I'm disagreeing
2:35:50
with or agreeing with on that one. Yeah. Okay.
2:35:54
So that sounded
2:35:54
like
2:35:55
you had some theory of doing good
2:35:57
or some theory of the enterprise
2:35:59
that you're engaged in when you try to live morally
2:36:02
or when you try to make decisions about where you should
2:36:04
give money. But it's something about
2:36:06
acting on your preferences, about making
2:36:08
the world better. Something on acting, like it's at least
2:36:11
about acting on the intuitions you have about
2:36:13
what good behavior is. I generally am subjectivist.
2:36:17
When I hear subjectivism, that sounds right. When I
2:36:19
hear moral realism, I don't go, that sounds wrong. I
2:36:21
don't know what you're saying. And I
2:36:23
have tried to understand. I can try
2:36:25
again now if you want. Yeah. If
2:36:28
moral realism is true, it's a very queer thing,
2:36:31
as philosophers say. Realism
2:36:33
about moral facts is not seemingly the
2:36:35
same as scientific
2:36:36
facts about the world. It's not clear how
2:36:38
we're causally connected to these facts. Yeah,
2:36:41
exactly. I've heard many different
2:36:43
versions of moral realism. I think some
2:36:45
of them, I'm just like, this feels like a terminological
2:36:47
or semantic difference with my view. And
2:36:50
others, I'm just like, this sounds totally
2:36:52
nutso. I don't know. I
2:36:54
have trouble being in or out on this thing, because
2:36:56
it just means so many things. And I don't know which one it means.
2:36:58
And I don't know what the more interesting versions
2:37:00
are even supposed to mean. But it's fine. Yes,
2:37:03
I'm a subjectivist. More or less
2:37:06
the most natural
2:37:06
way I think about morality is just like, I
2:37:08
decide what to do with my life. And there's certain
2:37:11
flavors of pull that I have. And those are moral
2:37:13
flavors. And I try to make myself do
2:37:15
the things that the moral flavors are pulling me on. I think
2:37:17
that makes me a better person when I do. Yeah.
2:37:20
Okay. So maybe we have highlighted
2:37:22
the differences here. To imagine this conversation
2:37:24
where you're saying, no, I'm leading open
2:37:26
philanthropy. I think that we should split our
2:37:28
efforts between a whole bunch of different projects, each
2:37:31
one of which would look exceptional on a different
2:37:33
plausible worldview. And the hardcore utilitarian
2:37:35
comes to you and says, no, you should choose the best one
2:37:37
and just fund that. Or you like spend all of your resources
2:37:40
and all of your time just focused on that best one. What
2:37:42
would you say to them in order to justify the worldview
2:37:45
diversification approach?
2:37:46
Yeah, I mean, the first thing I would say to them is just
2:37:49
like, burden's on you. And I think this
2:37:51
is kind of a tension I often have with people who consider
2:37:54
themselves hardcore. They'll just,
2:37:56
you know, it's like, they'll just be like, well, why wouldn't you be
2:37:58
a hardcore utilitarian? Like, what's the problem?
2:37:59
and it's just maximizing the pleasure and
2:38:02
minimizing the pain or the sum or the difference.
2:38:04
And I would just be like, no, no, no, you've got to tell
2:38:06
me because I am sitting
2:38:08
here with these great opportunities
2:38:11
to help huge amounts of people in very
2:38:14
different and hard to compare ways. And the
2:38:16
way I've always done ethics before in my life is
2:38:19
I basically have some voice inside me and it says,
2:38:21
this is what's right. And that voice has to carry some
2:38:23
weight. It's like even on your model, that voice has to carry some
2:38:25
weight because you, the hardcore utilitarian,
2:38:27
not Rob, because we all know you're not at all. But
2:38:31
the,
2:38:31
it's like even the most systematic theories of ethics,
2:38:34
it's like, they're all using that little voice
2:38:36
inside you that says what's right. That's the arbiter
2:38:38
of all the thought experiments. So that we're all putting weight
2:38:40
on it somewhere, somehow. And I'm like,
2:38:43
cool, that's gotta be how this works. There's
2:38:45
a voice inside me saying, this feels right, this feels wrong.
2:38:47
That voice has gotta get some weight. That voice
2:38:49
is saying, you know what? Like,
2:38:52
it is really interesting to think about these risks to
2:38:54
humanity's future, but also like,
2:38:56
it's weird. Like, this work is not
2:38:58
shaped like the other work. It doesn't have as good
2:39:00
feedback loops. It feels icky.
2:39:03
Like a lot of this work is about just
2:39:05
basically supporting people who think like us, or
2:39:07
feels that way a lot of the time. And it just feels
2:39:10
like doesn't have the same ring of
2:39:12
ethics to it. And then on the other hand, it just
2:39:14
feels like, I'd be kind of a jerk if like, like OpenPhil,
2:39:16
I believe, and you can disagree with me, is like not
2:39:19
only the biggest, but the most effective farm animal
2:39:21
welfare funder in the world. And I think
2:39:23
we've had enormous impact and made animals'
2:39:25
lives dramatically better. And
2:39:27
coming to say to me, no, you should take all that money and put
2:39:29
it like into the like diminishing
2:39:32
margin of like supporting people
2:39:34
to think about some future x-risk
2:39:36
in a domain where you're mostly have
2:39:39
a lot of these concerns about insularity. Like
2:39:41
you've got to make the case to me because the normal
2:39:44
way all this stuff works is you like listen
2:39:46
to that voice inside your head and you care what it says. And
2:39:48
some of the opportunities OpenPhil has to do a lot
2:39:50
of good are quite extreme and we do them.
2:39:52
So that's the first thing is we've gotta
2:39:54
put the burden of proof in the right place. Cause I think
2:39:56
utilitarianism is definitely interesting and has some things
2:39:58
going for it, especially if
2:39:59
you patch it and make it UDASSA, although
2:40:02
that makes it less appealing. But you got
2:40:04
to... Where's the burden of proof? Yeah.
2:40:06
Yeah. Okay. So, to buy
2:40:08
into this, the hardcore utilitarian view,
2:40:10
I guess one way to do it would be, so you're committed to moral
2:40:12
realism. I guess you might be committed
2:40:14
to hedonism as a theory of value, so it's only
2:40:17
pleasure and pain. I guess then you also
2:40:19
want to add on kind of a total view,
2:40:21
so it's just about the complete aggregate
2:40:23
there. That's all that matters. You're going to say there's no
2:40:25
side constraints and kind of all of your other conflicting
2:40:28
moral intuitions are worthless, so you
2:40:30
should completely ignore those. Are there
2:40:32
any other moral, philosophical commitments
2:40:34
that underpin this view that you think are implausible
2:40:37
and haven't been demonstrated to a
2:40:39
sufficient extent? I don't think you need all those at all.
2:40:41
I mean, I've written
2:40:43
up this series... I mean, I'd steelman the hell out
2:40:45
of this position, as well as I could. I've
2:40:48
written up this series called Future Proof Ethics, and I think
2:40:50
the title kind of has been confusing and
2:40:52
I regret it, but it is trying to get at this idea
2:40:54
that I want an ethics that,
2:40:56
whether because it's correct and real, or
2:40:59
because it's what I would come to
2:41:01
on reflection, I want an ethics that's in some
2:41:03
sense better, that's in some sense what
2:41:05
I would have come up with if I had more time to think about
2:41:08
it. What would that ethics look like? I
2:41:10
don't think you need moral realism to care about this. You
2:41:12
can make a case for utilitarianism
2:41:15
that just starts from
2:41:16
gosh, humanity has a horrible track record
2:41:19
of treating people horribly. We should really try and
2:41:21
get ahead of the curve. We shouldn't be listening
2:41:23
to common sense intuitions that are actually going to be quite
2:41:26
correlated with the rest of our society, and that looks bad
2:41:28
from a track record perspective. We need to
2:41:30
figure out the fundamental principles of morality as
2:41:32
well as we can. We're not going to do it perfectly,
2:41:35
and that's going to put us ahead of the curve and make us less
2:41:37
likely to be the kind of people that would think we were
2:41:39
moral monsters if we thought about it more. You don't need moral
2:41:42
realism. You don't need
2:41:44
hedonism at all. I
2:41:46
think
2:41:46
you
2:41:48
can just say... I mean, most people
2:41:50
do do this with hedonism, but I think you can
2:41:52
just say, if you want to use Harsanyi's
2:41:54
aggregation theorem, which means if you
2:41:56
basically
2:41:57
want it to be the case that every
2:41:59
time... everyone would prefer
2:42:01
one kind of state of affairs to another, you do
2:42:04
that first state of affairs. You can get from
2:42:06
there and some other assumptions to
2:42:08
basically at least a form of utilitarianism
2:42:11
that says
2:42:12
a large enough number of small benefits
2:42:14
can outweigh everything else. I call this the
2:42:17
utility legions corollary. It's like a play
2:42:19
on utility monsters. But it's like, once
2:42:22
you decide that something is
2:42:24
valuable, like helping a chicken or
2:42:26
helping a person get a chance to exist, that
2:42:28
there's some number of that thing that can outweigh everything
2:42:30
else. And I think that doesn't reference
2:42:33
hedonism. It's just this idea of like, come
2:42:35
up with anything that you think is non-trivially beneficial
2:42:38
and a very large number of it beats everything
2:42:40
and wins over the ethical calculus.
2:42:42
That's like a whole set of mathematical
2:42:44
or whatever logical steps you can take that
2:42:46
don't invoke hedonism at any point.
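A rough sketch, in symbols, of the two ideas gestured at here; this is a paraphrase of Harsanyi-style aggregation and the "utility legions" corollary, not a formulation taken from the conversation. Roughly: if individual and social preferences over lotteries all satisfy the von Neumann-Morgenstern axioms, and society prefers whatever every individual prefers, then social value takes the form of a weighted sum of individual utilities:

W(x) = \sum_i w_i \, U_i(x), \qquad w_i \ge 0.

The "utility legions" corollary as described is then just that once some unit of benefit has value b > 0 on that scale, for any fixed alternative worth V there is an N with N \cdot b > V: enough copies of any non-trivial benefit outweigh everything else.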
2:42:49
So I think the steelmanned version of this would not have a million
2:42:51
premises. It would say, look,
2:42:53
we really want to be ahead of the curve. That
2:42:55
means we want to be systematic, built on a minimal
2:42:58
set of principles, so that we can
2:43:00
be systematic and make sure that we're really only
2:43:02
basing our morality on the few things we feel best about.
2:43:05
And that's how we're going to avoid being moral monsters.
2:43:08
One of the things we feel best about is this
2:43:10
utilitarianism idea, which has the utility legions
2:43:12
corollary. Once we've established that,
2:43:14
now we can establish that like a
2:43:17
person who gets to live instead of not living
2:43:19
is a benefit. And therefore, enough of them can
2:43:21
outweigh everything else. And then we can say, look,
2:43:24
if there's 10 to the 50 of them in the future, in expectation,
2:43:27
that outweighs everything else that we could ever
2:43:29
plausibly come up with.
2:43:30
That to me is the least-assumptions route.
2:43:32
And then you can tack on some other stuff. You can be like,
2:43:35
also, like people have thought this way in the past,
2:43:37
did amazing. Jeremy Bentham was like
2:43:39
the first utilitarian, and he was like early
2:43:42
on women's rights and animal rights and anti-slavery
2:43:44
and all this other like gay rights and like all
2:43:46
this other stuff. And so this is just like, yep,
2:43:49
it's like a simple system.
2:43:50
It looks great looking backward. It's
2:43:53
built on these rock solid principles of
2:43:55
like utilitarianism and systematicity
2:43:58
and maybe sentientism,
2:43:59
which is a thing I didn't quite cover, which
2:44:02
I should have. But, you know, radical impartiality,
2:44:05
like caring about everyone, no matter where they
2:44:07
are, as long as they're like physically identical, you have
2:44:09
to care about them the same. You could basically
2:44:11
derive from that, this system, and then that
2:44:13
system tells you there's so many future generations
2:44:16
that it just everything has to come down to them. Now,
2:44:18
maybe you have heuristics about how to actually help them, but everything
2:44:20
ultimately has to be how to help them. So that would be my steel man.
2:44:23
Yeah. Okay. Okay. So I've much more often encountered
2:44:25
this, the kind of grounding
2:44:27
for this that Sharon Hewitt Rawlette
2:44:29
talked about
2:44:29
in episode 138, which is
2:44:32
this far more philosophical approach
2:44:35
to it. But, you know, the case you make there, it
2:44:37
doesn't sound like some watertight thing
2:44:39
to me because, well, especially once you start making
2:44:41
arguments like, oh, it has a good historical track record,
2:44:43
you'd be like, well, I'm sure I've
2:44:45
got some stuff wrong. And also, like, maybe it could be right
2:44:48
in the past, but wrong in the future. It's not an
2:44:50
overwhelming argument. But I guess, yeah, what do you say to people
2:44:52
who bring to you this basic steel
2:44:55
man of the case?
2:44:56
Yeah, I say a whole bunch of stuff. I mean, I think
2:44:58
I would say for the first thing
2:45:00
I would say is just like, it's not enough. Like, you
2:45:03
know, it's just it's just we're not talking about a rigorous
2:45:06
discipline here. You haven't done enough.
2:45:08
The stakes are too high. You haven't
2:45:10
done the work to establish this. The
2:45:13
specific things I would get into, I would first
2:45:15
just be like, I
2:45:16
don't believe you buy your own story. I think
2:45:19
I've basically, you
2:45:20
know, I think even the people who believe themselves very
2:45:22
hardcore utilitarians, it's because
2:45:24
no one designs thought experiments
2:45:27
just to mess with them. And I think you totally can.
2:45:30
I mean, you know, one thought
2:45:33
experiment I've kind of used, and not
2:45:35
everyone is going to reject some of these. But, you know,
2:45:37
one of them is it's like, well, there's an asteroid
2:45:40
and it's about to hit Cleveland and destroy it entirely
2:45:42
and kill everyone there.
2:45:43
But no one will be blamed. You know,
2:45:45
somehow this is like having a neutral effect on
2:45:48
long run future. Would you prevent the
2:45:50
asteroid hitting Cleveland for 35 cents?
2:45:53
And it's like, well, you could give that 35 cents to Center
2:45:55
for Effective Altruism or 80,000 Hours
2:45:57
or MIRI. So as
2:45:59
a hardcore utilitarian, your answer has to be
2:46:02
no, right? No. Someone offers you this,
2:46:04
no one else can do it. You
2:46:06
either give 35 cents or you don't to stop
2:46:08
the asteroid from hitting Cleveland. You say no because you want to donate
2:46:10
that money to something else, right? I think most
2:46:13
people will not go for that. Nobody.
2:46:15
I think there are simpler ones than this, where
2:46:17
I think like most of these hardcore utilitarians actually
2:46:20
are (not all of them, but actually most of them are)
2:46:22
super, super into honesty. They try
2:46:24
to defend it. They'll be like, well, clearly
2:46:26
honesty is like the way to maximize utility. And I'm
2:46:28
just like, how did you
2:46:29
figure that out? Like, what? Like, your
2:46:32
level of honesty is way
2:46:34
beyond what most of the most successful
2:46:36
and powerful people are doing, so like, how does
2:46:38
that work? How did you determine
2:46:41
this? This can't be right. And so I think most of these
2:46:43
hardcore utilitarians actually have tensions within
2:46:45
themselves that they aren't recognizing, that you can
2:46:47
just draw out if you red team
2:46:49
them instead of doing the normal philosophical thought
2:46:51
experiments the same way you would with normal people.
2:46:54
And then another place I go to challenge this view
2:46:56
is that I do think
2:46:58
the principles people try to build this thing
2:47:00
on, the central ones, are the utilitarianism
2:47:03
idea, which is like this thing that I didn't
2:47:05
explain well with Harsanyi's aggregation theorem. But
2:47:08
I do have it written up, you can link to it. I
2:47:10
could try and explain it better, but whatever, I think it's
2:47:12
a fine principle, so I'm not gonna argue with it. The
2:47:14
other principle people are leaning on is impartiality,
2:47:17
and I think that one is screwed, and it doesn't work at all.
2:47:19
Yeah, yeah, yeah, so you think the impartiality
2:47:22
aspect of this is just completely busted. Yeah,
2:47:24
do you want to elaborate on why that is?
2:47:26
Yeah, so this is something, I
2:47:28
mean, I think you covered it a bit with Joe, but I had a little
2:47:30
bit of a different spin on it, a little bit more of an aggro
2:47:32
spin on it. One way to think about impartiality,
2:47:35
like a minimum condition for what we might mean by
2:47:37
impartiality, would be that if two persons
2:47:40
(or people or whatever; I call them persons
2:47:42
just to include animals and AIs
2:47:45
and whatever), you know, two persons, let's
2:47:47
say they're physically identical,
2:47:49
then you should care about them equally. I would
2:47:51
kind of claim this is like, if
2:47:54
you're not meeting that condition, then
2:47:56
it's weird to call yourself impartial and
2:47:58
something is up;
2:47:59
probably the hardcore person is not a big fan of you.
2:48:02
And I think you just can't do that. And
2:48:05
all the infinite ethics stuff, it just completely
2:48:07
breaks that. Not in a way that's just like a weird
2:48:10
corner case. Sometimes it might not work. It's just like,
2:48:12
actually, should I donate
2:48:15
a dollar to charity? Well,
2:48:18
across the whole multiverse,
2:48:20
incorporating expected value and a finite
2:48:22
non-zero probability of an infinite
2:48:25
size universe, then it just follows
2:48:27
that my dollar helps and hurts infinite
2:48:30
numbers of people. And there's no answer
2:48:32
to whether it is a good dollar or a bad dollar,
2:48:34
because if it helps one person,
2:48:37
then it hurts a thousand, then it helps one, then it hurts a thousand
2:48:39
onto infinity, versus if it helps a thousand,
2:48:41
then it hurts one, then it helps a thousand, then it hurts one onto
2:48:44
infinity, those are the same. They're just rearranged.
2:48:46
There is no way to compare two infinities like that. It cannot
2:48:48
be done. It's not like no one's found it yet. It
2:48:50
just can't be done. Your system actually
2:48:53
just breaks completely. It just doesn't... It won't tell
2:48:55
you a single thing. It returns an undefined
2:48:57
every time you ask it a question.
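A toy numerical sketch of the rearrangement problem being described, assuming a hypothetical tally where each person helped counts +1 and each person hurt counts -1; this is my own illustration, not something from the conversation. The same two infinite groups give running totals that head in opposite directions depending purely on the order you count them in, which is why no order-independent total exists.

from itertools import chain, repeat

def running_total(helped_per_block, hurt_per_block, blocks=1000):
    # Count `helped_per_block` people helped (+1 each), then `hurt_per_block`
    # people hurt (-1 each), and repeat this pattern `blocks` times;
    # return the partial sum of the tally so far.
    terms = chain.from_iterable(
        chain(repeat(+1, helped_per_block), repeat(-1, hurt_per_block))
        for _ in range(blocks)
    )
    return sum(terms)

print(running_total(1, 1000))   # 1 helped, then 1,000 hurt, repeated: heads toward minus infinity
print(running_total(1000, 1))   # 1,000 helped, then 1 hurt, repeated: heads toward plus infinity

Both enumerations draw on the same two infinite groups of people; only the counting order differs, and the partial sums diverge in opposite directions as the number of blocks grows, so there is no single total to point to.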
2:48:59
We're rushing through this, but that was actually kind of the bottom
2:49:02
line of the episode with Alan Hájek. It
2:49:04
was episode 139 on
2:49:05
puzzles and paradoxes in probability and expected value.
2:49:08
It's just, it's a bad picture. It's not
2:49:10
a pleasant, it's not a pretty picture. Yeah, yeah, exactly.
2:49:12
And I have other beefs with it personally. I think
2:49:14
I should actually go on for quite a while and at some point I'll
2:49:16
write it all up, but I think just like anything
2:49:18
you try to do where you're just like, here's a physical
2:49:20
pattern or a physical process or a physical thing
2:49:23
and everywhere it is, I care about it equally. I'm just like,
2:49:25
you're going to be so screwed. It's not going to work.
2:49:28
The infinities are the easiest-to-explain way it doesn't
2:49:30
work, but it just doesn't work. And
2:49:32
so the whole idea that you were building
2:49:34
this beautiful utilitarian system on
2:49:36
one of the things you could be confident in, well,
2:49:39
one of the things you were confident in was impartiality
2:49:41
and it's got to go. And like, you know,
2:49:43
Joe kind of presented, it's like, well, you have these tough choices
2:49:45
in infinite ethics because you can't have all of Pareto
2:49:48
and impartiality, which he called anonymity and
2:49:51
transitivity. And I'm like, yeah, you can't have all of them.
2:49:54
You got to obviously drop impartiality.
2:49:56
You can't make it work. The other two are better. Keep
2:49:59
the other two, drop impartiality.
2:49:59
Once you drop impartiality,
2:50:02
I don't know, now we're in the world of just
2:50:04
like, some things are physically identical,
2:50:06
but you care more about one than the other. In some ways, that's
2:50:08
a very familiar world. Like, I care more about my
2:50:10
family than about other people, really not for any
2:50:12
good reason.
2:50:13
You just have to lean into that because that's what you are
2:50:15
as a human being. You care more about some things
2:50:17
than others, not for good reasons. You
2:50:19
can use that to get out of all the infinite ethics
2:50:22
jams. It's like there's some trick to it, and it's
2:50:24
not super easy, but basically as long as you're
2:50:26
not committing to caring about everyone, you're gonna
2:50:28
be okay. And as long as you are, you're not. So don't care about
2:50:30
everyone. And this whole fundamental principle that
2:50:32
was supposed to be powering this beautiful morality, just doesn't work.
2:50:35
Yeah. Yeah, do you want to explain a little bit
2:50:38
the mechanism that you'd use to get away from it? But
2:50:40
basically you could have,
2:50:41
if you define some kind of central point and then have
2:50:43
some, like,
2:50:45
as things get further and further away from that central point,
2:50:47
then you value them less. As long
2:50:49
as you value them less at a sufficiently rapid rate, then
2:50:51
things sum to one rather than ending up summing to infinity.
2:50:54
And so... Yeah, exactly. So now you can make comparisons
2:50:56
again.
2:50:57
Yeah, and this is all me bastardizing and oversimplifying
2:51:00
stuff. But basically you need some system that
2:51:02
says we're discounting things
2:51:04
at a fast enough rate that everything adds up to a finite
2:51:07
number, and we're discounting them even
2:51:09
when they're physically identical to each other. We gotta have some
2:51:11
other way of discounting them. So like, you know, a stupid
2:51:13
version of this would be like you declare
2:51:15
a center of the universe in space and
2:51:18
time and Everett branch and everything like that,
2:51:20
and you just like discount by distance from that center.
2:51:23
And if you discount fast enough, you're fine, and you don't run into
2:51:25
the infinities. You know, the way that I think
2:51:27
more people are
2:51:28
into, I've referred to it a couple times already, UDASSA,
2:51:31
is like you kind of say, hey, I'm
2:51:33
going to discount you by how long a computer
2:51:35
program I have to write to point to you. And
2:51:37
then you're gonna be like, what the hell are you talking
2:51:39
about? What computer program in what language?
2:51:42
And I'm like, whatever language, pick a language, it'll work.
2:51:44
And you're like, but that's so horrible, that's so arbitrary. So
2:51:47
if I pick Python versus I picked Ruby, then
2:51:49
that'll affect who I care about. And I'm like, yeah, well, it's
2:51:51
all arbitrary, it's all stupid. But at least you don't get
2:51:53
screwed by the infinities.
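A toy sketch of what "discount fast enough and the infinities go away" can look like, with an arbitrary reference point and an arbitrary discount factor of my own choosing; it illustrates the general move, not anything UDASSA or the conversation literally specifies.

def discounted_value(benefit_at, w=0.5, horizon=1000):
    # Weight the person at "distance" d from the reference point by w**d, with 0 < w < 1,
    # so even infinitely many people carry finite total weight. Truncating at `horizon`
    # is fine because the geometric tail is negligible.
    return sum((w ** d) * benefit_at(d) for d in range(horizon))

world_a = discounted_value(lambda d: +1)   # everyone helped by one unit: total is about +2.0
world_b = discounted_value(lambda d: -1)   # everyone hurt by one unit: total is about -2.0
print(world_a > world_b)                   # True: the weighted sums are finite and comparable

Swap the distance-based weight for one that shrinks with the length of the shortest program needed to pick someone out and you get the flavor of the weighting described just above; either way, the work is done by weights that sum to something finite.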
2:51:55
Anyway, so I think if I were
2:51:57
to take the closest thing
2:51:58
to a beautiful, simple utilitarian
2:51:59
system that gets everything right, UDASSA
2:52:03
would actually be my best guess, and it's pretty
2:52:05
unappealing, and most people who say they're hardcore say they
2:52:07
hate it. I think it's the best contender. It's better than actually
2:52:10
adding everything up.
2:52:11
So that's one approach you could take. I
2:52:13
guess
2:52:14
the Infinity stuff makes me sad,
2:52:16
because, inasmuch as
2:52:19
we're right, we're just not going to be able to solve this. We're
2:52:22
not going to come up with any elegant solution that resembles our
2:52:24
intuitions or that embodies
2:52:26
impartiality in the way that we care about. Now
2:52:29
we're valuing one person because it was easier to specify
2:52:31
where they are using the Ruby programming
2:52:33
language. That doesn't capture
2:52:35
my intuitions about value or about ethics.
2:52:39
It's a very long way from them in actual fact.
2:52:41
It feels like any system like that is just
2:52:43
so far away from what I enter this
2:52:45
entire enterprise caring about that
2:52:47
I'm tempted to just give up and embrace
2:52:49
nihilism. Yeah, I think
2:52:51
that's a good temptation. Not nihilism.
2:52:54
I'm just going to do stuff that I want to do. Yeah.
2:52:56
Well, I mean look, that's kind of where I am.
2:52:58
I mean I think I'm like, look, UDASSA is the best you
2:53:00
can do. You probably like it a lot less than what you
2:53:02
thought you were doing. And a reasonable response
2:53:05
would be screw all this. And then after you screw all this,
2:53:07
okay, what are you doing? And I'm like, okay, well
2:53:09
what am I doing? I still
2:53:11
like my job. I still care about my family. And you
2:53:13
know what? I still want to be a good person. What does that mean? I
2:53:16
don't know. I don't know. Like, I notice when
2:53:18
I do something that feels like it's bad. I notice when
2:53:20
I do something that feels like it's good. I notice that
2:53:22
like,
2:53:23
I'm glad I started a charity evaluator
2:53:25
that helps people in Africa
2:53:28
and India instead of just spending my whole
2:53:30
life making money. Like, I don't know. That didn't
2:53:32
change. I'm still glad I did that.
2:53:34
And I'm just, I don't have a beautiful philosophical
2:53:36
system that gives you three principles that can derive it, but
2:53:38
I'm still glad I did it. And that's pretty much where
2:53:40
I'm at. And that's where I come back to just being like, I'm
2:53:42
not that much of a philosophy guy because I think philosophy
2:53:45
isn't really that promising.
2:53:46
But I am a guy who like works really hard to try and do
2:53:48
a lot of good because I don't think you need to be a philosophy guy to do that.
2:53:51
You said earlier in the interview that, you know, if your
2:53:53
philosophy feels like it's breaking, then
2:53:55
that's probably a problem with the philosophy rather than with
2:53:57
you. And I wonder whether we can turn that to this case where
2:53:59
we say Well, we don't really know why induction
2:54:02
works, but nonetheless, we all go about
2:54:04
our lives as if induction is reasonable.
2:54:06
And likewise, we might say,
2:54:07
we don't know the solution to these infinity paradoxes
2:54:10
in the multiverse and all of that,
2:54:12
but nonetheless, impartial welfare
2:54:14
maximization feels right. And so hopefully at some
2:54:16
point we'll figure out how to make this work and how to make
2:54:19
it reasonable. And, you know, in the meantime,
2:54:21
I'm not going to let these funny philosophical thought experiments
2:54:24
take away from me what I thought the core of ethics
2:54:26
really, really was.
2:54:27
But my question is, why is that the core of ethics? So my
2:54:29
thing is, I want to come back to the burden of proof. I
2:54:32
think I just want to be like, fine, we give up.
2:54:34
Now what are we doing? And I'm like, look, if someone
2:54:36
had a better idea than induction, I'd be pretty interested,
2:54:38
but it seems like no one does. But like, I
2:54:40
do think there is an alternative to these
2:54:43
like very simple, beautiful systems of
2:54:45
ethics that like tell you exactly
2:54:47
when to break all the normal rules. I think the alternative
2:54:49
is just like, you don't have a beautiful system, you're just like a person like
2:54:51
everyone else. Just imagine that you're not
2:54:54
very into philosophy
2:54:55
and you still care about being a good person. That's most people.
2:54:57
You can do that. It's your default. Then
2:55:00
you got to talk me out of that. You got to be like, Holden, here's something
2:55:02
that's much better than that, even though it breaks. And
2:55:04
I'm like, yeah, I haven't heard someone do
2:55:06
that. Yeah. Well, despite everything
2:55:08
you've just said, you say that you think that impartial expected
2:55:10
welfare maximization is underrated. Do you think
2:55:12
you wish that people did it more, like that the
2:55:14
random person on the street did it more? Do
2:55:17
you want to explain how that can possibly be?
2:55:19
I don't think it is that bad. I
2:55:21
mean, I don't think like there's no way it's going
2:55:24
to be like the final answer or something. Like,
2:55:27
I don't know, like something like that. And later we're
2:55:29
going to come up with something better that's kind of like it. There's
2:55:31
going to be partiality in there. It might be
2:55:33
that it's some sort of like,
2:55:35
I tried to be partial and arbitrary, but in
2:55:37
a very principled way where I just kind of take
2:55:40
the universe as I live in it and try
2:55:42
to be fair and nice to those
2:55:45
around me. And I have to weight them a certain way. And
2:55:47
so I took the simplest way of weighting them. And
2:55:49
it's like, it's not going to be as compelling as
2:55:51
the original vision for utilitarianism was supposed to
2:55:53
be. I don't think it's that bad. And I think there's like
2:55:56
some arguments that are actually like this weird
2:55:58
simplicity criterion of like how
2:55:59
easy it is to find you with a computer program, you
2:56:02
could think of that as like,
2:56:04
what is your measure in the universe or like how much
2:56:06
do you exist or how much of you is there in the universe?
2:56:09
There are some arguments you could think of it that way. So I don't know.
2:56:11
I don't think UDASSA is like totally screwed, but
2:56:13
I'm not about to like shut down Open Philanthropy's
2:56:15
like Farm Animal Welfare program because
2:56:17
of this UDASSA system. So that's, yeah, it's more or less
2:56:19
the middle ground that I've come to. Yeah.
2:56:21
You know, I also just think there's a lot of good in just, without
2:56:25
the beautiful system, just challenging yourself, just
2:56:27
saying, hey, common sense morality really
2:56:29
has done badly. Can I do better? Can
2:56:31
I like
2:56:32
do some thought experiments until I really believe
2:56:34
with my heart that I care a lot more about the future
2:56:37
than I thought I did and think a lot about the future? I
2:56:39
think that's fine. I think the part where you say
2:56:41
the 10 to the 50 number is taken literally
2:56:43
and like is in the master system
2:56:45
as exactly 10 to the 50 times as important
2:56:48
as saving one life. I think that's the dicier part.
2:56:50
Yeah. I thought you might say that, you know, the
2:56:53
typical person walking around who hasn't thought about any of these
2:56:55
issues, they nonetheless care about other
2:56:57
people and about having their lives go well, like at least a
2:56:59
bit.
2:57:00
And they might not have appreciated like just
2:57:02
how large an impact they could have if they turned a bit
2:57:04
of their attention to that how much they might be able to help other
2:57:06
people. So without any like deep philosophy or
2:57:08
any like great reflection or changing in their values,
2:57:10
it's actually just pretty appealing to help
2:57:13
to like do things that effectively help other people. Yeah.
2:57:15
And that's kind of what motivates you, I imagine. Totally.
2:57:18
I love trying to do as much good as possible, with good
2:57:20
defined in kind of a sloppy way that isn't a beautiful
2:57:23
system. And I even like the philosophical thought
2:57:25
experiments. They have made me move a bit more
2:57:27
toward caring about future generations
2:57:29
and
2:57:30
especially whether they get to exist, which I think intuitively
2:57:32
is not exciting to me at all still isn't that
2:57:34
exciting to me, but it's more than it used to be. So,
2:57:36
you know, I think I think there's like value in here,
2:57:39
but the value comes from like wrestling with the stuff, thinking
2:57:41
about it and deciding where your heart comes out in the
2:57:43
end. But but I just think the dream of a beautiful system
2:57:45
isn't there. I guess the final thing I want
2:57:47
to throw in there, too, as I mentioned this earlier in the podcast.
2:57:50
But if you did go in on UDASSA
2:57:52
or you had the beautiful system or you somehow
2:57:54
managed to be totally impartial, I do
2:57:56
think longtermism is a weird conclusion from that.
2:57:58
And so you at least should
2:57:59
realize that what you actually
2:58:02
should care about is something far weirder than future generations.
2:58:05
And if you're still comfortable with it, great. And if you're not,
2:58:07
you may want to also rethink things. Yeah.
2:58:09
So a slightly funny thing about having this conversation
2:58:11
in 2023 is that I think worldview
2:58:14
diversification doesn't get
2:58:16
us as far as it used to, or the idea of wanting
2:58:18
to split your bets across different worldviews. Yeah,
2:58:21
yeah, yeah. As AI becomes like just a more
2:58:23
dominant and obviously important consideration
2:58:25
in how things are going to play out, like not
2:58:28
just for strange existential risk related reasons, but
2:58:30
it seems incredibly related now to, you know, we'll
2:58:32
be able to get people out of poverty. We'll be able
2:58:34
to solve lots of medical problems. It wouldn't be
2:58:37
that crazy to try to help farm animals by doing something
2:58:39
related to ML models at some point in
2:58:41
the next few decades. And also if you think that it's
2:58:43
plausible that we could go extinct
2:58:44
because of AI in the next 10 years,
2:58:46
then just from a life-saving lens, just
2:58:48
in terms of saving lives of people alive right now, it
2:58:51
seems like an incredibly important and neglected issue.
2:58:53
It's just a funny situation to be in where like the different
2:58:56
worldviews that we kind of picked out 10 years ago now
2:58:58
like might all kind of be converging, at least temporarily
2:59:00
on a very similar set of activities because of some
2:59:03
very like
2:59:04
odd historically abnormal, like
2:59:06
indeed like deeply suspicious empirical
2:59:08
facts that we happen to be living through right now.
2:59:11
That's exactly how I feel. And it's all very awkward
2:59:13
because it just makes it hard for me to explain what
2:59:15
I even disagree with people on because I'm kind of like, well,
2:59:18
I do believe we should be mostly focused on AI
2:59:20
risk, though not exclusively. And I am
2:59:23
glad that Open Phil puts money in other things. But
2:59:25
you know, I do believe AI risk is like the biggest
2:59:27
headline, because of these crazy
2:59:29
historical events that could be upon us. I
2:59:32
disagree with these other people who say we should be in AI
2:59:34
risk because of this insight about the size of the future. Well,
2:59:36
that's awkward. And it's kind of a strange
2:59:38
state of affairs. And I haven't always known what to do with it.
2:59:41
But yeah, I do feel that the effective altruism community
2:59:43
has
2:59:44
kind of felt philosophy-first, it's kind of
2:59:46
felt like our big insight is there's a lot
2:59:48
of people in the future. And then we've kind of
2:59:50
worked out the empirics and determined the biggest threat to
2:59:52
them is AI. And I just like
2:59:55
reversing it. I just like being like,
2:59:57
we are in an incredible historical period, no
2:59:59
matter what your philosophy.