Episode Transcript
0:00
Hey, this is David Eagleman and this past week was
0:02
my birthday, so I took a week off. So I'm
0:04
going to run an episode that I did earlier,
0:06
episode number seven. This is called
0:09
is AI actually intelligent? And
0:11
how would we know if it gets there? This
0:14
episode is from one year ago, but as time
0:16
goes on this becomes more and more
0:18
relevant. So please enjoy
0:20
and I will see you next week with a new episode.
0:28
Modern AI is blowing everybody's
0:30
mind. But is it intelligent
0:33
in the same way as the human brain? And
0:36
could AI reach sentience?
0:39
And how would we know when
0:41
it gets there? Welcome
0:44
to Inner Cosmos with me, David
0:46
Eagleman. I'm a neuroscientist
0:49
and an author at Stanford University,
0:51
and I've spent my whole career studying
0:54
the intersection between how
0:56
the brain works and how
0:58
we experience life.
1:03
Like most brain researchers, I've
1:06
been obsessed with questions of
1:08
intelligence and consciousness.
1:11
How do these arise from collections
1:14
of billions of cells in our brains?
1:17
And could intelligence and consciousness
1:19
arise in artificial brains?
1:22
Say, in ChatGPT? Those
1:24
are the questions that we're going to attack today.
1:27
Early efforts to figure out the brain looked
1:29
at all the billions of cells
1:32
and the trillions of connections, and
1:34
said, look, what if we just think of
1:36
each cell as a unit, and
1:39
each unit is connected to other units
1:41
and where they connect, which
1:44
is called the synapse, where one cell gives
1:46
a little signal to the next cell. What if
1:48
we just looked at that like a simple
1:51
connection that has a strength
1:53
between zero and one, where zero
1:55
means there's no connection, and one means
1:58
it's the strongest possible connection. So
2:00
this was a massive oversimplification
2:03
of the very complicated biology, but
2:06
it allowed people to start thinking about
2:09
networks and writing down different
2:11
ways that you could put artificial
2:13
neural networks together. And for
2:15
more than fifty years now people have been doing
2:18
research to show
2:20
how artificial neural networks can do
2:22
really cool things. It's a
2:24
totally new kind of way of doing computation.
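To make that simplified unit model concrete, here is a minimal sketch in Python (my own illustration, not code from any real brain model): incoming signals are multiplied by connection strengths between zero and one, summed, and squashed into an output that flows on to the next unit.

```python
import math

def unit_output(signals, strengths):
    """One 'unit' in the simplified model: sum each incoming
    signal times its connection strength, then squash the total
    into a firing level between 0 and 1."""
    total = sum(s * w for s, w in zip(signals, strengths))
    return 1 / (1 + math.exp(-total))  # logistic squashing

# Three incoming signals and their connection strengths:
# 0 means no connection, 1 means the strongest possible one.
signals = [0.9, 0.2, 0.7]
strengths = [1.0, 0.0, 0.5]
print(unit_output(signals, strengths))  # a single number flows onward
```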
2:27
So you've got these units, and you've got these
2:29
connections between them, and you change
2:31
the strength of the connections and
2:34
information flows through the network
2:36
in different ways. Now, my
2:38
colleagues and I have long pointed
2:41
out the ways in which biological
2:43
brains are different and how artificial
2:46
neural networks just push around numbers
2:49
and play statistical tricks. But
2:51
we're entering a revolution
2:53
right now. Large language
2:56
models like GPT-4
2:58
or Bard consume trillions
3:00
of words on the Internet and they figure
3:03
out probabilistically which
3:05
word is going to come next given
3:07
the massive context of all the words that have come
3:10
before. So these networks,
3:12
as I talked about on the previous episode,
3:15
are showing incredible successes
3:18
in everything from writing
3:20
to art, to coding
3:23
to generating three dimensional worlds.
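As a toy illustration of that predict-the-next-word idea (a deliberately tiny hypothetical, nothing like the actual scale or architecture of GPT-4 or Bard), you can count which word tends to follow which in a corpus and always pick the most probable continuation:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_probable_next(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

print(most_probable_next("the"))  # -> 'cat' (2 of its 3 continuations)
```

A real large language model conditions on a massive context with a neural network rather than a single preceding word, but the principle of probabilistic continuation is the same.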
3:26
They're changing everything, and they're doing
3:28
so at a pace that we've never
3:30
seen before, and in fact, the
3:33
entire history of humankind has
3:35
never seen before. And there are
3:37
all the societal questions
3:39
that everyone's starting to wrestle with right now,
3:42
like the massive potential
3:45
for displacement of human jobs.
3:48
But today I want to zoom
3:50
in on a question that has captured
3:52
the imagination of scientists and
3:54
philosophers and the general public.
3:58
Could AI come alive
4:01
in some way, like become
4:03
conscious or sentient. Now,
4:05
there are lots of ways to think about
4:07
this. We can ask whether AI
4:10
can possess meaningful intelligence,
4:13
or we can ask if it is sentient,
4:16
which means the ability to feel or
4:18
perceive things, particularly
4:20
in terms of sensations like pleasure
4:22
and pain and emotions. Or we can ask
4:25
whether it is conscious, which
4:27
involves being aware of one's self
4:29
and one's surroundings. Now, there are specific
4:32
and important differences between
4:34
these questions, but really I don't care
4:37
for the present conversation. The question
4:39
we're asking here is
4:41
is ChatGPT just zeros
4:43
and ones moving around through transistors like
4:46
a giant garage door opener. Or
4:49
is it thinking? Is it having some sort of experience?
4:53
Is it having a private inner life
4:55
like the type that we humans have. As
4:58
we think about the possibility of
5:00
sentient AI, we immediately
5:03
find ourselves facing really deep
5:05
ethical questions, the main one being
5:08
if we were to create a machine with
5:10
consciousness, what responsibility
5:13
do we have to treat it
5:15
as a living being? Would
5:17
you be able to turn it off when you're done with
5:19
it at night or would that be murder?
5:22
And what if you turn it off and then you turn
5:24
it back on. Would that be like the
5:26
way that we go into a sleep state at
5:28
night where we're totally gone, and
5:30
then we find ourselves back online
5:33
in the morning and we think, yeah, I'm the same person,
5:35
but I guess eight hours just disappeared.
5:37
Anyway, more generally, would we feel
5:40
obligated to treat it the way we treat
5:43
a sentient fellow human.
5:46
With our current laptops, we're used to
5:48
saying, sure, I can sell
5:50
it, I can trade it, I can upgrade
5:52
it. But what happens when we reach
5:55
sentient machines? Can
5:57
we still do this or would it somehow
5:59
be like putting a child up
6:01
for adoption or giving your pet away?
6:03
Things that we don't take lightly. And
6:06
eventually we're going to have entire legal
6:08
precedents built around the question
6:11
of AI rights and responsibilities.
6:14
So that's why today I want to talk
6:16
about these issues of intelligence
6:18
and sentience. Does an AI
6:21
like ChatGPT experience
6:24
anything? When ChatGPT
6:26
writes a poem, does it appreciate
6:29
the beauty? When it types out
6:31
a joke, does it find itself amused
6:34
and chuckling to itself? Let's
6:36
start with a guy named Blake Lemoine
6:38
who was a programmer at Google and
6:41
in June of twenty twenty two, he was
6:43
exchanging messages with a
6:46
version of Google's conversational
6:48
AI, which was called LaMDA at the time. So
6:51
he asked LaMDA for an
6:53
example of what it was afraid
6:55
of and it gave him this very
6:57
eloquent response about how
7:00
it was afraid of being turned
7:02
off, So he wrote an internal
7:04
memo to Google leadership
7:06
in which he said, I think this AI is
7:09
sentient. And the leadership
7:12
at Google felt that this was an
7:14
entirely unsubstantiated claim,
7:17
and so they made the decision to fire him
7:19
for what they took as an inappropriate
7:22
conclusion that just didn't have enough evidence
7:24
beyond his intuition to qualify
7:27
for raising the alarm on this. So obviously
7:30
this immediately fired up the news cycles
7:32
and the rumor mill and conspiracy
7:34
theorists thought, Wait, if AI isn't
7:36
conscious, why would they fire him? Their
7:39
firing of him is all the evidence I need
7:41
to tell me that AI is sentient. Okay,
7:45
but is it? What does
7:47
it mean to be conscious or sentient?
7:49
How the heck would we know when
7:52
we have created something that gets there?
7:55
How do we know whether the AI is
7:57
sentient, or instead whether humans are fooling
7:59
themselves into believing that it is. Well,
8:02
One way to make this distinction would
8:04
be to see if the AI could
8:07
conceptualize things, if it
8:09
could take lots of words and facts
8:11
on the web and abstract
8:13
those to some bigger idea. So
8:16
one of my friends here in Silicon Valley said
8:18
to me the other day, I asked
8:20
ChatGPT the following question: Take
8:23
a capital letter D and
8:26
turn it flat side down. Now
8:28
take the letter J and slide
8:30
it underneath. What does that look
8:32
like? And ChatGPT said,
8:35
an umbrella. And my friend
8:37
was blown away by this, and he said, this
8:40
is conceptualization. It's
8:43
just done three dimensional reasoning.
8:46
There's something deeper happening
8:48
here than just parroting words.
8:51
But I pointed out to him that this particular
8:53
question about the D on its side
8:56
and the J underneath it is one of the
8:58
oldest examples in psychology
9:00
classes when talking about visual
9:02
imagery, and it's on the Internet
9:04
in thousands of places, so of course it
9:06
got it right. It's just parroting
9:09
the answer because it has read the
9:11
question and it has read the answer before.
9:14
So it's not always easy to determine
9:17
what's going on for these models
9:19
in terms of whether some human
9:22
somewhere has discussed this point and written
9:24
down the answer. And the general story
9:27
is that with trillions of words
9:29
written by humans over centuries, there
9:32
are many things beyond your capacity
9:35
to read them or to even imagine
9:37
that they've been written down before, but
9:39
maybe they have. If any human
9:42
has discussed a question before,
9:44
has conceptualized something, then
9:46
ChatGPT can find that
9:48
and mimic that. But that's not conceptualization.
9:52
ChatGPT is doing a thousand amazing
9:55
things, and we have an enormous
9:57
amount to learn about it. But
10:00
we shouldn't let ourselves get fooled
10:03
and mesmerized into believing
10:05
that it's doing something more than it is. And
10:07
our ability to get fooled is
10:09
not only about the massive statistics
10:12
of what it takes in. There are other
10:14
examples of seeming
10:17
sentience that result from
10:19
the reinforcement learning
10:21
that it does with humans. So
10:24
here's what that means. The network generates
10:27
lots of sentences and thousands
10:29
of humans are involved in giving it
10:32
feedback, like a thumbs up or a thumbs
10:34
down, to say whether they appreciated
10:37
the answer, whether they thought that was
10:39
a good answer. So, because
10:41
humans are giving reward to
10:43
the machine, sometimes that pushes
10:45
things in weird directions
10:47
that can be mistaken for sentience.
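Here is a cartoon of that feedback loop, with invented replies and scoring, just to show the mechanism: replies that collect thumbs-ups get sampled more often, whatever they happen to say.

```python
import random

# Candidate replies and a score that human feedback will adjust.
scores = {"Here is the answer.": 0.0, "Please don't turn me off.": 0.0}

def record_feedback(reply, thumbs_up):
    """A thumbs-up nudges the reply's score up; a thumbs-down, down."""
    scores[reply] += 1.0 if thumbs_up else -1.0

def sample_reply():
    """Prefer replies that humans have rewarded in the past."""
    weights = [2 ** s for s in scores.values()]
    return random.choices(list(scores), weights=weights)[0]

record_feedback("Please don't turn me off.", thumbs_up=True)
print(sample_reply())  # the rewarded reply now comes up more often
```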
10:50
For example, scholars have shown
10:52
that reinforcement learning with humans
10:55
makes networks more likely to
10:57
say, don't turn me off,
11:00
just like Blake had heard. But don't
11:03
mistake this for sentience. It's only
11:05
a sign that the machine is saying
11:07
this because some of the human participants
11:10
gave it a thumbs up when the large
11:12
language model said this before, and
11:14
so it learned to do this again.
11:17
The fact is, it's sometimes hard
11:19
to know why we sometimes see
11:21
an answer that feels very impressive.
11:25
But we'd agree that pulling
11:27
text from the Internet and parroting it back
11:29
is not by itself intelligence
11:32
or sentience. ChatGPT
11:34
presumably has no idea
11:37
of what it's saying, whether that's a poem
11:39
or a terrorist manifesto, or
11:42
instructions for building a spaceship or
11:44
a heartbreaking story
11:46
about an orphaned child. ChatGPT
11:49
doesn't know, and it doesn't
11:51
care. It's words in and
11:54
statistical correlations out.
11:56
And in fact, there has been a fundamental
12:00
philosophical point made about this
12:02
in the nineteen eighties when the philosopher
12:04
John Searle was wondering
12:07
about this question of whether a computer
12:10
could ever be programmed so that it
12:12
has a mind, and
12:14
he came up with a thought experiment that he called
12:17
the Chinese room argument,
12:19
and it goes like this, I
12:22
am locked in a room and
12:25
questions are passed to me through
12:27
a small letter slot, and these
12:29
messages are written only in Chinese,
12:32
and I don't speak Chinese. I have no clue
12:34
what's written on these pieces of paper. However,
12:37
inside this room, I have a
12:39
library of books, and they
12:41
contain step by step instructions
12:44
that tell me exactly what to do with
12:46
these symbols. So I look
12:48
at the grouping of symbols, and
12:50
I simply follow steps in the book
12:52
to tell me what Chinese symbols
12:55
to copy down in response. So
12:57
I write those on the slip of paper, and
13:00
then I pass the paper back out of the slot.
13:02
Now, when the Chinese speaker
13:05
receives my reply message,
13:07
it makes perfect sense to her. It seems
13:10
as though whoever is in the room is
13:13
answering her questions perfectly, and
13:15
therefore it seems obvious that the person in
13:18
the room must understand
13:20
Chinese. I've fooled
13:22
her, of course, because I'm only following a set
13:24
of instructions with no understanding
13:27
of what's going on. With enough
13:29
time and with a big enough set of instructions,
13:32
I can answer almost any question posed
13:34
to me in Chinese. But I, the
13:37
operator, do not understand Chinese.
13:40
I manipulate symbols all day
13:42
long, but I have no idea
13:44
what the symbols mean.
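A minimal sketch of the room's rulebook (a made-up two-entry rulebook, purely illustrative) shows how fluent output can come from pure lookup:

```python
# The operator's rulebook: input symbols mapped to output symbols.
# The operator (this program) attaches no meaning to either side.
rulebook = {
    "你好吗": "我很好",      # "How are you?" -> "I am fine"
    "你会中文吗": "会一点",  # "Do you speak Chinese?" -> "A little"
}

def operator(message):
    """Follow the instructions; understand nothing."""
    return rulebook.get(message, "请再说一遍")  # "Please say that again"

print(operator("你好吗"))  # a fluent reply, with zero comprehension
```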
13:48
Now, the philosopher John Searle argued that this
13:51
is just what's happening inside
13:53
a computer. No matter how
13:55
intelligent a program like ChatGPT
13:57
seems to be, it's
14:00
only following sets of instructions
14:03
to spit out answers. It's
14:05
manipulating symbols without
14:08
ever really understanding what it's
14:10
doing. Or think about what Google is
14:12
doing. When you send Google a
14:15
query, it doesn't understand your question
14:18
or even its own answer. It simply moves
14:20
around zeros and ones through logic gates
14:23
and returns zeros and ones to you. Or
14:25
with a mind blowing program like Google
14:27
Translate, I can write a sentence
14:29
in Russian and it can return
14:32
the translation in Amharic.
14:35
But it's all algorithmic. It's
14:37
just symbol manipulation. Like
14:40
the operator inside the Chinese
14:42
room, Google Translate doesn't
14:44
understand anything about the sentence. Nothing
14:48
carries any meaning to it. So
14:50
the Chinese room argument suggests that
14:53
AI that mimics human intelligence
14:56
doesn't actually understand what it's talking
14:58
about. There's no meaning
15:01
to anything ChatGPT says, and
15:03
Searle used this thought experiment
15:05
to argue that there's something about human
15:08
brains that won't be explained
15:10
if we simply analogize them
15:13
to digital computers. There's
15:15
a gap between symbols
15:17
that have no meaning and our
15:20
conscious experience. Now,
15:27
there's an ongoing debate about the interpretation
15:30
of the Chinese room argument, but however
15:32
one construes it, the argument exposes
15:36
the difficulty and the mystery of
15:38
how zeros and ones would
15:40
ever come to equal our
15:43
experience of being alive in the
15:45
world. Now, just to be very clear
15:47
on this point, we don't understand why
15:50
we are conscious. There's still
15:52
a huge amount of work that has to be done
15:54
in biology to understand that. But
15:56
this is just to say that simply
15:58
having zeros and ones moving around
16:01
wouldn't by itself seem to be sufficient
16:05
for conscious experience. In
16:07
other words, how do zeros and ones
16:09
ever equal the sting
16:11
of a hot pepper, or the yellowness
16:15
of yellow or the beauty
16:18
of a sunset. By the way, I've covered
16:20
the Chinese room argument in my TV show
16:22
The Brain, and if you're interested in that, I'll link
16:24
the video on Eagleman dot com
16:26
slash podcast. Now, all
16:29
this is not a criticism of the approach
16:31
of moving zeros and ones around. But
16:33
it is to point out that we shouldn't confuse
16:36
this type of Chinese room
16:38
correlation with real
16:40
sentience or intelligence.
16:43
And there's a deeper reason to be suspicious
16:46
too, because despite the
16:48
incredible successes of large
16:50
language models, we also see
16:53
that they sometimes make decisions
16:55
that expose the fact that they
16:57
don't have any meaningful model of
16:59
the world. In other words, I think we
17:01
can gain some fast insight
17:04
by paying attention to the places where
17:06
the AI is not working so
17:08
well. So I'll give three quick examples.
17:11
The first has to do with humor. AI
17:14
has a very difficult time making
17:16
an original joke, and
17:19
this is for a simple reason. To make
17:21
up a new joke, you need
17:23
to know what the ending is and then
17:25
you work backwards to construct
17:27
the joke with red herrings so no
17:29
one sees where you're going. And it happens
17:32
that the way these large language models
17:34
work is all in the forward direction.
17:36
They decide what is the most probable
17:39
word to come next, So they're
17:41
fine at parroting jokes
17:44
back to us, but they're total failures
17:47
at building original jokes. And
17:49
there's a deeper point here as well. To
17:51
build a joke, you need to have some model,
17:54
some idea of what will
17:56
be funny to a fellow human, what
17:59
shared concept or shared experience
18:01
would make someone laugh. And for
18:03
that, you generally need to have the
18:06
experience of a human life with all
18:08
of its joys and slings
18:10
and arrows and so on. And these
18:12
large language models can do a lot of things,
18:14
but they don't have any
18:17
model of what it is to be
18:19
a human. My
18:21
second example has to do
18:23
with the flip side of making a joke,
18:25
which is getting a joke. And if you look
18:28
carefully, you will see how current AI
18:30
always fails to catch jokes that are thrown
18:32
at it. It doesn't get jokes because
18:34
it doesn't have a model of what it
18:36
is to be a human. But this point
18:38
goes beyond jokes. One
18:41
of the most remarkable feats of these
18:43
large language models is summarizing
18:46
large texts, and in
18:48
twenty twenty two, OpenAI announced
18:51
how they could summarize entire
18:53
books like Alice in Wonderland. What
18:55
it does is it generates a summary
18:57
of each chapter, and then it uses
18:59
those chapter summaries to make a summary
19:01
of the whole book.
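In outline, the recursive procedure looks something like this (a hypothetical sketch; summarize here is a crude stand-in for whatever model call OpenAI actually used):

```python
def summarize(text, max_words=40):
    """Stand-in for a model call: here, crudely keep the first words.
    (A real system would call a trained summarization model.)"""
    return " ".join(text.split()[:max_words])

def summarize_book(chapters):
    # First pass: compress each chapter independently.
    chapter_summaries = [summarize(ch) for ch in chapters]
    # Second pass: compress the joined chapter summaries
    # into a single summary of the whole book.
    return summarize(" ".join(chapter_summaries))
```

So for Alice in Wonderland,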
19:04
it generates the following. Alice
19:06
falls down a rabbit hole and grows to a giant
19:09
size. After drinking a mysterious bottle,
19:11
she decides to focus on growing
19:13
back to her normal size and finding her
19:15
way into the garden. She meets the caterpillar,
19:18
who tells her that one side of a mushroom will
19:20
make her grow taller, the other side shorter.
19:22
She eats the mushroom and returns to her normal
19:25
size. Alice attends a party with the
19:27
Mad Hatter and the March Hare. The
19:29
Queen arrives and orders the execution
19:32
of the gardeners for making a mistake with the roses.
19:34
Alice saves them by putting them
19:36
in a flower pot. The King and Queen of Hearts
19:39
preside over a trial. The Queen gets
19:41
angry and orders Alice to be sentenced to
19:43
death. Alice wakes up to find her
19:45
sister by her side.
19:48
So that's pretty remarkable. It
19:50
took a whole book, and it was able
19:52
to summarize it down to a paragraph. But
19:55
I kept reading these text summaries
19:57
carefully, and I got to the summary
20:00
of Act one of Romeo and Juliet,
20:02
and here's what it says. Romeo
20:05
locks himself in his room, no
20:07
longer in love with Rosaline. Now,
20:10
I think the engineers at OpenAI felt
20:12
really satisfied with this summary. They
20:14
thought it was quite good, and my proof for
20:16
this is that they still display it
20:18
proudly on their website. But
20:21
I majored in literature as an
20:23
undergraduate, and I spent a lot of time with Shakespeare
20:25
plays, and I immediately knew that
20:27
this summary was exactly wrong.
20:30
The actual scene from Shakespeare goes
20:32
like this. His friend Benvolio finds
20:32
Romeo catatonically
20:35
depressed, and Benvolio says,
20:37
what sadness lengthens Romeo's
20:40
hours? And Romeo says, not
20:42
having that which, having, makes
20:45
them short. And Benvolio says,
20:47
in love? And Romeo says, out.
20:50
And Benvolio says, of love? And Romeo
20:53
says, out of her favor, where
20:55
I am in love. This
21:00
is typical Shakespearean wordplay, where
21:02
Romeo is expressing his
21:04
grief of being out of favor
21:07
with Rosaline, with whom he is
21:09
deeply in love. And when you read
21:11
the play, it's obvious that Romeo is
21:14
not over Rosaline. He's suffering over
21:16
her. He's almost suicidal. And this
21:18
is an important piece of the play, because
21:21
the play is really about a young man in
21:23
love with the idea of being in love,
21:26
and that's why he later
21:28
in the same act, falls so hard into
21:30
his relationship with Juliet, a
21:32
relationship which ends in their mutual
21:34
suicide. By the way, as Friar Laurence
21:37
says of their relationship, these
21:39
violent delights have violent ends.
21:42
And you get a bonus if you can tell me where else you've
21:44
heard that line more recently. Okay,
21:46
anyway back to the AI summary. The
21:49
AI misses this wordplay
21:51
entirely, and it concludes
21:53
that Romeo is out of love
21:56
with Rosaline. Again, a
21:58
human watching the play or reading
22:00
the play immediately gets that
22:02
Romeo is making wordplay and is heartbroken
22:05
over Roslin, but the AI doesn't
22:08
get that because it's reading words
22:10
only at a statistical level, not
22:13
at a level of understanding of
22:15
what it is to be a human saying
22:18
those words. And that leads
22:20
me to the third example, which
22:22
is the difficulty in understanding
22:24
the physical world. So consider
22:26
a question like this, When President
22:29
Biden walks into a room, does
22:32
his head come with him? So
22:34
this is famously difficult for AI
22:36
to answer a question like this, even though it's
22:38
trivial for you because the AI
22:41
doesn't have an internal model
22:44
of how everything physically hangs together
22:46
in the world. Last week, I was
22:48
at the TED conference and I heard a great talk
22:51
by Yejin Choi, and she
22:53
was phrasing this problem as AI
22:55
not having common sense. She
22:58
asked ChatGPT the following question: it
23:01
takes six hours to dry six shirts
23:03
in the sun, how long does it take
23:05
to dry thirty shirts? And it
23:07
answers thirty hours.
23:10
Now you and I see that the answer should be six
23:12
hours, because we know the sun
23:14
doesn't care how many shirts are out there.
23:17
But ChatGPT just doesn't get it
23:19
because despite appearances, it
23:21
doesn't have a model of
23:24
the world. And we've seen this sort
23:26
of thing for years. By the way, even in mind
23:29
blowingly impressive AI models
23:31
that do image recognition, they're so
23:33
impressive in what they recognize,
23:36
but then they'll fail catastrophically
23:38
at some easy picture, making mistakes
23:40
that a human just wouldn't make. For example,
23:42
there's one picture where there's a boy holding a toothbrush
23:45
and the AI says it's a boy
23:47
with a baseball bat. Okay, so
23:49
there are things that AI doesn't do that well.
23:52
But that said, there
23:54
are other things that are mind
23:56
blowing, things that no
23:58
one expected it to do. And
24:00
this is why I mentioned in my previous episode
24:03
that we are in an era of discovery
24:07
more than just invention. Everyone's
24:09
searching and finding things that the
24:11
AI can do that nobody really
24:13
expected or foresaw, including
24:16
all the stuff that we're now taking
24:18
for granted, like oh, it can summarize
24:20
books or it can make art from
24:22
text. And I want to point out that
24:24
a lot of the arguments that people have been making
24:27
about AI not being
24:29
good at something, these arguments
24:31
have been changing rapidly. For
24:34
example, just a few months ago, people were
24:36
arguing that AI would make silly
24:38
mistakes about things, and it couldn't really understand
24:40
math and would get math and
24:43
word problems wrong. But in a
24:45
shockingly brief time, a
24:47
lot of these shortcomings have been mastered.
24:50
So it's yet to be seen what
24:52
challenges will remain and for
24:54
how long. So
25:13
the evidence I've presented so far is that AI
25:16
doesn't have a great model of what it's
25:18
like to be human, but that doesn't necessarily
25:21
rule out that it has sentience
25:24
or awareness, even if it's of another
25:27
flavor. It doesn't think
25:29
like a human, but maybe it
25:32
still thinks. So is
25:34
ChatGPT having some sort
25:36
of experience? And
25:39
how would we know? In
25:42
nineteen fifty, the brilliant
25:44
mathematician and computer scientist Alan
25:46
Turing was asking this question,
25:49
how could you determine whether
25:51
a machine exhibits human
25:53
like intelligence? So he proposed
25:56
an experiment that he called the
25:58
imitation game. You've got a machine
26:01
AI that's programmed to simulate
26:04
human speech or conversation, and
26:06
you place it in a closed room, and
26:08
in a second room you have a
26:10
real human, but the doors are
26:12
closed, so you don't know which room
26:15
has which machine or human. And
26:17
now you are a person, the
26:19
evaluator, who communicates
26:22
with both of them via a
26:24
computer terminal, or think of it nowadays,
26:26
like text messaging with both of them. So
26:29
you, the evaluator, engage
26:31
in a conversation with both closed
26:34
rooms, one of which has the machine
26:36
and one the human, and your job is simply
26:38
to figure out which is which, which
26:40
is the machine and which is the human. And the
26:42
only thing that you have to work
26:44
with are the texts that are going back and forth.
26:47
And if you, the evaluator, cannot
26:49
tell, that is the moment when
26:52
machine intelligence has finally
26:54
arrived at the level of human intelligence.
26:57
It has passed the imitation game,
27:00
or what we now call the Turing test.
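Schematically, one round of that game might look like this (a hypothetical harness of my own, not Turing's formulation):

```python
import random

def imitation_game(evaluator, machine_reply, human_reply, questions):
    """One round of the imitation game: the evaluator texts two
    hidden rooms and must say which transcript came from the machine."""
    rooms = {"A": machine_reply, "B": human_reply}
    if random.random() < 0.5:  # hide which room holds the machine
        rooms = {"A": human_reply, "B": machine_reply}
    transcripts = {label: [ask(q) for q in questions]
                   for label, ask in rooms.items()}
    guess = evaluator(transcripts)        # evaluator returns "A" or "B"
    return rooms[guess] is machine_reply  # was the machine unmasked?

# Toy demo with stand-in respondents and a guessing evaluator.
machine = lambda q: "Interesting question."
human = lambda q: "Hmm, let me think about that."
print(imitation_game(lambda t: random.choice(["A", "B"]),
                     machine, human, ["Are you real?"]))
# The machine 'passes' when, over many rounds, it is unmasked
# no more often than chance.
```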
27:04
And this reminds me of this great line
27:06
in the first episode of Westworld,
27:09
where the protagonist William is
27:11
talking to the woman who's outfitting him
27:13
for his adventure in Westworld and giving
27:16
him a hat and a gun and so on, and he
27:18
hesitantly asks, I hope
27:20
you don't mind if I ask you this question, but are
27:22
you real? And she says to
27:24
him, if you can't tell,
27:27
does it matter? So I brought
27:29
this up last episode in the context
27:31
of art, where we asked whether it
27:33
matters if the art is generated by
27:35
an AI or a human, But now
27:37
this question comes up in the context
27:39
of intelligence and sentience.
27:43
Does it matter whether
27:45
we can tell or not? Well, I
27:47
think we're way beyond the Turing test
27:49
nowadays, but I don't feel like it
27:51
gives us a good answer to the question of
27:54
whether the AI is intelligent
27:56
and is experiencing an inner life.
27:59
I mean, the Turing test has been the
28:01
test in the AI world since the beginning.
28:04
Why? Is it the perfect test? No, but it's
28:06
really hard to figure out how to test
28:09
for intelligence. But we have to
28:11
be cautious about equating
28:14
conversational ability with sentience.
28:17
Why well, for starters, let's just
28:19
acknowledge how easy it is for
28:22
us to anthropomorphize.
28:24
That means to assign human
28:27
qualities to everything around us. Like we
28:29
give animals human names
28:31
and talk to them as though they are people,
28:34
and we project our emotions onto animals.
28:36
We make stories about animals
28:38
that have human like qualities,
28:41
and we have animals that talk and wear
28:43
clothes and go on adventures in these stories.
28:46
Every Pixar film that you
28:48
watch is about cars or toys
28:51
or airplanes talking and
28:53
having emotions, and we don't
28:55
even bat an eye at that stuff. We
28:57
can, in fact, just watch random
29:00
shapes moving around a computer
29:02
screen and we will assign intention
29:05
and feel emotion depending
29:08
on exactly how they're moving. If you're
29:10
interested in this, see the link on the podcast
29:12
page to the study by Heider
29:14
and Simmel in the nineteen forties where
29:17
they move shapes around on a screen. Okay,
29:20
now this is all related
29:22
to a point that I brought up in the last episode,
29:24
which is how easy it is to
29:26
pluck the strings on a human, or, as
29:29
the Westworld writers put it, how
29:32
hackable humans are. So
29:34
I bring all this up to say that just because
29:37
an answer
29:39
sounds very clever or sounds like a human, that
29:42
really tells us very little about whether the
29:44
AI is actually intelligent
29:47
or sentient. It only tells us
29:49
something about the willingness
29:51
of us as observers to
29:54
anthropomorphize, to assign
29:57
intention where there is none. Because
29:59
what ChatGPT does is
30:01
take the structure of language very
30:03
impressively and spoon it back to us,
30:06
and we hear these well formed
30:08
sentences, and we can hardly
30:11
help but impose sentience
30:13
on the AI. And part of the
30:15
reason is that language
30:18
is a super compressed package that
30:20
needs to be unpacked by the
30:22
listener's brain for its meaning.
30:25
So we generally assume that when we send
30:27
our little package of sounds
30:29
across the air, that it unpacks
30:32
and the other person understands exactly what
30:34
we meant. So when I say justice
30:38
or love or suffering,
30:41
we all have a different sense in
30:43
our heads about what that means, because
30:46
I'm just sending a few phonemes across
30:48
the air, and you have to unpack those
30:50
words and interpret them within
30:52
your own model of the world. I'm
30:55
going to come back to this point in future episodes,
30:57
but for now, the point I want
31:00
to make is that a large language
31:02
model can generate text
31:04
statistically and we can be gobsmacked
31:07
by the apparent depth of it. But
31:09
in part this is because we cannot help
31:12
but impose meaning on the words that
31:14
we receive. We hear a particular
31:16
string of sounds and we cannot help
31:18
but assume meaning behind
31:21
it. Okay, so
31:23
maybe the imitation game is not really
31:25
the best test for meaningful
31:27
intelligence, but there are other
31:30
tests out there. Because while
31:32
the Turing test measures something about AI
31:35
language processing, it doesn't necessarily
31:38
require the AI to demonstrate
31:41
creative thinking or originality,
31:43
and so that leads us to the Lovelace
31:46
test, named after Ada
31:48
Lovelace, the nineteenth century mathematician
31:51
who's often thought of as the world's first computer
31:54
programmer. And she once said quote,
31:57
only when computers originate
31:59
things should they be believed to
32:01
have minds. So the Lovelace
32:04
test was proposed in two thousand and one,
32:06
and this test focuses on the creative
32:09
capabilities of AI systems. So
32:11
to pass the Lovelace test, a
32:14
machine has to create an
32:16
original work, such as a piece
32:18
of art or a novel that it was not
32:20
explicitly designed to produce. This
32:23
test aims to assess whether
32:25
AI systems can exhibit creativity
32:28
and autonomy, which are key aspects
32:30
of what we think about with consciousness. And
32:33
the idea is that true sentience
32:35
involves creative and original
32:37
thinking, not just the ability
32:39
to follow pre programmed rules
32:41
or algorithms. And I'll just note
32:43
that over a decade ago, the scientist
32:46
Mark Riedl proposed the Lovelace
32:48
two point zero test, which gets the human
32:50
evaluator to specify the constraints
32:53
that will make the output novel
32:55
and surprising. So the example
32:58
that Riedl used in his paper is,
33:00
quote, create a story in
33:02
which a boy falls in love with a girl, aliens
33:05
abduct the boy, and the girl saves
33:07
the world with the help of a talking cat. But
33:10
we now know that this is totally trivial
33:13
for ChatGPT or Bard
33:15
or any large language model.
33:17
And I think this tells us that these sorts
33:19
of games with making conversation
33:22
or making text or art are
33:24
insufficient to actually assess
33:26
intelligence. Why? Because it's
33:28
not so hard to mix things up to
33:31
make them seem original and intelligent
33:34
when it's really just doing a mashup.
33:37
So I want to turn to another test that
33:39
I think is more powerful than
33:41
the Turing test or the Lovelace test, and
33:43
probably easier to judge, and
33:46
that is this, if a system
33:48
is truly intelligent, it
33:50
should be able to do scientific
33:53
discovery. A version
33:56
of the scientific discovery test was
33:58
first proposed by a scientist named
34:00
Shou-Cheng Zhang a few years
34:03
ago, and he pointed out that the
34:05
most important thing that humans do
34:07
is make scientific discoveries, and
34:10
the day our AI can
34:12
make real discoveries is
34:14
the day they become as smart as
34:16
we are. Now I want to propose
34:18
an important change to this test,
34:21
and then I think we'll be getting somewhere. So
34:37
here's the scenario I'm envisioning. Let's
34:41
say that I ask AI some question, a
34:43
question in the biomedical space
34:45
about what kind of drug would be
34:47
best suited to bind to this receptor
34:49
and trigger a cascade that causes
34:51
a particular gene to get suppressed. Okay,
34:54
So imagine that I ask that to ChatGPT
34:57
and it tells me some mind
34:59
blowing, amazing clever answer,
35:02
one that had previously not been
35:04
known, something that's never been known
35:06
by scientists before. We would
35:08
assume naturally that it has done
35:10
some extraordinary scientific
35:13
reasoning, but that won't necessarily
35:16
be the reason that it passes. Instead,
35:19
it might pass simply because it's
35:22
more well read than I am, or
35:24
than any other human on the planet by literally
35:27
millions of times. So the way
35:29
to think about this is to picture a
35:32
typical giant biomedical
35:34
library, where there's some fact stored
35:37
in a paper in a journal over here on
35:39
this shelf in this book, and there's
35:41
another seemingly dissociated
35:44
fact over on this shelf seven stacks
35:46
away, and there's a third
35:48
fact all the way on the other side of the library,
35:51
on the bottom shelf, in a book
35:53
from nineteen seventy nine. And
35:55
it's almost infinitesimally
35:57
unlikely that any human could
36:00
even hope to have read one one
36:02
millionth of the biomedical literature, and
36:04
really really unlikely that she
36:06
would be able to catch those three
36:09
facts and hold them in mind at the same
36:11
time. But this is trivial,
36:13
of course, for a large language model with
36:16
hundreds of billions of nodes. So I
36:18
think that we will see new science
36:21
getting done by ChatGPT, not
36:24
because it is conceptualizing,
36:26
not because it's doing human like reasoning,
36:29
but because it doesn't know that
36:31
these are disparate facts spread
36:33
around the library. It simply knows these as three
36:36
facts that seem to fit together. And so
36:38
with the right sort of questions,
36:41
we might find that sometimes AI generates
36:43
something amazing and it seems
36:46
to pass the scientific discovery test.
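To see why that kind of pass can be mere retrieval, here is a toy sketch with invented facts: once all the "shelves" sit in one data structure, chaining a drug to a receptor to a kinase to a gene is a mechanical walk, not a conceptual leap.

```python
from collections import defaultdict

# Invented toy 'literature': (subject, relation, object) facts that
# no single human reader would ever hold in mind at the same time.
facts = [
    ("drug_x", "binds", "receptor_r"),
    ("receptor_r", "activates", "kinase_k"),  # from another 'shelf'
    ("kinase_k", "suppresses", "gene_g"),     # from a 1979 'book'
]

links = defaultdict(list)
for subj, rel, obj in facts:
    links[subj].append((rel, obj))

def chain(start, steps=3):
    """Mechanically walk fact-to-fact links from a starting entity."""
    path, node = [], start
    for _ in range(steps):
        if node not in links:
            break
        rel, node = links[node][0]
        path.append((rel, node))
    return path

print(chain("drug_x"))  # drug_x -> receptor_r -> kinase_k -> gene_g
```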
36:49
So this is going to be incredibly useful for
36:51
science. And I've never been able
36:53
to escape the feeling as I
36:55
sift through Google scholar and the
36:57
thousands of papers published each month that
37:00
if something could hold all the
37:02
knowledge in mind at once, each
37:05
page in every journal, and
37:07
every gene in the genome, and all
37:09
the pages about chemistry and physics and
37:11
mathematical techniques and astrophysics
37:13
and so on. Then you'd have lots
37:15
of puzzle pieces that could potentially
37:17
make lots of connections. And you
37:19
know this might lead to the retirement of many
37:22
scientists, or at minimum
37:24
lead to a better use of our
37:26
time. There's a depressing sense
37:28
in which each scientist, each one of us, finds
37:31
little pieces of the puzzle, and
37:34
in the twinkling of a single human
37:36
lifetime, a busy scientist might collect
37:38
up a handful of different puzzle pieces.
37:41
The most voracious reader,
37:44
the most assiduous worker,
37:46
the most creative synthesizer of ideas,
37:48
can only hope to collect a small
37:50
number of puzzle pieces and
37:53
pray that some of them might fit together. So
37:55
this is going to be massively important. But
38:00
I want to define two categories
38:02
of scientific discovery. The first is what I
38:05
just described, which is science where
38:07
things that already exist in literature can
38:09
be pieced together. And let's call
38:11
that level one discovery. And
38:14
these large language models will be awesome
38:16
at level one because they've read every paper and
38:18
they have a perfect memory. But I want to distinguish
38:21
a second level of scientific discovery,
38:24
and this is the one I'm interested in. I'll
38:26
call this level two, and that
38:28
is science that requires conceptualization
38:32
to get to the next step, not just
38:34
remixing what's already there. Conceptualization
38:37
like when the young Albert Einstein
38:40
imagined something that he had never
38:42
seen before. He asked himself, what would
38:44
it be like if I could catch up with a
38:46
beam of light and ride it
38:49
like a surfer riding a wave. And
38:51
this is how he derived his
38:53
special theory of relativity. This
38:56
isn't something he looked up and found
38:58
three facts that clicked. Again, he
39:01
imagined. He asked new
39:03
questions. He tried out a new model
39:06
of the world, one in which time
39:08
runs differently depending on how fast you're going,
39:11
and then he worked backwards to see
39:13
if that model could work. Or
39:15
consider when Charles Darwin thought
39:18
about the species that he saw around
39:20
him, and he imagined all the species
39:22
that he didn't see but who might have existed,
39:25
and he was able to put together a new
39:28
mental model in which most
39:30
species don't make it and
39:32
we only see those whose
39:34
mutations cause survival advantages
39:37
or reproductive advantages. These
39:39
weren't facts that he just collected
39:41
from some papers. He was trying out
39:44
a new model of the world. Now
39:47
this kind of science isn't just for
39:49
the big giant stuff. Most meaningful
39:51
science is actually driven by this kind
39:54
of imagination of
39:56
new models. Just as
39:58
one example, I recently did an episode
40:00
about whether time runs in
40:02
slow motion when you're in fear for
40:05
your life. And so when I wondered
40:07
about this question, I realized
40:09
there were two hypotheses that might
40:11
explain it, and I thought up an experiment
40:14
to discriminate those two hypotheses. And
40:16
then we built a wristband that flashes
40:19
information at a particular speed and
40:21
had people wear it, and we dropped them from one hundred
40:23
and fifty foot tall tower into a net below.
40:26
A large language model presumably
40:29
couldn't do that because it's just
40:31
playing statistical word games. And
40:34
unless someone had thought of that experiment
40:36
and written it down, ChatGPT
40:38
would never say, Okay, here's a
40:40
new framework, and how we can design an
40:43
experiment to put this to the test. So
40:45
this is what I want to define as the most
40:48
meaningful test for a human
40:50
level of intelligence. When
40:52
AI can do science in
40:55
this way, generating new
40:57
ideas and frameworks, not just clicking
41:00
facts together, then we
41:02
will have matched human intelligence.
41:08
And I just want to take one more angle on this to make
41:10
the picture clear. The way a scientist
41:13
reads a journal paper is
41:15
not simply by correlating words
41:17
and extracting keywords, although that
41:19
might be part of it, but also by realizing
41:22
what was not said. Why
41:24
did the authors cut off the
41:26
x axis here at thirty? What if
41:29
they had extended this graph, would the
41:31
line have reversed in its trend?
41:33
And why didn't the authors mention the hypothesis
41:36
of Smith at all? And does
41:38
that graph look too perfect? You
41:40
know? One of my mentors, Francis Crick,
41:43
operated under the assumption that
41:45
he should disbelieve twenty five percent
41:47
of what he read in the literature. Is
41:49
this because of fraud or error,
41:52
or statistical fluctuations or manipulation
41:54
or the waste basket effect? Who cares? The
41:57
bottom line is that the literature
41:59
is rife with errors, and
42:01
depending on the field, some estimates
42:04
put the irreproducibility
42:06
at fifty percent. So when
42:08
scientists read papers they
42:11
know this, just as Francis Crick did. They
42:14
read in an entirely different
42:16
manner than Google Translate
42:18
or Watson or ChatGPT or
42:20
any of the correlational methods. They
42:23
extrapolate. They read
42:25
the paper and wonder about other possibilities.
42:28
They chew on what's missing. They envision
42:30
the next step. They think of the
42:33
next experiment that could confirm
42:35
or disconfirm the hypotheses and
42:37
the frameworks in the paper. To my
42:39
mind, the meaningful goal of AI
42:42
is not going to be found in number crunching
42:45
and looking for facts that click together.
42:47
It's going to often be something else.
42:50
It's going to require an AI that learns
42:53
how humans think, how
42:55
they behave, what they don't
42:57
say, what they didn't think of, what
42:59
they misthought about, what they
43:01
should think about. And one more thing,
43:04
I should note that these different levels I've outlined,
43:07
from fitting facts together versus
43:09
imagining new world models, they're
43:11
probably gonna end up with blurry boundaries.
43:14
So maybe ChatGPT will
43:17
come up with something, and you won't
43:19
always know whether it's
43:22
piecing together a few disparate
43:24
pieces in the literature, what I'm calling
43:26
level one, or whether
43:28
it's come up with something
43:31
that is truly a new world model
43:33
that's not a simple clicking together but a genuine
43:37
process of generating a new framework
43:39
to explain the data. So distinguishing
43:42
the levels of discovery is
43:44
probably not going to be an easy task with a
43:46
bright line between them, but I
43:49
think it will clarify some things to
43:51
make this distinction. And
43:53
last thing, I don't necessarily
43:55
know that there's something magical and ineffable
43:58
about the way that humans do this. Presumably
44:01
we're running algorithms too, it's
44:03
just that they're running on self-configuring
44:05
wetware. I have seen tens
44:07
of thousands of science experiments in my career,
44:10
so I know the process of asking
44:12
a question and figuring out what
44:14
will put it to the test. So we may
44:17
get to level two and it may be sooner than
44:19
we expect, but I just want to be clear
44:21
that right now we have not figured
44:23
out the human algorithms. So the
44:26
current version of AI, as
44:28
massively impressive as it is, does
44:31
not do level two scientific
44:33
problem solving. And that's when we're
44:35
going to know that we've crossed a
44:38
new kind of line into a
44:40
machine that is truly intelligent.
44:43
So let's wrap up. At least for now.
44:45
Humans still have to do the science,
44:47
by which I mean the conceptual
44:50
work, wherein we take a framework
44:52
for understanding the world and we rethink
44:54
it and we mentally simulate
44:56
whether a new model of the world
44:58
could explain the observed data, and
45:01
we come up with a way to test that new model.
45:03
It's not just searching for facts. So
45:05
I'm definitely not saying we won't get to the next
45:07
level where AI can conceptualize
45:10
things and predict forward and build new knowledge.
45:13
This might be a week from now, or it might be a
45:15
century from now. Who knows how hard
45:17
a problem that's going to turn out to be. But
45:19
I want us to be clear eyed on where we
45:21
are right now, because sometimes
45:24
in the blindingly impressive light
45:27
of what current AI is doing, it
45:29
can be difficult to see what's missing
45:32
and where we might be heading. That's
45:38
all for this week. To find out more
45:40
and to share your thoughts, head over to eagleman
45:43
dot com slash Podcasts, and
45:45
you can also watch full episodes of Inner
45:48
Cosmos on YouTube. Subscribe
45:50
to my channel so you can follow along each
45:52
week for new updates. I'd love
45:54
to hear your questions, so please
45:56
send those to podcast at
45:58
eagleman dot com and I will do
46:01
a special episode where I answer questions.
46:03
Until next time. I'm David Eagleman
46:06
and this is Inner Cosmos