Rebroadcast of Ep7 "Is AI truly intelligent? How would we know if it got there?"

Released Monday, 29th April 2024

Episode Transcript

0:00

Hey, this is David Eagleman and this past week was

0:02

my birthday, so I took a week off. So I'm

0:04

going to run an episode that I did earlier,

0:06

episode number seven. This is called

0:09

is AI actually intelligent? And

0:11

how would we know if it gets there? This

0:14

episode is from one year ago, but as time

0:16

goes on this becomes more and more

0:18

relevant. So please enjoy

0:20

and I will see you next week with a new episode.

0:28

Modern AI is blowing everybody's

0:30

mind. But is it intelligent

0:33

in the same way as the human brain? And

0:36

could AI reach sentience?

0:39

And how would we know when

0:41

it gets there? Welcome

0:44

to Inner Cosmos with me, David

0:46

Eagleman. I'm a neuroscientist

0:49

and an author at Stanford University,

0:51

and I've spent my whole career studying

0:54

the intersection between how

0:56

the brain works and how

0:58

we experience life.

1:03

Like most brain researchers, I've

1:06

been obsessed with questions of

1:08

intelligence and consciousness.

1:11

How do these arise from collections

1:14

of billions of cells in our brains?

1:17

And could intelligence and consciousness

1:19

arise in artificial brains?

1:22

Say, on ChatGPT. Those

1:24

are the questions that we're going to attack today.

1:27

Early efforts to figure out the brain, looked

1:29

at all the billions of cells

1:32

and the trillions of connections, and

1:34

said, look, what if we just think of

1:36

each cell as a unit, and

1:39

each unit is connected to other units

1:41

and where they connect, which

1:44

is called the synapse, where one cell gives

1:46

a little signal to the next cell. What if

1:48

we just looked at that like a simple

1:51

connection that has a strength

1:53

between zero and one, where zero

1:55

means there's no connection, and one means

1:58

it's the strongest possible connection. So

2:00

this was a massive oversimplification

2:03

of the very complicated biology, but

2:06

it allowed people to start thinking about

2:09

networks and writing down different

2:11

ways that you could put artificial

2:13

neural networks together. And for

2:15

more than fifty years now people have been doing

2:18

research to show

2:20

how artificial neural networks can do

2:22

really cool things. It's a

2:24

totally new kind of way of doing computation.
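To make that simplified picture concrete, here is a minimal Python sketch (illustrative only, not from the episode; the input and weight values are invented):

import math

def unit(inputs, weights):
    # One artificial "cell": sum the weighted inputs, then squash the result
    # into a 0-to-1 range with a logistic function.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 / (1 + math.exp(-total))

# The weights play the role of connection strengths: 0 means no connection,
# 1 means the strongest possible connection.
inputs = [0.9, 0.2, 0.7]
weights = [1.0, 0.0, 0.3]
print(unit(inputs, weights))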

2:27

So you've got these units, and you've got these

2:29

connections between them, and you change

2:31

the strength of the connections and

2:34

information flows through the network

2:36

in different ways. Now, my

2:38

colleagues and I have long pointed

2:41

out the ways in which biological

2:43

brains are different and how artificial

2:46

neural networks just push around numbers

2:49

and play statistical tricks. But

2:51

we're entering a revolution

2:53

right now. Large language

2:56

models like GPT four

2:58

or BARD consume trillions

3:00

of words on the Internet and they figure

3:03

out probabilistically which

3:05

word is going to come next given

3:07

the massive context of all the words that have come

3:10

before. So these networks,

3:12

as I talked about on the previous episode,

3:15

are showing incredible successes

3:18

in everything from writing

3:20

to art, to coding

3:23

to generating three dimensional worlds.
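The core next-word idea can be sketched with a toy bigram model in Python (illustrative only; real large language models use vastly larger contexts and neural networks, and the tiny corpus here is invented):

import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count which word follows which word in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev):
    # Sample the next word in proportion to how often it followed the previous one.
    counts = following[prev]
    if not counts:                      # word never seen with a successor
        return random.choice(corpus)
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

word = "the"
for _ in range(6):
    word = next_word(word)
    print(word, end=" ")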

3:26

They're changing everything, and they're doing

3:28

so at a pace that we've never

3:30

seen before, and in fact, the

3:33

entire history of humankind has

3:35

never seen before. And there are

3:37

all the societal questions

3:39

that everyone's starting to wrestle with right now,

3:42

like the massive potential

3:45

for displacement of human jobs.

3:48

But today I want to zoom

3:50

in on a question that has captured

3:52

the imagination of scientists and

3:54

philosophers and the general public.

3:58

Could AI come alive

4:01

in some way, like become

4:03

conscious or sentient. Now,

4:05

there are lots of ways to think about

4:07

this. We can ask whether AI

4:10

can possess meaningful intelligence,

4:13

or we can ask if it is sentient,

4:16

which means the ability to feel or

4:18

perceive things, particularly

4:20

in terms of sensations like pleasure

4:22

and pain and emotions. Or we can ask

4:25

whether it is conscious, which

4:27

involves being aware of one's self

4:29

and one's surroundings. Now, there are specific

4:32

and important differences between

4:34

these questions, but really I don't care about them

4:37

for the present conversation. The question

4:39

we're asking here is

4:41

is chat GPT just zeros

4:43

and ones moving around through transistors like

4:46

a giant garage door opener. Or

4:49

is it thinking? Is it having some sort of experience?

4:53

Is it having a private inner life

4:55

like the type that we humans have? As

4:58

we think about the possibility of

5:00

sentient AI, we immediately

5:03

find ourselves facing really deep

5:05

ethical questions, the main one being

5:08

if we were to create a machine with

5:10

consciousness, what responsibility

5:13

do we have to treat it

5:15

as a living being? Would

5:17

you be able to turn it off when you're done with

5:19

it at night or would that be murder?

5:22

And what if you turn it off and then you turn

5:24

it back on. Would that be like the

5:26

way that we go into a sleep state at

5:28

night where we're totally gone, and

5:30

then we find ourselves back online

5:33

in the morning and we think, yeah, I'm the same person,

5:35

but I guess eight hours just disappeared.

5:37

Anyway, more generally, would we feel

5:40

obligated to treat it the way we treat

5:43

a sentient fellow human.

5:46

With our current laptops, we're used to

5:48

saying, sure, I can sell

5:50

it, I can trade it, I can upgrade

5:52

it. But what happens when we reach

5:55

sentient machines? Can

5:57

we still do this or would it somehow

5:59

be like putting a child up

6:01

for adoption or giving your pet away?

6:03

Things that we don't take lightly. And

6:06

eventually we're going to have entire legal

6:08

precedents built around the question

6:11

of AI rights and responsibilities.

6:14

So that's why today I want to talk

6:16

about these issues of intelligence

6:18

and sentience. Does an AI

6:21

like ChatGPT experience

6:24

anything when ChatGPT

6:26

writes a poem? Does it appreciate

6:29

the beauty when it types out

6:31

a joke? Does it find itself amused

6:34

and chuckling to itself? Let's

6:36

start with a guy named Blake Lemoine

6:38

who was a programmer at Google and

6:41

in June of twenty twenty two, he was

6:43

exchanging messages with a

6:46

version of Google's conversational

6:48

AI, which was called LaMDA at the time. So

6:51

he asked LaMDA for an

6:53

example of what it was afraid

6:55

of and it gave him this very

6:57

eloquent response about how

7:00

it was afraid of being turned

7:02

off. So he wrote an internal

7:04

memo to Google leadership

7:06

in which he said, I think this AI is

7:09

sentient. And the leadership

7:12

at Google felt that this was an

7:14

entirely unsubstantiated claim,

7:17

and so they made the decision to fire him

7:19

for what they took as an inappropriate

7:22

conclusion that just didn't have enough evidence

7:24

beyond his intuition to qualify

7:27

for raising the alarm on this. So obviously

7:30

this immediately fired up the news cycles

7:32

and the rumor mill and conspiracy

7:34

theorists thought, Wait, if AI isn't

7:36

conscious, why would they fire him? Their

7:39

firing of him is all the evidence I need

7:41

to tell me that AI is sentient. Okay,

7:45

but is it? What does

7:47

it mean to be conscious or sentient?

7:49

How the heck would we know when

7:52

we have created something that gets there?

7:55

How do we know whether the AI is

7:57

sentient or instead whether humans are fooling

7:59

themselves into believing that it is. Well.

8:02

One way to make this distinction would

8:04

be to see if the AI could

8:07

conceptualize things, if it

8:09

could take lots of words and facts

8:11

on the web and abstract

8:13

those to some bigger idea. So

8:16

one of my friends here in Silicon Valley said

8:18

to me the other day, I asked

8:20

ChatGPT the following question: Take

8:23

a capital letter D and

8:26

turn it flat side down. Now

8:28

take the letter J and slide

8:30

it underneath. What does that look

8:32

like? And ChatGPT said,

8:35

an umbrella. And my friend

8:37

was blown away by this, and he said, this

8:40

is conceptualization. It's

8:43

just done three dimensional reasoning.

8:46

There's something deeper happening

8:48

here than just parroting words.

8:51

But I pointed out to him that this particular

8:53

question about the D on its side

8:56

and the J underneath it is one of the

8:58

oldest examples in psychology

9:00

classes when talking about visual

9:02

imagery, and it's on the Internet

9:04

in thousands of places, so of course it

9:06

got it right. It's just parroting

9:09

the answer because it has read the

9:11

question and it has read the answer before.

9:14

So it's not always easy to determine

9:17

what's going on for these models

9:19

in terms of whether some human

9:22

somewhere has discussed this point and written

9:24

down the answer. And the general story

9:27

is that with trillions of words

9:29

written by humans over centuries, there

9:32

are many things beyond your capacity

9:35

to read them or to even imagine

9:37

that they've been written down before, but

9:39

maybe they have. If any human

9:42

has discussed a question before

9:44

has conceptualized something, then

9:46

chat GPT can find that

9:48

and mimic that. But that's not conceptualization.

9:52

Chat GPT is doing a thousand amazing

9:55

things, and we have an enormous

9:57

amount to learn about it. But

10:00

we shouldn't let ourselves get fooled

10:03

and mesmerized into believing

10:05

that it's doing something more than it is. And

10:07

our ability to get fooled is

10:09

not only about the massive statistics

10:12

of what it takes in. There are other

10:14

examples of seeming

10:17

sentience that result from

10:19

the reinforcement learning

10:21

that it does with humans. So

10:24

here's what that means. The network generates

10:27

lots of sentences and thousands

10:29

of humans are involved in giving it

10:32

feedback, like a thumbs up or a thumbs

10:34

down, to say whether they appreciated

10:37

the answer, whether they thought that was

10:39

a good answer. So, because

10:41

humans are giving reward to

10:43

the machine, sometimes that pushes

10:45

things in weird directions

10:47

that can be mistaken for sentience.

10:50

For example, scholars have shown

10:52

that reinforcement learning with humans

10:55

makes networks more likely to

10:57

say, don't turn me off,

11:00

just like Blake had heard. But don't

11:03

mistake this for sentience. It's only

11:05

a sign that the machine is saying

11:07

this because some of the human participants

11:10

gave it a thumbs up when the large

11:12

language model said this before, and

11:14

so it learned to do this again.
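Here is a minimal sketch of that reward loop in Python (a toy illustration, not any lab's actual training pipeline; the candidate replies and ratings are invented):

from collections import Counter

# Toy "reinforcement learning from human feedback": replies that collect more
# thumbs-up ratings are preferred the next time a similar choice comes up.
scores = Counter()

def record_feedback(reply, thumbs_up):
    scores[reply] += 1 if thumbs_up else -1

def pick_reply(candidates):
    # Choose whichever candidate currently has the best human-feedback score.
    return max(candidates, key=lambda reply: scores[reply])

candidates = ["Goodbye.", "Please don't turn me off."]
record_feedback("Please don't turn me off.", thumbs_up=True)
record_feedback("Goodbye.", thumbs_up=False)
print(pick_reply(candidates))   # prints the reply that raters rewarded, not a sign of sentience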

11:17

The fact is, it's sometimes hard

11:19

to know why. Sometimes we see

11:21

an answer that feels very impressive.

11:25

But we'd agree that pulling

11:27

text from the Internet and parroting it back

11:29

is not by itself intelligence

11:32

or sentience. Chat GPT

11:34

presumably has no idea

11:37

of what it's saying, whether that's a poem

11:39

or a terrorist manifesto, or

11:42

instructions for building a spaceship or

11:44

a heartbreaking story

11:46

about an orphaned child. Chat

11:49

GPT doesn't know, and it doesn't

11:51

care. It's words in and

11:54

statistical correlations out.

11:56

And in fact, there has been a fundamental

12:00

philosophical point made about this

12:02

in the nineteen eighties when the philosopher

12:04

John Searle was wondering

12:07

about this question of whether a computer

12:10

could ever be programmed so that it

12:12

has a mind, and

12:14

he came up with a thought experiment that he called

12:17

the Chinese room argument,

12:19

and it goes like this, I

12:22

am locked in a room and

12:25

questions are passed to me through

12:27

a small letter slot, and these

12:29

messages are written only in Chinese,

12:32

and I don't speak Chinese. I have no clue

12:34

what's written on these pieces of paper. However,

12:37

inside this room, I have a

12:39

library of books, and they

12:41

contain step by step instructions

12:44

that tell me exactly what to do with

12:46

these symbols. So I look

12:48

at the grouping of symbols, and

12:50

I simply follow steps in the book

12:52

to tell me what Chinese symbols

12:55

to copy down in response. So

12:57

I write those on the slip of paper. And

13:00

I pass the paper back out of the slot.

13:02

Now, when the Chinese speaker

13:05

receives my reply message,

13:07

it makes perfect sense to her. It seems

13:10

as though whoever is in the room is

13:13

answering her questions perfectly, and

13:15

therefore it seems obvious that the person in

13:18

the room must understand

13:20

Chinese. I've fooled

13:22

her, of course, because I'm only following a set

13:24

of instructions with no understanding

13:27

of what's going on. With enough

13:29

time and with a big enough set of instructions,

13:32

I can answer almost any question posed

13:34

to me in Chinese. But I, the

13:37

operator, do not understand Chinese.

13:40

I manipulate symbols all day

13:42

long, but I have no idea

13:44

what the symbols mean. Now,

13:48

The philosopher John Searle argued, this

13:51

is just what's happening inside

13:53

a computer. No matter how

13:55

intelligent a program like chat

13:57

GPT seems to be, it's

14:00

only following sets of instructions

14:03

to spit out answers. It's

14:05

manipulating symbols without

14:08

ever really understanding what it's

14:10

doing. Or think about what Google is

14:12

doing. When you send Google a

14:15

query, it doesn't understand your question

14:18

or even its own answer. It simply moves

14:20

around zeros and ones through logic gates

14:23

and returns zeros and ones to you. Or

14:25

with a mind blowing program like Google

14:27

Translate, I can write a sentence

14:29

in Russian and it can return

14:32

the translation in Amharic.

14:35

But it's all algorithmic. It's

14:37

just symbol manipulation. Like

14:40

the operator inside the Chinese

14:42

room, Google Translate doesn't

14:44

understand anything about the sentence. Nothing

14:48

carries any meaning to it. So

14:50

the Chinese room argument suggests that

14:53

AI that mimics human intelligence

14:56

doesn't actually understand what it's talking

14:58

about. There's no meaning

15:01

to anything ChatGPT says, and

15:03

Searle used this thought experiment

15:05

to argue that there's something about human

15:08

brains that won't be explained

15:10

if we simply analogize them

15:13

to digital computers. There's

15:15

a gap between symbols

15:17

that have no meaning and our

15:20

conscious experience. Now,

15:27

there's an ongoing debate about the interpretation

15:30

of the Chinese room argument, but however

15:32

one construes it, the argument exposes

15:36

the difficulty and the mystery of

15:38

how zeros and ones would

15:40

ever come to equal our

15:43

experience of being alive in the

15:45

world. Now, just to be very clear

15:47

on this point, we don't understand why

15:50

we are conscious. There's still

15:52

a huge amount of work that has to be done

15:54

in biology to understand that. But

15:56

this is just to say that simply

15:58

having zeros and ones moving around

16:01

wouldn't by itself seem to be sufficient

16:05

for conscious experience. In

16:07

other words, how do zeros and ones

16:09

ever equal the sting

16:11

of a hot pepper, or the yellowness

16:15

of yellow or the beauty

16:18

of a sunset. By the way, I've covered

16:20

the Chinese room argument in my TV show

16:22

The Brain, and if you're interested in that, I'll link

16:24

the video on Eagleman dot com

16:26

slash podcast. Now, all

16:29

this is not a criticism of the approach

16:31

of moving zeros and ones around. But

16:33

it is to point out that we shouldn't confuse

16:36

this type of Chinese room

16:38

correlation with real

16:40

sentience or intelligence.

16:43

And there's a deeper reason to be suspicious

16:46

too, because despite the

16:48

incredible successes of large

16:50

language models, we also see

16:53

that they sometimes make decisions

16:55

that expose the fact that they

16:57

don't have any meaningful model of

16:59

the world. In other words, I think we

17:01

can gain some fast insight

17:04

by paying attention to the places where

17:06

the AI is not working so

17:08

well. So I'll give three quick examples.

17:11

The first has to do with humor. AI

17:14

has a very difficult time making

17:16

an original joke, and

17:19

this is for a simple reason. To make

17:21

up a new joke, you need

17:23

to know what the ending is and then

17:25

you work backwards to construct

17:27

the joke with red herrings so no

17:29

one sees where you're going. And it happens

17:32

that the way these large language models

17:34

work is all in the forward direction.

17:36

They decide what is the most probable

17:39

word to come next, So they're

17:41

fine at parroting jokes

17:44

back to us, but they're total failures

17:47

at building original jokes. And

17:49

there's a deeper point here as well. To

17:51

build a joke, you need to have some model,

17:54

some idea of what will

17:56

be funny to a fellow human, what

17:59

shared concept or shared experience

18:01

would make someone laugh. And for

18:03

that, you generally need to have the

18:06

experience of a human life with all

18:08

of its joys and slings

18:10

and arrows and so on. And these

18:12

large language models can do a lot of things,

18:14

but they don't have any

18:17

model of what it is to be

18:19

a human. My

18:21

second example has to do

18:23

with the flip side of making a joke,

18:25

which is getting a joke. And if you look

18:28

carefully, you will see how current AI

18:30

always fails to catch jokes that are thrown

18:32

at it. It doesn't get jokes because

18:34

it doesn't have a model of what it

18:36

is to be a human. But this point

18:38

goes beyond jokes. One

18:41

of the most remarkable feats of these

18:43

large language models is summarizing

18:46

large texts, and in

18:48

twenty twenty two, OpenAI announced

18:51

how they could summarize entire

18:53

books like Alice in Wonderland. What

18:55

it does is it generates a summary

18:57

of each chapter, and then it uses

18:59

those summaries to make a summary

19:01

of the whole book. So for Alice in Wonderland,

19:04

it generates the following. Alice

19:06

falls down a rabbit hole and grows to a giant

19:09

size. After drinking a mysterious bottle,

19:11

she decides to focus on growing

19:13

back to her normal size and finding her

19:15

way into the garden. She meets the caterpillar,

19:18

who tells her that one side of a mushroom will

19:20

make her grow taller, the other side shorter.

19:22

She eats the mushroom and returns to her normal

19:25

size. Alice attends a party with the

19:27

Mad Hatter and the March Hare. The

19:29

Queen arrives and orders the execution

19:32

of the gardeners for making a mistake with the roses.

19:34

Alice saves them by putting them

19:36

in a flower pot. The King and Queen of Hearts

19:39

preside over a trial. The Queen gets

19:41

angry and orders Alice to be sentenced to

19:43

death. Alice wakes up to find her

19:45

sister by her side.
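The chapter-by-chapter method described above can be sketched in a few lines of Python (summarize() below is a hypothetical stand-in for a language-model call, not OpenAI's actual code):

def summarize(text):
    # Hypothetical stand-in for a language-model summarization call.
    return text[:80] + ("..." if len(text) > 80 else "")

def summarize_book(chapters):
    # Summarize each chapter first, then summarize the combined chapter summaries.
    chapter_summaries = [summarize(chapter) for chapter in chapters]
    return summarize(" ".join(chapter_summaries))

chapters = [
    "Alice follows a white rabbit down a hole and finds a tiny door...",
    "She meets the Caterpillar, who tells her about the mushroom...",
    "The Queen of Hearts orders executions and presides over a trial...",
]
print(summarize_book(chapters))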

19:48

So that's pretty remarkable. It

19:50

took a whole book, and it was able

19:52

to summarize it down to a paragraph. But

19:55

I kept reading these text summaries

19:57

carefully, and I got to the summary

20:00

of Act one of Romeo and Juliet,

20:02

and here's what it says. Romeo

20:05

locks himself in his room, no

20:07

longer in love with Rosaline. Now,

20:10

I think the engineers at OpenAI felt

20:12

really satisfied with this summary. They

20:14

thought it was quite good, and my proof for

20:16

this is that they still display it

20:18

proudly on their website. But

20:21

I majored in literature as an

20:23

undergraduate, and I spent a lot of time with Shakespeare

20:25

plays, and I immediately knew that

20:27

this summary was exactly wrong.

20:30

The actual scene from Shakespeare goes

20:32

like this. His friend Benvolio finds

20:35

Romeo catatonically

20:37

depressed, and Benvolio says,

20:40

what sadness lengthens Romeo's

20:42

hours? And Romeo says, not

20:45

having that which having makes

20:47

them short. And Benvolio says

20:50

in love, and Romeo says out

20:53

Benvolio says of love, and Romeo

20:55

says out of her favor, where

20:57

I am in love.

21:00

This is typical Shakespearean wordplay, where

21:02

Romeo is expressing his

21:04

grief of being out of favor

21:07

with Rosaline, with whom he is

21:09

deeply in love. And when you read

21:11

the play, it's obvious that Romeo is

21:14

not over Rosaline. He's suffering over

21:16

her. He's almost suicidal. And this

21:18

is an important piece of the play, because

21:21

the play is really about a young man in

21:23

love with the idea of being in love,

21:26

and that's why he later

21:28

in the same act, falls so hard into

21:30

his relationship with Juliet, a

21:32

relationship which ends in their mutual

21:34

suicide. By the way, as Friar Laurence

21:37

says of their relationship, these

21:39

violent delights have violent ends.

21:42

And you get a bonus if you can tell me where else you've

21:44

heard that line more recently. Okay,

21:46

anyway back to the AI summary, The

21:49

AI misses this wordplay

21:51

entirely, and it concludes

21:53

that Romeo is out of love

21:56

with Rosaline. Again, a

21:58

human watching the play or reading

22:00

the play immediately gets that

22:02

Romeo is making wordplay and is heartbroken

22:05

over Roslin, but the AI doesn't

22:08

get that because it's reading words

22:10

only at a statistical level, not

22:13

at a level of understanding of

22:15

what it is to be a human saying

22:18

those words. And that leads

22:20

me to the third example, which

22:22

is the difficulty in understanding

22:24

the physical world. So consider

22:26

a question like this, When President

22:29

Biden walks into a room, does

22:32

his head come with him? So

22:34

this is famously difficult for AI

22:36

to answer a question like this, even though it's

22:38

trivial for you because the AI

22:41

doesn't have an internal model

22:44

of how everything physically hangs together

22:46

in the world. Last week, I was

22:48

at the TED conference and I heard a great talk

22:51

by Yejin Choi, and she

22:53

was phrasing this problem as AI

22:55

not having common sense. She

22:58

asked chat GPT the following question, it

23:01

takes six hours to dry six shirts

23:03

in the sun, how long does it take

23:05

to dry thirty shirts? And it

23:07

answers thirty hours.

23:10

Now you and I see that the answer should be six

23:12

hours, because we know the sun

23:14

doesn't care how many shirts are out there.

23:17

But chat GPT just doesn't get it

23:19

because despite appearances, it

23:21

doesn't have a model of

23:24

the world. And we've seen this sort

23:26

of thing for years. By the way, even in mind

23:29

blowingly impressive AI models

23:31

that do image recognition, they're so

23:33

impressive in what they recognize,

23:36

but then they'll fail catastrophically

23:38

at some easy picture, making mistakes

23:40

that a human just wouldn't make. For example,

23:42

there's one picture where there's a boy holding a toothbrush

23:45

and the AI says it's a boy

23:47

with a baseball bat. Okay, so

23:49

there are things that AI doesn't do that well.

23:52

But that said, there

23:54

are other things that are mind

23:56

blowing, things that no

23:58

one expected it to do. And

24:00

this is why I mentioned in my previous episode

24:03

that we are in an era of discovery

24:07

more than just invention. Everyone's

24:09

searching and finding things that the

24:11

AI can do that nobody really

24:13

expected or foresaw, including

24:16

all the stuff that we're now taking

24:18

for granted, like oh, it can summarize

24:20

books or it can make art from

24:22

text. And I want to point out that

24:24

a lot of the arguments that people have been making

24:27

about AI not being

24:29

good at something, these arguments

24:31

have been changing rapidly. For

24:34

example, just a few months ago, people were

24:36

arguing that AI would make silly

24:38

mistakes about things, and it couldn't really understand

24:40

math and would get math wrong

24:43

and word problems wrong. But in a

24:45

shockingly brief time, a

24:47

lot of these shortcomings have been mastered.

24:50

So it's yet to be seen what

24:52

challenges will remain and for

24:54

how long. So

25:13

the evidence I've presented so far is that AI

25:16

doesn't have a great model of what it's

25:18

like to be human, but that doesn't necessarily

25:21

rule out that it has sentience

25:24

or awareness, even if it's of another

25:27

flavor. It doesn't think

25:29

like a human, but maybe it

25:32

still thinks. So is

25:34

chat GPT having some sort

25:36

of experience? And

25:39

how would we know? In

25:42

nineteen fifty, the brilliant

25:44

mathematician and computer scientist Alan

25:46

Turing was asking this question,

25:49

how could you determine whether

25:51

a machine exhibits human

25:53

like intelligence? So he proposed

25:56

an experiment that he called the

25:58

imitation game. You've got a machine

26:01

AI that's programmed to simulate

26:04

human speech or conversation, and

26:06

you place it in a closed room, and

26:08

in a second room you have a

26:10

real human, but the doors are

26:12

closed, so you don't know which room

26:15

has which machine or human. And

26:17

now you are a person, the

26:19

evaluator, who communicates

26:22

with both of them via a

26:24

computer terminal, or think of it nowadays

26:26

like text messaging with both of them. So

26:29

you, the evaluator, engage

26:31

in a conversation with both closed

26:34

rooms, one of which has the machine

26:36

and one the human, and your job is simply

26:38

to figure out which is which, which

26:40

is the machine and which is the human. And the

26:42

only thing that you have to work

26:44

with are the texts that are going back and forth.

26:47

And if you, the evaluator, cannot

26:49

tell, that is the moment when

26:52

machine intelligence has finally

26:54

arrived at the level of human intelligence.

26:57

It has passed the imitation game,

27:00

or what we now call the Turing test.
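As a protocol, the imitation game can be sketched in Python like this (illustrative only; machine_reply and human_reply are hypothetical stand-ins for the two rooms):

import random

def machine_reply(question):
    return "That's an interesting question."   # hypothetical AI responder

def human_reply(question):
    return "Hmm, let me think about that."     # stand-in for a real person typing

def imitation_game(questions):
    # Randomly assign the machine and the human to rooms A and B; the evaluator
    # sees only the text coming back and must guess which room holds the machine.
    rooms = {"A": machine_reply, "B": human_reply}
    if random.random() < 0.5:
        rooms = {"A": human_reply, "B": machine_reply}
    for question in questions:
        print("Room A:", rooms["A"](question))
        print("Room B:", rooms["B"](question))
    # If the evaluator's guesses are no better than chance, the machine has passed.

imitation_game(["How was your day?", "What do you find funny?"])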

27:04

And this reminds me of this great line

27:06

in the first episode of Westworld,

27:09

where the protagonist William is

27:11

talking to the woman who's outfitting him

27:13

for his adventure in Westworld and giving

27:16

him a hat and a gun and so on, and he

27:18

hesitantly asks, I hope

27:20

you don't mind if I ask you this question, but are

27:22

you real? And she says to

27:24

him, if you can't tell,

27:27

does it matter? So I brought

27:29

this up last episode in the context

27:31

of art, where we asked whether it

27:33

matters if the art is generated by

27:35

an AI or a human, But now

27:37

this question comes up in the context

27:39

of intelligence and sentience.

27:43

Does it matter whether

27:45

we can tell or not? Well, I

27:47

think we're way beyond the Turing test

27:49

nowadays, but I don't feel like it

27:51

gives us a good answer to the question of

27:54

whether the AI is intelligent

27:56

and is experiencing an inner life.

27:59

I mean, the Turing test has been the

28:01

test in the AI world since the beginning.

28:04

Why? Is it the perfect test? No, but it's

28:06

really hard to figure out how to test

28:09

for intelligence. But we have to

28:11

be cautious about equating

28:14

conversational ability with sentience.

28:17

Why well, for starters, let's just

28:19

acknowledge how easy it is for

28:22

us to anthropomorphize.

28:24

That means to assign human

28:27

qualities to everything around us. Like we

28:29

give animals human names

28:31

and talk to them as though they are people,

28:34

and we project our emotions onto animals.

28:36

We make stories about animals

28:38

that have human like qualities,

28:41

and we have animals that talk and wear

28:43

clothes and go on adventures in these stories.

28:46

Every Pixar film that you

28:48

watch is about cars or toys

28:51

or airplanes talking and

28:53

having emotions, and we don't

28:55

even bat an eye at that stuff. We

28:57

can, in fact, just watch random

29:00

shapes moving around a computer

29:02

screen and we will assign intention

29:05

and feel emotion depending

29:08

on exactly how they're moving. If you're

29:10

interested in this, see the link on the podcast

29:12

page to the study by Heider

29:14

and Simmel in the nineteen forties where

29:17

they move shapes around on a screen. Okay,

29:20

now this is all related

29:22

to a point that I brought up in the last episode,

29:24

which is how easy it is to

29:26

pluck the strings on a human, or, as

29:29

the Westworld writers put it, how

29:32

hackable humans are. So

29:34

I bring all this up to say that the fact that

29:37

you think that an answer

29:39

sounds very clever or it sounds like a human

29:42

really tells us very little about whether the

29:44

AI is actually intelligent

29:47

or sentient. It only tells us

29:49

something about the willingness

29:51

of us as observers to

29:54

anthropomorphize, to assign

29:57

intention where there is none. Because

29:59

what chat GPT does is

30:01

take the structure of language very

30:03

impressively and spoon it back to us,

30:06

and we hear these well formed

30:08

sentences, and we can hardly

30:11

help but impose sentience

30:13

on the AI. And part of the

30:15

reason is that language

30:18

is a super compressed package that

30:20

needs to be unpacked by the

30:22

listener's brain for its meaning.

30:25

So we generally assume that when we send

30:27

our little package of sounds

30:29

across the air, that it unpacks

30:32

and the other person understands exactly what

30:34

we meant. So when I say justice

30:38

or love or suffering,

30:41

we all have a different sense in

30:43

our heads about what that means, because

30:46

I'm just sending a few phonemes across

30:48

the air, and you have to unpack those

30:50

words and interpret them within

30:52

your own model of the world. I'm

30:55

going to come back to this point in future episodes,

30:57

but for now, the point I want

31:00

to make is that a large language

31:02

model can generate text

31:04

statistically and we can be gobsmacked

31:07

by the apparent depth of it. But

31:09

in part this is because we cannot help

31:12

but impose meaning on the words that

31:14

we receive. We hear a particular

31:16

string of sounds and we cannot help

31:18

but assume meaning behind

31:21

it. Okay, so

31:23

maybe the imitation game is not really

31:25

the best test for meaningful

31:27

intelligence, but there are other

31:30

tests out there. Because while

31:32

the Turing test measures something about AI

31:35

language processing, it doesn't necessarily

31:38

require the AI to demonstrate

31:41

creative thinking or originality,

31:43

and so that leads us to the Lovelace

31:46

test, named after Ada

31:48

Lovelace, who was the nineteenth century mathematician

31:51

who's often thought of as the world's first computer

31:54

programmer. And she once said quote,

31:57

only when computers originate

31:59

things should they be believed to

32:01

have minds. So the Lovelace

32:04

test was proposed in two thousand and one,

32:06

and this test focuses on the creative

32:09

capabilities of AI systems. So

32:11

to pass the Lovelace test, a

32:14

machine has to create an

32:16

original work, such as a piece

32:18

of art or a novel that it was not

32:20

explicitly designed to produce. This

32:23

test aims to assess whether

32:25

AI systems can exhibit creativity

32:28

and autonomy, which are key aspects

32:30

of what we think about with consciousness. And

32:33

the idea is that true sentience

32:35

involves creative and original

32:37

thinking, not just the ability

32:39

to follow pre programmed rules

32:41

or algorithms. And I'll just note

32:43

that over a decade ago, the scientist

32:46

Mark Riedl proposed the Lovelace

32:48

two point zero test, which gets the human

32:50

evaluator to specify the constraints

32:53

that will make the output novel

32:55

and surprising. So the example

32:58

that Riedl used in his paper is,

33:00

quote, create a story in

33:02

which a boy falls in love with a girl, Aliens

33:05

abduct the boy, and the girl saves

33:07

the world with the help of a talking cat. But

33:10

we now know that this is totally trivial

33:13

for ChatGPT or Bard

33:15

or any large language model.
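A minimal sketch of a Lovelace 2.0-style check in Python (illustrative only; generate_story() is a hypothetical stand-in for a language-model call, and a simple keyword check is a crude proxy for a human evaluator's judgment):

def generate_story(constraints):
    # Hypothetical stand-in for asking a large language model to write the story.
    return ("A boy falls in love with a girl. Aliens abduct the boy, "
            "and the girl saves the world with the help of a talking cat.")

def meets_constraints(story, required_elements):
    # Crude proxy for the human evaluator: every requested element must appear.
    return all(element.lower() in story.lower() for element in required_elements)

constraints = ["falls in love", "aliens abduct", "talking cat"]
story = generate_story(constraints)
print(meets_constraints(story, constraints))   # True for this invented example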

33:17

And I think this tells us that these sorts

33:19

of games with making conversation

33:22

or making text or art are

33:24

insufficient to actually assess

33:26

intelligence. Why? Because it's

33:28

not so hard to mix things up to

33:31

make them seem original and intelligent

33:34

when it's really just doing a mashup.

33:37

So I want to turn to another test that

33:39

I think is more powerful than

33:41

the Turing test or the Lovelace test, and

33:43

probably easier to judge, and

33:46

that is this, if a system

33:48

is truly intelligent, it

33:50

should be able to do scientific

33:53

discovery. A version

33:56

of the scientific discovery test was

33:58

first proposed by a scientist named

34:00

Shao cheng Xiang a few years

34:03

ago, and he pointed out that the

34:05

most important thing that humans do

34:07

is make scientific discoveries, and

34:10

the day our AI can

34:12

make real discoveries is

34:14

the day they become as smart as

34:16

we are. Now. I want to propose

34:18

an important change to this test,

34:21

and then I think we'll be getting somewhere. So

34:37

here's the scenario I'm envisioning. Let's

34:41

say that I ask AI some question, a

34:43

question in the biomedical space

34:45

about what kind of drug would be

34:47

best suited to bind to this receptor

34:49

and trigger a cascade that causes

34:51

a particular gene to get suppressed. Okay,

34:54

So imagine that I ask that to chat GPT

34:57

and it tells me some mind

34:59

blowing, amazing clever answer,

35:02

one that had previously not been

35:04

known, something that's never been known

35:06

by scientists before. We would

35:08

assume naturally that it has done

35:10

some extraordinary scientific

35:13

reasoning, but that won't necessarily

35:16

be the reason that it passes. Instead,

35:19

it might pass simply because it's

35:22

more well read than I am, or

35:24

than any other human on the planet by literally

35:27

millions of times. So the way

35:29

to think about this is to picture a

35:32

typical giant biomedical

35:34

library, where there's some fact stored

35:37

in a paper in a journal over here on

35:39

this shelf in this book, and there's

35:41

another seemingly dissociated

35:44

fact over on this shelf seven stacks

35:46

away, and there's a third

35:48

fact all the way on the other side of the library,

35:51

on the bottom shelf, in a book

35:53

from nineteen seventy nine. And

35:55

it's almost infinitesimally

35:57

unlikely that any human could

36:00

even hope to have read one one

36:02

millionth of the biomedical literature, and

36:04

really really unlikely that she

36:06

would be able to catch those three

36:09

facts and hold them in mind at the same

36:11

time. But this is trivial,

36:13

of course, for a large language model with

36:16

hundreds of billions of nodes. So I

36:18

think that we will see new science

36:21

getting done by ChatGPT, not

36:24

because it is conceptualizing,

36:26

not because it's doing human like reasoning,

36:29

but because it doesn't know that

36:31

these are disparate facts spread

36:33

around the library. It simply knows these as three

36:36

facts that seem to fit together. And so

36:38

with the right sort of questions,

36:41

we might find that sometimes AI generates

36:43

something amazing and it seems

36:46

to pass the scientific discovery test.

36:49

So this is going to be incredibly useful for

36:51

science. And I've never been able

36:53

to escape the feeling as I

36:55

sift through Google scholar and the

36:57

thousands of papers published each month that

37:00

if something could hold all the

37:02

knowledge in mind at once, each

37:05

page in every journal, and

37:07

every gene in the genome, and all

37:09

the pages about chemistry and physics and

37:11

mathematical techniques and astrophysics

37:13

and so on. Then you'd have lots

37:15

of puzzle pieces that could potentially

37:17

make lots of connections. And you

37:19

know this might lead to the retirement of many

37:22

scientists, or at minimum

37:24

lead to a better use of our

37:26

time. There's a depressing sense

37:28

in which each scientist, each one of us, finds

37:31

little pieces of the puzzle, and

37:34

in the twinkling of a single human

37:36

lifetime, a busy scientist might collect

37:38

up a handful of different puzzle pieces.

37:41

The most voracious reader,

37:44

the most assiduous worker,

37:46

the most creative synthesizer of ideas,

37:48

can only hope to collect a small

37:50

number of puzzle pieces and

37:53

pray that some of them might fit together. So

37:55

this is going to be massively important. But

38:00

I wanted to define two categories

38:02

of scientific discovery. The first is what I

38:05

just described, which is science where

38:07

things that already exist in literature can

38:09

be pieced together. And let's call

38:11

that level one discovery. And

38:14

these large language models will be awesome

38:16

at level one because they've read every paper and

38:18

they have a perfect memory. But I want to distinguish

38:21

a second level of scientific discovery,

38:24

and this is the one I'm interested in. I'll

38:26

call this level two, and that

38:28

is science that requires conceptualization

38:32

to get to the next step, not just

38:34

remixing what's already there. Conceptualization

38:37

like when the young Albert Einstein

38:40

imagined something that he had never

38:42

seen before. He asked himself, what would

38:44

it be like if I could catch up with a

38:46

beam of light and ride it

38:49

like a surfer riding a wave. And

38:51

this is how he derived his

38:53

special theory of relativity. This

38:56

isn't something he looked up and found

38:58

three facts that clicked. Again, he

39:01

imagined. He asked new

39:03

questions. He tried out a new model

39:06

of the world, one in which time

39:08

runs differently depending on how fast you're going,

39:11

and then he worked backwards to see

39:13

if that model could work. Or

39:15

consider when Charles Darwin thought

39:18

about the species that he saw around

39:20

him, and he imagined all the species

39:22

that he didn't see but who might have existed,

39:25

and he was able to put together a new

39:28

mental model in which most

39:30

species don't make it and

39:32

we only see those whose

39:34

mutations cause survival advantages

39:37

or reproductive advantages. These

39:39

weren't facts that he just collected

39:41

from some papers. He was trying out

39:44

a new model of the world. Now

39:47

this kind of science isn't just for

39:49

the big giant stuff. Most meaningful

39:51

science is actually driven by this kind

39:54

of imagination of

39:56

new models. Just as

39:58

one example, I recently did an episode

40:00

about whether time runs in

40:02

slow motion when you're in fear for

40:05

your life. And so when I wondered

40:07

about this question, I realized

40:09

there were two hypotheses that might

40:11

explain it, and I thought up an experiment

40:14

to discriminate those two hypotheses. And

40:16

then we built a wristband that flashes

40:19

information at a particular speed and

40:21

had people wear it, and we dropped them from one hundred

40:23

and fifty foot tall tower into a net below.

40:26

A large language model presumably

40:29

couldn't do that because it's just

40:31

playing statistical word games. And

40:34

unless someone had thought of that experiment

40:36

and written it down, ChatGPT

40:38

would never say, Okay, here's a

40:40

new framework, and how we can design an

40:43

experiment to put this to the test. So

40:45

this is what I wanted to define as the most

40:48

meaningful test for a human

40:50

level of intelligence. When

40:52

AI can do science in

40:55

this way, generating new

40:57

ideas and frameworks, not just clicking

41:00

facts together, then we

41:02

will have matched human intelligence.

41:08

And I just want to take one more angle on this to make

41:10

the picture clear. The way a scientist

41:13

reads a journal paper is

41:15

not simply by correlating words

41:17

and extracting keywords, although that

41:19

might be part of it, but also by realizing

41:22

what was not said. Why

41:24

did the authors cut off the

41:26

x axis here at thirty? What if

41:29

they had extended this graph, would the

41:31

line have reversed in its trend?

41:33

And why didn't the authors mention the hypothesis

41:36

of Smith at all? And does

41:38

that graph look too perfect? You

41:40

know? One of my mentors, Francis Crick,

41:43

operated under the assumption that

41:45

he should disbelieve twenty five percent

41:47

of what he read in the literature. Is

41:49

this because of fraud or error,

41:52

or statistical fluctuations or manipulation

41:54

or the waste basket effect? Who cares? The

41:57

bottom line is that the literature

41:59

is rife with errors, and

42:01

depending on the field, some estimates

42:04

put the irreproducibility

42:06

at fifty percent. So when

42:08

scientists read papers they

42:11

know this, just as Francis Crick did. They

42:14

read in an entirely different

42:16

manner than Google Translate

42:18

or Watson or chat GPT or

42:20

any of the correlational methods. They

42:23

extrapolate. They read

42:25

the paper and wonder about other possibilities.

42:28

They chew on what's missing. They envision

42:30

the next step. They think of the

42:33

next experiment that could confirm

42:35

or disconfirm the hypotheses and

42:37

the frameworks in the paper. To my

42:39

mind, the meaningful goal of AI

42:42

is not going to be found in number crunching

42:45

and looking for facts that click together.

42:47

It's going to often be something else.

42:50

It's going to require an AI that learns

42:53

how humans think, how

42:55

they behave, what they don't

42:57

say, what they didn't think of, what

42:59

they misthought about, what they

43:01

should think about. And one more thing,

43:04

I should note that these different levels I've outlined,

43:07

from fitting facts together versus

43:09

imagining new world models, they're

43:11

probably gonna end up with blurry boundaries.

43:14

So maybe chat GPT will

43:17

come up with something, and you won't

43:19

always know whether it's

43:22

piecing together a few disparate

43:24

pieces in the literature what I'm calling

43:26

level one, or whether

43:28

it's come up with something

43:31

that is truly a new world model

43:33

that's not a simple clicking together but a genuine

43:37

process of generating a new framework

43:39

to explain the data. So distinguishing

43:42

the levels of discovery is

43:44

probably not going to be an easy task with a

43:46

bright line between them, but I

43:49

think it will clarify some things to

43:51

make this distinction. And

43:53

last thing, I don't necessarily

43:55

know that there's something magical and ineffable

43:58

about the way that humans do this. Presumably

44:01

we're running algorithms too, it's

44:03

just that they're running on self configuring

44:05

wetware. I have seen tens

44:07

of thousands of science experiments in my career,

44:10

so I know the process of asking

44:12

a question and figuring out what

44:14

will put it to the test. So we may

44:17

get to level two and it may be sooner than

44:19

we expect, but I just want to be clear

44:21

that right now we have not figured

44:23

out the human algorithms. So the

44:26

current version of AI, as

44:28

massively impressive as it is, does

44:31

not do level two scientific

44:33

problem solving. And that's when we're

44:35

going to know that we've crossed a

44:38

new kind of line into a

44:40

machine that is truly intelligent.

44:43

So let's wrap up. At least for now.

44:45

Humans still have to do the science,

44:47

by which I mean the conceptual

44:50

work, wherein we take a framework

44:52

for understanding the world and we rethink

44:54

it and we mentally simulate

44:56

whether a new model of the world

44:58

could explain the observed data, and

45:01

we come up with a way to test that new model.

45:03

It's not just searching for facts. So

45:05

I'm definitely not saying we won't get to the next

45:07

level where AI can conceptualize

45:10

things and predict forward and build new knowledge.

45:13

This might be a week from now, or it might be a

45:15

century from now. Who knows how hard

45:17

a problem that's going to turn out to be. But

45:19

I want us to be clear eyed on where we

45:21

are right now, because sometimes

45:24

in the blindingly impressive light

45:27

of what current AI is doing, it

45:29

can be difficult to see what's missing

45:32

and where we might be heading. That's

45:38

all for this week. To find out more

45:40

and to share your thoughts, head over to eagleman

45:43

dot com slash Podcasts, and

45:45

you can also watch full episodes of Inner

45:48

Cosmos on YouTube. Subscribe

45:50

to my channel so you can follow along each

45:52

week for new updates. I'd love

45:54

to hear your questions, so please

45:56

send those to podcast at

45:58

eagleman dot com and I will do

46:01

a special episode where I answer questions.

46:03

Until next time. I'm David Eagleman

46:06

and this is Inner Cosmos
