Episode Transcript
0:00
Support for this episode comes from the Knight
0:02
Science Journalism Fellowship Program at
0:04
MIT.
0:07
This is MIT Technology
0:10
Review. We're
0:19
sorry. All of our representatives are still
0:21
assisting other customers. Please
0:23
remain on the line as we value your call.
0:26
We've all been there, and it's safe to say
0:29
you might even dread the experience. You're
0:31
already on a crusade to solve an issue,
0:34
then you have to go through a long phone tree,
0:37
and you might not be greeted by a human. This
0:39
call may be monitored or recorded for quality
0:42
assurance or training purposes. And
0:44
if you've wondered who really listens to these calls
0:47
and recordings, making sure agents
0:49
say the right things, it might not be
0:51
a person at all. These days,
0:53
AI solutions are being used to analyze
0:56
our voices in real time. And
0:58
this applies to some health care settings, as
1:00
well as these customer service phone trees. The
1:03
idea is that hidden away in our voices
1:06
are signals that hold clues to how we're
1:08
doing, what we're feeling, and
1:10
even what's going on with our physical health.
1:13
Wait, so I need to pay even more?
1:15
I hadn't expected that. This
1:17
is an example of a call being analyzed
1:20
by software from a company called Cogito. It's
1:23
supposed to find signals in a caller's voice
1:25
that help an agent be more effective
1:27
with their replies, like by telling
1:29
them to be more empathetic.
1:32
Yeah, I know it's a bit unexpected. I'm sorry
1:35
about that. It's just not currently offered as a standard
1:37
feature. I'm Jennifer Strong.
1:39
In this episode, we examine what happens
1:41
when algorithms analyze our voices,
1:44
looking for clues about our mental and physical
1:46
health. Let's
1:57
go. In Machines
1:59
We Trust. I'm
2:01
listening. A podcast about
2:03
the automation of everything. You have
2:05
reached your destination.
2:12
As someone who speaks for a living, it's strange
2:14
to think there might be all kinds of signals
2:16
and other data lurking beneath the surface
2:19
of the human voice. Not just in
2:21
what we say with words, but in the way we
2:23
sound while speaking. And
2:25
when we sing, we begin with...
2:27
Do re mi. Do
2:29
re mi fa sol la ti
2:32
do, so
2:32
do. Yep, keep the vowel
2:35
open one more time. Naaaa. Mmmmm.
2:41
Ooooo. And those signals
2:43
are increasingly being detected and analyzed
2:46
with AI. To provide clues about
2:48
who we are and even what we look
2:50
like. Or whether we have a medical
2:52
condition. Developers of these
2:54
products are going way beyond the hunt for
2:56
clues about people's emotional states. They're
2:59
looking for signs of all kinds of diseases, including
3:02
Parkinson's and Alzheimer's. So
3:05
from real-time analysis of our voices by
3:07
businesses providing things like customer
3:09
service, to healthcare applications and
3:12
more, just trying to understand
3:14
the overall scale of what companies and
3:16
researchers think they might be able to learn
3:18
from our voices can be pretty overwhelming.
3:22
And might even feel kind of dystopian
3:24
too. But it's also
3:26
possible this kind of tech might help
3:28
millions of people. In the US
3:30
alone, about one in six have been diagnosed
3:32
with depression,
3:33
and there aren't enough therapists to help. We're
3:37
joined now by my reporting partner on this project,
3:39
Hilke Schellman. She's an Emmy Award-winning
3:41
journalist writing a book about AI at work,
3:44
and she's been investigating this topic with us.
3:46
Hey, Hilke. Hey, Jen. What
3:49
was it that got you interested in this topic? You
3:52
know, I was working on my book, and I was reading
3:54
an article on how the surveillance of students
3:56
on university campuses has increased
3:58
dramatically during the pandemic.
3:59
And this article was mostly
4:02
about tracking students' locations and machines
4:04
checking their temperatures. But somewhere
4:07
the author talked about a school called Menlo
4:09
College and how they started
4:11
using a tool to check the students' voices
4:13
for signs of depression. And that
4:16
really piqued my interest. I wanted to know more
4:18
so I reached out to a company called Ellipsis
4:20
Health that built the tool
4:22
and I also had the chance to talk to a couple of students
4:24
who used it, and I wanted to know how
4:27
it actually felt to use the tool.
4:29
Okay so let's start with the college. It's
4:31
located in the heart of Silicon Valley and we
4:34
spoke to one of its students, Lina Lacoski
4:36
Torres.
4:37
I'm majoring in business management
4:40
and they have a concentration in entrepreneurship
4:42
and innovation so that's my concentration.
4:45
I found that that was best suited towards
4:48
my needs just in regards to innovation
4:50
I'm always interested in doing things against
4:52
the status quo so that's
4:55
my major.
4:56
When we spoke with her, she was a 19-year-old
4:58
junior. I think that it was
5:00
more the holistic version of
5:02
I didn't come in with the mindset
5:04
of oh I'm gonna go to Google
5:07
or I'm gonna go and make a lot of money, like
5:10
I'm more focused on social change
5:13
and how I can best use
5:15
my brain, my thinking skills.
5:17
I don't have a lot of like technical skills.
5:20
But she found it hard to escape these expectations.
5:23
Because it is really overwhelming, fresh
5:25
out of high school going okay now we're
5:28
in a situation where 21-year-olds
5:30
have built unicorns.
5:32
What are you gonna do? And it's like I'm just
5:35
trying to figure out how to exist you
5:37
know as my own person. Then
5:39
the pandemic hit.
5:40
So I did my first semester
5:43
of my freshman year in person
5:46
living in the dorms
5:48
and then spring break on
5:51
my second semester freshman year then
5:53
we went online.
5:55
And she told us many of the students
5:57
struggled especially early on when
5:59
they had to rush back home. A lot
6:01
of things come up. You're with your family. Family
6:04
issues, you can't get away from it. You're with your
6:06
mind. You know, you don't realize how much
6:10
you miss your social interactions. When
6:12
the campus shut down, she had to return to her
6:14
mom's place in Las Vegas. And
6:17
relocating out of state
6:18
meant cutting the cord with her therapist, who
6:20
can't work outside of California. I
6:23
was in Nevada, so I was pretty
6:26
distraught. I'll be honest, I felt like I was breaking
6:28
up with my therapist. I go, OK,
6:30
I'm in Nevada, so I guess. Bye.
6:33
And it felt really abrupt, especially in the time
6:35
that you need it the most. She
6:38
wasn't the only one struggling to find help.
6:40
About half the students at Menlo College aren't
6:42
from California. And in the midst of an unfolding
6:45
global crisis, suddenly many
6:47
of them couldn't access the college's mental
6:49
health services.
6:51
Then a startup pitched the school on an AI
6:53
product that's meant to assess anxiety
6:55
and depression and help people navigate
6:57
those symptoms. The school agreed
7:00
to try it, and in late 2020, the
7:02
tool was rolled out free of charge to about 800
7:04
students. It asked
7:06
people to answer daily questions in a voice
7:08
message. How's everything going at home?
7:11
And then I go for 30 to 45 seconds, hey, you know, things
7:15
have been tough. I'm really stressed out.
7:17
I'm feeling overwhelmed. I'm feeling
7:20
like I'm being smothered,
7:21
et cetera. And then
7:23
it would switch over to the prompt of, how
7:25
are you feeling lately? Or
7:28
how is school going? You know, another prompt
7:30
to keep you going.
7:32
Every time someone used it, they got a score,
7:34
and the tool gave recommendations, from
7:37
breathing exercises to the number for
7:39
a crisis helpline. For Lacoski
7:41
Torres, she didn't find the exercises
7:43
too helpful. She mostly used the tool
7:45
to prove to her mom just how stressed out
7:47
she was by living back at home.
7:50
I was like, look, it's real. This
7:53
is how I feel. You
7:55
can see it. There's data right there.
7:58
So not so much the coping skills,
7:59
because I already have issues
8:02
implementing those to begin with. People say mindfulness,
8:05
meditation. It's not, I'm
8:07
freaking out right now. If I could breathe
8:10
and chill, trust me, I definitely
8:12
would, you know?
8:13
But she says she found it helpful in
8:16
other ways. What if we didn't have this
8:18
at all? It wouldn't have brought up mental health in
8:20
the way that it did at such a large
8:22
scale and in such a way that,
8:25
I
8:25
don't know, I thought it was pretty cool. There's a lot
8:27
of people that thought it was pretty cool.
8:30
Though it did raise some questions about privacy.
8:32
Where's my talking going to? Somebody
8:34
gonna hear it? But it's really
8:37
just the computer-based system and
8:39
the AI system. And then, I guess,
8:42
in its infancy stages, possibly somebody
8:46
who works on it. But I mean, with the utmost
8:48
security, I'm pretty sure they didn't care
8:51
what you're really out to say. I think that's a
8:53
really big thing with AI too and everybody's
8:55
data. You go, okay, what's it gonna
8:57
be used for? It's my data. And it's
9:00
like, you're not a super spy.
9:02
Privacy also came up in discussions between
9:05
students, college representatives, and
9:07
the product's maker, Ellipsis Health. But
9:09
she says it was mostly about whether school therapists
9:12
would get access to that data.
9:14
And students said that wasn't appropriate.
9:16
Though
9:16
the privacy of their voice data and
9:18
what might happen to that was less of an issue.
9:21
Privacy doesn't really exist anymore.
9:23
And if you feel some type of way, you're gonna go onto
9:26
your social media and put it on blast.
9:28
You're gonna tell your friends, da-da-da-da. It's
9:30
not something you really wanna keep to yourself.
9:32
By the time we spoke, she didn't have access
9:34
to the tool anymore because the pilot program
9:37
had ended. But for the school, it jump-started
9:40
a broader conversation and a rethinking
9:42
of the way it delivers mental health services.
9:45
I think they have a greater faith in technology
9:47
than perhaps older generations do.
9:50
This is Angela Schmida, a vice president
9:52
at Menlo College. If I've learned
9:55
anything about students and their mental
9:57
health over the last two years, it's that there's
9:59
not a one-size-fits-all solution.
10:02
And so you can offer face-to-face counseling
10:05
and some students just won't take advantage of that.
10:08
But if you can offer something
10:10
where students are accessing it on their own
10:12
and they can do it in real time
10:14
and
10:15
there's not a barrier associated with
10:17
it, then some students will access that.
10:20
And she says she tried the tool herself. There
10:23
is something therapeutic about just talking.
10:25
I did find that to be the case and I
10:27
think for me it's a little bit awkward because you're
10:29
just sitting there and you're talking and you're not talking
10:31
to anyone. So it's a little bit like talking to yourself.
10:34
And you do wonder, you know, who's really going
10:36
to listen to this? So I know that the students
10:39
had concerns about that as well.
10:41
But with the demand for mental health services
10:44
greatly outstripping supply all over
10:46
the world, there's a lot of interest in
10:48
finding tech that might help.
10:50
And it's something that reporter Hilke Schellman
10:53
is looking at. I mean, there are not
10:55
enough therapists to help everyone. So
10:57
vocal biomarkers, they could be revolutionary
11:00
and they could help a lot of people, maybe
11:03
even millions of people, billions
11:05
of people, or at least that's
11:08
the hope. And so we
11:10
reached out to four startups to find out
11:12
how their technologies work and to
11:14
really understand what these tools
11:17
do. The first is called Kintsugi.
11:20
It's a startup with a test that's meant to find mental
11:22
distress from just a 20-second voice
11:24
recording. And it's supposed to work
11:26
regardless of what language someone is speaking.
11:29
Grace Chang is the company's CEO and she
11:32
told us these AI tools essentially
11:34
do what our parents and many therapists
11:36
have done for a long time.
11:38
When we talk to our friends, our family
11:40
members, it's almost obvious
11:42
for those who are close to us when
11:45
they speak in a lower voice or
11:47
if they speak in a slower manner that
11:49
something might be wrong with them. And
11:51
we have the luxury of knowing this person's
11:54
set of patterns to be able to determine
11:56
that there may be something
11:59
that is... different than
12:01
how this person normally speaks. And
12:04
so what is really remarkable is
12:07
that psychiatrists have known
12:09
that in this area of speech
12:12
there has always been a tie to
12:14
depression and anxiety.
12:16
She believes they can replicate this intuitive
12:18
speech analysis over a short period,
12:21
do it at scale, and that it can help therapists
12:23
understand how their patients are doing in between
12:26
appointments.
12:27
Our company has moved towards a
12:29
position of being able to create
12:32
a robust set of models, not
12:35
looking at what people are saying
12:38
but how they are saying
12:40
it. Our models end up being
12:42
language agnostic. So
12:44
we have people that are able to speak
12:46
in French and in Japanese
12:49
and English or otherwise. But
12:51
really we are just looking for those
12:54
biomarkers that are most predictive
12:56
for depression and anxiety. But
12:59
what is really fascinating about machine
13:01
learning is that we don't have
13:03
just the few examples of a psychiatrist
13:06
working with maybe hundreds of patients
13:09
across his or her career. Instead
13:12
we have tens of thousands of examples
13:14
of individuals who
13:17
have designated depression
13:19
or anxiety as examples for
13:22
machines to learn from.
13:24
She also says cultural influences
13:26
on language don't matter as much as we
13:28
might think. We don't care
13:31
about any of the demographic
13:33
information or the context of what's
13:35
happening
13:37
because we are looking at how people
13:39
are speaking, these spectral and prosodic
13:42
features of like how fast
13:45
or how loud or these
13:47
sort of, if you can see on a visual
13:49
spectrogram of how people
13:51
are speaking, there are some nuances
13:54
to speech that machines are able
13:56
to pick up.
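To make that concrete, here is a minimal sketch of content-free, "how you say it" feature extraction, assuming the open-source librosa library; the specific features, pitch range, and clip length here are illustrative assumptions, not Kintsugi's actual pipeline.

```python
# A minimal sketch, NOT Kintsugi's pipeline: extract prosodic and spectral
# features (how something is said) while ignoring the words entirely.
import numpy as np
import librosa

def how_not_what_features(wav_path, sr=16000):
    # Load up to ~20 seconds of audio, the clip length mentioned above.
    y, sr = librosa.load(wav_path, sr=sr, duration=20.0)

    # Prosody: pitch contour via pYIN. A flat contour (low std) is the
    # "monotone" quality discussed later in the episode.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]  # keep only voiced frames

    # Prosody: loudness as frame-level root-mean-square energy.
    rms = librosa.feature.rms(y=y)[0]

    # Spectral: MFCCs summarize the shape of the spectrogram per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    return {
        "pitch_mean_hz": float(f0.mean()) if f0.size else 0.0,
        "pitch_std_hz": float(f0.std()) if f0.size else 0.0,
        "loudness_mean": float(rms.mean()),
        "loudness_std": float(rms.std()),
        "mfcc_means": mfcc.mean(axis=1).tolist(),
    }
```

Because nothing here depends on which words were spoken, the same extraction runs unchanged on French, Japanese, or English speech, which is the sense in which such models can be language-agnostic.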
13:59
What we do is we
14:02
analyze human voice and
14:04
deliver insight, health
14:07
insight, from that analysis.
14:09
And this is David Liu, the CEO
14:11
of Sonde Health, a company that's regarded
14:14
as one of the frontrunners in the space. Its
14:17
health monitoring products look for all
14:19
sorts of things, from signs of cognitive
14:21
or motor impairment to asthma,
14:24
and drug use. And we asked him for some
14:26
concrete examples of how it all works.
14:28
Think of us
14:29
as a health insight company,
14:32
and we use voice as the window to
14:35
deliver that insight. The technology
14:38
is a detection and monitoring technology,
14:40
right? What it does is it takes in six
14:43
to 30 seconds of human
14:45
voice, and from there, our
14:48
algorithms and models, which have been
14:51
trained on tens of thousands of people,
14:53
both in the US and in Asia, then
14:56
are analyzing what I
14:58
call the atomic level of your voice.
15:01
We take and look at five
15:03
millisecond strips of your 30
15:05
second or 12 second voice sample, and
15:09
then within there, we're analyzing
15:11
those vocal features that do
15:14
and that we have identified as being
15:17
relevant
15:18
to understanding your particular
15:20
condition. And pauses are something
15:22
that almost all companies are looking into.
15:24
What we look at is the
15:26
time difference between when
15:29
air is being pushed out of your mouth to
15:31
when sound and voice is
15:33
being detected. And so that time
15:35
period, which is
15:37
quite short, we can measure that.
15:39
His system also records the smoothness
15:41
of a speaker's voice, control of vocal
15:44
muscles, energy, clarity and
15:46
the speech rate. And by the way, some
15:48
of these features can be heard
15:50
by the human ear and discerned. Most
15:53
cannot.
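As a rough illustration of the pause measurements Liu describes, here is a minimal sketch that slices a recording into five-millisecond strips and times the silent gaps between voiced stretches; the energy threshold and framing are assumptions for illustration, not Sonde Health's models.

```python
# A minimal sketch, NOT Sonde Health's system: chop audio into ~5 ms strips
# and measure silent gaps as a crude stand-in for pause-style vocal features.
import numpy as np
import librosa

def pause_features(wav_path, sr=16000, strip_ms=5, silence_db=-40.0):
    y, sr = librosa.load(wav_path, sr=sr)
    strip = int(sr * strip_ms / 1000)      # samples per 5 ms strip (80 at 16 kHz)
    n = len(y) // strip
    strips = y[: n * strip].reshape(n, strip)

    # Energy per strip, in dB relative to the loudest strip.
    energy = np.sqrt((strips ** 2).mean(axis=1))
    db = 20 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    voiced = db > silence_db               # crude speech/silence decision

    # Collect the length of every silent run between voiced stretches.
    pauses, run = [], 0
    for v in voiced:
        if v:
            if run:
                pauses.append(run * strip_ms)
            run = 0
        else:
            run += 1
    return {
        "num_pauses": len(pauses),
        "mean_pause_ms": float(np.mean(pauses)) if pauses else 0.0,
        "speech_fraction": float(voiced.mean()),
    }
```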
15:54
The idea is that tucked away in our voice,
15:57
AI can find hidden information.
15:59
The basic claim is that our voice is directly
16:02
connected to the brain,
16:03
and vocal biomarkers might allow
16:05
us to reverse engineer what's going on up
16:08
there.
16:08
We talk about vocal biomarkers
16:11
and Sonde being able to understand
16:14
changes in health through changes in voice. And
16:17
the reason why this is possible is,
16:19
and I'll go into the science of it a little bit, but
16:22
it is really based on physiology.
16:24
When symptoms of
16:27
a disease such as depression or anxiety
16:29
or any of these other mental health conditions
16:32
begin to have an effect
16:34
on the body,
16:35
and they do,
16:36
stemming from the brain, and there are a
16:38
hundred different body parts that come
16:41
together in your body,
16:43
in everyone's body to allow us to
16:45
speak. It's one of the most complicated
16:48
activities that human beings participate
16:50
in. There are literally thousands
16:53
of vocal features. I had no idea of this before
16:55
I came into the space either, but voice
16:58
is actually an incredibly rich
17:01
data source of interesting
17:04
acoustic features. And so his
17:06
and other companies argue they can objectively
17:08
monitor patients and pick up, for example,
17:11
if someone is becoming depressed.
17:13
It's a complex mixture of physical,
17:16
mental health, mental abilities
17:18
that come together. And so when
17:20
these symptoms of disease begin
17:23
to spread and begin to become of
17:25
a larger impact, they will impact
17:28
the actual physical aspects
17:31
and characteristics of your voice.
17:35
But it's not just healthcare companies using this
17:37
technology. What might it mean
17:39
to have it running in the background of our business
17:42
calls or during job interviews, where
17:44
voice analysis that may come with hiring
17:47
software could potentially flag
17:49
a job candidate as depressed? It's
17:52
complicated, and among the many questions
17:54
it raises,
17:55
would companies have an ethical obligation
17:57
to share these insights?
17:59
And so we asked Liam Kaufman,
17:59
the former CEO at
18:02
Winterlight Labs. We've sort
18:04
of steered clear of that type of use case because
18:08
there's a lot of thorny ethical issues, which
18:10
is like, let's say you have an Alexa running in the background. It
18:13
might be measuring your speech, but it
18:15
also might be measuring your spouse's speech or
18:18
the delivery person's speech or the TV. So
18:20
there's a lot of different things that
18:22
could happen in the background. And
18:24
then there's also a lot of context that's missing, like
18:26
is this person responding to
18:28
something that they're angry with or they're responding
18:31
to something that they're happy with? And so
18:33
in the short term, it's much easier to do
18:36
more active assessments. And
18:38
so you can kind of control the subject matter and what they're talking
18:41
about. So technically, it's
18:43
easier to do, and ethically it
18:45
avoids some of those challenges of diagnosing people
18:48
that don't want to be diagnosed or don't even know that they're
18:50
being listened to, which are generally pretty
18:52
creepy, and we want to kind of
18:54
avoid at this point. But
18:56
one question we haven't asked yet is whether
18:59
these biomarkers really contain this hidden
19:01
information about our health. We trust
19:03
a blood pressure reading as a biomarker of
19:05
physical health, but how do we know whether
19:07
we should trust vocal biomarkers as an
19:10
accurate reflection of disease?
19:12
We asked one of the world's leading AI researchers,
19:15
Margaret Mitchell. She founded Google's Ethical
19:17
AI Group and is a pioneer in the field
19:20
of machine learning with close to 100 papers. She
19:23
doesn't have commercial ties to vocal biomarkers now, but
19:26
she's worked on them in the past at Oregon
19:28
Health and Science University.
19:30
So in particular, we were looking at their
19:32
speech streams to see if
19:34
we could do detection of mild
19:37
cognitive impairment, which is a precursor
19:39
to Alzheimer's, Parkinson's.
19:42
So that's on the older end. And then
19:44
on the younger end, we were looking at autism
19:48
and apraxia. So with
19:50
all of the above, there was a question
19:52
of what sort of signals can we pull out.
19:55
With autism, prosody was a
19:57
big one.
19:58
So prosody is sort of like... the musical
20:01
side of language. So
20:04
I can say something like this, or
20:06
I could say something like this.
20:09
The latter has much more of an intonation
20:11
contour going up and down,
20:14
right? And so, for example, people
20:16
who are depressed tend to have more
20:18
monotone, a little bit flatter intonation
20:21
contours, right, than people
20:23
who are not depressed. So we were looking at that
20:25
kind of thing for autism. And
20:27
then for Parkinson's, we were looking at
20:29
a few different kind of things,
20:32
including pause behavior.
20:35
So one aspect of the
20:37
speech stream is the pauses,
20:40
the silence between different phrases.
20:43
You can start to pull out some
20:45
things that are roughly predictive of some
20:50
sort of neurological statuses. We
20:52
didn't have a ton of luck with Parkinson's at
20:54
the time. We did have some
20:56
luck with mild cognitive impairment and
20:58
with autism.
21:00
And she says her team got the best results
21:02
when they combined both sound and words.
21:05
The project that I worked on the most
21:09
that showed some nice results was
21:12
for mild cognitive impairment. And
21:14
we found that we
21:16
were able to make
21:18
some reasonable predictions of
21:21
mild cognitive impairment when we used speech
21:24
signal as well as language
21:27
signal. So it wasn't
21:29
just the audio. It was
21:31
also what they were saying.
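A minimal sketch of that fused approach, assuming scikit-learn; the inputs and feature choices are hypothetical stand-ins, not the team's actual system.

```python
# A minimal sketch: combine the speech signal (acoustic stats) with the
# language signal (the words themselves) in a single classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_fused_model(acoustic_features, transcripts, labels):
    """acoustic_features: (n_samples, n_features) array, e.g. pitch/pause stats.
    transcripts: list of n_samples strings (what each person said).
    labels: hypothetical clinician-assigned labels, 1 = impaired, 0 = not."""
    vectorizer = TfidfVectorizer(max_features=500)
    text_matrix = vectorizer.fit_transform(transcripts).toarray()
    # Fuse both signals into one feature matrix.
    X = np.hstack([np.asarray(acoustic_features, dtype=float), text_matrix])
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return model, vectorizer
```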
21:33
The goal is to make health care more accessible
21:36
and less expensive. So if
21:38
you're able to make some
21:40
predictions based on a speech stream, you
21:42
can do a few things.
21:44
One is pre-screening. And
21:46
in pre-screening, you would call a phone
21:48
number, you would take a battery of tests.
21:50
And then based on the pre-screening results,
21:53
it would say whether or not you should
21:55
go speak to a physician. So
21:57
that just initial sort of pre-screening is
22:00
a really useful thing to be able to
22:02
do and keep down costs. And
22:04
then once someone already has a diagnosis,
22:07
it's really useful for them to be able to just
22:09
at home do retellings,
22:12
do these different kinds of things without having to
22:14
go in and be especially diagnosed
22:17
and then have those readings or signals
22:19
sent back to the clinician.
22:22
So monitoring a patient's voice with
22:24
a tool like this could help detect if
22:26
a patient's depression is getting better
22:28
or worse. But she cautions,
22:31
these predictions aren't precise. This
22:33
has to be tempered by the fact that
22:36
accuracy and other sort of evaluation
22:38
metrics are virtually never 100 percent.
22:42
And there's a lot of additional factors that
22:44
come into play that affect accuracy, such
22:46
as what kind of phone is being used, what sort
22:48
of audio recording device, all this. So
22:51
in a world where everything worked perfectly,
22:53
that
22:54
is the goal. In reality,
22:56
we'll probably never get to a place where we work perfectly.
22:59
And so we'll
23:00
probably be in a space
23:02
where a
23:03
system could automatically give you
23:05
a preliminary reading, you
23:07
know, based on my faulty ability
23:10
to make predictions, here is my preliminary
23:12
reading of you. You know, it's kind
23:14
of like when you take a pregnancy test:
23:17
you first take the at-home ones
23:20
and then eventually you're like, OK, I guess I'll go see a doctor.
23:22
It's sort of that sort of thing where
23:24
you try and be very clear that there's false positives,
23:26
false negatives, similar with Covid tests,
23:29
I suppose. You know, you try and make it as clear
23:31
as possible that it won't always work, but at least
23:33
it's better than nothing and can
23:35
be a signal that
23:36
then you go see a professional.
23:39
It's why one signal is not enough for
23:41
a diagnosis, because what if someone
23:44
had a bad day and speaks in a monotone
23:46
voice? And this isn't unique. It's
23:48
also the case with other biomarkers, too.
23:51
An elevated heart rate might mean there's a problem
23:54
or maybe the patient was just running
23:56
late. And it's also critical that
23:58
companies building these tools
23:59
make sure the AI doesn't predict
24:02
on the wrong thing. Disaggregation.
24:05
This is the word to know. Disaggregation.
24:08
In
24:08
disaggregation, you take
24:10
all of these variables and
24:13
you test with respect to each one of them.
24:16
So in general experimental
24:18
design, there are independent variables
24:20
and dependent variables. I think
24:22
to your point, in machine learning, people often
24:25
don't pay attention to, you know, experimental
24:28
design that has been really well defined for years.
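A minimal sketch of what disaggregated testing can look like, assuming pandas; the grouping columns are hypothetical.

```python
# A minimal sketch of disaggregated evaluation: instead of one overall
# accuracy number, report accuracy separately for each value of each
# variable you can disaggregate over.
import pandas as pd

def disaggregated_report(df, group_cols=("gender", "age_band", "phone_model")):
    """df has a boolean 'correct' column (prediction == label) plus one
    column per variable to disaggregate over; column names are hypothetical."""
    report = {"overall": df["correct"].mean()}
    for col in group_cols:
        # A large accuracy gap between groups flags that the model may be
        # predicting on the wrong thing for some of them.
        report[col] = df.groupby(col)["correct"].mean().to_dict()
    return report
```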
24:31
She also points out another problem. This
24:33
technology could easily be misused.
24:35
So if it's possible
24:38
for a potential employer interviewing
24:40
you
24:41
to automatically get some
24:43
readout that says, oh, this person might struggle
24:45
with depression, then that's a mechanism
24:48
for them to discriminate against you and
24:50
choose not to hire you. I've done
24:52
work on predicting depression
24:55
and I intentionally did not
24:57
include examples of the kinds of words
24:59
that were used. Because then you have
25:01
armchair, you know, you have armchair
25:04
clinicians like, oh, well, they're using the
25:06
word ibuprofen a lot. And I heard
25:08
that ibuprofen was a signal of depression.
25:11
So
25:12
now I'm going to be biased against this person in that way. I
25:15
mean, even talking about this, it's a little bit worrying, but,
25:17
you know, it is important for people to understand what's happening.
25:20
I can give this really funny example
25:22
that working on this speech stuff, one
25:25
of my friends, colleagues went to a talk
25:27
about depression and signatures of depression. The
25:30
colleague came back and as they came back,
25:33
I was very frustrated with something
25:35
and I went, [sighs]
25:36
and then she went, are you depressed? Depressed
25:39
people sigh a lot. I heard, you
25:41
know what I mean? And so like even
25:43
giving this example, now her impression
25:45
of the world is fundamentally altered,
25:47
right? Now she sees depression in people that she wouldn't
25:50
have otherwise seen. So it's so critical
25:52
to make sure that these discussions
25:54
are really couched in all the caveats
25:57
and that they are only released to people
25:59
who can really work through these nuances
26:02
and understand them. Otherwise,
26:04
all of these kinds of findings are going to be a mechanism
26:06
for discrimination. If you have
26:09
depression, you might use monotone.
26:12
If you're using monotone, that doesn't mean you're depressed.
26:15
And so understanding the sort of causality
26:17
flow there, if this, then that,
26:20
is really critical and something that a lot of people
26:22
mess up. And then particularly
26:24
if it ends up influencing people
26:26
and people's impressions in situations
26:29
like hiring, then it's a very
26:31
serious concern. This
26:34
makes it hard for researchers to share their findings.
26:37
Plus, there are lots of concerns about privacy
26:39
with this data. And so we turned
26:41
to Bjorn Schuller, a professor of artificial
26:44
intelligence at Imperial
26:45
College London, and one of the world's
26:47
leading experts in vocal biomarkers. We
26:49
asked him whether using this tech in a healthcare
26:52
setting is even a good idea. And
26:55
he said yes. Vocal biomarkers
26:58
can be useful where doctors have already
27:00
been listening with the stethoscope, mainly
27:02
for illnesses involving the lungs, since
27:04
these diseases cause an audible change
27:07
in the way we breathe, speak, or cough.
27:10
And he says this tech can go a step further
27:12
than traditional doctors can, not
27:14
only hearing a different cough, but for example,
27:17
if someone develops throat and neck cancer, the
27:19
disease changes the voice in a particular
27:21
way that can be picked up.
27:23
It came from, let's say, where
27:25
people have been listening to, and
27:27
moved more and more into, okay, let's think
27:29
about what would actually make a change
27:32
to your voice. It's your cognition for
27:34
speech production. It's your
27:36
physiology. It's your motor system.
27:38
If that is somehow affected, we should be
27:40
able to hear it.
27:42
But then he said something that gave us pause, that
27:45
the tech can already pretty accurately
27:47
predict a person's height from their voice,
27:49
and even their heart rate.
27:51
And one day, he thinks it'll be used
27:54
to spot things about infants even before
27:56
their own parents can, like that
27:58
it could help diagnose autism and other
28:00
things very early on, simply
28:02
from a baby's cries.
28:27
He believes vocal biomarkers could also make
28:29
a huge difference in countries where medical care
28:31
is hard to access. So in the
28:33
future, people could call a number,
28:36
leave a voice message, and the computer
28:38
could tell them if they have dementia, throat
28:40
cancer, or other illnesses.
28:42
You would have the luxury of
28:44
it being very cheap, very accessible, if you just
28:46
have to call a call center
28:48
and it can give you feedback. This would
28:50
mean that we can, in less
28:53
connected countries, easily provide
28:55
such health services, via
28:57
phone.
28:58
These ideas are associated with deep neural
29:00
networks, a form of machine learning where
29:02
the computer trains itself to find patterns.
29:05
In the past, scientists needed to make
29:08
assumptions about what they could find in a voice,
29:10
and then figure out where the signal might
29:12
be.
29:13
Is it in the pitch,
29:15
or is it in the loudness?
29:17
These days, the computer gets training data
29:19
that's already labeled, so whether
29:22
the person does or doesn't have a disease
29:24
is marked.
29:25
And then a system looks for patterns in
29:27
that audio.
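A minimal sketch of that recipe, with random placeholder data standing in for real labeled recordings; nothing here is any company's actual model.

```python
# A minimal sketch: hand a neural network labeled examples and let it find
# its own patterns, rather than deciding up front whether the signal lives
# in the pitch or the loudness.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 64))        # placeholder: one feature row per recording
y = rng.integers(0, 2, 200)      # placeholder: 1 = has condition, 0 = does not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)              # the network decides which patterns matter
print("held-out accuracy:", net.score(X_te, y_te))
```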
29:28
And this gets tricky because the
29:30
machine might find patterns we can't hear,
29:33
or it might find patterns that have nothing
29:36
to do with what it's trying to find.
29:38
And he gives an example of this from his own
29:40
work.
29:41
His team was trying to find the difference in
29:43
how people explain smelling something good
29:46
and smelling something awful.
29:48
So it's a pleasant or unpleasant smell, but
29:50
the machine that was inducing the smell made bubble
29:53
sounds, which were slightly different. And
29:56
we then took the pauses between the speech and
29:58
recognized that the AI actually
29:59
picks up the bubbling sounds of different
30:02
things poured into the liquid to produce the smell.
30:05
So we have to be very careful that we assure
30:07
what we're actually recognizing
30:09
is coming from the voice and not from any background
30:12
context.
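One way to probe for that kind of confound, sketched here with hypothetical names: score the model on matched clips with the voice stripped out, and treat above-chance performance on background audio alone as a red flag.

```python
# A minimal sketch of a confound check in the spirit of Schuller's warning;
# every name here is hypothetical. If the model still scores well on
# background-only clips, it has learned the context (like the bubbling
# sounds) rather than anything about the speaker.
def confound_check(model, featurize, voice_clips, background_clips, labels):
    voice_acc = model.score([featurize(c) for c in voice_clips], labels)
    background_acc = model.score([featurize(c) for c in background_clips], labels)
    # Background accuracy far above chance (0.5 for two classes) means the
    # signal is not actually coming from the voice.
    return {"voice_accuracy": voice_acc, "background_accuracy": background_acc}
```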
30:13
In other words, the pattern the software found
30:15
wasn't the difference in how people reacted.
30:18
It's that the machine made different sounds when
30:20
releasing different fragrances. Similar
30:22
problems happened with trying to diagnose COVID
30:25
using software to analyze coughs, and
30:27
Schuller says he's worried that without extensive
30:30
testing, some tools won't work as advertised.
30:33
And frankly, we're sometimes
30:35
right. Yeah, I
30:37
wouldn't want to say worried, but I'm
30:40
clear about the real performance
30:42
of some commercial products. That's
30:45
generally true, of course, in machine learning. There's
30:47
a lot of people not fully revealing
30:49
their test methods and so on.
30:52
Something else, as we consider whether
30:55
to use these markers to test for things like
30:57
drug or alcohol impairment, voice
30:59
generators already exist that can change
31:01
our voices in real time, making
31:04
them sound happier or monotone, or
31:06
like an entirely different person. And
31:09
this could be used to sidestep these checks, like
31:11
faking a drug test.
31:13
So while players in the industry are quite confident
31:16
that vocal biomarkers exist and
31:18
can be used for a great number of applications,
31:21
how well they really work in practice remains
31:23
unclear.
31:29
This episode was reported by Hilke Schellman,
31:31
produced by me with Emma Cillekens and Anthony
31:33
Green. We're edited by Mat Honan and mixed
31:36
by Garret Lang, with original music by
31:38
Garret Lang and Jacob Gorski. Special
31:41
thanks to the Knight Science Journalism folks at MIT
31:43
for their support with this reporting. And thank
31:45
you for listening. I'm Jennifer Strong.
31:52
This is MIT Technology
31:54
Review.