When AI hears a problem

Released Wednesday, 17th May 2023

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements, may have changed.

0:00

Support for this episode comes from the Knight

0:02

Science Journalism Fellowship Program at

0:04

MIT.

0:07

This is MIT Technology

0:10

Review. We're

0:19

sorry. All of our representatives are still

0:21

assisting other customers. Please

0:23

remain on the line as we value your call.

0:26

We've all been there, and it's safe to say

0:29

you might even dread the experience. You're

0:31

already on a crusade to solve an issue,

0:34

then you have to go through a long phone tree,

0:37

and you might not be greeted by a human. This

0:39

call may be monitored or recorded for quality

0:42

assurance or training purposes. And

0:44

if you've wondered who really listens to these calls

0:47

and recordings, making sure agents

0:49

say the right things, it might not be

0:51

a person at all. These days,

0:53

AI solutions are being used to analyze

0:56

our voices in real time. And

0:58

this applies to some health care settings, as

1:00

well as these customer service phone trees. The

1:03

idea is that hidden away in our voices

1:06

are signals that hold clues to how we're

1:08

doing, what we're feeling, and

1:10

even what's going on with our physical health.

1:13

Wait, so I need to pay even more?

1:15

I hadn't expected that. This

1:17

is an example of a call being analyzed

1:20

by software from a company called Cogito. It's

1:23

supposed to find signals in a caller's voice

1:25

that help an agent be more effective

1:27

with their replies, like by telling

1:29

them to be more empathetic.

1:32

Yeah, I know it's a bit unexpected. I'm sorry

1:35

about that. It's just not currently offered as a standard

1:37

feature. I'm Jennifer Strong.

1:39

In this episode, we examine what happens

1:41

when algorithms analyze our voices,

1:44

looking for clues about our mental and physical

1:46

health. Let's

1:57

go. In Machines

1:59

We Trust. I'm

2:01

listening. A podcast about

2:03

the automation of everything. You have

2:05

reached your destination.

2:12

As someone who speaks for a living, it's strange

2:14

to think there might be all kinds of signals

2:16

and other data lurking beneath the surface

2:19

of the human voice. Not just in

2:21

what we say with words, but in the way we

2:23

sound while speaking. And

2:25

when we sing, we begin with...

2:27

Do re mi. Do

2:29

re mi fa sol la ti

2:32

do, so

2:32

do. Yep, keep the vowel

2:35

open one more time. Naaaa. Mmmmm.

2:41

Ooooo. And those signals

2:43

are increasingly being detected and analyzed

2:46

with AI. To provide clues about

2:48

who we are and even what we look

2:50

like. Or whether we have a medical

2:52

condition. Developers of these

2:54

products are going way beyond the hunt for

2:56

clues about people's emotional states. They're

2:59

looking for signs of all kinds of diseases, including

3:02

Parkinson's and Alzheimer's. So

3:05

from real-time analysis of our voices by

3:07

businesses providing things like customer

3:09

service, to healthcare applications and

3:12

more, just trying to understand

3:14

the overall scale of what companies and

3:16

researchers think they might be able to learn

3:18

from our voices can be pretty overwhelming.

3:22

And might even feel kind of dystopian

3:24

too. But it's also

3:26

possible this kind of tech might help

3:28

millions of people. In the US

3:30

alone, about one in six have been diagnosed

3:32

with depression,

3:33

and there aren't enough therapists to help. We're

3:37

joined now by my reporting partner on this project,

3:39

Hilke Schellman. She's an Emmy Award-winning

3:41

journalist writing a book about AI at work,

3:44

and she's been investigating this topic with us.

3:46

Hey, Hilke. Hey, Jen. What

3:49

was it that got you interested in this topic? You

3:52

know, I was working on my book, and I was reading

3:54

an article on how the surveillance of students

3:56

on university campuses has increased

3:58

dramatically during the pandemic.

3:59

And this article was mostly

4:02

about tracking students' locations and machines,

4:04

checking the temperatures. But somewhere

4:07

the author talked about a school called Menlo

4:09

College and how they started

4:11

using a tool to check the students' voices

4:13

for signs of depression. And that

4:16

really piqued my interest. I wanted to know more

4:18

so I reached out to a company called Ellipsis

4:20

Health that built the tool

4:22

and I also had the chance to talk to a couple of students

4:24

who used it and I wanted to know how did

4:27

it actually feel using the tool.

4:29

Okay so let's start with the college. It's

4:31

located in the heart of Silicon Valley and we

4:34

spoke to one of its students, Lina Lacoski

4:36

Torres.

4:37

I'm majoring in business management

4:40

and they have a concentration in entrepreneurship

4:42

and innovation so that's my concentration.

4:45

I found that that was best suited towards

4:48

my needs just in regards to innovation

4:50

I'm always interested in doing things against

4:52

the status quo so that's

4:55

my major.

4:56

When we spoke with her she was a 19 year old

4:58

junior. I think that it was

5:00

more the holistic version of

5:02

I didn't come in with the mindset

5:04

of oh I'm gonna go to Google

5:07

or I'm gonna go and make a lot of money, like

5:10

I'm more focused on social change

5:13

and how I can best use

5:15

my brain, my thinking skills.

5:17

I don't have a lot of like technical skills.

5:20

But she found it hard to escape these expectations.

5:23

Because it is really overwhelming, fresh

5:25

out of high school going okay now we're

5:28

in a situation where 21 year olds

5:30

have built unicorns.

5:32

What are you gonna do? And it's like I'm just

5:35

trying to figure out how to exist you

5:37

know as my own person. Then

5:39

the pandemic hit.

5:40

So I did my first semester

5:43

of my freshman year in person

5:46

living in the dorms

5:48

and then spring break on

5:51

my second semester freshman year then

5:53

we went online.

5:55

And she told us many of the students

5:57

struggled especially early on when

5:59

they had to rush back home. A lot

6:01

of things come up. You're with your family. Family

6:04

issues, you can't get away from it. You're with your

6:06

mind. You know, you don't realize how much

6:10

you miss your social interactions. When

6:12

the campus shut down, she had to return to her

6:14

mom's place in Las Vegas. And

6:17

relocating out of state

6:18

meant cutting the cord with her therapist, who

6:20

can't work outside of California. I

6:23

was in Nevada, so I was pretty

6:26

distraught. I'll be honest, I felt like I was breaking

6:28

up with my therapist. I go, OK,

6:30

I'm in Nevada, so I guess. Bye.

6:33

And it felt really abrupt, especially in the time

6:35

that you need it the most. She

6:38

wasn't the only one struggling to find help.

6:40

About half the students at Menlo College aren't

6:42

from California. And in the midst of an unfolding

6:45

global crisis, suddenly many

6:47

of them couldn't access the college's mental

6:49

health services.

6:51

Then a startup pitched the school on an AI

6:53

product that's meant to assess anxiety

6:55

and depression and help people navigate

6:57

those symptoms. The school agreed

7:00

to try it, and in late 2020, the

7:02

tool was rolled out free of charge to about 800

7:04

students. It asked

7:06

people to answer daily questions in a voice

7:08

message. How's everything going at home?

7:11

And then I go for 30 to 45 seconds, hey, you know, things

7:15

have been tough. I'm really stressed out.

7:17

I'm feeling overwhelmed. I'm feeling

7:20

like I'm being smothered,

7:21

et cetera. And then

7:23

it would switch over to the prompt of, how

7:25

are you feeling lately? Or

7:28

how is school going? You know, another prompt

7:30

to keep you going.

7:32

Every time someone used it, they got a score,

7:34

and the tool gave recommendations, from

7:37

breathing exercises to the number for

7:39

a crisis helpline. For Lacoski

7:41

Torres, she didn't find the exercises

7:43

too helpful. She mostly used the tool

7:45

to prove to her mom just how stressed out

7:47

she was by living back at home.

7:50

I was like, look, it's real. This

7:53

is how I feel. You

7:55

can see it. There's data right there.

7:58

So not so much the coping skills,

7:59

because I already have issues

8:02

implementing those to begin with. People say, mine,

8:05

meditation. It's not, I'm

8:07

freaking out right now. If I could breathe

8:10

and chill, trust me, I definitely

8:12

would, you know?

8:13

But she says she found it helpful in

8:16

other ways. What if we didn't have this

8:18

at all? It wouldn't have brought up mental health in

8:20

the way that it did at such a large

8:22

scale and in such a way that,

8:25

I

8:25

don't know, I thought it was pretty cool. There's a lot

8:27

of people that thought it was pretty cool.

8:30

Though it did raise some questions about privacy.

8:32

Where's my talking going to? Somebody

8:34

gonna hear it. But it's really

8:37

just the computer-based system and

8:39

the AI system. And then, I guess,

8:42

in its infancy stages, possibly somebody

8:46

who works on it. But I mean, with the utmost

8:48

security, I'm pretty sure they didn't care

8:51

what you're really out to say. I think that's a

8:53

really big thing with AI too and everybody's

8:55

data. You go, okay, what's it gonna

8:57

be used for? It's my data. And it's

9:00

like, you're not a super spy.

9:02

Privacy also came up in discussions between

9:05

students, college representatives, and

9:07

the product's maker, Ellipsis Health. But

9:09

she says it was mostly about whether school therapists

9:12

would get access to that data.

9:14

And students said that wasn't appropriate.

9:16

Though

9:16

the privacy of their voice data and

9:18

what might happen to that was less of an issue.

9:21

Privacy doesn't really exist anymore.

9:23

And if you feel some type of way, you're gonna go onto

9:26

your social media and put it on blast.

9:28

You're gonna tell your friends, da-da-da-da. It's

9:30

not something you really wanna keep to yourself.

9:32

By the time we spoke, she didn't have access

9:34

to the tool anymore because the pilot program

9:37

had ended. But for the school, it jump-started

9:40

a broader conversation and a rethinking

9:42

of the way it delivers mental health services.

9:45

I think they have a greater faith in technology

9:47

than perhaps older generations do.

9:50

This is Angela Schmida, a vice president

9:52

at Menlo College. If I've learned

9:55

anything about students and their mental

9:57

health over the last two years, it's that there's

9:59

not a one-size-fits-all solution.

10:02

And so you can offer face-to-face counseling

10:05

and some students just won't take advantage of that.

10:08

But if you can offer something

10:10

where students are accessing it on their own

10:12

and they can do it in real time

10:14

and

10:15

there's not a barrier associated with

10:17

it, then some students will access that.

10:20

And she says she tried the tool herself. There

10:23

is something therapeutic about just talking.

10:25

I did find that to be the case and I

10:27

think for me it's a little bit awkward because you're

10:29

just sitting there and you're talking and you're not talking

10:31

to anyone. So it's a little bit like talking to yourself.

10:34

And you do wonder, you know, who's really going

10:36

to listen to this? So I know that the students

10:39

had concerns about that as well.

10:41

But with the demand for mental health services

10:44

greatly outstripping supply all over

10:46

the world, there's a lot of interest in

10:48

finding tech that might help.

10:50

And it's something that reporter Hilke Schellman

10:53

is looking at. I mean, there are not

10:55

enough therapists to help everyone. So

10:57

vocal biomarkers, they could be revolutionary

11:00

and it could help a lot of people, maybe

11:03

even millions of people, billions

11:05

of people, or at least that's

11:08

the hope. And so we

11:10

reached out to four startups to find out

11:12

how their technologies work and to

11:14

really understand what these tools

11:17

do. The first is called Kintsugi.

11:20

It's a startup with a test that's meant to find mental

11:22

distress from just a 20-second voice

11:24

recording. And it's supposed to work

11:26

regardless of what language someone is speaking.

11:29

Grace Chang is the company's CEO and she

11:32

told us these AI tools essentially

11:34

do what our parents and many therapists

11:36

have done for a long time.

11:38

When we talk to our friends, our family

11:40

members, it's almost obvious

11:42

for those who are close to us when

11:45

they speak in a lower voice or

11:47

if they speak in a slower manner that

11:49

something might be wrong with them. And

11:51

we have the luxury of knowing this person's

11:54

set of patterns to be able to determine

11:56

that there may be something

11:59

that is... different than

12:01

how this person normally speaks. And

12:04

so what is really remarkable is

12:07

that psychiatrists have known

12:09

that in this area of speech

12:12

there has always been a tie to

12:14

depression and anxiety.

12:16

She believes they can replicate this intuitive

12:18

speech analysis over a short period,

12:21

do it at scale, and that it can help therapists

12:23

understand how their patients are doing in between

12:26

appointments.

12:27

Our company has moved towards a

12:29

position of being able to create

12:32

a robust set of models, not

12:35

looking at what people are saying

12:38

but how they are saying

12:40

it. Our models end up being

12:42

language agnostic. So

12:44

we have people that are able to speak

12:46

in French and in Japanese

12:49

and English or otherwise. But

12:51

really we are just looking for those

12:54

biomarkers that are most predictive

12:56

for depression and anxiety. But

12:59

what is really fascinating about machine

13:01

learning is that we don't have

13:03

just the few examples of a psychiatrist

13:06

working with maybe hundreds of patients

13:09

across his or her career. Instead

13:12

we have tens of thousands of examples

13:14

of individuals who

13:17

have designated depression

13:19

or anxiety as examples for

13:22

machines to learn from.

13:24

She also says cultural influences

13:26

on language don't matter as much as we

13:28

might think. We don't care

13:31

about any of the demographic

13:33

information or the context of what's

13:35

happening

13:37

because we are looking at how people

13:39

are speaking, these spectral and prosodic

13:42

features of like how fast

13:45

or how loud or these

13:47

sort of, if you can see on a visual

13:49

spectrogram of how people

13:51

are speaking, there are some nuances

13:54

to speech that machines are able

13:56

to pick up.
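
To make that concrete, here is a minimal Python sketch, not Kintsugi's actual pipeline, of the kind of prosodic and spectral measurements described above, using the open-source librosa library. The file path, frequency range, and choice of features are placeholder assumptions.

import librosa
import numpy as np

def prosodic_spectral_features(path):
    # Load a short mono voice recording (hypothetical path), resampled to 16 kHz
    y, sr = librosa.load(path, sr=16000)
    # Prosodic: pitch contour via the pYIN estimator (NaN for unvoiced frames)
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
    # Prosodic: frame-level loudness (RMS energy)
    rms = librosa.feature.rms(y=y)[0]
    # Spectral: where the energy sits in the spectrum
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_variability_hz": float(np.nanstd(f0)),  # low values = flatter, more monotone speech
        "loudness_mean": float(rms.mean()),
        "loudness_variability": float(rms.std()),
        "spectral_centroid_mean_hz": float(centroid.mean()),
    }

A real system would compute many more such features and feed them to a trained model; the point here is only the "how it's said, not what's said" framing.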

13:59

What we do is we

14:02

analyze human voice and

14:04

deliver insight, health

14:07

insight, from that analysis.

14:09

And this is David Liu, the CEO

14:11

of Sonde Health, a company that's regarded

14:14

as one of the frontrunners in the space. Its

14:17

health monitoring products look for all

14:19

sorts of things, from signs of cognitive

14:21

or motor impairment to asthma,

14:24

drug use. And we asked him for some

14:26

concrete examples of how it all works.

14:28

Think of us

14:29

as a health insight company,

14:32

and we use voice as the window to

14:35

deliver that insight. The technology

14:38

is a detection and monitoring technology,

14:40

right? What it does is it takes in six

14:43

to 30 seconds of human

14:45

voice, and from there, our

14:48

algorithms and models, which have been

14:51

trained on tens of thousands of people,

14:53

both in the US and in Asia, then

14:56

are analyzing, I

14:58

call it the atomic level of your voice.

15:01

We take and look at five

15:03

millisecond strips of your 30

15:05

second or 12 second voice sample, and

15:09

then within there, we're analyzing

15:11

those vocal features that do

15:14

and that we have identified as being

15:17

relevant

15:18

in understanding your particular

15:20

condition. And pauses are something

15:22

that almost all companies are looking into.

15:24

What we look at is the

15:26

time difference between when

15:29

air is being pushed out of your mouth to

15:31

when sound and voice is

15:33

being detected. And so that time

15:35

period, which is

15:37

quite short, we can measure that.

15:39

His system also records the smoothness

15:41

of a speaker's voice, control of vocal

15:44

muscles, energy, clarity and

15:46

the speech rate. And by the way, some

15:48

of these features can be heard

15:50

by the human ear and discerned. Most

15:53

cannot.
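
As a rough illustration, and not Sonde Health's algorithm, the sketch below slices a voice sample into five-millisecond frames and measures simple pause behaviour; the silence threshold is an arbitrary assumption for the example.

import numpy as np

def pause_features(samples, sr, frame_ms=5, silence_db=-40.0):
    # Cut the waveform into short frames, echoing the "five millisecond strips" above
    frame_len = max(1, int(sr * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len).astype(float)
    # Frame energy in dB relative to the loudest frame
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    db = 20 * np.log10(rms / rms.max())
    silent = db < silence_db
    # Track the longest run of consecutive silent frames
    longest = current = 0
    for is_silent in silent:
        current = current + 1 if is_silent else 0
        longest = max(longest, current)
    return {
        "silence_ratio": float(silent.mean()),  # share of the clip spent pausing
        "longest_pause_ms": longest * frame_ms,  # longest hesitation, in milliseconds
    }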

15:54

The idea is that tucked away in our voice,

15:57

AI can find hidden information.

15:59

The basic claim is that our voice is directly

16:02

connected to the brain,

16:03

and vocal biomarkers might allow

16:05

us to reverse engineer what's going on up

16:08

there.

16:08

We talk about vocal biomarkers

16:11

and Sonde being able to understand

16:14

changes in health through changes in voice. And

16:17

the reason why this is possible is,

16:19

and I'll go into the science of it a little bit, but

16:22

it is really based on physiology.

16:24

When symptoms of

16:27

a disease such as depression or anxiety

16:29

or any of these other mental health conditions

16:32

begin to have an effect

16:34

on the body,

16:35

and they do,

16:36

stemming from the brain, and there are a

16:38

hundred different body parts that come

16:41

together in your body,

16:43

in everyone's body to allow us to

16:45

speak. It's one of the most complicated

16:48

activities that human beings participate

16:50

in. There are literally thousands

16:53

of vocal features. I had no idea of this before

16:55

I came into the space either, but voice

16:58

is actually an incredibly rich

17:01

data source of interesting

17:04

acoustic features. And so his

17:06

and other companies argue they can objectively

17:08

monitor patients and pick up, for example,

17:11

if someone is becoming depressed.

17:13

It's a complex mixture of physical,

17:16

mental health, mental abilities

17:18

that come together. And so when

17:20

these symptoms of disease begin

17:23

to spread and begin to become of

17:25

a larger impact, they will impact

17:28

the actual physical aspects

17:31

and characteristics of your voice.

17:35

But it's not just healthcare companies using this

17:37

technology. What might it mean

17:39

to have it running in the background of our business

17:42

calls or during job interviews, where

17:44

voice analysis that may come with hiring

17:47

software could potentially flag

17:49

a job candidate as depressed? It's

17:52

complicated, and among the many questions

17:54

it raises,

17:55

would companies have an ethical obligation

17:57

to share these insights?

17:59

And so we asked Liam,

17:59

the former CEO at

18:02

Winterlight Labs. We've sort

18:04

of steered clear of that type of use cases because

18:08

there's a lot of thorny ethical issues, which

18:10

is like, let's say you have an Alexa running in the background. It

18:13

might be measuring your speech, but it

18:15

also might be measuring your spouse's speech or

18:18

the delivery person's speech or the TV. So

18:20

there's a lot of different things that

18:22

could happen in the background. And

18:24

then there's also a lot of context that's missing, like

18:26

is this person responding to

18:28

something that they're angry with or they're responding

18:31

to something that they're happy with? And so

18:33

in the short term, it's much easier to do

18:36

more active assessments. And

18:38

so you can kind of control the subject matter and what they're talking

18:41

about. So technically, it's

18:43

easier to do, and ethically it

18:45

avoids some of those challenges of diagnosing people

18:48

that don't want to be diagnosed or don't even know that they're

18:50

being listened to, which are generally pretty

18:52

creepy, and we want to kind of

18:54

avoid at this point. But

18:56

one question we haven't asked yet is whether

18:59

these biomarkers really contain this hidden

19:01

information about our health. We trust

19:03

a blood pressure reading as a biomarker of

19:05

physical health, but how do we know whether

19:07

we should trust vocal biomarkers as an

19:10

accurate reflection of disease?

19:12

We asked one of the world's leading AI researchers,

19:15

Margaret Mitchell. She founded Google's Ethical

19:17

AI Group and is a pioneer in the field

19:20

of machine learning with close to 100 papers. She

19:23

doesn't have commercial ties to vocal biomarkers now, but

19:26

she's worked on them in the past at Oregon

19:28

Health and Science University.

19:30

So in particular, we were looking at their

19:32

speech streams to see if

19:34

we could do detection of mild

19:37

cognitive impairment, which is a precursor

19:39

to Alzheimer's, Parkinson's.

19:42

So that's on the older end. And then

19:44

on the younger end, we were looking at autism

19:48

and apraxia. So with

19:50

all of the above, there was a question

19:52

of what sort of signals can we pull out.

19:55

With autism, prosody was a

19:57

big one.

19:58

So prosody is sort of like... the musical

20:01

side of language. So

20:04

I can say something like this, or

20:06

I could say something like this.

20:09

The latter has much more of an intonation

20:11

contour going up and down,

20:14

right? And so, for example, people

20:16

who are depressed tend to have more

20:18

monotone, a little bit flatter intonation

20:21

contours, right, than people

20:23

who are not depressed. So we were looking at that

20:25

kind of thing for autism. And

20:27

then for Parkinson's, we were looking at

20:29

a few different kind of things,

20:32

including pause behavior.

20:35

So one aspect of the

20:37

speech stream is the pauses,

20:40

the silence between different phrases.

20:43

You can start to pull out some

20:45

things that are roughly predictive of some

20:50

sort of neurological statuses. We

20:52

didn't have a ton of luck with Parkinson's at

20:54

the time. We did have some

20:56

luck with mild cognitive impairment and

20:58

with autism.

21:00

And she says her team got the best results

21:02

when they combined both sound and words.

21:05

The project that I worked on the most

21:09

that showed some nice results was

21:12

for mild cognitive impairment. And

21:14

we found that we

21:16

were able to make

21:18

some reasonable predictions of

21:21

mild cognitive impairment when we used speech

21:24

signal as well as language

21:27

signal. So it wasn't

21:29

just the audio. It was

21:31

also what they were saying.
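
Here is a toy sketch of that "speech signal plus language signal" idea: acoustic measurements and bag-of-words text features concatenated into a single input for a classifier. The example transcripts, feature values, and labels are invented for illustration and are not from the study.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

transcripts = ["I went to the the store and, um, and...",
               "We had a lovely walk along the river this morning."]
acoustic = np.array([[0.42, 900.0],   # e.g. silence ratio, longest pause in ms
                     [0.18, 250.0]])
labels = np.array([1, 0])             # 1 = impairment in this made-up data

text_features = TfidfVectorizer().fit_transform(transcripts).toarray()
combined = np.hstack([acoustic, text_features])   # how it was said + what was said
model = LogisticRegression().fit(combined, labels)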

21:33

The goal is to make health care more accessible

21:36

and less expensive. So if

21:38

you're able to make some

21:40

predictions based on a speech stream, you

21:42

can do a few things.

21:44

One is pre-screening. And

21:46

in pre-screening, you would call a phone

21:48

number, you would take a battery of tests.

21:50

And then based on the pre-screening results,

21:53

it would say whether or not you should

21:55

go speak to a physician. So

21:57

that just initial sort of pre-screening is

22:00

a really useful thing to be able to

22:02

do and keep down costs. And

22:04

then once someone already has a diagnosis,

22:07

it's really useful for them to be able to just

22:09

at home do retellings,

22:12

do these different kinds of things without having to

22:14

go in and be especially diagnosed

22:17

and then have those readings or signals

22:19

sent back to the clinician.

22:22

So monitoring a patient's voice with

22:24

a tool like this could help detect if

22:26

a patient's depression is getting better

22:28

or worse. But she cautions,

22:31

these predictions aren't precise. This

22:33

has to be tempered by the fact that

22:36

accuracy and other sort of evaluation

22:38

metrics are virtually never 100 percent.

22:42

And there's a lot of additional factors that

22:44

come into play that affect accuracy, such

22:46

as what kind of phone is being used, what sort

22:48

of audio recording device, all this. So

22:51

in a world where everything worked perfectly,

22:53

that

22:54

is the goal. In reality,

22:56

we'll probably never get to a place where we work perfectly.

22:59

And so we'll

23:00

probably be in a space

23:02

where a

23:03

system could automatically give you

23:05

a preliminary reading, you

23:07

know, based on my faulty ability

23:10

to make predictions, here is my preliminary

23:12

reading of you. You know, it's kind

23:14

of like when you take a pregnancy test, you like

23:17

you first take the ones that are at home ones

23:20

and then eventually you're like, OK, I guess I'll go see a doctor.

23:22

It's sort of that sort of thing where

23:24

you try and be very clear that there's false positives,

23:26

false negatives, similar with Covid tests,

23:29

I suppose. You know, you try and make it as clear

23:31

as possible that it won't always work, but at least

23:33

it's better than nothing and can

23:35

be a signal that

23:36

then you go see a professional.

23:39

It's why one signal is not enough for

23:41

a diagnosis, because what if someone

23:44

had a bad day and speaks in a monotone

23:46

voice? And this isn't unique. It's

23:48

also the case with other biomarkers, too.

23:51

An elevated heart rate might mean there's a problem

23:54

or maybe the patient was just running

23:56

late. And it's also critical that

23:58

companies building these tools

23:59

make sure the AI doesn't predict

24:02

on the wrong thing. Disaggregation.

24:05

This is the word to know. Disaggregation.

24:08

In

24:08

disaggregation, you take

24:10

all of these variables and

24:13

you test with respect to each one of them.

24:16

So in general experimental

24:18

design, there are independent variables

24:20

and dependent variables. I think

24:22

to your point, in machine learning, people often

24:25

don't pay attention to, you know, experimental

24:28

design that has been really well defined for years.
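
In code, disaggregation can be as simple as computing the same metric separately for each value of a variable; the column names below are assumptions for illustration.

import pandas as pd

def disaggregated_accuracy(results: pd.DataFrame, variable: str) -> pd.Series:
    # results holds one row per test recording with columns
    # "y_true", "y_pred", and one column per variable to slice on
    correct = results["y_true"] == results["y_pred"]
    return correct.groupby(results[variable]).mean()

# e.g. disaggregated_accuracy(df, "phone_model") or disaggregated_accuracy(df, "language")
# reveals whether the model only works well for some devices or speakers.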

24:31

She also points out another problem. This

24:33

technology could easily be misused.

24:35

So if it's possible

24:38

for a potential employer interviewing

24:40

you

24:41

to automatically get some

24:43

readout that says, oh, this person might struggle

24:45

with depression, then that's a mechanism

24:48

for them to discriminate against you and

24:50

choose not to hire you. I've done

24:52

work on predicting depression

24:55

and I intentionally did not

24:57

include examples of the kinds of words

24:59

that were used. Because then you have

25:01

armchair, you know, you have armchair

25:04

clinicians like, oh, well, they're using the

25:06

word ibuprofen a lot. And I heard

25:08

that ibuprofen was a signal of depression.

25:11

So

25:12

now I'm going to be biased against this person in that way. I

25:15

mean, even talking about this, it's a little bit worrying, but,

25:17

you know, it is important for people to understand what's happening.

25:20

I can give this really funny example

25:22

that working on this speech stuff, one

25:25

of my friends, colleagues went to a talk

25:27

about depression and signatures of depression. The

25:30

colleague came back and as they came back,

25:33

I was very frustrated with something

25:35

and I went,

25:36

and then she went, are you depressed? Depressed

25:39

people sigh a lot. I heard, you

25:41

know what I mean? And so like even

25:43

giving this example, now her impression

25:45

of the world is fundamentally altered,

25:47

right? Now she sees depression in people that she wouldn't

25:50

have otherwise seen. So it's so critical

25:52

to make sure that these discussions

25:54

are really couched in all the caveats

25:57

and that they are only released to people

25:59

who can really work through these nuances

26:02

and understand them. Otherwise,

26:04

all of these kinds of findings are going to be a mechanism

26:06

for discrimination. If you have

26:09

depression, you might use monotone.

26:12

If you're using monotone, that doesn't mean you're depressed.

26:15

And so understanding the sort of causality

26:17

flow there, if this, then that,

26:20

is really critical and something that a lot of people

26:22

mess up. And then particularly

26:24

if it ends up influencing people

26:26

and people's impressions in situations

26:29

like hiring, then it's a very

26:31

serious concern. This

26:34

makes it hard for researchers to share their findings.

26:37

Plus, there are lots of concerns about privacy

26:39

with this data. And so we turned

26:41

to Bjorn Schuller, a professor of artificial

26:44

intelligence at Imperial

26:45

College London, and one of the world's

26:47

leading experts in vocal biomarkers. We

26:49

asked him whether using this tech in a healthcare

26:52

setting is even a good idea. And

26:55

he said yes. Vocal biomarkers

26:58

can be useful where doctors have already

27:00

been listening with the stethoscope, mainly

27:02

for illnesses involving the lungs, since

27:04

these diseases cause an audible change

27:07

in the way we breathe, speak, or cough.

27:10

And he says this tech can go a step further

27:12

than traditional doctors can, not

27:14

only hearing a different cough, but for example,

27:17

if someone develops throat and neck cancer, the

27:19

disease changes the voice in a particular

27:21

way that can be picked up.

27:23

It came from, let's say, where

27:25

people have been listening to, and

27:27

moved more and more into, okay, let's think

27:29

about what would actually make a change

27:32

to your voice. It's your cognition for

27:34

speech production. It's your

27:36

physiology. It's your motor system.

27:38

If that is somehow affected, we should be

27:40

able to hear it.

27:42

But then he said something that gave us pause, that

27:45

the tech can already pretty accurately

27:47

predict a person's height from their voice,

27:49

and even their heart rate.

27:51

And one day, he thinks it'll be used

27:54

to spot things about infants even before

27:56

their own parents can, like that

27:58

it could help diagnose autism and other

28:00

things very early on, simply

28:02

from a baby's cries.

28:27

He believes vocal biomarkers could also make

28:29

a huge difference in countries where medical care

28:31

is hard to access. So in the

28:33

future, people could call a number,

28:36

leave a voice message, and the computer

28:38

could tell them if they have dementia, throat

28:40

cancer, or other illnesses.

28:42

You would have the luxury of

28:44

it being very cheap, very accessible, if you just

28:46

have to call a call center

28:48

and it can give you feedback. This would

28:50

mean that we can, in less

28:53

connected countries, easily provide

28:55

such health services, via

28:57

phone.

28:58

These ideas are associated with deep neural

29:00

networks, a form of machine learning where

29:02

the computer trains itself to find patterns.

29:05

In the past, scientists needed to make

29:08

assumptions about what they could find in a voice,

29:10

and then figure out where the signal might

29:12

be.

29:13

Is it in the pitch,

29:15

or is it in the loudness?

29:17

These days, the computer gets training data

29:19

that's already labeled, so whether

29:22

the person does or doesn't have a disease

29:24

is marked.

29:25

And then a system looks for patterns in

29:27

that audio.
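
The supervised setup being described looks roughly like the sketch below, with entirely synthetic data standing in for labeled voice recordings and a simple off-the-shelf classifier in place of a deep neural network: each clip's feature vector is paired with a disease/no-disease label, and the model searches for whatever pattern separates the two groups.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))      # 12 acoustic features per recording (synthetic)
y = rng.integers(0, 2, size=200)    # labels supplied with the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))   # near chance here, since the labels are random

Held-out evaluation like this is one basic guard, but as the example that follows shows, a model can still latch onto background artifacts rather than the voice itself.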

29:28

And this gets tricky because the

29:30

machine might find patterns we can't hear,

29:33

or it might find patterns that have nothing

29:36

to do with what it's trying to find.

29:38

And he gives an example of this from his own

29:40

work.

29:41

His team was trying to find the difference in

29:43

how people explain smelling something good

29:46

and smelling something awful.

29:48

So it's a pleasant or unpleasant smell, but

29:50

the machine that was inducing the smell made bubble

29:53

sounds, which were slightly different. And

29:56

we then took the pauses between the speech and

29:58

recognized that the AI actually

29:59

picks up the bubbling sounds of different

30:02

things poured into the liquid to produce the smell.

30:05

So we have to be very careful that we assure

30:07

what we're actually recognizing

30:09

is coming from the voice and not from any background

30:12

context.

30:13

In other words, the pattern the software found

30:15

wasn't the difference in how people reacted.

30:18

It's that the machine made different sounds when

30:20

releasing different fragrances. Similar

30:22

problems happened with trying to diagnose COVID

30:25

using software to analyze coughs, and

30:27

Schuller says he's worried that without extensive

30:30

testing, some tools won't work as advertised.

30:33

And frankly, we're sometimes

30:35

right. Yeah, I

30:37

wouldn't want to say worried, but I'm

30:40

clear about the real performance

30:42

of some commercial products. That's

30:45

generally true, of course, in machine learning. There's

30:47

a lot of people not fully revealing

30:49

their test methods and so on.

30:52

Something else, as we consider whether

30:55

to use these markers to test for things like

30:57

drug or alcohol impairment, voice

30:59

generators already exist that can change

31:01

our voices in real time, making

31:04

them sound happier or monotone, or

31:06

like an entirely different person. And

31:09

this could be used to sidestep these checks like

31:11

to fake a drug test.

31:13

So while players in the industry are quite confident

31:16

that vocal biomarkers exist and

31:18

can be used for a great number of applications,

31:21

how well it really works in practice remains

31:23

unclear.

31:29

This episode was reported by Hilke Schellman,

31:31

produced by me with Emma Cillekens and Anthony

31:33

Green. We're edited by Mat Honan and mixed

31:36

by Garrett Lang, with original music by

31:38

Garrett Lang and Jacob Gorski. Special

31:41

thanks to the Knight Science folks at MIT

31:43

for their support with this reporting. And thank

31:45

you for listening. I'm Jennifer Strong.

31:52

This is MIT Technology

31:54

Review.
