Episode Transcript
0:00
Support for this episode comes from the Knight
0:02
Science Journalism Fellowship Program at
0:04
MIT.
0:07
This is MIT Technology
0:10
Review. We're
0:19
sorry. All of our representatives are still
0:21
assisting other customers. Please
0:23
remain on the line as we value your call.
0:26
We've all been there, and it's safe to say
0:29
you might even dread the experience. You're
0:31
already on a crusade to solve an issue,
0:34
then you have to go through a long phone tree,
0:37
and you might not be greeted by a human. This
0:39
call may be monitored or recorded for quality
0:42
assurance or training purposes. And
0:44
if you've wondered who really listens to these calls
0:47
and recordings, making sure agents
0:49
say the right things, it might not be
0:51
a person at all. These days,
0:53
AI solutions are being used to analyze
0:56
our voices in real time. And
0:58
this applies to some health care settings, as
1:00
well as these customer service phone trees. The
1:03
idea is that hidden away in our voices
1:06
are signals that hold clues to how we're
1:08
doing, what we're feeling, and
1:10
even what's going on with our physical health.
1:13
Wait, so I need to pay even more?
1:15
I hadn't expected that. This
1:17
is an example of a call being analyzed
1:20
by software from a company called Cogito. It's
1:23
supposed to find signals in a caller's voice
1:25
that help an agent be more effective
1:27
with their replies, like by telling
1:29
them to be more empathetic.
1:32
Yeah, I know it's a bit unexpected. I'm sorry
1:35
about that. It's just not currently offered as a standard
1:37
feature. I'm Jennifer Strong.
1:39
In this episode, we examine what happens
1:41
when algorithms analyze our voices,
1:44
looking for clues about our mental and physical
1:46
health. Let's
1:57
go. In Machines
1:59
We Trust. I'm
2:01
listening. A podcast about
2:03
the automation of everything. You have
2:05
reached your destination.
2:12
As someone who speaks for a living, it's strange
2:14
to think there might be all kinds of signals
2:16
and other data lurking beneath the surface
2:19
of the human voice. Not just in
2:21
what we say with words, but in the way we
2:23
sound while speaking. And
2:25
when we sing, we begin with...
2:27
Do re mi. Do
2:29
re mi fa sol la ti
2:32
do, so
2:32
do. Yep, keep the vowel
2:35
open one more time. Naaaa. Mmmmm.
2:41
Ooooo. And those signals
2:43
are increasingly being detected and analyzed
2:46
with AI. To provide clues about
2:48
who we are and even what we look
2:50
like. Or whether we have a medical
2:52
condition. Developers of these
2:54
products are going way beyond the hunt for
2:56
clues about people's emotional states. They're
2:59
looking for signs of all kinds of diseases, including
3:02
Parkinson's and Alzheimer's. So
3:05
from real-time analysis of our voices by
3:07
businesses providing things like customer
3:09
service, to healthcare applications and
3:12
more, just trying to understand
3:14
the overall scale of what companies and
3:16
researchers think they might be able to learn
3:18
from our voices can be pretty overwhelming.
3:22
And might even feel kind of dystopian
3:24
too. But it's also
3:26
possible this kind of tech might help
3:28
millions of people. In the US
3:30
alone, about one in six have been diagnosed
3:32
with depression,
3:33
and there aren't enough therapists to help. We're
3:37
joined now by my reporting partner on this project,
3:39
Hilke Schellman. She's an Emmy Award-winning
3:41
journalist writing a book about AI at work,
3:44
and she's been investigating this topic with us.
3:46
Hey, Hilke. Hey, Jen. What
3:49
was it that got you interested in this topic? You
3:52
know, I was working on my book, and I was reading
3:54
an article on how the surveillance of students
3:56
on university campuses has increased
3:58
dramatically during the pandemic.
3:59
And this article was mostly
4:02
about tracking students' locations and machines
4:04
checking their temperatures. But somewhere
4:07
the author talked about a school called Menlo
4:09
College and how they started
4:11
using a tool to check the students' voices
4:13
for signs of depression. And that
4:16
really piqued my interest. I wanted to know more
4:18
so I reached out to a company called Ellipsis
4:20
Health that built the tool
4:22
and I also had the chance to talk to a couple of students
4:24
who used it, and I wanted to know how
4:27
it actually felt to use the tool.
4:29
Okay so let's start with the college. It's
4:31
located in the heart of Silicon Valley and we
4:34
spoke to one of its students, Lina Lacoski
4:36
Torres.
4:37
I'm majoring in business management
4:40
and they have a concentration in entrepreneurship
4:42
and innovation so that's my concentration.
4:45
I found that that was best suited towards
4:48
my needs just in regards to innovation
4:50
I'm always interested in doing things against
4:52
the status quo so that's
4:55
my major.
4:56
When we spoke with her, she was a 19-year-old
4:58
junior. I think that it was
5:00
more the holistic version of
5:02
I didn't come in with the mindset
5:04
of oh I'm gonna go to Google
5:07
or I'm gonna go and make a lot of money, like
5:10
I'm more focused on social change
5:13
and how I can best use
5:15
my brain, my thinking skills.
5:17
I don't have a lot of like technical skills.
5:20
But she found it hard to escape these expectations.
5:23
Because it is really overwhelming, fresh
5:25
out of high school going okay now we're
5:28
in a situation where 21-year-olds
5:30
have built unicorns.
5:32
What are you gonna do? And it's like I'm just
5:35
trying to figure out how to exist you
5:37
know as my own person. Then
5:39
the pandemic hit.
5:40
So I did my first semester
5:43
of my freshman year in person
5:46
living in the dorms
5:48
and then spring break on
5:51
my second semester freshman year then
5:53
we went online.
5:55
And she told us many of the students
5:57
struggled especially early on when
5:59
they had to rush back home. A lot
6:01
of things come up. You're with your family. Family
6:04
issues, you can't get away from it. You're with your
6:06
mind. You know, you don't realize how much
6:10
you miss your social interactions. When
6:12
the campus shut down, she had to return to her
6:14
mom's place in Las Vegas. And
6:17
relocating out of state
6:18
meant cutting the cord with her therapist, who
6:20
can't work outside of California. I
6:23
was in Nevada, so I was pretty
6:26
distraught. I'll be honest, I felt like I was breaking
6:28
up with my therapist. I go, OK,
6:30
I'm in Nevada, so I guess. Bye.
6:33
And it felt really abrupt, especially in the time
6:35
that you need it the most. She
6:38
wasn't the only one struggling to find help.
6:40
About half the students at Menlo College aren't
6:42
from California. And in the midst of an unfolding
6:45
global crisis, suddenly many
6:47
of them couldn't access the college's mental
6:49
health services.
6:51
Then a startup pitched the school on an AI
6:53
product that's meant to assess anxiety
6:55
and depression and help people navigate
6:57
those symptoms. The school agreed
7:00
to try it, and in late 2020, the
7:02
tool was rolled out free of charge to about 800
7:04
students. It asked
7:06
people to answer daily questions in a voice
7:08
message. How's everything going at home?
7:11
And then I go for 30 to 45 seconds, hey, you know, things
7:15
have been tough. I'm really stressed out.
7:17
I'm feeling overwhelmed. I'm feeling
7:20
like I'm being smothered,
7:21
et cetera. And then
7:23
it would switch over to the prompt of, how
7:25
are you feeling lately? Or
7:28
how is school going? You know, another prompt
7:30
to keep you going.
7:32
Every time someone used it, they got a score,
7:34
and the tool gave recommendations, from
7:37
breathing exercises to the number for
7:39
a crisis helpline. For Lacoski
7:41
Torres, she didn't find the exercises
7:43
too helpful. She mostly used the tool
7:45
to prove to her mom just how stressed out
7:47
she was by living back at home.
7:50
I was like, look, it's real. This
7:53
is how I feel. You
7:55
can see it. There's data right there.
7:58
So not so much the coping skills,
7:59
because I already have issues
8:02
implementing those to begin with. People say mindfulness,
8:05
meditation. It's not, I'm
8:07
freaking out right now. If I could breathe
8:10
and chill, trust me, I definitely
8:12
would, you know?
8:13
But she says she found it helpful in
8:16
other ways. What if we didn't have this
8:18
at all? It wouldn't have brought up mental health in
8:20
the way that it did at such a large
8:22
scale and in such a way that,
8:25
I
8:25
don't know, I thought it was pretty cool. There's a lot
8:27
of people that thought it was pretty cool.
8:30
Though it did raise some questions about privacy.
8:32
Where's my talking going to? Somebody
8:34
gonna hear it? But it's really
8:37
just the computer-based system and
8:39
the AI system. And then, I guess,
8:42
in its infancy stages, possibly somebody
8:46
who works on it. But I mean, with the utmost
8:48
security, I'm pretty sure they didn't care
8:51
what you're really out to say. I think that's a
8:53
really big thing with AI too and everybody's
8:55
data. You go, okay, what's it gonna
8:57
be used for? It's my data. And it's
9:00
like, you're not a super spy.
9:02
Privacy also came up in discussions between
9:05
students, college representatives, and
9:07
the product's maker, Ellipsis Health. But
9:09
she says it was mostly about whether school therapists
9:12
would get access to that data.
9:14
And students said that wasn't appropriate.
9:16
Though
9:16
the privacy of their voice data and
9:18
what might happen to that was less of an issue.
9:21
Privacy doesn't really exist anymore.
9:23
And if you feel some type of way, you're gonna go onto
9:26
your social media and put it on blast.
9:28
You're gonna tell your friends, da-da-da-da. It's
9:30
not something you really wanna keep to yourself.
9:32
By the time we spoke, she didn't have access
9:34
to the tool anymore because the pilot program
9:37
had ended. But for the school, it jump-started
9:40
a broader conversation and a rethinking
9:42
of the way it delivers mental health services.
9:45
I think they have a greater faith in technology
9:47
than perhaps older generations do.
9:50
This is Angela Schmida, a vice president
9:52
at Menlo College. If I've learned
9:55
anything about students and their mental
9:57
health over the last two years, it's that there's
9:59
not a one-size-fits-all solution.
10:02
And so you can offer face-to-face counseling
10:05
and some students just won't take advantage of that.
10:08
But if you can offer something
10:10
where students are accessing it on their own
10:12
and they can do it in real time
10:14
and
10:15
there's not a barrier associated with
10:17
it, then some students will access that.
10:20
And she says she tried the tool herself. There
10:23
is something therapeutic about just talking.
10:25
I did find that to be the case and I
10:27
think for me it's a little bit awkward because you're
10:29
just sitting there and you're talking and you're not talking
10:31
to anyone. So it's a little bit like talking to yourself.
10:34
And you do wonder, you know, who's really going
10:36
to listen to this? So I know that the students
10:39
had concerns about that as well.
10:41
But with the demand for mental health services
10:44
greatly outstripping supply all over
10:46
the world, there's a lot of interest in
10:48
finding tech that might help.
10:50
And it's something that reporter Hilke Schellman
10:53
is looking at. I mean, there are not
10:55
enough therapists to help everyone. So
10:57
vocal biomarkers, they could be revolutionary
11:00
and they could help a lot of people, maybe
11:03
even millions of people, billions
11:05
of people, or at least that's
11:08
the hope. And so we
11:10
reached out to four startups to find out
11:12
how their technologies work and to
11:14
really understand what these tools
11:17
do. The first is called Kintsugi.
11:20
It's a startup with a test that's meant to find mental
11:22
distress from just a 20-second voice
11:24
recording. And it's supposed to work
11:26
regardless of what language someone is speaking.
11:29
Grace Chang is the company's CEO and she
11:32
told us these AI tools essentially
11:34
do what our parents and many therapists
11:36
have done for a long time.
11:38
When we talk to our friends, our family
11:40
members, it's almost obvious
11:42
for those who are close to us when
11:45
they speak in a lower voice or
11:47
if they speak in a slower manner that
11:49
something might be wrong with them. And
11:51
we have the luxury of knowing this person's
11:54
set of patterns to be able to determine
11:56
that there may be something
11:59
that is... different than
12:01
how this person normally speaks. And
12:04
so what is really remarkable is
12:07
that psychiatrists have known
12:09
that in this area of speech
12:12
there has always been a tie to
12:14
depression and anxiety.
12:16
She believes they can replicate this intuitive
12:18
speech analysis over a short period,
12:21
do it at scale, and that it can help therapists
12:23
understand how their patients are doing in between
12:26
appointments.
12:27
Our company has moved towards a
12:29
position of being able to create
12:32
a robust set of models, not
12:35
looking at what people are saying
12:38
but how they are saying
12:40
it. Our models end up being
12:42
language agnostic. So
12:44
we have people that are able to speak
12:46
in French and in Japanese
12:49
and English or otherwise. But
12:51
really we are just looking for those
12:54
biomarkers that are most predictive
12:56
for depression and anxiety. But
12:59
what is really fascinating about machine
13:01
learning is that we don't have
13:03
just the few examples of a psychiatrist
13:06
working with maybe hundreds of patients
13:09
across his or her career. Instead
13:12
we have tens of thousands of examples
13:14
of individuals who
13:17
have designated depression
13:19
or anxiety as examples for
13:22
machines to learn from.
13:24
She also says cultural influences
13:26
on language don't matter as much as we
13:28
might think. We don't care
13:31
about any of the demographic
13:33
information or the context of what's
13:35
happening
13:37
because we are looking at how people
13:39
are speaking, these spectral and prosodic
13:42
features of like how fast
13:45
or how loud or these
13:47
sort of, if you can see on a visual
13:49
spectrogram of how people
13:51
are speaking, there are some nuances
13:54
to speech that machines are able
13:56
to pick up.
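To make that concrete, here is a minimal sketch of content-free, "how you say it" feature extraction, assuming the open-source librosa library; the specific features, pitch range, and clip length here are illustrative assumptions, not Kintsugi's actual pipeline.

```python
# A minimal sketch, NOT Kintsugi's pipeline: extract prosodic and spectral
# features (how something is said) while ignoring the words entirely.
import numpy as np
import librosa

def how_not_what_features(wav_path, sr=16000):
    # Load up to ~20 seconds of audio, the clip length mentioned above.
    y, sr = librosa.load(wav_path, sr=sr, duration=20.0)

    # Prosody: pitch contour via pYIN. A flat contour (low std) is the
    # "monotone" quality discussed later in the episode.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]  # keep only voiced frames

    # Prosody: loudness as frame-level root-mean-square energy.
    rms = librosa.feature.rms(y=y)[0]

    # Spectral: MFCCs summarize the shape of the spectrogram per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    return {
        "pitch_mean_hz": float(f0.mean()) if f0.size else 0.0,
        "pitch_std_hz": float(f0.std()) if f0.size else 0.0,
        "loudness_mean": float(rms.mean()),
        "loudness_std": float(rms.std()),
        "mfcc_means": mfcc.mean(axis=1).tolist(),
    }
```

Because nothing here depends on which words were spoken, the same extraction runs unchanged on French, Japanese, or English speech, which is the sense in which such models can be language-agnostic.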
13:59
What we do is we
14:02
analyze human voice and
14:04
deliver insight, health
14:07
insight, from that analysis.
14:09
And this is David Liu, the CEO
14:11
of Sonde Health, a company that's regarded
14:14
as one of the frontrunners in the space. Its
14:17
health monitoring products look for all
14:19
sorts of things, from signs of cognitive
14:21
or motor impairment to asthma,
14:24
and drug use. And we asked him for some
14:26
concrete examples of how it all works.
14:28
Think of us
14:29
as a health insight company,
14:32
and we use voice as the window to
14:35
deliver that insight. The technology
14:38
is a detection and monitoring technology,
14:40
right? What it does is it takes in six
14:43
to 30 seconds of human
14:45
voice, and from there, our
14:48
algorithms and models, which have been
14:51
trained on tens of thousands of people,
14:53
both in the US and in Asia, then
14:56
are analyzing what I
14:58
call the atomic level of your voice.
15:01
We take and look at five
15:03
millisecond strips of your 30
15:05
second or 12 second voice sample, and
15:09
then within there, we're analyzing
15:11
those vocal features that do
15:14
and that we have identified as being
15:17
relevant
15:18
to understanding your particular
15:20
condition. And pauses are something
15:22
that almost all companies are looking into.
15:24
What we look at is the
15:26
time difference between when
15:29
air is being pushed out of your mouth to
15:31
when sound and voice is
15:33
being detected. And so that time
15:35
period, which is
15:37
quite short, we can measure that.
15:39
His system also records the smoothness
15:41
of a speaker's voice, control of vocal
15:44
muscles, energy, clarity and
15:46
the speech rate. And by the way, some
15:48
of these features can be heard
15:50
by the human ear and discerned. Most
15:53
cannot.
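As a rough illustration of the pause measurements Liu describes, here is a minimal sketch that slices a recording into five-millisecond strips and times the silent gaps between voiced stretches; the energy threshold and framing are assumptions for illustration, not Sonde Health's models.

```python
# A minimal sketch, NOT Sonde Health's system: chop audio into ~5 ms strips
# and measure silent gaps as a crude stand-in for pause-style vocal features.
import numpy as np
import librosa

def pause_features(wav_path, sr=16000, strip_ms=5, silence_db=-40.0):
    y, sr = librosa.load(wav_path, sr=sr)
    strip = int(sr * strip_ms / 1000)      # samples per 5 ms strip (80 at 16 kHz)
    n = len(y) // strip
    strips = y[: n * strip].reshape(n, strip)

    # Energy per strip, in dB relative to the loudest strip.
    energy = np.sqrt((strips ** 2).mean(axis=1))
    db = 20 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    voiced = db > silence_db               # crude speech/silence decision

    # Collect the length of every silent run between voiced stretches.
    pauses, run = [], 0
    for v in voiced:
        if v:
            if run:
                pauses.append(run * strip_ms)
            run = 0
        else:
            run += 1
    return {
        "num_pauses": len(pauses),
        "mean_pause_ms": float(np.mean(pauses)) if pauses else 0.0,
        "speech_fraction": float(voiced.mean()),
    }
```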
15:54
The idea is that tucked away in our voice,
15:57
AI can find hidden information.
15:59
The basic claim is that our voice is directly
16:02
connected to the brain,
16:03
and vocal biomarkers might allow
16:05
us to reverse engineer what's going on up
16:08
there.
16:08
We talk about vocal biomarkers
16:11
and Sonde being able to understand
16:14
changes in health through changes in voice. And
16:17
the reason why this is possible is,
16:19
and I'll go into the science of it a little bit, but
16:22
it is really based on physiology.
16:24
When symptoms of
16:27
a disease such as depression or anxiety
16:29
or any of these other mental health conditions
16:32
begin to have an effect
16:34
on the body,
16:35
and they do,
16:36
stemming from the brain, and there are a
16:38
hundred different body parts that come
16:41
together in your body,
16:43
in everyone's body to allow us to
16:45
speak. It's one of the most complicated
16:48
activities that human beings participate
16:50
in. There are literally thousands
16:53
of vocal features. I had no idea of this before
16:55
I came into the space either, but voice
16:58
is actually an incredibly rich
17:01
data source of interesting
17:04
acoustic features. And so his
17:06
and other companies argue they can objectively
17:08
monitor patients and pick up, for example,
17:11
if someone is becoming depressed.
17:13
It's a complex mixture of physical,
17:16
mental health, mental abilities
17:18
that come together. And so when
17:20
these symptoms of disease begin
17:23
to spread and begin to become of
17:25
a larger impact, they will impact
17:28
the actual physical aspects
17:31
and characteristics of your voice.
17:35
But it's not just healthcare companies using this
17:37
technology. What might it mean
17:39
to have it running in the background of our business
17:42
calls or during job interviews, where
17:44
voice analysis that may come with hiring
17:47
software could potentially flag
17:49
a job candidate as depressed? It's
17:52
complicated, and among the many questions
17:54
it raises,
17:55
would companies have an ethical obligation
17:57
to share these insights?
17:59
And so we asked Liam Kaufman,
17:59
the former CEO at
18:02
Winterlight Labs. We've sort
18:04
of steered clear of that type of use case because
18:08
there's a lot of thorny ethical issues, which
18:10
is like, let's say you have an Alexa running in the background. It
18:13
might be measuring your speech, but it
18:15
also might be measuring your spouse's speech or
18:18
the delivery person's speech or the TV. So
18:20
there's a lot of different things that
18:22
could happen in the background. And
18:24
then there's also a lot of context that's missing, like
18:26
is this person responding to
18:28
something that they're angry with or they're responding
18:31
to something that they're happy with? And so
18:33
in the short term, it's much easier to do
18:36
more active assessments. And
18:38
so you can kind of control the subject matter and what they're talking
18:41
about. So technically, it's
18:43
easier to do, and ethically it
18:45
avoids some of those challenges of diagnosing people
18:48
that don't want to be diagnosed or don't even know that they're
18:50
being listened to, which are generally pretty
18:52
creepy, and we want to kind of
18:54
avoid at this point. But
18:56
one question we haven't asked yet is whether
18:59
these biomarkers really contain this hidden
19:01
information about our health. We trust
19:03
a blood pressure reading as a biomarker of
19:05
physical health, but how do we know whether
19:07
we should trust vocal biomarkers as an
19:10
accurate reflection of disease?
19:12
We asked one of the world's leading AI researchers,
19:15
Margaret Mitchell. She founded Google's Ethical
19:17
AI Group and is a pioneer in the field
19:20
of machine learning with close to 100 papers. She
19:23
doesn't have commercial ties to vocal biomarkers now, but
19:26
she's worked on them in the past at Oregon
19:28
Health and Science University.
19:30
So in particular, we were looking at their
19:32
speech streams to see if
19:34
we could do detection of mild
19:37
cognitive impairment, which is a precursor
19:39
to Alzheimer's, Parkinson's.
19:42
So that's on the older end. And then
19:44
on the younger end, we were looking at autism
19:48
and apraxia. So with
19:50
all of the above, there was a question
19:52
of what sort of signals can we pull out.
19:55
With autism, prosody was a
19:57
big one.
19:58
So prosody is sort of like... the musical
20:01
side of language. So
20:04
I can say something like this, or
20:06
I could say something like this.
20:09
The latter has much more of an intonation
20:11
contour going up and down,
20:14
right? And so, for example, people
20:16
who are depressed tend to have more
20:18
monotone, a little bit flatter intonation
20:21
contours, right, than people
20:23
who are not depressed. So we were looking at that
20:25
kind of thing for autism. And
20:27
then for Parkinson's, we were looking at
20:29
a few different kind of things,
20:32
including pause behavior.
20:35
So one aspect of the
20:37
speech stream is the pauses,
20:40
the silence between different phrases.
20:43
You can start to pull out some
20:45
things that are roughly predictive of some
20:50
sort of neurological statuses. We
20:52
didn't have a ton of luck with Parkinson's at
20:54
the time. We did have some
20:56
luck with mild cognitive impairment and
20:58
with autism.
21:00
And she says her team got the best results
21:02
when they combined both sound and words.
21:05
The project that I worked on the most
21:09
that showed some nice results was
21:12
for mild cognitive impairment. And
21:14
we found that we
21:16
were able to make
21:18
some reasonable predictions of
21:21
mild cognitive impairment when we used speech
21:24
signal as well as language
21:27
signal. So it wasn't
21:29
just the audio. It was
21:31
also what they were saying.
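A minimal sketch of that fused approach, assuming scikit-learn; the inputs and feature choices are hypothetical stand-ins, not the team's actual system.

```python
# A minimal sketch: combine the speech signal (acoustic stats) with the
# language signal (the words themselves) in a single classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_fused_model(acoustic_features, transcripts, labels):
    """acoustic_features: (n_samples, n_features) array, e.g. pitch/pause stats.
    transcripts: list of n_samples strings (what each person said).
    labels: hypothetical clinician-assigned labels, 1 = impaired, 0 = not."""
    vectorizer = TfidfVectorizer(max_features=500)
    text_matrix = vectorizer.fit_transform(transcripts).toarray()
    # Fuse both signals into one feature matrix.
    X = np.hstack([np.asarray(acoustic_features, dtype=float), text_matrix])
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return model, vectorizer
```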
21:33
The goal is to make health care more accessible
21:36
and less expensive. So if
21:38
you're able to make some
21:40
predictions based on a speech stream, you
21:42
can do a few things.
21:44
One is pre-screening. And
21:46
in pre-screening, you would call a phone
21:48
number, you would take a battery of tests.
21:50
And then based on the pre-screening results,
21:53
it would say whether or not you should
21:55
go speak to a physician. So
21:57
that just initial sort of pre-screening is
22:00
a really useful thing to be able to
22:02
do and keep down costs. And
22:04
then once someone already has a diagnosis,
22:07
it's really useful for them to be able to just
22:09
at home do retellings,
22:12
do these different kinds of things without having to
22:14
go in and be especially diagnosed
22:17
and then have those readings or signals
22:19
sent back to the clinician.
22:22
So monitoring a patient's voice with
22:24
a tool like this could help detect if
22:26
a patient's depression is getting better
22:28
or worse. But she cautions,
22:31
these predictions aren't precise. This
22:33
has to be tempered by the fact that
22:36
accuracy and other sort of evaluation
22:38
metrics are virtually never 100 percent.
22:42
And there's a lot of additional factors that
22:44
come into play that affect accuracy, such
22:46
as what kind of phone is being used, what sort
22:48
of audio recording device, all this. So
22:51
in a world where everything worked perfectly,
22:53
that
22:54
is the goal. In reality,
22:56
we'll probably never get to a place where we work perfectly.
22:59
And so we'll
23:00
probably be in a space
23:02
where a
23:03
system could automatically give you
23:05
a preliminary reading, you
23:07
know, based on my faulty ability
23:10
to make predictions, here is my preliminary
23:12
reading of you. You know, it's kind
23:14
of like when you take a pregnancy test:
23:17
you first take the at-home ones
23:20
and then eventually you're like, OK, I guess I'll go see a doctor.
23:22
It's sort of that sort of thing where
23:24
you try and be very clear that there's false positives,
23:26
false negatives, similar with Covid tests,
23:29
I suppose. You know, you try and make it as clear
23:31
as possible that it won't always work, but at least
23:33
it's better than nothing and can
23:35
be a signal that
23:36
then you go see a professional.
23:39
It's why one signal is not enough for
23:41
a diagnosis, because what if someone
23:44
had a bad day and speaks in a monotone
23:46
voice? And this isn't unique. It's
23:48
also the case with other biomarkers, too.
23:51
An elevated heart rate might mean there's a problem
23:54
or maybe the patient was just running
23:56
late. And it's also critical that
23:58
companies building these tools
23:59
make sure the AI doesn't predict
24:02
on the wrong thing. Disaggregation.
24:05
This is the word to know. Disaggregation.
24:08
In
24:08
disaggregation, you take
24:10
all of these variables and
24:13
you test with respect to each one of them.
24:16
So in general experimental
24:18
design, there are independent variables
24:20
and dependent variables. I think
24:22
to your point, in machine learning, people often
24:25
don't pay attention to, you know, experimental
24:28
design that has been really well defined for years.
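A minimal sketch of what disaggregated testing can look like, assuming pandas; the grouping columns are hypothetical.

```python
# A minimal sketch of disaggregated evaluation: instead of one overall
# accuracy number, report accuracy separately for each value of each
# variable you can disaggregate over.
import pandas as pd

def disaggregated_report(df, group_cols=("gender", "age_band", "phone_model")):
    """df has a boolean 'correct' column (prediction == label) plus one
    column per variable to disaggregate over; column names are hypothetical."""
    report = {"overall": df["correct"].mean()}
    for col in group_cols:
        # A large accuracy gap between groups flags that the model may be
        # predicting on the wrong thing for some of them.
        report[col] = df.groupby(col)["correct"].mean().to_dict()
    return report
```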
24:31
She also points out another problem. This
24:33
technology could easily be misused.
24:35
So if it's possible
24:38
for a potential employer interviewing
24:40
you
24:41
to automatically get some
24:43
readout that says, oh, this person might struggle
24:45
with depression, then that's a mechanism
24:48
for them to discriminate against you and
24:50
choose not to hire you. I've done
24:52
work on predicting depression
24:55
and I intentionally did not
24:57
include examples of the kinds of words
24:59
that were used. Because then you have
25:01
armchair, you know, you have armchair
25:04
clinicians like, oh, well, they're using the
25:06
word ibuprofen a lot. And I heard
25:08
that ibuprofen was a signal of depression.
25:11
So
25:12
now I'm going to be biased against this person in that way. I
25:15
mean, even talking about this, it's a little bit worrying, but,
25:17
you know, it is important for people to understand what's happening.
25:20
I can give this really funny example
25:22
that working on this speech stuff, one
25:25
of my friends, colleagues went to a talk
25:27
about depression and signatures of depression. The
25:30
colleague came back and as they came back,
25:33
I was very frustrated with something
25:35
and I went, [sighs]
25:36
and then she went, are you depressed? Depressed
25:39
people sigh a lot. I heard, you
25:41
know what I mean? And so like even
25:43
giving this example, now her impression
25:45
of the world is fundamentally altered,
25:47
right? Now she sees depression in people that she wouldn't
25:50
have otherwise seen. So it's so critical
25:52
to make sure that these discussions
25:54
are really couched in all the caveats
25:57
and that they are only released to people
25:59
who can really work through these nuances
26:02
and understand them. Otherwise,
26:04
all of these kinds of findings are going to be a mechanism
26:06
for discrimination. If you have
26:09
depression, you might use monotone.
26:12
If you're using monotone, that doesn't mean you're depressed.
26:15
And so understanding the sort of causality
26:17
flow there, if this, then that,
26:20
is really critical and something that a lot of people
26:22
mess up. And then particularly
26:24
if it ends up influencing people
26:26
and people's impressions in situations
26:29
like hiring, then it's a very
26:31
serious concern. This
26:34
makes it hard for researchers to share their findings.
26:37
Plus, there are lots of concerns about privacy
26:39
with this data. And so we turned
26:41
to Bjorn Schuller, a professor of artificial
26:44
intelligence at Imperial
26:45
College London, and one of the world's
26:47
leading experts in vocal biomarkers. We
26:49
asked him whether using this tech in a healthcare
26:52
setting is even a good idea. And
26:55
he said yes. Vocal biomarkers
26:58
can be useful where doctors have already
27:00
been listening with the stethoscope, mainly
27:02
for illnesses involving the lungs, since
27:04
these diseases cause an audible change
27:07
in the way we breathe, speak, or cough.
27:10
And he says this tech can go a step further
27:12
than traditional doctors can, not
27:14
only hearing a different cough, but for example,
27:17
if someone develops throat and neck cancer, the
27:19
disease changes the voice in a particular
27:21
way that can be picked up.
27:23
It came from, let's say, where
27:25
people have been listening to, and
27:27
moved more and more into, okay, let's think
27:29
about what would actually make a change
27:32
to your voice. It's your cognition for
27:34
speech production. It's your
27:36
physiology. It's your motor system.
27:38
If that is somehow affected, we should be
27:40
able to hear it.
27:42
But then he said something that gave us pause, that
27:45
the tech can already pretty accurately
27:47
predict a person's height from their voice,
27:49
and even their heart rate.
27:51
And one day, he thinks it'll be used
27:54
to spot things about infants even before
27:56
their own parents can, like that
27:58
it could help diagnose autism and other
28:00
things very early on, simply
28:02
from a baby's cries.
28:27
He believes vocal biomarkers could also make
28:29
a huge difference in countries where medical care
28:31
is hard to access. So in the
28:33
future, people could call a number,
28:36
leave a voice message, and the computer
28:38
could tell them if they have dementia, throat
28:40
cancer, or other illnesses.
28:42
You would have the luxury of
28:44
it being very cheap, very accessible, if you just
28:46
have to call a call center
28:48
and it can give you feedback. This would
28:50
mean that we can, in less
28:53
connected countries, easily provide
28:55
such health services, via
28:57
phone.
28:58
These ideas are associated with deep neural
29:00
networks, a form of machine learning where
29:02
the computer trains itself to find patterns.
29:05
In the past, scientists needed to make
29:08
assumptions about what they could find in a voice,
29:10
and then figure out where the signal might
29:12
be.
29:13
Is it in the pitch,
29:15
or is it in the loudness?
29:17
These days, the computer gets training data
29:19
that's already labeled, so whether
29:22
the person does or doesn't have a disease
29:24
is marked.
29:25
And then a system looks for patterns in
29:27
that audio.
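A minimal sketch of that recipe, with random placeholder data standing in for real labeled recordings; nothing here is any company's actual model.

```python
# A minimal sketch: hand a neural network labeled examples and let it find
# its own patterns, rather than deciding up front whether the signal lives
# in the pitch or the loudness.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 64))        # placeholder: one feature row per recording
y = rng.integers(0, 2, 200)      # placeholder: 1 = has condition, 0 = does not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)              # the network decides which patterns matter
print("held-out accuracy:", net.score(X_te, y_te))
```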
29:28
And this gets tricky because the
29:30
machine might find patterns we can't hear,
29:33
or it might find patterns that have nothing
29:36
to do with what it's trying to find.
29:38
And he gives an example of this from his own
29:40
work.
29:41
His team was trying to find the difference in
29:43
how people explain smelling something good
29:46
and smelling something awful.
29:48
So it's a pleasant or unpleasant smell, but
29:50
the machine that was inducing the smell made bubble
29:53
sounds, which were slightly different. And
29:56
we then took the pauses between the speech and
29:58
recognized that the AI actually
29:59
picks up the bubbling sounds of different
30:02
things poured into the liquid to produce the smell.
30:05
So we have to be very careful that we assure
30:07
what we're actually recognizing
30:09
is coming from the voice and not from any background
30:12
context.
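One way to probe for that kind of confound, sketched here with hypothetical names: score the model on matched clips with the voice stripped out, and treat above-chance performance on background audio alone as a red flag.

```python
# A minimal sketch of a confound check in the spirit of Schuller's warning;
# every name here is hypothetical. If the model still scores well on
# background-only clips, it has learned the context (like the bubbling
# sounds) rather than anything about the speaker.
def confound_check(model, featurize, voice_clips, background_clips, labels):
    voice_acc = model.score([featurize(c) for c in voice_clips], labels)
    background_acc = model.score([featurize(c) for c in background_clips], labels)
    # Background accuracy far above chance (0.5 for two classes) means the
    # signal is not actually coming from the voice.
    return {"voice_accuracy": voice_acc, "background_accuracy": background_acc}
```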
30:13
In other words, the pattern the software found
30:15
wasn't the difference in how people reacted.
30:18
It's that the machine made different sounds when
30:20
releasing different fragrances. Similar
30:22
problems happened with trying to diagnose COVID
30:25
using software to analyze coughs, and
30:27
Schuller says he's worried that without extensive
30:30
testing, some tools won't work as advertised.
30:33
And frankly, we're sometimes
30:35
right. Yeah, I
30:37
wouldn't want to say worried, but I'm
30:40
clear about the real performance
30:42
of some commercial products. That's
30:45
generally true, of course, in machine learning. There's
30:47
a lot of people not fully revealing
30:49
their test methods and so on.
30:52
Something else, as we consider whether
30:55
to use these markers to test for things like
30:57
drug or alcohol impairment, voice
30:59
generators already exist that can change
31:01
our voices in real time, making
31:04
them sound happier or monotone, or
31:06
like an entirely different person. And
31:09
this could be used to sidestep these checks, like
31:11
faking a drug test.
31:13
So while players in the industry are quite confident
31:16
that vocal biomarkers exist and
31:18
can be used for a great number of applications,
31:21
how well they really work in practice remains
31:23
unclear.
31:29
This episode was reported by Hilke Schellman,
31:31
produced by me with Emma Cillekens and Anthony
31:33
Green. We're edited by Mat Honan and mixed
31:36
by Garret Lang, with original music by
31:38
Garret Lang and Jacob Gorski. Special
31:41
thanks to the Knight Science Journalism folks at MIT
31:43
for their support with this reporting. And thank
31:45
you for listening. I'm Jennifer Strong.
31:52
This is MIT Technology
31:54
Review.