Episode Transcript
0:04
Welcome to Tech Stuff, a production of
0:06
I Heart Radio's How Stuff Works. Hey
0:12
there, and welcome to tech Stuff. I'm your
0:14
host, Jonathan Strickland. I'm an executive producer
0:16
with I Heart Radio and I love all things tech,
0:19
and guys, stick with me. I am
0:22
fighting off a cold. You'll
0:24
be able to hear it in my voice. I have no doubt.
0:26
But you know, I wanted to get you guys
0:29
a brand new episode. So we're gonna fight
0:31
on because the show must
0:34
keep going. I
0:36
think, I think that's the saying. Oh no, this
0:38
cold medicine is good though. All right, Anyway, I thought
0:41
that we would do an episode about
0:44
smart speakers because I
0:46
wanted to kind of start this whole episode off
0:48
with with an old man observation,
0:51
you know, get off my lawn kind of thing.
0:53
And this is from our resident old man,
0:56
old man Strickland. That meaning
0:58
me. So, when I was young, speakers
1:01
were dumb. Now I don't. I don't mean
1:03
that speakers were useless, or
1:05
that they were terrible, or that they
1:07
were incapable of replicating certain
1:09
frequencies or volumes of sound,
1:12
or that they were limited in some other
1:14
way other than they didn't quote
1:16
unquote think. They didn't
1:18
connect to any sort of computational
1:21
engine in a meaningful way. You might
1:23
have a set of speakers plugged into a computer,
1:25
but that was just a one way communications tool,
1:27
right. It was just a way to provide an outlet
1:29
for sound that your computer was generating,
1:32
nothing more than that. But contrast
1:34
that with today, when we have numerous
1:36
smart speakers on the market. These speakers
1:39
act as a user interface between
1:41
us and the Internet at large, often
1:43
facilitated by a virtual assistant
1:46
of some kind. Now with these
1:48
speakers, we don't just listen to stuff
1:50
like music and podcasts and
1:53
the radio and you know, other traditional
1:55
audio content. We use them
1:57
to find out information. We might
2:00
link them to our calendars so that
2:02
we can get reminders for upcoming appointments.
2:05
We probably use them to ask about the weather
2:07
report. I use mine at home
2:09
for that all the time, or even
2:11
more often than that, if you're at my house, you'll
2:14
hear us use it to find out which foods
2:16
are safe for us to feed to our dog. My
2:18
doggie, Tibolt, absolutely loves our smart
2:21
speaker because it frequently gives us permission
2:23
to spoil him with a carrot or
2:25
a piece of banana. But how
2:27
do these smart speakers work,
2:30
How are they able to respond to
2:32
our requests? And what are their
2:34
limitations? How safe are they?
2:37
That's the sort of stuff we're gonna be looking into in
2:39
this episode of tech Stuff, and we'll start
2:41
off with the basics, which means
2:43
we have to start off with how speakers work
2:46
in general. Now, this is something
2:48
that I've covered before on tech Stuff, but
2:50
I want to go over it again from a high level
2:52
because well, I just find it fascinating
2:55
that people figured out how to harness electricity
2:58
to drive a motor so that it could
3:00
in turn cause components
3:02
to replicate a recorded or transmitted
3:05
sound. And really, motor is being too
3:07
generous, but to drive an element to
3:09
create vibrations that could replicate a
3:11
sound that was made into another component,
3:14
that whole thing just boggles my mind that
3:16
people are smart enough to figure that out. Okay,
3:19
So to understand how speakers work,
3:21
it first helps to understand how sound
3:24
itself works. Sound is a
3:26
physical phenomenon. Do
3:28
do do do? Sound is all about vibrations,
3:31
and typically we experience sound
3:33
when we pick up on changes in air pressure
3:36
that enter through our ear
3:38
canal and then affect the tympanic membrane
3:40
or ear drum. So it's
3:42
all about these changes
3:44
of air pressure, all
3:46
about air molecules transmitting vibrations
3:49
from a source outward
3:51
in a radiating pattern from
3:53
that source. So let's think of
3:55
someone knocking on a door. For example,
3:57
you're inside a house, someone's knocking on your door.
4:00
When that person's hand hits the door,
4:03
it causes the door to vibrate, and
4:06
that vibration transmits to the surrounding
4:08
air molecules on the other side of the door.
4:10
They get pushed through that vibration
4:13
and then pulled when the
4:16
wood is vibrating back towards its
4:18
original position. So the
4:20
air molecules vibrate, those air molecules
4:23
cause the next surrounding layer of
4:25
air molecules to vibrate as well, and
4:27
so on and so forth. It's like a cascade
4:30
or domino effect. You get these little pockets
4:32
of high and low air pressure that travel
4:34
outward from that door.
4:37
It spreads further as it goes towards
4:40
you know, any distance, and if
4:43
you are close enough so that
4:46
you can still detect those changes in air pressure.
4:49
You experience this by hearing the knocking
4:51
on the door. Those vibrating air molecules lose
4:54
a bit of energy as they move
4:56
outward. Right, as they vibrate to the
4:58
next layer, you start to lose a bit of
5:00
energy with each transmission
5:02
of that. So the sound gets quieter
5:05
the further away you are because
5:07
there's not as many air molecules vibrating,
5:10
its amplitude has decreased. So
5:13
if you are in hearing range, you can
5:15
pick up on those changes of air pressure as they encounter
5:17
the tympanic membrane in your ear canal. Those
5:19
changes in pressure will cause a reaction in your
5:22
middle and inner ear that
5:24
will ultimately get picked up by
5:26
your brain that interprets it as sound.
5:29
Now, the frequency at which those fluctuations
5:32
occur relates to the pitch
5:35
that we hear, so faster
5:38
vibrations are higher
5:40
pitches, higher frequencies, higher
5:42
notes. If you think of a musical scale,
5:45
we perceive the force of the
5:47
changes as volume, so
5:50
lower forces lower volume right, and
5:53
higher forces higher volume. The
5:55
human ear can hear a pretty decent
5:57
range of frequencies from twenty hertz,
5:59
which means twenty cycles or
6:01
twenty waves per second
6:04
past a given point of reference,
6:06
to twenty kilohertz. That's twenty
6:09
thousand cycles or waves
6:11
per second. So yeah, the cycle refers
6:14
to the frequency of the sound wave.
6:17
The lower the frequency, the lower the sound.
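As an aside on those numbers, here is a minimal sketch in Python (the sample rate and values are illustrative, not from the episode) of how frequency maps to pitch and amplitude maps to volume when a tone is described as a series of samples:

    import math

    SAMPLE_RATE = 44100  # samples per second, a common rate for digital audio

    def tone(frequency_hz, amplitude, duration_s=0.01):
        # frequency_hz sets the pitch: 20 Hz to 20,000 Hz is roughly human hearing.
        # amplitude sets the volume, here on a unitless 0.0 to 1.0 scale.
        count = int(SAMPLE_RATE * duration_s)
        return [amplitude * math.sin(2 * math.pi * frequency_hz * t / SAMPLE_RATE)
                for t in range(count)]

    low_and_quiet = tone(20, 0.1)      # low pitch, low volume
    high_and_loud = tone(20000, 0.9)   # high pitch, high volume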
6:19
All right, and then our brain has to make meaning
6:21
of all this, Right, it's not just that it's
6:23
picking up on it. Our brain interprets
6:26
this and we experience
6:28
it as a sound
6:30
we have heard. So it either matches
6:33
this perceived sound with one we've encountered
6:35
before, and then we
6:38
say, oh, I know what that is. That's someone knocking
6:40
at the door, or it
6:42
might be, holy cow, I've never heard that
6:45
sound in my life. I have no idea what it is.
6:48
If the sound is language, then our brains
6:50
have to derive the meaning from the perceived
6:53
sound. We've heard someone say
6:55
words such as you're hearing me say this. Then
6:59
our brains have to take that
7:01
collection of sounds and say, what does that actually
7:03
mean? What is the context,
7:06
what is the intent? What is the
7:08
message here? Otherwise it would just
7:10
be you know, random noises
7:12
that I'm making with my mouth. Alright,
7:14
so we have a basic understanding behind
7:16
the physics of sound. Now to talk about speakers
7:19
and microphones and the reason I'm
7:21
going to talk about both of them is that
7:24
the devices complement one another. You can
7:26
think of one as being the other in reverse.
7:28
Plus smart speakers
7:30
we have to talk about microphones anyway, because
7:33
smart speakers have microphones as
7:35
well as the speaker element. So
7:38
you can think of this as one long process
7:40
of taking the physical phenomena of
7:42
sound waves, transforming
7:44
that physical phenomena into an electrical
7:47
signal, taking the electrical signal,
7:49
and changing it back into something that can produce
7:51
the sound waves that started the whole
7:53
thing. So you're replicating the original
7:56
sound waves with this end
7:59
device, which in this case is a loudspeaker.
8:01
So the microphone is the part of the process where
8:04
you take the sound and you turn it into an electrical
8:06
signal, and the speakers where you take the
8:08
electrical signal and you turn it back into actual
8:10
sound. That's the simple way. But what's actually
8:12
happening, Well, let's
8:14
talk about on a physical level. Sound
8:16
waves go into a microphone.
8:19
So you've got these fluctuations
8:22
and air pressure that encounter a microphone.
8:24
I'm speaking into a microphone right now,
8:27
so this is happening right now. Inside
8:29
the microphone is a very thin diaphragm,
8:32
typically made out of a very flexible
8:34
plastic, and it's sort
8:36
of like the skin of a drum. So
8:38
as the changes in air pressure encounter
8:41
the diaphragm, they cause the diaphragm
8:43
to move back and forth. Well, attached
8:46
to the diaphragm is a coil of
8:48
conductive wire, and that coil
8:50
wraps either around or near
8:52
a permanent magnet. Magnets have
8:54
magnetic fields. They have a north pole
8:57
and a south pole, and there's a magnetic field
8:59
that surrounds the magnet.
9:02
And the electromagnetic effect means
9:05
that if you move a coil
9:07
of conductive wire through
9:09
a magnetic field, it will produce
9:11
a change in voltage in that coil,
9:14
otherwise known as electromotive
9:16
force, and that means electrical current
9:19
will flow through the coil. Now,
9:21
if you have the end of that coil attached
9:24
to a wire, a conductive
9:26
wire for that current
9:28
to flow through, you can send that current
9:30
onto other components. So for our
9:33
purposes, the component in question
9:35
would be an amplifier, and I'll get
9:37
to explaining why that is in just a
9:39
moment, but first let's talk about loud
9:41
speakers, and the way a loudspeaker works
9:44
is essentially the reverse
9:46
of a microphone. You've got your permanent
9:48
magnet around or near which
9:51
is a coil of conductive wire. The
9:53
wire is connected to a diaphragm,
9:56
one much larger and typically made
9:58
out of stiffer material than the plastic
10:01
you'd find in a microphone. This
10:03
is the element inside a speaker that will
10:05
vibrate, that will push air and pull
10:08
air as it moves either
10:10
outward or inward. The electrical
10:12
signal comes from a source such as
10:15
the microphone we were just using a second
10:17
ago that comes into the loudspeaker
10:19
and it flows through the coil. Now,
10:22
when you have an electrical current flowing
10:24
through a conductive coil, you
10:27
generate a magnetic field because of
10:29
the laws of electromagnetism. You've
10:31
got the electromagnetic
10:34
field generated as a result. Now
10:36
that field will interact with the magnetic
10:38
field of the permanent magnet. The
10:41
permanent magnet always has a magnetic field.
10:43
The coil only has one when electric
10:45
current is flowing through it. And
10:47
as I said, magnets have a north
10:50
pole and a south pole. And we also know
10:52
that when we bring two magnets with
10:54
their north poles together, they'll
10:56
push against each other, right because like
10:59
repels like, But if
11:01
we turn one of those magnets around so that
11:03
now it's a south pole and a north pole,
11:06
they attract one another, you
11:08
know, opposites attract. So
11:11
by having this magnetic
11:14
field being generated by the coil,
11:17
it starts to generate
11:20
interactions with the magnetic field of the permanent
11:23
magnet, so they
11:25
start to push and pull against each other. Well,
11:28
the coil is attached to that diaphragm,
11:30
so it in turn drives the diaphragm
11:33
to either push outward or pull inward.
11:36
That causes air molecules
11:38
to vibrate, just as it would
11:41
with any other you know, source of sound,
11:43
and it emanates outward from the loudspeaker,
11:46
so you get a representation
11:48
of the same sound that was going into
11:51
the microphone got converted
11:53
into an electrical current. The electrical
11:55
current then was passed
11:58
through a coil and next to a permanent
12:00
magnet to create the same sort
12:02
of movement. It replicates the movement of the
12:04
original diaphragm in the microphone and
12:07
generates the sound. So
12:09
you get the replication of the sound
12:11
that was made in the other location. It's
12:14
pretty cool. I think now I did
12:16
mention earlier that you would need
12:18
an amplifier. And the reason you need
12:20
an amplifier is that the electrical signal
12:22
generated by a microphone is
12:24
far too weak to drive a loud
12:27
speaker's diaphragm. You just wouldn't
12:29
have the juice to do it. It would be
12:32
much, much less powerful
12:34
than what the speaker would need. So chances
12:36
are the diaphragm would either not move at
12:39
all because it would just be too stiff, it would resist
12:41
the movement too much, or it would move
12:44
so weakly as to generate little
12:46
to no sound, so it wouldn't do you any good. So
12:49
the signal from the microphone has to
12:51
first pass through an amplifier, which, as the
12:53
name implies, takes an incoming signal
12:55
and increases the amplitude of
12:57
that signal, the volume in other words.
13:00
So, it doesn't affect pitch, but it does
13:02
affect the signal strength and consequently the
13:04
volume. And I've done episodes
13:06
about amplifiers, including explaining
13:09
the difference between amplifiers that use vacuum
13:11
tubes and ones that use transistors,
13:14
so I'm not going to go into that here.
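For what it's worth, that microphone-to-amplifier-to-loudspeaker chain can be sketched very roughly in Python. The sample values below are made up; the point is only that the amplifier scales amplitude (volume) without touching frequency (pitch):

    def amplify(samples, gain):
        # Scale every sample by the same factor. The spacing in time between
        # samples is untouched, so the pitch stays the same; only the amplitude,
        # and therefore the volume, changes.
        return [s * gain for s in samples]

    weak_mic_signal = [0.001, 0.002, -0.001, -0.002]  # hypothetical tiny voltages
    speaker_ready_signal = amplify(weak_mic_signal, gain=1000)
    print(speaker_ready_signal)  # same wave shape, a thousand times stronger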
13:16
Besides, it doesn't really factor
13:18
into our conversation about smart speakers
13:20
anyway. It's just important for
13:23
it to work with a microphone and speaker
13:25
setting. Now, over the years, engineers
13:27
have paired microphones and speakers in lots
13:30
of stuff. You've got telephones, you've
13:32
got intercom systems, public address
13:34
systems, handheld radios, all sorts of
13:36
things, so that technology was
13:38
well and truly mature. Before
13:40
we ever got our first smart speaker,
13:43
there wasn't much call to incorporate
13:45
microphones into home speaker systems
13:48
for many years. I mean, what would
13:50
you actually use a microphone embedded
13:52
in a speaker for? Before smart speakers,
13:54
Typically you would have your speakers
13:56
like I'm talking about, like like sound system speakers.
13:59
You would have them hooked up to some other dumb
14:02
as in, not connected to a network
14:04
technology. So it might be a sound
14:06
system or home entertainment set
14:08
up with a television as the focal point, or maybe
14:11
even you know, a computer for the purposes
14:13
of playing more dynamic sounds for like video
14:15
games and things like that.
14:18
But for a very long time,
14:21
these were all thought of as one way communications
14:23
applications, right, Like, the sound was
14:25
coming from a source and it would get to us
14:27
through the speakers, but we weren't meant to send
14:31
sound back through those same channels.
14:33
The information was just coming to you. You weren't
14:35
sending anything back, But that would all change
14:38
in time. Now. One thing to keep in
14:40
mind about smart speakers is that
14:42
they are the product of several different technologies
14:45
and lines of innovation and development that all
14:47
converged together. The microphone
14:49
and speaker technology is one of the oldest
14:52
ones that we can point to as far as the
14:54
fundamental underlying technology
14:56
is concerned, the stuff that's been
14:58
around since the late nineteenth century.
15:00
Now there is one other we'll talk about that's
15:02
even older. But I don't
15:04
want to spoil things. I'll just mention there
15:07
is an even older line of
15:09
development that goes into smart speakers
15:12
than the microphone speaker stuff of
15:14
the nineteenth century. Most of the
15:16
other components, however, are much younger
15:18
than that. One big one is
15:21
speech or voice recognition. Creating
15:24
computer systems that could detect noise
15:26
was relatively simple. Right. You could have a computer
15:29
connected to microphones and they
15:31
could monitor the input from those
15:33
microphones and any incoming
15:35
signal could be registered. Right, they could
15:37
record an incoming signal that
15:39
would indicate the microphone had detected a
15:41
noise. That's child's play. That's
15:43
easy to do. But teaching computers
15:46
how to analyze those signals and decipher
15:48
them so that the computer could display
15:50
in text or otherwise act
15:53
upon that sound
15:55
in a meaningful way that was much
15:57
more difficult. There was
16:00
an IBM engineer named William
16:02
C. Dersch of the Advanced Systems
16:04
Development Division who created an early
16:07
implementation of voice recognition. It
16:09
was a very limited application, but it
16:11
proved that the ability to interact
16:13
with computers by voice was more
16:15
than just science fiction. Within
16:18
IBM. It was called the Shoebox.
16:21
Dersch worked on this project in
16:23
the early nineteen sixties and what he
16:25
produced was a machine that had a
16:27
microphone attached to it. The
16:29
machine could detect sixteen spoken
16:32
words, which included the digits
16:34
of zero to nine plus
16:37
some command indicators like plus
16:39
minus, total, subtotal.
16:42
You get the idea. So you could speak a
16:44
string of numbers and then commands
16:46
to this device, then ask it to total
16:48
everything and it would do so. So it was
16:51
more or less a basic calculator
16:53
with some voice interpretation incorporated
16:56
into it. Now there's a
16:58
great newsreel piece about this
17:01
shoebox. There's a demonstration of it, and
17:03
it came out in the early nineteen sixties, and
17:05
I love that newsreel because it has
17:08
that great music you would hear in the background of
17:10
those old industrial and business films.
17:12
Anyway, there's also a helpful chart
17:14
that hangs in the background of
17:17
that video where Dersch is
17:19
actually explaining how it works. You
17:21
can see a little bit behind him
17:24
what is actually being analyzed,
17:27
and he broke the words down
17:29
into phonemes and syllables, so
17:31
phonemes being specific sounds
17:34
that make up words. So,
17:36
for example, the digit one is
17:39
a single syllable word with a vowel
17:41
sound right at the front. But you also
17:43
have the word eight that's
17:46
another single syllable word as
17:48
a vowel sound right at the front, but
17:50
it's different from one phonetically
17:53
in that eight also has a
17:55
plosive and has that hard t at
17:57
the end. So the shoebox
17:59
was limited not just in what
18:02
words it could recognize, but also the
18:05
types of voices it could recognize.
18:07
Get someone who has a different dialect or
18:09
manner of speech, and the machine might not be able to
18:12
understand them because they're not pronouncing
18:14
the words the same way that Dersch did.
18:17
This would be a big challenge in
18:20
speech recognition moving forward, and
18:22
it's also an example of where we
18:24
find bias creeping into technology.
18:27
And it's not necessarily a conscious thing,
18:30
but if you have people designing
18:32
a system and they're designing it based
18:34
off their own,
18:36
you know, speech patterns, their own
18:39
pronunciations, their own dialects,
18:41
then it may be that the system
18:44
they create works really well for them
18:46
and less well for anyone who isn't
18:48
them, And the further away you are from
18:50
their manner of speaking, the
18:53
more frustration you will encounter
18:56
as you try to interact with that technology.
18:59
That's an example of bias, and in
19:01
fact, if you read the histories
19:03
of speech recognition and as we'll get
19:05
too later natural language processing,
19:08
you'll see a lot of people say it works
19:10
great if you happen to be a white
19:12
man, because the
19:15
manner of speech was being or
19:17
the people who were designing it were primarily
19:19
white men who were
19:22
typically aiming for what
19:25
is considered a non accented
19:27
American dialect somewhere
19:30
in you know, the Eastern seaboard
19:32
side. But that meant
19:34
that if you did have an accent or a dialect,
19:37
or you had a different vernacular,
19:40
that it was harder for the systems
19:42
to actually understand what you were saying. That's
19:44
an example of bias. Well, the
19:46
general strategy was again to break up
19:49
speech into constituent sound units, you
19:51
know, those phonemes, and then to suss out
19:53
which words were being spoken
19:55
based on those phonemes, and
19:57
that was done by digitizing the voice,
20:00
transforming it from sound into data
20:02
that represented stuff like the sounds
20:04
frequency or pitch, and then
20:07
matching up specific signal
20:09
signatures with specific phonemes.
20:11
So generally the idea was that the computer system would
20:13
monitor incoming sound, convert the
20:15
sound into digital data, compare that
20:18
data it had received with information
20:20
stored in a database, in an effort to
20:22
look for matches. The Shoebox
20:25
database was just sixteen words in size.
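To picture that database-matching approach, here is a toy sketch in Python. It is not IBM's actual implementation; the signature strings stand in for whatever digitized features the real hardware pulled out of the microphone signal:

    SHOEBOX_VOCABULARY = {
        "sig_zero": "zero",
        "sig_one": "one",
        # ...and so on for the rest of the sixteen words...
        "sig_plus": "plus",
        "sig_minus": "minus",
        "sig_total": "total",
        "sig_subtotal": "subtotal",
    }

    def recognize(signature):
        # Return the stored word whose signature matches, or None if the sound
        # isn't one of the sixteen entries in the database.
        return SHOEBOX_VOCABULARY.get(signature)

    print(recognize("sig_total"))  # -> "total"
    print(recognize("sig_hello"))  # -> None, outside the vocabulary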
20:27
Later ones would be much larger, but pretty quickly
20:30
people realized this was
20:32
not an efficient way of doing
20:34
speech recognition because the bigger
20:36
the vocabulary, the more work intensive
20:38
it was to build out those databases.
20:41
So it wasn't something that people thought would
20:43
be sustainable for very
20:45
large vocabularies. But the Shoebox
20:48
marked the beginning of a serious effort to create machines
20:50
that could accept audio cues as actual input,
20:52
and as we'll see, that's one important
20:55
component for these smart speaker systems.
20:57
I've got a lot more to say, but before I get into
20:59
the next part, let's take a quick break.
21:09
Now, obviously we didn't jump right
21:11
into full voice recognition right after
21:13
IBM's Shoebox innovation. The
21:16
challenges related to building automated
21:18
speech recognition systems were numerous,
21:20
even for just a single language,
21:23
because, as I said, you can have accents and
21:25
dialects. One voice can have a
21:27
very different tonal quality from another,
21:30
people speak at different speeds. Teaching
21:32
machines how to recognize speech when the phonemes
21:34
and pacing of that speech aren't
21:37
consistent from speaker to speaker, that's
21:40
really hard. This kind of gets back to
21:42
the same sort of challenges you have when you're teaching
21:44
machines how to recognize images. You
21:47
know, you teach a human what a
21:50
coffee mug is. I always use this example,
21:52
but you teach a human what a coffee mug is, and
21:54
pretty soon they can extrapolate from
21:56
that example and understand
21:58
that coffee mugs can come in all different sizes
22:02
and colors, and you know different
22:04
designs and textures. We
22:07
get it. Like, you see a couple of coffee mugs,
22:09
you understand. Machines, though,
22:12
they aren't able to do that. Machines,
22:14
you know, you have to give them lots and lots
22:17
and lots of different examples before they can
22:19
start to pick up on what things
22:22
actually make a coffee mug. Same
22:24
sort of thing with speech, right, So
22:27
if you don't have consistency between
22:29
speakers, it makes it very hard for machines
22:31
to learn what people are saying. Now,
22:33
it didn't take long for the tech industry at
22:35
large to really dive into trying to solve
22:38
this problem. In the early nineteen seventies, DARPA,
22:41
that's the Research and Development division of
22:43
the United States Department of Defense,
22:45
got behind speech recognition in a big
22:47
way. Now, remember, DARPA itself
22:50
doesn't do research. The organization's
22:53
purpose is to invite organizations
22:55
to pitch projects that align
22:57
with whatever DARPA's goals are,
23:00
and DARPA would provide funding to the
23:02
winning organizations to see
23:05
these projects to completion if possible.
23:07
So DARPA is really more of a vetting and funding
23:10
organization. Anyway, in
23:12
nineteen seventy-one, DARPA created
23:14
a five year program called Speech
23:16
Understanding Research, or S U
23:18
R. The initial goal was
23:20
pretty darn ambitious considering the capabilities
23:23
of the technology at the time. The project
23:25
director, Larry Roberts, wanted
23:27
a system that would be capable of recognizing
23:29
a vocabulary of ten thousand words
23:32
with less than ten percent error. After
23:34
holding a few meetings with some of the leading
23:37
computer engineers of the day, Roberts
23:39
adjusted that goal significantly.
23:42
After that adjustment, the target was going
23:44
to be a system capable of recognizing
23:47
one thousand words, not ten
23:49
thousand. The error levels
23:51
still had to be less than ten percent, and
23:54
the goal was for the system to be able to accept
23:56
continuous speech, as opposed
23:58
to very deliberate
24:01
speech with pauses
24:05
between each pair of words that would
24:07
not be really that useful.
24:11
One person who was skeptical about
24:13
the potential success of this project
24:15
was John R. Pierce of Bell
24:17
Labs. He argued that any
24:19
success would be limited so long
24:21
as machines remained incapable of understanding
24:24
the words, not just recognizing
24:27
a word based on phonemes, but understanding
24:29
what the word is. That is, Pierce felt that
24:31
the machines needed some way to parse the
24:33
language to get to the meaning of what was
24:35
being said. That's an important
24:37
idea that we will come back to in just a bit now.
24:40
Among the companies and organizations that landed
24:42
contracts with DARPA were Carnegie
24:44
Mellon University, BBN, which
24:46
actually played a big part in developing ARPANET,
24:49
the predecessor to the Internet, Lincoln
24:51
Laboratory, and several more and
24:54
very smart people began to create systems
24:56
intended to recognize speech in meaningful
24:58
ways. The names of the programs
25:00
were a lot of fun. There was H W I
25:03
M, which stood for hear what
25:05
I mean, as in hear, as in listen,
25:07
hear what I mean. That one was from BBN.
25:10
CMU introduced hearsay,
25:12
which was later designated as Hearsay one,
25:15
and then they came out with Hearsay two. They
25:17
also would demonstrate another
25:20
one called Harpy.
25:22
Oh, and there was a professor at CMU named Dr
25:24
James Baker who would design a system
25:27
called Dragon in nineteen seventy
25:29
five that he would later leverage into a company
25:31
with his wife, Dr Janet M. Baker
25:34
in the nineteen eighties, and they had a very successful
25:36
business with speech recognition software.
25:39
Now, I'm not going to go into each of those programs
25:42
in deep detail, but rather just mention
25:44
that they all helped advance the cause of
25:46
creating systems that can recognize speech.
25:49
One of the big developments that came out of all
25:51
that work was a shift to probabilistic
25:54
models, which would also play a really
25:56
important part in another phase of
25:58
developing the smart speaker. So what do
26:00
I mean when I say probabilistic? Well,
26:02
as the name indicates, it all has
26:04
to do with probabilities. Essentially,
26:07
systems would analyze incoming phonemes
26:09
and make guesses as to what
26:11
was being said based on the probability
26:14
of it being a given word or part
26:16
of a word. The systems typically
26:18
go with whatever word has the highest
26:20
probability of being the correct one.
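As a toy illustration of that probabilistic idea (the scores below are invented, not from any real recognizer), the system scores each candidate word against the phonemes it just heard and simply takes the highest one:

    def best_guess(candidate_probabilities):
        # Pick whichever candidate word has the highest probability of being
        # what the speaker actually said.
        return max(candidate_probabilities, key=candidate_probabilities.get)

    heard = {"one": 0.62, "won": 0.30, "when": 0.08}  # hypothetical scores
    print(best_guess(heard))  # -> "one"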
26:23
Even with that approach, there are nuances
26:26
to language that are difficult to account for with
26:28
a machine. So, for example, you have homonyms
26:30
in which you have two words that sound the same
26:33
but have very different meanings and potentially
26:35
spellings, like write as
26:37
in to write a sentence, or
26:40
right as in am I right? Or
26:42
am I wrong? Or you could have a pair
26:44
of words that sound like
26:46
a single word and have confusion
26:48
there, such as a door. You
26:51
can say a door, meaning a
26:53
single door, a door to go into
26:55
a building, or you might say adore, as
26:57
in I adore this podcast
26:59
you're doing, Jonathan. That's sweet of you, Thank
27:02
you for saying that. So computer
27:04
scientists were hard at work advancing
27:07
both the capability of machines to make
27:09
correct guesses at individual phonemes
27:12
and then full words, as well as
27:14
figuring out a way to teach machines to adjust
27:16
guesses based on context. That
27:19
requires a deeper understanding of the language
27:22
within which you're working. If you're aware
27:24
of certain idioms, you can make a
27:26
good guess at a word or phrase even if
27:28
you didn't get a clean pass at
27:31
it right. So, for example, the
27:33
phrase it's raining cats and dogs
27:35
just means it's raining a lot. And if
27:37
a system included a database that indicated
27:40
the phrase cats and dogs sometimes
27:42
follows the phrase it's raining, then
27:45
the system is more likely to guess the correct
27:48
sequence of words instead
27:50
of guessing something that sounded similar
27:52
but it's wrong. For example, if it
27:54
said, oh, they must have said it's raining bats
27:57
and hogs, that
27:59
would not make sense. So
28:02
the systems estimate the probability
28:04
that any given sequence of sounds within
28:07
the database matches what the systems
28:09
have just quote unquote heard. Progress
28:11
in this area was steady, but slow, and
28:14
I'd argue that it was also a reminder that concepts
28:16
like Moore's law do not apply universally
28:19
across technology. Rapid development
28:21
in one particular domain of technology is
28:24
not necessarily an indicator that the same sort
28:26
of progress will be observed in all other
28:28
areas of tech. We often
28:32
get into the mistaken habit of
28:34
believing that Moore's law applies to everything. Alright.
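And as a deliberately toy sketch of that context idea from the raining-cats-and-dogs example (the numbers and the tiny phrase table are made up; real systems use much richer language models), a candidate phrase gets a boost when it commonly follows what came before it:

    FOLLOWS_ITS_RAINING = {"cats and dogs": 0.9, "bats and hogs": 0.01}

    def rescore(acoustic_scores, previous_phrase):
        # Combine how the phrase sounded with how likely it is given the context,
        # then pick the combination with the highest score.
        combined = {}
        for phrase, sound_score in acoustic_scores.items():
            if previous_phrase == "it's raining":
                context_score = FOLLOWS_ITS_RAINING.get(phrase, 0.001)
            else:
                context_score = 1.0
            combined[phrase] = sound_score * context_score
        return max(combined, key=combined.get)

    sounds_like = {"cats and dogs": 0.48, "bats and hogs": 0.52}  # hypothetical
    print(rescore(sounds_like, "it's raining"))  # -> "cats and dogs"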
28:37
So a related concept to
28:39
voice recognition is something called natural
28:42
language processing, and this relates
28:44
back to how we humans tend to process information
28:47
compared to the way machines tend to do
28:49
it. So we humans formulate ideas,
28:51
we shape those ideas into words and sentences.
28:54
We communicate them in some way to other
28:56
people through that language. It
28:59
may be through speech, it may be through text.
29:01
It may even be through a nonverbal or non
29:04
literary way, but we communicate
29:07
those ideas. Machines typically accept
29:09
input, they perform some process
29:12
or sequence of processes on that input,
29:15
and then they supply an output of some
29:17
sort. Machines do this in machine
29:19
language. That's a code that's far too
29:22
difficult for humans to process easily.
29:24
Binary is an example of machine
29:26
language. Binary is represented as zeros
29:29
and ones, which when grouped together can
29:31
represent all sorts of stuff. But if you just
29:33
looked at a big block of zeros
29:35
and ones, it would mean nothing to you. It's
29:38
not easy for humans to use, and then
29:40
machines in turn are not natively
29:42
able to understand human language, so there's
29:44
a language barrier there. Because
29:47
of that, people created different
29:49
programming languages. These languages
29:52
provide layers of abstraction from
29:54
the machine language. They make it easier
29:56
to create programs or directions
29:59
that the computer should follow. So the
30:01
person who's doing the programming is using
30:03
a programming language that's easy for humans
30:05
to use that then gets converted
30:07
into machine language that the computers
30:10
understand. But what if you could send
30:12
commands to a computer using natural
30:14
language, not even programming language.
30:16
You could just speak in plain
30:19
vernacular, whether it's English or
30:21
any other language, the way humans
30:24
communicate with one another. What if a
30:26
computer could extract meaning from
30:28
a sentence, understand what it
30:30
was you wanted the computer to do, and
30:32
then respond appropriately. So imagine
30:34
how much time you could save if you could just tell your
30:36
computer what you wanted it to do, and
30:38
it took care of the rest. If
30:40
you had a powerful enough computer system
30:43
with strong enough AI,
30:45
maybe you could even potentially do something like describe
30:48
a game that you would love to be able
30:50
to play, like, not a game that
30:52
exists, a game in your head, and
30:55
you could describe it to a computer and the computer
30:57
could actually program that game. Well,
30:59
we're definitely not anywhere close to that
31:02
yet, but we made enormous progress with natural
31:04
language processing. Now, the history
31:06
of natural language processing isn't
31:08
exactly an extension of voice
31:11
recognition. It's actually more like a parallel
31:13
line of investigation. And
31:16
that's because natural language processing doesn't
31:18
require voice recognition. You
31:20
can have an implementation in which you just
31:23
write commands in natural language,
31:25
you know, you type them out on a keyboard and
31:27
the machine then carries out those
31:29
instructions. So much of the
31:31
early work in natural language processing was in
31:33
text based communication rather
31:35
than in speech. The history of natural
31:38
language processing includes stuff like
31:40
the Turing test, named after Alan
31:42
Turing. So the most common interpretation
31:44
of the Turing test these days is
31:47
that you've got a scenario in which a person is
31:49
alone in a room with a computer terminal,
31:51
they can type whatever they like into the computer
31:54
terminal, and someone or something
31:56
is responding to them in real time. Now
31:59
it might be another person, or it might
32:01
be a computer system that's responding
32:03
to that person. You run
32:06
a whole bunch of test
32:08
subjects through this process,
32:10
and if the computer system is able to fool a
32:12
certain percentage of those test subjects,
32:15
like say thirty percent of them,
32:17
that it is in fact another human and
32:19
not a computer, it is said to have
32:21
passed the Turing test, And
32:24
typically we use that to mean the machine has given
32:26
off the appearance of possessing intelligence
32:29
similar to the one that we humans
32:31
possess. That gets beyond
32:33
our scope for this episode, but
32:35
it helps point out that stuff like speech recognition
32:38
and natural language processing are
32:40
both closely related to the field of artificial
32:42
intelligence. In fact, they really belong within
32:45
the artificial intelligence domain. The
32:47
Turing test was more of a hypothetical.
32:50
It was a bit of a cheeky way of saying, Hey,
32:53
if you can't tell whether or not something is
32:55
intelligent, it makes sense to treat
32:57
it as if it actually is intelligent.
33:00
After all, we assume that every human
33:02
with whom we interact possesses
33:04
some level of intelligence. Based on those
33:06
interactions, so why should we not
33:08
extend the same courtesy to machines. Now,
33:12
natural language processing would prove to be
33:14
another super challenging problem to solve.
33:17
In computer science. Early work was
33:19
done in translation algorithms,
33:21
and these were programs that attempted to take phrases
33:23
written in one language and translate those
33:25
automatically into a second
33:27
language. At first, that seemed pretty straightforward,
33:30
but you realize that's also pretty
33:32
tricky. Really. For one thing, you
33:35
can't just translate word for word and
33:37
keep the same order from one language
33:39
to another. The syntax or
33:41
the rules that the language follow
33:44
they could be different from language to language.
33:47
In one language, you might use an infinitive
33:49
such as to record, in the middle
33:51
of a sentence, while another language might
33:53
put all the infinitives at the end of a sentence.
33:56
So in one language, I might say
33:58
I'm going to record a podcast in the
34:00
studio right now, but in another
34:02
language it might come out as I'm going a
34:04
podcast in the studio right now to record.
34:07
It starts to sound like yoda. There
34:10
was initial excitement around machine
34:12
translation, but once computer scientists
34:14
and linguists began to see the scope
34:16
of this challenge, their excitement
34:19
faded a bit. Also, there was a lot
34:21
of other stuff going on in the nineteen sixties
34:23
and seventies that was demanding a lot of attention,
34:26
such as the Space race. So for
34:28
a while, this branch of computer science
34:30
was given less attention than
34:32
other branches, and by less attention, I
34:35
really mean funding. Now, when
34:37
we come back, we'll talk a bit more about
34:39
the advances that were necessary to support natural
34:41
language processing, and we'll move on to how
34:44
this would be another important component in
34:46
smart speakers. But first, let's take
34:48
another quick break. Okay,
34:57
So early enthusiasm for
34:59
natural language processing created
35:02
a bit of a hype cycle that ultimately
35:04
crashed into the telephone pole
35:06
of unmet expectations. That
35:10
was a really bad metaphor. Anyway,
35:13
natural language processing went
35:15
through something similar to what we saw
35:17
with virtual reality in the nineteen nineties.
35:19
You know, people saw what was actually
35:22
achievable, and then
35:24
they compared that to what they thought they
35:26
were going to get, and those two things
35:28
didn't match up at all, and that really
35:30
pulled the rug out of funding for natural
35:33
language processing, which meant of course,
35:35
that progress slowed way down.
35:37
It kept going, but it
35:40
was definitely on the back burner for a lot
35:42
of projects. When interest renewed
35:44
in the nineteen eighties, there had been
35:46
a shift in thinking around natural
35:49
language processing. Computer scientists
35:51
were starting to look at statistical approaches
35:53
similar to what was going on with speech
35:55
recognition, building up probabilistic
35:58
models in which a computer can start making
36:00
what amounts to educated guesses
36:03
at the meaning of a command or a
36:05
phrase. Machine learning became
36:07
an important component on the back
36:09
end of these systems, and later artificial
36:12
neural networks became an important part
36:14
as well. A neural network processes
36:17
information in a way that's sort of analogous
36:19
to how our brains do it. You
36:21
have nodes or neurons
36:23
that connect to other nodes, and
36:26
each node affects incoming data
36:28
in a certain way, performing some sort of operation
36:31
on it, and the degree to which they
36:33
do that in one way versus another
36:35
is called the weight of that
36:37
node. Computer scientists
36:40
apply weights across the nodes
36:42
in an effort to get a specific result
36:44
in order to train these models.
36:46
So you might feed a specific command
36:49
into such a system, and you let
36:51
it go through the computational process
36:53
from the beginning of the neural network through
36:55
to the end, and then you look at the result,
36:58
and if the result is correct, well,
37:00
that just means the system is already working as
37:02
you intended it, which honestly
37:04
is not likely to happen early on.
37:07
But if it's not correct, then you
37:09
start adjusting the weights on those
37:11
nodes in order to affect
37:13
the outcome. I almost think of it as like Plinko
37:16
or pachinko, where you've got the little coin
37:18
and you drop it down and it bounces on
37:20
all the pegs and sometimes
37:23
you might think, all right, well, this time it's going to
37:25
go right for that center slot, but it
37:27
doesn't, and you think, well, maybe if I remove
37:29
some of these pegs or I shift
37:31
these pegs over a little bit, I can drop
37:33
it in that same spot and get it to hit the center.
37:36
It's kind of like that, except you're talking about data,
37:38
not physical moving parts. So
37:41
you have to do this a lot, like up
37:43
to like millions of times
37:46
in order to try and train a system so
37:49
that it responds appropriately to commands.
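Here is a bare-bones sketch of that adjust-the-weights loop, shrunk down to a single made-up node with one weight. Real networks have many nodes and layers and use methods like backpropagation, but the training loop has the same general shape:

    def train(examples, weight=0.0, learning_rate=0.1, passes=1000):
        # Nudge the weight a little at a time so that input * weight lands
        # closer and closer to the target output.
        for _ in range(passes):
            for x, target in examples:
                output = x * weight
                error = target - output                # how far off was the guess?
                weight += learning_rate * error * x    # adjust toward a better answer
        return weight

    # Hypothetical training data: we want the node to learn "output = 2 * input".
    examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    print(round(train(examples), 3))  # ends up very close to 2.0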
37:51
And once it's trained, you can then test
37:53
new commands on the system to see if it can
37:56
parse them and respond appropriately. And
37:58
in this way, the system quote unquote learns
38:01
over time how to respond
38:03
to commands. And then we have another
38:05
component that's important with smart speakers,
38:08
and that's speech generation. So
38:10
it's one thing to have a machine either broadcast
38:13
or play back a recording
38:15
of speech. It's another thing for a
38:17
machine to generate brand
38:19
new speech. In computer science,
38:22
we call it speech synthesis. Now,
38:24
this is the really old technology
38:26
I was alluding to at the beginning of this episode,
38:29
speech synthesis. If you
38:31
want to be really, you know, kind of technical
38:33
about it, it actually predates every
38:36
other technology I've mentioned up to this
38:38
point, at least in its most
38:40
rudimentary implementations. You
38:42
have to go way back to the eighteenth
38:44
century the seventeen seventies, as
38:47
when a Russian smarty pants named Christian
38:49
Kratzenstein was building
38:51
a device that used acoustic resonators.
38:54
These reeds would vibrate,
38:57
and it was in an attempt to replicate
38:59
basic vowel sounds. Now,
39:01
even with such a working device, it would
39:04
be really difficult to communicate anything meaningful
39:06
unless you were, I guess, speaking
39:08
whale like Dory in Finding Nemo.
39:10
But it would be an early example of how people tried
39:13
to create mechanical systems that could
39:15
replicate speech or elements of
39:17
speech. Another inventor named
39:19
Wolfgang von Kempelen built
39:22
an acoustic mechanical speech machine
39:25
and that used reeds and
39:27
tubes and a pressure chamber, and
39:29
it was all meant to replicate various
39:31
speech sounds. He had other elements to
39:33
create sounds like plosives, those
39:36
hard sounds that I mentioned
39:38
earlier in the episode. So
39:40
he had all these different elements that, working
39:42
together, could create parts
39:45
of the sounds that we humans
39:47
make when we speak. He also built
39:49
a supposed chess playing machine, and
39:52
it turned out that the chess playing part was a hoax.
39:54
So unfortunately, because
39:57
that device was a hoax, a lot of people
39:59
dismissed his other work, which
40:02
was legitimate. So by
40:05
fudging on one thing, he kind of cast
40:08
doubt on everything he had ever done. Skipping
40:10
ahead quite a bit, we
40:13
get to Homer Dudley, which
40:15
is a fantastic name. He
40:17
unveiled the Voder, or Voice
40:20
Operating Demonstrator device
40:22
at the New York World's Fair in nineteen
40:24
thirty nine. It consisted of
40:26
a complex series of controls and
40:29
it sort of reminds me of something like a
40:31
musical instrument, kind of like a synthesizer,
40:34
but with extra controlling units.
40:36
Like there was like a wrist element, there was
40:38
a pedal. There's a lot of stuff that
40:41
made it very complex, and
40:44
with a lot of practice, you could
40:46
create specific sounds from this
40:48
synthesizer. You could even create
40:50
words or full sentences, though from
40:52
what I understand, it was incredibly
40:55
challenging to do. It was a very high learning
40:57
curve, but it demonstrated the possibility
40:59
of electronic synthesized speech. Now.
41:02
There was a lot of work done
41:04
in this field by
41:07
lots of different talented scientists
41:10
and engineers, and someday I'll
41:12
have to do a full episode on the history
41:14
of speech synthesis. It's really fascinating,
41:17
but it's far too big a topic to cover
41:19
in its entirety in this episode. By
41:21
the late nineteen sixties we had our first
41:24
text to speech system,
41:26
and by the late nineteen seventies and early
41:28
nineteen eighties, the state of
41:30
the art had progressed quite a bit and we
41:32
were starting to get to a point where we could create
41:35
very understandable computer
41:37
voices. They weren't natural, they
41:39
didn't sound like people, but you could understand
41:42
what they were saying. And finally, something
41:45
else that would enable smart speakers and virtual
41:47
assistants was the pairing of improved
41:50
network connectivity and cloud computing.
41:52
That removes the need for the device that
41:54
you're interacting with to do all the
41:56
processing on its own. So,
41:59
if you think about the history of computing,
42:01
we used to have mainframes with dumb
42:03
terminals attached to the mainframe, so
42:05
the terminal wasn't doing any computing. It
42:07
was just tapping into the mainframe computer,
42:10
which was sending results back to the terminal.
42:12
Then you get to the era of personal computers,
42:15
where you had a device sitting on
42:17
your desk that did all the computing and
42:19
it didn't connect to anything else. Then
42:22
we get up to networking and the
42:24
Internet, where we suddenly
42:26
had the capability of having really powerful
42:28
computers or grids of computers
42:31
that were able to take on processing
42:33
power, and you just send
42:35
the request out to the Internet
42:38
and you get the response back. That's the basis
42:40
of cloud computing. So
42:43
your command or message or whatever
42:46
relays back to servers on the cloud
42:49
that then process it and send the proper
42:51
response to whatever device you're
42:53
interacting with, and then you get the
42:55
result. So with the case of the smart speaker,
42:58
it might be playing a specific song
43:00
or giving you a weather report or whatever it might
43:02
be. Now, if the speakers were
43:04
doing some of that computation themselves, that
43:06
would be an example of edge
43:09
computing, where the processing
43:11
takes place at least in part, at the
43:13
edge of a network at those end points.
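A rough sketch of that split, with an entirely made-up cloud endpoint and local command list: a cloud-style device has to ship the request over the network, while an edge-style device can answer some things without any connection at all:

    import json
    import urllib.request

    CLOUD_ENDPOINT = "https://example.com/assistant"        # hypothetical service URL
    LOCAL_COMMANDS = {"stop": "Okay, stopping playback."}   # handled on the device

    def handle(request_text):
        if request_text in LOCAL_COMMANDS:
            # Edge computing: the device answers this one itself.
            return LOCAL_COMMANDS[request_text]
        # Cloud computing: everything else goes over the network, which is why a
        # persistent Internet connection matters.
        payload = json.dumps({"text": request_text}).encode("utf-8")
        request = urllib.request.Request(CLOUD_ENDPOINT, data=payload,
                                         headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")

    print(handle("stop"))  # answered locally, no network needed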
43:16
But for now, most of the implementations
43:18
we see send data back to the cloud
43:21
to get the right response, so you have to have a persistent
43:23
Internet connection. These devices are
43:25
not useful without that connection.
43:27
You do have some smart speakers that can
43:29
connect to another device
43:32
like a smartphone via Bluetooth, so
43:34
you could do things that way, but
43:37
without those connections, the smart speaker
43:39
turns into, you know, just a dumb
43:41
speaker, or sometimes just a paperweight. Now,
43:44
this collection of technologies and disciplines
43:46
are what enabled Apple to introduce
43:49
Siri in two thousand and eleven, and
43:52
Siri is a virtual assistant.
43:54
Siri's origins actually trace back to the
43:56
Stanford Research Institute and
43:58
a group of guys, Gruber, Adam Cheyer,
44:01
and Dag Kittlaus, who
44:03
had been working on the concept since the nineteen
44:05
nineties, and when Apple launched
44:08
the iPhone in two thousand seven, they saw
44:10
the iPhone as a potential platform for
44:12
this virtual assistant that they had
44:14
been building, and they thought, well,
44:16
this is perfect because the iPhone has a microphone,
44:19
so the assistant can respond to voice
44:21
commands, and a speaker, so it could communicate
44:24
back to the user, it could do all sorts of stuff.
44:26
We can tap into the interoperability
44:29
of apps on the device. It's a perfect
44:32
platform for us to deploy this. So
44:34
they developed an app once the opportunity
44:36
arose because apps were not available
44:39
for development immediately when Apple
44:41
launched the iPhone, and
44:44
once they did launch that app,
44:47
within a month, less than a month,
44:49
Steve Jobs was on the phone calling them up and
44:51
offering to buy the technology, which of
44:53
course they would agree to and it would become an
44:55
integrated component in Apple's iPhone
44:57
line afterward. And that's
45:00
where voice assistants kind of lived
45:02
for a few years. They mostly lived on smartphones
45:05
like the iPhone. But in November
45:07
two thousand fourteen, Amazon introduced
45:10
the Amazon Echo smart speaker,
45:12
which was originally only available for Prime
45:14
members, and it had its own virtual
45:17
assistant named Alexa, and
45:19
thus the smart speaker era officially
45:21
began. Now, there are plenty of
45:23
other smart speakers that are on the market
45:26
these days. There are products from Google
45:28
like Google Home. There are
45:30
Sonos speakers that can connect to services
45:33
like Amazon's Alexa or Google's Assistant,
45:35
and we're probably going to see a ton more,
45:38
both from companies that piggyback onto services
45:40
from the big providers like Google and Amazon,
45:43
and maybe some that are trying to make a go of
45:45
it with their own branded virtual assistants
45:47
and services. Smart speakers
45:50
respond to commands after they quote unquote
45:52
hear a wake up word or phrase.
45:55
Now, I'm gonna make up a wake
45:57
up phrase right now so that I
45:59
don't set off anyone's smart speaker
46:02
or smart watch or smartphone or smart
46:04
car or whatever it might be. So this
46:07
is just a fictional example of a wake up
46:09
phrase. So let's say I
46:11
have a smart speaker and the wake up
46:13
phrase for my smart speaker happens
46:15
to be hey there, Genie. Well,
46:18
my smart speaker has a microphone, so it can
46:20
detect when I say that, but
46:23
really it's constantly detecting all
46:26
sounds in its environment.
46:29
The microphone is always active. It has to be
46:31
in order to be able to pick up on when
46:33
I say the wake up phrase. So
46:37
the microphone is always active on most smart
46:39
speakers. There are some where you can
46:41
program it so that it will only activate
46:44
if you first touch the speaker and
46:46
that wakes it up. There's some that you
46:48
can do that with, But for the most part, they're
46:50
always listening. While the
46:52
speaker can quote unquote hear everything,
46:55
it's not listening to everything.
46:57
In other words, it's not monitoring
47:00
the specific things being said. At least
47:02
that's what we've been told. And honestly, that
47:04
makes a ton of sense from an operational
47:06
standpoint. And the reason I say that is
47:09
that the sheer amount of information
47:11
that would be flooding in from all the microphones
47:14
on all the smart devices from any one
47:17
provider that happened to be deployed
47:19
all over the world, that would be an astounding
47:21
amount of data. And sifting
47:23
through all that data to find stuff that's useful
47:26
would take an enormous amount of effort and time
47:29
and and processing power. So
47:31
while you could have all the microphones
47:33
listening in all over the place, finding
47:36
out who to listen to at what time
47:38
would be a lot trickier and probably not worth
47:40
the effort it would take to pull something like that
47:43
off. So what
47:45
these speakers and other devices are actually
47:48
doing is looking for a signal
47:50
that matches the one that represents the wake
47:52
phrase. So when I say, hey,
47:55
there, Genie, the microphone
47:57
picks up my voice, which the mic
47:59
then translates into an electrical signal
48:01
which gets digitized and compared against
48:04
the digital fingerprint of the predesignated
48:06
wake up phrase. And in this
48:08
case, the two phrases match. It's
48:11
like a fingerprint matching something
48:13
that was left at a site. So that
48:16
turns the speaker into an active
48:18
listener rather than a passive one. It's
48:20
ready to accept a command or
48:22
a question and to respond to
48:24
me. But if I didn't say,
48:26
hey there, Genie, then the speaker
48:29
would remain in passive mode because
48:32
it wouldn't have a digital fingerprint
48:34
that matches the one of the wake
48:36
up phrase. Everything stays at
48:38
the local level, and none of my sweet
48:40
secret speech gets transmitted
48:42
across the internet. It's all staying right there.
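A toy picture of that wake-phrase check (the fingerprints here are just fake strings, and hey there Genie is the episode's made-up phrase; real devices compare acoustic features, not text):

    WAKE_FINGERPRINT = "fp_hey_there_genie"  # stored on the device at setup time

    def fingerprint(audio_snippet):
        # Stand-in for the on-device signal processing that summarizes a snippet.
        return "fp_" + audio_snippet.lower().replace(" ", "_").replace(",", "")

    def listen(audio_snippet):
        if fingerprint(audio_snippet) == WAKE_FINGERPRINT:
            return "active: start sending the next request to the cloud"
        return "passive: nothing leaves the device"

    print(listen("What should we have for dinner"))  # -> passive
    print(listen("Hey there genie"))                 # -> active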
48:45
At least that's what we've been told. And
48:47
again I don't have any reason to disbelieve this,
48:50
but it is something to keep in mind. You are
48:52
talking about devices that have microphones.
48:54
Of course, if you have a smartphone, you've already got one
48:56
of those or a cell phone. In general, you've
48:58
got a device with a microphone on it
49:01
near you pretty much all the time. Now,
49:04
once I do make a request with my smart
49:06
speaker, the speaker then sends that request
49:09
up to the cloud where it gets processed,
49:11
it's analyzed, and then
49:13
a proper response is returned
49:15
to me, whether that is playing
49:18
a song or giving me information I've asked
49:20
for, or maybe even interacting with some other smart
49:22
device in my home, such as adjusting
49:24
the brightness of my smart lights
49:26
in my house. Now, if the system
49:29
is not sure about whatever it was I just
49:31
said, it will probably
49:33
return an error phrase. So maybe
49:36
maybe I'm too far away from the speaker,
49:38
so it couldn't quote unquote
49:40
hear me really well. Or maybe I've
49:42
got a mouthful of peanut butter or something
49:44
as I am wont to do. Then I'm going to
49:47
get something like I'm sorry, I don't know how to do
49:49
that, or I'm sorry I didn't understand you, and
49:51
then I'd have to repeat it. Now, smart
49:53
speakers are pretty cool. However, they
49:55
do represent another piece of technology
49:58
that you have to network to
50:00
other devices, including your
50:02
own home network, and as such
50:05
that means that they represent a potential
50:08
vulnerability in a network. It
50:10
doesn't mean they're automatically vulnerable, but
50:13
it means that every time you are connecting
50:15
something to your network, then
50:18
you're creating another potential attack
50:21
vector for a hacker. Right
50:23
now, if everything is super strong,
50:26
it doesn't really effectively
50:29
change your safety in any
50:31
meaningful way. But if one of those
50:34
things that you connect to your network is less
50:36
strong than the others, you're looking at the
50:38
weakest link situation where a hacker
50:40
with the right know-how and tools could
50:42
potentially target that part of your
50:44
network to get entry into
50:46
everything else. And when you're
50:49
talking about a smart speaker, you're
50:51
talking about a device that has an active
50:54
microphone on it. So potentially,
50:56
if someone were able to compromise a smart speaker,
50:59
they would be able to listen in on anything that
51:01
was within range of that smart
51:03
speakers microphone. So that's
51:06
why you have to at least be
51:08
cognizant of that, do your
51:10
research, make sure the devices you're connecting
51:12
to your network are rated well
51:15
from a security standpoint. When
51:17
you're setting things up and you have to create
51:19
passwords, create strong passwords
51:22
that are not used anywhere else.
51:24
The harder you make things the more
51:27
likely hackers will just pass you
51:29
by, not because you're
51:31
too tough to crack. Never get
51:33
it into your head that you're too strong
51:35
to be hacked, but rather
51:38
if there's someone who's weaker, then the
51:40
hackers are going to go after that person instead.
51:43
So just don't be the weak person. Practice
51:45
really good security
51:48
behaviors, and you're more likely
51:50
to discourage attackers and
51:52
they'll go on to someone else,
51:55
especially if you're talking
51:57
about newbies who don't really know their way
51:59
around, they're just using tools that other people
52:01
have designed. They get discouraged very
52:03
quickly. They'll move on to someone else because there's always
52:06
another potential target. I'm
52:08
curious about you guys, whether or not you
52:10
have any smart speakers in your life,
52:13
and if you find them useful. I
52:15
find mine pretty useful. I
52:17
use it for a very narrow range
52:20
of things. I don't tend to use it.
52:23
I definitely don't use it to its full potential. I
52:25
know that because once in a blue
52:27
moon, I'll just try something and
52:29
I'm amazed at what happens when
52:31
I get a response. But for
52:33
the most part, I'm asking about the weather, what
52:35
I can feed my dog, whether or
52:38
not it can turn on the lights,
52:40
and that's about it. Or
52:42
occasionally playing a song.
52:45
but I'm curious what you guys are using them for. Reach
52:47
Out to me on social networks on Facebook
52:50
and I'm on Twitter, and the handle for both of those
52:52
is tech stuff h s w.
52:55
Also use those handles if you
52:57
have suggestions for future episodes. If you've got,
52:59
you know, an idea for either a company
53:01
or a technology or a theme in
53:04
tech you'd really like me to tackle, let
53:06
me know there and I'll talk to you
53:08
again really soon. Tech
53:15
Stuff is a production of I Heart Radio's How
53:17
Stuff Works. For more podcasts from I
53:19
Heart Radio, visit the I Heart Radio
53:21
app, Apple Podcasts, or wherever you
53:23
listen to your favorite shows.