How Smart Speakers Work

Released Monday, 27th January 2020

Episode Transcript

0:04

Welcome to Tech Stuff, a production of

0:06

iHeartRadio's How Stuff Works. Hey

0:12

there, and welcome to tech Stuff. I'm your

0:14

host, Jonathan Strickland. I'm an executive producer

0:16

with iHeartRadio, and I love all things tech,

0:19

and guys, stick with me. I am

0:22

fighting off a cold. You'll

0:24

be able to hear it in my voice. I have no doubt.

0:26

But you know, I wanted to get you guys

0:29

a brand new episode. So we're gonna fight

0:31

on because the show must

0:34

keep going. I

0:36

think I think this is saying, oh no, this

0:38

cold medicine is good though. All right, Anyway, I thought

0:41

that we would do an episode about

0:44

smart speakers because I

0:46

wanted to kind of start this whole episode off

0:48

with with an old man observation,

0:51

you know, get off my lawn kind of thing.

0:53

And this is from our resident old man,

0:56

old man Strickland, that meaning

0:58

me. So, when I was young, speakers

1:01

were dumb. Now I don't. I don't mean

1:03

that speakers were useless, or

1:05

that they were terrible, or that they

1:07

were incapable of replicating certain

1:09

frequencies or volumes of sound,

1:12

or that they were limited in some other

1:14

way other than they didn't quote

1:16

unquote think they didn't

1:18

connect to any sort of computational

1:21

engine in a meaningful way. You might

1:23

have a set of speakers plugged into a computer,

1:25

but that was just a one way communications tool,

1:27

right. It was just a way to provide an outlet

1:29

for sound that your computer was generating,

1:32

nothing more than that. But contrast

1:34

that with today, when we have numerous

1:36

smart speakers on the market. These speakers

1:39

act as a user interface between

1:41

us and the Internet at large, often

1:43

facilitated by a virtual assistant

1:46

of some kind. Now with these

1:48

speakers, we don't just listen to stuff

1:50

like music and podcasts and

1:53

the radio and you know, other traditional

1:55

audio content. We use them

1:57

to find out information. We might

2:00

link them to our calendars so that

2:02

we can get reminders for upcoming appointments.

2:05

We probably use them to ask about the weather

2:07

report. I use mine at home

2:09

for that all the time, or even

2:11

more often than that, if you're at my house, you'll

2:14

hear us use it to find out which foods

2:16

are safe for us to feed to our dog. My

2:18

doggie, Tibolt, absolutely loves our smart

2:21

speaker because it frequently gives us permission

2:23

to spoil him with a carrot or

2:25

a piece of banana. But how

2:27

do these smart speakers work,

2:30

How are they able to respond to

2:32

our requests? And what are their

2:34

limitations? How safe are they?

2:37

That's the sort of stuff we're gonna be looking into in

2:39

this episode of tech Stuff, and we'll start

2:41

off with the basics, which means

2:43

we have to start off with how speakers work

2:46

in general. Now, this is something

2:48

that I've covered before on tech Stuff, but

2:50

I want to go over it again from a high level

2:52

because well, I just find it fascinating

2:55

that people figured out how to harness electricity

2:58

to drive a motor so that it could

3:00

in turn cause components

3:02

to replicate a recorded or transmitted

3:05

sound. And really, motor is being too

3:07

generous, but to drive an element to

3:09

create vibrations that could replicate a

3:11

sound that was made into another component,

3:14

that whole thing just boggles my mind that

3:16

people are smart enough to figure that out. Okay,

3:19

So to understand how speakers work,

3:21

it first helps to understand how sound

3:24

itself works. Sound is a

3:26

physical phenomenon. Do

3:28

do do do? Sound is all about vibrations,

3:31

and typically we experience sound

3:33

when we pick up on changes in air pressure

3:36

that enter through our ear

3:38

canal and then affect the tympanic membrane

3:40

or ear drum. So it's

3:42

all about these changes

3:44

of air pressure, all

3:46

about air molecules transmitting vibrations

3:49

from a source outward

3:51

in a radiating pattern from

3:53

that source. So let's think of

3:55

someone knocking on a door. For example,

3:57

you're inside a house, someone's knocking on your door.

4:00

When that person's hand hits the door,

4:03

it causes the door to vibrate, and

4:06

that vibration transmits to the surrounding

4:08

air molecules on the other side of the door.

4:10

They get pushed through that vibration

4:13

and then pulled when the

4:16

wood is vibrating back towards its

4:18

original position. So the

4:20

air molecules vibrate, those air molecules

4:23

cause the next surrounding layer of

4:25

air molecules to vibrate as well, and

4:27

so on and so forth. It's like a cascade

4:30

or domino effect. You get these little pockets

4:32

of high and low air pressure that travel

4:34

outward from that door.

4:37

It spreads further as it goes towards

4:40

you know, any distance, and if

4:43

you are close enough so that

4:46

you can still detect those changes in air pressure.

4:49

You experience this by hearing the knocking

4:51

on the door. Those vibrating air molecules lose

4:54

a bit of energy as they move

4:56

outward. Right, as they vibrate to the

4:58

next layer, you start to lose a bit of

5:00

energy with each transmission

5:02

of that. So the sound gets quieter

5:05

the further away you are because

5:07

there's not as many air molecules vibrating,

5:10

its amplitude has decreased. So

5:13

if you are in hearing range, you can

5:15

pick up on those changes of air pressure as they encounter

5:17

the tympanic membrane in your ear canal. Those

5:19

changes in pressure will cause a reaction in your

5:22

middle and inner ear that

5:24

will ultimately get picked up by

5:26

your brain that interprets it as sound.

5:29

Now, the frequency at which those fluctuations

5:32

occur relates to the pitch

5:35

that we hear, so faster

5:38

vibrations are higher

5:40

pitches, higher frequencies, higher

5:42

notes. If you think of a musical scale,

5:45

we perceive the force of the

5:47

changes as volume, so

5:50

lower forces, lower volume, right, and

5:53

higher forces, higher volume. The

5:55

human ear can hear a pretty decent

5:57

range of frequencies, from twenty hertz,

5:59

which means twenty cycles or

6:01

twenty waves per second

6:04

past a given point of reference,

6:06

to twenty kilohertz. That's twenty

6:09

thousand cycles or waves

6:11

per second. So yeah, the cycle refers

6:14

to the frequency of the sound wave.

6:17

The lower the frequency, the lower the pitch of the sound.
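
To make the frequency and amplitude idea concrete, here is a minimal Python sketch (not anything from the show, just an illustration) that synthesizes a pure tone and writes it to a WAV file with the standard library. The 440 Hz example tone and the file name are arbitrary choices.

```python
# Minimal sketch: frequency (in hertz) sets the pitch of a tone,
# amplitude sets its volume. Standard library only.
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second

def pure_tone(frequency_hz, amplitude, seconds=1.0):
    """Return 16-bit samples of a sine wave.

    frequency_hz: cycles per second; humans hear roughly 20 Hz to 20,000 Hz.
    amplitude:    0.0 to 1.0; bigger pressure swings sound louder.
    """
    samples = []
    for n in range(int(SAMPLE_RATE * seconds)):
        t = n / SAMPLE_RATE
        value = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(int(value * 32767))  # scale to the 16-bit signed range
    return samples

# A one-second 440 Hz tone (concert A) at half volume.
samples = pure_tone(440, 0.5)
with wave.open("tone_440hz.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SAMPLE_RATE)
    f.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Raise the frequency and the pitch goes up; raise the amplitude and the volume goes up; neither changes the other.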

6:19

All right, and then our brain has to make meaning

6:21

of all this, Right, it's not just that it's

6:23

picking up on it. Our brain interprets

6:26

this and we experience

6:28

it as a sound

6:30

we have heard. So it either matches

6:33

this perceived sound with one we've encountered

6:35

before, and then we

6:38

say, oh, I know what that is. That's someone knocking

6:40

at the door. Or it

6:42

might be Holy Cala, I've never heard that

6:45

sound in my life. I have no idea what it is.

6:48

If the sound is language, then our brains

6:50

have to derive the meaning from the perceived

6:53

sound. We've heard someone say

6:55

words such as you're hearing me say this. Then

6:59

our brains have to take that

7:01

collection of sounds and say, what does that actually

7:03

mean? What is the context,

7:06

what is the intent? What is the

7:08

message here? Otherwise it would just

7:10

be you know, random noises

7:12

that I'm making with my mouth. Alright,

7:14

so we have a basic understanding of

7:16

the physics of sound. Now to talk about speakers

7:19

and microphones and the reason I'm

7:21

going to talk about both of them is that

7:24

the devices complement one another. You can

7:26

think of one as being the other in reverse.

7:28

Plus, with smart speakers,

7:30

we have to talk about microphones anyway, because

7:33

smart speakers have microphones as

7:35

well as the speaker element. So

7:38

you can think of this as one long process

7:40

of taking the physical phenomena of

7:42

sound waves, transforming

7:44

that physical phenomena into an electrical

7:47

signal, taking the electrical signal,

7:49

and changing it back into something that can produce

7:51

the sound waves that started the whole

7:53

thing. So you're replicating the original

7:56

sound waves with this end

7:59

device, which in this case is a loudspeaker.

8:01

So the microphone is the part of the process where

8:04

you take the sound and you turn it into an electrical

8:06

signal, and the speaker is where you take the

8:08

electrical signal and you turn it back into actual

8:10

sound. That's the simple way. But what's actually

8:12

happening? Well, let's

8:14

talk about it on a physical level. Sound

8:16

waves go into a microphone.

8:19

So you've got these fluctuations

8:22

in air pressure that encounter a microphone.

8:24

I'm speaking into a microphone right now,

8:27

so this is happening right now. Inside

8:29

the microphone is a very thin diaphragm,

8:32

typically made out of a very flexible

8:34

plastic, and it's sort

8:36

of like the skin of a drum. So

8:38

as the changes in air pressure encounter

8:41

the diaphragm, they cause the diaphragm

8:43

to move back and forth. Well. Attached

8:46

to the diaphragm is a coil of

8:48

conductive wire, and that coil

8:50

wraps either around or near

8:52

a permanent magnet. Magnets have

8:54

magnetic fields. They have a north pole

8:57

and a south pole, and there's a magnetic field

8:59

that surrounds the magnet.

9:02

And the electro magnetic effect means

9:05

that if you move a coil

9:07

of conductive wire through

9:09

a magnetic field, it will produce

9:11

a change in voltage in that coil,

9:14

otherwise known as electromotive

9:16

force, and that means electrical current

9:19

will flow through the coil. Now,

9:21

if you have the end of that coil attached

9:24

to a wire, a conductive

9:26

wire for that current

9:28

to flow through, you can send that current

9:30

onto other components. So for our

9:33

purposes, the component in question

9:35

would be an amplifier, and I'll get

9:37

to explaining why that is in just a

9:39

moment, but first let's talk about loud

9:41

speakers, and the way a loudspeaker works

9:44

is essentially the reverse

9:46

of a microphone. You've got your permanent

9:48

magnet around or near which

9:51

is a coil of conductive wire. The

9:53

wire is connected to a diaphragm,

9:56

one much larger and typically made

9:58

out of stiffer material than the plastic

10:01

you'd find in a microphone. This

10:03

is the element inside a speaker that will

10:05

vibrate, that will push air and pull

10:08

air as it moves either

10:10

outward or inward. The electrical

10:12

signal comes from a source such as

10:15

the microphone we were just using a second

10:17

ago that comes into the loudspeaker

10:19

and it flows through the coil. Now,

10:22

when you have an electrical current flowing

10:24

through a conductive coil, you

10:27

generate a magnetic field because of

10:29

the laws of electromagnetism. You've

10:31

got the electro magnetic

10:34

field generated as a result. Now

10:36

that field will interact with the magnetic

10:38

field of the permanent magnet. The

10:41

permanent magnet always has a magnetic field.

10:43

The coil only has one when electric

10:45

current is flowing through it. And

10:47

as I said, we know magnets have a north

10:50

pole and a south pole. And we also know

10:52

that when we bring two magnets with

10:54

their north poles together, they'll

10:56

push against each other, right because like

10:59

repels like, But if

11:01

we turn one of those magnets around so that

11:03

now it's a south pole and a north pole,

11:06

they attract one another, you

11:08

know, opposites attract. So

11:11

by having this magnetic

11:14

field being generated by the coil, uh,

11:17

it starts to generate

11:20

interactions with the magnetic field of the permanent

11:23

magnet, so they

11:25

start to push and pull against each other. Well,

11:28

the coil is attached to that diaphragm,

11:30

so it in turn drives the diaphragm

11:33

to either push outward or pull inward.

11:36

That causes air molecules

11:38

to vibrate, just as it would

11:41

with any other you know, source of sound,

11:43

and it emanates outward from the loudspeaker,

11:46

so you get a representation

11:48

of the same sound that was going into

11:51

the microphone got converted

11:53

into an electrical current. The electrical

11:55

current then was passed

11:58

through a coil and next to a permanent

12:00

magnet to create the same sort

12:02

of movement. It replicates the movement of the

12:04

original diaphragm in the microphone and

12:07

generates the sound. So

12:09

you get the replication of the sound

12:11

that was made in the other location. It's

12:14

pretty cool, I think. Now, I did

12:16

mention earlier that you would need

12:18

an amplifier. And the reason you need

12:20

an amplifier is that the electrical signal

12:22

generated by a microphone is

12:24

far too weak to drive a loud

12:27

speaker's diaphragm. You just wouldn't

12:29

have the juice to do it. It would be

12:32

much much less, uh powerful

12:34

than what the speaker would need. So chances

12:36

are the diaphragm would either not move at

12:39

all because it would just be too stiff, it would resist

12:41

the movement too much, or it would move

12:44

so weakly as to generate little

12:46

to no sound, so it wouldn't do you any good. So

12:49

the signal from the microphone has to

12:51

first pass through an amplifier, which, as the

12:53

name implies, takes an incoming signal

12:55

and increases the amplitude of

12:57

that signal, the volume. In other words,

13:00

uh so, it doesn't affect pitch, but it does

13:02

affect the signal strength and consequently the

13:04

volume. And I've done episodes

13:06

about amplifiers, including explaining

13:09

the difference between amplifiers that use vacuum

13:11

tubes and ones that use transistors,

13:14

so I'm not going to go into that here.
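
As a toy illustration of the point just made, and nothing more than that: an ideal amplifier simply multiplies every sample of the signal by a gain factor, which changes the amplitude (volume) without changing the frequencies (pitch). The sample values below are made up.

```python
# Toy sketch of an ideal amplifier: scale every sample by a gain factor.
# Volume changes; the frequency content (pitch) does not.
def amplify(samples, gain):
    out = []
    for s in samples:
        v = int(s * gain)
        out.append(max(-32768, min(32767, v)))  # clip to 16-bit range to avoid overflow
    return out

# A weak microphone-level signal boosted enough to drive a loudspeaker.
mic_signal = [12, 25, 31, 25, 12, 0, -12, -25, -31, -25, -12, 0]
speaker_signal = amplify(mic_signal, gain=100)
print(speaker_signal)
```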

13:16

Besides, it doesn't really factor

13:18

into our conversation about smart speakers

13:20

anyway. It's just important for

13:23

it to work with a microphone and speaker

13:25

setting. Now, over the years, engineers

13:27

have paired microphones and speakers in lots

13:30

of stuff. You've got telephones, you've

13:32

got intercom systems, public address

13:34

systems, handheld radios, all sorts of

13:36

things, so that technology was

13:38

well and truly mature. Before

13:40

we ever got our first smart speaker,

13:43

there wasn't much call to incorporate

13:45

microphones into home speaker systems

13:48

for many years. I mean, what would

13:50

you actually use a microphone embedded

13:52

in a speaker for? Before smart speakers,

13:54

Typically you would have your speakers

13:56

like I'm talking about, like sound system speakers.

13:59

You would have them hooked up to some other dumb

14:02

as in, not connected to a network

14:04

technology. So it might be a sound

14:06

system or home entertainment set

14:08

up with a television as the focal point, or maybe

14:11

even you know, a computer for the purposes

14:13

of playing more dynamic sounds for like video

14:15

games and and things like that.

14:18

Um. But for a very long time,

14:21

these were all thought of as one way communications

14:23

applications, right, Like, the sound was

14:25

coming from a source and it would get to us

14:27

through the speakers, but we weren't meant to send

14:31

sound back through those same channels.

14:33

The information was just coming to you. You weren't

14:35

sending anything back, But that would all change

14:38

in time. Now. One thing to keep in

14:40

mind about smart speakers is that

14:42

they are the product of several different technologies

14:45

and lines of innovation and development that all

14:47

converged together. The microphone

14:49

and speaker technology is one of the oldest

14:52

ones that we can point to as far as the

14:54

fundamental underlying technology

14:56

is concerned, the stuff that's been

14:58

around since the late nineteenth century.

15:00

Now there is one other we'll talk about that's

15:02

even older. But I don't

15:04

want to spoil things. I'll just mention there

15:07

is an even older line of

15:09

development that goes into smart speakers

15:12

than the microphone speaker stuff of

15:14

the nineteenth century. Most of the

15:16

other components, however, are much younger

15:18

than that. One big one is

15:21

speech or voice recognition. Creating

15:24

computer systems that could detect noise

15:26

was relatively simple. Right. You could have a computer

15:29

connected to microphones and they

15:31

could monitor the input from those

15:33

microphones and any incoming

15:35

signal could be registered. Right, they could

15:37

record an incoming signal that

15:39

would indicate the microphone had detected a

15:41

noise. That's child's play. That's

15:43

easy to do. But teaching computers

15:46

how to analyze those signals and decipher

15:48

them so that the computer could display

15:50

in text or otherwise act

15:53

upon that that sound

15:55

in a meaningful way that was much

15:57

more difficult. There was

16:00

an IBM engineer named William

16:02

C. Dersh of the Advanced Systems

16:04

Development Division who created an early

16:07

implementation of voice recognition. It

16:09

was a very limited application, but it

16:11

proved that the ability to interact

16:13

with computers by voice was more

16:15

than just science fiction. Within

16:18

IBM, it was called the Shoebox.

16:21

Dersh worked on this project in

16:23

the early nineteen sixties and what he

16:25

produced was a machine that had a

16:27

microphone attached to it. The

16:29

machine could detect sixteen spoken

16:32

words, which included the digits

16:34

of zero to nine plus

16:37

some command indicators like plus,

16:39

minus, total, and subtotal.

16:42

You get the idea. So you could speak a

16:44

string of numbers and then commands

16:46

to this device, then ask it to total

16:48

everything and it would do so. So it was

16:51

more or less a basic calculator

16:53

with some voice interpretation incorporated

16:56

into it. Now there's a

16:58

great newsreel piece about this

17:01

shoebox. There's a demonstration of it, and

17:03

it came out in nineteen sixty-one, and

17:05

I love that newsreel because it has

17:08

that great music you would hear in the background of

17:10

those old industrial and business films.
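
Just to give a feel for the calculator behavior described a moment ago, here is a hypothetical sketch that works on already-recognized word tokens. It is purely illustrative; it says nothing about how the real IBM hardware actually performed its recognition or arithmetic.

```python
# Hypothetical sketch of Shoebox-like behavior: a small vocabulary of digits
# and arithmetic commands applied to a stream of recognized words.
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

def run(words):
    total = 0
    number = 0
    sign = 1
    for word in words:
        if word in DIGITS:
            number = number * 10 + DIGITS[word]   # build up a multi-digit number
        elif word == "plus":
            total += sign * number                # commit the number, next one adds
            number, sign = 0, 1
        elif word == "minus":
            total += sign * number                # commit the number, next one subtracts
            number, sign = 0, -1
        elif word == "total":
            total += sign * number
            return total
    return total

# "35 plus 10 minus 7 total" -> 35 + 10 - 7 = 38
print(run("three five plus one zero minus seven total".split()))
```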

17:12

Anyway, there's also a helpful chart

17:14

that hangs in the background of

17:17

that video where Dersh is

17:19

actually explaining how it works. You

17:21

can see a little bit behind him

17:24

what is actually being analyzed,

17:27

and uh he broke the words down

17:29

into phonemes and syllables, so

17:31

phonemes being specific sounds

17:34

that make up words. So,

17:36

for example, the digit one is

17:39

a single syllable word with a vowel

17:41

sound right at the front. But you also

17:43

have the word eight that's

17:46

another single syllable word with

17:48

a vowel sound right at the front, but

17:50

it's different from one phonetically

17:53

in that eight also has a

17:55

plosive and has that hard t at

17:57

the end. So the shoebox

17:59

was limited not just in what

18:02

words it could recognize, but also the

18:05

types of voices it could recognize.

18:07

Get someone who has a different dialect or

18:09

manner of speech, and the machine might not be able to

18:12

understand them because they're not pronouncing

18:14

the words the same way that Dersh did.

18:17

This would be a big challenge in

18:20

speech recognition moving forward, and

18:22

it's also an example of where we

18:24

find bias creeping into technology.

18:27

And it's not necessarily a conscious thing,

18:30

but if you have people designing

18:32

a system and they're designing it based

18:34

off their own uh,

18:36

you know, speech patterns, their own

18:39

pronunciations, their own dialects,

18:41

then it may be that the system

18:44

they create works really well for them

18:46

and less well for anyone who isn't

18:48

them, And the further away you are from

18:50

their manner of speaking, the

18:53

more frustration you will encounter

18:56

as you try to interact with that technology.

18:59

That's an example of bias, and in

19:01

fact, if you read the histories

19:03

of speech recognition and as we'll get

19:05

too later natural language processing,

19:08

you'll see a lot of people say it works

19:10

great if you happen to be a white

19:12

man, because the

19:15

manner of speech was being or

19:17

the people who were designing it were primarily

19:19

white men who were uh

19:22

typically aiming for what

19:25

is considered a non accented

19:27

American dialect somewhere

19:30

in you know, the Eastern seaboard

19:32

side. But that meant

19:34

that if you did have an accent or a dialect,

19:37

or you had a different vernacular,

19:40

that it was harder for the systems

19:42

to actually understand what you were saying. That's

19:44

an example of bias. Well. The

19:46

general strategy was again to break up

19:49

speech into its constituent sound units, you

19:51

know, those phonemes, and then to suss out

19:53

which words were being spoken

19:55

based on those phonemes, and

19:57

that was done by digitizing the voice,

20:00

transforming it from sound into data

20:02

that represented stuff like the sound's

20:04

frequency or pitch, and then

20:07

matching up specific signal

20:09

signatures with specific phonemes.
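
One small, hedged example of what "turning sound into data" can look like: a brute-force discrete Fourier transform over a digitized snippet to estimate its dominant frequency. Real recognizers extract much richer features than a single frequency, but the spirit is the same. The sample rate and the 200 Hz test tone below are invented for illustration.

```python
# Sketch: estimate the dominant frequency of a digitized snippet with a
# plain discrete Fourier transform (standard library only).
import cmath
import math

SAMPLE_RATE = 8000  # samples per second

def dominant_frequency(samples):
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # check each frequency bin up to the Nyquist limit
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_bin, best_mag = k, abs(s)
    return best_bin * SAMPLE_RATE / n  # convert the bin index to hertz

# A 200 Hz test tone; the estimate should come back as roughly 200.0.
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(200)]
print(dominant_frequency(tone))
```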

20:11

So generally the idea was that the computer system would

20:13

monitor incoming sound, convert the

20:15

sound into digital data, compare that

20:18

data it had received with information

20:20

stored in a database, in an effort to

20:22

look for matches. The Shoebox

20:25

database was just sixteen words in size.

20:27

Later ones would be much larger, but pretty quickly

20:30

people realized this was

20:32

not an efficient way of doing

20:34

speech recognition because the bigger

20:36

the vocabulary, the more work-intensive

20:38

it was to build out those databases.

20:41

So it wasn't something that people thought would

20:43

be sustainable for very

20:45

large vocabularies. But the Shoebox

20:48

marked the beginning of a serious effort to create machines

20:50

that could accept audio cues as actual input,

20:52

and as we'll see, that's one important

20:55

component for these smart speaker systems.

20:57

I've got a lot more to say, but before I get into

20:59

the next part, let's take a quick break.

21:09

Now, obviously we didn't jump right

21:11

into full voice recognition right after

21:13

IBM's Shoebox innovation. The

21:16

challenges related to building automated

21:18

speech recognition systems were numerous,

21:20

even for just a single language,

21:23

because, as I said, you can have accents and

21:25

dialects. One voice can have a

21:27

very different tonal quality from another,

21:30

people speak at different speeds. Teaching

21:32

machines how to recognize speech when the phonemes

21:34

and pacing of that speech aren't

21:37

consistent from speaker to speaker, that's

21:40

really hard. This kind of gets back to

21:42

the same sort of challenges you have when you're teaching

21:44

machines how to recognize images. You

21:47

know, you teach a human what a

21:50

coffee mug is. I always use this example,

21:52

but you teach a human what a coffee mug is, and

21:54

pretty soon they can extrapolate from

21:56

that example and understand

21:58

that coffee mugs can come in all different sizes

22:02

and colors, and you know different

22:04

designs and textures. We

22:07

get it. Like, you see a couple of coffee mugs,

22:09

you understand. Machines, though,

22:12

they aren't able to do that. Machines,

22:14

you know, you have to give them lots and lots

22:17

and lots of different examples before they can

22:19

start to pick up on what things

22:22

actually make a coffee mug. Same

22:24

sort of thing with speech, right, So

22:27

if you don't have consistency between

22:29

speakers, it makes it very hard for machines

22:31

to learn what people are saying. Now,

22:33

it didn't take long for the tech industry at

22:35

large to really dive into trying to solve

22:38

this problem. In nineteen seventy-one, DARPA,

22:41

that's the Research and Development division of

22:43

the United States Department of Defense,

22:45

got behind speech recognition in a big

22:47

way. Now, remember, DARPA itself

22:50

doesn't do research. The organization's

22:53

purpose is to invite organizations

22:55

to pitch projects that align

22:57

with whatever DARPA's goals are, and

23:00

DARPA would provide funding to the

23:02

winning organizations to see

23:05

these projects to completion if possible.

23:07

So DARPA is really more of a vetting and funding

23:10

organization. Anyway, in

23:12

nineteen seventy-one, DARPA created

23:14

a five year program called Speech

23:16

Understanding Research, or S U R.

23:18

The initial goal was

23:20

pretty darn ambitious considering the capabilities

23:23

of the technology at the time. The project

23:25

director, Larry Roberts, wanted

23:27

a system that would be capable of recognizing

23:29

a vocabulary of ten thousand words

23:32

with less than ten percent error. After

23:34

holding a few meetings with some of the leading

23:37

computer engineers of the day, Roberts

23:39

adjusted that goal significantly.

23:42

After that adjustment, the target was going

23:44

to be a system capable of recognizing

23:47

one thousand words, not ten

23:49

thousand. Error levels

23:51

still had to be less than ten percent, and

23:54

the goal was for the system to be able to accept

23:56

continuous speech, as opposed

23:58

to very deliberate

24:01

speech with pauses

24:05

between each pair of words, which would

24:07

not be really that useful.

24:11

One person who was skeptical about

24:13

the potential success of this project

24:15

was John R. Pierce of Bell

24:17

Labs. He argued that any

24:19

success would be limited so long

24:21

as machines remained incapable of understanding

24:24

the words, not just recognizing

24:27

a word based on phonemes, but understanding

24:29

what the word is. That is, Pierce felt that

24:31

the machines needed some way to parse the

24:33

language to get to the meaning of what was

24:35

being said. That's an important

24:37

idea that we will come back to in just a bit now.

24:40

Among the companies and organizations that landed

24:42

contracts with DARPA were Carnegie

24:44

Mellon University, BBN, which

24:46

actually played a big part in developing ARPANET,

24:49

the predecessor to the Internet, Lincoln

24:51

Laboratory, and several more and

24:54

very smart people began to create systems

24:56

intended to recognize speech in meaningful

24:58

ways. The names of the programs

25:00

were a lot of fun. There was H W I

25:03

M, which stood for Hear What

25:05

I Mean, as in hear, as in listen,

25:07

hear what I mean. That one was from BBN.

25:10

CMU introduced Hearsay,

25:12

which was later designated as Hearsay one,

25:15

and then they came out with Hearsay two. They

25:17

also would demonstrate another

25:20

one called Harpy.

25:22

Oh, and there was a professor at CMU named Dr

25:24

James Baker who would design a system

25:27

called Dragon in nineteen seventy

25:29

five that he would later leverage into a company

25:31

with his wife, Dr Janet M. Baker

25:34

in the nineteen eighties, and they had a very successful

25:36

business with speech recognition software.

25:39

Now, I'm not going to go into each of those programs

25:42

in deep detail, but rather just mention

25:44

that they all helped advance the cause of

25:46

creating systems that can recognize speech.

25:49

One of the big developments that came out of all

25:51

that work was a shift to probabilistic

25:54

models, which would also play a really

25:56

important part in another phase of

25:58

developing the smart speaker. So what do

26:00

I mean when I say probabilistic? Well,

26:02

as the name indicates, it all has

26:04

to do with probabilities. Essentially,

26:07

systems would analyze incoming phonemes

26:09

and make guesses as to what

26:11

was being said based on the probability

26:14

of it being a given word or part

26:16

of a word. The systems typically

26:18

go with whatever word has the highest

26:20

probability of being the correct one.
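
Here is a toy sketch of that "pick the most probable word" idea, with every number invented for illustration: each candidate word gets a score for how well it matches the sound and a score for how likely it is to follow the previous word, and the highest combined score wins. It happens to use a homonym pair, which is exactly the kind of nuance discussed next. Real systems are far more sophisticated, but the shape is similar.

```python
# Toy probabilistic decoder: combine an acoustic match score with a crude
# "how likely is this word to follow the previous one" score, then take the max.
acoustic_score = {"right": 0.48, "write": 0.47, "rite": 0.05}   # how well each word fits the sound
follows = {                                                     # made-up bigram probabilities
    ("am", "right"): 0.30, ("am", "write"): 0.01, ("am", "rite"): 0.01,
    ("to", "right"): 0.02, ("to", "write"): 0.40, ("to", "rite"): 0.01,
}

def best_word(previous_word, candidates):
    return max(candidates,
               key=lambda w: acoustic_score[w] * follows.get((previous_word, w), 0.001))

print(best_word("am", ["right", "write", "rite"]))  # -> right  ("am I right?")
print(best_word("to", ["right", "write", "rite"]))  # -> write  ("to write a sentence")
```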

26:23

Even with that approach, there are nuances

26:26

to language that are difficult to account for with

26:28

a machine. So, for example, you have homonyms

26:30

in which you have two words that sound the same

26:33

but have very different meanings and potentially

26:35

spellings like right as

26:37

in to write a sentence, or

26:40

right as in am I right? Or

26:42

am I wrong? Or you could have a pair

26:44

of words that sound like

26:46

a single word and have confusion

26:48

there, such as a door. You

26:51

can say a door, meaning a

26:53

single door, a door to go into

26:55

a building, or you might say adore as

26:57

in I adore this podcast

26:59

you're doing, Jonathan. That's sweet of you, Thank

27:02

you for saying that. So computer

27:04

scientists were hard at work advancing

27:07

both the capability of machines to make

27:09

correct guesses at individual phonemes

27:12

and then full words, as well as

27:14

figuring out a way to teach machines to adjust

27:16

guesses based on context. That

27:19

requires a deeper understanding of the language

27:22

within which you're working. If you're aware

27:24

of certain idioms, you can make a

27:26

good guess at a word or phrase even if

27:28

you didn't get a clean pass at

27:31

it right. So, for example, the

27:33

phrase it's raining cats and dogs

27:35

just means it's raining a lot. And if

27:37

a system included a database that indicated

27:40

the phrase cats and dogs sometimes

27:42

follows the phrase it's raining, then

27:45

the system is more likely to guess the correct

27:48

sequence of words instead

27:50

of guessing something that sounded similar

27:52

but it's wrong. For example, if it

27:54

said, oh, they must have said it's raining bats

27:57

and hogs, that

27:59

would not make sense. So

28:02

the systems estimate the probability

28:04

that any given sequence of sounds within

28:07

the database matches what the systems

28:09

have just quote unquote heard. Progress

28:11

in this area was steady, but slow, and

28:14

I'd argue that it was also a reminder that concepts

28:16

like Moore's law do not apply universally

28:19

across technology. Rapid development

28:21

in one particular domain of technology is

28:24

not necessarily an indicator that the same sort

28:26

of progress will be observed in all other

28:28

areas of tech. We often

28:32

get into the mistaken habit of

28:34

believing that Moore's law applies to everything. Alright.

28:37

So a related concept to

28:39

voice recognition is something called natural

28:42

language processing, and this relates

28:44

back to how we humans tend to process information

28:47

compared to the way machines tend to do

28:49

it. So we humans formulate ideas,

28:51

we shape those ideas into words and sentences.

28:54

We communicate them in some way to other

28:56

people through that language. It

28:59

may be through speech, it may be through text.

29:01

It may even be through a nonverbal or non

29:04

literary way, but we communicate

29:07

those ideas. Machines typically accept

29:09

input, they perform some process

29:12

or sequence of processes on that input,

29:15

and then they supply an output of some

29:17

sort. Machines do this in machine

29:19

language. That's a code that's far too

29:22

difficult for humans to process easily.

29:24

Binary is an example of machine

29:26

language. Binary is represented as zeros

29:29

and ones, which when grouped together can

29:31

represent all sorts of stuff. But if you just

29:33

looked at a big block of zeros

29:35

and ones, it would mean nothing to you. It's

29:38

not easy for humans to use, and then

29:40

machines in turn are not natively

29:42

able to understand human language, so there's

29:44

a language barrier there. Because

29:47

of that, people created different

29:49

programming languages. These languages

29:52

provide layers of abstraction from

29:54

the machine language. They make it easier

29:56

to create programs or directions

29:59

that the computer should follow. So the

30:01

person who's doing the programming is using

30:03

a programming language that's easy for humans

30:05

to use that then gets converted

30:07

into machine language that the computers

30:10

understand. But what if you could send

30:12

commands to a computer using natural

30:14

language, not even programming language.

30:16

You could just speak in plain

30:19

vernacular, whether it's English or

30:21

any other language, the way humans

30:24

communicate with one another. What if a

30:26

computer could extract meaning from

30:28

a sentence, understand what it

30:30

was you wanted the computer to do, and

30:32

then respond appropriately. So imagine

30:34

how much time you could save if you could just tell your

30:36

computer what you wanted it to do, and

30:38

it took care of the rest. If

30:40

you had a powerful enough computer system

30:43

with strong enough AI,

30:45

maybe you could even potentially do something like describe

30:48

a game that you would love to be able

30:50

to play, like not not a game that

30:52

exists, a game in your head, and

30:55

you could describe it to a computer and the computer

30:57

could actually program that game. Well,

30:59

we're definitely not anywhere close to that

31:02

yet, but we've made enormous progress with natural

31:04

language processing. Now, the history

31:06

of natural language processing isn't

31:08

exactly an extension of voice

31:11

recognition. It's actually more like a parallel

31:13

line of investigation. And

31:16

that's because natural language processing doesn't

31:18

require voice recognition. You

31:20

can have an implementation in which you just

31:23

write commands in natural language,

31:25

you know, you type them out on a keyboard and

31:27

the machine then carries out those

31:29

instructions. So much of the

31:31

early work in natural language processing was in

31:33

text based communication rather

31:35

than in speech. The history of natural

31:38

language processing includes stuff like

31:40

the Turing test, named after Alan

31:42

Turing. So the most common interpretation

31:44

of the Turing test these days is

31:47

that you've got a scenario in which a person is

31:49

alone in a room with a computer terminal,

31:51

they can type whatever they like into the computer

31:54

terminal, and someone or something

31:56

is responding to them in real time. Now

31:59

it might be another person, or it might

32:01

be a computer system that's responding

32:03

to that person. You run

32:06

a whole bunch of test

32:08

subjects through this process,

32:10

and if the computer system is able to fool a

32:12

certain percentage of those test subjects,

32:15

like say thirty percent of them,

32:17

into thinking that it is in fact another human and

32:19

not a computer, it is said to have

32:21

passed the Turing test, And

32:24

typically we use that to mean the machine has given

32:26

off the appearance of possessing intelligence

32:29

similar to the one that we humans

32:31

possess. That gets beyond

32:33

our scope for this episode, but

32:35

it helps point out that stuff like speech recognition

32:38

and natural language processing are

32:40

both closely related to the field of artificial

32:42

intelligence. In fact, they really belong within

32:45

the artificial intelligence domain. The

32:47

Turing test was more of a hypothetical.

32:50

It was a bit of a cheeky way of saying, Hey,

32:53

if you can't tell whether or not something is

32:55

intelligent, it makes sense to treat

32:57

it as if it actually is intelligent.

33:00

After all, we assume that every human

33:02

with whom we interact possesses

33:04

some level of intelligence based on those

33:06

interactions, so why should we not

33:08

extend the same courtesy to machines. Now,

33:12

natural language processing would prove to be

33:14

another super challenging problem to solve.

33:17

In computer science. Early work was

33:19

done in translation algorithms,

33:21

and these were programs that attempted to take phrases

33:23

written in one language and translate those

33:25

automatically into a second

33:27

language. At first, that seemed pretty straightforward,

33:30

but you realize that's also pretty

33:32

tricky. Really. For one thing, you

33:35

can't just translate word for word and

33:37

keep the same order from one language

33:39

to another. The syntax or

33:41

the rules that the language follows,

33:44

uh, they could be different from language to language.

33:47

In one language, you might use an infinitive

33:49

such as to record, in the middle

33:51

of a sentence, while another language might

33:53

put all the infinitives at the end of a sentence.

33:56

So in one language, I might say

33:58

I'm going to record a podcast in the

34:00

studio right now, but in another

34:02

language it might come out as I'm going a

34:04

podcast in the studio right now to record.

34:07

It starts to sound like yoda. There

34:10

was initial excitement around machine

34:12

translation, but once computer scientists

34:14

and linguists began to see the scope

34:16

of this challenge, their excitement

34:19

faded a bit. Also, there was a lot

34:21

of other stuff going on in the nineteen sixties

34:23

and seventies that was demanding a lot of attention,

34:26

such as the Space race. So for

34:28

a while, this branch of computer science

34:30

was given less attention than

34:32

other branches, and by less attention, I

34:35

really mean funding. Now, when

34:37

we come back, we'll talk a bit more about

34:39

the advances that were necessary to support natural

34:41

language processing, and we'll move on to how

34:44

this would be another important component in

34:46

smart speakers. But first, let's take

34:48

another quick break. Okay,

34:57

So early enthusiasm for

34:59

natural language processing created

35:02

a bit of a hype cycle that ultimately

35:04

crashed into the telephone pole

35:06

of unmet expectations. That

35:10

was a really bad metaphor. Anyway,

35:13

natural language processing went

35:15

through something similar to what we saw

35:17

with virtual reality in the nineteen nineties.

35:19

You know, people saw what was actually

35:22

achievable, and then

35:24

they compared that to what they thought they

35:26

were going to get, and those two things

35:28

didn't match up at all, and that really

35:30

pulled the rug out of funding for natural

35:33

language processing, which meant, of course,

35:35

that progress slowed way down.

35:37

It kept going, but it

35:40

was definitely on the back burner for a lot

35:42

of projects. When interest renewed

35:44

in the nineteen eighties, there had been

35:46

a shift in thinking around natural

35:49

language processing. Computer scientists

35:51

were starting to look at statistical approaches

35:53

similar to what was going on with speech

35:55

recognition, building up probabilistic

35:58

models in which a computer can start making

36:00

what amounts to educated guesses

36:03

at the meaning of a command or a

36:05

phrase. Machine learning became

36:07

an important component on the back

36:09

end of these systems, and later artificial

36:12

neural networks became an important part

36:14

as well. A neural network processes

36:17

information in a way that's sort of analogous

36:19

to how our brains do it. You

36:21

have nodes or neurons

36:23

that connect to other nodes, and

36:26

each node affects incoming data

36:28

in a certain way, performing some sort of operation

36:31

on it, and the degree to which they

36:33

do that in one way versus another

36:35

is called the weight of that

36:37

node. Computer scientists

36:40

apply weights across the nodes

36:42

in an effort to get a specific result

36:44

in order to train these models.

36:46

So you might feed a specific command

36:49

into such a system, and you let

36:51

it go through the computational process

36:53

from the beginning of the neural network through

36:55

to the end, and then you look at the result,

36:58

and if the result is correct, well,

37:00

that just means the system is already working as

37:02

you intended it, which honestly

37:04

is not likely to happen early on.

37:07

But if it's not correct, then you

37:09

start adjusting the weights on those

37:11

nodes in order to affect

37:13

the outcome. I almost think of it as like Plinko

37:16

or pachinko, where you've got the little coin

37:18

and you drop it down and it bounces on

37:20

all the pegs and sometimes

37:23

you might think, all right, well, this time it's going to

37:25

go right for that center slot, but it

37:27

doesn't, and you think, well, maybe if I remove

37:29

some of these pegs or I shift

37:31

these pegs over a little bit, I can drop

37:33

it in that same spot and hit the center.

37:36

It's kind of like that, except you're talking about data,

37:38

not physical moving parts. So

37:41

you have to do this a lot, like up

37:43

to like millions of times

37:46

in order to try and train a system so

37:49

that it responds appropriately to commands.
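
Here is a very small, hedged sketch of that adjust-the-weights loop: one artificial "node" learning a trivial mapping (an AND gate) by nudging its weights after every wrong guess. Production speech and language models involve vastly more nodes and weights, but the training idea is recognizably the same. The learning rate and pass count below are arbitrary.

```python
# Toy sketch of training a single node: nudge the weights until the
# output matches the desired result for every example.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny training set: two binary inputs, one target output (an AND gate).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5

for _ in range(5000):                      # many, many passes, as described above
    for (x1, x2), target in examples:
        output = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
        error = target - output            # how far off the guess was
        # Nudge each weight in the direction that shrinks the error.
        weights[0] += learning_rate * error * x1
        weights[1] += learning_rate * error * x2
        bias += learning_rate * error

for (x1, x2), target in examples:
    prediction = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
    print((x1, x2), "->", round(prediction, 2), "target:", target)
```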

37:51

And once it's trained, you can then test

37:53

new commands on the system to see if it can

37:56

parse them and respond appropriately. And

37:58

in this way, the system quote unquote learns

38:01

over time how to respond

38:03

to commands. And then we have another

38:05

component that's important with smart speakers,

38:08

and that's speech generation. So

38:10

it's one thing to have a machine either broadcast

38:13

or play back a recording

38:15

of speech. It's another thing for a

38:17

machine to generate brand

38:19

new speech. In computer science,

38:22

we call it speech synthesis. Now,

38:24

this is the really old technology

38:26

I was alluding to at the beginning of this episode,

38:29

speech synthesis. If you

38:31

want to be really, you know, kind of technical

38:33

about it, it actually predates every

38:36

other technology I've mentioned up to this

38:38

point, at least in its most

38:40

rudimentary implementations. You

38:42

have to go way back to the eighteenth

38:44

century, the seventeen seventies. That's

38:47

when a Russian smarty pants named Christian

38:49

Kratzenstein was building

38:51

a device that used acoustic resonators.

38:54

These were reeds that would vibrate,

38:57

and it was in an attempt to replicate

38:59

basic vowel sounds. Now,

39:01

even with such a working device, it would

39:04

be really difficult to communicate anything meaningful

39:06

unless you were, I guess, speaking

39:08

whale like Dory in Finding Nemo.

39:10

But it would be an early example of how people tried

39:13

to create mechanical systems that could

39:15

replicate speech or elements of

39:17

speech. Another inventor named

39:19

Wolfgang von Kempelen built

39:22

an acoustic mechanical speech machine

39:25

and that used reeds and

39:27

tubes and a pressure chamber, and

39:29

it was all meant to replicate various

39:31

speech sounds. He had other elements to

39:33

create sounds like plosives, those

39:36

hard sounds that I mentioned

39:38

earlier in the episode. So

39:40

he had all these different elements that, working

39:42

together, could create parts

39:45

of the sounds that we humans

39:47

make when we speak. He also built

39:49

a supposed chess playing machine, and

39:52

it turned out that the chess playing part was a hoax.

39:54

So unfortunately, because

39:57

that device was a hoax, a lot of people

39:59

dismissed his other work, which

40:02

was legitimate. So by

40:05

fudging on one thing, he kind of cast

40:08

doubt on everything he had ever done. Skipping

40:10

ahead quite a bit, we

40:13

get to Homer Dudley, which

40:15

is a fantastic name. He

40:17

unveiled the Voder, or Voice

40:20

Operating Demonstrator device

40:22

at the New York World's Fair in nineteen

40:24

thirty nine. It consisted of

40:26

a complex series of controls and

40:29

it sort of reminds me of something like a

40:31

musical instrument, kind of like a synthesizer,

40:34

but with extra controlling units.

40:36

Like there was like a wrist element, there was

40:38

a pedal. There's a lot of stuff that

40:41

made it very complex, and

40:44

with a lot of practice, you could

40:46

create specific sounds from this

40:48

synthesizer. You could even create

40:50

words or full sentences, though from

40:52

what I understand, it was incredibly

40:55

challenging to do. There was a very high learning

40:57

curve, but it demonstrated the possibility

40:59

of electronically synthesized speech. Now.

41:02

There was a lot of work done

41:04

in this field by

41:07

lots of different talented scientists

41:10

and engineers, and someday I'll

41:12

have to do a full episode on the history

41:14

of speech synthesis. It's really fascinating,

41:17

but it's far too big a topic to cover

41:19

in its entirety in this episode. By

41:21

the late nineteen sixties we had our first

41:24

text to speech system,

41:26

and by the late nineteen seventies and early

41:28

nineteen eighties, the state of

41:30

the art had progressed quite a bit and we

41:32

were starting to get to a point where we could create

41:35

very understandable computer

41:37

voices. They weren't natural, they

41:39

didn't sound like people, but you could understand

41:42

what they were saying. And finally, something

41:45

else that would enable smart speakers and virtual

41:47

assistance was the pairing of improved

41:50

network connectivity and cloud computing.

41:52

That removes the need for the device that

41:54

you're interacting with to do all the

41:56

processing on its own. So,

41:59

if you think about the history of computing,

42:01

we used to have mainframes with dumb

42:03

terminals that attached to the mainframe, so

42:05

the terminal wasn't doing any computing. It

42:07

was just tapping into the mainframe computer,

42:10

which was sending results back to the terminal.

42:12

Then you get to the era of personal computers,

42:15

where you had a device sitting on

42:17

your desk that did all the computing and

42:19

it didn't connect to anything else. Then

42:22

we get up to networking and the

42:24

Internet, where we suddenly

42:26

had the capability of having really powerful

42:28

computers or grids of computers

42:31

that were able to take on processing

42:33

power. Uh, and you just send

42:35

the request out to the Internet

42:38

and you get the response back. That's the basis

42:40

of cloud computing. So your

42:43

your command or message or whatever

42:46

relays back to servers on the cloud

42:49

that then process it and send the proper

42:51

response to whatever device you're

42:53

interacting with, and then you get the

42:55

result. So with the case of the smart speaker,

42:58

it might be playing a specific song

43:00

or giving you a weather report or whatever it might

43:02

be. Now, if the speakers were

43:04

doing some of that computation themselves, that

43:06

would be an example of edge

43:09

computing, where the processing

43:11

takes place at least in part, at the

43:13

edge of a network at those end points.

43:16

But for now, most of the implementations

43:18

we see send data back to the cloud

43:21

to get the right response, so you have to have a persistent

43:23

Internet connection. These devices are

43:25

not useful without that connection.
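
To sketch that round trip in the simplest possible terms: the device ships the transcribed request off, a service somewhere else works out the intent, and an answer comes back for the speaker to act on. The "cloud" below is just a local function and the intents are keyword matches, purely hypothetical stand-ins for whatever the real providers do, so the example stays self-contained and runnable.

```python
# Hypothetical sketch of the request/response round trip. In reality the
# request travels over the network to the provider's servers; here the
# "cloud" is a local function so the sketch has no dependencies.
def cloud_service(request_text):
    """Stand-in for the provider's back end: map a request to an action."""
    text = request_text.lower()
    if "weather" in text:
        return {"action": "speak", "text": "Today looks sunny with a high of 70."}
    if "light" in text:
        return {"action": "smart_home", "device": "living_room_lights", "state": "on"}
    if "play" in text:
        return {"action": "play_audio", "query": text.replace("play", "").strip()}
    return {"action": "speak", "text": "Sorry, I don't know how to do that."}

def smart_speaker(request_text):
    """Device side: send the request out, then act on whatever comes back."""
    response = cloud_service(request_text)       # would be a network call in real life
    if response["action"] == "speak":
        print("Speaker says:", response["text"])
    elif response["action"] == "smart_home":
        print("Turning", response["state"], response["device"])
    elif response["action"] == "play_audio":
        print("Now playing:", response["query"])

smart_speaker("what's the weather today")
smart_speaker("turn on the living room light")
smart_speaker("play some jazz")
```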

43:27

You do have some smart speakers that can

43:29

connect to another device

43:32

like a smartphone via Bluetooth, so

43:34

you could do things that way, but

43:37

without those connections, the smart speaker

43:39

turns into, you know, just a dumb

43:41

speaker, or sometimes just a paperweight. Now,

43:44

this collection of technologies and disciplines

43:46

are what enabled Apple to introduce

43:49

Siri in two thousand and eleven, and

43:52

Siri is a virtual assistant.

43:54

Siri's origins actually trace back to the

43:56

Stanford Research Institute and

43:58

a group of guys, Tom Gruber, Adam Cheyer,

44:01

and Dag Kittlaus, who

44:03

had been working on the concept since the nineteen

44:05

nineties, and when Apple launched

44:08

the iPhone in two thousand seven, they saw

44:10

the iPhone as a potential platform for

44:12

this virtual assistant that they had

44:14

been building, and they thought, well,

44:16

this is perfect because the iPhone has a microphone,

44:19

so the assistant can respond to voice

44:21

commands as a speaker, so it could communicate

44:24

back to the user, it could do all sorts of stuff.

44:26

We can tap into the interoperability

44:29

of apps on the device. It's a perfect

44:32

platform for us to deploy this. So

44:34

they developed an app once the opportunity

44:36

arose because apps were not available

44:39

for development immediately when Apple

44:41

launched the iPhone, and

44:44

once they did launch that app, uh

44:47

within a month, less than a month,

44:49

Steve Jobs was on the phone calling them up and

44:51

offering to buy the technology, which of

44:53

course they would agree to and it would become an

44:55

integrated component in Apple's iPhone

44:57

line afterward. And that's

45:00

where voice assistants kind of lived

45:02

for a few years. They mostly lived on smartphones

45:05

like the iPhone. But in November

45:07

two thousand fourteen, Amazon introduced

45:10

the Amazon Echo smart speaker,

45:12

which was originally only available for Prime

45:14

members, and it had its own virtual

45:17

assistant named Alexa, and

45:19

thus the smart speaker era officially

45:21

began. Now, there are plenty of

45:23

other smart speakers that are on the market

45:26

these days. There are products from Google

45:28

like Google Home. Uh, there are

45:30

Sonos speakers that can connect to services

45:33

like Amazon's Alexa or Google's Assistant,

45:35

and we're probably going to see a ton more,

45:38

both from companies that piggyback onto services

45:40

from the big providers like Google and Amazon,

45:43

and maybe some that are trying to make a go of

45:45

it with their own branded virtual assistants

45:47

and services. Smart speakers

45:50

respond to commands after they quote unquote

45:52

hear a wake up word or phrase.

45:55

Now, I'm gonna make up a wake

45:57

up phrase right now so that I

45:59

don't set off anyone's smart speaker

46:02

or smart watch or smartphone or smart

46:04

car or whatever it might be. So this

46:07

is just a fictional example of a wake up

46:09

phrase. So let's say I

46:11

have a smart speaker and the wake up

46:13

phrase for my smart speaker happens

46:15

to be hey there, Genie. Well,

46:18

my smart speaker has a microphone, so it can

46:20

detect when I say that, but

46:23

really it's constantly detecting all

46:26

sounds in its environment.

46:29

The microphone is always active. It has to be

46:31

in order to be able to pick up on when

46:33

I say the wake up phrase. So

46:37

the microphone is always active on most smart

46:39

speakers. There's somewhere you can

46:41

program it so that it will only activate

46:44

if you first touch the speaker and

46:46

that wakes it up. There's some that you

46:48

can do that with, But for the most part, they're

46:50

always listening. While the

46:52

speaker can quote unquote hear everything,

46:55

it's not listening to everything.

46:57

In other words, it's not

47:00

monitoring the specific things being said. At least

47:02

that's what we've been told. And honestly, that

47:04

makes a ton of sense from an operational

47:06

standpoint. And the reason I say that is

47:09

that the sheer amount of information

47:11

that would be flooding in from all the microphones

47:14

on all the smart devices from any one

47:17

provider that happened to be deployed

47:19

all over the world, that would be an astounding

47:21

amount of data. And sifting

47:23

through all that data to find stuff that's useful

47:26

would take an enormous amount of effort and time

47:29

and and processing power. So

47:31

while you could have all the microphones

47:33

listening in all over the place, finding

47:36

out who to listen to at what time

47:38

would be a lot trickier and probably not worth

47:40

the effort it would take to pull something like that

47:43

off. So what

47:45

these speakers and other devices are actually

47:48

doing is looking for a signal

47:50

that matches the one that represents the wake

47:52

phrase. So when I say, hey,

47:55

there, Genie, the microphone

47:57

picks up my voice, which the mic

47:59

then translates into an electrical signal

48:01

which gets digitized and compared against

48:04

the digital fingerprint of the predesignated

48:06

wake up phrase. And in this

48:08

case, the two phrases match. It's

48:11

like a fingerprint matching something

48:13

that was left at a site. So that

48:16

turns the speaker into an active

48:18

listener rather than a passive one. It's

48:20

ready to accept a command or

48:22

a question and to respond to

48:24

me. But if I didn't say,

48:26

hey there, Genie, then the speaker

48:29

would remain in passive mode because

48:32

it wouldn't have a digital fingerprint

48:34

that matches the one of the wake

48:36

up phrase. Everything stays at

48:38

the local level, and none of my sweet

48:40

secret speech gets transmitted

48:42

across the internet. It's all staying right there.
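As a hedged, very simplified sketch of that gate: the device keeps digitizing audio, reduces each chunk to a little feature vector, and compares it against the stored wake-phrase "fingerprint"; only a close match flips it into active listening. The feature vector, threshold, and helper names here are all invented, and real devices use small trained acoustic models on the device rather than a single distance test.

```python
# Hypothetical sketch of the always-listening wake-phrase gate.
import math
import random

WAKE_FINGERPRINT = [0.9, 0.1, 0.7, 0.3]  # stand-in for the stored wake-phrase signature
MATCH_THRESHOLD = 0.15                   # how close a chunk must be to count as a match

def features(audio_chunk):
    """Placeholder feature extraction; real systems compute spectral features."""
    return audio_chunk

def matches_wake_phrase(audio_chunk):
    feats = features(audio_chunk)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(feats, WAKE_FINGERPRINT)))
    return distance < MATCH_THRESHOLD

# Simulated always-on loop: chunks that don't match are discarded locally,
# and only a match switches the device into active listening.
for _ in range(5):
    chunk = [random.random() for _ in range(4)]  # stand-in for digitized microphone audio
    if matches_wake_phrase(chunk):
        print("Wake phrase detected: record the request and send it to the cloud.")
    # otherwise: keep listening; nothing leaves the device
```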

48:45

At least that's what we've been told. And

48:47

again I don't have any reason to disbelieve this,

48:50

but it is something to keep in mind. You are

48:52

talking about devices that have microphones.

48:54

Of course, if you have a smartphone, you've already got one

48:56

of those or a cell phone. In general, you've

48:58

got a device with a microphone on it

49:01

near you pretty much all the time. Now,

49:04

once I do make a request with my smart

49:06

speaker, the speaker then sends that request

49:09

up to the cloud where it gets processed,

49:11

It's analyzed, uh, and then

49:13

a proper response is returned

49:15

to me, whether that is playing

49:18

a song or giving me information I've asked

49:20

for, or maybe even interacting with some other smart

49:22

device in my home, such as adjusting

49:24

the brightness of my smart lights

49:26

in my house. Now, if the system

49:29

is not sure about whatever it was I just

49:31

said, it will probably

49:33

return an error phrase. So maybe

49:36

maybe I'm too far away from the speaker,

49:38

so it couldn't quote unquote

49:40

hear me really well. Or maybe I've

49:42

got a mouthful of peanut butter or something

49:44

as I am wont to do. Then I'm going to

49:47

get something like I'm sorry, I don't know how to do

49:49

that, or I'm sorry I didn't understand you, and

49:51

then I'd have to repeat it. Now, smart

49:53

speakers are pretty cool. However, they

49:55

do represent another piece of technology

49:58

that you have to network to

50:00

other devices, including your

50:02

own home network, and as such

50:05

that means that they represent a potential

50:08

vulnerability in a network. It

50:10

doesn't mean they're automatically vulnerable, but

50:13

it means that every time you are connecting

50:15

something to your network, then

50:18

you're creating another potential attack

50:21

vector for a hacker. Right

50:23

now, if everything is super strong, it

50:26

doesn't really effectively

50:29

change your safety in any

50:31

meaningful way. But if one of those

50:34

things that you connect to your network is less

50:36

strong than the others, you're looking at the

50:38

weakest link situation where a hacker

50:40

with the right know-how and tools could

50:42

potentially target that part of your

50:44

network to get entry into

50:46

everything else. And when you're

50:49

talking about a smart speaker, you're

50:51

talking about device that has an active

50:54

microphone on it. So potentially,

50:56

if someone were able to compromise a smart speaker,

50:59

they would be able to listen in on anything that

51:01

was within range of that smart

51:03

speakers microphone. So that's

51:06

why you have to at least be

51:08

cognizant of that, do your

51:10

research, make sure the devices you're connecting

51:12

to your network are rated well

51:15

from a security standpoint. When

51:17

you're setting things up and you have to create

51:19

passwords, create strong passwords

51:22

that are not used anywhere else.
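
If you want a quick way to do that, the Python standard library's secrets module will generate one for you; the length and character set below are just reasonable defaults for illustration, not a recommendation from the episode.

```python
# Minimal sketch: generate a long, random, single-use password with the
# standard library's secrets module (designed for security-sensitive randomness).
import secrets
import string

def strong_password(length=20):
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Use a different one for every device or account, and keep them in a password manager.
print(strong_password())
```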

51:24

The harder you make things the more

51:27

likely hackers will just pass you

51:29

by, not because you're

51:31

too tough to crack. Never get

51:33

it into your head that you're too strong

51:35

to be hacked, but rather

51:38

if there's someone who's weaker, then the

51:40

hackers are going to go after that person instead.

51:43

So just don't be the weak person. Practice

51:45

really good security

51:48

behaviors, and you're more likely

51:50

to discourage attackers and

51:52

they'll they'll go on to someone else.

51:55

Um, especially if you're talking

51:57

about newbies who don't really know their way

51:59

around, they're just using tools that other people

52:01

have designed. They get discouraged very

52:03

quickly. They'll move on to someone else because there's always

52:06

another potential target. I'm

52:08

curious about you guys, whether or not you

52:10

have any smart speakers in your life,

52:13

and uh if you find them useful. I

52:15

find mine pretty useful. I

52:17

use it for a very narrow range

52:20

of things. I don't tend to use it.

52:23

I definitely don't use it to its full potential. I

52:25

know that because once in a blue

52:27

moon, I'll just try something and

52:29

I'm amazed at what happens when

52:31

I get a response. But for

52:33

the most part, I'm asking about what

52:35

I can feed my dog, whether or

52:38

not it can turn on the lights and uh

52:40

and that's about it. Or

52:42

occasionally playing a song. Um,

52:45

but I'm curious what you guys are using them for. Reach

52:47

Out to me on social networks on Facebook

52:50

and I'm on Twitter, and the handle for both of those

52:52

is TechStuff HSW.

52:55

Also use those handles if you

52:57

have suggestions for future episodes. If you've got,

52:59

you know, an idea for either a company

53:01

or a technology or a theme in

53:04

tech you'd really like me to tackle, let

53:06

me know there and I'll talk to you

53:08

again really soon. Tech

53:15

Stuff is a production of iHeartRadio's How

53:17

Stuff Works. For more podcasts from

53:19

iHeartRadio, visit the iHeartRadio

53:21

app, Apple Podcasts, or wherever you

53:23

listen to your favorite shows.
