How Smart Speakers Work

Released Monday, 27th January 2020

Episode Transcript

0:04

Welcome to Tech Stuff, a production of

0:06

iHeartRadio's How Stuff Works. Hey

0:12

there, and welcome to tech Stuff. I'm your

0:14

host, Jonathan Strickland. I'm an executive producer

0:16

with iHeartRadio, and I love all things tech,

0:19

and guys, stick with me. I am

0:22

fighting off a cold. You'll

0:24

be able to hear it in my voice. I have no doubt.

0:26

But you know, I wanted to get you guys

0:29

a brand new episode. So we're gonna fight

0:31

on because the show must

0:34

keep going. I

0:36

think I think this is saying, oh no, this

0:38

cold medicine is good though. All right, Anyway, I thought

0:41

that we would do an episode about

0:44

smart speakers because I

0:46

wanted to kind of start this whole episode off

0:48

with with an old man observation,

0:51

you know, get off my lawn kind of thing.

0:53

And this is from our resident old man,

0:56

old man Strickland, that meaning

0:58

me. So, when I was young, speakers

1:01

were dumb. Now I don't. I don't mean

1:03

that speakers were useless, or

1:05

that they were terrible, or that they

1:07

were incapable of replicating certain

1:09

frequencies or volumes of sound,

1:12

or that they were limited in some other

1:14

way other than they didn't quote

1:16

unquote think they didn't

1:18

connect to any sort of computational

1:21

engine in a meaningful way. You might

1:23

have a set of speakers plugged into a computer,

1:25

but that was just a one way communications tool,

1:27

right. It was just a way to provide an outlet

1:29

for sound that your computer was generating,

1:32

nothing more than that. But contrast

1:34

that with today, when we have numerous

1:36

smart speakers on the market. These speakers

1:39

act as a user interface between

1:41

us and the Internet at large, often

1:43

facilitated by a virtual assistant

1:46

of some kind. Now with these

1:48

speakers, we don't just listen to stuff

1:50

like music and podcasts and

1:53

the radio and you know, other traditional

1:55

audio content. We use them

1:57

to find out information. We might

2:00

link them to our calendars so that

2:02

we can get reminders for upcoming appointments.

2:05

We probably use them to ask about the weather

2:07

report. I use mine at home

2:09

for that all the time, or even

2:11

more often than that, if you're at my house, you'll

2:14

hear us use it to find out which foods

2:16

are safe for us to feed to our dog. My

2:18

doggie, Tibolt, absolutely loves our smart

2:21

speaker because it frequently gives us permission

2:23

to spoil him with a carrot or

2:25

a piece of banana. But how

2:27

do these smart speakers work,

2:30

How are they able to respond to

2:32

our requests? And what are their

2:34

limitations? How safe are they?

2:37

That's the sort of stuff we're gonna be looking into in

2:39

this episode of tech Stuff, and we'll start

2:41

off with the basics, which means

2:43

we have to start off with how speakers work

2:46

in general. Now, this is something

2:48

that I've covered before on tech Stuff, but

2:50

I want to go over it again from a high level

2:52

because well, I just find it fascinating

2:55

that people figured out how to harness electricity

2:58

to drive a motor so that it could

3:00

in turn cause components

3:02

to replicate a recorded or transmitted

3:05

sound. And really, motor is being too

3:07

generous, but to drive an element to

3:09

create vibrations that could replicate a

3:11

sound that was made into another component,

3:14

that whole thing just boggles my mind that

3:16

people are smart enough to figure that out. Okay,

3:19

So to understand how speakers work,

3:21

it first helps to understand how sound

3:24

itself works. Sound is a

3:26

physical phenomenon. Do

3:28

do do do? Sound is all about vibrations,

3:31

and typically we experience sound

3:33

when we pick up on changes in air pressure

3:36

that enter through our ear

3:38

canal and then affect the tympanic membrane

3:40

or ear drum. So it's

3:42

all about these changes

3:44

of air pressure, all

3:46

about air molecules transmitting vibrations

3:49

from a source outward

3:51

in a radiating pattern from

3:53

that source. So let's think of

3:55

someone knocking on a door. For example,

3:57

you're inside a house, someone's knocking on your door.

4:00

When that person's hand hits the door,

4:03

it causes the door to vibrate, and

4:06

that vibration transmits to the surrounding

4:08

air molecules on the other side of the door.

4:10

They get pushed through that vibration

4:13

and then pulled when the

4:16

wood is vibrating back towards its

4:18

original position. So the

4:20

air molecules vibrate, those air molecules

4:23

cause the next surrounding layer of

4:25

air molecules to vibrate as well, and

4:27

so on and so forth. It's like a cascade

4:30

or domino effect. You get these little pockets

4:32

of high and low air pressure that travel

4:34

outward from that door.

4:37

It spreads further as it goes towards

4:40

you know, any distance, and if

4:43

you are close enough so that

4:46

you can still detect those changes in air pressure.

4:49

You experience this by hearing the knocking

4:51

on the door. Those vibrating air molecules lose

4:54

a bit of energy as they move

4:56

outward. Right, as they vibrate to the

4:58

next layer, you start to lose a bit of

5:00

energy with each transmission

5:02

of that. So the sound gets quieter

5:05

the further away you are because

5:07

there's not as many air molecules vibrating,

5:10

its amplitude has decreased. So

5:13

if you are in hearing range, you can

5:15

pick up on those changes of air pressure as they encounter

5:17

the tympanic membrane in your ear canal. Those

5:19

changes in pressure will cause a reaction in your

5:22

middle and inner ear that

5:24

will ultimately get picked up by

5:26

your brain that interprets it as sound.

5:29

Now, the frequency at which those fluctuations

5:32

occur relates to the pitch

5:35

that we hear, so faster

5:38

vibrations are higher

5:40

pitches, higher frequencies, higher

5:42

notes. If you think of a musical scale,

5:45

we perceive the force of the

5:47

changes as volume, so

5:50

lower forces, lower volume, right, and

5:53

higher forces, higher volume. The

5:55

human ear can hear a pretty decent

5:57

range of frequencies, from twenty hertz,

5:59

which means twenty cycles or

6:01

twenty waves per second

6:04

past a given point of reference,

6:06

to twenty kilohertz. That's twenty

6:09

thousand cycles or waves

6:11

per second. So yeah, the cycle refers

6:14

to the frequency of the sound wave.

6:17

The lower the frequency, the lower the pitch of the sound.
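
To make the frequency and amplitude idea concrete, here is a minimal Python sketch (not anything from the show, just an illustration) that synthesizes a pure tone and writes it to a WAV file with the standard library. The 440 Hz example tone and the file name are arbitrary choices.

```python
# Minimal sketch: frequency (in hertz) sets the pitch of a tone,
# amplitude sets its volume. Standard library only.
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second

def pure_tone(frequency_hz, amplitude, seconds=1.0):
    """Return 16-bit samples of a sine wave.

    frequency_hz: cycles per second; humans hear roughly 20 Hz to 20,000 Hz.
    amplitude:    0.0 to 1.0; bigger pressure swings sound louder.
    """
    samples = []
    for n in range(int(SAMPLE_RATE * seconds)):
        t = n / SAMPLE_RATE
        value = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(int(value * 32767))  # scale to the 16-bit signed range
    return samples

# A one-second 440 Hz tone (concert A) at half volume.
samples = pure_tone(440, 0.5)
with wave.open("tone_440hz.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(SAMPLE_RATE)
    f.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Raise the frequency and the pitch goes up; raise the amplitude and the volume goes up; neither changes the other.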

6:19

All right, and then our brain has to make meaning

6:21

of all this, Right, it's not just that it's

6:23

picking up on it. Our brain interprets

6:26

this and we experience

6:28

it as a sound

6:30

we have heard. So it either matches

6:33

this perceived sound with one we've encountered

6:35

before, and then we

6:38

say, oh, I know what that is. That's someone knocking

6:40

at the door. Or it

6:42

might be Holy Cala, I've never heard that

6:45

sound in my life. I have no idea what it is.

6:48

If the sound is language, then our brains

6:50

have to derive the meaning from the perceived

6:53

sound. We've heard someone say

6:55

words such as you're hearing me say this. Then

6:59

our brains have to take that

7:01

collection of sounds and say, what does that actually

7:03

mean? What is the context,

7:06

what is the intent? What is the

7:08

message here? Otherwise it would just

7:10

be you know, random noises

7:12

that I'm making with my mouth. Alright,

7:14

so we have a basic understanding of

7:16

the physics of sound. Now to talk about speakers

7:19

and microphones and the reason I'm

7:21

going to talk about both of them is that

7:24

the devices complement one another. You can

7:26

think of one as being the other in reverse.

7:28

Plus, with smart speakers,

7:30

we have to talk about microphones anyway, because

7:33

smart speakers have microphones as

7:35

well as the speaker element. So

7:38

you can think of this as one long process

7:40

of taking the physical phenomena of

7:42

sound waves, transforming

7:44

that physical phenomena into an electrical

7:47

signal, taking the electrical signal,

7:49

and changing it back into something that can produce

7:51

the sound waves that started the whole

7:53

thing. So you're replicating the original

7:56

sound waves with this end

7:59

device, which in this case is a loudspeaker.

8:01

So the microphone is the part of the process where

8:04

you take the sound and you turn it into an electrical

8:06

signal, and the speaker is where you take the

8:08

electrical signal and you turn it back into actual

8:10

sound. That's the simple way. But what's actually

8:12

happening? Well, let's

8:14

talk about it on a physical level. Sound

8:16

waves go into a microphone.

8:19

So you've got these fluctuations

8:22

in air pressure that encounter a microphone.

8:24

I'm speaking into a microphone right now,

8:27

so this is happening right now. Inside

8:29

the microphone is a very thin diaphragm,

8:32

typically made out of a very flexible

8:34

plastic, and it's sort

8:36

of like the skin of a drum. So

8:38

as the changes in air pressure encounter

8:41

the diaphragm, they cause the diaphragm

8:43

to move back and forth. Well. Attached

8:46

to the diaphragm is a coil of

8:48

conductive wire, and that coil

8:50

wraps either around or near

8:52

a permanent magnet. Magnets have

8:54

magnetic fields. They have a north pole

8:57

and a south pole, and there's a magnetic field

8:59

that surrounds the magnet.

9:02

And the electro magnetic effect means

9:05

that if you move a coil

9:07

of conductive wire through

9:09

a magnetic field, it will produce

9:11

a change in voltage in that coil,

9:14

otherwise known as electromotive

9:16

force, and that means electrical current

9:19

will flow through the coil. Now,

9:21

if you have the end of that coil attached

9:24

to a wire, a conductive

9:26

wire for that current

9:28

to flow through, you can send that current

9:30

onto other components. So for our

9:33

purposes, the component in question

9:35

would be an amplifier, and I'll get

9:37

to explaining why that is in just a

9:39

moment, but first let's talk about loud

9:41

speakers, and the way a loudspeaker works

9:44

is essentially the reverse

9:46

of a microphone. You've got your permanent

9:48

magnet around or near which

9:51

is a coil of conductive wire. The

9:53

wire is connected to a diaphragm,

9:56

one much larger and typically made

9:58

out of stiffer material than the plastic

10:01

you'd find in a microphone. This

10:03

is the element inside a speaker that will

10:05

vibrate, that will push air and pull

10:08

air as it moves either

10:10

outward or inward. The electrical

10:12

signal comes from a source such as

10:15

the microphone we were just using a second

10:17

ago that comes into the loudspeaker

10:19

and it flows through the coil. Now,

10:22

when you have an electrical current flowing

10:24

through a conductive coil, you

10:27

generate a magnetic field because of

10:29

the laws of electromagnetism. You've

10:31

got the electro magnetic

10:34

field generated as a result. Now

10:36

that field will interact with the magnetic

10:38

field of the permanent magnet. The

10:41

permanent magnet always has a magnetic field.

10:43

The coil only has one when electric

10:45

current is flowing through it. And

10:47

as I said, we know magnets have a north

10:50

pole and a south pole. And we also know

10:52

that when we bring two magnets with

10:54

their north poles together, they'll

10:56

push against each other, right because like

10:59

repels like, But if

11:01

we turn one of those magnets around so that

11:03

now it's a south pole and a north pole,

11:06

they attract one another, you

11:08

know, opposites attract. So

11:11

by having this magnetic

11:14

field being generated by the coil, uh,

11:17

it starts to generate

11:20

interactions with the magnetic field of the permanent

11:23

magnet, so they

11:25

start to push and pull against each other. Well,

11:28

the coil is attached to that diaphragm,

11:30

so it in turn drives the diaphragm

11:33

to either push outward or pull inward.

11:36

That causes air molecules

11:38

to vibrate, just as it would

11:41

with any other you know, source of sound,

11:43

and it emanates outward from the loudspeaker,

11:46

so you get a representation

11:48

of the same sound that was going into

11:51

the microphone got converted

11:53

into an electrical current. The electrical

11:55

current then was passed

11:58

through a coil and next to a permanent

12:00

magnet to create the same sort

12:02

of movement. It replicates the movement of the

12:04

original diaphragm in the microphone and

12:07

generates the sound. So

12:09

you get the replication of the sound

12:11

that was made in the other location. It's

12:14

pretty cool, I think. Now, I did

12:16

mention earlier that you would need

12:18

an amplifier. And the reason you need

12:20

an amplifier is that the electrical signal

12:22

generated by a microphone is

12:24

far too weak to drive a loud

12:27

speaker's diaphragm. You just wouldn't

12:29

have the juice to do it. It would be

12:32

much much less, uh powerful

12:34

than what the speaker would need. So chances

12:36

are the diaphragm would either not move at

12:39

all because it would just be too stiff, it would resist

12:41

the movement too much, or it would move

12:44

so weakly as to generate little

12:46

to no sound, so it wouldn't do you any good. So

12:49

the signal from the microphone has to

12:51

first pass through an amplifier, which, as the

12:53

name implies, takes an incoming signal

12:55

and increases the amplitude of

12:57

that signal, the volume. In other words,

13:00

uh so, it doesn't affect pitch, but it does

13:02

affect the signal strength and consequently the

13:04

volume. And I've done episodes

13:06

about amplifiers, including explaining

13:09

the difference between amplifiers that use vacuum

13:11

tubes and ones that use transistors,

13:14

so I'm not going to go into that here.
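
As a toy illustration of the point just made, and nothing more than that: an ideal amplifier simply multiplies every sample of the signal by a gain factor, which changes the amplitude (volume) without changing the frequencies (pitch). The sample values below are made up.

```python
# Toy sketch of an ideal amplifier: scale every sample by a gain factor.
# Volume changes; the frequency content (pitch) does not.
def amplify(samples, gain):
    out = []
    for s in samples:
        v = int(s * gain)
        out.append(max(-32768, min(32767, v)))  # clip to 16-bit range to avoid overflow
    return out

# A weak microphone-level signal boosted enough to drive a loudspeaker.
mic_signal = [12, 25, 31, 25, 12, 0, -12, -25, -31, -25, -12, 0]
speaker_signal = amplify(mic_signal, gain=100)
print(speaker_signal)
```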

13:16

Besides, it doesn't really factor

13:18

into our conversation about smart speakers

13:20

anyway. It's just important for

13:23

it to work with a microphone and speaker

13:25

setting. Now, over the years, engineers

13:27

have paired microphones and speakers in lots

13:30

of stuff. You've got telephones, you've

13:32

got intercom systems, public address

13:34

systems, handheld radios, all sorts of

13:36

things, so that technology was

13:38

well and truly mature. Before

13:40

we ever got our first smart speaker,

13:43

there wasn't much call to incorporate

13:45

microphones into home speaker systems

13:48

for many years. I mean, what would

13:50

you actually use a microphone embedded

13:52

in a speaker for? Before smart speakers,

13:54

Typically you would have your speakers

13:56

like I'm talking about, like sound system speakers.

13:59

You would have them hooked up to some other dumb

14:02

as in, not connected to a network

14:04

technology. So it might be a sound

14:06

system or home entertainment set

14:08

up with a television as the focal point, or maybe

14:11

even you know, a computer for the purposes

14:13

of playing more dynamic sounds for like video

14:15

games and and things like that.

14:18

Um. But for a very long time,

14:21

these were all thought of as one way communications

14:23

applications, right, Like, the sound was

14:25

coming from a source and it would get to us

14:27

through the speakers, but we weren't meant to send

14:31

sound back through those same channels.

14:33

The information was just coming to you. You weren't

14:35

sending anything back, But that would all change

14:38

in time. Now. One thing to keep in

14:40

mind about smart speakers is that

14:42

they are the product of several different technologies

14:45

and lines of innovation and development that all

14:47

converged together. The microphone

14:49

and speaker technology is one of the oldest

14:52

ones that we can point to as far as the

14:54

fundamental underlying technology

14:56

is concerned, the stuff that's been

14:58

around since the late nineteenth century.

15:00

Now there is one other we'll talk about that's

15:02

even older. But I don't

15:04

want to spoil things. I'll just mention there

15:07

is an even older line of

15:09

development that goes into smart speakers

15:12

than the microphone speaker stuff of

15:14

the nineteenth century. Most of the

15:16

other components, however, are much younger

15:18

than that. One big one is

15:21

speech or voice recognition. Creating

15:24

computer systems that could detect noise

15:26

was relatively simple. Right. You could have a computer

15:29

connected to microphones and they

15:31

could monitor the input from those

15:33

microphones and any incoming

15:35

signal could be registered. Right, they could

15:37

record an incoming signal that

15:39

would indicate the microphone had detected a

15:41

noise. That's child's play. That's

15:43

easy to do. But teaching computers

15:46

how to analyze those signals and decipher

15:48

them so that the computer could display

15:50

in text or otherwise act

15:53

upon that that sound

15:55

in a meaningful way that was much

15:57

more difficult. There was

16:00

an IBM engineer named William

16:02

C. Dersh of the Advanced Systems

16:04

Development Division who created an early

16:07

implementation of voice recognition. It

16:09

was a very limited application, but it

16:11

proved that the ability to interact

16:13

with computers by voice was more

16:15

than just science fiction. Within

16:18

IBM, it was called the Shoebox.

16:21

Dersh worked on this project in

16:23

the early nineteen sixties and what he

16:25

produced was a machine that had a

16:27

microphone attached to it. The

16:29

machine could detect sixteen spoken

16:32

words, which included the digits

16:34

of zero to nine plus

16:37

some command indicators like plus,

16:39

minus, total, and subtotal.

16:42

You get the idea. So you could speak a

16:44

string of numbers and then commands

16:46

to this device, then ask it to total

16:48

everything and it would do so. So it was

16:51

more or less a basic calculator

16:53

with some voice interpretation incorporated

16:56

into it. Now there's a

16:58

great newsreel piece about this

17:01

shoebox. There's a demonstration of it, and

17:03

it came out in nineteen sixty-one, and

17:05

I love that newsreel because it has

17:08

that great music you would hear in the background of

17:10

those old industrial and business films.
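
Just to give a feel for the calculator behavior described a moment ago, here is a hypothetical sketch that works on already-recognized word tokens. It is purely illustrative; it says nothing about how the real IBM hardware actually performed its recognition or arithmetic.

```python
# Hypothetical sketch of Shoebox-like behavior: a small vocabulary of digits
# and arithmetic commands applied to a stream of recognized words.
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

def run(words):
    total = 0
    number = 0
    sign = 1
    for word in words:
        if word in DIGITS:
            number = number * 10 + DIGITS[word]   # build up a multi-digit number
        elif word == "plus":
            total += sign * number                # commit the number, next one adds
            number, sign = 0, 1
        elif word == "minus":
            total += sign * number                # commit the number, next one subtracts
            number, sign = 0, -1
        elif word == "total":
            total += sign * number
            return total
    return total

# "35 plus 10 minus 7 total" -> 35 + 10 - 7 = 38
print(run("three five plus one zero minus seven total".split()))
```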

17:12

Anyway, there's also a helpful chart

17:14

that hangs in the background of

17:17

that video where Dersh is

17:19

actually explaining how it works. You

17:21

can see a little bit behind him

17:24

what is actually being analyzed,

17:27

and uh he broke the words down

17:29

into phonemes and syllables, so

17:31

phonemes being specific sounds

17:34

that make up words. So,

17:36

for example, the digit one is

17:39

a single syllable word with a vowel

17:41

sound right at the front. But you also

17:43

have the word eight that's

17:46

another single syllable word with

17:48

a vowel sound right at the front, but

17:50

it's different from one phonetically

17:53

in that eight also has a

17:55

plosive and has that hard t at

17:57

the end. So the shoebox

17:59

was limited not just in what

18:02

words it could recognize, but also the

18:05

types of voices it could recognize.

18:07

Get someone who has a different dialect or

18:09

manner of speech, and the machine might not be able to

18:12

understand them because they're not pronouncing

18:14

the words the same way that Dersh did.

18:17

This would be a big challenge in

18:20

speech recognition moving forward, and

18:22

it's also an example of where we

18:24

find bias creeping into technology.

18:27

And it's not necessarily a conscious thing,

18:30

but if you have people designing

18:32

a system and they're designing it based

18:34

off their own uh,

18:36

you know, speech patterns, their own

18:39

pronunciations, their own dialects,

18:41

then it may be that the system

18:44

they create works really well for them

18:46

and less well for anyone who isn't

18:48

them, And the further away you are from

18:50

their manner of speaking, the

18:53

more frustration you will encounter

18:56

as you try to interact with that technology.

18:59

That's an example of bias, and in

19:01

fact, if you read the histories

19:03

of speech recognition and as we'll get

19:05

too later natural language processing,

19:08

you'll see a lot of people say it works

19:10

great if you happen to be a white

19:12

man, because the

19:15

manner of speech was being or

19:17

the people who were designing it were primarily

19:19

white men who were uh

19:22

typically aiming for what

19:25

is considered a non accented

19:27

American dialect somewhere

19:30

in you know, the Eastern seaboard

19:32

side. But that meant

19:34

that if you did have an accent or a dialect,

19:37

or you had a different vernacular,

19:40

that it was harder for the systems

19:42

to actually understand what you were saying. That's

19:44

an example of bias. Well. The

19:46

general strategy was again to break up

19:49

speech into its constituent sound units, you

19:51

know, those phonemes, and then to suss out

19:53

which words were being spoken

19:55

based on those phonemes, and

19:57

that was done by digitizing the voice,

20:00

transforming it from sound into data

20:02

that represented stuff like the sound's

20:04

frequency or pitch, and then

20:07

matching up specific signal

20:09

signatures with specific phonemes.
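
One small, hedged example of what "turning sound into data" can look like: a brute-force discrete Fourier transform over a digitized snippet to estimate its dominant frequency. Real recognizers extract much richer features than a single frequency, but the spirit is the same. The sample rate and the 200 Hz test tone below are invented for illustration.

```python
# Sketch: estimate the dominant frequency of a digitized snippet with a
# plain discrete Fourier transform (standard library only).
import cmath
import math

SAMPLE_RATE = 8000  # samples per second

def dominant_frequency(samples):
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # check each frequency bin up to the Nyquist limit
        s = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_bin, best_mag = k, abs(s)
    return best_bin * SAMPLE_RATE / n  # convert the bin index to hertz

# A 200 Hz test tone; the estimate should come back as roughly 200.0.
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(200)]
print(dominant_frequency(tone))
```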

20:11

So generally the idea was that the computer system would

20:13

monitor incoming sound, convert the

20:15

sound into digital data, compare that

20:18

data it had received with information

20:20

stored in a database, in an effort to

20:22

look for matches. The Shoebox

20:25

database was just sixteen words in size.

20:27

Later ones would be much larger, but pretty quickly

20:30

people realized this was

20:32

not an efficient way of doing

20:34

speech recognition because the bigger

20:36

the vocabulary, the more work-intensive

20:38

it was to build out those databases.

20:41

So it wasn't something that people thought would

20:43

be sustainable for very

20:45

large vocabularies. But the Shoebox

20:48

marked the beginning of a serious effort to create machines

20:50

that could accept audio cues as actual input,

20:52

and as we'll see, that's one important

20:55

component for these smart speaker systems.

20:57

I've got a lot more to say, but before I get into

20:59

the next part, let's take a quick break.

21:09

Now, obviously we didn't jump right

21:11

into full voice recognition right after

21:13

IBM's Shoebox innovation. The

21:16

challenges related to building automated

21:18

speech recognition systems were numerous,

21:20

even for just a single language,

21:23

because, as I said, you can have accents and

21:25

dialects. One voice can have a

21:27

very different tonal quality from another,

21:30

people speak at different speeds. Teaching

21:32

machines how to recognize speech when the phonemes

21:34

and pacing of that speech aren't

21:37

consistent from speaker to speaker, that's

21:40

really hard. This kind of gets back to

21:42

the same sort of challenges you have when you're teaching

21:44

machines how to recognize images. You

21:47

know, you teach a human what a

21:50

coffee mug is. I always use this example,

21:52

but you teach a human what a coffee mug is, and

21:54

pretty soon they can extrapolate from

21:56

that example and understand

21:58

that coffee mugs can come in all different sizes

22:02

and colors, and you know different

22:04

designs and textures. We

22:07

get it. Like, you see a couple of coffee mugs,

22:09

you understand. Machines, though,

22:12

they aren't able to do that. Machines,

22:14

you know, you have to give them lots and lots

22:17

and lots of different examples before they can

22:19

start to pick up on what things

22:22

actually make a coffee mug. Same

22:24

sort of thing with speech, right, So

22:27

if you don't have consistency between

22:29

speakers, it makes it very hard for machines

22:31

to learn what people are saying. Now,

22:33

it didn't take long for the tech industry at

22:35

large to really dive into trying to solve

22:38

this problem. In nineteen seventy-one, DARPA,

22:41

that's the Research and Development division of

22:43

the United States Department of Defense,

22:45

got behind speech recognition in a big

22:47

way. Now, remember, DARPA itself

22:50

doesn't do research. The organization's

22:53

purpose is to invite organizations

22:55

to pitch projects that align

22:57

with whatever DARPA's goals are, and

23:00

DARPA would provide funding to the

23:02

winning organizations to see

23:05

these projects to completion if possible.

23:07

So DARPA is really more of a vetting and funding

23:10

organization. Anyway, in

23:12

nineteen seventy-one, DARPA created

23:14

a five year program called Speech

23:16

Understanding Research, or S U R.

23:18

The initial goal was

23:20

pretty darn ambitious considering the capabilities

23:23

of the technology at the time. The project

23:25

director, Larry Roberts, wanted

23:27

a system that would be capable of recognizing

23:29

a vocabulary of ten thousand words

23:32

with less than ten percent error. After

23:34

holding a few meetings with some of the leading

23:37

computer engineers of the day, Roberts

23:39

adjusted that goal significantly.

23:42

After that adjustment, the target was going

23:44

to be a system capable of recognizing

23:47

one thousand words, not ten

23:49

thousand. Error levels

23:51

still had to be less than ten percent, and

23:54

the goal was for the system to be able to accept

23:56

continuous speech, as opposed

23:58

to very deliberate

24:01

speech with pauses

24:05

between each pair of words, which would

24:07

not be really that useful.

24:11

One person who was skeptical about

24:13

the potential success of this project

24:15

was John R. Pierce of Bell

24:17

Labs. He argued that any

24:19

success would be limited so long

24:21

as machines remained incapable of understanding

24:24

the words, not just recognizing

24:27

a word based on phonemes, but understanding

24:29

what the word is. That is, Pierce felt that

24:31

the machines needed some way to parse the

24:33

language to get to the meaning of what was

24:35

being said. That's an important

24:37

idea that we will come back to in just a bit now.

24:40

Among the companies and organizations that landed

24:42

contracts with DARPA were Carnegie

24:44

Mellon University, BBN, which

24:46

actually played a big part in developing ARPANET,

24:49

the predecessor to the Internet, Lincoln

24:51

Laboratory, and several more and

24:54

very smart people began to create systems

24:56

intended to recognize speech in meaningful

24:58

ways. The names of the programs

25:00

were a lot of fun. There was H W I

25:03

M, which stood for Hear What

25:05

I Mean, as in hear, as in listen,

25:07

hear what I mean. That one was from BBN.

25:10

CMU introduced Hearsay,

25:12

which was later designated as Hearsay one,

25:15

and then they came out with Hearsay two. They

25:17

also would demonstrate another

25:20

one called Harpy.

25:22

Oh, and there was a professor at CMU named Dr

25:24

James Baker who would design a system

25:27

called Dragon in nineteen seventy

25:29

five that he would later leverage into a company

25:31

with his wife, Dr Janet M. Baker

25:34

in the nineteen eighties, and they had a very successful

25:36

business with speech recognition software.

25:39

Now, I'm not going to go into each of those programs

25:42

in deep detail, but rather just mention

25:44

that they all helped advance the cause of

25:46

creating systems that can recognize speech.

25:49

One of the big developments that came out of all

25:51

that work was a shift to probabilistic

25:54

models, which would also play a really

25:56

important part in another phase of

25:58

developing the smart speaker. So what do

26:00

I mean when I say probabilistic? Well,

26:02

as the name indicates, it all has

26:04

to do with probabilities. Essentially,

26:07

systems would analyze incoming phonemes

26:09

and make guesses as to what

26:11

was being said based on the probability

26:14

of it being a given word or part

26:16

of a word. The systems typically

26:18

go with whatever word has the highest

26:20

probability of being the correct one.
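
Here is a toy sketch of that "pick the most probable word" idea, with every number invented for illustration: each candidate word gets a score for how well it matches the sound and a score for how likely it is to follow the previous word, and the highest combined score wins. It happens to use a homonym pair, which is exactly the kind of nuance discussed next. Real systems are far more sophisticated, but the shape is similar.

```python
# Toy probabilistic decoder: combine an acoustic match score with a crude
# "how likely is this word to follow the previous one" score, then take the max.
acoustic_score = {"right": 0.48, "write": 0.47, "rite": 0.05}   # how well each word fits the sound
follows = {                                                     # made-up bigram probabilities
    ("am", "right"): 0.30, ("am", "write"): 0.01, ("am", "rite"): 0.01,
    ("to", "right"): 0.02, ("to", "write"): 0.40, ("to", "rite"): 0.01,
}

def best_word(previous_word, candidates):
    return max(candidates,
               key=lambda w: acoustic_score[w] * follows.get((previous_word, w), 0.001))

print(best_word("am", ["right", "write", "rite"]))  # -> right  ("am I right?")
print(best_word("to", ["right", "write", "rite"]))  # -> write  ("to write a sentence")
```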

26:23

Even with that approach, there are nuances

26:26

to language that are difficult to account for with

26:28

a machine. So, for example, you have homonyms

26:30

in which you have two words that sound the same

26:33

but have very different meanings and potentially

26:35

spellings like right as

26:37

in to write a sentence, or

26:40

right as in am I right? Or

26:42

am I wrong? Or you could have a pair

26:44

of words that sound like

26:46

a single word and have confusion

26:48

there, such as a door. You

26:51

can say a door, meaning a

26:53

single door, a door to go into

26:55

a building, or you might say adore as

26:57

in I adore this podcast

26:59

you're doing, Jonathan. That's sweet of you, Thank

27:02

you for saying that. So computer

27:04

scientists were hard at work advancing

27:07

both the capability of machines to make

27:09

correct guesses at individual phonemes

27:12

and then full words, as well as

27:14

figuring out a way to teach machines to adjust

27:16

guesses based on context. That

27:19

requires a deeper understanding of the language

27:22

within which you're working. If you're aware

27:24

of certain idioms, you can make a

27:26

good guess at a word or phrase even if

27:28

you didn't get a clean pass at

27:31

it right. So, for example, the

27:33

phrase it's raining cats and dogs

27:35

just means it's raining a lot. And if

27:37

a system included a database that indicated

27:40

the phrase cats and dogs sometimes

27:42

follows the phrase it's raining, then

27:45

the system is more likely to guess the correct

27:48

sequence of words instead

27:50

of guessing something that sounded similar

27:52

but it's wrong. For example, if it

27:54

said, oh, they must have said it's raining bats

27:57

and hogs, that

27:59

would not make sense. So

28:02

the systems estimate the probability

28:04

that any given sequence of sounds within

28:07

the database matches what the systems

28:09

have just quote unquote heard. Progress

28:11

in this area was steady, but slow, and

28:14

I'd argue that it was also a reminder that concepts

28:16

like Moore's law do not apply universally

28:19

across technology. Rapid development

28:21

in one particular domain of technology is

28:24

not necessarily an indicator that the same sort

28:26

of progress will be observed in all other

28:28

areas of tech. We often

28:32

get into the mistaken habit of

28:34

believing that Moore's law applies to everything. Alright.

28:37

So a related concept to

28:39

voice recognition is something called natural

28:42

language processing, and this relates

28:44

back to how we humans tend to process information

28:47

compared to the way machines tend to do

28:49

it. So we humans formulate ideas,

28:51

we shape those ideas into words and sentences.

28:54

We communicate them in some way to other

28:56

people through that language. It

28:59

may be through speech, it may be through text.

29:01

It may even be through a nonverbal or non

29:04

literary way, but we communicate

29:07

those ideas. Machines typically accept

29:09

input, they perform some process

29:12

or sequence of processes on that input,

29:15

and then they supply an output of some

29:17

sort. Machines do this in machine

29:19

language. That's a code that's far too

29:22

difficult for humans to process easily.

29:24

Binary is an example of machine

29:26

language. Binary is represented as zeros

29:29

and ones, which when grouped together can

29:31

represent all sorts of stuff. But if you just

29:33

looked at a big block of zeros

29:35

and ones, it would mean nothing to you. It's

29:38

not easy for humans to use, and then

29:40

machines in turn are not natively

29:42

able to understand human language, so there's

29:44

a language barrier there. Because

29:47

of that, people created different

29:49

programming languages. These languages

29:52

provide layers of abstraction from

29:54

the machine language. They make it easier

29:56

to create programs or directions

29:59

that the computer should follow. So the

30:01

person who's doing the programming is using

30:03

a programming language that's easy for humans

30:05

to use that then gets converted

30:07

into machine language that the computers

30:10

understand. But what if you could send

30:12

commands to a computer using natural

30:14

language, not even programming language.

30:16

You could just speak in plain

30:19

vernacular, whether it's English or

30:21

any other language, the way humans

30:24

communicate with one another. What if a

30:26

computer could extract meaning from

30:28

a sentence, understand what it

30:30

was you wanted the computer to do, and

30:32

then respond appropriately. So imagine

30:34

how much time you could save if you could just tell your

30:36

computer what you wanted it to do, and

30:38

it took care of the rest. If

30:40

you had a powerful enough computer system

30:43

with strong enough AI,

30:45

maybe you could even potentially do something like describe

30:48

a game that you would love to be able

30:50

to play, like not not a game that

30:52

exists, a game in your head, and

30:55

you could describe it to a computer and the computer

30:57

could actually program that game. Well,

30:59

we're definitely not anywhere close to that

31:02

yet, but we've made enormous progress with natural

31:04

language processing. Now, the history

31:06

of natural language processing isn't

31:08

exactly an extension of voice

31:11

recognition. It's actually more like a parallel

31:13

line of investigation. And

31:16

that's because natural language processing doesn't

31:18

require voice recognition. You

31:20

can have an implementation in which you just

31:23

write commands in natural language,

31:25

you know, you type them out on a keyboard and

31:27

the machine then carries out those

31:29

instructions. So much of the

31:31

early work in natural language processing was in

31:33

text based communication rather

31:35

than in speech. The history of natural

31:38

language processing includes stuff like

31:40

the Turing test, named after Alan

31:42

Turing. So the most common interpretation

31:44

of the Turing test these days is

31:47

that you've got a scenario in which a person is

31:49

alone in a room with a computer terminal,

31:51

they can type whatever they like into the computer

31:54

terminal, and someone or something

31:56

is responding to them in real time. Now

31:59

it might be another person, or it might

32:01

be a computer system that's responding

32:03

to that person. You run

32:06

a whole bunch of test

32:08

subjects through this process,

32:10

and if the computer system is able to fool a

32:12

certain percentage of those test subjects,

32:15

like say thirty percent of them,

32:17

into thinking that it is in fact another human and

32:19

not a computer, it is said to have

32:21

passed the Turing test, And

32:24

typically we use that to mean the machine has given

32:26

off the appearance of possessing intelligence

32:29

similar to the one that we humans

32:31

possess. That gets beyond

32:33

our scope for this episode, but

32:35

it helps point out that stuff like speech recognition

32:38

and natural language processing are

32:40

both closely related to the field of artificial

32:42

intelligence. In fact, they really belong within

32:45

the artificial intelligence domain. The

32:47

Turing test was more of a hypothetical.

32:50

It was a bit of a cheeky way of saying, Hey,

32:53

if you can't tell whether or not something is

32:55

intelligent, it makes sense to treat

32:57

it as if it actually is intelligent.

33:00

After all, we assume that every human

33:02

with whom we interact possesses

33:04

some level of intelligence based on those

33:06

interactions, so why should we not

33:08

extend the same courtesy to machines. Now,

33:12

natural language processing would prove to be

33:14

another super challenging problem to solve.

33:17

In computer science. Early work was

33:19

done in translation algorithms,

33:21

and these were programs that attempted to take phrases

33:23

written in one language and translate those

33:25

automatically into a second

33:27

language. At first, that seemed pretty straightforward,

33:30

but you realize that's also pretty

33:32

tricky. Really. For one thing, you

33:35

can't just translate word for word and

33:37

keep the same order from one language

33:39

to another. The syntax or

33:41

the rules that the language follows,

33:44

uh, they could be different from language to language.

33:47

In one language, you might use an infinitive

33:49

such as to record, in the middle

33:51

of a sentence, while another language might

33:53

put all the infinitives at the end of a sentence.

33:56

So in one language, I might say

33:58

I'm going to record a podcast in the

34:00

studio right now, but in another

34:02

language it might come out as I'm going a

34:04

podcast in the studio right now to record.

34:07

It starts to sound like yoda. There

34:10

was initial excitement around machine

34:12

translation, but once computer scientists

34:14

and linguists began to see the scope

34:16

of this challenge, their excitement

34:19

faded a bit. Also, there was a lot

34:21

of other stuff going on in the nineteen sixties

34:23

and seventies that was demanding a lot of attention,

34:26

such as the Space race. So for

34:28

a while, this branch of computer science

34:30

was given less attention than

34:32

other branches, and by less attention, I

34:35

really mean funding. Now, when

34:37

we come back, we'll talk a bit more about

34:39

the advances that were necessary to support natural

34:41

language processing, and we'll move on to how

34:44

this would be another important component in

34:46

smart speakers. But first, let's take

34:48

another quick break. Okay,

34:57

So early enthusiasm for

34:59

natural language processing created

35:02

a bit of a hype cycle that ultimately

35:04

crashed into the telephone pole

35:06

of unmet expectations. That

35:10

was a really bad metaphor. Anyway,

35:13

natural language processing went

35:15

through something similar to what we saw

35:17

with virtual reality in the nineteen nineties.

35:19

You know, people saw what was actually

35:22

achievable, and then

35:24

they compared that to what they thought they

35:26

were going to get, and those two things

35:28

didn't match up at all, and that really

35:30

pulled the rug out of funding for natural

35:33

language processing, which meant, of course,

35:35

that progress slowed way down.

35:37

It kept going, but it

35:40

was definitely on the back burner for a lot

35:42

of projects. When interest renewed

35:44

in the nineteen eighties, there had been

35:46

a shift in thinking around natural

35:49

language processing. Computer scientists

35:51

were starting to look at statistical approaches

35:53

similar to what was going on with speech

35:55

recognition, building up probabilistic

35:58

models in which a computer can start making

36:00

what amounts to educated guesses

36:03

at the meaning of a command or a

36:05

phrase. Machine learning became

36:07

an important component on the back

36:09

end of these systems, and later artificial

36:12

neural networks became an important part

36:14

as well. A neural network processes

36:17

information in a way that's sort of analogous

36:19

to how our brains do it. You

36:21

have nodes or neurons

36:23

that connect to other nodes, and

36:26

each node affects incoming data

36:28

in a certain way, performing some sort of operation

36:31

on it, and the degree to which they

36:33

do that in one way versus another

36:35

is called the weight of that

36:37

node. Computer scientists

36:40

apply weights across the nodes

36:42

in an effort to get a specific result

36:44

in order to train these models.

36:46

So you might feed a specific command

36:49

into such a system, and you let

36:51

it go through the computational process

36:53

from the beginning of the neural network through

36:55

to the end, and then you look at the result,

36:58

and if the result is correct, well,

37:00

that just means the system is already working as

37:02

you intended it, which honestly

37:04

is not likely to happen early on.

37:07

But if it's not correct, then you

37:09

start adjusting the weights on those

37:11

nodes in order to affect

37:13

the outcome. I almost think of it as like Plinko

37:16

or pachinko, where you've got the little coin

37:18

and you drop it down and it bounces on

37:20

all the pegs and sometimes

37:23

you might think, all right, well, this time it's going to

37:25

go right for that center slot, but it

37:27

doesn't, and you think, well, maybe if I remove

37:29

some of these pegs or I shift

37:31

these pegs over a little bit, I can drop

37:33

it in that same spot and hit the center.

37:36

It's kind of like that, except you're talking about data,

37:38

not physical moving parts. So

37:41

you have to do this a lot, like up

37:43

to like millions of times

37:46

in order to try and train a system so

37:49

that it responds appropriately to commands.
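
Here is a very small, hedged sketch of that adjust-the-weights loop: one artificial "node" learning a trivial mapping (an AND gate) by nudging its weights after every wrong guess. Production speech and language models involve vastly more nodes and weights, but the training idea is recognizably the same. The learning rate and pass count below are arbitrary.

```python
# Toy sketch of training a single node: nudge the weights until the
# output matches the desired result for every example.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tiny training set: two binary inputs, one target output (an AND gate).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5

for _ in range(5000):                      # many, many passes, as described above
    for (x1, x2), target in examples:
        output = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
        error = target - output            # how far off the guess was
        # Nudge each weight in the direction that shrinks the error.
        weights[0] += learning_rate * error * x1
        weights[1] += learning_rate * error * x2
        bias += learning_rate * error

for (x1, x2), target in examples:
    prediction = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
    print((x1, x2), "->", round(prediction, 2), "target:", target)
```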

37:51

And once it's trained, you can then test

37:53

new commands on the system to see if it can

37:56

parse them and respond appropriately. And

37:58

in this way, the system quote unquote learns

38:01

over time how to respond

38:03

to commands. And then we have another

38:05

component that's important with smart speakers,

38:08

and that's speech generation. So

38:10

it's one thing to have a machine either broadcast

38:13

or play back a recording

38:15

of speech. It's another thing for a

38:17

machine to generate brand

38:19

new speech. In computer science,

38:22

we call it speech synthesis. Now,

38:24

this is the really old technology

38:26

I was alluding to at the beginning of this episode,

38:29

speech synthesis. If you

38:31

want to be really, you know, kind of technical

38:33

about it, it actually predates every

38:36

other technology I've mentioned up to this

38:38

point, at least in its most

38:40

rudimentary implementations. You

38:42

have to go way back to the eighteenth

38:44

century, the seventeen seventies. That's

38:47

when a Russian smarty pants named Christian

38:49

Kratzenstein was building

38:51

a device that used acoustic resonators.

38:54

These were reeds that would vibrate,

38:57

and it was in an attempt to replicate

38:59

basic vowel sounds. Now,

39:01

even with such a working device, it would

39:04

be really difficult to communicate anything meaningful

39:06

unless you were, I guess, speaking

39:08

whale like Dory in Finding Nemo.

39:10

But it would be an early example of how people tried

39:13

to create mechanical systems that could

39:15

replicate speech or elements of

39:17

speech. Another inventor named

39:19

Wolfgang von Kempelen built

39:22

an acoustic mechanical speech machine

39:25

and that used reeds and

39:27

tubes and a pressure chamber, and

39:29

it was all meant to replicate various

39:31

speech sounds. He had other elements to

39:33

create sounds like plosives, those

39:36

hard sounds that I mentioned

39:38

earlier in the episode. So

39:40

he had all these different elements that, working

39:42

together, could create parts

39:45

of the sounds that we humans

39:47

make when we speak. He also built

39:49

a supposed chess playing machine, and

39:52

it turned out that the chess playing part was a hoax.

39:54

So unfortunately, because

39:57

that device was a hoax, a lot of people

39:59

dismissed his other work, which

40:02

was legitimate. So by

40:05

fudging on one thing, he kind of cast

40:08

doubt on everything he had ever done. Skipping

40:10

ahead quite a bit, we

40:13

get to Homer Dudley, which

40:15

is a fantastic name. He

40:17

unveiled the Voder, or Voice

40:20

Operating Demonstrator device

40:22

at the New York World's Fair in nineteen

40:24

thirty nine. It consisted of

40:26

a complex series of controls and

40:29

it sort of reminds me of something like a

40:31

musical instrument, kind of like a synthesizer,

40:34

but with extra controlling units.

40:36

Like there was like a wrist element, there was

40:38

a pedal. There's a lot of stuff that

40:41

made it very complex, and

40:44

with a lot of practice, you could

40:46

create specific sounds from this

40:48

synthesizer. You could even create

40:50

words or full sentences, though from

40:52

what I understand, it was incredibly

40:55

challenging to do. There was a very high learning

40:57

curve, but it demonstrated the possibility

40:59

of electronically synthesized speech. Now.

41:02

There was a lot of work done

41:04

in this field by

41:07

lots of different talented scientists

41:10

and engineers, and someday I'll

41:12

have to do a full episode on the history

41:14

of speech synthesis. It's really fascinating,

41:17

but it's far too big a topic to cover

41:19

in its entirety in this episode. By

41:21

the late nineteen sixties we had our first

41:24

text to speech system,

41:26

and by the late nineteen seventies and early

41:28

nineteen eighties, the state of

41:30

the art had progressed quite a bit and we

41:32

were starting to get to a point where we could create

41:35

very understandable computer

41:37

voices. They weren't natural, they

41:39

didn't sound like people, but you could understand

41:42

what they were saying. And finally, something

41:45

else that would enable smart speakers and virtual

41:47

assistance was the pairing of improved

41:50

network connectivity and cloud computing.

41:52

That removes the need for the device that

41:54

you're interacting with to do all the

41:56

processing on its own. So,

41:59

if you think about the history of computing,

42:01

we used to have mainframes with dumb

42:03

terminals that attached to the mainframe, so

42:05

the terminal wasn't doing any computing. It

42:07

was just tapping into the mainframe computer,

42:10

which was sending results back to the terminal.

42:12

Then you get to the era of personal computers,

42:15

where you had a device sitting on

42:17

your desk that did all the computing and

42:19

it didn't connect to anything else. Then

42:22

we get up to networking and the

42:24

Internet, where we suddenly

42:26

had the capability of having really powerful

42:28

computers or grids of computers

42:31

that were able to take on processing

42:33

power. Uh, and you just send

42:35

the request out to the Internet

42:38

and you get the response back. That's the basis

42:40

of cloud computing. So your

42:43

your command or message or whatever

42:46

relays back to servers on the cloud

42:49

that then process it and send the proper

42:51

response to whatever device you're

42:53

interacting with, and then you get the

42:55

result. So with the case of the smart speaker,

42:58

it might be playing a specific song

43:00

or giving you a weather report or whatever it might

43:02

be. Now, if the speakers were

43:04

doing some of that computation themselves, that

43:06

would be an example of edge

43:09

computing, where the processing

43:11

takes place at least in part, at the

43:13

edge of a network at those end points.

43:16

But for now, most of the implementations

43:18

we see send data back to the cloud

43:21

to get the right response, so you have to have a persistent

43:23

Internet connection. These devices are

43:25

not useful without that connection.
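
To sketch that round trip in the simplest possible terms: the device ships the transcribed request off, a service somewhere else works out the intent, and an answer comes back for the speaker to act on. The "cloud" below is just a local function and the intents are keyword matches, purely hypothetical stand-ins for whatever the real providers do, so the example stays self-contained and runnable.

```python
# Hypothetical sketch of the request/response round trip. In reality the
# request travels over the network to the provider's servers; here the
# "cloud" is a local function so the sketch has no dependencies.
def cloud_service(request_text):
    """Stand-in for the provider's back end: map a request to an action."""
    text = request_text.lower()
    if "weather" in text:
        return {"action": "speak", "text": "Today looks sunny with a high of 70."}
    if "light" in text:
        return {"action": "smart_home", "device": "living_room_lights", "state": "on"}
    if "play" in text:
        return {"action": "play_audio", "query": text.replace("play", "").strip()}
    return {"action": "speak", "text": "Sorry, I don't know how to do that."}

def smart_speaker(request_text):
    """Device side: send the request out, then act on whatever comes back."""
    response = cloud_service(request_text)       # would be a network call in real life
    if response["action"] == "speak":
        print("Speaker says:", response["text"])
    elif response["action"] == "smart_home":
        print("Turning", response["state"], response["device"])
    elif response["action"] == "play_audio":
        print("Now playing:", response["query"])

smart_speaker("what's the weather today")
smart_speaker("turn on the living room light")
smart_speaker("play some jazz")
```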

43:27

You do have some smart speakers that can

43:29

connect to another device

43:32

like a smartphone via Bluetooth, so

43:34

you could do things that way, but

43:37

without those connections, the smart speaker

43:39

turns into, you know, just a dumb

43:41

speaker, or sometimes just a paperweight. Now,

43:44

this collection of technologies and disciplines

43:46

are what enabled Apple to introduce

43:49

Siri in two thousand and eleven, and

43:52

Siri is a virtual assistant.

43:54

Siri's origins actually trace back to the

43:56

Stanford Research Institute and

43:58

a group of guys, Tom Gruber, Adam Cheyer,

44:01

and Dag Kittlaus, who

44:03

had been working on the concept since the nineteen

44:05

nineties, and when Apple launched

44:08

the iPhone in two thousand seven, they saw

44:10

the iPhone as a potential platform for

44:12

this virtual assistant that they had

44:14

been building, and they thought, well,

44:16

this is perfect because the iPhone has a microphone,

44:19

so the assistant can respond to voice

44:21

commands as a speaker, so it could communicate

44:24

back to the user, it could do all sorts of stuff.

44:26

We can tap into the interoperability

44:29

of apps on the device. It's a perfect

44:32

platform for us to deploy this. So

44:34

they developed an app once the opportunity

44:36

arose because apps were not available

44:39

for development immediately when Apple

44:41

launched the iPhone, and

44:44

once they did launch that app, uh

44:47

within a month, less than a month,

44:49

Steve Jobs was on the phone calling them up and

44:51

offering to buy the technology, which of

44:53

course they would agree to and it would become an

44:55

integrated component in Apple's iPhone

44:57

line afterward. And that's

45:00

where voice assistants kind of lived

45:02

for a few years. They mostly lived on smartphones

45:05

like the iPhone. But in November

45:07

two thousand fourteen, Amazon introduced

45:10

the Amazon Echo smart speaker,

45:12

which was originally only available for Prime

45:14

members, and it had its own virtual

45:17

assistant named Alexa, and

45:19

thus the smart speaker era officially

45:21

began. Now, there are plenty of

45:23

other smart speakers that are on the market

45:26

these days. There are products from Google

45:28

like Google Home. Uh, there are

45:30

Sonos speakers that can connect to services

45:33

like Amazon's Alexa or Google's Assistant,

45:35

and we're probably going to see a ton more,

45:38

both from companies that piggyback onto services

45:40

from the big providers like Google and Amazon,

45:43

and maybe some that are trying to make a go of

45:45

it with their own branded virtual assistants

45:47

and services. Smart speakers

45:50

respond to commands after they quote unquote

45:52

hear a wake up word or phrase.

45:55

Now, I'm gonna make up a wake

45:57

up phrase right now so that I

45:59

don't set off anyone's smart speaker

46:02

or smart watch or smartphone or smart

46:04

car or whatever it might be. So this

46:07

is just a fictional example of a wake up

46:09

phrase. So let's say I

46:11

have a smart speaker and the wake up

46:13

phrase for my smart speaker happens

46:15

to be hey there, Genie. Well,

46:18

my smart speaker has a microphone, so it can

46:20

detect when I say that, but

46:23

really it's constantly detecting all

46:26

sounds in its environment.

46:29

The microphone is always active. It has to be

46:31

in order to be able to pick up on when

46:33

I say the wake up phrase. So

46:37

the microphone is always active on most smart

46:39

speakers. There's somewhere you can

46:41

program it so that it will only activate

46:44

if you first touch the speaker and

46:46

that wakes it up. There's some that you

46:48

can do that with, But for the most part, they're

46:50

always listening. While the

46:52

speaker can quote unquote hear everything,

46:55

it's not listening to everything.

46:57

In other words, it's not

47:00

monitoring the specific things being said. At least

47:02

that's what we've been told. And honestly, that

47:04

makes a ton of sense from an operational

47:06

standpoint. And the reason I say that is

47:09

that the sheer amount of information

47:11

that would be flooding in from all the microphones

47:14

on all the smart devices from any one

47:17

provider that happened to be deployed

47:19

all over the world, that would be an astounding

47:21

amount of data. And sifting

47:23

through all that data to find stuff that's useful

47:26

would take an enormous amount of effort and time

47:29

and and processing power. So

47:31

while you could have all the microphones

47:33

listening in all over the place, finding

47:36

out who to listen to at what time

47:38

would be a lot trickier and probably not worth

47:40

the effort it would take to pull something like that

47:43

off. So what

47:45

these speakers and other devices are actually

47:48

doing is looking for a signal

47:50

that matches the one that represents the wake

47:52

phrase. So when I say, hey,

47:55

there, Genie, the microphone

47:57

picks up my voice, which the mic

47:59

then translates into an electrical signal

48:01

which gets digitized and compared against

48:04

the digital fingerprint of the predesignated

48:06

wake up phrase. And in this

48:08

case, the two phrases match. It's

48:11

like a fingerprint matching something

48:13

that was left at a site. So that

48:16

turns the speaker into an active

48:18

listener rather than a passive one. It's

48:20

ready to accept a command or

48:22

a question and to respond to

48:24

me. But if I didn't say,

48:26

hey there, Genie, then the speaker

48:29

would remain in passive mode because

48:32

it wouldn't have a digital fingerprint

48:34

that matches the one of the wake

48:36

up phrase. Everything stays at

48:38

the local level, and none of my sweet

48:40

secret speech gets transmitted

48:42

across the internet. It's all staying right there.
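As a hedged, very simplified sketch of that gate: the device keeps digitizing audio, reduces each chunk to a little feature vector, and compares it against the stored wake-phrase "fingerprint"; only a close match flips it into active listening. The feature vector, threshold, and helper names here are all invented, and real devices use small trained acoustic models on the device rather than a single distance test.

```python
# Hypothetical sketch of the always-listening wake-phrase gate.
import math
import random

WAKE_FINGERPRINT = [0.9, 0.1, 0.7, 0.3]  # stand-in for the stored wake-phrase signature
MATCH_THRESHOLD = 0.15                   # how close a chunk must be to count as a match

def features(audio_chunk):
    """Placeholder feature extraction; real systems compute spectral features."""
    return audio_chunk

def matches_wake_phrase(audio_chunk):
    feats = features(audio_chunk)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(feats, WAKE_FINGERPRINT)))
    return distance < MATCH_THRESHOLD

# Simulated always-on loop: chunks that don't match are discarded locally,
# and only a match switches the device into active listening.
for _ in range(5):
    chunk = [random.random() for _ in range(4)]  # stand-in for digitized microphone audio
    if matches_wake_phrase(chunk):
        print("Wake phrase detected: record the request and send it to the cloud.")
    # otherwise: keep listening; nothing leaves the device
```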

48:45

At least that's what we've been told. And

48:47

again I don't have any reason to disbelieve this,

48:50

but it is something to keep in mind. You are

48:52

talking about devices that have microphones.

48:54

Of course, if you have a smartphone, you've already got one

48:56

of those or a cell phone. In general, you've

48:58

got a device with a microphone on it

49:01

near you pretty much all the time. Now,

49:04

once I do make a request with my smart

49:06

speaker, the speaker then sends that request

49:09

up to the cloud where it gets processed,

49:11

It's analyzed, uh, and then

49:13

a proper response is returned

49:15

to me, whether that is playing

49:18

a song or giving me information I've asked

49:20

for, or maybe even interacting with some other smart

49:22

device in my home, such as adjusting

49:24

the brightness of my smart lights

49:26

in my house. Now, if the system

49:29

is not sure about whatever it was I just

49:31

said, it will probably

49:33

return an error phrase. So maybe

49:36

maybe I'm too far away from the speaker,

49:38

so it couldn't quote unquote

49:40

hear me really well. Or maybe I've

49:42

got a mouthful of peanut butter or something

49:44

as I am wont to do. Then I'm going to

49:47

get something like I'm sorry, I don't know how to do

49:49

that, or I'm sorry I didn't understand you, and

49:51

then I'd have to repeat it. Now, smart

49:53

speakers are pretty cool. However, they

49:55

do represent another piece of technology

49:58

that you have to network to

50:00

other devices, including your

50:02

own home network, and as such

50:05

that means that they represent a potential

50:08

vulnerability in a network. It

50:10

doesn't mean they're automatically vulnerable, but

50:13

it means that every time you are connecting

50:15

something to your network, then

50:18

you're creating another potential attack

50:21

vector for a hacker. Right

50:23

now, if everything is super strong, it

50:26

doesn't really effectively

50:29

change your safety in any

50:31

meaningful way. But if one of those

50:34

things that you connect to your network is less

50:36

strong than the others, you're looking at the

50:38

weakest link situation where a hacker

50:40

with the right know-how and tools could

50:42

potentially target that part of your

50:44

network to get entry into

50:46

everything else. And when you're

50:49

talking about a smart speaker, you're

50:51

talking about device that has an active

50:54

microphone on it. So potentially,

50:56

if someone were able to compromise a smart speaker,

50:59

they would be able to listen in on anything that

51:01

was within range of that smart

51:03

speakers microphone. So that's

51:06

why you have to at least be

51:08

cognizant of that, do your

51:10

research, make sure the devices you're connecting

51:12

to your network are rated well

51:15

from a security standpoint. When

51:17

you're setting things up and you have to create

51:19

passwords, create strong passwords

51:22

that are not used anywhere else.
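
If you want a quick way to do that, the Python standard library's secrets module will generate one for you; the length and character set below are just reasonable defaults for illustration, not a recommendation from the episode.

```python
# Minimal sketch: generate a long, random, single-use password with the
# standard library's secrets module (designed for security-sensitive randomness).
import secrets
import string

def strong_password(length=20):
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Use a different one for every device or account, and keep them in a password manager.
print(strong_password())
```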

51:24

The harder you make things the more

51:27

likely hackers will just pass you

51:29

by, not because you're

51:31

too tough to crack. Never get

51:33

it into your head that you're too strong

51:35

to be hacked, but rather

51:38

if there's someone who's weaker, then the

51:40

hackers are going to go after that person instead.

51:43

So just don't be the weak person. Practice

51:45

really good security

51:48

behaviors, and you're more likely

51:50

to discourage attackers and

51:52

they'll they'll go on to someone else.

51:55

Um, especially if you're talking

51:57

about newbies who don't really know their way

51:59

around, they're just using tools that other people

52:01

have designed. They get discouraged very

52:03

quickly. They'll move on to someone else because there's always

52:06

another potential target. I'm

52:08

curious about you guys, whether or not you

52:10

have any smart speakers in your life,

52:13

and uh if you find them useful. I

52:15

find mine pretty useful. I

52:17

use it for a very narrow range

52:20

of things. I don't tend to use it.

52:23

I definitely don't use it to its full potential. I

52:25

know that because once in a blue

52:27

moon, I'll just try something and

52:29

I'm amazed at what happens when

52:31

I get a response. But for

52:33

the most part, I'm asking about what

52:35

I can feed my dog, whether or

52:38

not it can turn on the lights and uh

52:40

and that's about it. Or

52:42

occasionally playing a song. Um,

52:45

but I'm curious what you guys are using them for. Reach

52:47

Out to me on social networks on Facebook

52:50

and I'm on Twitter, and the handle for both of those

52:52

is TechStuff HSW.

52:55

Also use those handles if you

52:57

have suggestions for future episodes. If you've got,

52:59

you know, an idea for either a company

53:01

or a technology or a theme in

53:04

tech you'd really like me to tackle, let

53:06

me know there and I'll talk to you

53:08

again really soon. Tech

53:15

Stuff is a production of iHeartRadio's How

53:17

Stuff Works. For more podcasts from

53:19

iHeartRadio, visit the iHeartRadio

53:21

app, Apple Podcasts, or wherever you

53:23

listen to your favorite shows.
