Arsenic in My Muffins (with Kasia Chmielinski)

Released Thursday, 9th December 2021

Episode Transcript
0:00

Yeah. Welcome

0:02

to How to Citizen with Baratunde, a podcast

0:05

that reimagines citizen as a verb, not

0:07

a legal status. This season

0:09

is all about tech and how it can bring us

0:12

together instead of tearing us apart. We're

0:15

bringing you the people using technology

0:17

for so much more than revenue and user

0:19

growth. They're using it to help

0:22

us citizen. I

0:34

have been working over the past year

0:36

to try to integrate my own thinking around

0:39

technology, and last year I wrote

0:41

a bit of a manifesto. Back

0:44

in I was invited

0:46

to speak at Google I/O, an annual

0:49

developer conference held by Google.

0:52

They wanted me to share my thoughts on what the future

0:54

of technology could look like. I

0:56

went on a journey to try to understand

0:58

how all my data existed amongst

1:01

the major platforms, amongst app developers,

1:04

and what came out of that was a set

1:06

of principles to help guide us more

1:08

conscientiously into the future.

1:11

Now, the first principle of my manifesto

1:14

is all about transparency. Like

1:17

I wanted to understand what was going

1:19

on inside the apps, behind

1:21

the websites I was spending all my time on.

1:23

When I want to know what's in my food,

1:26

I don't drag a chemistry set to

1:28

the grocery store and inspect every

1:31

item point by point. I read

1:33

the nutrition label. I know the

1:35

content, the calories, the ratings.

1:38

I shouldn't have to guess about

1:40

what's inside the product. I certainly shouldn't

1:42

have to read thirty three thousand

1:44

word legalese terms of service to

1:47

figure out what's really happening

1:49

inside. It's pretty simple.

1:52

We make better decisions about the things we

1:54

consume when we know what's in them.

1:57

So if I'm checking out an app on the app store

1:59

right and I see upfront that it's going to harvest

2:01

my data and slang it on some digital street corner,

2:05

can I interest you in some data?

2:09

I can ask myself, Hey, self, are

2:12

you okay with this app harvesting your data and

2:14

slanging it on a digital street corner? And

2:16

then, having asked myself that question,

2:19

I can decide whether or not to download

2:22

it. I don't have to hope that it won't

2:24

screw me over. I can know, but

2:27

check it out. This nutrition label

2:30

idea hasn't just existed in

2:32

the vacuum of my own brain. It's

2:34

a real thing. There are actual

2:37

people making nutrition labels in the world

2:39

of tech. In the same

2:41

way that I walk into a bakery

2:43

and I see a cake that's been baked, and I

2:45

might think to myself, I wonder

2:48

what's in that cake. We would want the same thing

2:50

for a data set, where even if you

2:52

encounter that data set in the wild, you, as

2:54

a data practitioner, will think to yourself, I

2:57

wonder if this is representative. Kasia

3:00

Chmielinski is one of those people.

3:03

These labels are a little different from what I proposed

3:05

at Google I/O. Their

3:07

data nutrition labels aren't for consumers

3:09

like me and you at the end of the assembly

3:11

line. Instead, they're for the people

3:13

at the very beginning: the data scientists.

3:16

Now, Kasia's data nutrition labels

3:19

are an easy to use tool to help data

3:21

scientists pick the data that's right

3:23

for the thing they're making. We

3:27

interact with algorithms every day,

3:29

even when we're not aware of it. They

3:32

affect the decisions we make about hiring,

3:34

about policing, pretty much everything.

3:37

And in the same way that we the people

3:40

ensure our well being through government standards

3:42

and regulations on business activities.

3:44

For example, data scientists

3:46

need standards too. Kasia

3:51

is fighting for standards that will make sure

3:53

that artificial intelligence works

3:55

for our collective benefit or at

3:57

least doesn't undermine it.

4:00

Hi. Hello,

4:03

how are you feeling right now, Kasia? I'm

4:06

feeling pretty good the beginning of another

4:08

week. Kasia is the co-founder and lead

4:10

of the Data Nutrition Project, the team

4:12

behind those labels. They've also

4:14

worked as a digital services technologist

4:17

in the White House, on COVID analytics

4:19

at McKinsey and in communications

4:22

at Google. Yeah.

4:24

Yeah, so I've kind of I've jumped around.

4:27

Yeah, so

4:30

why don't you introduce yourself and just tell

4:32

me what you do. My name

4:34

is Kasia Chmielinski, and I am a technologist

4:37

working on the ethics of data.

4:40

And I'd say, you know importantly

4:42

to me, although I have always been a nerd

4:44

and I studied physics a long time ago.

4:47

I come from a family of artists. Actually,

4:49

the painting behind me is by

4:51

my brother. There's another one in the room by my mom

4:54

um. And so I come from a really kind of multidisciplinary

4:57

group of people who are driven

4:59

by our passions. And that's kind of what I've tried to

5:01

do too, and it's just led me on many different

5:03

paths. Where does

5:06

the interest in technology come from? For you?

5:09

You know, I don't think that it's really an interest

5:11

in technology. It's just that we're in a technological

5:13

time. And so when I

5:15

graduated from university with this physics

5:18

degree, I had a few options,

5:21

and none of them really seemed

5:23

great. Uh. You know, I could go into defense

5:25

work, I could become a spy, or I could

5:28

make weapons, and that really wasn't so interesting

5:30

to me. Was being

5:32

a spy really an option? Uh?

5:35

Yes, so

5:37

you know I could do that, um, but I

5:39

didn't. And none of these were really interesting,

5:42

because I wanted to make

5:44

an impact and I wanted to drive change, and I think that

5:46

was around, you know, um, the early two thousands, and

5:49

technology was the place to be. That's where you could really

5:51

have the most impact and solve really big problems.

5:54

Um. And so that's where I ended up. So I

5:56

actually don't think that it's really about the

5:58

technology at all. I think that the technology is

6:00

just a tool that you can use to

6:02

to kind of make an impact in the world. I

6:05

love the way you describe the

6:08

interest in technology as really just an interest

6:10

in the world. So do you remember

6:12

some of the first steps that

6:14

led you to what you are doing now? So

6:18

when I graduated, I actually applied

6:20

to many things and didn't get them. And what

6:22

I realized was that I really didn't know how to

6:24

tell a story at all. Um,

6:27

and coming out of a fairly technical

6:29

path, I couldn't really

6:31

make eye contact. I hadn't talked to a

6:34

variety of people. I mean, I was definitely

6:36

one of the only people who had my identity in

6:39

in that discipline at that time. I went

6:41

to a school where the head of the school

6:43

at the time was saying that women might

6:45

not be able to do science because biologically they

6:48

were inferior in some way. Oh

6:50

that's nice, very welcoming environment. Oh

6:52

yeah, super welcoming. And I was studying physics and at

6:54

the time, you know, I was female identified. I

6:56

now identify as non binary. Um. But

6:59

it wasn't like a great place to be doing

7:01

science, and I just felt like coming out of that,

7:03

I was, UM. I didn't know how to

7:05

talk to people. I didn't know what it was like to be part of a

7:07

great community. And so I actually went into communications

7:10

at Google, which was a strange introduction to the

7:13

industry. I went from this super

7:15

nerdy, very male dominated place

7:17

to like a kind of like the party

7:20

wing of technology

7:22

at the time. Right, So people who are doing a lot

7:24

of marketing and communications and talking to journalists

7:26

and telling stories and trying to figure out like what's

7:28

interesting and how this fits into the greater narratives

7:31

of our time. So

7:38

while at Google, I got to see inside

7:40

of so many different projects that I think

7:42

was a great benefit to being

7:45

part of that strategy team. So

7:47

I got to work on core Search, I got to

7:49

work on image search, I got to work on

7:51

Gmail and Calendar, and

7:54

I started to see the importance

7:57

of first of all, knowing

7:59

why you're building something before you

8:02

start to build it, right, And there were so

8:04

many times that I saw really

8:06

really cool product at the end of the day,

8:08

something an algorithm or something technical

8:10

that was just really cool, but there was

8:12

no reason that it needed to exist.

8:15

Right from from a person perspective, from a society

8:17

perspective, I

8:26

am relieved to hear you say that.

8:29

That's been one of my critiques of this industry

8:31

for quite some time. It's like, whose

8:33

problems are you trying to solve? And so

8:35

you were at the epicenter of one of the major

8:38

companies seeing some of this firsthand. Yeah,

8:40

that's exactly right, And it was endemic.

8:43

I mean It just happens all the time, and it's not the

8:45

fault of anyone in particular. You just put a bunch

8:47

of really smart engineers on a technical problem

8:49

and they just find amazing

8:51

ways to solve that. But then at the end of the day

8:54

you say, well, how are we actually going to use this? And

8:56

that would fall to the comms team,

8:58

right or the marketing team to say, okay, now

9:00

what are we going to do with this? Um So that was one thing

9:02

and that's why I actually ended up moving into product

9:04

management, where I could think about why

9:07

we want to build something to begin with, and to make sure

9:09

we're building the right thing. Um So I

9:11

got closer to the technology after that job. The

9:15

second thing that I became aware of is the

9:18

importance of considering the

9:20

whole pipeline of the thing that you build, because

9:22

the thing that you build, its DNA, is

9:25

in the initial data that you

9:27

put into it. And I'm talking specifically

9:29

about algorithmic systems here. So

9:32

one example I have from my days

9:35

when I was at Google. I actually I worked out

9:37

of the London office and there was a

9:39

new search capability and

9:41

it was trained entirely on one particular

9:43

accent and then

9:46

when other people tried to use that, if they didn't

9:48

have that very specific accent, it wasn't working

9:50

so well. And I really didn't

9:52

know much about AI at the time. I hadn't

9:54

studied it, but I realized, you know,

9:56

bias in bias out like garbage in garbage

9:59

out. You you feed this machine something,

10:01

the machine is going to look exactly

10:03

like what you fed it. Right, you are what you

10:05

eat. We'll

10:10

be right back. We

10:22

use these terms data, we use these terms

10:24

algorithm and artificial intelligence.

10:27

And so before we keep going, I'd love for

10:30

you to pause and kind of explain

10:32

what these things are in their relationship

10:35

to each other. Data, algorithms,

10:38

artificial intelligence. How does Kasia

10:40

define these? Yeah,

10:44

thank you for taking a moment. I think that that's

10:47

um. Something that's so important in technology is

10:49

that people feel like they aren't allowed

10:51

to have an opinion or have

10:53

thoughts about it because they quote

10:55

unquote don't understand it. But

10:57

you're right, it's just a definitional

11:00

issue, often. So,

11:03

data is anything that

11:05

is programmatically accessible

11:08

that is probably in enough volume

11:10

to be used for something by a system.

11:13

So it could be records of something,

11:15

it could be weather information. It

11:17

could be the notes taken by a doctor

11:19

that then get turned into something that's programmatically

11:22

accessible. There's a lot of stuff

11:24

and you can feed that to a machine. I'm

11:28

really interested in algorithms because it's kind

11:30

of the practical way of understanding something

11:32

like AI. It's a mathematical formula

11:35

and it takes some stuff and then it outputs

11:37

something. So that could be something like

11:40

you input where you live

11:43

and your name, and then the algorithm

11:45

will churn and spit out something like

11:47

you know what race or ethnicity it thinks you are.

11:50

And that algorithm, in order to

11:52

make whatever guesses it's making,

11:55

needs to be fed a bunch of data so

11:57

that it can start to recognize patterns. When

12:01

you deploy that algorithm out in in the

12:03

world, you feed it some data and it

12:05

will spit out what it believes is

12:07

the pattern that it recognizes based on what

12:09

it knows.
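
To make that data-to-algorithm relationship concrete, here is a minimal, hypothetical Python sketch, invented for this write-up rather than taken from the Data Nutrition Project or any real system: a tiny "algorithm" that is trained on examples and can only spit back the patterns it has already been fed, much like the single-accent voice search described earlier.

```python
# Toy, hypothetical sketch (not the project's code or any real product):
# an "algorithm" is fed data, memorizes its patterns, and can only echo them back.
from collections import Counter, defaultdict

def fit(examples):
    """Remember the most common output seen for each input in the training data."""
    counts = defaultdict(Counter)
    for inputs, output in examples:
        counts[inputs][output] += 1
    return {inputs: c.most_common(1)[0][0] for inputs, c in counts.items()}

def predict(model, inputs):
    """Spit out the pattern the model recognizes, based only on what it 'knows'."""
    return model.get(inputs, "no pattern learned for this input")

# Training data drawn almost entirely from one accent, as in the voice-search story above.
training = [("london accent", "query understood")] * 98 + [("other accent", "query garbled")] * 2
model = fit(training)

print(predict(model, "london accent"))   # query understood
print(predict(model, "glasgow accent"))  # no pattern learned for this input
```

Feed it a different mix of accents and its behavior changes with the data; the formula itself stays the same.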

12:14

You know, there's different flavors of AI. I think

12:17

a lot of people are very afraid of kind

12:19

of the Terminator-type AI. I'll

12:22

be back.

12:24

As we should be, because the Terminator is very scary.

12:27

I've seen the documentary many times

12:29

and I don't want to live in that world. Yeah,

12:32

legitimately very scary. UM.

12:35

And so there's this there's this question of Okay,

12:37

is the AI going to come to eat our lunch?

12:40

Right? Are they smarter than us? And all the things that

12:42

we can do, and that's like you know, generalized

12:44

AI or or even kind of super AI.

12:47

We're not quite there yet. Currently,

12:49

we're in the phase where we have discrete

12:51

AI that makes discrete decisions and we

12:53

leverage those to help us in our

12:55

daily lives, or to hurt us sometimes.

12:58

Data as food for algorithms. I

13:01

think it's a really useful metaphor. And

13:03

a lot of us out in

13:05

the wild who aren't specialized in this, I

13:08

think we're not encouraged to understand that relationship.

13:10

I agree, And I think the

13:13

the relationship between what you feed

13:15

the algorithm and what it gives you is

13:17

so direct, and people don't necessarily

13:20

know that or see that. And what you see

13:22

is the harm or the output that

13:24

comes out of the system, and what you

13:27

don't see is all the work that went into

13:29

building that system. You have someone who

13:31

decided in the beginning they wanted to use AI, and

13:33

then you have somebody who went and found the data,

13:35

and you have somebody else who cleaned the data, and

13:38

you've got somebody or somebodies who

13:40

then built the algorithm and trained the algorithm,

13:43

and then you have the somebodies who coded that up, and

13:46

then you have the somebodies who deployed that, and

13:48

then you have people who are running that. And

13:50

so when the algorithm comes out at the end and there's

13:53

a decision that's made: you get the loan, you didn't get

13:55

the loan. The algorithm recognizes

13:57

your speech, doesn't recognize your speech, sees

14:00

you, doesn't see you. People

14:02

think, oh, just change the algorithm. Oh no,

14:04

you have to go all the way back to the beginning

14:12

because you have that long chain of people who are doing

14:14

so many different things, and it becomes very complicated

14:17

to try to fix that. So the

14:19

more that we can understand that the process

14:22

begins with the question

14:24

do I need AI for this? And

14:26

then very quickly after where are we going

14:28

to get the data to feed that? So that we

14:30

make the right decision. The sooner

14:33

we understand that as a society, I

14:35

think the easier it's going to be for us to build better AI

14:38

because we're not just catching the issues at the very end

14:40

of what can be a years-long process. Mm

14:43

hm. So what problems does

14:45

the Data Nutrition Project aim to tackle?

14:48

We've kind of talked about them all in pieces. At

14:50

its core, the Data Nutrition Project, which

14:52

is this research organization that I

14:54

co-founded with a bunch of very smart people. We

14:57

were all part of a fellowship that was

14:59

looking at the ethics and governance of

15:01

AI. And so when we sat down

15:03

to say what are the real things that we

15:05

can do to drive change um

15:07

as practitioners, as people in the space, as people

15:09

who had built AI before, we decided

15:12

let's just go really small. And

15:14

obviously it's actually a huge problem and it's it's very

15:17

challenging. But instead of saying, let's

15:19

look at the harms that come out of an

15:21

AI system, let's just think

15:23

about what goes in. And I think

15:25

we were maybe eating a lot of snacks. We were holed

15:28

up at the MIT Media Lab, right? So we were just

15:30

all in this room for many many hours,

15:32

many many days, and I think somebody

15:34

at some point picked up, you know, a snack package,

15:38

and we're like, what if you just had a nutritional

15:41

label like the one you have on food, you just

15:43

put that on a data set. What would that do?

15:45

I mean, is it possible? Right?

15:48

But if if it is possible, would that actually

15:50

change things? And we started talking it over and we thought,

15:52

you know, we think it would. In

15:55

our experience in data science as practitioners,

15:58

we know that data doesn't come with

16:01

standardized documentation, and

16:04

often you get a data set and you don't

16:06

know how you're supposed to use it or not use it. There

16:09

may or may not be tools that you use to

16:11

look at things that will tell you whether

16:13

that data set is healthy for the thing that you want

16:16

to do with it. The

16:20

standard process would be a product

16:22

manager or CEO would come

16:24

over to the desk of a data scientist and say,

16:27

look, we have all this information about this new product

16:29

we want to sell. We need to map

16:32

the marketing information to the demographics

16:35

of people who are likely to want to

16:37

buy our product or click on our product. Go

16:39

make it happen. And the data scientist goes

16:42

okay, and the person goes, oh yeah, by Tuesday, and

16:44

the person's like, oh okay, let

16:47

me go find the right data for that. There's

16:49

a whole world. You just google a bunch of stuff and

16:52

then you get the data, and then you kind

16:54

of poke around and you think, this seems pretty good,

16:57

and then you use it and you build

16:59

your algorithm on that. Your algorithm is

17:01

going to determine which demographics or

17:03

what geographies or whatever it is you're trying to do.

17:06

You train it on that data you found,

17:09

and then you deploy that algorithm and it starts to

17:11

work in production. And you

17:13

know, no fault of anybody, really, but the

17:15

industry has grown up so much faster than

17:18

the structures and the scaffolding to keep

17:20

that industry doing the right thing. So

17:23

there might be documentation on some of the data, there might

17:25

not be in some cases. We're working with a data

17:28

partner that was very concerned about how people

17:30

were going to use their data. The data set

17:32

documentation was an eighty page PDF.

17:35

That data scientist who's on

17:37

deadline for Tuesday is not going to read eighty

17:39

pages. So our

17:42

thought was, hey, can we distill

17:44

the most important components

17:46

of a data set and its usage to

17:49

something that is maybe one sheet two sheets

17:51

right, using the analogy of the nutrition label,

17:54

put it on a data set, and then make that

17:56

the standard so that anybody who is picking

17:58

up a data set to decide whether or not to use. It

18:00

will very quickly be able to assess is this healthy

18:03

for the thing I want to do. It's a novel

18:05

application of a thing that so many of us

18:07

understand. What are some of

18:09

the harms you've seen, some

18:12

of the harms you're trying to avoid by

18:14

the data scientists who are building these services

18:17

not having access to healthy data.

18:20

Yeah. Let's say you have a data set about health

18:23

outcomes and you're looking at

18:25

people who have had heart attacks or something

18:27

like that, and you realize

18:29

that the data was only taken

18:31

from men in their sixties.

18:34

If you are now going to use this as

18:37

a data set to train an algorithm to provide

18:39

early warning signs for who

18:42

might have a heart attack, you're

18:44

gonna miss entire demographics

18:46

of people, which may or may not matter. That's

18:48

a question. Does that matter? I don't know, But

18:50

perhaps it matters what the average

18:52

size of a body is, or the average age

18:54

of a body is, or maybe there's

18:56

something that is gender or sex related,

18:59

and you will miss all of that. If you just take

19:01

the data at face value, you don't think about who's

19:03

not represented here.
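
As a rough, hypothetical sketch of the kind of check being described here (the field names, reference shares, and threshold are invented for illustration, not taken from the Data Nutrition Project's tooling), a practitioner might compare who appears in the data against who they expected before training anything on it:

```python
# Hypothetical representativeness check: compare each group's share of the data
# against an expected reference share and flag groups that fall far below it.
from collections import Counter

def representation_report(records, field, reference_shares, tolerance=0.10):
    """Return (observed share, expected share, flag) for each group of interest."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    report = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        flag = "UNDERREPRESENTED" if observed + tolerance < expected else "ok"
        report[group] = (round(observed, 2), expected, flag)
    return report

# Toy records standing in for the heart-attack data set described above.
records = [{"sex": "male", "age_band": "60s"}] * 97 + [{"sex": "female", "age_band": "50s"}] * 3
print(representation_report(records, "sex", {"male": 0.5, "female": 0.5}))
# {'male': (0.97, 0.5, 'ok'), 'female': (0.03, 0.5, 'UNDERREPRESENTED')}
```

The point is less the arithmetic than the habit: ask who is missing before the data becomes the model's DNA.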

19:06

I remember examples that I used to cite in

19:08

some talks. It was the Amazon hiring decisions.

19:12

Amazon software engineers recently

19:14

uncovered a big problem. Their

19:16

new online recruiting tool did

19:18

not like women. It

19:21

had an automated screening system

19:23

for resumes, and that system ignored

19:26

all the women because the data

19:28

set showed that successful

19:30

job candidates at Amazon were

19:32

men. And so the computer like

19:34

garbage in, garbage out, the way we've discussed,

19:37

said, well, you've defined success as male.

19:40

You've fed me a bunch of female, that's

19:43

not success. Therefore my formula

19:45

dictates they get rejected, and

19:48

that affects people's job prospects. You know, that affects

19:51

people sense of their self worth and self esteem. That

19:53

could open up the company to liability,

19:56

all kinds of harms in a system that was supposed

19:58

to breed efficiency and

20:01

help. Yeah,

20:03

that's a great example, and it's, you know, a

20:05

very true one. And I think that

20:07

one was pretty high profile.

20:09

Imagine all the situations that either

20:12

have never been caught or we're kind of too

20:14

low profile to make it into the news. It

20:16

happens all the time because

20:18

the algorithm is a kind of a reflection

20:20

of whatever you've fed it. So in that case,

20:23

you had historical bias, and so

20:25

the historical bias in the resumes

20:27

that they were using to feed the algorithm showed

20:30

that men were hired more frequently and

20:32

that was success. It also comes

20:34

down to, in terms of the metrics, how you're defining

20:36

things. If your definition of success is

20:39

that someone was hired, you're not necessarily

20:42

saying that your definition is that person

20:44

was a good ended up being a good worker.

20:47

Or even if you're looking at the person's performance

20:49

reviews and saying success would be that we

20:51

hire somebody who performs well. But

20:53

historically you hired more men than women. So

20:56

even then, if your success metric is

20:58

someone who performed well, you're already taking

21:00

into account the historical bias that there are more men

21:02

than women who are hired. So there

21:05

are all different kinds of biases that are being captured

21:07

in the data. Something

21:12

that the Data Nutrition Project is

21:15

trying to do with the label that we've built is

21:17

highlight these kinds of historical issues

21:19

as well as the technical issues in the data,

21:22

and that I think is an important balance to

21:24

strike. It's not just about

21:26

what you can see in the data. It's

21:28

also about what you cannot see in the data. So

21:33

in the case that you just called out there with the resumes,

21:35

you would be able to see that's not representative with respect

21:38

to gender, and maybe

21:40

you'd be able to see things like these are all English

21:42

language resumes, But what you would

21:44

not be able to see are things

21:47

like socio economic differences or

21:49

people who never applied, or you

21:51

know, what the job market looked like

21:53

whenever these resumes were collected, So

21:55

you'll kind of not be able to see any of that if you just take

21:57

a purely technical approach to what's in the data

22:00

set. So the data set nutrition

22:02

label tries to highlight those things as well to

22:04

data practitioners to say, before

22:07

you use this data set, here are some things you should

22:09

consider, and sometimes

22:12

we'll even go as far as to say you

22:14

probably shouldn't use this data set for

22:16

this particular thing because we just

22:18

know that it's not good for that, and

22:20

that's always an option, is to say don't use it.

22:23

Right. It doesn't mean people won't do it, but at least

22:25

we can give you a warning, and we kind of hope

22:27

that people have the best of intentions and are trying

22:29

to do the right thing. So it's about explaining

22:32

what is in the data set or

22:34

in the data so that you can decide as

22:36

a practitioner whether or not it is healthy for your

22:38

usage. After

22:42

the break, it's snack time. I'm

22:45

back hungry. So

22:58

I'm holding a package of food right now,

23:01

and I'm looking at the nutrition

23:03

label nutrition facts. It's got

23:06

servings per container, the size

23:08

of a serving, and then numbers

23:11

and percentages in terms of the daily

23:13

percent of total fat, cholesterol, sodium,

23:16

carbohydrates, protein, and

23:18

a set of vitamins that I can expect

23:20

in a single serving of this product. And then

23:23

I can make an informed choice about

23:25

whether and how much of that food

23:27

stuff I want to put in my body,

23:30

how much garbage I want to let in. In this case, it's

23:32

pretty healthy stuff. It's uh dried

23:34

mangoes. If you're curious, what's

23:38

on your data nutrition label? Yeah,

23:42

a great question. And now I'm like kind of hungry. I'm like,

23:44

oh, it's a snack time. I feel like it's snack time. Um.

23:49

This is the hardest part to me about

23:51

this project is what the

23:53

right level of metadata is. So

23:56

what are the right elements that you want to call out

23:58

for our nutritional label?

24:00

You know, what are the fats and the sodiums and these kinds of

24:02

things? Because, you know, the complication here

24:05

is that there are so many different

24:07

kinds of data sets. I can have a data

24:09

set about trees in Central Park, and I can have a

24:11

data set about people in prison. So

24:14

we've kind of identified

24:16

that the harms that we're most worried

24:18

about have to do with people. Um

24:21

not to say that we are, you know, not worried

24:24

about things like the environment or

24:26

other things, but when it touches people

24:29

or communities is when we see the

24:31

greatest harms from an algorithmic standpoint

24:33

in society. And so we

24:36

kind of have a badge system that should

24:38

be very quick, kind of icon based that

24:40

says this data set is about people or not. This

24:43

data set includes subpopulation

24:45

data, so you know, includes information

24:47

about race or gender or

24:50

whatever status. Right, this

24:52

data set can be used for commercial

24:54

purposes or not. We've identified, let's

24:56

say, ten to fifteen things that we think

24:59

are kind of high level almost like little

25:01

food warning symbols that you would see on something

25:04

like organic, or it's got a Surgeon General's

25:07

warning. Right, exactly. So

25:09

at a very high level we have these kind of icons.

25:12

And then underneath that there are

25:14

additional very important questions that we've

25:16

highlighted that people will answer who

25:18

own the data set? And then finally there's a

25:20

section that says, here's the reason it was made.

25:23

The data set was made, it's probably an intended use.

25:25

Here are some other use cases that are possible

25:28

or ways that other people have used it, and

25:30

then here are some things that you just shouldn't do. So

25:33

how do we make this approach

25:36

more mainstream? Mainstream

25:38

is a tough word because we're talking about people

25:41

who build AI, and I think that is becoming

25:43

more mainstream for sure. Um,

25:45

but we're really focused on data practitioners,

25:47

so people who are taking data and then

25:49

building things on that data. But there's kind

25:52

of a bottom-up approach. It's very

25:54

anti-establishment in some ways, very

25:56

hacker culture. And so we've

25:58

been working with a lot of data practitioners to

26:00

say what works, what doesn't, is it useful

26:02

or is it not. Make it open source, right, open licenses,

26:05

use it if you want, and just hoping that if

26:07

we make a good thing, people will use it. A

26:09

rising tide lifts all boats, we think, so you

26:12

know, we're not cagey about it because

26:14

we just want better data, to have better data out there,

26:16

and if people have the expectation that they're going to

26:18

see something like this, that's awesome. There's

26:20

also the top down approach, which is regulation

26:23

policy, And I could imagine a world

26:25

in which in the future, if you

26:28

deploy an algorithm, especially

26:30

in the public sector, you would have to include

26:32

some kind of labeling on that right to talk about

26:34

the data that it was trained on and provide a label for that.

26:36

So it's kind of a two way approach, you know. Yeah,

26:39

no, I mean when I think of analogues, like most

26:41

of us don't know civil engineers

26:44

personally, but we interact with

26:46

their work on a regular basis through

26:48

a system of trust, through standards,

26:50

through approvals, through certifications,

26:53

and data scientists are on

26:55

par with like a civil engineer in my mind,

26:57

in that they erect structures that we inhabit on

27:00

a regular basis. But I have no

27:02

idea what rules they're operating by.

27:04

I don't know what's in this algorithm, you know, I

27:06

don't know what ingredients you used to put this together

27:08

that's determining whether I get a job or

27:11

a vaccination. What's

27:13

your biggest dream for the Data nutrition

27:15

project? Where does it go? So

27:18

I could easily say, you know, our

27:21

dream would be that every data set comes with a label. Cool,

27:24

But more than that, I think we're trying to

27:26

drive awareness and change. So even if

27:28

there isn't a label, you're thinking about,

27:30

I wonder what's in this and I wish it had a label

27:32

on it. In

27:35

the same way that I walk into

27:37

a bakery and I see a cake that's

27:39

been baked, and I might think to myself, I

27:42

wonder what's

27:44

in that cake, and I

27:46

wonder, you know, if

27:48

it has this much of something, or maybe

27:51

I should consider this when I decide whether

27:53

to have four or five pieces of cake. We

27:55

would want the same thing for a data set where even

27:57

if you encounter that data set in the wild, someone's

28:00

created it. You just downloaded it from some repository

28:03

on GitHub. There's no documentation,

28:06

that you, as a data practitioner, will think to yourself,

28:09

I wonder if this is representative. I

28:11

wonder if the thing I'm trying to do with this

28:14

data is responsible, considering

28:17

the data, where it came from, who

28:19

touched it, who funded it, where it lives, how

28:21

often it's updated, whether they

28:23

got consent from people when they

28:25

took their data, And so we're

28:28

trying to drive a culture change.

28:32

I love that and I love the idea

28:34

that when I go to a bakery, one

28:36

of the questions I'm not asking myself is

28:39

is that muffin safe to eat? Right?

28:42

Is that cake gonna kill me? It

28:45

literally doesn't enter my mind because there's such

28:47

a level of earned trust in

28:49

the system overall that

28:52

you know, these people are getting inspected, that

28:54

there's some kind of oversight that they were trained

28:56

in a reasonable way, so I know

28:58

there's not arsenic in the muffins.

29:01

So this brings me to zooming

29:03

out a little bit further to artificial intelligence

29:06

and the idea of standards, because

29:08

I'm getting this picture from you that there's kind

29:10

of a wild West in terms of what we're feeding

29:12

into the systems that ultimately become

29:15

some form of AI. What does the world

29:17

look like when we have more

29:19

standards in the tools

29:21

and components that create AI. I

29:24

think that our understanding of what AI

29:27

is and what kinds

29:29

of AI there are is going to mature.

29:31

I imagine that there is a system

29:34

of classification where

29:36

some AI is very high risk and

29:39

some AI is less high risk, and

29:41

we start to have a stratified view of

29:43

what needs to occur in each

29:45

level in order to reach an understanding

29:48

that there's no arsenic in the muffins.

29:50

So at the highest level, when

29:52

it's super super risky, maybe we just don't

29:55

use AI. This seems to be

29:57

something that people forget, is that we can decide whether

29:59

or not to use it. Like, would

30:01

you want an AI performing surgery

30:03

on you with no human around? If

30:07

it's really really good? Do you want that? Do

30:09

you want to assume that risk? I mean that is dealing

30:11

with your literal organs, your heart. So

30:15

I think that you know, ideally what happens

30:17

is you've got a good combination of regulation

30:20

and oversight, which I do believe in, but

30:23

then also training and

30:25

you know, good human intention to

30:27

do the right thing. So

30:31

when I think about these algorithms, I

30:33

think of them as kind of automated decision makers,

30:36

and I think they can pose a challenge

30:38

to our ideas of free

30:41

will and self determination

30:44

because we are increasingly living in this world

30:47

where we think we're making choices, but

30:49

we're actually operating within a narrow set of recommendations.

30:53

What do you think about human agency

30:56

in the age of algorithms? WHOA

30:58

These are the big questions? Um,

31:00

Well, I mean I think that we have to be careful

31:03

not to give the machines more

31:05

agency than they have. And there

31:07

are people who are making those machines. So

31:10

when we talk about, you know, the free

31:13

will of people versus machines,

31:15

it's like the free will of people versus

31:19

the people who made the machines. To me,

31:21

technology is just a tool, and

31:24

I personally don't want to

31:26

live in a world that has no

31:29

algorithms and no technology because these are

31:31

useful tools. But I want

31:33

to decide when I'm using them and what I use

31:35

them for. And so my perspective

31:38

is really from the point

31:40

of view of a person who has

31:42

been making the tools, and I

31:45

think that we need to make sure that those

31:47

folks have the free will to

31:49

say, no, I don't want to make those tools,

31:52

or this should not be used in this way, or

31:54

we need to modify this tool in this way so

31:57

those tools don't run away from us. Um

32:00

So, I guess I kind of disagree

32:03

with the premise that it's people versus machines

32:06

because people are making the machines and we're not at

32:08

the Terminator stage yet. Currently it's people

32:11

and people, right? So

32:13

let's like work together to make the right things

32:15

um for people. Yes, Kasia,

32:19

thank you so much for spending this time with me. I've

32:21

learned a lot and now I'm just

32:23

thinking about arsenic in my muffins. Thanks

32:25

so much for having me. I've really enjoyed it. Garbage

32:32

in, garbage out. It's a cycle

32:34

that we see that doesn't just apply to the

32:36

world of artificial intelligence, but everywhere.

32:40

If I feed my body junk, it

32:42

turns to junk. If I

32:44

fill my planet with filth, it

32:46

turns to filth. If I

32:48

inject my Twitter feed with hatred,

32:51

that breeds more hatred. It's

32:53

pretty straightforward, but

32:55

it doesn't have to be this way. In

32:58

essence, Kasia is working to standardize

33:01

thoughtfulness, and that fills

33:03

me with so much hope.

33:06

We're all responsible for something

33:09

or someone, so let's

33:11

always do our best to really

33:13

consider what they need to thrive.

33:16

If we put a little more goodness

33:18

into our ai, our bodies,

33:21

our planet, our relationships,

33:23

and everything else, we'll

33:25

see goodness come out. And

33:28

that's a cycle I can get behind: goodness

33:31

in, goodness out. This

33:36

is just one part of the How to Citizen

33:38

conversation about data. Who

33:40

does data ultimately benefit?

33:44

If the data is not benefiting the

33:46

people, the individuals, the communities

33:48

that provided that data, then

33:51

who are we uplifting at the cost

33:53

of others' justice? Next

33:56

week, we dive deeper into how it's collected

33:58

in the first place, and we meet an indigenous

34:00

geneticist reclaiming data for her people.

34:03

See you then. We

34:13

asked Kasia what we should have you

34:15

do, and they came up with a lot.

34:18

So here's a whole bunch

34:20

of beautiful options for citizening.

34:23

Think about this. Like people,

34:26

machines are shaped by the context in which

34:28

they're created. So if we think of machines

34:30

and algorithmic systems as children who are

34:32

learning from us, we're to parents. What

34:35

kind of parents do we want to be? How

34:37

do we want to raise our machines to

34:39

be considerate, fair, and to

34:41

build a better world than the one we're in today. Watch

34:45

Coded Bias. It's a documentary

34:47

that explores the fallout around MIT

34:49

Media Lab researcher Joy

34:52

Buolamwini's discovery that facial recognition

34:54

doesn't see dark-skinned faces as well, and

34:57

this film is capturing her journey

34:59

to put forth the first ever legislation in

35:01

the US that will govern against bias

35:04

in the algorithms that impact us all. Check

35:07

out this online buying resource called

35:09

the Privacy Not Included Buying

35:11

Guide. Mozilla built this shopping

35:14

guide which tells you the data practices

35:16

of the app or product that you're considering,

35:19

and it's basically the product reviews we

35:21

need in this hyper connected

35:23

era of data theft and

35:26

hoarding and non consensual

35:28

monetization. Donate

35:31

if you've got money, you can distribute

35:33

some power through dollars to these groups

35:35

that are ensuring that the future of AI is

35:37

human and just, the Algorithmic

35:40

Justice League, the ACLU, and

35:42

the Electronic Frontier Foundation. If

35:47

you take any of these actions, please brag

35:49

about yourself online. Use the hashtag

35:51

how to citizen. Tag us up on Instagram

35:54

at how to Citizen. We will

35:56

accept general direct feedback to our

35:59

inbox common at how to citizen

36:01

dot com and make sure you go ahead and

36:03

visit how to citizen dot com because that's the brand

36:05

new kid in town. We have a

36:07

spanky new website. It's

36:09

very interactive. We have an email list

36:12

you can join. If you like this show,

36:14

tell somebody about it. Thanks. How

36:18

to Citizen with Baratunde is a production of I

36:20

Heart Radio Podcasts and Dustlight Productions.

36:23

Our executive producers are me,

36:25

Baratunde Thurston, Elizabeth Stewart, and

36:27

Misha Yusuf. Our senior producer

36:29

is Tamika Adams, our producer is Ali

36:32

Kilts, and our assistant producer is Sam Paulson.

36:34

Stephanie Cohen is our editor, Valentino

36:36

Rivera is our senior engineer, and Matthew

36:38

Lai is our apprentice. Original

36:41

music by Andrew Eathan, with additional original

36:43

music for season three from Andrew Clausen. This

36:46

episode was produced and sound designed by Sam

36:48

Paulson. Special thanks to Joel Smith

36:50

from iHeartRadio and Rachel Garcia

36:52

at Dustlight Productions.
