Episode Transcript
0:00
Yeah. Welcome
0:02
to How to Citizen with Baratunde, a podcast
0:05
that reimagines citizen as a verb, not
0:07
a legal status. This season
0:09
is all about tech and how it can bring us
0:12
together instead of tearing us apart. We're
0:15
bringing you the people using technology
0:17
for so much more than revenue and user
0:19
growth. They're using it to help
0:22
us citizen. I
0:34
have been working over the past year
0:36
to try to integrate my own thinking around
0:39
technology, and last year I wrote
0:41
a bit of a manifesto. Back
0:44
in I was invited
0:46
to speak at Google I/O, an annual
0:49
developer conference held by Google.
0:52
They wanted me to share my thoughts on what the future
0:54
of technology could look like. I
0:56
went on a journey to try to understand
0:58
how all my data existed amongst
1:01
the major platforms, amongst app developers,
1:04
and what came out of that was a set
1:06
of principles to help guide us more
1:08
conscientiously into the future.
1:11
Now. The first principle of my manifesto
1:14
is all about transparency. Like
1:17
I wanted to understand what was going
1:19
on inside the apps, behind
1:21
the websites I was spending all my time on.
1:23
When I want to know what's in my food,
1:26
I don't drag a chemistry set to
1:28
the grocery store and inspect every
1:31
item point by point. I read
1:33
the nutrition label. I know the
1:35
content, the calories, the ratings.
1:38
I shouldn't have to guess about
1:40
what's inside the product. I certainly shouldn't
1:42
have to read thirty three thousand
1:44
word legalese terms of service to
1:47
figure out what's really happening
1:49
inside. It's pretty simple.
1:52
We make better decisions about the things we
1:54
consume when we know what's in them.
1:57
So if I'm checking out an app on the app store
1:59
right and I see upfront that it's going to harvest
2:01
my data and slang it on some digital street corner,
2:05
can I interest you in some data?
2:09
I can ask myself, Hey, self, are
2:12
you okay with this app harvesting your data and
2:14
slanging it on a digital street corner? And
2:16
then, having asked myself that question,
2:19
I can decide whether or not to download
2:22
it. I don't have to hope that it won't
2:24
screw me over. I can know, but
2:27
check it out. This nutrition label
2:30
idea hasn't just existed in
2:32
the vacuum of my own brain. It's
2:34
a real thing. There are actual
2:37
people making nutrition labels in the world
2:39
of tech. In the same
2:41
way that I walk into a bakery
2:43
and I see a cake that's been baked, and I
2:45
might think to myself, I wonder
2:48
what's in that cake. We would want the same thing
2:50
for a data set, where even if you
2:52
encounter that data set in the wild, you, as
2:54
a data practitioner, will think to yourself, I
2:57
wonder if this is representative. Cash
3:00
Chmielinski is one of those people.
3:03
These labels are a little different from what I proposed
3:05
at Google I/O. Their
3:07
data nutrition labels aren't for consumers
3:09
like me and you at the end of the assembly
3:11
line. Instead, they're for the people
3:13
at the very beginning: the data scientists.
3:16
Now, Kasha's data nutrition labels
3:19
are an easy to use tool to help data
3:21
scientists pick the data that's right
3:23
for the thing they're making. We
3:27
interact with algorithms every day,
3:29
even when we're not aware of it. They
3:32
affect the decisions we make about hiring,
3:34
about policing, pretty much everything.
3:37
And in the same way that we the people
3:40
ensure our well being through government standards
3:42
and regulations on business activities,
3:44
for example, data scientists
3:46
need standards too. Kasha
3:51
is fighting for standards that will make sure
3:53
that artificial intelligence works
3:55
for our collective benefit or at
3:57
least doesn't undermine it.
4:00
Hi. Hello,
4:03
how are you feeling right now? Kasha? I'm
4:06
feeling pretty good the beginning of another
4:08
week. Kasha is the co-founder and lead
4:10
of the Data Nutrition Project, the team
4:12
behind those labels. They've also
4:14
worked as a digital services technologist
4:17
in the White House, on COVID analytics
4:19
at McKinsey and in communications
4:22
at Google. Yeah.
4:24
Yeah, so I've kind of I've jumped around.
4:27
Yeah, so
4:30
why don't you introduce yourself and just tell
4:32
me what you do. My name
4:34
is Kasia Chmielinski, and I am a technologist
4:37
working on the ethics of data.
4:40
And I'd say, you know importantly
4:42
to me, although I have always been a nerd
4:44
and I studied physics a long time ago.
4:47
I come from a family of artists. Actually,
4:49
the painting behind me is by
4:51
my brother. There's another one in the room by my mom
4:54
um. And so I come from a really kind of multidisciplinary
4:57
group of people who are driven
4:59
by our passions. And that's kind of what I've tried to
5:01
do too, and it's just led me on many different
5:03
paths. Where does
5:06
the interest in technology come from? For you?
5:09
You know, I don't think that it's really an interest
5:11
in technology. It's just that we're in a technological
5:13
time. And so when I
5:15
graduated from university with this physics
5:18
degree, I had a few options,
5:21
and none of them really seemed
5:23
great. Uh. You know, I could go into defense
5:25
work, I could become a spy, or I could
5:28
make weapons, and that really wasn't so interesting
5:30
to me. Was being a
5:32
spy really an option? Uh?
5:35
Yes, so
5:37
you know I could do that, um, but I
5:39
didn't in the end, and none of these were really interesting
5:42
because I wanted to make
5:44
an impact and I wanted to drive change, and I think that
5:46
was around, you know, um, the early two thousands, and
5:49
technology was the place to be. That's where you could really
5:51
have the most impact and solve really big problems.
5:54
Um. And so that's where I ended up. So I
5:56
actually don't think that it's really about the
5:58
technology at all. I think that the technology is
6:00
just a tool that you can use to
6:02
to kind of make an impact in the world. I
6:05
love the way you describe the
6:08
interest in technology is really just an interest
6:10
in the world. So do you remember
6:12
some of the first steps that
6:14
led you to what you are doing now? So
6:18
when I graduated, I actually applied
6:20
to many things and didn't get them. And what
6:22
I realized that I really didn't know how to do
6:24
at all was tell a story. Um,
6:27
and coming out of a fairly technical
6:29
path, I couldn't really
6:31
make eye contact. I hadn't talked to a
6:34
variety of people. I mean, I was definitely
6:36
one of the only people who had my identity
6:39
in that discipline at that time. I went
6:41
to a school where the head of the school
6:43
at the time was saying that women might
6:45
not be able to do science because biologically they
6:48
were inferior in some way. Oh
6:50
that's nice, very welcoming environment. Oh
6:52
yeah, super welcoming. And I was studying physics and at
6:54
the time, I, you know, was female identified. I
6:56
now identify as non binary. Um. But
6:59
it wasn't like a great place to be doing
7:01
science, and I just felt like coming out of that,
7:03
I was, UM. I didn't know how to
7:05
talk to people. I didn't know what it was like to be part of a
7:07
great community. And so I actually went into communications
7:10
at Google, which was a strange
7:13
industry. I went from this super
7:15
nerdy, very male dominated place
7:17
to like a kind of like the party
7:20
wing of technology
7:22
at the time. Right, So people who are doing a lot
7:24
of marketing and communications and talking to journalists
7:26
and telling stories and trying to figure out like what's
7:28
interesting, how does this fit into the greater narratives
7:31
of our time. So
7:38
while at Google, I got to see inside
7:40
of so many different projects, which I think
7:42
was a great benefit of being
7:45
part of that strategy team. So
7:47
I got to work on core Search, I got to
7:49
work on image search, I got to work on
7:51
Gmail and Calendar, and
7:54
I started to see the importance
7:57
of first of all, knowing
7:59
why you're building something before you
8:02
start to build it, right, And there were so
8:04
many times that I saw a really
8:06
really cool product at the end of the day,
8:08
something an algorithm or something technical
8:10
that was just really cool, but there was
8:12
no reason that it needed to exist.
8:15
Right from from a person perspective, from a society
8:17
perspective, I
8:26
am relieved to hear you say that.
8:29
That's been one of my critiques of this industry
8:31
for quite some time. It's like, whose
8:33
problems are you trying to solve? And so
8:35
you were at the epicenter of one of the major
8:38
companies seeing some of this firsthand. Yeah,
8:40
that's exactly right, And it was endemic.
8:43
I mean It just happens all the time, and it's not the
8:45
fault of anyone in particular. You just put a bunch
8:47
of really smart engineers on a technical problem
8:49
and they just find amazing
8:51
ways to solve that. But then at the end of the day
8:54
you say, well, how are we actually going to use this? And
8:56
that would fall to the comms team,
8:58
right or the marketing team to say, okay, now
9:00
what are we going to do with this? Um So that was one thing
9:02
and that's why I actually ended up moving into product
9:04
management, where I could think about why
9:07
we want to build something to begin with, and to make sure
9:09
we're building the right thing. Um So I
9:11
got closer to the technology after that job. The
9:15
second thing that I became aware of is the
9:18
importance of considering the
9:20
whole pipeline of the thing that you build, because
9:22
the thing that you build, its DNA, is
9:25
in the initial data that you
9:27
put into it. And I'm talking specifically
9:29
about algorithmic systems here. So
9:32
one example I have from my days
9:35
when I was at Google. I actually worked out
9:37
of the London office and there was a
9:39
new search capability and
9:41
it was trained entirely on one particular
9:43
accent and then
9:46
when other people tried to use that, if they didn't
9:48
have that very specific accent, it wasn't working
9:50
so well. And I really didn't
9:52
know much about AI at the time. I hadn't
9:54
studied it, but I realized, you know,
9:56
bias in bias out like garbage in garbage
9:59
out. You feed this machine something,
10:01
the machine is going to look exactly
10:03
like what you fed it. Right, you are what you
10:05
eat. We'll
10:10
be right back. We
10:22
use these terms data, we use these terms
10:24
algorithm and artificial intelligence.
10:27
And so before we keep going, I'd love for
10:30
you to pause and kind of explain
10:32
what these things are in their relationship
10:35
to each other. Data, algorithms,
10:38
artificial intelligence. How how does Kasha
10:40
define these? Yeah,
10:44
thank you for taking a moment. I think that that's
10:47
um. Something that's so important in technology is
10:49
that people feel like they aren't allowed
10:51
to have an opinion or have
10:53
thoughts about it because they quote
10:55
unquote don't understand it. But
10:57
you're right, it's just, it's just a definitional
11:00
issue, often. So,
11:03
data is anything that
11:05
is programmatically accessible
11:08
that is probably in enough volume
11:10
to be used for something by a system.
11:13
So it could be records of something,
11:15
it could be weather information. It
11:17
could be the notes taken by a doctor
11:19
that then get turned into something that's programmatically
11:22
accessible. There's a lot of stuff
11:24
and you can feed that to a machine. I'm
11:28
really interested in algorithms because it's kind
11:30
of the practical way of understanding something
11:32
like AI. It's a mathematical formula
11:35
and it takes some stuff and then it outputs
11:37
something. So that could be something like
11:40
you input where you live
11:43
and your name, and then the algorithm
11:45
will churn and spit out something like
11:47
you know what race or ethnicity it thinks you are.
11:50
And that algorithm, in order to
11:52
make whatever guesses it's making,
11:55
needs to be fed a bunch of data so
11:57
that it can start to recognize patterns. When
12:01
you deploy that algorithm out in the
12:03
world, you feed it some data and it
12:05
will spit out what it believes is
12:07
the pattern that it recognizes based on what
12:09
it knows.
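To make that concrete, here is a minimal, hypothetical Python sketch (not from the episode; the features, labels, and numbers are invented) of the pattern Kasha describes: a model is fit to example data, and its predictions can only reflect whatever that data contained.

```python
# Hypothetical sketch: an algorithm is just a formula fit to data,
# so it can only reproduce patterns present in what it was fed.
from sklearn.linear_model import LogisticRegression

# Invented training data: [years_of_credit_history, income_in_thousands] -> 1 = loan approved.
X_train = [[1, 30], [2, 35], [8, 80], [10, 95], [3, 40], [12, 110]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)

# "Deployment": feed it new data and it spits out the pattern it learned.
# If the training rows only described one kind of applicant, so will its answers.
print(model.predict([[2, 38], [9, 90]]))
```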
12:14
You know, there's different flavors of AI. I think
12:17
a lot of people are very afraid of kind
12:19
of the terminator type AI. I'll
12:22
be back.
12:24
As we should be, because the Terminator is very scary.
12:27
I've seen the documentary many times
12:29
and I don't want to live in that world. Yeah,
12:32
legitimately very scary. UM.
12:35
And so there's this question of, Okay,
12:37
is the AI going to come to eat our lunch?
12:40
Right? Are they smarter than us? And all the things that
12:42
we can do, and that's like you know, generalized
12:44
AI or or even kind of super AI.
12:47
We're not quite there yet. Currently,
12:49
we're in the phase where we have discrete
12:51
AI that makes discrete decisions and we
12:53
leverage those to help us in our
12:55
daily lives, or to hurt us sometimes.
12:58
Data as food for algorithms. I
13:01
think it's a really useful metaphor. And
13:03
a lot of us out in
13:05
the wild who aren't specialized in this, I
13:08
think we're not encouraged to understand that relationship.
13:10
I agree, And I think the
13:13
relationship between what you feed
13:15
the algorithm and what it gives you is
13:17
so direct, and people don't necessarily
13:20
know that or see that. And what you see
13:22
is the harm or the output that
13:24
comes out of the system, and what you
13:27
don't see is all the work that went into
13:29
building that system. You have someone who
13:31
decided in the beginning they wanted to use AI, and
13:33
then you have somebody who went and found the data,
13:35
and you have somebody else who cleaned the data, and
13:38
you've got somebody or somebodies who
13:40
then built the algorithm and trained the algorithm,
13:43
and then you have the somebodies who coded that up, and
13:46
then you have the somebodies that deployed that, and
13:48
then you have people who are running that. And
13:50
so when the algorithm comes out the end and there's
13:53
a decision that's made: you get the loan, you didn't get
13:55
the loan. The algorithm recognizes
13:57
your speech, doesn't recognize your speech, sees
14:00
you, doesn't see you. People
14:02
think, oh, just change the algorithm. Oh no,
14:04
you have to go all the way back to the beginning
14:12
because you have that long chain of people who are doing
14:14
so many different things, and it becomes very complicated
14:17
to try to fix that. So the
14:19
more that we can understand that the process
14:22
begins with the question
14:24
do I need AI for this? And
14:26
then very quickly after where are we going
14:28
to get the data to feed that? So that we
14:30
make the right decision. The sooner
14:33
we understand that as a society, I
14:35
think the easier it's going to be for us to build better AI
14:38
because we're not just catching the issues at the very end
14:40
of what can be a years-long process. Mm
14:43
hm. So what problems does
14:45
the Data Nutrition Project aim to tackle?
14:48
We've kind of talked about them all in pieces. At
14:50
its core, the Data Nutrition Project, which
14:52
is this research organization that I
14:54
co-founded with a bunch of very smart people. We
14:57
were all part of a fellowship that was
14:59
looking at the ethics and governance of
15:01
AI. And so when we sat down
15:03
to say what are the real things that we
15:05
can do to drive change um
15:07
as practitioners, as people in the space, as people
15:09
who had built AI before, we decided
15:12
let's just go really small. And
15:14
obviously it's actually a huge problem and it's it's very
15:17
challenging. But instead of saying, let's
15:19
look at the harms that come out of an
15:21
AI system, let's just think
15:23
about what goes in. And I think
15:25
we were maybe eating a lot of snacks. We were holed
15:28
up at the MIT Media Lab, right. So we were just
15:30
all in this room for many many hours,
15:32
many many days, and I think somebody
15:34
at some point picked up, you know, a snack package,
15:38
and we're like, what if you just had a nutritional
15:41
label like the one you have on food, you just
15:43
put that on a data set. What would that do?
15:45
I mean, is it possible? Right?
15:48
But if if it is possible, would that actually
15:50
change things? And we started talking it over and we thought,
15:52
you know, we think it would. In
15:55
our experience in data science as practitioners,
15:58
we know that data doesn't come with
16:01
standardized documentation, and
16:04
often you get a data set and you don't
16:06
know how you're supposed to use it or not use it. There
16:09
may or may not be tools that you use to
16:11
look at things that will tell you whether
16:13
that data set is healthy for the thing that you want
16:16
to do with it. The
16:20
standard process would be a product
16:22
manager CEO would come
16:24
over to the desk of data scientists and say,
16:27
look, we have all this information about this new product
16:29
we want to sell. We need to map
16:32
the marketing information to the demographics
16:35
of people who are likely to want to
16:37
buy our product or click on our product. Go
16:39
make it happen. And the data scientist goes
16:42
okay, and the person goes, oh yeah, by Tuesday, and
16:44
the person's like, oh okay, let
16:47
me go find the right data for that. There's
16:49
a whole world. You just google a bunch of stuff and
16:52
then you get the data, and then you kind
16:54
of poke around and you think, this seems pretty good,
16:57
and then you use it and you build
16:59
your algorithm on that. Your algorithm is
17:01
going to determine which demographics or
17:03
what geographies or whatever it is you're trying to do.
17:06
You train it on that data you found,
17:09
and then you deploy that algorithm and it starts to
17:11
work in production. And you
17:13
know, no fault of anybody, really, but the
17:15
industry has grown up so much faster than
17:18
the structures and the scaffolding to keep
17:20
that industry doing the right thing. So
17:23
there might be documentation on some of the data, there might
17:25
not be in some cases. We're working with a data
17:28
partner that was very concerned about how people
17:30
were going to use their data. The data set
17:32
documentation was an eighty page PDF.
17:35
Zero. That data scientist who's on
17:37
deadline for Tuesday is not going to read eighty
17:39
pages. So our
17:42
thought was, hey, can we distill
17:44
the most important components
17:46
of a data set and its usage to
17:49
something that is maybe one sheet two sheets
17:51
right, using the analogy of the nutrition label,
17:54
put it on a data set, and then make that
17:56
the standard so that anybody who is picking
17:58
up a data set to decide whether or not to use it
18:00
will very quickly be able to assess is this healthy
18:03
for the thing I want to do. It's a novel
18:05
application of a thing that so many of us
18:07
understand. What are some of
18:09
the harms you've seen, some
18:12
of the harms you're trying to avoid, by
18:14
the data scientists who are building these services
18:17
not having access to healthy data?
18:20
Yeah. Let's say you have a data set about health
18:23
outcomes and you're looking at
18:25
people who have had heart attacks or something
18:27
like that, and you realize
18:29
that the data was only taken
18:31
from men in their sixties.
18:34
If you are now going to use this as
18:37
a data set to train an algorithm to provide
18:39
early warning signs for who
18:42
might have a heart attack, you're
18:44
gonna miss entire demographics
18:46
of people, which may or may not matter. That's
18:48
a question. Does that matter? I don't know, But
18:50
perhaps it matters what the average
18:52
size of a body is, or the average age
18:54
of a body is, or maybe there's
18:56
something that is gender or sex related,
18:59
and you will miss all of that. If you just take
19:01
the data at face value, you don't think about who's
19:03
not represented here.
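As a small, invented illustration of that check (the column names and rows here are hypothetical, not a real health dataset), a practitioner can look at who is actually represented before training anything:

```python
# Hypothetical sketch: inspect who is in a dataset before you train on it.
import pandas as pd

df = pd.DataFrame({
    "sex": ["M", "M", "M", "M", "F", "M"],
    "age": [61, 64, 66, 68, 63, 67],
    "had_heart_attack": [1, 0, 1, 1, 0, 1],
})

# If the records are mostly men in their sixties, a model trained on them
# may miss early-warning patterns for everyone else.
print(df["sex"].value_counts(normalize=True))
print(df["age"].describe())
```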
19:06
I remember examples that I used to cite in
19:08
some talks. It was the Amazon hiring decisions.
19:12
Amazon software engineers recently
19:14
uncovered a big problem. Their
19:16
new online recruiting tool did
19:18
not like women. It
19:21
had an automated screening system
19:23
for resumes, and that system ignored
19:26
all the women because the data
19:28
set showed that successful
19:30
job candidates at Amazon were
19:32
men. And so the computer like
19:34
garbage in, garbage out, the way we've discussed,
19:37
said, well, you've defined success as male.
19:40
You've fed me a bunch of female; that's
19:43
not success. Therefore my formula
19:45
dictates they get rejected, and
19:48
that affects people's job prospects. You know, that affects
19:51
people's sense of their self-worth and self-esteem. That
19:53
could open up the company to liability,
19:56
all kinds of harms in a system that was supposed
19:58
to breed efficiency
20:01
and help. Yeah,
20:03
that's a great example, and it's, you know, a
20:05
very true one. And I think that
20:07
one was pretty high profile.
20:09
Imagine all the situations that either
20:12
have never been caught or were kind of too
20:14
low profile to make it into the news. It
20:16
happens all the time because
20:18
the algorithm is a kind of a reflection
20:20
of whatever you've fed it. So in that case,
20:23
you had historical bias, and so
20:25
the historical bias in the resumes
20:27
that they were using to feed the algorithm showed
20:30
that men were hired more frequently and
20:32
that was success. It also comes
20:34
down to, in terms of the metrics, how you're defining
20:36
things. If your definition of success is
20:39
that someone was hired, you're not necessarily
20:42
saying that your definition is that person
20:44
ended up being a good worker.
20:47
Or even if you're looking at the person's performance
20:49
reviews and saying success would be that we
20:51
hire somebody who performs well. But
20:53
historically you hired more men than women. So
20:56
even then, if your success metric is
20:58
someone who performed well, you're already taking
21:00
into account the historical bias that there are more men
21:02
than women who are hired. So there
21:05
are all different kinds of biases that are being captured
21:07
in the data.
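To show how a label definition carries that history, here is a tiny, invented sketch (not Amazon's system; the rows are made up): if "success" just means "was hired in the past," the training labels simply reproduce whoever was hired before.

```python
# Hypothetical sketch: the "success" label you choose carries historical bias with it.
import pandas as pd

resumes = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "M", "F", "M"],
    "hired":  [1,   1,   0,   0,   0,   1,   0,   1],  # invented hiring history
})

# Defining success as "was hired" bakes the lopsided base rate into the labels;
# any model trained to predict this column will learn that same skew.
print(resumes.groupby("gender")["hired"].mean())
```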
21:12
Something that the Data Nutrition Project is
21:15
trying to do with the label that we've built is
21:17
highlight these kinds of historical issues
21:19
as well as the technical issues in the data,
21:22
and that I think is an important balance to
21:24
strike. It's not just about
21:26
what you can see in the data. It's
21:28
also about what you cannot see in the data. So
21:33
in the case that you just called out there with the resumes,
21:35
you would be able to see that it's not representative with respect
21:38
to gender, and maybe
21:40
you'd be able to see things like these are all English
21:42
language resumes, But what you would
21:44
not be able to see are things
21:47
like socio economic differences or
21:49
people who never applied, or you
21:51
know, what the job market looked like
21:53
whenever these resumes were collected, So
21:55
you'll kind of not be able to see any of that if you just take
21:57
a purely technical approach to what's in the data
22:00
set. So the data set nutrition
22:02
label tries to highlight those things as well to
22:04
data practitioners to say, before
22:07
you use this data set, here are some things you should
22:09
consider, and sometimes
22:12
will even go as far as to say you
22:14
probably shouldn't use this data set for
22:16
this particular thing because we just
22:18
know that it's not good for that, and
22:20
that's always an option, is to say don't use it.
22:23
Right. It doesn't mean people won't do it, but at least
22:25
we can give you a warning, and we kind of hope
22:27
that people have the best of intentions and are trying
22:29
to do the right thing. So it's about explaining
22:32
what is in the data set or
22:34
in the data so that you can decide as
22:36
a practitioner whether or not it is healthy for your
22:38
usage. After
22:42
the break, it's snack time. I'm
22:45
back hungry. So
22:58
I'm holding a package of food right now,
23:01
and I'm looking at the nutrition
23:03
label nutrition facts. It's got
23:06
servings per container, the size
23:08
of a serving, and then numbers
23:11
and percentages in terms of the daily
23:13
percent of total fat, cholesterol, sodium,
23:16
carbohydrates, protein, and
23:18
a set of vitamins that I can expect
23:20
in a single serving of this product. And then
23:23
I can make an informed choice about
23:25
whether and how much of that food
23:27
stuff I want to put in my body,
23:30
how much garbage I want to let in. In this case, it's
23:32
pretty healthy stuff. It's uh dried
23:34
mangoes. If you're curious, what's
23:38
on your data nutrition label? Yeah,
23:42
a great question. And now I'm like kind of hungry. I'm like,
23:44
oh, it's a snack time. I feel like it's snack time. Um.
23:49
This is the hardest part to me about
23:51
this project is what the
23:53
right level of metadata is. So
23:56
what are the right elements that you want to call out
23:58
for our nutritional label?
24:00
You know, what are the fats and the sodiums and these kinds of
24:02
things. Because, you know, the complication here
24:05
is that there are so many different
24:07
kinds of data sets. I can have a data
24:09
set about trees in Central Park, and I can have a
24:11
data set about people in prison. So
24:14
we've kind of identified
24:16
that the harms that we're most worried
24:18
about have to do with people. Um
24:21
not to say that we are, you know, not worried
24:24
about things like the environment or
24:26
other things, but when it touches people
24:29
or communities is when we see the
24:31
greatest harms from an algorithmic standpoint
24:33
in society. And so we
24:36
kind of have a badge system that should
24:38
be very quick, kind of icon based that
24:40
says this data set is about people or not. This
24:43
data set includes subpopulation
24:45
data, so you know, includes information
24:47
about race or gender or
24:50
whatever status. Right, this
24:52
data set can be used for commercial
24:54
purposes or not. We've identified, let's
24:56
say, ten to fifteen things that we think
24:59
are kind of high level almost like little
25:01
food warning symbols that you would see on something
25:04
like organic or it's got a surgeon General's
25:07
warning. Right, exactly. So
25:09
at a very high level we have these kind of icons.
25:12
And then underneath that there are
25:14
additional very important questions that we've
25:16
highlighted that people will answer: who
25:18
owns the data set? And then finally there's a
25:20
section that says, here's the reason it was made,
25:23
the data set was made, here's probably an intended use.
25:25
Here are some other use cases that are possible
25:28
or ways that other people have used it, and
25:30
then here are some things that you just shouldn't do.
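As a rough illustration of that structure (badges, key questions, and usage guidance), here is a hypothetical sketch in Python; it is not the Data Nutrition Project's actual schema, just the shape of the idea as described in the conversation.

```python
# Hypothetical sketch of a dataset "nutrition label" -- not the Data Nutrition
# Project's real schema, just the structure described here.
dataset_label = {
    "badges": {  # quick, icon-style flags
        "about_people": True,
        "contains_subpopulation_data": True,   # e.g., race or gender fields
        "commercial_use_allowed": False,
    },
    "key_questions": {
        "who_owns_the_dataset": "Example Health Agency (hypothetical)",
        "how_often_updated": "annually",
        "consent_obtained": "unknown",
    },
    "usage": {
        "reason_created": "public-health reporting",
        "intended_use": "population-level statistics",
        "other_known_uses": ["academic research"],
        "do_not_use_for": ["individual risk scoring"],
    },
}

# A practitioner on a Tuesday deadline can scan this instead of an eighty-page PDF.
print(dataset_label["usage"]["do_not_use_for"])
```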
25:33
So how do we make this approach
25:36
more mainstream? Mainstream
25:38
is a tough word because we're talking about people
25:41
who build AI, and I think that is becoming
25:43
more mainstream for sure. Um,
25:45
but we're really focused on data practitioners,
25:47
so people who are taking data and then
25:49
building things on that data. But there's kind
25:52
of a bottoms up approach. It's very
25:54
anti-establishment in some ways, very
25:56
hacker culture. And so we've
25:58
been working with a lot of data practitioners to
26:00
say what works, what doesn't, is it useful
26:02
or is it not. Make it open source, right, open licenses,
26:05
use it if you want, and just hoping that if
26:07
we make a good thing, people will use it. A
26:09
rising tide lifts all boats, we think, so you
26:12
know, we're not cagey about it because
26:14
we just want better data. If we have better data out there,
26:16
and if people have the expectation that they're going to
26:18
see something like this, that's awesome. There's
26:20
also the top down approach, which is regulation
26:23
policy, And I could imagine a world
26:25
in which in the future, if you
26:28
deploy an algorithm, especially
26:30
in the public sector, you would have to include
26:32
some kind of labeling on that right to talk about
26:34
the data that it was trained on and provide a label for that.
26:36
So it's kind of a two way approach, you know. Yeah,
26:39
no, I mean when I think of analogies, like most
26:41
of us don't know civil engineers
26:44
personally, but we interact with
26:46
their work on a regular basis through
26:48
a system of trust, through standards,
26:50
through approvals, through certifications,
26:53
and data scientists are on
26:55
par with like a civil engineer in my mind,
26:57
in that they erect structures that we inhabit on
27:00
a regular basis. But I have no
27:02
idea what rules they're operating by.
27:04
I don't know what's in this algorithm, you know, I
27:06
don't know what ingredients you used to put this together
27:08
that's determining whether I get a job or
27:11
vaccination. What's
27:13
your biggest dream for the Data nutrition
27:15
project? Where does it go? So
27:18
I could easily say, you know, our
27:21
dream would be that every data set comes with a label. Cool,
27:24
But more than that, I think we're trying to
27:26
drive awareness and change. So even if
27:28
there isn't a label, you're thinking about,
27:30
I wonder what's in this and I wish it had a label
27:32
on it. In
27:35
the same way that I walk into
27:37
a bakery and I see a cake that's
27:39
been baked, and I might think to myself, I
27:42
wonder what's
27:44
in that cake, and I
27:46
wonder, you know, if
27:48
it has this much of something, or maybe
27:51
I should consider this when I decide whether
27:53
to have four or five pieces of cake. We
27:55
would want the same thing for a data set where even
27:57
if you encounter that data set in the wild (someone's
28:00
created it, you just downloaded it from some repository
28:03
on GitHub, there's no documentation),
28:06
that you, as a data practitioner, will think to yourself,
28:09
I wonder if this is representative. I
28:11
wonder if the thing I'm trying to do with this
28:14
data is responsible, considering
28:17
the data, where it came from, who
28:19
touched it, who funded it, where it lives, how
28:21
often it's updated, whether they
28:23
got consent from people when they
28:25
took their data, And so we're
28:28
trying to drive a culture change.
28:32
I love that and I love the idea
28:34
that when I go to a bakery, one
28:36
of the questions I'm not asking myself is
28:39
is that muffin safe to eat? Right?
28:42
Is that cake gonna kill me? It
28:45
literally doesn't enter my mind because there's such
28:47
a level of earned trust in
28:49
the system overall that
28:52
you know, these people are getting inspected, that
28:54
there's some kind of oversight that they were trained
28:56
in a reasonable way, so I know
28:58
there's not arsenic in the muffins.
29:01
So this brings me to zooming
29:03
out a little bit further to artificial intelligence
29:06
and the idea of standards, because
29:08
I'm getting this picture from you that there's kind
29:10
of a wild West in terms of what we're feeding
29:12
into the systems that ultimately become
29:15
some form of AI. What does the world
29:17
look like when we have more
29:19
standards in the tools
29:21
and components that create AI. I
29:24
think that our understanding of what AI
29:27
is and what kinds
29:29
of AI there are is going to mature.
29:31
I imagine that there is a system
29:34
of classification where
29:36
some AI is very high risk and
29:39
some AI is less high risk, and
29:41
we start to have a stratified view of
29:43
what needs to occur in each
29:45
level in order to reach an understanding
29:48
that there's no arsenic in the muffins.
29:50
So at the highest level, when
29:52
it's super super risky, maybe we just don't
29:55
use AI. This seems to be
29:57
something that people forget, is that we can decide whether
29:59
or not to use it. Like, would
30:01
you want an AI performing surgery
30:03
on you with no human around? If
30:07
it's really really good? Do you want that? Do
30:09
you want to assume that risk? I mean that is dealing
30:11
with your literal organs, your heart. So
30:15
I think that you know, ideally what happens
30:17
is you've got a good combination of regulation
30:20
and oversight, which I do believe in, but
30:23
then also training and
30:25
you know, good human intention to
30:27
do the right thing. So
30:31
when I think about these algorithms, I
30:33
think of them as kind of automated decision makers,
30:36
and I think they can pose a challenge
30:38
to our ideas of free
30:41
will and self determination
30:44
because we are increasingly living in this world
30:47
where we think we're making choices, but
30:49
we're actually operating within a narrow set of recommendations.
30:53
What do you think about human agency
30:56
in the age of algorithms? WHOA
30:58
These are the big questions? Um,
31:00
Well, I mean I think that we have to be careful
31:03
not to give the machines more
31:05
agency than they have. And there
31:07
are people who are making those machines. So
31:10
when we talk about, you know, the free
31:13
will of people versus machines,
31:15
it's like the free will of people versus
31:19
the people who made the machines. To me,
31:21
technology is just a tool, and
31:24
I personally don't want to
31:26
live in a world that has no
31:29
algorithms and no technology because these are
31:31
useful tools. But I want
31:33
to decide when I'm using them and what I use
31:35
them for. And so my perspective
31:38
is really from the point
31:40
of view of a person who has
31:42
been making the tools, and I
31:45
think that we need to make sure that those
31:47
folks have the free will to
31:49
say, no, I don't want to make those tools,
31:52
or this should not be used in this way, or
31:54
we need to modify this tool in this way so
31:57
those tools don't run away from us. Um
32:00
So, I guess I kind of disagree
32:03
with the premise that it's people versus machines
32:06
because people are making the machines and we're not at
32:08
the terminator stage yet. Currently it's people
32:11
and people, right, So so let's so
32:13
let's like work together to make the right things
32:15
um for people. Yes, Kasha,
32:19
thank you so much for spending this time with me. I've
32:21
learned a lot and now I'm just
32:23
thinking about arsenic in my muffins. Thanks
32:25
so much for having me. I've really enjoyed it. Garbage
32:32
in, garbage out. It's a cycle
32:34
that we see that doesn't just apply to the
32:36
world of artificial intelligence, but everywhere.
32:40
If I feed my body junk, it
32:42
turns to junk. If I
32:44
fill my planet with filth, it
32:46
turns to filth. If I
32:48
inject my Twitter feed with hatred,
32:51
that breeds more hatred. It's
32:53
pretty straightforward, but
32:55
it doesn't have to be this way. In
32:58
essence, Kasha's trying to standardize
33:01
thoughtfulness, and that fills
33:03
me with so much hope.
33:06
We're all responsible for something
33:09
or someone, so let's
33:11
always do our best to really
33:13
consider what they need to thrive.
33:16
If we put a little more goodness
33:18
into our ai, our bodies,
33:21
our planet, our relationships,
33:23
and everything else, we'll
33:25
see goodness come out. And
33:28
that's a cycle I can get behind. Goodness
33:31
in, goodness out. This
33:36
is just one part of the How to Citizen
33:38
conversation about data. Who
33:40
does data ultimately benefit?
33:44
If the data is not benefiting the
33:46
people, the individuals, the communities
33:48
that provided that data. Then
33:51
who are we uplifting at the cost
33:53
of others' justice? Next
33:56
week, we dive deeper into how it's collected
33:58
in the first place, and we meet an indigenous
34:00
geneticist reclaiming data for her people.
34:03
See you then. We
34:13
asked Kasha what we should have you
34:15
do, and they came up with a lot.
34:18
So here's a whole bunch
34:20
of beautiful options for citizening.
34:23
Think about this. Like people,
34:26
machines are shaped by the context in which
34:28
they're created. So if we think of machines
34:30
and algorithmic systems as children who are
34:32
learning from us, we're the parents. What
34:35
kind of parents do we want to be? How
34:37
do we want to raise our machines to
34:39
be considerate, fair, and to
34:41
build a better world than the one we're in today. Watch
34:45
Coded Bias. It's a documentary
34:47
that explores the fallout around MIT
34:49
Media Lab researcher Joy
34:52
Buolamwini's discovery that facial recognition
34:54
doesn't see dark-skinned faces as well, and
34:57
this film is capturing her journey
34:59
to put forth the first ever legislation in
35:01
the US that will govern against bias
35:04
in the algorithms that impact us all. Check
35:07
out this online buying resource called
35:09
the Privacy Not Included Buying
35:11
Guide. Mozilla built this shopping
35:14
guide which tells you the data practices
35:16
of the app or product that you're considering,
35:19
and it's basically the product reviews we
35:21
need in this hyper connected
35:23
era of data theft and
35:26
hoarding and non consensual
35:28
monetization. Donate
35:31
if you've got money, you can distribute
35:33
some power through dollars to these groups
35:35
that are ensuring that the future of AI is
35:37
human and just, the Algorithmic
35:40
Justice League, the ACLU, and
35:42
the Electronic Frontier Foundation. If
35:47
you take any of these actions, please brag
35:49
about yourself online. Use the hashtag
35:51
how to citizen. Tag us up on Instagram
35:54
at how to Citizen. We will
35:56
accept general direct feedback to our
35:59
inbox common at how to citizen
36:01
dot com and make sure you go ahead and
36:03
visit how to citizen dot com because that's the brand
36:05
new kid in town. We have a
36:07
spanky new website. It's
36:09
very interactive. We have an email list
36:12
you can join. If you like this show,
36:14
tell somebody about it. Thanks. How
36:18
to Citizen with Baratunde is a production of
36:20
iHeartRadio Podcasts and Dustlight Productions.
36:23
Our executive producers are me,
36:25
Baratunde Thurston, Elizabeth Stewart, and
36:27
Misha Euceph. Our senior producer
36:29
is Tamika Adams, our producer is Ali
36:32
Kilts, and our assistant producer is Sam Paulson.
36:34
Stephanie Cohen is our editor, Valentino
36:36
Rivera is our senior engineer, and Matthew
36:38
Lai is our apprentice. Original
36:41
music by Andrew Eathan, with additional original
36:43
music for season three from Andrew Clausen. This
36:46
episode was produced and sound designed by Sam
36:48
Paulson. Special thanks to Joel Smith
36:50
from iHeartRadio and Rachel Garcia
36:52
at Dustlight Productions.