Cloud Fundamentals needed for AI

Released Sunday, 24th March 2024

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.

0:00

Listen up, are you ready for the

0:02

ultimate coding challenge? Here's a chance to win a

0:04

Tesla Cybertruck or $100,000. All

0:07

you have to do is build an app

0:09

with a front end and back end and

0:12

deploy it on WSO2's Choreo, an internal developer

0:14

platform. The more you do with Choreo, the

0:16

more chances you have to win. For

0:18

all the details, go to

0:20

choreo.dev slash Cybertruck. Sign

0:22

up, get started, and possibly

0:24

win a Tesla Cybertruck or $100,000. Plus,

0:28

10 more winners get MacBook Pros.

0:30

But hurry, because the challenge ends on

0:32

April 30th. Good luck. Cloudcast

0:35

Media presents from the Massive Studios

0:37

in Raleigh, North Carolina. This is

0:39

the Cloudcast with Aaron Delp and

0:41

Brian Gracely, bringing you the best

0:43

of cloud computing from around the

0:45

world. Good

0:50

morning, good evening wherever you are, and welcome back

0:52

to the Cloudcast. We are coming to you live

0:54

from the Massive Cloudcast Studios here in Raleigh, North

0:56

Carolina. Hope everybody is doing well. We are now

0:58

officially into spring. By the time you listen to

1:01

this, we will be well past March 21st or

1:03

March 20th whenever you celebrate spring. The

1:06

year is moving along. Hopefully, everybody who has

1:08

listened to this on Sunday, your March Madness

1:10

brackets have not been completely busted by the

1:12

time you get to this on Sunday. A

1:14

couple of rounds of games will have

1:16

happened. For those of you living in the States or those of

1:19

you that are college basketball fans, this

1:21

time of year is sort of the

1:23

mecca of March Madness, the mecca of

1:25

enjoyment of watching small teams beat large teams

1:28

and all the Cinderella stories and stuff like

1:30

that. Hope everybody is doing well. Another Sunday

1:32

perspective show. I wanted to

1:34

talk about I had a chance to be

1:36

a guest maybe a month or so ago

1:38

on a podcast called Altitude, which was run

1:41

by the good folks over at Aviatrix. We

1:45

really weren't talking so much

1:47

about networking, but really talking about where

1:49

cloud computing was going, kind of the

1:51

intersection of cloud computing and AI, cloud

1:53

computing, and

1:55

cloud and multi-cloud and hybrid cloud and all

1:57

those sort of things. One

2:00

of the things that I mentioned,

2:02

as a guest, I mentioned

2:04

about where AI was intersecting with

2:08

cloud. The folks who

2:10

were conducting the interview said,

2:12

hey, does it feel like AI

2:14

is going to be the thing that

2:16

ultimately drags everything into the

2:19

cloud because there's so many GPUs

2:21

in Azure or AWS or GCP

2:23

or in some of these managed

2:25

cloud hosting services? Where the

2:28

discussion ended up going was sort of, I'm

2:31

not sure that's exactly going to happen. There's

2:33

a lot of opportunity because so much of

2:35

companies' data lives on premises. People

2:38

are going to want to build models that are close to

2:40

their data. They've got security concerns. So anyways, we got into

2:42

this sort of long discussion about

2:44

how AI is really kind of

2:47

an interesting kind of use case for the

2:50

concept of hybrid cloud or multi-cloud and so forth. But

2:53

anyways, as part of that discussion, we were

2:56

talking about skill sets and companies

2:59

evolving to use AI and what they

3:01

were going to need to be successful and what

3:03

could we have learned from the last 10 years

3:05

or so of cloud computing. One

3:07

of the things that I mentioned, and I didn't

3:11

really dive into thinking about it so much, it just

3:13

kind of popped in my head, but there

3:15

was an old phrase when we first started doing

3:17

cloud computing and the old kind of cloud-erati, the

3:20

early adopters of cloud. One of the

3:22

things they would often say is that

3:24

if you don't do IT well today,

3:27

meaning you don't automate things well, you

3:29

don't secure things well, you don't have

3:32

good best practices and good hygiene and so

3:34

forth, if you don't do IT

3:36

well, you're probably not going to do cloud well because

3:38

cloud doesn't just magically

3:41

make all of your problems in IT

3:43

and your warts and your

3:46

issues go away. They're going to accentuate

3:48

them. In fact, because the expectation of

3:50

cloud is that things move

3:52

faster, scale bigger than they did

3:55

before, have more flexibility and agility

3:57

built in, if you had

3:59

struggles in IT, then you're going to struggle even more

4:01

in cloud. And so I sort of took

4:03

that a step further and said, look, you

4:06

know, if over the last decade or

4:08

so you haven't figured out how to do

4:10

cloud-like things well, you're going to

4:13

struggle with AI. Ultimately, AI

4:15

is going to take a lot of the

4:17

things that you do today in terms of

4:19

how you organize your teams, how you deal

4:22

with large amounts of data, how you are,

4:24

you know, iterating and

4:27

retraining and fine

4:29

tuning and deploying frequently and

4:31

getting feedback loops and so forth.

4:33

So if you're not doing cloud

4:35

fundamentals well, you're going to struggle with

4:37

AI. And so I thought it might be

4:40

useful to do a show today or as

4:42

a Sunday perspective on sort of your five

4:44

important cloud capabilities that you're going to need

4:46

in order to succeed in the AI world.

4:49

So we'll dig into that right after the

4:51

break. And we're back. And as I

4:53

mentioned at the top of the show, we're going to

4:55

dig into kind of a listicle type

4:58

of show today, but, you know, five

5:00

important cloud capabilities that I think you

5:02

really need to be looking at. Do

5:04

you do these well before you

5:07

start taking on a bunch of AI

5:09

projects? And this is one of these

5:11

scenarios in which what we're seeing

5:13

with AI today is so

5:16

many companies are excited about it.

5:18

You've got a lot of executives who are

5:20

starting to put funding into projects,

5:23

things that maybe have potential, things that are

5:25

experimental. They're not exactly sure what's gonna happen.

5:27

And early on you're going to see

5:29

a lot of kind of one-off projects,

5:32

a lot of, you know, kind of shadow

5:34

AI, if you will. And it won't

5:36

necessarily look exactly like it will a couple

5:38

of years from now, once you've figured out,

5:41

you know, can I make this technology work?

5:43

Can we accomplish some things that

5:45

change how we work with our customers, or

5:47

make us more productive, or reduce costs in

5:50

some ways? It's some of those big three or

5:52

four things. But over time we're going to

5:54

see more and more companies that say, okay,

5:56

I want to take advantage of AI, but

5:58

I want to make sure that I'm doing

6:00

it in an efficient way because making a

6:03

mistake in cloud, you could have gotten a

6:05

decent size bill. Maybe you spun up an

6:07

instance in the wrong region or

6:09

you forgot about something or you didn't

6:12

realize that your architecture didn't scale a certain

6:14

way. You got one of those Corey

6:17

Quinn unexpected bills that you kind of go

6:19

fix. With AI, you could run

6:22

up a mistake. Mistake might

6:24

cost you $100,000, $200,000, a million dollars,

6:26

multiple millions of dollars just

6:28

because the cost of GPUs and training cycles

6:31

and loading large amounts of data into the system can

6:33

be very, very expensive. If you don't know what you're

6:35

doing, those experiments

6:37

and those mistakes could be

6:39

an order of magnitude more expensive. We're

6:42

going to see over time as companies

6:44

begin to figure this stuff out, they're

6:46

going to want to maybe not make

6:48

some of the mistakes that we've seen.

6:50

I don't want to say mistakes, but just sort

6:52

of unexpected situations happen

6:55

that maybe they saw with cloud computing when they really

6:58

didn't, like I said, didn't have their house in order,

7:00

didn't necessarily know how to do IT well. Let me

7:02

kind of go through these. They're not necessarily in any

7:04

particular order in terms of one through five, but

7:07

I think there are things that once

7:10

you begin to get past the, hey, we

7:12

played around with some system, with some model,

7:14

with some capability, and it started to

7:16

work, that you're going to want

7:18

to make sure you have in place in order

7:21

to be able to bring multiple teams in, to

7:23

learn from multiple projects, to do

7:25

things at scale and to do things cost effectively.

7:28

First thing is, I'm kind of

7:30

calling automate everything. You've got to ultimately

7:33

think about automation, not just

7:35

as a nice to have, not just as sort of

7:37

an add-on thing once you figure

7:39

out how to do something, but you almost have

7:41

to think about automation as mission

7:43

critical because the ability to scale

7:46

out infrastructure is needed for GPUs

7:50

and for networking and storage

7:52

and so forth. It's going to be really important. It's

7:55

not that the automation is all that much different than it

7:57

was in the past, but I think being

8:00

not just a day one thing where we're

8:02

thinking about deploying or a day one and

8:05

a half or day two thing where maybe

8:07

deploying a patch or something, but how do

8:09

we think about automation as being much more

8:11

mission critical? Can it respond to real time

8:13

events and then kick off things? So it

8:15

can be more event driven and stuff like

8:17

that. The second thing

8:19

is you really wanna think about

8:22

building the right abstractions and flexibilities.

8:24

So let me give you a couple of examples. One

8:27

of the things that your development teams, your data science

8:29

teams are going to ask for because

8:31

they did the exact same thing in the cloud

8:33

era is I want my own machines. I want

8:35

my own server to be able to work on

8:37

this. And they're going to ask for their own

8:40

servers with their own GPUs. And

8:42

while that makes life super simple for

8:44

them because hey, these are my toys,

8:47

this is my playground. I don't have

8:49

to worry about anybody keeping up

8:51

with how long have I been running this and

8:53

am I effective with it? That's

8:55

incredibly expensive. I mean, imagine you

8:57

have several hundred, for example, data

9:00

scientists working on things. You're

9:02

not necessarily going to be able to give them

9:04

their own sandbox, their own playground, their own servers

9:06

with GPUs. Those things are tens

9:09

and hundreds of thousands of dollars. So you

9:11

wanna start figuring out how do I do

9:13

the right kind of abstractions? So things like

9:15

can we do GPU sharing? Can we do

9:18

sort of slicing and so forth of GPUs? Can

9:20

I do this on a time basis? What can

9:22

we do in terms of development

9:25

tools? How do I provide developers with, whether

9:28

you're using tools like Backstage or other things

9:30

to give them self-service

9:32

development environments, kind of build

9:34

the platform engineering types of

9:37

abstractions such that your data

9:39

scientists can do the things that they wanna do, that

9:41

they need to do. You

9:43

can efficiently use the underlying resources and

9:45

infrastructure and so forth. And

9:47

you can put the right kind of guardrails

9:50

where they make sense, right? Right now you may not

9:52

need a lot of guardrails. You're just trying to go

9:54

fast. But over time, you are going to want certain

9:56

things in place, certain abstractions in place that

9:59

help make sure that developers don't create

10:01

a multi-million dollar problem for you,

10:04

or expose data they weren't supposed

10:06

to expose, or overwhelm

10:08

a cluster because you

10:11

were asked to do a training run in eight hours

10:13

and you don't have enough GPUs to do something like

10:15

that. So build those right abstractions

10:17

and flexibilities in, and think about them

10:20

early in the process because as you

10:22

scale, you don't want the

10:24

project, when it starts to get momentum in your

10:26

company, to just fall down. Now,

10:28

third thing is to think about,

10:31

as you've built some of those abstractions, you've built some

10:33

of that flexibility, are you

10:35

leveraging platforms, right? The platforms that

10:37

you're building upon, platforms

10:40

you're testing upon, deploying upon, are

10:42

you building them to bring together

10:44

the data science team, the MLOps

10:46

team, and the app dev team?

10:49

Because ultimately, the data scientist, or

10:51

even data scientist plus MLOps, kind

10:53

of can't operate in a vacuum,

10:55

right? So this is, again, this is sort of

10:57

the evolution of DevOps, right? Can

10:59

I bring the developers together and

11:01

the operations teams together in order to be

11:03

able to say, look, the ultimate goal is let's

11:06

allow the developers to build

11:08

business logic, to bring and add value to the

11:11

business, and not have to

11:13

worry so much about underlying security

11:15

and deployments and networking

11:17

and storage configurations. The

11:20

same sort of thing is going to happen where I've got

11:22

to bring together the teams that deal with data and models,

11:25

and be able to do training, and do

11:27

iteration, and redeployments, and all that sort of

11:29

stuff, and then bring them together with

11:31

the application teams, who then are probably

11:33

going to be tied at the hip with the ops team. So this

11:36

is really going to be sort of like data

11:38

science, DevOps, if you will.

11:40

And so you want to be thinking

11:42

about, do the platforms or

11:45

the abstractions that I'm building, do

11:47

they create walls between these groups, or

11:49

do they create sort of free-flowing collaborative

11:52

workspaces such that once

11:56

those models are built and they want to expose

11:58

an API, that it's easy to expose that

12:00

API to the application team, right? They

12:02

don't have to fight over how

12:04

security is deployed, you know, within

12:06

that environment. Do we have enough resources to make

12:09

this work? Is, am I going to be able

12:11

to get the "it worked on my laptop" scenario

12:13

to work in dev and test or production?

12:15

How much do I have to change? So you

12:18

want to be thinking, those teams

12:20

have to come together, right? They're

12:22

going to have to work together. They're going to have to

12:24

work collaboratively. How do I

12:26

build platforms or take advantage of platforms

12:29

that allow them to work more cohesively

12:31

together? Fourth thing on the

12:33

list, and this is going to sound obvious, but

12:35

given the fact that really over

12:37

the last couple of years since COVID has been

12:39

sort of winding down and

12:42

we've looked at the numbers from a number of

12:44

the cloud providers and they've been, you know, slowed

12:46

down more than they had. So

12:48

much of what they were talking about was

12:50

that their customers were having to go back

12:52

and right size things. They basically figured out

12:54

they had no idea how to do spending

12:56

in the cloud, i.e. they had

12:58

no cost controls, they had no FinOps, they

13:00

had no idea how to size out

13:03

projects and so forth. And they were just throwing things

13:05

in the cloud. Granted part of that was

13:07

driven by COVID, but part of it was just driven by,

13:10

hey, you know, it's really easy to spin

13:12

stuff up, IT's not getting in my way. And

13:14

then they started realizing, oh, wait a second, I'm

13:17

paying, you know, huge costs for doing this, right?

13:19

The cloud isn't necessarily cheap, especially in production when

13:21

I've got to run it, you

13:23

know, on higher performance machines and I want to be

13:26

able to have DR and backup and all the sorts

13:28

of things I need for production. So

13:30

as I mentioned earlier, you know, as we

13:32

get into AI, the cost of AI, you

13:35

know, the cost of entry is not cheap,

13:37

the cost of mistakes is not cheap, but

13:39

just the cost of doing day-to-day

13:41

operations, you know, whether

13:43

it's deployments, whether it's testing, whether it's

13:46

inferencing, whether it's fine tuning or RAG

13:48

or whatever models you're using and building,

13:52

you know, you want to start thinking earlier

13:54

on, do we have some mechanism in

13:57

place and some visibility in place so we understand

13:59

the cost of it? Because at the end of the day, AI is

14:02

super powerful and it's probably the

14:04

first generation of applications

14:06

that are going to come along that

14:09

you're going to be able to, I don't

14:11

want to say automatically, but more or less be

14:13

able to go, this is the goal of what

14:15

this is going to be and

14:17

I can associate costs with those

14:19

goals. So if my goal is to be like,

14:21

I want to make my developers 50% more productive,

14:24

what would 50% more productive look like in terms

14:26

of like business output? And then I

14:28

can kind of measure that against, so, what's it

14:30

going to cost for me to get them to

14:32

say 50% more productive, right? Or whatever the measure

14:34

or the metric you want is, I'm trying to

14:36

reduce cycle times of doing

14:39

analysis using computer vision to do

14:43

preventative maintenance on things or

14:45

whatever it is. We

14:47

want to do recommendations and we know if we do recommendations,

14:49

we should get 20% uplift

14:51

on the sales and our retail

14:53

channels and therefore our sales should

14:55

look like this. Okay, right.
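
The "associate costs with those goals" arithmetic described here can be sketched in a few lines. This is a minimal illustration only, not something from the episode: the function name and every figure other than the 20% uplift (baseline sales, margin, AI spend) are hypothetical placeholders, and whole-number percentages are used so the integer math stays exact.

```python
def net_gain_from_uplift(baseline_sales, uplift_pct, margin_pct, ai_cost):
    """Compare the margin a sales uplift would add against what the AI work costs.

    All inputs are whole numbers; percentages are given as integers (20 means 20%).
    Returns (extra_sales, extra_margin, net_gain).
    """
    extra_sales = baseline_sales * uplift_pct // 100   # added revenue from the uplift
    extra_margin = extra_sales * margin_pct // 100     # portion of that revenue kept as margin
    net_gain = extra_margin - ai_cost                  # what is left after paying for the AI project
    return extra_sales, extra_margin, net_gain


# Hypothetical figures: 10M in channel sales, the 20% uplift goal,
# an assumed 30% margin, and an assumed 400k of AI spend.
extra_sales, extra_margin, net_gain = net_gain_from_uplift(
    baseline_sales=10_000_000, uplift_pct=20, margin_pct=30, ai_cost=400_000
)
print(extra_sales, extra_margin, net_gain)  # prints: 2000000 600000 200000
```

With numbers like these, the goal and its cost sit side by side; a negative net gain would be the signal to revisit the project before scaling it.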

14:57

Whereas if I'm just building like a

14:59

new Java application or I'm building some

15:01

new, I don't know, simpler way to

15:03

keep track of customer projects

15:05

or something like being able

15:07

to figure out the ROI of that is sort

15:09

of complicated because you're like, well, I mean,

15:12

I guess it'd be better, but do we really need

15:14

it? That, you know, that kind of math. AI, I

15:16

think is going to drive much more, I don't want

15:18

to say simpler, but simpler, simpler

15:20

ways of saying this is the goal. This

15:22

is the outcome from a business perspective. This

15:24

is what it's going to cost. And

15:27

so you want to have sort of

15:29

financial visibility at a minimum. And

15:32

then, you know, controls to help you understand

15:34

like, okay, when they get, you know,

15:36

we get way outside of baseline or when the bill shows

15:38

up and we're really not sure what's going on with it.

15:41

What are we going to do at that point? Right.
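
One simple form of the "way outside of baseline" control mentioned above is a trailing-average check on daily spend. The sketch below is an assumption-laden illustration rather than a recommended FinOps tool: the function name, the 7-day window, the 2x threshold, and the spend figures are all hypothetical, and a real setup would read the numbers from a billing export.

```python
from statistics import mean


def days_over_baseline(daily_spend, window=7, factor=2.0):
    """Return the indices of days whose spend exceeds `factor` times the
    average of the preceding `window` days (the rolling baseline)."""
    flagged = []
    for day in range(window, len(daily_spend)):
        baseline = mean(daily_spend[day - window:day])  # average of the prior window
        if daily_spend[day] > factor * baseline:
            flagged.append(day)
    return flagged


# Hypothetical daily GPU spend in dollars: a steady week, then one runaway day.
spend = [100, 100, 100, 100, 100, 100, 100, 110, 450, 120]
print(days_over_baseline(spend))  # prints: [8]
```

The point of a check like this is visibility first: it flags the day the spend jumped, which gives the team something concrete to investigate before the bill shows up.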

15:44

Now, so we've hit on the first four,

15:46

the first four are very much driven around

15:48

technology. So automation, make

15:51

automation mission critical, because again,

15:53

the faster you're trying to drive results with AI,

15:55

the more you're going to need sort of everything in

15:57

the system automated. Second, build the right

15:59

abstractions, make sure you can do the right

16:01

kind of sharing. You can give self-service access, all the

16:03

sort of things that we've been driving

16:06

for a number of years around sort

16:08

of DevOps meets platform

16:11

engineering type of things. Third, leverage

16:13

platforms to help bring your teams together, make it

16:15

easier for them to work together. We

16:18

kind of got some of those things

16:20

right in the DevOps world. You really

16:22

want to get those things right in

16:24

the sort of data scientist meets MLOps

16:26

meets app dev world, right? Fourth

16:28

thing, obviously, make sure your costs aren't getting out

16:30

of control, you've got visibility. Now

16:33

the last thing that I'll say, and we talked

16:35

about this a lot, and we saw some early

16:37

of these things

16:39

happen, and then we saw the typical

16:42

kind of fragmentation happen, is

16:44

when you see success, because

16:48

so much of AI has the ability

16:50

to be really, really interesting, and at

16:52

the same time, it has the ability that if you

16:54

get it wrong to be problematic,

16:58

right? And problematic could be very

17:00

expensive, problematic could be Gen AI

17:03

hallucinations that potentially cost your company

17:05

money in lawsuits. We've started to

17:07

see some of that start to

17:09

happen. We've seen misuse

17:11

of how do I go about using public

17:14

models, for example, and you dump your company

17:16

data out into a public model, and then

17:18

you're wondering why your competitors

17:20

were able to kind of get wind

17:22

of it. So we

17:24

wanna make sure that the successes that happen

17:27

are well publicized, at least internally, right? So

17:29

you wanna think about how do

17:32

I take advantage of a situation in which a

17:34

team figured something out, they did something

17:36

well, they were able to benefit the business,

17:39

and given the fact that the data scientists

17:41

and MLOps people and just kind of all

17:43

of the skill sets in this domain are

17:46

pretty rare these days, right? There's

17:48

just not tons of them floating around. There's just

17:50

not a lot of people that have more

17:53

than six months experience or a year experience

17:55

or two years experience in

17:58

some of these new technologies. You want

18:00

to be thinking as a company, how do I,

18:02

you know, when we do find success,

18:04

how do we socialize it? How do we, you

18:07

know, document the best practices of it? How do

18:09

we encourage those teams to take those

18:11

one or two extra steps, you

18:14

know, to make sure that what they

18:16

did isn't a snowflake, that it can,

18:18

there can be some amount of learning

18:20

or reuse, or, you

18:22

know, one plus one equals three kind of

18:24

scalability that can be helpful throughout the rest

18:27

of the company. And I know in some

18:29

companies, you know, that won't necessarily

18:31

happen because they're kind of siloed or Chinese firewalled

18:33

off. In some companies, you've got people who

18:35

are like, I'm not giving up my secrets,

18:37

because that's going to help me get promoted

18:39

versus somebody else getting promoted. But

18:42

I guess my guidance is if you are a

18:44

leadership organization, and you're funding these projects, you do

18:46

want to think about well, what will be the

18:49

motivation, the incentive structure that I need in order

18:51

to sort of be able to do that?

18:53

Because again, at the end of the day, everybody's

18:57

got an idea of what would be a cool

18:59

thing to do with AI, there will be more

19:01

napkins floating around that have ideas and whiteboards

19:03

written and so forth. But at

19:05

the end of the day, data scientists aren't

19:07

growing on trees yet. They're not a dime

19:09

a dozen. They're not a commodity. They're still

19:11

a very expensive resource. They're hard to find.

19:14

And you know, people

19:16

that know how to do this stuff well, and can

19:18

do it in, you know, weeks

19:20

instead of months, months instead of years, you

19:22

know, are going to be difficult to find

19:25

in many cases, you know, fairly expensive. So

19:27

you want to figure out, you know, how

19:29

do we socialize successes? How do we incentivize people to

19:31

take that next step and do that? And

19:33

that's sort of my last item here

19:35

on my list of five.

19:38

So automation, abstractions, platforms

19:40

for collaboration, understanding

19:42

cost, and then socializing success, or even socializing

19:45

failures, right, so that you don't make the

19:47

same mistakes twice. So a little

19:49

bit of technology in there, a little bit

19:51

of collaboration, a little bit of,

19:53

you know, kind of people plus process

19:55

plus the technology, similar to what we

19:57

had when we first got started with cloud. Some

20:00

people heeded that advice and they did it well.

20:03

Others kind of wanted to take their old models

20:05

and just sort of jam them into the new

20:07

world. We found in many cases, those didn't work.

20:10

Hopefully you're learning from the last five

20:12

years, 10 years of what

20:14

happens when we have a technology transition that if

20:16

you don't get the fundamentals right and you don't

20:20

kind of invest in those things, the odds are you

20:22

may be coming back a year, two years from now

20:24

and people go like, hey, how's it going? And you're

20:26

like, well, I got a humongous bill. I

20:29

got a humongous amount of costs that we've spent and not

20:31

as many outputs as we really wanted to or

20:33

successes that we want. So anyways, hopefully you kind

20:35

of heed the advice that if you want to

20:37

do AI well, you got to do

20:40

some cloud fundamentals and

20:42

those cloud fundamentals don't necessarily have

20:44

to be public cloud fundamentals, but

20:47

they want to be the way

20:49

of doing things in a cloud

20:51

way. Self-service, API driven, scalable, take

20:54

advantage of self-service, having teams collaborate

20:56

together, thinking about workflows, thinking about

20:58

business value, all those things

21:00

kind of coming together. So anyways, with that, I'll wrap

21:02

it up. Hope everybody has

21:05

enjoyed this Sunday perspective. Again, hopefully your March

21:07

Madness bracket is still intact and you're doing

21:09

well, or at least you're enjoying the games.

21:11

Hopefully you're enjoying the weather, wherever you are,

21:13

hopefully it's getting a little bit warmer here.

21:15

It's supposed to be sunny and warm on

21:18

the 21st and so forth and going

21:20

forward. So anyways, happy spring to everybody.

21:22

Happy Sunday perspective. Thank you all for

21:24

listening. Thanks for telling a friend. Thanks

21:27

for rating the show, giving us feedback. We'd love

21:29

five stars if you get a chance to, if you enjoy

21:32

the show, we'd love that. Helps us grow the show. It

21:34

helps more people find it through the way that they find

21:36

podcasts. So with that, I'll wrap it up. And we'll talk

21:38

to you next week. Thank you for listening to

21:40

the Cloudcast. Please visit thecloudcast.net

21:43

to find more shows, show notes,

21:46

videos, and everything social media.
