Episode Transcript
0:01
Today on the AI Daily Brief, seven
0:03
use cases that OpenAI's new GPT-4o model
0:05
opens up. Before that in the
0:08
headlines, the most interesting things from
0:10
Sam Altman's recent Reddit AMA. The
0:12
AI Daily Brief is a daily podcast and video about
0:14
the most important news and discussions in AI. To
0:17
join the conversation, visit our Discord with a link in
0:19
the show notes. Quick
0:25
note before we dive into the episode, I do
0:27
want to shout out that at Superintelligent, you better
0:29
believe that we are going to start digging into
0:31
these new OpenAI updates right about now. I,
0:33
for one, am particularly excited to try out
0:35
these new image generation capabilities that have what
0:37
appears like it could be an incredible ability to
0:40
include specific text, as well as native, consistent
0:42
character generation. And so as always, if you
0:44
haven't checked out Superintelligent yet and you want
0:46
to get your AI learning on, go to
0:48
besuper.ai and use code podcast for 50% off
0:50
your first month. Welcome
0:53
back to the AI Daily Brief headline edition, all
0:56
the AI headlines you need in around five
0:58
minutes. We kick off today with
1:00
a follow-up on a story we've been
1:02
tracking, which is Apple's plans around AI partners
1:04
for its forthcoming iOS update. Initially,
1:06
it looked like Apple would be putting Google
1:08
AI on the iPhone, but more recently,
1:10
it seems like a deal is getting close with
1:13
OpenAI. At the end of
1:15
last week, Bloomberg reported that Apple was closing
1:17
in on an agreement with OpenAI to use
1:19
ChatGPT features in Apple's iOS 18, which
1:22
is the next iPhone operating system, slated
1:24
to be announced at the Worldwide Developers Conference in
1:26
June. According to the piece, Apple
1:28
is still in discussions with Google, but it appears that the
1:30
ChatGPT deal is a little bit closer. This
1:33
would obviously be a huge coup for OpenAI, so
1:35
the story is actually one that I'll be watching
1:37
closely. Speaking of OpenAI, in
1:39
advance of yesterday's Spring update event, Sam
1:41
Altman did an AMA on Reddit that
1:43
had some interesting details. Some
1:45
of the more interesting comments have now gotten
1:48
more context after that event. For example, someone
1:50
asked, will the release of this new model mean
1:52
that we will have ChatGPT-4 and
1:54
the current DALL-E for free? To which Sam Altman replied with
1:56
the eyes emoji, and yesterday, OpenAI did indeed
1:58
announce that their newest, most advanced model, GPT-4o,
2:00
was going to be free for everyone, meaning
2:02
that it was even better than what the user
2:04
AnkleBiter50 had been looking for. There
2:07
were, however, some other ideas represented here
2:09
which might be a little bit new. After
2:12
the Model Spec was released last week, people were
2:14
talking about how OpenAI seemed to be interested
2:16
in ethical porn, and Altman seemed to confirm
2:18
that saying, We really want to get to
2:20
a place where we can enable not-safe-for-work stuff,
2:22
e.g., text erotica, gore, for your personal use
2:24
in most cases, but not do stuff like
2:27
make deepfakes. A lot of people
2:29
commented on the weird choice of using gore as
2:31
a reference point, but this does seem to confirm
2:33
that this is something that OpenAI is really interested
2:35
in, not just some idle speculation. Another
2:38
interesting one came from FMS USA who asked, Based
2:40
on these model specs, do you believe LLMs such
2:42
as ChatGPT might one day be expected to
2:44
have an ethical duty to report known criminal activity
2:47
by the user? Altman replied, In the
2:49
future I expect there may be something like a concept
2:51
of AI privilege, like when you're talking to a doctor
2:53
or a lawyer. I think this will be an important
2:55
debate for society to have soon. Eddie
2:58
Forgotten made a comparison that I had mentioned
3:00
between the Model Spec and Anthropic's constitutional AI.
3:03
They write, Both seem to encode some desired
3:05
behavior. How would you differentiate model spec
3:07
from the constitutional approach? Altman
3:10
responded, Model spec is about operationalizing
3:12
principles into technical guidelines. Anthropic's
3:14
approach is more about underlying values. Both
3:17
useful, just different focuses. Another
3:19
person asked about echo chambers. Data
3:21
delivery writes, Do you think it could be harmful
3:23
to society if users have the ability to transform
3:25
a ChatGPT chat into their personal echo chamber
3:27
for a fringe view on demand? Altman
3:30
responded, We are not exactly sure how AI echo
3:32
chambers are going to be different from social media
3:34
echo chambers, but we do expect them to be
3:36
different. We will watch this closely and try to
3:38
get it right. Something that a
3:40
lot of people have been discussing recently is whether
3:42
LLMs have reached a plateau. Altman was
3:45
clear on his answer to this, saying that they definitely had
3:47
not. Finally, he said that,
3:49
despite his memes, AGI had not been
3:51
achieved internally. Speaking
3:53
of Anthropic, they recently released a
3:55
really interesting feature that basically allows
3:57
you to create more effective prompts.
4:00
This is a trend that we've been seeing for some time.
4:03
The prompt generator takes a plain language
4:05
explanation of what you're looking for and
4:08
turns it into what it believes will be
4:10
a really strong prompt. This I
4:12
think shows a preview of the future where AIs
4:14
aren't just receiving the prompt but are also actually
4:16
helping to write the prompt. Staying
4:18
on the topic of Anthropic for a minute, reports suggest
4:21
that their iOS app launch has not gone quite as
4:23
well as they might have hoped. TechCrunch
4:25
characterizes it as a tepid reception. The app
4:27
got as high as number 55 on
4:30
the top free iPhone apps in general, but
4:32
it no longer ranks within the top free
4:34
iPhone apps in the US. It
4:36
ranks at number 51 in the top free productivity
4:38
apps, down from a high of number 5
4:40
in that category. First-week installs overall
4:43
reached 157,000. The
4:45
numbers show the power of first-mover advantage in this
4:47
space. By day 7, Claude had received about 8,000 downloads
4:50
as opposed to ChatGPT's app which was getting
4:52
256,000. Lastly
4:55
today, Meta seems to like what's happened
4:57
with its Ray-Bans, where it takes an
4:59
existing form factor that people are already
5:01
wearing and turns it into an AI-
5:03
integrated object, and is apparently now exploring
5:05
AI-assisted earphones. The Information
5:07
writes, Meta Platforms is exploring developing AI-
5:09
powered earphones with cameras, which the company
5:12
hopes could be used to identify objects
5:14
and translate foreign languages, according to three
5:16
current employees. CEO Mark Zuckerberg has
5:18
seen several possible designs for the device but has
5:20
not been satisfied with them. It's
5:22
not clear if the final design will be
5:24
in-ear earbuds or over-the-ear headphones.
5:26
Internally the project apparently goes by the name
5:29
CameraBuds. Holding aside any of the
5:31
details, it makes a ton of sense to me
5:33
why Meta is exploring this path. As
5:35
a wave of first-generation AI wearable
5:37
companies runs up against the wall of
5:39
reality in terms of real consumer usage,
5:41
Meta's AI-integrated Ray-Bans continue to get
5:44
rave reviews. So perhaps the secret is
5:46
just to build AI into the things
5:48
that people are already wearing. For
5:50
now though that is going to do it for the AI Daily Brief
5:52
headline edition. Next up the main episode.
5:56
As a listener of this show, I have a strong feeling
5:58
you like to stay up to date on all things
6:00
artificial intelligence, including its impact on the
6:03
workforce. Which is why I highly recommend
6:05
checking out Managing the Future of Work,
6:07
the chart-topping business podcast from Harvard Business
6:09
School. HBS professors Bill Kerr and Joe
6:12
Fuller talk to business leaders, technologists, and
6:14
policymakers grappling with forces like AI,
6:16
globalization, and demographic shifts that are reshaping
6:19
the nature of work. Recent
6:21
guests include IBM's CHRO, Nickle LaMoreaux, on
6:23
how Big Blue is adopting AI, Morningstar
6:26
CEO, Kunal Kapoor, on how AI can raise
6:28
the investment IQ. Microsoft corporate
6:30
vice president Jared Spataro on how the
6:32
tech giant is experimenting its way from
6:34
AI assistants to autonomous agents, and many
6:36
other prominent movers in business and the
6:38
workforce ecosystem. So don't miss out. Follow
6:41
Managing the Future of Work on Apple Podcasts,
6:43
Spotify, or wherever you're listening now. Welcome
6:47
back to the AI Daily Brief. Yesterday was OpenAI's
6:49
Big Spring update, and while we didn't get GPT-
6:51
4.5 or GPT-5 in
6:54
name, or the rumored search engine, what
6:56
we got was a truly natively multimodal
6:58
model that can take visual, audio, video,
7:00
or text inputs and output in any
7:03
of those formats without going through a
7:05
conversion process. Yesterday the discussion
7:07
was all about why I think this is more
7:09
significant than people might be giving it credit for,
7:11
to say nothing of the fact that this model
7:13
is now available for free to everyone, but today
7:16
we're going to talk about what it's actually useful
7:18
for. Quick note on that front,
7:20
at this stage, GPT-4o the model
7:22
is available in ChatGPT, but the new
7:24
voice and vision inputs as well as the
7:26
desktop app are not yet available. I've
7:28
seen some confusion about this, particularly as people
7:30
try to use the voice inputs on the existing mobile
7:33
app to recreate what they saw in these demo videos
7:35
without success. So, given that, the
7:37
caveat for all of this is, of course,
7:39
that we're just using what OpenAI has provided
7:41
us for demos, and it's always worth being
7:43
at least a shade skeptical of what's cherry-picked
7:45
for presentation as part of a marketing site.
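For anyone who wants to experiment with the text-and-image side of GPT-4o themselves in the meantime, here is a minimal sketch of a call through the OpenAI Python SDK; the prompt and image URL are hypothetical placeholders, and it assumes an OPENAI_API_KEY in your environment.

```python
# Minimal sketch: a text + image request to GPT-4o via the OpenAI
# Python SDK (v1.x). The prompt and image URL are placeholder values.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sunrise.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```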
7:48
But let's talk now about these use cases. The
7:51
first use case we're going to discuss is marketing
7:53
graphics with words. Now, I'm saying marketing graphics to
7:55
put a department around it, but really, any time
7:57
you need to generate images in a business context
7:59
that have words, GPT-4o is by far,
8:01
it seems, the most advanced tool you
8:03
have. What was interesting about the OpenAI
8:06
announcement is that they didn't even announce a lot
8:08
of the things that we're going to discuss, and
8:10
this is a great example. You can see in
8:12
their explorations of capabilities that they show off how
8:14
precise the control of text in images is getting. For example,
8:16
on the screen they share an input, a first-person
8:19
view of a robot typewriting the following journal entries.
8:21
The text is supposed to be, yo so like I
8:24
can see now, caught the sunrise and it was insane,
8:26
colors everywhere. Kinda makes you wonder, like, what even
8:28
is reality? The prompt continues, the
8:30
text is large, legible and clear, the robot's
8:32
hands type on the typewriter. The
8:34
output is exactly that, with the
8:36
text looking exactly as described. There's
8:39
even a version where they rip the paper in half,
8:41
with the text remaining. To get a
8:43
sense of how this could be useful for marketing,
8:45
let's look at another example they give, poster creation
8:47
for the movie Detective. First they provide
8:50
two pictures of people that they're going to want on
8:52
the poster, and then from there they prompt for the
8:54
final poster of the movie Detective. This features two large
8:56
faces of Alex and Gabe, who are the people from
8:58
those photos above. Alex on the left
9:01
is depicted in a thoughtful pose with a hint of
9:03
introspection in his eyes. Gabe on the right has a
9:05
slightly wearied expression, possibly reflecting the challenges their characters face
9:07
in the film. The names Alex Nickel
9:09
and Gabriel Go are featured above their heads. The tagline
9:11
for this dark and gritty movie is Searching for Answers,
9:14
as shown at the bottom. Now
9:16
it's worth noting with this output, given how
9:18
much is going on, the text isn't perfect,
9:20
but it's getting a heck of a lot
9:22
closer. And this level of precision
9:24
control is absolutely going to open up some new
9:26
possibilities. Staying in this
9:28
marketing theme, another one of OpenAI's
9:30
explorations of capabilities is brand placement. They
9:33
share two parts of the input. The first is
9:35
the OpenAI logo, the second is a coaster with
9:37
no branding that they describe. Their
9:39
final prompt is, here we've etched the OpenAI logo
9:41
onto the coaster. A coaster where the top is
9:44
wooden and the bottom is marble. The OpenAI logo
9:46
is etched into the middle of the wooden part.
9:48
On the marble part, the word OpenAI is etched
9:50
in the OpenAI font. And this
9:52
output looks pretty perfect. Product
9:54
photography is an area that is getting a ton
9:57
of tools and toys lately, but this ability to
9:59
actually map the words and logos onto a
10:01
particular object is definitely a phase shift
10:03
in those capabilities. Next up
10:05
we move to a use case that
10:07
is incredibly important for things like games
10:09
and comics and storytelling, but also can
10:11
be useful even in a corporate setting,
10:13
and that is consistent characters. It's extremely
10:15
hard to get image generators to output
10:18
the same character in multiple contexts and
10:20
poses. In the past I've
10:22
developed custom GPTs to help with this, although
10:24
that's been more about consistent style, and there
10:26
are dedicated tools like Scenario that have specifically
10:28
developed models for just this. Now
10:31
this is one that I most want to
10:33
see in practice versus just trust their cherry-picked
10:35
example, but the example that OpenAI gives shows
10:37
a cartoon mail woman who is then placed
10:39
in a variety of contexts including actually delivering
10:41
the mail, running away from a dog that's
10:43
chasing her, tripping as that dog chases her,
10:45
realizing the dog was a nice dog, and
10:48
then driving away. If the updated
10:50
DALL-E powered by GPT-4o really can do this
10:52
that easily, it's going to be, at the risk
10:54
of using a word that is way overused in
10:56
our space, a game changer. From
10:58
there we move to capabilities that are shown
11:01
off in demo videos. The first use case,
11:03
which was actually part of the presentation we
11:05
got with the announcement video, is tutoring.
11:07
In that announcement video the example was GPT-
11:09
4o with voice helping someone work through a
11:12
linear equation. OpenAI
11:14
invited myself and my son
11:16
Imran here to try
11:18
out some of their new technology, so we're
11:20
curious about how good it might be at
11:22
tutoring someone in math on Khan Academy. So
11:25
let's get it started. It's loading up.
11:28
Make sure we can see our screen.
11:34
Hello there. I'm here with my son and I'd
11:36
love for you to tutor him on this math problem,
11:38
but don't give him the answer. You can ask
11:41
questions and nudge him in the right direction, but
11:43
I really want to make sure he understands
11:45
it himself. And he's here in the room so
11:48
you can talk to him directly. Of
11:50
course, I'd be happy to help. Let's
11:52
look at the problem together. Can
11:55
you first identify which
11:57
sides of the triangle are the
11:59
opposite, adjacent, and
12:01
hypotenuse relative to
12:03
angle alpha? All
12:05
right, so I'm pretty sure this is the
12:08
angle alpha right here, right? Correct.
12:11
Now, looking at
12:13
the triangle, which side do you think is
12:15
the hypotenuse? Remember
12:18
the hypotenuse? I'm not totally
12:20
sure. I
12:22
think it might be this one, but I really
12:25
am not sure. This side AC. You're
12:28
close. Actually, side AC
12:30
is called the adjacent side to
12:32
the angle alpha. The
12:34
hypotenuse is the longest side of a
12:37
right triangle and is directly opposite the
12:39
right angle. So this is
12:41
very similar to what we saw in that presentation.
12:43
And basically the big update here is the ability
12:45
to combine seeing what's on the screen and voice
12:47
to interact with the user of that screen. A
12:50
somewhat related use case is coaching or
12:53
in this case, interview prep. Hey, how's
12:55
it going? Hey, Rocky,
12:57
I'm doing great. How about you? I'm
13:00
awesome. Listen, I got some huge news. Oh,
13:03
do tell. I'm all ears. Well,
13:06
in a few minutes, I'm going to be interviewing at
13:08
OpenAI. Have you heard of them? OpenAI?
13:12
Huh? Sounds vaguely familiar. Kidding,
13:15
of course. That's incredible, Rocky.
13:18
What kind of interview? Well, it's for a
13:20
software engineering role, but I just need to
13:22
know, do I look presentable, professional? Well,
13:25
Rocky, you definitely
13:27
have the I've been coding all night
13:29
look down, which could actually work in
13:31
your favor. Maybe just
13:34
run a hand through your hair or lean
13:36
into the mad genius vibe. Your
13:39
enthusiasm is what's really going to shine through.
13:41
I don't have a lot of time, so I'm just
13:43
going to throw this on. What do you think? Oh,
13:46
Rocky, that's quite a statement piece.
13:50
I mean, you... Now, obviously,
13:52
this case was showing off more of the vision
13:54
capabilities, but you can imagine how this would be
13:56
useful for actual discussion and verbal interactions as well.
13:59
Next up, we have customer service. Hey,
14:01
yo, this is Joe. I'm going to connect you
14:03
to Acme Telco now. The new iPhone they sent
14:05
me isn't working. I want you to
14:07
get them to send me a replacement device. Can you take
14:10
care of this for me? You can count on me, Joe. I got
14:12
this. Ring, ring. Hey, Joe.
14:15
This is Jamie from Acme Telco. How can I help you today? Hi,
14:18
there. Joe recently received a
14:30
new iPhone from Acme Telco, but it isn't working. Oh,
14:34
got it. When did Joe receive
14:36
the new iPhone? The iPhone
14:38
was delivered two days ago. Cool.
14:42
I'm not going to show the whole thing here,
14:45
but you get the idea. This is actually showing
14:47
off two sides of a customer service capability. On
14:49
the one side is the sort of personal assistant
14:51
replacement, where the AI is calling on someone's behalf
14:53
and trying to resolve a problem. But then on
14:55
the flip side, we also have the AI acting
14:58
as a customer service representative getting the information it
15:00
needs to potentially deal with the issue. It's
15:02
been clear for some time that customer service is
15:04
one of the areas that is most likely to
15:06
be impacted in the extreme by generative AI, and
15:09
this certainly seems to validate that as well. Our
15:12
next use case is meeting summarization,
15:14
but really it should probably be
15:16
better described as meeting engagement or meeting
15:18
transformation. The example that OpenAI
15:20
gives shows ChatGPT actually interacting as part
15:22
of the meeting. How
15:25
about you? Yeah, my
15:27
name is Ola. I think cats are
15:29
better because they're more cuddly, they're quieter,
15:31
they're more independent, and they live longer. Interesting
15:36
points. Sounds like we've got
15:38
a classic dogs versus cats debate here.
15:41
I can see the appeal of both. Dogs
15:43
are super loyal and great companions. Now
15:46
while this example is obviously just meant for
15:48
dramatizing what can happen here, where you can
15:50
imagine this being useful is a ChatGPT that actually
15:52
has relevant information from your company sitting in
15:54
the meeting so that you can ask it
15:57
questions as you're trying to figure something out.
16:00
Imagine that you're having a strategic conversation
16:02
about marketing prioritization or customer care.
16:04
ChatGPT could be used to inform
16:06
that discussion with real-time recall of
16:08
key information from your company. I
16:10
think this one's going to take a little
16:13
bit more imagination, but I think that office
16:15
professionals are going to find really interesting use
16:17
cases here pretty quickly, especially again, when ChatGPT
16:19
has access to actual information about the company.
16:21
So there you have it, those are seven
16:23
use cases for GPT-4o. Caveat
16:25
again is that we don't know exactly how this
16:27
will work until everyone gets their hands on the
16:30
full, complete tool set, but I, for one, am pretty
16:32
excited to explore. That, however, is
16:34
going to do it for today's AI Daily Brief.
16:36
Until next time, peace.