Believing the AI Hype
Really?
Yes. I quit my job. I think we are on the verge of a new class of applications[1] that were not possible before. There is a new interface between machines and the environment: models can call functions and interact meaningfully with the real world. Combine that with the ability to reason[2], and you get a new primitive[3].
Let’s say I want to cut into the DoorDash pie[4] 😉 and create an application that uses a chatbot interface to help people find dinner[5]. I can hook it up to a bunch of sources[6] and get conversations like:
USER
what is the best dosa near me
ASSISTANT
For a delicious dosa experience in the Seattle area, I recommend House Of Chettinad located at 1525 13th Ave, Seattle, WA 98122. They specialize in South Indian cuisine and are known for their authentic dosa offerings.
You can find more information on their website: House Of Chettinad.
This might look like a normal ChatGPT conversation, but a few things happening under the hood make it interesting:
- The model got my location through a cache in my browser
- House of Chettinad is, in my opinion, better than the first 10 Google Maps results[7]
- The sources and tools[6] informed the model’s choice of restaurant
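To make “models can call into functions” concrete: you describe each tool to the model with a JSON schema, the model replies with a tool name plus JSON arguments, and your code runs the function and feeds the result back. Here is a minimal sketch of that dispatch step. Everything in it is hypothetical: the `search_restaurants` tool, its `dish`/`city` parameters, the sample data, and the hard-coded model reply standing in for a real API response. The schema layout only loosely mirrors the JSON-schema style tool-calling APIs use, so check your provider’s docs for the exact field names.

```python
import json

# Hypothetical sample data standing in for real sources (maps, reviews, search).
RESTAURANTS = [
    {"name": "House Of Chettinad", "city": "Seattle", "dishes": ["dosa"]},
    {"name": "Dosa House", "city": "Seattle", "dishes": ["dosa"]},
    {"name": "Example Taqueria", "city": "Seattle", "dishes": ["tacos"]},
]


def search_restaurants(dish: str, city: str) -> list[dict]:
    """Hypothetical tool: return restaurants in `city` that serve `dish`."""
    dish, city = dish.lower(), city.lower()
    return [r for r in RESTAURANTS if r["city"].lower() == city and dish in r["dishes"]]


# Tool description sent to the model; the arguments are described with JSON Schema.
TOOLS = [{
    "name": "search_restaurants",
    "description": "Find restaurants in a city that serve a given dish.",
    "parameters": {
        "type": "object",
        "properties": {
            "dish": {"type": "string", "description": "e.g. dosa"},
            "city": {"type": "string", "description": "user's city, e.g. from browser location"},
        },
        "required": ["dish", "city"],
    },
}]

# Map tool names the model may emit to the Python functions that implement them.
REGISTRY = {"search_restaurants": search_restaurants}


def dispatch(tool_call: dict) -> str:
    """Run the tool the model asked for; return a JSON string for the next model turn."""
    fn = REGISTRY.get(tool_call["name"])
    if fn is None:
        # Report the problem back instead of crashing, so the model can retry.
        return json.dumps({"error": f"unknown tool: {tool_call['name']}"})
    args = json.loads(tool_call["arguments"])  # in this sketch, arguments arrive as a JSON string
    return json.dumps(fn(**args))


# Hard-coded stand-in for the model's reply to "what is the best dosa near me".
model_tool_call = {
    "name": "search_restaurants",
    "arguments": json.dumps({"dish": "dosa", "city": "Seattle"}),
}
print(dispatch(model_tool_call))
```

Returning an error payload for unknown tools, rather than raising, gives the model a chance to correct itself on the next turn.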
Why am I making this post?
Honestly, this caught me off guard, but I don’t think that’s true for everyone. YC seems to be on top of it. Anthropic has a blog post on how to do it well. It seems like a lot of VC money is going into this[8]…
Similar things happened with crypto, so I took the same approach: I created a project in the space[9] to cut through the noise. Unlike with crypto, the projects I created absolutely shocked me with their potential value. Creating a project you can see being valuable is extremely galvanizing[10].
Why isn’t this tech in your life now?
Recency
@datnguyennx has a good summary of the timeline
OpenAI Timeline
- May 2023: jsonformer
- Nov 2023: OpenAI function calling/Assistants API
- May 2024: GPT-4o released
- May 2024: Claude tool calling released
- Sep 2024: o1 released
- Oct 2024: Claude 3.5 Sonnet v2 released
Check the Berkeley Function-Calling Leaderboard (BFCL) for performance metrics (init Aug 2024)[11]. Function calling only got really good recently; these primitives[3] were just mined.
Technical Challenges
Let’s say we have a great model that can do tool use almost perfectly. There are still plenty of technical challenges in integrating the model with its environment.
- Managing data: Dumping your entire API response into the model won’t give you optimal performance. Here are a couple of potential optimizations:
- Utilizing tools: How can you ensure the model uses all the information available to it?[14]
- Latency: Tool use can take a while. How do you reconcile that with your latency requirements?[15]
- Verification: Can you let the model work against a deterministic component?[16]
- Model strategy: Can you apply different sizes/types of models to different problems?[17]
- Observability: In your new system, where are the failure points?
And so many more…
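Latency and data management often meet in the same place: the code that executes the model’s tool calls. Below is a rough sketch, assuming the model asked for several independent lookups in a single turn. It runs them concurrently with `asyncio.gather`, so the wait is roughly the slowest call rather than the sum, and it trims each response to a whitelist of fields and swaps long UUIDs for short aliases before anything goes back into the context. `fetch_source`, the field names, and the 0.1 s delay are hypothetical stand-ins for real API calls, and sequential aliases like `r1` are just one option (random words would be another).

```python
import asyncio
import itertools
import uuid

# Hypothetical whitelist: the only fields the model actually needs to see.
KEEP_FIELDS = ("id", "name", "rating", "address")


async def fetch_source(source: str, dish: str) -> list[dict]:
    """Hypothetical stand-in for a slow external API call."""
    await asyncio.sleep(0.1)  # simulated network latency
    return [{
        "id": str(uuid.uuid4()),  # long opaque ID the model would otherwise have to copy
        "name": f"{dish.title()} spot from {source}",
        "rating": 4.5,
        "address": "123 Example St",
        "raw_html": "<div>...</div>" * 50,  # bulky field that only wastes tokens
    }]


class Aliaser:
    """Maps long IDs to short aliases (r1, r2, ...) and back again."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.to_alias: dict[str, str] = {}
        self.to_id: dict[str, str] = {}

    def alias(self, real_id: str) -> str:
        if real_id not in self.to_alias:
            short = f"r{next(self._counter)}"
            self.to_alias[real_id] = short
            self.to_id[short] = real_id
        return self.to_alias[real_id]


def trim(record: dict, aliaser: Aliaser) -> dict:
    """Keep only whitelisted fields and replace the real ID with a short alias."""
    slim = {k: record[k] for k in KEEP_FIELDS if k in record}
    if "id" in slim:
        slim["id"] = aliaser.alias(slim["id"])
    return slim


async def run_tool_calls(sources: list[str], dish: str, aliaser: Aliaser) -> list[dict]:
    """Run independent lookups concurrently, then compact the results for the model."""
    # gather() starts all calls at once, so the total wait is about the slowest call.
    batches = await asyncio.gather(*(fetch_source(s, dish) for s in sources))
    return [trim(record, aliaser) for batch in batches for record in batch]


aliaser = Aliaser()
results = asyncio.run(run_tool_calls(["maps", "reviews", "search"], "dosa", aliaser))
for r in results:
    print(r)
```

Keeping the `to_id` map lets you translate an alias the model mentions back into the real ID before calling the next API.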
Model Cost
Models cost a lot[18]. However, I believe inference costs (and latencies) are headed for a race to the bottom, driven by:
- AWS Inferentia and Google Trillium/TPU
- Many, many ASIC companies[19]
- Better inference software[20]
- Cheaper power through nuclear?[21]
- Model commoditization (DeepSeek)
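A back-of-the-envelope sketch of why bills add up during development: in a tool-use loop, every turn re-sends the whole conversation, including earlier tool results, so input tokens grow roughly quadratically with the number of turns. The prices and token counts below are made-up placeholders, not any provider’s actual rates.

```python
# Hypothetical prices in dollars per million tokens (placeholders, not real rates).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def agent_loop_cost(system_tokens: int, tool_result_tokens: int,
                    output_tokens_per_turn: int, turns: int) -> float:
    """Estimate the dollar cost of one conversation that makes one tool call per turn.

    Every turn re-sends the full history (system prompt plus all earlier model
    outputs and tool results), so the input side grows each turn.
    """
    total_in = 0
    total_out = 0
    history = system_tokens
    for _ in range(turns):
        total_in += history                  # the whole context is sent again
        total_out += output_tokens_per_turn  # the model writes a tool call or answer
        # The model's output and the tool's result are appended for the next turn.
        history += output_tokens_per_turn + tool_result_tokens
    return (total_in * INPUT_PRICE_PER_M + total_out * OUTPUT_PRICE_PER_M) / 1_000_000


raw = agent_loop_cost(system_tokens=2_000, tool_result_tokens=5_000,
                      output_tokens_per_turn=200, turns=5)
trimmed = agent_loop_cost(system_tokens=2_000, tool_result_tokens=500,
                          output_tokens_per_turn=200, turns=5)
print(f"raw tool results:     ${raw:.3f} per conversation")      # $0.201
print(f"trimmed tool results: ${trimmed:.3f} per conversation")  # $0.066
```

With these placeholder numbers, trimming tool results from 5,000 to 500 tokens cuts a five-turn conversation from about $0.20 to about $0.07, which is one more argument for the data-management work in the technical challenges.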
More?
Dude, these are new primitives[3]. Commercial apps are only now being built. LLMs make for great demos, and I can only assume there is a ton more last-mile engineering to get through.
What is next?
Write an app
⚠️ CREATE AN APP! ⚠️
Use any of the APIs (OpenAI, Bedrock, Anthropic, LangChain…) and spin up a quick app. Build around a startup idea you have conviction in, pick a SaaS company to replace (as Jared Friedman suggests), or automate a part of your life[22].
Going from nothing to something requires no deep ML knowledge, no crazy systems knowledge, and no insane techniques. My 10th-grade self, with basically just Arduino programming experience, could have built something of value.
The future
Commercial apps are only now being created and released. I truly believe the hype is warranted and that this tech will create great value. Cursor and Perplexity are already part of my life every day.
An exercise I have been doing is to compare this to the internet era. I believe they are comparable (and I’ll probably write a blog post exploring the comparison).
[1] Sierra, Camfer, and Cognition are just 3 of the many, many great companies.
[2] Test-time compute in models like o3 is one of the recent pieces of tech enabling this.
[3] Thanks to Tony Kam from Lilac Labs for the terminology.
[5] I did this with the app and had a great date at Kedai Makan with my girlfriend.
[6] I did this with the Google Maps API, the Yelp API, Exa, and a couple more pieces.
[7] Taste of Mumbai, Chennai Express Authentic Indian, Babai Dosa, The Bangalore Kitchen, Dosawaala, Sangamam, Spice Bliss, Madras Dosa Corner, Dosa House (I might actually choose Dosa House over House of Chettinad, but both are great).
[8] I don’t know how to find good sources on this, but this guy agrees with me.
[9] This is inspired by Moxie, who I recently saw on a plane (celebrity shock).
[10] I did an internship at Dark Forest to cut out the crypto noise, but was not nearly as shocked.
[11] The Gorilla project (which created BFCL) has a ton of useful information about function calling/tool use.
[12] Would a three-random-words method work better than UUIDv4s?
[13] Structured data (e.g. JSON) vs. unstructured seems to be an open question so far.
[14] You can force Claude to use a specific tool.
[15] Can you use async tool calling?
[16] I did this in this app and got great results.
[17] Techniques like speculative decoding or task-specific selection.
[18] I once spent $40 in a day just developing/testing an app.
[19] Etched, Tenstorrent, Groq, Chipflow, Taalas, Cerebras, SambaNova. Check out Artificial Analysis to see who is on top.
[20] There has to be a better way than just deploying vLLM containers on Kubernetes. Sources: Anthropic, OpenAI, Bedrock (I used to work there).
[21] Microsoft is reopening Three Mile Island.
[22] Superhuman makes my Emacs email automation somewhat obsolete.