18 Jan 2025

Believing the AI Hype

Really?

Yes. I quit my current job. I think we are on the verge of a new class of applications¹ that were not possible previously. There is a new interface between machines and the environment. Models can call into functions and interact meaningfully with the real world. Combine that with the ability to reason², and this is a new primitive³.

Let’s say that I want to cut into the doordash pie⁴ 😉, and create an application that uses a chatbot interface to help people find their dinner⁵. I can hook it up to a bunch of sources⁶ and get conversations like:

USER

what is the best dosa near me

ASSISTANT

For a delicious dosa experience in the Seattle area, I recommend House Of Chettinad located at 1525 13th Ave, Seattle, WA 98122. They specialize in South Indian cuisine and are known for their authentic dosa offerings.

You can find more information on their website: House Of Chettinad.

This might look like a normal ChatGPT conversation, but a few details make it interesting:

- The model got my location through a cache in my browser
- House of Chettinad is, in my opinion, better than the first 10 google maps⁷ results
- The sources⁶/tools informed the model’s choice of restaurant
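
To make "the tools informed the model's choice" concrete, here is a minimal sketch of the app side of a tool call. It is not the code behind the app above. The `search_restaurants` function, its fake index, and the shape of the `call` dict are all assumptions made for illustration, and no real model or API is involved.

```python
import json
from typing import Any, Callable


# Hypothetical tool: a real app might query a maps or review API here.
def search_restaurants(query: str, city: str) -> list[dict[str, Any]]:
    fake_index = [  # made-up data, for illustration only
        {"name": "House Of Chettinad", "city": "Seattle", "tags": ["dosa"]},
        {"name": "Example Noodle Bar", "city": "Seattle", "tags": ["noodles"]},
    ]
    return [r for r in fake_index if r["city"] == city and query in r["tags"]]


# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"search_restaurants": search_restaurants}


def run_tool_call(tool_call: dict[str, Any]) -> str:
    """Execute one model-issued tool call and return JSON text for the model."""
    name = tool_call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = TOOLS[name](**tool_call["arguments"])
    except TypeError as exc:  # the model sent wrong or missing arguments
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


# Assumed shape of a parsed tool call; real APIs each have their own format.
call = {"name": "search_restaurants", "arguments": {"query": "dosa", "city": "Seattle"}}
tool_output = run_tool_call(call)
print(tool_output)
```

The string returned by `run_tool_call` is what would be sent back to the model as the tool result, so errors are returned as data rather than raised. That gives the model a chance to correct its own call.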

Why am I making this post?

Honestly, it caught me off guard, but I don’t think that is the case for everyone. YC seems to be on top of it. Anthropic has a blog post on how to do it well. It seems like there is a lot of VC money going into this⁸…

Similar things happened with crypto, so I took the same approach: I created a project in the space⁹ to cut through the noise. Unlike with crypto, the projects I created absolutely shocked me with their potential value. Building something you can see being valuable is extremely galvanizing¹⁰.

Why isn’t this tech in your life now?

Recency

@datnguyennx has a good summary of the timeline

%%{init: { 'logLevel': 'debug', 'theme': 'default' , 'timeline': {'disableMulticolor': true}, 'themeVariables': {
          'cScale0': '#383838', 'cScaleLabel0': '#93e0e3', 
          'cScale1': '#383838', 'cScaleLabel1': '#cc9393',
          'cScale2': '#383838', 'cScaleLabel2': '#f0dfaf',
          'cScale3': '#383838', 'cScaleLabel3': '#7f9f7f'
   } } }%%
timeline
  title OpenAI Timeline
    May 2023 : jsonformer
    Nov 2023 : openai function calling/assistants api
    May 2024 : GPT 4o released
    May 2024 : Claude tool calling released
    Sep 2024 : o1 is released
    Oct 2024 : sonnet 3.5 v2 released

Check the Berkeley Function-Calling Leaderboard for performance metrics (init Aug 2024)¹¹. Function calling only got really good recently. These primitives³ were just mined.

Technical Challenges

Let’s say we have a great model that can do tool use almost perfectly. There are still a bunch of technical challenges in integrating the model into the environment.

Managing data: Dumping your entire API response into the model won’t give you optimal performance. A few questions to work through:
- How do you manage IDs¹²?
- What pieces of data are most important for the model?
- How do you present that data¹³?
Utilizing Tools: How can you ensure the model utilizes all the information available to it?¹⁴
Latency: Tool use can take a while. How do you reconcile that with your responsiveness requirements?¹⁵
Verification: Can you let the model work against a deterministic component?¹⁶
Model Strategy: Can you apply different sizes/types of models to different problems?¹⁷
Observability: In your new system, where are the failure points?
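
For the "Managing data" questions above, one possible approach (a sketch, not a recommendation backed by benchmarks) is to keep only the fields the model needs and swap long IDs for short aliases that the app maps back afterwards. The field names, the `r1`-style aliases, and the sample records below are all hypothetical.

```python
import json
from typing import Any

# Assumption: these are the fields that matter most to the model.
KEEP_FIELDS = ("name", "rating", "address")


def compact_for_model(places: list[dict[str, Any]]) -> tuple[str, dict[str, str]]:
    """Trim each record and replace its long ID with a short alias.

    Returns the JSON text to place in the prompt, plus an alias -> real ID
    map so the app can resolve whichever alias the model picks.
    """
    alias_to_id: dict[str, str] = {}
    rows = []
    for i, place in enumerate(places, start=1):
        alias = f"r{i}"
        alias_to_id[alias] = place["id"]
        row = {"id": alias}
        row.update({k: place[k] for k in KEEP_FIELDS if k in place})
        rows.append(row)
    # Compact separators drop whitespace the model does not need.
    return json.dumps(rows, separators=(",", ":")), alias_to_id


# Made-up API response with fields the model does not need.
raw = [
    {"id": "3f2b8c1e-9a4d-4e7b-8c21-5d6f7a8b9c0d", "name": "Place A",
     "rating": 4.6, "address": "1 Example St", "photos": ["..."]},
    {"id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d", "name": "Place B",
     "rating": 4.2, "address": "2 Example Ave", "photos": ["..."]},
]
prompt_json, alias_to_id = compact_for_model(raw)
print(prompt_json)
print(alias_to_id["r2"])
```

Short aliases are just one option for IDs. Whether they beat UUIDs or word-based IDs for a given model is something to measure, not assume.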

And so many more…
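
On the latency point above: when several tool calls are independent, one option is to run them concurrently so the wait is roughly the slowest call, not the sum of all of them. This is a minimal sketch using Python's `asyncio`. `fake_tool` and its delays are stand-ins for real network calls, not part of any actual API.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for a slow network call (maps, reviews, search)."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def call_tools_concurrently() -> list[str]:
    # gather runs the awaitables concurrently and returns results
    # in the same order as the arguments.
    return await asyncio.gather(
        fake_tool("maps", 0.2),
        fake_tool("reviews", 0.3),
        fake_tool("search", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(call_tools_concurrently())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

This only helps when the calls do not depend on each other. If one tool needs another's output, those steps still have to run in sequence.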

Model Cost

Models cost a lot¹⁸. However, I believe inference costs (and latencies) are headed for a race to the bottom, driven by:

- AWS Inferentia and Google Trillium/TPU
- Many, many ASIC companies¹⁹
- Better inference software²⁰
- Cheaper power through nuclear?²¹
- Model commoditization (DeepSeek)

More?

Dude, these are new primitives³. Commercial apps are just being built. LLMs make for great demos, and I can only assume there is a ton more last-mile engineering to get through.

What is next?

Write an app

⚠️ CREATE AN APP! ⚠️

Use any of the APIs (openai, bedrock, anthropic, langchain…) and spin up a quick app. Pick a startup idea you have conviction in, choose a SaaS company to replace (as Jared Friedman suggests), or automate a part of your life²².

Going from nothing to something requires no deep ML knowledge, no crazy system knowledge, and no insane techniques. My 10th grade self with basically just arduino programming experience could build something of value.

The future

Commercial apps are just being created/released. I truly believe the hype is warranted, and this tech is going to create great value in the future. Cursor and Perplexity are already in my life every day.

An exercise I have been doing is to compare this to the internet era. I believe they are comparable (and I’ll probably write a blog post exploring the comparison).

1. sierra, camfer, and cognition are just 3 of the many, many great companies
2. Test-time compute in models like o3 is one of the recent pieces of tech enabling this
3. Thanks to Tony Kam from lilac labs for the terminology
4. I actually wrote this recently
5. I did this with the app and had a great date at kedai makan with my girlfriend
6. I did this with google maps api, yelp api, exa, and a couple more pieces
7. Taste of Mumbai, Chennai Express Authentic Indian, Babai Dosa, The Bangalore Kitchen, Dosawaala, Sangamam, Spice Bliss, Madras Dosa Corner, Dosa House (I actually might choose Dosa House over House of Chettinad, but both are great)
8. I don’t know how to find good sources on this but this guy agrees with me
9. This is inspired by moxie, who I recently saw on a plane (celebrity shock)
10. I did an internship at dark forest to cut out the crypto noise, but was not nearly as shocked.
11. The gorilla project (which created BFCL) has a ton of useful information about function calling/tool use.
12. Would a three-random-words method work better than UUIDv4s?
13. Structured data (e.g. json) vs unstructured seems to be an open question so far
14. You can force claude to use a specific tool
15. Can you use async tool calling?
16. I did this in this app and got great results
17. Techniques like speculative decoding or task-specific selection
18. I once spent $40 in a day just developing/testing an app.
19. Etched, Tenstorrent, Groq, Chipflow, Taalas, Cerebras, SambaNova. Check out artificial analysis to see who is on top.
20. There has to be a better way than just deploying vllm containers on kubernetes. Sources - anthropic, openai, bedrock - I used to work there
21. microsoft is reopening three mile island
22. Superhuman makes my emacs email automation somewhat obsolete
