Everyone is bragging about using Claude.
Almost nobody is talking about the real opportunity:
Training AI to think like YOUR business.
That's where the money is.
Here's what I built and why I think most companies will get this wrong 👇:
The biggest AI divide won't be between people who know how to code and people who don't.
It'll be between people who know how to ask good questions and people who don't.
For 20 years, we were rewarded for having answers.
For the next 20, we'll be rewarded for curiosity.
AI isn't replacing intelligence.
It's exposing how we think.
For anyone wondering what this is actually saying:
Most multimodal AI models have extra models sitting in front of the LLM.
One model translates images, another translates audio, and then the LLM gets the processed information.
The first diagram shows that traditional setup: image and audio go through separate encoders before reaching the language model.
What Gemma 4 is trying to do is cut out those extra layers.
Instead of relying on large dedicated vision and audio encoders, it converts images, audio, and text into the same token format and feeds them into a single 12B model.
The second diagram shows that simplified architecture.
One model is doing the heavy lifting instead of multiple specialized models working together.
The third diagram explains how images become tokens.
The image is split into small patches, those patches are compressed into fewer representations, and the model processes those instead of every individual pixel.
The interesting part isn't that Gemma can handle text, images, and audio. Lots of models can do that.
The interesting part is how much of that capability Google is packing into a single model instead of stitching together several different ones.
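For anyone who wants to see the patch step from the third diagram in code, here is a rough sketch. It is not Gemma's actual implementation: the function names, the 2x2 patch size, and the group-of-4 averaging are all made up for illustration, and a real model would use a learned projection rather than a plain average.

```python
# Illustrative sketch only, not Gemma's code.
# Patch size and pooling factor are hypothetical values chosen for the demo.

def split_into_patches(image, patch):
    """Split a 2D grid of pixel values into flattened patch x patch tiles.

    Assumes the image height and width are multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


def pool_patches(patches, group):
    """Compress patches by averaging every `group` consecutive patches."""
    pooled = []
    for start in range(0, len(patches), group):
        chunk = patches[start:start + group]
        pooled.append([sum(vals) / len(chunk) for vals in zip(*chunk)])
    return pooled


# A toy 8x8 grayscale "image" (64 pixel values).
image = [[r * 8 + c for c in range(8)] for r in range(8)]

patches = split_into_patches(image, patch=2)  # 16 patches, 4 values each
tokens = pool_patches(patches, group=4)       # 4 pooled vectors

print(len(patches), len(tokens))  # prints: 16 4
```

The point is the shrinkage: 64 pixels become 16 patches, and those become 4 vectors that the model would process instead of every individual pixel.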
Brutal truth for local businesses:
The biggest threat to your business isn't your competitor down the street.
It's becoming invisible when customers ask AI who to buy from.
A few years ago, ranking on Google was enough.
Today?
People search on Google.
They ask ChatGPT.
They use Gemini.
They trust AI summaries more than scrolling through 20 websites.
That's why SEO, AEO, and GEO aren't nice-to-haves anymore.
SEO helps you get discovered.
AEO helps you become the answer.
GEO helps AI engines mention and recommend your business when customers are ready to buy.
And here's the scary part:
If AI doesn't know who you are, it can't recommend you.
Meanwhile, your competitor is showing up in searches, AI answers, local recommendations, and buying conversations 24/7.
The future isn't about ranking #1.
It's about owning the answer wherever customers are looking.
The businesses that figure this out now will dominate local markets for the next decade.
@HuggingPapers the hybrid mamba-2 + transformer moe is the part worth paying attention to
if it keeps coherence at 1m while staying efficient on inference, it could actually move the needle for long document or multi step agent work where pure transformers still choke
for people who actually have obesity or type 2 diabetes, these drugs are legitimately transformative
for healthy people using them purely for longevity or mild weight loss, we're still guessing on long term tradeoffs like muscle loss, nutrient absorption, and what happens when you stop
@venom1s telling men to treat every unknown woman in danger as someone else's problem turns a sad accident into a gender score settling exercise
most families would rather have their guy alive, but a society where nobody intervenes for strangers is also miserable
@AiBattle_ i'm not that worried yet
chinese labs in particular love to stay vague until the last minute, and qwen has a track record of eventually dropping the models people actually asked for even after periods of silence
the blog difference from 3.6 is noticeable though
@bindureddy the real problem is almost never the models themselves
it's that most companies still have zero serious way to measure what ai is actually saving or earning them, so every spend looks like a black hole once the initial hype budget runs dry
@JonhernandezIA technology that moves faster than society's ability to understand and adapt to it tends to create backlash or quiet damage
real iteration requires genuine feedback loops, not just shipping and then doing pr about listening
@vrexec people who have never done high skill work at a high level don't understand how much time used to go into producing something truly excellent
ai collapsing that from weeks to under an hour for someone who knows exactly what excellent looks like is a genuine step change
@GaryMarcus even if llms stay relatively inefficient, the top 1-2 players can still be extremely profitable because of distribution, switching costs, and enterprise lock in
being 3x more expensive than the alternative doesn't matter if you're 5x better and already integrated everywhere
@davidpattersonx even if you can run a strong model locally, the best performance still comes from whoever did the best post training, alignment, and tool integration
that advantage doesn't magically distribute just because inference is decentralized
Ideogram 4.0 is the best open source model ever.
I just ran it locally on my potato laptop, and it took 1 min 41 sec to generate a complete image.
Look at the image below, it is just great as fuck.
I really love this model.
My hardware: 64GB RAM and 6GB VRAM with RTX 4050.
Great model
@googleaidevs 12B running serious agentic workflows on 16GB VRAM is the part that matters most for most people
Local multimodal agents that don't require a 4090 or cloud round trips are finally becoming realistic