Running a company:
2020: can you survive a pandemic?
2021: still here? we’re going to give all of your competitors $100m series A rounds.
2022: wow, you made it? okay, all engineers cost $600,000/year now.
2023: nice job! okay, SVB failed and we’re going to take away your bank account.
2024: a survivor I see. but can you pivot from ai to crypto to defense tech back to ai-enabled defense tech in a 12 month period to stay relevant?
2025: unfortunately all of your competitors have raised $2b series B rounds. oh and only 500 engineers are relevant and they cost $100m/yr each.
2026: well, well, well. you’re still in business? let’s deploy the thunderclap of godlike LLMs from the heavens so all of your customers can rebuild your app in 2 hours. can you survive?
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are an ugly, separate, non-end-to-end stage. The tokenizer "imports" all the ugliness of Unicode and byte encodings, and it inherits a lot of historical baggage and security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look like two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
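To make the tokenizer complaint concrete, here is a minimal sketch using only the Python standard library. It uses raw UTF-8 bytes as a stand-in for a byte-level tokenizer, which is an assumption for illustration: real tokenizers merge bytes into larger units, but they start from the same encodings. The helper names `utf8_bytes` and `continuation_bytes` are hypothetical.

```python
import unicodedata

# Two characters that render (nearly) identically on screen.
latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430
emoji = "\U0001F600"   # grinning face


def utf8_bytes(s: str) -> list[int]:
    """Return the UTF-8 byte values of s (stand-in for byte-level token ids)."""
    return list(s.encode("utf-8"))


def continuation_bytes(s: str) -> int:
    """Count UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx."""
    return sum(1 for b in s.encode("utf-8") if b & 0b11000000 == 0b10000000)


for ch in (latin_a, cyrillic_a, emoji):
    print(unicodedata.name(ch), utf8_bytes(ch), continuation_bytes(ch))
```

The two "a" characters look the same to the eye but arrive as different byte sequences, and the emoji arrives as four opaque bytes (three of them continuation bytes) rather than anything resembling a face. A rendered image of the same text would not have this mismatch.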
OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.
So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
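One way to picture "bidirectional over the input, autoregressive over the output" is a prefix-LM style attention mask. The sketch below is an assumption for illustration, not a description of the DeepSeek-OCR architecture or of nanochat; the function name `prefix_lm_mask` and the sizes are hypothetical.

```python
def prefix_lm_mask(n_input: int, n_output: int) -> list[list[bool]]:
    """Build a mask where mask[i][j] is True if position i may attend to j.

    Positions 0..n_input-1 are the (image) input, the rest are text output.
    """
    n = n_input + n_output
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            if j < n_input:
                # Input columns are visible to everyone: input positions see
                # each other in both directions, outputs see the whole input.
                row.append(True)
            else:
                # Output columns are causal: visible only from positions
                # at or after j, so inputs never see outputs.
                row.append(i >= j)
        mask.append(row)
    return mask


mask = prefix_lm_mask(n_input=3, n_output=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With 3 input positions and 2 output positions, the first three rows are identical (full visibility over the input, none over the output), and the last two rows form the usual causal triangle over the text being generated.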
Now I have to also fight the urge to side quest an image-input-only version of nanochat...
I don’t think it’s too late to enter almost any software market with an LLM-powered alternative if you want to.
We are seeing tiny teams of a few people build valuable software leapfrogging incumbents with even today’s frontier models (get enterprise sales)
The war will be retention: who can serve your vertical better?
Two ways for an AI company to protect itself from competition: (a) depend not just on AI but also deep domain knowledge about a particular field, (b) have a very close relationship with the end users.
Contradictory things I believe:
1. the world is tiny / the world is huge
2. life is short / life is long
3. future is mundane / future is weird
1.
People everywhere are increasingly similar. You can easily know everyone in the top of any field. Powerful people are a small global community. The world feels small.
No matter how niche an interest is, you can find millions like you. The long tail of differences between people and cultures is astounding. And there are endless places with gripping histories to learn about. The world feels big.
2.
You can fully count your days. The number of vacations you will take, the number of times you will visit your parents, and the number of days you have your kids with you at home are painfully limited. Life feels short.
You can live many lives in one. Change careers, compete in sports, build a business or two—maybe attempt to change the world. Life is plenty.
3.
Reading history I’m struck by how little changes. People dealt with the same problems: they loved, laughed, and mourned. We have the same unanswered questions about meaning. And with the exception of new gadgets our lives rarely change. The future is mundane.
While we don’t have flying cars, we’ve created entirely new worlds: virtual worlds. We live insanely different media lives and that changes everything. We’re tapping deep into the human Id and materializing strange desires into hyperreality. The future is weird.
It's been an insane year in AI. To try and understand what's going on, I asked 10 awesome investors what trends and companies to watch out for.
Here's what they said 👇 🧠
🔗 https://t.co/uGGEEHqPLA