@deepseek_ai R1's open-source launch has sent shockwaves through tech markets, thanks to costs that are far lower than those of giants like @OpenAI or @Google
Yet there’s a bigger picture that few are talking about🧵
In the end, success won’t merely hinge on who has the biggest model but on who best leverages these new approaches to deliver real-world impact.
The AI landscape is evolving rapidly—and it’s thrilling to see how these trends will shape the future.
This suggests that companies might pivot to offering specialized products built around 'smaller' models, targeting specific industries. By integrating AI into existing workflows, these solutions might deliver greater value than simply providing access to large, generic models.
1) DeepSeek r1 is real, with important nuances. The most important point is that r1 is so much cheaper and more efficient to inference than o1; that matters more than the $6m training figure. r1 costs 93% less to *use* than o1 through the API, can be run locally on a high-end workstation and does not seem to have hit any rate limits, which is wild. The simple math is that every 1b active parameters requires 1 gb of RAM in FP8, so r1's 37b active parameters require 37 gb of RAM (the full 671b-parameter mixture-of-experts model still has to fit in memory, though). Batching massively lowers costs and more compute increases tokens/second, so there are still advantages to running inference in the cloud. I would also note that there are true geopolitical dynamics at play here, and I don’t think it is a coincidence that this came out right after “Stargate.” RIP, $500 billion - we hardly even knew you.
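The memory arithmetic above can be sketched in a few lines of Python. This is a back-of-envelope estimate for weights only (it ignores the KV cache, activations and runtime overhead), and the parameter counts are assumptions based on DeepSeek's public description of the model: 37b parameters activated per token out of 671b total across all experts.

```python
# Back-of-envelope weight memory (weights only: no KV cache, activations
# or runtime overhead). 1 GB is taken as 1e9 bytes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "fp8") -> float:
    """Approximate GB needed to hold `params_billions` parameters in `dtype`."""
    return float(params_billions) * BYTES_PER_PARAM[dtype]


# ASSUMED parameter counts for r1 (mixture-of-experts):
ACTIVE_B = 37   # billions of parameters activated per token
TOTAL_B = 671   # billions of parameters across all experts

print(f"active params, fp8: {weight_memory_gb(ACTIVE_B):.0f} GB")
print(f"total params,  fp8: {weight_memory_gb(TOTAL_B):.0f} GB")
print(f"total params, bf16: {weight_memory_gb(TOTAL_B, 'bf16'):.0f} GB")
```

The gap between the two fp8 lines is the point: active parameters drive per-token compute, while total parameters still decide how much memory a local setup needs.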
Real: 1) It is/was the #1 download in the relevant App Store category, obviously ahead of ChatGPT; something neither Gemini nor Claude was able to accomplish. 2) It is comparable to o1 in quality, although it lags o3. 3) There were real algorithmic breakthroughs that made it dramatically more efficient both to train and to run inference. Training in FP8, MLA (multi-head latent attention) and multi-token prediction are significant. 4) It is easy to verify that the r1 training run cost only $6m. While this is literally true, it is also *deeply* misleading. 5) Even their hardware architecture is novel, and I will note that they use PCI-Express for scale-up.
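To see why MLA matters for inference cost, here is a rough count of how many values each token adds to the KV cache per layer under standard multi-head attention versus multi-head latent attention. The head counts and dimensions are assumptions taken from my reading of DeepSeek's V2/V3 technical reports, and the functions are illustrative, not DeepSeek's code.

```python
# Values cached per token, per layer. Dimensions are ASSUMPTIONS for illustration.


def mha_cache_elems(n_heads: int, d_head: int) -> int:
    # Standard multi-head attention caches a full key and value vector per head.
    return 2 * n_heads * d_head


def mla_cache_elems(d_latent: int, d_rope: int) -> int:
    # MLA caches one compressed latent vector (keys and values can be
    # recovered from it by learned up-projections) plus a small
    # rotary-position key shared across heads.
    return d_latent + d_rope


N_HEADS, D_HEAD = 128, 128   # assumed
D_LATENT, D_ROPE = 512, 64   # assumed

mha = mha_cache_elems(N_HEADS, D_HEAD)
mla = mla_cache_elems(D_LATENT, D_ROPE)
print(f"MHA: {mha} values/token/layer, MLA: {mla}, ratio ~{mha / mla:.1f}x")
```

With these assumed dimensions MLA caches roughly 57x fewer values per token, which is memory that can go to longer contexts or bigger batches instead.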
Nuance: 1) The $6m does not include “costs associated with prior research and ablation experiments on architectures, algorithms and data,” per the technical paper. “Other than that, Mrs. Lincoln, how was the play?” This means it is possible to train an r1-quality model with a $6m run *if* a lab has already spent hundreds of millions of dollars on prior research and has access to much larger clusters. DeepSeek obviously has way more than 2,048 H800s; one of their earlier papers referenced a cluster of 10k A100s. An equally smart team can’t just spin up a 2,000-GPU cluster and train r1 from scratch for $6m. Roughly 20% of Nvidia’s revenue goes through Singapore, yet 20% of Nvidia’s GPUs are probably not in Singapore, despite their best efforts. 2) There was a lot of distillation, i.e. it is unlikely they could have trained this without unhindered access to GPT-4o and o1. As @altcap pointed out to me yesterday, it is kinda funny to restrict access to leading-edge GPUs and do nothing about China’s ability to distill leading-edge American models - that obviously defeats the purpose of the export restrictions. Why buy the cow when you can get the milk for free?
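The headline number is just GPU-hours multiplied by a rental price. The inputs below are assumptions based on the DeepSeek-V3 technical report (r1 is built on the V3 base model) as I understand it: about 2.788M H800 GPU-hours at an assumed $2 per GPU-hour on a 2,048-GPU cluster. By construction it covers only the final run, not the prior research and ablations.

```python
# ASSUMED inputs (DeepSeek-V3 report, as I understand it).
GPU_HOURS = 2.788e6          # H800 GPU-hours for the final training run
PRICE_PER_GPU_HOUR = 2.00    # assumed rental price in USD
CLUSTER_GPUS = 2048          # GPUs used for that run

cost_usd = GPU_HOURS * PRICE_PER_GPU_HOUR
wall_clock_days = GPU_HOURS / CLUSTER_GPUS / 24

# Note: nothing here accounts for prior research, ablations, data or salaries.
print(f"final-run compute cost: ${cost_usd / 1e6:.2f}M")
print(f"wall-clock on {CLUSTER_GPUS:,} GPUs: {wall_clock_days:.0f} days")
```

This prints about $5.58M and 57 days of wall-clock time; neither figure says anything about the research spend that came before the run.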
Will 2025 be make-or-break for OpenAI?
As Google and Apple perfect AI hardware integration, OpenAI faces a critical choice: innovate beyond software or risk losing its edge.
Is ChatGPT’s Hardware on the Horizon?
Buffett, Active Investing and Index Funds...
In 2008, Warren Buffett issued a challenge to the hedge fund industry, and a million-dollar bet was made.
Buffett's position was that, including fees, costs and expenses, an S&P 500 index fund would outperform a hand-picked portfolio of hedge funds over 10 years. The bet pitted two investing philosophies against each other: passive and active investing.
Buffett picked an S&P 500 index fund. The hedge funders picked their own portfolio of actively managed funds. When the ten years were up, Buffett had won.
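How much fees alone can matter over a ten-year bet is easy to sketch. Everything here is hypothetical: both sides earn the same assumed 7% gross return each year on an assumed $1M, the index fund charges an assumed 0.05% a year, and the hedge fund charges a simplified "2 and 20" (2% of assets plus 20% of the gain left after the management fee, with no high-water mark).

```python
# Hypothetical fee-drag comparison; every number is an ASSUMPTION.


def grow_index(start: float, gross: float, expense: float, years: int) -> float:
    """Compound `start` at the gross return minus a flat annual expense ratio."""
    value = start
    for _ in range(years):
        value *= 1 + gross - expense
    return value


def grow_two_and_twenty(start: float, gross: float, years: int,
                        mgmt: float = 0.02, perf: float = 0.20) -> float:
    """Simplified '2 and 20': management fee on assets, then a performance
    fee on whatever gain is left (no high-water mark or hurdle)."""
    value = start
    for _ in range(years):
        gain = value * gross
        mgmt_fee = value * mgmt
        perf_fee = perf * max(gain - mgmt_fee, 0.0)
        value += gain - mgmt_fee - perf_fee
    return value


START, GROSS, YEARS = 1_000_000.0, 0.07, 10
index_end = grow_index(START, GROSS, 0.0005, YEARS)
hedge_end = grow_two_and_twenty(START, GROSS, YEARS)
print(f"index fund: ${index_end:,.0f}  hedge fund: ${hedge_end:,.0f}")
```

Under these assumptions the hedge fund nets 4% a year against 6.95% for the index fund, so identical stock-picking skill still loses by a wide margin once fees compound.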
A recent Bloomberg article reinforces this point: only one equity mutual fund, the $7.1B Baron Partners Fund, has outperformed the Invesco QQQ ETF (which tracks the Nasdaq-100) over the past 5, 10 and 15 years.
Put differently, passively investing in the Nasdaq ETF exposed you to the gains of the best companies of this era without you having to do any work or diligence. All the best companies were part of the ETF. When one of those companies lagged, its weight in the index fell, or it dropped out altogether. And when a company did well, its weight in the index increased, or it was added if it wasn't already part of it.
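The "simple rules" idea can be sketched as a toy index that holds the N largest companies by market cap and weights them by market cap. This rule is an assumption made for illustration: the real Nasdaq-100 uses a modified market-cap methodology with its own eligibility and capping rules, and the tickers below are hypothetical.

```python
# Toy rules-based index: hold the top-N companies by market cap, weighted by
# market cap. The rule and tickers are HYPOTHETICAL.


def rebalance(market_caps: dict[str, float], n: int) -> dict[str, float]:
    """Return index weights for the n largest companies by market cap."""
    members = sorted(market_caps, key=lambda t: market_caps[t], reverse=True)[:n]
    total = sum(market_caps[t] for t in members)
    return {t: market_caps[t] / total for t in members}


before = {"AAA": 500.0, "BBB": 300.0, "CCC": 200.0, "DDD": 50.0}
after = {"AAA": 450.0, "BBB": 300.0, "CCC": 40.0, "DDD": 250.0}  # CCC lags, DDD grows

print(rebalance(before, 3))
print(rebalance(after, 3))
```

In the second snapshot CCC lags and drops out while DDD grows into the top three: laggards shrink or leave, winners grow or get added, and nobody does any stock picking.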
Passive investing let the ETF manager define simple rules and then do all the work for you. Because of those rigid rules, the companies it picked turned out to be far superior to those picked by active investors, so much so that only ONE fund (out of thousands) managed to beat the ETF.
The lesson is that, for most people, this is the superior way to invest in the stock market: allocate some money (say, each month) to a very low-cost ETF and then let the ETF manager, natural selection and compounding do the rest.
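Here is a rough sketch of that monthly habit. The $500 contribution, 7% annual return, 30-year horizon and the two expense ratios are assumptions for illustration, not forecasts, and subtracting the expense ratio straight from the annual return is a simplification.

```python
# Hypothetical monthly-investing sketch; all inputs are ASSUMPTIONS.


def monthly_investing(contribution: float, annual_return: float,
                      expense_ratio: float, years: int) -> float:
    """Balance after investing `contribution` at the start of every month."""
    # Simplification: treat the expense ratio as a straight cut to the annual
    # return, then convert to an equivalent monthly rate.
    monthly_rate = (1 + annual_return - expense_ratio) ** (1 / 12) - 1
    balance = 0.0
    for _ in range(years * 12):
        balance += contribution        # add this month's money
        balance *= 1 + monthly_rate    # then let it compound for the month
    return balance


YEARS, MONTHLY = 30, 500.0
contributed = MONTHLY * 12 * YEARS
low_cost = monthly_investing(MONTHLY, 0.07, 0.0005, YEARS)
high_cost = monthly_investing(MONTHLY, 0.07, 0.01, YEARS)
print(f"contributed ${contributed:,.0f}; low-cost ETF ${low_cost:,.0f}; "
      f"1%-fee fund ${high_cost:,.0f}")
```

Comparing the 0.05% and 1% runs shows the same fee drag as Buffett's bet, just spread over decades of contributions.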
No more hours of video editing.
ChatGPT can now create a video commercial, complete with script, voice-over and music, with just two prompts.
I will show you how in 4 easy steps 👇