@niccruzpatane Would anyone want to park their car, go shopping and come back to realize their battery is depleted because it was used as a distributed inference node?
My favourite talks at YC AI Startup School:
1. Karpathy on LLMs as software 3.0 and a type of orchestrator OS with tools
2. Chollet on building AI that can reason - use DL to learn approx repres (intuition) to constrain discrete program search
@_catwu@OpenAI@GeminiApp@Anthropic@DavidSHershey Okay clear definitions for Agents, when to use Agents vs workflows, tips on Evals! Not much new here but always a good reminder to always build your evals early and llm as a judge goes fairly far.
Microsoft just dropped OmniParser V2, this changes everything.
This AI sees your screen, understands it, and takes action, just like a human.
100% free & open source!
The PhD thesis of my 14th PhD student, Khurram Javed (@KhurramJaved_96), is now available.
Title: Real-time Reinforcement Learning for Achieving Goals in Big Worlds
Url: https://t.co/cA3O1Oz0Ow
Abstract:
In this dissertation, I motivate the need for real-time learning and propose algorithms that can learn in real time. I argue that such algorithms are needed for achieving goals in large and partially observable environments—big worlds. I then present my algorithms, developed in collaboration with others, in two parts.
In Part I, I present algorithms that can learn quickly and reliably in the linear function approximation setting. I introduce an algorithm for learning temporal predictions—SwiftTD—and use it to develop an algorithm for decision-making—SwiftSarsa. The key property of these algorithms is that they can learn with large step-size parameters online without the instability associated with quick online learning.
In Part II, I present algorithms for learning non-linear recurrent features efficiently. I introduce the idea of continual imprinting for generating useful candidate features, and I present an algorithm for efficiently computing the gradients of recurrent features online.
Khurram is now a research scientist at Keen Technologies.
Tiny teams are the future:
• Cursor: 0 to $100M ARR in 21 months w/ 20 people
• Bolt: 0 to $20M ARR in 2 months w/ 15 people
• Lovable: 0 to $10M ARR in 2 months w/ 15 people
• Mercor: 0 to $50M ARR in 2 years w/ 30 people
• ElevenLabs: 0 to $100M ARR in 2 years w/ 50 people
For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.
DeepSeek-R1: What's the main takeaway & what should we expect next?
I asked AI researchers and Jordan Schneider from ChinaTalk.
FYI: Long post.
Finbarr Timbers, @finbarrtimbers (Artfintel, former DeepMind)
1) What's the main takeaway:
The biggest update that we should see is that it turns out to be very easy to replicate these reasoning models, and it's purely a question of getting the right data. And that wasn't obvious: when o1 came out, there was a lot of speculation as to how they did it, and you know whether or not it'd be difficult to replicate, and it just turns out it's not! It turns out you take a good base model and you do very basic RL, and that's enough.
2) What should we expect to see next:
First, in the domains where there are objective signals, like math questions, generally science and technology and engineering questions, or where there's code we're able to execute and get signal, we should expect massive improvements. And we should expect to be able to pay more and get better answers. Before this, it wasn't clear that you could pay for a good model without going in and getting, millions of of examples of data that you could do pre training on. In the medium term, I think we will see general reasoning improvements. Another question is whether or not it's possible iterate on this data and make it better and better and better. Because if you run this thinking process and you're able to figure out an answer to this question, can you then have a model, summarize the reasoning trace, summarize the chain of thought and then get a more concise version. And then, instead of thinking for 1000 tokens, you can think for 100 tokens. And then just keep iterating on that. I strongly suspect that we can. So not only we will see this new capability where our models are going to do well in reward rich settings, but we should also see an improvement in the rate of progress for research on these domains where this objective reward signal exists.
3) What's important but under-discussed:
This is a really simple approach. If you were to say, hey, let's use RL with a language model to solve these problems, this would be pretty close to the first thing that I'd try. And it turns out that it works. There was all of work on Atari that made Atari agents really good. And we're just not using those tricks. So, how much of this stuff can we apply? And, how do we efficiently use inference-time compute? I still think MCTS has a lot of opportunity. (Finbarr has a good read on his blog related to what research gets wrong about MCTS.) Muesli is another RL algorithm that DeepMind wrote. I think that some of this stuff is going to be useful. And we're scratching the surface on how we're doing search and how we're doing the reinforcement learning here. I think there's a lot of potential to improve it.
Jacob Buckman, @jacobmbuckman (Co-founder at Manifest AI, former Google Brain)
1) What's the main takeaway:
Don't discount boring seeming optimizations. There's nothing fundamental here, it's just a bunch of really good engineering making things work. Also, you need to imagine they did lots of hyperparameter sweeping and ablations. Don't discount the value of getting things exactly right. In this field there's a really high leverage on ideas and precise implementation.
2) What should we expect to see next:
Expect to see big teams investing more heavily in research and less in naive compute scaling. Instead of just dumping dollars into GPUs and getting the edge that way, they'll invest in getting good scaling laws. That said, they'll still be dumping the dollars into the GPUs also, because once they have the good scaling laws, they scale up.
3) What's important but under-discussed:
The claim that everyone is wild about is the pre-training budget, which is from V3. But people didn't seem to take much notice of V3 until R1 came out. Which is a different model that wasn't trained on the $6mm budget. That one is definitely doing a bunch of distillation stuff which means they trained a big one and trained a small model off of the big one. It's also doing reasoning inference time unrolling stuff. I haven't read it in detail, but they must've done something like collecting the unrolling data. Which either means they scraped it from humans (paid $ for data collection), maybe they found a way to scrape it from OpenAI/o1 CoT to bootstrap it, or maybe a classic RL setup with search to find the right answer and then add that to the dataset (paying compute for dataset construction). So even if the parameter updates aren't expensive, the parameter updates are on data that was expensive to collect and create. It was bizarre that that is what got the attention: the distilled reasoning fine-tuned model, and then that attention was broadcast back to DeepSeek-V3 and their training budget. Which was impressive, but not impressive enough for this level of attention.
Jordan Schneider, @jordanschneider (Founder of ChinaTalk)
3) What's important but under-discussed:
Compute still matters. DeepSeek closed itself to new signups today. Even with its efficiency gains, it's one thing to train a model and another to deploy it to millions of users. Export controls on SME (semiconductor manufacturing equipment) and AI chips, smart immigration policy, and government policies that support AI diffusion are the best tools at the US government's disposal to ensure liberal democracies maintain a sustainable edge on AI vs China
Nathan Lambert, @natolambert (Post-Training Lead at Allen AI)
1) What's the main takeaway:
R1 is about the pace of innovation when new techniques are found.
2) What should we expect to see next:
The most important thing that DeepSeek shows us is that the pace and opportunity of innovation in AI is still very high and we’re in for a wild ride. Oh, and the people in charge should figure out the geopolitics.
3) What's important but under-discussed:
I think everything important is mentioned, but it is drowned in crap takes. DeepSeek is a great team and talent is spread around the world, we, as Americans, want it here!
Ross Taylor, @rosstaylor90 (Previously led Meta AI reasoning team)
1) What's the main takeaway:
Research update: Never underestimate the power of pushing a simple baseline with more compute.
Product update: Open thought models which show the chain-of-thought are compelling products
2) What should we expect to see next:
Reasoning is going to be everywhere. Instead of thinking about agents in the world, it's maybe more helpful to think of the world itself becoming more thought-dense -> we'll be able to embed thought into things that previously were static or reactive.
3) What's important but under-discussed:
The interplay between generation and verification, how they are linked, and what that means for the future.
Reasoning in more modalities than just text.
Continual learning and better understanding of task context.
Thank you to Finbarr Timbers, Griffin Choe, Gwern, Herrick Fang, Jacob Buckman, Jordan Schneider, Josh Singer, Matt Figdore, Nathan Lambert, Rick Barber, Ron Bhattacharyay, Ross Taylor, Tim Wee and Tyler Cowen.
What's the even more important question that I should be asking?
That a bunch of Chinese hobbyists could release an AI that is more competent than American models, more cost efficient, has 3% of the environmental impact, and can pretty much run on a Raspberry Pi and is... open source, should not be shocking to the West.
There's room to be skeptical of course, but one should also take this seriously. I've said time and time again that people grossly misunderstand that "the Chinese can't innovate because they lack freedoms."
Not only have they invested more in AI, they are also far more focused. Imagine what you can do when you don't have to worry about relitigating your history or canceling engineers because they put out internal memos accurately describing why evolutionary psychology explains varying outcomes between men and women?
The arrogance of the West is similar to China's hubris during the Ming Dynasty when the "Middle Kingdom Syndrome" gave it the illusion that Chinese civilization was superior to the rest of the world, and that it did not need to learn anything from it. China has since learned from that. It takes the best from the West and adapts it for its own purposes.
DeepSeek's R1 is a Sputnik moment, guys. Time to wake the hell up.
In a recent technical report, LearnLM, our set of AI models and capabilities fine-tuned for learning, outperformed other leading AI models on the principles of learning science. Now it���s available to try out in AI Studio. Learn more ↓ https://t.co/V6IPvlSo4n
Introducing *Transfusion* - a unified approach for training models that can generate both text and images. https://t.co/xGdVZ0M1Xx
Transfusion combines language modeling (next token prediction) with diffusion to train a single transformer over mixed-modality sequences. This allows us to leverage the strengths of both approaches in one model. 1/5
Show-o
One Single Transformer to Unify Multimodal Understanding and Generation
discuss: https://t.co/lfTLQOgGs8
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model.
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery!
https://t.co/jC7g5GPVsE
From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery.
Here are 4 example Machine Learning research papers generated by The AI Scientist.
We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project!
Paper: https://t.co/lTQ8UenFHk
GitHub: https://t.co/Im53whVeAq
Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct Machine Learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing and contextualizing the results.
Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques resulting manuscripts to provide feedback to improve the work, and also to select the most promising ideas to further develop in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community. As a proof of concept, our system produced papers with novel contributions in ML research domains such language modeling, Diffusion and Grokking.
We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.
In 2013, at 23, I felt on top of the world.
I'd just sold my company for $5M. Well, kinda.
I'll tell you the story of how I lost it all.
The $5M was in stock in a VC-backed company. But not just any stock.
This company wasn't just any company.
The company was doing $35M in revenue, with 70% gross margins.
It was backed by Silicon Valley's who's who. I was living the dream. Or so I thought.
My thinking was simple: Worst case? We'd cut staff, print $20M/year.
And that rate, I'd easy be able to get my $5M of stock out, maybe even more.
How could it go wrong?
Spoiler alert: It did. Lol.
Slowly, then all at once, we became a zombie company.
Revenue started declining. Growth stalled. VCs lost interest.
Here's the thing about the VC model. They care about growth and the next big unicorn.
We looked like a donkey with a party hat.
Suddenly, no one wanted to fund us. We had to sell. Fast.
We found a buyer. People congratulated us. But I knew the truth: This wasn't a success.
The outcome? We got nothing. Zip. Nada. My $5M paper fortune? Gone with the wind.
But here's the silver lining: I learned this lesson at 23, not 43.
Fast forward to today:
I run a different kind of company. We're profitable. We grow steadily. No VC money. No paper valuations.
The best part of my job now is sending out profit shares. 2x a year.
Our team's reaction is priceless: "Wow, thank you! This is real?"
They're used to VC-backed startups:
1. Equity worth millions (on paper)
2. Promises of future riches
3. Reality - 90%+ of the time worth nothing
I've been there, worn the t-shirt. VC equity is just gravy. Maybe it pays off, probably not.
But profit shares? That's real money.
In your bank account. Buy a car. Put a down payment on a house. Live your life now, not in some hypothetical future.
This is why we're seeing the rise of the dividend startup.
More and more people are choosing real money over paper unicorns.
Here's my lesson learned:
Build a business that prints cash, not promises.
Focus on profitability, not vanity metrics.
Grow steadily, not at all costs.
Your team will thank you.
Your stress levels will thank you.
Your bank account will definitely thank you.
Your family or future family will thank you.
Am I grateful for my $5M lesson? Absolutely. It shaped who I am today.
So here's to failing young, learning fast, and building businesses that matter.
Real value. Real profits. Real impact.
I'm not saying you can't build a VC-backed business and build wealth. You totally can.
But the odds are stacked against you. And in 2024, easier than ever to build and find customers, building a "small business" like a micro-saas or niche marketplace could be quite the adventure and retirement plan in its own way.
And most employees think when they join a VC-backed rocketship, that their stock is as good as gold.
It usually isn't.
Sharing this story in case it's useful to someone.
The rise of the dividend startup isn't just coming.
It's here.
@soniajoseph_ Wrt AI, MTL feels closer to a big research lab where most ppl are more concerned with publishing papers. SV ppl care more about building product or working for innovative/hype-y startups
@soniajoseph_ I've worked and lived in both places. The Montreal ecosystem feels tiny, provincial and not global enough. People in SV are hungrier (and chase big paychecks/exits). But I agree there's a cult-ish mono-tech-culture-bubble in SV, while MTL feels more diverse with its arts scene