I have decided to leave Meta Superintelligence Labs.
I’ve been at Meta for more than three years, starting during my PhD. I’ve learned a lot during this time, and I’m grateful to have worked with many amazing collaborators.
We advanced the frontier of LLMs, reinforcement learning, and computer use agents. I’m pretty proud of our latest work on rigorous agent evaluations and CUA envs (check it out: https://t.co/goZSd8N0wB).
Looking forward to taking some time to explore what’s next!
You deserve more than a crowded Blue Bottle and a batch of 400 startups...
I'm launching Kernel Grants, a pre-seed program that will invest $271,828 in 10 founders/year who are building tooling and infrastructure for the token factories of the future.
We have an amazing set of speakers lined up for our first set of events:
- @pirroh, President of Replit
- @soumithchintala, CTO of Thinky
- @jeremyphoward, Founder of https://t.co/RHjK7ZPIFM
- @NaderLikeLadder, Dir. of DevTech at NVIDIA
- @OfficialLoganK, MOTS at DeepMind (and first ever Latent Space guest!)
- @clattner_llvm, Founder of Modular
- @dylan522p, Founder of SemiAnalysis
- @swyx, Editor of Latent Space (+ AIE, Cognition, etc!)
Batches are a relic of pre-AI acceleration. Any day is a great day to start building, so applications are open and we accept founders on a rolling basis. Let's build!
https://t.co/KM8KtRiniH
Enjoy an exclusive tour of our Kernel space 👀
@finbarrtimbers nice, let me know what you think! this was one of the first papers introducing the idea of LLM as judges for training agents, and it came exactly from trying to address that contradiction you were bringing up
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
this assumes that intelligence scales nicely and uniformly, that failing small pattern tasks means failing big science.
but intelligence is asymmetric. humans who crack deep problems often stumble on simple ones. why should we aim for AI to be different?
Many people expect that current AI is ready to cure cancer and do breakthrough new science. ARC-AGI-3 envs are like a microcosm of the scientific method: you must observe a tiny world, form a theory of how it works, test it, iterate until correct. Over the course of a few minutes.
If AI can't do it in an ultra-simple, ultra-small scale setting that is explicitly designed to be as accessible as possible, I expect there are a few steps missing until AI can crack the nature of reality.
@mpourmirzaei nethack has a similar concept of time and doesn't require a fast reaction time. on long context understanding and handling, that's in my opinion an important facet of intelligence that an agi benchmark should monitor, not ignore
wait, if the test for AGI is hard games, why don't we just use existing hard games as the benchmark?
I feel like arc-agi-n is supposed to be basically nethack or dark souls... but we already have those, no reason to wait to build them
this was the original idea of the pioneers of modern reinforcement learning (e.g., @_rockt). why should we go in circles?
Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know; ARC-AGI-3 tests how they learn
@punchesbears to me this doesn't sound much different from training and evaluating on a complex and rich game where episodes are procedurally generated, such as nethack
@ChaseBrowe32432 is contamination really an issue? models are not good on hard games benchmarks even if there is data about them on the internet. and once arc-agi is out you will immediately have contamination