History in the making
In this new image from our @NASAArtemis II crew, you can see Orientale basin on the right edge of the lunar disk. This mission marks the first time the entire basin has been seen with human eyes.
We see our home planet as a whole, lit up in spectacular blues and browns. A green aurora even lights up the atmosphere. That's us, together, watching as our astronauts make their journey to the Moon.
Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know; ARC-AGI-3 tests how they learn
Our live tissue clearing paper is out in @naturemethods! We achieved optical clearing of mammalian brain tissues without compromising normal neuronal function. Big congrats to @Shigenori774 and our wonderful collaborators! 🎉
https://t.co/joMB5odihK (1/10)
@karpathy Very inspiring as always! We are also open sourcing part of our infra on automated research for Gemini to evolve itself at https://t.co/WH7JBEEm9h More complex than the nanochat setup but closer to SOTA LLM pre/post-training while staying as minimal as possible. More on the way.
for years, society was limited to only 16 syrup squares per waffle but with recent combinatorial optimization breakthroughs our research department has achieved previously unheard of densities of waffle syrup
It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance.
In the future, there will be much more challenging situations of this nature, and it will be critical for the relevant leaders to rise up to the occasion, for fierce competitors to put their differences aside. Good to see that happen today.
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Today, we are releasing our first weights from Trinity-Large, our first frontier-scale model in the Trinity MoE family. American Made.
- Trinity-Large-Preview (instruct)
- Trinity-Large-Base (pretrain checkpoint)
- Trinity-Large-TrueBase (10T pre Instruct data/anneal)