Enthralled by machine learning / artificial intelligence, robot•me CTO, software engineer, Dai the robot co-creator, president of impactIA foundation, Genève
@BingBongBrent @arcprize @OpenAI @poetiq_ai Poetiq is the white triangle on the top right. It's clearer on the arc-agi site when you filter by author.
Why their name isn't displayed 🤷
From the makers of the popular AlphaGo documentary, The Thinking Game gives a much broader picture of the story of DeepMind and our mission to build AGI, drawing on interviews with myself and others going back many years.
You can now freely watch it here: https://t.co/hCIicyWbLi
ARC25 is over and despite a lot of work I have been unable to implement my vision successfully. I hope to learn from other teams’ solutions and refine my ideas for ARC26. I am currently 6th on the public test set. Read about my vision and experiments: https://t.co/Jk8klSz5GF
@StphTphsn1 @Dorialexander Yes, very much i.i.d. and fairly simple tasks belonging to, e.g., a single 20-person service. But I'm pretty sure they would have failed even a few months ago.
@StphTphsn1 @Dorialexander I've seen a significant increase in the robustness of data extraction / instruction-following scenarios over the past 12 months, with high 9x% accuracy/F1 now achievable on real-world tasks.
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs.
Blog: https://t.co/w5ZDsHDDPE
Code: https://t.co/7UgKuD9Yll
Paper: https://t.co/3m8ANhNMiw
From https://t.co/icZYDE2inN "the private eval set is only accessible via the no-internet-access Kaggle competition"
"The semi-private eval set was calibrated to have the same difficulty as the public eval set, but researchers need to coordinate with the ARC-Prize team to test their model on it in a Kaggle notebook that runs at most 12 hours."
From the Kaggle page "This leaderboard is calculated with approximately 50% of the test data. The final results will be based on the other 50%, so the final standings may be different."
So the ARC-AGI-2 scores on both pages are measured in different ways but are somewhat comparable?
@arcprize @podesta_aldo How should the ARC-AGI-2 scores here https://t.co/AsH6ytGsx7 be compared to those on the Kaggle leaderboard here https://t.co/ITs8M69d5W ?
It looks like J. Berman, working outside the Kaggle competition, has a higher score of 29.4%. Are the constraints different?
I spent the past month reimplementing DeepMind’s Genie 3 world model from scratch
Ended up making TinyWorlds, a 3M-parameter world model capable of generating playable game environments
demo below + everything I learned in thread (full repo at the end)👇🏼
Ever wondered how Energy-Based Models (EBMs) work and how they differ from normal neural networks?
☕️ We go over EBMs and then dive into the Energy-Based Transformers paper to make LLMs that refine guesses, self-verify, and could adapt compute to problem difficulty. (link👇)
Here's how I (almost) got the high scores in ARC-AGI-1 and 2 (the honor goes to @jeremyberman) while keeping the cost low. To put things into perspective: o3-preview scored 75.7% on ARC-AGI-1 last year while spending $200/task on its low setting. My approach scores 77.1% while spending $2.56!
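For a sense of scale, the cost comparison in that post can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch using the four figures quoted in the post. The variable names are illustrative, and it assumes the $2.56 figure is a per-task cost like the $200/task one.

```python
# Figures quoted in the post; variable names are illustrative.
o3_preview_score = 75.7   # % on ARC-AGI-1
o3_preview_cost = 200.00  # USD per task
approach_score = 77.1     # % on ARC-AGI-1
approach_cost = 2.56      # USD, ASSUMED to be per task as well

cost_ratio = o3_preview_cost / approach_cost
score_gain = approach_score - o3_preview_score

print(f"{cost_ratio:.0f}x cheaper, {score_gain:+.1f} points")
# prints "78x cheaper, +1.4 points"
```

Under that per-task assumption, the quoted numbers work out to roughly a 78-fold cost reduction for a 1.4-point higher score.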