Nora

@schottkey

ᶘ ᵒᴥᵒᶅ

London

Joined June 2008

416 Following

77 Followers

439 Posts

schottkey retweeted

Jackson Atkins

@JacksonAtkinsX

8 months ago

My brain broke when I read this paper. A tiny 7 million parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI 1 and ARC-AGI 2. It's called Tiny Recursive Model (TRM) from Samsung. How can a model 10,000x smaller be smarter? Here's how it works:

1. Draft an Initial Answer: Unlike an LLM that writes word-by-word, TRM first generates a quick, complete "draft" of the solution. Think of this as its first rough guess.
2. Create a "Scratchpad": It then creates a separate space for its internal thoughts, a latent reasoning "scratchpad." This is where the real magic happens.
3. Intensely Self-Critique: The model enters an intense inner loop. It compares its draft answer to the original problem and refines its reasoning on the scratchpad over and over (6 times in a row), asking itself, "Does my logic hold up? Where are the errors?"
4. Revise the Answer: After this focused "thinking," it uses the improved logic from its scratchpad to create a brand new, much better draft of the final answer.
5. Repeat until Confident: The entire process (draft, think, revise) is repeated up to 16 times. Each cycle pushes the model closer to a correct, logically sound solution.

Why this matters:

Business Leaders: This is what algorithmic advantage looks like. While competitors are paying massive inference costs for brute-force scale, a smarter, more efficient model can deliver superior performance for a tiny fraction of the cost.

Researchers: This is a major validation for neuro-symbolic ideas. The model's ability to recursively "think" before "acting" demonstrates that architecture, not just scale, can be a primary driver of reasoning ability.

Practitioners: SOTA reasoning is no longer gated behind billion-dollar GPU clusters. This paper provides a highly efficient, parameter-light blueprint for building specialized reasoners that can run anywhere.

This isn't just scaling down; it's a completely different, more deliberate way of solving problems.
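The draft, think, revise loop described in the post can be sketched roughly as follows. This is only a toy illustration of the control flow, not the paper's implementation: the class `ToyRecursiveModel`, the function `solve`, the random untrained weights, the zero-initialised draft and scratchpad, and the confidence `threshold` are all assumptions made for the example. Only the 6 inner steps and the cap of 16 cycles come from the post.

```python
import math
import random

INNER_STEPS = 6   # scratchpad refinements per cycle (figure quoted in the post)
MAX_CYCLES = 16   # maximum draft-think-revise cycles (figure quoted in the post)


def linear(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.3):
    """Build a small random matrix; stands in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyRecursiveModel:
    """Hypothetical, untrained stand-in: one small set of weights reused every step."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w_think = random_matrix(dim, 3 * dim, rng)   # (x, y, z) -> new z
        self.w_revise = random_matrix(dim, 2 * dim, rng)  # (y, z) -> new y
        self.w_halt = [rng.uniform(-0.3, 0.3) for _ in range(dim)]  # y -> score

    def think(self, x, y, z):
        """Refine the scratchpad z using the problem x and the current draft y."""
        return [math.tanh(v) for v in linear(self.w_think, x + y + z)]

    def revise(self, y, z):
        """Produce a new draft answer from the old draft and the scratchpad."""
        return [math.tanh(v) for v in linear(self.w_revise, y + z)]

    def confidence(self, y):
        """Sigmoid halting score in (0, 1); an assumed stopping signal."""
        score = sum(w * v for w, v in zip(self.w_halt, y))
        return 1.0 / (1.0 + math.exp(-score))


def solve(model, x, threshold=0.9):
    """Run the outer loop; return the final draft and the number of cycles used."""
    y = [0.0] * model.dim  # step 1: initial draft (zeros here, an assumption)
    z = [0.0] * model.dim  # step 2: latent scratchpad
    cycles = 0
    for cycles in range(1, MAX_CYCLES + 1):
        for _ in range(INNER_STEPS):          # step 3: refine the scratchpad
            z = model.think(x, y, z)
        y = model.revise(y, z)                # step 4: revise the answer
        if model.confidence(y) >= threshold:  # step 5: stop once confident
            break
    return y, cycles


model = ToyRecursiveModel(dim=4)
answer, cycles_used = solve(model, x=[0.5, -0.2, 0.1, 0.9])
print(cycles_used, answer)
```

The point of the sketch is that the same small functions are applied repeatedly, so the amount of computation per problem grows with the number of loop iterations rather than with the parameter count.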


341

12K

11K

schottkey retweeted

Linnea Evanson, PhD @EvansonLinnea

about 1 year ago

We’re very pleased to release our latest study ‘Emergence of Language in the Developing Brain’ Paper: https://t.co/CEMqkwditV Blog: https://t.co/TrvxMlVqS4 The first systematic investigation of how the neural representations of language evolve as the brain develops. A collaboration between @AIatMeta and @FondARothschild, with @JeanRemiKing. Thread 👇

437

383

128K

schottkey retweeted

Jean-Rémi King @JeanRemiKing

9 months ago

Can AI help understand how the brain learns to see the world? Our latest study, led by @JRaugel from FAIR at @AIatMeta and @ENS_ULM, is now out! 📄 https://t.co/y2Y3GP3bI5 🧵 A thread:

320

244K

schottkey retweeted

Natalia Perez-Campanero @NPerezCampanero

over 1 year ago

It was a pleasure working with our fellows @alexandraabbas, Helyos and @schottkey on @Apartresearch work investigating Latent Adversarial Training (LAT) as a safety fine-tuning method. The study compares LAT to other methods and analyzes its impact on refusal behavior encoding.

https://t.co/nWV03XO4eE


Nora @schottkey

over 1 year ago

8/7 There's also a post about our research at https://t.co/H9DylNBmie

Nora @schottkey

over 1 year ago

1/7 Excited to share our recent project from LASR Labs! We investigated the utility of SAE latents in language models. #MechanisticInterpretability #SAE Here's what we discovered: 🧠🔍

710

Nora @schottkey

over 1 year ago

7/7 Curious to learn more? Check out our paper: https://t.co/XVCDevtFLq. Huge thanks to my awesome teammates and to the organising team at LASR Labs! #LASRLabs #AIResearch #MechanisticInterpretability #SAE

schottkey retweeted

Zachary Nado @zacharynado

over 3 years ago

Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same! https://t.co/vDhSwZyHJm


589

334K

schottkey retweeted

Petar Veličković

@PetarV_93

over 3 years ago

📢 New course! Cats4AI🐱🤖 Learn category theory foundations from the lens of ML, grounded in concrete papers. Open to all! Sign up: https://t.co/WnbVfm3ju5 @andrewdudzik @bgavran3 @_joaogui1 @pimdehaan + fantastic speakers @math3ma @CollapsingPanda @david_i_spivak @TacoCohen