Mohammed Alshehri

@SwishMoe

23 | Applied ML/RL, Post Training → building and learning prev @ibmwatsonx

London

Joined July 2017

1.7K Following

581 Followers

2.5K Posts

Pinned Tweet

Mohammed Alshehri

@SwishMoe

5 months ago

My implementation of the Recursive Language Model (RLM) paper by @a1zhang, Kraska, and @lateinteraction.

Key insight: "Treat long context as an external environment, not something to stuff into a context window."

Applied to video understanding. Instead of encoding 38K frames into a prompt, the agent:

→ Treats video as an environment

→ Writes code to explore segments

→ Uses recursive LLM sub-calls for analysis

Tested: 20+ min video, 7 steps, $0.002

Paper: https://t.co/sMkqVscWZD

Code: https://t.co/J3GxdlKeav

852

875

58K
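A minimal sketch of the idea in the pinned tweet, not the code from the linked repository. `VideoEnv`, `analyze`, and `fake_llm` are hypothetical names, frames are represented by placeholder text captions, and the model-written exploration code is replaced by a fixed binary split so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for a real LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class VideoEnv:
    """Holds the video outside the prompt; callers only see what they request."""
    captions: List[str]  # assumption: one placeholder description per frame

    def __len__(self) -> int:
        return len(self.captions)

    def segment(self, start: int, end: int) -> List[str]:
        """Return frame descriptions for the half-open range [start, end)."""
        return self.captions[start:end]


def analyze(env: VideoEnv, question: str, llm: LLM,
            start: int = 0, end: Optional[int] = None,
            max_frames: int = 4) -> str:
    """Split a frame range recursively until one sub-call can handle it."""
    end = len(env) if end is None else end
    if end - start <= max_frames:
        frames = "\n".join(env.segment(start, end))
        return llm(f"Q: {question}\nFrames {start}-{end}:\n{frames}")
    mid = (start + end) // 2
    left = analyze(env, question, llm, start, mid, max_frames)
    right = analyze(env, question, llm, mid, end, max_frames)
    # The parent call sees only the two short summaries, never raw frames.
    return llm(f"Q: {question}\nCombine:\n{left}\n{right}")


# Usage with a stub LLM that records every prompt it receives.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary#{len(calls)}"


env = VideoEnv([f"frame {i}" for i in range(16)])
answer = analyze(env, "What happens?", fake_llm)
print(len(calls), answer)
```

No single prompt in this sketch ever contains more than `max_frames` frames, which is the property the "external environment" framing is after.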

SwishMoe retweeted

Patrick Collison

@patrickc

about 8 hours ago

I want some kind of LLM workflow tool.

• Ability to manage a set of input files (Markdown or similar), plus other general-purpose context.

• With real-time collaboration. (And maybe some concept of snapshots or VCS integration.)

• And the ability to create/manage inference workflows and a stored set of prompts.

• Access to general-purpose coding agents (and not just chat models).

• Some concept of compiled outputs/inference results (which ideally can be shared externally).

Many projects have this feeling: "there is all this stuff, which I want to process/compute over in this iterated way, with some build artifacts being important/worth saving." GNU Autotools x Notion or something. Is anyone building this?

230

130K

Mohammed Alshehri

@SwishMoe

about 11 hours ago

Link: https://t.co/CGYGtjsz9M

Mohammed Alshehri

@SwishMoe

about 11 hours ago

I found Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses really interesting. It’s from researchers at UIUC, UC Berkeley, and Chroma, and the core idea is simple but important: don’t force the model to remember and manage all search state inside a messy transcript. Move the bookkeeping into the harness, then let RL focus on the actual semantic decisions.

Why I recommend reading it:

• It frames harness design as part of the learning problem, not just infrastructure around the model.

• The agent keeps explicit working memory: candidate pools, curated evidence, verification records, evidence graphs, and budget-aware context.

• Harness-1, a 20B search agent, beats strong open search agents across eight retrieval benchmarks and stays competitive with much larger frontier models.

• The most interesting part is transfer: the gains are stronger on held-out benchmarks, which suggests the model is learning general search behavior, not just memorizing domains.

Main takeaway: this paper came at the right time because everyone is trying to make agents more reliable, and it shows that better RL might require better environments, not just bigger models or stronger rewards.

821
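To make "move the bookkeeping into the harness" concrete, here is a small sketch under my own assumptions. It is not the paper's implementation: `SearchHarness` and all of its fields and methods are hypothetical, and evidence graphs are left out to keep it short. The point is only that the harness owns the candidate pool, curated evidence, verification records, and budget, and hands the model a compact summary instead of the full transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchHarness:
    """Keeps search state outside the transcript (illustrative only)."""
    budget: int                                                # tool calls left
    candidates: Dict[str, str] = field(default_factory=dict)   # doc_id -> snippet
    evidence: List[str] = field(default_factory=list)          # curated doc_ids
    verified: Dict[str, bool] = field(default_factory=dict)    # doc_id -> verdict

    def _spend(self) -> None:
        """Charge one unit of budget, or fail if none is left."""
        if self.budget <= 0:
            raise RuntimeError("budget exhausted")
        self.budget -= 1

    def add_candidates(self, hits: Dict[str, str]) -> None:
        """Record search results; the dict merges duplicate doc_ids."""
        self._spend()
        self.candidates.update(hits)

    def curate(self, doc_id: str) -> None:
        """Promote a known candidate to evidence (the model's decision)."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        if doc_id not in self.evidence:
            self.evidence.append(doc_id)

    def verify(self, doc_id: str, supports: bool) -> None:
        """Store the outcome of a verification step for one document."""
        self._spend()
        self.verified[doc_id] = supports

    def context(self, max_items: int = 3) -> str:
        """Render a compact state summary to show the model."""
        labels = {True: "supported", False: "refuted"}
        lines = [f"budget_left={self.budget}",
                 f"candidates={len(self.candidates)}"]
        for doc_id in self.evidence[:max_items]:
            status = labels.get(self.verified.get(doc_id), "unverified")
            lines.append(f"evidence {doc_id}: {status}")
        return "\n".join(lines)


# Usage: two searches with an overlapping hit, one curation, one verification.
h = SearchHarness(budget=3)
h.add_candidates({"d1": "snippet a", "d2": "snippet b"})
h.add_candidates({"d2": "snippet b", "d3": "snippet c"})
h.curate("d2")
h.verify("d2", supports=True)
print(h.context())
```

In this sketch the model only chooses what to search, curate, and verify. Deduplication, budget accounting, and the rendered context are handled by the harness.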

Mohammed Alshehri

@SwishMoe

about 12 hours ago

@SavinovNikolay Congrats Nikolay!!!!!

Mohammed Alshehri

@SwishMoe

about 12 hours ago

@_sholtodouglas Looks like the next frontier model is trained on reps and suffering.

121

Mohammed Alshehri

@SwishMoe

about 17 hours ago

@realchillben @wulfie_bain_ @OpenAI Dude is huge

Mohammed Alshehri

@SwishMoe

1 day ago

Soooo grateful for platforms like @PrimeIntellect and @tinkerapi. They’ve genuinely changed the way I view LLMs and technology.

SwishMoe retweeted

Paul Graham

@paulg

1 day ago

Sam Altman deserves credit for YC's turn toward hard tech. When he became CEO in 2014 he went out and recruited companies doing stuff like airliners and fusion, and hard tech startups have been some of the best in every batch since.

103

100

323

344K

Mohammed Alshehri

@SwishMoe

2 days ago

@karpathy and @DarioAmodei are pointing at one of the most important research loops in AI: systems that improve the process of building better systems.

Anthropic

@AnthropicAI

2 days ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx

28K

15K

18M

SwishMoe retweeted

Bernt Bornich

@BerntBornich

2 days ago

We’re going all in on World Models.

Today we’re launching the 1X World Model Lab.

The bet is simple:

You can’t fine-tune your way to AGI.

And you definitely can’t fine-tune your way to robots that can operate in the physical world.

General-purpose humanoids need models that understand space, motion, objects, causality, affordances, physics, and action before they ever see a specific task.

The frontier is not better VLA wrappers.

The frontier is embodied world models.

The 1X World Model Lab will focus on large-scale embodied world model pretraining: building the most generalizable foundation model for humanoid robots from the ground up.

The next frontier in AI requires scaling:

web-scale media + egocentric human videos + sim + dexterous remote operated robot data + on-policy NEO data → real-world deployment for robot data collection and RL → abundance of data → physical AI

The robot collects data.
The model gets better.
The robot gets better.
Repeat.

To lead this, we brought in one of the best for the mission: @_sam_sinha_, as Head of World Models.

Sam was a founding research scientist at Luma AI and has been at the frontier of scaling multimodal generative video models his whole career.

If you’re the best in the world at large-scale pretraining, video models, robotics, RL, infra, or data, and you want your models to move atoms, not just pixels, join us.

Send background + evidence of exceptional ability to:

wmlab@1x.tech

We’re building the model that makes autonomous labor real.

125

211

327K

Mohammed Alshehri

@SwishMoe

3 days ago

@combuting @STVcapital Congrats

307

Mohammed Alshehri

@SwishMoe

3 days ago

Paper: https://t.co/DFsKvr5MkR Credits: @YIFENGLIU_AI @yifan_zhang_

Mohammed Alshehri

@SwishMoe

3 days ago

1/10

Really interesting paper: Self-Distilled Policy Gradient (SDPG).

Core idea: RLVR gives strong outcome rewards, but they are sparse.

Self-distillation gives dense token-level signals, but can collapse.

SDPG tries to get the best of both. https://t.co/9t4AybeEmN

144

Mohammed Alshehri

@SwishMoe

3 days ago

10/10 The ablations are the real lesson. Removing OPD loses early accuracy gains. Removing KL hurts reasoning structure. On Qwen3-1.7B, SDPG still wins, while pure self-distillation OPCD collapses after ~250 steps. Takeaway: dense self-distillation works best when grounded by verifier rewards and anchored by KL.
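A toy illustration of how the three ingredients in that takeaway could combine for one sampled response. This is an assumption for intuition only: the thread does not give the paper's actual objective, teacher construction, or weights, so `combined_loss`, `alpha`, `beta`, and `baseline` are hypothetical. It computes scalar values and no gradients.

```python
from typing import Sequence


def combined_loss(logp_policy: Sequence[float],
                  logp_teacher: Sequence[float],
                  logp_ref: Sequence[float],
                  reward: float,
                  baseline: float = 0.0,
                  alpha: float = 0.1,
                  beta: float = 0.01) -> float:
    """Scalar loss for one response from per-token log-probabilities."""
    if not (len(logp_policy) == len(logp_teacher) == len(logp_ref)):
        raise ValueError("per-token sequences must have equal length")
    advantage = reward - baseline
    # Sparse term: one verifier reward scales the whole sequence log-prob.
    pg = -advantage * sum(logp_policy)
    # Dense term: per-token log-prob gap to the teacher on sampled tokens.
    distill = sum(p - t for p, t in zip(logp_policy, logp_teacher))
    # Anchor term: per-token log-prob gap to a frozen reference policy.
    kl = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return pg + alpha * distill + beta * kl


# Usage: a two-token response that the verifier marked correct.
loss = combined_loss(
    logp_policy=[-1.0, -2.0],
    logp_teacher=[-0.5, -1.5],
    logp_ref=[-1.0, -2.0],
    reward=1.0,
    baseline=0.5,
)
print(round(loss, 6))
```

Setting `alpha=0` or `beta=0` in this sketch mimics the two ablations described in the tweet (dropping the dense distillation term, or dropping the KL anchor).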
