Manish Pandey 🧬

@Manish_GenAI

@ICLR_conf 2026, ICML 2026, KDD 2026, ECCV 2026, Co-Founder @RoentGenHealth 🧬 Building a collaborative Platform for Patients and Doctors. 🌐🩻 #GraphML, #RL,

Joined August 2021

7.5K Following

374 Followers

1.3K Posts

Pinned Tweet

Manish Pandey 🧬 @Manish_GenAI

about 2 months ago

Excited to share our new paper at @iclr_conf on Adversarial Robustness and AI Safety 🚀
My awesome collaborators will be presenting it at the main conference this week in Rio de Janeiro, Brazil 🇧🇷 (Sat, Apr 25, 2026), Pavilion 3, P3-#812; check it out! Feel free to reach out for any discussions.

"Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity"
Paper Link: https://t.co/u8JaVJ3mxN
#AdversarialRobustness #AIsafety #AI #ICLR2026 #Research #MachineLearning #ComputerVision

145

Manish_GenAI retweeted

Sergey Levine

@svlevine

3 days ago

A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion latent steps for it would be with our *current* policy (not the one that collected it), so this requires reversing the diffusion process on off-policy data.

276

246

38K

Manish_GenAI retweeted

Josh Engels @JoshAEngels

7 days ago

New GDM interp research: SFT is a big deal for safety relevant behaviors.

We recently investigated root causes for some of Gemini’s behaviors. We were surprised to find that many behaviors actually came from the initial supervised finetuning stage, not later stages like RL!

🧵 https://t.co/mLg87XuXK5

264

193

95K

Manish_GenAI retweeted

How To Prompt

@HowToPrompt__

12 days ago

Stanford + Meta just dropped the paper that flips everything about AI agents.

It's called "Code as Agent Harness."

Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought."

But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking.

So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness."

They stopped asking the AI to reason in words, and forced it to reason in code.

Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary.

Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output.

Tests become its senses. Execution logs become its memory. Sandboxes become its physics.

If an agent makes a mistake in English, it apologizes and hallucinates again.

If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it.

This is where prompt engineering dies, and systems engineering takes over.

The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it:

- The model proposes.
- The harness executes.
- The environment returns feedback.
- The verifier checks.
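
The propose / execute / feedback / verify loop listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `Feedback`, `execute`, `run_harness`, and `toy_propose` are hypothetical names, the "model" is a hard-coded stand-in rather than an LLM call, and a plain subprocess stands in for a real sandbox.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """What the environment returns after one execution (hypothetical type)."""
    ok: bool
    stdout: str
    stderr: str


def execute(script: str, timeout: float = 10.0) -> Feedback:
    """Harness step: run the proposed script in a separate Python process.

    A subprocess is NOT a real sandbox; it only stands in for one here.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return Feedback(proc.returncode == 0, proc.stdout, proc.stderr)
    except subprocess.TimeoutExpired:
        return Feedback(False, "", "timeout")
    finally:
        os.unlink(path)


def run_harness(
    propose: Callable[[List[Feedback]], str],
    verify: Callable[[str], bool],
    max_steps: int = 3,
) -> Tuple[Optional[str], List[Feedback]]:
    """Loop: model proposes, harness executes, feedback is logged, verifier checks."""
    log: List[Feedback] = []
    for _ in range(max_steps):
        script = propose(log)        # the model proposes (sees the execution log)
        feedback = execute(script)   # the harness executes
        log.append(feedback)         # the environment's feedback becomes memory
        if feedback.ok and verify(feedback.stdout):  # the verifier checks
            return script, log
    return None, log


def toy_propose(log: List[Feedback]) -> str:
    """Stand-in for a model: buggy first attempt, fixed once an error is logged."""
    if not log:
        return "print(1 / 0)"
    return "print(6 * 7)"


if __name__ == "__main__":
    script, log = run_harness(toy_propose, lambda out: out.strip() == "42")
    print("accepted script:", script)
    print("attempts:", len(log))
```

In this toy run the first attempt fails with a traceback, which is stored in the log and handed back to the proposer on the next step; a real system would pass that trace to the model as context instead of branching on whether the log is empty.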

193

76K

Manish_GenAI retweeted

Sergey Levine

@svlevine

11 days ago

Diffusion (or flow) makes for excellent policies, but training them with RL is notoriously hard: BPTT is unstable, RL over diffusion blows up the horizon. In our new paper, we show how we can optimize flow matching actors by using "one weird trick" -- "approximate" the Jacobian of the flow denoising process with the identity matrix. 👇

122

961

84K

Manish_GenAI retweeted

Rishabh Kabra @RishabhKabra

14 days ago

If you used pretrained vision encoders like DINO, this is for you––we found a simple post-training recipe to improve DINO features! CVPR Highlight Paper: https://t.co/FMYmWQHTz7 Code: https://t.co/MS7v8vUNA7 Poster #63 on Sunday, June 7 at 3-5:30pm. Details in thread.

128

Manish_GenAI retweeted

Jim Fan

@DrJimFan

15 days ago

NitroGen just won CVPR Best Paper Honorable Mention!! We are making strides towards general-purpose embodied agents that master not only the real world physics, but also all possible physics across a multiverse of simulations.

It’s been 4 years since MineDojo, our first embodied agent in Minecraft, won NeurIPS Best Paper. Congrats to everyone on the team!!

395

41K

Manish_GenAI retweeted

Haitham Bou Ammar

@hbouammar

15 days ago

I have so much fun writing this position with some of the most amaaazing people in robotics!

Have a look at it here: https://t.co/zM3NBtobkx
#AI #MachineLearning #Robotics https://t.co/GrRJZ89pwg

718

109

837

122K

Manish_GenAI retweeted

Cameron R. Wolfe, Ph.D.

@cwolferesearch

18 days ago

Interested in learning how to run RL at scale? Here are the best resources to read…

Research on Scaling RL
1. The Art of Scaling RL compute for LLMs: https://t.co/PGjI6Gwgv0
2. Scaling Behaviors of LLM RL Post-Training: https://t.co/2u2saB3C0h
3. Optimally Scaling Sampling Compute for LLM RL: https://t.co/rUSdUvJyNH
4. Scaling up RL: https://t.co/O8vV6z8ymx
5. ProRL V2 - Prolonged Training Validates RL Scaling Laws: https://t.co/vu72juvRW4
6. Polaris - A Recipe for Scaling RL with Reasoning Models: https://t.co/rMibSAeJbg

RL Frameworks
1. Hybrid Flow (early outline of the verl framework): https://t.co/GnWXx131uD
   a. More up-to-date info can be found here: https://t.co/j801HcJmPP
2. AReal - Large-Scale Async RL: https://t.co/qhOvsQK09N
3. PipelineRL - Fast On-Policy RL: https://t.co/iRM7KzySXe
4. AsyncFlow - Async Streaming RL: https://t.co/YwmzFtiU2q

RL for Agents
1. DeepSWE - Open Coding Agent Trained w/ RL: https://t.co/GHQHcmtE6F
2. AutoForge - Environment Synthesis for Agentic RL: https://t.co/mr3WDIL5vq
3. Agent-R1 - Training Agents w/ End-to-End RL: https://t.co/xpfQJGgzEv
4. AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: https://t.co/7fbVl0RWXG
5. The Landscape of Agentic RL: https://t.co/OMnSV4rgdW
6. Training SWE Agents with RL: https://t.co/YqMqySbyXS

Case Studies & Tech Reports
1. Kimi tech reports:
   a. Kimi K2 - Open Agentic Intelligence: https://t.co/aAw17SXrIw
   b. Kimi End-to-end Agentic RL: https://t.co/ProBpOPIiI
   c. Kimi K1.5 - Scaling RL for LLMs: https://t.co/kRGOxY9Jvp
2. Composer series from Cursor:
   a. Composer 2: https://t.co/K0v8rNCE6Z
   b. Composer 2.5: https://t.co/D9PYimfOMU
3. Olmo 3 (also has open code / data): https://t.co/khetJFvp6N
4. MiniMax tech reports:
   a. MiniMax-M2: https://t.co/HApb0OB80S
   b. MiniMax-M1: https://t.co/mZj9UQsrnC
5. Nemotron 3 (NVIDIA): https://t.co/lCpE1GzxSi

803

135

35K

Manish_GenAI retweeted

Jianlan Luo

@jianlanluo

20 days ago

Excited to release τ0-WM: an open-source unified video-action world model for robotic manipulation. It's a 5B-parameter robotic foundation model trained on 27.3K hours of real-robot teleoperation, UMI-style demonstrations, and egocentric interaction videos.

658

466

54K

Manish_GenAI retweeted

Tom Dörr

@tom_doerr

24 days ago

Generates depth maps from single or multi-view images https://t.co/OV5sciDp8n

Manish_GenAI retweeted

Nicholas Tomlin @NickATomlin

24 days ago

New paper! LLM memory keeps improving, but this makes them *worse* as user sims. If we want to build models that can, e.g., simulate realistic students to train chatbots to be better teachers, then these models need to be able to forget like humans do

📄: https://t.co/1GpOfwcsat https://t.co/IDePa4f6gw

460

320

46K

Manish_GenAI retweeted

Yuyin Zhou

@yuyinzhou_cs

25 days ago

#Claude is great — but building clinical-grade AI requires more active evidence retrieval.

Introducing #ClinSeekAgent — a complete stack for building advanced medical agents through active multimodal evidence retrieval, powered by a comprehensive agent toolbox for dynamic clinical reasoning.

🧰 **Agent Toolbox**: 20 MCP tools for active evidence seeking (11 EHR · 3 web retrieval · 6 medical imaging tools)
🔧 **Framework**: ClinSeekAgent for orchestrating
📊 **Benchmark**: ClinSeek-Bench — paired Curated vs Agentic evaluation (text-only EHR + multimodal)
🧠 **Data**: high-quality Claude Opus 4.6 evidence-seeking trajectories (for SFT)
🤖 **Model**: ClinSeek-35B-A3B — open-source SOTA clinical agent

📄 https://t.co/T1Qn64LkUv
💻 https://t.co/Jai6zHfhgH
🤗 https://t.co/RJaES2hgON

117

123

13K

Manish_GenAI retweeted

Siqiao Huang

@KnightNemo_

25 days ago

In the last couple of months, we have witnessed significant advances in Industry-scale World Models. Yet, for the broader community, the gap between reading about these models and deploying them remains disappointingly wide. Today we're releasing Nano World Models: a minimalist, batteries-included repo for advancing world model science. 🧵 (1/9)

352

246

48K

Manish_GenAI retweeted

Pushmeet Kohli

@pushmeet

24 days ago

When I was asked by the American Academy of Arts and Sciences to write an essay on my thoughts on how AI will accelerate Science, I felt honored but also felt that it would require a lot of thoughtfulness and diligence to distill my thoughts on paper. The essay has now been published and I cannot be more thankful to the @americanacad and @GoogleDeepMind teams for their feedback and encouragement during the process.

Key reflections from my essay:

🔭 AI is our newest revolutionary lens: Just as the telescope and microscope expanded our physical perception, AI is extending our cognitive reach, allowing us to decipher the immense complexity of the data-universe.

🧬 The rise of "machine intuition": AI is not just a computational engine. By detecting hidden structures across disciplines—from protein folding to extremal combinatorics—it acts as an ultimate bridge, accelerating the interdisciplinary breakthroughs that modern science depends on.

🏗️ From puzzle-solvers to architects of questions: As we transition toward open-ended, agentic AI systems that actively generate novel hypotheses, the burden of reasoning is shifting. We are evolving from being the solvers of intricate puzzles into the architects of profound scientific questions.

✨ Expanding human potential: AI won't replace scientists; it expands what we can imagine and achieve. Just as the telescope didn't make astronomers obsolete, AI is giving us the stars.

Read the full essay here: https://t.co/LCoF7ds7WZ

693

122

553

179K

Manish_GenAI retweeted

Ryohei Sasaki@engineer

@rsasaki0109

26 days ago

[ICML'26] From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
https://t.co/Q1uOL0R0B6 https://t.co/p6SVGgqlWR

102

Manish_GenAI retweeted

Binfeng Xu

@billxbf

25 days ago

Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change.

Find a problem, design the harness, and train your own agents! 🧵

905

144

949

132K

Manish_GenAI retweeted

Turing Post

@TheTuringPost

27 days ago

12 AI Co-Scientists of 2026

Open-source:

▪️ ERA - builds scientific simulations and software for biology, forecasting, and more
▪️ DISCO - designs proteins and enzymes from scratch
▪️ kUPS - fast molecular simulation engine
▪️ Axplorer by @axiommathai - solved trillion-scale math searches 100× more efficiently
▪️ AI CFD Scientist - physics-aware fluid simulation research
▪️ The AI Scientist (Sakana AI) - automates full research pipeline
▪️ AutoResearchClaw - self-improving multi-agent research system

Other important breakthroughs:

▪️ Google DeepMind's AI Co-Scientist – discovered a fibrosis drug candidate
▪️ OpenAI reasoning model – solved an 80-year-old geometry problem
▪️ Robin – identified a blindness treatment candidate
▪️ AxiomProver – solved the entire Putnam exam
▪️ AI Co-Mathematician – hits math benchmarks

Full breakdown with papers, GitHub repos, and technical details ↓
https://t.co/nn8eOEVP06

172

150

21K

Manish_GenAI retweeted

Sungjin Ahn

@SungjinAhn_

about 1 month ago

🧠 We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ

w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)

209

183K

Manish_GenAI retweeted

Yining Hong

@yining_hong

about 1 month ago

Excited to share ESI-BENCH, a benchmark for Embodied Spatial Intelligence!

Most spatial reasoning benchmarks assume an oracle observer: the agent is given the right image, view, or 3D scene. But in the real world, the observer is also an actor. To understand space, agents must decide where to look, how to move, and when to interact, to reveal what is hidden: occlusions, containment, contact, dynamics, and functionality. In many cases, the hard part is not perception itself, but choosing the right action to make informative perception possible.

ESI-BENCH tests this perception-action loop. Agents receive an egocentric observation and a spatial question, then must actively gather evidence through perception, locomotion, and manipulation before answering. The benchmark spans 10 task categories, 29 subcategories, and 3,081 instances, built in BEHAVIOR-1K across realistic interactive scenes.

🌍 Webpage: https://t.co/Ou3zJ48eFx
💻 Code & data: https://t.co/Mw0kU5hoyA

Thanks to my collaborators: Jiageng, Han, @ManlingLi_, Leonidas Guibas, @drfeifei, @jiajunwu_cs, @YejinChoinka

222

48K

Manish_GenAI retweeted

James Zou @james_y_zou

about 1 month ago

The best gym for data science💪: #DSGym provides a grounded and realistic environment to train and test data science agents. Accepted to #ICML2026! Great work by @FanNie1208 @JunlinWang3 @_harperhua @federicobianchy @ykwon_0407 @ZhentingQi @oq_35 @ShangZhu18 @togethercompute
