Jyo Pari @Jyo_Pari - Twitter Profile

Pinned Tweet

about 1 year ago

What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.

jyo_pari's tweet photo: https://t.co/LMWVJYZaXA

132

3K

506

3K

665K

Jyo Pari

@jyo_pari

10 days ago

Nice way to handle realistic data!

Adam Wei

@adamwei_

10 days ago

🤖 We introduce Ambient Diffusion Policy, a simple and principled method for training policies with suboptimal data in robotics. Suboptimal data is everywhere in robotics… ❌ Data filtering is wasteful ❌ Co-training learns both good and bad features ✅ Ambient Diffusion Policy selectively learns useful features via noise-dependent data usage 👇🧵(1/5)

15

313

54

241

88K

0

13

0

4

2K

Jyo Pari

@jyo_pari

13 days ago

This is a great blog by Moritz to see the state of robot foundation models!

Moritz Reuss @moritz_reuss

13 days ago

World-Action Models (WAMs) have become the second dominant recipe for robot foundation models, next to classical VLAs. So where do they come from, and how do they compare vs VLAs? I wrote a small overview of the WAM landscape, with some personal takes: https://t.co/6S4gH9tWTt

9

646

98

723

118K

0

7

0

3

3K

jyo_pari retweeted

Ryan Bahlous-Boldi

@RyanBoldi

about 1 month ago

Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.

35

876

124

806

222K

about 1 month ago

The computational abstractions humans developed are great for building architectures, however they’re not necessarily the right abstractions for kernels. Han shows why 🔥

Han Guo

@HanGuo97

about 1 month ago

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).

HanGuo97's tweet photo: https://t.co/cOTeMUr4py

16

687

103

534

200K

0

21

3

9

3K

Jyo Pari

@jyo_pari

about 2 months ago

Pulkit and I had many chats since the start of my PhD of how models should intimately understand force and physics. I always thought this is the right technical bet to make and now Pulkit + Team is paving the path to bring this to life!

Pulkit Agrawal

@pulkitology

about 2 months ago

Eka means unity -- “one,” in Sanskrit and “first” in Finnish. We’re building intelligence for the physical world in its native language: forces. Until now, robotics faced a tradeoff — generality or speed. The real world requires both. Robotics also faced a data problem. Our Vision–Force–Action (VFA) model — the first of its kind — breaks the generality-speed tradeoff and the data barrier. It's a new foundation uniting performance, generality, and safety for putting capable robots in everyone's hands. Today, I am excited to share our journey of pushing robots beyond human limits. Today, dexterity becomes scalable. Today, I welcome you to the Era of Eka. Co-founded with @haarnoja, and so thrilled and grateful to be working with a dream team at @EkaRobotics. Learn more: https://t.co/QYQ6x2Etyi

65

2K

219

596

329K

1

34

3

6

3K

jyo_pari retweeted

Lewis Bollard

@Lewis_Bollard

2 months ago

Big vote for animals this week. On Thursday, the House will likely vote on whether to strike the Save Our Bacon Act from the farm bill. Rep. Luna (R-FL) is leading a bipartisan amendment to strike the Act, which would wipe out state laws banning the sale of pork from crated pigs. Pork industry lobbyists are already hard at work against her. They're counting on no one speaking up for the pigs. Prove them wrong. Call the House at 202-224-3121. Give them your zip code and they'll connect you to your member's office. Ask them to vote YES on the Luna Amendment to remove the Save Our Bacon Act from the farm bill.

50

2K

603

114

89K

Jyo Pari

@jyo_pari

3 months ago

Shared representations and computations for generation and tokenization !

Shivam Duggal @ShivamDuggal4

3 months ago

Tokenization & Generation power Large Models. But are they really separate? Tokenization=Generation under strong observability. UNITE: An end-to-end training framework where one shared Generative Encoder (GE) performs both tokenization & latent denoising. Paper: https://t.co/8idMdy123h

ShivamDuggal4's tweet photo: https://t.co/Yjf6cFnMaP

4

413

79

295

66K

0

20

0

3

2K

Jyo Pari

@jyo_pari

3 months ago

Very cool! Adding more non-linearity to the state update is needed ➰

Mayank Mishra

@MayankMish98

3 months ago

Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling We bring back non-linear recurrence to language modeling and show it's been held back by small state sizes, not by non-linearity itself. 📄 Paper: https://t.co/AS8e2tNrRa 💻 Code: https://t.co/LMvBcI22Du 🤗 Models: https://t.co/NCmjrpNriq

10

516

109

329

148K

0

17

2

3

2K

jyo_pari retweeted

Shivam Duggal @ShivamDuggal4

3 months ago

Similar thought. Next-token prediction feels statistical: perplexity / shannon-entropy minimization. But creativity / science may require: finding compact generative structures, then exploring in that space. Closer to algorithmic complexity? More Kolmogorov than Shannon.

3

70

7

44

9K

Jyo Pari

@jyo_pari

3 months ago

Hard problems require more than bigger models, they require effective exploration at test time. 💡 @aviral_kumar2 will present new approaches for training LMs to scale test-time exploration, including solving IMO-level math problems. 🏅 🗓️ March 19, 4pm ET @scaleml

jyo_pari's tweet photo: https://t.co/EmI7Xwu5DN

2

93

5

68

8K

Jyo Pari

@jyo_pari

4 months ago

Very cool work by Seungwook! Would be interesting to see if the neural cellular automata pre training results in additional capabilities that natural language training alone can’t produce.

Seungwook Han

@seungwookh

4 months ago

Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: https://t.co/Pni0RsIcxL (1/n)

47

2K

259

1K

256K

0

12

0

1

1K

Jyo Pari

@jyo_pari

4 months ago

As context windows grow 📈, continual learning matters more! @tianyuanzhang99 will present how to scale test-time training for effectively infinite context ♾ 🗓️ Feb 19, 3pm ET @scaleml

jyo_pari's tweet photo: https://t.co/r2RiQwu5VB

8

178

15

106

26K

Jyo Pari

@jyo_pari

5 months ago

The benefits of on-policy learning with the speed of SFT !

idan shenfeld

@IdanShenfeld

5 months ago

People keep saying 2026 will be the year of continual learning. But there are still major technical challenges to making it a reality. Today we take the next step towards that goal — a new on-policy learning algorithm, suitable for continual learning! (1/n)

IdanShenfeld's tweet photo: https://t.co/tuDTBATlTQ

50

2K

224

1K

240K

0

26

1

8

2K

jyo_pari retweeted

Locke Cai

@couplefire12

7 months ago

RL for reasoning often relies on verifiers: great for math, but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇

23

603

78

667

179K

Jyo Pari

@jyo_pari

7 months ago

Next Tuesday, @shannonzshen will present hybrid chain-of-thought, a method that mixes latent and discrete tokens during decoding 🔥 🗓️ Nov 25, 3pm ET @scaleml

jyo_pari's tweet photo: https://t.co/47dAiQswTX

1

51

7

20

6K

Jyo Pari

@jyo_pari

8 months ago

Why do deep learning optimizers make progress even in the edge-of-stability regime? 🤔 @alex_damian_ will present theory that can describe the dynamics of optimization in this regime! 🗓️ Nov 17, 3pm ET @scaleml

jyo_pari's tweet photo: https://t.co/Jx8Uw1Uum4

0

72

10

33

8K

jyo_pari retweeted

idan shenfeld

@IdanShenfeld

8 months ago

Everyone’s talking about Kimi K2 Thinking and its impressive performance. No full report yet, but judging from Kimi K2\1.5 reports, it likely uses Policy Mirror Descent - an RL trick that’s quietly becoming standard in frontier labs. Let’s break down what it is:

IdanShenfeld's tweet photo: https://t.co/OEHAlhQWHt

12

468

47

480

59K

jyo_pari retweeted

Kevin Lu

@_kevinlu

8 months ago

in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! https://t.co/7nkW8YgT7K i'm especially excited by the use of on-policy distillation to enable new "test-time training" personalization methods, allowing the model to learn new domain knowledge without regressing on post-training capabilities

14

367

29

165

96K

Jyo Pari

@jyo_pari

9 months ago

Very interesting! We could use RLMs for complex reasoning problems where models are solving sub-problems in parallel, unlocking a new dimension of scaling!

alex zhang

@a1zhang

9 months ago

What if scaling the context windows of frontier LLMs is much easier than it sounds? We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length, as a REPL environment. On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double!) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their “prompt” and answer highly compositional queries without degradation, performing even better than explicit indexing/retrieval. We link our blogpost, (still very early!) experiments, and discussion below.

135

3K

377

3K

950K

1

29

4

16

6K

jyo_pari retweeted

Moritz Reuss @moritz_reuss

9 months ago

VLAs have become the fastest-growing subfield in robot learning. So where are we now? After reviewing ICLR 2026 submissions and conversations at CoRL, I wrote an overview of the current state of VLA research with some personal takes: https://t.co/OMMdB1MHtS

11

532

103

446

53K
