qqzzqqqqqq @qqzzq2 - Twitter Profile

qqzzq2 retweeted

3 days ago

Vision-language AI models have a gaze. And you can steer it! 👀 Redirect just 9% of a model’s attention heads to any region in an image, and the VLM will start describing that region mid-generation. We call them Gaze Heads! Try the demo: https://t.co/y5jlb0iBI8 🧵👇
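One way to picture "redirecting" a subset of heads is to bias their attention logits so that all of their attention mass lands on the image tokens inside a chosen region. The sketch below is a hypothetical masking scheme, not the Gaze Heads authors' method. `steer_attention`, the head indices, the region mask and the toy tensor shapes are all illustrative assumptions. Here 3 of 32 heads (roughly 9%) are steered.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def steer_attention(logits, steered_heads, region_mask, strength=1e9):
    """Hypothetical steering: confine selected heads to a region of image tokens.

    logits:        [heads, queries, keys] pre-softmax attention scores
    steered_heads: indices of the heads to redirect
    region_mask:   boolean [keys], True for keys inside the target region
    """
    biased = logits.copy()
    # Large negative bias on every key outside the region, for steered heads only.
    penalty = np.where(region_mask, 0.0, -strength)
    biased[steered_heads] += penalty  # broadcasts over queries and keys
    return softmax(biased, axis=-1)

rng = np.random.default_rng(0)
logits = rng.standard_normal((32, 5, 16))  # 32 heads, 5 queries, 16 image tokens
region = np.zeros(16, dtype=bool)
region[4:8] = True                         # hypothetical target region
heads = np.array([0, 11, 23])              # 3 of 32 heads, roughly 9%
weights = steer_attention(logits, heads, region)
print(weights[heads][..., region].sum(axis=-1))  # each entry is 1.0 up to rounding
```

Heads outside the steered set keep their original softmax weights, so only that small subset changes where the model "looks".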

qqzzq2 retweeted

Sergey Levine

@svlevine

1 day ago

A new way to do off-policy RL with diffusion: if we have off-policy data, we need to figure out what the diffusion latent steps for it would be with our *current* policy (not the one that collected it), so this requires reversing the diffusion process on off-policy data.

qqzzq2 retweeted

Siddharth Ancha

@siddancha

7 days ago

Cool work on refining coarse VLM actions using a flow matching policy π(a₀ | o) → a₁ where a₀ ∼ N(0, 1) by first reversing (inverting) the given coarse action a₁ via â₀ = π⁻¹(a₁ | o) and then reconstructing it in the forward direction i.e. â₁ = π(â₀ | o). The interesting bit is that this method crucially **relies** on Euler integration error! If you use a vanishingly small integration step size (which is usually supposed to be a good, albeit expensive, thing), there is no refinement i.e. â₁ = a₁!
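The dependence on Euler error can be seen in a one-dimensional toy. The sketch below assumes a hand-written velocity field (`velocity`, which pulls samples toward attracting modes near ±1) in place of a learned, observation-conditioned policy π(· | o); the function names and the coarse action value are hypothetical. It inverts a coarse action a₁ by integrating the flow backward from t = 1 to t = 0 with explicit Euler steps, then integrates forward again with the same number of steps.

```python
import math

def velocity(x: float, t: float) -> float:
    # Hypothetical stand-in for a learned flow policy pi(. | o): pulls samples toward
    # attracting modes near +1 and -1 (solutions of x = tanh(4x)). t is unused here.
    return math.tanh(4.0 * x) - x

def invert(a1: float, steps: int) -> float:
    # Euler-integrate backward from t = 1 to t = 0: a_hat_0 = pi^{-1}(a1 | o).
    h = 1.0 / steps
    x = a1
    for k in range(steps, 0, -1):
        x -= h * velocity(x, k * h)
    return x

def generate(a0: float, steps: int) -> float:
    # Euler-integrate forward from t = 0 to t = 1: a_hat_1 = pi(a0 | o).
    h = 1.0 / steps
    x = a0
    for k in range(steps):
        x += h * velocity(x, k * h)
    return x

def refine(a1: float, steps: int) -> float:
    # Flow-reversal refinement: invert the coarse action, then regenerate it.
    return generate(invert(a1, steps), steps)

coarse = 0.9                       # hypothetical coarse action, e.g. from a VLM
print(refine(coarse, steps=4))     # about 0.92: nudged toward the mode near 1.0
print(refine(coarse, steps=1000))  # about 0.9: backward and forward passes nearly cancel
```

With 4 coarse steps the backward and forward Euler passes are not exact inverses, so â₁ moves away from a₁ toward the nearby mode. With 1000 steps the round trip is close to the identity and â₁ ≈ a₁, which is the "no refinement" limit described above.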

qqzzq2 retweeted

Will Chen @verityw_

7 days ago

Generalist robot policies learn many useful skills. How can we elicit relevant behaviors when faced with new tasks? We introduce Flow Reversal Steering (FRS): a way to refine coarse actions produced by semantic reasoning into similar precise ones! https://t.co/BRdvq0OVg0 1/N

qqzzq2 retweeted

Chelsea Finn

@chelseabfinn

7 days ago

How does test-time scaling impact robots? We find that larger models, more thinking, and more context help significantly for some prompts but not others. As with LLMs, we can also train a router for a better performance/latency tradeoff! Paper: https://t.co/HEjjCkrsen

qqzzq2 retweeted

Kosta Derpanis (sabbatical in Zurich)

@CSProfKGD

9 days ago · Zurich

The videos from the “Frontiers of Embodied AI” meetup at ETHZ from a few weeks back are now available.

Speakers: Jitendra Malik, Vladlen Koltun, Yann LeCun, and Shuran Song

Hosted by Marc Pollefeys

YouTube playlist: https://t.co/IfU9owsa1o

qqzzq2 retweeted

Seika Karamatsu

@SeikaKaramatsu

9 days ago

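It looks like some of the materials from ETH Zurich's Spring 2026 robot learning course have been made public! https://t.co/RlpSUOBTHf There are 12 weeks of lectures in total, covering the fundamentals of imitation learning and RL, VLAs, robotics foundation models, and more. It's going viral as being completely free, but when I actually checked, the slides are password-protected and can't be opened from outside. What outsiders can access is the YouTube recordings of the guest lectures (free, no registration required) and the coding assignments on GitHub (PyTorch, imitation learning, RL). The guest lecturers look like people at the forefront of the field: Cheng Chi (author of Diffusion Policy and UMI, co-founder of Sunday Robotics), Quan Vuong (co-founder of Physical Intelligence; the π0.6 session), Scott Reed (NVIDIA GEAR Lab), Dieter Fox (director at AI2), and others. Twelve weeks is a lot, so I think it's best to pick and choose: Quan Vuong's lecture if you're interested in VLAs, Cheng Chi's if you're into imitation learning, and so on. I'll be watching the ones that look interesting too!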

qqzzq2 retweeted

Chen Tang @ChenTangMark

9 days ago

RTC is a key ingredient for deploying high-latency VLA policies in real time. We show that discrete diffusion is a more natural fit for asynchronous execution: with no extra implementation or specialized fine-tuning, it achieves strong performance on dynamic manipulation tasks!

qqzzq2 retweeted

Sergey Levine

@svlevine

10 days ago

Diffusion (or flow) makes for excellent policies, but training them with RL is notoriously hard: BPTT is unstable, RL over diffusion blows up the horizon. In our new paper, we show how we can optimize flow matching actors by using "one weird trick" -- "approximate" the Jacobian of the flow denoising process with the identity matrix. 👇
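One plausible reading of the identity-Jacobian trick can be sketched under loud assumptions: a linear velocity field v(x) = Wx + b stands in for the flow actor, a quadratic toy critic Q(a) = −‖a − target‖² stands in for a learned Q-function, and every name below is hypothetical. Instead of backpropagating ∂Q/∂a₁ through all Euler steps (BPTT), each step's increment h·v(x_k) is credited with the full action gradient, i.e. ∂a₁/∂x_{k+1} is replaced by the identity and the intermediate states are treated as constants. This is not necessarily how the paper applies the trick.

```python
import numpy as np

DIM, STEPS = 2, 8
H = 1.0 / STEPS                  # Euler step size
TARGET = np.array([1.0, -0.5])   # hypothetical best action under the toy critic

def q_value(a):
    # Hypothetical toy critic Q(a) = -||a - TARGET||^2, evaluated per row.
    return -np.sum((a - TARGET) ** 2, axis=-1)

def rollout(W, b, x0):
    # Euler-integrate v(x) = W x + b from noise x0 ([batch, DIM]); keep every state.
    xs = [x0]
    for _ in range(STEPS):
        xs.append(xs[-1] + H * (xs[-1] @ W.T + b))
    return xs

def identity_jacobian_grads(W, b, x0):
    # Approximate d mean(Q) / d(W, b) by assuming d a1 / d x_{k+1} = I at every step:
    # each increment H * v(x_k) receives the full action gradient g = dQ/da1, and
    # the states x_k are treated as constants (no backpropagation through time).
    xs = rollout(W, b, x0)
    g = -2.0 * (xs[-1] - TARGET)                     # dQ/da1, shape [batch, DIM]
    grad_W = sum(H * g.T @ x for x in xs[:-1]) / len(x0)
    grad_b = STEPS * H * g.mean(axis=0)              # d(H * v)/db = H * I per step
    return grad_W, grad_b

rng = np.random.default_rng(0)
W, b = np.zeros((DIM, DIM)), np.zeros(DIM)
eval_noise = rng.standard_normal((256, DIM))
q_before = q_value(rollout(W, b, eval_noise)[-1]).mean()
for _ in range(200):
    gW, gb = identity_jacobian_grads(W, b, rng.standard_normal((64, DIM)))
    W, b = W + 0.05 * gW, b + 0.05 * gb             # gradient ascent on Q
q_after = q_value(rollout(W, b, eval_noise)[-1]).mean()
print(q_before, q_after)                             # q_after should be higher
```

Because the update never multiplies together the per-step Jacobians (I + hW), it cannot blow up the way a long BPTT chain can; the price is a biased gradient whenever those Jacobians are far from the identity.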

qqzzq2 retweeted

Marcus Castro

@mac_a_castro

12 days ago

Nice read on the current state of robotics and where it is going.

qqzzq2 retweeted

Shane Gu

@shaneguML

14 days ago

Appreciate Jitendra's takes on world models/VLMs. His words below explain why, back in 2019-2021, we chose assembly instead of VLAs for simple pick-and-place. Dexterity = mutual information between your intent and the forces/torques on objects via contacts.

qqzzq2 retweeted

Chris Paxton

@chris_j_paxton

17 days ago

Dexterous world models!

qqzzq2 retweeted

Jeannette Bohg @leto__jean

18 days ago

I've always been impressed by how Diffusion policies and VLAs do contact-rich tasks without force sensing. Today I'll try to Demystify VLA Performance in Contact-Rich Tasks and How to Fix Them. 15:30 at the Act to Sense to Act Better Workshop, Hall C4, #ICRA2026

qqzzq2 retweeted

Jeannette Bohg @leto__jean

18 days ago

My favourite thing at #ICRA26? The workshops. Because you learn what everyone has been up to. In that spirit, I will talk about our new Dexterous Manipulation work at the workshop on Dexterity with Multifingered Hands: 13:55-14:20, Stolz 2. Here is a teaser (video plays at 1x)

qqzzq2 retweeted

AGIBOT Finch @agibot_research

19 days ago

Introducing τ0-WM: the largest-scale pre-trained embodied world model to date. At 5B parameters, it diverges from standard models that directly map observations to actions. Instead, τ0-WM unifies policy and world modeling into a single robotic foundation framework.

qqzzq2 retweeted

Haniel Ulises @Haniel_Ulises

21 days ago

Revisiting PILP, might be doing something soon with SOTA architectures. The probabilistic + relational structure still feels ahead of its time.

qqzzq2 retweeted

Xiaoxuan Ma @XiaoxuanMa_

21 days ago

🚀 Excited to share REST3D: REconstructing physically STable and visually consistent 3D scenes from a casual single image🤳. With REST3D, you can naturally interact with stable virtual objects through hand-based VR interactions👐. 🔗 Project page: https://t.co/1CVuGIjAVM

qqzzq2 retweeted

Lukas Ziegler

@lukas_m_ziegler

21 days ago

Six arms instead of two! 🤯 Midea Group's humanoid robot handles heavy components with its lower limbs and performs fine assembly work with its upper limbs. Full 360-degree rotation, stable vertical lifting, rapid tool-swapping. The robot handles workstation transitions that would typically require multiple human workers or separate machines, so there is no idle time between stations. Pretty cool. But I'm really curious about the cost, and how this six-armed robot compares to a regular robot arm, and then to a regular humanoid. What do you think? Overcomplicated? ♻️ Join the weekly robotics newsletter, and never miss any news → https://t.co/GoA3ZuwoPB

qqzzq2 retweeted

Alessandro Favero @alesfav

21 days ago

AI needs vastly more data than we do. One idea might close the gap: don't predict raw signals (tokens), predict your own abstract latent representation (JEPA, data2vec). With @DanKorchinski @MatthieuWyart, on a toy model, we prove how much that helps: the gap is exponential. 🧵

qqzzq2 retweeted

Aleksa Gordić (水平问题)

@gordic_aleksa

24 days ago

new in-depth blog post time: Inside the Transformer: The Life of a Token

a deep dive into a modern dense transformer, following a token as it flows through the model. i cover YaRN (why does pairwise coordinate rotation induce positional information?), hybrid attention (getting to 160k context length), soft capping, QK normalization, etc.

bonus transformer math: the FLOPs/token formula (and when the 6N formula breaks down), cluster sizing (how big a cluster you need given the model/data size and the experiment throughput you want), and more
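The FLOPs/token and cluster-sizing arithmetic can be sketched with the common approximation from Kaplan et al. (2020): a forward pass costs about 2N FLOPs per token plus an attention term 2·n_layer·n_ctx·d_attn, and the backward pass costs roughly twice the forward, which gives the familiar 6N when the attention term is negligible. This sketch assumes d_attn = d_model; the model shape, token count, accelerator peak of 1e15 FLOP/s and 40% utilization are hypothetical inputs, not figures from the post.

```python
SECONDS_PER_DAY = 86_400

def train_flops_per_token(n_params, n_layers, d_model, context_len):
    """Approximate training FLOPs per token (forward + backward).

    Forward ~ 2N for the weight matmuls plus 2 * n_layers * context_len * d_model
    for attention scores and value mixing (assumes d_attn == d_model);
    backward ~ 2x forward, so the total is ~3x forward.
    """
    forward = 2 * n_params + 2 * n_layers * context_len * d_model
    return 3 * forward

def training_days(n_params, n_tokens, n_gpus, peak_flops_per_gpu, mfu):
    """Wall-clock days to spend 6ND FLOPs at a given model FLOPs utilization (MFU)."""
    total_flops = 6 * n_params * n_tokens
    return total_flops / (n_gpus * peak_flops_per_gpu * mfu) / SECONDS_PER_DAY

n = 7_000_000_000  # hypothetical 7B dense model with 32 layers and d_model = 4096
short_ctx = train_flops_per_token(n, 32, 4096, 4_096)
long_ctx = train_flops_per_token(n, 32, 4096, 160_000)
print(short_ctx / (6 * n))  # about 1.08: 6N is a good estimate at short context
print(long_ctx / (6 * n))   # about 4.0: the attention term dominates at 160k tokens
print(training_days(n, 1_000_000_000_000, 256, 1e15, 0.4))  # about 4.75 days
```

The first ratio shows why 6N is usually quoted; the second shows one way it breaks, since the attention term grows with context length while 6N does not. Hybrid attention with local windows would shrink that term, which this sketch does not model.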
