Byeonghyun Pak @Byeonghyun_Pak - Twitter Profile

7 days ago

Introducing SERF: a spatiotemporal environment and robot feature map for long-horizon mobile manipulation. We demonstrate that conditioning a mobile manipulation policy on a SERF map enables long-horizon reasoning.

6

190

28

142

26K

Byeonghyun_Pak retweeted

Physical Intelligence

@physical_int

12 days ago

New work with @nvidia: evaluating robot policies entirely inside a world model. The policy acts, the model imagines the consequences, and the imagined evals predict real-world results. 🧵 real vs world-model rollout side by side📷

19

709

102

365

100K

Byeonghyun_Pak retweeted

Furong Huang

@furongh

16 days ago

Hot take: robots should not dream in pixels. Pixels are too low-level. Latents are too opaque. μ₀ predicts a third thing: 3D motion traces. On real robots, it beats π₀.₅ — with ~1/100 the data scale and no action labels for world-model pretraining. 🧵 https://t.co/UfmrqNlBtw (this video features voiceover narration)

29

855

113

761

110K

Byeonghyun_Pak retweeted

Loren @murloren

3 months ago

I am very happy to share the result of my internship at FAIR (Meta): V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning with @ylecun @AdrienBardes Our approach learns dense, spatially coherent features from video while preserving strong global understanding

murloren's tweet photo. I am very happy to share the result of my internship at FAIR (Meta): V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning with @ylecun @AdrienBardes

Our approach learns dense, spatially coherent features from video while preserving strong global understanding https://t.co/NVh5CZAGfZ

18

331

47

137

72K

Byeonghyun_Pak retweeted

Ayush Tewari @_atewari

5 months ago

Is pixel prediction the best way to build a world model? Check out VDAWorld, an alternative path to building interpretable, editable, and physically grounded world models. We use a VLM to build a simulation of the scene with the help of a computer vision toolbox.

7

164

15

96

9K

Byeonghyun_Pak retweeted

Sourish Jasti

@SourishJasti

5 months ago

1/ General-purpose robotics is the rare technological frontier where the US / China started at roughly the same time and there's no clear winner yet. To better understand the landscape, @zoeytang_1007, @intelchentwo, @vishnuman0 and I spent the last ~8 weeks creating a deep dive on humanoid robotics hardware and flew to China to see the supply chain firsthand. Here's everything we've created + our takeaways about the components, humanoid comparisons, supply chains, and geopolitics👇

71

2K

256

2K

833K

Byeonghyun_Pak retweeted

Nima Gard

@NimaGard

5 months ago

https://t.co/ydZAQzA9OQ

5

97

18

183

37K

Byeonghyun_Pak retweeted

AGIBOT

@AGIBOTofficial

6 months ago

🚀 AGIBOT Genie Sim 3.0: First LLM-Driven Open-Source Simulation Platform for Embodied AI @ CES 2026 Integrated with NVIDIA Isaac Sim, AGIBOT’s all-new Genie Sim 3.0 (https://t.co/LcGXCa8168) debuts at CES - delivering a unified toolchain for digital asset generation, scene generalization, data collection, and automated evaluation. Key highlights: ✅ 10,000+ hours of synthetic dataset (real-world robot operation scenarios) — the largest open-source dataset for embodied AI. ✅ High-fidelity environments via 3D reconstruction + visual generation fusion. ✅ LLM-driven tech: Generate massive simulation scenes & metrics in minutes. ✅ 200+ tasks across 100k+ scenarios for comprehensive model evaluation. Fully open-source (code, data, assets: https://t.co/CmKe4FEAUb) - accelerating model development, cutting physical hardware reliance, and fueling embodied AI innovation! #AGIBOT #GenieSim3 #AGIBOTG2 #EmbodiedAI #OpenSource #CES2026 #AGIBOTatCES2026 #NVIDIA #HumanoidRobot #Robotics #Innovation

4

249

52

80

20K

Byeonghyun_Pak retweeted

Ilir Aliu

@IlirAliu_

7 months ago

Long horizon robotics is hard. Robots drift, forget the goal, and fall apart when a task takes hundreds of steps. This small independent team hit rank 1 on Stanford’s Behavior1K benchmark by Fei Fei Li’s lab… Ahead of teams from NVIDIA, CMU, and several big labs. What they found useful: ✅ Split each task into stages so the policy always knows where it is Keeps the robot on track for long sequences ✅ Predict a chunk of actions instead of step by step Reduces jitter and creates smoother motion ✅ Add correlated noise during training Matches real robot behavior better and makes the policy more stable ✅ Use many samples per forward pass Lowers variance and helps when trajectories are long ✅ Keep high level context isolated from noisy signals Prevents the model from getting confused mid task These ideas transfer well to real robots: Stage awareness, chunked execution, realistic noise, and clean attention flow all reduce drift and make long tasks possible. All built by a small independent group: @IliaLarchenko @akashkarnatak, and Gleb Zarin. Thanks for pointing out, @abhishekX697. I have no scientific verdict on the full method, but the ideas are interesting and worth reading. —- Weekly robotics and AI insights. Subscribe free: https://t.co/dsa6wcvq6n

9

174

24

105

13K

Byeonghyun_Pak retweeted

Robots Digest 🤖

@robotsdigest

9 months ago

Latent Theory of Mind: Meet LatentToM, a powerful framework for robots to understand and predict human intentions in real-time. - - learning compressed latent representations of human mental states - seamless collaboration and adaptive behavior, - 3x more sample-efficient.

1

9

2

5

408

Byeonghyun_Pak retweeted

Chenhao Li @breadli428

9 months ago

People: robots are replacing humanity. Real: robot dance show at a leading robotic conference without moving a toe.

100

2K

83

507

1M

Byeonghyun_Pak retweeted

Rohan Paul

@rohanpaul_ai

10 months ago

Fei-Fei Li (@drfeifei) on limitations of LLMs. "There's no language out there in nature. You don't go out in nature and there's words written in the sky for you.. There is a 3D world that follows laws of physics." Language is purely generated signal.

245

4K

667

2K

2M

Byeonghyun_Pak retweeted

Jonathan Kao @JonathanCKao

10 months ago

Check out our study in @NatMachIntell, where we use AI copilots for brain-computer interfaces (BCIs): https://t.co/IWhprtemFR. We enable a paralyzed participant to neurally control a computer cursor and a robotic arm (non-invasive, no surgery). Video here: https://t.co/dZG8h6xb9C

1

51

14

11

6K

Byeonghyun_Pak retweeted

Ilir Aliu

@IlirAliu_

10 months ago

🛠️ What if a robot could invent its own tools. And teach itself how to use them? That’s exactly what VLMgineer does: a new framework that lets Vision Language Models (VLMs) design physical tools and the actions to use them, entirely on their own. No templates. No human demonstrations. Just raw, AI-driven creativity. Why it matters ✅ Co-designs tools and actions together using VLMs, ensuring tight coupling between form and function ✅ Uses VLM-guided evolution (not random search) to refine designs intelligently ✅ Outperforms human-designed tools by +64.7% in task success across 12 RoboToolBench challenges ✅ Produces better-than-everyday tools for real manipulation tasks—measured in success rate and elegance It builds on the emerging trend of large-model-guided evolutionary design (like Eureka and AlphaEvolve) and brings it into physical robotics. It opens the door to general-purpose, automated hardware design, no strong priors needed. Code & paper: https://t.co/3pXxpQR4E1

0

167

32

98

8K

Byeonghyun_Pak retweeted

Jean-Rémi King

@JeanRemiKing

10 months ago

Can AI help understand how the brain learns to see the world? Our latest study, led by @JRaugel from FAIR at @AIatMeta and @ENS_ULM, is now out! 📄 https://t.co/y2Y3GP3bI5 🧵 A thread:

30

2K

317

1K

244K

Byeonghyun_Pak retweeted

Jonathan Stephens

@jonstephens85

11 months ago

The team at @NVIDIAAIDev really nailed it with ViPE!! This SLAM system generated pose, depth, point clouds, etc, in a few seconds! Now to get the output to work with GEN3C... If you are interested in running this yourself, it took about 5 minutes to get up and running from scratch. https://t.co/BXMVnyguR8

12

473

58

288

35K

Byeonghyun_Pak retweeted

Vincent Sitzmann

@vincesitzmann

about 1 year ago

Our paper on learning controllable 3D robot models from vision is published in Nature! Huge congrats to Lester and the team, @annan__zhang, @BoyuanChen0, Hanna Matusik, Chao Liu, and Daniela Rus!! Learning joint world models for the environment & the agent is super exciting :)

3

152

20

46

20K

Byeonghyun_Pak retweeted

Ryohei Sasaki@engineer

@rsasaki0109

about 1 year ago

Test3R: Learning to Reconstruct 3D at Test Time https://t.co/JMDNghN7Ew

1

84

11

40

5K

Byeonghyun_Pak retweeted

AK

@_akhaliq

about 1 year ago

UniGeo Taming Video Diffusion for Unified Consistent Geometry Estimation

1

72

6

23

9K

Byeonghyun_Pak retweeted

Xun Huang

@xxunhuang

about 1 year ago

A video generator must satisfy 3 criteria to be a world model: 1️⃣ Causality: Past affects future, not vice versa. 2️⃣ Persistence: The world shouldn't change because you looked away. 3️⃣ Constant Speed: Simulation shouldn't slow down over time. We believe SSMs are a natural fit: inherent alignment with the causal structure of time, supporting long-context, and constant per-frame generation speed.

5

243

29

125

27K

Byeonghyun Pak

@Byeonghyun_Pak

Last Seen Users on Sotwe

Trends for you

Most Popular Users