Peihao Wang @peihao_wang - Twitter Profile

about 2 months ago

Interestingly, we revealed a duality:
🔵 Training-time alignment ≈ amortized parameter-space optimization
🔵 Test-time optimization ≈ latent space sampling
From a classical statistical inference lens, these two are tightly connected, just operating over different spaces.

Peihao Wang @peihao_wang

about 2 months ago

Latent space reasoning via looped transformers has gained attention lately. It is rooted in optimization unrolling, where each loop implicitly models a GD step on hidden states. Our ICLR paper asks: what if we explicitly run GD in latent space at test time?

Zhen Wang

@zhenwang9102

about 2 months ago

1/🧵 What if test-time reasoning wasn't discrete search, but gradient descent in latent space? Happy to share our #ICLR2026 paper ∇-Reasoner: a paradigm shift from zeroth-order search to first-order optim at test time. Led by @peihao_wang @ccccrs_0908 https://t.co/MgoSQ8lyXG

Peihao Wang @peihao_wang

about 2 months ago

We formulate decoding as an optimization problem: find responses that maximize a differentiable reward subject to being sampled from an LLM. Gradients are backpropagated into the model’s hidden states, steering inference into a form of test-time training.
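To make the idea of gradient steps on a hidden state concrete, here is a minimal toy sketch. It is not the paper's method. Everything in it is an assumption for illustration: the quadratic `reward`, the proximity penalty weighted by `lam` (a crude stand-in for "stay close to what the LLM would sample"), and the names `reward_grad` and `refine_latent`.

```python
# Toy illustration only: gradient ascent on a latent vector z.
# Objective (assumed, not from the paper):
#   J(z) = r(z) - (lam / 2) * ||z - z0||^2
# where r(z) = -0.5 * ||z - target||^2 is a made-up differentiable reward
# and z0 is the latent the model produced on its own.

def reward_grad(z, target):
    """Gradient of the toy reward r(z) = -0.5 * ||z - target||^2."""
    return [t - zi for zi, t in zip(z, target)]

def refine_latent(z0, target, lam=0.5, lr=0.1, steps=200):
    """Run plain gradient ascent on J(z), starting from the model's latent z0."""
    z = list(z0)
    for _ in range(steps):
        g_r = reward_grad(z, target)
        # Ascent step: reward gradient minus the pull back toward z0.
        z = [zi + lr * (g - lam * (zi - z0i))
             for zi, g, z0i in zip(z, g_r, z0)]
    return z

z0 = [0.0, 0.0]        # hypothetical initial hidden state
target = [1.0, 2.0]    # hypothetical reward-maximizing point
z_star = refine_latent(z0, target)
```

For this quadratic toy the maximizer has the closed form `(target + lam * z0) / (1 + lam)`, so the loop settles between the model's own latent and the reward optimum. A real system would compute the reward gradient by backpropagating through a reward model rather than using a formula.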

Peihao Wang @peihao_wang

3 months ago

@liliang_ren Congrats. Looking forward to your new chapter.

Peihao Wang @peihao_wang

3 months ago

"One static model does not fit all." Reminds me of the old parametric vs. non-parametric regression debate. Nice to see scalable generative weight-space models finally taking shape.

Tencent Hy

@TencentHunyuan

4 months ago

One static model does not fit all😭 We just dropped our latest work: Functional Neural Memory. Instead of static models, we generate custom "parameters" for every single input.
✅ Prompt your model anytime
✅ Instant personalization
✅ Better instruction following
✅ Flexible & dynamic memory (w/o memory bank✌️)
(🧵1/6)

peihao_wang retweeted

DAIR.AI

@dair_ai

5 months ago

Are multi-agent systems necessary?

Here is a great new paper addressing this.

The big assumption most AI devs make today is that more agents lead to better performance.

But here is the overlooked reality: most multi-agent systems are homogeneous.

All agents typically share the same base LLM, differing only in prompts, tools, and positions in the workflow.

This raises a compelling question of whether a single agent can simulate these workflows through multi-turn conversations.

This new research investigates this across seven benchmarks spanning coding, mathematics, QA, domain-specific reasoning, and real-world planning.

A single agent with KV cache reuse can match the performance of homogeneous multi-agent workflows while reducing inference costs.

The cost advantage comes from shared KV cache across agent interactions, avoiding redundant prefill computation.

Because homogeneous agents possess identical reasoning capabilities and differ only in specialized instructions, a single agent can role-play these agents sequentially, exploiting the workflow's task decomposition without needing separate model instances.

Building on this finding, the researchers propose OneFlow, an algorithm that automatically designs workflows optimized for single-agent execution.

OneFlow uses a dual meta-LLM architecture (Creative Designer + Critical Reviewer) with Monte Carlo Tree Search to discover streamlined workflows with comprehensive system prompts and fewer total agents.

OneFlow with single-agent execution achieves 92.1% on HumanEval, 81.4% on MBPP, 93.3% on GSM8K, matching or exceeding multi-agent baselines while significantly reducing cost.

Single-LLM methods cannot capture truly heterogeneous workflows where agents use different base models, since KV caches cannot be shared across different LLMs.

These results position single-LLM implementation as a strong baseline for MAS research. The authors suggest that the real opportunity lies in developing heterogeneous systems where model diversity benefits outweigh coordination costs.

Paper: https://t.co/Y6wCAfqrMN

Learn to build effective AI agents in our academy: https://t.co/zQXQt0PMbG
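The prefill argument in the thread can be illustrated with a back-of-the-envelope token count. This is a sketch under simplifying assumptions, not the paper's cost model: each turn adds a fixed number of new tokens, separate agents re-encode the whole history on every call, and a single agent with a shared KV cache only encodes tokens it has not seen. The function names and the example token counts are hypothetical.

```python
# Rough prefill-token accounting for a sequential workflow.
# Assumption: `turns` lists how many NEW tokens each agent turn adds.

def prefill_tokens_separate(turns):
    """Each agent call re-encodes the full history plus its new tokens."""
    total, history = 0, 0
    for new_tokens in turns:
        total += history + new_tokens
        history += new_tokens
    return total

def prefill_tokens_shared_cache(turns):
    """One agent reuses its KV cache, so only new tokens are encoded."""
    return sum(turns)

turns = [500, 200, 200, 200]  # hypothetical: task context, then three role turns
separate = prefill_tokens_separate(turns)    # 500 + 700 + 900 + 1100 = 3200
shared = prefill_tokens_shared_cache(turns)  # 500 + 200 + 200 + 200 = 1100
```

Under these assumptions the separate-agent count grows roughly quadratically with the number of turns while the shared-cache count grows linearly, which is the intuition behind the claimed savings.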

Peihao Wang @peihao_wang

8 months ago

@zhiwen_fan_ Thx Zhiwen! Glad that I finally made some progress chasing your excellence.

Peihao Wang @peihao_wang

8 months ago

@VITAGroupUT It wouldn't be possible without the team I'm working with.

Peihao Wang @peihao_wang

8 months ago

Thank you, @JeffDean. Really honored to join the 2025 class of Google PhD Fellows! Excited to carry forward the inspiration to explore the AI frontier where logic and physics meet algebra and geometry.

Jeff Dean

@JeffDean

8 months ago

Congrats to all the 255 recipients of this year's Google PhD Fellows awards, across 35 countries! 🎉

Peihao Wang @peihao_wang

12 months ago

This work is so special to me. I first touched cryo-EM as a junior and couldn’t believe a neural net could predict bio structure from extremely low-SNR, unposed images. With so much AI progress in these 5 years, scaling laws make AI-driven protein discovery feel real.

Zhiwen(Aaron) Fan

@zhiwen_fan_

12 months ago

DUSt3R-like models work for scientific imaging too! Our ICCV’25 paper “CryoFastAR” shows that a geometric foundation model can do feed-forward ab initio cryo-EM reconstruction—10× faster and state-of-the-art quality on noisy particle images! #ICCV2025 #CryoEM 📎Paper: https://t.co/jqlpBmi5G5

peihao_wang retweeted

Zhiwen(Aaron) Fan

@zhiwen_fan_

about 1 year ago

We already introduced #LightGaussian last year to accelerate the rendering speed of 3DGS. In our CVPR'25 paper, SteepGS, we go further by demystifying and improving density control during 3DGS optimization — making training more efficient and reliable. Project Page: https://t.co/kzyNTBeo2T

Peihao Wang @peihao_wang

over 1 year ago

@ccccrs_0908 Congrats 🎊 so proud of you

peihao_wang retweeted

Ruisi Cai @ccccrs_0908

over 1 year ago

Layer-wise routers are surprisingly redundant in current MoE. Check out Read-ME for the system-friendly MoE refactorization technique with system co-design!

peihao_wang retweeted

Zhiwen(Aaron) Fan

@zhiwen_fan_

over 1 year ago

🚀 Our NeurIPS '24 work, Large Spatial Model (LSM), is here! LSM performs semantic 3D reconstruction in just 0.1s, processing unposed data via feed-forward 3D reconstruction. 👉It leverages large-scale 3D datasets with minimal annotations, defining a 3D latent space. We are continuously exploring how this explicit 3D representation can further enhance reasoning and robotic learning. 🔗 Try our online Gradio demo with your own data at https://t.co/FjGsPkcJ6h #NeurIPS2024 #3DReconstruction

peihao_wang retweeted

Ruisi Cai @ccccrs_0908

almost 2 years ago

Train one - Get many🚀! Check out more details about Flextron at https://t.co/aPEgVIyfqq

peihao_wang retweeted

Mingyuan Zhou @MingyuanZhou

almost 2 years ago

Introducing Score identity Distillation with Long and Short Guidance (SiD-LSG), our data-free solution to distill Stable Diffusion models into one-step text-to-image generators, achieving a COCO2014 zero-shot FID of 8.15. Excited to share the code and checkpoints with the community! Code: https://t.co/hM2BDH2Spe Paper: https://t.co/mJp5WFqWub #Diffusion #Distillation #StableDiffusion @ZhendongWang6 @UnderGroundJeg @haihuang_ml

peihao_wang retweeted

Ruisi Cai @ccccrs_0908

about 2 years ago

Tired of training varying-size LLMs to fit various GPU memory and latency requirements? Check out Flextron! Our new ICML (Oral) paper shows how to train one model deployable across GPU series. Learn more: https://t.co/aPEgVIyfqq 🚀

peihao_wang retweeted

Ruisi Cai @ccccrs_0908

about 2 years ago

The Flextron-Llama2-7B model family demonstrates superior MMLU performance compared to both open-source models (including Pythia, OpenLLaMA-v2) and existing post-hoc compression methods (including Sheared-LLaMA, SliceGPT, LLM-Pruner, Compresso, LaCo).

peihao_wang retweeted

Ruisi Cai @ccccrs_0908

about 2 years ago

Managing long context is challenging due to quadratic attention memory usage. But what if we could compress growing context information into a fixed-size memory? 🤔 Check out our new ICML paper: "LoCoCo: Dropping In Convolutions for Long Context Compression"! 1/3

Peihao Wang @peihao_wang

about 2 years ago

Training 3D foundation models? In our CVPR2024 work, we propose a new concept that directly enhances the view consistency of 2D predictions via image-based rendering. It generalizes zero-shot to many 2D foundation models and transfers their success to 3D at little training cost.

Mukund @sneezygiraffe

about 2 years ago

Progress in 2D vision models has been exciting, e.g. SAM, DINO, etc. But how do we apply them on a 3D scene? We propose Lift3D, a plug ‘n play framework that converts any arbitrary 2D vision model to be 3D consistent w/o any extra optimization. https://t.co/lLOFR0Pa0w
