Sai Bi @Sai__Bi - Twitter Profile

Sai__Bi retweeted

4 days ago

Research ideas you can't be outscaled on. Many important problems in ML now demand compute no university lab can match, and junior students feel the constant pressure of getting scooped. I wanted to share a different bet: research directions where insight may still matter more than scale. These are deliberately unconventional — grounded in mechanisms rarely seen in the usual scaling playbook. Because they sit in the long tail, they're unlikely to show up in AI-generated research idea lists. No guaranteed success, but real open questions with enough material to get started. If you find these useful, PRs to add more are welcome. I'll keep updating too. https://t.co/gkjcwiu111

9

450

40

493

44K

Sai__Bi retweeted

Hansheng Chen @HanshengCh

about 1 month ago

New paper: AsymFlow🔥 JiT x0-prediction is not enough for pixel generation. Better keep velocity in a low-rank subspace: - 1.57 FID on ImageNet (best pixel flow model) - Finetunes FLUX.2 klein into pixel space, beats the original on HPSv3/DPG/GenEval (#1 overall on HPSv3) 1/7

HanshengCh's tweet photo. New paper: AsymFlow🔥

JiT x0-prediction is not enough for pixel generation. Better keep velocity in a low-rank subspace:

- 1.57 FID on ImageNet (best pixel flow model)
- Finetunes FLUX.2 klein into pixel space, beats the original on HPSv3/DPG/GenEval (#1 overall on HPSv3)

1/7 https://t.co/FSz46hrJHj

20

281

55

196

54K

Sai__Bi retweeted

kat kampf

@kat_kampf

about 2 months ago

Two new updates to the agent in AI Studio Build: 🔎 Web search to ground agent responses in the latest api docs 💬 Multi chat to spin up a new conversation for every idea and switch between sessions

5

35

6

3

2K

Sai__Bi retweeted

Tri Dao

@tri_dao

6 months ago

This is what we've been coking for the last 9 months: make MoEs training goes ~2x faster and ~2x less memory! Highlights: - MoE typically takes the most time and memory in modern models. Turns out one can mathematically rewrite the MoE backward pass to reduce the activation mem you need to store in the fwd by ~2x, resulting in the same gradients with no extra matmul recomputation. I really like this result, as it combines both algorithmic and systems insights. - Analyzing bottlenecks in MoE layer leads to a natural optimization stragegy: reduce mem reads/writes as much as possible! Gathering the input for fwd and output grad for bwd can sometimes take as much time as the grouped GEMMs. We fuse gather with grouped GEMM + overlap mem access and compute to make the whole layer goes ~2x faster. - Computing top-k for expert routing can take surprisingly long, ~15-20% of the whole MoE layer! Standard top-k impl uses radix top-k algo, great for large k but suboptimal for small k. We rewrote top-k using bitonic top-k algo, and it's sometimes 20-30x faster than pytorch's top-k! All the main kernels are written in Cute-DSL so they should be easy to extend (and install :D). Hopper kernels are out, Blackwell kernels are just about ready. MoE models used to be 2x less hardware-efficient to train, hopefully Sonic-MOE will change that.

30

1K

167

700

159K

Who to follow

Yi Zhou

@Papagina_Yi

@Roblox Senior AI Scientist @Adobe Previous Research Scientist @USC Ph.D. Alumni @sjtu1896 Alumni 3D Representations, 4D Humans, Avatar, Digital Human

Qi Sun

@qisun0

Associate Professor at New York University directing https://t.co/Tm0Ri67xhk

Xiuming Zhang

@xiuming_zhang

@NVIDIAAI Spatial Intelligence Lab, ex-@Tesla_AI, ex-@Adobe, ex-@GoogleAI, Ph.D. from @MIT_CSAIL. Views obviously all mine.

Sai__Bi retweeted

Hanwen Jiang

@hanwenjiang1

6 months ago

(1/N) Will this be the BERT/GPT moment for 3D vision？ Finally, unsupervised pre-training for 3D works. Led by @qitao_zhao , we present E-RayZer — a fully self-supervised 3D reconstruction model that: 🔥Matches or surpasses supervised methods like VGGT 👀Learns transferable 3D representations, outperforming CroCo, VideoMAE, and DINO 📈Scales with more unlabeled data A new recipe for scalable 3D foundation models.

4

393

86

262

58K

Sai__Bi retweeted

Ying Sheng

@ying11231

6 months ago

We've been running @radixark for a few months, started by many core developers in SGLang @lmsysorg and its extended ecosystem (slime @slime_framework , AReaL @jxwuyi). I left @xai in August — a place where I built deep emotions and countless beautiful memories. It was the best place I’ve ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang open source and the mission of creating an ambitious future, within an open spirit that I learnt from my first job at @databricks after my PhD. We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have made great efforts to get to where they are today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024, I spent with @lm_zheng , @lsyincs , and @zhyncs42 debugging, while @ispobaoke single-handedly took on DeepSeek inference optimizations, seeing @GenAI_is_real and the community strike team tag-teaming on-call shifts non-stop. There are so many more who have joined that I'm out of space to call out, but they're recorded on the GitHub contributor list forever. The demands grow exponentially, and we have been pushed to make it a dedicated effort supported by RadixArk. It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing. Over the past year, we’ve seen something very clear: The world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again — often under enormous pressure, with lots of duplicated effort and wasted insight. RadixArk was born to change that. Today, we’re building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone." If the two values below resonate with you, come talk to us: (1) Engineering as an art. Infrastructure is a first-class citizen in RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it. (2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take. A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics. Thanks to all the miles (the name of our first released RL framework; see below). https://t.co/2vio4Eiiac

116

1K

130

359

549K

Sai Bi @Sai__Bi

6 months ago

Check out the amazing work by our intern @AntheaYLi on applying Evolution Strategies to improve reasoning efficiently with LoRA and quantization. Yichen is at NeurIPS, and please reach out to her and chat if you are interested.

Anthea Li

@AntheaYLi

6 months ago

We look at how Evolution Strategies can be effective to improve reasoning under small population size and low-rank perturbations: • How population size, noise scale, step size, and LoRA rank interact • A trust-region + spectral norm lens on the stability of rank • Forward-only allows for smooth quantization: alleviates training-inference mismatch in RL Blog(WIP): https://t.co/x4AmdWPE2o

6

198

33

144

45K

0

6

0

1K

Sai__Bi retweeted

Anthea Li

@AntheaYLi

6 months ago

We look at how Evolution Strategies can be effective to improve reasoning under small population size and low-rank perturbations: • How population size, noise scale, step size, and LoRA rank interact • A trust-region + spectral norm lens on the stability of rank • Forward-only allows for smooth quantization: alleviates training-inference mismatch in RL Blog(WIP): https://t.co/x4AmdWPE2o

6

198

33

144

45K

Sai Bi @Sai__Bi

8 months ago

Check out the cool work by our intern @HanshengCh on policy-based distillation for few-step generation.

Hansheng Chen @HanshengCh

8 months ago

Excited to announce a new track of accelerating Generative AI: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation https://t.co/6ro55E1XGP Distill 20B flow models now using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality.

HanshengCh's tweet photo. Excited to announce a new track of accelerating Generative AI:

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
https://t.co/6ro55E1XGP

Distill 20B flow models now using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality. https://t.co/kxIZs7j3vC

2

155

27

86

36K

0

7

0

1

1K

Sai__Bi retweeted

Bowei Chen @bowei_chen_19

9 months ago

We found that visual foundation encoder can be aligned to serve as tokenizers for latent diffusion models in image generation! Our new paper introduces a new tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀.

bowei_chen_19's tweet photo. We found that visual foundation encoder can be aligned to serve as tokenizers for latent diffusion models in image generation!

Our new paper introduces a new tokenizer training paradigm that produces a semantically rich latent space, improving diffusion model performance🚀🚀. https://t.co/quD5hHaWYf

7

521

70

328

81K

Sai__Bi retweeted

Percy Liang

@percyliang

12 months ago

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

46

5K

570

7K

679K

Sai Bi @Sai__Bi

about 1 year ago

I am going to give a talk on scalable 3D reconstructions today at the 3D-LLM/VLA workshop at CVPR at 10:55am today at Room 106A. Welcome to attend! https://t.co/ILPYYG1UrP

1

27

0

1

1K

Sai__Bi retweeted

Zhao Dong

@flycooler_zd

about 1 year ago

🚀 Excited to announce our CVPR 2025 Workshop: 3D Digital Twin: Progress, Challenges, and Future Directions 🗓 June 12, 2025 · 9:00 AM–5:00 PM 📢 Incredible lineup: @rapideRobot, Andrea Vedaldi @Oxford_VGG,@richardzhangsfu,@QianqianWang5,Dr. Xiaoshuai Zhang @Hillbot_AI, @xiaolonw, @shuangz, @MilosHasan, James Fort @meta_aria. 🔗Details👉https://t.co/E1f9YwYtWg Join us to explore photorealistic, functional & physically-accurate #3DdigitalTwins for Spatial & Contextual AI, Robotics, AR/VR, Digital Content Creation! #CVPR2025 #3DdigitalTwins #3DVision #ContextualAI #SpatialAI #Robotics #Humanoidrobots #Metaverse #Gen3d 🖼👇

flycooler_zd's tweet photo. 🚀 Excited to announce our CVPR 2025 Workshop:
3D Digital Twin: Progress, Challenges, and Future Directions
🗓 June 12, 2025 · 9:00 AM–5:00 PM
📢 Incredible lineup: @rapideRobot, Andrea Vedaldi
@Oxford_VGG,@richardzhangsfu,@QianqianWang5,Dr. Xiaoshuai Zhang @Hillbot_AI, @xiaolonw, @shuangz, @MilosHasan, James Fort @meta_aria.
🔗Details👉https://t.co/E1f9YwYtWg
Join us to explore photorealistic, functional & physically-accurate #3DdigitalTwins for Spatial & Contextual AI, Robotics, AR/VR, Digital Content Creation!
#CVPR2025 #3DdigitalTwins #3DVision #ContextualAI #SpatialAI #Robotics #Humanoidrobots #Metaverse #Gen3d
🖼👇

2

58

22

14

14K

Sai__Bi retweeted

Tianyuan Zhang

@tianyuanzhang99

about 1 year ago

Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper “Test-Time Training Done Right” propose LaCT (Large Chunk Test-Time Training) — a highly efficient, massively scalable nonlinear memory with: 💡 Pure PyTorch (no custom kernels) 🚀 10× GPU FLOPs utilization compared to previous nonlinear test-time training(ttt) methods. 🧠 Huge memory size (up to 40% of model params) Project page with code: https://t.co/oRy52Z8ZS1 (videos generated with our AR video diffusion) 1/9

7

426

80

281

102K

Sai__Bi retweeted

Haian Jin

@Haian_Jin

about 1 year ago

Excited to attend #ICLR2025 in person this year! I’ll be presenting two papers: 1. LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias 🔹 Oral Presentation: Session 3C (Garnet 216-218) — Apr 25 (Fri), 11:06–11:18 a.m. 🔹 Poster: Hall 3 + Hall 2B, Poster #593 — Apr 25 (Fri), 3:00–5:30 p.m. 🔹 Website: https://t.co/Alqo3s1o91 2. RelitLRM: Generative Relightable Radiance for Large Reconstruction Models (led by @tianyuanzhang99) 🔹 Poster: Hall 3 + Hall 2B, Poster #531 — Apr 26 (Sat), 3:00–5:30 p.m. 🔹 Website: https://t.co/Be9PzDfbEl Feel free to drop by—looking forward to chatting with you!

1

27

3

2

3K

Sai Bi @Sai__Bi

about 1 year ago

I will be attending ICLR in Singapore this week. Feel free to reach out and chat!

0

23

0

3K

Sai Bi @Sai__Bi

about 1 year ago

Check out the fantastic work by our intern @HanshengCh at Adobe Research. The code and model are publicly available!

Hansheng Chen @HanshengCh

about 1 year ago

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) https://t.co/XWAy2VCJlg GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

HanshengCh's tweet photo. Excited to share our work:
Gaussian Mixture Flow Matching Models (GMFlow)
https://t.co/XWAy2VCJlg
GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation. https://t.co/Pv039hsvI7

1

128

31

58

13K

1

18

0

787

Sai__Bi retweeted

Haian Jin

@Haian_Jin

about 1 year ago

Our paper LVSM has been accepted as an oral presentation at #ICLR2025! See you in Singapore! We’ve just released the code and checkpoints—check it out here: https://t.co/BNxC35NaB3.🚀

2

130

20

35

11K

Sai Bi @Sai__Bi

over 1 year ago

The speaker was fully aware of the implications of her words and the damage they would cause. Yet, instead of preventing harm, she chose to inflict it first and then attempt to repair it with some 'nice' words. That’s not acceptable!

Jiao Sun

@sunjiao123sun_

over 1 year ago

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡

sunjiao123sun_'s tweet photo. Mitigating racial bias from LLMs is a lot easier than removing it from humans!

Can’t believe this happened at the best AI conference @NeurIPSConf

We have ethical reviews for authors, but missed it for invited speakers? 😡 https://t.co/BjClBR9Kyl

175

4K

772

516

2M

0

20

0

2K

Sai Bi

@Sai__Bi

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users