Lei Hsiung

@twweeb

Ph.D. student @DartmouthCS 🌲

Hanover, NH

Joined February 2011

287 Following

73 Followers

43 Posts

twweeb retweeted

Tianyu Pang

@TianyuPang327

19 days ago

(1/6) Happy to share our ICML 2026 paper: Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks Paper: https://t.co/fTtmyrnqmu How should learning rates across layers evolve during training? Our answer: training can undergo a transition from asymmetry to balance, as cross-layer feature learning makes balanced learning increasingly important over time.

Lei Hsiung @twweeb

about 1 month ago

Introducing #HTMuon, a simple spectral correction to Muon inspired by Heavy-Tailed Self-Regularization, improving performance across LLM pretraining and image classification while retaining strong theoretical guarantees. #Muon #Optimization

Tianyu Pang

@TianyuPang327

about 1 month ago

🎉 Excited to share our recent Findings of ACL 2026 paper, HTMuon! Muon has recently shown promising results in LLM training. But can we further improve its update rule? In our new work, we study Muon from the perspective of Heavy-Tailed Self-Regularization (HT-SR) theory and introduce HTMuon, a simple yet effective spectral correction for Muon. Our key contributions are:
1. Understanding a limitation of Muon. Muon’s orthogonalized update rule can over-emphasize noise-dominated directions and suppress the emergence of heavy-tailed eigenspectral distributions in the model’s weight matrices, potentially limiting performance under HT-SR theory.
2. Introducing HTMuon. While Muon uses the orthogonalized update UV^T, HTMuon considers the more general form U\Sigma^pV^T, introducing a spectral correction. This enables HTMuon to produce heavier-tailed updates while preserving Muon’s strength in capturing parameter interdependencies. Across LLM pretraining and image classification, HTMuon consistently improves over Muon and other strong optimizers. It can also be used as a plug-in correction for existing Muon variants. For example, HTMuon reduces perplexity by up to 0.98 over Muon in LLaMA pretraining on C4. We further develop accelerated implementations and demonstrate improvements over Muon on LLaMA-1B.
3. Providing a theoretical characterization. We show that HTMuon is equivalent to steepest descent under a Schatten-q norm constraint and provide a convergence analysis in smooth non-convex settings. The results show that HTMuon retains competitive convergence guarantees while improving practical training performance.
📄 Paper: https://t.co/7yqov5p3jP
💻 Code: https://t.co/iWVtOBspcS
Many thanks to my collaborators Yujie Fang, @HenryLiu0820, @DengShenyang24, @twweeb, Shuhua Yu and @nsfzyzz!
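
The contrast drawn in the tweet above, between the orthogonalized update UV^T and the more general form U\Sigma^pV^T, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the SVD factors of a 2x2 gradient are already known, and the helper names (`rotation`, `matmul`, `transpose`, `spectral_update`, `frobenius_sq`) and the exponent value 0.5 are hypothetical choices made only for illustration.

```python
import math


def rotation(theta):
    """Return a 2x2 rotation matrix (an orthogonal matrix) as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as nested lists."""
    return [list(col) for col in zip(*a)]


def spectral_update(u, sigma, v, p):
    """Return U diag(sigma**p) V^T for singular values `sigma` and exponent `p`."""
    n = len(sigma)
    sigma_p = [[sigma[i] ** p if i == j else 0.0 for j in range(n)]
               for i in range(n)]
    return matmul(matmul(u, sigma_p), transpose(v))


def frobenius_sq(m):
    """Squared Frobenius norm, which equals the sum of squared singular values."""
    return sum(x * x for row in m for x in row)


# Hypothetical SVD factors of a 2x2 gradient G = U diag(SIGMA) V^T.
U = rotation(0.3)
V = rotation(-1.1)
SIGMA = [4.0, 0.25]

gradient = spectral_update(U, SIGMA, V, 1.0)   # p = 1: the gradient itself
muon_like = spectral_update(U, SIGMA, V, 0.0)  # p = 0: U V^T, singular values all 1
corrected = spectral_update(U, SIGMA, V, 0.5)  # p = 0.5 (arbitrary): singular values 2.0 and 0.5

print(frobenius_sq(gradient), frobenius_sq(muon_like), frobenius_sq(corrected))
```

With p = 0 every singular value collapses to 1, which is the orthogonalized UV^T form; with p = 1 the gradient is returned unchanged. Intermediate exponents keep part of the spread between large and small singular values. The exponent the paper actually uses is not stated in the tweet, so 0.5 here is only a placeholder.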

14K

182

twweeb retweeted

Shenyang Deng ✈️ ICML2026

@DengShenyang24

about 2 months ago

1/n Please stop by👋. This is not just another ICML 2026 optimizer paper. We have rich intuition to share on why simple preconditioners like orthogonalization and row-normalization specifically benefit NN optimization. Quick overview below 🧵

118

113

20K

twweeb retweeted

Shenyang Deng ✈️ ICML2026

@DengShenyang24

4 months ago

It's an honor to receive the Best Student Paper Award at #ALT2026 (37th Algorithmic Learning Theory)! 🏆 Huge thanks to my amazing collaborators Boyao, @Collapsar0000, @Tianyu0628, @MinhakSong, @nsfzyzz! Had a great time at the Fields Institute in Toronto. 🇨🇦 Looking forward to attending ALT again next time! ✨

twweeb retweeted

Chieh-Hsin (Jesse) Lai ✈️ ICML

@JCJesseLai

8 months ago

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

493

859K

twweeb retweeted

Shafiq Joty

@JotyShafiq

9 months ago

Proud to share our new work on LLM Verification — the first systematic study of verification asymmetry and its implications for test-time scaling.

627

twweeb retweeted

Anthropic

@AnthropicAI

11 months ago

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

220

Lei Hsiung @twweeb

about 1 year ago

Great paper! Safety and trustworthiness are always worth exploring when building reliable AI, and they deserve more attention from industry.

Xiangyu Qi

@xiangyuqi_pton

about 1 year ago

Thrilled to know that our paper, `Safety Alignment Should be Made More Than Just a Few Tokens Deep`, received the ICLR 2025 Outstanding Paper Award. We sincerely thank the ICLR committee for awarding one of this year's Outstanding Paper Awards to AI Safety / Adversarial ML. Special thanks go to the reviewers and area chairs for their strong support and recommendations. Throughout the rebuttal period, the reviewers remained deeply engaged, raising thoughtful questions that helped enhance the rigor of our experiments and manuscript. I am also profoundly grateful to my collaborators (@PandaAshwinee @vfleaking @infoxiao @sroy_subhrajit @abeirami) for their joint efforts and my advisors (@prateekmittal_ @PeterHndrsn) for their invaluable guidance and support.
(On a personal note, I also defended my PhD at Princeton in February and joined OpenAI last month, where I will continue working on AI safety and adversarial robustness. I'm looking forward to catching up with old friends and meeting new friends around the Bay!)
------
Below are some of my reflections and thoughts on our awarded paper:
Adversarial robustness has been an ongoing topic since the early rise of deep learning in 2013 (https://t.co/Squ5sX8GCz). Over the years, we've observed the community swing from pessimism, epitomized by Nicholas Carlini's adaptive attacks (https://t.co/EE9O5aLcRd) systematically dismantling various defenses and fostering the sentiment "adversarial examples are hard," to skepticism, as adversarial examples appeared to have limited impact on practical AI applications for a while, prompting the notion "adversarial examples are not even important." With the emergence of ChatGPT at the end of 2022, deep learning entered a new era towards AGI, shifting AI safety from theoretical speculation to mainstream practical concern. This is also when adversarial robustness again gets more attention.
For example, following our 2023 demonstrations that adversarial examples pose fundamental threats to AI safety alignment (https://t.co/WZM7wfYqa3, https://t.co/g7IQA8D1nY, https://t.co/e5jOcIvFzU), adversarial examples reemerged as the "Sword of Damocles" hanging over AI safety (memorably illustrated by Zico Kolter at ICML 2023 in Hawaii, who humorously preempted his talk on the GCG attack with a Terminator slide captioned, "adversarial examples are back"). More concerningly, in the context of AI safety, disrupting safety alignment through fine-tuning is even simpler and harder to mitigate than adversarial examples (https://t.co/P9o5KGH9mM, https://t.co/2D0HYXcnCb, https://t.co/gH8wW6Nx9V, https://t.co/PK1nlICXzo, https://t.co/jrqD8XQfxB). In 2023, conducting attack research was enjoyable—simply formulating and demonstrating the existence of vulnerabilities sufficed, as the effectiveness of an attack is inherently compelling. However, in 2024, my advisors started to heavily push me toward working on robustness defense, asserting that identifying problems without striving for solutions is not ambitious enough. While I wholeheartedly agreed, I was acutely aware of the profound challenge in achieving genuine robustness. After a lot of struggle, we eventually still developed this paper. Initially, our exploration focused on constrained supervised fine-tuning (SFT) against fine-tuning attacks. During this process, we discovered a critical bias—models exhibit substantial "first-few-tokens bias" concerning safety (here we acknowledge similar findings by https://t.co/irDKwTmsNb and https://t.co/ZceJPx37yg, despite differences in our ultimate directions). Using this bias as a technical trick, we impose strong constraints on the losses of only the initial tokens, relaxing constraints for later tokens. This achieved robustness with significantly lower utility regression. 
Nevertheless, we soon recognized that this bias is not merely a technical trick but represents a fundamental issue. Consequently, we shifted our focus to exploring the broader implications of this phenomenon itself, ultimately shaping the current paper. In writing this paper, I intentionally echoed the style of two seminal works: "Adversarial Examples Are Not Bugs, They Are Features" (https://t.co/lOmPqbo6B3) and "Shortcut Learning in Deep Neural Networks" (https://t.co/IdeWjpNygQ). The two papers deeply influenced my research style, and receiving the Outstanding Paper award at the culmination of my PhD journey, using a similar writing style, feels both fulfilling and like a tribute to these classics. Frankly, our work still stands far from fully resolving adversarial robustness. In fact, during writing, we deliberately reduced/avoided using the term "defense," resulting in some critique that our paper reads more like a position paper. Rather, our contribution primarily provides just a simple yet concrete explanation (shallow alignment) for a broadly exploited class of vulnerabilities, enabling causal interventions on models to explore the counterfactual of shallow alignment—deep alignment—and demonstrating that such interventions genuinely improve robustness. Fundamentally, our intervention underscores that model alignment must span the entire generation process rather than being confined to the first few token distributions—a principle articulated explicitly in our paper's title. This concept resonates with several other studies, such as Andy Zou et al.’s Circuit Breakers (https://t.co/ul9h7tWVXC), Youliang Yuan et al.’s refusal at every position (https://t.co/DEOBQpOF3i), and Yiming Zhang et al.’s backtracking (fri). To some extent, improved robustness in reasoning models’ safety alignment (https://t.co/F4fNc7BLOu) might also be related to this principle, as large-scale reinforcement learning for reasoning spontaneously enhances self-correction and recovery. 
Yet, adversarial robustness remains unresolved. Adaptive attacks will continuously emerge, potentially perpetuating many cycles of a cat-and-mouse game again. Furthermore, our challenges extend beyond AI safety and jailbreak issues. As frontier models rapidly advance in agentic capabilities, we eagerly anticipate their large-scale deployment to automate numerous tasks. However, currently, robustness and prompt injection significantly hinder this vision. As AI increasingly manages critical workloads and computational systems, robustness failures could pose severe systemic security risks. Finally, we again extend our sincere appreciation to all friends in the AI safety and AdvML research communities for their ongoing support and encouragement. Let’s continue working together to advance the research on AI safety and adversarial machine learning.

351

111

46K

Lei Hsiung @twweeb

over 1 year ago

@YuYang_i @OpenAI Congrats! 🎉

141

twweeb retweeted

Ruibo Liu

@RuiboLiu

over 1 year ago

my mum tried DeepSeek and told me she loved it because it could generate beautiful literature-level Chinese. she has zero idea about which model is leading those AGI/HLE benchmarks, but she knows what model can best serve her needs in an easily accessible and affordable way. maybe we should rethink what a good AI product should be like. I bet after several years, all the models will converge on those benchmarks. at the end of the day, every developer needs to answer "what special value can your model bring to users?" the majority of your users are people like my mum—they might choose gemini-2.0-thinking not because of its benchmark scores but because its explicit thoughts make her feel "wow, I'm talking to a human-like agent rather than a cold-blooded machine." AI probably has no moat, but personality has.

147

17K

twweeb retweeted

Pin-Yu Chen @pinyuchenTW

over 1 year ago

(7/n) Another novel use of model reprogramming is to design configurable input transformation modules (NeuralFuse) to improve the accuracy of neural networks deployed in low-voltage hardware. Meet @twweeb & Nandhini Chandramoorthy there! Project page: https://t.co/RpKmbc3C0H

563

twweeb retweeted

mansin

@Mankaran32

about 2 years ago

Back then when they were homies

108

Lei Hsiung @twweeb

about 2 years ago

@windx0303 @ISTtapia @PSUCrowdAILab @ISTatPENNSTATE Congratulations!!

twweeb retweeted

Pin-Yu Chen @pinyuchenTW

about 2 years ago

(4/n) AutoVP: #AutoML meets visual prompting: an end-to-end optimization framework to fully unleash the power of visual prompting for vision models and VLMs like CLIP with Hsi-Ai Tsao @twweeb @sijialiu17 Tsung-Yi Ho Paper: https://t.co/S9PZFPZr0R

188

Lei Hsiung @twweeb

over 2 years ago

Can't wait!

Dartmouth 🌲

@dartmouth

over 2 years ago

Taking center court as the #Dartmouth24s Commencement speaker 🎾… https://t.co/3cGyr5UoM2

714

843K

twweeb retweeted

Pin-Yu Chen @pinyuchenTW

over 2 years ago

Happy to share the release of the book "Federated Learning: Theory and Practice" that I co-edited with @LamMNguyen3 @nghiaht87, covering fundamentals, emerging topics, and applications. Kudos to the amazing contributors to make this book happen! @ElsevierNews @sciencedirect

17K

twweeb retweeted

Ge Yang

@EpisodeYang

over 2 years ago · Cambridge

Sora from @OpenAI is super impressive, but how consistent are the geometries? We ran this through our fast 3DGS pipeline, and here are some of the early results. This is a reconstruction 👉 1/n

147

522

560K

twweeb retweeted

Delip Rao e/σ

@deliprao

over 2 years ago

Crazy AF. Paper studies @_akhaliq and @arankomatsuzaki paper tweets and finds those papers get 2-3x higher citation counts than control. They are now influencers 😄 Whether you like it or not, the TikTokification of academia is here! https://t.co/gXCvUBrMMY

264

587

417K

Lei Hsiung @twweeb

over 2 years ago

DM me if you would like to have a chat/coffee, or see magic 🎩. See you there! 🚀 #NeurIPS2023 #MachineLearningMagic

Lei Hsiung @twweeb

over 2 years ago

[1/2] 🚀 Pumped to dive into the NeurIPS whirlwind! 🤝 Can't wait to reconnect with pals and make some new ones at @NeurIPSConf. I will present AutoVP at two workshops (DistShift and R0-FoMo) on Friday. Feel free to stop by and learn more about what AutoVP can do. #NeurIPS2023

Lei Hsiung @twweeb

over 2 years ago

Excited to share AutoVP! AutoVP streamlines hyperparameter tuning of visual prompts (VP) and offers a valuable benchmark for accelerating VP development. 🚀 📚check out -> https://t.co/9Dq5XMh54i

Lei Hsiung @twweeb

over 2 years ago

[2/2] 🔍 Currently geeking out on the mystery between machine learning and data attributes. Our group has an exciting finding that utilizes a model diagnostic tool to improve model learning (Spotlight paper). @LiamZhou98 Paper: https://t.co/sV9qxQeLjT
