Hoyoon Byun @hoyunb - Twitter Profile

hoyunb retweeted

26 days ago

[1/6] 🚨 New paper!
Why do dictionary-based explanations fail under distribution shift?
We identify a geometric cause and propose Geometry-Adaptive Explainer (GAE), a training-free method that restores explanation faithfulness.
🌐 https://t.co/lyHAbz3KpX
Details below 🧵

5

4

2

0

125

Hoyoon Byun

@hoyunb

26 days ago

🎉 Update: BHyT has been accepted to ICML 2026
Happy to share that our previously archived paper has been accepted to ICML 2026. I’m also grateful to have been selected as a Gold Reviewer for ICML 2026.
Sincere thanks again to my co-authors @choiyj9803 and @Gold_Milkyway, especially Dr. Sungrae Park from @upstageai, and to my advisor, Prof. @KyungwooSong at Yonsei University, for their support and guidance.
#icml2026 #upstageai #yonsei

Hoyoon Byun

@hoyunb

5 months ago

🚀 Excited to share our new work: BHyT - a stable & efficient alternative to Pre-LayerNorm for LLMs
📜 https://t.co/m8l1JWcYjf
Pre-LN (e.g., RMSNorm) is stable, but less efficient and suffers from the curse of depth.
Normalization-free methods (e.g., DyT) aim to remove normalization overhead, but do not directly control depth-wise variance growth.
BHyT is a drop-in replacement that keeps activations bounded (non-saturating) + reduces norm overhead.
BHyT vs. RMSNorm
✅ 15.8% faster training
✅ 4.2% higher generation throughput
✅ Matches or improves downstream performance & robustness
🧵Details below:
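
For readers unfamiliar with the two baselines named above, here is a minimal sketch of the standard RMSNorm and Dynamic Tanh (DyT) formulas in plain Python. It does not implement BHyT, whose formulation is not given in this thread. The function names, the scalar `gain`/`alpha`/`gamma`/`beta` parameters, and the sample values are illustrative assumptions (in practice these parameters are learned, and the gain and shift are typically per-channel).

```python
import math

def rms_norm(x, gain=1.0, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x (needs a reduction over x)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gain * v / rms for v in x]

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """DyT: element-wise gamma * tanh(alpha * v) + beta, with no reduction over x."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]

# Hypothetical activation vector with one large outlier.
x = [0.5, -2.0, 8.0, 40.0]
normed = rms_norm(x)    # rescaled so the mean square is about 1
squashed = dyt(x)       # each value squeezed into [-1, 1]; large inputs saturate
```

RMSNorm needs a statistic computed across the whole vector, which is the per-token overhead the tweet refers to. DyT avoids that reduction, but a plain tanh pushes large activations toward its saturated range, which is the regime the tweet says BHyT is designed to stay out of.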

1

5

2

1

589

1

7

2

0

260

Hoyoon Byun

@hoyunb

5 months ago

6/ Takeaway
✅ BHyT is stability-aware bounding + efficiency via variance approximation. A practical path to train deeper LLMs with less normalization overhead.
🙏 Big thanks to co-authors @choiyj9803, @Gold_Milkyway, Sungrae Park, and my advisor @KyungwooSong

0

60

Hoyoon Byun

@hoyunb

5 months ago

5/ Evaluation results BHyT achieves lower pretrain loss, higher downstream average accuracy, and much shorter wall time than Peri-LN:

1

0

44

hoyunb retweeted

Kyungwoo.Song

@KyungwooSong

5 months ago

[4 × ICLR2026] Four papers have been accepted to #ICLR2026.
I’m pleased to share that four papers I contributed to were accepted. These works are all first-authored by graduate students in our lab!
Across these four papers, we develop methods that make ML and LLMs more reliable under real-world uncertainty, distribution shift, spurious correlations, and limited supervision. Each project pairs practical algorithms with principled theory to improve robustness, calibration, and safety.
1. Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
2. Uncertainty-driven Embedding Convolution
3. Semi-Supervised Preference Optimization with Limited Feedback
4. Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness

4

52

6

11

5K

hoyunb retweeted

Taero Kim @Gold_Milkyway

6 months ago

📢 New paper alert! [🧵1/7]
MIDUS: Memory-Infused Depth Up-Scaling [https://t.co/9acesKzLkj]
💡Up-scaling LLM depth without relying on heavy FFNs!
😋 Key idea: swap the FFN modules for our new sparse memory layer, HML.
- Head-wise Memory Layer (HML) does attention head-wise top-k retrieval and writes useful information back into the hidden states.
- In depth up-scaling / CPT evaluations, we see lower perplexity + higher downstream accuracy on 1B/8B, while staying lightweight and high-throughput.

1

6

3

0

216

Hoyoon Byun

@hoyunb

7 months ago

I'll be at @NeurIPSConf to present our paper: CCL: Causal-aware In-context Learning for Out-of-Distribution Generalization (https://t.co/6YwAD1ydy2)
TL;DR: CCL is a VAE-based causal representation learning framework that captures a query’s underlying problem intent and selects intent-aligned examples, making in-context learning more robust in OOD settings.
Feel free to stop by anytime - I’d love to chat about In-context Learning, Causal Representation Learning, or anything related!
📍 Poster #3819, Dec 3 (Wed), 16:30 - 17:30 PST

0

1

0

81

Hoyoon Byun

@hoyunb

about 1 year ago

@YonghanJung @eliasbareinboim @ildiazm Congratulations!!🎉🔥

1

0

82

hoyunb retweeted

Changdae Oh ✈️ ACL 2026

@Changdae_Oh

almost 2 years ago

Our paper "Towards Calibrated Robust Fine-Tuning of Vision-Language Models" was accepted to #NeurIPS2024🥳 📃:https://t.co/cjxtggibaO [1/n] To pursue uncertainty calibration as well as generalization under distribution shifts, we derived a novel theorem with a practical impl!

1

70

14

16

7K

hoyunb retweeted

Elias Bareinboim

@eliasbareinboim

over 2 years ago

Thanks for sharing your thoughts, Amit. Recall, just to add some clarity in terms of context, my comment regarding @ylecun & @yudapearl's posts is neither about generative nor about deep learning versus causal; those are pacified issues in the literature. In other words, we now have some principled understanding of how these modes of reasoning relate. Also, I haven’t made any claim about LLMs & Causality, at least not in this thread.
Putting it simply, my message was triggered by LeCun’s original tweet showing an architecture that looked like what folks in RL have been doing. Since I have been studying RL for a long time and know that it’s insufficient for causal reasoning, in a broad sense (as elaborated here: https://t.co/CV6z6KMzZo), I felt compelled to ask for clarification regarding the causal aspect of his architecture. It was a bit surprising to me that he mentioned that RL was not really needed, going in the opposite direction of what I would expect (i.e., that RL itself is insufficient). (There is also the literature on causal discovery, which in its most basic form attempts to learn a causal model from observational data. One of the conclusions is that this is almost never possible, and we usually end up with an equivalence class of models.)
In a bit more technical terms, it's understood that pure observational data, devoid of causal bias, is insufficient for making statements about interventions or counterfactuals, as we have demonstrated, for example, in Thm. 1 in https://t.co/MnlAuEgtoh. Given this impossibility result, we illustrate how integrating proper causal inductive bias with neural networks enables the performance of inferences using 'neural causal models,' as first shown in https://t.co/pSYJwXb50h. Furthermore, we can also perform counterfactual inferences within the realm of images through causal abstractions and representations (e.g., https://t.co/snwf6ElKDx or https://t.co/7kLkE0zYNS). In essence, my post does not make a negative claim but rather offers a nuanced scientific perspective on the interrelation between causal and neural modes of reasoning, as well as the significance of abstractions and representations. I hope this clarifies the discussion.
Having said that, I am curious to understand in what ways both comments are valuable, given that Yann’s perspective on causality and its contrast with the existing literature was not clear to me; curious to learn from your insights.

4

59

16

91

32K

hoyunb retweeted

SummarizedML @summarizedml

about 3 years ago

A new method, robust prompt learning with knowledge graph (RPLKG), based on the knowledge graph, which can be used for 📄 https://t.co/DXeGFjTttJ

0

3

1

341

Hoyoon Byun

@hoyunb

about 3 years ago

@raphaelmilliere @readwise save thread

1

0

70

hoyunb retweeted

Bernhard Schölkopf @bschoelkopf

almost 4 years ago

Our 2012 paper ‘On causal and anticausal learning’ just received a Test of Time Honorable Mention at @icmlconf #ICML2022: https://t.co/gc1FZYSOyP. I am really grateful, and would like to use this occasion for some thoughts on causality and machine learning:

7

986

149

191

0

Hoyoon Byun

@hoyunb

over 3 years ago

Countdowns to top CV/NLP/ML/Robotics/AI conference deadlines https://t.co/oGLj9O1wGq

0

1

0

91
