Haoran Xu @fe1ixxu - Twitter Profile

Pinned Tweet

22 days ago

Thrilled to see this work out in the world. 🚀 We’ve been building new frontier models for coding and reasoning, including MAI-Code-1-Flash and MAI-Thinking-1. Model details: MAI-Code-1-Flash: https://t.co/BvKDZwdORJ MAI-Thinking-1: https://t.co/5qPiEJZV96



fe1ixxu retweeted

Yang Liu @nlpyang

22 days ago

Excited to introduce our MAI Code model at Microsoft Build. As shared in the session, this is a MoE (5B active / 137B total) initialized from an MAI pretrained model and trained for real user scenarios with product harnesses. I’m proud to have served as the research lead for this effort, and even prouder of what the team has achieved. It’s a beast for its size. Stay tuned — a larger model could come :)



Haoran Xu @fe1ixxu

29 days ago

@kentonmurray Congrats!! The best advisor in the world!!


fe1ixxu retweeted

Tianjian Li @tli104

10 months ago

Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!



fe1ixxu retweeted

Liliang Ren @liliang_ren

12 months ago

Reasoning can be made much, much faster with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput at 32K generation length with vLLM. 🤯 Model: https://t.co/bYFanHgikH Codebase: https://t.co/M2GLiw3nUl Blog: https://t.co/ka7yjL29HQ Paper: https://t.co/lUF2xwYQWq (1/8)



fe1ixxu retweeted

JHU Computer Science @JHUCompSci

about 1 year ago

Fluent, fast, and fair—in collaboration with @MSFTResearch, Johns Hopkins computer scientists (including @fe1ixxu & @kentonmurray) have built a new machine translation model that achieves top-tier performance across 50 diverse languages. Learn more: https://t.co/4vBEEDWIgS



fe1ixxu retweeted

Satya Nadella @satyanadella

about 1 year ago

Another big step forward for our SLM Phi family, with new reasoning models that once again redefine what is possible with small and efficient AI.


fe1ixxu retweeted

Weizhu Chen @WeizhuChen

about 1 year ago

Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 in Math-500 and 57.5 in AIME-24. arxiv: https://t.co/741JoHgK4m hf: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc



Haoran Xu @fe1ixxu

about 1 year ago

Model: https://t.co/jOuHg9R4Dg


Haoran Xu @fe1ixxu

about 1 year ago

🚀 Phi-4-Mini-Reasoning is finally out! Two months ago, we introduced a reasoning-enhanced Phi-4-Mini. Since then, we've taken it further: a compact model with robust reasoning abilities that even surpass models up to 2x its size. Paper: https://t.co/GcSxxwVZX4



fe1ixxu retweeted

HyoJung Han @h__j___han

about 1 year ago

I'll be presenting our work, VocADT, tomorrow at #ICLR2025✨ Check out our poster session: https://t.co/bVOYMDQBnz 🗓️Thu 24 Apr 3 p.m. - 5:30 p.m 📍Hall 3 + Hall 2B #250 So excited to be attending @iclr_conf in Singapore🇸🇬



fe1ixxu retweeted

Young @yjkim362

over 1 year ago

We also posted the #Phi-4-Mini technical report to arXiv, covering our innovations for building the strong, lightweight multimodal model Phi-4-multimodal and the language model Phi-4-mini. We use a mixture-of-LoRAs technique to combine text, image, and speech modalities without interference.


Haoran Xu @fe1ixxu

over 1 year ago

Excited to share that Phi-4-mini has been released! This was my first time rolling up my sleeves and experiencing the entire text training process. We also have a reasoning-enhanced Phi-4—outperforming many 7B reasoning models—which we plan to release very soon. Stay tuned!

Weizhu Chen @WeizhuChen

over 1 year ago

We released Phi-4-mini (3.8B base in LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRA, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share with us your feedback. We also trained a reasoning model, achieving 90.4 on Math-500. Model: https://t.co/QJx65DtQPv Paper: https://t.co/Jxktctw1pv Blog: https://t.co/rzaA0x6e6I



Haoran Xu @fe1ixxu

over 1 year ago

Excited to share that X-ALMA got accepted at #ICLR2025! See you in Singapore!

Haoran Xu @fe1ixxu

over 1 year ago

Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB



Haoran Xu @fe1ixxu

over 1 year ago

Glad to see CPO is in a lecture now!

Wei Xu @cocoweixu

over 1 year ago

We wrapped up CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list:
• learning from human preferences (PPO, DPO, SimPO, CPO, RRHF, ORPO, CTO)
• real-world LLM (Llama-3, Aya, Arena's)
• efficient LLM (MoMa, LoRA, QLoRA, LESS)



Haoran Xu @fe1ixxu

over 1 year ago

Work done with my amazing co-workers: @kentonmurray, Philipp Koehn, @akikoe_, @MosesSMT (Hieu Hoang), and @HudaKhay!


Haoran Xu @fe1ixxu

over 1 year ago

Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
