Kairong Luo

@openhonor

PhD Student @ Tsinghua University | Researching LLM

Joined February 2025

111 Following

88 Followers

19 Posts

Kairong Luo @openhonor

about 1 month ago

✈️ Heading to ICLR 🇧🇷 Apr 22–27.

Come to our oral on Fri, Apr 24 (10:30 AM–12:00 PM, Room 202 A/B) or find me at our poster (3:15 PM–5:45 PM, P3-#521).

We study why LR decay can hurt curriculum-based LLM pretraining, and how to fix it.

Happy to chat! https://t.co/F5UBCEi64E

0

7

0

0

3K

Kairong Luo @openhonor

6 months ago

🙏 Great honor to collaborate with @BranSun10, @Dunk_KD1998, @Harry_Chen_, with advice from Professor Kaifeng Lyu @vfleaking, and under the support and leadership of Professor Wenguang Chen. Thanks to all contributors who made this work possible!

0

0

0

0

91

Kairong Luo @openhonor

6 months ago

🚀 Announcing PCMind-2.1-Kaiyuan-2B

A new frontier for fully open-source models.
Not just weights: the full pretraining pipeline and recipe.

Specs: 2B params, 2.2T tokens
Approach: data-centric pretraining
Status: SOTA among fully-open models

🤗 HF: https://t.co/G86k7ja08P https://t.co/u3PfSHGAKa

9

6

0

0

178

Kairong Luo @openhonor

6 months ago

🔓 Resources

We believe in True Open Source: Weights, Data, and Recipes.

📄 Report: https://t.co/ioDzfrA3OV
🤗 Model: https://t.co/G86k7ja08P
📚 Data: https://t.co/JvoDBiUa2o
⚙️ Data Code: https://t.co/44JFpRXVt9
🏋️ Train Code: https://t.co/2VqRjM1ps2

0

0

0

0

66

Kairong Luo @openhonor

6 months ago

⚙️ Infrastructure: Kaiyuan-Spark

Built on Spark & Chukonu (https://t.co/8ZSX4EJcti) for scale.
- Capabilities: Massive deduplication & mixing.
- Speed: Optimized C++ kernels.
- Reproducibility: Reconstruct our exact training set via config files.

0

0

0

0

50

Kairong Luo @openhonor

6 months ago

🛠️ Engineering: "Hard Mode" (FP16)

Training on FP16-only hardware risks divergence.

We modified the architecture for maximum stability:
- Sandwich Normalization (controls residual growth)
- Logits Soft-Capping (prevents extreme values)
- QK-Norm https://t.co/CRK4DHMeIy
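
To illustrate the soft-capping item above: the post does not give the formula, so the sketch below assumes one common formulation, cap * tanh(x / cap), which keeps small logits nearly unchanged and squashes large ones so they stay far below the FP16 maximum of 65504. The function name and the cap value of 30.0 are hypothetical, not taken from the Kaiyuan-2B recipe.

```python
import math

FP16_MAX = 65504.0  # largest finite value representable in FP16


def soft_cap(logit: float, cap: float = 30.0) -> float:
    """Squash a logit into [-cap, cap]; close to identity when |logit| << cap.

    Assumed formulation (not confirmed by the post): cap * tanh(logit / cap).
    """
    return cap * math.tanh(logit / cap)


# Hypothetical raw logits, including values that would be risky in FP16 math.
raw_logits = [0.5, 12.0, 250.0, -9000.0]
capped = [soft_cap(x) for x in raw_logits]

for raw, out in zip(raw_logits, capped):
    print(f"{raw:>9.1f} -> {out:>8.4f}")
```

Because tanh is smooth, the gradient shrinks gradually for large logits instead of being cut off as it would be with hard clipping.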

0

0

0

0

42

Kairong Luo @openhonor

6 months ago

🏆 Evaluation Results

KAIYUAN-2B pushes the fully open-source boundary.
✅ Beats: SmolLM2-1.7B & OLMo-2-1B
✅ Matches: Larger models like YuLan-Mini (2.4B)
⚔️ Approaches: Open-weight leaders (Qwen2-1.5B / Llama3.2-3B)
💪 Exceptionally strong in Chinese, Math, and Code. https://t.co/BgDyQsmU6l

0

0

0

0

42

Kairong Luo @openhonor

6 months ago

📈 Innovation 3: Quality Curriculum

Samples sorted by quality (ascending), then interleaved globally.

- Progressive Exposure: Model sees "textbook quality" data only when mature.
- Stable Mix: Domain ratios (Chinese/Code/Math) remain fixed while quality ramps up. https://t.co/EgfkgPwPTj
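
One way to read "sorted by quality, then interleaved globally" is sketched below. It is an assumption, not the released pipeline: each domain is sorted by ascending quality, every sample gets its normalized rank within its own domain, and all domains are merged by that rank. The function name, domain names, and scores are hypothetical.

```python
def quality_curriculum(samples_by_domain):
    """Order samples so quality rises over time while the domain mix stays steady.

    samples_by_domain maps a domain name to a list of (sample_id, quality_score).
    Returns a list of (domain, sample_id, quality_score) in training order.
    """
    keyed = []
    for domain, samples in samples_by_domain.items():
        ranked = sorted(samples, key=lambda s: s[1])  # ascending quality
        n = len(ranked)
        for i, (sample_id, score) in enumerate(ranked):
            # Normalized position of this sample within its own domain.
            keyed.append(((i + 0.5) / n, domain, sample_id, score))
    # Merging by normalized position spreads each domain evenly over the run.
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [(domain, sample_id, score) for _, domain, sample_id, score in keyed]


# Hypothetical toy corpus with a 2:1 web-to-code ratio.
corpus = {
    "web": [("w1", 0.9), ("w2", 0.1), ("w3", 0.5), ("w4", 0.3)],
    "code": [("c1", 0.8), ("c2", 0.2)],
}
stream = quality_curriculum(corpus)
for item in stream:
    print(item)
```

In this toy run both halves of the stream keep the 2:1 ratio, and the highest-scored sample of each domain arrives last.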

0

0

0

0

35

Kairong Luo @openhonor

6 months ago

🔄 Innovation 2: Strategic Repetition

High-quality data is finite. We use a multi-phase approach to repeat the best data without overfitting.

Method: Retain top 50% → 30% → 10% in later phases.

Result: Top 10% samples seen 4x; low-quality samples seen only once. https://t.co/tXOdA8RzWT
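
The exposure counts follow from the retention fractions if we assume a first phase that uses the full dataset before the 50%, 30%, and 10% phases. That first phase is inferred from "seen only once" rather than stated in the post, and the names below are hypothetical. A small sketch of the arithmetic:

```python
# Fraction of the quality-ranked data kept in each phase.
# The initial 1.0 (full data) is an assumption inferred from the post.
RETAIN_FRACTIONS = [1.0, 0.5, 0.3, 0.1]


def exposures(rank_fraction: float, retain=RETAIN_FRACTIONS) -> int:
    """Count how many phases include a sample.

    rank_fraction is the sample's position in the quality ranking:
    0.0 is the best sample, values near 1.0 are the worst.
    """
    return sum(1 for kept in retain if rank_fraction < kept)


for rank in (0.05, 0.2, 0.4, 0.8):
    print(f"sample at top {rank:.0%}: seen {exposures(rank)}x")

# Total data consumed, measured in passes over the full dataset.
total_passes = sum(RETAIN_FRACTIONS)
print(f"total: {total_passes:.1f} dataset-equivalents")
```

Under these assumptions a sample in the top 10% is seen four times, one in the bottom half once, and the whole run consumes about 1.9 passes' worth of data.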

0

0

0

0

32

Kairong Luo @openhonor

6 months ago

📊 Innovation 1: Quantile Probing

Stop blind filtering. Start systematic probing.
We trained reference models on data subsets across quality quantiles (top 15%... 75%).

Insight: Quality is task-dependent.
FineWeb-Edu 👑 Knowledge (MMLU)
DCLM-Baseline 👑 Reasoning (WinoGrande) https://t.co/RmzfPKGi0q
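
A rough sketch of how such probing subsets could be built, assuming "top q%" means the highest-scored q% of samples under a dataset's own quality score. Only the two endpoints named in the post (15% and 75%) are used; the intermediate quantiles are not stated, and the function name and toy scores are hypothetical.

```python
def top_fraction(scored_samples, fraction):
    """Return the highest-quality `fraction` of (sample_id, score) pairs."""
    ranked = sorted(scored_samples, key=lambda s: s[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy dataset: 20 samples with scores 0.00, 0.05, ..., 0.95.
dataset = [(f"doc{i}", i / 20) for i in range(20)]

# One probing subset per quantile; a small reference model would then be
# trained on each subset and compared on downstream benchmarks.
probe_quantiles = [0.15, 0.75]
subsets = {q: top_fraction(dataset, q) for q in probe_quantiles}

for q, subset in subsets.items():
    print(f"top {q:.0%}: {len(subset)} samples, lowest score {subset[-1][1]:.2f}")
```

Repeating this per source (for example FineWeb-Edu and DCLM-Baseline) would give comparable subsets even though each source scores quality on its own scale.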

0

0

0

0

33

Kairong Luo @openhonor

6 months ago

🧐 Challenge: Heterogeneity & Scarcity

Open datasets (DCLM, FineWeb) are great but vastly different. High-quality tokens are potent but rare.

How to compare/mix heterogeneous sources?
How to maximize efficiency with sparse "gold" data?

We focus on these questions with data-centric training. 👇 https://t.co/OOzreCRYlA

0

0

0

0

43

Kairong Luo @openhonor

about 1 year ago

📢 Come meet us at #ICLR2025!

We'll be presenting our Multi-Power Law, a new approach to predicting full pretraining loss curves across LR schedules, during the poster session:
🗓 Friday, April 25
🕒 3:00 PM – 5:30 PM CST
📍 Hall 3 + Hall 2B, Poster #237
We look forward to your feedback!

Kairong Luo @openhonor

about 1 year ago

🔍 How does pretraining loss evolve under different LR schedules?
🌟 Meet our Multi-Power Law: predicts the full loss curve for various schedules!
🌟 Accurate enough to optimize LR schedules directly.
🌟 Result? A WSD-like schedule that outperforms the rest!
🔥 Accepted at #ICLR2025

2

29

3

22

19K

1

3

1

3

4K

Kairong Luo @openhonor

about 1 year ago

📢 Come meet us at #ICLR2025!

We'll be presenting our Multi-Power Law, a new approach to predicting full pretraining loss curves across LR schedules, during the poster session:
🗓 Friday, April 25
🕒 3:00 PM – 5:30 PM CST
📍 Hall 3 + Hall 2B, Poster #237
We look forward to your feedback! https://t.co/MFgQcU4MEU

0

6

0

1

209

Kairong Luo @openhonor

about 1 year ago

🔹 Using the predicted final loss as a surrogate objective, we induce an optimized schedule that matches WSD (Hu et al., 2024) in shape but achieves even lower loss! https://t.co/ncXBnjDaCQ
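
For readers unfamiliar with the baseline shape, here is a minimal sketch of a plain WSD (Warmup-Stable-Decay) schedule. It is not the optimized schedule from the paper: the linear decay, the function name, and all step counts and learning rates are assumptions chosen for illustration.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Linear warmup to peak_lr, a constant plateau, then a linear decay
    (assumed here) to min_lr over the final decay_steps.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    progress = min((step - decay_start) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical run: 1000 steps, 100 warmup steps, 200 decay steps.
for s in (0, 99, 500, 800, 900, 1000):
    print(s, wsd_lr(s, total_steps=1000, peak_lr=1e-3,
                    warmup_steps=100, decay_steps=200))
```

A law that predicts the final loss for any such schedule can serve as the objective when searching over schedule shapes, which is the surrogate-objective idea the post describes.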

0

4

0

0

324

Kairong Luo @openhonor

about 1 year ago

💡 Results at a glance:
🔹 Our law is fitted on the schedules in the first row, then accurately predicts loss curves for unseen schedules in the second row! https://t.co/PyUK5esChP

1

3

0

0

459
