Pingbang Hu 🇹🇼

@PingbangHu

I work on, with, and for data. Ph.D. candidate @UofIllinois. Fellows @AnthropicAI. Interns @ SIG @amazon @jouhouken. Alumni @Umich @SJTU1896.

Earth

Joined July 2021

369 Following

3K Followers

449 Posts

Pinned Tweet

Pingbang Hu 🇹🇼 @PingbangHu

24 days ago

New Preprint Alert ⏰

We propose Dr. Post-training 🩺, a Data Regularization framework that makes your data more effective with ZERO overhead.

Experiments demonstrate faster training convergence than SOTA data selection across SFT, RLHF, and RLVR, opening up new data optimization designs! https://t.co/MePVPMFXol

Pingbang Hu 🇹🇼 @PingbangHu

6 days ago

Will spend the summer at Susquehanna (SIG) as an ML/QR intern. Starting tmr.

I will not be a finance bro, I promise. https://t.co/NTsAkHFBZX

Pingbang Hu 🇹🇼 @PingbangHu

7 days ago

@HelloVyom sorry, but I'm pretty sure DeepSeek (or High-Flyer) is not the biggest "quant" competitor of the USA.

Pingbang Hu 🇹🇼 @PingbangHu

9 days ago

@kaiqu_liang previous fellow here. Enjoy!

PingbangHu retweeted

9 days ago

We've raised $65 billion in Series H funding at a $965 billion post-money valuation, led by @AltimeterCap, Dragoneer, @Greenoaks, and @sequoia. This investment will help us advance our research and expand our capacity to meet growing demand for Claude.

Pingbang Hu 🇹🇼 @PingbangHu

12 days ago

@joemelko for real

Pingbang Hu 🇹🇼 @PingbangHu

13 days ago

A while ago @joemelko told me that the post-training technique I'm working on (https://t.co/SJOcHdEWR4) would also work in pretraining, and if not, it's a skill issue. Now, given this promising signal, I'm ready. The only problem is: where's the GPU credit 😭

机器之心 JIQIZHIXIN

13 days ago

There is now a smarter way to pick data for training LLMs!

Enter OPUS!

This is an ICML Oral paper from SJTU, Alibaba, UW–Madison, UIUC, and Mila - Quebec AI Institute.

The proposed method dynamically and intelligently selects the most impactful data for LLM pre-training in every single training iteration, bringing principled, continuous data optimization to the forefront.

This approach aims to significantly boost training efficiency and yield higher-quality LLMs, outperforming conventional static data selection methods across diverse language tasks.

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper: https://t.co/zgAzuwoTJf

Our report: https://t.co/tUCDBOHV5q

📬 #PapersAccepted by Jiqizhixin

Pingbang Hu 🇹🇼 @PingbangHu

12 days ago

@ShaoboWang6 @joemelko happy to

PingbangHu retweeted

Pingbang Hu 🇹🇼 @PingbangHu

24 days ago

This is really my best work so far, and I'm genuinely proud to share it with people who are interested. Side note: I'm currently in the Anthropic team-matching process after the fellowship, so if anyone knows teams working on data, please DM me!!!

PingbangHu retweeted

Pingbang Hu 🇹🇼 @PingbangHu

23 days ago

BTW, we have a blog post as well! Please check it out in case the paper is too long 😗

Paper: https://t.co/Nqo4aFcEnf
Blog: https://t.co/t1OGqzZhuz https://t.co/fQUHtuNW2P

Pingbang Hu 🇹🇼 @PingbangHu

15 days ago

Day 140

I left a little fan for the next tenant, enjoy. https://t.co/yjb2ZZGjzt

Pingbang Hu 🇹🇼 @PingbangHu

5 months ago

Day 1 in SF

At least I got a bed https://t.co/bV6yGelxzQ

PingbangHu retweeted

Andrej Karpathy

18 days ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Pingbang Hu 🇹🇼 @PingbangHu

22 days ago

@HighFreqAsuka fair 🤣

Pingbang Hu 🇹🇼 @PingbangHu

22 days ago

@joemelko @HighFreqAsuka I'm listening

PingbangHu retweeted

23 days ago

Fun little artifact: I worked on something similar to freon last year and started writing an (unedited) post that is hidden on my blog: https://t.co/4flVlmPEnC

A very naive implementation of steepest descent under various p using full SVD: https://t.co/civIcS5XP4

Pingbang Hu 🇹🇼 @PingbangHu

24 days ago

@iamashtonchew Glad you like it!!
