Defa Zhu @zhudefa - Twitter Profile

Defa Zhu @zhudefa

16 days ago

@teortaxesTex Interesting take! Curious how you arrived at that conclusion?

0

67

Defa Zhu @zhudefa

19 days ago

@aHpaBean Got it, thanks for the clarification! That's a very interesting finding. Congrats again on the great work! :)

0

1

0

21

Defa Zhu @zhudefa

about 1 month ago

Which Chinese AI model has the best shot at joining the “Big Three” this year (GPT, Claude, Gemini)?

0

43

Defa Zhu @zhudefa

about 1 month ago

@natolambert Why are they concerned about ByteDance?

0

375

Defa Zhu @zhudefa

about 1 month ago

@aHpaBean Thanks! I’ll look forward to your update once the paper is online. And thank you for your kind words about our Hyper-Connections work — I really appreciate your interest. Looking forward to seeing more of your work in this direction as well!

1

0

51

Defa Zhu @zhudefa

about 2 months ago

I hope this is not a whale fall, but just a temporary stranding.

0

70

Defa Zhu @zhudefa

about 2 months ago

After reading the v4 tech report, a model of this scale really shouldn't be this unstable. They need to look into the issue thoroughly. I think there are indeed some internal problems at DeepSeek, but I hope they can bounce back soon.

0

1

0

100

Defa Zhu @zhudefa

3 months ago

God is silicon-based.

0

71

Defa Zhu @zhudefa

5 months ago

@xiaolonw This is a solid piece of work. I found reading your paper very inspiring. Congratulations on such a great result—I must admit, I'm a bit jealous of this work!

0

16

Defa Zhu @zhudefa

5 months ago

@ytz2024 👍👍

0

34

Defa Zhu @zhudefa

5 months ago

@_arohan_ @YouJiacheng https://t.co/su6VBwGfG6

0

28

Defa Zhu @zhudefa

5 months ago

@_arohan_ @YouJiacheng While we independently developed over-encoding, we acknowledge that Ngrammer introduced a similar technique earlier. However, the primary contribution of our work, the Over-Tokenized Transformer, lies in the discovery that the input ngram vocab follows a log-linear scaling law.

0

49

Defa Zhu @zhudefa

about 1 year ago

@leijun I am looking forward to buying an MI car.

0

36

Defa Zhu @zhudefa

over 1 year ago

@MadHermitHimbo The cost of concatenation is non-negligible. The unembedding layer is already computationally substantial, and concatenating multiplies that cost by n.

0

1

0

36

Defa Zhu @zhudefa

over 1 year ago

🎉 Thrilled to announce that our paper "Hyper-Connections" has been accepted at ICLR 2025! 🚀 💡 Discover how Hyper-Connections improve performance in LLMs & vision models. Faster convergence, better results! 💪 📄 Paper: https://t.co/bJwYFTpCMC #LLM #LLMs #AI


2

4

0

328

Defa Zhu @zhudefa

over 1 year ago

@mbalunovic How about Part II?

1

0

143

Defa Zhu @zhudefa

over 1 year ago

🚀 Check out UltraMem, accepted at ICLR 2025 🌟 🔑 Highlights: 2-6x faster inference vs. MoE 🚀 State-of-the-art performance with minimal memory access 🧠 Scales better than MoE models 📈 https://t.co/NUccimBKzk #LLMs #MachineLearning #NLP #Transformers

0

2

0

355

Defa Zhu @zhudefa

over 1 year ago

🚀 Excited to share "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling"! 🎉 Discover how scaling input vocabularies with multi-gram tokens improves LLM performance. 💡 Read more: https://t.co/pMOkbFY6y4 #AI #LLM #NLP


0

1

0

326
