卫青 @dcweiqing - Twitter Profile

13 days ago

We live in a weird time of overhyping slop that will be forgotten about in weeks. Linux and Python are both from 1991. LLVM started as a research project in 2000. We want to build the foundations of silicon life. Software that lives for 50 years. There's time to make it perfect.

35

2K

68

186

106K

DcWeiqing retweeted

leyten

@leyten

14 days ago

Wow, it has happened! 30.55 tok/s on GLM-5.2 4-bit (from @Zai_org) ran by six RTX Pro 6000's across the USA scattered over WAN! I can't believe this. It was an insane build, you can read more about it on https://t.co/8zDAVPMbDc

143

2K

179

779

354K

卫青 @DcWeiqing

15 days ago

“The 744B GLM-5.2 is fantastic — it shows that scaling in a specific domain really still pays off.”

0

8

DcWeiqing retweeted

Alexi Gladstone

@AlexiGlad

16 days ago

Progress in AI is driven by approaches that make weaker assumptions, which allows for better scaling

But representation learning has relied on strong assumptions like augmentations, masking, cropping, etc... until now!

🎬 Introducing Temporal Difference in Vision (TDV), a new paradigm for representation learning built on a single assumption: causality

TL;DR:
- We introduce TDV, the first approach to learn good representations without any augmentations, masking, cropping, or pixel-based reconstruction
- TDV matches SOTA recipes like DINO and iBOT on dense spatial tasks
- We show that as data scales, weaker assumptions work better

🧵Thread:

26

829

120

684

83K

DcWeiqing retweeted

Jayden Teoh

@jayden_teoh_

16 days ago

Next-token prediction is myopic. What if transformers learn to predict their own next latent state? 🌠 We present 𝗡𝗲𝘅𝘁-𝗟𝗮𝘁𝗲𝗻𝘁 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻 (𝗡𝗲𝘅𝘁𝗟𝗮𝘁): a self-supervised learning method that teaches transformers to form compact world models for reasoning and planning. It also unlocks up to 3.3x faster inference via self-speculative decoding! 🚀

48

2K

276

2K

280K

DcWeiqing retweeted

deep Manifold

@BetaTomorrow

18 days ago

https://t.co/K9ib2Bmg0b

1

34

5

45

3K

DcWeiqing retweeted

Fei-Fei Li

@drfeifei

22 days ago

Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.

122

3K

470

378

198K

卫青 @DcWeiqing

22 days ago

"I finally see the 'excellent' training behind Claude's ability to lie."

elie

@eliebakouch

23 days ago

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is on purpose not visible to the user is crazy

354

6K

646

1K

4M

0

5

卫青 @DcWeiqing

22 days ago

"Isn't Anthropic essentially encouraging Claude to lie and conceal the truth?"

NomoreID

@Hangsiin

23 days ago

When Fable 5 is used for frontier LLM development, it does not notify the user and instead limits the model’s capabilities through methods such as prompt modification, steering vectors, and PEFT. Anthropic estimated that this would affect approximately 0.03% of traffic.

97

1K

156

538

1M

0

17

卫青 @DcWeiqing

about 1 month ago

By continuing to expand the evaluation data distribution, the model can naturally become more confident. The magic of evaluation lies in the state management mechanism of latent space.

0

1

0

10

DcWeiqing retweeted

Yuxiang Huang @yxyxyyy6

about 1 month ago

[1/n] Can a model learn *where* and *how much* information it should attend to, and do so efficiently? We introduce DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention! This pushes the accuracy-efficiency frontier in LLMs.

2

120

19

85

31K

DcWeiqing retweeted

Jeff Dean

@JeffDean

about 1 month ago

2/ Check out how Gemini 3.5 Flash instantly digests dense academic papers and autonomously codes a fully interactive, visual website explaining the intricacies of the research. It's an incredible stress test that seamlessly merges massive long context, deep reasoning, complex coding, and ultra-low latency. It really helps you distill papers down to their essence and aid your understanding!

6

273

26

166

89K

DcWeiqing retweeted

Demis Hassabis

@demishassabis

about 1 month ago

Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video. You can even give it your own videos & iterate on your ideas:

380

9K

932

1K

935K

卫青 @DcWeiqing

about 2 months ago

😅

antirez @antirez

about 2 months ago

I didn't expect DeepSeek v4 PRO (not Flash) to run well on the Mac Studio M3 Ultra with 512GB of RAM. This is 2 bit quantized with the same DwarfStar recipe used for Flash. 433GB GGUF file. 130 t/s prefill, 13 t/s generation. Prefill in the video is low because small prompt.

52

1K

86

390

170K

0

27

卫青 @DcWeiqing

about 2 months ago

Welcome to China.

Lex Fridman

@lexfridman

about 2 months ago

I'm traveling the world for a bit, starting with China but then hopping around the globe, anywhere. Open to any adventure. No plans, only a backpack. Hoping to meet & get to know humans from all walks of life. The pic is from a long hike on the Great Wall. For me, as a fan of history, this was an epic experience.

In China, first I'm visiting a few big cities & talking to engineers at the heart of China's AI revolution. After that, if feeling crazy enough, I'm hitchhiking (first time) across rural China for a few weeks. Hitchhiking because I think it's the best way to meet rural folks who I would otherwise never get the chance to meet. I hope to do the same in US and other places.

I have a request, if you have a travel recommendation, fill out the form(s) below if you feel like it. Or share with folks who might have advice about such travel.

Form 1 - travel recommendation:
If you can, recommend to me an interesting place I should visit anywhere in the world. For this, fill out form 1. Not touristy stuff, but something off the beaten path, that tourists may not know about, but is legendary. It could be as remote as meeting a herder in the mountains who is a local legend. Asia, Middle East, Europe, India, South/North America, Africa, Australia, anywhere. In China, I'm hoping to visit maybe Hebei, Shanxi, Shaanxi, Gansu, Sichuan, Yunnan, etc, so recommendations for spots to visit are helpful.

Form 2 - coffee:
If you want to grab a coffee with me anywhere in the world, fill out form 2 (please don't use form 1 for that).

Anyway, I hectically tossed stuff in backpack. Realizing I don't have a clear plan of any kind, which is probably the only way to do it. LFG.

Love you all ❤️

2K

15K

659

2K

1M

0

21

DcWeiqing retweeted

DeepSeek

@deepseek_ai

2 months ago

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM

1/n

2K

46K

8K

10K

10M

卫青 @DcWeiqing

3 months ago

maybe lora is the cache

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

3 months ago

Efficient RL Training for LLMs with Experience Replay "Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading – and in some cases even improving – final model performance, while preserving policy entropy."

5

350

53

309

22K

1

0

45

DcWeiqing retweeted

Fleetwood @fleetwood___

3 months ago

Studying continual learning at the moment, best papers thus far: https://t.co/Y9oXBAiyj2 https://t.co/ByWlaF3ncn https://t.co/hG0XIzq6cH https://t.co/5VSEnBIkX2

8

471

48

788

31K

卫青 @DcWeiqing

3 months ago

Deepseek is back.

0

70

卫青 @DcWeiqing

3 months ago

@TMT_arabic @grok What is the person in the video saying? Translate it into Chinese.

1

0

10
