TonG @yIntuition - Twitter Profile

about 1 month ago

Thanks to an amazing partnership with @inferact and @lmsysorg/@radixark, Dynamo had day0 support for DeepSeek-V4 with features including large scale P/D disaggregation on B/GB200 and 300 and KV cache aware routing. Containers and some fun PRs linked in the next thread!

13

62

7

11

16K

yIntuition retweeted

Haibin @eric_haibin_lin

about 1 year ago

We are open sourcing bytecheckpoint and veomni! bytecheckpoint is Bytedance's production checkpointing system for foundation model training, battle-tested on jobs with 10k+ GPUs. Blazing fast save/load, load-time checkpoint auto-resharding for different parallelism across training stages (pretrain/SFT/RL). veomni is an open-source model training framework for LLM and multimodal training. UI-TARS (the SOTA GUI Agent model prior to OpenAI Operator's release) is trained with veomni. Developed with modular design, integrated with sequence/expert/zero-optimizer parallelism, offloading optimizations, @liger_kernel. Trainer-free (lets users control the training loop) and easy for researchers to hack! The go-to framework for text/multimodal LLM pre-training and post-training, from research to production. Try them today, your feedback is welcome! Code: - https://t.co/IKM7ceK7GN - https://t.co/HAzL6Omc6j Paper: - NSDI paper: https://t.co/HjHgbjv9r0

3

180

25

86

15K

yIntuition retweeted

DeepSeek

@deepseek_ai

over 1 year ago

🚀 Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs ✅ No heavy dependency, as clean as a tutorial ✅ Fully Just-In-Time compiled ✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes ✅ Supports dense layout and two MoE layouts 🔗 GitHub: https://t.co/cxJ55w61pT

466

6K

865

868

965K

yIntuition retweeted

DeepSeek

@deepseek_ai

over 1 year ago

🚀 Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. ✅ Efficient and optimized all-to-all communication ✅ Both intranode and internode support with NVLink and RDMA ✅ High-throughput kernels for training and inference prefilling ✅ Low-latency kernels for inference decoding ✅ Native FP8 dispatch support ✅ Flexible GPU resource control for computation-communication overlapping 🔗 GitHub: https://t.co/Q6eRxZ9kgW

511

8K

1K

1M

over 2 years ago

@AhBoard @hellowo63335565 @Kathleen_Tyson_ Seems like all major apps are hiring harmony devs based on their job listings

0

4

0

37

TonG @yIntuition

about 3 years ago

@MattMolinario lmao

0

2

0

15

TonG @yIntuition

about 3 years ago

@ClearHeatVision

0

19

TonG @yIntuition

about 3 years ago

@ClearHeatVision

Drew Comments @sjs856

about 3 years ago

Was this necessary? lol

13

81

4

1

10K

0

16

TonG @yIntuition

about 3 years ago

@ClearHeatVision sounds like something that can be done by robots...

0

10

TonG @yIntuition

about 3 years ago

@Mick01915 @mcgilvrey @LPMisesCaucus @evil_blacksheep

0

108

TonG @yIntuition

about 3 years ago

@questflow @jonfitzsimon @jappleby The bigger reason is that China has laws that foreign companies are required to follow. You mentioned both F and G, but you didn't mention Microsoft (Bing), AWS, or Apple's cloud services. Those are all available in China because they abide by Chinese law; F&G chose not to.

1

14

0

688

TonG @yIntuition

about 3 years ago

@MattMolinario allllmost

0

31

over 3 years ago

over 3 years ago

Microsoft Research has released BioGPT, a large language model trained on biomedical research literature. The model achieves better-than-human performance on answering questions from the biomedical literature, as evaluated on PubMedQA. The code for the model has been publicly released, and the weights for the large 1.5B model are expected to be released soon as well. Link to the paper: https://t.co/92MBKqWPZd Link to the GitHub repo: https://t.co/baaFEsmuNF #ArtificialIntelligence #GenerateiveAI #LLM #LLMs #AI #NLP #DeepLearning @MicrosoftResearch

tunguz's tweet photo.

63

4K

726

1K

776K

0

25

TonG @yIntuition

over 3 years ago

@ClearHeatVision @MattMolinario He skipped the game last night to host a party to celebrate his "goat"ness. Having goats at the party doesn't make him a goat, but his ego is preventing him from seeing it lawlz.

0

12

TonG @yIntuition

over 3 years ago

@ClearHeatVision @MattMolinario that boy's delusional

NBACentral

@TheDunkCentral

over 3 years ago

LeBron brought two goats to his party last night 😅 (Via @TMZ_Sports )

717

34K

2K

293

3M

1

0

24

TonG @yIntuition

over 3 years ago

@ClearHeatVision Nobody give a f about that poser who claims goat by himself lawlz. His ego is like a black hole and it's devouring everybody around him with it

1

0

17

TonG @yIntuition

over 3 years ago

@sportingnews Easy to do when you don't have to play defense...

0

77

TonG @yIntuition

over 3 years ago

@MattMolinario @ClearHeatVision Ain't that the truth

0

3

TonG

@yIntuition
