Cade Daniel 🇺🇸 @cdnamz - Twitter Profile

cdnamz retweeted

@HrishbhDalal

about 1 year ago

What if we could teach LLMs to be algorithm inventors?

I trained an LLM to improve sorting algorithms through pure reinforcement learning - and it discovered optimizations giving 47.92x speedups over an optimized Python-based Timsort baseline! No cold-start data needed.

I used the @huggingface GRPO implementation and the @Alibaba_Qwen 7B model.

20

778

63

764

107K

cdnamz retweeted

Jonathan Frankle

@jefrankle

about 1 year ago

The hardest part about finetuning LLMs is that people generally don't have high-quality labeled data. Today, @databricks introduced TAO, a new finetuning method that only needs inputs, no labels necessary. Best of all, it actually beats supervised finetuning on labeled data.

13

890

134

839

91K

cdnamz retweeted

Simran Arora

@simran_s_arora

about 1 year ago

BASED ✌️ turns 1! One year since its launch at NeurIPS 2023, and it's helped shape the new wave of efficient LMs.
⚡️ Fastest linear attention kernels
🧠 405B models trained on 16 GPUs
💥 Inspired Mamba-v2, RWKVs, MiniMax
Check out our retrospective below!

3

107

56

31

22K

cdnamz retweeted

Hongyang Zhang @hongyangzh

about 1 year ago

Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration!

- 5x🚀than vanilla (on HF)
- 1.4x🚀than EAGLE-2 (on HF)
- A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang)
- 1.65x🚀in latency even for large bs=64 (on SGLang)
- A new scaling law: more training data, better speedup
- Apache 2.0

Paper: https://t.co/u6mQ6U9xTT
Code: https://t.co/Hnhnwb9iJ3
SGLang version: https://t.co/8tCSDjCktY

⚒️Takeaway: Introducing training-time test, a novel draft model training technique: we replace feature prediction with direct token prediction and shift from top-layer-only features to multi-layer feature fusion. This approach unlocks a new scaling law previously undiscovered in EAGLE and EAGLE-2.

🙏Acknowledge: We would like to thank the SGLang team (@zhyncs42 @lm_zheng @ying11231 @JamesLiuID, @ispobaoke, and others @lmsysorg) for their merge and careful evaluation of EAGLE-3 on SGLang.

🤝Want to collaborate? We're a small academic group with limited GPU resources. If you're interested in supporting our next version of EAGLE or would like us to train a preliminary version tailored to a specific model, please get in touch!

Joint work with Yuhui Li, Fangyun Wei, and Chao Zhang

15

299

43

191

42K

cdnamz retweeted

Shanli Xing @shanli_xing

about 1 year ago

🚀Meet flashinfer.sampling—our sorting-free GPU kernels for lightning-fast #LLM sampling. Our implementation achieves over 50% reduction in sampling time. Blog post: https://t.co/R780Rth03x

1

180

32

94

31K

cdnamz retweeted

Simon Guo

@simonguozirui

over 1 year ago

LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇

9

304

67

132

114K

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

Welcome @istoica05

Hao Zhang

@haozhangml

over 1 year ago

Thrilled to see @istoica05 joining X and couldn't agree more with his insights on the importance of shared infrastructure. "Open source" encompasses more than just open weights—it includes open data, open artifacts, and open infrastructure!

1

20

2

1

4K

0

11

0

895

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

Congrats!

Deli Chen

@victor207755822

over 1 year ago

Unbelievable results, feels like a dream—our R1 model is now #1 in the world (with style control)! 🌍🏆 Beyond words right now. 🤯 All I know is we keep pushing forward to make open-source AGI a reality for everyone. 🚀✨ #OpenSource #AI #AGI #DeepSeekR1

306

7K

534

1K

814K

0

3

0

684

cdnamz retweeted

Grad

@Grad62304977

over 1 year ago

People waking up to take their bitter lesson pill https://t.co/fswrLVjfCC

3

88

3

11

9K

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

@AnushElangovan @PytorchToAtoms There should be a Nanoflow for AMD https://t.co/XyX0s8SEk7

0

4

0

6

742

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

@nikitamounier @hongyangzh I'm not sure of the latest status. Last I checked there was an accuracy issue causing a lower acceptance rate. Maybe @CodyHaoYu or @eqhylxx have more up-to-date info

0

117

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

@_opencv_ good bait

1

8

0

1K

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

love finding bangers so damn good they force a follow

0

12

1

0

777

cdnamz retweeted

Suhail

@Suhail

over 1 year ago

Once the AI labs realize they need to make products for survival, they will immediately reformulate their strategy to competing with the most obvious working thing that is vaguely under the guise of the original mission. You should presume you will be ruthlessly copied.

30

768

38

254

106K

cdnamz retweeted

Rohan Choudhury

@rchoudhury997

over 1 year ago

Excited to finally release our NeurIPS 2024 (spotlight) paper! We introduce Run-Length Tokenization (RLT), a simple way to significantly speed up your vision transformer on video with no loss in performance!

22

1K

169

833

156K

cdnamz retweeted

Vima Gupta @vima_gupta

over 1 year ago

1/7 🧵 MoEs: A tale of expectation vs reality

Marketing: "Only compute the expert parameters you need!"
Reality: Batch 16 requests → ALL experts activate
At serving time (vLLM/TGI), arithmetic intensity:
AI ≈ (num_tokens * top_k) / total_experts
In simpler terms: Your decode arithmetic intensity scales inversely with expert count 🤔

#MoE #LLMs #ChatGPT #Claude #vllm #AI #ML

4

32

7

11

3K
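The rule of thumb in the MoE thread above can be checked with a few lines of Python. This is only a rough sketch, not code from the thread: `arithmetic_intensity` applies the quoted formula directly, while `expected_active_experts` is a hypothetical helper that assumes every token routes to `top_k` distinct experts uniformly at random. Real routers are not uniform, so treat its output as a ballpark figure.

```python
def arithmetic_intensity(num_tokens: int, top_k: int, total_experts: int) -> float:
    """Approximate decode arithmetic intensity using the thread's formula:
    AI ~= (num_tokens * top_k) / total_experts."""
    if total_experts <= 0:
        raise ValueError("total_experts must be positive")
    return (num_tokens * top_k) / total_experts


def expected_active_experts(num_tokens: int, top_k: int, total_experts: int) -> float:
    """Expected number of experts hit by at least one token in the batch.

    ASSUMPTION (not from the thread): each token independently picks
    top_k distinct experts uniformly at random.
    """
    if not 0 < top_k <= total_experts:
        raise ValueError("need 0 < top_k <= total_experts")
    # Probability that one given expert is skipped by every token in the batch.
    p_idle = (1 - top_k / total_experts) ** num_tokens
    return total_experts * (1 - p_idle)


if __name__ == "__main__":
    batch, k = 16, 2
    for experts in (8, 64, 256):
        ai = arithmetic_intensity(batch, k, experts)
        active = expected_active_experts(batch, k, experts)
        print(f"experts={experts:4d}  AI~{ai:6.3f}  expected active~{active:6.2f}")
```

With a batch of 16 tokens and top-2 routing, going from 8 to 64 experts divides the estimated intensity by 8, which matches the thread's point that decode arithmetic intensity scales inversely with expert count.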

cdnamz retweeted

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

over 1 year ago

Pie: Pooling CPU Memory for LLM Inference
paper: https://t.co/HPsU3exTFJ

Pie is an LLM inference framework that tackles the memory challenges of large models by enabling efficient GPU-CPU memory swapping and adaptive expansion. It optimizes memory usage without increasing latency, achieving up to 1.9x higher throughput and 2x lower latency compared to alternatives like vLLM, while reducing GPU memory usage by up to 1.67x.

1

169

40

107

11K

Cade Daniel 🇺🇸

@cdnamz

over 1 year ago

@doomslide very straussian of you

0

2

0

116

cdnamz retweeted

Michael Matthews @mitrma

over 1 year ago

🍎 The core of Kinetix is our new 2D rigid body physics engine: Jax2D. This is a minimal rewrite of the classic Box2D engine made by @erin_catto. Jax2D allows us to run thousands of heterogeneous parallel environments on a single GPU (yes, you can vmap over different tasks!) 8/

4

40

4

6

3K
