Atharva Kshirsagar

8 months ago

@cneuralnetwork hahaha, is it crazy that I have a really good guess which lab this might be?

anotherAtharva retweeted

8 months ago

🔥 New Blog: “Disaggregated Inference: 18 Months Later”. Eighteen months in LLM inference feels like a new Moore’s Law cycle, but this time not just 2x per year: 💸 Serving cost ↓10–100x 🚀 Throughput ↑10x ⚡ Latency ↓5x. A big reason? Disaggregated inference. From DistServe, our early research system on prefill-decode disaggregation, to today’s production frameworks, disaggregation has become the backbone of modern LLM serving. So what is disaggregated inference? Why does the LLM inference community love it? And how far have we come? As the inventors of this technique, we look back, 18 months later, at how the idea reshaped the landscape and what comes next. 🔗 Read the full story: https://t.co/Kh7e6xq0Gx

11 months ago

Predatory lending: VC edition

Antler India

@AntlerIndia

11 months ago

🚨 Announcing the Antler India AI Residency, our boldest program yet for India’s most ambitious AI founders.

₹4 Cr in investment, $1M+ in AI perks, and fast-track decisions in 4 weeks.

To learn more and apply 👇 Last date: 13 Aug 2025
https://t.co/J2B0RUiCJr

anotherAtharva retweeted

about 1 year ago

🚀 Dynasor is now production-ready in open-source stacks: @NVIDIA TensorRT-LLM and @Snowflake ArcticInference. Try it today ↓ TensorRT-LLM ➡️ https://t.co/SftheuQk8X Snowflake ➡️ https://t.co/8oPBM6QOjR 🎮 Original Dynasor repo: https://t.co/DxwL4wQp6r

anotherAtharva retweeted

about 1 year ago

Announcing FastVideo V1, a unified framework for accelerating video generation. FastVideo V1 offers:
- A simple, consistent Python API
- State-of-the-art model performance optimizations
- Optimized implementations of popular models
Blog: https://t.co/lUsBq3Z4gm

anotherAtharva retweeted

PJ Ace

@PJaccetturo

over 1 year ago

What if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and 9 hours re-editing the Fellowship trailer to bring that vision to life—and I’ll show you exactly how I did it 👇🏼

anotherAtharva retweeted

ESPN

@espn

over 1 year ago

Let the good times roll 🎉 UC San Diego heads to the NCAA tournament for the FIRST TIME in school history 👏

anotherAtharva retweeted

over 1 year ago

You might have heard that top reasoning models now match AIME gold medalists in 2025 🏅, but watch them crumble at box-pushing Sokoban (倉庫番) from the '80s! 🧩 Again, we put top reasoning models into the game: o3-mini (medium) took the crown, reaching level 4 before getting tangled up with just two boxes. 😵‍💫 Claude-3.7-thinking managed two levels, DeepSeek-R1 cleared one, and Gemini-2.0-flash-thinking solved none.

anotherAtharva retweeted

over 1 year ago

Reasoning models often waste tokens on self-doubt. Dynasor saves you up to 81% of tokens while still arriving at the correct answer! 🧠✂️
- Probe the model halfway to gauge its certainty
- Use that certainty to stop reasoning early
- 100% training-free, plug-and-play
🎮 Demo: https://t.co/nDNILbJayQ
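
To make the "probe, then stop" idea concrete, here is a minimal Python sketch of certainty-based early exit. It is not Dynasor's implementation: probing after every reasoning chunk, the agreement `window`, and the rule "stop once the last `window` probed answers agree" are assumptions chosen for illustration, and the hypothetical `toy_probe` stands in for actually querying a model partway through its reasoning.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def reason_with_early_exit(
    chunks: Sequence[str],
    probe: Callable[[str], Optional[str]],
    window: int = 3,
) -> Tuple[Optional[str], int]:
    """Hypothetical certainty-based early exit; returns (answer, chunks used).

    After each reasoning chunk, `probe` asks for the model's current answer given
    the partial trace (None = no answer yet). Agreement among the last `window`
    probed answers is used as a stand-in for certainty: once they all match, stop.
    """
    trace = ""
    recent: List[Optional[str]] = []
    for used, chunk in enumerate(chunks, start=1):
        trace += chunk
        recent = (recent + [probe(trace)])[-window:]  # keep only the last `window` answers
        if len(recent) == window and recent[0] is not None and len(set(recent)) == 1:
            return recent[0], used
    # Never confident enough: fall back to the answer probed from the full trace.
    return (recent[-1] if recent else None), len(chunks)


# Toy stand-in for a model: it "knows" the answer once "x = 4" appears in the trace.
chunks = ["Let x be the unknown. ", "So x = 4. ", "Check: 4 works. ",
          "Wait, verify again. ", "Yes, 4. ", "Double-check... ", "Still 4. "]


def toy_probe(trace: str) -> Optional[str]:
    return "4" if "x = 4" in trace else None


print(reason_with_early_exit(chunks, toy_probe))  # prints ('4', 4)
```

In this toy run the loop stops after 4 of 7 chunks; the skipped chunks play the role of the self-doubting tokens described above. How often to probe and how much agreement to demand trade token savings against the risk of stopping on a wrong answer.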

over 1 year ago

@prajdabre1 @NeurIPSConf Congrats!

anotherAtharva retweeted

almost 2 years ago

We are excited to announce our lab's papers at #ICML2024! 🧠✨ Come and discuss our latest research, from LLM evaluation to efficient LLM serving and inference! See you there!

1️⃣ Poster: MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
📍 Location & Time: Poster session 1, Hall C 4-9 #816, 11:30 AM on Tuesday, July 23
📜 TL;DR: MuxServe boosts throughput when serving multiple LLMs by up to 1.8x through flexible spatial-temporal multiplexing.

2️⃣ Poster: Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
📍 Location & Time: Poster session 2, Hall C 4-9 #411, 1:30 PM on Tuesday, July 23
📜 TL;DR: An exact, parallel decoding algorithm that accelerates LLM decoding without needing auxiliary models or data stores.

3️⃣ Poster: Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
📍 Location & Time: Poster session 3, Hall C 4-9 #709, 11:30 AM on Wednesday, July 24
📜 TL;DR: Chatbot Arena is an open platform for evaluating LLMs on human preferences through crowdsourced pairwise comparisons, and it is becoming a widely cited leaderboard thanks to its robust and credible evaluation methods.

4️⃣ Poster: CLLMs: Consistency Large Language Models
📍 Location & Time: Poster session 4, Hall C 4-9 #604, 1:30 PM on Wednesday, July 24
📜 TL;DR: We introduce a new family of LLMs optimized for fast Jacobi decoding, achieving a 2.4x to 3.4x improvement in generation speed across multiple benchmarks without compromising quality.

5️⃣ Poster: Online Speculative Decoding
📍 Location & Time: Poster session 5, Hall C 4-9 #605, 11:30 AM on Thursday, July 25
📜 TL;DR: OSD improves the efficiency of large language model inference by continuously updating the draft models with user query data, significantly reducing latency and increasing token acceptance rates.

6️⃣ Poster: InferCept: Efficient Intercept Support for Augmented Large Language Model Inference
📍 Location & Time: Poster session 5, Hall C 4-9 #709, 11:30 AM on Thursday, July 25
📜 TL;DR: InferCept is the first inference framework for augmented LLMs, efficiently serving LLMs that can query tools, ML models, and virtual environments.

anotherAtharva retweeted

about 2 years ago

People often see LLMs as sequential decoders, but we show they can be easily adapted as fast parallel decoders! 🔥🚀

Announcing consistency LLMs: teaching LLMs to predict the fixed point from any point on their Jacobi decoding trajectory.
- LLMs can fast-forward through token generation.
- 3.4x speedup, no extra cost, no draft model.

Details: https://t.co/oKE7pWf497
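
Plain Jacobi decoding, the procedure consistency LLMs are trained to speed up, can be sketched in a few lines of Python. This is an illustrative sketch, not the CLLM code: `next_token` is a hypothetical deterministic greedy predictor standing in for an LLM forward pass, and each list comprehension plays the role of one parallel forward pass over the whole n-token block. The loop stops at the fixed point, which coincides with greedy autoregressive output.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical greedy predictor: maps a context to its argmax next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(next_token: NextToken, prefix: List[Token], n: int) -> List[Token]:
    """Baseline: n sequential steps, one token per step."""
    out = list(prefix)
    for _ in range(n):
        out.append(next_token(out))
    return out[len(prefix):]


def jacobi_decode(
    next_token: NextToken, prefix: List[Token], n: int, init: Token = 0
) -> Tuple[List[Token], int]:
    """Jacobi decoding of an n-token block; returns (tokens, iterations).

    Every position i is refreshed in parallel from the previous guess:
    new[i] = next_token(prefix + guess[:i]). Position i is guaranteed correct
    after i + 1 iterations, so at most n iterations are needed, and the loop
    exits early once the guess stops changing (the fixed point).
    """
    guess = [init] * n
    for it in range(1, n + 1):
        new = [next_token(prefix + guess[:i]) for i in range(n)]  # one "parallel pass"
        if new == guess:
            return guess, it
        guess = new
    return guess, n


def toy_model(ctx: List[Token]) -> Token:
    """Hypothetical toy predictor whose output depends on the whole context."""
    return (3 * sum(ctx) + len(ctx)) % 7


prefix = [1, 2]
tokens, iters = jacobi_decode(toy_model, prefix, n=5)
assert tokens == greedy_decode(toy_model, prefix, n=5)  # fixed point == greedy output
print(tokens, iters)

# A predictor that ignores context converges almost immediately.
print(jacobi_decode(lambda ctx: 3, prefix, n=5))  # prints ([3, 3, 3, 3, 3], 2)
```

With the context-free toy predictor the block converges in 2 iterations instead of 5 sequential steps. An off-the-shelf LLM often gains little from plain Jacobi iteration because it rarely gets several future tokens right at once; teaching the model to jump to the fixed point is what the consistency training targets.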

anotherAtharva retweeted

over 2 years ago

Still optimizing throughput for LLM serving? Think again: goodput might be a better choice! Splitting prefill and decode onto different GPUs yields:
- up to 4.48x higher goodput
- up to 10.2x stricter latency criteria
Blog: https://t.co/pVNpYbR7Qq Paper: https://t.co/n47rFkMZS0
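
A simplified way to see why goodput and throughput can diverge: count only the requests that meet both a time-to-first-token (TTFT) and a time-per-output-token (TPOT) latency target. This Python sketch assumes that simplified definition and uses made-up, illustrative latencies; the linked paper frames goodput more precisely, in terms of the request rate that can be sustained while meeting an SLO-attainment goal, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    ttft_s: float  # time to first token (dominated by prefill), in seconds
    tpot_s: float  # average time per output token (dominated by decode), in seconds


def meets_slo(r: Request, ttft_slo_s: float, tpot_slo_s: float) -> bool:
    """A request counts toward goodput only if it meets both latency targets."""
    return r.ttft_s <= ttft_slo_s and r.tpot_s <= tpot_slo_s


def throughput_and_goodput(
    reqs: List[Request], window_s: float, ttft_slo_s: float, tpot_slo_s: float
) -> Tuple[float, float]:
    """Return (throughput, goodput) in requests per second over a time window."""
    good = sum(meets_slo(r, ttft_slo_s, tpot_slo_s) for r in reqs)
    return len(reqs) / window_s, good / window_s


# Illustrative, made-up latencies for 4 requests completed in a 2-second window.
reqs = [
    Request(ttft_s=0.2, tpot_s=0.03),
    Request(ttft_s=0.4, tpot_s=0.04),
    Request(ttft_s=0.9, tpot_s=0.02),  # misses the TTFT target
    Request(ttft_s=0.3, tpot_s=0.045),
]
print(throughput_and_goodput(reqs, window_s=2.0, ttft_slo_s=0.5, tpot_slo_s=0.05))
# prints (2.0, 1.5)
```

Under this reading, a system can report high throughput while goodput lags whenever many requests miss either latency target, which is why tightening the targets can change which serving design looks best.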

over 2 years ago

@amitp_ai @francoisfleuret @intel @AMD Yep, but I had a similar experience on Gaudi 2s as well.
