Higgs Audio v3 TTS is here.
Built for voice AI that speaks, not just reads:
• 100 languages with single-digit WER/CER
• inline control over emotion, style, prosody, and sound effects
• API, Workspace, and open weights
• Blog 👉 https://t.co/C8frDlfO5D
Watch the demo 👇
🚀 The wait is over! Today at #MLSys, we'll give a talk to reveal the final results and present the awards for the FlashInfer AI GPU Competition! 🏆
I'll also introduce FlashInfer-Bench: an agent-oriented Benchmark Engine designed for production kernels.
Join us from 11:00 AM - 1:00 PM PT to see who takes the crown and learn more. Everyone is welcome to attend—see you there! ✨
🌐 Competition & Results: https://t.co/GS21eemEZv
💻 FlashInfer-Bench Benchmark Engine: https://t.co/rlzNUXJq5e
#FlashInfer #MLSys26 #AI #GPU
We open-sourced some amazing work from my colleagues at @nvidia: an experimental Rust compiler for GPUs. It takes a slightly different approach, exposing GPU programming concepts natively in Rust. Check it out: https://t.co/xR4Ho2LUMR
An OpenAI friend told me he burns 300M GPT-5.5 tokens/day.
The top user on his team burns billions of tokens/day, with Codex coding for them every night.
Databricks also gives engineers unlimited tokens.
We're looking for cracked inference engineers to join us at Databricks AI to produce trillions of tokens, insanely fast. DM me if you have:
- Contributed to open-source ML systems like SGLang/vLLM/PyTorch
- Experience serving LLMs at large scale
Databricks AI runs like a startup. Lots of exciting things to build!
#MLSys2026 is happening in two weeks! Our AI Infra team at @perplexity_ai is throwing a happy hour in Bellevue on May 19. Come chat with us about inference, post-training, RL, kernels, GPUs, RDMA, agents, anything... https://t.co/mXf7laYt1s
Introducing XGrammar-2: structured generation for complex agent harnesses.
Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡
From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way.
Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into the latest models and products.
🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more
Excited to see what agent builders create with it!
Blog: https://t.co/N0Tbl588BH
GitHub: https://t.co/lo4yScuI2f
Excited to share that I’ll be presenting SkyWalker at #EuroSys26 in Edinburgh tomorrow!🚀
We ask: Can we reduce the cost of multi-region LLM serving by cross-region offloading, without losing the benefits of KV-cache locality?
Talk: April 29, afternoon track A, ~16:20-16:40📍
This paper presents Event Tensor, an abstraction designed to simplify the compilation and execution of dynamic megakernels, with first-class support for both shape-dependent and data-dependent dynamism.
https://t.co/x399Y9onle