Huge congrats to @modal on Modal Auto Endpoints! 🎉
Owning your inference means owning the code that runs it, and we love seeing that philosophy in action 🫡 Honored that SGLang powers part of the stack, alongside the work on DFlash speculative decoding 🧡
You no longer have to pick between the performance of a black box API and the flexibility and control of @modal. Auto Endpoints give you both.
We're unlocking frontier performance for everyone without having to talk to sales or an FDE. More cooking here, stay tuned.
We spent a lot of time at @modal working with customers to figure out the best way to deploy frontier open-source models. I think we have a pretty good idea now.
https://t.co/FDgKr5tJOA
@diptanu fwiw snapshotting a sandbox depends greatly on the use case. In many agent-based use cases you don't know which dependencies the agent might decide to install, or when, which makes it a harder problem.
.wait_until_ready(), set, go
Building performant sandbox systems goes way beyond the initial container boot. We're unpacking what that means, and breaking down some tools to help you manage the entire lifecycle.
Speculation Is All You Need.
In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFlash speculators for @Alibaba_Qwen 3.x.
Over 1k output tokens per second for Qwen 3.5 122B-A10B on a B200.
Read the blog for why we're all-in on spec dec.
https://t.co/Bv3Zc95Xgh
Reinforcement learning has exploded on Modal, and we've been cooking.
Here's a review of lessons learned helping teams train at scale, the patterns we kept seeing, and an open-source library to get started with RL on Modal quickly.
Day 0 support for Step 3.7 Flash on Modal.
- 198B parameter MoE with 11B active
- 256K context
- 3 reasoning levels
- Native image & video understanding
Great to work with @StepFun_ai and @sgl_project on this one.
Step 4 to achieve truly serverless GPUs for AI inference: skip over unserializable inference engine setup steps like CUDA graph capture and Torch compilation by stacking GPU snapshots and CPU snapshots.