We won 1st place at the @JaneStreetGroup x @GPU_MODE Hackathon in NYC this weekend! 🚀⚡
Our challenge was to build a real-time inference server for an ensemble of sequential stateful models (Mamba2, xLSTM, etc.) processing streaming market data. The goal was to maximize the PnL of our trading system, which demanded high model accuracy while decreasing latency, maximizing throughput, and maintaining live uptime in production. We worked across the full inference stack, from high-level batching algorithms to GPU utilization optimizations.
Our production engine ultimately processed ~400 requests per second with ~30ms of latency, achieving the highest PnL of the competition! Our key techniques:
💡 Dynamic batching and state management to maximize throughput while preserving sequential inference accuracy
💡 Profiled and eliminated CPU <-> GPU communication overhead, removing synchronization points and bottlenecks
💡 Reduced kernel launch overhead with PyTorch optimizations like torch.compile
💡 Fast state expansion/reduction strategies to minimize batch latency
💡 Explored model quantization and custom Triton kernels to fuse operations and improve GPU utilization
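For anyone curious what "dynamic batching with state management" can look like, here is a minimal sketch. It is not our actual engine: the `Request`, `DynamicBatcher`, and `step_batch` names are hypothetical, the "model" is a toy scalar recurrence standing in for a Mamba2/xLSTM step, and reading "state expansion/reduction" as gather/scatter of per-stream states is just one possible interpretation. The idea it shows: batch requests from different streams together, but never put two requests from the same stream in one batch, so each stream still sees its inputs in order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    stream_id: str   # hypothetical key identifying one stateful stream
    value: float     # toy input; a real system would carry feature tensors


def step_batch(states, inputs, decay=0.9):
    """Toy stand-in for one recurrent model step over a whole batch."""
    new_states = [decay * s + x for s, x in zip(states, inputs)]
    outputs = list(new_states)  # here the output is simply the new state
    return new_states, outputs


class DynamicBatcher:
    def __init__(self, max_batch_size=4):
        self.max_batch_size = max_batch_size
        self.queue = deque()
        self.states = {}  # stream_id -> latest recurrent state

    def submit(self, req):
        self.queue.append(req)

    def run_once(self):
        """Build one batch (at most one request per stream), run it, store states."""
        batch, deferred, seen = [], [], set()
        while self.queue and len(batch) < self.max_batch_size:
            req = self.queue.popleft()
            if req.stream_id in seen:
                # A second request for the same stream depends on the state
                # produced by the first, so it waits for the next batch.
                deferred.append(req)
            else:
                seen.add(req.stream_id)
                batch.append(req)
        # Put deferred requests back at the front, keeping their order.
        self.queue.extendleft(reversed(deferred))
        if not batch:
            return []
        # Gather per-stream states into a batch (unseen streams start at 0.0).
        states = [self.states.get(r.stream_id, 0.0) for r in batch]
        new_states, outputs = step_batch(states, [r.value for r in batch])
        # Scatter the updated states back to their streams.
        for r, s in zip(batch, new_states):
            self.states[r.stream_id] = s
        return [(r.stream_id, o) for r, o in zip(batch, outputs)]


batcher = DynamicBatcher(max_batch_size=4)
for req in [Request("A", 1.0), Request("A", 2.0), Request("B", 3.0)]:
    batcher.submit(req)
first = batcher.run_once()   # streams A and B share a batch
second = batcher.run_once()  # A's second request runs on A's updated state
```

In a GPU setting the gather and scatter steps would be tensor indexing on device rather than Python lists, which is where most of the latency work would go.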
Huge thanks to @marksaroufim and @GuggerSylvain for designing a deeply interesting, open-ended technical challenge, running a smooth contest, and sharing real-world insights throughout!
The frontier has increasingly shifted to hybrid models - from Qwen to Kimi-Linear and now with NVIDIA's Nemotron-3 Super - that rely on a strong linear sequence model. Today we release Mamba-3, the most powerful linear model to date.
https://t.co/OpMmqEWMkP
I’m unreasonably excited about the fact that we wrote everything in Cute-DSL, embedded in Python. Installing / “compiling” now takes seconds instead of minutes / hours (looking at you, C++ templates). Try pip install fa4!
Check out our implementation at https://t.co/ttbyyLJH6c :)
This wouldn’t be possible without my incredible teammates Kyle Yu (https://t.co/5nka3EhSDu) and Aswinkumar (https://t.co/fnhIIB8Vgx) 🙌
Especially grateful to the Jane Street team and the broader GPU MODE community for giving us a taste of the demands of ML infra for low-latency trading, to Tri Dao and the PyTorch team for sharing their insights on the future of GPU programming models, and to CoreWeave and Northflank for the H100s and support! 💸
Today, we’re disclosing two 9.8 CVSS memory corruption vulnerabilities in the @NVIDIA Triton Inference Server that let attackers crash production AI services through malicious HTTP requests (CVE-2025-23310 and CVE-2025-23311) 🧵
We just wanted to cowork past 5pm. Turns out the entire city did too :)
@samanthaaouyang and I first crossed paths in Turkey earlier this year. In April, we reconnected at a women founders’ event and nerded out for hours about café culture, cities that sleep too early, and what it would take to build something different this summer.
Two days later, we had a full Notion doc and a dream for a late-night popup called Elsewhere. A third space that would be a little less lonely than working from home, but a little more magical than your usual café.
When @elsewhere_today launched on July 16th, we got way more inbound than expected: 100K+ views in 24 hours and 700+ new followers from almost nothing. Still, the real test: would SF actually show up?
400+ people bought in. During our first pop-up last Thursday, we had a line out of the door before 7pm. We served hojicha lattes, blue Thai tea, crepe cakes, fruit tarts, coconut macarons, and, of course, endless matcha. People brought their laptops and stayed there working until 11pm on passion projects. Someone even paid in USDC on Solana! (other chains tap in 👀)
At first, some skeptics on X were quick to dismiss “just another late-night café attempt.” But honestly, if we were able to make SF even 1% more alive that night, I think that’s beautiful :)
Huge thanks to the best team in the world: Sam, Akshaya (@akshayadinesh19), and Daniel (@Bluezmango123). Thank you to Fayeeza and the Entrepreneurs First team (@join_ef) for an incredible venue. And much love to Jade and Homeroom for an amazing collab!
Let us know what other cities to stop by! Stay tuned for more from Elsewhere ❤️
🔥 Semgrep is officially live on Cursor!
You can now harness the power of @semgrep directly in your AI coding assistant, combining fast, accurate static analysis with LLMs to help developers quickly ship code that’s secure from the start.
From securing code at leading AI companies to joining the @cursor_ai tools ecosystem, Semgrep is becoming essential for dev-first security in the modern stack.
Shoutout to our team for making this integration happen, and to our customers, partners, and community for pushing us forward 🚀
https://t.co/f0HxFhr02D
#Cursor #AppSec #DeveloperTools #SecureCoding #LLM #StaticAnalysis