@PyTorch Ambassador. @ApacheTVM committer. Edge LLM Infrastructure Engineer. Creator of Verilog-HDL/SystemVerilog for VS @code extension. Views are my own.
Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you have to wade through to change it isn't.
We're likely not alone in this; many researchers we’ve talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?
So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, built on four principles:
- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks
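To make "no implicit indirection" concrete, here is a hypothetical contrast (not PithTrain code; `MODEL_REGISTRY`, `register`, and `ToyMoE` are made-up names). In the registry style, the concrete class is resolved at runtime from a string key, so a reader or agent must trace the lookup to find the code that runs. In the direct style, the call site names the class.

```python
# Hypothetical example, not PithTrain code.

# Style A: registry-based dispatch (implicit indirection).
MODEL_REGISTRY = {}

def register(name):
    """Decorator that records a class under a string key."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

@register("toy-moe")
class ToyMoE:
    def __init__(self, num_experts):
        self.num_experts = num_experts

def build_from_config(config):
    # The concrete class comes from a string lookup, so finding the code
    # that actually runs means tracing the registry first.
    cls = MODEL_REGISTRY[config["model"]]
    return cls(**config["kwargs"])

# Style B: direct call. "Go to definition" (for a human or an agent)
# lands on ToyMoE immediately.
def build_direct():
    return ToyMoE(num_experts=8)

print(build_from_config({"model": "toy-moe", "kwargs": {"num_experts": 8}}).num_experts)  # 8
print(build_direct().num_experts)  # 8
```

Both paths build the same object; the difference is how many hops separate the call site from the class definition.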
Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain it finishes with up to 62% fewer turns and 64% less GPU time than production frameworks, while training just as fast.
We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.
Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu, @junrushao, Todd Mowry, @XiongChenyan and @tqchenml.
Blog: https://t.co/byOKPs9rGQ
Code: https://t.co/AH5ZbwYluV
Paper: https://t.co/hkmDGx9Hc6
It was an honor to give the keynote at MLSys
Covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto
Recording should be out soon; in the meantime, here are the slides.
🚨 Event Update 🚨
MCP Dev Summit Tokyo is expanding its scope to become #AGNTCon + #MCPCon Japan (Sept 10-11)! We're covering the full #AgenticAI stack alongside deep-dive #MCP tech.
📣 Submit to speak! The CFP is open through Friday, May 29.
Learn more + submit your talk today 👇
https://t.co/XiZB4rIhIB
Thanks to @LaithSakka, we now have a shared developer log at https://t.co/ZlLiuMhFLE on all sorts of PyTorch things. The way to think about it: classic Meta culture is to build things and then post about them in the internal Workplace. Now we ask people to repost them here!
From #Kernel Engineering to Responsible #AI: We want your voice at #PyTorchCon North America in San Jose (Oct 20-21). 🔥 Submit your talk by June 7 & join the world's leading AI innovators.
#CallForProposal submissions: https://t.co/hLlKK7WxLD
Introducing XGrammar-2: structured generation for complex agent harnesses.
Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡
From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way.
Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into the latest models and products.
🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more
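Structured generation of this kind generally works by masking the model's logits so that only tokens keeping the output valid can be chosen. Below is a minimal plain-Python sketch of that idea, assuming a tiny string vocabulary and a fixed list of allowed tool-call strings. It is not XGrammar-2's API (all names are hypothetical), and production engines compile a grammar ahead of time instead of scanning every target string at each step.

```python
import math

# Hypothetical sketch of logit masking for structured output; not XGrammar-2's API.

def allowed_token_mask(prefix, vocab, targets):
    """True for each token that keeps prefix + token a prefix of some target."""
    return [any(t.startswith(prefix + tok) for t in targets) for tok in vocab]

def constrained_greedy_decode(score_fn, vocab, targets, max_steps=32):
    """Greedy decoding where disallowed tokens are forced to -inf.
    Stops as soon as the output exactly matches a target."""
    out = ""
    for _ in range(max_steps):
        if out in targets:
            break
        logits = score_fn(out)  # one score per vocab entry
        mask = allowed_token_mask(out, vocab, targets)
        masked = [s if ok else -math.inf for s, ok in zip(logits, mask)]
        best = max(range(len(vocab)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise ValueError("no token can extend the output")
        out += vocab[best]
    return out

vocab = ['{"tool": "', "search", "sear", "ch", "calc", '"}', "hello", " "]
targets = ['{"tool": "search"}', '{"tool": "calc"}']
# A stand-in "model" that always prefers free text ("hello") over valid tokens.
scores = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4, 10.0, 5.0]
print(constrained_greedy_decode(lambda prefix: scores, vocab, targets))
# {"tool": "calc"}
```

Even though the stand-in model scores "hello" highest at every step, the mask only ever admits tokens that stay on a valid tool-call path.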
Excited to see what agent builders create with it!
Blog: https://t.co/N0Tbl588BH
GitHub: https://t.co/lo4yScuI2f
🚀FlexAttention is expected to land in ONNX as a preview op in the next version!
Hope this helps accelerate adoption across the ONNX ecosystem. Feedback welcome🙌
Let’s make modern LLMs easier to export, deploy, and run from edge to cloud💪
https://t.co/zeuGe5uRkk
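For readers new to FlexAttention: it is standard softmax attention where a user-supplied `score_mod` function rewrites each score before the softmax. A pure-Python, single-head reference of that math, as a sketch. PyTorch's actual `score_mod` also receives batch and head indices (`score, b, h, q_idx, kv_idx`), which are dropped here, and this says nothing about how the ONNX op itself will be specified.

```python
import math

def flex_attention_reference(q, k, v, score_mod=None):
    """Single-head attention with an optional score_mod(score, q_idx, kv_idx).
    q, k, v are lists of float vectors; returns one output vector per query."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if score_mod is not None:
                s = score_mod(s, i, j)  # modify the raw score before softmax
            scores.append(s)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out

def causal(score, q_idx, kv_idx):
    """Causal masking expressed as a score_mod: hide future positions."""
    return score if kv_idx <= q_idx else -math.inf

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(flex_attention_reference(q, k, v, causal)[0])  # [1.0, 0.0]
```

Masks, relative-position biases, and similar variants all become small `score_mod` functions over the same attention loop.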
We're building our @PyTorch-dedicated team at Hugging Face!
First item: speeding up torch.mps, aiming for 100x perf.
@Is36E has been killing it:
- torch.{sort,multinomial} as MPS shaders
- soon flex attention
- 5x faster ⚡️ safetensors loading on MPS
What other ops should we focus on?
We're opening a Hugging Face office in Tokyo!
Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!
I can't go back to the regular YouTube UI after this 😅
Obsidian Reader now makes the transcript interactive so you can scrub, highlight, and auto-scroll. It feels so nice.
sam3.cpp - Meta's SAM 3 in pure C++ with @ggerganov's ggml
- Supports SAM 3.1, 3, 2.1, 2 and EdgeTAM
- FP16, 4-bit quant (EdgeTAM in 15 MB)
- Apple Metal GPU, CUDA, CPU
- Text-prompted: "peach" → every peach
- Single-file C++14
Performance-wise:
- 100 ms object detection and segmentation
- Video object segmentation @ 20FPS on M4 Pro with EdgeTAM
https://t.co/XHC7ipyQtI
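A minimal sketch of block-wise 4-bit quantization, in the spirit of ggml's block formats such as Q4_0 (one scale per 32 weights) but not its exact encoding or rounding; the function names are hypothetical. Assuming a 16-bit scale per 32-weight block, storage works out to 4.5 bits per weight.

```python
# Hypothetical sketch of symmetric block-wise 4-bit quantization
# (in the spirit of ggml's Q4_0, not its exact encoding).

def quantize_q4_blocks(weights, block_size=32):
    """Split weights into blocks; store one float scale plus ints in [-8, 7] per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(x) for x in block)
        scale = amax / 7 if amax > 0 else 1.0  # largest magnitude maps to +/-7
        q = [max(-8, min(7, round(x / scale))) for x in block]
        blocks.append((scale, q))
    return blocks

def dequantize_q4_blocks(blocks):
    """Reconstruct approximate weights as scale * q."""
    return [scale * qi for scale, q in blocks for qi in q]

def bits_per_weight(block_size=32, scale_bits=16):
    """Storage cost: 4 bits per weight plus one scale per block."""
    return (block_size * 4 + scale_bits) / block_size

w = [0.1 * i - 1.5 for i in range(40)]
w_hat = dequantize_q4_blocks(quantize_q4_blocks(w))
# Rounding error per weight is bounded by half of its block's scale.
print(max(abs(a - b) for a, b in zip(w, w_hat)))
print(bits_per_weight())  # 4.5
```

Smaller blocks track local weight ranges more closely but spend more bits on scales; 32 is a common middle ground.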
🔥 CFP is LIVE for #PyTorchCon North America 2026! Submit a talk or poster for Oct 20–21 in San Jose. Topics span training, inference, kernel engineering, responsible AI, & more. Deadlines: June 7 (talks) · July 26 (posters). Learn more + submit: https://t.co/Mz2gMtNnFc
Super early bird reg ends April 10. Save up to $500: https://t.co/1z0jDhdUZm
I've been using the ChatGPT Pro model for a month, and it feels way smarter than the Thinking model. If OpenAI lets us use the Pro series in Codex too, I think they could take over the market.
We’re excited to release TorchLean, the first fully verified neural network framework in Lean. The Lean community has largely focused on pure mathematics. TorchLean expands this frontier toward verified neural network software and scientific computing. With the recent release of CSlib, we see this as another step toward a fully verified ML stack.
Features:
1. Executable IEEE-754 floating-point semantics (and extensible alternative FP models), plus verified tensor abstractions with precise shape/indexing semantics
2. A formally verified autograd system for differentiating NN programs, plus proof-checked certification/verification algorithms like CROWN (robustness, bounds, etc.)
3. A PyTorch-inspired modeling API with eager-style development + export/lowering to a shared IR for execution and verification
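To give a feel for what a certification algorithm computes: given a box of allowed inputs, it returns sound lower/upper bounds on the network's outputs. CROWN does this with linear relaxations; the plain-Python sketch below uses the simpler and looser interval bound propagation (IBP) instead, and is unrelated to TorchLean's actual Lean API. The function names are hypothetical.

```python
def affine_bounds(lower, upper, W, b):
    """Propagate an input box [lower, upper] through y = W x + b.
    Positive weights take the matching bound, negative weights the opposite one."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def ibp(layers, lower, upper):
    """layers: list of (W, b); ReLU between layers, none after the last."""
    for idx, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(lower, upper, W, b)
        if idx < len(layers) - 1:
            lower, upper = relu_bounds(lower, upper)
    return lower, upper

# f(x) = relu(x1 - x2) + relu(x1 + x2) on the box [-1, 1]^2
layers = [([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
print(ibp(layers, [-1.0, -1.0], [1.0, 1.0]))  # ([0.0], [4.0])
```

The bound is sound but loose: the true maximum of this f on the box is 2, while IBP reports 4. Closing gaps like this is what tighter methods such as CROWN are for.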
Project page: https://t.co/YHpqhRbMQe
Paper: [2602.22631] TorchLean: Formalizing Neural Networks in Lean
Work done with @Robertljg, Jennifer Cruden, Xiangru Zhong, @huan_zhang12 and @AnimaAnandkumar.
#MachineLearning #ScientificComputing #Lean