Masahiro Hiramori

@mshrh3

@PyTorch Ambassador. @ApacheTVM committer. Edge LLM Infrastructure Engineer. Creator of the Verilog-HDL/SystemVerilog extension for VS @code. Views are my own.

Greater Tokyo Area, Japan

Joined April 2020

103 Following

77 Followers

356 Posts

mshrh3 retweeted

Ruihang Lai @ruihanglai

16 days ago

Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you wade through to change it isn't.

We are likely not alone in this experience: many researchers we’ve talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?

So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, on four principles:

- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks

Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain it finishes with up to 62% fewer turns and 64% less GPU time than production frameworks, while training just as fast.

We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.

Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu, @junrushao, Todd Mowry, @XiongChenyan and @tqchenml.

Blog: https://t.co/byOKPs9rGQ
Code: https://t.co/AH5ZbwYluV
Paper: https://t.co/hkmDGx9Hc6

mshrh3 retweeted

Mark Saroufim

@marksaroufim

28 days ago

It was an honor to give the keynote at MLSys.
Covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto.
Recording should be out soon; in the meantime, slides: https://t.co/5pbyUHTAVC

mshrh3 retweeted

The Linux Foundation

@linuxfoundation

30 days ago

🚨 Event Update 🚨
MCP Dev Summit Tokyo is expanding its scope to become #AGNTCon + #MCPCon Japan (Sept 10-11)! We're covering the full #AgenticAI stack alongside deep-dive #MCP tech.
📣 Submit to speak! The CFP is open through Friday, May 29.
Learn more + submit your talk today 👇
https://t.co/XiZB4rIhIB

mshrh3 retweeted

Edward Z. Yang @ezyang

about 1 month ago

Thanks to @LaithSakka, we now have a shared developer log at https://t.co/ZlLiuMhFLE on all sorts of PyTorch things. The way to think about it: classic Meta culture is to build things and then post about them in the internal Workplace. Now we ask people to repost them here!

mshrh3 retweeted

PyTorch

@PyTorch

about 1 month ago

From #Kernel Engineering to Responsible #AI: We want your voice at #PyTorchCon North America in San Jose (Oct 20-21). 🔥 Submit your talk by June 7 & join the world's leading AI innovators. #CallForProposal submissions: https://t.co/hLlKK7WxLD

mshrh3 retweeted

Yixin Dong @yi_xin_dong

about 1 month ago

Introducing XGrammar-2: structured generation for complex agent harnesses.

Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡

From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way.

Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into latest models and products.

🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more

Excited to see what agent builders create with it!

Blog: https://t.co/N0Tbl588BH
GitHub: https://t.co/lo4yScuI2f

Masahiro Hiramori @mshrh3

about 2 months ago

Experimenting with llm-jp-4 to GGUF conversion / https://t.co/tan4CeqZVq

Masahiro Hiramori @mshrh3

about 2 months ago

🚀FlexAttention is expected to land in ONNX as a preview op in the next version! Hope this helps accelerate adoption across the ONNX ecosystem. Feedback welcome🙌 Let’s make modern LLMs easier to export, deploy, and run from edge to cloud💪 https://t.co/zeuGe5uRkk

mshrh3 retweeted

Lysandre

@LysandreJik

about 2 months ago

We're building our @PyTorch-dedicated team at Hugging Face!

First item: speeding up torch.mps for 100x perf.

@Is36E has been killing it:
- torch.{sort,multinomial} as MPS shaders
- soon flex attention
- 5x ⚡️loading safetensors in mps

What other ops should we focus on? https://t.co/uqK0eBrqek

Masahiro Hiramori @mshrh3

about 2 months ago

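Chief Handoutai Officer / "Why did Apple merge the heads of 'semiconductors' and 'products'? The 'newly created CHO' that matters more than Cook's departure, and the ultimate vertical integration: Masakazu Honda's Crossover Digital (page 1/4) - ITmedia PC USER" https://t.co/J2dJ464llF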

mshrh3 retweeted

Lysandre

@LysandreJik

about 2 months ago

We're opening a Hugging Face office in Tokyo!

Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!

Hugging Face's Tokyo office is now open!

Our goal is to support the development of open-source AI in Japan and to grow the local community. We would love to meet you!

mshrh3 retweeted

kepano

@kepano

2 months ago

I can't go back to the regular YouTube UI after this 😅 Obsidian Reader now makes the transcript interactive so you can scrub, highlight, auto-scroll. It feels so nice.

mshrh3 retweeted

dalance @dalance1982

2 months ago

Successfully booted Linux on the Veryl simulator. The simulator's stability also seems to have improved considerably.

mshrh3 retweeted

Pierre-Antoine Bannier

@el_PA_B

2 months ago

sam3.cpp - Meta's SAM 3 in pure C++ with @ggerganov's ggml

- Supports SAM 3.1, 3, 2.1, 2 and EdgeTAM
- FP16, 4-bit quant (EdgeTAM in 15 MB)
- Apple Metal GPU, CUDA, CPU
- Text-prompted: "peach" → every peach
- Single-file C++14

Performance-wise:
- 100ms object detection, segmentation
- Video object segmentation @ 20FPS on M4 Pro with EdgeTAM

https://t.co/XHC7ipyQtI

mshrh3 retweeted

PyTorch

@PyTorch

3 months ago

🔥 CFP is LIVE for #PyTorchCon North America 2026! Submit a talk or poster for Oct 20–21 in San Jose. Topics span training, inference, kernel engineering, responsible AI, & more. Deadlines: June 7 (talks) · July 26 (posters). Learn more + submit: https://t.co/Mz2gMtNnFc Super early bird reg ends April 10. Save up to $500: https://t.co/1z0jDhdUZm

Masahiro Hiramori @mshrh3

3 months ago

NVIDIA DGX Spark✨

Masahiro Hiramori @mshrh3

3 months ago

I've been using the ChatGPT Pro model for a month, and it feels way smarter than the Thinking model. If OpenAI lets us use the Pro series in Codex too, I think they could take over the market.

mshrh3 retweeted

Prof. Anima Anandkumar

@AnimaAnandkumar

4 months ago

We’re excited to release TorchLean, the first fully verified neural network framework in Lean.

The Lean community has largely focused on pure mathematics. TorchLean expands this frontier toward verified neural network software and scientific computing. With the recent release of CSlib, we see this as another step toward a fully verified ML stack.

Features we support:

1. Executable IEEE-754 floating-point semantics (and extensible alternative FP models); verified tensor abstractions with precise shape/indexing semantics
2. Formally verified autograd system for differentiation of NN programs; proof-checked certification / verification algorithms like CROWN (robustness, bounds, etc.)
3. PyTorch-inspired modeling API with eager-style development + export/lowering to a shared IR for execution and verification

Project page: https://t.co/YHpqhRbMQe
Paper: [2602.22631] TorchLean: Formalizing Neural Networks in Lean

Work done by @Robertljg, Jennifer Cruden, Xiangru Zhong, @huan_zhang12 and @AnimaAnandkumar.

#MachineLearning #ScientificComputing #Lean