はるやま | Makoto Haruyama @Spring_MT - Twitter Profile

about 1 month ago

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️
Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:
Paper: https://t.co/3Avj8N8iYO
Blog: https://t.co/SqFkkKvkbd
Code: https://t.co/PHSzMq8pg0
While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, with a large part of this cost coming from their feedforward layers. Yet, an interesting phenomenon occurs inside these layers: For any given token, only a small fraction of the hidden activations actually matter. The rest approximate zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance.
So, can we leverage this sparsity to make LLMs faster? The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations.
Our contribution is twofold:
1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution.
2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes.
We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy.
This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!

21

757

118

418

408K

Spring_MT retweeted

Walden

@walden_yan

about 1 month ago

Cool to see failure modes of different coding agents in new report from @greptile - seems like Devin is better than humans in almost all categories!

8

194

25

63

61K

Spring_MT retweeted

Claude

@claudeai

about 2 months ago

Memories are stored as files, so developers can export them, manage them via the API, and keep full control over what agents retain. Read more: https://t.co/PcfYg5sFxe

52

762

52

313

218K

Spring_MT retweeted

ClaudeDevs

@ClaudeDevs

about 2 months ago

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.

2K

40K

3K

6K

6M

はるやま | Makoto Haruyama @Spring_MT

about 2 months ago

I wonder if we are heading back to the era when we get called "IT construction laborers" (IT土方).

0

1

0

120

はるやま | Makoto Haruyama @Spring_MT

about 2 months ago

Is it right to say that when work is painful it is an organizational problem, and when it is not fun it is a personal problem?

0

1

0

227

Spring_MT retweeted

Anthropic

@AnthropicAI

2 months ago

New on the Engineering Blog: Building Managed Agents—our hosted service for long-running agents—meant solving an old problem in computing: how to design a system for “programs as yet unthought of.” Read more: https://t.co/YYaEub2QGV

390

4K

457

2K

574K

はるやま | Makoto Haruyama @Spring_MT

3 months ago

I'm getting FDE off the ground together with our CTO! We also have open FDE positions, so get in touch if you're interested. https://t.co/1T5uZ5DqES

0

34

8

10

5K

Spring_MT retweeted

DHH

@dhh

3 months ago

*Kaigi on Rails, not RubyKaigi. Shibuya. You should come!

3

127

11

3

22K

はるやま | Makoto Haruyama @Spring_MT

4 months ago

https://t.co/XW4oTmOXuW This is serious.

0

177

Spring_MT retweeted

ナレッジワーク @kworkcom

4 months ago

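[New solution announcement] We have launched "ナレッジワークカスタマイズAIエージェント" (Knowledge Work Customized AI Agent), which uses AI to support and carry out the complex work specific to each company. Specialist consultants build AI agents customized to your company's operations, transforming how work is done and realizing an AX that produces results.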

1

55

12

27

16K

Spring_MT retweeted

株式会社AgenticSec

@AgenticSecJP

6 months ago

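We have started distributing the white paper for "RapidPen", our AI-powered automated penetration testing product. It covers the problems and use cases RapidPen addresses, along with experimental results on Hack The Box (a penetration testing training site). Feel free to request a copy via the link below. https://t.co/BxpW35VdDT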

0

6

1

4

7K

はるやま | Makoto Haruyama @Spring_MT

6 months ago

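While I was watching a Mitene 1-second video with my child, they said, "It's packed full of heart," and I was really moved, thinking how much is packed in there: the fun, the funny, the sad. Thank you, Mitene, for the videos that are always packed full of heart. #mitene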

0

2

0

192

はるやま | Makoto Haruyama @Spring_MT

7 months ago

Mechanism that sets the spacing of leaf serrations discovered; role of the "regulator" clarified (Mainichi Shimbun) #Yahooニュース https://t.co/mqgjcIsUHT

0

193

Spring_MT retweeted

ゆっきー | SmartBank

@yuki930

8 months ago

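Have Claude Code build a tool → publish it on GitHub → deploy it to Streamlit and use it = way too convenient. That is what I realized this Sunday. This one is an app where you set a goal, upload CSV data from your bathroom scale, and it draws a graph for you. https://t.co/VoQuESk3WU https://t.co/XK9IzbnG50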

0

11

2

7

3K

Spring_MT retweeted

ゆっきー | SmartBank

@yuki930

7 months ago

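This morning. Daughter: "I want a PAW Patrol backpack!" Me: "Hmm, where would they sell one?" Daughter: "Maruha Nichiro." Me: "Maruha Nichiro?" Son: "Maruha Nichiro will change to Umios starting March 2026." Me: "?!?!" The power of commercials is amazing 🤣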

0

3

1

0

797

はるやま | Makoto Haruyama @Spring_MT

7 months ago

I tried out toon, but it turns out there are also patterns where it increases...

0

130

Spring_MT retweeted

Andy Jassy

@ajassy

7 months ago

New multi-year, strategic partnership with @OpenAI will provide our industry-leading infrastructure for them to run and scale ChatGPT inference, training, and agentic AI workloads. Allows OpenAI to leverage our unusual experience running large-scale AI infrastructure securely, reliably, and at scale. OpenAI will start using AWS’s infrastructure immediately and we expect to have all of the capacity deployed before end of next year-- with the ability to expand in 2027 and beyond. https://t.co/l2xXvEEPn3