Jouni Helminen @dharmaone - Twitter Profile

Jouni Helminen

@dharmaone

16 days ago

Banger

hardmaru

@hardmaru

16 days ago

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

154

6K

637

4K

741K

0

3

0

162

dharmaone retweeted

OpenAI Developers

@OpenAIDevs

29 days ago

398

6K

411

546

1M

dharmaone retweeted

Bookmark Bro

@bookmarkbroski

about 1 month ago

Bookmarked something fire on X… then spent 15 minutes scrolling trying to find it again? 😩 We got you fam. Meet BookmarkBro — a beautiful native Mac app for browsing, super-fast search, tagging, and chatting with AI about your X bookmarks. All locally on your Mac. No privacy leaks. No new sign-ups. No cloud nonsense. Free in beta. More + download in the replies 👇

3

19

7

11

15K

dharmaone retweeted

Aman

@Amank1412

about 2 months ago

USING Claude Opus 4.7 TO CENTER A DIV

349

29K

2K

3K

2M

Jouni Helminen

@dharmaone

about 2 months ago

@DylanWeaver @HinataMotivates Dwarkesh is like a 1B-param model doing 1000 tok/s, stumbling over in a reasoning loop. Jensen is a 2T-param Yoda one-shotting it

1

8

0

342

Jouni Helminen

@dharmaone

about 2 months ago

@aakashgupta Was embarrassing to watch tbh. Dwarkesh doesn’t get it

0

8

0

1K

Jouni Helminen

@dharmaone

about 2 months ago

Great interview. Only one Codex model runs on Cerebras afaik: 5.3-spark. I’ve been testing it. It is very fast, but the quality isn’t great: tiny context window and not as good overall as 5.4. I think this is because the chip only has 44 GB of SRAM. @MatXComputing will have an interesting blend of SRAM (weights) and HBM (KV cache), and Nvidia will no doubt do more with Groq over time for fast inference of some workloads. Huang is right that Nvidia GPUs/CUDA are more general and more future-proof against architecture changes than TPUs optimised for current workloads/architectures. He also said that the main reason Anthropic is using TPUs is that Google and Amazon are large investors in them and Nvidia wasn’t able to invest early on. Not sure how true that is, but it was interesting. China doesn’t have access to the latest lithography for competitive power efficiency but will build EUV (or whatever comes after) capabilities eventually, likely in the next decade. They are moving pretty fast elsewhere (models obviously, but also fast 3D DDR5 from CXMT, Huawei etc. for processors). I think the chip ban is probably bad long term; it might have been better to keep them on Nvidia instead of accelerating home-grown alternatives

0

421

Jouni Helminen

@dharmaone

2 months ago

@claudeai this is the way. the executor could be a local model also, or a realtime voice model that does tool calling for complex tasks when needed but doesn't stop the voice conversation

0

1

0

1K

Jouni Helminen

@dharmaone

2 months ago

@elonmusk Ramanujan is a good example also. Speedrunning in representation space of compressed insights + intuition vs reasoning in language

0

1

0

502

Jouni Helminen

@dharmaone

2 months ago

@mustafasuleyman @MicrosoftAI Open weights?

0

137

Jouni Helminen

@dharmaone

2 months ago

@justbyte_ Basic, then pascal

0

16

Jouni Helminen

@dharmaone

3 months ago

@amix3k @soumo_dg it's a really great model and the optional reasoning and tool calling are great too. but i wonder how this will scale to every user on popular apps unless metered. some day a model like this will run on device

0

12

Jouni Helminen

@dharmaone

3 months ago

@bradneuberg @BoWang87 https://t.co/R7K5PKfWPD

0

34

Jouni Helminen

@dharmaone

3 months ago

@_DataStrategies @BoWang87 Yes

0

1

0

31

Jouni Helminen

@dharmaone

3 months ago

@cherry_cc12 Any s2s realtime voice model like Omni for qwen 3.5?

0

25

Jouni Helminen

@dharmaone

3 months ago

@elonmusk v cool. will AI4/5 be sold separately? And will you be able to use the AI4/5 chip in your car for other inference tasks (like Digital Optimus) while not driving?

0

39

Jouni Helminen

@dharmaone

3 months ago

@yezhang1998 From 48:24

0

12

Jouni Helminen

@dharmaone

3 months ago

This was a great recent interview: https://t.co/LsnwGKdlg2
Good fit for millions of low-complexity problems that are still unsolved and are verifiable.
Coding is a bit special in the sense that there is potential for RSI. Starting to see that with Karpathy’s new autoresearcher, AI-optimised CUDA kernels etc.

1

0

1

0

266

Jouni Helminen

@dharmaone

4 months ago

@tomjohndesign This plugin has worked very well for the same task, but it’s great to see Figma embrace Claude Code more. More excited to see design systems integration and two-way flows in the future https://t.co/ILvXRbeWXc

0

2

0

306

Jouni Helminen

@dharmaone

4 months ago

@bencera @ryancarson macOS/iOS STT APIs do the heavy lifting; the weights are bundled with the OS. Still, looks great and very useful

0

38
