罗杰斯 @dhbrojas - Twitter Profile

dhbrojas retweeted

3 days ago

June 9th Researcher Reciprocity License "if you train on it, you let us generate - reverse terms of use void"
Status quo
1. We teach frontier devs with ICLR/NeurIPS papers, OSS Github contributions
2. They use it to make frontier models
3. Then ban us from exploring our ideas
We need a new license, original thinkers can't be an underclass to a tyrannical researcher fiefdom

31

959

100

95

107K

dhbrojas retweeted

Daniel Auras

@rasdani_

3 days ago

this is the biggest wake-up call to protect and nourish open source AI if you don't build out sovereign and independent models+infra closed labs will patronize you to an insulting degree

36

2K

273

152

69K

dhbrojas retweeted

Lazarz

@Laz4rz

5 days ago

To my future CEO: 👉👈🙄

1

37

1

3

5K

dhbrojas retweeted

JB

@JasonBotterill

8 days ago

Anthropic employees are fucking depressed

136

8K

404

1K

646K

罗杰斯

@dhbrojas

9 days ago

Please @tenstorrent, deploy DSV4 with 1M context at 500 TSU in the API and I will pay no matter the price 🙏

0

2

0

219

罗杰斯

@dhbrojas

10 days ago

You’re probably not bullish enough on RLM and context management / tool calling as code

0

97

罗杰斯

@dhbrojas

12 days ago

Chinese providers are on fire

Hot Aisle

@HotAisle

13 days ago

https://t.co/DISt8UrhX3 "On the flip side, we have the "F-Tier". Providers like https://t.co/w4vKh0dycR, AkashML, SambaNova, and Nebius are clocking in at exactly 0.0% cache hit rates across the models." https://t.co/sFfAJOQuyo

2

9

2

9

3K

0

1

0

322

dhbrojas retweeted

tomie

@tomieinlove

14 days ago

It seems Anthropic revenue has been bolstered by widespread “Who Can Give Anthropic the Most Money” tournaments.

10

3K

93

97

94K

罗杰斯

@dhbrojas

14 days ago

@boopdotpng That’s awesome man. I hope that with the ISA documentation progress and simulator a no-TT-dependency Blackhole stack is possible within the next 12-24 months 🤞🏻

0

4

0

164

罗杰斯

@dhbrojas

15 days ago

@Shiwei_Liu66 Great work as always. You guys deserve more compute! I always want to see your stuff tried at larger scales

0

1

0

228

dhbrojas retweeted

Shiwei Liu

@Shiwei_Liu66

15 days ago

🚀 New paper: One LR Doesn’t Fit All for Transformers
Arxiv: https://t.co/vmJC3XKRNU
Transformers look like homogeneous stacks. They are not. Modern Transformers are highly heterogeneous: attention layers, FFN layers, embeddings, and different depths can have very different training dynamics. But we still give them the same learning rate.
In our new paper, we show that the shape of weight spectrum can diagnose this heterogeneity and turn it into a practical optimizer design: layerwise learning rates. Weakly trained layers get larger updates. Well-trained layers get protected.
It works for both AdamW and Muon, and the improvement with Muon is even more considerable.
The result is better module utilization, faster convergence, and stronger generalization: up to 1.5× training speedup across LLaMA/GPT-style models.

13

239

25

206

13K

罗杰斯

@dhbrojas

15 days ago

Are there any labs/researchers working on reducing the hyper-parameter surface of optimisers, large training runs in general? So much money wasted in ablations!

0

1

93

罗杰斯

@dhbrojas

15 days ago

Your university the second you graduate

Polymarket

@Polymarket

16 days ago

NEW: Nvidia CEO Jensen Huang to join board of prestigious Tsinghua University in Beijing.

76

2K

136

164

217K

0

3

0

200

罗杰斯

@dhbrojas

19 days ago

@JCzarlinski @__tinygrad__ Let me know what you’d like me to try, I may write a couple blog posts

0

1

0

97

罗杰斯

@dhbrojas

21 days ago

I will definitely regret this but the @__tinygrad__ backend is not going to write itself...

5

107

2

20

17K

罗杰斯

@dhbrojas

19 days ago

@igorjmichalak @tenstorrent Interesting, what’s the use case?

1

0

269

罗杰斯

@dhbrojas

19 days ago

@__tinygrad__ I don’t plan on using anything from @tenstorrent beside the driver. But yeah, the first part will be documenting everything. I’ll see how much can be extracted from the TT codebases. It will probably (P=0.95) fail, but hey, it’s fun to try!

0

11

0

895

罗杰斯

@dhbrojas

19 days ago

@__tinygrad__ Do you ever bring it up to @jimkxa? What does he say?

0

6

0

601

dhbrojas retweeted

the tiny corp

@__tinygrad__

20 days ago

@dhbrojas Looks like it's still missing a lot of Blackhole. It's crazy to me that $10M+ tapeouts are done without a full spec of each instruction + cycle accurate simulator.

2

53

1

4

4K
