indirectjump @indirectjump - Twitter Profile

about 2 months ago

@hu_yifei You need NVFP4 + the flashinfer a2a backend enabled. The out of the box a2a kernel is dog slow and so is the Marlin int4 grouped gemm

2

0

1

169

indirectjump @indirectjump

3 months ago

@thdxr @meansoabstractn Lol just completely wrong. Deepseek inference margins are public. And they don’t even have NVL72s that literally give you 10x gains in inference throughput.

0

117

indirectjump @indirectjump

4 months ago

@ben_j_todd In what world is inference compute 50% unless you count free users? Your B200 numbers are way off too, since SemiAnalysis showed DS R1 can get >9k TPS/GPU with NVL72 + PD disagg.

0

1

0

214

indirectjump @indirectjump

5 months ago

@finbarrtimbers They’re moving up the stack to capture more margin. Fully intentional and imo the right decision if you have the best model.

0

2

0

70

indirectjump @indirectjump

5 months ago

@StijnSmits @eliebakouch xhigh thinks for 10000 years while opus takes 10 seconds

0

20

indirectjump @indirectjump

5 months ago

@tokenbender The thinking time is just too long unfortunately

0

1

0

36

indirectjump @indirectjump

5 months ago

@gallabytes You can just predict the distribution in RGB space instead of expectation of the distribution

0

33

indirectjump @indirectjump

5 months ago

@rosinality I think they really should’ve tested this with English/Chinese, having shared tokenization between Latin languages helps hugely imo

1

0

66

indirectjump @indirectjump

5 months ago

@zephyr_z9 Seems easy to raise with bonds considering their revenue

0

971

indirectjump @indirectjump

5 months ago

Returns in society are accruing to the tails more than ever and your average SWE is not ready for the wave that’s about to hit them

0

39

indirectjump @indirectjump

6 months ago

@zephyr_z9 @teortaxesTex If you’ve been to KR/JP it’s very obvious they were never going to win just based on culture. Same reason NYC hasn’t produced any notable work in this space. SF has enough non-grifters and risk takers still.

0

1

0

84

indirectjump @indirectjump

6 months ago

@teortaxesTex Increasingly top heavy. Impact & income will continue to concentrate exponentially more I think.

0

748

indirectjump @indirectjump

6 months ago

@FangYi11101 Tabelog is not that great tbh. Lots of 3.8-4.0 places that are mid. Beli in US is probably higher hit rate than Tabelog in JP.

1

3

0

1K

indirectjump @indirectjump

6 months ago

@teortaxesTex Better hardware = stack more SRAM. Already been tried (Cerebras, Groq, SambaNova). Diffusion doesn’t improve this if you decode 1 token at a time.

0

2

0

1

421

indirectjump @indirectjump

6 months ago

@_xjdr It saves them a couple iterations of synth data -> train -> repeat doesn’t it?

0

1

0

520

indirectjump @indirectjump

6 months ago

@suchenzang 4x the flops, what’s the point you’re trying to make here?

1

0

1K

indirectjump @indirectjump

6 months ago

@redtachyon I mean sure this works if you did ICPC but what you do on a daily basis is generally somewhat poorly correlated to interview questions

0

73

indirectjump @indirectjump

6 months ago

@rosinality https://t.co/XX7zGJWjiz

indirectjump @indirectjump

7 months ago

@yacinelearning It’s a big hack. The reason GRPO-like objectives are unstable with MoE is that a single gradient step can cause you to greatly exceed the clip bound due to rerouting, and then you won’t update to fix it bc you’re clipped. The right way to fix it is to stop clipping like CISPO.

1

6

0

5

539

1

0

153
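A minimal sketch of the clipping point in the reply above, not taken from the thread itself. It assumes the standard PPO/GRPO per-token surrogate min(r*A, clip(r)*A) and a CISPO-style objective in which the clipped importance weight is held constant (stop-gradient) and multiplies A * log-prob. The function names and the eps value are hypothetical, and the gradients are written out by hand with respect to the new log-prob.

```python
import math


def ppo_clip_grad(logp_new, logp_old, adv, eps=0.2):
    """Gradient w.r.t. logp_new of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    r = math.exp(logp_new - logp_old)
    clipped = min(max(r, 1 - eps), 1 + eps)
    # When the clipped branch is the smaller one, the objective is constant
    # in logp_new, so the token contributes no gradient at all.
    if clipped * adv < r * adv:
        return 0.0
    return r * adv  # d(r*A)/d(logp_new) = r*A


def cispo_style_grad(logp_new, logp_old, adv, eps=0.2):
    """Gradient w.r.t. logp_new of stop_grad(clip(r)) * A * logp_new."""
    r = math.exp(logp_new - logp_old)
    weight = min(max(r, 1 - eps), 1 + eps)  # treated as a constant
    return weight * adv


# A token whose ratio jumped to 2.0 (e.g. after an MoE rerouting step).
logp_old, logp_new, adv = 0.0, math.log(2.0), 1.0
print(ppo_clip_grad(logp_new, logp_old, adv))     # clipped: no update signal
print(cispo_style_grad(logp_new, logp_old, adv))  # bounded weight, gradient kept
```

Under these assumptions the clipped surrogate drops the token entirely once the ratio leaves the trust region, while the CISPO-style form keeps a bounded gradient, so later steps can still pull the ratio back.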

indirectjump @indirectjump

6 months ago

@joel_bkr @scaling01 Elicitation is harder than people think!

0

725

indirectjump @indirectjump

6 months ago

@ViaFloo @apples_jimmy Spending time on safety is exactly why they can do it. All modern RL techniques come from safety (oversight specifically).

0

1

0

78
