evrazian_schizo

@rationaleist

Joined November 2012

4 Following

47 Followers

375 Posts

evrazian_schizo @rationaleist

about 2 hours ago

@teortaxesTex Zhipu could take the more straightforward Whale and Moonshot innovations, scale them to what they can handle, and apply the post-training framework they already have. It's also possible for DS and Kimi to step up their RL and iteration speed

0

3

0

0

41

evrazian_schizo @rationaleist

about 2 hours ago

@teortaxesTex Evidently even last gen 700B models are far from saturated and at this point every good checkpoint is an asset for developing the next one. The absolute moat is still ephemeral for now

1

8

1

0

698

evrazian_schizo @rationaleist

1 day ago

@teortaxesTex The cost is weird. I wouldn't call it more terse.

0

1

0

1

176

evrazian_schizo @rationaleist

1 day ago

@teortaxesTex If they applied current recipe sure, but RL will get better by then too. I was thinking 1.5T class, which should be manageable. I'm just really impressed by 5.1 -> 5.2 jump on general capabilities ig. I expected code but not everything else. Feels like they cracked something.

1

0

0

0

74

evrazian_schizo @rationaleist

1 day ago

@teortaxesTex Mythos tier is a tech achievement, but is it necessary for Fable-like capabilities by Q4? I doubt GLM-5.2 itself is in the same scale ballpark as the closed source models it's matching on either parameters or data.

1

0

0

0

128

evrazian_schizo @rationaleist

2 days ago

@teortaxesTex > gemini and kimi in the same category
offensive

0

2

0

0

227

evrazian_schizo @rationaleist

3 days ago

@teortaxesTex Fable less fried by Ant post-training?

0

1

0

0

263

evrazian_schizo @rationaleist

4 days ago

@teortaxesTex GLM-5.2 is like half the size of Opus. It's also architecturally a V3 family model. GLM-5, while my favorite for niche tasks, wasn't special and GLM-5.1 was codemaxxed and regressed on those tasks in my exp. Incredible things are happening in China.

0

0

0

0

96

evrazian_schizo @rationaleist

4 days ago

@teortaxesTex I mean, they ARE drones. The question is how to make UGVs cost less to deploy than the cost of UAVs needed to destroy them. A swarm of these covering each other in overlapping AA umbrella could create significant fire density against UAVs and bullets are still cheaper.

0

6

0

0

467

evrazian_schizo @rationaleist

5 days ago

@teortaxesTex > yuros will regulate US AI
It's over for SF boys

0

0

0

0

94

evrazian_schizo @rationaleist

6 days ago

@teortaxesTex Don't see why CSA wouldn't benefit from this too.

0

0

0

0

164

evrazian_schizo @rationaleist

6 days ago

@teortaxesTex Do you specify the development process flow? I mostly get these loops when it messes up custom CoT and freaks out because it can't erase the steps that are already in wrong order.

1

1

0

0

510

evrazian_schizo @rationaleist

7 days ago

@invizive @teortaxesTex @Donogzs Every architecture would have to "go through all the context" by virtue of constraints of information processing. Attn can be log(n) without fundamental changes. What does it have to do with inventing a better arch anyway? It's a specific capability threshold in math and ML.

0

0

0

0

9

evrazian_schizo @rationaleist

12 days ago

@teortaxesTex In a shocking development people saying "make us the sole members of permanent upper class for your own good and safety" mean exactly what it sounds like.

0

3

0

1

55

evrazian_schizo @rationaleist

about 1 month ago

@teortaxesTex I mean, makes sense if you think about it as "rotate the self-attention." In a sequence, token m represents what is discretely appended at position m, not the accumulated state.

0

0

0

0

151

evrazian_schizo @rationaleist

about 1 month ago

@akarlin The ground truth for SWE tasks and math is simple, sparse, and objective. "Good writing" is an extremely jagged shape that requires expensive human feedback and good heuristics. The former has high financial returns, the latter is niche atm. Resources are invested accordingly.

0

2

0

0

185

evrazian_schizo @rationaleist

about 2 months ago

@teortaxesTex @rasbt They explicitly reject both state tracking and SWA hybrids on their site. The specific claim seems to be DSA analog with a working linear indexer. Which would be both good and not that unrealistic but the "not your average transformer" marketing is so ass it's hard to believe.

1

4

0

0

343

evrazian_schizo @rationaleist

about 2 months ago

@teortaxesTex Long if factual but oozing scam energy. Why would you invent a buzzword name for your attention if you are not going to release the algo?

0

6

0

0

720

evrazian_schizo @rationaleist

about 2 months ago

@teortaxesTex Wouldn't that require strong multiturn?

1

1

0

0

200
