bocchi fan

@rusty_coconut

🦕🦖

Joined December 2016

4.7K Following

172 Followers

779 Posts

rusty_coconut retweeted

6 days ago

New podcast with @finbarrtimbers! We survey the latest post-training recipes, from GLM 5.1, Kimi K2.6, DeepSeek V4, Xiaomi MiMo V2.5, Nemotron Ultra, etc. and discuss:
- Why the industry slowly shifted to multi-teacher on-policy distillation (MOPD).
- What an Olmo-style recipe would need improvements in
- How post-training works / suits larger organizational efforts
- Career advice in the foothills of the singularity
- and other topics

I heard y'all wanted me to start doing this, so making some time when I'm in funemployment!

Chapters:
00:00 Introduction & Olmo reflections
06:28 Post-train recipes review (history)
23:00 2026’s model recipes (MiMo Flash, DeepSeek V4, GLM 5, Kimi K2.6, etc.)
39:05 Open-ended post-training discussions
48:22 Career advice in the LLM race

Links below, please follow @interconnectsai and like and subscribe and buy my book?

13

314

38

307

33K

bocchi fan @rusty_coconut

6 days ago

@threepointone The king of grifting, who claimed to have a fine-tuned Llama model that was just a wrapper around Sonnet 3.5, is back at it again

1

4

0

0

676

bocchi fan @rusty_coconut

16 days ago

OpenAI should publish a blog post on the longest-running and most token-hungry goals, just for fun

0

2

0

0

13

bocchi fan @rusty_coconut

19 days ago

@tngtech Is it possible to make the tech talks public? They're currently unlisted. Making them public would make it easier to retrieve the transcripts and discuss the content with AI. Also very curious to see the other tech talks

0

0

0

0

12

bocchi fan @rusty_coconut

24 days ago

@nrehiew_ You should check out Asianometry if you haven't already: https://t.co/zthN0pG8A9 Amazing videos

0

0

0

0

107

bocchi fan @rusty_coconut

about 1 month ago

@MaxForAI https://t.co/PITNuzjPqK The main problem is the JAX ecosystem, plus needing to use PyTorch when serving

about 1 month ago

part of it is needing to use jax for training on tpus and then pytorch for inference on gpus and needing to switch back and forth all the time, then not being able to use any open source pytorch code for training research without porting that too. it can also be harder to debug issues on tpu vs gpu etc.

0

20

0

2

4K

0

1

0

0

329

bocchi fan @rusty_coconut

about 1 month ago

Reminds me of Ted Chiang's The Evolution of Human Science story. "In the face of metahuman science, humans have become metascientists."

about 1 month ago

I think we are in the process of discovering that humans are bad at mathematics. A gibbon would scoff at an Olympic climber; the human body is not optimized for climbing. We're getting mounting evidence that our brain may be far from optimal for advanced math. No disrespect to mathematicians. I was a two-time IMO silver medalist; I'm just smart enough to appreciate that some people are much, much smarter. But it's starting to look like math is somewhere on the midpoint of Moravec’s paradox; between chess (computers surpassed us some time back) and cooking (probably many years to go, for general capabilities). It's fairly hard for us, and so it looks like computers are going to surpass us. AI math still has important weaknesses. For instance, AI systems have not yet shown any ability to identify interesting research directions, or develop new concepts on which further work can build. But they are starting to look superhuman in some respects. And once AI *starts* to become superhuman in some domain, we all know what happens next.

184

2K

207

648

660K

0

3

0

0

74

bocchi fan @rusty_coconut

about 1 month ago

0

0

0

0

15

bocchi fan @rusty_coconut

about 1 month ago

@ts1mm @ClaudeDevs https://t.co/IdiUf9OBgR This video is super helpful in covering this. As batch size grows, the weight-fetching cost gets amortized, so there are diminishing returns for a slow mode (slower than regular inference, with a larger batch size and window)

0

0

0

0

234
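A rough sketch of the amortization point in the reply above, assuming a simplified decode step where the model weights are fetched once and shared by every sequence in the batch. The function names and the 100-unit weight size are hypothetical, chosen only for illustration; real serving costs also include KV-cache reads and compute.

```python
def weight_traffic_per_sequence(weight_bytes: float, batch_size: int) -> float:
    """Weight bytes attributed to each sequence in one decode step.

    Assumes the weights are read once per step and shared by the whole batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return weight_bytes / batch_size


def saving_from_one_more_sequence(weight_bytes: float, batch_size: int) -> float:
    """How much per-sequence weight traffic drops when the batch grows by one."""
    return (weight_traffic_per_sequence(weight_bytes, batch_size)
            - weight_traffic_per_sequence(weight_bytes, batch_size + 1))


if __name__ == "__main__":
    weights = 100.0  # hypothetical weight size in arbitrary units
    for b in (1, 2, 4, 8, 16, 32):
        per_seq = weight_traffic_per_sequence(weights, b)
        saving = saving_from_one_more_sequence(weights, b)
        print(f"batch={b:>2}  per-sequence={per_seq:7.3f}  next-saving={saving:7.3f}")
```

Under this assumption the per-sequence cost falls as 1/batch, so each extra sequence saves less than the one before it, which is the diminishing return described.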

bocchi fan @rusty_coconut

about 1 month ago

@jerryjliu0 @mintlify Late-interaction models like ColBERT look really promising https://t.co/HEDQ62TQUI

@lateinteraction

4 months ago

Wow. It’s absolutely preposterous that ColBERTv2, a 100M parameter retriever, still fricking outperforms Qwen3-Embed-8B, an 80x bigger dense retriever. ColBERTv2 was trained by one dude in 2021 on 4 A100s for 4 days, on top of puny BERT-base. Single-vector models hold IR back.

lateinteraction's tweet photo: https://t.co/SOT1YvmJfs

12

472

32

305

43K

0

2

0

2

1K
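For context on the single-vector vs. late-interaction contrast in the quoted tweet, here is a minimal sketch of ColBERT-style MaxSim scoring: each query token embedding is matched to its best document token embedding, and the best matches are summed. The toy 2-d vectors and function names are hypothetical; a real system would use learned, normalized token embeddings and an index rather than plain lists.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the max similarity
    to any document token."""
    if not query_vecs or not doc_vecs:
        raise ValueError("query and document need at least one token vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


if __name__ == "__main__":
    # Hypothetical token embeddings, one vector per token.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.5, 0.5]]
    doc_b = [[0.5, 0.5]]
    print(maxsim_score(query, doc_a))
    print(maxsim_score(query, doc_b))
```

A single-vector retriever would instead compress each side into one embedding and compute one dot product, so the per-token matching above has no equivalent there.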

bocchi fan @rusty_coconut

about 1 month ago

Kids will love the dinosaur version of Wingspan for sure

0

3

0

0

38

rusty_coconut retweeted

about 1 month ago

Virgin egocentric data collectooor vs. mecha-mounted teleop chad

__Rhodium__'s tweet photo: https://t.co/HnQMNNcaJH

18

1K

81

144

70K

bocchi fan @rusty_coconut

about 1 month ago

Everyone's talking about LLM benchmarks, but we need more manufacturers hill-climbing on consumer hardware accuracy, such as heart rate and sleep tracking measured against EEG and the highest-accuracy devices, like what The Quantified Scientist channel is doing

0

1

0

0

40

rusty_coconut retweeted

TNG Technology Consulting GmbH

about 1 month ago

https://t.co/7bzOltNaLi

0

18

5

2

7K

bocchi fan @rusty_coconut

about 1 month ago

@timourxyz yeah it's really annoying how a lot of the providers don't offer near-real-time data and also want to sell subscriptions

0

0

0

0

31

bocchi fan @rusty_coconut

about 1 month ago

Gen Alpha chip nerds will read SemiAnalysis first, discover AnandTech later, and act like they personally unearthed the Dead Sea Scrolls of cache latency.

0

0

0

0

16

bocchi fan @rusty_coconut

about 1 month ago

ProgramBench, but for porting programs to Rust given the source code?

0

0

0

0

17

bocchi fan @rusty_coconut

about 1 month ago

Ma Jiaqi: the Chinese version of a goblin-mode-style investigation. Love this kind of investigation.

@RyanLeeMiniMax

about 1 month ago

https://t.co/wxsraJGsnk

13

329

45

196

533K

0

0

0

0

40

rusty_coconut retweeted

about 2 months ago

How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

jyangballin's tweet photo: https://t.co/8ayeDJLXaJ

104

2K

246

659

729K

bocchi fan @rusty_coconut

about 2 months ago

@wordgrammer This is DeepSeek, but it's using up water in China instead

0

1

0

0

14
