Every week @scale_AI, I read and break down a recent, relevant, important ML paper for non-ML folks.
Starting today, I'm sharing them publicly, scrubbed of any private info.
Below is the first one from the archives, covering a strange, insightful paper from Apple.
@ivanfioravanti Why not a higher quant? You have the room. On the same machine I'm using Qwen3.5 397B at 8bit and wondering if there's a better option now
@shreyasnsharma @Muennighoff 100%. In fact it would be strange *not* to see pass@k improvements at these values of k, as Yue et al. also show.
You'd also need to show a graph of entropy imo if you want to disprove the elicitation hypothesis
@peterwildeford Once Nvidia releases Nemotron 3 Ultra I think it'll be worth tracking them. They're getting serious about commoditizing their complement
@distributionat Fittingly, "the mother of all x" was popularized by... Saddam Hussein, to describe the impending Gulf War in 1991 https://t.co/7jG27HPkmQ
@weeklytreeman @kalomaze Hit a certain conversation length periodically and then summarize. Maybe if the conversation is confusing or you have to make notes for third parties. But not in natural conversation, even over many hours