@ivanfioravanti Yup, rtk is great! Using it in pretty much all of my LLM workflows. Configuring it for unknown/custom commands is a bit of work, but it handles the common patterns nicely and the coverage is growing too.
@agitbackprop @antirez My experiences with it are mixed. Sometimes it works, but it also breaks down, often when the model has to make multiple edits to the same file, or larger edits. The usual error is using old hashes. Larger/cloud LLMs are better at it. Also interested to hear @antirez's thoughts.
@mr_r0b0t @NVIDIAAI @Alibaba_Qwen Working nicely! Tried this model yesterday and got pretty much the same numbers 🥳 The unsloth NVFP4 version has 0% acceptance (I guess the MTP heads are in the wrong format or sth), sakamakismile/Qwen3.6-27B-Text-NVFP4-MTP is working nicely.
@MichaelZima @KuittinenPetri @antirez On Mac I use OMLX, a bit of MTPLX too. It runs nicely, a lot of progress in the past months, and the multi-token prediction stuff has been boosting inference speeds on Mac too. If you already have a Studio, there's plenty of good stuff to run.
@KuittinenPetri @MichaelZima Qwen 3.6 27B, different quants, atm via llama-cpp as it just landed MTP support. Also the DwarfStar by @antirez is running nicely on the GB10.
Those are my main models atm, then also a bit of Qwen 3.6-35B-A3B and Qwen3-Coder-Next.
@techedgedaily @rohanpaul_ai Yeah. Now anybody can vibe-code their software into existence, and if it's good software, they can have other people pay them to use it. Oh wait..
@0xSero With MoE, the experts are not activated just based on the prompt, the expert selection is done for _each token_ separately. This is why the full MoE usually needs to be loaded in memory even though only a fraction of it is used per token.
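A minimal sketch of the per-token routing point, assuming a plain top-k router. The sizes and the random logits are made up for illustration (a real router is a learned layer inside the model), and `route_tokens` / `experts_touched` are hypothetical helper names:

```python
import random


def route_tokens(router_logits, top_k):
    """Pick the top_k experts for each token independently."""
    choices = []
    for logits in router_logits:
        # Rank expert indices by this token's router score, highest first.
        ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
        choices.append(ranked[:top_k])
    return choices


def experts_touched(choices):
    """Set of distinct experts used by at least one token."""
    return {expert for per_token in choices for expert in per_token}


# Hypothetical sizes, not taken from any real model.
NUM_EXPERTS, TOP_K, NUM_TOKENS = 8, 2, 32

random.seed(0)
# Stand-in for the learned router: random scores per (token, expert).
router_logits = [
    [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for _ in range(NUM_TOKENS)
]

choices = route_tokens(router_logits, TOP_K)
used = experts_touched(choices)
print(f"experts active per token: {TOP_K}/{NUM_EXPERTS}")
print(f"distinct experts used across {NUM_TOKENS} tokens: {len(used)}/{NUM_EXPERTS}")
```

Each token only touches `TOP_K` experts, but the set of experts used across a sequence grows with the number of tokens, so you can't know up front which experts to leave out of memory.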
@usr_bin_roygbiv @LottoLabs Parallel processing. The mem bandwidth is low for a single token stream but there's a lot of compute in that box to handle X tasks at a time. Parallel subagents etc. If you only need single-user single-stream, go M5 Max. https://t.co/6MjBCDXZ2T
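A rough back-of-the-envelope sketch of why parallel streams help, assuming a very simplified roofline model: one decode step reads the weights once (shared by every stream in the batch), and compute cost grows with the batch size. It ignores KV-cache traffic and other overheads, `decode_tokens_per_second` is a hypothetical helper, and all numbers are round illustrative values, not specs of any real machine:

```python
def decode_tokens_per_second(weight_bytes, mem_bandwidth, flops_per_token,
                             compute_flops, batch):
    """Aggregate tokens/s for `batch` concurrent streams (simplified roofline)."""
    # Memory side: the weights are streamed once per step, whatever the batch size.
    t_memory = weight_bytes / mem_bandwidth
    # Compute side: every stream in the batch needs its own FLOPs.
    t_compute = batch * flops_per_token / compute_flops
    # The step takes as long as the slower of the two.
    step_time = max(t_memory, t_compute)
    return batch / step_time


# Illustrative assumptions only: 20 GB of weights, 200 GB/s memory bandwidth,
# 40 GFLOP per token, 40 TFLOP/s of usable compute.
ARGS = dict(weight_bytes=20e9, mem_bandwidth=200e9,
            flops_per_token=40e9, compute_flops=40e12)

for batch in (1, 8, 100, 200):
    rate = decode_tokens_per_second(batch=batch, **ARGS)
    print(f"batch={batch:>3}: ~{rate:.0f} tokens/s total")
```

Under these assumptions a single stream is stuck at the memory-bound rate, while total throughput grows with the number of parallel streams until compute becomes the limit.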