@JakeKAllDay With ik_llama.cpp, you can get 110 tok/s with an RTX 4070 12GB using Qwen3.6-35B-A3B-IQ4_XS-4.19bpw.gguf (comparable to Q4_K_XL) and MTP. I'm sure any 12GB RTX card can get similar results too.
https://t.co/T4cPZuchEF
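A rough back-of-the-envelope check on the Qwen3.6-35B-A3B quant mentioned above: weight size is roughly parameters × bits-per-weight / 8. This sketch assumes the nominal "35B" total and "A3B" (about 3B active parameters per token) figures from the model name, and ignores GGUF metadata and any tensors kept at other precisions. The numbers are estimates, not measured file sizes.

```python
def gguf_size_gb(n_params: float, bpw: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    Assumption: every weight is stored at the same average bits-per-weight.
    """
    return n_params * bpw / 8 / 1e9

# Nominal counts taken from the model name (assumption, not exact values).
total = gguf_size_gb(35e9, 4.19)   # all weights in the file
active = gguf_size_gb(3e9, 4.19)   # weights touched per token ("A3B")
print(f"total ~{total:.1f} GB, active per token ~{active:.1f} GB")
```

In a mixture-of-experts model only the active subset is read for each token, which is part of why token throughput can stay high even when the whole file is larger than the card's VRAM.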
@CryptoCred I would suggest starting with Freqtrade / CCXT for connecting to exchanges and experimenting with trading algos and strategies. Codex (GPT 5.5) is an excellent choice for this. Much more deterministic and accurate than Claude imho.
@decodejar Have you tried Codex (GPT 5.5) with either Codex CLI or Opencode? I recently switched from Opus/Sonnet, mostly for work on trading algos, and it's been night and day for me. Codex is fast, deterministic, and doesn't overthink.
@ItsmeAjayKV On my setup (RTX 4070 Super 12GB), --spec-draft-p-min 0.75 worsens performance and has no positive impact on draft acceptance.
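For context on what a p-min threshold does, here is a minimal sketch. It assumes --spec-draft-p-min behaves like llama.cpp's draft p-min cutoff, where the drafter stops proposing tokens once its top-token probability drops below the threshold. That correspondence is an assumption, and `draft_with_p_min` and its inputs are hypothetical names for illustration.

```python
def draft_with_p_min(draft_probs, p_min, max_draft):
    """Count how many draft tokens would be proposed in one step.

    draft_probs: hypothetical top-token probabilities from the drafter at
    each successive position. Drafting stops at the first position whose
    probability is below p_min, or after max_draft tokens.
    """
    n = 0
    for p in draft_probs[:max_draft]:
        if p < p_min:
            break  # drafter not confident enough: stop and let the target verify
        n += 1
    return n

probs = [0.95, 0.80, 0.70, 0.90]
print(draft_with_p_min(probs, p_min=0.75, max_draft=4))  # cut off at the 0.70 position
print(draft_with_p_min(probs, p_min=0.0, max_draft=4))   # no cutoff
```

A higher threshold proposes fewer tokens per step. If the tokens it filters out would mostly have been accepted anyway, that costs throughput without improving acceptance.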
I'm seeing a clear performance regression in the merged MTP branch: around 15 tok/s slower, with a draft acceptance rate about 0.20 lower.
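To see why a 0.20 drop in acceptance translates into lost tok/s, this sketch uses the standard speculative-decoding estimate. It assumes each of k draft tokens is accepted independently with probability alpha (a simplification), so one target-model pass emits on average 1 + alpha + ... + alpha^k tokens. The alpha values below are illustrative only, not the measured rates.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes i.i.d. acceptance with probability alpha for each of k draft
    tokens; the target always contributes one token (correction or bonus),
    so the result is the geometric sum 1 + alpha + ... + alpha**k.
    """
    if alpha >= 1.0:
        return k + 1.0
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# Hypothetical before/after acceptance rates 0.20 apart (illustrative values).
for alpha in (0.80, 0.60):
    print(alpha, expected_tokens_per_step(alpha, k=1), expected_tokens_per_step(alpha, k=3))
```

With a single MTP draft token (k=1), the gain per pass is just 1 + alpha, so every point of acceptance lost comes straight off the speedup.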