nvidia's 3B mamba destroyed alibaba's 3B deltanet on the same RTX 3090. only 24 days between releases. same active parameters, same VRAM tier, completely different architectures.
nemotron cascade 2: 187 tok/s. flat from 4K to 625K context. zero speed loss.
flags: -ngl 99 -np 1. that's it. no context flags, no KV cache tricks. auto-allocates 625K.
qwen 3.5 35B-A3B: 112 tok/s. flat from 4K to 262K context. zero speed loss.
flags: -ngl 99 -np 1 -c 262144 --cache-type-k q8_0 --cache-type-v q8_0. needed KV cache quantization to fit 262K.
both models held a flat line across every context level. both architectures are context-independent. but nvidia's mamba2 is 67% faster at generating tokens on the exact same hardware and needs fewer flags to get there. same node, same GPU, same everything. the only variable is the model.
gold medal math olympiad winner running at 187 tokens per second on single RTX 3090 a card from 6 years ago. nvidia cooked.
Bitget Wallet的Master U卡 突然没办法给claude续费了,之前成功续费了2个月,不知道是啥原因,重新update也不行,不知是个例还是普遍,提示 Your card was declined. decline_code是generic_decline,看来是完蛋要找新的u卡渠道了 @BitgetWallet