@Italianclownz @barackomaba @sudoingX @LarryAGuy1 with the work from you too, I have 35B at peaks of 137.9 t/s (~83 mean) and 27B at peaks of 51.4 t/s (~32 mean)
That's matching/beating 2x 3090 setups, but mine are co-loaded and have access to a small army of helper, speech, image, and video models running in parallel on the NPU.
@LottoLabs hey Lotto! My agent made a mistake (or was being overly literal), and lots of my 5060 Ti eGPU runs on localmaxxing are also showing up on the Strix Halo boards. Despite patching my submissions, they still show there. I don't want to slop ur site up, plz help
@Italianclownz @barackomaba @rocketman110us @sudoingX @DJLougen @populartourist I don't see why not. Maybe a multi-task bench testing small runs on code, creative, extraction, facts, multi-turn convo, short gen, long gen, etc.? Score each section, then select the configs with the best robust performance: bias towards stable gen across tasks > bursty peaks on select tasks.
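One way to turn "stable gen across tasks > bursty peaks" into a single number: score each section, then rank configs by their mean score minus a penalty on the spread across sections. This is a minimal sketch; the task names, scores, and the `spread_penalty` weight are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical results: config name -> {task section -> score in [0, 1]}.
Results = dict[str, dict[str, float]]


def robust_score(task_scores: dict[str, float], spread_penalty: float = 1.0) -> float:
    """Mean across sections minus a penalty on their spread (population stdev),
    so a config that is steady everywhere beats one that spikes on a few tasks."""
    values = list(task_scores.values())
    return mean(values) - spread_penalty * pstdev(values)


def pick_best(results: Results, spread_penalty: float = 1.0) -> str:
    """Return the config name with the highest robust score."""
    return max(results, key=lambda cfg: robust_score(results[cfg], spread_penalty))


results = {
    "steady": {"code": 0.70, "creative": 0.72, "extraction": 0.68, "long_gen": 0.70},
    "bursty": {"code": 0.98, "creative": 0.45, "extraction": 0.97, "long_gen": 0.50},
}

print(pick_best(results))                      # steady
print(pick_best(results, spread_penalty=0.0))  # bursty (higher raw mean)
```

Here "bursty" has the higher raw mean (0.725 vs 0.70), but once the spread penalty kicks in, "steady" wins; `spread_penalty` is the knob for how hard to bias against peaky configs.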
@barackomaba @Italianclownz @rocketman110us @sudoingX @DJLougen :( spec draft type stacking is something I picked up from @populartourist, but relative success does depend on task type.
I have a quick coherence baseline bench that gets run to keep things sane. I just give it all the flags I want to stack/test and toss them into an autoresearch hill-climb loop.
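A rough sketch of what a gated hill-climb over flags can look like: each step tries configs that differ from the current best in one flag, rejects any that fail the coherence check, and keeps the first one that benchmarks faster. The search space, `run_bench`, and `coherent` are hypothetical stand-ins; the real loop would launch the server and run the actual benches.

```python
import random

# Hypothetical flag search space; values are placeholders, not recommendations.
SEARCH_SPACE = {
    "draft_max": [2, 4, 6, 8],
    "draft_p_min": [0.1, 0.25, 0.5],
    "ubatch": [512, 1024, 2048],
}


def neighbors(cfg):
    """Yield configs that differ from cfg in exactly one flag."""
    for flag, options in SEARCH_SPACE.items():
        for value in options:
            if value != cfg[flag]:
                yield {**cfg, flag: value}


def hill_climb(start, run_bench, coherent, max_steps=20, seed=0):
    """First-improvement hill climb gated by a coherence check."""
    rng = random.Random(seed)
    best, best_score = start, run_bench(start)
    for _ in range(max_steps):
        candidates = list(neighbors(best))
        rng.shuffle(candidates)
        improved = False
        for cand in candidates:
            if not coherent(cand):  # fast-but-garbage configs never get benchmarked
                continue
            score = run_bench(cand)
            if score > best_score:
                best, best_score, improved = cand, score, True
                break
        if not improved:
            break  # no better neighbor: local optimum
    return best, best_score


# Toy stand-ins so the sketch runs: pretend more drafting is faster, p_min 0.25 is
# the sweet spot, and anything drafting more than 6 tokens fails the coherence bench.
def toy_bench(cfg):
    return cfg["draft_max"] * 10 - abs(cfg["draft_p_min"] - 0.25) * 40 + cfg["ubatch"] / 1024


def toy_coherent(cfg):
    return cfg["draft_max"] <= 6


start = {"draft_max": 2, "draft_p_min": 0.5, "ubatch": 512}
best, best_score = hill_climb(start, toy_bench, toy_coherent)
print(best, best_score)
```

Gating on coherence before benchmarking keeps the climb from "improving" into configs that are fast only because the output has gone off the rails.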
@barackomaba @Italianclownz @DJLougen forgot I've got a long bench running rn, but for stats from a previous run, my strix lean 35 had a mean of 82.38 t/s and a max of 122.80 t/s. pp 772 @ 50k, 1101 @ 10k
f16 kv
b 32768
ub 2048
t 32
tb 16
draft-mtp, ngram-mod
draft n max 6
draft p min .25
ngram-mod match 32
ngram-mod max 64
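The thread doesn't say how draft-mtp and ngram-mod combine inside the runtime, so this is only a generic illustration of the two drafting ideas those knobs appear to map to, not that runtime's code. `draft_tokens` proposes up to `n_max` tokens from a drafter and stops early once its top-token probability drops below `p_min`; `ngram_lookup_draft` is prompt-lookup style n-gram drafting that copies the tokens that followed an earlier match of the recent context. All function names are hypothetical, and `toy_drafter` stands in for something like an MTP head.

```python
from typing import Callable, Sequence

# A drafter maps a token context to (most likely next token, its probability).
# Hypothetical stand-in, not a real inference API.
DraftFn = Callable[[Sequence[int]], tuple[int, float]]


def draft_tokens(context: Sequence[int], draft: DraftFn,
                 n_max: int = 6, p_min: float = 0.25) -> list[int]:
    """Greedily propose up to n_max tokens, stopping as soon as the drafter's
    top token falls below p_min (low-confidence drafts are likely rejected)."""
    proposed: list[int] = []
    ctx = list(context)
    for _ in range(n_max):
        token, prob = draft(ctx)
        if prob < p_min:
            break
        proposed.append(token)
        ctx.append(token)
    return proposed


def ngram_lookup_draft(context: Sequence[int], n: int = 3, n_max: int = 6) -> list[int]:
    """Find the most recent earlier occurrence of the last n tokens and
    propose up to n_max tokens that followed it."""
    if len(context) <= n:
        return []
    key = list(context[-n:])
    for start in range(len(context) - n - 1, -1, -1):
        if list(context[start:start + n]) == key:
            return list(context[start + n:start + n + n_max])
    return []


def toy_drafter(ctx: Sequence[int]) -> tuple[int, float]:
    # Confident while the context is short, unsure afterwards (purely illustrative).
    return ctx[-1] + 1, (0.9 if len(ctx) < 5 else 0.1)


print(draft_tokens([0, 1, 2], toy_drafter))                # [3, 4]
print(ngram_lookup_draft([1, 2, 3, 4, 9, 1, 2, 3], n=3))   # [4, 9, 1, 2, 3]
```

Both paths only produce candidates; the target model still verifies them, so a high `p_min` trades fewer drafted tokens for a better acceptance rate. That trade-off is consistent with relative success depending on task type: repetitive outputs favor n-gram lookup, while free-form text leans on the drafter.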