Salvador Dali Speaks with Albert Einstein.
You can do this too by using my custom persona builder found on my GitHub!
All running simultaneously on a single DGX Spark ⚡️ without breaking a sweat. It can easily spin up dozens of these at once without slowing down. @NVIDIAAI
It's just going to get weirder and weirder and weirder
I think Terence McKenna's Timewave was almost completely perfect but the elusive part is figuring out where the true Zero Point is.
@akirathedon is such a legend for this
https://t.co/rAaHIGwnnh
Unfortunately I’ve run out of disposable funds to rent RunPod instances and keep experimenting with other ways to resolve the issues. It’s been a painful process: this model is quite complex and sensitive to both abliteration and quantization.
The working BF16 abliteration is public, so maybe someone with enough disposable compute will quantize it soon.
Unfortunately I have to gate the model quant as temporarily non-functional.
Something odd happened in the weight key map during quantization that’s causing it to destabilize and output garbage.
Working through a fix, will post when it’s working.
Time to rent some more B300 GPUs 🦾🙈
Step-3.7-Flash officially Abliterated and Quantized to NVFP4
Unfortunately this one is just barely too large to fit on a single DGX Spark; you will need 2x DGX Sparks to run it. Sadly I only have one, so I could only do limited smoke tests so far. Would love to hear feedback.
https://t.co/KtakI3NSrG
@TeksEdge @xbin12345 @msiUSA Isn’t this the same hardware as the DGX Spark minus the ConnectX-7 200 Gbps InfiniBand ports?
I’m sure DGX OS would run on it, or any other Arm-based Linux. If not yet, I’m sure Linux support will come soon after.
Basically a DGX Spark that has more limited scaling ability.
I requantized much more cautiously this time and was still getting failures (different errors, though). Then I found an important distinguishing feature in the official NVFP4 that somehow got ripped out in quantization: some input scales. Fingers crossed 🤞 I’m restoring the input scales and might have a working version soon. This has been quite the challenge, and an expensive one at that. Thanks for your kindness and offer to help 🙏
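One way to spot this kind of dropped tensor is to diff the tensor names of the official checkpoint against the requantized one. The sketch below is only an illustration: the helper `missing_keys` and the tensor names are hypothetical, and in practice the name lists would come from whatever tool is used to read the checkpoint files.

```python
def missing_keys(reference_keys, candidate_keys, suffix="input_scale"):
    """Return reference tensor names ending in `suffix` that the candidate lacks."""
    candidate = set(candidate_keys)
    return sorted(
        key for key in reference_keys
        if key.endswith(suffix) and key not in candidate
    )


# Hypothetical tensor names, for illustration only (not taken from the real model).
official = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.weight_scale",
    "model.layers.0.mlp.down_proj.input_scale",
]
requantized = [
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.down_proj.weight_scale",
]

dropped = missing_keys(official, requantized)
print(dropped)  # names present in the official checkpoint but absent after requantization
```

An empty result would suggest the input scales survived, though it says nothing about whether their values are correct.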
Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today we’re releasing Super (32B) and Nano (8B) variants.
I could squeeze down some of the attention and other layers, or remove vision (but that would be a big loss).
Even then it’s going to be far too tight for a single DGX Spark with any more than one agent instance and a tiny KV cache.
Also, the layers I left unquantized were intentional, to prevent model degradation. Even so it seems to be unstable, so quantizing the attention layers and router is not viable if the model is to stay stable while also abliterated.
Working through a new approach, but not expecting a fit on a single Spark without distilling the model.
@ClankerQueen Working on a fix for something that seems to make the abliterated weights lose integrity at quantized levels.
I had limited time on the rented GPUs to smoke test and it was working at the time.
Going to gate these models temporarily and work through the bugs
@garychanhk825 @StepFun_ai Oh, I’m sure it will, but it might still be better than a small model without quantization for certain use cases.
Waiting on Step 3.7 Flash and llama.cpp to be compatible.
1bit done right is surprisingly usable.
It’s actually 1.58bit [-1,0,1] BitNet
https://t.co/uraamndKdj
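The 1.58 figure is just log2(3): each weight takes one of three values, so it carries about 1.58 bits. Below is a minimal sketch of that arithmetic plus absmean ternary rounding in the style described for BitNet b1.58. The function names are made up for illustration, and this is not the code that produced the quant above.

```python
import math


def bits_per_weight(num_levels):
    """Information content of one weight that can take `num_levels` values."""
    return math.log2(num_levels)


def absmean_ternary(weights, eps=1e-8):
    """Quantize weights to {-1, 0, 1}: scale by mean |w|, round, then clip."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


print(round(bits_per_weight(3), 2))  # 1.58

# Toy weights, for illustration only.
q, s = absmean_ternary([0.9, -0.05, -1.2, 0.3])
print(q)  # [1, 0, -1, 0]
```

Dequantizing is just `q[i] * s`, so small weights collapse to zero and large ones saturate at plus or minus the scale.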
Proactively quantized some Step 3.7 Flash GGUFs. When Step 3.7 Flash is supported on llama.cpp these should all run.
Even a mind-blowing 1-bit quantization that brings the total footprint down to 48GB!
Let me know if you manage to get any of these working early. @StepFun_ai keep us in the loop on llama.cpp engine support.
https://t.co/xkrpT2ASpZ
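For anyone sanity-checking whether a quant will fit, the weight footprint is roughly parameters times bits per weight divided by 8. This is a rough sketch only: `HYPOTHETICAL_PARAMS` is a placeholder and NOT the real Step 3.7 Flash parameter count, and the estimate ignores file metadata, KV cache, and any tensors kept at higher precision.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(file_size_gb, num_params):
    """Average bits per weight implied by a file size and a parameter count."""
    return file_size_gb * 1e9 * 8 / num_params


# Placeholder value for illustration; substitute the model's real parameter count.
HYPOTHETICAL_PARAMS = 200e9

print(weight_footprint_gb(HYPOTHETICAL_PARAMS, 4))         # size at 4 bits per weight
print(effective_bits_per_weight(48, HYPOTHETICAL_PARAMS))  # average bits implied by a 48GB file
```

A "1-bit" file usually averages more than one bit per weight because some tensors stay at higher precision, which is why the second function can be handy for comparing quants.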