We released Qwen 3.6 35B A3B GGUF quants in both NTP and MTP.
The benchmark results made one thing clear: size, speed, and quality do not move in a straight line.
GPU-5 was hard to beat. If it fits, try it first.
Blog: https://t.co/WJrOEKPjfk
@KC_goes_digital We have them on our radar.
Our models are optimized for the best quality vs speed/size tradeoff. We benchmark on real tasks and measure performance across different hardware, so it’s easier to find what actually works best on your setup.
We recently released our Qwen 3.5 35B A3B quants.
If your setup can run GPU-7, you should try it.
If not, we’ve got options across all hardware.
Pi → 5090.
Blog: https://t.co/d1TCjbIgMS
@therealsol4ra Regardless, we're happy for everyone to perform independent evaluations of our models and test them out for yourselves! We're highly confident in their quality!
@therealsol4ra KLD is a distribution deviation metric measuring in this case the deviation in token generation between quants and the original model. However, it does not measure behaviour, like the model taking a different path to solve the same problem than the original model for example.
+ Hermes Agent
+ Qwen3.5 35B A3B
+ 4x parallel agents with 262k context window (each)
+ Over 200 t/s token generation + 3000 t/s prefill
+ 23.2GB total VRAM consumption on RTX 5090
It can take 5 parallel agents, 4 was the sweet spot with 2x in completion time vs 1.74x.
Dream inference.
@NousResearch@ByteShape
Run your own local AI coding agent
We just published a beginner guide for using @opencode with local models (@lmstudio, llama.cpp, @ollama).
Mac, Linux, WSL2, full setup + API + config.
https://t.co/TzgNNTxU49
From “I have a model” → “I have a working coding agent”
GPUs are consistent. CPUs are not.
With our ByteShape Qwen 3.5 9B quants, the same models perform well across GPUs, but CPUs each have their own “favorites”.
No one-size-fits-all. Optimize for your hardware.
https://t.co/RSX3iK3vgh
ByteShape was quietly launched just before the year end. Two weeks ago, we announced our investment in the company. Since its launch, and with minimal fanfare on purpose, @ByteShape cumulative downloads have easily blown past 100,000. No small feat for a new startup!
Announcing @twosmallfishvc's investment in @ByteShape.
In short, ByteShape is delivering step-function gains in AI efficiency, including up to 7x faster training, up to 10x faster inference, plus up to 40% compression to reduce model size.
We released ShapeLearn-optimized GGUFs for:
• Devstral Small 2 24B, tuned for RTX 40/50 GPUs
• Qwen3 Coder 30B, runs everywhere, yes even the Pi
Maximum quality. Fastest TPS. Minimal compromise.
GGUFs + interactive plots are live: https://t.co/VVZ87Pvm1p
Edge computing is getting spicy! Shoutout to @geerlingguy for showcasing our model. Love seeing what the community is building and how hard it’s being pushed. Clip: https://t.co/aPvRpsxIAC
Raspberry Pi has a new AI HAT. This time with built-in 8 GB of RAM, so you can run machine vision + LLM inference all without touching the Pi's CPU. It's $130 and a little bit of a niche item. Find out why in my video: https://t.co/vMhZ5w1wCU
Raspberry Pi has a new AI HAT. This time with built-in 8 GB of RAM, so you can run machine vision + LLM inference all without touching the Pi's CPU. It's $130 and a little bit of a niche item. Find out why in my video: https://t.co/vMhZ5w1wCU