@AlexFromAtomic @atomic_chat_hq It was meant to be 3 GB MORE VRAM... That should be pretty obvious if you think about it for more than half a second.
I tried 5-bit and 6-bit quants on 12 GB of VRAM. Both ran fine, but the quality was still far inferior to Qwen 9B.
@googlegemma Thank you, Google DeepMind, for constantly releasing open models!
We made Dynamic GGUFs so you can run Gemma 4 12B more efficiently: https://t.co/8cL321pVDh
Now this is smart engineering. Exactly what real AI builders aim for.
Meanwhile, all the grifters on this platform are still out here parroting "just use Claude; local AI isn't useful enough"
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B (the #1 trending model on Hugging Face) as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, rather than typing into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are: it nails every field.
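Rough sketch of how that brain/eyes tool wiring could look, assuming an OpenAI-style function-calling loop. Everything here is hypothetical: the tool name `locate_element`, the `fake_locator` stub with its canned boxes, and the text-placement math are illustrative only. This is not the actual setup from the test and not an official LocateAnything API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical tool schema in the common JSON "function calling" style.
# The name "locate_element" and its parameters are illustrative assumptions,
# not taken from Qwen's or NVIDIA's actual APIs.
LOCATE_TOOL = {
    "type": "function",
    "function": {
        "name": "locate_element",
        "description": "Return the bounding box of a described element on the form image.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "e.g. 'the email field'"}
            },
            "required": ["query"],
        },
    },
}


@dataclass
class Box:
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int   # box width in pixels
    height: int  # box height in pixels


def fake_locator(query: str) -> Box:
    """Hypothetical stand-in for the LocateAnything-3B call; returns canned boxes."""
    canned = {
        "the email field": Box(120, 340, 300, 28),
        "the name field": Box(120, 100, 300, 28),
    }
    return canned[query]


def text_anchor(box: Box, font_px: int, pad_px: int = 4) -> tuple[int, int]:
    """Top-left position for a single line of text: left-padded, vertically centered."""
    x = box.x + pad_px
    y = box.y + (box.height - font_px) // 2
    return x, y


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Route a tool call from the main model and return the result as a JSON string."""
    if name != "locate_element":
        return json.dumps({"error": f"unknown tool {name}"})
    args = json.loads(arguments_json)
    box = fake_locator(args["query"])
    return json.dumps(asdict(box))
```

Calling `handle_tool_call("locate_element", '{"query": "the email field"}')` hands the main model back x, y, width, and height as JSON, and `text_anchor` turns that box into a spot to draw the value. Keeping the helper to one narrow job (locate) is the whole point: the big model plans, the small one just answers "where?".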
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.
I'm sorry, but I don't think you have a clear understanding of this.
Running models locally isn't about "being competitive."
Anyone who buys into self-hosting understands this basic tradeoff: you sacrifice some speed and peak accuracy in exchange for greater privacy and full control.
I don't know what kind of hardware you're running locally (if any), but if you believe your home setup or personal server is "competitive" with state-of-the-art cloud models, you're being delusional.
That said, the DGX Spark is already achieving very respectable speeds despite its memory bandwidth limitations.
@LottoLabs Yeah… They really should drop a 40B, a 120B MoE, and a 30B dense version.
But that would create competition, and I doubt they want that right now, especially with their IPO coming up.
@Alibaba_Qwen Don't hesitate to launch new models.
If you're worried that a new release could cannibalize your inference revenue, consider a staggered rollout of open-weight models.
Begin with the Qwen 3.7 9B, then after a month or two follow up with the 35B MoE and the 27B dense.
If you have ever used a computer to do actual meaningful work, you already know exactly what agents are for. No debate there.
I personally use Hermes Agent for deep web research and run hundreds of long brainstorming sessions with it.
Some sessions go nowhere, but others spit out solid ideas and technical docs that I've used to build self-hosted AI tools for my work and hobbies.
I also lean on it to benchmark local models and troubleshoot Windows 11 and Linux issues. Honestly, it just fits right into any dev workflow.
Don't give this "theo" guy any attention; he is basically the face of AI grifting right now.
He's constantly pushing the idea that local and self-hosted models are useless while openly shilling for the big labs.
I keep seeing him trying to start drama around open-source projects while never contributing to any.
OMG! PewDiePie just dropped his own AI agent and it is awesome!
I love to see things like this.
Free and open-source AI tools are the future, and although corporations may try to stop it, "they can not stop the future."
https://t.co/oIatDfX4I1