@LottoLabs I have an M1 Max 64GB with 400GB/s memory bandwidth. It’s very fast using Qwen 3.6 with MTP
Is the M3 Ultra faster? Yes. But good luck finding one.
I personally think M1 machines with a lot of memory are a good buy.
This is the same reason I don’t use Opus. When there’s ambiguity, it fills in the blanks.
To you, that’s a feature.
To me, that’s a risk.
If you’re a precise, controlling developer, you want your AI to be precise and under your control. Opus is too agentic and I don’t like it.
@raviojhax Glad to see you talking about it. I’ve felt that way when my dopamine reward pathways are out of whack (from too much work, pleasure, screen time). It’s good to touch grass, meditate, exercise, spend time in real life with friends, eat clean, and work on healthy sleep habits.
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.
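The helper-as-a-tool setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual harness from the test: the `LOCATE_TOOL` description, the `Box` type, `text_anchor`, `fill_form`, and the `fake_locate` stand-in are all hypothetical names, and the real helper would return detections from the image rather than fixed boxes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Box:
    """Bounding box in image pixels: x, y, width, height (as in the post)."""
    x: int
    y: int
    width: int
    height: int


# Hypothetical tool description handed to the main model (the "brain").
LOCATE_TOOL = {
    "name": "locate",
    "description": "Return the pixel box of a named form field.",
    "parameters": {"query": "string, e.g. 'email field'"},
}


def text_anchor(box: Box, padding: int = 4) -> tuple[int, int]:
    """Pick a left-aligned, vertically centered point inside the box."""
    return (box.x + padding, box.y + box.height // 2)


def fill_form(values: dict[str, str],
              locate: Callable[[str], Box]) -> list[tuple[str, int, int]]:
    """Ask the helper where each field is, then plan where to draw the value."""
    placements = []
    for field, value in values.items():
        box = locate(f"{field} field")
        x, y = text_anchor(box)
        placements.append((value, x, y))
    return placements


def fake_locate(query: str) -> Box:
    """Stand-in for the helper model: fixed boxes instead of real detections."""
    boxes = {
        "email field": Box(120, 300, 400, 40),
        "phone field": Box(120, 360, 400, 40),
    }
    return boxes[query]


plan = fill_form({"email": "a@example.com", "phone": "555-0100"}, fake_locate)
for value, x, y in plan:
    print(f"draw {value!r} at ({x}, {y})")
```

The point of the split is that the main model never has to guess pixel coordinates; it only decides what goes where and lets the helper answer "where is it?".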
"Get a $1000 Mac Mini and run local AI for FREE!!"
Nope. There's no "free lunch" with local AI.
Here's a simple breakdown of why this math doesn't add up.
At the heart of local AI is the need for RAM, the computer memory that holds the language model so it can make its calculations.
AI intelligence is largely driven by model size: the number of parameters the model has. And the whole model needs to be loaded all at once, so you need more RAM to run larger models.
~10B param model --> needs about 5GB of RAM to run.
~30B param model --> 15-30GB of RAM
~70B param model --> 35-70GB of RAM
~200B param model --> 100-200GB of RAM
...and so on.
Here's the catch: the best open-weight models you can run yourself are about 1T parameters (like Kimi 2.6). That means you need about 500-1000GB of RAM, which would cost at the bare minimum $15K and more likely $50-100K to run at full capabilities.
The second half of the equation is the speed of the RAM: its memory bandwidth.
Cheap machines use LPDDR5 RAM, which is lower cost but very slow.
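The RAM figures above follow from simple arithmetic: parameters times bytes per parameter. Here is a rough sketch, assuming the low end of each range means 4-bit weights and the high end 8-bit weights (my reading of the numbers, not something stated in the post). It counts weights only and ignores the KV cache and runtime overhead, so real usage is somewhat higher. The function names are just for illustration.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def ram_range_gb(params_billion: float) -> tuple[float, float]:
    """Low end assumes 4-bit weights, high end assumes 8-bit weights."""
    return (weights_gb(params_billion, 4), weights_gb(params_billion, 8))


for size in (10, 30, 70, 200, 1000):
    low, high = ram_range_gb(size)
    print(f"~{size}B params -> {low:.0f}-{high:.0f}GB of RAM")
```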
A base M4 Mac mini has unified memory bandwidth around 120GB/s. Higher-end Apple chips and DGX Spark-class systems can get closer to 300GB/s.
Powerful workstations and premium graphics cards use GDDR7 RAM that has a memory bandwidth of about 1800 GB/s (yes, about 10x faster).
The Nvidia RTX 5090 and RTX PRO 6000 Blackwell use this type of memory, but have only 32GB and 96GB of it, respectively: very, very fast, but not much capacity. These cards cost between $4K and $13K for the video card alone, with another $3-10K needed for the CPU, memory, power supply, hard drive, cooling, etc. to get a full workstation built.
Meanwhile, data-center AI systems use HBM3e. A DGX B200 server delivers about 64,000GB/s of aggregate GPU-memory bandwidth across 8 B200 GPUs with about 1,440GB of GPU memory (yes, more than 400x faster than a Mac mini).
This server costs around $500,000 for the hardware and requires special power sources and cooling setups.
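To see why bandwidth matters so much, here is a back-of-envelope sketch. It assumes a dense model where every weight is read from memory once per generated token, so the ceiling on speed is bandwidth divided by model size. That is a simplification: it ignores compute limits, the KV cache, and whether the model fits in that memory at all. The 15GB example is a ~30B model at 4-bit, and the function name is hypothetical.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed if each token requires reading all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 15  # ~30B params at 4-bit, weights only

for name, bandwidth in [
    ("base M4 Mac mini", 120),
    ("higher-end Apple / DGX Spark-class", 300),
    ("GDDR7 graphics card", 1800),
]:
    print(f"{name}: up to ~{max_tokens_per_sec(bandwidth, MODEL_GB):.0f} tok/s")
```

Real numbers land below these ceilings, but the ordering holds: same model, more bandwidth, more tokens per second.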
That's what ChatGPT 5.5 and Opus 4.8 are running on. They are AI models with between 1T and 10T parameters.
So, no, you can't buy consumer hardware for $300, $3000, or even $30,000 and expect it to replace the AI you have access to starting at $20/mo and as much as $200/mo for top account types.
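A quick payback calculation makes the same point. This sketch only divides hardware cost by the subscription price; it ignores electricity and, more importantly, the gap in speed and intelligence, so it flatters the local option. The function name is made up for illustration.

```python
def breakeven_months(hardware_usd: float, subscription_usd_per_month: float) -> float:
    """Months of subscription fees that add up to the hardware price."""
    return hardware_usd / subscription_usd_per_month


for hardware, plan in [(300, 20), (3_000, 20), (30_000, 200)]:
    months = breakeven_months(hardware, plan)
    print(f"${hardware:,} box vs ${plan}/mo: {months:.0f} months ({months / 12:.1f} years)")
```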
Local AI is powerful, private, and an exciting, emergent part of AI. But it's slow, less intelligent, and still costs more than you think.
Steve Jobs died 15 years ago, and I’ve wondered ever since how long Apple’s dominance could last.
Nvidia’s new integrated RTX Spark architecture is a smart chess move, focused on unseating Apple.
It combines a powerful ARM CPU, integrated Grace-Blackwell/CUDA graphics, and up to 128GB of shared memory in one system.
Not only does it threaten Intel and AMD, but it also goes straight at Apple’s M-series unified memory moat.
And the real prize is unlocking local AI, at scale. If Nvidia can put serious local model-running power into laptops, Apple’s lead starts to look less inevitable.
Are you kidding? The $200/mo AI bill is for top tier usage of top tier frontier models.
The board Huang is holding is not a Jetson/Nano board.
Local AI on a 3-figure piece of hardware cannot replace frontier model intelligence or speed. To even come close, you need to invest mid 6-figures in a GB-300.
@Sprytixl "I was paying $1900 a month to lease a BMW M5. I bought a $249 bicycle and now ride to work for free."
Comparing a frontier 9T model to a basic 9B parameter model running at 10 tok/s is a joke.
The 1000X difference in intelligence is real.
@JMasterFarm @browomo Yes, slow at ~300GB/s versus:
Intel Arc Pro B70 or Mac M3 Ultra at ~800GB/s
Nvidia RTX 5090 or RTX Pro 6000 at ~1800GB/s
memory bandwidth --> speed
Nvidia’s new ARM-based SoC is rumoured to feature:
- Up to 20 CPU cores
- Up to 6144 Blackwell CUDA cores
- A TSMC 3nm process
- Up to 128GB unified memory
This has the potential to be an M5 Max competitor for AI developers.
Local AI video gen runs best on Nvidia/CUDA hardware. But it can be done on a Mac with:
- Wan2.2-I2V-A14B
- ComfyUI
- 96GB+ memory (uses about 80GB)
Takes ~8 mins to generate a 5 sec 480p video. Speed is slow, quality is good.
@witcheer Just set up this same model at a 6-bit quant on a 64GB system and it will run quite well. Still very poor intelligence compared to Composer 2.5 but local!