I've been comparing two setups running Qwen3.5-397B-A17B at full 262K context:
🖥 Mac Studio M3 Ultra (512GB) — €14,500
⚙️ Custom workstation, 4× RTX PRO 6000 (384GB VRAM) — €45,000
Results:
• Workstation: 46.9 tok/s, 1,100W, 51 dBA
• Mac Studio: 35 tok/s, 120W, ~15 dBA
Going by these figures, the Mac is roughly 6.8× more energy-efficient per token (about 3.4 J/token versus 23.5 J/token). Over 3 years, the TCO gap is nearly €40K.
I have never been a Mac guy, but I have to admit that the Mac Studio is currently the most attractive hardware for running local AI agents.
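For anyone who wants to redo the arithmetic, here is a rough sketch of the per-token energy and 3-year cost comparison using the figures listed above. The electricity price (€0.30/kWh) and the 24/7 full-load duty cycle are assumptions of this sketch, not measured values, and the names `Setup`, `joules_per_token` and `tco_eur` are purely illustrative.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Setup:
    name: str
    price_eur: float      # purchase price quoted in the post
    tokens_per_s: float   # measured generation speed
    power_w: float        # power draw under load

    def joules_per_token(self) -> float:
        # Watts are joules per second, so W / (tok/s) gives J/token.
        return self.power_w / self.tokens_per_s

    def tco_eur(self, years: float, eur_per_kwh: float, duty_cycle: float = 1.0) -> float:
        # Purchase price plus electricity only; cooling, maintenance and
        # resale value are deliberately left out of this sketch.
        energy_kwh = self.power_w / 1000 * HOURS_PER_YEAR * years * duty_cycle
        return self.price_eur + energy_kwh * eur_per_kwh


mac = Setup("Mac Studio M3 Ultra", 14_500, 35.0, 120)
ws = Setup("4x RTX PRO 6000 workstation", 45_000, 46.9, 1_100)

# Assumed values: 3 years, EUR 0.30/kWh, running at full load around the clock.
ratio = ws.joules_per_token() / mac.joules_per_token()
gap = ws.tco_eur(3, 0.30) - mac.tco_eur(3, 0.30)

print(f"Energy per token ratio (workstation / Mac): {ratio:.2f}x")
print(f"3-year TCO gap: EUR {gap:,.0f}")
```

With those assumptions the energy ratio lands near 6.8× and the 3-year gap at roughly €38K. A higher electricity price pushes the gap closer to €40K, while a lower duty cycle shrinks it toward the €30.5K difference in purchase price.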
@The_Only_Signal @LottoLabs I have started converting my air-cooled workstation to water cooling. It will have a total of 6 RTX PRO 6000 Max-Q cards under waterblocks, and the Threadripper as well. The 2 MO-RA IV 600 radiators will sit a few meters away from the workstation so the heat is exhausted outside.
@simpsoka I think it would be a big plus if /goal could use a smart model router to pick the model level (M, H or xH) and optimize consumption. I currently use it to run cycles of different scenarios, collecting traces, telemetry and QA data so that Codex can improve the code between cycles.
@levie Are any enterprises considering on-premises inference? Some open-source models can handle a good portion of agentic tasks. With clever routing, only the most advanced tasks would need to be sent to the frontier models of the key labs.
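The routing idea can be sketched in a few lines. This is only an illustration: the `Task` type, the `difficulty` score (assumed to be estimated upstream by some classifier or heuristic), the 0.8 threshold and the backend labels `"on_prem"` and `"frontier"` are all hypothetical and do not refer to any existing product or API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    difficulty: float  # hypothetical score from 0.0 (trivial) to 1.0 (very hard)


def route(task: Task, threshold: float = 0.8) -> str:
    """Send only the hardest tasks to a frontier model; keep the rest local."""
    return "frontier" if task.difficulty >= threshold else "on_prem"


tasks = [
    Task("Rename a variable across the repo", 0.2),
    Task("Summarize yesterday's build logs", 0.4),
    Task("Design a new distributed locking scheme", 0.95),
]

# Map each prompt to the backend the router picks for it.
decisions = {t.prompt: route(t) for t in tasks}
for prompt, backend in decisions.items():
    print(f"{backend:>8}: {prompt}")
```

The threshold is the knob that trades cost against quality: lowering it sends more traffic to the frontier models, raising it keeps more on premises.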
@embirico @Dimillian I had been using both Codex and Claude Code since May 2025. From November on, I used Claude Code less and less. In May 2026, I cancelled my Anthropic subscription, as I had not used Claude Code since March. I sometimes use Droid to run my local models.
@0xSero I am going to buy waterblocks to liquid-cool my GPUs. It seems like an interesting approach: you can more easily channel the heat outside, or into another room in winter, with better thermals and fewer dust issues. I am also going to add 3 more RTX PRO 6000 Max-Q cards.
@Galanthai @ivanfioravanti Many corporations I know only provide Copilot. Employees who are aware of how much more powerful the advanced models and agents are use them secretly to gain productivity. It would be wiser for those corporations to adopt enterprise solutions from OpenAI or Anthropic.
@ivanfioravanti Did you try the Codex app? It's also fantastic that you can run it remotely from your iPhone; I don't feel obliged to stay at my computer the whole time. And if, for instance, I am walking the dog and an idea pops up, I can dictate it immediately so I don't lose the idea or the reasoning behind it.