COO at Read|Thomas - BIM and VDC Consultant, featured in Mastering Revit 2011 and Tech Editor for the Revit Essentials 2011-2012. Email: [email protected]
We’ve been busy and are very excited! This week we will be releasing a Halloween scene and assets for Enscape. Teasers to come each day! Just getting started. #enscape #spooky #free #rendering #realtimeVFX
It’s Spooky Season and that means Candy, Creativity, and Curiosity.
What started as a fun seasonal scene turned into a new way of working for us. By blending Revit, Blender, and Enscape, we discovered a faster, more creative approach to visualization.
https://t.co/zuB5MBZW0V
@TheRevitKid @kanoferreira The guys we work closely with who are still printing things would agree with you. You misread and mishear the "foot and something something" far too often. Guys are hot, tired, and sites are loud. Keep it simple 😉
Apple's timing could not be better with this.
The M3 Ultra 512GB Mac Studio fits perfectly with massive sparse MoEs like DeepSeek V3/R1.
2 M3 Ultra 512GB Mac Studios with @exolabs is all you need to run the full, unquantized DeepSeek R1 at home.
The first requirement for running these massive AI models is that they need to fit into GPU memory (in Apple's case, unified memory). Here's a quick comparison of how much that costs for different options (note: DIGITS is left out here since details are still unconfirmed):
NVIDIA H100: 80GB @ 3TB/s, $25,000, $312.50 per GB
AMD MI300X: 192GB @ 5.3TB/s, $20,000, $104.17 per GB
Apple M2 Ultra: 192GB @ 800GB/s, $5,000, $26.04 per GB
Apple M3 Ultra: 512GB @ 800GB/s, $9,500, $18.55 per GB
That's roughly a 29% reduction in $ per GB from the M2 Ultra - pretty good.
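The per-GB figures above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the capacities and prices quoted in the list; the function and variable names are illustrative, not from the original post.

```python
# (memory in GB, price in USD), as quoted in the comparison above.
devices = {
    "NVIDIA H100": (80, 25_000),
    "AMD MI300X": (192, 20_000),
    "Apple M2 Ultra": (192, 5_000),
    "Apple M3 Ultra": (512, 9_500),
}

def cost_per_gb(memory_gb, price_usd):
    """Dollars paid per GB of GPU or unified memory."""
    return price_usd / memory_gb

cost = {name: cost_per_gb(*spec) for name, spec in devices.items()}
for name, value in cost.items():
    print(f"{name}: ${value:.2f} per GB")

# Relative drop in $ per GB going from the M2 Ultra to the M3 Ultra.
reduction = 1 - cost["Apple M3 Ultra"] / cost["Apple M2 Ultra"]
print(f"M3 Ultra vs M2 Ultra: {reduction:.2%} cheaper per GB")
```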
The concerning thing here is the memory refresh rate. This is the ratio of the device's memory bandwidth to its memory capacity. It tells you how many times per second you could cycle through the entire memory on the device. This is the dominating factor for the performance of single-request (batch_size=1) inference. For a dense model that saturates all of the memory of the machine, the maximum theoretical token rate is bound by this number. Comparison of memory refresh rate:
NVIDIA H100 (80GB): 37.5/s
AMD MI300X (192GB): 27.6/s
Apple M2 Ultra (192GB): 4.17/s (9x less than H100)
Apple M3 Ultra (512GB): 1.56/s (24x less than H100)
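As a rough check, the refresh rate is just bandwidth divided by capacity. The sketch below recomputes it from the bandwidth and memory figures quoted earlier in the thread; the names are illustrative.

```python
# (memory in GB, memory bandwidth in GB/s), as quoted earlier in the thread.
devices = {
    "NVIDIA H100": (80, 3_000),
    "AMD MI300X": (192, 5_300),
    "Apple M2 Ultra": (192, 800),
    "Apple M3 Ultra": (512, 800),
}

def refresh_rate(memory_gb, bandwidth_gb_s):
    """Full passes over the device's memory per second."""
    return bandwidth_gb_s / memory_gb

rates = {name: refresh_rate(*spec) for name, spec in devices.items()}
h100 = rates["NVIDIA H100"]
for name, rate in rates.items():
    # The second number is how many times higher the H100's rate is.
    print(f"{name}: {rate:.2f}/s (H100 ratio: {h100 / rate:.1f}x)")
```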
Apple is trading off more memory for less memory refresh frequency, now 24x less than an H100. Another way to look at this is to analyze how much it costs per unit of memory bandwidth. Comparison of cost per GB/s of memory bandwidth (cheaper is better):
NVIDIA H100 (80GB): $8.33 per GB/s
AMD MI300X (192GB): $3.77 per GB/s
Apple M2 Ultra (192GB): $6.25 per GB/s
Apple M3 Ultra (512GB): $11.875 per GB/s
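The same kind of quick calculation reproduces the cost per GB/s column. Again this is only a sketch built from the prices and bandwidths quoted in the thread, with illustrative names.

```python
# (price in USD, memory bandwidth in GB/s), as quoted earlier in the thread.
devices = {
    "NVIDIA H100": (25_000, 3_000),
    "AMD MI300X": (20_000, 5_300),
    "Apple M2 Ultra": (5_000, 800),
    "Apple M3 Ultra": (9_500, 800),
}

def cost_per_bandwidth(price_usd, bandwidth_gb_s):
    """Dollars paid per GB/s of memory bandwidth (cheaper is better)."""
    return price_usd / bandwidth_gb_s

bw_cost = {name: cost_per_bandwidth(*spec) for name, spec in devices.items()}
for name, value in sorted(bw_cost.items(), key=lambda item: item[1]):
    print(f"{name}: ${value:.3f} per GB/s")
```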
There are two ways Apple wins with this approach. Both are hierarchical model structures that exploit the sparsity of model parameter activation: MoE and Modular Routing.
MoE adds multiple experts to each layer and picks the top-k of N experts in each layer, so only a fraction k/N of the experts is active per layer. The sparser the activation (the smaller the ratio k/N), the better for Apple. The DeepSeek R1 ratio is small: 8/256 = 1/32. Model developers could likely push this to be even smaller. We might see a future where k/N is something like 8/1024 = 1/128 (<1% activated parameters).
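Here is a toy sketch of top-k expert selection, to make the k/N idea concrete. It is not DeepSeek's implementation: the experts are hypothetical scalar functions, and the router scores are random numbers rather than the output of a learned router that depends on each token.

```python
import math
import random

def top_k_route(scores, k):
    """Return (expert_index, weight) pairs for the k highest router scores.

    Weights are a softmax taken over the selected scores only.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_layer(x, experts, scores, k):
    """Weighted sum of the k selected experts; the other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))

N, K = 256, 8  # the DeepSeek R1 ratio quoted above: 8/256 = 1/32
rng = random.Random(0)
# Hypothetical toy experts: each one scales its input by a fixed factor.
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(N)]
# Stand-in router scores; a real router computes these from the token.
scores = [rng.gauss(0.0, 1.0) for _ in range(N)]

routes = top_k_route(scores, K)
y = moe_layer(2.0, experts, scores, K)
print(f"active experts: {len(routes)} of {N}")
```

All N experts have to sit in memory, but only the K selected ones are read for a given token. That is the property the thread relies on.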
Modular Routing includes methods like DiPaCo and dynamic ensembles where a gating function activates multiple independent models and aggregates the results into one single result. For this, multiple models need to be in memory but only a few are active at any given time.
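The gating idea can be sketched in the same toy style. This is not DiPaCo or any published method: the gate (nearest "center"), the models, and the simple averaging are all hypothetical stand-ins chosen to show that every model stays resident while only a few run.

```python
def gate(x, centers, k):
    """Hypothetical gating function: pick the k models whose center is nearest x."""
    ranked = sorted(range(len(centers)), key=lambda i: abs(x - centers[i]))
    return ranked[:k]

def routed_ensemble(x, models, centers, k):
    """Run only the gated models and average their outputs into one result."""
    active = gate(x, centers, k)
    outputs = [models[i](x) for i in active]
    return sum(outputs) / len(outputs), active

# Four independent toy models stay in memory; each is "specialized" near its center.
centers = [0.0, 10.0, 20.0, 30.0]
models = [lambda x, c=c: x + c for c in centers]

result, active = routed_ensemble(12.0, models, centers, k=2)
print(f"ran models {active} of {len(models)}; result = {result}")
```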
Both MoE and Modular Routing require a lot of memory but not much memory bandwidth because only a small % of total parameters are active at any given time, which is the only data that actually needs to move around in memory.
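To put a rough number on this, the batch_size=1 bound from earlier can be rescaled by the fraction of parameters read per token. The sketch below is optimistic: it assumes the model fills all 512GB and that each token reads exactly the active fraction. It ignores always-active parameters, KV cache traffic, and compute limits, so treat the result as an upper bound, not a benchmark.

```python
def max_tokens_per_second(bandwidth_gb_s, resident_gb, active_fraction=1.0):
    """Upper bound for batch_size=1 decoding: bandwidth / GB read per token."""
    return bandwidth_gb_s / (resident_gb * active_fraction)

# M3 Ultra figures quoted earlier in the thread: 512 GB at 800 GB/s.
dense = max_tokens_per_second(800, 512)
# Assumption: the active fraction equals the expert ratio k/N = 8/256.
sparse = max_tokens_per_second(800, 512, active_fraction=8 / 256)
print(f"dense bound: {dense:.2f} tok/s, 1/32-active bound: {sparse:.1f} tok/s")
```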
Funny story... 2 weeks ago I had a call with one of Apple's biggest competitors. They asked if I had a suggestion for a piece of AI hardware they could build. I told them, go build a 512GB memory Mac Studio-like box for AI. Congrats Apple for doing this. Something I thought would still take you a few years to do you did today. I'm impressed.
Looking forward, there will likely be an M4 Ultra Mac Studio next year which should address my main concern since these Ultra chips use Apple UltraFusion to fuse Max dies. The M4 Max had a 36.5% increase in memory bandwidth compared to the M3 Max, so we should see something similar (or possibly more depending on the configuration) in the M4 Ultra.
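As a back-of-the-envelope projection only: if a future Ultra chip got the same 36.5% bandwidth uplift and kept 512GB of memory (both are assumptions, not announced specs), the numbers would look like this.

```python
M3_ULTRA_BANDWIDTH_GB_S = 800  # quoted earlier in the thread
M4_MAX_UPLIFT = 0.365          # M4 Max vs M3 Max, quoted above
ASSUMED_MEMORY_GB = 512        # assumption: capacity stays the same

def projected_bandwidth(current_gb_s, uplift):
    """Scale a bandwidth figure by a fractional uplift."""
    return current_gb_s * (1 + uplift)

bandwidth = projected_bandwidth(M3_ULTRA_BANDWIDTH_GB_S, M4_MAX_UPLIFT)
refresh = bandwidth / ASSUMED_MEMORY_GB
print(f"projected: {bandwidth:.0f} GB/s, {refresh:.2f} memory passes/s")
```

Even under these assumptions the refresh rate stays far below the H100's 37.5/s, so the case for sparse models would not change.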
@AEC_Tech_Dash Lots of podcasts have been started, primarily in the last 4 years. From Construction Brothers and TRXL Podcast to Wrenches Gadgets and Hacks. Even our Readthomas and Bimkits channels have far more interaction now. It’s odd.
If this is how your customer support “works”, be assured that users will leave your service and platform. @Reddit @redditstatus @reddit_lies 5 emails, 2 chats and still not a human response. Just copy and paste. How are those AI Agents working out for ya?
We have been hard at work updating and adding content to https://t.co/uoxOJ6JMhY for the last few weeks! Our recent addition is an Arcade Kit for Enscape! Enjoy!
🎙️ In this week's #WGHpodcast episode, we chat with AECO Maverick @jadamthomas, discussing the human side of technical consulting. Discover why sometimes the best solutions begin by stepping away from the computer!
Listen here 👇
https://t.co/FxIqUT4ipk
#PeopleOverTechnology
@ReadThomasBIM project discussion and kickoff underway. Lots on the docket: Real Time assets, Revit assets, Projects Support, pyRevit development, and BBQ.
🎙️ Exciting news! In our latest episode, Scott Reynolds updates us on the legal battle over building codes and the Pro Codes Act. Don't miss this insightful conversation impacting the AEC industry. Listen here: https://t.co/AbplmvJtDS #buildingcodes #AECindustry