Sr. Director, Communities and Influencers at VMware. ACM Member, Engineer, Administrator, Community Guy at Heart... Oh yea Marketing! Family with 4 kids.
Pasting logs into AI? Remove sensitive data first.
Log Anonymizer: single HTML file, runs in your browser, no backend.
Auto-detects emails, IPs, tokens, UUIDs and more.
https://t.co/OqRG51yKKr
https://t.co/Aoxzotfwz4
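A minimal sketch of how this kind of detection can work, in Python rather than the tool's browser code. The patterns, the `anonymize` name, and the `<LABEL>` placeholders are illustrative assumptions, not the Log Anonymizer's actual rules:

```python
import re

# Illustrative patterns only (assumed); the real tool's detection rules
# are not shown in the post.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("UUID", re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b")),
    ("IPV4", re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")),
    # Only catches "Bearer <token>" style secrets; other token formats
    # would need their own patterns.
    ("TOKEN", re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE)),
]


def anonymize(text):
    """Replace each match of every pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text


line = "login bob@example.com from 10.0.0.5 id 123e4567-e89b-12d3-a456-426614174000"
print(anonymize(line))
```

Regex-based redaction like this is best-effort: it misses anything the patterns do not describe, so a manual skim before pasting logs anywhere is still worthwhile.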
VMUG Connect in Dallas: spend an extra day learning VCF deployment design. I will be in Dallas with an Advantage lab presentation for your home lab environment!
#768 - Automating ESXi Host Preparation for VCF 9 with Paul van Dieen | Paul joins the podcast with Eric and Bob to talk about his blog article covering ESXi host preparation for VCF 9 with PowerShell. Wednesday, May 13 at noon Pacific. @ericnipro @plankers https://t.co/hYvyWlbYum
#767 - Tommy Grot Talks Virtual Tooling and VirtualBytes Blog with Eric and Bob | Join the conversation on Wednesday, May 6 at noon Pacific. @ericnipro @plankers https://t.co/QOALkJm9fg
The AMD Zen4/Zen5 IPMI Thermal Driver for ESX Fling https://t.co/hzMRwgQI2e is now LIVE on Broadcom Support Portal under Free Downloads! 🥳
PS: Don't forget to click on the T&C link or you won't be able to click on I Agree
This will be a MUST attend briefing tomorrow (05/05) on our next planned release of VMware Cloud Foundation (#VCF) ...
Sign Up: https://t.co/AQzcytiAlT
For the benefit of others, I want to document a bug in NVIDIA GB10-based devices such as the DGX Spark and the variants made by companies like MSI.
The bug affects all GB10-based systems (NVIDIA DGX Spark, ASUS Ascent GX10, and by extension MSI's EdgeXpert/GB10 variant) because they share the same SoC and ConnectX-7 wiring.
Two DGX Sparks connected via QSFP, with both interfaces negotiating 200 Gbps via ethtool, but actual throughput capped at ~13 Gbps under both TCP (iperf3) and RDMA (ib_write_bw).
So instead of 200 Gbps or 120 Gbps between two boxes, you get just 12.9 Gbps, which is super super slow when trying to distribute an LLM.
The root cause is: "The ConnectX-7 firmware reports 'insufficient power on the PCIe slot (27W)' and throttles both PCIe domains. RDMA hits the same wall as TCP, which rules out the kernel networking stack and points to firmware/hardware below the software layer."
Updating the driver from 580.126 to 580.142 with apt full-upgrade (run with sudo, of course) resolves it completely. The power warning persists in the logs but no longer throttles the link.
Problem solved. Hope this saves you some time. NVIDIA should have told customers about this, and they should have shipped the units with the updates in place, but they didn't.
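To put the numbers from this thread in perspective, here is a small Python sketch. The function names are hypothetical, and the version check assumes plain dotted numeric driver versions like the 580.126 and 580.142 quoted above:

```python
def link_utilization(measured_gbps, negotiated_gbps):
    """Fraction of the negotiated link rate actually achieved."""
    return measured_gbps / negotiated_gbps


def version_tuple(version):
    """Turn '580.126' into (580, 126) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_needs_update(installed, fixed="580.142"):
    """True if the installed driver is older than the version reported to lift the throttle."""
    return version_tuple(installed) < version_tuple(fixed)


# Figures from the thread: 12.9 Gbps measured on a link negotiated at 200 Gbps.
fraction = link_utilization(12.9, 200)
print(f"achieved fraction of link rate: {fraction:.4f}")
print("update needed:", driver_needs_update("580.126"))
```

At 12.9 Gbps the link delivers only about 6% of the negotiated 200 Gbps, which is why the throttle hurts so much for multi-node LLM work.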
This is an impressive discovery for Zen4 CPUs, great fix. -eric
Quick Tip: High CPU Utilization on ESX due to Slow Entropy from AMD Zen 4 CPUs | William Lam https://t.co/GZx8LiK4I5