I implemented an LLM end-to-end in hardware, and ran it on an FPGA.
Zero Python. Zero CUDA. Just pure SystemVerilog.
All my progress + everything I learned from 200h of LLM chip design (demo at the end)👇
If the rumors are true…
Huawei’s Ascend chips could become a viable alternative to US chips in the global market.
This might have bigger implications for geopolitics.
🚨 Viral rumor: DeepSeek R2 specs leaked!
—1.2T params, 78B active, hybrid MoE
—97.3% cheaper than GPT-4o ($0.07/M input tokens, $0.27/M output tokens)
—5.2PB training data. 89.7% on C-Eval 2.0
—Better vision. 92.4% on COCO
—82% utilization on Huawei Ascend 910B
Big shift away from US supply chain.
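
Quick sanity check on the "97.3% cheaper" number. This is only a sketch: the post does not say which GPT-4o prices the rumor compared against, so the baseline below ($2.50/M input, $10.00/M output) is an assumption, and `percent_cheaper` is just a helper name made up for illustration.

```python
# Sanity check on the rumored "97.3% cheaper" figure.
# ASSUMPTION: GPT-4o list prices of $2.50/M input and $10.00/M output tokens
# (not stated in the post). The R2 prices are the rumored ones from the post.

def percent_cheaper(rumored: float, baseline: float) -> float:
    """Return how much cheaper `rumored` is than `baseline`, in percent."""
    return (1.0 - rumored / baseline) * 100.0

GPT4O_IN, GPT4O_OUT = 2.50, 10.00   # assumed baseline, USD per 1M tokens
R2_IN, R2_OUT = 0.07, 0.27          # rumored, USD per 1M tokens

input_saving = percent_cheaper(R2_IN, GPT4O_IN)
output_saving = percent_cheaper(R2_OUT, GPT4O_OUT)

print(f"input:  {input_saving:.1f}% cheaper")
print(f"output: {output_saving:.1f}% cheaper")
```

Under that assumed baseline, the output-token price lines up with the rumored 97.3%, and the input-token price comes out around 97.2%. A different baseline would shift both numbers.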