An unwilling diversion from Codex today
Just found out my weekly limit dropped from 60% to 7% in one day. Panicking, I researched local models and ended up using Aider with Qwen-2.5-coder:7b & 32b. It’s pretty much unusable. Even though the models are hosted locally through Ollama, responses are somehow slow & the 32b version hallucinates non-existent prompts a lot. If @qwen has a better way to do it, please let me know! (Surprisingly, the 7B version hallucinates less than the 32B and is a little bit usable.) Should I use Deepseek (@deepseek_ai) instead?
I went back to Codex again, trying to get it to generate a list of files for Aider to add to the chat. Then my usage limit was up. Upon checking https://t.co/6ZJSGQO71G, I found that my subscription had expired & my account had become a free account, which is why my weekly limits shrank so much. I happily renewed my subscription and secretly thank @sama & @openai for their good product. Hallelujah! Back to designing chips again!
LLM as digital designer?
Many people wonder about using an LLM as an RTL designer. One of an RTL designer's main tasks is to fix timing. Whenever a timing report shows violations, it is the team lead's task to find out which blocks the worst violations belong to & distribute the report to the respective block designers. In modern designs, the number of blocks has become unmanageable for one person, so many designers do block-level design & the team lead stitches the blocks together.
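The triage step described above (bucket violations by block, worst first) can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the violating endpoints and their slack have already been extracted from the timing report into `(endpoint, slack)` pairs, and that the block name is the first two levels of the hierarchical path. The function names, the sample paths and the `depth` parameter are all hypothetical, not from any real tool.

```python
from collections import defaultdict


def group_violations_by_block(paths, depth=2):
    """Bucket violating endpoints by owning block.

    paths: iterable of (hierarchical_endpoint, slack_ns) pairs.
    depth: number of leading hierarchy levels that identify a block
           (an assumption; real designs may need a lookup table).
    Returns {block: [(endpoint, slack), ...]} with the worst slack first.
    """
    buckets = defaultdict(list)
    for endpoint, slack in paths:
        if slack >= 0:
            continue  # timing met, nothing to distribute
        block = "/".join(endpoint.split("/")[:depth])
        buckets[block].append((endpoint, slack))
    for entries in buckets.values():
        entries.sort(key=lambda entry: entry[1])  # most negative first
    return dict(buckets)


def worst_blocks(buckets):
    """Order blocks by their single worst violation."""
    return sorted(buckets, key=lambda block: buckets[block][0][1])


if __name__ == "__main__":
    # Hypothetical endpoints and slack values, for illustration only.
    sample = [
        ("top/u_alu/acc_reg[3]/D", -0.42),
        ("top/u_alu/acc_reg[1]/D", -0.10),
        ("top/u_dma/ptr_reg/D", -0.75),
        ("top/u_uart/tx_reg/D", 0.05),
    ]
    buckets = group_violations_by_block(sample)
    for block in worst_blocks(buckets):
        print(block, buckets[block])
```

Each bucket is what a block designer would receive; an agent could then be asked to explain the worst path in each one.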
In my sample design, Codex was able to explain the violating path in human terms. This is not easy. Sometimes a human designer has to stare at the violating path for a while before the reason pops up, as the path can be very cryptic.
Not only was Codex able to explain the path, it was also capable of suggesting fixes.
This process iterates until all timing is fixed and implementation succeeds.
It sounds trivial, but it is exactly one of the most important tasks that RTL designers have to do day to day. IMHO, the LLM-as-designer test has been passed.
Which AI model to use for digital design?
People always wonder. With Claude in the news every day, we naturally think that Claude Code is the best & OpenAI's Codex is probably second class. I have used CC & Codex for both programming tasks & digital design. Surprisingly, CC is very happy to obey our commands, but its output is often buggy, sometimes with major architectural issues. In my mind, CC acts like a junior programmer/chip designer. Codex, on the other hand, will not always obey and tends to debate your choices. But it was able to point out CC's mistakes & correct them. Many times, Codex made CC's unworkable code/design work again. For digital design, CC's output was not compilable at one stage of my sample project and it gave up. I had to launch Codex to solve the issue. During the resolution process, Codex also pointed out the architecture problem to me. It is indeed a more seasoned programmer/chip designer.
Aside from coding RTL, Codex is also a good helper in creating Tcl scripts, which are essential for post-processing synthesis results, for example. It is quite impressive. For $20/month, I think it's a great deal. It now has newly imposed daily & weekly limits like Claude, but they seem more lenient. So far so good. So Codex, as a dark horse, is worth a try. Just don't let Mr. Altman @sama or @OpenAI know, otherwise the monthly fee may increase :-)...
Impact of LLM agents on digital design
Traditional digital design relies on text-based editors to manually type in RTL. This is a manual flow with many tools along the way: lint for syntax checks, synthesis, DFT scan insertion, ATPG, P&R, DRC checks, integration into GDSII, etc. Some companies also do power estimation, either with an RTL power analysis tool or with PTPX.
Overall, it is a heavily text-based flow. RTL is text, the netlist is text, and so are the reports, test benches & test patterns. Even waveforms, in VCD format, are text as well.
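To show how approachable a text waveform is for a script (or an agent), here is a minimal Python sketch that counts value changes per signal in a VCD dump. It is an illustration under simplifying assumptions, not a full VCD parser: it expects one `$var` declaration or one value change per line, ignores scopes (so same-named signals in different modules would collide), and the sample dump at the bottom is made up.

```python
def count_toggles(vcd_text):
    """Count value changes per signal in a simplified VCD dump."""
    names = {}    # VCD identifier code -> signal name
    last = {}     # identifier code -> last seen value
    toggles = {}  # signal name -> number of value changes
    in_header = True
    for line in vcd_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            # Declaration form: $var <type> <width> <id> <name> ... $end
            if tokens[0] == "$var" and len(tokens) >= 5:
                names[tokens[3]] = tokens[4]
                toggles[tokens[4]] = 0
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        head = tokens[0]
        if head[0] in "#$":
            continue  # timestamp, $dumpvars or $end
        if head[0] in "bBrR":
            if len(tokens) < 2:
                continue  # malformed vector change, skip it
            ident, value = tokens[1], head      # vector: "b1010 <id>"
        else:
            ident, value = head[1:], head[0]    # scalar: "<value><id>"
        if ident not in names:
            continue
        if ident in last and last[ident] != value:
            toggles[names[ident]] += 1
        last[ident] = value
    return toggles


if __name__ == "__main__":
    # Made-up two-signal dump, for illustration only.
    sample = '''$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$var wire 4 " data $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 "
$end
#5
1!
#10
0!
b1010 "
#15
1!
'''
    print(count_toggles(sample))
```

The initial values under `$dumpvars` are recorded but not counted as toggles, so only real changes show up in the result.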
Most digital design tools are command-line based, with an optional GUI for visualization that expert users do not use often, except for debugging (such as ATPG debug).
All these reasons make digital design very suitable for agent-based design using CLI. The revolution has started, with some insiders saying agent-generated verification has risen to 70%. Boilerplate designs will be very easily generated by AI agents. While breakthrough architecture is still the realm of humans, we expect day-to-day permutations of classical blocks to be all written by agents very soon, probably within 2026.
Powerful as they are, LLM agents are not without errors, but synthesis and other downstream tools in the flow can still catch the errors they commit. And with almost the entire toolchain in CLI, the whole digital flow can literally be taken over.
For one, it will free up more time for human designers to think about architecture & to iterate for efficiency in the PPA search space.
Another speed-up will be in script writing. Typically, designers have had to write Python/Perl scripts for different purposes, like register generation, special compliance checks & constraints from Excel spreadsheets. Now all of these can be delegated to AI agents, a big time saver.
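For a flavor of the kind of script meant here, below is a minimal Python sketch of register generation from a spreadsheet exported as CSV. Everything specific in it is an assumption for illustration: the `name` and `offset` column names, the 16-bit address width, the `REG_` prefix and the two checks (word alignment, no duplicate offsets) are hypothetical, not taken from any real flow.

```python
import csv
import io


def gen_register_params(csv_text, prefix="REG_", align=4):
    """Turn rows of (name, offset) into Verilog localparam lines.

    csv_text: CSV text with hypothetical columns 'name' and 'offset'.
    Raises ValueError on a misaligned or duplicated offset, as a
    stand-in for a simple compliance check.
    """
    seen = {}
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["name"].strip().upper()
        offset = int(row["offset"].strip(), 0)  # accepts 0x.. or decimal
        if offset % align != 0:
            raise ValueError(f"{name}: offset {offset:#x} is not {align}-byte aligned")
        if offset in seen:
            raise ValueError(f"{name}: offset {offset:#x} already used by {seen[offset]}")
        seen[offset] = name
        lines.append(f"localparam [15:0] {prefix}{name} = 16'h{offset:04X};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical register map, for illustration only.
    sample = "name,offset\nctrl,0x00\nstatus,0x04\nirq_mask,0x08\n"
    print(gen_register_params(sample))
```

Scripts like this are short, tedious and easy to verify by eye, which is exactly why they are a good fit for delegation to an agent.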
Can AI agents replace EDA software? That's almost like a SaaS question. I think it will take time. The big three have accumulated years of expertise & private data on this. The lack of publicly available training data hampers the agents' ability to replace EDA tools within a short time span of 2~3 years. After 3 years, maybe things will change with the arrival of ASI. Maybe superintelligence can do better synthesis, replace the Conformal ECO tool & do formal verification... Those are million- or billion-dollar questions that only the likes of Anthropic, OpenAI & Google can answer...
Sad to announce that after many, many hours of struggle, the RTL that I asked Codex to convert from working Python code (accuracy >97%) still cannot get above 10% accuracy. I think chip design is probably an area that vibe coding cannot conquer yet…
Working on vibe coding an MNIST test using predictive coding & an image-to-spike front end. But guess what, Claude Code failed to make it work. Codex got it working to ~80% accuracy & Grok aced it by pointing me to a relevant 2025 paper!
Gemini CLI is stuck as well. Switching back to Codex as it’s the only one that’s not giving up (still suggesting fixes). I wish Grok had a CLI so that I could try it.
RTL sim of the Codex-updated code has train accuracy of 100% but test accuracy <10%. Repeated prompting of Codex couldn’t fix it. My hope is now on Gemini CLI…
Asked Claude Code to convert the difference predictive coding Python code into Verilog for simulation (eventually on FPGA). After some stuck sims, it downgraded the 28x28 images to 8x8 & used no hidden layer despite what’s in the Python code, then declared victory. I was forced to switch to Codex. Now I have something to simulate…
This paper, https://t.co/qbZrOzp9Fo, suggested by Claude, is pretty interesting. The code can get MNIST accuracy to >97%, albeit slowly. The reviewers’ comments are interesting to read/consider as well.
It turns out the spiking world model replaces RSSM in dreamer3 with spiking neurons, exactly what I did a few weeks ago in my private experiment. I was using eprop & they are using BPTT, which I tried to avoid as it is not bio-plausible. Needless to say, eprop’s performance was less than desired. Still looking for a way to do it without using dreaded gradients…
Discussing the top-level prediction of the Rao & Ballard 1999 Nature paper with Grok today, I conjectured that it may come from a world model & Grok suggested the Spiking World model paper for me to read (https://t.co/yIhP8CiVuN). Pretty cool!
@bindureddy Also by putting a Turing Award-winning scientist, the inventor of CNN, under the wing of the 28-year-old founder of a data labeling company. IMHO, OAI+Sam Altman>>Meta+new chief AI officer
Puzzled that Codex wouldn’t obey my request to change some code from backprop to spike-based, I asked Claude Code to do it & it complied immediately, without debating how infeasible it is. Wow! Looks like CC is the young intern & Codex is the old dog programmer 😬…
Ordered my DGX Spark today. NVDA opened it up after months of waiting post-reservation. Hopefully no more OOM errors to deal with. Thank God for Jensen’s good decision.
The most dreaded messages nowadays are “context low” & “5 hour limit reached”. Yes, they can help work-life balance, but Codex CLI can be used to “destroy” that balance as I am now an “al-code-holic”. These tools are so addicting + I can use each of them to cross-check the others.
Watching Claude fix errors & run code for an extended time is really amazing. Somehow Codex still won’t run the code it generated & still needs me to copy & paste error messages back into its window…
Codex got stuck on my prototype again. It generates very highfalutin ideas that don’t work & are overly complex. I tried Claude again & it seems to have recovered & doesn’t give the 5 hour limit anymore. Strange…