just wrapped up my latest side project, ai agents that play The Resistance: Avalon! built both rule-based and llm agents to see if ai can master social deduction, deception, and hidden-info gameplay.
how do you think they fared?
1/n
Just saving this here to document a story, and as a self-reflection on whether AI is really making me more productive.
Yesterday morning I found a way to complete the new HVM approach, which is much faster than before. I spent a few hours writing a spec, and then used Opus to implement it. About 3k lines of C code later, everything worked and performance was incredible: 5x faster than HVM4 (stable at ~10x now). So, in one day I had outclassed HVM4. Incredible. I'd never have implemented that so fast manually.
Now, enter today. I want to turn this into a real thing, but I haven't fully read the 3k lines yet. So, how do I trust it? I spent the whole day auditing the code. With AI. Several bugs found, most of them minor, like forgetting to collect() some argument. But then I stumbled upon this:
λ{ inl: 1 ; inr: 1 }
This was a test. But wait. This is matching on inl/inr, so the branches should receive the value of the Either. But they were numbers instead. Numbers aren't functions. This makes no sense. So why is this a test?
Then it struck me. The AI completely misunderstood how function arities work. It literally assumed, for no good reason, that HVM5 was supposed to handle under/over-applied functions. For no good reason. I never wrote that. It never asked, either. It just kinda thought "HVM is weird in some aspects, this might be one of them..." and then went on to implement a massive system to handle cases that should never happen to begin with. And all of that code is obviously wrong, because it should not even exist. It is wrong. It is damage. And it is there.
But it isn't too bad either. I just told Opus that it was wrong. Perhaps not so politely. And it solved it just fine.
But this raises the question. I spent ~20 hours in this file, and it is STILL not done. I went from 0 to 95% in the first 5 hours. Yet, 15 hours later, it is still not at 100%. I suppose that is the real effect of using AI. If I had just written the C file manually over the last two days, wouldn't I be further along than I am *right now*?
Surely, the first version would have taken much longer to drop. But once I finished writing all that code, there would be zero, literally zero retarded shit. And, just today, I caught 5 or 6 pieces of retarded shit. And the worst part is: I don't know how much retarded shit is left, but I'm afraid it is >0.
So if I have to read it all, review it all to ensure there is no retarded shit... what did I achieve by using AI, other than that dopamine anticipation?
A profound error that many experienced product people make is to fall into the habit of thinking & speaking at the level of clever proxies (frameworks, industry jargon, corporate buzzwords) rather than seeing the basic facts of the customer situation & identifying what matters.
I had hoped some AI folks would prove me wrong and that you can indeed go to bed and have "agents running while you sleep". I'd love that.
All I got was a bunch of vague posts, claims from folks who are "totally doing it" or "have a friend who does this all the time". Lots of anonymous anime accounts. Lots of folks butthurt by me merely asking for something more credible than "trust me bro".
I was expecting links to videos or posts from credible developers explaining how they're making it happen. I mean, stuff like what @mitsuhiko or @badlogicgames put out here all the time about how they work and which tools they use.
But nope. Crickets.
https://t.co/3HttIcxtlT