dmitry kim (jsn)

@jsn13

Moscow, Russia

Joined September 2007

83 Following

142 Followers

293 Posts

dmitry kim (jsn) @jsn13

3 days ago

I remembered that I have a favorite programming benchmark, one that I previously used to evaluate programming languages, and that I can now use to evaluate coding models! I'm talking, of course, about the ICFP Programming Contest 2006 virtual machine. So I asked models to read https://t.co/CTOdGyYx8T and to implement the UM virtual machine capable of running `sandmark.umz`. I've made a few runs with several models that are roughly in the same price range on Opencode Zen / Go: DeepSeek V4 Pro, GLM-5.1, Kimi-2.6, MiniMax-M3, Claude Haiku 4.5 and GPT-5.4 mini (well, DeepSeek is actually way cheaper).

GLM, DeepSeek, Kimi: usually use C and usually get pretty close, but then invariably get confused by the "self-decompressing" wording on the web page. That makes them run `sandmark` not directly on their VM (as they should), but through `https://t.co/ltjWFRa4Al` (a UM emulator written in UM), which, of course, is terribly slow. Then they get stuck trying random, useless ideas to speed things up. I had to interrupt them at that point.

MiniMax-M3: usually uses C and is never confused by "self-decompressing", but invariably screws up op 13. Sometimes it manages to dig itself out of that hole and delivers good results; other times it finds a way to dig itself deeper with some other mistake.

Haiku just did the whole thing in Python (with terrible performance) and with an additional error that caused `sandmark` to terminate early without producing the full output. Haiku then proudly declared that it had successfully completed its task.

And GPT-5.4 mini just one-shots the whole thing every time, in Go or Rust, and pretty fast, too, definitely making it look too easy. It's a clear winner, and it's not close.

I tried warning DeepSeek and Kimi about `https://t.co/ltjWFRa4Al`; after that, DeepSeek one-shotted the task pretty fast, too, while Kimi decided that it had to ignore my advice and ended up stuck trying useless performance tricks again.
I tried warning MiniMax-M3 about op 13, but it found some new way to screw up the implementation.
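For context on why op 13 is a common trap: as I read the UM spec, operators 0 through 12 take three register indices, A, B and C, from bits 6..8, 3..5 and 0..2 of the 32-bit instruction, while operator 13 (Orthography) uses a different layout, with register A in bits 25..27 and a 25-bit immediate value in bits 0..24. Below is a minimal decoding sketch under that reading; `Decoded` and `decode` are hypothetical names for illustration, not code taken from any of the models' outputs.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    op: int     # operator number, bits 28..31
    a: int      # register A
    b: int      # register B (always 0 for op 13)
    c: int      # register C (always 0 for op 13)
    value: int  # 25-bit immediate (op 13 only, else 0)


def decode(word: int) -> Decoded:
    """Decode one 32-bit UM instruction (hypothetical helper).

    Bit positions count from the least significant bit (bit 0).
    """
    op = (word >> 28) & 0xF
    if op == 13:
        # Orthography: A lives in bits 25..27, the value in bits 0..24.
        return Decoded(op, (word >> 25) & 0x7, 0, 0, word & 0x1FFFFFF)
    # Standard operators: A = bits 6..8, B = bits 3..5, C = bits 0..2.
    return Decoded(op, (word >> 6) & 0x7, (word >> 3) & 0x7, word & 0x7, 0)
```

For example, `decode(0xD2000041)` is op 13 loading the value 65 into register 1, and `decode(0x30000053)` is op 3 (addition) with A=1, B=2, C=3. A decoder that pushes op 13 through the standard A/B/C layout would, in general, pick the wrong target register and lose the immediate; whether that is exactly the mistake MiniMax-M3 made is not stated above.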

dmitry kim (jsn) @jsn13

8 days ago

@BTobotras @dinozavr

dmitry kim (jsn) @jsn13

8 days ago

@BTobotras @dinozavr It was only after your comment that I started to suspect that Chinstrap and Adelie might not be some obscure Linux distros here.

dmitry kim (jsn) @jsn13

about 1 month ago

@BTobotras You've heard about https://t.co/JZTLQH5jBH, right?

dmitry kim (jsn) @jsn13

about 1 month ago

So yes, we have our irreconcilable differences with China about rights and freedoms. That makes it easy to forget that it *is* one of the great civilizations, and one that is very much based on a huge corpus of written texts, which LLMs thrive on. Case in point: I have a medical condition for which I find the Traditional Chinese Medicine view quite helpful. I would argue that it might be better to discuss that part with something like DeepSeek or Kimi than with whatever Western model is white-hot these days. Makes me wonder what other topics are like this. I suppose Chinese models would be more familiar with, e.g., classical Chinese Chan texts (Zen before it came to Japan). I'd also expect them to understand those texts better, since the texts are notoriously hard to translate, and Chinese models would simply have a deeper understanding of the original language, would have seen more of the original commentaries, etc. Of course, there must be many other topics like this.

dmitry kim (jsn) @jsn13

about 1 month ago

@jurbed Duh, it's funny hearing that from you. You, of all people, have, for years, been one of the first examples in my mind of this idea embodied :)

dmitry kim (jsn) @jsn13

about 1 month ago

Hot take: quite often, the most useful mode of using an AI coding agent is pair programming mode. That's strong evidence that the most useful mode of using a human coding agent is often also pair programming mode. The main difference is that humans are so damn expensive.

dmitry kim (jsn) @jsn13

2 months ago

...still, when it works, when I'm not bogged down by friction of the mundane, not decision-fatigued from a thousand everyday nanodecisions, free to give it all to pursuing my big life goals -- do I then get the amazing results that *are*, indeed, the ultimate joy? Also no.

dmitry kim (jsn) @jsn13

2 months ago

Do I enjoy the low-key ongoing struggle of keeping my everyday life and the space around me orderly and well-organized? No. But when it works, when I manage to get that predictable, efficient, frictionless flow -- do I enjoy *that* enough to make it all worth it? Also no. But ...

dmitry kim (jsn) @jsn13

3 months ago

@adworse @z_nuts https://t.co/uW0HPrMPG8

dmitry kim (jsn) @jsn13

3 months ago

@arkenoi How did you sink so low, Petka, that you're asking me, your combat commander, why the people who jam GPS aren't tripping over themselves to build a system that would work like GPS but couldn't be jammed?

dmitry kim (jsn) @jsn13

7 months ago

It's pretty much everything that "fuck you money" is, except for the money part!

dmitry kim (jsn) @jsn13

7 months ago

@TheCinesthetic Rewatched this scene 4-5 times initially because I was absolutely sure there must be a moment there when shadows or background features behind Vader look like two giant mouse ears around his head. There isn't one. Unreal self-restraint from Gareth Edwards.

dmitry kim (jsn) @jsn13

11 months ago

@adworse @BTobotras For this kind of thing I once built myself a setup that locally ran inotify + rsync of the source to the remote, and at the same time transparently forwarded the nrepl socket to that same remote.

dmitry kim (jsn) @jsn13

11 months ago

@jurbed I absolutely do agree, on some level, in some sense, that everyone deserves a lot of things. It's just that we very often don't get what we deserve. Which is sad, but has nothing to do with one's "right" to take something from others by force.

dmitry kim (jsn) @jsn13

about 1 year ago

@oleksandr_now Then you surrender your chance to influence which parts of "you" die, and when, and how, and which ones get to keep on living. You can get a lot of mileage from properly dying just the right amount all the time.

dmitry kim (jsn) @jsn13

about 1 year ago

@nikitonsky The idea is that the last Terminator has gotten her a promotion deal so good that she'll either have no time to have a kid or too much money to raise him to be a good leader of the Resistance.

dmitry kim (jsn) @jsn13

about 1 year ago

@TomasForgac Oh yeah, my wife does that. And if, God forbid, I hesitate to pick one for a few seconds, she's gonna get nervous and add another 7 options.

dmitry kim (jsn) @jsn13

about 1 year ago

@BTobotras @ByakkaBukka But this isn't about it being a disgrace, it's about the risks.

dmitry kim (jsn) @jsn13

about 1 year ago

@kpertsev @sergeax "The Accountant" is a genuinely meaningful movie for me, but I expect the sequel to be pretty much awful.
