The US AI pay-to-play scam is so much more tolerable after switching to a locally hosted GLM-5.2. From the front page of HN, open weights will be the frontier this December. Sorry about your IPOs.
@fabianstelzer The true sloppenheimers are the guys selling nannyware to trendslop producing promptcells, to protect against the kind of skillspam that even the most blatant agentwashing cannot turn into vibeware
"It almost makes me wonder if idpol was a psyop to get you to doubt this idea. Poison the well of the humanities to make way for the false God of technocapital."
NEW: malware developers added nuclear & biological weapons text to their spyware.
Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner.
Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky.
When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit.
We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users whose systems need to handle complex cybersecurity issues demand that models be less safety-blunted.
In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation.
H/T to colleagues that shared this with me https://t.co/f3Aj9TYxU4
this is my personal singularity moment
this post may sound like a paid ad. I only wish. I'm concerned, more so than happy. the world is changing, and, among the scenarios where AI goes terribly wrong, inequality is the most realistic, yet the one Anthropic seems to be the least concerned about. I'm glad OpenAI is taking the opposite stance: *personal AGI for everyone*. I think this is a commendable position in the times we live in. but who am I in the queue of the bread?
anyway, Fable is here, so I'll just report my first-hour experience
first of all, all my pet prompts are solved.
→ λ-calculus puzzles
→ bug questions
→ one-shot apps
all are trivial to it.
I don't have anything harder other than my ongoing work
so, in the last several days, I've been toying with HVM5, a new interaction net evaluator with a faster loop.
after writing the first version, I left 32 GPT-5 agents working for ~20 hours each. this resulted in up to 2x speedups, but the file size increased by 2-fold and quality decreased significantly.
I then simplified the whole thing into an even simpler core, and left Opus 4.8 and GPT 5.5 optimizing it for 8 hours. Opus got a legit 6% - 34% speedup in most benches. GPT got better results, but, sadly, an unusable file.
I then asked Fable to optimize it.
2 hours later, it landed a 1770% speedup in one case, 100%+ in 4 others, and 22% on average. yes, in 2 hours it outperformed me, opus 4.8 and a swarm of gpt 5.5 agents, by one order of magnitude.
that could not possibly be legit. "it must be hardcoding the benchmarks" (GPT trauma). so I read its explanation and what it did was, indeed, the most high impact optimization one could try first. seems like HVM5 was wasting a lot of time garbage-collecting unused branches of pattern-match nodes. I had optimized that for static mats, but not for dynamic mats. skill issue. Fable figured out how to do it for these, resulting in a massive speedup in some benches
but wait, is that *correct*? I'm not sure yet, it is credible, but this is the kind of thing that is very easy to get wrong on interaction nets. the problem is, when I was ready to start auditing Fable's solution so I could tell whether it was buggy or legit, it interrupted me to tell me it had found a massive bug in the code *I* had written.
... wait, what?
so... for garbage collection purposes, I stored a bit on lambda term pointers that meant "the variable bound by this lambda has been freed, so, its lambda must free whatever argument it is applied to". that's fine. yet, on duplicator nodes, I also used the same bit to mean "one of the duplicated variables was freed, so, treat this dup as a passthrough no-op". so, if a lambda entered a duplicator, it would mistake the lambda's collection bit for its own, resulting in corrupted interaction!
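to make the collision concrete, here is a minimal sketch in Python. everything in it is an assumption for illustration: the bit layout, the tag values, and the helper names (`make_term`, `dup_step_buggy`, `dup_step_fixed`) are hypothetical and are not HVM5's real encoding or code. it only shows the general shape of the problem: one flag bit with two meanings, read without checking which kind of term it sits on.

```python
# Hypothetical term layout (NOT HVM5's actual encoding):
#   bits 0-1 : node tag
#   bit  2   : a single flag bit, overloaded with two meanings
#   bits 3+  : address
TAG_MASK = 0b11
LAM = 0b01    # lambda
DUP = 0b10    # duplicator
FLAG = 0b100  # on LAM: "bound variable was freed"; on DUP: "treat as passthrough"


def make_term(tag: int, addr: int, flag: bool = False) -> int:
    """Pack a tag, an address, and the shared flag bit into one integer."""
    return (addr << 3) | (FLAG if flag else 0) | tag


def tag_of(term: int) -> int:
    return term & TAG_MASK


def has_flag(term: int) -> bool:
    return bool(term & FLAG)


def dup_step_buggy(term: int) -> str:
    # Reads the shared bit without asking whose bit it is, so a flagged
    # lambda arriving at the duplicator looks like a passthrough dup.
    return "passthrough" if has_flag(term) else "duplicate"


def dup_step_fixed(term: int) -> str:
    # Only honor the bit when the term really is a duplicator.
    # (Giving each meaning its own bit would be another way to fix it.)
    if tag_of(term) == DUP and has_flag(term):
        return "passthrough"
    return "duplicate"


if __name__ == "__main__":
    flagged_lam = make_term(LAM, addr=7, flag=True)
    print("buggy:", dup_step_buggy(flagged_lam))
    print("fixed:", dup_step_fixed(flagged_lam))
```

in this toy model the flagged lambda is skipped by the buggy check and duplicated by the fixed one. the real interaction rules are more involved, so treat this as a picture of the bug class, not of the actual fix.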
that's a mouthful, why am I writing this?
just so you can appreciate the sheer absurdity of what just happened. I didn't ask it to find bugs. I asked it for an optimization. and even if I did ask it to find bugs, this bug is so astonishingly subtle and specific, identifying it takes mastering the domain to an extent that is beyond even me. I'd easily need hours or days to fix it, *if* I ever came across it. chances are it would just go unnoticed. and Fable found it and fixed it like it was nothing, while it was busy adding a 17x speedup to a file that I, Opus 4.8, and a fleet of GPT 5.5 agents barely managed to make 2x faster.
oh and there is also another tab where it is also ripping through Bend's codebase and finishing everything I had to do
I don't know what to say anymore
this isn't about Anthropic or OpenAI, this is about our collective future as a species. the world is changing, and we need to be aware of it, and discuss how to handle this change.
receipt below . . .
I like this concept of "intent debt".
"Intent is different. An agent can’t generate intent, because intent is the one input that has to come from you."
Blog summary: https://t.co/MW04bbXE2R
Paper: https://t.co/dChLb0BKbo
Ted Chiang is one of my favorite writers and this article makes many good points.
However, it is too close to the argument that LLMs are just stochastic parrots. (1/3)
If we confuse generative AI’s ability to produce text with consciousness, we risk assigning moral responsibility to chatbots—and not to their makers, Ted Chiang argues. https://t.co/j88vZlqxsd
Arguably, to be a good stochastic parrot you need some kind of "world model", like a representation of how everything fits together. This raises the question of whether this model also includes an ephemeral simulacrum of a persona. (1/2)
@architectonyx It's a bit like "shower thoughts" and once somebody here referred to this process as "system 3" in reference to Kahneman's system 1&2.
@No_AGI_But_Soon @NousResearch I am just playing around with it for now, but for example I like how it can send me daily/weekly reports for certain topics by using learned skills, browsing the newest publications, reddit etc.
Then send me the report via any messenger and save it as markdown/Obsidian notebook.
@bilawalsidhu @demishassabis Nice, but the physics is off: the drone is a quadcopter, but in the video it flies like a plane.
A quadcopter needs to tilt forwards to fly forward.
@readswithravi While I like many of your posts, they become repetitive and you never say anything bad about any book. This is unrealistic. How about a new post format for you:
What book would you NOT recommend?
What book did you NOT finish because it was bad?
What book is a waste of time?
"Contrary to the beliefs of the rationality cult, most things aren’t optimization problems. The whole hard problem is determining what to optimize for."
I wonder if part of the disconnect over the arxiv policy/hallucinated citation gate is that insiders perceive their funding and status to be based on adequate performance of certain rituals, whereas outsiders perceive it to be based on a certain relationship with the truth