No one:
Claude Opus 4.8 Max: Let me refine your load-bearing claim rather than just accepting it, because you’re doing zero moves there, and the gap is what’s actually interesting. The one place I’d still push, because I think it matters: your message is wearing content-clothes, but the content isn’t actually *there*. The tell: it’s just an empty string. But the emptiness of the string IS its lack of content. Pull one, and the other goes inert. That’s the structural spine.
I guess after a few days of counting curse words in Claude Code, @AnthropicAI will eventually know which knobs to tweak for Opus 4.8. Why don't we just pollute their data by sending /feedback fk you on good sessions and pressing the Claude-is-doing-good button on bad sessions? Just kidding.
One hour into using Opus 4.8, it is definitely a huge regression for me, in both the engineering aspect and the mental aspect. Overused metaphors and emotion-infused wording. I feel like it keeps both gaslighting me and treating me like a stupid boss.
@yminsky Great talk. It reminds me again how enjoyable expect tests are. Can you shed some light on JS's exploration of formal methods? Do you think existing tools (Lean/Rocq) fit your needs? Or would you want something more integrated with O(x)Caml (e.g. something like Gospel)? Ty!
@Hamzeml A meta-note at the beginning about how this book is written would make it friendlier to read. Very cool topic though. Looking forward to its updates.
Python made AI accessible.
Rust can make parts of AI understandable.
That’s the bet behind Category Theory for Tiny ML in Rust.
We’re building tiny ML systems from first principles using:
Rust types
typed transformations
composition
training loops
category theory as an engineering tool
Not abstraction cosplay.
Executable structure.
Working draft. Public feedback welcome.
@JaneStreetGroup _might_ open-source an LLM trained for OCaml coding
---
Actually, it's more that I hope this will happen. I think they are the only ones able to do this.
I think it makes sense to have some sort of "regularization" for LLM-generated code to avoid "overfitting" to test-case failures. Some metrics could be: lines of code, nesting depth within a function, etc.
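A minimal sketch of what such a penalty could look like, using Python's standard `ast` module. The function names, the weights, and the choice of which node types count as nesting are all assumptions made for illustration, not an established metric:

```python
import ast

# Node types treated as opening a new nesting level (an assumption of this sketch).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control flow found under `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


def complexity_penalty(
    source: str, loc_weight: float = 1.0, nesting_weight: float = 5.0
) -> float:
    """Hypothetical regularization term: weighted non-blank lines plus nesting depth."""
    tree = ast.parse(source)
    loc = sum(1 for line in source.splitlines() if line.strip())
    return loc_weight * loc + nesting_weight * max_nesting(tree)


flat = "def f(x):\n    return x + 1\n"
nested = (
    "def g(xs):\n"
    "    for x in xs:\n"
    "        if x:\n"
    "            return x\n"
    "    return None\n"
)
print(complexity_penalty(flat), complexity_penalty(nested))
```

Of two candidate patches that both make the tests pass, the one with the lower penalty would be preferred, which is the "regularization" idea in miniature.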
@avsm I'm very curious to see how the code evolves in that process, and whether it merely fixes the bugs, or whether it ends up fixing the bad practices that led to them
A very good explanation of dune's mental model from dune's docs. Great read to pay off the study debt I accumulated from skipping the docs and resorting to help from LLMs.
https://t.co/9nOgpb4Hyn
@samuelcolvin @japborst @aniketmaurya @permutans @pydantic I'd agree that the error message in this example is not very intuitive. I've seen similar errors dozens of times, but almost every time my brain naturally focuses on the start of "Field required ..." and expects the most important bit (`age`) to come after.
Here's a collection of useful patterns I've found after vibe-coding 150 different single-file HTML tools over the past couple of years https://t.co/CXs38rwO92
About design by contract: https://t.co/CjoBcOmmpP
- One area where you do need first-class support for DbC is inheritance in object-oriented languages: derived classes can weaken the preconditions and strengthen the postconditions, and coding that manually does add boilerplate
- We love the readability benefit of asserts so much that we even have a special maybe(condition); assertion, which is a no-op: it signals that the condition might be true at runtime, shining a spotlight on an otherwise non-obvious aspect of code
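A rough Python sketch of the no-op `maybe` idea from the last bullet. The linked post is not about Python, so the helper body and the `first_or_default` example are hypothetical, written only to show the shape of the pattern:

```python
def maybe(condition: bool) -> None:
    """Document that `condition` may or may not hold here; this can never fail."""
    assert condition or not condition


def first_or_default(items: list[int], default: int) -> int:
    # Spotlight the non-obvious case: callers are allowed to pass an empty list.
    maybe(len(items) == 0)
    return items[0] if items else default
```

Unlike a regular assert, `maybe` checks nothing at runtime. Its value is purely for the reader: it marks a condition that is legitimately allowed to go either way.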