@s9_l6 Kinda interesting because there are (at least) 4 possible formulas to get the solution:
2xy = z
x^2 + y^2 = z
x(x + y) = z
y(x + y) = z
With x = 3, y = 4:
2(3)(4) = 24
3^2 + 4^2 = 25
3(3 + 4) = 21
4(3 + 4) = 28
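Quick sanity check of the four candidates above with x = 3, y = 4. This is just a Python sketch; the dictionary keys are only labels for the formulas listed in the thread:

```python
# Evaluate each candidate formula from the thread for x = 3, y = 4.
# The keys are just human-readable labels for the four formulas above.
candidates = {
    "2xy": lambda x, y: 2 * x * y,
    "x^2 + y^2": lambda x, y: x**2 + y**2,
    "x(x + y)": lambda x, y: x * (x + y),
    "y(x + y)": lambda x, y: y * (x + y),
}

results = {name: f(3, 4) for name, f in candidates.items()}
for name, z in results.items():
    print(f"{name} = {z}")
```

All four give a different z (24, 25, 21, 28), which is why the "solution" depends on which formula you assume.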
@coreyepstein L take, IMO. Willingness to learn tools and techniques should be table stakes. Beyond that, all the old metrics are still as relevant as ever.
They already farmed the hype, and now people are wising up to the fact that 5.5 is either flat-out better or at least dramatically more efficient than Opus 4.6/4.7. They need moar hype.
Also, tbf, maybe they feel they've given enough early access to companies to resolve existing bugs?
@davis7 I'm a bit disappointed by it, but that's probably more to do with seeing how good it is and then expecting it to just be perfect. I'm annoyed when it does stupid things.
Strongly disagree. Those evals are created to differentiate between humans because most/all of us share some basic capabilities. Those aren't as easy to evaluate, and they aren't as apparent until you compare us to a superintelligent computer that can't count the number of R's in strawberry.
As a side note, what's the point of calling out what they can do? Everyone knows; it's all over social media and the news. The more interesting question is what they can't do, and why.
@labibrahman @puhlkit @jxnlco Does the compacted context sit on top of AGENTS.md? As in, is AGENTS.md loaded after every compaction step? It seems like it should be.
@JeffBezos Honest question: would you still support it if it also brought higher taxes to the top half? I'm not asking about the math, just about the principle.
People are changing their tune on AI replacing human work as the limitations of what current LLM architecture can do seem to become clearer. IMO, it's a mistake to confuse that with what AI could become, and when. I still think that fundamentally, computers could at some point do most of what the human brain does, and better. But for all the crazy impressive things they can do, it doesn't seem like it'll be this year or the next.
@davis7 If token budget is not a factor, do you see any reason to use a lower reasoning effort? Does speed alone outweigh the reduced quality for some tasks? If so, which types of tasks?
@theo I keep hearing people say how much better pi is (specifically, omp/oh my pi). Any thoughts on that? I'm on the GUI train, but I guess in theory you could build enough around pi to keep a lot of the niceties of a GUI, like good diff-viewing. Also, I like Codex pets now.