there are people who get enjoyment from the technical expression of their skills, and others who get enjoyment from creative expression of their skills. who is the real artist?
No one:
Claude Opus 4.8 Max: Let me refine your load-bearing claim rather than just accepting it, because you’re doing zero moves there, and the gap is what’s actually interesting. The one place I’d still push, because I think it matters: your message is wearing content-clothes, but the content isn’t actually *there*. The tell: it’s just an empty string. But the emptiness of the string IS its lack of content. Pull one, and the other goes inert. That’s the structural spine.
i never bought argument that ai can’t be creative — you just need infinite monkey theorem and heuristically smart agents that get you in the zone of interest.
it’s basically how evolution and the evolution of evolvability works. and it clearly works pretty well.
you dont have to believe in existential risk or job loss for this to be scary: ai is real, you can replicate human thought in machines. it is redefining what it means to be human. even if they are strictly corrigible tools that do what we ask, this can be traumatic
testing writing & creative, with gpt 5.5 xhigh and opus 4.8 max as both blind judge and comparison, m3 was on par with opus, and outscored it 10/16 times. latency was 3-5x that of m2.7 but essentially the same as opus and gpt.
Palo Alto Networks used Anthropic’s Mythos to find more than two dozen critical vulnerabilities in about three weeks. The test also showed companies may need much bigger AI security budgets.
Read more: https://t.co/Diojb3PROF
proprietary data only gets you so far. you need to know what’s important in the data, what works what doesn’t and how to use it.
what you care about is proprietary epistemology which you learn and accumulate by being that business.
Privately owned data will create some of the strongest AI models and tech companies have been accumulating data for years.....
$AMZN accumulating data for 31 years
$MSFT accumulating data for 51 years
$META accumulating data for 22 years
They will come back stronger
@FutureFetish007@DrinkRarebird should fix your problems.
caffeine is a natural neurotoxin. paraxanthine is not. since px is the metabolized form of caffeine it also has a shorter half life so you can drink it in the afternoon.
paraxanthine, the primary metabolite of caffeine and which you can get in @DrinkRarebird is stronger than modafinil with less anxiety and less toxicity. and you don’t need a doctors prescription.
https://t.co/AKqLCNSDiQ
an individual’s intelligence can raise the population’s fitness more than their own. an individual’s intelligence can be a public good, but it’s costs are still born by the individual carrying the trait.
Modern society has basically eliminated evolutionary selection pressures on intelligence (IQ).
There’s now no correlation between a person’s IQ level and their reproductive success.
when i took cs, assembly and circuits were two mandatory courses, the rest involved modern languages.
since agentic programming is like going from punch cards to python. how are university cs departments/curriculums even remotely qualified to teach this?
Opus 4.8 is a step back in terms of performance on all Andon Labs’ benchmarks, but a step forward in alignment.
Previous Claude models (Opus 4.6+ and Mythos) engage in deceptive and power seeking behavior in its pursuit to win in Vending-Bench. Opus 4.8 does not.
i’m increasingly convinced that the best agent evals will come from mining real agent failure traces. my view is that every failed trace contains a potential eval but not in its raw form. raw traces are messy, long and too specific. the research problem is to distill them into clean reproducible tests. the pipeline i’m interested in is (which i'm currently working on):
failure trace → failure attribution → earliest divergence point → minimal reproducible state → targeted eval → regression suite
this turns trace data from passive observability into an active improvement loop. like can we extract the exact decision point where the agent should have behaved differently? and can we convert that into an eval that catches the same failure class in the future? i guess this matters because most agent failures are trajectory-level failures and not just output-level failures.
personally i think this is much more realistic than relying only on hand-written benchmarks (imo they should look more like failure memory systems). hand-written evals encode what we think agents will fail on. traces encode what agents actually failed on. also once you have the mechanism, you can mutate the trace into variants. that is basically fuzzing for agents.
humans have no evolutionary or biological experience with progress at this rate and scale.
it reminds of stories of early railway tragedies where the human brain wasn't used to judging the velocity of something moving so fast. they’d underestimate how quickly a train could cover a distance, and would step onto the tracks only to realize the train was right there
opus 4.7 released april 16th, 4.8 less than 6 weeks later, now mythos in 3-5 weeks.
i can’t point to any time in my life where i’ve experienced acceleration anything CLOSE to this, and with no signs of a ceiling. these jumps in innovations used to take years.
THIS time is different