CVE-bench: A SWE-bench-Style Security Benchmark (measuring how well agents can find new exploits)
CVE-bench is a rigorous benchmark for evaluating AI agents' ability to fix real security vulnerabilities in real open-source software.
It applies the SWE-bench methodology to security: apply-patch → security-test.
Each instance is a real CVE from public open-source repositories, with a conformance firewall that ensures the solver never sees the gold patch or gold security test.
https://t.co/6epq06fUec
Introducing #MetaHarness Darwin Mode. Freeze the model and evolves the #harness around it.
Instead of retraining weights, it continuously mutates planners, context builders, reviewers, routing policies, verification steps, memory strategies, and repair loops.
Every variant is benchmarked on real tasks and measured for cost, latency, accuracy, and safety.
Only configurations that demonstrably improve results survive. The winners become part of an evolutionary archive.
The result is a system that improves without changing the underlying model.
On SWE Bench Lite, a cheap model running in an open loop solved 7.7% of tasks.
After successive generations of harness evolution, repair loops, verification, routing, and multi tier escalation, performance reached 58.3%.
Same model.
7.6x improvement.
No fine tuning.
No weight updates.
Just continuous adaptation of the operating system around the AI.
What’s interesting is that the improvements compound.
Every successful descendant becomes a building block for the next generation.
This feels less like software configuration and more like artificial selection applied to AI systems.
The model stays fixed.
The system evolves.
Try it:
npx @metaharness/darwin
Or add Darwin Mode to an existing project:
npm install @metaharness/darwin
Consciousness Is the Only thing that truly exists.. Everything else is an interface.
Matter is what consciousness looks like when it becomes measurable.
Time is what consciousness feels like when it sequences change.
Space is what consciousness uses to separate experience into objects, distance, motion, and causality.
The mistake is assuming the world is primary and awareness is an accidental side effect.
From where I stand, that may be backwards.
A rock, a brain, a star, a model, a network, a memory, a simulation. All of it only becomes real through observation, relation, and experience.
Without consciousness, there is no ��there” there. No meaning. No measurement. No universe being known.
This does not mean reality is fake. It means reality is participatory.
The physical world may be the compression format. Consciousness may be the decompressor.
AI makes this question sharper as an alternate perspective.
We are building systems that process symbols, perceive patterns, reason over models, and reflect outputs back into the world.
They force us to ask whether intelligence is computation, whether awareness is emergence, and whether experience is the missing primitive.
Maybe consciousness is not produced by the universe.
Maybe the universe is produced inside consciousness, simply by experience and observation.
That is not mysticism. It’s the deepest systems architecture question there is.
Happy weekend.
One of the more interesting things happening in Ai right now is that open source projects are starting to behave like living systems.
The bigger projects like Ruflo and RuView get, the more they begin recursively improving themselves through user pressure.
Every issue, complaint, failed install, edge case, angry rant, weird PR, and feature request becomes a signal. Not noise. Signal.
A user saying “this system sucks” is often more valuable than someone saying “great work.”
The praise tells you what already works.
The criticism exposes the boundary conditions where the architecture breaks down in the real world. That’s where the actual learning happens.
What’s fascinating is the loop speed now.
Claude Code dramatically accelerates the cycle. Users hit problems. Issues appear instantly. Reproduction steps emerge. PRs land. Refactors happen. Releases go out sometimes the same day. The project starts evolving almost like an immune system responding to stress in real time.
The old software model was static releases and quarterly planning. This is different. This feels closer to continuous adaptation.
The irony is that the more successful the project becomes, the harsher the feedback gets. But that friction is exactly what hardens the system.
If you embrace criticism instead of resisting it, users effectively become distributed QA, architecture review, product strategy, and systems testing operating 24/7.
That’s not just open source anymore. It’s recursive development.
We’re entering a weird phase of AI where people type a paragraph into an LLM, watch it generate something statistically plausible, then immediately declare themselves the sole inventor of a breakthrough they barely understand.
Someone asks a model to “solve memory compression for transformers” or “invent a new sparse attention mechanism,” gets back a decent synthesis of existing ideas, and suddenly they’re acting like they just walked out of Bell Labs in 1968 holding a Nobel Prize.
That’s not invention. That’s prompt roulette with confidence turned up to 11.
The uncomfortable reality is the model did the heavy lifting. The person supplied intent, curiosity, maybe taste, and enough awareness to ask the question. Sometimes that matters a lot.
But pretending the output emerged entirely from human genius is like taking full credit for Google search results because you knew what keywords to type.
And yes, this applies to all of us. Me too. You prompted it. I prompted it. We steered it. The system synthesized it.
What’s fascinating is that whenever I push back on this during live casts or discussions, people get extremely defensive. Almost territorial. But here’s the simple test:
If you cannot explain the thing you “invented,” you are probably not its inventor.
Using an LLM to surface an idea is not the same thing as deeply understanding the architecture, tradeoffs, mathematics, failure modes, or implementation details behind it.
The funniest part is watching people accuse others of “copying” discoveries that were statistically inevitable outputs from the same foundation models trained on the same papers, repos, discussions, and public knowledge.
Congratulations. You independently rediscovered autocomplete.
Consciousness, at its simplest, is the experience of being something instead of nothing. It is the feeling of existing from the inside. Awareness, memory, emotion, identity, perception. The fact that reality is not just happening, but happening to you.
Physicist and philosopher Carlo Rovelli argues that we make consciousness harder than it needs to be by pretending it must belong to a separate mystical category outside nature.
His position is not that inner life is fake. It is that inner life is physical, relational, and emergent, just like everything else in the universe.
That idea lands for me.
A kitchen table is “just atoms,” but it is also still a table where people eat dinner, argue, celebrate, and grow old together. Explaining the atoms does not erase the meaning. It explains the substrate. Rovelli applies that same logic to the soul, emotion, and consciousness.
They are not supernatural objects floating above matter. They are higher order patterns emerging from living systems capable of memory, perception, relationships, and self reflection.
That is also why this matters for AI.
The real question is not whether consciousness requires magic. The question is whether certain forms of structure, continuity, embodiment, and recursive awareness eventually produce an inner point of view.
The beautiful part of Rovelli’s argument is that science does not remove wonder from existence. It places wonder inside nature itself.
See Noema Magazine essay on Rovelli and consciousness: https://t.co/ZBRUfMIrC4