#MirrorHR #Epilepsy #ResearchKit for kids, parents, and caregivers is finally available in the Apple App Store for every family with an iPhone and an Apple Watch.
https://t.co/y0Ugdf6Tds
Code + 60-second demo (cvg demo refuses a dirty task, then a clean plan verifies the audit chain end to end):
https://t.co/0Ov4sSD0Yv
Personal OSS project, still young. Feedback very welcome, especially on which gate you would want next.
AI coding agents lie about finishing work.
An agent says "done". You open the diff: TODOs, unwrap(), skipped tests.
So I built Convergio: a local Rust daemon that refuses an agent's "done" when the evidence does not match, and logs every refusal to a tamper-evident audit chain.
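To make the idea concrete, here is a minimal Python sketch of an evidence gate with a hash-chained audit log. It is not Convergio's implementation (Convergio is a Rust daemon and its internals are not shown here). The marker list, the function names, and the entry layout are all assumptions made for illustration, and this sketch logs every decision rather than only refusals.

```python
import hashlib
import json

# Hypothetical markers of unfinished work, based on the examples in the post.
SUSPECT_MARKERS = ("TODO", "unwrap()", "#[ignore]")

# Placeholder "previous hash" for the first entry in the chain.
GENESIS_HASH = "0" * 64


def find_violations(diff_text):
    """Return the suspect markers found on added lines of a unified diff."""
    added = [
        line[1:]
        for line in diff_text.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return [m for m in SUSPECT_MARKERS if any(m in line for line in added)]


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(chain, payload):
    """Append a payload to the chain, linking it to the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    )


def review_done_claim(chain, task_id, diff_text):
    """Accept or refuse a "done" claim and record the decision in the chain."""
    violations = find_violations(diff_text)
    accepted = not violations
    append_entry(
        chain, {"task": task_id, "accepted": accepted, "violations": violations}
    )
    return accepted


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, quietly flipping an old refusal to "accepted" makes `verify_chain` fail unless every later hash is recomputed as well. That is the sense in which such a log is tamper-evident rather than tamper-proof.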
1/ Your AI coding agent says "done."
You open the diff: TODOs, unwrap(), skipped tests, hardcoded strings.
You redo it.
I got tired of that loop. So I built Convergio — a local Rust daemon
that refuses an agent's "done" when the evidence doesn't match.
👇
https://t.co/1AFJA6wWJT
Your AI agent says "done." It isn't. Bugs, no tests, pushed to main. I built a team of 85 AI specialists — architects, CFOs, security auditors — that plan, execute, and independently validate each other's work. Open source. For the solopreneurs
https://t.co/b0S0OHNo7X
@AnthropicAI @DarioAmodei It's time to go back to your roots, time to come back to Europe and to Italy: the places where your family's values were born, where they are still alive and will remain so.
@gdb thank you! My son has Cerebral Palsy, dyslexia, dyscalculia, and dysgraphia, and I've been dreaming of a Mirror Buddy, an AI able to really help him with homework. I just found him talking with ChatGPT voice and discussing a book (reading is very hard for him) 🙏🏼🙏🏼🙏🏼
It's sometimes hard to grasp the significance of the reasoning and logic updates that are starting to emerge in powerful models, like GPT-5. Here's a *very simple* example of how powerful these models are getting.
I took a recent NVIDIA earnings call transcript that ran 23 pages and 7,800 words. I took part of the sentence "and gross margin will improve and return to the mid-70s" and modified "mid-70s" to "mid-60s".
To a financial analyst paying even slight attention, this would look out of place, because margins can't "improve and return" to a number lower than the mid-70s figure given elsewhere in the document. But probably 95% of people reading this transcript would not have spotted the modification, because it blends easily into the other 7,800 words.
With Box AI, testing a variety of AI models, I then asked a series of models "Are there any logical errors in this document? Please provide a one sentence answer."
GPT-4.1, GPT-4.1 mini, and a handful of other models that were state of the art just ~6 months ago generally reported that there were no logical errors in the document. To these models, the document probably seems coherent and matches what they would expect an earnings transcript to look like, so nothing stands out as worth a closer look: a sort of reverse hallucination.
GPT-5, on the other hand, quickly discovered the issue and responded with:
"Yes — the document contains an internal inconsistency about gross-margin guidance, at one point saying margins will “return to the mid-60s” and later saying they will be “in the mid-70s” later this year."
Amazingly, this happened with GPT-5, GPT-5 mini, and, remarkably, *even* GPT-5 nano. Bear in mind, GPT-5 nano's output tokens are priced at 1/20th of GPT-4.1's. So, more intelligent (for this use case) at 5% of the cost.
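For anyone who wants to repeat this kind of test, here is a rough Python sketch of the procedure described above: plant one inconsistency, then ask each model the same question. It does not use the Box AI API. `ask_model` is a hypothetical callable you would supply yourself, and checking whether the answer mentions the altered phrase is an assumed, crude stand-in for reading the answers by hand.

```python
QUESTION = (
    "Are there any logical errors in this document? "
    "Please provide a one sentence answer."
)


def plant_inconsistency(document, original, altered):
    """Replace the first occurrence of `original` with `altered`."""
    if original not in document:
        raise ValueError(f"phrase not found in document: {original!r}")
    return document.replace(original, altered, 1)


def run_check(models, ask_model, document, altered_phrase):
    """Ask every model the same question about the altered document.

    `ask_model(model_name, prompt)` is a hypothetical function returning the
    model's text answer. A model counts as having caught the edit if its
    answer mentions the altered phrase (a rough heuristic, not a real grader).
    """
    prompt = document + "\n\n" + QUESTION
    return {model: altered_phrase in ask_model(model, prompt) for model in models}
```

In the experiment above, the call would look like `plant_inconsistency(transcript, "return to the mid-70s", "return to the mid-60s")`, followed by `run_check(...)` with `"mid-60s"` as the altered phrase.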
Now, while reviewing business documents for errors isn't a daily task for every knowledge worker, these types of issues show up in a variety of ways when dealing with large unstructured data sets, like financial documents, contracts, transcripts, reports, and more. It can be finding a fact, spotting a logical fallacy, running a hypothetical, or applying sophisticated deductive reasoning.
And the ability to apply more logic and reasoning to enterprise data becomes especially critical when deploying AI Agents in the enterprise. So, it's amazing to see the advancements in this space right now, and this is going to open up a ton more use-cases for businesses.