First, the released dataset has pretty good geographic coverage and accounts for a majority of the US population!
Everything, including zoning, noise, housing codes, and whether or not you can ride an ATV without a license.
New paper: every law in America is technically public. But not really, until now!
With @DenisPeskoff at UC Berkeley, we built a corpus of ~every publicly accessible city and county law, and released a huge chunk of it!
2.2 million laws; you're (probably) covered in it!
🧵
New in Claude Code: Artifacts.
Interactive pages built from your session, like a PR walkthrough or a living project dashboard, shared with your team at a private link.
Available in beta on Team and Enterprise plans.
WOW - This trial just got moved to later in the year. This follows two full days of jury selection. A juror used ChatGPT to research this case, and then he told other jurors. That juror is going to be summoned for a contempt hearing. This trial is not happening this week.
New w/ @AISecurityInst & @UniofOxford:
Frontier AI can now out-persuade expert humans in conversation - incl. world-champ debaters and professional canvassers.
This held even when humans chose their topics, prepared in advance, and competed for £1,000 prizes 🧵
this sort of headline -- "The Job That AI Was Supposed to Kill Needs More Humans Than Ever" -- is becoming more common as pundits and papers realize the AI job apocalypse narrative is exactly backwards.
Harvey is live inside Microsoft 365 Copilot and Copilot Cowork.
Use Harvey in @Copilot for instant legal answers. Click through to Harvey for deeper analysis.
Use Copilot Cowork for full multi-step legal workflows, all without leaving Cowork.
@mualphaxi @MADarbyshire I’ve wondered the same about people named Al. “Did Al write this?” “That’s fake, it’s from Al.” This is why we need serif fonts.
We think Fable-5 is an incredible model and want to give our customers the controls to use it safely for their legal work.
We are currently letting firms opt in to using Mythos-class models and are being very explicit about it so that customers aren't left unaware, as you mentioned.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
@KurtisCarman @SMB_Attorney I don’t think so yet, but a Nebraska attorney was recently suspended until further notice. There were a bunch of hallucinated citations in a divorce case that went to the state Supreme Court, and the attorney initially denied using AI.
@AbeGreenwald There’s research showing that LLMs prefer their own output when the same model is rating it (which further supports the point from your friend’s job hunt), e.g.,
https://t.co/ILiDsbdl4A
Token costs are why there will be no SaaS apocalypse / good dev tools are cached intelligence for agents!
The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!
We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.
Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% task success vs 94% with the CLI).
And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong.
In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.
Good tools are cached intelligence for agents!
So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.
We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!
https://t.co/Y7q6yuxZrZ
@dbreunig you can see part of what I mean here. The point I saw here is that the creativity comes with some nonsensical outputs, but at least it gets you out of the basin.
Just wrote a blog post about this paper from the perspective of the Des Moines metro's quirky Halloween joke-telling tradition. https://t.co/3jYwY1XCGg @shi_weiyan
@dbreunig I’ve had some success generating truly novel synthetic data for test purposes using the VerbalizedSampling method, but I’m not sure I’d characterize it as necessarily “good” or “creative” on its own the way I’ve done it.