@Tanner_H_Jones Would be interested in giving this a try -- any easy trial option available? I've made an account but just get an error message when I try to log in.
Perhaps unsurprisingly, in my many conversations with law professors about AI over the last few months, this study is the one that I have found most consistently helps people update their worldview about AI capabilities.
Law professors wrote up questions they had been asked during office hours. Gemini 2.5 & humans answered them, then other law professors blindly judged the results:
-Gemini had a 75% win rate vs. professors
-Gemini's answers were rated LESS harmful than humans'
-Newer models do even better
Finally -- I've been waiting for this paper to be made more broadly available. Blind grading of answers by experts is, I think, the most useful form of AI quality assessment, and this study really makes it clear how capable these tools are at answering legal questions.
@TimSchnabel Yeah, that would be great. I assume those who own that data see it as extremely risky to license it out. But as Anthropic and OpenAI keep seeing their revenue skyrocket, I do wonder whether a deal might get made.
This news from Kirkland along with the Harvey legal agent benchmark this week raises a big question: will the next major improvements in legally useful AI come from law-specific model building, or from improvements in general-purpose foundation models? 🧵
Kirkland & Ellis, the world's highest-grossing law firm, is setting aside $500M to build its own AI platform rather than rely on tools available to its rivals (Financial Times)
(Visit Techmeme dot com for the link and full context!)
But at this point my default bet would be that ChatGPT 6 or 7 (or whatever) will be better than Kirkland 1, just as o3, GPT 5 Pro, etc., are so much better than Lexis or Westlaw's AI products, both in terms of raw power and workflows.
I definitely take this anonymous post with a grain of salt, but it’s a good reminder that a lot of what’s going on in courts with AI may not be very legible…
talked to a sitting state court clerk in NJ. here’s what he said.
I asked about AI hallucinations in briefs. he said judges are seeing them from reputable firms and mostly aren’t sanctioning unless the whole brief is BS. here’s the reason: NJ judges are nominated by the governor and can only get tenure after 7 years if they’re confirmed again by the governor and senate. that confirmation is informed by reviews of the judges written by lawyers who appeared before them. so judges don’t want to be known as hard asses.
on AI use in the courts: “AI is used by every clerk and judge in the NJ courts. NJ’s courts spent millions developing a proprietary internal AI which does nearly everything a judicial clerk does — it scans documents, finds caselaw, and can draft decisions.” not Harvey, not Legora. built in-house. they also give the judges access to LexisAI.
on his own workflow: “I don’t have any physical documents I just look up the docket number. I can download the PDFs and put them into the AI to tell me what’s relevant and then I’ll have AI do a first draft of an analysis and check the cases.”
I asked him if he’s just chilling compared to old school clerks. his quote: “I swear the only reason they still have judicial clerks is the deep clerking culture in Jersey.”
on the inflow of AI-drafted briefs from pro se litigants: “Pro se briefs are now like 10-20 pages of actual meritorious arguments. It’s wild.” historically pro se filings were thin and easily dismissed. now they’re being drafted with Claude and the clerks have to take the arguments seriously.
on the skill being built: he used to think his clerkship was about the intense writing experience. now he thinks the real skill is reading through AI-drafted work and finding the core arguments underneath — and every clerk is developing that skill in real time because of the volume.
I asked if AI is going to speed up the court process. his answer: “Jersey already has a relatively quick motion cycle (it’s literally every two weeks) so I think this will just let clerks get more work done in that two weeks so less stuff will get adjourned.”
makes me wonder if the courts are further along than law firms.
@ProfArbel @ARozenshtein Yeah; the lack of a quality measure is part of the challenge re hill climbing in the first place. But on those subtasks, that sounds on the money to me (I'd probably estimate toward the higher end of that range, just based on unscientific impression).
@ProfArbel @ARozenshtein I’m mostly with you I think, Yonathan, but I’m curious about your take: given a trial court ruling and record, and asked to write an appellate brief, where would you place the best AI tools relative to lawyers currently (e.g., 10th percentile, 90th, etc.)? And by end of year?
I say "could" genuinely here—maybe an AI-graded benchmark will help. But there's also a risk that it's the cheap/easy way out in a domain where getting a good signal for model training is notoriously difficult.
I'm very much in favor of more and different legal benchmarks. But from what I can tell, this benchmark is graded by AI models as well. That has advantages of course, but without validating that kind of grading, that is a meaningful asterisk, and could limit hill climbing.
Don't read this as suggesting law poses any particular challenges to AI. Rather, the existence of a benchmark in which AI already scores in the mid-single digits suggests that rapid improvement and saturation are imminent. If you can measure, you can hill climb.
I should also caveat—not *all* human writing is better along those lines, of course. But a lot of it! Even just your everyday twitter back-and-forth has more of those features than your everyday AI output. And I say this as someone who uses AI often and finds it very valuable.
My working theory is that human writing has more information density, more perplexity, and more emotional range than AI writing, and people who are used to reading both recognize that intuitively, which leads to being turned off by the latter.
I think it also leads to feeling a bit cheated when you thought you were reading a human product and then realize it's AI. You were just expecting, reasonably, to get more for your time. (Note that all this might change with changing capabilities!)