If you have substantial experience with litigation and applying AI to solve real-world litigation problems and are beyond excited about the potential of AI to transform literally everything, I have open positions on our Westlaw team and would love to connect. See:
https://t.co/lm0BepSgh4
Is Everest the tallest mountain? It depends on how "tallest" is defined. Base to peak? From the center of Earth? Based on reasonable definitions, the tallest could be Everest, or it could be Mauna Kea, Chimborazo, or Olympus Mons.
In law, definitions can be far more nuanced than in that simple example. Across different topical areas, the same words can have very different meanings, and a term can even have different definitions within the same law. "Located" has different meanings in different provisions of the National Bank Act, "age" has different meanings in different provisions of the ADEA, and "wages paid" has different meanings in different provisions of the Tax Code.
When benchmarking the accuracy of applications, it'd be ideal for evaluators if questions were precise and answers precisely fit the questions. Math is perfect for this. But with real-world legal questions, we often find ambiguities in both questions and answers, and this can make accurate benchmarking far more difficult than it might seem. I teamed up with @robodasha to walk through what we consider to be best practices for benchmarking AI for legal research so that the benchmarks themselves stay accurate. See: https://t.co/xoop8NqBeI