How accurate and significant are the points raised in AI reviews of academic papers?
In a paper with 45 contributors across 27 institutions and many domains from the natural sciences, we attempted to answer this question.
Some major results:
- State-of-the-art AI reviewers are generally accurate and raise significant, well-evidenced points, comparable to those of human reviewers
- However, they have issues such as being less well grounded in scientific community norms
- A panel of AI reviewers is more homogeneous than a panel of human reviewers, pointing out similar issues far more often
We view this as evidence that AI-supported paper review is a promising supplement when done well, but certainly not a substitute for human expert reviewers at this time.
Now available in AstaLabs in limited research preview: MyScholarQA, a personalized version of ScholarQA for scientific deep research.
ScholarQA helps synthesize evidence from 12M+ open-access papers. MyScholarQA adds user profiles to tailor that synthesis to you. 🧵
New Anthropic research: Emotion concepts and their function in a large language model.
All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.