I use ChatGPT 5.2 in my biomedical writing — and it's been unexpectedly reliable at generating real, relevant citations.
I've been hearing lots of hype about other models, so I tested the same citation task across multiple LLMs (max thinking, web, $20/mo tier).
The variability was shocking.
Goal: push for fixes so we can trust models for high-stakes work (grants, manuscripts, reviews).
Needed:
Transparent retrieval traces
Hard constraints (PMID/DOI validation)
Native "citation needed" support (no forced filling)
Built-in claim-evidence alignment checks
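
To make the "hard constraints" and "citation needed" items concrete, here is a minimal Python sketch, not how any model actually does it. The function name check_citation, the dict layout, and both regex patterns are my own assumptions. It only checks that an identifier has the right shape; confirming that a DOI or PMID really exists and matches the claim would still need a lookup against a registry, which is not shown.

```python
import re

# Assumed shape-only patterns: they do not prove the record exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")   # common modern DOI form
PMID_PATTERN = re.compile(r"^[1-9]\d{0,8}$")     # positive integer, assumed up to 9 digits


def check_citation(citation: dict) -> str:
    """Return 'ok' if the citation carries a well-formed DOI or PMID.

    Otherwise return 'citation needed' instead of letting an
    unverifiable reference fill the gap.
    """
    doi = str(citation.get("doi") or "").strip()
    pmid = str(citation.get("pmid") or "").strip()
    if DOI_PATTERN.match(doi) or PMID_PATTERN.match(pmid):
        return "ok"
    return "citation needed"


if __name__ == "__main__":
    # Placeholder identifiers for illustration only, not real references.
    drafts = [
        {"title": "Example A", "doi": "10.1000/xyz123"},
        {"title": "Example B", "pmid": "12345678"},
        {"title": "Example C", "doi": "doi-not-found"},
        {"title": "Example D"},
    ]
    for draft in drafts:
        print(draft["title"], "->", check_citation(draft))
```

The point of the design: a citation that fails the check gets flagged rather than silently kept or replaced with something plausible-looking.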