LLMs are no longer created w/ human data alone. They rely on other models to generate & filter data, evaluate outputs, & guide dev work.
So what is a modern LLM built on? Olmo 3 → 89 model + 183 dataset dependencies; Nemotron 3 → 273 + 560
We made ModSleuth to trace this. 🧵
The #arXiv ban on unchecked #AI in preprints is, in @MarkHahnel's words, needed & appropriate. But the underlying problem is bigger than arXiv.
General-purpose models are the wrong tool for research. The contamination, he warns, doesn't go away when the AI gets better.
🔗 Read his post: https://t.co/lMFgS3F7Jl
"The tools have changed beyond recognition; the intent has not changed at all." — Dr Daniel Hook, CEO, Digital Science.
In a new blog post on the @Symplectic website, Daniel discusses a 20-year-old problem that #AI is now helping to solve.
🔗 Read his post: https://t.co/YVjHoOU4Gc
Still dealing with “alphabet soup” in your research systems? RIMS, CRIS, IR, RDM… it adds up fast.
Watch the Building a Research Engine webinar on demand to see how you can simplify your ecosystem with Symplectic Elements and Figshare.
👉 https://t.co/XowBEGYN9C
Operating a bio database should be much easier. To the point where each lab can run one. These should integrate with lab equipment, run automated QC on new deposits, compute a rich surface of queryable metadata, export to ML-friendly formats, and federate via shared ontologies.
Ongoing Figshare & TCC Africa Community Call in #Kenya: Strengthening Research Data Management & Data Repository Adoption in #OpenScience using @figshare. Speaking now is @MarkHahnel
CC @digitalsci
👉 Before you attend the webinar, download the report:
"Forensic Scientometrics (FoSci) Report 2026: Understanding, Detecting, and Documenting Manipulation in the Research Ecosystem."
🔗 https://t.co/alnAJXr3kW
#ResearchIntegrity #TrustInScience
⚠️ This week: Join our panel of science sleuths to hear about uncovering research misconduct & manipulation. #ResearchIntegrity
We're decoding the findings of the Forensic Scientometrics (FoSci) Report.
🗓️ Thursday 11 June
🕒 3pm BST 🕙 10am EDT
🔗 Register now: https://t.co/JCWm92Ae3Z
It’s a monumental shift that Tech companies can now start Pharma companies
A new era of biological compute, biosingularity and solving disease
@maxjaderberg talking about how AI is transforming drug discovery at London Tech Week
New Science Blog: Why has AI advanced faster in coding than in biology?
To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.
How do we build infrastructure agents can use?
https://t.co/PQaNQ4GRJZ
Opus 4.7 is as good as or better than ChemDraw at interpreting NMR spectra on our evals. We're making Claude more helpful for chemists, starting with routine and time-consuming analytical tasks.
On Training Data for Bio AI Models
As we advance biological foundation models, which lessons from LLM data curation transfer, and which need rethinking?
https://t.co/BPDZrJMq5d