@OpenAI Are the benchmarks run with the Life Sciences plugin enabled? And does the plugin materially change results, e.g., does the GPT-5.5 delta to GPT-Rosalind meaningfully narrow with the plugin? @ChrisHayduk maybe you know?
Nice to see Claude Opus 4.8 refusing fewer biology questions than 4.7.
It's now also recognizing the error-prone nature of correctly reproducing long sequences:
"I can't provide the exact full amino acid sequence of... as I'm not confident I can reproduce it with the precision required. Sequence data like this is the kind of information where small errors (a single wrong residue) can be significant, and I don't want to give you something that looks authoritative but contains mistakes."
The year is 2026 and not a single brand on amazon contains an English word. New companies are spawned by the blind smash of a fist on the z x w q side of the keyboard. Vowels are banned. Each product has 2.5 million 5 star reviews. They're all written by AI. There are so many we need AI summaries. The summary is never helpful. The product images are AI too. Buy it Now happens automatically if you stare at an item too long. Nothing is what you want. Doesn't matter. Commerce enlightenment has been achieved.
Because of all the refusals (via API and separately in chat) I get significantly lower performance on my biotech benchmark that covers antibody therapeutics, tox, clinical dev, cell surface receptors, food allergens, and biological sequences.
Today we're publishing our index of unmet needs in human disease: 2,443 indications scored and ranked on burden of disease, prevalence, pipeline activity, and treatment burden. We hope this will help drug developers identify overlooked medical problems.
https://t.co/OX4BKC8cfI
after repeatedly trimming and simplifying the prompt, the conclusion is that claude doesn't like post-translationally cleaved proteins. [diff below shows the deletion necessary for claude to not refuse the prompt; no data was attached to prompt]
Genuinely bewildered. Claude refusing to analyze ELISA and SPR binding data. I even tried removing (contentious ??) terms like "antibody" and "antigen"
@TensorTwerker @ByteDanceOSS unavailable it seems: "Please note that the accessibility of the protenix-v2 checkpoint is currently under review... We are unable to provide a specific timeline at this stage" - https://t.co/W9HoWeXTFP