@AlexanderKalian I don't think emotions/consciousness are the missing pieces, because they are non-functional (you can always just simulate them). I think it is long-term goals, which are not optimized for in RL fine-tuning. AIs don't invent new areas of math or new programming languages to solve problems.
@prz_chojecki That is an awesome idea! I love the lore/drama behind this "proof," and the philosophical implications are interesting (math seems to become postmodern...). But what are the "funny things" that happen here?
@Quasilocal Makes sense. In BioML it is vastly different. There are papers in reputable journals with many citations that are completely sloppy. You have to be an expert in the field to spot this
Now published in Nature Communications:
https://t.co/23eFOVOHmo
DrEval could also function as an unbiased reward signal for AI agent–based model development in biological ML. We are happy to chat about this if anyone is interested in details :)
ML-based cancer cell line drug response models are well-motivated, and significant research effort has gone into developing complex modeling approaches (over 100 papers in 2025).
The problem: under rigorous evaluation, we found none that actually works.
Most are published based on inflated metrics, break down when probed in realistic application scenarios, and are outperformed by simple baselines.
We built DrEval, a pipeline for unbiased evaluation of these models, to encourage more meaningful progress in the field.
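A toy sketch of one way a pooled metric can look inflated. This is not DrEval's code or API, and the data are synthetic: the assumption (ours, for illustration) is that each drug has its own potency level and the cell-line-specific variation around it is pure noise. All names below are hypothetical.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n_drugs, n_train, n_test = 20, 50, 50

# Synthetic responses (assumption): drug-level potency plus unpredictable noise.
drug_level = [rng.gauss(0.0, 3.0) for _ in range(n_drugs)]
train = {d: [drug_level[d] + rng.gauss(0.0, 1.0) for _ in range(n_train)]
         for d in range(n_drugs)}
test = {d: [drug_level[d] + rng.gauss(0.0, 1.0) for _ in range(n_test)]
        for d in range(n_drugs)}

# Naive baseline: predict each drug's training mean for every unseen cell line.
drug_mean = {d: sum(v) / len(v) for d, v in train.items()}

# Pooled metric: one correlation over all (drug, cell line) pairs.
y_true = [y for d in range(n_drugs) for y in test[d]]
y_pred = [drug_mean[d] for d in range(n_drugs) for _ in test[d]]
pooled_r = pearson(y_true, y_pred)

# Within a single drug the baseline gives one constant prediction,
# so it says nothing about which cell lines respond better.
distinct_per_drug = [len({drug_mean[d] for _ in test[d]}) for d in range(n_drugs)]

print(f"pooled Pearson r of the drug-mean baseline: {pooled_r:.2f}")
print(f"distinct predictions per drug: {set(distinct_per_drug)}")
```

In this toy setup the pooled correlation comes out high even though the baseline knows nothing about individual cell lines, which is why comparing any complex model against such a baseline, and scoring per drug rather than only pooled, is a sensible sanity check.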
@AlexanderKalian Fully agree. I see that in my field: drug response prediction in cancer. The bottleneck is not papers, not intelligence, but just large datasets. People train large neural networks on 1k data points, then make a surprised Pikachu face that we haven't cured cancer.
@HistedLab Biggest problem for me is the papers that are "not even wrong." Too fluffy and unfalsifiable. They have names like "Navigating the landscape of xy" and no one will ever engineer something based on them.