We just published our work on an explainable active learning framework for ligand–protein binding affinity prediction in Digital Discovery.
🔗 https://t.co/zfurYTsFG4
Here’s a quick breakdown of what we did and why 👇
@DdelAlamo In our paper... 650M works. I was a co-author of this paper.
I just work with ESM35M.
I even prefer just tokenising amino acids with 20 letters. And honestly, internally it works.
Will need to get compute and try ESM15B now.
SHAP analysis revealed chemically meaningful features driving predictions. The model learns to focus on SAR-relevant motifs over time. We identified key fragments for high affinity (e.g., halogens for TYK2).
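For readers new to SHAP, here is a minimal sketch of the idea behind it: exact Shapley values computed by brute force on a toy scoring function. Everything in it is an assumption for illustration (the fragment names, the `toy_affinity` function, and its numbers are made up, and this is not the pipeline or the data from the paper).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset of the other features.

    value_fn takes a frozenset of present features and returns a score.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r in the Shapley formula.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_affinity(present):
    """Hypothetical affinity score; all numbers are invented for this sketch."""
    score = 5.0
    if "halogen" in present:
        score += 1.0
    if "amide" in present:
        score += 0.4
    if "halogen" in present and "aromatic_ring" in present:
        score += 0.6  # interaction term, split between the two features
    return score


fragments = ["halogen", "amide", "aromatic_ring"]
attributions = shapley_values(toy_affinity, fragments)
for name, value in sorted(attributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the gap between the score with all fragments present and the score with none, which is the property that makes this kind of breakdown readable fragment by fragment.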
We evaluate the framework across multiple settings and compare against standard baselines.
Key takeaway:
We can maintain (or improve) performance while gaining explainability.
One interesting observation:
The samples selected by our method are not just “uncertain”; they often correspond to meaningful interaction patterns.
This gives more confidence in the active learning loop.
Big thanks to all collaborators and reviewers who helped improve the work.
@gorantlarohan @ppxasjsm
If you’re working on:
• drug discovery
• active learning
• explainable ML
we’d love to hear your thoughts!
This is not a final solution, but a step toward:
• More transparent ML models
• Better human–AI collaboration in science
• Active learning systems that scientists can actually trust
Instead of just predicting affinity, the model provides insight into:
👉 Which parts of the ligand and protein matter
👉 Why a sample is selected during active learning
This helps move from “prediction” → “understanding”.
Our goal was simple:
👉 Build an active learning framework that is not only effective
👉 But also explainable
So that model decisions can be inspected, trusted, and potentially acted upon.
Predicting binding affinity is central to drug discovery, but data is expensive.
Active learning helps by selecting which experiments to run next, instead of blindly collecting more data.
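A rough sketch of what "selecting which experiments to run next" can look like, using generic uncertainty sampling with a bootstrap ensemble. This is a common baseline, not the method from our paper; the 1-D linear model, the function names, and the toy numbers are all assumptions chosen to keep the example small.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b on 1-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # Fall back to a flat line if the resample has no spread in x.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def ensemble_std(labeled, candidates, n_models=20, seed=0):
    """Per-candidate standard deviation of predictions across a bootstrap ensemble."""
    rng = random.Random(seed)
    preds = [[] for _ in candidates]
    for _ in range(n_models):
        # Resample the labeled set with replacement and refit.
        sample = [rng.choice(labeled) for _ in labeled]
        a, b = fit_linear([x for x, _ in sample], [y for _, y in sample])
        for i, x in enumerate(candidates):
            preds[i].append(a * x + b)
    return [statistics.pstdev(p) for p in preds]


def select_next(labeled, candidates, k=1, **kwargs):
    """Pick the k candidates the ensemble disagrees on most."""
    stds = ensemble_std(labeled, candidates, **kwargs)
    ranked = sorted(range(len(candidates)), key=lambda i: stds[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]


# Toy data: (descriptor, measured affinity) pairs; values are invented.
labeled = [(0.0, 1.0), (0.2, 1.3), (0.5, 1.9), (0.8, 2.7), (1.0, 2.9)]
pool = [0.5, 10.0]
print(select_next(labeled, pool, k=1))
```

The candidate far from the labeled data gets picked because the resampled models disagree most there. Note that the selection comes with no explanation of why that sample matters.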
But there’s a problem…
Most active learning methods are black boxes.
They may pick good samples, but don’t tell us why those samples are useful.
In drug discovery, that lack of interpretability is a real limitation.
@TheOneKloud @SchmidhuberAI When I read BYOL, I was impressed, and at first glance I saw how similar it was to JEPA (or rather, how similar JEPA was to BYOL), so now I know the whole lore!