1/9 New paper: Demystifying Scientific Problem-Solving in LLMs. How does reasoning enhancement affect knowledge recall, and do LLMs benefit from external knowledge complementary to reasoning?
TL;DR:
SciReas: holistic and efficient evaluation suite for scientific reasoning
KRUX: a novel framework to study knowledge vs reasoning in LLMs
Findings: knowledge is a bottleneck; reasoners + in-context knowledge help; long CoT helps knowledge recall/utilization
(1/7) Very excited to share my first PhD preprint on the interactions of two of my favorite mobile genetic elements: phages and group II introns!
https://t.co/99pFYIYl4r
Super excited to share my postdoc work investigating how mating and parental behaviors evolve using wild species of mice combined with single nucleus RNA-sequencing of the hypothalamus!
https://t.co/tYSFCfOqwP
@sokrypton It's also important to look at the homology between any domain in any seq in test to any domain in any seq in training, and I think we do a good job of assessing performance at various domain-level max % identity thresholds in our work on PSALM (https://t.co/80WSlT0uMp)
PSALM annotates sequences with greater sensitivity and specificity than profile HMM-based methods (on identical training sets). PSALM has a very low residue-level FPR, benefits strongly from additional examples, and annotates clans even at low percent identity to training data
PSALM uses a hierarchical approach that considers both individual protein domain families and clans (determined by Pfam). Modeling clans is an interpretable intermediate step that helps identify functional regions that lack clear family-level annotations
We propose PSALM, which extends ESM-2 to predict *residue*-level protein sequence annotations. PSALM accurately annotates domain boundaries, multi-domain proteins, and even domains that are currently unannotated in sequence databases