Arman Seyed-Ahmadi @arman1sa - Twitter Profile

Pinned Tweet

3 months ago

What if AI could explain why a protein is a kinase, not just tell you it is? We built just that. BioReason-Pro is a multimodal LLM that reasons about protein function — walking through domains, interactions, and biological context to make predictions you can actually evaluate.

3

54

9

30

8K

Arman Seyed-Ahmadi @arman1sa

30 days ago

@adibvafa @Radii2323 Huge congrats Adib!🎉🔥

1

0

38

arman1sa retweeted

Bo Wang

@BoWang87

about 2 months ago

Orthrus is now in Nature Methods (@naturemethods) 🔥🔥🚀🚀

Paper: https://t.co/Ry55kWVkXl

Code: https://t.co/UZeEw7bCgE

The core bet: existing genomic foundation models use masked language modeling or next-token prediction imported from NLP. They work. But they're not aligned with how RNA sequence relates to function.

Orthrus uses contrastive learning with two biologically grounded augmentations: splicing isoforms (same gene, different exon inclusion) and orthologous transcripts (same gene, different species). Both pairs should be functionally similar. The model learns by agreeing across them.

Trained on 400+ mammalian species via the Zoonomia Project. Outperforms existing genomic models on 5 mRNA property prediction tasks, often beating task-specific supervised baselines with a linear head. SOTA on RNA half-life with 45 labeled examples.

The lesson isn't "more data" or "bigger model." It's that the pre-training objective has to mirror the structure of the biology. Evolution and splicing are the right teachers for mature RNA.

Huge congrats to the lead authors
@phil_fradkin @ianshi3 !
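
The "learns by agreeing across them" idea above can be illustrated with a generic contrastive objective. The sketch below uses an InfoNCE-style loss, which is one common choice; the tweet does not say which loss Orthrus actually uses, so the function name `info_nce_loss`, the temperature value, and the toy 3-dimensional "embeddings" are all assumptions made for illustration, not the authors' code. Row i of each view stands for two transcripts that should be functionally similar (for example, two isoforms or two orthologs of the same gene).

```python
import math

def _normalize(v):
    # Scale a nonzero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] form a positive pair,
    and every other positives[j] acts as a negative for anchors[i]."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-probability of the true pair
    return total / len(a)

# Hypothetical embeddings of three genes under two "views".
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned = info_nce_loss(view_a, view_b)
# Rotate the second view so every anchor is paired with the wrong gene.
mismatched = info_nce_loss(view_a, view_b[1:] + view_b[:1])
```

In this toy setup the loss is low when paired views of the same gene sit close together and high when the pairing is scrambled, which is the signal a contrastive pre-training objective optimizes.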

7

242

42

115

35K

arman1sa retweeted

Adib

@adibvafa

about 2 months ago

We taught a DNA model to learn its own tokenization. It learned the genetic code with no supervision. And outperforms Evo 2's architecture with 3x faster inference. Great work with Arnav (@arnavshah0), Victor (@victor_ljz), Parsa (@Radii2323), Brandon (@fluorane), Sukjun (@sukjun_hwang), Bo Wang (@BoWang87), Patrick Hsu (@pdhsu), Hani Goodarzi (@genophoria) and Albert Gu (@_albertgu) 🔥

1

111

19

67

10K

arman1sa retweeted

Bo Wang

@BoWang87

2 months ago

BioReason-Pro was released less than 2 weeks ago, and the response has been incredible. Already, 1,300+ users worldwide have signed up for the portal, and 3,000+ proteins have been tested. We’re deeply grateful for all the thoughtful and constructive feedback. Today, we’re open-sourcing 223,000+ protein reasoning traces from BioReason-Pro on @huggingface and hopefully our work can further facilitate more research into biological reasoning! Dataset: https://t.co/TBPwy77mIN Try it here: https://t.co/ejt2AQ562N

1

88

20

39

11K

Arman Seyed-Ahmadi @arman1sa

2 months ago

@ravishar313 @adibvafa BioReason-Pro could be a nice addition to this cool tool

0

2

0

10

arman1sa retweeted

Hani Goodarzi

@genophoria

2 months ago

@anshulkundaje articulates something the AI-for-biology practitioners (or AI-for-science for that matter) need to hear more: we are far from a stage where scale alone solves biology. Deep domain expertise and principled interpretation (as opposed to cherry-picking of results) are how we actually make progress. There's too much hubris right now in assuming one can brute-force their way through biological complexity without understanding it.

0

98

14

24

7K

arman1sa retweeted

Adib

@adibvafa

2 months ago

We have fixed a major inference bug in https://t.co/QJ8km69UXC, significantly improving the quality of reasoning. Give BioReason-Pro another try! And please keep the feedback coming. You can also find a guide on setting up the model locally at https://t.co/SBINSVnu7d

0

33

15

19

5K

arman1sa retweeted

Abhinav Adduri

@abhinadduri

3 months ago

We @arcinstitute, @UHN, and @VectorInst recently released BioReason-Pro, a multimodal reasoning LLM for protein function prediction, trained via SFT on synthetic reasoning traces and subsequent RL. I had a chance to interview @BoWang87 and @genophoria on their vision for the work and what comes next. Was fun to pick their brains on the bio! Check out the interview: https://t.co/YWslLYKFxf

2

78

10

42

15K

arman1sa retweeted

Matthew Sdorf

@MSdorf9980

3 months ago

I just used BioReason-Pro on a gene I am subcloning and was quite impressed. The processing time is reasonable, and the results appear accurate. That said, the functional summary could be expanded to provide more depth and context. In addition, the GO-GPT predictions section would benefit from clearer guidance and more informative explanations.

Still, amazing work! Congratulations to @BoWang87 @genophoria @arcinstitute. I plan to use more in my future research.

2

30

9

13

10K

arman1sa retweeted

Helen Qu @_helenqu

3 months ago

physical systems (orbits/fluid mechanics) may look complex, but are often governed by simple equations/few parameters. can current self-supervised methods learn the underlying physics?

our new paper finds that learning in latent space may be the key!

https://t.co/cvMKzx9qrQ🧵 https://t.co/OIwwvvd0pA

25

660

97

529

57K

arman1sa retweeted

Andrew White 🐦‍⬛

@andrewwhite01

3 months ago

@adibvafa Great project! I've been looking for someone to try this on protein function and your team did a great job!

1

9

3

2

2K

arman1sa retweeted

Biology+AI Daily @BiologyAIDaily

3 months ago

BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning @arcinstitute

1. BioReason-Pro introduces the first multimodal reasoning large language model specifically designed for protein function prediction, combining protein embeddings with biological context to generate interpretable reasoning traces rather than just classification labels.

2. The system integrates ESM3 protein embeddings, a GO graph encoder, and biological context including organism, domains, protein-protein interactions, and GO-GPT predictions to perform step-by-step biological reasoning from sequence to function.

3. GO-GPT, a key component, is the first autoregressive transformer for Gene Ontology prediction that captures hierarchical and cross-aspect dependencies between GO terms, achieving state-of-the-art Fwmax of 0.65-0.70 across inference strategies.

4. The model was trained on over 130,000 synthetic reasoning traces generated by GPT-5 and further optimized through reinforcement learning with Group Sequence Policy Optimization, achieving 73.6% Fmax on GO term prediction.

5. Human protein experts preferred BioReason-Pro annotations over ground truth UniProt annotations in 79% of evaluated cases, with an LLM judge score of 8/10 for functional summaries, substantially outperforming previous methods.

6. Remarkably, BioReason-Pro de novo predicted experimentally confirmed binding partners with per-residue attention localizing to exact contact residues resolved in cryo-EM structures, demonstrating genuine structural reasoning capabilities.

7. The model successfully performed structural reasoning that overrode misleading superfamily-level domain annotations, such as correctly identifying CFAP61 as a non-enzymatic scaffold despite its Rossmann-like fold that typically indicates catalytic activity.

8. For eEFSec, BioReason-Pro identified SECIS-binding protein 2 as the obligate functional partner from sequence alone, with attention concentrated on the RIFT domain surface that matches the experimentally resolved SECIS RNA binding interface in PDB 7ZJW.

9. The system maintains strong performance even for proteins with very low sequence similarity to training data, with performance degrading much more slowly than BLAST as sequence identity decreases, indicating learned generalizable reasoning rather than simple homology transfer.

10. All model weights, code, and curated datasets are released publicly, alongside precomputed predictions for over 240,000 proteins including the Human Protein Atlas, enabling broad adoption for functional annotation of uncharacterized proteins.

💻Code: https://t.co/52TcS08BmC
📜Paper: https://t.co/YrF9y6yaHW
#BioReasonPro #ProteinFunction #ComputationalBiology #Bioinformatics #MachineLearning #LLM #GeneOntology #ProteinStructure #FunctionalAnnotation #AIforScience

1

84

20

47

6K

arman1sa retweeted

Bo Wang

@BoWang87

3 months ago

Our X-cell is up at @biorxiv_bioinfo !

Read our full paper at https://t.co/qdLD7mTIDy

Part of the data and the model weights will be shared soon. stay tuned!

3

100

21

43

20K

arman1sa retweeted

Mehran Karimzadeh @MKarimzade

3 months ago

1/ A year ago, I was skeptical about LLMs and RL in biology. Today, I’m inspired by the results and the massive potential ahead. Biomedical AI is thriving thanks to both the visionaries imagining and building the future and also those that remind us of the limitations ...

2

18

3

6

2K

Arman Seyed-Ahmadi @arman1sa

3 months ago

@alifmunim Thank you Alif! Honored to be part of such an amazing team 🙏

0

1

0

21

Arman Seyed-Ahmadi @arman1sa

3 months ago

@GongDennis Thank you for checking it out! There's very high traffic right now and some of the requests might fail🥲 You can find the Catalogue if you scroll down on the home page (we're making a fix so it autoscrolls)

1

0

82

Arman Seyed-Ahmadi @arman1sa

3 months ago

What if AI could explain why a protein is a kinase, not just tell you it is? We built just that. BioReason-Pro is a multimodal LLM that reasons about protein function — walking through domains, interactions, and biological context to make predictions you can actually evaluate.

3

54

9

30

8K

arman1sa retweeted

Adib

@adibvafa

3 months ago

BioReason-Pro was trained on synthetic reasoning traces from GPT-5. While the coding agent hype train is in full speed, the true impact of LLMs will come in biology. This is the best time to be a Bio AI researcher. You finally have the tools to address humanity's most challenging problems. Today we took the first step by releasing a reasoning model for proteins. Can't wait for what we release in 10 years. Ad vitam!

1

73

13

38

7K

arman1sa retweeted