If you are interested in working at the intersection of AI, robotics/automation and biology, come and build the next generation of open-source toolkits for scientific discovery w/ me at @argonne. Apply here: https://t.co/3Eg2Gf2KyQ
It was a privilege to present at the Caltech UChicago AI + Science meeting following none other than @AnimaAnandkumar and @francesarnold and all the wonderful speakers including @KyleCranmer! Really thrilled to see how AI is impacting so many facets of scientific discovery!
@ferruz_noelia Hi @ferruz_noelia - very interesting work! Have been following a lot of your lab’s work. Our paper on using DPO for protein design, integrating natural language, was recently presented at the Supercomputing conference with @AnimaAnandkumar. https://t.co/GCJLM4B2rc
MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization
• This paper presents MProt-DPO, an innovative framework that achieves ExaFLOPS-scale performance for protein design, combining AI and high-performance computing (HPC) to generate and optimize protein sequences across multiple supercomputers.
• Key breakthroughs include the integration of Direct Preference Optimization (DPO), allowing fine-tuning of protein language models based on preferred structural and functional characteristics, enhancing model capability to generate “fit” protein variants effectively.
• MProt-DPO utilizes multimodal input (sequence, structural, and natural language descriptors), bridging data from simulations and experiments to guide protein design towards desired functional landscapes.
• Achieved a record-breaking 4.11 ExaFLOPS sustained performance on Aurora, demonstrating scalability across five supercomputing systems, marking a milestone in protein engineering and HPC synergy.
@arvindr_ @AnimaAnandkumar @mpapka @ChaoweiX @Shengchao_Liu @archit_vasan @servesh_m @WardLT2 @argonne
📜Paper: https://t.co/ykOoJ2UXO7
#ProteinDesign #HPC #MachineLearning #AI #ComputationalBiology
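For readers new to DPO, here is a minimal sketch of the generic Direct Preference Optimization loss for a single preference pair. It is the standard textbook objective, not the MProt-DPO implementation. The function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are all illustrative assumptions. "Preferred" and "rejected" stand for two protein sequences ranked by some fitness signal, scored under the model being tuned (the policy) and a frozen reference model.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (preferred, rejected) pair of sequences.

    Inputs are total log-probabilities of each sequence under the policy
    being fine-tuned and under a frozen reference model. beta (assumed
    value) controls how strongly the policy may drift from the reference.
    """
    # How much more the policy favours each sequence than the reference does.
    preferred_shift = policy_logp_preferred - ref_logp_preferred
    rejected_shift = policy_logp_rejected - ref_logp_rejected
    margin = beta * (preferred_shift - rejected_shift)
    # loss = -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities, for illustration only.
unchanged = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy identical to reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)    # policy favours the preferred sequence
```

The loss falls as the policy raises the likelihood of the preferred sequence relative to the rejected one, measured against the reference model, which is how ranked "fit" variants can steer fine-tuning.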
What's more fun than breaking the exascale barrier for #protein design workflows with #artificialintelligence that can be prompted with natural language? Read more here: https://t.co/8Ta3JANWgS
The fun part is scaling these models across five supercomputing platforms, including @ALCF's Aurora, @OLCFGOV's Frontier, @CINECA's Leonardo and @cscsch's Alps machines.
We show that it is possible to minimize hallucinations with such models by integrating experimental or simulation observables, which lets the model not only sample novel sequence space but also cover reliable ground in protein fitness landscapes.
Interested in what @argonne is working on at the intersection of AI, robotics and scientific discovery? Take a look at our program lineup for this year's Scientific Discovery in the AI Age meeting at Argonne. https://t.co/z7kkP4SRiN
We also have hands-on tutorial sections, interactive sessions and lab tours to show how scientific discoveries are being transformed with generative AI.
Check out some recent work integrating MD simulations with rare event sampling and DeepDriveMD with the wonderful @ltchong and our group. Really enjoyed the research, discussions, and the directions this work is taking!
Unsupervised learning of progress coordinates during weighted ensemble simulations: Application to millisecond protein folding
- Improves sampling of rare events in protein folding (e.g., state transitions) by combining weighted ensemble simulation with an unsupervised deep learning model.
- Uses a convolutional VAE (CVAE) to compress contact maps into a latent space, and applies a Local Outlier Factor to identify outlier conformations, which are then replicated in the simulation.
- Training the CVAE on-the-fly works better than using a pretrained model.
Preprint: https://t.co/gcKKFubk2P
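To make the outlier-selection step concrete, here is a minimal pure-Python sketch of Local Outlier Factor scoring on latent vectors. It is not the DeepDriveMD or weighted ensemble code. The function names, the choice `k=3`, and the toy 2D "latent" points are assumptions for illustration. In the workflow described above, the inputs would be CVAE embeddings of contact maps, and the top-scoring conformations would be the ones picked for replication.

```python
import math


def local_outlier_factor(points, k=3):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Assumes len(points) > k and that no point has k or more exact duplicates.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        # Indices of all other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance
    # from each point to its k nearest neighbours.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def top_outliers(scores, count):
    """Indices of the `count` highest-scoring points, most outlying first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:count]


# Toy latent vectors (hypothetical): a tight cluster plus one distant conformation.
latent = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (2.0, 2.0)]
scores = local_outlier_factor(latent, k=3)
candidates = top_outliers(scores, count=1)
```

Points inside the dense cluster score close to 1, while the isolated point scores far higher, so it is the one flagged as a candidate.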