The Align Foundation

@Align_Bio

Improving life science with programmable experiments. 🧬 🤖 Home of the Protein Engineering Tournament & Open Datasets Initiative

Joined January 2022

279 Following

1.7K Followers

233 Posts

The Align Foundation

@Align_Bio

14 days ago

Antimicrobial resistance (AMR) is a growing global challenge, and we believe meaningful progress starts with bringing the right people together 🌍🦠

Over the past several months, @Align_Bio has been convening researchers, funders, clinicians, technologists, and ML experts to better understand the AMR landscape and identify where AI/ML predictive models can have the greatest impact.

So far, we’ve:
• Interviewed ~150 experts and stakeholders
• Completed 6 virtual workshops
• Engaged 65 workshop participants across sectors and disciplines, from 17 countries

These conversations have surfaced important gaps, promising directions, and opportunities for collaboration. We’re now working on a landscape summary of actionable ideas and a set of proposals outlining where focused efforts would make a difference 📊

If you’re interested in contributing ideas, perspectives, or expertise, we’d love to hear from you: official@alignbio.org

#AMR #AntimicrobialResistance #GlobalHealth #Biotech #Biosecurity #PublicHealth #Innovation #ScientificCollaboration #LifeSciences

The Align Foundation

@Align_Bio

21 days ago

Benchmarking generative protein models at scale with GROQ-SEQ 📊

We built a unified benchmarking framework and experimentally tested alignment-based, language, and structure-based models on TEV protease.

Structure-based models achieved up to 74% hit rates, a remarkably high level of experimental success, while computational predictions showed weak agreement with experimental activity.
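The two numbers in this result measure different things: a hit rate is the fraction of tested designs that turn out to be active, while agreement between computational predictions and measured activity is commonly summarized as a rank correlation. Below is a minimal sketch of both, assuming a single activity threshold defines a "hit" and Spearman correlation is the agreement metric. The threshold, function names, and toy data are hypothetical, not taken from the study.

```python
from typing import Sequence


def hit_rate(activities: Sequence[float], threshold: float) -> float:
    """Fraction of tested designs whose measured activity is at or above threshold."""
    if len(activities) == 0:
        raise ValueError("no designs tested")
    hits = sum(1 for a in activities if a >= threshold)
    return hits / len(activities)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: model scores vs. measured activity for five designs.
model_scores = [0.9, 0.2, 0.5, 0.7, 0.4]
measured_activity = [0.6, 0.8, 0.1, 0.7, 0.3]
print(hit_rate(measured_activity, threshold=0.5))  # 0.6
print(spearman(model_scores, measured_activity))   # -0.2
```

In the toy data most designs are hits, yet the model's ranking barely tracks (and here slightly inverts) the measured ranking, which is the pattern "high hit rate, weak agreement" describes.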

This collaboration with the Debora Marks Lab enabled large-scale experimental benchmarking of today’s leading generative protein models 🔬

🔗 Dive into the details: https://t.co/o0bLEq8qf0

#ProteinEngineering #ProteinDesign #MachineLearning #SyntheticBiology #Biotech #AIinBiology #DataScience #GROQseq

The Align Foundation

@Align_Bio

28 days ago

🧬 How Much Diversity Can Proteases Handle? GROQ-seq Has Answers

We profiled 11,722 sequence-diverse protease homologs, including AI-designed minimized variants, and uncovered robust activity across surprisingly distant sequences.

Protease function against the canonical TEV protease substrate persists even across extreme sequence divergence: some homologs remain active at as little as 19% sequence identity.

Datasets like this are key to unlocking the next generation of machine learning models for protein engineering 🤖
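Percent identity is not one standard quantity, so a figure like 19% depends on how it is counted. Here is a minimal sketch using one common convention (identical residues divided by gap-free aligned columns), assuming the two sequences are already aligned. Whether the study used this convention is not stated in the post, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both inputs are rows of an existing pairwise alignment: equal
    length, with gaps written as `gap`. Dividing by the full alignment length
    or by the shorter ungapped sequence instead gives different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: excluded from both counts
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences): 6 identical out of 7 gap-free columns.
print(round(percent_identity("ACDEFG-H", "ACDQFGWH"), 1))  # 85.7
```

Producing the alignment itself (for example with a Needleman-Wunsch aligner) is a separate step that this sketch takes as given.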

🔗 Read more: https://t.co/wiYnTTzyW1

#ProteinEngineering #MachineLearning #SyntheticBiology #Biotech #EnzymeEngineering #HighThroughput #DataScience #AIinBiology #GROQseq

The Align Foundation

@Align_Bio

about 1 month ago

📊 Engineering better PETases isn’t just a modeling problem, it’s a data problem.

In the PETase Engineering Tournament, we partnered to develop three independent assay platforms to measure expression and activity across temperature and pH: cell-free systems, E. coli + Rapid Fire Mass Spec, and microfluidic droplets.

The takeaways:
→ High-throughput ≠ high-quality
→ Realistic assays ≠ scalable assays
→ Generating reliable, ML-ready data is still the bottleneck

Though challenging, this is the kind of groundwork needed to actually move the field forward.

🔗 Full methods + learnings: https://t.co/IzCBunHrES

💪 Many thanks to our sponsor Twist Bioscience for DNA synthesis and to Adaptyv for their valuable collaboration on assay development.

#ProteinEngineering #SyntheticBiology #EnzymeEngineering #MachineLearning #Biotech #AssayDevelopment #DataScience #Bioengineering #Sustainability #PlasticRecycling

The Align Foundation

@Align_Bio

about 1 month ago

👉 GROQ-seq is Live: Quantitative Protein Function at Scale

Each dataset is powered by a function-specific genetic circuit in E. coli:
• Transcription factors → regulate DHFR expression via operator binding
• T7 RNA polymerase → drives transcription from a T7 promoter
• TEV protease → cleaves a split DHFR reporter to modulate growth

Function → growth → sequencing
All calibrated. All comparable.

This is how we start making protein function datasets that are quantitative and generalizable.

Dive into the data: https://t.co/5Wc2NKid60

More about each circuit:
📊 Transcription factors → https://t.co/UNuBfmFXyd
📊 T7 RNA polymerase → https://t.co/XHYMs4VACM
📊 TEV protease → https://t.co/LCHfuhk8st

#SyntheticBiology #ProteinEngineering #OpenScience #AI #Biotech #ProteinML #GROQSEQ
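The "function → growth → sequencing" readout can be illustrated with a generic enrichment calculation: sequence each variant's barcode before and after growth, then compare how its frequency changed. A minimal sketch, assuming log2 frequency ratios normalized to a reference barcode and a pseudocount of 0.5. GROQ-seq's actual calibration (the "All calibrated" step) is not reproduced here, and the function name, barcodes, and counts are hypothetical.

```python
import math
from typing import Mapping


def log2_enrichment(
    counts_before: Mapping[str, int],
    counts_after: Mapping[str, int],
    reference: str,
    pseudocount: float = 0.5,
) -> dict[str, float]:
    """Log2 change in each barcode's frequency across growth, relative to a reference.

    If protein function drives growth, better-functioning variants rise in
    frequency, so a positive score means more growth than the reference barcode.
    """
    if reference not in counts_before:
        raise KeyError(f"reference barcode {reference!r} missing from input counts")
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("empty count table")

    def log_ratio(barcode: str) -> float:
        # The pseudocount keeps barcodes that drop out during growth finite.
        f_before = (counts_before[barcode] + pseudocount) / total_before
        f_after = (counts_after.get(barcode, 0) + pseudocount) / total_after
        return math.log2(f_after / f_before)

    ref = log_ratio(reference)
    return {barcode: log_ratio(barcode) - ref for barcode in counts_before}


# Hypothetical read counts before and after growth selection.
before = {"wt": 100, "v1": 100, "v2": 100}
after = {"wt": 200, "v1": 400, "v2": 50}
scores = log2_enrichment(before, after, reference="wt")
print({bc: round(s, 2) for bc, s in scores.items()})  # {'wt': 0.0, 'v1': 1.0, 'v2': -1.99}
```

A raw enrichment score like this is relative to the reference and to the particular growth run; converting it into comparable functional units across circuits is what a calibration step has to add.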

The Align Foundation

@Align_Bio

about 2 months ago

🚀 GROQ-seq: Scale Meets Reproducibility

You’ve seen The Align Foundation's recent GROQ-seq data releases on transcription factors, T7 RNA polymerase, and TEV protease (https://t.co/te33eiBrB3). But protein ML doesn’t just need more data. It needs data you can trust.

GROQ-seq delivers both:
📈 Scale (still a critical bottleneck)
🔁 Reproducibility (what makes that scale usable)

We stress-tested it:
• Same sequence, different barcodes → consistent results (Spearman 0.875)
• Same protocol, different facilities → strong agreement (Spearman ~0.8)
• Measurements from the two sites are nearly indistinguishable (AUC ≈ 0.56, close to the chance level of 0.5)
• Top variants are recovered at both sites (fold enrichment >5)

That is the difference between “more data” and usable data: if your measurements don’t reproduce, your models don’t generalize.

GROQ-seq is built for:
✔ aggregation across datasets
✔ reliable model training
✔ real-world protein design

This is how we move from fragmented assays to foundation datasets for biology.

🔗 Check out all the details: https://t.co/TlQuIxCoIZ

#OpenScience #ProteinEngineering #BioAI #MachineLearning #Reproducibility #ProteinML #SyntheticBiology #AIforBiology #GROQSEQ
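Two of the reproducibility checks lend themselves to a short sketch (Spearman correlations are available in standard tools such as `scipy.stats.spearmanr`). AUC can be read as how well a classifier separates measurements from the two sites: 0.5 is chance, so 0.56 means the sites are hard to tell apart. The post does not define its fold enrichment, so the sketch assumes one plausible definition: the overlap between each site's top-k variants divided by the overlap expected by chance. The site classifier that would produce the AUC scores is not shown, and all names and data are hypothetical.

```python
from typing import Mapping, Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability P(pos > neg), counting ties as 1/2."""
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def top_k_fold_enrichment(
    site_a: Mapping[str, float], site_b: Mapping[str, float], k: int
) -> float:
    """Overlap of the two sites' top-k variants divided by the overlap expected by chance."""
    shared = sorted(set(site_a) & set(site_b))
    n = len(shared)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of shared variants")
    top_a = set(sorted(shared, key=lambda v: site_a[v], reverse=True)[:k])
    top_b = set(sorted(shared, key=lambda v: site_b[v], reverse=True)[:k])
    expected = k * k / n  # mean overlap of two random k-subsets drawn from n variants
    return len(top_a & top_b) / expected


# Hypothetical scores for 10 variants measured at two sites.
site_a = {f"v{i}": float(i) for i in range(10)}
site_b = {f"v{i}": i + (0.5 if i % 2 else 0.0) for i in range(10)}
print(auc([1, 2, 3], [1, 2, 3]))                               # 0.5
print(round(top_k_fold_enrichment(site_a, site_b, k=2), 2))    # 5.0
```

Under this definition, random top-k lists give a fold enrichment near 1, and the maximum possible value is n / k.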

The Align Foundation

@Align_Bio

about 2 months ago

📢 HiBiT Feasibility Study: Toward Scalable Protein Expression Measurement

In our new report, @Align_Bio explores how to address one of the key bottlenecks in protein engineering: reliable prediction of soluble protein expression.

In collaboration with Ginkgo Bioworks and with leading contributions from Kasia Baranowski, we evaluate the HiBiT luminescence-based reporter as a scalable, quantitative approach to measure protein expression across hosts. Tested in both E. coli and P. pastoris, the assay shows strong potential as a standardized, high-throughput method for generating reproducible expression data.

This work contributes to the development of large-scale datasets required to train more accurate models aimed at predicting protein expression.

🔗 Access the report: https://t.co/5TjpxtjHi6

#SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #HiBiT #Ecoli #Ppastoris #OpenScience

The Align Foundation

@Align_Bio

2 months ago

📢 Data Release Tuesday: Align TEV Protease Dataset 📊

We’re expanding the Align Foundation data ecosystem again with ~30,000 high-quality GROQ-seq data points capturing TEV protease sequence–function relationships at scale. To our knowledge, this is the largest mutational dataset on TEV protease to date. Notably, no comprehensive deep mutational scanning study across the full protein has been reported in over three decades, leaving key aspects of its functional landscape unexplored.

TEV protease is a cornerstone tool in biotechnology, known for its high substrate specificity, and this dataset provides a rich resource for enzyme engineering and ML-driven protein design.

This release was made possible by a strong cross-team effort, with key contributions from Erika Alden DeBenedictis, Anjali Chadha, Dave Ross’s team at the National Institute of Standards and Technology (NIST), and the DAMP Lab at Boston University.

🔗 Access the dataset: https://t.co/BuoZK3L4WO

#OpenScience #SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #AlignData #GROQSEQ #Protease #TEV

The Align Foundation

@Align_Bio

2 months ago

📢 Public Data Release: Align T7 RNA Polymerase Dataset 📊

The data keeps coming at @Align_Bio! We’re excited to release our T7 RNA polymerase dataset, adding ~35,000 unique GROQ-seq data points to the growing Align data ecosystem and capturing sequence–function relationships across variants at scale. To our knowledge, this is the largest mutational dataset on T7 RNA polymerase to date!

🔗 Access the dataset on the Align Data Portal: https://t.co/ivyvCZX1qL

#OpenScience #SyntheticBiology #ProteinEngineering #BioAI #MachineLearning #AlignData #GROQSEQ #RNApolymerase

The Align Foundation

@Align_Bio

3 months ago

🔗 Access the dataset on the Align Data Portal: https://t.co/kqjqnqIqNC 👏 Huge thanks to our collaborators and everyone involved in making this dataset possible.

The Align Foundation

@Align_Bio

3 months ago

🚀 Public Data Release: GROQ-seq Transcription Factor Dataset We’re excited to announce the public release of The Align Foundation's first GROQ-seq dataset, one of the largest high-resolution transcription factor datasets of its kind.

The Align Foundation

@Align_Bio

3 months ago

This is the result of a fantastic collaboration with David Ross's Lab at @NIST, @simonsnitz at @Harvard and @Damp_Lab. We’re thrilled to make the data publicly accessible to support the broader research community.

The Align Foundation

@Align_Bio

3 months ago

Align and @GoogleDeepMind are partnering to build AI-ready datasets & evaluations for the future of predictive #AMR biology. Researchers worldwide can submit concepts through March 31, with roadmapping workshops coming to North America and APAC this spring. 🔗 https://t.co/iPWuKu8PDw

The Align Foundation

@Align_Bio

5 months ago

Designed for scale and closed-loop experimentation, Tesseract will power AI to map sequence → function, predict gene transfer success, and assign function to unknown genes from rich phenotypic signatures. Big thanks to the collaborators and reviewers who helped shape this!

The Align Foundation

@Align_Bio

5 months ago

In partnership with @Pioneer__Labs, we’re proposing Tesseract: a large-scale, open microbial phenomics dataset to functionally annotate microbial genomes at scale. 🧬🤖 ✅ 5M diverse genes × 50 host strains × 100 conditions 🔗 Read the proposal: https://t.co/veVh0oPplB
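To give a sense of scale, the headline numbers can be multiplied out. This assumes a full factorial design in which every gene is measured in every host strain under every condition; the proposal may sample only a subset of combinations, so treat the result as an upper bound rather than the planned measurement count.

```latex
5\times10^{6}\ \text{genes} \times 50\ \text{host strains} \times 100\ \text{conditions}
  = 2.5\times10^{10}\ \text{gene/strain/condition combinations}
```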

The Align Foundation

@Align_Bio

6 months ago

🚀 Team growth: our team doubled in size compared with last year to support all this open science. Huge thanks to collaborators, participants, and the entire Align team. You make this possible!

The Align Foundation

@Align_Bio

6 months ago

🔬 ALIGN WRAPPED: our year in open science 🔬 Thread 🧵 We turned up the impact. Here are highlights from the past 12 months at The Align Foundation. Want to collaborate or learn more? ✉️ [email protected] #AIforBiology #OpenScience #ProteinEngineering #Microbes #Data

The Align Foundation

@Align_Bio

6 months ago

🎤 Outreach: Attended 30+ conferences to present our work, build partnerships, and spread open science worldwide.

The Align Foundation

@Align_Bio

6 months ago

📝 4 peer-reviewed publications:
• Data Scaling
• Results of the Protein Engineering Tournament
• Can protein expression be ‘solved’?
• The influence of automation and AI on biology
