Hugging Science

@huggingscience

The @huggingface community effort for science.

Joined November 2025

8 Following

267 Followers

38 Posts

huggingscience retweeted

1 day ago

We’re excited to share the full binder design protocol. Check it out here: https://t.co/AtkipkiYtS. The notebook includes support for @modal to easily scale up binder generation. Give it a try and let us know how it works! You can read more about ESMFold2, ESMC, ESM Atlas, and the full results in the paper here: https://t.co/M3rt00pU8Z.

1

74

22

59

19K

huggingscience retweeted

Georgia Channing

2 days ago

huge for getting medical data ACTUALLY used by the machine learning community

All publicly funded science should be open source 🔥🔥🔥

5

92

13

41

17K

huggingscience retweeted

Daniel van Strien @vanstriendaniel

2 days ago

Hugging Face is the home for AI & ML across every domain, including biomedical!

The @NIH just added the @huggingface Hub to its official list of Generalist Repositories for data sharing.

NIH-funded? You can point to the Hub in your data sharing plan 🤗 https://t.co/OhJIMDaGcW

5

74

23

24

26K

huggingscience retweeted

Michael Rabinovich

5 days ago

Opus 4.8 just dropped and I ran it through our CAD tasks. 4.6 → 4.7 → 4.8 side by side. The results are unexpected!

199

4K

194

2K

705K

Hugging Science @huggingscience

5 days ago

https://t.co/FGlXhw0Bg6

0

1

0

1

62

Hugging Science @huggingscience

5 days ago

on hugging science: mattergen ⚛️ generative ai for materials. you give it a target property, it proposes novel inorganic crystal structures to match. inverse design instead of screen-and-pray. built for energy, catalysis and functional materials research. weights on the hub.

1

4

0

1

110

huggingscience retweeted

Georgia Channing

7 days ago

today was a massive day for protein engineering.

esmfold2 dropped - next gen of the esm series, fully open on @huggingscience. 1.1 billion predicted structures, 6.8 billion sequences. 800m more entries than the alphafold db, and reportedly edging out alphafold3 on protein complexes, including antibody–antigen binding.

alongside it: the new esm atlas. a huge expansion of known protein space, heavy on metagenomic sequences from soil, ocean, and the parts of biology that have been least characterised (until now!!)

and if that weren't enough, litefold dropped the fineweb of proteins, so every major protein database (pdb included) aggregated, cleaned, and made plug-and-play in one place.

these are the releases that push the whole field forward, and the pace of open science right now is almost motion-sickness inducing

all of it on https://t.co/T4l4r1lDz0 (and ofc @huggingface)

9

345

72

155

36K

huggingscience retweeted

Loubna Ben Allal

@LoubnaBenAllal1

13 days ago

What can a DNA foundation model actually do? We got this question a lot after releasing Carbon, our new DNA model. Here are three things it does. 🧬 All live in our demo: https://t.co/8NtRlHQG3H

5

86

19

44

17K

huggingscience retweeted

Steven Dillmann

@StevenDillmann

14 days ago

📣 Announcing Terminal-Bench Science: benchmarking AI agents on real scientific workflows – now open for task contributions👇

https://t.co/MSPMwnbhVt

@AnthropicAI, @OpenAI, and @GoogleDeepMind use Terminal-Bench to evaluate AI on coding tasks. We're now extending it to scientific workflows.

1/6🧵

16

494

112

271

903K

huggingscience retweeted

Loubna Ben Allal

@LoubnaBenAllal1

15 days ago

Introducing Carbon 🧬 a family of open generative DNA foundation models.

Carbon-3B matches Evo2-7B while running 250x faster at inference. It can generate new DNA sequences and score the functional impact of mutations, zero-shot.

We borrowed a lot from how modern LLMs are trained, but DNA isn't language. Genomes are noisy, redundant, and shaped by evolution rather than communication. So we adjusted the recipe:

Tokenizer. Most genomic models tokenize at the nucleotide/character level, which blows up sequence length. BPE is the obvious LLM-style fix, but it doesn't behave well on DNA. We use deterministic 6-mer tokens (one token = 6 nucleotides): 6× shorter sequences and cheaper attention.

Training loss. With 6-mer tokens, cross-entropy scores a prediction that gets 5/6 nucleotides right the same as one that's completely wrong. This gets brittle late in training and produces loss spikes. We switch mid-training to a more flexible factorized loss (FNS).

Data. Genomes are mostly sparse, repetitive background. We curate down to a staged functional DNA + mRNA mixture, with every ratio chosen by ablation, like mixing a web corpus, but for biology.

We're releasing the models, training data, training code, evaluation suite, and a demo to play with.

More details in the technical report: https://t.co/RMzFmTAhhT

Demo to play with the model, with a biology primer for our ML friends ;) https://t.co/IcOQq7GKF4

16

360

82

230

40K
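A rough illustration of the tokenizer and loss points in the Carbon announcement above, not Carbon's actual code. It assumes non-overlapping 6-mers over the alphabet ACGT, a lexicographically ordered vocabulary, and that leftover bases at the end are dropped. The names `tokenize`, `token_match`, and `nucleotide_matches` are hypothetical, and the tweet does not say how the real model handles ambiguous bases, padding, or the FNS loss.

```python
from itertools import product

K = 6               # one token = 6 nucleotides, as stated in the announcement
ALPHABET = "ACGT"   # assumption: only the four canonical bases

# Assumed vocabulary: all 4**6 = 4096 possible 6-mers in lexicographic order.
VOCAB = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}


def tokenize(seq: str) -> list[int]:
    """Split a DNA string into non-overlapping 6-mers and map each to an id.

    Assumption: trailing bases that do not fill a whole 6-mer are dropped.
    Any character outside ACGT (for example N) raises KeyError in this sketch.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % K
    return [VOCAB[seq[i:i + K]] for i in range(0, usable, K)]


def token_match(pred: str, target: str) -> int:
    """Token-level view: a 6-mer is either exactly right (1) or wrong (0)."""
    return int(pred == target)


def nucleotide_matches(pred: str, target: str) -> int:
    """Per-position view: how many of the 6 nucleotides agree."""
    return sum(p == t for p, t in zip(pred, target))


if __name__ == "__main__":
    ids = tokenize("AAAAAAAAAAAC")
    print(len(ids), ids)  # 12 nucleotides become 2 tokens
    # A prediction with 5 of 6 bases right still counts as a plain miss
    # when only whole tokens are compared.
    print(token_match("ACGTAC", "ACGTAA"), nucleotide_matches("ACGTAC", "ACGTAA"))
```

The last two calls show the gap the tweet describes: comparing whole tokens gives no partial credit, while a per-nucleotide comparison does. How the factorized loss actually uses that information is covered in the linked technical report, not here.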

huggingscience retweeted

Georgia Channing

15 days ago

Hugging Science just got a whole lot more huggier 🤗🤗🤗

Today, we’re releasing a family of genomics models, which we call Carbon https://t.co/N90YPfkJyX

8

279

45

119

19K

huggingscience retweeted

about 1 month ago

Super happy to have this one out. A clean organized up-to-date view of all the science resources (chemistry, biology, physics, materials, math) people have been sharing on the Hugging Face hub: datasets, blogs, models and more

7

63

6

34

16K

huggingscience retweeted

Leandro von Werra

about 1 month ago

AI for Science: this is the new frontier for AI and making progress here will impact all of humanity. The new Hugging Science site is here to make sure it is open and accessible to every researcher! Datasets, models, leaderboards, blogs, guides: https://t.co/31Ryr8K2VE

4

49

4

25

7K

huggingscience retweeted

OpenMed @OpenMed_AI

about 1 month ago

What OpenMed contributes to https://t.co/dc2nOaoYLt:

→ 1,000+ clinical and medical NER models
→ PII detection in 9 languages
→ SuperClinical: #1 on the PII Masking leaderboard
→ Privacy-filter-nemotron (OpenAI base, retrained for medical)

Apache 2.0. On-prem deployable. https://t.co/rD0UybbHCn

5

9

2

2

660

huggingscience retweeted

Georgia Channing

about 1 month ago

🤗🤗🤗introducing Hugging Science -- the home of AI for science 🤗🤗🤗

open models and datasets are the powerhouse of science (see the PDB), but finding the models and data you actually need for your breakthrough is hard af

you shouldn't need to scrape arxiv, own your own wetlab, fight a custom HDF5 parser, build a fusion stellarator, and beg for compute before you've trained a single epoch

so we're changing that

we've put all the best science on @huggingface in one place:
- 78GB of genomics data
- 11TB of PDE simulations
- 100M cell profiles
- 9T DNA base pairs
- 13M molecular trajectories
- 400k medical QA pairs

and much more, all open, and all ready for training (+ you can also now filter and search by domain, task, and keyword)

we've put together all the biggest releases from our partners at NASA, Google, OpenAI, Meta FAIR, Arc Institute, Ginkgo, SandboxAQ, Proxima Fusion, NVIDIA, Ai2, OpenADMET, InstaDeep, Future House, Polymathic AI, LeMaterial, Earth Species Project, Merck, and Eve Bio

if you're not sure where you fit in -- work on open challenges for problems that matter: including fusion stellarator design, ADMET, antibody developability, multilingual medicine, catalysis and materials, and scientific reasoning.

we're already changing how science gets done:

a fusion startup needed a benchmark for stellarator plasma confinement that didn't exist. @proximafusion shipped ConStellaration on Hugging Science: a leaderboard, dataset, and eval metrics, all in one place.

a drug discovery team wanted to predict hPXR induction. OpenADMET put up a blind challenge: 11,000+ compounds assayed at Octant, 513 held out, two tracks (pEC50 + structure). Anyone in the world can train and submit.

an antibody team at @Ginkgo released GDPa1, a developability dataset for stability, manufacturability, and immunogenicity prediction, with a live leaderboard scoring every submission.

if you know a problem the ML community should be working on, let us know. make a challenge!

this is about putting all the tools for solving science in one place. so we can hillclimb! → https://t.co/T4l4r1lDz0

56

2K

350

1K

198K

huggingscience retweeted

Georgia Channing

about 2 months ago

immaculate community energy last night with the people behind:
> EquiformerV3
> The Well
> NVIDIA Atlas
> Meta OMat
✨✨✨

1

17

1

1

673

huggingscience retweeted

Georgia Channing

about 2 months ago

tomorrow! sf! come hang! https://t.co/Fcq9JtXxHp

0

6

2

1

660

huggingscience retweeted

Georgia Channing

about 2 months ago

🤗 Hosting a happy hour to meet the people building the future of open science in SF on Thursday! Wanna come? https://t.co/Fcq9JtXxHp

0

6

2

1

713

huggingscience retweeted

about 2 months ago

We taught a DNA model to learn its own tokenization. It learned the genetic code with no supervision. And outperforms Evo 2's architecture with 3x faster inference. Great work with Arnav (@arnavshah0), Victor (@victor_ljz), Parsa (@Radii2323), Brandon (@fluorane), Sukjun (@sukjun_hwang), Bo Wang (@BoWang87), Patrick Hsu (@pdhsu), Hani Goodarzi (@genophoria) and Albert Gu (@_albertgu) 🔥

1

111

19

67

10K

huggingscience retweeted

Aishwarya Kamath @ashkamath20

about 2 months ago

We released Gemma 4 last week, and seeing the community's response has been amazing! 🚀

Honored to lead the vision efforts in which we made huge performance leaps from Gemma 3, I wanted to help you make the most of the new capabilities. Deep dive 🧵 https://t.co/nP6ogGgghf

26

898

107

537

47K
