William Steele

@willjsteele

Bioinformatics Senior Scientist @rxbiologics. On a mission to improve drug discovery using machine learning (all views expressed are my own)

Joined January 2010

1.5K Following

234 Followers

1.3K Posts

Pinned Tweet

William Steele @willjsteele

over 5 years ago

@CodeWisdom "Programming isn't about what you know; it's about what you can copy and paste from stack overflow" - someone in a hurry

4

33

5

0

0

willjsteele retweeted

about 2 months ago

🔓🧬First big unlock from vibe science-ing: rapid access to publicly available datasets. Sounds basic. It isn't. If you've ever tried to pull raw data from a paper you care about, you know: the metadata is a mess, the supplementary tables are unstructured, the file formats don't match, and by the time you've got it working you've lost half a day. For people without strong bioinformatics skills, it's often a dead end entirely. 1/🧵

7

118

20

104

13K

William Steele @willjsteele

about 2 months ago

@Oliver__Hahn Incredible tweet! I too have found huge value in paper-associated datasets, but felt the pain of cleaning them up. It's nice that this pain might largely be over.

1

0

0

0

184

William Steele @willjsteele

about 2 months ago

@ChrisHayduk IMO the problematic part is the generation of good quality data, not the analysis of said data. The costs involved often make experimentation difficult to justify.

0

0

0

0

24


William Steele @willjsteele

2 months ago

@adamlewisgreen Really interesting, thanks for sharing. How did you come across the 2022 statistics paper, and how did you know it was worth building on?

1

2

0

0

1K

willjsteele retweeted

3 months ago

@ChrisHayduk

I'm rebuilding AlphaFold2 from scratch in pure PyTorch.

No frameworks on top of PyTorch. No copy-paste from DeepMind's repo. Just nn.Linear, einsum, and the 60-page supplementary paper.

The project is called minAlphaFold2, inspired by Karpathy's minGPT. The idea is simple: AlphaFold2 is one of the most important neural networks ever built, and there should be a version of it that a single person can sit down and read end-to-end in an afternoon.

Where it stands today:
- ~3,500 lines across 9 modules
- Full forward pass works: input embedding → Evoformer → Structure Module → all-atom 3D coordinates
- Every loss function from the paper (FAPE, torsion angles, pLDDT, distogram, structural violations)
- Recycling, templates, extra MSA stack, ensemble averaging — all implemented
- 50 tests passing
- Every module maps 1-to-1 to a numbered algorithm in the AF2 supplement

The Structure Module was the most satisfying part to build. Invariant Point Attention is genuinely beautiful — it does attention in 3D space using local reference frames so the whole thing is SE(3)-equivariant, and the math fits in about 150 lines of PyTorch.

What's next:
- Build the data pipeline (PDB structures + MSA features)
- Write the training loop
- Train on a small set of proteins and see what happens

The repo is public. If you've ever wanted to understand how AlphaFold2 actually works at the level of individual tensor operations, this is meant for you.

Repo: https://t.co/k25vl5th1y

59

2K

257

1K

83K

willjsteele retweeted

4 months ago

In January, @jonhoo, @jjgort, and I returned to @MIT_CSAIL to teach Missing Semester, a class on topics missing from most CS programs—tools and techniques that everyone should know, like Bash, Git, CI/CD, and AI tools. Today, we’re releasing the course for free online!

anishathalye's tweet photo: https://t.co/O0BNOa2Cak

16

1K

212

1K

89K

William Steele @willjsteele

4 months ago

@anishathalye @jeremyphoward @jonhoo @jjgort @MIT_CSAIL This course is amazing! Thanks so much for making it free and open 😍

0

1

0

0

81

William Steele @willjsteele

4 months ago

@JFPuget Cool challenge!

0

0

0

0

56

William Steele @willjsteele

4 months ago

@DdelAlamo Do you think this would translate well to finding shared antibody specificities based on loop structure similarity?

0

0

0

0

10

willjsteele retweeted

:probabl. @probabl_ai

6 months ago

🎁 We have a gift for you! You've heard about skrub and would like to discover more? Or you never heard about it, but struggle with data preprocessing? 📽️ Riccardo Cappuzzo did an awesome video at PyData that has been recorded: you can have a look here 👉 https://t.co/Cgm8r6wLxH

probabl_ai's tweet photo: https://t.co/QMmE6HZtJ4

0

7

3

3

1K

William Steele @willjsteele

6 months ago

@biocheMichael Love the vision! 🚀

0

1

0

0

14

William Steele @willjsteele

6 months ago

@jeremyphoward Read his paper 'Data analysis and statistics: an expository overview' a year ago and it blew my mind. Taught me that looking at residuals of a fitted model is a core part of data analysis. Amazing that something written so long ago remains so relevant!

0

3

0

1

200

William Steele @willjsteele

7 months ago

@simonw The live coding approach is much more useful. I basically learned to program from watching programmers do this on YouTube. Seeing people get stuck, and how they get unstuck, is a goldmine of insight.

0

1

0

0

105

William Steele @willjsteele

7 months ago

@lemire Can you point me to some good resources on this?

0

0

0

1

23K

William Steele @willjsteele

7 months ago

@svpino Good example of what @jeremyphoward and @clattner_llvm talked about in their recent video on AI coding

0

1

0

0

58

William Steele @willjsteele

7 months ago

Just finished a post explaining how to use the Union-Find algorithm for preparing protein structure data for ML model training😊 https://t.co/UGiiIynZAK

0

0

0

0

42
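The linked post is not reproduced here, so the following is only a generic sketch of Union-Find (disjoint-set union), not the post's actual method. It assumes one plausible use: merging protein chains into clusters whenever a pair is flagged as similar, so that a whole cluster can be kept on one side of a train/test split. The chain IDs, the `similar_pairs` input, and the `cluster_chains` helper are all hypothetical.

```python
class UnionFind:
    """Disjoint-set union with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        # Walk up to the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_chains(chain_ids, similar_pairs):
    """Group chains so that any two linked by a similar pair share a cluster.

    `similar_pairs` is assumed to come from some upstream similarity
    search; how it is produced is outside this sketch.
    """
    uf = UnionFind(chain_ids)
    for a, b in similar_pairs:
        uf.union(a, b)
    groups = {}
    for chain in chain_ids:
        groups.setdefault(uf.find(chain), []).append(chain)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical chain IDs and similarity pairs, for illustration only.
chains = ["1abc_A", "1abc_B", "2xyz_A", "3def_A", "4ghi_A"]
pairs = [("1abc_A", "2xyz_A"), ("2xyz_A", "3def_A")]
clusters = cluster_chains(chains, pairs)
```

Because similarity is treated as transitive here, `1abc_A` and `3def_A` end up in the same cluster even though they were never directly paired. Assigning whole clusters, and not individual chains, to a split is what would keep similar structures from leaking across it.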

William Steele @willjsteele

7 months ago

@Noahpinion This is life in the UK in a nutshell.

0

1

0

0

167

William Steele @willjsteele

7 months ago

This is truly dystopian... AI being used for anti-growth. https://t.co/DXnW7tl1c8

0

0

0

0

43

William Steele @willjsteele

7 months ago

@DudespostingWs This is essentially the plot of Taxi Driver

0

0

0

0

42
