Will Connell @wilstc - Twitter Profile

Pinned Tweet

almost 3 years ago

🧬🔮 Single cell foundation models have been a recent hot topic in bio-ML! A few of the recent methods and some thoughts 🧬🔮 1) Geneformer 2) scGPT 3) scFoundation 4) Exceiver

4

251

59

210

52K

Will Connell @wilstc

about 1 month ago

Not sure it’s that binary. Previously impossible information synthesis is unlocking many new interpretations …but do you have a framework to verify?

Patrick OShaughnessy

@patrick_oshag

about 1 month ago

Alex on why AI drug discovery companies need to generate novel data to succeed: "AI models based on the research that's available is a lot of garbage in and garbage out." "A lot of the recorded literature is actually incorrect. There's been tons of studies that show if you go try to replicate the experiments that are in the literature, you don't even get the same results." "The AI companies that I believe are gonna be most set up for success are the companies with a novel way to generate science tokens that don't exist in the public domain."

17

253

39

196

126K

0

223

Will Connell @wilstc

about 2 months ago

@AllThingsApx Adept conclusion ✅

0

1

0

102

Will Connell @wilstc

about 2 months ago

There's a broader pattern here we're also finding success with @transcriptabio: provide structured data as context to elicit prior bio knowledge from an LLM. Here, there are steps of info restructure / distill via probes but worth asking - which are useful? Are they req? 🧐

Goodfire

@GoodfireAI

about 2 months ago

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open source database for all variants in the NIH's clinvar database. 🧵(1/8) https://t.co/PTrRAqjDMA

10

881

171

581

221K

0

3

0

333

wilstc retweeted

Mackenzie Morehead

@mackenziejem

2 months ago

New Post: Quantitative Look at Biotech Platforms

Plenty has been written about bio platform strategy but no one's put numbers around it

We used a whole lotta tokens to compile clinical, partnership + financial data on the 100 most successful public platform biotechs of all time https://t.co/JOSPpmbnyP

2

51

9

56

14K

wilstc retweeted

Leandro von Werra

@lvwerra

2 months ago

Auto-research for ML training models is all the rage now, but underrated is: auto-research for data!

Sure, you can squeeze out a bit of model performance by optimizing hyperparameters, but code agents can effortlessly do data work that has been very labour intensive and required a lot of attention to a lot of details:

> download data from many different data sources
> bring all the data sources into uniform format
> do detailed EDA: find patterns and outliers
> look at 100s of samples and take detailed notes
> make beautiful infographics rather than mpl plots
> iterate on data filtering by looking at more samples
> make simple pipelines robust and scalable

It's now possible to write data pipelines for dozens of data sources in hours that would have taken weeks of reading many docs, debugging APIs and data formats, wrangling outliers and missing data.

A few weeks ago we gave Claude access to the CPU partition of our cluster and it iteratively refined filters to retrieve a domain subset of FineWeb. This would have taken me 2-3 days to work through while it took Claude just a few hours with almost no babysitting and with a nice logbook.

Thus the long tail of small, niche data sources becomes more accessible and can be aggregated to even larger high quality datasets for cool applications. Data has been fuelling LLM progress more than model architecture innovations, so I am very excited about this!

11

274

30

220

22K

Will Connell @wilstc

3 months ago

Great to see this out 👏🏼 plz read for a really big idea Also, I wrote an overview of Variational Synthesis in 2024 https://t.co/tQW6qQ47W6

Nature Biotechnology

@NatureBiotech

3 months ago

Manufacturing-aware generative models enable petascale synthesis of designed DNA https://t.co/keT4RxBJw1

2

159

44

100

29K

0

3

0

1

526

Will Connell @wilstc

3 months ago

"We find that intra-complex interactions are largely conserved, whereas inter-complex relationships are extensively rewired, revealing new context-dependent genetic dependencies." 👏 💡rich resource for virtual cell benchmarking to disambig contextual-modeling vs coexpr-modeling

LukeGilbert

@LukeGilbertSF

3 months ago

We mapped gene interactions across different environmental conditions (GxGxE) at scale for the first time in human cells. These maps lead to the realization that many genes function in a context dependent manner which provides insight into how humans have relatively few genes but many cell types. Congratulations Ben! Paper: https://t.co/w5bYZUUK4n

8

177

38

76

17K

0

1

0

385

Will Connell @wilstc

4 months ago

@j0hnparkhill @jermdemo What service did you use? I’d consider publishing my own 🤔

1

0

33

Will Connell @wilstc

4 months ago

I built Scaling Biology 🧬 — a dashboard that live-tracks the volume and growth of key biological data sources across genomics, transcriptomics, and proteomics. The project is open to community contributions, check out the repo linked in footer https://t.co/mTKv7Py4tr

2

127

17

82

7K

Will Connell @wilstc

4 months ago

@jermdemo any pointers toward resources that could help capture that stat?

1

0

170

wilstc retweeted

Xinming Tu

@TuXinming

4 months ago

1/13 Excited to share our (@anna_spiro @ChikinaLab @sara_mostafavi) latest preprint! 🧬💻 Personal Genome Prediction isn't just a downstream task—it’s the ultimate end-to-end benchmark for Variant Effect Prediction. We put the new SOTA AlphaGenome to the test and uncovered a striking "Modality Gap" between gene expression and chromatin accessibility. 📄 Link: https://t.co/Xj8wtbaAVt 🧵👇

1

87

27

65

8K

wilstc retweeted

Ronghui (Ron) Zhu @RonZhu2015

5 months ago

Together with Emma Dann, we are thrilled to present a massive new Perturb-seq atlas of 22M primary CD4+ T cells, from 4 donors, across 3 timepoints – the result of a decade-long collaboration between the Marson (@MarsonLab) and Pritchard (@jkpritch) labs. 🧵👇 https://t.co/jiARZ6FNJS

4

242

52

104

39K

wilstc retweeted

Martin Borch Jensen

@MartinBJensen

7 months ago

The recent breakthroughs from @nablabio & @chaidiscovery emphasize a split in early biotech strategy. For the specific range of problems that antibodies address, making the binder is becoming trivial. This forces a choice between 'fast but competitive' and 'AI intractable'. 🧵

4

92

11

64

14K

Will Connell @wilstc

7 months ago

This is a major reason I joined @transcriptabio We've proved our platform in rare disease – an area that uniquely allows you to: 1) realize the mission of helping people, immediately 2) receive the gold-standard of clinical feedback, immediately 📄: https://t.co/P2UipZuTuN

Dr. Shelby

@shelbynewsad

7 months ago

I will not stop tweeting this until every drug discovery company gets human evidence in 3 years or less

4

34

0

7

5K

0

2

0

515

Will Connell @wilstc

7 months ago

@RuxandraTeslo @JackScannell13 Nice, I wrote up a similar perspective earlier this year! https://t.co/cu8qOy2q8J

Will Connell @wilstc

over 1 year ago

R&D productivity for new drug approvals has steadily declined over the past 50 years. With the rise of AI tools, a common belief is that they will dramatically boost research productivity by accelerating the "search" for solutions—a sentiment echoed widely at #JPM this week. 🧵

1

7

0

2

1K

0

3

0

1

74

wilstc retweeted

Sanju Sinha @Sanjusinha7

7 months ago

Most current drug discovery efforts are structure-based, e.g. creating small molecules or antibodies that best bind X. However, a drug may not derive its efficacy from its strongest binder. Taking a step away from the structure paradigm, we reason that if a CRISPR knockout of a gene mimics a drug's effects across cancer cell lines, that gene is likely the drug's target. This was done in @EytanRuppin in collaboration with @anideshpandelab and @BenDavidLab

Using this principle, we integrated drug and CRISPR profiles from 1000s of drugs to find their context-specific targets (different cancers, or when the known target is not expressed but the drug is still killing cancer cells).

We call this tool DeepTarget. We show that this approach outperforms current structure-based methods (AF3, RF, Chai) at finding a drug's target in a genome-wide search, when we had no information on what the target might be. We benchmarked on eight gold-standard drug-target pairs. It took us months to build this benchmark (we hope it helps the field).

We present two experimentally validated cases; pls see the paper for these (link at the end).

An intriguing observation is that we had many cases where many small molecules target the same gene (e.g. EGFR), and we found that small molecules with higher predicted target specificity show greater clinical advancement.

Very happy to hear your feedback. Here's the free access link: https://t.co/r6EjR58xg2

9

195

44

139

47K

Will Connell @wilstc

7 months ago

@anshulkundaje Metrics are one thing, but there are other major challenges with data. This paper designs improved metrics to highlight bio signal, but my takeaway is actually that most (perturb-seq) datasets have a very low prediction ceiling https://t.co/ukIheaigtR https://t.co/Ouq7pJlxAf

2

11

2

9

3K

Will Connell @wilstc

7 months ago

“this discussion on the challenges of evaluating a Foundation model is more interesting than the challenge itself.” Agreed!

dalloliogm @dalloliogm

7 months ago

My second post on the Arc Virtual Cell Challenge. The challenge’s Discord forums are in turmoil. Some participants have discovered a trick to get to the top of the leaderboard. https://t.co/aRb83MyCCn #arc_virtual_cell_challenge #foundation_models

4

94

14

75

49K

1

6

0

1

1K

Will Connell @wilstc

7 months ago

@anshulkundaje Yeah, this task is 1/n you expect such a model to perform well on. Creating community focus on evaluation strategies for this task is a very welcome outcome of the virtual cell challenge!

0

2

0

76

wilstc retweeted

Eli Weinstein @EliWeinstein6

8 months ago

We're excited to present LeaVS, a method to scale up learning for protein function models. It is based on the co-design of wet lab experiments and in silico training. https://t.co/4WIkVuVdss

4

52

11

33

11K
