Incredibly proud to see our benchmark of single-cell preprocessing methods finally published 🥳🥳🎉
We show that despite its theoretical limitations, no other transformation consistently outperforms log(y/s+1).
All details at https://t.co/0caikoVmBa and https://t.co/X61ICd6sao
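For readers who have not met the transformation, here is a minimal sketch of log(y/s+1) in plain Python. It assumes y is a raw count and s is a per-cell size factor; defining s as the cell's total count divided by the mean total across cells is an assumption made for this illustration, and the function names are hypothetical.

```python
import math

def size_factors(counts):
    """Per-cell size factors: each cell's total divided by the mean total.

    Assumption for this sketch: every cell has a nonzero total count.
    """
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def shifted_log(counts):
    """Apply log(y / s + 1) to a cells-by-genes list of raw counts."""
    factors = size_factors(counts)
    return [
        [math.log1p(y / s) for y in cell]
        for cell, s in zip(counts, factors)
    ]

# Two cells with the same composition but different sequencing depth
# end up with matching transformed values.
example = shifted_log([[0, 2, 2], [0, 4, 4]])
```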
One of the most important articles I’ve done… showing the noise in clinic BP measurement is large & makes it impossible to track Rx effects; almost useless in evaluating change from 2 clinic visits. Let me explain… https://t.co/z93fXHDaEr @YaleMed @YaleCardiology @CircOutcomes
1. Thrilled and proud to announce our latest preprint: Spectra - Supervised discovery of interpretable gene programs from single-cell data. It is a factorization method that really works and is already in intensive use across projects in my own lab --> https://t.co/081H74jhtE
The best way to make an algorithm outperform t-SNE in a benchmark has always been to compare against @scikit_learn's implementation. Glad that it is now being modernized.
Great work and congratulations on a sweet bug fix by @hippopedoid and Weiyi!
In @scikit_learn 1.2 (just out!), manifold.TSNE adopts modern parameter defaults, including O(n) learning rate and PCA init (as argued by @GCLinderman, @Anna_C_Belkina, @pavlinpolicar, @CellTypist, myself, and others).
BTW, that crazy bug 🐞 fix in 1.0 deserves a thread! [1/n]
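To make "O(n) learning rate" concrete, here is a small sketch of the heuristic as the scikit-learn documentation describes it for learning_rate='auto': max(N / early_exaggeration / 4, 50). The helper name is hypothetical and this is plain Python rather than a call into the library; check the docs of your installed version before relying on the constants.

```python
def auto_learning_rate(n_samples, early_exaggeration=12.0):
    """Sketch of the 'auto' learning-rate heuristic (hypothetical helper).

    Grows linearly with the number of samples, with a floor of 50.
    """
    return max(n_samples / early_exaggeration / 4, 50.0)

# The rate scales with dataset size instead of staying at a fixed constant.
small = auto_learning_rate(1_000)    # hits the floor
large = auto_learning_rate(60_000)   # scales with n
```

With the library itself, the matching settings would be `TSNE(learning_rate="auto", init="pca")`, which the tweet above says are the defaults from 1.2 on.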
A very long overdue thread: happy to share preprint led by Sebastian Damrich from @FredHamprecht's lab.
*From t-SNE to UMAP with contrastive learning*
https://t.co/jnj0P0gSyc
I think we have finally understood the *real* difference between t-SNE and UMAP. It involves NCE! [1/n]
Ever wondered what image datasets would look like if they could be visualized? We have developed a new algorithm for visualization based on contrastive learning. Joint work with @hippopedoid and @CellTypist. The full details are available as a preprint https://t.co/3O4B1YxKoM 🧵/16
I've been reading Andrei Okounkov's short and accessible expository articles about the work of this year's four Fields medallists and they are wonderful. Highly recommended reading!
https://t.co/R55qKVLfvk
https://t.co/ggUNCtEpkm
https://t.co/pKxd5ATgJl
https://t.co/Q7VpCIUb4A
@lucylgao, @popp_josh, @alexisjbattle, @daniela_witten and I are excited to share our new preprint! (https://t.co/Qee7mSeEwa) We introduce “count splitting”, a flexible framework that allows for valid p-values for differential expression across estimated latent variables. (1/8)
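A rough sketch of the core idea, assuming Poisson-distributed counts: thin each count X into a training part drawn as Binomial(X, eps) and a test part X minus that draw. Under the Poisson assumption the two parts are independent, so latent variables can be estimated on one and tested on the other. The function name, the eps default, and the seed are hypothetical choices for illustration, not the preprint's code.

```python
import random

def count_split(counts, eps=0.5, seed=0):
    """Binomially thin a cells-by-genes count matrix into train/test parts.

    Each unit of count goes to the training part with probability eps,
    so train[i][j] ~ Binomial(counts[i][j], eps) and
    test[i][j] = counts[i][j] - train[i][j].
    """
    rng = random.Random(seed)
    train, test = [], []
    for row in counts:
        train_row = [sum(rng.random() < eps for _ in range(x)) for x in row]
        train.append(train_row)
        test.append([x - t for x, t in zip(row, train_row)])
    return train, test

# Estimate clusters or other latent structure on `train`,
# then run differential expression tests on `test`.
train, test = count_split([[5, 0, 12], [3, 7, 1]])
```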
1/4 New commentary on the big data paradox, i.e., the phenomenon whereby as the number of patients enrolled in a study *increases*, the probability that the confidence intervals from that study will include the truth *decreases* 👉 https://t.co/yGNPg3yBf2
Our new study shows that data availability statements are not very useful; 1670 (93%) of the authors who indicated that data are available on request either did not respond or declined to share their data with us. Journal of Clinical Epidemiology: https://t.co/4IT2Dgphl4
Many theoretical works in ML & high-d stats focus on Gaussian data but why should we care? Real data are definitely not Gaussian, amiright?
Well, it might not be such a bad assumption, see plot 👇! How is this possible? Turns out there are universality properties in high-d 1/2
A final GI rounds to thank IR legend Dr. Mueller. General surgery at MGH is so thankful for all of his help over the last 4 decades! @MGHSurgery @MGHIR1
A big day for life science
The Tabula Sapiens, like a Periodic Table of Human Cells, ~500,000 cells analyzed, 24 tissues / organs @ScienceMagazine
"a broadly useful reference to deeply understand and explore human biology at cellular resolution" https://t.co/p0z103BnZH
The analysis of single-cell RNA-seq data begins with "normalizing" counts. In a preprint with @sinabooeshaghi, @IngileifBryndis & @agalvezmerchan, we examine the assumptions and challenges of normalization, benchmark methods, and motivate solutions: https://t.co/yb7sdzWbPQ 🧵 1/
I’m happy to share our latest preprint introducing the Single Cell Lung Cancer Atlas (LuCA) integrating >1.2M cells from 223 NSCLC patients + 86 controls from 29 datasets. We used it (among other things) to study tissue-resident neutrophils in NSCLC.🧵⬇️
https://t.co/rx7rblUWnX
My paper on Poisson underdispersion in reported Covid-19 cases & deaths is out in @signmagazine. The claim is that underdispersion is a HUGE RED FLAG and suggests misreporting.
Paper: https://t.co/X1ulWty1nM
Code: https://t.co/qLkVD7msEa
Figure below highlights 🇷🇺 and 🇺🇦. /1
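For intuition about what underdispersion means: Poisson counts have variance equal to their mean, so the variance-to-mean ratio sits near 1, and a ratio far below 1 means the numbers fluctuate less than chance alone would produce. The sketch below computes that ratio for a short series; the helper name and the toy numbers are made up, and the paper's actual analysis may differ.

```python
import statistics

def dispersion_index(values):
    """Sample variance divided by the mean (about 1 for Poisson counts)."""
    return statistics.variance(values) / statistics.mean(values)

# Toy daily counts (invented): suspiciously stable around 10.
ratio = dispersion_index([10, 10, 11, 9, 10])
underdispersed = ratio < 1
```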