We extended kallisto to translated alignment (nucleotides <-> amino acids) and used it to detect novel viruses in RNA-seq data. The single-cell resolution allowed us to determine whether the presence of these viruses affected host gene expression. 🧵
https://t.co/d76fumgcbX
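For intuition about what "translated alignment" involves: a nucleotide read can be translated in all six reading frames, and its amino-acid k-mers can be looked up against a protein reference. The toy sketch below does that with a plain dict index and set intersections, in the spirit of pseudoalignment. It is not kallisto's implementation (kallisto's real index is built on a de Bruijn graph and is far more efficient), and the function names and toy "virus" proteins are hypothetical.

```python
from typing import Dict, List, Set

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons left to right; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (seq, rc) for f in range(3)]


def build_kmer_index(proteins: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every amino-acid k-mer to the set of reference proteins containing it."""
    index: Dict[str, Set[str]] = {}
    for name, prot in proteins.items():
        for i in range(len(prot) - k + 1):
            index.setdefault(prot[i:i + k], set()).add(name)
    return index


def compatible_targets(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """For each frame, intersect the target sets of its indexed k-mers; union over frames."""
    hits: Set[str] = set()
    for frame in six_frame_translations(read):
        sets = [index[frame[i:i + k]] for i in range(len(frame) - k + 1)
                if frame[i:i + k] in index]
        if sets:
            hits |= set.intersection(*sets)
    return hits


# Hypothetical toy reference proteins.
index = build_kmer_index({"virusA": "MKWVTF", "virusB": "MKQQQQ"}, k=3)
print(compatible_targets("ATGAAATGGGTT", index, k=3))  # {'virusA'}
```

The read translates to MKWV in frame 0, and both of its 3-mers occur only in virusA. With a realistic reference, short frames also hit k-mers by chance, so a real tool needs a smarter index and filtering; this only shows the frame and k-mer idea.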
@GorinGennady @simocristea @lpachter Actually, we did benchmark against a non-neural approximation strategy (moment-matching the distributions to negative binomials). As measured by Hellinger distance, it was about 10x less accurate than our neural strategy. How much worse it is depends on the desired level of accuracy!
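The thread doesn't say how that comparison was coded. Here's a minimal sketch, assuming a univariate count distribution on a truncated support 0..N (the preprint's models are bivariate): moment matching picks the negative binomial with the same mean and variance, and the Hellinger distance compares the two pmfs. The mixture target below is a made-up stand-in, not data from the preprint.

```python
import math
from typing import Sequence, Tuple


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf: k failures before the r-th success, success prob p."""
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))


def moments(pmf: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of a pmf given on the support 0..len(pmf)-1."""
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var


def moment_match_nb(mean: float, var: float) -> Tuple[float, float]:
    """Choose (r, p) so the NB has this mean and variance; needs var > mean."""
    if var <= mean:
        raise ValueError("negative binomial requires overdispersion (var > mean)")
    r = mean ** 2 / (var - mean)  # from var = mean + mean**2 / r
    p = r / (r + mean)            # from mean = r * (1 - p) / p
    return r, p


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance, (1/sqrt 2) * ||sqrt(p) - sqrt(q)||_2, on a shared support."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


# Made-up non-NB target: an equal mixture of two negative binomials.
support = range(300)
target = [0.5 * nb_pmf(k, 5.0, 0.5) + 0.5 * nb_pmf(k, 5.0, 0.1) for k in support]
r, p = moment_match_nb(*moments(target))
approx = [nb_pmf(k, r, p) for k in support]
print(f"Hellinger(target, moment-matched NB) = {hellinger(target, approx):.3f}")
```

Moment matching can only reproduce the first two moments, so any shape beyond that (here, bimodality) shows up directly as Hellinger error.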
@mrazomej @GorinGennady @lpachter Thanks so much! Variational autoencoders are used to find underlying structure in and “de-noise” high-dimensional datasets. Thus, the “integration” of data is achieved by using both spliced and unspliced counts to create one low-dimensional latent embedding of the cell.
@simocristea @lpachter For non-trivial cases, we do in fact define a set of negative binomial basis functions and use the neural network to learn weights for those functions. A simpler function could be used to find the weights, but the approximation would likely lose accuracy. Details are in the preprint! 2/2
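A minimal sketch of the "weighted negative binomial basis" idea, under loud assumptions: the basis parameters and the logits standing in for a network's output are hypothetical, the distribution is univariate for simplicity, and a softmax is used here to keep the weights non-negative and summing to 1. This is not the preprint's architecture.

```python
import math
from typing import List, Sequence, Tuple


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf: k failures before the r-th success, success prob p."""
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn unconstrained scores (e.g. network outputs) into weights summing to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nb_mixture_pmf(k: int, logits: Sequence[float],
                   basis: Sequence[Tuple[float, float]]) -> float:
    """Weighted sum of NB basis pmfs, with weights softmax(logits)."""
    weights = softmax(logits)
    return sum(w * nb_pmf(k, r, p) for w, (r, p) in zip(weights, basis))


# Hypothetical basis (r, p) pairs and a hypothetical "network output".
BASIS = [(2.0, 0.5), (4.0, 0.3), (8.0, 0.2)]
LOGITS = [0.1, -0.5, 1.2]
pmf = [nb_mixture_pmf(k, LOGITS, BASIS) for k in range(500)]
```

Because each basis function is already a valid pmf and the weights form a convex combination, the output is a valid distribution no matter what the network predicts; the network only has to learn the weights.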
@simocristea @lpachter Thanks for your question, Simona! There are indeed closed-form solutions for some simple cases. When considering only mature RNA, the bursty model and what we call the "extrinsic model" yield negative binomially distributed counts, and the constitutive model yields Poisson-distributed counts.
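For reference, the textbook closed forms for the mature-only case, in notation assumed here (not necessarily the preprint's): transcription rate or burst frequency $k$, degradation rate $\gamma$, and geometrically distributed burst sizes with mean $b$.

```latex
% Constitutive model: Poisson with mean k / gamma
P_{\mathrm{const}}(m) = \frac{\lambda^{m} e^{-\lambda}}{m!}, \qquad \lambda = \frac{k}{\gamma}

% Bursty model (geometric bursts, mean size b): negative binomial
P_{\mathrm{bursty}}(m) = \frac{\Gamma(m + k/\gamma)}{\Gamma(k/\gamma)\, m!}
  \left(\frac{1}{1+b}\right)^{k/\gamma} \left(\frac{b}{1+b}\right)^{m},
\qquad \mathbb{E}[m] = \frac{k b}{\gamma}, \qquad \mathrm{Var}[m] = \frac{k b}{\gamma}(1 + b)
```

The variance exceeds the mean by a factor of $1 + b$, which is the overdispersion that distinguishes bursty from constitutive transcription.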
1/
We have another preprint out! Please see @lpachter's thread for an overview from the viewpoint of integration, or read on for biVI: THE CHEMICAL ENGINEER’S PERSPECTIVE 1/
Interested in "integrating" multimodal #scRNAseq data? W/ @MariaCarilli, @GorinGennady, @funion10 & Tara Chari we introduce biVI, which combines the scVI variational autoencoder with biophysically motivated bivariate models for RNA distributions. 🧵 1/
https://t.co/aA8uH787jv
gget alphafold: Predict the 3D structure of a protein from its amino acid sequence using @DeepMind’s AlphaFold v2.0 from a Python or command-line environment in 3 lines of code. Runs on any laptop and requires only ~4 GB of disk space. Simply ‘pip install gget’ and:
In a new preprint with @GorinGennady and Maria Carilli (co-first), along with Tara Chari, we show how neural networks can be used to efficiently approximate the steady-state solutions of two-species (nascent and mature RNA) bursty transcription models.
https://t.co/09FbrfmibB 🧵1/
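One way to get a reference distribution to check such an approximation against is stochastic simulation. Below is a Gillespie sketch of the bursty two-species model in the standard form assumed here: bursts arrive at rate `k` and add a geometric number of nascent molecules with mean `b`, nascent molecules are spliced at rate `beta`, and mature molecules degrade at rate `gamma`. Function and parameter names are hypothetical; this is not the preprint's solver.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Tuple


def geometric_burst(rng: random.Random, mean_burst: float) -> int:
    """Burst size on {0, 1, 2, ...} with P(B >= n) = q**n, so E[B] = mean_burst."""
    q = mean_burst / (1.0 + mean_burst)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.floor(math.log(u) / math.log(q)))


def simulate_bursty_two_species(k: float, b: float, beta: float, gamma: float,
                                t_end: float, seed: int = 0) -> Dict[Tuple[int, int], float]:
    """Gillespie SSA; returns the fraction of time spent in each (nascent, mature) state."""
    rng = random.Random(seed)
    n = m = 0
    t = 0.0
    occupancy: Dict[Tuple[int, int], float] = defaultdict(float)
    while t < t_end:
        burst, splice, degrade = k, beta * n, gamma * m
        total = burst + splice + degrade  # > 0 because k > 0
        dt = rng.expovariate(total)
        occupancy[(n, m)] += min(dt, t_end - t)  # time spent in the current state
        t += dt
        if t >= t_end:
            break
        u = rng.random() * total
        if u < burst:
            n += geometric_burst(rng, b)  # transcriptional burst adds nascent RNA
        elif u < burst + splice:
            n, m = n - 1, m + 1  # splicing turns one nascent into one mature molecule
        elif m > 0:
            m -= 1  # degradation of one mature molecule
    return {state: w / t_end for state, w in occupancy.items()}


dist = simulate_bursty_two_species(k=1.0, b=5.0, beta=1.0, gamma=0.5, t_end=20000.0, seed=1)
```

Time-averaging a single long trajectory estimates the joint steady-state distribution. As a sanity check, flux balance gives mean nascent `k*b/beta` and mean mature `k*b/gamma` (5 and 10 for these made-up parameters); a histogram like `dist` is slow and noisy to produce, which is part of why a fast learned approximation is attractive.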