Sergei R.

Sr Solution Architect Genomics @nvidia | Steering council @scverse_team | Postdoc at @fabian_theis lab | Personal account - do not represent employer NVIDIA.

2 months ago

When spatial datasets accumulate across experiments and technologies, managing, querying, and training models on them becomes a major challenge. To address this, we built support for scverse's SpatialData format into LaminDB, enabling cross-dataset queries, dataset validation, and lineage tracking. The main challenge was extending pandera-based schema validation to the complicated structure of SpatialData; Parquet and AnnData are easier! Blog: https://t.co/pmpT2A6uOy Code: https://t.co/tb6t2FJ7tt With @LukasHeumos and many others!

[falexwolf's tweet photo]

Koncopd retweeted

Lamin @laminlabs

4 months ago

We partnered with @jejomath to help us explain the relation between biology’s sparse measurements and the data lakehouse concept.

[laminlabs's tweet photo: https://t.co/HkRHsK2iP4]

Koncopd retweeted

Fabian Theis

@fabian_theis

almost 2 years ago

Happy that our ehrapy toolbox for the exploratory analysis of electronic health records is out today @NatureMedicine! It enables early QC&imputation at scale, visualization and downstream clustering & patient trajectory learning. https://t.co/PTqJ5P4C8X https://t.co/CJRIsM8jzi

[fabian_theis's tweet photo: https://t.co/PEQqMj1LW2]

Koncopd retweeted

Jérémie Kalfon @jkobject

almost 2 years ago

In my new manuscript scPRINT, I present a tool called scDataLoader. Let me tell you more about it 🧵 https://t.co/PhCcBZG7xQ 1/8

about 2 years ago

@_canergen @falexwolf
> Are some dataloaders CPU limited or all disk speed limited?
I don't think we hit disk speed limits with any dataloader. MappedCollection is just slower because during random sampling (training mode) it pulls individual indices instead of whole chunks like Merlin does.

about 2 years ago

@_canergen @falexwolf Hi @_canergen,
> Have you tried load_sparse_tensor similar to scVI?
MappedCollection samples indices one by one at random, and the PyTorch DataLoader then collates them into a tensor; it never loads whole tensors, sparse or dense. Merlin stores and samples dense arrays.
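
The access-pattern difference described in these replies can be illustrated with a toy sketch. This is not the actual MappedCollection or Merlin code: `CountingStore`, `per_index_batch`, and `chunked_batch` are hypothetical names, and the in-memory list only stands in for an on-disk array so that read calls can be counted.

```python
import random


class CountingStore:
    """Hypothetical stand-in for an on-disk array that counts read calls."""

    def __init__(self, n_rows, n_cols):
        self.rows = [[float(r * n_cols + c) for c in range(n_cols)]
                     for r in range(n_rows)]
        self.reads = 0

    def read_row(self, i):
        # One read call per single row.
        self.reads += 1
        return self.rows[i]

    def read_chunk(self, start, stop):
        # One read call for a whole contiguous chunk.
        self.reads += 1
        return self.rows[start:stop]


def per_index_batch(store, batch_size, rng):
    """Sample indices one by one at random, then collate the rows."""
    indices = [rng.randrange(len(store.rows)) for _ in range(batch_size)]
    return [store.read_row(i) for i in indices]


def chunked_batch(store, batch_size, rng):
    """Pick one random contiguous chunk and read it in a single call."""
    n_chunks = len(store.rows) // batch_size
    start = rng.randrange(n_chunks) * batch_size
    return store.read_chunk(start, start + batch_size)


rng = random.Random(0)
store = CountingStore(n_rows=1024, n_cols=4)

batch_a = per_index_batch(store, batch_size=32, rng=rng)
reads_per_index = store.reads  # one read per sampled index

store.reads = 0
batch_b = chunked_batch(store, batch_size=32, rng=rng)
reads_chunked = store.reads  # a single read for the whole batch
```

Both functions return a batch of the same size, but the per-index variant issues one read per sample while the chunked variant issues one read per batch, which is the kind of overhead the reply attributes to random per-index sampling.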

Koncopd retweeted

Valentine Svensson @vallens

about 2 years ago

What's a good way of organizing scRNA-seq data for training foundation models? Say you run 1k experiments and each measures counts for 1M cells with varying metadata and orthogonal data. Storing these data in one gigantic array isn’t exactly easy. We wondered whether it’s necessary to train foundation models and found 3 setups that made sense to us. https://t.co/4p6g3iRbpE

[falexwolf's tweet photo]

Koncopd retweeted

over 2 years ago

Organizing scRNA-seq data with LaminDB - https://t.co/PktmLpPipL

Koncopd retweeted

Lukas Heumos @LukasHeumos

over 2 years ago

I'm super excited to announce our new framework for exploratory electronic health record analysis "ehrapy". Although analysis is standardized for single-cell by seurat, bioconductor and scanpy, EHR analysis was until now the wild west. https://t.co/eOfJ0UaPum

[LukasHeumos's tweet photo: https://t.co/l8B2H1wj51]

Koncopd retweeted

Mo Lotfollahi

@mo_lotfollahi

over 3 years ago

(1/4) Thrilled that expiMap is now published in @NatureCellBio. It learns the activity of gene programs for a query single-cell data in a reference atlas while enabling learning novel gene programs (e.g., new cell states, disease)! https://t.co/iloivSId95

[mo_lotfollahi's tweet photo: https://t.co/BJxRX0xTO9]

Koncopd retweeted

Fabian Theis

@fabian_theis

over 3 years ago

Excited to finally see ExpiMap out @NatureCellBio - led by @Mohlotf & Sergei Rybakov, we inform single-cell embeddings by pathway priors (+ newly-learnt ones). This allows for biologically understandable components in the latent space and program queries. https://t.co/czvxbbr4Eu

[fabian_theis's tweet photo: https://t.co/gyQc4NBNiu]

Koncopd retweeted

almost 4 years ago

New tool: nbproject helps manage Jupyter notebooks! A lightweight open-source ELN for the drylab. pip install nbproject

Koncopd retweeted

scverse @scverse_team

about 4 years ago

We are very excited to announce scverse (https://t.co/B4lDGfWDPQ), a new consortium around the core Python packages for single-cell omics data analysis. scverse is a cross-lab effort to ensure the longevity and interoperability of the single-cell analysis ecosystem in Python.

Koncopd retweeted

scverse @scverse_team

over 4 years ago

AnnData 0.8.0 is out! New features include:
* Refactored IO, including low level access and support for new datatypes
* Out of core pytorch data loaders
* AnnDatas without an X value
Check out the full release notes here: https://t.co/yDUqf9LVbL

Koncopd retweeted

Mo Lotfollahi

@mo_lotfollahi

over 4 years ago

(1/11) Excited to share our new approach to learn gene programs (GP) activity from single-cells "biologically informed deep learning". We add prior knowledge while learning new cellular circuits, going beyond data integration and towards interpretability. https://t.co/uaIYvwcUnj

almost 7 years ago

@deleeuw_jan Thank you!

almost 7 years ago

@deleeuw_jan sorry to bother you with it, but is it still possible to get a pdf version of your book "Block Relaxation Methods in Statistics"? The link https://t.co/tGbm5UvHro doesn't work for me, it says "You don't have permission to access /bras/_book/_main.pdf on this server".
