Lamin

Foundational tools for omics data (mostly in Python). Join us on Bluesky: https://t.co/ziA75N5PZp

2 months ago

When spatial datasets accumulate across experiments and technologies, managing, querying, and training models on them becomes a major challenge. To address this, we built support for scverse's SpatialData format into LaminDB, enabling cross-dataset queries, dataset validation, and lineage tracking. The main challenge was extending pandera-based schema validation to the complicated structure of SpatialData; Parquet and AnnData are easier! Blog: https://t.co/pmpT2A6uOy Code: https://t.co/tb6t2FJ7tt With @LukasHeumos and many others!

laminlabs retweeted

Tyler Burns @tjburns08

3 months ago

Hi friends, I wrote a guest post for Lamin on using the open source LaminR package in an R workflow with the PBMC 3k dataset. Focus: provenance, meaning tracking code, environment & execution order so analyses are reproducible when you (or someone else) come back to them.

4 months ago

Read the full post: https://t.co/U21Tp2T0dX

4 months ago

We partnered with @jejomath to help us explain the relation between biology’s sparse measurements and the data lakehouse concept. https://t.co/HkRHsK2iP4

4 months ago

Existing data infrastructure can't make sparse measurements across millions of features queryable. Warehouses are too rigid, data lakes can't be queried, tabular lakehouses don't understand the formats. Biology needs a data lakehouse with support for bio-formats and registries. https://t.co/k3cvAhky3r

laminlabs retweeted

4 months ago

Two years ago we partnered with Mark Keller from Nils Gehlenborg’s Lab at Harvard to make Vitessce work seamlessly with LaminDB for interactive visualization of multimodal + spatial datasets. The integration has found much use across academia, biotech, and pharma, so we wrote up its design principles & use cases. This was a team effort involving Altana, Richard & Sunny in addition to Mark. Read the post: https://t.co/a7vvu6p0y3

laminlabs retweeted

David Fischer @davidsebfischer

4 months ago

What should the shared memory layer for agents and humans look like? Will it live in embeddings or in records? A high-level note.

laminlabs retweeted

about 2 years ago

Nice, detailed benchmark of backends that allow for batched training on a large scRNA-seq corpus. Efficiently dealing with the specifics of a scenario can be a big engineering challenge; lowering this barrier will enable cool computational biology down the road!

laminlabs retweeted

Valentine Svensson @vallens

about 2 years ago

What's a good way of organizing scRNA-seq data for training foundation models? Say you run 1k experiments and each measures counts for 1M cells with varying metadata and orthogonal data. Storing these data in one gigantic array isn’t exactly easy. We wondered whether that is necessary for training foundation models and found 3 setups that made sense to us. https://t.co/4p6g3iRbpE

laminlabs retweeted

over 2 years ago

Organizing scRNA-seq data with LaminDB - https://t.co/PktmLpPipL

laminlabs retweeted

Sunny Sun @sunnyosun

over 3 years ago

Thank you for the awesome collaboration, @marenbuettner! With Pytometry, we'd like to share readfcs: A package to load data and metadata from FCS files to AnnData. pip install readfcs

laminlabs retweeted

almost 4 years ago

New tool: nbproject helps manage Jupyter notebooks! A lightweight open-source ELN for the drylab. pip install nbproject
