Alex Dobin @a_dobin - Twitter Profile

over 4 years ago

@fulop_dan Indeed, strandedness of the libraries does not (presently) affect alignments. The --soloStrand option is necessary for assigning reads to genes in single-cell gene expression.

Alex Dobin @a_dobin

over 4 years ago

@AnnLorainePhD @nomad421 @anshulkundaje Good suggestions from Rob! And as Anshul pointed out, tweaking parameters could be helpful. If you have specific examples of stubbornly wrong alignments, please post them on GitHub: https://t.co/0plPdXOISk

Alex Dobin @a_dobin

almost 5 years ago

@anshulkundaje @satijalab @nomad421 @stephaniehicks @humancellatlas @_hubmap To mitigate this issue, we can prioritize exons over introns when they overlap, as suggested in https://t.co/QtWNfDHrBj . This approach is being implemented in STARsolo (to be released soon). (2/2)
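
A toy sketch of the exon-over-intron priority rule described in this thread, assuming each read arrives with a list of (gene, feature type) overlaps. The function name `assign_gene` and the input layout are hypothetical and only illustrate the idea; this is not STARsolo's implementation.

```python
def assign_gene(hits, prioritize_exons=True):
    """Pick a single gene for a read, or None if it cannot be resolved.

    hits: list of (gene_id, feature_type) pairs, where feature_type is
    "exon" or "intron" (hypothetical input layout for this sketch).
    """
    all_genes = {gene for gene, _ in hits}
    exonic_genes = {gene for gene, feature in hits if feature == "exon"}

    # With prioritization on, exonic overlaps win over intronic ones.
    if prioritize_exons and exonic_genes:
        candidates = exonic_genes
    else:
        candidates = all_genes

    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # no overlap at all, or still multi-gene


# A read in an exon of geneA that lies inside an intron of geneB.
read_hits = [("geneA", "exon"), ("geneB", "intron")]

print(assign_gene(read_hits, prioritize_exons=False))  # multi-gene, unresolved
print(assign_gene(read_hits, prioritize_exons=True))   # resolved to the exonic gene
```

Without prioritization the read counts as multi-gene and is dropped; with it, the exonic gene is chosen.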

Alex Dobin @a_dobin

almost 5 years ago

@anshulkundaje @satijalab @nomad421 @stephaniehicks @humancellatlas @_hubmap Seconding all responses, good discussion! An issue with including intronic reads is with the genes whose exons overlap introns of other genes. Reads mapping to such overlapped regions will be considered multi-gene and (typically) excluded. (1/2)

Alex Dobin @a_dobin

about 5 years ago

@APredeus Indeed, we were using the "abridged" 10X annotations that exclude small non-coding RNA and pseudogenes. We checked it for the full Gencode 37 annotations, and the results were very similar.

Alex Dobin @a_dobin

about 5 years ago

STARsolo preprint is out on bioRxiv: https://t.co/okqCUWIERH STAR release 2.7.9a: https://t.co/bjkJskVfnl The major new feature is quantification of multi-gene (multi-mapping) reads/UMIs, which are necessary to detect expression from overlapping genes and paralogs. 1/5

Alex Dobin @a_dobin

about 5 years ago

@alexwstockinger SuperTranscripts should work if you can make a set of SuperTranscript sequences and a GTF describing spliced/unspliced transcripts with respect to transcripts, and give them to the STAR genome generation step.

Alex Dobin @a_dobin

about 5 years ago

@nomad421 Interesting approach, and very impressive accuracy improvement! And incredibly quick turn-around time!

Alex Dobin @a_dobin

about 5 years ago

@alexwstockinger The SuperTranscripts are very cool - but they would require spliced alignments. We were actually looking into that at some point but did not get far. The redundancy is not a problem, as long as redundant transcripts are assigned to the same gene.

Alex Dobin @a_dobin

about 5 years ago

@alexwstockinger This is a good point: for species without genome assembly, mapping to the transcriptome is the only option. You can do it with STARsolo by generating the genome index from transcript sequences instead of chromosomes. 3/3
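
A minimal sketch of what such an index-generation call might look like, written as a Python helper that only assembles the command line. The file name `transcripts.fa`, the output directory, and the helper name are hypothetical; the options shown are standard STAR genome-generation options, but a real transcriptome index may need further tuning, so check the STAR manual before running anything.

```python
def star_genome_generate_cmd(transcripts_fasta, index_dir, threads=4):
    """Build a STAR genome-generation command that treats transcript
    sequences as the 'genome' (sketch only; paths are placeholders)."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", index_dir,
        "--genomeFastaFiles", transcripts_fasta,  # transcripts instead of chromosomes
        "--runThreadN", str(threads),
    ]


cmd = star_genome_generate_cmd("transcripts.fa", "star_transcriptome_index")
print(" ".join(cmd))
# To actually run it (requires STAR on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```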

Alex Dobin @a_dobin

about 5 years ago

@alexwstockinger Using simulations, we show the differences are due to Kallisto's lower accuracy, which is caused by the pseudoalignment-to-transcriptome algorithm. It forces intronic reads (abundant in single-cell data) to map to spurious genes. 2/3

Alex Dobin @a_dobin

about 5 years ago

@timtriche @manvendr7 @MollyHammell Interesting paper, thanks! It looks like they are aggregating reads over "meta" TE - they are not doing EM over individual genes.

Alex Dobin @a_dobin

about 5 years ago

@bdeonovic @BMirauta @biomonika @lpachter Sure, no disagreement here. I was thinking about a specific data type, scRNA-seq gene/cell counts: mostly 0s, many 1s, and fewer >=2 elements. But maybe Lior has something else on his mind, and I am being paranoid. https://t.co/pqNF3IF3qN

Alex Dobin @a_dobin

about 5 years ago

@hypercompetent @lpachter It’s getting late on the East coast, and still no blog from Lior, so I will make my presumptuous guess. I think Lior is trying to puzzle out why Kallisto to CellRanger correlation is lower in our Fig.4C https://t.co/okqCUWIERH vs. their Fig.2D https://t.co/x55WNVzIDh 1/3

Alex Dobin @a_dobin

about 5 years ago

@nomad421 @p_bourguet Indeed!

Alex Dobin @a_dobin

about 5 years ago

@BMirauta @bdeonovic @biomonika @lpachter And correlation coefficient does not have to be higher than the proportion of equal elements. An even simpler toy example: x=[0 0 1 1] y=[0 1 0 1] corr(x,y)=0 (obviously) while 50% of the elements agree.
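
The toy example in this tweet can be checked in a few lines of Python. The helper `pearson` below is written out by hand for this sketch rather than taken from any library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]

correlation = pearson(x, y)
agreement = sum(a == b for a, b in zip(x, y)) / len(x)

print(f"corr(x, y) = {correlation}")       # 0.0
print(f"fraction equal = {agreement}")     # 0.5
```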

Alex Dobin @a_dobin

about 5 years ago

@p_bourguet Right, there are a few features in STARsolo that would be good to have for bulk (e.g., counting only reads that are concordant with transcripts). They are high on my TODO list. Though for multimappers, quantifying with RSEM is still a better (albeit slower) option.

Alex Dobin @a_dobin

about 5 years ago

@hypercompetent @lpachter The answer to “why Kallisto to CellRanger correlation is lower in our calculation” is simple. We used Spearman correlation, while they used Pearson. Pearson correlation, of course, can be inflated by various artifacts and is not a good choice for RNA-seq data. 3/3
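
As an illustration of how the two coefficients can diverge, here is a small sketch with made-up numbers (not data from either paper): a single large value pushes Pearson close to 1, while the rank-based Spearman coefficient stays at 0. The helpers are hand-written for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values: anti-correlated at low values, plus one shared outlier.
x = [1, 2, 3, 4, 100]
y = [4, 3, 2, 1, 100]

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print(f"Pearson  = {pearson_r:.4f}")
print(f"Spearman = {spearman_r:.4f}")
```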

Alex Dobin @a_dobin

about 5 years ago

@hypercompetent @lpachter I am still not sure what the point of Lior’s toy example is. Should we not use correlation as a metric at all? Then why was it used in the Kallisto paper? 2/3

Alex Dobin @a_dobin

about 5 years ago

@dna_rosenberg @ParseBio Thanks, Alex!
