Normalization methods for single-cell RNA-Seq data (high-level overview)

Florian Wagner

13K views

Published: Oct 2, 2024
In this video, I provide a high-level overview of different scRNA-Seq normalization methods. In particular, I discuss the differences between log transforms, square root transforms, and Pearson residuals. My Twitter: / flo_compbio
DOI of this video (for citations): doi.org/10.528...
While discussing the scaling step, I forgot to mention that scaling should be done to the median transcript count of all cells in the dataset (approx. 9,000 in the example), not to an arbitrary number like 1 or 1,000,000. Otherwise, this can really throw off the following transformation step and lead to completely useless analysis results.
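As a rough illustration of the scaling note above, the sketch below rescales every cell so that its total transcript count equals the median total across all cells. The function name `scale_to_median` and the toy matrix are made up for this example, and it assumes a small, dense genes-by-cells NumPy array; it is not code from the video.

```python
import numpy as np

def scale_to_median(counts):
    """Scale each cell (column) to the median total count across all cells.

    Assumption: `counts` is a dense genes-by-cells array of UMI counts.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)      # total transcript count per cell
    target = np.median(totals)       # median transcript count of all cells
    # One scaling factor per cell; empty cells get a factor of 0.
    factors = np.divide(target, totals,
                        out=np.zeros_like(totals), where=totals > 0)
    return counts * factors          # broadcasts one factor per column

# Hypothetical toy data: 3 genes (rows) x 3 cells (columns).
counts = np.array([[10, 20, 5],
                   [30, 60, 15],
                   [60, 120, 30]])
scaled = scale_to_median(counts)
```

In this toy matrix the cell totals are 100, 200, and 50, so every cell is rescaled to a total of 100 (the median) rather than to a fixed constant such as 1 or 1,000,000.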
Further reading
-------------------------
1. "Validation of noise models for single-cell transcriptomics" (Grün et al., 2015) doi.org/10.103...
2. "Comprehensive Integration of Single-Cell Data" (Stuart et al., 2019) doi.org/10.101...
3. "K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data" (Wagner et al., 2018) doi.org/10.110...
4. "Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression" (Hafemeister and Satija, 2019) doi.org/10.118...
5. "Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data" (Lause et al., 2021) doi.org/10.110...
Data sources
-------------------------
1. Technical noise experiment: "Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells" (Klein et al., 2015) doi.org/10.101...
2. PBMC data: "10k PBMCs from a Healthy Donor (v3 chemistry)" (10x Genomics) support.10xgen...

Comments • 46

@abasu000 3 years ago ⁺⁵
Clear and accessible explanation, thanks for the tutorial.
@nikakhoshnevis6574 1 year ago
Thank you so, so much. Very informative, and it cleared up things that I was confused about.
@shahidwani7586 2 years ago ⁺²
Hey Florian, this is the best video explaining single-cell normalization. 👍🏽👍🏽👍🏽
@cuiwang743 1 month ago
Thank you very much for posting this precious video! It makes things so much easier for beginners!
@Yglandir 3 years ago ⁺⁴
Hi Florian,
Thanks for the great explanation.
I have a question though: in your last example concerning Pearson residuals, how do you get to these numbers? If I try to follow the formula shown on the previous slide, I get different results. Did you simplify that formula and instead use the formula stated in Hafemeister and Satija (2019) or Lause et al. (2021) for the calculations? Did you do something else, or am I just confused?
@florianwagner1255 3 years ago ⁺²
Thank you! I could be confused, you could be confused, or we could both be confused :) Can you tell me why you think my math is off? For gene 1 I calculated a mean of 4, so you divide all the measurements by sqrt(4)=2. 8/2=4. For gene 2 I calculated a mean of 0.09, so you divide all the measurements by sqrt(0.09)=0.3. 4.5/0.3=15. Does that make sense?
@Yglandir 3 years ago
@@florianwagner1255 Thanks for your quick response! My confusion comes from the question of how you calculate the mean expression for each gene. For me, the mean of gene 1 is (0+8+8)/3 = 5.333 and the mean of gene 2 is (0+0+4.5)/3 = 1.5.
Therefore "my" Pearson residuals are 8/sqrt(5.333) = 3.46 (gene 1) and 4.5/sqrt(1.5) = 3.67 (gene 2).
@Yglandir 3 years ago ⁺¹
I think I finally found my mistake! I did not take the percentages into account. If I do, then my mean for gene 1 is 0*0.5 + 8*0.48 + 8*0.02 = 4. Following the same logic, the mean for gene 2 is 4.5*0.02 = 0.09.
Thanks for helping me find my mistake! =)
@florianwagner1255 3 years ago ⁺¹
@@Yglandir oh I think you are ignoring the cell type proportions specified in the example... Gene 1 has an expression of 8 in exactly 50% of the cells and 0 in the other 50%, so the mean is 4. Similarly, Gene 2 is only expressed in 2% of the cells. I hope that makes sense.
@Jenkins-f7s 1 year ago
Hi Yglandir! Thanks for open honest questioning - scientists need to do this more. Might I ask where you're from?
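The arithmetic discussed in the thread above can be sketched as follows. The expression values (0, 8, 8 and 0, 0, 4.5) and the 50% and 2% proportions are quoted in the comments; the 48% share of the middle cell type is inferred from those two figures, and the variable and function names are made up for this example. The sketch follows the simplified calculation quoted in the thread (dividing by the square root of the mean) and should not be read as a full Pearson residual implementation, which in its usual form also subtracts the mean before dividing.

```python
import numpy as np

# Cell type proportions: 50% and 2% are quoted in the thread;
# 48% is inferred so that the proportions sum to 1.
proportions = np.array([0.50, 0.48, 0.02])

# Expression of each gene in the three cell types, as quoted in the thread.
gene1 = np.array([0.0, 8.0, 8.0])
gene2 = np.array([0.0, 0.0, 4.5])

def weighted_mean(values, weights):
    """Mean expression across cells, weighting each cell type by its share."""
    return float(np.sum(values * weights) / np.sum(weights))

mean1 = weighted_mean(gene1, proportions)   # 0*0.50 + 8*0.48 + 8*0.02
mean2 = weighted_mean(gene2, proportions)   # 4.5*0.02

# Simplified calculation from the thread: divide by sqrt(mean).
scaled1 = gene1 / np.sqrt(mean1)
scaled2 = gene2 / np.sqrt(mean2)
```

With these inputs the means come out at 4 and 0.09, and the largest values become 8/2 = 4 for gene 1 and 4.5/0.3 = 15 for gene 2, matching the numbers in the reply. An unweighted mean over the three cell types gives 5.333 and 1.5 instead, which is where the discrepancy came from.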
@derricmorgan2282 3 years ago ⁺²
Thanks a lot, really useful even after having read several papers and articles concerning the matter.
@sfmambero 2 years ago ⁺¹
Thank you for the clear explanation!
The toy examples really helped in understanding the effects of the different types of normalization.
What did you mean by “clipping” though when you talked about Pearson residuals?
@florianwagner1255 2 years ago ⁺¹
I was referring to a situation where the evidence of non-uniform expression for a gene is so strong that the Pearson residuals become very large. This happens, for example, if there is a very rare cell type that has very high and specific expression of certain genes (e.g., hemoglobin genes in a few red blood cells that are contaminants in PBMC samples). "Clipping at X" means setting all values larger than a certain number X to X. The motivation for "clipping" is the idea that there isn't any benefit to letting Pearson residuals grow arbitrarily large, and that very large values may result in strange outliers in certain analyses. I don't think clipping is always necessary, but it is something that has been described in the literature, so I mentioned it here.
@sfmambero 2 years ago
@@florianwagner1255 Understood. Thank you again!
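"Clipping at X" as described in the reply above can be sketched in a few lines. The function name `clip_residuals`, the optional `symmetric` flag, and the example values are assumptions made for illustration; the reply only describes capping values larger than X.

```python
import numpy as np

def clip_residuals(residuals, max_value, symmetric=False):
    """Set all values larger than `max_value` to `max_value`.

    With symmetric=True (an optional variant added for this sketch),
    values below -max_value are also raised to -max_value.
    """
    residuals = np.asarray(residuals, dtype=float)
    lower = -max_value if symmetric else None
    return np.clip(residuals, lower, max_value)

# Hypothetical residuals with one extreme outlier.
residuals = np.array([-1.5, 0.2, 3.0, 250.0])
clipped = clip_residuals(residuals, max_value=10.0)
```

Here only the outlier changes: 250 is reduced to 10, while the other values are left as they are.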
@tommasogiacomello7870 1 year ago ⁺¹
Hi! Really clear explanation, thanks a lot, it was very useful. I have a question: how do I choose the scaling factor?
@muratseker6406 2 years ago ⁺¹
Thank you for the video, it is clearly explained! Looking forward to seeing more videos on scRNA-Seq :)
@marcelochocki6281 2 years ago ⁺¹
Thank you so much for that video. Keep going :)
@Mirabell97 2 years ago ⁺²
Hey! Thanks for the great explanation, helped a ton! Did I get it right that for Pearson residual-based normalization, no scaling is done prior to the multiplication with the weight?
@florianwagner1255 2 years ago ⁺¹
No, in the way that I've explained it, the same scaling applies. I'm always using this method to get rid of "efficiency noise", which would otherwise throw off these very simple approaches to normalization.
@Mirabell97 2 years ago
@@florianwagner1255 thanks a lot!
@bio_mark 3 years ago ⁺²
Hi Florian, thank you for your clear explanation. I am not an expert on RNA-Seq analysis and I am trying to learn on my own. I just have one (maybe big?) question: how would you conduct differential expression analysis after scaling and transforming your data as you explained? I know DESeq2 from R cannot be used with previously normalized data. Which R pipeline would you use after this? Thank you
@bio_mark 3 years ago
Or is it possible to continue with DESeq2 after these steps? Thank you
@florianwagner1255 3 years ago ⁺¹
Hi Marcos, I think most of the things I talk about in this video are not directly relevant to differential expression (DE) analysis. I think in many cases you probably still want to do a scaling step, but I don't think the transformations are very useful in the context of DE analysis. I wouldn't claim to be an expert on DE analysis of scRNA-Seq data, but I think this website might be interesting for you: biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/dechapter.html
@bio_mark 3 years ago
@@florianwagner1255 thank you for your reply and for the material!
@SuperMixedd 8 months ago
@@bio_mark DESeq2 always works on counts, so you'd be better off with raw counts if you work with 10x data
@sailingintosunshine 1 year ago
really helpful, thanks!
@pancake9191 2 years ago
For your example at 10:15, if you assume this matrix has already been through scaling, why is the total number of reads in the two cells still so different?
@wasima4463 2 years ago
The example data structures are transposed relative to the theoretical data structure (1:38), which creates confusion.
@muratseker6406 2 years ago
When we look at the raw data, how can we get an idea of what the raw data across all cells look like, so that we can determine it as in your example?
@jordanwilson8277 2 years ago ⁺¹
Awesome!
@davidvanbergen2283 2 years ago
Thanks for the great explanation! One question: why consider the delta (10:52) and not the fold change? (In my understanding, fold change is more biologically relevant.)
@florianwagner1255 2 years ago
Thank you! I am discussing fold changes while I'm talking about the examples on the slide.
@Jenkins-f7s 1 year ago
Wow, you're one of my new YT favorites work-wise. Just FYI, they are shamelessly adding ads!
@Jenkins-f7s 1 year ago
Just a question about scaling. Shouldn't the amount of RNA be used in cell type characterization, or in quality control? Seems weird to scale it all away.
@SunilDhasmana 6 months ago
1:56 @florianwagner1255 Could you please explain how to generate this plot with 10X scRNA-seq data in R?
@andreacurtabbi7319 5 months ago
Great content
@timazebardast1096 2 years ago
Great, Thanks.
@asunnyday3749 8 months ago
Well done
@pariaalipour61 2 years ago
Thank you so much for this helpful video. I have two questions, if you don't mind. First, does the order of normalization and scaling matter? You mentioned scaling first; however, in the Satija vignette, normalization is done first. What is the difference? Second, what I realized is that normalization is separate from scaling. In this case, is normalization the same as transformation?
@florianwagner1255 2 years ago ⁺¹
Thank you! The goal of the scaling step is to get rid of efficiency noise and convert from absolute expression levels to concentrations. This needs to be done first, because the transformation step is non-linear, so scaling after transformation doesn't have the same effect. Yes, "normalization" is sometimes used to mean transformation, but I've defined the term here to include both scaling and transformation, which I thought is more common.
@pariaalipour61 2 years ago
@@florianwagner1255 Thanks a lot for your explanation. Sorry, I'm trying to compare with the Seurat vignette. I think the scaling + transformation you mentioned here is done by NormalizeData in Seurat. Please correct me if I'm wrong. But what about ScaleData in Seurat? Did you mention it, or is it something else?
@florianwagner1255 2 years ago ⁺¹
@@pariaalipour61 Yes, I think you're right, NormalizeData does both scaling and transformation. ScaleData does something completely different: it subtracts the mean of each gene and divides by its standard deviation, which is usually called (feature) standardization or z-score normalization: satijalab.org/seurat/reference/scaledata
@pariaalipour61 2 years ago
@@florianwagner1255 you didn't mention that. Do you think it's not necessary for downstream analysis?
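Two points from the thread above can be made concrete with a small NumPy sketch: why scaling has to come before a non-linear transformation, and what per-gene standardization (z-scoring) does. The function name `standardize_genes`, the log2(x + 1) transform, the use of the sample standard deviation, and the toy numbers are assumptions made for this example; this is not Seurat's implementation.

```python
import numpy as np

# 1) Order matters: a non-linear transform does not commute with scaling.
x = np.array([1.0, 10.0, 100.0])
factor = 2.0                                  # hypothetical scaling factor
scale_then_log = np.log2(x * factor + 1)      # scale first, then transform
log_then_scale = np.log2(x + 1) * factor      # transform first, then scale

# 2) Per-gene standardization (z-score): subtract each gene's mean and
#    divide by its standard deviation.
def standardize_genes(matrix):
    """Z-score each gene (row) of a dense genes-by-cells array."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, ddof=1, keepdims=True)   # sample std (assumption)
    std[std == 0] = 1.0       # constant genes stay at 0 instead of dividing by 0
    return (matrix - mean) / std

# Hypothetical transformed data: 2 genes (rows) x 4 cells (columns).
transformed = np.array([[0.0, 1.0, 2.0, 3.0],
                        [5.0, 5.0, 5.0, 5.0]])
z = standardize_genes(transformed)
```

The two orders in part 1 give different values for every entry, which is why the scaling step is applied first. In part 2, each varying gene ends up with mean 0 and standard deviation 1, so standardization operates per gene across cells, whereas the scaling step operates per cell.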
@offswitcher3159 2 years ago
great video!!
@mohamedrefaat197 2 years ago
Thanks a ton!
