Comparing single-cell RNA integration methods | Which is the best?

  • Published: Feb 19, 2023
  • Which single-cell integration method is the best? In this video I compare 5 different methods using 3 different challenging integration problems. I test Seurat CCA, Seurat RPCA, SCVI-tools, and Scanorama. I measure time and memory usage and also examine integration outcomes.
    Github:
    github.com/mousepixels/sanbom...
    Datasets:
    www.cell.com/cell/fulltext/S0...
    www.nature.com/articles/s4159...
    muscle/lung: unpublished
  • Science
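
The time and memory measurements mentioned in the description can be approximated for any Python integration step with the standard library alone. The sketch below is an assumption about how such a measurement might be set up, not the method used in the video; `profile` is a hypothetical helper name. Note that `tracemalloc` only sees memory allocated through Python's allocator, so memory used by native extensions or a GPU may not be counted.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_mib).

    Hypothetical helper: peak memory covers only allocations that
    tracemalloc can trace, not everything the process uses.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes / 2**20


# Stand-in workload; in practice this would be the integration call.
result, seconds, peak_mib = profile(sum, range(1_000_000))
print(f"took {seconds:.3f} s, peak {peak_mib:.2f} MiB")
```

Wrapping each method in the same helper keeps the comparison consistent, although wall-clock time will still vary with hardware and thread settings.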

Comments • 26

  • @zhaomingwu4105 4 months ago

    Amazing tutorial, very helpful. I followed the steps in the video and tested out Scanorama and scVI. I also got terrible results with Scanorama when I tried to integrate two datasets that are very different (I expected about 5% of the cells from one dataset to be similar to the other dataset, and the rest to be very different). Scanorama just gave me a giant data cloud plus 2 tiny clusters, which makes no sense biologically. scVI's result, on the other hand, makes a lot of sense for these datasets.

  • @user-hw9nb9ov4p 1 year ago +1

    Hi Sanbomics,
    I wanted to let you know that I am really enjoying studying single cell RNA sequencing through your videos! They have been very helpful.
    I am curious if you have compared the SCTransform method in Seurat with other integration methods. I have noticed that SCTransform is somewhat different, but I am not very familiar with the statistical aspects, so I have been using it as is.
    If you are interested, I would love to see a comparison between SCTransform and other methods!

    • @sanbomics 1 year ago +2

      Hi, this is a good question. SCTransform is a good tool for normalization and variance stabilization, but it doesn't replace integration methods. You could, and often should, use it for normalization prior to integration. I should probably switch to using it in all my videos, but here I just used basic normalization for simplicity and easy comparison between methods.

  • @blackmatti86 1 year ago +1

    Which integration method would you use for 10X Multiome with multiple samples, and how would you go about it? Would you integrate each modality (RNA & ATAC) separately for each sample or together? I cannot seem to find any useful info out there. Thank you.

  • @sbarman21 1 year ago +1

    Thank you very much for the nice tutorial @Sanbomics. Could you please also make a tutorial about CITE-seq data analysis, specifically for a dataset from a library prepared with barcoded antibodies and hashtags for the sample source?

    • @sanbomics 1 year ago +1

      I will definitely do one of these in the future!

  • @minjun9900 7 months ago

    thanks a lot for your excellent explanations...

  • @hyeokome 1 year ago

    Crazy how different algorithms could take you from a few minutes to 10+ hours... Thanks again for another great video! For your own research, would you usually test datasets only on Python-based algorithms like Scanorama and scVI?

    • @sanbomics 1 year ago

      No problem! Hmm, I am biased towards Python analysis unless there is some specific tool I need that is only available in R. Of course, this is just my personal preference. While Scanorama is ~"the best", you can still do quality, robust analysis in R.

  • @ustigergirl 1 year ago

    Hi, loving your videos. Do you have a video on how to quality control raw scRNA datasets?

    • @sanbomics 1 year ago

      I think I do a very brief QC in my 10x cellranger video. But it is not the point of the video, so I don't go into much depth.

  • @pietrodelfino7393 1 year ago

    Hi, thanks for the video! So you say for differential expression analysis you use corrected counts or did I get you wrong? I read from Seurat developers that it is better to use unintegrated data for differential expression even after integration.

    • @sanbomics 1 year ago +1

      Good point. As you say, the Seurat developers suggest using the original data: satijalab.org/seurat/articles/integration_introduction.html
      On the other hand, scVI allows you to do DE with the integrated model.

    • @pietrodelfino7393 1 year ago

      @sanbomics I finished trying scVI integration, but in my setting it performs poorly. I then exported the concatenated AnnData object to Seurat and tried Harmony instead, and it worked pretty well. I wonder what could be the reason for such a difference. It looks like it's "experiment specific".

  • @aayushinotra5775 9 months ago

    Hey @sanbomics
    I had a question: is there a good batch correction method that can return the corrected count matrix?
    The corrected count matrix I intend to use should contain only non-negative values.

    • @sanbomics 9 months ago

      scVI gives corrected counts

  • @parmenideskim9739 3 months ago

    Thanks for your really great video. I am having a lot of trouble running pySCENIC on a large dataset. My computer has 92 cores and 1 TB of RAM, but it cannot run pySCENIC because it requires more than 1 TB of RAM. Is there any solution for this issue, or any alternative package comparable to pySCENIC?

    • @sanbomics 2 months ago

      How many cells are in your dataset?

  • @krushnachChandra 1 year ago

    Sir, can you explain how to do this for the loom file format in Python, as shown in your Jupyter notebook?

    • @sanbomics 1 year ago

      Hi! scanpy has sc.read_loom()
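
      For anyone following along, a minimal sketch of that call might look like the block below. `load_loom` is a hypothetical wrapper name; the block assumes scanpy and its loom dependency (loompy) are installed, and the extension check is only a convenience added for this example.

```python
from pathlib import Path


def load_loom(path):
    """Read a .loom file into an AnnData object with scanpy.

    Hypothetical wrapper around sc.read_loom(); the checks below run
    before scanpy is imported so bad paths fail fast.
    """
    path = Path(path)
    if path.suffix != ".loom":
        raise ValueError(f"expected a .loom file, got {path.name!r}")
    if not path.exists():
        raise FileNotFoundError(path)

    import scanpy as sc  # imported lazily; assumed to be installed

    adata = sc.read_loom(path)
    # Loom files can carry duplicate gene names; make them unique
    # so downstream indexing by gene behaves predictably.
    adata.var_names_make_unique()
    return adata
```

      Calling `load_loom("my_data.loom")` (a placeholder filename) would return an AnnData object that can then go through the same steps as in the notebook.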

  • @wangmengfei5462 11 months ago

    Hi Sanbomics,
    Thank you for your excellent work.
    I have two scRNA datasets and one snRNA dataset. There are more than 440,000 cells and more than 20,000 genes. My cloud server has 256 GB of memory. Could you please recommend a suitable way to integrate the three datasets? I wanted to perform CellChat analysis, so I used CCA to integrate the datasets. After 24 h of running CCA in R, my cloud server crashed. (T﹏T)

    • @sanbomics 11 months ago +1

      Yeah, you don't want to do CCA. If you like R, I recommend harmony. Should take

    • @wangmengfei5462 11 months ago

      @sanbomics
      Hi Sanbomics,
      Thank you for your advice. Harmony worked very well xD
      May I ask how you calculate how long it will take and how many cells it can handle with 256 GB of memory? I used a shared cloud server, and it used about 70-80 GB of memory while running Harmony. It took about 90 minutes, which is consistent with your prediction.
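
A rough way to reason about the memory side of this question is to count matrix entries. The sketch below is a back-of-envelope estimate only, not how the prediction in the thread was made: it assumes dense storage, 4 or 8 bytes per value, and a 50-dimensional embedding (the number of components is an assumption). Real single-cell matrices are usually sparse, so actual usage depends heavily on the tool and on how many copies it makes.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense n_rows x n_cols matrix (hypothetical helper)."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_cells, n_genes = 440_000, 20_000  # figures quoted in the thread
n_components = 50                   # assumed embedding size

full_float32 = dense_gib(n_cells, n_genes, bytes_per_value=4)
full_float64 = dense_gib(n_cells, n_genes, bytes_per_value=8)
embedding = dense_gib(n_cells, n_components, bytes_per_value=8)

print(f"cells x genes, float32: {full_float32:.1f} GiB")
print(f"cells x genes, float64: {full_float64:.1f} GiB")
print(f"cells x components, float64: {embedding:.2f} GiB")
```

Under these assumptions, a few dense copies of the full cells-by-genes matrix approach the server's 256 GB, while a low-dimensional embedding is orders of magnitude smaller.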