2024 updated single-cell guide - Part 1: RNA preprocessing and quality control

  • Published: Jul 3, 2024
  • This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing. This is part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells.
    Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis.
    Notebook:
    github.com/mousepixels/sanbom...
    Paper/dataset:
    www.cell.com/cancer-cell/full...
    Reference:
    www.sc-best-practices.org/pre...
    0:00 Intro
    0:27 Setup
    12:08 Cellbender
    18:20 QC
    28:05 preprocessing
    39:42 Conclusions
  • Science
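
The description above mentions removing low-quality cells during quality control. One common way to flag such cells is to mark any cell whose QC metric lies more than a few median absolute deviations (MADs) from the median. The following is a minimal sketch of that idea, not the code from the notebook: the function name `is_outlier`, the 5-MAD threshold, and the example values are all assumptions made for illustration.

```python
from statistics import median


def is_outlier(values, n_mads=5.0):
    """Flag values more than n_mads median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * mad for v in values]


# Hypothetical log1p(total counts) per cell; the last cell is far below the rest.
log1p_total_counts = [8.1, 8.3, 7.9, 8.0, 8.2, 8.4, 7.8, 3.0]
flags = is_outlier(log1p_total_counts)

# Keep only the cells that were not flagged.
kept = [v for v, bad in zip(log1p_total_counts, flags) if not bad]
print(f"kept {len(kept)} of {len(log1p_total_counts)} cells")
```

A median-based threshold adapts to each dataset, which is why it is often preferred over a fixed cutoff; the right number of MADs still depends on the data.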

Comments • 70

  • @jonathanback5731 2 months ago

    Your work is fantastic, great content!

  • @yaseminsucu416 2 months ago

    You rock! Thank you for doing this, looking forward to following this series!

  • @DuqueVJ 2 months ago

    Amazing! Thanks very much for the tutorial, I'm learning a lot!

  • @supakornpongpakdee1544 2 months ago +1

    Thank you very much for creating this tutorial! Looking forward to the next lessons!😊❤

  • @laloulymounia9266 2 months ago

    Thx for the update !

  • @007ZK 2 months ago +7

    Amazing series idea. I hope they keep coming.

    • @sanbomics 2 months ago +3

      Hopefully next week!

  • @lly6115 2 months ago +6

    Good to see you back😊 and thank you for your update

    • @sanbomics 2 months ago

      Yeah sorry I have been busy! Shouldn't be as long between the next few videos.

  • @avp300 2 months ago

    This is brilliant! Can't wait for part two!! The ridge plots look awesome! Thank you, Mark! :-)

    • @sanbomics 2 months ago +1

      Tomorrow hopefully!

  • @piroDYMSUS 2 months ago +3

    Amazing work, hope we will see second part soon

    • @sanbomics 2 months ago +1

      Trying to release in the next week or two!

  • @ykoy1577 2 months ago +3

    I was waiting for your video. Your videos are so helpful for beginners like me. Thank you so much for sharing your knowledge and experience.

  • @alexeyryzhenkov7579 1 month ago

    Thank you for your work!

    • @sanbomics 1 month ago

      Thank you so much!!! Really appreciate it! :)

  • @jackmineeechen4380 2 months ago

    I started with the video comparing different integration methods. That one really helped me! I eventually chose scanorama for my dataset, which worked out. Looking forward to this series! I appreciate your videos!

  • @0996Winglet-mq4on 2 months ago +5

    Really appreciate your videos 🎉❤ Cannot wait to see a spatial omics tutorial in the future 😊

    • @sanbomics 2 months ago

      Right now I am eagerly awaiting some interesting datasets from newer, higher-resolution technology than Visium.

  • @caspase888 2 months ago +1

    I look forward to your videos. Your grasp on the subject and the ability to teach are amazing. Thanks a lot 👍🏻

  • @brunovinagre427 2 months ago

    Grateful, Mark!!

  • @dardas15 2 months ago

    This is fantastic and really helps people with a limited bioinformatics background to analyze data independently. Thanks so much for making these videos, I've been using them with Python ever since you shared them a few years ago!

  • @jianhuacao7180 2 months ago

    welcome back, bro. Your channel is better than before.

    • @sanbomics 2 months ago

      Thanks! I am trying to continually improve the quality and make videos people are actually interested in.

  • @moonmoun2983 2 months ago

    Waiting impatiently for the next part

    • @sanbomics 2 months ago

      Wait no further! :)

  • @MrJordi94 2 months ago +1

    You truly are an inspiration for RNA-seq! Love your videos and your communication skills. Hope to see the rest of the 2024 tutorial soon :D

  • @user-ne7vm7fb3y 2 months ago

    You were great.

  • @babyfriedrice4878 2 months ago +5

    i love sanbomics so much!!!!!!!!!!!!!!!!!!!

    • @sanbomics 2 months ago +1

      I love you too!

  • @taoufikbensellak9274 2 months ago

    I just started your sc guide and I really enjoy it. Just for some clarifications about the tools, I use mamba (conda) with python 3.8 and a lower version of pandas (

    • @sanbomics 1 month ago

      I'll be doing DE using a different approach this time, which should give people fewer issues. Diffxpy can be a struggle, so I don't really use it anymore.

  • @kristifourie8427 2 months ago

    best page ever

  • @MinnnWang-uv8bn 2 months ago

    🎉🎉🎉thanks!

  • @gerolduntergasser4000 2 months ago

    cool
    good job😁

  • @islemgammoudi842 2 months ago

    Thanks for the Videos. Currently, I'm embarking on the journey of analyzing single-cell RNA sequencing (scRNA-seq) data combined with CITE-seq data. However, I'm facing challenges related to duplicate discrimination and assigning sub-samples via hashtags.
    Given your expertise in this area, I was hoping you could provide some guidance and advice on how to navigate these challenges effectively.

  • @moonmoun2983 2 months ago

    I would like to thank you immensely, because yours is one of the few bioinformatics channels I can follow along with. I have a question about a result I obtained from following the full scRNA-seq walkthrough you posted a year ago. I tried applying the code to a before-and-after chemotherapy treatment dataset. Everything worked perfectly until I got to the DEG analysis with heatmaps. With the top 25 upregulated and downregulated genes and the filtering code, it didn't yield more than 12 DEGs, so I had to relax the filtering and kept genes with significant fold change above 0.05. I ended up with more differentially expressed genes; however, in both cases my heatmap was devoid of pattern, and both the condition and control looked mostly downregulated. Should I conclude that there are no DEGs or expression signatures in the cancer samples before and after treatment? The original paper I took my data from didn't do a DEG analysis for the whole dataset, but selected 4 patients out of 12 to create a DEG heatmap with fewer than 10 genes. Thank you, I'd highly appreciate your insight on my results.

    • @sanbomics 2 months ago

      It's really hard to say without knowing more and actually getting a feel for the data. You can try a pseudobulk approach and see if you have any DEGs. I have a video on that, but will also be covering it soon in the new tutorial series.
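
The pseudobulk approach suggested in the reply above sums the raw counts of all cells belonging to the same sample, so that each sample is treated like one bulk RNA-seq library. The following is a minimal sketch of that aggregation step only; the function name `pseudobulk`, the sample labels, and the gene names are hypothetical, and the differential expression test that would follow is not shown.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells that share a sample label."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample, counts in cells:
        for gene, n in counts.items():
            totals[sample][gene] += n
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}


# Hypothetical cells: (sample label, raw counts per gene).
cells = [
    ("pre_treatment", {"GENE_A": 3, "GENE_B": 0}),
    ("pre_treatment", {"GENE_A": 1, "GENE_B": 2}),
    ("post_treatment", {"GENE_A": 0, "GENE_B": 5}),
]
bulk = pseudobulk(cells)
print(bulk)
```

In practice the aggregation is usually done per sample and per cell type, and the summed counts are then passed to a bulk differential expression tool.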

  • @fsh9134 1 month ago

    Thanks for making very useful videos. I was wondering if you would like to make a video on single-cell analysis using Julius AI, a data analysis AI.

  • @mehdiraouine2979 2 months ago

    Amazing work as always! On a side note, if I were to download FASTQ data from GEO and the paper does not specify whether the adapters were removed, how should I check in Python whether they were removed?

    • @sanbomics 1 month ago

      I wouldn't use Python for it, only because there are several command-line tools, like cutadapt, that can do the same thing much faster.
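
The reply above recommends dedicated command-line tools. If a quick look from Python is still wanted before running one, a rough check is to count how many reads contain the start of a known adapter sequence. The following is a minimal sketch under stated assumptions: the adapter prefix `AGATCGGAAGAGC` is the start of the common Illumina adapter and may not match a given library, the function name `adapter_fraction` is hypothetical, and the FASTQ content is made up.

```python
import io

# Start of the common Illumina adapter; an assumption about the library prep.
ADAPTER = "AGATCGGAAGAGC"


def adapter_fraction(fastq_handle, adapter=ADAPTER):
    """Return the fraction of reads whose sequence contains the adapter."""
    total = hits = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
            total += 1
            hits += adapter in line
    return hits / total if total else 0.0


# Tiny made-up FASTQ with two reads, one of which still carries the adapter.
example = io.StringIO(
    "@read1\nACGTACGTAGATCGGAAGAGCACAC\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIIIIIIIIIII\n"
)
frac = adapter_fraction(example)
print(frac)
```

For a real gzipped file, the handle could come from `gzip.open(path, "rt")`. A fraction close to zero suggests the adapters were already trimmed, but this simple substring match misses partial adapters at read ends, which the dedicated tools handle.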

  • @mehdiraouine2979 2 months ago

    Another question: if you were to choose between the scVI model for detecting doublets and this clf doubletdetection method, which one is more straightforward? I feel like this method needs some tinkering depending on the specific dataset.

    • @sanbomics 1 month ago

      The best method would be to use multiple methods. They will all give you slightly different results but hopefully have significant overlap. The reason I used doubletdetection here is because it is fast/simple and I already have multiple video tutorials on SOLO (scVI). It's hard to say which is more accurate. Changing parameters in scvi/SOLO will likely change the results a lot too just like what happened here.
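
The reply above suggests running several doublet callers and checking how much their results overlap. The following is a minimal sketch of that comparison, assuming each method has already produced a list of flagged cell barcodes; the function name `overlap_summary` and the barcodes are hypothetical.

```python
def overlap_summary(calls_a, calls_b):
    """Compare two sets of barcodes flagged as doublets."""
    a, b = set(calls_a), set(calls_b)
    both = a & b
    union = a | b
    return {
        "both": sorted(both),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared calls divided by all calls from either method.
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Hypothetical barcodes flagged by two doublet callers.
method_a = ["AAAC-1", "AAAG-1", "AACC-1"]
method_b = ["AAAG-1", "AACC-1", "AATT-1"]
summary = overlap_summary(method_a, method_b)
print(summary)
```

Whether to remove the union or only the intersection of the calls is a judgment call that depends on how conservative the analysis needs to be.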

  • @CaveCrack 2 months ago

    Thanks for the great video and series. I have a question at around 36:40 on how to interpret the graph. If the experiment had loaded say 14000 cells it appears that around 8000 would be recovered which I assume we would interpret as the number called by cellranger... For 14000 cells loaded the multiplet rate appears to be 6%, 6% of 14000 being 840 expected multiplets. However, all the blue recovery dots are aligned around 4.5%. 4.5% of 8000 would be only 360 expected multiplets. The document from which the graph is extracted says "Generally an increased number of cells per sample will increase the doublet rate". I've not been able to find clarification. Thank you

    • @CaveCrack 2 months ago

      Also, I am wondering if your low number of detected doublets at 1e-16 was due to the previous QC step where you exclude cells with the highest log1p_total_counts and log1p_n_genes_by_counts, as these could filter a lot of doublets.

    • @sanbomics 2 months ago

      I think in this case just ignore the blue line. The more cells you load, the higher the multiplet rate and the more total multiplets you will have.

    • @sanbomics 2 months ago

      Exactly, it's hard to say exactly what percent the multiplets are because of the first step. I think I mention it briefly in the video... or at least I thought about it.
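
The two multiplet estimates in the question above differ because they apply different rates to different cell counts. A quick check of that arithmetic, using only the numbers quoted in the comment (the function name `expected_multiplets` is just for illustration):

```python
def expected_multiplets(n_cells, rate):
    """Expected number of multiplets for a cell count and a multiplet rate."""
    return round(n_cells * rate)


# 6% of 14,000 loaded cells versus 4.5% of 8,000 recovered cells.
from_loaded = expected_multiplets(14_000, 0.06)
from_recovered = expected_multiplets(8_000, 0.045)
print(from_loaded, from_recovered)
```

As the replies note, either number is only a rough guide here, since the earlier QC filtering has likely already removed some multiplets.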

  • @goddyhong 2 months ago

    Thx for sharing! If I use a filtered matrix for analysis, do I still need to remove the background RNA, since I don't have a 4090? 🤣

    • @sanbomics 2 months ago +1

      If you have a filtered matrix you can't remove background RNA. But if it's just a time thing, you can use your CPUs with SoupX. I have another video on that. If you only have filtered counts, you are stuck with what you have!

  • @555gong9 2 months ago

    Thank you for such a great video. Which is better for removing doublets, doubletdetection or the previous scVI method?

    • @sanbomics 2 months ago

      I haven't done or seen a comparison between the two. The best would probably be to run both and see how they overlap. All I can say is that doubletdetection is easier and faster.

    • @555gong9 2 months ago

      Thank you for your advice, I will try it next, thank you very much, my superhero.

  • @abellopez8017 2 months ago +1

    Hello! Thanks for the Video, I will begin my PhD in Bioinformatics in August, what computer do you have?

    • @sanbomics 2 months ago

      Well... at home I have 32 vCPUs, 128 GB RAM, and an RTX 4090. At work I have 64 CPUs, 256 GB RAM, and an RTX 4090. Sometimes I have to use AWS when I need more than that. Depending on what you plan to do, it can vary a lot.

  • @frutitadelosmares 13 days ago

    Hi! Thanks so much for such a great tutorial!
    I have a naive question from someone who just started in this world: when raw data is not available, for example when you can only download normalised, filtered values, do you skip the pre-processing step? Is it correct to pre-process normalised values, let's say TMM?
    Again, thanks so much for all the videos!

    • @sanbomics 7 days ago

      Yeah if there are no raw counts then you will have to skip the ambient removal. Unfortunately, this is the only way sometimes.

  • @caspase888 22 days ago

    Your videos are amazing. Thanks a lot.
    Could I use 3050 with 64 GB RAM for this kind of analysis?
    Thanks a lot.

    • @sanbomics 7 days ago

      You can do a decent number of cells with 64 GB RAM. I would think you could handle ~200k in memory at the same time without too many issues. Some steps/algorithms use a lot more memory, though, so it is highly dependent on what you do. In my experience 64 GB won't be enough for large datasets/atlases, but you can definitely do small numbers of samples.
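
How far 64 GB goes depends largely on how the count matrix is stored. The following is a rough back-of-the-envelope sketch, assuming 30,000 genes, 4-byte values and indices, and 5% non-zero entries. All three figures and both function names are illustrative assumptions, not numbers from the reply above.

```python
def dense_gb(n_cells, n_genes, bytes_per_value=4):
    """Memory for a dense cells x genes matrix, in GB (1e9 bytes)."""
    return n_cells * n_genes * bytes_per_value / 1e9


def sparse_csr_gb(n_cells, n_genes, density, bytes_per_value=4, bytes_per_index=4):
    """Approximate memory for a CSR sparse matrix, in GB (1e9 bytes)."""
    nnz = int(n_cells * n_genes * density)
    # Each non-zero stores a value and a column index; add one row pointer per cell.
    total = nnz * (bytes_per_value + bytes_per_index) + (n_cells + 1) * bytes_per_index
    return total / 1e9


# Assumed dataset shape: 200k cells, 30k genes, 5% of entries non-zero.
dense = dense_gb(200_000, 30_000)
sparse = sparse_csr_gb(200_000, 30_000, density=0.05)
print(f"dense: {dense:.1f} GB, sparse: {sparse:.1f} GB")
```

This covers only a single copy of the counts. Steps that densify the matrix or keep several layers in memory can multiply the footprint, which fits the caveat in the reply that some algorithms use far more memory.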

  • @AP-vo7gp 2 months ago

    Sir, I have a count matrix and want to generate an annotation matrix from it, then do the batch correction and then DGA. Please help with the process, as I am not getting suitable results.

    • @sanbomics 1 month ago +1

      Hi, it is hard for me to help without knowing more specifics and what issue you are having.

    • @AP-vo7gp 1 month ago

      @sanbomics thanks a lot, sir, I was able to do it :)

  • @pinchos90 2 months ago

    Are you still going to develop workflows for R, or are you sticking with Python?

    • @sanbomics 1 month ago

      I prefer Python, but even this tutorial series will have some R in it because it is unavoidable. So I will have more R videos in the future.

  • @ghujka 1 month ago

    Have a beer on me bro🍺

    • @sanbomics 1 month ago

      Thank you!!! I can do that ;)

  • @charlieintampa6769 2 months ago

    F%(k. Seems super useful but you could have been speaking any random language and I would have understood about the same.