Pseudobulk single-cell analysis in Python with Scanpy and pyDeseq2

  • Published: 11 Jul 2023
  • It is now possible to do pseudobulk analysis directly in Python on your Scanpy object. I create the pseudobulk from single-cell data, then analyze it with the Python port of DESeq2.
    Notebook:
    github.com/mousepixels/sanbom...
  • Science

Comments • 29

  • @sanbomics  10 months ago +3

    Important typo in the code for making pseudo-replicates:
    you need to add [indices[i]]. It should read as follows:
    rep_adata = sc.AnnData(X = samp_cell_subset[indices[i]].X.sum(axis = 0),
                           var = samp_cell_subset[indices[i]].var[[]])
    Also, if you get an error about the shape, you will have to add .reshape(1, -1) to the end of sum(axis = 0).
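    A minimal sketch of the corrected pseudo-replicate step, assuming a dense (cells x genes) NumPy array of raw counts for one sample. The helper name `make_pseudo_replicates` and the random split via `np.array_split` are illustrative assumptions, not the notebook's exact code. It shows why `[indices[i]]` matters (each replicate sums only its own cells) and why a dense `.sum(axis=0)` needs `.reshape(1, -1)`:

```python
import numpy as np

def make_pseudo_replicates(counts, n_reps=3, seed=0):
    """Hypothetical helper: split one sample's cells into n_reps random groups
    and sum the raw counts within each group.

    counts: dense (n_cells, n_genes) array of raw counts (sparse input not handled).
    Returns an (n_reps, n_genes) array, one summed profile per pseudo-replicate.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the cell indices, then cut them into n_reps roughly equal chunks
    indices = np.array_split(rng.permutation(counts.shape[0]), n_reps)
    rows = []
    for i in range(n_reps):
        # Subset with indices[i] so each replicate only sums its own cells
        summed = counts[indices[i]].sum(axis=0)  # dense sum -> shape (n_genes,)
        rows.append(summed.reshape(1, -1))       # make it a single (1, n_genes) row
    return np.vstack(rows)

counts = np.arange(20).reshape(10, 2)  # toy data: 10 cells, 2 genes
reps = make_pseudo_replicates(counts, n_reps=3)
print(reps.shape)  # (3, 2)
```

    If `X` is a SciPy sparse matrix instead, `.sum(axis=0)` already returns a 1 x n_genes matrix, which is why the reshape is only needed in some cases.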

  • @Brickkzz  1 year ago +12

    Eternally grateful for this channel - the most useful resource on scRNAseq analysis in Python on the internet!

    • @sanbomics  11 months ago +4

      Thank you :) ... Born of my avoidance of R at all costs xD

  • @lly6115  1 year ago

    My gratitude. Thank you for your time.

  • @neishajmoments  3 months ago

    You are a lifesaver! 😊 Thanks

  • @jalv1499  9 months ago

    Thank you very much! This is very helpful! I have one question: can you clarify the difference between differential abundance analysis and this pseudobulk approach for studying the difference between two conditions?

    • @sanbomics  8 months ago

      They are similar, but pseudobulk looks at the summed expression of a population of cells, while other methods might look at the distribution of expression across all cells in a population. One issue with the latter, among others, is that the large sample size (every cell counted as an observation) inflates significance.
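      To make the distinction concrete, here is a small NumPy sketch of the pseudobulk step (the toy data and the function name `pseudobulk_by_sample` are illustrative): raw counts are summed per sample, so the unit of replication becomes the sample rather than the individual cell.

```python
import numpy as np

def pseudobulk_by_sample(counts, sample_labels):
    """Sum raw counts (cells x genes) over all cells that share a sample label."""
    samples, inverse = np.unique(np.asarray(sample_labels), return_inverse=True)
    summed = np.zeros((len(samples), counts.shape[1]), dtype=counts.dtype)
    # Add each cell's row into the row of the sample it came from
    np.add.at(summed, inverse.ravel(), counts)
    return samples, summed

counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4], [1, 1]])  # 5 cells, 2 genes
labels = ["s1", "s1", "s2", "s2", "s2"]
samples, pb = pseudobulk_by_sample(counts, labels)
# 5 cells become 2 observations: a downstream test sees n = 2 samples, not n = 5 cells
print(samples, pb.tolist())
```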

  • @sjorsmaassen3764  6 months ago +1

    Thanks a lot for the tutorial. You are really doing a great service for anyone who is trying to learn more about scRNA-seq analysis. I have a question that I hope someone here can answer:
    For making a pseudobulk, wouldn't it make more sense to take the mean of your counts instead of the sum? I would say the sum can be influenced by the total number of cells in a condition. So if by random chance you have outliers from a batch, or you simply have more of a certain cell type in your tissue (which I would imagine to be the case for macrophages during a COVID infection), this would influence your results.

    • @sanbomics  5 months ago

      Good question. Later, the counts are corrected by size factors, which account for differences due to the total number of cells.
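      As a sketch of that correction: DESeq2-style size factors use the median-of-ratios method (as far as I know this is also PyDESeq2's default). A sample built from roughly twice as many cells gets roughly twice the size factor, so dividing by it removes the cell-number effect. The function below is an illustrative reimplementation with made-up data, not PyDESeq2's own code.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """counts: (n_samples, n_genes) pseudobulk raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample (log(0) is undefined)
    expressed = np.all(counts > 0, axis=0)
    log_counts = np.log(counts[:, expressed])
    # Per-gene geometric mean across samples, on the log scale
    log_geo_means = log_counts.mean(axis=0)
    # Size factor = median ratio of a sample's counts to the gene-wise geometric means
    return np.exp(np.median(log_counts - log_geo_means, axis=1))

# Sample 2 is sample 1 "with twice as many cells": its counts are doubled
pb = np.array([[10, 20, 30, 0],
               [20, 40, 60, 5]])
sf = median_of_ratios_size_factors(pb)
normalized = pb / sf[:, None]  # after dividing, the shared genes match across samples
```

      Summing and then dividing by a size factor keeps the counts on the count scale that the negative binomial model expects, which a per-cell mean would not.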

  • @ZnaniumTV  4 months ago

    Thank you very much for this very helpful video. I have a question regarding batch correction before using DESeq2. I obtained 6 samples using hashing; however, they were sequenced in 2 lanes, leading to a significant batch effect that can be observed. Usually, this is corrected with integration methods in Scanpy or Seurat. However, if we pseudobulk based on our hashing and obtain the raw data needed for DESeq2, we lose this batch correction step. Would you have any ideas on how to address this? I've checked that some of the options are RUVSeq or SVA. Thank you very much.

    • @marwanmohamed3844  3 months ago

      I have a similar issue with batch effects in my libraries: if I use pseudobulk raw counts for DESeq2, I see a strong batch effect. Did you manage to solve this?
      Thanks, I would appreciate your advice on this.

  • @qhawenid  1 year ago +1

    Thanks much for such a concise and informative tutorial. One question. Is there a way to do pseudobulk DGE analysis between cell types? Thanks in advance.

    • @sanbomics  11 months ago +2

      You could just subset the cells by cell type, similarly to what we do here. You can pseudobulk any set of cells you can subset from your data. That said, cell type differences are usually so apparent that you don't really have to worry about pseudobulk. It may be useful if you are comparing cell type subpopulations.

    • @qhawenid  11 months ago

      @@sanbomics Thanks for the timely response. You're doing God's work!

    • @sanbomics  11 months ago +1

      Thanks :) You're too kind. It wasn't that timely xD

    • @stefisjustthebest  26 days ago

      Have you come across omicverse, which uses pydeg to compare two cell types, and do you think that's a valid way of doing it? I'm not sure they even aggregate the cells by sample of origin, but I would be interested to hear your thoughts!

  • @gracegregory4846  1 month ago

    Not sure if the DeseqDataSet parameters have changed since this tutorial, but I had to change clinical to metadata when running:
    from pydeseq2.dds import DeseqDataSet
    dds = DeseqDataSet(
        counts=counts,
        metadata=pb.obs,  # was clinical= in the version used in the video
        design_factors="tumour")

    • @sanbomics  1 month ago

      Yup, it's changed a lot. I'll be remaking it soon!
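      Since the DeseqDataSet arguments have shifted between versions, a quick pre-flight check on the two inputs can save some debugging. This is a hypothetical helper (`check_deseq_inputs` is not part of PyDESeq2); its checks are assumptions about what a DESeq2-style model needs (raw integer counts, one metadata row per sample in the same order, the design column present), not PyDESeq2's own validation code.

```python
import numpy as np
import pandas as pd

def check_deseq_inputs(counts, metadata, design_factor):
    """Hypothetical pre-flight check before building a DeseqDataSet.
    counts: samples x genes DataFrame; metadata: one row per sample."""
    problems = []
    if not counts.index.equals(metadata.index):
        problems.append("counts and metadata rows are not the same samples in the same order")
    values = counts.to_numpy()
    if (values < 0).any() or not np.array_equal(values, np.round(values)):
        problems.append("counts should be raw, non-negative integers (not normalized)")
    if design_factor not in metadata.columns:
        problems.append(f"design factor {design_factor!r} not found in metadata")
    return problems

# Toy pseudobulk: two samples, two genes
counts = pd.DataFrame({"GeneA": [12, 30], "GeneB": [0, 7]}, index=["s1", "s2"])
metadata = pd.DataFrame({"tumour": ["no", "yes"]}, index=["s1", "s2"])
print(check_deseq_inputs(counts, metadata, "tumour"))  # []
```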

  • @carlahamilcaro6457  2 months ago

    Hello, thank you so much. I was wondering: could I do differential expression analysis (control vs treatment) on all cell types at the same time?

    • @sanbomics  2 months ago

      I would put each cell type in a loop and do them separately but you can put all the results back together in the end. I'll have an example posted in the next couple of weeks.
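      The loop-then-combine pattern can be sketched with pandas as below. The column names `cell_type`, `sample` and `condition` are assumptions about your metadata, and `toy_log2fc` is a hypothetical stand-in for the real PyDESeq2 run (it only computes a log2 fold change of CPM means, with no dispersion model and no p-values). In practice you would replace it with the DESeq2 steps from the video.

```python
import numpy as np
import pandas as pd

def toy_log2fc(pb, conditions, ref, alt):
    """Hypothetical stand-in for a real DE test such as PyDESeq2: log2 fold change
    of counts-per-million means, with no dispersion model and no p-values."""
    cpm = pb.div(pb.sum(axis=1), axis=0) * 1e6
    mean_alt = cpm[conditions == alt].mean()
    mean_ref = cpm[conditions == ref].mean()
    return np.log2((mean_alt + 1) / (mean_ref + 1))

def de_per_cell_type(counts, obs, ref, alt):
    """counts: cells x genes DataFrame of raw counts; obs: per-cell metadata with
    'cell_type', 'sample' and 'condition' columns (same index as counts)."""
    results = []
    for ct in obs["cell_type"].unique():
        cells = obs[obs["cell_type"] == ct]
        # Pseudobulk: sum this cell type's raw counts within each sample
        pb = counts.loc[cells.index].groupby(cells["sample"]).sum()
        # One condition label per sample, aligned with the pseudobulk rows
        cond = cells.groupby("sample")["condition"].first().reindex(pb.index)
        lfc = toy_log2fc(pb, cond, ref, alt)
        results.append(pd.DataFrame({"cell_type": ct, "gene": lfc.index, "log2fc": lfc.to_numpy()}))
    # Stack the per-cell-type tables back into one results table
    return pd.concat(results, ignore_index=True)

# Toy data: 2 cell types x 4 samples, one cell per (cell type, sample)
obs = pd.DataFrame({"cell_type": ["T"] * 4 + ["B"] * 4,
                    "sample": ["s1", "s2", "s3", "s4"] * 2,
                    "condition": ["ctrl", "ctrl", "trt", "trt"] * 2},
                   index=[f"c{i}" for i in range(8)])
counts = pd.DataFrame({"g1": [10, 10, 40, 40, 5, 5, 5, 5],
                       "g2": [10, 10, 10, 10, 5, 5, 5, 5]}, index=obs.index)
res = de_per_cell_type(counts, obs, ref="ctrl", alt="trt")
```

      Running each cell type separately keeps the pseudobulk samples of different cell types from being pooled into one model, and the combined table can still be filtered or plotted as a whole.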

    • @carlahamilcaro6457  2 months ago

      @@sanbomics Oh, that is amazing, thank you so much! Another question: would it also be possible to do DE on 3 categories at the same time? Say I want control vs samples that responded to treatment vs samples that did not respond to treatment.
      Thank you for all the help!

  • @estebanelias6958  5 months ago

    Hi. First, thank you very much for these tutorials; they are very useful. I have 3 questions: 1. How can I check whether I saved my raw data after normalization? 2. Can pseudo-replicates be applied in an experiment with 2 conditions that contains pools of cells from 2-3 different samples? 3. How can differences in the number of cells in a cluster between the 2 conditions affect DGE results with this method? Thanks

    • @sanbomics  5 months ago

      1) Make sure to save the raw data in a layer before you normalize, or it won't be there. 2) Yes, this should be OK. 3) Theoretically, the counts are normalized by size factors, but if the numbers of cells are vastly different, some lowly expressed genes may show up in the larger population just because it's larger. It shouldn't affect the genes with higher expression.
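      For point 1, a rough way to check a matrix is to test whether it still looks like raw counts. The helper below is a hypothetical heuristic (non-negative whole numbers), not a Scanpy function; the Scanpy lines in the comments show the save-before-normalizing pattern and assume a layer named "counts".

```python
import numpy as np

# In Scanpy, before normalizing (layer name "counts" is an assumption):
#   adata.layers["counts"] = adata.X.copy()
#   sc.pp.normalize_total(adata)
# then check with: looks_like_raw_counts(adata.layers["counts"])

def looks_like_raw_counts(X):
    """Heuristic check: raw counts are non-negative whole numbers, while
    normalized or log-transformed values almost never all are.
    Assumes a dense array; call .toarray() first on a sparse matrix."""
    X = np.asarray(X)
    return bool((X >= 0).all() and np.array_equal(X, np.round(X)))

raw = np.array([[0, 3, 1], [5, 1, 2]])
print(looks_like_raw_counts(raw))            # True
print(looks_like_raw_counts(np.log1p(raw)))  # False
```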

  • @qhawenid  4 months ago

    How can I randomly partition samples (for a scRNA-seq dataset with one sample per condition) to obtain pseudo-replicate samples, and annotate these in the metadata of the main adata object? Or is there a way to map the newly generated pseudo-replicates back to the main adata object?

    • @leoburgy  3 months ago +1

      You can insert the partition (described in the video) as a column (e.g., "replicate") of the adata.obs dataframe (of the main adata).
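      A minimal sketch of that idea with pandas: each cell gets a random pseudo-replicate label within its own sample, and the result can be stored as a column such as `adata.obs["replicate"]`. The helper name, the `sample` column and the `"<sample>_rep<k>"` label format are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def assign_pseudo_replicates(obs, sample_col="sample", n_reps=3, seed=0):
    """Hypothetical helper: label each cell with a random pseudo-replicate of its own sample.
    obs: per-cell metadata (e.g. adata.obs) with a unique index."""
    rng = np.random.default_rng(seed)
    labels = pd.Series(index=obs.index, dtype=object)
    for sample, cell_ids in obs.groupby(sample_col).groups.items():
        # Shuffle this sample's cells, then deal them into n_reps roughly equal chunks
        shuffled = rng.permutation(np.asarray(cell_ids))
        for r, chunk in enumerate(np.array_split(shuffled, n_reps)):
            labels.loc[chunk] = f"{sample}_rep{r + 1}"
    return labels

# Toy metadata: sample "a" has 6 cells, sample "b" has 4
obs = pd.DataFrame({"sample": ["a"] * 6 + ["b"] * 4},
                   index=[f"cell{i}" for i in range(10)])
obs["replicate"] = assign_pseudo_replicates(obs, n_reps=2)
```

      Because the labels live in the main object's `obs`, you can pseudobulk by the `replicate` column later and still trace every pseudo-replicate back to its cells.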

    • @qhawenid  3 months ago

      @@leoburgy Thank you for this

  • @emilynwo4254  11 months ago

    Could you do a video on other RNA seq analysis such as SLAM-seq?

    • @sanbomics  11 months ago

      Sure, I'll keep that in mind for a future video