RNAseq tutorial - part 4 - Differential expression analysis with Deseq2

Поделиться
HTML-код
  • Опубликовано: 8 ноя 2024

Комментарии • 76

  • @khawa
    @khawa Год назад +3

    Thank you!!! This is the best DESeq2 tutorial so far. It's easy to follow and every step makes sense. I am sure many others are benefiting and will benefit from you! I hope you are having a wonderful day or night wherever you are! Thanks a lot!!!

    • @sanbomics
      @sanbomics  Год назад

      That is great to hear! Thank you for your kind words! :)

  • @SlugmaB
    @SlugmaB Год назад +1

    Thanks mate, all the tutorials in this series have been top notch. They go at a great pace and I appreciate that you explain pretty much everything you're doing

  • @joyhoskeri2130
    @joyhoskeri2130 2 месяца назад

    I found your video as a diamond in the pile of sand.............. superb bro... Thanks a lot.

  • @chuz7487
    @chuz7487 4 дня назад

    Thanks!

  • @florawang4684
    @florawang4684 2 года назад +1

    Hi, Nice tutorial!!! Thank you so much. May I ask how to compare 3 or more groups with different sample sizes?

    • @sanbomics
      @sanbomics  2 года назад

      Good question. Sample size shouldn't matter as long as each is at least n = 3. For three groups the easiest way is to do a 2 v 2 v 2 analysis. So each group is compared against each of the other groups. In the deseq function you can specify which groups to compare. Alternatively, if you are newer to R and using Deseq you can just load in only two of the groups at a time.

    • @sanbomics
      @sanbomics  2 года назад

      to clarify. Given groups a,b,c. Do a vs b + a vs c + b vs c

    • @excelobiageli9446
      @excelobiageli9446 2 года назад

      I wanted to ask the same question. Please after comparing, how can I combine the results? I want to use the results to run co-expression analysis

  • @DebC_UK
    @DebC_UK 2 года назад +1

    Greetings, I greatly enjoyed and appreciated this series of 4 videos. I am definitely your follower now.
    I would like to know what exactly is the purpose of creating the vsdata transformed dataset.
    More important query: When creating the results table, why did you do it using the dds dataset instead of the vsdata dataset?? Just asking what is the logic behind it.

    • @sanbomics
      @sanbomics  2 года назад +2

      Thank you! Good question. I just use vst to get a normalized matrix to run the PCA. This step is not required for the normal deseq workflow, just for visualization. You could remove the vsdata and pca and still have the results at the end. dds is still the main deseq object. Hope this answered your question!

  • @freezingtolerance7493
    @freezingtolerance7493 Год назад

    Hi. Sir.
    Thank you for your video.
    Just quick question. Once we run DeSeq(dds) function, the generated results are based on normalized data? after you run "res = results(dds, contrast = c("condition", "S", "C"), you ve got 7 columns including log2foldchange. this log2foldchange is calculated based on normalized data? of course, we can get normalized data using estimaterSizeFactors(dds) followed by counts(dds,normalized=T). But, before this code, we just run dds function and then extract result.

  • @ashwaqkhaled7530
    @ashwaqkhaled7530 2 года назад

    Thank you Mark for your all informative videos. Is there a way to produce RPKM/FPKM and TPM values from DESeq2 library and what’s the easiest way to obtain gene length?

    • @sanbomics
      @sanbomics  2 года назад

      Thank you! You can set the mcols(dds)$basepairs column to the size of the genes using the GenomicRanges packages, then you can call the deseq fpkm function directly. If you can't figure it out, let me know and maybe I can include it in a short upcoming video

    • @ashwaqkhaled7530
      @ashwaqkhaled7530 2 года назад

      @@sanbomics I appreciate your response. I would love to share it. I tried your suggestion but I could not figure it out. Thank you very much.

    • @sanbomics
      @sanbomics  2 года назад

      Ok. I will do this in an upcoming video!

    • @ashwaqkhaled7530
      @ashwaqkhaled7530 2 года назад

      @@sanbomics Thank you so much! I truly appreciate it.

  • @ZahidHussain-xb8it
    @ZahidHussain-xb8it 2 года назад +1

    Hi, informative video. I want to know, how to deal with non-intergers data having decimal. The data matrix function doesn't work on such data set

    • @sanbomics
      @sanbomics  2 года назад

      Hi, I'm not sure I am exactly sure what you are asking. Which step are you referring to?

    • @florawang4684
      @florawang4684 2 года назад

      I tried going back to .csv data and round the numbers

    • @ZahidHussain-xb8it
      @ZahidHussain-xb8it 2 года назад

      @@florawang4684 Thank you 😊

    • @ragnulf_gamer3278
      @ragnulf_gamer3278 2 года назад

      is your data normalized? If not, the data won't be decimals. As far as I know

  • @anandhakumarchandran325
    @anandhakumarchandran325 2 года назад

    Hi thanks for the useful tutorial, how do we convert results (differential table) in to dds (DESeq output)? In a way we can apply the padj cut-off in the res -> dds -> vsdata. Or is there any other way to get padj cut-off applied dds? Thank you

    • @sanbomics
      @sanbomics  2 года назад +1

      Hi, you can take the rownames from the significant values from result and filter the objects by it

  • @diyabhattacharya1333
    @diyabhattacharya1333 2 года назад

    Hi, thanks for this informative video on DESeq2. I have been stuck for a while with input data matrix before running DESeq2. I can see that my Gene identifier column automatically becomes the first column when I am arranging the condition and coldata (column data of my htseq readcounts) into matrix format. Can you suggest me how do I fix it? Thanks!

    • @sanbomics
      @sanbomics  2 года назад

      Hi, no problem! I am a little confused at what the issue is you are having. When you make the coldata DF it has the gene ids in it?

  • @julieapostolou6309
    @julieapostolou6309 Год назад

    Hi there!
    That's really the best Deseq2 tutorial I have seen so far, thank you very much!!
    I have one question: I ran the first command that includes the header and row.names (row.names =1) but I get the following error message:
    "Error in read.table(file = file, header = header, sep = sep, quote = quote, :
    duplicate 'row.names' are not allowed"
    I read a lot of sites that suggest to null the row.names but that is not a good idea for my data.
    Have you ever encountered this error? Do you have any recommendations?
    Thanks in advance!

    • @sanbomics
      @sanbomics  Год назад +1

      You can try adding a tag like _2 to each of the duplicate ones and see if that fixes it

  • @hatchet646
    @hatchet646 Год назад

    Hi. I am really thankful for your videos. Atm i am in a pickle. I looked up results() function man page since i am a bit confused about this "contrast" argument. The confusion comes from the fact that i have 3 types of samples not just "s" and "c". Either the "contrast" has 1 vector with exactly 3 elements like in the video, (and here comes the confusion): or 2 vectors with names of the fold changes for the numerator, and names of the fold changes for the denominator. What are these? The 3rd option that contrast can contain is "a numeric contrast vector with one element for each element in resultsNames(object) (most general case". Should i use the 2nd or the 3rd option? and what these numerators and denominators mean here? Thank you really.

    • @sanbomics
      @sanbomics  Год назад

      I wouldn't worry about the third option and just use the variable names instead. You can only do pairwise comparisons. Contrast will always have only 3 elements (column, variable1, variable2). If you want to compare a third variable against variable1 it would be (column, variable1, variable3). Hope this helps!

    • @hatchet646
      @hatchet646 Год назад

      @@sanbomics Thanks! And have a great holiday!

  • @munibabashir6951
    @munibabashir6951 2 года назад

    Informative video. Thanks
    I have a query regarding data analysis if you could please help me in that. I have a data set for tumors that I downloaded from cancer data portal so now I have gene expression data and clinical data for both tumors. I want to compare the gene expression of both tumors but I am no getting from where I should start, how can I compare these tumors by using DESeq2. Please guide me. Thank you

    • @sanbomics
      @sanbomics  2 года назад

      Hi, thanks! You need to have a count matrix for deseq2. A table where the columns are samples, the rows are genes, and the values are read counts. How many tumors do you have? You will need at least 3 both for any meaningful analysis. In actuality, since you are using human (i.e., non-model organisms) you should have many more than 3 in each group. Let me know if you have any other questions!

  • @adekunleajiboye1244
    @adekunleajiboye1244 Год назад

    Thanks for this video. Please how can I reach out to you?

  • @sanjaisrao484
    @sanjaisrao484 2 года назад +1

    Please tell how to collect sample data for GSE99816 , how to know which sample is normal / diseased , Please help me sir.( i used geoquery but it didn't contain this information) please help

    • @sanbomics
      @sanbomics  2 года назад

      www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99816
      Let me know if this has what you need!

  • @adampassman
    @adampassman 2 года назад

    Could you further clarify with regard to how you would pick a threshold for row sums. Not sure what you meant by "filter their end result by their mean".

    • @sanbomics
      @sanbomics  2 года назад +1

      Sure: First you should filter out all rows with 0 counts. Then later after DE analysis you can filter out rows with low counts, e.g., 50. Because things with low counts are inherently more noisy. For example, some samples might have 1 count for a specific gene and another sample has 16. Sure that is 16 times higher, but likely does not reflect an actual 16 fold difference in gene expression. Most likely they are both lowly expressed. The actual threshold is a little bit arbitrary, but 50 as the mean expression across all genes is a good place to start.

    • @adampassman
      @adampassman 2 года назад +1

      @@sanbomics got it, thank you for taking the time to clarify for me! Guess its also dependent on type of material (probably expect less reads from 3' FFPE like I'm working with) and how many samples.

  • @ParthShah-hc8pw
    @ParthShah-hc8pw 5 месяцев назад

    hey mate
    Am getting this error
    followed all your steps except the filter one
    pls help
    I have 5 columns in my matrix (M10,M11,M12,M3 and M5) and EMBSEL gene ids to it
    The dds step is not working
    Error in checkForExperimentalReplicates(object, modelMatrix) :
    The design matrix has the same number of samples and coefficients to fit,
    so estimation of dispersion is not possible. Treating samples
    as replicates was deprecated in v1.20 and no longer

  • @apurvagopisetty1787
    @apurvagopisetty1787 Год назад

    Hi ! Thanks so much for the EXTREMELY helpful video ! Im running into an issue, could you please help me out here? Thanks!
    On running the code to check dispersion ==>> plotDispEsts(dds)
    I get the following error :
    Error in class(x)

    • @apurvagopisetty1787
      @apurvagopisetty1787 Год назад

      Before ^^this step i did run the code for deseq on my data like you did:
      dds < DESeq(dds)
      The console does show that its run entirely with the following message -
      .......fitting model and testing
      LogicalList of length 24397
      [["ENSMUSG00000102275.1"]] logical(0)
      [["ENSMUSG00000025903.14"]] logical(0)
      .
      .

  • @ragnulf_gamer3278
    @ragnulf_gamer3278 2 года назад +1

    Hello. I am getting an error. While running the DESeqDataSetFromMatrix function, an error pops up
    Error in DESeqDataSet(se, design = design, ignoreRank) :
    'design' should be a formula or a matrix
    can you tell me how to solve this issue? My dataset consists of 8 columns (4 cancer+ 4 normal samples).

    • @sanbomics
      @sanbomics  2 года назад +1

      Hi, often there is a typo or small differences that can cause this. Can you paste the command you ran?

    • @ragnulf_gamer3278
      @ragnulf_gamer3278 2 года назад

      @@sanbomics sure. here it is.
      des

    • @sanbomics
      @sanbomics  2 года назад +1

      if you change -de to ~de does the error go away?

    • @ragnulf_gamer3278
      @ragnulf_gamer3278 2 года назад

      @@sanbomics It worked! It's funny how I overlooked the typo. Thank you so much!

    • @ragnulf_gamer3278
      @ragnulf_gamer3278 2 года назад +1

      I'm also able to generate the PCA plot, with clear differences between the two clusters. Can you have a look at it and give me some more insights as to what additional info this provides? It would be so much appreciated.

  • @basicbioinformaticsmethods8646
    @basicbioinformaticsmethods8646 2 года назад

    Hi, Is it possible to do DEG analysis without replicates? I have 6 RNA sequences from different plant tissues that I want to compare

    • @sanbomics
      @sanbomics  2 года назад

      Hi, the short answer is: it really isn't possible. But if your data is completely exploratory, maybe you can help generate hypothesis based on some big differences you see. Which you will then test using a different method with more replicates.

    • @kitony
      @kitony Год назад

      @@sanbomics same way like in your nice tutorial?

  • @rushonline23
    @rushonline23 Год назад

    Hi, could you pls help me on how to filter out only the protein coding genes? Thankyou.

    • @sanbomics
      @sanbomics  Год назад

      There are multiple ways to do it, I am not sure what is the easiest way but this is what I would do: If you have a list of genes that are coding you can subset the DE results by that list. You can parse the gtf file if you need to make your own list. The gtf file should contain whether a gene is coding or not.

  • @mirij827
    @mirij827 2 года назад

    what If I have 3 conditions instead of 2? When I try to run res

    • @sanbomics
      @sanbomics  2 года назад

      You have to decide which to do pairwise comparisons between. Since you have three samples, you only have a maximum of three comparisons. So it is easy enough to just run the DE with the three possible comparisons. Which is assigned by the contrast argument, which should be passed the two group names.

    • @mirij827
      @mirij827 2 года назад +1

      @@sanbomics got it! Thanksb

  • @oksanaofeliahuertareyes4695
    @oksanaofeliahuertareyes4695 Год назад

    Hi, when use Counts 50),] I get this:
    Error in h(simpleError(msg, call)) :
    error in evaluating the argument 'x' in selecting a method for function 'which': 'x' must be numeric
    can you help me
    Thanks for the videos

    • @sanbomics
      @sanbomics  Год назад +1

      Do you have a value in the counts column that is not a number?

    • @oksanaofeliahuertareyes4695
      @oksanaofeliahuertareyes4695 Год назад

      ​@@sanbomics I fixed it. Yes, that was my problem. Thanks for the tutorial.

  • @raktimchowdhury7127
    @raktimchowdhury7127 2 года назад

    what is the function of row.names=1??

    • @sanbomics
      @sanbomics  2 года назад

      It's just specifying that the first column contains the row names

  • @chrisdoan3210
    @chrisdoan3210 2 года назад

    I followed along but the result I got was different. Could you explain what the possible reason is?

    • @sanbomics
      @sanbomics  2 года назад

      Hi, it's hard to say without seeing exactly what you did, or what you mean by different. You generated the counts table from the same fastq files, using the same reference, and using the same software/parameters that I did in the previous 3 videos?

    • @chrisdoan3210
      @chrisdoan3210 2 года назад

      @@sanbomics Hi Sanbomics, I use this reference: Homo_sapiens.GRCh38.107.gtf, from ensembl. I think I don't change anything that I know. Maybe I changed something that you didn't explicit mention.

    • @sanbomics
      @sanbomics  2 года назад

      I've uploaded the counts table I used to: github.com/mousepixels/sanbomics/blob/main/count_table_for_deseq_example.csv
      See if that turns out the same.

    • @chrisdoan3210
      @chrisdoan3210 2 года назад

      @@sanbomics The order of geneID I got was different from you, so it is a little difficult to compare 2 tables. Anyway, I think the data is the same, just the order is different.

    • @sanbomics
      @sanbomics  2 года назад

      Ahh makes sense. Glad you figured it out!

  • @MM-fj7ym
    @MM-fj7ym Год назад

    Help me
    My coade: dds

    • @sanbomics
      @sanbomics  Год назад

      It seems like there is something in your counts table that shouldn't be there. Try opening it in excel and taking a quick look at it for anything obvious

    • @chickenkorma3163
      @chickenkorma3163 Год назад

      ​​@@sanbomicsfeatureCounts has the options -M and --fraction. They allow fractional counting in case you can map the same reads to different locations in the genome. Each of the possible features gets a fraction of the count.
      Also thank you very much for the video it was a great help!

  • @OuanhPhomvisith
    @OuanhPhomvisith 11 месяцев назад

    Thank you very much for sharing the helpful vdo; however, this happened when I loaded my counts.
    counts

    • @sanbomics
      @sanbomics  10 месяцев назад

      You can read it as a csv, remove the relevant rows, then add the rownames based on that column