3 minute GSEA tutorial in R | RNAseq tutorials

  • Published: Aug 7, 2022
  • Complete gene set enrichment analysis (GSEA) R tutorial in 3 minutes. I show you which R packages to install, how to run them on your differential expression output, and how to plot the results.
    My example is DESeq2 output, but you can use this on any set of genes you can rank by LFC, p-value, etc. You can also use data from outside of R if you read it in from a CSV.
    Notebook:
    github.com/mousepixels/sanbom...
    More info and examples can be found:
    bioconductor.org/packages/dev...
  • Science
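For readers bringing in results from outside R, here is a minimal sketch of turning a CSV of DE results into the kind of input GSEA functions such as clusterProfiler's gseGO expect: a named numeric vector sorted in decreasing order. The column names (gene, log2FoldChange, stat) and all values are hypothetical stand-ins; swap in whatever ranking column your table actually has.

```r
# Toy DE table standing in for results exported from another tool.
# Column names (gene, log2FoldChange, stat) and all values are hypothetical.
de <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  log2FoldChange = c(1.8, -0.4, 0.9, -2.1),
  stat = c(6.2, -1.1, 3.0, -7.5)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(de, csv_path, row.names = FALSE)

# Read the CSV back in, as you would with results produced outside of R
res <- read.csv(csv_path)

# Named numeric vector: values are the ranking metric, names are gene IDs
gene_list <- setNames(res$stat, res$gene)

# Sort from most positive to most negative, the order GSEA tools expect
gene_list <- sort(gene_list, decreasing = TRUE)
print(gene_list)
```

The names matter as much as the values: they must be the same ID type (e.g. ENSEMBL or Entrez) that you tell the GSEA function to use.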

Comments • 47

  • @aldaszarnauskas27
    @aldaszarnauskas27 1 year ago +5

    Hey man, you are so effective! Everything was straightforward, spot on, no wasted time. This is usually what people need: just a quick tutorial without extra info!
    Thanks!

  • @xelaldaero9339
    @xelaldaero9339 1 year ago

    Thanks Man! I need a full analysis of DESeq and this

  • @lst595991
    @lst595991 1 year ago

    Thanks for your tutorial!

  • @mocabeentrill
    @mocabeentrill 1 year ago +1

    Bro, where were you a month ago? I struggled and struggled until I figured it out. But big thanks anyway.

    • @sanbomics
      @sanbomics  1 year ago

      Wish I did it sooner for you :(. Did you end up using clusterProfiler, or something else?

    • @mocabeentrill
      @mocabeentrill 1 year ago +1

      @@sanbomics 😅😅😅. I used clusterProfiler. Now I'm busy with WGCNA. Can you believe it, just 6 months ago I didn't even know R syntax. I just wanna get dangerous enough in R, then I'm learning Python just like you. You're a huge inspiration🙌🏿🙌🏿🙌🏿.

    • @sanbomics
      @sanbomics  1 year ago +1

      It's surprising how much you learn just by struggling through things in the beginning. That is definitely the way to do it IMO. Enough R to be proficient then learn python to be more future-proof in this age of machine learning. Thank you for the kind words!

  • @mariannebest6796
    @mariannebest6796 1 year ago +1

    Hi, thank you so much for this video! I am fairly new to R, so sorry if this is a dumb question, but you put geneSetID = 1 for the nuclear division example. To plot a specific pathway, would I look up its description and then use its ID, e.g. GO:0038065, and code it like so: gseaplot(gse, geneSetID = "GO:0038065")?
    I have tried the above as I expect this pathway to be highly enriched in the downregulated genes - from the genes that were flagged in DEG analysis, however although the enrichment signature is the correct way round indicating it is enriched in the downregulated genes, there are barely any black lines scored for the genes... Just wondered if you had any insight for this? Thanks so much!

  • @ahmedal-mammari9639
    @ahmedal-mammari9639 1 year ago

    you are great

  • @MrFluffster101
    @MrFluffster101 1 year ago +2

    Thanks for the video! Why did you extract "stat" for your genelist, rather than lfcSE or padj?

    • @sanbomics
      @sanbomics  1 year ago +3

      Good question! Any of these would work. Stat takes into account the difference as well as the error. Which statistic to use is somewhat arbitrary and can always be debated. Sometimes I use lfc * -log10(p). For DESeq2 output, stat seems to work pretty well.
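The alternative metric mentioned in the reply, lfc * -log10(p), can be sketched in a few lines. This is one plausible way to compute it, not the author's exact code; the gene names and values are hypothetical. The fold change supplies the direction and the p-value supplies the weight, so strongly changed, confidently measured genes land at both ends of the list.

```r
# Hypothetical DE results; values are made up for illustration
res <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  log2FoldChange = c(2.0, -1.0, 0.5, -3.0),
  pvalue = c(1e-4, 1e-2, 0.5, 1e-6)
)

# Guard against p-values of exactly 0, which would give -log10(0) = Inf
p_safe <- pmax(res$pvalue, .Machine$double.xmin)

# Signed metric: direction from the fold change, weight from the p-value
metric <- setNames(res$log2FoldChange * -log10(p_safe), res$gene)
ranked <- sort(metric, decreasing = TRUE)
print(ranked)
```

Here GENE_C, with a large p-value, ends up near zero in the middle of the list even though its fold change is positive.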

  • @cleo4325
    @cleo4325 4 months ago

    Thanks for the video! Is there a way to filter for protein-coding genes only during GSEA? (In my case, I used enrichGO.)
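One way to approach this question, sketched under assumptions: filter the gene list before the enrichment call using a biotype annotation. The annotation table, its biotype column, and the placeholder IDs below are hypothetical; in practice the biotype could come from an annotation source such as Ensembl. For enrichGO, it would also make sense to apply the same filter to the background you pass as universe, so the test compares like with like.

```r
# Hypothetical ranked list (names are placeholder gene IDs)
gene_list <- c(ENSG_A = 5.1, ENSG_B = 2.3, ENSG_C = -0.7, ENSG_D = -4.0)

# Hypothetical annotation table mapping each gene to a biotype
annot <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  biotype = c("protein_coding", "lncRNA", "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

coding_ids <- annot$gene_id[annot$biotype == "protein_coding"]

# Subsetting keeps the existing order, so the list stays sorted
coding_list <- gene_list[names(gene_list) %in% coding_ids]
print(coding_list)
```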

  • @niharikasingh7677
    @niharikasingh7677 1 year ago

    This is precisely the kind of content I'm looking for while performing bioinformatics analysis. Thank you so much! Just a quick query, what exactly does the stat parameter signify? It isn't in any way a misrepresentation of our DEGs, right?

    • @sanbomics
      @sanbomics  1 year ago

      I don't remember off the top of my head exactly how it is calculated, but it takes into account the magnitude of the change as well as its standard error. It will be highly correlated with lfc, and abs(stat) with the p-value. Your DE genes will have a higher abs(stat). But for GSEA it is just used for ranking, and GSEA is independent of whether a gene is a "significant" DE gene. The metric to use for ranking is still debated, but they never really differ that much.
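For DESeq2's default Wald test (no lfcThreshold), the stat column is the log2 fold change divided by its standard error (lfcSE), and the p-value comes from comparing stat to a standard normal distribution. The sketch below reproduces that relationship on made-up numbers rather than calling DESeq2, to show why abs(stat) tracks the p-value while the sign tracks the fold change.

```r
# Hypothetical log2 fold changes and their standard errors
lfc    <- c(GENE_A = 2.0, GENE_B = -0.5, GENE_C = 1.0, GENE_D = -3.0)
lfc_se <- c(GENE_A = 0.5, GENE_B = 0.5,  GENE_C = 1.0, GENE_D = 0.6)

# Wald statistic: effect size divided by its standard error (signed)
stat <- lfc / lfc_se

# Two-sided p-value from a standard normal reference distribution
pval <- 2 * pnorm(-abs(stat))

# For GSEA only the ordering of stat matters, not whether a gene is "significant"
ranked <- sort(stat, decreasing = TRUE)
print(ranked)
print(pval)
```

GENE_D has the largest abs(stat) and therefore the smallest p-value, yet it sits at the bottom of the ranked list because its change is negative.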

  • @giuconv7832
    @giuconv7832 9 months ago

    Hi! Thanks so much for this tutorial, it was extremely useful. However, while running gseGO, I get an error: could not find function "fgseaMultilevelCpp". Any suggestion?

  • @13attles
    @13attles 1 year ago

    Hey Sanbomics, great video! With this method, is it possible to plug in Hallmark gene sets from MSigDB? I'm not sure where I would plug those in.

    • @sanbomics
      @sanbomics  1 year ago +2

      You can for sure add your own gene sets. I don't know off the top of my head, but it is probably in the documentation. I think I cover it in my Python-based GSEA video if you are interested.
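MSigDB distributes its collections, including Hallmark, as GMT files, and clusterProfiler's GSEA() function accepts custom gene sets through a TERM2GENE data frame (term in the first column, gene in the second). A minimal base-R sketch of turning GMT lines into that two-column table follows; the two gene sets and their genes are hypothetical placeholders, and the final clusterProfiler call is only shown as a comment.

```r
# Two hypothetical gene sets in GMT layout: set name, description, then genes
gmt_lines <- c(
  "SET_ALPHA\tna\tGENE_A\tGENE_C",
  "SET_BETA\tna\tGENE_B\tGENE_D\tGENE_E"
)
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Split each line on tabs and expand it to one row per (term, gene) pair
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
term2gene <- do.call(rbind, lapply(fields, function(f) {
  data.frame(term = f[1], gene = f[-(1:2)], stringsAsFactors = FALSE)
}))
print(term2gene)

# With clusterProfiler installed, this table could then be passed as, e.g.:
# GSEA(gene_list, TERM2GENE = term2gene)
```

The gene IDs in the GMT file need to match the names of your ranked list; MSigDB sets usually use gene symbols, so an ENSEMBL-keyed list would need converting first.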

  • @kyungwonmin7217
    @kyungwonmin7217 1 year ago +1

    Hi. I am studying a non-model organism. I have done functional annotation, so I have gene names, RefSeq or GO IDs along with the DESeq2 data.
    With these data, can I do GSEA? You are using an org database to run GSEA.

    • @sanbomics
      @sanbomics  1 year ago +1

      It depends on whether your organism has annotated gene sets. If it doesn't, you could make up your own to see if they are enriched, for example by creating homolog gene sets from a model organism. And you will probably have to use gene symbols instead of ENSEMBL IDs.
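The homolog idea in this reply can be sketched as a simple lookup: translate each model-organism gene set into your organism's IDs through an ortholog table, drop genes without an ortholog, and discard sets that shrink too much. The gene sets, the ortholog table, and the minimum-size cutoff below are all hypothetical; real ortholog calls would come from whatever orthology resource you trust.

```r
# Hypothetical gene sets defined in a model organism
model_sets <- list(
  SET_ALPHA = c("Gene1", "Gene2", "Gene3"),
  SET_BETA  = c("Gene4", "Gene5")
)

# Hypothetical one-to-one ortholog table: model-organism gene -> your gene ID
orthologs <- data.frame(
  model_gene = c("Gene1", "Gene2", "Gene4", "Gene5"),
  my_gene    = c("g_001", "g_002", "g_004", "g_005"),
  stringsAsFactors = FALSE
)

# Translate each set; genes without an ortholog (here Gene3) are dropped
mapped_sets <- lapply(model_sets, function(genes) {
  ids <- orthologs$my_gene[match(genes, orthologs$model_gene)]
  unique(ids[!is.na(ids)])
})

# Drop sets that became too small after mapping (cutoff is arbitrary)
min_size <- 2
mapped_sets <- mapped_sets[lengths(mapped_sets) >= min_size]
print(mapped_sets)
```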

  • @violetaduranlaforet5520
    @violetaduranlaforet5520 11 months ago

    Awesome video! Can you do this with a custom set of genes?

    • @sanbomics
      @sanbomics  11 months ago

      Yup! I don't remember the arguments for it off the top of my head, but it should be relatively straightforward. If you can't figure it out here, I think I show how to do a custom set in my Python GSEA video.
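To see what a GSEA tool actually computes for a custom gene set, here is a hand-rolled sketch of the classic weighted running-sum enrichment score (weights are |metric|, exponent 1). It is a teaching sketch, not what clusterProfiler runs internally (that work is delegated to fgsea by default), and it computes only the score, with no permutation-based p-value. The ranked list and set names are hypothetical.

```r
# Running-sum enrichment score for one gene set, weighting hits by |metric|
# (the weighted form with exponent 1). No permutation test is done here.
enrichment_score <- function(ranked, gene_set) {
  stopifnot(!is.unsorted(rev(ranked)))  # expects decreasing order
  in_set <- names(ranked) %in% gene_set
  n_hit <- sum(in_set)
  n_miss <- length(ranked) - n_hit
  if (n_hit == 0 || n_miss == 0) stop("gene set must partially overlap the list")

  hit_weight <- abs(ranked) * in_set
  # Step up at each hit (proportional to its weight), down at each miss
  running <- cumsum(hit_weight / sum(hit_weight) - (!in_set) / n_miss)
  # ES is the running sum's largest deviation from zero
  unname(running[which.max(abs(running))])
}

ranked <- c(G1 = 3, G2 = 2, G3 = 1, G4 = -1, G5 = -2, G6 = -3)
es_top <- enrichment_score(ranked, c("G1", "G2"))     # set at the top: positive ES
es_bottom <- enrichment_score(ranked, c("G5", "G6"))  # set at the bottom: negative ES
print(c(es_top, es_bottom))
```

The running sum is the curve drawn in a gseaplot, and the black tick marks in that plot are the positions where in_set is TRUE.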

  • @gab4434
    @gab4434 1 year ago

    Thank you so much! I was wondering if you have the code posted on your GitHub; I couldn't find it :(

    • @sanbomics
      @sanbomics  1 year ago

      Sorry! I didn't end up posting it because it was just a few lines of code. Here it is: github.com/mousepixels/sanbomics_scripts/blob/main/3_min_GSEA_tutorial.Rmd

  • @yijingwang7308
    @yijingwang7308 1 year ago

    Thank you so much! But I have two questions. First, why did you keep only genes with baseMean > 50? Second, you used all genes, not just the differentially expressed genes, for GSEA, right?

    • @sanbomics
      @sanbomics  1 year ago +1

      Genes that are very lowly expressed are noisy. That's why it is good to filter them out. 50 is arbitrary; e.g., 100 could work as well. Yup! That's what I do in the video. It's better to keep all genes (except ones with very low expression).

    • @chrislee8408
      @chrislee8408 1 year ago

      So is the purpose of doing a DEG in your video just to filter out lowly expressed genes? But you're actually using all genes (except lowly expressed ones) in the GSEA?

    • @elhombreloco3680
      @elhombreloco3680 1 year ago

      @@chrislee8408 He did DEGs in the video just because it's faster for the demo
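The filtering described in this thread can be sketched as follows: keep every gene above an arbitrary baseMean cutoff, drop genes whose ranking metric is NA, and do not filter on padj. The results table is a hypothetical stand-in for DESeq2 output, with made-up values.

```r
# Hypothetical DESeq2-style results table (values made up)
res <- data.frame(
  gene     = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  baseMean = c(1200, 35, 80, 510, 60),
  stat     = c(4.1, 9.0, -0.3, -5.2, NA),
  padj     = c(0.001, 0.0001, 0.9, 0.0005, NA)
)

min_base_mean <- 50  # arbitrary expression cutoff, as discussed in the thread

# Keep every sufficiently expressed gene with a usable metric; padj is NOT used
keep <- res$baseMean > min_base_mean & !is.na(res$stat)
kept <- res[keep, ]

gene_list <- sort(setNames(kept$stat, kept$gene), decreasing = TRUE)
print(gene_list)
```

GENE_B is dropped despite a tiny padj because it is lowly expressed, while GENE_C stays in even though it is not significant: GSEA needs the whole ranked list, not just the DE hits.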

  • @chrislee8408
    @chrislee8408 1 year ago +1

    is it possible to do a gene set enrichment analysis without doing a DEG? In my lab, we have just started doing NGS and we are still setting up our QCs. What we have in mind right now for one of our QC is to see if we can guess which sample (there are 4 samples) came from which tissues (heart, liver, kidney, diluent) by doing a gene set enrichment analysis to see if we can identify overexpressed genes which may only be expressed in specific tissues. Do you think it's feasible? Thank you.

    • @sanbomics
      @sanbomics  1 year ago +1

      Yes, you just can't use this method. You can do overrepresentation analysis, which just requires a list of genes. Look at one of my GO enrichment videos: ruclips.net/video/JPwdqdo_tRg/видео.html

    • @chrislee8408
      @chrislee8408 1 year ago

      @@sanbomics I see what you're saying. I am currently stuck on how to group my samples for the DEG analysis. I know DEG analysis mostly compares, for example, "treatment" vs. "control" samples, usually with 3 or more replicates per group. However, suppose we are "blinded" to which sample came from which tissue and which sample is the control, and we're not just comparing treatment vs. control but 3 different tissue types that received the same treatment vs. one control sample. Our goal is to determine which sample came from which tissue based on the genes overexpressed in each sample. Do you have any idea how to group or design a DEG analysis so that we can take the differentially expressed genes from each sample and do an overrepresentation analysis (I'm guessing you said GSEA isn't the best method for this goal) to figure out which sample came from which tissue? Thank you.

    • @sanbomics
      @sanbomics  1 year ago

      I think I'm still a little confused about your ultimate goal... Maybe you could filter for the DE genes common to one cell type? e.g., the common upregulated genes in muscle vs liver, muscle vs kidney, and muscle vs adipose.
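The two ideas in this thread, intersecting upregulated genes across pairwise comparisons and then running an overrepresentation test, can be sketched with base R. The gene lists, the universe, and the "marker set" below are all hypothetical. The test is the standard one-sided hypergeometric test: the probability of seeing at least the observed overlap if the query genes were drawn at random from the universe.

```r
# Hypothetical upregulated genes from three pairwise comparisons
up_vs_liver   <- c("G1", "G2", "G3", "G4")
up_vs_kidney  <- c("G2", "G3", "G4", "G7")
up_vs_adipose <- c("G3", "G4", "G8")

# Genes upregulated in every comparison
common_up <- Reduce(intersect, list(up_vs_liver, up_vs_kidney, up_vs_adipose))

# Overrepresentation test against a hypothetical marker set
universe   <- paste0("G", 1:20)          # all genes tested
marker_set <- c("G3", "G4", "G5", "G6")  # hypothetical tissue marker genes

n_universe <- length(universe)
n_set      <- length(intersect(marker_set, universe))
n_query    <- length(common_up)
n_overlap  <- length(intersect(common_up, marker_set))

# P(overlap >= observed) under random draws without replacement
p_value <- phyper(n_overlap - 1, n_set, n_universe - n_set, n_query,
                  lower.tail = FALSE)
print(common_up)
print(p_value)
```

Tools like enrichGO run this same kind of test across many gene sets and then adjust the p-values for multiple testing.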

  • @joeyoviedo5202
    @joeyoviedo5202 9 months ago

    Hi, could you do a video using just normalized counts to make a ranked list on which GSEA could then be run? Thank you!

    • @sanbomics
      @sanbomics  8 months ago

      It should be pretty straightforward to rank a list based on this. Were you able to figure it out?
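A minimal sketch of ranking genes straight from normalized counts (for example, the matrix you would get from DESeq2's counts(dds, normalized = TRUE)): compute a log2 ratio of group means with a small pseudocount and sort. The count matrix, sample names, group labels, and pseudocount of 1 are hypothetical choices. Unlike DESeq2's stat, this ratio ignores within-group variance; a variance-aware option is the signal-to-noise metric that the Broad GSEA software uses by default.

```r
# Hypothetical normalized counts: 3 genes x 6 samples (3 treated, 3 control)
norm_counts <- matrix(
  c(100, 120, 110,  20,  25,  30,
     50,  55,  45,  50,  60,  40,
     10,  12,   8,  80,  90, 100),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                  c("T1", "T2", "T3", "C1", "C2", "C3"))
)
group <- c("T", "T", "T", "C", "C", "C")

pseudo <- 1  # avoids division by zero and log of zero for unexpressed genes
mean_t <- rowMeans(norm_counts[, group == "T"])
mean_c <- rowMeans(norm_counts[, group == "C"])

# Signed ranking metric: log2 ratio of treated to control group means
log2_ratio <- log2((mean_t + pseudo) / (mean_c + pseudo))
ranked <- sort(log2_ratio, decreasing = TRUE)
print(ranked)
```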

  • @frankchen1845
    @frankchen1845 1 year ago +1

    Thanks for the video; the Bioconductor page ticked me off with its overly complicated tutorial with zero explanations.

    • @sanbomics
      @sanbomics  1 year ago

      No problem! Happens for a lot of things unfortunately

  • @0916079787
    @0916079787 1 year ago

    2:45
    How did you flip the comparison at this point? What exactly did you do?

    • @sanbomics
      @sanbomics  1 year ago +1

      In DESeq2 I just changed the order of the contrast: c("condition", "C", "S") to c("condition", "S", "C").

    • @0916079787
      @0916079787 1 year ago

      @@sanbomics Thank you a lot. I have finished this RNA-seq analysis and it was very useful. Keep it up.
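Why swapping the contrast "flips" the GSEA result: with DESeq2's Wald test, exchanging numerator and denominator in the contrast negates the log2 fold change and the stat column, so the ranked list comes out in reverse order and enriched sets move to the opposite end (with the opposite NES sign). The sketch below simulates that by negating hypothetical statistics rather than rerunning DESeq2.

```r
# Hypothetical Wald statistics from results(dds, contrast = c("condition", "C", "S"))
stat_c_vs_s <- c(GENE_A = 3.2, GENE_B = -1.4, GENE_C = 0.6)

# Swapping the contrast to c("condition", "S", "C") flips the sign of each statistic
stat_s_vs_c <- -stat_c_vs_s

ranked_cs <- sort(stat_c_vs_s, decreasing = TRUE)
ranked_sc <- sort(stat_s_vs_c, decreasing = TRUE)

# The ranked list is simply reversed
print(names(ranked_cs))
print(names(ranked_sc))
```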

  • @meetpanjwani2752
    @meetpanjwani2752 1 year ago

    Hello, thank you for the insights. It has been really helpful. I keep getting this error when I run the code. I use Entrez IDs. Do you have any idea why?
    preparing geneSet collections...
    GSEA analysis...
    Error: BiocParallel errors
    2 remote errors, element index: 1, 57
    109 unevaluated and other errors
    first remote error:
    Error in fgseaMultilevelCpp(x[, ES], stats, unique(x[, size]), sampleSize, : could not find function "fgseaMultilevelCpp"
    In addition: Warning messages:
    1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
    There are ties in the preranked stats (18.78% of the list).
    The order of those tied genes will be arbitrary, which may produce unexpected results.
    2: In serialize(data, node$con) :
    'package:stats' may not be available when loading
    3: In serialize(data, node$con) :
    'package:stats' may not be available when loading

    • @sanbomics
      @sanbomics  1 year ago

      Hi, sorry, it is hard to tell just from this. Were you able to figure it out?

    • @meetpanjwani2752
      @meetpanjwani2752 1 year ago

      @@sanbomics Hello! No, not yet.

    • @meetpanjwani2752
      @meetpanjwani2752 1 year ago

      Hello! I found a solution that fixed the BiocParallel error.
      I ran this code before running your code:
      library(BiocParallel)
      register(DoparParam())
      I am not sure what it does, but it works.

    • @sanbomics
      @sanbomics  1 year ago

      Huh, very interesting. I have never come across this. Nice job figuring it out. What operating system are you on?
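Separate from the BiocParallel error, the log in this thread also warns that 18.78% of the preranked stats are tied, meaning the order of those genes is arbitrary. A quick sketch for checking how many genes in your own ranked list share a value follows. The list is hypothetical, and this count of genes involved in a tie may not match the exact percentage fgsea reports; it is only a rough diagnostic. Heavy ties often point to a coarse ranking metric (for example, rounded values or many exact zeros), so a finer-grained metric such as stat may reduce them.

```r
# Hypothetical ranked list containing tied values
ranked <- c(GENE_A = 2.5, GENE_B = 1.0, GENE_C = 1.0,
            GENE_D = 0, GENE_E = 0, GENE_F = -1.2)

# Flag every gene whose metric value is shared with at least one other gene
is_tied <- duplicated(ranked) | duplicated(ranked, fromLast = TRUE)
tie_fraction <- mean(is_tied)
print(names(ranked)[is_tied])
print(tie_fraction)
```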