Gene set enrichment analysis in R

  • Published: Jan 23, 2025

Comments • 19

  • @CorruptedSon 2 years ago +1

    Thank you! I spent a while trying to figure out how to do pathway analysis in R, and most guides expect you to already have some sort of GO or KEGG library to refer to; they don't go into specifics about how these libraries work or what to do when they don't. This step-by-step guide was enough to get me from DEG lists to proper pathway analysis, and I even understood what I was doing in each step and why! I am working with rat sequencing data, and some of my columns were very different from the example data you had here, but after checking specific points a few times I managed to filter and reformat all the necessary information from my data.

  • @azure-hawk 2 years ago +8

    Great video! I learned about msigdbr and the dplyr::separate function. I just want to mention a few things.
    1. The GSEA ranking metric doesn’t have to be fold-change. I use the gene-wise average moderated t-statistic from limma or the signed -log10-transformed p-value. There are a ton of ranking metrics to choose from. These two are very similar, and we can compare their density plots to get an idea of how they would alter the GSEA results.
    2. Over-representation analysis is not great as a follow-up to differential analysis because of the arbitrary significance threshold that you mentioned and the fact that there may be duplicates at the gene level. Also, we lose information about the direction of change, since ORA only tells us which sets are more present in the significant group than what we expect by chance. However, it is great when genes uniquely map to discrete clusters, so it is good as a follow-up to WGCNA or K-means clustering.
    3. The figures you use to introduce GSEA show the phenotype permutation approach, but most R implementations (including fgsea) use the gene permutation approach, which is much faster but has a slightly different interpretation.
    4. For ORA, it may be useful to plot the ratio of the number of significant genes in the gene sets to the total number of significant genes along the x-axis and change the bars to points scaled according to the -log10(adjusted p-value). Gene sets that include all significant genes (ratio of 1) may be interesting to look at, even if their adjusted p-values are hovering near 0.05.
    5. The fora function in fgsea can be used for ORA as well. Personally, I find it easier than dealing with the bulkier clusterProfiler results objects.
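
A minimal base-R sketch of points 1, 2 and 4 above, on a made-up results table. The data frame `res`, its columns, and the gene set `set_genes` are hypothetical; in practice the statistics would come from limma or a similar tool and the gene sets from a package such as msigdbr. The hypergeometric test is written out with `phyper` so that nothing beyond base R is assumed.

```r
# Toy differential-expression results (all values are made up for illustration)
res <- data.frame(
  gene  = paste0("g", 1:10),
  logFC = c(2.1, -1.8, 0.3, 1.2, -0.2, 0.9, -2.5, 0.1, 1.6, -0.4),
  pval  = c(0.001, 0.004, 0.60, 0.03, 0.80, 0.20, 0.0005, 0.90, 0.01, 0.50)
)

# Point 1: signed -log10(p-value) as an alternative GSEA ranking metric
rank_metric <- sign(res$logFC) * -log10(res$pval)
names(rank_metric) <- res$gene
rank_metric <- sort(rank_metric, decreasing = TRUE)

# Points 2 and 4: over-representation of one hypothetical gene set
set_genes <- c("g1", "g2", "g4", "g7", "g8")
sig_genes <- res$gene[res$pval < 0.05]   # arbitrary significance cutoff
overlap   <- length(intersect(sig_genes, set_genes))

# Hypergeometric tail probability P(X >= overlap)
ora_p <- phyper(overlap - 1,
                m = length(set_genes),              # set genes in the universe
                n = nrow(res) - length(set_genes),  # non-set genes in the universe
                k = length(sig_genes),              # number of significant genes
                lower.tail = FALSE)

# Ratio suggested for the x-axis: set genes among all significant genes
gene_ratio <- overlap / length(sig_genes)
```

Note that the ORA part only sees which genes pass the cutoff, so the direction of change kept in `rank_metric` is lost there, which is the trade-off described in point 2.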

  • @Stop-and-listen 2 years ago +1

    I really enjoyed your presentation. I learned quite a bit. Thank you!

  • @yt.abhibhav 2 years ago

    Thank you! I was just wondering: which paper should I cite when performing the hypergeometric "simple" enrichment?

  • @smrutimishra9804 1 year ago

    How can I download the gene sets directly in RStudio?

  • @naveedkhan-fi6ux 2 years ago

    Can we do the gene set enrichment analysis for rice using the same code and databases?

  • @vaibhavsunkaria7291 2 years ago

    Hi, how can we do GSEA for DNA methylation genes? I have beta values for the samples and a logFC cutoff for them. Thank you.

  • @joeyoviedo5202 1 year ago

    Hi, great video and clarification of the types of enrichment analyses. I have a question: what is the best way to create a ranked list of genes for 3 treatment and 3 control samples in one data frame using just normalized read counts? I want to rank all genes, not just DEGs, and then do enrichment analysis. Thank you!

  • @jessehines4044 1 year ago

    I'm new to this, and I'm wondering why you need to see how much your significant genes overlap with a larger or other gene set. Is that to elucidate which transcriptional regulation network controls the significant genes and/or to discover other genes similar to the genes of interest?

  • @jajaja20703 2 years ago +2

    Very clear explanation, thanks for this amazing content! Would you have any additional bio-inf analysis tutorials?

    • @kdillmcfarland 2 years ago +1

      Thanks!
      I have other R workshop videos ruclips.net/p/PL_Oo8UFoIb007lGeg78awOu44Ido35zsY
      with materials for those and other workshops that don't have videos at
      github.com/BIGslu/workshops and github.com/hawn-lab/workshops_UW_Seattle

  • @tonkatsuburger3531 3 years ago +1

    Thank you so much this was so helpful!

  • @hamidnikbakht1295 3 years ago +3

    Thank you for the very clear explanations. One question: for the purpose of enrichment analysis (either simple or GSEA), what type of count normalization should one use? Or does it even matter? If so, how would it differ between the two methods? Thank you!

    • @kdillmcfarland 2 years ago +2

      For RNA-seq GSEA, we use fold changes calculated from TMM-normalized log2 counts per million (see the limma package tutorial) or the estimates output by whatever linear model we ran. In essence, whatever data normalization needs to be done for the statistics should also be done before calculating fold changes for GSEA.
      For simple enrichment, it's similar. Treat the data however is best for the statistical tests. Then find the significant genes from those tests and input those gene lists into enrichment.
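
As a rough sketch of the fold-change input described in this reply, using base R only. The count matrix is made up, and plain library-size scaling stands in for TMM here; that substitution is an assumption made to avoid package dependencies, and edgeR/limma would supply the real normalization and model estimates.

```r
# Toy raw counts: 4 genes x 6 samples (3 control, 3 treatment); values are made up
counts <- matrix(
  c(100, 120, 110, 400, 420, 390,
     50,  55,  45,  52,  48,  50,
    300, 310, 290,  80,  75,  85,
     10,  12,   9,  11,  10,  12),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3"))
)
group <- c(rep("ctrl", 3), rep("trt", 3))

# log2 counts per million with a small offset to avoid log2(0)
# (simple library-size scaling, NOT TMM)
lib_size <- colSums(counts)
log_cpm  <- log2(t(t(counts) / lib_size) * 1e6 + 0.5)

# log2 fold change = difference in group means on the log2 scale
log_fc <- rowMeans(log_cpm[, group == "trt"]) - rowMeans(log_cpm[, group == "ctrl"])

# Named vector sorted in decreasing order, covering all genes rather than only DEGs
ranked <- sort(log_fc, decreasing = TRUE)
```

A named, sorted numeric vector like `ranked` is the kind of input GSEA functions generally expect; for simple enrichment, the same normalized data would instead feed the statistical test that defines the significant gene list.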

  • @pragatigupta8999 2 years ago

    Hello, how can we add the gene count and p-value to the same histogram using the clusterProfiler package in R?

    • @kdillmcfarland 2 years ago

      Do you mean the FDR-by-count histograms at around 35 min? You can add the total number of genes (count) to the top of each bar in a histogram with
      stat_bin(geom="text", aes(label=after_stat(count)))
      To plot the p-value, I would make a new plot with x=Pval instead of x=FDR.
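
For context, a small self-contained version of that suggestion, assuming ggplot2 3.3.0 or later. The data frame `enrich` and its FDR values are invented for illustration, and the bin settings are arbitrary; `after_stat(count)` is the current ggplot2 spelling of `..count..`.

```r
library(ggplot2)

# Hypothetical enrichment results: one FDR value per row (made-up numbers)
enrich <- data.frame(
  FDR = c(0.01, 0.02, 0.03, 0.12, 0.15, 0.31, 0.33, 0.35, 0.62, 0.90)
)

p <- ggplot(enrich, aes(x = FDR)) +
  geom_histogram(binwidth = 0.1, boundary = 0) +
  # Use the same bins as the histogram and print each bar's count just above it
  stat_bin(geom = "text", aes(label = after_stat(count)),
           binwidth = 0.1, boundary = 0, vjust = -0.5)
```

The text layer has to use the same `binwidth` and `boundary` as the histogram layer, otherwise the labels will not line up with the bars.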