Bioinformatics - Gene Ontology (GO) Enrichment Analysis

  • Published: Sep 9, 2024
  • Today we are going to do some gene ontology enrichment analysis and look at which GO terms are enriched by the presence of p53 in our mice that were irradiated. If you find this helpful, please consider liking the video and subscribing!
    We are using the same project file that we set up a few videos ago to analyze our data!!
    GO overview:
    geneontology.or...
    Book Chapter about p53:
    www.sciencedir...
    These videos are intended to be learning material for others as well as myself. I like helping others and hope these videos spur conversation.
    The image at the bottom left of the thumbnail is modified from AllGenetics.EU.
    Please consider contributing to my Patreon where I may do merch, gather ideas for future content, and have further discussions:
    / alexsoupir

Comments • 31

  • @jimdurant1192 3 years ago

    Very helpful! Thank you for doing these!

    • @alexsoupir 3 years ago

      Sure thing! If you have recommendations or requests, let me know and I'll see what I can do haha

    • @jimdurant1192 3 years ago +1

      @alexsoupir wow, thanks. The big thing is that it has been about 3 decades since I took biology. Just getting an overall grasp of the big picture of transcriptomics has been challenging.

    • @jimdurant1192 3 years ago +1

      @alexsoupir I work in chemical risk assessment and am trying to learn toxicogenomics, so your example with p53 was useful. Any other examples along the lines of screening for chemical carcinogens would be useful as well.

  • @sanjaisrao484 2 years ago

    Thanks

  • @josiknuff5900 2 years ago +3

    Many thanks for your videos!
    One question: is it possible to get the gene IDs / Entrez IDs belonging to an enriched GO term?
    I get RNA binding as an enriched GO term and I would like to access the specific gene entries responsible for this enrichment.

  • @HunterDriguez 3 years ago +1

    I've been trying to figure out what each column in the GO enrichment output is...is the "Size" column the total number of genes associated with a particular term that are expressed in the tissue? Is the "ExpCount" column the number of genes related to a particular term that you would expect to get by chance in your smaller gene set? Maybe the "Count" column is the actual number of observed genes related to a term that you got in the smaller gene set?
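
For readers with the same question, here is a minimal sketch of how these three columns typically relate in a hypergeometric over-representation test. It assumes "Size" is the number of universe genes annotated to the term, "Count" is how many of the selected genes carry the term, and "ExpCount" is the count expected by chance. The function name and all numbers are made up for illustration and are not taken from the video's data.

```python
from math import comb


def enrichment_row(universe_size, term_size, n_selected, count):
    """Return (exp_count, p_value) for one GO term.

    universe_size: genes in the background ("universe")
    term_size:     universe genes annotated to the term (assumed "Size")
    n_selected:    genes in the up- or down-regulated list
    count:         selected genes annotated to the term (assumed "Count")
    """
    # Expected number of annotated genes if the list were drawn at random
    exp_count = n_selected * term_size / universe_size

    # Upper-tail hypergeometric probability: P(X >= count)
    max_hits = min(term_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, n_selected - k)
        for k in range(count, max_hits + 1)
    )
    return exp_count, tail / total


# Hypothetical term: 200 of 10,000 universe genes carry it,
# and 15 of our 300 selected genes do.
exp_count, p_value = enrichment_row(10_000, 200, 300, 15)
print(f"ExpCount = {exp_count:.1f}, Count = 15, p = {p_value:.2e}")
```

A term looks enriched when Count is well above ExpCount, which is what a small p-value reflects.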

  • @fseiva 1 year ago

    Congrats on your videos!! Could you please tell me if there is a way to save the results, for example of "upBP", as a data frame?

  • @akmmahmudulhuque9846 3 years ago +2

    Thanks a lot, Alex Soupir.
    These are by far the most detailed videos regarding RNAseq data analysis.
    I will recommend this channel to all the beginners at my current institution (GUGC, Korea)
    Could you please suggest anything like that for SNP data analysis as well?

    • @alexsoupir 3 years ago +1

      Hey thanks. I figured it would be a good way to help people get through an analysis as well as get feedback from other people about possible ways they analyze the data. Science is collaborative and I try to bring that to learning too.

    • @alexsoupir 3 years ago +1

      Forgot to answer your question. I will ask another postdoc on Monday what he uses - he does SNP work with transcription factors so I think he would be good.

    • @akmmahmudulhuque9846 3 years ago +1

      @alexsoupir Thank you so much for your cooperation. Looking forward to hearing from you.

    • @alexsoupir 3 years ago

      Howdy! I asked my coworker today and he said you can use samtools mpileup? I guess I don't know enough about it to answer more. The manual for samtools says if you're looking for variant calling you should use bcftools mpileup.
      I'll have to do more looking because now I'm interested! Would be fun to explore some cancer sequencing data for SNPs.
      Sorry I'm not able to be more helpful. Possibly in the future I'll try to do this!!

    • @akmmahmudulhuque9846 3 years ago

      @alexsoupir Hello, Alex! Thank you so much. I will try with samtools and bcftools. I will also be looking out for your video tutorials, as those provide hands-on training.

  • @jyoti9426 3 years ago

    Me always trying to say 'Alright, everybody!' in the beginning at the same time as he does. XD

  • @arjunkhadka4466 3 years ago +1

    Hi Alex, I do not have Entrez IDs because I am not working with a model organism. The output from DESeq2 has baseMean, log2FoldChange, lfcSE, stat, pvalue and padj only. Let me know if you can help me.

    • @alexsoupir 3 years ago

      Hey Arjun. Unfortunately in that case, I don't know if I can help. There is a way using BioMart to sort of make your own, as previously mentioned, but it doesn't always have all the information. You might be able to read what others have done for non-model organisms if Bioconductor doesn't have your organism at all. Maybe there's a way to convert between organisms?

  • @genticswithkazan 3 years ago

    Very clear and helpful.
    Thank you for your effort.
    I have two questions:
    1. What is the statistical method in this analysis?
    2. Can you kindly suggest any visualization tool/package for this analysis?
    Thank you in advance.

    • @alexsoupir 3 years ago

      Howdy!
      So for DESeq2, I believe it uses the Wald test on each gene to test for a difference (the Wald test, I believe, is a modification of the chi-squared test). DESeq2 is actually a great package, and if you're in for a good read you could read its vignette, which explains everything (likely more than you're interested in). Something interesting I found is that it doesn't perform a normalization past the median ratio method: if you're just going to be comparing individual genes, each sample was mapped to the same gene region and is therefore, in a sense, already baselined. However, due to the differences in library sizes (actually sequencing throughput), the 'library' needs to be normalized, and this is what the median ratio method does.
      For visualization, the next video walks through how to visualize, but the `pathview` and `gage` packages are useful and actually make some really nice plots. This can be found here: ruclips.net/video/SMBF4DyRiuo/видео.html
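
The two ideas mentioned in this reply can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not DESeq2's actual code: `size_factors` follows the general median-of-ratios idea (each sample's factor is the median ratio of its counts to the per-gene geometric mean, skipping genes with a zero count), and `wald_p_value` treats the log fold change divided by its standard error as a standard normal statistic. The function names and the toy counts are hypothetical.

```python
from math import erfc, exp, log, sqrt
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Keep genes with no zero counts so the log geometric mean is defined
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean
        log_ratios = [log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors


def wald_p_value(log2_fold_change, lfc_se):
    """Two-sided p-value for a Wald statistic, assuming it is standard normal."""
    z = log2_fold_change / lfc_se
    return erfc(abs(z) / sqrt(2))


# Toy data: sample 2 was sequenced about twice as deeply as sample 1.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],  # skipped when estimating factors because of the zero
]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
print("size factors:", factors)
print("normalized first gene:", normalized[0])
print("Wald p for LFC=1.2, SE=0.4:", wald_p_value(1.2, 0.4))
```

After dividing by the size factors, the first three genes have equal normalized counts in both samples, which is the sense in which only the library depth has been corrected.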

  • @soumyarao8006 2 years ago

    Hi Alex, I have Ensembl gene IDs of mouse to start with, many of which lack Entrez IDs. Moreover, selectGenesUP and selectGenesDown are null/empty even after changing padj to 0.1 and sig_lfc to 2 or 1. What should I do?

  • @kyungwonmin7217 1 year ago

    Hello. Is it possible to do GO analysis with non-model species?
    I have DESeq data with a functional annotation file that contains RefSeq or GO IDs.
    In your video an org database was used, but my data cannot be matched to it.

    • @alexsoupir 8 months ago

      There are ways to build custom 'org.*' data objects, but it has been a long time since I have done this. I believe it has to do with the biomaRt or AnnotationDbi package. I would start there and do further searching.

  • @tinacole1450 3 years ago

    Is there a way to find the GO term which has the pathway info after you have gathered the counts data?

  • @gillesirfsh458 2 years ago

    Very informative, thanks. One question though: what would be the approach in R when dealing with non-model organisms, i.e. the ones that are not well annotated?

    • @alexsoupir 2 years ago

      Hello, Gilles.
      There is a package that lets you build annotations for other organisms, though I don't know if all of them are in the database. I believe it's a combination of AnnotationDbi and biomaRt? But this would need to be further explored.
      Hope this is helpful!

  • @wmarei 3 years ago

    When we run the hyperGTest, why do we set the universeGeneIds to the unique genes of res_05 and not all genes in res? I understand that we are trying to do the hypergeometric calculation for our subset of genes (up or down) against the total genes detected (without any significance or cutoff values). Or am I missing something?

    • @alexsoupir 3 years ago

      That's just how we were taught, but it's a good question. I think it has to do with how we are subsetting the data. We only want to look at those genes that are significant (hence the initial p-value subsetting), and then we want to find the ones that are up or down, which we can see using the log fold change (positive vs. negative). With the log fold change we are also setting a threshold: for example, an LFC of 1 means we want them to be at least doubled.
      Does this answer your question?

    • @vpeska 3 years ago

      @alexsoupir Hi Alex, very helpful videos. I am a bit confused by the p-values. I understood the padj subset. What I do not understand is the cutoff in uParams and downParams. What does this p-value stand for? When I set it to 0.01 I get an empty result. When p=1 I get about 18 genes and their GOs with p-values of 0.04-0.95.

    • @felipebatalini 3 years ago

      @alexsoupir Thanks for sharing this. The only caveat is that I think the right approach is to set the "universe" to all the genes expressed in the experiment, so in a typical RNA-seq experiment we would use all expressed genes. I would double-check this.
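
To make the universe question in this thread concrete, here is a small sketch with made-up numbers (not from the video's data) that runs the same upper-tail hypergeometric calculation against two different universes. The function name is hypothetical, and the example only shows that the choice of universe can change the result; it does not settle which choice is correct for a given analysis.

```python
from math import comb


def over_representation_p(universe_size, term_size, n_selected, count):
    """P(X >= count) when n_selected genes are drawn from the universe."""
    max_hits = min(term_size, n_selected)
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, n_selected - k)
        for k in range(count, max_hits + 1)
    )
    return tail / total


# Hypothetical list: 300 up-regulated genes, 40 of them annotated to one term.
n_selected, count = 300, 40

# Universe A: all 12,000 expressed genes, 600 annotated to the term.
p_all_expressed = over_representation_p(12_000, 600, n_selected, count)

# Universe B: only the 1,500 significant genes, 150 annotated to the term.
p_significant_only = over_representation_p(1_500, 150, n_selected, count)

print(f"universe = all expressed genes:     p = {p_all_expressed:.2e}")
print(f"universe = significant genes only:  p = {p_significant_only:.2e}")
```

With these numbers the expected count is 15 against all expressed genes but 30 against the significant-only universe, so the same 40 hits look far less surprising in the second case. That may also be one reason a strict p-value cutoff can return no terms.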