How to do gene ontology analysis in python

Sanbomics

15K views

Published: Jul 12, 2024
I show you how to do gene ontology enrichment in Python using the goatools package. This is important for those who use scanpy, scvelo, and other Python bioinformatics packages. Setting it up for the first time takes a little extra work, which I cover in detail!
Notebook:
github.com/mousepixels/sanbom...
0:00 Intro
0:52 goatools setup
3:25 running goatools
6:55 graphing example
Science

Comments • 28

@subhasismohanty7166 2 years ago
This is indeed a great help in bringing everything to Python. Thank you for helping us. The community will benefit a lot from your instructional videos.
@sanbomics 2 years ago
Thank you! I agree. I hope to see more and more transition from R to Python.
@virginiaandrade7149 1 year ago ⁺¹
Hi! Thank you so much for this tutorial, it's super useful for what I need!! I have two questions, if you don't mind helping :) How do you calculate p and p_corr, and what do they mean? Do you think there are dictionaries with more general GO terms, like metabolism, cell division, or migration, or do I have to make a dictionary, e.g. one called metabolism, that includes all the metabolic GO terms? Thank you again!!
@sanbomics 1 year ago ⁺¹
No problem! GO has multiple levels. It's like a tree with more specific terms branching out from broader terms. You can filter the results however you want, e.g., to include only the broad terms. The p value is from a Fisher's exact test. The package does that automatically. p_corr is just a multiple-testing correction; I'm not sure off the top of my head which one they use, but it is probably BH (Benjamini-Hochberg). If you are interested in how to do p-value correction, I have a video that goes over just that.
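To make the two statistics in this reply concrete, here is a minimal standard-library sketch of a one-sided Fisher's exact (hypergeometric) enrichment p value and a Benjamini-Hochberg correction. The function names and arguments are illustrative assumptions, not goatools' API, and the package's own test settings (for example one-sided versus two-sided, or the correction method) may differ.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric) p value, P(X >= k).

    k: study genes annotated with the GO term
    n: study genes in total
    K: background genes annotated with the GO term
    N: background genes in total
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability of seeing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each GO term would get its own `enrichment_p_value`, and the full list of p values would then be passed through `benjamini_hochberg` to produce something comparable to a p_corr column.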
@sarapatti8706 5 months ago
Love all your tutorials! I'm stuck at the section where you convert the NCBI gene results to Python. Did they change the code by chance? I can't find that Python file.
@aleksanderpurik5969 3 months ago
Run Python:
import goatools
goatools.__file__
Find where the package is installed, then go to that directory and find the 'cli' folder.
Run Python in the cli folder:
import ncbi_gene_results_to_python as n2p
n2p.ncbi_tsv_to_py('gene_result.txt', 'genes_NCBI_7227.py')
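The "find where the package is" step can also be done programmatically. The sketch below uses only the standard library; `find_package_subdir` is a hypothetical helper name, and the assumption that the installed goatools contains a `cli` folder comes from the comment above, not from anything verified here.

```python
from importlib.util import find_spec
from pathlib import Path


def find_package_subdir(package, subdir):
    """Return the path of `subdir` inside an installed package, or None."""
    spec = find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package not installed, or it has no file location
    candidate = Path(spec.origin).parent / subdir
    return candidate if candidate.is_dir() else None


# Prints the folder if goatools is installed and has a 'cli' directory,
# otherwise prints None.
print(find_package_subdir("goatools", "cli"))
```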
@user-ym5vk2qq6v 2 years ago ⁺¹
Thanks for the great instructions and educational effort! I'm actually an R user trying to learn Python for bioinformatic analysis. It looks like the input genes for the analysis are 'gene symbols', and I can see your example input genes are not capitalized. Mine are all capitalized and it seems to work. But I'm curious whether there is any chance this will not work if the symbols are capitalized.
@sanbomics 2 years ago
Hi, this is a good question. Human gene symbols are normally all capitalized, while normally only the first letter of a mouse gene symbol is capitalized. Python is case sensitive, so you will have to make sure they match. If you are new this might be a little challenging, but what you can do is map something like:
df['gene symbols'] = df['gene symbols'].map(lambda x: x.upper())
@sanbomics 2 years ago ⁺¹
But also if you use the human database you won't need to change the capitalization I think
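An alternative to forcing everything to upper case is to match symbols case-insensitively against whatever reference list the analysis uses. This is a minimal sketch in plain Python; `match_symbols` and the example symbols are hypothetical, and it assumes no two reference symbols differ only by case.

```python
def match_symbols(query_symbols, reference_symbols):
    """Map query symbols onto reference symbols, ignoring case.

    Returns (matched, unmatched): matched holds the reference spelling,
    unmatched holds the query symbols that had no counterpart.
    """
    by_lower = {ref.lower(): ref for ref in reference_symbols}
    matched, unmatched = [], []
    for symbol in query_symbols:
        official = by_lower.get(symbol.strip().lower())
        if official is None:
            unmatched.append(symbol)
        else:
            matched.append(official)
    return matched, unmatched
```

Checking the `unmatched` list is a quick way to see whether a capitalization mismatch is silently dropping genes from the study set.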
@oliviaringham8706 1 year ago
Also, what does "per" mean on the axis? I understand it is the number of genes over the number of GO terms, but what does that mean?
@sanbomics 1 year ago ⁺¹
Percent of genes in the GO term that were in your DE genes.
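As a small worked example of that definition, assuming the two counts are available per GO term (the function and argument names are illustrative, and the notebook may store the value as a fraction rather than a percentage):

```python
def percent_of_term(n_de_genes_in_term, n_genes_in_term):
    """Percent of the genes annotated to a GO term that are in the DE list."""
    if n_genes_in_term <= 0:
        raise ValueError("a GO term must contain at least one gene")
    return 100.0 * n_de_genes_in_term / n_genes_in_term
```

For instance, a term with 20 annotated genes, 5 of which are differentially expressed, gives `percent_of_term(5, 20)`, which is 25.0.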
@oliviaringham8706 1 year ago
What would be a way to generate and graph the "log fold enrichment" of the GO term?
@sanbomics 1 year ago ⁺¹
I'm not sure I have an easy answer for this. First you would need something to compare it to. Then you have to decide what value you are comparing. If you do it from the Fisher/hypergeometric enrichment, then it doesn't take into account the actual log-fold change of the gene itself, just whether it was DE. Maybe you could use GSEA and the enrichment score. See my other video(s) on GSEA. However, there might be a better answer out there that I am unaware of.
@oliviaringham8706 1 year ago
@sanbomics OK! I am just wondering because when you enter a set of genes for GO analysis on PANTHER, one of the column statistics given is "fold enrichment", so it may be a useful stat to add to the output of the function here.
@sanbomics 1 year ago
I misinterpreted your question. You mean log fold over what is expected by random chance? I'm not sure how PANTHER does it, but you can likely do something similar with a hypergeometric distribution (see my hypergeometric video). In my opinion, fold enrichment is somewhat redundant with other statistics. If you can find out how PANTHER does it, I can likely give you a better Pythonic answer.
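One common way to express "fold over what is expected by random chance" is the observed count divided by the hypergeometric expectation n * K / N. The sketch below assumes that definition; it is not taken from PANTHER or goatools, and the function names are illustrative.

```python
from math import log2


def fold_enrichment(k, n, K, N):
    """Observed over expected count of study genes annotated with a term.

    k: study genes annotated with the GO term
    n: study genes in total
    K: background genes annotated with the GO term
    N: background genes in total
    """
    expected = n * K / N  # mean of the hypergeometric distribution
    if expected == 0:
        raise ValueError("expected count is zero; fold enrichment is undefined")
    return k / expected


def log2_fold_enrichment(k, n, K, N):
    """log2 of the fold enrichment; -inf when no study gene has the term."""
    fold = fold_enrichment(k, n, K, N)
    return log2(fold) if fold > 0 else float("-inf")
```

With 100 study genes, 200 annotated genes in a 20,000-gene background, and 10 annotated study genes, the expected count is 1, so the fold enrichment is 10.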
@Dr.UgurComlekcioglu 8 months ago
This is a great video; thank you very much. I think !mv is for Linux. How can we move the created file to the default import location in Windows?
@vimalaa8721 5 months ago
move
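A platform-independent alternative to `!mv` or `move` is Python's own `shutil.move`, which behaves the same on Windows, Linux, and macOS. The sketch below is a hedged example: `move_into_dir` is a hypothetical helper, and the destination folder is whatever import location you choose, which is not specified in this thread.

```python
import shutil
from pathlib import Path


def move_into_dir(src, dest_dir):
    """Move the file `src` into `dest_dir`, creating the folder if needed."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / Path(src).name
    # shutil.move returns the final destination path as a string.
    return Path(shutil.move(str(src), str(target)))
```

A call such as `move_into_dir("genes_ncbi.py", some_folder)` would replace the shell command, with `some_folder` being a placeholder. Leaving the generated file in the notebook's working directory is usually enough for it to be importable as well.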
@ezhilgrace9247 4 months ago
When I try to execute the code "!python C:\Users\anaconda3\Lib\site-packages\goatools\cli\ncbi_gene_results_to_python.py -o genes_ncbi.py gene_result.txt", the output file is not created.
@fatimafarhan531 1 year ago ⁺¹
So does red indicate positive and blue negative?
@sanbomics 1 year ago
The color represents the significance. These were all upregulated pathways.
@oliviaringham8706 1 year ago
How would you probe for gene sets that are downregulated?
@sanbomics 1 year ago ⁺¹
Hi! You should run it on the downregulated genes separately.
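Running the analysis separately means first splitting the differential expression results by the sign of the log fold change. This is a minimal sketch that assumes the results are available as a mapping from gene symbol to log fold change; the function name, the threshold parameter, and the example genes are all hypothetical.

```python
def split_by_direction(log_fold_changes, lfc_threshold=0.0):
    """Split genes into up- and downregulated lists by log fold change.

    log_fold_changes: dict mapping gene symbol -> log fold change
    lfc_threshold: minimum absolute log fold change to count as changed
    """
    up = [g for g, lfc in log_fold_changes.items() if lfc > lfc_threshold]
    down = [g for g, lfc in log_fold_changes.items() if lfc < -lfc_threshold]
    return up, down


# Example with made-up values; each list would be passed to the
# enrichment analysis on its own.
up_genes, down_genes = split_by_direction({"Actb": 1.5, "Gapdh": -2.0, "Tubb3": 0.0})
```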
@user-wu3ze1nw8y 5 months ago
Nice content for creating reproducible code. Thanks.
