Single cell analysis in python with Scanpy
- Published: 29 Jan 2022
- Scanpy is becoming one of the most widely used single-cell analysis packages. Here I go over the basics of preprocessing and clustering and also show you around the Scanpy AnnData object. I also cover important concepts not covered in the Scanpy tutorial, like how to find which cells are positive for a gene or virus.
Scanpy has many advantages over Seurat, the biggest being that it is in Python. It is also faster and integrates better with machine learning tools, which are more developed in Python.
Notebook:
github.com/mousepixels/sanbom...
These are the tutorials that deserve millions of likes. A well-detailed tutorial is so rare on the internet, although much needed. Godspeed, my friend!
Thanks! If you liked this, I would check out my "comprehensive" hour-long video, which goes into much more depth.
I wonder what the steps before this are. I mean, how do we generate the .mtx files?
@jorge1869 The .mtx files are the output of Cell Ranger count/aggr.
@sanbomics I watched that video and it starts from .mtx files. I would like to see everything from the start, from the FASTQ files to the final analysis with Scanpy. The complete workflow would be great, huge.
@jorge1869 Yes, I agree. @sanbomics
Extremely helpful, thank you !!
No problem!
Quality content!
Thanks!
Great explanation
thanks!
Thank you for this video
You are welcome!
thanks, great video!
Glad you liked it!
Very nice
Thanks a lot for the tutorial video. I'm just wondering whether there are any ways in Python or R to analyze bacterial genome sequences? I would highly appreciate it if you could help.
There are many ways, but it depends on what you are trying to do.
Is there a way to display value counts, i.e. the number of cells, on the UMAP? I know you can display annotations on the UMAP by using legend_loc='on data', but I can't seem to find a way to display the number of cells next to or under the cluster names 🤔
Yup! I haven't done this exactly, but you can probably access the matplotlib text objects from the plot directly and update their text to include the number of cells in each cluster. Alternatively, and potentially easier if you are new to matplotlib, you can add text directly to the plot with matplotlib.pyplot.text() and adjust it manually to your desired position. The former is more automated and elegant, but the latter works too if you only plan to do it a few times and don't mind the manual step.
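A minimal sketch of the add-your-own-text route, automated a little: it assumes you already have 2-D embedding coordinates (e.g. adata.obsm["X_umap"]) and one cluster label per cell (e.g. adata.obs["leiden"]). The helper name annotate_cluster_counts and the choice of each cluster's median position as the label anchor are illustrative assumptions, not part of Scanpy.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def annotate_cluster_counts(ax, coords, labels):
    """Hypothetical helper: write '<cluster> (n=<cells>)' at each cluster's median position.

    coords: array of shape (n_cells, 2), e.g. adata.obsm["X_umap"]
    labels: one cluster label per cell, e.g. adata.obs["leiden"]
    """
    df = pd.DataFrame(np.asarray(coords)[:, :2], columns=["x", "y"])
    # .values drops any index so labels line up with coords by position
    df["cluster"] = pd.Series(labels).astype(str).values
    texts = []
    for cluster, group in df.groupby("cluster"):
        x, y = group[["x", "y"]].median()  # median is robust to stray cells
        texts.append(
            ax.text(x, y, f"{cluster} (n={len(group)})",
                    ha="center", va="center", fontweight="bold")
        )
    return texts


# Toy data: two separated blobs standing in for a UMAP embedding
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (12, 2))])
labels = ["0"] * 30 + ["1"] * 12

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], s=5)
texts = annotate_cluster_counts(ax, coords, labels)
print([t.get_text() for t in texts])  # ['0 (n=30)', '1 (n=12)']
```

With Scanpy you would typically draw the plot first with ax = sc.pl.umap(adata, color="leiden", show=False) and then call annotate_cluster_counts(ax, adata.obsm["X_umap"], adata.obs["leiden"]); "leiden" is an assumed obs key. Leave legend_loc='on data' off in that case so the cluster names are not drawn twice.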
Thank you so much for the videos. What dataset are you using for this? I can't find the dataset that you are using in this clip.
This was my own data. This video is getting pretty outdated; I would recommend my complete single-cell tutorial instead. That one uses public data, which you can download to follow along.
@sanbomics Got it. I wanted to replicate your graph. Thanks though. Thanks to all your clips, I have learned a lot. Happy Christmas
I am wondering if your videos can be applied to the more common bulk RNA databases, such as TCGA.
Which videos/parts in particular?
Great vid! I'm new to the channel, but you have helped me a ton already. Quick question: when I load my AnnData object from the tutorial file, I have:
AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'
instead of:
AnnData object with n_obs × n_vars = 8845 × 36602
    var: 'gene_ids', 'feature_types'
Any advice? Thanks in advance!
Is 8845 × 36602 what I had? That is fine; it just means your sample has 2700 cells and 32738 genes. Don't worry!
@sanbomics Thanks for the update! Love the channel. Could you give some advice on how to subcluster a particular cluster? I would like to analyze the differences within, say, cluster 0 itself, not compare it against all the other clusters. Thanks for the help, you're awesome!
Hello, to access the dataset, do I have to run: tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
or is it another file?
That looks like one that should work. What is the output after running tar?
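If you prefer to stay in Python, here is a minimal sketch that extracts the archive with the standard-library tarfile module and then locates the folder containing matrix.mtx, which is the folder sc.read_10x_mtx expects. It assumes the archive is pbmc3k_filtered_gene_bc_matrices.tar.gz (the file named in the question); the helper names extract_archive and find_10x_dir are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="data"):
    """Extract a .tar.gz archive into dest (same effect as `tar -xzf archive -C dest`)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer 'data' extraction filter where this Python version provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def find_10x_dir(root):
    """Hypothetical helper: return the first folder under root holding matrix.mtx(.gz)."""
    matches = sorted(Path(root).rglob("matrix.mtx*"))
    if not matches:
        raise FileNotFoundError(f"no matrix.mtx found under {root}")
    return matches[0].parent
```

Usage, assuming the archive sits in the working directory: data_dir = find_10x_dir(extract_archive("pbmc3k_filtered_gene_bc_matrices.tar.gz")), then adata = sc.read_10x_mtx(data_dir, var_names="gene_symbols", cache=True). For the pbmc3k archive this should resolve to filtered_gene_bc_matrices/hg19/.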
Please do a Scanpy + scVI tutorial!
I shall keep that in mind for a future video!
Thank you so much! Your videos are really helpful to navigate the world of scRNA
I am glad the videos are helpful for you! Thanks!
Hey. I am getting an AttributeError in my code while finding the marker gene.
I am at the last step, where I make a data frame for the marker genes. The error comes up while finding the Zika index as per your code. The lines are:
mark_i = np.where(adata.raw.var_names == 'FAM83D')[0][0]
mark_i
adata.raw.X.toarray()[:, mark_i]
The AttributeError comes up in the last step because my adata.raw.X is not a sparse matrix; somehow it is already dense. Can you tell me how to proceed?
Hi, sorry for the late reply. You can just remove the .toarray() part, because that is what converts the sparse matrix to dense.
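A small sketch that works whether adata.raw.X is sparse or dense, so the same line runs in both cases: scipy.sparse.issparse decides whether .toarray() is needed. The helper name gene_vector, the toy matrix, and the "> 0 means positive" rule for calling a cell positive for a gene or virus are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp


def gene_vector(X, var_names, gene):
    """Hypothetical helper: one gene's values for every cell as a 1-D dense array.

    Works for scipy sparse matrices and dense NumPy arrays alike.
    """
    idx = np.where(np.asarray(var_names) == gene)[0]
    if idx.size == 0:
        raise KeyError(f"{gene!r} not found in var_names")
    col = X[:, idx[0]]
    if sp.issparse(col):
        col = col.toarray()  # only sparse input needs densifying
    return np.asarray(col).ravel()


# Toy matrix: 4 cells x 3 genes (stand-in for adata.raw.X / adata.raw.var_names)
var_names = ["ACTB", "FAM83D", "GAPDH"]
dense = np.array([[1, 0, 2], [0, 3, 0], [4, 1, 0], [0, 0, 5]], dtype=float)

for X in (dense, sp.csr_matrix(dense)):
    expr = gene_vector(X, var_names, "FAM83D")
    positive = expr > 0  # assumed rule: any detected count counts as "positive"
    print(expr, int(positive.sum()))
```

With an AnnData object this would be expr = gene_vector(adata.raw.X, adata.raw.var_names, "FAM83D"), and something like adata.obs["FAM83D_pos"] = expr > 0 (a hypothetical column name) stores the flag for plotting or counting.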
Do you know where I would be able to find scRNA-seq FASTQs for healthy human lung tissue other than Tabula Sapiens?
Most studies that do scRNA-seq will have a control condition, so you can look for disease datasets and just take the control samples. For example, there are a lot of COVID datasets; you can check out the one I use in my 1+ hour single-cell video.
@sanbomics Perfect. Thanks so much for the input, I really appreciate it!
Interesting to see that XIST shows downregulation in the Zika group... 16:16
Nice catch! Very interesting... These data aren't from one of my ongoing projects, or I would have liked to explore that more!
Thank you for this video! Would you please make a video that compares scRNA-seq data between non-diseased and diseased samples using Scanpy? I would appreciate that!
Hi. In the most recent scRNA video (the long one), I have lethal COVID and healthy samples, although I don't do much comparison between the two.
@sanbomics All the videos you made are super helpful and concise.
I have tried to load the three files in sc and I got an error: KeyError: 2. Any idea?
Can you post the line of code you used here? Is it 10x Cell Ranger output?
@sanbomics These are the files that I have:
GSM3577886_late_KPC_barcodes.tsv.gz
GSM3577886_late_KPC_features.tsv.gz
GSM3577886_late_KPC_matrix.mtx.gz
This is my code: adata = sc.read_10x_mtx('./', prefix='GSM3577886_late_KPC_', var_names='gene_symbols', cache=True)
Thanks!
Any suggestion?
Hmm... it says I have 3 replies, but I only see 2. Did you post the code in a comment? Can you make it an issue on my GitHub?
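One likely cause, which is an assumption since the thread never confirms it: when a folder contains features.tsv.gz, sc.read_10x_mtx treats it as newer Cell Ranger output and reads a third, feature-type column, so a GEO features file that only has gene IDs and symbols can fail with KeyError: 2. A minimal workaround sketch reads the three files directly with SciPy and pandas. The helper name read_mtx_triplet is hypothetical, and the column layout (0 = gene ID, 1 = gene symbol) is assumed from the usual 10x format.

```python
import gzip
from pathlib import Path

import pandas as pd
import scipy.io
import scipy.sparse as sp


def read_mtx_triplet(prefix, folder="."):
    """Hypothetical helper: read <prefix>matrix.mtx.gz, <prefix>features.tsv.gz and
    <prefix>barcodes.tsv.gz without requiring a feature-type column.

    Returns (X, barcodes, features) with X as a CSR matrix of shape cells x genes.
    """
    folder = Path(folder)
    with gzip.open(folder / f"{prefix}matrix.mtx.gz", "rb") as fh:
        # 10x stores genes x cells; transpose so rows are cells
        X = sp.csr_matrix(scipy.io.mmread(fh).T)
    barcodes = pd.read_csv(folder / f"{prefix}barcodes.tsv.gz", sep="\t", header=None)[0]
    features = pd.read_csv(folder / f"{prefix}features.tsv.gz", sep="\t", header=None)
    features = features.rename(columns={0: "gene_ids", 1: "gene_symbols"})
    return X, barcodes.tolist(), features
```

Usage for the files in the question: X, barcodes, features = read_mtx_triplet("GSM3577886_late_KPC_"), then adata = anndata.AnnData(X=X, obs=pd.DataFrame(index=barcodes), var=features.set_index("gene_symbols")) followed by adata.var_names_make_unique(), since gene symbols are not guaranteed to be unique.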
Would Scanpy be good for analyzing copy number variant data (presented in a similar matrix format: cell by gene)?
Hmm, what exactly are the data?
@sanbomics I don't understand why the comment doesn't show up whenever I refresh the page, but the CNV data I am thinking about using is 10x Genomics' aggregated TNBC CNV dataset (on their website).