These are the tutorials that deserve millions of likes. A well-detailed tutorial is so rare on the internet although much needed. Godspeed my friend!
Thanks! If you liked this I would check out my "comprehensive" hour long video that goes into much more depth
I wonder what the steps before this are. I mean, how do we generate the .mtx files?
@jorge1869 The .mtx files are the output of cellranger count/aggr.
@sanbomics I watched that video and it starts from .mtx files. I would like to see the complete workflow, from the FASTQ files at the start to the final analysis with scanpy. It would be great, huge.
@jorge1869 Yes, I agree. @sanbomics
So how do we decide on n_neighbors=10 during clustering? Do you think I should change it if I am analyzing tissue cells?
Extremely helpful, thank you !!
No problem!
Quality content!
Thanks!
Great explanation
thanks!
Do you know where I would be able to find scRNA-seq FASTQs for healthy human lung tissue other than Tabula Sapiens?
Most studies that do scRNAseq will have a control condition. So you can look for disease datasets and just take the control samples. For example, there are a lot of COVID datasets. You can check out the one I use in my 1+ hour SC video
@sanbomics Perfect. Thanks so much for the input, I really appreciate it!
Thank you for this video
You are welcome!
Thank you so much! Your videos are really helpful to navigate the world of scRNA
I am glad the videos are helpful for you! Thanks!
Hey. I am getting an attribute error in my code while finding the marker gene.
I am at the last step where I am making a data frame for the marker genes. The error is coming while finding the zika index as per your code. The line is:
mark_i = np.where(adata.raw.var_names == 'FAM83D')[0][0]
mark_i
adata.raw.X.toarray()[:, mark_i]
The attribute error is coming in the last step because my adata.raw.X is not a sparse matrix. The dimensions are already there somehow. Can you tell me how to proceed?
Hi, sorry for the late reply. You can just remove the .toarray() part: that call converts a sparse matrix to a dense array, and your adata.raw.X is already dense, so the method doesn't exist on it.
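For anyone who wants the same lines to work whether or not the matrix is sparse, here is a minimal sketch. The helper name gene_column and the toy arrays are made up for illustration; in the thread's setting you would pass adata.raw.X and adata.raw.var_names instead.

```python
import numpy as np
from scipy import sparse


def gene_column(X, var_names, gene):
    """Return one gene's values across all cells as a dense 1-D array.

    Works whether X is a SciPy sparse matrix or a dense NumPy array.
    """
    idx = np.where(np.asarray(var_names) == gene)[0]
    if idx.size == 0:
        raise KeyError(f"{gene!r} not found in var_names")
    col = X[:, idx[0]]
    # Only sparse objects have .toarray(); dense arrays are used as they are.
    if sparse.issparse(col):
        col = col.toarray()
    return np.asarray(col).ravel()


# Toy stand-ins for adata.raw.X and adata.raw.var_names (made-up values).
var_names = np.array(["XIST", "FAM83D", "ACTB"])
dense_X = np.array([[0.0, 1.5, 2.0], [3.0, 0.0, 1.0]])
sparse_X = sparse.csr_matrix(dense_X)

dense_result = gene_column(dense_X, var_names, "FAM83D")
sparse_result = gene_column(sparse_X, var_names, "FAM83D")
print(dense_result, sparse_result)
```

Checking with scipy.sparse.issparse avoids the AttributeError in the dense case while still converting when the matrix really is sparse.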
Is there a way to display value counts, i.e. the number of cells, on the UMAP? I know you can display annotations on the UMAP by using legend_loc='on data', but I can't seem to find a way to display the number of cells next to/under the cluster names 🤔
Yup! I haven't done this exactly, but you can probably access the matplotlib text objects from the plot directly and update their text to include the number of cells in that cluster. Alternatively, and potentially easier if you are new to matplotlib, you can just add text directly to the plot with matplotlib.pyplot.text() and adjust each label manually to your desired position. Of course the former is more automated and elegant, but the latter works too if you only plan to do it a few times and don't mind the manual step.
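A rough sketch of the first approach (updating the existing text objects) might look like the following. The helper add_counts_to_labels is hypothetical, and the toy labels are placed by hand so the snippet runs without scanpy. It assumes the cluster names drawn by legend_loc='on data' end up in ax.texts, which is worth checking on your own plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import matplotlib.pyplot as plt


def add_counts_to_labels(ax, counts):
    """Append the cell count to each text label on `ax` whose text is a key in `counts`."""
    for text in ax.texts:
        label = text.get_text()
        if label in counts:
            text.set_text(f"{label}\n(n={counts[label]})")
    return ax


# Toy example: two hand-placed labels stand in for the on-data cluster names.
# With scanpy, the assumed usage would be something like:
#   ax = sc.pl.umap(adata, color="leiden", legend_loc="on data", show=False)
#   counts = adata.obs["leiden"].value_counts().to_dict()
fig, ax = plt.subplots()
ax.scatter([0, 1, 4, 5], [0, 1, 4, 5])
ax.text(0.5, 0.5, "0", ha="center", va="center")
ax.text(4.5, 4.5, "1", ha="center", va="center")

add_counts_to_labels(ax, {"0": 2, "1": 2})
labels = [t.get_text() for t in ax.texts]
print(labels)
```

Because the existing labels are edited in place, they keep the positions they already had, so no manual placement is needed.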
Hello, to access the dataset do I have to run: tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
or is it another file?
That looks like one that should work. What is the output after running tar?
Thank you so much for the videos. What dataset are you using for this? It is hard for me to find the dataset that you are using in this clip.
This was my own data. This video is getting pretty outdated. I would recommend my complete single-cell tutorial instead. That also uses public data which you can download to follow along
@sanbomics Got it. I wanted to replicate your graph. Thanks though. Thanks to all your clips, I have learnt a lot. Happy Christmas
I have tried to load the three files in sc and I got an error: KeyError: 2. Any idea?
Can you put the line of code you used here? Is it 10x cellranger output?
@sanbomics These are the files that I have:
GSM3577886_late_KPC_barcodes.tsv.gz
GSM3577886_late_KPC_features.tsv.gz
GSM3577886_late_KPC_matrix.mtx.gz
This is my code: adata = sc.read_10x_mtx('./', prefix='GSM3577886_late_KPC_', var_names='gene_symbols', cache=True)
Thanks!
Any suggestion?
Hmm.. it says I have 3 replies but I only see 2. Did you comment the code? Can you make it an issue on my GitHub?
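The thread never confirms the cause, so the following is only a guess at one common situation: when a features.tsv.gz file has two columns (gene ID and symbol) rather than three, sc.read_10x_mtx can fail with KeyError: 2 while looking for the third (feature type) column. A minimal sketch of a workaround, assuming that is the problem, is to write a padded copy of the file. The helper pad_features_file and the toy file below are made up for illustration.

```python
import gzip
import os
import tempfile


def pad_features_file(src, dst, feature_type="Gene Expression"):
    """Copy a gzipped features TSV, adding a third column to two-column rows.

    Returns the number of rows that were padded.
    """
    padded = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) == 2:
                fields.append(feature_type)
                padded += 1
            fout.write("\t".join(fields) + "\n")
    return padded


# Toy demonstration on a made-up two-column file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "toy_features.tsv.gz")
dst = os.path.join(tmp_dir, "toy_features_fixed.tsv.gz")
with gzip.open(src, "wt") as fh:
    fh.write("ENSMUSG00000000001\tGnai3\nENSMUSG00000000028\tCdc45\n")

n_padded = pad_features_file(src, dst)
with gzip.open(dst, "rt") as fh:
    fixed_rows = [line.rstrip("\n").split("\t") for line in fh]
print(n_padded, fixed_rows)
```

If the count of padded rows is zero, the file already had three columns and the error comes from somewhere else. Otherwise, putting the padded file (under the original features file name) in a folder together with the barcodes and matrix files and calling sc.read_10x_mtx on that folder would be the next thing to try.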
Thanks a lot for the tutorial video. Just wondering if there are any ways in Python or R to analyze bacterial genome sequences? It would be highly appreciated if you could help.
There are many ways, but it depends on what you are trying to do.
Which data did you use for the analysis? Could you please provide the link?
This was in house data. But there are plenty of tutorial datasets easily available like PBMC 3K. I wouldn't recommend following this tutorial anymore though, because it's outdated. I have newer ones
I am wondering if your videos can be applied to the more common bulk RNA databases such as TCGA
Which videos/parts in particular?
thanks, great video!
Glad you liked it!
Was this video made following "Preprocessing and clustering 3k PBMCs (legacy workflow)"? I wasn't sure whether to use the legacy workflow or the newest one. Hoping you can clear up my doubts, thanks a lot!
This video is very outdated at this point. I would check out some of my newer videos. Things change fast!
@sanbomics OK, thanks a lot!
Very nice
Interesting to see that XIST shows downregulation in Zika group... 16:16
Nice catch! Very interesting... These data aren't from one of my ongoing projects or I would have liked to explore that more!
Please do a Scanpy + scVI tutorial!
I shall keep that in mind for a future video!
Great vid, I'm new to the channel but you have helped me a ton already. Quick question... When I load my AnnData object from the tutorial file, I have:
AnnData object with n_obs × n_vars = 2700 × 32738
var: 'gene_ids'
instead of:
AnnData object with n_obs × n_vars = 8845 × 36602
var: 'gene_ids', 'feature_types'
Any advice? Thanks in advance!
Is 8845 × 36602 what I had? That is fine. The shape is cells × genes, so it just means your sample has 2700 cells and 32738 genes. Don't worry!
@sanbomics Thanks for the update! Love the channel. Could you give some advice on how to subcluster a particular cluster? I would like to analyze the differences within, say, cluster 0 itself, not against all clusters. Thanks for the help, you're awesome!
Thank you for this video! Would you please make a video that compares scRNA-seq data between non-diseased and diseased samples using scanpy? I'd appreciate that!
Hi. In the most recent scRNA video (the long one) I have lethal COVID and healthy samples, although I don't do too much comparison between the two.
@sanbomics All the videos you have made are super helpful and concise.
Would Scanpy be good for analyzing copy number variant data (presented in a similar matrix format: cell by gene)?
Hmm, what exactly are the data?
@sanbomics I don't understand why my comment is not showing up whenever I refresh the page, but the CNV data I am thinking about using is 10x Genomics' aggregated TNBC CNV dataset (on their website).