Single cell analysis in python with Scanpy

  • Published: Jan 29, 2022
  • Scanpy is becoming one of the most widely used single-cell analysis packages. Here I go over the basics of preprocessing and clustering and show you around the Scanpy AnnData object. I also cover important concepts that the Scanpy tutorial leaves out, such as how to find which cells are positive for a gene or virus.
    Scanpy has several advantages over Seurat. The biggest is that it is written in Python. It is also faster and integrates better with machine learning tools, which are more developed in Python.
    Notebook:
    github.com/mousepixels/sanbom...

Comments • 54

  • @tousifazmain6205 1 year ago +1

    These are the tutorials that deserve millions of likes. A well-detailed tutorial is so rare on the internet although much needed. Godspeed my friend!

    • @sanbomics 1 year ago +1

      Thanks! If you liked this, I would check out my "comprehensive" hour-long video, which goes into much more depth.

    • @jorge1869 1 year ago

      I wonder what the steps before this are. I mean, how do we generate the .mtx files?

    • @tousifazmain6205 1 year ago

      @@jorge1869 The .mtx files are the output of Cell Ranger count/aggr.

    • @jorge1869 1 year ago +1

      @@sanbomics I watched that video and it starts from the .mtx files. I would like to see the complete workflow from the very start, from the FASTQ files to the final analysis with Scanpy. That would be great, huge.

    • @tousifazmain6205 11 months ago

      @@jorge1869 Yes I agree. @sanbomics

  • @abdelrahmanabdelhadi4195 1 year ago

    Extremely helpful, thank you !!

  • @ianik 2 years ago +1

    Quality content!

  • @user-ry8fm7jq4y 1 year ago

    Great explanation

  • @acorndaydreams3706 2 years ago +1

    Thank you for this video

  • @ohohjournal5828 2 years ago

    thanks, great video!

  • @hypergamer1078 11 months ago

    Very nice

  • @tenzintseten8928 11 months ago

    Thanks a lot for the tutorial video. Just wondering if there are any ways in Python or R to analyze a bacterial genome sequence? It would be highly appreciated if you could help.

    • @sanbomics 10 months ago

      There are many ways, but it depends on what you are trying to do.

  • @blackmatti86 1 year ago +1

    Is there a way to display value counts, i.e. the number of cells, on the UMAP? I know you can display annotations on the UMAP by using legend_loc='on data', but I can't seem to find a way to display the number of cells next to or under the cluster names 🤔

    • @sanbomics 1 year ago +1

      Yup! I haven't done this exactly, but you can probably access the matplotlib text objects from the plot directly and update their text to include the number of cells in that cluster. Alternatively, and potentially easier if you are new to matplotlib, you can add text directly to the plot with matplotlib.pyplot.text() and adjust it manually to your desired position. The former is more automated and elegant, but the latter works too if you only plan to do it a few times and don't mind the manual step.
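
A minimal sketch of the first approach (updating the existing text objects), not a tested recipe from the video. The helper name `append_counts_to_labels` is hypothetical, and the two labels placed on a bare matplotlib figure are a stand-in for the cluster names that an on-data legend draws; the counts are made up.

```python
from matplotlib.figure import Figure


def append_counts_to_labels(ax, counts):
    """Rewrite on-plot labels such as '0' to '0 (n=1500)'.

    ax     : matplotlib Axes whose text objects hold the cluster names
    counts : mapping from cluster name (str) to number of cells
    """
    for text in ax.texts:
        label = text.get_text()
        if label in counts:
            text.set_text(f"{label} (n={counts[label]})")
    return ax


# Stand-in for a UMAP with two on-data cluster labels (illustrative values).
fig = Figure()
ax = fig.subplots()
ax.text(0.2, 0.2, "0")
ax.text(0.8, 0.8, "1")

append_counts_to_labels(ax, {"0": 1500, "1": 1200})
labels = [t.get_text() for t in ax.texts]
```

With Scanpy, and assuming the clusters live in a column called 'leiden', the Axes would come from `ax = sc.pl.umap(adata, color='leiden', legend_loc='on data', show=False)` and the counts from `adata.obs['leiden'].value_counts().to_dict()`. Whether the labels are exposed through `ax.texts` is worth checking on your own plot before relying on it.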

  • @bnb7462 7 months ago

    Thank you so much for the videos. What dataset are you using for this? It is hard for me to find the dataset that you are using in this clip.

    • @sanbomics 7 months ago +1

      This was my own data. This video is getting pretty outdated. I would recommend my complete single-cell tutorial instead. That also uses public data, which you can download to follow along.

    • @bnb7462 7 months ago

      @@sanbomics Got it. I wanted to replicate your graph. Thanks though. Thanks to all your clips, I have learnt a lot. Happy Christmas!

  • @antoniogiuseppefaietalasar6849

    I am wondering if your videos can be applied to the more common bulk RNA databases such as TCGA

    • @sanbomics 1 year ago

      Which videos/parts in particular?

  • @saxtoncruz6128 1 year ago

    Great vid, I'm new to the channel but you have helped me a ton already. Quick question: when I load my AnnData object from the tutorial file, I have:
    AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'
    instead of:
    AnnData object with n_obs × n_vars = 8845 × 36602
    var: 'gene_ids', 'feature_types'
    Any advice? Thanks in advance!

    • @sanbomics 1 year ago +2

      Is 8845 × 36602 what I had? That is fine, it just means your sample has 2700 cells and 32738 genes. Don't worry!

    • @saxtoncruz6128 1 year ago

      @@sanbomics Thanks for the update! Love the channel. Could you give some advice on how to subcluster a particular cluster? I would like to analyze the differences within, say, cluster 0 itself, not against all the other clusters. Thanks for the help, you're awesome!

  • @amiel954 1 year ago

    Hello, to access the dataset do I have to run: tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
    or is it another file?

    • @sanbomics 1 year ago

      That looks like one that should work. What is the output after tar?

  • @musedmoments 2 years ago +1

    Do Scanpy + SCVI tutorial please!

    • @sanbomics 2 years ago +1

      I shall keep that in mind for a future video!

  • @Phoenixpapagei 2 years ago

    Thank you so much! Your videos are really helpful to navigate the world of scRNA

    • @sanbomics 2 years ago

      I am glad the videos are helpful for you! Thanks!

  • @DipakKumar-de7gz 1 year ago

    Hey. I am getting an AttributeError in my code while finding the marker genes.
    I am at the last step, where I make a data frame for the marker genes. The error occurs while finding the Zika index as in your code. The lines are:
    mark_i = np.where(adata.raw.var_names == 'FAM83D')[0][0]
    mark_i
    adata.raw.X.toarray()[:, mark_i]
    The AttributeError is raised on the last line because my adata.raw.X is not a sparse matrix. Somehow it already has its dimensions (it seems to be dense already). Can you tell me how to proceed?

    • @sanbomics 1 year ago

      Hi, sorry for the late reply. You can just remove the .toarray() part, because that is what converts the sparse matrix to a dense one.
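
If the same notebook has to handle both cases, the lookup can be written so that it does not care whether the matrix is sparse or dense. This is a minimal sketch under that assumption: the helper name `gene_expression` is hypothetical, and the toy matrix and its values are invented for illustration.

```python
import numpy as np
from scipy import sparse


def gene_expression(X, var_names, gene):
    """Return one gene's expression across all cells as a 1-D dense array.

    Works whether X is a scipy sparse matrix or a plain numpy array.
    """
    idx = np.where(np.asarray(var_names) == gene)[0][0]
    column = X[:, idx]
    if sparse.issparse(column):
        # Only sparse matrices have (and need) .toarray()
        column = column.toarray()
    return np.asarray(column).ravel()


# Toy matrix: 3 cells x 2 genes (made-up values).
var_names = np.array(["GAPDH", "FAM83D"])
dense = np.array([[1.0, 0.0], [2.0, 3.0], [0.0, 5.0]])

from_dense = gene_expression(dense, var_names, "FAM83D")
from_sparse = gene_expression(sparse.csr_matrix(dense), var_names, "FAM83D")

# Cells "positive" for the gene are those with non-zero expression.
positive_cells = from_sparse > 0
```

On real data the call would look like `gene_expression(adata.raw.X, adata.raw.var_names, 'FAM83D')`, and the boolean mask can then be stored in `adata.obs`.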

  • @daniel98carvalho 1 year ago

    Do you know where I would be able to find scRNA-seq FASTQs for healthy human lung tissue other than Tabula Sapiens?

    • @sanbomics 1 year ago +1

      Most studies that do scRNAseq will have a control condition. So you can look for disease datasets and just take the control samples. For example, there are a lot of COVID datasets. You can check out the one I use in my 1+ hour SC video

    • @daniel98carvalho 1 year ago

      @@sanbomics Perfect. Thanks so much for the input, I really appreciate it!

  • @wholu8497 2 years ago

    Interesting to see that XIST shows downregulation in Zika group... 16:16

    • @sanbomics 2 years ago

      Nice catch! Very interesting... These data aren't from one of my ongoing projects or I would have liked to explore that more!

  • @chrisdoan3210 1 year ago

    Thank you for this video! Would you please make a video that compares scRNA-seq data between non-diseased and diseased samples using Scanpy? I would appreciate that!

    • @sanbomics 1 year ago

      Hi. In the most recent scRNA video (the long one) I have lethal COVID and healthy samples, although I don't do much comparison between the two.

    • @chrisdoan3210 1 year ago

      @@sanbomics All videos you made are super helpful and concise.

  • @vjsanchezarevalo 1 year ago

    I tried to load the three files with sc and got an error: KeyError: 2. Any idea?

    • @sanbomics 1 year ago +1

      Can you put the line of code you used here? Is it 10x Cell Ranger output?

    • @vjsanchezarevalo 1 year ago

      @@sanbomics These are the files that I have:
      GSM3577886_late_KPC_barcodes.tsv.gz
      GSM3577886_late_KPC_features.tsv.gz
      GSM3577886_late_KPC_matrix.mtx.gz
      This is my code: adata = sc.read_10x_mtx('./', prefix='GSM3577886_late_KPC_', var_names='gene_symbols', cache=True)
      Thanks!

    • @vjsanchezarevalo 1 year ago

      Any suggestion?

    • @sanbomics 1 year ago

      Hmm, it says I have 3 replies but I only see 2. Did you comment the code? Can you make it an issue on my GitHub?
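
The thread never resolves this KeyError, so what follows is only a guess. One thing worth ruling out is that features.tsv.gz has fewer than the three columns (gene ID, gene symbol, feature type) that Cell Ranger v3 output normally carries; a reader that picks the third column by position would fail with KeyError: 2 on a two-column file. That this is the cause here is an assumption, since the files were never inspected in the thread. The helper name `count_feature_columns` is hypothetical, and the demo file and its single row are invented for illustration.

```python
import csv
import gzip
import os
import tempfile


def count_feature_columns(path):
    """Return the number of tab-separated columns in the first row of a gzipped TSV."""
    with gzip.open(path, "rt", newline="") as handle:
        first_row = next(csv.reader(handle, delimiter="\t"))
    return len(first_row)


# Demo on a throwaway two-column file (gene ID and gene symbol only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_features.tsv.gz")
    with gzip.open(demo_path, "wt", newline="") as handle:
        handle.write("ENSG00000243485\tMIR1302-2HG\n")
    n_columns = count_feature_columns(demo_path)
```

Running `count_feature_columns('GSM3577886_late_KPC_features.tsv.gz')` on the real file would show whether the third column is present. If it is not, the file is laid out like an older genes.tsv, and it may need to be handled as such rather than as v3 output.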

  • @jxyeee6525 1 year ago

    Would Scanpy be good for analyzing copy number variant data (presented in a similar matrix format: cell by gene)?

    • @sanbomics 1 year ago

      Hmm, what exactly are the data?

    • @jxyeee6525 1 year ago

      @@sanbomics I don't understand why the comment is not showing up whenever I refresh the page, but the CNV data I am thinking about using is 10x Genomics' aggregated TNBC CNV dataset (on their website).