Label single-cells automatically in python | scVI label transfer

  • Published: 9 Jul 2022
  • I use scVI to label the cell types in my input data using a reference dataset. My input data are mouse lung cells in AnnData (adata) format, and I use the Tabula Muris Senis dataset as the reference. I also show how to download a human dataset. These reference datasets contain cells from many tissues.
    Notebook:
    github.com/mousepixels/sanbom...
  • Science
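For readers who want the overall shape of scVI label transfer in code, here is a minimal sketch. It is not the notebook's code; it assumes reference and query AnnData objects with raw counts in a `counts` layer and reference labels in an obs column called `cell_type`. The function name `transfer_labels`, the `source` column used as the batch covariate, the `"Unknown"` placeholder and the scANVI training settings are illustrative choices. It trains scVI on the combined data, initialises scANVI from that model, and predicts a label for every query cell.

```python
def transfer_labels(ref, query, labels_key="cell_type", batch_key="source",
                    unlabeled="Unknown", scanvi_epochs=20):
    """Sketch: scVI -> scANVI label transfer from a labelled reference to a query."""
    # Imported lazily so the helper can be defined without scvi-tools installed.
    import anndata as ad
    import scvi

    # Query cells get a placeholder label that scANVI treats as "unlabelled".
    query = query.copy()
    query.obs[labels_key] = unlabeled

    # Stack reference and query; the default inner join keeps only shared genes,
    # and label="source" records which dataset each cell came from.
    combined = ad.concat([ref, query], label="source", keys=["ref", "query"],
                         index_unique="-")

    # scVI is trained on raw counts (assumed to live in the "counts" layer).
    scvi.model.SCVI.setup_anndata(combined, layer="counts", batch_key=batch_key,
                                  labels_key=labels_key)
    vae = scvi.model.SCVI(combined)
    vae.train()

    # Initialise scANVI from the trained scVI weights, then fine-tune with the labels.
    lvae = scvi.model.SCANVI.from_scvi_model(
        vae, adata=combined, unlabeled_category=unlabeled, labels_key=labels_key)
    lvae.train(max_epochs=scanvi_epochs, n_samples_per_label=100)

    combined.obs["predicted"] = lvae.predict(combined)
    # Highest class probability per cell, a rough confidence score.
    combined.obs["transfer_score"] = lvae.predict(combined, soft=True).max(axis=1)
    return combined[combined.obs["source"] == "query"].copy()
```

The predictions come from a model trained on reference and query cells together; only the reference cells carry labels, so the query cells' labels are inferred from where they fall in the shared latent space.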

Comments • 28

  • @nooooor12 · 5 months ago

    This channel is the reason I'm still sane while trying to learn how to deal with single-cell RNA data. Thanks!

  • @samkindl844 · 2 years ago

    Thanks for the tutorial! It's a lot clearer than the GitHub tutorial.

  • @ramielshazli7329 · 3 months ago

    You are perfect, thank you!

  • @suryakoturan7832 · 1 year ago

    Hi, thanks for everything you're doing for the sc-community!
    I had a question regarding label transfer: when you merge multiple datasets, then use the reference data and follow the scVI label transfer method shown in this tutorial, does it also correct for batch? Or do you reckon we should use another tool to integrate and batch correct first, and then do the label transfer?
    Or simply put, does 'label transfer' = 'batch correction'?

    • @sanbomics · 1 year ago · +1

      In this example I train an SCVI model that could be used for integration, but I do not use it for integration. You can use the SCVI embeddings for batch correction, but you don't have to. If you need to do batch correction anyway and you plan to use SCVI for label transfer, I would just recommend using SCVI for everything. It will also give you normalized counts, etc., if you want to do differential expression. My long one-hour video goes into this in depth.
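As a rough illustration of the reply above, these are the scvi-tools model methods one would typically reach for after training: the latent representation as a batch-corrected embedding, plus model-based normalized expression and differential expression. This assumes `vae` is a trained `scvi.model.SCVI` set up on `adata` and that `adata.obs` has a `cell_type` column to group by; the helper name and the `X_scVI` / `scvi_normalized` keys are illustrative.

```python
def use_scvi_outputs(vae, adata, groupby="cell_type"):
    """Sketch: reuse a trained scVI model for integration, normalization and DE."""
    import scanpy as sc  # imported lazily so the helper can be defined without scanpy

    # Batch-corrected latent space, used in place of PCA for the kNN graph and UMAP.
    adata.obsm["X_scVI"] = vae.get_latent_representation(adata)
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.umap(adata)

    # Denoised expression rescaled to 10,000 counts per cell, stored as a layer.
    adata.layers["scvi_normalized"] = vae.get_normalized_expression(
        adata, library_size=1e4, return_numpy=True
    )

    # scVI's differential expression, each group vs. the rest; returns a DataFrame.
    return vae.differential_expression(adata, groupby=groupby)
```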

    • @suryakoturan7832 · 1 year ago

      @@sanbomics Thank you! I will give this a go as you suggested.
      Can scVI also be used on velocyto outputs with spliced and unspliced counts? I would like to eventually use CellRank, but I'm still working out issues with integration, batch correction and labelling.

    • @sanbomics · 1 year ago

      That's an interesting question, but I don't know the answer. Would you want a normalized count matrix of spliced/unspliced counts, so that for every gene you actually have two features? Maybe possible, but it might require a little hacking.

    • @suryakoturan7832 · 1 year ago

      @@sanbomics Yes, we would need to normalise both spliced and unspliced counts. I'm not sure if this is the right way or if it's even possible. I basically want integrated, batch-corrected and labelled outputs from velocyto that I can use for scVelo and CellRank 🙃

  • @yauheniyazhdanovich5725 · 1 year ago

    Thanks for the tutorial! What is the correct way to label single cells automatically if I have multiple samples? Should I label each sample separately like in this tutorial, process them (incl. doublet removal), and perform the integration after that? Or should I first perform all the processing and integration steps like in your "Complete single-cell RNAseq analysis walkthrough", and then label single cells automatically (instead of manual mapping)?

    • @sanbomics · 1 year ago

      Good question. I think you could do it either way, since the integration is independent from the labeling. There might be clusters with a few cells labeled differently, so after integration you could set something up to call a consensus label for each cluster.
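The per-cluster consensus idea can be sketched in plain Python: count the predicted labels within each cluster and take the most common one. The function name and toy labels are made up for illustration; with AnnData one would pass columns such as a Leiden cluster column and a predictions column from `adata.obs`. Ties go to whichever label was seen first in that cluster.

```python
from collections import Counter, defaultdict


def cluster_consensus(clusters, labels):
    """Map each cluster to its most frequent predicted label."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


clusters = ["0", "0", "0", "1", "1"]
predicted = ["T cell", "T cell", "B cell", "B cell", "B cell"]

consensus = cluster_consensus(clusters, predicted)
# Relabel every cell with its cluster's consensus label.
per_cell = [consensus[c] for c in clusters]
```

Here the single "B cell" prediction in cluster 0 is overridden by the cluster majority, which is the kind of clean-up the reply describes.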

  • @mst63th · 2 years ago · +1

    Interesting approach for cell labeling. I'm working with mouse brain data generated with a droplet-based method (10X Genomics); it seems no Tabula Muris data is available for the mouse brain using the droplet method.

    • @sanbomics · 2 years ago

      Check the FACS data; I know they have that in TMS. It doesn't matter if the methods are different, it should still transfer the labels correctly. Just make sure the cell types are similar to what you expect in your data, because it will try to predict a label for each cell even if that cell type didn't exist in the reference. Personally, I usually use this method to confirm my manual annotation. If I am doing something high-throughput or automated (or if I am feeling lazy), I run it and go with the predictions. If I have one small dataset that I will do a lot of analysis with, I like to confirm annotations at least two ways.
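Because the classifier always assigns some reference label, one common safeguard is to treat low-confidence predictions as unassigned. A minimal sketch, assuming a per-cell confidence score such as the maximum class probability from scANVI's `predict(soft=True)`; the 0.5 cutoff and the `"Unknown"` placeholder are arbitrary illustrative choices, not values from the video.

```python
def mask_low_confidence(labels, scores, threshold=0.5, unknown="Unknown"):
    """Keep a predicted label only if its confidence score reaches the threshold."""
    return [label if score >= threshold else unknown
            for label, score in zip(labels, scores)]


predicted = ["Alveolar macrophage", "B cell", "T cell"]
max_prob = [0.97, 0.31, 0.50]
masked = mask_low_confidence(predicted, max_prob)
```

This only catches uncertain cells; a cell type that is missing from the reference can still be assigned confidently to a similar reference type, so the manual check described above remains necessary.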

    • @mst63th · 2 years ago · +1

      @@sanbomics Thanks for the comprehensive explanation. I tried the FACS data and it worked.

    • @sanbomics · 2 years ago · +1

      Great. I just used this method to label around 1 million cells this week, but I set up a little function to check the accuracy of each cluster and then removed the labels from the inaccurate ones. I still had to do a little manual labeling for those, but it did great on ~90% of the clusters. The issue always came from a cell type being in the query but not in the reference.

    • @mst63th · 2 years ago

      @@sanbomics Exactly, I got your point. I also annotated my data manually, but this method surprised me because I also got 90% accuracy, which is notable.
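The per-cluster accuracy check mentioned in this thread isn't shown, so the following is only a guess at one way to do it: without ground truth, measure how consistent the predictions are within each cluster (the share of cells carrying the cluster's majority label) and blank out the labels of clusters that fall below a cutoff, leaving those for manual annotation. `cluster_purity`, `drop_impure_labels` and the 0.7 cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(clusters, labels):
    """Share of each cluster's cells that carry that cluster's majority label."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {cluster: c.most_common(1)[0][1] / sum(c.values())
            for cluster, c in counts.items()}


def drop_impure_labels(clusters, labels, min_purity=0.7, unknown="Unknown"):
    """Replace labels in clusters whose purity is below min_purity."""
    purity = cluster_purity(clusters, labels)
    return [label if purity[cluster] >= min_purity else unknown
            for cluster, label in zip(clusters, labels)]


clusters = ["0", "0", "0", "0", "1", "1"]
predicted = ["T cell", "T cell", "T cell", "B cell", "B cell", "NK cell"]
purity = cluster_purity(clusters, predicted)
cleaned = drop_impure_labels(clusters, predicted)
```

Purity is a proxy for agreement, not accuracy against known labels: a cluster can be perfectly pure and still consistently wrong if its cell type is absent from the reference.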

  • @HimanshuChintalapudi · 1 year ago

    Excellent tutorial! What's the reason you ran 'sc.pp.highly_variable_genes' with flavor 'seurat_v3' and normalized "counts" instead of raw counts?

    • @sanbomics · 1 year ago

      Thanks! sc.pp.highly_variable_genes is using the raw counts from the 'counts' layer. To run on raw counts you need to use the flavor 'seurat_v3'. The count normalization I did before that goes into the X slot and isn't necessary or used in the tutorial; it's just that you will often need it in downstream analysis, so I threw it in there (side note: don't use target_sum anymore, as that is not recommended).
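A sketch of the preprocessing order described in the reply, assuming `adata.X` starts out as raw counts: copy the counts into a `counts` layer, log-normalize `X` for downstream use (with `normalize_total`'s default scaling rather than an explicit `target_sum`), keep a `.raw` snapshot, then select genes with the `seurat_v3` flavor on the counts layer. The helper name and `n_top_genes=2000` are illustrative; the `seurat_v3` flavor additionally requires the scikit-misc package.

```python
def preprocess_for_scvi(adata, n_top_genes=2000, batch_key=None):
    """Sketch: keep raw counts, log-normalize X, and pick HVGs from the raw counts."""
    import scanpy as sc  # lazy import; flavor="seurat_v3" also needs scikit-misc

    # Untouched raw counts: scVI and the seurat_v3 flavor both expect these.
    adata.layers["counts"] = adata.X.copy()

    # Log-normalized X for plotting and other downstream steps (not used by scVI).
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)
    adata.raw = adata  # snapshot of log-normalized values for all genes

    # seurat_v3 models raw counts, so point it at the counts layer.
    sc.pp.highly_variable_genes(
        adata,
        n_top_genes=n_top_genes,
        flavor="seurat_v3",
        layer="counts",
        batch_key=batch_key,
        subset=True,
    )
    return adata
```

Normalizing before subsetting keeps the size factors based on all genes rather than only the selected ones.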

  • @daniel98carvalho · 1 year ago

    I'm trying to label my clusters in a t-SNE plot, which I could do manually; however, I found this video and it seems much better. I am on an Apple Silicon Mac, so I installed Homebrew and installed scvi-tools that way. When I try to run the scvi.model method, it says that it is not an attribute of the scvi package. Any tips on how to solve this?

    • @sanbomics · 1 year ago · +1

      I recommend setting up a Python virtual environment with Miniconda. Also, it's great and fast... but only as good as your reference dataset. If your reference dataset has different cell types than your data, you will get a bad transfer, so it's good to always double-check afterwards. I like using this method to inform the manual labels I should use (unless I am trying to do a completely automated workflow, which is atypical for standard single-cell analysis).

  • @chaoma8150 · 1 year ago

    Really appreciate this series of tutorials of yours; they help me out a lot. I want to run PAGA after concatenating 83k cells from both peripheral blood and bone marrow. I tried to follow the walkthrough video, but I found it very hard to manually annotate the clusters, so I tried celltypist and label transfer with scvi-tools here. With celltypist I got very few majority-voting annotations, and I got very messy mapping with label transfer from the Tabula Sapiens single-cell dataset using scvi-tools. Both celltypist and label transfer gave clusters that overlap a lot, with no clear borders. I'm not quite sure how to proceed. Also, if I do majority voting using celltypist against the transferred labels, only 2 or 4 cell types can be annotated. Could you please tell me how to check which steps I can refine? (I already tried defining highly variable genes with scanpy and scry.)

    • @sanbomics · 1 year ago

      If there isn't a good reference that corresponds to the same cells in your dataset, then label transfer can be a bad idea. Of course, I don't know what your data are, but this is usually the culprit. Secondly, we sometimes put too much emphasis on clustering. It is not always the best determinant of cell heterogeneity. Sometimes people use a cluster-free approach to identifying cell populations, like one of these label transfer methods. If you know the reference corresponds well to your data and the cell annotations aren't completely mixed together randomly in UMAP space, then you could theoretically just go with the label transfer classifications.

    • @chaoma8150 · 1 year ago

      @@sanbomics Thanks for your reply😊. I am working on a published set of Microwell-seq data from bone marrow and peripheral blood samples, which I concatenated. I did label transfer using Tabula Sapiens bone marrow and blood as the reference. Can I assume the method is the problem, since Tabula Sapiens uses 10X and Smart-seq2, while the Microwell-seq data look low quality to me? Maybe I should try scANVI and celltypist again. Thanks again!
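One way to quantify how well two annotations (for example celltypist and scVI label transfer) agree is a per-cell cross-tabulation. A minimal sketch in plain Python, assuming both label sets are aligned to the same cells and use the same naming scheme ("B cell" vs. "B cells" would count as a disagreement); `label_agreement` and the toy labels are illustrative.

```python
from collections import Counter


def label_agreement(labels_a, labels_b):
    """Return the fraction of cells where two annotations match, and all (a, b) pair counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotations must cover the same cells")
    pairs = Counter(zip(labels_a, labels_b))
    matches = sum(n for (a, b), n in pairs.items() if a == b)
    return matches / len(labels_a), pairs


celltypist_labels = ["B cell", "T cell", "T cell", "NK cell"]
scvi_labels = ["B cell", "T cell", "NK cell", "NK cell"]
fraction, pairs = label_agreement(celltypist_labels, scvi_labels)
```

Large off-diagonal counts in `pairs` show which cell types the two methods confuse; with pandas, `pd.crosstab` on the two obs columns gives the same table.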

  • @bioinfo3 · 1 year ago

    Why do you read in the raw data again after training the model? adata is already there. If someone wanted to integrate a bunch of samples first and then do this label transfer, what approach should be taken then?

    • @sanbomics · 1 year ago

      Various downstream analyses require the raw data. In addition to the raw log counts, I now always keep the true raw count data as well.

  • @benmoss8905 · 1 year ago

    Thank you for your videos, they are incredibly helpful.
    I am stuck with this error:
    scvi.model.SCVI.setup_anndata(adata, layer = 'counts', batch_key='batch')
    vae = scvi.model.SCVI(adata)
    vae.train()
    AttributeError: module 'scvi' has no attribute 'model'
    I also get a warning that scvi was deprecated, so I tried with scvi-tools, but I get the same error. Does your notebook still work with scvi-tools? Thank you again!

    • @sanbomics · 1 year ago

      Did you set up the model before with scvi.model.SCVI.setup_anndata?
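When `import scvi` succeeds but `scvi.model` is missing, one plausible cause is that the name `scvi` resolves to something other than scvi-tools, such as the older, deprecated `scvi` distribution or a local file named `scvi.py` in the working directory. A small standard-library check, offered as a diagnostic sketch (the helper name is hypothetical):

```python
from importlib import metadata, util


def scvi_install_report():
    """Show which scvi distributions are installed and where `import scvi` would load from."""
    report = {}
    for dist in ("scvi-tools", "scvi"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None  # not installed in this environment
    spec = util.find_spec("scvi")
    report["import_path"] = spec.origin if spec is not None else None
    return report


print(scvi_install_report())
```

If both distributions show up, or `import_path` points somewhere unexpected, a clean environment containing only scvi-tools (such as the Miniconda setup recommended earlier in the thread) is likely the simplest fix.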

  • @huntersmith1413 · 4 months ago

    I'm confused: what data did the model use to make the prediction?