Single-cell gene co-expression | single-cell RNAseq methods

  • Published: 21 Aug 2022
  • How to test whether two genes are co-expressed across cells, or find all the genes that are co-expressed with a given gene. I do this with Pearson correlation tests on an AnnData (Python) single-cell object.
    Notebook:
    github.com/mousepixels/sanbom...
  • Science
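
As a rough illustration of the Pearson test described above, here is a minimal sketch on toy data. It is not the code from the linked notebook: the matrix, the gene names, and the helper `coexpression_with` are hypothetical stand-ins for a log-normalized cells x genes matrix such as a densified `adata.X`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy stand-in for a log-normalized cells x genes matrix (hypothetical data).
n_cells = 500
gene_names = ["target", "follower", "opposite", "noise"]
target = rng.normal(size=n_cells)
X = np.column_stack([
    target,
    target + rng.normal(scale=0.5, size=n_cells),   # rises with target
    -target + rng.normal(scale=0.5, size=n_cells),  # falls as target rises
    rng.normal(size=n_cells),                       # unrelated to target
])

def coexpression_with(X, gene_names, query):
    """Pearson r and p-value of every other gene against `query`."""
    q = X[:, gene_names.index(query)]
    results = []
    for j, name in enumerate(gene_names):
        if name == query:
            continue
        r, p = stats.pearsonr(q, X[:, j])
        results.append((name, float(r), float(p)))
    # Strongest correlations (of either sign) first.
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)

results = coexpression_with(X, gene_names, "target")
for name, r, p in results:
    print(f"{name:10s} r={r:+.2f} p={p:.2e}")
```

Sorting by the absolute value of r keeps strongly anti-correlated genes near the top instead of burying them at the bottom of the list.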

Comments • 38

  • @mst63th 1 year ago +1

    I'm afraid I didn't get what the purpose of this test is. Would you please briefly explain?

    • @sanbomics 1 year ago +2

      For two genes, is their expression correlated? In the video the example gene I use is Zika RNA expression: so which genes increase or decrease in cells with Zika. But for two endogenous genes, if they are correlated then they are likely affected by the same pathways/treatments/etc. They are likely in the same co-expression networks. An example: you are interested in p21 in your dataset. You then find p21 is correlated with several DNA damage genes, suggesting the p21 expression in your cells is driven by DNA damage.

    • @mst63th 1 year ago

      @sanbomics Thanks for the explanation. Can this approach be used as an equivalent to differential gene expression analysis?

    • @sanbomics 1 year ago +1

      They might give you some overlap, but they are fundamentally different. Two genes may be upregulated in a subpopulation of cells but not be correlated, suggesting their regulation comes from different genetic programs. The same goes for negatively correlated genes. Even within a given cluster there is heterogeneity, which allows you to find these correlations.

    • @mst63th 1 year ago

      @sanbomics Thank you so much for explaining in detail.

    • @sanbomics 1 year ago +1

      Happy to do so!
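
The distinction drawn in this thread between differential expression and co-expression can be sketched with simulated data. This is only a toy example under stated assumptions: all values are made up, genes A and B are both shifted upward in a hypothetical subpopulation but vary independently inside it, and gene C is built to follow gene A cell by cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400  # cells per group (hypothetical)

# Genes A and B are both higher in the subpopulation than in the rest,
# but vary independently from cell to cell inside it.
rest_a, rest_b = rng.normal(0, 1, n), rng.normal(0, 1, n)
sub_a, sub_b = rng.normal(3, 1, n), rng.normal(3, 1, n)

# Gene C tracks gene A cell by cell inside the subpopulation.
sub_c = sub_a + rng.normal(0, 0.5, n)

# "Differential expression" view: difference in group means.
shift_a = sub_a.mean() - rest_a.mean()
shift_b = sub_b.mean() - rest_b.mean()

# "Co-expression" view: correlation across cells within the subpopulation.
r_ab = np.corrcoef(sub_a, sub_b)[0, 1]
r_ac = np.corrcoef(sub_a, sub_c)[0, 1]

print(f"mean shift A={shift_a:.2f}, B={shift_b:.2f}")
print(f"within-subpopulation r(A,B)={r_ab:+.2f}, r(A,C)={r_ac:+.2f}")
```

Both A and B would show up as upregulated in the subpopulation, yet only A and C are correlated within it, which relies on the cell-to-cell heterogeneity mentioned above.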

  • @mocabeentrill 1 year ago

    Nice one! I completed coexpression networks in R last night. Keen to see your rendition🙏🏿

    • @sanbomics 1 year ago

      Thanks! I was going to do networks originally but then I realized the video was going to be way too long haha. So I decided to keep it simple and do that in the future

  • @uvdorian4722 1 year ago

    Looking forward to the expression network video!

    • @sanbomics 1 year ago

      If I ever get around to it xD

  • @yingyingliu308 1 year ago +1

    Thanks for the video! I'm wondering whether there is any R package that can do gene co-expression analysis of single-cell RNA-seq data?

  • @TakaTatsumi 11 months ago +1

    Thanks so much! I have a question about data preparation.
    In this video you used normalized data. However, the scatterplot of the two genes, made without log1p-processed data, showed a distribution concentrated at the bottom.
    Is it possible to do the correlation analysis with data processed by log1p?

    • @sanbomics 11 months ago

      You actually caught a mistake. I should have log1p'd it. Nice catch! (also nowadays it is recommended not to pass target_sum in the normalization, just use default)

    • @TakaTatsumi 11 months ago

      @sanbomics
      Thanks! I'm gonna analyze it with log1p.
      I've watched almost all of your videos. I look forward to your future videos!

    • @CarolinaDossena 9 months ago

      @sanbomics Hi, thank you for the nice content! Could you please point me to the source where these recommendations are discussed? I'd like to learn more about the best practices for count normalization. In the past, I've also set target_sum to 1e4, so this is an interesting change for me.
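
The log1p step discussed in this thread can be sketched in plain NumPy. This is an illustrative approximation on made-up counts, not the notebook's code: it scales each cell to the median total count (which is what scanpy's `sc.pp.normalize_total` does when `target_sum` is left at its default) and then applies log1p, as `sc.pp.log1p` would.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical raw counts: 300 cells x 50 genes, skewed like scRNA-seq data.
rates = rng.gamma(0.5, 4.0, size=(300, 50))
counts = rng.poisson(lam=rates).astype(float)

# Per-cell library size; guard against empty cells to avoid dividing by zero.
totals = counts.sum(axis=1, keepdims=True)
safe_totals = np.where(totals > 0, totals, 1.0)

# Scale every cell to the median library size instead of a fixed target_sum.
target = np.median(totals)
normalized = counts / safe_totals * target

# log1p compresses the long right tail so a few high-count cells
# no longer dominate a scatterplot or a Pearson correlation.
logged = np.log1p(normalized)

print("max before log1p:", normalized.max())
print("max after  log1p:", logged.max())
```

Running the correlation on `logged` rather than `normalized` is the correction acknowledged in the reply above.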

  • @michela.dallangelo 1 year ago

    Hi, thank you for the video! I was wondering how the same analysis could be performed per cell type :) I'm testing the co-expression of two genes, but I would like to see how they are correlated in each cell type. Have you tried it?

    • @sanbomics 1 year ago

      Sure, you should just be able to subset the data by the cells you want to test!

    • @michela.dallangelo 1 year ago

      @sanbomics thank you for the answer, it's what I did eventually! :) Also, when you plot the -log10 p-value you talk about the genes most co-expressed with Zika. Should you also consider the correlation r value to say whether those genes are really co-expressed? My doubt is: can r be negative and two genes still be co-expressed? I was thinking I should consider both the value of r, for a positive correlation, and the p-value (or its log10) to see if that correlation is significant.
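
The subsetting approach from this thread, reporting both r and the p-value per cell type, might look like the sketch below. The cell-type labels, the expression values, and the helper `correlation_by_group` are hypothetical; with an AnnData object the labels would come from a column of `adata.obs`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300  # cells per type (hypothetical)

cell_type = np.array(["T cell"] * n + ["B cell"] * n + ["Monocyte"] * n)
gene_x = rng.normal(size=3 * n)
noise = rng.normal(scale=0.5, size=3 * n)

gene_y = np.empty(3 * n)
gene_y[:n] = gene_x[:n] + noise[:n]                  # positive in T cells
gene_y[n:2 * n] = -gene_x[n:2 * n] + noise[n:2 * n]  # negative in B cells
gene_y[2 * n:] = rng.normal(size=n)                  # unrelated in monocytes

def correlation_by_group(x, y, groups):
    """Pearson r and p-value of x vs y, computed separately per group."""
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        r, p = stats.pearsonr(x[mask], y[mask])
        out[str(g)] = (float(r), float(p))
    return out

per_type = correlation_by_group(gene_x, gene_y, cell_type)
for name, (r, p) in per_type.items():
    print(f"{name:9s} r={r:+.2f} p={p:.2e}")
```

The p-value only indicates whether the correlation is distinguishable from zero; the sign of r is what separates genes that rise together from genes that move in opposite directions, so both values are worth reporting.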

  • @vjsanchezarevalo 1 year ago

    What do you recommend doing with inf values? I have several in my analysis; in fact the first five are inf. I don't know how to represent those results graphically. Thanks!

    • @sanbomics 1 year ago

      Which value is inf?

    • @vjsanchezarevalo 1 year ago

      @sanbomics After the Bonferroni correction I have inf values; in your example the Zika gene is inf. I have several genes with inf values. What should I do? Thanks!

    • @sanbomics 1 year ago +1

      Oh, you mean the -log10 p-value? You can leave it as inf, or you can replace the 0s with 2.2 × 10^-308 if the inf messes up your downstream analysis.
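
The replacement suggested in this reply can be sketched as follows. The p-values are made up, and using `np.finfo(np.float64).tiny` (about 2.2e-308, the smallest positive normal double) as the floor is one assumed way to obtain the constant mentioned above.

```python
import numpy as np

# Hypothetical raw p-values; the first underflowed to exactly 0.
p_values = np.array([0.0, 1e-300, 1e-50, 0.01, 0.5])

# Bonferroni correction: multiply by the number of tests, cap at 1.
n_tests = len(p_values)
adjusted = np.minimum(p_values * n_tests, 1.0)

# -log10(0) is inf, which breaks plot axes and some downstream steps.
with np.errstate(divide="ignore"):
    raw_scores = -np.log10(adjusted)

# Replace exact zeros with the smallest positive normal float64 first.
floor = np.finfo(np.float64).tiny  # about 2.2e-308
capped_scores = -np.log10(np.maximum(adjusted, floor))

print("raw   :", raw_scores)
print("capped:", capped_scores)
```

After capping, the formerly infinite entries sit at roughly 307.7, so they still rank at the top of a plot while staying finite.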

  • @justinasmus1190 1 year ago

    Long time lurker and great content by the way, I have referred a few other students to your videos already too. I actually wanted to know if I could ask your opinion on a potentially involved issue I am having with datasets I am analyzing, seeing as you have far more experience in the field than me and I am not having much luck with my supervisor. Please let me know if this would be something you'd be alright with and again great videos!

    • @sanbomics 1 year ago

      Thanks! Sure, what is the issue?

    • @justinasmus1190 1 year ago

      @sanbomics So it is quite involved, but basically one of the datasets I am working on was sequenced and processed by a private sequencing company. They give a report of the pre-processing steps taken on the data, but I am also analyzing another dataset where these steps were not performed, because I processed that data using only freely available software and my limited knowledge of transcriptomic analysis. The dataset from the company is a paired-end library generated on Illumina, while the dataset I analysed is a single-read library generated on the Ion Torrent Proton platform by a different company. The issue is that I want to compare these two datasets, as they both follow the same experimental concept, but I am worried that I need to pre-process them the same way, and I am not sure how to perform on the single-read data all the steps that the company performed on the paired-end data, as once again I am not that advanced. Another thing: a third dataset that I also need to include, which was sequenced and processed by the same company that generated the paired-end libraries, has one contaminated sample, so there are duplicates of the treatment condition but only a single replicate of the control condition. I know you can use NOISeq for situations like that, but is there maybe a GUI program I can use, as getting it right in R doesn't seem to be happening for me? Any advice you have would be greatly appreciated!

    • @sanbomics 1 year ago

      1) Are these single-cell or bulk data? 2) Do you have the counts tables for each, or do you need to generate the counts tables?

    • @justinasmus1190 1 year ago

      Whole-transcriptome datasets of organisms grown under control conditions and then under the test condition (in this case the test is co-culture). I have the counts tables for all datasets, but I'm not sure whether the pre-processing done before that point affects how comparable they are, given that it changes the number of genes left over for downstream analyses.

    • @sanbomics 1 year ago

      Usually there isn't much preprocessing done in bulk RNA-seq. It's recommended not to even remove duplicates, for example. When you set up the experiment in DESeq2 you will be able to include a batch variable that will hopefully correct for some of the differences. I would worry more if different genome builds were used, because gene names might not match up 100%, etc. If you have access to the raw data, it might be best for you just to rerun it all. It's pretty simple if you have access to a Linux machine; I have videos that go over it in depth.

  • @user-vw2bp7ev4o 1 year ago

    Thanks for sharing! Would you please explain how to distinguish two co-expressed genes between cancer cells and healthy cells? Is that workable? Thanks again~

    • @sanbomics 1 year ago

      I'm not 100% sure I get the question. Do you know which cells are cancer?

  • @jamilaiqbal202 3 months ago

    How is this different from pathway analysis?

  • @ishas.7508 1 year ago

    Where can we get this dataset?

    • @sanbomics 1 year ago

      Sorry, this dataset is not publicly available yet :(