Weighted Gene Co-expression Network Analysis (WGCNA) Step-by-step Tutorial - Part 1

Bioinformagician

39K views

Published: Aug 5, 2024
This is part 1 of a step-by-step tutorial on Weighted Gene Co-expression Network Analysis (WGCNA).
In this video I demonstrate how to perform Weighted Gene Co-expression Network Analysis (WGCNA) using an RNA-Seq dataset. I go over data manipulation, methods to detect outlier genes and samples in the dataset, normalization, picking a soft threshold, identifying modules and visualizing modules as a dendrogram. I hope you find this video helpful! I look forward to your comments in the comment section below!
Part 2 of this tutorial:
• Weighted Gene Co-expre...
Data:
www.ncbi.nlm.nih.gov/geo/quer...
Code:
github.com/kpatel427/RUclipsT...
WGCNA Tutorial:
horvath.genetics.ucla.edu/htm...
Chapters
0:00 Intro
0:40 WGCNA Workflow steps at a glance
1:09 Study Design
1:57 Fetch Data and read data in R
2:56 Get metadata using GEOquery package
5:00 Manipulate expression data
8:53 Quality Control - Remove outlier samples and genes; using goodSamplesGenes()
11:27 Detecting outliers using hierarchical clustering
12:22 Detecting outliers using Principal Component Analysis (PCA)
17:16 Data Normalization using vst() from DESeq2 package
20:51 Filtering out genes with low counts
22:38 Pick soft threshold
28:48 Identify Modules
31:15 maxBlockSize parameter
33:35 Get module eigengenes
34:34 Visualize modules as dendrogram
You can show your support and encouragement by buying me a coffee:
www.buymeacoffee.com/bioinfor...
To get in touch:
Website: bioinformagician.org/
Github: github.com/kpatel427
Email: khushbu_p@hotmail.com
#bioinformagician #bioinformatics #wgcna #coexpressionnetworks #geneexpression #scalefreenetworks #proteinproteininteractionnetworks #sequencing #coverage #samtools #depthofsequencing #samflag #sam #bam #alignment #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs

Comments • 99

@waffles764 26 days ago
You mean I actually get to say I got something done tomorrow at work?! Killer tutorial, thank you so much for this
@mocabeentrill 1 year ago ⁺⁵
You're so smart! You made it look so easy! It took me 2 full weeks to complete this analysis. Picking the right soft threshold for SFT was helpful for me. I thank you profusely.
@Bioinformagician 1 year ago ⁺¹
I am glad my video was helpful! Thank you!
@asiyazhao3820 1 year ago
This is absolutely AMAZING! GREAT job and Many thanks!
@sonaaritra 1 year ago
Thanks! Your suggestions were very helpful.
@amarjeetyadav5661 1 year ago
thank you very much for making these tutorial videos
@RamakrishnanRS 11 months ago ⁺¹
Great tutorial - I'm working through it slowly. One piece of advice I have is to avoid the tidyverse route when you rename columns. If you used a simple indexed `match(gsub())` call instead of pivoting longer, inner joining, then pivoting back wider, you'd not deal with the data at all, just with the vector of colnames. Saves a lot of memory that way.
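A minimal sketch of the `match(gsub())` idea from the comment above, in base R. The matrix, the `_counts` suffix, and the `pheno` lookup table are all made up for illustration; the real column names and metadata columns in the tutorial dataset may differ.

```r
# Hypothetical count matrix whose column names carry a suffix we want to drop
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("GSM1_counts", "GSM2_counts", "GSM3_counts")))

# Hypothetical lookup table: one row per sample, mapping IDs to readable names
pheno <- data.frame(id = c("GSM3", "GSM1", "GSM2"),
                    title = c("ctrl_2", "case_1", "ctrl_1"),
                    stringsAsFactors = FALSE)

# Strip the suffix, look each ID up in the table, and fail loudly on a miss
ids <- gsub("_counts$", "", colnames(counts))
idx <- match(ids, pheno$id)
stopifnot(!anyNA(idx))

# Only the vector of column names is touched; the data itself is never reshaped
colnames(counts) <- pheno$title[idx]
print(colnames(counts))
```

Because only the names vector changes, no long-format copy of the matrix is ever created, which is where the memory saving comes from.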
@nataliagarcia5404 1 year ago ⁺¹
amazing!! i was struggling with this
@user-mh7iv1rb9m 2 months ago
Your videos are so amazing
@dennisscheper1 2 months ago
Excellent. Thank you!
@amaliamurgueitio473 9 months ago ⁺³
Hi, thanks for this tutorial and your other videos. I followed your tutorial step by step; the only difference was that at the point where you used 14000 genes, I had to use 7000 because of RAM. Now I get this error, any idea how to fix it?
plotDendroAndColors(bwnet$dendrograms[[1]], cbind(bwnet$unmergedColors, bwnet$colors),
+ c("unmerged", "merged"),
+ dendroLabels = FALSE,
+ addGuide = TRUE,
+ hang= 0.03,
+ guideHang = 0.05)
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'.
@user-gf4qt9mt4r 8 months ago
I encountered the same error as you, did you solve it?
@hemangininaik0998 1 year ago ⁺⁸
Please make a tutorial on WGCNA with TCGA samples.
@bobby5625 1 year ago
This would be great! Please make one!
@jaykishansolanki2935 1 year ago
Happy Teachers' Day, ma'am! Thank you for providing these amazing tutorials that help me a lot 🎉🎉🎉
@Bioinformagician 1 year ago
I am really glad to hear my videos have been helpful! Thank you!
@PortleyPortions 1 year ago ⁺¹
The package "janitor" is excellent for cleaning up column names if you do not want to do it manually at 19:20
@saraalidadiani5881 1 year ago ⁺¹
Thank you again for an excellent video. Could you please explain how to choose the values for minModuleSize and maxBlockSize in blockwiseModules? Thank you in advance, looking forward to hearing from you!
@mocabeentrill 1 year ago ⁺⁵
Thanks!
@akshayavs3776 1 year ago
During my analysis I am getting a lot of genes in ME0 (they are not in any network), and when I compare to a trait I am getting the maximum correlation with this group. But I do also have good correlation with other modules as well. I am tweaking the number of genes selected for the analysis and the threshold params. But is there anything I am blatantly missing?
@pariaalipour61 3 months ago
Thank you for the amazing video. I was wondering: if I want to start from a Seurat object of single-cell data, how should I process the data to follow your tutorial?
@MasMariusb 1 year ago
Hi, nice video. May I ask why you use vst(counts) rather than the actual DESeq2 normalization process?
@learnersseekers904 1 year ago ⁺³
Can you please make a tutorial video on de novo RNA-seq assembly and its annotation?
@Bioinformagician 1 year ago
I will surely plan a video covering this. Thanks for the suggestion!
@mehwishwahid183 11 months ago
Very nice video. I have a couple of quick questions. 1) Is finding the trait-module relationship compulsory for WGCNA? If yes, then what is a trait file? I mean, what information should be included in the trait file?
2) How do I find/identify the hub genes after network modeling?
@harshitasharma3675 1 year ago
Hello ma'am, can you please explain how we can download data from GEO and convert the read count values to log fold change and p-value?
@merajulislam6179 11 months ago
Effective video
@harkhabarparnazar4499 1 day ago
Can you please elaborate on which data we should use here? Raw count data from featureCounts/htseq-count, or the expression data generated by StringTie?
@ps_scholar3407 1 year ago
Kindly make a tutorial on GWAS and eQTL analysis.
@namratasahu4247 1 year ago
Hello, thanks for the amazing tutorial. But I am getting an error after performing WGCNA. Could you please help me solve it: "too few genes with valid expression levels in the required number of samples"?
@grace-426 1 month ago
Thank you, ma'am. I want to know: is it essential to have phenotypic data for using this on my transcriptomics data?
@freezingtolerance7493 1 year ago
Hello, thanks for providing a good video. I just wonder... I got raw RNA-seq data, but I do not have metadata; it was not provided by the company that generated the RNA-seq data. Does this mean I should create the metadata myself?
@marziyehsalehi2290 6 months ago
It is really helpful, thank you. I have a question: what if the maxBlockSize is 5000? How should I change the rest of the code?
@athenanguyen442 1 year ago
Thank you so much for this! Do you recommend doing anything different for longitudinal data?
@Bioinformagician 1 year ago
It depends on the question you are asking. If you are interested in identifying genes that are significantly associated with a particular time point, then building a network for each time point individually would make sense. Otherwise, analyzing them all together would be the right way to go about it.
@sanjaisrao484 1 year ago
Thanks for providing the link to the tutorial, it was very useful
@sonialamba2767 1 year ago
Thank you so much for providing this video... I have a query: the dataset you have selected has a single text file, but if we have a dataset that has multiple text files, then how do we deal with it? Please help, as I am new to this field...
@Bioinformagician 1 year ago
Does each text file represent an individual sample?
@drgutharajasekar6275 11 months ago ⁺³
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'. Madam, I am getting this error. Please help me.
@stevendlg_ 26 days ago
running into the same issue just now, did you ever find a solution?
@AAK00419 1 year ago
Ma'am, in the hclust plot I am getting the height scale as 20, 40, 60. Is there any parameter to set the height scale to 200000, 600000?
@aytacoksuzoglu2975 1 year ago ⁺²
I used maxBlockSize = 7000, and when I tried to plot the last dendrogram I got this error:
"Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'."
Any idea why?
@aytacoksuzoglu2975 1 year ago
I solved it ("I guess"). I have low RAM capacity, so it divides the data into blocks. When I try to color the dendrogram it fails, because I have about 3 parts of data but only 1 part of colors (for all genes), so the lengths don't match. I figured out that much; let's see how I can fix it.
@amaliamurgueitio473 9 months ago
@aytacoksuzoglu2975 Hi, I have the same issue. May I ask how you fixed it?
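The diagnosis above matches how `blockwiseModules()` behaves when `maxBlockSize` is smaller than the number of genes: it returns one dendrogram per block, while `colors` and `unmergedColors` still cover all genes. A possible fix, sketched here under assumptions, is to subset the colors with `blockGenes` for each block before plotting. The `bwnet_demo` list and the `block_colors()` helper are hypothetical stand-ins so the sketch runs without WGCNA; the commented call shows how it might be applied to a real `bwnet`.

```r
# Hypothetical stand-in for the relevant parts of a blockwiseModules() result:
# 7 genes split into two blocks, with one color label per gene.
bwnet_demo <- list(
  blockGenes     = list(c(1, 2, 5, 7), c(3, 4, 6)),
  unmergedColors = c("blue", "blue", "grey", "red", "turquoise", "red", "blue"),
  colors         = c("blue", "blue", "grey", "grey", "blue", "grey", "blue")
)

# Hypothetical helper: the (unmerged, merged) colors restricted to one block
block_colors <- function(net, block) {
  genes <- net$blockGenes[[block]]
  cbind(unmerged = net$unmergedColors[genes], merged = net$colors[genes])
}

for (b in seq_along(bwnet_demo$blockGenes)) {
  cols <- block_colors(bwnet_demo, b)
  cat("block", b, ":", nrow(cols), "genes\n")
  # With a real result `bwnet` and WGCNA loaded, each block gets its own plot:
  # plotDendroAndColors(bwnet$dendrograms[[b]], block_colors(bwnet, b),
  #                     c("unmerged", "merged"), dendroLabels = FALSE,
  #                     addGuide = TRUE, hang = 0.03, guideHang = 0.05)
}
```

The number of rows passed to each plot then equals the number of leaves in that block's dendrogram, which is the condition the error message complains about.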
@suhasinivr5614 1 year ago
Hello ma'am, it would be really useful if you made a video on how to interpret the results (images) obtained from WGCNA
@Bioinformagician 1 year ago
I will surely plan on making a video on this. Thanks for the suggestion :)
@user-oe2qd9oq5i 7 days ago
Do you use count data to perform PCA? Shouldn't it be normalized data (TPM or CPM)?
@nazifahumaira4762 8 months ago
Hello ma'am, I am facing a problem. In my case, the author provided a normalized count matrix that has decimal values. Should I work with that one, since they did not provide any raw data?
@user-cz4qr4ot9x 8 months ago
When I read the data I downloaded into R, it says it is empty. Any solution? The data is a tar file. I also have it as a TXT file; when I read it into R using read.table or the read.delim function, it reads it into only two variables, with all the details in two columns. I am a beginner at R and not good with coding; any kind of help will be appreciated.
@user-pz5cb4zx3t 1 year ago ⁺²
Hi,
Thank you for the informative videos.
Due to my RAM (4 GB) I had to define '5000' instead of the '14000' that you used in one block. As a result I'm having problems with plotDendroAndColors, which does not show me the merged & unmerged part under the dendrogram. I've searched and I could not find a solution. Do you have any suggestions?
@sarahmohammed515 8 months ago
I have similar issues! Did you figure it out? 😢
@marziyehsalehi2290 6 months ago
The same question
@marziyehsalehi2290 6 months ago
please let me know if you could solve it
@adampassman 1 year ago
Thank you so much - can you recommend any packages for batch correction?
@Bioinformagician 1 year ago
You can use ComBat-seq for batch correction
@yipan3694 1 year ago
Hi, thanks for your video. It's really helpful! I have a question, however: what is randomSeed and what's the effect of changing it? I see the WGCNA manual also uses 54321. What's the difference between that and 1234? Thanks very much.
@Bioinformagician 1 year ago
A random seed makes the output of our R code reproducible. By setting a specific seed, the random processes in our script always start at the same point and hence lead to the same result. The result will not change if the seed is changed. You might want to set a different seed for your analysis; however, to ensure your results are reproducible, you should always use the same seed for the particular analysis.
@yipan3694 1 year ago
@Bioinformagician okay. Thanks very much.
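As a small base R illustration of the reproducibility point in the reply above (this uses `set.seed()` and `sample()` directly, not the `randomSeed` argument of WGCNA, and the seed values are just the two mentioned in the question):

```r
# Same seed, same draws: the random number stream restarts at the same point
set.seed(1234)
a <- sample(1:100, 5)
set.seed(1234)
b <- sample(1:100, 5)
print(identical(a, b))

# A different seed starts the stream elsewhere, so the raw draws generally differ
set.seed(54321)
other <- sample(1:100, 5)
print(other)
```

Whether a different seed changes the final modules depends on where randomness actually enters the analysis, so the safest habit is the one the reply recommends: fix one seed and keep it for that analysis.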
@divyaagrawal6740 1 year ago
Does having an equal number of, or matched, healthy and diseased patients matter for this analysis? Scientifically?
@anithabavikatte192 1 year ago
For my analysis, 0.4 is the highest R² value that I found, so can I go with that value to choose the power and mean connectivity?
@PoulomiChatterjee-me7oc 4 months ago
I was going through the same problem. Check if your expression matrix is in the right format. It should have samples in rows and genes in columns.
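To make the format check in the reply above concrete, here is a minimal sketch with a made-up genes-by-samples matrix. The object names `norm_counts` and `dat_expr` are illustrative only.

```r
# Hypothetical normalized matrix in the usual genes-by-samples layout
norm_counts <- matrix(c(5.1, 6.2, 7.3,
                        4.0, 4.1, 3.9),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("geneA", "geneB"),
                                      c("s1", "s2", "s3")))

# WGCNA functions expect samples in rows and genes in columns, so transpose
dat_expr <- t(norm_counts)

print(dim(dat_expr))       # samples x genes
print(rownames(dat_expr))  # should list sample names, not gene names
```

If the row names printed at the end are gene IDs rather than sample names, the matrix is still in the wrong orientation.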
@drgutharajasekar6275 5 months ago
Hi madam, are the significant genes taken from all the modules or only from the modules associated with the trait?
@nataliagarcia5404 1 year ago
Can you perform WGCNA analysis on a pre-filtered set of differentially expressed genes, in a more downstream analysis approach?
@Bioinformagician 1 year ago
WGCNA is an unsupervised method. It is NOT recommended for use on data that is pre-filtered for differentially expressed genes.
@abdullahaltulea142 1 year ago
Thanks for your effort. Do we have to batch correct before DESeq2? I read that DESeq2 does batch correction like this: design = ~ condition + batch.
@Bioinformagician 1 year ago
Batch effects need to be corrected for before DESeq2. If you have batch information in your colData in a column called "batch", then you could provide it in your design like you mentioned.
@user-ej1lh5wl8f 1 year ago
@Bioinformagician If I have done "design = ~condition + batch", then I don't need to use ComBat to remove the batch effect?
@saadzaheer3451 3 months ago
Hi there, does WGCNA work with TPM values? How should one proceed if all they have is TPM values? Regards
@user-hb5zf7ze4q 4 months ago
Please make a video on WGCNA with a microarray dataset. Please, please, please!
@narens8511 7 months ago
It says: "Error in data %>% gather(key = "samples", value = "counts") %>% data % : could not find function "%>%""
@amrsalaheldinabdallahhammo663 1 year ago
Thank you genius, can you please make a video about mitch and how to use it in R :)
@Bioinformagician 1 year ago ⁺¹
I will surely plan a video covering this :)
@amrsalaheldinabdallahhammo663 1 year ago
@Bioinformagician Thank you so much, really can't wait to watch it!!!
@bobby5625 1 year ago
Hi! Can I also use RSEM normalized gene expression data for WGCNA?
@Bioinformagician 1 year ago
You mean RPKM normalized gene expression data?
@abelardnsangou2794 1 year ago
Please can you do a tutorial on Gene Set Enrichment Analysis (the idea behind it), like you did for WGCNA?
@Bioinformagician 1 year ago
Sure, I'll definitely plan a video covering GSEA.
@abelardnsangou2794 1 year ago
@Bioinformagician Ok Thank you very much
@SaniyaKhullar 1 year ago
I also have some videos on my channel related to that. Please do check them out :)
@sonaaritra 1 year ago ⁺¹
Hello, thank you very much for making these tutorial videos. However, I have encountered an error while plotting the dendrogram with module colors shown at the end of this video. Previously, when I tried with the same dataset that you used in your analysis, it worked fine. But now I'm trying with one of my microarray datasets and I got the following error:
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'.
Due to this error, it is not generating the panel of colors at the bottom of the dendrogram. Please help me to sort out this problem.
@Bioinformagician 1 year ago ⁺¹
It is hard for me to recreate this error and troubleshoot it without code and data.
I can look into this if you can send me your code and normalized data.
@sonaaritra 1 year ago
@Bioinformagician Thanks. Should I email it to you?
@Bioinformagician 1 year ago
@sonaaritra yes please
@kartiksachdeva4323 1 year ago ⁺¹
Were you able to fix the error? If yes, could you please share the solution?
@SwedishRagers 1 year ago
I encountered the same error. How was this solved??
@Kaaaaaaaam 1 year ago
Why did you set TOMType = "signed"? I am trying to understand the difference between adjacency type and TOM type. See Signed vs. Unsigned Topological Overlap Matrix
Technical report by Langfelder: "The take-home message from these notes is this: signed TOM takes into account possible anti-reinforcing connection strengths that may occur in unsigned networks. Since the anti-reinforcing connection strengths (practically) cannot occur in signed networks, in signed networks the signed and unsigned TOM are (practically) identical".
Since you are using blockwiseModules instead of constructing the network step by step, I believe the adjacency type is "unsigned" by default. I think you want networkType to equal "signed".
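To illustrate the adjacency-type distinction raised in this comment, the sketch below applies the standard WGCNA adjacency definitions, |cor|^power for unsigned and ((1 + cor) / 2)^power for signed, to three made-up correlations. The power of 6 is purely illustrative and is not the value chosen in the video.

```r
# Hypothetical correlations between one gene and three others
r <- c(-0.9, 0.1, 0.9)
power <- 6  # illustrative soft-threshold power, not taken from the video

# Unsigned network: strong negative and strong positive correlations both connect
adj_unsigned <- abs(r)^power

# Signed network: correlations are shifted to [0, 1] first,
# so strongly negative correlations end up with near-zero adjacency
adj_signed <- ((1 + r) / 2)^power

print(round(rbind(r, adj_unsigned, adj_signed), 4))
```

In the unsigned case the genes correlated at -0.9 and +0.9 are treated as equally connected, whereas in the signed case only the positively correlated gene keeps a strong connection. This choice is controlled by `networkType`, separately from `TOMType`.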
@ramachandran8106 1 year ago
Please release "GWAS" tutorial videos....
@quinattasneemrafique536 8 months ago
Hello ma'am! It would be so helpful if you would provide your script for WGCNA as a file. It becomes difficult to note down every command
@Bioinformagician 8 months ago
You can get all my scripts from GitHub: github.com/kpatel427/RUclipsTutorials/blob/main/WGCNA.R
@athenanguyen442 1 year ago
What do you mean by merged and unmerged? Do you mean data merged with phenodata?
@Bioinformagician 1 year ago
Can you provide a timestamp?
@athenanguyen442 1 year ago
@Bioinformagician 35:05. Thank you!
@Bioinformagician 1 year ago
@athenanguyen442 Oh I meant modules before merging and modules after merging.
@ritikasingh8809 1 year ago
Is it necessary that the supplementary file be rawcounts.txt.gz? Please reply. And can I do co-expression analysis if the file is a raw.tar?
@fatimafarhan531 1 year ago
Thank you for this very informative video! I was applying your tutorial to my dataset; however, I kept receiving this error when running blockwiseModules:
Error in colSums(!is.na(datExpr[useSamples, useGenes])) :
'x' must be an array of at least two dimensions
I searched for it online but couldn't find an explanation. Could you help me please?
@nanditapuri1916 1 year ago
I got the same error! It is maybe because norm.counts is not 2-dimensional, as in lists in lists
@nanditapuri1916 1 year ago
So I removed the previous step that converts them into numeric, and it worked for me.
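One way such a "must be an array of at least two dimensions" error can arise is a numeric conversion that silently drops the matrix shape. The sketch below is a base R illustration with a made-up matrix, not a reconstruction of the commenter's data: `as.numeric()` strips the `dim` attribute, while changing the storage mode keeps it.

```r
# Hypothetical expression matrix that was read in as character strings
expr_chr <- matrix(c("1.5", "2.0", "3.5", "4.0"), nrow = 2,
                   dimnames = list(c("s1", "s2"), c("geneA", "geneB")))

# as.numeric() strips the dim attribute and returns a plain vector
flat <- as.numeric(expr_chr)
print(is.null(dim(flat)))

# Changing the storage mode converts the values and keeps dims and dimnames
expr_num <- expr_chr
storage.mode(expr_num) <- "double"
print(dim(expr_num))

# A cheap guard to run before handing the matrix to blockwiseModules()
stopifnot(is.matrix(expr_num), is.numeric(expr_num))
```

If the guard at the end fails on your own object, inspecting it with `str()` usually shows whether it has become a vector or a list.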
@RajeshKumarDutta 11 months ago
Thanks!
@emilyzhang2755 7 months ago
Thanks!

Up next

Weighted Gene Co-expression Network Analysis (WGCNA) Step-by-step Tutorial - Part 2