Multivariate D-Square statistics in R/Tocher method of clustering for genetic diversity assessment

  • Published: 12 Jul 2024
  • Hi there! In this video, I share the complete procedure for multivariate D² statistics in plant breeding, using Mahalanobis' squared generalized distance (D²) to assess genetic diversity in plants.
    This is a second-order statistic. It is not the most accurate method compared with Metroglyph analysis, but it can handle a larger number of individuals; it works with replicated data only.
    ~ Here I use the biotools package, developed by Prof. Anderson da Silva, to cluster with the Tocher and modified Tocher methods and to obtain the inter- and intra-cluster distances.
    ~ The dataset used here is from
    sites.google.com/site/tnausta...
    ~ To cross-verify the results, if you doubt the credibility of the analysis:
    drive.google.com/file/d/1DcuE...
    Both of them are from TNAU Stat.
    0:00 Introduction
    3:18 Coding
    Script
    docs.google.com/document/d/1-...
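For readers who want to see what the D² step computes: for genotypes i and j with trait-mean vectors x̄i and x̄j, D² = (x̄i − x̄j)' S⁻¹ (x̄i − x̄j), where S is the variance-covariance matrix used in the analysis (typically the error, i.e. residual, matrix from the MANOVA). The Python sketch below uses only the standard library. It is not the biotools implementation, and the function names and numbers are illustrative assumptions. It raises an error when the number of traits differs between the means and S, or when S cannot be inverted.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Use the row with the largest absolute pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (nearly) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quad_form(diff, inv):
    """Compute diff' * inv * diff."""
    k = len(diff)
    return sum(diff[r] * inv[r][c] * diff[c] for r in range(k) for c in range(k))


def d2_matrix(means, cov):
    """Pairwise D^2 for {genotype: [trait means]} given a covariance matrix."""
    p = len(cov)
    if any(len(row) != p for row in cov) or any(len(m) != p for m in means.values()):
        raise ValueError("number of traits differs between the means and the covariance matrix")
    inv = invert(cov)  # invert once, reuse for every pair
    return {
        (a, b): quad_form([x - y for x, y in zip(means[a], means[b])], inv)
        for a in means for b in means
    }


def d_squared(mean_i, mean_j, cov):
    """D^2 for a single pair of genotype mean vectors."""
    return d2_matrix({"i": mean_i, "j": mean_j}, cov)[("i", "j")]


if __name__ == "__main__":
    # Illustrative numbers only: 3 genotypes, 2 traits, diagonal covariance matrix
    means = {"G1": [10.0, 5.0], "G2": [12.0, 5.0], "G3": [10.0, 9.0]}
    S = [[4.0, 0.0], [0.0, 16.0]]
    D2 = d2_matrix(means, S)
    print(D2[("G1", "G2")], D2[("G1", "G3")])  # 1.0 1.0
```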

Comments • 133

  • @Guruprasad_A · 2 years ago

    Check the script in the description.

  • @bandhanthapa7982 · 8 months ago · +1

    Thank you for the video, it was really informative and simple to execute 😇😇

  • @PlantBreeding_is_my_passion · 2 years ago · +1

    Very good, hard work and a wonderful video. Thanks, bro.

  • @shivangitare7102 · 11 months ago

    Sir, what is the script to find the cluster mean values from the Tocher method?

  • @aswathiap5592 · 2 years ago

    Just a doubt: can we do univariate analysis for clustering when we have n genotypes studied for, say, m traits?

    • @Guruprasad_A · 2 years ago

      Univariate = one variable.
      Multivariate = more than one variable.
      In our context, variable = trait.

  • @jaybabariya141 · 3 years ago · +1

    Thank you for this amazing video. I have one doubt: how do I get a cluster mean table using R?

    • @Guruprasad_A · 3 years ago

      If the output were a list, we could use the aggregate function (see the last part of my cluster analysis video). Unfortunately it is not a list, so aggregate is hard to use: we would have to build the list manually, which takes more time. Instead, add an extra column before the genotypes, fill it with each genotype's cluster membership (1, 2, 3, ...), sort the rows by that column, and calculate the means by adding an extra row at the end of each cluster.
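The manual procedure in the reply above (add a cluster-membership column, sort by it, average each trait within each cluster) can be sketched in a few lines of standard-library Python. The genotypes, cluster numbers and trait values are hypothetical; in practice the membership would be copied from the Tocher output.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(rows, traits):
    """rows: list of dicts with a 'cluster' key plus one key per trait.

    Returns {cluster: {trait: mean}} ordered by cluster number.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)  # same effect as sorting by the membership column
    return {
        c: {t: mean(r[t] for r in groups[c]) for t in traits}
        for c in sorted(groups)
    }


if __name__ == "__main__":
    # Hypothetical data: genotype, cluster membership, two traits
    data = [
        {"genotype": "G1", "cluster": 1, "yield": 20.0, "height": 90.0},
        {"genotype": "G2", "cluster": 2, "yield": 15.0, "height": 110.0},
        {"genotype": "G3", "cluster": 1, "yield": 24.0, "height": 94.0},
    ]
    for cluster, row in cluster_means(data, ["yield", "height"]).items():
        print(cluster, row)
```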

  • @thomasmuse583 · 11 months ago

    Is it possible to analyze a lattice design experiment conducted over locations using the explained script?

    • @Guruprasad_A · 11 months ago

      If you have replications, yes.

  • @dibsohbordoloi7952 · 2 years ago

    Sir, I found an error in the MANOVA part of the code. RStudio showed: Error in is.factor(x): 'Genotype' not found

    • @Guruprasad_A · 2 years ago

      Are there any missing genotypes?
      Please check the name of the genotype column in your dataset.

  • @Guruprasad_A · 3 years ago

    Hi everyone, don't hesitate to ask your doubts here. I will try to answer them.

    • @prawalpsverma4903 · 5 months ago

      Sir, please tell me how to make the image of inter-cluster and intra-cluster distances.

  • @jaybabariya141 · 3 years ago

    Hello sir, I got this error: Error in summary.manova(mod) : residuals have rank 17 < 18
    I have 18 characters. What does that mean, and what should I do?

    • @Guruprasad_A · 3 years ago

      stackoverflow.com/questions/39412865/error-in-summary-manova-residuals-have-rank-order-deficiency

    • @Guruprasad_A · 3 years ago

      Read the above article.

  • @rakeshgowda7982 · 2 years ago

    The video was very clear. I got an error while executing summary(mod): "Error in summary.manova(mod) : residuals have rank 3 < 8". Can you please help me rectify this error?

    • @Guruprasad_A · 2 years ago · +1

      stackoverflow.com/questions/39412865/error-in-summary-manova-residuals-have-rank-order-deficiency
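A likely reading of "residuals have rank k < p", offered as a sketch rather than a diagnosis: the residual SSCP matrix of the MANOVA can never have a rank higher than the error degrees of freedom. Assuming the additive model dv ~ genotype + replication with one plot per genotype per replication, the error df is (g − 1)(r − 1) for g genotypes and r replications; for instance, 18 genotypes in 2 replications give 17 error df, fewer than 18 traits. Traits that are exact (or near) linear combinations of other traits produce the same message even when the df are sufficient. The helper names below are hypothetical.

```python
def rcbd_error_df(n_genotypes, n_reps):
    """Error df for the additive model dv ~ genotype + replication (one plot per genotype per rep)."""
    return (n_genotypes - 1) * (n_reps - 1)


def manova_rank_possible(n_genotypes, n_reps, n_traits):
    """True if the residual matrix *can* be full rank.

    This is necessary, not sufficient: linearly dependent traits
    still make the residual matrix rank-deficient.
    """
    return rcbd_error_df(n_genotypes, n_reps) >= n_traits


if __name__ == "__main__":
    for g, r, p in [(18, 2, 18), (18, 3, 18), (4, 2, 8)]:
        df = rcbd_error_df(g, r)
        print(f"g={g}, r={r}, traits={p}: error df={df}, full rank possible={manova_rank_possible(g, r, p)}")
```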

  • @sukumartaria3582 · 1 year ago

    Can we get a p-value for each cluster based on Mahalanobis distance, or a p-value for each cluster in LDA? If yes, please share the code.

  • @khushwantb.choudhary8157 · 2 years ago

    How can we do D² for data from two environments (pooled), for example control and drought? I want to calculate D² clustering for the pooled data involving both environments. Kindly tell me.

    • @Guruprasad_A · 2 years ago

      Check this once in STAR or PBTools.
      There may be an option there.

    • @Guruprasad_A · 2 years ago

      It's better to do it separately for each environment.

  • @vijaydunna8088 · 1 year ago

    Dear The Outlier, thank you for the very good video. I am facing a problem: this code forms only 2 clusters in total, while the same data analysed with WINDOSTAT gives 25 clusters. Why this difference?

    • @Guruprasad_A · 1 year ago

      Are you getting 2 clusters with both methods?

    • @vijaydunna8088 · 1 year ago

      @Guruprasad_A Yes, with both Tocher and modified Tocher.

    • @Guruprasad_A · 1 year ago

      Can you check the distance matrix, sir, whether both are the same or not?

    • @niharikadhuliya6219 · 1 year ago

      @Guruprasad_A Hi

  • @aswathiap5592 · 2 years ago

    Sir, I'm getting an error at the D2.dist step. I did as shown in the video, but I get an error saying the dimensions are incompatible. I have 12 characters and 18 genotypes, please help.

    • @Guruprasad_A · 2 years ago

      I think the number of variables you used to calculate the variance-covariance matrix does not match the number of variables in the averaged data.

    • @aswathiap5592 · 2 years ago

      Thank you

  • @amitrana734 · 2 years ago

    Sir, can you suggest how I can make a cluster diagram using some other software, if it is not possible in R? Please also tell me how I can make a Tocher-based dendrogram in R.

    • @Guruprasad_A · 2 years ago

      You can use WINDOSTAT software if you have it.

  • @deepikachandrasekaran3554 · 8 months ago

    Sir, in R, while doing linear discriminant analysis, it shows that the figure margins are too large. Is there any solution?

    • @Guruprasad_A · 8 months ago

      Try taking out one variable at a time and redoing the analysis until you find the variable that might be causing the error.

    • @deepikachandrasekaran3554 · 8 months ago

      @Guruprasad_A Thank you sir, I will try and let you know.

    • @deepikachandrasekaran3554 · 8 months ago

      @Guruprasad_A Sir, I tried the method you suggested, but the same error happens: plot.new(): figure margins too large

  • @saujanbashyal1247 · 2 years ago

    Can we define the number of clusters in Tocher's method? I am getting 3 clusters, but if I want 4 clusters, how would I do that?
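In the commonly described form of Tocher's method, the number of clusters is not an input. A threshold θ, the largest of the minimum distances of each genotype to all the others, decides when a cluster is closed. The sketch below follows that description in plain Python. It opens a cluster with the closest remaining pair, keeps adding the genotype with the smallest total distance to the members while that genotype's average distance to them stays within θ, and starts a new cluster otherwise. It illustrates the rule under those assumptions and is not biotools' tocher() code (tie-breaking and the modified, sequential variant may differ); the distances are made up. Getting exactly 4 groups would instead call for a hierarchical method cut at 4 clusters.

```python
from itertools import combinations


def tocher(names, d):
    """Original Tocher clustering.

    names: list of genotype labels; d: nested dict with d[a][b] = distance (symmetric).
    Returns (theta, clusters).
    """
    # theta = the largest of each genotype's smallest distance to any other genotype
    theta = max(min(d[a][b] for b in names if b != a) for a in names)
    remaining = list(names)
    clusters = []
    while remaining:
        if len(remaining) == 1:  # a lone leftover genotype forms its own cluster
            clusters.append([remaining.pop()])
            break
        # Open a new cluster with the closest remaining pair
        a, b = min(combinations(remaining, 2), key=lambda pair: d[pair[0]][pair[1]])
        cluster = [a, b]
        remaining.remove(a)
        remaining.remove(b)
        while remaining:
            # Candidate = remaining genotype with the smallest total distance to the cluster
            k = min(remaining, key=lambda x: sum(d[x][m] for m in cluster))
            # Add it only if its average distance to the members does not exceed theta
            if sum(d[k][m] for m in cluster) / len(cluster) <= theta:
                cluster.append(k)
                remaining.remove(k)
            else:
                break
        clusters.append(cluster)
    return theta, clusters


def symmetric(pairs):
    """Build d[a][b] from {(a, b): distance}, filling in both orders."""
    d = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d


if __name__ == "__main__":
    # Made-up distances: A, B, C are close to each other; D and E are close to each other
    pairs = {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 2, ("D", "E"): 1,
             ("A", "D"): 10, ("A", "E"): 10, ("B", "D"): 10, ("B", "E"): 10,
             ("C", "D"): 9, ("C", "E"): 9}
    theta, clusters = tocher(["A", "B", "C", "D", "E"], symmetric(pairs))
    print(theta, clusters)  # 2 [['A', 'B', 'C'], ['D', 'E']]
```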

  • @truongphu7407 · 2 years ago

    Thanks to the author for a valuable video. Could you tell me how to create Fig. 3 (inter-cluster and intra-cluster distance...) at the end of the clip? Thanks again.

    • @Guruprasad_A · 2 years ago

      Imagine you have 5 clusters. Draw a pentagon (5 sides) and draw a circle at each vertex, so 5 circles in total. Label the circles serially and write the intra-cluster distance inside each one. Then connect all the circles to each other. On each connecting line, write the inter-cluster distance between the two clusters it joins: for example, on the line connecting the 3rd and 5th circles, write the inter-cluster distance between clusters 3 and 5. Do the same for all the other pairs of circles.

    • @truongphu7407 · 2 years ago

      @Guruprasad_A Thanks for your prompt reply. Do you mean we need to create it manually?

    • @Guruprasad_A · 2 years ago

      Yup

    • @sourabhkumar1285 · 2 years ago

      @truongphu7407 How do we create it manually?
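The manual drawing described in this thread (one circle per cluster at the corners of a regular polygon, the intra-cluster distance inside each circle, the inter-cluster distance on each connecting line) can be prepared with a small script that computes where the circles and labels go. The shapes can then be placed in PowerPoint or any plotting tool. Below is a standard-library Python sketch; the distance values are placeholders, not results from the video.

```python
import math
from itertools import combinations


def polygon_layout(intra, inter, radius=1.0):
    """Place one circle per cluster on a regular polygon and list the connecting lines.

    intra: {cluster: intra-cluster distance}; inter: {(i, j): inter-cluster distance}.
    Returns (centres, edges) where each edge is (i, j, distance, label_position).
    """
    clusters = sorted(intra)
    k = len(clusters)
    centres = {}
    for idx, c in enumerate(clusters):
        # First circle at the top, the rest clockwise at equal angles
        angle = math.pi / 2 - 2 * math.pi * idx / k
        centres[c] = (round(radius * math.cos(angle), 3), round(radius * math.sin(angle), 3))
    edges = []
    for a, b in combinations(clusters, 2):
        value = inter[(a, b)] if (a, b) in inter else inter[(b, a)]
        (xa, ya), (xb, yb) = centres[a], centres[b]
        edges.append((a, b, value, ((xa + xb) / 2, (ya + yb) / 2)))  # label at the midpoint
    return centres, edges


if __name__ == "__main__":
    # Placeholder distances for 3 clusters (not results from the video)
    intra = {1: 1.2, 2: 2.0, 3: 1.5}
    inter = {(1, 2): 5.1, (1, 3): 7.3, (2, 3): 4.4}
    centres, edges = polygon_layout(intra, inter)
    for c, xy in centres.items():
        print(f"circle {c} at {xy}, write {intra[c]} inside")
    for a, b, value, mid in edges:
        print(f"line {a}-{b}: write {value} near {mid}")
```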

  • @s.husain6125 · 1 year ago

    Hello sir, the code works up to the covariance calculation, but when I run imp

    • @Guruprasad_A · 1 year ago

      It means some variables in your dataset are highly correlated. Please check the correlations among the variables and try to remove the redundant ones.
      For example, you may find strong correlation between days to first flowering and days to fifty percent flowering.

  • @mahidargowd3432 · 10 months ago

    Bro, how do we plot the dendrogram? We are getting the clusters, but how do we plot them? Can we get that process?

    • @Guruprasad_A · 10 months ago

      A dendrogram is not possible with the Tocher method.

  • @nehabelsariya5898 · 10 months ago

    Thank you so much, sir, for this video. I would like to ask you about the diagram for the Tocher method. How can we make it?

    • @Guruprasad_A · 10 months ago

      We can't make it; it wouldn't be appropriate, because Tocher is not a hierarchical method of clustering.

    • @nehabelsariya5898 · 10 months ago

      @Guruprasad_A Then how did you make the last diagram in the video, the diagram of 169 genotypes by the Tocher method?

  • @sukumartaria3582 · 1 year ago

    Any R code for plotting cluster diagrams with the Tocher method?

  • @rounakkumar6259 · 1 year ago

    How do I draw a dendrogram after the modified Tocher method? Please help.

  • @saujanbashyal1247 · 2 years ago

    Isn't it necessary to scale the data?
    I think we should scale both the dv matrix and the averaged data frame.

    • @Guruprasad_A · 2 years ago · +1

      Calculating pairwise Mahalanobis distances is itself a kind of scaling.

    • @saujanbashyal1247 · 2 years ago

      @Guruprasad_A
      Will it be valid if I use:
      mod

    • @Guruprasad_A · 2 years ago · +1

      I made this video based on the information I got from the original author. I hope you follow what I have shown in the video.

    • @Guruprasad_A · 2 years ago · +1

      I hope this helps you understand that calculating the Mahalanobis distance already takes care of scaling: stats.stackexchange.com/questions/210155/mahalanobis-distance-and-feature-scaling
      towardsdatascience.com/mahalonobis-distance-and-outlier-detection-in-r-cb9c37576d7d
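A quick numeric check of the point made in this thread: if a trait is multiplied by a constant (for example, a change of units), its mean differences, variances and covariances change accordingly, and D² comes out the same, while the plain squared Euclidean distance does not. Below is standard-library Python for two traits; the numbers are arbitrary.

```python
import math


def inv2(s):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def d2(x, y, s):
    """Mahalanobis D^2 between two 2-trait mean vectors with covariance matrix s."""
    inv = inv2(s)
    diff = [x[0] - y[0], x[1] - y[1]]
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


def rescale(vec, cov, factors):
    """Multiply trait i by factors[i]: means scale by f_i, covariances by f_i * f_j."""
    new_vec = [v * f for v, f in zip(vec, factors)]
    new_cov = [[cov[i][j] * factors[i] * factors[j] for j in range(2)] for i in range(2)]
    return new_vec, new_cov


if __name__ == "__main__":
    # Arbitrary example: two genotype means and a covariance matrix for 2 traits
    x, y = [10.0, 5.0], [12.0, 6.0]
    S = [[4.0, 1.0], [1.0, 2.0]]
    factors = [1000.0, 0.1]  # e.g. a change of units for each trait
    xs, Ss = rescale(x, S, factors)
    ys, _ = rescale(y, S, factors)
    before, after = d2(x, y, S), d2(xs, ys, Ss)
    print(before, after, math.isclose(before, after))  # D^2 is unchanged
    euclid = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    print(euclid(x, y), euclid(xs, ys))  # squared Euclidean distance changes
```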

  • @yellankinaveen2496 · 2 years ago

    Error in model.frame.default(formula = dv ~ as.factor(genotype) + as.factor(Repl), :
    variable lengths differ (found for 'as.factor(genotype)')
    In addition: Warning message:
    In xtfrm.data.frame(x) : cannot xtfrm data frames
    I am facing this error.
    Please help me solve this problem.

    • @Guruprasad_A · 2 years ago

      That error occurs because the variables in the model do not have the same length; for example, the numbers of genotype and replication entries do not match.
      Check whether some of them are missing.

    • @yellankinaveen2496 · 2 years ago

      @Guruprasad_A Sir, I am a beginner in programming. Can you tell me how to read the error?

    • @Guruprasad_A · 2 years ago

      @yellankinaveen2496 Check whether all the genotypes and replications are present or not.
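"variable lengths differ" means the response matrix and the factor columns in the model formula do not have the same number of rows. A simple pre-check, sketched here in standard-library Python under the assumption of one plot per genotype per replication, is to confirm that every genotype occurs exactly once in every replication. The function name and the example layout are hypothetical.

```python
from collections import Counter


def check_balance(plots):
    """plots: list of (genotype, replication) pairs, one per plot.

    Returns a list of problems; an empty list means every genotype
    occurs exactly once in every replication.
    """
    counts = Counter(plots)
    genotypes = sorted({g for g, _ in plots})
    reps = sorted({r for _, r in plots})
    problems = []
    for g in genotypes:
        for r in reps:
            n = counts[(g, r)]
            if n == 0:
                problems.append(f"{g} is missing in replication {r}")
            elif n > 1:
                problems.append(f"{g} appears {n} times in replication {r}")
    return problems


if __name__ == "__main__":
    # Hypothetical layout: G2 was not recorded in replication 2, G1 was typed twice in rep 1
    plots = [("G1", 1), ("G2", 1), ("G1", 1), ("G1", 2), ("G3", 1), ("G3", 2)]
    for problem in check_balance(plots):
        print(problem)
```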

  • @s.husain6125 · 1 year ago

    How do I generate a dendrogram using Tocher's method in RStudio, sir?

    • @Guruprasad_A · 1 year ago

      A cluster diagram is not possible with this package.

  • @sasipriyas3012 · 3 years ago

    Sir, can you please share how to do the analysis without replicated data (augmented design)?

    • @Guruprasad_A · 3 years ago

      We can't do Tocher clustering on non-replicated data.

    • @Guruprasad_A · 3 years ago · +1

      Try Ward's minimum variance method instead, as shown in my cluster analysis video.

    • @sasipriyas3012 · 3 years ago · +1

      Thank you sir 👍

    • @RamendraSarma · 2 years ago

      @Guruprasad_A Why Ward's minimum variance method?

    • @Guruprasad_A · 2 years ago · +1

      @RamendraSarma It's subjective; you can use other methods such as average or complete linkage. But Ward's minimum variance method is the one most often used for hierarchical clustering in genetic diversity studies with morphological data.
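For readers curious about what Ward's minimum variance method does, here is a small agglomerative sketch in standard-library Python using the Lance-Williams update, starting from squared Euclidean distances. It only illustrates the merging rule. It is not the code from the cluster analysis video; in R a common route is hclust() with method = "ward.D2" on a Euclidean distance matrix. The data points are made up.

```python
def ward(points):
    """Agglomerative Ward clustering via the Lance-Williams update.

    points: list of trait vectors (one per genotype). Returns a list of
    (members, height) merges; heights are on the squared-distance scale
    (twice the increase in the within-cluster sum of squares).
    """
    members = {i: (i,) for i in range(len(points))}
    size = {i: 1 for i in members}

    def pair(a, b):
        return (a, b) if a < b else (b, a)

    # Start from squared Euclidean distances between single points
    d = {(i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
         for i in members for j in members if i < j}
    merges = []
    next_id = len(points)
    while len(members) > 1:
        i, j = min(d, key=d.get)  # cheapest merge under Ward's criterion
        height = d.pop((i, j))
        for k in members:
            if k in (i, j):
                continue
            ni, nj, nk = size[i], size[j], size[k]
            dki, dkj = d.pop(pair(k, i)), d.pop(pair(k, j))
            # Lance-Williams update for Ward's method
            d[(k, next_id)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * height) / (ni + nj + nk)
        members[next_id] = members.pop(i) + members.pop(j)
        size[next_id] = size.pop(i) + size.pop(j)
        merges.append((members[next_id], height))
        next_id += 1
    return merges


if __name__ == "__main__":
    # Made-up 2-trait data for 4 genotypes forming two obvious groups
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    for group, height in ward(pts):
        print(group, round(height, 3))
```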

  • @asingh9317 · 9 days ago

    Kindly suggest: when the number of genotypes is less than the number of traits, this code gives no results. It gave the best results when the number of genotypes was greater than or equal to the number of observations.

    • @Guruprasad_A · 9 days ago

      Not so, I think.

    • @asingh9317 · 9 days ago

      @Guruprasad_A Can we work with fewer genotypes and more observed traits? If so, then I have to check my data.

    • @Guruprasad_A · 9 days ago

      @asingh9317 Sure, but in order to study it, we need more genotypes.

    • @asingh9317 · 8 days ago

      @Guruprasad_A Got it... thanks a lot 🍁🍁

  • @maramakashreddy6819 · 2 years ago

    Sir, please help me solve this error:
    Error in is.factor(x) : object 'Genotypes' not found

    • @Guruprasad_A · 2 years ago

      Check the name of the genotypes column in the Excel sheet and confirm that it is the same as the one used in RStudio.
      Remember, R is case-sensitive.

    • @maramakashreddy6819 · 2 years ago

      Both are the same, sir, but it still shows the same error.

    • @senaitlegesse1709 · 2 years ago

      How do I fix this problem, please? I got the same problem.

    • @Guruprasad_A · 2 years ago

      Check the name of the genotypes column in Excel and in the code from our video; they have to be the same.

    • @mysterious9718 · 2 years ago

      How did you solve that, sir? I have the same error even though everything looks perfect.
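When R reports object 'Genotypes' not found although the names look identical, invisible differences are worth ruling out: a trailing space in the Excel header, different capitalisation, or a name changed on import (for example, read.csv() by default replaces spaces in column names with dots). The Python sketch below compares an expected name with the header row of a CSV export and lists near misses; the file content is hypothetical.

```python
import csv
import io


def find_column(header, expected):
    """Return (exact_match, near_misses) for an expected column name.

    Near misses differ only in surrounding or repeated spaces, letter case,
    or '.'/'_' used instead of spaces.
    """
    if expected in header:
        return True, []

    def norm(s):
        return " ".join(s.replace(".", " ").replace("_", " ").split()).lower()

    return False, [h for h in header if norm(h) == norm(expected)]


if __name__ == "__main__":
    # Hypothetical CSV export whose header has a trailing space after "Genotypes"
    text = "Genotypes ,Rep,Yield\nG1,1,20.5\n"
    header = next(csv.reader(io.StringIO(text)))
    print(find_column(header, "Genotypes"))  # (False, ['Genotypes '])
```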

  • @s.husain6125 · 1 year ago

    How do I analyse D² with 2 years of field trial data, sir?

    • @Guruprasad_A · 1 year ago

      Treat the data from multiple years as replications.

  • @vikramana7796 · 2 months ago

    Sir, can you explain in detail how to calculate cluster means?

    • @Guruprasad_A · 2 months ago

      Group them individually: separate the groups into different Excel sheets, then take the average.

  • @akshatahattiholi8016 · 11 months ago

    Hello sir,
    I didn't understand how to find cluster means. Can you explain, sir?

    • @Guruprasad_A · 11 months ago

      You need to calculate them manually, by separating the clusters in an Excel sheet.

    • @akshatahattiholi8016 · 11 months ago

      Sir, by separating clusters, do you mean that for a single trait we take the average of the genotypes present in one cluster?

    • @Guruprasad_A · 11 months ago

      Add one more column to the data with the cluster number, then sort by cluster number and take the average.

    • @akshatahattiholi8016 · 11 months ago

      Ok sir, got it. Thank you, sir.

  • @gayatrikumawat6703 · 2 years ago

    Please share the code for the last cluster diagram.

    • @Guruprasad_A · 2 years ago

      We can't get a cluster diagram with this package. The last diagram, which depicts inter- and intra-cluster distances, has to be created manually in Microsoft PowerPoint.

  • @bhumikasinghlodhi6632 · 1 year ago

    Thanks for the video.
    I am getting an error after executing the manova command:
    Error in `[[

    • @Guruprasad_A · 1 year ago

      I think it's a problem with your dataset, or you might have selected the wrong variables.

    • @bhumikasinghlodhi6632 · 1 year ago

      @Guruprasad_A I have checked the data many times but couldn't find the mistake. Can you help me find it?

    • @Guruprasad_A · 1 year ago

      @bhumikasinghlodhi6632 Send the dataset. I am quite busy right now, but I will try to clarify it by next Wednesday.

    • @bhumikasinghlodhi6632 · 1 year ago

      Please provide your email.

    • @Guruprasad_A · 1 year ago · +1

      gthings1597@gmail.com

  • @dibsohbordoloi7952 · 2 years ago

    Sir, how do we draw the Tocher clustering diagram? Please help me.

    • @Guruprasad_A · 2 years ago

      We can't generate a cluster diagram with this package.

    • @dibsohbordoloi7952 · 2 years ago

      @Guruprasad_A Is there any package, sir?

    • @Guruprasad_A · 2 years ago

      @dibsohbordoloi7952 Nope, not to my knowledge.

  • @sasipriyas3012 · 2 years ago

    Sir, can you make a video on Metroglyph analysis?

    • @Guruprasad_A · 2 years ago

      Right now I am busy with my coursework; I will look into it later.

  • @thepandemics5614 · 2 years ago · +2

    I used to visualise clustering with Cytoscape. Just try it.

    • @Guruprasad_A · 2 years ago

      Ok

    • @SonuLangaya · 2 years ago · +1

      Hi, is there any introductory tutorial on this, showing how it is done in Cytoscape?

    • @thepandemics5614 · 2 years ago

      @SonuLangaya Well, just install Cytoscape and transform the matrix into individual interactions (one row per pair). Import the file directly and you will get the Tocher graph according to the inter- and intra-cluster values.
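One way to read "transform the matrix according to individual interaction": turn the cluster distance matrix into an edge table with one row per pair of clusters and the distance as an edge attribute, plus a node table holding the intra-cluster distances, and import both as tab-separated files into Cytoscape. Below is a standard-library Python sketch. The column names (source, target, distance, name, intra_distance) are my own choice and must be mapped during import, and the values are placeholders.

```python
import csv
import io


def to_edge_table(names, matrix):
    """Upper triangle of a symmetric cluster distance matrix as (source, target, distance) rows."""
    rows = []
    for i, a in enumerate(names):
        for j in range(i + 1, len(names)):
            rows.append((a, names[j], matrix[i][j]))
    return rows


def to_tsv(header, rows):
    """Render a header plus rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder matrix: diagonal = intra-cluster distance, off-diagonal = inter-cluster distance
    names = ["Cluster1", "Cluster2", "Cluster3"]
    m = [[1.2, 5.1, 7.3], [5.1, 2.0, 4.4], [7.3, 4.4, 1.5]]
    # Edge table: one interaction per pair of clusters
    print(to_tsv(["source", "target", "distance"], to_edge_table(names, m)))
    # Node table: intra-cluster distances as a node attribute
    print(to_tsv(["name", "intra_distance"], [(n, m[i][i]) for i, n in enumerate(names)]))
```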

  • @Guruprasad_A · 2 years ago

    Use the tilde “~”, not a hyphen “-”, after dv in the model formula (dv ~ ...).

  • @sathyarajdurai394 · 1 year ago

    Where are the cluster means, bro?

    • @Guruprasad_A · 1 year ago

      You have to calculate them manually by grouping the varieties or treatments according to their cluster identity in an Excel sheet.

  • @basazinewdegu4181 · 2 years ago

    Thank you, sir, for this very important video. I got a problem while computing the percentage contribution for my data. The error said:
    Error in solve.default(cov) : system is computationally singular: reciprocal condition number = 1.54415e-19
    How can I correct it? Please give me your valuable comment.

    • @Guruprasad_A · 2 years ago

      It seems your data has too much correlation between the variables; please reduce it and see.

    • @basazinewdegu4181 · 2 years ago

      @Guruprasad_A Thank you very much!

    • @stardust2419 · 2 years ago

      @Guruprasad_A How do we rectify that?
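One way to act on "reduce the correlation" is to compute pairwise correlations between traits and drop one trait from each highly correlated pair before recalculating the covariance matrix. Below is a standard-library Python sketch; the 0.9 cut-off and the trait values are arbitrary assumptions, and which trait of a pair to keep remains a judgement call.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a trait is constant across genotypes")
    return sxy / sqrt(sxx * syy)


def redundant_traits(data, cutoff=0.9):
    """data: {trait: [values per genotype]}.

    Greedily flags the second trait of each pair with |r| >= cutoff,
    skipping pairs that involve an already flagged trait.
    """
    dropped = []
    for a, b in combinations(data, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(data[a], data[b])) >= cutoff:
            dropped.append(b)
    return dropped


if __name__ == "__main__":
    # Made-up trait means for 4 genotypes
    data = {
        "days_to_first_flowering": [40, 45, 50, 55],
        "days_to_50pc_flowering": [44, 49, 54, 60],
        "yield": [20, 18, 25, 22],
    }
    for a, b in combinations(data, 2):
        print(a, b, round(pearson(data[a], data[b]), 3))
    print("drop:", redundant_traits(data))
```

Pairwise screening will not catch a trait that is an exact or near linear combination of several others (for example, a total recorded alongside its components), which also makes the covariance matrix singular.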

  • @rnsarma8876 · 2 years ago

    Can you share the code?

    • @Guruprasad_A · 2 years ago · +1

      Sure, give me your email ID.

    • @Guruprasad_A · 2 years ago

      Or just send me a message at gthings1597@gmail.com.

  • @prawalpsverma4903 · 5 months ago

    Where is the as.factor function?

    • @Guruprasad_A · 4 months ago

      It's built into base R.

  • @dgeethanjali7177 · 2 years ago

    Could you please send the code?