I can't believe how much easier this was in comparison to my attempts with the msa package, thank you !
Thank you very much!!! If only people kept it short and sweet like this. Kudos... :)
Thank you Russ, this made my day.
Thanks for the detailed explanation. I could create a tree really easily with your video.
Thank you so much for this detailed video. It helped me a lot with my analysis. It would be a great help if you could also show how to analyze publicly available RNA-seq data from NCBI GEO.
Thanks for the detailed explanation. How do you run the bootstrap?
You can use the msa package and a few others for the Bayesian and ML analyses you would usually see in software like MEGA. Here's an RPubs post that goes through the basic process:
rpubs.com/mvillalobos/L01_Phylogeny
Very helpful guide. Question: Why use neighbor-joining instead of something like Maximum Likelihood to build your tree?
I just used the defaults for the tutorial. The packages offer several different methods that can be applied. I'm not sure whether this one has Bayesian inference, but I agree that either that or ML would probably be optimal!
Does anyone know how to build the tree using Maximum Likelihood instead of neighbor-joining?
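One possible way to do this, sketched with the phangorn package (not the method used in the video). The woodmouse alignment bundled with ape is an assumption chosen only so the example runs; substitute your own aligned sequences. The neighbor-joining tree is used just as a starting point for the likelihood search.

```r
library(ape)
library(phangorn)

# Example data: the woodmouse alignment bundled with ape (an assumption;
# replace it with your own aligned sequences).
data(woodmouse)
aln <- as.phyDat(woodmouse)

# Neighbor-joining tree on ML distances, used only as a starting tree.
start_tree <- NJ(dist.ml(aln))

# Likelihood object with 4 gamma rate categories, then optimise topology
# (NNI moves), branch lengths, GTR parameters and the gamma shape.
fit_start <- pml(start_tree, data = aln, k = 4)
fit_ml <- optim.pml(fit_start, model = "GTR", optGamma = TRUE,
                    rearrangement = "NNI",
                    control = pml.control(trace = 0))

ml_tree <- fit_ml$tree
print(logLik(fit_ml))
```

GTR with gamma is only one reasonable choice here; a model test would normally decide which substitution model to use.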
How do we define the node values?
Thanks!! How can we group sequences into different colors based on their taxonomic group?
Can you elaborate a bit more on what you want to do?
In the meantime here is the ggtree documentation, it might have what you're looking for.
4va.github.io/biodatasci/r-ggtree.html
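In case it helps, here is a minimal sketch of the idea using base ape plotting rather than ggtree. The tree is random, and the tip names, the group lookup, and the colours are all hypothetical placeholders; in practice the lookup would come from your own metadata.

```r
library(ape)

# Random example tree with made-up tip names (assumption for illustration).
set.seed(1)
tips <- c("seq_a", "seq_b", "seq_c", "seq_d", "seq_e", "seq_f")
tre <- rtree(6, tip.label = tips)

# Hypothetical lookup: which taxonomic group each tip belongs to.
tip_group <- c(seq_a = "GroupX", seq_b = "GroupX", seq_c = "GroupY",
               seq_d = "GroupY", seq_e = "GroupZ", seq_f = "GroupZ")

# One colour per group, then one colour per tip in the tree's tip order.
group_cols <- c(GroupX = "firebrick", GroupY = "steelblue", GroupZ = "darkgreen")
tip_cols <- unname(group_cols[tip_group[tre$tip.label]])

plot(tre, tip.color = tip_cols)
legend("topright", legend = names(group_cols), text.col = group_cols, bty = "n")
```

Indexing by `tre$tip.label` matters because the order of tips in the tree rarely matches the order of rows in a metadata table.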
Hi. It was easy. Thanks.
Can you please provide some information that would allow me to do diversity estimation using phylogenetic trees? (I don’t have any count matrix. I only have sequences from HIV patients.) What R package can do that? Is there a GUI tool that can do diversity estimation and a statistical test (t-test)?
There are tons of phylogenetics R packages. For tree building and visualization I would say the phylotools, phytools, ape, and ggtree packages are the most helpful. The RevGadgets package has a mix of everything. See their paper here:
besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.13750
Great job! How do you get the values for the nodes?
Hey Ahmed, you can find node values within the phylo object (which I named "tre" in the tutorial) by using the function nodepath(). In this case you would run nodepath(tre), and it will show the initial node first (the entire tree), then the secondary nodes (in this case my three secondary nodes are 17, 19, and 20), where the first branches are rooted, and so on.
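A small sketch of what nodepath() returns, using a random five-tip tree as a stand-in for the tutorial's "tre" object (the tree here is an assumption, so the node numbers will differ from those in the video).

```r
library(ape)

# Random rooted tree standing in for the tutorial's "tre" object.
set.seed(42)
tre <- rtree(5)

# One path per tip: each path starts at the root node and ends at a tip.
paths <- nodepath(tre)
print(paths)

# In ape, tips are numbered 1..Ntip and the root is always Ntip + 1.
root_node <- Ntip(tre) + 1
print(root_node)
```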
@@RJG_Ecology Thank you very much, Russ.
Great job!
I am developing an algorithm in R to create phylogenetic trees and calculate values that interest me, such as homoplasy, CI, RI, etc. In 2019 I used a function called "matord", but I can't find it anymore. Specifically, I needed it to calculate two matrices for CI and RI. Is there any way to find out something about this function?
The packages that I used to complete the creation of phylogenetic trees and calculate the homoplasy and the distance are: phangorn, ape, ade4, graphics, and seqinr.
Nicely explained! Thank you very much!
Hey Nic, the function matord doesn't ring any bells for me... do you know specifically what package it was from, or do you know what the function does? If the purpose is as the name suggests, to order a matrix, there are simple ways to do that in R depending on how you're trying to order the values.
There seems to be a custom object within a function of the clusterSeq package with the name "matord", but that's about all I could find:
rdrr.io/bioc/clusterSeq/src/R/associatePosteriors.R
Also, there's this custom function
gist.github.com/pedroj/1872314
@@RJG_Ecology I created this algorithm to test the relation between distance and homoplasy. The general concept of the algorithm is to look for the most central strain of a given group of strains. This strain is the one that minimizes the average distance within a square distance matrix. Once the most central strain has been found, the other strains are sorted in order of increasing distance from it. Adding one strain at a time, it is possible to have an increasing number of strains coming into play. At each addition, homoplasy and the average distance of the strains from the most central strain are calculated and plotted. This procedure makes it possible to examine carefully the trend of homoplasy and distance, as well as the Rescaled Index.
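A minimal base-R sketch of the distance part of the procedure described above. The five strains and their positions are made up for illustration, and the homoplasy calculation is deliberately left out; this is one reading of the description, not the commenter's actual code.

```r
# Toy data: five hypothetical strains placed on a line, so that pairwise
# distances are easy to check by hand.
pos <- c(A = 0, B = 2, C = 3, D = 7, E = 10)
dmat <- as.matrix(dist(pos))   # square, symmetric distance matrix

# Most central strain: the one with the smallest average distance to the others.
central <- names(which.min(rowMeans(dmat)))

# Remaining strains sorted by increasing distance from the central strain.
others <- setdiff(rownames(dmat), central)
d_sorted <- sort(dmat[central, others])

# Average distance from the central strain after each strain is added.
running_mean <- cumsum(d_sorted) / seq_along(d_sorted)

print(central)
print(running_mean)

# Trend of average distance as strains come into play.
plot(seq_along(running_mean), running_mean, type = "b",
     xlab = "Strains added", ylab = "Mean distance from central strain")
```

At each step a homoplasy measure could be computed on the same growing subset and plotted alongside the distance curve.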
@@MrAraxon Not sure if you've seen this package yet, but maybe it has some helpful functionality?
www.ncbi.nlm.nih.gov/pmc/articles/PMC6412054/
Do you have any suggestions for renaming the tip labels from GenBank accession numbers to genus names?
Hi Pamela,
Yes actually!
The taxonomizr package (see tutorial here: cran.r-project.org/web/packages/taxonomizr/readme/README.html) has a two-step process for this purpose: the function "accessionToTaxa" converts accession numbers to taxonomic IDs, and then "getTaxonomy" converts taxonomic IDs to taxonomy. They have examples of how to do so in the link.
Let me know if you run into any issues!
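Once you have an accession-to-genus mapping, the renaming step itself could look like the sketch below. The accession IDs, genus names, and tree are hypothetical placeholders; in practice the lookup vector would be built from the taxonomizr output described above.

```r
library(ape)

# Small example tree with made-up accession IDs as tip labels.
tre <- read.tree(text = "((ACC0001:0.1,ACC0002:0.1):0.2,ACC0003:0.3);")

# Hypothetical lookup table: accession -> genus.
genus_lookup <- c(ACC0001 = "Alpha", ACC0002 = "Alpha", ACC0003 = "Beta")

# Map every tip label through the lookup, in the tree's own tip order.
new_labels <- unname(genus_lookup[tre$tip.label])
stopifnot(!anyNA(new_labels))   # every accession must have a match

# Tip labels should stay unique, so repeated genera get a numeric suffix.
tre$tip.label <- make.unique(new_labels)
print(tre$tip.label)
```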
Nicely explained! Thank you
Great video! Do you have any recommendations of packages or code for MLST analysis in R?
Hey Margaux, yes, there are two packages that are used for MLST in R:
1) MLSTar
bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2887-1
github.com/iferres/MLSTar
and
2) STRAIN
bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2887-1
I'm not very familiar with it but also
3) mlstverse
github.com/ymatsumoto/mlstverse
and
4) StrainR
github.com/jbisanz/StrainR
This blog may be helpful too:
www.r-bloggers.com/2017/01/descriptive-analysis-of-mlst-data-for-mrsa/
I keep getting this error :
in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], :
Alignment larger (16,317,302,694) than the maximum allowable size (2,147,483,647)
Could you help me understand why? I've already tried DECIPHER in two different versions: 2.20.0 and 2.22.0
On what line of code are you getting this error, and with what data?
Really good stuff. Is there an attr()-like function that will allow one to pull the geographic location of each of the sequences? Is that included in the metadata?
@Joe Partington-Smith Geographic location.
Such a big help thank you!!
Nice job
Hello, I would like to add bootstrap values in my tree. Any idea how to do that? Thank you.
Hey Vasilik, yes!
So bootstrap values need to be appended to the phylo object itself as node labels, and then displayed in ggtree with geom_nodelab(). The top answer on this Stack Overflow question addresses this in detail, as well as how you can apply it yourself, with coded examples:
stackoverflow.com/questions/22749634/how-to-append-bootstrapped-values-of-clusters-tree-nodes-in-newick-format-in
Check this guy's response too:
www.researchgate.net/post/SOLVED_How_do_you_export_bootstrap_node_support_in_Rs_ape_package
@@RJG_Ecology thank you very much. I may have found a solution by calculating bootstrap values with boot.phylo() and assigning them to the phylo object with full_join() but I will also take a look at the page you are suggesting.
@@vasilikiskiada2332 would you mind sharing your solution?
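For anyone looking for a starting point, here is a minimal sketch with ape's boot.phylo(). It is not the commenter's solution. It assumes a neighbor-joining tree and uses the woodmouse alignment bundled with ape as example data; the tree-building function passed to boot.phylo() has to be the same one used to build the original tree.

```r
library(ape)

# Example alignment bundled with ape (an assumption; use your own DNAbin data).
data(woodmouse)

# The same tree-building function is used for the real tree and each replicate.
build_nj <- function(x) nj(dist.dna(x))
tre <- build_nj(woodmouse)

# 100 bootstrap replicates; the result is one support count per internal node.
set.seed(1)
boot <- boot.phylo(tre, woodmouse, build_nj, B = 100, quiet = TRUE)

# Store the support values on the phylo object as node labels and plot them.
tre$node.label <- boot
plot(tre, show.node.label = TRUE, cex = 0.7)
```

Because the values live in `tre$node.label`, they are written out with the tree by write.tree() and are available to other plotting packages.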
Hi sir, this was a really informative R function. Can I apply this function to tree data?
By tree data, do you mean ".tre" files? You can just read those in with the read.tree function and combine them with merge_tree if they have common variables.
@@RJG_Ecology Thank you for the reply, sir. Actually, my teacher told me to calculate phylogenetic diversity from tree data he sent me. I can use R, but it has left me more confused: I have been trying since last week and have not found any way to do it. If you could guide me on phylogenetic diversity, it would be much appreciated. Thank you, sir.
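As a rough starting point, phylogenetic diversity in the sense of Faith's PD is the total branch length connecting a set of tips, which can be sketched with ape alone. The tree and the helper name `subtree_pd` are hypothetical, and this version sums only the minimal subtree joining the chosen tips; dedicated packages such as picante can additionally include the path back to the root.

```r
library(ape)

# Small example tree with known branch lengths (replace with read.tree("file.tre")).
tre <- read.tree(text = "((A:1,B:2):1,(C:3,D:1):2);")

# PD of the whole tree: the sum of all branch lengths.
total_pd <- sum(tre$edge.length)

# Hypothetical helper: PD of a subset of tips, as the total branch length
# of the subtree that remains after pruning all other tips.
subtree_pd <- function(tree, tips) {
  sum(keep.tip(tree, tips)$edge.length)
}

pd_abc <- subtree_pd(tre, c("A", "B", "C"))
pd_ab  <- subtree_pd(tre, c("A", "B"))

print(c(total = total_pd, ABC = pd_abc, AB = pd_ab))
```

With one tip set per sample or community, the same helper can be applied to each set to compare diversity between them.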
Nice job, keep it up.
I am getting an Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1]. How can I solve that?
What lines of code are you running when you get the error?
@@RJG_Ecology temp
@@Gayensubrata89 It looks like the ade4 "table.paint" function has been updated and the argument "cleg" removed. Just remove that argument and it should work, i.e.
table.paint(temp, clabel.row=.4, clabel.col=.4)+
scale_color_viridis()
@@RJG_Ecology No, it is not working. I am still getting the error:
Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1].
@@Gayensubrata89 Please show me the code you ran to get that error.
Good job thanks for sharing!
I got this error: could not find function "OrientNucleotides". Could you please help me?
Hey Ugur, the reason for this error in R is that you have not loaded the package that provides the function. In this case, the package is "DECIPHER". Make sure you have installed DECIPHER using:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DECIPHER")
and then load it using:
library(DECIPHER)
@@RJG_Ecology Thank you for replying, Russ. But I got a new error like this:
in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], :
Alignment larger (9,174,518,227) than the maximum allowable size (2,147,483,647).
How can I fix it?
@@uguremre3287 The maximum allowable size for alignments with DECIPHER's AlignSeqs() is 2,147,483,647. Therefore anything larger will need a different alignment function, such as FindSynteny() followed by AlignSynteny().
@@RJG_Ecology I tried to run AlignSynteny() but I couldn't figure it out :(
Error in AlignSynteny(apricot) :
synteny must be an object of class 'Synteny'
Thanks for the video, Russ! Does your Udemy course include how to run phylogenetic analysis using the maximum likelihood method?
It does not, but I think I may add this in the near future. If you're already a student of the course, you can add a question regarding this on the message board and I would be happy to post some code to walk you through it.
Can you please share this code?
The link to the code and data is already in the description.
Thanks!
Excellent!
Your video is good; however, the poor visual quality makes it difficult to follow the R commands.
Thanks for sharing. Nice job, but the link is not working (github.com/RussellGrayxd/Phylogenetics). Where can I find the formulas for RStudio?
The link is working fine on my end. Check your browser and firewall settings; it could also be your connection. Can you access GitHub by itself? github.com/