- Videos: 19
- Views: 48,669
Russell Gray (RJG Ecology)
Joined 29 Dec 2016
My goal is to help emerging ecologists, conservationists, and GIS practitioners to build their capacity in R, ESRI, QGIS, data analyses, statistics, and applied science.
Sentiment analysis in R
A quick example of how to run a sentiment analysis in R using the syuzhet package
Code:
github.com/RJGrayEcology/Snippets/blob/main/Sentiment%20analysis.R
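As a rough illustration of what a syuzhet-based sentiment analysis looks like, here is a minimal sketch. It is not the linked script: the example responses are made up, and it only assumes that the syuzhet package is installed and uses its get_sentiment() and get_nrc_sentiment() functions.

```r
# Minimal sketch; assumes the syuzhet package is installed from CRAN.
library(syuzhet)

# Hypothetical survey responses (not taken from the video).
responses <- c(
  "I love this wonderful forest reserve.",
  "The trail was awful and dangerous.",
  "We walked for three hours."
)

# One numeric score per response; values above zero lean positive.
scores <- get_sentiment(responses, method = "syuzhet")

# Counts of NRC emotion categories (anger, joy, trust, ...) per response.
emotions <- get_nrc_sentiment(responses)

print(data.frame(response = responses, score = scores))
print(colSums(emotions))
```

The per-response scores are handy for plotting sentiment over a sequence of texts, while the column sums give a quick overall emotion profile.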
Views: 1,172
Videos
Making a comparison table in R
948 views, 1 year ago
Create a table that compares counts and percentages between groups using R tidyverse functionality. Code: github.com/RJGrayEcology/Snippets/blob/main/comparison table.R
Creating camera trap site locations and saving in GPX format in R
348 views, 1 year ago
Creating camera trap site locations in R, using random sampling, regular (systematic) sampling, across the entire site, across a focused area, drawing custom polygons, and saving the files as GPX for field staff. Code: github.com/RJGrayEcology/Snippets/blob/main/Camera trap locations_code.R
Word clouds and word frequency tables in R
1.1K views, 1 year ago
Word clouds in R are simple. Here is an example of how to make them, and how to create a frequency table showing exactly how often each word is used throughout the responses. github.com/RJGrayEcology/Snippets/blob/main/word clouds.R
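For readers who want the gist of the frequency-table step without opening the script, here is a minimal base R sketch. The responses are invented, and the commented-out wordcloud() call assumes the separate wordcloud package; the linked code may do this differently.

```r
# Hypothetical free-text responses (not from the video).
responses <- c(
  "The forest is quiet",
  "The forest trail is long",
  "Quiet trail"
)

# Split into lowercase words on anything that is not a letter.
words <- unlist(strsplit(tolower(responses), "[^a-z]+"))
words <- words[nzchar(words)]

# Frequency table, most common words first.
freq <- sort(table(words), decreasing = TRUE)
freq_table <- data.frame(
  word    = names(freq),
  n       = as.integer(freq),
  percent = round(100 * as.integer(freq) / sum(freq), 1)
)
print(freq_table)

# With the wordcloud package installed, the same table can be drawn as a cloud:
# wordcloud::wordcloud(words = freq_table$word, freq = freq_table$n, min.freq = 1)
```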
Frequency Percent Table in R
1.3K views, 1 year ago
Creating frequency/percent tables with grouped categorical variables in R is simple! R code template: github.com/RJGrayEcology/Snippets/blob/main/Frequency-Percent table
Likert Scale Analysis in R
4.3K views, 1 year ago
Just a code snippet to show the simple process of cleaning and plotting Likert-scale data. Get the code snippet at my GitHub: github.com/RJGrayEcology/Snippets/blob/main/Likert Scale Snippet
Regression Trends SMART R-Plugin
484 views, 2 years ago
An example of how to use the R plugin with pre-written code to analyze illegal activity data in protected areas via the Spatial Monitoring and Reporting Tool (SMART). In this example, we use a code snippet along with SMART queries of illegal activity observations (traps, illegal hunting camps, and offenders), and divide them by effort (daily km of patrols) to obtain a density estimate for each....
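The core calculation described here, observations divided by patrol effort and then a regression trend over time, can be sketched in base R. This is only an illustration with invented numbers and hypothetical column names (month, traps, effort_km); it is not the SMART plugin script and does not use the SMART data model.

```r
# Hypothetical monthly patrol summary (invented values).
patrols <- data.frame(
  month     = 1:12,
  traps     = c(30, 28, 25, 27, 22, 20, 21, 18, 15, 16, 12, 10),
  effort_km = c(100, 110, 105, 120, 115, 110, 125, 120, 115, 130, 125, 120)
)

# Encounter rate: observations per km patrolled.
patrols$traps_per_km <- patrols$traps / patrols$effort_km

# Linear trend of the encounter rate over time.
fit <- lm(traps_per_km ~ month, data = patrols)
slope <- coef(fit)[["month"]]

print(summary(fit)$coefficients)
cat("Change in traps per km per month:", round(slope, 4), "\n")
```

Dividing by effort matters because raw counts rise and fall with how much patrolling was done, which would otherwise be mistaken for a change in illegal activity.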
Quick and dirty semi-automatic landcover classification in QGIS
577 views, 3 years ago
A quick walkthrough of the QGIS 3.22.0 Semi-Automatic Classification Plugin V. 7
Timecodes
0:00 - Intro
0:17 - Get the plugin
0:44 - Acquiring Landsat data
4:26 - Bandsets and visualizing data
6:45 - Training the classification model
12:34 - Checking and classifying the data
EIR Web-conference Series Day 2: Sea Turtle and Seabird Ecology
425 views, 3 years ago
Nicole Barbour - Using Dynamic Time Warp Clustering to Classify Dive Profiles of a Highly Migratory Species
Julian Perez-Correa - Climate oscillation and the invasion of alien species influence the oceanic distribution of seabirds
EIR Web-conference Series Day 1: Marine Community Ecology
1.3K views, 3 years ago
Alfredo Ascanio - From corals to complex networks: Extending a measure of community assortativity
Luis Montilla - Building a network of invertebrate-microbe associations from seagrass beds
EIR Web-conference Series Day 4: Statistical Concepts
2.4K views, 3 years ago
Kyle Tomlinson - General Linear Model and Mixed Model concepts
Michael Greenacre - Why I don't use the Bray-Curtis dissimilarity in multivariate analysis of ecological data
Ecology in R Online Course
1.7K views, 4 years ago
Learn a wide variety of ecological data analyses by mining your own species occurrences and environmental data from various online sources. R code provided in each lesson is reproducible and easy to modify for your own projects and research. Upon completion of this course, you should have the knowledge to perform these analyses and GIS techniques with your own data, with an improved knowledge a...
Phylogenetic Analysis of ITS sequences in R
19K views, 4 years ago
Phylogenetic Analysis of ITS sequences in R
Thanks for your amazing video. I had a grade 5 teacher who lived in Africa and introduced us kid to pythons and anacondas in school. Took the fear off from us.
Thanks for the detailed explanation, how do you run the boot strap.
You can use the msa package and a few others for the Bayesian and ML analyses you would usually see in software like MEGA. Here's an RPubs post that goes through the basic process: rpubs.com/mvillalobos/L01_Phylogeny
Is there a way to get the periods out from in between the words on the plot at the end?
You can just use gsub (the find and replace function in R) with a regex to remove whatever you want. Like this: your_data$your_column <- gsub("\\.", "", your_data$your_column)
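A small, self-contained version of the gsub() suggestion, using made-up labels. The data frame and column names (your_data, your_column) are the placeholders from the reply, not real objects. Replacing with a space rather than an empty string keeps the words apart on the plot.

```r
# Hypothetical labels with periods between words.
your_data <- data.frame(
  your_column = c("strongly.agree", "neither.agree.nor.disagree")
)

# "\\." matches a literal period; an unescaped "." would match any character.
no_periods  <- gsub("\\.", "", your_data$your_column)   # periods removed, words joined
with_spaces <- gsub("\\.", " ", your_data$your_column)  # periods replaced by spaces

# fixed = TRUE treats the pattern as plain text, so no escaping is needed.
with_spaces_fixed <- gsub(".", " ", your_data$your_column, fixed = TRUE)

print(no_periods)
print(with_spaces)
```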
Thanks this is really amazing and educative, please can you help with method of analysing dataset that has many columns of numeric which are needed as mean analysis
Absolutely! So in the code chunk where it says:
dat.tab <- data %>% group_by(Grouping_column_ID) %>% count(value) %>% mutate(percent = round(n/sum(n)*100, 1))
you would change the count() function to summarize(), adding a new column name, avg (or whatever you want to call it), and the mean() function for your numeric values (let's call them value here too, as that should be the output of melt()). It would then look like this:
dat.tab <- data %>% group_by(Grouping_column_ID) %>% summarize(avg = mean(value))
With this, you will get the mean for each grouping ID. However, you cannot get a percentage in this case, because you have the mean of the values rather than the sum of the counts. What you could do, for comparative purposes, is add a min and max value, like so:
dat.tab <- data %>% group_by(Grouping_column_ID) %>% summarize(avg = mean(value), min = min(value), max = max(value))
For these numeric value columns grouped by ID, you could also use the great visualization functions from the ggstatsplot package to plot them as boxplots/violin plots with their associated pairwise and groupwise statistical tests:
library(ggstatsplot)
ggbetweenstats(data = data, x = Grouping_column_ID, y = value)
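To make the difference between the two versions concrete, here is a runnable sketch on a tiny invented data set. It assumes the dplyr package is installed; the column names Grouping_column_ID and value follow the reply, and the numbers are made up.

```r
# Minimal sketch; assumes dplyr is installed.
library(dplyr)

# Hypothetical long-format data, e.g. the output of melt().
data <- data.frame(
  Grouping_column_ID = rep(c("A", "B"), each = 4),
  value = c(1, 2, 2, 3, 4, 4, 5, 5)
)

# Counts and within-group percentages (the original frequency/percent table).
dat.tab <- data %>%
  group_by(Grouping_column_ID) %>%
  count(value) %>%
  mutate(percent = round(n / sum(n) * 100, 1))

# Mean, min, and max per group (the numeric-column variant).
dat.mean <- data %>%
  group_by(Grouping_column_ID) %>%
  summarize(avg = mean(value), min = min(value), max = max(value))

print(dat.tab)
print(dat.mean)
```

Because count() keeps the grouping by Grouping_column_ID, sum(n) in mutate() is the group total, so the percentages add up to 100 within each group.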
Awesome! Thank you. However, when I do the plot, it opens in a new window and I can not get it to zoom out. Any advice?
Are you using R or RStudio? Are you on a Mac, Windows, or Linux system?
My immediate guess would be that you're using a Mac and you need to install XQuartz (www.xquartz.org/)
@RJG_Ecology I am using Windows and RStudio.
@stephhogan6070 So when you run the plot, does it show up in the bottom-right plot window at all, or does it immediately open an X11 (external) window by itself? Also, when you say "I cannot get it to zoom out", what do you mean?
@RJG_Ecology No, it does not show up in the bottom-right plot window. It opens by itself in a new window. When it opens I am unable to see the strongly agree response at the bottom.
Thank you, good material
Thank you very much!!! If only people kept it short and sweet such as this.. Kudos... :)
Great vid glad i found this
Great info. I've been fascinated with snakes since I was a kid, I always wanna travel to exotic places around the globe and give amazing presentations with different species of snakes to locals such as United States, Peru, Africa, Asia, Costa Rica, etc whether I'm visiting at a school, rural village, zoo, things like that. If I catch a snake in the wild I need to identify the species before letting people know if it's safe/unsafe to get close or touch the animal if the snake expert says its okay
Great video for SMART user.
Hello brother, could you please start with SMART step again?
In SMART, the initial step is to install the R plugin (on SMART Desktop), which you can then access through the Queries tab. Once you have the plugin, you can get this script from here: github.com/RJGrayEcology/smartR/blob/main/Regression_plots.R Then you add the script in the R plugin on SMART Desktop and run. If you are using the SVW data model, then the script does not need to be modified. If you are using a different data model, you may have to modify the script to represent the names of your data fields.
@RJG_Ecology Many thanks for your reply. First of all, I would like to tell you that my English is poor, so sometimes it might be hard to understand my message. To continue our point: I have already installed the R plugin and added the script to the R queries, but my issue is that my data model is different. So now I need to know your query steps and see your query column names, so that I understand how to edit the R code/script for my data model and queries. Many thanks again.
@haivvemvang7731 If you have a different data model, then all of the fields (columns) I include in this video should still exist; they just have different names. So you need to query those same fields in SMART, but use the names that exist in your own data model. Also, you might want to make a linear model for things other than traps, camps, and offenders, in which case you can just pick any three data types in your own SMART query and run those instead. If you try it and run into any errors, let me know and I will try to help.
@RJG_Ecology Many thanks, I will try to do it and let you know again.
@RJG_Ecology Could I share my SMART database with you, and then can you revise the script and send it back to me?
For a more in-depth tutorial: cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html
Edit: I have updated the code after working with the sf devs. The code to use sf functionality and maintain gpx names (at the end) is: write_sf(pts.sys.ll, "cam_locations.gpx", driver = "GPX", dataset_options = "GPX_USE_EXTENSIONS=YES")
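For context, here is a minimal sketch of how an object like pts.sys.ll might be built before that write_sf() call. It assumes the sf package (with GDAL's GPX driver available); the square study site, the UTM zone (EPSG:32616), the number of points, and the CAM labels are all invented for illustration and are not from the original script.

```r
# Minimal sketch; assumes the sf package is installed with GPX support in GDAL.
library(sf)

# Hypothetical 1 km x 1 km study site in a projected CRS (UTM zone 16N, metres).
site <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)))),
  crs = 32616
)

# Regular (systematic) sample of roughly 16 camera locations inside the site.
pts.sys <- st_sample(site, size = 16, type = "regular")

# GPX stores longitude/latitude, so transform to WGS84 and attach a name field.
pts.sys.ll <- st_sf(
  name     = paste0("CAM", seq_along(pts.sys)),
  geometry = st_transform(pts.sys, 4326)
)

# Write waypoints; the dataset option lets non-standard fields be kept as extensions.
gpx_file <- file.path(tempdir(), "cam_locations.gpx")
write_sf(pts.sys.ll, gpx_file, driver = "GPX",
         dataset_options = "GPX_USE_EXTENSIONS=YES")
```

Swapping type = "regular" for type = "random" gives a simple random sample instead of a systematic grid.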
Very good teaching and concise.
Thanks!! How can we group sequences into different colors based on their taxonomic group?
Can you elaborate a bit more on what you want to do? In the meantime here is the ggtree documentation, it might have what you're looking for. 4va.github.io/biodatasci/r-ggtree.html
There's a dark red snake that is commonly found in the north of Belize. It is often been killed by everyone because it is said to be venomous. I don't think it's venomous and i believe it's a specie of earth snake. It's size is mostly around 2 to 3 feet in length. I want to know the true name and if it's venomous. It has a smooth looking appearance ..
How do we define the node values?
Thanks for the detailed explanation, I could create a tree really easy with your video
I will kill even more snakes now, just because some white trash made a video will not stop me from killing every snake I see.
Hi Russ. Are/were you associated with the Belize zoo at all? I found a coral snake in Belize in 1987/88, which didn’t match the description of any of the ones in the supposedly comprehensive book I got from the Audubon society. I photographed it and took details. It was decades later when I took the details to the NHM in London, and whilst disappointed, I was reassured that science had high standards, because the information I had just wasn’t enough to identify or confirm it. It could have come from neighbouring countries, a zoo or the pet trade and I had nothing physical to show. So I may have discovered, even the most venomous snake in the Americas. Or not. I was in the RAF at the time, and an RAF GP was very interested in snakes and their venom. He was writing a book on venomous snakes of Belize. I still have a copy of his pre-published work. Can’t recall his name and the book is in storage. I understand that Belize is pretty well researched, so I doubt the species count has increased much/at all. I run the Phylogeny Explorer Project which is an attempt to list and put on an evolutionary historical/tree-like dendrogram, all known species of life, extinct and extant, across all kingdoms. I really liked your work on the video, well done and thanks. I only found one (dead) Fer de Lance when there.
Hey Steve, no I am not. I was an independent researcher in Belize at this time. Regarding coral snakes in Belize, there are two confirmed species at the moment -- Micrurus hippocrepis and Micrurus diastema. I assume what you saw was the latter; its common name is the "variable coral snake" since its morphology is extremely variable from individual to individual. Regarding your phylogeny project, that sounds like quite the undertaking. NCBI also has something of this nature right now that you should check out if you haven't already, called "LifeMap". It's basically a massive interactive phylogeny of all known life in a fixed branch tree: lifemap-ncbi.univ-lyon1.fr/
Kindly send me your research papers or any published material if available?
Hi. It was easy. Thanks. Can you please provide some information that can allow me do diversity estimation using phylogenetic trees ( I don’t have any count matrix. I only have sequences from hiv patients). What R package can do that? Is there a GUI tool that can do diversity estimation and statistical test (t-test) ?
There are tons of phylogenetics R packages. For tree building and visualization I would say the phylotools, phytools, ape, and ggtree packages are the most helpful. The RevGadgets package has a mix of everything. See their paper here: besjournals.onlinelibrary.wiley.com/doi/epdf/10.1111/2041-210X.13750
Super useful, thanks!
could not find function "OrientNucleotides" I got this error. Could you pls help me guys
Hey Ugur, the reason for this error in R is that you have not loaded the package that provides the function. In this case, the package is "DECIPHER". Make sure you have installed DECIPHER using:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("DECIPHER")
and then load it using:
library(DECIPHER)
@RJG_Ecology Thank you for replying Russ. But I got new error like this: in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], : Alignment larger (9,174,518,227) than the maximum allowable size (2,147,483,647). How can I fix it?
@uguremre3287 The maximum allowable size for alignments with DECIPHER AlignSeqs() is 2,147,483,647. Therefore anything larger will need to use a different alignment function such as FindSynteny() followed by AlignSynteny().
@RJG_Ecology I tried to run from AlignSynteny() but I couldn't figure it out :(
Error in AlignSynteny(apricot) : synteny must be an object of class 'Synteny'
Do you have any suggestions for renaming the tip labels from GenBank accession numbers to genus names?
Hi Pamela, yes actually! The taxonomizr package (see tutorial here: cran.r-project.org/web/packages/taxonomizr/readme/README.html) has a two-step process for this purpose: the function "accessionToTaxa" converts accession numbers to taxonomic IDs, and then "getTaxonomy" converts taxonomic IDs to taxonomy. They have examples of how to do so in the link. Let me know if you run into any issues!
I keep getting this error : in f(p.profile[, anchors[2, n - 1]:anchors[1, n], drop = FALSE], : Alignment larger (16,317,302,694) than the maximum allowable size (2,147,483,647) Could you help me understand why? I've already tried DECIPHER in two different versions: 2.20.0 and 2.22.0
What line of code are you getting this error and with what data?
Such a big help thank you!!
Red on yellow = kill a fellow. Red on black = venom lack. That's about all the coral snake wisdom I can remember, other than run like a motherfucker.
I am with you in a group on facebook now with you here.Hope to get new knowledge and skills
Used to saw you in fb group ecology in r. And now found you here. Subscribed. Haha. Look forward to your contents.
Very helpful guide. Question: Why use neighbor-joining instead of something like Maximum Likelihood to build your tree?
I was just using defaults for the tutorial. The packages have multiple different methods that can be applied. I'm not sure if this one has Bayesian inference or not, but I agree that it or MLE would probably be optimal!
Does anyone know how to build the tree using Maximum Likelihood instead of neighbor-joining?
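One possible route, offered as a sketch rather than as the method used in the video: the phangorn package can take a neighbor-joining tree as a starting point and optimize it under maximum likelihood. The example below uses the Laurasiatherian alignment that ships with phangorn; with your own data you would first convert the alignment to a phyDat object.

```r
# Minimal sketch; assumes the phangorn package is installed.
library(phangorn)

# Example alignment bundled with phangorn (a phyDat object).
data(Laurasiatherian)

# Starting tree: neighbor-joining on ML distances.
dm      <- dist.ml(Laurasiatherian)
tree_nj <- NJ(dm)

# Likelihood of the data on the NJ tree (default Jukes-Cantor model).
fit_start <- pml(tree_nj, data = Laurasiatherian)

# Optimize branch lengths and topology (NNI rearrangements) by maximum likelihood.
fit_ml <- optim.pml(fit_start, optNni = TRUE,
                    control = pml.control(trace = 0))

print(logLik(fit_start))
print(logLik(fit_ml))
```

The optimized tree is stored in fit_ml$tree. optim.pml() also accepts a model argument (for example "GTR") if a richer substitution model is wanted.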
I can't believe how much easier this was in comparison to my attempts with the msa package, thank you !
Thank you so much for this detailed video....It helped me a lot with my analysis. It will be of great help if you can also show how to analyze publicly available RNAseq from NCBI GEO.
Hi sir, it was very really informative R function, can i apply this function on Tree data?
With tree data do you mean ".tre" files? You can just read those in with the read.tree function and combine them with merge_tree if they have common variables
@RJG_Ecology Thank you sir for the reply. Actually, my teacher told me to calculate phylogenetic diversity from tree data he sent me. I can use R, but it has made me more confused; I have been trying since last week and did not find any way to do it. If you can guide me on phylogenetic diversity it would be very appreciated. Thank you sir.
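As a starting point for the tree-file and phylogenetic-diversity questions, here is a minimal sketch using the ape package. The four-tip Newick tree is invented. Phylogenetic diversity is taken here simply as the total branch length of the tree spanning a set of tips, which is one common simplified definition; dedicated functions such as pd() in the picante package handle community matrices and the root path more carefully.

```r
# Minimal sketch; assumes the ape package is installed.
library(ape)

# Hypothetical tree in Newick format; read.tree(file = "my_tree.tre") reads a file instead.
tree <- read.tree(text = "((A:1,B:1):1,(C:2,D:2):0.5);")

# Total branch length of the whole tree.
pd_total <- sum(tree$edge.length)

# Branch length of the subtree spanning only the species present at one site.
site_species <- c("A", "B", "C")
subtree   <- keep.tip(tree, site_species)
pd_subset <- sum(subtree$edge.length)

cat("PD of all tips:", pd_total, "\n")
cat("PD of A, B, C:", pd_subset, "\n")
```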
I am getting an Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1]. how to solve that?
What lines of code are you running when you get the error?
@RJG_Ecology
temp <- as.data.frame(as.matrix(D))
table.paint(temp, cleg=1, clabel.row=.4, clabel.col=.4)+ scale_color_viridis()
@Gayensubrata89 It looks like the ade4 "table.paint" function has been updated and the argument "cleg" removed. Just remove that and it should work, i.e. table.paint(temp, clabel.row=.4, clabel.col=.4)+ scale_color_viridis()
@RJG_Ecology No it is not working. Still having the error- Error in gray(valgris[numclass]) : invalid gray level, must be in [0,1].
@Gayensubrata89 Please show me the code you ran to get that error.
Can you please share this code .
Link to the code and data is in the description already
thanks for share... nice job but the link is not working (github.com/RussellGrayxd/Phylogenetics). where can i find the formulas for rstudio
The link is working fine on my end. Check your browser and firewall settings, could also be connection. Can you access github by itself? github.com/
A small query. If I have a 20 year Climate data and current biodiversity data of Trees outside forest. How can I model it? Possible in R? Kindly brief.
That would be something you could do with species distribution models but that would entirely depend on the species of tree and the type of ecosystem. I would look into the packages sdm, biomod2, and ENMtools.
Thanks!
Thanks for this presentation. Very informative.
Greenacre's talk was amazing! Sad that I didn't get to see it live. And, BTW, the whole series of talks was really good.
28:53
Great talks! Regarding the question of using AIC/AICc to compare models which are identical except for differing families - you can when the only difference is a dispersion parameter. So, as I understand it, the only difference between a Poisson and a negative binomial distribution is a dispersion parameter, so they come from the same underlying distribution and therefore the likelihoods are commensurate. You cannot, however, compare a model with normal errors vs. binomial errors.
André, would you mind elaborating why you think this? Any distribution provides the probability of the data given the covariates and parameters, at least in frequentist statistics. A larger likelihood means that the model chosen is more likely to have generated the data. This doesn't change with the distribution, and it is (or the log of it) what is used to calculate AIC. By definition, these probabilities are comparable.
And consequently, so are AIC values from different models, given that the same data are used.
@bertvanderveen3528 stats.stackexchange.com/questions/345069/likelihood-comparable-across-different-distributions and stats.stackexchange.com/questions/139201/model-selection-can-i-compare-the-aic-from-models-of-count-data-between-linear have articulated it pretty well in these links!
@andrebellve9027 thanks for your response! I'm afraid that a comment on stackexchange is hardly a convincing reference. My opinion on the matter remains, probabilities are comparable as they are on the same scale and measure the same thing. Regardless, it is an interesting argument, and I will look further into this. Are you aware of any scientific references on this matter? (that is, on the first link, not the second; that is hardly a good argument as the likelihood is always the probability of the data given the parameters, whatever the likelihood is). It sounds to me like the comment merely points out that the normalising constant needs to be included, so that all distributions properly integrate to one, i.e. so that probabilities are in the range 0,1.
@bertvanderveen3528 I do not know of any published literature, although I can see if Kjetil would supply some references. Personally, I am not a mathematician, but I am inclined to believe him given his credentials and, as far as I can follow it, the logic is sound. Not that me following it says a huge amount! Unless I am mistaken, the original question was in regards to the use of AIC/AICc for the comparison of models with different error distributions. As AIC and its second order derivative are defined, the values are incommensurate when the models generating them come from different distributions, providing you agree with Kjetil Halvorsen's logic in the first link. It may well be that the likelihoods can be made to be comparable and some penalising condition added to make it analogous to AIC/AICc or any other information criterion, but that isn't quite the same question. Moreover, I believe (and there are a lot of packages out there so I may be wrong!) most R functions would not be altering the likelihood with a normalising constant. As such, as it stands, the AIC values outputted should not be used for comparisons when the families differ. As a side note - the second link was included for posterity; it's how I came to find the first link and it links the likelihood and AIC discussions. I know cross-validated isn't as robust as a published article, but searches in the past for formally peer-reviewed literature on the topic haven't been fruitful for me!
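For anyone who wants to see what the quantities in this thread look like in practice, here is a small sketch of the Poisson versus negative binomial case that both sides treat as comparable. The data are simulated, and MASS::glm.nb() is one common way to fit the negative binomial model. The sketch does not settle the broader disagreement about comparing different families; it only shows that AIC() is computed as 2k - 2 logLik from the full log-likelihood of each fit.

```r
# Minimal sketch; MASS ships with standard R installations.
library(MASS)

# Simulated overdispersed counts (hypothetical data).
set.seed(42)
x <- runif(200)
y <- rnbinom(200, mu = exp(1 + x), size = 2)

fit_pois <- glm(y ~ x, family = poisson)
fit_nb   <- glm.nb(y ~ x)

# AIC as reported by R for both fits on the same response.
print(AIC(fit_pois, fit_nb))

# The same value by hand for the Poisson fit: 2k - 2 * log-likelihood,
# where k is the number of estimated parameters.
ll_pois <- logLik(fit_pois)
aic_by_hand <- 2 * attr(ll_pois, "df") - 2 * as.numeric(ll_pois)
print(aic_by_hand)
```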
Thank you for the excellent talks.
que bonito tema musical, quien lo canta.
Technical difficulties from 59:57 to 1:11:56. Sorry about that everyone!
Intro begins around 28:53
very informative! The limerick was wonderful too! Thank you!
Great job! I am developing an algorithm via the R program to create phylogenetic trees and calculate values that interest me like homoplasy, CI, RI etc. On 2019 I had used a function called ''matord'' but I can't find it anymore. Specifically I needed it for calculation of two matrices for CI and RI. Is there any way to know something about this function ? The packages that I used to complete the creation of phylogenetic trees and calculate the homoplasy and the distance are: phangorn, ape, ade4, graphics, and seqinr. Nicely explained! Thank you very much!
Hey Nic, the function matord doesn't ring any bells for me... do you know specifically what package it was from, or do you know what the function does? If the purpose is as the name suggests, to order a matrix, there are simple ways to do that in R depending on how you're trying to order the values. There seems to be a custom object within a function of the ClusterSeq package with the name "matord" but that's about all I could find: rdrr.io/bioc/clusterSeq/src/R/associatePosteriors.R
Also, there's this custom function gist.github.com/pedroj/1872314
@RJG_Ecology In order to test the relation between distance and homoplasy, I created this algorithm. The general concept of the algorithm is to look for the most central strain of a given group of strains. This strain is the one that minimizes the average distance within a square distance matrix. Once the most central strain has been found, the other strains are sorted in increasing distance order. Adding one strain at a time, it is possible to have an increasing number of strains coming into play. At each addition, homoplasy and average distance of the strains from the most central strain are calculated and plotted. This procedure allows one to consider carefully the trend of homoplasy and distance, as well as the Rescaled Index.
@MrAraxon Not sure if you've seen this package yet, but maybe it has some helpful functionality? www.ncbi.nlm.nih.gov/pmc/articles/PMC6412054/
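The ordering step described here (find the most central strain, then sort the rest by distance from it) needs no special function and can be sketched in base R. The distance matrix below is built from random coordinates with made-up strain names, and the homoplasy and index calculations are left out.

```r
# Hypothetical strains placed at random coordinates, giving a square distance matrix.
set.seed(1)
coords <- matrix(rnorm(20), ncol = 2,
                 dimnames = list(paste0("strain", 1:10), NULL))
D <- as.matrix(dist(coords))

# Most central strain: the row with the smallest average distance to all others.
central <- which.min(rowMeans(D))

# Order all strains by increasing distance from the central one (it comes first).
ord <- order(D[central, ])
D_sorted <- D[ord, ord]

# Average distance from the central strain as strains are added one at a time.
dist_from_central <- D_sorted[1, ]
running_mean <- cumsum(dist_from_central[-1]) / seq_along(dist_from_central[-1])

print(names(central))
print(round(running_mean, 3))
```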
I love snakes