Hey Jay, you may impute your missing values. This depends a lot on the content of your variable though. You may have a look at this tutorial for more information: statisticsglobe.com/missing-data-imputation-statistics/ I hope that helps! Joachim
Sir, I would like to mutate a column named Daily revenue, which is the sum of promotion_revenue and non_promotion_revenue. However, some rows contain NA in promotion_revenue but $30 in non_promotion_revenue. When I compute it, the mutated column (Daily Revenue) shows NA, even if there is a number in one of the columns. I already applied na.rm = TRUE in the summarize code summarize(daily_revenue = sum(total_rev, na.rm = TRUE)), but it doesn't work.
# 1. Load R packages
> library("quantstrat")
>
> # 2. Stock Instrument Initialization
>
> # 2.1. Initial Settings
> start.pf start.date end.date Sys.setenv(TZ='UTC')
> init.eq # 2.2. Data Downloading or Reading
>
> # Data Downloading
> getSymbols(Symbols='BMW',src='yahoo',from=start.date,to=end.date)
[1] "BMW"
Warning message:
BMW contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
I don't want to see these errors. How should I fix it?
Hey Durdu, it seems like your data contains missing values. You may remove these missing values using the na.omit function as explained here: statisticsglobe.com/na-omit-r-example/ Please note that removing NA values should be theoretically justified.
@@StatisticsGlobe Sorry for the inconvenience. I meant to ask: if I receive NA in some table, how can I replace it with a specific value of my choice, in all the cells?
Hi Zeus, you can find a detailed tutorial on exporting Excel files here: statisticsglobe.com/write-xlsx-xls-export-data-from-r-to-excel-file Does this answer your question? Regards, Joachim
@@StatisticsGlobe Hello sir... could you please explain R functions and function components like function name, arguments, function body and return value to me? Or can you make a video on this topic? Thanks
@@mohammadbasheer6192 Do you mean functions that are already available in R or do you mean user-defined functions? If you want to learn more about already available functions, you could have a look here: statistical-programming.com/r-functions-list/ If you want to learn more about user-defined functions, you could have a look here: statistical-programming.com/r-return-value-from-function-example
Thank you very much for this video (just subscribed). How do you remove "NA" from a data set that has no numeric values? Say I just had two columns (Name and Hair Color) and some of the hair colors were NA. How would I omit that?
Hey Frank, Thanks for subscribing! :) The class of your variables does not matter, you can apply the functions shown in this video the same way. If it doesn't work, you could check if your NA values are real NA values or if they are "NA" character strings. In this case, you could replace the "NA" by real NA as shown in the following example code: data
What if I had two entries for each SUBJECT and I want to filter out both of their entries if one of their entries in another column is NA? PS: great video as always!
@@StatisticsGlobe Hi, thank you!! I went for a tidy solution check it out: data %>% group_by(SUBJECT) %>% filter(all(!is.na(MYVARIABLE))) does that make sense?
Hey, this is difficult to tell without seeing your actual data, but I think this should produce a different result than my code. Is there a specific reason why you would like to use tidy instead of Base R?
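For comparison, here is a minimal Base R sketch of the group_by/filter pipeline from the comment above. The data frame is hypothetical example data, and only the column names SUBJECT and MYVARIABLE are taken from the comment. It is not the code referred to in the reply.

```r
# Hypothetical example data: two entries per subject
data <- data.frame(SUBJECT = c(1, 1, 2, 2, 3, 3),
                   MYVARIABLE = c(5, NA, 3, 4, NA, NA))

# Subjects that have at least one NA in MYVARIABLE
subjects_with_na <- unique(data$SUBJECT[is.na(data$MYVARIABLE)])

# Keep only the subjects for which every entry is observed
data_complete <- data[!data$SUBJECT %in% subjects_with_na, ]
data_complete
```

Only subject 2 remains in this example, because subjects 1 and 3 each have at least one NA.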
@Gummibärmann Listwise deletion is usually performed in R with the complete.cases function. You can watch this video from minute 2:40: ruclips.net/video/OVHIYAEAHLY/видео.html I have also published a tutorial on this on my homepage: statisticsglobe.com/listwise-deletion-missing-data/ Feel free to let me know whether the two links helped :) Regards, Joachim
@Der Humanist Thanks for your follow-up question. It seems that your lecturer assigned a 1 to the variable help whenever one of the other variables in df is NA. Did he perhaps take a subset of df afterwards that contains only the observations with help = 0? Then this would be (in a more roundabout way) the same as using the complete.cases function. Without more detailed information, this is honestly hard for me to judge.
Hello, Thank you for your comment. Do you mean that you would like to see a tutorial on this topic? Is there something specific that you would like to know about R programming for t-test two-tail tabulated value in the plot? Regards, Cansu
@@StatisticsGlobe thank you very much, could I maybe send it to you on email or on another platform as the question may be a little long if you’re happy to suggest one ?
@@StatisticsGlobe Can I ask about the following in R: I have got 2 data sets with different numbers of rows and columns, and I want to merge them based on one of the columns in each data set. The first column in dataset1 has 3 values and the first column in dataset2 has 9 values, and each of the values in the first column of the first dataset maps onto 3 values in the second dataset. How do I do it?
So, for example, the first column in dataset 1 has the values 1, 2, 3 and the first column in dataset 2 has the values 1a 1b 1c 2a 2b 2c 3a 3b 3c. I want to merge the 2 columns based on the numbers, but the first dataset only has 3 rows and the second dataset has 9 rows. I want to merge them so I can perform functions on them. How do I do it? Thanks
@@StatisticsGlobe sorry for the long question. So all this must be done with R base package. Do let me know if you are able to help with this. Many thanks.
@Statistics Globe Thank you very much for the great video. It really helped :) Unfortunately, I still have a problem, and I really hope that you can answer my question. Where do I put na.rm = TRUE in more complex code? I always get an error message, and I suspect (based on internet research) that it has something to do with the NAs: Error in KhatriRao(sm, t(mm)) : (p
Hello Paula, thank you very much for the kind words. Glad to hear that you like my tutorials! :) You can find the answer to your question in the documentation of the lmer function. You can open it with the R code ?lmer. It says: "na.action a function that indicates what should happen when the data contain NAs. The default action (na.omit, inherited from the 'factory fresh' value of getOption("na.action")) strips any observations with any missing values in any variables." In other words: The na.rm option is already activated automatically when you use the lmer function. Please note that this can also lead to risks in your data analysis and that you might want to impute your data. You can find more information here: statisticsglobe.com/missing-data/ Regards, Joachim
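To see the na.action behaviour described in the reply above without installing extra packages, here is a minimal sketch that uses lm() from Base R as a stand-in for lmer (this substitution is an assumption made for illustration; lm() has an na.action argument with the same factory-fresh default). The data frame is hypothetical example data.

```r
# Hypothetical example data with one missing value in x and one in y
data <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                   y = c(2.1, 3.9, 6.2, NA, 10.1, 11.8))

# na.action = na.omit is the factory-fresh default; it is spelled out for clarity
fit <- lm(y ~ x, data = data, na.action = na.omit)

# Number of rows that were actually used for the model fit
n_used <- nobs(fit)

# Number of complete rows in the data
n_complete <- sum(complete.cases(data))

n_used
n_complete
```

Both numbers agree, which shows that the rows with missing values were dropped before fitting, without any na.rm argument.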
Hey Rhena, I only upload English-language videos on this channel. But I am already planning to create a partly German-language website soon. I hope that will help! :) Regards, Joachim
I have been following your tutorials for a couple of days now. I want to say thank you, they are truly direct and straight to the point. I wish that you would offer consultation to students even if you decide to charge a price on it. Because sometimes one might get stuck and not know what to do.
Thanks a lot for the very kind feedback Careen, glad you find the tutorials useful! :) In case you have any questions, you may post them to the Statistics Globe Facebook group: facebook.com/groups/statisticsglobe Regards, Joachim
Directly answered what I was looking for - Thank you!
I have used 'drop_na()' as opposed to 'na.omit()' for the most part, but it's always good to know alternative ways of doing things.
That's great to hear! Thank you very much for the feedback, Ezhan.
Very clear and succinct. All the info I needed clearly explained. 👍🏾
Thanks for the kind words, Shambo!
Love it, thank you one more time dude! Love the way you prepared your lessons, 'cause they are really short, focus on a specific context, and finally you gave us multiple solutions for a scenario, so that's the way it must be.
Awesome, thank you very much for the very nice feedback Anthony! :)
Thank you for the good lesson; explained very clearly.
Thank you very much for the feedback! Great to hear you like the video/explanations!
Thank you so much for the well-explained video. Keep on posting them please. You are doing a great job!
Wow, thanks a lot Shirin! More videos to come! :)
@@StatisticsGlobe I am very excited about it!
Awesome content! Very well demonstrated!
Thank you so much, glad you liked it!
Your videos are amazing and easy to understand! Thank you!!!
Thanks a lot for the nice comment Eapen! Glad you like them!
excellent joachim, perfectly explained
Glad you liked it Arun, thank you for the kind words! :)
Thank you so much. Easiest method to remove NAs.
Thank you for the kind words Jabab!
Informative and well explained
Thank you David, glad you think so!
Excellent explanation! You are a fantastic teacher!
Thanks a lot for the awesome compliment! :)
Thank you so much! You have been such a good help.
That's great to hear Roshny! Thanks a lot for your support!
very simple to follow sir.
Glad to hear that Jabab, thanks for letting me know :)
Bravo! So well explained! Thank you
Glad you enjoyed it Dominique!
Thank you, the tutorial was very helpful
Thanks again Ronald! :)
hi, your video demonstration is very useful. keep it up !
Hi, thanks a lot for the positive feedback. Nice to hear that you like the videos!
Tabulated value and calculated value in t-test normal distribution by plot in R programming
Hello,
Thank you for your comment. Do you mean that you would like to see a tutorial on this topic? Is there something specific that you would like to know about tabulated value and calculated value in t-test normal distribution by plot in R programming?
Regards,
Cansu
How to deal with missing data in categorical variables, please?
Hello,
If you assume that the missingness in your data is MAR, see statisticsglobe.com/missing-data/. You can use multiple imputation (maybe the most preferred method under MAR) to impute your values. You can check the documentation of the mice() function: www.rdocumentation.org/packages/mice/versions/3.16.0/topics/mice, to see what methods are applicable for ordered or unordered categorical variable imputation. You should scroll down the page to the Details section.
Alternatively, you can do list-wise deletion like in the tutorial above, yet this would bring some cons with it. See the Listwise Deletion tutorial: statisticsglobe.com/listwise-deletion-missing-data/ for the details.
Best,
Cansu
I am trying to use ggscatter but I have many NAs in y column and no correlation coefficient appears. Is there any way of ignoring these NAs or changing them to "0"? please help me, thank you.
Hey Jenny, I have never used ggscatter, but you may replace NA values by 0 as shown here: statisticsglobe.com/r-replace-na-with-0/
@@StatisticsGlobe Thank you, I fixed it
Glad you found a solution!
How do you handle or replace NA values in a dataset where dates and other numeric information are missing?
Hi Michelle, usually I try to replace missing values using missing data imputation methods. You can find more info here: statisticsglobe.com/missing-data-imputation-statistics/ Regards, Joachim
Thank you.
You are very welcome!
Excellent work
Many thanks Atif! :)
Thank you very much!
You're very welcome Clayton!
Thanks it is very informative.
Thank you Sameena, glad you liked it!
Hi, I'm trying to do cov. with two groups of values, but one has NAs and R doesn't allow me to remove them when I do the cov. If I rewrite the two groups without NA, they are different in length, so cov can't be done. What can I do? ;(
Hello Francesco,
It is always better to check the documentation of the function. There, you can see if the function offers a handling method. See the documentation here: www.rdocumentation.org/packages/pbdDMAT/versions/0.5-1/topics/covariance
Best,
Cansu
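As an illustration of what such a handling method can look like, here is a minimal sketch with the cov() function of the stats package that ships with R (not the pbdDMAT function linked above). The vectors x and y are hypothetical example data.

```r
# Hypothetical example vectors of the same length, each with one NA
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

# Option 1: let cov() drop all incomplete pairs via the use argument
cov_builtin <- cov(x, y, use = "complete.obs")

# Option 2: drop incomplete pairs by hand, so that both vectors keep the same length
keep <- !is.na(x) & !is.na(y)
cov_manual <- cov(x[keep], y[keep])

cov_builtin
cov_manual
```

Both options remove a pair as soon as one of its two values is missing, which avoids the problem of vectors with different lengths.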
Thanks for this video
Most welcome Kangben! :)
Hello, I have a question!
Should you always remove missing values in a dataset (especially for public data)? Or do we need to consider the proportion of missing data, the missing value type (MCAR, MAR, NMAR), and the skewness of the data?
I'm really struggling with this particular issue (not the technique, but the judgement as to whether to remove missing values or not). Please shed some light on this, and thanks!
Hello Atthoriq,
It is far from a good idea to remove the missing data unless the missingness is MCAR. This tutorial only discusses some functions for removing missing values, not the concept. Handling missing data is a HUGE concept on its own. Maybe these tutorials of ours: statisticsglobe.com/missing-data/, statisticsglobe.com/missing-data-imputation-statistics/ might be a starting point.
Regards,
Cansu
@@cansustatisticsglobe Thank you. I'll check the article now.
na.omit is removing the whole row. What if I do not want to remove the whole row? Is there any way I can plot geom_line without omitting NA? The plot needs to ignore the point where there is an NA.
Hello Tirtha,
I think geom_line works as you wish by default. But if you want to avoid the gaps due to NA values, you can check our tutorial statisticsglobe.com/connect-lines-across-missing-values-ggplot2-line-plot-r. If the tutorial is not relevant to what you ask, please describe your wish in a bit more detail. Then I can try to find other solutions.
Regards,
Cansu
@@cansustatisticsglobe Hi, thank you so much for your reply. The tutorial that you showed is OK for one x,y pair. But I am looking for a solution for an x, y1, y2, y3 data frame. If a value is NA in y1, it is not necessarily NA in y2 and y3. If I want to plot geom_line for x-y1, x-y2, x-y3, what should I do?
@@tirthanandi6122 You are welcome. You can create new data columns for x-y1, x-y2, and x-y3 by simple data manipulation. Then the data for x-y1 will be NA in some rows, but not for x-y2 and x-y3. Ggplot will ignore the missing values and there will be breaks in your lines (I assume you plot multiple lines). If this solution doesn't address the issue, please share your code with me and let me know what you want to change in the visual. I hope I can help then.
Regards,
Cansu
After omitting the NAs, the row numbers still show the numbers of the original data set, though I see that the number of rows in the data after omitting the rows is 111. Which code can I use to get this 111, as nrow() gives me the original numbers?
Hi Lavina, So you want to rename the rownames of the new data frame to be equal to the number of rows? Then you could use the following R code: rownames(data)
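A minimal sketch of this situation, assuming the airquality data set that ships with R is the data in question (the commenter's 111 rows match it). The object name data_complete is hypothetical, and resetting the row names with NULL is just one possible way of renumbering them.

```r
# Store the complete cases in a new data object
data_complete <- na.omit(airquality)

# nrow() of the new object counts only the remaining rows
n_rows <- nrow(data_complete)
n_rows

# The row names still carry the numbering of the original data (with gaps)
head(rownames(data_complete))

# Reset the row names to a consecutive sequence from 1 to the number of rows
rownames(data_complete) <- NULL
tail(rownames(data_complete))
```

Note that nrow() has to be applied to the stored result of na.omit(), since the original data object is not modified.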
How can you make a new data frame that excludes all the NA values?
Hey, please try the following R code: data_new
And what do I do if it only shows other characters but not "NA", sir?
Hello,
I am not sure if I got your question very well. Are you asking if the missing values are shown with other characters instead of NA?
Regards,
Cansu
@@cansustatisticsglobe Yes, sir. In my data, missing values are shown by "?" instead of "NA". However, I have already found the solution by watching your other videos. Thanks a lot.
@@hoax9784 Perfect!
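For readers with the same issue, one possible approach is to declare the "?" symbol as a missing value already when importing the data. The following is a minimal sketch, assuming the data comes from a CSV file. The small inline CSV text is hypothetical and stands in for a real file.

```r
# Hypothetical CSV content in which missing values are coded as "?"
csv_text <- "id,value
1,10
2,?
3,30"

# na.strings tells read.csv which strings should be treated as NA
data <- read.csv(text = csv_text, na.strings = c("NA", "?"))

data
is.na(data$value)
```

Because the "?" is converted during the import, the value column is read as numeric instead of character.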
Thanks for this video.
You are welcome Vivek! :)
How to handle missing values in categorical variables is not mentioned??
Hey Anand, Actually you can use the first three examples of the video also for categorical variables. Only the last example (taking the mean) is not applicable to categoricals. Regards, Joachim
How do I merge two datasets A and B, where data set B is a small data set that has to replace certain cells in A?
Hi Taruving. Maybe you could do something like this:
A
What if there were character variables?
Hey, most of these methods also work for character data.
When you ran na.omit(airquality) before mean(airquality$Ozone), the rows with NAs were already deleted, giving you a complete numeric dataset. Then why is mean(airquality$Ozone) returning NA again?
Hey Aditya, na.omit(airquality) is not storing the complete data set in a new data object. You may use this code to store the complete data set:
airquality_complete
@@StatisticsGlobe How do I save this newly created data set as its own Rda file? :-)
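A minimal sketch of both steps discussed in this thread: storing the complete data set in a new object, and saving that object as an Rda file. The object name airquality_complete is taken from the reply above. Assigning the output of na.omit() to it and writing the file to a temporary directory are assumptions made for illustration.

```r
# Store the result of na.omit() in a new data object;
# the original airquality data is left unchanged
airquality_complete <- na.omit(airquality)

# The mean of the original column is NA, the mean of the stored complete data is not
mean(airquality$Ozone)
mean(airquality_complete$Ozone)

# Save the new data object as an Rda file (hypothetical path in a temporary directory)
rda_path <- file.path(tempdir(), "airquality_complete.rda")
save(airquality_complete, file = rda_path)

# The file can be read back into R with load(rda_path)
```

In practice, the temporary directory would be replaced by a folder of your choice.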
I like all your videos
This is great to hear! Thanks for the wonderful feedback Tamoghna! :)
What if I only want to remove rows in which all values are NA?
Hey Jeneva, thanks for the question. You can use the code shown in examples 5 and 6 of this tutorial: statisticsglobe.com/r-remove-data-frame-rows-with-some-or-all-na Regards, Joachim
Statistics Globe thank you very much, but I have another dilemma, as I need to keep the unique ID of the data for merging later. Is there a way to check only selected columns for NA values, so that only those rows will be deleted? Thank you very much for helping
E.g., in my dataset I have the column names "ID" "A" "B" "C" "D", and I only want to delete the rows with NAs in columns A, B & C
@@jenevavergara4125 Is the following code working for you? data[rowSums(is.na(data[ , ! colnames(data) %in% "ID"])) != ncol(data[ , ! colnames(data) %in% "ID"]), ]
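A minimal sketch of two possible readings of this question, using the column names from the comment above. The data frame itself and the object names check_cols, data_any and data_all are hypothetical.

```r
# Hypothetical example data with the column names from the question
data <- data.frame(ID = 1:4,
                   A = c(1, NA, 3, NA),
                   B = c(1, 2, NA, NA),
                   C = c(1, 2, 3, NA),
                   D = c(NA, 2, 3, 4))

# Columns that should be checked for missing values
check_cols <- c("A", "B", "C")

# Variant 1: delete rows with at least one NA in A, B or C
data_any <- data[complete.cases(data[ , check_cols]), ]

# Variant 2: delete only rows in which A, B and C are all NA
data_all <- data[rowSums(is.na(data[ , check_cols])) != length(check_cols), ]

data_any
data_all
```

The ID column and column D are kept in both variants, and NAs in those columns do not affect which rows are deleted.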
How does this work the other way round? For example, I want all values in my dataframe to become NA if they are below 0.4. Thank you!
you can use df[df < 0.4] = NA
Thanks Yannick, that would have been my recommendation as well :)
Thanks guys! It worked
Great to hear!
@@StatisticsGlobe To elaborate on my question from earlier... How do you remove all values between -0.4 and 0.4? I tried 'data[data -0.4]
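One possible approach is to combine two logical conditions with the & operator. This is a minimal sketch with hypothetical example data. It assumes that "remove" means setting the values to NA, as in the earlier part of this thread, and that the boundaries -0.4 and 0.4 themselves should be kept.

```r
# Hypothetical example data
data <- data.frame(x1 = c(-0.9, -0.2, 0.1, 0.7),
                   x2 = c(0.3, 0.5, -0.6, 0))

# Set all values strictly between -0.4 and 0.4 to NA
data[data > -0.4 & data < 0.4] <- NA

data
```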
How can I delete a certain row only if the amount of NA's surpasses a certain threshold? E.g. when I have like 100 slope coefficients, but only one value is missing, it sounds a bit harsh to delete the whole row. How can I tell R to only delete the row, if there's let's say more than 10 NA's?
Hey Borknagar, the following R code should do the trick: data_new
@@StatisticsGlobe Worked perfectly, thx a lot. (Y)
Nice to hear Borknagar, thanks for letting me know :)
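The code of the reply is cut off in this thread, so the following is only a minimal sketch of one way to apply such a threshold with rowSums() and is.na(). The example data and the name max_na are hypothetical, and the threshold is set to 2 instead of 10 to keep the example small.

```r
# Hypothetical example data
data <- data.frame(x1 = c(1, NA, NA),
                   x2 = c(1, 2, NA),
                   x3 = c(1, NA, NA),
                   x4 = c(1, 2, 3))

# Maximum number of NAs that a row may contain (hypothetical threshold)
max_na <- 2

# Keep only rows in which the number of NAs does not exceed the threshold
data_new <- data[rowSums(is.na(data)) <= max_na, ]

data_new
```

The third row contains three NAs and is deleted, while the second row with two NAs is kept.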
I was looking for how to work with the missing data, not how to remove the entire row that has an NA, since there are other columns with values in each row containing an NA
Hi Hezzi, in this case you should have a look at missing data imputation. For example, you may have a look at this tutorial: statisticsglobe.com/predictive-mean-matching-imputation-method/ Regards, Joachim
Is it possible to change NA only in particular rows? I created this code: airquality[is.na(airquality[52:61, c(1, 2)])] = 7, but it is not working. Then I created this code: airquality[is.na(airquality[52:61, c(airquality$Ozone)])] = "Sherry", and this one is also not working
Hey Shehriyar, thanks for the question. You can use the following R codes:
airquality[52:61, c(1, 2)][is.na(airquality[52:61, c(1, 2)])] = 7
airquality[52:61, "Ozone"][is.na(airquality[52:61, "Ozone"])] = "Sherry"
Regards, Joachim
Hello,
How do I handle NaN in R?
Hey Shanti, please have a look here: statisticsglobe.com/nan-in-r-is-nan-function
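As a short illustration, the following minimal sketch shows how the Base R functions is.nan() and is.na() treat NaN and NA. The vector x is hypothetical example data.

```r
# Hypothetical example vector containing both NaN and NA
x <- c(1, NaN, NA, 4)

# is.nan() is TRUE only for NaN
nan_flags <- is.nan(x)

# is.na() is TRUE for NaN as well as for NA
na_flags <- is.na(x)

# Remove NaN and NA in one step
x_clean <- x[!is.na(x)]

nan_flags
na_flags
x_clean
```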
The problem is that depending on the package na.rm does not work. It seems that each package has its own way to consider NAs. This is stressful when you are used to SAS.
Hey Maria, you can use na.omit to remove rows with NA values before applying other functions. Note that it is often better to impute missing values via missing data imputation techniques, but this depends on your specific data.
@@StatisticsGlobe In epidemiology we "rarely" impute data, unless with multiple imputation after knowing very well what is going on with the data, that is, the sampling and understanding who the missings are. I know that for certain areas imputation is always recommended. Thanks.
OK I see, I have no experience in this field myself :)
Thank you, maybe you can even help me further... How can I exclude single missing values from cases when running a Confirmatory Factor Analysis, without deleting the whole cases? I think the "na.rm=TRUE" argument should be the right one, but it seems that this doesn't work with the CFA function (lavaan). When I do this, R still excludes the whole cases from the analysis. I would be so thankful if anyone could help me!
Hey Tina, I recommend to apply a missing data imputation method such as predictive mean matching. The following tutorial provides more info: statisticsglobe.com/predictive-mean-matching-imputation-method/ Regards, Joachim
Thank you so much for your fast answer and for the hint! I will definitely consider that option. So do you think it's not possible the way I wanted to do it (just exclude the values) in combination with the cfa function? Best regards :-)
As far as I know, it is not possible. I'm not an expert for CFA though, so please double check somewhere else. In general: Imputation is almost always better than deletion methods, since otherwise your results are likely to have a (stronger) bias. Regards, Joachim
Sir, in your statisticsglobe website, where do we start? As a beginner to R, I'd like to know as to where to start. Thanks
Hey Azad, thank you for your comment! Unfortunately my tutorials do not follow a clear order. I have planned to publish a huge overview on R programming soon, in which I will structure all tutorials. I hope I'll find the time for it soon. Regards, Joachim
@@StatisticsGlobe OK Sir. Till then, I will try to watch the videos as best as I can. Thank you very much for all your work.
You are very welcome Azad! Let me know in the comments in case you have any questions :)
How can I remove NA values only if they are in a certain column?
Hi Sofia, you may either apply a listwise deletion (see here: statisticsglobe.com/complete-cases-in-r-example/) or you may extract the column as a vector and then remove NAs (see here: statisticsglobe.com/convert-data-frame-column-to-a-vector-in-r). I hope that helps! Regards, Joachim
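Both options from the reply above can be sketched in a few lines of base R. The data frame and the column names x1 and x2 are made up for illustration; this is only one possible way to do it.

```r
# Hypothetical example data with NAs in two different columns.
data <- data.frame(x1 = c(1, NA, 3, 4),
                   x2 = c("a", "b", NA, "d"))

# Option 1: drop only the rows where x1 is NA (NAs in other columns are kept).
data_x1 <- data[!is.na(data$x1), ]

# Option 2: extract the column as a vector and remove its NAs.
x1_vec <- data$x1[!is.na(data$x1)]
```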
Hey, maybe you can help me. At university we have a project and we need to remove all the NAs from our data, but the problem is I don't know how to remove NAs if they are "words" instead of "numbers". For example, you get the variable "house" with the values "new house", "old house", "big house", "small house", and then there are also some NAs. I tried it with complete.cases but it didn't work, and also with "factor", so I decided to do it one by one, and the parts with numbers were easy.
Hey Janine, Thanks for the comment. That's actually a very common problem. I suggest replacing the word-NAs with real NA values first. You can do that with the following code: data[data == "NA"] <- NA
Can you just remove NAs from a specific column within a data set? For example, if I have a column such as "wind chill" which has a lot of blanks when it's not cold outside, I don't want to erase all of that data from the data set if I am looking at another column/vector of interest. Thanks!
Hey Jay, you may impute your missing values. This depends a lot on the content of your variable though. You may have a look at this tutorial for more information: statisticsglobe.com/missing-data-imputation-statistics/ I hope that helps! Joachim
When I try sum(is.na(data)) I am getting error as argument y is missing
Hi Manjunath, could you provide an example of what your data looks like? Regards, Joachim
Maybe you need to use the name of your data set. If you have used data(airquality), then use sum(is.na(airquality)), or whatever other name you have used for your data.
Sir, I would like to mutate a column named daily revenue, which is the sum of promotion_revenue and non_promotion_revenue. However, some rows have NA in promotion_revenue but $30 in non_promotion_revenue. When I compute it, the mutated column (daily revenue) shows NA, even if there is a number in one of the columns. I already applied na.rm = TRUE in the summarize code summarize(daily_revenue = sum(total_rev, na.rm = TRUE)), but it doesn't work.
I tried this, failed :(
mutate((total_rev = promo_revenue + non_promo_revenue), na.rm = TRUE) %>%
group_by(order_date) %>%
summarize(daily_revenue = sum(total_rev))
order_date   promo_revenue  non_promo_revenue  total_rev
2020-03-18   NA             14.90              NA
2020-03-18   42.47          10.85              53.32
Hi Daphne, you may replace the NA values by 0 before taking the sum. You can find more information here: statisticsglobe.com/r-replace-na-with-0/
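A minimal base R sketch of this suggestion, using the two rows shown in the question. The data frame name orders and the result name daily are assumptions for illustration; the same idea carries over to a dplyr pipeline.

```r
# Rebuild the two example rows from the question (object names are hypothetical).
orders <- data.frame(
  order_date        = as.Date(c("2020-03-18", "2020-03-18")),
  promo_revenue     = c(NA, 42.47),
  non_promo_revenue = c(14.90, 10.85)
)

# Replace NA by 0 in both revenue columns before adding them up.
orders$promo_revenue[is.na(orders$promo_revenue)]         <- 0
orders$non_promo_revenue[is.na(orders$non_promo_revenue)] <- 0

# The row-wise total no longer becomes NA.
orders$total_rev <- orders$promo_revenue + orders$non_promo_revenue

# Sum the totals per day.
daily <- aggregate(total_rev ~ order_date, data = orders, FUN = sum)
```

Whether 0 is a sensible stand-in for a missing revenue depends on why the value is missing.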
@@StatisticsGlobe thanks! Got it~
YESSSSS THANK YOUUUUU
You're welcome Lahirukudaligamage. We are happy you found the tutorial helpful!
thank you so much ❤❤❤
Always welcome Siddhesh! :)
# 1. Load R packages
> library("quantstrat")
>
> # 2. Stock Instrument Initialization
>
> # 2.1. Initial Settings
> start.pf start.date end.date Sys.setenv(TZ='UTC')
> init.eq # 2.2. Data Downloading or Reading
>
> # Data Downloading
> getSymbols(Symbols='BMW',src='yahoo',from=start.date,to=end.date)
[1] "BMW"
Warning message:
BMW contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
I don't want to see these errors. How should I fix it?
Hey Durdu, it seems like your data contains missing values. You may remove these missing values using the na.omit function as explained here: statisticsglobe.com/na-omit-r-example/ Please note that removing NA values should be theoretically justified.
good stuff
That's great to hear, Lewis! Thanks for the positive feedback!
hello, great videos thanks! question, if I wanted to get the NA values in a separate subset instead of omitting or removing them, what can I do?
You are welcome Jay, glad you like it! :) Regarding your question, please have a look at the following R code:
airquality_NA
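One possible way to collect the incomplete rows in their own subset is sketched below. This is not necessarily the code from the original reply; the name airquality_complete is made up for illustration.

```r
# Rows of the built-in airquality data that contain at least one NA.
airquality_NA <- airquality[!complete.cases(airquality), ]

# The remaining rows without any NA, for comparison.
airquality_complete <- airquality[complete.cases(airquality), ]
```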
How to Undefined In place of NA?
Hey Kush, could you please explain your question in some more detail? I don't understand what you would like to do. Regards, Joachim
@@StatisticsGlobe sorry for the inconvenience. I meant to ask: if I receive NA in some table, how can I replace it with a specific value of my choice, in all the cells?
Hey Kush, I recommend using missing data imputation techniques for this: statisticsglobe.com/missing-data-imputation-statistics/
How do you save omitted data in excel?
Hi Zeus, you can find a detailed tutorial on exporting Excel files here: statisticsglobe.com/write-xlsx-xls-export-data-from-r-to-excel-file
Does this answer your question? Regards, Joachim
@@StatisticsGlobe ruclips.net/video/G2ra7Ku3eGM/видео.html
Hi, can you write code to replace the missing value "NA" with the mean?
Hi, you can use the following code: x[is.na(x)] <- mean(x, na.rm = TRUE)
@@StatisticsGlobe thank you sir
You are welcome :)
@@StatisticsGlobe hello sir... could you please explain me about R functions and function components like function name, arguments, function body and return value... or can you make a video on this topic
thanks
@@mohammadbasheer6192 Do you mean functions that are already available in R or do you mean user-defined functions? If you want to learn more about already available functions, you could have a look here: statistical-programming.com/r-functions-list/ If you want to learn more about user-defined functions, you could have a look here: statistical-programming.com/r-return-value-from-function-example
Thank you very much for this video (just subscribed). How do you remove "NA" from a data set that has no numeric values? Say I just had two columns (Name and Hair Color) and some of the hair colors were NA. How would I omit that?
Hey Frank, Thanks for subscribing! :) The class of your variables does not matter, you can apply the functions shown in this video the same way. If it doesn't work, you could check if your NA values are real NA values or if they are "NA" character strings. In this case, you could replace the "NA" by real NA as shown in the following example code:
data
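A small sketch of that idea, with made-up example data (the columns Name and Hair_Color and their values are assumptions, not taken from the original reply):

```r
# Hypothetical character data: one "NA" string and one real NA.
data <- data.frame(Name       = c("Anna", "Ben", "Cara", "Dan"),
                   Hair_Color = c("brown", "NA", NA, "black"),
                   stringsAsFactors = FALSE)

# Turn the character string "NA" into a real missing value.
data[data == "NA"] <- NA

# Now the usual functions work, regardless of the column class.
data_clean <- na.omit(data)
```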
What if I had two entries for each SUBJECT and I want to filter out both of their entries if one of their entries in another column is NA? PS: great video as always!
Hey Larissa, thank you very much, glad you like the video! Regarding your question, please have a look at the following example code:
data
@@StatisticsGlobe Hi, thank you!! I went for a tidy solution, check it out:
data %>%
  group_by(SUBJECT) %>%
  filter(all(!is.na(MYVARIABLE)))
Does that make sense?
Hey, this is difficult to tell without seeing your actual data, but I think this should produce a different result than my code. Is there a specific reason why you would like to use tidy instead of Base R?
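For comparison, a base R sketch of the group-wise filter asked about in this thread. It is not the code from the original reply. It assumes a data frame with the columns SUBJECT and MYVARIABLE (example values are made up) and should keep the same rows as a dplyr filter on all(!is.na(MYVARIABLE)) within each SUBJECT.

```r
# Hypothetical data: two entries per subject, some with NA.
data <- data.frame(SUBJECT    = c(1, 1, 2, 2, 3, 3),
                   MYVARIABLE = c(5, NA, 3, 4, NA, NA))

# Subjects that have at least one NA in MYVARIABLE.
subjects_with_na <- unique(data$SUBJECT[is.na(data$MYVARIABLE)])

# Drop all entries of those subjects, not only the NA rows.
data_filtered <- data[!data$SUBJECT %in% subjects_with_na, ]
```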
Do you also have a video on how I can handle this with the "listwise deletion" command?
@Gummibärmann Listwise deletion is usually done in R with the complete.cases function. You can watch this video from minute 2:40: ruclips.net/video/OVHIYAEAHLY/видео.html I have also published a tutorial on this on my homepage: statisticsglobe.com/listwise-deletion-missing-data/ Feel free to let me know whether the two links helped :) Regards, Joachim
@Der Humanist Thanks for your follow-up question. It seems that your lecturer assigned a 1 to the variable help whenever one of the other variables in df is NA. Did he perhaps take a subset of df afterwards that contains only the observations where help = 0? Then that would be (in a more roundabout way) the same as using the complete.cases function. Without more detailed information, however, this is honestly hard for me to judge.
@Der Humanist Glad I could help! Feel free to let me know in the comments in case you have further questions :)
Thanks
@Oluwadolapo Bifarin You are welcome :)
R programming for t-test two tail tabulated value in plot
Hello,
Thank you for your comment. Do you mean that you would like to see a tutorial on this topic? Is there something specific that you would like to know about R programming for t-test two-tail tabulated value in the plot?
Regards,
Cansu
I LOVE YOU AAAAAAAAAA
Haha thx ;)
Hi can I ask a question please
Sure Jonathan, go ahead!
@@StatisticsGlobe thank you very much, could I maybe send it to you on email or on another platform as the question may be a little long if you’re happy to suggest one ?
@@StatisticsGlobe Can I ask: in R, I have 2 data sets with different numbers of rows and columns, and I want to merge them based on one of the columns in each data set. The first column in dataset1 has 3 values and the first column in dataset2 has 9 values, and each value in the first column of the first dataset maps onto 3 values in the second dataset. How do I do it?
For example, the first column in dataset 1 has the values 1, 2, 3 and the first column in dataset 2 has the values 1a 1b 1c 2a 2b 2c 3a 3b 3c. I want to merge the 2 data sets based on the numbers, but the first dataset only has 3 rows and the second dataset has 9 rows. I want to merge them so I can perform functions on them. How do I do it? Thanks
@@StatisticsGlobe sorry for the long question. So all this must be done with R base package. Do let me know if you are able to help with this. Many thanks.
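One possible base R sketch for this kind of one-to-many merge, assuming the keys look exactly like the values described in the question. All column names (id, group, code, value) and the example values are made up for illustration.

```r
# Hypothetical data sets following the description in the question.
dataset1 <- data.frame(id    = c(1, 2, 3),
                       group = c("x", "y", "z"))
dataset2 <- data.frame(code  = c("1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "3c"),
                       value = 1:9,
                       stringsAsFactors = FALSE)

# Strip the trailing letters so that both data sets share a numeric key.
dataset2$id <- as.numeric(sub("[a-z]+$", "", dataset2$code))

# merge() repeats each row of dataset1 for every matching row in dataset2.
merged <- merge(dataset1, dataset2, by = "id")
```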
@Statistics Globe Thank you very much for the great video. It really helped :) Unfortunately I still have a problem, and I really hope you can answer my question. Where do I put na.rm = TRUE in a more complex piece of code?
I keep getting an error message, and I suspect (based on internet research) that it has something to do with the NAs: Error in KhatriRao(sm, t(mm)) : (p
Hello Paula, thank you very much for the kind words. I'm very glad you like my tutorials! :) You can find the answer to your question in the documentation of the lmer function. You can open it with the R code ?lmer. It says:
"na.action
a function that indicates what should happen when the data contain NAs. The default action (na.omit, inherited from the 'factory fresh' value of getOption("na.action")) strips any observations with any missing values in any variables."
In other words: the equivalent of the na.rm option is already activated automatically when you use the lmer function. Please note that this can also lead to risks in your data analysis and that you may want to impute your data. You can find more information here: statisticsglobe.com/missing-data/
Best regards, Joachim
great
@Suman Ghosh Thank you very much! :)
Could you record this in German as well? :D
Hey Rhena, I only upload English-language videos on this channel. But I am already planning to create a partly German-language website soon, I hope that will help! :) Best regards, Joachim
Thanks. Very helpful
Thanks for the comment Eyad :)