Handling NA in R | is.na, na.omit & na.rm Functions for Missing Values

  • Published: 27 Jan 2025

Comments • 203

  • @careenevans
    @careenevans 3 years ago +3

    I have been following your tutorials for a couple of days now. I want to say thank you, they are truly direct and straight to the point. I wish that you would offer consultation to students even if you decide to charge a price on it. Because sometimes one might get stuck and not know what to do.

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      Thanks a lot for the very kind feedback Careen, glad you find the tutorials useful! :) In case you have any questions, you may post them to the Statistics Globe Facebook group: facebook.com/groups/statisticsglobe Regards, Joachim

  • @ezhankhan1035
    @ezhankhan1035 1 year ago +2

    Directly answered what I was looking for - Thank you!
    I have used 'drop_na()' as opposed to 'na.omit()' for the most part, but always good to know alternative ways of doing things.
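
The two functions mentioned in this comment can be compared side by side. The following is a minimal sketch on the built-in airquality data, assuming that tidyr may or may not be installed (the drop_na() call is therefore guarded); the object names are illustrative only.

```r
# Base R: drop every row that contains at least one NA
air_base <- na.omit(airquality)

# Equivalent base R filter using a logical index
air_cc <- airquality[complete.cases(airquality), ]

# tidyr::drop_na() is only run if tidyr happens to be installed
if (requireNamespace("tidyr", quietly = TRUE)) {
  air_tidy <- tidyr::drop_na(airquality)
  print(nrow(air_tidy) == nrow(air_base))
}

nrow(air_base)   # number of complete rows
```

One practical difference: na.omit() attaches an "na.action" attribute recording the removed rows, while the indexed version does not.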

  • @shambo9807
    @shambo9807 1 year ago +1

    Very clear and succinct. All the info I needed clearly explained. 👍🏾

  • @anthonyfernandezgonzalez8262
    @anthonyfernandezgonzalez8262 3 years ago +1

    Love it, thank you one more time dude! Love the way you prepare your lessons 'cause they are really short, focus on a specific context, and finally you give us multiple solutions for a scenario, so that's the way it must be.

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Awesome, thank you very much for the very nice feedback Anthony! :)

  • @WaayoArag.
    @WaayoArag. 1 year ago +1

    Thank you for the good lesson; explained very clearly.

    • @matthias.statisticsglobe
      @matthias.statisticsglobe 1 year ago +1

      Thank you very much for the feedback! Great to hear you like the video/explanations!

  • @shirinisabekova5504
    @shirinisabekova5504 4 years ago +1

    Thank you so much for the well-explained video. Keep on posting them please. You are doing a great job!

  • @mycountryfarm
    @mycountryfarm 10 months ago +1

    Awesome content! Very well demonstrated!

    • @StatisticsGlobe
      @StatisticsGlobe 10 months ago

      Thank you so much, glad you liked it!

  • @eapen4irm
    @eapen4irm 3 years ago +1

    Your videos are amazing and easy to understand! Thank you!!!

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Thanks a lot for the nice comment Eapen! Glad you like them!

  • @arunbioinfo1100
    @arunbioinfo1100 3 years ago +1

    excellent joachim, perfectly explained

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Glad you liked it Arun, thank you for the kind words! :)

  • @jababnamgay6366
    @jababnamgay6366 4 years ago +1

    Thank you so much. Easiest method to remove NAs.

  • @DavidKaranjamdavis
    @DavidKaranjamdavis 3 years ago +1

    Informative and well explained

  • @multitaskprueba1
    @multitaskprueba1 4 years ago

    Excellent explanation! You are a fantastic teacher!

  • @roshnyabraham7941
    @roshnyabraham7941 2 years ago +1

    Thank you so much! You have been such a good help.

  • @jababnamgay6366
    @jababnamgay6366 4 years ago +1

    very simple to follow sir.

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Glad to hear that Jabab, thanks for letting me know :)

  • @dominiquebarrette9621
    @dominiquebarrette9621 4 years ago +1

    Bravo! So well explained! Thank you

  • @mugangakivumbi
    @mugangakivumbi 4 years ago +1

    Thank you, the tutorial was very helpful

  • @malakasamaraweera6736
    @malakasamaraweera6736 5 years ago +1

    hi, your video demonstration is very useful. keep it up !

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      Hi, thanks a lot for the positive feedback. Nice to hear that you like the videos!

  • @mdiqbal7168
    @mdiqbal7168 2 years ago +1

    Tabulated value and calculated value in t-test normal distribution by plot in R programming

    • @cansustatisticsglobe
      @cansustatisticsglobe 2 years ago

      Hello,
      Thank you for your comment. Do you mean that you would like to see a tutorial on this topic? Is there something specific that you would like to know about tabulated value and calculated value in t-test normal distribution by plot in R programming?
      Regards,
      Cansu

  • @organ1181
    @organ1181 1 year ago +1

    How to deal with missing data for categorical variables, please?

    • @cansustatisticsglobe
      @cansustatisticsglobe 1 year ago

      Hello,
      If you assume that the missingness in your data is MAR (see statisticsglobe.com/missing-data/), you can use multiple imputation (probably the preferred method under MAR) to impute your values. You can check the documentation of the mice() function, www.rdocumentation.org/packages/mice/versions/3.16.0/topics/mice, to see which methods are applicable for ordered or unordered categorical variable imputation. Scroll down the page to the Details section.
      Alternatively, you can do listwise deletion as in the tutorial above, but this comes with some drawbacks. See the Listwise Deletion tutorial, statisticsglobe.com/listwise-deletion-missing-data/, for the details.
      Best,
      Cansu

  • @lsjenny2198
    @lsjenny2198 2 years ago +2

    I am trying to use ggscatter but I have many NAs in y column and no correlation coefficient appears. Is there any way of ignoring these NAs or changing them to "0"? please help me, thank you.

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago +1

      Hey Jenny, I have never used ggscatter, but you may replace NA values by 0 as shown here: statisticsglobe.com/r-replace-na-with-0/
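
Replacing NA by 0, as suggested in this reply, can be sketched in base R as follows. The example uses the built-in airquality data as a stand-in for the commenter's data, and the object names are assumptions. Note that zeros enter any later correlation as real observations, so the second part shows an alternative that simply ignores incomplete pairs.

```r
air <- airquality               # work on a copy of the built-in data
sum(is.na(air))                 # number of missing cells before replacement

air_zero <- air
air_zero[is.na(air_zero)] <- 0  # replace every NA by 0

# Alternative that leaves the data untouched: use only complete pairs
r_complete <- cor(air$Ozone, air$Temp, use = "complete.obs")
r_complete
```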

    • @lsjenny2198
      @lsjenny2198 2 years ago +1

      @@StatisticsGlobe Thank you, I fixed it

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      Glad you found a solution!

  • @Michelle-mv1gg
    @Michelle-mv1gg 3 years ago +1

    How do you handle or replace NA values in a dataset where dates and other numeric information are missing?

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago +1

      Hi Michelle, usually I try to replace missing values using missing data imputation methods. You can find more info here: statisticsglobe.com/missing-data-imputation-statistics/ Regards, Joachim

    • @Michelle-mv1gg
      @Michelle-mv1gg 3 years ago +1

      Thank you.

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      You are very welcome!

  • @atifdai313
    @atifdai313 2 years ago

    Excellent work

  • @claytontherrien7583
    @claytontherrien7583 2 years ago +1

    Thank you very much!

  • @sameenabanu5931
    @sameenabanu5931 4 years ago +1

    Thanks it is very informative.

  • @francesco8150
    @francesco8150 1 year ago +1

    Hi, I'm trying to compute cov() with two groups of values, but one has NAs and R doesn't allow me to remove them when I compute the covariance. If I rewrite the two groups without NAs, they differ in length, so cov() can't be computed. What can I do? ;(

    • @cansustatisticsglobe
      @cansustatisticsglobe 1 year ago

      Hello Francesco,
      It is always better to check the documentation of the function. There, you can see if the function offers a handling method. See the documentation here: www.rdocumentation.org/packages/pbdDMAT/versions/0.5-1/topics/covariance
      Best,
      Cansu
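
For the base R cov() function from the stats package, the handling method is the use argument. A minimal sketch with made-up vectors (x and y are assumptions, not the commenter's data): "complete.obs" drops each pair in which either value is missing, so both vectors keep matching lengths.

```r
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)

cov(x, y)                                   # NA, because x contains a missing value
cov_xy <- cov(x, y, use = "complete.obs")   # drops the pair where x is NA
cov_xy
```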

  • @fostkangben
    @fostkangben 3 years ago +1

    Thanks for this video

  • @atthoriqpp
    @atthoriqpp 1 year ago +2

    Hello, I have a question!
    Should you always remove missing values in a dataset (especially for public data)? Or do we need to consider the proportion of missing data, the missing value type (MCAR, MAR, NMAR), and the skewness of the data?
    I'm really struggling with this particular issue (not the technique, but the judgement whether to remove missing values or not). Please shed some light on this, and thanks!

    • @cansustatisticsglobe
      @cansustatisticsglobe 1 year ago +2

      Hello Atthoriq,
      Removing missing data is generally not a good idea unless the missingness is MCAR. This tutorial only discusses some functions for removing missing values, not the concept. Handling missing data is a HUGE topic on its own. Maybe these tutorials of ours, statisticsglobe.com/missing-data/ and statisticsglobe.com/missing-data-imputation-statistics/, might be a starting point.
      Regards,
      Cansu

    • @atthoriqpp
      @atthoriqpp 1 year ago +2

      @@cansustatisticsglobe Thank you. I'll check the article now.

  • @tirthanandi6122
    @tirthanandi6122 2 years ago +1

    na.omit is removing the whole row. what if I do not remove the whole row? Is there any way I can plot geom_line without omitting na? The plot needs to ignore the point where there is a na?

    • @cansustatisticsglobe
      @cansustatisticsglobe 2 years ago

      Hello Tirtha,
      I think geom_line works as you wish by default. But if you want to avoid the gaps due to NA values, you can check our tutorial statisticsglobe.com/connect-lines-across-missing-values-ggplot2-line-plot-r. If the tutorial is not relevant to what you ask, please describe your wish in a bit more detail. Then I can try to find other solutions.
      Regards,
      Cansu

    • @tirthanandi6122
      @tirthanandi6122 2 years ago +1

      @@cansustatisticsglobe Hi, thank you so much for your reply. The tutorial that you showed is ok for one x,y pair. But I am looking for x, y1,y2,y3 dataframe. Now, if a data is NA in y1, not necessarily NA in y2, and y3. If I want to plot geom_line x-y1,x-y2,x-y3, what should I do?

    • @cansustatisticsglobe
      @cansustatisticsglobe 2 years ago

      @@tirthanandi6122 You are welcome. You can create new data columns for x-y1, x-y2, and x-y3 by simple data manipulation; then the data for x-y1 will be NA in some rows but not for x-y2 and x-y3. Ggplot will ignore the missing values and there will be breaks in your lines (I assume you plot multiple lines). If this solution doesn't address the issue, please share your code with me and let me know what you want to change in the visual. I hope I can help then.
      Regards,
      Cansu

  • @lavinaarora3697
    @lavinaarora3697 4 years ago +1

    After omitting the NAs, the number of rows still shows the number from the original data set, though I see that the number of rows in the data after omitting the rows is 111. Which code can I use to get this 111, as nrow() gives me the original number?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hi Lavina, So you want to rename the rownames of the new data frame to be equal to the number of rows? Then you could use the following R code: rownames(data)
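
The code in this reply is cut off in the source. As a hedged sketch of what the question seems to need (the object name air_complete is an assumption): nrow() only reports the reduced count when it is applied to the stored result of na.omit(), and the old row labels can be reset afterwards.

```r
air_complete <- na.omit(airquality)   # the result must be stored to be reused

nrow(airquality)     # rows in the original data
nrow(air_complete)   # rows left after removing incomplete rows

tail(rownames(air_complete))          # still the old row labels
rownames(air_complete) <- NULL        # reset labels to 1, 2, ..., nrow
tail(rownames(air_complete))
```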

  • @lh4818
    @lh4818 4 years ago +1

    How can You make a new data frame that excludes all the NA values

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hey, please try the following R code: data_new

  • @hoax9784
    @hoax9784 1 year ago +1

    and how do i do if it only shows other characters but not "NA", sir?

    • @cansustatisticsglobe
      @cansustatisticsglobe 1 year ago

      Hello,
      I am not sure if I got your question very well. Are you asking if the missing values are shown with other characters instead of NA?
      Regards,
      Cansu

    • @hoax9784
      @hoax9784 1 year ago +1

      @@cansustatisticsglobe yes, sir. In my data, missing values are shown by "?" instead of "NA". However, i have already known the solution by watching your other videos. Thanks a lot.

    • @cansustatisticsglobe
      @cansustatisticsglobe 1 year ago +1

      @@hoax9784 Perfect!
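
The thread does not say which solution was used. One possible approach, sketched here with made-up data (the csv_text content and column names are assumptions), is to declare "?" as a missing-value marker while reading the data, or to replace it afterwards.

```r
# Hypothetical data in which "?" marks a missing value
csv_text <- "age,city\n23,Berlin\n?,Paris\n31,?"

# Option 1: declare the marker while reading the data
d1 <- read.csv(text = csv_text, na.strings = "?")

# Option 2: replace the marker after the data has been read as text
d2 <- read.csv(text = csv_text, stringsAsFactors = FALSE)
d2[d2 == "?"] <- NA
d2$age <- as.numeric(d2$age)   # the column was read as character

colSums(is.na(d1))             # missing values per column
```

Option 1 is usually more convenient because the column types are detected correctly right away.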

  • @negijivlogs4626
    @negijivlogs4626 4 years ago +1

    Thanks for this video.

  • @anandacharya9919
    @anandacharya9919 4 years ago +1

    How to handle missing values in category variables not mentioned ??

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hey Anand, Actually you can use the first three examples of the video also for categorical variables. Only the last example (taking the mean) is not applicable to categoricals. Regards, Joachim

  • @taruvingatakudzwa151
    @taruvingatakudzwa151 1 year ago

    How do i merge two datasets A and B but data set B is a small data that has to go and replace certain cells in A

  • @mosesyoung9318
    @mosesyoung9318 5 months ago +1

    What if there were character variables?

    • @StatisticsGlobe
      @StatisticsGlobe 5 months ago

      Hey, most of these methods also work for character data.

  • @sun27g
    @sun27g 4 years ago +1

    When you ran na.omit(airquality) before mean(airquality$Ozone), the rows with NAs were already deleted, giving you a complete numeric data set. So why is mean(airquality$Ozone) returning NA again?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +4

      Hey Aditya, na.omit(airquality) is not storing the complete data set in a new data object. You may use this code to store the complete data set:
      airquality_complete
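
The code line in this reply is truncated in the source. A minimal sketch of the point being made, assuming the cleaned data is stored under the name airquality_complete, with an optional last step that writes it to an .rda file (the tempfile() path is only a placeholder):

```r
invisible(na.omit(airquality))      # result is not stored, so airquality is unchanged
mean(airquality$Ozone)              # still NA

airquality_complete <- na.omit(airquality)   # store the cleaned data
mean(airquality_complete$Ozone)              # now a number

mean(airquality$Ozone, na.rm = TRUE)         # alternative for a single column

# Saving the new data set; tempfile() is just a placeholder path
rda_path <- tempfile(fileext = ".rda")
save(airquality_complete, file = rda_path)
```

The two means need not agree: na.omit() also drops rows that are only missing in other columns, whereas na.rm = TRUE skips just the missing Ozone values.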

    • @Paan-2.1
      @Paan-2.1 4 years ago

      @@StatisticsGlobe How do I save this newly created data set as its own Rda file? :-)

  • @tmitra001
    @tmitra001 3 years ago +1

    I like all your videos

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      This is great to hear! Thanks for the wonderful feedback Tamoghna! :)

  • @jenevavergara4125
    @jenevavergara4125 4 years ago +1

    how about if I only want to remove rows with all values are NA?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hey Jeneva, thanks for the question. You can use the code shown in examples 5 and 6 of this tutorial: statisticsglobe.com/r-remove-data-frame-rows-with-some-or-all-na Regards, Joachim

    • @jenevavergara4125
      @jenevavergara4125 4 years ago +1

      Statistics Globe thank you very much, but I have another dilemma as I need to include the unique ID of the data for merging later, is there a way where I can only select columns with NA values in the row are present, so only that will be deleted? thank you very much for helping

    • @jenevavergara4125
      @jenevavergara4125 4 years ago +1

      EX. in my dataset i have column names: "ID" "A" "B" "C" "D" i only want to delete the rows with NAs in column A B & C

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      @@jenevavergara4125 Is the following code working for you? data[rowSums(is.na(data[ , ! colnames(data) %in% "ID"])) == ncol(data[ , ! colnames(data) %in% "ID"]), ]
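
As written, the == comparison in this code selects the rows in which every non-ID column is NA; to drop those rows the comparison would be negated. The following sketch restricts the check to the columns A, B and C from the question, using a made-up data frame (all values are assumptions).

```r
# Hypothetical data: row 2 is NA in A, B and C; row 3 is NA only in A
data <- data.frame(ID = 1:4,
                   A  = c(1, NA, NA, 4),
                   B  = c(5, NA, 7, 8),
                   C  = c(9, NA, 11, 12),
                   D  = c("w", "x", "y", "z"))

check_cols <- c("A", "B", "C")             # columns to inspect for NA
n_missing  <- rowSums(is.na(data[, check_cols]))

all_na_rows <- data[n_missing == length(check_cols), ]  # rows that are NA in all three
data_keep   <- data[n_missing != length(check_cols), ]  # the same rows removed
data_strict <- data[n_missing == 0, ]                   # rows with any NA in A, B or C removed
```

The ID column is kept in every result, so the data can still be merged later.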

  • @aloysduistermaat7046
    @aloysduistermaat7046 3 years ago +2

    How does this work the other way round? For example, I want all values in my dataframe to become NA if they are below 0.4. Thank you!

    • @yannickpichardo5520
      @yannickpichardo5520 3 years ago +1

      you can use df[df < 0.4] = NA

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Thanks Yannick, that would have been my recommendation as well :)

    • @aloysduistermaat7046
      @aloysduistermaat7046 3 years ago +1

      Thanks guys! It worked

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Great to hear!

    • @aloysduistermaat7046
      @aloysduistermaat7046 3 years ago +1

      @@StatisticsGlobe To elaborate on my question from earlier.. How do you remove all values between - 0.4 and 0.4? I tried 'data[data -0.4]
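
The attempted code is cut off in the source and no answer appears in the thread. One way to express "between -0.4 and 0.4" is to combine two comparisons with &, sketched here on a made-up numeric data frame (the values and the strict bounds are assumptions).

```r
data <- data.frame(a = c(-0.9, -0.2, 0.1, 0.7),
                   b = c(0.39, -0.41, 0.5, 0))

in_band <- data > -0.4 & data < 0.4   # logical matrix, TRUE inside the band
data[in_band] <- NA                   # set those cells to NA
data
```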

  • @borknagarpopinga4089
    @borknagarpopinga4089 4 years ago +1

    How can I delete a certain row only if the amount of NA's surpasses a certain threshold? E.g. when I have like 100 slope coefficients, but only one value is missing, it sounds a bit harsh to delete the whole row. How can I tell R to only delete the row, if there's let's say more than 10 NA's?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hey Borknagar, the following R code should do the trick: data_new
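
The code in this reply is truncated in the source. A minimal sketch of a row filter based on an NA threshold, assuming a cutoff of 10 as in the question and a made-up data frame:

```r
set.seed(1)                                   # reproducible toy data
data <- as.data.frame(matrix(rnorm(60), nrow = 3, ncol = 20))
data[1, 1]    <- NA                           # row 1: one missing value
data[2, 1:12] <- NA                           # row 2: twelve missing values

max_na   <- 10                                # hypothetical threshold
na_count <- rowSums(is.na(data))              # NAs per row
data_new <- data[na_count <= max_na, ]        # keep rows with at most max_na NAs
```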

    • @borknagarpopinga4089
      @borknagarpopinga4089 4 years ago +1

      @@StatisticsGlobe Worked perfectly, thx a lot. (Y)

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Nice to hear Borknagar, thanks for letting me know :)

  • @hezzia4427
    @hezzia4427 4 years ago +1

    I was looking for how to work with the missing data, not how to remove an entire row that has an NA; there are other columns in each row containing NA

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hi Hezzi, in this case you should have a look at missing data imputation. For example, you may have a look at this tutorial: statisticsglobe.com/predictive-mean-matching-imputation-method/ Regards, Joachim

  • @shaheryarshafi
    @shaheryarshafi 4 years ago +1

    Is it possible to change NA in particular rows? I have created this code: airquality[is.na(airquality[52:61, c(1, 2)])] = 7 but it is not working. Then I created this code: airquality[is.na(airquality[52:61, c(airquality$Ozone)])] = "Sherry" and this one is also not working.

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hey Shehriyar, thanks for the question. You can use the following R codes:
      airquality[52:61, c(1, 2)][is.na(airquality[52:61, c(1, 2)])] = 7
      airquality[52:61, "Ozone"][is.na(airquality[52:61, "Ozone"])] = "Sherry"
      Regards, Joachim

  • @shanti3310
    @shanti3310 3 years ago +1

    Hello,
    How do I handle NaN in R?

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Hey Shanti, please have a look here: statisticsglobe.com/nan-in-r-is-nan-function
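
As a short illustration of how NaN relates to NA in base R (the vector is made up): is.na() is TRUE for both, while is.nan() is TRUE only for NaN, so the NA-handling tools from the video also cover NaN.

```r
x <- c(1, NA, NaN, 0 / 0)

is.na(x)    # TRUE for both NA and NaN
is.nan(x)   # TRUE only for NaN

x_clean <- x[!is.na(x)]        # drops NA and NaN
mean(x, na.rm = TRUE)          # na.rm also skips NaN
```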

  • @mariasaraiva9675
    @mariasaraiva9675 3 years ago +1

    The problem is that depending on the package na.rm does not work. It seems that each package has its own way to consider NAs. This is stressful when you are used to SAS.

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Hey Maria, you can use na.omit to remove rows with NA values before applying other functions. Note that it is often better to impute missing values via missing data imputation techniques, but this depends on your specific data.

    • @mariasaraiva9675
      @mariasaraiva9675 3 years ago +1

      @@StatisticsGlobe In epidemiology we "rarely" impute data, unless with multiple imputation after knowing very well what is going on with the data, that is, the sampling and understanding who the missing cases are. I know that for certain areas imputation is always recommended. Thanks.

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      OK I see, I have no experience in this field myself :)

  • @TinaHelen
    @TinaHelen 4 years ago +1

    Thank you. Maybe you can even help me further... How can I exclude single missing values from cases when running Confirmatory Factor Analysis, without deleting the whole cases? I think the "na.rm=TRUE" option should be the right one, but it seems that this doesn't work with the CFA function (lavaan). When I do this, R still excludes the whole cases from the analysis. I would be so thankful if anyone could help me!

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hey Tina, I recommend to apply a missing data imputation method such as predictive mean matching. The following tutorial provides more info: statisticsglobe.com/predictive-mean-matching-imputation-method/ Regards, Joachim

    • @TinaHelen
      @TinaHelen 4 years ago +1

      Thank you so much for your fast answer and for the hint! I will definitely consider that option. So do you think it's not possible the way I wanted to do it (just exclude the values) in combination with the cfa function? Best regards :-)

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      As far as I know, it is not possible. I'm not an expert for CFA though, so please double check somewhere else. In general: Imputation is almost always better than deletion methods, since otherwise your results are likely to have a (stronger) bias. Regards, Joachim

  • @azad2546421
    @azad2546421 3 years ago +1

    Sir, in your statisticsglobe website, where do we start? As a beginner to R, I'd like to know as to where to start. Thanks

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago +1

      Hey Azad, thank you for your comment! Unfortunately my tutorials do not follow a clear order. I have planned to publish a huge overview on R programming soon, in which I will structure all tutorials. I hope I'll find the time for it soon. Regards, Joachim

    • @azad2546421
      @azad2546421 3 years ago +1

      @@StatisticsGlobe OK Sir. Till then, I will try to watch the videos as best as I can. Thank you very much for all your work.

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      You are very welcome Azad! Let me know in the comments in case you have any questions :)

  • @sofiac4058
    @sofiac4058 4 years ago +1

    How can I remove NA values only if they are in a certain column?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hi Sofia, you may either apply a listwise deletion (see here: statisticsglobe.com/complete-cases-in-r-example/) or you may extract the column as a vector and then remove NAs (see here: statisticsglobe.com/convert-data-frame-column-to-a-vector-in-r). I hope that helps! Regards, Joachim
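
A third option, sketched here on the built-in airquality data (the choice of the Ozone column and the object name are assumptions), is to filter rows by the NA status of a single column and leave missing values in other columns alone.

```r
# Keep all rows where Ozone is observed, whatever the other columns contain
air_ozone <- airquality[!is.na(airquality$Ozone), ]

nrow(air_ozone)              # rows left
sum(is.na(air_ozone$Ozone))  # no missing Ozone values remain
sum(is.na(air_ozone))        # NAs in other columns are still there
```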

  • @Janine5748
    @Janine5748 4 years ago +1

    Hey, maybe you can help me. At university we have a project and we need to remove all the NAs from our data, but the problem is I don't know how to remove NAs if they are "words" instead of "numbers". For example, you get the variable "house" with "new house", "old house", "big house", "small house", and then there are also some NAs. I tried it with complete.cases but it didn't work, and also with "factor", so I decided to do it one by one, and the parts with numbers were easy.

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hey Janine, Thanks for the comment. That's actually a very common problem. I suggest to replace the word-NAs with real NA values first. You can do that with the following code: data[data == "NA"]
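
The code in this reply is cut off in the source. A hedged sketch of the idea it describes, converting "NA" character strings into real NA values before removing rows, with a made-up data frame based on the "house" example from the question:

```r
# Hypothetical data: the missing entries were read in as the text "NA"
data <- data.frame(house = c("new house", "NA", "big house", "NA"),
                   price = c(300, 250, NA, 410),
                   stringsAsFactors = FALSE)

is.na(data$house)            # all FALSE: "NA" is an ordinary string here

data[data == "NA"] <- NA     # turn the strings into real missing values
data_complete <- data[complete.cases(data), ]
```

After the conversion, is.na(), na.omit() and complete.cases() treat the character column like any other column.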

  • @Jay19876
    @Jay19876 4 years ago +1

    Can you just remove NA's from a specific column within a data set? For example, if I have a column such as "wind chill" which has a lot of blanks when its not cold outside, I don't want to erase all of that data from the data set if I am looking at another column/vector of interest. Thanks!

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hey Jay, you may impute your missing values. This depends a lot on the content of your variable though. You may have a look at this tutorial for more information: statisticsglobe.com/missing-data-imputation-statistics/ I hope that helps! Joachim

  • @manjunathroyal2133
    @manjunathroyal2133 4 years ago +1

    When I try sum(is.na(data)) I am getting error as argument y is missing

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hi Manjunath, could you provide an example how your data looks like? Regards, Joachim

    • @shaheryarshafi
      @shaheryarshafi 4 years ago +1

      Maybe you need to use the data set name: if you have used data(airquality), then sum(is.na(airquality)), or any other name that you have used for your data.

  • @daphne_moo
    @daphne_moo 4 years ago +1

    Sir, I would like to mutate a column named Daily revenue, which is the sum of promotion_revenue and non_promotion_revenue. However, some rows have NA in promotion_revenue but $30 in non_promotion_revenue. When I compute it, the mutated column (Daily Revenue) shows NA, even if there is a number in one of the columns. I already applied na.rm = TRUE in the summarize code summarize(daily_revenue = sum(total_rev, na.rm = TRUE)), but it doesn't work.

    • @daphne_moo
      @daphne_moo 4 years ago

      I tried this, failed :(
      mutate((total_rev = promo_revenue + non_promo_revenue), na.rm = TRUE) %>%
      group_by(order_date) %>%
      summarize(daily_revenue = sum(total_rev))

    • @daphne_moo
      @daphne_moo 4 years ago

      order_date  promo_revenue  non_promo_revenue  total_rev
      2020-03-18  NA             14.90              NA
      2020-03-18  42.47          10.85              53.32

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hi Daphne, you may replace the NA values by 0 before taking the sum. You can find more information here: statisticsglobe.com/r-replace-na-with-0/
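
In the attempt above, na.rm = TRUE is passed to mutate(), where it has no effect on the + operator, so the NA still propagates into total_rev. Besides replacing NA by 0, a row-wise sum that skips NA is another option. A base R sketch using the two rows shown in the comment (the data frame name sales is an assumption):

```r
# Toy data mirroring the two rows shown above (column names as in the comment)
sales <- data.frame(order_date        = as.Date(c("2020-03-18", "2020-03-18")),
                    promo_revenue     = c(NA, 42.47),
                    non_promo_revenue = c(14.90, 10.85))

# "+" propagates NA, so the first row would become NA
sales$promo_revenue + sales$non_promo_revenue

# rowSums() with na.rm = TRUE treats a missing part as 0
sales$total_rev <- unname(rowSums(sales[, c("promo_revenue", "non_promo_revenue")],
                                  na.rm = TRUE))

# Daily totals in base R
daily <- aggregate(total_rev ~ order_date, data = sales, FUN = sum)
daily
```

Note that a row in which both revenue columns are missing would get a total of 0 with this approach.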

    • @daphne_moo
      @daphne_moo 4 years ago +1

      @@StatisticsGlobe thanks! Got it~

  • @lahirukudaligamage13
    @lahirukudaligamage13 1 year ago

    YESSSSS THANK YOUUUUU

    • @Ifeanyi.StatisticsGlobe
      @Ifeanyi.StatisticsGlobe 1 year ago

      You're welcome Lahirukudaligamage. We are happy you found the tutorial helpful!

  • @siddheshgosavi3552
    @siddheshgosavi3552 4 years ago +1

    thank you so much ❤❤❤

  • @durduozkarc6345
    @durduozkarc6345 3 years ago +1

    # 1. Load R packages
    > library("quantstrat")
    >
    > # 2. Stock Instrument Initialization
    >
    > # 2.1. Initial Settings
    > start.pf start.date end.date Sys.setenv(TZ='UTC')
    > init.eq # 2.2. Data Downloading or Reading
    >
    > # Data Downloading
    > getSymbols(Symbols='BMW',src='yahoo',from=start.date,to=end.date)
    [1] "BMW"
    Warning message:
    BMW contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
    I don't want to see these errors. How should I fix it?

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago +1

      Hey Durdu, it seems like your data contains missing values. You may remove these missing values using the na.omit function as explained here: statisticsglobe.com/na-omit-r-example/ Please note that removing NA values should be theoretically justified.

  • @whitfieldlewis837
    @whitfieldlewis837 1 year ago +1

    good stuff

  • @jayw6886
    @jayw6886 2 years ago +1

    hello, great videos thanks! question, if I wanted to get the NA values in a separate subset instead of omitting or removing them, what can I do?

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      You are welcome Jay, glad you like it! :) Regarding your question, please have a look at the following R code:
      airquality_NA
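
The code line in this reply is truncated in the source. One way to collect the rows with missing values in their own subset, sketched with complete.cases() (the helper name has_na is an assumption):

```r
has_na <- !complete.cases(airquality)      # TRUE for rows with at least one NA

airquality_NA       <- airquality[has_na, ]    # only the incomplete rows
airquality_complete <- airquality[!has_na, ]   # the remaining complete rows

nrow(airquality_NA)
nrow(airquality_complete)
```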

  • @16kush
    @16kush 2 years ago +1

    How to Undefined In place of NA?

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      Hey Kush, could you please explain your question in some more detail? I don't understand what you would like to do. Regards, Joachim

    • @16kush
      @16kush 2 years ago +1

      @@StatisticsGlobe Sorry for the inconvenience. I meant to ask: if I get NA in some table, how can I replace it with a specific value of my choice, in all the cells?

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      Hey Kush, I recommend using missing data imputation techniques for this: statisticsglobe.com/missing-data-imputation-statistics/

  • @zeusbhattacharya3122
    @zeusbhattacharya3122 5 years ago +1

    How do you save omitted data in excel?

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      Hi Zeus, you can find a detailed tutorial on exporting Excel files here: statisticsglobe.com/write-xlsx-xls-export-data-from-r-to-excel-file
      Does this answer your question? Regards, Joachim

    • @ayeledesalegn5367
      @ayeledesalegn5367 5 years ago

      @@StatisticsGlobe ruclips.net/video/G2ra7Ku3eGM/видео.html

  • @mohammadbasheer6192
    @mohammadbasheer6192 5 years ago +1

    Hi, can you write code to replace the missing value "NA" with the mean?

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago +1

      Hi, you can use the following code: x[is.na(x)]
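
The code in this reply is cut off after x[is.na(x)]. A minimal sketch of mean replacement on a made-up vector (x_imp is an assumed name); the mean is computed from the observed values only:

```r
x <- c(4, NA, 10, NA, 1)

x_imp <- x
x_imp[is.na(x_imp)] <- mean(x_imp, na.rm = TRUE)   # mean of the observed values
x_imp
```

Keep in mind that filling in the mean reduces the variance of the variable, which is one reason the replies in this thread often point to imputation methods instead.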

    • @mohammadbasheer6192
      @mohammadbasheer6192 5 years ago

      @@StatisticsGlobe thank you sir

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago +1

      You are welcome :)

    • @mohammadbasheer6192
      @mohammadbasheer6192 5 years ago

      @@StatisticsGlobe hello sir... could you please explain me about R functions and function components like function name, arguments, function body and return value... or can you make a video on this topic
      thanks

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      @@mohammadbasheer6192 Do you mean functions that are already available in R or do you mean user-defined functions? If you want to learn more about already available functions, you could have a look here: statistical-programming.com/r-functions-list/ If you want to learn more about user-defined functions, you could have a look here: statistical-programming.com/r-return-value-from-function-example

  • @frankjr3787
    @frankjr3787 4 years ago +1

    Thank you very much for this video (just subscribed). How do you remove "NA" from a data set that has no numeric values? Say I just had two columns (Name and Hair Color) and some of the hair colors were NA. How would I omit those?

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago

      Hey Frank, Thanks for subscribing! :) The class of your variables does not matter, you can apply the functions shown in this video the same way. If it doesn't work, you could check if your NA values are real NA values or if they are "NA" character strings. In this case, you could replace the "NA" by real NA as shown in the following example code:
      data

  • @larissacury7714
    @larissacury7714 2 years ago +1

    What if I had two entries for each SUBJECT and I want to filter out both of their entries if one of their entries in another column is NA? PS: great video as always!

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago +1

      Hey Larissa, thank you very much, glad you like the video! Regarding your question, please have a look at the following example code:
      data

    • @larissacury7714
      @larissacury7714 2 years ago +1

      @@StatisticsGlobe Hi, thank you!! I went for a tidy solution check it out: data %>%
      group_by(SUBJECT) %>%
      filter(all(!is.na(MYVARIABLE))) does that make sense?

    • @StatisticsGlobe
      @StatisticsGlobe 2 years ago

      Hey, this is difficult to tell without seeing your actual data, but I think this should produce a different result than my code. Is there a specific reason why you would like to use tidy instead of Base R?
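
The Base R code referred to in this thread is truncated in the source, so it cannot be compared directly. As an independent sketch of what the question describes (dropping every entry of a subject as soon as one of its entries is NA), here is a Base R version with made-up data; the column names follow the comment, everything else is an assumption.

```r
# Hypothetical data: two entries per subject
data <- data.frame(SUBJECT    = c("s1", "s1", "s2", "s2", "s3", "s3"),
                   MYVARIABLE = c(1.2, 3.4, NA, 2.2, 5.0, 0.7))

bad_subjects <- unique(data$SUBJECT[is.na(data$MYVARIABLE)])  # subjects with any NA
data_new     <- data[!data$SUBJECT %in% bad_subjects, ]       # drop all their rows
data_new
```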

  • @punchline9131
    @punchline9131 5 years ago +1

    Do you also have a video on how I can handle this with the "listwise deletion" command?

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      @Gummibärmann Listwise deletion is usually performed in R with the complete.cases function. You can watch this video from minute 2:40: ruclips.net/video/OVHIYAEAHLY/видео.html I have also published a tutorial on this on my homepage: statisticsglobe.com/listwise-deletion-missing-data/ Feel free to let me know whether the two links helped :) Regards, Joachim

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      @Der Humanist Thanks for your follow-up question. It seems that your lecturer assigned a 1 to the variable help whenever one of the other variables in df is NA. Did he perhaps then take a subset of df that contains only the observations where help = 0? In that case this would be (in a more roundabout way) the same as using the complete.cases function. Without more detailed information, though, this is honestly hard for me to judge.

    • @StatisticsGlobe
      @StatisticsGlobe 5 years ago

      @Der Humanist Glad I could help! Feel free to let me know in the comments if you have any further questions :)

  • @oluwadolapobifarin105
    @oluwadolapobifarin105 5 years ago +1

    Thanks

  • @mdiqbal7168
    @mdiqbal7168 2 years ago +1

    R programming for t-test two tail tabulated value in plot

    • @cansustatisticsglobe
      @cansustatisticsglobe 2 years ago

      Hello,
      Thank you for your comment. Do you mean that you would like to see a tutorial on this topic? Is there something specific that you would like to know about R programming for t-test two-tail tabulated value in the plot?
      Regards,
      Cansu

  • @victoresende
    @victoresende 4 years ago +1

    I LOVE YOU AAAAAAAAAA

  • @Jonpaulim
    @Jonpaulim 4 years ago +1

    Hi can I ask a question please

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Sure Jonathan, go ahead!

    • @Jonpaulim
      @Jonpaulim 4 years ago

      @@StatisticsGlobe thank you very much, could I maybe send it to you on email or on another platform as the question may be a little long if you’re happy to suggest one ?

    • @Jonpaulim
      @Jonpaulim 4 years ago

      @@StatisticsGlobe Can I ask in R, if I have got 2 data sets, of different rows and columns but I want to merge them and this is based on one of the columns in each data set. So if the first column in dataset1 has 3 values and the first column in dataset2 has 9 values but the way the data is is such that each of the values in the first column of the first dataset maps onto 3 values in the second dataset how do i do it?

    • @Jonpaulim
      @Jonpaulim 4 years ago

      so like if the first column in dataset 1 has values 1 , 2 , 3 and first column in dataset 2 has values 1a 1b 1c 2a 2b 2c 3a 3b 3c and I want to merge the 2 columns based on the numbers but clearly first dataset only has 3 rows second dataset has 9 rows and I want to merge them so I can perform functions on them how do I do it thanks

    • @Jonpaulim
      @Jonpaulim 4 years ago

      @@StatisticsGlobe sorry for the long question. So all this must be done with R base package. Do let me know if you are able to help with this. Many thanks.
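
This question is left unanswered in the thread. One possible Base R approach, sketched under the assumption that the ids in the second data set are a number followed by letters (the data frames d1 and d2 and all column names are made up): derive a shared key column first, then use merge().

```r
# Hypothetical data sets following the description above
d1 <- data.frame(key = c(1, 2, 3), group = c("x", "y", "z"))
d2 <- data.frame(id  = c("1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "3c"),
                 val = 1:9)

# Derive the numeric part of the id so both data sets share a key column
d2$key <- as.numeric(sub("[a-z]+$", "", d2$id))

merged <- merge(d1, d2, by = "key")   # each row of d1 is repeated for its 3 matches
merged
```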

  • @Paan-2.1
    @Paan-2.1 4 years ago +1

    @Statistics Globe Thank you very much for the great video. It really helped :) Unfortunately I still have a problem, and I really hope that you can answer my question. Where do I put na.rm = TRUE in more complex code?
    I always get an error message, and I guess (based on internet research) that it has something to do with the NAs: Error in KhatriRao(sm, t(mm)) : (p

    • @StatisticsGlobe
      @StatisticsGlobe 4 years ago +1

      Hello Paula, thank you very much for the kind words. I'm very glad that you like my tutorials! :) You can find the answer to your question in the documentation of the lmer function. You can open it with the R code ?lmer. It says:
      "na.action
      a function that indicates what should happen when the data contain NAs. The default action (na.omit, inherited from the 'factory fresh' value of getOption("na.action")) strips any observations with any missing values in any variables."
      In other words: the removal of NAs is already activated by default (via na.action = na.omit) when you use the lmer function. Please note that this can also lead to risks in the data analysis and that you may need to impute your data. You can find more information here: statisticsglobe.com/missing-data/
      Best regards, Joachim
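
The effect of the na.action argument quoted above can be seen without installing lme4. The following sketch uses lm() from base R as a stand-in (an assumption for illustration; lm() is not the function from the question) and the built-in airquality data.

```r
# lm() is used as a stand-in here; the quoted lmer() documentation describes the same argument
fit <- lm(Ozone ~ Wind, data = airquality, na.action = na.omit)

nrow(airquality)   # rows in the data
nobs(fit)          # rows actually used after dropping NAs in Ozone

# With na.fail the model refuses to run instead of silently dropping rows
res <- try(lm(Ozone ~ Wind, data = airquality, na.action = na.fail), silent = TRUE)
inherits(res, "try-error")
```

Comparing nrow() of the data with nobs() of the fitted model is a quick way to see how many observations were dropped.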

  • @SumanGhosh-vn3tx
    @SumanGhosh-vn3tx 5 years ago +1

    great

  • @Rhena
    @Rhena 3 years ago +1

    Could you also record this in German? :D

    • @StatisticsGlobe
      @StatisticsGlobe 3 years ago

      Hey Rhena, I only upload English-language videos on this channel. But I have already planned to create a partly German-language website soon, I hope that will help! :) Best regards, Joachim

  • @eyadha1
    @eyadha1 4 years ago +1

    Thanks. Very helpful