Data Cleaning in R with Real Campaign Data!

  • Published: Aug 21, 2024

Comments • 47

  • @saranuprety3754
    @saranuprety3754 4 years ago +2

    Clear delivery of Subject materials. Appreciated.

  • @saneeshcs6604
    @saneeshcs6604 4 years ago +2

    You are an amazing teacher and data scientist. I wish I had had a teacher like you in college. Thanks for helping beginners like me.

    • @techknowhow4802
      @techknowhow4802 4 years ago +3

      This is part of why I am creating these videos. I wish I had had college professors with a knack for and love of data science, instead of ones who were just there to get tenure. Some were good researchers, but they couldn't teach the ideas or get students to think creatively and outside the box. I also create these videos to give back to the data science community because it has been so good to me over the years. Plus, I just love to help others. I am glad you like my videos and are benefiting from them. Please feel free to share them with others so that they can learn too. Thanks again. :)

    • @revydmat
      @revydmat 4 years ago

      @William Maphosa Did you receive the dataset, and could you kindly share it? Thanks

  • @dasrotrad
    @dasrotrad 4 years ago +2

    Being new to R, I really appreciate this. So well explained.

    • @techknowhow4802
      @techknowhow4802 4 years ago +1

      Thank you. Helping people learn data science and its real, everyday uses is what this YouTube channel is all about. The data science community has been very good to me, so this is my way of giving back. These are all processes that my team and I actually use every day at a Fortune 200 international company to provide really interesting insights that executives can act on and base great decisions on. Thank you.

  • @funlolageorge1467
    @funlolageorge1467 3 years ago +1

    New to R. This has been so helpful. Thanks

  • @tgulati1300
    @tgulati1300 3 years ago +2

    Thank you so much for the video! So helpful to beginners in R and very clearly explained in simple language. I can't thank you enough :)

    • @techknowhow4802
      @techknowhow4802 3 years ago +1

      Glad you found it helpful! Thanks again. :)

  • @mdmahmudhasan1645
    @mdmahmudhasan1645 2 years ago

    Thank you 😊 I just got some ideas about data cleaning.

  • @lukmanmanggo
    @lukmanmanggo 3 years ago +1

    Thanks for the video. This helped me a lot with cleaning data efficiently.

    • @techknowhow4802
      @techknowhow4802 3 years ago +1

      Thank you. I am glad you found this helpful. You will find that roughly 80% of a data scientist's or data analyst's workday is spent dealing with data issues (new or unclean data, data in hard-to-reach places, etc.). The better and more effective you get at cleansing data, the easier your day will be and the more time you can spend on the other 20% (creating novel processes, deep learning, etc.). Look at the other videos on my channel, which cover all sorts of topics that you will find helpful in your career. Thanks again and best of luck with your career!

  • @yuthpatirathi2719
    @yuthpatirathi2719 4 years ago +1

    Superb video sir!

    • @techknowhow4802
      @techknowhow4802 4 years ago

      Thank you. 80% of data science is obtaining and cleaning the data. This video shows how we do this with real campaign data for a real company. Real applications! What would you like to see next in my videos?

  • @onyinoyi_o
    @onyinoyi_o 4 years ago +3

    Thanks for this amazing video. Could you by any chance give us the R script you're working from?

  • @nguyentho9467
    @nguyentho9467 3 years ago

    Thank you

  • @srinivasansridhar5037
    @srinivasansridhar5037 4 years ago +1

    This video is amazing! Thank you, David, for posting it. Could you please share the R code and the data you are using in this video?

    • @techknowhow4802
      @techknowhow4802 4 years ago

      As time permits, I upload many of the datasets to Kaggle and the code to ResearchGate.net. Just look up David Maillie on either. Thanks again. A question: I like to hear from my audience, so what would you like to see in an upcoming tutorial video?

    • @revydmat
      @revydmat 4 years ago

      @@techknowhow4802 Hello! This is a great tutorial! However, the dataset is not on Kaggle or ResearchGate. Could you please send us a link or give us an email address to send a request to? Thanks!

    • @ranjeetkumarjha9844
      @ranjeetkumarjha9844 2 years ago

      @@techknowhow4802 Please upload machine learning videos on R, such as how to develop regression models using R.

  • @stephenruotilio3639
    @stephenruotilio3639 3 years ago

    Thank you so much for this video. I am new to R and this really helped me gain a better understanding of the coding.

  • @leequist2908
    @leequist2908 2 years ago

    Is the subset function being used as an alternative to filter?
    If yes, why prefer it over filter, and what's the syntax?
    If no, what's the difference between subset and filter, and again, why prefer the former over the latter?
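
A minimal base-R sketch of one practical difference, using a hypothetical toy data frame `df`: `subset()` treats rows where the condition evaluates to `NA` as FALSE and drops them, while bracket indexing with the same logical vector returns an all-`NA` row for each `NA`. `dplyr::filter()` also drops `NA` rows; that call is shown only if dplyr happens to be installed. This is an illustration, not necessarily why the video chose `subset()`.

```r
# Hypothetical toy data with one missing value
df <- data.frame(id = 1:3, x = c(1, NA, 3))

# Base R subset(): rows where the condition is NA are treated as FALSE
kept_subset <- subset(df, x > 1)

# Bracket indexing: the NA in the logical index yields an all-NA row
kept_bracket <- df[df$x > 1, ]

# Bracket indexing that drops NA explicitly, matching subset()
kept_explicit <- df[!is.na(df$x) & df$x > 1, ]

print(nrow(kept_subset))   # 1
print(nrow(kept_bracket))  # 2
print(nrow(kept_explicit)) # 1

# dplyr::filter() also drops rows where the condition is NA
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::filter(df, x > 1))
}
```

One caveat worth knowing: the `subset()` help page describes it as a convenience function intended for interactive use and recommends standard subsetting such as `[` when writing functions, because its non-standard evaluation can behave unexpectedly there.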

  • @demirenteria1700
    @demirenteria1700 4 years ago

    Thanks for posting this video, it's very helpful. Do you have any recommendations on how to standardize tech-stack inventories?

  • @shirazahmed7125
    @shirazahmed7125 4 years ago +1

    I couldn't find the dataset when I searched for your name (David Maillie) on Kaggle. I found your Kaggle profile but did not find any Oil Change data.

    • @techknowhow4802
      @techknowhow4802 4 years ago

      I don't publish every dataset. Some contain proprietary data that I cannot release like that. Thank you for understanding.

  • @lprayaga1
    @lprayaga1 3 years ago +1

    The example is really nice. Can you share the data file?

    • @techknowhow4802
      @techknowhow4802 3 years ago

      Some of these data files are on Kaggle.com. Others contain proprietary information, so I cannot always release them to the public. Thanks again. :)

  • @rvsingh6609
    @rvsingh6609 4 years ago

    Could you help with how to remove a bivariate outlier? Thanks.
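
One common approach (not necessarily the one the channel uses) is to flag points with a large Mahalanobis distance, which accounts for the correlation between the two variables: a point can look ordinary on each axis separately yet sit far off their joint trend. A minimal sketch with simulated data; the 97.5% chi-square cutoff assumes roughly bivariate-normal data and is a convention, not a rule.

```r
# Simulated (hypothetical) correlated data: y roughly follows x
set.seed(42)
n <- 100
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)

# Plant one point that is unremarkable on each axis alone but far off the x-y trend
dat <- data.frame(x = c(x, 2), y = c(y, -2))

# Squared Mahalanobis distance of each row from the column means,
# scaled by the sample covariance matrix
d2 <- mahalanobis(as.matrix(dat), center = colMeans(dat), cov = cov(dat))

# Under approximate bivariate normality, d2 follows a chi-square with 2 df;
# the 97.5% quantile is a common (but arbitrary) cutoff
cutoff <- qchisq(0.975, df = 2)
is_outlier <- d2 > cutoff

# Row numbers of the flagged points (the planted point is row 101)
print(which(is_outlier))

# Keep only the rows at or below the cutoff
cleaned <- dat[!is_outlier, ]
```

The sample mean and covariance are themselves pulled toward outliers, so with several outliers robust estimates (for example `MASS::cov.rob()`) are often used in place of `colMeans()` and `cov()`. Whether flagged points should be removed at all, rather than investigated, depends on the data.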

  • @hirenkakkad3747
    @hirenkakkad3747 3 years ago +1

    Hi, thank you for the wonderful video. I have one question. I have a dd-mm-yyyy HH:MM format in an Excel file. When I import it, it comes in as 43922.03 and so on. How should I convert this to a date and time format in R?
    Thanks in advance.

    • @techknowhow4802
      @techknowhow4802 3 years ago

      Look at the data type of your columns in Excel. If you leave them as text, that can happen. Also, a general date with hours and minutes will have the same issue. Change the format of the affected columns in Excel to the very first choice under the Date type (*3/14/2012). That will then come across correctly. It is easier to fix in Excel, column by column. ☺

    • @techknowhow4802
      @techknowhow4802 3 years ago

      If you need the time, place that portion in a separate column for later analysis. I have never had to use the time portion for any analysis, but you might, depending on your niche and your data.
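
The replies above fix this in Excel. If the numbers are already in R, a minimal base-R sketch follows. It assumes the workbook uses Excel's default 1900 date system (a workbook using the 1904 date system would need origin "1904-01-01" instead) and that the values are wall-clock times with no time zone, which is why UTC is used to keep the clock time unchanged. The `serial` values are hypothetical.

```r
# Hypothetical Excel serial values as they might arrive in R (days since Excel's epoch)
serial <- c(43922.03, 43922.5)

# In R terms, Excel's 1900 date system counts days from 1899-12-30
# (this offset absorbs Excel's fictitious 1900-02-29, so it holds from March 1900 on);
# the integer part is the date
date_part <- as.Date(floor(serial), origin = "1899-12-30")

# The fractional part is the fraction of a day: convert the whole value to seconds,
# round to the nearest second to avoid floating-point drift, and build a date-time
datetime <- as.POSIXct(round(serial * 86400), origin = "1899-12-30", tz = "UTC")

# Split into separate date and time columns, as suggested in the reply above
cleaned <- data.frame(
  date = date_part,
  time = format(datetime, "%H:%M:%S")
)
print(cleaned)
```

For the first value, 0.03 of a day is 2592 seconds, so it becomes 2020-04-01 at 00:43:12.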

  • @tinatwine137
    @tinatwine137 4 years ago +1

    Thank you for creating a video to help people learn data science. I wish I had known about you 3 months ago. I would gladly have paid you to teach me data science in such great detail :)
    (I am trying to install the package openxlsx, and I keep getting this error message: Error in library(openxlsx) : there is no package called ‘openxlsx’. I have run "installed.packages ("openxlsx", dependencies = TRUE)", and the error message continues to appear. I even tried 'help(openxlsx)', and it says:
    No documentation for ‘openxlsx’ in specified packages and libraries:
    you could try ‘??openxlsx’.
    Any suggestions as to what the issue is and why I continue to get this error message?)

    • @techknowhow4802
      @techknowhow4802 4 years ago

      You're welcome. If you don't mind my asking, what BI tool are you currently using? Or do you use several tools and/or apps? Do you see any specific problems you need solutions to? I like to see what my audience is using and the problems they might be facing, so that I can create tutorial videos that are better tailored to them. Thanks again. :)
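
The reply above does not address the error itself. A likely cause, going by the command quoted in the question, is that `installed.packages()` only lists packages that are already installed; the function that downloads and installs a package is `install.packages()`. A minimal sketch of a hypothetical helper, `ensure_package()`, that installs a package only when it cannot be loaded:

```r
# Hypothetical helper: install a package only if it cannot be loaded
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # install.packages() (not installed.packages()) downloads and installs
    install.packages(pkg, dependencies = TRUE)
  }
  # TRUE if the package is now available, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# installed.packages() only returns a matrix describing what is already installed
print("stats" %in% rownames(installed.packages()))

# Usage for the package in the question (needs an internet connection):
# if (ensure_package("openxlsx")) library(openxlsx)
```

After a successful install, `library(openxlsx)` and `help(package = "openxlsx")` should both work.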

  • @noneofurbusns3139
    @noneofurbusns3139 4 years ago +1

    Using as.Date gives me NA in the column... Any idea why?

    • @techknowhow4802
      @techknowhow4802 4 years ago

      Look at your data. Might be a problem with the original formatting or maybe you have a missing value hiding in there?
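
Two common causes fit the reply: a `format` string that does not match how the dates are actually written, and blank or malformed cells. With an explicit `format`, `as.Date()` returns `NA` for any string that does not match it. A minimal sketch with hypothetical strings, showing how to find the values that failed and how one might fill them from a known second format:

```r
# Hypothetical raw date strings: mixed formats plus a blank cell
raw <- c("08/21/2024", "21/08/2024", "", "2024-08-21")

# With an explicit format, any string that does not match it becomes NA
parsed <- as.Date(raw, format = "%m/%d/%Y")
print(parsed)

# List the non-blank original values that failed to parse so they can be inspected
failed <- raw[is.na(parsed) & nzchar(raw)]
print(failed)

# One way to handle a known second format: fill only the gaps from it
alt <- as.Date(raw, format = "%Y-%m-%d")
parsed[is.na(parsed)] <- alt[is.na(parsed)]
print(parsed)
```

Here "21/08/2024" stays `NA` because 21 is not a valid month under "%m/%d/%Y"; whether it means 21 August is a judgment about the source data, not something the parser can decide.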

  • @Trafficfromsky
    @Trafficfromsky 3 years ago

    Do you provide the link for the data you are using?

  • @ninadjoshi202
    @ninadjoshi202 3 years ago

    Could you please share the R file you used?

  • @onpa7492
    @onpa7492 3 years ago +1

    Where can I get the data from?

    • @techknowhow4802
      @techknowhow4802 3 years ago

      Some of the datasets are on Kaggle. Look me up there. :)

    • @onpa7492
      @onpa7492 3 years ago +1

      @@techknowhow4802 Thank you very much

  • @JarJarBinkz68
    @JarJarBinkz68 3 years ago +2

    Couldn't you just put the data in the dishwasher and clean it automatically ? hehehe :0

    • @techknowhow4802
      @techknowhow4802 3 years ago

      Unfortunately no. Data doesn't work like that. There are some applications like Alteryx that help here, but R is great once you have built a reusable process.

  • @nicholettejones5626
    @nicholettejones5626 4 years ago +1

    This video was really helpful. Would you consider sharing the script as a link to the file in the description box?

    • @techknowhow4802
      @techknowhow4802 4 years ago

      A lot of the data I use is proprietary. That means I cannot share it, and in most cases it is scrubbed. If you are looking for a similar practice dataset for data cleaning in R, or for pretty much anything in R for that matter, try the UC Irvine Machine Learning Repository. It has hundreds of freely available datasets, all labelled correctly, for you to choose from and use for learning, building reusable processes and templates, and much more. Thanks again for watching this and the other videos on my data science channel. :)