How Much Statistics Do You REALLY Need for Data Science?

  • Published: Nov 28, 2019
  • Subscribe to RichardOnData here: / @richardondata
    In my last video I discussed the fact that statistics is a must-know component of the broad, multidisciplinary data science skill set. If you missed that video you can find it here: • What Is a Data Scienti...
    A book I highly recommend, especially for the non-mathematical reader, is "How Not to be Wrong" by Jordan Ellenberg. Find it here: amzn.to/2U1FjpQ
    However, not everyone going into data science has a statistics background. That raises an obvious follow-up question: how much statistics do you REALLY need for data science? Education is valuable, but not everything you learn in a traditional statistics degree is a hard-and-fast requirement. Here are, from my perspective, the core skills you need:
    Fundamentals
    - Probability calculations including conditional probability/Bayes rule and the Central Limit Theorem
    - Basic understanding of distributions including properties of random variables such as expected value and variance
    - Full confidence interval framework
    - Full hypothesis testing framework including p-values, conclusions, Type I and Type II error
    Tools
    - Linear models (how to set up, interpret, iterate)
    - Machine learning models, including setting them up in a programming language from pre-processing to outputting results, plus understanding the bias-variance tradeoff and how to address overfitting (and underfitting)
    - Survival analysis
    Reasoning
    - Assumptions of tests and models used
    - How bias affects results
    - Confounding variables and Simpson's Paradox
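The last item in the list above, Simpson's Paradox, is easier to see with numbers. The sketch below is only an illustration: the counts, the treatment labels "A" and "B", and the "mild"/"severe" strata are assumptions chosen to exhibit the reversal, not data from the video. It compares success rates within each stratum and then with the strata pooled.

```python
from fractions import Fraction

# Illustrative (successes, patients) counts per treatment and case severity.
# These numbers are an assumption chosen to exhibit the reversal.
COUNTS = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def stratum_rate(treatment, stratum):
    """Success rate of one treatment within one stratum."""
    successes, patients = COUNTS[treatment][stratum]
    return Fraction(successes, patients)


def pooled_rate(treatment):
    """Success rate of one treatment with the strata lumped together."""
    successes = sum(s for s, _ in COUNTS[treatment].values())
    patients = sum(n for _, n in COUNTS[treatment].values())
    return Fraction(successes, patients)


if __name__ == "__main__":
    for stratum in ("mild", "severe"):
        a, b = stratum_rate("A", stratum), stratum_rate("B", stratum)
        print(f"{stratum:>6}: A={float(a):.3f}  B={float(b):.3f}")
    a, b = pooled_rate("A"), pooled_rate("B")
    print(f"pooled: A={float(a):.3f}  B={float(b):.3f}")
```

With these counts, A has the higher success rate in both strata, yet B looks better once the strata are pooled. The reason is that A was given mostly to severe cases and B mostly to mild ones, so severity acts as a confounding variable.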
    #statistics #datascience #StatisticsForDataScience
    PayPal: richardondata@gmail.com
    Patreon: / richardondata
    BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
    ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
    LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32
  • Science

Comments • 129

  • @THansenite 2 years ago +1

    Thank you for these videos! As a programmer looking to get into data science, this video in particular has been exactly what I'm looking for as far as getting an idea of what I should work on learning from the statistics side.

  • @codelucky 2 years ago +11

    Was looking for gold, found a gem.

  • @romulodamasceno2427 3 years ago +12

    This really is the only channel where I watch every single ad. Thank you so much for what you're doing, please keep it up!

    • @RichardOnData 3 years ago +4

      That really means a lot, thanks so much, no intention of ending what I do here any time soon!

  • @sanjag 4 years ago +9

    I am learning so much from your videos! Thank you, thank you, thank you!!!

    • @RichardOnData 4 years ago +1

      That is precisely the goal! Thanks so much!

  • @FredericBiondi 4 years ago +53

    It is incredible that you have such a low number of followers, considering that you are one of the best data science YouTubers out there.
    Keep it up, your efforts will eventually pay off!👍

    • @RichardOnData 4 years ago +15

      Thank you so much! My channel is relatively new -- only about four months old, which is a very small amount of time in the YouTube world -- so growing it is and will be a work in progress!

    • @mandelbro777 3 years ago +1

      agree

  • @javihntay6445 3 years ago +1

    I just started my BSc in data science and I'm just struggling to understand why I even chose this path. This video really helps to provide a bird's-eye view of the career I'm moving towards. Awesome job!!

  • @data-science-ai 1 year ago

    This is pretty cogent advice. I personally like to know linear algebra and calculus (the maths) but recognize that stats and statistical reasoning are more important to avoid data science mistakes/pitfalls.

  • @dangunwangum 2 years ago +1

    Very concise and helpful. Thank you

  • @Rumil_ 4 years ago +3

    Dude, thank you so much. I'm starting my master's in applied stats this summer and looking forward to it. I like the way you grouped these different elements into their own categories; it definitely helps with understanding these concepts. Also curious: what was your favorite class during your master's, and which class do you consider the most important?

    • @RichardOnData 4 years ago +1

      I have to assign a tie here to my Machine Learning class and my Statistical Consulting class. Got all the methods, and got all the straightforward yet powerful ways that things could be applied. Best of all worlds.

  • @chougaghil 4 months ago +1

    Again, a very helpful talk.
    I see that the RichardOnData channel was born under the sign of excellence!

  • @MarceloTutoriais100 3 years ago +1

    Great video, Richard! Really appreciated the effort in the examples 👍🏿👊🏿

    • @RichardOnData 3 years ago

      My pleasure! Examples are how you learn!

  • @machinelearningid3931 3 years ago +1

    Wow, this is the best video explaining data science in general. Thanks a lot!

    • @RichardOnData 3 years ago

      You are most welcome, I'm glad that it provided a helpful overview of the field!

  • @hamza8641 3 years ago +1

    A huge amount of knowledge. I appreciate your effort in making this video.

  • @ahsanmohammed1 3 years ago +2

    Great tips! Great video! Much appreciated! Thank you so much!

  • @alyahyai 3 years ago +1

    Very helpful information, thank you & keep up the great work & efforts 🙏🏻

  • @maxpower677 1 year ago

    I really like this video because it's very well-detailed. Currently, I'm using ChatGPT to help me develop data science projects. As a part of the prompt, I inserted all the stats and math concepts mentioned in your videos and asked it to summon each topic when needed and debate with me when each of them is important in each part of the process. In that way, I'm "forced" to think more deeply about the process and not just rely on library functions and hope for the best. Thanks!

  • @johngarrett8311 1 year ago

    I firmly agree that those are the basics. Once you have those basics you can pick up whatever tool you need for modeling and analysis. If you lack those basics it can be hard to pick up any specialized tools that you may need for specific problems. I would quibble on survival modeling: I learned from my work in life insurance and banking that many of the standard survival models have functional forms that make it impossible to analyze key business problems. In lending most standard survival functions assume shapes that are horribly wrong, resulting in poor business decisions.

  • @polarbear986 2 years ago

    wow, thank you so much. Very very helpful!

  • @codebala9132 3 years ago +1

    Your efforts are appreciated, keep going.

  • @mandelbro777 3 years ago +4

    Great videos. I have a background as a Business Analyst, and I might add that a really important skill for people who live between data science and business analysis is Excel and VBA programming, which are ubiquitous across almost all industries.

  • @ef7496 1 year ago

    Thank you so much, but we need a video that tells us where to take these classes, or a study plan for them and where to find them.

  • @KarmaTrain737 3 years ago +1

    Invaluable. Thank you!

  • @victorzurkowski2388 3 years ago +3

    Nice list. I would have included time series instead of survival analysis. Also, my go-to predictive models are penalized generalized linear models (aka: "elastic nets").

    • @RichardOnData 3 years ago

      Fair enough! Elastic nets are one of my favorite techniques too. I might be speaking from a bit of an experience bias, considering I've done more survival analysis than time-series analysis myself, but that's an incredibly fair point.

  • @MohamedHassan-sk7qj 3 years ago +1

    Thank you. You hit the nail on the head. My question: how long does it take to learn all the stuff mentioned, years or months?

    • @RichardOnData 3 years ago

      I think you can be in a pretty good position with much of this stuff after a few months (3-6 months or so), particularly if you're making use of things like Coursera courses. Of course the best possible option is a 4-year bachelor's degree (or 2-year master's), but that's not necessary for absolutely everybody.

  • @fahadreda3060 3 years ago +1

    Great video, man, keep them coming

    • @RichardOnData 3 years ago

      Thank you! More to come -- 6-8 per month is the goal!

  • @jeanandreyy 2 years ago

    I'm a Brazilian student, and this guy is really funny and impressively smart. The way he speaks and teaches is amazing; I'm really impressed.

  • @user-qu8fu6zv4w 3 years ago +1

    Good to know, thanks a lot for sharing!

    • @RichardOnData 3 years ago

      My pleasure, hope this video was helpful!

  • @AmeerulIslam 3 years ago +2

    Please do a video tutorial series for these statistical concepts!

    • @RichardOnData 3 years ago

      I have a Statistics Tutorial series already on my channel actually - and these ideas all form the bedrock of that series!
      Having said that these videos tend to be some of mine with the lowest number of views. So if you'd like to check them out and let me know which topics you'd most like me to cover on top of those, that'd be awesome!

  • @rezakheilikhare 2 years ago

    Hi Richard, I am wondering if you can suggest a book on statistics besides the one you mentioned in the description, something more for people who have math knowledge and want to learn statistics in more depth. Mostly, I am wondering if there is a book that covers the fundamentals you have listed.

  • @sofusaxelsen4029 3 years ago +2

    A really good video to watch, I really like your explanation! Do you think that statistics is also needed for Data Analysts? I think you should also make more videos about data analytics, because I notice that there are a LOT of jobs available in this field compared to data science. Thank you so much in advance! You're the best :)

    • @RichardOnData 3 years ago

      That's interesting that you're seeing data analytics have more available jobs in the field. What area are you in?
      I would say statistics is certainly still important - particularly to know methods and how to reason - while the deep theory of it probably won't be as important, as you're unlikely to be involved in creating your own methodologies in data analytics at least compared to data science.

  • @billycheung7095 3 years ago +1

    You gave me a clear picture of what a data scientist does. Thx

    • @RichardOnData 3 years ago +1

      That's the idea! If you have other things you'd like to see me cover please let me know.

  • @positiveoptimist5060 3 years ago +1

    Thank you Richard, that's really insightful!
    I have a BS in Mathematics. Which degree do you think is better for breaking into the data science field: an MS in Applied Statistics or an MS in Data Science?

    • @RichardOnData 3 years ago

      Great question. This is going to depend on curriculum and whatnot, but my guess is that a lot of the statistics stuff will be easier to pick up if you have a mathematics background (that's the beauty of some linear algebra knowledge!). I'd probably go with the Data Science MS, especially if you're not already also a CS expert!

  • @nicolleperalta3039 3 years ago +1

    Super helpful, thank you ✨

  • @RajkumarMahendran 3 years ago +3

    Totally Worth Sub'g your channel.

  • @oliviayoon1968 4 years ago +1

    Your video is so informative!! Im currently deciding my major and need ur advice :( Before getting a MS in Stats, would u recommend majoring in Applied Math OR Stats for BS if I wanna go into a DS field??

    • @RichardOnData 4 years ago

      Thank you so much! That is the goal!
      Unless you have a burning passion for pure mathematics, I would go with the stats BS. While the strong knowledge of calculus and linear algebra will probably serve you well, I've actually heard from applied math MS's who say their pathway was a bit too theoretical for industry DS.

  • @VesqVj 4 years ago +1

    I also have 5 years of experience in data science jobs as a statistician, and I can confirm pretty much everything Richard is saying. I would also like to stress how important the reasoning part is. Data quality issues lurk EVERYWHERE and bad data is handed over to you all the time. Statistics helps you imagine the process that produced your data. In other words, your data is a sample drawn from a certain distribution. When that process suddenly changes, you'll be there saving the day.

    • @RichardOnData 4 years ago +1

      Thank you! Yes, I think one of the greatest failings in academia right now is a failure to emphasize that real world data is NOT clean and clear. I wish there was significantly more emphasis on thinking through practical data quality problems.

  • @noofams7441 3 years ago +1

    Wow, this is such a perfect breakdown of the data science subject.
    Well, I finished one semester of a master's in data science, but I have an acceptance to study applied statistics at another university, which is one of the best universities. My bachelor's degree is in MIS, and I am unsure which subject to choose!

    • @RichardOnData 3 years ago

      Glad you enjoyed the video! I'm a fan of the concept of Data Science degrees, but at the same time I'd take a strong Statistics education over a weak Data Science education any day. It'll come down to how strong that DS curriculum is.

  • @ma.veronicamagsilang4843 3 years ago +1

    Count me in as one of your students, sir! I'm taking action to build skills that are marketable. I'm quite overwhelmed by all the information, so I decided to follow you, and let's see.

    • @RichardOnData 3 years ago

      Fantastic! That's very kind and I'm glad you're here.

  • @tallalmoshrif6643 3 years ago +1

    Bravo!
    Very informative contents
    Thank you

  • @doctor9101 2 years ago +1

    I have a PhD in pure mathematics, specialising in Dynamical Systems. Today I am a Software Developer/Software Architect.
    Please correct me if I am wrong, but for me Data Science is 80% Statistics.
    Computer science guys doing data science: OK, I can live with that.
    For me, a postgraduate or graduate in Statistics is a better fit as a Data Scientist. Programming is the easier part.

  • @Didanihaaaa 2 years ago

    Could you please provide some sort of examples for each component?
    Thanks

  • @alokgupta3241 3 years ago +1

    Good information Thanks

  • @kcheng3594 1 year ago

    Good Points!

  • @fattehmohamed4385 3 years ago +2

    Presentation 100% thanks man appreciate it. May Allah guide you and everyone else.
    ;)

  • @thalesprado6371 4 years ago +1

    Very interesting.
    I come from a hospitality background and I am about to finish my MBA in big data applied to marketing, which is basically giving me a better approach to the data I usually deal with.
    Most of the time the task is to predict some output (demand) from my history using time series, or to find the best price that will maximize my profit (linear regression). My goal is to stick with the pricing field and find solutions related to pricing and demand.
    Would you say that in this case I should dive into a BS in statistics, to get a better foundation, or into data science?
    I work in the revenue management department, just to give some context.
    Thanks

    • @RichardOnData 4 years ago

      Do you currently know much in the way of computer science fundamentals? Maybe know one of R or Python? Want to stick with that role for a while? I'm inclined to say statistics here because programming resources tend to be easier to find than solid statistics resources, and knowing some of the fundamentals serves you really nicely.

    • @thalesprado6371 4 years ago +1

      @RichardOnData Currently I am learning Python and using R to run some regression models and other things required at work.
      Thanks for the advice. I think I will stick with statistics, then.

  • @marcello4258 2 years ago

    Super underrated. I used to work as a DS in my last job. Many of my peers asked me how to become a DS because it's a nice buzzword, and guess what: everyone thinks you just learn Python and you are a DS, lol. But they don't like statistics and are afraid of it. Yes, I don't get it either.

  • @jaydippatel363 3 years ago +1

    Could you please suggest some books and material that one can learn things you just mentioned from?
    Thank you!

    • @RichardOnData 3 years ago +1

      Highly recommend the book "An Introduction to Statistical Learning: With Applications in R". I also recommend the book "Approaching (Almost) Any Machine Learning Problem" due to its practical narrative. If you want to go deep into the theory, the book "Statistical Inference" by Casella & Berger is classic, albeit probably more than most people require.

  • @thethirdnote574 2 years ago

    I am also pursuing an MSc in Applied Statistics, so is it better than an MSc in Data Science?

  • @ehsansslman9436 3 years ago +1

    Can you please recommend any source where you can apply statistical solutions to real-world problems? Thanks in advance.

    • @RichardOnData 2 years ago

      Naturally, there's nothing quite as good as true experience, but the closest thing that I would encourage is competitions on kaggle.com.

  • @ayushbanerjee4442 3 years ago +1

    Hi Richard, I have a question: do we need to study algorithms as a general topic (maybe from the book by Thomas Cormen) in order to get a clear understanding of how algorithms work in real life? Kindly suggest; I am an aspiring data scientist.

    • @RichardOnData 3 years ago +2

      When you say how they work in real life - do you mean an intuitive explanation of how the algorithms work (getting rid of mathematical jargon), or how industries in the real world are using them? If it's the latter, it's helpful to read blogs like KDNuggets because there's articles for applications of AI, etc. all the time. For explanations of how algorithms work, books like "An Introduction to Statistical Learning" and "Applied Predictive Modeling" are excellent.

    • @ayushbanerjee4442 3 years ago

      @RichardOnData Thanks a lot, Richard. I have another question. Let's say I am interested in how AI and ML are helpful in the cyber security domain. In that regard, I want to know how much domain knowledge is required in cyber security. Am I supposed to train myself as a hacker, or would just knowing the basics work fine? I have a friend who is an OSCP-certified ML practitioner, and I am really not sure how much cybersecurity knowledge someone should have to start a career in that domain. Kindly assist.

  • @bryanstark324 2 years ago

    Very informative! Do you have a course? If not, have you thought about making a course on Udemy? It's also good to have information here, but I'd be surprised if you made a whole course on just YouTube. But I'm now a subscriber because I really liked this video for breaking down the types of approaches to learn.

  • @hellogelo91 2 years ago +1

    Hello, can you tell us what the best statistics textbooks for data science are?

    • @RichardOnData 2 years ago +1

      I have a video on 5 machine learning books I recommend... granted the focus is ML there but many of them have great value to statistics more broadly. A single one that's a staple for introductory statistics though (with an R focus as well) is "An Introduction to Statistical Learning".

  • @matiasa769 3 years ago +1

    Is there any book you can recommend to learn the statistics necessary for data science? I have a strong math background so that won't be a problem. Thank you in advance!

    • @RichardOnData 3 years ago

      If you could read only one book, the highest recommendation I have is "An Introduction to Statistical Learning: With Applications in R".

    • @matiasa769 3 years ago +1

      @RichardOnData Thanks for your response! I'm planning on reading more than one book actually. My current plan is to read Blitzstein and Hwang's Introduction to Probability to cover the probability part, and the one you recommend to cover the "tools" aspect.
      What I feel I'm missing is some material regarding the estimation/inference aspect. I'm currently considering Hogg, McKean, & Craig and Larsen & Marx (slightly inclined towards the latter), but any other suggestion is welcome.
      Thank you again for your time.

    • @RichardOnData 3 years ago +1

      I didn't see this comment reply for the longest time, sorry about that. If you're still interested in books on this, an excellent one is "The Book of Why: The New Science of Cause and Effect". It's an excellent resource for inference considering most resources out there focus most on estimation!

    • @matiasa769 3 years ago

      @RichardOnData Hey Richard, thanks a lot for the recommendation! I will definitely check it out.

  • @shashankkumarshah7279 3 years ago +1

    Sir, what's the study path for mathematics, statistics, and Python or R for data science?

    • @RichardOnData 3 years ago +1

      I did a video on this very topic, look up "A Study Pathway for Data Science".... but in short, my recommendation is statistics first, followed by either R or Python. I'd probably do the other of those two next followed by a more rigorous understanding of math.

    • @shashankkumarshah7279 3 years ago

      @RichardOnData Which is the better option for data science, R or Python?

  • @samueltwumasi4214 3 years ago +1

    Hello sir, I am not that good at statistics but I am good at mathematics, so can I major in CS and minor in Maths for data science?

    • @RichardOnData 2 years ago

      Yes, I still see value in learning statistics such that you have that exposure, but I think that will be a winning combination.

  • @daniyarturak9558 3 years ago +1

    Hello, do you have any courses?

    • @RichardOnData 3 years ago

      I do not; at this time I do not have any intention to start a course, just due to the abundance of resources out there on the internet. However, let me know if there are particular topics you would like me to cover!

  • @Phoenixspin 3 years ago +1

    Cool shirt, Richard.

    • @RichardOnData 3 years ago

      Haha, thank you! I try to switch it up. Collar shirt one day, band t-shirt the next day.....

  • @marshel9884 2 years ago +1

    Thanks

  • @ahsanmohammed1 3 years ago +1

    what is your undergrad?
    what specifically did you do in your masters?
    Thanks.

    • @RichardOnData 3 years ago +1

      My undergrad was a Statistics and Economics double major, and I did a masters in Statistics.
      Off the top of my head the masters included the following: mathematical statistics, statistical computing, machine learning, statistical consulting. I also did electives in sampling, spatial, survival, and categorical data analysis.

  • @SivaKumar-rd2gl 3 years ago +1

    Sir, how much data science is needed for bioinformatics? Please reply, sir.

    • @RichardOnData 3 years ago

      For bioinformatics I recommend strong knowledge of the underlying domain as well as statistics, SQL, and one of R or Python. You are probably not getting into crazy data engineering or too complex of ML in that case, but a healthy understanding of ML algorithms wouldn't hurt either.

  • @joshuddin897 2 years ago

    Bonjour Richard

  • @starship1701 2 years ago

    Rather than focusing on academic knowledge and theory, why not make videos that talk about the actual application of statistical methods in data science? You could go through some actual projects and some Jupyter notebooks that explain with Markdown how statistics is important in solving that specific problem.

  • @emmanuelkehinde7112 3 years ago +1

    Just stumbled on your channel. A new subscriber here. If you have the time, please put a course out there on Udemy. You are doing a great job with your content!

    • @RichardOnData 3 years ago +1

      Thank you; I have received a number of requests to do so in the recent months. I've tried to stray away from doing courses due to the sheer amount of free material online - and I'd have to do a lot of introspection and planning before coming up with something that I think makes sense, but I certainly haven't ruled it out and it's a project I may embark on in the future!

    • @emmanuelkehinde7112 3 years ago

      @RichardOnData Thank you! I will be on the lookout!

  • @Perriax 3 years ago +1

    I liked the fern.

    • @RichardOnData 3 years ago

      Yeah it's kind of sitting in a corner of my room right now, it needs a comeback

  • @banjohead66 3 years ago +2

    Between one fern.

  • @kloelind1935 2 years ago +1

    You forgot step 4, plants! :D

  • @patrickbateman7665 3 years ago +1

    Hey Richard You have Zero Dislikes on this video. You want one ? :)
    Just Kidding

    • @RichardOnData 3 years ago +1

      Haha, now that surprises me because this is one of my first videos, and in my own opinion I've greatly improved in literally all aspects of the content creation process. Just a matter of time!

    • @patrickbateman7665 3 years ago

      @RichardOnData Yes, you deserve this, Richard. I'd like to see more videos from you 😎😎🔥🔥

  • @pathansahab2977 3 years ago +1

    you look like mark zuckerberg lite version.

    • @RichardOnData 3 years ago +3

      I'll take it. Who knows, maybe I'll even become part robot and part lizard as the years go by.

  • @wlaidos112 2 years ago

    Your content is good but your way of talking is really annoying