How Much Statistics Do You REALLY Need for Data Science?
- Published: Nov 28, 2019
- Subscribe to RichardOnData here: / @richardondata
In my last video I discussed the fact that statistics is a must-know component of the broad, multidisciplinary data science skill set. If you missed that video you can find it here: • What Is a Data Scienti...
A book I highly recommend, especially for the non-mathematical reader, is "How Not to be Wrong" by Jordan Ellenberg. Find it here: amzn.to/2U1FjpQ
However, not everyone going into data science necessarily has a statistics background. That raises an obvious follow-up question: how much statistics do you REALLY need for data science? Education is valuable, but not every single thing you learn in a traditional statistics degree is a hard-and-fast requirement. Here are, from my perspective, the core skills you need:
Fundamentals
- Probability calculations, including conditional probability/Bayes' rule and the Central Limit Theorem
- Basic understanding of distributions including properties of random variables such as expected value and variance
- Full confidence interval framework
- Full hypothesis testing framework, including p-values, conclusions, and Type I and Type II errors
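To make the Fundamentals list a little more concrete, here is a minimal Python sketch (standard library only) of three of its items: Bayes' rule for a diagnostic-test style problem, a CLT-based (large-sample normal) confidence interval for a mean, and a two-sided z-test p-value. The function names and the example numbers (1% prevalence, 99% sensitivity, 5% false-positive rate, a five-point toy sample) are hypothetical and chosen only for illustration; with samples this small, a t-distribution would normally replace the normal quantile.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule.

    prior: P(condition); sensitivity: P(positive | condition);
    false_positive_rate: P(positive | no condition).
    """
    # Law of total probability gives the overall chance of a positive test.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def normal_ci(sample, confidence=0.95):
    """Large-sample (CLT-based) confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = stdev(sample) / sqrt(len(sample))           # standard error of the mean
    m = mean(sample)
    return m - z * se, m + z * se


def one_sample_z_test(sample, mu0):
    """Two-sided p-value for H0: population mean == mu0 (normal approximation)."""
    se = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    print(bayes_posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05))
    sample = [1, 2, 3, 4, 5]  # toy data, too small for a z-interval in practice
    print(normal_ci(sample))
    print(one_sample_z_test(sample, mu0=3))
```

With these hypothetical inputs, `bayes_posterior` returns about 0.167: even with a fairly accurate test, most positives are false when the condition is rare. Rejecting H0 whenever the p-value falls below a chosen alpha (say 0.05) keeps the Type I error rate at about alpha when the test's assumptions hold.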
Tools
- Linear models (how to set up, interpret, and iterate on them)
- Machine learning models, including setting them up in a programming language from pre-processing to outputting results; also understanding the bias-variance tradeoff and how to address overfitting (and underfitting)
- Survival analysis
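Survival analysis is the least familiar item on the Tools list for many newcomers, so here is a minimal sketch of the Kaplan-Meier estimator, a standard nonparametric estimate of a survival curve from data where some subjects are censored (their event has not happened by the end of follow-up). The function name and toy durations are hypothetical; a real analysis would typically use an established library such as lifelines in Python or the survival package in R.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject
    events: 1 if the event (e.g. churn or default) was observed, 0 if censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    data = sorted(zip(durations, events))
    at_risk = len(data)   # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0      # events at time t
        leaving = 0       # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data: six subjects, two of them censored (event = 0).
    for t, s in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
        print(t, round(s, 3))
```

Because Kaplan-Meier assumes no parametric shape for the curve, it is a common first look at the data before committing to a parametric or Cox model.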
Reasoning
- Assumptions of tests and models used
- How bias affects results
- Confounding variables and Simpson's Paradox
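Simpson's Paradox is easiest to see with numbers. The sketch below uses illustrative success counts for two treatments split by a confounder (case severity, here stone size); the counts match the often-quoted kidney-stone example. Treatment A has the higher success rate within each stratum, yet Treatment B looks better once the strata are pooled, because A was given mostly to the harder cases. The `DATA` layout and function names are hypothetical.

```python
# {treatment: {stratum: (successes, trials)}}; the stratum is the confounder.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rates_by_stratum(data):
    """Success rate of each treatment within each stratum."""
    return {
        treatment: {stratum: s / n for stratum, (s, n) in strata.items()}
        for treatment, strata in data.items()
    }


def pooled_rates(data):
    """Success rate of each treatment ignoring the stratum (the naive comparison)."""
    pooled = {}
    for treatment, strata in data.items():
        successes = sum(s for s, _ in strata.values())
        trials = sum(n for _, n in strata.values())
        pooled[treatment] = successes / trials
    return pooled


if __name__ == "__main__":
    print("by stratum:", rates_by_stratum(DATA))
    print("pooled:    ", pooled_rates(DATA))
```

Which comparison to trust depends on the causal story: here stone size influences both which treatment was chosen and the outcome, so the stratified rates are the fairer comparison.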
#statistics #datascience #StatisticsForDataScience
PayPal: richardondata@gmail.com
Patreon: / richardondata
BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32
Category: Science
Thank you for these videos! As a programmer looking to get into data science, this video in particular has been exactly what I'm looking for as far as getting an idea of what I should work on learning from the statistics side.
Was looking for gold, found a gem.
Well thanks, I appreciate that!
This channel really is the only one where I watch every single ad. Thank you so much for what you're doing, please keep it up!
That really means a lot, thanks so much, no intention of ending what I do here any time soon!
I am learning so much from your videos! Thank you, thank you, thank you!!!
That is precisely the goal! Thanks so much!
It is incredible that you have such a low number of followers, considering that you are one of the best data scientist YouTubers out there.
Keep it up, your efforts will eventually pay off!👍
Thank you so much! My channel is relatively new (only about four months old, which is a very small amount of time in the YouTube world), so growing it is and will be a work in progress!
agree
I just started my BSc in data science and I'm struggling to understand why I even chose this path. This video really helps provide a bird's-eye view of the career I'm moving toward. Awesome job!!
Thank you and best of luck to you!
This is pretty cogent advice. I personally like to know linear algebra and calculus (the maths) but recognize that stats and statistical reasoning are more important to avoid data science mistakes/pitfalls.
Very concise and helpful. Thank you
Dude, thank you so much. I'm starting my Masters in applied stats this summer and looking forward to it. I like the way you grouped these different elements into their own categories; it definitely helps with understanding these concepts. Also curious here: what was your favorite class during your masters, and which class do you give the utmost importance to?
I have to assign a tie here to my Machine Learning class and my Statistical Consulting class. Got all the methods, and got all the straightforward yet powerful ways that things could be applied. Best of all worlds.
Again a very helpful talk
I see how RichardOnData channel was born under the sign of excellence !
Hah! Thank you!
Great video, Richard! Really appreciated the effort in the examples 👍🏿👊🏿
My pleasure! Examples are how you learn!
Wow, this is the best video explaining data science in general, Thanks a lot
You are most welcome, I'm glad that it provided a helpful overview of the field!
A huge amount of knowledge. I appreciate your effort in making this video.
Glad it was helpful!
Great tips! Great video! Much appreciated! Thank you so much!
Glad it was helpful!
Very helpful information, thank you & keep up the great work & efforts 🙏🏻
Thank you! Will do!
I really like this video because it's very well-detailed. Currently, I'm using ChatGPT to help me develop data science projects. As a part of the prompt, I inserted all the stats and math concepts mentioned in your videos and asked it to summon each topic when needed and debate with me when each of them is important in each part of the process. In that way, I'm "forced" to think more deeply about the process and not just rely on library functions and hope for the best. Thanks!
I firmly agree that those are the basics. Once you have those basics you can pick up whatever tool you need for modeling and analysis. If you lack those basics it can be hard to pick up any specialized tools that you may need for specific problems. I would quibble on survival modeling: I learned from my work in life insurance and banking that many of the standard survival models have functional forms that make it impossible to analyze key business problems. In lending most standard survival functions assume shapes that are horribly wrong, resulting in poor business decisions.
wow, thank you so much. Very very helpful!
Your efforts are appreciated, keep going.
That I will! Thanks for watching!
Great videos. I have a background as a Business Analyst, and I might add that a really important skill for people who live between data science and business analysis is Excel and VBA programming, which are ubiquitous across almost all industries.
Thank you so much, but we need a video that tells us where to take these classes, or a study plan for them and where to find them.
Invaluable. Thank you!
Very welcome!
Nice list. I would have included time series instead of survival analysis. Also, my go-to predictive models are penalized generalized linear models (aka: "elastic nets").
Fair enough! Elastic nets are one of my favorite techniques too. I might be speaking from a bit of an experience bias, considering I've done more survival analysis than time-series analysis myself, but that's an incredibly fair point.
Thank you. You hit the nail on the head. My question: how long does it take to learn all the stuff mentioned here, years or months?
I think you can be in a pretty good position with much of this stuff after a few months (3-6 months or so), particularly if you're making use of things like Coursera courses. Of course the best possible option is a 4 year bachelors degree (or 2 year masters) but that's not necessary for absolutely everybody.
Great video, man, keep them coming!
Thank you! More to come -- 6-8 per month is the goal!
I'm a Brazilian student, and this guy is really funny and impressively smart; the way he speaks and teaches is amazing. I'm really impressed.
Good to know, thanks a lot for sharing!
My pleasure, hope this video was helpful!
Please do a video tutorial series for these statistical concepts!
I have a Statistics Tutorial series already on my channel actually - and these ideas all form the bedrock of that series!
Having said that, these tend to be some of my videos with the lowest number of views. So if you'd like to check them out and let me know which topics you'd most like me to cover on top of those, that'd be awesome!
Hi Richard, I am wondering if you can suggest a book on statistics besides the one you mentioned in the description, something more for people who have math knowledge and want more in-depth statistics. Mostly, I am wondering if there is a book that covers the fundamentals you listed?
A really good video to watch, I really like your explanation! Do you think that statistics is also needed for data analysts? I think you should also make more videos about data analytics, because I notice that there are a LOT of jobs available in this field compared to data science. Thank you so much in advance! You're the best :)
That's interesting that you're seeing data analytics have more available jobs in the field. What area are you in?
I would say statistics is certainly still important - particularly to know methods and how to reason - while the deep theory of it probably won't be as important, as you're unlikely to be involved in creating your own methodologies in data analytics at least compared to data science.
You gave me a clear picture of what a data scientist does. Thx
That's the idea! If you have other things you'd like to see me cover please let me know.
Thank you Richard, that's really insightful!
I have a BS in Mathematics, which degree do you think is better to break in the data science field: a MS in Applied Statistics or a MS in Data Science?
Great question. This is going to depend on curriculum and whatnot, but my guess is that a lot of the statistics stuff will be easier to pick up if you have a mathematics background (that's the beauty of some linear algebra knowledge!). I'd probably go with the Data Science MS, especially if you're not already also a CS expert!
super helpful , thank you ✨
So glad!
Totally worth subscribing to your channel.
Thanks and welcome!
Your video is so informative!! I'm currently deciding my major and need your advice :( Before getting an MS in Stats, would you recommend majoring in Applied Math OR Stats for a BS if I want to go into a DS field??
Thank you so much! That is the goal!
Unless you have a burning passion for pure mathematics, I would go with the stats BS. While the strong knowledge of calculus and linear algebra will probably serve you well, I've actually heard from applied math MS's who say their pathway was a bit too theoretical for industry DS.
I also have 5 years of experience in data science jobs as a statistician, and I can confirm pretty much everything Richard is saying. I would also like to stress how important the reasoning part is. Data quality issues lurk EVERYWHERE, and bad data is handed to you all the time. Statistics helps you imagine the process that produced your data; in other words, your data is a sample drawn from a certain distribution. When that process suddenly changes, you'll be there to save the day.
Thank you! Yes, I think one of the greatest failings in academia right now is a failure to emphasize that real world data is NOT clean and clear. I wish there was significantly more emphasis on thinking through practical data quality problems.
Wow, this is such a perfect breakdown of the data science subject.
Well, I finished one semester of a master's in data science, but I have an acceptance to study applied statistics at another university, which is one of the best universities. My bachelor's degree is in MIS, and I am hesitant about which subject to choose!!
Glad you enjoyed the video! I'm a fan of the concept of Data Science degrees, but at the same time I'd take a strong Statistics education over a weak Data Science education any day. It'll come down to how strong that DS curriculum is.
Count me in as one of your students, sir! I'm taking action to build marketable skills. I'm quite overwhelmed by a lot of information, so I decided to follow you, and let's see.
Fantastic! That's very kind and I'm glad you're here.
Bravo!
Very informative content
Thank you
Glad you enjoyed it!
I have a PhD in pure mathematics, specialising in dynamical systems. Today I am a software developer/software architect.
Please correct me if I am wrong, but for me data science is 80% statistics.
Computer science guys doing data science: OK, I can live with that.
For me, a postgraduate or graduate in statistics is a better fit as a data scientist. Programming is the easier part.
Could you please provide some examples for each component?
Thanks
Good information, thanks
You're welcome!
Good Points!
Presentation 100% thanks man appreciate it. May Allah guide you and everyone else.
;)
Much appreciated!!
Very interesting.
I come from a hospitality background, and I am about to finish my MBA in big data applied to marketing, which is basically giving me a better approach to the data I usually deal with.
Most of the time, the task is to predict some output (demand) from my historical data using time series, or to find the best price that will maximize my profit (linear regression), since my goal is to stay in the pricing field and find solutions related to pricing and demand.
Would you say that in this case I should dive into a BS in statistics to get a better foundation, or into data science?
I work in the revenue management department just to give some context.
Thanks
Do you currently know much in the way of computer science fundamentals? Do you know R or Python? Do you want to stick with that role for a while? I'm inclined to say statistics here, because programming resources tend to be easier to find than solid statistics resources, and knowing some of the fundamentals serves you really nicely.
RichardOnData Currently I am learning Python, and I use R to run some regression models and other things that are required at work.
Thanks for the advice I think that I will stick with statistics then.
Super underrated. I used to work as a DS in my last job, and many of my peers asked me how to become a DS because it's a nice buzzword, and guess what? Everyone thinks you just learn Python and you're a DS, lol. Yeah, but they don't like statistics and are afraid of it. Yes, I don't get it either.
Could you please suggest some books and material that one can learn things you just mentioned from?
Thank you!
Highly recommend the book "An Introduction to Statistical Learning: With Applications in R". I also recommend the book "Approaching (Almost) Any Machine Learning Problem" due to its practical narrative. If you want to go deep into the theory, the book "Statistical Inference" by Casella & Berger is classic, albeit probably more than most people require.
I am also pursuing an MSc in Applied Statistics, so is it better than an MSc in Data Science?
Can you please recommend any source where you can apply statistical solutions to real-world problems? Thanks in advance.
Naturally, there's nothing quite as good as true experience, but the closest thing that I would encourage is competitions on kaggle.com.
Hi Richard, I have a question: do we need to study algorithms as a general topic (maybe from the book by Thomas Cormen) in order to get a clear understanding of how algorithms work in real life? Kindly suggest; I am an aspiring data scientist.
When you say how they work in real life, do you mean an intuitive explanation of how the algorithms work (getting rid of mathematical jargon), or how industries in the real world are using them? If it's the latter, it's helpful to read blogs like KDNuggets, because there are articles on applications of AI, etc. all the time. For explanations of how algorithms work, books like "An Introduction to Statistical Learning" and "Applied Predictive Modeling" are excellent.
@@RichardOnData Thanks a lot, Richard. I have another question: let's say I am interested in how AI and ML are helpful in the cybersecurity domain. In that regard, how much cybersecurity domain knowledge is required? Am I supposed to train myself as a hacker, or will just knowing the basics work fine? I have a friend who is an OSCP-certified ML practitioner, and I am really not sure how much cybersecurity knowledge someone should have to start a career in that domain. Kindly assist.
Very informative! Do you have a course? If not, have you thought about making a course on Udemy? It's also good to have information here, but I'd be surprised if you made a whole course just on YouTube. But I'm now a subscriber because I really liked this video for breaking down the types of approaches to learn.
Hello, can you tell us what the best statistics textbooks for data science are?
I have a video on 5 machine learning books I recommend... granted the focus is ML there but many of them have great value to statistics more broadly. A single one that's a staple for introductory statistics though (with an R focus as well) is "An Introduction to Statistical Learning".
Is there any book you can recommend to learn the statistics necessary for data science? I have a strong math background, so that won't be a problem. Thank you in advance!
If you could read only one book, the highest recommendation I have is "An Introduction to Statistical Learning: With Applications in R".
@@RichardOnData Thanks for your response! I'm planning on reading more than one book actually. My current plan is to read Blitzstein and Hwang's Introduction to Probability to cover the probability part, and the one you recommend to cover the "tools" aspect.
What I feel I'm missing is some material regarding the estimation/inference aspect. I'm currently considering Hogg, McKean, & Craig and Larsen & Marx (slightly inclined towards the latter), but any other suggestion is welcome.
Thank you again for your time.
I didn't see this comment reply for the longest time, sorry about that. If you're still interested in books on this, an excellent one is "The Book of Why: The New Science of Cause and Effect". It's an excellent resource for inference considering most resources out there focus most on estimation!
@@RichardOnData Hey Richard, thanks a lot for the recommendation! I will definitely check it out.
Sir, what's the study path for mathematics, statistics, and Python or R for data science?
I did a video on this very topic, look up "A Study Pathway for Data Science".... but in short, my recommendation is statistics first, followed by either R or Python. I'd probably do the other of those two next followed by a more rigorous understanding of math.
@@RichardOnData Which is the better option for data science, R or Python?
Hello sir, I am not that good at statistics, but I am good at mathematics. Can I major in CS and minor in math for data science?
Yes, I still see value in learning statistics such that you have that exposure, but I think that will be a winning combination.
Hello, do you have any courses?
I do not; at this time I do not have any intention to start a course, just due to the abundance of resources out there on the internet. However, let me know if there are particular topics you would like me to cover!
Cool shirt, Richard.
Haha, thank you! I try to switch it up. Collar shirt one day, band t-shirt the next day.....
Thanks
My pleasure
What is your undergrad?
What specifically did you do in your masters?
Thanks.
My undergrad was a Statistics and Economics double major, and I did a masters in Statistics.
Off the top of my head the masters included the following: mathematical statistics, statistical computing, machine learning, statistical consulting. I also did electives in sampling, spatial, survival, and categorical data analysis.
Sir, how much data science is needed for bioinformatics? Please reply, sir.
For bioinformatics I recommend strong knowledge of the underlying domain as well as statistics, SQL, and one of R or Python. You are probably not getting into crazy data engineering or too complex of ML in that case, but a healthy understanding of ML algorithms wouldn't hurt either.
Hello Richard
Rather than focusing on academic knowledge and theory, why not make videos that talk about the actual application of statistical methods in data science? You could go through some actual projects and some Jupyter notebooks that explain with Markdown how statistics is important in solving that specific problem.
Just stumbled on your channel. A new subscriber here. If you have the time, please put a course out there on Udemy. You are doing a great job with your content!
Thank you; I have received a number of requests to do so in the recent months. I've tried to stray away from doing courses due to the sheer amount of free material online - and I'd have to do a lot of introspection and planning before coming up with something that I think makes sense, but I certainly haven't ruled it out and it's a project I may embark on in the future!
@@RichardOnData thank You! I will be on the lookout!
I liked the fern.
Yeah it's kind of sitting in a corner of my room right now, it needs a comeback
Between one fern.
Indeed.
You forgot step 4, plants! :D
Your avatar checks out!
Hey Richard You have Zero Dislikes on this video. You want one ? :)
Just Kidding
Haha, now that surprises me because this is one of my first videos, and in my own opinion I've greatly improved in literally all aspects of the content creation process. Just a matter of time!
@@RichardOnData Yes You deserve this Richard . Like to see more videos from you 😎😎🔥🔥
You look like a lite version of Mark Zuckerberg.
I'll take it. Who knows, maybe I'll even become part robot and part lizard as the years go by.
Your content is good, but your way of talking is really annoying.