Before doing all these studies, pick a field where you would like to work as a data scientist. That simplifies a lot of things. For instance, you can focus only on learning specific statistical methods, learn about data in your chosen field, master one programming language + SQL, and learn cloud computing basics. Let's say that you'd like to be a data scientist in finance. Then learn statistics relevant to finance, SAS, advanced Excel, and SQL. Let's say that you'd like to be a data scientist in biotech. Then you need a biotech-related PhD + cloud computing + R or Python programming, and SQL. Let's say that you'd like to be a data scientist focused on analytics (like most Meta hiring): you need to learn very basic statistics, SQL, and the products of that particular company. Once you find a job, keep learning and researching real-world problem solving.
Hey! I did exactly the same combination as you in undergrad. I also went on to do a DS MS and it was totally the right decision for me. I recommend you start digging into some stats courses if you can before making the decision. Personally I finished my Math BS having taken 0 stats classes because "that wasn't REAL math". Big mistake. If I could go back in time I would've taken a lot more stats and a lot less topology and number theory.
I read this as "Doing some theoretical math BS" which is a bit different than doing "a theoretical math BS." I highly recommend Introduction to Machine Learning by Alpaydin, btw. It helped quite a lot in my first year of grad school.
Great stuff! I'm currently pursuing a master's degree in data science, and learning a ton of mathematics. This semester I'm enrolled in linear algebra, applied statistics, and graph theory.
Some other recommendations to add, though they may be a little over the top, are machine learning books. Two pretty good ones that can be downloaded for free (and legally): "The Elements of Statistical Learning" by Hastie, Tibshirani and Friedman requires advanced mathematics, similar to the requirements you mentioned for the Mathematical Statistics book, and is THE book for learning machine learning. There is also a book by some of the same authors, "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, that focuses more on applying machine learning, with less mathematical depth. It's still really good.
Great resources! I loved the fact that you included textbooks that require proofs. For me, the type of math taught at engineering schools is what I'd call (in an analogy to software testing) "black-box math": you know how to do computations, you know what the theorems are used for, but you don't get to see the "code", the logical structure that makes all these theorems actually true. I prefer "white-box math": even if it's a lot harder, it's a lot more rewarding at the end of the day, and you end up with a more profound understanding of how and why things work.
This white-box math should be taught in master's classes. We use it to validate and challenge the calculations done by professional engineers and to check whether they understand the suitability of their formulas for the job. More often than not, the products are overdesigned and super expensive. I'm sick of engineers making legacy projects at the company's expense.
@slhermit I want to create engineering heuristics for energy systems to predict results. Is that doable through data science math? Context: I'm not an engineer but a budget officer, and I want to be able to assert that a building has an excessive quantity of slender columns, unnecessary expense, and an 80% chance of not surviving a level 7 seismic event, without doing or reading horrendous calculations by opportunistic engineers.
Highly recommend An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. It is a clear and thorough exposition of the bias-variance tradeoff as well as a variety of common models.
It's my first year in university studying data science, and I can tell you this is the best video explaining the "must knows" for a data scientist. I also appreciate all the book review videos; they're just amazing.
This is what I'm talking about! CS and Math major here. Love the merging between analysis, probability, and data science. I've only read a couple of the books there, so my strongest opinion is that Gilbert Strang's Linear Algebra and Its Applications is amazing for a second course in linear algebra and is well suited to applied mathematics.
Thank you Sorcerer for your continued prodigious output of Mathematics information sources. You are a truly valuable resource for those of us trying to learn this stuff on our own. Your enthusiasm is infectious. Well done and Kudos.
@naydoorf Pretending? Yeah and he also makes money off the views for every video too so what? You're clearly lacking common sense. Making money off something equates to pretending nowadays apparently. You act as if he recommended Harry Potter books for a data science topic
I also wanted to recommend "Introduction to the New Statistics: Estimation, Open Science, and Beyond" by Prof. Geoff Cumming (2nd edition coming in 2023). Prof. Cumming really explains very well the predominant importance of confidence intervals and effect sizes as opposed to only null hypothesis significance testing. 😉
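As a rough illustration of the point about reporting effect sizes and confidence intervals rather than only a significance test, here is a minimal sketch using just Python's standard library. The sample values are made up, the helper names `cohens_d` and `mean_diff_ci` are hypothetical, and the interval uses a normal approximation (a t-based interval would be more appropriate for samples this small).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mean_diff_ci(a, b, level=0.95):
    """Approximate confidence interval for mean(a) - mean(b).

    Assumption: uses a normal (z) critical value instead of a t critical value.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se


# Made-up illustrative scores, not real data.
treatment = [5.1, 6.0, 5.8, 6.4, 5.5, 6.2, 5.9, 6.1]
control = [4.8, 5.2, 5.0, 5.6, 4.9, 5.3, 5.1, 5.4]

d = cohens_d(treatment, control)
low, high = mean_diff_ci(treatment, control)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = ({low:.2f}, {high:.2f})")
```

The effect size says how large the difference is in standard-deviation units, and the interval shows the range of plausible differences, which is more informative than a bare "significant / not significant" verdict.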
Data scientist is one of the careers I'm looking to pursue, but I'm also interested in becoming a mathematician or math professor. Thanks for the books!
Oh, so you’re okay. You got us going there. That’s kind of weird. I don’t know if I’m going to follow this channel anymore. Good luck with all that drama.
It is amazing how many of the textbooks I used for my Mathematics BSc (Hons) in the 1980s are still in use. I used both Gilbert Strang's Linear Algebra and Its Applications and Seymour Lipschutz's Schaum's Outline of Linear Algebra. For probability and statistics I used Introductory Probability and Statistical Applications by Paul L. Meyer and Introduction to the Theory of Statistics by Alexander M. Mood, Franklin A. Graybill and Duane C. Boes. Although it was not my favourite subject at university, statistics proved to be one of the more useful ones I learned, because I worked in the banking industry afterwards.
You mentioned references quite a few times in this video. In data science/engineering it is so damn important to be precise to the letter about what assumptions, tools, methods, and state of data you used, when, and why. Clean work is always important in any science or engineering topic, but in applied work with data it is so easy to be off significantly, and expressions that clearly show you that you are wrong, like the Lagrangian in mechanics, are rare. In Europe we use Lothar Papula's books quite a lot for math. I learned Python by extending the basics, tutorials, and examples from the documentation. Now I'm working with "Data-Driven Science and Engineering" by Brunton & Kutz as well as "Dynamic Data Analysis: Modeling Data with Differential Equations". It's quite tailored to control theory and systems engineering, so it might not be the best "Data Science" book, but it's great if you want to build robots, machines, etc.
My only addition would be a book on design patterns for software development. It helps particularly when you are going to be working in the same code for a long period of time or with a larger team of people. But the choice of book here is going to depend on the language you are working in. Otherwise great picks. The stats book by Wackerly, Mendenhall, and Scheaffer is also my first reference book.
I'm a biologist starting a data science master's in January. It's been a bit stressful trying to relearn calculus and statistics. I didn't do well when I took them for the first time. Your videos calm me down and give me hope! I know I can do it, it just takes a bit of time and practice. Thanks for your videos! I'm very glad to have found your channel.
Wish you success. Relearning might sound daunting to some, but to me it sounds interesting to rediscover maths with a new pair of eyes after working as an SE for several years.
@RyeCA Things are going well. I'm halfway through my degree now. Math is less scary, and I've stopped freezing when I see letters in my math. I'm taking a beefy stats class soon that requires calc 1 and 2, probability, and stats, and I can't believe I made it this far!
@RyeCA P.S. To be honest, I am still slower than my peers who come from a statistics / computer science / math background, but I am managing fine in my projects and classes. I just take more time.
Great video! The following information should be taken with a grain of salt. If a person is a college student, one beneficial major that could lead to a future career in Data Science is Computer Science (CS) with a minor in Statistics. When comparing a Computer Science degree with a Data Science degree, the coursework is fairly similar. As an undergraduate, I prefer a Computer Science degree because it emphasizes programming concepts that help when applying for jobs: a majority of Data Science jobs require a Master's degree for candidacy, while CS jobs mostly require a Bachelor's degree. Additionally, a minor in Statistics would allow students to cover the statistical concepts mentioned in this video.
Man, I used to watch your videos some years ago when I was doing my bachelor's degree in statistics! Your videos helped me more than the lectures from my teacher. To be honest, I completely forgot about the channel, and now I'm learning about data science and your video popped up again! Thank you for all your efforts; your videos are so clear and easy to understand. Love from Nepal.
Hi Math Sorcerer, I'm using R for projects, and there's a book I'm using called R for Everyone by Jared Lander, which explains R from a beginner level. I'm happy that you've included a book on R, which plays a crucial role in data science.
Norm Matloff's book on R is excellent. I use both R and Python. For all things stats related, I use R. I tend to use Python for a lot of data pipelining and NLP. The statistics procedures in Python tend to be problematic. I don't believe in language wars though. I also use C/C++/Fortran within R and Python to speed things up as needed. I keep SAS guides handy too. They are excellent for understanding procedures and have paper references, although SAS has fallen out of favor. 'Design and Analysis of Experiments' is good to have. Great job including it. Not many people understand that topic or the need for it. I'd recommend knowing hierarchical modeling as well. Gelman and Hill's book is one I would highly recommend. The last thing I would like to make clear is that you will be a good data scientist if you don't think with your tools/math first. Tools are tools. Your job is to solve problems. In most cases, the reason for your job is to enable the employer to make or save money. Many data scientists think their job is an extension of grad school, so they want to use the latest and greatest algorithm they read about. Great minds, but highly ineffective ones, who end up wasting their own and everyone else's time. Putting things into use in a running machine like a complex business is hard in itself. The more complicated your solution, the longer it will take to make it useful; it will be expensive to maintain, will need constant supervision, and will leave everyone exhausted and exasperated. This is not trivial. The data science courses popping up produce unusable talent because they are taught by people who have never done any real work.
Most data scientists can use Python and pandas along with PyTorch to do the math needed. The key skill in data science is knowing how to get the data and figuring out what you can do with it.
Thankful that you made this! Working on a Master's in Data Science, but to be honest, my stats background is pretty weak. I will DEFINITELY be ordering some of those books!
Hey, this is very helpful! I'm already a programmer (Python is one of the languages I'm more familiar with) and I was actually looking into how to learn statistics and data visualization, since I can't do much more than basic bar and line plots and I'm not very strong when it comes to mathematics (surprisingly enough, generic programming doesn't require much more than pre-algebra, numeral systems such as binary, octal and hexadecimal, and Boolean algebra). So it's good that you just made a video on books for getting started! It does feel like an overwhelming amount of math, especially considering I still have to review a lot of pre-algebra because I haven't touched it since high school many years ago, but... I'll do my best!
So crazy because I was researching what I should learn in mathematics next! I'm an engineering major and I've taken Elementary Linear Algebra, Ordinary Differential Equations, and Multivariable Calculus. I'm curious about statistics because I will take a statistics class for engineering and a math methods class for engineering. I do not want to stop learning math because I love it so much. Should I continue to learn about statistics, or should I go down the path of discrete mathematics, math proofs, real analysis, etc.? Maybe learn about PDEs or complex variables? It's very confusing which one I should do or how a math subject is relevant to me and engineering. To be more specific, I'm a mechanical engineering major. I've been watching your channel since I went back to college and started taking college algebra! That was two years ago! Maybe a minor in math or statistics is in the works whenever I transfer to ASU, a 4-year university. Thank you!
In order to learn more upper-level math, you need analysis and math proofs, and you should have a course in discrete math. But not the typical discrete math a CS major takes; that is usually a bit too elementary and easily learned through your intro to proofs class (at my college the proofs class counts for discrete math as well), so take a course in applied combinatorics or something similar. Your course in statistics is probably great for engineers, but after one class you can't get far without a background in combinatorics and proofs. Even my first undergraduate introduction to probability theory assumed most students were taking or had taken analysis, and the prerequisites were applied combinatorics, proofs, and multivariable calculus. So, needless to say, proofs and combinatorics are both really important. Additionally, any graduate-level probability, and some statistics, will require analysis all the way through measure theory (typically three semesters' worth of analysis). For your goals as a mechanical engineer, though, I would say your statistics course will suffice for now. I personally think that you would get a lot more out of both PDEs and complex analysis, both of which are beautiful and extremely relevant for all mechanical engineering occupations. Additionally, you should consider numerical analysis, but it may be too much for your degree. Consider it though; it's really interesting and just as relevant as PDEs to mechanical engineers.
@thefourthbrotherkaramazov245 Wow! First, thank you for the information. It makes sense that the proofs class and real analysis are the key to the upper-division levels, because I have seen that real analysis is a prerequisite for many upper-level math courses. Arizona State also accepts the math proof writing course in place of discrete mathematics. I did not know that combinatorics was that significant. I am trying to pursue a math minor and I was looking into which other courses besides linear algebra and statistics I should consider. So again, thank you so much. This is great knowledge for anyone else wondering the same. I love that in my mechanical engineering degree I can use 3 classes in upper-level mathematics. I think I will take the math proofs course and real analysis for the first two, and I will look more into numerical analysis, because there are two numerical analysis courses available to me as a mechanical engineering major. I'm a little more familiar with PDEs and complex analysis, but I think I should look into numerical analysis before I decide what to use for the third class, considering how important that is too. I should note that I am using my electives for 3 math courses and I would take the extra math courses for a math minor. There is a possibility I could double major, but I would rather talk to an advisor. I am aware that many people end up with the extra degree through careful planning of the two degrees.
@tmann986 That sounds great! As a heads up, real analysis is useful, but more so just for understanding higher maths. It is pretty useless by itself. What I mean is, if you take 3 upper math classes and leave it at that, it is way less likely that you will use it compared with a course in PDEs. Now, if you are just curious and you want to learn it, go for it! I'm also not saying you won't self-study in the future or go to grad school. But some math classes, as interesting as they are, are not so fruitful for most occupations outside of math research. You should hear someone else's input, but I think it should be noted. Glad I could be of help! Good luck on your degree.
@thefourthbrotherkaramazov245 Haha, that makes sense! Real analysis is really out there on the pure mathematics side. I watched a blackpenredpen video of him going over an epsilon-delta proof and I just loved it! Again, I can go my whole life knowing the calculus series without the epsilon-delta proof, but I think it's cool nonetheless! My calc 3 professor didn't think it was necessary to go over the epsilon-delta proof, so I went ahead and watched The Math Sorcerer's videos on the proofs and did the problems with him, and it was really fun! It was really funny to learn these techniques to solve ODEs just to be told later that we have approximation methods, since most real applications have more variables (PDEs) and need computing power to approximate an answer, haha. Your input was gold btw!
I loved math at school and taught myself a bit more as a medical student. I would like to learn enough to understand modern physics including quantum theory and general relativity. This is one of the few channels I watch which does not have irritating music in the background. I hope you are not considering it?
Here are my thoughts on it. I would do it in this order:
1. Probability
2. Mathematical series and convergence, numerical methods for analysis
3. Matrix and linear algebra
4. Bayesian statistics
5. Vector calculus
6. Markov processes and chains
7. Optimization (linear and quadratic)
8. Advanced matrix algebra and calculus (gradients, divergence, curl, etc.)
Good selections. I would also recommend one on Bayesian inference (E. T. Jaynes's Probability Theory is good), and on graphic display of information (Tufte's books). There are also several good books with code outlines for basic Machine Learning / AI algorithms.
I love "Mathematical Statistics with Applications", but I'd suggest a more rigorous book like "Introduction to Probability" for the first half. Blitzstein's Stat 110 lectures from Harvard (which the book is based on) are on YouTube. It's more digestible than "Statistical Inference" for self-study, but covers the first half of the material and then some very well.
Going to have to say that Python is the better language to take up. Used both R and Python in a statistical machine learning course and it was extremely difficult to understand the R examples, while it was very easy to understand the Python examples.
R, Python, Julia, etc., they all have strengths and weaknesses depending on your use case, so I think you gotta give terms and conditions when recommending languages lest you lead people astray. R has an enormous advantage in biomedical packages thanks to Bioconductor. It also leads in niche economic analysis where a lot of the academics use R. Python is more prevalent in industry and is a fabulous general scripting language.
Nice list. I thought Clifford Algebra and geometric calculus were making inroads in data science. You can do a lot of what vector calculus does in conceptually much easier ways. I know they have made some waves in the 3D graphics community. For linear, Strang is really good for anyone who struggles with getting a grip. His alternate approach can help a lot.
Very interesting how you introduce linear algebra after the calculus books. I'd have thought you would want to order them so that people making a list of what to look out for don't mess themselves up by studying something they lack the prerequisite knowledge for.
In your section on Linear Algebra you say that it is "sometimes used"; I work as a computational scientist and do a lot of data science in my job, and everything is linear algebra. For example, SVD (singular value decomposition) is used for principal component analysis and is also used for least squares estimation. The more linear algebra you know, the better. :D The regression suggestions tie in to the importance of time series methods. I've only used "Nonlinear Time Series Analysis" by Kantz and Schreiber, but I've seen people recommend "Time Series Analysis" by Hamilton.
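To make the SVD remark concrete, here is a minimal NumPy sketch of both uses: PCA from the SVD of a centered data matrix, and a least squares fit via the SVD-based pseudo-inverse. The data is synthetic and every variable name is an assumption chosen for illustration; the results are cross-checked against `np.cov`/`np.linalg.eigvalsh` and `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- PCA via SVD -------------------------------------------------------
# Synthetic data (illustrative assumption): 200 points stretched along one axis.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Xc = X - X.mean(axis=0)  # PCA requires centered data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                             # rows are the principal directions
explained_variance = s**2 / (len(Xc) - 1)   # variance along each direction

# Cross-check: these match the eigenvalues of the sample covariance matrix.
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# --- Least squares via SVD ---------------------------------------------
# Fit y = c0 + c1 * t on noisy synthetic observations.
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([np.ones_like(t), t])
true_coef = np.array([2.0, -1.0])
y = A @ true_coef + 0.01 * rng.normal(size=t.size)

Ua, sa, Vta = np.linalg.svd(A, full_matrices=False)
coef_svd = Vta.T @ ((Ua.T @ y) / sa)        # pseudo-inverse applied to y
coef_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]

print("explained variance:", explained_variance)
print("least squares coefficients:", coef_svd)
```

In practice one would usually call a library routine (for example `np.linalg.lstsq`) rather than assemble the pseudo-inverse by hand; the manual version is here only to show where the SVD enters.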
As an aside, what is your opinion on Axler's Linear Algebra Done Right? The new fourth ed will be made available for free online so it may be a useful recommendation as well.
College-level textbooks usually have monopoly-dictated (aka "eye-watering") prices. Fortunately, they get gratuitously revised to maintain sales (making each class buy new books, not last year's), so second-hand versions should be available inexpensively. Learning to program is only minimally about the languages, each of which has its special niche; learn to identify and pick the right tool for the job, though any language can be contorted to solve a given problem. Start by learning to write spreadsheets, which are programs in disguise. There are a lot of aspects of programming that often get overlooked, like designing test data, using version control, taking advantage of existing tools to simplify programs, &c. Remember that the greatest productivity tool in programming is plagiarism. DRY: Don't Repeat Yourself, and don't repeat anyone else's work if you can use it. The world is awash with bad, redundant code, painfully cranked out by grad students at huge opportunity cost.
I'm a big fan of old books but some of the stat books like Nonparametrics (Hollander/Wolfe) are greatly revised with newer editions with tons of computer code to work along with the exercises and see how to apply some of the methods.
I study mathematics and am a little interested in data science, but I have never learned it because it requires programming skills and knowledge beyond the mathematics I've learned so far. Though I've already learned fundamental statistics and probability theory using measure theory, they don't seem to be enough.
I was told that a deep understanding of mathematics isn't necessary anymore, because computer programs can do all of that work for us. But I never believed that. My main interest is social science research, and as far as I can tell, there's still a big replication crisis in the field. I'm pretty sure that a deeper understanding of mathematical statistics and related areas of math will be a vital part of the solution.
The wife of one of my colleagues is a social science lecturer at a university. She said that one of the main problems with the students they get is that they are unable to do maths.
@Anonymous-qw I've always felt that was the main problem in social science in general. Too many people go into the field with poor math skills, and they usually don't bother to improve.
@surrealistidealist They go into that field because it doesn't force them to do math. It's not a "them" problem; the field itself is the problem.
I've heard nothing good about R. What I've read was "21st Century C", whose author said he was constantly writing C routines to speed up his R programs until he decided to ditch R altogether and just use C. In a similar vein, I just can't stand Python. I thought we gave up positional syntax with COBOL. I much prefer Perl, the original programming language for data analysis. And overall, I really prefer C. Sure, you can get some good libraries for Python, but there are even more, and more solid, ones for C.
@TheMathSorcerer Do you have anything for Bayesian statistics? I'm in a Georgia Tech machine learning program. Can I reach you over email with some questions?
Lord Sorcerer, do you know some books or papers about:
- Numerical methods for variational calculus
- Nonlinear programming
- Numerical nonconvex optimization
- FEM optimization (reducing computer load, maybe dimension reduction)
All I can say: the bigger the book with answers, the better I feel, because I want a comprehensive, tanky book to finish most of the job at once. I don't know if this approach is right or not.
You just need to figure out how to utilize a microchip and the code references related to the voltage and expectation or what frequency band do you attempt to influence while utilizing the statistical data you are looking for or require for a specific outcome you are seeking.
This video is fantastic! Thank you for this advice. Would you recommend some books for those who are machine learning scientists? In my case, I am studying for a Ph.D. in Machine Learning (Deep Learning), and I have noticed gaps in my math when reading papers. I see everywhere that introductory algebra, calculus, and statistics are needed, but that is not how I see it. I would like your opinion, and it would be great if you could make a similar video recommending some books. Thanks
I like physical books and have a prodigious Kindle Library, but how do you feel about the subscription services like Packt, O'Reilly, or Scribd that offer lots of digital materials for study?
I've used both Python and R fairly extensively, and I recommend learning Python. Python is a general-purpose language, so you can use it for a lot of tasks, not just data mining. R, on the other hand, is a special-purpose language that you are unlikely to want to use for anything else. The programming skills you use for Python are readily transferable to other languages, while for R you'll have to learn some structures and syntax that are unlike other languages. Both languages have packages that provide essentially the same capability.
Guys, if you want to die quickly in your learning journey to become a data scientist, then waste your time on math books. Spending a long time learning math up front is a mistake and a waste of time. You should learn Python, then learn the fundamentals of data science and machine learning by writing Python code. You cannot understand data science and machine learning without writing code. After solving some problems in Python, you will understand what is needed for this field and you will know which statistics and math topics you need to learn or relearn. Don't waste your time on math before practicing data science and machine learning in Python; otherwise you will find yourself lost in a lot of math without any context. You will enjoy math in programming, and you don't need advanced math for most problems.
While I understand the value of algebra (as we encounter concepts like eigenvectors, a concept in matrix theory) and stats (for sure), what I really don't understand is the significance of calculus in data science. Can someone make a dedicated video on the applicability of differential and integral calculus in general?
Thanks!
What if I want to use data science for sports analytics, how should I go about that?
@aiiishiba it's sport specific, but I'd start with sabermetrics
Great advice thank you 🙏
Doing a theoretical math BS with a minor in comp sci and I'm heavily considering data science for grad school. The perfect video to watch!
A master's in data science or in statistics will help you a lot in finding a data scientist or machine learning engineer job.
I'm doing a BS in comp sci with a minor in math and I'm thinking the same thing.
@RyanOManchester lmaooo. One could argue those aren't too far apart
Python and R (Programming Languages) 0:43
Calculus 2:00
Linear Algebra 3:36
Statistics 6:56
Specialized Books 9:52
Before doing all these studies, pick a field where you would like to work as a data scientist. That simplifies a lot of things. For instance, you can only focus on learning specific statistical methods, learn about data in your chosen field, master one programming language + SQL, and learn about cloud computing basics.
Let's say that you would like to be a data scientist in finance. Then learn statistics relevant to finance, SAS, advanced Excel, and SQL.
Let's say that you like to be a data scientist in biotech. Then you need a biotech-related PhD + cloud computing + R or Python programming, SQL
Let's say that you like to be data scientist focused on Analytics (like most META hiring), you need to learn very basic statistics, SQL, knowledge about products by that particular company.
Once you find a job,
Keep learning, research on real-world problem solving.
@@slhermit I want to create engineering heuristics for energy systems to predict results. Is that doable through data science math?
Context:
I'm not an engineer but a budget officer. I want to be able to assert that a building has an excessive quantity of slender columns, unnecessary expense, and an 80% chance of not surviving a level 7 seismic event, without doing or reading horrendous calculations by opportunistic engineers.
Highly recommend Introduction to Statistical Learning by James, Witten and Hastie. It is a clear and thorough exposition of the bias variance tradeoff as well as a variety of common models.
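For anyone who wants to see the bias-variance tradeoff mentioned above in numbers, here is a minimal sketch using only the Python standard library. It is not taken from the book. The synthetic sine data, the noise level, and the helper names (`make_data`, `knn_predict`, `mse`) are all assumptions made for illustration. It uses k-nearest-neighbour regression because a small `k` gives a flexible, high-variance fit and a large `k` gives a rigid, high-bias fit.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_data(n):
    """Hypothetical data: noisy samples of sin(2*pi*x) on [0, 1]."""
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


train_x, train_y = make_data(60)
test_x, test_y = make_data(200)

# k=1 memorises the training set (low bias, high variance);
# k=60 predicts the global mean everywhere (high bias, low variance).
results = {}
for k in (1, 5, 60):
    results[k] = (
        mse(train_x, train_y, train_x, train_y, k),  # training error
        mse(train_x, train_y, test_x, test_y, k),    # held-out error
    )
    print(f"k={k:2d}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

The thing to look at is the gap between training and held-out error: it is largest for `k=1`, while `k=60` does badly on both.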
It's my first year in university studying data science, and I can tell you this is the best video explaining the "must knows" for a data scientist. I also appreciate all the book review videos, they're just amazing.
This is what I'm talking about! CS and Math major here. Love the merging between analysis, probability, and data science. My strongest opinion on the books there, since I've only read a couple, is that Gilbert Strang's Linear Algebra and Its Applications is amazing for a second course in linear algebra and is well suited to applied mathematics.
Thank you Sorcerer for your continued prodigious output of Mathematics information sources. You are a truly valuable resource for those of us trying to learn this stuff on our own. Your enthusiasm is infectious. Well done and Kudos.
@naydoorf Pretending? Yeah and he also makes money off the views for every video too so what? You're clearly lacking common sense. Making money off something equates to pretending nowadays apparently. You act as if he recommended Harry Potter books for a data science topic
I also wanted to recommend "Introduction to the New Statistics: Estimation, Open Science, and Beyond" by Prof. Geoff Cumming (2nd edition coming in 2023). Prof. Cumming really explains very well the predominant importance of confidence intervals and effect sizes as opposed to only null hypothesis significance testing. 😉
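To make the point about effect sizes and confidence intervals concrete, here is a small sketch in standard-library Python. It is not from Cumming's book. The two samples are made up, `effect_size_and_ci` is a hypothetical helper, and the interval uses a normal approximation (a t-based interval would be more appropriate for samples this small).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def effect_size_and_ci(a, b, level=0.95):
    """Return Cohen's d and an approximate CI for mean(a) - mean(b).

    Assumption: normal approximation for the interval; a t quantile
    would give a somewhat wider interval for small samples.
    """
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2

    # Cohen's d: mean difference in units of the pooled standard deviation.
    pooled_sd = sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    d = diff / pooled_sd

    # Standard error of the difference between two independent means.
    se = sqrt(var_a / na + var_b / nb)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d, (diff - z * se, diff + z * se)


# Made-up measurements, purely for illustration.
treatment = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.9, 6.1]
control = [4.8, 5.0, 5.2, 4.7, 5.1, 4.9, 5.3, 5.0]

d, (lo, hi) = effect_size_and_ci(treatment, control)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = ({lo:.2f}, {hi:.2f})")
```

Reporting the interval and `d` says how large the difference is and how precisely it is estimated, which a bare p-value does not.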
Doing a Data Science boot-camp to follow up a Cognitive Science Ph.D. This is a great resource. Thanks!
WOOOW amazing PhD bro
Data scientist is one of the careers I’m looking to become, but I’m also interested in becoming a mathematician or math professor. Thanks for the books!
Oh, so you’re okay. You got us going there. That’s kind of weird. I don’t know if I’m going to follow this channel anymore. Good luck with all that drama.
@@declanfarber what do you mean?
@@declanfarber same here
@@rusi6219 Some stuff got moved, edited or deleted. Pay no mind then, it is to talk into the aether.
It is amazing how many of the textbooks I used for my Mathematics Bsc(Hons) from the 1980s are still used.
I used both the Gilbert Strang Linear Algebra with Applications and the Seymour Lipschutz Schaum's Outline of Linear Algebra. For Probability and Statistics I used Introductory Probability and Statistical Applications by Paul L Meyer and Introduction to the Theory of Statistics by Alexander M Mood, Franklin A Graybill and Duane C Boes.
Although not my favourite at university, because I worked in the Banking industry afterwards Statistics proved to be one of the more useful subjects I learned at university.
As you mentioned references quite a few times in this video. In data science/engineering it is so damn important to be precise to the letter about what assumptions, tools, methods, and state of data you used when, and why. Clean work is always important in any science or engineering topic, but in applied work with data, it is so easy to be off significantly. And expressions that clearly show you that you are wrong like the lagrangian in mechanics are rare.
In Europe we use the Lothar Papula books quite a lot for math. I learned Python by extending the basics, tutorials, and examples from the documentation. Now I'm working with "Data-Driven Science and Engineering" by Brunton & Kutz as well as "Dynamic Data Analysis: Modeling Data with Differential Equations". It's quite tailored to control theory and systems engineering, so it might not be the best "Data Science" book, but it's great if you want to build robots, machines, etc.
My only addition would be a book on design patterns for software development. It helps particularly when you are going to be working in the same code for a long period of time or with a larger team of people. But the choice of book here is going to depend on the language you are working in. Otherwise great picks. The stats book by Mendenhall, Wackerly, and Scheaffer is also my first reference book.
As a Data Science undergrad I can say this is a fantastic and comprehensive overview of the material we study. Great stuff! Keep it up!
I'm a biologist starting a data science master's in January. It's been a bit stressful trying to relearn calculus and statistics - i didn't do well when I took them for the first time. Your videos calm me down and give me hope! I know I can do it, it just takes a bit of time and practice. Thanks for your videos! I'm very glad to have found your channel.
Wish you success, relearn might sound daunting for some but for me sounds interesting to rediscover Maths with a new pair of eyes after working as SE for several years.
@@jamesmccaul2945 Thanks for the encouragement! I appreciate it.
Hey I'm in the same boat, how is it going for you?
@@RyeCA Things are going well. I'm halfway through my degree now. Math is less scary, and i stop freezing when I see letters in my math. I'm taking a beefy stats class that requires calc 1,2, probability and stats soon and I can't believe I made it this far!
@@RyeCA p.s. to be honest, I am still slower compared to my peers who come from a statistics / computer science / math background, but I am managing fine in my projects and classes. I just take more time
Great video! The following information that I will provide should be taken with a grain of salt. If a person is a college student, one beneficial major that could lead to a future career in Data Science is Computer Science (CS) with a minor in Statistics. When comparing a Computer Science Degree with a Data Science Degree, the coursework is fairly similar. As an undergraduate, I prefer a Computer Science Degree as it emphasizes programming concepts that will help when applying for jobs as a majority of Data Science jobs require a Master's Degree for candidacy, while CS jobs mostly require a Bachelor's Degree. Additionally, a minor in Statistics would allow students to cover the statistical concepts mentioned in this video.
God bless you.
Man I used to watch your videos some years ago when I was doing my bachelor's degree in statistics! Your videos helped me more than the lectures from my teacher. To be honest, I completely forgot about the channel and now I'm learning about data science and your video popped up again! Thank you for all those efforts your videos are so clear and easy to understand. Love from Nepal
Programming: 0:41
Calculus: 1:59
Linear Algebra: 3:36
Statistics: 6:31
Hi Math Sorcerer, I'm using R for projects, and there's a book I'm using called R for Everyone by Jared Lander, in which he explains R from a beginner level. I'm happy that you've included a book on R, which plays a crucial role in data science.
Can we get 'Everything Computer Science' next? Thanks for all the amazing content!
Norm Matloff's book on R is excellent.
I use both R and Python. I'd say that for all things stats related, I use R. I tend to use python for a lot of data pipelining and nlp.
The statistics procedures in python tend to be problematic.
I don't believe in language wars though. I use C/C++/fortran within r and python to speed up stuff as needed as well. I also keep SAS guides handy. They are excellent for understanding procedures and have paper references. It's fallen out of favor though.
'Design and analysis of experiments' is good to have. Great job including it. Not many people understand that topic or the need for it.
I'd recommend knowing hierarchical modeling as well. Gelman and Hill's book is one I would highly recommend.
Last thing I would like to make clear is that you would be a good data scientist if you don't think with your tools/math first. Tools are tools. Your job is to solve problems. In most cases, the reason for your job is to enable the employer to make or save money. Many data scientists think their job is an extension of grad school. So, they want to use the latest and greatest algorithm they read about. Great minds. But highly ineffective, who end up wasting their and everyone else's time. Putting things into use in a running machine like a complex business, is hard in itself. The more complicated your solution, the longer it will take to make it useful, it will be expensive to maintain, will need constant supervision, and leave everyone exhausted and exasperated. This is not trivial. Data Science courses popping up produce unusable talent because it's taught by people who have never done any real work.
"She Blinded Me With Science" ~ Thomas Dolby
because we need to lighten-up at times.
Most data scientists can use python and pandas along with PyTorch to do the math needed. The key skill in data science is how to get the data and figure out what you can do with the data.
Thankful that you made this! Working on a Master's in Data Science, but to be honest, my stats background is pretty weak. I will DEFINITELY be ordering some of those books!
Hey, this is very helpful! I'm already a programmer (Python is one of the languages I'm more familiar with) and I was actually looking into how to learn statistics and data visualization, since I can't do much more than basic bar and line plots, and I'm not very strong when it comes to mathematics (surprisingly enough, generic programming doesn't require much more than pre-algebra, numeric systems such as binary, octal and hexadecimal, and boolean algebra). So it's good that you just made a video on books on how to get started! It does feel like an overwhelming amount of math, especially considering I still have to review a lot of pre-algebra because I haven't touched it since high school many years ago, but... I'll do my best!
This is me rn literally. This video is so helpful to me
Watching this while procrastinating on studying for my data science exam
So crazy because I was researching what I should learn in mathematics next! I’m an engineering major and I’ve taken Elementary Linear Algebra, Ordinary Differential Equations, and Multivariable Calculus. I’m curious about statistics because I will take a statistics class for engineering and a math methods class for engineering. I do not want to stop learning math because I love it so much. Should I continue to learn about statistics, or should I go down discrete mathematics, math proofs, real analysis, etc.? Maybe learn about PDEs or complex variables? It’s very confusing which one I should do or how a math subject is relevant to me and engineering. To be more specific, I’m a mechanical engineering major. I’ve been watching your channel since I went back to college and started taking college algebra! That was two years ago! Maybe a minor in math or statistics is in the works whenever I transfer to ASU, a four-year university. Thank you!
In order to learn more upper level math, you need analysis and math proofs and you should have a course on discrete math. But not the typical discrete math a CS major takes, that is usually a bit too elementary and easily learned through your intro to proofs class (at my college the proofs class counts for discrete math as well), so take a course in applied combinatorics or something similar.
Your course in statistics is probably great for engineers and such, but after one class you can't get far without a background in combinatorics and proofs. Even my first undergraduate introduction to probability theory assumed most students were taking or had taken analysis, and the prerequisites were applied combinatorics, proofs and multivariable calculus. So, needless to say, proofs and combinatorics are both really important. Additionally, any graduate-level probability and some statistics will require analysis all the way through measure theory (typically three semesters' worth of analysis).
For your goals though, I would say your statistics course will suffice for now as a mechanical engineer. I personally think that you would get a lot more out of both PDEs and Complex Analysis. Both of which are beautiful and extremely relevant for all mechanical engineering occupations. Additionally, you should consider numerical analysis but it may be too much for your degree. Consider it though, it's really interesting and just as relevant as PDEs to mechanical engineers.
@@thefourthbrotherkaramazov245 Wow! First thank you for the information. That makes sense that the proofs class and real analysis are the key to the upper division levels because I have seen real analysis is a prerequisite for many upper level math. Arizona State also takes the math proof writing course in place of discrete mathematics. I did not know that combinatorics were that significant. I am trying to pursue a math minor and I was looking into which other courses besides Linear algebra and statistics I should consider. So again thank you so much. This is great knowledge for anyone else wondering the same.
I love that in my mechanical engineering degree I can use 3 classes in upper level mathematics. I think I will take the math proofs course and real analysis for the first two, and I think I will look more into numerical analysis because there are two numerical analysis courses available for me as a mechanical engineering major. I’m a little more familiar with PDEs and complex analysis, but I think I should look into numerical analysis before I decide what to use for the third class, considering how important that is too.
I should note that I am using my electives for 3 math courses and I would take the extra math courses for a math minor. There is a possibility I could double major but I would rather talk to an advisor. I am aware that many people end up with the extra degree with the careful planning of the two degrees.
@@tmann986 That sounds great! As a heads up, real analysis is useful, but more so just for understanding higher maths. It is pretty useless by itself. What I mean is, if you take 3 upper math classes and leave it at that, it is way less likely that you will use it versus a course in PDEs. Now, if you are just curious and you want to learn it, go for it! I'm also not saying you won't self-study in the future or go to grad school. But some math classes, as interesting as they are, are not so fruitful for most occupations outside of math research. You should hear someone else's input, but I think it should be noted. Glad I am of help! Good luck on your degree.
@@thefourthbrotherkaramazov245 haha that makes sense! Real analysis is really out there on the pure mathematics side. I watched a blackpenredpen video of him going over an epsilon-delta proof and I just loved it! Again, I can go my whole life knowing the calculus series without the epsilon-delta proof, but I think it's cool nonetheless! My calc 3 professor didn’t think it was necessary to go over the epsilon-delta proof, so I went ahead and watched TheMathSorcerer's videos on the proofs and did the problems with him, and it was really fun! It was really funny to learn these techniques to solve ODEs just to be told later that we have approximation methods, since most real applications have more variables (PDEs) and need computing power to approximate an answer haha. Your input was gold btw!
@@thefourthbrotherkaramazov245 What's combinatorics?
Data science seems interesting to learn. May you continue to receive more blessings along the way.
A very much needed video, I'm currently a freshmen in Applied Mathematics with my interests lying in AI/ML/DL.
That Schaum's Outline to Linear Algebra moved me from a B- to an A in my first Linear Algebra course. It was like turning on a lightbulb.
Thank you, teacher. Next week is my birthday, I'm currently studying Data Science at university, and this video is a great source of inspiration!
Must have read my mind! Just started a new job in programming and trying to skill up a bit more while finishing my onboarding tasks. :D
One of the most informative videos on YouTube. Thank you so much for creating it. Appreciate your efforts.
It's a good idea showing materials to get prepared to data science and machine learning. In my opinion this is a new step in math evolution.
I loved math at school and taught myself a bit more as a medical student. I would like to learn enough to understand modern physics including quantum theory and general relativity.
This is one of the few channels I watch which does not have irritating music in the background. I hope you are not considering it?
Here are my thoughts on it. I would do it in this order:
Probability
Mathematical series and convergence, numerical methods for analysis
Matrix and linear algebra
Bayesian statistics
Vectors
Calculus
Markov processes and chains
Optimization (linear and quadratic)
Advanced matrix algebra and calculus (gradients, divergence, curls, etc.)
I actually want to be a machine learning engineer and great at mathematics. This video is the best, thank you.
Another fantastic presentation by The Math Sorcerer --- thanks for the exposure to all these nice books! Have a great Thanksgiving!
Good selections. I would also recommend one on Bayesian inference (ET Jaynes Probability Theory is good),and graphic display of information (Tufte's books). There are also several good books with code outlines for basic Machine Learning / AI algorithms.
Wow! Didn't know you have actually video about data science which I'm studying right now. You're cool!
I love "Mathematical Statistics with Applications", but I'd suggest a more rigorous book like "Introduction to Probability" for the first half. Blitzstein's STAT 110 lectures from Harvard (which the book is based on) are on YouTube. It's more digestible than "Statistical Inference" for self-study, but covers the first half of the material, and then some, very well.
Going to have to say that Python is the better language to take up. Used both R and Python in a statistical machine learning course and it was extremely difficult to understand the R examples, while it was very easy to understand the Python examples.
Python is definitely the more understandable and flexible language. R is falling out of fashion, though it is computationally more powerful.
R, python, julia, etc., they all have strengths and weaknesses depending on your use case so I think you gotta give terms and conditions when recommending languages lest you lead people astray. R has an enormous advantage in biomedical packages thanks to Bioconductor. It also leads in niche economic analysis where a lot of the academics use R.
Python is more prevalent in industry and is a fabulous general scripting language.
Nice list. I thought Clifford Algebra and geometric calculus were making inroads in data science. You can do a lot of what vector calculus does in conceptually much easier ways. I know they have made some waves in the 3D graphics community. For linear, Strang is really good for anyone who struggles with getting a grip. His alternate approach can help a lot.
Thank you so much Math Sorcerer you are a life saver !
Very interesting how you introduce linear algebra after the calculus books. I'd have thought you would want to order them so that people making a list of what to look out for don't mess themselves up studying something they don't have the prerequisite knowledge for.
Nice review! Smell is important, I'll take it into consideration. Thank you!
Man, awesome content, keep it up, Sir!
Much appreciated!
I LOVE THIS VIDEO!! Just what I needed.
Please make an updated video that keeps in mind all aspects, such as GenAI, NLP, CV, DL, ML, Python, statistics, mathematics, and guesstimates.
In your section on Linear Algebra you say that it is "sometimes used"; I work as a computational scientist and do a lot of data science in my job, and everything is linear algebra. For example, SVD (singular value decomposition) is used for principal component analysis and is also used for least squares estimation. The more linear algebra you know, the better. :D
The regression suggestions tie in to the importance of time series methods. I've only used "Nonlinear Time Series Analysis" by Kantz and Schreiber, but I've seen people recommend "Time Series Analysis" by Hamilton.
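As a rough illustration of the SVD comment above, here is a minimal NumPy sketch showing both uses on made-up data. The random data, the mixing matrix, and the variable names are assumptions for the example only. The results are cross-checked against `np.cov` and `np.linalg.lstsq` rather than trusted on their own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples of 3 correlated features.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

# --- PCA via SVD of the centered data matrix ---
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                        # rows are the principal directions
explained_var = s**2 / (len(X) - 1)    # variance along each direction

# Cross-check: these should equal the eigenvalues of the sample covariance.
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# --- Least squares via SVD: minimise ||A w - y|| ---
A = np.column_stack([np.ones(len(X)), X[:, :2]])   # intercept + two features
true_w = np.array([1.0, -2.0, 0.5])                # assumed coefficients
y = A @ true_w + 0.01 * rng.normal(size=len(X))

Ua, sa, Vta = np.linalg.svd(A, full_matrices=False)
w_svd = Vta.T @ ((Ua.T @ y) / sa)      # pseudoinverse applied to y

# Cross-check against NumPy's own least squares solver.
w_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

print("explained variance:", explained_var)
print("least squares weights:", w_svd)
```

Dividing by `sa` assumes `A` has full column rank; with near-zero singular values you would truncate them instead, which is what a pseudoinverse with a cutoff does.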
As an aside, what is your opinion on Axler's Linear Algebra Done Right? The new fourth ed will be made available for free online so it may be a useful recommendation as well.
College-level textbooks usually have monopoly-dictated (aka "eye-watering") prices. Fortunately, they get gratuitously revised to maintain sales (making each class buy new books, not last year's), so second-hand versions should be available inexpensively.
Learning to program is only minimally about the languages (each of which has its special niche; learn to identify and pick the right tool for the job, though any language can be contorted to solve a given problem). Start by learning to write spreadsheets, which are programs in disguise.
There are a lot of aspects of programming that often get overlooked, like designing test data, using version control, taking advantage of existing tools to simplify programs, &c. Remember that the greatest productivity tool in programming is plagiarism. DRY: Don't Repeat Yourself, and don't repeat anyone else's work if you can use it. The world is awash with bad, redundant code, painfully cranked out by grad students at huge opportunity cost.
Brilliant post
@@julianpenfold1638 Blush, hide. :-)*
I used the statistics book by Wackerly, Mendenhall.. for my statistical mechanics course, they were indeed very helpful
I like your pfp, quite a unique choice of putting plain blank space as pfp.
I suggest adding optimization and high-dimension data analysis to the statistics stack.
I'm a big fan of old books but some of the stat books like Nonparametrics (Hollander/Wolfe) are greatly revised with newer editions with tons of computer code to work along with the exercises and see how to apply some of the methods.
I started with Python which is what initially made me fall in love with programming. I really like R too though.
I study mathematics and a little interested in data science but have never learned it because it requires programming skills and knowledge beyond that of mathematics I’ve learned so far. Though I’ve already learned fundamental statistics and probability theory using measure theory, they seem to be not enough
I was told that a deep understanding of mathematics isn't necessary anymore, because computer programs can do all of that work for us. But I never believed that.
My main interest is social science research, and as far as I can tell, there's still a big replication crisis in the field. I'm pretty sure that a deeper understanding of mathematical statistics and related areas of math will be a vital part of the solution.
One of my colleagues' wives is a Social Science lecturer at a university. She said that one of the main problems with the students they get is that they are unable to do Maths.
@@Anonymous-qw I've always felt that was the main problem in social science in general. Too many people go into the field with poor math skills, and they usually don't bother to improve.
@@surrealistidealist they go into that field because they're not forced to do math by it. It's not a they problem, it's the field itself that's the problem.
I'd like to suggest Statistical Distributions by Hastings and Peacock.
Thanks @TMS, so much to learn so little time.
These book reviews are awesome man. Thankyou for that 😀👌🏽👌🏽👍🏽
Thanks for your comprehensive explanation❤❤
Excellent video. This has been one of your best. Please do a video like this one about mathematical modelling.
Great suggestion!
Please do separate book reviews on every single book you showed in this video.
I've heard nothing good about R. What I've read was "21st Century C", whose author argued that he was constantly writing C routines to speed up his R programs until he decided to ditch R altogether and just use C.
In a similar vein, I just can't stand Python. I thought we gave up positional syntax with COBOL. I much prefer Perl, the original programming language for data analysis. And overall, I really prefer C. Sure, you can get some good libraries for Python, but there are even more and more solid ones for C.
thank you
Thank you very much sir. Useful video.
You should make a video like this for computer science!
Intro to statistical learning in R
thank you for your efforts
Thank you so much for this video!!
A Whiff never misses a Math Sorcerer's book review 😂😂😂😂😂
Here in brazil we have used stewart for the past 15 years as well
awesome plethora of books.....hey buddy...btw....I just passed the clep college algebra exam....thx a mil!
That is awesome!
@@TheMathSorcerer you were in my ear the whole time.....lol!
@@TheMathSorcerer do you have anything for Bayesian statistics? I’m in a Georgia tech machine learning program - can reach you over email for questions I have for you.
Dude! Holy Smokes! I'm going to need another room to build a second library!
I wish I could double or triple like this video. Thank you sir.
Lord Sorcerer, do you know some books or paper about:
-Numerical Method for Variational calculus.
-Nonlinear programing.
-Numerical nonconvex optimization.
-FEM optimization ( reduce computer load, maybe Dim. reduction)
Great selection of books!!!
Thank you!
All I can say: the bigger the book with answers, the better I feel, because I want a comprehensive, tanky book to finish most of the job at once. And I don't know if this approach is right or not.
You just need to figure out how to utilize a microchip and the code references related to the voltage and expectation or what frequency band do you attempt to influence while utilizing the statistical data you are looking for or require for a specific outcome you are seeking.
This video is fantastic! Thank you for this advice.
Would you recommend some books for those who are machine learning scientist? In my case, I am studying for a Ph.D. in Machine Learning (Deep Learning), and I have noted some lack of math when I am reading papers.
I see everywhere that introductory algebra, calculus, and statistics are needed. But that is not how I see it.
I would like your opinion and if you could make a similar video recommending some books. Thanks
Wow, nice list. I've been looking into Computational Mathematics(Physics). Those requirements are a bit different
Real-time investing required me to calculate accurately when to cut-loss, Take-profit, break-even. I worked so hard and smartly.
wow this is epic, good work MS :))
Thank you!! As always :)
Great, thanks for suggesting nice books. 😊
Yep, checked my bookshelf and the Calculus book is by James Stewart. Interesting that it's mostly the same everywhere.
Ultimately, calculus serves as a big foundation and prerequisite. As for all science.
Nice! Do Bioinformatics next
Video Suggestion:
Everything for Theoretical Physicist (All of Physics)
I have read "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse".
Thank you so much
I like physical books and have a prodigious Kindle Library, but how do you feel about the subscription services like Packt, O'Reilly, or Scribd that offer lots of digital materials for study?
I've used both Python and R fairly extensively, and I recommend learning Python. Python is a general-purpose language, so you can use it for a lot of tasks, not just data mining. R, on the other hand, is a special-purpose language that you are unlikely to want to use for anything else. The programming skills you use for Python are readily transferable to other languages, while for R you'll have to learn some structures and syntax that are unlike other languages. Both languages have packages that provide essentially the same capability.
Guys if you want to die quickly in your learning journey to become a data scientist then waste your time in math books. Learning math and spending long time in it is something wrong and waste of time. You should learn Python then learn fundamentals of data science and machine learning by writing python code. You cannot understand data science and machine learning without writing code. After solving some problems in python, you will understand what is needed for this field and you will know which statistics and math topics you need to learn or relearn. Don't waste your time in math before practicing data science and machine learning in Python otherwise you will find yourself lost in a lot of math without any context. You will enjoy math in programming and you don't need advanced math for most of problems.
While I understand the value of Algebra ( as we encounter some concepts like eigenvector , a concept in Matrix theory) , Stats ( for sure) what I really don’t understand is the significance of calculus in data science. Can someone make an exclusive video on this the applicability of differential and integral calculus in general?
I am from India
Thank you 😊
You're our Math Godfather