Very good. I mostly use R, but when you combine R with Python and SQL, you become an unstoppable data programming machine. So learn both; it's a lot of fun.
@@DailyMental Wow, good question. I learned SQL, then Python and R. My opinion is that R requires less code than Python to do something. Also, the amazing R tidyverse package makes it so much easier to code and to work with data. Also, R mostly just works, whereas with Python you do get a few more issues with package versions. That said, SQL is also a great place to start because the code is easy to read and understand. Keep in mind that a lot of companies store data in databases, so it's always a bonus if you can use SQL to extract data from the database and then analyze it in R or Python. This is just my opinion. Good luck
@@jacobusstrydom7017 Thank you for the advice! I was thinking this as well. SQL is my first step and then R, since I'm from a business background and it's probably better to have a solid foundation before learning more complex syntax.
Tableau is expensive but has great features and UX; Power BI is cheaper, even free, but its UX is not as good. Either app will do your data viz job eventually.
I think R is better, as pandas is much slower and less easy to use than dplyr. Data prep takes up most of my time, so this is huge. R and Python are roughly equivalent to me for machine learning. A lot of these ML packages are just R and Python wrappers around C code. Maybe if you work for a FAANG company and do a lot of pure deep learning, Python may be better, but I think that situation is rather rare.
It truly depends: first on personal preference, and also on what your work, that is, your company, requires you to use. I prefer Python, and I think Python will grow to offer the same number of features as R (if not more) in the future.
You definitely cherry-picked. To get the mean of a column you don't need to load any packages; the base summary function will give you that. summary(data) will give you the mean, median, Q1, Q3, min and max of every numeric column, not to mention the counts of qualitative columns.
When I subscribed to this channel two weeks ago, I did it because I wanted to be ready for my data analyst interview. I did very well in it, and I think this channel helped, at least when it came to learning more about the job and the differences between a data analyst and a data scientist. I will start on the first day of March and I am looking forward to it. I am studying for a master's in Big Data at the same time and am learning R there, whereas I need to learn Python for work. R doesn't look difficult to me, but Python looks more familiar to me and to those with a background in other general-purpose programming languages. I agree about the huge number of libraries in R, and I think it is really great for visualization. However, since Python is becoming the most popular programming language, I would prefer it for that reason alone, not counting anything else.
Filtering with pandas, df[df['column'] == x], vs. R data.table, df[column == x, ]: which is easier to read? It becomes even worse in pandas when you have more complex conditions, not to mention pandas multi-indexes. They are hell. And as a direct comparison, reading a CSV in R can also be done in 2 lines: x
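For readers comparing the two filtering styles: in pandas, a compound filter needs parentheses around each comparison because `&` binds more tightly than `==`, e.g. `df[(df["team"] == "X") & (df["points"] > 15)]`, or it can be written as a string with `df.query('team == "X" and points > 15')`. Without pandas at all, plain Python expresses the same row filter as a list comprehension, which reads closer to the data.table version. A minimal sketch, with made-up rows and hypothetical column names:

```python
# Hypothetical rows standing in for a small data frame.
rows = [
    {"player": "A", "team": "X", "points": 10},
    {"player": "B", "team": "Y", "points": 20},
    {"player": "C", "team": "X", "points": 30},
    {"player": "D", "team": "X", "points": 25},
]

# Compound condition: team X and more than 15 points.
high_scoring_x = [r for r in rows if r["team"] == "X" and r["points"] > 15]

print([r["player"] for r in high_scoring_x])  # ['C', 'D']
```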
Honestly, I like both. Since I come from an SPSS and statistics background, R suits me better. But when I need to analyze missing values or do some graphics, Python helps me a lot more.
@@ankicanozinic6551 If you have never touched a database or programming before, SPSS may be easier to learn, since it somewhat resembles Excel and similar tools. Also, you can click the buttons SPSS offers and the software gives you the programming script your clicks generate. Disadvantage: SPSS requires a paid license, but it has a trial version to test. If you're committed to learning programming basics alongside statistics (and have enough time to study), R is the way to go. The answer is: it depends. I always tell my students to go step by step. Well, hopefully this will be useful to you.
Thank you! I'll try both. In about a month I'll have a Coursera course on R (from the Google Data Analyst Certification), but after that I'm interested in trying Python as well.
Stopped watching at 5:55 because either Alex is biased or he has no idea whatsoever about R, since he did not use the mean() function, which is a base R function, so you don't need to install or load any package to use it.
I mostly do data analysis on survey data, and in my experience R is more robust in this regard. For instance, there are several packages that will create survey weights for you, but I have yet to find one Python package that actually works. I do agree that Python syntax is somewhat easier to pick up, but once you understand vectorized operations in R, it becomes easier to use.
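The simplest kind of survey weight is post-stratification: each respondent in group g gets weight (population share of g) / (sample share of g), so over-represented groups count for less. A minimal sketch, assuming the population shares are known and every group appears in the sample; the group labels, shares and answers are made up, and dedicated survey packages also handle raking, trimming and design-based standard errors, which this does not:

```python
from collections import Counter


def poststrat_weights(groups, population_shares):
    """Weight = population share of the group / sample share of the group."""
    n = len(groups)
    counts = Counter(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: 3 young respondents, 1 old; population assumed 50/50.
groups = ["young", "young", "young", "old"]
shares = {"young": 0.5, "old": 0.5}
answers = [1.0, 1.0, 0.0, 0.0]  # e.g. 1 = "yes", 0 = "no"

weights = poststrat_weights(groups, shares)
print(weights)                          # young ~0.667 each, old 2.0
print(weighted_mean(answers, weights))  # ~0.333, vs. an unweighted 0.5
```

Because the shares sum to 1, the weights sum to the sample size, so the weighted total still "looks like" n respondents.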
From my understanding, if you're familiar with SAS, R would probably be easier to pick up than Python. I personally started with C#/C++, so Python was easier for me to pick up. Also, perfect troll post on LinkedIn: just say something controversial and walk away lol 👍.
R and Python require totally different mindsets. Picking the better one of them is like asking "which is a better career, statistician or engineer?". With my mathematical background I find R code much more straightforward, and when I started to use Python, it wasn't that any single piece of code was unreadable, but the entire workflow was unfamiliar: how tasks are broken down, why a copy is made here, and so on. It also took me quite some time to be convinced that Python does not have a library for the Spatio-Temporal Autoregression model (for a few hours I thought I just hadn't searched the right way), since R offers abundant solutions for spatio-temporal data. Eventually I realized that modelling is never at the top of engineers' priority list, and mathematicians/statisticians can focus on the intellectual work only because engineers have got their hands dirty. Also, Python makes it easier to collaborate with other platforms. If I were to communicate with laymen rather than other professionals in my company, deploying a dashboard or web app would be the best explanation. Again, it didn't occur to me because this hadn't been my top priority, so I preferred R. Now the job has changed and I'm using Python more often.
If there is no library for spatio-temporal autoregression, you can build it from scratch in Python, since it is an object-oriented programming language, which is not possible in R.
The YouTube algorithm brought me here lol. I think an example of data analysis using popular libraries in both, for comparison, would be good: things like the processing time, the amount of code written, etc. For me Python is easier, since in college we used C++ and Fortran for learning the basics of algorithms and numerical methods. The one time I had to use SPSS for statistics assignments, I got really confused.
True. For me, it broadens your way of thinking about programming in general, since every language approaches the subject in a unique way with a unique motivation. It makes you very appreciative of the strengths and weaknesses of every language.
I am an economist trying to dedicate myself to data analysis, and I still didn't understand the pros and cons of both, so this video is exactly what I needed. Thank you! 😄
My whole knowledge of programming is in C++; I even did the calculations for my M.Sc. in Statistics and Operations Research in C++. Now I'm not sure which to begin with, Python or R. Most of my work revolves around numerical analysis.
Great video. But I think you could make the R syntax example much easier. If you want to know the mean of a variable or of your data, you only type mean(data) or mean(data$variable), depending, of course, on whether the variable or data is numeric. Thanks for your video. Regards.
I've personally enjoyed the SQL, Excel, R, Power BI group I've got set up. The only thing I really planned for was learning Power BI, but the rest came about oddly naturally. Great video, by the way!
@@praveen26699 I found it particularly easy. I'd picked up most of it on my own after reading "Learn R" by Aphalo. I'd taken some DataCamp courses and other paid courses by people like Matt Dancho that provided spot-on business applications for it. I also read "Advanced R" by Wickham, and with all of that, R is the main power tool in my tool chest. R is a lot like Excel and SQL; Python is a lot more like other programming languages. All of the above are interchangeable, and as long as you can learn how to provide business value, you are golden.
Excellent summary, great balance of conciseness and examples. "R is harder to learn, but has more features"... specifically for analytics, right? My understanding is Python has far more features in general. Never heard of someone building a mobile app in R.
There must be a reason behind the growing number of R packages. My clients don't care whether I produce results in R or Python. If they ask me to build an app, then I'll reconsider.
I know absolutely nothing about Python, but your example at ~6:30 is a major giveaway that you are not experienced enough with R programming to form a reliable comparison: the example could be done in base R with two lines of simple code. I've never seen such an overcomplicated way to find the mean as the one you described.
In my opinion, the biggest advantages of R are its IDE, RStudio, and the ability to execute only the mouse-selected portion of code (no, notebooks are not as convenient). Web deployment is possible through Shiny, but it seems like much more of a hassle than in Python.
Just calling attention to librarian::shelf(tidyverse). You don't need to write 10 lines of library(dplyr); you can write all your package names in a single line of code, and it will automatically install them if needed and load them.
Loved it! Thank you very much for your content; I just started following you. My advice: just express your opinion like you did, it makes the content far more unique. Cheers!
Use the language your team uses. The guess work can be taken out based on the company you work for or the company you want to work for in the future. If they use R, use R. If they use Python, use Python. If it’s only up to you, flip a coin.
It's insane how many times you had to ask for forgiveness in advance to avoid potentially offending anyone. We are all different and have different opinions. Get over it, people! Very cool video, mate. As a statistician I'm very much in love with R, but I'm trying to learn Python as I am very aware of its coverage and power. Cheers
I actually gave up on R as I moved to a more strategic role and away from hardcore data analysis; I found it harder and harder just to recall syntax across different libraries. Plus, I see the industry tilting more and more towards Python, and learning Python is kind of "future-proofing" your time spent on it.
If all you want to do is read a CSV file and see the mean, you could use RStudio and not program anything in either R or Python. Use the right tool for the right job.
Underrated skill that's complementary to these is Excel PowerQuery... Poor man's PowerBI and the only thing that makes Microsoft's Office suite irreplaceable by even the best of clones.
I would like to see a second video comparing R and Python from a business standpoint, rather than on their general-purpose programming capabilities for building applications and their heavy use of sophisticated statistics that does not apply to the average business. For instance, many of us in business are hoping to learn which language is better for business analysis, which, after all, is the trend in using either of these languages. What we learn from the video is that R is highlighted as useful for purely statistical analysis, while the comparison does not provide any insight into Python's capabilities for statistical analysis. R is highlighted as great for statistical analysis; however, advanced statistics is used mostly by the scientific and academic community and by sophisticated business environments, and most of it is not needed in the general business world. I would like to see the view of a business analyst/business intelligence professional who has truly used both R and Python for exactly the same purpose: business analysis. It would be great to move away from general-purpose programming and application development and get more into the business uses of each language, and into what statistical and data analysis truly serves the vast majority of business users, business intelligence professionals, and data analysts working with business data. Looking forward to this second video. Thanks Alex!
For business analysis/business intelligence, both languages are equal, and the choice is more about preference than advantages. Usually there are no performance issues, and you do not need sophisticated models and packages. You can build your own functions and tailor them to your field in both languages. Maybe visualisation is better with ggplot in R (it is more versatile). But if you want to build proper self-service BI, it is better to go with classical BI tools like Tableau, Power BI, etc. R and Python are for hardcore analysts searching for deep insights, and BI tools are for managers. I am actually bilingual in R and Python and work both ways.
I learned R and Python, and I can say R is much easier to learn but Python is way more robust. I replaced VBA code that creates MS Excel workbooks from a template: the Python version took about 3 seconds to complete, while the R version took about 45 seconds. After I saw the benefits and speed of Python, I put R aside and focused on Python.
I think the best thing about R is RMarkdown. Being able to hit one button, run my statistical analysis, and output a word document with all the right numbers and figures in it is amazing for reproducible reporting. I'm switching to Python soon. Do you have any recommendations for a similar functionality?
Python is a language designed by computer scientists; R is one designed by statisticians. Python uses = for assignment, while R uses <- as its assignment symbol. Python's functions are more flexible than R's. Deep learning packages are written for Python, but not R. So R is a statistics-like language, and Python is a data science language.
Hard to go wrong either way. If your job leans more towards data engineering and ETL then probably Python is a good choice to start with first, IMO. Thanks Alex!
I am trying to find the best way to build a sports betting model using past statistics to project the outcome of future games/events/etc. I have messed around a little bit on Microsoft Excel doing this but I was just curious if anyone has a suggestion for which program would be the best for my needs between Excel/Python/R. Thank you for the help!
The Python example seems a bit cherry-picked to show that Python has one function that applies mean to all the columns. Also, I'd like to know why R is more difficult to maintain. I think the pros for R should include R Markdown, better visualisation libraries, and intuitive piping. The Python pros should include that it's an actual programming language.
I class myself as an R guy, but I do have one problem with R as far as maintainability is concerned: I can't count on it always producing the same answers from the same code. Even on the same version of R, changes in its many packages tend to change my results all the time. And I get slightly different results on different computers. I have spent a lot of time in the last few years trying to overcome this problem but I'm considering sucking it up and porting a lot of my production code to python.
@@mycrushisachicken R and Python differ in their third-party package systems. Python has many packages, but you really only use 2 or 3 of them for data science. They are each large and aren't updated all that often; they seem to have lots of eyes on them and a reluctance to change rapidly. R has many thousands of packages, all of which do data science or statistics/econometrics. And they are all created/maintained/updated by their original authors, so the updates go to CRAN (R's central repository of packages) right away whenever the authors feel like it. You end up using lots more packages in R, and the packages are all written by different people, who may not be overly concerned about the effects of updates on end users. So it's a lot easier to get two systems out of sync in R than in Python. There may also be numerical reasons why you get different answers on one computer versus another in R. I have, many times, had the experience of optimizers getting a slightly different answer on Intel versus AMD, despite all my efforts to standardize, in R. I'm not saying this can't happen in Python or MATLAB or whatever, but I haven't seen it as much.
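On the numerical side, one common source of "slightly different answers" is that floating-point addition is not associative, so anything that changes the order of operations (a different BLAS, thread count, or compiled code path) can change the last digits. This illustrates that general mechanism only; it is not a diagnosis of the Intel/AMD case above. The explicit loop is used because recent Python versions make the built-in sum() compensate for rounding on floats:

```python
import math


def naive_sum(values):
    """Left-to-right accumulation with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


tenths = [0.1] * 10
print(naive_sum(tenths))  # 0.9999999999999999
print(math.fsum(tenths))  # 1.0 (exactly rounded summation)

# Same three numbers, grouped differently, give different last digits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False
```

Tiny differences like these are usually harmless, but an optimizer that branches on them can end up at a visibly different answer.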
Well, if you ask CS students what programming language should be learned first, 99% will tell you Python. CS students just love Python so much that they could have a sexy dance with it if it were a girl. If anyone tells you to learn C first, you know you've found your true love.
Hey Alex do you mind doing a video on the impact of automation on the future of the data analyst career? It would be really helpful to those who are on the fence about starting/changing their careers.
If you are a non-programmer, R is easier to learn compared to Python, where you have to begin with OOP, environments, etc., which are way harder to grasp. And R's built-in data types support data frames and vector calculations, making it easier to reason about. And I have deployed dozens of projects into production, including web apps, so almost all of your arguments are biased toward what you use.
Alex, it's been 4 days since I got into the data analyst game. You are my go-to guy, and the way you started this business story is real. I am following your classes because they are simple and easy to read; the classes I'm taking are LOOOONG and complicate things. Please let me know if you have Instagram, it's easy to communicate there. Please Explain to your wife why you could to Instagram account.
For me, it's clear. Python is better for most people, but if one has a strong math and stats background, R is probably the best. It's so much easier to collect data, clean it, and put everything to work in Python. But R is just THE WAY TO GO for statistical analysis. You get so much stuff out of the box. So many statistics, it is amazing. TL;DR: learn both, R for statistical and ML modeling, Python for everything else. If learning both is not an option, probably go the Python route.
Thank you for this video, Alex. I have always been intrigued by data analysis and want to learn more of the programming side. I was not familiar with either of these languages, but your breakdown of the two helps. If I wanted to learn both of them, would it be easier to learn Python first and then R, or should I learn R first and then Python?
I used to give a lecture on when to use R and when to use Python. I gave it for many years, but every year the two languages grew closer together. I eventually stopped giving the lecture, because they're now so similar that it doesn't benefit students to talk about it anymore. The only difference left, in my opinion, is how your brain thinks about problems. If you think about and solve problems with a programmer mindset, Python will be easier for your brain to wrap around. If you come from SAS, MPlus, or SPSS, R might be easier for your brain to wrap around. Much like picking skis or a snowboard, try them both and go with the one that feels right for the way you work.
No. F*ck R. It needs to die and become a bad memory for the human species.
@@jhernandez9617 Why?
@@jhernandez9617 haha ,don't F**K R, R👍👍👍👍💖💖❤
This comment saved me 11 minutes, thanks!
I know I'm asking randomly, but does anyone know a trick to get back into an Instagram account?
I somehow lost my account password. I appreciate any tips you can give me.
As a noob with only an Excel background, I got into R much more easily. One huge advantage of R, imo, is RStudio. Such a great tool to work with. Also, in R the documentation is helpful; even the error messages are useful. I'm starting with Python, but for me it's not as sticky and intuitive. I find Spyder OK as an IDE, but imo it's way behind RStudio.
Agreed on RStudio being a huge advantage.
Agree on RStudio. it is really helpful!
Try using VS Code for Python
Jupyter Notebook or Atom are great
I also have pretty much only an Excel background, but I picked up Python more easily… it's really hard to understand the R language, though I have to learn it anyway.
Python is unquestionably more straightforward as a language in general. However, it's fundamentally a general-purpose scalar language, not a vector-data language like R or a matrix language like matlab. That fact makes the type of data manipulation and analysis that is meat and potatoes in R less convenient in Python. "Hello world" is easier in Python, but real data analysis is easier in R. I use Python for general programming, but it's just not worth the trouble to force Python to pretend to be R for data analysis, econometrics, or statistics. Python is way, way behind in all forms of data analysis. For example, Python is only now considering basic ideas like "missing" values being different from "not a number" values, which the creators of R thought of and planned for from day 1. If I had to pick one language as "better," I might choose Python, but it's not better for data analysis, which is what's being discussed here.
Well said!
agree, I prefer R in terms of statistical analyses.
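To make the "missing" vs. "not a number" distinction concrete: in R, NA marks a value that was never recorded, while NaN is the result of an undefined computation such as 0/0 (is.na(NaN) is TRUE, but is.nan(NA) is FALSE). Plain Python has no built-in NA, so a common convention, assumed here, is None for missing and float("nan") for not-a-number. A minimal sketch that keeps the two apart:

```python
import math


def classify(value):
    """Return 'missing' for None, 'nan' for a float NaN, else 'ok'."""
    if value is None:
        return "missing"
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    return "ok"


readings = [1.5, None, math.inf - math.inf, 2.5]  # inf - inf yields NaN
labels = [classify(v) for v in readings]
print(labels)  # ['ok', 'missing', 'nan', 'ok']

# NaN is not equal to itself, so == checks cannot find it; use math.isnan().
nan = float("nan")
print(nan == nan)  # False

usable = [v for v in readings if classify(v) == "ok"]
print(sum(usable) / len(usable))  # 2.0
```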
The syntax example for R is way more complicated than it needs to be. You technically don’t even need to load any packages to read in a CSV and calculate the mean.
x
As a non-programmer who uses programming for work, I find R's syntax more intuitive. Programmers may think in a different way.
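The "no packages needed" point holds on the Python side too: the standard library alone (csv and statistics) can read a CSV and take a column mean, with no pandas required. A minimal sketch, with a small made-up CSV held in a string so it runs as-is; the column names and the nba.csv filename in the comment are hypothetical:

```python
import csv
import io
import statistics

# Hypothetical data; with a real file, use open("nba.csv", newline="") instead.
SAMPLE_CSV = """player,team,points
A,X,10
B,Y,20
C,X,30
"""


def column_mean(csv_text, column):
    """Mean of one numeric column, using only the standard library."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return statistics.mean(float(row[column]) for row in reader)


print(column_mean(SAMPLE_CSV, "points"))  # 20.0
```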
For a new programmer, I'd say learn Python.
It's much, much easier to get a job with Python; you're in the general software engineer camp vs. being locked into data scientist roles.
I came to R from using C, visual engineering environment (an instrument control language used in metrology), SAS & SQL. Nowadays I make my living with R, automating reporting, text mining, and developing data manipulation tools for an intelligence team. It has to be said that in my industry, I haven't yet come across a Python user. It might just be that the big players in town all have either an R or a SAS background.
Hi James, I'm an R enthusiast and in need of a mentor
Can we connect?
5:56 you can use colMeans(nba[sapply(nba, is.numeric)]) for calculating means of the numeric columns, you don't even have to import any libraries. I understand the python way is still cleaner, however, there are tons of situations where the other way around is true.
7:09 library(tidyverse) and you get all the functionality that Python's pandas can offer; you don't have to remember a lot of things to do a simple task.
Even easier: summary(data) will give the mean, median, Q1, Q3, min, and max of the data with no packages to load, and it's cleaner than Python.
@@rashawnhoward564 Exactly. Alex is bullshitting.
In R, you could also use library(tidytable) for the same functionality with great memory efficiency.
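For anyone curious what the base-R one-liners above are doing, here is a rough standard-library Python analogue of colMeans(nba[sapply(nba, is.numeric)]) (keep only the columns whose values are all numbers, then average each) and of summary() on one numeric column. It is a sketch with made-up data; in pandas the closest built-ins would be df.mean(numeric_only=True) and df.describe(). The quartiles use statistics.quantiles(method="inclusive"), which interpolates the same way as R's default quantile type 7:

```python
import csv
import io
import statistics

# Hypothetical data with one text column and two numeric columns.
SAMPLE_CSV = """name,age,score
Ann,23,88.5
Bob,31,92.0
Cid,27,75.5
Dee,35,80.0
"""


def is_number(text):
    try:
        float(text)
    except ValueError:
        return False
    return True


def numeric_columns(rows):
    """Keep columns whose every value parses as a number, converted to floats."""
    return {
        name: [float(row[name]) for row in rows]
        for name in rows[0]
        if all(is_number(row[name]) for row in rows)
    }


def summarize(values):
    """Min, quartiles, mean and max of one column, in the spirit of R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"Min": min(values), "1st Qu.": q1, "Median": median,
            "Mean": statistics.mean(values), "3rd Qu.": q3, "Max": max(values)}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
numeric = numeric_columns(rows)
print({name: statistics.mean(v) for name, v in numeric.items()})
# {'age': 29.0, 'score': 84.0}
print(summarize(numeric["age"]))
# {'Min': 23.0, '1st Qu.': 26.0, 'Median': 29.0, 'Mean': 29.0, '3rd Qu.': 32.0, 'Max': 35.0}
```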
Was just going to say this. That was a pretty poor example. There are so many more situations where R is cleaner and easier to use for wrangling data. I feel that pandas is disappointing, whereas dplyr and the tidyverse in general are better tools for data science.
I don't want to worry too much about data types when doing my analysis. The fact that base R supports operations on matrices and data frames makes it much easier to use. For example, when you subtract two series (columns/vectors, whichever), R knows to subtract them term by term; in Python it gets pretty messy when you have lists, series, arrays and such floating around, all with different methods for that one exact operation.
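The subtraction point is easy to see with built-in Python lists, which are general containers rather than numeric vectors: x - y raises a TypeError and x + y concatenates. Element-wise arithmetic then has to be spelled out, or delegated to NumPy arrays or pandas Series, which do it natively. A minimal sketch in plain Python:

```python
x = [5, 7, 9]
y = [1, 2, 3]

print(x + y)  # [5, 7, 9, 1, 2, 3]  (concatenation, not addition)

try:
    x - y
except TypeError as err:
    print("TypeError:", err)


def subtract(a, b):
    """Element-wise a - b for equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [ai - bi for ai, bi in zip(a, b)]


print(subtract(x, y))  # [4, 5, 6]
```

Unlike R, which recycles a shorter vector (with a warning when the lengths are not multiples of each other), this sketch simply refuses mismatched lengths to keep the example small.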
Personally, I prefer R when doing hardcore data analysis. dplyr, ggplot2, and the rest of the tidyverse enable you to do more with dramatically less code compared to Python. For anything outside of hardcore data analysis, I use Python.
I can definitely feel that
I’m with this
I really prefer pandas to dplyr, and R was my first language of the two. I did spend a while getting fluent with it.
@@squirrelpatrick3670 R's data.table is one of the fastest in the whole programming language universe.
I have rarely used dplyr or pandas since I started using data.table in R.
Same here! R is king for hardcore analysis, but go beyond that, and Python leaves it in the dust. But R is my first love.
I've waited long for this video! Right now I'm learning Python and in my company, they use both depending on if we are using classical statistical models or ML. However, I'm also an economist who would like to get more involved in academia and I think R is more used there than Python. Both are excellent choices tbh
@Harry S It depends on where you are and on the country's laws. Here in Brazil there's no law that regulates the data analyst profession in the private market (aka companies). But in the public sector, a university degree in statistics, IT, software engineering, etc. is required.
Controversial take: I would suggest python for economists. Reason being, if you're an economist, you are likely to use (or a coauthor is likely to use) Stata. The newer versions of Stata talk to Python really really well. I can run Stata from within Jupyter or Spyder or run Python from within Stata VERY easily, and that includes, for example, running a Python command from Stata USING my live Stata dataset! In other words, you can open stata, play with some data, then run a python command on that data, then run a stata command on that data, etc.
Will Python be able to do something newer, for example techniques that have come out in the wake of Goodman-Bacon 2018? Probably not. But just use the instructions to turn your section of your .do file into python code and run what you need to there, then switch back. Easy peasy.
Thank you for this comment, I'm currently a second year Econ student and this helped a lot!
For finding the mean of a column in R, you use the mean() function. I don't know why you showed pipes in the R section of the syntax example.
I know, right? It's even one of the functions in the base package!
When we speak about analysis, we speak about mathematics, and more precisely statistics... From my point of view, R has more mathematical libraries than Python... so please keep Python for web development and other stuff.
Two minutes in, you're peddling the standard nonsense that R is a statistical package. I've been using R for twelve years and pretty much never for statistics. Text processing, data cleaning, report writing (markdown) and GIS, GIS, GIS. R is really good for mapping and geospatial data processing (not just spatial statistics).
The strength of R is statistics, you can't deny that. Sure, you can do other things with this language, but its strong points are making plots, modifying data frames and statistical tests. Sorry for my English btw.
2 mins in and you're already b**ching. Geez, that flaccid ego needs to be toned down, son.
@@alienboogieman I was being polite. It's a crap video at best. It's disingenuous and dishonest at worst.
@@simonparker4992 is that what your wife said to you before she left your ass? If so, good because you assume you know best when you do not.
Are there similarities between C# and the R language? I'm using C# now and I have good experience with it.
I learned both. My conclusion is that Python is better, but I love R Markdown and ggplot more than Jupyter and matplotlib + seaborn.
Very good, I mostly use R. But when you combine R with Python and SQL, you become an unstoppable data programming machine. So learn both, it's a lot of fun.
Hey sir, I'm currently learning from scratch. Would you recommend that I learn R first and then move to Python, or what would your approach be?
@@DailyMental Wow, good question. I learned SQL, then Python and R.
My opinion is that R requires less code to do something than Python. Also, the amazing R tidyverse package makes it so much easier to code and to work with data. Also, R mostly just works, whereas with Python you do run into a few more issues with package versions.
But saying that, SQL is also a great place to start because it's easy to read and understand the code.
Keep in mind that a lot of companies store data in databases, so it's always a bonus if you can use SQL to extract data from the database and then analyze it in R or Python.
This is just my opinion. Good luck
@@jacobusstrydom7017 Thank you for the advice! I was thinking this as well. SQL is my first step, and then R, since I'm from a business background and it's probably better to have a solid foundation before learning more complex syntax.
Power BI vs. Tableau
Power BI... no mistake there. Download it and watch a 30-minute demo.
@@deniskk2 I've used Power BI and love it. I don't have much experience with Tableau, so I'm wondering what his justification for Tableau is.
Coming soon!
Tableau is expensive but has great features and UX; Power BI is cheaper, even free, but its UX is not so great. Both apps will get your data viz job done eventually.
Guys, Google Data Studio is better than Power BI and Tableau.
I feel this video is a little biased at 5:55, as I don't think anybody would write that code just to get the mean.
I think R is better, as pandas is much slower and less easy to use than dplyr. Data prep takes up most of my time, so this is huge. Both R and Python are relatively equivalent to me for machine learning. A lot of these ML packages are just R and Python wrappers around C code. Maybe if you work for a FAANG company and do a lot of pure deep learning, Python may be better, but I think that situation is rather rare.
It truly depends: first on personal preference, and also on what your work (that is, your company) requires you to use. I prefer Python, and I think Python will grow to offer the same amount of features as R (if not more) in the future.
You definitely cherry-picked. To get the mean of a column you don't need to load any packages; the base summary function will give you that. summary(data) will give you the mean, median, Q1, Q3, min and max of every numeric column, not to mention the counts of qualitative columns.
When I subscribed to this channel two weeks ago, I did it because I wanted to be ready for my data analyst interview. It went very well, and I think this channel helped, at least when it came to learning more about the job and the differences between a data analyst and a data scientist. I will start on the first day of March and I am looking forward to it. I am studying for a master's in Big Data at the same time and I am learning R there, whereas I need to learn Python for work. R doesn't look difficult to me, but Python kinda looks more familiar to me and to those with a background in other general-purpose programming languages. I agree about the huge number of libraries in R, and I think it is really great for visualization. However, since Python is becoming the most popular programming language, I would prefer it for that reason alone, not counting anything else.
It would be nice to have a video with examples or real world scenarios for both cases.
filtering with pandas
df[df['column'] == x]
vs R data.table
df[column == x, ]
Which is easier to read? It gets even worse in pandas when you have a more complex condition.
Not to mention MultiIndex in pandas. It is hell.
And compare that with reading a CSV in R, which you can also do in 2 lines:
x
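To make the filtering comparison above concrete, here is a minimal pandas sketch, assuming pandas is installed; the `team` and `points` columns are hypothetical. It writes the same compound filter as a boolean mask and with `DataFrame.query`, which reads closer to the data.table form:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "team": ["A", "B", "A", "C"],
    "points": [10, 25, 30, 5],
})

# Boolean mask: every condition needs its own parentheses, and `&` replaces `and`.
mask_result = df[(df["team"] == "A") & (df["points"] > 15)]

# DataFrame.query takes the condition as a string, which reads closer to
# data.table's df[team == "A" & points > 15].
query_result = df.query('team == "A" and points > 15')

print(mask_result)
```

Both forms return the same single row here. `query` has its own quirks, since the condition is parsed from a string (column names with spaces need backticks, for instance), so whether it is actually more readable is still a matter of taste.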
I don’t think it’s a big deal; it’s a matter of preference! I’m currently learning how to use Python.
Definitely is 👍
Thank you Alex for this video!
Honestly, I like both. Since I come from an SPSS and statistics background, R suits me better. But when I need to analyze missing values or do some graphics, Python helps me a lot more.
What would you recommend to a beginner in statistics - SPSS or R?
@@ankicanozinic6551 If you've never touched a database or programming before, SPSS may be easier to learn, since it resembles Excel and similar tools a little. Also, you can click the buttons SPSS offers and the software gives you the programming script that your clicks generate. Disadvantage: SPSS requires a paid license, but it has a trial version you can test.
If you're committed to learning programming basics alongside statistics (and have enough time to study), R is the way to go.
The answer is: it depends. I always tell my students to go step by step. Well, hopefully it will be useful to you.
@@adrielbezerra7887 Thank you for a thorough explanation.
I am a student with a statistics background. If I learn MS Excel, R and Power BI, is that enough for me to do data analysis smoothly?
Thank you!
I'll try both.
In about a month I'll take a Coursera course on R (from the Google Data Analyst Certification), but after that I'm interested in trying Python as well.
How did it go ?
Stopped watching at 5:55 because either Alex was biased or he has no idea whatsoever about R, since he did not use the mean( ) function which is even a base R function and you don't need to install and load any package to use it.
Love your overall points Alex, but saying that R can't be integrated into web apps is plain wrong!
I mostly do data analysis on survey data, and in my experience R is more robust in this regard. For instance, there are several packages that will create survey weights for you, but I have yet to find one Python package that actually works.
I do agree that Python syntax is somewhat easier to pick up, but once you understand vectorized operations in R it becomes easier to use.
From my understanding if you're familiar with SAS, R would probably be easier to pick up vs Python. I personally started with C#/C++ so python was easier for me to pick up. Also perfect troll post on LinkedIn, just say something controversial and walk away lol 👍.
R and Python require totally different mindsets. Picking the better one of them is like asking "which is a better career, statistician or engineer?". With my mathematical background I find R code much more straightforward, and when I started to use Python, it wasn't that any single piece of code was unreadable, but the entire workflow was unfamiliar: how tasks are broken down, why it makes a copy here, and so on. It also took me quite some time to be convinced that Python does not have a library for the Spatio-Temporal Autoregression model (for a few hours I thought I just hadn't searched the right way), since R offers abundant solutions for spatio-temporal data. Eventually I realized that modelling is never at the top of engineers' priority list, and mathematicians/statisticians can focus on the intellectual work only because engineers have got their hands dirty. Also, Python makes it easier to collaborate with other platforms. If I were to communicate with laymen rather than other professionals in my company, deploying a dashboard or web app would be the best explanation. Again, that didn't occur to me because it hadn't been my top priority, so I preferred R. Now the job has changed and I'm using Python more often.
If there is no library for spatio-temporal autoregression, you can build it from scratch in Python, since it's an object-oriented programming language, which is not possible in R.
@@LHommeEnVert Then how do you think they built the library in the first place??
The YouTube algorithm brought me here lol. I think an example of data analysis using popular libraries in both, for comparison, would be good: the processing time, the amount of code written, etc. For me Python is easier, since in college we used C++ and Fortran for learning the basics of algorithms and numerical methods. The one time I had to use SPSS for statistics assignments I got really confused.
By the way R can be embedded in web application.
Really!! Is it true?
It helps to know many programming languages - that much I have learned so far
True, for me it broadens your way of thinking about programming in general.
Since every language approaches the subject in a unique way with a unique motivation.
It makes you very appreciative of the strengths and weaknesses of every language.
I prefer the R syntax. I find it easier to remember “weird” syntax.
$ % -> lol
I am an economist trying to move into data analysis, and I still didn't understand the pros and cons of both, so this video is exactly what I needed. Thank you! 😄
All my programming knowledge is in C++; I even did the calculations for my M.Sc. in Statistics and Operations Research in C++. Now I’m not sure which to begin with, Python or R. Most of my work revolves around numerical analysis.
I guess you could say that the messages Alex received in regard to his “controversial” post included some R-Rated content ;)
Great video. But I think you could make the R syntax example much easier. If you want to know the mean of a variable or of data, you only type mean(data) or mean(data$variable), depending, of course, on whether the variable or data is numeric.  Thanks for your video. Regards.
Also, for the mean of each column, just use the code
sapply(dataframe, mean)
using the apply functions, or colMeans().
You could simply use the describe method for that using pandas
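For the pandas side of this exchange, here is a minimal sketch with made-up data, assuming pandas is installed. It computes per-column means (roughly what `colMeans()` or `sapply(df, mean)` give on numeric columns in R) and the `describe` summary mentioned above:

```python
import pandas as pd

# Hypothetical example data with two numeric columns and one text column.
df = pd.DataFrame({
    "points": [10.0, 20.0, 30.0],
    "assists": [1.0, 2.0, 6.0],
    "team": ["A", "B", "A"],
})

# Mean of every numeric column, skipping the text column
# (roughly what colMeans() does on the numeric part of an R data frame).
col_means = df.mean(numeric_only=True)

# count, mean, std, min, 25%, 50%, 75%, max for the numeric columns,
# similar in spirit to R's summary().
stats = df.describe()

# include="all" adds count/unique/top/freq for the text column too.
stats_all = df.describe(include="all")

print(col_means)
print(stats)
```

`numeric_only=True` matters here: in recent pandas versions, calling `df.mean()` on a frame with a text column raises an error instead of silently skipping that column.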
I've personally enjoyed the SQL, Excel, R and Power BI toolset I've got set up. The only thing I really planned on was learning Power BI, but the rest came about oddly naturally. Great video by the way!
That's a good toolbelt right there!
I learned Power BI too... Is R easy to learn?
@@praveen26699 I found it particularly easy. I'd picked up most of it on my own after reading "Learn R" by Aphalo. I'd taken some DataCamp courses and other paid courses by ppl like Matt Dancho that provided spot-on business applications for it. I also read "Advanced R" by Wickham, and with all of that, R is my main power tool in the tool chest.
R is a lot like Excel and SQL, Python is a lot more like other programming languages. All of the above are interchangeable and as long as you can learn how to provide business value you are golden.
Excellent summary, great balance of conciseness and examples.
"R is harder to learn, but has more features"... specifically for analytics, right? My understanding is Python has far more features in general. Never heard of someone building a mobile app in R.
There must be a reason behind the growing number of R packages. My clients won't care whether I produce results in R or Python. If they ask me to build an app, then I'll reconsider.
I know absolutely nothing about Python, but your example at ~6:30 is a major giveaway that you are not experienced enough with R programming to form a reliable comparison - the example could be done in base R with two lines of simple code. I've never seen such an overcomplicated way to find the mean as the one you described.
Liked and subscribed! Thank you for the valuable input.
In my opinion the biggest advantages of R are its IDE, RStudio, and the ability to execute only the mouse-selected portion of code (no, notebooks are not as convenient). Web deployment is possible through Shiny, but it seems much more of a hassle than in Python.
Try R Markdown, Sweave or knitr for a notebook-style workflow. They are even better than the Python notebooks I've worked with so far.
Thanks, Alex. Great video. The right tool for the right job.
Absolutely!
Just calling attention to librarian::shelf(tidyverse)
You don't need to write 10 lines of library(dplyr)-style calls; you can list all your package names in a single line of code, and it will install them if needed and load them.
Loved it! Thank you very much for your content, just started following you.
My advice: just express your opinion like you did; it makes the content far more unique.
Cheers!
thank you!🙏
So which tool do y'all think would be better for conducting economic and financial analyses?
Use the language your team uses.
The guesswork can be taken out based on the company you work for or the company you want to work for in the future.
If they use R, use R. If they use Python, use Python. If it’s only up to you, flip a coin.
LoL, flip a coin... that's what I'm gonna do as a student to start learning XD
Pandas > tidyverse
@@jaqo92 R data.table > pandas
Amazing video!! Thanks!!
It's insane how many times you had to ask ahead for forgiveness to avoid potentially offending anyone. We are all different and have different opinions - get over it, people! Very cool video mate. As a statistician I'm very much in love with R, but I'm trying to learn Python, as I am very aware of its coverage and power. Cheers
I like functional programming. I love R for data science. Anything else I’ll just write out some C or JS
They’re just tools to get my work done. I use both on daily basis.
Do we have something like RStudio for Python?
PyCharm, I guess.
SQL vs NoSQL
Will do!
Team SQL!
Why doesn't IBM offer a data science certificate with R?
I actually gave up on R as I moved to a more strategic role and away from hardcore data analysis; I found it harder and harder just to recall syntax across different libraries. Plus, I see the industry tilting more and more towards Python, and learning Python is kind of "future-proofing" the time you spend on it.
Is learning SQL for data analysis from your playlist enough, or do I need to continue with another course?
For sure! I think it's a good place to start :D
question: if you had to choose one background to have to work as a data analyst, business or statistics, which one would you choose and why?
R code can definitely be maintained with Markdown, for example.
If all you want to do is read a CSV file and see the mean, you could use RStudio and not program anything in either R or Python. Use the right tool for the right job.
Thank you Alex!
Underrated skill that's complementary to these is Excel PowerQuery... Poor man's PowerBI and the only thing that makes Microsoft's Office suite irreplaceable by even the best of clones.
U R amazing man ❤️👏
I would like to see a second-part comparison video focused on R and Python from a business standpoint, rather than on their general-purpose programming capabilities for building applications and their heavy use of sophisticated statistics that do not apply to the average business world. For instance, many of us in business are hoping to learn which language is better for business analysis, which, after all, is the trend in using either of these languages.
What we learn from the video is that R is highlighted as useful for purely statistical analysis, while the comparison does not provide any insight into Python's capabilities for statistical analysis. R is highlighted as great for statistical analysis; however, advanced statistics is used mostly by the scientific and academic community and by sophisticated business environments, while most of these advanced statistics are not needed in the general business world.
I would like to see the view of a business analyst/business intelligence professional who has truly used both R and Python for exactly the same purpose: business analysis. It would be great to move away from general-purpose programming and application development and get more into the business uses of each language, and into what statistical and data analysis truly serves the vast majority of business users, business intelligence professionals and data analysts analyzing business-related data.
Looking forward to this second video. Thanks Alex!
For business analysis/business intelligence, both languages are equal, and it's more about preference than advantages. Usually there are no issues with performance, and you do not need sophisticated models and packages. You can build your own functions and tailor them to your field in both languages. Maybe visualisation is better with ggplot in R (it is more versatile). But if you want to build proper self-service BI, then it's better to go with classical BI tools like Tableau/Power BI, etc. R and Python are for hardcore analysts searching for deep insights, and BI tools are for managers.
I am actually bilingual in R and Python and do both ways in my work.
"R can't be embedded in web-applications"
I imagine this should be possible with WebAssembly, right?
I think Python is better too, but I do like Hadley Wickham's tidyverse for R.
I learned R and Python, and I can say R is much easier to learn but Python is way more robust. I replaced VBA code that creates MS Excel workbooks from a template, and the Python version took about 3 seconds to complete; using R took about 45 seconds. After I saw the benefits and speed of Python, I put R aside and focused on Python.
I use both as a digital analyst student. R for data cleaning, structure, and manipulation. Python for ML
No web applications? What about Shiny etc.??
I think the best thing about R is RMarkdown. Being able to hit one button, run my statistical analysis, and output a word document with all the right numbers and figures in it is amazing for reproducible reporting.
I'm switching to Python soon. Do you have any recommendations for a similar functionality?
Jupyter Notebook is similar.
Google chose Python for its ML/AI coding. So if you are looking at ML or AI, python is the way to go.
I use R, but want to learn python eventually! Thx for this video
Miss you Alex!! I've worked with both, and I have one little thing to say: in R you can write mean(nba) or you can use summary(nba).
Thanks Mohammed! Glad to be back! Is that with R Tidyverse?
@@AlexTheAnalyst No! It's a predefined function.
The genesis of R actually dates back to circa 1975 at Bell Labs, where it was named S. Python had its origin around 1989.
And Python was inspired by ABC, which was inspired by SETL.
Both are needed once things get a little advanced
I’m new to programming and I chose Python as a starting point... easier for a noob like me.
So is the IBM course (Python) better than the Google course (R)?
I don't understand the last point: R is for statistics and Python for machine learning. I thought machine learning was nothing more than statistics?
Python is a language designed by computer scientists; R was designed by statisticians. Python uses = for assignment, while R uses <- as its assignment symbol. Python's functions are more flexible than R's. Deep learning packages are written for Python, but not for R. So R is a statistics-like language, and Python is a data science language.
In fact, in R you can use = the same way as <- for assignment.
Hard to go wrong either way. If your job leans more towards data engineering and ETL then probably Python is a good choice to start with first, IMO.
Thanks Alex!
Very true!
Thanks for clarification.
I am trying to find the best way to build a sports betting model using past statistics to project the outcome of future games/events/etc. I have messed around a little bit on Microsoft Excel doing this but I was just curious if anyone has a suggestion for which program would be the best for my needs between Excel/Python/R. Thank you for the help!
The Python example seems to be a bit cherry-picked to show that Python has one function that applies mean to all the columns.
Also, I'd like to know why R is more difficult to maintain.
I think the pros for R should include ability for Markdown, better visualisation libraries, and piping is intuitive.
Python's pros should include that it's an actual programming language.
I class myself as an R guy, but I do have one problem with R as far as maintainability is concerned: I can't count on it always producing the same answers from the same code. Even on the same version of R, changes in its many packages tend to change my results all the time. And I get slightly different results on different computers. I have spent a lot of time in the last few years trying to overcome this problem but I'm considering sucking it up and porting a lot of my production code to python.
@@bendirval3612 how tf does that happen?
@@mycrushisachicken R and Python differ in their third-party package systems. Python has many packages, but you really only use 2 or 3 of them for data science. They are each large and aren't updated all that often; they seem to have lots of eyes on them and a reluctance to change rapidly. R has many thousands of packages, all of which do data science or statistics/econometrics. And they are all created/maintained/updated by their original authors, so the updates go to CRAN (R's central repository of packages) right away whenever the authors feel like it. You end up using lots more packages in R, and the packages are all written by different people, who may not be overly concerned about the effects of updates on end users. So it's a lot easier to get two systems out of sync in R than in Python. There may also be numerical reasons why you get different answers on one computer versus another in R. Many times in R, I have had optimizers return slightly different answers on Intel versus AMD, despite all my efforts to standardize. I'm not saying this can't happen in Python or MATLAB or whatever, but I haven't seen it as much.
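One possible numerical source of the machine-to-machine differences mentioned above (an illustration, not a diagnosis of that particular setup): floating-point addition is not associative. If a linear-algebra library, optimizer, or parallel reduction sums in a different order on different hardware, the last digits can change. A minimal sketch in plain Python:

```python
import math

values = [0.1, 0.2, 0.3]

# Same three numbers, two different summation orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right)                    # 0.6000000000000001
print(right_to_left)                    # 0.6
print(left_to_right == right_to_left)   # False

# math.fsum returns the correctly rounded sum, so its result
# does not depend on the order of the inputs.
print(math.fsum(values))            # 0.6
print(math.fsum(reversed(values)))  # 0.6
```

Tiny differences like this can then be amplified by an iterative optimizer, which is one reason results can drift slightly across machines even with identical code.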
@@farnsworthsclasses3523 damn thats crazy
Python vs R debates are pointless imo. They are good in their own ways.
Most importantly, they are better than SAS.
Well, if you ask CS students what programming language should be learnt first, 99% will tell you Python. CS students just love Python so much that they would have a sexy dance with it if it were a girl.
If anyone tells you to learn C first, you know you've found your true love.
thanks much!
So is R a programming language or a program like SPSS? R being close to python according to the comments kinda melts my brain
Hey Alex do you mind doing a video on the impact of automation on the future of the data analyst career? It would be really helpful to those who are on the fence about starting/changing their careers.
I definitely plan on making a video on AI and automation :)
@@AlexTheAnalyst Thanks!
Great topic! I'm also interested in your thoughts.
If you are a non-programmer, R is easier to learn than Python, where you have to begin with OOP, environments, etc., which are way harder to grasp.
And R's built-in data types support data.frame and vector calculations, making it easier to reason about...
And I have deployed dozens of pieces of code into production, including web apps, so almost all of your arguments are biased toward what you use...
Why not both
Which one would you learn first?
Have you ever used Eviews?
Can you recommend an R course on Udemy and/or do you have a promo code for an R course?
I wouldn't recommend any R course on Udemy.
Go for O'Reilly books.
@@veerasekhar8551 Brother, are you from India? What's your take on SPSS?
Alex, it's been 4 days since I got into the data analyst game. You are my go-to guy, and the way you started this business story is real. I am following your classes because they are simple and easy to follow; other classes take LOOOONG and complicate stuff. Please let me know if you have Instagram; it's easy to communicate there. Please explain to your wife why you could use an Instagram account.
Haha I’ll look into it 😁 so awesome to hear you’ve enjoyed the channel 👍
@@AlexTheAnalyst And that's why your wife is on the money about the professor thing xD
Once again, thanks for sharing your thoughts with us Alex. Can always count on your fair unbiased opinions.
It's crap. He has no idea what he's talking about.
For me, it's clear. Python is better for most people, but if one has a strong math and stats background, R is probably the best.
It's so much easier to collect data, clean it and put everything to work in Python. But R is just THE WAY TO GO for statistical analysis. You get so much stuff out of the box. So many statistics, it is amazing.
Tl dr: learn both, R for statistical and ML modeling. Python for anything else.
If learning both is not an option, probably go the python route.
Thank you for this video Alex. I have always been intrigued by data analysis and want to learn more of the programming side. I was not familiar with either of these programs but your breakdown of the two help. If I wanted to learn both of these, would it be easier to learn Python first, then try to learn R or should I try to learn about R first, then Python?
They’re both hard. Start with Python.