Solving real world data science tasks with Python Pandas!

  • Published: Nov 21, 2024

Comments • 1.8K

  • @KeithGalli
    @KeithGalli  4 years ago +195

    Posted a new "Solving real world data science tasks" video! Check it out here: ruclips.net/video/Ewgy-G9cmbg/видео.html

    • @Trazynn
      @Trazynn 4 years ago +4

      This is awesome. Learning Python is so much easier when there's something tangible and grounded to work towards.

    • @colorways518
      @colorways518 4 years ago

      hii keith!!! I am getting an error after this line
      CODE:
      for file in files:
          current_data = pd.read_csv(path + "/" + file)
      ERROR: ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
      Please can you help me solve this error....I tried to find solution online but didn't get any.

    • @larrywang1983
      @larrywang1983 4 years ago

      @@colorways518 Just thinking out loud,aren't we able to find the below kind of info from Amazon Jungle Scout, Helium10, Sellics. We are amazon seller, do we also need to go thru Python and data-science on Amazon. There are 3rd Party SaaS plug-ins to solve these questions. Correct me if i am wrong?
      - What was the best month for sales? How much was earned that month?

    • @ismaeelaileru4612
      @ismaeelaileru4612 4 years ago +2

      For the problem on getting city with highest sales, we ran into an ordering problem while plotting the cities, I think we can also use result.index as our xtick
      That way it simply takes the values straight from the Dataframe in the right order rather than using df.unique and rearranging
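
      The idea above can be sketched roughly as follows. This is a minimal illustration with made-up rows; the frame `all_data` and the columns `City` and `Sales` are assumptions standing in for the video's data. It shows why labels taken from the grouped index line up with the totals while `unique()` may not.

```python
import pandas as pd

# Made-up rows standing in for the sales data (assumption, not the real dataset).
all_data = pd.DataFrame({
    "City": ["Boston", "Austin", "Boston", "Dallas", "Austin"],
    "Sales": [10.0, 5.0, 7.5, 3.0, 1.0],
})

# groupby sorts its keys, so the resulting index is alphabetical.
results = all_data.groupby("City")["Sales"].sum()

# unique() keeps first-appearance order, which can differ from the grouped order.
cities_unique = list(all_data["City"].unique())

# The index is already in the same order as the summed values.
cities_index = list(results.index)

print(cities_unique)
print(cities_index)
print(results.tolist())
```

      With labels taken from `results.index`, a call such as `plt.bar(cities_index, results)` would pair each city with its own total.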

    • @rodrigo100kk
      @rodrigo100kk 3 years ago +1

      This red warning displays bcuz u didn't make a copy of the original dataframe, do it and this warning goes off.
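
      A small sketch of that suggestion, assuming the warning in question is pandas' SettingWithCopyWarning (the comment does not name it) and using invented rows: calling `.copy()` on the filtered frame makes it independent of the original before new values are assigned.

```python
import pandas as pd

# Invented rows; the second one imitates a stray header line inside the data.
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "Order Date", "04/07/19 22:30"],
    "Price Each": ["11.95", "Price Each", "99.99"],
})

# Filtering returns a new object that pandas may treat as derived from all_data.
# .copy() makes it an explicitly separate frame, so later assignments are unambiguous.
clean = all_data[all_data["Order Date"] != "Order Date"].copy()
clean["Price Each"] = pd.to_numeric(clean["Price Each"])

print(clean.dtypes)
```

      The original `all_data` is left untouched, which is usually what is wanted while exploring in a notebook.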

  • @justapugontheinternet
    @justapugontheinternet 1 year ago +192

    As a programmer/data analyst/systems administrator I can safely say that this is exactly how we solve problems in real life. Good job!

    • @pasha7293
      @pasha7293 1 year ago +4

      you wouldnt have watched this video if you were

    • @justapugontheinternet
      @justapugontheinternet 1 year ago +49

      @Pasha people who think they know it all are a bore. 🙄 You could always learn something new from other people, it never hurts to learn new perspectives. Good luck with that mindset. I learn everyday. 😌

    • @saugatjarif8272
      @saugatjarif8272 1 year ago +14

      @@justapugontheinternet love your mindset on🎉🎉🎉🎉

  • @terrymaverick580
    @terrymaverick580 4 years ago +311

    the best part was watching someone google the answer and seeing how they implement the solution instead of just acting like they know everything. man your tutorials are the best and down to earth

    • @Amir-tv4nn
      @Amir-tv4nn 2 years ago +1

      hahahahaaha you think this kids knows what he is doing and for your information we all google no matter what postion we hold. 🤣 we built websites for a reason to always look back to when needed. Google provides faster search capability rather going to src and look through to get to. Get your mind straight about goodle 🤣 This kid clearly looking around for the code he already written and you assuming google is preferred to be a bad example as a programmer 😂 tells me you expecting movies type like hackers hahahaahahaha. Come to reality

    • @dragonmateX
      @dragonmateX 2 years ago

      It honestly makes it feel more real, like, I am studying data science now and I google stuff all the time, the fact that even someone well versed in data science still googles stuff constantly is reassuring.

    • @Amir-tv4nn
      @Amir-tv4nn 2 years ago

      @@dragonmateX people who work in google google stuff 😂 get back to reality to why google is meant for🤣

    • @buak809
      @buak809 2 years ago

      @@Amir-tv4nn and? what the fuck is your problem? so far you didn't write anything valuable here

    • @Diabolic9595
      @Diabolic9595 1 year ago

      @@Amir-tv4nn Come to reality. Man, come to reality. Could you please come to reality? Btw you should come to reality

  • @kyledawes9593
    @kyledawes9593 3 years ago +41

    As a business major with very limited internship experience, I am teaching myself python and data analytics from scratch. This video is literal gold to me because this is one of the few that actually shows the entire wrangling process! Thanks for the great vid!

    • @vilw4739
      @vilw4739 3 years ago +1

      If i use only fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv") i get file not found error..i should use the whole path starting from c drive..How does he not get error

    • @ashiksrinivas
      @ashiksrinivas 3 years ago +1

      @@vilw4739 He is using jupyter notebook where files are stored separately in a jupyter notebook directory and you can upload files in the directory and import them by simply running fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv")
      If you're using a local python IDE like pycharm and VSCode, you need to specify the whole directory like fd=pd.read_csv("C:/Data Science/Sales_Data/Sales_April_2019.csv") to import.
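
      A note on the path question, offered as a sketch rather than a diagnosis: a relative path such as "./Sales_Data/..." is resolved against the current working directory of the Python process, whichever editor is used. The folder and file names below come from the comment above; the layout on your machine is an assumption.

```python
import os
from pathlib import Path

# Relative paths are resolved against this directory.
print(os.getcwd())

# Assumed layout: a Sales_Data folder inside the working directory.
csv_path = Path("Sales_Data") / "Sales_April_2019.csv"

# The absolute location that a relative read would actually look at.
print(csv_path.resolve())

# False here means pd.read_csv(csv_path) would raise FileNotFoundError.
print(csv_path.exists())
```

      If `exists()` prints False, either move the data folder next to the script or notebook, change the working directory, or pass the full path.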

    • @vilw4739
      @vilw4739 3 years ago

      @@ashiksrinivas thankyou

    • @muhsintabatabayee8592
      @muhsintabatabayee8592 2 years ago

      @@vilw4739 did you ever figure it out? getting the same error

    • @vilw4739
      @vilw4739 2 years ago

      @@muhsintabatabayee8592 they should be in the same folder.Otherwise you need to put the whole path

  • @mid_paulownia
    @mid_paulownia 4 years ago +156

    This is the most practical Python tutorial video I've ever watched.

  • @Jordanptheone
    @Jordanptheone 9 months ago +4

    Watching this 4 years after you published it, and you're still a legend ! Thank you !!!

    • @KeithGalli
      @KeithGalli  9 months ago +1

      Thank you for watching and the kind words!!

  • @helmialfath9897
    @helmialfath9897 4 years ago +724

    This situation so realistic. The mistakes, the solving.. great video!

    • @Pidamoussouma
      @Pidamoussouma 4 years ago +8

      Yes liked it ..it was so realistic

    • @ЧернійЮрійМиколайович
      @ЧернійЮрійМиколайович 4 years ago +3

      is this sarcasm?

    • @ipshie
      @ipshie 4 years ago +1

      Юрій Черній pretty sure no it's not

    • @billyjorrosh9394
      @billyjorrosh9394 4 years ago +13

      not only teach us about pandas but also give us the confidence that "If this guy could be so success in data science then why shouldn't I?"

    • @89DerChristian
      @89DerChristian 2 years ago

      @@ЧернійЮрійМиколайович no

  • @edric7552
    @edric7552 2 years ago +11

    Hi Keith, I feel obligated to personally thank everyone that helps in pursuing my data career and of course, you included. I've used your project (and learned a LOT) and modify/add codes here and there with my own styling for my online portfolio. Moreover, you're a fantastic teacher and you deserve all the credits you should get for helping others like me. Thank you for doing this, may God return the favor and always bless you. Rock on Keith!

    • @KeithGalli
      @KeithGalli  2 years ago +2

      Thank you so much for the kind words! :)

  • @KeithGalli
    @KeithGalli  4 years ago +298

    Video Timeline!
    0:00 - Intro
    1:22 - Downloading the Data
    2:57 - Getting started with the code (Jupyter Notebook)
    Task #1: Merging 12 csvs into a single dataframe (3:35)
    4:25 - Read single CSV file
    5:44 - List all files in a directory
    7:06 - Concatenating files
    11:00 - Reading in Updated dataframe
    Task #2: Add a Month column (12:48)
    14:12 - Parse string in Pandas cell (.str)
    Cleaning our data!
    17:31 - Drop NaN values from df
    21:25 - Remove rows based on condition
    Task #3: Add a sales column (24:58)
    25:58 - Another way to convert a column to numeric (ints & floats)
    Question #1: What was the best month for sales? (29:20)
    30:35 - Visualizing our results with bar chart in matplotlib
    Question #2: What city sold the most product? (34:17)
    35:32 - Add a city column
    36:10 - Using the .apply() method (super useful!!)
    40:35 - Why do we use the lambda x ?
    40:57 - Dropping a column
    46:45 - Answering the question (using groupby)
    47:34 - Plotting our results
    Question #3: What time should we display advertisements to maximize the likelihood of purchases? (52:13)
    53:16 - Using to_datetime() method
    56:01 - Creating hour & minute columns
    58:17 - Matplotlib line graph to plot our results
    1:00:15 - Interpreting our results
    Question #4: What products are most often sold together? (1:02:17)
    1:03:31 - Finding duplicate values in our DataFrame
    1:05:43 - Use transform() method to join values from two rows into a single row
    1:08:00 - Dropping rows with duplicate values
    1:09:39 - Counting pairs of products (itertools, collections)
    Question #5: What product sold the most? Why do you think it did? (1:14:04)
    1:15:28 - Graphing data
    1:18:41 - Overlaying a second Y-axis on existing chart
    1:23:41 - Interpreting our results
    Thanks for watching! If you enjoyed, please consider subscribing :).

    • @ANKITRAJ-fe8dh
      @ANKITRAJ-fe8dh 4 years ago +4

      Heyy,machine learning would be awesome

    • @luuminhvuong
      @luuminhvuong 4 years ago +2

      I Have very big data in xlsx format. Read excel takes like forever...

    • @mberoakoko24
      @mberoakoko24 4 years ago +1

      I am on holiday and have started datascience for fun to see what the buzz is all about. I have to say I love it and I would appreciate if you'd upload more videos like this. I have learnt a TON

    • @kulpreetsingh9064
      @kulpreetsingh9064 4 years ago

      Hey man, are you gonna do more such videos anytime soon?

    • @mohammedyounis7207
      @mohammedyounis7207 4 years ago

      Thank you so much, it is very useful to me

  • @sathirasilva4958
    @sathirasilva4958 3 years ago +57

    Great tutorial!
    55:00 When parsing a column into datetime, specifying the format manually will decrease the execution time significantly:
    all_data['Order Date'] = pd.to_datetime(all_data['Order Date'], format='%m/%d/%y %H:%M')
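
    A self-contained sketch of the same call on two invented rows, assuming the "MM/DD/YY HH:MM" layout implied by the format string above. Once the column is a real datetime, the hour and minute come from the `.dt` accessor.

```python
import pandas as pd

# Invented sample values in the layout implied by the format string.
all_data = pd.DataFrame({"Order Date": ["04/19/19 08:46", "12/30/19 00:01"]})

# With an explicit format pandas does not have to infer the layout for each value.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")

# Datetime columns expose their parts through the .dt accessor.
all_data["Hour"] = all_data["Order Date"].dt.hour
all_data["Minute"] = all_data["Order Date"].dt.minute

print(all_data)
```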

    • @rotan90
      @rotan90 1 year ago +1

      on google colab it was like 30 sec vs 2 sec. Great tip !

  • @anthonygonsalvis121
    @anthonygonsalvis121 4 years ago +90

    Love how this cool dude researches solutions on the fly and explains things as he goes even when he commits minor unforced errors. He is so relatable. His other tutorials on Pandas, Numpy, Matplotlib, etc. are equally helpful. I wish him all the success and hope that he continues to share his knowledge for decades to come.

    • @chineduezeofor2481
      @chineduezeofor2481 4 years ago

      He's such a GREAT tutor!!!

    • @indrajeetsinghyadav876
      @indrajeetsinghyadav876 2 years ago

      Agreed totally relatable and helpful videos for beginners giving them a chance to know what error can happen due to what syntax errors. Thanks for the informative guide.

  • @akosasuke5128
    @akosasuke5128 2 years ago +1

    I get the feeling in this video that you know more than you're letting on but you're just trying to make things as basic as possible and I love it. I hope to teach others in this same manner. God bless you

  • @billyjorrosh9394
    @billyjorrosh9394 4 years ago +918

    "I dont know how to do it, but i know how to google it." this guys knows how things going in real world haha

    • @thanhnhando3070
      @thanhnhando3070 4 years ago +60

      Googling is, indeed, one of the most important skills for coding.

    • @indexima6517
      @indexima6517 3 years ago +1

      Hahaha! We invite you to take a look at our videos which deal with the same topics :)

    • @carlurbananimals
      @carlurbananimals 3 years ago +11

      His very fast too, like I would need to know it, coz once I go to google im there for 4 hours :/

    • @samirvinchurkar8226
      @samirvinchurkar8226 3 years ago +5

      I did the exact same process be it R, Matlab or Py

    • @samirvinchurkar8226
      @samirvinchurkar8226 3 years ago +2

      @@carlurbananimals that's coz your question isn't exactly right ;)

  • @cusescholar3582
    @cusescholar3582 7 months ago

    This is the best data science class on the net (that I have seen, of course). We are solving real problems, using google, and working with datasets that require a lot of preprocessing. Perfect.

  • @Yayaloy9
    @Yayaloy9 4 years ago +10

    At 50:10 for anyone who wants to use .unique(), when you calculate the sales for each city make sure to throw in a .reset_index() in there, it will reset the indexes and your bar is going to be alright.
    cityy=all_data.groupby("City").sum().reset_index()
    then you do the rest like him, you can also throw in ascending order in there as well, just follow the rest of his instruction.
    cityy=all_data.groupby("City").sum().reset_index().sort_values("Sales",ascending=False)
    xxx=cityy["City"].unique()
    plt.bar(xxx,cityy["Sales"])
    plt.ylabel("$$$")
    plt.xlabel("Cities")
    plt.xticks(xxx, rotation='vertical', size=8)
    plt.show()
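
    The same reset_index() idea as a compact, plot-free sketch on invented rows. Selecting only the `Sales` column before summing is my own assumption, made so that text columns are not summed. Labels and heights come from the same rows of one frame, so their lengths always match.

```python
import pandas as pd

# Invented rows standing in for the sales data.
all_data = pd.DataFrame({
    "City": ["Boston", "Austin", "Boston", "Dallas"],
    "Sales": [10.0, 5.0, 7.5, 3.0],
})

# reset_index() turns the group key back into a regular "City" column,
# so labels and totals live side by side in one frame.
city_sales = (
    all_data.groupby("City")["Sales"].sum()
    .reset_index()
    .sort_values("Sales", ascending=False)
)

labels = city_sales["City"].tolist()
heights = city_sales["Sales"].tolist()

print(labels)
print(heights)
```

    A "shape mismatch" ValueError from plt.bar generally means the label list and the height list have different lengths, which is easy to rule out when both are read from the same frame like this.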

    • @smackedup7657
      @smackedup7657 1 year ago

      thanks a lot

    • @rezwanmehedad2095
      @rezwanmehedad2095 1 year ago

      unfortunately, I am getting a ValueError. Any idea how I can solve this:
      ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (10,) and arg 1 with shape (12,).
      I havent got any proper answer from google or maybe not an expert enough to understand :p.

  • @abdulqadirtinwala1296
    @abdulqadirtinwala1296 4 years ago

    Dude, literally I have never seen anyone solving real world problems on YouTube. Your way of teaching is quite impressive. Many YouTubers just showcase basic problems. But hats off to you!!!

  • @olajiireolajide
    @olajiireolajide 3 years ago +7

    Love how realistic and down to earth all your videos are! Makes data analysis way more approachable. What a guy!

  • @FrancisBaconthe3rd
    @FrancisBaconthe3rd 4 years ago

    Didn't watch more than a few minutes since I already know how to do most of this stuff but loved how the dude straight up tells us to google it. SO TRUE!!! I've had professors who tell me the same thing. Thumbs up.

  • @羅俊洪
    @羅俊洪 6 months ago

    I just enter data analysis area and amazing this videos made 4 years before already! thanks for made this, learnt your skills and problem solving as talents, appreciated!

  • @matty5ps444
    @matty5ps444 2 years ago +2

    just to add to what most people are saying, this is in my opinion the best way to do a tutorial. you showed me that even though im a super beginner and not long coming out of learning basic python things im able to pick up something really easily while realising that i dont have to feel bad thinking everyone else is better than me and that even experienced programmers google stuff and actually are not gods sitting on pedestals acting like they are better than us haha. great work

  • @ijbarraza
    @ijbarraza 4 years ago +11

    As a new learner of python I found this to be one of the best videos on youtube for beginners. How he managed to deal with the problems and solve them on the go (not knowing it all, but knowing how to consult google for the right answer). Way to go! Loved the approach and how easy you made it look

  • @katherinenavarrohansen2748
    @katherinenavarrohansen2748 3 years ago

    I write from Denmark, but I'm Chilean, I followed all the steps and really everything is very clear, I loved your explanations of each task and each question

  • @МаксимБазанов-и9э
    @МаксимБазанов-и9э 4 years ago +5

    Content of this quality deserves far more recognition. Thank you!

  • @kepenge
    @kepenge 1 year ago +3

    It's been three years since the video was posted. For anyone watching now, as I am: one way of getting the month names from Order Date is to convert Order Date with to_datetime and use dt.month_name() for the Month column. One other thing to remember is to clean the data before starting the analysis.
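
    A minimal sketch of that suggestion on invented rows. The date layout is an assumption based on format strings quoted elsewhere in this thread; the numeric month is kept alongside the name because it sorts chronologically.

```python
import pandas as pd

# Invented sample values; the layout is assumed to be MM/DD/YY HH:MM.
all_data = pd.DataFrame({"Order Date": ["04/19/19 08:46", "12/30/19 00:01"]})

order_dates = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")

# Numeric month (sorts correctly) and readable month name (nicer on charts).
all_data["Month"] = order_dates.dt.month
all_data["Month Name"] = order_dates.dt.month_name()

print(all_data)
```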

  • @sushiplatter5540
    @sushiplatter5540 3 years ago +12

    Keith, you're literally the most underrated and one of the best teachers on youtube. This exercise cleared most of my doubts about Data Science and i fell in love with it because of you. Thank you so much for this, you're the best!

  • @OK-Computer
    @OK-Computer 4 years ago +44

    Great video! At the beginning it is much more concise to do this and concatenate all csv files into one like this (better to put ipython notebook csv files in the same directory and then):
    files=[f for f in os.listdir("./") if f.endswith('.csv')]
    df=pd.concat(pd.read_csv(i) for i in files)
    THAT'S IT!

    • @muhammadbashirmuhammad5529
      @muhammadbashirmuhammad5529 4 years ago +1

      Thats better thanks

    • @subho1766
      @subho1766 4 years ago +1

      monthly_dataframes = [pd.read_csv(file) for file in glob.glob(filePath + "*.csv")]
      merged_dataframe = pd.concat(monthly_dataframes)
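
      A runnable sketch of the glob-plus-concat approach. It writes two tiny stand-in files to a temporary folder first, so the file names and contents here are assumptions rather than the real monthly data.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two tiny CSV files in a temporary folder (stand-ins for the monthly files).
folder = tempfile.mkdtemp()
pd.DataFrame({"Order ID": [1, 2]}).to_csv(
    os.path.join(folder, "Sales_January_2019.csv"), index=False
)
pd.DataFrame({"Order ID": [3]}).to_csv(
    os.path.join(folder, "Sales_February_2019.csv"), index=False
)

# sorted() makes the file order deterministic across operating systems.
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

# ignore_index=True renumbers the rows instead of repeating 0, 1, ... per file.
all_data = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)

print(all_data)
```

      Using os.path.join avoids the missing-separator problem that plain string concatenation of folder and pattern can cause.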

    • @bartproffitt5240
      @bartproffitt5240 3 years ago +1

      thank you so much i have been battling no such directory all morning

    • @jeisonsanchez4842
      @jeisonsanchez4842 2 years ago +2

      Also consider adding a condition to skip the first row of each subsequent file - to avoid duplicate headers.

  • @manhaabdellah2682
    @manhaabdellah2682 1 year ago

    Im new to data analysis. My instructor always tells us to search our questions on google and get help from stack overflow. I didnt understand it till now and got stuck on my second project for sales analysis. This helped me big time!!! I'm so thankful to you for telling all those shortcuts. The data time split had such a long tricky code online.

  • @hoiying-chan
    @hoiying-chan 4 years ago +11

    Your assignments are harder than Coursera's. I'm actually learning something. Major thanks all the way from Holland! 🙏

  • @ziephk
    @ziephk 3 years ago

    omg!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! i searched for so much "a day in a life of a data science" thinking they would show a glimpse of reality. and this is the best portrayal AND simultaneously one of the best tutorial video. YOU ARE A LEGEND!!!!

  • @Account-fi1cu
    @Account-fi1cu 4 years ago +4

    Great tutorial! thank you for sharing
    In 50:26 for cities: can always use the index values from 'results' DF:
    cities = results.index.values
    instead of a for loop

  • @andre__442
    @andre__442 2 years ago +2

    if every human being on earth had the will and disposition to teach like Keith... the world would be a 99% better place

  • @ujjawaljani6731
    @ujjawaljani6731 4 years ago +143

    He is like my friend who teachs one day before exams. 😂😅

  • @Josh-di2ig
    @Josh-di2ig 4 years ago

    you're the best. not only are you teaching people how to use python Pandas lib, but you're also teaching the type of hat you should be wearing when solving real world problems! kudos x 10000

  • @H99x2
    @H99x2 2 years ago +9

    Dude, this is by far one of the best real-life tutorials on YT. Subbed for more like this!

  • @mikeyu6347
    @mikeyu6347 1 year ago

    I was absolutely blown away by the fantastic lectures. The best teacher I've ever had!

  • @anubhkumar8824
    @anubhkumar8824 4 years ago +8

    34:34
    Pro tip: go to command mode (press Esc) and press 'b' to make cells below current cell or 'a' to make cells above

    • @KeithGalli
      @KeithGalli  4 years ago +6

      Thanks for the tips! Love when people comment helpful stuff like this :). Just started using command mode to easily switch cells from code to markdown, will have to add these two commands to the arsenal as well!

    • @FlyingMonkeis
      @FlyingMonkeis 4 years ago +1

      j and k will move focus to the cell below or above, and you can pair these with Shift to extend the selection and then press Shift+M to merge the highlighted cells. So Shift+J followed by Shift+M will merge the current cell with the one below it. 'dd' will delete a cell also! (these bindings are very vim-like)

    • @christopherlyons7613
      @christopherlyons7613 4 years ago

      Think that's reversed. Use 'b' to make cells above and 'a' to make cells below.

  • @hajarja4512
    @hajarja4512 9 months ago

    Hey Keith , thank you so much for this video
    concerning the 4th question 'What products are most often sold together?'
    i kinda had a similar approach and I got same order of grouped products when i counted the values using .value_counts(). However, the values themselves were different !
    here is my approach
    order_grouped = months_purchase.groupby('Order ID')
    def concatenate_strings(x):
        return x.str.cat(sep=',')
    products = pd.DataFrame(order_grouped['Product'].agg(concatenate_strings))
    combined_items = products[products['Product'].str.contains(',')]
    combined_items.value_counts().head(10)
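
    One possible reason for the differing values, offered as a guess: value_counts() on the joined string counts whole baskets, whereas counting pairs with itertools.combinations and collections.Counter (the tools named in the video timeline) counts every pair inside each basket. The sketch below contrasts the two on invented orders; the product names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Invented orders: order 1 has two items, order 2 has three, order 3 has one.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 2, 2, 3],
    "Product": [
        "iPhone", "Lightning Charging Cable",
        "iPhone", "Lightning Charging Cable", "Wired Headphones",
        "USB-C Charging Cable",
    ],
})

# One comma-joined string per order, keeping only multi-item orders.
grouped = orders.groupby("Order ID")["Product"].agg(",".join)
multi = grouped[grouped.str.contains(",")]

# Counting whole baskets: each distinct joined string is its own key.
basket_counts = multi.value_counts()

# Counting pairs: every 2-item combination inside a basket is counted.
pair_counts = Counter()
for basket in multi:
    pair_counts.update(combinations(basket.split(","), 2))

print(basket_counts)
print(pair_counts.most_common(3))
```

    Here the iPhone and cable pair is counted twice by the pair approach but only once as an exact two-item basket, so the ranking can agree while the numbers differ.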

  • @MicahJohns
    @MicahJohns 3 years ago +9

    23:39 that duplication was because of the header rows in each of the files. I've dealt with this a lot. You would have had to exclude those header rows from each file before concatenating them all together to resolve this.
    Great video course man, thank you for making all of that content

    • @vertik3895
      @vertik3895 3 years ago

      I just did what he did and all I am getting is the header rows, what's the solution?

    • @oscardyremyhr5948
      @oscardyremyhr5948 3 years ago +1

      @@vertik3895 load first df as normal and proceeding df´s as pd.read_csv('file2.csv', skiprows=1) before concat

    • @eduardosa9658
      @eduardosa9658 2 years ago

      @@vertik3895 The solution is call the method read_csv(..., header=None) for each iteration
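
      Another hedged option, in the spirit of the "Remove rows based on condition" step in the video timeline: read the data normally, then filter out any row that merely repeats the column names. The in-memory CSV and its column names below are invented for the sketch.

```python
import io

import pandas as pd

# An invented combined file in which the header line shows up again as a data row.
raw = io.StringIO(
    "Order ID,Product,Price Each\n"
    "1,iPhone,700\n"
    "Order ID,Product,Price Each\n"
    "2,Wired Headphones,11.99\n"
)
all_data = pd.read_csv(raw)

# Keep only rows whose "Order ID" cell is not the literal column name.
all_data = all_data[all_data["Order ID"] != "Order ID"].copy()

# The stray header forced the column to text, so convert it back to numbers.
all_data["Price Each"] = pd.to_numeric(all_data["Price Each"])

print(all_data)
```

      This works regardless of where the repeated header rows sit in the file, which is useful when they are scattered rather than only at the top.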

  • @florenthoti9101
    @florenthoti9101 4 years ago

    As a beginner in Data science with Python, I find you as the best youtuber in this field.
    Good Job!

  • @leec8977
    @leec8977 2 years ago +3

    1:09:28 you can use df=df.groupby('Order ID')['Product'].apply(','.join) instead of those three lines. Thanks for this video, it was great for me.

  • @kafaayari
    @kafaayari 3 years ago +16

    When passing a function to apply, you could have just passed the function name, there's no need to do apply(lambda x:get_city(x)). This is just enough and better => apply(get_city)

    • @MattHuisman
      @MattHuisman 2 years ago +2

      Came here to make sure someone said this! As long as the function you pass only takes a single argument. Otherwise lambda x: my_func(x, other_arg)
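
      Both cases can be sketched like this. The `Purchase Address` column, the sample addresses, and the helper `get_part` are invented for illustration; `get_city` is the function name mentioned above, with a body assumed here.

```python
import pandas as pd

def get_city(address):
    # Assumed address layout: "street, city, state zip".
    return address.split(",")[1].strip()

def get_part(address, position):
    # Hypothetical two-argument helper: pick one comma-separated part.
    return address.split(",")[position].strip()

df = pd.DataFrame({
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
    ]
})

# Single-argument function: pass the function object itself, no lambda needed.
df["City"] = df["Purchase Address"].apply(get_city)

# Extra argument: wrap the call in a lambda ...
df["Street"] = df["Purchase Address"].apply(lambda x: get_part(x, 0))

# ... or hand the extra positional arguments to apply via args=.
df["State ZIP"] = df["Purchase Address"].apply(get_part, args=(2,))

print(df)
```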

  • @jack.1.
    @jack.1. 4 years ago

    Thank you, there are tons of brilliant programmers on youtube but only a few programmers who are good communicators and teachers.

  • @Magmatic91
    @Magmatic91 4 years ago +20

    I love how this guy is explaining, I really enjoyed learning from you.

  • @hamishdosiad5764
    @hamishdosiad5764 2 years ago

    mate, you're a legend! not only did I learn matplotlib and pandas but now I know my pokemon too, tip of the hat!

  • @oluwadamilaretijani1777
    @oluwadamilaretijani1777 2 years ago +7

    Your courses are very great as you delve into practical content. Your course helped me to pass data analysis test in Turing. Thank you so much

    • @akosasuke5128
      @akosasuke5128 2 years ago

      Congrats oludamire, I'm guessing you're a Nigerian. I'm a Nigerian too and recently got into Exploratory Data Analysis through the udacity Nanodegree program. I'm currently on my second project which is an Investigation of WeRateDogs Twitter dataset. I think I have learnt a thing or two so far. Do you think I'm ready for Turin?..i hear it's like going to the big leagues lol.

  • @AndyRhye
    @AndyRhye 4 years ago

    Man, I really like your style. Firstly, because you take real world problems and not some primitive stuff like some other bloggers, secondly, because you encourage your viewers to search for solutions themselves, and, thirdly, because you show how to find a solution to a certain problem on the Internet. Please keep doing similar videos! With best wishes and sincere appreciation from Ukraine.

  • @dawnfantasy
    @dawnfantasy 4 years ago +6

    50:47 cities = result.Sales.keys() works as expected. great tutorial, tks!

  • @kelvingitari
    @kelvingitari 2 years ago

    Best data analysis video I have watched so far! I also love how most people in the comment sections have outlined alternative ways of approaching some of the tasks.

  • @royvivat113
    @royvivat113 4 years ago +7

    This is the most informative video I've ever seen on what data science actually is! I keep looking for actual applications and I loved seeing your thought process, comments, and method of asking and answering questions.

  • @GhizlaneBOUSKRI
    @GhizlaneBOUSKRI 4 years ago

    The first time I let the ads on a youtube video, because I wanted to watch every second of it. Many thanks Keith, you' re just amazing !

    • @KeithGalli
      @KeithGalli  4 years ago +1

      I appreciate the kind words! Glad you enjoyed :)

  • @Random_dudebro
    @Random_dudebro 4 years ago +24

    I just finished your two videos demonstrating numpy and pandas, finally feeling a good grasp of python basics (y)
    Thank you for everything you do!

  • @mclovin7300
    @mclovin7300 2 years ago +1

    Dude!! You are awesome at teaching data science. You make the world better

  • @devmrin
    @devmrin 4 years ago +21

    Hands down one of the most useful I've seen. Insights galore. Thank you!

  • @tanmaysinghi1868
    @tanmaysinghi1868 2 years ago +1

    i would greatly appreciate another similar video with a new project with some newer formulas and features, maybe understanding heatmaps, creating more complex functions etc. Thanks again for this.

  • @rezap1356
    @rezap1356 4 years ago +7

    The best graph type for correlation is 'scatter graph', looks like a constellation. Great video Keith. Thanks.

  • @DataScienceMAHAMAT
    @DataScienceMAHAMAT 6 months ago

    This is the most practical Python tutorial video I've ever watched. Thanks for sharing!

  • @DarshanMalu
    @DarshanMalu 4 years ago +17

    You are awesome! Thanks for patiently explaining everything, also teaching how to google what you want! Thanks man!

  • @it_is_ni
    @it_is_ni 3 years ago

    The dataset contains January-data for both 2019 and 2020, so the grouping by month doesn't work because you only look at the month, not the year.
    Stopgap solution: also slice the year off the date string
    Proper solution: convert date string to an actual datetime, then groupby month with pd.Grouper.
    I suggest putting a card or a note there so others aren't confused.
    Thanks for the video though!
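
    The "proper solution" described above might look like this sketch. The rows are invented and the `Sales` column is assumed to exist already. The "MS" (month start) frequency is my own choice for the example, not something the comment specifies.

```python
import pandas as pd

# Invented rows: two orders in January 2019 and one in January 2020.
all_data = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["01/05/19 10:00", "01/20/19 12:30", "01/01/20 03:07"],
        format="%m/%d/%y %H:%M",
    ),
    "Sales": [100.0, 50.0, 25.0],
})

# Grouping on the month number alone merges both Januaries into one bucket.
by_month_only = all_data.groupby(all_data["Order Date"].dt.month)["Sales"].sum()

# pd.Grouper with a monthly frequency keeps each calendar month separate.
by_year_month = all_data.groupby(pd.Grouper(key="Order Date", freq="MS"))["Sales"].sum()

print(by_month_only)
print(by_year_month[by_year_month > 0])
```

    Months with no orders appear in the Grouper result with a sum of zero, hence the filter in the last line.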

  • @SaulOjeda
    @SaulOjeda 3 years ago +13

    this video was amazing, I can't believe I actually sat through the whole thing past my bedtime

    • @exploringwithdave5926
      @exploringwithdave5926 3 years ago

      If you are a coder, there is no such thing as "bedtime". Just, awake, and not awake.

  • @JulioSerratos
    @JulioSerratos 1 year ago

    Really good job that really give us a real daily solving problem. I’m sure most of us resolve problems as this way, googling, prove an error. I do not understand why in Interviews they expect you know everything about the Language, Algorithms and Syntax.

  • @yaswanthfinds
    @yaswanthfinds 4 years ago +18

    so nice I was searching this kind of tutorial, it has real-time mistake and solution,I hope you do this kind of videos regularly

  • @ryanmugo4206
    @ryanmugo4206 6 months ago

    i would give this guy a 10/10...truly understood everything

  • @francescofaccia
    @francescofaccia 4 years ago +32

    Hi Keith, you're great! thanks to you we can be introduced to a hell of a lot of useful pandas tools! keep up the good work!

  • @ahmetsenol6104
    @ahmetsenol6104 1 year ago

    I even liked the name of the video. Straight to the point. I said "YESS IVE BEEN LOOKING FOR THIS" perfect. Thanks.

  • @JoaoOliveira-wh1tp
    @JoaoOliveira-wh1tp 4 years ago +5

    Great video. Just a few suggestions:
    At 4:25, when using os.listdir('./'), this already returns a list, so wrapping it in [file for file in os.listdir(...)] is redundant.
    At 40:50 you don't need the lambda function, even if you want to access a cell's content. If you simply pass a reference to the function, apply calls it with each value as its argument. Example:
    def modify(a):
        return 'CHANGED ' + a + ' CHANGED'
    df['Column'].apply(modify) # modify without parentheses is the reference to the function.

    • @mahermonirify
      @mahermonirify 4 years ago

      could u please help : why i'm getting path error when i did try to use os.listdir but not when i opened a specific file to read?

    • @enaba
      @enaba 5 months ago

      @@mahermonirify hello i'm getting path error too can you please tell how do i resolve it?

  • @dp6736
    @dp6736 1 year ago +1

    Hi Keith, Even after three years, this video is very useful. You are very good at explaining the concepts. Thank you very much

  • @karimkhatib8569
    @karimkhatib8569 3 years ago +3

    Really interesting to go through the entire process, including looking up solutions and solving errors!

  • @stefanlasek3256
    @stefanlasek3256 4 years ago

    Honestly, one of the best videos I have seen. From mistakes, how to look for answers and little tips & tricks.
    You have got new subscriber in me.

  • @omrieliyahulevy7985
    @omrieliyahulevy7985 4 years ago +6

    Great tutorial, I've learned a lot!
    a suggestion for you first question for the best month for sales:
    Instead of creating the extra cols of 'month' and 'sales' we can use the pandas "resample" method which does the group by month for us, and just like in the groupby method we close it with the "sum" and we get the same table!
    all_data.resample('M', on='Order Date').sum().sort_values(by='Price Each', ascending=False)

    • @Yayaloy9
      @Yayaloy9 4 years ago

      But here's the problem: Order Date is not a datetime type, so you have to convert it first.
      all_data["Order Date"]= pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")
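
      Putting the two comments together as a sketch on invented rows: convert first, then resample. Two choices here are my own assumptions rather than anything from the thread. A `Sales` column (quantity times price) is summed instead of `Price Each`, and the "MS" month-start frequency is used in place of 'M'.

```python
import pandas as pd

# Invented rows; column names follow the ones used in this thread.
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/20/19 09:00", "05/01/19 10:15"],
    "Quantity Ordered": [2, 1, 3],
    "Price Each": [10.0, 5.0, 1.0],
})

# resample needs a real datetime column, so convert the strings first.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]

# One row per calendar month, best month first.
monthly = (
    all_data.resample("MS", on="Order Date")["Sales"]
    .sum()
    .sort_values(ascending=False)
)

print(monthly)
```

      Because resample buckets by full calendar month, it also keeps the same month of different years apart.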

  • @Abdullahkbc
    @Abdullahkbc 2 years ago +1

    You are great Keith. You are doing it in a manner that most students can understand better.

  • @KeithGalli
    @KeithGalli  2 years ago +3

    I'm launching a data analytics bootcamp!
    goto.masterschool.com/5wn3sw
    Some highlights of the program:
    - Fully remote (with flexible working hours)
    - No tuition fees until after you land a job in tech
    - Open to applicants anywhere in the world!
    This is a 7-month long program kicking off in June. To learn more and get your application started, click the link above ⬆

  • @StanleySI
    @StanleySI 3 years ago

    Very practical analysis on real data. As a beginner, I have to pause, research and learn each question separately.

  • @jenn6997
    @jenn6997 4 years ago +8

    You are always so passionate and enthusiastic even if there're errors haha :) Love your positive attitude! Look forward to more great videos!! :)

    • @masthanjinostra2981
      @masthanjinostra2981 3 years ago

      I get tensed like in hell..

    • @geekyprogrammer4831
      @geekyprogrammer4831 3 years ago

      he purposely introduced those errors for us to have real-life problem-solving experience :)

  • @rishabhdewangan6520
    @rishabhdewangan6520 3 years ago +2

    One of the easy simple and best ways to approach data analysis
    This is my first time watching you sir and Im already a sincere subscriber while(True): Do watch, learn and grow under your guidance
    You are Awesome

  • @Scratchmex
    @Scratchmex 4 years ago +10

    22:00 I think it is more reliable to parse the date column as a datetime type to avoid all these problems

    • @stevejuso
      @stevejuso 4 years ago

      pd.to_datetime did not work for me on this data. How did you use it? I get an error

    • @SiIentFire
      @SiIentFire 3 years ago

      @@stevejuso Really late reply, but just in case it helps someone.
      You can tell the read_csv function to read a column as a date by passing in parse_dates=['col1', 'col2'] for any amount of columns.
      You can tell it to use European format with dayfirst=True
      And if you need a specific format you can use date_parser to give your own parser for a specific format.
      So in my case it was:
      df = pd.read_csv('filepath', parse_dates=[datecols], dayfirst=True) to get the cols I needed into European date format.
      One key thing is that it converts the dates to pandas Timestamps. But they are interchangeable with Python datetimes almost all of the time. They can also be converted with an .apply(lambda x: x.to_pydatetime()) if you need.
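
A small sketch of the read_csv approach described in this reply, using an in-memory CSV so it runs on its own. The column names and dates are invented for the example, and the day-first layout is an assumption about what "European format" means here.

```python
import io

import pandas as pd

# Hypothetical CSV with day-first dates (25/12/2019 is 25 December).
csv_text = (
    "order_id,order_date\n"
    "1,25/12/2019 08:30\n"
    "2,03/01/2020 17:45\n"
)

# parse_dates names the columns to convert; dayfirst=True reads 03/01 as 3 January.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"], dayfirst=True)

print(df.dtypes)
print(df["order_date"].dt.month.tolist())

# A single pandas Timestamp can be turned into a plain Python datetime if needed.
first_as_datetime = df["order_date"].iloc[0].to_pydatetime()
print(type(first_as_datetime))
```

If the column still comes back as plain text, that usually means some rows did not match a date at all, which is worth checking before anything else.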

  • @iunknown563
    @iunknown563 3 years ago

    All the errors that were driving me nuts are resurfacing here and being handled nicely! Such a treat:)!

  • @abhishek_raj
    @abhishek_raj 3 years ago +7

    Keith: I am gonna snatch the first two digits and make it the month.
    The data: Hold my NaNs !

  • @williamhendro9177
    @williamhendro9177 3 years ago

    i don't know, man. i think this is one of the very best channels in all platforms (not only youtube)

  • @rafacardenas8783
    @rafacardenas8783 4 years ago +4

    great job Keith!, keep up with the walk-through-style tutorials, hands on is the best and even better when you have the feedback.

  • @itsReshad
    @itsReshad 2 years ago

    Man I don't know how to compliment you but, you teach, explain super well
    I have learned python pandas from you, and have used it for other projects of mine
    Keep up the good content, very valuable

  • @berkayozkan2631
    @berkayozkan2631 3 years ago +6

    I love how he freaks out whenever there is a small warning lol

  • @hamidnikbakht1295
    @hamidnikbakht1295 3 years ago +1

    You get those headers multiple times because when you concatenated the files from different months, the headers from each file were also included in the concatenation!
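
However the repeated header rows end up in the combined table, one way to drop them is to filter out rows whose value equals the column name. This is only a sketch on made-up data; the column names follow the video, and reading everything as text first is an assumption about how such rows would show up.

```python
import pandas as pd

# Hypothetical combined table where a header row has slipped in as data.
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "Order Date", "05/02/19 13:10"],
    "Price Each": ["11.95", "Price Each", "99.99"],
})

# A stray header row has the column name itself as its value.
is_header_row = all_data["Order Date"] == "Order Date"
cleaned = all_data[~is_header_row].copy()

# With the text rows gone, the price column can be converted to numbers.
cleaned["Price Each"] = pd.to_numeric(cleaned["Price Each"])
print(cleaned)
```

Taking a .copy() of the filtered frame also avoids the chained-assignment warning when new columns are added afterwards.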

  • @a.yashwanth
    @a.yashwanth 4 years ago +4

    Checking the length of the DataFrame helps, instead of saving it to a CSV file and verifying there.
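
A quick sketch of that check, with two invented monthly frames standing in for the real files: after concatenating, the row count should equal the sum of the parts.

```python
import pandas as pd

# Hypothetical monthly tables (the real ones would come from pd.read_csv).
january = pd.DataFrame({"Order ID": [1, 2, 3]})
february = pd.DataFrame({"Order ID": [4, 5]})

all_data = pd.concat([january, february], ignore_index=True)

# The combined length should match the sum of the individual lengths.
expected_rows = len(january) + len(february)
print(len(all_data), expected_rows, all_data.shape)
```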

  • @TheMaltesemania
    @TheMaltesemania 4 years ago

    I feel like I struck gold with this video. It's helping me learn a lot quicker than online tutorials. Thank you!

  • @JohnnyRottenest
    @JohnnyRottenest 4 years ago +5

    50:00, use result.index as x values and x ticks.
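
To illustrate why the index is the safer source of labels, here is a minimal sketch on invented data. The "City" and "Sales" names are assumptions modeled on the video; the point is only that the grouped result and its index are always in the same order, while unique() follows the order of first appearance.

```python
import pandas as pd

# Hypothetical sales rows; cities appear in a different order than alphabetical.
all_data = pd.DataFrame({
    "City": ["Boston", "Austin", "Boston", "Seattle", "Austin"],
    "Sales": [100.0, 50.0, 25.0, 300.0, 10.0],
})

results = all_data.groupby("City")["Sales"].sum()

# Labels and values taken from the same object cannot get out of step.
cities = results.index.tolist()
sales = results.tolist()

# For comparison: unique() keeps first-appearance order, which differs here.
first_seen_order = all_data["City"].unique().tolist()

print(cities, sales)
print(first_seen_order)
# These two lists could then be passed to plt.bar(cities, sales).
```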

  • @vickyzhang820
    @vickyzhang820 2 years ago +1

    Sooooo fantastic!!!
    This is definitely the best Data Project video I've seen on RUclips!

  • @arnopisspot5115
    @arnopisspot5115 4 years ago +4

    this video was super interesting. I can certainly watch 10 more of these!

  • @Eren53-z5b
    @Eren53-z5b 3 months ago

    The best thing I have done today is finding this man. I was eating and chilling, saw the thumbnail of this video with pandas in the name, and thought, let's see for 5 minutes what he has to say. But believe me, guys, I have already watched up to 1:02:57 of the video, and it is getting more interesting as it goes towards the end. Kudos to his technique ❤❤❤

  • @Doorshlak
    @Doorshlak 4 years ago +12

    This channel is the best thing I've encountered in a while. Thank you for helping the desperate ;-; Would do 5 likes if I could

  • @RSUtsha
    @RSUtsha 2 years ago

    This video is really good, not only the solutions but the process of getting to the solutions shown is what makes it so good...!

  • @vikram3297
    @vikram3297 4 years ago +4

    32:15 You created the months list passed to plt.bar() out of thin air. In this case the data comes out sorted by month, so there is no issue; otherwise Sales would have been plotted against the wrong month. I tried this instead; please let me know if I'm wrong about it:
    all_data.groupby('Month')['Daily Sale'].sum().plot(kind='bar')
    plt.show()

    • @naishkiteboarder
      @naishkiteboarder 4 years ago

      I think the groupby function sorts by month, so the keys will be 1 through 12, the same as the new months variable

    • @naishkiteboarder
      @naishkiteboarder 4 years ago

      Monthss = [month for month, df in All_Data.groupby('Month')]
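
The claim in this thread can be checked on a small, deliberately shuffled table. This is a sketch with invented numbers; "Month" and "Sales" are assumed column names. groupby sorts its keys by default, so the keys from iterating the groups match the index of the summed result.

```python
import pandas as pd

# Hypothetical rows with months out of order on purpose.
all_data = pd.DataFrame({
    "Month": [12, 1, 4, 1, 12, 4],
    "Sales": [400.0, 100.0, 250.0, 50.0, 600.0, 150.0],
})

# Sum per month; the result's index holds the month keys in sorted order.
monthly = all_data.groupby("Month")["Sales"].sum()

# The list comprehension from the reply above yields the same keys.
months = [month for month, _ in all_data.groupby("Month")]

print(months)
print(monthly.index.tolist())
print(monthly.tolist())
```

Because the labels come from the grouped data itself, a month missing from the data would simply be absent rather than shifting every bar by one.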

  • @rafaelmachado7666
    @rafaelmachado7666 1 year ago +2

    Amazing video ! All the mistakes and the searching process make the beginners in data science realize that it's possible to do a lot of things since the start of the journey. Thanks

  • @ng4logic
    @ng4logic 4 years ago +18

    58:22 I heard that

  • @datascience_azamat
    @datascience_azamat 4 years ago

    I just wanted to see how DS works and after searching and watching a lot of videos this one is very understandable and very real one. Thanks author! Great job!

  • @nishantbanjade920
    @nishantbanjade920 4 years ago +11

    I like the way you say at every mistake: "AAAAh, what did I do" lol :D xD

    • @Jack-xy4fy
      @Jack-xy4fy 4 years ago +1

      hahaa it made me laugh because i do the exact same thing

  • @GoogleUser-nx3wp
    @GoogleUser-nx3wp 2 years ago

    Thanks for the data science tutorials. They help me a lot in my labs; I couldn't have done them without you

  • @alfiomarra9207
    @alfiomarra9207 2 years ago

    I definitely prefer to watch your tutorials instead of netflix...I love this format, thanks man 😊

  • @priyankasagwekar3408
    @priyankasagwekar3408 4 years ago

    At 51:16 you could have simply passed results['City'] as the x-axis argument. Thank you so much. Looking forward to more real-time analysis like this one. It was a really cool hands-on exercise.

  • @jeffmiller7010
    @jeffmiller7010 3 years ago

    I enjoyed working through this real world data analysis problem with you. I look forward to more, please do more problems like this. It helps me to work out problems in Python.

  • @debasishkar761
    @debasishkar761 3 years ago +2

    That tutorial was really helpful for getting a first grasp of DS applications. Please make more such "real world DS solutions", like airline data, travel data, company profit vs. strategy data, hotel service data, salary vs. domain age datasets, etc.