Matplotlib Tutorial (Part 2): Bar Charts and Analyzing Data from CSVs

  • Published: Aug 21, 2024

Comments • 347

  • @Sharmapawan98
    @Sharmapawan98 5 years ago +325

    Who needs python docs when you have such an amazing teacher

  • @coreyms
    @coreyms  5 years ago +116

    I hope everyone finds this video helpful. The next video of the series will be posted tomorrow at the same time. The next video will cover how to create pie charts.
    I'd like to thank Brilliant for sponsoring this series. If you'd like to check them out then you can sign up with this link and get 20% off your premium subscription:
    brilliant.org/cms

    • @dhananjaykansal8097
      @dhananjaykansal8097 5 years ago

      As usual lovely!!!!!!!

    • @tamasasztalos7484
      @tamasasztalos7484 5 years ago

      It's a great tutorial; the only thing I missed was how to add total values on top of each bar (which can be trickier for a stacked bar chart).
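
For the bar totals asked about above, a minimal sketch follows. It assumes Matplotlib 3.4 or newer, where `Axes.bar_label` is available; the data values are made up for illustration and are not from the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt

# Illustrative data only (not the survey numbers from the video)
languages = ["Python", "Java", "SQL"]
popularity = [30, 20, 25]

fig, ax = plt.subplots()
bars = ax.bar(languages, popularity)

# bar_label() writes each bar's value just above the bar (Matplotlib 3.4+)
labels = ax.bar_label(bars, padding=3)

fig.tight_layout()
# plt.show() would display the chart in an interactive session
```

For a stacked chart, one option is to call `bar_label` on the top container only and pass precomputed totals through its `labels` argument, since by default it prints each segment's own height.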

    • @ishanpand3y
      @ishanpand3y 4 years ago

      Thank you, sir, for providing top-class tutorials for free.

    • @JuniorDIEKA
      @JuniorDIEKA 4 years ago

      Hello Corey!
      Please can you advise:
      1. How did you clean the data in the "LanguagesWorkedWith" column to get such clean data?
      2. After I split it and saved it to a separate CSV file, this is the output: [(" 'JavaScript'", 53020), (" 'HTML/CSS'", 39761), (" 'Java'", 29863), ("['Bash/Shell/PowerShell'", 28340), (" 'SQL']", 28178), (" 'Python'", 26185), (" 'PHP'", 20394), (" 'SQL'", 19094), (" 'TypeScript']", 16091), ("['HTML/CSS'", 15322)]
      [Finished in 33.6s]
      3. Given the output above, how can I get the exact total number of occurrences of each language? It does not seem to be doing that.
      Thank you,

    • @JoshKonoff1
      @JoshKonoff1 3 years ago

      Where is the CSV for this? I don't see it in the description. Thank you!

  • @ishanpand3y
    @ishanpand3y 4 years ago +75

    In case you don't know, the shortcut for what happens at 8:13 in Jupyter Notebook is *Ctrl + left mouse click* on the different lines, one by one. You can then type on several lines at the same time.

  • @MrBuion
    @MrBuion 4 years ago +85

    This series is much better than the courses on Udemy I paid for. Thank you very much.

  • @apoorvwatsky
    @apoorvwatsky 5 years ago +81

    23:40 here's that one liner if anybody's interested. Personally, I like this more.
    languages, popularity = map(list, zip(*language_counter.most_common(15)))

    • @costasvas341
      @costasvas341 4 years ago +1

      Really nice! Could you please explain what the "*" symbol does?

    • @paklong2556
      @paklong2556 4 years ago

      nice

    • @jg9193
      @jg9193 4 years ago

      Or just: list(zip(*language_counter.most_common(15))). Map is unnecessary as list() automatically maps over an Iterable

    • @corben3348
      @corben3348 4 years ago +3

      @jg9193 But if you don't use map(list, iterable), then languages and popularity will be tuples, so you cannot use reverse() for the rest of the tutorial. Or, without map: languages, popularity = [list(e) for e in zip(*language_counter.most_common(15))]

    • @jg9193
      @jg9193 4 years ago +2

      @corben3348 Fair point, I didn't think of that. That said, he could just use languages[::-1] instead of languages.reverse() to reverse a tuple
      Then again, using list() would even be unnecessary if he did that
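
On the question of what the `*` does in this thread, a small sketch with a made-up counter may help: `*` unpacks the list of pairs into separate arguments, so `zip` receives each `(language, count)` tuple as its own argument and regroups them by position.

```python
from collections import Counter

# Illustrative counts only (not the survey data from the video)
language_counter = Counter({"JavaScript": 5, "Python": 3, "SQL": 4})

pairs = language_counter.most_common(2)
# pairs is a list of (language, count) tuples, highest count first

# zip(*pairs) is the same call as zip(pairs[0], pairs[1]):
# it groups all first elements together and all second elements together
languages, popularity = map(list, zip(*pairs))

# map(list, ...) turns the resulting tuples into lists,
# so the in-place reverse() used in the tutorial still works
languages.reverse()
popularity.reverse()
```

Without `map(list, ...)` the two results are tuples, which is why slicing with `[::-1]` is suggested above as the alternative to `reverse()`.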

  • @TheShubham67
    @TheShubham67 4 years ago +9

    This series, together with the pandas one, has taken my skills to a new level.

  • @fangshizhu9383
    @fangshizhu9383 3 years ago +1

    At 8:12, when you selected multiple locations and typed the same code on multiple lines simultaneously, my world just expanded!

  • @Ghasakable
    @Ghasakable 5 years ago +16

    Man, you are awesome. Everything I have learned about Python started from your channel. I wish you the very best and every success; you make everyone happy. Keep up the excellent work, we all rely heavily on you.

    • @coreyms
      @coreyms  5 years ago +5

      Thanks! That's very kind of you.

  • @sunshadow9704
    @sunshadow9704 2 years ago +3

    Corey, you are a great teacher. You have a rare ability to explain calmly. I much appreciate your efforts.

  • @bmwmhamam
    @bmwmhamam 4 years ago +6

    Nobody teaches like you. You are the best. Amazing delivery of information, truly useful tutorials. Thank you so much.

  • @SahilKhan-rv9xb
    @SahilKhan-rv9xb 3 years ago +9

    For those wondering how to obtain the CSV file: once you've clicked on it and see all of the data in your web browser, just right-click and choose Save As.

  • @abhishek_raj
    @abhishek_raj 3 years ago +1

    Right from reading data from a csv file to plotting it, you helped a lot of people.

  • @borhansiddiki7079
    @borhansiddiki7079 4 years ago +2

    I think your videos are more understandable than the rest of the YouTube channels.

  • @KienDoanTrung168
    @KienDoanTrung168 5 years ago +5

    such a great Python instructor with an angelic voice. Thank you so much 😊

  • @shuklarahul17
    @shuklarahul17 4 years ago +1

    As you mentioned, zip can also be used:
    language = cnt.most_common(10)
    language.reverse()
    language_X, language_Y = list(zip(*language))
    plt.barh(language_X, language_Y)

  • @shadowmasked7188
    @shadowmasked7188 1 year ago +1

    Thank you very much bro, Greetings from Azerbaijan.

  • @djadamkent
    @djadamkent 5 years ago +14

    Another great video, thank you. A pandas series of videos would be awesome!

  • @miosz952
    @miosz952 4 years ago +4

    The great thing about your tutorials is that, beyond the main topic, you learn a lot of useful tricks, modules, etc.

  • @lalu225
    @lalu225 5 years ago +5

    Excellent tutorial Corey! Real life stuff and practical, including the use of Counter. It's important to show these data preparation steps. Very helpful indeed, thank you.

  • @dalanxd
    @dalanxd 3 years ago

    Corey Schafer saves my life once again...
    Deep gratitude for your work, man!

  • @nicholasmaloof8378
    @nicholasmaloof8378 5 years ago +4

    2 weeks later and still not a single dislike on this video

  • @ahmedskasmani
    @ahmedskasmani 4 years ago +4

    Amazing content, Corey. The way you simplify the material and explain it is awesome, many thanks. Can you please also do a video showing your setup and how you make videos? Thanks!!!

  • @storiesshubham4145
    @storiesshubham4145 2 years ago +1

    I can't express how amazing this video is. What a great teacher you are. 🔥🔥

  • @mohammedismail308
    @mohammedismail308 5 years ago

    Thanks a lot, Corey. Your videos really are an endless treasure.
    Here is a way to plot bar charts for more than one dataset on the same plot without needing NumPy: just use the built-in map function.
    width = 0.25  # width of each bar
    plt.bar(list(map(lambda x: x-width/2, age_x)), salaries1, color = 'k', width = width)
    plt.bar(list(map(lambda x: x+width/2, age_x)), salaries2, color = 'r', width = width)

  • @micheliwrmg
    @micheliwrmg 1 year ago

    Sad fact: if you want to open a CSV file in PyCharm, you have to pay for PyCharm Professional (~$230) :(
    By the way, you are the best teacher I've ever seen.

  • @ItzSenaCrazy
    @ItzSenaCrazy 5 years ago +3

    What I really like is your videos, Corey. I can learn Python and English ;D
    Thanks!!

  • @ericfricke4512
    @ericfricke4512 4 years ago +2

    Programming is so fun.

  • @akunnaemeka395
    @akunnaemeka395 2 years ago

    thank you Brilliant for supporting Corey

  • @dhssb999
    @dhssb999 1 year ago

    best matplotlib tutorial ever!

  • @introduction_official6547
    @introduction_official6547 5 months ago +1

    Very informative video, good job Mr Corey

  • @Thedevineforce
    @Thedevineforce 4 years ago +1

    @Corey Schafer .. I came up with the function below, which handles the bar widths for multiple bar plots by itself, in case anybody wants to use it:
    import numpy as np
    from matplotlib import pyplot as plt

    ages_x = np.asarray([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])
    count = 5  # number of bar series drawn side by side
    width = 0.8/count
    def width_cal(position):
        shift = np.array([])
        if count < 2:
            return ages_x
        if count % 2 == 0:
            # even count: offsets are odd multiples of half a bar width
            for i in range(1, count, 2):
                shift = np.append(shift, (width/2 * i))
            shift = np.sort(np.append(shift, np.negative(shift)))
        else:
            # odd count: offsets are whole bar widths, with 0 in the middle
            for i in range(0, count, 2):
                shift = np.append(shift, (width/2 * i))
            shift = np.unique(np.sort(np.append(shift, np.negative(shift))))
        shift = np.around(shift, decimals=3)
        return ages_x + shift[position]
    # dev_y is the salary list from the video
    plt.bar(width_cal(0), dev_y, width=width, color='#444444', label="All Devs")
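
The same offsets can be computed in closed form. The sketch below is an alternative to the function above, not the approach used in the video; `bar_offsets` and its `total_width` parameter are hypothetical names chosen for illustration.

```python
def bar_offsets(n_series, total_width=0.8):
    """Return (bar_width, offsets) for n_series bars sharing one x slot.

    Hypothetical helper: the offsets are centered on zero, so adding
    offsets[k] to each x position places series k inside the group.
    """
    width = total_width / n_series
    # Series index i is shifted relative to the middle of the group
    offsets = [(i - (n_series - 1) / 2) * width for i in range(n_series)]
    return width, offsets


width, offsets = bar_offsets(3, total_width=0.75)

# Intended use with a NumPy array of x positions, for example:
#   plt.bar(x_indexes + offsets[k], series_k, width=width)
```

Because the offsets are symmetric around zero, the group stays centered on its tick for both odd and even numbers of series.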

  • @dgh25
    @dgh25 1 year ago

    Your videos are just sprinkled with little golden nuggets! I love it ❤

  • @luiggitello8546
    @luiggitello8546 8 months ago

    This is the best content on YouTube, thank you so much

  • @brumarul7481
    @brumarul7481 3 years ago +1

    This is pure gold.

  • @androkranjcevic1988
    @androkranjcevic1988 3 years ago

    Really nice work over here, the most important man on youtube for me.

  • @LashaGoch
    @LashaGoch 3 years ago +1

    This is gold! Thank you very much for doing this, you have incredible talent to explain complicated stuff in an easy manner, keep up good work :)))

  • @58_yesilgul
    @58_yesilgul 7 months ago

    What a perfect lesson, fast and insightful pieces of knowledge...

  • @manosmakris8308
    @manosmakris8308 1 year ago +1

    You can also do this to get the languages and popularity lists:
    languages = list(map(lambda x: x[0], language_counter.most_common(15)))
    print(languages)
    popularity = list(map(lambda x: x[1], language_counter.most_common(15)))
    print(popularity)

  • @Anon282828
    @Anon282828 2 years ago

    thank you for always showing the clear code before abbreviating

  • @SM-vu6fm
    @SM-vu6fm 2 years ago

    Counter() is the best thing I learned today

  • @MagnusAnand
    @MagnusAnand 2 years ago

    I can't believe we need this hack to make a bar chart.
    Great video.

  • @gamengine1176
    @gamengine1176 4 years ago

    Very helpful video. The pandas method is much simpler and easier to understand. Thanks Corey!

  • @redferne01
    @redferne01 5 years ago +3

    Thank you for your work. I enjoy every lesson.

  • @pratikarai8115
    @pratikarai8115 4 years ago

    Your explanation is awesome...thank you so much ...A great teacher for a lifetime...

  • @dhairyaoza5422
    @dhairyaoza5422 3 years ago

    thank you so much sir,really glad i found ur playlist and didn't waste time on other platforms

  • @questscape
    @questscape 4 years ago +4

    For pandas folks:
    import pandas as pd
    from collections import Counter
    df = pd.read_csv('data.csv', index_col=['Responder_id'])
    language_counter = Counter()
    for response in df['LanguagesWorkedWith']:
        language_counter.update(response.split(';'))
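
A pandas-only variant of the same count is sketched below. It assumes pandas 0.25 or newer (for `Series.explode`) and uses a tiny inline frame in place of `data.csv`, so the values are illustrative rather than the survey data.

```python
import pandas as pd

# Small stand-in for data.csv (illustrative values, one row left empty)
df = pd.DataFrame({
    "Responder_id": [1, 2, 3],
    "LanguagesWorkedWith": ["Python;SQL", "JavaScript;Python", None],
})

counts = (
    df["LanguagesWorkedWith"]
    .dropna()            # skip respondents who left the column empty
    .str.split(";")      # each cell becomes a list of languages
    .explode()           # one row per (respondent, language)
    .value_counts()      # occurrences per language, highest first
)

top = counts.head(15)    # rough equivalent of Counter.most_common(15)
```

`top.index` and `top.values` can then be passed to `plt.barh` in place of the `languages` and `popularity` lists.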

  • @muzaianghanem5644
    @muzaianghanem5644 4 years ago +1

    That's true......you are an amazing teacher. This was very helpful

  • @edcoughlan5742
    @edcoughlan5742 5 years ago +5

    These videos are great! Coming from R (and ggplot) I was a tad skeptical that Python could emulate R when it came to data viz, but I stand corrected.

  • @KumarGauravhi
    @KumarGauravhi 3 years ago

    Hi Corey, thank you for the wonderful session. I am stuck at this point with the last example:
    import csv
    import numpy as np
    import pandas as pd
    from collections import Counter
    from matplotlib import pyplot as plt
    plt.style.use("fivethirtyeight")
    data = pd.read_csv('data.csv')
    ids = data['Responder_id']
    lang_responses = data['LanguagesWorkedWith']
    language_counter = Counter()
    for response in lang_responses:
        language_counter.update(response.split(';'))
    languages = []
    popularity = []
    for item in language_counter.most_common(15):
        languages.append(item[0])
        popularity.append(item[1])
    languages.reverse()
    popularity.reverse()
    plt.barh(languages, popularity)
    plt.title("Most Popular Languages")
    # plt.ylabel("Programming Languages")
    plt.xlabel("Number of People Who Use")
    plt.tight_layout()
    plt.show()
    ### I am getting an error like AttributeError: 'float' object has no attribute 'split'. Please explain.
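
One likely cause of this error, stated as an assumption since the commenter's file is not shown: pandas reads empty cells as NaN, which is a float, so `response.split(';')` fails on those rows. A minimal sketch of skipping such rows, using a plain list in place of the DataFrame column:

```python
from collections import Counter

# Stand-in for data['LanguagesWorkedWith']; float('nan') mimics an empty cell
lang_responses = ["Python;SQL", float("nan"), "JavaScript;Python"]

language_counter = Counter()
for response in lang_responses:
    # Missing values are floats, not strings, so they have no split()
    if not isinstance(response, str):
        continue
    language_counter.update(response.split(";"))
```

With a real DataFrame, iterating over `data['LanguagesWorkedWith'].dropna()` should have the same effect.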

  • @rahil1575
    @rahil1575 1 year ago

    you are a life saviour for people like me

  • @adirbarak5256
    @adirbarak5256 4 years ago

    To unpack counter.most_common(x), you can use:
    for a, b in counter.most_common(x):
    or
    for a, b in counter.items():
    Both yield (key, count) tuples, which are "zipped" already, meaning you can iterate over both values simultaneously (a is tuple[0], b is tuple[1]). Note that items() is not sorted by count, unlike most_common().
    I hope it helps you, yes, you out there.

  • @Linshark
    @Linshark 3 years ago

    I just came across this series of videos. They are extremely good :-)

  • @brucegwon
    @brucegwon 4 years ago

    This is the best, most fantastic lecture on using Python with pandas I've ever seen!!!
    Xie Xie!!!

  • @KC-rl8ub
    @KC-rl8ub 5 years ago +8

    hi Corey....god bless you

  • @markkennedy9767
    @markkennedy9767 9 months ago

    Thanks for this. Great lesson. As you say, creating multiple bars seems extraordinarily hacky. I would have thought this would be easily dealt with by a plotting library

  • @rnytpl
    @rnytpl 3 years ago

    Thank you man, appreciate the effort and time you've put in creating such amazing content as these.

  • @rotrose7531
    @rotrose7531 2 years ago

    Thank you very much. Please, please come back!

  • @fourdaysdead
    @fourdaysdead 5 years ago +1

    thank you very much, very clear and straight to the point!

  • @minghaotao6259
    @minghaotao6259 5 years ago +2

    Thank you for sharing your knowledge!

  • @SandeepChaudhary-vx9zy
    @SandeepChaudhary-vx9zy 4 years ago +1

    Great explanation...thanks a lot Corey sir

  • @franklinlima2571
    @franklinlima2571 4 years ago +1

    Great video! Thank you man

  • @randiarisman2419
    @randiarisman2419 4 years ago

    Another great video from you, Corey. Thank you, you make my day every day!!

  • @jsceo
    @jsceo 5 years ago +7

    that feel when I paused tutorial to figure out how to extract languages and popularity from language_counter and later it turns out that you've done that exactly in the same way, lol

  • @eliesawan9513
    @eliesawan9513 3 years ago +1

    you are amazing, waiting for your data science ( ML, AI ) course...... THANKS A LOT!

  • @ronaldjohnson4470
    @ronaldjohnson4470 5 years ago +1

    Corey, I went back to the documentation and changed my code to
    ax=plt.subplot()
    ax.set_xticks(x_indexes)
    ax.set_xticklabels(ages_x)
    It worked, but I received a message : MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
    warnings.warn(message, mplDeprecation, stacklevel=1)
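
The warning quoted above comes from calling `plt.subplot()` again with the same arguments. A sketch of one way to sidestep it is to create the figure and axes once with `plt.subplots()` and draw everything through that `ax`; the data values here are illustrative and not from the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt

# Illustrative values only
ages_x = [25, 26, 27]
dev_y = [100, 120, 140]
x_indexes = list(range(len(ages_x)))

# Create the figure and a single Axes exactly once
fig, ax = plt.subplots()
ax.bar(x_indexes, dev_y, width=0.25, label="All Devs")

# Put ticks at the bar positions and label them with the ages
ax.set_xticks(x_indexes)
ax.set_xticklabels([str(age) for age in ages_x])
ax.legend()
```

Since every call goes through the same `ax` object, no second axes is requested and the reuse warning should not be triggered.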

  • @akashdeepchauhan5
    @akashdeepchauhan5 3 years ago

    You're making machine learning interesting, thank you

  • @abhishek_raj
    @abhishek_raj 3 years ago

    You explain things really well, kudos!

  • @shazkingdom1702
    @shazkingdom1702 5 years ago +2

    This is the best Corey; Thank you very much from my 🧠 and ❣

  • @alexanderten5497
    @alexanderten5497 5 years ago +2

    Thank you very much. It's a great tutorial, as always.

  • @tongliu1076
    @tongliu1076 4 years ago

    Great video as always! Really helpful for detailed explanation.

  • @DeepakKumar-uz4xy
    @DeepakKumar-uz4xy 5 years ago +1

    Thank you, professor. Love from India. You know what? I don't like reading documentation now that I've seen your videos.

  • @shreddersengupta7384
    @shreddersengupta7384 4 years ago

    we can also use the dictionary's keys() and values() for getting x and y axis. x_axis = list(dict.keys())

  • @Lucas-wn5wm
    @Lucas-wn5wm 2 years ago

    I just found the Python legend. Thank God

  • @thebuggser2752
    @thebuggser2752 9 months ago

    Another great video. Thanks!!

  • @PaoloCondo
    @PaoloCondo 1 year ago

    Thank you for the series of video! :)

  • @lillyclive2641
    @lillyclive2641 4 years ago +1

    Such a great help, thankyou so much!

  • @AbubakerMahmoudshangab
    @AbubakerMahmoudshangab 2 years ago

    Corey. Million thanks bro

  • @DidaKusAlex
    @DidaKusAlex 2 years ago

    great tutorial! the best!! thanks for teaching us!

  • @Coney_island23
    @Coney_island23 2 years ago

    thank you!!!! you are an excellent teacher

  • @luiscesar_agais
    @luiscesar_agais 2 years ago

    Your explanations are very nice. Congratulations.

  • @chandansarkar1123
    @chandansarkar1123 5 years ago +4

    We cannot thank you enough. Still, thanks a ton, Corey.
    I have an interesting observation at 9:48. In the plt.xticks(...) method, when I use the ticks and labels keywords, it gives me an AttributeError. It works when I pass the arguments without using keywords. Perhaps it has something to do with my Matplotlib version...

    • @nikhiledu7556
      @nikhiledu7556 5 years ago

      Same happened with me

    • @asas-jf5iz
      @asas-jf5iz 4 years ago

      Yes, some older versions of Matplotlib have this problem.

    • @kabongontumba9492
      @kabongontumba9492 3 years ago

      Thank you guy, I had the same problem

  • @oscar.kiamba
    @oscar.kiamba 1 year ago

    The best on YouTube. 👏

  • @Eric-ii4vj
    @Eric-ii4vj 3 years ago

    you can just use :
    plt.barh(list(reversed(df['language'])),list(reversed(df['popularity'])))

  • @emmanueljimawo5595
    @emmanueljimawo5595 5 years ago +1

    Great videos. I'm so grateful...

  • @mamathakavety6529
    @mamathakavety6529 1 year ago +1

    Please do a tutorial on numpy as well, it would be super helpful, by the way awesome content😁

  • @VishalSharma-rn7mt
    @VishalSharma-rn7mt 4 years ago +1

    Great, amazing video

  • @rajivswargiary1536
    @rajivswargiary1536 5 years ago +1

    Great tutorial sir

  • @hayetchekired462
    @hayetchekired462 3 years ago

    great instructor

  • @Martin-ij2fp
    @Martin-ij2fp 3 years ago

    Great video!

  • @eziola
    @eziola 10 months ago

    Would still love to see a video on counters and DefaultDict!

  • @guyindisguise
    @guyindisguise 4 years ago +2

    At 9:30 you correct the numbers of the x-axis with plt.xticks()
    Couldn't we just have circumvented that problem by saying
    x_indexes = np.array(ages_x)
    instead of
    x_indexes = np.arange(len(ages_x))
    Since that would have given us an array with the original numbers that we could add/subtract the width to/from?
    Is there any benefit to the plt.xticks() solution (other than seeing how xticks work)?

    • @johannesherbert4943
      @johannesherbert4943 4 years ago

      I thought the exact same thing, why is he making his life more complicated than necessary? There is no problem with adding/subtracting offsets directly from the ages np.array, it just works. It makes it less hacky, too.

  • @abhishek_raj
    @abhishek_raj 3 years ago

    Just a suggestion for people with large samples: use the seaborn style; the fivethirtyeight style gets messy when there are many labels.

  • @FerdinandCoding
    @FerdinandCoding 4 years ago +1

    thank you for python tutorial

  • @mindbodysoulsculpt6479
    @mindbodysoulsculpt6479 4 years ago

    Great Matplotlib tutorial. But I feel like this is where Pandas also really comes to play, we can use sep = ; inside of the read_csv function instead of creating a custom function. Also, using iloc and loc for indexes and many more awesome built in functions

  • @kerimabdul2263
    @kerimabdul2263 4 years ago +1

    great video.

  • @giuseppeceravolo93
    @giuseppeceravolo93 4 years ago

    Thank you so much for your hard work! You are a great teacher and your video tutorials are a valuable resource :)

  • @dougmetcalf3720
    @dougmetcalf3720 4 years ago +1

    For the side by side bar plots, I found that using the plt.bar() args align='edge', align='center', and align='edge', along with width=-0.75, 0.5, 0.75 is easier than using numpy and specifying the tick labels.

  • @frankconte2457
    @frankconte2457 5 years ago +3

    Another great tutorial. Thank you. However, using a Jupyter Notebook, I am having a problem with plt.bar and plt.barh. The error I receive is "unsupported operand type(s) for -: 'str' and 'float'".

  • @johnjones5659
    @johnjones5659 4 years ago

    Thank you and Brilliant

  • @noway4715
    @noway4715 4 years ago

    Best of the best!