How to scrape Understat for football data in Python with requests and BeautifulSoup

  • Published: Jan 31, 2025

Comments • 100

  • @alexbushnell3608
    @alexbushnell3608 3 years ago +11

    The web scraping demo here is fantastic, very clear and easy to apply to other aspects of the website. Top man!

  • @braziliandre30
    @braziliandre30 1 year ago +1

    Thank you for taking the time to do this! Been wanting to learn it for a while but lacked the basic skills to start and run run by run. It'd be great if there was a way to just pick a team and start scraping their data from each game for a specific time period... Maybe there's already more work on this as well. Either way I appreciate it!

  • @sergilehkyi
    @sergilehkyi 4 years ago +8

    Thanks for the shout out :)

    • @McKayJohns
      @McKayJohns 4 years ago +1

      You bet

    • @GuardianApe
      @GuardianApe 3 years ago +1

      Serhii, I read your blog post. That was awesome man, thanks for putting that out there.

    • @sergilehkyi
      @sergilehkyi 3 years ago +1

      @GuardianApe thanks 🙏😊

  • @inakigoya5959
    @inakigoya5959 2 years ago +6

    Hey man, excellent video!! I started a master's in data science and I wanted to practice with something related to football. I will use this for my FPL team.

  • @heisei7361
    @heisei7361 3 years ago +2

    Just coming across and had to click that subscribe button. You're so informative I wish you were my prof 😂 awesome work man!

    • @McKayJohns
      @McKayJohns 3 years ago

      Thank you! Welcome aboard!

  • @marianolambolla1013
    @marianolambolla1013 2 years ago +1

    Great Video! Congrats! You could get the entire json converted directly to dataframe by doing:
    import ast
    pd.read_json(json.dumps(ast.literal_eval(str(data_json['h']))))

    • @McKayJohns
      @McKayJohns 2 years ago

      Yep! I didn't learn about ast until after this video but it's a great package. Thanks for pointing that out!

  • @henriquefriedrich5960
    @henriquefriedrich5960 2 years ago

    Superb content man! Btw I have good memories of Barcelona, my team (Internacional) defeated them in 2006 with Adriano Gabiru's goal.

  • @surajshivshankar2348
    @surajshivshankar2348 4 years ago +1

    Excellent video. Keep up the good work!

  • @richardogujawa-oldaccount1336
    @richardogujawa-oldaccount1336 11 months ago

    Thanks McKay, learned a lot from this!

  • @johnmoran5740
    @johnmoran5740 4 years ago +1

    My man! Unreal, helping me a ton rn!!

    • @McKayJohns
      @McKayJohns 4 years ago +1

      Glad to help man!

    • @johnmoran5740
      @johnmoran5740 4 years ago

      @McKayJohns hey man, what range do the x and y coordinates run over on Understat?

    • @McKayJohns
      @McKayJohns 4 years ago +1

      100 x 100 is the scale

  • @chefjuan6322
    @chefjuan6322 1 year ago

    Thanks man, you saved me a few hours of coding.

  • @andreascalleja336
    @andreascalleja336 2 years ago +2

    Don't know if this has already been posted, but the nested for loops can be replaced with the following code:
    for shot_event in data_home:
        x.append(shot_event['X'])
        y.append(shot_event['Y'])
        xg.append(shot_event['xG'])
        team.append(shot_event['h_team'])
    And the same for the away team.
    Much cleaner this way imo: no nested loops and no multiple ifs.
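
A self-contained sketch of that flat loop, applied to both sides. The sample records are made up for illustration; the keys X, Y, xG and h_team come from the comment above, while a_team is assumed here to be the matching away-side key.

```python
# Made-up sample data: one list of shot dictionaries per side.
data_home = [
    {"X": "0.88", "Y": "0.52", "xG": "0.31", "h_team": "Home FC", "a_team": "Away FC"},
    {"X": "0.79", "Y": "0.40", "xG": "0.05", "h_team": "Home FC", "a_team": "Away FC"},
]
data_away = [
    {"X": "0.91", "Y": "0.47", "xG": "0.42", "h_team": "Home FC", "a_team": "Away FC"},
]

x, y, xg, team = [], [], [], []


def collect(shots, team_key):
    """Append every shot in `shots` to the shared lists (hypothetical helper)."""
    for shot_event in shots:
        x.append(shot_event["X"])
        y.append(shot_event["Y"])
        xg.append(shot_event["xG"])
        # team_key selects which team name belongs to the shooting side.
        team.append(shot_event[team_key])


collect(data_home, "h_team")
collect(data_away, "a_team")  # 'a_team' is an assumed key name
```

The values stay as strings at this point, exactly as they were in the sample records.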

  • @zoeksnarf7
    @zoeksnarf7 2 years ago

    Great tutorial, cheers McKay. Instant new sub!

  • @avazbektolibjonov4035
    @avazbektolibjonov4035 1 year ago +1

    great video lesson

  • @GuardianApe
    @GuardianApe 3 years ago +1

    This had to be done, thanks for sharing your knowledge.

  • @raaghoal
    @raaghoal 3 years ago +1

    Great work man. Appreciate it.

  • @jakewinwood3872
    @jakewinwood3872 4 years ago +1

    This has been a great help. Thanks

  • @joshcaldwell6946
    @joshcaldwell6946 3 years ago

    This is an awesome tutorial! Thanks so much!

  • @SuperYash1997
    @SuperYash1997 3 years ago +7

    This is really helpful especially for someone starting with football analysis and getting stuck at the initial step of finding the right data. Is there a way to get pass or any event data in general from understat?

    • @McKayJohns
      @McKayJohns 3 years ago +2

      Unfortunately, understat only provides the shot locations.

  • @brandonflexer10
    @brandonflexer10 4 months ago

    Great video! Have you found a way to iterate over the competitions to retrieve all match urls for each competition/season? Or given the structure of Understat we have to manually collect all of them?

  • @dr.vojislavhadzimilic3649
    @dr.vojislavhadzimilic3649 3 years ago +1

    Thank you so much for this video

  • @ericsantos8839
    @ericsantos8839 4 years ago +1

    Genius! Really helpful!

  • @pratick5296
    @pratick5296 4 years ago +1

    Amazing content❤️❤️

  • @sehgaldeepika
    @sehgaldeepika 3 years ago

    Amazing content. You should be very proud of what you are doing for the community, especially for people who are just beginning in the field of data visualisation for football. Can I check whether the method you used is easily transferable to other sites, so we can easily scrape data from them? Also, in one of your other videos you mentioned that you wouldn't recommend scraping the data for any analysis. What is the reason for that?

    • @McKayJohns
      @McKayJohns 3 years ago

      I appreciate that! But yes you can use this method to scrape other websites, you will just need to adjust the tags you are looking for. Some things may not be in JSON so you will have to adjust accordingly.
      As well, I meant to say that you shouldn't be scraping the data and then using it in a way that you are going to be like making a ton of money off of it. Like if you scraped understat and then just went and threw up your own version of understat without their consent for profit that would probably be a no.

    • @sehgaldeepika
      @sehgaldeepika 3 years ago +2

      @McKayJohns Thank you for your response, and I'm 100% with you on "ethical" scraping. I really struggled to scrape data from whoscored using this method; do you know what could be wrong? Also, would it be possible to do an article using whoscored as the reference site?

    • @avneetrekhi20
      @avneetrekhi20 3 years ago +1

      @sehgaldeepika fully agree, that will be dope.

  • @Qwertythemouse
    @Qwertythemouse 1 year ago

    As far as the transformation from JSON to pd.DataFrame is concerned, this also works:
    # Combine the 'h' and 'a' lists of shot dictionaries into a single list
    combined_data = data['h'] + data['a']
    # Create a DataFrame from the combined data
    df = pd.DataFrame(combined_data)
    # Display the DataFrame
    df
    So it really does create a full DataFrame from the JSON, with the home/away parameter as a column. Then anyone can try their own cleaning, wrangling, or use of the Understat data.

  • @emilsaji1762
    @emilsaji1762 3 years ago +1

    Thank you so much broooooo 😍

  • @AlimpanDey
    @AlimpanDey 1 year ago

    What are x and y? If those are the x, y coordinates, then why do they range from 0 to 1? That would make the pitch a square...
    Please, someone help me out with this.

  • @sravanjs6749
    @sravanjs6749 8 months ago

    Please do a video on scraping data and saving it to a CSV file for pizza, radar and other charts.
    🙏

  • @Jacek..
    @Jacek.. 4 months ago

    Where can I download updated scraped data from the understat website? On github someone shared a package with csv files but last updated 3 years ago. I'm not familiar with Python and can't update the data myself.

  • @BlueSkyGoldSun
    @BlueSkyGoldSun 3 years ago

    Nice. Where can I learn football analytics?
    And is it possible to land a job in football analytics?

  • @aayushanandan419
    @aayushanandan419 4 years ago +2

    Bro this was so helpful, but how can I separate out other data like shot start/end or defensive actions?

    • @McKayJohns
      @McKayJohns 4 years ago +1

      I don't believe understat provides that data... they are more focused on just shot location, xG and other stats. You would probably need to get data from places such as wyscout or whoscored to do that.

  • @andrkevichandvetal
    @andrkevichandvetal 2 years ago

    Thank you very much, man! It is helpful for my graduation work in university

  • @rmanalista9322
    @rmanalista9322 2 years ago

    Guys, I get the following error:
    json_data = json_data.encode('uft8').decode('unicode_escape')
    LookupError: unknown encoding: uft8
    Do you know why I get this error, and how can I solve it?
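
The LookupError points at the codec name: 'uft8' is a misspelling of 'utf8'. A minimal sketch of the same decoding step with the corrected name, using a short made-up escaped string in place of the real page content:

```python
import json

# Made-up stand-in for the escaped string pulled out of the page's script tag.
json_data = r"\x7B\x22h\x22\x3A\x5B\x5D\x7D"

# 'utf8' is a valid codec name; the misspelled 'uft8' raises LookupError.
json_data = json_data.encode("utf8").decode("unicode_escape")

data = json.loads(json_data)
print(data)  # {'h': []}
```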

  • @samscholes9727
    @samscholes9727 3 years ago

    By converting everything to strings, surely that means we can't manipulate the numbers, since there aren't any numbers, just strings?
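
The values do arrive as strings, but they can be converted back to numbers before any arithmetic. A minimal sketch, assuming shot records with string-valued X, Y and xG fields (the sample values and the player name are made up):

```python
# Made-up sample records with numeric fields stored as strings.
shots = [
    {"X": "0.88", "Y": "0.52", "xG": "0.31", "player": "Player A"},
    {"X": "0.79", "Y": "0.40", "xG": "0.05", "player": "Player B"},
]

numeric_fields = ("X", "Y", "xG")

# Cast only the numeric fields to float; leave everything else untouched.
converted = [
    {key: float(value) if key in numeric_fields else value
     for key, value in shot.items()}
    for shot in shots
]

total_xg = sum(shot["xG"] for shot in converted)
print(round(total_xg, 2))  # 0.36
```

With a pandas DataFrame the same idea is df[['X', 'Y', 'xG']].astype(float).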

  • @Radiofreak87
    @Radiofreak87 3 years ago

    Could you explain the coordinate system this dataframe uses in more detail? I can't work out where the origin (x,y)=(0,0) is located, because the coordinates are always positive (>0). Great video btw, GJ
    😀

  • @sushantregmi2126
    @sushantregmi2126 3 years ago +1

    Great stuff man, which club do you support? Please don't say Arsenal

  • @mobhamjee786
    @mobhamjee786 11 months ago

    How would you plot this for the shot map?

  • @afiqaimanafr
    @afiqaimanafr 3 years ago +1

    Sorry, I'm a beginner and would like to ask: what exactly is the aim of scraping Understat's football data?

    • @McKayJohns
      @McKayJohns 3 years ago

      they have data you can use to analyze things such as shots, xg, etc.

  • @joilsongb436
    @joilsongb436 3 years ago

    can you do this method on the page
    b e t 3 6 5 ?
    I couldn't with the instruction in this video
    Delete the spaces between the words

  • @sriram-uu6yd
    @sriram-uu6yd 3 years ago

    Hi, thanks for the video. I scraped the shots data from understat, but I am not sure how to convert the X and Y values into X-coordinate, Y-coordinate values to create a shot map. Can you please give me an idea?

    • @McKayJohns
      @McKayJohns 3 years ago

      If you watch some other videos they explain how to do this!

    • @sriram-uu6yd
      @sriram-uu6yd 3 years ago

      @McKayJohns Thanks for the response. I did try, and I have been searching for almost a week, but I couldn't find anything. Almost all the videos I've seen on YouTube or elsewhere start with a ready-made file where the X and Y coordinates are already there. Or I guess I don't know exactly how to search 😬

    • @abinthomas10
      @abinthomas10 1 year ago

      You just have to multiply the X and Y values by the dimensions of the pitch you have selected. So you could create a new column, something like NewX = X * 120.
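
A rough sketch of that scaling, assuming the X and Y values are fractions between 0 and 1 stored as strings and the target pitch is 120 by 80 units. Both dimensions are assumptions here; use whatever the pitch you plot on expects.

```python
PITCH_LENGTH = 120  # assumed pitch length in plotting units
PITCH_WIDTH = 80    # assumed pitch width in plotting units


def to_pitch(x, y, length=PITCH_LENGTH, width=PITCH_WIDTH):
    """Scale fractional coordinates (0 to 1) to pitch units."""
    return float(x) * length, float(y) * width


new_x, new_y = to_pitch("0.5", "0.25")
print(new_x, new_y)  # 60.0 20.0
```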

  • @qurramzaheer3882
    @qurramzaheer3882 2 years ago

    Hey, I was wondering: if I want to scrape multiple pages, what kind of timeout should I be using between each request? Thanks for the very helpful video

    • @McKayJohns
      @McKayJohns 2 years ago +1

      It depends on how good the webpage is at slowing down / blocking requests. Technically you can hit it as fast as you want, but for understat's sake maybe slow it down by half a second between each request so they don't get overwhelmed.
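
The half-second pause can be sketched as below. fetch_all is a hypothetical helper, and the fetch argument is a stand-in so the example runs offline; the match ids in the URLs are placeholders.

```python
import time


def fetch_all(urls, fetch, delay=0.5):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Placeholder URLs and a stand-in fetch function, so nothing is downloaded here.
urls = ["https://understat.com/match/1", "https://understat.com/match/2"]
pages = fetch_all(urls, fetch=lambda url: "fetched " + url, delay=0.01)
print(pages)
```

In real use the stand-in would be replaced by something like lambda url: requests.get(url, timeout=10), with delay left at 0.5 or higher.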

  • @bernoulisan9649
    @bernoulisan9649 3 years ago

    How can I get data manually from a football match, please?

  • @davidchapman2629
    @davidchapman2629 2 years ago +1

    Great video! I'm trying to do this in Java, do you know how to do the encode & decode in Java? I'm talking about this line:
    encode('utf8').decode('unicode_escape')
    Thank you!

  • @goergejohn6986
    @goergejohn6986 2 years ago

    Do you know how I can scrape multiple matches/pages on that website?

  • @claudio7614
    @claudio7614 3 years ago

    Hi, can you help me convert the third script on the page, called "rostersData"? I changed the index from 1 to 2 in scripts, but even after changing the variables it doesn't work. It seems to be a bit different from the shotsData one... thanks!
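
One way to avoid depending on the script's position is to look the variable up by name. The sketch below assumes the page embeds each dataset as `name = JSON.parse('...')` with hex-escaped content, as the shotsData decoding step suggests. extract_json_var is a hypothetical helper, and the sample script text is made up.

```python
import json
import re


def extract_json_var(script_text, var_name):
    """Find `var_name = JSON.parse('...')` in script text and return the parsed data."""
    pattern = re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, script_text, flags=re.DOTALL)
    if match is None:
        raise ValueError(var_name + " not found in script text")
    # Same decoding step as for shotsData: turn \xNN escapes into characters.
    decoded = match.group(1).encode("utf8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up script content in the assumed format.
sample = r"var rostersData = JSON.parse('\x7B\x22h\x22\x3A\x7B\x7D,\x22a\x22\x3A\x7B\x7D\x7D');"
rosters = extract_json_var(sample, "rostersData")
print(rosters)  # {'h': {}, 'a': {}}
```

If the real roster data turns out to be a dictionary of dictionaries rather than a list of dictionaries, the loops written for shotsData would need to iterate over its values instead.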

  • @GLDTruth
    @GLDTruth 3 years ago

    I had this working a while back, but went to run another game, and I'm getting this error:
    NameError Traceback (most recent call last)
    in ()
    1 res = requests.get(url)
    ----> 2 soup = BeautifulSoup(res.content, 'lxml')
    3 scripts = soup.find_all('script')
    NameError: name 'BeautifulSoup' is not defined
    Nothing else changed but the match id. Thank you for your tutorials
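
That NameError usually means the import line has not been run in the current session (for example after a notebook restart), not that the match id matters. A minimal sketch with the import in place, parsing a made-up inline HTML string so it runs offline. 'html.parser' is used here instead of 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup  # must be run before BeautifulSoup is used

# Made-up HTML standing in for res.content from requests.get(url).
html = "<html><head><script>var a = 1;</script><script>var b = 2;</script></head></html>"

soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script")
print(len(scripts))  # 2
```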

  • @mathijshartmann2118
    @mathijshartmann2118 2 years ago

    Can I ask what the x and y mean in the match?

  • @nikhilrajesh7012
    @nikhilrajesh7012 4 years ago +1

    Did you make modifications to your scraper based on my feedback on Twitter?

  • @pramitbardhan7725
    @pramitbardhan7725 3 years ago

    But I don't think understat has any international or CL data, right? Just the leagues ig

  • @willykitheka7618
    @willykitheka7618 3 years ago +1

    Am a Real Madrid fan and I subscribed!😁😁😁...thanks for sharing...I will be visiting again!

    • @McKayJohns
      @McKayJohns 3 years ago

      Awesome! Thank you! Even tho you are a Madrid fan ;)

  • @grogg2243
    @grogg2243 3 years ago

    Hi mate. Is there a way to visualise the data at the end?

    • @McKayJohns
      @McKayJohns 3 years ago

      Ya, I have a lot of tutorials on my channel which show how to make shotmaps or xG charts, for example.

    • @grogg2243
      @grogg2243 3 years ago

      @McKayJohns Hi McKay. I have had a look, but I think for those visualisations you need the corresponding minutes with the data, which aren't in the understat data. Is there any way you can do it without the minutes?

  • @tibortoth2672
    @tibortoth2672 3 years ago

    Any recommendations on how to scrape Sofascore data?

    • @McKayJohns
      @McKayJohns 3 years ago

      I personally have never done it. Probably would just need to use requests and BeautifulSoup

  • @wartawen
    @wartawen 3 years ago +1

    Thank you for providing this tutorial! If I have a list with the match id's I want to scrape (instead of 1 by 1), what are the necessary modifications to the code? I guess that an additional for loop should be written, but don't know where.

    • @McKayJohns
      @McKayJohns 3 years ago

      You would need to loop through your list of match id's, and every time you loop, you would use the next match id and then aggregate all of that data into a single dataframe.
      Put the for loop at the beginning of the code and it should work out.
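
A minimal sketch of that outer loop. collect_matches and get_shots are hypothetical names: get_shots stands for the per-match scraping code from the video, wrapped in a function that returns a list of shot dictionaries. A fake version is used here so the example runs offline, and the match ids are placeholders.

```python
def collect_matches(match_ids, get_shots):
    """Return one flat list of shot rows across all matches."""
    rows = []
    for match_id in match_ids:
        for shot in get_shots(match_id):
            # Tag each row with its match so the rows stay distinguishable once combined.
            rows.append({**shot, "match_id": match_id})
    return rows


def fake_get_shots(match_id):
    """Stand-in for the real scraper: returns one made-up shot per match."""
    return [{"X": "0.9", "Y": "0.5", "xG": "0.2"}]


rows = collect_matches([101, 102], fake_get_shots)
print(len(rows))  # 2
```

Passing rows to pd.DataFrame(rows) then gives the single aggregated dataframe.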

  • @rianmcnamara2399
    @rianmcnamara2399 4 years ago +2

    Hey man, fantastic videos, really great stuff!
    I just have one quick question: is this method easily transferable to scraping a player's data rather than data from a single game?
    I have tried it and got along nicely until around the 16 minute mark in this video. Where you have entered "data_away = data['a']" and "data_home = data['h']", I am struggling to figure out what to put, as obviously there isn't any home/away data to separate. I hope I am making sense when I am explaining this, I'm probably not though!
    Anyways, great work man

    • @McKayJohns
      @McKayJohns 4 years ago

      Appreciate it!
      So to get an individual players data, you will need to switch the url to be something like this understat.com/player/2097 (that is for Messi) and you can find the shots for the player if you look through the json data.
      If you are just wanting to get an individual player's shots you won't need that part of the code.
      If you have any other questions reach out to me on twitter!

  • @christopheraryo3040
    @christopheraryo3040 3 years ago

    THANKYOUUU

  • @dbn-k9b
    @dbn-k9b 2 years ago

    Awesome video bro... help me write a program to alert me when my variable of choice (team) scores, gets a yellow card, wins a corner kick, etc. I need to be able to punch in the id of the team and the id of the variable I want to keep an eye on, hook it up to the internet and let it scrape while I wait for the program to alert me if the id (goal, corner, yellow card, penalty, odd) is True...
    You get the idea....

  • @Solace_Yard
    @Solace_Yard 3 years ago

    Hello brother, thanks for the video. I want a scraping project done. Are you able to help, please? We can talk privately.

  • @Paperscissor183
    @Paperscissor183 3 years ago

    Hi, this is a great video. Can you please scrape lotto data?

  • @MaartenRobaeys
    @MaartenRobaeys 1 year ago

    Does the GitHub file still exist?

    • @McKayJohns
      @McKayJohns 1 year ago

      github.com/mckayjohns/youtube-videos Hey, sorry, I'll update it, but here's where the files are now.

  • @bunnybabu1162
    @bunnybabu1162 3 years ago

    Wowwww