Building a bot to scrape job data… How NOT to collect data

  • Published: Jan 7, 2025

Comments • 219

  • @LukeBarousse
    @LukeBarousse  3 years ago +18

    Where do you usually go to collect data for your data science projects? 👨🏼‍💻📊🤖📈

    • @ShivekMaharaj
      @ShivekMaharaj 3 years ago +1

      Epic Video, Luke! 👌

    • @ShivekMaharaj
      @ShivekMaharaj 3 years ago +3

      You know, I actually wanted to tell you, my friend (the one working at VW) actually does the job of a Data Engineer, Data Analyst, Data Scientist, and ML Engineer 😂 Cause he's the first data scientist the company hired. He's in a team of one person 😂

    • @IMajidAhmadKhan
      @IMajidAhmadKhan 3 years ago +3

      @@ShivekMaharaj you mean Volkswagen, no way bro

    • @johnrussell5715
      @johnrussell5715 3 years ago +5

      Most federal agencies have open data warehouses. I downloaded all of the PPP loan data from the Treasury department.

    • @LukeBarousse
      @LukeBarousse  3 years ago +2

      @@johnrussell5715 This is great to know, thank you for sharing this!

  • @johnrussell5715
    @johnrussell5715 3 years ago +64

    I saw an "entry level" data analyst job that required 6-8 years of experience!

    • @LukeBarousse
      @LukeBarousse  3 years ago +10

      🤣 That's just ridiculous (and also unfortunate)! Entry-level jobs should not be asking for that many years of experience.

    • @yasinudun4147
      @yasinudun4147 2 years ago

      @@LukeBarousse I have encountered so many entry-level jobs that require so many years of experience that I want to make a project focusing on that. I was hoping to scrape data from LinkedIn. However, your video shows that it is not a good idea. I wonder how you solved this problem. Thank you for your kind attention.

    • @LukeBarousse
      @LukeBarousse  2 years ago +3

      @@yasinudun4147 Working on this project again RN actually... will have more details for how to do this in the near future

    • @victorbegnini5754
      @victorbegnini5754 1 year ago

      😂😂😂

    • @nilsoncampos8336
      @nilsoncampos8336 6 months ago

      😂😂😂😂😂😂

  • @KenJee_ds
    @KenJee_ds 3 years ago +14

    Congrats on 100K Luke!!

    • @LukeBarousse
      @LukeBarousse  3 years ago +3

      Thanks Ken! Honestly couldn't have done it without your support and shoutouts over this past year; it has helped out more than you know!! Here's to us making it to 1 million subs!

  • @928khaled
    @928khaled 3 years ago +32

    So I was watching one of these videos, and the guy was talking about how not to violate terms and conditions and still get the data. He managed to pull an amazing stunt. See, one of the issues with bots is that your requests keep their servers busy. What he did was minimize request actions: the equivalent of opening page 1, then page 2, and so on, without running any other scraping commands. The content of the pages he opened was saved locally on his machine as a log; then he opened those files and pulled the data from the saved pages offline. Pretty amazing, but a dirty job. You might want to look it up 😅

    • @LukeBarousse
      @LukeBarousse  3 years ago +10

      Thanks for sharing this! LinkedIn job search pages are actually really slow to load (compared to other pages on my machine), so I really feel like I'm taxing the LinkedIn servers; because of this, I think you offer a really interesting solution to help minimize that. I'll take a look into this, so thank you!!

    • @wittyeva_
      @wittyeva_ 3 years ago +3

      @@LukeBarousse can’t wait to hear how it went

    • @928khaled
      @928khaled 3 years ago +9

      @@LukeBarousse The proper wording for the thing is HAR file, HTTP Archive format. You can use Google Chrome to download the file, and at that point, instead of scraping a website and overloading its servers with requests, you're basically parsing a HAR file containing the data that you need.
      Also, one good tip when scraping: don't make your actions look like an outlier when you get analyzed by those you're scraping; blend in with the crowd. Computers are fast, humans are not. 🤫😉

    • @LukeBarousse
      @LukeBarousse  3 years ago +5

      @@928khaled Thanks for all these tips! I'm going to look more into HAR files and see if I can learn to apply it to my use case. Thanks again!

    • @thefamousdjx
      @thefamousdjx 3 years ago

      I don't get how this helps. If you opened a page, then you have already taxed their server. Pulling info from the page isn't affecting them at all, so saving the page offline is an unnecessary step.
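
The HAR approach described in this thread is easy to prototype: a HAR file is plain JSON, so pulling the saved pages back out needs nothing beyond the standard library. A minimal sketch (the inline sample data and the `extract_responses` helper are illustrative, not from the video):

```python
import json

def extract_responses(har: dict) -> list:
    """Pull (url, response body) pairs out of a parsed HAR archive."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        body = entry["response"].get("content", {}).get("text", "")
        results.append((url, body))
    return results

# In practice you would load a file exported from Chrome DevTools, e.g.:
#   with open("jobs.har", encoding="utf-8") as f: har = json.load(f)
# A tiny inline example with the same structure:
har = json.loads("""
{"log": {"entries": [
  {"request": {"url": "https://example.com/jobs?page=1"},
   "response": {"content": {"text": "<html>job listings...</html>"}}}
]}}
""")

for url, body in extract_responses(har):
    print(url, len(body))
```

The parsing step happens entirely offline, so the site only sees the ordinary page loads that produced the archive.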

  • @bouseuxlatache4140
    @bouseuxlatache4140 3 years ago +24

    That was a nice shared experience, Luke. A friend had talks with several website administrators, and it seems that big companies are trying to limit web scraping for legal and technical reasons, as it tends to slow down website performance.

    • @LukeBarousse
      @LukeBarousse  3 years ago +9

      Also very true! That's a great point on the technical reasons that I meant to bring up in this video. 🤦🏼‍♂️ We sometimes don't think of how using a bot may actually slow down servers and thus affect the user experience for others, very valid!

    • @wanderingtravellerAB99
      @wanderingtravellerAB99 3 years ago +8

      That's only the second reason. The main reason is - it's their data, and managing it and curating it is part of their value offering. They limit access to prevent dilution of their assets and maximise their own monetisation potential. Why do you think they built the website in the first place?

    • @bouseuxlatache4140
      @bouseuxlatache4140 3 years ago +9

      @@wanderingtravellerAB99 Absolutely. However, since this information is made available on the internet, which is not an intranet, a website can be considered a catalog or a booklet to be shared. Also, just as our own information will be used by the website during our visit, it would be fair to say that the visitor can also save some information.

    • @annxiao7721
      @annxiao7721 1 year ago

      lol my hub is one of those folks working for evil corporate America trying to stop scraping..

    • @annxiao7721
      @annxiao7721 1 year ago

      @@wanderingtravellerAB99 I agree with what you are saying about the value of the data asset, but speed can't be dismissed as secondary. If you have DevOps friends in big companies, they can tell you website speed is gauged in microseconds. Slowing down the website for a fraction of a second can cause millions in direct monetary losses.

  • @IamAmisos
    @IamAmisos 4 months ago +1

    Nice content! I have nothing to add, just wanted to share my appreciation!

  • @aliffnabil5542
    @aliffnabil5542 3 years ago +6

    I just did this project last week, but scraping from Indeed, and I can pretty much relate to you so much!!!!

    • @LukeBarousse
      @LukeBarousse  3 years ago +2

      😂 You spend all this time building this beautiful bot, and then BAM! No more scraping!

    • @neamenbeyene7598
      @neamenbeyene7598 5 months ago

      Did you run into the same problem?

  • @VhangorD2
    @VhangorD2 3 years ago +4

    Awesome video! Thanks for sharing this content. I'm on my path to becoming a Data Analyst (Google DA Certificate in progress), and all this information is really useful.

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      Heck yeah, glad you are getting use out of my content Marcelo! Good luck with the Google Certificate!!

  • @Major_Data
    @Major_Data 3 years ago +3

    Congratulations on 100K Luke!

    • @bouseuxlatache4140
      @bouseuxlatache4140 3 years ago +1

      Yes! I thought we should celebrate

    • @LukeBarousse
      @LukeBarousse  3 years ago +4

      Thanks a lot Al! So appreciative of your shoutouts for my channel, this has helped more than you know in reaching this milestone!

    • @LukeBarousse
      @LukeBarousse  3 years ago +4

      And Bouseux, looking to launch a special video either later this week or next!

    • @bouseuxlatache4140
      @bouseuxlatache4140 3 years ago +1

      @@LukeBarousse 🎆

  • @nigelstory5559
    @nigelstory5559 3 years ago +13

    Definitely wanna be careful with the “public” jobs posting page too because they can ban your IP address, and that would suck… that happened to me with the U.S. Senate website

    • @LukeBarousse
      @LukeBarousse  3 years ago +7

      This is good to know; that's why I'm still hesitant! Were you scraping data from the U.S. Senate website?

  • @arthurmiller5276
    @arthurmiller5276 3 years ago +1

    This is a great followup to the earlier video. I'd hoped when you posted the previous video you would follow up with information on ethically dealing with the TOS limitations on various sites. Looking forward to further followup.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Heck yeah, thanks for the idea Arthur. More to come on this series!

  • @marlonlozada3048
    @marlonlozada3048 1 year ago +1

    Great video and explanation of your process and pains!

  • @Pedro_Israel
    @Pedro_Israel 2 years ago +2

    Reading the comments: if LinkedIn won't allow scraping its data because of server overload or business info, then they should provide this kind of analysis for all of us searching. Maybe they could charge for it too; I would pay for that.

  • @acsy10000
    @acsy10000 3 years ago +8

    Nice project! I am really curious to see any kind of results from the data you have collected until now.

    • @LukeBarousse
      @LukeBarousse  3 years ago +3

      Yeah I'll be looking into the results in the upcoming episodes of this series!!

  • @jaredalbin5658
    @jaredalbin5658 3 years ago +2

    8:43 you're not comfortable doing that shit? I get it man, haha

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      🤣😂 Still deciding, but I really want that data from LinkedIn 🤷🏼‍♂️

  • @voidfinder3086
    @voidfinder3086 3 years ago +1

    This is really cool Luke.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Thanks Kumar, glad you enjoyed it!!

  • @muhammedanas638
    @muhammedanas638 2 years ago +2

    Can you make a video about how we can legally scrape data from LinkedIn or other applications for business purposes...??

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      Yes, working on a video rn on this topic

  • @syhusada1130
    @syhusada1130 2 years ago

    Thank you Luke.
    I scraped sports data recently and got banned. But it wasn't automated scraping, and the ban was not harsh; still, it would be wise to be more mindful of the scraping we do.

  • @Cowwy
    @Cowwy 1 year ago +1

    I am learning about web scraping now to tackle this project.
    I really love how you laid out your thought process and shared with us the legality of web scraping.
    Have you done any more job scraping since this video?

    • @LukeBarousse
      @LukeBarousse  1 year ago

      Yes! I have a video talking about how "I analyzed XXX,XXX jobs to solve this" Check it out!

  • @youssefahmed6383
    @youssefahmed6383 3 years ago +1

    Awesome as usual, Luke

  • @alecubudulecu
    @alecubudulecu 2 years ago +1

    Regarding public scraping: put the Python task on something like a VM or Raspberry Pi with a VPN.
    They will ban your IP at some point.
    Change IPs. Repeat.
    You will have hundreds of IPs; then, after a few years, you can just change VPNs or reuse old IPs.

    • @LukeBarousse
      @LukeBarousse  2 years ago

      I like this approach. I’m interested in trying this

    • @alecubudulecu
      @alecubudulecu 2 years ago

      @@LukeBarousse I do this with a few scrapers like this, where the companies say "no". I got blocked by IP, so I set up a Raspberry Pi that runs, and I just change the IP whenever a block happens.

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      @@alecubudulecu Oh sweet! I have a few Raspberry Pis lying around actually!
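
The rotate-on-ban workflow described above can be sketched with just the standard library; the proxy addresses below are placeholders for whatever VPN or Raspberry Pi endpoints you actually control, and note that rotating IPs against a site that has banned you may still violate its terms:

```python
import itertools
import urllib.request

# Placeholder pool of proxy endpoints (e.g., VPN exit nodes or Pis you run).
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
proxy_cycle = itertools.cycle(PROXIES)

def opener_for_next_proxy() -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP(S) traffic through the next proxy."""
    proxy = next(proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

# On a detected block, simply build a fresh opener to move to the next IP:
opener = opener_for_next_proxy()
```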

  • @rem9486
    @rem9486 3 years ago

    Luke! Congratulations!!! 💯🔥🥳🍻

  • @rahulpareek328
    @rahulpareek328 3 years ago +2

    Hello Luke,
    Thank you for providing such great videos.
    Love from 🇮🇳

    • @LukeBarousse
      @LukeBarousse  3 years ago

      No problem at all Rahul!! I appreciate the support!

  • @Major_Data
    @Major_Data 3 years ago +8

    Linkedin jail is no joke! They keep throwing me in the slammer for the same reason, and I dunno how TF to even make a web scraper!

    • @LukeBarousse
      @LukeBarousse  3 years ago +5

      That's why you gotta come to the dark side #🐍. 😂🤣
      On a serious note, as my number one source of content on LinkedIn, I can't have you in LinkedIn Jail!

  • @davidmiller-td1sl
    @davidmiller-td1sl 3 years ago +5

    Glad you used a burner account! That’s a pretty hefty ban!

    • @LukeBarousse
      @LukeBarousse  3 years ago +2

      I learned that trick from you Dave 😘

  • @gelatsx3724
    @gelatsx3724 2 years ago +2

    Looking forward to chapter 2!

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      I need to get around to making this! 😳

  • @troy_neilson
    @troy_neilson 3 years ago

    Some very interesting points around click-wraps, etc... Keen to see where you take this one.

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      I'm interested as well! Still brainstorming 😂 Thanks Troy!

  • @zolac9732
    @zolac9732 3 years ago +2

    The data is publicly available without logging in to LinkedIn. However, I noticed certain filter fields, like Easy Apply and experience level, are unavailable.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Yeah, I noticed the same thing... although for my purposes those filters aren't really necessary, so that may work

  • @StopWhining491
    @StopWhining491 2 years ago

    I always tell my students that if they didn't create it, they don't own it. I also remind them that software they didn't write almost always comes with Terms and Conditions, that boring legalese that few people read; if they click "Agree to Terms and Conditions," they've likely just created a binding contract with the developer. Caveat emptor.

  • @caraziegel7652
    @caraziegel7652 3 years ago +1

    Very cool - I actually wanted to download my connections' skills for a data science project, but got too busy with work . . . but I would really only need a one-time download, I think? idk, thanks for reminding me of this project idea anyways!

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Glad I could motivate you to do a similar project!! Hope you dug back into it!

  • @KenJee_ds
    @KenJee_ds 3 years ago +11

    May want to try to scrape glassdoor instead! I may know a playlist or something that can show you how 😉

    • @LukeBarousse
      @LukeBarousse  3 years ago +3

      ruclips.net/video/GmW4F6MHqqs/видео.html

    • @LukeBarousse
      @LukeBarousse  3 years ago +3

      😜 Should have gone with Glassdoor instead; just watched your vid... Your approach was a lot easier than mine 🤦🏼‍♂️

    • @mrhamster2983
      @mrhamster2983 2 years ago

      Ken ma man! ty for doing god's work!!

  • @sitaramkakumanu
    @sitaramkakumanu 1 year ago +1

    Very nice video, Luke. I think the bot is logging in at a regular time every day. How about creating a random time generator within a 1-hour window so that they can't detect it?
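
The randomized-window idea above sketches out to a few lines of standard-library Python; the 9:00 base time and the `next_run` helper name are hypothetical, chosen just for illustration:

```python
import random
from datetime import datetime, timedelta

def next_run(base: datetime, window_minutes: int = 60) -> datetime:
    """Return the base run time plus a random offset inside the window,
    so the bot doesn't log in at exactly the same second every day."""
    offset = timedelta(seconds=random.uniform(0, window_minutes * 60))
    return base + offset

base = datetime(2022, 1, 1, 9, 0)  # nominal daily 9:00 AM run
run_at = next_run(base)
```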

  • @arthemos4627
    @arthemos4627 11 months ago

    Did you find a way to overcome the 1000/2500 search limit?

  • @PradeepReddy-vm1be
    @PradeepReddy-vm1be 1 year ago +1

    Hi bro, I loved your work.
    I'm trying to collect jobs data, but it has a lot of issues. Can you help me out with your git repo?

  • @joseluizdurigon8893
    @joseluizdurigon8893 2 years ago +2

    Wouldn't setting a time.sleep() for a few seconds help the scraping? I mean, what characterizes a bot is the speed (and automatically doing stuff, obviously), but they see you're a bot because of the speed.

    • @eldarmammadov9917
      @eldarmammadov9917 1 year ago +1

      There is also browser POST info which lets the server understand it is a bot
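
A randomized delay like the one discussed in this thread is usually just `time.sleep` with a `random.uniform` jitter, so that no two requests are spaced identically; `polite_pause` is a hypothetical helper name:

```python
import random
import time

def polite_pause(min_s: float = 2.0, max_s: float = 6.0) -> float:
    """Sleep for a random, human-ish interval and return the delay used."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Between page fetches you might call polite_pause() with the defaults;
# tiny values are used here only so the example runs instantly.
delay = polite_pause(0.01, 0.02)
```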

  • @thibaudfreyd3834
    @thibaudfreyd3834 3 years ago +1

    Thanks for sharing Luke, I am also building my internet scraper to gather house data, hope I do not run into similar banning issues!! 🤞

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      If you build it as close as possible to mimic human behavior, you shouldn't have an issue. Good luck with your scraper!

  • @DavidCarmonaUX
    @DavidCarmonaUX 2 years ago +1

    I work at LinkedIn and am learning how to scrape data with Octoparse to build data sets for product designers. We have a lot of internal tools, and I'm curious to combine methods.

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      That's interesting about Octoparse; I haven't heard of a lot of people using this solution for scraping. I'll have to check it out. Thanks for sharing, David. What websites have you had luck with scraping?

    • @DavidCarmonaUX
      @DavidCarmonaUX 2 years ago

      @@LukeBarousse I'm 100% LinkedIn focused, so I'm attempting to do something in the enterprise/consumer use-cases for anything professional... using Airtable as the API for Figma or whatever software product uses it. Currently stuck on doing batch url -> image downloads... and on how to automate the deletion of data after capture (GDPR related). Thanks for your channel; I'm in an odd spot between design systems and trying to bridge AI or data science into product evolution.

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      @@DavidCarmonaUX This project sounds awesome, David! Good luck with it, it seems like you have the hardest part figured out!

  • @abhinav2529
    @abhinav2529 3 years ago +2

    Is it hard to work as a data scientist? I mean, is it a chill job where you get a lot of time left over, or is it hectic?

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      I'm a data analyst, so I may not be best to answer that

  • @barulli87
    @barulli87 2 years ago +1

    Why isn't there a video on how to do this project?

  • @juanpaolo21yt
    @juanpaolo21yt 3 years ago +1

    I'm not a pro, but I do some simple automation and bots at work. You inspired me to study more and do more. Thank you sir! From the Philippines.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Thanks so much for this!! This is actually inspiring to me as well to hear, so thank you!

  • @tacodumpling9282
    @tacodumpling9282 2 years ago

    Do you happen to know any site that has plenty of uncleaned data? Most of the data is already cleaned on websites like Awesome Public Datasets, Kaggle, and Google Data Search. Thank you!

    • @LukeBarousse
      @LukeBarousse  2 years ago

      I have some listed in the description

  • @pght95
    @pght95 1 year ago +1

    By any chance, did you share your code on GitHub or somewhere else?

    • @LukeBarousse
      @LukeBarousse  1 year ago

      Yeah I have a few repos shared on github

    • @godojos
      @godojos 1 year ago

      @@LukeBarousse I didn't see this repo on your Github. Where did you land with deciding whether or not to follow up on this here video?

    • @LukeBarousse
      @LukeBarousse  1 year ago

      @@godojos ruclips.net/video/7G_Kz5MOqps/видео.html
      This video is the follow up!

  • @Night_Sketching
    @Night_Sketching 1 year ago

    Love scraping, but I don't think that not being logged in will somehow void their TOS. The thing is that data available to see on a website without a login might still be protected by some law.

  • @mahdali6517
    @mahdali6517 2 years ago

    Thanks for sharing the awesome information, but I was hoping you would take one project from start to finish.

  • @laxyaberde8829
    @laxyaberde8829 3 years ago +2

    Hey Luke, add more videos! I like your explanation and you are doing nice work.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Aww thanks so much for this Laxya! Let me see what I can do!

  • @kunalghosh8852
    @kunalghosh8852 3 years ago +3

    May I know which course did you take to design a bot that you are using for web scraping?

    • @LukeBarousse
      @LukeBarousse  3 years ago +6

      Yeah! Python for Web Scraping by Data Camp: lukeb.co/WebScrapingPython

    • @kunalghosh8852
      @kunalghosh8852 3 years ago +1

      @@LukeBarousse thanks! :)

  • @Pedro_Israel
    @Pedro_Israel 2 years ago +2

    I started doing a project with the same objective but found it too hard to retrieve the data. Do you have the table with that data? Anyways, great video; I didn't know it was against policy either.

  • @dataArtists
    @dataArtists 3 years ago +2

    Great video, "Johnny"! You may also want to look into whether scraped info contains any PII (Personally Identifiable Information) from the postings, and its relation to GDPR if that information is from an EU member country.

    • @LukeBarousse
      @LukeBarousse  3 years ago +4

      I'm going to have to start going by Johnny now 😂 That's also a good point on analyzing it from the GDPR perspective... I solely focused on the US in this case but may need to think larger. Thanks for this, Robert!

    • @dataArtists
      @dataArtists 3 years ago +3

      @@LukeBarousse That sounds like a new Data name. Think Larger = Data Superhero (like game boss level).

  • @G3nM
    @G3nM 3 years ago

    You are a legend, bro!

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      I appreciate that my dude!! 🤙🏼

  • @SunDevilThor
    @SunDevilThor 3 years ago

    So far, I’ve only been banned from one site, which was from the UK that I would never visit regularly. I was just using it to practice.

  • @mdhidayat5706
    @mdhidayat5706 3 years ago +1

    Thanks for the tips on burner account.

    • @LukeBarousse
      @LukeBarousse  3 years ago +2

      No probs! But I don't know if I'd recommend the burner account 🤣

  • @YMuhammadyusuf
    @YMuhammadyusuf 1 year ago

    Hey, might LinkedIn block my account if I do web scraping this way?

  • @ChaseBank_3k
    @ChaseBank_3k 4 months ago

    Do you have an updated video on this or does it still work?

  • @afaqahmed8869
    @afaqahmed8869 2 years ago

    Sir, I am trying to run your scraping code from your GitHub repository, and while installing the requirements.txt file, it gives me this error: "ERROR: Invalid requirement: '_ipyw_jlab_nb_ext_conf=0.1.0=py38_0' (from line 4 of requirements.txt)
    Hint: = is not a valid operator. Did you mean == ?"
    I changed = to ==, but it still gives an error. Sir, is it a mistake on my side, or is something wrong with the requirements file?
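
For what it's worth, the `name=version=build` triple in that error (`_ipyw_jlab_nb_ext_conf=0.1.0=py38_0`) is the format written by `conda list --export`, not by `pip freeze`, so pip can't parse the file no matter how the operators are edited; recreating the environment with conda is the clean fix. Alternatively, here is a rough sketch for converting such lines into pip's `name==version` form (bearing in mind that some conda-only packages won't exist on PyPI):

```python
def conda_line_to_pip(line: str):
    """Convert a `conda list --export` line (name=version=build) into pip's
    name==version form. Returns None for comment/metadata lines."""
    line = line.strip()
    if not line or line.startswith(("#", "@")):
        return None
    if "==" not in line and line.count("=") == 2:
        name, version, _build = line.split("=")
        return f"{name}=={version}"
    return line  # already looks like a plain pip requirement

print(conda_line_to_pip("_ipyw_jlab_nb_ext_conf=0.1.0=py38_0"))
# -> _ipyw_jlab_nb_ext_conf==0.1.0
```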

  • @jordanhughes4436
    @jordanhughes4436 3 years ago +1

    How much time do you put into these projects including the googling and the learning?

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      It really depends. On this one I probably have over 100 hours. It's not necessary to spend this much time on projects, but sometimes I go a little overboard. Ha

  • @anirbanpatra3017
    @anirbanpatra3017 2 years ago

    Let me give you a tip: use Undetected Chromedriver as your webdriver in Selenium.
    You should still use all the other ways to avoid detection.

    • @LukeBarousse
      @LukeBarousse  2 years ago

      Thanks for this, I'm going to try this!

  • @ramiboy_y2049
    @ramiboy_y2049 1 year ago

    I need the next part of the video. Or maybe someone has used ScraperAPI? I really need that data from LinkedIn.

    • @LukeBarousse
      @LukeBarousse  1 year ago

      Check out my "how I use Python" video... I provide the data there

  • @matpikachu
    @matpikachu 2 years ago

    So web scraping is like accessing an API, just with dirtier data?

  • @bboyExia
    @bboyExia 2 years ago

    I felt this. I usually stay away from scraping nowadays and pray to god there's some API or third-party unofficial API for the data. Can't be bothered to invest in creating a crawler only for it to get banned or IP-blocked. There's another comment on admins hating crawlers, so there's that too.

  • @TechNetworkz
    @TechNetworkz 2 years ago +1

    Where can I get the watch, bro?

  • @pureffecto
    @pureffecto 3 years ago

    4:19 Are you using VSCode?

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Yeah! Love me some VSCode for Python

  • @johnatef9076
    @johnatef9076 3 years ago

    Is there any site other than Google to get a free data analysis course with a certificate?

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      I'm not sure on this one... I'd have to look into it more. Just so you're aware, the Google Data Analytics certificate has financial aid available; all you have to do is apply to see if you can get it

  • @DarkGT
    @DarkGT 2 years ago

    This reminds me of stories of how people hire low-cost workers from India to solve the "I'm Human" tests for those automated bots, and other shady things.

  • @Aquaowen
    @Aquaowen 5 months ago

    This is very interesting!

  • @acararslan732
    @acararslan732 3 years ago

    that was the coolest thing i have ever seen

  • @BenKlock-k9w
    @BenKlock-k9w 2 months ago

    You left me hanging at the end. Like, what does "this publicly available data" mean? It seems like you cut this video off short.

  • @faustopf-.
    @faustopf-. 2 years ago

    Why does nobody show in their tutorials a real set of data extracted with their much-mentioned "method"?

  • @vectoralphaSec
    @vectoralphaSec 1 year ago

    At least your IP wasn't banned.

  • @datastack1883
    @datastack1883 3 years ago +1

    That is valuable info.

  • @hamdimohamed8913
    @hamdimohamed8913 2 years ago

    Is the code open sourced? I can't find any GitHub repo.

  • @michelchaghoury870
    @michelchaghoury870 2 years ago

    I would like to see a tutorial on how to do web scraping with Python on Amazon, eBay, or any other place to get data. Can you please make tutorials about these topics, and keep going!

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      Thanks for this video idea Michel! Let me see what I can do on this topic!!

    • @michelchaghoury870
      @michelchaghoury870 2 years ago

      @@LukeBarousse Great, thank you. You really give me passion in every single vid; please keep going, your content is very helpful. Can you please in the future do more technical or project-based vids? Once again, thank you

  • @storygenix3099
    @storygenix3099 3 years ago +1

    You look like the brother of Kalle Hallden

  • @keifer7813
    @keifer7813 2 years ago

    No one, and I mean no one, makes videos quite like you.

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      I really appreciate this! That's my goal with my videos 🙌🏼

    • @keifer7813
      @keifer7813 2 years ago

      @@LukeBarousse Keep doing your thing man, look forward to em!

  • @BhupeshRajShakya
    @BhupeshRajShakya 3 years ago

    Once I used to slow down the script a bit so that I could avoid reCAPTCHA

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      Yeah, some waits to add time between requests

  • @asteriskconfidential7403
    @asteriskconfidential7403 3 years ago

    But I still don't know how LinkedIn discovered that you use a scraper. Maybe you access too frequently? Scroll too fast? Or your access is a Python request?

    • @thefamousdjx
      @thefamousdjx 3 years ago

      Selenium can be detected

    • @LukeBarousse
      @LukeBarousse  3 years ago +1

      yeah, I need to look into if I can hide this...

    • @TheVintageEngineer
      @TheVintageEngineer 3 years ago

      Probably all kinds of stuff. If the User-Agent isn't set to a common one, it will flag. Some sites use Akamai servers, which have all kinds of anti-bot measures: requests per time frame, progressive cookie data, user agent filtering, basic HTTP request header filtering (like if the accepted language isn't just right).

    • @thefamousdjx
      @thefamousdjx 3 years ago

      @@LukeBarousse Sure. I once created one on LinkedIn, but I only used Selenium to log in, then passed the session to the requests library to continue the scraping with ease. Using requests helps avoid having to deal with captchas and 'Are you a robot' checks. But I wasn't running it every day, so maybe I would have met the same fate

    • @LukeBarousse
      @LukeBarousse  3 years ago

      @@thefamousdjx Thanks for sharing this!
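
Setting a common browser User-Agent, as this thread suggests, takes only the standard library; the header values here are example strings (use a current browser's), and none of this guarantees a request won't be flagged:

```python
import urllib.request

# Example browser-like headers; the exact User-Agent string is illustrative.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.9",
}

# By default urllib identifies itself as "Python-urllib/3.x", which is
# trivially flagged; attaching browser-like headers blends in a bit better.
req = urllib.request.Request("https://example.com/jobs", headers=HEADERS)
```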

  • @BLK_O
    @BLK_O 3 years ago +1

    Hope you've used a burner account

  • @Kaboomnz
    @Kaboomnz 3 years ago

    Nice workaround for the legalities at the end, but I have to address something you said. "Python is the superior language" is factually incorrect; C# is far better than Python, you should use it.

    • @LukeBarousse
      @LukeBarousse  3 years ago

      😂 I was mainly saying it as a joke... I feel all languages have their pros and cons... I'm just a Python fan boi

    • @Kaboomnz
      @Kaboomnz 3 years ago

      @@LukeBarousse Lol, fair enough. I've been using C# for years; it's easier to learn than you might think, and the multi-threading is top notch.

  • @KiranSharma-ey6xp
    @KiranSharma-ey6xp 3 years ago

    Nice,
    You forgot to attach the scraped data CSV :D :D

  • @vibe-runewild
    @vibe-runewild 3 years ago

    How can I contact you if I want you to build a bot for me?

    • @LukeBarousse
      @LukeBarousse  3 years ago

      I don’t consult, per se, so sorry about this.

  • @redmashbeats5479
    @redmashbeats5479 3 years ago

    That's amazing. It motivates me to do the same

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Heck yeah, glad this motivated you!!

  • @luiszimmermann3196
    @luiszimmermann3196 3 years ago

    Funny coincidence… last semester, during a project, I was also scraping job data for jobs with the search terms data governance, data culture, etc., and I encountered exactly the same issue. Now this video pops up :D

  • @sukruthav7845
    @sukruthav7845 2 years ago

    Can u share the source code?

    • @LukeBarousse
      @LukeBarousse  2 years ago

      It’s linked in the description

  • @noentry1736
    @noentry1736 3 years ago

    Dude, you look like Dr. House

    • @LukeBarousse
      @LukeBarousse  3 years ago

      Or does Dr. House look like me... 😜

  • @ivinitmittal
    @ivinitmittal 3 years ago +2

    👍🏻👍🏻

  • @abdelkrimbentorcha6227
    @abdelkrimbentorcha6227 3 years ago

    great!

  • @MdHamid-xz3pw
    @MdHamid-xz3pw 6 months ago

    0:24

  • @abcdwerrty
    @abcdwerrty 3 years ago +2

    👍

  • @meisuci4708
    @meisuci4708 3 years ago

    Do you have a WhatsApp bot?

  • @Awaksica
    @Awaksica 3 years ago +2

    First

  • @myfathertaughtmethat
    @myfathertaughtmethat 2 years ago +1

    Hey Luke, would you be interested in developing a scraping tool for us? I have minimal experience in this area, so it needs to be easy to use. I am sure you know that Facebook, LinkedIn, and Seek all hold people's contact details, and we are looking to use this bot to help us find candidates for our clients. We will need to be able to search for people with specific job titles and then capture their email and/or phone number from these sites.

    • @LukeBarousse
      @LukeBarousse  2 years ago +1

      Sorry I don't do consulting, really just trying to focus on RUclips Content

  • @honneyykhatter
    @honneyykhatter 6 months ago

    the simplest way to scrape linkedin jobs is using a 3rd party linkedin scraper API. I find brightdata and scrapingdog to be the best.

  • @MdHamid-xz3pw
    @MdHamid-xz3pw 4 months ago

    Hotocmt

  • @hashdata-official
    @hashdata-official 2 years ago +1

    Sir, I have been trying to make a LinkedIn scraper for 10 days, but I am not finding a proper guide! I am using Beautiful Soup to scrape, but it is unable to scrape LinkedIn pages because of JavaScript! Will you make a video on how to make a LinkedIn scraper? Plzzzzz 🙏🙏🙏 I will be very thankful to you for this act of kindness ❤️🙏😭

  • @ameysonawane
    @ameysonawane 3 years ago +1

    👍