Where do you usually go to collect data for your data science projects? 👨🏼💻📊🤖📈
Epic Video, Luke! 👌
You know, I actually wanted to tell you, my friend (the one working at VW) actually does the job of a Data Engineer, Data Analyst, Data Scientist, and ML Engineer 😂 Cause he's the first data scientist the company hired. He's in a team of one person 😂
@@ShivekMaharaj you mean Volkswagen, no way bro
Most federal agencies have open data warehouses. I downloaded all of the PPP loan data from the Treasury department.
@@johnrussell5715 This is great to know, thank you for sharing this!
I saw an "entry level" data analyst job that required 6-8 years experience!
🤣 That's just ridiculous (and also unfortunate)! Entry level jobs should not be asking for that many years of experience
@@LukeBarousse I have encountered so many entry-level jobs that require so many years of experience that I want to make a project focusing on that. I was hoping to scrape data from LinkedIn. However, your video shows that it is not a good idea. I wonder how you solved this problem. Thank you for your kind attention.
@@yasinudun4147 Working on this project again RN actually... will have more details for how to do this in the near future
😂😂😂
😂😂😂😂😂😂
Congrats on 100K Luke!!
Thanks Ken! Honestly couldn't have done it without your support and shoutouts over this past year; it has helped out more than you know!! Here's to us making it to 1 million subs!
So I was watching one of these videos, and the guy was talking about how to get the data without violating the terms and conditions. He managed to pull off an amazing stunt. One of the issues with bots is that your requests keep their servers busy, so what he did was minimize the requests: the equivalent of opening page 1, then page 2, and so on, without running any other scraping commands. The content of the pages he opened was saved locally on his machine as a log, and then he opened those files and pulled the data from the saved pages offline. Pretty amazing but a dirty job, you might want to look it up 😅
Thanks for sharing this! LinkedIn job search pages are actually really slow to load (compared to other pages on my machine), so I really feel like I'm taxing the LinkedIn servers; because of this I think you offer a really interesting solution to help minimize that. I'll take a look into this, so thank you!!
@@LukeBarousse can’t wait to hear how it went
@@LukeBarousse the proper name for the thing is a HAR file (HTTP Archive format). You can use Google Chrome to download the file, and at that point, instead of scraping a website and overloading its server with requests, you're basically parsing a HAR file containing the data that you need.
Also one good tip when scraping: don't make your actions look like an outlier when you get analyzed by those you're scraping. Blend in with the crowd; computers are fast, humans are not. 🤫😉
@@928khaled Thanks for all these tips! I'm going to look more into HAR files and see if I can learn to apply it to my use case. Thanks again!
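For anyone wondering what "parsing a HAR file" looks like in practice, here's a minimal sketch in Python. It assumes the file was exported from the browser's dev tools and follows the usual HAR layout (log, then entries, each with a request and a response). The function name `extract_responses` and the example.com URLs are made up for illustration; none of this comes from the video.

```python
import base64
import json


def extract_responses(har, url_substring=""):
    """Return (url, body_text) pairs for HAR entries whose URL contains url_substring."""
    results = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        # Skip entries with no saved body or a URL we are not interested in.
        if text is None or url_substring not in url:
            continue
        # Bodies may be stored base64-encoded; decode those back to text.
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        results.append((url, text))
    return results


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real export; with a real file you would use
    # json.load(open("your_export.har", encoding="utf-8")) instead (hypothetical file name).
    sample_har = json.loads("""
    {"log": {"entries": [
        {"request": {"url": "https://example.com/jobs?page=1"},
         "response": {"content": {"text": "<html>page 1</html>"}}},
        {"request": {"url": "https://example.com/api/jobs"},
         "response": {"content": {"text": "aGVsbG8=", "encoding": "base64"}}}
    ]}}
    """)
    for url, body in extract_responses(sample_har, "jobs"):
        print(url, len(body))
```

Everything here runs offline against the saved file, so the only requests made are the ones the browser made while you were paging through the site by hand.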
I don't get how this helps. If you opened a page, then you have already taxed their server. Pulling info from the page isn't affecting them at all, so saving the page offline is an unnecessary step.
That was a nice shared experience, Luke. A friend had talks with several website administrators, and it seems that big companies are trying to limit web scraping for legal and technical reasons, as it tends to slow down website performance.
Also very true! That's a great point on the technical reasons, which I meant to bring up in this video. 🤦🏼♂️ We sometimes don't think of how using a bot may actually slow down servers and thus affect the experience of the company's other users, very valid!
That's only the second reason. The main reason is - it's their data, and managing it and curating it is part of their value offering. They limit access to prevent dilution of their assets and maximise their own monetisation potential. Why do you think they built the website in the first place?
@@wanderingtravellerAB99 absolutely. However, since this information is made available on the internet, which is not an intranet, a website can be considered a catalog or a booklet to be shared. Also, just as our own information will be used by the website during our visit, it would be fair to say that the visitor can save some information too.
lol my hub is one of those folks working for evil corporate America trying to stop scraping..
@@wanderingtravellerAB99 I agree with what you are saying about the value of the data asset, but speed shouldn't be underestimated or treated as secondary. If you have DevOps friends in big companies, they can tell you website speed is gauged in microseconds. Slowing down the website for a fraction of a second can cause millions in direct monetary losses.
Nice content! I have nothing to add, just wanted to share my appreciation!
i just did this project last week but scraping from indeed and pretty much can relate to you so much!!!!
😂 You spend all this time building this beautiful bot, and then BAM! No more scraping!
Did you run in to the same problem?
Awesome video! Thanks for sharing this content. I'm on my path to becoming a Data Analyst (Google DA Certificate in progress) and all this information is really useful.
Heck yeah, glad you are getting use out of my content Marcelo! Good luck with the Google Certificate!!
Congratulations on 100K Luke!
Yes! i thought we should celebrate
Thanks a lot Al! So appreciative of your shoutouts for my channel, this has helped more than you know in reaching this milestone!
And Bouseux, looking to launch a special video either later this week or next!
@@LukeBarousse 🎆
Definitely wanna be careful with the “public” jobs posting page too because they can ban your IP address, and that would suck… that happened to me with the U.S. Senate website
This is good to know, that's why I'm still hesitant! Were you scraping data from the U.S. Senate website?
This is a great followup to the earlier video. I'd hoped when you posted the previous video you would follow up with information on ethically dealing with the TOS limitations on various sites. Looking forward to further followup.
Heck yeah, thanks for the idea Arthur. More to come on this series!
great video and explanation of your process and pains!
Reading the comments: if LinkedIn will not allow scraping its data because of server overload or business info, then they should provide this kind of analysis for all of us searching. Maybe they could charge for it too; I would pay for that.
Nice project! I am really curious to see any kind of results from the data you have collected so far.
Yeah I'll be looking into the results in the upcoming episodes of this series!!
8:43 you're not comfortable doing that shit? I get it man, haha
🤣😂 Still deciding, but I really want that data from LinkedIn 🤷🏼♂️
This is really cool Luke.
Thanks Kumar, glad you enjoyed it!!
Can you make a video about how we can legally scrape data from LinkedIn or other applications for business purposes?
Yes, working on a video rn on this topic
Thank you Luke.
I scraped sports data recently and got banned. It wasn't automated scraping, and the ban was not harsh, but it would still be wise to be more mindful of the scraping we do.
I am learning about web scraping now to tackle this project.
I really love how you laid out your thought process and shared the legality of web scraping with us.
Have you done any more job scraping since this video?
Yes! I have a video talking about how "I analyzed XXX,XXX jobs to solve this" Check it out!
awesome as usual ,luke
Thanks Youssef!! Appreciate it!
Regarding public scraping. Put the py task on something like a VM or raspberry pi with a VPN.
They will ban your IP at some point.
Change IPs. Repeat.
You will have hundreds of IPs; then after a few years you can just change VPN or reuse old IPs.
I like this approach. I’m interested in trying this
@@LukeBarousse I do this with a few scrapers like this. Where the companies say “no”. I got blocked by IP so I set up a raspberry Pi that runs and I just change the IP whenever a block happens.
@@alecubudulecu Oh sweet! I have a few raspberry Pi's lying around actually!
Luke! Congratulations!!! 💯🔥🥳🍻
Thanks Rem!!
Hello Luke
Thank you for providing such great videos
Love from 🇮🇳
No problem at all Rahul!! I appreciate the support!
Linkedin jail is no joke! They keep throwing me in the slammer for the same reason, and I dunno how TF to even make a web scraper!
That's why you gotta come to the dark side #🐍. 😂🤣
On a serious note, as my number one source of content on LinkedIn, I can't have you in LinkedIn Jail!
Glad you used a burner account! That’s a pretty hefty ban!
I learned that trick from you Dave 😘
Looking forward to chapter 2!
I need to get around to making this! 😳
Some very interesting points around click wraps, etc... Keen to see where you take this one.
I'm interested as well! Still brainstorming 😂 Thanks Troy!
The data is publicly available without logging in to LinkedIn. However, I noticed certain filter fields like easy apply and experience level are unavailable.
Yeah, I noticed the same thing... although for my purposes those filters aren't really necessary so that may work
I always tell my students that if they didn't create it, they don't own it. I also remind them that software they didn't write will almost always come with Terms and Conditions, that boring legalese that few people read; if they click "Agree to Terms and Conditions," they've likely just created a binding contract with the developer. Caveat emptor.
Very cool - I actually wanted to download my connections' skills for a data science project, but got too busy with work . . . but I would really only need a 1-time download, I think? idk, thanks for reminding me of this project idea anyways!
Glad I could motivate you to do a similar project!! Hope you dug back into this project!
May want to try to scrape glassdoor instead! I may know a playlist or something that can show you how 😉
ruclips.net/video/GmW4F6MHqqs/видео.html
😜 Should have gone with glassdoor instead, just watched your vid.. Your approach was a lot easier than mine 🤦🏼♂️
Ken ma man! ty for doing god's work!!
Very nice video, Luke. I think the bot is logging in at the same time every day. How about creating a random time generator within a 1 hr window so that they can't detect it?
Did you find a way to overcome the 1000/2500 search limit?
Hie bro i luved ur work.
I'm trying to collect jobs data but I'm running into a lot of issues. Can you help me out with your git repo?
Wouldn't setting a time.sleep() for a few seconds help the scraping? I mean, what characterizes a bot is the speed (and automatically doing stuff, obviously), but they see you're a bot because of the speed.
There is also the browser POST info, which lets the server understand it is a bot
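To make the time.sleep() idea concrete, here's a small sketch of a randomized pause between page loads, so requests are spaced out unevenly instead of firing back to back. The helper name `polite_delay` and the 2 to 6 second range are arbitrary choices for illustration, not values from the video, and a pause like this only addresses request speed, not the other signals mentioned in this thread.

```python
import random
import time


def polite_delay(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Pause for a random duration in [min_seconds, max_seconds] and return it.

    The sleep function is injectable so the helper can be exercised without waiting.
    """
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Hypothetical loop over three result pages; the actual page fetch is left out.
    for page in range(1, 4):
        waited = polite_delay(0.1, 0.3)  # short range here just to keep the demo quick
        print(f"page {page}: waited {waited:.2f}s before the next request")
```

Spacing requests out this way also takes some load off the server, which ties back to the performance point raised earlier in the comments.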
Thanks for sharing Luke, I am also building my internet scraper to gather house data, hope I do not run into similar banning issues!! 🤞
If you build it as close as possible to mimic human behavior, you shouldn't have an issue. Good luck with your scraper!
I work at LinkedIn and am learning how to scrape data with Octoparse to build data sets for product designers. We have a lot of internal tools and I'm curious to combine methods
That's interesting about Octoparse, I haven't heard of a lot of people using this solution for scraping; I'll have to check it out. Thanks for sharing David. What websites have you had luck with scraping?
@@LukeBarousse I'm 100% LinkedIn focused, so I'm attempting to do something in the enterprise/consumer use-cases for anything professional.... using Airtable as the API for Figma or whatever software the product uses. Currently stuck on doing batch url -> image downloads... and how to automate the deletion of data after it's captured (GDPR related). Thanks for your channel, I'm in an odd spot between design systems and trying to bridge AI or data science into product evolution.
@@DavidCarmonaUX This project sounds awesome, David! Good luck with it, it seems like you have the hardest part figured out!
Is it hard to work as a data scientist, I mean is it a chill job where you get a lot of time left or is it hectic?
I'm a data analyst, so I may not be best to answer that
why isnt there a video on how to do this project?
I'm not a pro, but I do some simple automation or bots at work. You inspired me to study more and do more. Thank you sir! From the Philippines.
Thanks so much for this!! This is actually inspiring to me as well to hear, so thank you!
Do you happen to know any site that has plenty of uncleaned data? Most of the data are already cleaned from websites like Awesome Public Datasets, Kaggle, and Google Data Search. Thank you!
I have some listed in the description
by any chance did you share your code on GitHub or somewhere else?
Yeah I have a few repos shared on github
@@LukeBarousse I didn't see this repo on your Github. Where did you land with deciding whether or not to follow up on this here video?
@@godojos ruclips.net/video/7G_Kz5MOqps/видео.html
This video is the follow up!
Love scraping, but I don't think that not being logged in will somehow void their TOS. The thing is that data you can see on a website without logging in might still be protected by some law.
Thanks for sharing the awesome information, but I was hoping you would take one project from start to finish.
Hey Hi Luke Add more videos I like your explanation and you are doing nice work.
Aww thanks so much for this Laxya! Let me see what I can do!
May I know which course did you take to design a bot that you are using for web scraping?
Yeah! Python for Web Scraping by Data Camp: lukeb.co/WebScrapingPython
@@LukeBarousse thanks! :)
I started doing a project with the same objective but found it too hard to retrieve the data. Do you have the table with that data? Anyways, great video, I didn't know it was against policy either.
Thanks Pedro!
Great video "Johnny"! You may also want to look into whether the postings you scrape contain any PII (Personally Identifiable Information) and how that relates to GDPR if the information is from an EU member country.
I'm going to have to start going by Johnny now 😂 That's also a good point on analyzing it from the GDPR perspective... I solely focused on the US in this case but may need to think larger, thanks for this Robert!
@@LukeBarousse That sounds like a new Data name. Think Larger = Data Superhero (like game boss level).
You are a legend, bro!
I appreciate that my dude!! 🤙🏼
So far, I’ve only been banned from one site, which was from the UK that I would never visit regularly. I was just using it to practice.
Thanks for sharing this!
Thanks for the tips on burner account.
No probs! But I don't know if i'd recommend the burner account 🤣
hey, may linkedIn block my account if I do web scrapping in this way?
Do you have an updated video on this or does it still work?
Sir, I am trying to implement your scraping code from your GitHub repository, and while installing the requirements.txt file it is giving me this error: "ERROR: Invalid requirement: '_ipyw_jlab_nb_ext_conf=0.1.0=py38_0' (from line 4 of requirements.txt)
Hint: = is not a valid operator. Did you mean == ?"
I changed = to ==, but it is still giving an error. Sir, is it a mistake on my side, or is something wrong with the requirements file?
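That error looks like what pip prints when a requirements file is in conda's export format (`name=version=build`) rather than pip's (`name==version`). That would explain why swapping `=` for `==` alone doesn't help: the trailing build string (`py38_0`) is still there. Below is a rough sketch of a converter, assuming that really is the cause. The function names are made up, the numpy line in the demo is an illustrative example rather than a line from the repo, and skipping names that start with an underscore is a heuristic (as far as I know those are conda-only packages that aren't on PyPI).

```python
def conda_line_to_pip(line):
    """Convert one 'name=version=build' line to 'name==version', or None to skip it."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no requirement
    if "==" in line:
        return line  # already pip-style, leave it untouched
    parts = line.split("=")
    name = parts[0]
    if name.startswith("_"):
        return None  # heuristic (assumption): underscore-prefixed names are conda-only
    if len(parts) == 1:
        return name  # no version pinned
    return f"{name}=={parts[1]}"  # keep name and version, drop the build string


def convert_requirements(text):
    """Convert a whole conda-style export to pip-style requirement lines."""
    converted = (conda_line_to_pip(line) for line in text.splitlines())
    return "\n".join(req for req in converted if req is not None)


if __name__ == "__main__":
    sample = """# conda-style export (illustrative lines)
_ipyw_jlab_nb_ext_conf=0.1.0=py38_0
numpy=1.19.2=py38h54aff64_0
"""
    print(convert_requirements(sample))
```

If conda is installed, another option is to skip pip entirely and build the environment from the file with `conda create --name <env> --file requirements.txt`.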
How much time do you put into these projects including the googling and the learning?
It really depends. This one I probably have over 100 hours in. It’s not necessary to spend this much time on projects, but sometimes I go a little overboard. Ha
Let me give you a tip: use Undetected Chromedriver as your webdriver in Selenium.
You should still use every other way you can to avoid detection.
Thanks for this, i'm going to try this!
I need the next part of the video. Or maybe someone has used scraperapi I really need that data from linkedin.
check out my how I use python video... I provide the data there
So web scraping is like accessing by an API just dirtier data?
Yeah, pretty much
I felt this. I usually stay away from scraping nowadays, and pray to god there's some API or third-party unofficial API for the data. Can't be bothered to invest in creating a crawler only for it to get banned or IP-blocked. There's another comment on admins hating crawlers, so there's that too.
where i can get the watch bro
amazon! It's a garmin
4:19 are you using VSCode ?
Yeah! love me some VSCode for python
Is there any site other than google to get a free data analysis course from with certificate ?
I'm not sure on this one... I'd have to look at it more. Just so you're aware, the Google Data Analytics certificate has financial aid available; all you have to do is apply to see if you can get it
This reminds me of stories of how people hire low-cost workers from India to solve the "I'm human" tests for those automated bots, and other shady things.
Interesting!
This is very interesting!
that was the coolest thing i have ever seen
Ha, thank you!
You left me hanging at the end. Like what does “this publicly available data” mean? It seems like you cut this video off short.
Why does nobody show in their tutorials a real set of data extracted with their so-called "method"?
At least your IP wasn't banned.
That is valuable info.
Aww thanks Priyadhara!
is the code open sourced ? can't find any github repo ?
github.com/lukebarousse/Job_Analysis
Forgot to link it
Thanks, I found it in another video of yours 💜
I would like to see a tutorial on how to do web scraping with Python on Amazon, eBay or any other place to get data. Can you please make tutorials about these topics, and keep going
Thanks for this video idea Michel! Let me see what I can do on this topic!!
@@LukeBarousse great, thank you, you really give me passion in every single vid, please keep going, your content is very helpful. Can you please do more technical or project-based vids in the future? Once again, thank you
You look like the brother of Kalle Hallden
😂😂
No one, and I mean no one, makes videos quite like you.
I really appreciate this! That's my goal with my videos 🙌🏼
@@LukeBarousse Keep doing your thing man, look forward to em!
I once had to slow down the script a bit so that I could avoid reCAPTCHA
Yeah some waits to add time between requests
But I still don't know how LinkedIn discovered that you were using a scraper. Maybe you accessed it too frequently? Scrolled too fast? Or was it because the access came from a Python request?
Selenium can be detected
yeah, I need to look into if I can hide this...
Probably all kinds of stuff. If the User-Agent isn’t set to a common one, it will get flagged. Some sites use Akamai servers, which have all kinds of anti-bot measures: requests per time frame, progressive cookie data, user agent filtering, and basic HTTP request header filtering (like if the Accept-Language header isn’t just right).
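To show what those header fields actually are, here's a small standard-library sketch that builds a request with an explicit User-Agent and Accept-Language and prints what would be sent. Nothing is fetched. The helper name `build_request`, the example.com URL, and the header values are placeholders for illustration; which values a given site expects is not something this thread establishes.

```python
from urllib.request import Request


def build_request(url, user_agent, accept_language="en-US,en;q=0.9"):
    """Build (but do not send) a request carrying explicit browser-style headers."""
    return Request(
        url,
        headers={
            # Without this, urllib identifies itself as "Python-urllib/<version>" when sending.
            "User-Agent": user_agent,
            "Accept-Language": accept_language,
        },
    )


if __name__ == "__main__":
    req = build_request(
        "https://example.com/jobs",            # placeholder URL
        "Mozilla/5.0 (placeholder UA string)",  # placeholder value, not a real browser string
    )
    # urllib normalises header names to "Xxxx-yyyy" capitalisation internally.
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Headers are only one of the signals listed above; request rate and cookie behaviour are checked separately, so this alone says nothing about whether a scraper gets flagged.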
@@LukeBarousse Sure. I once created one on LinkedIn, but I only used Selenium to log in, then passed the session on to the requests library to continue the scraping with ease. Using requests helps avoid having to deal with CAPTCHA and 'Are you a robot' checks. But I wasn't running it every day, so I probably would have met the same fate
@@thefamousdjx Thanks for sharing this!
Hope you've used a burner account
🤣
Nice work around for the legalities at the end but I have to address something you said. "python is the superior language" is factually incorrect, C# is far better than python, you should use it.
😂 I was mainly saying it as a joke... I feel all languages have their pros and cons... i'm just a python fan boi
@@LukeBarousse lol fair enough. I've been using C# for years, it's easier to learn than you might think and the multi-threading is top notch.
Nice,
You forgot to attach scrap data csv :D :D
😜
How can I contact you if I want you to
A bot for me?
I don’t consult, per se, so sorry about this.
That's amazing. Motivates me to do the same
Heck yeah, glad this motivated you!!
Funny coincidence… last semester during a project i was also scraping job data for jobs with the search terms: data governance, data culture etc. and i encountered exactly the same issue. Now this video pops up :D
I feel your pain then!! ha
Can u share the source code?
It’s linked in the description
Dude you look like DR house
Or does Dr. House look like me... 😜
👍🏻👍🏻
🤙🏼🤙🏼
great!
🤙🏼
0:24
👍
🤙🏼
Do you have a WhatsApp bot?
I don't
First
Second!
Hey Luke, would you be interested in developing a scraping tool for us? I have minimal experience in this area, so it needs to be easy to use. I am sure you know that Facebook, LinkedIn, and Seek all hold contact details of people, and we are looking to use this bot to help us find candidates for our clients. We will need to be able to search for people with specific job titles and then capture their email and/or phone number from these sites.
Sorry, I don't do consulting; really just trying to focus on YouTube content
The simplest way to scrape LinkedIn jobs is using a 3rd party LinkedIn scraper API. I find brightdata and scrapingdog to be the best.
Hotocmt
Sir, I have been trying to make a LinkedIn scraper for 10 days, but I can't find a proper guide! I am using Beautiful Soup to scrape, but it is unable to scrape the LinkedIn page because of JavaScript! Will you make a video on how to make a LinkedIn scraper? Plzzzzz🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏 I will be very thankful to you for this act of kindness ❤️🙏😭
👍
🤙🏼