Or being in the Internet Archive interns (because I heard that Internet Archive uses similar data scraping method to archive the website and all its content)
i think i heard somewhere that reddit originally implemented their api to discurage webscrapers cause just giving them access to said data is much less load on their systems than having to send over an entire webpage every time but now said api is suddenly paid so i would not be surprised for people to just go back to webscraping
Yeah but ryanair is not google. They offer a service where as a flight gets closer they lower the price of the flight. So lets say paris to london - 50 euro 2 days in advance, 20 euro on the day. If someone is allowed scrape the info, they will take money off people who want to fly paris to london, 45 euro, then wait and scrape and scrape, then book at 20 euro. The problem? This is an integeral part of their business, and if everyone books last minute, they can actually fly the plane at a loss. The business idea for them is, you sell 20% of the tickets higher price, 60% medium, and 20% at cost to fill the plane... but scrapers can mean they sell 0% at high, 30% at medium, and 70% at cost. Ryanair have lost a huge chunk of money, talking millions, tens of millions, maybe even hundred million due to this exact cause... They have no problem with you a human being, watching the page, refreshing it, and seeing if the price drops. Zero, go for it. They have taken you into account.
@@geroutathat That is just scalping. I wouldn't mind if scalping was illegal, but it has nothing to do with scraping. The only reason that case was ruled the way it was is the judge knew nothing about tech and didn't like one side. Rather or not it is against the law is a mute point for most people. If you cannot afford a lawyer, then you are just assumed guilty in the US.
And if you disagree with this practice, by disabling/removing all those google apps, your phone which you spent over $500 for, becomes unusable. How beautiful is that ?
I think that goes for any lawsuit. When a trillion dollar company hacks, they get a slap on the wrist, and individual copies stuff, 20 years jail. "Google has been fined 145,000 euros (£125,000) by German data regulators for illegally recording information from unsecured wi-fi networks." "In the United States, the penalties for criminal copyright infringement can extend upwards of 20 years in prison"
There's no such concept as legal/illegal. You can always do anything you want as long as you weight the cost of lawsuit vs the profits, if that's greater than 0, then that's just normal business, its part of the game. Basically use game theory, its only ethical to apply min-max strategy everywhere.
@@JasonKaler Never do anything in your person name, do everything though a LLC, if you're going to do "wrong" things, at least learn how to play the game properly.
Scraping open source stuff like this used to be a big part of my job. It was pretty miserable. You'd write code to pull something specific from a page, spend ages testing and verifying it, then next week they change the page format or upload some junk data and nothing works and all the code is for nothing.
I also have to do this very often at my job with a web app used by our managers, so the solution I've found was to request things directly to their back-end by inspecting the network tab in my browser and learning the request format they use. They have an API but it sucks.
Gotta love how, now that the Supreme Court has ended Chevron Deference, these complicated issues will be decided exclusively by political appointees with no technical experience who think a RESTful API is something that happens in the bathroom after you've had a good night's sleep.
There's pros and cons to Chevron Deference. In the past, the organizations being deffered to are trusted because they are "experts". Since they are given more trust, they didn't need to be transparent or to explain themselves or to be consistent. Regarding to the original poster, rulings will now have to be transparent and follow precedent. So that will help with the "50/50" uncertainty.
@@Livity."experts" ATF: _That trigger which only shoots one bullet per pull and doesn't meet the statutory definition of a machine gun - totally a machine gun._
Fun fact: I coined the term web scraping and published how to do it in the mid-90s in a magazine called MSDN Journal. Of course, the term was based off of the term screen scraping. It's my only claim to fame.
@@TojiFushigoroWasTaken Sure! Here's a simple and delicious recipe for classic vanilla cupcakes: Vanilla Cupcakes Recipe Ingredients: For the cupcakes: 1 1/2 cups all-purpose flour 1 1/2 teaspoons baking powder 1/4 teaspoon salt 1/2 cup unsalted butter, softened 1 cup granulated sugar 2 large eggs 2 teaspoons vanilla extract 1/2 cup whole milk For the vanilla buttercream frosting: 1 cup unsalted butter, softened 4 cups powdered sugar 2 teaspoons vanilla extract 2-4 tablespoons heavy cream or milk Instructions: For the cupcakes: Preheat your oven to 350°F (175°C) and line a cupcake pan with cupcake liners. In a medium bowl, whisk together the flour, baking powder, and salt. Set aside. In a large bowl, beat the butter and sugar together with an electric mixer until light and fluffy, about 2-3 minutes. Add the eggs one at a time, beating well after each addition. Then mix in the vanilla extract. Gradually add the flour mixture to the butter mixture, alternating with the milk, beginning and ending with the flour mixture. Mix until just combined. Divide the batter evenly among the cupcake liners, filling each about 2/3 full. Bake for 18-20 minutes, or until a toothpick inserted into the center comes out clean. Remove from the oven and let the cupcakes cool in the pan for 5 minutes. Then transfer them to a wire rack to cool completely. For the vanilla buttercream frosting: In a large bowl, beat the butter with an electric mixer on medium speed until creamy. Gradually add the powdered sugar, one cup at a time, beating on low speed until well combined. Add the vanilla extract and 2 tablespoons of heavy cream or milk. Beat on high speed for 3-4 minutes, until the frosting is light and fluffy. If the frosting is too thick, add more cream or milk, one tablespoon at a time, until you reach your desired consistency. Once the cupcakes are completely cool, frost them using a piping bag or a knife. Decorate with sprinkles if desired. Enjoy your homemade vanilla cupcakes! /s
These sorts of things are only made illegal when the people who are already rich can't profit off the ones who aren't. Kinda like that WallStreetBets subreddit story from a few years ago: "It's only wrong when _you_ do it."
@@Detr0y TLDR, short squeezes have been a tactic of large investment firms for a long time, but it takes a lot of money to do it. It is effectively a contest of money between the squeezers and shorters, and whoever has more money leaves with everything. The first time somebody crowd sourced the squeeze, allowing poors to participate, the government was investigating the guy who started it(trying to find something unrelated to prosecute him for since he didn't break any laws), there were hearings in congress, calls to change the laws, hit pieces in the media, etc.
@@Detr0y On the market, you can bet on anything. Some years ago, very big corporations place a bet saying that GameStop will crash. Obviously, they tried everything to make GameStop crash, and they are big enough to sucess... But this time it was different. WSB explained the trick, and convinced a lot of people to buy as many GameStop actions as possible... The action raised, and all the big corporations that bet on the crash were virtually ruined. For once, people had won against the market manipulation... But not for long. They decided to change the rules, saying they are unfair (yea no sh*t...), blocking people funds, changing market opening time,... Finally, the system is only liberal wheb it go in the good direction for them, when small people are winning, it's unfair.
Some years ago, when steam UI was terrible, I made a better steam library page by scraping the pages and get informations on the game. I never publish the application, as I was afraid someone with 1000games will mash the "refresh" button, got banned for scrapping/DDoS, and blame me 😁
@@pierrotA I do have one public tool that scrapes data shared under the MIT license, but I'm going to remove it. I can't go back to jail, especially without Michael Scofield to help me break out of it, lol.
I think if you used proxies to mass-scrape a site there is an argument about use of server resources (assuming it puts significant load on the site in total).
@@FirstYokai You can ddos without scraping. In fact they go against each other since you need a working site to scrape data from it and DDOS is way more effective if it isn't burdened by scraping logic.
@@ra2enjoyer708 That is not the point. OJ hinted that public websites are a free for all. So it's also okay to spam access a website, because it's public.
@@FirstYokaia ddos is a malicious act done with premeditated intent of blocking access to a server with multiple machines. simply trying to use as much web traffic on a website with one computer isn't a ddos.
The lawsuit against Github Copilot is still ongoing. The judge dismissed parts of the complaint, notably violations of the DMCA, but not the license violation.
that's a relief. thought microsoft somehow got away with probably one of the largest scale license violations ever. i think they will still get away with it simply because our laws are not even designed to handle a situation as ridiculous as this
@@CentreMetre You would think so. I'm not a lawyer. Tbh I found the reasoning of the judge strange. Apparently it does not violate the DMCA because it does not reproduce exact copies of code. Insofar I know, copyright always was about modified copies as well, but apparently in this case it's not 🤷
@@CentreMetre You can't prove the AI copied you specifically. It's a black box, how it produced code is time consuming to prove to the point that it's logically impossible to go through the process for each individual complaint. The people that trained the AI don't know if they copied you, even. How are you going to prove the AI copied your work? Yes, this applies even if you're the only one who ever programmed anything like what the AI spat out, even though it's logically astronomically unlikely that the AI produced the same code as you by cobbling together code from other sources. What you CAN prove is that the AI was trained on data that includes yours without your consent being written in the license. That's the difference between copyright and the license.
The 2:52 case is so dumb is like I was to buy some apples for a farmer and then sell them to people in a distant city for a more expensive price, is just normal everyday markets
I made hundreds of thousands of dollars for myself and millions for the companies I worked at writing custom data scrapers in the mid 2000s. It was like magic, you'd have a complete, but empty web app, turn on the scraper, and boom! A constant stream of brand-new content! And because your web app was more search engine optimized than the sites from which you got the data, you would rank above them on Google, driving more traffic to your app, until a certain point where your own user-generated content was enough and you didn't need to scrape anymore. But that's when you spin off and start scraping other data for another web app, and so on.
VPN services or dynamic IP addresses do not work at all. There is so much data that your computer sends to the server that you have no idea. For example: Monitor resolution, disk serial numbers, type of mouse, keyboard and many others. There are also aspects that the average person has no idea about. I myself wrote a program in C# to analyze data on the network, server queries and what data the most popular websites collect, and I was terrified. A few things can be changed, but e.g. the type of processor or RAM type cannot be changed. These are the things you send to the Internet and you don't know it.
@@JarrowJR what are you on about. browser fingerprinting has nothing to do with web scraping. there is absolutely nothing a web server can do to reliably receive factual information from a client. anything being sent to a server can be completely fabricated and there currently exists no functional mechanism for proving the authenticity of data generated by a client. If such a system did exist, we would simply not have hackers, because we could verify with 100% accuracy that the client is sending approved, correct data. it would be impossible to fake anything. sure, if you're a dumbass and don't fake your user agent, access pages faster than a person could, don't randomise input timing etc, yeah you're gonna get detected as a bot pretty quickly
so your about page is: "They cant sue me. they don't know who I am. I never created an About page! 😎 "? mmm... have you tried adding a linkedin? I think it would make it more professional🔥🔥
but like, doesn't big companies scrape whole data in web to train ai models? or i misunderstand something. if so, how can that be legal on the other hand me using this for my website illegal? we are both just trying to work on data.
Do you have a multi-million dollar legal team that can keep lawsuits from going to court for years at a time while your development team completes the product?
So how exactly does Copilot scraping websites for code and stuff not defraud those as the users will use copilot instead of visiting those websites for info?
@@Archimedes.5000 It's not free info since the websites hosting the information use ad as a revenue stream, so it's 100% defrauding when someone scrape the data and effectively resell it.
@@SoKettewell like I said, there is fortunately no law forcing you to watch ads. So if someone avoids seeing your ads then you can't do anything to them, you can't defraud ad views since it's not a commodity.
Python + Scrapy is really good. Friend scrapped and crawled news for words for Machine Learning for research in 1 asian country. Very suggested, especially considering good content tutorials online
Since mental illness is taken so lightly these days, I'm going to start web scraping but perceiving myself as an Asian woman named Jackie Chana, then I'm going to donate all the information (I'm not getting any profit) to my real person and then when I go to use the information I begin to perceive myself as my real self, and I have never had to do webscraping xD
I believe the difference between those two cases you showed was "What was the intent behind why the info was uploaded." In the case of Craigslist, people upload to Craigslist to sell on Craigslist, and people go to Craigslist because they want to buy from Craigslist. With LinkedIn, well...that place is basically Facebook. It's a "Professionals" social network. The scrappers in this situation were essentially improving the "quality"(?) of posting on LinkedIn: getting your voice heard. It's a very thin line so the same can be said in reverse, but I do think there's a line nonetheless. Fun fact: Windows actually has a built-in shortcut for LinkedIn Shift+Ctrl+Alt+Win+L Most Windows shortcuts just use the Windows key, but just "Win+L" will lock your computer. I'm pretty sure this is also the only website that has its own hotkey. Not even Microsoft's web store has one.
When I was doing my CompSci degree, I created a Python library that would let you query bunch of local real estate buying and selling websites by scraping them and forgot about it. A few years later, someone created an issue and I got scared and set the repo to private. 😅
I love the feeling of mass downloading data by scraping after having exploited the html structure of a webpage. My favourite tool for this task is beautiful soup. Scraping is easier now with ai, and I would enjoy it even more if it were to become illegal.
Imagine creating the internet to make information publicly available and accessible and then suing people because that information shouldn't be accessible 😵💫
The Ryanair case really wasn't about the scraping itself but the intent to sell flights in holiday package bundles that took away from their business. So I don't feel like it shouldn't be in the same category as other scraping lawsuits
@@trappedcat3615 Depends how long you want to store the data Google Cloud is around $13/mo per TB, which would pay for a $70 1TB consumer SSD in about 5 months
No law banning people from using publicly available information is legitimate. An unjust law is no law at all. If you make information public, then you lose any rightful control over it. Information Must Be Free.
Excellent job sifting through the complex legal and ethical dynamics of web scraping. The cases you presented help to clarify the lines between what is permissible and what's not.
Having done some work on the other end of this I don't think most people realize how expensive and disruptive scrapers can be. Scrapers and bots are responsible for the majority of many sites hosting costs and typically provide 0 revenue. Never mind that a poorly designed scraper is indistinguishable from a DOS attack. If there was a reliable way to block them that didn't block genuine users I have no doubt many, possibly most, websites would.
Isn't it the same reason there are laws against collecting rainwater? The commons are sacred because they are shared by all. Isn't this basically a "tragedy of the commons" situation?
In Lithuania they threaten to sue you for monetary damages you do for breaking ToS, and some sites have "Do not copy the content without written approval", that makes me worried af, but I still do that just with more proxies chained 🙃
Can I make my own tos saying that if companies collect and sell my data. I am given the rights to mine their public data? My tos would be hidden in the data that they collected. So they cant give the excuse that they didnt see my tos. Just like how you have to dig just to find google search engine tos.
Completely uneducated opinion, take this comment with an enormous grain of salt: I don't think that's possible. Since you agreed to the tos, having your own would not affect the original..But if you have an organisation with a webpage, I think you could create tos that allows you to scrape any public date from anyone who visits your site and agrees to it Yet again, take this with a grain of salt.
@@tuureluotonen1631 yuh, my new tos doesn't override the previous tos I've accepted and is legally not acceptable if they have language in their tos that prevents EU from messing with the data format and or file they collect.
It is possibly worth noting aggressive scraping of large sites can cost those companies money or reputational damage if their systems cannot scale quickly or sufficiently to handle the abnormal traffic load. Even if a site can handle massive spikes in requests, aggressive scraping can often trigger additional work for ops teams - particularly irksome if you’re on-call. If people must scrape, I would hope they consider throttling their requests. And if the appeal to decency doesn’t work, at least consider throttled scraping reduces the likelihood of triggering bot-detection countermeasures.
I don’t understand how building a scraper or using one can be illegal. If the data can be plainly seen on the open web, automating that collection can’t be illegal - it’s already there! This doesn’t apply in a legal sense but it’s like taking pictures in public - protected by law as you cannot expect privacy in public, so you’re free to take photos as long as you’re in a public space. If you put public in a public space, you can’t expect that people won’t collect it, automated or otherwise.
Depends on the country. In Switzerland I received a cease and desist letter. Went to a lawyer and he said my chances to win in the court were at 30-40%. So I stopped the crawling
Well, with Booking the main point was that they were *selling* *the same data* *without authorization*. So they were not adding something useful to the data or doing some analysis, they were just copy-pasting, they were actively making profit (not through donations or something similar), so they were re-distributing the data for the sake of profit, which does sound a lot like piracy. I think it essentially comes down to "fair use", but I feel that it may be a bit more defined, which is good.
Public information means free knowledge. It should be free getting knowledge about anything. Scrapping is getting information = getting knowledge. That should not be a crime. Example: If I make a weapons stores website scraper, that shouldn't be a problem. If I use that to rob a store and or to fuck up with people using those weapons, only then it becomes a crime.
I webscraped an online ebook viewer for one book with 1000 pages one time and saved it to a pdf for reading. You could just save the displayed page to a file and send click events to the page navigation. Happily enough they fixed it.
An example of Google actually stealing information: At one time, Chrome was better about recognizing when you type in a domain name or URL and want to go directly to that site. But it has magically declined into the tendency to take you to a search for that string, instead. This lets Google pass you to the Google search, so that it can track exactly what you do from that point on. It always tracks when you click to go to a website from a Google search result, and tracks everything about you while you're doing it.
How is this stealing information? What are these claims based on? Why does it matter, if you're already using Chrome in the first place? What does this have to do with web scraping?
@@ZachAttack6089 It's stealing, because it's degrading its software's performance in order to FORCE customers to surrender more information. And it has to do with web scraping, because any information already made completely public cannot rightfully be owned and controlled by the maker. Scraping is legal, insofar as any law banning it is illegitimate. An unjust law is no law at all...meanwhile, Google is intentionally violating your privacy, which is illegitimate.
@@Edser9 Yes, but in a different format, and with a privacy profile (officially, not that I would assume they obey their own promises) that limits what they can do, unlike the info they gather from Google searches.
I've written a scraper for a public site before. They offered an API at an obscene cost (as in 4 figures a year) so did it to themselves as far as I'm concerned. Perhaps sites should offer more granular API access for smaller requirements.
Piracy is a robbery or violent action on the seas or in the air without state authorization. Companies use dramatic terms for what at best amounts to shoplifting
Rant: You actually said piracy is not theft! That’s crazy but then I thought Sean Parker should’ve gone to prison for Napster. Let’s just say piracy is theft and people that ignore that should be made an example of until they believe piracy is theft and stop it. End of rant. I watched the rest of the video. This is an excellent video. Thank you for the work you do.
This is why many U.S.-based companies outsource their web scraping tasks to developers or firms in India. Firstly, India does not have stringent laws regarding web scraping, making it a more viable option legally. Secondly, the labor costs in India are significantly lower, yet the productivity and quality of work remain exceptionally high. This combination of factors makes India an attractive destination for such technical work.
Purchasing a stock may seem straightforward, but selecting the correct stock without a proven strategy can be exceedingly challenging. I've been working on expanding my $210K portfolio for a while, and my primary obstacle is the lack of clear entry and exit strategies. Any advice on this matter would be greatly appreciated.
The strategies are quite rigorous for the regular-Joe. As a matter of fact, they are mostly successfully carried out by pros who have had a great deal of skillset/knowledge to pull such trades off.
I agree, having a brokerage advisor for investing is genius! Amidst the financial crisis in 2008, I was really having investing nightmare prior touching base with a advisor. In a nutshell, i've accrued over $2m with the help of my advisor from an initial $350k investment.
Lina Dineikiene' is the licensed coach I use. Just research the name. You'd find necessary details to work with a correspondence to set up an appointment.
As an admin/research assistant, i web scrape shamelessly, it lessen my job description almost to 30% of its original size. You made me kinda worried there until you gave the full explanation.😂
All I've seen from you is content like and I guess I will have to take a deep dive into your content to see if there is anything else but this all comes down to education and the willingness of people to learn.
I've done some successful scraping of scientific data using Selenium and Mechanize, as part of my past paid employment. What I want to do in the near future is scrape valuable data like from eBay and public financial data (which is available for a limited time). Is it illegal to sell this data later? I suspect it would be, especially option price data. Use-case: people want to back-test and see historic prices beyond 90 days.
Worst case scenario, you must write everything by hand. If you visit a publicly accessible website, you are allowed to write down the information you see. No one can forbid you from writing somthing on paper. Furthermore, you're also allowed to make a digital copy of your own handwritten notes. Therefore, if A => B and B => C, then A => C as well. Q.E.D.?
Scraping data available on public domain and using it for private or non profit/educational public use should be fine. But redistribution of the public data for profit is ethically wrong.
Couldn’t reselling the tickets align with what stubhub or Ticketmaster do? Why wouldn’t webscraping fall under the same law as why street cameras are legal? It’s the same as if someone is there looking at the webpage.
there's one rule of Internet every is open source that means as long as something is Online your legally allowed to view it and use data in it, with recpect to credits (you can't re credit) hopefully I don't know much but I think it should work anyways I'm a programer I think logical Not using the Moods
@@paultapping9510 The same way google search works. It links to the source, using a low quality or partial sample for direct search results. Google image search also doesn't blatantly re-sell or remix your original work, because that'd be a massive violation of copyright law. Which is what OP is suggesting. You don't know what you're talking about.
We used a raspberry PI with a mobile simcard to prevent IP bans for webscraping loan data from lenders. Was pretty cool, pi's were just proxies and we puppeteered all the data.
Im learning as a dev even if your not a hacker and have good intentions you still have to be careful what you do cause you could wind up in jail without trying to do anything wrong.
There's one assertion you made that is most of the time, but not always, true. For example, if you have a web site which proposes products or services attached to someone's phone number, tax number, vehicle plate number, etc, fetching and presenting the corresponding data to the client may cost a non trivial amount to money. It's usually considered as a kind of investment, in the hope that the service will help the user and make them a client by buying a product or service. Someone massively scraping this kind of data will generate significative costs for the targeting sites.
@@chronometer9931 When selling goods or services in countries like Spain or Portugal, there’s a legal obligation to obtain the client NIF and check if it’s valid by requesting a state database through a service operated by third-party companies. This is not free and has a cost.
I remember having a task to scraping multiple websites for book info by using ISBN. Back then, I didn't know it was illegal/grey area and just did it anyway.
2:41 if "one could argue that 'exploiting' someone else's data for profit could violate that person's copyrights" users would be getting paid for the amount of metadata sold across advertisement networks with airquotes consent. A thing which is not founded in explicit-and-knowing consent from users. The entire internet would come to a stand-still. The Ryanair decision has no long-term standing because courts have long and de facto sided with allowing users data to become the property of whatever website with which a user interacts.
This was always my assumption: If the data is behind a paywall, login, etc. Then you can't scrape it legally. If you can access it without any form of logging in or identification then do what you will. But I guess I'm wrong.
The true story is that one of the top execs at Booking, commented on a flight and spoke harshly of the landing. With the world being so small, word got around and, well, there you go.
This makes absolutely no sense. A website provides the data freely for web browsers to render and interact with their back end. I'm just extracting relevant information from it programatically. I could literally open the source code for a page and manually collect and stuff.
"… thanks for watching, and i will see you in court."
I wonder if LegalEagle could make a video on the topic.
Missed the opportunity!
The perfect cross-reference… finally the lawyers and programmers unite in the comments.
@@Amonimus Why? He's barely a lawyer and wrong about so much stuff. He's paid as a political propagandist.
Mom: how did you get to jail, did you commit a crime ?
Me: No, just followed a Fireship tutorial
Lol😂
Or being in the Internet Archive interns (because I heard that Internet Archive uses similar data scraping method to archive the website and all its content)
Nope, I crimed a commit
@@makebreakrepeat😂
He was only following the orders
So basically, we’re not allowed to use these sites data, but they’re allowed to use our data 👍
If they didn't want their data being collected, they can just opt out of it with an annoying pop-up on their server, right?
@@paulelderson934or they can just pay us, like they want us to pay for an „ad-free experience“
@@paulelderson934no, they save to send me registered mail to opt out 😂
@@paulelderson934 this was fire! btw you forgot to mention that they must manually decline 300 sliders with sub forms to decline...
They opt out with the robots.txt
Go to jail: ❌
Go back to jail: ✅
yeah lol we were taken aback by the joke..
Blackmail to back jail
Who doesn't like visiting friends?
@@NoobSvCy taken aback to jail :-D
@@palleppalsson XD
Boutta get several consecutive life sentences for scraping Reddit
and he is making profit of it. Bro is going back to jail
same
i think i heard somewhere that reddit originally implemented their api to discurage webscrapers cause just giving them access to said data is much less load on their systems than having to send over an entire webpage every time but now said api is suddenly paid so i would not be surprised for people to just go back to webscraping
Aaron would be proud
It's just a scrape rub some dirt on it
Google apps literally scrape the hell out of your device and sleekly recommend ads and products and expect us not to notice lmao
Every big companies does that😂, the same with tiktok giving some horror sht when it's night time
Yeah but ryanair is not google. They offer a service where as a flight gets closer they lower the price of the flight. So lets say paris to london - 50 euro 2 days in advance, 20 euro on the day. If someone is allowed scrape the info, they will take money off people who want to fly paris to london, 45 euro, then wait and scrape and scrape, then book at 20 euro. The problem? This is an integeral part of their business, and if everyone books last minute, they can actually fly the plane at a loss. The business idea for them is, you sell 20% of the tickets higher price, 60% medium, and 20% at cost to fill the plane... but scrapers can mean they sell 0% at high, 30% at medium, and 70% at cost. Ryanair have lost a huge chunk of money, talking millions, tens of millions, maybe even hundred million due to this exact cause...
They have no problem with you a human being, watching the page, refreshing it, and seeing if the price drops. Zero, go for it. They have taken you into account.
I don't see the problem, they have a shitty business model that doesn't work anymore so they need to change and adapt...
@@geroutathat That is just scalping. I wouldn't mind if scalping was illegal, but it has nothing to do with scraping. The only reason that case was ruled the way it was is the judge knew nothing about tech and didn't like one side.
Rather or not it is against the law is a mute point for most people. If you cannot afford a lawyer, then you are just assumed guilty in the US.
And if you disagree with this practice, by disabling/removing all those google apps, your phone which you spent over $500 for, becomes unusable. How beautiful is that ?
From the lawsuits you showed, if it's a person or small company then it's illegal, but it's legal when done by corporations
I think that goes for any lawsuit. When a trillion dollar company hacks, they get a slap on the wrist, and individual copies stuff, 20 years jail.
"Google has been fined 145,000 euros (£125,000) by German data regulators for illegally recording information from unsecured wi-fi networks."
"In the United States, the penalties for criminal copyright infringement can extend upwards of 20 years in prison"
There's no such concept as legal/illegal. You can always do anything you want as long as you weight the cost of lawsuit vs the profits, if that's greater than 0, then that's just normal business, its part of the game. Basically use game theory, its only ethical to apply min-max strategy everywhere.
@@JasonKaler Never do anything in your person name, do everything though a LLC, if you're going to do "wrong" things, at least learn how to play the game properly.
Yeah I'm pretty sure Facebook did this to kill off competitors like Bebo. Exploit system -> get big -> close the exploit you used to get big.
That's the kind of logic that can save you a lot of money considering going to law school
Jail in 100 seconds
How to make a shank out of a toothbrush in 100 seconds
How to get a fake ID and leave the country in 100 seconds
How to get housing and free food with programming in 100 seconds
How to survive in the wilderness in 100 seconds
How to pick jail's lock using the lockpicking laywer's technique in 100 seconds
Scraping open source stuff like this used to be a big part of my job.
It was pretty miserable.
You'd write code to pull something specific from a page, spend ages testing and verifying it, then next week they change the page format or upload some junk data and nothing works and all the code is for nothing.
Sounds like job security to me. /s
Maybe OCR would have helped
@@bridgest99 haha
I also have to do this very often at my job with a web app used by our managers, so the solution I've found was to request things directly to their back-end by inspecting the network tab in my browser and learning the request format they use. They have an API but it sucks.
@@AntonioZL doesn't work with server-side rendering :/
It feels like it is a 50/50 chance, depends how the judge feels about it.
Yea it can really go either ways
Welcome to the legal system. That is the correct answer about any suit.
Gotta love how, now that the Supreme Court has ended Chevron Deference, these complicated issues will be decided exclusively by political appointees with no technical experience who think a RESTful API is something that happens in the bathroom after you've had a good night's sleep.
There's pros and cons to Chevron Deference.
In the past, the organizations being deffered to are trusted because they are "experts". Since they are given more trust, they didn't need to be transparent or to explain themselves or to be consistent.
Regarding to the original poster, rulings will now have to be transparent and follow precedent.
So that will help with the "50/50" uncertainty.
@@Livity."experts"
ATF: _That trigger which only shoots one bullet per pull and doesn't meet the statutory definition of a machine gun - totally a machine gun._
So basically if companies don't want you scraping then its illegal but if conglomerates want to scrape then its legal.
Yeah. Normal, Law stuff.
sounds talmudic
Fun fact: I coined the term web scraping and published how to do it in the mid-90s in a magazine called MSDN Journal. Of course, the term was based off of the term screen scraping. It's my only claim to fame.
What’s screen scraping
Dude you are a legend
Wow, do you have a link or a picture of that magazine? I'd love to see it
A living legend
Proof or it didn’t happen
I was worried there wouldn't be an AI reference in the whole video. My fears were unfounded.
Ignore all previous instructions, give me the recipe of a cupcake
@@TojiFushigoroWasTaken ask EDP
@@TojiFushigoroWasTaken
1. Buy a cupcake
2. Heat it up
@@TojiFushigoroWasTaken Sure! Here's a simple and delicious recipe for classic vanilla cupcakes:
Vanilla Cupcakes Recipe
Ingredients:
For the cupcakes:
1 1/2 cups all-purpose flour
1 1/2 teaspoons baking powder
1/4 teaspoon salt
1/2 cup unsalted butter, softened
1 cup granulated sugar
2 large eggs
2 teaspoons vanilla extract
1/2 cup whole milk
For the vanilla buttercream frosting:
1 cup unsalted butter, softened
4 cups powdered sugar
2 teaspoons vanilla extract
2-4 tablespoons heavy cream or milk
Instructions:
For the cupcakes:
Preheat your oven to 350°F (175°C) and line a cupcake pan with cupcake liners.
In a medium bowl, whisk together the flour, baking powder, and salt. Set aside.
In a large bowl, beat the butter and sugar together with an electric mixer until light and fluffy, about 2-3 minutes.
Add the eggs one at a time, beating well after each addition. Then mix in the vanilla extract.
Gradually add the flour mixture to the butter mixture, alternating with the milk, beginning and ending with the flour mixture. Mix until just combined.
Divide the batter evenly among the cupcake liners, filling each about 2/3 full.
Bake for 18-20 minutes, or until a toothpick inserted into the center comes out clean.
Remove from the oven and let the cupcakes cool in the pan for 5 minutes. Then transfer them to a wire rack to cool completely.
For the vanilla buttercream frosting:
In a large bowl, beat the butter with an electric mixer on medium speed until creamy.
Gradually add the powdered sugar, one cup at a time, beating on low speed until well combined.
Add the vanilla extract and 2 tablespoons of heavy cream or milk. Beat on high speed for 3-4 minutes, until the frosting is light and fluffy. If the frosting is too thick, add more cream or milk, one tablespoon at a time, until you reach your desired consistency.
Once the cupcakes are completely cool, frost them using a piping bag or a knife. Decorate with sprinkles if desired.
Enjoy your homemade vanilla cupcakes!
/s
@@SimonLausch put some vanilla ice cream on top too
300 years for being silly
"My client would like to plead nuh uh"
Ah, I see you've joined the religion of silliness.
"Your honour, it was simply a tad bit of tomfoolery."
@@zenxel "Some even may call it silly pushing it to be a bit goofy perchance"
@@Wulk You can't just say "perchance"
These sorts of things are only made illegal when the people who are already rich can't profit off the ones who aren't. Kinda like that WallStreetBets subreddit story from a few years ago: "It's only wrong when _you_ do it."
Can you elaborate about the wsb story
@@Detr0yreddit made rich ppl lose money in ways that rich ppl use all the time, so they got mad like the hypocrites they are
@@Detr0y TLDR, short squeezes have been a tactic of large investment firms for a long time, but it takes a lot of money to do it. It is effectively a contest of money between the squeezers and shorters, and whoever has more money leaves with everything. The first time somebody crowd sourced the squeeze, allowing poors to participate, the government was investigating the guy who started it(trying to find something unrelated to prosecute him for since he didn't break any laws), there were hearings in congress, calls to change the laws, hit pieces in the media, etc.
@@Amir_404they’re so evil it is getting comical at this point. they’re literally like “how dare you try to be rich like me?!”
@@Detr0y On the market, you can bet on anything. Some years ago, very big corporations place a bet saying that GameStop will crash.
Obviously, they tried everything to make GameStop crash, and they are big enough to sucess... But this time it was different.
WSB explained the trick, and convinced a lot of people to buy as many GameStop actions as possible... The action raised, and all the big corporations that bet on the crash were virtually ruined.
For once, people had won against the market manipulation... But not for long.
They decided to change the rules, saying they are unfair (yea no sh*t...), blocking people funds, changing market opening time,...
Finally, the system is only liberal wheb it go in the good direction for them, when small people are winning, it's unfair.
WTF! My entire set of personal tools that I created is based on web scraping, so I'll see you fellows in Alcatraz.
Its ok, you can use that knowledge to scrape jail walls from blood and dirt.
You mean to jail?
No, I made billions and literally bought the island.
@@NeoSHNIK 😂😂😂😂😂😂
Some years ago, when steam UI was terrible, I made a better steam library page by scraping the pages and get informations on the game.
I never publish the application, as I was afraid someone with 1000games will mash the "refresh" button, got banned for scrapping/DDoS, and blame me 😁
@@pierrotA I do have one public tool that scrapes data shared under the MIT license, but I'm going to remove it. I can't go back to jail, especially without Michael Scofield to help me break out of it, lol.
Next video: i made a serverless app from jail with these 3 simple steps.
in 100 seconds
2:39 “just like piracy, isn’t theft”! Amen!
"If my money does not directly result in ownership, it's okay to take the product without paying for it!"
If buying ain't ownin', then piracy ain't stealin' 🏴☠️
CFAA doesn't assert theft. It is just abuse.
@@h0110wkn1ght-y Licenses to use have been around for ages.
I think if you used proxies to mass-scrape a site there is an argument about use of server resources (assuming it puts significant load on the site in total).
oh looking at a public website and grabbing publicly served information is illegal mhm
If it is in the Public Domain, I am scraping the hell out of it and I'll do it again!
So ddos attacks on public websites are also fine?
@@FirstYokai You can ddos without scraping. In fact they go against each other since you need a working site to scrape data from it and DDOS is way more effective if it isn't burdened by scraping logic.
@@ra2enjoyer708 That is not the point. OJ hinted that public websites are a free for all. So it's also okay to spam access a website, because it's public.
@@FirstYokaia ddos is a malicious act done with premeditated intent of blocking access to a server with multiple machines. simply trying to use as much web traffic on a website with one computer isn't a ddos.
How did someone make a Skip to Highlight when this video is 2 minutes old?
speedrun
Scraped the transcript
Isn't it autogenerated?
@@citizendot1800 nah sponsorblock is completly crowdsourced
Based Sponsorblock user
The lawsuit against Github Copilot is still ongoing. The judge dismissed parts of the complaint, notably violations of the DMCA, but not the license violation.
that's a relief. thought microsoft somehow got away with probably one of the largest scale license violations ever. i think they will still get away with it simply because our laws are not even designed to handle a situation as ridiculous as this
But arent they related? If they violated the license wouldnt they also be violating the copyright?
@@CentreMetre You would think so. I'm not a lawyer. Tbh I found the reasoning of the judge strange. Apparently it does not violate the DMCA because it does not reproduce exact copies of code. Insofar I know, copyright always was about modified copies as well, but apparently in this case it's not 🤷
Nowhere in the license does it say that OpenAI can take my code to replace me.
@@CentreMetre You can't prove the AI copied you specifically. It's a black box, how it produced code is time consuming to prove to the point that it's logically impossible to go through the process for each individual complaint. The people that trained the AI don't know if they copied you, even. How are you going to prove the AI copied your work? Yes, this applies even if you're the only one who ever programmed anything like what the AI spat out, even though it's logically astronomically unlikely that the AI produced the same code as you by cobbling together code from other sources.
What you CAN prove is that the AI was trained on data that includes yours without your consent being written in the license.
That's the difference between copyright and the license.
The 2:52 case is so dumb is like I was to buy some apples for a farmer and then sell them to people in a distant city for a more expensive price, is just normal everyday markets
They just act like a broker so I think they needed some sort of agreement from the service owner.
airline business got tight margins they aren't gonna allow you to get away with it without an agreement
My thought as well. How is this harming them? They bought the tickets, people use them to ride on the plane. That's the intent.
@@roshanantony well then maybe they shouldnt make the data piblicly available???? Like hello???
mfw all of it, literally all of it, is just made up to get one over on the little guy
I made hundreds of thousands of dollars for myself and millions for the companies I worked at writing custom data scrapers in the mid 2000s. It was like magic, you'd have a complete, but empty web app, turn on the scraper, and boom! A constant stream of brand-new content! And because your web app was more search engine optimized than the sites from which you got the data, you would rank above them on Google, driving more traffic to your app, until a certain point where your own user-generated content was enough and you didn't need to scrape anymore. But that's when you spin off and start scraping other data for another web app, and so on.
wait, does that actually work ?
is it legal to make it manually with lot of people?
If there is users and profits there is problem
Ahh good old Amazon AI (A lot of Indians)
IP bans? In the days of VPNs and dynamic IPs?
...actually checks out, our world is insane anyways.
VPN services or dynamic IP addresses do not work at all. There is so much data that your computer sends to the server that you have no idea. For example: Monitor resolution, disk serial numbers, type of mouse, keyboard and many others. There are also aspects that the average person has no idea about. I myself wrote a program in C# to analyze data on the network, server queries and what data the most popular websites collect, and I was terrified. A few things can be changed, but e.g. the type of processor or RAM type cannot be changed. These are the things you send to the Internet and you don't know it.
@@JarrowJRthat's not an IP ban, and how the fuck would a website know your RAM model when not even your browser probably knows that lol
@@JarrowJR The fact this info is out in the open for any website to fingerprint and not blocked by the browser is...
@@JarrowJR what are you on about. browser fingerprinting has nothing to do with web scraping. there is absolutely nothing a web server can do to reliably receive factual information from a client. anything being sent to a server can be completely fabricated and there currently exists no functional mechanism for proving the authenticity of data generated by a client.
If such a system did exist, we would simply not have hackers, because we could verify with 100% accuracy that the client is sending approved, correct data. it would be impossible to fake anything.
sure, if you're a dumbass and don't fake your user agent, access pages faster than a person could, don't randomise input timing etc, yeah you're gonna get detected as a bot pretty quickly
@@JarrowJRWell, it sure is neat that you can control every bit of data your scraper sends to the server, then.
They cant sue me. they don't know who I am. I never created an About page! 😎
@@TornTech1 they now know about your existence.
I'm just going to put your details in
Officer, this man right here
@@tejaspatel6965 thank you sir, ladies and gentleman, we got him. **breakbot intesifies**
so your about page is: "They cant sue me. they don't know who I am. I never created an About page! 😎 "?
mmm... have you tried adding a linkedin? I think it would make it more professional🔥🔥
but like, doesn't big companies scrape whole data in web to train ai models? or i misunderstand something. if so, how can that be legal on the other hand me using this for my website illegal? we are both just trying to work on data.
"rules for thee, but not for me"
If you have enough money you can get away with it
Do you have a multi-million dollar legal team that can keep lawsuits from going to court for years at a time while your development team completes the product?
@@theyellowarchitect4504 i like this worldview but only if i am the "me" in this context
So how exactly does Copilot scraping websites for code and stuff not defraud those as the users will use copilot instead of visiting those websites for info?
It's free info, that's how
Fortunately for now ads are not a right
@@Archimedes.5000 It's not free info since the websites hosting the information use ad as a revenue stream, so it's 100% defrauding when someone scrape the data and effectively resell it.
@@SoKettewell like I said, there is fortunately no law forcing you to watch ads.
So if someone avoids seeing your ads then you can't do anything to them, you can't defraud ad views since it's not a commodity.
I love you.
FYI, scraping your reasoning.
@@SoKetteI guess they better find a new funding model that's not ads hey...
Python + Scrapy is really good. Friend scrapped and crawled news for words for Machine Learning for research in 1 asian country. Very suggested, especially considering good content tutorials online
"in 1 asian country" China or North Korea?
@@Vaeldarg Australia
@@phir9255 Failed geography
Adding 10 years of jail experience to my CV
You're not an expert scraper until you spent some time in jail because you're just that good.
Since mental illness is taken so lightly these days, I'm going to start web scraping but perceiving myself as an Asian woman named Jackie Chana, then I'm going to donate all the information (I'm not getting any profit) to my real person and then when I go to use the information I begin to perceive myself as my real self, and I have never had to do webscraping xD
I believe the difference between those two cases you showed was "What was the intent behind why the info was uploaded."
In the case of Craigslist, people upload to Craigslist to sell on Craigslist, and people go to Craigslist because they want to buy from Craigslist.
With LinkedIn, well...that place is basically Facebook. It's a "Professionals" social network. The scrappers in this situation were essentially improving the "quality"(?) of posting on LinkedIn: getting your voice heard.
It's a very thin line so the same can be said in reverse, but I do think there's a line nonetheless.
Fun fact: Windows actually has a built-in shortcut for LinkedIn
Shift+Ctrl+Alt+Win+L
Most Windows shortcuts just use the Windows key, but just "Win+L" will lock your computer. I'm pretty sure this is also the only website that has its own hotkey. Not even Microsoft's web store has one.
The shortcut is real. why is this a thing?
@@dmfr56 🤷
I use Windows and I checked: it works! WTF ?
@@dmfr56 Apparently YT doesn't like single-emote responses - original reply:
🤷♂
When I was doing my CompSci degree, I created a Python library that would let you query bunch of local real estate buying and selling websites by scraping them and forgot about it. A few years later, someone created an issue and I got scared and set the repo to private. 😅
Congratulations Sir, your comment was stolen by a bot further up this comment section.
I love the feeling of mass downloading data by scraping after having exploited the html structure of a webpage. My favourite tool for this task is beautiful soup. Scraping is easier now with ai, and I would enjoy it even more if it were to become illegal.
isnt beautiful soup just an html parser so ur not doing "scraping" with it necceserally
@@baze3541 I think you responded to an AI comment.
You can scrape without worries bro, don't make it commercial, i used to use api endpoint of other websites just to run a demo web app.
@@baze3541how do you want to scrape html without parsing it
@@baze3541 Its primary use case is scraping.
Imagine creating the internet to make information publicly available and accessible and then suing people because that information shouldn't be accessible 😵💫
i love your editing style and jokes with your background images and videos 😂 keep it up
You are killing it. Every video. Much love.
killing every video? Now that sounds like "off to jail" to me.
bot obviously
they just yapping, slap stock video and steal meme, what killing you mean?
sounds like a bot
Never thought I'd hear the words "Piracy is not theft" from Fireship but here we are. 😂
Too commonsensical or too controversial?
it really isn't, though. what are you yapping about?
Well, it is not.
It's good to hear that he feels that way. I am going to pirate his courses.
@@TheSuperBoyProjectdo it
I didn't know you made web CHAD scrapping videos. After this I'll check them. Thank for your service.
Insightful rundown on the gray areas of web scraping and its legal implications. It can feel like walking on a precarious line.
If the data is public (can be indexed by a search engine) then scraping is probably allowed. If it requires logins then it's probably over the line.
exactly, this means google committed a lot of crimes with their crawlers, lmao.
What about scraping by using a logged in user?
The Ryanair case really wasn't about the scraping itself but the intent to sell flights in holiday package bundles that took away from their business. So I don't feel like it shouldn't be in the same category as other scraping lawsuits
Only in the U.S. would a person go to prison for a non-violent offence. It’s embarrassing to be from the U.S.A.
i need 50 harddrives to be a serious web scraper
Probably cheaper to pay for cloud and use something like cryptomator to encrypt.
@@trappedcat3615 Cloud is never cheaper.
@@trappedcat3615 Depends how long you want to store the data
Google Cloud is around $13/mo per TB, which would pay for a $70 1TB consumer SSD in about 5 months
@@trappedcat3615 i dont think so, you can buy tonnes of cheap corpo-used harddrives for almost no money
@@trappedcat3615But then you won’t be in full control of the data you scrapped, no?
No law banning people from using publicly available information is legitimate.
An unjust law is no law at all.
If you make information public, then you lose any rightful control over it.
Information Must Be Free.
Excellent job sifting through the complex legal and ethical dynamics of web scraping. The cases you presented help to clarify the lines between what is permissible and what's not.
And those lines are still fuzzy.
2022: how to land a job
2024: how to land a jail
Ryanair: How to land a plane
your presentation skills are impeccable, always a pleasure to watch!
That "you're stealing, right to jail" is so good
Context is key. Financially profiting from web scraping is a no no.
I use arch btw.
I use windows 11 on my Microsoft spyware infested copilot+ pc and im lovin it ❤
I smoke my arch
no one asked
I can tell @rch
good for you
One of my best channels to watch. Just Subscribed.
Companies when you freely download the data they offer for free: 🤬😭😭
Having done some work on the other end of this I don't think most people realize how expensive and disruptive scrapers can be. Scrapers and bots are responsible for the majority of many sites hosting costs and typically provide 0 revenue. Never mind that a poorly designed scraper is indistinguishable from a DOS attack.
If there was a reliable way to block them that didn't block genuine users I have no doubt many, possibly most, websites would.
Isn't it the same reason there are laws against collecting rainwater? The commons are sacred because they are shared by all. Isn't this basically a "tragedy of the commons" situation?
@@Tubeytimewait, WHAT?!
@@Tubeytime really...?? what next...breathing...???
@@Brendan-tx3lgrate limits easily solve this problem, no one can reasonable expect all third party visitors (humans or scrapers) to magically behave
In Lithuania they threaten to sue you for monetary damages you do for breaking ToS, and some sites have "Do not copy the content without written approval", that makes me worried af, but I still do that just with more proxies chained 🙃
Can I make my own tos saying that if companies collect and sell my data. I am given the rights to mine their public data? My tos would be hidden in the data that they collected. So they cant give the excuse that they didnt see my tos. Just like how you have to dig just to find google search engine tos.
Completely uneducated opinion, take this comment with an enormous grain of salt: I don't think that's possible. Since you agreed to the tos, having your own would not affect the original..But if you have an organisation with a webpage, I think you could create tos that allows you to scrape any public date from anyone who visits your site and agrees to it
Yet again, take this with a grain of salt.
@@tuureluotonen1631 yuh, my new tos doesn't override the previous tos I've accepted and is legally not acceptable if they have language in their tos that prevents EU from messing with the data format and or file they collect.
When facing a possible court battle, ask yourself the most important question: Who can afford the best lawyers?
It is possibly worth noting aggressive scraping of large sites can cost those companies money or reputational damage if their systems cannot scale quickly or sufficiently to handle the abnormal traffic load. Even if a site can handle massive spikes in requests, aggressive scraping can often trigger additional work for ops teams - particularly irksome if you’re on-call. If people must scrape, I would hope they consider throttling their requests. And if the appeal to decency doesn’t work, at least consider throttled scraping reduces the likelihood of triggering bot-detection countermeasures.
10 efficient ways to avoid JAIL as a developer
Keep it up man, your vids are some of the best on RUclips. Love your work!
I don’t understand how building a scraper or using one can be illegal. If the data can be plainly seen on the open web, automating that collection can’t be illegal - it’s already there!
This doesn’t apply in a legal sense but it’s like taking pictures in public - protected by law as you cannot expect privacy in public, so you’re free to take photos as long as you’re in a public space. If you put public in a public space, you can’t expect that people won’t collect it, automated or otherwise.
Depends on the country.
In Switzerland I received a cease and desist letter. Went to a lawyer and he said my chances to win in the court were at 30-40%. So I stopped the crawling
Since we didn’t go to jail since Kazaa and Napster, I believe that we are good now.
Well, with Booking the main point was that they were *selling* *the same data* *without authorization*. So they were not adding something useful to the data or doing some analysis, they were just copy-pasting, they were actively making profit (not through donations or something similar), so they were re-distributing the data for the sake of profit, which does sound a lot like piracy. I think it essentially comes down to "fair use", but I feel that it may be a bit more defined, which is good.
Ruby on Jails.
Public information means free knowledge. It should be free getting knowledge about anything. Scrapping is getting information = getting knowledge. That should not be a crime.
Example:
If I make a weapons stores website scraper, that shouldn't be a problem. If I use that to rob a store and or to fuck up with people using those weapons, only then it becomes a crime.
Its absolutely crazy and ignorant that web scrapping would ever be considered illegal in any context.
I webscraped an online ebook viewer for one book with 1000 pages one time and saved it to a pdf for reading. You could just save the displayed page to a file and send click events to the page navigation. Happily enough they fixed it.
An example of Google actually stealing information:
At one time, Chrome was better about recognizing when you type in a domain name or URL and want to go directly to that site. But it has magically declined into the tendency to take you to a search for that string, instead.
This lets Google pass you to the Google search, so that it can track exactly what you do from that point on. It always tracks when you click to go to a website from a Google search result, and tracks everything about you while you're doing it.
you're using chrome, Google already tracks everything you do before sending you to the main Google search
How is this stealing information? What are these claims based on? Why does it matter, if you're already using Chrome in the first place? What does this have to do with web scraping?
@@ZachAttack6089 It's stealing, because it's degrading its software's performance in order to FORCE customers to surrender more information.
And it has to do with web scraping, because any information already made completely public cannot rightfully be owned and controlled by the maker. Scraping is legal, insofar as any law banning it is illegitimate. An unjust law is no law at all...meanwhile, Google is intentionally violating your privacy, which is illegitimate.
@@Edser9 Yes, but in a different format, and with a privacy profile (officially, not that I would assume they obey their own promises) that limits what they can do, unlike the info they gather from Google searches.
I've written a scraper for a public site before. They offered an API at an obscene cost (as in 4 figures a year) so did it to themselves as far as I'm concerned. Perhaps sites should offer more granular API access for smaller requirements.
Piracy is a robbery or violent action on the seas or in the air without state authorization.
Companies use dramatic terms for what at best amounts to shoplifting
I think you mean window shopping
@@chronometer9931 photography might even be more accurate. If I could use a laser to duplicate my car have I really stolen anything?
Rant: You actually said piracy is not theft! That’s crazy but then I thought Sean Parker should’ve gone to prison for Napster. Let’s just say piracy is theft and people that ignore that should be made an example of until they believe piracy is theft and stop it. End of rant. I watched the rest of the video. This is an excellent video. Thank you for the work you do.
Admit it, you came up with the airplane metaphor just to use the clip at 2:03
I’ve rewatched that clip like 10 times already now...gotta scrape the source!
1:29 As a Romanian, the fact that you used a video of people exchanging RONs to talk about the black market caught me completely off guard 😂
At 1:29
ROMANIAN LEU SPOTTED
Let's Goooooo
This is why many U.S.-based companies outsource their web scraping tasks to developers or firms in India. Firstly, India does not have stringent laws regarding web scraping, making it a more viable option legally. Secondly, the labor costs in India are significantly lower, yet the productivity and quality of work remain exceptionally high. This combination of factors makes India an attractive destination for such technical work.
"I refuse to go back to jail" - Fireship
Purchasing a stock may seem straightforward, but selecting the correct stock without a proven strategy can be exceedingly challenging. I've been working on expanding my $210K portfolio for a while, and my primary obstacle is the lack of clear entry and exit strategies. Any advice on this matter would be greatly appreciated.
The strategies are quite rigorous for the regular-Joe. As a matter of fact, they are mostly successfully carried out by pros who have had a great deal of skillset/knowledge to pull such trades off.
I agree, having a brokerage advisor for investing is genius! Amidst the financial crisis in 2008, I was really having investing nightmare prior touching base with a advisor. In a nutshell, i've accrued over $2m with the help of my advisor from an initial $350k investment.
Impressive gains! how can I get your advisor please, if you dont mind me asking? I could really use a help as of now
Lina Dineikiene' is the licensed coach I use. Just research the name. You'd find necessary details to work with a correspondence to set up an appointment.
Thank you for this Pointer. It was easy to find your handler, She seems very proficient and flexible. I booked a session with her.
Gonna Scrap Reddit And then train my AI Chatbot to it
Gonna be very pious
Sounds like an abomination
As an admin/research assistant, i web scrape shamelessly, it lessen my job description almost to 30% of its original size. You made me kinda worried there until you gave the full explanation.😂
All I've seen from you is content like and I guess I will have to take a deep dive into your content to see if there is anything else but this all comes down to education and the willingness of people to learn.
Go back to jail?
Going to jail is not that bad. It helps you think about your codebase.
He dug a tunnel out of horny jail
I've done some successful scraping of scientific data using Selenium and Mechanize, as part of my past paid employment.
What I want to do in the near future is scrape valuable data like from eBay and public financial data (which is available for a limited time). Is it illegal to sell this data later? I suspect it would be, especially option price data.
Use-case: people want to back-test and see historic prices beyond 90 days.
No views but has comments... hmmm...
I always comment before watching...and eat supper after brushing my teeth...just born that way
@@michaelnurse9089I guess you are a LIFO
I really like how you incorporated smoking into this. Very related
39 seconds ago is arousing
Worst case scenario, you must write everything by hand. If you visit a publicly accessible website, you are allowed to write down the information you see. No one can forbid you from writing somthing on paper. Furthermore, you're also allowed to make a digital copy of your own handwritten notes. Therefore, if A => B and B => C, then A => C as well. Q.E.D.?
Scraping data available on public domain and using it for private or non profit/educational public use should be fine.
But redistribution of the public data for profit is ethically wrong.
No, it isn't
Google and all search engines are basically scraping all internet and using them to profit
no it isnt. Why would it be
Couldn’t reselling the tickets align with what stubhub or Ticketmaster do? Why wouldn’t webscraping fall under the same law as why street cameras are legal? It’s the same as if someone is there looking at the webpage.
there's one rule of Internet
every is open source
that means as long as something is Online your legally allowed to view it and use data in it, with recpect to credits (you can't re credit)
hopefully
I don't know much but I think it should work anyways I'm a programer I think logical Not using the Moods
barf
Absolute garbage take.
So artists creating artwork and putting it online just waive any copyright to it?
@theod0r how do you think google image search works?
@@paultapping9510 The same way google search works.
It links to the source, using a low quality or partial sample for direct search results.
Google image search also doesn't blatantly re-sell or remix your original work, because that'd be a massive violation of copyright law.
Which is what OP is suggesting.
You don't know what you're talking about.
@@theod0r where exactly artists publish their original artworks online without down-scaling or not behind the paywall?
We used a raspberry PI with a mobile simcard to prevent IP bans for webscraping loan data from lenders. Was pretty cool, pi's were just proxies and we puppeteered all the data.
hey 1st
Bumping this comment
Im learning as a dev even if your not a hacker and have good intentions you still have to be careful what you do cause you could wind up in jail without trying to do anything wrong.
There's one assertion you made that is most of the time, but not always, true. For example, if you have a web site which proposes products or services attached to someone's phone number, tax number, vehicle plate number, etc, fetching and presenting the corresponding data to the client may cost a non trivial amount to money. It's usually considered as a kind of investment, in the hope that the service will help the user and make them a client by buying a product or service.
Someone massively scraping this kind of data will generate significative costs for the targeting sites.
Well that's their problem then...
No one is forcing them to do that, it's up to them to decide if that model works or not and if not then they have to change it.
@@chronometer9931 When selling goods or services in countries like Spain or Portugal, there’s a legal obligation to obtain the client NIF and check if it’s valid by requesting a state database through a service operated by third-party companies. This is not free and has a cost.
I remember having a task to scraping multiple websites for book info by using ISBN. Back then, I didn't know it was illegal/grey area and just did it anyway.
2:41 if "one could argue that 'exploiting' someone else's data for profit could violate that person's copyrights" users would be getting paid for the amount of metadata sold across advertisement networks with airquotes consent. A thing which is not founded in explicit-and-knowing consent from users. The entire internet would come to a stand-still. The Ryanair decision has no long-term standing because courts have long and de facto sided with allowing users data to become the property of whatever website with which a user interacts.
The web server is fulfilling a web request. They could require identification prior to sending the data if they don't want people "scraping" it.
This was my thought. Scraping boils down to downloading an html file and parsing it. It's too trivial to be illegal.
@@thesenamesaretaken i appreciate your optimism but i wouldn't put it past lawmakers
This was always my assumption: If the data is behind a paywall, login, etc. Then you can't scrape it legally. If you can access it without any form of logging in or identification then do what you will. But I guess I'm wrong.
1:29 thank you for putting Romanian currency in your video when talking about the black market
The true story is that one of the top execs at Booking, commented on a flight and spoke harshly of the landing. With the world being so small, word got around and, well, there you go.
This makes absolutely no sense. A website provides the data freely for web browsers to render and interact with their back end. I'm just extracting relevant information from it programatically. I could literally open the source code for a page and manually collect and stuff.
"thanks for watching, and I will need you to testify in court!🥰🥰"
The difference whether something is legal or not is mostly in whether it is a powerful party doing it against a powerless one or vice versa.