Am I going to jail for web scraping?

  • Published: Dec 24, 2024

Comments • 1.4K

  • @gifti258
    @gifti258 5 months ago +1828

    "… thanks for watching, and i will see you in court."

    • @Amonimus
      @Amonimus 5 months ago +53

      I wonder if LegalEagle could make a video on the topic.

    • @FullMe7alJacke7
      @FullMe7alJacke7 5 months ago +16

      Missed the opportunity!

    • @greennin
      @greennin 5 months ago +12

      The perfect cross-reference… finally the lawyers and programmers unite in the comments.

    • @GameDevChad
      @GameDevChad 2 days ago

      @@Amonimus Why? He's barely a lawyer and wrong about so much stuff. He's paid as a political propagandist.

  • @Adam9174X
    @Adam9174X 5 months ago +2072

    Mom: how did you get to jail, did you commit a crime ?
    Me: No, just followed a Fireship tutorial

    • @sanjayprashadh3969
      @sanjayprashadh3969 5 months ago +7

      Lol😂

    • @sihamhamda47
      @sihamhamda47 5 months ago +12

      Or being one of the Internet Archive interns (I heard that the Internet Archive uses a similar scraping method to archive websites and all their content)

    • @makebreakrepeat
      @makebreakrepeat 5 months ago +20

      Nope, I crimed a commit

    • @mahann.s
      @mahann.s 5 months ago +5

      ​@@makebreakrepeat😂

    • @angryanonymous4082
      @angryanonymous4082 5 months ago +3

      He was only following the orders

  • @RubixCubed3
    @RubixCubed3 5 months ago +2430

    So basically, we’re not allowed to use these sites’ data, but they’re allowed to use our data 👍

    • @paulelderson934
      @paulelderson934 5 months ago +359

      If they didn't want their data being collected, they can just opt out of it with an annoying pop-up on their server, right?

    • @anonwithamnesia
      @anonwithamnesia 5 months ago

      @@paulelderson934 or they can just pay us, like they want us to pay for an „ad-free experience“

    • @joshuac5229
      @joshuac5229 5 months ago

      @@paulelderson934 no, they have to send me registered mail to opt out 😂

    • @alessandroblue7
      @alessandroblue7 5 months ago

      @@paulelderson934 this was fire! btw you forgot to mention that they must manually decline 300 sliders with sub forms to decline...

    • @Manhunternew
      @Manhunternew 5 months ago

      They opt out with the robots.txt
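
      The robots.txt opt-out mentioned above can be checked from code before any request is made. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `my-scraper` user agent, and the example.com URLs are all made up for illustration and do not come from the video or the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real one would be downloaded from
# the site's /robots.txt path instead of being hard-coded.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "my-scraper" is an assumed user-agent name for this example.
allowed_public = parser.can_fetch("my-scraper", "https://example.com/listings")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")
delay = parser.crawl_delay("my-scraper")

print(allowed_public, allowed_private, delay)
```

      Note that robots.txt is a convention that well-behaved crawlers follow voluntarily; whether ignoring it has legal consequences is exactly the kind of question the thread is arguing about.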

  • @ageofdragonz3684
    @ageofdragonz3684 5 months ago +3751

    Go to jail: ❌
    Go back to jail: ✅

    • @NoobSvCy
      @NoobSvCy 5 months ago +34

      yeah lol we were taken aback by the joke..

    • @smackmybombom6224
      @smackmybombom6224 5 months ago

      Blackmail to back jail

    • @imakethesites3048
      @imakethesites3048 5 months ago +46

      Who doesn't like visiting friends?

    • @palleppalsson
      @palleppalsson 5 months ago +11

      @@NoobSvCy taken aback to jail :-D

    • @NoobSvCy
      @NoobSvCy 5 months ago

      @@palleppalsson XD

  • @ZeuzMakesMusic
    @ZeuzMakesMusic 5 months ago +3643

    Boutta get several consecutive life sentences for scraping Reddit

    • @assarlannerborn9342
      @assarlannerborn9342 5 months ago +63

      and he is making a profit off it. Bro is going back to jail

    • @Wyvernnnn
      @Wyvernnnn 5 months ago +4

      same

    • @aspacegamer92
      @aspacegamer92 5 months ago +126

      I think I heard somewhere that Reddit originally implemented their API to discourage web scrapers, because just giving them access to the data is much less load on their systems than having to send over an entire webpage every time. But now that API is suddenly paid, so I would not be surprised if people just go back to web scraping.

    • @yagomizuma2275
      @yagomizuma2275 5 months ago +16

      Aaron would be proud

    • @TheSkyeguy
      @TheSkyeguy 5 months ago +12

      It's just a scrape rub some dirt on it

  • @Awindow
    @Awindow 5 months ago +446

    Google apps literally scrape the hell out of your device and sleekly recommend ads and products and expect us not to notice lmao

    • @kodiak809
      @kodiak809 5 months ago +8

      Every big company does that 😂, same with TikTok giving you some horror sht when it's night time

    • @geroutathat
      @geroutathat 5 months ago +16

      Yeah, but Ryanair is not Google. They offer a service where, as a flight gets closer, they lower the price of the flight. So let's say Paris to London is 50 euro 2 days in advance and 20 euro on the day. If someone is allowed to scrape the info, they will take money from people who want to fly Paris to London, say 45 euro, then wait and scrape and scrape, then book at 20 euro. The problem? This is an integral part of their business, and if everyone books last minute, they can actually fly the plane at a loss. The business idea for them is: you sell 20% of the tickets at a higher price, 60% at medium, and 20% at cost to fill the plane... but scrapers can mean they sell 0% at high, 30% at medium, and 70% at cost. Ryanair has lost a huge chunk of money, talking millions, tens of millions, maybe even a hundred million, due to this exact cause...
      They have no problem with you, a human being, watching the page, refreshing it, and seeing if the price drops. Zero, go for it. They have taken you into account.

    • @chronometer9931
      @chronometer9931 5 months ago +7

      I don't see the problem, they have a shitty business model that doesn't work anymore so they need to change and adapt...

    • @Amir_404
      @Amir_404 5 months ago +9

      @@geroutathat That is just scalping. I wouldn't mind if scalping was illegal, but it has nothing to do with scraping. The only reason that case was ruled the way it was is the judge knew nothing about tech and didn't like one side.
      Whether or not it is against the law is a moot point for most people. If you cannot afford a lawyer, then you are just assumed guilty in the US.

    • @johnsmith-ro2tw
      @johnsmith-ro2tw 5 months ago +5

      And if you disagree with this practice, by disabling/removing all those google apps, your phone which you spent over $500 for, becomes unusable. How beautiful is that ?

  • @TheEmolano
    @TheEmolano 5 months ago +189

    From the lawsuits you showed, if it's a person or small company then it's illegal, but it's legal when done by corporations

    • @JasonKaler
      @JasonKaler 4 months ago +12

      I think that goes for any lawsuit. When a trillion-dollar company hacks, they get a slap on the wrist; when an individual copies stuff, 20 years in jail.
      "Google has been fined 145,000 euros (£125,000) by German data regulators for illegally recording information from unsecured wi-fi networks."
      "In the United States, the penalties for criminal copyright infringement can extend upwards of 20 years in prison"

    • @monad_tcp
      @monad_tcp 4 months ago +3

      There's no such concept as legal/illegal. You can always do anything you want as long as you weigh the cost of a lawsuit against the profits; if the difference is greater than 0, then that's just normal business, it's part of the game. Basically use game theory; it's only ethical to apply a min-max strategy everywhere.

    • @monad_tcp
      @monad_tcp 4 months ago +3

      @@JasonKaler Never do anything in your personal name, do everything through an LLC. If you're going to do "wrong" things, at least learn how to play the game properly.

    • @DrEnzyme
      @DrEnzyme 4 months ago

      Yeah I'm pretty sure Facebook did this to kill off competitors like Bebo. Exploit system -> get big -> close the exploit you used to get big.

    • @JackeryThompson-lq8zk
      @JackeryThompson-lq8zk 4 months ago +1

      That's the kind of logic that can save you a lot of money considering going to law school

  • @kalkidanyishak3455
    @kalkidanyishak3455 5 months ago +2293

    Jail in 100 seconds

    • @1337Munkey
      @1337Munkey 5 months ago +44

      How to make a shank out of a toothbrush in 100 seconds

    • @Scrmbled.
      @Scrmbled. 5 months ago +33

      How to get a fake ID and leave the country in 100 seconds

    • @NoOneInParticular0
      @NoOneInParticular0 5 months ago +16

      How to get housing and free food with programming in 100 seconds

    • @QuintinMassey
      @QuintinMassey 5 months ago +10

      How to survive in the wilderness in 100 seconds

    • @gokudomatic
      @gokudomatic 5 months ago +12

      How to pick the jail's lock using the lockpicking lawyer's technique in 100 seconds

  • @MrAsymmetry_
    @MrAsymmetry_ 5 months ago +240

    Scraping open source stuff like this used to be a big part of my job.
    It was pretty miserable.
    You'd write code to pull something specific from a page, spend ages testing and verifying it, then next week they change the page format or upload some junk data and nothing works and all the code is for nothing.

    • @bridgest99
      @bridgest99 5 months ago +90

      Sounds like job security to me. /s

    • @phir9255
      @phir9255 5 months ago +7

      Maybe OCR would have helped

    • @witchmorrow
      @witchmorrow 5 months ago +1

      @@bridgest99 haha

    • @AntonioZL
      @AntonioZL 5 months ago +20

      I also have to do this very often at my job with a web app used by our managers, so the solution I've found was to request things directly to their back-end by inspecting the network tab in my browser and learning the request format they use. They have an API but it sucks.

    • @nalstudio_official
      @nalstudio_official 5 months ago +9

      ​@@AntonioZL doesn't work with server-side rendering :/
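
      The approach described in this sub-thread (copying the request format seen in the browser's network tab and calling the backend directly) might look roughly like the sketch below. The endpoint URL, the `page`/`per_page` parameters, and the `items` key are hypothetical stand-ins; a real site's request format would have to be read from the network tab. As the reply above points out, this only helps when the page loads its data from a JSON endpoint rather than rendering it on the server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, as one might discover it in the network tab.
BASE_URL = "https://example.com/internal/api/reports"


def build_request(page: int, per_page: int = 50) -> Request:
    """Build the same GET request the web app's front end would send."""
    query = urlencode({"page": page, "per_page": per_page})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def extract_rows(body: bytes) -> list[dict]:
    """Pull the list of records out of a JSON response body.

    The "items" key is an assumption about the response shape.
    """
    payload = json.loads(body)
    return payload.get("items", [])


request = build_request(page=2)

# A canned response body stands in for urllib.request.urlopen(request).read()
# so the example runs without touching the network.
sample_body = b'{"items": [{"id": 1, "name": "Q1"}, {"id": 2, "name": "Q2"}]}'
rows = extract_rows(sample_body)

print(request.full_url)
print(rows)
```

      Parsing structured JSON tends to survive page redesigns better than scraping HTML, though an undocumented endpoint can still change without notice.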

  • @21preend42
    @21preend42 5 months ago +656

    It feels like it is a 50/50 chance, depends how the judge feels about it.

    • @adityaanuragi6916
      @adityaanuragi6916 5 months ago +27

      Yeah, it can really go either way

    • @sorenjones4041
      @sorenjones4041 5 months ago +117

      Welcome to the legal system. That is the correct answer about any suit.

    • @GSBarlev
      @GSBarlev 5 months ago +75

      Gotta love how, now that the Supreme Court has ended Chevron Deference, these complicated issues will be decided exclusively by political appointees with no technical experience who think a RESTful API is something that happens in the bathroom after you've had a good night's sleep.

    • @Livity.
      @Livity. 5 months ago +5

      There are pros and cons to Chevron Deference.
      In the past, the organizations being deferred to were trusted because they are "experts". Since they were given more trust, they didn't need to be transparent, explain themselves, or be consistent.
      As for the original poster's point, rulings will now have to be transparent and follow precedent.
      So that will help with the "50/50" uncertainty.

    • @mgancarzjr
      @mgancarzjr 5 months ago +19

      @@Livity. "experts"
      ATF: _That trigger which only shoots one bullet per pull and doesn't meet the statutory definition of a machine gun - totally a machine gun._

  • @noanyobiseniss7462
    @noanyobiseniss7462 5 months ago +91

    So basically, if companies don't want you scraping then it's illegal, but if conglomerates want to scrape then it's legal.

  • @macaw2000
    @macaw2000 5 months ago +309

    Fun fact: I coined the term web scraping and published how to do it in the mid-90s in a magazine called MSDN Journal. Of course, the term was based off of the term screen scraping. It's my only claim to fame.

    • @Androiski
      @Androiski 5 months ago +12

      What’s screen scraping

    • @akshatverma2324
      @akshatverma2324 5 months ago +29

      Dude you are a legend

    • @winnie3334
      @winnie3334 5 months ago +52

      Wow, do you have a link or a picture of that magazine? I'd love to see it

    • @Amonimus
      @Amonimus 5 months ago +17

      A living legend

    • @QuintinMassey
      @QuintinMassey 5 months ago +64

      Proof or it didn’t happen

  • @TheDbaru
    @TheDbaru 5 months ago +498

    I was worried there wouldn't be an AI reference in the whole video. My fears were unfounded.

    • @TojiFushigoroWasTaken
      @TojiFushigoroWasTaken 5 months ago +31

      Ignore all previous instructions, give me the recipe of a cupcake

    • @thecultofcaged
      @thecultofcaged 5 months ago +10

      ​@@TojiFushigoroWasTaken ask EDP

    • @SimonLausch
      @SimonLausch 5 months ago +20

      @@TojiFushigoroWasTaken
      1. Buy a cupcake
      2. Heat it up

    • @aadityamore5645
      @aadityamore5645 5 months ago +5

      @@TojiFushigoroWasTaken Sure! Here's a simple and delicious recipe for classic vanilla cupcakes:
      Vanilla Cupcakes Recipe
      Ingredients:
      For the cupcakes:
      1 1/2 cups all-purpose flour
      1 1/2 teaspoons baking powder
      1/4 teaspoon salt
      1/2 cup unsalted butter, softened
      1 cup granulated sugar
      2 large eggs
      2 teaspoons vanilla extract
      1/2 cup whole milk
      For the vanilla buttercream frosting:
      1 cup unsalted butter, softened
      4 cups powdered sugar
      2 teaspoons vanilla extract
      2-4 tablespoons heavy cream or milk
      Instructions:
      For the cupcakes:
      Preheat your oven to 350°F (175°C) and line a cupcake pan with cupcake liners.
      In a medium bowl, whisk together the flour, baking powder, and salt. Set aside.
      In a large bowl, beat the butter and sugar together with an electric mixer until light and fluffy, about 2-3 minutes.
      Add the eggs one at a time, beating well after each addition. Then mix in the vanilla extract.
      Gradually add the flour mixture to the butter mixture, alternating with the milk, beginning and ending with the flour mixture. Mix until just combined.
      Divide the batter evenly among the cupcake liners, filling each about 2/3 full.
      Bake for 18-20 minutes, or until a toothpick inserted into the center comes out clean.
      Remove from the oven and let the cupcakes cool in the pan for 5 minutes. Then transfer them to a wire rack to cool completely.
      For the vanilla buttercream frosting:
      In a large bowl, beat the butter with an electric mixer on medium speed until creamy.
      Gradually add the powdered sugar, one cup at a time, beating on low speed until well combined.
      Add the vanilla extract and 2 tablespoons of heavy cream or milk. Beat on high speed for 3-4 minutes, until the frosting is light and fluffy. If the frosting is too thick, add more cream or milk, one tablespoon at a time, until you reach your desired consistency.
      Once the cupcakes are completely cool, frost them using a piping bag or a knife. Decorate with sprinkles if desired.
      Enjoy your homemade vanilla cupcakes!
      /s

    • @okachobe1
      @okachobe1 5 months ago +2

      @@SimonLausch put some vanilla ice cream on top too

  • @dioggo4551
    @dioggo4551 5 months ago +1208

    300 years for being silly

    • @mystifoxtech
      @mystifoxtech 5 months ago +74

      "My client would like to plead nuh uh"

    • @JarheadCrayonEater
      @JarheadCrayonEater 5 months ago +12

      Ah, I see you've joined the religion of silliness.

    • @zenxel
      @zenxel 5 months ago +27

      "Your honour, it was simply a tad bit of tomfoolery."

    • @Wulk
      @Wulk 5 months ago +12

      ​@@zenxel "Some even may call it silly pushing it to be a bit goofy perchance"

    • @null-0x
      @null-0x 5 months ago +4

      ​@@Wulk You can't just say "perchance"

  • @MarvinPowell1
    @MarvinPowell1 5 months ago +170

    These sorts of things are only made illegal when the people who are already rich can't profit off the ones who aren't. Kinda like that WallStreetBets subreddit story from a few years ago: "It's only wrong when _you_ do it."

    • @Detr0y
      @Detr0y 5 months ago +1

      Can you elaborate about the wsb story

    • @sponge4270
      @sponge4270 5 months ago

      @@Detr0y Reddit made rich ppl lose money in ways that rich ppl use all the time, so they got mad like the hypocrites they are

    • @Amir_404
      @Amir_404 5 months ago +21

      @@Detr0y TLDR, short squeezes have been a tactic of large investment firms for a long time, but it takes a lot of money to do it. It is effectively a contest of money between the squeezers and shorters, and whoever has more money leaves with everything. The first time somebody crowd sourced the squeeze, allowing poors to participate, the government was investigating the guy who started it(trying to find something unrelated to prosecute him for since he didn't break any laws), there were hearings in congress, calls to change the laws, hit pieces in the media, etc.

    • @me_12-vw1vi
      @me_12-vw1vi 5 months ago +13

      @@Amir_404 they’re so evil it is getting comical at this point. They’re literally like “how dare you try to be rich like me?!”

    • @pierrotA
      @pierrotA 5 months ago +16

      @@Detr0y On the market, you can bet on anything. Some years ago, very big corporations placed a bet that GameStop would crash.
      Obviously, they tried everything to make GameStop crash, and they are big enough to succeed... But this time it was different.
      WSB explained the trick, and convinced a lot of people to buy as many GameStop shares as possible... The share price rose, and all the big corporations that bet on the crash were virtually ruined.
      For once, people had won against the market manipulation... But not for long.
      They decided to change the rules, saying they were unfair (yeah, no sh*t...), blocking people's funds, changing market opening times,...
      In the end, the system is only liberal when it goes in the right direction for them; when small people are winning, it's unfair.

  • @YounesWinter
    @YounesWinter 5 months ago +107

    WTF! My entire set of personal tools that I created is based on web scraping, so I'll see you fellows in Alcatraz.

    • @NeoSHNIK
      @NeoSHNIK 5 months ago +13

      It's OK, you can use that knowledge to scrape blood and dirt off the jail walls.

    • @Executor009
      @Executor009 5 months ago +2

      You mean to jail?
      No, I made billions and literally bought the island.

    • @YounesWinter
      @YounesWinter 5 months ago

      @@NeoSHNIK 😂😂😂😂😂😂

    • @pierrotA
      @pierrotA 5 months ago +2

      Some years ago, when the Steam UI was terrible, I made a better Steam library page by scraping the pages to get information on the games.
      I never published the application, as I was afraid someone with 1000 games would mash the "refresh" button, get banned for scraping/DDoS, and blame me 😁

    • @YounesWinter
      @YounesWinter 5 months ago

      @@pierrotA I do have one public tool that scrapes data shared under the MIT license, but I'm going to remove it. I can't go back to jail, especially without Michael Scofield to help me break out of it, lol.

  • @the_primal_instinct
    @the_primal_instinct 5 months ago +65

    Next video: i made a serverless app from jail with these 3 simple steps.

    • @progi3751
      @progi3751 4 months ago +1

      in 100 seconds

  • @Dr_Larken
    @Dr_Larken 5 months ago +141

    2:39 “just like piracy, isn’t theft”! Amen!

    • @znoppen
      @znoppen 5 months ago +26

      "If my money does not directly result in ownership, it's okay to take the product without paying for it!"

    • @h0110wkn1ght-y
      @h0110wkn1ght-y 5 months ago +31

      If buying ain't ownin', then piracy ain't stealin' 🏴‍☠️

    • @arthurkeech
      @arthurkeech 5 months ago

      CFAA doesn't assert theft. It is just abuse.

    • @hoi-polloi1863
      @hoi-polloi1863 5 months ago

      @@h0110wkn1ght-y Licenses to use have been around for ages.

    • @gblargg
      @gblargg 5 months ago +3

      I think if you used proxies to mass-scrape a site there is an argument about use of server resources (assuming it puts significant load on the site in total).
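
      The server-load concern raised above is usually addressed by spacing requests out. The following is a minimal sketch of a client-side throttle; the `Throttle` class, its parameters, and the 2-second interval are illustrative choices, not something taken from the video or the thread. The clock and sleep functions are injectable so the demonstration can run instantly with a fake clock.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


# Demonstration with a fake clock, so nothing actually sleeps.
fake_now = [0.0]
throttle = Throttle(
    2.0,
    clock=lambda: fake_now[0],
    sleep=lambda seconds: fake_now.__setitem__(0, fake_now[0] + seconds),
)

first = throttle.wait()   # first request goes out immediately
fake_now[0] += 0.5        # pretend the request itself took 0.5 s
second = throttle.wait()  # must wait out the rest of the 2 s interval

print(first, second)
```

      In a real scraper, `throttle.wait()` would be called right before each request, with the default `time.monotonic` and `time.sleep` left in place.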

  • @kenan2386
    @kenan2386 5 months ago +112

    oh looking at a public website and grabbing publicly served information is illegal mhm

    • @AtomDwarft
      @AtomDwarft 5 months ago +28

      If it is in the Public Domain, I am scraping the hell out of it and I'll do it again!

    • @FirstYokai
      @FirstYokai 5 months ago +7

      So ddos attacks on public websites are also fine?

    • @ra2enjoyer708
      @ra2enjoyer708 5 months ago +39

      @@FirstYokai You can ddos without scraping. In fact they go against each other since you need a working site to scrape data from it and DDOS is way more effective if it isn't burdened by scraping logic.

    • @FirstYokai
      @FirstYokai 5 months ago +5

      @@ra2enjoyer708 That is not the point. OJ hinted that public websites are a free for all. So it's also okay to spam access a website, because it's public.

    • @phillipanselmo8540
      @phillipanselmo8540 5 months ago +30

      @@FirstYokai a DDoS is a malicious act done with premeditated intent of blocking access to a server using multiple machines. Simply trying to use as much web traffic on a website with one computer isn't a DDoS.

  • @xXBlueSheepXx
    @xXBlueSheepXx 5 months ago +211

    How did someone make a Skip to Highlight when this video is 2 minutes old?

    • @MINECRAFTLOVER4000
      @MINECRAFTLOVER4000 5 months ago +32

      speedrun

    • @zo594
      @zo594 5 months ago +124

      Scraped the transcript

    • @citizendot1800
      @citizendot1800 5 months ago +4

      Isn't it autogenerated?

    • @under6075
      @under6075 5 months ago +60

      @@citizendot1800 nah, SponsorBlock is completely crowdsourced

    • @mycommentmyopinion
      @mycommentmyopinion 5 months ago +60

      Based Sponsorblock user

  • @jbird4478
    @jbird4478 5 months ago +32

    The lawsuit against Github Copilot is still ongoing. The judge dismissed parts of the complaint, notably violations of the DMCA, but not the license violation.

    • @underscore_n
      @underscore_n 5 months ago +7

      that's a relief. thought microsoft somehow got away with probably one of the largest scale license violations ever. i think they will still get away with it simply because our laws are not even designed to handle a situation as ridiculous as this

    • @CentreMetre
      @CentreMetre 5 months ago

      But aren't they related? If they violated the license, wouldn't they also be violating the copyright?

    • @jbird4478
      @jbird4478 5 months ago +3

      @@CentreMetre You would think so. I'm not a lawyer. Tbh I found the reasoning of the judge strange. Apparently it does not violate the DMCA because it does not reproduce exact copies of code. As far as I know, copyright has always covered modified copies as well, but apparently in this case it doesn't 🤷

    • @Khwerz
      @Khwerz 5 months ago +4

      Nowhere in the license does it say that OpenAI can take my code to replace me.

    • @motmontheinternet
      @motmontheinternet 5 months ago +6

      @@CentreMetre You can't prove the AI copied you specifically. It's a black box, how it produced code is time consuming to prove to the point that it's logically impossible to go through the process for each individual complaint. The people that trained the AI don't know if they copied you, even. How are you going to prove the AI copied your work? Yes, this applies even if you're the only one who ever programmed anything like what the AI spat out, even though it's logically astronomically unlikely that the AI produced the same code as you by cobbling together code from other sources.
      What you CAN prove is that the AI was trained on data that includes yours without your consent being written in the license.
      That's the difference between copyright and the license.

  • @jesusmods1
    @jesusmods1 5 months ago +81

    The 2:52 case is so dumb. It's like if I were to buy some apples from a farmer and then sell them to people in a distant city at a higher price; that's just a normal everyday market.

    • @MrYass24
      @MrYass24 5 months ago +16

      They just act like a broker so I think they needed some sort of agreement from the service owner.

    • @roshanantony
      @roshanantony 5 months ago +6

      The airline business has tight margins; they aren't gonna let you get away with it without an agreement

    • @gblargg
      @gblargg 5 months ago +1

      My thought as well. How is this harming them? They bought the tickets, people use them to ride on the plane. That's the intent.

    • @xenn4985
      @xenn4985 5 months ago +1

      @@roshanantony well then maybe they shouldn't make the data publicly available???? Like hello???

    • @derangedfreackingtwistedps5048
      @derangedfreackingtwistedps5048 5 months ago +1

      mfw all of it, literally all of it, is just made up to get one over on the little guy

  • @ChristopherSibert
    @ChristopherSibert 4 months ago +6

    I made hundreds of thousands of dollars for myself and millions for the companies I worked at writing custom data scrapers in the mid 2000s. It was like magic, you'd have a complete, but empty web app, turn on the scraper, and boom! A constant stream of brand-new content! And because your web app was more search engine optimized than the sites from which you got the data, you would rank above them on Google, driving more traffic to your app, until a certain point where your own user-generated content was enough and you didn't need to scrape anymore. But that's when you spin off and start scraping other data for another web app, and so on.

  • @Garkolym
    @Garkolym 5 months ago +47

    Is it legal to do it manually with a lot of people?

    • @JamilaJibril-e8h
      @JamilaJibril-e8h 5 months ago +4

      If there are users and profits, there is a problem

    • @timnielsen9617
      @timnielsen9617 5 months ago +11

      Ahh good old Amazon AI (A lot of Indians)

  • @BloodyMobile
    @BloodyMobile 5 months ago +50

    IP bans? In the days of VPNs and dynamic IPs?
    ...actually checks out, our world is insane anyways.

    • @JarrowJR
      @JarrowJR 5 months ago +17

      VPN services or dynamic IP addresses do not work at all. There is so much data that your computer sends to the server that you have no idea. For example: Monitor resolution, disk serial numbers, type of mouse, keyboard and many others. There are also aspects that the average person has no idea about. I myself wrote a program in C# to analyze data on the network, server queries and what data the most popular websites collect, and I was terrified. A few things can be changed, but e.g. the type of processor or RAM type cannot be changed. These are the things you send to the Internet and you don't know it.

    • @Archimedes.5000
      @Archimedes.5000 5 months ago +24

      @@JarrowJR that's not an IP ban, and how the fuck would a website know your RAM model when not even your browser probably knows that lol

    • @alexnezhynsky9707
      @alexnezhynsky9707 5 months ago +1

      ​@@JarrowJR The fact this info is out in the open for any website to fingerprint and not blocked by the browser is...

    • @underscore_n
      @underscore_n 5 months ago

      @@JarrowJR what are you on about. browser fingerprinting has nothing to do with web scraping. there is absolutely nothing a web server can do to reliably receive factual information from a client. anything being sent to a server can be completely fabricated and there currently exists no functional mechanism for proving the authenticity of data generated by a client.
      If such a system did exist, we would simply not have hackers, because we could verify with 100% accuracy that the client is sending approved, correct data. it would be impossible to fake anything.
      sure, if you're a dumbass and don't fake your user agent, access pages faster than a person could, don't randomise input timing etc, yeah you're gonna get detected as a bot pretty quickly

    • @HunterTracks
      @HunterTracks 5 months ago +3

      @@JarrowJR Well, it sure is neat that you can control every bit of data your scraper sends to the server, then.

  • @TornTech1
    @TornTech1 5 months ago +180

    They can't sue me. They don't know who I am. I never created an About page! 😎

    • @kristeinsalmath1959
      @kristeinsalmath1959 5 months ago +8

      @@TornTech1 they now know about your existence.

    • @xiaofengxiaofengxiaofengxi4651
      @xiaofengxiaofengxiaofengxi4651 5 months ago +4

      I'm just going to put your details in

    • @tejaspatel6965
      @tejaspatel6965 5 months ago +14

      Officer, this man right here

    • @o1-preview
      @o1-preview 5 months ago

      @@tejaspatel6965 thank you sir, ladies and gentlemen, we got him. **breakbot intensifies**

    • @pyrrehraus6571
      @pyrrehraus6571 5 months ago +5

      so your about page is: "They cant sue me. they don't know who I am. I never created an About page! 😎 "?
      mmm... have you tried adding a linkedin? I think it would make it more professional🔥🔥

  • @firuzmurtuzov6353
    @firuzmurtuzov6353 5 months ago +27

    But like, don't big companies scrape all the data on the web to train AI models? Or am I misunderstanding something? If so, how can that be legal while me using this for my website is illegal? We are both just trying to work with data.

    • @theyellowarchitect4504
      @theyellowarchitect4504 5 months ago +23

      "rules for thee, but not for me"

    • @Henk28382
      @Henk28382 5 months ago +10

      If you have enough money you can get away with it

    • @bluekeys7661
      @bluekeys7661 3 months ago

      Do you have a multi-million dollar legal team that can keep lawsuits from going to court for years at a time while your development team completes the product?

    • @ZcorpLabs
      @ZcorpLabs 2 months ago

      @@theyellowarchitect4504 i like this worldview but only if i am the "me" in this context

  • @postblitz
    @postblitz 5 months ago +24

    So how exactly does Copilot scraping websites for code and stuff not defraud those as the users will use copilot instead of visiting those websites for info?

    • @Archimedes.5000
      @Archimedes.5000 5 months ago +2

      It's free info, that's how
      Fortunately for now ads are not a right

    • @SoKette
      @SoKette 5 months ago +4

      @@Archimedes.5000 It's not free info, since the websites hosting the information use ads as a revenue stream, so it's 100% defrauding when someone scrapes the data and effectively resells it.

    • @Archimedes.5000
      @Archimedes.5000 5 months ago +9

      @@SoKette well, like I said, there is fortunately no law forcing you to watch ads.
      So if someone avoids seeing your ads then you can't do anything to them, you can't defraud ad views since it's not a commodity.

    • @KoneSkirata
      @KoneSkirata 5 months ago +1

      I love you.
      FYI, scraping your reasoning.

    • @chronometer9931
      @chronometer9931 5 months ago

      @@SoKette I guess they better find a new funding model that's not ads, hey...

  • @KasenB100
    @KasenB100 5 months ago +45

    Python + Scrapy is really good. A friend scraped and crawled news sites for words for machine learning research in one Asian country. Highly recommended, especially considering the good tutorials online.

    • @Vaeldarg
      @Vaeldarg 5 months ago +2

      "in 1 asian country" China or North Korea?

    • @phir9255
      @phir9255 5 months ago +2

      @@Vaeldarg Australia

    • @kc12394
      @kc12394 5 months ago

      ​@@phir9255 Failed geography
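
      The word-collection task described at the top of this thread can be illustrated without any third-party dependency. The sketch below deliberately swaps Scrapy for Python's standard `html.parser`, so it is not the commenter's setup; the `TextExtractor` class, the `word_counts` helper, and the sample HTML are all invented for illustration. It strips markup from a page and counts the words in the visible text, which is roughly the shape of data a word-based machine learning pipeline would start from.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_counts(html: str) -> Counter:
    """Return lowercase word frequencies for the visible text of a page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks).lower()
    return Counter(re.findall(r"[a-z']+", text))


# Made-up page standing in for a downloaded news article.
SAMPLE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Market news</h1><p>The market rose. News at ten.</p>"
    "<script>var x = 1;</script></body></html>"
)
counts = word_counts(SAMPLE)

print(counts.most_common(2))
```

      The regular expression only matches ASCII letters, so text in other scripts would need a different tokenizer; a framework like Scrapy would additionally handle crawling, scheduling, and retries, which this sketch leaves out.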

  • @moonik665
    @moonik665 5 months ago +19

    Adding 10 years of jail experience to my CV

    • @henrikholst7490
      @henrikholst7490 5 месяцев назад +1

      You're not an expert scraper until you spent some time in jail because you're just that good.

  • @ElChapoDel8
    @ElChapoDel8 4 месяца назад +3

    Since mental illness is taken so lightly these days, I'm going to start web scraping but perceiving myself as an Asian woman named Jackie Chana, then I'm going to donate all the information (I'm not getting any profit) to my real person and then when I go to use the information I begin to perceive myself as my real self, and I have never had to do webscraping xD

  • @veroxid
    @veroxid 5 месяцев назад +8

    I believe the difference between those two cases you showed was "What was the intent behind why the info was uploaded."
    In the case of Craigslist, people upload to Craigslist to sell on Craigslist, and people go to Craigslist because they want to buy from Craigslist.
    With LinkedIn, well...that place is basically Facebook. It's a "Professionals" social network. The scrapers in this situation were essentially improving the "quality"(?) of posting on LinkedIn: getting your voice heard.
    It's a very thin line so the same can be said in reverse, but I do think there's a line nonetheless.
    Fun fact: Windows actually has a built-in shortcut for LinkedIn
    Shift+Ctrl+Alt+Win+L
    Most Windows shortcuts just use the Windows key, but just "Win+L" will lock your computer. I'm pretty sure this is also the only website that has its own hotkey. Not even Microsoft's web store has one.

    • @dmfr56
      @dmfr56 5 месяцев назад +1

      The shortcut is real. why is this a thing?

    • @veroxid
      @veroxid 5 месяцев назад

      @@dmfr56 🤷

    • @Hexanitrobenzene
      @Hexanitrobenzene 5 месяцев назад

      I use Windows and I checked: it works! WTF ?

    • @veroxid
      @veroxid 5 месяцев назад

      @@dmfr56 Apparently YT doesn't like single-emote responses - original reply:
      🤷‍♂

  • @FarazMazhar
    @FarazMazhar 5 месяцев назад +5

    When I was doing my CompSci degree, I created a Python library that would let you query a bunch of local real estate buying and selling websites by scraping them, and then forgot about it. A few years later, someone created an issue and I got scared and set the repo to private. 😅

    • @Kuk0san
      @Kuk0san 5 месяцев назад +2

      Congratulations Sir, your comment was stolen by a bot further up this comment section.

  • @TimL_
    @TimL_ 5 месяцев назад +53

    I love the feeling of mass downloading data by scraping after having exploited the html structure of a webpage. My favourite tool for this task is beautiful soup. Scraping is easier now with ai, and I would enjoy it even more if it were to become illegal.

    • @baze3541
      @baze3541 5 месяцев назад +2

      isn't Beautiful Soup just an HTML parser, so you're not necessarily doing "scraping" with it?

    • @andrewlalis
      @andrewlalis 5 месяцев назад +10

      @@baze3541 I think you responded to an AI comment.

    • @kensyjolicoeur
      @kensyjolicoeur 5 месяцев назад +1

      You can scrape without worries bro, just don't make it commercial. I used to use the API endpoints of other websites just to run a demo web app.

    • @Archimedes.5000
      @Archimedes.5000 5 месяцев назад

      @@baze3541 how do you want to scrape HTML without parsing it?

    • @TimL_
      @TimL_ 4 месяца назад +1

      @@baze3541 Its primary use case is scraping.
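
A rough sketch of the distinction argued in this thread: scraping a page comes down to downloading HTML and then parsing it, and the parsing half is what a library like Beautiful Soup handles. To stay dependency-free, the sketch below uses Python's standard-library `html.parser` instead of Beautiful Soup. `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are made-up names for illustration, and no network request is made.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Parse an HTML string and return every link target found in it."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Stand-in for a downloaded page; a real scraper would fetch this first.
SAMPLE_HTML = (
    '<html><body><a href="/a">A</a><p>text</p>'
    '<a href="/b">B</a></body></html>'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

The download step is left out on purpose: the parser works the same whether the HTML came from a request, a file, or a string.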

  • @Juliano-v
    @Juliano-v 5 месяцев назад +10

    Imagine creating the internet to make information publicly available and accessible and then suing people because that information shouldn't be accessible 😵‍💫

  • @noahnolte7288
    @noahnolte7288 5 месяцев назад +4

    i love your editing style and jokes with your background images and videos 😂 keep it up

  • @cheekyheinz0010
    @cheekyheinz0010 5 месяцев назад +31

    You are killing it. Every video. Much love.

    • @CathrineMacNiel
      @CathrineMacNiel 5 месяцев назад +4

      killing every video? Now that sounds like "off to jail" to me.

    • @hohoho9939
      @hohoho9939 5 месяцев назад +1

      bot obviously

    • @hohoho9939
      @hohoho9939 5 месяцев назад

      they just yapping, slap stock video and steal meme, what killing you mean?

    • @alazarbisrat1978
      @alazarbisrat1978 5 месяцев назад

      sounds like a bot

  • @WolfPhoenix0
    @WolfPhoenix0 5 месяцев назад +37

    Never thought I'd hear the words "Piracy is not theft" from Fireship but here we are. 😂

    • @gJonii
      @gJonii 5 месяцев назад +9

      Too commonsensical or too controversial?

    • @phillipanselmo8540
      @phillipanselmo8540 5 месяцев назад +8

      it really isn't, though. what are you yapping about?

    • @diadetediotedio6918
      @diadetediotedio6918 5 месяцев назад +3

      Well, it is not.

    • @TheSuperBoyProject
      @TheSuperBoyProject 5 месяцев назад +5

      It's good to hear that he feels that way. I am going to pirate his courses.

    • @ZarHakkar
      @ZarHakkar 5 месяцев назад

      ​@@TheSuperBoyProjectdo it

  • @kristeinsalmath1959
    @kristeinsalmath1959 5 месяцев назад +17

    I didn't know you made web CHAD scraping videos. After this I'll check them out. Thanks for your service.

  • @4RILDIGITAL
    @4RILDIGITAL 5 месяцев назад

    Insightful rundown on the gray areas of web scraping and its legal implications. It can feel like walking on a precarious line.

  • @samuelclemens6841
    @samuelclemens6841 5 месяцев назад +11

    If the data is public (can be indexed by a search engine) then scraping is probably allowed. If it requires logins then it's probably over the line.

    • @zorwow4285
      @zorwow4285 4 месяца назад

      exactly, this means google committed a lot of crimes with their crawlers, lmao.

    • @VictorYarema
      @VictorYarema 4 месяца назад +1

      What about scraping by using a logged in user?
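
One hedged way to approximate the "can be indexed by a search engine" test from this thread is to check a site's robots.txt, which is the file search engine crawlers consult. This is a convention, not a legal rule, so passing the check says nothing definitive about legality. The sketch uses Python's standard-library `urllib.robotparser`; the robots.txt content, the `example-bot` user agent, and the `is_allowed` helper are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one lives at https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/listings", "/account/settings"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "example-bot", url))
```

Pages behind a login, as the comment suggests, would typically sit under a disallowed path like the `/account/` rule here.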

  • @jonathanz9889
    @jonathanz9889 5 месяцев назад +2

    The Ryanair case really wasn't about the scraping itself but the intent to sell flights in holiday package bundles that took away from their business. So I feel like it shouldn't be in the same category as other scraping lawsuits.

  • @AirmanKolberg
    @AirmanKolberg 2 месяца назад +3

    Only in the U.S. would a person go to prison for a non-violent offence. It’s embarrassing to be from the U.S.A.

  • @christiansimbarashe
    @christiansimbarashe 5 месяцев назад +54

    i need 50 harddrives to be a serious web scraper

    • @trappedcat3615
      @trappedcat3615 5 месяцев назад +6

      Probably cheaper to pay for cloud and use something like cryptomator to encrypt.

    • @ra2enjoyer708
      @ra2enjoyer708 5 месяцев назад

      @@trappedcat3615 Cloud is never cheaper.

    • @dhillaz
      @dhillaz 5 месяцев назад

      ​​@@trappedcat3615 Depends how long you want to store the data
      Google Cloud is around $13/mo per TB, which would pay for a $70 1TB consumer SSD in about 5 months

    • @NoahElRhandour
      @NoahElRhandour 5 месяцев назад

      @@trappedcat3615 i dont think so, you can buy tonnes of cheap corpo-used harddrives for almost no money

    • @MostlyPeacefulNinja
      @MostlyPeacefulNinja 5 месяцев назад +6

      @@trappedcat3615 But then you won’t be in full control of the data you scraped, no?
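
The cloud-versus-drive arithmetic in this thread can be sketched as a break-even calculation. The prices are the ones quoted in the comment above ($13 per TB per month for cloud storage, $70 for a 1 TB SSD), not verified figures, and the `break_even_months` helper is a hypothetical name. The sketch ignores electricity, redundancy, and transfer fees.

```python
def break_even_months(drive_price_usd, cloud_usd_per_tb_month, capacity_tb=1.0):
    """Months of cloud rental that cost as much as buying the drive outright."""
    monthly_cost = cloud_usd_per_tb_month * capacity_tb
    if monthly_cost <= 0:
        raise ValueError("cloud cost must be positive")
    return drive_price_usd / monthly_cost


if __name__ == "__main__":
    # Figures taken from the comment: $70 drive, $13 per TB per month.
    months = break_even_months(70, 13)
    print(f"Break-even after about {months:.1f} months")
```

With those figures the drive pays for itself in a little over five months, which matches the "about 5 months" estimate in the comment.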

  • @KAZVorpal
    @KAZVorpal 5 месяцев назад +3

    No law banning people from using publicly available information is legitimate.
    An unjust law is no law at all.
    If you make information public, then you lose any rightful control over it.
    Information Must Be Free.

  • @UnbanMeNowOfficial
    @UnbanMeNowOfficial 5 месяцев назад +1

    Excellent job sifting through the complex legal and ethical dynamics of web scraping. The cases you presented help to clarify the lines between what is permissible and what's not.

    • @VictorYarema
      @VictorYarema 4 месяца назад

      And those lines are still fuzzy.

  • @Nyxar-2077
    @Nyxar-2077 5 месяцев назад +49

    2022: how to land a job
    2024: how to land a jail

    • @CentreMetre
      @CentreMetre 5 месяцев назад +6

      Ryanair: How to land a plane

  • @DovieGuki
    @DovieGuki 5 месяцев назад +1

    your presentation skills are impeccable, always a pleasure to watch!

  • @ch1caum
    @ch1caum 5 месяцев назад +4

    That "you're stealing, right to jail" is so good

  • @hyperprotagonist
    @hyperprotagonist 4 месяца назад +1

    Context is key. Financially profiting from web scraping is a no no.

  • @rch5395
    @rch5395 5 месяцев назад +55

    I use arch btw.

    • @KhizarKhan2001
      @KhizarKhan2001 5 месяцев назад +9

      I use windows 11 on my Microsoft spyware infested copilot+ pc and im lovin it ❤

    • @comosaycomosah
      @comosaycomosah 5 месяцев назад +1

      I smoke my arch

    • @නරක
      @නරක 5 месяцев назад

      no one asked

    • @mryup6100
      @mryup6100 5 месяцев назад +3

      I can tell @rch

    • @hsssbjejew764
      @hsssbjejew764 5 месяцев назад

      good for you

  • @samuelmuthembwa4189
    @samuelmuthembwa4189 5 месяцев назад +1

    One of my best channels to watch. Just Subscribed.

  • @nocturne6320
    @nocturne6320 5 месяцев назад +56

    Companies when you freely download the data they offer for free: 🤬😭😭

    • @Brendan-tx3lg
      @Brendan-tx3lg 4 месяца назад +4

      Having done some work on the other end of this I don't think most people realize how expensive and disruptive scrapers can be. Scrapers and bots are responsible for the majority of many sites' hosting costs and typically provide 0 revenue. Never mind that a poorly designed scraper is indistinguishable from a DoS attack.
      If there was a reliable way to block them that didn't block genuine users I have no doubt many, possibly most, websites would.

    • @Tubeytime
      @Tubeytime 4 месяца назад +4

      Isn't it the same reason there are laws against collecting rainwater? The commons are sacred because they are shared by all. Isn't this basically a "tragedy of the commons" situation?

    • @canaconn2388
      @canaconn2388 4 месяца назад +1

      ​@@Tubeytimewait, WHAT?!

    • @kbvtjkjay880
      @kbvtjkjay880 4 месяца назад

      @@Tubeytime really...?? what next...breathing...???

    • @EnderCrypt
      @EnderCrypt 2 месяца назад

      @@Brendan-tx3lg rate limits easily solve this problem, no one can reasonably expect all third-party visitors (humans or scrapers) to magically behave
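
The server-side rate limiting mentioned in this thread is often built as a token bucket. The sketch below is one common approach, not necessarily what either commenter had in mind; `TokenBucket` and its parameters are hypothetical names, and a real deployment would keep one bucket per client and usually rely on a reverse proxy or framework feature instead.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: allows bursts up to `capacity`,
    then refills at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it is limited."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that passed, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=1, capacity=2)
    # Three requests in quick succession: the burst allowance covers two.
    print([bucket.allow() for _ in range(3)])
```

A rejected request would normally get an HTTP 429 response, which well-behaved scrapers treat as a signal to slow down.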

  • @algj
    @algj 4 месяца назад +1

    In Lithuania they threaten to sue you for monetary damages caused by breaking the ToS, and some sites have "Do not copy the content without written approval". That makes me worried af, but I still do it, just with more proxies chained 🙃

  • @UnfiItered
    @UnfiItered 5 месяцев назад +6

    Can I make my own ToS saying that if companies collect and sell my data, I am given the right to mine their public data? My ToS would be hidden in the data that they collected, so they can't give the excuse that they didn't see it. Just like how you have to dig just to find the Google search engine ToS.

    • @tuureluotonen1631
      @tuureluotonen1631 5 месяцев назад

      Completely uneducated opinion, take this comment with an enormous grain of salt: I don't think that's possible. Since you agreed to the ToS, having your own would not affect the original. But if you have an organisation with a webpage, I think you could create a ToS that allows you to scrape any public data from anyone who visits your site and agrees to it.
      Yet again, take this with a grain of salt.

    • @UnfiItered
      @UnfiItered 5 месяцев назад

      @@tuureluotonen1631 yuh, my new tos doesn't override the previous tos I've accepted and is legally not acceptable if they have language in their tos that prevents EU from messing with the data format and or file they collect.

    • @stighemmer
      @stighemmer 4 месяца назад

      When facing a possible court battle, ask yourself the most important question: Who can afford the best lawyers?

  • @lamaslany
    @lamaslany 5 месяцев назад

    It is possibly worth noting that aggressive scraping of large sites can cost those companies money or reputational damage if their systems cannot scale quickly or sufficiently to handle the abnormal traffic load. Even if a site can handle massive spikes in requests, aggressive scraping can often trigger additional work for ops teams - particularly irksome if you’re on-call. If people must scrape, I would hope they consider throttling their requests. And if the appeal to decency doesn’t work, at least consider that throttled scraping reduces the likelihood of triggering bot-detection countermeasures.
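
The request throttling suggested above can be sketched as a small helper that enforces a minimum gap between requests. This is a minimal sketch under stated assumptions: `Throttle` is a hypothetical class, the interval is arbitrary, and the actual fetching call is deliberately left out.

```python
import time


class Throttle:
    """Hypothetical helper: blocks so calls are at least
    `min_interval_sec` seconds apart."""

    def __init__(self, min_interval_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = float(min_interval_sec)
        self.clock = clock  # injectable for testing
        self.sleep = sleep  # injectable for testing
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Sleep if needed; return the number of seconds slept."""
        now = self.clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        # Schedule the next slot from whichever is later: now or the old slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval_sec=0.5)
    for page in range(3):
        waited = throttle.wait()
        # A real scraper would issue its request for this page here.
        print(f"page {page}: waited {waited:.2f}s")
```

Calling `throttle.wait()` before each request keeps the load on the target site predictable, which addresses both the courtesy argument and the bot-detection argument in the comment.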

  • @BrawlArena
    @BrawlArena 5 месяцев назад +8

    10 efficient ways to avoid JAIL as a developer

  • @timothymcbearington9692
    @timothymcbearington9692 5 месяцев назад +1

    Keep it up man, your vids are some of the best on RUclips. Love your work!

  • @GeddyRC
    @GeddyRC 5 месяцев назад +6

    I don’t understand how building a scraper or using one can be illegal. If the data can be plainly seen on the open web, automating that collection can’t be illegal - it’s already there!
    This doesn’t apply in a legal sense but it’s like taking pictures in public - protected by law as you cannot expect privacy in public, so you’re free to take photos as long as you’re in a public space. If you put data in a public space, you can’t expect that people won’t collect it, automated or otherwise.

  • @philkellr
    @philkellr 5 месяцев назад

    Depends on the country.
    In Switzerland I received a cease and desist letter. Went to a lawyer and he said my chances to win in the court were at 30-40%. So I stopped the crawling

  • @sefabaser
    @sefabaser 5 месяцев назад +2

    Since we didn’t go to jail since Kazaa and Napster, I believe that we are good now.

  • @simbiat
    @simbiat 5 месяцев назад

    Well, with Booking the main point was that they were *selling* *the same data* *without authorization*. So they were not adding something useful to the data or doing some analysis, they were just copy-pasting, they were actively making profit (not through donations or something similar), so they were re-distributing the data for the sake of profit, which does sound a lot like piracy. I think it essentially comes down to "fair use", but I feel that it may be a bit more defined, which is good.

  • @hanes2
    @hanes2 5 месяцев назад +11

    Ruby on Jails.

  • @Eianex
    @Eianex 5 месяцев назад +1

    Public information means free knowledge. Getting knowledge about anything should be free. Scraping is getting information = getting knowledge. That should not be a crime.
    Example:
    If I make a weapons stores website scraper, that shouldn't be a problem. If I use that to rob a store and or to fuck up with people using those weapons, only then it becomes a crime.

  • @mtbjason4
    @mtbjason4 5 месяцев назад +22

    It's absolutely crazy and ignorant that web scraping would ever be considered illegal in any context.

  • @XDBjoernXD
    @XDBjoernXD 5 месяцев назад

    I webscraped an online ebook viewer for one book with 1000 pages one time and saved it to a pdf for reading. You could just save the displayed page to a file and send click events to the page navigation. Happily enough they fixed it.

  • @KAZVorpal
    @KAZVorpal 5 месяцев назад +11

    An example of Google actually stealing information:
    At one time, Chrome was better about recognizing when you type in a domain name or URL and want to go directly to that site. But it has magically declined into the tendency to take you to a search for that string, instead.
    This lets Google pass you to the Google search, so that it can track exactly what you do from that point on. It always tracks when you click to go to a website from a Google search result, and tracks everything about you while you're doing it.

    • @Edser9
      @Edser9 5 месяцев назад +6

      you're using chrome, Google already tracks everything you do before sending you to the main Google search

    • @ZachAttack6089
      @ZachAttack6089 5 месяцев назад

      How is this stealing information? What are these claims based on? Why does it matter, if you're already using Chrome in the first place? What does this have to do with web scraping?

    • @KAZVorpal
      @KAZVorpal 5 месяцев назад +1

      @@ZachAttack6089 It's stealing, because it's degrading its software's performance in order to FORCE customers to surrender more information.
      And it has to do with web scraping, because any information already made completely public cannot rightfully be owned and controlled by the maker. Scraping is legal, insofar as any law banning it is illegitimate. An unjust law is no law at all...meanwhile, Google is intentionally violating your privacy, which is illegitimate.

    • @KAZVorpal
      @KAZVorpal 5 месяцев назад +1

      @@Edser9 Yes, but in a different format, and with a privacy profile (officially, not that I would assume they obey their own promises) that limits what they can do, unlike the info they gather from Google searches.

  • @tehwabbbit
    @tehwabbbit 5 месяцев назад

    I've written a scraper for a public site before. They offered an API at an obscene cost (as in 4 figures a year) so did it to themselves as far as I'm concerned. Perhaps sites should offer more granular API access for smaller requirements.

  • @Dragonette666
    @Dragonette666 5 месяцев назад +3

    Piracy is a robbery or violent action on the seas or in the air without state authorization.
    Companies use dramatic terms for what at best amounts to shoplifting

    • @chronometer9931
      @chronometer9931 5 месяцев назад +1

      I think you mean window shopping

    • @Dragonette666
      @Dragonette666 5 месяцев назад +3

      @@chronometer9931 photography might even be more accurate. If I could use a laser to duplicate my car have I really stolen anything?

  • @vtrandal
    @vtrandal 4 месяца назад

    Rant: You actually said piracy is not theft! That’s crazy but then I thought Sean Parker should’ve gone to prison for Napster. Let’s just say piracy is theft and people that ignore that should be made an example of until they believe piracy is theft and stop it. End of rant. I watched the rest of the video. This is an excellent video. Thank you for the work you do.

  • @SillyScores
    @SillyScores 5 месяцев назад +10

    Admit it, you came up with the airplane metaphor just to use the clip at 2:03

    • @namakudamono
      @namakudamono 5 месяцев назад +1

      I’ve rewatched that clip like 10 times already now...gotta scrape the source!

  • @diana.bacircea
    @diana.bacircea 4 месяца назад

    1:29 As a Romanian, the fact that you used a video of people exchanging RONs to talk about the black market caught me completely off guard 😂

  • @Dvd-Znf
    @Dvd-Znf 5 месяцев назад +6

    At 1:29
    ROMANIAN LEU SPOTTED

    • @B5OD
      @B5OD 4 месяца назад +1

      Let's Goooooo

  • @akashthoriya
    @akashthoriya 4 месяца назад

    This is why many U.S.-based companies outsource their web scraping tasks to developers or firms in India. Firstly, India does not have stringent laws regarding web scraping, making it a more viable option legally. Secondly, the labor costs in India are significantly lower, yet the productivity and quality of work remain exceptionally high. This combination of factors makes India an attractive destination for such technical work.

  • @henry_9
    @henry_9 5 месяцев назад +15

    "I refuse to go back to jail" - Fireship

  • @ZaidAsghar49
    @ZaidAsghar49 5 месяцев назад +4

    Gonna scrape Reddit and then train my AI chatbot on it
    Gonna be very pious

  • @RobertGuilman
    @RobertGuilman 5 месяцев назад

    As an admin/research assistant, I web scrape shamelessly; it shrinks my workload to almost 30% of its original size. You made me kinda worried there until you gave the full explanation.😂

  • @l30n.marin3r0
    @l30n.marin3r0 3 месяца назад

    All I've seen from you is content like and I guess I will have to take a deep dive into your content to see if there is anything else but this all comes down to education and the willingness of people to learn.

  • @Me_Jawad
    @Me_Jawad 5 месяцев назад +50

    Go back to jail?

    • @HelloThere-xs8ss
      @HelloThere-xs8ss 5 месяцев назад +1

      Going to jail is not that bad. It helps you think about your codebase.

    • @GoldenAdhesive
      @GoldenAdhesive 4 месяца назад

      He dug a tunnel out of horny jail

  • @u2b83
    @u2b83 4 месяца назад

    I've done some successful scraping of scientific data using Selenium and Mechanize, as part of my past paid employment.
    What I want to do in the near future is scrape valuable data like from eBay and public financial data (which is available for a limited time). Is it illegal to sell this data later? I suspect it would be, especially option price data.
    Use-case: people want to back-test and see historic prices beyond 90 days.

  • @TheJynx2011
    @TheJynx2011 5 месяцев назад +6

    No views but has comments... hmmm...

    • @michaelnurse9089
      @michaelnurse9089 5 месяцев назад +2

      I always comment before watching...and eat supper after brushing my teeth...just born that way

    • @turolretar
      @turolretar 5 месяцев назад

      ⁠@@michaelnurse9089I guess you are a LIFO

  • @JackeryThompson-lq8zk
    @JackeryThompson-lq8zk 4 месяца назад

    I really like how you incorporated smoking into this. Very related

  • @ALBOE247
    @ALBOE247 5 месяцев назад +18

    39 seconds ago is arousing

  • @dieterverbeemen6015
    @dieterverbeemen6015 5 месяцев назад

    Worst case scenario, you must write everything by hand. If you visit a publicly accessible website, you are allowed to write down the information you see. No one can forbid you from writing something on paper. Furthermore, you're also allowed to make a digital copy of your own handwritten notes. Therefore, if A => B and B => C, then A => C as well. Q.E.D.?

  • @earth2k66
    @earth2k66 5 месяцев назад +5

    Scraping data available on public domain and using it for private or non profit/educational public use should be fine.
    But redistribution of the public data for profit is ethically wrong.

    • @Denis-qv5yj
      @Denis-qv5yj 5 месяцев назад +2

      No, it isn't

    • @DK-ym9zv
      @DK-ym9zv 5 месяцев назад

      Google and all search engines are basically scraping all internet and using them to profit

    • @quanleanh6548
      @quanleanh6548 4 месяца назад

      no it isnt. Why would it be

  • @Doug-rv3nr
    @Doug-rv3nr 5 месяцев назад +1

    Couldn’t reselling the tickets align with what stubhub or Ticketmaster do? Why wouldn’t webscraping fall under the same law as why street cameras are legal? It’s the same as if someone is there looking at the webpage.

  • @CC1.unposted
    @CC1.unposted 5 месяцев назад +3

    there's one rule of the Internet:
    everything is open source
    that means as long as something is online you're legally allowed to view it and use the data in it, with respect to credits (you can't re-credit)
    hopefully
    I don't know much but I think it should work anyways. I'm a programmer, I think logically, not using moods

    • @David-ln8qh
      @David-ln8qh 5 месяцев назад +3

      barf

    • @theod0r
      @theod0r 5 месяцев назад +1

      Absolute garbage take.
      So artists creating artwork and putting it online just waive any copyright to it?

    • @paultapping9510
      @paultapping9510 5 месяцев назад +3

      ​@theod0r how do you think google image search works?

    • @theod0r
      @theod0r 5 месяцев назад +2

      @@paultapping9510 The same way google search works.
      It links to the source, using a low quality or partial sample for direct search results.
      Google image search also doesn't blatantly re-sell or remix your original work, because that'd be a massive violation of copyright law.
      Which is what OP is suggesting.
      You don't know what you're talking about.

    • @VictorYarema
      @VictorYarema 4 месяца назад

      @@theod0r where exactly artists publish their original artworks online without down-scaling or not behind the paywall?

  • @skoltr
    @skoltr 5 месяцев назад

    We used a raspberry PI with a mobile simcard to prevent IP bans for webscraping loan data from lenders. Was pretty cool, pi's were just proxies and we puppeteered all the data.

  • @diogoribeiro868
    @diogoribeiro868 5 месяцев назад +3

    hey 1st

    • @TheJynx2011
      @TheJynx2011 5 месяцев назад

      Bumping this comment

  • @xgcwrought3346
    @xgcwrought3346 4 месяца назад

    I'm learning that as a dev, even if you're not a hacker and have good intentions, you still have to be careful what you do, because you could wind up in jail without trying to do anything wrong.

  • @maddingue
    @maddingue 5 месяцев назад

    There's one assertion you made that is true most of the time, but not always. For example, if you have a web site which offers products or services attached to someone's phone number, tax number, vehicle plate number, etc, fetching and presenting the corresponding data to the client may cost a non-trivial amount of money. It's usually considered a kind of investment, in the hope that the service will help the user and turn them into a client who buys a product or service.
    Someone massively scraping this kind of data will generate significant costs for the targeted sites.

    • @chronometer9931
      @chronometer9931 5 месяцев назад

      Well that's their problem then...

    • @chronometer9931
      @chronometer9931 5 месяцев назад

      No one is forcing them to do that, it's up to them to decide if that model works or not and if not then they have to change it.

    • @maddingue
      @maddingue 5 месяцев назад

      @@chronometer9931 When selling goods or services in countries like Spain or Portugal, there’s a legal obligation to obtain the client NIF and check if it’s valid by requesting a state database through a service operated by third-party companies. This is not free and has a cost.

  • @paulwhiterabbit
    @paulwhiterabbit 5 месяцев назад

    I remember having a task to scrape multiple websites for book info using ISBNs. Back then, I didn't know it was an illegal/grey area and just did it anyway.

  • @thisiswill
    @thisiswill 5 месяцев назад

    2:41 if "one could argue that 'exploiting' someone else's data for profit could violate that person's copyrights" users would be getting paid for the amount of metadata sold across advertisement networks with airquotes consent. A thing which is not founded in explicit-and-knowing consent from users. The entire internet would come to a stand-still. The Ryanair decision has no long-term standing because courts have long and de facto sided with allowing users data to become the property of whatever website with which a user interacts.

  • @goodfortunetoyou
    @goodfortunetoyou 5 месяцев назад +1

    The web server is fulfilling a web request. They could require identification prior to sending the data if they don't want people "scraping" it.

    • @thesenamesaretaken
      @thesenamesaretaken 5 месяцев назад

      This was my thought. Scraping boils down to downloading an html file and parsing it. It's too trivial to be illegal.

    • @realEchoz
      @realEchoz 5 месяцев назад

      @@thesenamesaretaken i appreciate your optimism but i wouldn't put it past lawmakers

    • @IamFrancoisDillinger
      @IamFrancoisDillinger 5 месяцев назад

      This was always my assumption: If the data is behind a paywall, login, etc. Then you can't scrape it legally. If you can access it without any form of logging in or identification then do what you will. But I guess I'm wrong.

  • @saBasitMireasa
    @saBasitMireasa 4 месяца назад

    1:29 thank you for putting Romanian currency in your video when talking about the black market

  • @Chris-qg6kc
    @Chris-qg6kc 5 месяцев назад

    The true story is that one of the top execs at Booking, commented on a flight and spoke harshly of the landing. With the world being so small, word got around and, well, there you go.

  • @AntonioZL
    @AntonioZL 5 месяцев назад

    This makes absolutely no sense. A website provides the data freely for web browsers to render and interact with their back end. I'm just extracting relevant information from it programmatically. I could literally open the source code for a page and collect it manually.

  • @negritoojosclaros
    @negritoojosclaros 5 месяцев назад +2

    "thanks for watching, and I will need you to testify in court!🥰🥰"

  • @clray123
    @clray123 5 месяцев назад

    Whether something is legal or not mostly comes down to whether it is a powerful party doing it against a powerless one or vice versa.