Python AI Web Scraper Tutorial - Use AI To Scrape ANYTHING

  • Published: Dec 29, 2024

Comments • 209

  • @TechWithTim
    @TechWithTim  3 months ago +12

    GET MY FREE SOFTWARE DEVELOPMENT GUIDE👇
    training.techwithtim.net/free-guide

    • @JTient
      @JTient 3 months ago

      Yeah I was making a web crawler with AI and found out you can get banned by an ISP.

    • @polashbanik53
      @polashbanik53 3 months ago

      sure man i will join with you

  • @norminemralino2260
    @norminemralino2260 3 months ago +17

    Tim is back with a banger tutorial! This is the kind of project/tutorial that made me subscribe to Tech With Tim in the first place. He takes a fairly complicated task and figures out how to make the task not as hard or doable. I’m really happy that he’s finally using Streamlit. It was something I commented and asked for a few projects back. Can you imagine how much worse it would be if Tim was just taking input and printing out content directly from a console? Anyway great job on this vid. I’m looking forward to the next one

  • @GokuDoku4
    @GokuDoku4 3 months ago +17

    I am creating one too using actual HTML, CSS and JavaScript and I am having a lot of fun coding this! Keep it up 😁👍

  • @onealoneal7047
    @onealoneal7047 4 months ago +3

    No need to watch till the end; you always provide great content. Thank you. Keep working.

  • @tengdayz2
    @tengdayz2 4 months ago +2

    Thank you. I like that you give us alternative suggestions to your sponsor, but still invaluably represent them. Tim's gotta eat too, but you seem to get that having fun with it all comes first.

  • @ayanjawaid2251
    @ayanjawaid2251 4 months ago +83

    idk if you will believe this... but yesterday I asked GPT to give me a unique idea and it gave me this exact idea related to web scraping... Streamlit too 😮😮😮.... you are a mind reader, Tim

    • @Rahul-ce9yz
      @Rahul-ce9yz 4 months ago

      You are hacked😂😂

    • @Abderraouf_IDEL
      @Abderraouf_IDEL 4 months ago +22

      Or he also asked chatgpt the same question and did a video about it for his sponsorship

    • @Nawdog
      @Nawdog 2 months ago +2

      He likely used ChatGPT or their api to develop the idea. It’s trained on user data, I had several projects that I never saw anywhere being released about a week after talking to gpt about it.

    • @RhumpleOriginal
      @RhumpleOriginal 1 month ago +3

      @@Nawdog you have to turn off the setting that allows them to use what you discuss.

    • @arson1340
      @arson1340 20 days ago

      it is the algorithm

  • @RaghavKumarR
    @RaghavKumarR 4 months ago +1

    Your content is high quality and top notch. Fantastic one, brother, keep doing more stuff like this. Love to see it and really, really appreciate it

  • @jordanjackson6151
    @jordanjackson6151 3 months ago

    So glad for this recent upload! Web Scraping is a little iffy to do since last year. Gotta stay updated.

  • @JitheshPs-r2t
    @JitheshPs-r2t 4 months ago +10

    very well explained even a beginner could understand and great content you just earned a new subscriber :)

  • @willhelliwell
    @willhelliwell 15 days ago

    This got me a long way towards what I needed. Thank you! Bit of AI help and I can now scrape iFrames inside the site too.

  • @KinoInsight
    @KinoInsight 2 months ago +9

    Wonderful video. But can you also post a web scraping tutorial without Bright Data? Just looking to save cost.
    But i like the way you teach - simple and easy descriptions supported by context specific highlights.
    Thank you.

  • @GJRahul-rr3uk
    @GJRahul-rr3uk 11 days ago

    👏👏👏
    Helped me a lot!! Learned a lot, and keep posting such content.
    Your channel is a blessing

  • @bidam224
    @bidam224 2 months ago

    This is what I was looking for and now I see it on my recommended screen. Thanks!

  • @tomwawer5714
    @tomwawer5714 1 month ago

    Great content! I suggest saving the HTML to a file and testing the bs4 code against that file, to avoid being blocked by the website.
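
A minimal sketch of this suggestion, using only the standard library. The helper name `load_or_fetch` and the fake fetcher are hypothetical; in a real project the `fetch` argument would be whatever function downloads the page (for example the Selenium scrape from the video).

```python
import tempfile
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return cached HTML if the file exists; otherwise call fetch() once and save it."""
    path = Path(cache_path)
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch()
    path.write_text(html, encoding="utf-8")
    return html


# Demo with a fake fetcher so nothing touches the network.
calls = []


def fake_fetch():
    calls.append("fetched")
    return "<html><title>Demo</title></html>"


with tempfile.TemporaryDirectory() as folder:
    cache_file = Path(folder) / "page.html"
    first = load_or_fetch(cache_file, fake_fetch)   # fetches and writes the file
    second = load_or_fetch(cache_file, fake_fetch)  # served from the file
```

The returned string can then be handed to the parsing code (for instance `BeautifulSoup(html, "html.parser")`) as many times as needed while the site itself is contacted only once.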

  • @SimoSimo-o8v
    @SimoSimo-o8v 2 months ago +1

    From watching one free video, I made more than from a month of a paid course on spread trading! Keep up the good work helping people get out of poverty! God grant you health and a long life!!!

  • @BenRogersWPG
    @BenRogersWPG 4 months ago

    Very cool concept and great code walkthrough Tim!

  • @ErebeForgeLabs
    @ErebeForgeLabs 4 months ago

    you got some powers of reading minds bro , thank you so much...

  • @Asparuh.Emilov
    @Asparuh.Emilov 4 months ago +1

    One of the most useful videos on YouTube ever! Thank you so much bro! 👏🏻👏🏻👏🏻♥️♥️♥️

  • @cornelisderuiter4279
    @cornelisderuiter4279 4 months ago +5

    Actually busy with a project like this atm. This is great thanks Tim.

    • @TechWithTim
      @TechWithTim  4 months ago

      Cool let me know how yours compares!

    • @Virgilplaydirty
      @Virgilplaydirty 16 days ago

      @@TechWithTim Tim, when i try and parse content after giving instructions to the llm it does not work, it just resets the whole process of scraping. what do i do?

  • @Leonardo_A1
    @Leonardo_A1 3 months ago

    WOW, one of the best videos about using and developing with AI for developers (consultants) like me. Thanks a lot for this great video.
    I will use this case to build on and extend it a little bit.
    Have a great and peaceful time. Best regards from Germany. CU Leonardo

  • @hsimosa
    @hsimosa 2 months ago

    Excellent Tim. Thanks for this tutorial.

  • @amberforrester.m
    @amberforrester.m 1 month ago

    Incredible tutorial! Thank you for this!!!!

  • @hakanyuceturk5989
    @hakanyuceturk5989 2 months ago

    perfect explanation and great content. narration is great for all levels I think.

  • @web3jerry
    @web3jerry 3 months ago +1

    I'm really learning a lot from you man 🥺 alongside a course I'm taking here on YouTube by a YouTuber. I've always wanted to know how to code, yeah, and I love anything "AUTOMATION" & "BOTS", call me crazy 😂😂😂

  • @jorper98
    @jorper98 1 month ago

    Fantastic content. Very well laid out session!! Thank you, great work! New sub!

  • @GrantNaylor-b8l
    @GrantNaylor-b8l 3 months ago

    Such a good and practical example! I've managed to build something entirely different with Ollama 3.1 ;-)

  • @dimox115x9
    @dimox115x9 3 months ago +1

    Thank you very much Tim, that's helpful, I love these kind of projects, keep up the good work :)

  • @k3kssks
    @k3kssks 2 months ago

    Really practical project. Thanks a lot !

  • @enderboy175
    @enderboy175 2 months ago

    WTH man this vid is a dub its 🔥🔥

  • @pythonenthusiast9292
    @pythonenthusiast9292 4 months ago +1

    can you make more such vids of this python + ai combination? these are awesome

  • @helloansuman
    @helloansuman 3 months ago +3

    Great work. Now let's scrape the whole website instead of only 1 page.

  • @hubaibm5529
    @hubaibm5529 3 months ago +1

    Hey, great tutorial. Just a quick question, why not use undetected chromedriver package instead of normal selenium? Among other advantages, unlike this method, in uc you won't need to download chromedriver again and again when the chrome gets updated.

  • @iosule3719
    @iosule3719 3 months ago +2

    Hello Tim, I'm actually shocked to see what Streamlit is capable of after months of trying to build complex projects with Flask. Btw, I did finish building my site: it's a website that allows anonymous posts, everything is stored in a MySQL database, and I used PythonAnywhere to host it. My question is, should I quit Flask and start with Streamlit, or stick with Flask?
    Mainly, I want to focus more on the backend, like advanced database features and more.

  • @madeshvaithya8046
    @madeshvaithya8046 4 months ago

    Your videos keep me away from playing PUBG bro😂😂

  • @aibeginnertutorials
    @aibeginnertutorials 3 months ago

    Excellent tutorial Thanks!

  • @kenchang3456
    @kenchang3456 3 months ago

    Thanks for this, I really appreciate your work. And good luck and much success in Dubai.

  • @abtoog
    @abtoog 2 months ago

    great video, thanks for sharing.

  • @jacobdebrone
    @jacobdebrone 4 months ago

    Wow this is really creative .

  • @OnePieceShortGamer
    @OnePieceShortGamer 3 months ago

    bro did ig in the most old school way as possible

  • @hemantchawla
    @hemantchawla 22 days ago

    If the page has dynamic content which gets loaded on clicking tabs, accordions, this will need further enhancements. Also, if you want to generalize it for multiple websites, it will be way more complicated.

  • @AdityaRaj-s6j
    @AdityaRaj-s6j 3 months ago

    Tim cooking everytime 🔥

  • @CodexOdyssey
    @CodexOdyssey 3 months ago +4

    Brother, please make a video teaching about making an AI chatbot to control API and database.

  • @srikanthkoltur6911
    @srikanthkoltur6911 4 months ago +1

    Thanks Tim, it's helpful. Currently we paste bits of HTML into ChatGPT and get the right tags from it to build scrapers quickly.
    But now I will plug in LLMs and try.
    It's just that LLMs are very expensive lol 😅

    • @TechWithTim
      @TechWithTim  4 months ago

      You can run them locally!

  • @polashbanik53
    @polashbanik53 3 months ago

    Thank you. Keep working.

  • @doitdifferent3856
    @doitdifferent3856 2 months ago

    39:30 could you also mention the way we can parallelize it
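
One way this could be parallelized, sketched under the assumption that the page text has already been split into chunks and that each chunk is parsed by an independent model call. `parse_chunk` is a hypothetical stand-in for that call, not the function from the video.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(chunk, description):
    """Stand-in for the per-chunk LLM call; replace with the real model invocation."""
    return f"[{description}] {len(chunk)} chars"


def parse_in_parallel(chunks, description, max_workers=4):
    """Parse every chunk concurrently and return results in the original chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if chunks finish out of order.
        return list(pool.map(lambda c: parse_chunk(c, description), chunks))


results = parse_in_parallel(["aaaa", "bb", "cccccc"], "extract prices")
```

Threads suit this case because the work is mostly waiting on the model server. Whether it is actually faster depends on whether that server handles several requests at once; a local model on a single GPU may still process them one after another.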

  • @AgGh-c5s
    @AgGh-c5s 3 months ago +1

    Please what is the best 'python for financial analysis and algotrading course' ???

  • @areebashakeel2042
    @areebashakeel2042 25 days ago +1

    I have cloned this repo and it gives the following error whenever I try to scrape any website, even the same one that you scraped in the overview:
    AttributeError: 'NoneType' object has no attribute 'startswith'
    What is the issue?
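
The error message alone does not identify the cause. One common way to get it, offered here only as an assumption, is a setting read with `os.getenv` (such as the remote browser URL) that is unset, so `None` is passed to code that calls `.startswith` on it. A sketch of a guard that fails early with a clearer message; the variable name `SBR_WEBDRIVER` is used for illustration and may not match your setup.

```python
import os


def require_env(name):
    """Return the environment variable's value, or fail early with a readable message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; check your .env file")
    return value


# Illustrative value only; a real endpoint would come from your own account or .env file.
os.environ["SBR_WEBDRIVER"] = "https://example.invalid:9515"
endpoint = require_env("SBR_WEBDRIVER")
```

If the guard raises, the fix is in the configuration rather than in the scraping code.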

  • @sumdeo23
    @sumdeo23 1 month ago

    Great tutorial! I wanted to implement this to parse additional pages (numerically paginated e.g., 1, 2, 3, 4). How to?

  • @niloben659
    @niloben659 1 month ago

    Nice, can this project be deployed on netlify?

  • @NoName-qp7hq
    @NoName-qp7hq 2 months ago +1

    Bro that was my startup 😭😭

  • @hellothere31839
    @hellothere31839 4 months ago

    Great video, is there any way to use Bright Data without having a business email?

  • @uf9927
    @uf9927 3 months ago

    Hi, what is the use of these lines:
    for script_or_style in soup(["script", "style"]):
        script_or_style.extract()
    As per my understanding, "script_or_style" was never used for anything
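
In Beautiful Soup, calling `soup([...])` is shorthand for `soup.find_all([...])`, and `.extract()` removes each matched tag from the tree. The loop variable is not used afterwards because the loop exists for its side effect: deleting `<script>` and `<style>` elements so their contents do not end up in the extracted text. The sketch below reproduces that effect with the standard library's `html.parser` instead of bs4, so it is an analogue of those two lines, not the video's code.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text while skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Price: $5</p><script>track()</script></body></html>"
)
parser = VisibleText()
parser.feed(html)
parser.close()
visible = " ".join(parser.parts)
```

Without the skip logic, the CSS rule and the JavaScript call would be mixed into the text that is later sent to the model.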

  • @akshajande0519
    @akshajande0519 4 months ago

    just started watching.. hope i can get something out of it!

  • @Leonardo_A1
    @Leonardo_A1 3 months ago

    PLEASE , let us know which kind of machine (PC or Docker you use) .. THANKS a lot for your very cool videos. CU Leonardo

  • @Thazze00
    @Thazze00 3 months ago

    What theme do u use for VS Code? I liked it a lot :D

  • @kartikbhatnagar2219
    @kartikbhatnagar2219 2 months ago +1

    Question: Do we need to download the Ollama model every time we run it?

  • @fernandocorrales6028
    @fernandocorrales6028 4 months ago +1

    Next time, could you talk about decorators associated to a Class ?

  • @faisalishfaqciiisilver275
    @faisalishfaqciiisilver275 3 months ago +2

    Bro can I scrape more than 25000 rows from any website using this?

  • @jainamparekh3402
    @jainamparekh3402 2 months ago

    Can we scrape Google maps from this ?
    Because it does rendering only when we scroll down.
    Would it be able to get whole Dom at once without scrolling ?

  • @profesor6885
    @profesor6885 3 months ago +1

    Hey, I wanted to know if we could do this with our own chromedriver. When I tried using a local chromedriver, I just couldn't get the same HTML page source that the SBR connection did, so I got no HTML content. So I wanted to know if I could AI scrape using our own Chrome driver. There are some things only we can do, say logging into a page, which is allowed only because we had logged in once from this IP but wouldn't be possible over the SBR connection. I struggled for days to get that working; if it is possible, please help me out :))

  • @caokhoatrinh9121
    @caokhoatrinh9121 4 months ago

    Bro, have you thought of publishing the project idea and tech stack beforehand in your Discord, so that everyone can try working on it before you publish these tutorials?
    Btw, thank you so much. I've learned a lot by following your GitHub and Discord

  • @0xQwerty-x5e
    @0xQwerty-x5e 2 months ago

    Is it possible to use this as a template to create a chatbot that can scrape e-books online and return them as downloadable files?

  • @moroccangamereviews8824
    @moroccangamereviews8824 3 months ago +1

    Thanks for the great content! But I'm facing an issue with a website that limits the number of requests. How could I bypass it? Thanks, community
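
Rather than bypassing a request limit, a scraper can slow down and retry. A minimal sketch of exponential backoff, assuming the fetch function signals a rate-limit response by raising an exception; `RateLimited`, `fetch_with_backoff`, and the fake fetcher are hypothetical names for illustration.

```python
import time


class RateLimited(Exception):
    """Raised by the fetch function when the site answers with a rate-limit error."""


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Demo: a fake fetcher that is rate limited twice, then succeeds.
attempts = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited("429 Too Many Requests")
    return "page html"


waits = []  # record the delays instead of actually sleeping
result = fetch_with_backoff(flaky_fetch, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the demo instant; in real use the default `time.sleep` applies the delays.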

  • @kodiak809
    @kodiak809 4 months ago +1

    OllamaLLM is run locally right? that means you can't deploy this?

  • @kangdanlin
    @kangdanlin 2 months ago

    Hello, can i scan facebook marketplace real estate ads with it, or does it need more coding?

  • @necuspam
    @necuspam 22 days ago

    How does this perform against sites that use a "robots" file? It is hard to believe it can scrape sites such as Amazon, eBay or similar ad pages

  • @latlov
    @latlov 3 months ago

    How about scraping for Google Maps' reviews of multiple places for a given area? Make a tutorial about it, plz

  • @halloheinz
    @halloheinz 3 months ago

    Hi Tim, great content. I noticed your vscode shows more docs than mine when hovering over the syntax. for example when hovering over ChromeOptions() nothing shows for me but for you it does. Any tips on that?

  • @jandrinux
    @jandrinux 1 month ago

    i like this video men!!!

  • @siddhubhai2508
    @siddhubhai2508 4 months ago +1

    Hmmm, time to build my perplexity, some modifications and prompt engineering, and way far better than perplexity, isn't it!!

  • @ForRo3sS
    @ForRo3sS 4 months ago

    I moved to Abu Dhabi, Tim! I wish to improve in coding and hopefully get a job. Right now I can't afford Dubai, unfortunately. But I wish you luck! And thank you a lot for the project :):)

  • @salmesfer52
    @salmesfer52 1 month ago

    Will we be able to download the table as an Excel file?
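
One possible approach, assuming the parsed result is available as rows of cells: write it out as CSV, which Excel opens directly. This sketch uses only the standard library; the row data is made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows into CSV text, quoting cells that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Name", "Price"], ["Widget, large", "9.99"]]
csv_text = rows_to_csv(rows)
```

In a Streamlit app, text like this could be offered through `st.download_button` with a `.csv` file name. A true `.xlsx` file would need an additional library.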

  • @saiavinash8547
    @saiavinash8547 3 months ago

    how do you access or interact with elements that are present inside shadow dom.

  • @fastmamajama
    @fastmamajama 4 months ago

    Good stuff. I am using a script to capture UFOs using OpenCV datasets and Ollama. I am having a little trouble getting the right answer from Ollama. It always gives different answers. I've got to figure out how to get a yes or no answer.
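
A sketch of one way to get a yes or no out of a free-text reply: tell the model in the prompt to answer only "yes" or "no", then normalize whatever comes back and treat anything unclear as a third state. The function name `to_yes_no` is hypothetical.

```python
import re


def to_yes_no(reply):
    """Map a free-text model reply to True, False, or None when it is ambiguous."""
    words = re.findall(r"[a-z]+", reply.lower())
    has_yes = "yes" in words
    has_no = "no" in words
    if has_yes == has_no:
        return None  # neither word or both words present: unclear, so retry or log it
    return has_yes


answers = [
    to_yes_no("Yes, that looks like one."),
    to_yes_no("No."),
    to_yes_no("I am not sure"),
    to_yes_no("yes and no"),
]
```

Matching whole words avoids reading "not" or "know" as "no". Lowering the sampling temperature, if your client exposes it, may also make the replies more repeatable.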

  • @showbikshowmma3520
    @showbikshowmma3520 2 months ago

    How does the scraping technique utilize website links to parse data, particularly in relation to the rules set by robots.txt files?

    • @webscrapingseniors
      @webscrapingseniors 2 months ago

      Even if a site uses robots.txt warnings, you can scrape the site as long as you extract only information that is available to the public and avoid exhausting their servers with too many requests within short intervals (less than 10 seconds).
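
For reference, a sketch of how a scraper can read robots.txt rules before requesting a page, using the standard library's `urllib.robotparser`. The robots.txt content, the user agent name, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example rules; a real scraper would download the site's own /robots.txt.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed_public = parser.can_fetch("my-scraper", "https://example.com/products")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")
delay = parser.crawl_delay("my-scraper")  # seconds to wait between requests, if declared
```

A scraper can skip any URL for which `can_fetch` is false and sleep for `delay` seconds between requests when the site declares one.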

  • @lebayati2059
    @lebayati2059 3 months ago

    Is it possible to download a driver for Microsoft Edge and do everything for Edge instead?

  • @deepaklachman9340
    @deepaklachman9340 1 month ago

    Could you use this for twitter??

  • @ngonibenjamin5955
    @ngonibenjamin5955 4 months ago

    great tutorial. But how do u handle the issue of pagination? Scrapers tend to grab only the first page of search results

    • @DaleIsWigging
      @DaleIsWigging 3 months ago +2

      usually it's just a number that has changed in the url
      e.g.
      baseURL/searchPage=0 becomes baseURL/searchPage=1
      so just do a for loop to loop through them all
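
The loop described in this reply can be sketched as follows. The URL pattern is the commenter's example; real sites differ (a query string such as `?page=2` is also common), so the format is an assumption to adjust per site.

```python
def page_urls(base_url, first_page, last_page):
    """Build one URL per results page by changing only the page number."""
    return [f"{base_url}/searchPage={n}" for n in range(first_page, last_page + 1)]


urls = page_urls("https://example.com", 0, 2)
for url in urls:
    # Each URL would be passed to the scraping function in turn.
    print(url)
```

Scraping then becomes a matter of running the single-page routine once per URL and combining the results.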

    • @ngonibenjamin5955
      @ngonibenjamin5955 3 months ago

      @@DaleIsWigging good point. Makes sense thank you

  • @edgargill7828
    @edgargill7828 2 months ago

    Does this work with a site you need to be logged in to in order to scrape the data? What about clicking a link and then getting the data from that page?
    Thanks!

    • @webscrapingseniors
      @webscrapingseniors 2 months ago

      Yes, you can scrape data from a site that requires login, but you'll need to handle the authentication process first. Here’s how you can approach it without AI:
      Login Automation: Use libraries like requests or Selenium to automate the login process. With Selenium, you can simulate clicks and fill out forms as if you were manually logging in.
      Navigate to the Page: Once logged in, you can navigate to the desired page and extract the data. If the data is behind a link, you can use Selenium to click the link and then scrape the content from that page.
      Scraping Data: After reaching the target page, use BeautifulSoup or another scraping library to extract the information you need.
      If you need help with code snippets for any of these steps, just let me know
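
The login step above can be sketched with the standard library in place of requests or Selenium. This only builds the pieces and sends nothing; the login URL and the form field names (`username`, `password`) are assumptions that must be read from the real site's login form, and sites that log in through JavaScript would still need a browser tool such as Selenium.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that stores cookies, so a login carries over to later requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def build_login_request(login_url, username, password):
    """Build (but do not send) a form POST; the field names are assumptions."""
    form = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(login_url, data=form, method="POST")


session = make_session()
login = build_login_request("https://example.com/login", "alice", "s3cret")
# session.open(login) would send the form; later session.open(...) calls
# reuse the cookies that the site set during login.
```

After a successful login, the HTML of the protected page can be fetched through the same session and passed to the parsing step.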

  • @Akriti-at22
    @Akriti-at22 3 months ago

    This is very interesting

  • @CynthiaWong-l9r
    @CynthiaWong-l9r 1 month ago +2

    It just didn't work. I followed every step, but it just didn't work. It's not the first time I have followed your script and it didn't work; very frustrating. The browser pops up, but the HTML only appears in the web page, not in the terminal. I tried several websites, including Tim's website, and it just didn't work. I spent a whole day on it, so disappointing

  • @aaronhehe7311
    @aaronhehe7311 3 months ago

    Why am I not able to pip install the requirements, I copied and pasted but it's green instead of yellow and underlined like in the video

  • @DeviceDuo-sl9rb
    @DeviceDuo-sl9rb 4 months ago

    Does this project work as a scrape for social media sites?

  • @explosiveenterprises1479
    @explosiveenterprises1479 2 months ago

    I'd like to figure out how to do something like this but on a site behind a login.

  • @satisfyingly1
    @satisfyingly1 2 months ago +2

    3:23 It is always good to mention the versions of the Python packages.
    Otherwise, when someone tries to set up this project after a long time, there may be an issue with a version that isn't compatible with the program
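
A small sketch of recording the installed versions in `name==version` form, the format a pinned requirements.txt uses. It relies on `importlib.metadata` from the standard library; the package names passed in are up to the project and the one in the demo is deliberately fake.

```python
from importlib import metadata


def pinned_lines(package_names):
    """Return one 'name==version' line per installed package, flagging missing ones."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name} is not installed")
    return lines


missing_demo = pinned_lines(["definitely-not-a-real-package-xyz"])
```

Running `pip freeze > requirements.txt` in the project's virtual environment produces a similar pinned list for everything installed.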

  • @KimDuyenNguyenThi-o4m
    @KimDuyenNguyenThi-o4m 2 months ago

    How can we solve a CAPTCHA in an app (not web)? Can Bright Data do it?

  • @RealPolitik-dy4it
    @RealPolitik-dy4it 3 months ago +2

    Brightdata no longer has the CAPTCHA bypass code

  • @Anesu-nv1mh
    @Anesu-nv1mh 1 month ago

    can it scrape photos and videos also and get it downloaded ?

  • @iMSps17
    @iMSps17 2 months ago

    What are the benefits of web scraping?

  • @Leonardo_A1
    @Leonardo_A1 3 months ago

    One comment... I first saw your Shorts video on YT and had some problems finding this video. YT makes it hard to find the long version of a video, so maybe a bit more explanation of how to find it would be nice.

  • @messi8ballon_dor
    @messi8ballon_dor 3 months ago +1

    I saw your computer name and then I just have updated my Macbook name to Messi-Macbook-Pro-M1-Max

  • @owentheoutlaw_
    @owentheoutlaw_ 3 months ago +1

    I created a similar project 2 weeks ago that is more robust and powerful called Cyber-Scraper 2077, uses the similar approach!

  • @sandropxd
    @sandropxd 3 months ago

    Hi! How long does it usually take to parse the content? It says it's parsing, but it never gives me a response. I'm using Ollama 3.1 on Windows, and it either takes forever or doesn't work at all.

    • @TechWithTim
      @TechWithTim  3 months ago

      Depends on the size of the site. Can be minutes if it’s a huge DOM
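
To see why a large DOM takes longer, here is a sketch that assumes the page text is sent to the model in fixed-size pieces, one model call per piece. The function name and the 6000-character default are illustrative assumptions, not confirmed settings.

```python
def split_dom_content(dom_content, max_length=6000):
    """Cut the page text into consecutive pieces of at most max_length characters."""
    return [
        dom_content[i:i + max_length]
        for i in range(0, len(dom_content), max_length)
    ]


small = split_dom_content("abcdefgh", max_length=3)
big_page_calls = len(split_dom_content("x" * 120_000))  # number of model calls needed
```

Under these assumptions a 120,000-character page needs 20 separate model calls, so parse time grows roughly in proportion to page size.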

  • @Praudyogiki_Sangankyantram108
    @Praudyogiki_Sangankyantram108 1 month ago

    Hey, build an e-commerce price comparison using web scraping

  • @Jason-wm5qe
    @Jason-wm5qe 3 months ago +1

    I’ve been using Browserbase instead of hosting Chromium

  • @agasobanuyerockykirabiranya
    @agasobanuyerockykirabiranya 3 months ago +2

    Brooo please drop your “buy me coffee”

  • @EngineerK
    @EngineerK 2 months ago

    How would you get past websites with 2FA (either authenticator or SMS)?

  • @stargeneralltd
    @stargeneralltd 2 months ago

    Thanks Tim for this awesome tut, Strange though i am not able to get the code in the repo to work, keep getting errors like: WebDriverException: Message: Wrong customer name. Any ideas anyone?

    • @aasiyaansari7886
      @aasiyaansari7886 2 months ago

      I'm having the same issue and i have no idea how to solve it. Also the code i got from BrightData is very different from what Tim got. My code doesn't even have a captcha solver.

  • @lyricreationz2137
    @lyricreationz2137 3 months ago

    Is it legal to use in Final year college project

  • @jkscout
    @jkscout 1 month ago

    why doesn’t it work when you hit the parse content button the first time?