Reda Marzouk
Reda Marzouk
  • Видео 42
  • Просмотров 187 725
This AI Agent can Scrape ANY WEBSITE!!!
In this video, we'll create a python script together that can scrape any website with only minor modifications
________ 👇 Links 👇 ________
🤝 Discord: discord.gg/jUe948xsv4
💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/
📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa
🤖 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos
Website: www.automation-campus.com/
FireCrawl: www.firecrawl.dev/
Github repo: github.com/redamarzouk/Scraping_Agent
________ 👇 Content👇 ________
Introduction to Web Scraping with AI - 0:00
Advantages Over Traditional Methods - 0:36
Overview of FireCrawl Library - 1:13
Setting Up FireCrawl Account and API Key - 1:24
Scraping with FireCrawl : Example and Explanation - 1:36
Universal Web Scraping A...
Просмотров: 46 965

Видео

NEW GPT-4o: Prepare to be SHOCKED!!
Просмотров 1,5 тыс.2 месяца назад
In this video, we dive into the launch of GPT-4o by OpenAI, covering its new features and capabilities. We'll check out its real-time conversational speech, top-notch benchmark performance, and availability for free users. Stick around as we react to live demos and chat about it. 👇 Links 👇 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: instagram....
Llama 3 FULLY LOCAL on your Machine | Run Llama3 locally
Просмотров 1,1 тыс.3 месяца назад
FULLY Local Llama 3, on your machine. Run Llama 3-8B in a local server and integrate it inside your AI Agent project. 👇 Links 👇 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos LMStudio: lmstudio.ai/ Ollama: ollama.com/ www.automation-campus.com/ Introduction Llama3: 00:00...
Llama 3 BREAKS the industry !!! | Llama3 fully Tested
Просмотров 2,4 тыс.3 месяца назад
FULLY Tested Llama 3, the flagship model from Meta. Benchmark of GPT-4 vs GPT-4 Turbo vs Llama 3. 👇 Links 👇 lmstudio.ai/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.com/ 👇 Content👇 00:00 Introduction to Llama3 00:30 All you need to know about Lla...
GPT-4 Surpassed Claude 3 (Again) | GPT-4 Turbo fully tested
Просмотров 3,2 тыс.3 месяца назад
FULLY Tested GPT-4 Turbo, the flagship model from OPENAI. Benchmark of GPT-4 vs GPT-4 Turbo 👇 Links 👇 lmstudio.ai/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.com/ 👇 Content👇 00:00 Introduction and AI Week News 00:11 Launch of a new AI model by M...
AUTOGEN STUDIO : The Complete GUIDE (Build AI AGENTS in minutes)
Просмотров 9 тыс.3 месяца назад
The full guide to get started with Autogen Studio, Create Powerful AI Agents in a couple of minutes with real life projects. 👇 Links 👇 lmstudio.ai/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.com/ 👇 Content👇 00:00 Introduction to Agentic Workflow...
Easily Run LOCAL Open-Source LLMs for Free
Просмотров 3 тыс.4 месяца назад
Run locally hosted open source LLM for free. LMStudio helps you download and run private Models from huggingFace in a no code environment, it's a solid Free Chatgpt alternative. 👇 Links 👇 lmstudio.ai/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.c...
Elon Does The Unthinkable, Grok-1 is officially the LARGEST Open Source mode!!
Просмотров 2,7 тыс.4 месяца назад
Elon musk has just launched Grok-1 to the rest of the world. #elonmusk #grok #chatgpt #openai 👇 Links 👇 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.com/downloads
This Agent can create Dalle Images at SCALE!!
Просмотров 7544 месяца назад
Agent to generate images on Dalle 3 automatically. #chatgpt #gpt #dalle3 #automation 👇 Websites👇 Cloud.uipath.com 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos www.automation-campus.com/downloads 👇 Content👇 00:00 Dalle Agent 01:05 Prerequisites 02:34 Agent steps 06:39 R...
The HARSH REALITY of being an RPA Developer!!
Просмотров 3,5 тыс.4 месяца назад
Introducing the latest innovation in AI technology: Digital Agents that can control your desktop! With ChatGPT, you can now have a virtual assistant that can perform tasks on your computer, just by chatting with it. and this one can operate all of your desktop/web apps. 👇 Websites👇 github.com/OthersideAI/self-operating-computer 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/re...
This AI Agent can CONTROL your ENTIRE DESKTOP!!!
Просмотров 9 тыс.4 месяца назад
Introducing the latest innovation in AI technology: Digital Agents that can control your desktop! With ChatGPT, you can now have a virtual assistant that can perform tasks on your computer, just by chatting with it. and this one can operate all of your desktop/web apps. 👇 Websites👇 github.com/OthersideAI/self-operating-computer 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/re...
10x your PRODUCTIVITY with this NEW AI tool !!!
Просмотров 15 тыс.5 месяцев назад
Improve your productivity with this amazing new AI tool! This tutorial will show you how to use this tool to copy and paste from any document, screen, or application. Say goodbye to time-consuming tasks and hello to increased efficiency with this game-changing tool! 👇 Websites👇 www.uipath.com/product/clipboard-ai 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/...
A Free Personal AI Agent that actually WORKS!!!
Просмотров 27 тыс.5 месяцев назад
Learn about the future of digital automation with autonomous agents, large action models. Discover how these technologies are transforming industries and improving efficiency and productivity. Don't get left behind, stay ahead of the game and find out what the future holds for digital automation! 👇 Websites👇 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻...
UiPath joins Large Action Model Race
Просмотров 1,9 тыс.5 месяцев назад
In this video you'll learn how to create a robot to fill forms automatically on any website and with only minimal changes. 👇 Websites👇 www.automation-campus.com/downloads cloud.uipath.com/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos 👇 Content👇 00:00 Intro 00:49 Data E...
UiPath made PDF Extraction a lot easier - Document Understanding UiPath
Просмотров 2,1 тыс.6 месяцев назад
Extract any pdf using 4 simple UiPath Activities, follow the steps in the video and you'll have a single process to interact with any pdf file. 👇 Websites👇 www.automation-campus.com/downloads cloud.uipath.com/ 🤝 Discord: discord.gg/jUe948xsv4 💼 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻: www.linkedin.com/in/reda-marzouk-rpa/ 📸 𝗜𝗻𝘀𝘁𝗮𝗴𝗿𝗮𝗺: redamarzouk.rpa 𝗬𝗼𝘂𝗧𝘂𝗯𝗲: www.youtube.com/@redamarzouk/videos 👇 Content👇 00:...
UiPath Queues For Beginners
Просмотров 2,3 тыс.9 месяцев назад
UiPath Queues For Beginners
UiPath Errors Troubleshoot - The only Trick you'll EVER need
Просмотров 8439 месяцев назад
UiPath Errors Troubleshoot - The only Trick you'll EVER need
Object reference not set to an instance of an object - UiPath - BEST SOLUTION!!!
Просмотров 8 тыс.10 месяцев назад
Object reference not set to an instance of an object - UiPath - BEST SOLUTION!!!
ChatGPT API Advanced Configuration | GPT-4 API Pricing
Просмотров 66610 месяцев назад
ChatGPT API Advanced Configuration | GPT-4 API Pricing
REVOLUTIONARY!!!!! UiPath Document Understanding and Generative AI - Invoice Data Extraction
Просмотров 4,6 тыс.11 месяцев назад
REVOLUTIONARY!!!!! UiPath Document Understanding and Generative AI - Invoice Data Extraction
Don't Miss Out: UiPath Document Understanding and Generative AI. (Game-Changer!!!!)
Просмотров 3,5 тыс.11 месяцев назад
Don't Miss Out: UiPath Document Understanding and Generative AI. (Game-Changer!!!!)
UiPath Advanced Certification | Activities and Properties Part 2 | Questions And Answers
Просмотров 38211 месяцев назад
UiPath Advanced Certification | Activities and Properties Part 2 | Questions And Answers
UiPath Advanced Certification | Activities and Properties | Practice Test Solutions
Просмотров 37111 месяцев назад
UiPath Advanced Certification | Activities and Properties | Practice Test Solutions
UiPath Advanced Certification | UiPath Studio | UiPath Practice Exam
Просмотров 50811 месяцев назад
UiPath Advanced Certification | UiPath Studio | UiPath Practice Exam
UiPath Advanced Certification | State Machine, Flowchart and Sequence in UiPath | UiPath RPA
Просмотров 46811 месяцев назад
UiPath Advanced Certification | State Machine, Flowchart and Sequence in UiPath | UiPath RPA
UiPath Advanced Certification | How to get certified in UiPath RPA in 2023
Просмотров 1,7 тыс.Год назад
UiPath Advanced Certification | How to get certified in UiPath RPA in 2023
Resume Screener - Extract data from CV PDF documents using UiPath and ChatGPT
Просмотров 2,7 тыс.Год назад
Resume Screener - Extract data from CV PDF documents using UiPath and ChatGPT
UiPath - Download File From URL | How to download file from website using UiPath
Просмотров 6 тыс.Год назад
UiPath - Download File From URL | How to download file from website using UiPath
UiPath Excel Add-In | UiPath Attended Automation inside Excel
Просмотров 677Год назад
UiPath Excel Add-In | UiPath Attended Automation inside Excel
Top 3 Changes in the MODERN Design of UiPath Studio
Просмотров 535Год назад
Top 3 Changes in the MODERN Design of UiPath Studio

Комментарии

  • @clarkzara15
    @clarkzara15 3 дня назад

    I'm currently working on a project where I need to classify documents, but I'm encountering an error every time I use the "Classify Document" activity. The error message I receive is: "Cannot authenticate, not connected to Orchestrator. Exception details: The document understanding service url was not provided." I have connected my UiPath Assistant to Orchestrator, and my machine appears as connected in the Orchestrator dashboard. However, I'm still facing this issue. Could you please provide some guidance on what might be causing this error and how I can resolve it?

  • @luanmotta5591
    @luanmotta5591 4 дня назад

    I tried FireCrawl with an url gerated by Storybook in react, and didn't work :(

  • @TrejonEdmonds
    @TrejonEdmonds 5 дней назад

    Web scraping (getting data) and parsing (making sense of it) are two crucial steps for data extraction, often misunderstood as interchangeable. While AI promises a magic solution for parsing, it's expensive, unreliable, and environmentally unfriendly. It's better suited for rare cases where traditional methods struggle. Here, data pre-processing, training data creation, and fine-tuning a specific AI model is the key for success. Overall, scraping and parsing remain essential, with AI as a valuable tool for specific situations.

  • @bestebahn
    @bestebahn 6 дней назад

    I have a question, the website that you're using seem to be listings from a city like San Francisco but the results that you're getting only have around 10 entries scraped. Why aren't there more?

    • @redamarzouk
      @redamarzouk 6 дней назад

      The reason is that website like the one I scrapped don't load the data unless we scroll down physically. Meaning we have to open the website and scroll using libraries like playwright that opens a browser instance using chromium and then scroll all the way down and then you'll have all the html.

  • @Islandrecords28
    @Islandrecords28 7 дней назад

    installed but when i put in operate nothing happens, here is the message PS C:\Users\Admin> operate Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Scripts\operate.exe\__main__.py", line 4, in <module> File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Lib\site-packages\operate\main.py", line 6, in <module> from operate.operate import main File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Lib\site-packages\operate\operate.py", line 26, in <module> from operate.models.apis import get_next_action File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Lib\site-packages\operate\models\apis.py", line 7, in <module> import easyocr File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Lib\site-packages\easyocr\__init__.py", line 1, in <module> from .easyocr import Reader File "C:\Users\Admin\AppData\Local\Programs\Python\Python312\Lib\site-packages\easyocr\easyocr.py", line 9, in <module> from bidi.algorithm import get_display ModuleNotFoundError: No module named 'bidi.algorithm'

  • @nzb3290
    @nzb3290 7 дней назад

    Maintenance is hell with these, trying to read others automation is also hell

    • @redamarzouk
      @redamarzouk 6 дней назад

      yes when someone uses non standardized processes or even processes with REFramework with added state machines makes it harder to read and modify the workflow...

  • @CharliesCustomClones
    @CharliesCustomClones 8 дней назад

    You did a good job explaining something new, but, for 2024, Puppeteer and Cheerio will work better than AI

  • @OPPK100
    @OPPK100 13 дней назад

    So what would be a career to get into that AI wouldn’t dominate or not for long which covers most things needed to work in IT

    • @redamarzouk
      @redamarzouk 6 дней назад

      Whatever career you get into you have to see the extent in which AI is making easier and faster. in the case of RPA now we have AI Automation in Power automate that will allow you to create the process by recording the screen, describing in an audio or just prompting it. Same thing for UiPath, you can find an Autopilot that you can prompt to create the process you want and you can also find autopilot for so many other products they have (example UiPath apps Autopilot to create apps just from a prompt or a picture...). AI still isn't there in terms of automating a whole process, so only someone who knows how to use it will replace others, AI itself can't replace devs for the foreseeable future.

    • @OPPK100
      @OPPK100 5 дней назад

      I wanted to get into RPA but I’m a complete beginner. So I’m going with python data science first l. Then if I’m fine with it take the rpa classes. Thank you for your insight. I want to choose career path that’s long lasting. So I should take a AI class also

  • @YeafiAwal
    @YeafiAwal 16 дней назад

    What you are saying is 100% true, I really felt working with UiPath/AA -expectation is too high whereas things are not like that and then there are a lot of issues that are actually at the first place are not the best for automation/simply worst for test automation but who can make the upper management understand.

    • @redamarzouk
      @redamarzouk 6 дней назад

      I understand, been there and seen companies choose the worst processes to be their first robot to be implemented, but you can never convince them otherwise...

  • @yassirman1
    @yassirman1 17 дней назад

    الذكاء الإصطناعي، الخسران أ حمادي 😅😅 المهارات اللي جمعتي طوال سنين كمبرمج نقصات القيمة ديالها مع سهولة دخول منافسيس جدد. احتراماتي اخويا. قناة يوتوب ناضية

  • @jeremynx
    @jeremynx 19 дней назад

    thank u, what is your background and how to become RPA developer?

    • @redamarzouk
      @redamarzouk 6 дней назад

      you're welcome, I'm an RPA Dev myself for 7+ years now, and you can be an RPA Developer easily by choosing either Power Automate or UiPath and going to learn.microsoft.com/en-us/training/modules/get-started-flows/ or academy.uipath.com/courses I recommend UiPath since it has an entire ecosystem around RPA whilst Power automate is only considered part of the Power Platform and not very important for the giant company Microsoft.

  • @Alice8000
    @Alice8000 20 дней назад

    Tai Lopez?? Just driving my ferrari around the hollywood hills here

  • @Alice8000
    @Alice8000 21 день назад

    u nice bro

  • @user-dl6ds6ij8x
    @user-dl6ds6ij8x 21 день назад

    Great many thanks for sharing, quick questions how add line of code to go to page 2 and do the same thing then page 3 and so on Please.

    • @redamarzouk
      @redamarzouk 6 дней назад

      You're welcome, you'll have to crawl the pages first and then loop through them using the script I've shown.

  • @Sameer_Pattnaik
    @Sameer_Pattnaik 22 дня назад

    Sir how can I scrape raw data?

  • @aimattant
    @aimattant 23 дня назад

    Nice project, I worked on your code base for a while and used Groq mixtral instead, with multiple keys to pass limits, and Firecrawl is not automatic when it comes to pagination, you still need to add HTML code, which defeats the purpose, slow but ok for a free purpose. But I got around that I think. The next step is to use it in the front end. Zillow's API is only available for property developers, so scraping with manual inputs is the only way. However, working with the live API functionality would be the best way forward. Nice job!

    • @redamarzouk
      @redamarzouk 6 дней назад

      Thank you, most websites of real estate or any other industry hold on to their data very close and make you pay if you want to use their API. You'll almost always have to scrape data manually, and yes when it comes to pagination you'll have to make another script to crawl all the pages you'll be scraping.

  • @j.d.4697
    @j.d.4697 25 дней назад

    Having my computer managed by an AI I can naturally communicate with is one of my biggest dreams for the short-term future.

  • @EricAiken-oq4vu
    @EricAiken-oq4vu Месяц назад

    I can't find a OPENAI model that works for me. I've tried gpt-3, gpt-3.5, gpt-3.5-turbo-1186, I always get a 404 does not exist or you don't have access to it. GPT says use davinci or curie. Any suggestions?

  • @hemenths.k9009
    @hemenths.k9009 Месяц назад

    Hey, I am getting Invoke Code: Exception has been thrown by the target of an invocation. Error when ran this.

  • @Ashort12345
    @Ashort12345 Месяц назад

    The AI agent is unable to bypass Cloudflare, even after trying Ollama.

  • @dungtrananh1522
    @dungtrananh1522 Месяц назад

    Dear sir, can I use my local LLM models instead of OpenAI API?

    • @redamarzouk
      @redamarzouk 6 дней назад

      Yes you can, I did myself but I think as of now they simply don't perform as well. microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/

  • @smokedoutmotions_
    @smokedoutmotions_ Месяц назад

    Thanks bro

  • @LearnAvecAmeen
    @LearnAvecAmeen Месяц назад

    Hello Si Reda, all the best insh'Allah :)

    • @redamarzouk
      @redamarzouk Месяц назад

      Thank you so much and to you too 😄

  • @sharifulislam7441
    @sharifulislam7441 Месяц назад

    Good technology to keep in good book!

  • @jatinsongara4459
    @jatinsongara4459 Месяц назад

    can we use this for email and phone number extraction

    • @redamarzouk
      @redamarzouk Месяц назад

      Absolutely you just need to change the websites and the fields and you’re good to go

  • @JoaquinTorroba
    @JoaquinTorroba Месяц назад

    What other options are beside Firecrawl? Thanks!

    • @JoaquinTorroba
      @JoaquinTorroba Месяц назад

      Just found it in the comments: "Firecrawl has 5K stars on GitHub, Jina ai has 4k and scrapegraph has 9k."

    • @redamarzouk
      @redamarzouk Месяц назад

      Exactly Jina AI, scrapegraph AI are also options

  • @FaithfulStreaming
    @FaithfulStreaming Месяц назад

    I like waht you did, but for no code people this is so hard because we dont know what we should install for windows etc.. really really nice video

  • @benom3
    @benom3 Месяц назад

    Can you scrape multiple URLs at once? For example if you wanted to scrape all the zillow pages not just the first page with a few houses. @redamarzouk

  • @avramgrossman6084
    @avramgrossman6084 Месяц назад

    This is a nice video and very useful. In my applications I'm looking for the 'system' to have ALL the customer pdF Invoices uploaded, or better yet, as an SalesOrder table in a database. this seems like a lot of work for just one customer and one email. Is there a way to create Agents that could filter out which customer order? Etc.

  • @AmanShrivastava23
    @AmanShrivastava23 Месяц назад

    I'm curious - what do you do after structuring the data - do you store it in a vector DB? If so, do you store the Json as it is or something else? And can it actually be completely universal - by that i mean can it structure data by us not providing the fields on which it should strucutre the data. Can we make it in some way where upload a website and it understands the data and structures it according to it?

  • @ilanlee3025
    @ilanlee3025 2 месяца назад

    Im just getting "An error occurred: name 'phone_fields' is not defined"

  • @nkofr
    @nkofr 2 месяца назад

    nice! any idea on how to self host firecrawl? like with Docker? also, can it be coupled with n8n? how?

    • @redamarzouk
      @redamarzouk 2 месяца назад

      I gotta be honest, I didn't even try. I tried to self host an agentic software tool before and my pc was going crazy, it couldn't take the load from Llama3-8B running on LM Studio plus docker plus filming at the same time, I simply don't have the hardware for it. if you want to self host here is the link: github.com/mendableai/firecrawl/blob/main/SELF_HOST.md it is with docker.

    • @nkofr
      @nkofr 2 месяца назад

      @@redamarzouk thanks. Is there any sense to use it with n8n? or maybe n8n can do the same without firecrawl? (noob here)

    • @nkofr
      @nkofr 2 месяца назад

      @@redamarzouk or maybe with things like Flowise?

  • @zvickyhac
    @zvickyhac 2 месяца назад

    can Use LLMA 3/ Phi3 on local pc ?

    • @redamarzouk
      @redamarzouk 2 месяца назад

      You theoretically can use it when it comes to Data Extraction, but you will need a large context window version of Llama3 or Phi3. I've seen a model where they have extended the context length to 1M tokens for Llama3-7B. you need to keep in my that your hardware need to match the requirements.

  • @karthickb1973
    @karthickb1973 2 месяца назад

    awesome bro

  • @kamalkamals
    @kamalkamals 2 месяца назад

    nop it s not better that GPT

    • @redamarzouk
      @redamarzouk 2 месяца назад

      you're right now it's not, these models are beating each other like there is no tomorrow, to this date GPT-4o is the one at the top.

    • @kamalkamals
      @kamalkamals 2 месяца назад

      @@redamarzouk before gpt 4 omni, gpt 4 turbo still better, the only best point with llama is free model :)

  • @titubhowmick9977
    @titubhowmick9977 2 месяца назад

    Nice video. Another helpful video on the same topic ruclips.net/video/dSX5eoD4-u4/видео.htmlsi=8iKzgqHG97Ivf8wK

  • @titubhowmick9977
    @titubhowmick9977 2 месяца назад

    Very helpful. How do you work around the output limit of 4096 tokens?

    • @redamarzouk
      @redamarzouk 2 месяца назад

      Hello, if you're using open ai api, you need to add the parameter (max_tokens=xxxxxxxx) inside your client open ai call and define a number that don't exceed the max number of token of the model you're using (128 000 for gpt-4o for example)

  • @YOGiiZA
    @YOGiiZA 2 месяца назад

    Helpful, Thank you

  • @IlllIlllIlllIlll
    @IlllIlllIlllIlll 2 месяца назад

    Does it work on mbp

  • @santhoshkumar995
    @santhoshkumar995 2 месяца назад

    I get Error code: 429 when running the code. -'You exceeded your current quota,...

    • @ilianos
      @ilianos 2 месяца назад

      In case you haven't used your OpenAI API key in a while: they changed the way it works, you need to pay in advance to refill your quota

  • @ArisimoV
    @ArisimoV 2 месяца назад

    Can you use this for self operating pc ? Thanks

    • @redamarzouk
      @redamarzouk 2 месяца назад

      Believe me I tried, but my NVIDIA RTX 3050 4Gb simply can’t withstand filming and running Llava at the same time. Hopefully I’ll upgrade my setup soon and be able to do it.

    • @ArisimoV
      @ArisimoV 2 месяца назад

      So it is possible it's just matter of programing and pc sepecs

  • @PointlessMuffin
    @PointlessMuffin 2 месяца назад

    Does it parse JavaScript, infinity scroll, button click navigations?

    • @morespinach9832
      @morespinach9832 2 месяца назад

      Yes, you can ask LLMs to do all that like a human would.

  • @ConsultantJS
    @ConsultantJS 2 месяца назад

    In the US, a “bedroom” is a room with a closet, a window, and a door that can be closed.

  • @bls512
    @bls512 2 месяца назад

    Neat overview. Curious about API costs associated with these demos. Try zooming into your code for viewers.

    • @morespinach9832
      @morespinach9832 2 месяца назад

      watch on big monitors as most coders do

    • @redamarzouk
      @redamarzouk 2 месяца назад

      for only the demo you've seen, I spent 0.5$, for creating the code and launching it 60+ times, I spent 3$. I will zoom in next time.

  • @shauntritton9541
    @shauntritton9541 2 месяца назад

    Wow! The AI was even clever enough to convert square meters into square feet, no need to write a conversion function!

  • @todordonev
    @todordonev 2 месяца назад

    Webscraping as it is right now is here to stay and AI will not replace it (it can just enhance it in certain scenarios). First of all the term "scraping" is tossed everywhere and being used vaguely. When you "scrape" all you do is move information from one place to another. For example getting a website's HTML into your computer's memory. Then comes "parsing", which is extracting different entities from that information. For example extracting product price and title, from the HTML we "scraped". These are separate actions, they are not interchangeable, one is not more important than the other, and one can't work without the other. Both actions come with their own challenges. What these kind of videos promise to fix is the "parsing" part of it. It doesn't matter how advanced AI gets, there is only ONE way to "scrape" information, and that is to make a connection to the place the information is stored(whether its HTTP request, browser navigation, RSS feed request, FTP download or a stream of data). It's just semi-automated in the background. Now that we have the fundamentals, let me clearly state this: For the vast majority(99%) of the cases "web scraping with AI" is a waste of time, money, resources and our environment. Time: its deceiving, as AI promises to extract information with a "simple prompt", you'll need to iterate over that prompt quite a few times in order to make a somewhat reliable data parsing solution. In that time you could have built a simple python script to extract the data required. More complicated scenarios will affect both the AI, and the traditional route. Money: You either use 3rd party services for LLM inference or you self-host an LLM. Both solutions in the long term will be in orders of magnitude more expensive than a traditional python script. Resources: A lot of people don't realize this but running an LLM for cases in which an LLM is not needed is extremely wasteful on resources. Ive ran scrapers on old computers, raspberry pi's and serverless functions, this is just a spec of dust of hardware requirements compared to running an LLM on an industrial grade computer with powerful GPU(s) Environment: As per the resources needed, this affects our environment greatly, as new and more powerful hardware needs to be invented, manufactured and ran. For the people that don't know, AI inference machines (whether self-hosted or 3rd party) are powerhouses, thus a lot of watt/hours wasted, fossil fuels burnt etc. Reliability: "Parsing" information with AI is quite unreliable, manly because of the nature of how LLMs work, but also because a lot more points of failure are introduced(information has to travel multiple times between services, LLM models change, you hit usage and/or budget limits, LLMs experience high loads and inference speed sucks or it fails all together, etc.) Finally: most of AI extraction is just marketing BS letting you believe that you'll achieve something that requires a human brain and workforce with just "a simple prompt". I've been doing web automation and data extraction for more than a decade for a living. Ive also started incorporating AI in some rare cases, where traditional methods just don't cut it. All that being said, for the last 1% of the cases that do make sense to use AI for data parsing, here's what I typically do (after the information is already scraped): 1. First I remove vast majority of the HTML. If you need an article from a website, its not going to be in the <script>, <style>, <head>, <footer> tags(you get the idea), so using a python library (I love lxml) I remove all these tags, along with their content. Since we are just looking for an article I will also remove ALL of the HTML attributes, like classes(big one), ids, and so on. After that I will remove all the parent/sibling cases where it looks like a useless staircase of tags. I've tried converting to markdown and parsing, Ive tried parsing with a screenshot, but this method is vastly superior due to important HTML elements still being present, and the general HTML knowledge of LLMs. This step will make each request at least 10 times cheaper, and will allow us to use models with lower context sizes. 2. I will then manually copy the article content that I need and will put it along with the above resulting string into a json object + prompts to extract an article form given HTML, I will do this at least 15 times. This is the step where training data is created. 3. Then I will fine tune a GPT3.5Turbo model with that json data. After 10ish minutes of fine-tuning and around $5-10, I have an "article extraction fine-tuned model", that will always outperform any agentic solution in all areas(price, speed, accuracy, reliability). Then I just feed the model a new(un-seen) piece of HTML that has passed step1(above) and it will reliably spew out an article for a fraction of a cent in a single step (no agents needed). I have a few of those running in production for clients(for different datapoints), and they do very good, but its important that a human goes over the results every now and again. Also if there is an edge case and the fine-tune did not perform well, you just iterate and feed it more training data, and it just works.

    • @ilianos
      @ilianos 2 месяца назад

      Thanks for taking the time to explain this! Very useful to clarify!

    • @rafael_tg
      @rafael_tg 2 месяца назад

      Thanks man. I am specializing in web scraping in my career. Do you have some blog or similar where you share content of web scraping as a career?

    • @morespinach9832
      @morespinach9832 2 месяца назад

      Nonsense. Scraping has for 10 years included both fetching data and then structuring it in some format, XML or JSON. Then we can do whatever we want with that structured that. Introducing "parsing" as some distinct construct is inane. More importantly, the way scraping can work today is leagues better than what the likes of APIFY used to do until 2 year ago, and yes this uses LLMs. Expand your reading.

    • @morespinach9832
      @morespinach9832 2 месяца назад

      @@ilianos his "explanation" is stupid.

    • @morespinach9832
      @morespinach9832 2 месяца назад

      @@rafael_tg watch more sensible videos and comments.

  • @6lack5ushi
    @6lack5ushi 2 месяца назад

    Dumpling ai is a startup doing The same! I’m swapping to this they are 50$ a month for 10,000 and 6 a min

  • @ajax0116
    @ajax0116 2 месяца назад

    It seems Zillow is blocking my access --> Press & Hold to confirm you are a human (and not a bot). I was able to run on trulia, but without my VPN.

  • @nabil-nc9sl
    @nabil-nc9sl 2 месяца назад

    tbarkallah 3lik a bro mashallah

  • @tirthb
    @tirthb 2 месяца назад

    Thanks for the helpful content.