Crawl4AI - Crawl the web in an LLM-friendly Style

  • Published: Jan 22, 2025

Comments • 20

  • @miroslavaguzman4653 • 7 days ago

    I've been trying it, and the one difficulty I found was the installation, but I think this is a great approach to scraping. Thanks for sharing!

  • @po6577 • 8 months ago +4

    Love how excited you are about your project! Keep it up, man! Great project.

  • @blossom_rx • 2 months ago

    You deserve a way bigger audience. Keep pushing, man!

  • @kenchang3456 • 3 months ago +1

    You wrote this project! U R The Man! :-) Thank you very much.

  • @saikrishna-vc2wj • 6 days ago

    Great video.
    I have a question regarding the exclusion of unwanted content during web page extraction. Specifically, how can headers, footers, navigational elements (including side navigation), and tables of contents be effectively removed? Considering that each website follows a different structure and pattern, it seems impractical to configure exclusion rules for every individual site.
    This issue becomes even more critical as it can lead to increased storage requirements and, in some cases, false retrieval results for Retrieval-Augmented Generation (RAG) systems due to the presence of unnecessary content.
    Could you share any insights or strategies to address this challenge effectively?
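
    [Editor's note] For reference, a minimal sketch of one site-independent approach, assuming a recent Crawl4AI release where CrawlerRunConfig exposes excluded_tags, excluded_selector, and remove_overlay_elements (the CSS selectors below are hypothetical; check which options your installed version actually supports):

    import asyncio
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

    async def main():
        config = CrawlerRunConfig(
            # Strip structural page chrome by tag name, independent of site layout.
            excluded_tags=["header", "footer", "nav", "aside"],
            # Hypothetical site-specific selectors for leftovers such as a TOC.
            excluded_selector=".toc, .sidebar",
            # Remove modal/overlay popups before extraction.
            remove_overlay_elements=True,
        )
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url="https://example.com/docs", config=config)
            print(result.markdown[:500])

    asyncio.run(main())

    Tag-level exclusion plus a small per-site selector list usually keeps a RAG store free of boilerplate without writing full exclusion rules for every individual site.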

  • @dafivers4127 • 1 month ago +1

    I can't get the local Llama to work :(

  • @xinfeng3022 • 7 months ago +1

    Possible to put up a prebuilt Docker image, including the models? I had problems downloading the models during the Docker build. Thanks!

    • @unclecode788 • 7 months ago +2

      I will work on that. Trying to have a version without the model dependency as well.

  • @plumpy8854 • 6 months ago +1

    Hey man. I'm going to be honest: I'm new to data scraping and wanted to ask if Crawl4AI can be used to scrape data from TikTok. They have implemented some harsh measures with request rate limits and login requirements. From what I saw, Crawl4AI has some login feature, but I just wanted to ask if I'm going in the right direction. Otherwise it looks great.
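
    [Editor's note] For context, a hedged sketch of the login angle, assuming your Crawl4AI version's BrowserConfig supports a persistent browser profile via use_persistent_context and user_data_dir (these parameter names are assumptions and may differ across versions; TikTok's rate limits and terms of service still apply):

    import asyncio
    from crawl4ai import AsyncWebCrawler, BrowserConfig

    async def main():
        # Assumption: reuse a browser profile where you already logged in
        # manually, so cookies and the session carry over into the crawl.
        browser_cfg = BrowserConfig(
            headless=False,
            use_persistent_context=True,
            user_data_dir="/path/to/profile",  # hypothetical profile path
        )
        async with AsyncWebCrawler(config=browser_cfg) as crawler:
            result = await crawler.arun(url="https://www.tiktok.com/@someuser")
            print(result.markdown[:300])

    asyncio.run(main())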

  • @dolboeb-tz4bw • 1 day ago

    Colab link?

  • @AWSFan • 6 months ago +1

    Very useful project, I must admit! Is it a recursive crawler? When I say recursive, I mean it (not restricted to a depth threshold). Also, how different is this from FireCrawl in terms of functionality and other things? I can't wait to get started using this project and give it a shot! Thanks!
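
    [Editor's note] For reference, a minimal recursive sketch under the assumption that each CrawlResult carries a links dict with an "internal" list of {"href": ...} entries (the exact shape may vary by version). It follows internal links until the queue empties; max_pages is a safety cap, not a depth threshold:

    import asyncio
    from urllib.parse import urlparse
    from crawl4ai import AsyncWebCrawler

    async def crawl_recursive(start_url: str, max_pages: int = 50):
        """Follow internal links until none are left or max_pages is hit."""
        seen, queue, results = set(), [start_url], []
        domain = urlparse(start_url).netloc
        async with AsyncWebCrawler() as crawler:
            while queue and len(results) < max_pages:
                url = queue.pop(0)
                if url in seen:
                    continue
                seen.add(url)
                result = await crawler.arun(url=url)
                results.append(result)
                # Assumption: result.links groups links by "internal"/"external".
                for link in (result.links or {}).get("internal", []):
                    href = link.get("href", "")
                    if urlparse(href).netloc == domain and href not in seen:
                        queue.append(href)
        return results

    pages = asyncio.run(crawl_recursive("https://example.com"))
    print(f"Crawled {len(pages)} pages")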

  • @MikeLevin • 7 months ago

    Looks exciting. Have you considered a Nix script?

  • @fieldcommandermarshall • 8 months ago

    WHAT HAPPENED TO THE FLUTE UNCLE CODE

    • @unclecode788 • 8 months ago +1

      Hahahaha!! Ok, ok, message received

  • @carlosa.villanuevacampoy931 • 7 months ago

    Really cool, man! Can I crawl all accessible subpages from a main page, so I crawl 2 levels in total?

    • @unclecode788 • 7 months ago +2

      You can send multiple links: first crawl the main page, then collect its links and send them in again. However, soon I will release the ability to set the depth and get a nice result for that.
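
      [Editor's note] Until that lands, a minimal two-level sketch of the workaround described above, assuming the async API with arun_many and internal links exposed as {"href": ...} dicts (both assumptions; loop over arun instead if your version lacks arun_many):

      import asyncio
      from crawl4ai import AsyncWebCrawler

      async def crawl_two_levels(start_url: str):
          async with AsyncWebCrawler() as crawler:
              # Level 1: the main page.
              root = await crawler.arun(url=start_url)
              # Level 2: every internal link found on it, crawled as one batch.
              subpages = [l["href"] for l in (root.links or {}).get("internal", [])]
              children = await crawler.arun_many(urls=subpages)
              return root, children

      root, children = asyncio.run(crawl_two_levels("https://example.com"))
      print(f"Main page plus {len(children)} subpages")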

  • @bitcoinquickbytes • 8 months ago

    I got a result object. How do I parse it?

    • @unclecode788 • 8 months ago

      Result is an object like this:

      class CrawlResult(BaseModel):
          url: str
          html: str
          success: bool
          cleaned_html: str = None
          markdown: str = None
          extracted_content: str = None
          metadata: dict = None
          error_message: str = None

      So you can access these properties directly (cleaned_html, markdown, extracted_content), or dump the model into a Python dictionary using result.model_dump().

    • @harshshivani4170 • 3 months ago

      When I am using AsyncWebCrawler, there is a runtime error: "There is no current event loop in thread 'MainThread'".
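
      [Editor's note] That error usually means the async API is being driven without a running event loop, or inside Jupyter/Colab where a loop already runs. A minimal sketch of the usual fix, assuming the current AsyncWebCrawler API:

      import asyncio
      from crawl4ai import AsyncWebCrawler

      async def main():
          async with AsyncWebCrawler() as crawler:
              result = await crawler.arun(url="https://example.com")
              print(result.markdown[:200])

      # In a plain script, start the event loop explicitly:
      asyncio.run(main())

      # In Jupyter/Colab a loop is already running, so instead use:
      #   await main()
      # or install nest_asyncio and call nest_asyncio.apply() before asyncio.run().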