Firecrawl: Convert Websites into LLM-Ready Data

  • Published: Jul 19, 2024
  • In this video I look at Firecrawl, a tool that converts the content behind website URLs into Markdown. This is useful for Retrieval-Augmented Generation (RAG) pipelines and Large Language Model (LLM) inference. I demonstrate how Firecrawl can crawl a URL, such as the LangChain website, and convert the content into organized Markdown.
    Additional features include scraping single URLs and the LLM Extract feature, which pulls specific information based on user-defined schemas. I encourage viewers to explore Firecrawl on GitHub and try its playground. An open-source version and SDKs are also available for developers.
    Site: www.firecrawl.dev/
    Repo: github.com/mendableai/firecrawl
    00:00 Introduction to Firecrawl: Transforming URLs into Markdown
    00:43 Why Markdown Matters for LLM Applications
    02:00 Exploring Firecrawl's Features and Use Cases
    02:25 LLM Extract: A New Feature in Action
    02:59 Pricing, Open Source Version, and Developer Support
    03:50 Conclusion and Encouragement to Explore Firecrawl
  • Hobbies
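
To make the Markdown point from the description concrete, here is a minimal Python sketch of the general idea: drop layout markup and keep the semantic content. It is not Firecrawl's implementation and does not use Firecrawl's API or SDKs. The class `MarkdownSketch`, the function `html_to_markdown`, and the sample HTML are hypothetical names made up for illustration, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: wrapper divs, a script, a style block, and a nav bar
# surround a small amount of real content.
SAMPLE_HTML = """
<html><head><style>body{margin:0}</style><script>track()</script></head>
<body>
<nav><a href="/home">Home</a></nav>
<div class="wrapper"><div class="content">
<h1>Firecrawl</h1>
<p>Turn websites into <a href="https://www.firecrawl.dev/">LLM-ready</a> data.</p>
<ul><li>Crawl</li><li>Scrape</li></ul>
</div></div>
</body></html>
"""


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-Markdown converter (illustrative only, not Firecrawl's code)."""

    SKIP = {"script", "style", "nav", "footer"}  # elements dropped entirely
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.out = []        # collected Markdown fragments
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.href = None     # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "ul":
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")
        # Any other tag (div, span, body, ...) contributes no markup.

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and not self.skip_depth:
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Ignore text inside skipped elements and whitespace-only runs.
        if self.skip_depth or not data.strip():
            return
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to a rough Markdown string."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


if __name__ == "__main__":
    markdown = html_to_markdown(SAMPLE_HTML)
    print(markdown)
    print(f"HTML: {len(SAMPLE_HTML)} chars, Markdown: {len(markdown)} chars")
```

On the sample page, the output keeps the heading, paragraph, link, and list, and drops the script, style, navigation, and wrapper divs, so there is less text to embed or pass to a model. A real crawler also has to handle JavaScript rendering, tables, images, and malformed markup, which this sketch ignores.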

Comments • 19

  • @MendableAI
    2 months ago +6

    Hey! Creators here! Thank you so much for shouting out our project! If anyone has any questions, feel free to reply to this comment and we will do our best to answer them!

    • @mishal_legit
      29 days ago

      Why not make it open source?

  • @DanielMartinezRomero-ru4ru
    2 months ago +2

    You have no idea how much your work is motivating me to learn and move forward. I am thinking of so many applications involving generative UI with a RAG pipeline.

  • @stonedizzleful
    2 months ago +2

    Oh man, this is so good, can't believe they shared the repo. Been trying to do this myself in a logical way for ages!

  • @keffbarn
    2 months ago +1

    Interesting thought that Markdown is a more robust format than HTML. Fundamentally, HTML should be good as well, since it was made with screen readers and such in mind.

  • @georgerobbins5560
    2 months ago

    Great video post 😊. Thanks.

  • @_domdge_687
    10 days ago

    The problem is that when you scrape a website based on attribute data (e.g. full title, name, or any related data that isn't visible in the UI), it cannot detect it, but you can still use it to support your scraping. I'm using it now to read a specific HTML form, convert it into Markdown, and then, through the OpenAI API with an API key, ask for what I need based on the Markdown data. Happy scraping!

  • @sertenejoacustic
    2 months ago +1

    Very cool

  • @DanielMartinezRomero-ru4ru
    2 months ago

    Lastly, I have a question regarding this video. Let's say I used Firecrawl to fetch a website's data and created embeddings of that content rather than fetching the HTML tags directly. As you mention in the video, Firecrawl avoids grabbing divs and such and just gets the semantic content. Does this imply a better vector representation of semantic meaning and lower costs because less data is ingested? Or does Firecrawl's infrastructure serve as an easy-to-use service with a more straightforward approach, but one where the stack requires a higher budget?

  • @DanielMartinezRomero-ru4ru
    2 months ago

    Is this a better alternative for the LLM answer engine than using Brave?

  • @mateocastromc
    1 month ago

    Sorry if this is a stupid question. What if I just crawl the websites I want and then cancel the plan? Would that work?

  • @timothywcrane
    2 months ago

    Love you for this. Good actions deserve thanks. Blessed to all who use obsidian/logseq/etc... Redis however... what's the Redis carve out, drop in? They have been naughty.
    comments? advice? ridicule?

  • @raymond_luxury_yacht
    2 months ago

    This is the future. The web is dead; people will just upload embeddings to an LLM repository, which is the equivalent of Google for RAG. Authorship confirmed through blockchain for audit. Awesome!

  • @RedShipsofSpainAgain
    2 months ago +1

    Great tool, but the "basic" Firecrawl Starter plan is $50/month, lol. Sorry, that's insane. Not accessible for devs just playing around with the tool.
    Anyone know of similar cheaper/free tools like Firecrawl?

    • @DevelopersDigest
      2 months ago +2

      There is a self hosted version you can check out here! github.com/mendableai/firecrawl/blob/main/SELF_HOST.md

    • @RedShipsofSpainAgain
      2 months ago +1

      @DevelopersDigest Oh, thank you. I apologize.

    • @DevelopersDigest
      2 months ago

      @RedShipsofSpainAgain np! thanks for watching 🙏