Diffbot
  • Videos: 51
  • Views: 150,994
Reliable Graph RAG with Neo4j and Diffbot
We're developing a GraphRAG system using Diffbot's APIs to construct reliable knowledge graphs, which are then stored in a Neo4j graph database for efficient querying and information retrieval.
0:00 intro
0:22 brief overview of graph rag and knowledge graphs
0:50 potential pitfalls of vector-based rag
1:29 graph rag research by microsoft
2:09 potential pitfalls of llms constructing knowledge graphs
2:20 brief intro of entity resolution
3:03 entity resolution problem with gpt-4
3:41 entity resolution handled by Diffbot
3:51 Graph RAG demo + article importer
4:37 web scraping without worrying about hallucinated sources
5:24 KG construction from news article
5:37 enrich the knowledge graph with Enhance ...
Views: 14,202
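
The storage step in the description above (a knowledge graph written into Neo4j) can be sketched roughly as follows. This is a minimal illustration, not the project's actual importer: the `Triple` type, the `Entity` label, and the sample triple are hypothetical, and the extraction step that would produce the triples is assumed rather than shown. The function only builds a parameterized Cypher `MERGE` statement, so it runs without a database.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one extracted fact: subject, relation, object."""
    subject: str
    relation: str
    obj: str


# Relationship types are interpolated into the query text, so restrict them
# to a conservative character set before use.
_REL_TYPE = re.compile(r"^[A-Z_][A-Z0-9_]*$")


def merge_statement(triple: Triple) -> tuple[str, dict[str, str]]:
    """Build a Cypher MERGE query and its parameters for one triple."""
    # Normalize "acquired by" -> "ACQUIRED_BY".
    rel_type = "_".join(triple.relation.strip().upper().split())
    if not _REL_TYPE.match(rel_type):
        raise ValueError(f"unsafe relationship type: {triple.relation!r}")
    query = (
        "MERGE (s:Entity {name: $subject}) "
        "MERGE (o:Entity {name: $object}) "
        f"MERGE (s)-[:{rel_type}]->(o)"
    )
    return query, {"subject": triple.subject, "object": triple.obj}


# Made-up example triple, for illustration only.
query, params = merge_statement(Triple("Acme Corp", "acquired", "Widget Inc"))
print(query)
print(params)
```

With the official Neo4j Python driver, the returned pair could be passed to `session.run(query, params)`. `MERGE` matches an existing node or relationship before creating one, so importing the same triple twice should not duplicate it. The relationship type is validated and interpolated because, in Cypher versions without dynamic relationship types, it cannot be supplied as a query parameter.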

Videos

Trying to make LLMs less stubborn in RAG (DSPy optimizer tested with knowledge graphs)
Views: 2.2K · 2 months ago
RAG (retrieval-augmented generation) has been recognized as a method to reduce hallucinations in LLMs, but is it really as reliable as many of us think it is? The timely research "How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior" resonated with our struggles when LLMs don't always follow external knowledge in RAG systems, even when ground truth (from ...
Things you should check before using Llama3 with DSPy.
Views: 3.6K · 2 months ago
No, comparing individually the performance of different language models and embedding models is not enough. To further investigate the hallucination issues we saw in our DSPy RAG pipeline in our last video, we tested pairing Llama3: 70B with both nomic embedding (local and open-source embedding model) and ada-002 (one of OpenAI's embeddings), while using gpt3.5 ada-002 as the baseline for our c...
DSPy with Knowledge Graphs Tested (non-canned examples)
Views: 7K · 3 months ago
The DSPy (Declarative Self-improving Language Programs in Python) framework has excited the developer community with its ability to automatically optimize and enhance language model pipelines, which may reduce the need to manually fine-tune prompt templates. We designed a custom DSPy pipeline integrating with knowledge graphs. The reason? One of the main strengths of knowledge graphs is their a...
Diffbot is making ____ intelligence possible.
Views: 462 · 3 months ago
What's beyond just artificial intelligence? Hint: The answer is at the very end of the video.
Is Tree-based RAG Struggling? Not with Knowledge Graphs!
Views: 46K · 4 months ago
Long-context models such as Google Gemini Pro 1.5 or Large World Model are probably changing the way we think about RAG (retrieval-augmented generation). Some are starting to explore the potential application of “Long-Context RAG”. One example is RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval): by clustering and summarizing documents, this method lets language models gras...
Building less wrong RAG with Corrective RAG?
Views: 2.9K · 4 months ago
Building a basic retrieval-augmented generation (RAG) system is becoming easier, but the harder part is often getting it to work correctly. For example, if wrong information is selected early in the retrieval process, the quality of the generated answer is bound to be poor. To address this issue, Corrective RAG is being explored to more carefully evaluate the quality o...
Extract 5 Lists in 2 Minutes
Views: 1.4K · 2 years ago
Our biggest update to Diffbot Extract EVER - Extract any type of list on any website into JSON or CSV with no rules or scripts. Diffbot Extract reads websites like a human so you don't have to. Stop scraping, start extracting. List API Documentation: docs.diffbot.com/docs/en/api-list MORE ABOUT DIFFBOT Access a trillion connected facts across the web, or extract them on demand with Diffbot - th...
Diffbot's Knowledge Graph In Three Minutes
Views: 2.5K · 2 years ago
The world's largest Knowledge Graph contains billions of organizations, articles, and people. But where do you get started? Here's our quick start video meant to be consumed alongside our Knowledge Graph Get Started Guide at: docs.diffbot.com/docs/en/dql-quickstart
Building a Better Quality Internet with Factmata
Views: 285 · 2 years ago
Factmata helps monitor internet content and analyse its risks and threats. Their technology can automatically extract relevant claims, arguments and opinions, and identify threatening, growing narratives about any issue, brand, or product. Their tools save time for online media analysts, finding new opportunities, risks and threats. MORE ABOUT DIFFBOT Access a trillion connected facts across th...
10 New Market Intelligence Queries From Diffbot's Knowledge Graph [Webinar]
Views: 361 · 2 years ago
In this weekly webinar we look at 10 new(ish) and innovative ways to use the world's largest Knowledge Graph to explore linked data on people, organizations, articles, and more. Diffbot's Knowledge Graph takes entities, facts, and relationships extracted from the public web and structures them into a queryable database.
Eight Ways Web-Reading Bots Revolutionize Market Intelligence [Webinar]
Views: 252 · 2 years ago
Best Practices: Using External Data To Enrich Internal Databases [Webinar]
Views: 238 · 3 years ago
Data decays at an average of 30% a year, and dated or incorrect data can be more harmful than not having data coverage at all. In this webinar we explain the basics of data enrichment and work through hands-on ways to use the world's largest Knowledge Graph to pull in millions of facts related to organizations you care about. Resources: Bulk Enhance API Google Colab Walkth...
Diffbot For Demand and Lead Generation [Webinar]
Views: 319 · 3 years ago
[Webinar] Informal Dashboard Building With Diffbot's Excel and Google Sheets Integrations
Views: 140 · 3 years ago
[Webinar] Knowledge Graph Techniques For Global News Monitoring
Views: 346 · 3 years ago
[Webinar] Competitor, Vendor, And Customer Data From Across The Web With Diffbot's Knowledge Graph
Views: 165 · 3 years ago
Diffbot The Web-Reading Robot: Explainer Video
Views: 542 · 3 years ago
What's Rule-Less Web Scraping and How Is it Different Than Rule-Based Web Data Extraction? [Webinar]
Views: 280 · 3 years ago
Knowledge Graph Basics: Data Enrichment
Views: 533 · 3 years ago
Crawlbot Basics - Choosing The Right Web Data Extraction API For Crawling
Views: 342 · 3 years ago
The Ultimate Guide To Natural Language API Products
Views: 2.7K · 3 years ago
Knowledge Graph Basics: Data Provenance
Views: 1.9K · 3 years ago
NLP Fundamentals: Entities, Sentiment, Facts
Views: 625 · 3 years ago
Knowledge Graph Basics: Faceting
Views: 417 · 3 years ago
Knowledge Graph Basics - Searching For Orgs Or Articles
Views: 399 · 3 years ago
Knowledge Graph Basics: Entity Types
Views: 1.2K · 3 years ago
How to Track Market Indicators Using Knowledge Graph News Monitoring Scheduling
Views: 632 · 3 years ago
Advanced Crawlbot Tutorial - Crawling Web Pages Behind Logins
Views: 996 · 3 years ago
Diffbot Crawlbot Web Crawler Tutorial (2021) - Scrape Ecommerce Pages Quickly
Views: 3K · 3 years ago

Comments

  • @davidwynter6856
    @davidwynter6856 13 days ago

    Have a look at the latest Stanford NLP Group research that supersedes DSPy: TextGrad. You have a very clear presenting style and don't waste time, love it! It'd be great to see a presentation on TextGrad with GraphRAG. Also have a look at this paper, 2407.01502v1 on arXiv; it argues that agentic systems making a lot of expensive LLM calls are not required.

  • @grantdh
    @grantdh 15 days ago

    You rock, this was great!

  • @ML-bf2bz
    @ML-bf2bz 16 days ago

    This is a fantastic video. Where the RAG fails due to potential lost in the middle, is there any trace or context to confirm it was provided to the LLM?

  • @cagdasucar3932
    @cagdasucar3932 17 days ago

    Why is the presenter speaking like a robot?

    • @diffbot4864
      @diffbot4864 17 days ago

      Bc she probably is AI-generated

  • @MattJonesYT
    @MattJonesYT 20 days ago

    More like this please!

  • @chillbeach7322
    @chillbeach7322 23 days ago

    How does this compare to Graph RAG by Microsoft? What are their differences and similarities?

  • @BillVoisine
    @BillVoisine 25 days ago

    Thank you! This is exactly what I needed!

  • @lesptitsoiseaux
    @lesptitsoiseaux 28 days ago

    LeanChen, if you had to rate the similarity of groups of words, how would you do it? A bit like asking: are these two classes similar?

  • @Damon_Sieputovsky
    @Damon_Sieputovsky 28 days ago

    Fugazi KG: this is only a simple LPG (Labeled Property Graph).

  • @PrincessKushana
    @PrincessKushana 28 days ago

    Graphrag is so hot right now.

  • @alv9551
    @alv9551 1 month ago

    omg I was trying to hold it together at the introduction when I saw the large soup bowl but I broke down when it said "whatever, y'all get replaced by AI in 5 yrs" 😂 Great video overall. I hope I am right about the size of the soup bowl and you are not 3 feet tall.😅

  • @viky2002
    @viky2002 1 month ago

    how does this compare to llamaindex property graph ?

  • @johnkintree763
    @johnkintree763 1 month ago

    Excellent presentation. Neo4j has done a great job of integrating vector embeddings with knowledge graphs. The neo4j LLM Knowledge Graph Builder for extracting entities and relationships from video transcripts, Wikipedia articles, and pdf format files is impressive. It also merges the extracted knowledge into a graph structure, and then provides an interface to query the LLM about knowledge in the graph.

    • @diffbot4864
      @diffbot4864 28 days ago

      The neo4j graph builder (llm-graph-builder.neo4jlabs.com/) is indeed awesome! try selecting the "Diffbot" option in the generate graph dropdown

  • @ronifintech9434
    @ronifintech9434 1 month ago

    Thanks for the knowledge (graph) sharing :)

  • @alchemication
    @alchemication 1 month ago

    Hey. Thx for the awesome content. Would it be possible to actually show a fully working large-scale graph (like a proper prod-scale thing), and also discuss the pros and cons of the approach, and when you found KGs working well and not so well? The reason I ask is that my own KG experiments worked perfectly for me at a smaller scale, but then the speed was so slow that it was killing my M2 Mac altogether.

    • @diffbot4864
      @diffbot4864 28 days ago

      Great idea for a future video. For trying out a production version of a 10B node graph built from the entire public web, try out Diffbot!

  • @darkmatter9583
    @darkmatter9583 1 month ago

    Be my teacher and advisor, please connect.

  • @trinityblood5622
    @trinityblood5622 1 month ago

    Hi, it would be great if you could make longer videos explaining how you did each of these transitions, for example entity extraction, relationship extraction and so on, and then how you did the Neo4j integration. Maybe you could cut a short video like this out of the original video to attract customers, while the long video would still serve as a promising direction for developers and researchers. I love the output produced by your system, but there's no way to reproduce what you are doing. Reproducibility is a major concern in KGC.

    • @diffbot4864
      @diffbot4864 28 days ago

      Thanks for the feedback on producing deep dives into those topics. it's too much to cover in an applied video like this, but could be a good topic for a future video. In terms of reproducibility, you should be able to reproduce any of the examples in the video using the linked github project repo in the description.

  • @johannesdeboeck
    @johannesdeboeck 1 month ago

    Awesome indeed 😍Thank you!

  • @ignaciopincheira23
    @ignaciopincheira23 1 month ago

    Hi, could you convert complex PDF documents (with graphics and tables) into an easily readable text format, such as Markdown? The input file would be a PDF and the output file would be a text file (.txt).

    • @joshuajose8598
      @joshuajose8598 1 month ago

      Maybe the pages have to be turned into images using Poppler and you could use an LLM that allows image inputs like GPT4Vision and Claude3 along with Function Calling to get the entities, objects and relations.

  • @tomazbratanic5502
    @tomazbratanic5502 1 month ago

    Awesome!

  • @mshonle
    @mshonle 1 month ago

    Once the entities are extracted, can the model then be prompted to write a graph query (that could then be executed)? I’m thinking in particular of the “knowing A is B but not B is A” problem, such as when you ask an LLM “who is Mary Pfeiffer’s son?” and it does not say “Tom Cruise” but can answer “who is Tom Cruise’s mother?” just fine?

    • @diffbot4864
      @diffbot4864 28 days ago

      Yes! There are many people working on text-to-graph query language and that is a great motivating example for how to overcome language modeling bias
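
The point in this exchange can be illustrated with a small sketch: once a relation is stored as an edge, the same edge answers the question in both directions, which is what a generated graph query would rely on. The `TinyGraph` class and the `HAS_MOTHER` relation name are hypothetical and stand in for a real graph database; the names are the ones used in the comment above.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self) -> None:
        self._forward = defaultdict(set)   # (subject, relation) -> objects
        self._backward = defaultdict(set)  # (object, relation) -> subjects

    def add(self, subject: str, relation: str, obj: str) -> None:
        # One stored fact is indexed in both directions.
        self._forward[(subject, relation)].add(obj)
        self._backward[(obj, relation)].add(subject)

    def objects(self, subject: str, relation: str) -> list[str]:
        """Follow the edge forward: subject -> objects."""
        return sorted(self._forward.get((subject, relation), set()))

    def subjects(self, relation: str, obj: str) -> list[str]:
        """Follow the same edge backward: object -> subjects."""
        return sorted(self._backward.get((obj, relation), set()))


graph = TinyGraph()
graph.add("Tom Cruise", "HAS_MOTHER", "Mary Pfeiffer")

# "Who is Tom Cruise's mother?"
print(graph.objects("Tom Cruise", "HAS_MOTHER"))
# "Who is Mary Pfeiffer's son?" uses the same stored edge, traversed in reverse.
print(graph.subjects("HAS_MOTHER", "Mary Pfeiffer"))
```

In a text-to-query setup, the language model would only need to pick the relation and the traversal direction; the answer itself would come from the stored edge rather than from the model's internal knowledge.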

  • @V0v1kkk
    @V0v1kkk 1 month ago

    Diffbot doesn't sound so good for privacy. What is the role of Diffbot? Entity extraction? Are there any self-hosted replacements?

    • @diffbot4864
      @diffbot4864 28 days ago

      Unlike OpenAI, Diffbot does not train on your API inputs. We also offer self-hosted, on-premises solutions for enterprise customers. Diffbot's services analyze text and extract entities and their relationships (aka a knowledge graph).

  • @kakashisensie100
    @kakashisensie100 1 month ago

    Amazing video really. and jokes are definitely necessary. 🤓

  • @jackbauer322
    @jackbauer322 1 month ago

    too much information on the screen at once ... really painful to follow ... be more sober and straight to the point. Jokes are not necessary

    • @diffbot4864
      @diffbot4864 1 month ago

      Thanks for the feedback!

    • @3stdv93
      @3stdv93 1 month ago

      Why so serious? 😅

  • @shubhanshuyadav2437
    @shubhanshuyadav2437 1 month ago

    Can't sign in to Diffbot even with a non-Gmail ID. Is it supposed to be so?

  • @adarshchintada5600
    @adarshchintada5600 1 month ago

    very well explained, Thanks!

  • @rahulvb5044
    @rahulvb5044 1 month ago

    One basic doubt:

        def forward(self, question):
            # Step 1: Retrieve context based on the question
            context = self.retrieve(question).passages
            # Step 2: Generate an answer based on the context and question
            prediction = self.generate_answer(context=context, question=question)
            answer = prediction.answer
            # Step 3: Validate the answer type using the entity_linker function
            correct_question_type, original_answer_type, type_status = entity_linker(answer, question)
            # Optional: You can use the AnswerTypeValidityCheck signature for validation if needed
            validation = self.check_answer_type(entity_type=correct_question_type, question=question, answer=answer).type_status
            return dspy.Prediction(context=context, answer=answer, type_status=type_status)

    In this method, the answer returned is the one you got from self.generate_answer. That method doesn't use any entity linker, so how is the entity linker influencing the answer?
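
One possible reading of the snippet above, which the source does not confirm, is that `type_status` does not change the answer inside `forward` but is consumed afterwards, for example by an optimizer metric or a downstream filter. The sketch below illustrates that idea only. The `"correct"` status value, the function names, and the fallback text are assumptions, and `SimpleNamespace` stands in for the returned prediction object so the example runs without DSPy installed.

```python
from types import SimpleNamespace


def type_check_metric(example, prediction, trace=None) -> bool:
    """Score a prediction by whether its entity-type check passed.

    Follows the (example, prediction, trace) argument order commonly used for
    DSPy metrics; the "correct" status value is an assumption.
    """
    return getattr(prediction, "type_status", None) == "correct"


def filter_answer(prediction, fallback: str = "No validated answer.") -> str:
    """Return the answer only when the entity-type check passed."""
    if type_check_metric(None, prediction):
        return prediction.answer
    return fallback


# Stand-ins for the objects returned by forward(); the values are made up.
passed = SimpleNamespace(answer="Paris", type_status="correct")
failed = SimpleNamespace(answer="42", type_status="mismatch")

print(filter_answer(passed))
print(filter_answer(failed))
```

Under this reading, the entity linker would influence results indirectly: an optimizer could prefer prompts and demonstrations whose predictions pass the check, and a caller could discard answers that fail it.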

  • @Digitalcataloghub
    @Digitalcataloghub 1 month ago

    I really like this tool. Can it be used for SEO?

  • @timjrgebn
    @timjrgebn 1 month ago

    Do you know of any efforts on converting these entities and relationships further into formal logic representations? Being able to pair these graph databases with formal logic representations would definitely help improve the quality of written text, organic exploration/discovery, and understanding over time.

  • @madrush24
    @madrush24 1 month ago

    Your videos are fantastic! Entertaining, educational, and still highly technical.

  • @GeorgeG472
    @GeorgeG472 1 month ago

    Do you feel that nomic embeddings are an adequate open-source embedding model for RAG projects, or do you recommend another?

  • @DavidConnerCodeaholic
    @DavidConnerCodeaholic 1 month ago

    I want an LLM that will write a slightly different version of LOTR each time, but without Fellowship

  • @knaz7468
    @knaz7468 1 month ago

    Would love to see this in practice! Seems like it would add a good chunk of latency? I love graphs but they can also "hallucinate" if not given enough training data to build connections right? Or missing data entirely?

    • @EdwardAustin
      @EdwardAustin 1 month ago

      Also curious about the latency of graph RAG

  • @NicolasEmbleton
    @NicolasEmbleton 2 months ago

    Haha. Brilliant. Love the behind-the-scene piece at the end. Very instructive. Thanks 🙏🏻

  • @mbrochh82
    @mbrochh82 2 months ago

    Here's a ChatGPT summary:
    - Retrieval-augmented generation (RAG) is effective in reducing hallucinations in large language models (LLMs).
    - Despite being given correct context and ground truth, LLMs often do not incorporate external knowledge correctly.
    - A viewer comment suggests LLMs may prioritize their internal knowledge over external information.
    - The research compares GPT-4, GPT-3.5, and Mistral-7B in balancing external information with internal knowledge.
    - GPT-4 is the most reliable model when using external information, followed by GPT-3.5 and Mistral-7B.
    - All models tend to stick to their internal knowledge if they believe the external knowledge is less correct.
    - RAG can enhance accuracy, but its effectiveness depends on the model's confidence and the prompting technique.
    - Different combinations of language models and embedding models can lead to varied results.
    - The study highlights the influence of different prompting techniques on how LLMs follow external knowledge.
    - The DSPy framework for auto-tuning prompts can improve LLMs' adherence to external knowledge.
    - DSPy uses bootstrapping to create and refine examples, improving prompts based on specific metrics.
    - Entity linking can prevent incorrect answers by mapping words in text to entities in a knowledge graph.
    - The Diffbot Knowledge Graph is used for validation due to its extensive network of verified information sources.
    - The entity linker helps filter made-up information when LLMs hallucinate.
    - The DSPy RAG pipeline, updated with an entity type validity check, improves output accuracy.
    - A custom DSPy RAG pipeline integrates knowledge graph data to refine questions and retrieve relevant information.
    - Two metrics for the DSPy optimizer: entity type check and alignment with knowledge graph context.
    - Knowledge graph context ensures final answers align with ground truth.
    - Enhanced output incorporates specific passages and relationships from the knowledge graph.
    - An example shows the knowledge graph confirming Elon Musk as the sole founder of SpaceX.
    - The optimized program sometimes fails to make LLMs stick to external knowledge.
    - Manual prompt tweaking may be necessary to ensure LLMs follow external knowledge strictly.
    - The DSPy framework has a steep learning curve but can yield better results for experienced programmers.
    - Main message: Integrating knowledge graphs and entity linking with LLMs can improve accuracy, but manual prompt customization may still be necessary to ensure adherence to external knowledge.

  • @TheRealAfroRick
    @TheRealAfroRick 2 months ago

    This is the way...

  • @ScottzPlaylists
    @ScottzPlaylists 2 months ago

    I haven't seen any good examples of the Self-improving part of DSPy yet. Is it ready for mainstream use❓

  • @pedromoya9127
    @pedromoya9127 2 months ago

    thanks great video!

  • @victoriamartindelcampo7827
    @victoriamartindelcampo7827 2 months ago

    girl, u rock thank you so much for this!! Where can I follow you?

  • @JeffreyWang-hh4ss
    @JeffreyWang-hh4ss 2 months ago

    Love this kind of RAG comparison, would be better if the background looks less like a spa room.😅

    • @diffbot4864
      @diffbot4864 2 months ago

      Leann here, I literally filmed it in my room. Which parts of the video suggest spa-room features?

    • @JeffreyWang-hh4ss
      @JeffreyWang-hh4ss 2 months ago

      @diffbot4864 oops, didn't think you would reply personally… maybe just the nice bed… very different from all the other tech influencers, haha, keep up the good work Leann

    • @diffbot4864
      @diffbot4864 2 months ago

      @JeffreyWang-hh4ss It's Leann again :) Well, NeetCode also films a lot in his room: www.youtube.com/@NeetCode Currently, my room is the only place where I can get the best voice quality. The most important thing, I hope, is that the content itself delivers value. Thank you for the feedback!

    • @JeffreyWang-hh4ss
      @JeffreyWang-hh4ss 1 month ago

      @diffbot4864 sorry, hope I'm not being too rude here. The content has been getting so much better since you started acting ;)

  • @wadejohnson4542
    @wadejohnson4542 2 months ago

    Until I saw this, I was starting to think that there was something wrong with me not being able to achieve magical improvements in results by using DSPy over meticulously hand-crafted prompts targeted at the observed quirkiness of specific LLMs. Thank you for restoring my self confidence. And now I'm also going to incorporate graph databases into my RAG pipelines after watching a couple of your videos.

  • @pedromoya9127
    @pedromoya9127 2 months ago

    great video! thanks

  • @plattenschieber
    @plattenschieber 2 months ago

    Hey @lckgllm, could you also upload the missing `dspy-requirements-2.txt` in the repo? 🤗

  • @PoGGiE06
    @PoGGiE06 2 months ago

    Very interesting, thanks! But Musk wasn't a co-founder of Tesla either.

  • @efexzium
    @efexzium 2 months ago

    Thanks 🙏🏽

  • @codelinx
    @codelinx 2 months ago

    Great info and content.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 2 months ago

    Impressive a 996 worker can find time to put this together, keep it up! Ah yeah, RAG doesn't work, in fundamental ways... it can't.

  • @ronifintech9434
    @ronifintech9434 2 months ago

    Love it! Finally Neo4j has good usage!

  • @kefanyou9928
    @kefanyou9928 2 months ago

    Great video~ Very interested in KGs' adoption in LLMs. Kind reminder: hide your API key in the video😉