Reliable Graph RAG with Neo4j and Diffbot
- Published: Jul 10, 2024
- We're developing a GraphRAG system using Diffbot's APIs to construct reliable knowledge graphs, which are then stored in a Neo4j graph database for efficient querying and information retrieval.
0:00 intro
0:22 brief overview of graph rag and knowledge graphs
0:50 potential pitfalls of vector-based rag
1:29 graph rag research by microsoft
2:09 potential pitfalls of llms constructing knowledge graphs
2:20 brief intro of entity resolution
3:03 entity resolution problem with gpt-4
3:41 entity resolution handled by Diffbot
3:51 Graph RAG demo + article importer
4:37 web scraping without worrying about hallucinated sources
5:24 KG construction from news article
5:37 enrich the knowledge graph with Enhance API
5:57 final network graph
6:09 question answering (vector vs. vector+kg)
7:26 more examples (vector vs. vector+kg)
7:35 skip the outro and have fun with the repo!
7:52 attribution to Tomaž Bratanic and Anej Gorkič
Get your free Diffbot token to start building graph rag at:
app.diffbot.com/get-started
GitHub repo for this Graph RAG project:
github.com/tomasonjo/diffbot-...
#graphrag #knowledgegraphs #llms - Science
Amazing video, really. And jokes are definitely necessary. 🤓
Excellent presentation. Neo4j has done a great job of integrating vector embeddings with knowledge graphs. The neo4j LLM Knowledge Graph Builder for extracting entities and relationships from video transcripts, Wikipedia articles, and pdf format files is impressive. It also merges the extracted knowledge into a graph structure, and then provides an interface to query the LLM about knowledge in the graph.
The Neo4j graph builder (llm-graph-builder.neo4jlabs.com/) is indeed awesome! Try selecting the "Diffbot" option in the generate graph dropdown.
More like this please!
Thanks for the knowledge (graph) sharing :)
Thank you! This is exactly what I needed!
Awesome indeed 😍Thank you!
Awesome!
How does this compare to Graph RAG by Microsoft? What are their differences and similarities?
LeanChen, if you had to rate the similarity of groups of words, how would you do it? Something like asking whether two classes are similar?
Once the entities are extracted, can the model then be prompted to write a graph query (that could then be executed)? I’m thinking in particular of the “knowing A is B but not B is A” problem, such as when you ask an LLM “who is Mary Pfeiffer’s son?” and it does not say “Tom Cruise” but can answer “who is Tom Cruise’s mother?” just fine?
Yes! There are many people working on text-to-graph query language and that is a great motivating example for how to overcome language modeling bias
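A minimal sketch of why a graph lookup sidesteps that direction problem, assuming a hypothetical in-memory `TinyGraph` class and a made-up `PARENT_OF` relation name (this is not code from the linked repo, Neo4j, or Diffbot). The fact is stored once as an edge, and the same edge can be read from either end, so both phrasings of the question resolve:

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory triple store, for illustration only."""

    def __init__(self):
        self._out = defaultdict(list)  # subject -> [(relation, object)]
        self._in = defaultdict(list)   # object  -> [(relation, subject)]

    def add(self, subject, relation, obj):
        # Index the edge from both ends so it can be traversed either way.
        self._out[subject].append((relation, obj))
        self._in[obj].append((relation, subject))

    def objects(self, subject, relation):
        # Follow the edge forward: subject -[relation]-> ?
        return [o for r, o in self._out[subject] if r == relation]

    def subjects(self, relation, obj):
        # Follow the edge backward: ? -[relation]-> obj
        return [s for r, s in self._in[obj] if r == relation]


graph = TinyGraph()
# The fact is stored once, in one direction only.
graph.add("Mary Pfeiffer", "PARENT_OF", "Tom Cruise")

# "Who is Tom Cruise's mother?" reads the edge backward.
mother = graph.subjects("PARENT_OF", "Tom Cruise")
# "Who is Mary Pfeiffer's son?" reads the same edge forward.
son = graph.objects("Mary Pfeiffer", "PARENT_OF")
```

In a real setup the model would presumably emit a graph query for the database to execute rather than call Python helpers, but the idea is the same: the answer comes from traversing a stored edge, not from the order in which the model saw the names during training.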
omg I was trying to hold it together at the introduction when I saw the large soup bowl but I broke down when it said "whatever, y'all get replaced by AI in 5 yrs" 😂 Great video overall. I hope I am right about the size of the soup bowl and you are not 3 feet tall.😅
How does this compare to the LlamaIndex property graph?
Hi, could you convert complex PDF documents (with graphics and tables) into an easily readable text format, such as Markdown? The input file would be a PDF and the output file would be a text file (.txt).
Maybe the pages have to be turned into images using Poppler, and then you could use an LLM that accepts image inputs, like GPT-4 Vision or Claude 3, along with function calling to get the entities, objects, and relations.
Hey. Thx for the awesome content. Would it be possible to show a fully working large-scale graph (a proper prod-scale thing), and also discuss the pros and cons of the approach, and when you found KGs working well and not so well? The reason I ask is that my own KG experiments worked perfectly at a smaller scale, but then they became so slow that they were killing my M2 Mac altogether.
Great idea for a future video. For trying out a production version of a 10B node graph built from the entire public web, try out Diffbot!
Diffbot doesn't sound so good for privacy.
What is the role of Diffbot? Entity extraction?
Are there any self-hosted replacements?
Unlike OpenAI, Diffbot does not train on your API inputs. We also offer self-hosted, on-premises solutions for enterprise customers. Diffbot's services analyze text and extract entities and their relationships (aka a knowledge graph).
Graphrag is so hot right now.
Hi, it would be great if you could make longer videos explaining how you did each of these steps, for example entity extraction, relationship extraction, and so on, and then how you did the Neo4j integration. Maybe you could cut a short video like this out of the longer one to attract customers, while the long video would still serve as a promising direction for developers and researchers. I love the output produced by your system, but there’s no way to reproduce what you are doing. Reproducibility is a major concern in KGC.
Thanks for the feedback on producing deep dives into those topics. It's too much to cover in an applied video like this, but could be a good topic for a future video. In terms of reproducibility, you should be able to reproduce any of the examples in the video using the GitHub project repo linked in the description.
Fugazi KG. This is only a simple LPG (labeled property graph).
Be my teacher and advisor, please connect.
Too much information on the screen at once ... really painful to follow ... be more sober and straight to the point. Jokes are not necessary.
Thanks for the feedback!
Why so serious? 😅