Advanced RAG with Knowledge Graphs (Neo4J demo)

  • Published: 20 Jun 2024
  • I recently created a demo for some prospective clients of mine, demonstrating how to use Large Language Models (LLMs) together with graph databases like Neo4J.
    The two have a lot of interesting interactions, namely that you can now create knowledge graphs more easily than ever before, by having AI find the graph entities and relationships in your unstructured data rather than having to do all of that manually.
    On top of that, graph databases also have some advantages for Retrieval Augmented Generation (RAG) applications compared to vector search, which is currently the prevailing approach to RAG.
    Connect with me on LinkedIn: / johannesjolkkonen
    ▬▬▬▬▬▬ T I M E S T A M P S ▬▬▬▬▬▬
    0:00 - Intro
    2:16 - Demo starts
    2:55 - Creating graph from unstructured data
    4:23 - Chatting with the knowledge graph
    5:55 - Advantages of Graphs vs Vector Search
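The extraction step the description mentions (having an LLM find graph entities and relationships in unstructured data) can be sketched roughly as below. This is an illustrative assumption, not the video's actual code: the JSON shape, labels, and names are made up for the example.

```python
import json

# Hypothetical JSON an extraction prompt might ask the LLM to return
# for one chunk of text (shape and names are assumptions).
llm_output = """{
  "entities": [
    {"id": "alice", "label": "Person", "name": "Alice"},
    {"id": "neo4j", "label": "Technology", "name": "Neo4j"}
  ],
  "relationships": [
    {"source": "alice", "type": "USES", "target": "neo4j"}
  ]
}"""

def to_cypher(extraction):
    """Turn one chunk's extraction result into idempotent Cypher statements."""
    stmts = []
    for e in extraction["entities"]:
        # MERGE (not CREATE) so re-processing a chunk doesn't duplicate nodes
        stmts.append(
            f"MERGE (:{e['label']} {{id: '{e['id']}', name: '{e['name']}'}})"
        )
    for r in extraction["relationships"]:
        stmts.append(
            f"MATCH (a {{id: '{r['source']}'}}), (b {{id: '{r['target']}'}}) "
            f"MERGE (a)-[:{r['type']}]->(b)"
        )
    return stmts

statements = to_cypher(json.loads(llm_output))
```

Each resulting statement could then be executed against Neo4j, e.g. with the official Python driver's `session.run`.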

Comments • 80

  • @w_chadly
    @w_chadly 6 months ago +4

    this is incredible! I see so many use cases opening up. thank you for sharing this!

  • @johannesjolkkonen
    @johannesjolkkonen  7 months ago +13

    Hey everybody, thanks for the great comments!
    Finally got around to making a more detailed tutorial for this demo, with code available on Github. You can check it out here: ruclips.net/video/tcHIDCGu6Yw/видео.html

  • @SafetyLabsInc_ca
    @SafetyLabsInc_ca 7 months ago +2

    This is a great video.
    It clearly explained the difference between vector databases and graph databases, and the new features we can build using graph databases. Thank you.

  • @jonathancooper7068
    @jonathancooper7068 4 months ago

    Very nice demo. It showed why and how to use the graph database for RAG and answered questions that I came up with while watching.

  • @alchemication
    @alchemication 7 months ago +2

    Super nice food for thought. Thanks for sharing an alternative. Would love a deeper dive with some clear examples confirming the 3 advantages 😊 But might experiment myself for fun too!

  • @agentDueDiligence
    @agentDueDiligence 4 months ago +1

    This concept & this video are truly amazing.
    I have a specific idea of how to apply this. I think this might change my whole project, and I will explore this graph-based approach!!
    Great work - thank you.

  • @michaeldoyle4222
    @michaeldoyle4222 4 months ago

    great content and delivery - love your work

  • @chrisogonas
    @chrisogonas 2 months ago +2

    Well illustrated! Thanks

  • @itsdavidmora
    @itsdavidmora 1 month ago

    Really neat demo! I think this works so well because graphs help LLMs approximate the sort of clear relationships humans have in their brain about the world.

  • @NLPprompter
    @NLPprompter 7 months ago +6

    I love it when data engineers make videos; they're so easy to understand. Even the description is structured 👍

  • @MuhammedBasil
    @MuhammedBasil 6 months ago +1

    Wow. This is amazing

  • @WisherTheKing
    @WisherTheKing 7 months ago +1

    Great video! I wanted to explore the graph dbs exactly for this use case. Imagine also adding work pieces to this. Jiras, code reviews, comments, etc.
    P.S. the music is great 😂

  • @engage-meta
    @engage-meta 7 months ago

    Good presentation. Thank you!

  • @AssassinUK
    @AssassinUK 7 months ago +3

    I had to subscribe based on this idea alone! I'm trying to think of another way I could implement this with standard RAG for those that use LangChain/Flowise, and Mermaid code to hold the node information.

  • @AEVMU
    @AEVMU 4 months ago

    Gotta look at decentralized knowledge graphs. Those are the future of RAG databases.

  • @antoninleroy3863
    @antoninleroy3863 7 months ago

    very interesting thanks !

  • @mostlazydisciplinedperson
    @mostlazydisciplinedperson 7 months ago

    thank you for video

  • @pierrebonnet2026
    @pierrebonnet2026 7 months ago

    Nice!

  • @quansun8245
    @quansun8245 7 months ago +1

    A really awesome video Johannes, wondering if there is a github repo for this? Thanks.

  • @chenzhong1182
    @chenzhong1182 7 months ago +1

    That's exactly what I am looking for! Apart from the tutorials, are you also considering starting a Discord channel where people can chat? I think there is growing interest in KG + LLMs but nowhere to discuss it.

  • @_jen_z_
    @_jen_z_ 7 months ago +1

    Thanks for sharing! Can you also share how you are dealing with consolidation of output nodes? Some project descriptions might generate "Graph Neural Nets", another "Graph Neural Network" or "GNN".

    • @johannesjolkkonen
      @johannesjolkkonen  7 months ago +1

      Hey Djan! Consolidation/entity resolution is definitely one of the most interesting challenges with these kinds of applications, but in this demo there's nothing implemented for that yet
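The consolidation/entity-resolution problem raised above can be sketched minimally as a normalization pass over extracted names. The alias table here is a hand-made illustration; real pipelines often add fuzzy matching or embedding similarity on top.

```python
# Hypothetical alias table mapping surface forms onto one canonical entity.
ALIASES = {
    "graph neural net": "Graph Neural Network",
    "graph neural nets": "Graph Neural Network",
    "graph neural networks": "Graph Neural Network",
    "gnn": "Graph Neural Network",
}

def canonicalize(name):
    """Resolve an extracted entity name to its canonical form, if known."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())
```

Running `canonicalize` before writing nodes means "GNN" and "Graph Neural Nets" merge into a single node instead of three near-duplicates.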

  • @evetsnilrac9689
    @evetsnilrac9689 7 months ago +1

    Excellent job Johannes! After watching the video "Knowledge Graph Construction Demo from raw text using an LLM" by Neo4j, I came across your video and found that you addressed the crucially important question some of us are thinking about: "How can we improve the way we do RAG?" I agree with your assessment that using KGs provides very significant benefits that would compel us to use this approach over vector embeddings. However, am I correct in understanding that we need better workflows/pipelines to get all the kinds of data we need to work with into a KG, to take full advantage of these benefits?
    Sounds like you may have listened to Denny Vrandecic discussing "The Future of Knowledge Graphs in a World of Large Language Models".

    • @johannesjolkkonen
      @johannesjolkkonen  7 months ago +1

      Hey Steve, thank you!
      You are correct, using a KG will almost certainly involve more pre-processing/workflows compared to just having an unstructured text/vector database. LLMs can be very useful in the process of extracting entities and relationships for your graph, but it's still a serious undertaking, with a lot of quality checks needed to make it production-ready. It's all still pretty experimental and niche, but I think this approach will become increasingly mainstream over the next 1-2 years.
      I haven't checked out Denny's video, but I definitely will now! I can also recommend going through the content that the Neo4j team has been creating around LLMs.

    • @evetsnilrac9689
      @evetsnilrac9689 7 months ago +9

      @johannesjolkkonen Here's my summary of the key points of Denny's presentation.
      • LLMs are expensive to train.
      • LLMs are expensive to run inference on.
      • LLMs can't be trusted to correctly output accurate facts.
        • Answers are just guesses based on stochastic probability, even if the model has inferred a different answer in a different language. That is, it does not "know" what it "knows", because it does not maintain a list of all the things it knows; it just generates outputs at inference time.
      • Knowledge in ChatGPT seems to be stored not in a language-independent way, but within each individual language.
      • LLMs are not very good at math, and it would be economically inappropriate to use them for math computation.
      • Autoregressive transformer models such as ChatGPT are supposed to be Turing complete, but they are a very expensive reiteration of Turing's tarpit: you could do everything with them, but that doesn't mean you should.
      • It is economically inappropriate to try to improve an LLM's ability to internalize knowledge (know what it knows), because it will always be cheaper, faster, and more accurate(?) to externalize it in a graph store and look it up when needed.
      In a world where language models can generate infinite content, "knowledge" (vs. content) becomes valuable.
      • We don't want to machine-learn Obama's place of birth every time we need it.
      • We want to store it once and for all, and that's what knowledge graphs are good for: keeping your valuable knowledge safe.
      The knowledge graph provides the ground truth for your LLMs.
      • LLMs are probably the best tool for knowledge extraction we have seen in a decade or two.
      • They can be an amazing tool to speed up the creation of a knowledge graph.
      • We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
      • And this is why it makes so much sense to store knowledge in a symbolic system that can be edited, audited, curated, and understood, where we can cover the long tail by simply adding new nodes to the knowledge graph that can be looked up, instead of relying on systems that need to be trained to return knowledge with a certain probability and may make things up on the fly.

    • @wdonno
      @wdonno 7 months ago

      @evetsnilrac9689, such a helpful summary! Thank you!

  • @infinit854
    @infinit854 7 months ago

    How does the chat interface communicate with the database? Is it based on prompts that generate Cypher queries?

  • @inflationking1271
    @inflationking1271 6 months ago +1

    I would be curious about your view on when vector search is better suited than graph search for RAG. Thanks for this great video! It helps a lot.

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago +1

      Thank you!
      Vector search is still great for a lot of situations, when answers can be found directly in the unstructured text. Where graphs (or really any other more "structured" databases) start to shine is when you need to understand concepts and their relationships beyond what's explicitly said in the text. But this is a lot more demanding too, and often not necessary.
      Also the two aren't mutually exclusive, with neo4j (and recently AWS Neptune, another graph db) supporting vector search to also search nodes by their similarity. This combination is super exciting!

  • @Epistemophilos
    @Epistemophilos 6 months ago +2

    Fabulous video, thanks! Would be even better with no music, or at least if it was very much lower volume :)

  • @tomgiannulli911
    @tomgiannulli911 6 months ago

    Did you use attributes to add more characteristics to the nodes and edges, for example to score the strength of a relationship? I have tried asking LLMs to create graphs from their native knowledge using various prompts, and they do poorly, which is interesting. Does that indicate a lack of understanding of relationships, or more of a fine-tuning issue? What do you think?

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago

      Hey, I haven't added such metadata but that's a great idea!
      For your problem, I'd say the most important thing is to make sure you tell the LLM what kinds of entities and relationships you are looking for. In other words, you should have a pre-defined schema in mind for your graph. Some pre-processing might also be useful if your data is very messy.
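The schema-constrained extraction suggested in the reply above can be sketched as a prompt builder. The schema contents and prompt wording are illustrative assumptions, not the author's actual prompts.

```python
# Hypothetical pre-defined graph schema the LLM must stick to when extracting.
SCHEMA = {
    "entities": ["Person", "Project", "Technology"],
    "relationships": ["WORKS_ON", "USES"],
}

def extraction_prompt(text):
    """Build an extraction prompt that constrains the LLM to the schema."""
    return (
        "Extract entities and relationships from the text below.\n"
        f"Only use these entity types: {', '.join(SCHEMA['entities'])}.\n"
        f"Only use these relationship types: {', '.join(SCHEMA['relationships'])}.\n"
        "Return JSON with 'entities' and 'relationships' keys.\n\n"
        f"Text: {text}"
    )
```

Telling the model up front which labels are allowed keeps the output graph consistent chunk to chunk, instead of letting each call invent its own vocabulary.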

  • @parvesh8227
    @parvesh8227 15 days ago

    Darn!! I have been working on something similar, slightly different approach

  • @phmfthacim
    @phmfthacim 1 month ago

    I like the music

  • @jingqiwu2865
    @jingqiwu2865 6 months ago

    Very nice and inspiring. Quick question: if GPT-4 created incorrect Cypher, do we try to detect and auto-fix/retry?

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago

      Thank you!
      You can see the details in my latest video, but in this setup we aren't doing that. That's definitely one of the best and simplest ways this could be improved.
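The detect-and-retry idea from this exchange can be sketched as a loop that runs the generated Cypher and, on failure, feeds the error back to the model. `ask_llm` and `run_cypher` here are hypothetical stand-ins for an LLM call and a neo4j session, not real APIs from the demo.

```python
def generate_with_retry(question, ask_llm, run_cypher, max_tries=3):
    """Ask the LLM for Cypher; on a database error, retry with the error text."""
    error = None
    for _ in range(max_tries):
        if error is None:
            prompt = question
        else:
            # Feed the failure back so the model can correct its own query
            prompt = f"{question}\nPrevious query failed with: {error}\nFix it."
        query = ask_llm(prompt)
        try:
            return run_cypher(query)
        except Exception as exc:
            error = str(exc)
    raise RuntimeError(f"Gave up after {max_tries} tries: {error}")
```

The same pattern works with any text-to-query setup: the database's own error message is usually the best correction signal available.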

  • @chabo26
    @chabo26 21 days ago

    Great demo on learning neo4j and LLMs. In a typical RAG setup, a vector database is created for the documents; how does that work with a neo4j graph db?

    • @johannesjolkkonen
      @johannesjolkkonen  20 days ago

      Thanks! As well as the multi-hop searches I talk about here, you can also use neo4j for storing vector representations of the nodes and their text content, and search based on node similarity and such.
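The node-similarity search mentioned in this reply comes down to nearest-neighbour search over node embeddings, which neo4j's vector index does server-side. Here is the underlying idea in plain Python, with made-up 3-dimensional embeddings standing in for real model output:

```python
import math

# Hypothetical node embeddings (real ones would come from an embedding model
# and live in a neo4j vector index, not an in-memory dict).
node_embeddings = {
    "Project A": [0.9, 0.1, 0.0],
    "Project B": [0.8, 0.2, 0.1],
    "Invoice 17": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, k=2):
    """Return the k node names closest to the query vector."""
    ranked = sorted(
        node_embeddings,
        key=lambda n: cosine(query_vec, node_embeddings[n]),
        reverse=True,
    )
    return ranked[:k]
```

Once matching nodes are found by similarity, the graph's relationships can be traversed from them, which is what makes the hybrid approach attractive.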

  • @AdamLorentzen
    @AdamLorentzen 7 months ago +2

    I'm working on something similar, but you make it look easy! Would love to chat and see if we could collaborate on something to get in front of clients :)

  • @bartoszko4028
    @bartoszko4028 1 month ago

    Is it better in some way than using a SQL db with relations based on, for example, SQL schemas, which can also be easily used for retrieval?

  • @sakinamosavi1104
    @sakinamosavi1104 7 months ago +3

    I am very excited to see how your code works. Please share your solution.

  • @Epistemophilos
    @Epistemophilos 6 months ago

    Around 5:45, how does the LLM combine the graph search with "normal" LLM generation? What happens behind the scenes?

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago +1

      Hey! I show that part in detail in my latest video, here: ruclips.net/video/Kla1c_p5v0w/видео.html

  • @thehappycookiehour
    @thehappycookiehour 4 months ago

    Why not use both a KG and vector embeddings?

  • @Jeremy-bd2yx
    @Jeremy-bd2yx 7 months ago

    When the text to cypher conversion happens, how does the LLM know how the nodes/edges are labeled and therefore able to accurately write the query?

    • @johannesjolkkonen
      @johannesjolkkonen  7 months ago +1

      Hey Jeremy! If you are referring to the chat interaction, we pass the schema of the graph onto the LLM, alongside the user's query.
      For other questions, I just released a detailed breakdown of how to generate the graph which you can find on my channel. All the code is available as well.
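The schema-passing described in this reply can be sketched as follows. The schema text, labels, and wording are illustrative assumptions; neo4j can produce the real schema for you, e.g. via `CALL db.schema.visualization()`.

```python
# Hypothetical textual summary of the graph's schema, included in every
# text-to-Cypher prompt so the model knows the actual labels and types.
GRAPH_SCHEMA = (
    "Node labels: Person(name), Project(name), Technology(name)\n"
    "Relationships: (Person)-[:WORKS_ON]->(Project), "
    "(Project)-[:USES]->(Technology)"
)

def text_to_cypher_prompt(question):
    """Combine the graph schema with the user's question for the LLM."""
    return (
        "You are an expert Cypher query writer.\n"
        f"Graph schema:\n{GRAPH_SCHEMA}\n\n"
        f"Write a Cypher query answering: {question}"
    )
```

Without the schema in the prompt, the model has to guess label and relationship names and will routinely produce queries that match nothing.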

    • @Jeremy-bd2yx
      @Jeremy-bd2yx 7 months ago

      @johannesjolkkonen thank you! Watching now!

  • @MrDonald911
    @MrDonald911 6 months ago

    What's the added value for a company using a tool like this? Is it to save time? What's their ROI if they invest in such a solution? Thank you for the video and the great work ;) It would be awesome if you also talked about the business side of this, thanks.

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago +1

      Thank you! I'm sure I'll be talking more about some concrete business cases around this in the future 🙂

  • @98f5
    @98f5 6 months ago +1

    How can that generate useful relationship triples when you can only give small subsets of the data to the LLM at a time?

    • @johannesjolkkonen
      @johannesjolkkonen  6 months ago +1

      Hey, good question. Two points:
      - We can add nodes and relationships to the graph incrementally, so we don't need to identify all the relationships at once.
      - The subsets can also be quite large; with 16k-32k context-window models, that would be ~15-30 pages of content at a time.
      And so while some relationships may only become apparent when looking at the "full picture" of all the data, I think most relationships can be identified within the subsets, in isolation. For example, if a paragraph mentions that some technologies were used for one project, that's all we need to know about those tech->project relationships. Then if we find more relationships or attributes for that project or those technologies later in the data, we can just add them to the graph.
      This can be different case-by-case, of course 🙂
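The incremental approach described in this reply works because each chunk's triples are merged rather than inserted blindly, so processing chunks independently still converges to one consistent graph. A tiny in-memory sketch (a Python set stands in for the database; in neo4j the same effect comes from Cypher's MERGE):

```python
# In-memory stand-in for the graph: a set of (subject, predicate, object)
# triples. All names here are illustrative.
graph = set()

def merge_triples(triples):
    """MERGE-like insert: adding an already-present triple is a no-op."""
    for triple in triples:
        graph.add(triple)

# Two chunks processed independently, with one overlapping fact.
chunk1 = [("Alice", "WORKS_ON", "Project X"), ("Project X", "USES", "Neo4j")]
chunk2 = [("Project X", "USES", "Neo4j"), ("Bob", "WORKS_ON", "Project X")]
merge_triples(chunk1)
merge_triples(chunk2)
```

The overlapping ("Project X", "USES", "Neo4j") fact is stored once, which is exactly why chunk ordering and chunk boundaries don't have to be perfect.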

  • @SDGwynn
    @SDGwynn 7 months ago +1

    More info please.

  • @alinakhaee4935
    @alinakhaee4935 7 months ago +1

    Please teach us how to do it

  • @zaursamedov8906
    @zaursamedov8906 7 months ago +1

    Will you be able to share the prompts and code snippets?

    • @johannesjolkkonen
      @johannesjolkkonen  7 months ago +4

      The repo is still a work in progress, but I'm planning to make a video soon where I share and walk through the code in more detail!

    • @mtprovasti
      @mtprovasti 7 months ago

      Awesome, I was thinking of applying something like this to a plain traditional hierarchical taxonomy. Looking forward to it.

    • @shaunjohann
      @shaunjohann 7 months ago

      @johannesjolkkonen that's great to hear! I'm working on a project that needed to hear some of what you said.

    • @johannesjolkkonen
      @johannesjolkkonen  7 months ago

      A full video-walkthrough is now live here: ruclips.net/video/tcHIDCGu6Yw/видео.html
      Repository link included (:

  • @CreativityCourse
    @CreativityCourse 3 months ago

    Hey, great video. Do you have the code in a repo?

    • @johannesjolkkonen
      @johannesjolkkonen  3 months ago

      Thanks! Yes I do, you can find a more detailed tutorial on my channel which also has the link to the repo (:

  • @openyard
    @openyard 1 month ago

    I think there are learners who find music essential for concentration and understanding, and who would go as far as advocating for music in classrooms. But there are others who find background music to be noise, and therefore distracting and annoying. I am assuming you listened to the video after adding the music and found it better with the background music than without.
    To cater for both groups of learners, perhaps you could upload two versions of your videos, one version without the addition of the music and the other with the music. You may include a label such as "without music" and "with music" respectively.

  • @Noneofyourbusiness2000
    @Noneofyourbusiness2000 1 month ago

    I really don't see how this is any different from a typical database with more columns. For example:
    Sort by company
    Lookup Azure
    Next sort by number of projects
    Lookup employee

  • @labloke5020
    @labloke5020 8 months ago +49

    Please do not use music when creating future videos.

    • @johannesjolkkonen
      @johannesjolkkonen  8 months ago +6

      Hey, thanks for the feedback. I'll keep that in mind!

    • @UlrikStreetPoulsen
      @UlrikStreetPoulsen 7 months ago +2

      Agreed, that's really off-putting

    • @infinit854
      @infinit854 7 months ago +10

      I enjoyed the music 👍

    • @NLPprompter
      @NLPprompter 7 months ago +2

      Agree, but you can use music during pauses, not while you're talking.

    • @itslordquas
      @itslordquas 7 months ago +11

      bro what about a "thank you for the amazing info" before nitpicking? 😂

  • @podunkman2709
    @podunkman2709 4 months ago

    Presentation about nothing. "How to build it" is what's required.

    • @johannesjolkkonen
      @johannesjolkkonen  4 months ago

      Hey, I also have a full tutorial on this here: ruclips.net/video/tcHIDCGu6Yw/видео.html&lc=UgyOfLtgIOQyEu2zmMF4AaABAg 🙂

  • @openyard
    @openyard 1 month ago

    Yes, the background music is distracting and annoying.

  • @mcpduk
    @mcpduk 4 months ago +1

    excellent video - but the music ...... please no.........