GraphRAG: LLM-Derived Knowledge Graphs for RAG

  • Published: 24 Nov 2024

Comments • 142

  • @alexchaomander
    @alexchaomander  6 months ago +32

    What scenarios do you see GraphRAG being useful for?
    UPDATE: GraphRAG is now open source! Check out the release announcement video here: ruclips.net/video/dsesHoTXyk0/видео.html

    • @jtjames79
      @jtjames79 6 months ago +10

      Using GraphRAG to make GraphRAGs.
      Because AI should be able to go down the rabbit hole.

    • @alexanderroodt5052
      @alexanderroodt5052 6 months ago +6

      Profiling people

    • @Sergio-rq2mm
      @Sergio-rq2mm 6 months ago +10

      Anywhere relationships are important: abstract associations between datasets, perhaps laws, policies, and things that are very narrative-driven, such as stories. Non-typical datasets, basically.

    • @alexanderroodt5052
      @alexanderroodt5052 6 months ago +1

      @@Sergio-rq2mm I choose to go the 1984 route

    • @ktbumjun
      @ktbumjun 6 months ago +5

      Bible study

  • @alexanderbrown-dg3sy
    @alexanderbrown-dg3sy 6 months ago +41

    This is basically causal grounding. Once we figure out semantic symbolic reasoning from an architectural perspective and add a powerful model, something very compelling and AGI-like would be the result, I would assume (plus MCTS sampling, lol). Causal grounding is a huge hole in current models.
    This is dope research. Kudos.

  • @jcourson8
    @jcourson8 6 months ago +14

    I've been doing work in the area of creating knowledge graphs for codebases. The nice thing about generating them for code (as opposed to text) is that you don't have to rely on LLM calls to recognize and generate relationships; you can use language servers and language parsers for that (see the sketch after this thread).

    • @tomasb3191
      @tomasb3191 4 months ago +2

      That's interesting. What kind of insight can you get from the derived structure? I don't think code agents are leveraging language servers enough; it looks like they only do RAG vector search for context.

    • @reurbanise
      @reurbanise 4 months ago +2

      I'd love to hear more about this. Any code you can share?
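
A minimal sketch of the parser-based approach described in this thread, assuming Python source and using only the standard-library ast module; the commenter's actual tooling (language servers, cross-file analysis) is not shown. It derives caller/callee edges without any LLM calls:

import ast

source = """
def parse(path):
    return load(path)

def load(path):
    with open(path) as f:
        return f.read()
"""

tree = ast.parse(source)
edges = []  # (caller, callee) pairs forming a small call graph

for func in ast.walk(tree):
    if isinstance(func, ast.FunctionDef):
        for node in ast.walk(func):
            # Only direct calls to plain names; attribute calls like f.read() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                edges.append((func.name, node.func.id))

print(edges)  # [('parse', 'load'), ('load', 'open')]

Language servers would add richer, cross-file relationships (definitions, references, types), but even this much yields graph edges essentially for free compared with LLM extraction.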

  • @BhaswataChoudhury
    @BhaswataChoudhury 6 months ago +42

    Looking forward to the code for this!

  • @lalamax3d
    @lalamax3d 6 months ago +8

    Glad I didn't skip this and watched the video; thanks for sharing the knowledge. Seems very impressive.

  • @peteredmonds1712
    @peteredmonds1712 6 months ago +19

    This was so well explained, nicely done. My first thoughts:
    1. I'd be curious to see benchmarks with cheaper LLMs. From my experience, even much smaller models like llama-3-8b can come close to GPT-4 in this use case (entity and relationship extraction), and a little fine-tuning could likely match or surpass GPT-4 for much cheaper (a rough extraction sketch follows this thread).
    2. I wonder how this could be augmented with data sources that already have some concept of relationships, i.e. Wikipedia, dictionaries, hypertext.

    • @mrrohitjadhav470
      @mrrohitjadhav470 6 months ago

      I was having the same thoughts 🙂

    • @Rkcuddles
      @Rkcuddles 6 months ago

      GPT-4 not understanding these deep relationships is by far the biggest bottleneck in my use of it. This is super exciting.
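
A hedged sketch of the extraction step discussed above: prompting a smaller, locally served model through an OpenAI-compatible endpoint to emit entities and relationship triples. The endpoint URL, model name, and prompt wording are illustrative assumptions, not the GraphRAG implementation:

import json
from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g. vLLM, llama.cpp, Ollama); adjust as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

PROMPT = (
    "Extract the entities and relationships from the text below. "
    'Respond with JSON only, using keys "entities" (a list of names) and '
    '"relations" (a list of [source, relation, target] triples).\n\nText:\n'
)

def extract_triples(chunk: str) -> dict:
    resp = client.chat.completions.create(
        model="llama-3-8b-instruct",  # assumed local model name
        messages=[{"role": "user", "content": PROMPT + chunk}],
        temperature=0.0,
    )
    # A real pipeline needs more robust parsing and retries than a bare json.loads.
    return json.loads(resp.choices[0].message.content)

A small fine-tune on gold extractions, as point 1 above suggests, would slot in by swapping the model name; the surrounding pipeline stays the same.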

  • @andydataguy
    @andydataguy 6 months ago +5

    That final Streamlit app was awesome!!

  • @ChetanVashistth
    @ChetanVashistth 6 months ago +4

    This seems very powerful. Thanks for sharing it and explaining it well.

  • @mvasa2582
    @mvasa2582 6 months ago +7

    While RAG is a good process for reducing hallucinations, GraphRAG makes the retrieved context richer with its relationship-building techniques, and the expense is worth it. Is the result set then re-graphed, or will running the same query twice be just as expensive?

  • @TomBielecki
    @TomBielecki 6 months ago

    I really like the addition of hierarchical agglomerative summarization, which gives holistic answers similar to the RAPTOR RAG strategy but with the better data representation of knowledge graphs (a toy sketch of the idea follows this thread). I'll need to read the paper to understand whether embeddings are used at all in this, and whether relationships are labelled or just have a strength value.

    • @UtsavChokshi
      @UtsavChokshi 4 months ago

      Relationships are not labelled but they have descriptions.
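
A toy illustration of the community-summarization idea mentioned above. The GraphRAG paper uses Leiden clustering and LLM-written community reports; here greedy modularity from networkx stands in for the clustering, and summarize() is a placeholder for the LLM call. Note the edges carry free-text descriptions plus a strength value rather than fixed relation labels, matching the reply above:

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
# Relationship edges with a description and a strength value.
G.add_edge("Alice", "Acme Corp", description="Alice founded Acme Corp", weight=0.9)
G.add_edge("Acme Corp", "Widget", description="Acme Corp manufactures the Widget", weight=0.7)
G.add_edge("Bob", "Widget", description="Bob reviews the Widget", weight=0.4)

def summarize(descriptions):
    # Placeholder: in GraphRAG an LLM turns one community's edges into a report.
    return " / ".join(descriptions)

community_summaries = []
for nodes in greedy_modularity_communities(G):
    descriptions = [d["description"] for _, _, d in G.edges(nodes, data=True)]
    community_summaries.append(summarize(descriptions))

print(community_summaries)

Answering a global question ("What are the top themes?") then becomes a map-reduce over these community summaries rather than over raw chunks.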

  • @mohamedkarim-p7j
    @mohamedkarim-p7j 13 days ago +1

    Thanks for sharing 👍

  • @JasonSun386
    @JasonSun386 6 months ago +2

    Seems like the video was incomplete. Is there another part?

  • @sairajpednekar8049
    @sairajpednekar8049 6 months ago +5

    May I know the underlying technology used for hosting the graph database? Was it Cosmos DB?

    • @nas8318
      @nas8318 6 months ago +1

      Likely neo4j

    • @alexchaomander
      @alexchaomander  6 months ago +6

      It's graph-database agnostic! You can use the graph DB of your choice; the technique is general enough to support multiple back ends.

    • @HediBen-r7t
      @HediBen-r7t 6 months ago

      It's not about the database, it's about the methodology. RDF or property-graph stores should both work.

  • @Mrbeastifed
    @Mrbeastifed 6 months ago +2

    Is there an open-source implementation of this, or how could I build it into my own app?

  • @Aditya_khedekar
    @Aditya_khedekar 6 months ago +2

    Hi, I'm working on solving the same problem: vector-search RAG isn't good enough. Can you please share the code? A tutorial would be even better!

  • @lifedownunderse
    @lifedownunderse 6 months ago +2

    I really enjoyed this video! What tool did you use to visualise the podcast graph?

  • @heterotic
    @heterotic 6 months ago +2

    How is this any different from self-organizing maps for RAG?

  • @mrstephanwehner
    @mrstephanwehner 6 months ago +4

    Is there no standard comparison approach? For example, one could take academic literature reviews, collect their references, throw in some more, ask the LLM system, and compare the result with the original review. There might also be summaries available in the accounting and legal worlds that could be used.

    • @alexchaomander
      @alexchaomander  6 months ago +3

      Comparison is tough! It's another area of research we're heavily invested in. But I like the ideas that you're bringing up!

    • @sathyanarayanbalaji2971
      @sathyanarayanbalaji2971 6 months ago

      True, validation would be required to compare the results.

  • @olegpopov3180
    @olegpopov3180 6 months ago +2

    What is the technology stack for that?

  • @jasonjefferson6596
    @jasonjefferson6596 6 months ago +1

    Does the repeated term "regular RAG" refer to setups using vector databases?

  • @knaz7468
    @knaz7468 5 months ago +1

    Run this on the Lex Fridman podcast library!

  • @PsyoPhlux
    @PsyoPhlux 4 months ago +2

    Pretty soon, everyone will be GraphRAG-ing their podcasts. JRE would be neat.

  • @crioss6803
    @crioss6803 2 months ago

    Hi, what about visualizing abstract ideas in a multidimensional space, e.g. some mathematical ideas and their relations to other ideas, in the form of a knowledge graph and/or tools for researching deeper details? Are there any such AI-based tools?

  • @En1Gm4A
    @En1Gm4A 6 months ago +54

    pls provide the code

    • @alexchaomander
      @alexchaomander  6 months ago +25

      Code will be shared soon!

    • @SamuelJunghenn
      @SamuelJunghenn 6 months ago

      +1 🙏

    • @En1Gm4A
      @En1Gm4A 6 months ago +2

      @@alexchaomander Great! I have signed up for your newsletter. Will you inform about the code release there?

    • @Lutz1985
      @Lutz1985 6 months ago

      le dot

    • @bejn5619
      @bejn5619 6 months ago

      +1

  • @ir43
    @ir43 3 months ago

    Are there any accelerators to convert a typical knowledge corpus of unstructured text into a knowledge graph conducive to GraphRAG? I understand we need to extract entities and figure out relationships, but who does that work? An LLM?

  • @sandeepsasikumar701
    @sandeepsasikumar701 5 months ago +2

    Is the code available?

  • @vishwakneelamegam9479
    @vishwakneelamegam9479 5 months ago

    Is there a way to retrieve a specific area of the graph rather than providing the whole graph to the LLM?

  • @Cedric_0
    @Cedric_0 1 month ago

    Any update on that Streamlit code? That would be helpful, thanks.

  • @joydeepbhattacharjee3849
    @joydeepbhattacharjee3849 3 months ago

    Very nice presentation and explanation

  • @Rkcuddles
    @Rkcuddles 6 months ago

    Please let me play with this! Impressive work!

  • @DefenderX
    @DefenderX 6 months ago

    Great, this is something I also thought about a while back when AI had difficulty finding relevant information:
    basically, have filters that determine how the AI maneuvers through the training data depending on what is prompted and its relevance.
    This is something I thought about after reading a paper on the discovery of a new hybrid brain-cell type that acts as a trigger to turn pathways on and off.
    So the context in the prompt is what's important, because it decides which tags in the training data are turned on and off,
    which in the end gives the AI a unique pathway for retrieving data.

    • @DefenderX
      @DefenderX 6 months ago

      Also, the next step would be creating overarching filters across several AI agents.
      After you have all this, the next step is for the AI to use statistics in its reasoning.

  • @filippomarino861
    @filippomarino861 6 months ago +1

    This could be a game-changer in both public- and private-sector intelligence analysis (as I am sure you figured out). Looking forward to additional info, but what about the private dataset's format? Is it vectorized? If so, can we assume there are optimal and sub-optimal approaches? In other words, is it fair to assume vectorization can significantly impact GraphRAG's performance?

  • @energyexecs
    @energyexecs 4 months ago

    Thanks for the video. I can see a Use Case in my energy industry. Does GraphRAG work across all "modes" and "modalities"?

  • @phillipmaire8637
    @phillipmaire8637 6 months ago

    Would love the opportunity to contribute to this project, super interesting.
    How easy is it to update existing knowledge graphs periodically when new data comes in? Is there a “reindexing” cost?

  • @ysy69
    @ysy69 4 months ago

    This is powerful!

  • @anthonyanalytics
    @anthonyanalytics 4 months ago

    This is amazing!

  • @mhkk1122
    @mhkk1122 5 months ago

    I am waiting eagerly for the code of this paper.

  • @GigaFro
    @GigaFro 6 months ago

    Excuse me if I'm wrong (I listened to this while exercising), but the main issue explored here was that questions like "What are the top themes?" cannot be answered by the LLM with vanilla RAG. Is that correct?
    If so, then once context sizes grow large enough, will this become less necessary?
    Furthermore, by introducing a graph whose communities are premised on topics/themes or whatever you decide, don't you reduce the degrees of freedom of your system?

  • @escanoxiao6871
    @escanoxiao6871 6 months ago

    Fabulous work! Wondering how long it takes to build the whole vector DB, and how many tokens it takes?

  • @knutoletube
    @knutoletube 6 months ago +1

    Is the rest of this conversation available somewhere, @alexchaomander?

  • @ghostwhowalks2324
    @ghostwhowalks2324 6 months ago

    This is just brilliant

  • @SDAravind
    @SDAravind 5 months ago

    What's the database used?

  • @NobleCaveman
    @NobleCaveman 6 months ago

    Would be a great tool for rapid and more reliable meta-analysis.

  • @dhirajkhanna-thebeardedguy
    @dhirajkhanna-thebeardedguy 6 months ago

    This is outstanding stuff!

  • @MyRandomnezz
    @MyRandomnezz 5 months ago

    Can you provide the code for this? Would be amazing!

  • @mhkk1122
    @mhkk1122 5 months ago

    This approach is really good, but don't you think extracting entities and then building relationships between the extracted entities is an expensive operation if we use GPT-4 or Gemini?

    • @j.k.priyadharshini9753
      @j.k.priyadharshini9753 5 months ago

      I thought the same. Using a knowledge graph is super, but how are we going to create it with less compute and at lower cost?

  • @Thrashmetalman
    @Thrashmetalman 6 months ago

    Is there source code anywhere for this?

  • @Sarmoung-Biblioteca
    @Sarmoung-Biblioteca 6 months ago

    GraphRAG: perfect!

  • @FitoreKelmendi-fm1tg
    @FitoreKelmendi-fm1tg 5 months ago

    Does ChatGPT (the paid version) use GraphRAG?

  • @RickySupriyadi
    @RickySupriyadi 6 months ago +1

    Oh hey, that's the Obsidian style of note-making.
    It's interesting that AI can actually remember better with the help of a Zettelkasten, like humans do!?
    Can't wait until the Japanese researchers conclude their research on using chemical reactions in a tube to emulate emotions, so machines can feel emotions through chemical reactions like humans do... to me, emotion is also the best way to learn and remember things.

    • @RickySupriyadi
      @RickySupriyadi 6 months ago

      So what if, instead of a tube of chemical reactions,
      important information and frequently asked questions had an emotional-cue graph to create a kind of importance profile, so that the profile serves as a mark of where the AI is the expert in that field (strong retrieval in a specific field, leading toward a future of MoE)?

  • @Cedric_0
    @Cedric_0 3 months ago

    Streamlit code would be great, thanks

  • @pabloe1802
    @pabloe1802 6 months ago

    To understand semantic search, you first need to understand how HNSW works; then you realize it's no wonder it doesn't work well on its own. I ended up building a data structure that combines vector search and entities.
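
A rough sketch of the kind of hybrid structure described above (purely illustrative; the commenter's actual design isn't shown in the video): vector-search candidates are re-ranked by how many extracted entities they share with the query:

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_rank(query_vec, query_entities, chunks, alpha=0.7):
    """chunks: list of dicts with 'vec' (np.ndarray) and 'entities' (set of strings)."""
    scored = []
    for chunk in chunks:
        sim = cosine(query_vec, chunk["vec"])  # stands in for an ANN/HNSW similarity score
        overlap = len(query_entities & chunk["entities"]) / max(len(query_entities), 1)
        scored.append((alpha * sim + (1 - alpha) * overlap, chunk))
    return sorted(scored, key=lambda item: item[0], reverse=True)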

  • @youcaio
    @youcaio 4 months ago

    It's beautiful!

  • @hjl1045
    @hjl1045 6 months ago

    When will it be open sourced? :)

  • @pablof3326
    @pablof3326 6 months ago

    Great work! I was thinking of using a system like this to build the memory of an AI companion as it talks to the user. So in this case the knowledge graph would start empty and get built up dynamically with every conversation. Do you see this as a good use case for GraphRAG?

  • @malikanaser8251
    @malikanaser8251 6 months ago

    Hi, are you going to share the code?

  • @lesptitsoiseaux
    @lesptitsoiseaux 4 months ago

    It's a month later; where's the code you promised? Please?

  • @Sri_Harsha_Electronics_Guthik
    @Sri_Harsha_Electronics_Guthik 6 months ago

    implementations?

  • @ABG1788
    @ABG1788 5 months ago

    I don't understand. Why do we need GraphRAG when an LLM can summarise the text and find relationships?

    • @mrpocock
      @mrpocock 5 months ago

      Knowledge graphs have a ton of formal methods to work with them. If you can get the graph into RDF, then you can use all the RDF tooling, analyse it in Cytoscape, or whatever.
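
A small example of the point above, assuming the extracted triples have already been mapped to URIs (rdflib is just one choice of RDF tooling):

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Alice, EX.founded, EX.AcmeCorp))
g.add((EX.AcmeCorp, EX.manufactures, EX.Widget))

# Standard SPARQL works on the same data, as does any other RDF tool.
for row in g.query(
    "SELECT ?who WHERE { ?who <http://example.org/founded> ?org . }"
):
    print(row.who)  # http://example.org/Alice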

  • @adilgun2775
    @adilgun2775 2 months ago

    Great research topic, but as a hands-on NLP engineer working on NER-boosted knowledge graphs and LLMs, my experience says it's too naive to believe this would work in production systems.

  • @tacticalgaryvrgamer8913
    @tacticalgaryvrgamer8913 5 months ago

    I assume it's open source, because why would someone pay to have GPT-4 parse and organize their data? It takes two seconds to roll your own.

  • @nickfleming3719
    @nickfleming3719 6 months ago

    Okay... we know graph RAG is good, duh. How is it implemented, how do you feed it to the LLM, and how do you store the data?

  • @MahmoudAtef
    @MahmoudAtef 6 months ago +1

    But knowledge graphs are very slow to query. I wonder if we could encode those graphs into the GPT model by building graph transformers.

    • @damianlewis7550
      @damianlewis7550 6 months ago +3

      I don't think that's the case. Optimized graph query engines can return results in milliseconds (e.g. Wikimedia, Google, etc.) at a fraction of the computational cost of an LLM.
      The reason GraphRAG is slow-ish is that the LLMs are slow.

    • @MrDonald911
      @MrDonald911 6 months ago +4

      Google, Facebook, and LinkedIn all use graph databases; they're actually much faster than relational DBs.

    • @nas8318
      @nas8318 6 months ago +2

      Slower than LLMs?

  • @vcool
    @vcool 4 months ago +1

    Admit it: y'all built graphrag first for use by the CIA. This is not a joke.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 6 months ago +4

    But don't you lose information in the process of making a knowledge graph, given that only a subset of the textual information is extracted and retained in the KG?

    • @computerrockstar2369
      @computerrockstar2369 6 months ago

      I don't think the LLM really needs the graph to make decisions. It's more valuable for helping human users find related information.

    • @HediBen-r7t
      @HediBen-r7t 6 months ago

      You can use ETL to build your knowledge graph yourself from RDBMSs; then you will not lose information.

  • @bicepjai
    @bicepjai 4 months ago +1

    Why do we need other faces on screen? Hope they know they are just distractions :)

    • @nikharmalik6090
      @nikharmalik6090 4 months ago

      And I hope you know that you can zoom in on the screen so you don't see them, and that it's always better not to say anything if you don't have anything nice or useful to say 😊

  • @Walczyk
    @Walczyk 6 months ago

    What's a RAG?

    • @IlyaDenisov
      @IlyaDenisov 6 months ago

      Retrieval Augmented Generation (use that as an input to your favourite search engine or AI companion)

  • @lanc3carr
    @lanc3carr 6 months ago

    Police, FBI, CIA, etc... investigations (CSI AI)

  • @Ian_Arden
    @Ian_Arden 4 months ago +1

    Where did you get the body for this? This whole text is taken from some chief Russian propaganda bureau 😅

  • @RameshK-c1w
    @RameshK-c1w 6 months ago

    American princess Google Plex SEO Sandra Mitra watching.....

  • @historia_tego_swetra
    @historia_tego_swetra 4 months ago

    First of all, there is no MOVEMENT, only state-sponsored Russian proxies like the Yemenis.
    A very wrong choice of dataset.
    Second thing: there is NO NOVOROSSIA.

  • @cyberslot
    @cyberslot 1 month ago

    You could have spared yourselves the political/propaganda element.
    Out of the uncounted number of articles available, the choice of this particular topic is more than flashy. It's ridiculous for techie people who are expected to be smart in general...

  • @ross9263
    @ross9263 6 months ago +2

    The content is very political..

    • @somjrgebn
      @somjrgebn 5 months ago +1

      Haha, and skewed... Crickets for Gaza... but Odessa is worth mentioning?
      This is why it's best to avoid politics when we're trying to stay on task, especially when dealing with tech that's literally forming and pruning knowledge graphs based on topics/themes...