MemGPT 🧠 Giving AI Unlimited Prompt Size (Big Step Towards AGI?)

  • Published: 2 Oct 2024

Comments • 572

  • @matthew_berman
    @matthew_berman  11 months ago +171

    So who’s building something with AutoGen + MemGPT?

    • @zappy9880
      @zappy9880 11 months ago +10

      Please do! Autogen had blown my mind before and now combined with this it could be unstoppable!

    • @TheRealDOSmile
      @TheRealDOSmile 11 months ago +14

      I'm currently working on something very similar to that.

    • @codescholar7345
      @codescholar7345 11 months ago +17

      Ha! I was just going to suggest that. How can we get it working with a local LLM and autogen?

    • @randotkatsenko5157
      @randotkatsenko5157 11 months ago +1

      ​@@TheRealDOSmile How to contact you?

    • @mavvemavve3498
      @mavvemavve3498 11 months ago +1

      I probably am ;)

  • @middleman-theory
    @middleman-theory 11 months ago +29

    Your channel has distinctly carved its niche in the AI YouTube arena. Among the myriad of AI YouTubers I'm subscribed to, your channel, particularly over the last six months, has excelled in quality, presentation, and professionalism. Your videos have become my go-to source, superseding others that now seem laden with filler content.
    Your knack for diving straight into the core topic, elucidating not only the 'what' but the 'why,' is refreshing. The structured walk-throughs, practical guidance, and anticipatory glimpses into the future keep me engaged throughout. Your closing phrase, "And...I'll see you in the next one," has amusingly become a segment I look forward to; it encapsulates the essence of your engaging delivery.
    Being a part of your channel feels like being immersed in a thriving community. The clear, concise factual delivery, balanced with simplicity, makes the content accessible for newcomers while remaining enriching. Despite the crowded space of AI discussions on YouTube, your channel effortlessly ranks within my top 10.
    Thank you for the enriching content and the community you've fostered.

    • @matthew_berman
      @matthew_berman  11 months ago +4

      This is such a kind comment, thank you so much!! Glad you’re learning from my videos :)

    • @theChotkiyOne
      @theChotkiyOne 11 months ago +2

      I agree, but this looks like it was written by GPT

    • @karlortenburg
      @karlortenburg 11 months ago +1

      Well deserved and well said! Amazing how you explain these matters for everyone. Any exec will be so pleased to have you guide them.
      And btw it doesn't matter whether the words were perfected by AI, it's the thought - the gratitude that counts.

    • @PanamaRed917
      @PanamaRed917 11 months ago +1

      @@theChotkiyOne that is exactly what I was just saying. LMAO

  • @amj2048
    @amj2048 11 months ago +148

    AGI would be impossible without a memory system, so I agree this is another step towards it. It's really cool.

    • @matthew_berman
      @matthew_berman  11 months ago +7

      🎉🎉

    • @kloszi
      @kloszi 11 months ago +2

      I have the same feelings

    • @Bargains20xx
      @Bargains20xx 11 months ago +1

      AGI doesn't need to be a memory machine. An AGI good enough at comprehension and decision making is enough. Now if you talk about AGI with consciousness, we are talking about Elon Musk-level extinction

    • @Madman-bi5bf
      @Madman-bi5bf 11 months ago

      What possibilities regarding MemGPT could be accomplished with AI like ChatGPT?

    • @akarna69
      @akarna69 11 months ago

      @@kloszi no one cares. 😄

  • @sveindanielsolvenus
    @sveindanielsolvenus 11 months ago +11

    Once we have a robust way of handling memory, like MemGPT, we can simply fine tune the LLMs to utilize the system. Then we no longer need to use context window space for the system prompt to operate the memory. The LLM will just "naturally" do it.

    • @gidmanone
      @gidmanone 11 months ago +1

      you can simply fine-tune right now for that

    • @sveindanielsolvenus
      @sveindanielsolvenus 11 months ago

      @@gidmanone Yes, when we can fine-tune GPT-4. But it will be better if OpenAI implements this directly themselves.

  • @davidbaity7399
    @davidbaity7399 11 months ago +72

    As an older developer, I remember using 'virtual memory' because in 1989 computers only had 640k, and in DOS there was no OS memory management. We would swap CAD/CAM geometry objects in and out of memory as they were needed.
    Please keep us informed as this project moves forward, especially when it can use open source LLM's.

    • @JorgetePanete
      @JorgetePanete 11 months ago +2

      LLMs*

    • @robinvegas4367
      @robinvegas4367 11 months ago +6

      Hold up a sec, I gotta find disk 2

    • @FamilyManMoving
      @FamilyManMoving 8 months ago +3

      The more things change, the more they stay the same. I've been writing code professionally for 30 years, and every generation of 20-somethings "discovers" something some greybeard taught me when I was 20-something.
      Virtual context management. Imagine that. New since about 1970.

    • @snooks5607
      @snooks5607 7 months ago +1

      Nitpick: a PC from 1989 likely had more RAM than 640k; DOS by default just couldn't address more than 1MB directly (with 384k reserved for the system, leaving 640k for the user) because of a legacy architectural limitation of the original IBM PC from 1981 and the holy tenets of backwards compatibility.
      Since around DOS 4.0, in the backwards-compatible "real mode", himem.sys and emm386 could give access to higher memory areas, but the proper way was to switch to "protected mode", which could address the rest of the system memory directly (16MB for the 24-bit 286 and 4GB for the 32-bit 386), usually with an extender library like DOS/4G; those were around in '89 but maybe not so widely spread yet.

    • @davidbaity7399
      @davidbaity7399 7 months ago

      @@snooks5607
      You need to understand that at $1,500 per MB of memory, there were not many computers with more than a megabyte of memory.

  • @stickmanland
    @stickmanland 11 months ago +53

    Man! I for one, am fully ready to welcome our AGI overlords!

    • @Seriph001
      @Seriph001 11 months ago +3

      I'm right there next to you my friend

    • @DodoJo
      @DodoJo 11 months ago +2

      @@Seriph001 I'm right behind you bro.

    • @randotkatsenko5157
      @randotkatsenko5157 11 months ago +2

      Bow to the chosen One.

    • @Romulusmap
      @Romulusmap 11 months ago +2

      Same

    • @andrewxzvxcud2
      @andrewxzvxcud2 11 months ago +4

      this meme is so overdone, I cringe every time I see it

  • @redbaron3555
    @redbaron3555 11 months ago +28

    Yes please do another tutorial with MemGPT! This is huge!

    • @matthew_berman
      @matthew_berman  11 months ago +4

      Ok!

    • @redbaron3555
      @redbaron3555 11 months ago

      @@matthew_berman Thank you!!!👏🏻👍🏻

    • @toddai2721
      @toddai2721 4 months ago

      Please also do a tutorial on Salesforce AI.

  • @Rod-e2y
    @Rod-e2y 11 months ago +3

    This was the first thing I thought of when I learned about token limits. I even asked GPT to create a micro shorthand language to condense info. It didn't work in April, but it seems like we're getting close!

  • @ytpah9823
    @ytpah9823 11 months ago +69

    🎯 Key Takeaways for quick navigation:
    00:00 🧠 AI currently lacks memory beyond training data and is limited by its context window.
    00:29 📈 Progress has been made on increasing context window size, but it is still limited (e.g., GPT-4 has a 32,000-token variant).
    00:58 📚 Introducing MemGPT: a solution to expand AI's memory. The video reviews this research and the open-sourced code.
    01:11 ✍️ The paper, titled "MemGPT: Towards LLMs as Operating Systems," has several authors from UC Berkeley.
    01:51 🗣️ Limited context window issues arise especially in long-term chat and large document analysis.
    02:20 💽 MemGPT mimics computer OS memory management, giving the "appearance" of large memory resources.
    03:27 📊 Increasing the context window in Transformers is not optimal due to computational and memory costs.
    04:08 🔄 MemGPT autonomously manages its memory through function calls, enhancing its abilities.
    04:52 🖥️ Diagram explanation: inputs go through parsers, get processed in virtual contexts (main and external), and are output after further processing.
    06:14 🖱️ MemGPT allows the AI to self-manage context, treating longer context as virtual memory and its own context as physical memory.
    06:40 📟 Main context (like RAM) has a size limit, while external context (similar to a hard drive) is virtually unlimited.
    07:08 📏 Various models have different token limits, impacting how many messages can be processed.
    07:48 ⚠️ Actual usable context is often less than advertised due to system messages and other requirements.
    09:00 🔄 Recursive summarization is another way to manage limited context, previously discussed in another video.
    09:15 🧠 MemGPT stores its "memories" in a vector database, but it eventually compresses them through a process called "reflecting on memories" to manage space.
    09:56 🔄 Recursive summarization can address overflowing context but is lossy, leading to gaps in the system's memory, much like video compression degradation.
    10:38 📝 MemGPT splits context into: system instructions, conversational context (recent events), and working context (agent's working memory).
    12:02 🎂 MemGPT can store key information from conversations in its working context, as shown by a birthday conversation example.
    12:43 💽 External context acts as out-of-context storage (like a hard drive), separate from the main context but can be accessed through function calls.
    13:25 🔍 There are two types of external contexts: recall storage (history of events) and archival storage (general data store for overflow).
    14:09 🧩 MemGPT manages its memory using self-directed memory edits and retrievals, executed via function calls and based on detailed memory hierarchy instructions.
    15:32 🔄 MemGPT can correct its memory when false information is detected, updating its stored context.
    16:14 🤖 The effectiveness of MemGPT as a conversational agent is evaluated based on its consistency (alignment with prior statements) and engagement (personalizing responses).
    17:10 🎵 Through a function call, MemGPT can delve into its past memory to recall previous conversations, like discussing a music artist.
    17:52 🕰️ Deep Memory Retrieval (DMR) enables the agent to answer questions that refer back to very specific details from past conversations.
    18:05 📊 The accuracy of MemGPT's responses is better than GPT-3.5 or GPT-4 alone.
    18:19 🍪 Personalized conversation openers (like referencing a user's cookie preference) increase user engagement.
    19:01 ☕ Examples illustrate how MemGPT uses context and recall differently to engage with users.
    20:12 📜 Many documents exceed the token limits of current models, creating challenges in document analysis.
    21:06 🧠 Large language models exhibit a bias toward recalling information at the beginning or end of their context, mirroring human memory patterns.
    22:44 📈 Charts indicate that MemGPT maintains consistent accuracy regardless of the number of documents or nested information, unlike GPT-3.5 and GPT-4.
    23:12 ⚖️ A trade-off with MemGPT is that some token budget is used for system instructions.
    23:41 🤖 Discussion about LLMs as agents and their emergent behaviors in multi-agent environments.
    24:21 💻 Tutorial on how to activate and use MemGPT, starting with code setup.
    27:35 📁 MemGPT's document retrieval feature allows users to chat with their documents; using wildcards can fetch multiple text files.
    28:15 💵 Embedding files comes with a computational cost; the example shows 3 documents for 12 cents.
    28:44 🔄 MemGPT's persona is customizable, allowing users to tailor how the model interacts with information, like referencing archival memory.
    29:38 🔍 MemGPT can retrieve specific data from documents, such as annual revenues of companies.
    30:06 🌐 Introduction to MemGPT emphasized its rapid evolution and potential for open-source models in the future.
    30:33 🎙️ Interview with MemGPT authors Charles and Vivian discussing inspiration and plans for the project.
    30:46 🧠 MemGPT addresses the memory limitations of current language models by actively saving crucial data into a permanent memory store.
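The hierarchy summarized in the takeaways above (a token-limited main context plus unlimited archival storage, managed through function calls) can be sketched in a few lines. This is a purely hypothetical illustration; the class and method names are invented for the sketch and are not MemGPT's actual API.

```python
# Hypothetical sketch of MemGPT-style memory management (not the real MemGPT code).
# Main context is bounded like RAM; archival storage is unbounded like a hard drive.
# On overflow, the oldest messages are evicted to archival storage, which the agent
# can later search via a function call.

from collections import deque

class VirtualContext:
    def __init__(self, max_main_tokens=8):           # tiny budget for demo purposes
        self.main = deque()                          # main context (token-limited)
        self.archival = []                           # external context (unlimited)
        self.max_main_tokens = max_main_tokens

    def _tokens(self):
        # crude word count standing in for a real tokenizer
        return sum(len(m.split()) for m in self.main)

    def append(self, message):
        self.main.append(message)
        while self._tokens() > self.max_main_tokens and len(self.main) > 1:
            self.archival.append(self.main.popleft())   # evict oldest to "disk"

    def archival_search(self, query):
        # the function the agent would call to page old memories back in
        return [m for m in self.archival if query.lower() in m.lower()]

ctx = VirtualContext()
ctx.append("user likes oatmeal raisin cookies")
ctx.append("user birthday is in March")
ctx.append("user asked about CS:GO yesterday")
print(ctx.archival_search("cookies"))   # → ['user likes oatmeal raisin cookies']
```

The point of the sketch is only the division of labor: the model sees a small window, while an ordinary data structure outside the window holds everything else and is reached through tool calls.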

    • @tmhchacham
      @tmhchacham 11 months ago +4

      Wow, nice. Thank you!

    • @eraldcala9125
      @eraldcala9125 11 months ago +6

      What did you use for this

    • @captanblue
      @captanblue 11 months ago +1

      What was used for this?

    • @Madman-bi5bf
      @Madman-bi5bf 11 months ago +1

      Sounds pretty complicated, regardless, things like ChatGPT could use this to improve the performance of the ai they use, right?

    • @RandomButBeautiful
      @RandomButBeautiful 11 months ago +6

      @@eraldcala9125 I think it's HARPA AI. I'm seeing tons of videos spammed with this... already over it lol

  • @dominiccogan945
    @dominiccogan945 11 months ago +5

    I literally was just about to ask about a memGPT your a freak…. You earned that sub

    • @93cutty
      @93cutty 11 months ago +2

      I joined the discord the other day, it's pretty awesome in there too

    • @adelinrapcore
      @adelinrapcore 11 months ago

      you're*

    • @dominiccogan945
      @dominiccogan945 11 months ago

      @@adelinrapcore why does that always happen. Not lying I always mess it up and someone corrects me.

    • @matthew_berman
      @matthew_berman  11 months ago

      Haha thank you. I’m reading your mind :)

    • @matthew_berman
      @matthew_berman  11 months ago +1

      @@93cutty welcome!

  • @tomt215
    @tomt215 11 months ago +9

    Please let us know and do this again when they have open source models!

  • @JimMendenhall
    @JimMendenhall 11 months ago +5

    Thanks for digging into this and explaining it so well. I have looked at this project a couple of times and didn't quite "get" it. Keep up the good work!

  • @remsee1608
    @remsee1608 11 months ago +40

    Some of the new Mistral-based local LLMs have 32k context and hence beat GPT-4 at certain tasks, it's amazing

    • @matthew_berman
      @matthew_berman  11 months ago +3

      Good to know!

    • @avi7278
      @avi7278 11 months ago +11

      which ones exactly?

    • @remsee1608
      @remsee1608 11 months ago

      @@avi7278 I used TheBloke/MistralLite-7B-GGUF and it was good. TheBloke/Mistral-7B-Phibrarian-32K-GGUF is another option I've tried; it wasn't as good for what I was doing, but it might be better on academic datasets

    • @emmanuelkolawole6720
      @emmanuelkolawole6720 11 months ago +12

      TheBloke/Mistral-7B-Phibrarian-32K-GGUF

    • @emmanuelkolawole6720
      @emmanuelkolawole6720 11 months ago +5

      TheBloke/Llama-2-7B-32K-Instruct-GGUF

  • @RichardGetzPhotography
    @RichardGetzPhotography 11 months ago

    Imagine using MemGPT + Agents for large or multi-document analysis, where each Agent takes on a document (or section) and discusses with the other Agents their documents to answer user questions?

  • @ReanCombrinck
    @ReanCombrinck 11 months ago +10

    Please keep following this with opensource! Great for analysis

  • @wingflanagan
    @wingflanagan 11 months ago

    Wow. I just set up my own MemGPT bot on Discord and had a long conversation. Impressive, though still a bit artificially cheerful. Thanks for this!

  • @nufh
    @nufh 11 months ago +7

    I came across your channel and AI-related topics on YouTube by accident. Now I'm hooked; even though what I know is very limited, this stuff is really interesting. I started learning Python last week, and I just found out what Docker is today. Do you have any suggestions/references for newcomers like me? I really like the idea of having an AI friend/buddy that we can chat with while it helps us with work.

    • @sashetasev505
      @sashetasev505 11 months ago

      He’s a YT/media personality and knows little beyond what he reads in the news, press releases and GPT4 summaries. Certainly not a bad thing-we need dedicated news aggregators since legacy media and trad. sources are inadequate in this sense-but to expect anything more than bulletins, general zeitgeisty commentary (and superficial read-throughs like this) would be misguided. Knowledgeable or even merely competent engineers have bigger fish to fry rn or they are Indian/Asian and have a less polished AV style than this 🤷🏻‍♂️
      Good advice is boring: Use text to learn and YT news to keep up to date. No shortcuts to mastery.

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Thanks for joining! Just go through my videos and work with an AI to learn Python :)

    • @matthew_berman
      @matthew_berman  11 months ago +2

      @@sashetasev505 Ouch. I guess my 20+ years of development, multiple tech businesses, and production-level AI implementations don't count for much. 🤷‍♂️

    • @sashetasev505
      @sashetasev505 11 months ago

      @@matthew_berman No insult intended, just no hints of any of that apparent. 🤷🏻‍♂️ Your current line of work is as a medium. Do regale us with (evidence of) your dev lore and business acumen.

    • @ludoviclebleu
      @ludoviclebleu 11 months ago

      @sashetasev505 with respect I have to disagree, I think that's uncalled for and inaccurate. Matthew is doing way much more than reading the news, he's curating and showing how to install and use the tech. The specific applications we do with the tech is our gig. There's no way he could also cover use cases and scenarios at this pace of tech releases, and he would leave cases out anyway.
      He releases great and USEFUL content several times per week that, at least for me, would take hours I don't have cos I'm using this info to actually build my cases/scenarios.
      I'm so thankful for his work; I'd actually love to see a tutorial to "wrap it all up so far", encompassing most of the tech he's curated and reviewed and used over the months in a macro system to build my applications on: MemGPT + AutoGen, with open source LLMs he has tested and shown that would make the best agents (llama, llava, mistral, falcon...), plus GPT4 and Claude2, and Dalle3 and SDXL on top. And on runpod on demand (pay as u go).
      Out of all the creators I follow on AI, Matthew would be the best (only) one who could show how to build such a comprehensive system; I say this based on all his great history here. He has the knowledge, the smarts, and the pedagogy to do this. I almost think it's a responsibility by now ;)
      Cheers, @matthew_berman.

  • @J2897Tutorials
    @J2897Tutorials 11 months ago +7

    My favourite open source model is currently _Falcon 180B_ with the web search feature. I was impressed by M$'s _Bing Chat_ in Edge, but I mainly use Falcon instead now, since it seems just as good for grabbing information from the web, at least from my perspective. Although I don't fancy paying to run Falcon on a server, just to test it with MemGPT, despite my eagerness to try it out. It could be interesting if there was a _Falcon 180B_ API, similar to OpenAI's API, only much cheaper.

  • @bertilhatt
    @bertilhatt 11 months ago +5

    Separating the conversation from an internal dialogue the system can have will prove very helpful: you can ask where the system has learned something to prevent hallucinations, have a space to run logical reasoning until confirmation, and not spout, "The ball has to be 10c and the bat $1.10… Wait, no."

    • @Shinkaze33
      @Shinkaze33 11 months ago +2

      Yes, self-awareness would greatly improve LLMs.. some humans need to learn that skill too!

  • @SassePhoto
    @SassePhoto 11 months ago +3

    As always, highest quality content, many kind thanks!

  • @faysoufox
    @faysoufox 11 months ago

    Regarding CS Go, it's probably from messages before the printed example, as the example is about reaching the end of the context.

  • @thenoblerot
    @thenoblerot 11 months ago +5

    One of my first function calling experiments was having GPT-4 manage a couple of its own context windows, and it really does a good job! Told it to use regex. Didn't go to this scale tho... Sounds really expensive!!!

  • @nathanbollman
    @nathanbollman 11 months ago +7

    It looks like UC Berkeley intends to release their own tuned version of Mistral-7B. That project, combined with their memory system, might have some amazing results for local independent research. Interesting that they are on Mistral 7B and not Llama 2 7B; this is institutional recognition of the value in this new open, commercially viable model and its plasticity for fine-tuning. I can't wait to see what comes of it. Definitely make a vid when it's working with a local LLM; I suspect if Berkeley is tuning Mistral for this use case it *could* be all local!

    • @lauridskristensen9800
      @lauridskristensen9800 11 months ago

      I've almost exclusively heard of Berkley in relation to jazz music education, so I can't help but wonder if they're *tuning* it to the jazz standards of "The Real Book"?! :D

  • @Christopher-today
    @Christopher-today 11 months ago +1

    Amazing bit of work by this team.
    A thought... While I'm not going to be silly and say open source models are currently as good all around as openAI's offerings they're close in so many regards and are catching up fast in most areas BUT where openAI really has a lead is in things like Function Calling. I'm really, really hoping we see some innovation in this area in the open model space soon. Thankfully I do think that innovation is coming and openAI's closed ecosystem is going to be under more and more pressure. imo open models will eventually win. Thanks for the coverage.

  • @PietroSperonidiFenizio
    @PietroSperonidiFenizio 11 months ago +2

    Matthew, this is an amazing video. Remember the format, it's really good. Of course there must be a paper which is as good as this. But your way of explaining it is really well done

    • @matthew_berman
      @matthew_berman  11 months ago +1

      Much appreciated. I don’t think some people liked the glitch transition or the sound effects. But can’t please everyone!

    • @PietroSperonidiFenizio
      @PietroSperonidiFenizio 11 months ago

      @@matthew_berman I have not noticed any glitch transitions. But maybe my brain is running at too few hertz to notice them 😉

  • @MarkusEicher70
    @MarkusEicher70 11 months ago +1

    Thanks a ton, Matthew! That's such great news. One step closer to a real LLM-OS. Can't wait till they implement open-source model support. I also would like to see how things like LangChain, HuggingFace and others can get integrated into solutions. Would highly appreciate another video about these topics from you. Thanks for your great work! 💪

  • @scitechtalktv9742
    @scitechtalktv9742 11 months ago +1

    Just to be sure: so MemGPT uses the principle of PAGING, which is used in operating systems to circumvent the limitation of RAM size when a much larger memory size is needed? Which is called VIRTUAL MEMORY. In the case of LLMs, we could call it VIRTUAL CONTEXT?
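The paging analogy in this comment can be made concrete with a minimal, purely illustrative sketch: the main context plays the role of physical RAM, an external store plays the role of swap space, and recalling an evicted entry resembles servicing a page fault. The function names `evict` and `page_in` are invented here, not MemGPT's API.

```python
# Illustrative "virtual context" paging sketch (not MemGPT's actual implementation).

def evict(main_ctx, external_store, budget):
    """Page out the oldest entries until the main context fits the budget."""
    while len(main_ctx) > budget:
        external_store.append(main_ctx.pop(0))
    return main_ctx, external_store

def page_in(main_ctx, external_store, predicate, budget):
    """On a 'page fault', move matching entries back into the main context."""
    recalled = [e for e in external_store if predicate(e)]
    for e in recalled:
        external_store.remove(e)       # take the page out of "swap"
    main_ctx = main_ctx + recalled     # bring it back into "RAM" as most recent
    return evict(main_ctx, external_store, budget)

main, swap = ["a", "b", "c", "d"], []
main, swap = evict(main, swap, budget=2)                  # "a", "b" paged out
main, swap = page_in(main, swap, lambda e: e == "a", budget=2)
print(main)   # "a" is back in the main context; something older was paged out
```

The trade-off the analogy captures: the model only ever reasons over what currently fits in the budget, so anything it needs from the past must be explicitly paged back in.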

  • @productjoe4069
    @productjoe4069 11 months ago

    This is an exciting research direction. I wish they were using standard terminology from cognitive science though. What they call ‘recall’ storage is properly called episodic memory. What they call ‘archival’ storage is semantic memory. Using established terminology helps researchers find papers, and also can suggest ideas (for example, what’s the equivalent of procedural memory? Is that a useful thing to add?)

  • @raducamman
    @raducamman 11 months ago +2

    I feel like Vivian could have done this all by herself, because it's covering a memory issue and women never forget anything.

  • @MCNarret
    @MCNarret 11 months ago +2

    They should use both the uncompressed and compressed memories: the compressed memories offer a "preview" to the AI, from which it can then recall more details if it needs to.

  • @fuba44
    @fuba44 11 months ago +1

    HUGE yes from me, please cover it again when it can use llama or the webUI api :-) suuuper cool project!

  • @RonnyMW
    @RonnyMW 11 months ago +4

    I think the information is valuable and is explained up to the point where you can't understand more without a deep dive into AI. Good job!

  • @alx8439
    @alx8439 11 months ago +6

    The issue with uneven attention in the context window (that phenomenon where only the beginning and end are memorized well, but everything in the middle is foggy and blurry) was partially solved by Mosaic with their MPT models

  • @Leonid.Shamis
    @Leonid.Shamis 11 months ago +3

    Thank you very much for sharing this information! I'm very interested in using MemGPT with open-source LLM models installed locally. If you come across any new developments in that space, I would highly appreciate hearing about it!

  • @basoele7795
    @basoele7795 11 months ago +11

    🎯 Key Takeaways for quick navigation:
    00:00 🧠 The limitation of AI regarding memory and context window sizes, with previous models having token limitations that hinder their performance in long-term interactions or extensive document analysis.
    02:33 🖥️ Introduction of MemGPT as a solution, mimicking traditional computer memory management systems with fast (RAM-like) and slow (Hard Drive-like) memory for handling larger contexts.
    04:08 💾 Explanation on how MemGPT autonomously manages memory through function calls, creating a virtual memory system for AI to access and manage information beyond fixed context limits.
    06:40 📊 Comparison between the context handling of different models and the real-world limitation of token count even in higher-end models like Claude 2.
    09:56 🔄 Mention of recursive summarization as a method to handle overflowing context windows, but its lossy nature leads to eventual large holes in memory.
    13:25 🗂️ The distinction between two types of external context, Recall Storage and Archival Storage, to store and manage different types of data.
    14:09 📝 Description of how memory edits and retrieval are self-directed and executed via function calls, with a detailed structure to guide the system on how to interact with its memory systems.
    17:10 🔄 Example of Deep Memory Retrieval (DMR) where the system references past conversations to answer current queries.
    18:19 👋 Evaluation of MemGPT on crafting engaging conversation openers by referencing past interactions to enhance user engagement.
    20:12 📜 Addressing the challenge of document analysis with large documents and the limitations of current models' context windows, introducing the potential of MemGPT in handling such tasks.
    21:21 🧠 The comparison of large language models' memory behavior to human memory, where both tend to remember the beginning and end of a list better than the middle.
    22:16 📉 The performance of GPT-3.5 and GPT-4 drops significantly after reaching their context window limits, while MemGPT maintains performance regardless of the number of documents retrieved.
    23:12 🔄 MemGPT requires system instructions for memory management which consumes a portion of the token budget, a trade-off for its enhanced document retrieval capacity.
    23:54 🤖 Reference to Park et al. paper on enabling memory in LLMs (Large Language Models) and observing emergent social behaviors in a multi-agent environment.
    Made with HARPA AI

  • @leegregory5617
    @leegregory5617 11 months ago +4

    Yes, please do another video if they incorporate open source. This looks awesome, but I don't want to use an OpenAI key. Another great video BTW :) You are my go-to AI YouTuber.

  • @mshonle
    @mshonle 11 months ago +3

    About lossy compression: it’s fascinating to me that lossy *text* compression can act as a normalizer, including replacing misspelled words or typos. I wonder if the output of recursive reflection is text or an embedding? As embeddings they can have more nuance than can be expressed in words (eg, “like a unicorn but even more mythical”) but that nuance could accumulate noise as well.

  • @karenrobertsdottir4101
    @karenrobertsdottir4101 11 months ago

    Would love to see this for something like Falcon or Mistral. LLaMA 2, which everyone seems to target, isn't actually "open".

  • @luizbueno5661
    @luizbueno5661 11 months ago

    Yes, please!
    Thank you for this video.
    And please do another as soon as they release it with open source LLMs. Love your videos.

  • @jp00738
    @jp00738 11 months ago +3

    Hey Matthew, great tutorial. Wondering if it's already possible to use local LLMs with it by using the OpenAI-format APIs on services like textgen webui?

    • @matthew_berman
      @matthew_berman  11 months ago

      Possible, yes. Not out of the box though. Also, they are working on making it native.

  • @danberm1755
    @danberm1755 11 months ago

    Well done! That was brilliant and the synergy between NN OSs and AutoGen seems like the way forward for sure.

  • @keeganpenney169
    @keeganpenney169 11 months ago

    I have some random thoughts just to throw into the mix. First, the personal attention-grabbing motive demonstrated: it's a fun example, but if Google or Facebook had something like this in their user interface, acting like that would waste so much of one's time. LLMs are one thing, but keep in mind both companies want you to keep scrolling, keep the ads coming, so something so personally engaging with you, based on memories, known likes, and pre-conversation context, could be horrible for someone like myself; it could easily be trained to waste my time and try to fill a social void. Idk, that's just one thought.
    Second, I do like what they're doing. However, as I stated before, this kind of extension to a model might lead us individually to be more lax with our train of thought and prompt direction.
    Anyways, cool review of it!

  • @AaronSherman
    @AaronSherman 11 months ago

    Definitely would love follow-up on the future open source model usage!

  • @defaultdefault812
    @defaultdefault812 11 months ago +1

    What is the downside to using an external context, in terms of speed, etc.? It still can't get around the problem of 'rationalising' beyond its current context in a single call. I don't really see how it differs from a knowledge base using vector embeddings, just for storage of previous conversations.

  • @ElleDyson
    @ElleDyson 11 months ago +2

    While I acknowledge there are other similar concepts floating around, I think MemGPT's ease of use, documentation and open sourcing make it a great resource. Maybe I need to read the entire paper, but I am curious whether the "working_context_append" feature is self-guided or a schema specified by the programmers, e.g. "Key Personality Trait". Did the LLM decide this was something to remember, or was that pre-defined?

  • @curtkeisler7623
    @curtkeisler7623 11 months ago

    Definitely want a tutorial with open source models, and thank you so much for doing all of this; I've learned a ton from you

  • @NotNonaSoft
    @NotNonaSoft 11 months ago +1

    *BUT YOU DIDNT HAVE TO CUT ME OFF*
    **galaxy brain meme**

  • @alexjenkins8026
    @alexjenkins8026 11 months ago

    Epic vid thanks for the insight.
    Seems like a much better solve than the attention sink paper.
    Excited to see this in the wild.
    The very basic install instructions seemed out of place.

  • @nerdkillz7872
    @nerdkillz7872 11 месяцев назад +1

    Matt often mentioned "chat with your docs". Which programs/projects are out there for this task, and which do you think are the best?

  • @ryzikx
    @ryzikx 11 месяцев назад +2

    9:59 As an amateur author, I use recursive summarization to communicate my ideas to LLMs all the time, so I can't wait to see if this will be better

  • @pconyc
    @pconyc 11 месяцев назад

    Definitely interested when this goes open source. Thx for this!

  • @ianknowles
    @ianknowles 11 месяцев назад +1

    This is already massively out of date. We've been doing this for a while with contextual awareness and temporal filtering with vector search for the last couple of years. Even the functional aspect has existed for a while.

  • @Singularian-ii7rf
    @Singularian-ii7rf 11 месяцев назад +1

    Can u also make a video about: BitNet: Scaling 1-bit Transformers for Large Language Models ! Please!

  • @kasomoru6
    @kasomoru6 11 месяцев назад

    I have attempted to do a couple of your tutorials, but somewhere around the last step there's always some issue; this time it was the import command. I would like to see you do this live at this point.

  • @davidlavin4774
    @davidlavin4774 11 месяцев назад +1

    As this continues to evolve, where does the line fall between fine-tuning the model with additional data and having extended memory for long-term context? I realize that the memory is only for an instance of a model, but do they perform similar functions in some regards? For example, do you upload documents into the model or into the extended context?

  • @Random_Innovation
    @Random_Innovation 11 месяцев назад +1

    You are my goto Mac Ai guy! Keep it up!

    • @matthew_berman
      @matthew_berman  11 месяцев назад

      Can I also be your PC AI guy too? :)

  • @JimLove1
    @JimLove1 11 месяцев назад +1

    I like all your stuff but this video blew me away. Even though you include a transcript, I had to keep stopping it to make notes. Well done. The only place I stumbled was the many different but slightly similar constructs. I'm still working to wrap my mind around that. For instance, you had a reference to system instruction, conversational context and working context. Later you refer to recall storage and archival storage which I assume are the same as main context and external context. Later you have working context and recall. I'm sure it's just me, but I'm trying to sort that out in my own mental model. But again, well done!

  • @timschafer2536
    @timschafer2536 11 месяцев назад +2

    This would be awesome if it would’ve allowed for local llms.

    • @matthew_berman
      @matthew_berman  11 месяцев назад

      It does. Check my later videos :)

  • @fotoflo
    @fotoflo 11 месяцев назад

    Thanks Matt, really interesting. Seems like they've essentially built the ability to query an embeddings database directly into the model's application runtime, kind of like LangChain without the work. Way off?

  • @noahtell1314
    @noahtell1314 11 месяцев назад

    Might be an optimization, but quadratic growth is not an issue though. Compute grows exponentially.

  • @thanksfernuthin
    @thanksfernuthin 11 месяцев назад +1

    Please don't use "video glitch" transitions. They aren't cool. And they give people like me who were around when computer video was a nightmare to work with PTSD.

    • @matthew_berman
      @matthew_berman  11 месяцев назад

      Lol ok I’ll not use them next time

  • @friendofai
    @friendofai 11 месяцев назад +2

    This was such a good episode. The fact that the LLMs have memory like humans remembering the first and last, wow. I want this. Great episode!

  • @phonejail
    @phonejail 11 месяцев назад

    This was such a great breakdown, even I understood. Wow.

  • @KeyonThomas
    @KeyonThomas 11 месяцев назад +2

    Could you possibly use LM Studio to run MemGPT with open-source models, since it forces the open-source model into the OpenAI API structure?

  • @crawkn
    @crawkn 11 месяцев назад

    Very comprehensive analysis, thanks. This is encouraging, but I wonder if limitations on memory aren't at least in part a safety feature, i.e. could much larger memories already be in use experimentally, but considered too risky for public use?

  • @robertbyer8189
    @robertbyer8189 11 месяцев назад +3

    Love the videos. Definitely want to see more on MemGPT as I believe this is going to be the next huge move in development.

  • @adama7752
    @adama7752 11 месяцев назад

    What about creating a response from chunks of the database, then a result from those results, instead of lossily compressing/reflecting the database?

  • @J3R3MI6
    @J3R3MI6 11 месяцев назад +1

    Do you have to use conda if you’re cloning it in Replit?

  • @djstraylight
    @djstraylight 11 месяцев назад +1

    MemGPT really needs contributors. They have lots of todo items on their list. It also needs some work on error handling, especially when the openai API gets overloaded and times out.

  • @Bronco541
    @Bronco541 7 месяцев назад +1

    Notice we've already started referring to the LLM as "they"

  • @wan2lmao
    @wan2lmao 11 месяцев назад

    Unfortunately, the quadratic increase in computation is the magic. The only thing that has changed this time is the compute; underneath, it is still the same math

  • @mordokai597
    @mordokai597 11 месяцев назад +1

    Things like textgen have QLoRA training built in that runs on fairly low-spec hardware. Have an option to train a LoRA from the long-term memory on a schedule: start with a default LoRA trained on synthetic MemGPT input/output text pairs with the FULL MemGPT system header, then use shorthand system messages during inference to give it 'reminders' on whichever aspect of the complete system protocol is most important for that step.

  • @titusfx
    @titusfx 11 месяцев назад +1

    🎯 Key Takeaways for quick navigation:
    00:00 🧠 AI's lack of memory is a significant hurdle to improving artificial intelligence.
    00:29 💾 Current AI context windows are highly limited, even in large models like GPT-4.
    01:24 📄 MemGPT (Memory GPT) introduces a solution to expand AI's memory capacity.
    02:06 🖥️ MemGPT aims to mimic the memory management of an operating system, with RAM and hard drive equivalents.
    03:27 📊 Increasing context length in AI models incurs significant computational cost.
    04:23 🤖 MemGPT autonomously manages its memory through function calls, allowing dynamic context adjustments.
    05:32 🔄 MemGPT divides memory into main context (like RAM) and external context (like a hard drive).
    06:14 📊 Large parts of the context in AI models are used for system messages and pre-prompts.
    08:03 🤯 Recursive summarization, a previous approach, leads to significant memory loss over time.
    10:09 🧠 MemGPT can correct false information and update its memory during conversations.
    12:58 💾 MemGPT uses recall and archival storage to manage external context efficiently.
    14:23 📜 MemGPT performs self-directed memory edits and retrieval via function calls.
    16:14 🗣️ MemGPT excels in maintaining conversation consistency and engagement.
    20:12 📄 MemGPT addresses document analysis challenges posed by lengthy documents.
    21:06 🔄 Scaling context alone doesn't solve uneven attention distributions in large AI models.
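
    The main-context/external-context split these takeaways describe (05:32, 12:58, 14:23) can be sketched as a toy Python loop. All names here (`MemoryAgent`, `archival_insert`, the message limit) are hypothetical illustrations of the idea, not the actual MemGPT API:

```python
# Toy sketch of a MemGPT-style memory hierarchy: a fixed-size "main context"
# (like RAM) that evicts overflow into "archival storage" (like a hard drive),
# which the agent can later search via a function call. Names are illustrative.

class MemoryAgent:
    def __init__(self, main_context_limit=4):
        self.main_context_limit = main_context_limit
        self.main_context = []      # in-context messages (limited)
        self.archival_storage = []  # unbounded external store

    def add_message(self, text):
        self.main_context.append(text)
        # Evict the oldest messages once the main context overflows,
        # the way MemGPT pages conversation history out to archival storage.
        while len(self.main_context) > self.main_context_limit:
            evicted = self.main_context.pop(0)
            self.archival_insert(evicted)

    def archival_insert(self, text):
        self.archival_storage.append(text)

    def archival_search(self, query):
        # Naive keyword match standing in for real vector retrieval.
        return [m for m in self.archival_storage if query.lower() in m.lower()]

agent = MemoryAgent(main_context_limit=3)
for msg in ["My name is James", "I like CSGO", "I work in marketing", "How are you?"]:
    agent.add_message(msg)

print(agent.main_context)            # only the 3 most recent messages remain
print(agent.archival_search("name")) # older message recovered from the archive
```

    The point of the sketch is that the model never "forgets" evicted messages outright; it just has to spend a function call to page them back in.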

  • @drgutman
    @drgutman 11 месяцев назад

    Yes, please let us know when it can handle local LLMs.
    I hope someone is working on a fine-tuning method to make any model handle function calls.

  • @davidallred991
    @davidallred991 11 месяцев назад +3

    Great video, Exciting stuff. Memory access is a huge limiting factor especially within coding projects so I can see this really moving things forward. It seems like this would give you the benefit of a huge LLM like ChatGPT that then can be "trained" or augmented to your specific use and data set while still retaining all of its full training data.

  • @freedom_aint_free
    @freedom_aint_free 11 месяцев назад +2

    Claude 2's 100k token limit is enough to process whole books, way bigger than GPT-4, unfortunately it's dumber than a bag of hammers, I've found that it's mostly useful to summarize texts.

    • @ryzikx
      @ryzikx 11 месяцев назад +1

      Not quite whole books. I had to split my novel into 3, but it is far better than GPT-4 for handling large amounts of text in my situation too

    • @btm1
      @btm1 11 месяцев назад +1

      +1. I tested it too and was really disappointed.

  • @TomTrval
    @TomTrval 11 месяцев назад

    Hey, that is what I was working on for my Dungeons & Dragons DM AI :D

  • @Sean.Vosler
    @Sean.Vosler 11 месяцев назад

    Thinking about what you’re thinking… subconscious analysis of thoughts based on beliefs… Seems like the CPU/Ram/HD analogy could be better replaced by how our minds actually process information. Love this stuff! Thanks for breaking it down

  • @fabiom.3910
    @fabiom.3910 11 месяцев назад +1

    In an earlier video you mentioned a website where we can find such papers specifically for AI. I've forgotten the name of the website 😕. Can you tell me the name, or does anyone here know the website? 😋
    Really cool stuff and nice videos, Matthew! 😇

    • @matthew_berman
      @matthew_berman  11 месяцев назад +1

      Arxiv

    • @fabiom.3910
      @fabiom.3910 11 месяцев назад

      @@matthew_berman thank you! :-) I really enjoy your content. And the best part is that my English is getting better and better 🙈😅

    • @messengercreator
      @messengercreator 11 месяцев назад

      Don't you worry about that; for papers, try AI CHAT DEEPAI. AI CHAT DEEPAI is powerful and much better than OpenAI, since its start in 2015-2018, and you can show it your pictures, videos, and anything you want

  • @13exousia
    @13exousia 8 месяцев назад

    I would be interested in you keeping us up to date with this subject and its procedures and calls. I would like to chat with you personally about something I have been thinking about while going through your videos thus far.

  • @chrismadison8946
    @chrismadison8946 11 месяцев назад +1

    Love this video and thanks so much for the in-depth post! Accurately explains the theoretical science along with the practical implementation 🙏🏾

  • @TraderRonzoni
    @TraderRonzoni 11 месяцев назад

    Definite video when they incorporate open source models

  • @Artorias920
    @Artorias920 11 месяцев назад +6

    Brilliant research & Brilliant video! Firm handshakes to you and the MemGPT team 🤝

  • @tommyboi0
    @tommyboi0 8 месяцев назад +1

    Any idea if this is installable on Apple silicon?

  • @wholeness
    @wholeness 11 месяцев назад

    Open Source Models Are Here!

  • @silversobe
    @silversobe 11 месяцев назад +1

    Surprised this hasn't happened already! Huge step forward.

  • @Yipper64
    @Yipper64 11 месяцев назад

    When can we expect this type of thing to come to ChatGPT?
    I mostly use ChatGPT to bounce ideas off of for projects I work on, and the way I've done it so far is by writing everything in a Notion document, then exporting it and uploading that to ChatGPT when I want to work with it. It would be so nice if I could just have a single chat for that instead, and it wouldn't get wonky on the memory.
    And I know this works with ChatGPT's API but, well, as you said: expensive.

  • @Dron008
    @Dron008 11 месяцев назад

    That's interesting, but it is still not a universal approach. It would still not allow asking a question about a document as a whole, like comparing this book to another book, because to do that it needs to load the whole book into the context. Of course, it may store a summarized version of the book and compare those, but it may miss something. The same problem arises with code generation: I don't know where and how it would store a big software project so it could answer any question about it and develop it further. Compare that to our brain. It effectively stores in context all of our life since birth, or even before it. You may recall some dialogues from your childhood, some emotions. Even if you think you forgot them, they can occasionally be restored due to some association. The context is very long, and our brain has just 86 billion neurons. So we probably need a different architecture.

  • @godned74
    @godned74 11 месяцев назад

    With GPT-4, as far as memory is concerned, I was having success taking a picture of my prompt and uploading it to GPT-4, basically getting maximum tokens because the upload of the PNG doesn't use tokens. You can also compress a text file to zip and do it that way as well. I did notice an improvement in the quality and accuracy of GPT-4 doing it this way. I would love to get AutoGen and MemGPT working on my music generator project, but so far I can still get more done regurgitating the outputs by hand.

  • @RandPrint
    @RandPrint 11 месяцев назад +1

    I'd like to see a team of agents using MemGPT work on a complex task

  • @mungojelly
    @mungojelly 11 месяцев назад

    Abstractly, the idea of storing memories makes sense, but the particular way that this system remembers things is shockingly credulous. If it's going to be grounded in reality at all, it needs to not just believe particular things it's heard immediately and unreservedly; it needs to have more complex memories with provenance and other metadata to give them contextual meaning

  • @Tom17ire
    @Tom17ire 11 месяцев назад

    LLMs have cache memory as is.
    This is great and all, but on a technicality, when LLMs are in use they use a cache.

  • @isitanos
    @isitanos 11 месяцев назад +1

    A lot of things discussed here are very similar to how human memory works. We can hold a limited amount of data in our short attention window. Our brain can store a lot of long-term info but buries it deeper and out-of-reach if it thinks it's not currently relevant. It also seems to compress memories by letting us remember easily the most important details of an event but burying the rest deeper. And we have all kinds of techniques or "functions" to jog our memory to bring back old data we know we have somewhere, store more short-term stuff efficiently when cramming for an exam, and so forth.

    • @dekumutant
      @dekumutant 11 месяцев назад

      The more I think about multi-model systems, the more I see similarities with how our brains divvy up task priorities. It's both freaking me out and exciting me, to be honest

  • @ShaneHolloman
    @ShaneHolloman 11 месяцев назад +1

    Thanks for the great content, I've learned a lot from your AI curation.
    Due to the pervasive sound effects, I use subtitles on your channel. Keep up the great work

    • @matthew_berman
      @matthew_berman  11 месяцев назад

      You don’t like the sound effects you’re saying? I’ll reduce them in future videos if people don’t like them.

  • @abdelhaibouaicha3293
    @abdelhaibouaicha3293 11 месяцев назад

    📝 Key points:
    📌 MemGPT is a solution to the limitations of AI models' context windows, allowing for effective memory management through function calls and virtual context management.
    🧐 MemGPT autonomously manages its own memory, utilizing a memory hierarchy similar to traditional operating systems, and allows for repeated context modifications during a single task.
    🚀 MemGPT manages memory through function calls using databases, recall storage, and archival storage, enhancing its conversational abilities.
    💬 The paper discusses the challenges of overflowing context windows, introduces the concept of recursive summarization, and highlights the tradeoff between retrieved document capacity and memory management.
    📊 Testing showed MemGPT's accuracy and ability to craft engaging messages by referencing prior conversations.
    🌐 The short-term plan for MemGPT is to enhance user workflows, while the long-term focus is on improving the performance of GPT 3 and A2 or creating new open-source models.
    💡 Additional Insights:
    💬 The paper references a 1988 paper on computer memory systems to support the approach of MemGPT.
    🌐 Combining AutoGen with MemGPT could provide agents with unlimited memory.
    📊 The installation process of MemGPT is discussed, including cloning the repository, creating a conda environment, and setting up the OpenAI API key.
    🌐 The project is in its early stages and may be costly to use, but the author is enthusiastic about its potential and future updates.
    📣 Conclusion:
    MemGPT offers a comprehensive solution to the limitations of AI models' context windows by autonomously managing memory and utilizing a memory hierarchy. It enhances conversational abilities and shows promise in accuracy and engagement. The short-term plan is to enhance user workflows, while the long-term focus is on improving existing models or creating new open-source ones. The project is in its early stages but has potential for future development.
    By: talkbud.lamd.ai/

  • @melihcloud
    @melihcloud 10 месяцев назад

    I was thinking about a similar, more basic system in my startup 4-5 months ago: every chat message is turned into an embedding, and then every time the user sends a message, it finds the 5 most similar past messages and appends them to the message history

  • @0ldPlayer
    @0ldPlayer 11 месяцев назад

    Gonna be that guy for a minute, but why is the assumption being made that larger prompt windows = better?

  • @skud9999
    @skud9999 11 месяцев назад

    Gotta point out, that's pretty much an analog of how humans process memory as well. Also, when it says "working_context_append: Key Personality
    (high-speed, adrenaline-rush activities and intense gaming sessions in CSGO)", a slightly more charitable reading would take the CSGO part as just a descriptor of things that are fast-paced, adrenaline-pumping activities, like Formula One racing.

  • @animeswitch
    @animeswitch 11 месяцев назад

    It's a shame we don't have small models as powerful as GPT-4 yet; once we do, we are going to be able to make such incredible things.

  • @alfredgarrett6775
    @alfredgarrett6775 10 месяцев назад

    Is there a way to use MemGPT in a less conversational way?
    Sort of like fine-tuning, but only extracting relevant information from a large document and adding it to the prompt?
    I have been able to make it work for my use cases using ChatGPT-4's larger context window, but especially with the larger documents, I am spending more on tokens and now hitting my daily rate limits. It almost feels like it might be cheaper to run 2 prompts: one to extract the information, and one to perform the querying. It would still be cheaper than hiring someone and worth it in terms of time consumption. I think I spent about $1.30 on 400+ entries but am still looking for ways to run cheaper and optimize.
    Another idea I had was to get a sample of data back using the newest GPT-4 model and then use it to fine-tune the regular GPT-4 or GPT-3.
    If I could have MemGPT query the data, it would make a world of difference.
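
    That two-prompt idea (one cheap extraction pass, then a query over just the extract) can be sketched as a pipeline. `call_llm` below is a hypothetical stand-in for whatever chat-completion client you use, stubbed out here so the structure runs without an API key; the model names are made up:

```python
def call_llm(prompt, model="cheap-model"):
    # Hypothetical stand-in for a real chat-completion API call.
    # Stubbed to return a canned string so the pipeline structure is runnable.
    return f"[{model} response to {len(prompt)} chars of prompt]"

def answer_over_document(document, question):
    # Pass 1: cheap extraction. Pull only the passages relevant to the
    # question, so pass 2 pays for a small prompt instead of the whole doc.
    extract = call_llm(
        f"Extract only the passages relevant to: {question}\n\n{document}",
        model="cheap-model",
    )
    # Pass 2: answer the question using just the extract, on a stronger model.
    return call_llm(
        f"Using this context:\n{extract}\n\nAnswer: {question}",
        model="strong-model",
    )

doc = "lorem ipsum " * 1000  # placeholder for a large document
print(answer_over_document(doc, "What were Q3 revenues?"))
```

    Whether this actually saves money depends on the per-token price gap between the two models and how much the extraction pass shrinks the document.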