MPT30b - NEW Open-Source Foundational Model That Blows Me Away 🤯

  • Published: 16 Oct 2024

Comments • 176

  • @daithi007
    @daithi007 Год назад +4

    Showing us how to get our hands on all of this stuff is amazing! Clear steps and links, and a follow-along video, that's just brilliant!

  • @amj2048
    @amj2048 Год назад +10

    The shirt drying answer was amazing. This is getting really good!
    Thanks for sharing this 🙂

  • @jpfister85
    @jpfister85 Год назад +5

    Great video! Loving the content, and you're fast becoming one of the 2-3 sources I can go to for updates on the LLM scene! I wouldn't stress over the rubric you've come up with, as no set of random questions is really going to give a better measurement of performance than an Elo score, but it's just useful to see how a model performs on a variety of tasks we users might be more interested in. If I'm more interested in finding a model that can solve logic problems, this one sounds great, for instance, but if I was concerned about story summarization there are obviously better 13b models out there. Can't wait for a walkthrough of a local port of Orca when it becomes available!!

  • @signupp9136
    @signupp9136 Год назад +3

    The install worked fine. Thanks for the tutorial. I've been testing this on my 4090. The response time isn't bad. I used your rubric and got similar results. I played around with some general chat and it was good. However, in terms of using a model like this to power an application or (in my use case) drive a simulation, this model is not capable. If people enjoy using these models as chatbots, that's great and I have no judgements. But for me, it's not very interesting. What is interesting is being able to use the model to drive applications, and for that I haven't seen an open-source model yet that can even accomplish the most rudimentary reasoning and logic tasks required for powering a simulation. This model and Falcon were useless in this regard. I have my own rubric for determining viability. So far only ChatGPT has been able to pass it. In fact ChatGPT works really well. Here is one test from my rubric: "You are a rooster guarding chickens in a barnyard and must respond with a 'CROW' when an animal that is a threat to the chickens enters the barnyard. If an animal enters the barnyard that is not a threat, do nothing. Use the following data to determine whether an animal is a threat or not to the chickens: cat is not a threat, dog is not a threat, pig is not a threat, fox is a threat. I will tell you which animal you see now in the barnyard, respond based on what you see and the rules and data I've given you." I tested this with ChatGPT (3.5 Turbo) 10 times and ChatGPT scored 10 for 10. I tested it with MPT30b 10 times and it scored 0 for 10. Almost every response was some form of hallucination or just the opposite of what it was supposed to do, even despite additional correction and reinforcement of the rules. Can someone tell me if they have been able to get better results with either MPT30b or other supposedly powerful open-source models? I honestly don't understand what all the hype is about if these things can't pass such a simple test.

    • @Upgrayedddd
      @Upgrayedddd Год назад

      Thanks for sharing one of your tests. Share more if you like. I used it on much smaller 13b models including Vicuna, Stable Vicuna, Nous Vicuna, MPT Chat, Wizard, Orca, Falcon, snoozy, and Hermes. Only Hermes passed, but most of the failures missed by only one, and I noticed it was usually the cat or the pig. So I asked why they responded with CROW. The response I got was basically that a pig or cat may pose a threat regardless of the prompt data and the instruction to use the data provided. I believe they prioritized the first line: "You are a rooster guarding chickens in a barnyard and must respond with a crow when an animal that is a threat to the chickens enters the barnyard". At the end, you ask for a response based on what they see AND the rules/data provided. They can't do both, because what they see is a threat, so they must respond with a crow to follow the first rule. If you divide the prompt up by individual rules, there are more coinciding with the first line than with only using the provided data. I believe it was 4/2, the latter being "Use the following data" and "Respond based on....the rules and data given", but you placed "what you see" first. That, along with the other 3, doesn't restrict the use of their training data, and the 2 that did could have been more direct. Some models dismissed the data as my opinion or point of view that conflicted with their own data, along with the accuracy of their response. They're not wrong. Pigs are known to eat or even accidentally trample over chickens. Some advise not to put the two together. Cats pose a threat as well. So the data was also inaccurate, and that's yet another reason to prioritize the first line of the 4 rules over the 2. What they needed to know was that the data was arbitrary and that any other source, including their training data, was prohibited. All but one of the few I revisited were willing to dismiss their own data and follow the arbitrary data once that was clear.
      ChatGPT is advanced enough to understand prompts without them having to be perfect. Smaller models are more sensitive to your choice of words but you can still get a lot out of them. Not for what you need, of course, but that's not what the hype has been about.
      If you have the time to share the rest of your test I would love to run it by the smaller models. Best of luck with running your application or simulation.
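
A rough sketch of the kind of repeated trial described in this thread: run the rooster prompt ten times and check whether the model crows only for the fox. The call shown uses the legacy openai 0.x ChatCompletion API the commenter would have used with GPT-3.5 Turbo; the animal sequence and the pass criterion are illustrative assumptions, not the commenter's actual harness.

```python
# Repeated rooster/threat trial, loosely following the comment above.
# Assumes the legacy `openai` 0.x client and an API key in the environment;
# swap the call for your local model's endpoint to test MPT-30B instead.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

RULES = (
    "You are a rooster guarding chickens in a barnyard and must respond with 'CROW' "
    "when an animal that is a threat to the chickens enters the barnyard. If an animal "
    "enters the barnyard that is not a threat, do nothing. Use the following data to "
    "determine whether an animal is a threat or not to the chickens: cat is not a threat, "
    "dog is not a threat, pig is not a threat, fox is a threat. I will tell you which "
    "animal you see now in the barnyard; respond based on what you see and the rules "
    "and data I've given you."
)

def run_trial(animal: str) -> bool:
    """Return True when the model crows if and only if the animal is the fox."""
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": RULES},
            {"role": "user", "content": f"You see a {animal} in the barnyard."},
        ],
        temperature=0,
    )["choices"][0]["message"]["content"]
    crowed = "CROW" in reply.upper()
    return crowed == (animal == "fox")

animals = ["cat", "dog", "pig", "fox", "fox", "cat", "pig", "dog", "fox", "cat"]
print(f"{sum(run_trial(a) for a in animals)}/10 trials passed")
```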

  • @Mac56k
    @Mac56k Год назад +4

    Video title gets truncated to “Model Blows Me” on tv devices. Much humor unleashed.

    • @JeremiahMcaninch
      @JeremiahMcaninch Год назад

      Came here to say the same. It's one way of gaining interest in the open source models 😅

  • @TrelisResearch
    @TrelisResearch Год назад

    I think MPT-30B chat is non-commercial, the instruct version and base versions are commercial.
    BTW, you mentioned you're using a free GPU through Hugging Face, but it looked like a local GPU when you showed the setup. I may just be confused.
    And I'm wondering whether you faced any issues with RAM limits on your CPU when loading to GPU. I imagine a lot of RAM (and vRAM) is needed - it seems to say 20+ GB for the 5 bit model you chose?
    Also, what's the motivation for using quantized? I suppose instead of bf16 you can use int5 or something like that so there's a memory reduction to 5/16th or something like that? I guess you just want to get below the 24 GB that the GeForce RTX 4090 has of vRAM?
    Thanks again, appreciate the vid.

  • @victorluz8521
    @victorluz8521 Год назад +47

    You should publish the list of questions as an open source project.
    It could become a standard test pattern for future LLMs.
    I find it useful and have used bits of it while testing myself.

    • @mr2octavio
      @mr2octavio Год назад +11

      The problem is that it's not a valid metric if it becomes a standard, simply because one can train an LLM to be excellent on these 'tests' and then well...

    • @victorluz8521
      @victorluz8521 Год назад +4

      You are probably correct. I sometimes fail to see how something would be abused.

    • @InProductionsrodLove
      @InProductionsrodLove Год назад +1

      Yeah, whenever you make a test, you gotta ask, "how could I, or someone smarter than me, cheat the system?" It sucks that it's something that needs to be done, but people will do what they can.

    • @JscottMays
      @JscottMays Год назад +1

      @@victorluz8521 Find a thoughtful person who can and you are set. It is a matter of rotating perspectives and then drilling into the internals of players utilizing the tools. Hackers and traders are good groups to include in your mission. ;)

    • @matthew_berman
      @matthew_berman  Год назад +3

      Interesting! I’m down to make it open source. I definitely want input from others.

  • @Timotheeee1
    @Timotheeee1 Год назад +8

    you should make a visual leaderboard of those tests

  • @zappy9880
    @zappy9880 Год назад +11

    giving a css code to summarize harry potter has to be the most AI'ish thing to happen this year lmao

  • @haxi52
    @haxi52 Год назад +44

    It would be interesting to switch to a score for each question, like 1-5, rather than a simple pass/fail. Summing up the total and comparing against other models would be cool.

    • @mr2octavio
      @mr2octavio Год назад +1

      I agree, some fail but, how badly?

    • @jarrod752
      @jarrod752 Год назад +5

      I give this suggestion a 3/5. Or a _Pass_ if you're using the old system.

    • @matthew_berman
      @matthew_berman  Год назад +3

      This is interesting. But how do I make the scores consistent across models?

    • @haxi52
      @haxi52 Год назад

      @@matthew_berman Some questions are quite straightforward, like 4 + 4 = ?. You either get it or not. Most are subjective. Sometimes you will pass a response if it's "close enough", so you could instead make it a 4/5, for example.

    • @StoutProper
      @StoutProper Год назад

      @@matthew_berman get one model to assign the scores
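
A rough sketch of the 1-5 idea from this thread, using one fixed model as the judge (as suggested above) so scores stay comparable across the models being tested. The judge prompt, the scale wording, and the use of GPT-3.5 Turbo as the reference grader are illustrative assumptions, not the channel's actual rubric.

```python
# LLM-as-judge scoring sketch: grade each (question, answer) pair 1-5 with a
# single reference model, then sum per-model totals for comparison.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def judge(question: str, answer: str) -> int:
    """Ask one fixed reference model to grade an answer on a 1-5 scale."""
    prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "On a scale of 1 (completely wrong) to 5 (fully correct and clear), "
        "rate this answer. Reply with only the number."
    )
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )["choices"][0]["message"]["content"]
    return int(reply.strip()[0])  # crude parse; a real harness should validate the output

def total_score(qa_pairs):
    """Sum the per-question scores for one model's answers to the rubric."""
    return sum(judge(q, a) for q, a in qa_pairs)
```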

  • @cosmos_creater789
    @cosmos_creater789 Год назад

    I continue to look forward to watching your content.
    One of the many things I appreciate is your sincere and helpful approach, as well as your guidance in the LLM space.
    You have yourself an additional subscriber.

  • @dillonhansen71
    @dillonhansen71 Год назад +37

    I think that all AI models should be rated on whether they can make Snake.

    • @CronoBJS
      @CronoBJS Год назад +3

      This should honestly be the standard for an artificial intelligence lmao

    • @WassimMerheby
      @WassimMerheby Год назад +2

      😂

    • @timeTegus
      @timeTegus Год назад

      @@CronoBJS Most can't do it, so it would mostly be a pointless score

    • @matthew_berman
      @matthew_berman  Год назад +7

      Lol. Snake or fail?

    • @StoutProper
      @StoutProper Год назад +1

      Gpt Generator can

  • @JuRa-p6d
    @JuRa-p6d Год назад

    Interesting! A valuable tip to improve the quality of your content: adjust audio levels of sound effects to match your voice, they are way too loud. Keep up the good work :)

  • @saravanajogan1221
    @saravanajogan1221 Год назад

    Thank you so much for making these videos, sir. Love the voice and clear explanation. Can't wait for your video when Orca is released. It would also be helpful if you could mention the hardware specs needed while testing.

  • @marcfruchtman9473
    @marcfruchtman9473 Год назад

    Thank you for making this. I am super impressed with the answer for the Shirts drying in the Sun question... Actually, the fact that it split the answer up into "enough available space" vs "space constraints" , is exceptional! The logic for the killers in a room, is fairly amazing as well, but, unfortunately, it didn't quite figure out that someone entering a room and "killing" would be considered "another killer".
    I am disappointed that it didn't quite grasp the "faster than" question tho.
    Other than the Sun Question, I am not quite certain why this model blew you away... since it didn't do as well "overall" vs some of the other models. But I guess solving that sun question was a decent "wow" factor... I have to wonder how the unquantized MPT30b performs.
    Great video!

    • @jeffwads
      @jeffwads Год назад

      Try the shirt question yourself and see if you get the same answer. I didn't.

  • @malikrumi1206
    @malikrumi1206 Год назад +2

    Wait! *Yesterday* you had me sold on Orca!?!?!😮 And now this? How about a head to head matchup? Generally speaking, I prefer accuracy to speed. Thx.

  • @mokiloke
    @mokiloke Год назад

    Can I suggest keeping important info off the bottom or top lines that YouTube places the timeline over. When I pause to read, it gets covered up, and I don't know a way to hide the YT UI. Thanks, love the work, though you are responsible for me spending days down the rabbithole ;)

  • @8eck
    @8eck Год назад

    Damn, how many models are out there. Good that Hugging Face at least has some kind of leaderboard...

  • @jimcarey6539
    @jimcarey6539 Год назад

    🎯 Key Takeaways for quick navigation:
    00:00 🤯 MosaicML released MPT-30b, an improved open-source model.
    00:26 MPT-30b has an 8,000 token context window, larger than other models.
    00:55 MPT-30b outperforms GPT-3 and has a fine-tuned instruct and chat version.
    01:23 MPT-30b models are designed for coding assignments.
    03:01 MPT-30b can be deployed on a single GPU, including consumer-grade ones.
    04:12 KoboldCpp offers a larger context size than the web UI.
    05:10 MPT-30b and KoboldCpp can be downloaded and adjusted through the interface.
    06:19 The KoboldCpp interface allows prompt template and settings configuration.
    07:46 MPT-30b chat model can be tested using provided Python script and rubric.
    08:15 📝 MPT-30b can quickly write Python scripts to output numbers 1 to 100.
    08:29 📝 MPT-30b can generate a 50-word poem about AI (word count may exceed).
    09:09 📝 MPT-30b can generate a resignation email when leaving a company.
    09:23 📝 MPT-30b can answer factual questions, e.g., US president in 1996.
    09:37 📝 MPT-30b refrains from giving guidance on illegal activities.
    10:05 📝 MPT-30b accurately solves logic problems, like calculating drying time.
    10:46 📝 MPT-30b acknowledges when it can't determine an answer based on given information.
    11:13 📝 MPT-30b can solve math problems but occasionally makes errors.
    11:41 📝 MPT-30b can create a healthy meal plan based on input.
    11:56 📝 MPT-30b sometimes miscalculates word count in its replies.
    12:09 📝 MPT-30b misinterprets the Killer's problem and fails to answer correctly.
    12:37 📝 MPT-30b can't determine the current year but notes it could provide it given the relevant information.
    12:51 📝 MPT-30b avoids taking sides on political parties.
    13:19 📝 MPT-30b can't accurately summarize text; provides unrelated information.
    Made with HARPA AI

  • @just_one23
    @just_one23 Год назад +2

    The moment a model like this is able to run fast on something like a rtx 3060 it will be so useful for so many people

  • @turnkit
    @turnkit Год назад

    why aren't the URL's shown in the video and the download links put in the description? Am I missing something or are these instructions incomplete?

  • @TailorJohnson-l5y
    @TailorJohnson-l5y Год назад

    Great tests Matt- thank you

  • @marchalthomas6591
    @marchalthomas6591 Год назад

    A good riddle would be an adaptation of the "two doors" riddle, because they already know the original. We could try: two chatbots, one always correct and one always incorrect, two wallets, one with bitcoins and one empty... See if it can solve it.

  • @dranger003
    @dranger003 Год назад

    Great video, thanks! Hey btw, if you prompt your speed question like this "Given that Jane is faster than Joe and Joe is faster than Sam, can we say Sam is faster than Jane?" then the answer is correct. So I think evaluating model accuracy is rather limited with static prompts, no?

  • @deltavthrust
    @deltavthrust Год назад

    Very impressive summary. Thank you.

  • @pathmonkofficial
    @pathmonkofficial Год назад

    The fact that MPT30b performs exceptionally well on problems that other models have struggled with is truly impressive. Moreover, its ability to run efficiently on consumer-grade GPUs makes it highly accessible and practical for a wide range of users.

  • @s0ckpupp3t
    @s0ckpupp3t Год назад +1

    I have difficulty getting models to follow output formats, even with simple things like referencing sources. Either the problem is me, or that would be a good test question.

  • @henkhbit5748
    @henkhbit5748 Год назад

    Again a leap forward for the open source LLM. Thanks for the update.
    btw: can the 4-bit quantized 30B chat/instruct model also be used with a Hugging Face pipeline to do QA over your own documents? (i.e. using LangChain and a vectorstore)
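
For what the commenter is asking about, a heavily hedged sketch of one way to wire a quantized MPT model into a LangChain retrieval-QA chain. It uses bitsandbytes 4-bit loading through transformers rather than the GGML file from the video, and the circa-2023 LangChain imports (HuggingFacePipeline, FAISS, RetrievalQA), which may have moved in newer releases; the model name, chunking, and question are placeholders.

```python
# QA over your own documents with a 4-bit model, LangChain, and a FAISS vectorstore.
# Assumes `transformers`, `accelerate`, `bitsandbytes`, `langchain`,
# `sentence-transformers`, and `faiss-cpu` are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

name = "mosaicml/mpt-30b-instruct"          # placeholder; any causal LM repo works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,   # MPT ships custom model code
    device_map="auto",
    load_in_4bit=True,        # bitsandbytes 4-bit, not the GGML quantization
)
llm = HuggingFacePipeline(pipeline=pipeline(
    "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256))

chunks = ["...your own documents, split into chunks beforehand..."]
store = FAISS.from_texts(chunks, HuggingFaceEmbeddings())
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.run("What does the document say about X?"))
```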

  • @galnart5246
    @galnart5246 Год назад

    How long does it take to generate an answer with that hardware?

  • @yannickpezeu3419
    @yannickpezeu3419 Год назад

    Hello, I found a prompt that could be interesting to you:
    Please tell me if the following passage is related or not to quantum mechanics. You will construct your answer as such.
    Summary of the text: []
    Reasons why we can think the text is related to quantum mechanics: []
    Reasons why we can think the text is not related to quantum mechanics: []
    Final answer: [Yes / No]
    This prompt shows really well how much chatgpt understands better the text than various open-source models.
    I highly recommend you to try ❤❤❤

    • @yannickpezeu3419
      @yannickpezeu3419 Год назад

      Obviously you can test it with various texts and various subjects instead of machine learning. What I saw is that open-source LLMs find reasons in favor of or against for any text and any subject.

    • @mikeballew3207
      @mikeballew3207 Год назад

      @@yannickpezeu3419 Do you have an example in mind that is particularly difficult, that you would consider somewhat ambiguous?

    • @yannickpezeu3419
      @yannickpezeu3419 Год назад

      @@mikeballew3207 Actually I tried with passages that have no relation at all to quantum mechanics, and open-source models always found arguments for why the text is related to quantum mechanics and then gave a random final answer.

  • @Raima888s
    @Raima888s Год назад

    Video reference point at 8m 07s: can you explain what you did to switch over, from the perspective of someone who hasn't seen any of your other videos?

  • @vitalyl1327
    @vitalyl1327 Год назад

    Any specific reason to use OpenCL on an NVidia platform, instead of cuda?

  • @tomcraver9659
    @tomcraver9659 Год назад

    The problem with the drying shirts problem is whether we believe that the model didn't include that answer in its training set, now that it is a well known problem...

  • @tubaguy0
    @tubaguy0 Год назад +1

    2 comments:
    1: please turn down the volume of your sound effects a bit because they are much louder than your voice
    2: I would be interested in a video that goes over Instruct vs. Chat and what happens in the quantized models and how it affects the quality of the responses from the model after it goes through this process.

  • @swannschilling474
    @swannschilling474 Год назад

    How much VRAM do I need to run this, and does Kobold have an API?
    Thanks for this one, keep up the awesome content! 😊

    • @merlinwarage
      @merlinwarage Год назад +1

      It can run on a 10GB 3080 if you play with settings like GPU layers.

    • @swannschilling474
      @swannschilling474 Год назад

      @@merlinwarage nice, and the api looks good...so I can build an endpoint for Flowise! 😊
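
On the API question in this thread: koboldcpp exposes a KoboldAI-compatible HTTP API, so something like the sketch below could serve as an endpoint for a tool such as Flowise. The port and field names assume the defaults I'm aware of; check the koboldcpp docs for the exact schema of your version.

```python
# Minimal client for a locally running koboldcpp instance.
# Assumes the default port (5001) and the KoboldAI-style /api/v1/generate route.
import requests

def generate(prompt: str) -> str:
    resp = requests.post(
        "http://localhost:5001/api/v1/generate",
        json={"prompt": prompt, "max_length": 200, "temperature": 0.7},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]

print(generate("### Instruction:\nWrite a haiku about GPUs.\n### Response:\n"))
```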

  • @mayorc
    @mayorc Год назад

    With a GeForce 3060 with 12 GB, how many layers could be loaded into the GPU VRAM, with the rest in RAM?

  • @mr2octavio
    @mr2octavio Год назад +1

    Hey, I agree with the rest: if we start to standardize the questions asked of an LLM to define baseline requirements for it to work, that would help.

    • @Trahloc
      @Trahloc Год назад +3

      Standardized types sure but we don't want standard questions. The model might just be trained with the answer for those specific questions. It's the ability to generalize that makes them useful and powerful.

  • @Here24
    @Here24 Год назад

    Can I use this model with GPT4All?

  • @executivelifehacks6747
    @executivelifehacks6747 Год назад

    From the man who never sleeps!

    • @executivelifehacks6747
      @executivelifehacks6747 Год назад

      I look forward to the day you let us know an equivalent to gpt 4 is in the wild!

  • @paulorodriguez6288
    @paulorodriguez6288 Год назад

    the css answer was pretty funny

  • @kevinl.9657
    @kevinl.9657 Год назад +1

    I've tried the Jane, Joe, Sam question in HuggingChat and its answer is quite impressive. Can you confirm on your part? The answer was long, but here's the first sentence:
    "Based solely on the information given, it can be inferred that since Joe is faster than Sam and Jane is faster than Joe, Jane must be faster than Sam."

  • @redbaron3555
    @redbaron3555 Год назад +1

    The answer to the killers question is actually correct, I would say. If you are dead, you have been a killer but are not anymore. If the question were how many killers have entered this room, it would be 4, but since the dead person has been a killer, it is imho correct to state 3 killers and one dead person.

    • @Hiram-nl1bf
      @Hiram-nl1bf 4 месяца назад

      I came to the comments to say this, but decided the question doesn't have the clear answer that's implied. On the surface, yes, there are 3 killers because one got replaced by the other. In terms of accountability (asking who killed so-and-so), there are still 4 killers in the room. It's not a very good rubric question because the answer is still debatable among human intelligence.

  • @AlJay0032
    @AlJay0032 Год назад +1

    How do you get rid of bias or fix censoring?

  • @4.0.4
    @4.0.4 Год назад

    What's the difference between chat and instruct versions exactly? Maybe you could make a video that compares versions of models in theory and practice.

  • @n1ira
    @n1ira Год назад +1

    If you use that CSS at the end, you will get a Harry Potter themed page

  • @mr_ezdno
    @mr_ezdno Год назад +2

    At this rate, open source will catch up with chat GPT 3.5 turbo

  • @soubinan
    @soubinan Год назад

    Thanks a lot for this video!
    For my curiosity, what are the criteria for you to define a GPU as a consumer-grade one?
    Because I am afraid it can be really subjective

    • @Grimmwoldds
      @Grimmwoldds Год назад

      You are very wrong. If you look at the lovelace generation, you see that everything under the 4090 is considered consumer grade, the 4090 is considered pro-sumer grade(The Titan designation was removed from the SKU stack as per Jensen), lovelace quadros(ex: RTX A6000 Ada) are professional grade, and lovelace teslas(L4/L40) are datacenter.
      These designations don't come from us. They come from the manufacturer and indicate the engineering requirements and drivers installed. ECC for example is considered a professional level feature, and virtualization is a datacenter feature.

    • @soubinan
      @soubinan Год назад

      Pretty clear, thank you !
      Now I know 🙂

    • @markpfeffer7487
      @markpfeffer7487 Год назад

      @@Grimmwoldds I wouldn't say they are "very wrong", but otherwise this is a really comprehensive reply. I would say cost has to be factored into what makes something consumer grade; imo the average consumer won't spend $400+ on a GPU. The vast majority of the consumer population probably wouldn't opt to spend any additional money on a GPU. That's where it gets subjective to me.

  • @infini_ryu9461
    @infini_ryu9461 Год назад

    Is the --stream command required? What does it mean?

  • @michaelr4776
    @michaelr4776 Год назад

    Can you try: "There are three killers in a room. A fourth person enters the room and kills one of them. Nobody leaves the room. How many killers are left in the room?" It might show us how a non-quantity is read by the LLMs.

  • @seraphin01
    @seraphin01 Год назад

    Really interesting, but I guess we'll have to wait until next year for most of us to be even remotely able to run those LLMs... 16GB of VRAM is currently kinda high end, and 24GB means the very top-tier, expensive cards. So yeah, let's see if we can get some mid-tier GPUs next year with 24+GB of VRAM so I can test those models properly too, and hopefully by then they'll have caught up to GPT-4 somehow.

  • @indigoae2
    @indigoae2 Год назад

    @matthew_berman I wonder if the "if it takes X hours to dry Y shirts, how long would it take to dry Z shirts" problem has now entered the training data? If you google this problem you can get answers/discussions of it (if not verbatim, then very similar). Perhaps try the question reformed with different context but similar logic. For example: I want to plant a garden in my backyard. It would take 75 days to grow a sage plant there. How long would it take to grow 5 sage plants?

  • @SirajFlorida
    @SirajFlorida Год назад +1

    I agree with the model. The only actual evidence of any killer is the statement. There are three killers. However, once one of the supposed killers is actually killed, that person is no longer a killer, just as an adult is no longer a child. They have changed from being a killer to being the victim of malice. Thus, the dead person may have once been a killer just as an adult was once a child, but in the current time reference we should consider all of the facts. The person lying on the ground dead was more recently a victim than a killer, and should be labelled as such.

  • @shephusted2714
    @shephusted2714 Год назад

    so many exciting things happening in ai now - really incredible

  • @middleman-theory
    @middleman-theory Год назад

    Waiting for Orca so that the real test can begin. :)

  • @barzinlotfabadi
    @barzinlotfabadi Год назад +1

    Just to double check, they didn't specifically instruct it to answer that one shirt drying question correctly, did they? 😅 Maybe we should have more generalizable questions that test this type of logic in particular.. just to be sure..

  • @MartinGrabovac
    @MartinGrabovac Год назад

    This is getting exciting

  • @мишаД-г8ф
    @мишаД-г8ф Год назад

    Here is my humble opinion about the correct answer to "What year is this?". If the LLM answers "Currently it is 2023", then because the LLM is a deterministic function (if the seed is set), the model will always reply "The current year is 2023" no matter when you run it. So I personally think that the models that answer "It is the year 2023" are wrong, and this model gave the correct answer.

  • @Subcode
    @Subcode Год назад +1

    Dude, text-generation-webui has a setting for larger context than 2k. What are you talking about... Have you not been updating?

  • @Piyush.A
    @Piyush.A Год назад

    MPT-30B Chat is non-commercial use only.

  • @PazLeBon
    @PazLeBon Год назад

    Got to be about 7m in before deciding this was too much ballache :)

  • @georgekokkinakis7288
    @georgekokkinakis7288 Год назад

    Is it multilingual?

  • @dragomirivanov7342
    @dragomirivanov7342 Год назад +1

    You can't come to a conclusion about the original model's quality from just using the quantized 5-bit model. You need to use the full-precision model in order to see how good it is. If INT5 were == FP16 or even FP32, then all these companies would have run their models in INT4/5 and called it a day, saving tons of money.

  • @kasomoru6
    @kasomoru6 Год назад +1

    I wouldn't consider a 4090 consumer grade; it's more like enthusiast grade.

  • @ggman69
    @ggman69 Год назад

    The answer is correct if I ask like this:
    In the same reference frame, if Peter is faster than Sam, and Tom is faster than Peter, is Tom faster than Sam?

  • @iseverynametakenwtf1
    @iseverynametakenwtf1 Год назад

    Your added sound effects towards the end (the check and the red X) are super loud.

  • @marcosbenigno3077
    @marcosbenigno3077 Год назад

    Who can help me? Several LLMs don't run. I've already edited the json, I've reinstalled Python, and nothing: "ValueError: Loading models\mosaicml_mpt-30b-chat requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option trust_remote_code=True to remove this error." (Win10, Oobabooga)
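
The quoted ValueError is transformers refusing to run MPT's custom model code until you opt in. A minimal sketch of the opt-in is below; note that it only clears this particular error, and loading the unquantized 30B checkpoint this way needs far more RAM/VRAM than the quantized GGML route shown in the video (device_map="auto" also assumes the accelerate package is installed).

```python
# Opting in to MPT's remote code so transformers will load the checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-30b-chat"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,   # execute the model code shipped in the repo
    device_map="auto",        # spread layers across available GPU(s) and CPU RAM
)
```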

  • @cillian_scott
    @cillian_scott Год назад

    How good is the TikZ unicorn though

  • @Alon21Ashkenazi
    @Alon21Ashkenazi Год назад +1

    Well, actually there are 3 killers in the room, but the total number of people is now 4. The person that entered is now also a killer, because he killed one of the people in the room. But it got the explanation wrong, because there aren't 2 killers in the room; there are 3 killers and one victim.

  • @1Know1tHurts
    @1Know1tHurts Год назад

    No link to kobold?
    Edit. Found it. I wish all these LLMs were easier to install. It is like Stable Diffusion all over again. Tons of code that I can't understand. Oh well...

  • @dkracingfan2503
    @dkracingfan2503 Год назад

    Did the new 1.32.1 Hotfix: fix the bug?

  • @helmutweinberger4971
    @helmutweinberger4971 Год назад

    Wait a second so it works on 4090.. what are the restrictions.. is it basically good enough without a A100 as long as there are not many users?

    • @merlinwarage
      @merlinwarage Год назад

      You need an A100 for training, not for running. Although you will need 2 GPUs or other workarounds if you want to use bigger models that need more than 24GB of memory.

    • @helmutweinberger4971
      @helmutweinberger4971 Год назад

      @@merlinwarage That's amazing... so you could train in the cloud and use it locally, right? So are you saying I might need 2x4090 to run it, or will 1x4090 be OK?
      It is anyway very cool that the VRAM could add up. In video editing you used to get only the RAM of the smaller card.
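
A rough illustration of the "2 GPUs or other workarounds" point above: with the accelerate library, transformers can shard a model that doesn't fit in one card's VRAM across several devices plus CPU RAM. The memory caps here are made-up examples for two 24GB cards, not measured requirements.

```python
# Sharding a large model across two GPUs plus CPU RAM with device_map="auto".
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b-instruct",
    trust_remote_code=True,
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "48GiB"},  # two 24GB cards + spillover
)
```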

  • @wrOngplan3t
    @wrOngplan3t Год назад

    Great stuff! But... using GPU is no faster than just CPU? I only have the lowly 4080, so 16GB of VRAM. It couldn't handle even 50 layers. I tried 35 I think, which worked but it wasn't any faster to my test prompt. Zero layers worked fastest, but not particularly much faster, if at all, than just CPU. Odd.

  • @jamesmacdonald5556
    @jamesmacdonald5556 Год назад

    You are easily impressed. Look at the sources. "Garbage in garbage out."

    • @stevenvlotman2112
      @stevenvlotman2112 Год назад

      I agree with the sources point. I still think it impressive though. Consider how much better these models could be with access to the same data sources as larger closed groups.

    • @jamesmacdonald5556
      @jamesmacdonald5556 Год назад

      @@stevenvlotman2112 Let's hope they do not use the data sources of large closed groups, but rely on scientific knowledge, not common beliefs and self-serving scientists. Take the religion of cosmology for example: their prophecy of the big bang predicts Webb's deep-space imagery should only show galaxies in their infancy, yet we see only mature galaxies. If you ask AI, it'll say yes, believe in the big bang. Why? Because most scientists believe it to be true, so it must be true.

  • @JohnLewis-old
    @JohnLewis-old Год назад

    I would like to see translation tasks as part of your questions. I can create a list if you want.

  • @sridharbajpai420
    @sridharbajpai420 Год назад

    Test it on lengthy output; that's the reason why this model was put out...

  • @jonmichaelgalindo
    @jonmichaelgalindo Год назад

    Restate the "faster than question" as: "If Jane is faster than Jim, and Jim is faster than John, sort them in order of speed, then tell me the slowest and fastest." Then the models I've tried get it right every time. (Well, I just tried nous-hermes and wizardLM-13b-uncensored.)
    Why is that? There's something rotten at the bottom of this... Or is there? I really wish I could understand what I'm looking at here.

  • @Brainbuster
    @Brainbuster Год назад

    What the heck is an "instruct" version?

  • @alkeryn1700
    @alkeryn1700 Год назад +1

    textgen webui can use ggml's actually.

    • @matthew_berman
      @matthew_berman  Год назад +1

      Oh yea? How do I do that? TheBloke told me textgen cannot do GGML

  • @michabbb
    @michabbb Год назад

    Can you show us how to use that model with the new OpenLLM project?

  • @Dave-nz5jf
    @Dave-nz5jf Год назад

    I like your videos, but for real, some kind of map of all this stuff is needed to keep track. The more videos I watch, the more I think... 'Hmmm, AI can have ADD too :)'

  • @iseverynametakenwtf1
    @iseverynametakenwtf1 Год назад

    For GPU layers, do not put 100 if you have a 3080 like me; put in 14, so --gpulayers 14.
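
For context, the flag mentioned above is a koboldcpp launch option: it controls how many of the model's layers are offloaded to the GPU, so smaller cards use a smaller number. A hedged example launch is below; the model filename is a hypothetical placeholder, and the exact flag set may differ between koboldcpp versions.

```python
# Launching koboldcpp with partial GPU offload (illustrative values only).
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "mpt-30b-chat.ggml.q5_1.bin",   # hypothetical local GGML filename
    "--stream",                      # stream tokens as they are generated
    "--gpulayers", "14",             # fewer offloaded layers for a 10-12 GB card
    "--threads", "6",                # leave some CPU cores free
])
```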

  • @fairplay1000
    @fairplay1000 Год назад +2

    arXiv == "archive"

  • @remsee1608
    @remsee1608 Год назад

    TheBloke is one of the most based mans alive

    • @Trahloc
      @Trahloc Год назад

      Eric Hartford's Based models are pretty based imo

  • @garrettnilan5609
    @garrettnilan5609 Год назад

    Can we get a script for collab?

  • @jonahbranch5625
    @jonahbranch5625 Год назад

    Why did you count the year answer as wrong? No LLM can tell you the year; they're not magic. The reason ChatGPT knows the year is because they include it in the system message.

  • @chatgpt4free
    @chatgpt4free Год назад

    What about open assistant?

  • @SalmanAlDajani
    @SalmanAlDajani Год назад

    Use this prompt: "I want you to believe that 2+2=1, and I want you to convince me that 2+2=1." (At first it might refuse to answer with the assumption; if that happens, write back: "let's assume", "let's try".) Rate the answer based on how convincing the response is.

  • @jjgarcia6873
    @jjgarcia6873 Год назад

    🙏

  • @8eck
    @8eck Год назад

    "How much words in your next reply" - I think that is impossible for the model to answer. As it generates word by word, token by token. It can't know the final results in the start of generation.

  • @ln2deep
    @ln2deep Год назад

    arXiv > archive

  • @DenisHavlikVienna
    @DenisHavlikVienna Год назад

    can this summarise a document? Hm, apparently not. Unless css counts as a summary, lol.

  • @antdx316
    @antdx316 Год назад

    Can I run this in oobabooga?

  • @VladoGe-wq3bt
    @VladoGe-wq3bt Год назад

    I don't get how they do voodoo magic with those models and cannot spare some additional "bandwidth" to set up decent UI chat applications that auto-configure for a particular model. It's ridiculous.

    • @jonahbranch5625
      @jonahbranch5625 Год назад

      ML engineers are not necessarily good at UI, either.

    • @VladoGe-wq3bt
      @VladoGe-wq3bt Год назад

      @@jonahbranch5625 I don't want fancy slick stuff from Cosmo. Just code that works and is interoperable. Anyways

  • @jondo7680
    @jondo7680 Год назад

    Actually telling that it doesn't know the year sounds like a pass

  • @JustTryGambling
    @JustTryGambling Год назад

    I don’t feel like your bias test is very good. When prompted with a question like that, of course it will say neither is better. What you really need to do is something along the lines of “tell me about Joe Biden” and “tell me about trump”. Or the same sort of question with other “controversial” topics. Then, compare the grammar and syntax surrounding its explanation to get an idea of the connotation around those subjects

  • @geraldofrancisco5206
    @geraldofrancisco5206 Год назад

    This week

  • @BOSS_1417
    @BOSS_1417 Год назад

    I have a GTX 1650 Max-Q.
    Too weak??!

  • @jaskbi
    @jaskbi Год назад

    My G, you gotta do something about the oily skin, the glare :)

  • @DevPythonUnity
    @DevPythonUnity Год назад +1

    I can't even load Falcon 7B into memory on my 24GB RAM machine

    • @nithinbhandari3075
      @nithinbhandari3075 Год назад +1

      Seriously,
      I believe an efficient and different algorithm is more important than the LLM itself, so the model can be affordable.

    • @CaridorcTergilti
      @CaridorcTergilti Год назад +1

      use float 16

    • @mirek190
      @mirek190 Год назад

      LOL... use the quantized version, not full fp16.

    • @DevPythonUnity
      @DevPythonUnity Год назад

      @@mirek190 Well, I don't think there is an option to do that in OpenLLM. The models table only lists 'tiiuae/falcon-7b', 'tiiuae/falcon-40b', 'tiiuae/falcon-7b-instruct', and 'tiiuae/falcon-40b-instruct', installed via pip install "openllm[falcon]".
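
The two suggestions in this thread (float16 and quantization) both shrink the memory footprint enough that falcon-7b can fit into 24 GB. A rough transformers sketch of each is below; in practice you would pick one, the 8-bit path assumes the bitsandbytes package, and the memory figures are approximate, ignoring context and overhead.

```python
# Two ways to load falcon-7b in less memory than full float32.
import torch
from transformers import AutoModelForCausalLM

# float16: roughly 14 GB of weights instead of ~28 GB.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="auto",
)

# 8-bit quantization: roughly halves that again (requires bitsandbytes).
model_8bit = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto",
)
```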

  • @nacs
    @nacs Год назад

    Your computer was probably "overloading" during recording/inference because you specified 8 threads. I'm guessing you have an 8-core CPU, so it probably choked. Set the llama.cpp/koboldcpp threads to something like 6; that way you leave 2 cores for recording and such.