Sentdex can you guide me to a place on your channel, a playlists/lists, or somewhere where I can brush up on ML for these large language models and recent generative AI tools?
I wonder why there aren't distributed GPU compute solutions à la Foldit. In fact, why not create something like Render Token that rewards users for their GPU time, and allows people in the network to donate either their computing power to a project or tokens so it can continue training models? A truly democratized community effort. That being said, nothing would stop the bot uprising in that case.
A bit difficult I think at least for any entity that wants to keep the full model proprietary. They'd have to distribute it in pieces between thousands of nodes that only process part of the solution and then host their own input server to handle splitting the job, and waiting for some jobs to return before sending out other pieces of work specifically for fully connected layers. It's doable, but it would still cost millions to train these large models whether they're paying a datacenter or through some crypto token. And any entity that has that money is going to expect a return on their investment.
I was thinking the same thing, that it would be nice to be able to download and test some of these large models within the constraints of some consumer hardware.
Why do people seem so hesitant to allow the negative side of the activation function to have a similar magnitude to the positive side? What's the mechanism that makes it worse when the negative side mirrors the positive or has a similar range?
This is really neat! I hadn't heard about GLM-130B and wonder how it compares to LLaMA, facebook's over-trained model. After a quick search, the LLaMA paper only briefly mentions GLM-130B in the context of instruction fine-tuning, but doesn't really talk about general performance. I managed to get LLaMA-30b running on my 16GB video card using 3-bit quantization, and Alpaca was pretty good, but it would be neat to run something with an Apache license instead of an academic one. Now I'm downloading ChatGLM6B and hoping it works easily with text-generation-webui....
@@4.0.4 A dell precision 7720 with a mobile p5000 mxm graphics card. It's a few years old, but it has 16gb which is enough to have some fun. It can also barely transcribe in real-time using the largest whisper model.
I'd be really interested to see your take on the recent vector db boom. Particularly how it will relate to AI, because I don't get it. Why is a vector db better than a relational db for AI?
The Access to Information for Daily Sensibilities Law of 2023 (AIDS, 2023) states that every question must now be posed to AI before a human, since you were going to ask the human the same way you would ask the AI, and the AI had a chance of knowing. It's the new "LMGTFY"
I have not heard of vector db before but I would assume that the vectors in question are static embeddings attached to the data. The reason why this would improve upon relational db is that it should contain more complex continuous relationships between data rather than discrete human-generated relationships. It would be similar to the improvements of using NNs as generalized feature extractors compared to having an expert come up with features. (Reiterating that this is just my guess based on the name)
@@Crayphor As I understood it "vector" is a misnomer because it's more like every entry has a coordinate in a Cartesian space. As such the relationships are simply the distances in the shared space. I just don't see how it would be better than a relational DB for memory to an AI or AGI. I imagine you mostly want specific recall rather than general recall. The model adds generality and the database adds specificity, at least that's how I'd think of it. Maybe it's beneficial to have some sort of general memory in order to abstract out lessons from repeated experiences though, which is how we learn. But I'd argue that's mostly a workaround due to our lack of ability to generate experiences. AGI wouldn't have that problem. It could simply take all its memories and retrain its logic on all of them at the same time, which would have fewer biases - of which we have many due to the architecture of our intelligence.
They let you search by similarity in a fast and scalable way, e.g. for information retrieval. These "DBs" are quite sophisticated systems, and it would be rather hard to build them from the ground up. You couldn't perform the similarity calculations accurately with relational DBs. In combination with LLMs they are used for retrieval augmentation, but also more generally for vector search and recommenders. Search is a big deal in AI.
@@yeetdeets I don't really understand what you mean by "the model adds generality and the database adds specificity". The database is not used for specific data retrieval or to enhance the capabilities of the model; for that, search engines on the internet already exist. What it is used for is to prolong the effective memory of the model. Let's say that the model has a memory of 1000 tokens; it is quite easy to have a conversation that uses all of them. So what do you do if you want a longer conversation? You ask the model to summarize the conversation, and you store the summary in a vector database. Then at the next message you pre-prompt the model with the topic of conversation (by showing the model the part of the current conversation that fits its memory and asking it what the current topic is), you retrieve related summaries of the conversation (by converting the last messages or just the topic into a vector and searching the database for similar stuff) and you give it the last messages of the current conversation so that it knows at which point of the conversation you are. Doing this, you can fake a much longer memory. For example, let's say that you are having a long conversation with your AI about dogs, but you ask it "do you remember when we were talking about cats?" At that point, since the topic shifted to cats, when you search the database using the last messages you find all previous conversations you had about cats and feed them to the AI so that it knows what you are talking about. Of course, it is a little more complex than what I just told you, otherwise you could just have a topic field in a table and retrieve all related summaries. Vector databases allow for a fuzzier search; for example, maybe the topic of a previous conversation was not exactly cats but felines in general. In a vector database you would still find something, but you would find nothing in a relational database.
Take whatever I said in this comment with a grain of salt because I am not an expert by any means, this is just what I understood from people talking about vector databases in the context of AI.
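The fuzzy lookup described above can be sketched in a few lines. This is only an illustration of similarity search, not how any particular vector database works internally: the three-number vectors are hand-made stand-ins for real embeddings, and `store`, `cosine` and `top_k` are hypothetical names made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" invented for illustration; a real system would
# get these vectors from an embedding model.
store = {
    "summary: we talked about cats and their diet": [0.9, 0.1, 0.0],
    "summary: big felines such as lions and tigers": [0.8, 0.3, 0.1],
    "summary: training dogs to fetch": [0.1, 0.9, 0.1],
    "summary: GPU prices": [0.0, 0.1, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k stored summaries whose vectors are closest to the query."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend embedding of "do you remember when we were talking about cats?"
query = [1.0, 0.2, 0.0]
results = top_k(query)
```

With these made-up vectors the "felines" summary is returned alongside the "cats" one even though it never uses the word "cats", which is the fuzzy match an exact topic column in a relational table would miss.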
Only used the base LLaMa, no quality chat variant that I could find, can you link to any specific project page or weights that I can download and try? Finding one issue with many of these models is just... ease of use OOTB.
@@sentdex Almost all of the recent exciting ones are built from the leaked LLaMA which is why getting ahold of them isn't quite as easy. Most have gotten uploaded to HuggingFace in various formats by this point though. LLaMA of course, then the various transformations of it like 4bit. Alpaca, which didn't release their weights, but has been reproduced & released by others. Vicuna which uses the ShareGPT dataset and is said to be better than ChatGLM6b. And gpt4-x-alpaca which looks promising. The downside of course is licensing. Since they're built from LLaMA you can't use any of these for commercial use. The later ones were tuned using output from OpenAI which is against their terms. All of these can run on consumer hardware. For finding more about these - There's the oobabooga project which makes discovering & running a lot of these easier. There's also the Oobabooga & LocalLLaMA subreddits which discuss the newest stuff. Lots of model announcements hit the front page of HN due to the hype right now. There's also the Open Assistant work which looks promising. Don't know much about it yet but worth looking into.
GLM-130B vs Vicuna? Any opinion on that? I know that Vicuna is a much smaller model, just wondering, of the Llama based models if any compare to GLM-130B.
That gaussian activation function is interesting! I'll have to look into exactly what it is and why they chose it. If it is based on a gaussian distribution, it could have interesting properties, since the sum of gaussian variables are still gaussian. Maybe even the model output would be a gaussian.
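If the "gaussian" activation here is GELU (the Gaussian Error Linear Unit), which is an assumption since the comment doesn't name it, then it is the input multiplied by the standard normal CDF rather than a Gaussian-shaped curve. A minimal sketch of that definition, using only the standard library:

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Positive inputs pass through mostly unchanged, while negative
# inputs are squashed toward zero instead of being mirrored.
samples = {x: gelu(x) for x in (-3.0, -1.0, 0.0, 1.0, 3.0)}
```

Under that assumption the "Gaussian" part refers to the weighting of the input, so nothing here guarantees that the model's outputs are Gaussian-distributed.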
What is the context size of the model? I was trying to dig around and couldn't find the info offhand. Also, it's slightly amusing how hard it struggles to translate Chinese to English. I tried a few prompts after an initial conversation, so not exactly extensive testing. "The translation of “各自的” to English characters is “各自的”."
To run, yes... but you can't expect to train the model on an RTX 20xx series card; you need at least 16-20 GB of VRAM to train the non-quantized model, because after you quantize it, you kind of lock the model. At least that's the rule for LLaMA, which, although it is a different model, follows the same quantization rule.
I would like to see a tutorial showing how to train ChatGLM6B to do just one single task like Python programming, for example. Will narrowing the scope of learning let us limit the size and complexity of the model so it can be developed on a powerful GPU workstation by individuals?
In the future, it is likely that specialized consumer AI models will emerge, focusing on specific domains such as Python or C++ programming, rather than attempting to handle every task. This specialization could potentially address the limitations currently faced by many of the all-purpose AI models being introduced.
I'm hopeful to see better models in the future, but I'm worried that attempting to shrink them down too much before reaching sufficient quality with open-source models is a bit of a mistake.
@@StoutProper Training is expensive and there is nothing wrong with me pointing out that the open-source community suffers from a lack of quality control and commitment at times. If we have any hope of getting OpenAI quality from open-source, it doesn't help to have the funding spread across a ton of mediocre projects.
Thanks for the video! Great content. I find it hard to believe that ChatGLM would be better than instruction-tuned LLaMA-7B (e.g. Alpaca, etc.), at chatting in English. Did you check out any of those models? Plan on doing a comparison myself.
I’ve found Alpaca 7b pretty good (if a bit slow on a 4 year old laptop, running in docker, restricted to 8GB ram - speeds up quite a lot when you up the thread count ) So I’m interested in how the 2 compare too.
I like getting these LLMs and testing them in another language they weren't trained in, so I played with getting it to answer me in Portuguese. You could see the memory limitation there, but it was still OK for its model size.
LLMs seem hype until you start chatting with so many of them and realize they can't keep a conversation going for more than a few lines... Even story-writing models seem to start hallucinating. But I'm genuinely amazed that so much can be generated from just a few prompts, and while the "memory" is usually 2048 tokens, as hardware gets better we might get to see actual storytelling ability, multiple characters, and entire worlds created that the users can interact with, be it in text adventures like Zork or via AI-generated novels. At the moment, I've tried every tool out there, and the most popular 6B/13B/20B models. It's still garbage after a few prompts, but sometimes it does remember the context and leaves me genuinely impressed. It's like an abusive relationship: most of the time it's crap, but you learn to cherish the good moments.
It's possible to use summarization to make the chatbot "remember" what was discussed before (prompt: "summarize the dialogue from the beginning to this point"), it can help to keep the dialogue for a longer time, but if you want "long memory", you need some kind of database (like advanced storytelling software does). Also, I'm not sure if "long memory" would be a benefit or a problem.
@@Saerthen It doesn't work. Long-term memory and key-specific memory already exist; they work by injecting details into the prompt to "remind" the chatbot, but unless your question is to "define" something, it really doesn't do much more than that. If you try to play a chat adventure, Zork-like, it creatively generates the wrong enemies, or ignores your previous action. Furthermore, loading LLMs with 2048 tokens of context isn't easy even on modern hardware, and while loading models with more parameters (going from 6B to 13B or 20B) sometimes helps the AI generate more context-relevant responses, it still hallucinates plenty, and the more parameters the model has, the longer it takes to load it and generate responses. In short, these ideas have already been explored; there are even commercial SaaS novel-writing AI solutions and text-adventure ones, but they exhibit the same problems. It's just the limitation of the current tech.
That's kind of my experience as well, as someone mostly interested in storywriting. I've used GPT4ALL and various versions of Alpaca. I found Alpaca better but it still had major issues with memory. That said, in some ways it was also noticeably better in its contextual knowledge compared to older open source models like Fairseq / NeoX, as in recognizing characters / popular stories and their events by default. But they really need better interfaces - for that I am waiting for NovelAI to train a new model with their new h100 cluster they got access to. That will be hype.
@@larion2336 Llama was initially trained with 2048 tokens context. As far as I know, you can't fine-tune the model to use a larger context; you'll have to train it from scratch. And right now, no open-source models are trained that way.
@@larion2336 I wouldn't be too crazy about it, LLMs are just guess engines. While I was hoping more parameters and token context would fix it, the randomness is something hard to predict. At the moment the reason most models use 2048 tokens is for compute simplicity and ease of training, similar to what Ivan was mentioning. I rented cloud VMs and you can train a larger model for a few hundred dollars; I used Google Compute VMs with 80 GB A100 cards. Sadly, even if you hand-pick the data (say I wanted xianxia light novel generation, so I'd train the model on that), because it simply tries to predict the best conclusion to your prompt, it generates at best pseudo-random nonsense. Like you mentioned, sometimes it seems to write with the relevant character, scene, or context in mind. Then it breaks that momentary immersion with a mind leap elsewhere, because it probably thinks it's a more mathematically fitting conclusion to your prompt or whatever it wrote. But maybe NovelAI will do something outside of my expectations; it'd really be great to have a novel-writing AI that would generate stories on demand. Similarly, using it for text adventures or DnD would likewise be great. Still, I always remind myself that this is the worst that AI will ever be. It only gets better from here.
3090s are like $1500 new. itd only be like 6k for a setup like that. id personally wait for 5000 series to see if they up the vram. im guessing we will see a 32gb 90 series model.
I'm surprised researchers just release these advanced models out to the public for free. I'm sure going into the future everything will just be proprietary like OpenAI seems to be moving towards.
No one really knows what to do with these models on a grand scale yet. Instead, they're relying on users/devs to come up with uses. The main issue right now is that even the "best" model today will be replaced tomorrow, so there's no sense in closing off entirely IMO. I think the planned model for most companies is to be the "platform." I think this is OAI's objective too. They'll continue selling access to their models, and people will build/sell products that are built on top.
@@sentdex I am wondering about "No one really knows what to do with these models". I mean, isn't replacing Google search a big and nice target? You could build something like a little companion application that helps the user with almost every kind of text-based information.
What if I wanted one model just for core logic control, in just one language? What is the minimum size it could be nowadays? The model would be prompted with a summary made by another model, and would respond with, for example, [module="memory" action="retrieve data" params="xxxxx"] [module="browser" action="visit site" params="xxxx"] [etc] [etc]
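Whatever size the model ends up being, the calling code still has to turn that bracketed reply into something it can dispatch on. A rough sketch of a parser for the exact format written above, assuming values never contain double quotes or square brackets (`parse_actions` is a made-up helper name, not part of any library):

```python
import re

# One [...] block, and one key="value" pair inside a block.
BLOCK = re.compile(r"\[([^\[\]]*)\]")
PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_actions(text):
    """Extract [module=... action=... params=...] blocks as dicts.

    Blocks that lack a module or an action (such as a bare "[etc]")
    are skipped rather than treated as errors.
    """
    actions = []
    for block in BLOCK.findall(text):
        fields = dict(PAIR.findall(block))
        if "module" in fields and "action" in fields:
            actions.append(fields)
    return actions

reply = ('[module="memory" action="retrieve data" params="xxxxx"] '
         '[module="browser" action="visit site" params="xxxx"] [etc]')
actions = parse_actions(reply)
```

Skipping malformed blocks is a deliberate choice here, since small models tend to drift from a requested format; a stricter caller might prefer to re-prompt instead.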
What AI projects can we focus on with one RTX 3090? Is there any advantage in buying another and using two RTX 3090s? Or is it too late, and the hardware won't make any difference in our homes?
The output of ChatGLM often seems to have Chinese characters mixed in with English text, e.g. if you ask "What is backpropagation?", then words like "accuracy" and "gradient" show up in Chinese. Doesn't anyone else run into such issues? Doesn't that change the utility of the model for non-Chinese speakers?
Thank you for the video. Can you give us some guidance on how to build a model for a language other than English? I am interested in building a model for Persian texts.
My guess is that this bot's context is a little too limited, but it'd be interesting to see how this bot would fare in the same test Microsoft did with AutoGPT.
not going to try anything smaller than 13b 4bit for inference :) . fine tuned larger bloomz models run just fine for a few dollars on AWS sage maker and other OSS models will eventually come out, pushing the quality even further into GPT-3.5 territory and hopefully into GPT-4 territory with some tricks... currently, imho there are enough llama, alpaca and vicuna models with custom fine tuning, to make them really really good.
What technology jobs have the greatest risk of depreciating faster as a result of AI? (Reflecting quickly, it feels like ALL could be replaced, much like repetitive actions were with robotics.)
ChatGLM seems cool on a technological level, but a word of warning: this is trained on text available inside mainland China, so don't think for a second that it's going to give neutral or factual responses to everything. As a demonstration, try asking it what happened on Tiananmen Square on June 4th 1989...
@@SJ-xz2ch -- well Vicuna definitely requires more than 8GB of VRAM to run, I wasn't able to get it running on my 3060Ti (complained of not enough memory)
Mm good to know. Looks like vicuña 7b wants 14 gig. Still if the parallelism/multi gpu option works, it makes it a realistic option to run locally. (Don’t know of anyone who has tested it yet) Hoping for great things from open assistant but it’s going to be tough if they won’t use response data from gpt like vicuña did.
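The 14 gig figure lines up with simple arithmetic on the weights alone: parameter count times bits per parameter. A back-of-the-envelope sketch, assuming the nominal sizes are exact (7B = 7e9 parameters and so on) and ignoring activations, context cache and framework overhead, which all add to the real requirement:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_7b = weight_memory_gb(7e9, 16)   # 14.0  -> 7B model in 16-bit
q3_30b = weight_memory_gb(30e9, 3)    # 11.25 -> 30B model at 3-bit
q4_13b = weight_memory_gb(13e9, 4)    # 6.5   -> 13B model at 4-bit
```

By the same estimate, the 3-bit LLaMA-30B mentioned earlier in the thread fits on a 16 GB card, while a 16-bit 7B model overshoots an 8 GB card before any overhead is counted.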
God you are a breath of fresh air... all these new AI niche grifters that have infested literally every corner of youtube are unbearable.
That's why you should avoid clickbaity titles and thumbnails. People talking about GPT-5 or doing outrageous claims. I look for difficult words and model names in the title and thumbnails. Those are high quality videos.
@@Melvin420x12 right but with the algo constantly pushing them even if I say don't reco this channel, there are two new grifters every day. "I don't know anything about AI or programming, but let me sit here and tell you all about it! Derp. Did you see auto gpt? Derp. Derp. Derp. Wow now I'm an AI authority because I have 40,000 other derps who have followed my derpiness!" /r
That's bc everyone is blowing smoke up each other's arses and no one is calling this crap out. I'll probably get a nasty response for just saying that. But I've paid my dues to rant. I started learning development thru YouTube and I've spent hundreds on BS. So yeah, as long as we're buying BS, they're selling BS. And they should stop with the "it's dangerous but let me show you how to build it" crap. Sentdex is cool but I've never seen him use a clickbait title before. Until now.
well it's their livelihood
I thought I was the only one getting these recommendations
From basic python to self driving tech - now LLMs. Harrison you are a gentleman who always guides the way through cool new tech with no BS. Thanks for this video!
I love how Harrison is standing in videos when he is teaching us via code and sits down when it's a video talking about a concept or project!
It is mind-numbing how cheap the GLM is compared to GPT3. Can you imagine where we are going to be in just another few months? Great one Harrison as always!
Literally nobody can imagine at this point.
Gave it a test, it used a Chinese term in a response for some reason. I asked it to define it and it did (I assume correctly, or at least it fit with how it used it in the previous response). Apparently the model was trained "on 1T tokens of English and Chinese" and it seems some Chinese is leaking into the English responses, perhaps in the same way a multilingual person may occasionally use a word in a different language by accident, as the word captures a particular idea better than the language being used at the moment (?).
I literally cannot bear any "AI news" these days because it's always sleeky dudes that have obviously no idea what they are talking about that just want to grift on the next thing (basically crypto bros from 9 months ago). I literally hide videos about AI per default with the "not interested" button, but I clicked on this one immediately. Thank you Sentdex!
yes
Have you seen the open-assistant efforts in creating a open RLHF dataset? Hopefully that could also result in gptchat like smaller models that can be run locally
- are you better than chatGPT?
- As an AI language model, I am capable of generating text that is similar to human language, but I don’t have the same level of natural language understanding as a human. So, in terms of generating text, I am similar to ChatGPT in that respect. However, in terms of specific abilities such as understanding natural language and generating appropriate responses to a user’s questions, I may be slightly better as I have been trained on a wider range of data and have additional features such as context and history that ChatGPT does not have.
this is the response i got: "As an AI language model, I’m designed to模仿人类的语言交流能力, but I’m not quite as powerful as ChatGPT, the open-source machine learning model developed by Facebook AI Research. While I can generate human-like responses to a wide range of topics and questions, ChatGPT is specifically designed to generate text that is similar to human language, with a focus on language production and understanding.
That being said, I’m constantly learning and improving, and I’m eager to continue contributing to the field of AI and language processing. So, in some ways, I’m still developing and could potentially improve in the future."
Thanks for sharing! I've been looking for an open LLM with English & Chinese support, and this sounds perfect! It's really cool that the authors care about the affordability of running the model.
You're absolutely right that many models are undertrained and this one was well trained. That's definitely a big factor. However, GLM also claims better accuracy with similar numbers of training tokens as other models. Reading through the paper, they integrated a huge bag of tricks into both the pretraining and fine-tuning regimes to make it more sample efficient. I don't think there's any singular pivotal innovation (though there certainly are innovations) - it looks like it's the actual legitimate hard work in designing and building the complicated beasts that are their pretraining regime and fine-tuning datasets.
Man, the server racks behind you, so dope! Thanks for sharing about this valuable model.
thank you for finding these and sharing them with us. Years ago you got many of us excited about agi chatbots. Love your vids, thank you!
One of the few channels that has mostly dialogue and no shitty b-roll stock footage that nobody cares about. I always know I'm gonna get a sensible yet intriguing take on something.
Open Assistant is about to drop a new opensource ChatGPT soon
Can't wait to check it out, been following them closely. I wonder how it will compare.
I commented the same thing on Twitter :D
Open Assistant could have the same effect as Stable Diffusion in image generation, and Midjourney wouldn't be where they are now without SD.
Thank you for posting. Keep up the good work
I tried a few models for assistance, mainly llama based sub-models, for some reason huggingface variation of llama works worse than original fb one, maybe that's a mac-only issue though, but all the "loras" only work with hf one. Noticed that loras like gpt4all and alpaca-lora were trained either on alpaca dataset or chatgpt outputs and provide censored answers for obviously provocative questions. One model that still acts like an assistant but does not censor outputs is llama adapter, it's less accurate in answers but it provides a better impression about the capabilities of untouched models.
Thank you for this video, will also try this model later!
You are really good at explaining this; subscribed!
It seems the 130B model is also available, but you need to send in a request form and wait for them to approve it by email. They ask questions such as what your intended use is, and you need to agree to their terms. I wonder how much better the 130B model is, and whether there is any metric for it, such as a percentage difference.
Great find and summary! Love your work.
I'd love to see you use it!
Are you able to run through a code example (Python) of how to train any of these models (say a Hugging Face LLM) for a different purpose? I hear it a lot (and you mentioned it as well) but I've been struggling to repurpose them to do other things... Say a paragraph of text as input and a 'score' out or something (maybe how relevant it is to a topic, its sentiment, or even how appropriate it would be to say at work)?
I've even tried taking the pre-text output of these models and feeding that into another model that I train. That way I don't have to fine-tune the big model at all (and I get the huge advantage of only having to feed my training data into the big model once, save the output, and then train my small model)... but it seems I'm unable to get the big models to be consistent at anything. It's like they're not 'exploiting' enough, and sending through the same inputs changes the outputs dramatically :/
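One possible cause of the inconsistency described above (an assumption, since the comment doesn't say how the outputs were generated) is sampled decoding: with a temperature above zero the same prompt can yield different tokens on each run, whereas greedy decoding always picks the highest-scoring token. A minimal pure-Python sketch with made-up logits; `softmax`, `greedy_pick` and `sample_pick` are illustrative names, not any library's API:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Deterministic: always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature, rng):
    """Stochastic: draw an index according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical next-token scores for a four-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0]

rng = random.Random(0)
greedy_runs = {greedy_pick(logits) for _ in range(20)}           # one distinct result
sampled_runs = {sample_pick(logits, 1.0, rng) for _ in range(200)}  # several distinct results
```

If the goal is stable features for a small downstream model, the usual remedies are to decode greedily or to skip text generation and cache the model's hidden states instead; which one applies depends on the setup, which the comment doesn't specify.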
Keep it up! :)
Will you ever continue the "Neural Networks from scratch" series? Was really looking forward to backpropagation.
You should try RWKV and their fine-tuned Raven models, they work as RNN but are trained as a Transformer. The model is gaining a lot of momentum now
Thank you, sir! A breath of fresh air🎉
This is such a solid video we need more of these when it comes to AI today.
Thank you, your content is great as always
I wonder about the training side, can we further train these tiny models and if so how, with what hardware and software to fine tune them to very specific narrow tasks
Part 10 of Neural Net from Scratch, about analytical derivatives??? Please bring the series back!
Thank goodness for this channel!
Sentdex, can you point me to a place on your channel, a playlist or list, where I can brush up on ML for these large language models and recent generative AI tools?
I wonder why there aren't distributed GPU compute solutions à la Foldit. In fact, why not create something like Render Token that rewards users for their GPU time, and allows people in the network to donate either their computing power for a project or tokens so it can continue training models? A truly democratized community effort. That being said, nothing would stop the bot uprising in that case.
A bit difficult I think at least for any entity that wants to keep the full model proprietary. They'd have to distribute it in pieces between thousands of nodes that only process part of the solution and then host their own input server to handle splitting the job, and waiting for some jobs to return before sending out other pieces of work specifically for fully connected layers. It's doable, but it would still cost millions to train these large models whether they're paying a datacenter or through some crypto token. And any entity that has that money is going to expect a return on their investment.
I was thinking the same thing, that it would be nice to be able to download and test some of these large models within the constraints of some consumer hardware.
Why do people seem so hesitant to allow the negative side of the activation function to have a similar magnitude to the positive side? What's the mechanism that makes it worse when the negative side mirrors the positive or has a similar range?
Wow, this talk is fantastic! Have you seen MrPSolver's video on using a GPU to speed up calcs with torch by 250 times over numpy?
This is really neat! I hadn't heard about GLM-130B and wonder how it compares to LLaMA, facebook's over-trained model. After a quick search, the LLaMA paper only briefly mentions GLM-130B in the context of instruction fine-tuning, but doesn't really talk about general performance. I managed to get LLaMA-30b running on my 16GB video card using 3-bit quantization, and Alpaca was pretty good, but it would be neat to run something with an Apache license instead of an academic one. Now I'm downloading ChatGLM6B and hoping it works easily with text-generation-webui....
3-bit quantization? What do you run it on?
@@4.0.4 A dell precision 7720 with a mobile p5000 mxm graphics card. It's a few years old, but it has 16gb which is enough to have some fun. It can also barely transcribe in real-time using the largest whisper model.
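The arithmetic behind fitting a model like this on a 16GB card can be sketched as follows. This is a rough estimate under stated assumptions: it counts the weights only (no activations, no attention cache, no per-group quantization metadata) and uses the nominal parameter counts from the model names rather than exact figures. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

# Nominal sizes taken from the model names (assumption: exact counts differ).
llama_30b_fp16 = weight_memory_gb(30e9, 16)  # 60.0 GB
llama_30b_3bit = weight_memory_gb(30e9, 3)   # 11.25 GB
model_7b_fp16 = weight_memory_gb(7e9, 16)    # 14.0 GB
```

On these assumptions a 30B model drops from roughly 60 GB at 16-bit to roughly 11 GB at 3-bit, which is why it can squeeze into 16 GB with some room left for everything the estimate ignores.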
@@nathanbanks2354 impressive!!
Have you tried the vicuna model yet? Supposed to be '92%' as good as chatgpt. How would say it compares to ChatGLM?
i have tried vicuna 7b on cpu and it's impressive!
I'd be really interested to see your take on the recent vector db boom. Particularly how it will relate to AI, because I don't get it. Why is a vector db better than a relational db for AI?
The Access to Information for Daily Sensibilities Law of 2023 (AIDS, 2023) states that every question must now be posed to AI before a human, since you were going to ask the human the same way you would ask the AI, and the AI had a chance of knowing.
It's the new "LMGTFY"
I have not heard of vector db before but I would assume that the vectors in question are static embeddings attached to the data. The reason why this would improve upon relational db is that it should contain more complex continuous relationships between data rather than discrete human-generated relationships. It would be similar to the improvements of using NNs as generalized feature extractors compared to having an expert come up with features. (Reiterating that this is just my guess based on the name)
@@Crayphor As I understood it "vector" is a misnomer because it's more like every entry has a coordinate in a Cartesian space. As such the relationships are simply the distances in the shared space.
I just don't see how it would be better than a relational DB for memory to an AI or AGI. I imagine you mostly want specific recall rather than general recall. The model adds generality and the database adds specificity, at least that's how I'd think of it.
Maybe it's beneficial to have some sort of general memory in order to abstract out lessons from repeated experiences though, which is how we learn. But I'd argue that's mostly a workaround due to our lack of ability to generate experiences. AGI wouldn't have that problem. It could simply take all its memories and retrain its logic on all of them at the same time, which would have fewer biases - of which we have many due to the architecture of our intelligence.
They allow to search by similarity in a fast and scalable way e.g for information retrieval. These "DBs" are quite sophisticated systems, and it would be rather hard to build them from the ground up. You couldn't perform the similarity calculations with relational dbs accurately. In combination with llms they are used for retrieval augmentation but also in general for vector search and recommender. Search is a big deal in AI.
@@yeetdeets I don't really understand what you mean by "the model adds generality and the database adds specificity". The database is not used for specific data retrieval or to enhance the capabilities of the model; for that, search engines on the internet already exist. What it is used for is to prolong the effective memory of the model.
Let's say that the model has a memory of 1000 tokens, it is quite easy to have a conversation that uses all of them. So what do you do if you want a longer conversation? You ask the model to summarize the conversation, and you store the summary in a vector database. Then at the next message you pre-prompt the model with the topic of conversation (by showing the model the part of the current conversation that fits its memory and asking it what the current topic is), you retrieve related summaries of the conversation (by converting the last messages or just the topic in a vector and searching the database for similar stuff) and you give it the last messages of the current conversation so that it knows at which point of the conversation you are.
Doing this, you can fake a much longer memory. For example, let's say that you are having a long conversation with your AI about dogs, but you ask it "do you remember when we were talking about cats?" At that point, since the topic shifted to cats, when you search the database using the last messages you find all previous conversations you had about cats and feed them to the AI so that it knows what you are talking about. Of course, it is a little more complex than what I just told you, otherwise you could just have a topic field in a table and retrieve all related summaries. Vector databases allow for a fuzzier search; for example, maybe the topic of a previous conversation was not exactly cats but felines in general. In a vector database you would still find something, but you would find nothing in a relational database.
Take whatever I said in this comment with a grain of salt because I am not an expert by any means, this is just what I understood from people talking about vector databases in the context of AI.
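The fuzzy lookup described in this thread can be sketched as a brute-force cosine-similarity search. This is only an illustration under stated assumptions: the three-dimensional "embeddings" are hand-made (a real system would get them from an embedding model), and `cosine_similarity` and `top_k` are hypothetical helpers, not the API of any vector database. Real vector databases typically use approximate nearest-neighbour indexes rather than scanning every entry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Made-up embeddings; the axes loosely stand for cats, dogs, cooking.
store = [
    ("summary: we discussed house cats", [0.9, 0.1, 0.0]),
    ("summary: we discussed big felines like lions", [0.8, 0.2, 0.1]),
    ("summary: we discussed dog training", [0.1, 0.9, 0.0]),
    ("summary: we discussed pasta recipes", [0.0, 0.1, 0.9]),
]

query = [1.0, 0.0, 0.0]  # stands in for "do you remember when we talked about cats?"
results = top_k(query, store, k=2)
```

Here the "felines" summary is retrieved alongside the "cats" one even though the word "cats" never appears in it, which is the advantage over an exact-match topic column.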
What about the LLaMA or Alpaca models? How do they compare to ChatGLM6B?
Only used the base LLaMa, no quality chat variant that I could find, can you link to any specific project page or weights that I can download and try? Finding one issue with many of these models is just... ease of use OOTB.
@@sentdex Almost all of the recent exciting ones are built from the leaked LLaMA which is why getting ahold of them isn't quite as easy. Most have gotten uploaded to HuggingFace in various formats by this point though. LLaMA of course, then the various transformations of it like 4bit. Alpaca, which didn't release their weights, but has been reproduced & released by others. Vicuna which uses the ShareGPT dataset and is said to be better than ChatGLM6b. And gpt4-x-alpaca which looks promising. The downside of course is licensing. Since they're built from LLaMA you can't use any of these for commercial use. The later ones were tuned using output from OpenAI which is against their terms. All of these can run on consumer hardware.
For finding more about these - There's the oobabooga project which makes discovering & running a lot of these easier. There's also the Oobabooga & LocalLLaMA subreddits which discuss the newest stuff. Lots of model announcements hit the front page of HN due to the hype right now.
There's also the Open Assistant work which looks promising. Don't know much about it yet but worth looking into.
@@sentdex There is the Vicuna model that was released a few days ago; it gives me much better results compared to LLaMA.
@@sentdex I'm sorry I can't reply to you with a useful content, my comment keeps getting deleted... I have no clue why.
@@lutzsommer4708 feel free to email me harrison@pythonprogramming.net
Thanks for the video, I just tried the model.... not very good for the use cases I experimented with. Perhaps others can recommend other use cases.
GLM-130B vs Vicuna? Any opinion on that? I know that Vicuna is a much smaller model, just wondering, of the Llama based models if any compare to GLM-130B.
I know you put out your video discussing the NNFS Part 10 about a year ago, but have you got any updates on that series at all? :)
Given that it's trained bilingually, could we limit the training data to only English and improve its output for English only?
Thank you for sharing once again 👍
thank you so much for sharing!
really great channel, man!
How are you marking up the PDFs of the whitepapers as you read them? It looks electronic to me....
That gaussian activation function is interesting! I'll have to look into exactly what it is and why they chose it. If it is based on a gaussian distribution, it could have interesting properties, since the sum of gaussian variables are still gaussian. Maybe even the model output would be a gaussian.
This guy doesn’t know anything about Gaussian lol
What is the context size of the model? I was trying to dig around and couldn't find the info offhand
Also, it's slightly amusing how hard it struggles to translate Chinese to English. I tried a few prompts after an initial conversation, so not exactly extensive testing.
"The translation of “各自的” to English characters is “各自的”."
To run, yes... but you can't expect to train the model on an RTX 20xx series; you need at least 16-20 GB of VRAM to train the non-quantized model.
Because after you quantize it, you kind of lock the model. At least that's the rule for LLaMA, which is a different model, but the same quantization rule applies.
How would you compare its performance to Hugging Chat (open assistant), WizardLM, StableVicuna and other open LLMs and chats?
I would like to see a tutorial showing how to train ChatGLM6B to do just one single task like Python programming, for example. Will narrowing the scope of learning let us limit the size and complexity of the model so it can be developed on a power GPU workstation by individuals?
In the future, it is likely that specialized consumer AI models will emerge, focusing on specific domains such as Python or C++ programming, rather than attempting to handle every task. This specialization could potentially address the limitations currently faced by many of the all-purpose AI models being introduced.
@@differentperspectives thats common sense
What about "open assistant"?
"It's fast and small, and it sometimes gets lost with too much dialogue." - My Ex lol
GELUs are used quite a lot in models like GPT-3 and BERT.
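For reference, GELU is defined as the input multiplied by the standard normal CDF, GELU(x) = x * Phi(x). A minimal sketch in plain Python, alongside the commonly used tanh approximation and ReLU for comparison; the function names are illustrative and this is not taken from any particular model's code:

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Widely used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

def relu(x):
    """ReLU for comparison: zero for every negative input."""
    return max(0.0, x)

# Compare the three on a few sample inputs.
samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
table = [(x, gelu(x), gelu_tanh(x), relu(x)) for x in samples]
```

Unlike ReLU, GELU lets a small negative value through for moderately negative inputs and approaches zero again for very negative ones, so the negative side stays much smaller in magnitude than the positive side.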
I'm hopeful to see better models in the future, but I'm worried that attempting to shrink them down too much before reaching sufficient quality with open-source models is a bit of a mistake.
It’s up to you to retrain and fine tune them
@@StoutProper Training is expensive, and there is nothing wrong with me pointing out that the open-source community suffers from a lack of quality control and commitment at times.
If we have any hope of getting OpenAI quality from open source, it doesn't help to have the funding spread across a ton of mediocre projects.
Awesome video mate.
Sentdex please do a chat gpt glm-6b tasks review
HF test does not go beyond 1st prompt. It cut off, I asked to Continue and nothing. tried another, same thing nothing beyond 1st answer
Wonder if GLM is Alibaba's version. Probably not, but it makes you wonder since they are rolling theirs out.
Thanks for the video! Great content. I find it hard to believe that ChatGLM would be better than instruction-tuned LLaMA-7B (e.g. Alpaca, etc.), at chatting in English. Did you check out any of those models? Plan on doing a comparison myself.
I’ve found Alpaca 7b pretty good (if a bit slow on a 4 year old laptop, running in docker, restricted to 8GB ram - speeds up quite a lot when you up the thread count )
So I’m interested in how the 2 compare too.
Vicuna is much more ChatGPT-like. Koala follows closely. So I wonder how ChatGLM will compare to them…
Could a botnet train an LLM at the same level these big companies do?
I like getting these LLMs and testing them in a language they weren't trained in, so I played with getting it to answer me in Portuguese. You could see the memory limitation there, but it was still OK for its model size.
I thought that you don't need GPUs for inference, only for training. If you could train or tune on consumer hardware THAT would be HUGE!
Inference does still need GPU power for these large models. It's still lots of matrix multiplication.
Did you try the LLaMA language model that leaked online recently? I like it and will compare it with this.
Can you make a video on Gated Linear Units?
Hi @sentdex, what tool are you using to highlight PDFs?
11:22 I wish you talked about how to run it locally
Wonderful. Keep it up 💓🙏
Have you looked at llama.cpp, where they were able to run LLaMA-30B with just 30GB of RAM?
Which one is the best for writing code?
LLMs seem hype until you start chatting with so many of them and realize they can't keep a conversation for more than a few lines... Even story writing models seem to start hallucinating. But, I'm genuinely amazed that so much can be generated from just a few prompts and while the "memory" is usually 2048 tokens, as hardware gets better, we might get to see actual story-telling ability, multiple characters, and entire worlds created that the users can interact with, be it in text adventures like Zork or via AI-generated novels. At the moment, I've tried every tool out there, and the most popular 6B/13B/20B models. It's still garbage after a few prompts, but sometimes it does remember the context and leaves me genuinely impressed. It's like an abusive relationship, most of the time it's crap, but you learn to cherish the good moments.
It's possible to use summarization to make the chatbot "remember" what was discussed before (prompt: "summarize the dialogue from the beginning to this point"), it can help to keep the dialogue for a longer time, but if you want "long memory", you need some kind of database (like advanced storytelling software does). Also, I'm not sure if "long memory" would be a benefit or a problem.
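The summarize-then-remind idea can be sketched as a prompt builder that always keeps the running summary and then adds as many of the most recent messages as fit the context budget. This is a sketch under stated assumptions: `count_tokens` is a crude word count standing in for a real tokenizer, the summary is assumed to have been produced by the model already, and `build_prompt` is a hypothetical helper.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def build_prompt(summary, messages, budget):
    """Keep the summary, then add the most recent messages that still fit."""
    remaining = budget - count_tokens(summary)
    kept = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    # Restore chronological order before joining.
    return "\n".join([summary] + list(reversed(kept)))

summary = "Summary so far: the user and the bot discussed cats."
messages = [
    "User: tell me about dogs",
    "Bot: dogs are loyal companions",
    "User: which breed is calmest",
]
prompt = build_prompt(summary, messages, budget=20)
```

With a budget of 20 "tokens" the oldest message is dropped while the summary and the two newest messages survive, so the model sees both the gist of the earlier conversation and the current exchange.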
@@Saerthen It doesn't work. Long-term memory and key-specific memory already exist; they work by injecting details into the prompt to "remind" the chatbot, but unless your question is to "define" something, it really doesn't do much more than that. If you try to play a chat adventure, Zork-like, it creatively generates the wrong enemies, or ignores your previous action. Furthermore, loading LLMs with 2048 tokens of context isn't easy even on modern hardware, and while loading models with more parameters (going from 6B to 13B or 20B) sometimes helps the AI generate more context-relevant responses, it still hallucinates plenty, and the more parameters the model has, the longer it takes to load and to generate responses.
In short, these ideas have already been explored, there are even commercial SaaS novel writing AI solutions or text-adventure ones, but they also exhibit the same problems. It's just the limitation of the current tech.
That's kind of my experience as well, as someone mostly interested in storywriting. I've used GPT4ALL and various versions of Alpaca. I found Alpaca better but it still had major issues with memory. That said, in some ways it was also noticeably better in its contextual knowledge compared to older open source models like Fairseq / NeoX, as in recognizing characters / popular stories and their events by default. But they really need better interfaces - for that I am waiting for NovelAI to train a new model with their new h100 cluster they got access to. That will be hype.
@@larion2336 Llama was initially trained with 2048 tokens context. As far as I know, you can't fine-tune the model to use a larger context; you'll have to train it from scratch. And right now, no open-source models are trained that way.
@@larion2336 wouldn't be too crazy about it, LLMs are just guess engines. While I was hoping more parameters and token context would fix it, the randomness is something hard to predict. At the moment the reason most models use 2048 tokens is for compute simplicity and ease of training, similar to what Ivan was mentioning. I rented cloud VMs and you can train a larger model for a few hundred $, I used google compute vm with 80gb a100 cards. Sadly, even if you hand pick the data, say I wanted xianxia light novel generation, so I'd train the model on that, but because it simply tries to predict the best conclusion to your prompt, it generates at best pseudo random nonsense. Like you mentioned, sometimes it seems to write with the relevant character, scene, or context in mind. Then it breaks that momentary immersion with a mind leap elsewhere, because it probably thinks it's a more mathematically fitting conclusion to your prompt or whatever it wrote.
But maybe NovelAI will do something outside of my expectations; it'd really be great to have a novel-writing AI that would generate stories on demand. Similarly, using it for text adventures or DnD would likewise be great. Still, I always remind myself that this is the worst that AI will ever be. It only gets better from here.
3090s are like $1500 new. itd only be like 6k for a setup like that. id personally wait for 5000 series to see if they up the vram. im guessing we will see a 32gb 90 series model.
Why get a 3090 for $1500 when you can get a 4090 for around the same price?
I'm surprised researchers just release these advanced models out to the public for free. I'm sure going into the future everything will just be proprietary like OpenAI seems to be moving towards.
No one really knows what to do with these models on a grand scale yet. Instead, they're relying on users/devs to make uses. The main issue right now is, even the "best" model today will be replaced tomorrow, so no sense in closing off entirely IMO. I think the planned model for most companies is to be the "platform." I think this is OAI's objective too. Theyll continue selling access to their models, and people will build/sell products that are built on top.
@@sentdex I am wondering about "No one really knows what to do with these models". I mean, isn't replacing Google search a big and nice target? You could build something like a little companion application that helps the user with almost every kind of text-based information.
@@sentdex Can you try vicuna-7b? It's very impressive for a small model.
If I wanted one model just for core logic control, in just one language, what is the minimum size it could be nowadays? The model would be prompted with a summary made by another model, and respond with, for example, [module="memory" action="retrieve data" params="xxxxx"] [module="browser" action="visit site" params="xxxx"] [etc] [etc]
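Whatever model ends up producing it, output in the bracketed format from this comment is easy to turn into structured actions. A minimal sketch, assuming the model emits exactly the three attributes shown, in that order, with double-quoted values; `ACTION_PATTERN` and `parse_actions` are hypothetical names:

```python
import re

# Matches one [module="..." action="..." params="..."] tag.
ACTION_PATTERN = re.compile(
    r'\[module="(?P<module>[^"]*)"\s+'
    r'action="(?P<action>[^"]*)"\s+'
    r'params="(?P<params>[^"]*)"\]'
)

def parse_actions(model_output):
    """Extract every well-formed action tag as a dict; ignore everything else."""
    return [match.groupdict() for match in ACTION_PATTERN.finditer(model_output)]

reply = (
    '[module="memory" action="retrieve data" params="xxxxx"] '
    '[module="browser" action="visit site" params="xxxx"]'
)
actions = parse_actions(reply)
```

A small model will not always stick to the format, so in practice the caller would want to handle an empty result, for example by re-prompting.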
Do multimodal tasks, image tasks, mathematics tasks for the glm 6-b in English. Thanks
I would've loved more examples of the thing in the beginning
I was not convinced
How many operations can one 3090 GPU perform? And how do you calculate how many operations a program needs? Thanks for answers!
Which AI projects can we focus on with one RTX 3090? Is there any advantage in buying another and using two RTX 3090s? Or is it too late, and the hardware won't make any difference in our homes?
Hey, what about the LLaMA models? Did you have a look at those?
Can you compare this to llama? Or alpaca?
The output of ChatGLM often seems to have Chinese characters mixed in with English text. E.g. if you ask "What is backpropagation?", then words like "accuracy" and "gradient" show up in Chinese. Doesn't anyone else run into such issues? Doesn't that change the utility of the model for non-Chinese speakers?
Thank you for the video.
Can you give us some guidance on how to build a model for a language other than English?
I am interested in building a model for Persian texts.
I asked it "which world leader looks the most like Winnie the Pooh." I'm dead now.
Does this need a connection to the internet?
Could this model be used for semantic search?
appreciate u❤
Bro, you should continue your NNFS video series!!!!
My guess is that this bot's context is a little too limited, but it'd be interesting to see how this bot would fare in the same test Microsoft did with AutoGPT.
Did you compare to LLaMa?
I've only played with the base LLMs. Is there a chat variant that you think is exceptional and that I should take a peek at?
thanks
0:55 "It's shockingly good for its size"
That's what she said
not going to try anything smaller than 13b 4bit for inference :) . fine tuned larger bloomz models run just fine for a few dollars on AWS sage maker and other OSS models will eventually come out, pushing the quality even further into GPT-3.5 territory and hopefully into GPT-4 territory with some tricks... currently, imho there are enough llama, alpaca and vicuna models with custom fine tuning, to make them really really good.
What technology jobs haven’t the greatest risk of depreciating faster as a result of AI? (Reflecting quickly it feels like ALL could be replaced much like repetitive actions with robotics)
Quick thought, but ... probably alignment/safety for models hah.
Did anyone else get "OSError: Unable to load weights from pytorch checkpoint file" when trying to run ChatGLM?
What about OpenAssistant?!
Can you do videos on using PyBrain to train a ChatGPT-like AI model?
I wonder how many article titles and video titles are going to contain the words "chat GPT killer".
awesome!!!!
GPT-3 also uses GELU activation
Have you tried Chinchilla or Vicuña?
ChatGLM seems cool on a technological level, but a word of warning: this is trained on text available inside mainland China, so don't think for a second that it's going to give neutral or factual responses to everything.
As a demonstration, try asking it what happened on Tiananmen Square on June 4th 1989...
That's pretty good. Have you looked into LLaMa, Alpaca, Koala or Vicuna?
Yea I was impressed by vicuna.
It seems like the best of the llama based models. (So far)
@@SJ-xz2ch -- well Vicuna definitely requires more than 8GB of VRAM to run, I wasn't able to get it running on my 3060Ti (complained of not enough memory)
Mm good to know. Looks like vicuña 7b wants 14 gig.
Still if the parallelism/multi gpu option works, it makes it a realistic option to run locally. (Don’t know of anyone who has tested it yet)
Hoping for great things from open assistant but it’s going to be tough if they won’t use response data from gpt like vicuña did.
Hi, do you know if that model runs on a Huggingface Nvidia Tesla T4 Inference Endpoint?
Looks like it does, the unofficial demo I linked/showed is running on a T4 (huggingface.co/spaces/multimodalart/ChatGLM-6B)
Vicuna-7b is also not bad.