@@KimchiOishi1 h2oGPT does support some models that run on CPU. You just need to follow the instructions for GPT4All or llama.cpp. They are much less powerful and slower than the Falcon model running on GPU but they still work.
Great video! Thank you! I do agree that it's better to have your own local model running open source software if your machine can run it. What GPU do you need??!!! lol The biggest issue I have with ChatGPT - open source or otherwise, are incorrect responses. That makes it next to worthless because you can't trust the responses 100% of the time. Can it also respond incorrectly if you train it with your own data?? And how much of your own data do you need to train it? So if I try to train it on all the PDF's on Raman Microscopy, what's the percent likelihood that a response is going to be incorrect? Thanks in advance. Cheers!
Hi Rob, great video. Could you make one on how to set it up using Azure and also how to use it for training a custom dataset (I would be especially interested in your recommendations for training the Falcon 40B)
You mentioned your machine needed a bigger gpu for a bigger model, I think it would be great to mention what gpu u are using so we can have some sort of reference.
What GPU is big enough to fit the 40B model? Is there a commercial GPU capable of such? What's the highest model that I can most likely get with say a $1000-2000 budget GPU? Thanks. And great content!
Just scaling from the 7B model, which takes about 9 GiB when the weights are truncated to 8 bits: at least 60 GB, maybe 30 if you cut them to 4 bits, so at least 2-3 GPUs with 24 GB of memory each.
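A rough way to sanity-check this kind of estimate is to scale linearly from a known data point. The sketch below is only back-of-the-envelope arithmetic, assuming memory grows in proportion to parameter count and bit width. The function names are made up for illustration, GB and GiB are treated loosely, and real usage also depends on context length, activations, and framework overhead.

```python
import math


def scaled_memory_gib(known_gib, known_params_b, target_params_b,
                      known_bits=8, target_bits=8):
    """Scale a measured footprint linearly by parameter count and bit width."""
    return known_gib * (target_params_b / known_params_b) * (target_bits / known_bits)


def cards_needed(total_gib, card_gib=24):
    """Smallest number of equally sized cards whose combined memory covers total_gib."""
    return math.ceil(total_gib / card_gib)


# Data point from the comment above: a 7B model at 8 bits takes about 9 GiB.
for bits in (8, 4):
    need = scaled_memory_gib(9, 7, 40, target_bits=bits)
    print(f"40B at {bits}-bit: ~{need:.0f} GiB, {cards_needed(need)} x 24 GB cards")
```

Linear scaling gives a little over 51 GiB at 8 bits and about half that at 4 bits, so the 60 GB and 30 GB figures above leave some headroom, and the count of 2-3 cards comes out the same.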
I agree with you on the LLMs… eventually you will probably even have certifications… like AI solutions engineer… different flavors of LLMs, just like the different flavors of Linux… Every small to medium company will want their own private AI setup when they see the benefits.
Very jumpy video and a little difficult to follow, *it's difficult because of all the zooming and cutting and clearing of the terminal and just going everywhere lol*, especially medicated ; ), but in the end i'm up to the point of downloading the model and about to run it for the first time. At the end you speak of fine tuning for a particular use case. I would watch a video about that and i saw a few comments saying the same thing. Thanks for the video and have a great day!
When I try to use it, it asks about 10 clarifying questions when GPT 3.5 asks none and gets the job done. I like this, but so far it's a time suck if you are trying to be productive.
Sounds interesting. Installation for Linux assumes a .deb apt package manager. It also seems to have a dependency on X11. If it ever comes out as an appimage self-executable linux package, I may come back to explore it further. I also could not install the older torch 2.1.2. The current version is 2.2.0 and I could not find an archive to find an older version.
Very clear explanation of the program. Great video. I wonder how people create these open source programs and still can put food on the table. They must have day jobs.
I didn't know that you were working for H2O, but I am happy for you all. You're doing great work making open source LLMs more accessible and friendly!
Thanks! I just started there. I'll still make my normal YouTube content, but this open source project is exactly the sort of stuff I would normally cover. Glad you liked the video.
I’ve always liked H2O, I used to use their deep learning framework a lot. Will definitely check this out.
@@robmulla You work for this?... Sad.
@@sylver369 get off his back
@@sylver369 if you worked for anyone "better" then why are you here? 🤔
Wow, I have a lot of saved documents, articles, and even e-books on my computer. The idea of my own local chatbot being able to reference all of this and carrying on a conversation with me about it is almost like having a friend that shares all my own interests and has read all the same things that I've read! Amazing how the technology is advancing! I can't wait for this!
In its current state, I think you will be underwhelmed by its performance unless you have a pretty powerful GPU
@@lynngeek5191 In simple English, what are you even talking about dude?
@@Raylightsen I think what he meant is that his GPU is not good enough, since he couldn't use the 8k token version.
@@lynngeek5191 You said the desktop version is limited to 2000 but that is not true. You have the option for 8000, however you need a GPU that can handle it (like an RTX 4090, an RTX A6000, or a Quadro RTX 8000 card).
@@Raylightsen ok dude, here it is for you : He's trying to say that this shit ain't free homie. Apparently far from it according to @lynngeeks5191
This is amazing and very well put together! You have one of my favorite channels on all of YouTube and I’ve been able to follow what you teach. It’s an honor to learn from you! Thank you!!
Wow. Thanks for such kind words. I appreciate the positive feedback.
Great video. I hope this gets a lot of views because it is relevant to so many different use cases that need to protect source data. Love the demo of how easy it is to load and vectorize your own local docs with LangChain.
Thanks for the feedback! Glad you liked it. Please share it anywhere you think other people might like it.
Awesome!!! Here I was losing hope about AI / GPT being more transparent about biases getting trained "baked" into popular chatbots already & the lack of real privacy users have about what they discuss "there is no privacy". And blammo you guys already had this out in just a few months. Super cool!! Thanks to all who went in on this!
The last thing they want is AI “noticing” patterns in modern western society.
I've been using chatbots to write a tabletop RPG campaign for my friends, but having the main story in separate files has been a problem. If I can use the material I already have as my own training material it might be way easier! This chatbot might be exactly what I need! Cool, I will give it a go!
update?
YES PLEASE make another video where you set up all of these in a cloud environment instead of local. Excellent video, thank you very much
5:30 yes Rob yes. Please. It will be an all-round approach if you start teaching Python in a cloud environment. Much awaited, and thanks for everything your channel has offered me to date.
Explicitly love your YT SHORTS.
This was great! I’m in the process of setting up LangChain locally with OpenLLM as a backend, but I think I’ll try this as a next step. Thanks for sharing!
Glad you enjoyed the video! Thanks for the feedback.
This is one of the best discussions of building an AI locally I have seen. Bravo!! BTW the tutorial is excellent. It's clearly enunciated, the print is very big and readable for old fogies like me, and he goes slow enough to follow and describes and shows what he is doing so noobs like me can follow. Also, don't forget the transcription button gives details about every minute and a half. Very well done; anybody who is patient will like this. Thank you Rob Mulla
You got the like just for including the last part, 14:15 and after.
The whole video is decently good.
Keep up the good work; the last part of the info is really what people should get into their heads.
BRAVO!!!
Thank you for saying it.
This is awesome! Definitely going down this rabbit hole
The best explanation so far. I have experience using GPT4All, self-hosted Whisper, Wav2Lip, and Stable Diffusion, and also tried a few others that I failed to run successfully. The AI community is growing so fast and is fascinating. I'm using an RTX 3060 12GB and the speed is okay for the chatbot use case, but for real-time AI game engine character responses it is slow. I recently got my hands on an RTX 3080 10GB, and in this video I see you are using an RTX 3080 Ti, which has 10240 CUDA cores vs mine with 8960. It is the first time I've seen that you can use additional cards, in your case a GTX 1080 vs my GTX 1060, to run the LLM. Very informative video!
You should refit the model using LoRA to get a smaller size, narrower for in-game usage; that way it's more optimized.
Would AMD cards work or is it a headache?
@@CollosalTrollge currently they are using CUDA, which requires NVIDIA cards.
@@hottincup good suggestion, i will try.
If you’re trying to shoe-horn a full fledged LLM to power some NPCs then you’re doing it wrong.
All that you need is basic chat personalities and the ability to train based on in-game events, this requires very little processing power!
I would also like to see another video from you about setting it all up in a cloud environment. Thanks for sharing your knowledge.
Hello Rob,
I liked your video very much. I wanted to suggest that you consider making a video on how a translator or voice-to-text transformation can become a tool for everyone based on an open language model. It would be an interesting topic to explore and could benefit many people. Thank you for your great content!
There are plugins for this. Find an open source one and use that.
@@pirateben what's it called Ben? Linkey pls.. :)
Look for "Faster WhisperAI", maybe it could help you in creating transcriptions and translations from audio-to-text, I've had great success in using it to transcribe youtube videos and create subtitles for them.
After all the searches I've done on YouTube, your channel came up today. I'm really impressed. Great job!
Could you do a video on the “fine tuning” you talk about near the end? I like the privacy attribute of running open source locally and the fine tuning would be really beneficial.
I love the content, even though hardly anyone knows about this. Very, very useful content. We are hoping for a cloud version demo also. Thank you
This is exactly what I was looking for to develop a model for internal use at my company. Thank you!
any specific use-case?
@@tacom6 I write inspection and test plans, inspection test reports, and standard operating procedures for industrial engineering and industrial construction. For instance, if we need to write a procedure and checks for the correct way to flush HDPE or terminate an electrical panel, etc. Currently I can paste SOPs or our checklists and ask if anything was missed, ask about new ideas to add, or about new things entirely. It's great for asking about ASTM codes instead of looking them up or buying the PDF. I'm using Claude currently and Perplexity. My company does everything internally; it doesn't want the data hosted with another company. I'd like to make something for us to use internally. I believe using AI language models has sped up our procedure drafting and checksheet drafting about 40% so far. It's been game changing. But I'm using third-party sites and I have to scrub out client details, names, etc. If I had an in-house model I could let it retain all client or confidential data, and others could run requests against it also. I have a bot running that I've made through Poe using Claude as the base, but I can't make it public for colleagues to use.
@@BryanEnsign sounds fantastic. my interest is similar but with focus on cybersecurity. thanks for sharing!
@@tacom6 That's awesome. So many possibilities. Luck to you brother!
Amazing! Thanks for the detailed guide. Will definitely be using this for future projects!
Dude, you are not even being biased THIS IS THE BEST INVENTION EVER!!!
Open source??? AND it runs locally???? even without the file access feature this would've been the coolest piece of software I've ever encountered!
This was a really transformative experience and I really appreciate that you did this video!
Fantastic tutorial and superb framework! Congratulations for you and the H2O team! 🔥🔥🔥
You should explain up front that it's an ad.
Thank you so much. This is a clear guide for us to begin experimenting with our vision for a new application, and the last 4 minutes is a great executive summary for me to show to my management.
Great video! I'm a fan of H2O. Really impressed with the driverless performance. Helps me benchmark my own code! Gonna try this out later this week, thanks Rob!
Very interesting, I stood it up on a VPS with 10 CPUs, it's painfully slow but it works!
Mighty nice video, very useful! Giving local context is interesting. The question about roller coasters was a clever way to demo the feature. Thanks! 😊
Fascinating stuff. So important to figure out ways of using these tools in a way that allows us to retain some privacy. Subscribed.
This is really cool. I just installed it and tried it. actually runs pretty fast on my CPU
How did you get it to work with your CPU? I keep getting token limitations on the answers. Did you follow the documentation?
Awesome content! I noticed the audio dropouts from time to time. I had a similar issue this week when recording some videos myself, and the culprit for me was the Nvidia Noise Removal filter in OBS. I changed it back to RNNoise and it worked like a charm. Don't know if yours is related, but if it helps, then happy days! Cheers!
Great work. I was looking for a tutorial like this for a long time.
THIS IS A GAME CHANGER!! FOSS FOR THE WIN!
Thanks for the video. I really appreciate and support the open source community!
Very informative video on how to create your own private chat bot and have it learn from your context. Genius! I look forward to further development.
Wow an OpenSource GPT model. This is freaking awesome. I am working on building some AI products this is a life saver. I am excited to play with this big time. Throwing in some Vector Memory Databases to add context on top of this and I can get my first AI product out real soon. I can easily build some text to speech and Computer Vision models of my own on tensorflow to get something big to happen. Man Christmas has really come early for me this year.
It does have many limitations such as requiring fairly beefy hardware (or get really restrictive questions) and a ton of storage space.
@@Runefrag do you think nvidia rtx 4090 can handle this? When you say beefy hardware what do you have in mind?
Rob, the video is awesome! Great content as usual 🤩
Would love to watch a version utilizing a spun-up instance from a cloud provider too (for those of us without a GPU 😊)
Thanks for watching! Will def look into a video with steps to set up on the cloud.
@@robmulla definitely interested in the cloud provider video too
Thanks for all your efforts teaching brotha
These language models are getting improved so fast, by the time you have it installed and working there's 3 better ones
Excellent content, especially the LangChain part, thank you!
this is the first video ive noticed that highlighted the actual subscribe button. it looked really clean.
Yes, as requested, I am letting you know that I am interested in any of the potential future videos you mention in this video! You are giving gold away for free!
One suggestion here, this would be more popular for everyone if there was an installer like GPT4All has, as those who have no command-line experience can still use it.
Thank you very much, finally some reasonably good documentation. The topic of CUDA is also somehow a real pain. I wanted to set the whole thing up in Docker to keep the system more scalable, but that was just a pain too.
Can’t imagine what sort of Ai we could build if we had all them ethereum miners ^^
I am more interested in how this can turn into features for 3D model making, music making, 3D/2D game making or even software programming. Some tests of what it can generate and other stuff would be nice.
Finally a video that gives me a "hello world" for an attainable local GPT-like chat bot. Now I can actually LEARN how to fine tune a model.
Really enjoy the video
I decided to try this out, and I don't feel like the document feature really works? I uploaded a few smaller markdown files, and I wanted a summary of everything that was discussed in those documents - instead, it picks two of those documents, and ignores everything else. It's not clear to me how this was implemented, or even how this would work? Best guess, it creates a *separate* index (using BERT or something similar) and queries that for relevant files - then includes the contents of those files in the actual query? Or however much it can, based on the max context length and the length of the query? Even after explicitly selecting all my documents, it only picks two of them. What I was hoping for was some kind of deep integration with the LLM itself - but I don't suppose that's really possible at this time? While this feature is probably useful for some things, it doesn't really help solve my problem, which is trying to distill a lot of long conversations into conclusions. I'm still waiting for an LLM that can actually handle larger context. It sounds like they're on the horizon with something like LongLlama? But it doesn't look like they're here yet? In the meantime, this is better than nothing, I suppose. But the real killer feature would be very large context, enabling it to read and ingest larger volumes of content, and then make queries. Maybe I'm too impatient. 😅
Just checked the thing out, as soon as it was mentioned (luckily, this video was suggested to me by YouTube).
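The guess in the comment above (a separate index, a relevance query, then packing the best matches into the prompt) can be sketched in a few lines. This is a toy illustration only, not h2oGPT's actual implementation: it swaps in plain word-count cosine similarity where a real system would use embeddings, and the file names, `top_k`, and `max_chars` are made-up assumptions. It does show why a fixed top-k cutoff would surface only two documents no matter how many were uploaded.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(question, docs, top_k=2, max_chars=500):
    """Rank docs against the question, keep the top_k, and pack them into a prompt."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda name: cosine(q_vec, Counter(tokenize(docs[name]))), reverse=True)
    picked = ranked[:top_k]  # everything below the cutoff never reaches the model

    pieces, used = [], 0
    for name in picked:
        room = max_chars - used
        if room <= 0:
            break
        piece = docs[name][:room]  # truncate to whatever context budget is left
        pieces.append(piece)
        used += len(piece)
    context = "\n".join(pieces)
    return picked, f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents, for illustration only.
docs = {
    "rides.md": "roller coasters are fast rides",
    "budget.md": "meeting notes about the quarterly budget",
    "park.md": "the tallest roller coasters in the park",
    "soup.md": "a recipe for tomato soup",
}
picked, prompt = build_prompt("which roller coasters are tallest", docs)
print(picked)
print(prompt)
```

With this design a "summarize everything" question retrieves poorly, because no single chunk is especially similar to the question. Raising `top_k` helps only until the context budget runs out, which is the limit the comment runs into.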
Being a tech enthusiast and a translator, I ended up spending an hour discussing the technical dimensions of my profession. Loved this!
I will definitely look into this in more detail later. The thing is also pleasantly polite. 😊
You got me to subscribe!!! Wow, thanks for explaining step by step. Keep it up with all of your content.
Thank you! Nice. What is the difference from GPT4All? Where are the strengths and weaknesses? Can h2oGPT understand associations? Or can it be trained accordingly?
Will there be more videos with how-to’s for basics, using your own files, training and corrections?
Awesome knowledgeable video this is so useful. Keep making videos like this
This is extremely helpful, Thank you for sharing this.
Thanks!
Thanks a ton! Glad you liked it.
Great video, your voice and pace is perfect. Thank you
On the why you would want this question: I actually think 1 of the most compelling answers is "because it works without the internet". Most of the interesting potential applications I can think of for an LLM are not copacetic with mandated internet access... eg. using it for dynamic NPC dialogue in a video game, people don't like always-on connections for very good reasons.
11:00 I have a question: do the model type and its size determine how accurate the information will be? And do we need to update it?
Great, I have learned so much about how to use open source AI and AI modules. I am glad to build the project myself on my local computer. The PDF reading ability is so good, I will try it!
Thank you!! I’ve been stumped on building a model to generate an entire script for a Seinfeld episode based on my personal scores of each episode and I think this video just unlocked it for me!!
You are now master of your domain!
You explained everything very clearly, Thanks
Thanks for the really informative and well-paced video. The video seems to gloss over a page showing the timeline vs linkage/evolution of models at about 5:36. What is the source of that? Would be nice to have a link to it. Definitely interested in the video on how to use a cloud GPU to run the larger models. In fact (but not surprisingly, as a newbie to generative AI and AI in general), I was under the impression that you didn't need a GPU for producing generative AI content after the foundational model or specialized model was ready. Would be nice if you could cover what could be done on systems without a GPU (or say an iGPU only) and those that need H100/A100 type GPUs (and how many).
You need a huge GPU to train the model and a big GPU to run it. You can train a smaller model from a bigger model, like Alpaca, but it is not as smart. The GPU has to host the entire brain of the AI so the smaller the worse.
Man! Having your own version of GPT will definitely help with personal research or study if you provide it with accurate resources. We already know that online ChatGPT often gives inaccurate results. But by giving this offline GPT accurate resources, like at the end of this video, maybe this problem can be tackled.
Ran through this, then when trying to generate I get that the module fire does not exist. I'm in the Conda environment we made. Not sure what I missed.
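For anyone hitting the same error, here is a minimal sketch of how to check it. It assumes the missing module is the `fire` package from PyPI and that the error comes from running in an interpreter where it was never installed; `missing_modules` is a hypothetical helper name, not part of the project.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python is actually running; it should live inside the Conda env.
    print("Interpreter:", sys.executable)
    missing = missing_modules(["fire"])  # assumption: "fire" is the PyPI package
    if missing:
        print("Missing:", ", ".join(missing))
        print("Try:", sys.executable, "-m pip install", " ".join(missing))
    else:
        print("All modules found.")
```

If the interpreter path printed is not inside the Conda environment, the environment is probably not activated in that terminal.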
Rob thanks a lot for this video. Please make a video on how to get the gpu in the cloud
Thanks for asking. Others have been asking for this too so I am planning to work on it.
I will be trying this, can't wait!!
Amazing video.
Thank you so much.
I would also love to see a video about the setup in a cloud provider.
Great work, thanks for the video 🎉
Nice work buddy. Keep it up
Great content, I enjoyed your video.
This is a great breakthrough, but can you please show us how to do it on Windows? It would be nice if there was just an msi like normal software. Having a convoluted 25 step command line based install process makes it really hard to ship software that uses these models locally. It just makes this stuff really inaccessible and unappealing for Windows developers.
Same here. I gave up installing with that Windows installer. Now trying with Docker.
Excellent stuff, thank you. I just followed your instructions, plus the README in the Git repo, and spun up an instance on a cloud VM, as my notebook has no GPU. It is fun... and I hope you can further teach us how to do the LangChain things when we have a lot of documents that we want to feed in.
Thank you once again.
PLEASE show us how to install this on a Cloud provider
Thanks for the feedback. Will do! What cloud provider do you prefer? AWS? GCP?
@@robmulla GCP please 😘
@@robmulla would you mind also doing AWS afterwards? Also, possibly a really dumb question: is it possible to run this at all w/o GPU/CUDA?
@@KimchiOishi1 h2oGPT does support some models that run on CPU. You just need to follow the instructions for GPT4All or llama.cpp. They are much less powerful and slower than the Falcon model running on GPU, but they still work.
@@robmulla put a railway template already
This is cool and brilliant. Great tutorial
Great video! Thank you! I do agree that it's better to have your own local model running open source software if your machine can run it. What GPU do you need??!!! lol The biggest issue I have with ChatGPT, open source or otherwise, is incorrect responses. That makes it next to worthless because you can't trust the responses 100% of the time. Can it also respond incorrectly if you train it with your own data?? And how much of your own data do you need to train it? So if I try to train it on all the PDFs on Raman Microscopy, what's the percent likelihood that a response is going to be incorrect? Thanks in advance. Cheers!
Very helpful video. Thank you!
Nice! It's working.. I'm excited
Brilliant. Thank you for sharing. I am now looking for a faster machine to reproduce what you have demonstrated 🙂
Hi Rob, great video. Could you make one on how to set it up using Azure and also how to use it for training a custom dataset (I would be especially interested in your recommendations for training the Falcon 40B)
You mentioned your machine needed a bigger GPU for a bigger model. I think it would be great to mention what GPU you are using so we can have some sort of reference.
Good point. It's a 3080 Ti.
Crickin SCARY! I love it! Hail our new overlords!
What GPU is big enough to fit the 40B model? Is there a commercial GPU capable of such? What's the highest model that I can most likely get with say a $1000-2000 budget GPU? Thanks. And great content!
Just scaling from the 7B model at 9 GiB when the weights are truncated to 8 bits, so at least 60 GB. Maybe 30 if you cut them to 4 bits, but at least 2-3 GPUs with 24 GB of memory.
Do they have the same thing with an INSTALL button instead of over 9000 Python commands?
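The scaling in the reply above can be sketched as a back-of-the-envelope calculation. This assumes memory grows linearly with parameter count and bits per weight, takes the 7B / 9 GiB / 8-bit figures from the comment as the reference point, and ignores activations and other runtime overhead; both function names are hypothetical.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Memory needed for the raw weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def scale_from_reference(ref_gib: float, ref_params: float, target_params: float,
                         ref_bits: int = 8, target_bits: int = 8) -> float:
    """Linearly scale an observed footprint to another model size and bit width."""
    return ref_gib * (target_params / ref_params) * (target_bits / ref_bits)


if __name__ == "__main__":
    for bits in (8, 4):
        raw = weight_memory_gib(40e9, bits)
        scaled = scale_from_reference(9.0, 7e9, 40e9, target_bits=bits)
        print(f"40B at {bits}-bit: raw weights {raw:.1f} GiB, scaled estimate {scaled:.1f} GiB")
```

Under these assumptions the scaled estimate comes out to roughly 51 GiB at 8 bits and 26 GiB at 4 bits, a little under the 60 and 30 figures quoted, which leave some headroom.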
Thank you very much for providing the beautiful resource..
I agree with you on the LLMs… eventually you will probably even have certifications… like AI solutions engineer… different flavors of LLMs just like the different flavors of Linux… Every small to medium company will want their own private AI setup when they see the benefits.
Yes Rob, would love to see your implementation in the cloud space for these models.
Absolutely brilliant. Thank you.
That was a good presentation.
Thanks!
Very jumpy video and a little difficult to follow, *it's difficult because of all the zooming and cutting and clearing of the terminal and just going everywhere lol*, especially medicated ; ), but in the end i'm up to the point of downloading the model and about to run it for the first time. At the end you speak of fine tuning for a particular use case. I would watch a video about that and i saw a few comments saying the same thing. Thanks for the video and have a great day!
Man, you're just awesome ❤
Hello Rob, do you have any suggestions on what kind of local machine can run an LLM (CPU, RAM, GPU)?
WONDERFUL!
Thank you.
I like this! U just earned yourself a new sub!! Would be interesting to know how to fine tune or train the installed LLMs
When I try to use it, it asks about 10 clarifying questions when GPT 3.5 asks none and gets the job done. I like this, but so far it's a time suck if you are trying to be productive.
Sounds interesting. Installation for Linux assumes a .deb apt package manager. It also seems to have a dependency on X11. If it ever comes out as an AppImage self-executable Linux package, I may come back to explore it further. I also could not install the older torch 2.1.2. The current version is 2.2.0 and I could not find an archive with an older version.
Does the Private GPT model use only the data available on your server, or does it get information from the web too? If yes, how is the information saved?
13:40 Instead of downloading a PDF (or 100), could you give it a link to the article? Or to a web crawler?
Very clear explanation of the program. Great video. I wonder how people create these open source programs and still can put food on the table. They must have day jobs.
It has potential. I hate all the subscription models coming at us all the time. Hopefully more of this will come about.
Extremely excellent explanations