Feed Your OWN Documents to a Local Large Language Model!
- Published: 7 Feb 2025
- Dave explains how retraining, RAG (retrieval augmented generation) and context documents serve to expand the functionality of existing models, both local and online. For my book on the autism spectrum, check out: amzn.to/3zBinWM
Dave's Attic - Friday 4PM Podcast - @davepl
Follow me for updates!
Twitter: @davepl1968
Facebook: davepl
No nonsense, 100% information, 0 fluff videos. Thank you so much for sharing your knowledge.
also he has a very symmetrical face
so true
For years I’ve stored documentation relevant to my domain, more than I can read in ten lifetimes, cherry-picking from it to build conferences and workshops. For some time I knew that a local model working on your own data set could be useful. But today, seeing you explain it in your video, it became obvious that I should make that move! Thank you for your help. I subscribed to watch more about the local setup of a Llama model.
I'm with you! I have over 37 years of reports that I've generated and all the backup documents. I would love nothing more than to create my own local AI that can generate a report just like I would, based on the data I input from a brand-new data set, using language that I've mastered over the years. Could be a big time-saving measure!
@genejoanen Won't you need a lot of money for doing this?
but don't use ChatGPT for that; you are uploading all your knowledge to a proprietary company.
Dave: you can "bind mount" the documents directory into the container filesystem. Bind mount is Docker-speak for "reflect a piece of the host filesystem into the container filesystem."
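For anyone who wants to try that, a minimal sketch of the idea (the host path is a placeholder, and the in-container docs path may differ between Open WebUI versions):

```sh
# Run Open WebUI with the host's docs folder bind-mounted into the
# container, so new files appear without rebuilding the image.
# Host path and container name are illustrative.
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  -v /home/me/docs:/app/backend/data/docs:ro \
  ghcr.io/open-webui/open-webui:main
```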
That's what I was thinking too
This should definitely be a pinned comment.
I'm pretty sure that on macOS and Windows this has a performance overhead. On Windows you can mitigate it by being in WSL2, but he's on macOS, and I think the bind mounts are not as performant as on Linux, which can be annoying.
@robinkuster1127 No different than the volume bind mount the container runs in. Just poor performance all around compared to Linux.
Came here to say this
I appreciate the practical hands-on approach to your AI videos. So many AI presentations are 90% hype (or more).
Agreed! And those that aren't hype require three years of careful study in the field before you can understand them.
This video is perfect for just showing me how to accomplish what I want to get done.
I went to bed last night thinking about what other ways we can train the models, other than feeding the documents at the prompt - and woke up in the morning with this video at the top of my suggested videos on YouTube!
My radio in my car tells me what I'm thinking all the time! It's been listening to me for a long time
I guess it knows me
@B33t_R007 Depends on the model you're using and the level of quantization. In truth, when buying a PC for AI, GPU VRAM capacity is more important than CPU RAM.
Go for a PC with 64 GB of VRAM and you could run Llama 90B quantized to around 4 bits; at half precision (bfloat16), a 90B model needs roughly 180 GB for the weights alone.
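For anyone doing this math at home, a back-of-envelope sketch (my own rule of thumb, not from the video - the 20% overhead for KV cache and activations is an assumption):

```python
def vram_estimate_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM to hold the weights, padded ~20% for KV cache
    and activations. A ballpark, not an exact requirement."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

print(vram_estimate_gb(90, 16))  # ~216 GB: bfloat16 won't fit in 64 GB
print(vram_estimate_gb(90, 4))   # ~54 GB: 4-bit quantization just fits
```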
Wow, Google really knows you.
Now they're reading our thoughts
hi Krishna, I am already using RAG but I am wondering the same thing, i.e., how can I train models on my own data? The video says that it's very hard to do, but did you find any way?
I've started playing around with a local LLM based on your other video. This was very helpful. Thanks!
For users who can't find the document tab in Workspace in V0.3.12, it is now called Knowledge and you can just upload your documents here even while in Docker. In Workspace, create your Model as per normal but choose the Knowledge Base that you've created and uploaded files into.
Everything else works as normal.
Using a smaller model for RAG is an interesting idea, so that it has less of its own content and relies more on the documents. I always learn something from you, thanks!
Thank you so much for taking the time to create this video. It answered so many questions for me. Completely "demystified" much of the confusion around what path is best for me personally for adding data to an LLM. I'm so glad you came up in my YouTube feed. Thanks again.
Dave, this is great! I walked through everything this weekend, but my web-ui looked different. I didn't have a scan button under documents, but under workspace I had a "knowledge" tab where I would upload documents to my custom model. I think they must have just changed this functionality to make it easier to upload documents without copying to the backend because I could upload the documents from the web-ui and it automatically put them in an "upload" directory and used the document as a reference the same as your video. Overall, I think the change is a huge improvement. Thanks for putting this together, because I was able to use it to host my own AI at home using ollama and web-ui and build custom models using my own reference document. Keep up the great work!
Software updates and changes occur often due to the rapid development of AI tools, so every step in this tutorial could be obsolete within days, weeks, or months...
It is insane how fast AI is moving. Since you made the video, Open WebUI has released an update for knowledge and document management, making it easier (you can now upload directly from the web interface)!!
Thanks for making these videos, I love how clear and concise they are, as well as just entertaining!
Great channel, really enjoy your talks. About 20 years ago I worked for a company that had full source code to Windows NT, as it was called then. I remember coming in on weekends just sifting through the code trying to understand how the operating system worked. Very complicated and very impressive. I remember one time Dave Cutler came to our company, sat down with all the engineers, and gave a talk. What a brilliant man! I have been retired now for about 10 years and am currently trying to understand large language models, so your video was very appropriate. I often wondered how RAG worked.
Dave, you've been reading my mind lately on topics I want to know more about. Thanks for making this!
I asked ChatGPT whether creating a custom GPT actually places documents into the context window or if it access them via RAG. Here's what it said: When you create a custom GPT and upload documents, those documents aren't loaded directly into my context window or memory. Instead, they're used with a retrieval-augmented generation (RAG) approach. This means that when you ask a question, the system searches through the uploaded documents to retrieve relevant information and incorporates those specific parts into the response dynamically.
So, I only "see" what’s relevant to your query at the time, rather than having the entire document always loaded in memory. This keeps responses focused and ensures data privacy by only referencing what’s necessary for the question.
I have the same question. 16:19 The "It will be slower" part is what I am wondering about. If RAG can search and bring in the most relevant pieces from the whole set of docs, we should always include all the docs, and it should take approximately the same amount of time.
I think he meant it will be slower to demo/film
@leoxiao2751 In practice, yeah, you'd dump in the whole folder of relevant docs.
How can I feed my uni notes to an LLM?
@notaras1985 The video literally tells you how. You just need to make sure your notes are typed as text on a keyboard or scanned with OCR and saved as PDFs. I don't think the LLMs can easily "read" images.
Yeah, I'm pretty sure this clarification is needed within the original video. I believe what you said is more accurate regarding how ChatGPT treats things, and it would certainly explain why I've gotten strange results. I think of the context window used in chat as immediately available short-term memory, and the uploaded documents as a long-term memory library which gets fetched as needed. I believe ChatGPT is more like a RAG tool, while other tools like NotebookLM by Google are the context-window type the video originally attributes to ChatGPT. *Correction: As I watch more, I realize they uploaded documents in the video to chat directly, which would make that usage a context document rather than RAG. But I still stand by what I say regarding the custom GPT side of things.* NotebookLM seems to incorporate documents in the context window directly, hence why it errors out when the total tokens of all documents exceed roughly 2 million as of now (due to the 2 million context window in its current model vs ChatGPT's 128k). ChatGPT can go beyond that not because of context window capacity, but because uploaded documents are more of a long-term memory library, not immediately available unless fetched. It's more like unstudied look-up material, whereas NotebookLM is studied look-up material, if you will.
Tokens (as in context window tokens) are only involved in the "short term" memory and information can be fetched in there as needed. I find most of the training process on custom GPTs actually involves optimally indexing these "long term knowledge library" documents with good labels and having good custom instructions so it accurately fetches the right parts without fail. If general models are fed inaccurate information from the internet in initial training, ChatGPT still pulls from the general inaccurate model despite being provided the information and instructions. NotebookLM, which is grounded in provided materials and everything is in the context window directly, answers far more accurately even without custom instructions! Obviously, that depends on whether custom instructions are needed to correctly interpret information provided. I believe that is due to everything being in immediate context vs the ChatGPT fetching mechanism.
I think it's more accurate, as you are implying, that ChatGPT is effectively a RAG rather than the context document due to the short vs long term memory concepts in it. NotebookLM would likely be the context document type because everything is in the context window itself and not some "knowledge library" getting fetched as needed. There is no "long term library" which gets spontaneously retrieved in NotebookLM. Yeah, I got tripped up in what Dave said about ChatGPT being a context window type, so thanks for pointing this out.
This is by far the best video you've ever done... Thanks so much!!!
I was looking for this video - WOW, thanks Dave!! Finally the YouTube suggestions actually work - I watch you all the time - subscribed, yet busy... missed this video!
There are several inaccuracies here. Options two and three are actually the same, because content is always added to the context window in real time. You can upload very large files to ChatGPT which would be inefficient - and expensive - to load into the context window at once. So in your option two, where you upload a doc to ChatGPT, it does a sort of mini-RAG: it converts and chunks the content and searches for relevant chunks that fit your query. Your example of the custom GPT is in fact full RAG with multiple documents. In the backend, OpenAI uses Azure AI Search, a vector database where your content is persisted. The documents you upload are chunked and vectorized; when you ask a question, relevant chunks are added to the context window to answer your query. Open WebUI uses a similar approach.
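A toy sketch of that chunk/vectorize/retrieve flow (bag-of-words cosine similarity stands in for a real embedding model and vector database - this is not how OpenAI or Open WebUI actually embed text, just the shape of the pipeline):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Real RAG uses a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 50) -> list[str]:
    # Split a document into fixed-size word chunks.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Score every chunk against the query; the top-k chunks are what
    # gets pasted into the LLM's context window to answer the question.
    chunks = [c for d in docs for c in chunk(d)]
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```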
Thanks buddy, I was thinking the same
Ah, I just asked something about this a few minutes ago. I was indeed confused because the two felt the same... both seem persistent and "RAG-ish".
Thanks Dave! RAG now makes a lot more sense to me. This sounds like a way that would actually make AI LLMs useful to me.
RAG is a godsend. And it’s necessary as new knowledge itself is not used for LLM training, and forget about your/your company’s specific knowledge
Local Deployment: First, you'd need to deploy a local instance of an LLM. This could involve downloading pre-trained models like those available from Hugging Face, Google's BERT, or others, and setting up the necessary hardware (powerful enough GPUs or TPUs for running these models efficiently).
Customization:
Fine-Tuning/Retraining: If you have specific data you want the model to know or focus on, you can fine-tune the model. This doesn't mean retraining from scratch but rather adjusting the model's parameters based on your dataset, which could be documents, texts, or any other form of data relevant to your needs. This is particularly useful for creating domain-specific applications, like legal advice, medical diagnosis, customer service in your industry, etc.
RAG (Retrieval Augmented Generation): Set up a system where your LLM can pull in additional information from a local database or document set in real-time when generating responses. This method requires you to have a retrieval system in place, like an efficient database search or vector space model for semantic search, which would find and feed relevant documents or data pieces as additional context for the LLM's responses.
Contextual Inputs: Directly inputting documents or data as context for each query (see the sketch after this list). This doesn't change the model's knowledge base permanently but allows it to generate responses based on the specific information provided for each interaction.
Tools and Libraries: Use frameworks like PyTorch, TensorFlow, or libraries specifically designed for LLMs like Transformers from Hugging Face, which provide tools for model fine-tuning, data preparation, and deployment.
Privacy and Control: Running AI models locally gives you complete control over data privacy, as there's no need to send sensitive information over the internet. This is particularly appealing for sectors like healthcare, legal, or any enterprise needing strict data governance.
Custom User Interface: Develop or adapt a user interface where users can interact with your specialized AI tool, whether it's through a web app, desktop application, or even a chatbot for internal business tools.
Integration: Your specialized AI could be integrated into existing systems or workflows, enhancing functionalities like document analysis, customer support, research assistance, etc., with tailored responses.
The challenge lies in setting up the infrastructure (hardware and software), managing the computational load, and ensuring the model's performance aligns with your specific requirements. However, with the right resources and know-how, creating such a specialized AI tool locally is not only feasible but can offer significant advantages in terms of customization, privacy, and application-specific performance.
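To make the "Contextual Inputs" option above concrete, a minimal sketch against a local Ollama server (the document filename and question are hypothetical; assumes Ollama's default port and a pulled llama3.1 model):

```python
import requests

# Stuff the document into the prompt for a single query. Nothing is
# persisted and the model's weights are unchanged - pure context.
doc = open("pdp11_handbook.txt").read()  # hypothetical document

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": (f"Using only this document:\n{doc}\n\n"
                   "Question: What addressing modes does the PDP-11 support?"),
        "stream": False,
    },
)
print(resp.json()["response"])
```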
What is the point of this? I see people like you commenting random generated overly wordy and unhelpful summaries of either the transcript of the video or literally just a simple question. Do you think that helps anybody? Is it just for interaction so you feel good about yourself?
@joeshmoe4207 If you do not get it... move on... nothing to see here.
Absolutely awesome content you are pushing out on this Dave, thank you so much!
I'm a little worker bee at Apple with zero programming skills, and I'm using what I'm picking up from you to try to make our department a custom AI for some things.
Thank you!
LOL, white text on blue background - watching the 70B model generate was like a total flashback to the bulletin board days on dial-up/early ISPs.
This comment reminds me I am old. BBS and the modem squeal are something that haunts me still... you've got mail.
How old are you? Never mind, I'm that age also.
A fully grown adult coworker looked at me and asked what it was like growing up in the 1900s, I nearly had a heart attack when I realized that that's how university grads refer to elders.
Thanks for the friendly introduction to this topic, Dave. Quite keen to try this stuff out now.
Outstanding! Finally, something useful on this topic... referring your channel to everyone I know! Love the approach you take to sharing information.
Dave... greetings from Uruguay! I must say... you're the man! Thanks for this one!!!
Again, wow! Your videos are incredibly helpful.
Thank you!
Dave, to get rid of the dialog box, right-click on top of it and select Inspect, select the HTML element containing the dialog box (the element should be highlighted), then right-click on it and select Delete Element. This way you can continue your screen-capture recording without the irritating box displayed.
@Dave's Garage I'm playing with this myself now. I'm having success adding docs via the Knowledge section for the workspace; then, when making a new model, it can reference that doc the same as when you scanned the folder. I did that because I am running the UI in a Docker container. I know how to find and populate the volume mount, but I wasn't seeing a button to scan the directory - the Knowledge section is another way to add the docs! Great video!
Love this - I'm hoping to set up my own in the next year!
I'll need to watch this back and take notes! Cheers!
I also loved the part when your blinds momentarily lost connection to wifi at 11:12.
Thank you... I will be visiting your shop once every day and keep learning.
Love your clear explanation, and I might not have heard you correctly on why you wanted to run Ollama locally rather than in a container... but you can just mount a volume for the docs rather than run locally. That would enable maximum flexibility.
An excellent episode Dave. Really well done. The explanation of training, augmentation and session based approaches is very useful not just for this episode but for others as well. Thanks.
Brilliant... and perfect timing showing up in my feed, thank you.
You speak like a news anchor and it’s awesome
Hey Dave, Just came across this video and oh boy! I'm in the middle of researching all of this and you just straight up gave all the answers. Absolutely Love It!
One improvement at 14:15 is that you can mount the docs folder as a volume into the docker container at that path. That way you don't need to use the COPY command in the docker file and bake it into the docker image.
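If you prefer Compose, a sketch of that same idea (image tag and paths are illustrative, and the in-container docs path may vary by Open WebUI version):

```yaml
# docker-compose.yml: bind-mount ./docs instead of COPYing into the image
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
      - ./docs:/app/backend/data/docs:ro
volumes:
  open-webui:
```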
Thank you for all that you do. 🖖
Hey Dave, your vids are value packed - so much good info in under 20 minutes. I'd love to see a similar vid on AI image analysis - perhaps for use with security cameras, or with 3D printers to detect foul-ups etc.
This is awesome, what a time to be alive! Much appreciated, thank you Dave!
wow, I just found your channel and you're great at explaining!
quick, simple, and straight to the points!
Totally subbed.
You can also map a local directory into a docker container at a specified path with the --volume flag.
So there are no limits to the size that can be uploaded? Drives are usually tens or hundreds of GB.
Hello Terry, could you help me with something? I work with attorneys and judges in a court of military justice in Brazil. We have a folder that contains resolutions and ongoing processes. I would like to create a knowledge collection in Web UI that keeps receiving the newly updated files they send to that same folder. I'm an intern in software development, and I have set up a server with Ollama and Web UI that responds to us at a URL.
Wow! Ever since I found out how we could download these LLMs and run them locally, I've always been wondering how we can 'train' them to know some very specific information about our focus area. Thanks for answering this, Dave!
The tooling available today to make this so easy to do on your desktop computer is just amazing. The tech here is very dense and can be challenging to do without the tools available today.
This is where I wanted AI to be two years ago. Great video.
Once Sales directors, Project Managers and Product Owners realise that their company needs reliable and reviewed documentation to leverage LLMs, the nightmare of technical documentation will begin. I'm pro manuals and pristine technical documentation, but the majority of engineering teams NEVER document knowledge: I guess Knowledge Management will have a resurgence.
AI can auto-generate the doc from the code.
They don't want to be replaced 😂
2001: Hello Dave, I can't do that .....
I'm pretty excited to replace our home assistant's voice with HAL9000's
😂
Cringe
Dave, I asked for this exact setup you're talking about. Don't know if you saw my comment, but either way I REALLY appreciate this video. It will help me create my own custom base of information for computer parts for gaming, office, server, etc. SO GRACIOUS!!!
Dave, I paused the video after 11min - to thank you for producing it. And the PDP11... in the 80s we developed a graph-based logistics system running on it.
30 seconds into the video = LIKE - what an intro Dave. Love it!
this is great! but also a bit hilarious. It sounds like Dave is reading text written by GPT 😂. It is just so good and well structured. He may have the linguistic chops to write that well, most people I know don't. And whatever way this level of quality is attained, I applaud and appreciate it very much. I just learned a ton, thank you.
Thanks for this very informative video. I like this practical hands-on approach.
This is good - the best explanation of the difference between the two that I've had. The best piece of advice is to run RAG on a small model. Here I am trying to do it on an 8B when I should be using a 1B.
Thank you for sharing your experience and knowledge. You should build a Dave model so we can talk to you as we go through our own computing adventures. I'm learning so much, just for fun, on my home hobby network. Love this channel. Thanks again.
great stuff Dave, lots of things to try. Thanks for the direction.
Re: RAG - This is the first time anyone has shown me an actual USEFUL reason to use AI (aside from "Wow! It's really neat!"). Thanks Dave!
I have a Wix site with chat that I’d love to use my own LLM for. This is a terrific start to figuring all that out. Thanks Dave!
Great demo. It shows that you had the knowledge to know when it gave a subpar answer. People going in without that knowledge won't have that ability, though. It again shows that these are assistants that may need help, not the replacements some are pitching LLMs as.
Sir, you are the teacher all kids deserve, to have better chances in their future... thank you for your work.
Oh goodness, a PDP-11. Memories of days gone by. Thanks 🙂
It's a PiDP-11, a PDP-11 replica using a Raspberry Pi. You can build one yourself from a kit.
Thanks! The RAG setup in Open Web UI is exactly what I was looking for!
New subscriber and former DEC PDP-11 software engineer!
Since Dave uploaded an 8MB document around the 8-minute mark, I asked GPT the following:
what happens if an uploaded PDF has more words than your input token limit?
ChatGPT said:
If an uploaded PDF exceeds my 128K token limit, I won’t be able to process the entire document at once. However, I can still help by:
Processing the document in sections - I can analyze and summarize parts of the PDF separately.
Extracting key sections - You can specify which sections (e.g., chapters, pages) are most important.
Guiding you on splitting the file - You can divide the PDF into smaller parts and upload them one by one.
Providing a compressed summary.
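A sketch of that splitting step (the one-token-per-0.75-words ratio is a rough rule of thumb; a real tokenizer such as tiktoken would be more accurate):

```python
def split_for_context(text: str, max_tokens: int = 100_000,
                      words_per_token: float = 0.75) -> list[str]:
    """Split a long document into pieces that each fit within a
    model's context window, using a crude words-per-token estimate."""
    max_words = int(max_tokens * words_per_token)
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```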
Am I the only one who is impressed by Dave’s blazing typing speed? 😂 7:57
All the required info, watched without taking a breath. I have seen much worse videos on e-learning platforms. Good job. Thank you.
How useful and helpful! RAG is just the thing I need. Thanks!
Perfect explanation, thanks a lot! I did not know that open-webUI can do RAG, again thanks!
Fantastic video. First decent explanation of the often-referenced RAG I've found. Will go try it on my data now!
Thank you for sharing your massive knowledge! I really appreciate this. Liked this video for the super informative content. Keep going! Love your videos.
You could use Docker and mount a physical drive/folder on the host as the doc folder, or a subfolder of it (you'd probably need to adjust the Dockerfile setup). That way you won't need to copy, as long as you have access to whatever runs the Docker container.
WOW, another awesome video, explaining, defining AND SHOWING AI things SO WELL! THANK YOU! Genius level work!
These options have kind of changed in the latest builds. You can now directly upload the knowledge from the UI frontend, so it's much easier to add your knowledge base. Also, when making the KB or the Model, the button to add is all the way on the right side, a tiny little + icon. You can miss it if you don't look closely.
Wow! PDP11??? I did my summer practice on a PDP8 + Donner Analogue in 1970. And met my first PDP11 some years later when I started to work at DEC. Nice video and nice memories 😊 Thanks!
Thank you for the clear instruction and still bringing back memories of PDP11's.
Fantastic vid Dave. This will help the individual level of employees start to incorporate AI into their lives. Very well done. Thank you.
In a Docker setup you don't need to copy the documents manually - you can simply use bind mounts, so you can easily share docs between host and container. However, in Docker you might have some issues utilizing the GPU (e.g., on a Mac).
Thanks for being so clear and detailed in your presentation. Even someone like me without formal tech training can understand and follow the steps easily. 🙌 Liked. Subscribed. Waiting for more!
The keyboard sounds is just awesome.
Your pre-recorded summary following the RAG demo was about your Custom GPT. Good vid anyway. Thanks. 18:20
How a legacy codebase could be used with RAG would be interesting. A base model that had already learned the codebase's language would be the obvious starting point. Thanks very much for ruling out retraining.
I downloaded Ollama and llama3.2 and asked it if it was running locally. No, it said, it could only run from a server, not locally. So then I pulled out my internet connection and continued to chat with it!
Hahah, nice!
Haha yes, even when you tell it that it runs local, it won’t believe you. Only after telling it how you did it, it will start to believe you 😂
This is really interesting. I want to take advantage of the RAG part for story-writing. English isn't my mother tongue, so I want it to assist me in writing a very long story, and the RAG approach will probably help it remember all of the details. I wish there were an option to choose which messages/instructions get incorporated without manually updating the documents.
Thanx Dave - this was very timely!
thank you for the brief intro to setting up a custom GPT, online and local. But the Open WebUI + Ollama combo as a Docker container is still a fit: you just need to map your host folder to the Open WebUI container path; there's no need to COPY anything INTO the container.
Thanks, I needed this. Open-WebUI is just what I was looking for.
Nice, really nice. I've listened to many AI stories. If you do your homework, Dave is THE best professor.
Thanx Dave. I think I finally found a way to get rid of the zillions of user manuals for all kinds of equipment (from kitchen to garden and everything in between) - and still find an answer when I really need a manual.
It took a while, but it turns out to be easy to add documents using the Web UI in a Docker container under Windows - just use the add documents + button. I don't know where they go, if anywhere, but it does work. ChatGPT suggested it. It wasn't obvious on my screen layout, but it is there.
I don't get it. Explain it better.
@gracegoce5295
I can't replicate the steps since moving to Linux as an experiment, so whether the later versions of Open WebUI are different or the Linux one is, I don't know. I'll swap back to Windows soon and maybe that will be different. For now I can't help.
@gracegoce5295
I still can't replicate it; it looks like the feature has gone. I have a new system and am at a complete loss now too.
Great video. Always great audio.
Thank you. This is not self-explanatory but your video helped me make this work.
NotebookLM also lets you upload documents into the context and chat with them. The interface is really great.
oh, just straight and easy information. what a man, thanks
Thanks Dave. This is what I was looking for.
Clear and precise. Thanks for making this video
Thank you for running the other models. There's a lack of people actually just running simple Ollama comparisons across different hardware.
SOLD - I'll install Open WebUI. Thanks Dave.
Thanks, that video was actually very helpful. I am in the midst of working out different ways to use our local documents and chat with them, preferably with a locally hosted LLM.
Love being a sub, and love to like videos with this quality of content. This is one of the most exciting projects I have gotten to play with in decades... This AI stuff is the door to a whole new world of learning. LOVE IT!!!! I love that, for now, it is free...
Exactly what I needed, exactly when I needed it.
Hi Dave, what a great video. Thank you
So cool - you created a PDP-11 expert. I am old enough to remember this type of computer. The video was also very informative. Thank you.
This was a very informative video! Thank you.
I am both stunned and very very apprehensive. AI is beyond human.
AI is advanced pattern recognition and matching, that's all lol
Dave - Thank you for yet another excellent video.
I am running the Llama 3.1 model on a system with 128 GB of RAM, based on what you taught in your earlier video. Thank you for that video.
However, I was unable to run the Web UI, because my graphics card, while powerful, was an older NVIDIA card, and there were missing NVIDIA DLLs that were needed. The newer installations were incompatible with my graphics card.
Since I just wanted to run it for myself, I really don't need a graphics card at all for running the model. I wish there were a Web UI that didn't require the NVIDIA DLLs.
nice, straight to the point. You should do an updated version now that we have DeepSeek R1, as retraining should now be possible.