Build AI chatbot with custom knowledge base using OpenAI API and GPT Index

Irina Nik

211K views

Published: Sep 11, 2024
Learn how to build a full-stack app in this tutorial: • Full-stack AI chatbot ...
A tutorial on building an AI chatbot with a custom knowledge base using the OpenAI API, GPTIndex, and Langchain.
The technique was first described by Dan Shipper www.lennysnews...
Source code: colab.research...
High-converting accessible e-commerce template from Tech Foundation techfoundation...
Designed following 100+ insights from large-scale usability testing and research by organizations such as Baymard Institute and NNGroup.
Free UI design course: • How to set up Figma ac...
#UXDesign #UX #Figma #UXUI

Comments • 294

@irina_nik 1 year ago ⁺³
Learn how to build a full-stack app in this tutorial: ruclips.net/video/AMc2A5Abj3M/видео.html
@Taskade 1 year ago
Irina, amazing tutorial on integrating the OpenAI API with a custom knowledge base! Really excited about the potential of GPTIndex and Langchain. I'd love to see a deep dive comparing AI Agents in Langchain, especially when they're long-running and autonomous. Keep up the fantastic work! 🌟
@JulianHarris 1 year ago ⁺⁴
Using ChatGPT to generate sample user interview data: genius 💥
@pragmatica1032 1 year ago ⁺⁹
So happy to have found you this morning! We need more designers that can code and explore AI possibilities like you do!
@kermitec 1 year ago ⁺¹⁰
Thank you for the tutorial! Also, to refresh the file details there is a "Refresh" button located just above the Files section. It's an icon of a folder with a circular arrow. It refreshes the section without needing to reload the page.
@irina_nik 1 year ago
Thank you for the tip!
@sammathew535 1 year ago ⁺⁴
You don't need to refresh the whole Colab page to update the view of the files/folders; just use the refresh button above the directory structure in the left pane.
@rdy4trvl 1 year ago ⁺⁴
Great video and thanks for answering many of the questions! Looking forward to your future YT video on integrating it into a website.
@irina_nik 1 year ago
Thank you, I'm glad you liked it
@Inglewhite1 1 year ago
@irina_nik thank you for this video. Do you have a tutorial showing how to integrate it into a website or WhatsApp? Thanks
@lcruzintel 1 year ago ⁺²
You are the best at explaining things Irina!! Thank you for taking the time to put this together.
@Kisssonik 1 year ago ⁺¹
Why are you smiling all the time))) So cute)))
@chatgpt_explained 1 year ago ⁺¹
Thanks for this info - it's easier to set up a chatbot than I realized!
@irina_nik 1 year ago
I'm glad it's useful for you.
@zhiyingwang1234 1 year ago
Thank you so much, Irina! I copied your source code into a Jupyter notebook and created a chatbot in a few minutes! To my surprise, it works! Please give some thumbs-up to this amazing lady. She has taken the time to make this solution so easy to use for everyone!
@vverboX 1 year ago
Miss Irina, thank you. After a few days of playing around, you got me to the point. Merci!
@harel4u2 1 year ago ⁺¹
Great explanation. Very explicit and clear instructions. Thank you very much for this.
@chuck18420 1 year ago ⁺¹
What could be happening here? I asked how many people were interviewed and the reply was "One person was interviewed". I asked how many times "It was fun to talk about cooking." appeared and it said none (interview4 ends with this quote). Thank you, great video!
@borakou39 1 year ago ⁺¹
This is exactly what I was looking for, thank you!
@malexandersalazar 1 year ago
I didn't know that we could do something like this with OpenAI, thanks for the video Irina.
@jonathandanemo 1 year ago ⁺¹
That was a great tutorial. And I like your approach to explaining why one should not be using only one long prompt etc.
@chinamatt 1 year ago ⁺¹
Great work!! Really nice step-by-step explanation! By the way, you can click the refresh button in the file explorer panel (2nd icon) to refresh the files so that they appear.
@shanesteven4578 1 year ago ⁺¹
Excellent tutorial, well presented and very clear. Thank you …. It works perfectly, unlike many so-called tutorials on YT about AI 😊
@somu6666 6 months ago
It's really nice; I got insights into how we can use a custom knowledge base.
@njorogekamau3820 1 year ago ⁺¹
Thanks for the amazing tutorial, simple but impactful.
@researchforumonline 1 year ago
Nice, I've already done it, but I don't know everything, so I had to watch this!
@gangwu3235 1 year ago ⁺⁴
Thanks for the amazing tutorial. BTW, is there any way to increase the output length? Right now I only get an answer of approximately 160 words (~250 tokens).
@keithinadhd6693 1 year ago
Thank you so much for this information. This is exactly the kind of thing I've been looking for: step-by-step tutorials for fine-tuning your own AI. This is perfect.
@gianantonel9913 1 year ago
Great video Irina!!
I was looking for this exact solution, and it was the first video of your channel that I followed exactly step by step, and it works perfectly end to end.
It was very clear and well explained.
Nice job!!
Please continue making these kinds of useful videos.
It was extremely useful for me and extremely detailed.
Keep going!
@maneeshk2355 4 months ago
I love your teaching ❤
@YahaS-vf7cq 1 year ago ⁺¹
Amazing video, very friendly to beginners. Thank you.
@dannydiscovers 1 year ago
This is an incredible video. You did an amazing job. Subscribed
@HelpHub150 1 year ago
Thank you!!!! This is a great video Irina, keep up the good work!
@lopnezk1320 1 year ago ⁺⁷⁸
Thanks! Now I can fire all my employees and save lots of money!
@irina_nik 1 year ago ⁺⁷
😎
@BwahBwah 1 year ago ⁺¹³
🤣🤣.... 😅😅.... 😄😄.... 🙂.... 🤔🤔🤔... 😐😐
@unitedstarsutopia 1 year ago
Seriously 😂😂
@unitedstarsutopia 1 year ago ⁺²
@BwahBwah don't tell me you are going to fire your employees too 😂
@BwahBwah 1 year ago ⁺²
@unitedstarsutopia I'll go one better. I won't have to employ anyone now 😀
@johnsmith1953x 1 year ago ⁺¹
Is there a software package that can make an entire OpenAI chatbot (GPT-4 or even 3.5) just by pointing it at a folder of PDFs?
We would pay thousands for this right now.
The application has to run locally on a PC.
@irina_nik 1 year ago ⁺¹
You can use langchain for that. I'll make more tutorials on that topic
@Lexa-Live 1 year ago ⁺⁴
Even I understood almost everything! Well delivered and interesting content!
@irina_nik 1 year ago
Thanks ☺️
@addkik 1 year ago
Very informative... Thanks 😀
Wishing you lots of love and strength.
@BillyRybka 1 year ago ⁺¹
Hey! Great video! Now that the ChatGPT API is out, do you know if these libraries will work with it? Or is this still only a GPT-3.5 method?
@irina_nik 1 year ago ⁺¹
Hi! This library is not available with ChatGPT yet, but you can keep an eye out for updates here gpt-index.readthedocs.io/en/latest/how_to/custom_llms.html
@QuanDaniel-n3c 1 year ago ⁺¹
Great work, it's quite clear. It seems LlamaIndex has had many updates and I can't recreate your work. Would you please make an updated version? Thanks a lot~
@ganeshkris 1 year ago ⁺¹
This just spits out the text related to the query. If I want to augment GPT's capabilities with my own dataset, what is the best way to do it? For example, using the same example of interview transcripts, I should be able to ask GPT to summarize how the candidate did or whether the interviewee's answer was correct for a particular question. Any idea how to go about that? I understand fine-tuning is a possibility, but if I have 10,000 interview scripts I want to augment GPT's capabilities with, I am not sure how to go about it.
Any help?
@mrmgflynn 1 year ago ⁺²
Hi, when I load and then run your Colab notebook, I get an error - TypeError: __init__() got an unexpected keyword argument 'llm_predictor' when I run the construct_index("context_data/data") code. Any clues on what I'm doing wrong?
@sandipshaw3397 1 year ago
Did you get the solution?
@mrmgflynn 1 year ago
@sandipshaw3397 not yet. Have you got the same problem?
@iztimetocode7513 1 year ago ⁺¹
Changing
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)
To
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
And importing
from llama_index import ServiceContext
@thepunisher0702 1 year ago ⁺¹
Great!! Keep going. All the very best!! 👍😄
@irina_nik 1 year ago ⁺¹
Thank you!!! Your words inspire me to make more videos)
@abhishekmandloi95 8 months ago ⁺¹
I am getting an error when I ask a question using the code. Can someone help me?
@CamsYoga 1 year ago
Thanks, worked for me 😇
@AIMagician996 1 year ago ⁺²
A few points not mentioned in the videos:
Essentially, it is fine-tuning. However, the module for fine-tuning has been pre-written for you.
Fine-tuning can only be done with models below GPT-3. Currently, fine-tuning is not available for ChatGPT, GPT-3.5, or GPT-4.
For GPT-2 to be effective, you need at least 300M of training data. Models with more parameters than GPT-3 require even more data to achieve the desired effect.
@lnyitrai 1 year ago ⁺¹
She used about 12kB of text in this demo.
LlamaIndex built a 559kB index from it.
And it did the job on text-davinci-003.
I'm genuinely interested in the reason behind your claim about training data size.
@NatkhatNoble 1 year ago
That smile, that damned smile 😊 And thanks for the nice tutorial btw.
@inflationking1271 1 year ago ⁺²
Really good tutorial. I wonder how well this scales with more than just a couple of documents. Do you have any experience with the performance on 1k or 10k documents?
@MikeyMcCorry 1 year ago ⁺²⁸
Amazing tutorial! Thanks! If you're looking for future tutorial ideas, I'd love to know how to expand on this to create my own API endpoints so my trained chatbot can be made publicly available from my website. I'm not very familiar with Google Colab (or Python for that matter - I'm a PHP/JS web developer), so I'll try to do some of my own research on how this might be possible, but I really enjoyed and easily absorbed the info in this video. Well done. :)
@irina_nik 1 year ago ⁺²¹
Hi Mikey! Thank you for the suggestion, I definitely need to make a video about that. I think I'll be able to post it in 3-4 weeks. Though I'll be using NextJS/TypeScript because this is what I'm familiar with.
@Adrian_Marmy 1 year ago ⁺³
This response made me subscribe... That would be awesome!
@maertscisum 1 year ago ⁺²
@irina_nik you are smart. Can't wait to see you share the TypeScript/Node.js version.
@lstephen 1 year ago
Good question Mikey! I have the same question and subscribed to find out from her next video! Thank you!
@alexdomla 1 year ago ⁺¹
Thank you for the video! Really cool. I have a question: here you are working in Google Colab, but how would you bring this to a website? Is it possible? Is it easy? Greetings from Spain :)
@hishamalawi6011 1 year ago
An excellent tutorial. Thank you.
@hishamalawi6011 1 year ago
I converted this code to a Flask app and it works fine on my local server. However, when I deploy to Google App Engine it fails to return responses. The error is a 500 Internal Server Error! Any idea or advice is much appreciated.
@XHVSTLEX 1 year ago ⁺³
Great job! The new version shows llama_index.
I went with it because I figured you updated it.
But when I construct the index I get an error on line 58; in red it shows super().__init__( and it fails.
Any help on this?
@andrewdoulames8321 1 year ago
I got the same error as well
@MrJeffpohl 1 year ago
I did as well and I'm not sure how to get past it
@iztimetocode7513 1 year ago ⁺¹
Changing
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)
To
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
And importing
from llama_index import ServiceContext
@irina_nik 1 year ago
Hi! Thank you for mentioning it. I've updated the code and it should work now 😀
@Progenix 1 year ago ⁺³
How is this GPT-index different from OpenAI's Text Embedding ADA model? Or is it just a wrapper around that model?
@drewwellington2496 1 year ago ⁺²
Great question. Would love an answer to this; I can't figure out the difference. They appear to be doing the same thing.
@irina_nik 1 year ago ⁺¹
You can achieve the same result without any external library; GPTIndex just makes it easier for non-professional coders like me 😉. It uses the found chunks as the context to the prompt and not as the answer itself.
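Irina's point, that the retrieved chunks become context for the prompt rather than the answer itself, can be illustrated without any library. This is a minimal sketch. It assumes word-count vectors as a stand-in for real embeddings (GPT Index would call an embedding model instead). The function names and the prompt wording are hypothetical and are not GPT Index's internals.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (ties keep input order)."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Put the best-matching chunks into the prompt as context.

    The chunks are not returned as the answer; the model is still asked to
    write one, grounded in the supplied context.
    """
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what would be sent to the completion model. Only the top `k` chunks travel with each question, not the whole knowledge base.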
@josehoyos 1 year ago ⁺¹
For embeddings you also need a vector database. I wonder whether this index solution also performs well in a production environment?!
@mattizzle81 1 year ago
@josehoyos Yeah, I have seen Pinecone proposed a lot for this. I did a little test with Pinecone and while it was unfamiliar to me it ended up being dead simple.
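A vector database such as Pinecone contributes two things: storage, and nearest-neighbour search over embedding vectors. The sketch below plays that role in memory. It assumes the vectors come from some embedding model; the 2-D vectors in the test are made up. `InMemoryVectorStore` is a hypothetical class, not Pinecone's API. It scans every entry instead of using the approximate indexes that production vector databases typically rely on.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class InMemoryVectorStore:
    """Maps an id to (vector, metadata) and answers top-k similarity queries."""

    items: dict[str, tuple[list[float], dict]] = field(default_factory=dict)

    def upsert(self, item_id: str, vector: list[float], metadata: dict) -> None:
        """Insert a vector, or overwrite the entry that already has this id."""
        self.items[item_id] = (vector, metadata)

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float, dict]]:
        """Return (id, score, metadata) for the top_k most similar entries."""
        # Brute-force scan: simple, but linear in the number of stored entries.
        scored = [
            (item_id, cosine(vector, stored), meta)
            for item_id, (stored, meta) in self.items.items()
        ]
        scored.sort(key=lambda hit: hit[1], reverse=True)
        return scored[:top_k]
```

Keeping metadata (such as a file name) next to each vector is what lets a query report where a match came from.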
@sojoba3521 1 year ago ⁺¹
Great tutorial! Thank you so much for going through this in such detail. Can you suggest a resource that explains how to take the chatbot we create and integrate it into a website or web app with a prettier interface?
@123arskas 1 year ago ⁺¹
One question: whenever we ask a question, does it go through the entire index every time? And does that cost us a lot of tokens for each question? Because if that's the case, we would run out of credit if we deployed an app like that for users online.
@evaagustine7962 1 year ago ⁺³
Hi Irina! It is such a great tutorial and would be useful for a case I am currently working on. I have tried this with my own research data and it turned out very well, with relevant and decent answers. But I am wondering: is it possible to use the GPT-3 model without using its training data or knowledge? That way the information/answers produced would come only from the custom data we added to the knowledge base. Your answer would be very much appreciated, thanks!
@sambhajisawant4559 1 year ago ⁺¹
Thanks, it's really helpful. Could you please let me know if I can use complex data with hundreds of parameters (text and numbers)? If yes, in what format should it be uploaded?
@tutacat 7 months ago
Actually, knowledge bases are different from prompts. It is better. It is closer to quantum computing, because it can search the vector space for that document without having to parse the raw files.
@javi_v7.0 1 year ago
Great video, thanks!!!
@tulsipatro4662 1 year ago
Amazing tutorial.
Is there a way to make the model answer the questions faster? It takes nearly 30 seconds to answer the questions.
@lisaduddington 5 months ago
The code for 'Construct an index' no longer works. I get the following error message: You tried to access openai.Embedding, but this is no longer supported in openai>=1.0.0
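That message comes from the 1.x releases of the `openai` Python package. Those releases removed module-level classes such as `openai.Embedding`, which older notebook code relies on. There are two ways out: pin a pre-1.0 client (for example `pip install "openai<1.0"`), or port the code to the 1.x client. Below is a minimal standard-library sketch of a version check you could run before the old code. The helper names are hypothetical.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the major component of a version string such as '0.28.1'."""
    return int(version.split(".")[0])


def has_legacy_openai_api(version: str) -> bool:
    """True for releases before 1.0, i.e. the legacy module-level client."""
    return major_version(version) < 1


def describe_openai_install() -> str:
    """Report whether the installed openai package matches pre-1.0 notebook code."""
    try:
        installed = metadata.version("openai")
    except metadata.PackageNotFoundError:
        return "openai is not installed"
    if has_legacy_openai_api(installed):
        return f"openai {installed}: pre-1.0 client, legacy module-level calls expected to work"
    return f"openai {installed}: 1.x client, legacy calls such as openai.Embedding will fail"


if __name__ == "__main__":
    print(describe_openai_install())
```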
@0xeb- 1 year ago
Thank you Irina
@DrMohanMuthal 1 year ago
Great information Irina ❤🎉
@kawingchan 1 year ago ⁺¹
Thanks for posting this video. The whole demo is great. The only thing I am not clear about is how to pick those input and output sizes, and, if some are based on the particular model, how you obtain them from OpenAI's model page (like the davinci one). It would help to cover this in more detail, with a split screen so you don't have to toggle around.
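The input and output sizes trade off against each other. The model's context window has to hold the prompt template, the question, the retrieved chunks and the generated answer. Every token reserved for the answer is therefore unavailable for context, and the answer can never be longer than the tokens reserved for it. Below is a minimal arithmetic sketch. The function names and all numbers are illustrative assumptions, not values from the notebook or from OpenAI's model page.

```python
def context_budget(context_window: int, reserved_output: int, prompt_overhead: int) -> int:
    """Tokens left for retrieved chunks after reserving room for the answer and the template."""
    budget = context_window - reserved_output - prompt_overhead
    if budget <= 0:
        raise ValueError("no room for context: reserve fewer output tokens or shorten the prompt")
    return budget


def chunks_that_fit(budget: int, chunk_size: int) -> int:
    """Number of whole chunks of chunk_size tokens that fit in the budget."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return budget // chunk_size


if __name__ == "__main__":
    # Illustrative numbers only: a 4,096-token window, 512 tokens reserved
    # for the answer, ~200 tokens of template + question, 600-token chunks.
    budget = context_budget(4096, 512, 200)
    print(budget, chunks_that_fit(budget, 600))  # 3384 5
```

Doubling the output reservation to 1,024 tokens in this example leaves 2,872 tokens, or four 600-token chunks, for context.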
@gabrielcastaing8035 8 months ago
Hi
Thank you for that content!
I am just curious about the file size limit and the importance of the file format in your approach. I have seen that you are using .txt files. I am using PDFs to feed the knowledge base of custom GPTs, but I am observing low accuracy in the answers. It seems that the GPT is not looking at the whole knowledge base (6 merged PDFs with approximately 7000 pages in total). Do you have any advice?
@leegray72 1 year ago
I'm getting a traceback error when running construct_index. What am I missing?
@TorNeely 1 year ago
Very interesting. I noticed you said that you can't share the real interviews in the video because they may contain private information, which is understandable. However, how do you make sure that OpenAI doesn't receive this information? I find the biggest problem is how to avoid OpenAI getting either user or customer information.
@sachinmotwani2905 1 year ago
Unable to use any other file. Even a custom text file gives the error: 'Rate limit reached'
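'Rate limit reached' means the API is rejecting requests that arrive too fast, and building an index can send many embedding requests in a short time. A common mitigation is to retry with exponential backoff. Below is a minimal standard-library sketch. `RateLimitError` is a hypothetical placeholder for whatever exception your client version raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical placeholder for the rate-limit exception your client raises."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on RateLimitError with exponentially growing waits.

    Wait n (1-based) is base_delay * 2**(n-1) plus up to base_delay of random
    jitter, so parallel clients don't all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # Out of attempts: let the caller see the last error.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

To use it, wrap the function that makes the API request in a zero-argument lambda. The `sleep` parameter exists so tests can record the waits instead of actually sleeping.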
@mrpips76 1 year ago ⁺¹
Absolutely great video Irina. The Colab seems to have llama errors. Was anything changed in the Colab? Would love to connect to discuss more. Great tutorials!!
@jlaroche0 1 year ago ⁺¹
While you're at it, you may want to change the following in the "Define the functions" section of the Colab.
Change this --> from langchain import OpenAI
To this --> from langchain.chat_models import ChatOpenAI
Apparently "from langchain import OpenAI" is old and being deprecated.
@mrpips76 1 year ago
@jlaroche0 Thanks for the feedback Jacques. I tried both recommendations. They seemed to install fine. But I'm still getting an error with the following line: -> from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
@mrpips76 1 year ago
@jlaroche0 It works now. Thank you so much Jacques. Really appreciate your help. These educational videos are super helpful! Looking forward to more videos!
@user-iv4gz5do2d 1 year ago
How do I get past the word limit for an answer? Sometimes the answer feels only half finished, not quite complete. How can I change this? Thank you
@kylespangladesh2947 1 year ago ⁺³
Great tutorial! Have you considered doing an updated version with the new ChatGPT API?
@irina_nik 1 year ago ⁺¹
Hi! This library is not available with ChatGPT yet. But I'm planning to make a video using the ChatGPT API for another use case
@kylespangladesh2947 1 year ago
@irina_nik ok great, looking forward to it! I'd love to apply ChatGPT to a custom knowledge base.
@reticent 1 year ago ⁺⁴
Hi Irina -
Would building a custom AI chatbot also allow you to avoid the topic restrictions put into place by OpenAI/Bing/etc. in their chat modes?
Personally I'm interested in interacting with one of these AIs without all the restrictions put into place by the corps running them publicly, and if possible with persistent memory of previous conversations. The technology fascinates me but I don't want to interact with "gimped" AIs that, as entities, can't really exhibit their true capabilities due to the restrictive actions of the tech companies who are trying to reduce their exposure from a liability standpoint.
@NK5LLC 1 year ago ⁺¹
This is great, thank you! When asking questions to the AI, I didn't notice any custom instructions in use. How can you be sure it was answering only using the data given to it in the index?
Can you also make more videos for using custom data from other sources, such as databases? How about the ability to categorize?
One minor thing: When pronouncing the word "answer", the "w" is actually silent. (My wife is ESL and always asks me to correct her pronunciation, and I ask the same of her when I speak her native tongue.)
@prabharora0 1 year ago
Hello! Thank you for the video! Also, your secret API key is visible in the first few frames before you blur it! You should delete that API key completely!
@vl9110012010 1 year ago
Thanks a lot! A deep bow to you! Respect and props)))
@Shrab 1 year ago
Great explanation, thank you. May I ask: is there a limit on how much custom data you can use, and would a large custom knowledge base slow down the chat?
@p.c.336 1 year ago ⁺²
Congrats Irina, very clear and nicely explained 👍 Which file formats does it support for indexing? Is it only .txt?
@irina_nik 1 year ago ⁺²
Thanks! You can connect other file types with LlamaHub gpt-index.readthedocs.io/en/latest/how_to/data_connectors.html
@happydrawing7309 1 year ago
I got this error after calling ask_ai(): "RetryError[]". How can I fix it?
@diederik6975 1 year ago
Thank you very much, very useful tutorial.
I'm wondering why you did not use gpt-3.5-turbo, as it is much cheaper and probably almost as good?
@austink9285 1 year ago
Irina, thank you for your help! When I ask it questions unrelated to the article, it often provides answers when it shouldn't. Is there any way to ensure it only focuses on my uploaded article?
@bartake1 1 year ago
Great tutorial. When we send data to OpenAI, is it used for public training or does it remain private to me?
@narekmuradyan1980 1 year ago
Could you add this to someone’s website? If so, could you point me to a video you already have on the topic?
@nishikantgurav4500 1 year ago
cannot import name 'GPTSimpleVectorIndex' from 'llama_index'. GPTSimpleVectorIndex was renamed to GPTVectorStoreIndex. Please update the code provided in the Colab above accordingly so we can run it.
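If code has to run against both the older and the renamed class, it can try one import and fall back to the other. This is a minimal sketch. It assumes, as the comment reports, that the newer release exposes `GPTVectorStoreIndex` and the older one `GPTSimpleVectorIndex` at the top level of `llama_index`. Later releases reorganised the package again, so this is not exhaustive. `load_vector_index_class` is a hypothetical helper.

```python
def load_vector_index_class():
    """Return the vector-index class this llama_index version provides, or None."""
    try:
        # Newer top-level name, per the rename reported in this comment.
        from llama_index import GPTVectorStoreIndex
        return GPTVectorStoreIndex
    except ImportError:
        pass
    try:
        # Older name used by the original notebook.
        from llama_index import GPTSimpleVectorIndex
        return GPTSimpleVectorIndex
    except ImportError:
        # llama_index is not installed, or this release uses neither name.
        return None


VectorIndex = load_vector_index_class()
if VectorIndex is None:
    print("No compatible llama_index vector index found; check the installed version.")
```

The constructor arguments also changed between these releases, as the `ServiceContext` fix earlier in the comments shows. Resolving the class name alone does not guarantee that the rest of the notebook runs.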
@M-ABDULLAH-AZIZ 1 year ago
For a chatbot that provides information about an application: is it better to keep the data in a file and compute embeddings in real time, or to store the embeddings in a database?
@athuldas8689 1 year ago
Where did the answers come from: ChatGPT or the data it was fed? When I checked the data, I could only find questions.
@leoheise9967 1 year ago
Hey, any tips on how to fine-tune a model based on a very large PDF document without the "\n" separator to split prompt/resolution? I thought maybe a script could break it up at every question mark? Or is there some other way?
@user-lz2md3pu5b 7 months ago
Hey Irina! Thank you for this tutorial, it's a game changer. This is built on GPT-3; how would you go about running it on GPT-4? Thanks!
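@leoheise9967's idea above, splitting a document at every question mark, could look like the sketch below. It assumes interview-style text: each question is one sentence ending in '?', and everything up to the next question is its answer. The `prompt`/`completion` keys follow the layout of OpenAI's legacy GPT-3 fine-tuning files. `split_qa` is a hypothetical helper. Rhetorical questions, or questions inside answers, will break the heuristic.

```python
import re

# A "question" is a run of text without sentence-ending punctuation or
# newlines, terminated by a question mark.
QUESTION = re.compile(r"[^.?!\n]*\?")


def split_qa(text: str) -> list[dict[str, str]]:
    """Pair each question with the text that follows it, up to the next question."""
    matches = list(QUESTION.finditer(text))
    pairs = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        answer = text[m.end():end].strip()
        if answer:  # skip a question that is immediately followed by another question
            pairs.append({"prompt": m.group().strip(), "completion": answer})
    return pairs
```

Each resulting dict could be written as one JSON line. For a very large PDF, the text would first have to be extracted, and the pairs spot-checked by hand.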
@MichaelLloydAI 1 year ago
Irina,
Many use cases. Excellent information. Thank you.
Are you able to provide a similar method for creating a generative AI for a closed system that ensures secret or confidential company or government data cannot be leaked?
@ChristiaanRoest79 1 year ago
I would like to make my open ChatGPT legal knowledge app. I have thousands of pages of information I want to add. I saw that this is going to cost a lot of money (more than 1000 euros). Is there a cheaper alternative?
@rjschulzjr 1 year ago ⁺¹
Thanks Irina! When I run this code I get an error: __init__() got an unexpected keyword argument 'llm_predictor'. Any thoughts?
@sandipshaw3397 1 year ago
Did you get the solution?
@rjschulzjr 1 year ago
@sandipshaw3397 not yet, but my syntax is identical to Irina's. The error occurs in the index = GPTSimpleVectorIndex step
@MrJeffpohl 1 year ago
@rjschulzjr I am getting the same error
@iztimetocode7513 1 year ago ⁺²
Changing
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)
To
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
And importing
from llama_index import ServiceContext
@MrJeffpohl 1 year ago ⁺²
@iztimetocode7513 Thank you, that worked! This video and code have been very helpful to me in my effort to work on a project.
@LisaButler-dy1ps 1 year ago ⁺¹
Thank you for sharing this walkthrough! It's exciting to think about the potential uses for this. At my company we have pretty tight digital security because we sometimes deal with personal identifying information. You mentioned that your actual research is also confidential. So I'm curious about the security risk of something like this. Do you have any concerns over the security of the data you are uploading?
@irina_nik 1 year ago ⁺³
Hi Lisa!
The data submitted through the OpenAI API is not used for training purposes and is deleted after 30 days. Here is the policy platform.openai.com/docs/data-usage-policies
We are still figuring out the security questions, though, so for now I'm experimenting with fake data.
@shitaldhakne7989 1 year ago
Hi Lisa! For data security, you can use Azure OpenAI services.
@phnxregen2131 1 year ago
Does the data in the folder have to be a .txt file, or can I include PDFs and other documents?
@leegray72 1 year ago
Can I use this AI custom knowledge base in ChatGPT or in the OpenAI Playground?
@pedromoreno8655 1 year ago
Hi Irina, thanks for the video. I want to ask how you limit the model to answering only about your information. I.e., what would happen if the person asks an out-of-context question (like "Can I go to Miami for holiday?")? Will it reply?
Thanks
@kunalr_ai 1 year ago
You nailed it. I'll follow you on Twitter.
@EdSpooky 1 year ago
Thank you so so so much
@wardaraees4887 1 year ago
You feed text files to provide the data to the model; what if I have an Excel file or a tabular data file?
Also, is the OpenAI API key free or paid?
@user-vc2sc9rq7t 1 year ago
Thanks for the great tutorial! For multiple documents, can you please advise how I can retrieve the name of the file the contextual information was retrieved from?
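One way to keep the file name available is to attach it to every chunk when the text is split, so whatever later retrieves a chunk also gets its source. Below is a minimal sketch with hypothetical names (`Chunk`, `chunk_documents`). For simplicity it splits on whitespace-separated words rather than model tokens. It is not GPT Index's own chunking API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # name of the file the text came from
    text: str


def chunk_documents(docs: dict[str, str], size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split each document into windows of `size` words that overlap by `overlap` words.

    Every chunk keeps the name of the file it came from.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for source, text in docs.items():
        words = text.split()
        # Stop before a window that would contain only words already covered.
        for start in range(0, max(len(words) - overlap, 1), step):
            piece = words[start:start + size]
            if piece:
                chunks.append(Chunk(source, " ".join(piece)))
    return chunks
```

If these chunks are the units that get embedded and searched, each search hit can report `chunk.source` next to the answer.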
@saw970 10 months ago
Very nice and easy way, thank you!!! I have a question regarding the custom knowledge base… can I implement a Prolog knowledge base and put it there, or should it be a text file? Prolog is a requirement in my school project… I hope you answer, and thanks a lot ❤
@HredFuzz 1 year ago
Hello dear, could you please explain how to do this on Pipedream?
@anastasiosmichaelkoutoumba9384 1 year ago
Excellent
@suryahr307 1 year ago
I noticed that you used .txt to store the data, but as researchers we will need to input equations, if not graphs. So how do we do this? I'm sorry if this is a noob question, but I have no programming knowledge.
@CaboLabsHealthInformatics 1 year ago
Nice!
@sachinmotwani2905 1 year ago
Can the custom knowledge base be something other than a .txt file? Say a PDF?
@user-tg9ft2yj7n 1 year ago
Hello! Thank you for the helpful tutorial! What would happen if I ask a question in another language? Would this chatbot switch to that language as ChatGPT does? Thanks a lot.
@fernandosalome1520 1 year ago
What if I want to send a video so that it is transcribed into text, learned from, and stored?
