Hi Shweta Lodha! First... Thanks a lot for the tutorial! :) I have a problem with an error. Can you help me? I wrote the same code and used the same data, but when I try to run it, this error appears: TypeError: __init__() got an unexpected keyword argument 'documents'. I printed the variable and it contains the book text... The error is on the line: vectorIndex = GPTSimpleVectorIndex(documents=docs,llm_predictor=llmPredictor,prompt_helper=prompt_helper). I looked for it on the internet but didn't understand why there's a problem with the arg "documents"... :/
I solved it! :D For anyone who had the same problem I had, here is my function:

def createVectorIndex(path):
    max_input = 1024
    tokens = 256
    chunk_size = 600
    max_chunk_overlap = 20

    #define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001"))

    #load data
    docs = SimpleDirectoryReader(path).load_data()

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
    vectorIndex = GPTSimpleVectorIndex.from_documents(docs, service_context=service_context)
    vectorIndex.save_to_disk('vectorIndex.json')
    return vectorIndex

The fix is that I removed this line:

    prompt_helper = PromptHelper(max_input,tokens,max_chunk_overlap,chunk_size_limit=chunk_size)

Could that be a problem? It worked here without it, and with other texts too... If it is a problem, could anyone explain that part of the code?
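The removed line can be explained with a small plain-Python sketch. The parameter names follow the positional order of the gpt_index-era signature PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=...); the budget arithmetic below is illustrative, not taken from the library:

```python
# Informal walkthrough of the PromptHelper arguments, using the values
# from the function above. Names mirror the old positional signature.

max_input = 1024        # max tokens the model accepts per request
tokens = 256            # tokens reserved for the generated answer
max_chunk_overlap = 20  # tokens shared between consecutive chunks
chunk_size = 600        # upper bound on each text chunk sent for indexing

# Tokens left for context once the answer budget is reserved:
context_budget = max_input - tokens
print(context_budget)  # 768
```

Omitting the helper generally just means the library falls back to its own defaults, which is why the code still worked without it.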
Hi! This is a very helpful tutorial. I have a question: how would you bring this to a website? I mean, creating a chat website where you can ask questions and the bot answers based on your custom data. Is it possible? Thanks in advance :)
That's a great video, thanks for sharing it. I have a question if you don't mind: how different is this method from using the OpenAI APIs directly for chat completion, embedding and completion? Thanks.
@@shweta-lodha Thanks a lot for the prompt reply. One last question: under the hood, the method you explained uses the text-embedding-ada-002 model for embedding and text-davinci-003 for completion, right?
@Shweta, awesome video, and thanks for sharing. I have data in a CSV which has many text columns and many rows. I want to build a chat application based on that data. Can you please let me know how I can implement this?
hey, any tips on how to fine-tune a model based on a very large PDF document without the " " to split prompt/resolution? I thought maybe a script could break it down at every question mark? Or is there some other way?
could this learn from new unlabeled questions provided by the user? Is this a trainable chatbot that could learn new things every time the user asks a new question?
Hi, I did it successfully! Thank you so much! However, is it possible to start a chat like in the last part of the video, save where it stopped, and come back to it at another moment?
Madam, I am from India. After doing tons of research on the ChatGPT API, this is the best! I am working on a Linux platform and have Jupyter Notebook; will it work in that environment?
Hi Shweta, this is a very helpful tutorial. I tried this code, but after asking a question it is not responding... I waited for 15 minutes and still got no response.
Hi, I need your help. I have followed exactly the same steps but am facing the issue below. I ran pip install gpt_index and got the same response as shown in your video, but when I run from gpt_index import SimpleDirectoryReader I get the error Module not found "gpt_index". I tried uninstalling, re-installing, and checking on ChatGPT, but everything failed. Please help!
I got this error: "TypeError: BaseGPTIndex.__init__() got an unexpected keyword argument 'documents'". Don't know if I will get any assistance, what the heck.
I am getting this error while running the code:

Output exceeds the size limit. Open the full output data in a text editor.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 vectorIndex = createVectorIndex('Chatbot')

Cell In[14], line 16, in createVectorIndex(path)
     13 docs = SimpleDirectoryReader(path).load_data()
     15 #create vector index
---> 16 vectorIndex = GPTSimpleVectorIndex(documents=docs,llmPredictor=llmPredictor,prompt_helper=prompt_helper)
     17 vectorIndex.save_to_disk('vectorIndex.json')
     18 return vectorIndex
How do I restrict the bot to only search for answers in the data provided, or make it return an error if the question asked is outside the available data?
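One common approach, sketched here in plain Python with template wording that is purely illustrative, is to wrap the retrieved context in a restrictive prompt and tell the model to refuse anything outside it:

```python
# Sketch of a restrictive prompt wrapper. The refusal text and helper
# name are illustrative, not from any specific library.

REFUSAL = "I don't know based on the provided documents."

def build_restricted_prompt(context: str, question: str) -> str:
    """Wrap retrieved context in instructions that forbid outside knowledge."""
    return (
        "Answer the question using ONLY the context below. "
        f"If the answer is not in the context, reply exactly: {REFUSAL}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_restricted_prompt("Our plan costs $10/month.", "What is Earth?")
print(REFUSAL in prompt)  # True
```

Most versions of the index API accept a custom prompt template for the query step, so the same wording can usually be passed there; check the documentation for the version you have installed.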
After gpt-index was replaced by llama-index, I am getting this error during creation of the vector index. Wondering if anyone else is facing the same: INFO:openai:error_code=404 error_message='Resource not found' error_param=None error_type=None message='OpenAI API error received' stream_error=False
It is not mandatory to use a text file. You can use other file types too, provided you are able to read them and convert the text to vectors. There is no need to put it in a directory if it is a single file, but in that case you have to look for a different function.
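For the single-file case, a minimal standard-library sketch shows one way to read the text yourself before handing it to an indexer (the demo file is created on the fly just for illustration):

```python
# Read one file's text directly instead of scanning a directory.
from pathlib import Path
import tempfile
import os

def load_single_file(path: str) -> str:
    """Return the full text of one file."""
    return Path(path).read_text(encoding="utf-8")

# Demo with a temporary file standing in for your knowledge-base file:
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as f:
    f.write("hello knowledge base")
    tmp = f.name

print(load_single_file(tmp))  # hello knowledge base
os.remove(tmp)
```

Reader classes of this era also accepted a list of specific files in many versions, but the exact argument name varies, so reading the file yourself is the safest baseline.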
I am having issues installing gpt_index. I install it using pip install, but when I do a pip list I see gpt-index (note the hyphen). I then cannot import gpt-index. Has anyone faced this problem? Thank you.
For this line: vectorIndex = GPTSimpleVectorIndex(documents=docs,llm_predictor=llmPredictor,prompt_helper=promptHelper), I am getting the following error: __init__() got an unexpected keyword argument 'documents'. Any tips?
@@shweta-lodha You are right. There appears to be some change in the new version of gpt_index. This modified function code worked for me. Thanks for this amazing tutorial, I was looking for something like this for a long time.

def create_index(path):
    max_input = 4096
    tokens = 200
    chunk_size = 600 #for LLM, we need to define chunk size
    max_chunk_overlap = 20

    #define prompt helper
    promptHelper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)

    #define LLM - there could be many models we can use, but in this example, let's go with OpenAI model
    llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001", max_tokens=tokens))
    service_context = ServiceContext.from_defaults(llm_predictor=llmPredictor, prompt_helper=promptHelper)

    #load data - it will take all the .txt files, if there are more than 1
    docs = SimpleDirectoryReader(path).load_data()

    #create vector index
    vectorIndex = GPTSimpleVectorIndex.from_documents(documents=docs, service_context=service_context)
    vectorIndex.save_to_disk('vectorIndex.json')
    return vectorIndex
This tutorial truly stands out from the rest! After struggling with coding along other RUclips tutorials on GPT-index and langchain, I finally stumbled upon this gem, and behold, my code worked like a charm. Thanks for sharing your expertise and making it easy to follow. You're a lifesaver!
Thank you, Shweta. I was able to get something working based on your code. Please note that lots of libraries are now outdated (gpt_index is now llama_index). My complete code (works 100%!):
from langchain import OpenAI
import sys
import os
from llama_index import SimpleDirectoryReader,GPTListIndex,GPTVectorStoreIndex,LLMPredictor,PromptHelper,ServiceContext
from llama_index import StorageContext, load_index_from_storage
def create_index(path):
    max_input = 4096
    tokens = 200
    chunk_size = 600 #for LLM, we need to define chunk size
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size) #define prompt
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001", max_tokens=tokens)) #define LLM
    docs = SimpleDirectoryReader(path).load_data() #load data
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    vectorIndex = GPTVectorStoreIndex.from_documents(
        docs, service_context=service_context
    )
    vectorIndex.storage_context.persist(persist_dir="storage")
    return vectorIndex
def answerMe():
    storage_context = StorageContext.from_defaults(persist_dir="storage")
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine()
    q = "What is the best plan?"
    print(q)
    print('------------')
    response = query_engine.query(q)
    print(response)

#create_index('data')
answerMe()
Thanks Gleb. I have created a few videos on how to fix these broken ones, and a few more are on the way. Please feel free to let me know if anything else is not working. I will try to provide solutions at my earliest convenience. Thanks once again.
Great bro... It was helpful to test it out.
@@shweta-lodha Currently I am getting a RateLimitError with whatever account I try. Is there a way I can bypass this and test it out? I have also tried with the code above, as mentioned by Gleb.
As someone who just learned how to use GPT-4o Mini to write Python code, I find your explanations well done.
Please cover the UI part as well in one of the coming videos. Thank you so much in advance.
Question about execution flow: Hi Shweta - Thanks for a very helpful tutorial. I'm trying to wrap my head around the execution flow. For example, createVectorIndex probably sends the document to OpenAI's servers, and the servers return embeddings that are then stored locally. During answerMe - is the prompt sent back to OpenAI for embeddings? Does answerMe again send document embeddings back to OpenAI's servers? How much is done in the local process (on the PC) vs. how much is done by OpenAI's servers? Thanks.
One way to understand this is by disabling the internet connection once you have embedding stored on your local device and then try to call answerMe. Doing this will clear all your doubts 😊
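To make the split concrete, here is an informal map, as a checkable snippet, of which steps run on the local machine and which call OpenAI's servers. It is based on the flow shown in the video; treat it as a sketch, not a specification:

```python
# Rough map of local vs. remote work in the index/query flow.
# "openai" marks steps that require a network call to OpenAI's API.
FLOW = {
    "read files from disk": "local",
    "compute document embeddings": "openai",
    "store vectors on disk": "local",
    "embed the user's question": "openai",
    "similarity search over stored vectors": "local",
    "compose the final answer (completion)": "openai",
}

remote_steps = [step for step, where in FLOW.items() if where == "openai"]
print(len(remote_steps))  # 3
```

This is also why the disconnect-the-internet experiment is informative: answerMe still needs two remote calls (question embedding and completion), so it fails offline even though the document vectors are local.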
Hi, I am facing this error: RetryError: RetryError[] while calling the answerMe function. Can you please help me out?
Your instructions are really good, thank you so much Shweta Ji
Hi Shweta, Awesome video, great learning. Thanks. Just one concern - when we grant access to ChatGpt to our custom data stored in our machine, then is there a risk that the data can be copied/ used by users of ChatGPT or Open AI or otherwise?
Thank you Swetha. It can’t get any better than your explanation with show and tell 👍
Thanks for your contribution, it's easy to understand for a beginner like me. Could you make a follow-up video showing how to ask questions and get answers continuously, like ChatGPT?
Thank you so much for this! You made something very intimidating to me super easy to understand. I am very grateful for the time and effort you put into this video. Thanks a lot!
I noticed that in the implementation shown in the video, the GPT model is being called every time a user prompt is received, and this can be expensive in a real-world scenario where the application is serving multiple users concurrently. Each request to the GPT model requires a certain number of OpenAI tokens, and this can quickly add up and become expensive.
Indeed! For production scenarios, you would have to use a vector database extensively.
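Until a vector database is in place, even a simple in-process cache avoids paying twice for the same question. This sketch stubs out the model call; in the real app the stub would be the query-engine call:

```python
# Minimal answer cache so repeated questions don't trigger a fresh
# (billable) model call. The model call here is a stand-in.
from functools import lru_cache

calls = {"count": 0}

def expensive_model_call(question: str) -> str:
    """Placeholder for the real OpenAI-backed query."""
    calls["count"] += 1
    return f"answer to: {question}"

@lru_cache(maxsize=256)
def cached_answer(question: str) -> str:
    return expensive_model_call(question)

cached_answer("What is the best plan?")
cached_answer("What is the best plan?")  # second call served from cache
print(calls["count"])  # 1
```

A real deployment would normalize the question text before caching and set a TTL, but the principle is the same: only novel questions should reach the paid API.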
Hi, thank you for the video! I have tried this code to answer questions based on data about a very specific product. It answers well, however, it answers general questions as well (such as "What is Earth?") having no information at all about this in the files provided. How can I make this code answer based on the information that I provided only?
Thanks for watching, Mikhail. Give this video of mine a try, it gives better results - Use Your Locally Stored Files To Get Response From GPT like ChatGPT | Python
Very good explanation. You speak very calmly and make it easy for the audience to follow every step in detail. Keep up the good work.
Thank you. Is there a way to capture user inputs like contact details (email, name etc)? That would be really useful.
Ms. Shweta, your tutorials are superb!! Stands out first from the rest.
Thank you! Cheers!
Hi Shweta - I am planning to follow this tutorial, it looks amazing. I was wondering if you know whether the local data we use is kept local only, or if any of the data gets made public or goes back to OpenAI?
Thanks Maggy, glad you find it useful. The data will go to OpenAI's servers.
What about having the data in a file with real-time embeddings vs. storing the embeddings in a database, for a chatbot that provides information about an application?
Outstanding preparation and presentation. Thanks so much!
Glad you enjoyed it!
Hi Shweta, that was a great tutorial! However, I have a question. Just like you used the custom data from the local disk here, how can we use data from AWS/Elasticsearch? I have a huge database (about 20 million records) which our employees access by querying Elasticsearch, and if I wish to create a custom chatbot trained on that data, how would one achieve that?
Thank you so much Shweta! You got me so passionate on the topic. After completing your tutorial how do I move this into a chatbot that I can bring to my app? I’m stuck
You need a web UI.
@@shweta-lodha Can you drop a tutorial on that? That would be very helpful.
Great video, you are inspiring me to start learning how to code. I am doing this as a project to see if I can follow along with you. Thank you.
Best of luck!
Hi Shweta, does this also accept structured data like CSV or Excel files for working with the data?
Hi Shweta, I tried this code on my end, but vectorIndex.save_to_disk('vectorIndex.json') is giving me an error, so I tried index.storage_context.persist('vectorIndex') instead. This is not creating a vectorIndex.json file on my system; it is creating a folder named vectorIndex.json, inside which it has 4 JSON files: docstore, graph_store, index_store and vector_store. Can you please tell me where I am going wrong?
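That folder layout is expected with the newer persist API: the index is saved as a directory of JSON files rather than one vectorIndex.json. A small sketch to sanity-check such a folder (file names as observed in this era of llama_index; verify against your own output):

```python
# Check that a directory looks like a persisted llama_index storage folder.
import os
import tempfile

EXPECTED = {"docstore.json", "index_store.json",
            "vector_store.json", "graph_store.json"}

def looks_like_persist_dir(path: str) -> bool:
    """True if `path` contains the JSON files the persist API writes."""
    return os.path.isdir(path) and EXPECTED.issubset(set(os.listdir(path)))

# Demo with a fake persist folder:
with tempfile.TemporaryDirectory() as d:
    for name in EXPECTED:
        open(os.path.join(d, name), "w").close()
    print(looks_like_persist_dir(d))  # True
```

Loading it back goes through StorageContext.from_defaults(persist_dir=...) plus load_index_from_storage, as in the updated code earlier in this thread.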
@shweta-lodha, wonderful video and article, thank you. Please keep up the good work. One question: is there a way to implement the same solution with a web-based frontend - Angular, React, etc.?
Thank you so much Naveen, glad you liked it. I was thinking to try this out but couldn't find API endpoints. Please let me know, if you come across any such documentation.
For me, this code was throwing an error. If you update the code by first importing ServiceContext from gpt_index (along with the other stuff you were already importing) and then add the following lines:

#load data
docs = SimpleDirectoryReader(path).load_data()
service_context = ServiceContext.from_defaults(llm_predictor=llmPredictor, prompt_helper=PromptHelper)

#create vector index
vectorIndex = GPTSimpleVectorIndex.from_documents(documents=docs, service_context=service_context)

it should solve your issue.
P.S.: I used this code from another one of your videos :))
Please refer to my How-To-Fix video. All this happened because of breaking changes in the API 😊
A tutor that is looking for the result of a func without calling the func :))))))!!!!!
Haha 😃
ImportError                               Traceback (most recent call last)
Cell In[29], line 1
----> 1 from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
      2 from langchain import OpenAI
      3 import sys

What should I do for this error???
Please refer to my how-to-fix video, entitled "breaking changes". Things are broken because of API enhancements.
I have not found such a detailed explanation of an OpenAI-based chatbot anywhere else, thank you for sharing. Can you please make a video about how I can feed my NLP-based ML model to create a chatbot based on ChatGPT?
Yes, soon
This looks like a brilliant tutorial, thank you. Please excuse the silly question, but where are you editing your code? I opened IDLE, but it must be some other editor/console? Thank you again.
I am using Visual Studio Code (VS Code). You can install it, as it is very easy to use. Please make sure to install the Python and Jupyter extensions from the Extensions panel.
Amazingly helpful, thank you!
Very good tutorial. One question: if I have logs which contain sensitive data like IPs etc., and I feed those logs to the LLM, will my data move out of my system to the LLM provider's server? If so, is there a security threat? I have a presentation to give on using LLMs at my company, and this question is sure to come up; any feedback would be appreciated.
Thanks. Yes, it will go to the model provider's server, and it could be a security threat. If privacy is your concern, then my suggestion would be to use Azure OpenAI.
Hello Shweta Lodha, I have a couple of questions. Firstly, do you have any ideas on how we can reduce the cost of tokens? It would be really helpful if you could make a video explaining any potential solutions. Additionally, as a learner, I'm facing issues with expired APIs. Do you have any suggestions on how I can resolve this problem? Thank you in advance for your help.
I have already created a video on the cost factor, please check my playlist. I didn't understand - what do you mean by expired API?
@Shweta this is brilliant! As usual top notch, like all of your other videos; thank you so much for taking the time to do this. One question here: can you please explain the differences in the parameters if the same had to be achieved through Azure OpenAI, with openai.api_base, openai.api_type, openai.api_version, deployment_name - I have been trying to meddle with this but no luck so far..
Sure, I'll cover this in my Azure OpenAI series, which I started recently :)
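For reference while that series is in progress, these are the extra settings Azure OpenAI typically needs with the pre-1.0 openai SDK. The resource name, deployment name, and API version below are placeholders; take the real values from your Azure portal:

```python
# Config sketch for Azure OpenAI with the pre-1.0 openai Python SDK.
# All <angle-bracket> values and the version string are placeholders.
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"  # endpoint URL
openai.api_version = "2023-05-15"                              # example version
openai.api_key = os.getenv("AZURE_OPENAI_API_KEY")

# Unlike the public API, Azure calls address a *deployment* you created
# in the portal, not a raw model name - e.g. pass your deployment name
# (such as engine="<my-davinci-deployment>") in completion calls instead
# of model="text-davinci-003".
```

The main conceptual difference is that last point: model selection happens when you create the deployment in Azure, and your code references the deployment name.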
Hi Shweta, you did a great job. Can you please help me with how to set up the environment in which you are coding?
Sure. Drop me an email and we will connect
What is the token for? Is the number of questions you can ask the bot limited? Is there a way to make it unlimited?
This is really useful, Shweta! Thank you so much for making such an awesome content.
Glad you think so!
There is no longer a GPTKeywordVectorIndex. There is GPTSimpleKeywordTableIndex() - are they the same?
Thanks a lot, great video. I just want to know: if all my data is stored in SharePoint, how can I feed it into the same mechanism you used? Thanks again for this great tutorial.
Thank you for this detailed tutorial. You asked 2 questions in it; can you please share how many OpenAI credits were used in this complete operation? That would be really helpful.
It should not be that much. I'm uncertain about the exact number, as I sent multiple requests around the same time frame :(
I have implemented the code and it works perfectly, thanks for that. I have a question: it only provides answers from the context. Is gpt_index all about context? Can it not give general information, like who Trump is? If there is a way, please help me with it in this code.
This video is about how to get answers based on context. If you want general info, then you can achieve that simply by using OpenAI directly.
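A tiny routing sketch illustrates the split: document-related questions go to the index, everything else straight to the model. The keyword rule here is deliberately naive and purely illustrative:

```python
# Route questions either to the context-bound index or to a general
# model call. The keyword heuristic is a placeholder; real routers use
# embeddings or an LLM classifier.

def route(question: str, doc_keywords) -> str:
    """Return 'index' for document questions, 'general' otherwise."""
    q = question.lower()
    return "index" if any(k in q for k in doc_keywords) else "general"

print(route("What is the best plan?", {"plan", "pricing"}))  # index
print(route("Who is Trump?", {"plan", "pricing"}))           # general
```

In the "index" branch you would call the query engine from the tutorial; in the "general" branch, a plain OpenAI completion call with the question as the prompt.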
Hi, thanks for the great video. I'm a beginner. How do I proceed now that gpt_index has been replaced by llama_index? I'm getting the error "No module named 'gpt_index'
Yes, there are changes. Please refer my How-To-Fix video, it has the fix.
Hello, this tutorial is really amazing, but I was trying to replicate the work and got an error when executing the function createVectorIndex, saying that __init__() in GPTSimpleVectorIndex got an unexpected keyword argument 'documents'. Did anyone get the same issue?
Please check my How-To-Fix video
Hi Shweta, very useful tutorial, and I followed it exactly, but I'm always getting the response None for any question, even though I kept the document in the same place you mentioned. Could you please help me out with what the reason could be?
Please give the complete/absolute path and try it. If it still doesn't work, then the issue is not with the input file.
Hello, and thanks for the video. Very useful. Is it possible to use a Sphinx-generated website (essentially a handful of HTML pages) as the resource documents instead of a plain txt file?
Hi Mam, does this still work today? Does gpt_index no longer have these functions?
Perhaps not, as a lot of things have changed recently in terms of the SDK and API. Many functions have been renamed and moved around.
@@shweta-lodha Thank you for the reply. Would you have any recent video on the same topic? Or would you know if someone has covered this topic in a simple way, like you did here?
Any plans for coming up with part 2 for this with custom data that needs to be continuously indexed in the background?
Will plan soon. Thanks Prudhvi for the pointer :)
Very good video thank you. I must have blinked and it was over! Which specific function uses langchain?
LLMPredictor
Thank you for this wonderful video🎉. I have a question from when I was trying it: I ran the code to create the vector index, but I cannot find it on the OS disk. There is no error😂
If you didn't provide a complete path, then it must be in your current directory - the one from where you are running your script.
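A quick way to remove the ambiguity is to build an absolute path before persisting. Standard library only; the folder name is illustrative:

```python
# Anchor the persist directory to the current working directory so you
# always know where the index landed.
import os

persist_dir = os.path.join(os.getcwd(), "storage")
print(os.path.isabs(persist_dir))  # True
# Pass persist_dir to storage_context.persist(persist_dir=persist_dir)
# and to StorageContext.from_defaults(persist_dir=persist_dir).
```

Printing persist_dir once before saving also tells you exactly which folder to look in.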
Hi Shweta,
Your tutorial is really helpful, as is your knowledge of OpenAI. I have also emailed you; please reply.
I'll try to respond at the earliest.
Very nicely presented, it's a marvel.
Thanks!
Hi Shweta, this tutorial is amazing! I have one question after running the bot, on my OpenAI usage I am getting text-davinci requests as well as text-embedding-ada-002-v2. Any thoughts on why I am getting the davinci requests?
I didn't understand your question completely. Your embeddings would be using the text-embedding-ada-002-v2 model. Isn't that what you want?
@@shweta-lodha Thank you for your reply. My usage looks like this does this make more sense?
10:55 PM Local time: Apr 26, 2023, 8:55 AM
text-davinci, 2 requests
3,805 prompt + 75 completion = 3,880 tokens
10:55 PM Local time: Apr 26, 2023, 8:55 AM
text-embedding-ada-002-v2, 1 request
8 prompt + 0 completion = 8 tokens
The bot is also able to answer questions about topics that I have not fed it, so I think it is accessing sources other than what I provided.
Hi Shweta, amazing job, hope you can help me: when running vectorIndex = create_index('Knowledge'), I get an error message: "Output exceeds the size limit. Open the full output data in a text editor". Maybe you know why?
Hopefully the 'Knowledge' directory exists in the same path from which you are running your script.
Thank you, Shweta, for this easy and well-formed tutorial! Is there a way to connect this bot to a WordPress site? I have created the custom bot explained in this tutorial using a dataset from a knowledge-base site. Now I'd like to connect that chatbot to the site itself so that users can chat there directly, in addition to seeing articles in the standard way.
Thanks and glad you liked it. Did you try plug-in?
Hello, Shweta Lodha! Great job. I really love your videos. And I have a quick question: Do I have to have a paid GPT plan in order for the code to work?
Not necessary. It would work with your free account too, assuming you have the required credits.
@@shweta-lodha Thank you so much, Shweta. But I will get a subscription today anyhow; I don't want to run out of credits. With love from Tijuana. You are great!
Hi Shweta Lodha! First, thanks a lot for the tutorial! :)
I have a problem with an error. Can you help me?
I wrote the same code and used the same data, but when I try to run it, this error appears:
TypeError: __init__() got an unexpected keyword argument 'documents'
I printed the variable and it contains the book text. The error is on this line:
vectorIndex = GPTSimpleVectorIndex(documents=docs,llm_predictor=llmPredictor,prompt_helper=prompt_helper)
I looked for it on the internet but I didn't understand why there's a problem with the 'documents' argument...
:/
I solved it! :D
For anyone who had the same problem I had, here is my function:
def createVectorIndex(path):
    max_input = 1024
    tokens = 256
    chunk_size = 600
    max_chunk_overlap = 20
    #define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001"))
    #load data
    docs = SimpleDirectoryReader(path).load_data()
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
    vectorIndex = GPTSimpleVectorIndex.from_documents(docs, service_context=service_context)
    vectorIndex.save_to_disk('vectorIndex.json')
    return vectorIndex
The catch is that I removed this line:
prompt_helper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)
Could that be a problem? It worked here without it, and with other texts too.
If it is a problem, could anyone explain that part of the code?
Please have a look at the documentation in case something has changed. At the time of this video, it was all good.
You rock! I just created a video on how to fix this :)
I appreciate your work. Just wondering, does this method work for extracting specific numbers? For example, I want to extract prices for multiple products.
Yes, I tried this and in most of the cases, I got accurate output.
@@shweta-lodha Thank you so much for the reply.
I'm also getting this: "Output exceeds the size limit. Open the full output data in a text editor"
This is because you have restricted the output length in Jupyter. You can open it in Notepad to see the full error. Not a big deal 👍🏻
HI! This is a very helpful tutorial. I have a question:
How would you bring this to a website? I mean, creating a chat website where you can ask questions and the bot answers based on your custom data. Is it possible?
Thanks in advance :)
Glad you find it useful. You can either create a website, or create a widget and plug it into your existing website.
That's a great video thanks for sharing this. I have a question if you don't mind. How different is this method rather than using Open AI APIs for chat completion, embedding and completion? Thanks.
Here you can save the vectored index locally and re-read it.
@@shweta-lodha Thanks a lot for the prompt reply. One last question: under the hood, the method you explained uses the text-embedding-ada-002 model for embedding and text-davinci-003 for completion, right?
Yes, you're right :)
@@shweta-lodha thanks for the response and clarification 😁
@Shweta, awesome video, and thanks for sharing. I have data in a CSV with many text columns and many rows, and I want to build a chat application on top of it. Can you please let me know how I can implement this?
You can read the CSV into memory, chunk it, and you're good to go.
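A minimal, hedged sketch of that idea using only the standard library (the sample data, separator, and chunk size are placeholder assumptions; in practice each chunk would be wrapped in a Document and fed to the indexer):

```python
import csv
import io

def rows_to_chunks(rows, max_chars=200):
    """Join CSV rows into text chunks of roughly max_chars characters."""
    chunks, current = [], ""
    for row in rows:
        line = " | ".join(row)  # flatten one row into a line of text
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current = (current + "\n" + line) if current else line
    if current:
        chunks.append(current)
    return chunks

# Placeholder data standing in for a real products CSV file
sample = "product,price\nWidget,9.99\nGadget,19.99\n"
rows = list(csv.reader(io.StringIO(sample)))
print(rows_to_chunks(rows, max_chars=40))
```

Each resulting chunk is plain text, so the rest of the pipeline (embedding, indexing, querying) stays the same as with a .txt file.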
Hey, any tips on how to fine-tune a model based on a very large PDF document without the "\n" separator to split prompt/completion? I thought maybe have a script break it down at every question mark? Or is there some other way?
I'm going to publish a video on this today. Stay tuned!
Thanks a lot. Can we use llama_index instead of gpt_index?
Yes, you have to, as gpt_index is deprecated. You can refer to my GPT_Index breaking-changes video for that.
Could this learn from new, unlabeled questions provided by the user? Is this a trainable chatbot that could learn something new every time the user asks a new question?
No
Thank you very much. I really hope you will have more useful videos like that.
Thank you, I will
Hi Shweta, please tell us which extensions you used in VS Code for this video.
Jupyter
Amazing tutorial, thank you!!
Hi, I did it with success, really thank you! However, is it possible to start a chat like in the last part of the video, save where it stopped, and come back to it at another moment?
Glad you find it useful. You need to save your chat history and refer to it whenever you start your conversation next time.
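One simple way to do that, sketched with the standard library only (the file name and message format are assumptions; on the next run you would prepend the loaded history to your prompt):

```python
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # placeholder file name

def save_history(history):
    """Persist the conversation as a list of {"role", "content"} dicts."""
    HISTORY_FILE.write_text(json.dumps(history, indent=2))

def load_history():
    """Reload the saved conversation, or start fresh if none exists."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text())
    return []

history = load_history()
history.append({"role": "user", "content": "What is the best plan?"})
save_history(history)

print(load_history()[-1]["content"])  # "What is the best plan?"
```

When the user comes back, load_history() restores the earlier turns so they can be included as context in the next query.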
awesome !! very inspiring, thanks a lot for your work !
Thank you for this great video!
Ma'am, I am from India. After doing tons of research on the ChatGPT API, this is the best! I am working on a Linux platform and have Jupyter Notebook; will it work in that environment?
Yes, it should.
Exactly what I need! thank you
Hi @shweta, can this bot integrate with any database?
Hi Shweta, this is a very helpful tutorial.
I tried this code, but after asking a question it does not respond. I waited for 15 minutes and still got no response.
Are you using VS Code? If so, please check your terminal, your .py file, or the command box.
wonderful. you made it look so simple.
Thank you! Cheers! If you can't make things simple, it means you yourself didn't understand :)
Thank you for the tutorial, it was great! How would I deploy this app to the internet onto my own custom domain?
You need to create a web app for that 😊
Hi, I need your help. I have followed the exact same steps but am facing the issue below.
I ran pip install gpt_index and got the same response as shown in your video, but when I run from gpt_index import SimpleDirectoryReader, I get the error: No module named 'gpt_index'.
I tried uninstalling, re-installing, and checking with ChatGPT, but nothing worked.
Please help!
Please check my most recent How-To-Fix video. It contains the solution
I got this error: "TypeError: BaseGPTIndex.__init__() got an unexpected keyword argument 'documents'". Hoping to get some assistance.
Please refer my How-To-Fix video
Thank you. Very useful.
I am getting this error while running the code:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 vectorIndex = createVectorIndex('Chatbot')
Cell In[14], line 16, in createVectorIndex(path)
     13 docs = SimpleDirectoryReader(path).load_data()
     15 #create vector index
---> 16 vectorIndex = GPTSimpleVectorIndex(documents=docs,llmPredictor=llmPredictor,prompt_helper=prompt_helper)
     17 vectorIndex.save_to_disk('vectorIndex.json')
     18 return vectorIndex
The APIs have changed a bit since I published this video. Please refer to the updated documentation; I am sure it will be easy to fix.
This was awesome, thank you so much!
Glad you enjoyed it!
Getting this error on the PromptHelper() line:
chunk_overlap_ratio must be a float between 0 and 1
Yes, due to recent API changes you need to pass a value between 0 and 1 here.
Is my own data imported to openai system in this case? I don't want to breach the company's data confidentiality code.
Yes
I appreciate your video very much!
Glad it was helpful! Stay tuned for more...
How do I restrict the bot to only answer based on the data provided, or make it return an error if the question is outside the available data?
For this, you need to tweak your prompt.
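One common way to do that is to constrain the prompt text itself. A minimal sketch using pure string formatting (the exact wording is an assumption, not the tutorial's prompt; you would pass something like this as a custom prompt template to the query engine):

```python
# Hedged example of a restrictive prompt template.
TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the answer is not in the context, reply exactly: I don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(context, question):
    """Fill the template with retrieved context and the user's question."""
    return TEMPLATE.format(context=context, question=question)

print(build_prompt("The best plan is Plan B.", "What is the best plan?"))
```

With an instruction like this, questions outside the supplied data should come back as "I don't know" instead of an answer drawn from the model's general knowledge.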
Since gpt-index was replaced by llama-index, I am getting this error during creation of the vector index. Wondering if anyone else is facing the same:
INFO:openai:error_code=404 error_message='Resource not found' error_param=None error_type=None message='OpenAI API error received' stream_error=False
Please check out my How-To-Fix videos. Perhaps they can help you 😊
Hi, I just want to ask: will the data I feed in be exposed to the public, or to OpenAI? Is this safe for business? Thank you.
It would be exposed to OpenAI. If you are concerned about the security aspect, I would recommend you check out Azure OpenAI.
Does it have to be a txt file? What if I have a CSV data table? Also, does it have to be in a directory?
Do you think I should have the script convert the CSV to a text file first?
It is not mandatory to use a text file. You can use other file types too, provided you are able to read them and convert the text to vectors. There's no need to put a single file in a directory, but in that case you have to look for a different function.
Hi thank you for a great video. Is there a way we can combine the code and add gradio ui ?
Yes, you can.
How do we load multiple data files, and will the script and OpenAI remember the prior conversation?
For memory, you need to change this implementation a bit
ValueError: One of documents or index_struct must be provided. What do I need to do about this error?
It looks like it is not able to generate the JSON properly. Please validate your docs.
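A quick way to check whether the persisted index file is at least well-formed JSON (stdlib only; the file name 'vectorIndex.json' comes from the tutorial's save_to_disk call):

```python
import json

def is_valid_json_file(path):
    """Return True if the file exists and contains parseable JSON."""
    try:
        with open(path) as f:
            json.load(f)
        return True
    except (OSError, json.JSONDecodeError):
        return False

print(is_valid_json_file("vectorIndex.json"))
```

If this returns False, the index was never written correctly, which points back at the document-loading step rather than the query step.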
Hey, so I'm wondering how I can create an interface once I have made my language model.
You can go with Flask
Thanks for the amazing video. Can it read a PDF file instead of txt?
Yes, you can! But there is a different class/API/function for doing so.
Hi, it seems I can't find VectorSimpleIndex. Has it been replaced by VectorStoreIndex?
Please check my latest video: How to Fix[GPT-Index]: Fixing GPT-Index Related Broken Pieces
@@shweta-lodha Because it's llama_index now, right?
Absolutely!
I am a beginner.
When using ChatGPT, do I have to get the API key from it, or can I get it from anywhere?
Please get it from Openai.com
I am having issues installing gpt_index. I installed it using pip install, but when I do a pip list I see gpt-index (note the hyphen), and I then cannot import gpt-index. Has anyone faced this problem? Thank you.
gpt_index doesn't exist anymore; it has been renamed. Please check my How-To-Fix video on the breaking changes of gpt_index.
@@shweta-lodha Thanks Shweta. Just saw the other video. Much appreciated.
Shweta, please note that llama_index does not have GPTSimpleVectorIndex;
it has been changed to GPTVectorStoreIndex.
Can you explain how you set up your IDE here?
I am using VS Code and installed extensions for python and Jupyter
Can you help in making a similar bot that provides accurate, step-by-step solutions to accounting questions?
100% accuracy can’t be guaranteed, this is AI 😊
Is it possible to create a simple page with a search box that sends the prompts?
Yes, it can be done
Are there any other vector index options? I'm not getting the expected results.
There are a few vector databases you can use, like Pinecone, Redis, etc.
For this line:
vectorIndex = GPTSimpleVectorIndex(documents=docs,llm_predictor=llmPredictor,prompt_helper=promptHelper)
I am getting the following error:
__init__() got an unexpected keyword argument 'documents'. Any tips?
Please have a look at the documentation in case something has changed. At the time of this video, it was all good.
@@shweta-lodha
You are right, there appears to be a change in the new version of gpt_index. This modified function worked for me. Thanks for this amazing tutorial; I was looking for something like this for a long time.
def create_index(path):
    max_input = 4096
    tokens = 200
    chunk_size = 600 #for LLM, we need to define chunk size
    max_chunk_overlap = 20
    #define prompt
    promptHelper = PromptHelper(max_input,tokens,max_chunk_overlap,chunk_size_limit=chunk_size)
    #define LLM - there could be many models we can use, but in this example, let's go with an OpenAI model
    llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001",max_tokens=tokens))
    service_context = ServiceContext.from_defaults(llm_predictor=llmPredictor, prompt_helper=promptHelper)
    #load data - it will take all the .txt files, if there is more than one
    docs = SimpleDirectoryReader(path).load_data()
    #create vector index
    vectorIndex = GPTSimpleVectorIndex.from_documents(documents=docs,service_context=service_context)
    vectorIndex.save_to_disk('vectorIndex.json')
    return vectorIndex
Thanks, Darth. A lot of people are getting this error, hence I created a new video explaining this change. Cheers!