Hello Donatien! I'm also loving the content. Keep up the great work. :)
The quality of your videos is getting better each time!
Absolutely brilliant. Your video is packed with value, all for free. Amazing. Can't wait for your next one!
Thanks!
Thank you @shintarookamoto3289! Really appreciated it!
Exactly what I am looking for, thank you!
Glad I could help!
So So brilliant!! Love this
Wonderfully straightforward explanations. A question: I want to recursively vector-embed an entire hard drive, all of my files, and as I add files have them appended to the vector embeddings, and vice versa, have documents that have been removed deleted from them.
This may be outside the scope of what you are proposing; however, you look like you know what you are doing.
Obviously PDF files can be large and image-based, so a PDF parser / OCR aspect must be included. Have you attempted this?
Anyone else having a problem installing the dependencies stated at 5:29 in the video? When attempting to add unstructured and sentence_transformer I am getting the same error: "Cannot install build-system.requires for scipy."
Hi Donatien, excellent video! Can you share how the results would differ from creating your own GPT with OpenAI?
Thank you for the great video. I guess it is possible to add new docs to an existing vectorstore. Could you elaborate on this? Thanks!
Yes, it's definitely possible. I just did it like that to avoid having a video that is too long and complicated to digest.
One of the ways is to have two folders: one for unprocessed documents and another for processed documents. You just move the files once they are processed. You can even use AWS S3 buckets if you want to be able to upload them in the cloud.
You will also need a database to store the indexes of the vector store in case you want to implement a feature to remove files.
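A minimal sketch of that two-folder flow, assuming a hypothetical embed_and_index helper standing in for whatever vector store you use; the SQLite table records which index ids came from which file, so a future delete feature can remove them:

import shutil
import sqlite3
from pathlib import Path

UNPROCESSED = Path("docs/unprocessed")
PROCESSED = Path("docs/processed")

# Maps each processed file to the vector ids it produced, so deletions
# on disk can later be mirrored in the vector store.
db = sqlite3.connect("index_tracker.db")
db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, ids TEXT)")

def embed_and_index(path: Path) -> list[str]:
    """Hypothetical helper: chunk, embed, and add the file to your vector
    store, returning the ids of the vectors that were added."""
    raise NotImplementedError

def sync_new_files() -> None:
    PROCESSED.mkdir(parents=True, exist_ok=True)
    for f in UNPROCESSED.iterdir():
        if not f.is_file():
            continue
        ids = embed_and_index(f)
        db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)",
                   (f.name, ",".join(ids)))
        db.commit()
        # Moving the file marks it as processed, so it is never embedded twice.
        shutil.move(str(f), str(PROCESSED / f.name))

Running sync_new_files on a schedule (or from a filesystem watcher) keeps the vector store in step with the folder.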
@DonaAI Thank you so much. I really appreciate your prompt response. As a person with little experience in coding, I found your video extremely helpful and enlightening! That said, a relevant novice question: can I directly use the existing vectorstore file to add new docs? I was worried that it would overwrite the old file with only the new docs. Thanks!
Hey Dona, thanks for the content. One question: can we use it directly on the local system? I have read that there are some specific hardware requirements for running Falcon 7B.
Yes, I think you need at least 16 GB of RAM. I am using an M1 MacBook with 16 GB of RAM and a basic graphics card, and it works pretty smoothly.
Very helpful. Thanks!
@donatien Thorez can you use GPT-4 instead of GPT-3.5, and will this remove the 4,000-token limit?
I think so, yes.
@DonaAI Hi, where are you using GPT-4 in this tutorial?
Hey @donatien, I am getting the error "IndexError: list index out of range" while running the code. The code is exactly the same; the only difference is the PDF file I have loaded. Can you please help point out what could be causing this?
One of the reasons might be the unstructured package. How did you install it? Did you use poetry or pip?
Be sure to have unstructured[all-docs] and not just unstructured.
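For reference, the extras syntax usually needs quoting in the shell; assuming you are installing with pip or poetry:

pip install "unstructured[all-docs]"
# or, with poetry:
poetry add "unstructured[all-docs]"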
Hi @DonaAI, I am facing the same issue, and I have installed the unstructured package exactly the same way you did, with all-docs. I tried with PDF files as well as Word docs. Same error:
"nltk_data] Error loading punkt: "
How much RAM do you need to run these kinds of models locally?
I am trying to build something for 20+ users who might be asking questions at the same time.
Can it read scanned PDFs?
No, but you can use GPT-4 Vision to analyse the PDF and transform it to text, and then pass that in and do the chunking.
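A minimal sketch of that idea, assuming the openai v1 Python client and pdf2image (which needs poppler installed) to render the pages; the model name and prompt are assumptions, swap in whatever vision-capable model you use:

import base64
import io

from openai import OpenAI
from pdf2image import convert_from_path

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def pdf_to_text(path: str) -> str:
    """Render each page to an image and ask a vision model to transcribe it."""
    pages = []
    for image in convert_from_path(path):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: any vision-capable model works here
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": "Transcribe all text on this page."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        pages.append(response.choices[0].message.content)
    return "\n\n".join(pages)  # this text can then be chunked and embedded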