thank you so much.
If it weren't for your videos, my grad school life would've been toast
Can you post the link to your jupyter notebook?
I have a quick query.
I have my own retriever / table extractor, since I am working on PDF files of papers, and it outputs CSV files.
Can I then use TAPAS directly to generate the response I need? Or do I still need a retriever like the MPNet model?
Also, if I upload a CSV, could it be read into a dataframe, embedded with a sentence transformer, stored in ChromaDB, retrieved for our query using cosine similarity, and then passed to TAPAS to answer the query? Can you please upload a video for this use case?
James - you keep putting out bangers bro. Thank you!!
thanks man 🙏
@@jamesbriggs thank you! Are you on LinkedIn? Also, any new courses or books coming out? Take my money!
@@temiwale88 yeah I'm here www.linkedin.com/in/jamescalam -- working on image/multi-modal ebook at the moment, it's actually completely free too ;) www.pinecone.io/learn/image-search/
Can we use TAPAS as the retriever model as well?
James, can you do the same with ChromaDB, since it's open source, unlike Pinecone?
Hello James, thank you for this video. Do you think it's going to work if we have tables with only numbers, like an accounting report or financial report? If yes, should we specify the column names or something similar to help the model find answers, since there is no text?
Can this answer from relational tables (multiple related tables), instead of individual tables one at a time?
This is the best content on youtube now 👍
So, how do we use this for our own proprietary tables in a database? I mean, how do you create vector embeddings of tables in a vector database? Thank you in advance.
Quite surprising there is no explanation on the new format 'bodegas'!
As always, you are a great tutor and brother. God bless you, and thank you for your time and help 🙏
Any time, thank you!
Just curious, can you ask language models to do more complex calculations? Like find regressions and variable selections?
Thank you! Just gold gems since day 1
Great video again! Say we have a ton of Excel/csv files and we want to extract all the sensitive information. Would this be a good solution?
Hi James, great video. I have several tables with 20,000+ records each which I'd like to ask questions to. Is this design suitable if you have a small number of large tables?
Hi James,
Thanks for sharing. I have been interested in the table Q&A use case and using it for bank statements and utility bills.
Can you recommend a table-to-text model that I could use to extend this example?
Cheers!
Chris
Hi Chris, I'd actually recommend trying with TAPAS. If you can format the bank statements and bills as a typical table, I think it should work
I'm sorry, I forgot the most crucial part of my question. Once I get the answer from TAPAS, I want to generate text to answer the question. Do you have any recommendations for NLG models that could help with this task?
I have tried OpenAI text-davinci-003 to parse the NL query, and it gave much better results than TAPAS, which is bound by its 512-token length limit when using the whole Excel table as context.
Could you give some more context?
Nice, but you failed to state that you are affiliated with Pinecone.
Hey James! I love this video. Could I take the answer at the end and leverage GPT3 API with a preface for it to read the function and question? The goal would be to make the answer come out more naturally for our business users.
you might be able to feed it into GPT-3 directly, but I haven't tested this
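A minimal sketch of that idea, untested against any real GPT-3 endpoint: it only builds the prompt text and leaves the API call out. The `tapas_result` dict is an assumption modelled on the kind of output a table QA reader returns (selected cells plus an aggregation label), and `build_prompt` and the sample values are hypothetical names for illustration.

```python
from typing import Dict, List, Union


def build_prompt(question: str, tapas_result: Dict[str, Union[str, List[str]]]) -> str:
    """Turn a table QA result into a prompt asking an LLM for a natural sentence."""
    # Assumed keys: "cells" (list of selected cell values) and "aggregator".
    cells = ", ".join(tapas_result.get("cells", []))
    aggregator = tapas_result.get("aggregator", "NONE")
    return (
        "Rewrite the table lookup result below as one plain-English sentence.\n"
        f"Question: {question}\n"
        f"Selected cells: {cells}\n"
        f"Aggregation: {aggregator}\n"
        "Answer:"
    )


# Hypothetical reader output for illustration only.
tapas_result = {"cells": ["120", "80"], "aggregator": "SUM"}
prompt = build_prompt("What was total revenue in Q1 and Q2?", tapas_result)
print(prompt)
```

The resulting string would then be sent to whichever text-generation model you use; the wording of the instruction line is a design choice worth experimenting with.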
What is an alternative to Pinecone?
Faiss or Weaviate, but they require more engineering effort and don't offer the 5M vector free plan that Pinecone does
@@jamesbriggs Thank you. I'm looking for on-premise database software that I can install myself.
@@sanatan_yogi_org Did you find any good alternative to Pinecone that works as an on-premise DB software solution?
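For a small number of tables, the retrieval step itself can be sketched without any vector database. The example below is a brute-force cosine similarity search in plain Python, not a replacement for Pinecone, Faiss, or Weaviate at scale. The table names and the three-dimensional vectors are made up for illustration; in practice the vectors would come from the embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical table embeddings (toy 3-d vectors, not real model output).
table_vectors = {
    "revenue_by_year": [0.9, 0.1, 0.0],
    "employee_list": [0.1, 0.8, 0.3],
    "office_locations": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

best = top_k(query_vector, table_vectors, k=1)
print(best[0][0])
```

This scans every vector on each query, so it only makes sense while the collection is small enough to fit comfortably in memory.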
Great video! I was wondering, what if we have a single large table and run queries on that? Do we have to further fine-tune the TAPAS model? Please make a video on that...
No need, you can just use the single TAPAS reader step :)
@@jamesbriggs I am using a large CSV file of about 1000 rows, and it throws an error: "IndexError: index out of range in self". It would be great if you could suggest an appropriate solution
Thanks for the informative video. I am working on PDFs with tables, where we apply table detection using image processing, followed by table QA. 1. Is there another way to get the tables out of the PDFs in .csv format, so that table QA can be applied afterwards? 2. Can you suggest some inputs?
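One likely cause of this error, given the 512 length limit mentioned elsewhere in this thread, is that a 1000-row table produces more tokens than the model can index. A possible workaround, sketched here under that assumption, is to split the table into smaller row chunks and query each chunk separately. `chunk_rows` and the `max_rows=50` value are hypothetical; the right chunk size depends on how wide the table is.

```python
from typing import Dict, Iterator, List

Row = Dict[str, str]


def chunk_rows(rows: List[Row], max_rows: int = 50) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most max_rows rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


# Hypothetical 1000-row table; cell values are strings, as table QA models expect text.
rows = [{"id": str(i), "amount": str(i * 10)} for i in range(1000)]

chunks = list(chunk_rows(rows, max_rows=50))
print(len(chunks), len(chunks[0]))
```

Each chunk could then be turned into its own dataframe and passed to the reader. Questions that need aggregation across the whole table (sums, counts) would need the per-chunk answers combined afterwards, which this sketch does not do.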
hey, you should be able to extract tables from PDFs with libraries like PyPDF or PyTesseract, or if you're willing to pay I'd definitely recommend ABBYY FineReader. From there you should be able to reformat the tables to CSV and then follow the same process we did here.
For (2) I think you mean can you adjust the tables based on what table QA is saying? In that case, yes I'm sure there's a way, it would just require some additional logic on top of what is already there
@@jamesbriggs Thanks for the replies. I have tried the PyPDF and Camelot libraries for table extraction from the PDF, but I feel the table rows are not properly detected. So I am using image processing for table row detection, followed by Tesseract OCR. I believe this is the only way. If you know of others, kindly suggest them.
Would love for you to share the notebook
Hi you can find it here github.com/pinecone-io/examples/blob/master/search/question-answering/table-qa.ipynb :)
Thanks James.
very cool video!
thanks!