8. OpenAI Financial Advisor Q&A Embeddings - Python Tutorial
- Published: 23 Sep 2024
In this video, we transcribe a financial podcast using Whisper and apply OpenAI word embeddings to the transcript to create a question-answering system. If you like this type of content, I am starting a spinoff channel this year focused on AI in music, gaming, and design at / @parttimeai , so please subscribe; content is coming there soon.
Like the video? Support my content by checking out Interactive Brokers using the link below:
www.interactivebrokers.com/mkt/?src=ptlPY1&url=%2Fen%2Findex.php%3Ff%3D1338
Notebook: colab.research.google.com/drive/1cVQNg2-zGQb7qZXFECG6kyq5yVIHyf5o?usp=sharing
Question Sheet (Raw): docs.google.com/spreadsheets/d/1z6DVJPU1DS4J0OhsPkauRPu_2-HfiPjHix1hpaKTBzE/edit?usp=sharing
Question Sheet (Transformed): docs.google.com/spreadsheets/d/13hTC5wV84-M7_nw_yC7LayRdLbwHAkm96moY6ea4qWU/edit?usp=sharing
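The Q&A flow the video describes retrieves the most relevant transcript section, then asks a completion model to answer using only that section. A minimal sketch of the prompt-assembly step is below; the function name, variable names, and prompt wording are illustrative, not the notebook's exact code.

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble a completion prompt that grounds the model in a
    retrieved transcript section (an illustrative sketch, not the
    notebook's exact prompt)."""
    return (
        "Answer the question using only the context below. "
        'If the answer is not contained in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Q: {question}\n"
        "A:"
    )

prompt = build_prompt(
    "The advisor recommends maxing out tax-advantaged accounts first.",
    "What should I fund first?",
)
print(prompt)
```

The resulting string would then be sent to the completions endpoint; constraining the model to the supplied context is what keeps answers tied to the podcast rather than the model's general knowledge.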
Hey Larry, I have a question: wouldn't it be much faster and less GPU-intensive (and maybe more accurate) if, instead of passing the whole video through Whisper to transcribe it, you just got the text transcription directly? Most videos already have transcriptions, or ones auto-generated using Google's speech-to-text, which is pretty accurate. Any reason you prefer the Whisper route over just downloading the video's transcription directly? Thanks!!
Thank you so much for this content. I was wondering how to structure WAV files: is there a module I can use with OpenAI Whisper that will give me some sort of data frame, rather than just a string of words?
This is by far the best OpenAI tutorial on YouTube, period. This code opens up endless possibilities. Thanks again.
Holy crap! Larry this is amazing! Nicely done
Oh shit, you're here! How would you feel about having your voice cloned for science?
@@parttimelarry lol if this is how I find immortality let's do it ;) I definitely want to learn more about what you've created here
I am falling in love with these openAI series of yours. Kudos.
Thanks a bunch Larry! Bro you rock my man you’re a true gift from god please don’t stop 🙏
this is one of the best video, so far Larry, thank you
Hey Larry, I just found your yt channel today. I am building something like this bro you are my inspiration now. Respect ++
Wow, I need to watch this a couple more times before it sinks in, but can see some great uses for it. Thanks Larry , the way you are able to use all these different technologies and piece them together is exceptional. Thanks again!
Funny enough I was thinking of doing this yesterday night... woke up to see this video pop up in my notifications lol. Thank you!!
Very impressive, you have opened the gates of ideas. I'm proud of how you have transformed yourself, and us as well, from pandas-based indicators to the future of AI-based decision making.
8:02 100% on the mark Larry. Great project as always.
I was looking everywhere for a video like this! About to use GPT-3 on private information to help with search across different documents. Thank you!!
thanks for sharing all these beauties with us, trying to make the world a better place. Keep up your amazing work!
PTL.. you provide real value! Thanks
thanks for such an amazing video. This is very clever and I like your technique of combining whisper API with Embedding and Completion API. This is really a great insight. Thanks a ton.
Best video in the series! I like how you combined the different services and built something amazing. Looking forward to the pipeline video! Keep up the good work!
Thanks for such clear English speech. My ability to understand spoken English is a bit limited, but I can understand 99% of your video tutorials without using captions! Indeed, your style of showing how to do these kinds of projects is spectacularly easy to understand. You're a great teacher! Thanks for these tutorials about OpenAI. They are the best I have seen in weeks.
Thanks for watching, much more to come!
Indeed, Larry, have you tried doing the "completions" part with a model other than davinci? I'm asking just because of the price. You know.
Incredible. Thanks for the awesome educational content
Thank you Larry, you're a beast ✊
I've gotten as far as you show in this video.
Now I'm trying to figure out:
- splitting data into chunks to fit max tokens - OpenAI has a great ipynb example for this.
- how to loop through all data in chunks to fully complete the results of a question.
- openai functions to improve consistency of output format.
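The first item in the list above, splitting data to fit a model's token limit, can be sketched roughly as follows. This is an assumption-laden toy version: it approximates token counts by whitespace-separated words, whereas a real tokenizer such as tiktoken (which OpenAI's cookbook notebook uses) gives exact counts.

```python
def chunk_words(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated
    words. Words approximate tokens here; a real tokenizer would be
    used for exact limits."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

chunks = chunk_words("one two three four five six seven", max_tokens=3)
print(chunks)  # → ['one two three', 'four five six', 'seven']
```

Each chunk can then be embedded separately, which is also the basis for looping over all data, as the comment's second item describes.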
PTL great video and series. You are a brilliant lad! Thank you.
Part Time Larry, I am a full-time fan!
I’m about to use the same process in my thesis project… when it’s done I’ll show you!! BTW I think you’ll be named in the bibliography!! Thanks again for the great content you continuously provide!!
Hi, how did it go?
@@megaedwin2363 Still in progress!! Regarding this part of the project, I'm wrapping everything up in a Discord bot that will answer as a tutor in an online course... I'll keep you posted on the final results!!
@@marcysalty Great!!! Keep me posted
I love this content, this is very useful, thank you
Very nice! Can't wait for the next one.
Your lessons are incredible! Thanks for sharing
Great video!!! Thanks for awesome lessons
You are giving me a lot of ideas, love your efforts.
Love your lessons Larry. Thank you for your videos ❤
This was incredibly good. As someone just learning to code, I was able to follow along. Thank you so much for putting this together!
Amazing tutorial!! Thank you so much. Now I can't wait for your next part. Any ideas?
2nddddd!!! Letsss gooo made itt!
Missed your content, Larry! Hope to see more often. If you do some open source alternative to OpenAI at some point that will be great too.
BRILLIANT
It DOES dodge questions. I performed (independently) a Top 25 NBA players of all time analysis based on ... lots of stats, using gaussian distribution, skewness, kurtosis; accounting for career-length differences, utility value, post season, accomplishments, and outliers. In the end- MJ was the GOAT, Wilt at #2, LeBron at #3, and Kobe at #4. I asked ChatGPT to access basketball-reference OR wikipedia. It responded by saying that it DID have that ability, but then neglected to do so and analyze the statistical categories. It refused OVER and OVER saying that "subjective biases exist that can skew perception of this complicated question." I asked it to IGNORE that and just crunch the numbers. IT AGAIN refused to do so. WOW
Did you give ChatGPT thumbs down feedback so that hopefully OpenAI will fix it?
@@NickWindham Oh absolutely. I would be surprised if anything changes. I asked - dispassionately and objectively - over and over for it to "just focus on these categories." To which it replied about perceptions and biases. At this point, I think it is just a Google Copy and Paste ... thing. I don't see it doing anything more than that. No actual analysis or connecting the dots.
Awesome value !
Wow, this is insanely valuable, can't believe you are sharing it for free, thank you very much! 🙏 One question though: you mentioned that you can work with the OpenAI API on your personal confidential information, but if I am calling the API for embeddings, isn't that sending my information to OpenAI for vectorization, outside the boundaries of my organization?
aww🎉some
good vid
great content
Curious if you could share high level best practices for getting embeddings? Depending on the use case I'd imagine how you split up the text would be really important and also what are the technical requirements for the input (e.g. input mustn't have white spaces or line breaks?)... Thanks for all these tutorials, they're amazing!
Damn, Larry, are you building these for companies yourself? Feels like you're the tip of the spear here.
Thanks!
Thank you! This is very kind!
Great content Larry!! 👏💥 I'm really looking forward to seeing an open-source alternative to OpenAI for doing this kind of project. How sure are we that the content we provide to the AI model stays private? I mean, OpenAI has access to all content fed into their system regardless of whether it is embedded, or in chatGPT format.
This is really great sharing! Thanks man!! One quick question: I noticed that not every episode has a detailed description where you can find when the specific questions are raised. How can you get the questions' start times in such cases? Just curious. Thanks again!
I wonder how long it actually took to build behind the scenes.
Hi Larry, this no longer works: the latest Python openai library no longer contains embeddings_utils, so the notebook breaks.
I don't know if you could upload an update of this video, or use other embeddings, for example with Azure.
I send you a greeting; you have helped motivate me a lot to study for a professional career. I hope to share a coffee someday. 😀
A thought I have is how podcasters grow over time. Is there a way to weight recent content as more important than old content? While still maintaining all the content in the database
Thanks a lot for this really great content. It's quite hard for the novice but extremely interesting. One thing that would be really awesome is if you made the same embedding model for your own videos so that we can ask it aka AI-Larry questions about your how-to videos. For example, I have a series of podcasts that I would like to transcribe and embed and they don't have the perfect time stamps in the descriptions. How would I go about creating the Q&A CSV file for those episodes?
Thanks!
MW
Great content as always! Just one question: is there a following video about building the user web interface?
Great video! it truly blows my mind! One questions: How is this different from fine-tuning GPT model? Have you tried fine-tuning using the same dataset and compare the results? might be interesting to look into that.
Hi, I wanted to start working on your "Full Stack Trading App Tutorial", but I'm missing the lectures on your homepage!
Where is the old content of your homepage gone?
What's in the mug dude? Hahaha just kidding thanks for the video.
When is part 9 coming?!!!
This is great! Can you guide me with what I would need to start as in my own server etc... Thanks
Love the videos! Side note... I was wondering if you could do a video on the advanced-trade-api I believe this is replacing Coinbase pro's api? Possibly in python? :)
I have made some videos on CCXT before, which supports many exchanges. It looks like there are some recent code merges for CCXT on Github that support the new Coinbase stuff. So it should just be a configuration option.
I would have thought you would compute the embedding for the 'question' column and do a cosine similarity between that and the embedded form of your question, then sort by similarity and take the 'context' corresponding to the closest match, since you want to match question with question.
Many of the timestamps in the video are not full questions. There are many timestamps with 1 or 2 word titles like "Market sell-off", "Tax Strategy", etc, so I thought it made sense to check a combination of the question + the answer in case a user question was answered but wasn't directly contained in a timestamped question.
@@parttimelarry Maybe a third useful way would be to add 2 more columns to the CSV of contexts: one to store an ABSTRACT of the CONTEXT column (the transcribed audio between 2 time marks) generated by davinci, and another column with the EMBEDDING of that ABSTRACT 😁
It would probably make it clearer which row (Q&A) is closest to a new user question. And using these "abstract embeddings", the request to the davinci model would be much shorter and therefore much cheaper. It would need to be tested, of course. Maybe you would lose too much useful information for building the answers... who knows.
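The retrieval step this thread is discussing, ranking stored rows by cosine similarity to the user's question embedding, can be sketched in a few lines. The toy 3-dimensional vectors and function names below are illustrative; real OpenAI embeddings have many more dimensions and would come back from the embeddings API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_contexts(question_vec, rows):
    """rows: list of (context_text, embedding) pairs; returns rows
    sorted by similarity to the question embedding, best first."""
    return sorted(rows, key=lambda r: cosine_similarity(question_vec, r[1]),
                  reverse=True)

rows = [
    ("tax strategy answer", [0.9, 0.1, 0.0]),
    ("market sell-off answer", [0.0, 0.2, 0.9]),
]
best = rank_contexts([1.0, 0.0, 0.0], rows)[0][0]
print(best)  # → tax strategy answer
```

Whether the stored embedding covers the question alone, the question plus answer, or an abstract (as suggested above) only changes what text gets embedded; the ranking step itself stays the same.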
Thoughts on how to do this without question and answer in the original format? Can we just feed in walls of text then extract question and answers from that?
You don't necessarily need questions in advance, but you need to divide up your text in a logical way. If you check the OpenAI cookbook, they have an example using Wikipedia articles and asking questions about the Olympics. In this case, they use the headings + paragraphs to divide the text and find the section that is most relevant to the question.
When you have a large amount of text, how do you chunk it by sentences and fit within the max tokens? Also, if we could have some overlap, like including the last sentence from the previous chunk in the next chunk, it would give the model better context.
There are some great tools that handle common patterns like this that I need to cover - Langchain and Llama-Index.
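The overlap idea from the comment above, carrying the tail of one chunk into the start of the next, is roughly what those libraries' text splitters do. A minimal sketch, assuming sentences have already been split (naive period-splitting in practice; real splitters are more robust):

```python
def overlapping_chunks(sentences, size=3, overlap=1):
    """Group sentences into chunks of `size`, repeating the last
    `overlap` sentences of each chunk at the start of the next.
    Assumes 0 <= overlap < size."""
    step = size - overlap
    chunks = []
    for i in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[i:i + size]))
        if i + size >= len(sentences):
            break  # the final chunk already covers the tail
    return chunks

sents = ["S1.", "S2.", "S3.", "S4.", "S5."]
print(overlapping_chunks(sents, size=3, overlap=1))
# → ['S1. S2. S3.', 'S3. S4. S5.']
```

The overlap costs a few extra tokens per chunk but keeps a sentence's surrounding context available no matter which chunk a similarity search lands on.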
hey Larry, what would happen if I don't use the timestamp and just use the whole transcribed podcast as the source data? Would it just be more expensive and slower or would the resulting answer be different?
1st to like today 😀
Why does it say "Streaming data not found for the video. Unable to download." even though the YouTube video is available?
So do I need to know the timestamps before I can do this? Or is there a way to do this without knowing the timestamps of the questions/answers?
ChatGPT's knowledge cutoff is September 2021. Not 2019.
How can we be sure our internal data being fed into the model isn’t saved somewhere with openai?
Great video!!! Quick question: would it be more accurate to calculate embeddings only for the questions in the file, and then select the context whose question is closest to the question the user is asking?
Given a YouTube playlist, how did you extract all the playlist URLs?
This can be done in a few ways: 1) with the YouTube API, 2) with some screen scraping, or 3) by hand :). I can touch on this when I show how to process in batch.
@@parttimelarry Right click > Inspect (or Ctrl+Shift+I), then in the console start auto-scrolling so the whole playlist loads:
var scroll = setInterval(function(){ window.scrollBy(0, 1000) }, 1000);
Once you reach the bottom, stop scrolling and print each video's title and URL:
window.clearInterval(scroll);
console.clear();
urls = $$('a');
urls.forEach(function(v){ if (v.id == "video-title") { console.log('\t' + v.title + '\t' + v.href + '\t') } });
I was wondering if I could get a little help. I have successfully added an embedding column to my data sheet, but when I embed my question and try to sort my data sheet by similarity, I run into the following error:
numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('
The code seems to break only when I compute similarities on a df loaded from the CSV file with embeddings. When I use the data frame that produced the CSV directly, finding similarities works.
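A common cause of that ufunc error (an assumption about this setup, since the full traceback is cut off) is that a CSV round-trip turns the embedding lists into strings like "[0.1, 0.2]", so multiplying them fails after reload even though the original data frame worked. One common fix is to parse the column back into real lists with the standard library's ast.literal_eval:

```python
import ast

# Simulate what a CSV round-trip does: the list becomes its string repr.
stored = "[0.1, 0.2, 0.3]"

embedding = ast.literal_eval(stored)  # back to a real list of floats
print(type(embedding).__name__)  # → list

# With pandas, the same idea applied to a whole column would be:
# df["embedding"] = df["embedding"].apply(ast.literal_eval)
```

The "df" column name above is hypothetical; adjust it to whatever your embedding column is called.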
This is great! Do you think ChatGPT API could be applied to these use cases instead of embedding?
I have a PDF file with about 20k words.
How can I make a chatbot that will answer questions whose answers are inside the PDF?
I tried to play with the OpenAI GPT playground, but it has a limit of 4,096 tokens.
Please give me tips.
I never subscribe to any YouTube channel; too much noise. Yours, I did.
Hello Larry, can i do a similar thing using Ruby? 😢
What is COMPLETIONS_MODEL? Is that a custom model you trained?
It's just a variable that I defined closer to the top of the notebook. I set it to text-davinci-003 (the OpenAI model) but you can change the value there to use a cheaper model if desired.
COMPLETIONS_MODEL = "text-davinci-003"
@@parttimelarry d'oh! 😉 Excellent vid as always, super stoked for your new AI channel
Nice. The only issue is that GPT-3 calls will be expensive with that much text; it will be totally unscalable, I think.
Thanks for the feedback, this gives me an idea for a cost calculation video. The embeddings calls I use in the project are very cheap. Will do the full batch of podcasts and show my costs for a large project. Also planning to do some projects with open source alternatives to compare results. Cheers.
@@parttimelarry I think GPT-3 is overkill for this task. One reason embeddings are great is the cost. Summarizing text should be fine for Curie, or an even cheaper model. BTW, what would you do if the text was a book without distinct Q&A? I guess you need to determine how to split the text. You could automate the splitting using another embedding model, maybe: make an embedding of each sentence and then split into contexts based on the similarity of sentences? Hmm, interesting.
@@rshrott For books/documentation, I suppose it could be useful to treat paragraphs the same way Larry has worked with the CONTEXTS in this video (sentences between 2 time marks), and to index the CHAPTER of those paragraphs the same way Larry has indexed the YT URLs 😁
Indeed, if you think about it, "knowing the question" is not so important. You only need to find the text most related to the question the user asks, and then ask the model to use it as context to return an answer.
Sincerely... I want so much to try all this myself!! Super! The applications are endless... finally WE HAVE A SEMANTIC TEXT CALCULATOR!!! It has been the dream for those of us who have been in AI since the '90s.
@@parttimelarry Yes, cost calculation becomes extremely interesting. It looks like one could extract the text, summarize it with a cheaper model, and then do the embedding. That way you also limit context, since you will paste less text into the prompt.
Thanks for the best video on embedding and specifically how to feed the context back to the model (and I watched half a dozen)
6:25 The answer sounds very ChatGPT: it said nothing at all to avoid the risk of being wrong.
Shouldn't you say database instead of model? Because you aren't training, right... just collecting the info and then prompting with it, right?
Totally said model a few times in the last few videos where it wasn't appropriate. Noticed this later but hard to go back since it takes a long time to record and edit.
Hello, can someone help me? I have this error:
KeyError: 'streamingData'
stream = youtube_video.streams.filter(only_audio=True).first()
stream.download(filename='financial_advisor.mp4')