8. OpenAI Financial Advisor Q&A Embeddings - Python Tutorial

  • Published: 23 Sep 2024
  • Like the video? Support my content by checking out Interactive Brokers using the link below:
    www.interactiv...
    In this video, we transcribe a financial podcast using Whisper and apply OpenAI word embeddings to the transcript to build a question-answering system. If you like this type of content, I am starting a spinoff channel this year focused on AI in music, gaming, and design at / @parttimeai , so please subscribe; content is coming there soon.
    Notebook: colab.research...
    Question Sheet (Raw): docs.google.co...
    Question Sheet (Transformed): docs.google.co...

Comments • 105

  • @parttimelarry
    @parttimelarry  1 year ago +14

    Like the video? Support my content by checking out Interactive Brokers using the link below:
    www.interactivebrokers.com/mkt/?src=ptlPY1&url=%2Fen%2Findex.php%3Ff%3D1338
    Notebook: colab.research.google.com/drive/1cVQNg2-zGQb7qZXFECG6kyq5yVIHyf5o?usp=sharing
    Question Sheet (Raw): docs.google.com/spreadsheets/d/1z6DVJPU1DS4J0OhsPkauRPu_2-HfiPjHix1hpaKTBzE/edit?usp=sharing
    Question Sheet (Transformed): docs.google.com/spreadsheets/d/13hTC5wV84-M7_nw_yC7LayRdLbwHAkm96moY6ea4qWU/edit?usp=sharing

    • @thebicycleman8062
      @thebicycleman8062 1 year ago +1

      Hey Larry, I have a question: wouldn't it be much faster and less GPU-intensive (and maybe more accurate) if, instead of passing the whole video through Whisper to transcribe it, you just got the text transcription directly? (Most videos already have transcriptions, often auto-generated using Google's speech-to-text, which is pretty accurate.) Any reason you prefer the Whisper route over just downloading the video's transcription directly? Thanks!!

    • @erikstillman7336
      @erikstillman7336 1 year ago

      Thank you so much for this content. I was wondering how to structure WAV files: is there a module I can use with OpenAI Whisper that gives me some sort of data frame, rather than just a string of words?

  • @jerrywang3225
    @jerrywang3225 1 year ago +24

    This is by far the best OpenAI tutorial on YouTube, period. This code opens up endless possibilities. Thanks again.

  • @bencarlson2587
    @bencarlson2587 1 year ago +4

    Holy crap! Larry this is amazing! Nicely done

    • @parttimelarry
      @parttimelarry  1 year ago

      Oh shit, you're here! How would you feel about having your voice cloned for science?

    • @bencarlson2587
      @bencarlson2587 1 year ago

      @@parttimelarry lol if this is how I find immortality let's do it ;) I definitely want to learn more about what you've created here

  • @krissn8111
    @krissn8111 1 year ago +15

    I am falling in love with this OpenAI series of yours. Kudos.

  • @TexasNation897
    @TexasNation897 1 year ago

    Thanks a bunch Larry! Bro you rock my man you’re a true gift from god please don’t stop 🙏

  • @supriyadevidutta
    @supriyadevidutta 1 year ago +1

    This is one of your best videos so far, Larry. Thank you.

  • @kamalswami8374
    @kamalswami8374 3 months ago +1

    Hey Larry, I just found your YouTube channel today. I am building something like this; bro, you are my inspiration now. Respect++

  • @frankgiardina205
    @frankgiardina205 1 year ago +7

    Wow, I need to watch this a couple more times before it sinks in, but I can see some great uses for it. Thanks Larry, the way you are able to take all these different technologies and piece them together is exceptional. Thanks again!

  • @charlieevert7666
    @charlieevert7666 1 year ago +2

    Funny enough, I was thinking of doing this last night... then woke up to see this video pop up in my notifications lol. Thank you!!

  • @prabhu_patil
    @prabhu_patil 1 year ago +4

    Very impressive, you have opened the gates of ideas. Proud of how you have transformed yourself, and us as well, from pandas-based indicators to the future of AI-based decision making.

  • @AndrewMagee01
    @AndrewMagee01 1 year ago +1

    8:02 100% on the mark Larry. Great project as always.

  • @yony2k2
    @yony2k2 1 year ago +1

    I was looking everywhere for a video like this! About to use GPT-3 on private information to help with search across different documents. Thank you!!

  • @nicolatje
    @nicolatje 1 year ago +1

    Thanks for sharing all these beauties with us and trying to make the world a better place. Keep up your amazing work!

  • @TheRealHassan789
    @TheRealHassan789 1 year ago +4

    PTL.. you provide real value! Thanks

  • @adityakadam2256
    @adityakadam2256 1 year ago +1

    Thanks for such an amazing video. This is very clever, and I like your technique of combining the Whisper API with the Embeddings and Completions APIs. This is really a great insight. Thanks a ton.

  • @tohando
    @tohando 1 year ago +2

    Best video in the series! I like how you combined the different services and built something amazing. Looking forward to the pipeline video! Keep up the good work!

  • @SergiRodriguesRius
    @SergiRodriguesRius 1 year ago +1

    Thanks for such clear English speech. My ability to understand spoken English is a bit limited, but I can understand 99% of your video tutorials without using captions! Indeed, your style of presenting these kinds of projects is spectacularly easy to understand. You're a great teacher! Thanks for these tutorials about OpenAI. They are the best I have seen in weeks.

    • @parttimelarry
      @parttimelarry  1 year ago +2

      Thanks for watching, much more to come!

    • @SergiRodriguesRius
      @SergiRodriguesRius 1 year ago

      Indeed, Larry, have you tried doing the "completions" part with a model other than davinci? I'm asking just because of the price. You know.

  • @ChrisWi88
    @ChrisWi88 1 year ago +4

    Incredible. Thanks for the awesome educational content

  • @kingtrippy5006
    @kingtrippy5006 1 year ago +2

    Thank you Larry, you're a beast ✊

  • @simple-security
    @simple-security 1 year ago

    I've gotten as far as you show in this video.
    Now I'm trying to figure out:
    - splitting data into chunks to fit max tokens - OpenAI has a great ipynb example for this.
    - how to loop through all the data in chunks to fully complete the results of a question.
    - OpenAI functions to improve consistency of the output format.
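The chunk-splitting step in the first bullet above can be sketched without any API calls. In this sketch a word count stands in for the token count (a real version would measure tokens with `tiktoken`, and the 100-word limit is an arbitrary assumption):

```python
def split_into_chunks(text, max_words=500):
    """Greedily pack sentences into chunks of at most max_words words."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(". ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(". ".join(current))
    return chunks

text = "First sentence here. " * 300  # toy input: 300 three-word sentences
chunks = split_into_chunks(text, max_words=100)
print(len(chunks), max(len(c.split()) for c in chunks))  # → 10 99
```

Each chunk can then be embedded separately, and the per-chunk answers merged in a second pass, as the second bullet suggests.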

  • @trainspotting02
    @trainspotting02 1 year ago +1

    PTL great video and series. You are a brilliant lad! Thank you.

  • @jayasimhanmasilamani9078
    @jayasimhanmasilamani9078 1 year ago

    Part Time Larry, I am a full-time fan!

  • @marcysalty
    @marcysalty 1 year ago +4

    I’m about to use the same process in my thesis project… when it’s done I’ll show you!! BTW I think you’ll be named in the bibliography!! Thanks again for the great content you continuously provide!!

    • @megaedwin2363
      @megaedwin2363 1 year ago

      Hi, how did it go?

    • @marcysalty
      @marcysalty 1 year ago

      @@megaedwin2363 Still in progress!! For this part of the project, I'm wrapping everything up in a Discord bot that will answer as a tutor in an online course… I'll keep you posted on the final results!!

    • @megaedwin2363
      @megaedwin2363 1 year ago

      @@marcysalty Great!!! Keep me posted

  • @keridince
    @keridince 1 year ago +1

    I love this content, it is very useful, thank you

  • @macrobody
    @macrobody 1 year ago +1

    Very nice! Can't wait for the next one.

  • @ChefRodKnight
    @ChefRodKnight 1 year ago +1

    Your lessons are incredible! Thanks for sharing

  • @JOANCARLESAGUILAR
    @JOANCARLESAGUILAR 1 year ago +1

    Great video!!! Thanks for the awesome lessons

  • @robin7769
    @robin7769 1 year ago +2

    You are giving me a lot of ideas, love your efforts.

  • @rotoboter
    @rotoboter 1 year ago +1

    Love your lessons Larry. Thank you for your videos ❤

  • @Steve-js7bp
    @Steve-js7bp 1 year ago

    This was incredibly good. As someone just learning to code, I was able to follow along. Thank you so much for putting this together!

  • @IshmeetSinghahuja
    @IshmeetSinghahuja 1 year ago +3

    Amazing tutorial!! Thank you so much. Now I can't wait for your next part. Any ideas?

  • @sriramkrishna6853
    @sriramkrishna6853 1 year ago +4

    2nddddd!!! Letsss gooo made itt!
    Missed your content, Larry! Hope to see you more often. If you do an open-source alternative to OpenAI at some point, that would be great too.

  • @bertobertoberto3
    @bertobertoberto3 1 year ago +1

    BRILLIANT

  • @Pork-Chop-Express
    @Pork-Chop-Express 1 year ago +2

    It DOES dodge questions. I independently performed a Top 25 NBA players of all time analysis based on lots of stats, using Gaussian distributions, skewness, and kurtosis, accounting for career-length differences, utility value, postseason play, accomplishments, and outliers. In the end, MJ was the GOAT, Wilt at #2, LeBron at #3, and Kobe at #4. I asked ChatGPT to access basketball-reference OR Wikipedia. It responded by saying that it DID have that ability, but then neglected to do so and analyze the statistical categories. It refused OVER and OVER, saying that "subjective biases exist that can skew perception of this complicated question." I asked it to IGNORE that and just crunch the numbers. It AGAIN refused to do so. WOW

    • @NickWindham
      @NickWindham 1 year ago

      Did you give ChatGPT thumbs down feedback so that hopefully OpenAI will fix it?

    • @Pork-Chop-Express
      @Pork-Chop-Express 1 year ago

      @@NickWindham Oh absolutely. I would be surprised if anything changes. I asked, dispassionately and objectively, over and over for it to "just focus on these categories." To which it replied about perceptions and biases. At this point, I think it is just a Google copy-and-paste... thing. I don't see it doing anything more than that. No actual analysis or connecting the dots.

  • @DavidDji_1989
    @DavidDji_1989 1 year ago +2

    Awesome value!

  • @paraconscious790
    @paraconscious790 1 year ago +2

    Wow, this is insanely valuable, I can't believe you are sharing it for free, thank you very much! 🙏 One question though: you mentioned that you can work with the OpenAI API on your personal confidential information, but if I am calling the API for embeddings, isn't that sending my information to OpenAI for vectorization, outside the boundaries of my organization?

  • @onemanops
    @onemanops 1 year ago +3

    aww🎉some

  • @LeveragedFinance
    @LeveragedFinance 1 year ago +2

    good vid

  • @rafaeltacconi2065
    @rafaeltacconi2065 1 year ago +3

    great content

  • @eltoroloco28
    @eltoroloco28 1 year ago +2

    Curious if you could share high-level best practices for getting embeddings? Depending on the use case, I'd imagine how you split up the text is really important, and also what the technical requirements are for the input (e.g. must the input be free of white space or line breaks?)... Thanks for all these tutorials, they're amazing!
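On the whitespace question above: OpenAI's own embedding examples normalize input by collapsing newlines into spaces before embedding (line breaks are not forbidden, but the older embedding models were noted to handle them poorly). A minimal cleanup helper, as an illustration:

```python
import re

def normalize_for_embedding(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

print(normalize_for_embedding("Market\nsell-off:\n\n  what to do?"))
# → Market sell-off: what to do?
```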

  • @yellowboat8773
    @yellowboat8773 1 year ago +3

    Damn, Larry, are you building these for companies yourself? Feels like you're the tip of the spear here.

  • @FPRowland
    @FPRowland 1 year ago +1

    Thanks!

  • @andrescastro8961
    @andrescastro8961 1 year ago +1

    Great content Larry!! 👏💥 I'm really looking forward to seeing an open-source alternative to OpenAI for doing this kind of project. How sure are we that the content we provide to the AI model stays private? I mean, OpenAI has access to all content fed into their system, regardless of whether it arrives as embeddings or through ChatGPT.

  • @vipwlb
    @vipwlb 1 year ago +1

    This is really great sharing! Thanks man!! One quick question: I noticed that not every episode has a detailed description showing when the specific questions are raised. How can you get the questions' start times in such cases? Just curious. Thanks again!

  • @yomajo
    @yomajo 1 year ago +1

    I wonder how long it actually took to build behind the scenes.

  • @SneyDeag
    @SneyDeag 10 months ago

    Hi Larry, this no longer works: the latest Python openai library no longer contains embeddings_utils, so this breaks.
    I don't know if you could upload an update of this video, perhaps with other embeddings, for example with Azure.
    I send you a greeting; you have helped motivate me a lot to study for a professional career. I hope we can share a coffee someday.😀
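The commenter is right: the `openai.embeddings_utils` module was removed in the 1.x releases of the `openai` package. Its helpers were thin NumPy wrappers, so a local drop-in for the two functions the notebook relies on could look like this (the names mirror the removed helpers; this is a sketch, not the current library API):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distances_from_embeddings(query, embeddings):
    """Cosine distance (1 - similarity) from a query to each embedding."""
    return [1.0 - cosine_similarity(query, e) for e in embeddings]

print(cosine_similarity([1, 0], [1, 0]))  # identical vectors → 1.0
```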

  • @Joshukend
    @Joshukend 1 year ago

    A thought I have is how podcasters grow over time. Is there a way to weight recent content as more important than old content, while still keeping all the content in the database?
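One simple way to implement the recency weighting asked about above is to multiply each chunk's similarity score by an exponential decay on its age before ranking; the one-year half-life here is an arbitrary assumption:

```python
def recency_weighted_score(similarity, age_days, half_life_days=365):
    """Decay similarity by age: a chunk one half-life old counts half as much."""
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * decay

print(recency_weighted_score(0.9, 0))    # fresh content keeps its full score: 0.9
print(recency_weighted_score(0.9, 365))  # one year old, 365-day half-life: 0.45
```

Older episodes stay in the database; they just need a higher raw similarity to outrank newer material.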

  • @AiDHDtv
    @AiDHDtv 11 months ago

    Thanks a lot for this really great content. It's quite hard for a novice, but extremely interesting. One thing that would be really awesome is if you made the same kind of embedding model for your own videos, so that we could ask it (AI-Larry) questions about your how-to videos. For example, I have a series of podcasts that I would like to transcribe and embed, and they don't have perfect timestamps in the descriptions. How would I go about creating the Q&A CSV file for those episodes?
    Thanks!
    MW

  • @anbld9386
    @anbld9386 1 year ago

    Great content as always! Just one question: is there a follow-up video about building the user web interface?

  • @bobbyhuang4620
    @bobbyhuang4620 1 year ago

    Great video! It truly blows my mind! One question: how is this different from fine-tuning a GPT model? Have you tried fine-tuning on the same dataset and comparing the results? It might be interesting to look into that.

  • @tradissimo9606
    @tradissimo9606 1 year ago

    Hi, I wanted to start working on your "Full Stack Trading App Tutorial", but I'm missing the lectures on your homepage!
    Where has the old content of your homepage gone?

  • @arnaudlelong2342
    @arnaudlelong2342 1 year ago +2

    What's in the mug dude? Hahaha just kidding thanks for the video.

  • @DamienLuc
    @DamienLuc 1 year ago +1

    When is part 9 coming?!!!

  • @5ice1971
    @5ice1971 1 year ago

    This is great! Can you guide me on what I would need to start, such as my own server, etc.? Thanks

  • @AlterEgo77763
    @AlterEgo77763 1 year ago +1

    Love the videos! Side note... I was wondering if you could do a video on the Advanced Trade API; I believe this is replacing Coinbase Pro's API? Possibly in Python? :)

    • @parttimelarry
      @parttimelarry  1 year ago +2

      I have made some videos on CCXT before, which supports many exchanges. It looks like there are some recent code merges for CCXT on GitHub that support the new Coinbase stuff, so it should just be a configuration option.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    I would have thought you would compute the embedding for the 'question' column and do a cosine similarity between that and the embedded form of your question, then sort by similarity and take the 'context' corresponding to the closest match,
    since you want to match question with question.

    • @parttimelarry
      @parttimelarry  1 year ago +3

      Many of the timestamps in the video are not full questions. There are many timestamps with 1- or 2-word titles like "Market sell-off", "Tax Strategy", etc., so I thought it made sense to check against a combination of the question + the answer, in case a user's question was answered but wasn't directly contained in a timestamped question.

    • @SergiRodriguesRius
      @SergiRodriguesRius 1 year ago +1

      @@parttimelarry Maybe a useful third way would be to add 2 more columns to the CSV of contexts: one to store an ABSTRACT generated by davinci from the CONTEXT column (the transcribed audio between 2 time marks), and another column with the EMBEDDING of that abstract 😁
      It would probably make it clearer which row (Q&A) is closest to a new user question. And using these "abstract embeddings", the request to the davinci model would be much shorter and therefore much cheaper. It would need testing, of course; maybe you would lose too much useful information for building the answers... who knows.
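Whichever text is embedded (question only, question + answer, or an abstract as discussed in this thread), the retrieval step reduces to the same ranking by cosine similarity. A sketch with toy 3-d vectors standing in for real embeddings:

```python
import numpy as np

def rank_contexts(query_emb, context_embs):
    """Return context indices sorted by descending cosine similarity."""
    q = np.asarray(query_emb, dtype=float)
    m = np.asarray(context_embs, dtype=float)
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)

# toy "embeddings": context 1 points the same way as the query
contexts = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.6, 0.8, 0.0]
print(rank_contexts(query, contexts))  # best match first → [1 0 2]
```

The top-ranked context is then pasted into the completion prompt as grounding for the answer.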

  • @yellowboat8773
    @yellowboat8773 1 year ago +1

    Thoughts on how to do this without question-and-answer in the original format? Can we just feed in walls of text, then extract questions and answers from that?

    • @parttimelarry
      @parttimelarry  1 year ago +3

      You don't necessarily need questions in advance, but you do need to divide up your text in a logical way. If you check the OpenAI cookbook, they have an example using Wikipedia articles and asking questions about the Olympics. In that case, they use the headings + paragraphs to divide the text and find the section that is most relevant to the question.
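A minimal version of that heading-based splitting, assuming markdown-style `#` headings rather than the cookbook's actual Wikipedia parsing, might look like:

```python
def split_by_headings(text):
    """Group body lines under their most recent heading."""
    sections, heading, body = {}, "Introduction", []
    for line in text.splitlines():
        if line.startswith("#"):
            if body:
                sections[heading] = "\n".join(body).strip()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if body:
        sections[heading] = "\n".join(body).strip()
    return sections

doc = "# Events\nThe games were held in 2021.\n# Medals\nThe host won gold."
print(split_by_headings(doc))
```

Each heading + section pair then plays the same role as a timestamped question + context here.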

  • @wangjueliang
    @wangjueliang 1 year ago

    When you have a large amount of text, how do you chunk it by sentences while fitting within the max tokens? Also, if we could have some overlap, like including the last sentence of the previous chunk in the next chunk, it would give the model better context.

    • @parttimelarry
      @parttimelarry  1 year ago

      There are some great tools that handle common patterns like this that I need to cover: LangChain and LlamaIndex.
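The overlap idea from the question above can be sketched as a sliding window over sentences, carrying the tail of each chunk into the next (LangChain's text splitters implement more refined versions of this; the window sizes here are arbitrary):

```python
def overlapping_chunks(sentences, size=4, overlap=1):
    """Join windows of `size` sentences, each sharing `overlap` with the previous."""
    step = size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break
    return chunks

sents = [f"Sentence {i}." for i in range(10)]
for c in overlapping_chunks(sents, size=4, overlap=1):
    print(c)
```

With 10 sentences, a window of 4, and an overlap of 1, this yields 3 chunks, and "Sentence 3." appears at the end of the first chunk and the start of the second.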

  • @koetje071
    @koetje071 11 months ago

    Hey Larry, what would happen if I didn't use the timestamps and just used the whole transcribed podcast as the source data? Would it just be more expensive and slower, or would the resulting answer be different?

  • @dservais1
    @dservais1 1 year ago +5

    1st to like today 😀

  • @dongnguyenanh7282
    @dongnguyenanh7282 1 year ago

    Why does it say "Streaming data not found for the video. Unable to download."
    even though the YouTube video is available?

  • @Siyar-sb2ub
    @Siyar-sb2ub 6 months ago

    So do I need to know the timestamps before I can do this?
    Or, to put it another way:
    is there a way to do this without knowing the timestamps of the questions/answers?

  • @vtrandal
    @vtrandal 1 year ago

    ChatGPT's knowledge cutoff is September 2021. Not 2019.

  • @Kmysiak1
    @Kmysiak1 1 year ago

    How can we be sure our internal data being fed into the model isn't saved somewhere by OpenAI?

  • @yshaool
    @yshaool 1 year ago

    Great video!!! Quick question: would it be more accurate to calculate the embeddings for the questions in the file, and then select the context according to whichever question is closest to the one the user is asking?

  • @GauravGarg-dq4js
    @GauravGarg-dq4js 1 year ago +1

    Given a YouTube playlist, how did you extract all the playlist URLs?

    • @parttimelarry
      @parttimelarry  1 year ago +3

      This can be done in a few ways: 1) with the YouTube API, 2) with some screen scraping, or 3) by hand :). I can touch on this when I show how to process in batch.

    • @GauravGarg-dq4js
      @GauravGarg-dq4js 1 year ago

      @@parttimelarry Right-click and Inspect, or Ctrl+Shift+I, then in the console:
      var scroll = setInterval(function(){ window.scrollBy(0, 1000); }, 1000);
      // once the page has fully scrolled:
      window.clearInterval(scroll); console.clear();
      urls = $$('a'); urls.forEach(function(v){ if (v.id == "video-title") { console.log('\t' + v.title + '\t' + v.href + '\t'); } });

  • @CodeCoachh
    @CodeCoachh 1 year ago

    I was wondering if I could get a little help. I have successfully added an embedding column to my data sheet, but when I embed my question and try to sort my data sheet by similarity, I run into the following error:
    numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('

    • @CodeCoachh
      @CodeCoachh 1 year ago

      The code seems to break only when I try to compute similarities on a df loaded from the CSV file with embeddings. When I compute similarities on the data frame I used to create the embeddings CSV, it works.
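That `ufunc 'multiply'` error matches a classic pitfall: round-tripping embeddings through CSV turns the embedding column into strings like `"[0.1, 0.2]"`, so NumPy refuses to do arithmetic on them, which is exactly why the freshly created DataFrame works while the reloaded one fails. Parsing the column after loading fixes it; here the round trip is simulated with `astype(str)`:

```python
import ast
import numpy as np
import pandas as pd

# simulate the CSV round trip: each list becomes its string representation
df = pd.DataFrame({"embedding": [[0.1, 0.2], [0.3, 0.4]]})
df["embedding"] = df["embedding"].astype(str)   # what read_csv hands back

# parse the string column back into numeric arrays
df["embedding"] = df["embedding"].apply(ast.literal_eval).apply(np.array)
print(df["embedding"].iloc[0] * 2.0)  # multiplication works again
```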

  • @hoomanvan
    @hoomanvan 1 year ago

    This is great! Do you think the ChatGPT API could be applied to these use cases instead of embeddings?

  • @kebab-case
    @kebab-case 1 year ago

    I have a PDF file with about 20k words.
    How can I make a chatbot that will answer questions whose answers are inside the PDF?
    I tried to play with the OpenAI GPT playground, but it has a limit of 4096 tokens.
    Please give me tips.

  • @RickHunter-fz7oh
    @RickHunter-fz7oh 1 year ago +2

    I never subscribe to any YouTube channel; too much noise. Yours, I did.

  • @cocoarecords
    @cocoarecords 1 year ago

    Hello Larry, can i do a similar thing using Ruby? 😢

  • @rileyclubb
    @rileyclubb 1 year ago

    What is COMPLETIONS_MODEL? Is that a custom model you trained?

    • @parttimelarry
      @parttimelarry  1 year ago +1

      It's just a variable that I defined near the top of the notebook. I set it to text-davinci-003 (the OpenAI model), but you can change the value there to use a cheaper model if desired.
      COMPLETIONS_MODEL = "text-davinci-003"

    • @rileyclubb
      @rileyclubb 1 year ago +1

      @@parttimelarry d'oh! 😉 Excellent vid as always, super stoked for your new AI channel

  • @rshrott
    @rshrott 1 year ago

    Nice. The only issue is that GPT-3 calls will be expensive with that much text; it will be totally unscalable, I think.

    • @parttimelarry
      @parttimelarry  1 year ago +6

      Thanks for the feedback, this gives me an idea for a cost calculation video. The embeddings calls I use in the project are very cheap. Will do the full batch of podcasts and show my costs for a large project. Also planning to do some projects with open source alternatives to compare results. Cheers.

    • @rshrott
      @rshrott 1 year ago +1

      @@parttimelarry I think GPT-3 is overkill for this task. One reason embeddings are great is the cost. Summarizing text should be fine for Curie, or an even cheaper model. BTW, what would you do if the text was a book without distinct Q&A? I guess you would need to determine how to split the text. You could maybe automate the splitting using another embedding model: make an embedding of each sentence and then split into contexts based on the similarity of the sentences? Hmm, interesting.

    • @SergiRodriguesRius
      @SergiRodriguesRius 1 year ago

      @@rshrott For books/documentation, I suppose it would probably be useful to treat paragraphs the same way Larry has treated the CONTEXTS in this video (sentences between 2 time marks), and to index the CHAPTER of those paragraphs the same way Larry has indexed the YT URLs 😁
      Indeed, if you think about it, "knowing the question" is not so important. You only need to find the text most related to the question the user asks, and then ask the model to use it as context to return an answer to the question.
      Sincerely... I want so much to try all this myself!! Super! The applications are endless... finally WE HAVE A SEMANTIC TEXT CALCULATOR!!! It is the ultimate dream for those of us who have been in AI since the '90s.

    • @knddlbr
      @knddlbr 1 year ago

      @@parttimelarry Yes, cost calculation becomes extremely interesting. It looks like one could extract the text, summarize it with a cheaper model, and then do the embedding. That way you also limit the context, since you will paste less text into the prompt.
      Thanks for the best video on embeddings, and specifically on how to feed the context back to the model (and I watched half a dozen).
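A back-of-the-envelope comparison like the one discussed in this thread is just token count times price; the per-1,000-token prices below are illustrative assumptions only, not current OpenAI rates:

```python
# illustrative prices per 1,000 tokens (assumptions; check current pricing)
PRICES = {
    "embedding": 0.0001,
    "cheap_completion": 0.002,
    "davinci_completion": 0.02,
}

def cost(tokens, kind):
    """Dollar cost of processing `tokens` tokens with the given model tier."""
    return tokens * PRICES[kind] / 1000

transcript_tokens = 500_000  # e.g. a large batch of podcast episodes
print(f"Embedding all text:    ${cost(transcript_tokens, 'embedding'):.2f}")
print(f"Summarizing (cheap):   ${cost(transcript_tokens, 'cheap_completion'):.2f}")
print(f"Summarizing (davinci): ${cost(transcript_tokens, 'davinci_completion'):.2f}")
```

Under these assumed prices, embedding is orders of magnitude cheaper than completion calls, which is the commenters' point about summarizing with a cheaper model first.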

  • @jmasked5082
    @jmasked5082 1 year ago

    6:25 The answer sounds very ChatGPT: it said nothing at all to avoid the risk of being wrong.

  • @JohnRoodAMZ
    @JohnRoodAMZ 1 year ago

    Shouldn't you say database instead of model? Because you aren't training, right... just collecting the info and then prompting with it, right?

    • @parttimelarry
      @parttimelarry  1 year ago

      I definitely said "model" a few times in the last few videos where it wasn't appropriate. I noticed this later, but it's hard to go back, since it takes a long time to record and edit.

  • @SneyDeag
    @SneyDeag 1 year ago

    Hello, can someone help me? I have this error:
    KeyError: 'streamingData'
    stream = youtube_video.streams.filter(only_audio=True).first()
    stream.download(filename='financial_advisor.mp4')