LangChain 101: YouTube Transcripts + OpenAI

  • Published: Sep 11, 2024

Comments • 99

  • @HerroEverynyan
    @HerroEverynyan 1 year ago +7

    Your diagrams + explanations are really helpful. I usually phase out when people explain things using diagrams, but the way you do it is very easy to follow and understand and I'm sure others feel the same as well.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      That's awesome to hear. Thank you for sharing that.

  • @anujsaluja9139
    @anujsaluja9139 11 months ago

    Your explanation aided by that diagram up front made it extremely easy to understand what otherwise is a complex topic for newbies like myself. I am learning a lot from your videos. A big thank you for all your efforts.

    • @DataIndependent
      @DataIndependent 11 months ago +1

      Awesome, thanks for letting me know!

  • @mushroomthump
    @mushroomthump 1 year ago +1

    Diagrams are super useful, great videos overall. Please keep them coming!

  • @leromerom
    @leromerom 1 year ago

    I anticipate you will become very popular soon. Keep up the good work and you will reach an audience of hundreds of thousands of people.

    • @DataIndependent
      @DataIndependent 1 year ago

      That would be cool! I will continue to put energy into this space

  • @shrvn110
    @shrvn110 1 year ago

    Greg, thank you for all the videos you have made, they've all been super helpful! I hope you get everything you want back in life!

  • @brentdunklau4536
    @brentdunklau4536 1 year ago

    I’m going to use LangChain to look at all your videos and tell me which ones I should really pay attention to based on what I’m trying to do 💥

  • @KunjaBihariKrishna
    @KunjaBihariKrishna 1 year ago

    This is cool, because OpenAI models are great at sentiment analysis. You could write a script that automatically fetches trending videos on a specific topic (a specific industry or market, depending on your needs) and performs sentiment analysis on the 500 highest-performing videos. Just filter YouTube by topic, uploaded: Today, sort by: views.
    Then do sentiment analysis on each transcript and assign a score. (You would have to do some real work on designing a scoring system, though. That's what determines the value of this whole thing.)
    And you end up with daily summaries of what people are saying about some product, market, political figure, whatever you like.
    Daily summaries, along with the sentiment scores, are turned into statistics, charts, weekly and monthly summaries, etc.
    You'd need a good setup for working around the token limit when interpreting transcripts, but that can be done.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yeah I like that idea. No need to stop at youtube videos either. There is likely a lot of good data on reddit/twitter as well.

    • @KunjaBihariKrishna
      @KunjaBihariKrishna 1 year ago

      @@DataIndependent Yes. For twitter, you could probably target the main news/influencers for a niche
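
The aggregation step described in this thread (per-transcript scores rolled up into daily statistics) could be sketched as follows. This is only an illustration: the `(date, score)` input shape and the `daily_average` helper are hypothetical, and the scoring itself (the hard part, as the commenter notes) is assumed to have happened already.

```python
from collections import defaultdict
from statistics import mean


def daily_average(scored):
    """Average sentiment scores per day.

    `scored` is an iterable of (date_string, score) pairs, one per transcript.
    Returns a dict mapping each date to the mean score for that day.
    """
    by_day = defaultdict(list)
    for day, score in scored:
        by_day[day].append(score)
    # Sort by date string so the result is in chronological order for ISO dates.
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


if __name__ == "__main__":
    sample = [("2023-03-01", 1.0), ("2023-03-01", 0.0), ("2023-03-02", -1.0)]
    print(daily_average(sample))
```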

  • @santhoshvasamsetti9165
    @santhoshvasamsetti9165 1 year ago

    Your diagrams are cool. Your explanations are cool and the content is kick-ass. Do more videos, brother.

  • @sameerdev2139
    @sameerdev2139 1 year ago

    Amazing video! I am just discovering LangChain + OpenAI and your videos are just superb.

  • @dadas7852
    @dadas7852 1 year ago

    One of the easiest tutorials to follow, thx!

  • @nattapongthanngam7216
    @nattapongthanngam7216 4 months ago

    Great tutorial!

  • @catyung1094
    @catyung1094 1 year ago +2

    That's extremely cool! Wondering if you can compare the performance of Flan T5 vs GPT in a LangChain pipeline next time ❤💪

  • @mathavansg9227
    @mathavansg9227 9 months ago

    love your videos

  • @tfhighlander2280
    @tfhighlander2280 1 year ago

    Your videos are amazing! Looking forward to seeing how you could integrate gpt-index as well as LangChain!

    • @DataIndependent
      @DataIndependent 1 year ago

      Thanks thanks for the comment. What is your use case for the two tools? I like to have an example to work through instead of just an overview

    • @tfhighlander2280
      @tfhighlander2280 1 year ago +1

      @@DataIndependent Honestly I'm not sure gpt-index would be the best way to go, but my use case is that I have a large number of documents that I need to store in the cloud and update weekly so they are accessible to a web app. Looking around online, I thought I could use gpt-index as long-term memory and use LangChain to connect the model. Like a way to do Q&A on your personal journal stored online.

  • @nsitkarana
    @nsitkarana 1 year ago

    Great video and nicely explained!!

  • @byteolu
    @byteolu 1 year ago

    MFM great podcast! If I wasn’t subscribed I am now!

  • @nathancanbereached
    @nathancanbereached 1 year ago +2

    This is great! Could you do a video about connecting LangChain to embeddings / semantic search? I've been eyeing what you can do with Pinecone, but I don't know where to start.

    • @DataIndependent
      @DataIndependent 1 year ago

      Ya sounds great. Could you give me an example problem statement or exercise you'd like to walk through? Ex: "I want to search XYZ"

    • @klammer75
      @klammer75 1 year ago

      I second this request!

    • @nathancanbereached
      @nathancanbereached 1 year ago

      @@DataIndependent Yeah, like let's say I saved a few 200-300 page self-help book PDFs to my Google Drive. I'd like to be able to do Q&A where it does semantic search through embeddings to find the best k results, and then feeds those results into the prompt context before sending it to the LLM.

    • @DataIndependent
      @DataIndependent 1 year ago +2

      Nice thank you. That’ll be a fun example to do. I’ll give it a go tomorrow
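
The flow requested in this thread (embed, find the best k chunks, feed them into the prompt) can be sketched in plain Python. This is a toy version under stated assumptions: the vectors are tiny hand-written lists standing in for real embeddings, and `cosine`, `top_k`, and `build_prompt` are hypothetical helpers, not LangChain or Pinecone APIs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks ahead of the question as context."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [("habits chapter", [1, 0]), ("sleep chapter", [0, 1]), ("habits and sleep", [1, 1])]
    best = top_k([1, 0], chunks, k=2)
    print(build_prompt("How do I build a habit?", best))
```

In a real setup the vectors would come from an embedding model and the ranking from a vector store; the prompt string is what finally goes to the LLM.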

  • @juancorrea3546
    @juancorrea3546 1 year ago

    Extremely useful! Thanks a lot!

    • @DataIndependent
      @DataIndependent 1 year ago

      Glad it was helpful! Anything else you want to see?

  • @adamsardo
    @adamsardo 1 year ago

    Forgive me if this is a noob question, but I went to try this out myself by importing your YouTube Loader file into my Jupyter Notebook, and I keep running into "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'".
    Any idea on what I could be doing wrong?
    Cheers 🙏

  • @aaronward9140
    @aaronward9140 1 year ago

    It would appear that something has changed. I'm trying to use the YoutubeLoader module but I get an SSL error: `urllib.error.URLError: `

  • @carlosrscoelho
    @carlosrscoelho 11 months ago

    Hello there, Greg! I really appreciated your video! Imagine I have a playlist with a bunch of URLs. How would you handle this scenario? Initially, I would extract these URLs from the YouTube playlist using `from pytube import Playlist`. Now, to obtain the transcript for each of them, I attempted the method you showcased in the video (Multiple Videos) but faced issues. Do you have any suggestions or thoughts on this?
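
One way to handle a list of playlist URLs is to load them one at a time and record failures instead of stopping at the first one. The sketch below assumes you supply your own `load_one(url)` callable (hypothetical; it could wrap whatever loader works for you) that returns the transcript text or raises.

```python
def load_transcripts(urls, load_one):
    """Load a transcript for each URL, collecting failures separately.

    `load_one` is any callable that takes a URL and returns transcript text.
    Returns (results, failures): two dicts keyed by URL.
    """
    results, failures = {}, {}
    for url in urls:
        try:
            text = load_one(url)
        except Exception as exc:  # keep going; report the reason later
            failures[url] = str(exc)
            continue
        if text:
            results[url] = text
        else:
            # An empty result usually means no transcript was available.
            failures[url] = "empty transcript"
    return results, failures


if __name__ == "__main__":
    def fake_loader(url):
        # Stand-in for a real loader, used only to demonstrate the control flow.
        if "missing" in url:
            raise RuntimeError("no transcript")
        return f"transcript of {url}"

    ok, failed = load_transcripts(["video-1", "video-missing"], fake_loader)
    print(ok)
    print(failed)
```

Looking at `failures` afterwards shows which videos in the playlist caused the issue, rather than one error hiding the rest.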

  • @snippletrap
    @snippletrap 1 year ago

    I am going to start using "instantialize" unironically. Good word

    • @DataIndependent
      @DataIndependent 1 year ago

      Tomato tomato ha :) Like any good forward thinking developer I just snagged instantialize.com

  • @jesusmtz29
    @jesusmtz29 1 year ago +1

    Is it possible to pass additional instructions to the summary method?

    • @NoOne-uz4vs
      @NoOne-uz4vs 5 months ago

      Have you found a way? I mean, I need to summarize the video in another language instead of English.

  • @henkhbit5748
    @henkhbit5748 1 year ago

    The results of your LangChain summary of summaries are 👏
    A small question: say you have a formula, for example the quadratic formula or some other specific formula, in your document. Can I ask a question and have it solve for the answer?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      You would likely need to isolate that piece of information and then ask it to solve it. There are math tools but I haven't used them a ton.

  • @karinwiberg2223
    @karinwiberg2223 9 months ago

    Hi, this is a really helpful video. I want to ask you: when I try to access a YouTube video which is too long, I get an empty list as the result, meaning I don't have any text to split. Have you come across this before?

    • @DataIndependent
      @DataIndependent 9 months ago

      Hey Karin!
      Hm, I haven't run into that. But rather than the problem being that it is too long, it sounds like there isn't a transcript for it (not all videos have one).
      Can you see the transcript in the YouTube UI?

  • @caiyu538
    @caiyu538 1 year ago

    Great, great

  • @victorguerrero6581
    @victorguerrero6581 1 year ago

    Is there a way to save the YouTube link as a variable to put in Streamlit?

  • @lorenzoleongutierrez7927
    @lorenzoleongutierrez7927 1 year ago

    Great videos! And greetings from the land of Pedro Pascal's ancestors, Chile 🇨🇱

  • @wwsdley
    @wwsdley 1 year ago

    Great video! Can you please check if this code is still working? I think something might have changed on Google's side, because I can't load any video transcripts anymore...

  • @AyushSharma-ux4fk
    @AyushSharma-ux4fk 1 year ago

    Hey, a dumb question.
    If I can simply call the OpenAI APIs, what is the benefit of using LangChain? Internally LangChain is also calling the OpenAI APIs.
    Wouldn't taking the LangChain path increase the latency of the application?

    • @DataIndependent
      @DataIndependent 1 year ago

      Check out my latest video on the 7 core concepts of LangChain. In it I overview most of the power it can do today. Tons of software built to make common tasks easy.
      Yes it would increase the latency, but that is unavoidable for more sophisticated tasks at the moment.

  • @brunoresendesantos45
    @brunoresendesantos45 1 year ago

    Cool!
    Is there a way to change the summary prompt? Like detailed summary instead of concise?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Yep, you'll need to edit the prompt that is being used. Ex: Concise > Detailed
      Here is the documentation on how to do that:
      langchain.readthedocs.io/en/latest/modules/indexes/chain_examples/summarize.html?highlight=load_summarize_chain#the-stuff-chain:~:text=Ukrainian%2DAmerican%20citizens.%22%7D-,Custom%20Prompts,-You%20can%20also

  • @aekundayo
    @aekundayo 1 year ago

    Hi, these videos are very helpful. I have a question though. It seems there is some overlap between what you can do with LangChain and llama-index (gpt-index). In what scenario would you leverage both libraries?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      I'm getting this question a lot and happy to do a video on it. Thanks for asking.

  • @aiautoglasscrm
    @aiautoglasscrm 1 year ago

    Awesome videos, and focused on solving business problems 🙂
    1. The ChatGPT playground follows prompts the way it should. However, the ChatGPT API, using the same model, the same prompts, and the same settings, rarely returns its response in the requested format.
    2. Can feeding in Excel tabular data, using the method you demonstrated or another method, train ChatGPT to predict a column?
    Just found your channel a few hours ago, awesome videos and thank you for making them.

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Thanks for the kind words. What do you mean by predicting on a column?

    • @aiautoglasscrm
      @aiautoglasscrm 1 year ago

      @@DataIndependent Thank you for responding. Imagine you have a table with these columns: price, car-years, car-make, car-autoglass, with thousands of rows. Can you use that table to train ChatGPT to predict a price given the car-years, car-make, and car-autoglass?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      @@aiautoglasscrm ah nice I see. For that you’d want to use a different ML model. Likely a regression based on those attributes.
      There are a bunch out there to choose from. Maybe even some Kaggle exercises as examples

    • @aiautoglasscrm
      @aiautoglasscrm 1 year ago

      @@DataIndependent Thank you!!

  • @rosslovell73
    @rosslovell73 1 year ago

    I'm at a loss. There seems to be no easy way to move from an existing list of preprocessed strings into anything that can chunk those strings. All the loaders assume a user will need to load from a document of whatever description. What if that isn't the case? What if a list is ready to go?

    • @DataIndependent
      @DataIndependent 1 year ago

      Sorry, I don't fully understand. Where is your list of strings? In a text doc?

    • @RudyBanks
      @RudyBanks 1 year ago

      I ran into this. If you already have the text in a string variable, say `LargeText`, replace the `.split_documents` call with `texts = text_splitter.split_text(LargeText)`.
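
To illustrate what splitting a raw string (rather than loaded documents) amounts to, here is a minimal character-based sketch. It is not LangChain's splitter; `chunk_string` and its parameters are hypothetical stand-ins that only show the chunk-with-overlap idea.

```python
def chunk_string(text, chunk_size=1000, chunk_overlap=100):
    """Split a string into fixed-size character chunks that overlap.

    Each chunk starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    # A list of preprocessed strings can be chunked item by item.
    ready_made = ["first preprocessed string " * 10, "second one " * 10]
    all_chunks = [c for s in ready_made for c in chunk_string(s, 80, 10)]
    print(len(all_chunks))
```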

  • @shamsnahid4046
    @shamsnahid4046 1 year ago

    Good video! I am curious, is there any way we can train our AI so it answers in a professional way like ChatGPT does?

    • @DataIndependent
      @DataIndependent 1 year ago

      You'll need to write a custom prompt and tell it to speak in a different tone. Examples of the tone you're looking for help as well.

  • @cgtinc4868
    @cgtinc4868 1 year ago

    Hi Greg, great video again! Already "liked". Wondering if there is a translation module in LangChain, as some YouTube videos are in different languages. And two more requests about the YouTube functions. First, can I just get the full transcript? Second, can I restrict the extract to a time range, like from 1 minute into the video until the 4th minute? Thanks mate, sorry for pushing the limits on this, but there are real uses for those.

    • @DataIndependent
      @DataIndependent 1 year ago

      Hey! For translation, I haven't seen first class support for this from LangChain yet.
      For full transcriptions, yep you can, it is an output of that data loader which should work for you.
      I don't understand the question about min 1-4

    • @cgtinc4868
      @cgtinc4868 1 year ago

      @@DataIndependent Oh, about the 1-4 min: when a video is, say, 10 minutes long, I just want to summarize the part from the first minute to the 4th and leave out the rest. Just wondering if that can be done.

    • @DataIndependent
      @DataIndependent 1 year ago

      @@cgtinc4868 Nice - when I got the transcript I didn't see timestamps, but they may be hiding in there somewhere. If they are, you could do it; it would just be a simple matter of cropping the transcript.
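
If timestamps are available, cropping to a time window could look like the sketch below. The segment shape (a list of dicts with a `text` string and a `start` time in seconds) is an assumption for illustration, and `crop_transcript` is a hypothetical helper; the reply above notes the loader output did not obviously include timestamps.

```python
def crop_transcript(segments, start_s, end_s):
    """Join the text of all segments that begin inside [start_s, end_s)."""
    return " ".join(
        seg["text"] for seg in segments if start_s <= seg["start"] < end_s
    )


if __name__ == "__main__":
    segments = [
        {"text": "intro", "start": 0.0},
        {"text": "main point", "start": 75.0},
        {"text": "more detail", "start": 200.0},
        {"text": "outro", "start": 300.0},
    ]
    # Keep only what is said between minute 1 and minute 4.
    print(crop_transcript(segments, 60, 240))
```

The cropped string can then be passed to the splitter and summarizer like any other text.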

  • @adipatki
    @adipatki 1 year ago

    Is there a way to get longer and more detailed output?

    • @DataIndependent
      @DataIndependent 1 year ago

      You can change the prompt or use custom prompts and ask for more information

  • @xorlop
    @xorlop 1 year ago

    I am interested to know what happens/what should you do when the map_reduce chaining token size is also too long. For example, what if all the concatenated summaries are greater than 4096 tokens, the max limit? Maybe, there could be a map_reduce_recursive and it will automatically solve this problem for you.

    • @xorlop
      @xorlop 1 year ago

      Omg, never mind! Your next video about querying a book with Pinecone covers the case where you have many documents. It looks like the method is to find similar documents first instead of map_reduce summarizing all of them!

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! Glad that worked out
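
The "map_reduce_recursive" idea raised at the top of this thread can be sketched as a loop that keeps summarizing groups of summaries until they fit the limit. Everything here is an assumption for illustration: `summarize` is a callable you supply (for example an LLM call), `count_tokens` is a crude word count rather than a real tokenizer, and none of these names are LangChain APIs.

```python
def count_tokens(text):
    """Crude token proxy: number of whitespace-separated words."""
    return len(text.split())


def group_by_budget(texts, max_tokens):
    """Pack consecutive texts into groups of at most max_tokens each."""
    groups, current, used = [], [], 0
    for t in texts:
        n = count_tokens(t)
        if current and used + n > max_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(t)
        used += n
    if current:
        groups.append(current)
    return groups


def summarize_recursive(texts, summarize, max_tokens=4096):
    """Map: summarize each text. Reduce: collapse until the summaries fit."""
    summaries = [summarize(t) for t in texts]
    while len(summaries) > 1 and sum(count_tokens(s) for s in summaries) > max_tokens:
        groups = group_by_budget(summaries, max_tokens)
        if len(groups) == len(summaries):
            break  # nothing could be merged, so stop instead of looping forever
        summaries = [summarize("\n".join(g)) for g in groups]
    return summarize("\n".join(summaries))


if __name__ == "__main__":
    # Toy summarizer that keeps the first two words of its input.
    first_two = lambda t: " ".join(t.split()[:2])
    print(summarize_recursive(["a b c d", "e f g h", "i j k l"], first_two, max_tokens=4))
```

Each pass produces fewer summaries than the one before, so the loop terminates; the trade-off is that detail is lost at every level of collapsing.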

  • @ambrosionguema9200
    @ambrosionguema9200 1 year ago

    Great! How do I load my own link with audio? Which method should I use?

    • @DataIndependent
      @DataIndependent 1 year ago

      What do you mean your personal link?

    • @ambrosionguema9200
      @ambrosionguema9200 1 year ago

      @@DataIndependent I have a link from when I'm teaching, but it's not from YouTube. Is it possible to pass it to this youtubeLoader....(url)?

  • @mandardk
    @mandardk 1 year ago

    Excellent videos. I loved this, but when I try running it, I get multiple errors. Does anyone have fully working code?

    • @wwsdley
      @wwsdley 1 year ago

      I'm having the same issue. I've heard that Google has shut down some loaders, such as pytube... :(

  • @digitald74
    @digitald74 1 year ago

    Nice tutorial. Is it possible to use another language for the transcript and also modify the prompt?

    • @digitald74
      @digitald74 1 year ago

      loader = YoutubeLoader.from_youtube_url("ruclips.net/video/QujoO8CLGMw/видео.html", add_video_info=True, language='de')

    • @digitald74
      @digitald74 1 year ago

      chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True, map_prompt=prompt, combine_prompt=prompt)

    • @DataIndependent
      @DataIndependent 1 year ago

      Yep that is exactly it to modify the prompt. How did it go for you?
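
The `prompt` passed as `map_prompt` and `combine_prompt` above is built from a template string. A minimal sketch of such a template, in plain Python, is below. The wording, the `DETAILED_TEMPLATE` name, and the `render_prompt` helper are hypothetical; the `{text}` placeholder is assumed to match what the summarize chain expects, so check the documentation for your LangChain version.

```python
# Template asking for a detailed summary in a chosen language.
DETAILED_TEMPLATE = (
    "Write a detailed summary of the following text in {language}:\n\n"
    "{text}\n\n"
    "DETAILED SUMMARY:"
)


def render_prompt(text, language="English"):
    """Fill the template with the text to summarize and the output language."""
    return DETAILED_TEMPLATE.format(text=text, language=language)


if __name__ == "__main__":
    print(render_prompt("Transcript chunk goes here.", language="German"))
```

If the chain only fills in `{text}`, the language would need to be written into the template string before it is handed to the chain.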

  • @junaidmughal3806
    @junaidmughal3806 1 year ago

    You look like Ryan Gosling

  • @BorisDrubetsky
    @BorisDrubetsky 1 year ago

    Very nice demo, thank you very much.
    I am wondering if anyone else is running into the error "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'" when running: loader = YoutubeLoader.from_youtube_url("ruclips.net/video/QsYGlZkevEg/видео.html", add_video_info=True) despite the prerequisites being installed?
    Thanks again OP, well presented tutorial.

    • @DataIndependent
      @DataIndependent 1 year ago

      Nice! I've seen some updates come through for LangChain and specifically that loader. Make sure you're on the most recent version.

    • @waqasobeidy8318
      @waqasobeidy8318 1 year ago

      Were you able to solve this? I installed the latest version but still face the same issue.

    • @d279020
      @d279020 1 year ago +1

      @@waqasobeidy8318 The loader seems to have been updated to use the official Google Cloud API. v0.0.105 still seems to work. I'm not against using the API, but I tend to avoid anything that asks for my credit card at all costs.

    • @waqasobeidy8318
      @waqasobeidy8318 1 year ago

      @@d279020 Yep I agree. The function works fine on the older version like you suggested, Thanks.

    • @shamsnahid4046
      @shamsnahid4046 1 year ago

      @waqas Which older version are you using?
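
When chasing an error like this, it can help to confirm which package version is actually installed in the notebook's environment and whether the class really has the method. The sketch below uses only the standard library; `installed_version` and `has_classmethod` are hypothetical helper names, and the v0.0.105 figure is what the commenter above reported, not something verified here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def has_classmethod(cls, name):
    """True if `cls` exposes a callable attribute called `name`."""
    return callable(getattr(cls, name, None))


if __name__ == "__main__":
    print("langchain version:", installed_version("langchain"))
    # With LangChain installed, the same check can be applied to the loader class:
    #   has_classmethod(YoutubeLoader, "from_youtube_url")
    print(has_classmethod(dict, "fromkeys"))
```

If the reported version differs from the one you expect, the notebook kernel is likely using a different environment from the one you installed into.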

  • @ArchITECH-vk7ke
    @ArchITECH-vk7ke 1 year ago

    Thank you for sharing this, super helpful. Wondering if you ran into the issue below, where the API is unable to retrieve the transcript? Sharing a sample:
    Could not retrieve a transcript for the video ruclips.net/video/eVX0QrvjA5M/видео.html! This is most likely caused by:
    Subtitles are disabled for this video
    Thanks in advance!!

    • @DataIndependent
      @DataIndependent 1 year ago

      Interesting, no, I haven't seen that issue. Though it isn't scalable, there are a lot of sites that will generate a transcript for you from the audio. Or you could use Whisper.

  • @sbharadwaj1
    @sbharadwaj1 1 year ago

    Yes, the diagrams are super cool.
    I was wondering how to handle subsections of a video, e.g. ruclips.net/video/QsYGlZkevEg/видео.html -- is there a way to get the summary of just one section of a video?

    • @DataIndependent
      @DataIndependent 1 year ago +1

      You totally could. You just need to feed that subsection into your summarizer.
      There isn’t an easy out of the box way to do it though

    • @sbharadwaj1
      @sbharadwaj1 1 year ago

      @@DataIndependent Thank you. Maybe there is a different API in YoutubeLoader for this? Otherwise, does one have to dig for or guesstimate the spot in the text stream?

  • @shamsnahid4046
    @shamsnahid4046 1 year ago

    And they're saying “youtube loader has no attribute from_youtube_url”

    • @DataIndependent
      @DataIndependent 1 year ago +1

      Try upgrading LangChain and if it still doesn't work check the code on the documentation