LangChain 101: YouTube Transcripts + OpenAI

Greg Kamradt (Data Indy)

24K views

Published: Sep 11, 2024

Comments • 99

@HerroEverynyan · 1 year ago · 7 likes
Your diagrams + explanations are really helpful. I usually zone out when people explain things using diagrams, but the way you do it is very easy to follow and understand, and I'm sure others feel the same.
@DataIndependent · 1 year ago · 1 like
That's awesome to hear. Thank you for sharing that.
@anujsaluja9139 · 11 months ago
Your explanation, aided by that diagram up front, made what is otherwise a complex topic extremely easy to understand for newbies like me. I am learning a lot from your videos. A big thank you for all your efforts.
@DataIndependent · 11 months ago · 1 like
Awesome, thanks for letting me know!
@mushroomthump · 1 year ago · 1 like
Diagrams are super useful, great videos overall. Please keep them coming!
@leromerom · 1 year ago
I anticipate you will become very popular soon. Keep up this good work and you will reach an audience of hundreds of thousands of people.
@DataIndependent · 1 year ago
That would be cool! I will continue to put energy into this space.
@shrvn110 · 1 year ago
Greg, thank you for all the videos you have made, they've all been super helpful! I hope you get everything you want back in life!
@brentdunklau4536 · 1 year ago
I'm going to use LangChain to look at all your videos and tell me which ones I should really pay attention to based on what I'm trying to do 💥
@DataIndependent · 1 year ago
Nice!
@KunjaBihariKrishna · 1 year ago
This is cool, because OpenAI models are great at sentiment analysis. You could write a script that automatically fetches trending videos on a specific topic (a specific industry or market, depending on your needs) and performs sentiment analysis on the 500 highest-performing videos. Just filter YouTube by topic, uploaded: Today, sort by: views.
Then do sentiment analysis on each transcript and assign a score. (You would have to do some real work on designing a scoring system, though. That's what determines the value of this whole thing.)
And you end up with daily summaries of what people are saying about some product, market, political figure, whatever you like.
The daily summaries, along with the sentiment scores, are turned into statistics, charts, and weekly and monthly summaries, etc.
You'd need a good setup for working around the token limit when interpreting transcripts, but that can be done.
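The statistics step of this idea can be sketched in plain Python. This is a minimal sketch, assuming each transcript has already been given a sentiment score in [-1, 1] by some LLM-based scoring system (the scores below are made up, and `daily_summary` is a hypothetical helper). It only shows grouping scores per upload day, which could then be rolled up into weekly or monthly figures.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_summary(records):
    """Group sentiment scores by day and return {day: (video_count, mean_score)}."""
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    # Sorted by day so the result reads chronologically.
    return {day: (len(scores), mean(scores)) for day, scores in sorted(by_day.items())}


# Hypothetical (upload date, sentiment score) pairs; in the described pipeline
# the score would come from an LLM reading each transcript.
records = [
    (date(2023, 3, 1), 0.5),
    (date(2023, 3, 1), -0.25),
    (date(2023, 3, 2), 0.75),
]
print(daily_summary(records))
```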
@DataIndependent · 1 year ago · 1 like
Yeah, I like that idea. No need to stop at YouTube videos either. There is likely a lot of good data on Reddit/Twitter as well.
@KunjaBihariKrishna · 1 year ago
@@DataIndependent Yes. For Twitter, you could probably target the main news outlets/influencers for a niche.
@santhoshvasamsetti9165 · 1 year ago
Your diagrams are cool, your explanations are cool, and the content is kick-ass. Do more videos, brother.
@sameerdev2139 · 1 year ago
Amazing video! I am just discovering LangChain + OpenAI and your videos are just superb.
@DataIndependent · 1 year ago
Nice! Thank you
@dadas7852 · 1 year ago
One of the easiest tutorials to follow, thx!
@nattapongthanngam7216 · 4 months ago
Great tutorial!
@catyung1094 · 1 year ago · 2 likes
That's extremely cool! Wondering if you could compare the performance of Flan-T5 vs GPT in a LangChain pipeline next time ❤💪
@DataIndependent · 1 year ago · 1 like
Great suggestion! I'll add this to the list
@mathavansg9227 · 9 months ago
Love your videos
@tfhighlander2280 · 1 year ago
Your videos are amazing! Looking forward to seeing how you could integrate gpt-index as well as LangChain!
@DataIndependent · 1 year ago
Thanks for the comment. What is your use case for the two tools? I like to have an example to work through instead of just an overview.
@tfhighlander2280 · 1 year ago · 1 like
@@DataIndependent Honestly, I'm not sure gpt-index would be the best way to go, but my use case is that I have a large number of documents that I need to store in the cloud and update weekly so they are accessible to a web app. Looking around online, I thought I could use gpt-index as long-term memory and use LangChain to connect the model. Like a way to Q&A your personal journal stored online.
@nsitkarana · 1 year ago
Great video and nicely explained!!
@byteolu · 1 year ago
MFM great podcast! If I wasn't subscribed, I am now!
@nathancanbereached · 1 year ago · 2 likes
This is great! Could you do a video about connecting LangChain to embeddings / semantic search? I've been eyeing what you can do with Pinecone, but I don't know where to start.
@DataIndependent · 1 year ago
Yeah, sounds great. Could you give me an example problem statement or exercise you'd like to walk through? Ex: "I want to search XYZ"
@klammer75 · 1 year ago
I second this request!
@nathancanbereached · 1 year ago
@@DataIndependent Yeah, like let's say I saved a few 200-300 page self-help book PDFs to my Google Drive. I'd like to be able to do Q&A where it does semantic search through embeddings to find the best k results, and then feeds those results into the prompt context before sending it to the LLM.
@DataIndependent · 1 year ago · 2 likes
Nice, thank you. That'll be a fun example to do. I'll give it a go tomorrow.
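The retrieve-then-prompt flow described here (embed the chunks, find the best k by similarity, paste them into the prompt) can be sketched without any external service. A minimal sketch, assuming toy three-dimensional vectors stand in for real embeddings; a vector store such as Pinecone does the same ranking at scale. `top_k` and `build_prompt` are hypothetical helpers, not LangChain APIs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Paste the retrieved chunks into the prompt ahead of the question."""
    context = "\n\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy (text, embedding) pairs; real embeddings have hundreds of dimensions.
chunks = [
    ("Habits compound over time.", [0.9, 0.1, 0.0]),
    ("Sleep improves focus.", [0.1, 0.9, 0.0]),
    ("Small daily habits beat big goals.", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of "How do habits work?"

best = top_k(query_vec, chunks, k=2)
prompt = build_prompt("How do habits work?", best)
print(prompt)
```

Only the prompt, not the whole book, is sent to the LLM, which is what keeps long PDFs within the token limit.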
@juancorrea3546 · 1 year ago
Extremely useful! Thanks a lot!
@DataIndependent · 1 year ago
Glad it was helpful! Anything else you want to see?
@adamsardo · 1 year ago
Forgive me if this is a noob question, but I went to try this out myself by importing your YouTube loader into my Jupyter Notebook, and I keep running into "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'".
Any idea what I could be doing wrong?
Cheers 🙏
@aaronward9140 · 1 year ago
It would appear that something has changed. I'm trying to use the YoutubeLoader module, but I get an SSL error: `urllib.error.URLError: `
@carlosrscoelho · 11 months ago
Hello there, Greg! I really appreciated your video! Imagine I have a playlist with a bunch of URLs. How would you handle this scenario? First, I would extract the URLs from the YouTube playlist using `from pytube import Playlist`. Then, to obtain the transcript for each of them, I tried the method you showcased in the video (Multiple Videos) but ran into issues. Do you have any suggestions or thoughts on this?
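One way to approach the playlist case is to normalize every playlist URL to a video ID first, then load transcripts one video at a time so that a single video without captions does not stop the whole batch. A minimal sketch, assuming the URLs have already been collected (for example with pytube's `Playlist`, as the comment describes). `video_id`, `load_all` and `fake_loader` are hypothetical helpers, and `fake_loader` stands in for the real transcript loader.

```python
from urllib.parse import parse_qs, urlparse


def video_id(url):
    """Extract the video ID from common YouTube URL shapes, or return None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] in ("shorts", "embed"):
            return parts[1]
    return None


def load_all(ids, load_one):
    """Call load_one(video_id) for each ID, collecting failures instead of stopping."""
    loaded, failed = {}, {}
    for vid in ids:
        try:
            loaded[vid] = load_one(vid)
        except Exception as exc:  # e.g. subtitles disabled for this video
            failed[vid] = str(exc)
    return loaded, failed


def fake_loader(vid):
    # Hypothetical stand-in for the real transcript loader; pretends one video has no captions.
    if vid == "QujoO8CLGMw":
        raise ValueError("Subtitles are disabled for this video")
    return f"transcript of {vid}"


urls = [
    "https://www.youtube.com/watch?v=QsYGlZkevEg&list=PL123",  # playlist param is made up
    "https://youtu.be/QujoO8CLGMw",
    "https://example.com/not-a-video",
]
ids = [vid for vid in (video_id(u) for u in urls) if vid]
loaded, failed = load_all(ids, fake_loader)
print(loaded, failed)
```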
@snippletrap · 1 year ago
I am going to start using "instantialize" unironically. Good word
@DataIndependent · 1 year ago
Tomato, tomato, ha :) Like any good forward-thinking developer, I just snagged instantialize.com
@jesusmtz29 · 1 year ago · 1 like
Is it possible to pass additional instructions to the summary method?
@NoOne-uz4vs · 5 months ago
Have you found a way? I mean, I need to summarize the video in another language instead of English.
@henkhbit5748 · 1 year ago
The results of your LangChain summary of summaries are 👏
A small question: say you have a formula in your document, for example the quadratic formula or some other specific formula. Can I ask a question and have it solve for the answer?
@DataIndependent · 1 year ago · 1 like
You would likely need to isolate that piece of information and then ask it to solve it. There are math tools, but I haven't used them a ton.
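A "math tool" in this sense is a deterministic function that the model hands the extracted numbers to, instead of doing the arithmetic itself. A minimal sketch, assuming the coefficients of a quadratic a*x**2 + b*x + c = 0 have already been pulled out of the document; `solve_quadratic` is a hypothetical helper, and wiring it into a LangChain agent as a tool is not shown.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (complex if the discriminant is negative)."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    disc = cmath.sqrt(b * b - 4 * a * c)  # cmath handles negative discriminants
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))


roots = solve_quadratic(1, -3, 2)  # x**2 - 3x + 2 = 0
print(roots)
```

Using `cmath` rather than `math` means equations with no real roots return complex values instead of raising an error.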
@karinwiberg2223 · 9 months ago
Hi, this is a really helpful video. I want to ask you: when I try to access a YouTube video that is too long, I get an empty list as the result, meaning I don't have any text to split. Have you come across this before?
@DataIndependent · 9 months ago
Hey Karin!
Hm, I haven't run into that. But rather than the video being too long, it sounds like there isn't a transcript for it (not all videos have one).
Can you see the transcript in the YouTube UI?
@caiyu538 · 1 year ago
Great, great
@victorguerrero6581 · 1 year ago
Is there a way to save the YouTube link as a variable to put in Streamlit?
@lorenzoleongutierrez7927 · 1 year ago
Great videos! And greetings from the land of Pedro Pascal's ancestors, Chile 🇨🇱
@DataIndependent · 1 year ago
Thank you! Greetings!
@wwsdley · 1 year ago
Great video! Can you please check if this code is still working? I think something might have changed on Google's side, because I can't load any video transcriptions anymore...
@AyushSharma-ux4fk · 1 year ago
Hey, a dumb question.
If I can simply call the OpenAI APIs, what is the benefit of using LangChain? Internally, LangChain is also calling the OpenAI APIs.
Wouldn't taking the LangChain path increase the latency of the application?
@DataIndependent · 1 year ago
Check out my latest video on the 7 core concepts of LangChain. In it I give an overview of most of what it can do today: tons of software built to make common tasks easy.
Yes, it would increase the latency, but that is unavoidable for more sophisticated tasks at the moment.
@brunoresendesantos45 · 1 year ago
Cool!
Is there a way to change the summary prompt? Like a detailed summary instead of a concise one?
@DataIndependent · 1 year ago · 1 like
Yep, you'll need to edit the prompt that is being used. Ex: Concise > Detailed
Here is the documentation on how to do that:
langchain.readthedocs.io/en/latest/modules/indexes/chain_examples/summarize.html?highlight=load_summarize_chain#the-stuff-chain:~:text=Ukrainian%2DAmerican%20citizens.%22%7D-,Custom%20Prompts,-You%20can%20also
@aekundayo · 1 year ago
Hi, these videos are very helpful. I have a question, though. It seems there is some overlap between what you can do with LangChain and llama-index (gpt-index). In what scenario would you leverage both libraries?
@DataIndependent · 1 year ago · 1 like
I'm getting this question a lot and am happy to do a video on it. Thanks for asking.
@aiautoglasscrm · 1 year ago
Awesome videos, and I like the focus on solving business problems 🙂
1. The ChatGPT playground follows the prompts as intended. However, when I call the ChatGPT API with the same model, same prompts, and same settings, the response is rarely in the requested format.
2. Can feeding in Excel tabular data, using the method you demonstrated or another method, train ChatGPT to predict a column?
Just found your channel a few hours ago. Awesome videos, and thank you for making them.
@DataIndependent · 1 year ago · 1 like
Thanks for the kind words. What do you mean by predicting a column?
@aiautoglasscrm · 1 year ago
@@DataIndependent Thank you for responding. Imagine you have a table with these columns: price, car-years, car-make, car-autoglass, with thousands of rows. Can you use that table to train ChatGPT to predict a price given the car-years, car-make, and car-autoglass?
@DataIndependent · 1 year ago · 1 like
@@aiautoglasscrm Ah nice, I see. For that you'd want to use a different ML model, likely a regression based on those attributes.
There are a bunch out there to choose from. Maybe even some Kaggle exercises as examples.
@aiautoglasscrm · 1 year ago
@@DataIndependent Thank you!!
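The regression suggested in the reply can be illustrated with ordinary least squares on a single feature. A minimal sketch with made-up numbers, assuming price depends linearly on car age only; a real model for this table would also one-hot encode categorical columns such as car make and would normally use a library such as scikit-learn rather than a hand-written fit. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up rows: car age in years vs. autoglass price.
ages = [1, 2, 3, 4]
prices = [400.0, 350.0, 300.0, 250.0]
slope, intercept = fit_line(ages, prices)
predicted = slope * 5 + intercept  # price estimate for a 5-year-old car
print(slope, intercept, predicted)
```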
@rosslovell73 · 1 year ago
I'm at a loss. There seems to be no easy way to go from an existing list of preprocessed strings to anything that can chunk those strings. All the loaders assume a user will need to load from a document of some description. What if that isn't the case? What if a list is ready to go?
@DataIndependent · 1 year ago
Sorry, I don't fully understand. Where is your list of strings? In a text doc?
@RudyBanks · 1 year ago
I ran into this. If you already have the text in a string variable, say `LargeText`, change this line from `.split_documents` to `texts = text_splitter.split_text(LargeText)`.
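The idea behind calling `split_text` on strings you already have in memory can be shown with a plain character-window splitter. A minimal sketch, assuming fixed-size character chunks with overlap; LangChain's own splitters (for example `RecursiveCharacterTextSplitter`) try to break on separators such as paragraphs and newlines first, which this naive version does not do. `split_text` and `split_all` here are hypothetical stand-ins, not the library's implementations.

```python
def split_text(text, chunk_size=100, chunk_overlap=20):
    """Split text into windows of at most chunk_size characters, overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop before a start position whose window would only repeat the previous overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def split_all(strings, **kwargs):
    """Chunk every string in a ready-made list, keeping the original order."""
    return [chunk for s in strings for chunk in split_text(s, **kwargs)]


pieces = split_all(["abcdefghij", "xyz"], chunk_size=4, chunk_overlap=1)
print(pieces)
```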
@shamsnahid4046 · 1 year ago
Good video! I am curious, is there any way we can train our AI so it answers in a professional way like ChatGPT does?
@DataIndependent · 1 year ago
You'll need to write a custom prompt and tell it to speak in a different tone; examples of the tone you're looking for help as well.
@cgtinc4868 · 1 year ago
Hi Greg, great video again! Already "liked". Wondering if there is a translation module in LangChain, as some YouTube videos are in different languages. And two more requests about the YouTube functions: first, can I just get the full transcript? And second, can I restrict the extract to a time range, like from 1 minute into the video up to the 4th minute? Thanks, mate, and sorry for pushing the limits on this, but with those there are real uses.
@DataIndependent · 1 year ago
Hey! For translation, I haven't seen first-class support for this in LangChain yet.
For full transcriptions, yep, you can: the full transcript is an output of that data loader, which should work for you.
I don't understand the question about min 1-4.
@cgtinc4868 · 1 year ago
@@DataIndependent Oh, the 1-4 min part: when a video is, say, 10 minutes long, I just want to summarize from the first minute to the 4th and leave out the rest. Just wondering if that can be done.
@DataIndependent · 1 year ago
@@cgtinc4868 Nice. When I got the transcript I didn't see timestamps, but they may be hiding in there somewhere. If they are, it's just a simple matter of cropping the transcript.
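If the transcript is available as timed segments, cropping to a window such as minute 1 to minute 4 is a filter on each segment's start time. A minimal sketch, assuming segments shaped like those returned by the `youtube-transcript-api` package (dicts with `text`, `start` and `duration` in seconds); whether the loader exposes these timestamps depends on the version, and `crop_transcript` is a hypothetical helper. The cropped text would then go to the summarizer in place of the full transcript.

```python
def crop_transcript(segments, start_s, end_s):
    """Join the text of segments whose start time falls in [start_s, end_s)."""
    return " ".join(seg["text"] for seg in segments if start_s <= seg["start"] < end_s)


# Hypothetical segments in the assumed shape: text plus start/duration in seconds.
segments = [
    {"text": "Intro", "start": 0.0, "duration": 55.0},
    {"text": "First point", "start": 62.5, "duration": 40.0},
    {"text": "Second point", "start": 180.0, "duration": 30.0},
    {"text": "Outro", "start": 300.0, "duration": 20.0},
]
excerpt = crop_transcript(segments, 60, 240)  # minute 1 up to minute 4
print(excerpt)
```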
@adipatki · 1 year ago
Is there a way to get longer and more detailed output?
@DataIndependent · 1 year ago
You can change the prompt or use custom prompts and ask for more information.
@xorlop · 1 year ago
I am interested to know what happens, and what you should do, when the map_reduce chain's combined input is also too long. For example, what if all the concatenated summaries are greater than 4096 tokens, the max limit? Maybe there could be a map_reduce_recursive that automatically solves this problem for you.
@xorlop · 1 year ago
Omg, never mind! Your next video about querying a book with Pinecone covers the case where you have many documents. It looks like the method is to find similar documents first instead of map_reduce summarizing all of them!
@DataIndependent · 1 year ago
Nice! Glad that worked out
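The "map_reduce_recursive" idea floated here can be sketched as repeated reduce rounds: summarize each chunk, then keep summarizing small groups of summaries until the combined text fits the model's limit. A minimal sketch, assuming word counts approximate tokens and a hypothetical `summarize` that just keeps the first three words in place of an LLM call; it illustrates the control flow only and does not describe how LangChain's own map_reduce chain behaves. As the thread concludes, retrieving only the relevant chunks is often the better route for large documents.

```python
def count_tokens(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(text):
    # Hypothetical summarizer standing in for an LLM call: keeps the first three words.
    return " ".join(text.split()[:3])


def recursive_map_reduce(chunks, limit, fanout=2):
    """Summarize chunks, then re-summarize groups until they fit in `limit` (or one is left)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2 so each round shrinks the list")
    summaries = [summarize(chunk) for chunk in chunks]  # map step
    # Reduce rounds: while the joined summaries are too long for one call,
    # summarize them in groups of `fanout`. Each round shrinks the list.
    while len(summaries) > 1 and count_tokens(" ".join(summaries)) > limit:
        summaries = [
            summarize(" ".join(summaries[i:i + fanout]))
            for i in range(0, len(summaries), fanout)
        ]
    return summarize(" ".join(summaries))  # final combine step


chunks = ["a b c d", "e f g h", "i j k l", "m n o p"]
result = recursive_map_reduce(chunks, limit=4)
print(result)
```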
@ambrosionguema9200 · 1 year ago
Great! How do I load my personal link with audio? Which method should I use?
@DataIndependent · 1 year ago
What do you mean by your personal link?
@ambrosionguema9200 · 1 year ago
@@DataIndependent I have a link from when I'm teaching, but it's not from YouTube. Is it possible to put it into this YoutubeLoader....(url)?
@mandardk · 1 year ago
Excellent videos. I loved this, but when I try running it I get multiple errors. Does anyone have fully working code?
@wwsdley · 1 year ago
I'm having the same issue. I've heard that Google has shut down some loaders, such as pytube... :(
@digitald74 · 1 year ago
Nice tutorial. Is it possible to use another language for the transcript and also modify the prompt?
@digitald74 · 1 year ago
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=QujoO8CLGMw", add_video_info=True, language='de')
@digitald74 · 1 year ago
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True, map_prompt=prompt, combine_prompt=prompt)
@DataIndependent · 1 year ago
Yep, that is exactly how to modify the prompt. How did it go for you?
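For both the "detailed instead of concise" and the "other language" requests, the change lives in the prompt text passed as `map_prompt`/`combine_prompt`. A minimal sketch, assuming the summarize chain's prompts take a single `{text}` variable (as the default ones did); the template wording is only an example, and `render` is a hypothetical helper that fills the slot the way the chain would before calling the LLM.

```python
# Example template: "{text}" receives a chunk in the map step and the joined
# chunk summaries in the combine step.
# In LangChain it would be wrapped roughly as:
#   prompt = PromptTemplate(template=DETAILED_GERMAN_TEMPLATE, input_variables=["text"])
DETAILED_GERMAN_TEMPLATE = """Write a detailed summary of the following text in German.
Keep important names, numbers and arguments.

{text}

DETAILED SUMMARY IN GERMAN:"""


def render(template: str, text: str) -> str:
    """Fill the {text} slot, as the chain does before sending the prompt to the LLM."""
    return template.format(text=text)


rendered = render(DETAILED_GERMAN_TEMPLATE, "LangChain connects LLMs to your data.")
print(rendered)
```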
@junaidmughal3806 · 1 year ago
You look like Ryan Gosling
@BorisDrubetsky · 1 year ago
Very nice demo, thank you very much.
I am wondering if anyone else is running into the error "AttributeError: type object 'YoutubeLoader' has no attribute 'from_youtube_url'" when running loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=QsYGlZkevEg", add_video_info=True), despite the prerequisites being installed?
Thanks again OP, well-presented tutorial.
@DataIndependent · 1 year ago
Nice! I've seen some updates come through for LangChain, and specifically for that loader. Make sure you're on the most recent version.
@waqasobeidy8318 · 1 year ago
Were you able to solve this? I installed the latest version but still face the same issue.
@d279020 · 1 year ago · 1 like
@@waqasobeidy8318 The loader seems to have been updated to use the official Google Cloud API. v0.0.105 still seems to work. I'm not against using the API, but I tend to avoid anything that asks for my credit card at all costs.
@waqasobeidy8318 · 1 year ago
@@d279020 Yep, I agree. The function works fine on the older version like you suggested. Thanks.
@shamsnahid4046 · 1 year ago
@waqas Which older version are you using?
@ArchITECH-vk7ke · 1 year ago
Thank you for sharing this, super helpful. Wondering if you ran into the issue below, with the API unable to retrieve the transcript? Sample below:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=eVX0QrvjA5M! This is most likely caused by:
Subtitles are disabled for this video
Thanks in advance!!
@DataIndependent · 1 year ago
Interesting, no, I haven't seen that issue. Though it's not scalable, there are a lot of sites that will get a transcript for you from the audio. Or you could use Whisper.
@sbharadwaj1 · 1 year ago
Yes, the diagrams are super cool.
I was wondering how to handle subsections of a video, e.g. https://www.youtube.com/watch?v=QsYGlZkevEg. Is there a way to get the summary of just a section of a video?
@DataIndependent · 1 year ago · 1 like
You totally could. You just need to feed that subsection into your summarizer.
There isn't an easy out-of-the-box way to do it, though.
@sbharadwaj1 · 1 year ago
@@DataIndependent Thank you. Maybe there is a different API in YoutubeLoader for this? Otherwise, does one have to dig through or guesstimate the spot in the text stream?
@shamsnahid4046 · 1 year ago
And they're saying "youtube loader has no attribute from_youtube_url"
@DataIndependent · 1 year ago · 1 like
Try upgrading LangChain, and if it still doesn't work, check the code in the documentation.
