AI Blog Post Summarization with Hugging Face Transformers & Beautiful Soup Web Scraping

Nicholas Renotte

17K views

Published: Jan 21, 2025

Comments • 77

@upalkundu2872 3 years ago ⁺³
Your videos are always useful. The explanation along with the work just makes it WoW. Again, really useful for beginners like me who want to get into data science.
@muhmmedmomen8948 3 years ago ⁺²
Here is a like from my side before ending the video 👍 The intro tells a lot. Appreciated effort, bro.
@NicholasRenotte 3 years ago ⁺¹
Thanks a ton @Muhmmed!!
@muditrustagi5775 3 years ago ⁺¹
This was much needed!! Thank you!!!!!
@NicholasRenotte 3 years ago
Thanks for checking it out!!
@vendroid6193 3 years ago ⁺¹
I can't thank you enough for all these videos.
Also, as a suggestion for the next video, I would like to suggest building a chatbot from scratch.
Keep up the good work, Sir.
@NicholasRenotte 3 years ago ⁺¹
I think I've got a walkthrough using Watson Assistant as one of my earlier videos!
@rachelroselinarul8055 3 years ago ⁺¹
Great job. I am doing research in abstractive text summarization, so kindly upload more videos on abstractive text summarization, from basics to advanced. Thank you.
@MyChris128 3 years ago ⁺¹
Great video, very well explained 👍
@NicholasRenotte 3 years ago ⁺¹
Thanks so much @Chris D!
@the_python_guide 1 year ago
Hey nick, here is another way to extract chunks.
for i in range(len(res_arr)):
length=len(res_arr[i].split(' '))
if(length+count
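The snippet above is cut off in this copy of the thread, so the following is not a reconstruction of it. It is a minimal sketch of the general word-budget chunking idea: sentences are packed into a chunk until adding the next one would exceed `max_words`. The function name `chunk_sentences` and the 500-word default are assumptions for illustration.

```python
def chunk_sentences(sentences, max_words=500):
    """Greedily pack sentences into chunks of at most max_words words."""
    chunks = []
    current = []
    count = 0
    for sentence in sentences:
        length = len(sentence.split())
        # Start a new chunk when this sentence would overflow the budget.
        if current and count + length > max_words:
            chunks.append(" ".join(current))
            current = []
            count = 0
        current.append(sentence)
        count += length
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    demo = ["one two three.", "four five.", "six seven eight nine."]
    print(chunk_sentences(demo, max_words=5))
```

Note that a single sentence longer than `max_words` still ends up as its own oversized chunk in this sketch.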
@jorgerios4091 2 years ago
Hi Nicho, I learned a lot from your vid. I don't know if the YT algo takes this into consideration, but I wanted to say it anyway: thank you.
@ElTallerDeTD 3 years ago
Amazing video! 🤩
@NicholasRenotte 3 years ago ⁺¹
Thanks so much, glad you enjoyed it!
@thepythonprogrammer4338 3 years ago
Hats off brother, love your videos.
@abhishekshandilya5644 11 months ago
Made this last weekend for a hackathon, good little project to add to the arsenal. I’m still concerned about inference time. Is there an algorithmic way we can accelerate it?
@alokkumar8793 1 year ago ⁺¹
I can't import pipeline from transformer, what to do?
@toriqhasmen9129 1 year ago
Nick, the line you have, results = soup.find_all('h1','p'), only gets the headline of the article when I tried it. Are you sure it gets all the text from 'soup'? It doesn't seem to work for me.
Also, can you not use some Python library that does HTML to text directly, instead of removing the HTML tags this way?
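On the `find_all('h1','p')` question: in Beautiful Soup the second positional argument of `find_all` is an attribute filter, and a plain string there is matched against the CSS class, so that call looks for `<h1>` elements with class `p`. Passing a list, `soup.find_all(['h1', 'p'])`, matches either tag. As for going from HTML to text without stripping tags by hand, here is a minimal standard-library sketch. The class name `TextExtractor` and the choice of tags are assumptions for illustration, not what the video uses.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found inside selected tags (hypothetical helper)."""

    def __init__(self, wanted=("h1", "p")):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0      # > 0 while inside a wanted tag
        self.blocks = []    # one string per wanted element

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            if self.depth == 0:
                self.blocks.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1] += data


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Normalize internal whitespace and drop empty blocks.
    return [" ".join(b.split()) for b in parser.blocks if b.strip()]
```

This sketch assumes reasonably well-formed markup; pages with unclosed `<p>` tags would need a more forgiving parser.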
@d3v487 3 years ago ⁺²
Very nice explanation. How should I use this for a whole dataset? Please share a link if you have one.
@NicholasRenotte 3 years ago
You can use this for a whole dataset, it chunks it up :)
@peshangjaafar8469 3 years ago ⁺¹
Thank you so much. From Iraq. Peace...
@alexandermedina4950 2 years ago
Great content, as usual, thank you for this.
@ThemanB1997 1 year ago
Is this technique viable for hardcopy files if it's not a blog post online?
@haardrao4387 1 year ago
I am trying to use this model, but I am not able to extract the whole blog. Can you please help me out with it?
@anirbanpatra3017 2 years ago
Thanks for the tutorial. I am really struggling with the chunking part. Is there a way I can understand it better?
Is it possible to deploy this on Streamlit?
@sebastianmayer5418 3 years ago ⁺¹
Thank you very much for this tutorial.
I tried it with a very long text. I chunked the text (length of chunks < 500 words) and passed the chunks to the Transformer, like you did. My text has more than 20 chunks.
After the input of the 16th chunk to the transformer I get an index-out-of-range error. (IndexError: index out of range in self)
Does this transformer have a limit there?
Do you have a solution for this problem?
Thank you for your response.
@NicholasRenotte 3 years ago
Weird, does it work with 15 chunks?
@sebastianmayer5418 3 years ago
@@NicholasRenotte yes
@NicholasRenotte 3 years ago
@@sebastianmayer5418 would it work if you break it up and do it in two runs?
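One possible cause of `IndexError: index out of range in self` in this situation, not confirmed anywhere in this thread, is a chunk that tokenizes to more positions than the model accepts: a chunk under 500 words can still exceed a token limit because tokenizers often split one word into several tokens. A defensive sketch is to re-split any chunk over a conservative word budget before summarizing. The name `split_oversized` and `max_words=300` are assumed values, not a documented limit.

```python
def split_oversized(chunks, max_words=300):
    """Split any chunk with more than max_words words into smaller pieces."""
    safe = []
    for chunk in chunks:
        words = chunk.split()
        # Step through the words in windows of max_words.
        for start in range(0, len(words), max_words):
            safe.append(" ".join(words[start:start + max_words]))
    return safe
```

Splitting on a fixed word count can cut a sentence in half, so this trades some summary quality for robustness.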
@dab0927 3 years ago ⁺¹
Great tutorial and excellent explanations from beginning to end. My question for you: can you take several articles and produce one summary? In other words, it would be great to have a single summary of several related articles. Is that possible? If so, how is it different from the process you walked through in this video?
@NicholasRenotte 3 years ago
You could generate a summary for each. If you were looking at extracting key topics from each, topic discovery might be a better technique.
@upalkundu2872 3 years ago
Maybe append the result texts and run the summarizer on the extended one?
@hambaba2 3 years ago ⁺¹
Hi Nicholas, thank you for your wonderful videos. I have a question on this one: is it possible to add a feature that not only summarizes each chunk, but also provides a number for the same chunk (maybe between -1 and 1) that reflects its sentiment, positive or negative, in the form of a dataframe with two columns, "Summary" and "Sentiment"? Thanks again for the awesome work you are doing.
@NicholasRenotte 3 years ago ⁺¹
Could definitely do that, try passing the chunk to something like this: ruclips.net/video/szczpgOEdXs/видео.html
@Maicolacola 3 years ago ⁺¹
Hi Nicholas, thanks for putting together an incredible tutorial. I was able to get this going in no time. I managed to use your script (with slight modifications) to summarize an approximately 10.5k-word scientific article down to 1.4k. Which brings me to my question. I set the max length to 300, but it returned a summary of 1.4k words. Do you know what might be going on here?
I'm going to make a loop that repeatedly runs the summarize code until the length of the text is below 300 words. I'll report back!
@Maicolacola 3 years ago ⁺²
Update: I made a while loop that would keep going until the length of the summary was below the max_length I specified. It took two runs instead of one to achieve that. Despite it being a summary of a summary, it reads really close to the actual abstract. I think with some fine tuning, it could get most of the way there.
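The loop described in this comment can be sketched as follows. The `summarize` argument stands in for whatever function produces a summary (for example, the chunk-and-summarize code from the video); it is a placeholder, and `max_rounds` is an added safety cap that is not part of the commenter's description. `keep_first_half` is only a stand-in so the sketch runs without a model.

```python
def summarize_until_short(text, summarize, max_words=300, max_rounds=5):
    """Re-run summarize on its own output until it fits in max_words."""
    rounds = 0
    while len(text.split()) > max_words and rounds < max_rounds:
        shorter = summarize(text)
        # Stop if the summarizer is no longer shrinking the text.
        if len(shorter.split()) >= len(text.split()):
            break
        text = shorter
        rounds += 1
    return text, rounds


def keep_first_half(text):
    """Stand-in summarizer for the demo: keeps the first half of the words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])
```

The early `break` guards against an endless loop when a pass fails to shorten the text.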
@NicholasRenotte 3 years ago ⁺²
THIS IS AWESOME, yeah I've had mixed results with setting max_length and even min_length. Need to do some more digging into it. Would love to hear more about your use case!
@sarahelizabethnajeraespino7464 2 years ago ⁺¹
This is so amazing! I am a complete beginner with data science but this seems so useful! Would it be possible to do the same for a list of URLs exported in a CSV file?
@NicholasRenotte 2 years ago ⁺¹
Sure could! Scrape them first using BeautifulSoup then run the summarizer over it!
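A minimal sketch of the CSV step, assuming the file has a header row with a column named `url` (a hypothetical name; adjust it to the actual export). Fetching and summarizing each page is left to the scraping and summarization code from the video.

```python
import csv
import io


def read_urls(csv_file, column="url"):
    """Return the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]


if __name__ == "__main__":
    # In practice: open("urls.csv", newline="", encoding="utf-8")
    sample = io.StringIO("url,title\nhttps://example.com/a,A\n,missing\nhttps://example.com/b,B\n")
    for url in read_urls(sample):
        print(url)  # scrape and summarize each URL here
```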
@CODTALES-KILLSTREAKS 3 years ago ⁺¹
Can you do this for tree care blog posts? I’m interested in seeing a summary of tree care posts
@NicholasRenotte 3 years ago
Definitely, grab some tree care blog posts and give it a crack!
@rokeyasiddiqua9375 3 years ago ⁺¹
awesome...!
thanks a lot
@NicholasRenotte 3 years ago
Anytime @Rokeya, thanks for checking it out!
@islamrighi8395 3 years ago ⁺¹
Thank you very much for this tutorial.
For my work I want to summarize PDF text in French, is it possible?
@NicholasRenotte 3 years ago ⁺²
Hmmm, doesn't look like there's an explicit French model. You could use translation to convert to English then summarize and translate back to French though!
@AI-LearnAndEarn 3 years ago ⁺¹
Thanks for the great content. Channel subscribed!!!
Can you please answer a question? Is it possible to create your own language model using web-scraped data, and then later do transfer learning with Hugging Face transformers?
@NicholasRenotte 3 years ago ⁺¹
Sure can, you can fine tune the underlying language models!
@AI-LearnAndEarn 3 years ago
@@NicholasRenotte Can you please make a video on how to do it? Fine-tune a language model using GPT2. Thanks in advance.
@NicholasRenotte 3 years ago ⁺²
@@AI-LearnAndEarn definitely, I've got it planned!
@ruchisehgal893 3 years ago
Hi Nick. This was a really great blog, but what if I have to write the data to a Word file (docx file) and have to put the sentences in bullets and add margins?
@ruchisehgal893 3 years ago
Also, there is an issue when a number like 18.8 or 9.1 appears: the number gets split across different lines when we bullet the text into separate points. Can you let me know how to solve this?
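If the bullet lines are being produced by splitting on every period, decimals such as 18.8 get cut in two. One way around this, sketched here under the assumption that sentences end with `.`, `!` or `?` followed by whitespace, is to split only at that boundary. The function name `split_sentences` is illustrative.

```python
import re


def split_sentences(text):
    """Split after ., ! or ? only when whitespace follows, so 18.8 stays whole."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```

Abbreviations followed by a space (such as "e.g. ") would still be split by this pattern, so a dedicated sentence tokenizer may be a better fit for messier text.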
@Jandoesrun 3 years ago
I'm really thankful for your videos. You're my life saver for my thesis research.
Can you make a video on how to use MediaPipe by Google for hand gesture recognition?
Also, I'm really at a loss for how I can do my thesis.
Can I consult with you?
I'm trying to make use of the hand landmark values to classify certain words for sign language.
I'm confused whether to use LSTMs, Transformers, BERT, GPT2. This is honestly overwhelming.
@NicholasRenotte 3 years ago ⁺¹
Yup, definitely! It'll be coming soon!
@Jandoesrun 3 years ago ⁺¹
@@NicholasRenotte Thank you very much! Sir Nicholas, I'm trying to research a TensorFlow implementation of GPT models. Do you have any ideas?
@NicholasRenotte 3 years ago
@@Jandoesrun hot off the press: ruclips.net/video/cHymMt1SQn8/видео.html it uses PyTorch but you can change the backend to TF as well
@henkhbit5748 3 years ago
As always, great intro. But as you know, not all viewers have a kangaroo 😎 in their backyard and have English as their default language. It might be helpful if you're doing NLP to give some side notes about other languages.
BTW: I did a small test and provided Dutch text without translation. I also translated the same text into English as input, summarized it, and translated it back to Dutch. I compared both summaries and both versions are almost the same. In the non-translated version, a few words have been chopped.
@NicholasRenotte 3 years ago
😂😂 I had a good laugh at the kangaroo reference, was going to say even I don't have kangaroos in my yard. But tbh, I've got some living 20 minutes away from me so that argument was null and void. Wait so in the non-translated version it didn't really summarize?
@slowedReverbJunction 3 years ago ⁺¹
I don't code in Python at all, just JavaScript, but this one seems interesting. Can a noob in NLP like me try this?
@NicholasRenotte 3 years ago ⁺²
If you can code in JS, you can probably code in anything 😅. Definitely give it a crack, you'll love it. Python ML and JS are the perfect combo!
@slowedReverbJunction 3 years ago
@@NicholasRenotte that's gr8 to know
I will definitely give it a try now
@NicholasRenotte 3 years ago
@@slowedReverbJunction awesome stuff!
@sameerpatel3201 3 years ago ⁺²
Me: Likes the video even before watching it.
@NicholasRenotte 3 years ago
YEAHHHYAAA! Thanks so much @Sameer!
@Venkatesh-vm4ll 1 year ago
Can we ask a question and get back a summary of what we taught it?
@ingoampt 5 months ago
Can you or someone here tell me how I can now turn it into an API and use it in Swift/Xcode for an app?
@captainng97 3 years ago ⁺¹
Hi, is this Abstractive or Extractive? 😅
@NicholasRenotte 3 years ago
Heya @Ng, it's using the same model as before which I believe is Extractive!
@testingemailstestingemails4245 3 years ago
How do I train that Hugging Face model on my own dataset? How can I start? I don't know the structure of the dataset. Help.. very help
How do I store the voice, how do I link it with its text, and how do I organize that?
I am looking for anyone on this planet to help me
Should I look for the answer on Mars?
@diegocaumont5677 3 years ago ⁺¹
dope dope dope
@NicholasRenotte 3 years ago
Yeahyyaa, thanks @Diego!
@amitdutta3875 3 years ago
Is it possible to extract data like email, phone number, and address from a text?
Can you make a video on that?
@NicholasRenotte 3 years ago ⁺¹
Definitely, you could use something like OCR: ruclips.net/video/ZVKaWPW9oQY/видео.html
@amitdutta3875 3 years ago ⁺¹
@@NicholasRenotte thank you sir, but I was looking for CV or resume parsing to get only the important text like email ID and phone number
@NicholasRenotte 3 years ago ⁺¹
@@amitdutta3875 oh, you could probably extract that using regex or using a text classifier!
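A rough sketch of the regex route. The patterns below are deliberately simple assumptions: they catch common email and phone formats but will miss some valid ones and may match unrelated digit runs, so treat them as a starting point. The names `EMAIL_RE`, `PHONE_RE` and `extract_contacts` are illustrative.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading +, a digit, then at least seven digits or separators
# (spaces, dots, dashes, parentheses), ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return the emails and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.strip() for m in PHONE_RE.findall(text)],
    }
```

For names and street addresses, which have no fixed pattern, regex alone tends to fall short, and a trained model is usually the better tool.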
@amitdutta3875 3 years ago ⁺¹
Thank you
@amitdutta3875 3 years ago ⁺¹
Actually I heard about entity recognition
