Fighting Spam on YouTube with TensorFlow & Python

  • Published: 24 Jun 2021
  • I'm sick of crypto-related spam comments on YouTube, so I trained a machine learning model to delete them! A script runs periodically and uses the text classifier to filter the latest comments on my videos.
    The filter is surprisingly effective, even though the training dataset is relatively small. I'll keep expanding the dataset and retraining the classifier so it becomes more accurate over time.
    💌 Sign up for Simply Explained Newsletter:
    newsletter.simplyexplained.com
    Monthly newsletter with cool stuff I found on the internet (related to science, technology, biology, and other nerdy things)! No spam. Ever. Promise!
    🌍 Social
    Twitter: / savjee
    Facebook: / savjee
    Blog: savjee.be
    ❤️ Become a Simply Explained member: / @simplyexplained
    👩‍💻 Source code:
    Available on GitHub:
    github.com/Savjee/yt-spam-cla...
    ❓❓ Frequently asked question:
    ❓ Why do I still see spam comments on your channel?
    First, the AI doesn't catch every comment, so some still require manual intervention. Second, the script runs on a fixed interval, so give it some time to run. Third, it only filters recent comments. I will let the classifier clean up the old comments as well.
  • Science
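
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `Comment`, `find_spam`, `toy_score` and the 0.5 threshold are all hypothetical names and values, and the toy scorer only stands in for the trained TensorFlow classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Comment:
    comment_id: str  # hypothetical field names, not the YouTube API schema
    text: str


def find_spam(comments: Iterable[Comment],
              score: Callable[[str], float],
              threshold: float = 0.5) -> List[str]:
    """Return the IDs of comments whose spam score is at or above the threshold."""
    return [c.comment_id for c in comments if score(c.text) >= threshold]


def toy_score(text: str) -> float:
    """Placeholder scorer: the fraction of words that are hand-picked trigger words.

    A real setup would call the trained model here instead.
    """
    triggers = {"crypto", "broker", "whatsapp", "invest", "profit"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in triggers for w in words) / len(words)
```

Passing the scorer in as a function keeps the filtering logic independent of whichever model produces the spam probability.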

Comments • 148

  • @bahadir2198 2 years ago +20

    So scared of this government issue on always banning crypto.

    • @harrywilson3206 2 years ago

      Same. But have no fear. When they did early this year I was still earning profit. Because I have a licensed and legitimate USA broker who trades for me.

    • @christysmith5458 2 years ago

      Tell me about it please.

    • @mollynobles5833 2 years ago

      How do you get the money he trades for you.

    • @kerienstones3679 2 years ago

      What’s his name. And how do I know his legit

    • @bahadir2198 2 years ago

      @UCeCCBPRZz_fAdI9EX_XlDQw ID@R a m i r e z o s p i n a

  • @QuantumWalnut 3 years ago +47

    The fact that machine learning has become so DIY makes me very hopeful for its applications. The cheaper and more accessible it becomes, the more democratic this technology can be.

    • @cybr774 3 years ago +4

      lol "democratic"

    • @TheALPHA1550 2 years ago

      @@cybr774 Seriously.

    • @zoc2 2 years ago +1

      This video makes me want to get into it myself. I think I'm going to dedicate tomorrow to machine learning!

  • @Tredecillionscience 3 years ago +20

    The explanation is so smooth and easy to understand! Really appreciate the effort you made ^^

  • @zyansheep 3 years ago +41

    YouTube has no incentive to fix this. We wouldn't even need a spam filter if they just used a better type of comment system (like the one Reddit uses).
    Edit: Python is pretty awesome...

    • @simplyexplained 3 years ago +15

      Well, I could think of a few simple measures. Require people to validate a phone number, limit the number of comments you can post in a day, and indeed, allow everyone to vote and moderate like Reddit.
      Oh yeah Python is great! Just started using it, and I'm amazed at how easy it is to learn, to use, and how many great libraries there are.

    • @onatkorucu842 2 years ago +1

      Why do you think that reddit is better?

    • @isableye7164 2 years ago +6

      @@onatkorucu842 lol bcoz unlike YouTube their downvote button actually works.

    • @dragonhold4 2 years ago +4

      In every Reddit thread there are many legitimate comments that tell the truth but may be inconvenient so they are bombed by a mob and unfairly set to hidden, removed, or put to the very bottom of the infinity scroll. This system is far from ideal.

    • @supertron6039 2 years ago

      @@isableye7164 lmao

  • @theguyordie 3 years ago +17

    Add some OAuth support, swap out Google Sheets with an SQL database, add a simple dashboard and you've got yourself a really neat PaaS product that could do really well!

    • @JohnyK07 3 years ago +1

      yeah, I can see that Sheets file getting filled pretty quickly... (each one has a limit of 5 million cells...yup, cells, not rows).
      Google also has BigQuery, if he wants to keep it in the cloud and benefit from Sheets integration, but any sql database would do nicely.

  • @evergreen- 3 years ago +12

    Simply Explained: So far I haven’t noticed false positives with this AI
    Comments thread under this video: *empty*

  • @adarshkumar3518 3 years ago +21

    Lol. YouTube needs this 😂

  • @supertron6039 2 years ago +2

    Now imagine if YT chose to make similar text identifiers in their thousands of servers to clean up all of YT from trolls, porn adverts and fake website links. _Like that's ever gonna happen._

  • @Pix2io 1 year ago +1

    It would be cool if YouTube themselves started selling spam removal devices and maybe cameras and gaming gear for YouTubers.

  • @simonalogiudice7581 2 years ago +1

    Amazing! You explain things very clearly, good job!

  • @varunahlawat9013 1 year ago

    "Bring it on"
    LMAO!
    Really appreciate this video!

  • @YoushaAhmad 3 years ago +1

    This is impressive. Well done and thank you! I have been disappointed with YouTube for not sorting this out, considering Alphabet has some highly capable AI projects, which they often show off in press releases. If they don't get on top of this, perhaps you could continue to work on this and license it to other channels to use themselves, or open source it with donations or a freemium option.
    I used to flag more comments as spam, but thought YouTube was going to do something years ago. It is also annoying when large channels like Bloomberg leave spam up. Whilst it isn't their fault, they have the means to help get rid of it themselves.

  • @selvamselvam3670 1 year ago

    you always explain complicated topics in a simple way.. love your way of explaining.. you got a new subscriber

  • @reikiorgone 1 year ago +1

    Testing organically that by retraining the model this comment is an organic test of Dogecoin xlm YouTube also I love you this was a great video

  • @vamshi4956 2 years ago

    Wow. I was thinking about the spam and I found your video. I now have some idea of where to start even though I know nothing but Java. I'll update if I make any progress. Thank you and great content. Cheers.

  • @brombeerbert2768 1 year ago

    Nice channel! Thank you for all the education.
    I just had to let you know that I randomly ended up on your channel after clicking an ad on The Million Dollar Website.
    Had me laughing :D

  • @benjaminkirbytennyson386 2 years ago

    Thanks for sharing this video, now I can show this to my engineer friend to overcome your spam filter.

  • @MyBizTT 3 years ago +1

    Excellent video! I'm guessing Naive Bayes Classifier?
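
For readers unfamiliar with the technique named in this question, here is a minimal multinomial Naive Bayes text classifier in plain Python. It is a generic illustration only: the class name, the "spam"/"ham" labels and the whitespace tokenizer are assumptions, the video's title points to TensorFlow, and nothing in this thread confirms which model the author actually used.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Add one labeled example; label must be "spam" or "ham"."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Return P(spam | text). Needs at least one example of each label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the log prior of the class.
            log_p = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the product.
                log_p += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = log_p
        # Normalize in a numerically stable way.
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])
```

Training it on a handful of labeled comments and calling `spam_probability` on a new one gives a value between 0 and 1, comparable in spirit to the percentages the author quotes elsewhere in this thread.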

  • @chaos_monster 2 years ago

    I understand your need to run it on a server, but I am also really happy to see CoreML running on the devices only - that makes me feel a little bit less paranoia :D

  • @devadevans700 2 years ago

    Hey long time no see, your content is really good, patience and consistency is important

  • @beloved3244 3 years ago +1

    Dude this is awesome!!!

  • @aayushgore4245 1 year ago +1

    Hotdog NOT HOTdog!! 🤣🤣

  • @prikshitparashar8950 3 years ago

    Amazing work !

  • @mr_vinod_123 2 years ago +1

    Amazing explanation. 👏👏

  • @polavenki 2 years ago +1

    Was wondering about your thoughts on using the pre-trained zero shot models like GPT for this use case?

  • @gosper420tyvs 1 year ago +1

    You are awesomeness bro !!! Thank you for sharing !!! 🔥😎🔥😎🫶🏼👌🏼

  • @ahmad_dos5563 2 years ago

    That’s fascinating bro

  • @robertb7003 3 years ago +2

    That's totally awesome. Shows the power of using APIs. You could definitely make some money on this even if YouTube didn't hire you. Your server looks like it could handle other YouTubers' channels as well. Thanks for the videos.

    • @simplyexplained 3 years ago +2

      I might consider doing that if I get some requests from channels. Haha, the server appears to be very fancy, but it's actually a very old one. I removed all the hardware and put in a low-power CPU. It's mainly used for backups and home automation.

    • @2DReanimation 2 years ago

      @@simplyexplained You could write spam bots to get all the big youtubers to buy your product XD

  • @thesultan1212 3 years ago +1

    Dude this is amazing!

  • @Kim-by5uy 2 years ago

    Can we all take a moment to appreciate the great and easy-to-follow explanation?

  • @derickrcruz 3 years ago +1

    0:47 dammit Jian-Yang

    • @simplyexplained 3 years ago +1

      Finally someone who noticed the reference!

  • @ffcml1733 2 years ago

    Where did you study all this programming and other stuff?

  • @anuragdhondge9579 3 years ago +6

    This is a spam comment. Deal with it algorithm.

    • @nethoncho 3 years ago +1

      LOL

    • @simplyexplained 3 years ago +8

      Algorithm says: 8.5% chance of being spam. Try harder ;)

    • @anuragdhondge9579 3 years ago +3

      @@simplyexplained 😂😂
      Love your channel, keep up the work..thanks for the reply

  • @HesderOleh 2 years ago

    ThioJoe just made a script that requires you to name the spammer, while looking for training data to see if I could automate it, I see you have already done this!

  • @PawirodinomoM 3 years ago +1

    Amazing!

  • @AamishSohailRamay 1 year ago +1

    Which software do you use to make these attractive videos?

  • @reold 3 years ago

    You could also use a PythonAnywhere or Heroku server if you need the home server back.

    • @simplyexplained 3 years ago +1

      True, I thought about going that route. But my home server is running Proxmox. Plenty of space for VMs and containers like this ;)

  • @jamesseddon1637 2 years ago

    @savjee Man, great video. I've been writing a small script to scrape comments and detect deleted comments. I've been working with pandas and CSV and it's an absolute nightmare, especially, as you say, when trying to constantly append data to the CSV and read it back in a loop. Mega thanks for the source code, I'm going to implement a similar approach using Google Sheets and see how that goes. Out of curiosity, have you hit any limits with the YouTube API?

    • @jamesseddon1637 2 years ago

      I noticed a small bug: on line 76, where you reset allComs, you use the wrong variable name (allComms instead of allComs).
      "# Reset list before we continue
      allComms = []"

    • @jamesseddon1637 2 years ago

      Scratch that I don't think that line is even needed.

    • @simplyexplained 2 years ago

      Yeah, CSV files are a mess. Also, the YouTube Data API isn't very easy to get started with.
      As for the limits: I did request a quota increase and got it very quickly. However, I don't really need it. This channel doesn't get that many comments.
      Thank you for spotting that! I removed the allComms line because it was indeed unnecessary.

  • @sneu420 3 years ago

    Hey hey 0:26, that's me on one of your videos...!

  • @crashia 2 years ago

    When are new videos coming? I've just discovered this channel and I'm in love 💓

  • @ryugadebo 1 year ago

    Loved the video

  • @nurtorekelesov4286 1 year ago +1

    this was amazing

  • @NewMateo 3 years ago +4

    Man I gotta learn Python.

    • @simplyexplained 3 years ago +2

      I just started learning it, and I'm loving it so far!

  • @hem89180 2 years ago

    Well done.

  • @UDKO2 2 years ago

    How do you make it run every hour ?
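
The video description only says the script runs on a fixed interval; it does not say how the run is scheduled. One generic way to do it in plain Python is a sleep loop like the sketch below. `run_every` and its parameters are hypothetical names, and a cron job or systemd timer calling the script once per hour would work just as well.

```python
import time
from typing import Callable, Optional


def run_every(job: Callable[[], None],
              interval_seconds: float,
              max_runs: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call job repeatedly, pausing interval_seconds between calls.

    max_runs=None loops forever; sleep is injectable so the loop can be
    tested without actually waiting. Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

For an hourly run, `run_every(check_comments, 3600)` would do, where `check_comments` is whatever function fetches and filters the latest comments.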

  • @nixonlauture7337 3 years ago +2

    Can you add a step-by-step for this?

    • @simplyexplained 3 years ago +3

      The source code is on GitHub. I think the Jupyter notebook is easy to follow, but I might do a tutorial video on it. No promises though ;)

  • @Silverdev2482 3 years ago

    i tested the filter with a fake spam comment and it works

  • @md.najmulhasan8774 1 year ago

    wow that is amazing :)

  • @tuna1270 2 years ago

    Hope you can do a step by step tutorial for this!!! very cool.😎😎

    • @cyber3808 2 years ago

      THANK YOU FOR WATCHING FOR CRYPTO GUIDANCE SEND MSG RIGHT AWAY WHAT'SAPP

    • @cyber3808 2 years ago

      What'sApp✚447459667378

    • @sinankoa824 2 years ago

      @@cyber3808 aint no way

  • @YuzuruA 2 years ago

    The fact that Google can't emulate a simple DIY solution created by a loner YouTuber speaks volumes about their commitment.

  • @tahatatakorshow5396 2 years ago

    Xavier.
    Did YT hire you?
    You are amazing.

    • @simplyexplained 2 years ago +1

      No they didn't! But my anti-spam bot is still going strong ;)

    • @tahatatakorshow5396 2 years ago

      @@simplyexplained well, we all know how amazing you are, Xavier.
      I hope you get a better position and maybe hire me one day 😁

  • @machashanker6407 2 years ago

    Can we know which software you use for animation?

  • @silvernaturemusic599 3 years ago +1

    No spam in the comment box proves it right.

  • @pravallikadamerla9835 2 months ago

    Can you please share source code

  • @blackmennewstyle 3 years ago +5

    YouTube actually has no interest in removing this spam since I'm pretty sure, ironically, they are also probably involved in a huge amount of ad campaigns, very lucrative during the human malware pandemic outbreak ;)
    Let's see if your spam filter detects me as spam :p
    Have a great weekend my brother and keep up the great job

    • @simplyexplained 3 years ago +3

      It says there's a 0.3% chance that your comment is spam. You're safe ;)
      Have a nice weekend as well!

  • @justhere9549 2 years ago

    Will u ever come back?

  • @EsterMelati 2 years ago

    I rly wanna try tensorflow :(

  • @FrancisGauthier2 3 years ago

    I wonder if the Bayesian filter algorithm is now outdated by AI

  • @mariosasic4251 2 years ago +1

    nice video, good job bro :D

  • @amanda188 2 years ago

    Excuse me, I saw your channel on YouTube. I'm very interested. If you are interested in a business partnership, we can discuss the details.

  • @palabinash 3 years ago

    Nice

  • @REAnyAJ 2 years ago

    Badass

  • @cloudtech0903 4 months ago

    Share the code

  • @dizaj 2 years ago

    👍👍👍

  • @ghipsandrew 2 years ago

    Maybe the spammers will train an adversarial network to engineer their comments so as to trick your model :O

  • @miguelbertonatti 3 years ago

    👌🏼

  • @freebie808 3 years ago

    Cool

  • @yyjj7934 2 years ago

    Hello sir, how can I contact you?

  • @2DReanimation 2 years ago

    Wouldn't it have been better if all the manually not classified comments would have a value of 0.5 for "could be spam or not"?
    2:00: oh, you removed the non-tagged comments. That makes sense.

  • @zaurmustafayev7248 3 years ago

    Love your idea, would you mind sharing the source code with me? :) Happy to hear your feedback

    • @simplyexplained 3 years ago +1

      Sure! I mentioned it at the end of the video. Source code is on GitHub, link in the description.

  • @CuinnHerrick 3 years ago

    Let's give it a go...
    Get rich quick now. $$$$ Not spam. True wealth creation. 😋

  • @Lord0x 2 years ago

    amazing video.

  • @grindererrofficial3755 2 years ago

    Is he alive or dead? :( He's been silent for 11 months :(

  • @siddarthgurram5023 2 years ago

    My 🧠 : go spam a comment and check if it gets reported as spam ~~he said more the data better the prediction let's help him~~

  • @3F34N1M4T3S 2 years ago

    Came here from million dollar homepage

  • @alymuni 3 years ago +1

    This is a test to see if my comment gets deleted :D just for fun, anyway still a good video.
    SPAM SPAM find me

    • @simplyexplained 2 years ago +2

      Nope, algorithm says only 2% chance of being spam ;)

    • @alymuni 2 years ago +1

      @@simplyexplained aa ok xD thanks for letting me know

  • @lucagiovanni658 1 year ago +6

    Great video!!! Very engaging...
    With everything going on right now, the best decision is having a profitable investment strategy. Stocks are good but crypto is better.

  • @RickyAraujoOficial 2 years ago

    COPYRIGHT REMOVAL APPEAL
    Hi Xavier, how are you? My name is Ricky Araujo. You reported the video I posted on YouTube for violating rights to your video, I understand you have the right to do so, but I humbly apologize for that.
    I'm a fan of your content that's why I subscribed to your channel, at the time I watched your video and I thought it was so rich that I didn't think twice about wanting to copy it, but I'm here begging you for a venomous apology and I ask you to remove your information, as this radically gets in the way. my growth here in Brazil.
    Also, I can post a video apologizing and put your channel in my video description.
    Don't worry about it anymore, it will never happen again. I just ask that you please withdraw a complaint.
    Att. Ricky Araújo

  • @RealSweveel 2 years ago

    A

  • @Andy11876 2 years ago

    Hi I have something important to tell you

  • @sitbackandrelax2482 2 years ago

    i am a spam

  • @spiritbears 2 years ago +1

    Hey spam filter don't delete my comment its not a spam😂

  • @nethoncho 3 years ago +1

    This comment may be spam...

  • @ratgreen 3 years ago +1

    Great stuff, I hope yt actually does something. I must admit some of the comments are very legit, like the first 10 or so comments will look like a pretty normal conversation, perhaps a bit scripted, but the actual bait will be many comments below, with a pretty legit setup, i.e. 'oh, I wish I had known how to trade' 'I too didn't know the tricks of trading until I was introduced to Dr Sue Bateman who taught me' 'oh do you have contact details' 'oh, yes you can contact her on WA on 0000000000'
    So flagging the entire thread as spam probably looks to YouTube like people abusing the report feature, as they read like legit comments. It's only when you read the entire thread, which I assume ML won't pick up on, that it becomes spammy.
    Also I assumed they are bots, but I've actually seen some of them reply to real comments. Which was odd.
    Let's see how your filter does with that ^ too ha

    • @simplyexplained 3 years ago +1

      You're 100% correct. Comments like "I wish I had known how to trade" are tricky. By themselves, they're not spam. But the replies it gets are. So I trained the model exactly like this. As soon as someone mentions another person to help them, it's spam.
      My filter goes through top-level comments as well as replies and processes them individually. So a top level comment "I wish I had known how to trade" might be left alone, while the replies might get removed.
      Anyway, I'll tweak the script as time goes on. But so far it seems to do quite well. Fingers crossed!
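
The per-comment processing described in this reply can be sketched as below. It is only an illustration under stated assumptions: the thread dictionaries with `id`, `text` and `replies` keys are a made-up shape (not the YouTube Data API response format), and `refers_to_contact` is a hypothetical stand-in for the trained classifier.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def iter_comments(thread: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (id, text) for the top-level comment, then for each reply."""
    yield thread["id"], thread["text"]
    for reply in thread.get("replies", []):
        yield reply["id"], reply["text"]


def flag_spam(threads: List[Dict],
              score: Callable[[str], float],
              threshold: float = 0.5) -> List[str]:
    """Score every comment on its own; collect IDs at or above the threshold."""
    flagged = []
    for thread in threads:
        for comment_id, text in iter_comments(thread):
            if score(text) >= threshold:
                flagged.append(comment_id)
    return flagged


def refers_to_contact(text: str) -> float:
    """Stand-in scorer: 1.0 if the text points readers to a contact, else 0.0."""
    lowered = text.lower()
    return 1.0 if "contact" in lowered or "whatsapp" in lowered else 0.0
```

Because each comment is scored individually, a harmless-looking top-level comment can stay while a reply that refers readers to a contact gets flagged.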

  • @needabettername1559 2 years ago

    Profits money love xavier bitcoin test this is a test simply explained