Fake News Detection Intro using Machine Learning (ML) Models and Natural Language Processing (NLP)

  • Published: Sep 12, 2024
  • Fake news is all around us, whether or not we can identify it. Individuals and organizations publish fake news all the time, whether as a persuasion tactic or simply to override unfavorable truths. Take the search for a Covid-19 vaccine, an issue that is especially relevant in our current times. Before a vaccine came out, some sources stated that a fully effective vaccine was already available, some that one was coming very soon, and others that a safe and functional vaccine would take decades to be released. Trusting and following the wrong source can do more harm than good.
    The question then becomes: which websites do we trust, and which do we ignore? In most cases, it is not obvious which sites to trust or reject, or which are real and which are fake.
    Fortunately, Big Data can save the day! In today's world of ever-growing data streams, we can crunch through large volumes of data to detect patterns, which can then be analyzed to separate real news from fake.
    That is exactly the project I carried out: a fake news detection machine learning model that uses advanced natural language processing techniques to classify news websites as either fake or real.
    The model performs binary classification: an output of 1 indicates that the website is most likely fake, and 0 indicates that the site is trustworthy. It takes a list of website URLs and their corresponding raw HTML as input data and trains a logistic regression model to output a label of 0 or 1 for each site.
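    The description does not say which library was used for the logistic regression model. As a rough, library-free sketch of the idea (assuming plain batch gradient descent on the log loss; all function names are hypothetical), the model computes a weighted score for each feature vector, squashes it into a probability of being fake with the sigmoid function, and thresholds that probability at 0.5 to output 1 (likely fake) or 0 (likely real):

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a probability in (0, 1)."""
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and a bias by batch gradient descent on the average log loss.

    X holds one feature vector per website; y holds labels (1 = fake, 0 = real).
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the score is p(fake) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: List[float], b: float, x: List[float], threshold: float = 0.5) -> int:
    """Return 1 (most likely fake) if p(fake) >= threshold, else 0 (trustworthy)."""
    p_fake = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p_fake >= threshold else 0
```

    For example, on a toy one-feature dataset `X = [[0.0], [0.2], [0.8], [1.0]]` with labels `[0, 0, 1, 1]`, training with `lr=0.5, epochs=2000` and calling `predict` on each row recovers the labels. In practice a library implementation such as scikit-learn's `LogisticRegression` would usually be used; the from-scratch version only shows what the classifier computes.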
    The core of the model is the set of natural language processing techniques that transform the input data from words into numbers the machine can understand and learn from. I did this by creating and importing several functions, generally referred to as featurizers. Each featurizer extracts key features of a site's URL and HTML that may help predict its trustworthiness and converts them into numerical values to feed into the logistic regression model.
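    The description defines a featurizer only as a function that turns a site's URL and HTML into numeric features. A minimal sketch of that interface, assuming a dictionary output (which is what the domain featurizer described below returns), with a hypothetical `html_length_featurizer` as the example and a helper that lines the dictionaries up into equal-length vectors for the logistic regression model:

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A featurizer maps (url, html) to {feature description: numeric value}.
Featurizer = Callable[[str, str], Dict[str, float]]


def html_length_featurizer(url: str, html: str) -> Dict[str, float]:
    """Hypothetical example featurizer: simple size statistics of the page."""
    return {
        "url_length": float(len(url)),
        "html_length": float(len(html)),
        "num_links": float(html.lower().count("<a ")),
    }


def featurize_all(sites: Sequence[Tuple[str, str]], featurizer: Featurizer) -> List[Dict[str, float]]:
    """Apply one featurizer to every (url, html) pair."""
    return [featurizer(url, html) for url, html in sites]


def dicts_to_vectors(feature_dicts: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Turn feature dicts into fixed-order numeric vectors.

    Keys missing from a given dict count as 0.0, so every site gets a vector
    of the same length and column order.
    """
    names = sorted({name for d in feature_dicts for name in d})
    vectors = [[d.get(name, 0.0) for name in names] for d in feature_dicts]
    return names, vectors
```

    Sorting the feature names fixes the column order, so a given feature always lands in the same position of every vector; scikit-learn's `DictVectorizer` does the same job.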
    To obtain the data for my model, I scraped the web for news websites and compiled a set of 2,557 sites, roughly 50% fake and 50% real. I then split the data into a training set, a cross-validation set, and a test set.
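    The description does not give the split proportions. A minimal sketch, assuming an 80/10/10 split and a fixed random seed so the split is reproducible (both are assumptions), where each item could be a `(url, html, label)` tuple:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    items: Sequence[T], val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over goes to training
    return train, val, test
```

    With 2,557 sites and these fractions, the split comes out to 2,047 training, 255 cross-validation, and 255 test sites.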
    My first, baseline featurizer was a domain featurizer that extracts basic features from the domain name extension of each website (for example, .com or .org). It takes a URL and its HTML and returns a dictionary mapping feature descriptions to numerical values. The resulting model reached only 55% accuracy, which was not surprising: the domain extension may provide some clues, but it cannot by itself determine a website's trustworthiness.
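    A sketch of what such a domain featurizer might look like, assuming it one-hot encodes the URL's top-level domain against a hypothetical list of common extensions (the actual features used are not listed). It follows the interface from the paragraph above: URL and HTML in, dictionary of numerical features out.

```python
from typing import Dict
from urllib.parse import urlparse

# Hypothetical list of extensions to one-hot encode; the source does not
# say which extensions (or which other domain features) were used.
KNOWN_EXTENSIONS = ["com", "org", "net", "gov", "edu", "info", "news", "co"]


def domain_featurizer(url: str, html: str) -> Dict[str, float]:
    """Baseline featurizer: one-hot encode the domain extension of the URL.

    `html` is accepted only to match the featurizer interface; it is unused.
    """
    # urlparse only fills in .hostname when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    extension = host.rsplit(".", 1)[-1] if "." in host else ""
    features = {f"ext_is_{ext}": float(extension == ext) for ext in KNOWN_EXTENSIONS}
    features["ext_is_other"] = float(extension not in KNOWN_EXTENSIONS)
    return features
```

    Exactly one of the returned features is 1.0 for any URL, which is why this featurizer alone carries so little information.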
    The key problem with this model is that it simply does not have enough information. To address this, my next step was to use specific (and potentially predictive) keywords in the HTML, in addition to the domain extension, as input to the logistic regression model. After a series of steps, this logistic regression model reached an accuracy of 73%.
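    A minimal sketch of a keyword featurizer, assuming it counts case-insensitive whole-word occurrences of a keyword list in the raw HTML. The `KEYWORDS` list is purely hypothetical; the description does not say which keywords were chosen. Its output dictionary can be merged with the domain features (for example `{**domain_features, **keyword_features}`) before vectorizing.

```python
import re
from typing import Dict

# Hypothetical keyword list; the source does not name the keywords used.
KEYWORDS = ["breaking", "shocking", "exclusive", "official", "report"]


def keyword_featurizer(url: str, html: str) -> Dict[str, float]:
    """Count whole-word, case-insensitive occurrences of each keyword in the HTML."""
    words = re.findall(r"[a-z]+", html.lower())  # crude tokenization, tags included
    counts = {kw: 0 for kw in KEYWORDS}
    for word in words:
        if word in counts:
            counts[word] += 1
    return {f"keyword_{kw}": float(c) for kw, c in counts.items()}
```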
    This model performed considerably better than the domain-only approach, but since it is still relatively simple, I started to think about more nuanced approaches. The meta description in a website's HTML is a great source of information, as it conveys the core content of the site. As an improvement on the keyword featurizer, I applied the Bag-of-Words NLP model to these descriptions. The resulting score reports showed that every metric was much higher than before.
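    A minimal sketch of the bag-of-words step, assuming the text comes from the `<meta name="description">` tag and using Python's standard `html.parser`. The vocabulary is built from the training descriptions only, and each description becomes a vector of word counts over that vocabulary; all function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from typing import List


class _MetaDescriptionParser(HTMLParser):
    """Collects the content attribute of <meta name="description"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = {k.lower(): (v or "") for k, v in attrs}
            if attr_map.get("name", "").lower() == "description":
                self.description += " " + attr_map.get("content", "")


def get_description(html: str) -> str:
    parser = _MetaDescriptionParser()
    parser.feed(html)
    return parser.description.strip()


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(descriptions: List[str]) -> List[str]:
    """Vocabulary = every distinct token in the training descriptions, sorted."""
    return sorted({tok for d in descriptions for tok in tokenize(d)})


def bag_of_words(description: str, vocab: List[str]) -> List[float]:
    """Count vector over the fixed vocabulary; words outside it are ignored."""
    counts = Counter(tokenize(description))
    return [float(counts[word]) for word in vocab]
```

    scikit-learn's `CountVectorizer` implements the same idea with more tokenization options.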
    One shortcoming of the bag-of-words model is that it only looks at the counts of words in each website's description, not at what those words mean. I wondered whether there was a way to capture the meaning of the words in each site's description. This is where word vectors come in: I used GloVe (Global Vectors for Word Representation), a model that represents each word as a vector of numbers such that words with similar meanings have similar vectors. This model yielded an accuracy of about 87%.
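    The description does not say how the GloVe vectors were turned into one feature vector per site. A common approach, assumed here, is to average the vectors of the words in the description. The sketch parses GloVe's plain-text format (each line is a word followed by its vector components) and averages; a two-dimensional toy embedding stands in for a real pretrained file in the test.

```python
import re
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text format: 'word v1 v2 ... vd' on each line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def average_word_vector(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Mean of the vectors of in-vocabulary words; all zeros if none are known."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(component) / len(known) for component in zip(*known)]
```

    With a real pretrained file, `load_glove` would be given an open file handle, e.g. `open("glove.6B.300d.txt", encoding="utf-8")`, and `dim` set to that file's dimension (300 in that case).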
    Having tried several featurizers and compared their score reports, I was curious whether combining all of the featurization approaches would improve the results. I concatenated the feature vectors from every featurizer into a single vector per website, passed it into my logistic regression model, and obtained an accuracy of 91%, the highest yet.
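    A sketch of the combination step, assuming each featurizer has already been wrapped to return a fixed-length list of numbers. Because the blocks live on very different scales (raw word counts versus small GloVe components), the sketch also standardizes each column using statistics from the training set only; that scaling step is an assumption, not something the description mentions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Each block maps one site (url, html) to a fixed-length list of numbers.
BlockFn = Callable[[str, str], List[float]]


def concatenate_features(url: str, html: str, blocks: Sequence[BlockFn]) -> List[float]:
    """Run every featurization block and join the outputs end to end."""
    combined: List[float] = []
    for block in blocks:
        combined.extend(block(url, html))
    return combined


def fit_standardizer(X: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and standard deviation from the training rows.

    The std is floored at a tiny value so constant columns do not divide by zero.
    """
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    stds = [
        max(math.sqrt(sum((v - m) ** 2 for v in col) / n), 1e-12)
        for col, m in zip(columns, means)
    ]
    return means, stds


def apply_standardizer(X: List[List[float]], means: List[float], stds: List[float]) -> List[List[float]]:
    """Scale any split (train, validation, or test) with the training statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
```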
    Finally, I evaluated the model on the unseen test data to estimate its real accuracy. The score reports showed that the model predicted the trustworthiness of news websites with 91% accuracy.
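    The description refers to score reports without listing their contents. Assuming they contain the usual accuracy, precision, recall, and F1 (the figures scikit-learn's `classification_report` prints), here is a minimal sketch that computes them on the test set, with 1 (fake) as the positive class:

```python
from typing import Dict, Sequence


def score_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision/recall/F1 for the positive class (1 = fake)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

    Accuracy alone is a reasonable headline number here because the dataset is roughly balanced between fake and real sites; precision and recall show whether the errors lean toward flagging real sites or missing fake ones.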
    As with any machine learning model, there is room to improve the scores even further, for example by obtaining a larger dataset or developing more featurization approaches.
