Project 10. Credit Card Fraud Detection using Machine Learning in Python | Machine Learning Projects

  • Published: Apr 8, 2021
  • Hi! I will be conducting one-on-one discussions with all channel members. Check out the perks and join the membership if interested: / @siddhardhan Check membership perks: / @siddhardhan
    In this video we build a credit card fraud detection system using machine learning with Python. For this project, we use the Logistic Regression model.
    All presentation files for the Machine Learning course as PDF for as low as ₹200 (INR): Drop a mail to siddhardhans2317@gmail.com
    Enroll at One Neuron to learn from 100 courses in one subscription with 5% discount: courses.ineuron.ai/neurons/Te...
    Machine Learning Projects Playlist: • Machine Learning Projects
    Machine Learning Course with Python Playlist: • Machine Learning Cours...
    Hello everyone! I am setting up a donation campaign for my RUclips Channel. If you like my videos and wish to support me financially, you can donate through the following means:
    From India 👉 UPI ID : siddhardhselvam2317@oksbi
    Outside of India? 👉 Paypal id: siddhardhselvam2317@gmail.com
    (No donation is small. Every penny counts)
    Thanks in advance!
    Hi guys! I am Siddhardhan. I work in the field of Data Science and Machine Learning. It all started with my curiosity to learn about Artificial Intelligence and the ability of AI to solve several Real Life Problems. I worked on several Machine Learning & Deep Learning projects involving Computer Vision.
    I am on this journey to empower as many students & working professionals as possible with the knowledge of Machine Learning and Artificial Intelligence.
    Let's build a Community of Machine Learning experts! Kindly Subscribe here👉 tinyurl.com/md0gjbis
    I am making a "Hands-on Machine Learning Course with Python" in RUclips. I'll be posting 3 videos per week: Monday Evening; Wednesday Evening; Friday Evening.
    Dataset file: www.kaggle.com/mlg-ulb/credit...
    Colab File Link: colab.research.google.com/dri...
    Download the Course Curriculum File from here: drive.google.com/file/d/17i0c...
    LinkedIn: / siddhardhan-s-741652207
    Telegram Group: t.me/siddhardhan
    Facebook group: groups/49085... Instagram: / siddhardhan23

Comments • 317

  • @rajatagarwal4436
    @rajatagarwal4436 2 years ago +17

    Wow, this is such a fluent, smooth explainer. I am a mere beginner and was still able to understand almost everything. My teacher was not able to explain even a line of it correctly.

  • @saisasikanthduvvuri2209
    @saisasikanthduvvuri2209 11 months ago +2

    Hello, @Siddhardhan. Your presentation gave me more than I expected. Really awesome accuracy score and good EDA. Thank you for your video.

  • @marcoandresc.5560
    @marcoandresc.5560 2 years ago +5

    Many RUclips videos and even courses in Spanish do not explain very well and do not cover everything necessary to do a machine learning project, but you explain it very well. I'm not very good at English but I understood the procedure very well, excellent video

  • @kandrunaresh-mx9zj
    @kandrunaresh-mx9zj 1 year ago +5

    Hi Siddhardhan. Your explanation is awesome, keep up the good work. Nice comparison between overfitting and underfitting in terms of accuracy, and a nice example too.

  • @jeeruveeresh8942
    @jeeruveeresh8942 11 months ago

    Simple and nice explanation. I didn't know machine learning, I only know Python, but the way you explain helps me a lot to understand machine learning. Thanks a lot.

  • @556west
    @556west 1 year ago +5

    Hi, Siddhardhan
    "I just wanted to take a moment to express my sincere gratitude for the excellent tutorial on Credit Card Fraud Detection using Machine Learning in Python that you posted on RUclips. Your clear and concise explanations, combined with the practical examples, have helped me tremendously in understanding the fundamentals of this complex topic. Your dedication to providing high-quality content is evident, and I appreciate the time and effort you put into creating such an informative tutorial. Once again, thank you so much for sharing your knowledge with the world - you're making a positive impact on the lives of many, including mine."

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago +1

    Finished the coding practice. Feeling a lot more confident.

  • @virago8883
    @virago8883 2 years ago +13

    The way you explained was really amazing. I was able to clear all my doubts regarding this project. Keep up the good work and seriously thanks a lot for providing such a great content!!!!

    • @Siddhardhan
      @Siddhardhan  2 years ago +1

      thanks a lot for your positive words 😇

    • @nandinimadan6421
      @nandinimadan6421 1 year ago

      When I load the dataset and check for null values, I get some, but he does not. Is anyone else getting this error?

    • @musical_touch.
      @musical_touch. 10 months ago

      @@Siddhardhan What is the accuracy of this project?

  • @Ricksanches_2001
    @Ricksanches_2001 1 year ago

    Great work Siddhardhan, you really explained it in an amazing way

  • @mamondhar1823
    @mamondhar1823 8 months ago

    You really explain it step by step in a very easy way. I want more videos on machine learning, such as Covid detection.

  • @faizanwar_
    @faizanwar_ 1 year ago

    Very informative video...thanks for your community work....God bless 🙏

  • @rohitsarkar9338
    @rohitsarkar9338 3 years ago +2

    A great teacher ❤️❤️❤️❤️❤️❤️ i have ever seen in my life...who is explaining each and every line of a big project code🔥🔥🔥🔥🔥🔥🔥🔥

    • @Siddhardhan
      @Siddhardhan  3 years ago +1

      Thanks a ton 😇

    • @rohitsarkar9338
      @rohitsarkar9338 3 years ago

      @@Siddhardhan I can't explain how much you helped me clear a doubt that had been stuck in my brain for at least a week ❤️

  • @kamaleshsenthilmurugan1561
    @kamaleshsenthilmurugan1561 1 year ago +25

    Extraordinary content! I have watched all your videos from hands on ml course to this one.Everything was explained such that even a beginner would understand it. You have a really great gift in teaching complex stuff in a easy manner. My request to you is to keep teaching like this so that you will be able to change the life of lots of people like me. I am going to recommend this channel to all my juniors and friends.

    • @Siddhardhan
      @Siddhardhan  1 year ago +2

      Thanks a lot 😇

    • @tusharkhatri5795
      @tusharkhatri5795 1 year ago +2

      @@Siddhardhan Why did you do train_test_split after balancing the dataset? Won't that create a data leakage problem?
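A minimal sketch of the alternative the question above points toward, assuming a synthetic stand-in for the Kaggle file (the `V1` and `Amount` values here are random, not real data): split first, then undersample only the training portion, so the test set keeps the original class imbalance. Whether undersampling before the split counts as leakage is debatable, since no row is duplicated, but it does leave the test set artificially balanced.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for creditcard.csv (assumption): 1000 legit rows, 20 fraud rows.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "V1": rng.normal(size=1020),
    "Amount": rng.uniform(1, 500, size=1020),
    "Class": [0] * 1000 + [1] * 20,
})

# 1. Split first, stratified, so the test set keeps the original imbalance.
train_df, test_df = train_test_split(
    data, test_size=0.2, stratify=data["Class"], random_state=2
)

# 2. Undersample the majority class in the training portion only.
fraud_train = train_df[train_df["Class"] == 1]
legit_train = train_df[train_df["Class"] == 0].sample(
    n=len(fraud_train), random_state=2
)
balanced_train = pd.concat([legit_train, fraud_train], axis=0)

X_train = balanced_train.drop(columns="Class")
Y_train = balanced_train["Class"]
X_test = test_df.drop(columns="Class")
Y_test = test_df["Class"]

print(Y_train.value_counts())
print(Y_test.value_counts())
```

With this ordering, accuracy on `X_test` reflects the class ratio the model would actually face, which is why precision and recall become more informative than accuracy alone.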

    • @user-ub9pd5yl7p
      @user-ub9pd5yl7p 8 months ago

      Can you suggest some key points for adding this project to a resume? Thanks.

  • @priyankaeklaspure8163
    @priyankaeklaspure8163 2 years ago

    Absolutely stunning. Your way of explaining is too good 💥💥. Thanks for sharing! Super excited for more project videos.

  • @user-vk4xe7vz9k
    @user-vk4xe7vz9k 11 months ago

    Hey, I have built the same project as you taught but with a different dataset, and while using logistic regression I am facing "ValueError: could not convert string to float". What should I do?

  • @mansishrivastava7259
    @mansishrivastava7259 3 months ago

    Your explanation is really wonderful and so easy to understand

  • @farhabikamal302
    @farhabikamal302 1 year ago +2

    Hi, I am working with the same dataset. But on my PC, when I check the info of the dataset, it shows a different amount of data than yours. I couldn't find out why.

  • @adityamahamuni7365
    @adityamahamuni7365 3 years ago +4

    Thanks a lot! You're doing a great work. Keep it up!!

  • @VarunKumar-ek3kr
    @VarunKumar-ek3kr 2 years ago +1

    Hey, I got a "total no. of iterations reached limit" warning in the logistic regression model. What can I do to solve this?
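One common way to address the iteration-limit warning asked about above is to standardize the features and raise `max_iter`. The sketch below is not taken from the video; it assumes synthetic data from `make_classification`, with one column inflated to mimic an unscaled column such as `Amount`.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption); column 0 is inflated to mimic an unscaled feature.
X, y = make_classification(n_samples=500, n_features=10, random_state=2)
X[:, 0] *= 10_000

# Scale the features, then fit with a higher iteration budget than the default 100.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn a convergence warning into an error so a failure would be visible.
    warnings.simplefilter("error", ConvergenceWarning)
    model.fit(X, y)

print(model.score(X, y))
```

Scaling usually matters more than the iteration count: the solver converges slowly when columns differ by several orders of magnitude.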

  • @user-zp7uv3vd3r
    @user-zp7uv3vd3r 1 year ago

    Bro, from the next video onwards please add some visualizations. Anyway, your explanation is excellent. Thank you for sharing this content with the people who need it.

  • @swarnavdeb2064
    @swarnavdeb2064 2 years ago +1

    I still didn't get the part where groupby is used to check the mean. Why is it computed and what does it tell us?
    Kindly throw some light on this, please.
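As a small illustration of the groupby step asked about above, the sketch below uses a made-up five-row frame (the values are invented, not from the Kaggle file). Grouping by `Class` and taking the mean gives one row per class, so you can see at a glance which columns differ between legit (0) and fraudulent (1) transactions.

```python
import pandas as pd

# Made-up toy data (assumption): three legit rows and two fraud rows.
credit_card_data = pd.DataFrame({
    "Amount": [10.0, 20.0, 30.0, 200.0, 400.0],
    "V1": [0.1, 0.3, 0.2, -4.0, -6.0],
    "Class": [0, 0, 0, 1, 1],
})

# One row per class, one column per feature, each cell a per-class average.
class_means = credit_card_data.groupby("Class").mean()
print(class_means)
```

A feature whose mean is very different across the two rows is a feature the model can use to separate the classes; a feature with nearly equal means carries less signal on its own.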

  • @OceanAlves23
    @OceanAlves23 2 years ago +1

    Hi 👨‍🎓, from Brazil/Teresina/PI. 👏👏👏

  • @shaikhirfan7749
    @shaikhirfan7749 2 years ago

    What if I use Random Forest (or the Isolation Forest algorithm) instead of logistic regression? Or SVM, because the data is imbalanced and has some outliers?

  • @jeremyheng8573
    @jeremyheng8573 1 year ago

    Thank you for this very informative tutorial!

  • @nawin7789
    @nawin7789 27 days ago

    man you are doing a really great job!

  • @sk.creations.
    @sk.creations. 9 months ago

    Great, brother, you explained it superbly.

  • @shahriarafridi
    @shahriarafridi 19 days ago

    Thank You for this entire project

  • @tahiraleem
    @tahiraleem 9 months ago

    Thanks a lot Sid sir. Love and support from Pakistan.

  • @srinivasarao416
    @srinivasarao416 2 years ago

    Dear sir, I am doing my MSc and am thinking of doing my project dissertation on credit card fraud detection. Everybody uses the Kaggle dataset. If I use the same dataset, the university will say it is "copied". Could you please suggest another dataset from a different source? Thanks.

  • @shubhasmitanayak3495
    @shubhasmitanayak3495 10 months ago

    Can you please tell me how can I manage to perform test of different algorithms on different dataset in single colab repository ?

  • @hirakhan8015
    @hirakhan8015 1 year ago

    Very good content, sir.
    One question: I have a project on this and searched for null values. In mine it shows null values in every column from V8 through Class. How come yours shows no null values?

  • @rattlesstrings2729
    @rattlesstrings2729 2 years ago

    Hi there,
    there are missing values in the dataset from Kaggle.
    Please do check.

  • @rachana7044
    @rachana7044 2 years ago

    Hello sir, I am getting an error like "'DataFrame' object is not callable" when I run legit_sample = legit.sample(). Could you please help me with this?

  • @shadowalker5467
    @shadowalker5467 1 year ago

    Sir, I have a question. Whenever I run this part of the code: credit_card_data['Class'].value_counts(), it doesn't show the exact number of fraudulent transactions. For example, it is supposed to show 492, but in my case it shows 239. Why is that, sir?
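Several comments here report a different row count, unexpected nulls, or fewer than 492 fraud rows; a reply further down attributes this to an incomplete download. A hedged sketch for checking a loaded frame against the full Kaggle file follows. `check_download` is a hypothetical helper, not part of the video's notebook; 284,807 is the row count of the full file, and 492 is the fraud count quoted in the comment above.

```python
import pandas as pd

EXPECTED_ROWS = 284_807   # rows in the full Kaggle creditcard.csv
EXPECTED_FRAUD = 492      # rows with Class == 1 in the full file


def check_download(df):
    """Return a list of problems; an empty list means the frame looks complete."""
    problems = []
    if len(df) != EXPECTED_ROWS:
        problems.append(f"row count is {len(df)}, expected {EXPECTED_ROWS}")
    missing = int(df.isnull().sum().sum())
    if missing:
        problems.append(f"{missing} missing values found")
    fraud = int((df["Class"] == 1).sum())
    if fraud != EXPECTED_FRAUD:
        problems.append(f"fraud count is {fraud}, expected {EXPECTED_FRAUD}")
    return problems


# Toy frame standing in for a truncated upload (assumption).
truncated = pd.DataFrame({"V8": [0.1, None], "Class": [0, 1]})
for problem in check_download(truncated):
    print(problem)
```

If any problem is reported, downloading the CSV again and re-uploading it is usually a better fix than dropping or imputing the affected rows.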

  • @durgavutla1055
    @durgavutla1055 3 years ago +1

    Amazing Videos!! Thanks for sharing to get practices on ML Projects

  • @poornimanair8662
    @poornimanair8662 1 year ago

    Can we do oversampling instead of undersampling in this case?
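A minimal sketch of oversampling, as asked about above. It uses plain random oversampling with replacement via `sklearn.utils.resample`, which is a simpler technique than the SMOTE method other commenters mention. The toy frame is an assumption, and in practice this should be applied to the training split only, because duplicated rows that land in both train and test sets would leak.

```python
import pandas as pd
from sklearn.utils import resample

# Toy training data (assumption): 10 legit rows, 2 fraud rows.
train = pd.DataFrame({"Amount": range(12), "Class": [0] * 10 + [1] * 2})

legit = train[train["Class"] == 0]
fraud = train[train["Class"] == 1]

# Draw fraud rows with replacement until they match the legit count.
fraud_upsampled = resample(
    fraud, replace=True, n_samples=len(legit), random_state=2
)
balanced = pd.concat([legit, fraud_upsampled], axis=0)

print(balanced["Class"].value_counts())
```

Unlike undersampling, this keeps every legit transaction, at the cost of repeating the few fraud rows many times.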

  • @pranay6708
    @pranay6708 2 years ago

    Wow, a very nice and detailed video with a smooth explanation. Please also show how to use a built-in sampling technique. I liked the video very much.

  • @Sai-ph5vh
    @Sai-ph5vh 1 year ago

    Hello sir, amazing lecture, but I have a doubt: why do you take those numbers instead of transaction IDs, and how do you get those numbers?

  • @sakshigarg5996
    @sakshigarg5996 12 days ago

    Thank you so much for making this video. It helped me a lot 😄

  • @niteshprajapat7918
    @niteshprajapat7918 3 years ago +3

    Thank you, sir. It was one of the best tutorials ever, and I loved the way you explain all the little things in the easiest way.

    • @Siddhardhan
      @Siddhardhan  3 years ago +1

      Thanks a lot for your positive words 😇 happy that you liked it!

    • @niteshprajapat7918
      @niteshprajapat7918 3 years ago

      @@Siddhardhan Sir, can you make a complete video on how to start ML, its prerequisites, the maths, and the mathematical intuition of all the algorithms? Please, sir, as a separate playlist, because I'm getting confused about what to do first.

    • @Siddhardhan
      @Siddhardhan  3 years ago

      You can follow this playlist: ruclips.net/p/PLfFghEzKVmjsNtIRwErklMAN8nJmebB0I

  • @asdad9715
    @asdad9715 2 years ago +2

    hi Siddhardhan, thank you so much for this content, your explanation is really easy to understand.
    listening to your step by step explanation and directly practiced it on my google colab really help me to understand it.
    your content helps me a lot!

  • @vinsanargeese4384
    @vinsanargeese4384 10 months ago

    I just want to know whether it only gives the accuracy details or whether it detects if a card transaction is fraudulent or not.
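To illustrate the distinction raised above: accuracy is only a summary of how the model did on held-out data, while the fitted model itself can label an individual transaction. The sketch below is an assumption-laden toy with three synthetic features; a model trained on the Kaggle file would instead need every feature column it was trained on (Time, V1 to V28, and Amount) for each new transaction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, well-separated toy data (assumption): 3 features per transaction.
rng = np.random.default_rng(2)
X_legit = rng.normal(loc=0.0, size=(50, 3))
X_fraud = rng.normal(loc=4.0, size=(50, 3))
X = np.vstack([X_legit, X_fraud])
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression(max_iter=1000).fit(X, y)

# A single new transaction, shaped (1, n_features).
new_transaction = np.array([[4.2, 3.9, 4.1]])
label = int(model.predict(new_transaction)[0])          # 0 = legit, 1 = fraud
fraud_probability = float(model.predict_proba(new_transaction)[0, 1])

print("fraud" if label == 1 else "legit", round(fraud_probability, 3))
```

Using `predict_proba` rather than `predict` lets you choose your own decision threshold, which can matter when missing a fraud costs more than a false alarm.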

  • @vasanthkani2509
    @vasanthkani2509 2 years ago

    Your teaching is good, sir. Your videos are very useful for my project. Thank you so much, sir.

  • @fun_with_AI269
    @fun_with_AI269 2 years ago +16

    What's the ultimate result out of it? What did we learn? Is there any way to find out fraudulent transactions in legit data set?

  • @charanravikumar
    @charanravikumar 11 months ago +1

    Great work! But the Colab link does not contain the required files like the dataset; it contains other files of sampling house.

  • @amani4541
    @amani4541 2 years ago +1

    Have you also implemented this for neural network algorithm?

  • @fiz982000
    @fiz982000 2 years ago

    Sidhard ,you are awesome!!!!!

  • @JayashreeS-nn6rk
    @JayashreeS-nn6rk 1 year ago +3

    Hi Siddhardhan,
    I have an error, kindly resolve it. In this part:
    # training the Logistic Regression Model with Training Data
    model.fit(X_train, Y_train)
    I got "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT". How can I resolve it?

  • @igorsanjane4927
    @igorsanjane4927 1 year ago

    Very good video and easy to understand code, thumbs up!!

  • @aadarshmishra2375
    @aadarshmishra2375 2 months ago

    After the concat I should be getting 492 values for each class, right? But I'm getting 492 values for class 1 and 1 value for class 0. Please help!
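A likely cause of the symptom above is calling `sample()` without `n`: pandas then returns a single row. The sketch below uses toy frames (an assumption) named after the variables mentioned in these comments, and shows passing the number of fraud rows explicitly.

```python
import pandas as pd

# Toy stand-ins (assumption): 100 legit rows, 5 fraud rows.
legit = pd.DataFrame({"Amount": range(100), "Class": 0})
fraud = pd.DataFrame({"Amount": range(5), "Class": 1})

# With no arguments, DataFrame.sample() returns exactly one row.
print(len(legit.sample()))

# Pass n so the legit sample matches the number of fraud rows.
legit_sample = legit.sample(n=len(fraud), random_state=2)
new_dataset = pd.concat([legit_sample, fraud], axis=0)

print(new_dataset["Class"].value_counts())
```

On the real data, `len(fraud)` would be 492, so both classes end up with 492 rows.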

  • @Jayansh-it2yu
    @Jayansh-it2yu 10 hours ago +1

    can anyone tell me where the data preprocessing is done in the video?

  • @mdmynuddin1888
    @mdmynuddin1888 1 year ago

    Can anyone tell me how to encode categorical features where some features have more than 10k categories?
    I am working with a different dataset.

  • @prathikshagowda924
    @prathikshagowda924 5 days ago

    Hi, every time I upload your data there are missing values in it. Any suggestions?

  • @sailakshmi3792
    @sailakshmi3792 1 year ago +5

    I just want to suggest a few points to cover before you post a project:
    1. Goal of the project
    2. Output of the project
    3. Ways it can be implemented
    4. How it is checked against user inputs in the real world
    I think these basic points must be mentioned for it to be called a meaningful project.
    Anyway, thanks for your contribution @Siddhardhan

    • @jeremycapital
      @jeremycapital 1 year ago

      Do you have an idea of how this model could be used in a real-world, for instance, in a web application with a backend in python or node js?

    • @kabirbhawar2045
      @kabirbhawar2045 1 year ago

      Hey can you please give the answers to these questions

    • @rudroroy1054
      @rudroroy1054 1 year ago

      @@jeremycapital You will have to embed the model in the server. You will also have to include separate modules for data cleaning and preprocessing. That would add a separate data engineering vertical to your application; we use Kafka for stream processing.

  • @sohamnath76
    @sohamnath76 1 year ago

    Extremely nice explanation. It's explained in such a simple way. Thank you, brother! Really helped a lot. 😁

    • @rohitgupta4354
      @rohitgupta4354 1 year ago

      *Please tell me which ML topics are used in this project so that I can start it after learning those topics.*

  • @ashitnayak1912
    @ashitnayak1912 1 year ago

    Awesome mate, finally my search ends with your content.

    • @rohitgupta4354
      @rohitgupta4354 1 year ago

      *Please tell me which ML topics are used in this project so that I can start it after learning those topics.*

  • @user-ps3tl8ze6r
    @user-ps3tl8ze6r 5 months ago

    How do I process the missing values to make the data meaningful?

  • @rickricky7847
    @rickricky7847 1 year ago

    I am getting an error after splitting into training data and testing data.
    At 36:17,
    after fitting with model.fit(x_train, y_train):
    "Found input variables with inconsistent numbers of samples: [197, 787]"
    Kindly help me resolve this.
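The numbers in the error above add up to 984 (2 × 492), which matches an 80/20 split of the balanced dataset, so one plausible cause is that the four return values of `train_test_split` were unpacked in the wrong order. This is a guess from the numbers, not something confirmed in the thread. A sketch with placeholder arrays of the same size shows the order the function returns:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays (assumption) with the balanced dataset's size: 984 rows.
X = np.arange(984 * 2).reshape(984, 2)
Y = np.array([0, 1] * 492)

# The return order is: X_train, X_test, Y_train, Y_test.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)

print(X.shape, X_train.shape, X_test.shape)
print(len(Y_train), len(Y_test))
```

If the names are swapped, for example `X_train, Y_train, X_test, Y_test = ...`, then `Y_train` actually holds the 197 test feature rows, and `model.fit(X_train, Y_train)` fails with exactly this kind of mismatch.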

  • @yogeetakhatri4015
    @yogeetakhatri4015 2 years ago

    Very good explanation, thank you for it.

  • @yassine.h3262
    @yassine.h3262 1 year ago +2

    Great.
    I would like to add some remarks that might help in future projects:
    - The way you dealt with the imbalanced data will likely lead to bad predictions, because a lot of information is lost when taking a small sample from a large dataset.
    - Use SMOTE instead of sampling.
    - Use decision trees or random forests; they are better at dealing with imbalanced data.
    - Use more evaluation metrics like the ROC curve, F1 score, recall...
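A minimal sketch of the extra metrics suggested in the last remark, assuming a synthetic imbalanced dataset from `make_classification` (roughly 5% positives) instead of the Kaggle file. It reports per-class precision, recall, and F1, plus ROC AUC computed from predicted probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): about 95% class 0, 5% class 1.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)                # hard 0/1 labels
y_score = model.predict_proba(X_test)[:, 1]   # probability of class 1

# Precision, recall and F1 per class; zero_division=0 avoids a warning
# if the model happens to predict no positives at all.
report = classification_report(y_test, y_pred, digits=3, zero_division=0)
auc = roc_auc_score(y_test, y_score)

print(report)
print("ROC AUC:", round(auc, 3))
```

On imbalanced data, the recall for class 1 shows how many frauds are caught, which overall accuracy can hide.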

    • @nandinimadan6421
      @nandinimadan6421 1 year ago

      When I load the dataset and check for null values, I get some, but he does not. Is anyone else getting this error?

    • @chanlovebmx
      @chanlovebmx 11 months ago

      @@nandinimadan6421 There are no missing values in the default dataset. Maybe you want to download the data again.

  • @user-qi8xq7jk8t
    @user-qi8xq7jk8t 1 month ago

    Can someone explain how the LogisticRegression function works? How are the V1, V2, V3... values classified as fraudulent or not?

  • @user-qi8xq7jk8t
    @user-qi8xq7jk8t 1 month ago

    Anyone knows what the numbers in the dataset denote. How do they relate to real life transactions?

  • @PoojaKumari-wo8qi
    @PoojaKumari-wo8qi 3 months ago

    Credit card fraud detection using machine learning: where are you writing the code for this? Please tell me a little. Can I get the full code and the PPT?

  • @dipakdas8887
    @dipakdas8887 9 months ago

    You explained so well.. 😍

  • @FerinKingsly
    @FerinKingsly 5 days ago

    Bro, while following the same code I get a NameError at X_train, X_test, Y_train, Y_test = train_test_split(......). It says: NameError: name 'train_test_split' is not defined. But I have written "from sklearn.model_selection import train_test_split" at the very beginning. Could anyone say why this error is happening?

  • @kushsheth4801
    @kushsheth4801 2 years ago

    Hi, can you give the code to plot the dataframe, to visualise the comparison of the fraudulent and legit means? Thank you.

  • @user-kp4jh7zj9g
    @user-kp4jh7zj9g 1 year ago

    The video is good, but I am getting a "Found input variables with inconsistent numbers of samples: [787, 197]"
    error. Can anyone help me out?

  • @zabashhd459
    @zabashhd459 8 months ago

    Hi, what did we conclude in the end? What is the result?

  • @subhamsaha2235
    @subhamsaha2235 3 years ago +14

    Very nice explanation, and I really liked the video. The classification report is also a very good measure of the model. I think that with cross-validation and some boosting techniques the score can be increased further. One more important thing: accuracy does not matter much here. What matters is precision and recall, because we can't let a fraudulent transaction be classified as non-fraudulent. Thank you.

    • @Siddhardhan
      @Siddhardhan  3 years ago +1

      nice insights. you can definitely try to do some optimizations.

  • @femiOkaseun
    @femiOkaseun 3 months ago

    Awesome video. Thank you.

  • @tanishabiswas6268
    @tanishabiswas6268 11 months ago

    Thank You! This was very helpful

  • @Akth1518
    @Akth1518 3 months ago

    Hi, I have a doubt. Can I work through this project alongside the video, do everything you did, and just put it on my resume? Or should I do something else? I'm a data science fresher who wants to start a career in data science, so could you please clarify how to present projects on my resume?

  • @adnanemehdaoui5487
    @adnanemehdaoui5487 11 months ago +2

    We can use the SMOTE method to handle imbalanced data; it is also very useful.

  • @s.o.r.e8362
    @s.o.r.e8362 11 months ago

    Bro, you could have tuned the model's hyperparameters, and you could use F1, recall, ROC AUC, and precision to get a more accurate picture of its performance.

  • @rahulgaud4340
    @rahulgaud4340 1 year ago

    Thank you so much, sir. It helped me a lot.

  • @gautamgupta4770
    @gautamgupta4770 11 months ago

    Sir I have only one confusion which is why we create new dataset for checking accuracy

  • @sohelimtiaz9777
    @sohelimtiaz9777 2 years ago

    Which algorithm did you use here?

  • @tutorstown973
    @tutorstown973 2 years ago

    Can you do it without undersampling?

  • @jirehla-ab1671
    @jirehla-ab1671 1 year ago

    If I take this tutorials, is the software used for learning free? Like what you exactly did in the video?

  • @jigneshkhandare321
    @jigneshkhandare321 4 months ago

    Can we do this project using random forest and SVM?

  • @harshitha4204
    @harshitha4204 2 years ago

    how did we detect fraud with help of accuracy?

  • @KhurramShadab-nj9cc
    @KhurramShadab-nj9cc 3 months ago

    Bro code is not running it shows error in first 3rd line

  • @monicagullapalli6106
    @monicagullapalli6106 2 years ago +4

    Hi, amazing video! I wanted to ask that when this is tested against user input, what all inputs are required to be taken from the user? Only the amount?

  • @vinith_pr
    @vinith_pr 5 months ago

    Can anyone send me a video link about how to deal with missing values from this channel?

  • @UnspokenTears
    @UnspokenTears 1 year ago

    Can i know the ML algorithm used here??

  • @shriyashrestha3341
    @shriyashrestha3341 10 months ago

    Why didn't we use a standard scaler in this situation?

  • @Codebond7
    @Codebond7 11 months ago

    what is the algorithm he used in this code can anyone SAYYYYYY

  • @sirigb9513
    @sirigb9513 10 months ago

    Which algorithm did he use?

  • @Manojn9353
    @Manojn9353 1 year ago

    can i use this for CSE final year project???

  • @SreeFacts
    @SreeFacts 2 years ago

    Can you share existing system and disadvantages of that?

  • @rahulvijay6611
    @rahulvijay6611 3 years ago +1

    Awesome, very helpful

  • @gorkemkoroglu8052
    @gorkemkoroglu8052 1 year ago

    feature engineering parts at what minute?

  • @practicemail3227
    @practicemail3227 9 months ago

    you could have also tried SMOTE technique to better understand and predict.

  • @ANURAGSINGH-nl2ll
    @ANURAGSINGH-nl2ll 8 months ago

    Nice explanations

  • @dhilipduzo
    @dhilipduzo 3 months ago +1

    (Time-aware attention based gated network for credit card fraud detection by extracting transactional behaviors), Sir i choose this title , can i do ur project for this title ?

  • @sulochanakamshetty1711
    @sulochanakamshetty1711 3 years ago

    great job siddhardhan

  • @oguzaydiner3936
    @oguzaydiner3936 2 years ago

    How to deal with "ValueError: could not convert string to float " if we are working with a dataset with various data types ? For example my data has columns such as actionType which includes "transfer", "withdrawal" etc.
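One common way to handle the string-to-float error described above is to one-hot encode text columns before fitting. The sketch below is an assumption-based toy: the `actionType` values come from the comment, while the `amount` and `isFraud` columns and all numbers are invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy frame (assumption) with one text column and one numeric column.
df = pd.DataFrame({
    "actionType": ["transfer", "withdrawal", "transfer",
                   "payment", "withdrawal", "payment"],
    "amount": [900.0, 40.0, 750.0, 20.0, 60.0, 35.0],
    "isFraud": [1, 0, 1, 0, 0, 0],
})

# Replace the text column with one numeric 0/1 column per category.
X = pd.get_dummies(df.drop(columns="isFraud"), columns=["actionType"], dtype=float)
y = df["isFraud"]

# Every column is now numeric, so fitting no longer raises the ValueError.
model = LogisticRegression(max_iter=1000).fit(X, y)

print(list(X.columns))
```

One-hot encoding suits columns with a handful of categories; for columns with thousands of distinct values it creates thousands of columns, and a different encoding is usually preferable.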

  • @diminSprint-bm8ul
    @diminSprint-bm8ul 2 months ago

    "ParserError: Error tokenizing data. C error: Expected 31 fields in line 1987, saw 42". This is the error I am getting while loading the dataset into a data frame. Any solution? ChatGPT told me to look at line 1987 in the CSV file. I looked but did not find any error. How do I fix this?

  • @datharaj1370
    @datharaj1370 3 years ago +1

    Hi Siddhardhan
    The video is very informative and easy to implement.
    However, undersampling is not the optimal way to approach this problem because we are discarding almost 95% of data and just training over

    • @Siddhardhan
      @Siddhardhan  3 years ago +2

      good insights. I'll research more about this.