Data Analyst Portfolio Project #2: Python Customer Segmentation & Clustering

  • Published: Nov 11, 2024

Comments • 104

  • @slacex
    @slacex 10 months ago +12

    For those who get an error from df.corr(): use df.corr(numeric_only=True) instead.

    • @rishiraj1192
      @rishiraj1192 8 months ago

      Facing a similar issue. How do I resolve it?

    • @vladthelad7298
      @vladthelad7298 8 months ago +3

      @rishiraj1192 He literally said it in his comment
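
Some background on the fix above: since pandas 2.0, `df.corr()` no longer drops text columns such as Gender silently, so the call fails unless `numeric_only=True` is passed. A minimal sketch on a made-up frame (the rows are invented, only the column names follow the video):

```python
import pandas as pd

# Made-up rows; only the column names mirror the mall customers data
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 35, 52],
    "Spending Score (1-100)": [39, 81, 6, 40],
})

# numeric_only=True skips the text column instead of trying to convert it
corr = df.corr(numeric_only=True)
print(corr)
```

The resulting matrix contains only Age and Spending Score; Gender is left out.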

  • @nezzylearns
    @nezzylearns 1 year ago +15

    This is an exceptional walkthrough, especially how you vividly explain the process of visualizing the data.

    • @absentdata
      @absentdata 1 year ago

      Lovely Feedback! Thanks. I am glad you enjoyed it.

  • @Yett1hhh
    @Yett1hhh 1 year ago +3

    for i in columns:
        plt.figure()
        sns.kdeplot(data=df, x=i, shade=True, hue='Gender')

  • @janakiyeluripati6368
    @janakiyeluripati6368 2 years ago +9

    I followed till 32 min as I am not into ML. I just loved it. Understood univariate, bivariate. Want more videos like this. Love from India. Stay blessed.

    • @absentdata
      @absentdata 2 years ago +2

      Thank you so much! I am glad you finished the video and understood the exploratory data analysis steps. You also stay blessed!

  • @aditideshpande5566
    @aditideshpande5566 4 months ago +2

    Thank you so much for an exceptionally well-explained and clear video, better than what I learned in my master's degree!

  • @ai.simplified..
    @ai.simplified.. 3 years ago +9

    More useful than hours of class, good job 😍

  • @mospher9253
    @mospher9253 8 months ago

    Better than most of the big channels out there.
    Really good explanation, and the project is covered step by step.
    Can you do another video like this using other types of clustering, like GMM, with a more detailed analysis and conclusions?
    Thank you as well for the time you put into this video, it was super helpful.

    • @absentdata
      @absentdata 8 months ago +1

      I really appreciate this. Sure, I'll do a more detailed analysis on clustering.

  • @thanomnoimoh9299
    @thanomnoimoh9299 3 years ago +1

    Python is a great way to do analysis, awesome!!! Thank you for the great clip.

  • @Maliiik804
    @Maliiik804 3 years ago +1

    Been waiting for this for a long time! Thanks for uploading this great content.

    • @absentdata
      @absentdata 3 years ago

      Glad you're enjoying the content!

  • @batoolalshareef9456
    @batoolalshareef9456 1 year ago

    Thanks a lot, it's a great effort. Keep going, and share more videos like this 👍🌺🌺🌺

  • @jimmyxrs
    @jimmyxrs 1 month ago

    Also, for the get_dummies step at the end, if you need to force it to produce integers instead of booleans, this worked for me:
    dff = pd.get_dummies(df, dtype=int, drop_first=True)
    dff.head()
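
For context, pandas 2.0 and later return True/False dummy columns by default, which is why the `dtype=int` argument is needed to get 0/1 values. A small sketch on invented rows (the data is an assumption, the call is the one from the comment above):

```python
import pandas as pd

# Invented rows with one categorical column
df = pd.DataFrame({"Gender": ["Male", "Female", "Female"], "Age": [19, 21, 20]})

# dtype=int yields 0/1 columns; drop_first=True drops the first category (Female)
dff = pd.get_dummies(df, dtype=int, drop_first=True)
print(dff.head())
```

Here the only dummy column left is `Gender_Male`, with 1 for male rows and 0 otherwise.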

  • @muskanmodi724
    @muskanmodi724 1 year ago +7

    Hello. For the kdeplot at 16:35, when I add hue=df['Gender'], it gives the error: The following variable cannot be assigned with wide-form data: `hue`

    • @juancamilosanchez4693
      @juancamilosanchez4693 1 year ago +12

      You can solve this by adding this to the code: x=df['Annual Income (k$)'], and then you put the hue and it works

    • @mercyolaleye7502
      @mercyolaleye7502 1 year ago

      @juancamilosanchez4693 Thanks, this helped.

    • @iwojoseph
      @iwojoseph 1 year ago

      @juancamilosanchez4693 this worked! Thanks!

    • @StrangeMemes52
      @StrangeMemes52 1 year ago

      @juancamilosanchez4693 yeah, thanks, this worked

    • @kostantinaorselli1093
      @kostantinaorselli1093 1 year ago

      @juancamilosanchez4693 Why does this solve the problem?

  • @isaacetungu5215
    @isaacetungu5215 2 years ago

    Worth watching and following along.
    I completed the video and did my own work alongside the code.
    I needed more help on the multivariate analysis of clustering. The last part of the video on it was not well explained.
    Any recommendations or a video on that, @Absent Data?

  • @RonaldPostelmans
    @RonaldPostelmans 1 year ago

    thanks for your great explanation

  • @javeda
    @javeda 1 year ago +1

    Please also explain how you set up code autocompletion in Jupyter Notebook.

  • @aramisfarias5316
    @aramisfarias5316 3 years ago +2

    The end felt a little rushed and underwhelming, but overall very instructive. Good job. =)

  • @awaisanjum9023
    @awaisanjum9023 1 year ago

    Amazing video. Kindly make more portfolio project videos.

  • @rachrach9871
    @rachrach9871 1 year ago +2

    Awesome tutorial! I tried to download the dataset but I don't know where to begin. There's an option for "raw" and "blame". I'm new to data analytics so I would appreciate some help. Thank you very much.

    • @absentdata
      @absentdata 1 year ago +2

      You can find the data here:
      absentdata.com/data-analysis/where-to-find-data/

    • @rachrach9871
      @rachrach9871 1 year ago

      @absentdata Thank you so much for your quick response! I'm already doing tutorial #1 and I'm hoping to learn as much from your tutorials.

  • @emilioprill3373
    @emilioprill3373 1 year ago

    Learned a lot! Thank you

    • @absentdata
      @absentdata 1 year ago +1

      I'm glad to hear that. Please share it with anyone you think it helps

  • @adrianapanjiwijaya1520
    @adrianapanjiwijaya1520 1 year ago +2

    Hi, this is very helpful. I do have a question though. After df=df.drop('Customer ID'), I forgot to comment the line out with a hashtag and continued on. From that point on, the Customer ID disappeared. But in your case, the Customer ID values reappear during clustering. How did that happen, and how do I get the Customer ID back?
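
One possible way around the question above: `drop` returns a new frame, so the ID column only disappears when the result is assigned back to `df`. Storing the reduced frame under a different name keeps the original intact, and if `df` has already been overwritten, re-running the cell that reads the CSV brings the column back. A sketch on made-up rows (the name `features` is hypothetical, and the 'Customer ID' spelling follows the comment):

```python
import pandas as pd

# Made-up rows; 'Customer ID' follows the spelling used in the comment
df = pd.DataFrame({
    "Customer ID": [1, 2, 3],
    "Age": [19, 35, 52],
    "Annual Income (k$)": [15, 60, 88],
})

# Keep df untouched and put the reduced copy in a new variable
features = df.drop(columns=["Customer ID"])

# Rows keep the same index, so the ID can be joined back after clustering
restored = features.join(df["Customer ID"])
print(restored)
```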

  • @brandonwarfield5611
    @brandonwarfield5611 11 months ago

    This is gold!!! I'm upset I'm just finding your channel!!!

    • @absentdata
      @absentdata 11 months ago +1

      I am glad that you found the channel. Share it with anyone you think it will help!

  • @travelofftradition
    @travelofftradition 1 year ago +1

    Hi!
    Thank you for this video. I have a question. I want to segment bank customers, but the data is in multiple files like accounts.csv, customer_details.csv, transactions.csv.
    How should I approach segmenting the customers when the data is spread across multiple files?
    Thanks
    Mohit

    • @absentdata
      @absentdata 1 year ago

      You will need to merge them into a single dataset.

    • @travelofftradition
      @travelofftradition 1 year ago

      @absentdata OK, so basically I have to join them using one of the joins, like an inner join?
      But how is it done when there are 10-20 files? Is there any other way?

    • @absentdata
      @absentdata 1 year ago

      @travelofftradition Append the files that are similar, such as all the transactional files, to create a single dataset, then merge that with a single customer details file, which should itself be the result of an append.
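
The append-then-merge approach described above might look roughly like this. All frame and column names here (`customer_id`, `amount`, `age`) are hypothetical stand-ins; with real files each frame would come from `pd.read_csv`. The aggregation step is an added assumption, since clustering usually wants one row per customer:

```python
import pandas as pd

# Hypothetical monthly transaction extracts that share the same columns
transactions_jan = pd.DataFrame({"customer_id": [1, 2], "amount": [50.0, 20.0]})
transactions_feb = pd.DataFrame({"customer_id": [1, 3], "amount": [30.0, 75.0]})
customer_details = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})

# Step 1: append (stack) the files that have the same layout
transactions = pd.concat([transactions_jan, transactions_feb], ignore_index=True)

# Step 2 (assumption): reduce transactions to one row per customer
spend = transactions.groupby("customer_id", as_index=False)["amount"].sum()

# Step 3: merge the per-customer totals onto the customer details
customers = customer_details.merge(spend, on="customer_id", how="left")
print(customers)
```

With 10-20 files of the same layout, the list passed to `pd.concat` simply grows; only files with different layouts need a separate merge.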

  • @faa_z
    @faa_z 1 year ago

    Amazing video, thank you a lot. I only have a question about
    21:52
    You said that from the graph it seems there are more females than males. How did you know? Is it because of the median?

    • @absentdata
      @absentdata 1 year ago

      The value_counts function will count the number of males and females to give the actual numbers.

  • @nazmussumon5105
    @nazmussumon5105 2 years ago

    Thank you for this awesome tutorial. Learnt a lot.

  • @ericametta6964
    @ericametta6964 11 months ago

    insightful

    • @absentdata
      @absentdata 11 months ago

      Glad you found it insightful

  • @nadil3230
    @nadil3230 1 year ago

    I get "list object has no attribute mean". How do I fix this error?

  • @forecaststatistics8496
    @forecaststatistics8496 3 years ago

    Good job!

  • @nadil3230
    @nadil3230 1 year ago

    Why does the y-axis show density, and can we change it to some other parameter?

    • @absentdata
      @absentdata 1 year ago

      Yes, you can change the variables on the x and y axes. You can also use PCA techniques to display the data.

  • @alejandrosalgadolima3745
    @alejandrosalgadolima3745 2 years ago

    Hi, great video. I cannot understand why hue is not working on my computer. Could you please help me?

    • @absentdata
      @absentdata 2 years ago

      What's your issue?

    • @promise-abasi
      @promise-abasi 1 year ago

      @absentdata Hi, thank you so much for the video. I also have a problem with hue: from around 17:00, I can't seem to get past "ValueError: The following variable cannot be assigned with wide-form data: `hue`". How do I solve this, please? Thank you.

  • @anooppainuly5271
    @anooppainuly5271 2 years ago

    Loved it

  • @TvsCar30
    @TvsCar30 1 year ago

    The following variable cannot be assigned with wide-form data: `hue`. Can someone help me?

    • @sencxx6368
      @sencxx6368 1 year ago +1

      sns.kdeplot(data=df, x='Annual Income (k$)', shade=True, hue='Gender')

  • @yusufbas035
    @yusufbas035 2 years ago

    thank you

  • @nayeem9358
    @nayeem9358 5 months ago

    What is the spending score?

    • @absentdata
      @absentdata 5 months ago

      It is the score (out of 100) given to a customer by the mall authorities, based on the money spent and the behavior of the customer.

  • @KarthiKeyan-ci2yj
    @KarthiKeyan-ci2yj 3 years ago +1

    I would like to learn data analytics. Can I have your contact details to get more information from you?

    • @absentdata
      @absentdata 3 years ago

      www.linkedin.com/in/gaelimholland

  • @grhagandanap9912
    @grhagandanap9912 2 years ago

    Thanks for the practice. But I ran into a problem when executing the n_clusters sensitivity analysis at 41:13. Do you know what the problem is?

  • @parhatbazakov1091
    @parhatbazakov1091 1 year ago

    Hi, I am new to data. Can anyone answer my question, please? If the spending score showed the strongest correlation with Age (-0.33) and no correlation with Annual Income (0.0099), would it be better to cluster by age?

    • @absentdata
      @absentdata 1 year ago +2

      Low correlation doesn't necessarily mean low similarity. Clustering can still be useful to identify patterns even with low correlation. It depends on the goals of the analysis.

    • @parhatbazakov1091
      @parhatbazakov1091 1 year ago

      @absentdata Thanks!

  • @ai.simplified..
    @ai.simplified.. 3 years ago

    15:00 practical & useful

    • @absentdata
      @absentdata 3 years ago +1

      Yes, loops are your friends. They save tons of time :)

    • @tejkumar9018
      @tejkumar9018 9 months ago

      ----> 3 plt.figure()
      TypeError: 'module' object is not callable
      Please help, it can't execute because of this error.
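
The commenter's import line is not shown, so this is only a guess at the cause: the same TypeError appears when the top-level package is aliased as `plt` (`import matplotlib as plt`) instead of `matplotlib.pyplot`, because `matplotlib.figure` is a submodule rather than a function. A sketch that reproduces the message and then uses the usual import:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot  # loads the matplotlib.figure submodule, as importing seaborn would

import matplotlib as wrong_plt  # the suspected mistake: the package itself aliased as plt

try:
    wrong_plt.figure()  # matplotlib.figure is a module, not the pyplot function
    error_message = None
except TypeError as err:
    error_message = str(err)

import matplotlib.pyplot as plt  # the usual import

fig = plt.figure()  # works: pyplot.figure is a function
plt.close(fig)
print(error_message)
```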

  • @harryfeng4199
    @harryfeng4199 3 years ago

    Thnk uuuu

  • @adishreepatra7330
    @adishreepatra7330 1 year ago

    Hi, Loved your content! If possible please share the source code of this project

    • @absentdata
      @absentdata 1 year ago

      I added it in the description

  • @mahmoudemad8507
    @mahmoudemad8507 1 year ago

    I get an error saying that fit_transform must get 2 arguments.

    • @absentdata
      @absentdata 1 year ago

      Try posting your code so we can see what's happening.

  • @im4485
    @im4485 1 year ago

    Hi, is K-means reliable in high dimensions?

    • @absentdata
      @absentdata 1 year ago

      I would say no. I would do some PCA to reduce some of your dimensions.

  • @sikkandarbasha-p8o
    @sikkandarbasha-p8o 9 months ago

    Can I put this project on my resume?

    • @absentdata
      @absentdata 9 months ago +1

      Of course you can!

  • @EpicSharjeel
    @EpicSharjeel 5 months ago

    Everything changes in 4 years, every bit of syntax.

  • @zubairsultanate5660
    @zubairsultanate5660 1 year ago

    zub salute

  • @vishnua5028
    @vishnua5028 1 year ago

    How do I download the dataset?

    • @absentdata
      @absentdata 1 year ago

      Check the description 😊

    • @vishnua5028
      @vishnua5028 1 year ago

      @absentdata I can't see any download option on GitHub

  • @hrsh3329
    @hrsh3329 3 years ago

    👍🏽👍🏽👍🏽

  • @forzahorizon4eliminator206
    @forzahorizon4eliminator206 8 months ago

    And you got yourself a subscriber.

    • @absentdata
      @absentdata 8 months ago

      Welcome to the family! I am happy to earn your subscription.

  • @mn4769
    @mn4769 8 months ago

    sns.kdeplot(df['Annual Income (k$)'],shade =True,hue = df['Gender']); Here I get ValueError: The following variable cannot be assigned with wide-form data: `hue`. Can someone explain?

    • @arindambhunia9862
      @arindambhunia9862 3 months ago

      sns.kdeplot(x=df['Annual Income (k$)'],shade=True,hue=df['Gender']);
      Write the code this way and it will get resolved. I also had the same issue.
      Good luck

  • @slacex
    @slacex 10 months ago +1

    df.groupby('Gender')['Age', 'Annual Income (k$)', 'Spending Score (1-100)'] gives: Cannot subset columns with a tuple with more than one element. Use a list instead.

    • @absentdata
      @absentdata 10 months ago

      Is that your whole code? There is no aggregation function in your groupby. Also, you are selecting more than one column, so they need to be passed as a list. It should be df.groupby('category')[['A','B']].mean()

    • @slacex
      @slacex 10 months ago +3

      @absentdata I have just resolved it (at 30:41): df.groupby(['Gender'])[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].mean()

    • @jimmyxrs
      @jimmyxrs 1 month ago

      @slacex This also helps with the income cluster later on: df.groupby(['Income Cluster'])[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].mean()
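
To spell out the fix from this thread: recent pandas versions reject `groupby(...)['A', 'B']` because the bare comma-separated names form a tuple, while a list inside the brackets (double brackets) selects several columns. A small sketch on invented rows, with column names taken from the comments:

```python
import pandas as pd

# Invented rows; column names follow the comments above
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [20, 30, 40, 50],
    "Annual Income (k$)": [15, 40, 60, 85],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# A list of column names selects several columns; .mean() aggregates per group
means = df.groupby("Gender")[cols].mean()
print(means)
```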

  • @yashjajoria
    @yashjajoria 1 year ago

    sns.kdeplot(df['Annual Income (k$)'],shade = True,hue= df['Gender']); - ValueError: The following variable cannot be assigned with wide-form data: `hue`

    • @absentdata
      @absentdata 1 year ago +1

      The updated version of sns.kdeplot may require your data to be in long form when you use hue, so you can melt the frame like this:
      melted_df = df.melt(id_vars='Gender', value_vars=['Annual Income (k$)'])
      sns.kdeplot(data=melted_df, x='value', hue='Gender', shade=True)

    • @yashjajoria
      @yashjajoria 1 year ago

      @absentdata Thanks for the response, sir. I'm your student.