Data Analyst Project - Cleaning Data with SQL for Insights | Power BI, SQL & Python

Поделиться
HTML-код
  • Опубликовано: 16 дек 2024

Комментарии • 46

  • @aliahmad1987
    @aliahmad1987  4 месяца назад +2

    If there are corrections, additional instructions or updates then those will be listed below.
    15.08.2024 - Backup file updated with a non-empty version. Name in GitHub is "Episode 2 - PortfolioProject_MarketingAnalytics.bak"
    Thanks for watching and keep up the good work 🤩

  • @avineshblah5739
    @avineshblah5739 4 дня назад

    Hi ahmed, your videos are super helpful and informative. However, it would be even better if you could include some checks which can be used to find the irregularities within the data and what to clean. Here you've delved straight into the cleaning part.

  • @omarisawesome365
    @omarisawesome365 4 месяца назад +3

    Thank you. I want more tutorials like this ❤

    • @aliahmad1987
      @aliahmad1987  4 месяца назад +1

      @@omarisawesome365 I will make more after this one is finished.
      Are there any special segments or industries you are interested in?

    • @walachanel3838
      @walachanel3838 4 месяца назад +1

      Thank you so much! I’ve completed many courses, but I haven’t had practical experience or job opportunities to use SQL for data analysis, as my country doesn’t commonly use SQL for this purpose. I’m passionate about learning and eager to deepen my understanding and skills further.

    • @aliahmad1987
      @aliahmad1987  4 месяца назад

      @@walachanel3838 you are welcome!
      Happy the examples are useful!
      // Ali

  • @divyachouhan2366
    @divyachouhan2366 17 дней назад +3

    how to get dataset

  • @chicoyy
    @chicoyy 9 дней назад +1

    thank you so much for the video!! I found something, if in a date all the rows are NULL, then the avg_duration will be NULL as well. This happend in 2025-08-01, so I add a global avg to when this happen:
    COALESCE(Duration, avg_duration, global_avg_duration) AS Duration
    AVG(Duration) OVER() AS global_avg_duration, -- avg_global

    • @aliahmad1987
      @aliahmad1987  7 дней назад

      Hi,
      Okay - did it fix it? It oculd be instances that doesn't have values yes, but feel free to get creative!

  • @prernabhalerao-m1m
    @prernabhalerao-m1m 8 дней назад +1

    Someone please tell me where to get the dataset?I checked Github and could only find the csv for customer reviews data

    • @aliahmad1987
      @aliahmad1987  7 дней назад

      Hi,
      Yes it is on GitHub and some parts we generate in the project also :)

  • @sevalbzkrtt
    @sevalbzkrtt 3 месяца назад +2

    Thank you, I appreciate your effort. Can you also work on projects in the field of finance?

    • @aliahmad1987
      @aliahmad1987  3 месяца назад +3

      Hi,
      First of all - thank you for watching and supporting.
      Yes - I will add more projects with other segments / indstries with Finance being one of them :) I think towards the end of this one I will run a poll on it, but top of mind I have planned Retail, Customer Service, Supply Chain, Finance and maybe another one on Sales, but a bit more advanced one.
      Wish you a continued great day!
      // Ali

    • @sevalbzkrtt
      @sevalbzkrtt 3 месяца назад +1

      @@aliahmad1987 Thank you in advance for all your efforts and everything.

  • @user-xt4og
    @user-xt4og 3 месяца назад +1

    hey Ali, thanks for your tutorial. Do we need to check all tables for duplicate rows like customer_journey table?

    • @aliahmad1987
      @aliahmad1987  3 месяца назад +1

      You are welcome!
      No, you don't. The purpose or thought behind is this statement is that you notice it, and then do something about it and I was just using some SQL concepts how to do it practically.
      An idea, depending on how motivated you are ;) is to still write the statement and use it as a test to see if you can do it, and verify through SQL statement and an output that there aren't any duplicates.
      In the project I made though, it is only an issue in one table.

  • @alenkadric635
    @alenkadric635 2 месяца назад +2

    Hey Ali greetings! Im very thankfull to you for creating this project for us future Analysts. But I have one problem, the bak database file that you created is on I belive 2022 SQL SERVER version, and my computer has a problem, so I can only install a 2014 SQL server version on my PC. Is there any possibility that you can downgrade the BAK file so that I'm able to restore it to my server , and complete this beautiful projects afterwards.💯💯🙌

  • @mankiratsingh8494
    @mankiratsingh8494 3 месяца назад +1

    hey can you tell me from where you have taken the datasets for the project

    • @aliahmad1987
      @aliahmad1987  3 месяца назад

      @@mankiratsingh8494 Hi,
      I have created it in Python and SQL Server.
      Ali

  • @peterkadar7754
    @peterkadar7754 3 месяца назад +1

    Hi Ali, Thank you for your great job. I executed the last customer journey query but unfortunately there are still some remaining NULL in the Duration column. For example I filtered Journey ID 1465. That is still NULL. Do you have the same on your side?

    • @aliahmad1987
      @aliahmad1987  3 месяца назад +1

      Hi Peter,
      Yes I have the same! There are some customer_journey rows which only has 1 event. When I think about it, a part of the project could have been to handled this also as I see a lot of the events are "Checkout".
      Could also indicate that they have gone directly to the Check Out page. Either way, an average function over a blank row will return average :)
      If you want to you can adjust this one:
      COALESCE(Duration, avg_duration) AS Duration -- Replaces missing durations with the average duration for the corresponding date
      to
      COALESCE(Duration, avg_duration, 0) AS Duration -- Replaces missing durations with the average duration for the corresponding date
      And it will replace the Duration NULL value with a 0, and then you can take it out the measure in Power BI if you want to do an average of that.
      Does that make sense?
      Thanks for pointing it out btw and thanks for watching!

    • @peterkadar7754
      @peterkadar7754 3 месяца назад +1

      Hi Ali,
      Thank you very much. Sorry for this late reply.
      Now understand. I will checknit it and try all of them.
      Thank you again.
      Peter
      Thank you!

    • @aliahmad1987
      @aliahmad1987  3 месяца назад +1

      @@peterkadar7754 You are welcome!

  • @rituvennela4292
    @rituvennela4292 Месяц назад +1

    Hi , I have MySQL set up in this system, where can I download the db, to use it in MySQL..unable to find it in the github link

    • @aliahmad1987
      @aliahmad1987  7 дней назад

      Hi,
      This can be done with MS SQL Server, not MySQL unfortunately.

    • @ganapatinaik8552
      @ganapatinaik8552 6 дней назад

      @@aliahmad1987 can you convert that bak file into Mysql suitable file or csv file and upload in git?

  • @nguyenthuylinh6005
    @nguyenthuylinh6005 Месяц назад +1

    First of all, thanks for sharing this useful content, but I don't know how to download the dataset to practise, please give me the instruction, again, thank you for your valuable playlists

    • @aliahmad1987
      @aliahmad1987  7 дней назад

      Hi,
      Please check the description ;) It is all there

  • @Sweetzee-v9q
    @Sweetzee-v9q 3 месяца назад +1

    hey ali, can i make this project under the topic digital marketing for my college project??

    • @aliahmad1987
      @aliahmad1987  3 месяца назад +1

      @Sweetzee-v9q Hi,
      Yes, sure, why not? If it makes sense and fits in then go for it! Just make sure it isn't considered cheating etc. of course 😎

  • @navidhadavii
    @navidhadavii Месяц назад +1

    hello
    First of all, thanks for sharing this useful content
    I have a question. How did you find your favorite industry? Do you have any tips to help me?
    thank you

    • @aliahmad1987
      @aliahmad1987  7 дней назад +1

      You are welcome.
      Uhm, good question...I don't really have one as I've gottan a bit agnostic since I've worked on lots of different types of data.
      eCommerce seems like a lot of fun though! Great industry + combination.

  • @souravverma8707
    @souravverma8707 4 месяца назад +1

    Where to get dataset?

    • @aliahmad1987
      @aliahmad1987  4 месяца назад

      Hi,
      We will build it in Part 4 in Power BI using the SQL statements.
      Soon ready 😎.
      How did you find this part?

  • @CheySpice
    @CheySpice 2 месяца назад +1

    Hi Ali! Thank you again for another great project. I am coming across an issue when I am trying to restore the "Episode 2 - PortfolioProject_MarketingAnalytics.bak" file in SSMS. The error reads "Restore of Database PortfolioProject_MarketingAnalytics.bak" failed. It also states that "The operating system returned the error '5(Access Denied)'. Do you know why this could be?

    • @aliahmad1987
      @aliahmad1987  2 месяца назад

      Hi,
      Thank you! Hmmm...it sounds like you might have it stored on a place you can't read from?

    • @CheySpice
      @CheySpice 2 месяца назад

      @@aliahmad1987 I have it stored in the same place where I have AdventureWorks2022 which I was able to recover with no issue. I wonder if it should be stored somewhere else?

  • @manelhnid8645
    @manelhnid8645 2 месяца назад +1

    Hello ali 😍 wish u reply soon 🤩 im new in sql and i didnt understand alors 🥲 do we need to write all of that or it is already written and thé data base is aleeady ready and ur only explaining im lost 😅 also i tried too many thi g but didnt know where to find thèse Line or Even where to write it 😂

    • @aliahmad1987
      @aliahmad1987  2 месяца назад

      Hello there!
      Startin off you have to set up the environment needed to run this, so I hope you have started with that :) After that, yes I would recommend that you write the SQL, how else will you learn 🙂?
      All material is on GitHub, but it sounds like you have to start a bit slow and then progress over time.
      Ali