Find Outliers in SQL

  • Published: Aug 23, 2024
  • Let me show you how to quickly find outliers in SQL using easy-to-follow code. In this tutorial, you will learn how to calculate the mean, standard deviation, and Z-score to find data points that fall beyond your outlier threshold.
    Don't hesitate to dive into the SQL mini-case studies and questions and create your own answers. Learn by doing! Check out the SQL practice link below:
    ______________________________________________________________________
    Check out StrataScratch: stratascratch....
    ______________________________________________________________________
    Find the SQL code and written instructions here:
    absentdata.com...
    #SQL
    #Interview
    #Outliers
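
The mean, standard deviation, and Z-score steps described above can be sketched as follows. This is a minimal illustration, not the video's actual code: the SQL is run through Python's `sqlite3` module so it can be tried anywhere, the `amount` column and the sample values are invented, and only the `web_data` table name is taken from the comments below. The 2.576 cutoff is the one discussed in the comments.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite may not ship a built-in SQRT, so register Python's. In PostgreSQL or
# MySQL you would typically call STDDEV_POP(amount) instead of this arithmetic.
conn.create_function("SQRT", 1, math.sqrt)

# Hypothetical schema and data, for illustration only.
conn.execute("CREATE TABLE web_data (id INTEGER PRIMARY KEY, amount REAL)")
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]
conn.executemany("INSERT INTO web_data (amount) VALUES (?)", [(v,) for v in values])

query = """
WITH stats AS (
    -- Population mean and standard deviation of the whole column
    SELECT AVG(amount) AS mean,
           SQRT(AVG(amount * amount) - AVG(amount) * AVG(amount)) AS sd
    FROM web_data
),
scored AS (
    -- Z-score: distance from the mean measured in standard deviations
    SELECT d.id, d.amount, (d.amount - s.mean) / s.sd AS zscore
    FROM web_data AS d CROSS JOIN stats AS s
)
SELECT id, amount, zscore
FROM scored
WHERE ABS(zscore) > ?
ORDER BY id
"""
outliers = conn.execute(query, (2.576,)).fetchall()
for row in outliers:
    print(row)
```

With this sample data only the row with amount 95 exceeds the cutoff. Passing the threshold as a parameter makes it easy to try 1.645 or 1.96 instead.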

Comments • 24

  • @carpoolify
    @carpoolify 2 years ago +3

    Great tutorial, many thanks. Would be interested in the box plot method!

  • @drivetrainerYT
    @drivetrainerYT 2 years ago

    Just went thru this channel's uploads. Many hours of binge watching are queued now. "Thanks" 😀
    Great videos!

  • @ronenTheBarbarian
    @ronenTheBarbarian 2 years ago

    Another great lesson, thanks!

  • @user-tn7ce8dz9l
    @user-tn7ce8dz9l 1 year ago

    Hello! Thank you so much for making this video! How would I find the outliers within a specific group within the overall SELECT? For example, if I am finding outlier numbers in each state, etc.
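
One possible way to approach the per-group question is to compute the mean and standard deviation per group and join them back to the rows. The sketch below assumes a hypothetical `sales` table with `state` and `amount` columns and invented values. It is run through Python's `sqlite3` module only so the query can be tested, and it is not taken from the video.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT in case this SQLite build lacks it; other databases would
# usually offer STDDEV_POP(amount) directly.
conn.create_function("SQRT", 1, math.sqrt)

# Hypothetical table and data: one unusual amount per state.
conn.execute("CREATE TABLE sales (state TEXT, amount REAL)")
ca = [10, 11, 12, 10, 11, 12, 10, 11, 12, 11, 80]
tx = [200, 210, 190, 205, 195, 200, 210, 190, 205, 195, 20]
conn.executemany(
    "INSERT INTO sales (state, amount) VALUES (?, ?)",
    [("CA", v) for v in ca] + [("TX", v) for v in tx],
)

group_query = """
WITH stats AS (
    -- One mean and one population standard deviation per state
    SELECT state,
           AVG(amount) AS mean,
           SQRT(AVG(amount * amount) - AVG(amount) * AVG(amount)) AS sd
    FROM sales
    GROUP BY state
)
SELECT d.state, d.amount, (d.amount - s.mean) / s.sd AS zscore
FROM sales AS d
JOIN stats AS s ON s.state = d.state
WHERE s.sd > 0                              -- skip groups with no spread
  AND ABS((d.amount - s.mean) / s.sd) > ?   -- compare within the group
ORDER BY d.state, d.amount
"""
group_outliers = conn.execute(group_query, (2.576,)).fetchall()
for row in group_outliers:
    print(row)
```

Each row is scored against its own state's statistics, so an amount that is ordinary in one state can still be flagged in another. In databases with window functions, `AVG(amount) OVER (PARTITION BY state)` is an alternative to the join.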

  • @ngocbao2436
    @ngocbao2436 4 months ago

    Thank you

  • @zhangmr7955
    @zhangmr7955 2 years ago

    Good course, but I cannot find "web_data". Would you please provide a link, as with the other datasets?

  • @linuxbrad
    @linuxbrad 1 year ago

    Beautiful explanations! How did you generate your normal distribution bell curve? (Syntax please?) Thank you!!!

    • @absentdata
      @absentdata  1 year ago +1

      If you want to create a bell curve, a great way is to use Python. You can create a KDE plot using the seaborn package!

  • @CorenMare
    @CorenMare 2 years ago

    Good video.

  • @--ShivaS
    @--ShivaS 2 years ago

    Gr8 video✌️❤️

  • @pawlowski6132
    @pawlowski6132 2 years ago

    Awesome. I can put this to good use this week. However, how did you calculate the outlier thresholds, plus or minus 2.576? That wasn't clear.

    • @absentdata
      @absentdata  2 years ago

      The thresholds are based on the Z-score, which tells us what share of normally distributed data falls within that many standard deviations of the mean. A Z-score of ±2.576 covers about 99% of your data, so anything beyond that score would be an outlier. Note that 2.576 is a little under 3 standard deviations from the mean (±3 covers about 99.7%). You don't need to calculate the threshold from your data. You choose a coverage level such as 90, 95, or 99 percent and look up the Z-score that matches it (1.645, 1.96, or 2.576).
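
The lookup from a coverage level to a Z-score can be reproduced with Python's standard library instead of a printed Z-table. This is a small sketch that assumes a two-sided threshold on a standard normal distribution. The helper name `z_for_coverage` is made up for this example.

```python
from statistics import NormalDist


def z_for_coverage(coverage: float) -> float:
    """Return z such that `coverage` of a standard normal lies within +/- z."""
    # Two-sided: half of the excluded probability sits in each tail.
    return NormalDist().inv_cdf(0.5 + coverage / 2)


for coverage in (0.90, 0.95, 0.99):
    print(f"{coverage:.0%} of the data lies within z = +/-{z_for_coverage(coverage):.3f}")
```

This gives 1.645, 1.960, and 2.576 for 90%, 95%, and 99%. These figures only hold if the data is roughly normally distributed.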

  • @philtoa334
    @philtoa334 2 years ago

    Nice.

  • @user-mc5bj6ys4f
    @user-mc5bj6ys4f 1 year ago

    Hello, newbie here. I would like to know which stdev threshold I should base my values on. I have a task where I need to find the outliers in a set of amounts. Should I base it on 2.576, 1.96, or 1.645? Or should I test it on all three values?

    • @absentdata
      @absentdata  1 year ago

      Well, this is up to your business domain and use case. How important is it for you to identify outliers? What impact do these outliers have on your data? What is the distribution of your data? These are all questions you should be asking yourself and your stakeholders. Then you can determine what threshold is best for your case. For example, in the medical domain, 95% (a Z-score of 1.96) may be too conservative. There is no one-size-fits-all approach to setting this threshold.

  • @abcxyzncl
    @abcxyzncl 1 year ago

    How can I draw a line using PostgreSQL?

    • @absentdata
      @absentdata  1 year ago

      Can you explain what you mean by "draw a line"?

  • @binu1455
    @binu1455 1 year ago

    Hey, why is the Z-score value taken as 2.57 and not 2 or 3? Could you please clarify that?

    • @absentdata
      @absentdata  1 year ago

      These values correspond to the percentile values in a Z-table. I chose 2.57 because it corresponds to a cumulative probability of 99.49%. This is an arbitrary way to set the threshold for your outliers.

    • @binu1455
      @binu1455 1 year ago

      @absentdata Okay, is this value of 2.57 the same for all datasets or only for this particular dataset? How do I calculate that value?

  • @jacquetrahan8481
    @jacquetrahan8481 1 year ago

    Why do some use mean and others use median?

    • @absentdata
      @absentdata  1 year ago +1

      The median is going to be less influenced by outliers than the mean.
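
A quick way to see this effect is to compare both statistics before and after adding one extreme value. The numbers below are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical sample, then the same sample with one extreme value appended.
clean = [10, 12, 11, 13, 12]
with_outlier = clean + [300]

mean_clean, median_clean = mean(clean), median(clean)
mean_out, median_out = mean(with_outlier), median(with_outlier)

print(f"clean:        mean={mean_clean:.2f}  median={median_clean}")
print(f"with outlier: mean={mean_out:.2f}  median={median_out}")
```

The mean jumps from 11.6 to roughly 59.67, while the median stays at 12. This is why median-based methods are often preferred when the outliers themselves would distort the mean and standard deviation.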

  • @AP-dw6nf
    @AP-dw6nf 2 years ago

    Thank you