What is Diagnostic Analytics? - Data Science Wednesday

  • Published: Sep 11, 2024
  • Data Science Wednesday is produced by Decisive Data, a data analytics consultancy. Learn more about us using the following links. The video transcription is also included below.
    www.decisivedat...
    / decisivedata
    / decisive-data
    Video Transcription (Last minute or two cut off)
    Hello, and welcome to Data Science Wednesday. My name is Tessa Jones, and I'm a data scientist at Decisive Data. I'm here today to talk about diagnostic analytics. Diagnostic analytics fits into a spectrum of analysis that runs from basic to more complex. We covered descriptive analytics in a previous episode, so if you're interested in learning more about it, you're more than welcome to watch that one. The important thing about descriptive analytics is that it launches us into these more advanced analytics and data science phases.
    Today we're going to talk about diagnostic analytics, which is the most abstract of these phases of analysis. It answers the question of why: why are things happening? What's driving a number to go up or down? To ground this conversation, let's imagine that we're grocery store owners and we want to know what makes revenue for any given product go up or down, so that we know how to stock our shelves. First, we look for correlations in our data to find out what's driving revenue. We pick out a set of features that we think might affect revenue, and then, for each one, we find a number that tells us whether it's highly or weakly correlated with revenue.
    In this example, we take time of year, commercial airtime, Twitter mentions, location, and shelf height, put them into our big box of statistical methods, and it gives back a number between zero and one for each. One means very highly correlated, and zero means not correlated at all. Here we see that time of year has a high correlation, as do location and shelf height, and Twitter mentions are somewhat correlated. So let's look at this visually.
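The video doesn't say what is inside the "big box of statistical methods." One common choice for numeric features is the Pearson correlation coefficient r, which runs from -1 to 1; taking its absolute value gives the zero-to-one scale described above, while the sign only tells you whether revenue rises or falls with the feature. The minimal Python sketch below assumes that choice. The feature names and numbers are hypothetical, and categorical features such as location or time of year would need to be encoded (or scored with a different measure) before they could be ranked this way.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r (between -1 and 1) of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # r is undefined for a constant series; treat it as "no signal"
    return cov / (spread_x * spread_y)

def rank_drivers(features, target):
    """Score each feature by |r| against the target (0 = none, 1 = perfect), strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weekly numbers for a single product, made up for illustration.
revenue = [120, 135, 150, 160, 180, 210]
features = {
    "twitter_mentions": [5, 8, 9, 12, 15, 20],
    "commercial_airtime_min": [3, 1, 4, 2, 3, 2],
    "shelf_height_cm": [150, 150, 120, 120, 100, 100],
}

for name, score in rank_drivers(features, revenue):
    print(f"{name:<24} {score:.2f}")
```

With these made-up numbers, Twitter mentions and shelf height rank far above commercial airtime. Shelf height scores high even though revenue falls as shelf height rises here, which is why it's worth keeping the sign of r when interpreting a result. A high score is still only a correlation, not proof that the feature causes the change in revenue.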
    Here's an example. We plot Twitter mentions against sales, and as this line shows, the more Twitter mentions you have, the higher sales go. The more tightly these dots cluster around the line, the more confident we can be that the correlation is real. In contrast, down here we plot time on market against sales. These points are all over the place and the line is flat, so the two are clearly not correlated: time on market doesn't appear to affect revenue. This is all really good to know, because we need to know what drives revenue in order to build a good predictive model and to understand what's driving our data in general.
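The two scatter plots described above can be summarized numerically with an ordinary least-squares line and its R² value: the slope says whether sales rise or fall with the feature, and R² (which, for a single feature, equals the square of the Pearson correlation r) says how tightly the dots cluster around the line. This is a minimal Python sketch; the data points are hypothetical, not the figures from the video's charts.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared). r_squared is the share of the
    variation in y explained by the line: near 1 when the dots hug the line,
    near 0 when they are scattered around a flat line. Assumes xs is not constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals: vertical distance of each dot from the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical data points, made up for illustration.
twitter_mentions = [2, 4, 6, 8, 10, 12]
sales_vs_mentions = [105, 118, 131, 139, 152, 166]
time_on_market_days = [10, 20, 30, 40, 50, 60]
sales_vs_time = [150, 90, 170, 100, 160, 110]

for label, xs, ys in [
    ("Twitter mentions", twitter_mentions, sales_vs_mentions),
    ("Time on market", time_on_market_days, sales_vs_time),
]:
    intercept, slope, r2 = fit_line(xs, ys)
    print(f"{label}: slope={slope:.2f}, R^2={r2:.2f}")
```

For the Twitter data the line climbs and R² is close to 1; for time on market the slope is near zero and R² is close to 0, matching the "all over the place" picture.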
    Let's visualize some of this for a minute. If you're starting to move into diagnostic analytics and data science, you probably already have a solid descriptive analytics base, and with it a pretty good dashboard. In this example, the dashboard shows revenue by department: pantry, dairy, and meat. With a really good visualization, you click into a department and it shows you the revenue of every product in that department. Now let's say you're interested in your top seller. You click on it and see its sales over a given year, and there's a strange dip: sales are high at both ends of the year but low in the middle. That makes sense, because we've already identified that time of year is correlated with revenue, which means you sell more at some times of the year than at others.
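The dashboard drill-down amounts to re-aggregating the same sales records at finer levels: department, then product within the top department, then month for the top seller. The Python sketch below assumes a hypothetical record layout (`dept`, `product`, `month`, `revenue`) and made-up numbers; a real dashboard tool would perform the same grouping against its own data source.

```python
from collections import defaultdict

def revenue_by(records, key, where=lambda rec: True):
    """Sum revenue grouped by key(rec), keeping only records where where(rec) is true."""
    totals = defaultdict(float)
    for rec in records:
        if where(rec):
            totals[key(rec)] += rec["revenue"]
    return dict(totals)

# Hypothetical monthly revenue per product (January..December), for illustration only.
monthly = {
    ("pantry", "cold cereal"): [30, 35, 50, 70, 95, 120, 130, 125, 90, 60, 40, 32],
    ("pantry", "oatmeal"): [140, 130, 100, 70, 45, 30, 25, 28, 50, 85, 120, 145],
    ("dairy", "milk"): [60] * 12,
    ("meat", "ground beef"): [70] * 12,
}
records = [
    {"dept": dept, "product": product, "month": month, "revenue": rev}
    for (dept, product), series in monthly.items()
    for month, rev in enumerate(series, start=1)
]

# Level 1: revenue by department.
by_dept = revenue_by(records, key=lambda r: r["dept"])
# Level 2: click into the top department to see its products.
top_dept = max(by_dept, key=by_dept.get)
by_product = revenue_by(records, key=lambda r: r["product"], where=lambda r: r["dept"] == top_dept)
# Level 3: click into the top seller to see its sales month by month.
top_product = max(by_product, key=by_product.get)
by_month = revenue_by(records, key=lambda r: r["month"], where=lambda r: r["product"] == top_product)

print(by_dept)
print(top_dept, top_product)
print([by_month[m] for m in sorted(by_month)])
```

In this toy data the top seller is oatmeal, whose monthly series is high in winter and low in midsummer, the same "high on both ends, low in the middle" shape described above.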
    Let's drill down into that a little more with an example of cold and hot cereal. On the horizontal axis we have the months of the year. During the summer months, cold cereal sells really well and oatmeal doesn't sell so well, but oatmeal sells really well in the winter months. That makes sense: when it's cold out you want hot cereal, and when it's really hot out you want something cold. This is really good information to have for our next steps of analysis.
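One simple way to check a seasonal pattern like this is to compare each product's average revenue in summer months against winter months. The Python sketch below assumes a Northern Hemisphere store (June to August as summer, December to February as winter) and uses hypothetical monthly figures; it is an illustration of the comparison, not the analysis behind the video's chart.

```python
SUMMER = (6, 7, 8)   # June-August: assumes a Northern Hemisphere store
WINTER = (12, 1, 2)  # December-February

def seasonal_means(series):
    """Mean revenue over the summer and winter months of a 12-value series (January first)."""
    if len(series) != 12:
        raise ValueError("expected one value per month, January through December")
    by_month = dict(enumerate(series, start=1))
    summer = sum(by_month[m] for m in SUMMER) / len(SUMMER)
    winter = sum(by_month[m] for m in WINTER) / len(WINTER)
    return summer, winter

# Hypothetical monthly revenue, January..December, for illustration only.
cold_cereal = [30, 35, 50, 70, 95, 120, 130, 125, 90, 60, 40, 32]
oatmeal = [140, 130, 100, 70, 45, 30, 25, 28, 50, 85, 120, 145]

for name, series in [("cold cereal", cold_cereal), ("oatmeal", oatmeal)]:
    summer, winter = seasonal_means(series)
    leader = "summer" if summer > winter else "winter"
    print(f"{name}: summer avg {summer:.1f}, winter avg {winter:.1f} -> sells better in {leader}")
```

Averaging over whole seasons smooths out month-to-month noise; a finer-grained alternative would be to correlate each series with a monthly temperature series, if one were available.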
