"Mr. Anderson, you are the expert in all matters related to drawing red lines. We need you to draw seven red lines. All of them strictly perpendicular, some with green ink and some with transparent. Can you do that?"
* nervously sweats with 2 perpendicular sweat drops on his forehead and the 3rd drop doesn't know where to go *
When you inflate the balloon, can you do it in the shape of a kitten?
@@phandao5404 You are talking about a different Mr Anderson lol
yo thats a swastika?
7 perpendicular lines 6:47
Can't wait to have my data visualized with electric shock intensity! ❤
Let me run that past the ethics committee for a sec.
If we're talking about visualizations, that must mean the electrodes will be hooked up to our eyeballs. Extra FUN!
Thanks for the free, high-quality education.
Welcome.
To anyone who is interested in data visualization: I highly recommend the five books on data visualization by Edward Tufte, particularly the first one, "The visual display of quantitative information." He is the founding figure of the field of data visualization and his books are very interesting and pleasant to read.
thanks!
When he mentioned marks and channels with the specific examples, I immediately think of the analogy of how these are used in cartography and the choices made there.
dr xu has impeccable drip
Really interesting stuff. I'm very surprised to see that using color lum/sat for highlighting magnitude is considered worse than markersize. I mean, it makes sense that without a colorbar it is very difficult to say what saturation/transparency corresponds to anything in absolute or relative sense. So I feel like when utilizing those, a colorbar is mandatory.
I'd be interested to see Dr. Xu's take on how to visualize high density datasets. Because when you have a set of n=1e6, a scatter plot that uses area to denote magnitude will not really be usable due to the markers overlapping or being too small to be visible. I'm expecting that at some point you have to shift from using scatterplots to porkchop plots and so on. Would be nice to see something of an overview between data set size and plotting formats!
RGB on a computer screen is a super poor way to refer to saturation. HSV and HLS are super poor derivatives. If you do not understand color spaces and vision in depth you shouldn't be talking about levels of saturation because that is a very complex topic and knowing only RGB values and your run of the mill Photoshop color pickers will lead you only to talking a lot of pseudoscientific nonsense that misleads people. To be more accurate it is necessary to talk about the Lch, Luv and L*a*b color spaces. Then you can use words like "twice as saturated" and have it actually mean something. Luminance is also fraught with peril. Something twice as bright doesn't look twice as bright. When printing on paper or viewing on a monitor there's a maximum brightness but this isn't true in general. Twice the number of lightbulbs will give it twice the brightness but it doesn't look it. It's all a bit too complicated for this comment but suffice it to say that there're not many people that understand it yet there's a large percentage that really think they know when in fact they do not.
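To put a rough number on "twice as bright doesn't look twice as bright": a minimal sketch using the standard CIE 1976 lightness formula (the L* of L*a*b*). The function name `cie_lightness` and the sample luminances 0.2 and 0.4 are illustrative assumptions, not something from the video.

```python
def cie_lightness(y: float, y_white: float = 1.0) -> float:
    """CIE 1976 lightness L* (0..100) from relative luminance y."""
    t = y / y_white
    delta = 6 / 29
    if t > delta ** 3:
        f = t ** (1 / 3)                   # cube-root branch for most luminances
    else:
        f = t / (3 * delta ** 2) + 4 / 29  # linear branch near black
    return 116 * f - 16


# Assumed sample values: a patch and one with double its luminance.
dim = cie_lightness(0.2)
bright = cie_lightness(0.4)
ratio = bright / dim
print(f"L* {dim:.1f} -> {bright:.1f}, ratio {ratio:.2f}")
```

Doubling the luminance raises L* by far less than a factor of two, which is the commenter's point about luminance being a tricky magnitude channel.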
@@xbzq Yeah exactly. I completely agree. To obtain something absolute from "a color" is incredibly sensitive and depends on way more factors than you'd expect. It becomes even worse when you use colormaps that vary multiple channels.
I can do another video if there is enough interest about visualising large datasets (I assume this is what you mean by 'high density'). A bit of a spoiler: there is no silver bullet for large datasets, unfortunately.
@@lens07 How do you define "saturation" that allows you to say that one area has twice the saturation of another?
@@lens07 Yeah, although what I mean with high density is quite specific. So I'm not too sure about the wording. When doing optimization problems, the results should converge to the optimum. So what happens is that your distribution of markers is quite dense around the optimum. Which doesn't leave a lot of space for varying marker size (and thus the possibility to distinguish between the individual markers). Which makes me always opt for varying color (however a porkchop is superior of course). Hopefully that clears the "high density" part up a bit. Thanks for the reply!
I would consider data visualization as closer to the discipline of User Experience design.
I knew I liked Dr. Xu early on, but that sensation only magnified as time went on; much like electroshock.
This is the other kind of graph theory.
Computerphile invites data visualisation expert, films his presentation from a distance on a tiny screen
Idk if you were just making a joke or if you actually want to see the presentation. But if so, it's in the description.
Really informative video! Can someone please name the books on screen between 0:25 - 0:47? I can recognize Tufte’s classic from a mile away but I can’t see the other two. One of them is surely a Springer handbook, not sure which one.
The Visual Display of Quantitative Information - second edition
The Grammar of Graphics - second edition
Visualization Analysis and Design
Apologies: photos.app.goo.gl/brmCVQYgFked85kx8
@@Computerphile Thanks a lot
@@AF-lt2fr Thank you
@@sanketdutta4981 no problem - I ended up getting it by changing the video quality under advanced to 4k and zooming in.
Wouldn't a pie chart (area, 0.7, underestimated) be more reliable than a straight line (length, 1, normalized)?
I think there's more tricks like that to improve visualization for better precision of the estimated value.
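For anyone wondering what the "0.7" and "1" mean in practice, here is a small sketch. It assumes the exponents quoted above plug into a power law of the form perceived = stimulus^exponent (Stevens' power law, which I assume is what the chart in the video is based on). The function name `perceived_ratio` is made up for illustration.

```python
def perceived_ratio(actual_ratio: float, exponent: float) -> float:
    """Perceived magnitude ratio under a power law: actual_ratio ** exponent."""
    return actual_ratio ** exponent


# Exponents as quoted in the comment above: length 1, area 0.7.
EXPONENTS = {"length": 1.0, "area": 0.7}

for channel, n in EXPONENTS.items():
    print(f"{channel}: a 2x stimulus reads as about {perceived_ratio(2.0, n):.2f}x")
```

With an exponent below 1 a doubling looks like less than a doubling (roughly 1.6x for area), so under this model area is the less reliable channel, not the more reliable one.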
I think pie charts utilize a mix of area and length. The example in the video with squares compares 2 geometrically similar squares with different areas. A pie chart's slices are not geometrically similar as their area changes; instead the arc lengths change (since the radius is fixed by the bounds of the entire pie). I'm assuming the study comparing areas used uniform scaling. We could change the visualization to compare a square with a square scaled along one dimension (a rectangle), because now you can compare side lengths (a linear term), or display the numerical values inside the marker.
No, people's perception of triangular areas or sectors (pie slices) is unreliable, especially when the slices are rotated differently, because it relies on their ability to estimate angular displacement.
If instead of percentages of the total they display raw count data, say 9,720 votes, 9,000 votes, 8,280 votes, 7,200 votes, and 1,800 votes, you would have trouble recognizing that these are 27%, 25%, 23%, 20%, and 5% respectively of a total of 36,000 votes.
A linear graph with 4 closely spaced but separate marks at each raw vote count and another much nearer 0 would show the differences of 720 votes between the first three, 1,080 votes to the fourth, and the very wide gap to the last 5%. You may even notice that the gap between the 3rd & 4th values is wider than those between 1st & 2nd and 2nd & 3rd, which are of equal width.
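The arithmetic in the vote example can be checked in a few lines. This is just a sketch of the numbers quoted above, and the variable names are arbitrary.

```python
# Raw vote counts from the example above.
votes = [9720, 9000, 8280, 7200, 1800]

total = sum(votes)
# Share of the total for each count, as a whole-number percentage.
shares = [round(100 * v / total) for v in votes]
# Gap between each count and the next one down the list.
gaps = [a - b for a, b in zip(votes, votes[1:])]

print("total:", total)
print("shares (%):", shares)
print("gaps:", gaps)
```

The gaps are what a position-on-a-line plot shows directly. On a pie chart they have to be read off as small differences in angle.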
Yes, as someone in a field where visualization of data is very important (biology), I have been told to always steer clear of pie charts. Honestly I’ve never seen someone use one in a talk, and I think there would be snickers from the crowd if they did.
Is it possible to get the original presentation somewhere?
I also would like to have the slides.
It's in the description.
So it's about categories and magnitude, and how you should represent them depends on how accurate you want them to be.
But what about relation, trend, and connection intensity?
Can you guys do a video on the new SLP bug CVE-2023-29552? I think it would be really interesting and would love to hear your professional takes on it!
I was quite shocked that they were all 2x darker, longer, and larger!
I had guessed, 2.5, 2.5 and 2!
Overall good talk. Disappointed in tinting most everything green when showing the chart for the Magnitude Channel in the rows for "Color luminance" and "Color saturation" as well as for the Identity Channel for the row "Color hue"; these should not have been tinted to all be green.
Great video!
... but my boss wants 3D pie charts and 3D stacked bar charts. Basically, add 3D to everything.
The experiment was a bit biased towards giving a different answer for each one, I'd say. Having three things to judge and three judgements to use may cause you to think you have to use all three judgements once.
Yep, but can you visualize it?
I wonder why they used voltage with electric shock and not power, since it would make sense pain is proportional to power.
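For what it's worth, here is how voltage and power relate under the simplest possible model. This sketch assumes a fixed, purely resistive load (P = V²/R), which real skin is not, and the 1 kΩ value is a placeholder, not a figure from the experiment.

```python
def power_watts(voltage: float, resistance_ohms: float) -> float:
    """Power dissipated in a fixed resistance: P = V^2 / R."""
    return voltage ** 2 / resistance_ohms


R = 1000.0  # hypothetical constant resistance in ohms (assumption)

for v in (10.0, 20.0, 40.0):
    print(f"{v:5.1f} V -> {power_watts(v, R):.2f} W")
```

Under that assumption each doubling of voltage quadruples the power, so a perceived intensity that grows faster than voltage is at least consistent with the idea that sensation tracks something closer to power.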
So ironic how there's nothing to look at through so much of this video!
Can you please provide the names of the books at the beginning of the video?
Electric shocks? Suffer to get your data puny human
That's obviously how he got his funding 😆
So this suggests to me that any visualization whatsoever should only ever use distance to represent numeric data, since anything else would potentially deceive the audience. Any other channel like RGB, hue, etc. should be used strictly to distinguish nonnumeric data.
I don't have a CS background. I'm more of a traditional engineer in ChemE/MatSci. For people like me, you really can't separate engineering and design. In fact, I'd argue that Engineering is just a small circle inside of the larger circle that is design. Is it similar for CS, where all kinds of CS work has to involve some kind of design?
good video and explanation
10:15 Hmm the infamous electric shock visualiser.
Isn’t this a re-upload?
Cool video
Whatever you did to the video made the length example wrong, that line is not twice the first line, it is clearly more, as measured with a ruler on my screen
At 10:00? It's your screen doing something funny. Here it is precisely 4.5 vs 9.0 cm.
Can someone add captions (not automatic)?
I think visualization is great, but should always be accompanied by the raw data itself, unless the presenter is deliberately trying to mislead. So many charts are presented without labels on the scales - it might not be 0-based, the scale might be logarithmic, etc. The raw data at least can't be "misinterpreted".
The main reason I mention this is the statement "a small increase in the voltage is perceived as a large increase by the subject". Are we talking about a small increase in units or a small increase in percentage? Human perception has been shown to be naturally logarithmic (you can very quickly differentiate 4 lions from 5 lions at a glance, but not 100 lions from 101 lions). I'm not accusing your example of being misleading, but rather backing up my point that raw data should always be included so there's no room for misinterpretation.
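The lions example can be made concrete by comparing relative rather than absolute differences. This is a sketch assuming the usual Weber-fraction reading of "logarithmic" perception, and the helper names are invented for illustration.

```python
import math


def relative_change(a: float, b: float) -> float:
    """Change from a to b as a fraction of a (the Weber fraction)."""
    return (b - a) / a


def log_step(a: float, b: float) -> float:
    """Size of the step from a to b on a logarithmic scale."""
    return math.log(b / a)


for a, b in [(4, 5), (100, 101)]:
    print(f"{a} -> {b}: +1 absolute, "
          f"{relative_change(a, b):.0%} relative, "
          f"log step {log_step(a, b):.3f}")
```

Both pairs differ by one lion, but the first is a 25% change and the second a 1% change. That is why "a small increase" needs its units stated.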
and how is the raw data formatted?
great video thanks so much!!!!!!!!!!!!
Cool video. I wonder if you are preparing one about the GPT-4 pause
Strangely saturation was guessed right, and length wrong.
The options in the test were confusing the user and manipulating his mind, so I think if you hadn't intervened and had let the user answer without any assistance, the result would have been more accurate.
Why the dislikes tho?
genius stuff.
Thanks for the great video. One question... this laptop looks amazing. Is it a macbook or a windows machine? Which one? Does anyone know?
❤❤❤❤
👍
Didn't mention Jacques Bertin in the important books about Data Visualization.
Sorry but that's a red flag for me.
He, and no one else, wrote the first and most wide-ranging attempt to provide a theoretical foundation for Information Visualization, and his works are still valid. Is it because of Anglocentrism?
I'm deeply sorry, mostly because I deeply like your work and everything you brought to me.
Wow
It's a QR code 😂
BORING THIS CHANNEL IS GOING DOWN THE TUBES
Nah. It's only you.