@@acollierastro I'll do you one better: when I review a paper and it uses a violin plot, I'll ask the authors to replace it with a boxplot + histogram and cite your video
Instead of people citing a video, you should write a paper. I suggest the title: "Violin plots considered harmful" (because their past existence cannot be undone)
Hello! I am the creator of the Turnip Plot, which is similar to a violin plot but rotated around the vertical axis and rendered in 3D. Please cite my paper. Thanks!
Yeah, I was nodding along to the mechanical critiques of the plots, and realized about three-quarters of the way through the first half (I'm a bit slow about these sorts of things cause I'm an asexual cis man who ain't get none and ain't want none neither) that oh yeah, these do kinda look like genitals (and it only clicked cause one of em kinda looked like a penis and that made it click that the rest of em look like vulvas) and was like: Huh, That's weird: I wonder if anyone else noticed that, and if she's gonna mention it at all?
Have you noticed that most graph paper uses a relatively faint blue colour? That is because photocopiers (before they became scanners) couldn't see blue well. If you were careful with the contrast, you could draw a diagram or a form on graph or squared paper and the photocopier would come out with only your lines on it. It was a great way to create forms, character sheets for RPG games or, indeed, the sorts of diagrams in an academic paper. Regarding the typewriter thing, you could get whiteout on paper strips. You backspaced over the error, poked the whiteout paper behind the ink-ribbon and hit the erroneous letter again. That deleted it and you could over-type again with the correction. The line spacing on typewriters is quantised so you can go up and down by exact lines easily. It's only when you noticed the error after you had taken the paper out that you would have to resort to liquid whiteout and fiddle to get the positioning correct.
In my typing class in high school (wow, am I really that old?), we used that method as well. We also had fancy Selectric-2's that had a built-in white-out strip! But i also remember just rolling the paper up a bit, applying white-out, letting it dry, and rolling it back down by the same amount. No need to take the paper out altogether. This would have been 1987 or 1988. The tail end of the typing class days, before keyboarding became standard, and Apple II's or Macs, etc., became cheap enough to replace typewriters, to teach typing. I mean, we had a few Apple IIs, but they were for "computer class", not "typing class".
@@mikedavis979 That was around a decade after I went to school. In my day only the girls did shorthand and typing. I've cursed that fact many times during my IT career. By the time I realised it would be useful to be able to touch-type, I was too fast without to be able to stick to learning.
I had a typewriter from my dad that had a little like tape strip in it for corrections instead of white out, and you'd hit the backspace key to go back one character space and the delete lever would push the ribbons up so the correction tape was in line and you'd just smack the key for the offending character a few times until the tape pulled all the ink up, then you turn off delete and keep typing. You could even do neat stuff by combining some inked characters with other characters on the correction tape, like clearing a + across an inked M looked pretty cool, and it made good on-demand icons if you needed them for something.
@@ArtThingies That was a carbon ribbon instead of an ink ribbon, a plastic film with a layer of carbon on it. The rich people like important executives used those. They produced more professional-looking text because the ink didn't bleed into the paper. But the ribbon could only be used once; when the shape of a letter had been transferred onto the paper, it left a transparent hole in the tape. There was industrial espionage where people stole the secretaries' used typewriter ribbons and read the documents they'd been typing from the ribbon. Carbon ribbons were expensive and had to be regularly replaced. Ink ribbons just reversed themselves each time through and you kept using them until your text looked too faint. If you were enterprising you could even re-ink them yourself, in much the same way that you can re-fill ink cartridges today.
Librarian here: Physical Journals/books/etc. are often preferred when it comes to preservation and access, since when we buy a physical copy, we won't lose access to it when the publisher decides that we need to pay another $400 per user to access their system, or if a ransomware attack takes out the archive. Obviously, storing all these is impractical for all but the largest library systems, but the number of times someone needs to cite an article in the 2003 summer edition of Phrenology Today means that you can get away with only one copy in an offsite, climate controlled storage space. The number of times I've had to do dumb piracy shit because a publisher literally pulled access to a non-downloadable article I was using for a paper *mid semester* was infuriating. Not to mention the time my school literally stopped paying our publisher liaison because they upped their rates by a factor of 10. We had a lot of 3rd/4th generation photocopies floating around that year, lmao
I love love love my campus library staff, at every campus I've been affiliated with. You all rock! And there's something so nice about just walking through the stacks to flip through journals rather than just scrolling and clicking through online lists of volume/issue numbers.
The technology exists for us to put 3d objects into .pdfs. For this reason, I propose the vase plot. The diameter of the vase is the probability density. Or maybe the cross-sectional area is the probability density. It is intentionally ambiguous. Please cite my comment whenever you use my vase plot in your paper.
The inner diameter or outer diameter or cross sectional area (selected at random when you run the program, no the module is not seedable) of the vase is the probability density. uploading this to npm and pypi asap
I suggest a 4 dimensional plot, called the aerofoil plot, where the aerodynamic drag coefficient of each 3 dimensional cross section of the 4d shape determines the probability density. Also every single 3d cross section is vaguely penis shaped. Please cite me
How about a 4d object casting a 3d shadow? The orientation of a 4d object projects n 3d shadow where n gives the spectrum of possible densities depending on the k-orientation of a complex data mapped onto a 4d-manifold.
The kooky patterns make it so that colorblind people can read the histograms. I'm red-green colorblind, and I get really passionate about data visualization, partially because I have trouble with certain color visuals that other people don't struggle with, and partly because data visualization is a very efficient way to communicate info when done correctly, and I also have adhd, so I appreciate their communicative strengths. But yeah, all that to say, I really appreciate when kooky patterns are included. It immediately tells me that the person who made that visual is thinking about how their data will be perceived and wants to communicate it effectively to as many people as possible.
Designing data representation and slides for talks with color blind people and dyslexics in mind is a drum that I always have to beat with my trainees. A lot of the arbitrary complaining that people do about visualization and typeface etc is actually pretty ableist. Fun example is that comic sans is actually one of the easier fonts for people with dyslexia to read.
Interesting that it helps with colorblindness, i always ran into it in school worksheets and tests. They were printed in black and white so needed a way to differentiate between grey and slightly lighter grey, and we all just had pencils to make them ourselves. It became a bit of a competition as to who could come up with the weirdest patterns for their bar charts that were still distinct. Always nice when adaptations for specific things end up being useful for unintended reasons
the comic sans thing is a myth look it up. The same as fonts specifically designed for dyslexia like open dyslexic. More important than anything is font size. @@tglittle3166
Yes, the data visualization field should deliberately popularize plot color selections that don’t penalize the red-green folks. Or, encourage thinking about using line and hatching types that don’t require colors to distinguish
This is called hatching and there's a whole system behind it. It comes from heraldry and was developed in the 17th century back when there was already significant demand for printed illustrations but printable coloured ink was not yet invented. Basically, you have hatching patterns representing six basic colours; red which you represent with vertical lines, blue which is horizontal lines, yellow (or gold) which is dots, green which is diagonals from top left to bottom right, purple which is diagonals the other way, and black which is a grid. You get pastel tones by replacing the lines with dashes, and you can mix colours by overlaying the hatching of both, for example dashed verticals give you pink and if you interleave that with dots, you get orange
I'm not a scientist. Just a humble welder. I like your videos. I had no idea where you were going with this, but you had me nodding along, thinking, 'I get it. These violin plots are stupidly complicated and are not as efficient as other plots.' I get it! And you know, not being involved with the sciences, I didn't actually care about the plots one way or the other, but I could see how it would annoy an actual scientist/researcher. Then it finally got it where it was going, and I was like, 'Damn! That was masterfully done.' Keep up the great work.
@@NightsReign this comment honestly made me chuckle for a good bit, i mean girls that are super duper into horses, n have posters of em and backpacks with horses on it, that typa horse girl lmao 😂😂😂😂
I used to work at a tight tolerance thin film deposition optics manufacturer. One day my current supervisor visited my workstation unexpectedly and asked, "you haven't been using violin graphs in any of your report generators, have you?"; "Never heard of such a thing. Why?"; "Good. I knew you were a smart one. Just don't. They're useless anyway." ... Later she quietly explained their inappropriateness. Turns out she wasn't just trying to prevent her team from using them either. The owner got top management together and instructed them to purge any current work of their existence. HE was smart.
I'm an old man, so I know something about typewriting erasure techniques. In rough order of technological advancement: 1. hand-erasing with an ink eraser 2. Liquid ink eraser solution, in a bottle. You would apply this to the typo on the page and it would break down the ink and fade it somehow. 3. An "eraser strip" as part of the ink ribbon where you would retype the offending letter using the strip and it would abrade/absorb/dissolve the ink 4. hand painting over the typo with white-out (from a bottle) 5. hand-held whiteout strips, with dried whiteout on one side. You would retype the letter with one hand, holding the strip against the page where the hammer hits with the other hand. 6. whiteout strip built into the ink ribbon. Same as the whiteout strip, above, but you don't need to hold the strip, it's part of the ink ribbon. Most solutions did not require removing the paper, because you can *never* get it aligned again. There's typically enough space where the hammers hit the page for you to get in there with whatever erasing solution you're trying to use.
Being a 90's child I still managed to use both mechanical and electric typewriters. I had forgotten those whiteout strips! In my mind, those used to come in little red plastic boxes, kinda like chewing gum packs
Those Selectrics with a delete key and a special ribbon were the best! I forget which technology they used - "glueing off" a plastic based ink, or a white-out ribbon. I think they used white-out if I remember correctly, but the few that used a plastic based ink that you could glue off within a few seconds were an absolute delight since you could use that technology on almost any paper and it didn't matter what colour the paper was.
When I was in undergrad getting my degree in economics, I saw these every once in a while when conducting research for papers. Turns out I wasn't an idiot for not being able to read these plots, I was just an idiot for getting a degree in economics.
@@TessHKM I think “the economy” is generally meant to serve everyone. Amazon is obviously not good for producers, and small businesses, but it is also in the long term bad for consumers. Yes it does provide them cheaper goods, but it also results in wealth being funnelled out of communities which kills small towns. It also results in worse working conditions, and lower pay for people in cities. Overall it’s a net negative to the living standard of the average person.
As a data analyst, you've made amazing points for not using violin plots--scientifically. But in the business world, violin plots are **pretty** and the vibe is how you get execs to make the decisions in favor of what you want 😂😂😂😂
@@GiovanniBottaMuteWinter I think that’s the point they’re making… they’re saying that in the business world, the information doesn’t matter so long as it’s presented in an attractive way.
@@daniel6678 Executives are only interested in the overview. They will naturally gravitate towards the violin plot for the overview data and assume the technical people will handle the details in the histogram. We make judgements on how relevant info is to us all the time, and pretty is not necessarily better. For example the more visually appealing a letter through my door is, the quicker it will be identified as junkmail and be thrown away.
Head on the nail, tbh. As I was learning data science post-BA, I was struck by how differently academics are expected to visualize info vs how you need to present that same data to "general audiences", i.e. rich kids who coasted into "business".
Related to that very last point about "why not just make it half the graph", when they made us do violin plots in high school biology we were told to put *half* the data on each side so that the width of the *whole* thing matched the amount of data. Which is absolutely not how anyone else does their violin plots. Also we had to do them by hand, which is as excruciating as it sounds
I thought that was how they were drawn as well. So that the area of the shape has an actual meaning, which is something intuitive to look at. Although now that I think about it there is no x-axis so doubling or halving all of the widths doesn't change anything.
During my masters, my advisor actually ENCOURAGED the use of violin plots, and I didn't really question it at the time. Rude jokes were made. I have a publication that has FIVE SEPARATE PAIRS of violin plots (and plots that cluster points into hexagons? for some reason?). And you're right! Side-by-side histograms would have been BETTER and MORE COMPACT 😭😭😭
Your story about everyone turning to look at you as the only woman... it gave me a flashback to school sex-ed where the teacher (to be inclusive) said "or a**l sex" and people turned to look at me which basically outed me to the teacher. But also what reaction were they expecting from me lmaoo
This was a great reminder about effective strategies for allyship. Any of the other ten guys in the room could have made the response she described, and it wouldn't have even seemed like White Knighting. That they turned to her was definitely motivated by empathy, but the effect is the same as what happened to you.
@@GSBarlev yeah I really feel like when you're a member of the dominant group in a situation like this best practice is to absorb as much attention as possible AWAY from the person who stands out unless they've made it clear they specifically want to say something
An alternative to a violin plot is a "beeswarm" plot: Instead of a smoothed density you plot each individual datapoint as a dot at its exact y-value, and the x-values of the points are chosen so the dots don't overlap, causing y-coords with a lot of dots to bulge out. I like them because you can simultaneously see the raw datapoints, and also see the broad distribution. One problem is that in naive implementations you get chains of points extending out from the center in a line, giving a christmas-tree appearance. But good implementations can avoid this.
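For anyone curious how the dot placement described above can work, here is a minimal sketch in plain Python. It is a simple greedy layout written for illustration, not what any particular plotting library does, and the function name `beeswarm_x` and the `radius` parameter are made up for this example. Each dot keeps its exact y-value and gets the x position closest to the centre line that does not overlap an already placed dot. Being a naive approach, it can show the chain artefacts mentioned above.

```python
import math

def beeswarm_x(ys, radius=0.5):
    """Return an x offset for each y so that dots of the given radius do not overlap."""
    diameter = 2 * radius
    placed = []  # (x, y) of dots already positioned
    xs = [0.0] * len(ys)
    # Place points in order of increasing y so vertical neighbours are handled together.
    for i in sorted(range(len(ys)), key=lambda k: ys[k]):
        y = ys[i]
        # Only dots closer than one diameter vertically can collide with this one.
        near = [(px, py) for px, py in placed if abs(py - y) < diameter]
        # Candidates: the centre line, plus positions just touching each nearby dot.
        candidates = [0.0]
        for px, py in near:
            dx = math.sqrt(diameter ** 2 - (py - y) ** 2)
            candidates += [px - dx, px + dx]
        # Take the candidate closest to the centre that overlaps nothing.
        for x in sorted(candidates, key=abs):
            if all(math.hypot(x - px, y - py) >= diameter - 1e-9 for px, py in near):
                xs[i] = x
                break
        placed.append((xs[i], y))
    return xs

# Hypothetical data: a cluster near y = 1 and one isolated point at y = 3.
ys = [1.0, 1.1, 1.2, 1.0, 3.0, 1.05]
xs = beeswarm_x(ys, radius=0.5)
for x, y in zip(xs, ys):
    print(f"x={x:+.2f}  y={y:.2f}")
```

The cluster spreads sideways while the isolated point stays on the centre line, which is what produces the bulge where the data is dense.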
The comment at the end about “why have two flaps” is the most upsetting part of them to me. The only time I’ve seen something like this (where they used histograms instead of smoothing, so less yonic) that seemed even a little defensible is the man/woman population age plots for different counties. It’s still probably better to arrange them differently but at least they were using the two sides for something.
Exactly. What is the point of having your data mirrored onto both sides? Your smoothed out histograms were so nice you had to show them twice? BS. I do see how stacking histograms can get cluttered, but as you pointed out: if the goal is to compare histograms without stacking them, then make an asymmetrical violin chart with labeled axes so you can actually interpret the data.
The one sorta viable use case that just uses the other side for something useful, but they also don't just smooth it out. Ex. population age plot between men and women, Vertical axis is age, horizontal axis is pop count. men on left bars and women on right bars.
Saw this on Polymatter and I thought the same thing. That’s the one viable use of this data visualization technique. But there it’s just two histograms turned on their side and placed opposite one another.
I remember vividly how my advisor emphasized how he read papers: title, authors, abstract then plots. If the plots were compelling then he'd dig in. Those plots absolutely need to tell a succinct and coherent story.
Interesting point. I've always thought of the graphs, plots, whatever they are as being a side dish. What you said reminds me of my father telling me to always start with the maps in a book of military history, then decide if you want to purchase that book. Similar reason
Usually the quality of plots is very quick to evaluate. So if you have limited time and a lot of papers it makes sense to judge a bit on plot quality before reading the whole paper. However many plots require some significant explanation of the measurement system and conditions. Therefore after a quick look at the plots I often go back to the text.
This video sort of gave me the urge to come up with plots that are even more cursed than the violin plot. Like a stick figure plot where different aspects of the data set are represented by the size and orientation of the body parts.
what about a scatter plot in audio form? you map numerical values to hertz values, and instead of coloring points and lines, you give them an instrument. like a guitar note would be played for each data point, and a violin is played to show the line of best fit. i'd call it a song plot
Hello! I like violin plots a lot; I'm a data science researcher. A violin plot displays not only easily comparable mean and quartile information, but also more granular information about the shape of the distribution. This gives violin plots a unique ability to intuitively inform certain decisions about further analysis, especially when you're exploring new data with numerous distributions for the first time. Histograms have an issue of sharing the same axis, which, when trying to understand intricacies of distributions, can be difficult to read. Box plots are easy to read but can obscure information, maybe leaving readers to question if the choice of a box plot was appropriate. A violin plot allows you to render an easily interpretable plot which lays bare qualitative aspects of the underlying distribution. This not only allows for easy analysis via the box plot, but also high level qualitative understanding. I never, when I read a violin plot, care about the scale of the distribution, but the shape, which I think they do fairly well. Of course, when publishing I may or may not use them. I find them incredibly good for visualizing data exploration, and like to use them when explaining datasets moreso than results. On the point of smoothing, totally. That's why I've gravitated towards swarm plots for general qualitative distribution understanding. But smoothing is an issue within itself; histograms have essentially the same exact issue in terms of bin size. Also, worth noting, I'm colorblind af, so overlaid color information may as well be gibberish to me, which might be part of the reason I hate overlaid histograms so much.
I'm not colorblind and I also can't read an overlayed histogram. Way too cluttered, and in my mind I'm trying to imagine what they'd look like not overlayed.
Honest question, but why not just use staggered histograms then? What does the rotation and mirroring add? Including information on medians/averages (and quartiles if you really need to, but if you have the histogram there anyway why would you) could be done in pretty much any format you choose.
@@TheManifoldTruth That's a great question, and the honest answer is it's really convenient to plot violin plots with seaborn. Another more honest answer is the aspect ratio of monitors. While they don't have to, histograms have their density along the Y axis, meaning, if you have a lot of distributions you want to compare, it's easier to fit a violin plot which orients things horizontally. Yes you could just rotate the histogram horizontally, but the love of making the "perfect" vs the "good enough" plot starts to die out around your 10,000th plot in your career. Another, maybe more satisfying but less honest answer, is the mean and standard deviation of the distributions is useful in comparison, and that comes out of the box in most violin plots. Really, the debate around this feels like the debate around the oxford comma; strong opinions around "rules" which are really well entrenched but still arbitrary preferences. I don't have any evidence to back this up, but I wouldn't be surprised if violin plots were more common in more data rich and fast moving research domains like data science rather than physics. In data science, making a plot that's good enough quickly is way more attractive given the sheer volume of visualization required in the domain. I have to note though, a lot of my takes make a lot more sense in a business context, rather than an academic context. Papers take a long time to make, so having a sub-par plot makes much less sense.
The colors on the graph at 20:37 are really really hard to tell apart for people with deuteranopia (some 6% of males). These pastel colors are hard, it's much better if they are very definitely yellow, or blue, or red, or grey, etc. Just wanted to chip in, since we're talking about it already.
As a non-scientist, I've always thought these plots were confusing and just obviously above my pay grade. Very validating to hear that they are indeed as uninformative as I thought they were. Much appreciated❤️
This is such an important video. I remember that one of my high school textbooks had some stupid plot (that I now understand to be a violin plot) that the author loved to use. That book could have been half-a-pound lighter if they just took them out.
@@thefaboo yeaahhhhhhhh radar plots are cool until you realize the area can be altered by how the spokes are ordered lol. It's just a multi-variable plot with connections between each percentage for no real reason
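The spoke-ordering point is easy to check with a little arithmetic. A rough sketch, assuming equally spaced spokes and straight lines between neighbouring values (the function name `radar_area` and the example numbers are made up for illustration): the enclosed area is a sum of triangles, each depending on the product of two adjacent values, so shuffling the spokes changes the total even though the data is identical.

```python
import math

def radar_area(values):
    """Area of the polygon drawn on a radar chart with equally spaced spokes."""
    n = len(values)
    angle = 2 * math.pi / n
    # Each pair of adjacent spokes spans a triangle of area 0.5 * r1 * r2 * sin(angle).
    adjacent_products = sum(values[i] * values[(i + 1) % n] for i in range(n))
    return 0.5 * math.sin(angle) * adjacent_products

scores = [1, 5, 1, 5, 1, 5]      # hypothetical values, alternating low and high
reordered = [1, 1, 1, 5, 5, 5]   # the same six values with the spokes reordered

print(radar_area(scores), radar_area(reordered))
```

With these numbers the reordered chart encloses more than twice the area of the alternating one, purely because the large values now sit next to each other.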
The whole point of data visualisation is to make it easy to understand, otherwise you'd just dump raw data in table format at people, so your POV is perfectly valid. Of course, depending on context, if you are writing something to be read by people familiar with the topic instead of the general public, you can go a bit spicier on the complexity, but it should always be as simple as possible
On the contrary, they're information dense, combining histograms and box-and-whiskers plots for multiple sets of data. They're only uninformative if you decide not to read them.
@@Dongobog-ps9tz Vibes are definitely not the whole point of a plot, but a nice feature if you have a well constructed one. You should still be able to recreate the original data from whatever visualization you end up using not only so other people can try to find other useful features in the dataset, but they can verify the plot actually matches your original data. If your visualization is "vibes only", it's a marketing gimmick, not a useful research tool.
@@NameName-u9e A plot is a lossy compression where you're trying to turn the data into something human readable. Maybe we have a different definition of vibes but all I'm talking about is that the plot tells a story with the data. It can be misleading and entirely truthful.
I've always felt vaguely guilty as a scientist for never using the violin plot functions in any plotting tool - thank you for lifting this weight from me.
You'd *apply whiteout w/ the page STILL inside* the typewriter, wait about 1 minute, and then use the *backspace* key to shift _back a space_ to your mistake so you can apply new ink over the dried whiteout. Later on there were typewriters that could apply the whiteout for you. Usually electric ones. There were some other more esoteric solutions too! But yeah, most people just applied something directly to the paper and shifted back a space. Calling it "backspace" on a computer keyboard is one of many holdovers from the typewriter days. So is the stubborn yet incorrect convention of double-spacing after a sentence. (Double-spacing is essentially a typewriter trick/convention that makes things easier to read because periods are so small and on some typewriters don't offset enough. Single space after a sentence has ALWAYS been typographically correct in the world of typesetting books - plus print & graphic design.) We use a lot of old terms and symbols that don't apply anymore. Like saying "rolling" when a camera starts recording comes from the early days of film when there was a step to roll the film. Same way that many save icons are still simplified shapes of floppy disks - lots of kids grow up associating that shape form with "saving" without actually knowing it's a real physical thing. Or how we associate the power symbol with turning things on (it actually was originally a standby-reset symbol or something, but that's a whole different conversation).
@@NoeLPZC you're both right - the 0 and 1 do represent binary states, but the version of the power symbol with the line crossing the circle was originally a standby symbol. If I remember correctly it was meant to indicate something like what we'd call sleep mode as opposed to turning something all the way off and on. The actual power-on-off icon was supposed to be the line totally within the circle, not crossing it. It's even still used in some specific cases now - I work in a lab and we have vortexers that have marked switch positions for on (line), off (circle) and touch-activated (line breaking circle) modes.
actually most correcting typewriters had an adhesive ribbon that would lift the letter off the paper. there's a fun technology connections video about it
As to the violin plot joke thing: socially difficult choices. A classic is to look confused and ask why the joke is funny in a very sincere way. Another way that tends to work for me is to just say "dude, c'mon", as that puts the onus squarely on them. I do find it very helpful after a moment like the one you described to reflect on what I could have done differently while keeping my goal in mind. Generally speaking with stuff like this, the best approach is to deflate the other person, so to speak. I wish you all the best in your future strange and awkward social interactions
Violin plots are overused but they have a use case for comparisons of a large number of samples that have complex distributions. We use them for this when comparing gene expression in cell populations. We can quickly see the 'shape' and get the vibe of the multimodal gene expression for large sample numbers.
I am not a scientist. I have never heard about violin plots before today. But now I know about them and why they are mostly useless. I cannot overstate how much talent you have for making subjects like these interesting. A lot of the time, I pause science videos while alt tabbing to other things, taking in the videos in chunks. I always seem to watch yours straight through, beginning to end. The way you break up your videos with music and title-cards really helps make them digestible. Thank you!
I thought, "Well they look funny, but surely there's a reason why they'd be useful!" And then at the 8-minute mark, you finish explaining how you make a violin plot and I'm like, "Okay but why would you do that though???" I think it's a terrible plot already and there's still over 30 minutes of reasons to listen to. Brilliant!
One principle of peer review is that we shouldn’t just assume authors are analyzing their data correctly. I appreciate violin plots because they provide the reader reassurance that the use of box plots is appropriate. Absent the density overlay, I worry (and sometimes rightly) that the authors are using box plots in inappropriate contexts (as evidenced by the fact that one sometimes sees multimodal distributions in violin plots in papers)
I agree, although I do agree with Dr. Collier that violin plots are less aesthetically pleasing. Plotting semi-transparent points over a box plot sometimes can work. Perhaps everyone should make a separate violin plot for reviewers, as well as a box plot or something else. Hmmm....
I prefer to see a test of gaussianness (the actual test name escapes me right now). Then you have a one liner saying "yep, boxplots are good here". No wasted paper.
I think you are referring to a Q-Q line plot? It gives a quick indication if the data is normal and any possible skew at a glance. They are also very useful for comparing goodness of fit between distributions.@@davidjohnston4240
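For readers who have not met a Q-Q plot, here is a minimal sketch of the points it draws, using only Python's standard library. The function name `qq_points` and the sample data are made up for illustration, and the `(i + 0.5) / n` plotting position is one common choice among several. Each sorted observation is paired with the quantile a fitted normal distribution would put at the same rank; if the data is roughly normal, the pairs fall close to the line y = x.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(sample)
    # Fit a normal distribution using the sample mean and standard deviation.
    fitted = NormalDist(mean(sample), stdev(sample))
    ordered = sorted(sample)
    # (i + 0.5) / n is an assumed plotting position; other conventions exist.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# Hypothetical measurements with one long right tail.
data = [4.1, 4.3, 4.4, 4.6, 4.7, 4.9, 5.0, 5.2, 5.5, 9.8]
for theoretical, observed in qq_points(data):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

Points that drift away from the line at one end, as the last value does here, are the at-a-glance sign of skew. This is a visual check, not a formal hypothesis test.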
I agree whole heartedly with the second portion of the video (the first too). In situations like the one you described (the not-funny jerk), I will invariably pull out my phone, act like someone's calling, and then interrupt and say loudly "1950's on the phone and wants its joke back". And I'm a male who was born in the late 1950s. I have absolutely *no* time for people like that and *do not* let it pass without comment. Why should I? I'm at an age where I don't have to any more. And coming from a time when "humor" like that was "normal", I actually feel obliged. Damn, you're good.
I never heard the name until today. I always imagined them as vulvas with misplaced clits and thought when they showed up it was scientists being juvenile somehow
My grad school used to have professional plot makers on staff. It was long before my time but the space they worked in was still there and there were some people still working there who remembered them.
This went from ha ha, to not ha ha real fast. I want to say that you do an amazing job explaining the female side of these interactions. The fact that you articulate why it was not ok is a great source of information for people who want to do good but don't yet see how certain things are problematic
You make me want to do video rants about bad research proposals, but then I realize that you are about 1000% better in front of the camera than I could ever be. Keep it up!
I think the type of smoothing they're doing is called a Kernel Density Estimate. They didn't teach us about KDE plots in physics classes because, as you say, they mostly just show vibes, but it's still better than the arbitrary-window smoothing you're suggesting they do. See the Seaborn documentation for violinplot
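As a rough illustration of what a kernel density estimate does, here is a small sketch in plain Python. It is not Seaborn's implementation, and the function name `kde`, the `bandwidth` value and the sample data are all assumptions for this example. The idea is to put a small Gaussian bump on every data point and average the bumps; the bandwidth plays the same role as the bin width in a histogram.

```python
import math

def kde(sample, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, centred on that point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density

# Hypothetical data with two clumps.
data = [1.0, 1.2, 2.5, 3.1, 3.3]
f = kde(data, bandwidth=0.4)
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(f"x={x:.1f}  density={f(x):.3f}")
```

Changing `bandwidth` changes how lumpy or smooth the curve looks, which is why the choice matters as much as bin size does for a histogram.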
This is actually kind of funny. I'm a PhD student in statistics, and I learned about violin plots for about 10 seconds in one of my first year courses. Just a few weeks ago I ran into a situation where I actually considered using violin plots to convey the distribution of sequence lengths for a system running in different states. However, I did ultimately decide to use a different plot, because the finished violin plot just looked too weird, and would have been distracting. I admit, I've never seen them used in a professional setting by other statisticians or scientists.
Wow, actually sharing your experience at work was really enlightening. I never would have thought about how distracting and uncomfortable it would be beyond the off-color comment. The lingering after effect sounds like it had much more of an impact. Thank you for sharing.
I'm convinced the original paper was an elaborate troll. It's funny because sexual body parts are inherently funny. For the same reason farts are funny. It takes a physical (physics!) aspect of ourselves which is tied to emotions that we normally try to keep private and forces it out into the open. This emotional discomfort is mostly politely expressed as humor. Now had your joking colleague just said with clear disgust, "It looks like HONK, ugh" wouldn't that have been worse? Of course you are right, best course of action would have been to say nothing at all. The obvious humorous response to the violin plot is to come up with some sort of "c*ck" plot and then come up with a logical way to overlay it or point it at the violin plot. And get it published straight up. Never acknowledge any physical resemblance. That would be funny. I know you'd like it to be but, clearly, humanity isn't better than this.
Ha, only a minute in and I already adore the video. Out of curiosity I had a professor give me old dissertations to see how they used to do data visualisation back in their days. And the solution is glue. Glued in graphs hand drawn on graph paper. Glued in photographs of the setups. And then it clicked in my head why we learned to do all that gluing stuff on paper in elementary school.
The animation at 18:20 could've just been a line diagram. Perfectly conveys the same data with just a single image. Super easy to plot in Excel too. No need to make it a complicated animation that’s impossible to understand. I feel like that's often the case with dataisbeautiful. It's almost a competition in presenting the most basic data in the most convoluted way possible. Like those "make the worst volume bar" UI challenges, but serious.
Since it's supposed to be different levels of legalization I would've gone for a stacked area graph, but I agree with the sentiment. It looks kinda cool but it's definitely worse for conveying information.
one of dataisbeautiful's biggest issues is an obsession with making data animated that doesn't need to be. it's honestly way easier to tell how quick something is by looking at a slope on a time plot than by trying to compare different speeds half a minute apart in the same animation
@@antonhelsgaun 4 separate lines that demonstrate the change over time. Then you don’t need to make it an animation. Or do a stacked area graph like mentioned above.
I've been watching your videos for a while as someone who works in humanities but with a great interest in physics and finally I can explain something in turn! FWIW, corrections for even manual typewriters are a lot simpler than one might think. On most standard models there is a button off to the side that switches the ribbon (which by Einstein's time would at least have two or more types of ink in a uniform stripe across it, so it looks like one of those weird sour fruit roll ups that's segmented by color horizontally) to the 'correction' ink, which is thick, white and usually on the bottom; most of the time it just shifts the ribbon up like a centimeter or so, but some fancier typewriters can hold separate ribbons just for corrections that you can toggle on and off. There's no need to touch the paper, thankfully. After you've switched the ribbon and fixed your mistake, you can switch back to the regular ink seamlessly. You did typically have to move the hammer to where you made the mistake and type that exact letter that you screwed up again in order to cancel it out with a negative image in white ink, but muscle memory makes it like riding a bike. Plots were virtually impossible on these things, though, as you said-- it would be like trying to make ASCII art by hand but you can only use a springy lever to move the cursor.
I did a violin plot in one of our articles, and after watching this I can say I did it pretty mindlessly. It was an option. It looked like a cool data viz. It showed an increase in a median altered genome segment length as well as a higher number of longer segment alterations. Which is basically the same thing, as I look at it now. But during writing we were in that phase of trying to get our point across. In that mindset the overloaded graph seemed useful. After watching this I'd just do ridgeplots. Thanks for taking your time to talk about this.
When you made a mistake when typing on a typewriter you had to use a backspace then you would use a special white chalk covered piece of paper - put it between the typing tape and the paper then type the wrong character again - that would "erase" the wrong character, so you could use backspace again and press the correct key.
@@marcins.1128 I'm old enough to remember when those Tipp-Ex strips came out. They were an amazing innovation. Before that, you needed to use an ink eraser, or some noxious kind of solvent that faded the ink on the page. Later, some typewriters had the whiteout strip *built into the ink ribbon*. It was a magical time.
@@marcins.1128 I'm only 26 and I happened to grow up with one. There was one at my mom's old job when she'd take me as a kid and I'd play around with it. I'm not sure if the backspace function worked the same way though.
@@mercury5003 there were also newer typewriters with two tapes - one of them was the erasing one. They stored some recent characters in memory so you could use the backspace as on your PC.
Or just cross it out with X-es and type it correctly right next to that or above that. Or use correction fluid, wait for a bit, then type over it (it always looked different though), or re-type the whole page.... depends on what you're doing and what are the tolerances for nice presentation vs just having the text on a piece of paper.
My wife in astrophysics -- she has a very similar experience to what you described with the Violin plots, where she'll go out to dinner with a bunch of male physicists and the waiter will come up and say "Well, ladies first!" And she can't explain how frustrating it is to have the entire table be alerted/reminded of the fact that she's the only woman there. It's tough in that situation because she can't comment on how it makes her uncomfortable because the waiter's not being a bad person about it and it'll make her look bad if she brings it up at the table. So she just kind of has to deal with it. It sucks!
@@Daniel-ih4zh I genuinely hate when people acknowledge I'm a woman and that I'm special because I'm the only woman around at the moment. Why the hell does my gender matter to you guys, I'm here to do my business like everyone else. I didn't "earn" the trait of being a woman so it seems weird to point it out as if it were extraordinary.
@@vickypedia1308 I 100% agree and understand. But it seems like people are having their cake and eating it when they hold this sentiment while also promoting things like WiStem and AA
@@Daniel-ih4zh I don't know what those terms mean (not a native speaker, so if those are terms where I live they likely have different acronyms). However I would like to add that the people who get uncomfortable when someone highlights that they're "special" for being some sort of minority are usually not the same ones who actively advocate for special treatment. For those who do, it tends to be because they're two different kinds of "special treatment" and one of them feels patronizing while the other doesn't. Personally, I think we should strive to decrease sexism at the workplace, not force women quotas or other artificial stuff like that. Sexism is more likely to happen in fields that are predominantly pursued by men, simply due to the lack of women who can point out if someone is being sexist. (And even if there are one or two women there, you don't want to be *that* person who complains about something nobody else sees an issue with.) In my opinion, the fix isn't to forcibly try to get women into that field and making a big deal out of it. I would certainly not want to be the token woman who only got in because a company needed to fill a quota. I think we should rather try to make the place feel welcome to *any* person, women included.
@@Daniel-ih4zh You think it's bizarre when people who want to be included as equal participants, get uncomfortable at being singled out for no reason? It's *almost* like you think "equality for minorities" is the same thing as "special treatment". Hmm. Maybe you should reflect more about that.
9:09 Histograms are pretty difficult to use for comparison of single-cell RNA sequencing data. You can also make a variation on the violin plot by making each half of the violin represent two different conditions, eg. a control condition vs. some treatment (eg. how does this drug that inhibits this pathway influence expression levels of this other thing, compared to no drug). The ridgeline plot is one alternative, but it's pretty space-inefficient unless you overlap the data and risk harming readability.
This video was more of a rant than a legitimate analysis of the use cases for the violin-plot. Even the example shown at 10:22 shows just how unreadable overlapping histograms become once you have more than 2. Violin plots are literally just a way of visualizing several histograms at once without making them collide with each other.
About papers 100 years ago. Based on the memoirs of Stephen Timoshenko, it seems that there were special people at universities who prepared plots. You would give them hand-sketched drawings and they would then prepare versions for a paper.
In 1956, Bette Nesmith Graham (mother of future Monkees guitarist Michael Nesmith) invented the first correction fluid in her kitchen. Working as a typist, she used to make many mistakes and always strove for a way to correct them. Starting on a basis of tempera paint she mixed with a common kitchen blender, she called the fluid "Mistake Out" and started to provide her co-workers with small bottles on which the brand's name was displayed.
When I wrote my bachelor's thesis in uni my supervisor insisted that I would use violin plots to show some data. The problem was, the outliers in my dataset were not many but they were far out, like really really far out. So in the plots they often just weren't visible at all but I had to include them to represent the data accurately, at least that's what I was told. In the end the captions for every figure featuring these plots ended up absurdly long because I had to explain what the hell was going on there lest I forget it myself. So I 100% agree with you that these plots are just bad in every regard. The bit about these plots looking like genitalia is also true in every regard: because of these outliers some of my medians ended up at the bottom of the plot so that's of course where the belly was located, this however had the fun little side effect of making these particular plots look like a cock and balls. So my supervisor basically insisted that I would draw a bunch of dicks in my thesis. These things are truly terrible, they just always look like genitalia
I sometimes have a lot of possible plots I could do, and generating all of them as violin plots is useful because I don't know in advance if my data is bimodal or whatever. And I can put lots of violins next to each other and compare them, unlike histograms. But sure, I won't put them in my presentations because by then I know the best way to show the data. Fair enough. Also excellent point about the smoothing! People not understanding their statistical models bothers me a lot.
I've been recommended this video endlessly and eventually realized I'd subscribed from your other stuff. It's wild to me that this doesn't have more attention. Maybe the algorithm just knows me too well. There's as much to learn from your delivery as there is from the content.
There's a strong link between embarrassment and humour, and a lot of embarrassment over taboo topics like anatomy that roughly half of people have (particularly among teenagers and people who haven't got over having been teenagers). As for the use of references as a form of comedy, the basic idea is "we all laughed at {thing} then; remembering it will bring you to a similar state of mind and probably make you laugh now". If you didn't laugh as a teenager when someone broke taboo, then you're not going to find it funny when people try to evoke the experience as an adult. On the other hand, if you're someone who found Monty Python hilarious, then someone saying "this parrot - " (the pause is essential) is going to remind you of John Cleese screaming at Michael Palin and very likely get at least a smile, if not a chuckle, out of you. There's also a whole in-grouping thing going on - "you and I share an understanding of this reference, therefore I am a member of the in-group and popular and successful"
People love "chart art." I used to work in finance. There was pressure to replace data tables with charts when possible, even if the charts ultimately distort the data. But the chart makes the report look pretty. A very common chart that I absolutely hate is the 3D pie chart. The pie chart is already a bad chart, but someone has sabotaged what meaning a pie chart has as the areas have become distorted and are no longer direct representations of the weights.
A 2D pie chart is only bad when there are more than two categories in it. A pie chart with two categories is excellent. You can immediately see whether the fraction displayed is closer to a quarter, half, or three quarters. For more than two categories, a stacked bar is better.
I mean, you can immediately tell whether X% is closer to a quarter, or a half, or whatever without the chart. Charts should ideally be used to get a snapshot of lots of data--not just to make a list of percentages look fun.
While I also find violin plots a bit hard to read, for something like the paper at 13:30 where they apparently want to compare 7 different probability distributions side-by-side, I'm not sure any other option would be much more readable. It's probably too many to overlay the probability densities on top of each other (although I agree that's a good option for comparing 2 or 3 distributions). I guess they could do 7 side-by-side histograms or pdfs. 🤷🏻♂️ (By the way, Fig. 3 and 4 you point to aren't actually a histogram of the same thing, they're a bar chart of something else. Note the x-axis isn't numeric, unlike the y-axis of the violin plot. Sorry to nitpick!)
I made a comment saying basically the same thing, the arguments in this video don't actually make sense in the context of the examples given. Even though I have never personally used violin plots before, I am now convinced that they are a very effective way of visualizing many distributions at once without overlap.
As a PhD student in Bio, I was also on the way to say this. I have a lot of overlapping distributions for a lot of conditions. I think one solution is to distill your conditions into the truly necessary ones. Then, I think the ridge-line plot (or a less overlapping version of it) is definitely better than a violin plot
I'll come out and say that I used violin plots in my PhD thesis. I had grain size distributions to display. Histograms have the big problem that you cannot easily put a lot of them on top of each other. I had, I think, ten different samples I wanted to display next to each other (to make them comparable). Furthermore, I wanted to use the median to simplify the further discussion, but I also wanted to show the actual distribution of the particles, as it was important for the behavior I was looking at. The violin plot was a good combination of: 1. The median is visually represented. 2. I do show the actual distribution, so I can discuss the skew, if there is one. 3. I can pack a lot of them next to each other, so the reader gets a good visual representation of the different distributions. 4. Yes, I find them visually pleasing, if done right. I did plot them horizontally, so. 5. I think the violin plot is symmetrical for the same reason the boxplot is symmetrical. And when I think about a metal grain (which is what I worked with), I thought it represents the form of the actual grains. I did not add another histogram. I did give the smoothing value and normalized the width to one. Having seen the second part of your video, I am sorry to hear about the unfortunate situation and that this plot makes you and other women uncomfortable. I did not know about this, and it was absolutely not obvious to me. I am sorry for that.
I worked as a statistician in the 90s and into the early 00s, and never heard of a violin plot. Knowing what they are now, I see they are entirely useless.
thanks for explaining. i used to think I'm somehow stupid for not understanding them. you are sufficiently compatible with my tastes for long explainer vid host and I am subscribed to you now after watching this. these tend to either look like genitals or (sometimes) weird turds.
I love your videos. Like why would I care why a certain plot is horrendous? 40 plus minutes later I'm super invested and ready to go on the war path about violin plots.
I'm only 8 minutes into the video, so you may well change my mind before the end, but I have made good use of violin plots in my work. When I've been comparing posterior distributions of multiple parameters from multiple different MCMC chains, the violin plots have been an excellent way for me to tell at a glance what the data is doing, and if there are any severe problems. Boxplots do not tell you if your posteriors are multimodal, violin plots do, and a histogram with 30 variables is going to be completely unreadable. I don't really care about the precise values of the interquartile ranges, I want to see if chains are converging to the same unimodal distributions. None of this information is for presenting in a paper (I'll give sensible posterior distribution plots there), it's for me (and my collaborators) to understand how well my MCMC is converging, and where the problems are. For that they work pretty well. Okay, now I'm going to shut up and continue watching to hear what you have to say! EDIT: all my violin plots were horizontal. It never actually occurred to me what they resembled when viewed vertically.... Also I work in veterinary epidemiology (I'm a mathematical modeller), where the majority are women, and my supervisor is a German woman (also did maths at undergraduate), who has no issues with speaking her mind, so I don't think anyone would be as daft as to joke about it!
ja but if your data set is multimodal, why even use a box plot? Or, like, make a histogram since that's showing the important parts and then put the quartiles in a table or something?
I agree with this use case - the shinystan package in R makes good use of violin plots and has saved me a lot of time in evaluating models. But I'd argue that the usefulness of violin plots go beyond MCMC. Overlaying density plots / histograms is ideal in most situations, but things get incredibly cluttered the moment you have >4 lines to plot. Having multiple panels of densities works - but is essentially a violin plot without the mirroring.
@@qualia765 Ridgeline plots are good and probably the first choice, but violins are particularly good when you want to compare groups across different strata (e.g., geography and income)
Once again, I must commend you, Angela. You've managed to keep my attention through 42 minutes while speaking about a subject I had previously no knowledge or interest in. That was great.
Great video! Your point about choosing a plot which conveys the most important thing about the data really hits home. Exploring one’s data is so important.
It is funny, but my master's thesis was actually one of the first theses in my university that were typeset using LaTeX. Probably the first thesis, actually. And my diagrams were drawn using Postscript. Yes, I wrote the programs to draw the diagrams. Of course, Knuth created the whole digital typesetting thing because the expert typesetters (actual people) were retiring. And the new generation could not do his "The Art of Computer Programming" well enough because of all the diagrams and mathematics. Yeah, we came a long way. I was just one of the people right in the middle of the old and the new. Later, I was using gplot to generate Postscript graphs. It still exists, I believe.
I think the reason the histogram got mirrored is because "symmetry makes it look and function better" (which isn't necessarily true in general, and certainly not true in this specific case, but it feels like a common misconception, though that might also be personal bias because my spicybrain likes symmetry). Also, the joke is that genitals are funny. Not just AFAB people's genitals, everyone's. I've seen some radio reception graphs that look like a different set of genitals, and had to stifle a giggle. Like, the sexism stuff is definitely real and valid, and there's a time and place for genital jokes, and an academic setting definitely isn't that, and the jokes absolutely age more quickly than some short-lived isotopes, but still.
I've never seen these before. I do think they would make sense in One particular situation. You give an example of temperature data distribution per month (12 separate plots) but suppose instead of months, you want to plot the annual temperature distribution of all the world's capital cities, and you want to put a continuous variable (latitude) on the X axis, with temperature on the Y axis. In that situation, the symmetry of the violin helps centre the data correctly on the X axis. I completely agree that this format should only be used for overview information (but I note that it can work without colour, while certain other types of plots can be difficult to read without colour.)
There are so many data visualizations that just should be wiped from existence. Violin plots are at the top of my personal list, they are outclassed in every way in modern data visualization
I totally agree about the "feminism" point. The level of defensiveness, especially in STEM is insane. Re: the plot: I think they doubled it assuming we'd get a more intuitive sense of the "area" corresponding to an increased histogram value. Still a shitty plot
Yes, the feminist pivot was a disqualification. Could have started with "the useless plots look like a ****", that would cover it. But dipping in the intersectional victimology half way?
@@peterpeterson8792 "Intersectional victimology" is a loaded way of saying "shared how it made her feel". Maybe you don't care about that part. She identified that section pretty clearly. If hearing how she feels makes you feel some kinda way, that's for you to examine.
@@bbqchezit Do you realize that the hypothetical offensive situations never happened, she went on freeflow of inventing nonsense about some hypothetical men making a stupid joke and how she would feel if that happened. And then "allies" jumped in here ready to pre-save poor pre-victim of her imaginary pre-situation of a pre-bad-taste infantile joke she imagined that surely would traumatize her forever. Oh, "allies" feel she is already traumatized by her own imagination? And sure, we should feel for her trauma and cancel the plot? Thus, from disliking the plot she figured if she comes up with a me-too victim imaginary situation, and how horribly she would feel about it, that sure will erase the graph. By the way, unless one has no spatial (2-d spatial!) imagination, there are valid and well demonstrative applications for this plot, perhaps not for her data. I wonder how leaves make her feel? Ever thought of it? Think of it, violin plots or ****** if you wish, all over the forests, trillions of them? I never thought violin graph looked like a ****** until this chick made a stink about it. And still don't. Any other offensive shapes, circles perhaps? Just get real, get therapy if you need, and don't ask others to participate in your manipulation by your imaginary issues that "make you feel".
I don't know if anyone has mentioned yet, but a population pyramid comes to mind as a use case where it makes sense to have both the mean or median and quartiles, but the overall shape of the distribution is still important and useful. Mind you, I'm typing this at the 9:20 mark, so maybe that does come up in the video at some point and I'm just jumping the gun.
The population pyramid could instead be two stacked histograms. This would make it much easier to compare the male and female populations at a given age. However, it wouldn't be a pyramid, so you would no longer be able to use your copy of _Demography_ to keep your razor blades sharp.
But those convey so much more information. Each bin is labeled and usually the left and right are not exactly symmetrical, the left and right are assigned to male and female population. So say there was a particularly deadly war, you can expect a bigger dent on the male side between certain ages than on the female side for the same ages. The violin plot fails in all of these points, it isn't labeled, it has no bins, it's smoothed so it becomes even more vague, and it's symmetrical for no reason.
@@3snoW_, oh sure, it's not a violin plot, and I wouldn't want it to be one. I just mean that it's a histogram where it probably wouldn't hurt having box plots within them (one per side), so you could quickly compare the median and quartile ranges of male vs. female populations, assuming you wanted to force that into a single visualization and you didn't want to set aside space for a table or something
No idea about actual science professionals, but the r/dataisbeautiful guys probably see the plot and the symmetry reminds them of audio data visualization and they think that's fancy, and that's all the thought they spend on it.
Starting my second year in college with the goal of graduating with BA in mechanical engineering. Found your videos accidentally! Love them so far! Thank you for sharing.
😂😂😂 OMG, when I first saw these violin plots I had the same thought. I feel validated that someone else had the same thought, because I felt that my profession was slowly corrupting my brain - I'm an OB/GYN. Thank you for providing content for me to watch on my post-call day. It sucks being a woman in a science based field, and unfortunately this bias has seeped its way over into medicine as well (shocker). This is why representation matters, but we must also continually actively dismantle the patriarchy
You've become my favorite lol. I used to complain to my bf about how anytime you present data online people always point out "Correlation doesn't equal causation" as if that dismisses all the data or is a complete argument. When you referenced that a few videos back I melted. I'm so ready for Schrodinger's cat!
I love these kooky patterns too! I first ran into this in Trello, where the labels (categories) are colored rectangles; but that sucks for accessibility. So they have Color Blind Friendly mode where they add distinct patterns to the colors. Everything should do this! I'm not colorblind and it helps SO MUCH!
actually there IS a type writer that can erase stuff and uses some weird ass material science I don't know about to kinda suck the ink out after you write it, my good ol' technology connections has a whole video about corrections in type writers
@@pmcgee003 There was whiteout tape that came in dispensers like Scotch tape. There was a way to shift the ribbon out of the way, and you'd backspace over the mistake and type it again while sticking the whiteout tape between the type hammer and the paper. That would type over it in white. My mom used this back in the 70s. Later, there were typewriters that had something like this as an auxiliary ribbon. Actually, more often they would use a carbon-based regular ribbon and the correction ribbon could literally lift the stuff off of the paper when you typed over it.
@@MattMcIrvin Ah, interesting! i just looked at my "brother AX310" electric typewriter. (heirloom) It seems to have a translucent correction tape, if anyone cares i might try to find out if that is how it works 😅
Typewriters like the Selectric were more like letraset than ink. They erased by using a sticky tape that literally peeled the “letraset” character off the page.
Hospital business analyst here... the reason I use violin plots is to specifically highlight the fact that the data is wonky and it cannot accurately be represented by an average... despite it being reported out that way on ten prior occasions
That’s what I was thinking the whole time. In corporate settings, you’re not just trying to explain your data obviously, but often have to show why the counter side’s “data” is misleading if not out right deceptive.
You bring up a lot of really good points in this video! I (maybe guiltily, now) honestly really like the idea behind violin plots, but I also feel like the standard way they are constructed in most literature does not do them justice. I’m an empirical economist, and I can’t tell you how many times I’ve made a box plot and wished to have more granular information about the statistical distribution I’m looking at, while still keeping the feel/presentation of a box plot. I think violin plots could be a lot better if the smoothed kernel densities were just replaced with cleanly-drawn histograms with identifiable bins. That way, we could know what we were looking at, and it wouldn’t look vaguely sexual… which, let’s be honest, is the first thing everyone thinks when they look at these graphs. Also, I agree with your sentiment of “why has the aesthetic choice been made to reflect the densities over the y-axis?” It doesn’t make sense to show the densities twice. The one-sided violin plots should honestly be the standard, if anything.
I personally like raincloud plots, since it gives you the granularity of including the box plot with all datapoints + the kernel densities. They can take up more room though, so including a lot of them all in the same figure can be challenging and overly complex.
I would argue that in the era of information, violin plots have an important purpose. I get that many dislike them, and I appreciate why. It's annoying to take a ruler and run it across a list of plots to figure out the values. But 99.9% of people nowadays are not actually doing knee-deep research that requires precise fact-checking for these plots. Also, usually whoever made the plot has also published their data so you can just make your own plots or use the data directly, which is actually way more accurate than eyeballing some numbers if you're actually doing math for something in your paper. And violin plots have a very useful feature, skimmability. They reduce a plot down to its basic shapes and make it so the human brain, upon skimming, can quickly identify "hey, this visualized 2D object has three humps, whereas this one has two. And this one looks just like a ball. These are fundamentally different samples!" Instead of just showing you the curve of the shape (like a normal histogram), the violin plot also gives you the ability to "feel" the volume of the displayed "object" at a specific point. This is useful because our monkey brains are better at identifying and classifying objects than curves. As an example, what might look like a slightly more sudden downturn (but no big deal) in a histogram could give you a "wow, this sample is tapering out like crazy at the bottom!" effect. Also you can often just throw violin plots on top of your box plots and it will just give readers a better understanding of the data than pure box plots. However, I do agree that smoothing diagrams so they look pretty can be quite obnoxious and sometimes even obstructive to science. And it happens most often with violin plots, since they give more of a "wow" effect if you smooth them like crazy. Additionally, I think violin plots really only "work" with certain types of small data (i.e. just look at these three humps). 
If there are too many metrics, it becomes a nightmare to read and interpret violin plots (as you have stated) and it should just be thrown in a different diagram. Basically, if your violin plot looks more like a weird snake than a violin, you should rethink how you're presenting your data. And honestly, I do just want more research on these plots and their effects on scientific research. If you can get a paper together that actually proves violin plots are trash and obstruct science, I would love that. It would help me accept that everything in this comment (so far) is just a subjective personal experience from me and my professors, and I'd be more inclined to never use one ever again. My counterarguments to the video are subjective experiences, after all. Also, then I could cite you, which would "appreciably increase your market value" as they say on LinkedIn, I think. Oh, and I'd love to see the feedback of the people I've worked with who have conducted extensive research primarily relying on violin plots. I'm cool with sharing videos like this with my friends, but I'd rather give my professors something more academic, you know. Addendum: Wow.. I don't have a vagina and I leave thoughts about genitals at home when doing research, I can't believe I never realized they look like . Yeah, that will actually make me think twice about using them... Thanks to the creator for the story and transparency, it must suck to put themselves out there to this extent. And I'm glad they decided to ignore the idiots who go "huhuhuh va" at work. Especially since many of those people will likely see this video, potentially even including that f-ing guy. The "joke" is so thoroughly unfunny that it made me almost laugh at how badly the guy conducted himself. I think it's best to just pretend those people don't exist, in most cases. I really hope some of his coworkers or friends gave him a serious talking-to after the presentation... 
If any reader experiences this kind of behaviour from someone they know, please just sit them down in private and tell them it's not ok. That said, as many commenters have mentioned before, something looking kinda suggestive does not mean it is wrong to use scientifically. It's just really nice to have a box plot with extra data about a specific metric sometimes. Sometimes you just wanna be able to directly compare like 10-20 different f-ing plots without having them overlap and look like a horrible unreadable cluster-f. Especially if you expect the reader to find something you didn't expect. What if someone finds a cool association that would be hidden behind the back layers of the fancy 3D layered histogram? You cannot seriously tell me you can clearly see the beginning of each distribution at 21:00... Also, accessibility! Not everyone can see colours, colourblind people exist. That's why the one histogram in the beginning had kooky patterns, colourblind people wanna understand data distributions too. Also, I know it's the digital age, but many researchers like printing out their important references and notes. As a starving student, it costs soooo much more to print something in color rather than greyscale. I implore you to take your "useful diagram" at 20:40, print it in greyscale and explain to someone what it means. Or better yet, just ask your red-green colourblind friends to read it. In this case, not even kooky patterns can save you, because you want them all to overlap! Not every design decision is good, many of them have problems. But I would say violin plots have a clear reason to exist, even if that reason is a bit nuanced and overlooked. And yeah, I don't like making women uncomfortable, but anything is suggestive if you look at it too much. I think most of us can agree that the awkward situation was created by that one weirdo, not by the plot. However, I hold hope that perhaps one day some psychology student will set out to prove me wrong on that. 
Until then, I will probably keep using violin plots every now and then
I've been watching this video thinking "Mmm, I don't think I'm seeing where Angela is coming from here. They aren't that bad." ...until the bean plot at 23:07. An absolutely chaotic dumpster fire of borderline illegible meaninglessness. I get it now. I'm revolted that it took me this long.
The big problems are 1) there is no scale provided for the frequency of the distribution and 2) if you want to compare how two distributions differ, you need them overlaid on top of each other, but violin plots are presented side by side instead of overlaid (you could get rid of the box plot in the middle and then overlay them, but then that just makes them harder-to-read, unnecessarily-smoothed histograms)
7:38 as soon as you explained this I literally shouted, out loud, "WHY?" I had one term of high school statistics and one term of quantitative anthropology. I am not an expert in this. I am, however, able to understand when something takes everything that is good about two things and removes it, leaving only the useless stuff. This is that.
ok, I had to stop watching due to being a parent and didn't come back for a long time. whoops. I can safely say ALL my opinions were validated, except that... every time, I think of dangerous butt plugs rather than genitalia, which is even worse, imo? regarding why they mirror the curve about the axis, I think 1. some people are unnaturally obsessed with bilateral symmetry, and 2. mirroring the curve makes the changes in the data more dramatic, and this is a plot for the wholly unsubtle.
The comedy is pure gold. You're highlighting a good point that sometimes people go for "vibe" more than usefulness in published papers, and that we should scrutinize our charts more. In particular, I agree we should handle graph smoothing with more care and deliberation. However ... the violin graph is not that bad ... and I think 3D charts are worse (thank god they're getting out of fashion). Anyway, this is the first video from your channel that I've seen, but I dig the content so you got another subscriber!
If I ever write a paper, I'm going to not use a violin plot, and I'm going to cite you for why I didn't use a violin plot.
Great point. I demand citations every time someone avoids a violin plot in the future.
@@acollierastro if I wasn't going to use one anyways, then should I still cite so that it's obvious why I didn't?
@@acollierastro I'll do you one better, when I review a paper and it uses a violin plot I'll ask the authors to replace it with a boxplot + histogram and cite your video
Instead of people citing a video, you should write a paper. I suggest the title: "Violin plots considered harmful" (because their past existence cannot be undone)
Getting passive aggressive, annoying vibes 😮
Transwomen are women 🙋‍♀️🙋🙋‍♂️
I love your videos. It's like if science was a city and you're a tour guide taking us to all the places where people get stabbed.
Funniest description ever!
Could you make a violin plot outlining the distribution density of those stabbing areas for us please? I'll show myself out now. 😂😂
Omg I’m dead 😂
Omg please tell me you didn't come up with that off the cuff, what a perfect description
Also known as: Violence Plots.
Hello! I am the creator of the Turnip Plot, which is similar to a violin plot but rotated around the vertical axis and rendered in 3D. Please cite my paper. Thanks!
Sounds like a lot of accidental buttplugs to me 😅
Why do you need citation
@@carpathianhermit7228 Turnip
if u don't cite my beyblade plot paper in your turnip plot paper I'm citing u for plagiarism
hur hur hur looks like b*schrödinger'scat*tt pl*schrödinger'scat*g
i spent 27 minutes going "okay is she going to say they look like vulvas" and then felt very validated
Yeah, I was nodding along to the mechanical critiques of the plots, and realized about three-quarters of the way through the first half (I'm a bit slow about these sorts of things cause I'm an asexual cis man who ain't get none and ain't want none neither) that oh yeah, these do kinda look like genitals (and it only clicked cause one of em kinda looked like a penis and that made it click that the rest of em look like vulvas) and was like: Huh, That's weird: I wonder if anyone else noticed that, and if she's gonna mention it at all?
spoiler alert
Meanwhile I'm here getting distracted by the ones that look like stingrays. 🤷‍♂️
I call them snot plots because to me they look like gloopy boogers such as you sometimes see in kids with colds.
@@GSBarlev the shrinks are going to have a field day with you 😂
Have you noticed that most graph paper uses a relatively faint blue colour? That is because photocopiers (before they became scanners) couldn't see blue well. If you were careful with the contrast, you could draw a diagram or a form on graph or squared paper and the photocopier would come out with only your lines on it. It was a great way to create forms, character sheets for RPG games or, indeed, the sorts of diagrams in an academic paper.
Regarding the typewriter thing, you could get whiteout on paper strips. You backspaced over the error, poked the whiteout paper behind the ink-ribbon and hit the erroneous letter again. That deleted it and you could over-type again with the correction. The line spacing on typewriters is quantised so you can go up and down by exact lines easily. It's only when you noticed the error after you had taken the paper out that you would have to resort to liquid whiteout and fiddle to get the positioning correct.
In my typing class in high school (wow, am I really that old?), we used that method as well. We also had fancy Selectric-2's that had a built-in white-out strip! But i also remember just rolling the paper up a bit, applying white-out, letting it dry, and rolling it back down by the same amount. No need to take the paper out altogether. This would have been 1987 or 1988. The tail end of the typing class days, before keyboarding became standard, and Apple II's or Macs, etc., became cheap enough to replace typewriters, to teach typing. I mean, we had a few Apple IIs, but they were for "computer class", not "typing class".
@@mikedavis979 That was around a decade after I went to school. In my day only the girls did shorthand and typing. I've cursed that fact many times during my IT career. By the time I realised it would be useful to be able to touch-type, I was too fast without to be able to stick to learning.
I had a typewriter from my dad that had a little tape strip in it for corrections instead of white out, and you'd hit the backspace key to go back one character space and the delete lever would push the ribbons up so the correction tape was in line and you'd just smack the key for the offending character a few times until the tape pulled all the ink up, then you turn off delete and keep typing. You could even do neat stuff by combining some inked characters with other characters on the correction tape, like clearing a + across an inked M looked pretty cool, and it made good on-demand icons if you needed them for something.
@@ArtThingies That was a carbon ribbon instead of an ink ribbon, a plastic film with a layer of carbon on it. The rich people like important executives used those. They produced more professional-looking text because the ink didn't bleed into the paper. But the ribbon could only be used once; when the shape of a letter had been transferred onto the paper, it left a transparent hole in the tape. There was industrial espionage where people stole the secretaries' used typewriter ribbons and read the documents they'd been typing from the ribbon. Carbon ribbons were expensive and had to be regularly replaced. Ink ribbons just reversed themselves each time through and you kept using them until your text looked too faint. If you were enterprising you could even re-ink them yourself, in much the same way that you can re-fill ink cartridges today.
@@richardurwin4432 Neat!
Librarian here: Physical Journals/books/etc. are often preferred when it comes to preservation and access, since when we buy a physical copy, we won't lose access to it when the publisher decides that we need to pay another $400 per user to access their system, or if a ransomware attack takes out the archive. Obviously, storing all these is impractical for all but the largest library systems, but the number of times someone needs to cite an article in the 2003 summer edition of Phrenology Today means that you can get away with only one copy in an offsite, climate controlled storage space.
The number of times I've had to do dumb piracy shit because a publisher literally pulled access to a non-downloadable article I was using for a paper *mid semester* was infuriating. Not to mention the time my school literally stopped paying our publisher liaison because they upped their rates by a factor of 10. We had a lot of 3rd/4th generation photocopies floating around that year, lmao
I love love love my campus library staff, at every campus I've been affiliated with. You all rock! And there's something so nice about just walking through the stacks to flip through journals rather than just scrolling and clicking through online lists of volume/issue numbers.
god bless Alexandra Elbakyan
Phrenology Today lol
@@trespaul She's the best. I really miss the old, up-to-date scihub tho :(
Public access now!
The technology exists for us to put 3d objects into .pdfs. For this reason, I propose the vase plot.
The diameter of the vase is the probability density.
or maybe the cross-sectional area is the probability density.
It is intentionally ambiguous.
Please cite my comment whenever you use my vase plot in your paper.
The inner diameter or outer diameter or cross sectional area (selected at random when you run the program, no the module is not seedable) of the vase is the probability density. uploading this to npm and pypi asap
Buttplug plot
I suggest a 4 dimensional plot, called the aerofoil plot, where the aerodynamic drag coefficient of each 3 dimensional cross section of the 4d shape determines the probability density. Also every single 3d cross section is vaguely penis shaped.
Please cite me
How about a 4d object casting a 3d shadow? The orientation of a 4d object projects n 3d shadow where n gives the spectrum of possible densities depending on the k-orientation of a complex data mapped onto a 4d-manifold.
Call it the amphora plot, to get the greek creds
Angela utilizing her platform to indoctrinate the public against evil plots in an intellectual crusade is what I live for
That part.
the plot thickens
that was quite clever
Evil plots
What do we want?
Good data visualization!
When do we want it?
Now!
The kooky patterns make it so that colorblind people can read the histograms. I'm red-green colorblind, and I get really passionate about data visualization, partially because I have trouble with certain color visuals that other people don't struggle with, and partly because data visualization is a very efficient way to communicate info when done correctly, and I also have adhd, so I appreciate their communicative strengths.
But yeah, all that to say, I really appreciate when kooky patterns are included. It immediately tells me that the person who made that visual is thinking about how their data will be perceived and wants to communicate it effectively to as many people as possible.
Designing data representation and slides for talks with color blind people and dyslexics in mind is a drum that I always have to beat with my trainees. A lot of the arbitrary complaining that people do about visualization and typeface etc is actually pretty ableist. Fun example is that comic sans is actually one of the easier fonts for people with dyslexia to read.
Interesting that it helps with colorblindness, i always ran into it in school worksheets and tests. They were printed in black and white so needed a way to differentiate between grey and slightly lighter grey, and we all just had pencils to make them ourselves. It became a bit of a competition as to who could come up with the weirdest patterns for their bar charts that were still distinct. Always nice when adaptations for specific things end up being useful for unintended reasons
the comic sans thing is a myth look it up. The same as fonts specifically designed for dyslexia like open dyslexic. More important than anything is font size. @@tglittle3166
Yes, the data visualization field should deliberately popularize plot color selections that don’t penalize the red-green folks. Or, encourage thinking about using line and hatching types that don’t require colors to distinguish
This is called hatching and there's a whole system behind it. It comes from heraldry and was developed in the 17th century back when there was already significant demand for printed illustrations but printable coloured ink was not yet invented. Basically, you have hatching patterns representing six basic colours; red which you represent with vertical lines, blue which is horizontal lines, yellow (or gold) which is dots, green which is diagonals from top left to bottom right, purple which is diagonals the other way, and black which is a grid. You get pastel tones by replacing the lines with dashes, and you can mix colours by overlaying the hatching of both, for example dashed verticals give you pink and if you interleave that with dots, you get orange
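For anyone who wants to try the heraldic scheme described above in their own charts, here is a minimal sketch. The dictionary and the `hatch_for` helper are made-up names for illustration; the pattern characters are chosen to line up with the hatch strings matplotlib accepts, on the assumption that you'd pass the result to a `hatch=` argument. The dashed "pastel" variants are not covered, since plain hatch strings have no dashed form.

```python
# Colour -> hatch pattern, following the heraldic convention in the comment above.
# The characters are picked to match matplotlib-style hatch strings (an assumption
# about where you would use them), not taken from any heraldry library.
HERALDIC_HATCHING = {
    "red": "|",      # vertical lines
    "blue": "-",     # horizontal lines
    "yellow": ".",   # dots (gold)
    "green": "\\",   # diagonals, top left to bottom right
    "purple": "/",   # diagonals the other way
    "black": "+",    # grid
}


def hatch_for(colours):
    """Overlay the hatchings of several colours by concatenating their patterns."""
    return "".join(HERALDIC_HATCHING[c] for c in colours)


if __name__ == "__main__":
    print(hatch_for(["red"]))          # pattern for a plain red region
    print(hatch_for(["red", "blue"]))  # red overlaid with blue
```

The point of keeping it as a plain lookup is that the chart stays readable in greyscale or for colourblind readers: the pattern carries the category, and colour becomes optional.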
I'm not a scientist. Just a humble welder. I like your videos. I had no idea where you were going with this, but you had me nodding along, thinking, 'I get it. These violin plots are stupidly complicated and are not as efficient as other plots.' I get it! And you know, not being involved with the sciences, I didn't actually care about the plots one way or the other, but I could see how it would annoy an actual scientist/researcher. Then it finally got it where it was going, and I was like, 'Damn! That was masterfully done.' Keep up the great work.
"humble welder" - I like how you just had to validate your stereotype of "how do you know someone's a welder? They'll tell you" on the first line.
@@lost4468yt Just like vegans and horse girls.
@@teremleonheart3776 When you say "horse girls" are you meaning equestrians, or is this some new horror? 🤔
@@NightsReign this comment honestly made me chuckle for a good bit, i mean girls that are super duper into horses, n have posters of em and backpacks with horses on it, that typa horse girl lmao 😂😂😂😂
@@teremleonheart3776 And now I'm laughing about the connection between welders and horse girls that I had never seen before, but can't unsee now.
I used to work at a tight tolerance thin film deposition optics manufacturer. One day my current supervisor visited my workstation unexpectedly and asked, "you haven't been using violin graphs in any of your report generators, have you?"; "Never heard of such a thing. Why?"; "Good. I knew you were a smart one. Just don't. They're useless anyway." ... Later she quietly explained their inappropriateness. Turns out she wasn't just trying to prevent her team from using them either. The owner got top management together and instructed them to purge any current work of their existence. HE was smart.
This story is so wholesome.
Inappropriateness? Did they mean from a scientific or a social perspective?
@@SlenderSmurf Yes
I'm an old man, so I know something about typewriting erasure techniques. In rough order of technological advancement:
1. hand-erasing with an ink eraser
2. Liquid ink eraser solution, in a bottle. You would apply this to the typo on the page and it would break down the ink and fade it somehow.
3. An "eraser strip" as part of the ink ribbon where you would retype the offending letter using the strip and it would abrade/absorb/dissolve the ink
4. hand painting over the typo with white-out (from a bottle)
5. hand-held whiteout strips, with dried whiteout on one side. You would retype the letter with one hand, holding the strip against the page where the hammer hits with the other hand.
6. whiteout strip built into the ink ribbon. Same as the whiteout strip, above, but you don't need to hold the strip, it's part of the ink ribbon.
Most solutions did not require removing the paper, because you can *never* get it aligned again. There's typically enough space where the hammers hit the page for you to get in there with whatever erasing solution you're trying to use.
IIRC, the IBM Selectric III would automatically retype the previous character with the whiteout ribbon when you pressed the backspace key.
I'm amazed the spell checker didn't ding me for spelling "whiteout" without a hyphen. It's always "correcting" words that don't need correcting.
Neat! Thanks for sharing some history!
Being a 90's child I still managed to use both mechanical and electric typewriters. I had forgotten those whiteout strips! In my mind, those used to come in little red plastic boxes, kinda like chewing gum packs
Those Selectrics with a delete key and a special ribbon were the best! I forget which technology they used - "glueing off" a plastic based ink, or a white-out ribbon. I think they used white-out if I remember correctly, but the few that used a plastic based ink that you could glue off within a few seconds were an absolute delight since you could use that technology on almost any paper and it didn't matter what colour the paper was.
When I was in undergrad getting my degree in economics, I saw these every once in a while when conducting research for papers. Turns out I wasn't an idiot for not being able to read these plots, I was just an idiot for getting a degree in economics.
I thought a degree in economics was a guaranteed job at Amazon. Is that still a thing?
@@Gersberms If true, that's the best reason I've heard for *not* getting an economics degree! =:o\
@@Gersberms Isn't Amazon fundamentally bad for the economy so they're guaranteed to only hire bad economists? xD
@@Bozebo it depends on if you view "the economy" as something that's meant to serve small businesses/producers or something that serves consumers.
@@TessHKM I think “the economy” is generally meant to serve everyone. Amazon is obviously not good for producers and small businesses, but it is also bad for consumers in the long term. Yes, it does provide them cheaper goods, but it also results in wealth being funnelled out of communities, which kills small towns. It also results in worse working conditions and lower pay for people in cities. Overall it’s a net negative to the living standard of the average person.
As a data analyst, you've made amazing points for not using violin plots--scientifically. But in the business world, violin plots are **pretty** and the vibe is how you get execs to make the decisions in favor of what you want 😂😂😂😂
But do the execs understand those plots or do they just think they are cute?
@@GiovanniBottaMuteWinter I think that’s the point they’re making… they’re saying that in the business world, the information doesn’t matter so long as it’s presented in an attractive way.
@@daniel6678 Executives are only interested in the overview. They will naturally gravitate towards the violin plot for the overview data and assume the technical people will handle the details in the histogram. We make judgements on how relevant info is to us all the time, and pretty is not necessarily better. For example the more visually appealing a letter through my door is, the quicker it will be identified as junkmail and be thrown away.
Head on the nail, tbh. As I was learning data science post-BA, I was struck by how differently academics are expected to visualize info vs how you need to present that same data to "general audiences", i.e. rich kids who coasted into "business".
I’m a super lay person. My problem is that it is about the only plot that I’ve seen that doesn’t intuitively communicate anything.
Related to that very last point about "why not just make it half the graph", when they made us do violin plots in high school biology we were told to put *half* the data on each side so that the width of the *whole* thing matched the amount of data. Which is absolutely not how anyone else does their violin plots. Also we had to do them by hand, which is as excruciating as it sounds
I thought that was how they were drawn as well. So that the area of the shape has an actual meaning, which is something intuitive to look at. Although now that I think about it there is no x-axis so doubling or halving all of the widths doesn't change anything.
During my masters, my advisor actually ENCOURAGED the use of violin plots, and I didn't really question it at the time. Rude jokes were made. I have a publication that has FIVE SEPARATE PAIRS of violin plots (and plots that cluster points into hexagons? for some reason?). And you're right! Side-by-side histograms would have been BETTER and MORE COMPACT 😭😭😭
This channel really feels like having a cool older PhD friend who tells you all the secret tips and tricks and cool stories in academia
Exactly! ❤
It's the best
Spot-on!
Thanks, you really summarised the feeling so well! Love this channel ❤
Who you callin "old" , buster ?
Yes to the symmetry argument. I spent the entire video trying to figure out why they were twice the size they needed to be.
Your story about everyone turning to look at you as the only woman... it gave me a flashback to school sex-ed where the teacher (to be inclusive) said "or a**l sex" and people turned to look at me which basically outed me to the teacher. But also what reaction were they expecting from me lmaoo
This was a great reminder about effective strategies for allyship.
Any of the other ten guys in the room could have made the response she described, and it wouldn't have even seemed like White Knighting. That they turned to her was definitely motivated by empathy, but the effect is the same as what happened to you.
nooooo come on classmates learn to be chill
@@GSBarlev yeah I really feel like when you're a member of the dominant group in a situation like this best practice is to absorb as much attention as possible AWAY from the person who stands out unless they've made it clear they specifically want to say something
An alternative to a violin plot is a "beeswarm" plot: Instead of a smoothed density you plot each individual datapoint as a dot at its exact y-value, and the x-values of the points are chosen so the dots don't overlap, causing y-coords with a lot of dots to bulge out.
I like them because you can simultaneously see the raw datapoints, and also see the broad distribution. One problem is that in naive implementations you get chains of points extending out from the center in a line, giving a christmas-tree appearance. But good implementations can avoid this.
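To make the "x-values are chosen so the dots don't overlap" idea concrete, here is a rough sketch of one naive way to do it. It is a greedy layout I wrote for illustration (the function name and the `radius` parameter are made up), not how any particular plotting library does it, and because it only tries x positions in whole-dot steps it will show exactly the kind of rigid rows and chains mentioned above.

```python
def beeswarm_offsets(values, radius=0.5):
    """Return one x-offset per value so that dots of the given radius never overlap.

    Naive greedy layout: handle points in sorted order and, for each one,
    try x = 0, then +1 and -1 dot widths, then +2 and -2, and so on,
    taking the first position that does not collide with an earlier dot.
    """
    diameter = 2 * radius
    placed = []                      # (x, y) of dots already positioned
    offsets = [0.0] * len(values)
    order = sorted(range(len(values)), key=lambda i: values[i])

    for i in order:
        y = values[i]
        step = 0
        chosen = None
        while chosen is None:
            # Candidate positions alternate right and left of the centre line.
            candidates = [0.0] if step == 0 else [step * diameter, -step * diameter]
            for x in candidates:
                clear = all(
                    (x - px) ** 2 + (y - py) ** 2 >= diameter ** 2
                    for px, py in placed
                )
                if clear:
                    chosen = x
                    break
            step += 1
        offsets[i] = chosen
        placed.append((chosen, y))
    return offsets


if __name__ == "__main__":
    data = [1.0, 1.0, 1.0, 5.0]
    print(beeswarm_offsets(data))
```

Three identical values get pushed apart sideways while the lone value stays on the centre line, which is the bulge-where-the-data-is effect. Better implementations choose the offsets more carefully to avoid the christmas-tree look.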
The comment at the end about “why have two flaps” is the most upsetting part of them to me. The only time I’ve seen something like this (where they used histograms instead of smoothing, so less yonic) that seemed even a little defensible is the man/woman population age plots for different counties. It’s still probably better to arrange them differently but at least they were using the two sides for something.
Honestly, those would also just be better if they had both on the same side, as then you can compare them properly....
Exactly. What is the point of having your data mirrored onto both sides?
Your smoothed out histograms were so nice you had to show them twice?
BS.
I do see how stacking histograms can get cluttered, but as you pointed out: if the goal is to compare histograms without stacking them, then make an asymmetrical violin chart with labeled axes so you can actually interpret the data.
The one sorta viable use case that just uses the other side for something useful, but they also don't just smooth it out.
Ex. population age plot between men and women, Vertical axis is age, horizontal axis is pop count. men on left bars and women on right bars.
I thought population pyramids were good ways to visualize the data honestly.
Saw this on Polymatter and I thought the same thing. That’s the one viable use of this data visualization technique. But there it’s just two histograms turned on their side and placed opposite one another.
I remember vividly how my advisor emphasized how he read papers: title, authors, abstract then plots. If the plots were compelling then he'd dig in. Those plots absolutely need to tell a succinct and coherent story.
Interesting point. I've always thought of the graphs, plots, whatever they are as being a side dish
What you said reminds me of my father telling me to always start with the maps in a book of military history, then decide if you want to purchase that book. Similar reason
Usually the quality of plots is very quick to evaluate. So if you have limited time and a lot of papers it makes sense to judge a bit on plot quality before reading the whole paper. However many plots require some significant explanation of the measurement system and conditions. Therefore after a quick look at the plots I often go back to the text.
This video sort of gave me the urge to come up with plots that are even more cursed than the violin plot. Like a stick figure plot where different aspects of the data set are represented by the size and orientation of the body parts.
It exists. Pie charts.
@@raygivler nah this is like you took a pie chart and cursed it
Can you pull up the latest set of Homunculus plots please? I'm a bit concerned about some outliers I saw 😂
what about a scatter plot in audio form? you map numerical values to hertz values, and instead of coloring points and lines, you give them an instrument. like a guitar note would be played for each data point, and a violin is played to show the line of best fit. i'd call it a song plot
@@PokeCube_ Symphony plot?
Hello! I like violin plots a lot; Im a data science researcher.
A violin plot displays not only easily comparable mean and quartile information, but also more granular information about the shape of tge distrobution. This gives violin plots a unique ability to intuitively inform certain dicisions about further analysis, especially when youre exploring new data with numerous distrobutions for the first time.
Histograms have an issue of sharing the same axis, which, when trying to understand intricacies of distributions, can be difficult to read. Box plots are easy to read but can obscure information, maybe leaving readers to question if the choice of a box plot was appropriate.
A violin plot allows you to render an easily interpretable plot which lays bare qualitative aapects of the underlying dustribution. This not only allows for easy analysis via the box plot, but also high level qualitative understanding.
I never, when i read a violin plot, care about the scale of the distribution, but the shape, which i think they do fairly well.
Of course, when publishing I may or may not use them. I find them incredibly good for visualizing data exploration, and like to use them when explaining datasets moreso than results.
On the point of smoothing, totally. Thats why ive gravitated towards swarm plots for general qualitative distribution understanding. But, smoothing is an issue within itself, histograms have essentially the same exact issue in terms of bin size.
Also, worth noting, im colorblind af, so overlayed color infornation may as well be jibberish to me, which might be part of the reason i hate overlayed histograms so much.
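On the point above that histograms have the same issue through bin size: here is a small sketch with made-up numbers showing the same data looking two-humped or one-humped depending only on the bin width. The helper names and the data are invented for illustration, and counting local maxima is just a crude stand-in for "how many humps would you see".

```python
def histogram_counts(data, bin_width, start=0.0):
    """Count how many values fall into each bin of the given width, starting at `start`."""
    counts = {}
    for x in data:
        b = int((x - start) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n_bins = max(counts) + 1
    return [counts.get(b, 0) for b in range(n_bins)]


def count_peaks(counts):
    """Number of bins strictly taller than both neighbours (edges compare against 0)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


if __name__ == "__main__":
    # Made-up sample with two clusters, one around 2 and one around 8.
    data = [1, 1, 2, 2, 2, 3, 7, 8, 8, 8, 9]
    for width in (1, 5, 10):
        counts = histogram_counts(data, width)
        print(width, counts, count_peaks(counts))
```

With narrow bins the two clusters show up as separate peaks; with wide bins they merge into one. A kernel density bandwidth plays the same role for a violin plot, so both plot types need the smoothing choice reported.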
I'm not colorblind and I also can't read an overlayed histogram. Way too cluttered, and in my mind I'm trying to imagine what they'd look like not overlayed.
I promise I'm not as stupid as my spelling suggests.
Honest question, but why not just use staggered histograms then? What does the rotation and mirroring add? Including information on medians/averages (and quartiles if you really need to, but if you have the histogram there anyway why would you) could be done in pretty much any format you choose.
@@TheManifoldTruth That's a great question, and the honest answer is it's really convenient to plot violin plots with seaborn.
Another more honest answer is the aspect ratio of monitors. While they don't have to, histograms have their density along the Y axis, meaning, if you have a lot of distributions you want to compare, it's easier to fit a violin plot, which orients things horizontally. Yes you could just rotate the histogram horizontally, but the love of making the "perfect" vs the "good enough" plot starts to die out around your 10,000th plot in your career.
Another, maybe more satisfying but less honest answer, is the mean and standard deviation of the distributions is useful in comparison, and that comes out of the box in most violin plots.
Really, the debate around this feels like the debate around the oxford comma; strong opinions around "rules" which are really well entrenched but still arbitrary preferences.
I don't have any evidence to back this up, but I wouldn't be surprised if violin plots were more common in more data rich and fast moving research domains like data science rather than physics. In data science, making a plot that's good enough quickly is way more attractive given the sheer volume of visualization required in the domain.
I have to note though, a lot of my takes make a lot more sense in a business context, rather than an academic context. Papers take a long time to make, so having a sub-par plot makes much less sense.
Came here to say many of the things that you said. Thank you for saying them more completely.
The colors on the graph at 20:37 are really really hard to tell apart for people with deuteranopia (some 6% of males). These pastel colors are hard, it's much better if they are very definitely yellow, or blue, or red, or grey, etc. Just wanted to chip in, since we're talking about it already.
As a non-scientist, I've always thought these plots were confusing and just obviously above my pay grade. Very validating to hear that they are indeed as uninformative as I thought they were. Much appreciated❤️
Same! I felt the same about radar plots for a long time until I found out there's no real consensus on how to read those either 🙃
This is such an important video. I remember that one of my high school textbooks had some stupid plot (that I now understand to be a violin plot) that the author loved to use. That book could have been half-a-pound lighter if they just took them out.
@@thefaboo yeaahhhhhhhh radar plots are cool until you realize the area can be altered by how the spokes are ordered lol. It's just a multi-variable plot with connections between each percentage for no real reason
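The spoke-ordering point can be checked with a few lines of arithmetic. This is a minimal sketch, assuming equally spaced spokes and made-up scores; `radar_area` is a hypothetical helper, not part of any charting library. The filled polygon is a ring of triangles, each with area 0.5 * r_i * r_(i+1) * sin(2*pi/n), so the total depends on which values end up next to each other.

```python
import math


def radar_area(values):
    """Area of the radar-chart polygon for values placed on equally spaced spokes."""
    n = len(values)
    # Each pair of neighbouring spokes spans a triangle with the centre.
    wedge = 0.5 * math.sin(2 * math.pi / n)
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


# Same four (hypothetical) scores, two different spoke orders.
alternating = [1, 5, 1, 5]
grouped = [1, 1, 5, 5]

print(radar_area(alternating))  # high and low scores interleaved
print(radar_area(grouped))      # high scores placed side by side
```

Under these assumptions the grouped ordering covers nearly twice the area of the alternating one, even though the underlying numbers are identical.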
The whole point of data visualisation is to make it easy to understand, otherwise you'd just dump raw data in table format at people, so your POV is perfectly valid. Of course, depending on context, if you are writing something to be read by people familiar with the topic instead of the general public, you can go a bit spicier on the complexity, but it should always be as simple as possible
On the contrary, they're information dense, combining histograms and box-and-whiskers plots for multiple sets of data. They're only uninformative if you decide not to read them.
"You can't actually get data, you're just getting vibes." Brilliant.
The whole point of a data visualisation is to extract vibes from numbers though surely?
@@Dongobog-ps9tz Vibes are definitely not the whole point of a plot, but a nice feature if you have a well constructed one. You should still be able to recreate the original data from whatever visualization you end up using not only so other people can try to find other useful features in the dataset, but they can verify the plot actually matches your original data. If your visualization is "vibes only", it's a marketing gimmick, not a useful research tool.
@@NameName-u9e A plot is a lossy compression where you're trying to turn the data into something human readable. Maybe we have a different definition of vibes but all I'm talking about is that the plot tells a story with the data. It can be misleading and entirely truthful.
I've always felt vaguely guilty as a scientist for never using the violin plot functions in any plotting tool - thank you for lifting this weight from me.
You'd *apply whiteout w/ the page STILL inside* the typewriter, wait about 1 minute, and then use the *backspace* key to shift _back a space_ to your mistake so you can apply new ink over the dried whiteout.
Later on there were typewriters that could apply the whiteout for you. Usually electric ones. There were some other more esoteric solutions too!
But yeah, most people just applied something directly to the paper and shifted back a space. Calling it "backspace" on a computer keyboard is one of many holdovers from the typewriter days. So is the stubborn yet incorrect convention of double-spacing after a sentence.
(Double-spacing is essentially a typewriter trick/convention that makes things easier to read because periods are so small and on some typewriters don't offset enough. Single space after a sentence has ALWAYS been typographically correct in the world of typesetting books - plus print & graphic design.)
We use a lot of old terms and symbols that don't apply anymore. Like saying "rolling" when a camera starts recording comes from the early days of film when there was a step to roll the film.
Same way that many save icons are still simplified shapes of floppy disks - lots of kids grow up associating that shape form with "saving" without actually knowing it's a real physical thing.
Or how we associate the power symbol with turning things on (it actually was originally a standby-reset symbol or something, but that's a whole different conversation).
You have a source for that last paragraph? I've always heard the power symbol was a combination of 0 and 1 - a binary toggle for on/off.
@@NoeLPZC you're both right - the 0 and 1 do represent binary states, but the version of the power symbol with the line crossing the circle was originally a standby symbol. If I remember correctly it was meant to indicate something like what we'd call sleep mode as opposed to turning something all the way off and on. The actual power-on-off icon was supposed to be the line totally within the circle, not crossing it.
It's even still used in some specific cases now - I work in a lab and we have vortexers that have marked switch positions for on (line), off (circle) and touch-activated (line breaking circle) modes.
actually most correcting typewriters had an adhesive ribbon that would lift the letter off the paper. there's a fun technology connections video about it
this concept is called a skeuomorphism
As to the violin plot joke thing: socially difficult choices.
A classic is to look confused and ask why the joke is funny in a very sincere way.
Another way that tends to work for me is to just say "dude, c'mon", as that puts the onus squarely on them
I do find it very helpful after a moment like the one you described to reflect on what I could have done differently while keeping my goal in mind. Generally speaking with stuff like this, the best approach is to deflate the other person, so to speak.
I wish you all the best in your future strange and awkward social interactions
Violin plots are overused but they have a use case for comparisons of a large number of samples that have complex distributions. We use them for this when comparing gene expression in cell populations. We can quickly see the 'shape' and get the vibe of the multimodal gene expression for large sample numbers.
Exactly, the point of the violin is just the vibe. The data is in the box plot inside of the violin.
Wouldn’t ridge plots work in that kind of situation ?
I am not a scientist. I have never heard about violin plots before today. But now I know about them and why they are mostly useless. I cannot overstate how much talent you have for making subjects like these interesting. A lot of the time, I pause science videos while alt tabbing to other things, taking in the videos in chunks. I always seem to watch yours straight through, beginning to end. The way you break up your videos with music and title-cards really helps make them digestible. Thank you!
I thought, "Well they look funny, but surely there's a reason why they'd be useful!" And then at the 8-minute mark, you finish explaining how you make a violin plot and I'm like, "Okay but why would you do that though???" I think it's a terrible plot already and there's still over 30 minutes of reasons to listen to. Brilliant!
One principle of peer review is that we shouldn’t just assume authors are analyzing their data correctly. I appreciate violin plots because they provide the reader reassurance that the use of box plots is appropriate. Absent the density overlay, I worry (and sometimes rightly) that the authors are using box plots in inappropriate contexts (as evidenced by the fact that one sometimes sees multimodal distributions in violin plots in papers).
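A small sketch of why a box plot alone cannot settle this. The two datasets below are invented for illustration, and `five_number_summary` is a hypothetical helper built on the standard library's `statistics.quantiles` (using its "inclusive" method, which is an assumption about how the quartiles are defined). One dataset is evenly spread and the other is clumped around two values, yet both produce the same minimum, quartiles, median and maximum.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the numbers a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))


# Hypothetical data: evenly spread values.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
# Hypothetical data: two clumps (around 2 and around 6) with little in between.
two_clumps = [0, 1.9, 2, 2.1, 4, 5.9, 6, 6.1, 8]

print(five_number_summary(evenly_spread))
print(five_number_summary(two_clumps))
```

Both calls print the same summary, so the two box plots would be drawn identically. Any view of the density (histogram, points, or a violin) is what reveals the difference.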
I agree, although I do agree with Dr. Collier that violin plots are less aesthetically pleasing. Plotting semi-transparent points over a box plot sometimes can work. Perhaps everyone should make a separate violin plot for reviewers, as well as box plot or something else. Hmmm....
I prefer to see a test of gaussianness (the actual test name escapes me right now). Then you have a one liner saying "yep, boxplots are good here". No wasted paper.
@@davidjohnston4240 I think you are referring to a Q-Q line plot? It gives a quick indication if the data is normal and any possible skew at a glance. They are also very useful for comparing goodness of fit between distributions.
@@davidjohnston4240 normality lol
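For anyone curious what a Q-Q check looks like in practice, here is a rough sketch using only the standard library. The helper names, the sample data and the `(i + 0.5) / n` plotting positions are all assumptions made for illustration; this is an informal check, not a formal normality test (Shapiro-Wilk is one commonly used formal test). The idea is to pair each sorted sample value with the matching quantile of a standard normal, then see how close the pairs are to a straight line.

```python
import math
from statistics import NormalDist


def qq_points(sample):
    """Pair theoretical standard-normal quantiles with the sorted sample values."""
    n = len(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(sample)))


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 mean a straight line."""
    points = qq_points(sample)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples: one built from normal quantiles, one split into two clumps.
normal_like = [NormalDist(10, 2).inv_cdf((i + 0.5) / 20) for i in range(20)]
two_clumps = [-5.0] * 10 + [5.0] * 10

print(qq_correlation(normal_like))
print(qq_correlation(two_clumps))
```

The normal-like sample sits almost exactly on a line, while the two-clump sample bends away from it noticeably. How low the correlation has to be before you stop trusting a box plot is a judgment call this sketch does not answer.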
Just use a histogram
I agree whole heartedly with the second portion of the video (the first too). In situations like the one you described (the not-funny jerk), I will invariably pull out my phone, act like someone's calling, and then interrupt and say loudly "1950's on the phone and wants its joke back". And I'm a male who was born in the late 1950s. I have absolutely *no* time for people like that and *do not* let it pass without comment. Why should I? I'm at an age where I don't have to any more. And coming from a time when "humor" like that was "normal", I actually feel obliged. Damn, you're good.
I'm convinced people only use them because they look vaguely sexual
A pretty good reason tbh
I never heard the name until today. I always imagined them as vulvas with misplaced clits and thought when they showed up it was scientists being juvenile somehow
I'm convinced they're only called violin plots because they couldn't get a paper to publish "vagina plot"
And STEM is still largely a boys-club, so there is increased tolerance for anything that even subtly makes non-men uncomfortable.
To be fair, if there was a plot that looked like a dick, men would certainly also use that just because it looks like genitals.
My grad school used to have professional plot makers on staff. It was long before my time but the space they worked in was still there and there were some people still working there who remembered them.
Professional plot makers! I love that.
Pretty much the same for me. There were still a few around but they were on the way out. Images had to be pasted into place
This went from ha ha, to not ha ha real fast
I want to say that you do an amazing job explaining the female side of these interactions
The fact that you articulate why it was not ok is a great source of information for people who want to do good but don't yet see how certain things are problematic
I clicked on this video thinking "oh what's wrong with those they look cool" but you managed to thoroughly win me over
You make me want to do video rants about bad research proposals, but then I realize that you are about 1000% better in front of the camera than I could ever be. Keep it up!
I think the type of smoothing they're doing is called a Kernel Density Estimate. They didn't teach us about KDE plots in physics classes because, as you say, they mostly just show vibes, but it's still better than the arbitrary-window smoothing you're suggesting they do. See the Seaborn documentation for violinplot
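For reference, a kernel density estimate is simple enough to sketch by hand. This is a minimal version assuming a Gaussian kernel and a hand-picked bandwidth; the data and the function name `gaussian_kde_at` are made up for illustration, and real libraries choose the bandwidth with their own rules. Each data point contributes a small bell curve, and the estimate is their average.

```python
import math


def gaussian_kde_at(data, x, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at the point x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


# Hypothetical sample with two clusters, around -2 and around +2.
data = [-2.5, -2.0, -1.5, 1.5, 2.0, 2.5]

for bandwidth in (0.5, 3.0):
    at_gap = gaussian_kde_at(data, 0.0, bandwidth)      # between the clusters
    at_cluster = gaussian_kde_at(data, 2.0, bandwidth)  # on top of one cluster
    print(bandwidth, round(at_gap, 4), round(at_cluster, 4))
```

With the narrow bandwidth the estimate dips in the gap between the clusters, and with the wide bandwidth the gap becomes the highest point of the curve. So the bandwidth plays the same role for a KDE that bin width plays for a histogram, and a violin plot is only as trustworthy as that choice.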
I've never once used a violin plot... but for some reason i still felt like i was in trouble the whole video.
This is actually kind of funny. I'm a PhD student in statistics, and I learned about violin plots for about 10 seconds in one of my first year courses. Just a few weeks ago I ran into a situation where I actually considered using violin plots to convey the distribution of sequence lengths for a system running in different states. However, I did ultimately decide to use a different plot, because the finished violin plot just looked too weird, and would have been distracting. I admit, I've never seen them used in a professional setting by other statisticians or scientists.
Wow, actually sharing your experience at work was really enlightening. I never would have thought about how distracting and uncomfortable it would be beyond the off-color comment. The lingering after effect sounds like it had much more of an impact. Thank you for sharing.
I'm convinced the original paper was an elaborate troll.
It's funny because sexual body parts are inherently funny. For the same reason farts are funny. It takes a physical (physics!) aspect of ourselves which is tied to emotions that we normally try to keep private and forces it out into the open. This emotional discomfort is mostly politely expressed as humor. Now had your joking colleague just said with clear disgust, "It looks like HONK, ugh" wouldn't that have been worse? Of course you are right, best course of action would have been to say nothing at all.
The obvious humorous response to the violin plot is to come up with some sort of "c*ck" plot and then come up with a logical way to overlay it or point it at the violin plot. And get it published straight up. Never acknowledge any physical resemblance. That would be funny.
I know you'd like it to be but, clearly, humanity isn't better than this.
Ha, only a minute in and I already adore the video. Out of curiosity I had a professor give me old dissertations to see how they used to do data visualisation back in their days. And the solution is glue. Glued in graphs hand drawn on graph paper. Glued in photographs of the setups. And then it clicked in my head why we learned to do all that gluing stuff on paper in elementary school.
The animation on 18:20 could've just been a line diagram. Perfectly conveys the same data with just a single image. Super easy to plot in Excel too. No need to make it a complicated animation that’s impossible to understand.
I feel like that's often the case with dataisbeautiful. It's almost a competition in presenting the most basic data in the most convoluted way possible. Like those "make the worst volume bar" UI challenges, but serious.
Since it's supposed to be different levels of legalization I would've gone for a stacked area graph, but I agree with the sentiment. It looks kinda cool but it's definitely worse for conveying information.
one of dataisbeautiful's biggest issues is an obsession with making data animated that doesn't need to be
it's honestly way easier to tell how quick something is by looking at a slope on a time plot than by trying to compare different speeds half a minute apart in the same animation
Why do you want a line, or for them to be connected at all? Shouldn't it be 4 separate bars?
@@antonhelsgaun 4 separate lines that demonstrate the change over time. Then you don’t need to make it an animation.
Or do a stacked area graph like mentioned above.
A violin plot to overthrow the physics department !
There's no need to resort to violins!
Let me get the band together
Plot twist!
I've been watching your videos for a while as someone who works in humanities but with a great interest in physics and finally I can explain something in turn!
FWIW, corrections for even manual typewriters are a lot simpler than one might think. On most standard models there is a button off to the side that switches the ribbon (which by Einstein's time would at least have two or more types of ink in a uniform stripe across it, so it looks like one of those weird sour fruit roll ups that's segmented by color horizontally) to the 'correction' ink, which is thick, white and usually on the bottom; most of the time it just shifts the ribbon up like a centimeter or so, but some fancier typewriters can hold separate ribbons just for corrections that you can toggle on and off. There's no need to touch the paper, thankfully.
After you've switched the ribbon and fixed your mistake, you can switch back to the regular ink seamlessly. You did typically have to move the hammer to where you made the mistake and type that exact letter that you screwed up again in order to cancel it out with a negative image in white ink, but muscle memory makes it like riding a bike.
Plots were virtually impossible on these things, though, as you said-- it would be like trying to make ASCII art by hand but you can only use a springy lever to move the cursor.
I did a violin plot in one of our articles, and after watching this I can say I did it pretty mindlessly. It was an option. It looked like a cool data viz. It showed an increase in a median altered genome segment length as well as a higher number of longer segment alterations. Which is basically the same thing, as I look at it now. But during writing we were in that phase of trying to get our point across. In that mindset the overloaded graph seemed useful.
After watching this I'd just do ridgeplots. Thanks for taking your time to talk about this.
I enjoy that this turned into a histogram appreciation video because histograms are really great.
When you made a mistake when typing on a typewriter you had to use a backspace then you would use a special white chalk covered piece of paper - put it between the typing tape and the paper then type the wrong character again - that would "erase" the wrong character, so you could use backspace again and press the correct key.
Now I feel really old.
@@marcins.1128 I'm old enough to remember when those Tipp-Ex strips came out. They were an amazing innovation. Before that, you needed to use an ink eraser, or some noxious kind of solvent that faded the ink on the page. Later, some typewriters had the whiteout strip *built into the ink ribbon*. It was a magical time.
@@marcins.1128 I'm only 26 and I happened to grow up with one. There was one at my mom's old job when she'd take me as a kid and I'd play around with it. I'm not sure if the backspace function worked the same way though.
@@mercury5003 there were also newer typewriters with two tapes - one of them was the erasing one. They stored some recent characters in memory so you could use the backspace as on your PC.
Or just cross it out with X-es and type it correctly right next to that or above that. Or use correction fluid, wait for a bit, then type over it (it always looked different though), or re-type the whole page.... depends on what you're doing and what are the tolerances for nice presentation vs just having the text on a piece of paper.
My wife in astrophysics -- she has a very similar experience to what you described with the Violin plots, where she'll go out to dinner with a bunch of male physicists and the waiter will come up and say "Well, ladies first!" And she can't explain how frustrating it is to have the entire table be alerted/reminded of the fact that she's the only woman there. It's tough in that situation because she can't comment on how it makes her uncomfortable because the waiter's not being a bad person about it and it'll make her look bad if she brings it up at the table. So she just kind of has to deal with it. It sucks!
Why exactly would this make someone uncomfortable lmao? Bizarre these "diversity and inclusion" people become uncomfortable when they're unique
@@Daniel-ih4zh I genuinely hate when people acknowledge I'm a woman and that I'm special because I'm the only woman around at the moment. Why the hell does my gender matter to you guys, I'm here to do my business like everyone else. I didn't "earn" the trait of being a woman so it seems weird to point it out as if it were extraordinary.
@@vickypedia1308 I 100% agree and understand. But it seems like people are having their cake and eating it when they hold this sentiment while also promoting things like WiStem and AA
@@Daniel-ih4zh I don't know what those terms mean (not a native speaker, so if those are terms where I live they likely have different acronyms). However I would like to add that the people who get uncomfortable when someone highlights that they're "special" for being some sort of minority are usually not the same ones who actively advocate for special treatment. For those who do, it tends to be because they're two different kinds of "special treatment" and one of them feels patronizing while the other doesn't.
Personally, I think we should strive to decrease sexism at the workplace, not force women quotas or other artificial stuff like that. Sexism is more likely to happen in fields that are predominantly pursued by men, simply due to the lack of women who can point out if someone is being sexist. (And even if there are one or two women there, you don't want to be *that* person who complains about something nobody else sees an issue with.) In my opinion, the fix isn't to forcibly try to get women into that field and making a big deal out of it. I would certainly not want to be the token woman who only got in because a company needed to fill a quota. I think we should rather try to make the place feel welcome to *any* person, women included.
@@Daniel-ih4zh You think it's bizarre when people who want to be included as equal participants, get uncomfortable at being singled out for no reason? It's *almost* like you think "equality for minorities" is the same thing as "special treatment". Hmm. Maybe you should reflect more about that.
9:09 Histograms are pretty difficult to use for comparison of single-cell RNA sequencing data. You can also make a variation on the violin plot by making each half of the violin represent two different conditions, eg. a control condition vs. some treatment (eg. how does this drug that inhibits this pathway influence expression levels of this other thing, compared to no drug). The ridgeline plot is one alternative, but it's pretty space-inefficient unless you overlap the data and risk harming readability.
This video was more of a rant than a legitimate analysis of the use cases for the violin-plot. Even the example shown at 10:22 shows just how unreadable overlapping histograms become once you have more than 2. Violin plots are literally just a way of visualizing several histograms at once without making them collide with each other.
About papers 100 years ago. Based on the memoirs of Stephen Timoshenko, it seems that there were special people at universities who prepared plots. You would give them hand-sketched drawings and they would then prepare versions for a paper.
In 1956, Bette Nesmith Graham (mother of future Monkees guitarist Michael Nesmith) invented the first correction fluid in her kitchen. Working as a typist, she used to make many mistakes and always strove for a way to correct them. Starting on a basis of tempera paint she mixed with a common kitchen blender, she called the fluid "Mistake Out" and started to provide her co-workers with small bottles on which the brand's name was displayed.
When I wrote my bachelor's thesis in uni my supervisor insisted that I would use violin plots to show some data. The problem was, the outliers in my dataset were not many but they were far out, like really really far out. So in the plots they often just weren't visible at all but I had to include them to represent the data accurately, at least that's what I was told. In the end the captions for every figure featuring these plots ended up absurdly long because I had to explain what the hell was going on there lest I forget it myself.
So I 100% agree with you that these plots are just bad in every regard.
The bit about these plots looking like genitalia is also true in every regard: because of these outliers some of my medians ended up at the bottom of the plot so that's of course where the belly was located, this however had the fun little side effect of making these particular plots look like a cock and balls. So my supervisor basically insisted that I would draw a bunch of dicks in my thesis. These things are truly terrible, they just always look like genitalia
I would argue including the outliers made your visual less accurate not more.
Maybe we should just call them Rorschach plots
I sometimes have a lot of possible plots I could do, and generating all of them as violin plots is useful because I don't know in advance if my data is bimodal or whatever. And I can put lots of violins next to each other and compare them, unlike histograms.
But sure, I won't put them in my presentations because by then I know the best way to show the data. Fair enough.
Also excellent point about the smoothing! People not understanding their statistical models bothers me a lot.
I've been recommended this video endlessly and eventually realized I'd subscribed from your other stuff. It's wild to me that this doesn't have more attention. Maybe the algorithm just knows me too well. There's as much to learn from your delivery as there is from the content.
There's a strong link between embarrassment and humour, and a lot of embarrassment over taboo topics like anatomy that roughly half of people have (particularly among teenagers and people who haven't got over having been teenagers).
As for the use of references as a form of comedy, the basic idea is "we all laughed at {thing} then; remembering it will bring you to a similar state of mind and probably make you laugh now". If you didn't laugh as a teenager when someone broke taboo, then you're not going to find it funny when people try to evoke the experience as an adult. On the other hand, if you're someone who found Monty Python hilarious, then someone saying "this parrot - " (the pause is essential) is going to remind you of John Cleese screaming at Michael Palin and very likely get at least a smile, if not a chuckle, out of you.
There's also a whole in-grouping thing going on - "you and I share an understanding of this reference, therefore I am a member of the in-group and popular and successful"
People love "chart art." I used to work in finance. There was pressure to replace data tables with charts when possible, even if the charts ultimately distort the data. But the chart makes the report look pretty.
A very common chart that I absolutely hate is the 3D pie chart. The pie chart is already a bad chart, but someone has sabotaged what meaning a pie chart has as the areas have become distorted and are no longer direct representations of the weights.
A 2D pie chart is only bad when there are more than two categories in it. A pie chart with two categories is excellent. You can immediately see whether the fraction displayed is closer to a quarter, half, or three quarters.
For more than two categories, a stacked bar is better.
I mean, you can immediately tell whether X% is closer to a quarter, or a half, or whatever without the chart. Charts should ideally be used to get a snapshot of lots of data--not just to make a list of percentages look fun.
While I also find violin plots a bit hard to read, for something like the paper at 13:30 where they apparently want to compare 7 different probability distributions side-by-side, I'm not sure any other option would be much more readable.
It's probably too many to overlay the probability densities on top of each other (although I agree that's a good option for comparing 2 or 3 distributions). I guess they could do 7 side-by-side histograms or pdfs. 🤷🏻♂️
(By the way, Fig. 3 and 4 you point to aren't actually a histogram of the same thing, they're a bar chart of something else. Note the x-axis isn't numeric, unlike the y-axis of the violin plot. Sorry to nitpick!)
I made a comment saying basically the same thing, the arguments in this video don't actually make sense in the context of the examples given. Even though I have never personally used violin plots before, I am now convinced that they are a very effective way of visualizing many distributions at once without overlap.
That's what I was thinking. But then the Ridge-Line Plot at 21:20 looks like it's probably superior in every way.
As a PhD student in Bio, I was also on the way to say this. I have a lot of overlapping distributions for a lot of conditions. I think one solution is to distill your conditions into the truly necessary ones. Then, I think the ridge-line plot (or a less overlapping version of it) is definitely better than a violin plot
google ggridges
I'll come out and say that I used violin plots in my PhD thesis. I had grain size distributions to display. Histograms have the big problem that you cannot easily put a lot of them on top of each other. I had, I think, ten different samples I wanted to display next to each other (to make them comparable). Furthermore, I wanted to use the median to simplify the further discussion, but I also wanted to show the actual distribution of the particles, as it was important for the behavior I was looking at. The violin plot was a good combination of:
1. The median is visually represented.
2. I do show the actual distribution, so I can discuss the skew, if there is one.
3. I can pack a lot of them next to each other, the reader gets a good visual representation of the different distributions.
4. Yes, I find them visually pleasing, if done right. I did plot them horizontal, so.
5. I think the violin plot is symmetrical for the same reason the box plot is symmetrical. And when I think about a metal grain (which I worked with), it felt like it represents the form of the actual grains.
I did not add another histogram. I did give the smoothing value and normalized the width to one.
Having seen the second part of your video, I am sorry to hear about the unfortunate situation and that this plot makes you and other women uncomfortable. I did not know about this, and it was absolutely not obvious to me. I am sorry for that.
I worked as a statistician in the 90s and into the early 00s, and never heard of a violin plot. Knowing what they are now, I see they are entirely useless.
thanks for explaining.
i used to think I'm somehow stupid for not understanding them.
you are sufficiently compatible with my tastes for long explainer vid host and I am subscribed to you now after watching this.
these tend to either look like genitals or (sometimes) weird turds.
I love your videos. Like why would I care why a certain plot is horrendous? 40 plus minutes later I'm super invested and ready to go on the war path about violin plots.
I'm only 8 minutes into the video, so you may well change my mind before the end, but I have made good use of violin plots in my work. When I've been comparing posterior distributions of multiple parameters from multiple different MCMC chains, the violin plots have been an excellent way for me to tell at a glance what the data is doing, and if there are any severe problems.
Boxplots do not tell you if your posteriors are multimodal, violin plots do, and a histogram with 30 variables is going to be completely unreadable. I don't really care about the precise values of the interquartile ranges, I want to see if chains are converging to the same unimodal distributions. None of this information is for presenting in a paper (I'll give sensible posterior distribution plots there), it's for me (and my collaborators) to understand how well my MCMC is converging, and where the problems are. For that they work pretty well.
Okay, now I'm going to shut up and continue watching to hear what you have to say!
EDIT: all my violin plots were horizontal. It never actually occurred to me what they resembled when viewed vertically.... Also I work in veterinary epidemiology (I'm a mathematical modeller), where the majority are women, and my supervisor is a German woman (also did maths at undergraduate), who has no issues with speaking her mind, so I don't think anyone would be as daft as to joke about it!
ja but if your data set is multimodal, why even use a box plot? Or, like, make a histogram since that's showing the important parts and then put the quartiles in a table or something?
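A rough sketch of that suggestion (a histogram for the shape, quartiles reported as plain numbers), using only the Python standard library. The bimodal sample, the bin range, and the helper names are made up for illustration and are not taken from anyone's actual MCMC output.

```python
import random
import statistics

def quartiles(sample):
    """Return (Q1, median, Q3) using statistics.quantiles with its default method."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    return q1, q2, q3

def histogram(sample, lo, hi, bins):
    """Count values in `bins` equal-width bins on [lo, hi); values outside are skipped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts

rng = random.Random(0)
# Hypothetical data: two well-separated clusters standing in for a bimodal posterior.
bimodal = [rng.gauss(-2, 0.5) for _ in range(500)] + [rng.gauss(2, 0.5) for _ in range(500)]

counts = histogram(bimodal, -4, 4, 8)
q1, med, q3 = quartiles(bimodal)

# The histogram shows the two humps; the quartiles go in a small text "table".
for i, c in enumerate(counts):
    print(f"[{-4 + i:>2}, {-3 + i:>2})  {'#' * (c // 10)} {c}")
print(f"Q1={q1:.2f}  median={med:.2f}  Q3={q3:.2f}")
```

In this toy case the median lands in the nearly empty middle bins, which is exactly the situation where a box plot on its own would mislead.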
I agree with this use case - the shinystan package in R makes good use of violin plots and has saved me a lot of time in evaluating models. But I'd argue that the usefulness of violin plots goes beyond MCMC. Overlaying density plots / histograms is ideal in most situations, but things get incredibly cluttered the moment you have >4 lines to plot. Having multiple panels of densities works - but is essentially a violin plot without the mirroring.
ridge line plots tho
@@qualia765 Ridgeline plots are good and probably the first choice, but violins are particularly good when you want to compare groups across different strata (e.g., geography and income)
Once again, I must commend you, Angela. You've managed to keep my attention through 42 minutes while speaking about a subject I previously had no knowledge of or interest in. That was great.
Great video! Your point about choosing a plot which conveys the most important thing about the data really hits home. Exploring one’s data is so important.
Violin/Bean plots: another example that just because you CAN do something does not mean that you SHOULD.
Finally! Someone else hates these things!
No science, only vibes 🙄 also, 🐱
A picture is worth a thousand words. Dr. Collier's beleaguered sigh is worth a thousand data points.
"it's fine"
It is funny, but my master's thesis was actually one of the first theses at my university to be typeset using LaTeX. Probably the first thesis, actually. And my diagrams were drawn using PostScript. Yes, I wrote the programs to draw the diagrams. Of course, Knuth created the whole digital typesetting thing because the expert typesetters (actual people) were retiring. And the new generation could not do his "The Art of Computer Programming" well enough because of all the diagrams and mathematics. Yeah, we came a long way. I was just one of the people right in the middle of the old and the new. Later, I was using gplot to generate PostScript graphs. It still exists, I believe.
I think the reason the histogram got mirrored is because "symmetry makes it look and function better" (which isn't necessarily true in general, and certainly not true in this specific case, but it feels like a common misconception, though that might also be personal bias because my spicybrain likes symmetry).
Also, the joke is that genitals are funny. Not just AFAB people's genitals, everyone's. I've seen some radio reception graphs that look like a different set of genitals, and had to stifle a giggle. Like, the sexism stuff is definitely real and valid, and there's a time and place for genital jokes, and an academic setting definitely isn't that, and the jokes absolutely age more quickly than some short-lived isotopes, but still.
I've never seen these before. I do think they would make sense in one particular situation. You give an example of temperature data distribution per month (12 separate plots) but suppose instead of months, you want to plot the annual temperature distribution of all the world's capital cities, and you want to put a continuous variable (latitude) on the X axis, with temperature on the Y axis. In that situation, the symmetry of the violin helps centre the data correctly on the X axis. I completely agree that this format should only be used for overview information (but I note that it can work without colour, while certain other types of plots can be difficult to read without colour.)
There are so many data visualizations that just should be wiped from existence. Violin plots are at the top of my personal list; they are outclassed in every way in modern data visualization
Pie chart/Gauges are the top of my list. But for some reason C-suite loves them.
@@BlueSapphyre pie charts are fine.
If I could murder one type of plot it would be the pie chart. Not because they're worse than violin plots, but because they are more prevalent.
@@hyphenatednick and often used in the wrong way.
@@PBMS123 only if used properly.
I totally agree about the "feminism" point. The level of defensiveness, especially in STEM is insane.
Re: the plot: I think they doubled it assuming we'd get a more intuitive sense of the "area" corresponding to an increased histogram value. Still a shitty plot
Yes, the feminist pivot was a disqualification. Could have started with "the useless plots look like a ****", that would cover it. But dipping in the intersectional victimology half way?
@@peterpeterson8792 "Intersectional victimology" is a loaded way of saying "shared how it made her feel".
Maybe you don't care about that part. She identified that section pretty clearly.
If hearing how she feels makes you feel some kinda way, that's for you to examine.
@@bbqchezit Do you realize that the hypothetical offensive situations never happened, she went on freeflow of inventing nonsense about some hypothetical men making a stupid joke and how she would feel if that happened. And then "allies" jumped in here ready to pre-save poor pre-victim of her imaginary pre-situation of a pre-bad-taste infantile joke she imagined that surely would traumatize her forever. Oh, "allies" feel she is already traumatized by her own imagination? And sure, we should feel for her trauma and cancel the plot? Thus, from disliking the plot she figured if she comes up with a me-too victim imaginary situation, and how horribly she would feel about it, that sure will erase the graph. By the way, unless one has no spatial (2-d spatial!) imagination, there are valid and well demonstrative applications for this plot, perhaps not for her data.
I wonder how leaves make her feel? Ever thought of it? Think of it, violin plots or ****** if you wish, all over the forests, trillions of them? I never thought violin graph looked like a ****** until this chick made a stink about it. And still don't. Any other offensive shapes, circles perhaps? Just get real, get therapy if you need, and don't ask others to participate in your manipulation by your imaginary issues that "make you feel".
I don't know if anyone has mentioned yet, but a population pyramid comes to mind as a use case where it makes sense to have both the mean or median and quartiles, but the overall shape of the distribution is still important and useful. Mind you, I'm typing this at the 9:20 mark, so maybe that does come up in the video at some point and I'm just jumping the gun.
The population pyramid could instead be two stacked histograms. This would make it much easier to compare the male and female populations at a given age. However, it wouldn't be a pyramid, so you would no longer be able to use your copy of _Demography_ to keep your razor blades sharp.
But those convey so much more information. Each bin is labeled and usually the left and right are not exactly symmetrical, the left and right are assigned to male and female population. So say there was a particularly deadly war, you can expect a bigger dent on the male side between certain ages than on the female side for the same ages. The violin plot fails in all of these points, it isn't labeled, it has no bins, it's smoothed so it becomes even more vague, and it's symmetrical for no reason.
@@3snoW_, oh sure, it's not a violin plot, and I wouldn't want it to be one. I just mean that it's a histogram where it probably wouldn't hurt having box plots within them (one per side), so you could quickly compare the median and quartile ranges of male vs. female populations, assuming you wanted to force that into a single visualization and you didn't want to set aside space for a table or something
No idea about actual science professionals, but the r/dataisbeautiful guys probably see the plot and the symmetry reminds them of audio data visualization and they think that's fancy, and that's all the thought they spend on it.
Starting my second year in college with the goal of graduating with BA in mechanical engineering. Found your videos accidentally! Love them so far! Thank you for sharing.
😂😂😂 OMG, when I first saw these violin plots I had the same thought. I feel validated that someone else had the same thought, because I felt that my profession was slowly corrupting my brain - I'm an OB/GYN. Thank you for providing content for me to watch on my post-call day. It sucks being a woman in a science based field, and unfortunately this bias has seeped its way over into medicine as well (shocker). This is why representation matters, but we must also continually actively dismantle the patriarchy
I have used violin plots and liked them. You have convinced me that I was wrong.
Oh really? Where were you Jan 6th !?
@@ubahfly5409 what does this mean dawg
@@aidanjimenez9343 I think they were (jokingly) calling you a terrorist....
We all make mistakes. But not all of us admit to them publicly
@@gebali give me a sec and I'll make a violin plot of that.
i love watching your videos while i procrastinate on my physics 101 homework. makes me feel like i'm actually working
You've become my favorite lol. I used to complain to my bf about how anytime you present data online people always point out "Correlation doesn't equal causation" as if that dismisses all the data or is a complete argument. When you referenced that a few videos back I melted.
I'm so ready for Schrodinger's cat!
What's this? A 40-minute video? Ranting about something very specific and esoteric? Subscribed.
I love these kooky patterns too!
I first ran into this in Trello, where the labels (categories) are colored rectangles; but that sucks for accessibility. So they have Color Blind Friendly mode where they add distinct patterns to the colors. Everything should do this! I'm not colorblind and it helps SO MUCH!
I've often seen them used in colorless graphs. It might be popular in Ecology.
Violin plots are just Georgia O'Keeffe paintings for STEMlords
that is so spot-on 🤣
i came to this comment section to make this joke and now i feel unoriginal
actually there IS a typewriter that can erase stuff and uses some weird ass material science I don't know about to kinda suck the ink out after you write it, my good ol' technology connections has a whole video about corrections in typewriters
I assume whiteout tape originated in typewriters .. as an auxiliary ribbon.
People used whiteout out of a little bottle.
@@pmcgee003 There was whiteout tape that came in dispensers like Scotch tape. There was a way to shift the ribbon out of the way, and you'd backspace over the mistake and type it again while sticking the whiteout tape between the type hammer and the paper. That would type over it in white. My mom used this back in the 70s.
Later, there were typewriters that had something like this as an auxiliary ribbon. Actually, more often they would use a carbon-based regular ribbon and the correction ribbon could literally lift the stuff off of the paper when you typed over it.
@@MattMcIrvin yeah, the IBM Selectric golfball typewriter was a beast to try moving around the office.
@@MattMcIrvin
Ah, interesting!
i just looked at my "brother AX310" electric typewriter. (heirloom)
It seems to have a translucent correction tape, if anyone cares i might try to find out if that is how it works 😅
Typewriters like the Selectric were more like letraset than ink. They erased by using a sticky tape that literally peeled the “letraset” character off the page.
Thanks for putting the last part in
Hospital business analyst here... the reason I use violin plots is to specifically highlight the fact that the data is wonky and it cannot accurately be represented by an average... despite it being reported out that way on ten prior occasions
That’s what I was thinking the whole time. In corporate settings, you’re not just trying to explain your data obviously, but often have to show why the counter side’s “data” is misleading if not outright deceptive.
Finally, someone discusses this very topic! Thank you.
You bring up a lot of really good points in this video! I (maybe guiltily, now) honestly really like the idea behind violin plots, but I also feel like the standard way they are constructed in most literature does not do them justice. I’m an empirical economist, and I can’t tell you how many times I’ve made a box plot and wished to have more granular information about the statistical distribution I’m looking at, while still keeping the feel/presentation of a box plot. I think violin plots could be a lot better if the smoothed kernel densities were just replaced with cleanly-drawn histograms with identifiable bins. That way, we could know what we were looking at, and it wouldn’t look vaguely sexual… which, let’s be honest, is the first thing everyone thinks when they look at these graphs. Also, I agree with your sentiment of “why has the aesthetic choice been made to reflect the densities over the y-axis?” It doesn’t make sense to show the densities twice. The one-sided violin plots should honestly be the standard, if anything.
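To make the smoothing concern concrete, here is a small sketch in plain Python of how the kernel bandwidth decides what shape you end up seeing. The sample, the bandwidth values, and the function names are all assumptions for illustration; plotting libraries pick their own default bandwidths, which this does not try to reproduce.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density

def count_modes(density, grid):
    """Count strict local maxima of the density evaluated on the grid points."""
    ys = [density(x) for x in grid]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])

rng = random.Random(1)
# Hypothetical two-humped sample.
sample = [rng.gauss(-1.5, 0.5) for _ in range(200)] + [rng.gauss(1.5, 0.5) for _ in range(200)]
grid = [-5 + 0.1 * i for i in range(101)]  # evaluation points from -5 to 5

modes_narrow = count_modes(gaussian_kde(sample, 0.4), grid)  # modest smoothing
modes_wide = count_modes(gaussian_kde(sample, 3.0), grid)    # heavy smoothing

print("modes with bandwidth 0.4:", modes_narrow)
print("modes with bandwidth 3.0:", modes_wide)
```

With heavy smoothing the two humps merge into one, and nothing in the drawn outline tells the reader which bandwidth was used. A histogram with visible bins at least puts that choice on the page.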
I personally like raincloud plots, since they give you the granularity of including the box plot with all datapoints + the kernel densities. They can take up more room though, so including a lot of them all in the same figure can be challenging and overly complex.
Such a good video!❤❤❤
Now I need to show this to my GF, I hope it changes her mind, she uses these violin plots a lot in her phd
My theory: The inventors of the violin plot were *literally trolling,* and call it internally the *pussy plot.*
I would argue that in the era of information, violin plots have an important purpose. I get that many dislike them, and I appreciate why. It's annoying to take a ruler and run it across a list of plots to figure out the values. But 99.9% of people nowadays are not actually doing knee-deep research that requires precise fact-checking for these plots. Also, usually whoever made the plot has also published their data so you can just make your own plots or use the data directly, which is actually way more accurate than eyeballing some numbers if you're actually doing math for something in your paper.
And violin plots have a very useful feature, skimmability. They reduce a plot down to its basic shapes and make it so the human brain, upon skimming, can quickly identify "hey, this visualized 2D object has three humps, whereas this one has two. And this one looks just like a ball. These are fundamentally different samples!" Instead of just showing you the curve of the shape (like a normal histogram), the violin plot also gives you the ability to "feel" the volume of the displayed "object" at a specific point. This is useful because our monkey brains are better at identifying and classifying objects than curves. As an example, what might look like a slightly more sudden downturn (but no big deal) in a histogram could give you a "wow, this sample is tapering out like crazy at the bottom!" effect. Also you can often just throw violin plots on top of your box plots and it will just give readers a better understanding of the data than pure box plots.
However, I do agree that smoothing diagrams so they look pretty can be quite obnoxious and sometimes even obstructive to science. And it happens most often with violin plots, since they give more of a "wow" effect if you smooth them like crazy. Additionally, I think violin plots really only "work" with certain types of small data (i.e. just look at these three humps). If there are too many metrics, it becomes a nightmare to read and interpret violin plots (as you have stated) and it should just be thrown in a different diagram. Basically, if your violin plot looks more like a weird snake than a violin, you should rethink how you're presenting your data.
And honestly, I do just want more research on these plots and their effects on scientific research. If you can get a paper together that actually proves violin plots are trash and obstruct science, I would love that. It would help me accept that everything in this comment (so far) is just a subjective personal experience from me and my professors, and I'd be more inclined to never use one ever again. My counterarguments to the video are subjective experiences, after all. Also, then I could cite you, which would "appreciably increase your market value" as they say on LinkedIn, I think. Oh, and I'd love to see the feedback of the people I've worked with who have conducted extensive research primarily relying on violin plots. I'm cool with sharing videos like this with my friends, but I'd rather give my professors something more academic, you know.
Addendum: Wow.. I don't have a vagina and I leave thoughts about genitals at home when doing research, I can't believe I never realized they look like . Yeah, that will actually make me think twice about using them... Thanks to the creator for the story and transparency, it must suck to put themselves out there to this extent. And I'm glad they decided to ignore the idiots who go "huhuhuh va" at work. Especially since many of those people will likely see this video, potentially even including that f-ing guy. The "joke" is so thoroughly unfunny that it made me almost laugh at how badly the guy conducted himself. I think it's best to just pretend those people don't exist, in most cases. I really hope some of his coworkers or friends gave him a serious talking-to after the presentation... If any reader experiences this kind of behaviour from someone they know, please just sit them down in private and tell them it's not ok.
That said, as many commenters have mentioned before, something looking kinda suggestive does not mean it is wrong to use scientifically. It's just really nice to have a box plot with extra data about a specific metric sometimes. Sometimes you just wanna be able to directly compare like 10-20 different f-ing plots without having them overlap and look like a horrible unreadable cluster-f. Especially if you expect the reader to find something you didn't expect. What if someone finds a cool association that would be hidden behind the back layers of the fancy 3D layered histogram? You cannot seriously tell me you can clearly see the beginning of each distribution at 21:00...
Also, accessibility! Not everyone can see colours, colourblind people exist. That's why the one histogram in the beginning had kooky patterns, colourblind people wanna understand data distributions too. Also, I know it's the digital age, but many researchers like printing out their important references and notes. As a starving student, it costs soooo much more to print something in color rather than greyscale. I implore you to take your "useful diagram" at 20:40, print it in greyscale and explain to someone what it means. Or better yet, just ask your red-green colourblind friends to read it. In this case, not even kooky patterns can save you, because you want them all to overlap! Not every design decision is good, many of them have problems. But I would say violin plots have a clear reason to exist, even if that reason is a bit nuanced and overlooked.
And yeah, I don't like making women uncomfortable, but anything is suggestive if you look at it too much. I think most of us can agree that the awkward situation was created by that one weirdo, not by the plot. However, I hold hope that perhaps one day some psychology student will set out to prove me wrong on that. Until then, I will probably keep using violin plots every now and then
I've been watching this video thinking "Mmm, I don't think I'm seeing where Angela is coming from here. They aren't that bad."
...until the bean plot at 23:07. An absolutely chaotic dumpster fire of borderline illegible meaninglessness. I get it now. I'm revolted that it took me this long.
The big problems are 1) there is no scale provided for the frequency of the distribution and 2) if you want to compare how two distributions differ, you need them overlaid on top of each other, but violin plots are presented side by side instead of overlaid (you could get rid of the box plot in the middle and then overlay them, but then that just makes them harder-to-read, unnecessarily-smoothed histograms).
@@inafridge8573 I didn't fully consider point 2. Imagining trying to do this disgusts me. Thanks for your reply! :)
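One way to address both points at once, sketched in plain Python: put the two samples on the same bin edges and print the relative frequency for each bin, so there is an explicit scale and the groups line up row by row. The two groups and the bin edges are invented for the example.

```python
import random

def relative_frequencies(sample, edges):
    """Fraction of the sample falling in each bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break  # values outside all bins are simply not counted
    return [c / len(sample) for c in counts]

rng = random.Random(2)
# Hypothetical groups: same spread, group B shifted one unit to the right.
group_a = [rng.gauss(0, 1) for _ in range(300)]
group_b = [rng.gauss(1, 1) for _ in range(300)]

edges = [-4 + i for i in range(10)]  # shared unit-width bins from -4 to 5
freq_a = relative_frequencies(group_a, edges)
freq_b = relative_frequencies(group_b, edges)

# Same bins, same scale: each row compares the two groups directly.
for i in range(len(freq_a)):
    bar_a = "#" * round(freq_a[i] * 40)
    bar_b = "*" * round(freq_b[i] * 40)
    print(f"[{edges[i]:>2}, {edges[i + 1]:>2})  A {freq_a[i]:.2f} {bar_a:<20} B {freq_b[i]:.2f} {bar_b}")
```

Because both groups share bin edges and a numeric frequency, the shift in group B can be read off directly instead of being guessed from two side-by-side outlines.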
7:38 as soon as you explained this I literally shouted, out loud, "WHY?" I had one term of high school statistics and one term of quantitative anthropology. I am not an expert in this. I am, however, able to understand when something takes everything that is good about two things and removes it, leaving only the useless stuff. This is that.
*me, struggling through learning data analysis*
time to get my opinions validated about this dumbass plot
ok, I had to stop watching due to being a parent and didn't come back for a long time. whoops. I can safely say ALL my opinions were validated, except that... every time, I think of dangerous butt plugs rather than genitalia, which is even worse, imo?
regarding why they mirror the curve about the axis, I think 1. some people are unnaturally obsessed with bilateral symmetry, and 2. mirroring the curve makes the changes in the data more dramatic, and this is a plot for the wholly unsubtle.
The comedy is pure gold. You're highlighting a good point that sometimes people go for "vibe" more than usefulness in published papers, and that we should scrutinize our charts more. In particular, I agree we should handle graph smoothing with more care and deliberation. However ... the violin graph is not that bad ... and I think 3D charts are worse (thank god they're getting out of fashion).
Anyway, this is the first video from your channel that I've seen, but I dig the content so you got another subscriber!
ridgeline plots, though... damn that's fine
I have deep respect for your pivot into the more whimsical hot-takes lately. This is great and hilarious.