What is a violin plot?
A violin plot visualizes a set of numerical values together with their probability density at each value. Violin plots are great if you want to look at the values for a category and analyse the highest, lowest and most probable value. These values can also be compared across multiple categories. Violin plots look beautiful and can be plotted horizontally or vertically.
There are several tools that allow you to build a violin plot, most of which require coding.
I have been learning about violin plots and how to make and style them using Flourish. In my experience, it is one of the few tools that lets you build a violin plot easily and quickly without writing code.
Violin plot interpretation
How do you interpret a violin plot? While they may look a bit overwhelming at first sight, violin plots are easy to read. I created a graph that shows the anatomy of the violin plot. The top, bottom and middle of the violin are the highest, lowest and middle value respectively, while the widest part of the violin marks the value with the highest probability density, in other words the most likely value. The widest part can appear anywhere along the height of the violin: close to the highest, the lowest or the middle value.
Unlike bar charts, violin charts can start at a number higher than 0. With a bar chart, if you don't start at 0 your visual will be significantly skewed, because the length of each bar encodes its value. A violin encodes its values by position along the axis, so you can start the axis just a little below your lowest value. Here is an example of the same chart plotted starting from 110 and starting from 0.
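To make the anatomy concrete, here is a minimal Python sketch of the numbers a violin's shape encodes. It is not how Flourish draws the chart. The sample values, the function names, the choice of the median as the "middle" value and the rule-of-thumb bandwidth are all assumptions made for illustration.

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth):
    """Estimate the probability density at each grid point using Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def violin_anatomy(values, grid_points=101):
    """Return the lowest, middle, highest and most probable value of a data set."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("need at least two distinct values to estimate a density")

    # Assumption: a normal-reference rule of thumb for the kernel bandwidth.
    bandwidth = 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)

    # Evaluate the density on an evenly spaced grid from lowest to highest.
    step = (highest - lowest) / (grid_points - 1)
    grid = [lowest + i * step for i in range(grid_points)]
    density = gaussian_kde(values, grid, bandwidth)

    # The widest part of the violin sits where the estimated density peaks.
    most_probable = grid[density.index(max(density))]

    return {
        "lowest": lowest,
        "middle": statistics.median(values),  # assumption: middle = median
        "highest": highest,
        "most_probable": most_probable,
    }


# Made-up sample values, for illustration only.
sample = [112, 115, 118, 118, 119, 120, 120, 121, 121, 122, 130, 140]
print(violin_anatomy(sample))
```

In this sample most values cluster around 120, so the density peaks there and the violin would be widest near the bottom of its range, even though the highest value is 140.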
Both variations can work.
When to use violin plots
Violin plots can be used for many different data sets. The goal of a violin plot is to show the range of values for a category, the probability density across that range (in other words, which value is most likely), and how this compares to other categories. Violin plots work for comparing categories and for showing trends over time, as long as the point is to compare how values are distributed.
Violin plots work best when the values are of a similar size across the categories you are comparing and the goal is to find the probability density for each category. If there are big discrepancies between the data values, it might be better to use a bar chart.
Violin plot examples
I have built a couple of examples of a violin plot using Flourish.
This first example shows a trend over time, comparing temperatures for each month of 2020 in Colorado. Colorado is known for extreme weather changes on a daily basis. What I am trying to show with this chart is not only the temperature trend, which we expect to be low during the winter months and high during the summer months, but also the range of temperatures we experience within each month, and at what time of year we can expect the biggest temperature swings.
I have used Flourish stories to build this with an animation:
I also looked at ways to apply a violin plot to web analytics data. Here is a violin plot looking at a month's worth of data and comparing the number of visitors to a website in the morning, during lunch time and in the afternoon. What this visual aims to show is the most probable number of visitors to the website and whether the highest numbers occur in the morning, at lunch time or in the afternoon. This differs from a bar chart of total numbers because it also shows the distribution of visitor numbers and which number of visitors we are most likely to see for each time of day. This can serve a purpose in digital marketing by informing the timing of campaigns, including paid media and email.
In Flourish you can choose to include a beeswarm chart within your violin plot if you are plotting categorical and numerical data. Here is the same violin plot looking at website visitors by time of day, but with the data points grouped differently.
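The difference between a bar chart of totals and a view of the distribution can be sketched in a few lines of Python. The visitor counts below are hypothetical, invented only to illustrate the point, and the `summarise` helper is my own naming rather than anything from Flourish.

```python
import statistics

# Hypothetical daily visitor counts for one week, invented for illustration only.
visits = {
    "morning": [40, 42, 45, 41, 43, 44, 45],
    "lunch": [10, 15, 20, 45, 60, 70, 80],
    "afternoon": [30, 35, 40, 45, 50, 50, 50],
}


def summarise(counts):
    """Return the total (what a bar chart shows) plus basic distribution figures."""
    return {
        "total": sum(counts),
        "lowest": min(counts),
        "median": statistics.median(counts),
        "highest": max(counts),
    }


summaries = {time_of_day: summarise(counts) for time_of_day, counts in visits.items()}

for time_of_day, summary in summaries.items():
    print(time_of_day, summary)
```

All three times of day have the same total, so a bar chart of totals would show three identical bars. The lowest and highest values tell a different story: morning traffic is steady, while lunch time traffic swings widely from day to day. That spread is what the violin shape makes visible.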