By Amir Netz, Technical Fellow and Mey Meenakshisundaram, Product Manager
Numbers tell the story. But when your data comes from diverse data points and sources, a single aggregation that stands in for the whole range of numbers often does not tell the full story.
Showing averages over time or across a series of data often lets us answer questions like: how long did the app take to load on a mobile device? To answer this question, we would most commonly find all data points for the day and then compute the average. While the average is often a useful metric, by itself it is a lossy compression of the data. What if a sizable number of customers are experiencing slow load times even though the average is within our expectations? Imagine a dataset that shows the app took 300ms to load on average. We may be happy with that metric, but what if every now and then it takes 6000ms to load? The 300ms average hides that alarmingly bad experience for a sizable customer base. This is where other metrics, like the median and the 95th percentile, come into play and give us a better understanding of the data.
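To make the point concrete, here is a minimal Python sketch that compares the mean, the median, and the 95th percentile. The load times are made up for illustration and are not real measurements; they are chosen so that the mean comes out at 300ms.

```python
import statistics

# Hypothetical load times in milliseconds (not real measurements):
# 90 fast loads and 10 slow ones.
load_times_ms = [100] * 90 + [2100] * 10

mean_ms = statistics.mean(load_times_ms)
median_ms = statistics.median(load_times_ms)
# quantiles(n=100) returns the 99 cut points between percentiles;
# index 94 is the 95th percentile.
p95_ms = statistics.quantiles(load_times_ms, n=100)[94]

print(f"mean:   {mean_ms} ms")
print(f"median: {median_ms} ms")
print(f"p95:    {p95_ms} ms")
```

With this data the mean lands on 300ms, the median shows that a typical load is much faster, and the 95th percentile exposes the slow tail that the average hides.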
Half a century ago, one mathematician thought outside the box to solve this problem and came up with the box plot. In his words, the greatest value of a picture is when it forces us to notice what we never expected to see, and the box plot does exactly that.
The box and whisker plot lets us look more deeply at a number of different things in a data series. We can see outliers, clusters of data points, and differences in the volume of data points between series, all things that summary statistics can hide. A box and whisker plot uses simple glyphs to summarize a quantitative distribution with five values: the smallest value, the lower quartile, the median, the upper quartile, and the largest value. This summary lets the viewer easily recognize differences between distributions and see beyond a standard plot of mean values.
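The five values behind a box and whisker glyph can be sketched in a few lines of Python. The function name and the sample data are hypothetical, and the quartile convention used here (the standard library's "inclusive" method) is only one of several in common use, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population; other
    # quartile conventions give slightly different cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical sample data, for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(five_number_summary(sample))
```

The box spans the lower to the upper quartile, the line inside it marks the median, and in the classic form the whiskers reach out to the smallest and largest values.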
This week we have two Box and Whisker submissions to the gallery: one from Brad Sarsfield and another from Jan Pieter Posthuma. Thanks to both of them for producing this very important visual and publishing it to the gallery.
In Brad’s chart, every data point is plotted as a circle on the axis, which lets us visualize the distribution of the data points. The chart treats the top and bottom 5% as ‘outliers’, colors them red, and marks the ‘whiskers’ at those points, the 95th and 5th percentiles. You can also adjust these percentile values to meet your needs. In this chart, you have to explicitly choose ‘Don’t summarize’ in the Values bucket to view each series and data point.
The one from Jan Pieter uses a category to color the boxes. It has a second ‘Samples’ category that holds the different sample results of one experiment group, and the values are aggregated at this second level. If you want to treat each data point separately, add a column that has a unique value for each row and put it in the Samples bucket.
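As a rough illustration of percentile-based whiskers, the Python sketch below places the whiskers at adjustable percentiles and flags everything outside them as outliers. This is not the visual's actual code: the function names, the nearest-rank percentile rule, and the sample data are all assumptions made for the example.

```python
import math

def percentile(ordered, p):
    """Nearest-rank percentile of a sorted, non-empty list (0 < p <= 100)."""
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def whiskers_and_outliers(values, low_pct=5, high_pct=95):
    """Return (low whisker, high whisker, points outside the whiskers)."""
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    outliers = [v for v in ordered if v < low or v > high]
    return low, high, outliers

# Hypothetical data points: the integers 1 through 40.
data = list(range(1, 41))
print(whiskers_and_outliers(data))          # whiskers at the 5th and 95th percentiles
print(whiskers_and_outliers(data, 10, 90))  # wider outlier bands
```

Moving the percentiles inward, as in the second call, marks more points as outliers, which mirrors the effect of adjusting the percentile values in the chart's settings.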
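The effect of the second grouping level can be sketched in Python as follows. This is an illustration only, not the visual's implementation: the function name, the row layout, and the choice of the mean as the aggregation are assumptions.

```python
from collections import defaultdict
from statistics import mean

def values_per_category(rows, aggregate=mean):
    """Aggregate values per (category, sample), then collect them per category.

    rows: iterable of (category, sample, value) tuples.
    aggregate: assumed aggregation; the mean is used here for illustration.
    """
    groups = defaultdict(list)
    for category, sample, value in rows:
        groups[(category, sample)].append(value)
    per_category = defaultdict(list)
    for (category, _sample), values in groups.items():
        per_category[category].append(aggregate(values))
    return dict(per_category)

# Hypothetical rows: (category, sample, value).
rows = [("A", "run1", 10), ("A", "run1", 20), ("A", "run2", 30),
        ("B", "run1", 5), ("B", "run1", 7)]
print(values_per_category(rows))

# Using a unique key per row as the sample keeps every data point separate.
unique_rows = [(cat, i, val) for i, (cat, _s, val) in enumerate(rows)]
print(values_per_category(unique_rows))
```

In the first call, rows that share a sample collapse into one aggregated value before the box is drawn; in the second, the unique key means each row contributes its own point.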
Here is the video from Brad
Make sure to mark the aggregation as ‘Don’t summarize’ in the Value bucket for each series.
In the formatting section, you can also specify the percentile for each of the quantiles.
Here is the one from Jan.
To use it, simply download the Box and Whisker chart from the visuals gallery and import it into your Power BI report.
Here are the links to Brad’s and Jan’s Box plots.
You can also download the sample pbix file attached to this post.
As usual, we can’t wait to hear your thoughts and your ideas for improvements.