The point of data visualizations #
The point of a data visualization is to communicate meaning found in data.
Whenever you approach a subject, it is worth asking what you are trying to do. This website explores the design of data visualizations, and we have already discussed the purpose of design, so now is the time to ask what we are trying to design. In other words: what is the point of a data visualization? We will start by dispelling some myths surrounding data visualizations, so let us first discuss what they are not.
What data visualizations are not #
Data visualizations are not for communicating numbers #
Since data visualizations are, almost by definition, reporting quantitative values, it is tempting to suppose that they are intended to communicate numbers. However, this is simply not the case—or at the very least, it can be easily shown that they are not the best tool for the job.
Consider the data visualization shown below. Suppose I told you that each of these points represented numbers to 3 decimal places of precision. Do you think you could accurately extract these values, even if given quite a bit of time? Perhaps the best approach would be to try to digitize the plot and extract information from the pixel position, but even then there will be some ambiguity, and so it is unlikely that you will be able to extract the numbers with the precision to which they are known.
IMAGE OF SCATTER PLOT.
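To put a rough number on the digitizing argument, here is a minimal Python sketch. The axis range (0 to 10) and the plot width (500 pixels) are assumptions chosen only for illustration, not properties of the plot above, and the function names are hypothetical.

```python
def value_per_pixel(axis_min, axis_max, axis_pixels):
    """Data units covered by a single pixel along one axis."""
    return (axis_max - axis_min) / axis_pixels


def pixels_needed(axis_min, axis_max, decimals):
    """Pixels required so that one pixel covers one unit of the last decimal place."""
    return int((axis_max - axis_min) * 10 ** decimals)


# Assumed, illustrative plot geometry: an axis from 0 to 10 drawn 500 pixels wide.
resolution = value_per_pixel(0.0, 10.0, 500)
print(f"one pixel spans {resolution:.3f} data units")
print(f"pixels needed for 3 decimal places: {pixels_needed(0.0, 10.0, 3)}")
```

Under these assumptions, a single pixel covers 0.02 data units, which is twenty times coarser than the third decimal place. Marker size and anti-aliasing would only add to the ambiguity.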
On the other hand, we have a ready tool for reporting numbers with great precision: the table. Consider the table shown below. From this tool, it is almost laughably trivial to extract the numbers with the desired precision. Thus, it is clear that, if we simply wanted to communicate numbers, we should use a table, not a data visualization.
IMAGE OF TABLE.
This realization suggests that, when we are designing data visualizations, we should not be making decisions based on a desire to report numbers with precision. This theme is explored in many posts on this site, such as the discussions of [axes], [tick marks], and [grid lines]: elements that, at first blush, appear to be intended for reporting precision, but perhaps serve other, more important roles.
Data visualizations are not for establishing validity #
Another misconception is that “showing the numbers” lends validity to the data. It is true that a good data visualization can be compelling, and can make people forget to question the validity of the data. However, showing a data visualization does nothing to establish that validity. While a trained eye can spot bad data in a data visualization, it is equally true that a trained person can readily fabricate data that seems quite compelling. Consider the images shown below, which contain completely fabricated data regarding xxxx or the time-resolved luminescence decay from a dye molecule. Both of these would be challenging for an expert to spot as fabricated.
IMAGES
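As a rough illustration of how little effort a plausible-looking fabrication takes, here is a minimal Python sketch that generates a noisy exponential decay. It is not how the images above were made. The function name, the 5 ns lifetime, the amplitude, and the Gaussian approximation to counting noise are all assumptions chosen for illustration.

```python
import math
import random


def fabricate_decay(lifetime_ns=5.0, amplitude=1000.0, n_points=50, dt_ns=0.5, seed=0):
    """Return (times, counts) for a made-up single-exponential decay.

    Every parameter here is hypothetical; nothing is measured.
    """
    rng = random.Random(seed)
    times, counts = [], []
    for i in range(n_points):
        t = i * dt_ns
        ideal = amplitude * math.exp(-t / lifetime_ns)
        # Approximate counting noise as Gaussian with standard deviation sqrt(ideal).
        noisy = rng.gauss(ideal, math.sqrt(ideal))
        times.append(t)
        counts.append(max(0, round(noisy)))
    return times, counts


times, counts = fabricate_decay()
for t, c in list(zip(times, counts))[:5]:
    print(f"{t:4.1f} ns  {c:5d} counts")
```

Nothing in the output marks these numbers as invented, which is the point: the plot alone cannot vouch for the data.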
The reason to discuss how data is acquired, examined, analyzed, and reported is not to establish the validity of the data, but the validity of the methods. It is essentially a matter of faith to believe the person reporting their work is telling the truth. However, if we decide to believe this, then all the rest of the data reporting allows us to judge whether we believe their analysis is correct: “If their data did look like this, then does their next step make sense?”
This last point begins to approach the main point of a data visualization, and so we turn to this now.
Data visualizations are for communicating meaning #
The main point of a data visualization is to show some meaning in the data.$^\dagger$ There could be many different kinds of meaning; below, we consider just three.
Data visualizations can communicate trends (or lack thereof) #
Below is a table of numbers showing the median income in the US between x and x. A question a person might naturally ask is this: has the median income increased, decreased, or stayed the same during this time? Looking at the numbers, you might be able to reach a conclusion regarding this question, but it does require some time.
IMAGE of table, clickable to get the data vis.
Now, if you click on the table, it will show you a plot of this same data as a [line chart]. Doing so immediately reveals the answer to this question. This is the power of a data visualization.
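The same contrast can be sketched in a few lines of Python, using a text-only sparkline as a crude stand-in for a line chart. The `sparkline` helper is hypothetical, and the values are made-up placeholders, not the income data from the table.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled between min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BLOCKS[0] * len(values)
    return "".join(
        BLOCKS[int((v - lo) / span * (len(BLOCKS) - 1))] for v in values
    )


# Hypothetical placeholder values, NOT the income data discussed above.
series = [50.1, 50.4, 49.8, 51.2, 52.0, 52.9, 53.5, 54.8]

# The "table" view: each number must be read and compared in turn.
for index, value in enumerate(series):
    print(f"{index:>2}  {value:.1f}")

# The "plot" view: the overall direction is visible at a glance.
print(sparkline(series))
```

Even this very coarse picture makes the direction of the series easier to see than the column of numbers does.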
Data visualizations can communicate comparisons and similarities #
Of course, we might not care about the trend in data in isolation. For instance, the change in income is really only meaningful in the context of the change in expenses. Thus, we can compare our change in median income (green numbers below) against the change in median housing costs (red numbers below).
IMAGE of table, clickable to get the data vis.
Examining this table, you can probably tell that housing expenses are increasing, but how does this increase compare to the change in income? It can be hard to get a feeling for this from the straight numbers.
Clicking on the table above will reveal a plot of this data, where it is again immediately obvious how these trends compare.
Data visualizations can communicate shapes #
Beyond trends, data can have shape. Consider, for instance, a set of measurements of nanoparticle sizes. These are shown in the table below. A natural question to ask is whether the data follows a Gaussian distribution or fits some other distribution shape. Moreover, one might like to know if the population is multimodal.
IMAGE of table, clickable to get the data vis.
It is extremely challenging to determine this by looking at the table. However, if one plots a [histogram] of this data, then the answer is again immediately clear. You can see such a plot by clicking on the table.
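As a minimal sketch of what a histogram does, the Python below bins a list of values and prints one bar per bin. The `bin_counts` helper is hypothetical, and the sizes are synthetic numbers drawn from two assumed populations (around 10 nm and 20 nm), not the measurements in the table.

```python
import random


def bin_counts(values, start, bin_width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // bin_width)
        if 0 <= index < n_bins:  # values outside the binned range are ignored
            counts[index] += 1
    return counts


# Synthetic, hypothetical sizes in nm: a mixture of two assumed populations.
rng = random.Random(0)
sizes = [rng.gauss(10.0, 1.0) for _ in range(60)]
sizes += [rng.gauss(20.0, 1.5) for _ in range(40)]

start, bin_width, n_bins = 5.0, 1.0, 22
for i, count in enumerate(bin_counts(sizes, start, bin_width, n_bins)):
    left_edge = start + i * bin_width
    print(f"{left_edge:5.1f} nm | " + "#" * count)
```

Printed as a flat list, these hundred numbers would say little about their shape. Binned, a multimodal population shows up as separate clusters of bars.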
Good communication of meaning requires bias #
This might be the most controversial statement on this entire website, but for now I stand by it.
It is important to note that, by bias, I do not mean “lying.” By bias, I do not mean “misrepresentation.” Instead, I mean “helping people see what you believe is there.” I mean “biasing a user's perception of the data to show what you see.” Let us unpack this some.
I consider a good data visualization a bit like good teaching or good argumentation. Thus, a data visualization is, in many respects, a bit of visual rhetoric. The point of rhetoric is to help someone understand your position and to persuade them to your position. This is a tool used all the time in teaching. When I teach, I am trying to help students see what I see in the world.
There is a pernicious idea that a good data visualization has “no bias,” that it is somehow an objectively neutral representation of the data. Leaving aside that such a thing patently cannot exist, one should ask: “should we even strive for that?”
Let’s return to the idea of teaching again. When I teach, I do not gather all the facts I know about a subject, write them down on flash cards, then shuffle the cards and give them to students. Instead, I carefully consider these facts, consider what I think they say about the world, and then organize their presentation such that students can understand how I got there and why I think this is a compelling view.
IMAGE: Randomly organized ideas. Data visualization that is totally neutral (only black and white, vs one that tells a story)
Now, good teaching should also give students sufficient knowledge to understand how this position was reached, so that they can also judge if it is sound. In the same way, a data visualization should strive to show another person what you have found in the data, while presenting this in a way that lets them judge whether they agree with the position or not. But a data visualization is always in service of reporting meaning,$^\dagger$ and for that reason, we should strive to make this meaning as clear as possible. In other words, we should make it easy for the viewer to reach the same conclusions we did: we should bias the visualization to show what we have found.
Do not lie #
I want to be very clear that I am not advocating for lying or dishonesty. A data visualization should not misrepresent the data. Instead, it should make the meaning as clear as possible (requiring bias in presentation) while preserving the truth of the data. A good rule of thumb for this is: “if you would not want to explain to someone what you did, then don’t do it.” This suggests a call to action: whenever you show a data visualization, explain to people what you have done to create it. This is, after all, the standard in science.
$^\dagger$ This wiki focuses on communicating with data visualizations. Thus, we are focusing on designing plots you will show to someone else when telling a story about this data. There are, of course, times when you might wish to present data in as neutral a way as possible: for instance, when you are first performing exploratory data analysis on a new set of data. However, I think more bang for the buck can be gained by considering the communication side of things. Besides, once you learn how to bias a plot to tell a story, you will also know how to remove that bias and create neutral data visualizations for your exploratory data analysis.