Sankey diagrams

The Sankey diagram is a means to represent how a whole (or multiple wholes) is broken down into parts, often in terms of flow. The flow could be literal, such as the amount of water in a river and its tributaries, or it could be metaphorical: Sankey diagrams are a common way to represent both money and energy flows. Indeed, the first Sankey diagram represented the energy flows in a steam engine; it was produced by Captain Matthew Henry Phineas Riall Sankey.

How to construct/interpret a Sankey diagram

The basic idea behind a Sankey diagram is this: one starts with a set of categories, parts of which are transferred to a second set of categories. These categories are represented by bars. If the Sankey runs horizontally, the height of each bar represents the size of its category. The two sets of bars are then connected by a series of lines (or arrows), with the width of each line reflecting the relative size of the flow. Multiple arrows can leave any one category and contribute to any other category. Consider, for instance, a household budget in which two partners each contribute to the overall budget, alongside some other categories.
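To make the scaling concrete, here is a minimal Python sketch. It assumes the flows are stored as (source, target, value) triples. The category names and amounts are made up for illustration and are not the figures behind the household chart. A bar's size is taken as the larger of its total inflow and total outflow, which is one common convention rather than the only one. A single hypothetical pixels-per-unit factor then scales both the bars and the flows, so the two stay comparable.

```python
from collections import defaultdict

# Hypothetical household-budget flows: (source, target, amount).
flows = [
    ("Partner A", "Budget", 3000),
    ("Partner B", "Budget", 2500),
    ("Other income", "Budget", 500),
    ("Budget", "Housing", 2000),
    ("Budget", "Food", 1200),
    ("Budget", "Savings", 1800),
    ("Budget", "Other", 1000),
]

def category_sizes(flows):
    """Return each category's bar size as max(total inflow, total outflow)."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in flows:
        outflow[source] += value
        inflow[target] += value
    names = set(inflow) | set(outflow)
    return {name: max(inflow[name], outflow[name]) for name in names}

sizes = category_sizes(flows)

# Use one (hypothetical) pixels-per-unit factor for bars and flows alike.
PIXELS_PER_UNIT = 0.05
bar_heights = {name: size * PIXELS_PER_UNIT for name, size in sizes.items()}
flow_widths = [(s, t, v * PIXELS_PER_UNIT) for s, t, v in flows]

print(sizes["Budget"])  # 6000.0
```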

Though Sankey diagrams can, in some sense, be thought of as representing a part-to-whole relationship, there is a key difference: a Sankey diagram can show flows that do not add up to the whole. Consider, for instance, this Sankey diagram associated with Penn State’s 2023 budget.

In this diagram, you can see that the income flows (blue) do not add up to the total budget (grey). The reason is that the university ran a deficit in this year.
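One way to see this numerically is to compare a category's stated total with the sum of the flows entering it. The sketch below uses invented figures, not Penn State's actual budget. The helper name `inflow_gap` and the `stated_total` parameter are hypothetical. A positive result means the inflows do not cover the whole, as in a deficit.

```python
def inflow_gap(flows, target, stated_total):
    """Return how far the flows into `target` fall short of its stated total.

    A positive result means the inflows do not cover the whole (e.g. a deficit).
    """
    incoming = sum(value for _, dest, value in flows if dest == target)
    return stated_total - incoming

# Hypothetical income flows into a budget whose stated total is larger.
income_flows = [
    ("Income source 1", "Budget", 70),
    ("Income source 2", "Budget", 20),
    ("Income source 3", "Budget", 5),
]
gap = inflow_gap(income_flows, "Budget", stated_total=100)
print(gap)  # 5
```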

This is why I classify Sankey diagrams as primarily representing flow. Though they can be used to represent part-to-whole relationships, they are not required to.

When to use a Sankey diagram

Sankey diagrams are most useful when you have quantitative categories that are reclassified into other quantitative categories. Above we saw examples from finance, but it could just as easily be inventory. For instance, you might have a supply of chairs and desks that you need to distribute to classrooms. Or perhaps you have a batch of cookies to distribute among friends. Really, the only requirements are that the categories be quantitative (so the bars and flows can be scaled) and that there be a flow from one category to another (so we can connect bars by flows). That really is about it.
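The inventory example translates directly into flows. Here is a minimal sketch, assuming the allocation is kept as a nested dictionary (supply item → destination → count). The hypothetical helper `allocation_to_flows` flattens that dictionary into source/target/value triples. It rejects anything that is not a non-negative number, since only quantitative values can be scaled into bars and flows. It also drops zero entries, which would draw nothing.

```python
from numbers import Real

def allocation_to_flows(allocation):
    """Flatten {source: {target: quantity}} into (source, target, quantity) triples."""
    flows = []
    for source, targets in allocation.items():
        for target, quantity in targets.items():
            # Flow widths must be scalable, so require a non-negative number.
            if isinstance(quantity, bool) or not isinstance(quantity, Real) or quantity < 0:
                raise ValueError(f"{source} -> {target}: {quantity!r} is not a valid quantity")
            if quantity > 0:  # a zero-width flow would draw nothing
                flows.append((source, target, quantity))
    return flows

# Hypothetical inventory: chairs and desks distributed to classrooms.
allocation = {
    "Chairs": {"Room 101": 30, "Room 102": 25, "Room 103": 0},
    "Desks": {"Room 101": 15, "Room 102": 12},
}
print(allocation_to_flows(allocation))
```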

Design considerations for Sankey diagrams

As always, the defaults produced by plotting software are unlikely to be completely useful. For instance, if we return to the hypothetical household budget we started with, a default rendering might look like the following:

I think this is both less beautiful and less useful than the chart we started with. So, let us see how we can get from this chart to a better one, using the tools of design.

Start from a consistent design

When approaching a complex problem such as this diagram, I think it can help to start with complete consistency. So, we can make a chart in which all the colors are the same.

Note that even though the chart uses both black and grey, it follows a convention of Sankey diagrams: the flows are drawn in a lighter shade of the categories' color. With this design in hand, we can consider what we might like to change.
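A lighter shade can be produced by blending a category's color toward white. This is a minimal sketch, assuming colors are given as `#rrggbb` hex strings. The helper name `lighten` and the 0.6 blend factor are illustrative choices, not settings from any particular plotting library.

```python
def lighten(hex_color, amount=0.6):
    """Blend a #rrggbb color toward white; amount=0 keeps it, amount=1 gives white."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    # Move each channel `amount` of the way from its value up to 255.
    mixed = (round(c + (255 - c) * amount) for c in (r, g, b))
    return "#" + "".join(f"{c:02x}" for c in mixed)

category_color = "#333333"  # hypothetical dark grey used for every category
flow_color = lighten(category_color)
print(flow_color)  # #adadad
```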

Use consistent coloring for categories and flows

For me, the incomes are different from the expenses, and both are different from the central budgetary categories. So, I would choose a different color for each of these groups. To emphasize the connection of the flows to these categories, I would give the flows the same colors, but semi-transparent, so that we can see where they cross. The semi-transparency also makes the flows a bit lighter, as is conventional.
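One way to implement this is to give each flow its category's RGB values with a reduced alpha. The sketch below is hypothetical. It lets each flow take the color of either its source or its target category, chosen via a `by` parameter, and it emits CSS-style `rgba(...)` strings, a format many plotting tools accept. The palette and the alpha of 0.4 are illustrative assumptions.

```python
def rgba(hex_color, alpha):
    """Convert #rrggbb plus an alpha in [0, 1] to a CSS-style rgba() string."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"

def flow_colors(flows, category_colors, by="source", alpha=0.4):
    """Color each (source, target, value) flow like its source or target category."""
    if by not in ("source", "target"):
        raise ValueError("by must be 'source' or 'target'")
    index = 0 if by == "source" else 1
    return [rgba(category_colors[flow[index]], alpha) for flow in flows]

# Hypothetical palette: incomes, the central budget, and expenses each get a hue.
category_colors = {"Partner A": "#1f77b4", "Budget": "#7f7f7f", "Housing": "#d62728"}
flows = [("Partner A", "Budget", 3000), ("Budget", "Housing", 2000)]
print(flow_colors(flows, category_colors, by="source"))
# ['rgba(31, 119, 180, 0.4)', 'rgba(127, 127, 127, 0.4)']
```

Coloring by source or by target is a design choice. Whichever end you pick, keeping it consistent across the whole diagram is what lets readers trace a flow back to its category.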

Order the categories

As the diagram currently stands, there is no real logical order, especially among the categories on the right. As discussed on the page concerning ordering, we have a few options. In this case, I think ordering by magnitude makes the most sense.
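Ordering by magnitude amounts to sorting each column of categories by size. Here is a minimal sketch, assuming the category sizes have already been computed (for instance, as the larger of inflow and outflow). The expense figures are made up. Ties are broken alphabetically, so the order stays stable from one rendering to the next.

```python
def order_by_magnitude(sizes):
    """Return category names from largest to smallest, ties broken by name."""
    return sorted(sizes, key=lambda name: (-sizes[name], name))

# Hypothetical right-hand (expense) categories and their totals.
expenses = {"Food": 1200, "Housing": 2000, "Other": 1000, "Savings": 1800}
print(order_by_magnitude(expenses))  # ['Housing', 'Savings', 'Food', 'Other']
```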

Improve the labeling

One thing that many Sankey builders struggle with is placing the labels in a way that makes sense. For instance, I don’t think it looks great that some of these category labels overlap and others do not. Furthermore, I generally don’t like labels that overlap the markers. Additionally, because some labels are longer than others, and some are interior while others are exterior, the labels do not line up. So, we can redo the labeling so that the labels do not overlap any of the marker ink. Instead, the outside categories can have exterior labels, and the interior categories can be labeled above the bars. We can also improve consistency by coloring each label to match the category it labels.
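The labeling rule described above can be written down as a small function. This is a hypothetical sketch, assuming the categories are arranged in numbered columns from left to right. Categories in the first column get labels to their left, and those in the last column get labels to their right. Interior categories get labels above the bar, clear of the flows on either side. Each label also carries its category's color. The `Label` class and `place_label` function are illustrative names, not part of any plotting library.

```python
from dataclasses import dataclass

@dataclass
class Label:
    text: str
    position: str  # "left", "right", or "above" relative to the bar
    color: str     # same color as the category it labels

def place_label(name, column, n_columns, category_colors):
    """Choose a label position that keeps the text off the bars and flows."""
    if column == 0:
        position = "left"   # exterior: outside the leftmost bars
    elif column == n_columns - 1:
        position = "right"  # exterior: outside the rightmost bars
    else:
        position = "above"  # interior: above the bar, clear of the flows
    return Label(text=name, position=position, color=category_colors[name])

colors = {"Partner A": "#1f77b4", "Budget": "#7f7f7f", "Housing": "#d62728"}
for name, column in [("Partner A", 0), ("Budget", 1), ("Housing", 2)]:
    print(place_label(name, column, 3, colors))
```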

Remove unneeded lines and narrow categories

One small thing remains. The categories have black outlines, which are not needed: the categories stand out on their own thanks to the difference in color. The bars are also probably a bit wider than they need to be, so we can narrow them.

And now we are left with a fairly clean, well-designed diagram. There are, of course, other aspects that we could continue to work on, but this is a pretty solid place to start from.

Tutorials

If you want to see how to make a Sankey diagram, I have a few tutorials on this, using both Sankeymatic and Python.