Sankey plots
Sankey diagrams show how values flow from one category to another. They are often used to represent budget data, but they can also represent things like inventory, energy flows, and even the results of using data apps.
Sankey diagrams in Plotly
Plotly can create a Sankey plot directly, so we can use it here. However, we need to learn to read the documentation and understand how to construct the plot. This documentation can be found here
This is the basic example that is shown:
and we can use it as a guide.
The idea seems to be that Sankey diagrams are made up of ‘nodes’ and ‘links’ between these nodes. So, in the most basic usage, we simply define the nodes, and then we define the links (or flows) between them.
In the example below, I will show the flow of time, from the hours in a week, to how I used them in the week of September 29, 2025. I keep track of the hours I spend at work, so I will report the projects I worked on, plus sleep and free time.
My time tracker looks something like this:
I can see that I need to provide labels for these categories, and then I need to provide the values that describe how the overall category (hours in the week) flowed into each of the ways I used it.
Also, we have been using the approach where we first make a figure (fig = make_subplots()) and then add a trace to it, using something like fig.add_scatter(). In this case, we want a Sankey chart, so we will use something like fig.add_sankey().
Here is the most basic code:
from plotly.subplots import make_subplots

sankey = make_subplots()
sankey.add_sankey(
    node = dict(
        label = ["hours in week", "email", "CHEM 110H", "SC 103N", "Research", "Service", "Sleep", "Free time"],
    ),
    link = dict(
        source = [0, 0, 0, 0, 0, 0, 0],
        target = [1, 2, 3, 4, 5, 6, 7],
        value = [389, 638, 694, 291, 1016, 3360, 3692]
    )
)
sankey.update_layout(title = "During the week of 09/29/2025, most of my time was spent out of work")
sankey.show("png")
In this code, I reverted to our typical use of make_subplots and add_sankey, but we can still use the general guidance in the documentation. In particular, I use both a node and a link dictionary.
The node dictionary contains a single entry, label, which holds a list of all the categories of time. The first one represents all the hours in the week (“hours in week”), and the rest are the projects that I worked on. For instance, I had a category for the work done on this tutorial (for SC 103N at Penn State).
In link I supply information about how time flowed. In this basic plot, the first entry in label (index = 0) is the source of all time. That is, all the hours in the week come from here. Each target is then the index of one of the other categories. Since this Sankey diagram is pretty simple, I only have one entry for each target. Finally, for each source and target pair, value gives the amount that moved between them.
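Counting indices by hand gets error-prone as the diagram grows. As a minimal sketch (the helper name build_sankey_lists is hypothetical and not part of Plotly), the same lists can be generated from named flows, with each label assigned an index the first time it appears:

```python
def build_sankey_lists(flows):
    """Turn (source_name, target_name, value) triples into Sankey lists."""
    labels = []   # node labels, in order of first appearance
    index = {}    # label -> position in `labels`
    source, target, value = [], [], []

    def node_id(name):
        # Assign the next free index the first time a label is seen.
        if name not in index:
            index[name] = len(labels)
            labels.append(name)
        return index[name]

    for src, dst, amount in flows:
        source.append(node_id(src))
        target.append(node_id(dst))
        value.append(amount)
    return labels, source, target, value


flows = [
    ("hours in week", "email", 389),
    ("hours in week", "CHEM 110H", 638),
    ("hours in week", "SC 103N", 694),
    ("hours in week", "Research", 291),
    ("hours in week", "Service", 1016),
    ("hours in week", "Sleep", 3360),
    ("hours in week", "Free time", 3692),
]
labels, source, target, value = build_sankey_lists(flows)
print(labels)
print(source)
print(target)
print(value)
```

The resulting labels, source, target, and value lists are the same ones written out by hand in the code above, so they could be passed to node and link in the same way.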
If I make this Sankey diagram, I obtain the following:
This is a simple Sankey diagram, with no styling. You will notice that the default behavior is to sort the flows by size!
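Before building on this, it may help to check that the flows account for the whole week. The sketch below assumes the values are minutes rather than hours. The text does not state this, but it is suggested by the fact that they add up to 7 × 24 × 60 = 10080.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

labels = ["hours in week", "email", "CHEM 110H", "SC 103N",
          "Research", "Service", "Sleep", "Free time"]
source = [0, 0, 0, 0, 0, 0, 0]
target = [1, 2, 3, 4, 5, 6, 7]
value = [389, 638, 694, 291, 1016, 3360, 3692]

# Everything leaving node 0 ("hours in week") should account for the whole week.
total = sum(v for s, v in zip(source, value) if s == 0)
print(total == MINUTES_PER_WEEK)  # True

# Report each category in hours (assuming the values are minutes), largest first.
hours = {labels[t]: v / 60 for t, v in zip(target, value)}
for name, h in sorted(hours.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>10}: {h:5.1f} h")
```

A check like this is also a quick way to catch a mistyped number when the lists are edited later.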
But I can also make a more complex Sankey diagram. If you look back at the time tracking, you might notice that I use color coding: both the SC 103N entries and the CHEM 110H entries are green. This is because they are both courses I teach. Thus, I could group them into a teaching category, while still showing them separately. I can do this simply by adding a teaching category and then adding the appropriate links and values.
from plotly.subplots import make_subplots

sankey = make_subplots()
sankey.add_sankey(
    node = dict(
        label = ["hours in week", "email", "CHEM 110H", "SC 103N", "Research", "Service", "Sleep", "Free time", "teaching"],
    ),
    link = dict(
        # The first seven links split the week; the last two send
        # CHEM 110H (index 2) and SC 103N (index 3) into teaching (index 8).
        source = [0, 0, 0, 0, 0, 0, 0, 2, 3],
        target = [1, 2, 3, 4, 5, 6, 7, 8, 8],
        value = [389, 638, 694, 291, 1016, 3360, 3692, 638, 694]
    )
)
sankey.update_layout(title = "During the week of 09/29/2025, most of my time was spent out of work")
sankey.show("png")
Here I have simply appended the new node and the two new flows to the end of the node and link lists. This produces:
This isn’t too bad, but it might be nicer to have teaching come first, in the column with the other categories, and then split it into its component courses.
This can, in part, be done by changing the order in which things are provided. For instance, we can move teaching earlier, and re-number everything accordingly:
from plotly.subplots import make_subplots

sankey = make_subplots()
sankey.add_sankey(
    node = dict(
        label = ["hours in week", "email", "teaching", "CHEM 110H", "SC 103N", "Research", "Service", "Sleep", "Free time"],
    ),
    link = dict(
        # teaching (index 2) receives the combined course time from the week
        # (638 + 694 = 1332) and passes it on to the two courses.
        source = [0, 0, 2, 2, 0, 0, 0, 0],
        target = [1, 2, 3, 4, 5, 6, 7, 8],
        value = [389, 1332, 638, 694, 291, 1016, 3360, 3692]
    )
)
sankey.update_layout(title = "During the week of 09/29/2025, most of my time was spent out of work")
sankey.show("png")
which provides:
But it doesn’t fully capture the look we want. I want “teaching” to be in the same column as “free time”, with the two classes extending beyond that. We can define the positions of the nodes as well, if we want. We can set both the horizontal (x) and vertical (y) positions, but here we only need to change the horizontal. However, if we are going to explicitly provide x-positions, then we need to provide y-positions as well.
import plotly.graph_objects as go

sankey = go.Figure(go.Sankey(
    node = dict(
        label = ["hours in week", "email", "teaching", "CHEM 110H", "SC 103N", "Research", "Service", "Sleep", "Free time"],
        # One x and one y per label: the week on the left, the categories
        # in the middle, and the two courses on the right.
        x = [0, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5, 0.5],
        y = [0.5, 0.1, 0.2, 0.15, 0.25, 0.3, 0.4, 0.6, 0.85],
    ),
    link = dict(
        source = [0, 0, 2, 2, 0, 0, 0, 0],
        target = [1, 2, 3, 4, 5, 6, 7, 8],
        value = [389, 1332, 638, 694, 291, 1016, 3360, 3692]
    )
))
sankey.update_layout(title_text="During the week of 09/29/2025, most of my time was spent out of work")
sankey.show("png")
The added x and y positions are relative positions between 0 and 1, measured left to right and top to bottom.
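The x-positions above do not have to be typed by hand. As a rough sketch (node_columns is a hypothetical helper, not a Plotly function), each node can be placed in a column according to how many links separate it from the start of the flow, and the column number can then be scaled to the 0 to 1 range:

```python
def node_columns(n_nodes, source, target):
    """Return an x position in [0, 1] for each node, based on link depth."""
    depth = [0] * n_nodes
    # A target sits at least one column to the right of its source.
    # Repeating the pass up to n_nodes times is enough when there are no cycles.
    for _ in range(n_nodes):
        changed = False
        for s, t in zip(source, target):
            if depth[t] < depth[s] + 1:
                depth[t] = depth[s] + 1
                changed = True
        if not changed:
            break
    max_depth = max(depth) or 1  # avoid dividing by zero when there are no links
    return [d / max_depth for d in depth]


labels = ["hours in week", "email", "teaching", "CHEM 110H", "SC 103N",
          "Research", "Service", "Sleep", "Free time"]
source = [0, 0, 2, 2, 0, 0, 0, 0]
target = [1, 2, 3, 4, 5, 6, 7, 8]

x = node_columns(len(labels), source, target)
print(x)  # [0.0, 0.5, 0.5, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]
```

For this diagram the result matches the x list used above. The y-positions still need to be chosen separately, since they depend on how the nodes should be stacked within each column.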
The automatic ordering has been lost, so we might want to think about this some more. Additionally, we might want to change the color of the links, the outlines of the nodes, or other things. There is a great deal of flexibility, which you can learn about by reading the documentation or talking to me during class. But for now, the point has been made: we can create a Sankey diagram to represent flows.
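As one small illustration of that flexibility, the sketch below prepares a color for every link and an outline for the nodes. The helper link_colors and the particular colors are my own assumptions for illustration. The settings it feeds (color inside link, and pad, thickness, and line inside node) are ones the Sankey trace accepts.

```python
def link_colors(labels, source, target, highlight,
                on="rgba(46, 139, 87, 0.5)", off="rgba(150, 150, 150, 0.3)"):
    """Return one color per link; links touching `highlight` get `on`."""
    return [
        on if highlight in (labels[s], labels[t]) else off
        for s, t in zip(source, target)
    ]


labels = ["hours in week", "email", "teaching", "CHEM 110H", "SC 103N",
          "Research", "Service", "Sleep", "Free time"]
source = [0, 0, 2, 2, 0, 0, 0, 0]
target = [1, 2, 3, 4, 5, 6, 7, 8]

# Green for every link into or out of teaching, gray for the rest.
colors = link_colors(labels, source, target, highlight="teaching")

# Node appearance: spacing between nodes, bar thickness, and a thin black outline.
node_style = dict(pad=20, thickness=15, line=dict(color="black", width=1))

print(colors.count("rgba(46, 139, 87, 0.5)"))  # 3
```

These could then be passed along with the existing lists, for example node = dict(label = labels, **node_style) and link = dict(source = source, target = target, value = value, color = colors).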