Visualizing crosstabs (or pivot tables) with sankey diagrams
I recently ran across an incredible blog, Excel Hero, that presented a Sankey diagram of energy flow created in Excel. I have been inspired by this to finally post an example of a Sankey diagram I have been using to visualize crosstabulations. The presented example is changes in overall and category grades for NYC High School Progress Report from publicly available data from the NYC DOE (to clarify, these are grades evaluating schools, not students).
Caveat: There is a decent chance that the actual figures I present are erroneously labeled or calculated since I did not bother to review. If the data is of interest, I would work from the raw data. My aim is to present an approach to data visualization, not to present an actual analysis.
Note: If anyone is interested in the workbook used to create these diagrams, let me know and I will provide it.
Value of Sankey v crosstab or pivot table
Crosstab
Sankey, overall
Sankey, environment
Sankey, performance
Sankey, progress
I think the Sankey offers improved pattern recognition over a crosstabulation, even when formatted (although a more nuanced heatmap may have made the table more revealing). For simple and small tables there may be relatively little to gain from visualizing data; but as is the case with visualization in general, the greater the quantity and complexity of the data, the more advantageous it is to visualize. The Sankey I have created has increases in grade represented on the top (green), static grades in the middle (brown), and decreases on the bottom (blue). The height of the path or arc represents the magnitude of the increase or decrease (i.e. big or small jump), and the width of the path represents the portion of schools it represents. In this case, you can learn a lot with a quick glance. For example. there is relatively less movement among overall grades than category grades, environment had relatively little increases compared to other categories, and performance has more varied movement than performance. When I look at the tables, they all look more or less the same.
Typically visualizations improve pattern recognition at the loss of look-up functionality, i.e. exact figures are not available. In this case, the exact number and percent of schools label each line, so it contains all the detail available in the table. The coloring may be more effective if red colored the decreases, but I was inspired by the riverbank metaphor used in this music history visualization (the metaphor does not really apply in my case, but I like how it looks). The line are made semi-transparent and use relatively light colors to improve clarity in cases of overlapping lines. One limitation of this approach is that unless you have a good printer, the diagram looks like junk.
This circular Sankey is generalizeable to any crosstab of categorical data that is ordinal and symmetrical. Sankeys commonly depict asymmetrical or geographic data, but in my opinion Sankeys offer a good alternative to more run-of-the mill crosstabs as well.
I have seen other Sankey or network visualizations created in Processing, and in most cases my Excel creation looks crude in comparison. Here is an example that came to mind, but upon close review it is not a Sankey since line widths do not convey information. Nonetheless, it is an interesting and beautiful visualization.
Creation
The workbook consists of three worksheets:
*Worksheet with the original actual crosstab table. This provides an easy location to populate new raw data.
Worksheet, raw data
*An intermediate data worksheet. This references and restructures the crosstab into one value per row, and each value is named; this provides and easier way of referencing the values in VBA. The worksheet also performs calculations, e.g. each values percent of total and width, subtotals.
Worksheet, intermediate data
*A worksheet containing a template of the diagram, where all of the lines are uniform in width. I created the initial lines and labels manually, and named each line and width with the connection is represents, e.g. a_b.
Worksheet, template
A macro uses the values from the intermediate data worksheet to set the widths of all the lines. It enters data from the intermediate data worksheet into the labels. So the crux of the program is:
One annoyance when creating this was figuring out which shapes are actually accessible to VBA. I think something like smart shape objects cannot be referenced, but lines can. So for example, I originally attempted to have the brown lines be actual box shape objects, but this did not work, so I had to make them lines. And the labels on the brown lines did not seem to be positionable where I wanted them, so I had to use text boxes. Dealing with small stuff like this accounted for about half the time it took to put this together.
When all is said and done, the end-user just needs to populate the crosstab tables and provide a single overall width multiplier for all lines (this takes a few attempts before a width that is aesthetically pleasing and results a clear digram is arrived at) and hit run to generate a new Sankey.








Thanks for the shout out.
Very nice work. I’ve added your RSS to my Google Reader and will follow your blog with interest.
Regards,
Daniel Ferry
excelhero.com
Hi!
It seems that I can use this very nice approach to produce a material and value flow. Can you please to send me the workbook to have a try?
Thanks in advance,
Jenő
Sent. Been a while seen logging in; please excuse the delay.
I would love a copy of the workbook as well. Is this something you could send? Thank you!
Thanks. Sent
Thanks for this…I’d be grateful if you could send me your workbook, to try it out too. Hope this is possible
rg
I love this way of representing crosstab data. Could I also have a copy of this work book?
Just,… wow!
This is exact what I was looking for.
I am doing my graduate investigation and writing an thesis about it.. and could not find a proper way to visualize increases and decreases.
Could you Please send me the worksheet?
You will be in the source reference!
Greetings,
Eric
Seems very interesting. I could use it for my combustion process diagrams!Let me know if I can have a copy of it. Thanks!
Sent. Thanks.