Thursday, August 1, 2013

Can you explain this graph to me? Peer Reviewing a Visualization

"For sale: Mixing bowl set designed to please a cook".

That opening sentence contains 10 words, or "tokens" as linguists often call them. Yet in either its spoken or written form, it really only transmits 4 ideas, or what I imagine Marc Alexander would call "metaphors", which are concepts that go beyond the words but that express meaning and understanding. They allow us to think in chunks.

What?: For sale
What's for sale?: a mixing bowl set
What's it like?: designed to please
Please whom?: a cook

The same sentence represents an attempt to conjure a very measured set of thoughts in another person. I can't take credit for the sentence, but when the author wrote it down, they hoped that you, dear reader, would understand those 4 ideas in the same way as all the other readers, and as they themselves understood them. It's their attempt to control your mind temporarily by drawing upon your understanding and memories associated with those 4 ideas. We may not get all the details exactly the same. Your mixing bowl set may be blue. Mine is seafoam and has a spout on each bowl to make it easier to pour your batter into the baking tin. So we likely haven't had exactly the same understanding of the sentence, but our understandings are almost certainly within the limits of what's acceptable to the author.

If we add 2 more ideas to the end of the sentence we end up with a failed conjuration:

"For sale: Mixing bowl set designed to please a cook with a round bottom for efficient beating".

Because of the misplaced modifier, there are now two ways to understand these ideas. Does the bowl have a round bottom for efficient beating, or should the cook who will enjoy the bowl be so proportioned?

Visualizations can offer the same ambiguity.

Is this an image of a rabbit, or a duck?

In this case, it's both, and it's that very ambiguity that the artist intended us to understand. Not all visualizations are intended to teach us something specific, or to so carefully conjure a series of ideas in our minds. That's wholly too modernist for some. Visualizations can be exploratory, used by researchers to come to a different understanding of their data by slicing it in lots of ways until they see something interesting. Or, as I demonstrated in an earlier post, they can be a quick way to get a distant look at a large amount of data by reducing it to something easier to digest. In that sense graphing can aid the discovery process of research even before the conclusions are ready to be shared with the world.

But when it comes to visualizations for academic publication, unintentional ambiguity is something we must strive to avoid. If done well, there should only be one proper way of interpreting the visualization. It's our job to create something that can conjure specific thoughts in the reader's head based on the graph's shape, colour, size, orientation, etc. And it should go without saying that those conjured thoughts should be grounded in rigorous research.

As academics we spend so much time and care on our prose, and even our footnotes. Usually (we hope) that prose comes out lucid and if we're lucky, is enjoyable to read. One of the ways we ensure that is through peer review. The editors help us find people who are willing to take the time to read what we've written and provide constructive feedback upon it.

Yet few of us feel we have the aptitude to offer similar feedback on visualizations. We're not visual artists and so we can be forgiven for using colour in confusing ways, or for thinking a pie chart with 100 categories is a good way to express an idea. As I mentioned previously, I'm quite confident that in the present climate, unique-looking or impressive visualizations will slip through peer review unchecked, lest the reviewer's lack of expertise in visualization be exposed by making a comment to the effect of "I don't understand this graph".

Now, far be it from me to suggest we only use column graphs or line graphs, or that we do X, but not Y. I think it's fantastic that so many people out there are pushing the boundaries of what we can achieve via visualization. The folks at the Guardian Data Blog do great work on bringing data to life, and their blog is a wonderful place for anyone seeking inspiration.

Instead, what I would suggest is that as creators of academic visualizations, we make sure our graphs are reviewed, even if our reviewers cannot or will not do so in the traditional peer review process.

The way I'd propose we do that is to show our friends and colleagues what we've made as often as we can, including during the drafting process. But it's not just about showing them. We have to ask the right questions. Let's use the graph below as a (relatively poor) example of a visualization that we might like to get feedback on. Please note that this is not a graph showing real data about the cost of grain in the 19th century. It's just an example.

Most of us likely want to ask "Do you like my graph?" or "What do you think of this?"

A more productive starting point is probably: Can you explain this graph to me? You aren't going to be there when your reader or viewer is interpreting your graph. The best way to find out what set of ideas is going to form in their mind is to ask them to explain their thought process out loud.

In this case, I had intended to show the seasonal difference in the price of grain in London and Edinburgh over a 20 year period. You may not have picked up on that, which means I need to fix something.

Don't be afraid to ask explicitly: Is there any element of this graph that you do not inherently understand? Make sure they can explain the labels on both axes (if relevant). If they don't know where you're getting those values from, you may need to rethink your axis labels. You'd be forgiven for asking what the numbers on the Y-axis represent in the example. I didn't label it, so how could you know?

When you start experimenting with your visualizations, you're bound to come up with ideas you think are clear, but that just don't translate into ideas that your reader can interpret. Looking at the sample graph, I wouldn't fault you for asking what the top and bottom lines of the curves represent. They're supposed to be two line graphs: one representing Edinburgh prices, and one representing London. I've shaded in the space between the lines to emphasize the size of the gap. If this is in fact two lines, then which one is Edinburgh? Which one is London? And when they overlap, how do I know which bit corresponds to which line? Do they cross, or merely meet and diverge again? I haven't made the fact that this is a line graph obvious because the lines aren't distinguishable from the shape formed by the colours.

Speaking of colour, you'll want to make sure you haven't come up with a palette that is going to make interpreting your graph difficult for someone with colour blindness. There are many different forms of colour blindness, so it pays to run a test on your graph. You can do this online by using a "Colour Blindness Simulator" on your finished image.
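If you'd like a rough first check before you reach for a simulator, one option is to compare how light or dark your colours are. The sketch below is not a colour blindness simulation, and it is only my assumption that it's a useful proxy: colours that differ clearly in lightness tend to stay distinguishable even when their hues are confused. It uses the relative luminance and contrast ratio formulas from the WCAG accessibility guidelines. The palette, the function names and the 1.5 threshold are all made up for illustration.

```python
from itertools import combinations


def relative_luminance(hex_colour):
    """WCAG relative luminance (0 = black, 1 = white) of a '#rrggbb' colour."""
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve so the channels can be weighted linearly.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a, colour_b):
    """WCAG contrast ratio between two colours, from 1 (same) to 21."""
    lum_a = relative_luminance(colour_a)
    lum_b = relative_luminance(colour_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def low_contrast_pairs(palette, threshold=1.5):
    """Return pairs of labels whose colours are close in lightness.

    The default threshold is an arbitrary illustrative value, not a standard.
    """
    return [
        (label_a, label_b)
        for (label_a, colour_a), (label_b, colour_b) in combinations(palette.items(), 2)
        if contrast_ratio(colour_a, colour_b) < threshold
    ]


if __name__ == "__main__":
    # Hypothetical palette for a two-city line graph.
    palette = {"London": "#1f77b4", "Edinburgh": "#ff7f0e"}
    for pair in low_contrast_pairs(palette):
        print("Similar lightness, consider changing one of:", pair)
```

Any pair this flags is worth a second look in a proper simulator. A pair it doesn't flag still isn't guaranteed to be safe, so treat it as a prompt for the questions above and not as a replacement for them.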

Sticking with the negatives, ask your tester which element of the graph they like the least. For the sample graph, they may say they don't like the colours, or the font, or the legend. Personally, I think using --------> to represent arrows looks lazy. Everyone will have their own opinions on what's worst about your work. If you know what turns people off you can make visualizations that people like. And if they like the visualization, readers are more likely to engage with its message. With this in mind, go ahead and ask if they like your graph. Or if there are any elements of the graph that they particularly fancy.

Just as with your prose, it may take a few iterations and a number of different opinions from colleagues before a graph says to others what you think it says in your own mind. Just because you submitted a graph with your article and the peer-reviewers didn't comment on it doesn't mean you've done a good job of clearly expressing your ideas visually.

And one last question to ask, just to make sure your readers get the right message and aren't distracted: does the shape of the graph make it look like anything unrelated?

Graphs and visualizations have tremendous potential for expressing ideas in academic research, but it's not a skill we're typically taught in school. Most of us learn on the job, or emulate graphs we saw elsewhere that we found effective. Taking the time to ensure the graphs you create transmit the right ideas to your reader is good scholarship. Knowing the right questions to ask makes it that much easier to reach that result.

Questions to ask about a visualization:
  1. Can you explain this graph to me?
  2. Are there any elements you do not inherently understand?
  3. Can you explain what each axis shows (if applicable)?
  4. Will people with colour blindness be able to differentiate your colour palette? (check online)
  5. What do you like least about the graph?
  6. Do you like the graph / a particular element of the graph?
  7. Does the shape of the graph make it look like anything distracting?
