Thursday, May 24, 2012

"Shock and Awe" Graphs in Digital Humanities

As you can see here, this graph, representing ten million points of data, plotted logarithmically against seven million other points of data in a counter-clockwise fashion, with a smoothing value of 3 and scaled by a function of the distance from my elbow to my fingertips, designed by a particularly gifted graphic artist at Bewilderment Inc., CLEARLY shows that eighteenth-century cattle had a strong preference for south-facing barns.

Can't argue with that. But I will anyway.

Over the past two years I've been noticing a rise in what I like to call "shock and awe" graphs in digital humanities: graphs designed to overwhelm their audience, and perhaps even to make viewers doubt their own ability to compete in the same scholarly conversation. These graphics are both incredibly complex representations of data and incredibly beautiful. If we got rid of the axes, we might even be tempted to hang them as art. A colleague of mine used the term "poster graph" to describe these works, the idea being that the graph looked nice enough to blow up and put on a poster. Implicitly, this colleague suggested that, represented in this manner, the data were likely to impress and captivate. Great. But are complex graphs good for scholarship?

Scholarship shared between academics is not inherently meant to impress. It is meant for making discoveries. And so, while complex graphs are beautiful, they have a time and a place.

Exploring data is certainly one of those times. Complex representations of data are sometimes the only way we can make certain types of discoveries. Our eyes are, after all, great at noticing patterns. In a recent example (of which I was quite openly critical), trends in a set of data only became evident when it was plotted logarithmically. That graph then put the researchers on the trail of some interesting discoveries that would otherwise not have been possible. I have no issue with this. I have no issue with quantitative analysis.
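To make that point concrete, here is a minimal Python sketch using invented data, not the dataset from the example above: a noisy power-law relationship looks like an uninformative smear on linear axes, but resolves into a clear trend once both axes are logarithmic.

```python
# A sketch with synthetic data: a power law is invisible on linear axes
# but appears as a straight-line trend on log-log axes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1, 1000, 2000)
y = 3 * x ** -0.8 * rng.lognormal(0, 0.3, 2000)  # noisy power law

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Linear axes: most y values are crushed into a thin band near zero.
ax1.scatter(x, y, s=5)
ax1.set_title("Linear axes: trend hidden")

# Log-log axes: the same points fall along a visible straight line.
ax2.scatter(x, y, s=5)
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_title("Log-log axes: trend visible")

plt.show()
```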

I also have no issue with attempting to engage an audience who might not otherwise be interested in the research. I'm always thrilled to see historians, archaeologists, and mathematicians discussing their work on TV or radio. That's fantastic. And in those cases, a "shock and awe" graph is probably appropriate. After all, we have to sell what we do if we hope to compete with the Hollywood pros and the increasingly popular data journalists at major news outlets for the scant attention of the masses.

But I do take issue with shock and awe graphs sneaking into work intended for academic colleagues – particularly in peer-reviewed work, and particularly when the complexity of the graph is not absolutely necessary to convey the information. I take issue with the fact that many very intelligent people who are responsible for evaluating the truth of these claims do not have the skills to interrogate these complex visualizations. Such graphs have seemingly come out of nowhere for many who have spent their entire careers working almost exclusively with text and perhaps only simple numbers. In interdisciplinary work, there is a good chance that the first time many researchers come across a "shock and awe" graph is when they are handed a paper to review for a journal.

Understandably, it can be embarrassing to realise you do not have the skills to critically assess work in a field to which you have devoted your life. By handing someone a graph you know they likely cannot appraise, you are deliberately playing on that insecurity. It is easy to say the problem is numerical literacy, but we must remember these are extraordinarily complex visualizations. It takes a great deal of skill and learning before someone can create these graphs, and a comparable amount of time to learn how best to interpret them. Not everyone has had the luxury of focusing his or her time on that skill. In some cases, surely, the reviewer passes the graph through the filters unchecked. It's less embarrassing that way.

I don’t believe this is just a matter of numerical literacy levels. I’d go so far as to suggest that these graphs are often intentionally overwhelming and unnecessary for making the argument. But this is not my greatest worry. From the perspective of good scholarship, a shock and awe graph is impossible to test, and therein lies the biggest problem. You plot tens of thousands of points on a complex, multi-coloured, multi-dimensional scatter plot. The reviewer gets a static image. How do you test that, exactly? How do you know there hasn’t been a dramatic mistake in the way the information was put on the graph? How do you know the data are even real?
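One way to make such a figure testable, sketched below with hypothetical file names and invented data, is to archive the exact plotted values and the plotting script alongside the static image, so a reviewer can regenerate and interrogate the graph rather than take it on faith.

```python
# A sketch (hypothetical file names, made-up data) of a testable figure:
# ship the plotted values and the script together with the image.
import csv

import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the real dataset behind the figure.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 10_000)
y = 0.5 * x + rng.normal(0, 1, 10_000)

# 1. Archive the exact values that were plotted.
with open("figure1_points.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["x", "y"])
    writer.writerows(zip(x, y))

# 2. Generate the published image from those same values.
plt.scatter(x, y, s=4)
plt.savefig("figure1.png", dpi=300)
```

The point is not these particular tools: it is that a reviewer who can rerun the plot can also rescale it, re-order it, and check whether the data behind it are real.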

You can't. You don’t. And I believe too often their creators know this, and hope that a reviewer, in an effort not to expose his or her own weaknesses, will overlook the parts he or she does not fully understand. Shock and awe becomes one way to increase the chances of adding a publication to your CV. I suppose we can’t blame people for looking out for their own career development. But one day someone will take advantage of this knowledge and cheat. That is, if someone has not already.

Cheating in academia is not altogether unheard of. The humanities have long battled plagiarism. Famously, Saif Gaddafi was accused of having parts of his thesis ghost-written while studying at the London School of Economics, a scandal that contributed to the resignation of LSE's director Howard Davies shortly thereafter. The fight against plagiarism may never end. But as digital humanities enters collaborations with more traditional humanistic fields, we now also have to watch out for faked results of the kind researchers like Jatinder Ahluwalia have been accused of producing.

Ahluwalia recently made headlines after allegedly faking research results during his PhD work at Imperial College London and later during a post-doc at University College London. The investigation into Ahluwalia's work led to embarrassing retractions of papers in the Journal of Neurochemistry and Nature, and to a parting of ways between Ahluwalia and his employer, the University of East London.

We now need safeguards to protect the integrity of the good work out there, and to allow people to critically evaluate our results. One way to do that is to be hyper-critical of the very graphs we love to look at so much. Do they convey the data in the most straightforward way possible? Are they produced in a way that allows the data to speak for themselves, or are colour, size, shape, scale, orientation, or any number of other variables manipulated in a way that draws the reader to a conclusion that may not be the correct or only interpretation? Even something as simple as the order in which data points are added to a scatter plot can drastically change how one interprets the results: points plotted first may be covered up by later points, hiding a real trend or suggesting one that does not exist.
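A short sketch, again with invented data, makes the draw-order problem concrete: the same two sets of points, plotted in opposite orders, produce two static images that suggest quite different conclusions.

```python
# The same data plotted in two different orders: draw order alone
# determines whether the correlated trend is buried or prominent.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Two overlapping groups: a diffuse cloud and a correlated trend.
cloud_x = rng.normal(0, 1, 5000)
cloud_y = rng.normal(0, 1, 5000)
trend_x = rng.normal(0, 1, 5000)
trend_y = trend_x + rng.normal(0, 0.2, 5000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4),
                               sharex=True, sharey=True)

# Left: trend drawn first, then buried under the cloud.
ax1.scatter(trend_x, trend_y, s=8, c="tab:orange")
ax1.scatter(cloud_x, cloud_y, s=8, c="tab:blue")
ax1.set_title("Trend drawn first: hidden")

# Right: identical data, cloud drawn first, trend on top.
ax2.scatter(cloud_x, cloud_y, s=8, c="tab:blue")
ax2.scatter(trend_x, trend_y, s=8, c="tab:orange")
ax2.set_title("Trend drawn last: prominent")

plt.show()
```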

There will always be people who distrust numbers or who scoff at digital humanists as a bunch of bean counters. That can be frustrating, but it is also invigorating to know that there are those out there who will be sceptical of what we produce. We need this scepticism, and we need to meet it head on if our work is to be accepted. We can either work towards quelling it by ensuring our graphs present the necessary information as transparently as possible, or we can attempt to silence it through a policy of shock and awe, with ever more complex representations of increasingly intricate datasets.

We'll likely make more friends if we take the former approach.

So before you publish a visualization, please take a moment and step back. As in the cult classic Office Space, ask yourself: Is this Good for the Company?

Is this Good for Scholarship?

Or am I just trying to overwhelm my reviewers and my audience?

photo credit: “Swirling a Mystery” by garlandcannon