As you can see
here, this graph, representing ten million points of data, plotted
logarithmically against seven million other points of data in a
counter-clockwise fashion, with a smoothing value of 3 and scaled by a function
of the distance from my elbow to my fingertips, designed by a particularly
gifted graphic artist at Bewilderment Inc., CLEARLY shows that eighteenth-century
cattle had a strong preference for south-facing barns.
Can't argue
with that. But I will anyway.
Over the past
two years I've been noticing a rise in what I like to call "shock and
awe" graphs in digital humanities, designed to overwhelm their audience
and perhaps even to evoke doubt in one’s own abilities to compete in the same
scholarly conversation. These graphics are both incredibly complex
representations of data, and incredibly beautiful. If we got rid of the axes,
we might even be tempted to hang them as art. A colleague of mine used the term
"poster graph" to describe these works. The idea behind that name was
that the graph looked nice enough to blow up and put on a poster. Implicitly,
this colleague suggested that, represented in this manner, the data were likely
to impress and captivate. Great. But are complex graphs good for scholarship?
Scholarship shared between academics is not inherently meant to impress. It is meant for making discoveries. And so, while complex graphs are beautiful, they have a time and place.
Exploring data
is certainly one of those times. Complex representations of data are sometimes the
only way to make certain kinds of discoveries. Our eyes are, after all, great
at noticing patterns. In a recent example (of which I was quite openly
critical), trends in a set of data only became evident when the data were plotted
logarithmically. This graph then led the researchers on the trail of some interesting
discoveries that would not have otherwise been possible. I have no issue with this.
I have no issue with quantitative analysis.
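To make that point concrete, here is a minimal sketch using invented data (not the study mentioned above, and any parameters are purely illustrative): the same noisy power-law relationship is nearly invisible on linear axes but shows up as a clear straight line once both axes are logarithmic.

```python
# A minimal, hypothetical sketch: the same invented data on linear and log axes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.logspace(0, 5, 500)                        # values spanning five orders of magnitude
y = 2.0 * x ** 0.7 * rng.lognormal(0, 0.3, 500)   # noisy power-law relationship

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear axes: most points are crushed into one corner and the trend is hard to see.
ax_lin.scatter(x, y, s=5)
ax_lin.set_title("Linear axes")

# Log-log axes: the power-law trend appears as a visible straight line.
ax_log.scatter(x, y, s=5)
ax_log.set_xscale("log")
ax_log.set_yscale("log")
ax_log.set_title("Log-log axes")

plt.tight_layout()
plt.show()
```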
I also have no
issue with attempting to engage an audience who might not otherwise be
interested in the research. I'm always thrilled to see historians,
archaeologists, and mathematicians discussing their work on TV or on radio.
That's fantastic. And in those cases, a "shock and awe" graph is
probably appropriate. After all, we have to sell what we do if we hope to
compete with the Hollywood pros and the increasingly popular data journalists
in major news outlets for the scant attention of the masses.
But I do take
issue with shock and awe graphs sneaking into work intended for academic
colleagues – particularly in peer-reviewed work, and particularly when the
complexity of the graph is not absolutely necessary to convey the information.
I take issue with the fact that many very intelligent people who are
responsible for evaluating the truth of these claims do not have the skills to
interrogate these complex visualizations. These graphs have seemingly come out
of nowhere for many who have spent their entire careers working almost
exclusively with text and perhaps only simple numbers. For interdisciplinary
work, there is a good chance that the first time many researchers will come
across a "shock and awe" graph is when they have been handed a paper
to review for a journal.
Understandably,
it can be embarrassing to realise you do not have the skills to critically
assess the work in a field to which you have devoted your life. By handing
someone a graph you know they likely cannot appraise, you are deliberately
playing on their sense of insecurity. It is easy to say the problem is
numerical literacy, but we must remember that these are extraordinarily complex
visualizations. It takes a lot of skill and a lot of learning before someone
can create these graphs. It takes a comparable amount of time to learn how best
to interpret them. And not everyone has had the luxury of focusing his or her
time on that skill. In some cases, surely, the reviewer passes the graph through
the filters unchecked. It’s less embarrassing that way.
I don’t believe
this is just a matter of numerical literacy levels. I’d go so far as to suggest
that these graphs are often intentionally overwhelming and unnecessary for
making the argument. But this is not my greatest worry. From the perspective of
good scholarship, a shock and awe graph is impossible to test. And therein lies
the biggest problem. You plot tens of thousands of points on a complex
multi-coloured, multi-dimensional scatter plot. The reviewer gets a static
image. How do you test that exactly? How do you know there hasn’t been a
dramatic mistake in the way the information was put on the graph? How do you
know the data are even real?
You can't. You
don’t. And I believe too often their creators know this and hope that, in an
effort not to expose their own weaknesses, reviewers will overlook the parts
they do not fully understand. Shock and awe becomes one way to increase the
chances you will get a publication for your CV. I suppose we can’t blame people
for looking out for their own career development. But one day someone will
take advantage of this knowledge and cheat. That is, if they have not
already.
Cheating in
academia is not altogether unheard of. The humanities have long battled with
plagiarism. Famously, Saif Gaddafi was accused of having parts of his thesis
ghost-written while studying at the London School of Economics, leading to the
resignation of LSE's director Howard Davies shortly thereafter. Plagiarism is a war that may
never end. But as digital humanities enters into collaborative efforts with
more traditional humanist fields, we now also have to
watch out for the kind of faked results that researchers like Jatinder Ahluwalia have been
accused of producing.
Ahluwalia
recently made headlines after allegedly faking research results during his PhD
work at Imperial College London and later during a postdoc at University
College London. The investigation into Ahluwalia's work led to the embarrassing
retraction of papers in the Journal of Neurochemistry and Nature, and to a parting of ways between Ahluwalia and his
employer, the University of East London.
We now need
safeguards to protect the integrity of the good work out there, and to allow
people to critically evaluate our results. One way to do that is to be
hyper-critical of the very graphs we love to look at so much. Do they convey
the data in the most straightforward way possible? Are they produced in a way
that allows the data to speak for themselves, or are colour, size, shape,
scale, orientation, or any other number of variables manipulated in a way that
seeks to draw the reader to a conclusion that may not be the correct or only interpretation?
Even something as simple as the order in which data points are put on a scatter
plot can drastically change how one interprets the results. Points that are put
on first may be covered up by later points, thus hiding a real trend or
highlighting one that does not exist.
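A quick, hypothetical sketch of that last point: the two panels below plot identical invented data and differ only in the order the two groups are drawn, yet they leave very different impressions of which group dominates.

```python
# A minimal sketch of how draw order alone changes what a scatter plot appears to show.
# Both groups are invented for illustration and occupy largely the same region.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, (5000, 2))   # blue points
group_b = rng.normal(0.3, 1.0, (5000, 2))   # orange points, heavily overlapping group_a

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left: group A plotted first, so group B covers it almost entirely.
ax1.scatter(*group_a.T, s=8, c="tab:blue", label="A")
ax1.scatter(*group_b.T, s=8, c="tab:orange", label="B")
ax1.set_title("A drawn first, B on top")

# Right: same data, opposite order; now the plot appears dominated by A.
ax2.scatter(*group_b.T, s=8, c="tab:orange", label="B")
ax2.scatter(*group_a.T, s=8, c="tab:blue", label="A")
ax2.set_title("B drawn first, A on top")

for ax in (ax1, ax2):
    ax.legend()
plt.tight_layout()
plt.show()
```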
There will
always be people who distrust numbers or who scoff at digital humanists as a
bunch of bean counters. That can be frustrating, but it is also invigorating to
know that there are those out there who will be sceptical of what we produce.
We need this scepticism, and we need to meet it head on if our work is to be
accepted. We can either work towards quelling this type of scepticism by
ensuring our graphs present the necessary information as transparently as possible,
or we can attempt to silence it through a policy of shock and awe, with
ever more complex representations of increasingly intricate datasets.
We'll likely
make more friends if we take the former approach.
So before you
publish a visualization, please take a moment and step back. As in the cult
classic, Office Space, ask yourself:
Is this Good for the Company?
Is this Good
for Scholarship?
Or am I just trying to overwhelm my reviewers and my audience?
photo credit: “Swirling a Mystery” by garlandcannon