Saturday, December 7, 2013

Crymble Awards, Best of 2013

For the third year running (following 2011 and 2012), I've decided to acknowledge five projects that have most influenced my academic development in the past year. Winners have come up with ideas or shared their knowledge in a way that has made a real difference to the way I've approached my own work. This influence isn't always possible to measure by counting citations in footnotes, but it's important to recognize.

Narrowing the list down to five projects each year is a challenge; there is so much great work going on that's worthy of praise. Nonetheless, I present to you my Crymble Award winners for 2013. Thank you for your inspiration.

1) Jorge Cham and Meg Rosenburg, 'Big Data + Old History' PhD Comics.

Best known for his academic comic series, PhD Comics, Jorge Cham challenged PhD students to describe their thesis in two minutes, with the promise that he'd animate the twelve best entries. I was fortunate enough to be selected as one of the winners, and I'm thrilled by how Cham and his colleague Meg Rosenburg transformed my words into an engaging two-minute cartoon. I've had great feedback on the video (I think a couple of people even think I'm cool now), and it has certainly shown me that the written word is not the only way we can share stories about the past.

Thanks very much to both Jorge and Meg for including me in the project. It was a great experience and a lot of fun.

2) Adam Frost and Tobias Sturt, The Guardian Data Blog.

In March I attended a one-day masterclass hosted by The Guardian newspaper on data visualization and visual storytelling. My award goes to two of the presenters on the day: Adam Frost and Tobias Sturt, both of whom worked at the time for the Guardian's Digital Agency (for-hire Guardian visualization experts). The pair hosted a great session on how the team at the Guardian Data Blog turns raw data into the finished products that capture our imagination.

I'd definitely recommend the workshop - though I note they've raised the price from £99 to £250 since I attended. Not only did I see some great examples of data visualization, but the session also got me thinking about the importance of answering the oft-ignored question: who cares? As Frost noted that morning, without clarity and persuasion, data is just a spreadsheet. Visualization is about bringing data to life for an audience, and I'm grateful to Frost and Sturt for making that so clear for me. The graphs and visualizations I've been creating for my thesis have changed markedly as a result of their work, and I like to think it has been for the better.

3) Jelle van Lottum, 'Labour Migration and Economic Performance: London and the Randstad, c. 1600-1800', The Economic History Review. Vol. 64, No. 2 (2011): 531-570.

Van Lottum is a historian at the University of Birmingham, but this paper was part of his British Academy fellowship at the University of Cambridge a few years ago. I stumbled across van Lottum's article when researching some background material for a paper I was preparing with some colleagues on lower-class migration into eighteenth-century London. It was a bit of a eureka moment for me, as I had been fumbling around in a new field, unsure even of what I'd been looking to do, and this article provided me with exactly the type of framework I was after. Since reading this paper I've been drawn into an entirely different side of history, and have read much more widely than I might ever have done, delving deeply into the work of some talented economic historians and historical demographers. I've found these new fields a wonderful complement to my interest in social history, and I owe that in part to what turned out to be a great article by van Lottum.

4) Anne Alexander, Social Media Knowledge Exchange (SMKE).

Anne wins an award for standing up for students' fiscal needs. SMKE was a year-long project that invited students to pitch an idea related to social media and academia. Winners were offered a £500 budget and £500 for themselves. As someone funding my own education, this was an incredibly important opportunity for me. There are so many organizations out there offering funding to pay for conference travel, or for student-run initiatives to bring in senior speakers, or even for expenses to go visit libraries. And yet I get the sense that there's a determined effort to ensure students aren't trusted with any money they might use to live on. Those who consider themselves older and wiser and who control the purse-strings of granting agencies of all sizes seem convinced that any money they give directly to students will go straight to the pub. I spent mine on my tuition bill. And I thank Anne heartily for giving me that opportunity by taking a stand and putting value on student work.

I incorporated Anne's idea into a workshop I hosted last month, passing on the bulk of the funding I had to early career students who gave talks at the event. I'd challenge others to do the same. Don't reimburse travel for students; offer grants or honorariums to students for participating, and empower them to make their own decisions about how they get there or where they stay. Thanks to Anne for showing me that.

5) Shoaib Sufi, Neil Chue Hong, Aleksandra Pawlik, et al. Software Sustainability Institute (SSI).

This year's final award goes to the team at the Software Sustainability Institute, based at the University of Edinburgh. Until I saw the call put out by the SSI looking for fellows late last year, I had never even given a thought to the idea of software sustainability. Since applying (and winning!), sustainability has been a major part of my strategy for all of my work. I was introduced to a wonderful network of people in fields ranging from engineering and physics to computer science and geography, all of whom are struggling to ensure their work is usable for the long term. As a society we put so much time and money into building tools and programmes to assist our research, but so little into planning for the future of that work.

As part of my fellowship I was given £3,000 of funding, which I used to host an event, Sustainable History: Ensuring Today's Digital History Lasts. The event was held at the Institute of Historical Research in London (co-hosted by Jane Winters and Tim Hitchcock) and brought together a great group of scholars and information professionals to discuss what historians can and should be doing to ensure their projects and their data survive. It's a little question, but one I'm glad the team at the SSI challenged me to think about.

 * * *

Congratulations to this year's winners, but more importantly, thank you to all of them for shaping the way I approach my research. You join a very talented group of previous winners. Keep on inspiring!

Winners for 2012 and current affiliations:
  • Julia Flanders (Northeastern University)
  • Luke Blaxill (University of Oxford)
  • Peter King (University of Leicester)
  • Andrew Marr (BBC)
  • Fred Gibbs (University of New Mexico) & Miriam Posner (UCLA)
Winners for 2011 and current affiliations:
  • Tim Hitchcock (University of Sussex) & William J. Turkel (Western University)
  • Tim Sherratt (National Library of Australia)
  • Ben Schmidt (Northeastern University)
  • Sean Kheraj (York University)
  • Jeremy Boggs (University of Virginia)
As a final aside, eight of the twelve previous winners have moved on to new institutions and more impressive positions since winning. Can we thank the Crymble Awards for tipping their applications over the threshold? I suppose we may never know...

Tuesday, October 29, 2013

Is Creative Commons Flexible Enough for Historians?

Gumby and Monkey, by Joe (CC-BY-SA)
Creative Commons licenses are incredibly useful. They're easy to use. More and more people understand them. It's even possible to do web searches of Creative Commons content, making it easy to find content you can use with confidence. The Open Access movement, particularly in the UK, seems to be promoting Creative Commons licensing as the best way to move towards open access to research, because it means we can (largely) leave lawyers out of it all and implement a standard set of licenses that everyone understands (or should understand). I see the practical merits in that and am a big fan of keeping costs at a minimum. But I also see the counterpoint: many historians feel Creative Commons just isn't designed for them (see my previous post on Alternative Licensing). Sometimes that feeling is based on a misunderstanding. Sometimes, I think, it's justified. In the interest of opening that discussion, I thought I'd present a couple of scenarios in which I believe Creative Commons is not flexible enough for historians looking to manage the rights associated with their research.

For all of these scenarios, let's assume the work in question is an academic monograph written solely by me.

1) Supporting certain derivations

What I want: I'd like people to be able to translate my book into a range of formats (braille, French, audio, stage performance) without having to ask me, provided that every effort is made to ensure that the translation accurately represents the arguments and positions of the original, and the translator is listed as such on the title page or where applicable. This reuse is only permitted if the entire work is included in the translation.

Why this is important to me: I'm a big supporter of accessibility; I wouldn't want anyone working to provide access to my work for the blind to feel they were prevented from doing that good work by a legal restriction.

Why CC is not sufficient: CC-BY would allow this type of reuse. But it would also allow someone to translate only the introduction, or to pick and choose parts and rearrange them in a way that changes my message. I'm worried that if they do, someone might get the wrong idea about my work. You may not think that's important, but it's my book and my reputation, and I am worried. I could use a 'no derivatives' license, CC-BY-ND, but I do want to allow certain types of derivatives under certain conditions.

2) Supporting certain commercial reuses

What I want: I'd like professors creating course readers to feel empowered to use parts of my book with their students. I'd also like private individuals to be able to use individual chapters in edited collections with modest print runs (let's say less than 500). I don't want Evil Publishing Ltd to be able to do the same without asking.

Why this is important to me: I'm a big supporter of ensuring students and my colleagues have access to my work. I also think it's important to support small entrepreneurs. But I know that the publishing industry is big business, and if they're going to make big money from my ideas, I think it's fair to ask that I get a cut of that. Anyone who has ever licensed stock imagery to use on a website or in print knows that the price of the license changes with the number of 'impressions'. In essence, the bigger the advertising campaign, the more money they want to charge you to use the image. This merely attempts to apply those types of restrictions to my book.

Why CC is not sufficient: CC-BY wouldn't give me the power to put the restrictions on Evil Publishing Ltd that I believe are important. Forcing me to use CC in this instance forces me to give away rights I would like to hold onto.

* * *

Those are just a couple of simple examples, which I don't believe are far-fetched when considering licensing and reuse from the historian's perspective. For them to work, I think at the very least we need to adopt a CC-BUT license, in which creators are allowed to add restrictions to their license. As I said before, if the concerns of licensors aren't met, they won't get on board. I'd like them on board, but that may need to come at the expense of what seems on the surface to be a simple CC solution.

Monday, October 28, 2013

Sustainable History: Ensuring today's digital history survives

28 November 2013
Institute of Historical Research,
Malet St, London


How long will our digital research survive? Historical scholarship is increasingly digital, and yet we do not have an agreed set of best practices for ensuring that digital scholarship lasts. Speakers at this one-day workshop will share practical advice on a range of pressing issues for historians and cultural heritage professionals working with digital material. From ensuring research data is archived safely, to building sustainable strategies into your project workflows, and even learning from the mistakes of others, this event promises practical solutions for big challenges facing digital scholarship.

Registration is free, but spaces are limited (Register here:

Sponsored by the Software Sustainability Institute, the Institute of Historical Research, the Programming Historian 2, and the AHRC Theme Leader Fellowship for Digital Transformations.


Registration / Welcome  9:30-10:15am
Keynote Addresses  10:15-11am
  • Professor Andrew Prescott (Dept. of Digital Humanities, King’s College London)
  • Neil Grindley (JISC)
Tea / Coffee 11am-11:15am
Session 1: Preserving Resources 11:15am-12:15pm
  • Dr. James Baker (British Library) ‘Preserving Research Data for the Future’
  • Jennifer Doyle (King’s College London) ‘Working with Cultural Heritage to Open Research Data’
Lunch 12:15pm-1pm
Session 2: Working Together for the Long-term 1pm-2:30pm
  • Dr. Gethin Rees (University of Cambridge) ‘Capturing and Documenting Workflows for Historical Scholarship’
  • Mia Ridge (Open University) ‘Sustaining Collaboration from Afar’
  • Claire Donaghue (Imperial College London) ‘Strategies for Working Together on Large Projects’
Tea / Coffee 2:30-2:45
Keynote Address and Open Discussion 2:45-3:30
  •  Dr. Peter Webster (British Library)

Thursday, October 24, 2013

Academic Freedom License: An Alternative to CC-BY

Professor Peter Mandler, President of the Royal Historical Society, allegedly made this comment today at an Open Access event held in London. I was not at the event, but I have heard this concern expressed before: CC-BY licenses allow someone to take an academic work, completely twist the words of the author, and republish it in a way that suggests those are the opinions of the author (either intentionally or through ignorance).

The fear is certainly valid, whether you agree with the interpretation of the license or not. No academic would be happy with the idea of someone twisting their words and republishing something that, if misconstrued, could damage their reputation as a scholar.

I'm inclined to suggest that a CC-BY license does not in fact grant these rights, as the fine print about 'moral rights' points out, noting that 'derogatory treatment' of the licensor's work is not permitted.

Nevertheless, the terms of the license do suggest it is up to the licensor to monitor and police this activity, and if necessary, turn to the courts to enforce it. That's just not practical for a busy academic.

Remixing isn't the only problem. Copyright of images or graphs can also be an issue. Anyone who gives a public lecture these days will be familiar with the release forms you're asked to sign, which require you to grant someone the right to reproduce images and graphs you don't own that happen to be on your PowerPoint slides. Academic monographs have the same problem. How can we release our content as open access if the work contains someone else's work for which we have had to ask permission?

If I'm not mistaken, these two issues are the biggest objections to CC-BY licenses for the humanities and social sciences. Thankfully, Professor Mandler has offered another solution, and I'm all for solutions:

New License needed for HSS (Humanities and Social Sciences)

What a fabulous idea. What on earth are we waiting for? I present to you all for consultation: the Academic Freedom License, designed specifically with the needs of academics in mind, that both promotes open access and reuse, and prevents the types of abuses outlined above.

Academic Freedom License

For works released under an 'Academic Freedom License', you are granted the right:

To Share - to copy, distribute and transmit the work in its entirety only.
To Analyse - to data mine and study the work and publish or create work of your own based on that analysis.
To Sell - to make commercial use of the work in its entirety only.

Under the following conditions:

Attribution - You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work)

Excluding - You are prohibited from sharing, analysing, or selling any aspects of the work specified by the author or licensor (such as images under copyright or sections not produced by the author)

With the understanding that:

Waiver - Any of the above conditions can be waived if you get permission from the copyright holder.

Public Domain - Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.

Other Rights - In no way are any of the following rights affected by the license:
  • Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
  • the author's moral rights
  • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights

Thursday, October 10, 2013

Would you buy a product to support digital humanities?

I'm launching an experiment, and I'd love for you to be involved. My PhD funding has just run out and I've been given a final £350 tuition bill during what's known as my 'writing up' period. In my search for solutions to cover this cost, I've found dozens of small grants that will pay for me to buy train tickets or hotel stays for research trips or conferences. I've found dozens more that will let me buy train tickets or hotel stays for others to come to a conference I'd organize. I can even get money to buy equipment for my research projects. But no one will give me money to pay my rather modest fees.

So I've decided to be creative. Crowdfunding has become rather trendy lately. Sites such as Kickstarter are even being taught as part of digital humanities courses, suggesting those of us in the field need to get out there and convince the public to part with some money in support of the research we do. Shawn Graham at Carleton University is now using this idea to raise money for an Undergraduate Scholarship in digital history, with funds to be matched by his university if he meets a certain threshold, and I wish him the best with what I consider a great initiative. But I know the marketplace can only handle so many campaigns that take the same form. So I've decided to go another route and ask: would you buy a product if you knew the proceeds went to support digital humanities? Or more specifically: helped to pay my tuition fees?

So I've teamed up with Cafepress, and designed some digital humanities schwag to tempt you into my experiment. I've focused my product line on three key areas for the digital humanities:

  1. Bags and Electronics - Your electronics never looked so digital humanities
  2. Baby Clothes - Your baby makes digital humanities look good
  3. Mugs and Water Bottles - Support digital humanities while you drink

All of my profits will go directly towards my tuition fees. And in the interest of this experiment, I'll report back on progress at the end of 2013 when these limited edition products will disappear FOREVER! Are baby clothes the key to the future of digital humanities? We'll soon find out.

I thank you most humbly for your support.
Edited note: It has been wisely pointed out to me that not everyone needs baby clothes or more 'stuff'. If you'd like to contribute directly, I've set up a link through Paypal where you can do so. Thanks again.

Monday, September 9, 2013

Digital Humanities Comic 'Big Data + Old History'

You used to submit an abstract to a conference to share your findings. Now you ‘Dance your Thesis’ or compete to convince a world-class cartoonist to animate your research and turn it into a video. The modes of disseminating research have broadened in the past decade, with students in particular being offered a range of new contests designed to get them thinking creatively about engaging the public with academic research.

Jorge Cham, the internationally renowned animator behind ‘PhD Comics’, asked students ‘can you describe your thesis in two minutes?’ Cham then chose the best descriptions and turned them into animated cartoons. I'm very pleased to announce my entry was one of the winners, and the animated video of my thesis has just been released:

My two-minute talk focused on how distant reading has been central to my PhD research. There's only so much detail you can fit into a two-minute talk, but I hope it has introduced the idea of distant reading to a much wider audience and that some of them might take the step to learn more. It's been a great experience, and I'd like to thank Jorge and his team for creating this opportunity. And since getting selected as one of the winners was partially down to voting from the public, I'd also like to thank everyone who took a moment last year to vote for my entry. The response has been wonderful. So thanks again.

I hope you enjoy the result.

Saturday, August 31, 2013

Applications open for Five Solutions: Digital Sustainability for Historians


Five Solutions to What?

Historical scholarship is increasingly digital, and yet we do not have an agreed set of best practices for ensuring that digital scholarship lasts. Five Solutions is looking for five scholars able to outline a solution to the issues of sustainability now facing historians. Each participant will give a 15-minute presentation outlining practical solutions to one of five challenges, with the resources and expertise of an ordinary working historian in mind. These presentations will form the basis for a one-day workshop on practical strategies for digital sustainability. The presentations can be based on your own experience and ideas, or can be taken on as a research project. We will work with all participants to ensure that the final presentations are both technically workable and illustrated with the most appropriate datasets.

Accepted participants will each receive a £350 honorarium.*

The Five Themes

The following five themes are designed to get you started, but if you have other ideas, we’d love to hear about them. Each theme should be approached with the ordinary working historian in mind.

1. Preserving research data for the future
2. Curating an enduring professional online persona
3. Paying project costs after the money runs out
4. Capturing and documenting the expertise of temporary staff
5. Strategies for working together on larger projects

Who Should Apply?

We’re looking for people with passion. Scholars old or young, university students of any level, librarians, archivists, developers, designers, system administrators, or anyone who considers themselves a historian at heart. No specific qualifications or prior experience required - just an interest in helping academia find solutions to organizational and technological challenges facing the sustainability of our digital projects.

What do I have to do?

Figure out a solution, of course! Once you’ve come up with your solution, you’ll share your work in two ways:

1. A 15-minute presentation of your solution at a one-day conference in London, UK on the 28th of November 2013 at the Institute of Historical Research.

2. A 1500-2000 word peer-reviewed tutorial outlining your solution to be published in the spring of 2014 in the Programming Historian 2 and distributed as part of ‘IHR Digital’.

All tutorials will be peer-reviewed and released under a Creative Commons CC-BY license. Participants will have the full support of an editor at the Programming Historian 2 who will provide guidance for writing an effective, practical tutorial.

Evidence of previous technical writing (or a willingness to learn), as well as a strong command of the English language, would be a bonus.

How do I apply?

By 8 October 2013 send a two-page C.V. and a brief email to (subject line: Five Solutions) addressing the following questions:

1. What theme would you like to tackle? (Use one of our suggestions or come up with your own.)
2. Give us an idea of how you plan to solve this issue, or where you intend to look for a solution (max 200 words)
3. What skills or experiences make you the ideal person for the task?

We apologize in advance, but we are limited to five scholars.

* Our funding restrictions allow honorariums for UK-based participants only, though we are happy to receive applications from those abroad who have access to their own travel funding and who would like to participate.

Project Support By
And by the AHRC Theme Leader Fellowship for its Digital Transformations Theme.

Monday, August 5, 2013

Can We Reconstruct a Text from a Wordcloud?

We’ve all seen word clouds. Many of us have even wondered if they’re of any value. I have used word clouds in the past; I find them useful in presentations when I want to highlight the relative importance of certain words over others. For example, I often use this word cloud to the left to show the most common Irish surnames in the London area during the early 19th century. I hope my listeners will note that Murphy and Sullivan are more common than Burke or Foley, without my having to take the time to explain the connection between word size and significance.

I’ve also used word clouds in analysis. In a previous post I discussed how I was able to use the below word cloud to show the relative frequency of topics found in the Gentleman’s Magazine between 1800 and 1820, which allowed me to get a pretty good idea of what the gentry and the middle class were interested in during that period.

I think both of those uses for word clouds have been productive. They’ve allowed me to transmit ideas, and formulate my own thoughts on a set of data in an effective manner. But I began to think about other uses, and I began to wonder about the process of getting back to the original data. Word clouds take the individual words (tokens) out of context. As I mentioned in my last post, we think in metaphors, or ideas, not in words. That means a word cloud reduces a single idea such as “green bowl” into two tokens, “green” and “bowl”. It then merges every occurrence of “green” into a single graphic sized by how often the word appears in the text. The program does not take into consideration the fact that “green” as it refers to a bowl is entirely different from Mr. Green or Green Park. An article about Mr. Green’s picnic in Green Park with his favourite green bowl might give you a skewed idea about the importance of the word green, here representing three completely different ideas, and in all three cases simply acting as a modifier to more important concepts (a man, a park, and a bowl).
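The merging problem is easy to see with a plain bag-of-words count. What follows is a minimal sketch in Python, assuming a simple regular-expression tokenizer that lowercases everything; it is not necessarily what any particular word cloud tool does, and the sentence is just the hypothetical Mr. Green example from the paragraph above.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical sentence based on the Mr. Green example in this post.
text = "Mr. Green took his favourite green bowl to a picnic in Green Park."

counts = Counter(tokenize(text))
print(counts["green"])  # 3: a man, a bowl and a park collapse into one token
print(counts["bowl"])   # 1
```

Once the text is reduced to these counts, nothing records which “green” belonged to which noun, so a word cloud drawn from them can only show “green” as three times more important than “bowl”.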

Just for fun, I decided to do a test. I asked four colleagues, all experts on the criminal trial transcripts of the Old Bailey Online, to look at a word cloud of a trial. Each person was asked to describe what key information they could tell me about the crime. I was interested in knowing if they could tell me the who, what, when, where and why type details, and if they could reconstruct the basic building blocks from the prevalence of certain keywords. In the spirit of exploration, I played along as well and offered my own interpretation.

The word cloud was created at random by my wife without my knowledge of the trial. All I knew (and all the participants knew) was that the trial took place between 1801 and 1820, and was between 2,000 and 3,000 words long. The word cloud was limited to 75 unique words and common English words were removed. The resultant visualization looks like this:
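The two filtering steps described here (remove common English words, then keep only the 75 most frequent remaining words) can be sketched roughly as follows. This is an assumption about how such tools typically work, not a description of the software used for this experiment; the short stop list and the sample sentence are hypothetical and were invented for illustration, not taken from the trial.

```python
import re
from collections import Counter

# A tiny illustrative stop list (hypothetical); real word cloud tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "i",
             "it", "at", "on", "that", "his", "with", "had", "me", "my"}

def word_cloud_weights(text, stopwords=STOPWORDS, limit=75):
    """Return up to `limit` (word, count) pairs, most frequent first,
    after removing stop words. A word cloud maps each count to a font size."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(limit)

# Invented sample text, not from the Old Bailey transcript.
sample = ("I saw the cart in the morning. The cart had a cask of gin "
          "and two bottles of port; the watchman stopped the cart.")
print(word_cloud_weights(sample, limit=3))
# [('cart', 3), ('saw', 1), ('morning', 1)]
```

Ranking purely by raw frequency and cutting off at a fixed limit is what strips each surviving word of its role: a frequent name passes the cut, but nothing in the output says whether it belonged to a victim, a defendant, a witness or an inn.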
I have colour-coded my assessment (blue) for aspects I got correct and (red) for the bits I got wrong. What struck me immediately was that this was a case involving theft. That’s a safe bet anyway, since about 50% of Old Bailey trials during this period were theft cases. It was the large number of nouns that led me to this conclusion. I know that trial transcripts always list the items that were stolen, and the testimony in the trial almost always discusses the various objects repeatedly as several witnesses are called to give their account of what transpired. In this case, I’d assume there was a large quantity of alcohol that went missing, ranging from red wine, to port, to gin – stored in bottles, measured by gallons, and in at least one case: a cask.

At the time it was stolen, the booze was being stored in a cellar before it was transferred to a cart drawn by a horse (a mare, to be specific). Why Restoration actress Nell Gwyn appears in the set of words I have no idea, since she died over a century before this era, unless the name is a coincidence or perhaps refers to the name of a pub that lost its liquor.

There were a large number of people involved in giving evidence against the defendant, including Messrs Hutt, Wells, Powell, Wood and Bagnigge, as well as possibly a Mr. Limbrick, and definitely someone named Hart – though again that may be the name of the pub. One of those men is likely the watchman and another an officer. Based on what I know about Old Bailey trials, this suggests there were a lot of witnesses, meaning the prosecutor was concerned that his case might not succeed.

The alcohol heist took place in the morning (and was perhaps discovered the following night), and the goods were then transferred down either Maiden-lane or City-road. The volume of goods stolen, together with the appearance of “death” in the list, suggests our defendant was found guilty and sentenced to death.

My conclusion: a pub named either Hart or Nell Gwyn located on City-road or Maiden-lane was robbed of a large volume of alcohol by a solo male defendant, who was found guilty and sentenced to death.


You can compare my assessment with the full trial transcript. It turns out I wasn’t that far off. There had been three defendants rather than one, but they were found guilty and sentenced to death. It had been a large alcohol theft. I wasn’t able to accurately pick out the fact that Messrs Wood, Powell, and Hart were the defendants, meaning the who aspect of the challenge had completely eluded me. I also didn’t recognize Bagnigge Wells, which was the location of the crime, not someone’s name. Nell Gwyn in this case was the pub that had its door pried open to reach the alcohol.

Participant 1

“Powell and Hutt were found guilty of breaking into the wine cellar of the Nell Gywnn inn in Bagnigge Wells (I know about this place) They were accused of stealing two gallons of wine, casks of gin and bottles of port from the cellar  belonging to Mr Hart . A Watchmen Mr Wood on his round at 1 o'clock saw a broken iron (lock)on the wine cellar door  and saw two men drive off in a horse and cart down Maiden lane towards City Road and immediately called for an officer. The men were stopped by an officer who examined the cart and found the cask of wine, gin and port belonging to Mr Hart and arrested them. They received the death penalty.”

Participant 2

“I would guess it involves stealing a hamper of goods including a bottle of wine from the back of a cart.  My suspicion is that the defendants were two women, and that the servant of the person who owned the cart/hamper discovered the theft and was called in evidence, though the actual owner wasn't.  The cart was on its way to or from northwest London to banigge wells, for a social occasion and some visiting, it happened at night (suggesting they were returning home) and there was a runner or 'officer' involved in the arrest.”


In these examples, the participants tried to be very specific about the details of the transcript, and in doing so were incorrect more often than not. The basics of the case, including the location of the crime, were, however, correct for Participant 1. This would suggest that expert readers are able to get some of the basics, though by no means all. That expertise does little, however, to bring forth the specifics of the trial.

Participant 3

“This trial seems to have a richer vocabulary than most.  It looks like a theft case from a wine cellar (wine, port, cask, gin, gallons, hampers, bottles, etc. suggest as much), presumably at Bagnigge Wells; actually the prominence of the word cart, and also the word horse, suggest the material might have been in the process or coming or going there, perhaps parked on the City Road (or Maiden Lane). Went suggests action.  Some force was used: broken, crow, saw, which suggests that the cart was broken into.   There is a certain amount of vocabulary indicating how the culprit might have been apprehended: officer, stopped, examined, watchman, observed, charge—this suggests that officers were used to apprehend the defendant. There are some names: Powell, Gwyn, Hart, Mr, Nell, William.  The most frequently mentioned, Hart, is presumably the victim.  Timing is important: o'clock, morning, night—the prominence of night suggests that that is when the crime took place, with the suspect arrested in the morning?  Numbers indicate either the number of items stolen or the time of day (one or two in the morning).


This one came out surprisingly accurate. The participant hadn't recognized Nell Gwyn as the name of the actress. As in my own case, it proved impossible to figure out who was the defendant and who was the victim. However, the details of the crime and the process of apprehending the defendant are almost bang on. This participant didn't try to reconstruct the narrative in the same way as Participants 1 and 2, and thus avoided many of the pitfalls. However, there is no guess at the verdict, and while the basics of the trial are here, the richness of what is actually recorded in the transcript is almost entirely lost.

Participant 4

“Geographical location
Wine Cellar –property crime
Bagnigge Wells – recognise this as the location of a rather seedy spa/pleasure grounds on the outskirts of Clerkenwell towards Kings Cross.
City Road – not too far away runs from the City to Islington
Maiden Lane – small street that runs parallel to Covent Garden Piazza on the south-side although there might be other Maiden Lanes
Time –mention of time o’clock and watchman (usually worked only at night) and night although mentions morning (in the or the next)
Crime probably burglary because of the mention of the relevance of time together with watchman
Stolen Goods: bottles of red wine, port, cask, gin from wine cellar
Broken into cellar with an iron bar, maybe crow bar
Probably took it away or hid it in a cart, certainly it is central to the plot and perhaps discovery of the stolen goods. There may well have been a significant amount of drink gallons, casks and bottles? Hampers – might have been used to transport/hide the goods
Arrested – yes (examined) and officer (probably from one of the Police Courts
Observed - spotted
Names Hart – think this is a personal name rather than pub sign and possibly Wood as it’s mentioned frequently but not as much as the things I think were actually stolen
Gwyn –forename Welsh? Ha – just spotted nell (same size) and I know that Nell Gywn was supposed to have lived at Banigge Wells. John, William, think Hutt is a personal name (not a shed)
Death –punishment (so guilty)”


This one was particularly interesting, because the participant included their thought process as it related to the various words in the visualization. The fact that it was clearly a theft of goods, and that time was mentioned, led this person to conclude we were dealing with a burglary, which was correct. The conclusion that Hart was the name of a person rather than a pub was correct, but could equally have been wrong. While Nell Gwyn may have lived at Bagnigge Wells, that wasn't really relevant to this case.


Can an expert on a historical source reconstruct the details of that source from a word cloud? It would seem the answer is: sort of. Of the five of us who tried (four participants plus myself), two (#3 and #4) did an incredibly good job of getting some of the details. They were able to reason out what the words meant by drawing upon their experience with the ways certain types of words are used in criminal trial transcripts. However, in both cases they were light on the details, and it would appear they decided not to guess at elements of the crime they couldn't be confident about. That is, they spoke confidently when they were confident, but otherwise stayed silent.

I think my analysis fell in the middle. I got lots of bits right, but I was also wrong just about as often. I was disappointed that I couldn't pick out the roles people played in the trial from the word cloud. There was no way to tell who was the defendant, who was the victim, and, without recognizing the names, who were the officers. I wasn't even able to guess how many defendants were on trial. On the other hand, I did correctly guess the type of crime, the verdict, and a few details and circumstances surrounding the arrest. Having said that, I wasn't confident in all of my conclusions; some of them were simply guesses.

And finally, two of the participants were way off (#1 and #2). These two attempted to reconstruct the narrative of what had happened, providing a level of detail that involved a good deal of guesswork.

What does this mean? I think it shows that when faced with a simplified visualization such as a word cloud, the process of getting back to the original is fraught with a level of guesswork. However, an expert in the source material can, with reasonable accuracy, reconstruct some of the more basic details of what’s going on.

How do we move forward? Well, as I pointed out earlier in this article, I think the secret is in moving away from the idea that tokens transmit ideas. Ideas and metaphors transmit ideas, and it would be far more useful to have an idea or concept cloud than one that focuses on individual tokens. But I also think it's time those ideas were linked back to the original data points, so that people interpreting word clouds can test their assumptions. We are ready to see the distance between the underlying data and the visualization narrowed. We're ready to see the proof embedded in the graph, and I hope this trend continues to develop.
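To make the idea of linking a word cloud back to its source a little more concrete, here is a minimal sketch, assuming a plain-text source and a crude letters-only tokenizer. The function names `linked_word_counts` and `in_context` are hypothetical, and the sample sentence is invented for illustration; it is not taken from the trial transcript. Each word's count (what a word cloud would turn into font size) is kept alongside the character positions it came from, so a reader can pull up the passages behind any word and test their interpretation.

```python
import re
from collections import defaultdict


def linked_word_counts(text):
    """Map each lowercased token to the list of positions where it occurs."""
    positions = defaultdict(list)
    for match in re.finditer(r"[A-Za-z']+", text):
        positions[match.group().lower()].append(match.start())
    return positions


def in_context(text, positions, word, width=20):
    """Return a keyword-in-context snippet for every occurrence of word."""
    snippets = []
    for start in positions.get(word, []):
        left = max(0, start - width)
        right = start + len(word) + width
        snippets.append(text[left:right])
    return snippets


# Hypothetical sample text (not from the Old Bailey transcript).
text = "The watchman saw the cart. The officer stopped the cart and examined it."
positions = linked_word_counts(text)

# The number of occurrences would set a word's size in the cloud...
print(len(positions["cart"]))  # 2

# ...and the stored positions let a reader jump back to the source passages.
for snippet in in_context(text, positions, "cart"):
    print(snippet)
```

A real concept cloud would group tokens into ideas first, which this sketch does not attempt; it only shows that keeping the link to the source costs very little.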

Thanks to my participants, Janice Turner, Bob Shoemaker, Louise Falcini, and Tim Hitchcock.

Thursday, August 1, 2013

Can you explain this graph to me? Peer Reviewing a Visualization

"For sale: Mixing bowl set designed to please a cook".

That opening sentence contains 10 words, or "tokens" as linguists often call them. Yet whether spoken or written, it really transmits only 4 ideas, or what I imagine Marc Alexander would call "metaphors": concepts that go beyond the individual words but that express meaning and understanding. They allow us to think in chunks.

What?: For sale
What's for sale?: a mixing bowl set
What's it like?: designed to please
Please whom?: a cook
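The gap between tokens and ideas can be shown with a small sketch, assuming a naive tokenizer that keeps only runs of letters. The grouping into four ideas is done by hand to mirror the list above; no software does that step automatically here, and the variable names are illustrative only.

```python
import re

sentence = "For sale: Mixing bowl set designed to please a cook"

# Naive tokenizer: every run of letters is one token; ":" is dropped.
tokens = re.findall(r"[A-Za-z]+", sentence)

# Hand-made grouping of the same tokens into the four ideas listed above.
ideas = [
    ["For", "sale"],
    ["Mixing", "bowl", "set"],
    ["designed", "to", "please"],
    ["a", "cook"],
]

print(len(tokens))  # 10
print(len(ideas))   # 4

# Every token belongs to exactly one idea, in the original order.
assert [t for idea in ideas for t in idea] == tokens
```

A word cloud works at the level of `tokens`; the reader's understanding works at the level of `ideas`.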

The same sentence represents an attempt to conjure a very measured set of thoughts in another person. I can't take credit for the sentence, but when the author wrote it down, they hoped that you, dear reader, would understand those 4 ideas in the same way as all the other readers, and as they themselves understood them. It's their attempt to control your mind temporarily by drawing upon your understanding and memories associated with those 4 ideas. We may not get all the details exactly the same. Your mixing bowl set may be blue. Mine is seafoam and has a spout on each bowl to make it easier to pour your batter into the baking tin. So we likely haven't had exactly the same understanding of the sentence, but our understandings are almost certainly within the limits of what's acceptable to the author.

If we add 2 more ideas to the end of the sentence we end up with a failed conjuration:

"For sale: Mixing bowl set designed to please a cook with a round bottom for efficient beating".

Because of the misplaced modifier, there are now two ways to understand these ideas. Does the bowl have a round bottom for efficient beating, or should the cook who will enjoy the bowl be so proportioned?

Visualizations can offer the same ambiguity.

Is this an image of a rabbit, or a duck?

In this case, it's both, and it's that very ambiguity that the artist intended us to understand. Not all visualizations are intended to teach us something specific, or to so carefully conjure a series of ideas in our minds. That's wholly too modernist for some. Visualizations can be exploratory, used by researchers to come to a different understanding of their data by slicing it in lots of ways until they see something interesting. Or, as I demonstrated in an earlier post, they can be a quick way to get a distant look at a large amount of data by reducing it to something easier to digest. In that sense, graphing can aid the discovery process of research even before the conclusions are ready to be shared with the world.

But when it comes to visualizations for academic publication, unintentional ambiguity is something we must strive to avoid. If done well, there should only be one proper way of interpreting the visualization. It's our job to create something that can conjure specific thoughts in the reader's head based on the graph's shape, colour, size, orientation, etc. And it should go without saying that those conjured thoughts should be grounded in rigorous research.

As academics we spend so much time and care on our prose, and even our footnotes. Usually (we hope) that prose comes out lucid and, if we're lucky, enjoyable to read. One of the ways we ensure that is through peer review. The editors help us find people who are willing to take the time to read what we've written and provide constructive feedback upon it.

Yet few of us feel we have the aptitude to offer similar feedback on visualizations. We're not visual artists, and so we can be forgiven for using colour in confusing ways, or for thinking a pie chart with 100 categories is a good way to express an idea. As I mentioned previously, I'm quite confident that in the present climate, unique-looking or impressive visualizations will slip through peer review unchecked, lest the reviewer's lack of expertise in visualization be exposed by making a comment to the effect of "I don't understand this graph".

Now, far be it from me to suggest we only use column graphs or line graphs, or that we do X, but not Y. I think it's fantastic that so many people out there are pushing the boundaries of what we can achieve via visualization. The folks at the Guardian Data Blog do great work bringing data to life, and their blog is a wonderful place for anyone seeking inspiration.

Instead, what I would suggest is that as creators of academic visualizations, we make sure our graphs are reviewed, even if our reviewers cannot or will not do so in the traditional peer review process.

The way I'd propose we do that is to show our friends and colleagues what we've made as often as we can, including during the drafting process. But it's not just about showing them. We have to ask the right questions. Let's use the graph below as a (relatively poor) example of a visualization that we might like to get feedback on. Please note that this is not a graph showing real data about the cost of grain in the 19th century. It's just an example.

Most of us likely want to ask "Do you like my graph?" or "What do you think of this?"

A more productive starting point is probably: Can you explain this graph to me? You aren't going to be there when your reader or viewer is interpreting your graph. The best way to find out what set of ideas will form in their mind is to ask them to explain their thought process out loud.

In this case, I had intended to show the seasonal difference in the price of grain in London and Edinburgh over a 20-year period. You may not have picked up on that, which means I need to fix something.

Don't be afraid to ask explicitly: Is there any element of this graph that you do not inherently understand? Make sure they can explain the labels on both axes (if relevant). If they don't know where you're getting those values from, you may need to rethink your axis labels. You'd be forgiven for asking what the numbers on the Y-axis represent in the example. I didn't label it, so how could you know?

When you start experimenting with your visualizations, you're bound to come up with ideas you think are clear, but that just don't translate into ideas your reader can interpret. Looking at the sample graph, I wouldn't fault you for asking what the top and bottom lines of the curves represent. They're supposed to be two line graphs: one representing Edinburgh prices, and one representing London. I've shaded in the space between the lines to emphasize the size of the gap. If these are in fact two lines, then which one is Edinburgh? Which one is London? And when they overlap, how do I know which bit corresponds to which line? Do they cross, or merely meet and diverge again? I haven't made the fact that this is a line graph obvious, because the lines aren't distinguishable from the shape formed by the colours.

Speaking of colour, you'll want to make sure you haven't come up with a palette that is going to make interpreting your graph difficult for someone with colour blindness. There are many different forms of colour blindness, so it pays to run a test on your graph. You can do this online by using a "Colour Blindness Simulator" on your finished image.

Sticking with the negatives, ask your tester which element of the graph they like the least. For the sample graph, they may say they don't like the colours, or the font, or the legend. Personally, I think using --------> to represent arrows looks lazy. Everyone will have their own opinions on what's worst about your work. If you know what turns people off, you can make visualizations that people like. And if they like the visualization, readers are more likely to engage with its message. With this in mind, go ahead and ask if they like your graph, or whether there are any elements of it that they particularly fancy.

Just as with your prose, it may take a few iterations and a number of different opinions from colleagues before a graph says to others what you think it says in your own mind. Just because you submitted a graph with your article and the peer-reviewers didn't comment on it doesn't mean you've done a good job of clearly expressing your ideas visually.

And one last question to ask, just to make sure your readers get the right message and aren't distracted: does the shape of the graph make it look like anything unrelated?

Graphs and visualizations have tremendous potential for expressing ideas in academic research, but it's not a skill we're typically taught in school. Most of us learn on the job, or emulate graphs we saw elsewhere that we found effective. Taking the time to ensure the graphs you create transmit the right ideas to your reader is good scholarship. Knowing the right questions to ask makes it that much easier to reach that result.

Questions to ask about a visualization:
  1. Can you explain this graph to me?
  2. Are there any elements you do not inherently understand?
  3. Can you explain what each axis shows (if applicable)?
  4. Will people with colour blindness be able to differentiate your colour palette? (check online)
  5. What do you like least about the graph?
  6. Do you like the graph / a particular element of the graph?
  7. Does the shape of the graph make it look like anything distracting?

Tuesday, July 23, 2013

Students should be empowered, not bullied into open access

'Bully Free Zone' by Eddie-S
The American Historical Association (AHA) has just adopted a resolution in support of recent graduates, encouraging them to feel empowered to keep their dissertations offline while they seek a publisher to turn that dissertation into a scholarly monograph.

Surprise, surprise, open access advocates everywhere have started snivelling.

No! they cry. We shouldn't support a resolution passed in good faith to protect the career progression of new scholars against scholarly presses that are allegedly refusing to accept manuscripts based on openly available dissertations. We should be burning books and the organizations that publish them. Down with books, up with free information on the Internet!

Lovely, but you can't eat free information. Makes a shit shelter as well.

Now, I certainly understand, sympathize, and even agree with the complaints of the open access community. Trevor Owens posted some great suggestions last night for ways to amend the AHA statement into one that recognizes some real flaws in the publication / promotion / tenure model that is over-reliant upon books. I certainly agree with Owens that it makes no sense to leave career progression of historians in the hands of acquisition editors at famous scholarly presses.

I'd also suggest that the AHA's claim that history is a "book" discipline is a bit too narrow. From where I live in London, England, hundreds of thousands of people make their living either directly or indirectly off of history. That can be anything from freelance tour guides who offer historic walks through the City, to the cafeteria workers in the museums and historic sites, the actor who draws you into his theatre for a rendition of Richard III or the actress who portrays Elizabeth Woodville in a television series, or even Her Majesty the Queen, whose very presence and connection to a historic institution draws in millions of tourists every year.

The AHA's perspective on how presses react to openly available dissertations is probably also flawed. A yet-to-be-published (and open access) article, "Do Open Access Electronic Theses and Dissertations Diminish Publishing Opportunities in the Social Sciences and Humanities", suggests that the vast majority of publishers are willing to consider submissions based on openly available theses.

With all of this in mind, let's give the open access community what they want: You're right.

But dear God you're obnoxious.

The decision of the AHA to support this measure is nothing but a well-intentioned gesture designed to protect and empower those at the most vulnerable point in their career from a perceived threat. How could anyone criticize them for that? The AHA and scholarly societies like it are not the enemy, and they don't operate to keep scholarship in the 19th century. They exist to promote the interests of their members, and that's exactly what the AHA has done with this resolution. If you want to change their direction, join them. Run for positions of power within their ranks, and influence the opinions of their membership. The historians who belong to these organizations aren't stupid, so if your ideas are good and your models sound, there's no reason we can't expect gradual change towards open access.

Both scholarly monographs and open access have their merits. We shouldn't be pushing for either / or, just as we haven't driven actors from the stage because we have television. Scholarly monographs are an effective way of preserving historical knowledge; they're in a format that the vast majority of us understand and even appreciate. We don't need to give that up.

And while I can appreciate the advantages of open access, its advocates often ignore the problems of an open access model. We live in a society in which things that have no cost have no perceived value. You wouldn't expect your lawyer to work for free, so why your historian? The scholarly presses defend their (failing) business model because it keeps their friends and family employed, their kids fed, and their bills paid. This isn't just a matter of profits funneling into the pockets of the rich. It's the way people like you and me make modest and honest livings.

If we start giving everything away we're promoting a model in which certain professions operate without the security of a paycheque while others doing important work continue to charge for their services. It's all well and good for open access advocates to tell us the benefits of their model, but until they come up with some solutions for its failings, they won't gain any friends who are sitting on the fence. Especially not if every well-intentioned effort by a scholarly society is met with a hostile barrage on Twitter by an extremist perspective that ignores the fact that we're all on the same team: We love history and we want to spend our careers sharing it with others.

If you want to give your dissertation away online, by all means do so. But it is your dissertation. You should feel equally empowered to bury it in a hole in the back yard, or throw it off a bridge. Anyone who tells you that you're bound by some moral obligation to give it away has a job, or a trust fund, and has no business putting any demands on your labour. Even if your scholarly book never earns you a cent, it's your prerogative to try and flog it any way you like. That doesn't make you a bad person. Neither does withholding your thesis from the Internet if you think that will help your pursuit of a career that allows you to provide for your family. I hold my right to support my family far above your right to read my ideas for free.

I wholeheartedly want to thank the AHA for standing up for and empowering new scholars. No good deed goes unpunished, but there are many of us out there who appreciate your efforts and look forward to continued progress in what we hope becomes a civil debate and progression towards increased open access.