Monday, October 26, 2009

#apiworkshop Reflection: free the data

I recently attended Bill Turkel's Workshop on APIs for the Digital Humanities held in Toronto and had the pleasure of coming face to face with many of the people who have created the historical data repositories I have used so enthusiastically.

What I came away with was an even stronger conviction that data in humanities repositories should be completely liberated. By that, I mean given away in their entirety.

The mass digitizations that have occurred recently have provided a great first step for researchers. I no longer need to fly to London and sit in a dark room sifting through boxes to look at much of the material I seek. But I am still, for the most part, unable to do what I like with it.

Many repositories contain scanned documents which have been OCR'd so that they are full-text searchable, but the OCR text itself is not shared with the end user. That makes the repository useless for anyone wanting to do a text analysis or mash up the data with another repository's content.

Most databases require the user to trust the site's search mechanisms to query the entire database and return all relevant results. If I'm doing a study, I'd prefer to do so with all the material available. Without access to the entire database on my hard drive, I have no way of verifying that the search has returned what I sought.

Many of those at the workshop who administered repositories were willing and eager to email their data at the drop of a hat, but that is not yet the norm. Most of my past requests for data have been completely ignored. When it comes to scholarly data, possessiveness leads to obscurity.

As humanists become increasingly confident programmers, many will define research projects based on the accessibility of the sources. Those who are giving their data away will end up cited in books and journal articles. Those desperate to maintain control will get passed by. If someone asks you for your data, think of it as a compliment to your work, then say yes.


Trevor said...

Excellent post. I think you hit on something important---that sharing data is as much a social challenge as a technical one. APIs and Linked Data are important technical requirements for sharing data, but the harder challenges may be convincing humanities scholars to *actually* share using these technical means (or some others not yet invented).

It seems to me that, in the humanities, we might have an underdeveloped conception of sharing---what do we share and when do we share it? We work with various kinds of data at various levels of "processed-ness" during a project---these might be shared in different ways.

The idea that humanities scholarship happens only "inside the mind of the scholar" is something that Digital Humanities has helped debunk. Building on these insights might help humanists see the value of more data sharing.

Sean Kheraj said...

Great post, Adam. I think you've hit on one of the main points of the API Workshop: the value of sharing data and tools for knowledge production (in this case, historical research). Not only does the creation of an API for historical data open possibilities for greater use of that data, it redistributes control and takes advantage of the great potential innovation of a wider audience of end users.

I really should get started on writing up my reflections for the KnowMob page and finish editing the audio from the workshop. Look for it in the next week or so.
