Wednesday, April 4, 2012

Tricks for Transcribing High-Contrast Historical Reproductions

If you spend enough time as a historical researcher, you're bound to come across the black blob. The blob - also referred to by its more technical name: "those letters I can't make out because of the stupid contrast levels on the reproduction" - is far more common than many of us would like, especially in online databases containing copies of original historical materials. It may not be the fault of the digitizers; the problem may have first occurred decades ago when the source was transferred to microfilm or microfiche. Whatever the cause, it forces many a historian to squint and hypothesize about what lies behind. This post will provide a possible solution to the blob, using free software and straightforward techniques. It will not work in all cases, but it may conquer some blobs.

The above image is an unadulterated screenshot of a Vagrancy Removal Record from Middlesex County in the 1780s, found on the London Lives website. The original source contains lists of names of those forcibly removed from Middlesex County. We've clearly got an Elizabeth "Eliz" and a Joseph here, but the contrast on the image is too high to make out their surnames. London Lives does offer full transcripts of everything on the website. Unfortunately, the transcribers were unable to decipher the names and left these particular entries incomplete. We too could pass them by, but if we are interested in what's underneath we can turn to a photo editing program to make an attempt.

This tutorial uses GIMP, a free open-source image processing program not unlike Photoshop. Feel free to use the program with which you are most comfortable.

Step 1: Save the Original Image

I was using a Mac, so I took advantage of the handy screen capture feature (Cmd + Shift + 4), which allowed me to snag only the part of the image I was interested in correcting. Alternatively you could save the whole image by right-clicking it and using the "Save As" feature.

Step 2: Open the Image in an Image Processing Program

As mentioned above, if you do not already have an image processing program, try out GIMP. It is free after all.

Step 3: Adjust Brightness / Contrast

Open the "Brightness/Contrast" box located under the "Colors" menu. Increase the brightness and contrast. In this example I've changed brightness by 118 and contrast by 103. Play with the sliders to get a result that works best for your particular source. You may even find it works better for you if you decrease one or the other. If you max out the values and need even more brightness or contrast, click OK and re-open the same dialogue box. This will allow you to repeat the process. You should notice some of the black blob beginning to fade and reveal hints of what might be underneath. This will probably occur first closer to the edge of the blob. You may now have all the information you need to finish the transcription. If so, great. If not, keep reading.
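For the curious, here is a rough sketch in Python of why this step helps. It is not the formula GIMP uses internally; it is a simple linear adjustment chosen for illustration, and the function name `adjust`, the sample pixel values and the contrast factor of 5.0 are all made up for the example. The brightness of 118 is borrowed from the setting above.

```python
def adjust(pixels, brightness=0, contrast=1.0):
    """Brighten 8-bit grey values, then stretch them away from mid-grey (128).

    Illustrative linear formula only; GIMP's own maths differs.
    """
    out = []
    for p in pixels:
        v = (p + brightness - 128) * contrast + 128
        # Clamp to the valid 0-255 range of an 8-bit image.
        out.append(max(0, min(255, round(v))))
    return out


# A hypothetical row of pixels cut through a "black" blob:
# the background is 4-6, the hidden ink is 14-16. All look black on screen.
blob = [4, 6, 5, 14, 16, 5, 4]

revealed = adjust(blob, brightness=118, contrast=5.0)
print(revealed)  # [98, 108, 103, 148, 158, 103, 98]
```

Before the adjustment the ink and the background are only about 10 grey levels apart; afterwards the gap is roughly 50, which is enough for the eye to pick out.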

Step 4: Colorize

The "Colorize" feature is also located under the "Colors" menu. It will help us to make the hidden letters pop out from amidst the shades of grey and black. Feel free to play with the sliders here to see if you can brighten up the results to the point where you are comfortable reading them. Sometimes I find it helps to decrease the "lightness" value while increasing the "saturation".
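As a rough idea of what colorizing does to each pixel, the sketch below maps a grey value onto a single hue, keeping the grey level as the lightness. It uses Python's standard `colorsys` module and is an approximation for illustration, not GIMP's actual implementation; the function name `colorize` and the default hue, saturation and lightness values are assumptions.

```python
import colorsys


def colorize(grey, hue=0.5, saturation=0.8, lightness_shift=0.0):
    """Turn an 8-bit grey value into an (R, G, B) tuple of one hue.

    hue and saturation run from 0.0 to 1.0; lightness_shift is added to the
    pixel's own lightness (negative values darken, positive values lighten).
    """
    lightness = max(0.0, min(1.0, grey / 255 + lightness_shift))
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(c * 255) for c in (r, g, b))


# Two mid-range greys that are close together after Step 3.
for grey in (98, 148):
    print(grey, colorize(grey))
```

Pure black and pure white stay as they are, but the in-between greys pick up colour, so small differences in shade become differences in tint as well.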

If you are still having trouble reading the words you can go back and repeat the process by again adjusting the contrast and brightness, and fiddling with the colours even more. If that doesn't work, you can move on to step 5.

Step 5: Trace What You Can See

For this step I like to use a USB tablet and pen, which lets me write to the screen in a fashion that's a bit more natural feeling than trying to draw using my mouse. If you don't have one you can do it with a mouse too. Choose the pen tool from the Toolbox and reduce the "Scale" of the brush to something appropriate for the size of the handwriting. Next, choose a nice bright colour that will stand out against the background colours you have chosen. Then, take your time and trace over whatever letters or bits of letters you can see.

As you can see from this example, we have been quite successful. What was once a "man Eliz" and a "ll Joseph" is quite obviously a "Hayman Eliz" and a "Hill Joseph". We did not get every part of every letter, but we did get enough new information to piece together the missing names.

This process may take a few minutes, but it can be worthwhile if your project depends upon deciphering the letters beyond the black blob. Unfortunately, it will not work in all cases. For this technique to work you do need a black blob with some shading variation. Computers store images as a series of coloured pixels with values ranging from completely black to completely white. Many black blobs are actually very dark grey blobs that look black to our eye. If there are shades of grey in your blob, and those shades correspond with the hidden letters underneath, as is often the case, then this technique may help you peel back the black and find what you are looking for.
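If you are comfortable with a little code, the idea of "shading variation" can be checked directly. The sketch below is only an illustration: the function name `has_hidden_detail` and both thresholds are arbitrary assumptions, and it expects a plain list of 8-bit grey values (0 for black, 255 for white) taken from the blob.

```python
def has_hidden_detail(pixels, black_threshold=40, min_levels=3):
    """Return True if the dark pixels contain several distinct grey levels.

    A blob made of one flat value has nothing to recover; a blob made of
    several near-black values might be hiding letters.
    """
    dark = [p for p in pixels if p <= black_threshold]
    return len(set(dark)) >= min_levels


print(has_hidden_detail([0] * 10))            # flat black: False
print(has_hidden_detail([4, 6, 5, 14, 16]))   # dark greys: True
```

A True result does not guarantee that the variation lines up with the handwriting, only that there is something for the brightness and contrast controls to work with.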

Happy transcribing.


Sharon said...

I looked at quite a few of those London Lives image files (I applied the same techniques you mention in some really bad cases before we put them online) and I completely endorse your closing comments that often what seems like a black blob on the website is in reality more differentiated once you get to work on the contrast/brightness controls with GIMP - or even a simpler program like IrfanView/Mac Preview. So it's always worth a try.

Adam Crymble said...

Thanks for the comment Sharon. Good suggestion on Mac Preview; I've never thought to use it that way. Very quick and efficient.

Ben W. Brumfield said...

This is fantastically useful, Adam. I'm working on building a web front-end to do exactly this sort of thing for the FreeUKGen transcription tool, but I'd never seen colorize before. Many of our users are quite sophisticated at adjusting contrast and such to discover words blurred by water damage using offline tools, but I think this will be quite helpful.

Adam Crymble said...

Hi Ben, glad you found it useful. I'd love to see how that transcription tool comes out. It sounds like the perfect web-based application for transcribers.

Ben W. Brumfield said...

Thanks, Adam. It's based on the Scribe tool developed by the Zooniverse team for OldWeather, and our extensions will all be open-source. We're still in the early stages of R&D, but if you're interested you can follow our efforts on GitHub.
