Thursday, January 17, 2013

Measuring the Diversity of Immigration using the Old Bailey Online 1674-1834

"Mother's Wartime Passport -1941" A. Davey
This is the second in my series of posts on the Old Bailey Online (OBO) corpus. I've downloaded all of the trial transcripts from 1674 to 1834 (find out how on the Programming Historian 2), which is about 100,000 trials and 51 million words of text. In the last post I looked at the impact of editors and scribes on the vocabulary in the Old Bailey Proceedings.

This time I thought I'd look at something a little closer to my area of expertise: immigration to London in the Early Modern era. I've used the OBO heavily in my doctoral work on Irish immigrants, but that's been focused exclusively on the years 1801-1820, immediately following the 1801 Union of Irish and British parliaments. I've yet to take a longer look at immigrants across the centuries using the OBO and I thought this would be the perfect opportunity to do so.

This time I'll be looking at the "people words" extracted from the OBO corpus. As I mentioned in the last post they were identified by extracting all of the words that appeared between a set of "persName" tags in the XML version of the transcripts. This gave me just shy of 62,000 unique strings (referred to hereafter as "words") used to represent people. That's nearly half of all unique words in the corpus. Of those 62,000 words, most (55,000) are not found in the four English language dictionaries I used to identify English words. The remaining 7,000 are words such as "green" or "woman" or "the", which are used to refer to people such as "the woman" or "John Green", but which can also be used in other contexts (the woman's green hat). Not all of these words are therefore proper names; instead, they are words that have been marked up by the OBO team as a reference to a person somewhere in the corpus.
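
For readers who want to try something similar, here is a minimal sketch of how words inside "persName" tags might be pulled out of a transcript in Python. It is an illustration and not the script used for this post. It assumes the tags carry no XML namespace and that words are lowercased before counting; the function name `person_words` and the `SAMPLE` snippet are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def person_words(xml_text):
    """Return the set of lowercased words found inside persName elements."""
    root = ET.fromstring(xml_text)
    words = set()
    for elem in root.iter("persName"):
        # itertext() also collects text held in nested child elements.
        text = " ".join(elem.itertext()).lower()
        words.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text))
    return words


# Made-up snippet for illustration only; not a real transcript.
SAMPLE = (
    "<trial><p><persName>John <hi>Green</hi></persName> was indicted for "
    "stealing a green hat from <persName>the woman</persName>.</p></trial>"
)

if __name__ == "__main__":
    print(sorted(person_words(SAMPLE)))
```

Only text inside the tags is collected, so "hat" is ignored while "green" is kept because it appears as part of a tagged name.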

In Figure 1 you can see the rate at which these new "person words" appeared in the corpus.

Figure 1: Total number of "person words" found in the OBO corpus to date.

Nothing particularly exciting here. It seems like most of the person names that are also English words appear very early on. It also looks like the number of unique words used to describe people increases at a steady pace throughout the long eighteenth century. From a cursory look at the list of names, it seems evident that many of these words are surnames.

Given names (first names) on the other hand, are much less common. That's because most early modern Londoners had pretty common given names (William, John, Elizabeth, or some variation thereof). Silly names for babies are largely an invention of twenty-first century Hollywood actors and football players.

While London was home to hundreds of thousands of people in the eighteenth century, it's safe to say the number of surnames in the city increased over time as migrants flooded in from across England and beyond, carrying new names with them. New surnames therefore have a few ways to end up in the corpus:
  1. An established London family is mentioned in the record for the first time
  2. An immigrant family with a new name arrives in the area and ends up in the record
  3. Someone with a funny accent tries to say their name and it gets spelled phonetically
In the case of #1, it's entirely possible an established family (or anyone with that name) just avoided the Old Bailey for decades on end. I've managed to do so and there's no reason to expect others didn't too. However, common names shared by many people and local to Londoners should eventually show up. In fact, there's a reasonable chance they'll show up very early. We see this is in fact the case, as Smith, Wilson, Brown, White, and Allen all appear for the first time before 1680. "McCaffrey" on the other hand doesn't show up until 1834 and it's safe to say represents a name brought to London by an immigrant (either scenario #2 or #3 above).

New names arriving in the area may not be indicative of the total number of new people who have migrated to London, but I do believe they reflect the growing diversity of immigrants arriving. Malcolm Smith and Donald MacRaild's article, "The Origins of the Irish in Northern England" shows that at least with Irish surnames, most names can be pinpointed to a particular region in Ireland. This regionality of names was still evident into the middle of the nineteenth century and will be no surprise to any genealogist who has sought out their family's past. We can see direct evidence of this regionality by mapping surnames. Great Britain Family Names allows you to see the distribution of any name in Britain in 1881. In Figure 2 you can see the distribution of "Howard" families, which clearly cluster around a few areas.

Figure 2: The distribution of "Howard" families in 1881
John Mannion agrees with Smith and MacRaild's conclusions about migration. In "Old World Antecedents, New World Adaptations" he argues that people tend to follow migration "channels". That is, someone from their village went before them and came home to say how great it was. Mannion was looking specifically at Irish migrants to Newfoundland and was able to show that villages that had already sent migrants to Newfoundland were vastly more likely to continue to do so than somewhere without the same history. That means the first "McCaffrey" in London was far more adventurous than the 351st. In fact, we could suggest that in many cases the first McCaffrey drew the other 350 over time by breaking the ice. I am interested in why that first McCaffrey decided to come to London, and what factors might have influenced his or her decision to do so.

I think the OBO corpus is a useful set of records for monitoring this growing diversity of migrants because I'm quite firmly convinced that the Old Bailey was the place you took people you didn't know when they had wronged you. I believe strangers (including immigrants and migrants) were much more likely to be subjected to the official justice system than were people with deep roots in the community. If a stranger wronged you, they had to be caught and punished quickly, or they might disappear. If your long-time neighbour stole your linen tablecloth, you had lots of options for how best to deal with him. You could smack him, you could set the dogs on him, you could knock on the door and demand it back. You had options, and time, because you knew he would be there tomorrow. And the next day. You don’t have that same assurance with someone you've never seen before. That meant, I believe, that people were more likely to seek a legal response than a community resolution when dealing with strangers or those they did not know very well. In John Beattie's wonderful book, Crime and the Courts in England: 1660-1800, he appears to agree:
In the small-scale society of the village a prosecution may not have been the most effective way to deal with petty violence and theft. Demanding an apology and a promise not to repeat the offense, perhaps with some monetary or other satisfaction, may have been a more natural as well as a more effective response to such an offense, or perhaps simple revenge directly taken (Princeton: Princeton University Press, 1986, p. 8).
If I am correct in my assumption then immigrants are more likely to appear in the Old Bailey records than established members of the community (at least as a defendant), and are even more likely to do so shortly after they arrive in London as opposed to several generations later. That means that there is likely a reasonably strong connection between the date a name first appears in the OBO corpus and the date that name first appeared in the London area. It may not be a precise way to measure the arrival of new names, but I would hazard to say that in most cases it's probably accurate to within a few years or a decade at the most.

Therefore, one way to find new families with few if any connections to the locals arriving in the area is to look for the first time a given surname appears in the records. Considering the nature of the Proceedings, most names that appear in the record likely refer to people in London as opposed to strangers living far away. That's not always going to be true but for the most part it's a reasonable assumption. That means by measuring the rate at which new names appear in the OBO, we should get a reasonable if rough idea of the rate of immigration from distinct family groups over time.

To isolate surnames I've taken all of the 60,000 names present in the London area in the 1841 census and checked them against the "person words". This returned a list of just over 20,000 surnames. That means one third of all unique surnames in London at the end of the period have shown up in the Old Bailey corpus at some point or another. I'm sure I've missed a few, particularly those spelled phonetically, but this is probably a pretty good start.
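
The two steps described here, keeping only the person words that also occur in the census list and then recording the first year each one turns up, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions and not the code behind the figures: the input is assumed to be (year, word) pairs, the function names are hypothetical, and the sample records are placeholders (only the pre-1680 date for "smith" and the 1834 date for "mccaffrey" echo what is reported above).

```python
from collections import Counter


def first_appearances(records, surnames):
    """Map each known surname to the earliest year it occurs in the records."""
    first_seen = {}
    for year, word in records:
        if word not in surnames:
            continue  # ignore person words that are not census surnames
        if word not in first_seen or year < first_seen[word]:
            first_seen[word] = year
    return first_seen


def new_surnames_per_year(first_seen):
    """Count how many surnames make their first appearance in each year."""
    return Counter(first_seen.values())


# Placeholder data for illustration only.
CENSUS = {"smith", "green", "mccaffrey"}
RECORDS = [
    (1679, "smith"),
    (1676, "smith"),
    (1676, "green"),
    (1680, "the"),
    (1834, "mccaffrey"),
]

if __name__ == "__main__":
    seen = first_appearances(RECORDS, CENSUS)
    print(sorted(new_surnames_per_year(seen).items()))
```

Phonetic variants are treated as separate strings here, which is one reason a match against a fixed census list will miss some names.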

You can see when each of these names first appears in the corpus in Figure 3.

Figure 3: Number of new surnames in the OBO corpus per year.
What Figure 3 shows is that the rate of new surnames arriving is actually fairly stable over the course of the eighteenth century. As mentioned in the previous post, the big dip around 1700 is caused by missing data and very short trial accounts, and we might be wise to assume that the entries around 1715 should actually be lower if we had the full set of trials as more names would appear earlier, filling in the gap. The long slow decline therefore over the course of the eighteenth century might actually be better understood as a reasonably flat line hovering around 100 new surnames per year and declining slightly towards 50 or 60. But it turns out that is not the whole story, and the clue to that is the increase in new names in the years immediately following the Napoleonic Wars c. 1815.

After the fall of Napoleon at Waterloo it seems quite clear that there's an influx of new surnames into the London area. I've got a suspicion that the cause of this influx is decommissioned soldiers and sailors who were dumped in London (or found their way there) after the war and got themselves into trouble. War collects soldiers and sailors from far and wide and brings them together. When those soldiers and sailors are no longer needed they're released to go on with their lives.

It would seem that after two decades of war enough people had been uprooted from their native regions by this process for a long enough period that they felt no inclination to go back home. Instead some of them obviously resettled in London or the growing industrial cities in the north, which seemingly offered greater opportunities or were more germane to their skills than the family farm. In fact, many people may have found themselves without a farm to go back to, since the enclosure movement had been consolidating arable land into much larger units throughout the second half of the eighteenth century, leaving many people landless. That landlessness may have attracted them to the army or navy in the first place, and now with military life behind them they had to find something else to do with themselves. London, it would seem, was it. At least for some.

This trend of more new names after the Napoleonic War doesn't appear to be an isolated incident; it's merely the most obvious case. Instead we see similar patterns in other major wars and domestic conflicts involving the British, the results of which can be seen in Figure 4.

Figure 4: Number of new surnames in the OBO corpus per year, colour coded to highlight periods of war and peace
Figure 4 shows the same number of new surnames per year arriving in the OBO corpus, but this time is highlighted to show some of Britain's major wars and domestic conflicts in the latter-half of the eighteenth century. The wars depicted in red are:
  1. The Seven Years War (1756-1763)
  2. The American Revolutionary War (1775-1782)
  3. The French Revolutionary Wars (1793-1802)
  4. The Napoleonic Wars (1803-1815)
While the American Revolution wasn't officially settled until 1783, it was effectively over by the end of 1782. The grey bar between 1802 and 1803 separates visually the two wars with the French.

The black bars represent years in which significant domestic conflicts took place:
  1. The Jacobite Rising of 1745 (1745-1746)
  2. The Gordon Riots (1780)
  3. The Irish Rebellion (1798)
In nearly all cases we see a decrease in the number of new names showing up shortly after a war or domestic conflict erupts. This is most evident for the Jacobite Rising of 1745. The apparent dip in migration at this point makes sense; who wants to move when there's a rebellion going on? This dip is then followed by lower than average numbers of new names until the conflict ends, at which point almost invariably the following years experience an above average result. This is evident both for domestic conflicts as well as international wars. The pattern appears again and again.

The differences between the average number of new surnames per year during war and the average in the five years after the end of a war are in fact statistically significant, at least for the American Revolution and the combined French Wars (paired t-test: p = 0.0418 and p = 0.0001 respectively, with significance set at p < 0.05). The Seven Years War does not pass the t-test (p = 0.1346). However, I am confident we are seeing evidence of the same trend. While my statistical skills are rather rudimentary, I think it's worth noting that failing a t-test does not mean something is not true. Instead, it suggests the numbers alone cannot support that conclusion beyond all reasonable doubt. For me, the fact that the latter wars are so obviously following this trend strengthens my confidence in a similar trend for the Seven Years War and we can see this in Figure 5, which shows the average number of new surnames per year during and after the three wars.

Figure 5: The average number of new surnames per year during the wars and in the five years following the wars
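
For anyone unfamiliar with the paired t-test mentioned above, the sketch below shows how its test statistic is computed from two equal-length lists of yearly counts. It is only an illustration: how years were paired for the results reported here is not specified, so the pairing is an assumption, the function name is hypothetical, and the numbers are made-up placeholders rather than values from the corpus. It returns the t statistic and degrees of freedom only; turning those into a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(during, after):
    """Return (t, degrees_of_freedom) for a paired t-test on two samples."""
    if len(during) != len(after) or len(during) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [a - d for d, a in zip(during, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder counts of new surnames per year (NOT real data).
WAR_YEARS = [50, 55, 48, 60, 52]
PEACE_YEARS = [70, 72, 65, 80, 75]

if __name__ == "__main__":
    t, df = paired_t_statistic(WAR_YEARS, PEACE_YEARS)
    print(f"t = {t:.2f} with {df} degrees of freedom")
```

A large positive t here would indicate that the peacetime counts are consistently higher than their wartime partners.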

The strength of the correlation between these conflicts and the decrease in new names, followed by an increase in peacetime suggests to me that my original assumptions about newcomers getting in trouble with the law were correct. It also suggests some interesting things about migration patterns in the eighteenth century. That is, people migrated when they felt it was safe. During times of turmoil, they stayed put and waited things out.

There are implicitly two groups of people here, so each requires its own discussion, I think. Firstly there are the sailors and soldiers. The reason we don't see these people arriving in London during wartime is perhaps obvious: they were in the employ of the state, off fighting the enemy. Gathered from across Britain and Ireland, as mentioned above, when they were decommissioned they had the opportunity to move where they liked and it would seem some chose London. This may have disproportionately included sailors who may have hoped to find work in London's booming shipping business.

The second group are families or individuals who have decided for economic reasons to move to London. Since we don't have evidence that someone with that name lived in London previously, many of them are presumably amongst the first of their stock to try their hand at London living. This in itself should not be taken lightly, as moving to early modern London without a social support network was an incredibly lonely and dangerous prospect, which is why so many migrants failed and found themselves in gaol, or starving and desperate, looking for any chance to get away. Sadly, we see many immigrants like Sarah Holmes, who claim that they "have no friend but God" as they throw themselves at the mercy of the courts.

What does all this mean? What can we learn about these arriving surnames? War and domestic conflict are not the only variables at play here, but I think it puts forth a reasonable case for the effects of war and peace on migration patterns of those moving towards London in the long eighteenth century. Returning to the idea mentioned earlier about the first McCaffrey (or the first of any family), it seems that families were only too willing to bide their time during periods of war, waiting instead for peace before making their way to a new life in London. We can see why this strategy might have been appealing. Why take a risk when the country is at war?

Unfortunately it may have been the wrong approach from an economic standpoint. According to Ball and Sunderland's "An Economic History of London, 1800-1914", the gap between real wages and cost of living peaked just after peace was called with France in 1817 (p. 95). That means people were most desperate when the government realised it actually had to start paying for the war it had waged. It seems to me likely that the two trends are actually connected. As new families arrived they may have been desperate for any work, forcing down the price of labour in London as a surplus of workers vied for jobs. It may seem counterintuitive, but these data suggest it's best to move during war rather than after.

Nevertheless, this unlikely set of criminal records has provided, I think, an interesting window into wider migration strategies across the eighteenth century. And it came about not by looking at how many people arrived, but when unique groups of people likely first emerged. London it would seem was home to an increasingly diverse population. That population continues to diversify to this day. And though I'm sure there are some Brits who might see war as a strategy for keeping the net migration in the "tens of thousands", take heed, for when peace comes, so too will the immigrants.

Wednesday, January 9, 2013

Whose Lexicon? The Impact of Reporters and Editors on the Old Bailey Proceedings

A few weeks ago I was discussing early modern vocabulary with Tim Hitchcock (as one does on a Wednesday evening). If I recall correctly, he felt that new words likely appeared at roughly the same rate as old words disappeared from the language. In essence, we're not getting a bigger vocabulary, we're just using an ever-shifting one. Ben Schmidt's blog posts on "Age Cohort and Vocabulary Use" and "Predicting Publication Date and Generational Vocabulary Shift" would tend to support this idea. The basis for Schmidt's article was an analysis of the age of authors and how their age impacted the way they used words in 19th century literature. Schmidt found that people learn to use language in a certain way in their youth and tend not to change those patterns very much as they age. This accounts for the slightly different languages whippersnappers and their grandparents speak, even to this day.

I decided to see what I could find out about vocabulary use by digging into the Old Bailey Online (OBO). As many of you undoubtedly know, the OBO is a wonderful corpus of electronic text for anyone interested in Early Modern London. The OBO is an electronic XML version of the Proceedings of the Old Bailey, an abridged transcription of what was said in court for each case held in the Old Bailey courtroom between 1674 and 1913. What we have is not an exact facsimile of every word spoken, but what Magnus Huber believes is “guided by” what was said in court, capturing the ideas, if not always the exact words of the speaker. Though not a perfect transcription of speech, Clive Emsley believes that we can put our faith in the events described in the Proceedings because “the Old Bailey Courthouse was a public place, with numerous spectators, and the reputation of the Proceedings would have quickly suffered if the accounts had been unreliable”.

Originally intended as a profit-making venture, entertaining the masses with tales of woe in the courtroom, the Proceedings became the official record in 1778 and were required to present a “true, fair and perfect narrative”. Practical limits of course made this difficult. The Proceedings were created entirely without electronic recording devices by shorthand reporters. Many trials therefore appear in significantly condensed form, such as the six-hour trial of Charles Stokes and company from 1787 that is recorded in only 468 words in the published version. Unfortunately, I do not have Huber's annotated version of the OBO corpus. I do, however, have the full set of XML files, downloaded from the Old Bailey Online website (to learn how to do this check out the Programming Historian 2). I decided to focus on the records between the beginning in 1674 and the foundation of the Central Criminal Court in 1834. That gave me 161 years' worth of early modern transcripts for just over 100,000 trials and 51 million words.

What I found was that even with such a wonderful resource we cannot be sure what people said to one another and how closely those speech events relate to the written records we have left. None of us knows how to speak like an eighteenth century Englishman, and based on my digging I don't think that the OBO can teach us how. That's because the vocabulary used in the OBO is the vocabulary of the people who recorded and edited the trial transcript and not of a wider communal lexicon. Instead of looking into vocabulary, I quickly realized I needed to be thinking about vocabularies. When it comes to the original assumption, we're not getting a bigger vocabulary, we're just using an ever-shifting one, the real question is: who is "we"?

Before coming to this conclusion, I looked at the rate of unique words entering the corpus over time. When I calculated this there were more than 120,000 unique words in the corpus, the introduction of which can be seen in Figure 1, showing a surprisingly linear increase. I should note that I'm defining a word as a unique string of characters. "word" and "words" are two different words for my purposes - that is, the corpus was not lemmatized.
Figure 1: The size of the lexicon in the Old Bailey Online
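
Counting a lexicon in which every unique string is its own word is simple to sketch. The Python below is a minimal illustration and not the script used for this post: it assumes the text is lowercased and split on runs of letters, and that sessions arrive in chronological order. The function name and the two sample sessions are hypothetical.

```python
import re


def lexicon_growth(sessions):
    """Return (label, cumulative unique word count) for each session in order."""
    seen = set()
    growth = []
    for label, text in sessions:
        # No lemmatization: "word" and "words" count as different strings.
        seen.update(re.findall(r"[a-z]+", text.lower()))
        growth.append((label, len(seen)))
    return growth


# Made-up sessions for illustration only.
SESSIONS = [
    ("1674", "The word and the words"),
    ("1675", "the word burghlary"),
]

if __name__ == "__main__":
    for label, size in lexicon_growth(SESSIONS):
        print(label, size)
```

Because nothing is lemmatized, spelling variants such as "burghlary" each add one to the running total.
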
Since I work on immigration to London this got me wondering if what we're actually seeing is not necessarily a growth in the vocabulary, but a diversification of names present in the metropolis. Trial transcripts typically discuss people, so we do get a lot of names popping up. New names can come from new people arriving in the city with unique names that no one else had, or when someone pronounces their name with a thick enough accent that a phonetic variation appears in the corpus as a new word (Callaghan, Calaghan, Callagan, Colligan, Calahan, Callaham, Callahan, Calleghan). Early modern parents were still largely not very adventurous with their names so we don't see a lot of children named "Apple" or "Harper Seven" and so creativity is not likely going to be a big factor driving the growth of new names. The OBO XML tags make this relatively easy to test. The "persName" tags allowed me to identify and extract all words used to describe people. As it happened, there were roughly 60,000 such names - about half of the entire lexicon.

I also decided to check if the new words represented use of English words, or if they were archaic spellings, acronyms, or otherwise misspelled words, so I ran the entire set through a series of four English-language dictionaries to see if we were dealing with recognizable words. These dictionaries included 90,000 unique "words", including lemmatized variations. Each word could therefore be a "dictionary word", a "name word", a combination of both, or neither.
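
The four-way split comes down to two set-membership checks per word. A minimal Python sketch, assuming the dictionaries and the person words have each been loaded into a set of lowercase strings (the function name and the tiny sample sets are hypothetical):

```python
def categorise(word, dictionary_words, name_words):
    """Place a word in one of the four categories described above."""
    in_dictionary = word in dictionary_words
    in_names = word in name_words
    if in_dictionary and in_names:
        return "both"
    if in_dictionary:
        return "dictionary word"
    if in_names:
        return "name word"
    return "neither"


# Tiny placeholder sets for illustration only.
DICTIONARY = {"green", "burglary", "accounts"}
NAMES = {"green", "callaghan"}

if __name__ == "__main__":
    for w in ["green", "burglary", "callaghan", "burghlary"]:
        print(w, "->", categorise(w, DICTIONARY, NAMES))
```

Sets keep each lookup fast, which matters when checking well over 100,000 unique strings against 90,000 dictionary entries.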

The results can be seen in Figure 2, which shows what category the new words fall into, graphed over time along the same scale on the y-axis.

Figure 2: The size of the lexicon at each trial session, broken down by the category of word
The results are, I think, interesting. I'll discuss the "Names List" and the "No List" entries in future posts. The bottom left graph is words that appear both as names and as words in the dictionary. This includes words such as "green", which can be both a person's name and a word describing the colour of something. I've decided not to disambiguate between the two on a word-by-word basis for time reasons.

For now, I'd like to focus on the dictionary terms. Dictionary words seem to be on the rise throughout the eighteenth century. The reason for this may in fact be a slight growth in the lexicon, but I think more likely is a tendency towards increased standardization in the spelling of English words. Samuel Johnson's Dictionary appears in print for the first time in 1755, so this is the age of standardization in lexicography. As anyone who has ever tried to read a seventeenth century text knows, people spelled (spelt?) words differently back then. Over time the "accompts" of criminal activity transform into "accounts". People stop committing "burghlary" and are instead charged with "burglary". This of course occurs shortly before the last "souldiers" are sent off to war.

My four "dictionaries" were built for modern use rather than seventeenth or eighteenth century vocabulary, which means the figures above are a better indicator of when standardized spelling of English words was adopted than they are measures of the lexicon. "Burghlary" should really be counted as a variation of "burglary" rather than as a unique word in the "no list" category. A linguist would tell me that I should have lemmatized my corpus.

But don't despair, not all is lost. I think we still can learn a thing or two about the lexicon as well as the OBO records themselves from this analysis. In Figure 3 you can see the number of new dictionary words per year introduced into the corpus.


Figure 3: The number of new words per year introduced into the OBO corpus.

The number of new terms is highest in the early years. This makes perfect sense as the corpus size starts at zero on the date of the first trial. Before a word appears in the corpus someone has to use it, and that takes time. The big peak in 1689 is an anomaly caused by a single account that was much longer than typically found at the time. Most big peaks can be traced to particularly long accounts, since these generally represent a reporter going into much more detail and therefore using a wider vocabulary. The dip in the early years of the 18th century represents a series of particularly short accounts, as well as some missing accounts. Where it all gets interesting for me is at the first arrow around 1715.

What we see at this first mark is a fairly high number of new dictionary words appearing each year until about the 1740s. The number of new words around 1715 may be artificially high, since we do seem to be missing entries in the previous decade and presumably some of those new words would have appeared earlier if we had the records. Nevertheless, there do appear to be more new words than normal in the following two decades. Perhaps surprisingly, the publication of Johnson's Dictionary, at the second arrow marker, is actually a low point for new word growth in the middle of the eighteenth century. This to me suggests that Johnson was in many respects responding to generally accepted norms of spelling and word use rather than driving the adoption of such uses.

The last arrow is for me the most interesting. The number of new words per year again increases significantly just after 1778, which as I mentioned above was the date that the Proceedings of the Old Bailey became a "true, fair and perfect narrative" - an official record of courtroom activity. Perhaps this shift from a popular to an official record meant a significant change in what went into a trial account. That does seem to be part of the answer. The length of the Proceedings does slowly start to increase, starting in 1783 when the graph jumps upwards.

The fact that there are spikes in new words every time a long transcript appears reinforces the fact that the Proceedings are using a specific vocabulary - one related to criminal justice - as opposed to a vocabulary that's representative of the entire active English lexicon. But the spikes in new words, as well as the number of words in a trial transcript, can tell us even more, if we look at who was writing those words down.

As mentioned previously, Magnus Huber believes the Proceedings are "guided by" what was said in court. Before a spoken word appeared in the Proceedings the original speech event was converted to shorthand by a courtroom reporter and was then converted back to prose by the workers in the print shop before being committed to paper. Words represent the attempt of one person to communicate an idea to another. As above, "burghlary" and "burglary" refer to the same idea. The difference between the two spellings is merely a choice in how to record the sounds using letters.

So what effect does changing the scribe have on the rate of new dictionary words entering the corpus? Apparently, quite a lot. I've located the names of the scribes from 1749 onwards in Huber's article, "The Old Bailey Proceedings, 1674-1834: Evaluating and annotating a corpus of 18th- and 19th-century spoken English". When we look at the number of new dictionary words each scribe introduces into the corpus (Figure 4), we see it's certainly not even across the board.

Figure 4: The number of new dictionary words per session added to the corpus, coloured by courtroom reporter.

I recognize there's a lot going on there, so I'll break this down into chunks that are easier to see. What I'd like to draw your attention to is the fact that some scribes increase the size of the corpus significantly, and others do not.

Firstly, let's look at Hodgson, who was holding the transcriber's pen from 1782 to 1792 (Figure 5).

Figure 5: The number of new dictionary words in the Old Bailey corpus from 1781 to 1795, coloured by courtroom reporter.

Though W Blanchard was only on the job for a few months, it's quite clear that E Hodgson was on average adding more new words to the corpus each month than had his predecessor. The number of new words Hodgson adds starts slowly, but then picks up rather dramatically before tailing off towards the end of his tenure. Hodgson's "reign" so to speak also overlaps with the significant growth in the size of the Proceedings mentioned above. From the graph and the trendlines, we might make the following conclusions:
  • Hodgson was verbose and reported more than his predecessors
  • He had a larger than average vocabulary that he happily shared
  • After a few years he "used up" his vocabulary and ceased to find as many new words
Hodgson has therefore influenced both the vocabulary used in the Proceedings as well as the length of the resultant documents. We would be naive therefore to suggest that Hodgson was an impartial observer or that the Proceedings during this period are anything but the output of Hodgson himself. The records were not "created"; they were created by Hodgson. The way Hodgson created the records was distinct from how the others did so.

Moving forward slightly in time, let's consider the next three scribes who wrote between 1792 and 1801 (Figure 6).

Figure 6: The number of new dictionary words in the Old Bailey corpus from 1792 to 1801, coloured by courtroom reporter.

The effect of different writers here is perhaps more obvious. Silby doesn't seem to be one for new words, whereas Marson and Ramsay in the blue definitely are. The fact that Ramsay stays on afterwards and the growth of the vocabulary is stunted thereafter suggests that it was Marson driving this change. It's becoming clear that anyone working with these records should be wary of who was responsible for creating them in the first place.

The last section I'd like to highlight is the final one from 1816 to 1834 when a single scribe named H Buckler was on the job. However, Buckler worked under a series of editors as can be seen in Figure 7.

Figure 7: The number of new dictionary words in the Old Bailey corpus from 1816 to 1834, coloured by editor.
In this case, the scribe stays the same yet we still see patterns that seem to make more sense when we know there's a different editor publishing the Proceedings. Clearly when Stokes takes over in 1828 there's a change to the resultant lexicon that spikes up, presumably after he had become confident in his new role after a few months on the job. This set is particularly interesting because from what we can tell, the scribe H Buckler doesn't seem to be the one driving the adoption of new words. From 1816 to 1828 he's one of the less ambitious in this category, but that begins to change as new editors take over.

Conclusions

How does this all tie back with my original questions about vocabulary in the early modern era? Well, first of all, I failed to test what I set out to understand. Because I did not lemmatize my corpus I was not able to determine if we do in fact have a growing vocabulary or a shifting one. What I should have looked at was a moving window of word use, calculating how many words were used in a given ten-year period. I'll leave that for another day or another researcher to take on if they feel so inclined. My suspicion is that we have both a growing and shifting vocabulary. Words are falling into disuse or at least out of regular use. I imagine that Ben Schmidt's analysis explains most of that shift: young kids don't learn - or at least don't use - the same words as their parents or grandparents. Words die one funeral at a time.
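
The moving-window idea could be sketched roughly as follows in Python. This is a suggestion for how such a measure might be set up, not something that was run for this post. It assumes a mapping from each year to the set of (ideally lemmatized) words used that year; the function names and the sample data are hypothetical.

```python
def words_in_window(words_by_year, start, width=10):
    """Return every distinct word used in the years start .. start+width-1."""
    window = set()
    for year in range(start, start + width):
        window |= words_by_year.get(year, set())
    return window


def window_sizes(words_by_year, width=10):
    """Map each window's starting year to the number of distinct words in it."""
    if not words_by_year:
        return {}
    first, last = min(words_by_year), max(words_by_year)
    return {
        start: len(words_in_window(words_by_year, start, width))
        for start in range(first, last - width + 2)
    }


# Made-up data for illustration only.
WORDS_BY_YEAR = {
    1700: {"accompts"},
    1705: {"accounts", "souldiers"},
    1710: {"accounts"},
}

if __name__ == "__main__":
    print(window_sizes(WORDS_BY_YEAR))
```

A roughly flat series of window sizes would point to a shifting vocabulary, while a rising one would point to a growing vocabulary. Corpus size per window would also need to be controlled for, since longer transcripts bring in more words.
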
The reason I wasn't able to see a shifting vocabulary using the OBO corpus is because the OBO does not represent a single person's vocabulary. Instead, it roughly represents the combined vocabulary of people who appear one way or another in the Old Bailey courtroom to discuss matters of law and justice, filtered through a courtroom reporter and an editor. As I discovered, those last two have a much bigger impact on the corpus than we might have liked to imagine. For anyone who looks at the language of the court or indeed the format of the transcripts by using the OBO corpus, I'd urge you to keep the impact of those reporters and editors in mind. For anyone studying academic history, do note that what seems like a window into the past is in fact the product of a few individuals who made decisions they may not even have been aware of that impacted what was recorded, what was not, and what words they used to do so.