Comments on Thoughts on Public & Digital History by Adam Crymble: Identifying and Fixing Transcription Errors in Large Corpuses
Feed updated: 2024-03-18T08:41:12.468-04:00

Ben W. Brumfield, 2013-02-19T07:32:42.403-05:00

Thanks for your response, Adam, and for this blog post. It came up last week at a natural history collections digitization hackathon, where the 'bad transcription vs. obscure-but-correct word' problem was very relevant.

I am doing something very similar to what you describe in my database: expanding each abbreviation into all permutations within the JSON record, and then searching that. I'll try to write something up on that.

I suspect that the problem with that approach is that it's fairly difficult to do the kinds of bulk analysis you've done here.

Sharon, I'm not jealous of your workflow, but it's not substantially different from what we've got working so far.

Sharon, 2013-02-13T05:50:37.610-05:00

The answer to Ben's question is that it can be updated, but we don't have a process designed to deal with anything more than small numbers of corrections at a time. (Which is to say, it consists of me checking suggested corrections against the images and manually editing the XML files, then getting the programmer to reindex them for the database.)

So, we at OBO like Adam's methodology a lot, but developing a practical way to implement it is something we'll need to discuss in more depth, I think.

Adam Crymble, 2013-02-12T14:43:50.212-05:00

Thanks for the comments.

Ben, it's not my website (I'm a user rather than a creator), so I can't really "do" anything with the errors I find in the original record. I should note that the OBO does accept error-correcting suggestions, but it's on a trial-by-trial basis. If they're interested in my suggestions en masse I'd be happy to pass them along (they know where to find me).

In your case, though, I imagine you could preserve the original spelling but store an alternative in the XML data. I'm pretty sure that's how "fuzzy search" works on some of the major commercial databases. It might be interesting to explore a tool that takes a string like "William" and turns it into all probable spelling variations. On a website with an API like the OBO it would then be possible to search all variations at the same time and come back with all results. Maybe a future project: if you can't fix the transcription, fix the query!

Ted, sorting errors by frequency is a great tip. One of my most common "errors" in this example was "scissars", which is obviously an alternate spelling for scissors. The way forward might be as you suggest: build a dictionary of period spellings, leaving only the really odd examples for humans to look at.

Ted Underwood, 2013-02-12T07:31:38.285-05:00

Excellent post. I think it's really important for us to start talking about how we approach these issues.

Essentially I'm taking very much the same approach you are: identify words that don't match a dictionary, and sort these into period spellings (which I normalize), typos caused by predictable character substitutions (which I correct), or correctly-spelled uncommon/period words (which I try to leave unchanged).

To some extent this is possible to automate, but it's always a good idea to sort 'errors' by frequency and manually examine the most common ones. Often they're not errors. Some resources I use are up at usesofscale.com. More to come.

Ben W. Brumfield, 2013-02-12T07:11:03.205-05:00

I'd be interested in knowing more about what you're doing with the "errors" you find in the text -- particularly those that exist in the original document.

I've got a similar problem in my work <a href="http://manuscripttranscription.blogspot.com/2012/10/building-structured-transcription-tool.html" rel="nofollow">creating a searchable database of parish registers</a>. On one hand, we want to preserve and display any abbreviations, Latinizations, or archaisms verbatim. On the other hand, we want to be as useful as possible to researchers, which means that a search for "William" should return entries recorded as "Wm."

I'd love to hear you explore that issue, but I'm not certain I understand whether you have the ability to correct any errors you find in the OBO. Can the database be updated at all?