News-digitizing making 1862 storm research easier

by Joel Pomerantz

March 3rd, 2011

There’s great news about researching the storm. The California Digital Newspaper Collection has been working on digitizing old news, just as Thinkwalks has been doing, only with more funding. I love calling 150-year-old articles “news”! Perhaps it should be “renews.”

If you’ve been reading this blog, you know about my effort to create a detailed historical survey of the record-setting storm of 1862. It began in December 1861 and lasted so long that many alive at the time called it the Noachian Deluge; it went on for more than forty days and forty nights, you see.

I’ve been drawing together volunteers to find and transcribe contemporary news accounts. It’s painstaking work. (Wanna help?) Wonderful Thinkwalks volunteers Caesar Napolitano, Barbara Cannella, Jessica Krakow and Kerry McGuire have made it go smoothly—when there’s material available. Some of it has to be sought in hidden places in old archival storage, microfilm and so forth, and that’s assuming we can tell it exists. Some of it isn’t even cataloged.

But things may be getting easier, at least regarding a few old newspapers. The CDNC has been plugging away at the thousands of pages of news written since the dawn of California. The page images on their site are what’s most useful to my research, since the OCR (text recognition) version is almost impossible to read, with dozens of mistakes per line. But the fact that they did do OCR means relevant articles can sometimes be found with a simple keyword search.

Access to major, i.e., prolific, papers requires a lot of work. First they’re found, then scanned, then divided into pages, then articles, then read as text, with not a lot of human time available to get past the numerous automated glitches. Then they are posted on the site. Access to mining camp papers (there were many, as the population of California was largely occupied in mining at that time), small town papers, weeklies, and other great sources will progress very slowly.

I spoke with Andrea Vanek at the project’s Berkeley office and she says grant funding, which may run out as soon as this summer, has allowed them to digitize half a dozen California papers, for certain years only. The total so far is a few hundred thousand pages. When you consider that a single SF Call Sunday Edition from 1910 had more than a hundred pages, that’s only a small dent in the tens of thousands of news publications that have been printed in our state.

For our target time period, the CDNC project has already worked through the Sacramento Union, which we at Thinkwalks haven’t yet done anything with. It’s going to be full of 1861 & 1862 flood news. (They had devastating floods there.) They’ve also completed much of the SF Daily Alta, and a paper in LA. Since their OCR is low quality (based on limits in the low-tech originals and the microfilm itself), they’re hoping to implement a user-correction option. When it comes, we’ll submit all the stuff we’ve already hand-typed from transcription sessions.

Historically augmented reality is hitting its stride, with both entrepreneurs and public entities digging up old photos to overlay on the real world, using iPhone apps. Here are some augmented reality project links. Many are just news of plans, or prototypes, rather than finished projects. Philadelphia, Time shutter, Retronaut, Museum of London (& a blog post about it). There are also some map versions.

I predict the augmented reality fad now underway with photos will eventually extend to text, probably as a result of genealogy research. Someone will try to create descriptive text from history that you can hear or read when you are in the place described, just as my 2nd cousin Steve Echtman has created an app that tells you current info about where you are.

I suspect, also, someone will try to create a database of everyone who ever lived, and that requires looking at all text, right? Genealogy is the driving force behind a lot of history research these days. Mormons, for starters, are obsessed with it for religious reasons, and many others are too. Maybe grant money can come from rich users trying to buy an afterlife by saving souls. (If I understand right, collecting names of people who have died allows those who believe Mormon doctrine to improve their expected afterlife.)

If you want to help find and transcribe news from the storm, some of which will go to reconstruct the weather system as it passed through and dumped rain for weeks, please get in touch.
