Last month, we looked at how to digitize a family photo collection. This month, we’ll look at how you can use some of the same techniques and equipment to make digital copies of family documents. There are two basic ways to scan text: as a PDF file or using OCR (optical character recognition). Both are appropriate options, but they produce different types of files.
Scanning to PDF produces a full-color image of a document, with sharp text and accurate reproduction of any images or photographs on it. Most scanners have a PDF button, so creating a PDF is as simple as making a photocopy. Use a preview scan if you can, so that you can check that the borders are correct and that you capture the full document. Most PDFs are created with basic software and cannot be edited after they have been scanned, but they can be read using free PDF viewers on a computer or tablet. Because it handles both text and images to produce an exact copy of the document, PDF is especially good for digitizing complex documents or those that you would prefer to see as they were created, such as citizenship papers, birth certificates, wedding invitations or entries in a family Bible or record book. As a bonus, the files tend to be small and you can easily e-mail them. It is a good idea to also scan any pictures on the documents separately, as photo files, in case you would like to make enlargements or paste the photographs into other documents, such as a family tree. You don’t have to remove them from the document; just use the preview scan to reduce the scanned area to the picture alone.
For documents that are mostly text, or that contain text you might want to copy and paste, you should consider OCR software. Some scanners come with a basic OCR package, or you can purchase one separately. You can find reviews of the latest software at PC Magazine and CNET. With OCR software, you scan a document as usual, but instead of producing an image file, the software converts the text into a Word (or other word processor) document. This makes the text easy to edit, so OCR is a good option if you have typed pages from a family history, memoirs or creative writing that you want to collect and publish, or if you simply want copies that can be edited later should someone else wish to compile them into a family history. Keep in mind that OCR software is far from perfect, and the output tends to need some editing and formatting to correct misspelled names and clean up other mistakes. It isn’t unusual for an OCR program to “read” dirt as a period or see a creased area as italicized text. Handwriting is even harder for the software to recognize, so not all handwritten documents can be read by an OCR program, and those may still need to be transcribed by hand.
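If you are comfortable with a little scripting, part of the proofreading can be given a head start. The following is only a minimal Python sketch and is not part of any OCR package: the function name `flag_stray_periods` and the sample text are made up for illustration, and the rule it applies (a period followed by a lowercase word, or a period standing alone between spaces) is an assumption about what a speck of dirt tends to look like in OCR output. It only points out lines worth a second look; you still make the corrections yourself.

```python
import re

# Assumed heuristic: a period followed by a lowercase word, or a period
# sitting alone between spaces, may be dirt that was read as punctuation.
SUSPECT = re.compile(r"(?<=\w)\.\s+(?=[a-z])|\s\.\s")

def flag_stray_periods(text):
    """Return (line_number, line) pairs that deserve a manual look."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SUSPECT.search(line):
            flagged.append((number, line.strip()))
    return flagged

# Hypothetical OCR output saved from a scanned family history page.
sample = (
    "My grandfather arrived. in New York in 1912.\n"
    "He worked as a tailor for many years.\n"
    "The family . moved to Ohio later.\n"
)

for number, line in flag_stray_periods(sample):
    print(f"line {number}: {line}")
```

A check like this will also flag some legitimate abbreviations, so treat the results as a list of places to review against the original page rather than as a list of errors.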
As with the photos, make sure to keep the originals someplace safe, and follow the same storage advice for the electronic copies that was given in our May column. Next month we will review how to digitize old audio, from LP to microcassette.