Scanning Documents for Security, Retrieval and Storage Longevity

By NE Docs | September 15, 2016

What file type works for YOUR documents?

There are right ways and wrong ways to scan printed documents to electronic file formats for safe storage and retrieval. Saving to the wrong file format can introduce software compatibility issues and add too much data, often making files too large for efficient storage. Files that are too big or that run afoul of document handling protocols can slow down networks and even cause system errors that require a call to your IT department.

This article discusses four common file formats for scanned documents and images: TIFF, PDF, PDF/A and JPEG.

We’ll tackle JPEG first. The term JPEG stands for Joint Photographic Experts Group, the committee that created the standard. While the JPEG format is very popular and works well for displaying website images, it’s generally a poor document format for several technical reasons. A JPEG file holds only one page, so a multi-page document becomes a set of separate files. File sizes are reduced with lossy compression, which discards part of the image data and stores an inexact approximation of the original. That lost detail can blur fine text and reduce OCR accuracy. Inefficiency, permanent data loss and poor OCR results are a few of the reasons to avoid the JPEG format when digitizing your documents.

The dominant preferred formats for scanned documents are TIFF and PDF. Both can store scans with lossless data compression which, unlike the lossy compression used for JPEG files, does not degrade the original data. Saved file sizes are still reduced significantly, yet the original data can be fully reassembled from the compressed data, maintaining content integrity.
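
To make the lossless versus lossy distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how a scanner or the JPEG standard actually works: the "page" is synthetic byte data, zlib stands in for a lossless codec, and the quantize helper is a hypothetical, deliberately crude stand-in for lossy compression.

```python
import zlib

# Synthetic stand-in for scanned pixel data: mostly white (255) with a
# few darker marks in the middle. This is not a real scan.
original = bytes([255] * 5000 + [0, 30, 60, 90] * 50 + [255] * 5000)

# Lossless: compress, then decompress, and every byte comes back.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
lossless_ok = restored == original

# Lossy (crude illustration only, not the JPEG algorithm): round each
# value down to a multiple of `step`, discarding fine tonal detail.
def quantize(data: bytes, step: int = 64) -> bytes:
    return bytes((b // step) * step for b in data)

approximated = quantize(original)
lossy_ok = approximated == original

print("compressed smaller:", len(compressed) < len(original))
print("lossless round trip exact:", lossless_ok)
print("lossy result matches original:", lossy_ok)
```

The lossless round trip returns the exact original bytes, while the quantized copy cannot be turned back into the original because the discarded detail is gone.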

How do you choose which format to use?

Knowing what kind of documents you have, what type of content they hold and how people use them will largely determine your best scanning and formatting methods. Let’s take a look at the key considerations:

Are your documents text based or graphics based?

• Text based examples include legal documents, office memos, reports, technical articles and books. Think of documents that are primarily meant for reading.
• Graphics based documents include photos, drawings, illustrations, maps and newspaper clippings. Think of documents that are analyzed for visual interpretation.

What are the document characteristics?

• Are they old, torn or stained? Full color, black & white, 2-color or grayscale?
• Are there handwritten notes or other special markings?
• Tonal qualities, color variation, text readability and searchability will determine whether PDF or TIFF is best.

How will people search for and use the documents?

• Public web searches?
• Everyday office use?
• Free-form text search?
• By category such as date ranges, customer names, invoice numbers…?

Without getting too far into the technical details, there are some fundamental differences between TIFF and PDF files and reasons for using both. Let’s cover them:

A TIFF (Tagged Image File Format) file is essentially a bitmapped, or raster, image captured by scanning originals such as photographs, documents and drawings. TIFF files preserve the original image quality and are easy to place into documents as supporting images. Most document software supports TIFF. Because the files store the background (white space) as well as the subject matter, they are relatively large and often inappropriate for full document scans. TIFF content is not searchable without an OCR process that creates a separate text file, which then needs to be tagged for indexing. If most of your documents are photos, graphics and gradient drawings, TIFF could be your preferred format.
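
To see why raster scans get large, a rough back-of-the-envelope calculation helps. The sketch below is an assumption-laden estimate, not a TIFF size calculator: the function name, the US Letter page size and the 300 dpi resolution are chosen only for illustration, and the result ignores compression, file headers and row padding.

```python
def uncompressed_scan_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Estimate raw raster size: every pixel is stored, blank or not."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# US Letter page (8.5 x 11 inches) scanned at 300 dpi.
letter_bw = uncompressed_scan_bytes(8.5, 11, 300, 1)      # 1,051,875 bytes
letter_color = uncompressed_scan_bytes(8.5, 11, 300, 24)  # 25,245,000 bytes

print(f"black & white: {letter_bw / 1_000_000:.1f} MB")
print(f"full color:    {letter_color / 1_000_000:.1f} MB")
```

Under these assumptions a single full color page is roughly 25 MB before compression, which is why the color mode and compression settings matter so much for large scanning jobs.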

PDF (Portable Document Format) is a file format that can capture and distinguish the different elements of a scanned document, such as text, photos and backgrounds. Multiple pages can be scanned into one file and easily formatted for printing, sharing and online viewing. When the scan is run through OCR, PDF content is highly searchable: metadata is easily embedded into the file and full text searching is supported. For this reason it’s highly recommended that you avoid the image-only format when scanning to PDF.

PDF/A (Portable Document Format/Archival) is a specific kind of PDF defined by an ISO standard for archiving and the long-term preservation of electronic documents. When a document is saved as PDF/A, features that conflict with long-term archiving are not allowed. For example, fonts must be embedded rather than linked, and encryption is prohibited.
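
The guidance above can be summarized as a small decision helper. This is only a sketch of one possible reading of the article: the ScanJob fields, the suggest_format function and the order in which the rules are checked are all assumptions made for illustration, and real projects usually weigh more factors.

```python
from dataclasses import dataclass

@dataclass
class ScanJob:
    graphics_based: bool     # photos, drawings, maps (vs. text meant for reading)
    needs_text_search: bool  # users will run free-form text searches
    long_term_archive: bool  # must stay readable for long-term preservation

def suggest_format(job: ScanJob) -> str:
    """Hypothetical rule of thumb; the rule order is an assumption."""
    if job.long_term_archive:
        return "PDF/A"
    if job.graphics_based and not job.needs_text_search:
        return "TIFF"
    # Text based or searchable material: PDF with OCR text, not image-only.
    return "searchable PDF"

print(suggest_format(ScanJob(graphics_based=True, needs_text_search=False,
                             long_term_archive=False)))
print(suggest_format(ScanJob(graphics_based=False, needs_text_search=True,
                             long_term_archive=False)))
print(suggest_format(ScanJob(graphics_based=False, needs_text_search=True,
                             long_term_archive=True)))
```

JPEG never appears as a result because the article recommends against it for digitizing documents.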

Many NEdocs clients have mixed scanning and file formatting requirements, and we ensure that all types of scanned documents are evaluated for their specific characteristics and uses. You’ll have peace of mind knowing that your documents are secure and that your electronic images will be complete and accurate.

We’re always happy to discuss your specific needs and offer the ideal scanning and document handling solutions for your business. Feel free to call us at (603) 625-1171 or visit our Inquiries Page.
