October 14, 2010 - 9:19 PM
(After posting this last week, I got an email informing me that I had a few facts wrong, so I am posting a corrected version.)
Several recent online discussions have re-raised the age-old question in imaging: which format is better for storing your scanned documents, TIFF or PDF? This is yet another example of the technology world's "religious" differences (Wintel versus Macintosh, UNIX versus Windows, database products, etc.).
Both specifications originate by way of Adobe, and they share several traits:
Both have been used as the basis for international standards (TIFF in ISO 12639 and 12234-2; PDF in ISO 32000, 19005, 24517, 15930 and 14289).
There are no copyright or patent licensing restrictions to prevent someone from implementing or using them.
Both have mechanisms for custom extension.
Each of these formats has its champions, and technical forums regularly host vigorous discussions about which is better. Understanding their value requires looking at their original purposes and functionality.
TIFF (Tagged Image File Format) was created to be a standard output format for image capture systems. It is a wrapper for a number of image encoding and compression techniques, ranging from lossy schemes such as JPEG to lossless ones such as LZW and the CCITT fax encodings. The most commonly used format for document imaging is bi-tonal raster data compressed with the ITU-T (formerly CCITT) Group 4 two-dimensional facsimile scheme. Most recognition tools expect (or at least prefer) this kind of TIFF as input.
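To make the "wrapper" idea concrete: a TIFF file is a short header followed by one or more Image File Directories (IFDs), each a list of numbered tags, and tag 259 (Compression) records which encoding the image data uses (value 4 is CCITT Group 4). The sketch below reads that tag from the first IFD of a classic (non-BigTIFF) file using only Python's standard library. The function name `tiff_compression` is hypothetical, and a production tool would rely on an established imaging library rather than hand parsing.

```python
import struct

# Common Compression tag (259) values from TIFF 6.0 and its technical notes.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman (1D)",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    32773: "PackBits",
}
TAG_COMPRESSION = 259


def tiff_compression(data: bytes) -> str:
    """Return the compression scheme named in the first IFD of a TIFF."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel" byte order)
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola" byte order)
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _n = struct.unpack(endian + "HHI", data[entry:entry + 8])
        if tag == TAG_COMPRESSION:
            # A single SHORT value sits left-justified in the 4-byte value field.
            (value,) = struct.unpack(endian + "H", data[entry + 8:entry + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    return COMPRESSION_NAMES[1]  # the spec's default when the tag is absent
```

For a typical bi-tonal document scan, this returns `"CCITT Group 4 fax"`.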
PDF (Portable Document Format) is famous for delivering a consistent view of a document on any platform. Like TIFF, it offers a number of different data encoding and compression approaches within the specification. PDF provides for document tagging (internally stored indexing information) as well as electronic signatures and security features. PDFs made from scanned documents may use Group 4 compression (for black-and-white) or JPEG compression (for color) to hold the image. As an option, these PDFs can also include a version of the document's text, so the PDF can be content-indexed and searched. A useful PDF option is web optimization (also called linearization), which places all of the information needed to display a page in a contiguous area, eliminating the need to read or download the entire file to display it. PDF/A, an archival specification, has been approved as an ISO standard (ISO 19005); by limiting the choice of features, it simplifies the user's life.
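The PDF specification requires a linearized ("web optimized") file to carry a linearization parameter dictionary, contained within the first 1024 bytes, whose `/L` entry states the file's total length in bytes. The following is a rough heuristic sketch rather than a validator: `linearization_info` is a hypothetical helper that looks for that dictionary and compares `/L` with the actual size. A mismatch suggests the file was modified after it was linearized, for example by an incremental update.

```python
import re


def linearization_info(data: bytes):
    """Heuristically inspect the linearization dictionary of a PDF.

    Returns None if no linearization dictionary is found near the start of
    the file; otherwise a dict with the declared length and whether it
    matches the actual file size.
    """
    head = data[:1024]  # the dictionary must lie within the first 1024 bytes
    if not head.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    pos = head.find(b"/Linearized")
    if pos < 0:
        return None
    # Bound the enclosing dictionary: last "<<" before the key, next ">>" after.
    start = head.rfind(b"<<", 0, pos)
    end = head.find(b">>", pos)
    if start < 0 or end < 0:
        return None
    # "/L" followed by whitespace is the file-length entry (not "/Linearized").
    match = re.search(rb"/L\s+(\d+)", head[start:end])
    declared = int(match.group(1)) if match else None
    return {"declared_length": declared, "matches_file": declared == len(data)}
```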
So which format do you choose to keep your documents in? First, recognize that with all of the options available in both formats, you need to be careful about the statements and comparisons you make.
I think it is hard to deny that PDF is the best and easiest format to use when you are sharing documents over the Internet. The viewer and browser plug-in are freely available. Since you don't know the recipient's environment, it matters that the document will look the same no matter what. The security features and the ability to add internal document information, such as the author and an abstract, are really useful.
AIIM ARP 1-2009 cautions that TIFF implementations can include custom tags and headers, and reminds the reader that the lack of mechanisms for securing and authenticating TIFFs can make them unreliable. (Note that PDF also has extensions beyond the official specification.) It also points out that TIFFs can be edited. These issues are real, but they can be addressed by applying proper controls to the creation and management of the image files. Most records management systems will prevent both improper access and modification.
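One widely used control for the "TIFFs can be edited" concern is a fixity check: record a cryptographic digest of each image file when it is ingested, then recompute and compare it later, so any modification is detected. This is a general records-management practice offered as an illustration, not something the post or ARP 1-2009 prescribes; the helper names below are hypothetical, and the sketch uses SHA-256 from Python's standard `hashlib` module.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded
```

In practice, the recorded digests would be stored in the records management system alongside each file, separately from the images themselves.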
If you plan to OCR documents later, I think TIFF is the better choice (others disagree).
In the end, I think you have to consider how you are going to use the document. Are you writing it to WORM media for preservation? Are you planning to share information on the web?
It is easy to go from TIFF to PDF, and there are tools to do the reverse. When creating a TIFF from a PDF, one factor to watch is any loss of accuracy that resulted from compressing the images into the PDF (for example, with lossy JPEG). Of course, any of the PDF-specific features mentioned above will be lost.
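Going from TIFF to PDF can be lossless for bi-tonal scans, because PDF's `CCITTFaxDecode` filter accepts Group 4 data directly, so the compressed bytes can be copied in without decoding. The sketch below assumes you have already extracted the Group 4 bytes of a single-strip TIFF page (FillOrder 1, the default) together with its pixel width, height, and resolution; `g4_to_pdf` is a hypothetical helper that writes a minimal one-page PDF around that data. Real conversion tools also handle multi-strip and multi-page files, metadata, and other compression types.

```python
def g4_to_pdf(g4: bytes, width: int, height: int, dpi: int = 300) -> bytes:
    """Wrap one page of CCITT Group 4 data in a minimal single-page PDF."""
    # Page size in PDF points (1/72 inch), derived from the scan resolution.
    pw, ph = width * 72 / dpi, height * 72 / dpi
    # Content stream: scale the unit-square image to fill the page, then draw it.
    content = f"q {pw:.2f} 0 0 {ph:.2f} 0 0 cm /Im0 Do Q".encode()
    image_dict = (
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace /DeviceGray /BitsPerComponent 1 /Filter /CCITTFaxDecode "
        f"/DecodeParms << /K -1 /Columns {width} /Rows {height} >> "  # K < 0: pure Group 4
        f"/Length {len(g4)} >>"
    ).encode()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {pw:.2f} {ph:.2f}] "
         f"/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>").encode(),
        image_dict + b"\nstream\n" + g4 + b"\nendstream",  # G4 bytes copied as-is
        f"<< /Length {len(content)} >>".encode() + b"\nstream\n" + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)
```

For a 2550 × 3300 pixel scan at 300 dpi, the page comes out as 612 × 792 points, which is US Letter.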
To me, the best world is one where you can scan to TIFF, generate a PDF from it, and keep both in your content management system, linked as different renditions. Then everyone is happy, and after all, disk space is cheap! ;-)
This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International