Which is better: TIFF or PDF? - Revised

Keywords: PDF, TIFF, Comparison

(After posting this last week, I got an email informing me that I had a few facts wrong.  So I am posting a corrected version.)

Several recent online discussions re-raised the age-old question in imaging: which format is better for storing your scanned documents – TIFF or PDF?  This is yet another example of technology "religious" differences (Wintel versus Macintosh, UNIX versus Windows, database products, etc.).

Both specifications have come to us by way of Adobe (TIFF was created by Aldus, which Adobe later acquired):

  • They both have been used as the basis for international standards.
    (TIFF in ISO 12639 and 12234-2, PDF in ISO 32000, 19005, 24517, 15930 and 14289)
  • There are no copyright or patent licensing restrictions to prevent someone from implementing or using them. 
  • Both have mechanisms for custom extension.

Each of these formats has its champions, and the technical forums regularly host vigorous discussions about which is better.  Understanding their value requires looking at their original purposes and functionality.

TIFF (previously known as Tag Image File Format) was created to be a standard output format for image capture systems.  It is a wrapper for a number of image encoding and compression techniques, covering both lossy schemes like JPEG and lossless pixel-based approaches.  The most commonly used format for document imaging is bi-tonal raster data, compressed using the ITU-T (previously known as CCITT) Group 4 facsimile two-dimensional scheme.  Most recognition tools expect (or at least prefer) this TIFF format as input.
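
To make the "wrapper" idea concrete, here is a minimal Python sketch (standard library only) that reads a classic TIFF header and reports which compression scheme the first image in the file uses. The function name `tiff_compression` is hypothetical; the Compression tag number (259) and the code values come from the TIFF 6.0 specification and its JPEG technical note. The sketch only looks at the first IFD (the first page) and rejects BigTIFF files.

```python
import struct

# Subset of Compression (tag 259) values defined for TIFF.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman (1D)",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    32773: "PackBits",
}
COMPRESSION_TAG = 259


def tiff_compression(data: bytes) -> str:
    """Hypothetical helper: name the compression used by the first image in a TIFF."""
    # Bytes 0-1 give the byte order: b"II" is little-endian, b"MM" is big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file")
    # Bytes 2-3 hold the magic number 42; bytes 4-7 hold the offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    # Each IFD entry is 12 bytes: tag, field type, value count, value/offset.
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == COMPRESSION_TAG:
            # Compression is a single SHORT, stored in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return COMPRESSION_NAMES.get(value, f"unknown ({value})")
    # If the tag is absent, the specification's default is 1 (no compression).
    return COMPRESSION_NAMES[1]
```

A typical bi-tonal document scan would report "CCITT Group 4 fax", while a color scan wrapped in TIFF might report "JPEG"; reading the file with `open(path, "rb").read()` and passing the bytes in is enough for a quick check.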

PDF (Portable Document Format) is famous for delivering a consistent view of a document on any platform.  As with TIFF, the specification offers a number of different data encoding and compression approaches.  PDF provides for document tagging (internally stored indexing information), electronic signatures, and security.  PDFs made from scanned documents may use Group 4 compression (black and white) or JPEG compression (color) to hold the image.  As an option, these PDFs can also include a version of the document’s text, so the PDF can be content indexed and searched.  A useful PDF option is web optimization, which places all of the information necessary to display a page in a contiguous area, so a viewer does not need to read or download the entire file to show that page.  PDF/A, an archival specification, has been approved by ISO; by limiting the choice of features, it simplifies the user's life.
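
Whether a given PDF has been web optimized (the PDF specification calls this "linearized") can be checked roughly by looking for the linearization parameter dictionary, which the specification requires to sit within the first 1024 bytes of the file. The Python sketch below is a heuristic with a hypothetical function name, not a full PDF parser. It also compares the dictionary's /L entry (the total file length) with the actual length, because an incremental update appended after optimization leaves a stale value, and viewers generally treat such a file as no longer linearized.

```python
import re

# Heuristic patterns: the "/Linearized" key, and the "/L <file length>" entry
# that follows it in the same linearization parameter dictionary.
LINEARIZED_KEY = re.compile(rb"/Linearized\s")
FILE_LENGTH = re.compile(rb"/L\s+(\d+)")


def looks_linearized(data: bytes) -> bool:
    """Hypothetical helper: rough check for a web-optimized (linearized) PDF."""
    head = data[:1024]  # the linearization dictionary must lie in this window
    if not head.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    marker = LINEARIZED_KEY.search(head)
    if marker is None:
        return False
    # Look for /L only after the /Linearized key, within the same window.
    length = FILE_LENGTH.search(head, marker.end())
    # A stale /L (e.g. after an appended incremental update) means the
    # page-at-a-time layout no longer holds, so report it as not optimized.
    return length is not None and int(length.group(1)) == len(data)
```

Passing the whole file's bytes matters here, since the length comparison needs the full size; a PDF that was optimized and later annotated in place would typically fail the check.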

So which format should you choose for keeping your documents?  First, recognize that with all of the options available in both formats, you need to be careful about the statements and comparisons you make.

I think it is hard to deny that PDF is the best and easiest format to use when you are sharing documents over the Internet.  The viewer and browser plug-in are freely available.  Since you don’t know the recipient's environment, it helps that PDF ensures the document will look the same no matter what.  The security features and the ability to add internal document information, such as the author and abstract, are really useful.

AIIM ARP 1-2009 cautions that TIFF implementations can have custom tags and headers, and reminds the reader that the lack of mechanisms for securing and authenticating TIFFs can make them unreliable.  (Note that PDF also has extensions beyond the official specification.)  It also points out that TIFFs can be edited.  These concerns are valid, but they can be addressed by applying proper controls to the creation and management of the image files.  Most records management systems will prevent improper access and any modification.

If you are planning to later OCR documents, I think that TIFF is a better choice (others differ).

In the end, I think you have to consider how you are going to use the document.  Are you writing it to WORM media for preservation?  Are you planning to share information on the web?

It is easy to go from a TIFF to a PDF, and there are tools to do the reverse.  When creating a TIFF from a PDF, watch for any loss of accuracy that resulted from compressing the images into the PDF.  Of course, any of the PDF-specific features mentioned above will be lost.

To me, the best world is when you can scan to a TIFF, generate a PDF from it, and keep both in your content management system linked as different renditions.  Then everyone is happy, and after all, disk space is cheap! ;-)

Comments

Chris Riley, ECMp, IOAp

Bernard,

Great post. And this topic comes up all the time. I would like to point out that which step in the capture process you are referring to is VERY important. The resulting format can be different from the format used during imaging. In fact, it should be. That is because all of the OCR engines (the top 4 commercial and top 2 open source) will convert whatever you give them to TIFF Group 4 for recognition no matter what. Thus you are best off giving them a TIFF Group 4 to start. The benefit of this is fewer conversion processes, which improves performance, as well as less risk of losing information, which would happen when the engine converts a PDF (most likely JPEG-compressed) to TIFF Group 4 for OCR. Even if the desired output is a color PDF, the scan should be done as a color TIFF; the OCR engine will convert it to TIFF Group 4 for recognition and, at export, go back to the color TIFF to make a color PDF with a text layer. Because capture products output all sorts of formats, and now that scanners support dual-stream output, the initial scan should always be TIFF.

As a final output, PDF is the most common, but it all depends on the business requirements. There is also the possibility of Layered TIFF. In reality a PDF is just a glorified Layered TIFF, but more widely accepted and supported.

Abhijit Kulkarni

Hi Chris,
Definitely TIFF will be more useful if OCR is to be implemented. Can we get good OCR on PDFs? As mentioned in the article, PDFs are definitely the better option as the end product of capture, from the security and presentation perspectives, which are lacking in TIFF.

Chris Riley, ECMp, IOAp

Abhijit,

An end result as a PDF is excellent; what I'm really calling out is the image that enters OCR. A PDF will have a compressed image layer, be it TIFF or JPEG. When the OCR engine sees this, it does another conversion of the compressed TIFF or JPEG to TIFF Group 4. It OCRs this image and, when it's done, converts it back to a searchable PDF. You can see that this number of conversions, and the fact that the PDF is already compressed (meaning it does not have all the pertinent data), reduces OCR accuracy. You certainly can get good OCR on a 300 DPI PDF containing JPEG compressed at 80% quality or higher, but the bottom line is that the most accurate would be TIFF Group 4, because all engines work on this file format no matter what you give them.

This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International.