TIFF, JPEG, and PDF. Different formats for different jobs.
When one thinks of enterprise content
management (ECM), one tends to focus on
the management of content in an
organization. As AIIM defines ECM, however,
there is a whole lot more to it: ECM includes
the technologies used to capture, manage,
store, preserve, and deliver content and
documents.
Even though a majority of the information
organizations handle today is born digitally
and stays that way, a lot of information still
enters organizations as paper. While paper
continues to be useful in business, digital
information enables an organization to
respond more rapidly to changing circumstances.
In the late 1980s, the buzz in the
business world centered on digital
document imaging. It was the point solution
many organizations implemented to gain
some level of control over their documents.
As recently as 10 years ago (according to
AIIM research), industry solutions were typically
departmental in scope with a particular
business focus. As our survey indicates,
82% of end users now see ECM technologies,
imaging among them, as a core element in
their overall IT infrastructure.* Today, organizations
don’t think twice about imaging as
imaging technology is integrated in most
products. Image viewers are prevalent.
Image file format standards have helped
move along the widespread adoption of
imaging technology. Image file formats provide
a standardized method of organizing
and storing image data. A scanned document
or image consists of picture elements,
or pixels, that represent the brightness and
color of the information on the page. While
there are numerous graphic and image file
formats, this article will look at TIFF, JPEG,
and PDF, three of the standards used in
document imaging.
Compression of Image Files
When considering imaging file formats, one
needs to have a basic understanding of
compression. A single compression method
is not applicable for all scanned documents.
When choosing the best method, one must
consider the type of document that will be
scanned.
A compression scheme is the method used
to reduce the amount of data needed to
store or transmit a representation of an
image. Compression is lossless when the
data is compressed by efficient coding of the
information in the image and the
reconstructed image is identical to the
original. In lossy compression,
images are compressed by selectively
removing information from the image. This
does not mean that words, phrases, or sentences
are removed. Through complex algorithms,
statistically redundant information, as
well as perceptually irrelevant or unimportant
information, is removed, leaving only the useful
information. ANSI/AIIM TR 33, Selecting
an Appropriate Image Compression Method
to Match User Requirements, provides an
explanation of compression algorithms and
useful information for selecting the best
compression algorithm for your application.
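The difference between the two approaches can be seen in a few lines of Python. This is only a sketch: the pixel data is a made-up grayscale ramp, zlib stands in for a lossless coder, and the "lossy" step is a toy that simply drops the low four bits of each pixel. It is not how JPEG or any imaging standard works.

```python
import zlib

# Hypothetical 8-bit grayscale data: a smooth ramp of pixel values, repeated.
pixels = bytes(range(256)) * 4

# Lossless: efficient coding only; decompression restores every byte.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_ok = restored == pixels

# Lossy (toy stand-in, not JPEG): keep 16 gray levels by dropping the low 4 bits.
quantized = bytes(p & 0xF0 for p in pixels)
lossy_packed = zlib.compress(quantized)
lossy_ok = quantized == pixels  # the discarded detail cannot be recovered

print("original bytes:", len(pixels))
print("lossless bytes:", len(packed), "identical after round trip:", lossless_ok)
print("lossy bytes:   ", len(lossy_packed), "identical to original:", lossy_ok)
```

The lossy version compresses further precisely because information was thrown away before coding, which is the trade-off to weigh against the type of document being scanned.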
TIFF
TIFF, Tagged Image File Format, is used
mainly for storing raster images, including
photographs and line art, and is largely
credited with founding the imaging industry.
Aldus is credited with developing TIFF for
use with PostScript printing. It is now widely
used for images along with JPEG. TIFF’s
primary goal is to provide a rich environment
within which applications can
exchange image data. This richness is
required to take advantage of the varying
capabilities of scanners and other imaging
devices.
TIFF uses tags to handle multiple images
and data in a single file. These tags describe
the size of the image, define how the
image data is arranged, and identify the
compression algorithm, if any, that is used.
Images created using TIFF can be used for
archiving purposes because TIFF is typically
stored with lossless compression, i.e., the file
may be edited and saved without losing any
image data.
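To make the idea of tags concrete, here is a minimal sketch that builds the first bytes of a little-endian TIFF file in memory and reads three tags back. The tag numbers and compression codes are taken from the TIFF 6.0 specification; the function names are hypothetical, and the bytes produced are a header and tag directory only, not a complete, viewable image.

```python
import struct

# Tag numbers and compression codes as listed in the TIFF 6.0 specification.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 259: "Compression"}
COMPRESSION = {1: "none", 4: "CCITT Group 4", 5: "LZW", 7: "JPEG"}

def build_tiny_tiff_header(width, height, compression):
    """Build a TIFF header plus one directory (IFD) of three SHORT tags."""
    # Byte order "II" (little-endian), magic number 42, offset of the first IFD.
    header = struct.pack("<2sHI", b"II", 42, 8)
    entries = [(256, width), (257, height), (259, compression)]
    ifd = struct.pack("<H", len(entries))
    for tag, value in entries:
        # tag, type 3 (SHORT), count 1, value left-justified in a 4-byte field
        ifd += struct.pack("<HHIHH", tag, 3, 1, value, 0)
    ifd += struct.pack("<I", 0)  # offset of the next IFD; 0 means no more images
    return header + ifd

def read_tags(data):
    """Return {tag name: value} for the single-value SHORT tags in the first IFD."""
    byte_order, magic, ifd_offset = struct.unpack_from("<2sHI", data, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("not a little-endian TIFF")
    (count,) = struct.unpack_from("<H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry_offset = ifd_offset + 2 + 12 * i
        tag, field_type, n, value, _pad = struct.unpack_from("<HHIHH", data, entry_offset)
        if field_type == 3 and n == 1:
            tags[TAG_NAMES.get(tag, tag)] = value
    return tags

tiff = build_tiny_tiff_header(2550, 3300, 4)
tags = read_tags(tiff)
print(tags, "->", COMPRESSION[tags["Compression"]])
```

The "next IFD" offset at the end of each directory is what lets one file chain several images together.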
In document management, TIFF is often used
in conjunction with CCITT Group IV compression
(typically used with facsimile technology).
Usually black and white documents
are captured using TIFF; however, color may
also be used. In large volume applications,
documents are typically scanned in black
and white, rather than color or grayscale, to
keep the file size down. Because TIFF supports
multiple pages, a multi-page document
can be scanned to a single file rather
than an individual file for each page
scanned.
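A quick calculation shows why large volume applications favor black and white. The sketch below assumes a US Letter page scanned at 300 dpi; both figures are illustrative choices, not taken from this article, and the function name is hypothetical. Sizes are before any compression and ignore the row padding a real file would add.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page (row padding ignored)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed example: 8.5 x 11 inch page at 300 dpi.
for name, bpp in (("black and white", 1), ("grayscale", 8), ("color", 24)):
    size = raw_page_bytes(8.5, 11, 300, bpp)
    print(f"{name:>15}: {size:>10,} bytes")
```

Under these assumptions a color page starts out 24 times larger than the same page in black and white, before compression widens or narrows the gap.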
JPEG
JPEG (pronounced jay-peg; Joint
Photographic Experts Group) is a lossy
compression format for photographic
images. It is designed for use with either full
color or gray-scale images. JPEG is best
when used with photographs rather than
text. JPEG specifies how an image is transformed
into a stream of bytes, but not how
those bytes are encapsulated in any particular
storage medium. JFIF (JPEG File
Interchange Format), published by C-Cube
Microsystems and popularized by the software
of the Independent JPEG Group, specifies how to
produce a file suitable for computer storage
and transmission over the Internet from a
JPEG stream.
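The JFIF wrapper is easy to recognize: the file opens with the JPEG start-of-image marker, followed immediately by an APP0 segment whose identifier is "JFIF" plus a zero byte. The following sketch checks for that signature. The function name is hypothetical and the sample bytes are hand-built for illustration, not taken from a real photograph.

```python
import struct

def looks_like_jfif(data):
    """Return True if data starts with a JPEG SOI marker and a JFIF APP0 segment."""
    if len(data) < 11 or data[:2] != b"\xff\xd8":  # SOI: start of image
        return False
    marker, length = struct.unpack_from(">2sH", data, 2)  # JPEG fields are big-endian
    return marker == b"\xff\xe0" and length >= 16 and data[6:11] == b"JFIF\x00"

# Hand-built opening bytes: SOI, APP0 marker, then a 16-byte APP0 segment
# (length, identifier, version 1.01, no units, 1:1 density, no thumbnail).
sample = b"\xff\xd8" + b"\xff\xe0" + struct.pack(
    ">H5sBBBHHBB", 16, b"JFIF\x00", 1, 1, 0, 1, 1, 0, 0
)

print(looks_like_jfif(sample))       # JFIF signature present
print(looks_like_jfif(b"II*\x00"))   # a TIFF header, so no JFIF signature
```

A check like this only inspects the wrapper; it says nothing about whether the compressed image data that follows is valid.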
JPEG/JFIF is commonly used to store
and transmit photographs over the Internet.
It is not suitable for use with line drawings or
text because its compression method does
not perform well with these types of images.
PNG and GIF are used in these instances.
JPEG is best used with photographs and
paintings of realistic scenes with smooth
variations of tone and color. For such images,
JPEG will usually produce a much higher quality
image at a given file size than other common
compression methods.
With the increasing use of multimedia
technologies, image compression requires
higher performance and new features.
JPEG 2000 is intended to advance standardized
image coding systems to serve
applications for years to come. JPEG 2000
is a new image format based on
state-of-the-art wavelet compression. It suits
a number of different applications in the
digital imaging market, including digital cameras,
pre-press, medical imaging, and others.
JPEG 2000, Part 1 (ISO/IEC 15444-1) offers
both lossless and lossy compression and
provides better image quality at smaller file
sizes than JPEG. JPEG 2000, Part 6 (ISO/IEC
15444-6) is used to compress scanned
color documents containing both bitonal
elements and images.
The development of JPEG 2000 is the
result of collaboration between the
International Organization for Standardization
(ISO) and the International Telecommunication
Union (ITU-T, formerly CCITT), with input from
a multitude of industry experts.
PDF
The final file format to be discussed is PDF,
Portable Document Format. Did you know
that there are over 500 PDF product suppliers?
PDF is a file format developed by
Adobe Systems for representing documents
in a manner that is independent of the original
application software, hardware, and
operating system used to create those documents.
PDF is an open standard and anyone
may write applications, royalty free, that
can read or write a PDF document. A PDF
document is a self-contained, cross-platform
document. It is a file that will look the
same on the screen and in print, regardless
of what kind of computer or printer someone
is using and regardless of what software
package was originally used to create
it. Although they contain the complete formatting
of the original document, including
fonts and images, PDF files are highly compressed,
allowing complex information to be
downloaded efficiently. PDF is the de facto
standard for secure, dependable electronic
information exchange that is widely recognized
by industries and governments
around the world.
In addition to being an open standard,
PDF is also flexible. A family of PDF standards
has either been produced or is in
development. AIIM and NPES,
working with many records managers,
archivists, industry representatives, and
other PDF developers, completed work on
PDF/Archive or PDF/A (ISO 19005-1) that
will ensure the long-term preservation of
electronic documents. A second part to ISO
19005 is currently being developed to address
digital signatures, OpenType fonts, 3D graphics,
JPEG 2000, and consistency with PDF/X,
PDF/E, and PDF/UA. The digital pre-press industry
joined forces to develop the PDF/X standard
(ISO 15930), which defines methods for the
exchange of digital data within the graphic
arts industry and for the exchange of files
between graphic arts establishments.
PDF/X is predominantly used in the
exchange of advertisements for magazines.
In the developmental pipeline you will find
PDF/Engineering (PDF/E), which defines a
file format for the exchange of engineering
documents based on the PDF format for
various communities working with engineering
documentation. It is intended to improve
document exchange and collaboration within
engineering workflows both inside companies
and with their partners, suppliers,
customers, and others. PDF/UA
(PDF/Universal Access) will define a file format
to ensure that PDF documents are
accessible to those with disabilities. This
standard is in the early development phase.
There is a multitude of image file formats
to choose from. The image file format
and compression your organization
chooses will depend on the application
you are using. It is important to take into
consideration the type of documents you
will be scanning, the graphical content contained
in the documents, and how they will
be used. The file format you select should
meet the intended use and be capable of
including the compression scheme you
choose.
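As a rough summary, the rules of thumb in this column could be written down as a simple lookup. The sketch below is purely illustrative: the document type names, the table, and the function name are all hypothetical, and a real selection should also weigh volume, retention requirements, and the guidance in the standards cited above.

```python
# Illustrative rules of thumb drawn from this column. The keys and the
# function name are hypothetical and not part of any standard.
SUGGESTIONS = {
    "black_and_white_text": ("TIFF", "CCITT Group IV"),
    "line_drawing": ("PNG or GIF", "lossless"),
    "photograph": ("JPEG", "JPEG (lossy)"),
    "color_with_text_and_images": ("JPEG 2000", "JPEG 2000 compound"),
    "formatted_document": ("PDF", "set per embedded image"),
}

def suggest_format(document_type):
    """Return a (file format, compression) suggestion for a document type."""
    try:
        return SUGGESTIONS[document_type]
    except KeyError:
        raise ValueError(f"no suggestion for {document_type!r}") from None

for doc_type in SUGGESTIONS:
    file_format, compression = suggest_format(doc_type)
    print(f"{doc_type}: {file_format} with {compression}")
```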
--Betsy Fanning is AIIM’s
director of content and standards development.
She welcomes any and all comments
regarding standards and/or the AIIM
standards program.