Imaging, whose objective is to make information “locked up” on paper more
transportable, manageable, or otherwise more useful to automated processes, has
its own idiosyncrasies when it comes to sharing and transporting files. The
objective we address in this article is how to quickly and securely transfer
such work from site to site.
Who needs to do this? An organization that has the flexibility to locate
certain functions in sites other than the image services center, assuming those
sites are connected with a Wide Area Network (WAN). This may be needed to
(1) leverage centralized pools of personnel,
(2) take advantage of labor arbitrage (defined by
Sourcingmag.com as “the financial benefit of buying a comparable service
elsewhere to exploit the difference in pricing”), or (3)
outsource to a third party for a specific function.
For the purposes of this article, “imaging system” refers to the set of
functions typically performed when processing documents, including scanning and
document capture; image ingestion; image enhancement; document identification;
quality assurance; character and barcode recognition; indexing (or key from
image); data entry; character recognition repair and completion; and release of
images and/or data to a downstream application. As shorthand, I refer to
scanning and related activities collectively as image capture. Indexing and
related activities are referred to as data capture.
The five methods that follow can apply to both image capture and data
capture, but the focus of this article is on data capture, i.e., key-from-image,
data entry, or character recognition repair.
1. Thick Clients
“Thick clients” and “fat clients” are terms that apply
to PCs/desktops. Historically, thick clients were considered necessary to deal
with rich media, such as images. They were also good at connecting with the
peripheral-rich environment of a scan center. Client-server architectures moved
some of the common functions that were resident on desktop applications over to
a shared server. Typically the server was a shared file system or database
management system.
Either way, the amount of data being passed between the thick client and the
server was not an issue, because the two would communicate over a local area
network (LAN), which provided enough bandwidth that a programmer didn’t have to
worry about the size of scanned documents or how frequent those communications
were. But moving that same protocol to a WAN environment, where network speeds
are an order of magnitude or two slower, is not practical. To make this clear,
some product support documentation will state a minimum network bandwidth for
successfully inter-networking imaging components. Depending on the number of
people in the operation, not only is the network a bottleneck, but the support
of all those desktops can be onerous as well. In other words, thick client
configurations do not support a distributed data capture workforce.
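A rough calculation makes the bandwidth argument concrete. The sketch below is illustrative only: the 50 KB page size, both link speeds, and the function name `transfer_seconds` are assumptions chosen for the example, not figures from any product's documentation. It also ignores latency and protocol overhead, which would make the WAN case worse.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized time to move one file across a link (no latency, no overhead)."""
    return (size_bytes * 8) / link_bits_per_second


PAGE_BYTES = 50_000        # assumed size of one compressed scanned page
LAN_BPS = 100_000_000      # assumed 100 Mbit/s local area network
WAN_BPS = 1_500_000        # assumed 1.5 Mbit/s wide area link

lan_time = transfer_seconds(PAGE_BYTES, LAN_BPS)
wan_time = transfer_seconds(PAGE_BYTES, WAN_BPS)

print(f"LAN: {lan_time * 1000:.1f} ms per page")
print(f"WAN: {wan_time * 1000:.1f} ms per page")
print(f"WAN is roughly {wan_time / lan_time:.0f}x slower")
```

Under these assumed numbers, a page that arrives in a few milliseconds on the LAN takes roughly a quarter of a second on the WAN, and a chatty protocol multiplies that delay by every exchange it makes.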
2. File Transfer
Server-to-server (and, ultimately, application-to-application) file transfer
has been around for decades. Utilities such as File Transfer Protocol (FTP)
can reliably transfer documents to a receiving site over a
WAN. If those applications do not have to be tightly connected or synchronized
to complete their work, and they have the necessary smarts to integrate with the
transfer utility, then file transfer can be used for getting work to a remote
data entry office.
The turnaround time for completing document processing tasks usually allows
for this kind of asynchronous approach. But while turnaround times are not the
problem, the added infrastructure often is. This is evident when you consider
the larger context in which file transfer exists. It usually involves two
independent but interconnected systems: the first system (A) manages the image
capture; the second (B) manages the data capture. After work is completed on the
second system, the whole package may come back to system A, depending on which
site (A or B) is releasing images and data. This gets the job done (especially
the keying part) in a seemingly efficient manner, since both systems are
essentially LAN-based, except for the link between them. But it also raises a
number of issues, including weakened security from having multiple copies of
documents, additional hardware and software to be maintained by IT
administration, the need to track where a given piece of work is at any moment,
and the need to ensure the receiving site is keeping pace with the sending site.
This last issue–making work flow harmoniously between sites–usually results in
having to use a vendor’s proprietary multi-site configuration.
Thus file-transfer based systems are an option for distributed processing,
but support costs can erase any cost savings.
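To illustrate the tracking and integrity bookkeeping that a file-transfer design takes on, here is a minimal sketch of a manifest that travels with each batch. Everything in it is hypothetical: the `BatchManifest` class, its field names, and the site labels are assumptions made for the example, not part of any vendor's multi-site product, and the actual transfer step is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BatchManifest:
    """Hypothetical record that accompanies a batch of images between sites."""
    batch_id: str
    location: str = "site_A"                        # where the batch currently is
    checksums: dict = field(default_factory=dict)   # file name -> SHA-256 digest

    def add_image(self, name: str, data: bytes) -> None:
        """Record a checksum for an image before it leaves the sending site."""
        self.checksums[name] = hashlib.sha256(data).hexdigest()

    def verify(self, name: str, data: bytes) -> bool:
        """True if the received bytes match what the sending site recorded."""
        return self.checksums.get(name) == hashlib.sha256(data).hexdigest()

    def to_json(self) -> str:
        """Serialize the manifest so it can be sent alongside the images."""
        return json.dumps(asdict(self), sort_keys=True)


# Sending site (A) builds the manifest before transfer.
manifest = BatchManifest(batch_id="batch-0001")
page = b"placeholder image bytes for illustration"
manifest.add_image("page1.tif", page)

# Receiving site (B) verifies each file, then records the new location.
received_ok = manifest.verify("page1.tif", page)
corrupted_ok = manifest.verify("page1.tif", page + b"x")
if received_ok:
    manifest.location = "site_B"
```

Even this small amount of bookkeeping has to be built, monitored, and kept consistent at both sites, which is the kind of support cost the section describes.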
3. Thin Clients
Thin clients solve the problem found in thick clients by
relying more on the server side of client-server. That dependence means lower
data volumes traversing the network. Thin clients also have lower IT
administration overhead; there’s less that can go wrong. They are also
frequently browser-based, allowing for Secure Sockets Layer (SSL) connections. To make
this work, the imaging server needs a Web server on the front end, which is
usually not a big deal.
Yet thin clients are not commonly used for data capture. First, images still
have to be sent to the thin client in order to be viewed. And while they don’t
have to travel back to the server (the images haven’t been modified), if each
page or document takes a few seconds to display, data entry personnel never
achieve the rhythm and pace they have come to expect.
In addition, the images sent to the thin client are often cached there,
making the desktop prone to privacy and security concerns. Finally, saying that
the server does more of the work does not mean the transactions back and forth
between client and server are negligible; e.g., the (scaled-down) programs on the
desktop may need to validate field values against the server database. A lot of
fields-per-form equates to a lot of database hits.
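The fields-per-form point can be shown by counting round trips. In the sketch below, `ValidationServer` is a hypothetical stand-in for the imaging server's database, and the field names and allowed values are invented for the example. The batched variant is one possible mitigation that I am assuming for illustration, not something a particular product is known to do.

```python
class ValidationServer:
    """Stand-in for the server-side database; counts client round trips."""

    def __init__(self, allowed):
        self.allowed = allowed      # field name -> set of valid values
        self.round_trips = 0

    def validate_field(self, name, value):
        """One request per field, as a naive thin client might issue."""
        self.round_trips += 1
        return value in self.allowed.get(name, set())

    def validate_form(self, fields):
        """One request carries every field on the form."""
        self.round_trips += 1
        return {n: v in self.allowed.get(n, set()) for n, v in fields.items()}


allowed = {
    "state": {"MI", "OH"},
    "doc_type": {"invoice", "claim"},
    "currency": {"USD"},
}
form = {"state": "MI", "doc_type": "claim", "currency": "EUR"}

per_field = ValidationServer(allowed)
per_field_results = {n: per_field.validate_field(n, v) for n, v in form.items()}

batched = ValidationServer(allowed)
batched_results = batched.validate_form(form)

print("per-field round trips:", per_field.round_trips)
print("batched round trips:", batched.round_trips)
```

Both approaches return the same answers, but the per-field version pays one WAN round trip for every field. The trade-off with batching is that the operator sees validation errors only after finishing the form rather than field by field.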
Thin clients, though not yet a commodity that can be deployed on a large
scale, are promising. As vendors transform their applications to a Web
services model, clients can be tailored to use the services needed for a
particular line of business, rather than being compelled to use a
one-size-fits-all interface.
4. Citrix Solutions
Citrix Systems is a company providing remote
desktop-related products. Its core products enable thin clients to access
applications as if the desktop were really a thick client. This is accomplished
by interposing Citrix servers (sometimes called a server farm) between the
client and the server. The remote desktop user gets the advantages of using a
thick client (namely, the rich functionality), because a thick client is
actually involved–it is running on the Citrix farm and communicating with its
server half as if it were on an actual physical desktop rather than a
virtualized environment. Meanwhile, Citrix is transmitting screen shots of the
screens the virtual thick client is emitting over to the remotely situated thin
client. Screen shots of images are much smaller than the actual image file, so
the all-important network impact is lessened.
Imaging applications fit nicely into this paradigm. Their thick clients
generally behave well inside the virtual environment created by Citrix. The
network connection between the Citrix farm and the thin client can be encrypted
for security. Authentication into Citrix also enhances security. The fact that
screen shots and not image files are being transmitted to remote sites addresses
privacy and security of data. This also makes tracking simpler, since images and
data do not get sent around in order to support remote keying. Many enterprises
already have a Citrix farm deployed for serving up other client-server
applications to distributed sites. If not, a small farm can be set up to service
the imaging application, although this adds more responsibilities for the IT
management team. The existing WAN’s latency must also be below a certain
threshold, or response time and the end user will suffer.
If the performance of the WAN permits, using Citrix is a sound approach to
enabling remote indexing and data entry.
5. Hybrid Configurations
Hybrid configurations (where two or more of the
above options are combined) make sense in a number of ways: as a hedge against
using only one technology, because there is a variety of work types, and as a
mechanism for transitioning from one style of work to another.
But there are few cases where a design that has a mixture of protocols and
other mechanisms is the best course of action in the long term. The
heterogeneity can overburden support staff as they troubleshoot, scale, or
otherwise maintain the system.
Summary
The five methods described above do not constitute an exhaustive
list of solutions, but they frequently come to mind when an organization is
attempting to accommodate a geographically distributed workforce. I have
outlined the pros and cons of each, addressing security and privacy,
administration, tracking, response time, functionality, and technology trends as
the key selection criteria. The organization also has to factor in the people,
process, and technology environments in which these methods are used, in order
to make an informed decision.
John Klavanian
works as technology leader for content management services at EDS, a Hewlett-Packard company. He
can be reached at (248) 265-3345.