Got the Picture? Securely?

Five Approaches for Distributing Imaging Services

Imaging, whose objective is to make information “locked up” on paper more transportable, manageable, or otherwise more useful to automated processes, has its own idiosyncrasies when it comes to sharing and transporting files. This article addresses how to transfer such work quickly and securely from site to site.

Who needs to do this? Any organization that can locate certain functions at sites other than its image services center, provided those sites are connected by a wide area network (WAN). An organization may want to do this to (1) leverage centralized pools of personnel, (2) take advantage of labor arbitrage (defined by Sourcingmag.com as “the financial benefit of buying a comparable service elsewhere to exploit the difference in pricing”), or (3) outsource a specific function to a third party.

For the purposes of this article, “imaging system” refers to the set of functions typically performed when processing documents, including scanning and document capture; image ingestion; image enhancement; document identification; quality assurance; character and barcode recognition; indexing (or key-from-image); data entry; character recognition repair and completion; and release of images and/or data to a downstream application. As shorthand, I refer to scanning and related activities collectively as image capture, and to indexing and related activities as data capture.

The five methods that follow can apply to both image capture and data capture, but the focus of this article is on data capture, i.e., key-from-image, data entry, or character recognition repair.

1. Thick Clients
“Thick clients” and “fat clients” are interchangeable terms for PCs/desktops that run most of an application’s logic locally. Historically, thick clients were considered necessary to deal with rich media, such as images. They were also good at connecting with the peripheral-rich environment of a scan center. Client-server architectures moved some of the common functions that had resided in desktop applications over to a shared server, typically a shared file system or a database management system.

Either way, the amount of data passed between the thick client and the server was not an issue, because the two communicated over a local area network (LAN). A LAN provided enough bandwidth that a programmer didn’t have to worry about the size of scanned documents or how frequently the client and server exchanged them. Moving that same protocol to a WAN, where network speeds are one to two orders of magnitude slower, is not practical. Some product support documentation makes this explicit by stating a minimum network bandwidth for successfully inter-networking imaging components. Depending on the number of people in the operation, the network can become a bottleneck, and supporting all those desktops can become onerous as well. In other words, thick client configurations do not support a distributed data capture workforce.
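The bandwidth concern comes down to simple arithmetic. The Python sketch below is an illustration only: the page size, link speeds, and hourly volume are assumed placeholder figures, not measurements or vendor requirements, and the model ignores protocol overhead, latency, and repeated fetches of the same image. It estimates what fraction of each hour a link would spend just moving page images.

```python
# Rough link-utilization estimate for moving page images.
# All constants below are assumed, illustrative values; replace them
# with figures measured in your own environment.

IMAGE_BYTES = 50_000        # assumed size of one compressed page image
PAGES_PER_HOUR = 20_000     # assumed hourly volume across all operators
LAN_BPS = 100_000_000       # assumed 100 Mbit/s LAN
WAN_BPS = 1_544_000         # assumed T1-class WAN link (1.544 Mbit/s)


def transfer_seconds(size_bytes: int, link_bps: float) -> float:
    """Idealized time to send one file: bits divided by link rate.
    Ignores protocol overhead, latency, and competing traffic."""
    return size_bytes * 8 / link_bps


def link_utilization(size_bytes: int, pages_per_hour: int, link_bps: float) -> float:
    """Fraction of each hour the link is busy carrying images.
    A value above 1.0 means the link cannot keep up with the work."""
    busy_seconds = transfer_seconds(size_bytes, link_bps) * pages_per_hour
    return busy_seconds / 3600


if __name__ == "__main__":
    for name, bps in (("LAN", LAN_BPS), ("WAN", WAN_BPS)):
        util = link_utilization(IMAGE_BYTES, PAGES_PER_HOUR, bps)
        print(f"{name}: {util:.0%} of the link consumed by images")
```

Under these assumptions the LAN carries the image traffic at about 2% utilization, while the WAN link would need about 144% of its capacity, before any other traffic is counted.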

2. File Transfer
Server-to-server (and, ultimately, application-to-application) file transfer has been around for decades. Utilities based on the File Transfer Protocol (FTP) can reliably transfer documents to a receiving site over a WAN. If the applications at each end do not have to be tightly connected or synchronized to complete their work, and they have the logic needed to integrate with the transfer utility, then file transfer can be used to get work to a remote data entry office.

The turnaround time for completing document processing tasks usually allows for this kind of asynchronous approach. But while turnaround times are not the problem, the added infrastructure often is. This becomes evident when you consider the larger context in which file transfer exists. It usually involves two independent but interconnected systems: the first system (A) manages image capture; the second (B) manages data capture. After work is completed on system B, the whole package may come back to system A, depending on which site (A or B) releases the images and data. This gets the job done (especially the keying part) in a seemingly efficient manner, since both systems are essentially LAN-based except for the link between them. But it also raises a number of issues, including weakened security from having multiple copies of documents, additional hardware and software for IT administration to maintain, the need to track where a given piece of work is at any moment, and the need to ensure the receiving site is keeping pace with the sending site. This last issue, making work flow smoothly between sites, usually means resorting to a vendor’s proprietary multi-site configuration.
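Keeping the receiving site in step with the sending site is, at bottom, a queueing question. The hypothetical Python sketch below tracks how many documents sit waiting at site B at the end of each hour, given what site A sends and what site B’s keyers can complete. The function name and the hourly figures are illustrative assumptions, not part of any vendor’s multi-site product.

```python
from typing import List, Sequence


def backlog_by_hour(sent: Sequence[int], capacity: Sequence[int]) -> List[int]:
    """Documents still waiting at the receiving site after each hour.

    sent[i]     -- documents transferred from site A during hour i
    capacity[i] -- documents site B can key during hour i
    Work left unkeyed carries over to the next hour.
    """
    backlog = 0
    history = []
    for arrived, can_key in zip(sent, capacity):
        available = backlog + arrived
        keyed = min(available, can_key)   # cannot key more than is on hand
        backlog = available - keyed
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Assumed figures: site A sends 500/hour for three hours, then 200;
    # site B can key 400/hour throughout.
    print(backlog_by_hour([500, 500, 500, 200], [400, 400, 400, 400]))
```

With these made-up numbers the printed backlog is [100, 200, 300, 100]: it grows by 100 documents an hour until incoming volume drops, which is the kind of drift an inter-site tracking mechanism has to surface.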

Thus, file-transfer-based systems are an option for distributed processing, but their support costs can erase any savings.

3. Thin Clients
Thin clients solve the thick-client problem by relying more heavily on the server side of the client-server arrangement. That shift means lower data volumes traversing the network. Thin clients also have lower IT administration overhead, since there is less on the desktop that can go wrong. They are also frequently browser-based, allowing for Secure Sockets Layer (SSL) connections. To make this work, the imaging server needs a Web server on the front end, which is usually not a big deal.

Yet thin clients are not commonly used for data capture. First, images still have to be sent to the thin client in order to be viewed. And while they don’t have to travel back to the server (the images haven’t been modified), if each page or document takes a few seconds to display, data entry personnel never achieve the rhythm and pace they have come to expect.
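A short calculation shows why a few seconds of display time matters. In this Python sketch the 20-second keying time and the delay values are assumptions chosen for illustration; the model simply adds the wait for each image to the time spent keying it.

```python
def documents_per_hour(keying_seconds: float, display_seconds: float) -> float:
    """Documents one operator completes per hour when each document costs
    its keying time plus the time spent waiting for the image to appear."""
    return 3600 / (keying_seconds + display_seconds)


if __name__ == "__main__":
    # Assumed: 20 seconds of keying per document.
    for delay in (0, 2, 4):
        rate = documents_per_hour(20, delay)
        print(f"{delay} s display delay: {rate:.0f} documents/hour")
```

Under the assumed 20-second keying time, a 4-second wait per document cuts output from 180 to 150 documents per operator-hour, a loss of one-sixth, and that ignores the harder-to-quantify loss of rhythm.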

In addition, the images sent to the thin client are often cached there, raising privacy and security concerns for the desktop. Finally, the fact that the server does more of the work does not mean the transactions between client and server are negligible; e.g., the (scaled-down) programs on the desktop may need to validate field values against the server database. Many fields per form means many database hits.
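The cost of those database hits scales with fields, forms, and network round-trip time. The Python sketch below assumes one request/response round trip per validated field; the field count, daily form volume, and round-trip times are hypothetical values, not measurements.

```python
def validation_wait_seconds(fields_per_form: int, forms: int, round_trip_s: float) -> float:
    """Total time spent waiting on the server when every field is
    validated with its own request/response round trip."""
    return fields_per_form * forms * round_trip_s


if __name__ == "__main__":
    FIELDS = 12       # assumed fields per form
    FORMS = 1_000     # assumed forms keyed per operator per shift
    for name, rtt in (("LAN", 0.001), ("WAN", 0.080)):  # assumed round-trip times
        minutes = validation_wait_seconds(FIELDS, FORMS, rtt) / 60
        print(f"{name}: {minutes:.1f} minutes of waiting per shift")
```

Under these assumptions, per-field checks that cost about 12 seconds per shift on a LAN add up to 16 minutes of idle time per operator per shift over the WAN.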

Although thin-client data capture may not yet be ready for large-scale implementation, it is promising. As vendors transform their applications to a Web services model, clients can be tailored to use only the services needed for a particular line of business, rather than being forced into a one-size-fits-all interface.

4. Citrix Solutions
Citrix Systems is a company that provides remote desktop products. Its core products let thin clients access applications as if the desktop were really a thick client. This is accomplished by interposing Citrix servers (sometimes called a server farm) between the client and the server. The remote desktop user gets the advantages of a thick client (namely, its rich functionality) because a thick client is actually involved: it runs on the Citrix farm and communicates with its server half as if it were on a physical desktop rather than in a virtualized environment. Meanwhile, Citrix transmits screen shots of what the virtual thick client displays to the remotely situated thin client. Screen shots of images are much smaller than the actual image files, so the all-important network impact is lessened.

Imaging applications fit nicely into this paradigm. Their thick clients generally behave well inside the virtual environment Citrix creates. The network connection between the Citrix farm and the thin client can be encrypted, and authentication into Citrix adds another layer of security. Because screen shots, not image files, are transmitted to remote sites, privacy and data security are better protected. This also makes tracking simpler, since images and data do not have to be sent around to support remote keying. Many enterprises already have a Citrix farm deployed to serve other client-server applications to distributed sites. If not, a small farm can be set up to serve the imaging application, although this adds to the IT management team’s responsibilities. Finally, the existing WAN’s latency must stay below a certain threshold; otherwise response time suffers, and so does the end user.
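The latency threshold can be checked against measured round-trip times before committing remote keyers to a Citrix deployment. In the Python sketch below, the 150 ms budget, the 95th-percentile criterion, and the sample data are all assumptions chosen for illustration; neither the threshold nor the method comes from Citrix documentation. How the samples are gathered (for example, with ping from the remote site to the Citrix farm) is left out.

```python
from typing import Sequence

# Hypothetical budget for illustration; it is not a Citrix-published figure.
LATENCY_BUDGET_MS = 150


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100   # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(rtt_ms: Sequence[float],
                         budget_ms: float = LATENCY_BUDGET_MS,
                         pct: int = 95) -> bool:
    """True if the pct-th percentile round-trip time is within budget."""
    return percentile_nearest_rank(rtt_ms, pct) <= budget_ms


if __name__ == "__main__":
    steady = [40] * 100                 # assumed: consistently quick link
    spiky = [40] * 90 + [400] * 10      # assumed: 10% of samples very slow
    print(meets_latency_budget(steady), meets_latency_budget(spiky))
```

This prints True False. Judging the link on a high percentile rather than an average is a deliberate choice: an average of the spiky samples would look acceptable, yet one screen update in ten would lag noticeably for the operator.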

If the performance of the WAN permits, using Citrix is a sound approach to enabling remote indexing and data entry.

5. Hybrid Configurations
Hybrid configurations (where two or more of the above options are combined) make sense in a number of ways: as a hedge against relying on only one technology, as a way to handle a variety of work types, and as a mechanism for transitioning from one style of work to another.

But in the long term, a design that mixes protocols and other mechanisms is rarely the best course of action. The heterogeneity can overburden support staff as they troubleshoot, scale, or otherwise maintain the system.

Summary
The five methods described above do not constitute an exhaustive list of solutions, but they frequently come to mind when an organization is attempting to accommodate a geographically distributed workforce. I have outlined the pros and cons of each, using security and privacy, administration, tracking, response time, functionality, and technology trends as the key selection criteria. To make an informed decision, the organization also has to factor in the people, process, and technology environments in which these methods will be used.

John Klavanian works as technology leader for content management services at EDS, a Hewlett-Packard company. He can be reached at (248) 265-3345.