Q: To what extent is quality control (QC) validation by humans required for
results of automated data classification and data extraction to ensure
confidence in accuracy?
Riley: It's hard to give a single number, but many high-volume organizations
with a good implementation touch only three to five percent of their
documents. The quality of your scans, how you receive your documents, and
similar factors have a huge impact on your OCR accuracy and on what the engine
reports as uncertain (the characters that will be sent for review). Your
business process will also dictate whether you review one particular field
100 percent of the time or only the characters flagged as uncertain.
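The review rule described in this answer can be sketched in a few lines of
Python. This is only an illustration, not any vendor's API: the Field class,
the per-character confidence scores, the 0.9 threshold, and the field names
are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Field:
    """One extracted field. The layout is hypothetical, not a real OCR API."""
    name: str
    text: str
    char_confidences: List[float]  # assumed: one score per character, 0.0 to 1.0


def needs_review(field: Field, always_review: Set[str], threshold: float = 0.9) -> bool:
    """Return True if a person should look at this field."""
    # Business rule: some fields are checked 100 percent of the time.
    if field.name in always_review:
        return True
    # Otherwise review only when at least one character is uncertain.
    return any(score < threshold for score in field.char_confidences)


def review_rate(fields: List[Field], always_review: Set[str], threshold: float = 0.9) -> float:
    """Fraction of fields that would be routed to a human reviewer."""
    if not fields:
        return 0.0
    flagged = sum(needs_review(f, always_review, threshold) for f in fields)
    return flagged / len(fields)
```

Tracking review_rate over a sample batch is one way to see how scan quality
and the chosen threshold move the share of work that reaches a reviewer.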
Q: What are the consequences of using a mix of non-normalized images?
Riley: They are twofold. First, you will be less accurate, since you
won’t be able to fine-tune for a single input. But this should not discourage
you, and if the cost is high, you should not feel forced to normalize all
images. Only do it if it is easy. Second, it will take additional time to set
up and test, as you need to fine-tune for as much variance as you can.
Q: Is there a data-capture technology that recognizes Chinese font? I am
working with several Asian factories to automate their A/P and A/R depts.
Riley: Yes. It's more difficult to find, and sometimes it is available only in
API (application programming interface) form. What you are looking for is
support for Chinese, Japanese, and Korean, and realistically I think the best
option is to look for technology offered as an API that you can write a
solution around.
Chris Riley (chris.riley@livinganalytics.com)
is founder of Living@nalytics (www.livinganalytics.com),
where he uses his in-depth knowledge of data capture technologies to advise
clients and proselytize the value of these tools. Chris was the featured speaker
for our March 5, 2009 webinar, "Tips and Tricks to Help You Automate your Office
Documents (for Effective Data Capture)". Listen at
www.aiim.org/webinararchive.