Artificial intelligence is an intriguing technology that the uninitiated generally
associate with science fiction. Steven Spielberg’s film “AI” is a prime example. This
heart-rending tale of a robot boy who longs to feel real love raises the question of
whether a computer can reach beyond its basic programming and learn new responses.
In the Enterprise Content Management (ECM) arena, however, artificial
intelligence is no fiction. It is a key influence on the development of solutions
that reduce the time and cost of identifying, sorting and
extracting information from vast amounts of data.
Take, for instance, a global financial institution that receives documents from
around the world in their native languages. These documents may be originals,
photocopies, faxes, faxes of faxes, or even printouts from a home computer. To
manage its business through a digital mailroom, the firm wants to scan all the
paper it receives and have the computer automatically recognize, sort, extract
and tag the documents for later search and retrieval.
What if you run a company that’s involved with port security? Your team
investigates a suspect cargo and comes upon piles of discarded, damaged
documents that are not in a recognizable language and seem indecipherable. These
documents need to be salvaged, translated, classified and analyzed for possible
security implications.
Or, imagine that you work in a library in a Middle Eastern country that needs
to capture both current and historical documents to create an online repository
that will aid researchers in developing cultural, social and historical theses.
Highly degraded newspapers, ancient manuscripts, books and other documents in
varying languages need to be scanned, translated and classified so that
previously unshared knowledge can be made available for the greater good.
During the past decade, many ECM capture vendors have attempted to use
traditional classification and OCR technologies to solve these types of customer
needs. Unfortunately, these technologies have not yielded workable or
cost-effective solutions. Typical Intelligent Document Recognition (IDR)
technologies are difficult to train and require a significant investment in
professional recognition training services during installation. These
technologies also have difficulty recognizing pages whose contents have been
scaled or shifted. Traditional OCR products are unable to extract information
from low-resolution or degraded documents and cannot automatically identify the
language on a page so that its text can be extracted without human intervention.
Whether for business, security or academic reasons, the above scenarios share
a common theme: the need to identify, sort and extract information from varying
and sometimes challenging data sources. Fortunately, there are now OCR and IDR
technologies that use advanced AI and other scientific
methods to overcome the problems of degradation and difficult-to-decipher
languages. Today we’re seeing significant new advances in capture technology
that draw on rich scientific legacies. Scientific discoveries in statistical
modeling, pattern recognition and image processing have created a new generation
of document capture software that is able to:
- Automatically clean degraded pages
- Automatically orient pages that are rotated
- Recognize pages whose content is shifted or scaled
- Quickly find information stored on a page and detect its language for conversion into computer text
- Automate document recognition training, which eliminates the need for high-cost recognition training services and increases accuracy rates
- Simplify its integration into third-party solutions
For any organization that deals with large volumes of paper documents, these
advances in OCR and IDR give today’s capture software the ability to identify
and sort structured, semi-structured, and unstructured documents and
automatically route them to the appropriate person or department within an
enterprise. Creating a truly cost-effective digital mailroom is no longer
science fiction, but reality.
David Rock is President and CEO of NovoDynamics, Inc. Based in Ann
Arbor, Mich., NovoDynamics is a leader in providing advanced software technology
solutions for organizations seeking powerful, reliable and accurate ways to
extract and leverage information from critical data sources. Visit online at www.novodynamics.com.