AIIM — The Enterprise Content Management Association

Automating Capture with Artificial Intelligence

Mar 23, 2010

 

Artificial intelligence is an intriguing technology that the uninitiated generally associate with science fiction. Steven Spielberg’s film “A.I. Artificial Intelligence” is a prime example. This heart-rending tale of a robot boy anxious to feel real love raises the question of whether a computer can reach beyond its basic programming and learn new responses.

In the Enterprise Content Management (ECM) arena, however, artificial intelligence is no fiction. It is a key influence on the development of solutions that reduce the time and cost of identifying, sorting and extracting information from vast amounts of data.

Take, for instance, a global financial institution that receives documents from around the world in their native languages. These documents may be originals, photocopies, faxes, faxes of faxes, or even printouts from a home computer. The firm wants to scan all the paper it receives and have the computer automatically recognize, sort, extract and tag the documents for later search and retrieval, so that it can run its business through a digital mailroom.

What if you run a company that’s involved with port security? Your team investigates a suspect cargo and comes upon piles of discarded, damaged documents that are not in a recognizable language and seem indecipherable. These documents need to be salvaged, translated, classified and analyzed for possible security implications.

Or, imagine that you work in a library in a Middle Eastern country that needs to capture both current and historical documents to create an online repository that will aid researchers in developing cultural, social and historical theses. Highly degraded newspapers, ancient manuscripts, books and other documents in varying languages need to be scanned, translated and classified so that previously unshared knowledge can be made available for the greater good.

During the past decade, many ECM capture vendors have attempted to use traditional classification and OCR technologies to solve these types of customer needs. Unfortunately, these technologies have not offered operational or cost-effective solutions. Typical Intelligent Document Recognition (IDR) technologies are difficult to train and require a significant investment in professional recognition training services during installation. These technologies also have difficulty recognizing pages whose contents have been scaled or shifted. Traditional OCR products are unable to extract information from low-resolution or degraded documents and cannot automatically identify the language on a page so that it can be extracted without human intervention.

Whether for business, security or academic reasons, the above scenarios share a common theme: the need to identify, sort and extract information from varied and sometimes challenging data sources. Fortunately, there are now OCR and IDR technologies that use advanced artificial intelligence and other scientific methods to overcome the problems of degradation and difficult-to-decipher languages. Today we’re seeing significant advances in capture technology that draw on rich scientific legacies. Discoveries in statistical modeling, pattern recognition and image processing have created a new generation of document capture software that is able to:

  • Automatically clean degraded pages
  • Automatically orient pages that are rotated
  • Recognize pages whose content is shifted or scaled 
  • Quickly find information stored on a page and detect its language for conversion into computer text 
  • Automate document recognition training, which eliminates the need for high-cost recognition training services and increases accuracy rates
  • Integrate more easily into third-party solutions
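
To make the language-detection item above a little more concrete, here is a minimal Python sketch of one very simple way a page's language might be guessed once its text has been converted. It is an illustration only, not how any particular capture product works: the stopword lists, the `STOPWORDS` table and the `guess_language` function are all assumptions made for this example, and production systems rely on far richer statistical models.

```python
import re

# Hypothetical, deliberately tiny stopword lists (assumptions for illustration).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "in"},
    "french": {"le", "la", "et", "les", "des", "est"},
    "spanish": {"el", "la", "y", "los", "de", "es"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    # Split the text into lowercase runs of letters, dropping digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count, per language, how many words on the page are known stopwords.
    scores = {
        lang: sum(1 for word in words if word in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no stopword matched at all, decline to guess.
    return best if scores[best] > 0 else None
```

A page that yields no match returns `None`, which a real workflow would presumably send to a person rather than guess.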

For any organization that deals in large volumes of paper documents, these advances in OCR and IDR give today’s capture software the ability to identify and sort structured, semi-structured, and unstructured documents and automatically route them to the appropriate person or department within an enterprise. Creating a truly cost-effective digital mailroom is no longer science fiction, but reality.
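
As a rough illustration of the routing step described above, the following Python sketch assigns a document's extracted text to a department queue by counting keyword matches. Everything in it is hypothetical: the department names, the keyword sets, the `manual_review` fallback and the `route_document` function are assumptions for this example, and simple keyword matching stands in for the statistical classification that real IDR software would use.

```python
import re

# Hypothetical routing table: department names and keywords are assumptions.
ROUTES = {
    "accounts_payable": {"invoice", "remittance", "payment"},
    "legal": {"contract", "agreement", "subpoena"},
    "human_resources": {"resume", "application", "benefits"},
}
DEFAULT_QUEUE = "manual_review"  # assumed fallback when nothing matches


def route_document(text):
    """Pick the department whose keywords overlap most with the text."""
    # Reduce the text to its set of distinct lowercase words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Score each department by the number of its keywords found in the text.
    scores = {dept: len(words & keywords) for dept, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Documents with no keyword match fall back to a human reviewer.
    return best if scores[best] > 0 else DEFAULT_QUEUE
```

For example, `route_document("Invoice #4411: payment due in 30 days")` matches two accounts-payable keywords and is routed there, while text with no recognized keywords goes to the manual review queue.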

David Rock is President and CEO of NovoDynamics, Inc. Based in Ann Arbor, Mich., NovoDynamics is a leader in providing advanced software technology solutions for organizations seeking powerful, reliable and accurate ways to extract and leverage information from critical data sources. Visit online at www.novodynamics.com.
