Bridging the gap between structured and unstructured information inches toward reality with new providers and new solutions
Hello, structured data, meet ... unstructured data! We know this is an awkward
first date, but we’re so glad to finally be getting you guys together. And while
this is jumping the gun a bit, we really do hope you get married. One day. When
you’ve carefully sorted out all your differences and learned how to live together
in peace and harmony for the betterment of all.
Although comical, such a statement might well serve as a metaphor for the
current state of attempts to bridge the gap between structured and unstructured
content.
Within the structured world there’s all the tidy core and transactional data
as well as business intelligence (BI) locked up in data warehouses and data
management systems. Then there’s the wild morass of unstructured data, in all
forms of documents and other electronic information, some of which is harnessed
in formal content management systems in the enterprise.
Integrating these two different worlds within the enterprise has only
recently come to the forefront as an endeavor undertaken mostly by large
enterprises, says Boris Evelson, principal analyst with Forrester Research in
Cambridge, Mass. Globalization and mergers and acquisitions have certainly left
many enterprises with numerous BI, enterprise content management (ECM), and data
warehouse systems to sort out. “The first problem is how do I bring it all
together, but you’ve got to learn to walk before you can fly,” he says.
Even before the enterprise tackles the integration issue between the two
forms of data, there are other considerations around company intelligence
scattered throughout the enterprise. About 80 percent of corporate business
information is stored as unstructured content, according to Forrester. This
excludes content that may not be formally stored, such as e-mails, text messages,
social media, unscanned paper, and the like.
Of course the quest to eliminate silos of information pays off with the
ability to make better, faster business decisions, stay competitive, and comply
in regulated environments, among other survival skills required in business
today. In other words, people want and need access to business information
regardless of where it lives within the enterprise.
Yes, but . . .
The basics of integrating structured and unstructured data
start with clean and integrated data. “Before I can implement the technology, I
have to do all the steps of integrating, cleansing, and matching the data. Once
in the right place and shaped in the right form, then you can proceed in
unifying it,” Evelson says. Figuring out the data model is fundamental. “The
data model is only as good as my technology’s interpretation of the business
requirements. Number two, business requirements only change as fast as you can
keep up with the data model.”
While many organizations and institutions have struggled with what Evelson
calls “old-fashioned convergence” for the last couple of decades, a new brand of
convergence is emerging that pulls together BI with search. But it’s more akin
to rocket science than not, Evelson says. “It’s pretty advanced. Without all the
advance work up front, you can’t have all the basic components you need.”
Search, BI, and content management technologies all come into play. New
breeds of solutions providers, often in combination with traditional ECM
providers, are helping to bridge the gap between structured and unstructured
data. “It’s huge,” says Ben Cody, vice president of product development at
Global 360, a process and document management solutions provider in Dallas,
Texas. “The worlds of content management and BI have historically been very
separate, on opposite ends of the spectrum, and the reality is, technology is
bringing them together.”
“Search and BI technologies have evolved from different starting points,
addressing different requirements and needs,” Evelson says. “They’ve evolved as
BI technologies are trying to embed search-like technologies into their engines
and search providers are integrating BI tools in their systems.” Search
platforms are putting in BI functions like data visualization and reporting,
while BI vendors are embedding simple-to-use search experiences in their
products.
In other words, search and BI are evolving into unified information analysis
and discovery. But the trick, Evelson says, is putting search and BI
functionality inside business processes. “Business intelligence and search are
only as good as the questions you ask. You have to ask the question in the
context of a business process.” Here’s a rundown of several providers
endeavoring to bridge the gap between structured and unstructured data.
Bringing raw text together in meaningful ways
Bill Inmon, founder of
ForestRim Technology in Castle Rock, Colo., a software company for textual ETL,
says to start with integration of the data. “Before you do search you have to do
the integration, the process of taking text and unifying it so that common
terminology is recognized and treated in the text,” he says, noting there’s a
very subtle and important difference between searching text and analyzing it.
“Search is a superficial technology because before data can be meaningfully
analyzed, it has to be integrated.”
Much of ForestRim’s work, some of which is patented, is in the area of
wrangling the mismatched, unstructured textual data found throughout the corporate
environment. “This is anything that’s locked up in the enterprise system. There
are literally hundreds of forms of this type of data from emails, contracts, and
reports to medical records and insurance policies.” ForestRim puts text through
a rigorous series of a few dozen steps, such as synonym resolution, homographic
resolution, variable pattern recognition, and the like.
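To make two of those steps concrete, here is a minimal Python sketch of synonym resolution and homograph resolution. It is an illustration only, not ForestRim's patented method: the vocabularies, the context-cue rule, and the names SYNONYMS, HOMOGRAPHS, and resolve are all assumptions made up for the example.

```python
import re

# Hypothetical vocabulary: map variant terms to one canonical term.
SYNONYMS = {
    "physician": "doctor",
    "md": "doctor",
    "automobile": "car",
    "vehicle": "car",
}

# Hypothetical homograph rules: the same token means different things
# depending on which cue words appear elsewhere in the same text.
HOMOGRAPHS = {
    "ha": [
        ({"cardiac", "chest", "cardiologist"}, "heart attack"),
        ({"migraine", "aspirin", "neurologist"}, "headache"),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def resolve(text):
    """Return tokens with homographs expanded and synonyms unified."""
    tokens = tokenize(text)
    context = set(tokens)
    resolved = []
    for token in tokens:
        if token in HOMOGRAPHS:
            # Pick the first meaning whose cue words occur in the text.
            for cues, meaning in HOMOGRAPHS[token]:
                if cues & context:
                    token = meaning
                    break
        # Unify variant spellings or terms under one canonical term.
        resolved.append(SYNONYMS.get(token, token))
    return resolved


if __name__ == "__main__":
    print(resolve("The physician noted HA after chest pain"))
```

Once text has been normalized this way, a query for "doctor" or "heart attack" can match documents that never used those exact words, which is the difference Inmon draws between searching text and analyzing it.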
In short, Inmon says, much of the convergence problem is tied to the ability,
or lack thereof, to free up important textual information that’s locked up in
ECM systems and make it available for textual, analytical processing. One
example is in the oil and gas industry, where the company is assisting in taking
documents in various languages and building a database for the information in
another language to create the largest searchable library in the world on a
specialty topic. ForestRim also serves companies in the healthcare, insurance,
and manufacturing fields.
Mind your metadata
Managing content and the streams of data within the
organization is the realm of Bluenog, an open source-based enterprise software
provider in Piscataway, N.J. Metadata management goes a long way in
addressing the convergence problem, says Suresh Kuppusamy, Bluenog co-founder
and CEO. “The only way the structured and the unstructured will converge is
through metadata management.” The proper tagging of information, whether manual
or automatic, is a great first step toward finding critical information in the
enterprise.
Then it’s a matter of consolidating data into a uniform search and access
mechanism, Kuppusamy says. “Everything that disseminates from an organization,
whether [it’s a] portal, wikis, blogs, reports, documents that have been scanned
in, or BI—all of that needs to have a common basis of storage and retrieval.”
Bluenog’s ICE solution tightly integrates ECM, enterprise portal, and BI
functionality, bypassing the need for deployments of software across these
separate functions.
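The idea of a common basis of storage and retrieval built on metadata can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Bluenog ICE: the Item and MetadataIndex classes, the tag strings, and the item kinds are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """One piece of enterprise information, structured or not."""
    item_id: str
    kind: str  # e.g. "document", "wiki", "bi_report" (hypothetical kinds)
    tags: set = field(default_factory=set)


class MetadataIndex:
    """Maps each metadata tag to the ids of the items that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._items = {}

    def add(self, item):
        """Register an item under every one of its tags."""
        self._items[item.item_id] = item
        for tag in item.tags:
            self._by_tag[tag.lower()].add(item.item_id)

    def find(self, *tags):
        """Return items carrying every requested tag, sorted by id."""
        if not tags:
            return []
        ids = set.intersection(
            *(self._by_tag.get(t.lower(), set()) for t in tags)
        )
        return [self._items[i] for i in sorted(ids)]


if __name__ == "__main__":
    index = MetadataIndex()
    index.add(Item("contract-17", "document", {"customer:acme"}))
    index.add(Item("sales-q1", "bi_report", {"customer:acme", "revenue"}))
    for hit in index.find("customer:acme"):
        print(hit.item_id, hit.kind)
```

Because a scanned contract and a BI report share the same tag vocabulary, one lookup returns both, regardless of which system originally held them.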
With an aim toward serving employees, vendors, and the customer with a common
infrastructure, the Center for International Earth Science Information Network
(CIESIN), part of Columbia University’s Earth Institute, is in the process of
implementing the Bluenog ICE platform. “CIESIN manages several disparate
websites that repurpose similar content, but with a very different look and feel
in each instance. Prior to deploying ICE, CIESIN’s marketing team would create
content for the different sites, and render it to IT personnel, who were
responsible for putting the content into a database in a particular schema for
presentation on the website,” Kuppusamy says.
Now the CIESIN marketing team and business staff can bypass IT involvement
and get the benefit of automating and modifying content in real time. “Content
is only updated once, and the information is repurposed on several different
websites simultaneously,” Kuppusamy says. CIESIN created rich content types in
Bluenog CMS to give editors and authors access to metadata as well as the
content, which gives even more control to the business staff with respect to the
distribution and the look and feel of the content. Next, CIESIN will focus on
the look and feel of the portal, and options include Bluenog RichPortal. CIESIN
will also look to business intelligence solutions within the ICE platform.
A 360-degree view
At Global 360, Cody notes that the quest for a single
view of the customer, employee performance management, and workflow are just a
few of the reasons why corporations are thirsting for convergence of structured
and unstructured data. “We’ve made a big investment into a pre-built datamart
for process and document management. Our customers want to see data at all
levels,” Cody says.
For example, knowledge workers, each with their own specialty, need granular
views and workflow, while managers need to do things like monitor performance,
see real-time performance data, and reassign work. “Everyone wants and needs
their own point of view,” Cody says. Global 360 enterprise solutions, drawing on
the company’s Viewpoint technology, bring together content and document
management, BI, and reporting and analytics from the data warehouse, bypassing
the need for mining individual databases.
Among other functions, the Global 360 enterprise solution queues work for the
employee. “The system figures out what they need to do next and—increasingly—
people want to have a view of their individual performance, such as how they’re
doing relative to their performance goals,” Cody says. And one timely driver of
the solution is big business, with many companies growing exponentially through
mergers and acquisitions, as seen in the banking industry amid the fallout from
the financial crisis. “A single view of the customer allows for upselling and
cross-selling, working together to give a virtual view of the customer.”
Smarter searching and alerting
Like Global 360, solutions by Attivio also
push information out to workers drawn from various repositories. Attivio, based
in Newtonville, Mass., is an enterprise search and BI solutions provider which
has patented its technology. It uses “the precision of SQL and the fuzziness of
search” to deliver enterprise information in dashboard form, says Sid
Probstein, chief technology officer. “You get the performance of search and the
ability to represent and model structured data exactly the way a relational
database does, without the weaknesses of a traditional search engine that
doesn’t deal with related data.”
Unlike relational databases, which only allow one schema, Probstein says, “Our
Active Intelligence Engine is a unified information access platform that happily
accepts data from all types of systems, allowing far less effort to bring
unstructured and structured data into an engine.” A range of industries use the
solution for portals, fraud detection, compliance, and customer service
solutions, among other applications.
Probstein says heavily regulated industries as well as large businesses use
Attivio solutions to mine company and customer intelligence.
Alerting functionality is built right into its Active Intelligence Engine. He
uses the analogy of water in the form of a pond to describe legacy enterprise
search, while active alerting can be thought of as an aqueduct or pipe for
search that pumps appropriate information to all other systems that need it.
Queries can be saved as alerts, among other sophisticated features.
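The pond-versus-aqueduct analogy can be illustrated with a small Python sketch in which a saved query is evaluated against each new document as it arrives, and matches are pushed to a subscriber. This is not Attivio's engine or API: the AlertEngine class, its method names, and the all-terms-must-match rule are assumptions chosen for the example.

```python
class AlertEngine:
    """Toy push-style search: saved queries run against incoming documents."""

    def __init__(self):
        # alert name -> (set of required terms, subscriber callback)
        self._saved = {}

    def save_query(self, name, terms, notify):
        """Store a query as an alert; notify(name, doc_id) is called on a match."""
        self._saved[name] = ({t.lower() for t in terms}, notify)

    def ingest(self, doc_id, text):
        """Check a new document against every saved query and push matches."""
        words = set(text.lower().split())
        fired = []
        for name, (terms, notify) in self._saved.items():
            # The alert fires only if every required term is in the document.
            if terms <= words:
                notify(name, doc_id)
                fired.append(name)
        return fired


if __name__ == "__main__":
    engine = AlertEngine()
    engine.save_query(
        "fraud-watch",
        ["wire", "transfer"],
        lambda name, doc_id: print(f"alert {name}: {doc_id}"),
    )
    engine.ingest("doc-1", "Unusual wire transfer flagged")
    engine.ingest("doc-2", "Quarterly revenue report")
```

In the pond model a worker has to come back and rerun the query; here the document finds the worker, which is the behavior a downstream compliance or fraud system needs.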
Among other whiz-bang feats, and similar to technologies offered by companies
such as FAST Search (recently acquired by Microsoft), Autonomy, and Endeca,
Attivio also offers guided exploration and analytics to deliver search results
which are aggregated by attributes, allowing workers to explore all available
data and content. Attivio supports a wide range of content management systems out of the box,
Probstein says.
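Aggregating search results by attribute, often called faceting, is simple to sketch in Python. The code below is a generic illustration, not any vendor's implementation: the functions facet_counts and refine and the sample attribute names are hypothetical.

```python
from collections import Counter


def facet_counts(results, attributes):
    """Count how many results carry each value of each attribute."""
    counts = {attr: Counter() for attr in attributes}
    for record in results:
        for attr in attributes:
            value = record.get(attr)
            if value is not None:  # skip records missing this attribute
                counts[attr][value] += 1
    return counts


def refine(results, attribute, value):
    """Narrow the result set to records with the chosen attribute value."""
    return [r for r in results if r.get(attribute) == value]


if __name__ == "__main__":
    hits = [
        {"id": 1, "type": "contract", "region": "EMEA"},
        {"id": 2, "type": "email", "region": "EMEA"},
        {"id": 3, "type": "contract", "region": "APAC"},
    ]
    for attr, counter in facet_counts(hits, ["type", "region"]).items():
        print(attr, dict(counter))
    print(refine(hits, "region", "EMEA"))
```

The counts give the worker a map of what the result set contains, and each click on a value is just another call to refine, so exploration proceeds without the worker having to write a new query.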
What lies ahead?
Where is the quest to unify the structured with the
unstructured going? According to Evelson of Forrester, enterprises have only
scratched the surface of the structured versus unstructured conundrum. Evelson
says it will be another decade before providers make great advances in bridging
the gap. “Starting in about five years, we’ll start noticing something different
(as technologies evolve). Then it’s going to be game-changing,” he concludes.
Marcia Jedd is a Minneapolis-based marketing consultant and writer. Her website
is www.marciajedd.com.
Seven strategies to build a bridge between structured and unstructured content
The experts consulted for this article offer their own high-level tips when
considering strategy on integrating structured and unstructured data for the
enterprise:
Think broadly in bringing together structured and unstructured data and work
through analytics and content. Don't focus too much on the point solution; think
bigger. -- Probstein
Approach the challenge one step at a time. The world of text is an imprecise
world: you'll get some of it right, some of it wrong. -- Inmon
Understand the business problem first and then worry about the integration.
Don't get sucked into the tooling aspect of how you build your enterprise
applications. -- Kuppusamy
Work with vendors that understand the new form of convergence. Many typical BI
vendors don't understand unstructured data. Search vendors may not understand
structured data. -- Evelson
Always work with systems integrators and consultants who are familiar with the
business problem and technologies. -- Evelson
Start small, achieve return on investment, and then build it out into other
areas. Don't just think about a solution for one role. -- Cody
Be pragmatic. Avoid too many data warehouses. Don't make the structured and
unstructured convergence more of a project than it needs to be from the systems
perspective. -- Cody