In today’s information-glutted society, discovery of electronic documents can be one of the most expensive aspects of a lawsuit if it is left until the 11th hour, to say nothing of the role it plays in obtaining a favorable judgment.
A key aspect of ediscovery is the creation of a data map to determine
precisely what information is available within an organization and where it
resides. This is a process that should begin long before a company ever finds
itself in court.
The phone rings. It is the general counsel. The organization may be sued
over patent infringement. Counsel knows that this could be “The Big One.” All
sorts of data, documents, metadata, emails, and other forms of information may
be required. Counsel asks IT: “Do you have, or can you get together, a list of
all systems and the data they contain?” There is a long, silent pause on the
phone. Then the IT manager says, “Well, we do have a list of systems. Let me send
it your way.” Counsel gets the list. It is nothing close to the data map it
needs. Instead it is a list of servers, their IP addresses, platform
configuration, and their physical rack location in the data center. Good
information for disaster recovery purposes, but not particularly helpful in
court.
“Well, this is the best I’ve got,” comes the retort from IT. “We do not have
a data map, nor would we know how to create one. And, by the way, do you really
think we have the bandwidth to work on this now?”
Why You? The Challenge of Data Mapping
So who gets
stuck with the job? You do. You might argue that “IT manages all the
infrastructure and stuff; why couldn’t they just run an inventory on their
systems?” And IT will reply that “Well, we do manage the infrastructure, but we
know very little about the inputs, outputs, documents, records, and other
information on these applications. Go talk to the business side.” And you go to
business, and business will tell you that “I just use the system and click these
buttons on the screen. The system is a black box to me. I have no idea about all
of the underlying data, metadata, and data structures. I suggest you talk to the
operational folks.” And you talk to the operational folks, and they say, “What
are you talking about? We just execute business processes. Don’t ask us about
data and metadata. Go talk to the analyst who worked on the system design.” And
you look for the analyst, and you eventually learn that …“Oh, she was a
consultant and she left the project three years ago.”
The challenges are many but the data map must be created. And the job is
yours. So where do you start? By going back to the beginning … the very
beginning.
How Did We Get in Such a Mess?
Let’s take a look at a
typical mid-size organization. It has several thousand employees and contractors
with offices in the U.S. and E.U. The sheer volume of information that resides
in just one division is mind-boggling. New information sources keep popping up,
employees keep creating new SharePoint sites on their own, and there is use (or
misuse) of social collaboration tools, to say nothing of several hundred IT
systems in play at any point in time. Data is moved and migrated from one place
to another without proper documentation or communication, more and more tape
backups are being created, and some employees are making copies of data on thumb
drives or, worse, emailing it to their personal email addresses.
How did things ever become such a mess? There are manifold reasons: IT is
traditionally kept at arm’s length on compliance and uninvolved with information
management and governance during systems design and development. Records
management departments, on the other hand, often institute sound policies and
retention schedules but have a tough time putting these into practice and
getting people to adhere to them. On the legal side, general counsels often work
against themselves: becoming increasingly exasperated over the large amount of
money spent on searching, processing, and producing electronically stored
information (ESI), they often push hard to cut costs, thereby short-circuiting
the process.
Must an Organization Have a Data Map?
It may seem
surprising that even today, many successful organizations do not have a data
map, or at best, a superficial one. It is not that organizations are lacking in
will, however, but that the process seems too daunting. Consider the “typical
organization” above. If there are several hundred IT systems and other
home-grown business applications, one must not only know what these systems are,
and where they are located, but also the types of information (documents,
records, other content) that are produced from these systems and additional
information such as data format, data location, whether the data is updated by
other systems, or transformed into other formats, etc.
To add to the complexity, a determination also needs to be made as to whether
a piece of ESI can be extracted and presented using reasonable and customary
means. For example, if an IT system was retired and its data backed up to tape,
it is reasonable to assume that extracting the tape, processing the information,
and presenting it in a readable format may not be easy since the underlying
version of the software no longer exists. Counsel, however, must be able to
assess whether this is indeed the case. Having a data map eases some of these
tasks and makes it easier for counsel to relate the information as needed.
Data Mapping Considerations
In the current economic
environment, companies are bracing themselves for an uptick in the number of
lawsuits. Whether the matter is related to regulators, customers, consumers,
employees, or business partners, companies are often required to provide ESI in
court. While this capability should be a sine qua non for most organizations, many are simply
too overwhelmed to react fast enough and are thus placing themselves
at a much greater risk. If that’s the case in your enterprise, here are some
initial steps that will help you move forward.
- Understand the prevailing legal
environment.
Organizations are not created equal, and not all have the same set of
applicable legal requirements. It is therefore important to analyze the type
of environment that the organization operates in, the jurisdictions it falls
under, and the various federal and state laws, regulations, and common
industry standards that apply to it with regard to ESI. While the contents of
a data map do not by themselves correlate directly to a particular law or
regulation, it is useful to know what checks and controls need to be
established during the data-mapping process and to ensure that there are no
“show-stopper” questions in court about how the data map was created or what
the process was.
- Use a partnership model and obtain buy-in
from senior management.
It is important that each entity within an organization have a
vested stake in the success of any data-mapping project. This means that
management in each of these organizational fiefdoms must understand what a
data map is, how it will be used, and what the process of creating one is.
Getting buy-in from these senior managers is a crucial first step and must be
completed prior to the start of the process. Additionally, it is important
that people of the appropriate rank are selected to work on the project. Folks
who are deep in the weeds will generally have a lot more information about
data flows and how processes and people work together than the senior
executive, who operates in more of a decision-making capacity.
- There is little point in pursuing a
“big-bang” approach for the data map.
Instead, work towards a phased approach. Prioritize
which divisions or lines of business to focus on first and then address the
remaining ones later. Work with line managers to determine what, if any,
information has been collected on systems and processes within their
particular areas. Standard industry lists may be employed as a starting point,
e.g. HR, Accounting, Communications and Marketing, etc. Begin the first phase
of the process here and then iteratively build upon what’s already available.
- Use the right technology. As more capital is allocated towards automating
ediscovery, vendors will naturally gravitate towards building specialized
software for this mission. Time, cost, and relevancy of results will drive the
success of vendor products. While some organizations have attempted to build
custom tools, more and more prefer established products or service
offerings to guide them through the ediscovery and data-mapping process.
Many vendors have already begun mapping their offerings to the Electronic
Discovery Reference Model (EDRM) and other industry standards. This market is
still maturing and organizations should not go out and immediately purchase a
top-rated vendor’s software without due consideration of the organization’s
unique circumstances.
Creating the Data Map
Once you’ve worked your way through
each of the considerations above and taken action as needed, you’re ready to start
the actual data-mapping process. It is lengthy but well-defined and can be
broken down into the following steps:
The Data Mapping Process
- Get a list of all systems – and be prepared
for a few surprises.
Begin the process by creating a list of all systems that exist in
the company. This is easier said than done, as in many cases, IT does not even
have a full list of all systems. Sure, they usually have a list of systems,
but don’t take that as the final list! Due diligence involves talking to
business process owners, employees, and contractors, which often brings to
light hidden systems, utilities, and home-grown applications that were
unknown to IT. Ensure that all types of systems are covered, e.g., physical
servers, virtual servers, networks, externally hosted systems, backups
(including tapes), archival systems, and desktops. Pay special attention
to email, instant messaging, core business systems, collaboration software,
and file shares.
- Document system information. After the list of all
systems is known, gather as much information about each as possible. This
exercise can be performed with the help of system infrastructure teams,
application support teams, development teams, and business teams. Here are
some types of information that can be gathered: system name, description,
owner, platform type, and location; whether it is a home-grown application or
a packaged product, and whether it stores structured data, unstructured data,
or both; system dependencies (i.e., which systems depend on it and which
systems it depends on); business processes supported; business criticality of
the system; security and access controls; format of data stored and format of
data produced; reporting capabilities; how and where the system is hosted;
backup process and schedule; archival process and schedule; whether data is
purged and, if so, how often and what data gets purged; number of users;
whether external access (outside the company firewall) is allowed; whether
retention policies are applied; audit-trail capabilities; and the nature of
the data stored, e.g., confidential data, nonpublic personal information, and
so on.
- Get a list of business processes. Inventory the list
of business processes and map it to the system list obtained in the step above
to ensure that all the various types of ESI are documented. The list of
business processes is also useful during the discovery process, when one can
leverage the list to home in on a particular type of ESI and obtain
information about how it was generated, who owned the data, how the data was
processed, how it was stored, and so on. A list of business processes can also
be useful when assessing information flows.
- Develop a list of roles, groups, and users
(custodians).
Obtain the organizational chart and determine the roles and
groups across the business and the business processes. Document the process
custodians and map out who had privileges to do what. Understand the human
actors in the information lifecycle flow.
- Document the information flow across the
entire organization.
Determine where critical pieces of information originated,
how the information was/is manipulated, what systems touch the information,
who processes the information, what systems depend on the information, and so
on. Understanding the flow of information is key to the data mapping/discovery
process.
- Determine how email is stored, processed, and
consumed.
Given the large percentage of business information and business
records that reside in email, special attention needs to be paid to email
ESI. Typically email is the first thing that opposing counsel go after, so
determining whether email retention and disposition policies are consistently
enforced will be key to proving good faith. There are a number of automated
tools that will enable you to create email maps, link threads of conversation,
perform heuristic relevancy searches, extract underlying metadata, and so
on. Before deciding to buy a best-of-breed solution, however, perform due
diligence on existing email processes. Understand how employees are using
email. Are they creating local archives (.PST files)? Are they storing emails
on a network drive or in a repository? Are they disposing of them at the end of
retention periods? Are they using personal email accounts to conduct official
business? Identify deficiencies and violations in email policies
before the opposing counsel does.
- Identify use of collaboration tools. SharePoint will
have the lion’s share of the collaboration space in many organizations, but
even then you must ensure that all other tools – whether they are social
networking tools, Web-based tools, or home-grown tools – are included in the
data-mapping process. You need to carefully document the types of information
being stored on each of these tools. Sometimes company information has a nasty
habit of being found in the most unlikely of places. Wherever possible, work
with compliance, information management, or records management groups to
establish usage policies to prevent runaway viral growth of these tools. If
the organization already has thousands of unmanaged SharePoint sites, work
with IT and business to institute governance controls to prevent further
runaway growth.
- Don’t forget offsite storage. After inventorying and mapping all systems,
one would think the job is done. Alas, there is more work ahead. Offsite
storage is an often under-appreciated aspect of the discovery process. It is
quite reasonable to assume that there might be substantial evidence stored
offsite which might become incriminating at a later date. Offsite storage may
contain boxes or tapes full of records whose existence was somehow never
properly documented, with the result that they cannot be located unless
someone opens the box or attempts to recover the tape data. These records
continue to live well past their onsite cousins, which means the organization
may still hold, on backup tapes, on paper, or in other formats, records that
it claimed to have destroyed. The search for records in offsite
storage is made more complicated if the offsite storage process did not create
detailed indices about the contents. If there are tapes labeled “2007 Backup
Y: Drive,” then it may become quite an arduous task to determine what
information is really contained in those tapes. Nevertheless, the journey must
be started. It could involve anything from a full-scale review of all tapes,
followed by reclassifying and re-filing the tapes, to perhaps a review of just
the offsite storage manifests. It could also involve a search for critical
information or a clean-up of the last three years’ worth of tapes, and so on.
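The "Document system information" and information-flow steps in the process above amount to keeping one structured record per system and following the links between systems. Below is a minimal Python sketch of that idea. It stores a handful of the listed attributes and walks the recorded dependencies to find every system that a given source feeds, directly or indirectly. The field names, the `SystemEntry`, `missing_fields`, and `downstream` names, and the sample systems are all hypothetical and chosen only for illustration. A real data map would carry many more columns and would more likely live in a spreadsheet or database than in code.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical subset of the system attributes a data map might record.
@dataclass
class SystemEntry:
    name: str
    owner: str
    platform: str
    primary_location: str
    secondary_locations: list[str] = field(default_factory=list)
    business_processes: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)  # systems this one receives data from


def missing_fields(entry: SystemEntry) -> list[str]:
    """Return the required text fields that are still blank, so gaps can be followed up."""
    required = ("owner", "platform", "primary_location")
    return [name for name in required if not getattr(entry, name).strip()]


def downstream(systems: dict[str, SystemEntry], source: str) -> set[str]:
    """Return every system that directly or indirectly receives data from `source`."""
    # Invert the "depends on" links into "feeds" links: upstream system -> its consumers.
    feeds: dict[str, list[str]] = {name: [] for name in systems}
    for entry in systems.values():
        for upstream in entry.depends_on:
            feeds.setdefault(upstream, []).append(entry.name)

    # Breadth-first search over the "feeds" links.
    reached: set[str] = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for consumer in feeds.get(current, []):
            if consumer not in reached:
                reached.add(consumer)
                queue.append(consumer)
    reached.discard(source)  # ignore a cycle that leads back to the source itself
    return reached


# Usage with made-up systems.
systems = {
    "CRM": SystemEntry("CRM", "Sales Ops", "SaaS", "Vendor cloud",
                       business_processes=["Order intake"]),
    "Billing": SystemEntry("Billing", "Finance", "Linux", "Data center A",
                           secondary_locations=["Offsite tape"], depends_on=["CRM"]),
    "Reporting": SystemEntry("Reporting", "", "Windows", "Data center A",
                             depends_on=["Billing"]),
}
print(sorted(downstream(systems, "CRM")))  # ['Billing', 'Reporting']
print(missing_fields(systems["Reporting"]))  # ['owner']
```

The sketch inverts the "depends on" links into "feeds" links. That lets the same records answer both dependency questions raised above: which systems a given system depends on, and which systems depend on it.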
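The email step in the process above asks whether employees are creating local archives (.PST files). One rough way to start answering that question is sketched below. It assumes the relevant file shares or drives can be mounted and read from a single machine, then walks the directory tree and lists every file with a .pst extension along with its size. The function name and the sample directory layout are hypothetical. Because the scan matches on file names only, it will miss archives that have been renamed or stored somewhere it cannot reach.

```python
import os
import tempfile
from pathlib import Path


def find_pst_files(root: str) -> list[tuple[Path, int]]:
    """List every file under `root` whose name ends in .pst, with its size in bytes."""
    hits = []
    # os.walk skips directories it cannot list unless an onerror callback is supplied.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pst"):  # matches .pst, .PST, .Pst, ...
                path = Path(dirpath) / name
                try:
                    hits.append((path, path.stat().st_size))
                except OSError:
                    continue  # file vanished or is unreadable; skip it
    return sorted(hits)


# Usage: a throwaway directory tree standing in for a mounted file share.
with tempfile.TemporaryDirectory() as share:
    user_dir = Path(share) / "users" / "jdoe"
    user_dir.mkdir(parents=True)
    (user_dir / "Archive2019.PST").write_bytes(b"\0" * 10)
    (user_dir / "notes.txt").write_text("not an archive")
    found = [(path.name, size) for path, size in find_pst_files(share)]

print(found)  # [('Archive2019.PST', 10)]
```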
Conclusion
In today’s highly litigious world, creating a
data map is one of the primary steps in responding to litigation requests. It is
vital that organizations lay a solid foundation by focusing time, energy, and
resources on doing it right and on creating the map long before it’s needed.
Ganesh
Vednere is a content and records management consultant
with expertise in implementing enterprise-wide content and records management
programs, including program strategy and setup, policies and procedure
development, records retention research, and technology implementation. He has
more than 15 years of relevant industry experience in various business and
technology verticals.
The Physics of the Data Map
The form and
format of data maps differ widely by industry type, organizational
size, geography, regulatory environment, business processes, and more. While
each organization's data map may look different,
there are several key elements
essential to any good data map:
- Looks Matter. How the data map
looks is key to its usability, relevance, and presentability. A good
data map will be organized either functionally or hierarchically with
various data points organized around key subject lines. Typically it
would consist of rows of data with columns of attributes for each data
set. The size of the map is entirely dependent upon the organization,
but at a minimum, each one should contain information about process,
systems, and people.
- A format that supports change.
Data maps are subject to frequent change and thus choosing a format that
allows updates to be made in a painless manner is critical. In the
initial stages significant volumes of data need to be entered, so start
with a format that supports quick data entry, such as Excel, and
subsequently migrate to a longer-term format that supports searching,
reporting, and quick retrieval, such as a database. Do not
overcomplicate either the form or the format. Bottom line: "Keep it
Simple."
- Emphasize the quality of
content. Data map designers tend to "over engineer" the
document and set themselves up for a process that involves gathering
numerous data values for each entry in the map. Instead, by focusing on
only those columns that truly add value to the document, the process of
collecting, collating, and organizing the information for it becomes more
manageable. For each column in the data map, collect as much accurate
information as possible. For the "location" column, for instance,
enumerate both the primary location and any secondary locations. A
system may store the last 10 years of data online (primary storage
location) with legacy data archived in a data archival system, tape, or
offsite location. All locations should be reflected on the data map.
- Access and Storage. The data map itself is
typically considered a "record" under records retention rules, and
therefore all of the requirements of good records management
apply. Unless explicitly prohibited, access to the data map can be
granted to various groups and roles within an organization. The
rationale is that the data map contains critical information that should
be broadly accessible rather than available only to a few individuals.
Most of these individuals, however, would get "read-only" access to
it. Accordingly, a view of the data map should be placed on a more
widely-accessible storage location while the data map itself can be
controlled via the appropriate database or file system controls.
- Maintaining the Data Map. Ensuring
that the data map stays accurate is vital to its relevance and long-term viability. A cross-functional team composed
of business, IT, and compliance, sponsored
by legal, should be set up to maintain it. A data map administrator who
performs the edits and controls access should also be appointed, and
an appropriate chain of custody should be established so that when the
data map administrator leaves the organization, the right handoffs take
place. Data map updates should generally be done on an annual basis, but
also in response to significant organizational events, as well as
compliance and regulatory changes, or revamping of IT systems and
processes. The update process should be a collaborative effort and not
just a "do we have to do this" exercise.
- Using the Data Map. One would think that once
created, the data map would be widely used and referenced by all
departments for various purposes. Surprisingly, this is not always the
case. Often the data map simply becomes a "checkbox" item that gets relegated to a
paralegal in the litigation group. Why isn't business, IT, or compliance
using the data map, after all the time and effort spent creating it? The
answer may lie in the perception that the document is only for
"ediscovery" and not useful for day-to-day operations. While that
may be partially true, the data map is in fact far more versatile and
useful. It can be used for everything from IT portfolio rationalization
to IT asset management and business process improvement. It is therefore
incumbent upon the data map team to publicize, communicate, and demonstrate how
it can be, and is, useful to various functions across the organization.
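The sidebar recommends three practices: start in a spreadsheet and later move to a database that supports searching and reporting, record every storage location, and review the map at least annually. Below is a minimal sketch of that migration. It assumes the spreadsheet has been saved as CSV and uses Python's built-in `csv` and `sqlite3` modules. The table layout, column names, sample rows, the `load_data_map` and `overdue_for_review` helpers, and the 365-day review threshold are illustrative assumptions, not a prescribed schema.

```python
import csv
import io
import sqlite3
from datetime import date, timedelta

# Stand-in for the initial spreadsheet saved as CSV; columns and rows are made up.
SPREADSHEET_EXPORT = """system,owner,primary_location,secondary_location,last_reviewed
CRM,Sales Ops,Vendor cloud,,2024-03-01
Billing,Finance,Data center A,Offsite tape,2022-11-15
"""


def load_data_map(csv_text: str) -> sqlite3.Connection:
    """Copy spreadsheet rows into a SQLite table that can be searched and reported on."""
    conn = sqlite3.connect(":memory:")  # use a file path for a persistent map
    conn.execute(
        """CREATE TABLE data_map (
               system TEXT PRIMARY KEY,
               owner TEXT,
               primary_location TEXT,
               secondary_location TEXT,
               last_reviewed TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO data_map VALUES "
        "(:system, :owner, :primary_location, :secondary_location, :last_reviewed)",
        csv.DictReader(io.StringIO(csv_text)),
    )
    conn.commit()
    return conn


def overdue_for_review(conn: sqlite3.Connection, today: date, max_age_days: int = 365) -> list[str]:
    """Return systems whose entry has not been reviewed within `max_age_days`."""
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    # Dates are stored as ISO 8601 strings, so text comparison matches date order.
    rows = conn.execute(
        "SELECT system FROM data_map WHERE last_reviewed < ? ORDER BY system", (cutoff,)
    )
    return [system for (system,) in rows]


conn = load_data_map(SPREADSHEET_EXPORT)
print(overdue_for_review(conn, today=date(2024, 6, 30)))  # ['Billing']
```

The spreadsheet remains the quick data-entry tool during the early stages. The database copy is what supports queries such as "which entries are overdue for their annual review."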
Ganesh Vednere is a recipient of AIIM's 2009
Distinguished Service Award.