The Quest for eDiscovery: Creating a Data Map

Discovery of electronic documents can be one of the most expensive aspects of a lawsuit in today’s information-glutted society if left undone until the 11th hour – to say nothing of the role it plays in obtaining a favorable judgment.

 

A key aspect of ediscovery is the creation of a data map to determine precisely what information is available within an organization and where it resides. This is a process that should begin long before a company ever finds itself in court.

The phone rings. It is the general counsel. The organization may be sued over patent infringement. Counsel knows that this could be “The Big One.” All sorts of data, documents, metadata, emails, and other forms of information may be required. Counsel asks IT: “Do you have, or can you get together, a list of all systems and the data they contain?” There is a long, silent pause on the phone. Then the IT manager says, “Well, we do have a list of systems. Let me send it your way.” Counsel gets the list. It is nothing close to the data map counsel needs. Instead it is a list of servers, their IP addresses, platform configuration, and their physical rack location in the data center. Good information for disaster recovery purposes, but not particularly helpful in court.

“Well, this is the best I’ve got,” comes the retort from IT. “We do not have a data map nor would we know how to create one – and, by the way, do you really think we have the bandwidth to work on this now?”

Why You? The Challenge of Data Mapping 
So who gets stuck with the job? You do. You might argue that “IT manages all the infrastructure and stuff, why couldn’t they just run an inventory on their systems?” And IT will reply that “Well, we do manage the infrastructure, but we know very little about the inputs, outputs, documents, records, and other information on these applications. Go talk to the business side.” And you go to business, and business will tell you that “I just use the system and click these buttons on the screen. The system is a black box to me. I have no idea about all of the underlying data, metadata, and data structures. I suggest you talk to the operational folks.” And you talk to the operational folks, and they say, “What are you talking about? We just execute business processes. Don’t ask us about data and metadata. Go talk to the analyst who worked on the system design.” And you look for the analyst, and you eventually learn that …“Oh, she was a consultant and she left the project three years ago.”

The challenges are many but the data map must be created. And the job is yours. So where do you start? By going back to the beginning … the very beginning.

How Did We Get in Such a Mess?
Let’s take a look at a typical mid-size organization. It has several thousand employees and contractors with offices in the U.S. and E.U. The sheer volume of information that resides in just one division is mind-boggling. New information sources keep popping up, employees keep creating new SharePoint sites on their own, and there is use (or misuse) of social collaboration tools, to say nothing of several hundred IT systems in play at any point in time. Data is moved and migrated from one place to another without proper documentation or communication, more and more tape backups are being created, and some employees are making copies of data on thumb drives or worse, emailing them to their personal email addresses.

How did things ever become such a mess? There are manifold reasons. IT is traditionally kept at arm’s length on compliance and uninvolved with information management and governance during systems design and development. Records management departments, on the other hand, often institute sound policies and retention schedules but have a tough time putting these into practice and getting people to adhere to them. On the legal side, general counsels often work against themselves: becoming increasingly exasperated over the large amount of money spent on searching, processing, and producing electronically stored information (ESI), they often push hard to cut costs, thereby short-circuiting the process.

Must an Organization Have a Data Map? 
It may seem surprising that even today, many successful organizations do not have a data map or have, at best, a superficial one. It is not that organizations are lacking in will, however, but that the process seems too daunting. Consider the “typical organization” above. If there are several hundred IT systems and other home-grown business applications, one must know not only what these systems are and where they are located, but also the types of information (documents, records, other content) that are produced from these systems, along with additional information such as data format, data location, and whether the data is updated by other systems or transformed into other formats.

To add to the complexity, a determination also needs to be made as to whether a piece of ESI can be extracted and presented using reasonable and customary means. For example, if an IT system was retired and the data backed up on tape, it is reasonable to assume that extracting the tape, processing the information, and presenting it in a readable format may not be easy, since the underlying version of the software no longer exists. Counsel, however, must be able to assess whether this is indeed the case. Having a data map eases some of these tasks and makes it easier for counsel to relate the information as needed.

Data Mapping Considerations
In the current economic environment, companies are bracing themselves for an uptick in the number of lawsuits. Whether the matter is related to regulators, customers, consumers, employees, or business partners, companies are often required to provide ESI in court. While being able to do so should be a sine qua non for most organizations, many are simply too overwhelmed to react fast enough and are thus placing themselves at much greater risk. If that’s the case in your enterprise, here are some initial steps that will help you move forward.

  1. Understand the prevailing legal environment. Organizations are not created equal, and not all have the same set of applicable legal requirements. It is therefore important to analyze the type of environment that the organization operates in, the jurisdiction it is under, and the various federal and state laws, regulations, and common industry standards that apply to it with regard to ESI. While the contents of a data map do not by themselves directly correlate to a particular law or regulation, it is useful to know what checks and controls need to be established during the data-mapping process and to ensure that there are no “show-stopper” questions in court around how the data map was created or what the process was.
  2. Use a partnership model and obtain buy-in from senior management. It is important that each entity within an organization have a vested stake in the success of any data-mapping project. This means that management in each of these organizational fiefdoms must understand what a data map is, how it will be used, and what the process of creating one is. Getting buy-in from these senior managers is a crucial first step and must be completed prior to the start of the process. Additionally, it is important that people of the appropriate rank are selected to work on the project. Folks who are deep in the weeds will generally have a lot more information about data flows and how processes and people work together versus the senior executive who operates in more of a decision-making capacity.
  3. Take a phased approach. There is little point in pursuing a “big-bang” approach for the data map. Prioritize which divisions or lines of business to focus on first and address the remaining ones later. Work with line managers to determine what, if any, information has been collected on systems and processes within their particular areas. Standard industry lists may be employed as a starting point, e.g. HR, Accounting, Communications and Marketing. Begin the first phase of the process here and then iteratively build upon what’s already available.
  4. Use the right technology. As more capital is allocated towards automating ediscovery, vendors will naturally gravitate towards building specialized software for this mission. Time, cost, and relevancy of results will drive the success of vendor products. While some organizations have attempted to build custom tools, more and more prefer choosing established products or service offerings to guide them through the ediscovery and data-mapping process. Already many vendors have begun mapping their offerings to the electronic discovery reference model (EDRM) and other industry standards. This market is still maturing, and organizations should not go out and immediately purchase a top-rated vendor’s software without due consideration of the organization’s unique circumstances.

Creating the Data Map
Once you’ve worked your way through each of the considerations above and taken action as needed, you’re ready to start the actual data-mapping process. It is lengthy but well-defined and can be broken down into the following steps:

The Data Mapping Process

  1. Get a list of all systems – and be prepared for a few surprises. Begin the process by creating a list of all systems that exist in the company. This is easier said than done, as in many cases IT does not even have a full list of all systems. Sure, they usually have a list of systems, but don’t take that as the final list! Due diligence involves talking to business process owners, employees, and contractors, which often brings to light hidden systems, utilities, and home-grown applications that were unknown to IT. Ensure that all types of systems are covered, e.g. physical servers, virtual servers, networks, externally hosted systems, backups (including tapes), archival systems, and desktops. Pay special attention to email, instant messaging, core business systems, collaboration software, and file shares.
  2. Document system information. After the list of all systems is known, gather as much information about each as possible. This exercise can be performed with the help of system infrastructure teams, application support teams, development teams, and business teams. Here are some types of information that can be gathered: system name, description, owner, platform type, and location; whether it is a home-grown package, and whether it stores both structured and unstructured data; system dependencies (i.e., what systems are dependent on it and what systems it depends on); business processes supported, business criticality of the system, security and access controls, format of data stored, format of data produced, reporting capabilities, and how and where the system is hosted; backup process and schedule, archival process and schedule, and whether data is purged (and if so, how often and what data gets purged); how many users there are, whether external access (from outside the company firewall) is allowed, whether retention policies are applied, what the audit-trail capabilities are, and the nature of the data stored, e.g. confidential data or nonpublic personal information.
  3. Get a list of business processes. Inventory the list of business processes and map it to the system list obtained in the step above to ensure that all the various types of ESI are documented. The list of business processes is also useful during the discovery process, when one can leverage the list to home in on a particular type of ESI and obtain information about how it was generated, who owned the data, how the data was processed, how it was stored, and so on. A list of business processes can also be useful when assessing information flows.
  4. Develop a list of roles, groups, and users (custodians). Obtain the organizational chart and determine the roles and groups across the business and the business processes. Document the process custodians and map out who had privileges to do what. Understand the human actors in the information lifecycle flow.
  5. Document the information flow across the entire organization. Determine where critical pieces of information got initiated, how the information was/is manipulated, what systems touch the information, who processes the information, what systems depend on the information, and so on. Understanding the flow of information is key to the data mapping/discovery process.
  6. Determine how email is stored, processed, and consumed. Given the large percentage of business information and business records that reside in email, special attention needs to be paid to email ESI. Typically email is the first thing that opposing counsel go after, so determining whether email retention and disposition policies are consistently enforced will be key to proving good faith. There are a number of automated tools that will enable you to create email maps, link threads of conversation, heuristically perform relevancy searches, extract underlying metadata, and so on. Before deciding to buy the best-of-breed solution, however, perform due diligence on existing email processes. Understand how employees are using email. Are they creating local archives (.PST files)? Are they storing emails on a network or in a repository? Are they disposing of them at the end of retention periods? Are they using personal email accounts to conduct official business? Identify deficiencies and violations in email policies before the opposing counsel does.
  7. Identify use of collaboration tools. SharePoint will have the lion’s share of the collaboration space in many organizations, but even then you must ensure that all other tools – whether they are social networking tools, Web-based tools, or home-grown tools – are included in the data-mapping process. You need to carefully document the types of information being stored on each of these tools. Sometimes company information has a nasty habit of being found in the most unlikely of places. Wherever possible work with compliance, information management, or records management groups to establish usage policies to prevent runaway viral growth of these tools. If the organization already has thousands of unmanaged SharePoint sites, work with IT and business to institute governance controls to prevent further runaway growth.
  8. Don’t forget offsite storage. After inventorying and mapping all systems, one would think the job is done. Alas, there is more work ahead. Offsite storage is an often under-appreciated aspect of the discovery process. It is quite reasonable to assume that there might be substantial evidence stored offsite which might become incriminating at a later date. Offsite storage may contain boxes or tapes full of records whose existence was somehow never properly documented, with the result that they cannot be located unless someone opens the box or attempts to recover the tape data. These records continue to live well past their onsite cousins. This means the organization continues to have the record in backup tapes (or paper) and other formats that it purportedly claimed to have destroyed. The search for records in offsite storage is made more complicated if the offsite storage process did not create detailed indices about the contents. If there are tapes labeled “2007 Backup Y: Drive,” then it may become quite an arduous task to determine what information is really contained in those tapes. Nevertheless the journey must be started. It could involve anything from a full-scale review of all tapes, followed by reclassifying and re-filing the tapes, to perhaps a review of just the offsite storage manifests. It could also involve a search for critical information or a clean-up of the last three years’ worth of tapes, and so on.
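As a rough illustration of steps 1 through 4, the sketch below models one data map row per system and cross-checks the business process list against the system inventory. The field names, the `SystemEntry` class, and both helper functions are hypothetical choices made for this example, not a prescribed schema; a real data map would carry many more of the attributes listed in step 2.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEntry:
    """One row of the data map. All field names here are illustrative assumptions."""
    name: str
    owner: str
    platform: str
    data_formats: list[str] = field(default_factory=list)
    # Primary location first, then archives, tapes, offsite storage.
    storage_locations: list[str] = field(default_factory=list)
    custodians: list[str] = field(default_factory=list)
    retention_policy_applied: bool = False


def unmapped_processes(process_to_systems, inventory):
    """Find processes that rely on systems missing from the inventory.

    These are the "surprises" from step 1: systems the business uses
    that never made it onto the list of systems.
    """
    known = {entry.name for entry in inventory}
    gaps = {}
    for process, systems in process_to_systems.items():
        missing = sorted(s for s in systems if s not in known)
        if missing:
            gaps[process] = missing
    return gaps


def systems_without_process(process_to_systems, inventory):
    """Find inventoried systems that no documented business process uses."""
    referenced = {s for systems in process_to_systems.values() for s in systems}
    return sorted(e.name for e in inventory if e.name not in referenced)
```

Calling `unmapped_processes` with a dictionary that maps each process name to the systems it touches returns the processes whose systems still need to be inventoried, while `systems_without_process` flags systems whose business purpose has yet to be documented. Either result is a prompt for another round of interviews rather than a finished answer.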

Conclusion
In today’s highly litigious world, creating a data map is one of the primary steps in responding to litigation requests. It is vital that organizations build a solid foundation by focusing time, energy, and resources on doing it right – and creating it long before it’s needed.

Ganesh Vednere is a content and records management consultant with expertise in implementing enterprise-wide content and records management programs, including program strategy and setup, policies and procedure development, records retention research, and technology implementation. He has more than 15 years of relevant industry experience in various business and technology verticals.

The Physics of the Data Map:

The form and format of data maps differ widely by industry type, organizational size, geography, regulatory environment, business processes, and more. While each organization's data map may look different, there are several key elements essential to any good data map:

  • Looks Matter. How the data map looks is key to its usability, relevance, and presentability. A good data map will be organized either functionally or hierarchically with various data points organized around key subject lines. Typically it would consist of rows of data with columns of attributes for each data set. The size of the map is entirely dependent upon the organization, but at a minimum, each one should contain information about process, systems, and people.
  • A format that supports change. Data maps are subject to frequent change and thus choosing a format that allows updates to be made in a painless manner is critical. In the initial stages significant volumes of data need to be entered, so start with a format that supports quick data entry, such as Excel, and subsequently migrate to a longer-term format that supports searching, reporting, and quick retrieval, such as a database. Do not overcomplicate either the form or the format. Bottom line: "Keep it Simple."
  • Emphasize the quality of content. Data map designers tend to "over engineer" the document and set themselves up for a process that involves gathering numerous data values for each entry in the map. Instead, by homing in on only those columns that truly add value to the document, the process of collecting, collating, and organizing the information for it becomes more manageable. For each column in the data map, collect as much accurate information as possible. For the "location" column, for instance, enumerate both primary and secondary locations, if there are any. A system may store the last 10 years of data online (primary storage location) with legacy data archived in a data archival system, on tape, or at an offsite location. All locations should be reflected on the data map.
  • Access and Storage. A data map is typically considered a "record" under record retention rules, and therefore all of the requirements of good records management apply. Unless explicitly prohibited, access to the data map can be granted to various groups and roles within an organization. The rationale is that the data map contains critical information that should be accessible broadly rather than available only to some individuals. Most of these individuals, however, would get "read-only" access to it. Accordingly, a view of the data map should be placed on a more widely accessible storage location while the data map itself can be controlled via the appropriate database or file system controls.
  • Maintaining the Data Map. Ensuring that the data map stays accurate is vital to its relevance and long-term viability. A cross-functional team composed of business, IT, and compliance and sponsored by legal should be set up to maintain it. A data map administrator who performs the edits and controls access should also be appointed, and an appropriate chain of custody should be established so that when the data map administrator leaves the organization, the right handoffs take place. Data map updates should generally be done on an annual basis, but also in response to significant organizational events, compliance and regulatory changes, or the revamping of IT systems and processes. The update process should be a collaborative effort and not just a "do we have to do this" exercise.
  • Using the Data Map. One would think that once created, the data map would be widely used and referenced by all departments for various purposes. Surprisingly, this is not always the case. The data map simply becomes a "checkbox" that gets relegated to a paralegal in the litigation group. Why aren't business, IT, or compliance using the data map, after all the time and effort spent creating it? The answer may lie in the perception that the document is only for "ediscovery" and not useful for day-to-day operations. While that may be partially true, the data map is a lot more versatile than that. It can be used for everything from IT portfolio rationalization to IT asset management and business process improvement. It is therefore incumbent upon the data map team to publicize, communicate, and demonstrate how it can be useful to various functions within the organization.
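To make the "start in Excel, then migrate to a database" advice concrete, here is a minimal sketch that assumes the spreadsheet has been exported as CSV and loads it into SQLite using only the Python standard library. The four column names (`system`, `process`, `owner`, `location`), the table name, and both function names are assumptions for illustration; a system with several storage locations is represented as one row per location.

```python
import csv
import io
import sqlite3


def load_data_map(csv_text: str) -> sqlite3.Connection:
    """Load a CSV export of the data map into an in-memory SQLite database.

    The CSV is assumed to have the header: system,process,owner,location
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_map (system TEXT, process TEXT, owner TEXT, location TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each CSV row's column values.
    conn.executemany(
        "INSERT INTO data_map VALUES (:system, :process, :owner, :location)",
        rows,
    )
    conn.commit()
    return conn


def systems_stored_at(conn: sqlite3.Connection, location: str) -> list[str]:
    """Return the distinct systems that keep data at the given location."""
    cur = conn.execute(
        "SELECT DISTINCT system FROM data_map WHERE location = ? ORDER BY system",
        (location,),
    )
    return [row[0] for row in cur]
```

Once the rows are in a database, questions such as "which systems have data on offsite tape?" become one-line queries, and a read-only view can be exposed to the wider organization while edits stay with the data map administrator.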

Ganesh Vednere is a recipient of AIIM's 2009 Distinguished Service Award.