Basement Needs Cleaning; Hire a Maid (Using Content Analytics)

There has been quite a marketing buzz from the supplier community on the promise of content analytics. For organizations that keep everything forever (effectively filling their “basements” with stuff), these new “electronic maids” come equipped with brooms and vacuums that let them at least begin to address the disposition of electronic content.

“Content analytics” is the term used to refer to a suite of technical capabilities that are designed to automatically crawl file systems (network drives, SharePoint, email systems) and interrogate various attributes such as date authored, file type, and even keywords within the content (such as “confidential,” for example). The output of the analytical assessment is a set of reports and dashboards an organization can use for a number of purposes: for example, to develop a classification approach, plan for content migration, or support early case assessment.
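
To make the crawl-and-interrogate idea concrete, here is a minimal sketch in Python, not any supplier's actual implementation. It assumes an ordinary mounted file system; the function names `inventory` and `summarize_by_type` are hypothetical, and the last-modified time stands in for "date authored," which the file system does not record directly.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def inventory(root):
    """Walk `root` and yield one attribute record per readable file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            yield {
                "path": path,
                # Lower-cased extension as a rough proxy for file type.
                "extension": os.path.splitext(name)[1].lower() or "(none)",
                "size_bytes": st.st_size,
                # Assumption: modification time approximates "date authored".
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            }


def summarize_by_type(records):
    """Return {extension: (file_count, total_bytes)} for a simple report."""
    counts, sizes = Counter(), Counter()
    for rec in records:
        counts[rec["extension"]] += 1
        sizes[rec["extension"]] += rec["size_bytes"]
    return {ext: (counts[ext], sizes[ext]) for ext in counts}
```

Calling `summarize_by_type(inventory("/some/share"))` (the path is a placeholder) gives the kind of per-file-type tally that could feed a report or dashboard.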

But does it work? The answer is yes, but keep your expectations in check. Sure, the tools can provide a list of all the files that are older than five years and haven’t been accessed for the last three. You can also sweep through a file server to find documents containing the word “confidential” that perhaps should not be sitting on a system accessible by all employees but rather within a managed repository like IBM’s P8 or EMC’s Documentum. But the clients I’ve worked with, many of whom have tried various suppliers’ tools, have very real concerns about performance, which depends on the volume of content and the complexity of the analysis. More important, do performance testing against a specific set of requirements: know exactly what you are trying to accomplish rather than just “fishing.”
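
The two sweeps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: `find_stale_files` and `contains_keyword` are hypothetical names, file age is approximated by last-modified time, and last-access times are only as reliable as the file system (many systems update them lazily or not at all). The keyword check reads plain text only; Office documents or PDFs would need a text-extraction step first.

```python
import os
import time

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def find_stale_files(root, min_age_years=5, min_idle_years=3, now=None):
    """Return paths modified more than `min_age_years` ago and not
    accessed within the last `min_idle_years`."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or removed mid-scan
            age = now - st.st_mtime   # assumption: mtime approximates file age
            idle = now - st.st_atime  # assumption: atime is being maintained
            if (age > min_age_years * SECONDS_PER_YEAR
                    and idle > min_idle_years * SECONDS_PER_YEAR):
                stale.append(path)
    return stale


def contains_keyword(path, keyword="confidential"):
    """Case-insensitive keyword search of a plain-text file."""
    needle = keyword.lower()
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as fh:
            return any(needle in line.lower() for line in fh)
    except OSError:
        return False
```

Even a toy like this hints at the performance concern: every file has to be opened and read for the keyword check, so volume and complexity add up quickly.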

What productivity improvements can you expect? The exact figure is hard to determine, because few organizations have ever invested the time to conduct credible benchmarks. The best indicators come from the e-discovery service providers, which have been using various capabilities for several years to cull through content placed on legal hold before conducting review. In these circumstances, the automated analytics tools deliver productivity gains of 100 to 500 times over manual efforts. That’s right: 100 to 500x, particularly for simple tasks such as determining author or date (basic interrogation of metadata).

Currently, use of these tools has predominantly been limited to e-discovery: reactive exercises in response to a discovery request. Thus, the opportunity today is to begin leveraging these capabilities proactively and on a regular basis. Imagine working, one by one, through an organization’s network drives or SharePoint sites and determining “what’s out there.” At a minimum, a sampling of 10 to 15 network drives might provide just enough insight to issue a wake-up call for senior management: “Let’s start cleaning this stuff up now, because it’s growing by 35% a year”; “Did you know that 18% of the files on our Z drive are exact duplicates?”; “Fully 67% of the files on server RSO227 haven’t been touched for 3 years or longer.”
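
A statistic like the exact-duplicate percentage can be approximated with content hashing. The sketch below is one simple way to do it, assuming a drive small enough to read in full; `file_digest` and `duplicate_report` are hypothetical names, and the percentage counts every copy beyond the first in each group, which is only one of several reasonable definitions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(root):
    """Return (groups of identical files, percent of files that are
    redundant copies)."""
    groups = defaultdict(list)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                digest = file_digest(path)
            except OSError:
                continue  # unreadable files are left out of the totals
            groups[digest].append(path)
            total += 1
    dup_groups = [sorted(paths) for paths in groups.values() if len(paths) > 1]
    # Assumption: one copy in each group is "the original"; the rest are redundant.
    redundant = sum(len(g) - 1 for g in dup_groups)
    percent = 100.0 * redundant / total if total else 0.0
    return dup_groups, percent
```

The groups it returns are exactly the lists a person would then need to review, which is where the policy questions begin.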

Make no mistake – human intervention is still required for decision-making, and that takes time, and more than likely a re-evaluation of policies and procedures. What do we do with the duplicate material: keep only the most recent? What about those files that are 3 years old: can we move them to tape for another year and then delete them?

Net, net, I’m bullish. In fact, I’m so excited about this segment of our industry that many of our recommendations to clients include specifically designed plans directing them to begin using these tools. As expected, few organizations are systematic about cleaning up their “basements” (the same is true for many households, thus the need for maid services), and content analytics is clearly a capability worth investigating.

Comments

John Glover

It is wonderful that this problem of Too Much Stuff is being addressed.
While I like the concept of Lots Of Copies Keep Stuff Safe (LOCKSS) for preservation where and when it's needed, whatever happened to Hierarchical Storage Management (HSM) for our business records?
I thought the whole idea of Records Management was to have a retention schedule, get the inactive data off the expensive media and throw out the trash.

Twenty-five years ago, when we microfilmed documents, the paper was destroyed.
Now we save everything.

We store our images and documents on cloud repositories, write the images to microfilm (we'll need them 500 years from now) and then still keep the same stuff online as well as store the original paper.
Deduping has become an ever-growing IT expense, and migrating has become another form of copying.

My local community just had a 'Spring Cleaning' weekend.
We recycle as much as we can and throw out the trash.
Our Information Technology Industry should do the same.
Let's have an AIIM Community cleanup week and dump the trash.

This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International