After writing the last blog on The Ocean of Big Data and its seeming repetition of past Data Warehousing implementation issues and challenges, I decided to further “educate” myself. Who knows, maybe you actually can teach an old dog new tricks! So, I re-read John Mancini’s “OccupyIT” report and Geoffrey Moore’s “A Sea of Change in Enterprise IT,” and attended the AIIM Webinar Wednesday presented by John Mancini and Sid Probstein, CTO of Attivio. (Yes, I do have real work to do, but sometimes intellectual duty calls!) I think these items are must-reads for serious information professionals, whether you completely agree with their ideas or not. Food for the brain.
I agree that Big Data is a maturing and useful concept today. But what exactly is Big Data? Definitions and perspectives vary, making it difficult to scope actual solutions. The opportunity to “mine” new insights and business value from existing voluminous, seemingly unrelated, and initially uncoordinated data sources is obviously enormous. A frustration all information professionals have had for many years is that we need to distill knowledge of significance from raw data and document-based content collections, while our ability to do so is constrained by historically limited toolsets. SQL queries do not work well on “unstructured” electronic objects, and “structured” data is often useless until it is formatted into “unstructured” reports. Huh? Yes, that’s right, even the output of structured data (which typically means from databases) is often rendered into evidentiary records as “unstructured” electronic objects such as Acrobat PDF or bit-mapped image files. How many people want to review a database data dump with their coffee in the morning? I’ll take a paper newspaper or an LCD-based PDF file, thank you.
This may be partially why in John Mancini’s webinar an astounding observation was offered that “Unstructured information is growing 15x faster than structured information.” Since IT’s traditional role has been to develop and maintain database applications and the networking infrastructure, their inability to grasp and lead a movement toward unstructured electronic objects as the high value information content of the future will further erode IT’s dominance of technology innovation and implementation in organizations. Maybe that is why they could get “occupied” by information technology anarchists that demand an ability to use their own devices and participate in collaborative discussion environments that mimic those environments to which they are already accustomed from personal use of technologies.
The tsunami of unstructured electronic objects coming from Moore’s rapidly multiplying Systems of Engagement is becoming a major factor in the lives of Internet connected individuals across the planet. Facebook, LinkedIn, Twitter, YouTube, and many business sponsored social media sites are successfully competing with plain old e-mail as the content creators and electronic communicators of choice. Those stodgy old Systems of Records that encapsulate quantifiable information of value generated from work process based IT infrastructure are now being seen as not sufficiently collaborative. Socially compulsive groupies and corporate project teams can both see and be seen more effectively when “engaged” on Social Media Web sites rather than when sending ideas point to point to point in an increasingly redundant circle of individual thoughts. Social media simply uses technology to leverage the well-documented fact that teamwork can often be more intellectually productive than individuals toiling away alone. And the output of Social Media sites of Engagement is usually in less structured electronic object formats rather than the structured matrix style data formats of databases.
So, Big Data is going to have to deal with accessing multiple IT systems, multiple data formats, varied user needs, and often metadata constructs unrelated by their original design. But will it build on new information requirements? How about the information requirements we currently have for existing Systems of Records that still are being addressed, and often not effectively? Will Big Data integrate these requirements allowing cross-application insights by providing new information correlation tools? Since the metadata used to build current applications is not going to change overnight, one would expect that a major innovation that might come from Big Data management tools would be new ways to use content analytics or system “crawling” to reveal new relationships between data existing in different information silos. There are already some conceptual tools and systems tools being used to attempt these goals.
However, the problem is defining the problem. What exactly are we to use Big Data to look for? More specifically, who will be using Big Data for what information searching purposes? Do we start out by creating a Big Data Map of all of the Systems of Record? Now that could take a while! Will there be a Big Data As-Is Model before we define new “views” into a Big Data To-Be Model? Since getting concurrence on how to massage data from different silos into a new application has usually been an organizational nightmare, I would expect that the new “model” will be new “views” that become available based on using more sophisticated data management and searching tools rather than any attempts to integrate silos into one Big Data “landfill”. However, any way you look at the utility of Big Data, it will always be a victim of the GIGO - Garbage In/Garbage Out - dilemma. Can you find a needle in a haystack if no one put a needle in the haystack in the first place? Or maybe you are really looking for a nail.
So, I am just a bit concerned about the ROI that Big Data systems, concepts, and tools will bring us. As a former IT systems designer, I am wondering what quantifiable information requirements we are looking to satisfy. As a former librarian, I am wondering how the Big Data collection will be indexed, or how metadata and taxonomies will be applied, so that actually meaningful information can be retrieved. And as a former records manager, I am wondering how the data that is dumped into a Big Data digital landfill will somehow inherit the credibility or authenticity of the data from the original system in which it resided. Will there be a Big Data Chain of Custody?
But, of course, as a consultant, I am very much looking forward to the hours, days, and weeks of consulting that will be generated by Big Data, and I want to get “certified” in Big Data ASAP.
This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International