Organizations Will Need to Tackle Three Challenges to Curb Unstructured Data Glut and Neglect
Increased data growth propelled by the Nexus of Forces has created an unstructured data nightmare. To manage data growth and security effectively, information managers will need to deploy the right tools and educate employees organizationwide on how to overcome instinctive data hoarding.
Impacts
Information managers are unsure of what tools are available to assess scope and risk of the enterprise data footprint, stifling most early-stage information management programs.
Lack of full understanding of regulatory requirements and hope for analytics prompt organizations to retain all information, creating challenges for information managers.
Everyone is a "data hoarder" by nature, and new storage options are propelling corporate data glut and hindering data deletion efforts by information managers.
Recommendations
Purchase a file analysis product to get a picture of the data demographics, emphasizing redundant, outdated and trivial data along with sensitive and personally identifiable information.
Engage the CIO and CDO in creating retention policies; these individuals should then, in turn, engage their peers in legal, risk, compliance, security, business intelligence and lines of business across all regions.
Use the data gathered through file analysis, including potential cost savings, to help each business unit dispose of unstructured data or mitigate its risk.
Strategic Planning Assumption
Through 2020, less than 10% of organizations will find value in "dark data."
Analysis
Most of us are guilty of "data hoarding." Without so much as a thought, we save every digital photo, email, document, presentation and spreadsheet, losing track of what we have saved along the way. Across the enterprise, employees are blindly building a bottomless lake of "dark data," and, in many cases, a corporate mantra of "save everything, just in case" is encouraging the behavior.
Additionally, the Nexus of Forces has opened up myriad ways to create, store and access user-controlled data. No one really knows the true scope of enterprise data glut. This is because the IT organization has, until recently, been in the dark. It has been restricted to storage resource management (SRM) and search tools, which are often not deployed and provide little to no functionality for determining if data – in particular, unstructured data – has any real business value or if it is sheer waste.
So what do we do? And, perhaps more importantly: given dropping storage costs, does uncontrolled data growth even matter?
Uncontrolled data growth does matter. Client inquiries suggest that, for many organizations, around 30% of data is redundant, outdated or trivial (ROT). Inquiries also suggest that around 50% of data has an indeterminate value, while the remaining data is mission-critical. Assuming a midsize storage environment, with between 1PB and 4PB of raw capacity, and a storage total cost of ownership (TCO) of $2,325 per TB raw or $3,092 per TB usable (assuming 75% of raw capacity being usable), this equates to $927,600 to $3,710,400 in wasted spending on ROT.1 Moreover, if the 50% of data with indeterminate value also proves to be waste, it adds a further $1,546,000 to $6,184,000 in unnecessary storage costs.
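The arithmetic behind these figures can be reproduced with a short sketch. It assumes decimal units (1PB = 1,000TB) and applies the quoted per-usable-TB TCO to the stated capacity, which is the combination that matches the published totals. The function and constant names are illustrative only and are not part of the original research.

```python
# Figures quoted in the text; decimal units (1 PB = 1,000 TB) are an
# assumption that happens to reproduce the published totals.
TCO_PER_TB_USABLE = 3_092  # USD per usable TB, as quoted
TB_PER_PB = 1_000          # assumed decimal conversion


def wasted_spend(capacity_pb: int, waste_share_pct: int) -> int:
    """Storage spend (USD) attributable to a given percentage of capacity."""
    capacity_tb = capacity_pb * TB_PER_PB
    # Integer arithmetic keeps the dollar totals exact.
    return capacity_tb * waste_share_pct * TCO_PER_TB_USABLE // 100


for pb in (1, 4):
    rot = wasted_spend(pb, 30)            # redundant, outdated or trivial data
    indeterminate = wasted_spend(pb, 50)  # data of indeterminate value
    print(f"{pb}PB: ROT ${rot:,}; indeterminate ${indeterminate:,}")
```

Substituting an organization's own capacity, waste share and TCO gives a first-order estimate that can be refined once file analysis results are available.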
Making matters worse, storage teams typically throw more and more storage at the ballooning data problem. In fact, a recent Gartner survey found that 51% of survey respondents felt that, when it came to the general management of storage, "not managed besides purchasing more…" best described their strategy (see the Appendix).
The good news: storage hardware costs continue to drop. However, hardware only accounts for 48% of the storage TCO.1 And, even if the TCO could drop to zero, another problem would remain: keeping everything can not only lead to extremely costly and damaging noncompliance issues, but also create a bigger pool of sensitive and personally identifiable information (PII) that is vulnerable to improper access.
What do we do about all of this data? The answer has two parts. First, information managers must implement a data policy and management program, if they haven't already. Second, they must recognize that, for such a program to tackle data growth and deletion, it must include the deployment of the necessary tools to identify the problem, while also dealing with the people-related challenges of hoarding and misguided thinking. In covering these topics, this research focuses primarily on end-user-controlled data, such as files, images and objects.
Figure 1. Impacts and Top Recommendations for Managing Data Growth
Source: Gartner (June 2015)
Impacts and Recommendations
Information managers are unsure of what tools are available to assess scope and risk of the enterprise data footprint, stifling most early-stage information management programs
We can't deal with the data problem until we can see it and understand it. With as much as 80% of enterprise data now unstructured (according to Gartner estimates), file analysis products offer a way forward. Information managers can now scour unstructured (and, in some cases, structured) content and perform standard and customized metadata analysis. Using such a tool, information managers can, for instance, quickly identify duplicate files or files that belong to employees who are no longer with the company. In addition, many file analysis products have "content awareness" for identifying PII, payment card industry (PCI) data and personal health information (PHI).
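To make the duplicate-file example concrete, the following is a minimal sketch of one common approach: grouping files by a hash of their content. It does not represent how any particular file analysis product works. The function name `find_duplicates` is hypothetical, and the sketch reads each file fully into memory, which would need to change for large files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: str) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Identical content yields an identical SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of candidates for review, not an instruction to delete. Retention requirements and legal holds still apply to every copy.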
After analyzing files, information managers can then use the resulting reports to inform their strategic initiatives for getting end users to delete data (for example, by making a number-driven decision on a reasonable inbox size) or for building a business case to present to the CIO and other executives. Aside from disposing of unwanted data, these tools can also assist in identifying data of value, such as data that should be tagged as records or data that can be filtered into analytic programs.
Recommendations:
Shed light on the unstructured data problem with the help of a file analysis product. One issue we consistently encounter is that the subject of data glut is too abstract for most people to understand. Information managers should consider a file analysis product to get a picture of the enterprise unstructured data footprint, using the tool to home in on ROT and on sensitive data and PII as a starting point.
Begin to proactively classify data. Looking backward through a "file analysis" lens can lead organizations to unstructured data management best practices, and can open the way to data classification policies that tag data, with human oversight, at the point of creation based on the organization's business needs.
Ensure buy-in and enforcement by the chief data officer (CDO), data scientists, information officers and others who have an interest in analyzing corporate data. Nothing happens until someone gets excited. Information managers will need to gain C-level sponsorship if they want to educate the organization and the individual. With sponsorship gained, they will then need to create a cross-functional working team that is able to make decisions regarding corporate data value, classification, tagging, migration, analysis and disposal.
Lack of full understanding of regulatory requirements and hope for analytics prompt organizations to retain all information, creating challenges for information managers
Organizations must adhere to regulations. All too often, incomplete awareness of these regulatory requirements leads to a policy of "keep everything just in case." Ironically, this kind of behavior is often itself a violation of regulations.
Keeping everything also presents a larger-than-necessary target for hackers. In the December 2014 Sony hack, for example, hackers accessed thousands of emails, including deleted items that never actually "went away." Sony noted that, posthack, it was changing its email retention policy from six years for emails with financial information to two years for all email, unless that email is on legal hold.2 Organizations need to understand and balance what has to be kept (for example, Barclays was fined $3.75 million in 2013 after failing to keep critical records3) against data that exposes the organization to risk while providing no value and carrying no retention requirement.
Undisciplined regulatory adherence presents storage managers with an uphill battle, requiring them to educate and persuade the ranks. In addition, the hype around, and hope for, big data analytics is only compounding the problem. Many leaders, including those in IT, see "big data analysis" for lakes of unstructured data as the technology equivalent of dumpster diving, wherein they mine trash data for gold. With this mindset, all data begins to look as if it could be useful. It is not.
Organizations need to walk a fine line between what value they want to generate from their datasets and what is actually possible. Tools, such as those for file analysis, can present a map of unstructured data. In addition, the work already done by master data management (MDM), business intelligence (BI) and data scientists can provide a decent representation of what is included in the structured dataset. By analyzing a combination of all the data and weighing corporate goals, regulatory requirements and viable usage of the data, information managers can help set realistic policies for data classification, storage, analysis and disposal.
Gartner inquiries suggest that less than 10% of organizations are even beginning to analyze dark data. Even within this small group, the value of that data is still to be determined. This is not to say that analytics is not a good idea; however, it's a matter of "garbage-in, garbage-out," where analytics