Data Protection Series: To De-Duplicate or Not? (Part 2)

By: Jon Toigo | In: Data Privacy & Protection | On: Jan 25, 2016

In my last blog on de-duplication, I wrote about the complications that data de-duplication can bring to legal proceedings. I’d like to follow up with how, despite this fact, de-duplication can be the right fit for IT teams, if it’s part of a comprehensive data management strategy.

De-duplication and compression are important technologies, at least in terms of the problem they seek to address. In 2011, IDC reported that firms worldwide had deployed roughly 22 exabytes of external storage and that capacity demand was increasing at about 40% annually. At that rate, storage manufacturers could just about keep pace with consumers' capacity demand. In 2014, however, the analyst firm revised its projection: capacity demand would grow at 300% per year in shops using server virtualisation.

The reason was the expanded capacity requirement of the "new" software-defined storage topologies being introduced by VMware, Microsoft and other hypervisor vendors, which called for a minimum of three storage nodes per virtual server host, with replication between the three (or more) nodes. Every data write would consume three times the space to store the original and replicated bits. Shortly after IDC modified its projection, Gartner speakers at conferences were doubling the IDC estimate: over 600% year-over-year growth would be required both to store data on replicated nodes and to back those nodes up.

Whether or not the analyst projections prove accurate, they have created something of a nightmare scenario for IT planners: how will they afford all of the additional capacity the analysts are projecting? De-duplication and compression vendors have characterised their technologies as superheroes. With 70:1 data reduction, they promise to make short work of the data deluge – every disk or flash storage device will act as though it has the capacity of 70 disks or flash units, courtesy of de-duplication. That, the pitch goes, will bend the cost curve of storage despite the growth in data.
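As a back-of-the-envelope illustration of how these figures interact (the function name and defaults are my own, chosen to match the numbers quoted above):

```python
def effective_capacity_needed(logical_tb, replication_factor=3,
                              backup_copies=1, dedup_ratio=70):
    """Rough physical capacity (TB) needed to hold a given amount of logical data.

    replication_factor: copies kept by software-defined storage (e.g. 3 nodes)
    backup_copies: additional backup copies of the replicated data
    dedup_ratio: vendor-claimed reduction ratio (e.g. 70 for "70:1")
    """
    raw_demand = logical_tb * replication_factor * (1 + backup_copies)
    return raw_demand / dedup_ratio

# 100 TB of logical data, three-way replication plus one backup copy:
# raw demand is 600 TB (the "600%" figure above), but a 70:1 reduction
# claim would shrink that to under 9 TB of physical capacity.
print(effective_capacity_needed(100))
```

The sketch shows why the vendor pitch is seductive: the same multiplication that produces the frightening 600% figure is undone, on paper, by a sufficiently large reduction ratio.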

The IT professional's incentive is to embrace data reduction technologies in order to control CAPEX; the front office's concern (assuming they understand what is being done with their data and the legal exposure it creates) is to limit legal or regulatory liability arising from IT operations. A middle ground is difficult to find, but not impossible.

The simple truth is that the compliance issues raised by de-duplication and compression can be sidestepped altogether with effective data management. Data management begins with identifying the business processes at the firm, then discovering the applications that support those processes. With this process-correlated application list in hand, the next step is to find the data used and produced by each application and to discover its physical location in the infrastructure. This can prove challenging, since the last time such an application-data-to-infrastructure map was created was probably when the application was initially deployed.
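The mapping exercise described above can be thought of as a simple three-level inventory. A minimal sketch of such a structure (all class names, processes, applications and paths here are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    location: str      # physical location in the storage infrastructure
    produced_by: str   # application that writes this data

@dataclass
class Application:
    name: str
    business_process: str
    data: list = field(default_factory=list)

# Hypothetical inventory: one business process, one supporting application,
# one data asset with its discovered physical location.
billing_db = DataAsset("billing_db", "array-2:/vol/oltp", produced_by="ERP")
erp = Application("ERP", business_process="Order-to-Cash", data=[billing_db])

# The process-correlated map: business process -> applications -> data locations
inventory = {erp.business_process: [erp]}
for process, apps in inventory.items():
    for app in apps:
        for asset in app.data:
            print(process, "->", app.name, "->", asset.location)
```

The point is not the tooling but the traversal: every data asset is reachable from the business process it ultimately serves, which is exactly the correlation the investigatory process has to establish.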

When the heavy lifting of this investigatory process is complete, the characteristics of the data – the application and business process it serves – and its physical location will be known. This is an important precursor to making intelligent decisions about how the data should be platformed (data that is accessed and changed frequently might be best hosted on flash or high-speed disk, for example), what services the data should receive (frequency of replication for availability, special capacity provisioning, exposure to de-duplication/compression processes, etc.), and what kinds of security and disaster recovery/protection services the data requires (determined by business-process criticality, among other factors). In fact, without this data management investigation, companies are likely to be misusing their storage infrastructure and storage services, wasting money and resources and potentially exposing themselves to legal and practical risks!
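One way to encode that kind of placement decision is a simple policy function. The thresholds, tier names and service bundle below are invented for illustration, not prescriptive:

```python
def recommend_platform(reads_per_day, change_rate, criticality):
    """Map data characteristics to a storage tier and a bundle of services.

    All thresholds here are hypothetical examples of a firm's own policy.
    """
    if reads_per_day > 10_000 or change_rate > 0.5:
        tier = "flash"            # hot data: accessed/changed frequently
    elif reads_per_day > 100:
        tier = "high-speed disk"
    else:
        tier = "capacity disk"    # cold data: a candidate for reduction

    services = {
        "replication": criticality in ("high", "medium"),
        # Only expose cold data to de-duplication/compression, keeping
        # data under legal or regulatory constraints out of its path.
        "dedup_compress": tier == "capacity disk",
        "dr_protection": criticality == "high",
    }
    return tier, services

print(recommend_platform(50, 0.01, "low"))
```

Encoding the policy this way makes the key point of the paragraph concrete: de-duplication becomes one selectable service among several, applied only where the data's characteristics and legal posture permit it.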

De-duplication seems like an innocuous service with obvious practical benefits, but like any storage service, its benefits are only realised if it is applied in a deliberate way to achieve objectives that have been carefully defined after comprehensive analysis.  If you do not understand the process, it might be useful to partner with a provider to help you to get off to a good start with your data management practice. After all, putting a comprehensive data management plan in place is the first step in achieving true data protection.



About the author

Jon Toigo

Jon Toigo is the CEO and managing principal of Toigo Partners International and chairman of the Data Management Institute. He is the author of 15 books, including five on the subject of business continuity planning.