What is Data Categorization and why is it important?
Many of us have seen Hollywood movies in which the plot turns on some dark, sinister secret whose unravelling holds the key to happiness and sunshine. These secrets could be ‘informal’ secrets about the lives of people, or they could be ‘formal’ secrets hidden away through a conscious decision of governments or corporations. For governments, the decision to keep some information secret might rest on the risk that revealing it could disturb law and order or compromise national security. For corporations, it could be a trade secret whose disclosure would compromise their competitive position. Think of the Coca-Cola formula for its signature beverage. Think of the Google algorithm for its search engine.
Continuing with the Hollywood theme, the protagonist would normally be the one pursuing the mystery with the aim of eventually spreading sunshine and cheer. He is usually portrayed as being up against the ‘evil empire,’ either the big corporation or the government, which keeps blocking efforts to reveal these secrets because they could show it in a poor light.
Though a lot of information is now held digitally, one can perhaps visualize that, in the movies of yore we are referring to, a round red stamp proudly stating ‘Classified’ or ‘Top Secret’ would be dramatically affixed to the paper on which the secret information was written or typed. That paper would then be consigned to a secretive place for storage, with access limited to a chosen few. And that would be the end of many such ‘classified’ documents, until a Hollywood hero made it his mission to unlock some of these mysteries.
Data classification
Keeping data secret might not have been a challenge if secrecy were the only disposition available. The challenge arises because there are many different types of data that need different treatment, and keeping them secret is only one of those treatments. Both governments and large corporations also need to share a lot of information with their constituents. Governments need to share information about new policies being introduced, the progress made during their rule and perhaps the law and order situation. Corporations need to keep their staff updated on HR practices, keep customers informed about new products and features, and so on. So, while they have to deal with data that is classified, they also have to deal with data that needs to be widely shared. There could be other categories between these two extremes as well.
Since information or data is of different types requiring different treatment, it has to be identified as such so that the people handling it do so in the appropriate manner. This gives rise to data classification. Over the years a few common classifications have emerged. While they have been widely used in governments and militaries, their use in private corporations is more recent. For the most part, as a starting position, corporations seem to have adopted classifications similar to those governments have used. The common ones are:
Public
Information that can be made available to whoever is interested in it. Much of government information is meant for public consumption. Often, the regulatory framework may even demand that certain information be in the public domain.
Sensitive
From a government perspective, this is information that needs to be handled with care. It has the potential to cause disorder, unrest, violence, etc. It may be best to share it only with identified people.
Private
This is meant for the consumption of only an identified set of people, or roles. For corporations, information on clients and business strategies could fall in this box. Performance evaluation ratings of employees could fall in this category too. It should be known only to a limited set of people.
Confidential or Classified
As the name suggests, this demands the highest level of care in handling. From a government perspective, data on military strategies could fall in this category.
BPO companies like oWorkers, providing data classification services to organizations, play a key role in the process by taking on the task and leaving the business to continue with its core work. oWorkers is a data-focused BPO provider that has been recognized as one of the top three providers in the world for data-related services. Thanks to its multicultural teams across three separate geographies, it is able to support client requirements in over 22 languages, which typically cover most official documentation that requires classification.
Data categorization and data classification
Many people use these two terms interchangeably. There are also those who point to differences between the two. One of the main differentiating factors is that the data classification methodology, as well as its classes, has been based almost entirely on the sensitivity of data: what should be made available to whom. While this may have served the needs of governments and militaries, which needed simple, unidimensional classifications to rule out any ambiguity in interpretation and use, it is proving inadequate for private organizations.
In the traditional system, a corporation may classify as confidential both the data pertaining to employee compensation and the minutes of the meeting where the strategy for a new product was finalized. The access rights to these two pieces of data might be totally different. Employee compensation information should be accessible to the HR Head of the company but perhaps not to the Marketing or Sales Heads. The strategy data, on the other hand, might be available to the Marketing Head but not to the HR Head. What about the medical information of patients undergoing treatment at a medical center? What about bank accounts or other financial information of clients of a financial institution? As we can see, there could be many pieces of data that we might understand to be sensitive, critical or even confidential, but struggle to manage in the absence of guidelines. It follows that in large, complex organizations, a classification system based only on the sensitivity of data might not be adequate.
In its report titled Rethinking Data Discovery and Classification Strategies, published in 2016, Forrester Research appears to have argued in favor of a system that could overcome some of these challenges. The system of data categorization that this thinking spawned tries to categorize documents based on the type of data they contain, and not merely on the basis of who should have access to what.
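The compensation and strategy example above can be made concrete with a small sketch. It is only an illustration: the role names (`hr_head`, `marketing_head`), the category names and the `can_read` helper are hypothetical and not taken from any real system. It contrasts a sensitivity-only label with an access rule driven by the type of data.

```python
from dataclasses import dataclass

# Hypothetical table: which roles may read which data category.
# Category and role names are illustrative assumptions.
CATEGORY_ACCESS = {
    "employee_compensation": {"hr_head"},
    "product_strategy": {"marketing_head"},
}


@dataclass(frozen=True)
class Document:
    title: str
    sensitivity: str  # traditional label, e.g. "confidential"
    category: str     # type of data the document contains


def can_read(role: str, doc: Document) -> bool:
    """Decide access from the data category, not the sensitivity label.

    Assumption: a category missing from the table grants no role-based access.
    """
    return role in CATEGORY_ACCESS.get(doc.category, set())


salaries = Document("Salary bands", "confidential", "employee_compensation")
strategy = Document("Product strategy minutes", "confidential", "product_strategy")

# Both documents carry the same sensitivity label...
print(salaries.sensitivity == strategy.sensitivity)
# ...yet the category separates who may read each one.
print(can_read("hr_head", salaries), can_read("hr_head", strategy))
print(can_read("marketing_head", salaries), can_read("marketing_head", strategy))
```

The sensitivity label alone cannot tell the two documents apart, while the category gives the access rule something to act on.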
Thus, at the very least, the contents of the document, or its type, would become a component of the system, based on which access rights to the data could be decided. It would become an additional dimension of the classification system, giving it greater granularity and clarity. While the logic for deciding whether a certain piece of information should be classified as ‘Sensitive’ or ‘Confidential’ may not be understood by many people, whether the data consists of employee information, a trade secret or client contacts should be fairly apparent. Hence, the only task for the people deciding on the classification should be to determine the ‘category,’ from which the access rights can then be derived. This is also relevant only for the types of information that need to be protected through some form of access control. Hence, the person assigned this responsibility will probably be required to select one of a few available options for data categorization. If the document does not contain any of those types of data, presumably it does not need to be restricted and hence, by default, might qualify as ‘public.’
Organizations, in order to remove bias, have even experimented with automating this task or making it algorithm based. However, the experience so far is that human ‘eyeballing’ is still required. The second step, classifying the data based on its category, could be automated much more easily.
oWorkers draws its resources from the geographies it operates in. Being a preferred employer, it is able to attract the best talent and choose the right people for the right jobs, be it classification or categorization. The established training engine of oWorkers then takes over and makes the hired resources fit for purpose, regardless of their previous educational background and experience.
Our ability to attract talent also gives us the flexibility to hire for peaks at short notice, almost a hundred resources in 48 hours, while keeping our hiring costs under control.
Parameters for data categorization
There are many ways in which data can be categorized. Organizations can even create multi-dimensional standards that categorize a particular piece of data on multiple parameters. Of course, storing data in digital formats also facilitates multi-dimensional categorization, since the categories can be affixed as tags. Some common parameters used by complex organizations:
- Based on value
- Based on usefulness timeframe
- Based on information type
- Based on who it pertains to – clients, employees, etc.
- Based on requirements to refresh
- Based on retrieval rights
- Based on storage location/device
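Because digital storage allows several of these parameters to be attached to the same item as tags, a record might be modelled roughly as follows. This is a sketch only: the `TaggedData` class, the `find` helper, the tag keys and all sample values are hypothetical, chosen simply to mirror a few of the parameters listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedData:
    name: str
    # One tag per categorization parameter; keys and values are illustrative.
    tags: dict = field(default_factory=dict)


def find(items, **wanted):
    """Return the items whose tags match every requested parameter value."""
    return [
        item for item in items
        if all(item.tags.get(key) == value for key, value in wanted.items())
    ]


# Made-up sample records, tagged along several dimensions at once.
catalog = [
    TaggedData("client_contacts", {
        "value": "high",
        "pertains_to": "clients",
        "information_type": "contact details",
        "storage_location": "crm",
    }),
    TaggedData("payroll_records", {
        "value": "high",
        "pertains_to": "employees",
        "information_type": "compensation",
        "storage_location": "hr_system",
    }),
    TaggedData("cafeteria_menu", {
        "value": "low",
        "pertains_to": "employees",
        "information_type": "notice",
        "storage_location": "intranet",
    }),
]

# Query on two dimensions at once: high-value data about employees.
matches = find(catalog, pertains_to="employees", value="high")
print([item.name for item in matches])
```

Keeping each parameter as a separate tag means a new dimension, such as a refresh requirement, can be added later without reworking the existing ones.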