A Guide to Content Classification and Categorization

Although many people use the two terms interchangeably, there are subtle differences between content classification and categorization, which a number of scholarly articles have tried to highlight. In the context of information systems, a paper available through the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) of the University of Illinois acknowledges the difference by stating “Examination of the systemic properties and forms of interaction that characterize classification and categorization reveals fundamental syntactic differences between the structure of classification systems and the structure of categorization systems. These distinctions lead to meaningful differences in the contexts within which information can be apprehended and influence the semantic information available to the individual. Structural and semantic differences between classification and categorization are differences that make a difference in the information environment by influencing the functional activities of an information system and by contributing to its constitution as an information environment.”

Another common view is that data classification is the process that results in the categorization of data. As an example, a pharmaceutical company may wish to classify its products by the type of condition they address, so that each drug is slotted into a category named after that condition. Alternatively, the company may want to classify them based on whether they are prescription drugs or over-the-counter (OTC) drugs and put them into these two categories. A binary Yes-No classification is also a valid classification, with Yes and No becoming the categories in this case. A category could also have multiple sub-categories.

It is also worth mentioning the EU General Data Protection Regulation (GDPR). Built on principles such as individual rights, integrity, accuracy, storage limitation, data minimization, scope and fairness, this regulation defines how personal data should be handled with respect and care. Penalties for violations are steep.

In this article the word ‘classification’ will be used to denote classification leading to categorization. Whether it is classification or categorization, oWorkers has been supporting its global clients and providing these services for over eight years. As a BPO focused on back-office services, it has gained recognition and, on more than one occasion, been identified as one of the top three data services providers in the world. Clients across the world trust oWorkers.
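The pharmaceutical example above, where the same products land in different categories depending on the classification rule chosen, can be sketched in a few lines of Python. This is only an illustration: the `Drug` record, the `categorize` helper and the three catalogue entries are hypothetical names and data invented for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Drug:
    name: str
    condition: str           # condition the drug addresses
    prescription_only: bool  # True = prescription, False = OTC


def categorize(drugs, classify):
    """Group drug names into categories using the given classification rule."""
    categories = defaultdict(list)
    for drug in drugs:
        categories[classify(drug)].append(drug.name)
    return dict(categories)


# Hypothetical catalogue, for illustration only.
catalogue = [
    Drug("Drug A", "allergy", False),
    Drug("Drug B", "allergy", True),
    Drug("Drug C", "hypertension", True),
]

# Classification rule 1: by the condition addressed.
by_condition = categorize(catalogue, lambda d: d.condition)
# {'allergy': ['Drug A', 'Drug B'], 'hypertension': ['Drug C']}

# Classification rule 2: a binary prescription / OTC split.
by_access = categorize(
    catalogue, lambda d: "prescription" if d.prescription_only else "OTC"
)
# {'OTC': ['Drug A'], 'prescription': ['Drug B', 'Drug C']}
```

The classification is the rule passed in as `classify`; the categories are simply the groups that result from applying it.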

Strategy for content classification and categorization

How does one decide in what manner a content classification exercise should be done? For that, a more basic question needs to be answered: what is giving rise to the need to classify products, and information related to those products, into categories? Products could also be services or ideas. Is the content classification and categorization exercise being contemplated so that it is easy to retrieve a piece of data when needed? Is compliance the main purpose, with data needing to be classified to meet federal and other legal requirements? Does it need to be done so that customers find it easy to look for what they want? On ecommerce websites, for example, revenue could be directly linked to the ease with which customers can locate what they are looking for, and perhaps discover many other similar and/or related products. Or is confidentiality of data driving the classification decision? Has the business grown to a point where a larger set of people need to be involved in decision-making? Does that mean information needs to be shared on a ‘need’ basis and not ‘everything available to all’, as may have been the case earlier?

Whatever the strategy, oWorkers has the talent to deliver the goods. We are not merely users of the resources provided by a community; we are active participants in the community and its development, wherever we have delivery locations. This positions us as a favored employer, generating walk-in traffic of candidates seeking employment. This gives us a choice of talent while keeping our hiring costs in check, as we don’t need to advertise to attract talent. Our training teams make the hired candidates job-ready in a short period of time. A related benefit of access to a continuous talent pool is the ability to provide for short-term ramps in client volumes. These could be seasonal, or they could be driven by promotions or other events. Our deep supply pool enables us to meet these short-term requirements, resulting in significant savings for clients who would otherwise need to keep resources idle for the rest of the year.

How is classification done?

Very simply, classification of any content can be done in two ways: manually or through automation. The discussion of the two below takes the perspective of content in the form of text. At this point, computers and machines work best with structured text. Other forms of content, like audio, video, images and unstructured text, can be understood only to the extent that an equivalence can be created with the structured text that machines have been built to understand. Software programs are themselves a form of structured text.

Manual classification

This is straightforward. Once a classification strategy has been agreed upon, the people appointed for the task are made aware of the strategy and provided with training if required. They then review each piece of text that is in scope and go about the task of content classification and categorization. The manual approach is likely to deliver the best results, as the human brain, with its fine sensitivities and awareness of context, remains an organ that it has so far not been possible to replicate in machines. However, the manual strategy also suffers from all the frailties and limitations of human beings:
  • It is expensive, as humans need to be paid on an ongoing basis.
  • It has limited capacity. A human can only process at the speed of a human, not a machine.
  • It is prone to human errors that can be varied in nature and difficult to catch as they don’t have a set pattern.
  • It may require an endless search for new resources, as humans could get jaded doing a repetitive task or may choose to move on to other work.
Thus, while manual classification remains an option, it can only be used selectively, depending on the type of project and its sensitivity.

Artificial Intelligence (AI) based classification

Automation through AI is the other possibility. Bear in mind, however, that AI also needs to be created by humans. An AI model is built through a process called Machine Learning (ML), in which thousands or even millions of ‘training’ examples are fed to a computer so that it can begin to recognize patterns and relationships between input and output. It follows that the model is developed from past examples. A key part of the process is ‘feature extraction’, in which relevant features of the content, along with their relationship to the ‘output’, are ingested by the computer. This enables the program to build relationships between the input and the output. The more training data there is, the more reliable the output is likely to be. A point is reached where the model is able to predict the outcome for fresh input data with an accuracy level acceptable to its creators. The model then goes into business as a content classification and categorization expert, poring over data and placing it in categories. While it is understood that AI does not yet possess the fine sensibilities of the human brain, it has many advantages, because of which it is being deployed in many applications. The main advantages are:
  • Machines work at the speed of machines. They can process large quantities of data in almost the blink of an eye. Hence, they can be used for large volumes of data without any significant increase in time or cost.
  • Their speed makes analysis and classification possible in almost a ‘live’ or real-time scenario. Instead of a human reading through a large volume of content and then choosing a classification, a computer can do it almost instantly.
  • Computers do not get jaded. They do not get bored. Their work is consistent. When they make a mistake, they will make the same mistake again and again, and not new ones randomly. This makes the output consistent and predictable.
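To make the training process described above more concrete, here is a minimal sketch of a text classifier written in plain Python. It uses a Naive Bayes model with Laplace smoothing, which is only one of many possible ML techniques and is chosen here for brevity, not because the article prescribes it. The class name, the crude whitespace `tokenize` step standing in for feature extraction, and the four training examples are all assumptions made for illustration; a real model would need far more data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very crude feature extraction: lowercase words split on whitespace.
    return text.lower().split()


class NaiveBayesClassifier:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Learn word frequencies per category from (text, category) pairs."""
        for text, category in examples:
            words = tokenize(text)
            self.word_counts[category].update(words)
            self.doc_counts[category] += 1
            self.vocabulary.update(words)

    def predict(self, text):
        """Return the most probable category for fresh input text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # prior for the category
            for word in tokenize(text):
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training examples, for illustration only.
model = NaiveBayesClassifier()
model.train([
    ("leather running shoes", "footwear"),
    ("canvas sneakers with laces", "footwear"),
    ("cotton shirt with collar", "apparel"),
    ("slim fit denim trousers", "apparel"),
])

print(model.predict("running sneakers"))  # footwear
print(model.predict("denim shirt"))       # apparel
```

Once trained, the same `predict` call can be applied to any volume of fresh text, which is where the speed and consistency advantages listed above come from.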
oWorkers is GDPR compliant, ISO (27001:2013 & 9001:2015) certified and operates from super secure facilities in each of its three delivery locations. It was amongst the first to create an environment for its staff to work from home during the pandemic, as and when required, and can operate fully from either home or office. In addition to trained human resources, oWorkers is able to access the latest technology tools suitable for this activity, thanks to its enduring partnerships with leading providers of technology. Whether you are looking for classification or categorization of content, we can do it for you.

Content classification and categorization – an ecommerce example

The content in this case is the product, or products, that the platform wishes to place on its shelves for customers to access and, hopefully, buy. Ecommerce has been a rapidly growing business over the last few years, further fuelled by the global pandemic, which limited many people to ‘in-place sheltering’ for long stretches of time and led them to satisfy their various needs through online purchases. For this industry, classification and categorization of its products is a core activity that will determine its success. What do ecommerce platforms seek?

An easy to reach site

Firstly, a platform wants customers to be able to reach it easily. While this may be the end result of what your product categorization enables you to achieve, we start with this objective as this is where the customer journey begins. In a competitive environment, there are many similar platforms eager to attract the same customer. Most customers will not access your site directly. A large number will arrive through search engines, so good search engine rankings are the goal for every website, as they are for you. A suitable taxonomy that allows you to create and populate the landing pages with the right words and phrases is what helps you with this goal. Google and other search engines will index the site and its products on that basis, enhancing or lowering the opportunity for seekers to find your site.

A pleasant browsing experience for the visiting customer

Once a customer is on your platform, your main objective is to deliver the most pleasant browsing and shopping experience possible. While the look and feel of your website is a contributor, this objective can be better fulfilled if your website is aligned with the objectives of the visitor. As the visitor's objective is shopping, the website is best served if the visitor can locate the target products with the greatest ease. Product classification plays a key role in enabling this. Should shoes be bundled with apparel or be a separate category? Should the Male/Female pathways be segregated at the beginning of the journey or at the end, when the customer has reached the product category? These are the kinds of decisions you will need to make to get this right. Remember, the customer has a choice of many websites. Typically, the first visit will not end in a purchase. The customer will look at different sites and go back to the one that best satisfies her unique combination of requirements and induces her to shell out the money required to make a purchase.
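The taxonomy decisions discussed above can be pictured as a tree of categories and sub-categories. The sketch below is a hypothetical example in Python: the category names, the nested-dictionary layout and the helper functions are assumptions made for illustration, and the tree shown takes just one of the possible options (gender split first, shoes separate from apparel).

```python
# Hypothetical taxonomy: each key is a category, each value holds its
# sub-categories, and an empty dict marks a leaf category.
taxonomy = {
    "Women": {
        "Apparel": {"Dresses": {}, "Trousers": {}},
        "Shoes": {"Sneakers": {}, "Boots": {}},
    },
    "Men": {
        "Apparel": {"Shirts": {}, "Trousers": {}},
        "Shoes": {"Sneakers": {}, "Boots": {}},
    },
}


def find_paths(tree, target, prefix=()):
    """Return every path from the root that ends at a category named `target`."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            paths.append(path)
        paths.extend(find_paths(children, target, path))
    return paths


def breadcrumb(path):
    """Render a category path the way a shopper would see it on the site."""
    return " > ".join(path)


for path in find_paths(taxonomy, "Sneakers"):
    print(breadcrumb(path), "| levels to navigate:", len(path))
# Women > Shoes > Sneakers | levels to navigate: 3
# Men > Shoes > Sneakers | levels to navigate: 3
```

Rearranging the tree, for instance putting "Shoes" at the top level with the gender split beneath it, changes the paths a visitor has to follow, which is exactly the trade-off the questions above are asking about.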

Making subtle suggestions for additional purchases

While the primary objective will always be to fulfil the basic purpose of a customer’s visit, which is the purchase of the item she came looking for, like any good business you would also want to suggest products that she may not have come looking for but that could nicely complement what she did come for. Suggesting socks if she came looking for shoes, or a belt if she came looking for trousers, might not be out of place. Once again, product classification is what will help you achieve these objectives.

With several unicorn marketplaces as longtime clients, oWorkers understands the challenges of this work and is equipped to handle them. With centers in three of the most sought-after delivery locations in the world, oWorkers employs a multicultural team, which enables it to offer services in 22 languages.
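The socks-with-shoes idea mentioned above depends on products already being classified into categories. A minimal sketch of how such suggestions might be looked up is shown below; the `COMPLEMENTS` mapping and the `suggest` function are hypothetical and exist only for this illustration.

```python
# Hypothetical mapping from a product category to complementary categories.
COMPLEMENTS = {
    "shoes": ["socks"],
    "trousers": ["belts"],
}


def suggest(cart_categories, complements=COMPLEMENTS):
    """Suggest complementary categories that are not already in the cart."""
    in_cart = set(cart_categories)
    suggestions = []
    for category in cart_categories:
        for extra in complements.get(category, []):
            # Skip anything already in the cart or already suggested.
            if extra not in in_cart and extra not in suggestions:
                suggestions.append(extra)
    return suggestions


print(suggest(["shoes"]))              # ['socks']
print(suggest(["shoes", "trousers"]))  # ['socks', 'belts']
print(suggest(["trousers", "belts"]))  # []
```

Because the lookup works on categories and not on individual products, it stays useful only as long as every product has been placed in the right category.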

Outsourcing content classification and categorization

In a competitive environment, running a business is hard enough. If a business starts taking on all the work required in support of its main activity, the main activity is likely to suffer from a lack of attention. Whether it is hiring, training, execution or quality control, leave it to an expert. Leave it to oWorkers. Operating with employed staff, as opposed to the contractors and freelancers used by many competitors, we regularly monitor each individual’s performance as part of a larger career management framework and take steps like training programs and job rotation as and when needed. Our efforts have resulted in many youngsters being able to make a transition from challenging circumstances to becoming a part of the global digital workforce. Your work will enable us to support a few more people in making that transition.
