The role played by content moderation tools

Over 3 million years ago, even before the present version of humans emerged, tools shaped out of stone are believed to have been used by the ancestors of present-day humans. About a million years ago, those ancestors discovered how to light a fire. More than 15,000 years ago, humans invented, or perhaps discovered, agriculture as a means of sustenance and livelihood, moving away from their foraging and hunting past. Between 10,000 and 15,000 years ago, man discovered the art of making pottery, bricks and clothing; the wheel is also understood to have been invented in the same period. Iron, gunpowder, the compass, the mechanical clock, the printing press and the steam engine followed in the few thousand years thereafter.

Though the term 'tool' itself is understood to have been coined as recently as the 12th century, mankind's search for tools that enable him to do more, faster and better has been going on since time immemorial. If anything, it has only gathered pace as time has gone by. In fact, the use of tools is one of the ways of separating man from other animals. Some inventions may, at times, have created unforeseen or unintended consequences, like disease, but the objective behind the relentless striving has always been noble: to enable mankind to do more, faster, better.

We may not be as old as some of the tools invented by humans, but we have made rapid progress in the few years we have been in existence. oWorkers prides itself on being selected as one of the top three providers of data-based BPO services in the world, despite not yet being a teenager.

Introduction of content moderation tools

Why should the digital age be any different? Man's desire to introduce tools that enable him to do something better and faster applies to all his endeavors. Moreover, with the accumulated history of all the tools invented over millions of years, the pathways to creating or discovering the next set of tools are, perhaps, also clearer.

The use of tools is rarely the natural starting point for an activity. Many activities start off organically, as natural processes, based on human needs and desires. Social media platforms started as a means of sharing thoughts and ideas and communicating with others. They grew as they were found to be useful. As they grew, unintended uses of the platforms began: a few saw an opportunity in leveraging the reach of these platforms and started sharing objectionable content with vulnerable audiences. When this was recognized to be a problem, moderation processes were introduced, and these are likely to have been mostly manual. Usage and adoption continued to grow. The volume of malicious content also continued to grow, requiring more and more bodies to be 'thrown at the task' and creating profitability issues for platform owners, mostly private businesses. And so began the search for tools that could overcome the drawbacks of purely human moderation: recurring cost, slow speed of review, and variations in the interpretation and application of standards.

Once content moderation tools entered the frame, there was no looking back. Most organizations with a substantial moderation need now entrust the heavy lifting to automated systems and tools. That said, while automated solutions are widely used and some of them are very capable, they have so far not been able to match the capacity of the human brain. Applying context to a situation, identifying nuance, drawing meaning out of unstructured content, and taking all content in their stride regardless of format are abilities still unique to human beings. This is why, even as the heavy lifting is increasingly entrusted to automation tools, contentious issues and decision-making in cases of reasonable doubt are still left to human beings. And even to operate the tools, smart people are needed.

With its unique place in the communities it is located in, oWorkers attracts the best talent, with which it is able to staff its many different client projects. We are also able to offer additional staffing to meet short-term, unplanned spikes in volume; we can hire a hundred additional staff within 48 hours, thanks to the liberal supply of walk-in talent.

Human beings as content moderation tools

As we have seen, human beings are the best moderators; the best at most things, in fact. They have their limitations too, of course, but there is no denying their intellectual superiority over any machine. Human moderation has evolved into several distinct methods of content moderation:

Pre-moderation

The content submitted by a user is reviewed before it is made available to others. Though this is seen as the ideal method, it delays publishing, leading to user dissatisfaction, as users want to see their submissions appear immediately. Besides, it ties up a lot of resources and becomes expensive.

Post-moderation

The user-generated content (UGC) is published immediately, with moderators playing catch-up. In other words, they continue the review process and, on coming across offensive content, take it down even though it has already been published. While this makes for a better user experience, it allows offensive content to slip through, at least for a while.

Reactive moderation

All users are given the means to report and complain about the content that is available. Complaints drive content into a review queue, from where it is handled on its merits by the review team. Clearly not fool-proof, but an inexpensive solution that serves the purpose of websites with a high tolerance threshold. A minimal sketch of such a report-driven queue follows.
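
The sketch below assumes one simple rule: once a piece of content collects a set number of user reports, it is queued for a human moderator. The threshold, names and data structures are illustrative assumptions, not any particular platform's implementation.

```python
from collections import defaultdict
from queue import Queue

REPORT_THRESHOLD = 3  # assumed: number of reports that triggers human review

report_counts = defaultdict(int)  # content_id -> number of user reports
review_queue = Queue()            # items awaiting a human moderator
queued = set()                    # avoid queueing the same item twice

def report_content(content_id: str) -> None:
    """Called whenever a user flags a piece of published content."""
    report_counts[content_id] += 1
    if report_counts[content_id] >= REPORT_THRESHOLD and content_id not in queued:
        review_queue.put(content_id)
        queued.add(content_id)

# Example: three separate users report the same post
for _ in range(3):
    report_content("post-42")
print(review_queue.get())  # "post-42" is now in front of a human moderator
```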

Distributed moderation

This method requires users to vote on the content they access and its suitability for the site, and the voting can take a variety of forms. The eventual result is that content voted down by users keeps sliding down the rankings, with the lowest-ranked items becoming virtually non-existent or invisible. A toy illustration of this vote-driven ranking appears at the end of this section.

With all our centers equipped to operate 24×7, oWorkers can provide quick turnaround on transactions, over and above any benefit we might get on account of time-zone differences. We have made the choice of working with employed staff, not the freelancers and contractors some of our competitors rely on. This gives us flexibility in the deployment of resources, as well as a trained middle-management team of supervisors who have grown within the company. We are registered as a local company in our delivery locations, and pay social and local taxes for our staff. Our staff, both past and present, rate us 4.65 or above on platforms like Glassdoor.
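
As promised above, here is a minimal sketch of distributed moderation, assuming a simple net-score rule (upvotes minus downvotes) and an illustrative hiding threshold; real platforms use far more elaborate ranking formulas.

```python
from dataclasses import dataclass

HIDE_THRESHOLD = -5  # assumed: net score below which content is effectively invisible

@dataclass
class Post:
    content: str
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        return self.upvotes - self.downvotes

def visible_ranking(posts):
    """Rank posts by net vote; drop those the community has voted into oblivion."""
    shown = [p for p in posts if p.score > HIDE_THRESHOLD]
    return sorted(shown, key=lambda p: p.score, reverse=True)

posts = [Post("helpful answer", 12, 1), Post("spam link", 0, 9), Post("ok comment", 3, 2)]
for p in visible_ranking(posts):
    print(p.score, p.content)  # "spam link" never appears in the listing
```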

Adoption of tools for content moderation

The need for tools manifested itself as a result of growing volumes, limited human capacity and the rising cost of human moderation. Once tools arrived, it became possible to apply the pre-moderation method across a much larger swathe of content, because machines, once trained, can handle far more information in the same time than a human can. Besides, with the introduction of technologies like Artificial Intelligence (AI), unstructured information, a no-go zone for computers and machines thus far, suddenly became reachable.

Based on AI or otherwise, the tools adopted for content moderation are not meant to be merely tools that do their automated thing in isolation, in complete disregard of what the human moderators are doing. Present-day tools are expected to do their 'tool thing' while allowing human moderators to leverage them and do the 'human thing.' In other words, a tool should provide an interface that not only runs the automated moderation but also gives human operators a window to review and handle escalations (a minimal sketch of this division of labor follows the list below). This obviates the need to move content between platforms, and the attendant challenges of doing so. Some of the other features reliable content moderation tools are expected to offer:
  • The ability to handle a variety of formats like video, audio, image and text
  • The flexibility to handle emails, blogs, comments, reviews and all other types of content
  • A filtering mechanism for keeping out obviously objectionable content such as pornography, vulgarity, etc.
  • Capabilities like Natural Language Processing (NLP) that make it possible to evaluate content such as audio and video and create context around them
  • An interface with a dashboard for online monitoring of traffic and performance
  • Access to all web spaces, including community forums, websites and social media handles, through the tool
  • The ability to automate, delegate and monitor tasks
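
The following sketch shows the division of labor described above: the tool decides confidently obvious cases on its own and escalates doubtful ones to a human queue. The classifier, threshold and labels here are illustrative assumptions, not a real vendor API.

```python
AUTO_THRESHOLD = 0.90  # assumed: confidence above which the tool acts alone

def classify(text: str) -> tuple[str, float]:
    """Stand-in classifier returning (label, confidence). A real system
    would call a trained model or moderation service here."""
    banned = {"spam", "slur"}
    if any(word in text.lower() for word in banned):
        return "reject", 0.95
    return "approve", 0.60  # low confidence: the model is unsure

def moderate(text: str) -> str:
    label, confidence = classify(text)
    if confidence >= AUTO_THRESHOLD:
        return label                # the tool does the heavy lifting
    return "escalate_to_human"      # cases of reasonable doubt go to a person

print(moderate("buy spam now"))         # -> reject (handled automatically)
print(moderate("nice photo of a cat"))  # -> escalate_to_human (reasonable doubt)
```
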
oWorkers operates in a highly secure environment designed to keep client data safe. It is ISO (27001:2013 & 9001:2015) certified and GDPR compliant. It has long-standing relationships with technology companies, and can access the latest technologies for client projects.

Limitations of content moderation tools

Blessings are generally mixed, and that applies to content moderation tools too. While we should pursue every opportunity for automation, it is better to understand the limitations of these tools than to go in blind.

Automation requires scale, not everyone has it

Each industry is different and each company is unique; that is what sets them apart in a crowded marketplace. Their requirements, too, are unique to some extent. It is logical, therefore, that they need uniquely developed automated solutions to handle their unique moderation requirements. But since companies operate at different scales, smaller ones will struggle to put together the volume that justifies an investment in such tools. They will have to rely instead on off-the-shelf solutions which, while very good at what they do, could fall short of handling their unique requirements in an ideal manner. They will need to understand the strengths and weaknesses of whatever tool they adopt, in order to ensure they are able to plug the holes.

As good or as bad as the training they get

Being unthinking machines, these tools depend on human intelligence to 'learn' what they need to do, and then perform the learned task at a scale and efficiency that cannot be matched by humans. The tools are, thus, limited by the training they receive. For AI models trained with the help of machine learning (ML) datasets, the larger and more varied the datasets, the greater the learning. Equally, the smaller and less varied the datasets, the lower the quality of learning, leading to reduced reliability and accuracy. The toy example below makes the point concrete.
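
A minimal training sketch, assuming scikit-learn and a deliberately tiny, narrow labelled dataset; the point is only that a model can recognize little beyond what its training data covers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled dataset: 1 = objectionable, 0 = acceptable.
texts = ["you are an idiot", "have a great day", "I will hurt you",
         "thanks for sharing", "this is garbage, idiot", "lovely picture"]
labels = [1, 0, 1, 0, 1, 0]

# Vectorize the text and fit a simple classifier on it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# The model is only as good as this tiny, narrow dataset allows:
print(model.predict(["what an idiot"]))     # likely flagged: resembles training data
print(model.predict(["u r a looser lol"]))  # likely missed: nothing similar was seen
```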

Absence of context

This is the obvious disadvantage of not possessing an instrument as fine as the human brain, which grasps context intuitively. The moment a tool encounters an unfamiliar situation, it is likely to deliver unexpected results, because its context is limited to what its human operators have fed it. An image of a female breast may be flagged as nudity, but an image of a woman breastfeeding an infant is not nudity and is permitted on most sites. A human can tell the difference at a glance; a tool will struggle to, unless painstakingly taught. The toy filter below shows how quickly context-free rules go wrong.
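
A deliberately naive keyword filter (purely an illustrative assumption; no serious platform is this crude) flags the permitted case and the banned case alike, because it matches strings, not meaning.

```python
BLOCKLIST = {"breast", "nude"}  # assumed blocklist for illustration only

def keyword_filter(text: str) -> str:
    """Flag any text containing a blocklisted substring, context be damned."""
    return "flag" if any(word in text.lower() for word in BLOCKLIST) else "allow"

print(keyword_filter("explicit nude image"))                 # -> flag (intended)
print(keyword_filter("a woman breastfeeding her infant"))    # -> flag (false positive)
print(keyword_filter("breast cancer awareness month"))       # -> flag (false positive)
# The filter cannot tell the permitted cases from the banned one.
```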

Struggle with unstructured data

Computers have been taught to understand humans through sets of characters, arranged in one sequence for one meaning and in another sequence for a different meaning; such structured sets of characters came to be known as software code, or programming. Tools will, however, struggle to understand sets of characters not arranged in those sequences, and data in any other format, such as audio, video and images. This is why an audio file needs to be converted into text before a tool can even begin to extract meaning from it. And once transcribed to text, speech loses much of its context: were the words spoken in anger, or softly? An image, likewise, is merely a set of arbitrarily arranged pixels until ML datasets teach the tool the meaning attached to each arrangement. A lot of progress has been made with AI models, but content moderation tools are still far from anything like a complete understanding.

Training datasets may be biased

While the internet is global and now penetrates the darkest, remotest corners of the globe, most of the work on creating automation tools is done in English. So what? It can result in prejudicial treatment of content that is not in English. A related risk is that the biases of the data annotators preparing the ML datasets are also likely to creep into the training and impair the tool's judgment to some degree.

oWorkers has a multi-ethnic, multi-cultural workforce in each of its three delivery centers. With this workforce, we are in a position to support clients in 22 languages.

Lack of an audit trail

In a conventional software program, it is possible, if the need arises, to identify an error by going through the lines of code. Content moderation tools trained with the help of ML datasets do not afford the same transparency; the relationship between input and decision is not linear. If a tool takes a particular decision, it is very difficult to identify the exact reason why it did so. Thousands, even millions, of data points may have gone into its training, many of them identical, many others with minor variations. What caused the resulting algorithm to behave in a certain manner is extremely difficult to unravel. Hence, if a decision taken by an algorithm is challenged, a human reviewer can overturn it, but providing a logical explanation for it does not seem possible at this point.

Humans and tools need to co-exist

The foregoing creates an environment in which human beings and automated tools need to co-exist to deliver the best results; a combination that an established player like oWorkers, with its focus on data-related services like moderation and a leadership team with over 20 years of hands-on experience in the industry, can offer. Our customers routinely note substantial savings upon outsourcing to oWorkers, and appreciate the transparency and choice they get in pricing. 85% of our customers are technology companies, including several unicorn marketplaces. We hope you will soon be one as well.
