

6 types of AI content moderation and how they work

AI will change how organizations moderate content, especially on social media and with the increase in AI-generated content. Here's what you need to know.

Disinformation and inappropriate content abound in digital environments, and users might struggle to determine the source of such content or how to filter it out.

Content moderation is commonly used as a social media screening practice. It enables the approval or rejection of comments and content that users create and post. The task involves removing rule-violating content to ensure published posts adhere to community guidelines and terms of service.

AI can aid in that process. It searches for, flags and eliminates content -- both human- and AI-generated -- that violates the rules or guidelines of a social media platform, website or organization. This includes any audio, video, text, pictures, posts and comments deemed offensive, vulgar or likely to incite violence.

What is content moderation?

Historically, organizations moderated content with human moderators who reviewed most content before it was published, said Jason James, CIO at retail software vendor Aptos. The moderators checked each item for appropriateness and either approved and posted it or rejected and blocked it.

Until recently, users often did not know whether their content had been rejected or, if it had, what criteria drove the decision. The entire process was manual, which prevented real-time responses to postings, and approval ultimately rested on a single moderator's judgment and leanings.

As a result, many organizations adopted a mix of automated and human intervention to moderate content, James said. AI is the first layer and filters out spam and easy-to-moderate items, while humans moderate the more nuanced items. Human moderation on top of automation is critical because if something offensive fell through the cracks, the organization would face serious consequences.

Automated moderation occurs when user-generated content (UGC) posted to a platform or website is automatically screened against the platform's rules and guidelines. If the content violates them, the platform either removes it altogether or submits it for human moderation, according to Sanjay Venkataraman, former chief transformation officer at ResultsCX, a CX management vendor.

Chart: If an organization has strong best practices for content moderation, it can more easily embrace AI tools to moderate content.

6 types of AI content moderation

Organizations can adopt six methods to use AI content moderation effectively at scale.

1. Pre-moderation

To ensure content meets their guidelines, businesses can use natural language processing (NLP) to look for specific words and phrases, including offensive or threatening language. If the content includes those terms, it can be automatically rejected and the user warned or blocked from future postings. This automated approach limits the need for human moderators to review every post.

This is an early application of machine learning (ML) to content moderation. The tool reviews content against a published blocklist to ensure it does not contain forbidden words or phrases, James said.
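As a rough illustration of that blocklist check, the sketch below matches a post against a list of forbidden terms using word-boundary matching. The term list and function name are hypothetical, not a specific platform's implementation.

```python
import re

# Hypothetical blocklist; real deployments maintain much larger, curated lists.
BLOCKLIST = {"slur1", "slur2", "threatening phrase"}

def violates_blocklist(text: str, blocklist=BLOCKLIST) -> bool:
    """Return True if the text contains any forbidden word or phrase."""
    lowered = text.lower()
    # Word-boundary matching avoids flagging harmless substrings inside longer words.
    return any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in blocklist)

if violates_blocklist("example user post"):
    print("Reject the post and warn the user")
else:
    print("Allow the post to continue through moderation")
```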

An AI-enabled pre-moderation model automatically scans and evaluates content before it is published, Venkataraman said. AI systems -- including large language models (LLMs), computer vision and content classifiers -- assess text, images, video and audio to determine if content goes against platform guidelines. If it does -- for example, by promoting hate speech, explicit imagery or threats -- it is either blocked automatically or escalated for human review.
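One way to picture that block-or-escalate decision is a confidence-threshold router around a content classifier. This is a minimal sketch: the toy scoring heuristic and the threshold values are assumptions standing in for a real LLM or trained classifier.

```python
def classify_toxicity(text: str) -> float:
    """Stand-in for an ML model that returns a 0-1 violation score."""
    toxic_markers = ("threat", "hate", "attack")   # toy heuristic only
    hits = sum(marker in text.lower() for marker in toxic_markers)
    return min(1.0, hits / len(toxic_markers))

def pre_moderate(text: str, block_at: float = 0.9, review_at: float = 0.3) -> str:
    """Route content before publication based on the classifier's score."""
    score = classify_toxicity(text)
    if score >= block_at:
        return "block"      # clear violation: never published
    if score >= review_at:
        return "escalate"   # uncertain: held for human review
    return "publish"        # likely fine: published immediately

print(pre_moderate("a friendly hello"))            # publish
print(pre_moderate("a hateful threat of attack"))  # block
```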

2. Post-moderation

Post-moderation lets users post content in real time without a pre-moderation review. After a user posts something, a moderator reviews the content. With this method, users could see content that violates community guidelines before a moderator notices and blocks it. It also lets a user revise any content deemed in violation so it can be published afterward, James said.

AI systems, human moderators or both review this content after it is published. AI automates the process, rapidly scanning new content in real time and flagging potentially harmful material for review or takedown, Venkataraman said.
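A minimal sketch of that post-publication flow follows, assuming a hypothetical scan function and an in-memory queue rather than a real platform API.

```python
from collections import deque

published_queue = deque()   # content that is already visible to users

def scan(post: dict) -> bool:
    """Stand-in for an AI check; returns True if the post looks harmful."""
    return "forbidden phrase" in post["text"].lower()   # toy rule only

def post_moderate():
    """Review content after it is live, flagging violations for takedown."""
    while published_queue:
        post = published_queue.popleft()
        if scan(post):
            print(f"Flag post {post['id']} for human review or takedown")

published_queue.append({"id": 1, "text": "An ordinary comment"})
published_queue.append({"id": 2, "text": "A forbidden phrase appears here"})
post_moderate()
```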

3. Reactive moderation

This method enables users to serve as moderators who review posts to determine whether they meet or violate community standards. Content can be published prior to moderation, and moderation is crowdsourced to the community rather than handled by employed human moderators. The community forums of many brands work this way, James said.

With reactive moderation, ML systems can prioritize incoming reports based on severity, content type and user history, Venkataraman said.
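A toy version of that triage step might score each user report so the most serious items reach reviewers first. The severity weights and report fields below are invented for illustration, not a documented scoring scheme.

```python
# Hypothetical severity weights per report reason.
SEVERITY = {"violence": 3.0, "hate_speech": 3.0, "nudity": 2.0, "spam": 1.0}

def report_priority(report: dict) -> float:
    """Higher scores get reviewed sooner."""
    score = SEVERITY.get(report["reason"], 1.0)
    score += 0.5 * report["report_count"]           # many reports raise urgency
    if report["author_prior_violations"] > 0:       # repeat offenders rank higher
        score += 1.0
    return score

reports = [
    {"id": 1, "reason": "spam", "report_count": 2, "author_prior_violations": 0},
    {"id": 2, "reason": "violence", "report_count": 5, "author_prior_violations": 1},
]
for r in sorted(reports, key=report_priority, reverse=True):
    print(r["id"], report_priority(r))
```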

4. Distributed moderation

This approach is similar to reactive moderation, except users vote to determine whether a post meets or violates community standards. AI then promotes or suppresses content based on voting behavior and can detect manipulation patterns or bias, Venkataraman said. The more positive votes a post receives, the more users see it. If enough users report the post as a violation, it is more likely to be blocked from others.

Services like Reddit use this method to allow community engagement on content posted on the site, James said.
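As a rough sketch of that vote-driven ranking, the snippet below promotes, demotes or hides a post based on its up/down votes and report count. The thresholds are arbitrary placeholders, not any platform's actual rules.

```python
def visibility(upvotes: int, downvotes: int, reports: int,
               hide_reports: int = 10, min_ratio: float = 0.3) -> str:
    """Decide whether community signals promote, demote or hide a post."""
    if reports >= hide_reports:
        return "hidden"        # enough violation reports block it for others
    total = upvotes + downvotes
    ratio = upvotes / total if total else 0.5
    if ratio >= 0.7:
        return "promoted"      # broadly positive: shown to more users
    if ratio < min_ratio:
        return "demoted"       # broadly negative: shown to fewer users
    return "normal"

print(visibility(upvotes=120, downvotes=15, reports=1))   # promoted
print(visibility(upvotes=3, downvotes=20, reports=12))    # hidden
```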

5. User-only moderation

This method lets users filter out what they deem inappropriate. Only registered and approved users can moderate content. If several registered users report a post, the system automatically blocks others from seeing it.

These systems are only as fast as the pool of moderators available to review content. The more moderators there are, the faster content can be reviewed and published, James said.

Users set their own filters or preferences for what they do or don't want to see. Some systems hide content after enough user reports, with limited central oversight. AI can learn from user behavior and automate moderation based on individual preferences, such as muting or keyword filters, Venkataraman said.
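A minimal sketch of that combination -- report-based hiding plus each viewer's own filters -- appears below, with invented field names and an arbitrary report threshold.

```python
def visible_to_user(post: dict, prefs: dict, report_threshold: int = 5) -> bool:
    """Apply community reports first, then the viewer's own filters."""
    if post["report_count"] >= report_threshold:
        return False                  # enough registered users reported it
    text = post["text"].lower()
    if any(word in text for word in prefs.get("muted_keywords", [])):
        return False                  # viewer chose to filter this topic
    if post["author"] in prefs.get("muted_users", set()):
        return False                  # viewer muted this author
    return True

prefs = {"muted_keywords": ["politics"], "muted_users": {"troll42"}}
post = {"author": "alice", "text": "A post about politics", "report_count": 0}
print(visible_to_user(post, prefs))   # False: the keyword filter applies
```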

6. Hybrid moderation

Generative AI (GenAI) is not infallible, James said. It can hallucinate, producing false, misleading or incorrect information. Because of that risk, organizations still need humans to review content and make sure it's appropriate and accurate.

A hybrid blend of human and AI moderation enables both speed and accuracy. AI handles fast pre- and post-moderation, while human moderators have the final say to make sure content meets community guidelines and is logical and accurate.
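One way to sketch that division of labor is a confidence gate: the model settles clear-cut cases and hands everything it is unsure about to a person, whose decision is final. The toy verdict logic and the confidence cutoff below are assumptions, not a specific product's behavior.

```python
def ai_decision(post: str) -> tuple[str, float]:
    """Stand-in for the AI layer's verdict and its confidence."""
    if "obvious spam" in post.lower():
        return "block", 0.99     # toy logic only
    return "approve", 0.70

def human_review(post: str, suggested: str) -> str:
    """Placeholder for routing an item to a human moderation queue."""
    print(f"Queued for human review (AI suggests: {suggested})")
    return "pending"

def hybrid_moderate(post: str, auto_confidence: float = 0.95) -> str:
    """AI handles clear-cut cases; humans get the final say on the rest."""
    verdict, confidence = ai_decision(post)
    if confidence >= auto_confidence:
        return verdict
    return human_review(post, suggested=verdict)

print(hybrid_moderate("obvious spam link"))   # block (handled by AI alone)
print(hybrid_moderate("an ambiguous joke"))   # pending (escalated to a human)
```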

How does AI content moderation work?

AI content moderation is built on ML models. It uses NLP and incorporates platform-specific data to catch inappropriate UGC, Venkataraman said.

An AI moderation service can automatically make moderation decisions -- refusing, approving or escalating content -- and continuously learns from its choices. Moderation for AI-generated content is complex, and the rules and guidelines are evolving in tandem with the pace of technology, Venkataraman said.
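To picture what "continuously learns from its choices" could look like, the sketch below updates a simple text classifier with scikit-learn's partial_fit each time a human reviewer confirms or overturns a decision. The feature setup, labels and function names are illustrative assumptions, not a production design.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)   # stateless text features
model = SGDClassifier(loss="log_loss")             # supports incremental updates
CLASSES = [0, 1]                                   # 0 = allow, 1 = violation

def moderate(text: str) -> int:
    """Return the model's current allow/violation decision."""
    return int(model.predict(vectorizer.transform([text]))[0])

def learn_from_review(text: str, human_label: int):
    """Fold a human moderator's final decision back into the model."""
    model.partial_fit(vectorizer.transform([text]), [human_label], classes=CLASSES)

# Seed the model with one reviewed example of each class, then keep updating
# as more human-reviewed decisions arrive; accuracy improves over time.
learn_from_review("have a great day", 0)
learn_from_review("violent threat example", 1)
print(moderate("have a great day"))
```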

"Content created using generative AI and large language models is very similar to human-generated content," Venkataraman said. "In such a scenario, adapting the current content moderation processes, AI technology, and trust and safety practices become extremely critical and important."

Additionally, AI-generated content is easy to create, and the amount of it online has increased dramatically since AI content tools have become publicly available. Human moderators now have to train to identify massive amounts of AI-generated content in order to weed it out and highlight actual UGC, according to Venkataraman.

"The last thing any brand wants is to have a community area, a website or a platform filled with nothing but AI-created content," Venkataraman said.

As GenAI brings a lot of contextual understanding and adaptability into content generation, moderation tools must be reinforced with advanced AI capabilities to detect nonconformance, Venkataraman said. That includes training the AI models with larger numbers of data sets, using humans to validate a higher sample of content, collaborative filtering with community-generated feedback on published content, and continuous learning and feedback.

AI-generated content is massively increasing, and organizations must adapt to the rapid pace, James said.

"As content can be created faster, the need to review and moderate content more quickly also increases," James said. "Relying on human-only moderators could create a backlog of reviewing content -- thus delaying content creation. The delays created impact collaboration, ultimately resulting in a poor user experience."

GenAI has surpassed NLP's capabilities for content moderation. For example, multimodal LLMs can understand things like sarcasm, coded language or cultural nuance. Traditional natural language understanding tools are deficient in these areas, Venkataraman said.

Hyperscalers, such as Meta, use custom deep learning models, image recognition and cross-modal AI that understands memes -- text plus images. YouTube uses pattern matching and object/image recognition to scan billions of video minutes daily. And TikTok uses multilingual, multimodal AI to do more nuanced tasks like detecting cultural norms, Venkataraman said. Additionally, video moderation tools can scan videos or audio files for copyrighted or inappropriate content.

How AI will affect content moderation

GenAI will continue to lead the AI evolution, James said. This will put greater pressure on organizations to invest in AI at some level to remain competitive and, in turn, will make AI content moderation a must-have capability.

"AI will be more heavily used to not only create content, but [to] respond to postings on social media," James said. "This will require that organizations employ AI-empowered content moderation to not only automate, but also modernize their existing process."

AI can enable faster, more accurate moderation with less subjective review by human moderators, James said. And, as GenAI models evolve and become more advanced, content moderation will become more effective over time.

"Already, [AI] can automatically make highly accurate automated moderation decisions. ... By continuously learning from every decision, [it's] accuracy and usefulness can't help but evolve for expanded usefulness," Venkataraman said.

Some companies are already adopting AI bots for content moderation, James said. In 2024, social media giant TikTok laid off 700 human moderators in favor of AI moderators. The massive increase in AI-generated content has created a fight-fire-with-fire mentality: using AI to moderate the content that AI creates.

With AI models becoming faster and more accurate, the number of human moderators will surely decrease in the coming years, James said.

Editor's note: This article was originally published in 2023 and updated to reflect changes in the AI tool market.

David Weldon is a freelance writer in the Boston area who covers topics related to IT, data management, infosec, healthcare tech and workforce management.
