CONTENT MODERATION

High-volume, high-stakes content review: making judgment calls on hate speech, threats, privacy violations, crisis content, and NSFW material. The work behind keeping platforms safe at scale.

Violation Types: Hate Speech · Threats · Privacy · Crisis · NSFW
Case Studies: 6 Real Scenarios

DECISION FRAMEWORK

Content moderation is not about following a rulebook; it's about understanding intent, context, and risk. Here's how I approached every case.

How I Assessed Every Case
01
Context First
The same content means different things in different contexts. A symbol, word, or image is never assessed in isolation: who posted it, where, in response to what, and to whom all matter.
02
Intent Assessment
Surface reading is not enough. Language is deliberately manipulated through abbreviations, name formats, and coded language. The question is always: what is this person actually trying to say or do?
03
Threat Severity Model
For threats specifically, three signals distinguish venting from credible danger: a call to action, a specific target or method, and a timeframe. All three together = immediate escalation.
04
Evasion Detection
Bad actors adapt. They use abbreviations, alternate spellings, coded references, and plausible deniability to evade filters. Pattern recognition across cases builds the ability to spot these.
05
Free Speech vs Safety Balance
During high-profile events, content volume spikes and opinions polarize. The job is not to pick a side; it's to allow legitimate discourse while removing content that crosses into harassment, incitement, or policy violation, regardless of which side it comes from.

CASE STUDIES

Real scenarios from Twitter/X that required contextual judgment beyond standard policy application. All examples are anonymized.

Case #001 Linguistic Obfuscation
The Situation
A post contained the name "John Ben Dover". On first reading, it appeared to be an ordinary reference to a person by that name, with no obvious violation on surface review.
The Analysis
On closer reading, the name was a deliberate construction, "John Bendover": a sexual taunt directed at a real person named John. The harasser used a name format as cover for targeted harassment.
Key Insight: Surface-level reading fails here; linguistic intent requires slow, deliberate parsing, especially with names and phrases that could have dual readings. Removed
Case #002 Crisis Moderation
The Situation
During the Kyle Rittenhouse controversy, content volume spiked massively, with posts both defending and condemning him. The platform had to handle thousands of pieces of related content simultaneously.
The Challenge
The key tension: free speech vs safety. Opinions on both sides, however strong, are protected expression. But harassment, incitement, and threats related to the case still violate policy, regardless of which side they come from.
Approach: Opinions and commentary allowed regardless of stance. Any content crossing into harassment, threats, or incitement removed, applied equally to both sides of the debate. Contextual
Case #003 Threat Assessment
The Challenge
Distinguishing genuine threats from emotional venting is one of the hardest judgment calls in T&S. People express anger online constantly โ€” not all of it is dangerous.
The Framework
A credible threat requires three signals: ① Call to action: explicit intent to harm. ② Specific target or method: naming a person, place, or weapon. ③ Timeframe: when this will happen. All three together = escalate immediately.
Key Insight: "I could kill him" = venting. "I'm going to hurt [name] at [place] on [date]" = credible threat requiring immediate escalation. Escalate
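The three-signal rule above can be sketched as a simple decision function. This is an illustrative Python sketch, not a production classifier: the boolean signal values are assumed to come from a human reviewer's reading of the post, and the return labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ThreatSignals:
    call_to_action: bool    # explicit intent to harm
    specific_target: bool   # a named person, place, method, or weapon
    timeframe: bool         # a stated "when"

def assess(signals: ThreatSignals) -> str:
    """All three signals together mean immediate escalation;
    anything less stays in the normal review queue as likely venting."""
    if signals.call_to_action and signals.specific_target and signals.timeframe:
        return "escalate"
    return "standard_review"

# "I could kill him": angry, but no specific target or timeframe
print(assess(ThreatSignals(call_to_action=True, specific_target=False, timeframe=False)))
# -> standard_review

# "I'm going to hurt [name] at [place] on [date]": all three signals present
print(assess(ThreatSignals(call_to_action=True, specific_target=True, timeframe=True)))
# -> escalate
```

The strict all-three conjunction mirrors the framework's point: any single signal alone is common in ordinary angry speech, and only the combination marks credible danger.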
Case #004 Identity & Impersonation
The Situation
Reports from real users claiming someone was impersonating them: running fake accounts in their name, using their identity, and potentially damaging their reputation or deceiving their followers.
The Process
① Request government-issued ID from the reporting user. ② Verify the ID is genuine and not forged. ③ Cross-match the ID details with the account claiming to be them. ④ If verified, remove the impersonating account and duplicate content.
Key Insight: ID verification required careful fraud detection; fake IDs were sometimes submitted. The process had to be rigorous to avoid both false positives and false negatives. ID Verified → Removed
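The four-step workflow above is a sequence of gates, where any failed check stops the removal. A minimal sketch, assuming the manual checks (ID authenticity, detail matching) are reduced to boolean inputs; the function and return-value names are hypothetical, not a real platform API.

```python
def resolve_impersonation_report(id_submitted: bool,
                                 id_genuine: bool,
                                 id_matches_claimant: bool) -> str:
    """Steps 1-4: request ID, verify it is not forged, cross-match the
    details, then act. Each gate must pass before the next is reached."""
    if not id_submitted:
        return "request_id"          # step 1: no ID on file yet
    if not id_genuine:
        return "reject_forged_id"    # step 2: fake IDs were sometimes submitted
    if not id_matches_claimant:
        return "reject_mismatch"     # step 3: ID details don't match the claim
    return "remove_impersonator"     # step 4: remove account and duplicate content

print(resolve_impersonation_report(True, False, True))  # -> reject_forged_id
print(resolve_impersonation_report(True, True, True))   # -> remove_impersonator
```

Ordering the gates this way reflects the false-positive concern in the key insight: a forged ID is rejected before any cross-matching, so a fraudulent report can never reach the removal step.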
Case #005 Cultural Context
The Situation
Content featuring a swastika, a symbol flagged both automatically and by users as hate speech due to its association with the Nazi party and the Holocaust.
The Complexity
The swastika is also a sacred Hindu symbol with thousands of years of religious significance, predating its Nazi appropriation. Context determined everything: a Hindu religious post using the symbol in a ritual context is not a policy violation.
Key Insight: Content identification requires cultural literacy. The same symbol carries completely different meaning depending on context, community, and intent. A global platform serves a global audience. Context-Dependent
Case #006 Evasion Tactics
The Situation
References to banned radical groups, such as the KKK (Ku Klux Klan), were prohibited on the platform. Bad actors adapted, using abbreviations, coded references, and ambiguous phrasing to evade detection.
The Detection
Catching these required recognizing the intent behind the evasion: "KKK", "triple K", "the three letters", and contextual references all pointing to the same banned entity. Pattern recognition built over time through repeated exposure to these tactics.
Key Insight: This skill transfers directly to AI red teaming; bad actors use the same evasion logic against LLMs. Recognizing obfuscation patterns is the same cognitive task in both domains. Removed
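The variant-matching idea can be illustrated with a small regex sketch. The variant list below is a simplified, hypothetical stand-in for pattern knowledge built through repeated exposure; in practice a match is only a signal for human review, since context (as in Case #005) decides whether it is actually a violation.

```python
import re

# Direct token, spaced-out letters, and spelled-out codings of one banned entity.
BANNED_VARIANTS = re.compile(
    r"\b(?:kkk|k\s+k\s+k|triple\s*k|the\s+three\s+letters)\b",
    re.IGNORECASE,
)

def flags_banned_reference(text: str) -> bool:
    # A hit routes the post to review rather than removing it outright:
    # a phrase like "the three letters" is ambiguous without context.
    return bool(BANNED_VARIANTS.search(text))

print(flags_banned_reference("he marched with the KKK"))      # True
print(flags_banned_reference("you know, the three letters"))  # True
print(flags_banned_reference("OK, talk later"))               # False
```

The same cat-and-mouse logic applies to LLM red teaming: each new obfuscation a bad actor invents becomes another variant in the detector, and the detector is only as good as the reviewer's accumulated pattern knowledge.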

SCOPE OF WORK

The types of platforms and content I reviewed. Each environment brought its own context, culture, and moderation challenges.

Platform Types
📱
SOCIAL MEDIA
Large-scale public platforms with millions of daily active users. High content velocity, politically sensitive material, and complex cultural nuance across global audiences.
🎮
GAMES
Gaming ecosystems with younger demographics, in-game chat, and community behaviour. Harassment took platform-specific forms: teabagging, griefing, toxic usernames, and in-game threats.
💬
COMMUNITY FORUMS
Brand and interest-based communities with established norms. Focus on maintaining constructive discussion, managing abuse patterns, and protecting community health over time.
Content Types Reviewed
🖼️
IMAGES
Photos, memes, and screenshots, including NSFW, gore, hate symbols, and doctored media.
📝
TEXT
Posts, comments, captions, and bios, including coded language, obfuscation, and linguistic manipulation.
🎬
VIDEOS
Short- and long-form video, including violence, graphic content, crisis footage, and manipulated media.
👤
ACCOUNT INFO
Usernames, display names, and profile bios, including impersonation, slurs hidden in handles, and fake identities.