TRUST & SAFETY

The policy and systems layer of platform safety: the full taxonomy of what gets enforced, how severity is determined, and when a case goes beyond standard review into specialist escalation.

Experience: 4+ Years
Violation Categories: Safety · Integrity · Platform Abuse
Enforcement Actions: Warn · Remove · Suspend · Ban · Escalate

VIOLATION TAXONOMY

The full landscape of policy violations I enforced, organised by category. Each type required a different assessment lens, threshold, and enforcement response.

Safety Violations
🔥
HATE SPEECH
Content targeting individuals or groups based on race, religion, gender, ethnicity, or nationality. Includes coded language, slurs, and linguistic obfuscation designed to evade filters.
Direct slurs and dehumanizing language
Coded references and abbreviations (e.g. KKK)
Symbols assessed in cultural context
⚠️
THREATS & VIOLENCE
Explicit or implied threats of physical harm. Assessed using three signals: call to action, specific target or method, and timeframe. All three together means immediate escalation, as sketched after the list below.
Direct threats with target + method + time
Incitement to violence against groups
Glorification of real-world violence
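A minimal sketch of how the three-signal check composes, in Python. The signal names, routing labels, and the handling of partial signals are my own illustrative assumptions, not any platform's actual tooling:

from dataclasses import dataclass

@dataclass
class ThreatSignals:
    call_to_action: bool    # explicit "do X" language
    target_or_method: bool  # named target or described method
    timeframe: bool         # stated or implied timing

def assess_threat(s: ThreatSignals) -> str:
    # Route a reported threat by how many of the three signals are present.
    present = sum([s.call_to_action, s.target_or_method, s.timeframe])
    if present == 3:
        return "escalate"           # all three together: immediate escalation
    if present > 0:
        return "remove_and_review"  # partial signals (assumed handling): remove, weigh history
    return "standard_review"        # no concrete signals: assess against policy as usual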
🚨
CSAM
Child Sexual Abuse Material: zero tolerance, no contextual assessment. Any content sexualising minors is immediately escalated to a dedicated child safety specialist team.
Always escalated, never handled directly
No contextual exceptions apply
Specialist team with law enforcement liaison
Platform Integrity
🤖
SPAM & BOTS
Automated or coordinated inauthentic behaviour: bot accounts, bulk posting, artificial engagement, and coordinated campaigns designed to manipulate platform metrics.
Automated posting patterns
Coordinated inauthentic behaviour
Artificial engagement farming
👤
FAKE ACCOUNTS & IMPERSONATION
Accounts misrepresenting identity: impersonating real people, brands, or public figures. Required an ID verification process to confirm the reporting user's real identity before taking action.
ID verification and cross-matching
Impersonation of public figures
Fake brand or organisation accounts
📰
MISINFORMATION
Demonstrably false information presented as fact: health misinformation, election interference content, and fabricated news designed to mislead users at scale.
Health and medical misinformation
Election and civic misinformation
Manipulated media and doctored content
Platform Abuse
📢
ADVERTISING VIOLATIONS
Unauthorised or deceptive advertising: promoting products or services in violation of platform policies. Includes prohibited product categories and misleading commercial claims.
Unauthorised commercial promotion
Prohibited product advertising
Misleading or deceptive claims
🔗
THIRD PARTY TRAFFIC ABUSE
Posting content solely to drive traffic to external sites in violation of platform policies: referral farming, artificial link amplification, and traffic manipulation schemes.
Referral and affiliate link farming
Traffic manipulation to external sites
Link spam at scale
💸
ONLINE FRAUD & SCAMS
Financial deception targeting users: investment scams, giveaway fraud, impersonation scams, and phishing attempts designed to extract money or personal information.
Investment and crypto scams
Fake giveaways and prize fraud
Credential phishing attempts

SELF-HARM PROTOCOL

Self-harm content required a three-tier assessment: the same piece of content could warrant very different responses depending on intent, framing, and immediacy of risk.

⚠️ Three-Tier Assessment Framework
Tier 1 β€” Allow
Awareness & Discussion
Personal struggles, mental health conversations, support communities, and awareness content. Removing this harms the people who need support most; context and intent show this is safe to keep.
Keep – Monitor
Tier 2 β€” Remove
Promotion & Methods
Content that promotes self-harm, shares specific methods, instructs others on how to harm themselves, or actively incites someone to hurt themselves. Clear policy violation regardless of framing.
Remove Immediately
Tier 3 β€” Escalate
Immediate Risk
Active, real-time expressions of suicidal intent or imminent self-harm, where there is a credible and immediate threat to the user's life. Goes directly to the specialist welfare team for urgent intervention.
Escalate – Specialist Team
Key Judgment Factors
Specificity
Is this a general expression of pain or a specific plan with method and timeframe?
Direction
Is it inward (personal struggle) or outward (instructing or inciting others)?
Immediacy
Is this historical, hypothetical, or happening right now? Real-time = escalate.
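Read together, the three factors give a priority order: immediacy first, then direction and specificity. A minimal sketch in Python, assuming boolean judgments for each factor (the names and ordering are illustrative, not the written policy):

from enum import Enum

class Tier(Enum):
    ALLOW = 1     # Tier 1: awareness & discussion (keep, monitor)
    REMOVE = 2    # Tier 2: promotion & methods (remove immediately)
    ESCALATE = 3  # Tier 3: immediate risk (specialist welfare team)

def route_self_harm(real_time: bool, shares_methods: bool, incites_others: bool) -> Tier:
    # Immediacy dominates: real-time risk always escalates.
    if real_time:
        return Tier.ESCALATE  # active, credible risk happening now
    if shares_methods or incites_others:
        return Tier.REMOVE    # outward direction or specific methods
    return Tier.ALLOW         # inward personal struggle or support content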

ENFORCEMENT ACTIONS

Not every violation warrants the same response. Enforcement was graduated: matched to the severity of the violation, account history, and platform context.

⚡
WARN
First-time or minor violations. User is notified, content may stay up. Creates a record for future enforcement.
🗑️
REMOVE
Content taken down. Used for clear policy violations where the content itself is the problem.
⏸️
SUSPEND
Temporary account restriction. Used for repeat violations or serious single incidents requiring a cooling period.
🚫
PERMANENT BAN
Account terminated. Reserved for the most severe violations or repeat offenders with no path back.
🔺
ESCALATE
Case passed to a specialist team. Used for CSAM, credible threats, and immediate self-harm risk.
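The ladder above can be pictured as one decision function. A sketch under assumed inputs: the severity score and strike thresholds below are my own simplification of "severity, account history, and platform context", not real policy values:

def choose_action(severity: int, priors: int, specialist_case: bool) -> str:
    # Pick the lightest action that fits the violation and the account's record.
    if specialist_case:               # CSAM, credible threats, imminent self-harm risk
        return "escalate"
    if severity >= 5:                 # most severe single violations, no path back
        return "permanent_ban"
    if severity >= 3 or priors >= 2:  # serious incident or repeat offender
        return "suspend"
    if severity >= 2:                 # clear content-level violation
        return "remove"
    return "warn"                     # minor or first-time: notify, record for later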

ESCALATION FRAMEWORK

Most cases were handled directly. A small subset required escalation, where severity, legal risk, or specialist knowledge exceeded standard review. Knowing when to escalate was as important as the decision itself.

01
STANDARD REVIEW
The vast majority of cases: content reviewed against policy, action taken directly. No escalation needed. Decision documented in the moderation queue.
Hate speech · Spam · Misinformation · Advertising violations · Fraud · NSFW
Handle Directly
02
POLICY REVIEW ESCALATION
Cases where policy interpretation required additional scrutiny: high-visibility accounts, ambiguous edge cases, or decisions with significant platform-wide implications. Routed for collaborative policy review.
High-profile accounts · Ambiguous policy cases · Crisis event content · Borderline misinformation
Policy Review
03
SPECIALIST TEAM ESCALATION
Cases requiring dedicated specialist response: child safety, credible threats to life, and active self-harm risk. Passed directly to a specialist team with the appropriate training and tools to respond.
CSAM (always) · Credible imminent threat · Active self-harm risk · Real-time suicide risk
Specialist Team
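Taken together, the three levels reduce to a routing decision. A sketch with assumed category labels standing in for internal policy codes:

SPECIALIST = {"csam", "credible_imminent_threat", "active_self_harm_risk"}
POLICY_REVIEW = {"high_profile_account", "ambiguous_policy_case",
                 "crisis_event_content", "borderline_misinformation"}

def route_case(category: str) -> str:
    # Send a case to the lowest level equipped to decide it.
    if category in SPECIALIST:
        return "specialist_team"  # 03: dedicated specialists, law enforcement liaison
    if category in POLICY_REVIEW:
        return "policy_review"    # 02: collaborative policy interpretation
    return "standard_review"      # 01: handle directly in the moderation queue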
Why Escalation Judgment Matters
Knowing when not to handle a case yourself is as critical as the moderation decision itself. Under-escalating a credible threat or a CSAM case has real-world consequences. Over-escalating burdens specialist teams and slows response times for genuine emergencies. The judgment lives in between.