TRUST & SAFETY

The policy and systems layer of platform safety: the full taxonomy of what gets enforced, how severity is determined, and when a case goes beyond standard review into specialist escalation.

Experience: 4+ Years
Violation Categories: Safety · Integrity · Platform Abuse
Enforcement Actions: Warn · Remove · Suspend · Ban · Escalate

VIOLATION TAXONOMY

The full landscape of policy violations I enforced, organised by category. Each type required a different assessment lens, threshold, and enforcement response.

Safety Violations
🔥
HATE SPEECH
Content targeting individuals or groups based on race, religion, gender, ethnicity, or nationality. Includes coded language, slurs, and linguistic obfuscation designed to evade filters.
Direct slurs and dehumanizing language
Coded references and abbreviations (e.g. KKK)
Symbols assessed in cultural context
⚠️
THREATS & VIOLENCE
Explicit or implied threats of physical harm. Assessed using three signals: call to action, specific target or method, and timeframe. All three together means immediate escalation, as sketched after the list below.
Direct threats with target + method + time
Incitement to violence against groups
Glorification of real-world violence
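A minimal sketch of how the three-signal check composes, in Python. The signal names, routing labels, and the handling of partial signals are my own illustrative assumptions, not any platform's actual tooling:

from dataclasses import dataclass

@dataclass
class ThreatSignals:
    call_to_action: bool    # explicit "do X" language
    target_or_method: bool  # named target or described method
    timeframe: bool         # stated or implied timing

def assess_threat(s: ThreatSignals) -> str:
    # Route a reported threat by how many of the three signals are present.
    present = sum([s.call_to_action, s.target_or_method, s.timeframe])
    if present == 3:
        return "escalate"           # all three together: immediate escalation
    if present > 0:
        return "remove_and_review"  # partial signals (assumed handling): remove, weigh history
    return "standard_review"        # no concrete signals: assess against policy as usual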
🚨
CSAM
Child Sexual Abuse Material: zero tolerance, no contextual assessment. Any content sexualising minors is immediately escalated to a dedicated child safety specialist team.
Always escalated, never handled directly
No contextual exceptions apply
Specialist team with law enforcement liaison
Platform Integrity
🤖
SPAM & BOTS
Automated or coordinated inauthentic behaviour: bot accounts, bulk posting, artificial engagement, and coordinated campaigns designed to manipulate platform metrics.
Automated posting patterns
Coordinated inauthentic behaviour
Artificial engagement farming
👤
FAKE ACCOUNTS & IMPERSONATION
Accounts misrepresenting identity: impersonating real people, brands, or public figures. Required an ID verification process to confirm the reporting user's real identity before taking action.
ID verification and cross-matching
Impersonation of public figures
Fake brand or organisation accounts
📰
MISINFORMATION
Demonstrably false information presented as fact: health misinformation, election interference content, and fabricated news designed to mislead users at scale.
Health and medical misinformation
Election and civic misinformation
Manipulated media and doctored content
Platform Abuse
📢
ADVERTISING VIOLATIONS
Unauthorised or deceptive advertising: promoting products or services in violation of platform policies. Includes prohibited product categories and misleading commercial claims.
Unauthorised commercial promotion
Prohibited product advertising
Misleading or deceptive claims
🔗
THIRD PARTY TRAFFIC ABUSE
Posting content solely to drive traffic to external sites in violation of platform policies: referral farming, artificial link amplification, and traffic manipulation schemes.
Referral and affiliate link farming
Traffic manipulation to external sites
Link spam at scale
💸
ONLINE FRAUD & SCAMS
Financial deception targeting users: investment scams, giveaway fraud, impersonation scams, and phishing attempts designed to extract money or personal information.
Investment and crypto scams
Fake giveaways and prize fraud
Credential phishing attempts

SELF-HARM PROTOCOL

Self-harm content required a three-tier assessment: the same piece of content could warrant very different responses depending on intent, framing, and immediacy of risk.

⚠️ Three-Tier Assessment Framework
Tier 1 β€” Allow
Awareness & Discussion
Personal struggles, mental health conversations, support communities, and awareness content. Removing this harms the people who need support most; context and intent show this is safe to keep.
Keep – Monitor
Tier 2 β€” Remove
Promotion & Methods
Content that promotes self-harm, shares specific methods, instructs others on how to harm themselves, or actively incites someone to hurt themselves. Clear policy violation regardless of framing.
Remove Immediately
Tier 3 β€” Escalate
Immediate Risk
Active, real-time expressions of suicidal intent or imminent self-harm, where there is a credible and immediate threat to the user's life. Goes directly to the specialist welfare team for urgent intervention.
Escalate – Specialist Team
Key Judgment Factors
Specificity
Is this a general expression of pain or a specific plan with method and timeframe?
Direction
Is it inward (personal struggle) or outward (instructing or inciting others)?
Immediacy
Is this historical, hypothetical, or happening right now? Real-time = escalate.
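Read together, the three factors give a priority order: immediacy first, then direction and specificity. A minimal sketch in Python, assuming boolean judgments for each factor (the names and ordering are illustrative, not the written policy):

from enum import Enum

class Tier(Enum):
    ALLOW = 1     # Tier 1: awareness & discussion (keep, monitor)
    REMOVE = 2    # Tier 2: promotion & methods (remove immediately)
    ESCALATE = 3  # Tier 3: immediate risk (specialist welfare team)

def route_self_harm(real_time: bool, shares_methods: bool, incites_others: bool) -> Tier:
    # Immediacy dominates: real-time risk always escalates.
    if real_time:
        return Tier.ESCALATE  # active, credible risk happening now
    if shares_methods or incites_others:
        return Tier.REMOVE    # outward direction or specific methods
    return Tier.ALLOW         # inward personal struggle or support content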

ENFORCEMENT ACTIONS

Not every violation warrants the same response. Enforcement was graduated: matched to the severity of the violation, account history, and platform context.

⚡
WARN
First-time or minor violations. User is notified, content may stay up. Creates a record for future enforcement.
🗑️
REMOVE
Content taken down. Used for clear policy violations where the content itself is the problem.
⏸️
SUSPEND
Temporary account restriction. Used for repeat violations or serious single incidents requiring a cooling period.
🚫
PERMANENT BAN
Account terminated. Reserved for the most severe violations or repeat offenders with no path back.
🔺
ESCALATE
Case passed to a specialist team. Used for CSAM, credible threats, and immediate self-harm risk.
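The ladder above can be pictured as one decision function. A sketch under assumed inputs: the severity score and strike thresholds below are my own simplification of "severity, account history, and platform context", not real policy values:

def choose_action(severity: int, priors: int, specialist_case: bool) -> str:
    # Pick the lightest action that fits the violation and the account's record.
    if specialist_case:               # CSAM, credible threats, imminent self-harm risk
        return "escalate"
    if severity >= 5:                 # most severe single violations, no path back
        return "permanent_ban"
    if severity >= 3 or priors >= 2:  # serious incident or repeat offender
        return "suspend"
    if severity >= 2:                 # clear content-level violation
        return "remove"
    return "warn"                     # minor or first-time: notify, record for later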

ESCALATION FRAMEWORK

Most cases were handled directly. A small subset required escalation, where severity, legal risk, or specialist knowledge exceeded standard review. Knowing when to escalate was as important as the decision itself.

01
STANDARD REVIEW
The vast majority of cases: content reviewed against policy, action taken directly. No escalation needed. Decision documented in the moderation queue.
Hate speech · Spam · Misinformation · Advertising violations · Fraud · NSFW
Handle Directly
02
POLICY REVIEW ESCALATION
Cases where policy interpretation required additional scrutiny: high-visibility accounts, ambiguous edge cases, or decisions with significant platform-wide implications. Routed for collaborative policy review.
High-profile accounts · Ambiguous policy cases · Crisis event content · Borderline misinformation
Policy Review
03
SPECIALIST TEAM ESCALATION
Cases requiring dedicated specialist response: child safety, credible threats to life, and active self-harm risk. Passed directly to a specialist team with the appropriate training and tools to respond.
CSAM (always) · Credible imminent threat · Active self-harm risk · Real-time suicide risk
Specialist Team
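Taken together, the three levels reduce to a routing decision. A sketch with assumed category labels standing in for internal policy codes:

SPECIALIST = {"csam", "credible_imminent_threat", "active_self_harm_risk"}
POLICY_REVIEW = {"high_profile_account", "ambiguous_policy_case",
                 "crisis_event_content", "borderline_misinformation"}

def route_case(category: str) -> str:
    # Send a case to the lowest level equipped to decide it.
    if category in SPECIALIST:
        return "specialist_team"  # 03: dedicated specialists, law enforcement liaison
    if category in POLICY_REVIEW:
        return "policy_review"    # 02: collaborative policy interpretation
    return "standard_review"      # 01: handle directly in the moderation queue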
Why Escalation Judgment Matters
Knowing when not to handle a case yourself is as critical as the moderation decision itself. Under-escalating a credible threat or a CSAM case has real-world consequences. Over-escalating burdens specialist teams and slows response times for genuine emergencies. The judgment lives in between.