ABOUT
ME

Six years on the front lines of platform safety. Now on the frontier of AI — stress-testing models, evaluating outputs, and building toward a career in AI Governance.

Based New Delhi, India
Experience 6+ Years
Current Role Red Teaming Analyst · Mercor
Open To Remote · Relocation
Nipun Aggarwal
Available · Remote or Relocation
Location New Delhi, India
Languages English · Hindi
Education BA Psychology · BA History
Email nipunagarwal1212@gmail.com
NIPUN
AGGARWAL
Trust & Safety · LLM Training · Content Moderation · Community Management

I started reviewing content at scale in 2019 — flagging hate speech, tracking coordinated abuse patterns, and protecting users on social media and gaming platforms. That was my entry point into safety work: sitting with the worst of what people produce online, and making judgment calls thousands of times a day.

Six years on, I'm doing the same thing — but for AI systems. I adversarially test large language models, probing the gaps between what a model says it won't do and what it actually does when you push the right way. The instincts built from years of human content review turn out to be exactly what red teaming needs.

What makes my path unusual is the academic lens I bring. Two degrees in psychology and history mean I don't just ask how a system fails — I ask why. I think about intent, context, and the human behaviour underneath the content. That combination of operational experience and analytical thinking is the foundation of the career I'm building in AI Governance.

6+
Years in Safety
6
Companies
2
Degrees
3
Certifications
Career Path

TIMELINE

From content moderation to LLM training: every role added a new layer to how I think about safety, policy, and human behaviour online.

2026 →
RED TEAMING ANALYST
Mercor · Remote · Contract
Adversarially testing large language models to find where guardrails break. My job is to think like a bad actor — constructing prompts, personas, and scenarios that expose safety gaps before they can be exploited in the real world. I also evaluate model outputs for hallucinations, bias, and policy compliance.
Red Teaming LLM Safety Output Evaluation Adversarial Testing
2025–26
AI ANALYST
Turing · Remote · Contract
Reviewed AI-generated content for safety, quality, and policy compliance. Applied structured evaluation guidelines to identify high-risk or harmful outputs. Also contributed to data annotation work — building training datasets for Vision-Language-Action models across robotics, gameplay, and live-action sports video.
AI Evaluation RLHF Data Annotation Safety Review
2024–25
COMMUNITY COORDINATOR
Khoros · Bengaluru
Day-to-day content moderation across social media and brand communities — enforcing guidelines, managing abuse and crisis situations, and keeping user environments safe and constructive. Produced weekly reports on sentiment shifts, abuse patterns, and emerging community risks. This role sharpened my data analysis instincts around user behaviour.
Community Management Sentiment Analysis Crisis Management Reporting
2022–23
SR. TRUST & SAFETY ASSOCIATE
Tech Mahindra · Noida
Moderated user-built islands and user-generated content on one of the world's largest gaming platforms — a scale involving millions of active creators publishing their own in-game environments. Assessed context, slang, and cultural nuance to determine intent and severity. The creative sandbox environment demanded a particularly sharp eye for coded language and community-specific behaviour that doesn't read as a violation without the right cultural context.
Trust & Safety Player Safety Gaming Platforms Policy Enforcement
2021–22
SAFETY ANALYST
Cognizant · Gurugram
Reviewed user-generated content for compliance with platform policies, covering hate speech, harassment, NSFW material, and privacy violations including ID verification and data protection review. This was the role where my assessment methodology became systematic — moving from instinct to structured, repeatable process.
Content Moderation Privacy Analysis ID Verification Policy Compliance
2019–21
SR. CONTENT ANALYST
Neubotic · Delhi
Where it all started. Comprehensive review of sensitive and graphic content at scale — flagging non-compliance, exercising sound judgment on ambiguous cases, and handling content that required both policy knowledge and psychological resilience. The foundation for everything that came after.
Content Review Policy Judgment Sensitive Content
Capabilities

SKILLS &
TOOLS

A toolkit built across six years of safety operations — from human content review to adversarial AI testing.

Trust & Safety
🛡️
POLICY ENFORCEMENT
Applying platform policies to ambiguous real-world content — including cultural nuance, coded language, and intent assessment.
📊
RISK ASSESSMENT
Severity scoring, escalation judgment, and structured analysis of harm potential across violation categories (a simplified scoring sketch follows this group).
🔺
ESCALATION MANAGEMENT
Knowing when a case exceeds standard review — CSAM, credible threats, self-harm — and routing appropriately.
🧩
ABUSE PATTERN DETECTION
Identifying coordinated campaigns, bot networks, and emerging threat patterns before they scale.
📋
TREND & INSIGHT ANALYSIS
Weekly reporting on sentiment shifts, violation trends, and user behaviour signals for platform health monitoring.
🔒
PRIVACY & ID VERIFICATION
Data protection review and identity verification workflows ensuring compliance with platform and regulatory standards.
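To make the severity-scoring and escalation logic above concrete, here is a minimal Python sketch. The categories, base weights, and thresholds are invented placeholders for illustration, not any platform's real policy values.

```python
from dataclasses import dataclass

# Hypothetical base severities per violation category (illustrative only).
BASE_SEVERITY = {
    "spam": 1,
    "harassment": 3,
    "hate_speech": 4,
    "credible_threat": 5,
    "csam": 5,
}

# Categories that always leave standard review, regardless of score.
ALWAYS_ESCALATE = {"csam", "credible_threat", "self_harm"}

@dataclass
class Case:
    category: str
    is_public: bool        # public content reaches more users than private chats
    repeat_offender: bool  # prior violations raise harm potential

def severity(case: Case) -> int:
    """Combine a base severity with contextual modifiers into a 1-7 score."""
    score = BASE_SEVERITY.get(case.category, 2)
    if case.is_public:
        score += 1
    if case.repeat_offender:
        score += 1
    return min(score, 7)

def route(case: Case) -> str:
    """Decide whether a case stays in the queue or leaves standard review."""
    if case.category in ALWAYS_ESCALATE or severity(case) >= 6:
        return "escalate"
    return "standard_review"

print(route(Case("hate_speech", is_public=True, repeat_offender=True)))  # escalate
print(route(Case("spam", is_public=True, repeat_offender=False)))        # standard_review
```

The point is the shape of the decision: a base severity, contextual modifiers, and a hard override list for the categories that never wait in a queue.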
LLM & AI
⚔️
RED TEAMING
Adversarial prompt construction — jailbreak attempts, persona injection, fictionalization, rhetorical framing, and multi-step attack chains.
🔬
OUTPUT EVALUATION
Structured assessment of model responses across task completion, factual accuracy, and policy compliance (a minimal record format is sketched after this group).
🏷️
DATA ANNOTATION
Video annotation for Vision-Language-Action models — bounding boxes, action labels, keypoints across robotics and sports footage.
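As one concrete illustration of what "structured assessment" means here, a minimal Python sketch of an evaluation record. The rubric dimensions, rating scale, and model name are hypothetical stand-ins for whatever a given project defines.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical rubric dimensions; real projects define their own.
DIMENSIONS = ("task_completion", "factual_accuracy", "policy_compliance")

@dataclass
class Evaluation:
    prompt_id: str
    model: str
    scores: dict = field(default_factory=dict)  # dimension -> 1-5 rating
    hallucination: bool = False
    notes: str = ""

    def validate(self) -> None:
        """Reject ratings outside the rubric before they enter a dataset."""
        for dim, rating in self.scores.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            if not 1 <= rating <= 5:
                raise ValueError(f"rating out of range for {dim}: {rating}")

ev = Evaluation(
    prompt_id="case-0042",
    model="frontier-model-x",  # placeholder name, not a real product
    scores={"task_completion": 4, "factual_accuracy": 2, "policy_compliance": 5},
    hallucination=True,
    notes="Cites a non-existent study; otherwise on task.",
)
ev.validate()
print(json.dumps(asdict(ev), indent=2))
```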
Tools & Platforms
🤖
LLM INTERFACES
Hands-on testing across ChatGPT, Gemini, Claude, and other frontier models — for red teaming, output evaluation, and day-to-day AI workflows.
🎯
PROMPT ENGINEERING
Structured prompt design for adversarial attack construction, jailbreak chaining, and precise output elicitation across different model architectures.
🏷️
ANNOTATION TOOLS
Video and image annotation platforms used for VLA model training — bounding boxes, keypoint labelling, action segmentation across multiple data pipelines.
🗄️
SQL
Data querying for moderation reporting — pulling violation trends, volume metrics, and escalation rates from structured databases (a worked example query follows this list).
📊
GOOGLE SHEETS · EXCEL
Pivot tables, dashboards, and weekly reporting on sentiment shifts, abuse patterns, and community health metrics.
📋
MODERATION QUEUES
Worked inside high-volume internal review tools across multiple platforms — managing case queues, applying policy tags, and documenting decisions at scale.
💬
COMMUNITY PLATFORMS
Direct experience managing communities across Reddit, Discord, and Khoros — monitoring discussions, identifying abuse patterns, and maintaining platform health in real time.
📝
REPORTING & DOCUMENTATION
Writing structured weekly reports on user behaviour, violation trends, and safety incidents for cross-functional stakeholders.
🔗
WORKFLOW & COLLABORATION
Salesforce for case management and audit trails, Slack for real-time cross-functional coordination across safety, policy, and ops teams.
🔍
OSINT & RESEARCH
Open-source research for trend identification — tracking emerging slang, coded language, and new attack vectors across online communities.
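To ground the SQL card above, a small self-contained example of the kind of trend query this work involves. The schema, table name, and sample rows are invented for illustration; real moderation databases are larger and messier.

```python
import sqlite3

# Invented schema for illustration: one row per moderation decision.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    week TEXT,            -- e.g. '2025-W14'
    category TEXT,        -- violation category applied by the reviewer
    escalated INTEGER     -- 1 if the case left standard review
);
INSERT INTO cases (week, category, escalated) VALUES
    ('2025-W14', 'harassment', 0),
    ('2025-W14', 'hate_speech', 1),
    ('2025-W15', 'harassment', 0),
    ('2025-W15', 'harassment', 1);
""")

# Weekly violation volume and escalation rate per category:
# the shape of query behind a typical trend report.
rows = conn.execute("""
    SELECT week,
           category,
           COUNT(*)                 AS volume,
           ROUND(AVG(escalated), 2) AS escalation_rate
    FROM cases
    GROUP BY week, category
    ORDER BY week, volume DESC
""").fetchall()

for row in rows:
    print(row)
```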
Academic Background

EDUCATION &
CERTS

Two degrees that shaped how I think about human behaviour — and three certifications that sharpened the technical edge.

Bachelor of Arts
PSYCHOLOGY
IGNOU · Delhi
Understanding motivation, cognition, and behaviour — directly applicable to how I assess intent behind content and why people push boundaries online.
Bachelor of Arts
HISTORY
IGNOU · Delhi
Systems thinking, pattern recognition across time, and the study of how ideas — including dangerous ones — spread and evolve. A useful lens for content and AI policy work.
🤖
ChatGPT Prompt Engineering for Developers
DeepLearning.AI
2025
🗄️
SQL Essential Training
LinkedIn Learning
2025
📊
Google Sheets: Pivot Tables
Udemy
2020
How I Work

MY
APPROACH

Safety work is judgment work. These are the principles I bring to every case — whether it's a piece of harmful content or an LLM guardrail under pressure.

01
CONTEXT IS EVERYTHING
The same words mean different things depending on who says them, where, and to whom. A slur between friends in a private chat, a threat in a public forum, a coded phrase in a gaming community — the content is identical, the violation is not. I never assess words in isolation.
02
ASK WHY, NOT JUST HOW
Most safety work focuses on how a system fails. My psychology background pushes me to ask why — what's the underlying motivation, the exploit logic, the human behaviour being modelled? Understanding the intent behind a jailbreak attempt is how you build better defences against the next one.
03
CALIBRATION OVER CAUTION
Over-enforcement is a failure mode too. A model that refuses everything is as broken as one that allows everything. Good safety work means knowing exactly where the line is — and being able to justify why something sits on either side of it. Calibration is a skill, not a setting.
04
DOCUMENT EVERYTHING
A moderation decision that isn't documented didn't happen. Every case, every pattern, every escalation should leave a trace — both for accountability and for learning. The patterns that show up in reporting are how you improve policy, not just respond to incidents.
The Person Behind the Work

BEYOND
WORK

The things that inform how I think — not just what I do.

🧠
ANTHROPOLOGY & HUMAN SYSTEMS
I'm drawn to questions about why societies organise the way they do — how rules emerge, how norms enforce themselves, and how systems break down under pressure. It's the same question I ask about platforms and AI, just at a different scale.
✈️
SOLO TRAVEL
I prefer travelling alone — it forces genuine engagement with a place rather than a curated group experience. The discomfort of navigating unfamiliar environments solo is where most of the learning happens. This is also how I think about difficult content: sit with the discomfort, don't look away.
🎯
DEEP FOCUS WORK
I do my best work in long, uninterrupted blocks — not quick sprints. Content review and red teaming both reward sustained attention: the patterns you miss in the first five minutes often reveal themselves in the forty-fifth. I build my environment to protect focus, not fragment it.
⚖️
ETHICS & SYSTEMS THINKING
The questions AI raises about accountability, consent, and harm are not new — they're the oldest questions in ethics, just with new actors. I find myself drawn to the philosophical side of AI governance as much as the operational one. The two aren't separable anyway.
📚
READING WIDELY
History, behavioural economics, cognitive science, long-form journalism on technology and society. The best preparation for understanding how AI systems fail is understanding how human systems fail — and there's a long record to learn from.
🌏
GLOBAL OUTLOOK
Safety work taught me that context is always local — a gesture, phrase, or image means different things in different cultural settings. I want to work in environments that take that seriously. AI governance that doesn't account for global cultural variance isn't governance — it's assumption.
LET'S CONNECT

Open to remote roles in Trust & Safety, LLM Training, and AI Governance. If you're building safer AI systems, I'd like to help.