ABOUT
ME

Six years on the front lines of platform safety. Now on the frontier of AI — stress-testing models, evaluating outputs, and building toward a career in AI Governance.

Based New Delhi, India
Experience 6+ Years
Current Role Red Teaming Analyst · Mercor
Open To Remote · Relocation
Nipun Aggarwal
Available · Remote or Relocation
Location New Delhi, India
Languages English · Hindi
Education BA Psychology · BA History
Email nipunagarwal1212@gmail.com
NIPUN
AGGARWAL
Trust & Safety · LLM Training · Content Moderation · Community Management

I started reviewing content at scale in 2019 — flagging hate speech, tracking coordinated abuse patterns, and protecting users on social media and gaming platforms. That was my entry point into safety work: sitting with the worst of what people produce online, and making judgment calls thousands of times a day.

Six years on, I'm doing the same thing — but for AI systems. I adversarially test large language models, probing the gaps between what a model says it won't do and what it actually does when you push the right way. The instincts built from years of human content review turn out to be exactly what red teaming needs.

What makes my path unusual is the academic lens I bring. Two degrees in psychology and history mean I don't just ask how a system fails — I ask why. I think about intent, context, and the human behaviour underneath the content. That combination of operational experience and analytical thinking is the foundation of the career I'm building in AI Governance.

6+
Years in Safety
6
Companies
2
Degrees
3
Certifications
Career Path

TIMELINE

From content moderation to LLM training: every role added a new layer to how I think about safety, policy, and human behaviour online.

2026 →
RED TEAMING ANALYST
Mercor · Remote · Contract
Adversarially testing large language models to find where guardrails break. My job is to think like a bad actor — constructing prompts, personas, and scenarios that expose safety gaps before they can be exploited in the real world. I also evaluate model outputs for hallucinations, bias, and policy compliance.
Red Teaming LLM Safety Output Evaluation Adversarial Testing
2025–26
AI ANALYST
Turing · Remote · Contract
Reviewed AI-generated content for safety, quality, and policy compliance. Applied structured evaluation guidelines to identify high-risk or harmful outputs. Also contributed to data annotation work — building training datasets for Vision-Language-Action models across robotics, gameplay, and live-action sports video.
AI Evaluation RLHF Data Annotation Safety Review
2024–25
COMMUNITY COORDINATOR
Khoros · Bengaluru
Day-to-day content moderation across social media and brand communities — enforcing guidelines, managing abuse and crisis situations, and keeping user environments safe and constructive. Produced weekly reports on sentiment shifts, abuse patterns, and emerging community risks. This role sharpened my data analysis instincts around user behaviour.
Community Management Sentiment Analysis Crisis Management Reporting
2022–23
SR. TRUST & SAFETY ASSOCIATE
Tech Mahindra · Noida
Moderated user-built islands and user-generated content on one of the world's largest gaming platforms — a scale involving millions of active creators publishing their own in-game environments. Assessed context, slang, and cultural nuance to determine intent and severity. The creative sandbox environment demanded a particularly sharp eye for coded language and community-specific behaviour that doesn't read as a violation without the right cultural context.
Trust & Safety Player Safety Gaming Platforms Policy Enforcement
2021–22
SAFETY ANALYST
Cognizant · Gurugram
Reviewed user-generated content for compliance with platform policies, covering hate speech, harassment, NSFW material, and privacy violations including ID verification and data protection review. This was the role where my assessment methodology became systematic — moving from instinct to structured, repeatable process.
Content Moderation Privacy Analysis ID Verification Policy Compliance
2019–21
SR. CONTENT ANALYST
Neubotic · Delhi
Where it all started. Comprehensive review of sensitive and graphic content at scale — flagging non-compliance, exercising sound judgment on ambiguous cases, and handling content that required both policy knowledge and psychological resilience. The foundation for everything that came after.
Content Review Policy Judgment Sensitive Content
Capabilities

SKILLS &
TOOLS

A toolkit built across six years of safety operations — from human content review to adversarial AI testing.

Trust & Safety
🛡️
POLICY ENFORCEMENT
Applying platform policies to ambiguous real-world content — including cultural nuance, coded language, and intent assessment.
📊
RISK ASSESSMENT
Severity scoring, escalation judgment, and structured analysis of harm potential across violation categories (a simplified scoring sketch follows this group).
🔺
ESCALATION MANAGEMENT
Knowing when a case exceeds standard review — CSAM, credible threats, self-harm — and routing appropriately.
🧩
ABUSE PATTERN DETECTION
Identifying coordinated campaigns, bot networks, and emerging threat patterns before they scale.
📋
TREND & INSIGHT ANALYSIS
Weekly reporting on sentiment shifts, violation trends, and user behaviour signals for platform health monitoring.
🔒
PRIVACY & ID VERIFICATION
Data protection review and identity verification workflows ensuring compliance with platform and regulatory standards.
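To make the severity-scoring and escalation logic above concrete, here is a minimal Python sketch. The categories, base weights, and thresholds are invented placeholders for illustration, not any platform's real policy values.

```python
from dataclasses import dataclass

# Hypothetical base severities per violation category (illustrative only).
BASE_SEVERITY = {
    "spam": 1,
    "harassment": 3,
    "hate_speech": 4,
    "credible_threat": 5,
    "csam": 5,
}

# Categories that always leave standard review, regardless of score.
ALWAYS_ESCALATE = {"csam", "credible_threat", "self_harm"}

@dataclass
class Case:
    category: str
    is_public: bool        # public content reaches more users than private chats
    repeat_offender: bool  # prior violations raise harm potential

def severity(case: Case) -> int:
    """Combine a base severity with contextual modifiers into a 1-7 score."""
    score = BASE_SEVERITY.get(case.category, 2)
    if case.is_public:
        score += 1
    if case.repeat_offender:
        score += 1
    return min(score, 7)

def route(case: Case) -> str:
    """Decide whether a case stays in the queue or leaves standard review."""
    if case.category in ALWAYS_ESCALATE or severity(case) >= 6:
        return "escalate"
    return "standard_review"

print(route(Case("hate_speech", is_public=True, repeat_offender=True)))  # escalate
print(route(Case("spam", is_public=True, repeat_offender=False)))        # standard_review
```

The point is the shape of the decision: a base severity, contextual modifiers, and a hard override list for the categories that never wait in a queue.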
LLM & AI
⚔️
RED TEAMING
Adversarial prompt construction — jailbreak attempts, persona injection, fictionalization, rhetorical framing, and multi-step attack chains.
🔬
OUTPUT EVALUATION
Structured assessment of model responses across task completion, factual accuracy, and policy compliance (a minimal record format is sketched after this group).
🏷️
DATA ANNOTATION
Video annotation for Vision-Language-Action models — bounding boxes, action labels, keypoints across robotics and sports footage.
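As one concrete illustration of what "structured assessment" means here, a minimal Python sketch of an evaluation record. The rubric dimensions, rating scale, and model name are hypothetical stand-ins for whatever a given project defines.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical rubric dimensions; real projects define their own.
DIMENSIONS = ("task_completion", "factual_accuracy", "policy_compliance")

@dataclass
class Evaluation:
    prompt_id: str
    model: str
    scores: dict = field(default_factory=dict)  # dimension -> 1-5 rating
    hallucination: bool = False
    notes: str = ""

    def validate(self) -> None:
        """Reject ratings outside the rubric before they enter a dataset."""
        for dim, rating in self.scores.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            if not 1 <= rating <= 5:
                raise ValueError(f"rating out of range for {dim}: {rating}")

ev = Evaluation(
    prompt_id="case-0042",
    model="frontier-model-x",  # placeholder name, not a real product
    scores={"task_completion": 4, "factual_accuracy": 2, "policy_compliance": 5},
    hallucination=True,
    notes="Cites a non-existent study; otherwise on task.",
)
ev.validate()
print(json.dumps(asdict(ev), indent=2))
```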
Tools & Platforms
🤖
LLM INTERFACES
Hands-on testing across ChatGPT, Gemini, Claude, and other frontier models — for red teaming, output evaluation, and day-to-day AI workflows.
🎯
PROMPT ENGINEERING
Structured prompt design for adversarial attack construction, jailbreak chaining, and precise output elicitation across different model architectures.
🏷️
ANNOTATION TOOLS
Video and image annotation platforms used for VLA model training — bounding boxes, keypoint labelling, action segmentation across multiple data pipelines.
🗄️
SQL
Data querying for moderation reporting — pulling violation trends, volume metrics, and escalation rates from structured databases (a worked example query follows this list).
📊
GOOGLE SHEETS · EXCEL
Pivot tables, dashboards, and weekly reporting on sentiment shifts, abuse patterns, and community health metrics.
📋
MODERATION QUEUES
Worked inside high-volume internal review tools across multiple platforms — managing case queues, applying policy tags, and documenting decisions at scale.
💬
COMMUNITY PLATFORMS
Direct experience managing communities across Reddit, Discord, and Khoros — monitoring discussions, identifying abuse patterns, and maintaining platform health in real time.
📝
REPORTING & DOCUMENTATION
Writing structured weekly reports on user behaviour, violation trends, and safety incidents for cross-functional stakeholders.
🔗
WORKFLOW & COLLABORATION
Salesforce for case management and audit trails, Slack for real-time cross-functional coordination across safety, policy, and ops teams.
🔍
OSINT & RESEARCH
Open-source research for trend identification — tracking emerging slang, coded language, and new attack vectors across online communities.
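To ground the SQL card above, a small self-contained example of the kind of trend query this work involves. The schema, table name, and sample rows are invented for illustration; real moderation databases are larger and messier.

```python
import sqlite3

# Invented schema for illustration: one row per moderation decision.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    week TEXT,            -- e.g. '2025-W14'
    category TEXT,        -- violation category applied by the reviewer
    escalated INTEGER     -- 1 if the case left standard review
);
INSERT INTO cases (week, category, escalated) VALUES
    ('2025-W14', 'harassment', 0),
    ('2025-W14', 'hate_speech', 1),
    ('2025-W15', 'harassment', 0),
    ('2025-W15', 'harassment', 1);
""")

# Weekly violation volume and escalation rate per category:
# the shape of query behind a typical trend report.
rows = conn.execute("""
    SELECT week,
           category,
           COUNT(*)                 AS volume,
           ROUND(AVG(escalated), 2) AS escalation_rate
    FROM cases
    GROUP BY week, category
    ORDER BY week, volume DESC
""").fetchall()

for row in rows:
    print(row)
```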
Academic Background

EDUCATION &
CERTS

Two degrees that shaped how I think about human behaviour — and three certifications that sharpened the technical edge.

Bachelor of Arts
PSYCHOLOGY
IGNOU · Delhi
Understanding motivation, cognition, and behaviour — directly applicable to how I assess intent behind content and why people push boundaries online.
Bachelor of Arts
HISTORY
IGNOU · Delhi
Systems thinking, pattern recognition across time, and the study of how ideas — including dangerous ones — spread and evolve. A useful lens for content and AI policy work.
🤖
ChatGPT Prompt Engineering for Developers
DeepLearning.AI
2025
🗄️
SQL Essential Training
LinkedIn Learning
2025
📊
Google Sheets: Pivot Tables
Udemy
2020
How I Work

MY
APPROACH

Safety work is judgment work. These are the principles I bring to every case — whether it's a piece of harmful content or an LLM guardrail under pressure.

01
CONTEXT IS EVERYTHING
The same words mean different things depending on who says them, where, and to whom. A slur between friends in a private chat, a threat in a public forum, a coded phrase in a gaming community — the content is identical, the violation is not. I never assess words in isolation.
02
ASK WHY, NOT JUST HOW
Most safety work focuses on how a system fails. My psychology background pushes me to ask why — what's the underlying motivation, the exploit logic, the human behaviour being modelled? Understanding the intent behind a jailbreak attempt is how you build better defences against the next one.
03
CALIBRATION OVER CAUTION
Over-enforcement is a failure mode too. A model that refuses everything is as broken as one that allows everything. Good safety work means knowing exactly where the line is — and being able to justify why something sits on either side of it. Calibration is a skill, not a setting.
04
DOCUMENT EVERYTHING
A moderation decision that isn't documented didn't happen. Every case, every pattern, every escalation should leave a trace — both for accountability and for learning. The patterns that show up in reporting are how you improve policy, not just respond to incidents.
The Person Behind the Work

BEYOND
WORK

The things that inform how I think — not just what I do.

🧠
ANTHROPOLOGY & HUMAN SYSTEMS
I'm drawn to questions about why societies organise the way they do — how rules emerge, how norms enforce themselves, and how systems break down under pressure. It's the same question I ask about platforms and AI, just at a different scale.
✈️
SOLO TRAVEL
I prefer travelling alone — it forces genuine engagement with a place rather than a curated group experience. The discomfort of navigating unfamiliar environments solo is where most of the learning happens. This is also how I think about difficult content: sit with the discomfort, don't look away.
🎯
DEEP FOCUS WORK
I do my best work in long, uninterrupted blocks — not quick sprints. Content review and red teaming both reward sustained attention: the patterns you miss in the first five minutes often reveal themselves in the forty-fifth. I build my environment to protect focus, not fragment it.
⚖️
ETHICS & SYSTEMS THINKING
The questions AI raises about accountability, consent, and harm are not new — they're the oldest questions in ethics, just with new actors. I find myself drawn to the philosophical side of AI governance as much as the operational one. The two aren't separable anyway.
📚
READING WIDELY
History, behavioural economics, cognitive science, long-form journalism on technology and society. The best preparation for understanding how AI systems fail is understanding how human systems fail — and there's a long record to learn from.
🌏
GLOBAL OUTLOOK
Safety work taught me that context is always local — a gesture, phrase, or image means different things in different cultural settings. I want to work in environments that take that seriously. AI governance that doesn't account for global cultural variance isn't governance — it's assumption.
LET'S CONNECT

Open to remote roles in Trust & Safety, LLM Training, and AI Governance. If you're building safer AI systems, I'd like to help.