Real work across real platforms. Click any domain to explore case studies, methodology, and examples.
End-to-end contribution to large language model development - from building high-quality training datasets through video annotation, to evaluating model outputs, to adversarially stress-testing safety guardrails.
Policy enforcement, escalation management, and platform integrity across Meta, Twitter, and gaming ecosystems.
High-volume, high-stakes content review spanning hate speech, violence, CSAM, and other policy violations.
Sentiment analysis, crisis detection, and abuse pattern reporting for brand communities and platform health.
Case studies, examples, and methodology from across my domains.
Adversarial prompting case study - identifying jailbreak chains and persona override attacks on production LLMs.
Decision framework and edge case examples - how context, intent, and severity determine enforcement actions.
Frame-by-frame action labeling for robotic manipulation - building training data for next-generation embodied AI.
Structured evaluation rubrics for assessing safety, accuracy, helpfulness, and policy compliance of AI-generated content.
How I assess flagged content - context, intent, severity, and action. A structured approach built over 4+ years.
Weekly community health reporting - identifying abuse patterns, sentiment shifts, and crisis events before they escalate.

Started reviewing content at scale in 2019. Six years later, I adversarially test large language models - finding the gaps before bad actors do. I bring a rare combination: the instincts of someone who has reviewed thousands of real harmful cases, and the technical curiosity to understand how AI systems fail.
My background in psychology and history gives me an unusual lens - I think about why systems fail, not just how. I'm building toward a career in AI Governance and Responsible AI.
Open to remote roles in Trust & Safety, LLM Training, and AI Governance. If you're building safer AI, I'd like to help.