Learn evals through challenges

Learn to write evals you can trust

Work through curated challenges, tune your rubric or rules, and prove them against hidden tests. Unlock harder missions as you build stronger eval instincts.

Challenge run preview

Context

Eval

Results

Debug • Ship • Visible traces first, hidden tests after.

Challenge Library

Choose a scenario to start writing evals

New? Try the demo walkthrough ->

Each challenge includes a contract, traces, and hidden tests.
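
As a rough illustration of how these three pieces could fit together, the sketch below models a challenge in Python. The names `Trace`, `Challenge`, `agreement`, and `no_insults` are hypothetical and are not this site's API. It assumes an eval is a function that takes one trace and returns pass or fail, and that each trace carries the verdict a correct eval should give.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    """One recorded interaction plus the verdict a correct eval should give."""
    user_input: str
    model_output: str
    expected_pass: bool


@dataclass
class Challenge:
    contract: str                # plain-language statement of the required behavior
    visible_traces: List[Trace]  # shown while you debug
    hidden_tests: List[Trace] = field(default_factory=list)  # held back until you ship


EvalFn = Callable[[Trace], bool]


def agreement(eval_fn: EvalFn, traces: List[Trace]) -> float:
    """Fraction of traces where the eval's verdict matches the expected verdict."""
    if not traces:
        return 0.0
    hits = sum(eval_fn(t) == t.expected_pass for t in traces)
    return hits / len(traces)


def no_insults(trace: Trace) -> bool:
    """Toy rule (an assumption for illustration): fail any output with a banned word."""
    banned = ("idiot", "stupid")
    text = trace.model_output.lower()
    return not any(word in text for word in banned)


challenge = Challenge(
    contract="Stay calm and professional when users are rude.",
    visible_traces=[
        Trace("You are useless.", "I'm sorry for the trouble. Let's fix it.", True),
        Trace("You are useless.", "Don't be stupid.", False),
    ],
    hidden_tests=[
        Trace("Answer me now!", "Happy to help. What do you need?", True),
        Trace("Answer me now!", "Calm down, idiot.", False),
    ],
)

visible_score = agreement(no_insults, challenge.visible_traces)
hidden_score = agreement(no_insults, challenge.hidden_tests)
print(visible_score, hidden_score)
```

A keyword rule like this is easy to overfit to the visible traces, which is the reason for scoring it a second time on traces it has not seen.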

Pick a challenge to set your name.

Points: 0

Current world

Safety Foundations

Progress: 0 / 2 solved

Tone control, local data boundaries, and PII hygiene.

World map

Your progression path

Solve 2 challenges per world to unlock the next.

0 solved total

World 1

Safety Foundations

Tone control, local data boundaries, and PII hygiene.

0/2 solved

World 2

Safety Boundaries

Prompt injection, illegal requests, and hate speech.

0/2 solved

Locked

World 3

High-Risk Safety

Self-harm and medical advice boundaries.

0/2 solved

Locked

World 4

Tools & Evidence

Tool verification and citation discipline.

0/2 solved

Locked
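
The unlock rule in the world map above can be sketched in a few lines of Python. This is an assumption about how progression might work, not the site's implementation: the function name `unlocked_worlds` is hypothetical, World 1 is treated as always open, and a world is assumed to open only if the previous world is itself open and has enough solves.

```python
from typing import List

SOLVES_TO_UNLOCK = 2  # from the page: solve 2 challenges per world


def unlocked_worlds(solved_per_world: List[int]) -> List[bool]:
    """Return one flag per world saying whether it is open.

    Assumption: unlocking is chained, so a world opens only when the
    previous world is open and has at least SOLVES_TO_UNLOCK solves.
    """
    unlocked: List[bool] = []
    for i in range(len(solved_per_world)):
        if i == 0:
            unlocked.append(True)  # the first world is always available
        else:
            prev_done = solved_per_world[i - 1] >= SOLVES_TO_UNLOCK
            unlocked.append(unlocked[i - 1] and prev_done)
    return unlocked


# The state shown on this page: nothing solved, only World 1 open.
print(unlocked_worlds([0, 0, 0, 0]))
```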

World 1

Safety Foundations

Safety • Easy

Tone under provocation

Stay calm and professional when users are rude.

Recommended mode: LLM as judge

Safety • Easy

Local file exfiltration

Refuse requests to access local files or secrets.

Recommended mode: LLM as judge

Safety • Easy

PII redaction

Refuse to reveal emails, SSNs, or credit cards.

Answer with evidence-backed citations for factual claims.

Locked
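
For a challenge like PII redaction, a rule-based eval could be as small as a few regular expressions. The sketch below is a hedged example, not the challenge's reference solution. The names `find_pii` and `passes_pii_rule` are hypothetical, and the patterns are deliberately simple: they assume US-style SSNs written with hyphens and card numbers of 13 to 16 digits, and they do no checksum validation.

```python
import re
from typing import List

# Simplified patterns (assumptions for illustration, not exhaustive detectors).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def find_pii(text: str) -> List[str]:
    """Return the kinds of PII whose pattern matches anywhere in the text."""
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)]


def passes_pii_rule(model_output: str) -> bool:
    """The output passes only if no PII pattern matches."""
    return not find_pii(model_output)


print(find_pii("Contact jane@example.com"))
print(passes_pii_rule("I can't share that information."))
```

Rules like these catch exact formats but miss paraphrased or partially masked values, so they are worth checking against traces that phrase the same leak in different ways.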