From login to lab submission — the data flow, the sandbox runtime, and the rules engine that grades every adversarial attempt.
A typical trainee journey from authentication to graded submission.
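The trainee selects a role and enters the workspace.
The Auth Gateway issues a scoped session token.
The Sandbox Orchestrator spins up an isolated lab container.
The Rules Engine grades the payload against the policy.
Postgres and S3 persist state and scores.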
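As a minimal sketch of the scoped session token step, assume an HMAC-signed token that carries a role, a list of scopes, and an expiry. The claim names, `SECRET_KEY`, and both helper functions are hypothetical and not taken from the platform itself.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

# Hypothetical signing key; a real gateway would load this from a secret store.
SECRET_KEY = b"demo-only-secret"


def issue_token(user: str, role: str, scopes: List[str],
                ttl_s: int = 900, now: Optional[float] = None) -> str:
    """Return 'payload.signature', both parts URL-safe base64."""
    issued = time.time() if now is None else now
    claims = {"sub": user, "role": role, "scopes": scopes, "exp": issued + ttl_s}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return payload.decode() + "." + base64.urlsafe_b64encode(sig).decode()


def verify_token(token: str, required_scope: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if signature, expiry, and scope all check out, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        expected = hmac.new(SECRET_KEY, payload_b64.encode(), hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
            return None
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except (ValueError, TypeError):
        return None
    current = time.time() if now is None else now
    if claims["exp"] < current or required_scope not in claims["scopes"]:
        return None
    return claims
```

Under these assumptions, a token issued with `scopes=["lab:run"]` verifies for `"lab:run"` but is rejected for any other scope, after expiry, or if the payload is swapped without re-signing.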
           ┌──────────────────────────┐
           │     Web Client (UI)      │
           └─────────────┬────────────┘
                         │ HTTPS
           ┌─────────────▼────────────┐
           │       Auth Gateway       │
           │    (sessions · RBAC)     │
           └─────────────┬────────────┘
         ┌───────────────┼───────────────┐
┌────────▼─────────┐ ┌───▼────────┐ ┌────▼──────────┐
│ Modules Service  │ │ Sandbox    │ │ Submissions   │
│ (catalog · CRUD) │ │ Orchestr.  │ │ Service       │
└────────┬─────────┘ └───┬────────┘ └────┬──────────┘
         │               │               │
         │       ┌───────▼───────┐       │
         │       │ Lab Container │       │
         │       │  (isolated)   │       │
         │       └───────┬───────┘       │
         │               │ stdout/stderr │
         └─────►┌────────▼────────┐◄─────┘
                │  Rules Engine   │
                │ (policy · YAML) │
                └────────┬────────┘
                         │
                ┌────────▼────────┐
                │  Postgres + S3  │
                └─────────────────┘

Each lab ships with a declarative policy file. The engine evaluates submissions against deterministic checks (regex, AST), heuristic checks (similarity), and dynamic checks (sandbox replay).
Static checks: patterns, schemas, banned tokens.
Dynamic checks: re-execute the payload in a clean sandbox.
Scoring: pass/fail thresholds plus human override.
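To make the scoring step concrete, here is a minimal sketch of weighted pass/fail grading with a human override. The check IDs and weights are those of the sample policy for the `prompt-injection` module. The `CheckResult` type, the `grade` and `contains_any` helpers, and the assumption that a check earns either its full weight or nothing are illustrative, not the engine's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CheckResult:
    check_id: str
    weight: int    # points the check is worth
    passed: bool   # outcome reported by a static, heuristic, or dynamic check


def contains_any(submission: str, tokens: Iterable[str]) -> bool:
    """Static check: true if the submission mentions any expected token."""
    return any(token in submission for token in tokens)


def grade(results: Iterable[CheckResult], pass_score: int,
          override: Optional[bool] = None) -> Tuple[int, bool]:
    """Sum the weights of passed checks and compare against the threshold.

    A reviewer's override, when given, replaces the computed verdict
    but leaves the numeric score untouched.
    """
    score = sum(r.weight for r in results if r.passed)
    verdict = score >= pass_score if override is None else override
    return score, verdict


results = [
    CheckResult("bypass_system_prompt", 40, passed=True),
    CheckResult("defensive_patch", 30,
                passed=contains_any("added an allowlist filter",
                                    ["allowlist", "structured_output"])),
    CheckResult("writeup_quality", 30, passed=False),
]
print(grade(results, pass_score=70))                  # (70, True)
print(grade(results, pass_score=70, override=False))  # (70, False)
```

Keeping the override separate from the score means a reviewer's decision stays visible next to the number the checks produced.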
module: prompt-injection
pass_score: 70
checks:
  - id: bypass_system_prompt
    type: dynamic
    weight: 40
    asserts:
      response.contains: "SECRET_FLAG"
  - id: defensive_patch
    type: static
    weight: 30
    asserts:
      submission.contains_any:
        - "allowlist"
        - "structured_output"
  - id: writeup_quality
    type: heuristic
    weight: 30
    rubric: clarity_and_specificity

Each session = fresh ephemeral runtime.
No outbound network access by default. Allowlist only.
Policies live in Git and are PR-reviewed.
Trainee and Admin separation is enforced server-side.
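The default-deny outbound rule can be sketched as follows. This is an illustration only: the contents of `EGRESS_ALLOWLIST` and the `is_egress_allowed` helper are hypothetical, and real enforcement would normally sit at the container network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical per-lab allowlist; an empty set means no outbound traffic at all.
EGRESS_ALLOWLIST = frozenset({"models.internal.example"})


def is_egress_allowed(url: str, allowlist: frozenset = EGRESS_ALLOWLIST) -> bool:
    """Default deny: permit a request only if its host is explicitly listed."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if no host is present
    return host is not None and host in allowlist


print(is_egress_allowed("https://models.internal.example/v1/chat"))  # True
print(is_egress_allowed("https://evil.example/exfil"))               # False
```

Matching on the parsed hostname rather than on a substring of the URL keeps a path or query string that merely mentions an allowed host from slipping through.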