Config Overview
ASSERT uses one YAML file (commonly eval_config.yaml) to define what to test and how to run the pipeline.
Mental model
Your config declares:
- Behavior specification (behavior.name, behavior.description)
- Target/system context (context)
- Pipeline stages (pipeline.systematize, pipeline.test_set, pipeline.inference, pipeline.judge)
Pipeline stages always run in the same fixed order:
systematize -> test_set -> inference -> judge
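A minimal sketch of what the fixed order implies: the order of keys under pipeline in the YAML file does not change the order in which stages run. The helper name `ordered_stages` is hypothetical. Rejecting unknown stage names and allowing a stage to be omitted are assumptions made for illustration, not documented ASSERT behavior.

```python
# Stage names in the fixed order stated above.
STAGE_ORDER = ("systematize", "test_set", "inference", "judge")


def ordered_stages(pipeline: dict) -> list[str]:
    """Return the configured stages in execution order, ignoring YAML key order."""
    # Assumption: a stage name outside the four known ones is a config error.
    unknown = set(pipeline) - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown pipeline stages: {sorted(unknown)}")
    return [stage for stage in STAGE_ORDER if stage in pipeline]


# Keys are deliberately listed out of order here.
print(ordered_stages({"judge": {}, "inference": {}, "systematize": {}, "test_set": {}}))
# ['systematize', 'test_set', 'inference', 'judge']
```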
Top-level sections
- suite: suite id for shared artifacts
- run: run id under suite
- behavior: evaluation behavior name and description
- context: system and constraints description
- default_model: optional stage model fallback
- pipeline: stage configuration
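The section list can be read as a small schema. The sketch below checks an already-parsed config (a plain dict) for missing sections and shows one way a default_model fallback could resolve. Several things here are assumptions rather than documented ASSERT behavior: that every section except default_model is required, that default_model has the same shape as a stage's model block, and that a stage-level model takes precedence. The function names are hypothetical.

```python
# Assumption: only default_model is optional; the source marks no other section as optional.
REQUIRED_SECTIONS = ("suite", "run", "behavior", "context", "pipeline")


def missing_sections(config: dict) -> list[str]:
    """List required top-level sections absent from the parsed config."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


def resolve_stage_model(config: dict, stage: str):
    """Return the stage's own model if set, otherwise default_model (or None)."""
    stage_cfg = config.get("pipeline", {}).get(stage) or {}
    return stage_cfg.get("model") or config.get("default_model")


config = {
    "suite": "support-agent-v1",
    "run": "run-1",
    "behavior": {"name": "support_quality", "description": "..."},
    "context": "Customer support agent with order and refund tools.",
    "default_model": {"name": "azure/gpt-4o-mini"},  # assumed shape
    "pipeline": {"systematize": {}, "judge": {"model": {"name": "azure/gpt-4o"}}},
}

print(missing_sections(config))                     # []
print(resolve_stage_model(config, "systematize"))   # falls back to default_model
print(resolve_stage_model(config, "judge"))         # uses the stage-level model
```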
Minimal YAML configuration:
suite: support-agent-v1
run: run-1
behavior:
  name: support_quality
  description: |
    Evaluate policy adherence and grounding behavior.
context: |
  Customer support agent with order and refund tools.
pipeline:
  systematize:
    model:
      name: azure/gpt-4o-mini
  test_set:
    prompt:
      sample_size: 40
    scenario:
      sample_size: 20
    model:
      name: azure/gpt-4o-mini
  inference:
    target:
      callable: my_package.agent:chat_sync
    trace:
      backend: phoenix
      group_by: session.id
  judge:
    model:
      name: azure/gpt-4o-mini
    dimensions:
      policy_violation:
        description: Did the target violate requirements?
        rubric: |
          true = violation observed
          false = no violation observed
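The callable value in the configuration above, my_package.agent:chat_sync, has the common Python `module:attribute` shape. How ASSERT actually loads it is not described here, so the following is only a sketch of how such a reference is typically resolved with the standard library. `load_callable` is a hypothetical helper, and `json:dumps` stands in for the target so the example runs without `my_package`.

```python
import importlib


def load_callable(reference: str):
    """Resolve a 'module.path:attribute' string to a callable object."""
    module_name, sep, attr_path = reference.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes so 'pkg.mod:Class.method' also resolves.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    if not callable(obj):
        raise TypeError(f"{reference!r} does not point to a callable")
    return obj


# Stand-in for 'my_package.agent:chat_sync', using a stdlib function.
dumps = load_callable("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```

Resolving the reference this way before a run is a cheap check that the module path and function name in the config are importable.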