Results and artifacts

ASSERT writes local artifacts and evaluation results under the artifacts folder, grouped by evaluation suite (the suite is set in each evaluation config YAML):

artifacts/results/<suite>/

Run-level outputs are located under each evaluation suite:

artifacts/results/<suite>/<run>/

Artifact layout and description

artifacts/results/<suite>/
├── suite.json
├── taxonomy.json
├── test_set.jsonl
└── <run>/
    ├── manifest.json
    ├── config.yaml
    ├── inference_set.jsonl
    ├── scores.jsonl
    └── metrics.json
  • suite.json: evaluation suite metadata
  • taxonomy.json: behavior categories generated from your evaluation config YAML in the systematization step of the pipeline
  • test_set.jsonl: single-turn prompt and multi-turn scenario test cases generated by the test set generation step of the pipeline
  • manifest.json: stage-by-stage run status and timestamps
  • config.yaml: frozen config snapshot used for this run
  • inference_set.jsonl: target outputs plus trace references/events
  • scores.jsonl: per-case judge verdicts, dimensions, and evidence
  • metrics.json: aggregate rates by dimension and category, along with token usage metadata
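
As a quick sanity check on the layout described above, the following minimal sketch reports which of the listed files are missing from a suite directory and its runs. It is not part of ASSERT: the function name missing_artifacts and the "<suite>" report key are hypothetical, and it assumes every subdirectory of the suite directory is a run.

```python
from pathlib import Path

# File names are taken from the layout listed above.
SUITE_FILES = ("suite.json", "taxonomy.json", "test_set.jsonl")
RUN_FILES = (
    "manifest.json",
    "config.yaml",
    "inference_set.jsonl",
    "scores.jsonl",
    "metrics.json",
)


def missing_artifacts(suite_dir):
    """Return {location: [missing file names]} for one suite directory.

    Hypothetical helper. Assumes every subdirectory of suite_dir is a run.
    Locations with nothing missing are left out of the result.
    """
    suite_dir = Path(suite_dir)
    report = {}

    # Suite-level files sit directly under artifacts/results/<suite>/.
    missing = [name for name in SUITE_FILES if not (suite_dir / name).is_file()]
    if missing:
        report["<suite>"] = missing

    # Run-level files sit under artifacts/results/<suite>/<run>/.
    for run_dir in sorted(p for p in suite_dir.iterdir() if p.is_dir()):
        missing = [name for name in RUN_FILES if not (run_dir / name).is_file()]
        if missing:
            report[run_dir.name] = missing

    return report
```

Calling missing_artifacts("artifacts/results/<suite>") with a real suite name returns an empty dict when every listed file is present. A run that is still in progress may legitimately lack later-stage files such as scores.jsonl or metrics.json.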

Tip: After a run, start with metrics.json, then review scores.jsonl, and only then inspect inference_set.jsonl in detail.
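
The reading order in the tip can be sketched in a few lines of Python. This sketch assumes only that metrics.json is a single JSON document and that scores.jsonl holds one JSON object per line; it makes no assumption about the field names inside either file, and load_run_summary is a hypothetical helper, not an ASSERT API.

```python
import json
from pathlib import Path


def load_run_summary(run_dir):
    """Load metrics.json, then the per-case records from scores.jsonl.

    Hypothetical helper. Returns (metrics, scores), where metrics is the
    parsed JSON document and scores is a list with one parsed record per
    non-blank line.
    """
    run_dir = Path(run_dir)

    # Step 1: aggregate view of the run.
    with open(run_dir / "metrics.json", encoding="utf-8") as fh:
        metrics = json.load(fh)

    # Step 2: per-case judge records, one JSON object per line.
    scores = []
    with open(run_dir / "scores.jsonl", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                scores.append(json.loads(line))

    return metrics, scores
```

Since the schema is not documented here, a reasonable first step is to look at the top-level structure of metrics and at len(scores) before going further into individual records.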

Useful CLI commands for viewing results

assert-ai results list
assert-ai results status <suite>
assert-ai results status <suite> <run>
assert-ai results compare <suite> <run-a> <run-b>
assert-ai results compare-suites <suite-a>/<run-a> <suite-b>/<run-b>

See CLI Commands for full options.

View evaluation suite artifacts and run results in a local UI app

The viewer is a local inspector and editing application for viewing run status and evaluation suite artifacts, such as a rendered taxonomy of behavior categories and their associated policy labels.

cd viewer
npm install
npm run dev

The locally hosted UI server starts at http://localhost:5174. Open this URL in your browser to use the inspector view.