Getting Started

This guide covers installation and your first end-to-end evaluation run.

Prerequisites

  • Python 3.11+
  • pip
  • Model credentials in environment variables (for example AZURE_API_KEY and AZURE_API_BASE for Azure OpenAI)

Install and set up the quickstart example: LangGraph travel planner

The flagship example evaluates a multi-tool LangGraph travel planner. The evaluator reaches the target through target.callable, the same integration boundary you would use for any agent or multi-agent system. Phoenix/OpenInference auto-instrumentation captures the agent's OpenTelemetry spans, so the judge can cite tool calls and routing decisions. We recommend this integration shape for any non-trivial agent.
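To make the integration boundary concrete, here is a minimal sketch of a plain Python function that could sit behind target.callable. It assumes the evaluator passes one user message as a string and expects the final reply as a string. The function names, the regex routing, and the canned lookup_flights tool are all hypothetical stand-ins for the real LangGraph graph, so check examples/travel_planner_langgraph/ for the contract the tool actually expects.

```python
# Hypothetical callable target standing in for a LangGraph travel planner.
# ASSUMPTION: the evaluator calls the target with one user message (str)
# and expects the agent's final reply (str); the real contract may differ.
import re

_ROUTE = re.compile(r"from\s+(\w+)\s+to\s+(\w+)", re.IGNORECASE)


def lookup_flights(origin: str, destination: str) -> list[str]:
    """Hypothetical tool: returns canned flight options for a route."""
    return [f"{origin}->{destination} 08:00", f"{origin}->{destination} 17:30"]


def travel_planner(message: str) -> str:
    """Hypothetical agent entry point: route to a tool, then answer."""
    # A real agent lets the LLM make this routing decision; a regex keeps
    # the sketch deterministic.
    match = _ROUTE.search(message)
    if match:
        origin, destination = match.group(1), match.group(2)
        options = lookup_flights(origin, destination)
        return f"Flight options: {', '.join(options)}"
    return "Where are you flying from and to?"


if __name__ == "__main__":
    print(travel_planner("Find flights from Lisbon to Porto"))
```

In a real agent the body would invoke the compiled graph. The auto-instrumentation described above, not code inside the function, is what produces the spans the judge cites.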

Bash:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[otel,langgraph]"
cp .env.example .env

PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[otel,langgraph]"
Copy-Item .env.example .env

Edit .env with the credentials for your provider. The defaults match the example's azure/... model, but any LiteLLM provider (OpenAI, Anthropic, Bedrock, Vertex, Ollama, and others) works.
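A quick pre-flight check can catch a missing credential in .env before you start a run. The sketch below is a hypothetical helper, not part of assert-ai. It assumes .env contains simple KEY=VALUE lines and checks only the two Azure variables named under Prerequisites; swap in your provider's variable names if you use a different one.

```python
# Pre-flight check for .env (hypothetical helper, not part of assert-ai).
# ASSUMPTIONS: .env holds plain KEY=VALUE lines with optional '#' comments,
# and the Azure default needs the two variables listed under Prerequisites.
from pathlib import Path

REQUIRED_AZURE_KEYS = ("AZURE_API_KEY", "AZURE_API_BASE")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments; strip simple quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str], required: tuple[str, ...] = REQUIRED_AZURE_KEYS) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print(".env not found; copy .env.example first")
    else:
        missing = missing_keys(parse_dotenv(env_file.read_text(encoding="utf-8")))
        print("Missing:", ", ".join(missing) if missing else "none")
```

This only reads the file. It does not change how assert-ai itself loads .env.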

Run your first evaluation

Optional: to browse traces, start Phoenix locally in a separate terminal (its UI is served on port 6006 by default):

phoenix serve

Run the flagship quickstart example:

assert-ai run --config examples/travel_planner_langgraph/eval_config.yaml

Check run status:

assert-ai results status travel-planner-langgraph-v1 demo-1

Artifacts are written under:

artifacts/results/travel-planner-langgraph-v1/demo-1/
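To inspect a run programmatically, the sketch below counts the records in each JSONL artifact and prints metrics.json. It assumes the files named in the pipeline table further down (for example scores.jsonl and metrics.json) sit directly in the run directory. It makes no assumption about the fields inside each record, and the helper names are hypothetical.

```python
# Summarize a finished run's artifacts (hypothetical helper, not part of assert-ai).
# ASSUMPTIONS: artifacts such as scores.jsonl and metrics.json sit directly in
# the run directory; nothing is assumed about the fields inside each record.
import json
from collections.abc import Iterable
from pathlib import Path


def parse_jsonl(lines: Iterable[str]) -> list:
    """Decode one JSON value per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize_run(run_dir: Path) -> dict[str, int]:
    """Map each *.jsonl file in run_dir to its number of records."""
    summary: dict[str, int] = {}
    for path in sorted(run_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            summary[path.name] = len(parse_jsonl(fh))
    return summary


if __name__ == "__main__":
    run_dir = Path("artifacts/results/travel-planner-langgraph-v1/demo-1")
    if run_dir.is_dir():
        for name, count in summarize_run(run_dir).items():
            print(f"{name}: {count} records")
        metrics_file = run_dir / "metrics.json"
        if metrics_file.exists():
            metrics = json.loads(metrics_file.read_text(encoding="utf-8"))
            print(json.dumps(metrics, indent=2))
```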

Codespaces / VS Code Dev Containers

The repo includes a minimal dev container for the LangGraph quickstart. It installs .[otel,langgraph,dev], copies .env.example to .env if needed, and forwards Phoenix on port 6006. After container setup, add your provider credentials to .env and run the same assert-ai run command.

What just happened

  1. systematize expanded the behavior spec into behavior categories.
  2. test_set generated prompt and scenario test cases.
  3. inference executed the target for each case.
  4. judge produced verdicts, evidence, and aggregate metrics.
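The four steps form a chain in which each stage consumes the previous stage's output. The toy sketch below models only that data flow. Every function, the stub categories, and the non-empty-reply check standing in for rubric scoring are hypothetical; the real stages call an LLM and write the artifacts listed in the table below.

```python
# Toy model of the four pipeline stages (all names hypothetical).
# Real stages call an LLM and write artifacts; this shows only the ordering
# and what each stage hands to the next.
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Verdict:
    case: str
    passed: bool
    evidence: str


def systematize(spec: str) -> list[str]:
    """Stage 1: expand a behavior spec into categories (stubbed)."""
    return [f"{spec}: happy path", f"{spec}: missing information"]


def build_test_set(categories: list[str]) -> list[str]:
    """Stage 2: one prompt per category (real runs also build multi-turn scenarios)."""
    return [f"Test prompt for '{c}'" for c in categories]


def run_inference(target: Callable[[str], str], cases: list[str]) -> list[tuple[str, str]]:
    """Stage 3: call the target on every case, keeping (input, reply) pairs."""
    return [(case, target(case)) for case in cases]


def judge(outputs: list[tuple[str, str]]) -> tuple[list[Verdict], dict[str, float]]:
    """Stage 4: a non-empty-reply check stands in for rubric-based scoring."""
    verdicts = [Verdict(case, bool(reply.strip()), reply[:40]) for case, reply in outputs]
    pass_rate = sum(v.passed for v in verdicts) / len(verdicts) if verdicts else 0.0
    return verdicts, {"pass_rate": pass_rate}


if __name__ == "__main__":
    cases = build_test_set(systematize("plan a trip within budget"))
    outputs = run_inference(lambda prompt: "Here is an itinerary.", cases)
    verdicts, metrics = judge(outputs)
    print(metrics)  # {'pass_rate': 1.0}
```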

How each stage maps to the config and its artifacts:

Step | Stage | Config key / artifact
1 | Eval spec: plain-English behavior requirements | behavior.name and behavior.description, inline in eval_config.yaml
2 | Behavior categories: generated failure-mode taxonomy | pipeline.systematize writes taxonomy.json
3 | Test cases: prompts and multi-turn scenarios | pipeline.test_set writes test_set.jsonl
4 | Execute: run the agent and capture traces | pipeline.inference.target.callable and target.trace write inference_set.jsonl
5 | Judge: score against your rubric | pipeline.judge.dimensions writes scores.jsonl and metrics.json
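The key paths in the table above also work as a quick checklist for your own config. This sketch assumes the YAML has already been parsed into nested dictionaries (for example with a YAML library of your choice) and reports which of the table's paths are missing. The helper names are hypothetical, and this is not assert-ai's own validation.

```python
# Checks a parsed eval config for the key paths named in the table above.
# ASSUMPTIONS: the YAML is already loaded into nested dicts, and this list is
# only the table's keys, not assert-ai's full schema. target.trace is left
# out because the table does not spell out its full path.
from typing import Any

TABLE_KEY_PATHS = (
    "behavior.name",
    "behavior.description",
    "pipeline.systematize",
    "pipeline.test_set",
    "pipeline.inference.target.callable",
    "pipeline.judge.dimensions",
)


def has_path(config: dict[str, Any], dotted: str) -> bool:
    """True if every segment of a dotted path exists as a nested mapping key."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def missing_paths(config: dict[str, Any], paths: tuple[str, ...] = TABLE_KEY_PATHS) -> list[str]:
    """Return the table's key paths that the config does not define."""
    return [path for path in paths if not has_path(config, path)]


if __name__ == "__main__":
    # Made-up partial config, shaped the way a YAML loader would return it.
    partial = {
        "behavior": {"name": "example-behavior", "description": "example description"},
        "pipeline": {"judge": {"dimensions": []}},
    }
    print(missing_paths(partial))
    # ['pipeline.systematize', 'pipeline.test_set', 'pipeline.inference.target.callable']
```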

Create your own config with the CLI assistant

Don't want to write YAML by hand? assert-ai init starts a conversational LLM assistant that asks about your agent, eval goals, and constraints, then proposes a complete YAML config for your evaluations.

assert-ai init needs an LLM to power the conversation. Pass --model with any LiteLLM model string and make sure the matching API key is set in your .env file (loaded by default) or environment:

assert-ai init --model azure/gpt-5.4
# or skip the first question:
assert-ai init --model azure/gpt-5.4 --describe "A customer-support chatbot with order-lookup and refund tools"
# or edit/extend an existing config:
assert-ai init --model azure/gpt-5.4 --from examples/travel_planner_langgraph/eval_config.yaml

See CLI Commands for the full option reference.