from ai_infra.eval import ToolUsageEvaluatorEvaluate that an agent called expected tools. Uses span-based evaluation with OpenTelemetry to check which tools were called during agent execution.
expected_tools: List of tool names that should have been called. forbidden_tools: List of tool names that should NOT have been called. require_all: If True, all expected_tools must be called. If False, at least one must be called. Default: True. check_order: If True, expected_tools must be called in order. Default: False.
>>> from ai_infra.eval.evaluators import ToolUsageEvaluator >>> from pydantic_evals import Case, Dataset >>> >>> dataset = Dataset( ... cases=[Case(inputs="What's the weather?")], ... evaluators=[ ... ToolUsageEvaluator( ... expected_tools=["get_weather"], ... forbidden_tools=["delete_data"], ... ), ... ], ... )
dict with: - called_expected: bool (True if expected tools were called) - avoided_forbidden: bool (True if forbidden tools were avoided) - tools_called: list of tool names that were called