🎉 ai-infra v1.0 is here — Production-ready AI/LLM infrastructure
© 2026 nfrax. All rights reserved.


ToolUsageEvaluator

from ai_infra.eval import ToolUsageEvaluator
Module: ai_infra.eval
Extends: Evaluator[Any, Any]

Evaluate whether an agent called the expected tools and avoided forbidden ones. Uses span-based evaluation with OpenTelemetry to determine which tools were called during agent execution.

Args

expected_tools: List of tool names that should have been called.
forbidden_tools: List of tool names that should NOT have been called.
require_all: If True, all expected_tools must be called. If False, at least one must be called. Default: True.
check_order: If True, expected_tools must be called in order. Default: False.
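To make the interaction between require_all and check_order concrete, the following is a minimal sketch of one way those flags could be interpreted. It is not ai-infra's implementation: the function name expected_tools_satisfied is hypothetical, and two points are assumptions, namely that "in order" means the expected names appear as a subsequence of the calls (other calls may sit in between) and that check_order implies every expected tool must be present.

```python
from collections.abc import Sequence


def expected_tools_satisfied(
    tools_called: Sequence[str],
    expected_tools: Sequence[str],
    require_all: bool = True,
    check_order: bool = False,
) -> bool:
    """Hypothetical restatement of the documented flags, not ai-infra code."""
    if not expected_tools:
        # Assumption: with nothing expected, there is nothing to violate.
        return True

    if check_order:
        # Assumption: "in order" means the expected names form a subsequence
        # of the calls. `name in remaining` advances the iterator until the
        # name is found, so later names can only match later calls.
        remaining = iter(tools_called)
        return all(name in remaining for name in expected_tools)

    called = set(tools_called)
    if require_all:
        return all(name in called for name in expected_tools)
    return any(name in called for name in expected_tools)
```

Under these assumptions, a call sequence of ["a", "x", "b"] satisfies expected_tools=["a", "b"] with check_order=True, while ["b", "x", "a"] does not.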

Example

>>> from ai_infra.eval.evaluators import ToolUsageEvaluator
>>> from pydantic_evals import Case, Dataset
>>>
>>> dataset = Dataset(
...     cases=[Case(inputs="What's the weather?")],
...     evaluators=[
...         ToolUsageEvaluator(
...             expected_tools=["get_weather"],
...             forbidden_tools=["delete_data"],
...         ),
...     ],
... )

Returns

dict with:
- called_expected: bool (True if expected tools were called)
- avoided_forbidden: bool (True if forbidden tools were avoided)
- tools_called: list of tool names that were called
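As an illustration of how a result of this shape might be consumed, here is a small sketch that collapses the three documented keys into a one-line summary. The helper summarize_tool_usage and the sample dict are hypothetical; the sample is hand-written data, not output captured from ai-infra.

```python
from typing import Any


def summarize_tool_usage(result: dict[str, Any]) -> str:
    """Turn a result dict of the documented shape into a one-line summary."""
    # Treat the run as passing only if both documented checks succeeded.
    passed = result["called_expected"] and result["avoided_forbidden"]
    status = "PASS" if passed else "FAIL"
    tools = ", ".join(result["tools_called"]) or "none"
    return f"{status}: tools called = {tools}"


# Hand-written sample data (an assumption for illustration only).
sample = {
    "called_expected": True,
    "avoided_forbidden": False,
    "tools_called": ["get_weather", "delete_data"],
}
print(summarize_tool_usage(sample))
```

Here the sample fails because a forbidden tool appears among the calls, even though the expected tool was called.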

Constructor
ToolUsageEvaluator(expected_tools: list[str] = list(), forbidden_tools: list[str] = list(), require_all: bool = True, check_order: bool = False) -> None
Parameter        Type       Default  Description
expected_tools   list[str]  list()   -
forbidden_tools  list[str]  list()   -
require_all      bool       True     -
check_order      bool       False    -

Methods
