from ai_infra.eval import RAGFaithfulness

Evaluates whether an answer is grounded in the provided context. Uses an LLM judge to verify that the generated answer is faithful to the retrieved context and does not contain hallucinations.
Args:
    llm_judge: Model to use for judging (e.g., "gpt-4o-mini"). If None, uses the default from the environment.
    provider: LLM provider (openai, anthropic, google, etc.).
    context_key: Metadata key containing the context/retrieved docs. Default: "context".
    strict: If True, requires exact grounding; if False, allows reasonable inferences. Default: False.
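The defaults can be overridden at construction time. A minimal sketch based on the parameters above; the provider string and the alternate metadata key are illustrative, not required values:

    >>> evaluator = RAGFaithfulness(
    ...     llm_judge="gpt-4o-mini",
    ...     provider="openai",
    ...     context_key="retrieved_docs",  # read context from this metadata key instead of "context"
    ...     strict=True,                   # require exact grounding, no inferences
    ... )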
Example:
    >>> from ai_infra.eval.evaluators import RAGFaithfulness
    >>> from pydantic_evals import Case, Dataset
    >>>
    >>> dataset = Dataset(
    ...     cases=[
    ...         Case(
    ...             inputs="What is the refund policy?",
    ...             metadata={"context": "Refunds are available within 30 days."},
    ...         ),
    ...     ],
    ...     evaluators=[RAGFaithfulness(llm_judge="gpt-4o-mini")],
    ... )
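To actually run the evaluation, the dataset is executed against the system under test. A minimal sketch, assuming pydantic_evals' Dataset.evaluate_sync and report.print; answer_question is a hypothetical stand-in for a real RAG pipeline:

    >>> def answer_question(question: str) -> str:
    ...     # Hypothetical task: a real pipeline would retrieve context and
    ...     # generate a grounded answer here.
    ...     return "Refunds are available within 30 days of purchase."
    >>>
    >>> report = dataset.evaluate_sync(answer_question)  # runs each case and its evaluators
    >>> report.print()  # per-case results, including the faithfulness score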
Returns:
    EvaluationReason with:
    - value: float (faithfulness score from 0.0 to 1.0)
    - reason: Explanation from the LLM judge
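For illustration, a returned result might be consumed like this. A sketch with made-up values, assuming EvaluationReason is the pydantic_evals result type whose value and reason fields match the description above:

    >>> from pydantic_evals.evaluators import EvaluationReason
    >>> result = EvaluationReason(
    ...     value=0.9,
    ...     reason="Every claim in the answer is supported by the retrieved context.",
    ... )
    >>> result.value >= 0.8  # e.g. gate a regression check on a minimum score
    True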