
RAGFaithfulness

from ai_infra.eval import RAGFaithfulness
Module: ai_infra.eval
Extends: Evaluator[str, str]

Evaluate if an answer is grounded in the provided context. Uses an LLM judge to verify that the generated answer is faithful to the retrieved context and doesn't contain hallucinations.

Args

llm_judge: Model to use for judging (e.g., "gpt-4o-mini"). If None, uses default from environment.
provider: LLM provider (openai, anthropic, google, etc.).
context_key: Metadata key containing the context/retrieved docs. Default: "context".
strict: If True, requires exact grounding. If False, allows reasonable inferences. Default: False.
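
For illustration, a minimal constructor sketch combining these options; the custom "docs" metadata key and the strict setting are arbitrary example choices, not library defaults:

from ai_infra.eval import RAGFaithfulness

# Explicit judge model and provider, reading retrieved documents from a
# custom "docs" metadata key, with strict (exact) grounding required.
strict_judge = RAGFaithfulness(
    llm_judge="gpt-4o-mini",
    provider="openai",
    context_key="docs",
    strict=True,
)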

Example

>>> from ai_infra.eval.evaluators import RAGFaithfulness
>>> from pydantic_evals import Case, Dataset
>>>
>>> dataset = Dataset(
...     cases=[
...         Case(
...             inputs="What is the refund policy?",
...             metadata={"context": "Refunds are available within 30 days."},
...         ),
...     ],
...     evaluators=[RAGFaithfulness(llm_judge="gpt-4o-mini")],
... )
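
To produce scores, the dataset is then run against a task function. A minimal sketch follows; answer_question is a hypothetical stand-in for your RAG pipeline, and evaluate_sync / print are assumed to behave as in the standard pydantic_evals reporting API:

>>> async def answer_question(question: str) -> str:
...     # Hypothetical stand-in for the retrieval + generation pipeline under test.
...     return "Refunds are available within 30 days of purchase."
...
>>> report = dataset.evaluate_sync(answer_question)  # run every Case through the task and evaluators
>>> report.print()  # per-case table, including the RAGFaithfulness scores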

Returns

EvaluationReason with:
- value: float (faithfulness score 0.0-1.0)
- reason: Explanation from the LLM judge
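
As a sketch of how a caller might consume this result (the import path for EvaluationReason and the 0.8 threshold are assumptions, not taken from this page):

from pydantic_evals.evaluators import EvaluationReason

def is_faithful(result: EvaluationReason, threshold: float = 0.8) -> bool:
    # value holds the 0.0-1.0 faithfulness score; reason carries the judge's explanation.
    return float(result.value) >= threshold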

Constructor
RAGFaithfulness(llm_judge: str | None = None, provider: str | None = None, context_key: str = 'context', strict: bool = False) -> None
Parameter     Type          Default     Description
llm_judge     str | None    None        Judge model; if None, uses the default from the environment
provider      str | None    None        LLM provider (openai, anthropic, google, etc.)
context_key   str           'context'   Metadata key containing the context/retrieved docs
strict        bool          False       If True, require exact grounding; if False, allow reasonable inferences

Methods

evaluate (async)
