Grounding LLM responses in chunks retrieved from an external corpus so the model reasons over real, citable sources instead of parametric memory alone.
Package-level reference for the langsmith SDK on PyPI — install, versioning, env-var setup, and observability alternatives.
Package-level reference for ragas on PyPI — install variants, LLM-as-judge dependencies, metric churn, and alternative evaluators.
Package-level reference for trulens-eval on PyPI — install variants, the trulens umbrella rename, framework extras, and alternative evaluators.
Build production evaluation pipelines for LLM applications — golden datasets, LLM-as-judge, rubrics, statistical significance, regression detection, and evals vs tests.
Trace, debug, evaluate, and monitor LLM applications with LangSmith. Covers tracing setup, datasets, evaluators, prompt hub, comparing runs, and CI integration.
Measure and improve RAG pipeline quality with ragas. Covers faithfulness, answer relevancy, context precision, context recall, dataset format, LLM judges, and CI integration.
Evaluate and monitor LLM applications with TruLens. Covers the RAG triad, feedback functions, TruChain, TruLlama, custom evaluators, the dashboard, and CI integration.