LangGraph Critic Loop: Production Patterns
The researcher-critic-HITL pattern in LangGraph catches hallucinated citations before they ship. Production patterns for medical, financial, and legal agents.
The critic loop pattern works well in multi-agent research prototypes but breaks down in production, where the state space explodes and the critic itself becomes a bottleneck. Production-grade critic loops need narrow scoring dimensions, a max iteration cap, a deterministic scoring layer alongside the LLM, and human-in-the-loop review as a final gate.
The critic loop pattern
A researcher agent generates output. A critic agent evaluates it against defined criteria. If the output fails, the researcher revises and resubmits. This iterates until the critic approves or a max iteration limit is hit.
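In LangGraph, this loop is typically wired as a conditional edge out of the critic node rather than as a Python `for` loop. The sketch below shows only the routing function. It is a minimal sketch that assumes hypothetical state keys `critic_passed` and `iterations`. On a `StateGraph` you would register it with `add_conditional_edges("critic", route_after_critic, path_map)`. A path map such as `{"revise": "researcher", "escalate": "human_review", "approve": END}` turns each returned label into a node; those node names are also hypothetical.

```python
from typing import TypedDict

MAX_CRITIC_ITERATIONS = 3  # iteration cap for the loop


class LoopState(TypedDict):
    # Hypothetical state keys for this sketch.
    critic_passed: bool  # did the latest critic evaluation pass?
    iterations: int      # number of critic evaluations so far


def route_after_critic(state: LoopState) -> str:
    """Decide where the graph goes after each critic evaluation."""
    if state["critic_passed"]:
        return "approve"   # critic accepted the output
    if state["iterations"] >= MAX_CRITIC_ITERATIONS:
        return "escalate"  # cap reached: hand off to a human
    return "revise"        # send the output back to the researcher
```

Keeping the cap check in the router means the graph, not the critic prompt, decides when to stop.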
Why it breaks in production
- The critic evaluates dimensions it was not designed for, expanding the state space
- Without iteration limits, loops can run indefinitely, burning tokens and API credits
- LLM-only evaluation is inconsistent: the same output gets different scores on re-evaluation
- No audit trail showing why the critic approved or rejected each iteration
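The inconsistency in the third bullet is straightforward to measure before you trust an LLM judge. This is a minimal sketch that assumes a hypothetical `judge` callable returning a float score for one output. It re-scores the same output several times and reports the spread.

```python
import statistics
from typing import Callable


def score_spread(judge: Callable[[str], float], output: str, runs: int = 5) -> dict:
    """Score the same output `runs` times and summarize how much the scores vary."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # judge is hypothetical: any function mapping an output to a numeric score
    scores = [judge(output) for _ in range(runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "pstdev": statistics.pstdev(scores),  # population standard deviation
        "range": max(scores) - min(scores),
    }
```

A wide range on an unchanged output is one signal that a check belongs in a deterministic layer rather than with the LLM.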
Production patterns
Pattern 1: narrow scoring dimensions
Limit the critic to exactly three scoring dimensions. For a clinical research agent, these could be source count, source recency, and relevance to the query. Each dimension gets a binary or 1-5 score, and anything outside those dimensions is out of scope for the critic. This prevents scope creep in evaluation.
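One way to enforce the narrow rubric is to parse the critic's output into a fixed schema and reject anything else. This is a minimal sketch that assumes the critic returns a dict of 1-5 scores keyed by dimension name. The class name, the `from_llm` parser, and the pass threshold of 3 are hypothetical choices for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClinicalCriticScore:
    """The only three dimensions the critic may score, each on a 1-5 scale."""
    source_count: int
    recency: int
    relevance: int

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be 1-5, got {value}")

    @classmethod
    def from_llm(cls, raw: dict) -> "ClinicalCriticScore":
        """Parse critic output, rejecting missing or out-of-scope dimensions."""
        allowed = {f.name for f in fields(cls)}
        extra, missing = set(raw) - allowed, allowed - set(raw)
        if extra:
            raise ValueError(f"Out-of-scope dimensions: {sorted(extra)}")
        if missing:
            raise ValueError(f"Missing dimensions: {sorted(missing)}")
        return cls(**{name: int(raw[name]) for name in allowed})

    def passes(self, threshold: int = 3) -> bool:
        # Hypothetical rule: every dimension must meet the threshold.
        return min(self.source_count, self.recency, self.relevance) >= threshold
```

If the critic starts scoring "tone" or "style", parsing fails loudly instead of letting the new dimension silently influence approval.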
Pattern 2: hybrid scoring (LLM + deterministic)
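```python
from datetime import date, datetime, timezone

def days_old(iso_date: str) -> int:
    """Assumed helper: age in days of a "YYYY-MM-DD" date string."""
    return (datetime.now(timezone.utc).date() - date.fromisoformat(iso_date)).days

def llm_judge_relevance(text: str, sources: list[dict]) -> float:
    """Hypothetical placeholder: call your LLM judge, return relevance in [0, 1]."""
    raise NotImplementedError

def critic_score(output: dict, sources: list[dict]) -> dict:
    """Hybrid scoring: deterministic checks + LLM nuance."""
    score = {"pass": True, "reasons": []}
    # Deterministic checks (fast, consistent)
    if len(sources) < 3:
        score["pass"] = False
        score["reasons"].append(f"Only {len(sources)} sources, need 3+")
    stale = [s for s in sources if days_old(s["date"]) > 30]
    if len(stale) > len(sources) // 2:
        score["pass"] = False
        score["reasons"].append("Majority of sources older than 30 days")
    # LLM check (only if deterministic checks pass, so obvious failures
    # never spend an LLM call)
    if score["pass"]:
        relevance = llm_judge_relevance(output["text"], sources)
        if relevance < 0.7:
            score["pass"] = False
            score["reasons"].append(f"Relevance score {relevance:.2f} < 0.7")
    return score
```
Pattern 3: max iterations with graceful exit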
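```python
from datetime import datetime, timezone

MAX_CRITIC_ITERATIONS = 3

def now_iso() -> str:
    """Assumed helper: current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()

def critic_loop(state: dict) -> dict:
    # critic_score is defined above; researcher_revise is your researcher
    # agent's revision step (not shown).
    state.setdefault("audit_trail", [])
    for i in range(MAX_CRITIC_ITERATIONS):
        score = critic_score(state["output"], state["sources"])
        state["audit_trail"].append({
            "iteration": i + 1,
            "score": score,
            "timestamp": now_iso(),
        })
        if score["pass"]:
            state["status"] = "approved"
            return state
        # Skip the revision on the last pass: nothing would score it,
        # and the human reviewer should see the draft the critic judged.
        if i < MAX_CRITIC_ITERATIONS - 1:
            state["output"] = researcher_revise(state, score["reasons"])
    # Graceful exit: stop spending tokens and escalate to a human.
    state["status"] = "max_iterations_reached"
    state["needs_human_review"] = True
    return state
```
Pattern 4: search verification in the critic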
For research agents, the critic should verify citations against live sources. This catches hallucinated references: the critic searches for each cited source and confirms it exists and contains the claimed information.
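Here is a minimal sketch of that check. It assumes a hypothetical `search` callable that takes a query string and returns result dicts with `url` and `text` keys, so swap in whatever search tool your agent already uses. For simplicity, it queries by the cited URL. A citation passes only if a result with that URL comes back and its text contains the quoted passage. Case-insensitive substring matching is a deliberately crude stand-in for a real claim-support check.

```python
from typing import Callable, TypedDict


class Citation(TypedDict):
    url: str
    quote: str  # the passage the researcher claims the source contains


# Hypothetical search interface: query -> [{"url": ..., "text": ...}, ...]
SearchFn = Callable[[str], list[dict]]


def verify_citations(citations: list[Citation], search: SearchFn) -> list[str]:
    """Return a rejection reason for every citation that cannot be confirmed."""
    reasons = []
    for c in citations:
        results = search(c["url"])
        # The source must actually come back from search...
        match = next((r for r in results if r.get("url") == c["url"]), None)
        if match is None:
            reasons.append(f"Source not found: {c['url']}")
        # ...and must contain the claimed passage.
        elif c["quote"].lower() not in match.get("text", "").lower():
            reasons.append(f"Quote not found in {c['url']}")
    return reasons
```

The returned reasons use the same shape as `score["reasons"]`, so they can be appended to the critic's verdict and fed back to the researcher.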
Audit trail for clinical contexts
LangGraph's checkpointer persists a snapshot of the full graph state at every step, but those snapshots are not human-readable by default. For clinical research, surface the audit trail explicitly: how each dimension scored, why the critic approved or rejected, and what changed between iterations.
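This is a minimal sketch that renders the `audit_trail` entries from Pattern 3 as plain text a reviewer can read. It assumes each entry has the `iteration`, `timestamp`, and `score` (with `pass` and `reasons`) keys used there. It does not read LangGraph checkpoints directly. "What changed" is approximated as the rejection reasons that disappeared since the previous iteration.

```python
def format_audit_trail(trail: list[dict]) -> str:
    """Render critic audit entries as one readable line per iteration."""
    lines = []
    previous: set[str] = set()
    for entry in trail:
        score = entry["score"]
        verdict = "APPROVED" if score["pass"] else "REJECTED"
        current = set(score["reasons"])
        line = f"Iteration {entry['iteration']} ({entry['timestamp']}): {verdict}"
        if score["reasons"]:
            line += "; reasons: " + "; ".join(score["reasons"])
        # Reasons present last time but gone now were fixed by the revision.
        resolved = sorted(previous - current)
        if resolved:
            line += "; resolved since last iteration: " + "; ".join(resolved)
        lines.append(line)
        previous = current
    return "\n".join(lines)
```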