CCA Study Guide – Domain 5: Context Management & Reliability

5.1

Manage conversation context to preserve critical information across long interactions

In a long customer support session, progressive summarization erodes accuracy: "$47.20 refund" becomes "a large amount," order IDs vanish, and the model has nothing solid to reason from. The fix is a persistent case facts block: a small structured object containing transactional data (amounts, dates, order numbers, statuses) that is extracted and stored outside the summarized history and injected verbatim into every prompt. Tool outputs compound the problem independently: a single order lookup can return 40+ fields when only 5 matter, so trim each result before it accumulates in context. A third hazard is position. Models tend to attend well to content at the beginning and end of long inputs but may overlook what is buried in the middle (the "lost in the middle" effect), so place key findings first and use explicit section headers to counteract it.

Extract transactional facts (amounts, dates, order numbers, statuses) into a persistent case-facts block kept outside the summarized history and injected into every prompt
For multi-issue sessions, persist structured issue data (IDs, amounts, statuses) in a separate context layer so distinct issues remain unambiguous
Trim verbose tool outputs to only the fields relevant to the current task before appending them to context
Place key findings at the beginning of aggregated inputs; use explicit section headers throughout to counteract the "lost in the middle" position effect
Require subagents to include metadata (dates, source locations, methodological context) in structured outputs; upstream agents should return key-facts structures rather than verbose reasoning chains when downstream context budgets are limited
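
The trimming and ordering steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names, the field whitelist, and the noise fields (`warehouse_code`, `carrier_internal_ref`) are assumptions made up for the example.

```python
from typing import Any

# Hypothetical whitelist for a refund task; real field names depend on the tool.
RELEVANT_FIELDS = ("order_id", "claim_amount", "return_deadline", "issue", "status")


def trim_tool_output(raw: dict[str, Any],
                     keep: tuple[str, ...] = RELEVANT_FIELDS) -> dict[str, Any]:
    """Keep only the whitelisted fields that are present in the raw tool result."""
    return {name: raw[name] for name in keep if name in raw}


def build_aggregate(key_findings: list[str], sections: dict[str, str]) -> str:
    """Put key findings first, then each detail section under an explicit header."""
    lines = ["KEY FINDINGS"] + [f"- {item}" for item in key_findings]
    for title, body in sections.items():
        lines += ["", title.upper(), body]
    return "\n".join(lines)


raw_lookup = {
    "order_id": "ORD-8841",
    "claim_amount": "$47.20",
    "return_deadline": "2026-03-15",
    "issue": "screen cracked in transit",
    "status": "return_initiated",
    "warehouse_code": "W-17",          # illustrative noise field
    "carrier_internal_ref": "X9931",   # illustrative noise field
}

trimmed = trim_tool_output(raw_lookup)
aggregate = build_aggregate(
    ["Return for ORD-8841 is initiated; claim is $47.20"],
    {"order lookup": str(trimmed)},
)
print(aggregate)
```

Only the trimmed dictionary is appended to context; the raw lookup is discarded once the relevant fields are extracted.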

Distractor Traps (common wrong answers)

Progressive summarization of numerical values, percentages, and dates — condensing "$47.20" to "large amount" destroys precision needed later
Relying on model memory across turns without re-injecting the case facts block — the model's attention does not guarantee prior facts remain accessible
Including full raw tool output in context (40+ fields) without trimming to the 5 that matter — token cost grows faster than the useful signal
Placing key findings in the middle of a long aggregated input rather than at the start — the "lost in the middle" effect makes the model skip them
Having upstream agents return verbose content and reasoning chains when downstream agents have limited context budgets

Faded Example (JSON — case facts block)

{
  "case_facts": {
    "order_id":        "ORD-8841",
    "claim_amount":    "$47.20",      // exact dollar — never "a large amount"
    "return_deadline": "2026-03-15",  // exact ISO date — never "soon"
    "issue":           "screen cracked in transit",
    "status":          "return_initiated"
  }
}
// Injected verbatim into every prompt — kept outside the summarized history.
// Placing it inside the summary risks truncating the exact values above.
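
One way to keep the block outside the summary is to assemble each prompt from separate parts. The sketch below assumes a hypothetical `build_prompt` helper and made-up tag names (`<case_facts>`, `<conversation_summary>`, `<customer_message>`); only the case facts themselves come from the example above.

```python
import json

case_facts = {
    "order_id": "ORD-8841",
    "claim_amount": "$47.20",
    "return_deadline": "2026-03-15",
    "issue": "screen cracked in transit",
    "status": "return_initiated",
}


def build_prompt(facts: dict, summary: str, user_message: str) -> str:
    """Assemble a prompt with the facts block first, untouched by summarization."""
    facts_block = json.dumps({"case_facts": facts}, indent=2)
    return (
        "<case_facts>\n" + facts_block + "\n</case_facts>\n\n"
        "<conversation_summary>\n" + summary + "\n</conversation_summary>\n\n"
        "<customer_message>\n" + user_message + "\n</customer_message>"
    )


prompt = build_prompt(
    case_facts,
    "Customer reported shipping damage and asked about a refund.",
    "When is my return due?",
)
print(prompt)
```

The summary can be rewritten or compressed on every turn; the facts block is serialized from the stored object each time, so the exact amount and date never pass through a summarization step.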

Recommended Material

Claude Docs (official): Prompting best practices — Long context tips · Position effects, putting data before instructions, mitigating "lost in the middle"

Anthropic Academy on Skilljar (optional): Claude Code in Action — Module 3 + Building with the Claude API — Module 7 · Controlling context (CC) · Prompt caching · Rules of prompt caching · Prompt caching in action (BCA)

Peace Of Code (YouTube, optional): Ep 18 — Why AI Agents Forget: Context Engineering

5.2

Design effective escalation and ambiguity resolution patterns

A customer support agent achieves only 55% first-contact resolution because it escalates routine refunds that a policy lookup would resolve in seconds, while letting "I want to speak to a manager" slip past without immediate human handoff. The fix is not smarter sentiment detection or a confidence score. It is three explicit triggers, encoded in the system prompt with few-shot examples: (1) the customer explicitly requests a human (route immediately, with no investigation first); (2) policy is silent or ambiguous on the specific request (for example, a competitor price-match request when the policy only addresses own-site price adjustments); and (3) the agent cannot make meaningful progress. When a tool lookup returns multiple customer records, the agent must ask for an additional identifier and never select one by heuristic.

Honor customer requests for a human agent immediately — no prior investigation attempt — by encoding this trigger explicitly in the system prompt
When a customer is frustrated but has not explicitly asked for a human, acknowledge the frustration and offer a resolution if the issue is within the agent's capability; escalate if the customer then states or reiterates a preference for a human
Escalate when policy is silent or ambiguous on the specific request — not just when the case seems complex in general
Ask for an additional identifier (email, order number, ZIP) when a tool lookup returns multiple matching customer records
Encode escalation criteria as explicit rules with few-shot examples showing when to escalate versus resolve autonomously
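
As a rough sketch of how these rules might look in practice, the snippet below pairs a system-prompt fragment (explicit rules plus two few-shot examples) with a small helper for the multiple-match case. The prompt wording, the `Resolution` type, and `resolve_customer` are all illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative system-prompt fragment: explicit rules followed by few-shot examples.
ESCALATION_RULES = """\
Escalate to a human when ANY of these is true:
1. The customer explicitly asks for a human. Route immediately; do not investigate first.
2. Policy is silent or ambiguous on the specific request.
3. You cannot make meaningful progress.
Otherwise resolve the issue yourself.

Example: "I want to speak to a manager." -> escalate (rule 1)
Example: "Can you match a competitor's price?" (policy covers own-site adjustments only) -> escalate (rule 2)
"""


@dataclass
class Resolution:
    record: Optional[dict]     # the single matched record, if exactly one exists
    follow_up: Optional[str]   # question to ask the customer otherwise


def resolve_customer(matches: list[dict]) -> Resolution:
    """Return the record only when the lookup is unambiguous; never guess."""
    if len(matches) == 1:
        return Resolution(record=matches[0], follow_up=None)
    if not matches:
        return Resolution(None, "I could not find an account with those details. "
                                "Could you share your email or order number?")
    return Resolution(None, "I found more than one matching account. "
                            "Could you confirm your email, order number, or ZIP code?")


ambiguous = resolve_customer([{"id": 1, "name": "J. Smith"}, {"id": 2, "name": "J. Smith"}])
print(ambiguous.follow_up)
```

The helper never ranks the candidates; with two or more matches it returns only a clarifying question.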

Distractor Traps (common wrong answers)

"Self-reported confidence score" — LLM self-reported confidence is poorly calibrated and does not correlate with actual case complexity
"Sentiment analysis as escalation signals" — negative sentiment does not reliably indicate whether the case exceeds the agent's capability
"Separate classifier model before prompt optimization" — over-engineered; requires labeled data and ML infrastructure when prompt optimization hasn't been tried
Attempting investigation before routing an explicit "I want a human" request — the trigger is clear and must be honored without delay
Selecting a customer record by heuristic when multiple matches exist — always request a disambiguating identifier instead

Recommended Material

Anthropic Academy on Skilljar (optional): Building with the Claude API — Module 4 + Claude 101 — Module 4 · Being clear and direct · Being specific (BCA) · Claude in action: use-cases by role (101)

Peace Of Code (YouTube, optional): Ep 20 — When AI Needs a Human

5.3

Implement error propagation strategies across multi-agent systems

A web search subagent times out mid-research. The wrong response returns a generic "search unavailable" status — or worse, an empty result set marked as successful — stripping the coordinator of everything it needs to recover. The right response returns structured error context: failure_type, attempted_query, partial_results, and alternatives. This data transforms a dead end into a decision point: the coordinator can retry, reroute to a mirror, proceed with partial results annotated for coverage gaps, or escalate — none of which is possible from a generic status. The critical distinction is between an access failure (timeout — retry decision needed) and a valid empty result (query succeeded but matched nothing); conflating them produces bad retry logic.

Return structured error context — failure type, what was attempted, partial results, alternatives — not a generic "search unavailable" status
Distinguish access failures (timeout, permission denied → retry decision) from valid empty results (successful query, zero matches) in all error reporting
Implement local recovery for transient failures; propagate only errors the subagent cannot resolve, always including what was attempted and any partial results
Annotate synthesis output with coverage notes indicating which topic areas have gaps due to unavailable sources — the report should state its own limits
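
A minimal sketch of these rules, assuming a hypothetical search backend that raises a `SearchTimeout` exception carrying any partial hits. The `SearchOutcome` type, the status strings, and the `alternatives` values are illustrative; the point is that local retry happens first and that an access failure and an empty result produce different outcomes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class SearchTimeout(Exception):
    """Hypothetical backend exception; carries whatever was found before the timeout."""

    def __init__(self, partial: list[str]):
        super().__init__("search timed out")
        self.partial = partial


@dataclass
class SearchOutcome:
    status: str                          # "ok", "empty", or "access_failure"
    attempted_query: str
    results: list[str] = field(default_factory=list)
    failure_type: Optional[str] = None
    partial_results: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)


def run_search(query: str, search_fn: Callable[[str], list[str]],
               max_attempts: int = 3, backoff_seconds: float = 0.0) -> SearchOutcome:
    """Retry transient timeouts locally; propagate structured context if they persist."""
    partial: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            results = search_fn(query)
        except SearchTimeout as exc:
            partial = exc.partial or partial   # keep the latest non-empty partial hits
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
            continue
        # The query ran to completion: zero matches is a valid answer, not an error.
        return SearchOutcome("ok" if results else "empty", query, results=results)
    return SearchOutcome("access_failure", query, failure_type="timeout",
                         partial_results=partial,
                         alternatives=["cached_data", "mirror"])


def always_times_out(query: str) -> list[str]:
    raise SearchTimeout(partial=["result_a"])


def matches_nothing(query: str) -> list[str]:
    return []


failed = run_search("market share analysis 2026", always_times_out)
empty = run_search("market share analysis 2026", matches_nothing)
print(failed.status, failed.partial_results)
print(empty.status, empty.results)
```

A coordinator receiving `failed` can retry, try an alternative, or proceed with the partial results and a coverage note; receiving `empty`, it knows a retry would not help.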

Distractor Traps (common wrong answers)

"Generic search unavailable status" — hides the failure type, the attempted query, and partial results; the coordinator cannot make an intelligent recovery decision
"Return empty result set marked as successful" — suppresses the error entirely; the coordinator proceeds as if the search succeeded, producing incomplete output with no signal
"Terminate the entire workflow on a single failure" — one subagent failure rarely justifies abandoning all other findings already collected
Propagating an error the subagent could have resolved locally — transient failures (network retry, brief timeout) should be handled before escalation

Faded Example (JSON — structured error context)

{
  "failure_type":    "timeout",                     // access failure, not "search unavailable"
  "attempted_query": "market share analysis 2026",  // exact query, so the coordinator can retry it
  "partial_results": ["result_a"],                  // what was found before the failure
  "alternatives":    ["cached_data", "mirror"]      // other approaches the coordinator can try
}
// Return this structure, not a generic status string.
// A timeout and a valid empty result are not the same signal.

Recommended Material

Anthropic Academy on Skilljar (optional): Building with the Claude API — Modules 5 + 10 · Sending tool results · Handling message blocks (M5) · Agents and tools · Environment inspection (M10)

Peace Of Code (YouTube, optional): Ep 19 — Subagent Error Propagation & Context Management

5.4

Manage context effectively in large codebase exploration

An extended codebase exploration session degrades noticeably: the model begins citing "typical class hierarchies" instead of the actual inheritance structure it read 40 turns ago. Isolation is the fix — a main coordinator that holds only high-level findings delegates verbose discovery to named subagents ("find all test files," "trace the refund flow dependencies"), keeping their raw output in their isolated context while returning only a compact summary to the coordinator. Key findings must be persisted to a findings.md scratchpad on disk and re-injected in later turns to counteract drift; use /compact when the context window fills with verbose discovery output. For crashes, each subagent exports state to a known location and the coordinator loads a manifest on resume.

Spawn subagents with specific, scoped questions so verbose file listings and traces stay in their isolated contexts — not the main coordinator's
Summarize key findings from each exploration phase before spawning the next phase, injecting the summary into new subagent prompts
Maintain a scratchpad file (findings.md) and re-read it for subsequent questions to counteract context degradation
Use /compact to reduce context usage when extended sessions accumulate verbose discovery output
Design crash recovery with structured state exports (manifests) that the coordinator loads and injects into agent prompts on resume
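
The scratchpad and manifest patterns can be sketched with plain files. Everything below is an assumption for illustration: the helper names, the `manifest.json` layout, and the sample finding (including the `RefundService` and `PaymentGateway` names) are made up, and a temporary directory stands in for the project workspace.

```python
import json
import tempfile
from pathlib import Path


def record_finding(scratchpad: Path, heading: str, finding: str) -> None:
    """Append one finding to the scratchpad so it survives context loss."""
    with scratchpad.open("a", encoding="utf-8") as fh:
        fh.write(f"## {heading}\n{finding}\n\n")


def load_findings(scratchpad: Path) -> str:
    """Read the scratchpad back for re-injection into a later prompt."""
    return scratchpad.read_text(encoding="utf-8") if scratchpad.exists() else ""


def export_state(state_dir: Path, agent_name: str, state: dict) -> None:
    """Write a subagent's state file and register it in the shared manifest."""
    state_path = state_dir / f"{agent_name}.json"
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    manifest_path = state_dir / "manifest.json"
    manifest = (json.loads(manifest_path.read_text(encoding="utf-8"))
                if manifest_path.exists() else {})
    manifest[agent_name] = state_path.name
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def resume_context(state_dir: Path) -> dict:
    """Load every state file listed in the manifest after a crash."""
    manifest_path = state_dir / "manifest.json"
    if not manifest_path.exists():
        return {}
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return {name: json.loads((state_dir / fname).read_text(encoding="utf-8"))
            for name, fname in manifest.items()}


work_dir = Path(tempfile.mkdtemp())
scratchpad = work_dir / "findings.md"
record_finding(scratchpad, "Refund flow",
               "RefundService calls PaymentGateway.reverse()")  # made-up finding
export_state(work_dir, "test_finder", {"phase": 1, "done": ["find all test files"]})

restored = resume_context(work_dir)
print(load_findings(scratchpad))
print(restored)
```

On resume, the coordinator would inject the output of `load_findings` and `resume_context` into the relevant agent prompts rather than re-running the exploration.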

Distractor Traps (common wrong answers)

Loading the coordinator's context with raw file dumps from subagents — verbose discovery belongs in isolated subagent contexts, not the coordinator's
Ignoring the scratchpad pattern and relying on model memory alone — without written persistence, findings degrade and the model reverts to "typical patterns"
Spawning phase N+1 subagents without first summarizing phase N findings — each phase should inject a compact summary, not the full prior context
Treating a crash as unrecoverable — structured manifests allow the coordinator to reload state and resume without restarting the entire exploration

Recommended Material

Claude Docs (official): Best practices for Claude Code · Manage context with subagents, scratchpad files, and /compact

Anthropic Academy on Skilljar (optional): Claude Code in Action — Modules 2 + 3 + Building with the Claude API — Module 6 · Adding context · Controlling context (CC) · Text chunking strategies · The full RAG flow · BM25 lexical search · A Multi-Index RAG pipeline (BCA)

Peace Of Code (YouTube, optional): Ep 18 — Why AI Agents Forget: Context Engineering

5.5

Design human review workflows and confidence calibration

Routing every extracted document to a human reviewer is unsustainable at scale, but a single aggregate accuracy number (say, 97% correct) can hide a document type or field that is failing at 71%. The reliable path is field-level confidence scores with review thresholds calibrated against a labeled validation set, because raw model confidence is not calibrated out of the box. Extractions that fall below the threshold, or that come from ambiguous or contradictory source documents, go to human review. The auto-accepted pile is not safe to leave unmonitored: stratified random sampling by document type and field surfaces novel error patterns before they propagate across thousands of records.

Output field-level confidence scores (not a single document-level score) — routing decisions need granularity at the field, not the document
Calibrate review thresholds using a labeled validation set, not raw model output — model confidence is not calibrated by default
Route extractions with low confidence or from ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity where it matters most
Implement stratified random sampling of high-confidence auto-accepted extractions for ongoing error-rate measurement and novel pattern detection
Analyze accuracy by document type and by field before reducing human review on any segment — never rely on an aggregate metric alone
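
A minimal sketch of threshold calibration and field-level routing, under stated assumptions: the validation set is a list of (confidence, was_correct) pairs for one field, the target accuracy is chosen by the team, and the function names and sample numbers are invented for the example. Real calibration would use far more labeled data.

```python
from typing import Optional


def calibrate_threshold(validation: list[tuple[float, bool]],
                        target_accuracy: float) -> Optional[float]:
    """Return the lowest confidence cutoff whose accepted set meets the target.

    Returns None when no cutoff reaches the target, meaning the field
    should always go to human review.
    """
    for threshold in sorted({conf for conf, _ in validation}):
        accepted = [ok for conf, ok in validation if conf >= threshold]
        if sum(accepted) / len(accepted) >= target_accuracy:
            return threshold
    return None


def fields_for_review(extraction: dict[str, tuple[str, float]],
                      thresholds: dict[str, Optional[float]]) -> list[str]:
    """List fields whose confidence is below their calibrated threshold.

    Fields with no usable threshold are always routed to review.
    """
    flagged = []
    for name, (_value, confidence) in extraction.items():
        cutoff = thresholds.get(name)
        if cutoff is None or confidence < cutoff:
            flagged.append(name)
    return flagged


# Made-up labeled validation data for a single field.
labeled = [(0.99, True), (0.95, True), (0.90, True),
           (0.85, False), (0.80, True), (0.60, False)]
cutoff = calibrate_threshold(labeled, target_accuracy=0.95)

extraction = {"invoice_total": ("$47.20", 0.97), "due_date": ("2026-03-15", 0.72)}
to_review = fields_for_review(extraction, {"invoice_total": cutoff, "due_date": cutoff})
print(cutoff, to_review)
```

Note that the document as a whole is neither accepted nor rejected here; only the low-confidence field is sent to a reviewer.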

Distractor Traps (common wrong answers)

Using a single document-level confidence score to route review — field-level granularity is required; a document can be high-confidence on most fields but fail on one
Trusting raw model confidence as a calibrated signal — it is not; calibrate against a labeled validation set first
Stopping human review of high-confidence extractions based on an aggregate accuracy metric — one failing segment (e.g., handwritten forms at 71%) can be hidden inside a 97% aggregate
Sampling uniformly rather than by stratum — document types and field types have different error profiles; uniform sampling under-samples the segments that fail
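
Stratified sampling and per-segment accuracy can be sketched with the standard library alone. The record layout (`doc_type`, `field`, `correct`) and the sample sizes are assumptions for illustration; the intent is to show that every stratum contributes to the review sample and that accuracy is reported per segment rather than in aggregate.

```python
import random
from collections import defaultdict


def stratified_sample(records: list[dict], per_stratum: int, seed: int = 0) -> list[dict]:
    """Draw up to `per_stratum` records from every (doc_type, field) stratum."""
    rng = random.Random(seed)
    strata: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        strata[(rec["doc_type"], rec["field"])].append(rec)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def accuracy_by_segment(reviewed: list[dict]) -> dict[tuple[str, str], float]:
    """Compute accuracy separately for each (doc_type, field) segment."""
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for rec in reviewed:
        key = (rec["doc_type"], rec["field"])
        totals[key][0] += int(rec["correct"])
        totals[key][1] += 1
    return {key: correct / count for key, (correct, count) in totals.items()}


# Made-up auto-accepted pile: typed invoices dominate, handwritten forms are rare.
auto_accepted = (
    [{"doc_type": "typed_invoice", "field": "total", "id": i} for i in range(95)]
    + [{"doc_type": "handwritten_form", "field": "total", "id": 100 + i} for i in range(5)]
)
sample = stratified_sample(auto_accepted, per_stratum=3)

# Made-up review results for a handful of sampled records.
reviewed = [
    {"doc_type": "typed_invoice", "field": "total", "correct": True},
    {"doc_type": "typed_invoice", "field": "total", "correct": True},
    {"doc_type": "handwritten_form", "field": "total", "correct": True},
    {"doc_type": "handwritten_form", "field": "total", "correct": False},
]
segment_accuracy = accuracy_by_segment(reviewed)
print(len(sample), segment_accuracy)
```

A uniform draw of six records from this pile would often contain no handwritten forms at all; the stratified draw always includes three.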

Recommended Material

Anthropic Academy on Skilljar (optional): Building with the Claude API — Module 3 + Introduction to Agent Skills — Module 6 · Model-based grading · Code-based grading (BCA) · Troubleshooting skills (IAS)

Peace Of Code (YouTube, optional): Ep 20 — When AI Needs a Human

5.6

Preserve information provenance and handle uncertainty in multi-source synthesis

Multi-source synthesis breaks silently when a summarization step drops source URLs, document names, and dates — the final report presents claims with no way to trace them back. Every subagent must output structured claim-source mappings (source URL, document name, relevant excerpt, publication date) and the synthesis agent must preserve and merge them rather than flattening them into anonymous prose. When two credible sources report conflicting statistics — one shows revenue growth of 14%, another 9% — both values are included and annotated with their sources and dates; the coordinator decides reconciliation. A publication date in the mapping prevents a Q1/Q4 difference from being misread as a factual contradiction.

Require subagents to output structured claim-source mappings (source URL, document name, relevant excerpt, date) in every structured result
Preserve and merge source attribution through synthesis steps — never flatten mappings into prose that loses the originating source
When two credible sources conflict, annotate both values with source and date; do not arbitrarily select one — let the coordinator decide reconciliation
Require publication or data-collection dates in all structured outputs so temporal differences between sources are not misinterpreted as factual contradictions
Structure synthesis reports with separate sections distinguishing well-established findings from contested ones; render content type-appropriately (financial data as tables, news as prose, technical findings as structured lists)

Distractor Traps (common wrong answers)

Dropping source attribution during summarization — once URLs, doc names, and excerpts are removed from a compressed finding, they cannot be reconstructed downstream
Arbitrarily selecting one value when two credible sources conflict — the synthesis agent should annotate both with provenance and let the coordinator or human decide
Omitting publication dates from structured outputs — a Q1 vs. Q4 figure may be a temporal update, not a contradiction; without dates the coordinator cannot tell
Converting all content to a uniform format — financial data, news prose, and technical findings each need a different rendering; one format loses structure or nuance

Faded Example (JSON — claim-source mapping)

{
  "claim":      "Revenue grew 14% in Q1 2026",
  "source_url": "https://corp.example.com/reports/q1-2026.pdf",
  "doc_name":   "Q1 2026 Earnings Report",
  "excerpt":    "Total revenue increased 14.2% year-over-year...",
  "date":       "2026-04-15"   // required — distinguishes update from contradiction
}
// Without the "date" field, a Q1 vs Q4 figure looks like a conflict.
// Synthesis agent merges these mappings — it never strips them into prose.
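
A rough sketch of a merge step that keeps every mapping intact and flags conflicts instead of resolving them. The `metric`, `period`, and `value` fields are assumptions added on top of the mapping shown above so that claims can be grouped, and the second source is entirely made up for illustration.

```python
from collections import defaultdict


def merge_findings(mappings: list[dict]) -> list[dict]:
    """Group mappings by metric, keep all sources, and flag same-period conflicts.

    Differing values for different periods are treated as temporal differences,
    not contradictions. Reconciliation is left to the coordinator.
    """
    grouped: dict[str, list[dict]] = defaultdict(list)
    for mapping in mappings:
        grouped[mapping["metric"]].append(mapping)

    merged = []
    for metric, sources in grouped.items():
        values_by_period: dict[str, set[str]] = defaultdict(set)
        for src in sources:
            values_by_period[src["period"]].add(src["value"])
        contested = any(len(values) > 1 for values in values_by_period.values())
        merged.append({
            "metric": metric,
            "status": "contested" if contested else "consistent",
            "sources": sorted(sources, key=lambda s: s["date"]),  # nothing is dropped
        })
    return merged


mappings = [
    {
        "metric": "revenue_growth", "period": "Q1 2026", "value": "14%",
        "claim": "Revenue grew 14% in Q1 2026",
        "source_url": "https://corp.example.com/reports/q1-2026.pdf",
        "doc_name": "Q1 2026 Earnings Report",
        "excerpt": "Total revenue increased 14.2% year-over-year...",
        "date": "2026-04-15",
    },
    {   # made-up second source reporting a conflicting figure for the same period
        "metric": "revenue_growth", "period": "Q1 2026", "value": "9%",
        "claim": "Revenue grew 9% in Q1 2026",
        "source_url": "https://news.example.org/analysis",
        "doc_name": "Analyst note (illustrative)",
        "excerpt": "We estimate revenue growth of roughly 9%...",
        "date": "2026-05-02",
    },
]

merged = merge_findings(mappings)
print(merged[0]["status"], [s["value"] for s in merged[0]["sources"]])
```

Both values survive the merge with their URLs, excerpts, and dates attached, so the coordinator or a human can decide how to reconcile them.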

Recommended Material

Anthropic Academy on Skilljar (optional): Building with the Claude API — Modules 6 + 7 · Text embeddings · The full RAG flow · A Multi-Index RAG pipeline (M6) · Citations (M7)

Further Viewing — Peace Of Code (YouTube)

Watch if a topic is still unclear after reading.

CCA Full Course — Peace Of Code →

Tasks 5.1, 5.4 — Context engineering & memory management Ep 18 — Why AI Agents Forget: Context Engineering →

Task 5.3 — Subagent error propagation & context boundaries Ep 19 — Subagent Error Propagation & Context Management →

Tasks 5.2, 5.5 — Human-in-the-loop & escalation patterns Ep 20 — When AI Needs a Human →
