What is a "case facts block" and why is it injected outside the summarized history?
A persistent structured object holding transactional facts (amounts, dates, order numbers, statuses), injected verbatim into every prompt. Summarization can turn "$47.20" into "large amount"; keeping the facts outside the summarized history prevents that precision loss.
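A minimal sketch of this idea, assuming a hypothetical `build_prompt` helper and made-up section labels and sample values (none of these come from a specific framework): the case facts are serialized verbatim on every turn, and only the conversation history is summarized.

```python
import json

def build_prompt(case_facts: dict, summarized_history: str, user_message: str) -> str:
    """Assemble a prompt with the case facts kept verbatim, outside the summary."""
    facts_block = json.dumps(case_facts, indent=2, sort_keys=True)
    return (
        "## Case facts (verbatim, do not paraphrase)\n"
        f"{facts_block}\n\n"
        "## Conversation summary\n"
        f"{summarized_history}\n\n"
        "## Latest customer message\n"
        f"{user_message}"
    )

# Illustrative values only.
case_facts = {"order_id": "A-1001", "refund_amount": "$47.20", "status": "pending"}
prompt = build_prompt(case_facts, "Customer asked about a refund.", "Any update?")
```

Because the facts block is rebuilt from the structured object each turn, the exact amount survives no matter how aggressively the history is compressed.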
What is the "lost in the middle" effect and how do you counteract it?
Models reliably process content at the start and end of long inputs but may skip material buried in the middle. Counteract by placing key findings first and organizing detailed results with explicit section headers.
Why trim verbose tool output before appending it to context?
A single order lookup can return 40+ fields when only 5 matter. Untrimmed outputs accumulate tokens disproportionately to their relevance, degrading context efficiency across turns.
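One way to trim is a simple field whitelist applied before the tool result is appended to context. The field names and sample record below are hypothetical; a real agent would choose the handful it actually uses.

```python
# Hypothetical field names; pick the few your agent actually needs.
RELEVANT_FIELDS = ("order_id", "status", "total", "ship_date", "tracking_number")

def trim_tool_output(raw: dict, keep=RELEVANT_FIELDS) -> dict:
    """Return only the whitelisted fields that are present in the raw output."""
    return {key: raw[key] for key in keep if key in raw}

# Illustrative lookup result with extra fields the agent does not need.
raw_lookup = {
    "order_id": "A-1001", "status": "shipped", "total": "47.20",
    "warehouse_code": "W7", "internal_flags": [], "audit_trail": ["created"],
}
trimmed = trim_tool_output(raw_lookup)
```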
How do you handle context for a session with multiple distinct customer issues?
Extract and persist structured issue data (order IDs, amounts, statuses) into a separate context layer for each issue, so that distinct cases do not blur together across turns.
What should upstream subagents return when downstream agents have limited context budgets?
Structured data (key facts, citations, relevance scores) rather than verbose content and reasoning chains, including metadata such as dates and source locations so that downstream synthesis stays accurate.
What are the three legitimate escalation triggers for a customer support agent?
(1) Customer explicitly requests a human. (2) Policy is silent or ambiguous on the specific request. (3) Agent cannot make meaningful progress. Sentiment and confidence scores are not triggers.
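The three triggers can be sketched as a small decision function. The `TurnState` fields and reason strings are illustrative assumptions; the point is that sentiment and self-reported confidence are recorded but never consulted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnState:
    customer_requested_human: bool
    policy_is_clear: bool
    making_progress: bool
    sentiment: float = 0.0         # tracked for tone, never used to escalate
    self_confidence: float = 1.0   # likewise ignored as a trigger

def escalation_reason(state: TurnState) -> Optional[str]:
    """Return the trigger that fired, or None if the agent should keep going."""
    if state.customer_requested_human:
        return "explicit_human_request"
    if not state.policy_is_clear:
        return "policy_gap"
    if not state.making_progress:
        return "no_progress"
    return None

# Illustrative cases: a frustrated customer with a solvable issue, and an explicit request.
angry_but_solvable = TurnState(False, True, True, sentiment=-0.9, self_confidence=0.4)
wants_manager = TurnState(True, True, True)
```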
Why are sentiment analysis and self-reported confidence scores unreliable escalation signals?
Sentiment measures emotional tone, not case complexity. LLM self-reported confidence is poorly calibrated: the agent tends to be confidently wrong on exactly the hard cases it should have escalated.
A customer says "I want to speak to a manager." What does the agent do first?
Route to a human agent immediately: no investigation, no attempt to resolve. Honoring an explicit human request without delay is a hard rule encoded in the system prompt.
A frustrated customer has a delayed shipment the agent can resolve. What does the agent do?
Acknowledge the frustration and offer a resolution, since the issue is within the agent's capability. Escalate only if the customer reiterates their preference for a human agent.
A customer lookup returns multiple matching records. What should the agent do?
Ask for an additional identifier (email, order number, or ZIP code) to disambiguate. Never select a customer record by heuristic when multiple matches exist.
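A minimal sketch of the rule, assuming a hypothetical `resolve_customer` helper and made-up action names: the function proceeds only on exactly one match and never picks among several.

```python
def resolve_customer(matches: list) -> dict:
    """Decide the next step from the number of matching customer records."""
    if not matches:
        return {"action": "not_found"}
    if len(matches) == 1:
        return {"action": "proceed", "customer": matches[0]}
    # Multiple matches: never guess; ask the customer to disambiguate.
    return {
        "action": "ask_for_identifier",
        "accepted_identifiers": ["email", "order_number", "zip_code"],
    }

# Illustrative records.
one = resolve_customer([{"id": 1, "name": "J. Smith"}])
many = resolve_customer([{"id": 1, "name": "J. Smith"}, {"id": 2, "name": "J. Smith"}])
```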
What four fields should a structured error context include?
failure_type · attempted_query (the exact query tried) · partial_results (what was found before failure) · alternatives (other approaches to try). The coordinator needs all four to recover intelligently.
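The four fields map naturally onto a small record type. The class name and sample values below are assumptions for illustration; only the four field names come from the card.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ErrorContext:
    failure_type: str                                     # e.g. "timeout", "permission_denied"
    attempted_query: str                                  # the exact query that was tried
    partial_results: list = field(default_factory=list)   # anything found before the failure
    alternatives: list = field(default_factory=list)      # other approaches worth trying

# Illustrative error report a subagent might hand to its coordinator.
report = asdict(ErrorContext(
    failure_type="timeout",
    attempted_query="orders WHERE customer_id = 42",
    partial_results=[{"order_id": "A-1001"}],
    alternatives=["retry with narrower date range"],
))
```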
What is the difference between an access failure and a valid empty result?
Access failure: the query couldn't run (timeout, permission denied), so a retry decision is needed. Valid empty result: the query ran successfully but matched nothing, so no retry is needed; the absence is the answer.
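The distinction can be made explicit in the return value. This sketch assumes a hypothetical `run_search` wrapper around any search callable and uses Python's built-in `TimeoutError` and `PermissionError` to stand in for access failures.

```python
def run_search(search_fn, query: str) -> dict:
    """Wrap a search call so failures and empty results are distinguishable."""
    try:
        rows = search_fn(query)
    except (TimeoutError, PermissionError) as exc:
        # The query never completed: the caller must decide whether to retry.
        return {"status": "access_failure", "failure_type": type(exc).__name__,
                "attempted_query": query, "results": None}
    # The query ran; an empty list here is a real answer, not an error.
    return {"status": "ok", "attempted_query": query, "results": rows}

def timing_out(query):
    raise TimeoutError("backend did not respond")

failed = run_search(timing_out, "customer:42")
empty = run_search(lambda q: [], "customer:42")
```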
Why is returning a generic "search unavailable" status an anti-pattern?
It hides the failure type, the attempted query, and partial results. Without this context, the coordinator cannot make an intelligent recovery decision (retry, reroute, proceed with partial results, or escalate).
When should a subagent handle an error locally vs. propagate it to the coordinator?
Handle locally only transient failures the subagent can resolve (for example, a brief timeout: retry once). Propagate errors the subagent cannot resolve, always including what was attempted and any partial results.
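A sketch of that split, under the assumption that timeouts are the transient case and permission errors are not. The helper name, the retry count, and the sample operations are hypothetical.

```python
def call_with_local_retry(operation, query: str, max_retries: int = 1) -> dict:
    """Retry transient timeouts locally; propagate everything else with context."""
    failure_type = "unknown"
    for _ in range(max_retries + 1):
        try:
            return {"status": "ok", "results": operation(query)}
        except TimeoutError:
            failure_type = "timeout"            # transient: worth a local retry
        except PermissionError:
            failure_type = "permission_denied"  # retrying cannot fix this
            break
    # Propagate to the coordinator with what was attempted.
    return {"status": "error", "failure_type": failure_type,
            "attempted_query": query, "partial_results": [], "alternatives": []}

calls = {"n": 0}

def flaky_lookup(query):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("brief timeout")
    return ["order A-1001"]

def forbidden_lookup(query):
    raise PermissionError("no access")

recovered = call_with_local_retry(flaky_lookup, "order A-1001")
propagated = call_with_local_retry(forbidden_lookup, "order A-1001")
```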
What is context degradation in extended codebase exploration sessions?
After many turns, the model starts giving inconsistent answers and citing "typical patterns" rather than specific classes or structures it actually discovered earlier in the session.
What is the scratchpad file pattern and why is it used?
A file on disk (e.g., findings.md) where agents write key findings as they discover them. Agents re-read it before answering subsequent questions to counteract context degradation and prevent drift back to "typical patterns."
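A minimal sketch of the pattern using only the standard library. The helper names and the sample findings are made up; the example writes to a temporary directory so it has no side effects.

```python
import tempfile
from pathlib import Path

def record_finding(scratchpad: Path, finding: str) -> None:
    """Append one finding as a bullet so earlier notes are never overwritten."""
    with scratchpad.open("a", encoding="utf-8") as handle:
        handle.write(f"- {finding}\n")

def load_findings(scratchpad: Path) -> str:
    """Read the scratchpad back so it can be re-injected before the next question."""
    return scratchpad.read_text(encoding="utf-8") if scratchpad.exists() else ""

with tempfile.TemporaryDirectory() as tmp:
    pad = Path(tmp) / "findings.md"
    record_finding(pad, "OrderService delegates refunds to RefundPolicy")  # made-up finding
    record_finding(pad, "Retries are configured in config/retry.yaml")     # made-up finding
    reminder = load_findings(pad)
```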
Why delegate verbose codebase exploration to subagents instead of the main coordinator?
Raw file listings and dependency traces stay in the subagent's isolated context. The coordinator receives only a compact summary, so its context stays uncluttered for high-level coordination.
What is /compact used for in Claude Code?
Reduces context usage when an extended exploration session has filled the context window with verbose discovery output. Allows the session to continue without hitting context limits.
How does crash recovery work in a multi-agent codebase exploration?
Each agent exports state to a known location. On resume, the coordinator loads a structured manifest and injects agent state into prompts, so the exploration resumes without restarting from scratch.
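One possible shape for this, assuming a hypothetical layout of one JSON file per agent plus a `manifest.json` that lists them. File names, state fields, and the prompt wording are all illustrative.

```python
import json
import tempfile
from pathlib import Path

def export_state(state_dir: Path, agent_id: str, state: dict) -> None:
    """Write one agent's state and register it in the shared manifest."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / f"{agent_id}.json").write_text(json.dumps(state), encoding="utf-8")
    manifest_path = state_dir / "manifest.json"
    manifest = (json.loads(manifest_path.read_text(encoding="utf-8"))
                if manifest_path.exists() else {"agents": []})
    if agent_id not in manifest["agents"]:
        manifest["agents"].append(agent_id)
    manifest_path.write_text(json.dumps(manifest), encoding="utf-8")

def resume_prompts(state_dir: Path) -> dict:
    """Load the manifest and build a resume prompt for each listed agent."""
    manifest = json.loads((state_dir / "manifest.json").read_text(encoding="utf-8"))
    prompts = {}
    for agent_id in manifest["agents"]:
        state = json.loads((state_dir / f"{agent_id}.json").read_text(encoding="utf-8"))
        prompts[agent_id] = "Resume from saved state:\n" + json.dumps(state, indent=2)
    return prompts

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "state"
    export_state(root, "explorer-1", {"visited": ["src/orders"], "next": "src/billing"})
    export_state(root, "explorer-2", {"visited": [], "next": "src/auth"})
    prompts = resume_prompts(root)
```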
Why is "97% aggregate accuracy" insufficient before reducing human review?
A 97% overall metric can hide one document type or field failing at 71%. Always analyze accuracy by document type and by field before reducing human review on any segment.
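A toy illustration of how an aggregate can mask a weak segment. The records are fabricated for the example (not real measurements), and the record keys are assumptions.

```python
from collections import defaultdict

def accuracy_by_segment(records) -> dict:
    """Compute accuracy per (doc_type, field) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for record in records:
        key = (record["doc_type"], record["field"])
        totals[key][0] += int(record["correct"])
        totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in totals.items()}

# Toy data: 90 perfect invoice totals, 10 receipt tax fields with 3 errors.
records = (
    [{"doc_type": "invoice", "field": "total", "correct": True}] * 90
    + [{"doc_type": "receipt", "field": "tax", "correct": True}] * 7
    + [{"doc_type": "receipt", "field": "tax", "correct": False}] * 3
)
overall = sum(r["correct"] for r in records) / len(records)
per_segment = accuracy_by_segment(records)
```

Here the overall figure is 97%, yet the receipt tax field is right only 70% of the time, which is the kind of gap the per-segment breakdown is meant to expose.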
What is stratified random sampling in human review workflows?
Sampling high-confidence auto-accepted extractions by document type and field (not uniformly) to measure ongoing error rates and detect novel error patterns before they propagate across large volumes.
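A minimal sketch of stratified sampling with the standard library's `random.Random.sample`. The record keys, the per-stratum quota, and the toy data are assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(extractions, per_stratum: int, seed: int = 0) -> list:
    """Draw up to per_stratum items from every (doc_type, field) group."""
    rng = random.Random(seed)  # seeded so review batches are reproducible
    strata = defaultdict(list)
    for item in extractions:
        strata[(item["doc_type"], item["field"])].append(item)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Toy data: a rare stratum would almost vanish under uniform sampling.
extractions = (
    [{"doc_type": "invoice", "field": "total", "id": i} for i in range(1000)]
    + [{"doc_type": "receipt", "field": "tax", "id": i} for i in range(5)]
)
review_batch = stratified_sample(extractions, per_stratum=3)
```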
Why calibrate field-level confidence thresholds on a labeled validation set?
Raw model confidence is not calibrated out of the box. Calibration against a labeled set ensures the threshold actually corresponds to the expected error rate for routing decisions.
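One simple way to calibrate, sketched under the assumption that the validation set is a list of `(confidence, is_correct)` pairs for a single field: scan candidate thresholds from low to high and return the first whose accepted set meets the target error rate. This is an illustrative procedure with toy data, not a prescribed method, and it would be run once per field.

```python
def calibrate_threshold(validation, target_error_rate: float):
    """Return the lowest observed confidence whose accepted set meets the target.

    validation: list of (confidence, is_correct) pairs for one field.
    Returns None if no threshold satisfies the target.
    """
    for threshold in sorted({conf for conf, _ in validation}):
        accepted = [ok for conf, ok in validation if conf >= threshold]
        error_rate = 1 - sum(accepted) / len(accepted)
        if error_rate <= target_error_rate:
            return threshold
    return None

# Toy labeled validation set for one field.
validation = [(0.99, True), (0.95, True), (0.9, False), (0.8, True), (0.6, False)]
strict = calibrate_threshold(validation, target_error_rate=0.0)
lenient = calibrate_threshold(validation, target_error_rate=0.3)
```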
What two types of extractions should be routed to human review?
Extractions with low model confidence AND extractions from ambiguous or contradictory source documents. Both signal cases where automated acceptance carries significant risk.
What four fields should every structured claim-source mapping include?
Source URL · document name · relevant excerpt · publication/collection date. These travel with each claim through synthesis; once stripped, they cannot be reconstructed after the fact.
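A sketch of a claim record that carries its attribution with it. The class name, the `merge_findings` helper, the example.com URLs, and the sample claims are hypothetical; only the four attribution fields come from the card.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SourcedClaim:
    claim: str
    source_url: str
    document_name: str
    excerpt: str
    date: str  # publication or collection date, ISO 8601

def merge_findings(*batches) -> list:
    """Combine findings from several subagents without dropping attribution."""
    return [asdict(claim) for batch in batches for claim in batch]

# Illustrative findings from two subagents.
batch_a = [SourcedClaim("Revenue grew 14%", "https://example.com/a", "Report A",
                        "revenue rose 14% year over year", "2024-03-31")]
batch_b = [SourcedClaim("Revenue grew 9%", "https://example.com/b", "Report B",
                        "revenue increased by 9%", "2024-12-31")]
merged = merge_findings(batch_a, batch_b)
```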
Source A shows +14% revenue; Source B shows +9%. What does the synthesis agent do?
Annotate both values with their sources and dates; do not select one arbitrarily. The synthesis agent surfaces the conflict and does not resolve it. The coordinator decides how to reconcile.
Why must structured outputs include publication or collection dates?
A Q1 figure vs. a Q4 figure may be a temporal update, not a factual contradiction. Without dates, the coordinator cannot distinguish a time-series difference from a genuine conflict.
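The role of dates can be sketched as a labeling step that keeps every observation and only classifies the disagreement. The function name, status labels, and sample dates are assumptions; deciding which value to use is left to the coordinator.

```python
def annotate_values(observations) -> dict:
    """Label differing values for one metric without choosing between them.

    observations: list of dicts with "value", "source", and "date" keys.
    """
    values = {obs["value"] for obs in observations}
    dates = {obs["date"] for obs in observations}
    if len(values) <= 1:
        status = "consistent"
    elif len(dates) > 1:
        status = "possible_temporal_update"  # different dates: may be a time series
    else:
        status = "conflict"                  # same date, different values
    return {"status": status,
            "observations": sorted(observations, key=lambda obs: obs["date"])}

# Illustrative observations; dates are made up.
dated_apart = annotate_values([
    {"value": "+9%", "source": "Source B", "date": "2024-12-31"},
    {"value": "+14%", "source": "Source A", "date": "2024-03-31"},
])
same_day = annotate_values([
    {"value": "+14%", "source": "Source A", "date": "2024-03-31"},
    {"value": "+9%", "source": "Source B", "date": "2024-03-31"},
])
```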
What content types require different rendering in synthesis output?
Financial data → tables. News/qualitative findings → prose. Technical findings → structured lists. Converting everything to one format loses structure or nuance from the original source type.
What is the "source attribution loss" failure mode in multi-source synthesis?
A summarization step strips source URLs, document names, and excerpts from findings. The final report then presents claims with no traceability, so downstream consumers cannot verify or cite sources.