WhyOps doesn’t just record what your agents did; it tells you why they failed and how to improve them. This is done through the Agent Analyses feature in the whyops-analyse service.
Static Analysis
Every trace in WhyOps is subjected to Static Analysis. This runs automatically and detects common structural failures without needing an LLM judge.
What it detects:
- Missing Parent Steps: An event references a parent step that does not exist in the trace.
- Orphan Tool Call Responses: A tool returned a result, but the request was never logged.
- Missing Tool Call Responses: The agent requested a tool, but it never received a response (timeout or crash).
- Identical Tool Call Loops: The agent called the exact same tool with the exact same arguments three or more times consecutively (a classic infinite loop).
- Consecutive Error Streaks: The agent encountered 3 or more errors in a row without recovering.
- Latency Outliers: A step took significantly longer than the rest of the trace (e.g., more than 1.5× the trace's p95 latency).
- Token Outliers: A step consumed significantly more tokens than the trace's p95 token count.
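The loop and error-streak checks above amount to simple stateful scans over an ordered list of trace events. A minimal sketch of that logic (the `Event` shape and field names here are illustrative assumptions, not the WhyOps event format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical event type, e.g. "tool_call" or "error"
    tool: str = ""     # tool name, for tool_call events
    args: str = ""     # canonicalized arguments (e.g. JSON with sorted keys)

def identical_tool_call_loop(events, threshold=3):
    """Flag `threshold` or more consecutive tool calls with the same name and args."""
    streak, prev = 0, None
    for e in events:
        if e.kind != "tool_call":
            streak, prev = 0, None  # any other event breaks the run
            continue
        key = (e.tool, e.args)
        streak = streak + 1 if key == prev else 1
        prev = key
        if streak >= threshold:
            return True
    return False

def consecutive_error_streak(events, threshold=3):
    """Flag `threshold` or more errors in a row with no recovery in between."""
    streak = 0
    for e in events:
        streak = streak + 1 if e.kind == "error" else 0
        if streak >= threshold:
            return True
    return False
```

Note that comparing canonicalized arguments matters: two calls with the same arguments serialized in a different key order should still count as identical.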
LLM-as-a-Judge Analysis
For deeper insights, WhyOps supports Agent Analysis Runs using an LLM Judge. You can configure WhyOps to evaluate traces on specific dimensions.
Supported Dimensions
- intent_precision: Did the agent correctly identify the user’s intent?
- followup_repair: How well did the agent handle clarifying questions or recover from ambiguity?
- answer_completeness_clarity: Was the final answer complete, accurate, and easy to understand?
- tool_routing_quality: Did the agent select the right tool for the job?
- tool_invocation_quality: Did the agent pass the correct arguments to the tool?
- tool_output_utilization: Did the agent correctly interpret and use the output of the tool?
- reliability_recovery: How gracefully did the agent handle errors or missing information?
- latency_cost_efficiency: Was the agent’s path to the answer efficient?
- conversation_ux: Was the tone and structure of the conversation appropriate?
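Assembling an analysis run then comes down to picking trace IDs and a subset of these dimension names. The helper below is a hypothetical sketch, not the WhyOps client API; only the dimension names themselves come from the list above, and the `judge_model` default is an illustrative assumption:

```python
# Dimension identifiers from the Supported Dimensions list above.
SUPPORTED_DIMENSIONS = {
    "intent_precision", "followup_repair", "answer_completeness_clarity",
    "tool_routing_quality", "tool_invocation_quality",
    "tool_output_utilization", "reliability_recovery",
    "latency_cost_efficiency", "conversation_ux",
}

def build_analysis_run(trace_ids, dimensions, judge_model="gpt-4o"):
    """Assemble a run request dict, rejecting unknown dimension names early.

    Hypothetical helper: the request shape and `judge_model` field are
    assumptions for illustration, not the actual WhyOps schema.
    """
    unknown = set(dimensions) - SUPPORTED_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {
        "trace_ids": list(trace_ids),
        "dimensions": list(dimensions),
        "judge_model": judge_model,
    }
```

Validating dimension names client-side keeps typos from silently producing a run that evaluates nothing.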