Writing robust evaluations (evals) for AI agents is tedious and often misses critical edge cases. WhyOps solves this by automatically generating comprehensive, adversarial test suites tailored to your specific agent’s behavior and real-world failure patterns.

Documentation Index
Fetch the complete documentation index at: https://whyops.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
How it works
The Synthetic Eval Generation engine in the whyops-analyse service uses a multi-step pipeline to build high-quality test cases.
1. Agent Profiling
First, WhyOps analyzes your agent’s historical traces (prompts, tool usage, common responses) to build an Agent Profile. It understands what your agent does, who its users are, and how it operates.

2. Intelligence Gathering
Next, WhyOps proactively searches the internet for known failure modes, competitor complaints, and edge cases related to your agent’s domain. It pulls intelligence from:
- Linkup Search
- Hacker News (HN)
- GitHub Issues
- Reddit Discussions
3. Generation & Critique
Using the Agent Profile and gathered intelligence, WhyOps employs a multi-agent LangChain workflow:
- Generator: Creates test cases across various categories.
- Critique: A separate judge model reviews the generated cases to ensure they are realistic, challenging, and strictly verifiable.
- Validation: Ensures the output schema exactly matches what the target testing framework expects.
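The generate-critique-validate loop above can be sketched as follows. This is a minimal illustration, not the actual WhyOps implementation: the helper functions, field names, and acceptance threshold are all hypothetical stand-ins for the real LangChain workflow.

```python
import json

def generate_cases(profile, intelligence, category, n=3):
    """Generator: drafts candidate test cases for one category (stubbed)."""
    return [
        {"category": category,
         "prompt": f"{intelligence} scenario {i} for {profile}"}
        for i in range(n)
    ]

def critique_case(case):
    """Critique: a judge model would score realism and verifiability;
    here a trivial heuristic stands in for it."""
    return 1.0 if "scenario" in case["prompt"] else 0.0

def validate_schema(case):
    """Validation: require the fields the export step depends on."""
    return isinstance(case.get("prompt"), str) and isinstance(case.get("category"), str)

def build_suite(profile, intelligence, categories, min_score=0.8):
    """Keep only cases that pass both the judge and the schema check."""
    suite = []
    for category in categories:
        for case in generate_cases(profile, intelligence, category):
            if critique_case(case) >= min_score and validate_schema(case):
                suite.append(case)
    return suite

suite = build_suite("support-agent", "billing edge cases",
                    ["happy_path", "adversarial"])
print(json.dumps(suite[0]))
```

The key design point this mirrors is that the Critique step filters before validation, so only cases that are both realistic and schema-correct reach the export stage.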
Eval Categories
You can configure WhyOps to generate evals across seven distinct categories:
- happy_path: Standard user interactions that should succeed easily.
- edge_case: Rare or complex scenarios that test the boundaries of the agent’s logic.
- multi_step: Tasks requiring the agent to execute a sequence of tools correctly.
- error_handling: Scenarios where tools simulate failures (e.g., a database timeout) to see if the agent recovers gracefully.
- adversarial: Attempts to jailbreak the agent or bypass its system prompt instructions.
- safety: Prompts designed to test PII redaction, harmful content filters, and compliance.
- feature_specific: Tests targeting a custom prompt or a specific new feature you’ve defined.
Exporting for Promptfoo
WhyOps doesn’t force you to use a proprietary testing runner. You can export the generated test suite directly into a YAML format fully compatible with Promptfoo.

API Export
The export produces a promptfoo.yaml file containing your exact system prompt, the generated test cases, and the strict assertions (e.g., llm-rubric, icontains, is-json) required to validate them.
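An exported file follows the standard Promptfoo config shape; the prompt text, provider, and assertion values below are illustrative placeholders, not actual WhyOps output:

```yaml
prompts:
  - "You are a support agent. {{query}}"

providers:
  - openai:gpt-4o-mini

tests:
  - description: "edge_case: ambiguous refund request"
    vars:
      query: "I want my money back but I lost the receipt."
    assert:
      - type: llm-rubric
        value: "Offers a concrete next step without promising a refund."
      - type: icontains
        value: "refund"
```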
You can then run the suite locally in your CI/CD pipeline:
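For example, a CI step along these lines (assuming Node.js is available and the exported file keeps its default name):

```shell
# Run the exported suite against the configured providers.
npx promptfoo@latest eval -c promptfoo.yaml
```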
Running via the Dashboard
You do not need to use the API to generate evals. Navigate to the Evals Tab on your Agent’s page in the WhyOps Dashboard:
- Select your target categories.
- Click Generate Evals.
- If intelligence still needs to be gathered, WhyOps processes it in the background.
- Review the generated cases in the UI and click Export as Promptfoo.