Test & Iterate
The automated testing loop AutoClaygent runs until your prompt scores 8.0+
The Iteration Loop
Building a production-ready Claygent is never one-and-done. The secret is a tight feedback loop: draft → test → evaluate → improve → repeat.
AutoClaygent runs this entire loop automatically. It tests prompts against sample data, scores the results, identifies issues, and iterates until the prompt scores 8.0+. What takes hours manually happens in minutes.
Setting Up Your Test Environment
Step 1: Create Test Data
Don't test on your full table. Create a small test set of 10-20 rows that includes:
- Easy cases — Companies with obvious signals (clear portal URLs, prominent booking buttons)
- Hard cases — Companies with sparse websites, custom portals, or no clear tech indicators
- Edge cases — International sites, rebranded portals, multiple platforms
For platform detection, include companies you KNOW use specific platforms (your existing customers) so you can verify accuracy. For CRM validation, include records you suspect are wrong.
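If it helps to see the shape of such a test set, here is a minimal Python sketch; the domains, platform names, and field names are placeholders for illustration, not data from a real table.

```python
# A hand-built test set sketch. The real set should have 10-20 rows; the
# domains and platform names below are placeholders, not real companies.
test_rows = [
    # Easy case: obvious signals such as a clear portal URL or booking button
    {"domain": "example-clinic.com", "known_platform": "AcmeBooking", "difficulty": "easy"},
    # Hard case: sparse website, custom portal, no clear tech indicators
    {"domain": "example-consulting.com", "known_platform": None, "difficulty": "hard"},
    # Edge case: international site, rebranded portal, or multiple platforms
    {"domain": "example-intl.de", "known_platform": "AcmeBooking", "difficulty": "edge"},
]

# Make sure all three buckets are represented before importing into Clay.
assert {row["difficulty"] for row in test_rows} == {"easy", "hard", "edge"}
```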
Step 2: Add Your Claygent Column
Add a Claygent column with your prompt and configure:
- Model: GPT-4o-mini (recommended for cost/quality balance)
- Your API Key: Required — Clay doesn't provide one
- JSON Schema: Paste your validated schema
Never use Clay's built-in "Clay" model for production Claygents. Use your own API key with GPT-4o-mini or Claude Haiku.
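If you need a starting point for the schema, here is a sketch that builds one matching the CEO-finder output used later in this section and prints it as JSON to paste into the column config. The field names come from that example; adapt them to your own task.

```python
import json

# Sketch of a JSON Schema for the CEO-finder output shown later in this
# section. Field names mirror that example and are not a fixed requirement.
schema = {
    "type": "object",
    "properties": {
        "leader_name":  {"type": ["string", "null"]},
        "title":        {"type": ["string", "null"]},
        "linkedin_url": {"type": ["string", "null"]},
        "source_url":   {"type": ["string", "null"]},
        "is_current":   {"type": "boolean"},
        "confidence":   {"type": "string", "enum": ["high", "medium", "low"]},
    },
    "required": ["leader_name", "title", "is_current", "confidence"],
    "additionalProperties": False,
}

# Paste the printed schema into the Claygent column's JSON Schema field.
print(json.dumps(schema, indent=2))
```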
Step 3: Capture Results for Analysis
Add an HTTP Action column after your Claygent to capture results:
```
{
  "prompt_text": "{{Your full Claygent prompt text}}",
  "json_output": {{claygent_column_output}},
  "prompt_version": "platform-detection-v1.0",
  "changes_from_previous": "Initial version"
}
```

This sends each result to a webhook where you can analyze patterns across the batch.
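The webhook itself is up to you. A minimal receiver might look like the following Flask sketch, which appends each payload to a JSON Lines file for later analysis; the route path and filename are arbitrary choices, not part of Clay's setup.

```python
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/claygent-results", methods=["POST"])
def claygent_results():
    # Each Clay HTTP Action call lands here; store it as one JSON line.
    payload = request.get_json(force=True)
    with open("claygent_results.jsonl", "a") as f:
        f.write(json.dumps(payload) + "\n")
    return {"ok": True}

if __name__ == "__main__":
    app.run(port=8000)
```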
See the Iteration in Action
Compare different versions of the same prompt to see how it improves:
Before:

```
Find the CEO of {{domain}} and their email.
```

- ✗ No step-by-step instructions
- ✗ No fallback plan
- ✗ No JSON output format
- ✗ No confidence handling
- ✗ Two tasks combined (violates the 3-task rule)

After:

```
Given the company domain: {{domain}}
Goal: Find the CEO or primary founder.
Research steps:
1. Visit {{domain}}/about, {{domain}}/team, or {{domain}}/leadership
2. Look for titles: CEO, Chief Executive Officer, Founder, Co-Founder
3. Extract their full name
If not found on website:
- Search: "{{domain}}" CEO OR Founder site:linkedin.com
- Look for recent press releases or funding announcements
Verification:
- Confirm they are CURRENT (not "former")
- Check that the company domain matches
Output as JSON:
{
"leader_name": "Full name of CEO/Founder",
"title": "Their exact title",
"linkedin_url": "LinkedIn profile URL or null",
"source_url": "Where you found this",
"is_current": true or false,
"confidence": "high" | "medium" | "low"
}
IMPORTANT:
- Only report verified, current leadership
- If person appears to be former employee, do not include
- Set to null if uncertain
```

- ✓ Added verification section for 'current' check
- ✓ Added press/funding as secondary sources
- ✓ Added source_url for audit trail
- ✓ Added is_current boolean field
- ✓ Added explicit instructions for handling uncertainty
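Once results are flowing into your webhook, a quick script can catch the same failure modes this checklist targets: missing fields, invalid confidence values, or former employees slipping through. This sketch assumes the JSON Lines file from the webhook example above and the field names from the improved prompt; both are illustrative.

```python
import json

REQUIRED = {"leader_name", "title", "is_current", "confidence"}
VALID_CONFIDENCE = {"high", "medium", "low"}

with open("claygent_results.jsonl") as f:
    for i, line in enumerate(f, start=1):
        # json_output is the Claygent column's structured result.
        row = json.loads(line).get("json_output") or {}
        missing = REQUIRED - set(row)
        if missing:
            print(f"row {i}: missing fields {sorted(missing)}")
        if row.get("confidence") not in VALID_CONFIDENCE:
            print(f"row {i}: invalid confidence {row.get('confidence')!r}")
        if row.get("is_current") is False:
            print(f"row {i}: flagged as former leadership, review manually")
```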
How Many Iterations?
| Starting Score | Expected Iterations | Typical Issues |
|---|---|---|
| < 5.0 | 4-6 iterations | Missing detection methods, no decision trees |
| 5.0 - 6.9 | 2-3 iterations | Missing fallbacks, weak confidence definitions |
| 7.0 - 7.9 | 1-2 iterations | Edge case handling, evidence requirements |
| 8.0+ | Done! | Production ready |
What to Fix First
When you find multiple issues, fix them in this order:
1. Accuracy issues (25% weight) — Wrong platform detected, false positives
2. Completeness issues (25% weight) — Missed platforms that were obviously there
3. JSON issues (15% weight) — Schema mismatches, invalid enum values
4. Source issues (15% weight) — Missing evidence URLs, unreliable detection methods
5. Efficiency issues (10% weight) — Too many steps, redundant checks
Fix one issue per iteration. Don't try to fix everything at once — you won't know which change worked or made things worse.
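To see why this order matters, here is a sketch of how the listed weights combine into a composite score, assuming each dimension is scored 0 to 10. Only the dimensions named above are included, so this approximates the rubric rather than reproducing its exact formula.

```python
# Weights as listed above; the full AutoClaygent rubric may include more.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.25,
    "json": 0.15,
    "sources": 0.15,
    "efficiency": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted average over the listed dimensions (0-10 scale)."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / total_weight

# Example: strong accuracy but weak sourcing keeps the total below 8.0,
# which tells you to fix source issues before efficiency tweaks.
print(round(composite({"accuracy": 9, "completeness": 8, "json": 8,
                       "sources": 5, "efficiency": 9}), 2))  # -> 7.89
```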
Version Your Prompts
Always save each version of your prompt with a version number:
- platform-detection-v1.0 — Initial version with subdomain patterns
- platform-detection-v1.1 — Added redirect detection for custom domains
- platform-detection-v1.2 — Added widget detection for embedded tools
- platform-detection-v2.0 — Major restructure: added confidence tiers
This helps you track what changed and roll back if something breaks.
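A lightweight way to do this is a local version log that mirrors the prompt_version and changes_from_previous fields from the HTTP Action payload. The sketch below is one possible structure; a git repository works just as well.

```python
import json
from datetime import date

def log_version(version: str, changes: str, prompt_text: str,
                path: str = "prompt_versions.jsonl") -> None:
    # Append one entry per prompt revision so you can diff or roll back later.
    entry = {
        "version": version,
        "date": date.today().isoformat(),
        "changes_from_previous": changes,
        "prompt_text": prompt_text,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_version("platform-detection-v1.1",
            "Added redirect detection for custom domains",
            "Given the company domain ...")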
Real-World Iteration Example
Here's how a platform detection prompt evolved through testing:
| Version | Score | Issue Found | Fix Applied |
|---|---|---|---|
| v1.0 | 6.2 | Missing platforms on custom domains | Added redirect detection step |
| v1.1 | 7.1 | False positives on similar-looking URLs | Added explicit subdomain pattern list |
| v1.2 | 7.8 | No evidence for widget detections | Required evidence_url for every platform |
| v1.3 | 8.3 | N/A — Production ready | Deployed to full table |
Key Takeaways
- Test on 10-20 diverse rows, not your full table
- Use your own API key (GPT-4o-mini or Claude Haiku)
- Fix one issue per iteration
- Target 8.0+ before deploying to production
- Version your prompts to track changes and enable rollback