AutoClaygent
Lesson 6 of 8 · 75% complete
12 min read

Test & Iterate

The automated testing loop AutoClaygent runs until prompts score 8.0+

The Iteration Loop

Building a production-ready Claygent is never one-and-done. The secret is a tight feedback loop: draft → test → evaluate → improve → repeat.

How AutoClaygent Handles This

AutoClaygent runs this entire loop automatically. It tests prompts against sample data, scores the results, identifies issues, and iterates until the prompt scores 8.0+. What takes hours manually happens in minutes.

The Claygent Improvement Loop
  1. Draft — Write initial prompt using patterns (platform detection, CRM validation, etc.)
  2. Test — Run on 10-20 sample rows in Clay
  3. Evaluate — Score results with the 7-criterion rubric
  4. Improve — Fix the biggest issue identified
  5. Repeat — Until the score is 8.0+
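
If it helps to see the loop as code, here is a minimal Python sketch of the same five steps. The helpers (run_on_sample, score_with_rubric, improve_prompt) are hypothetical placeholders for the manual work described in this lesson, not part of Clay or AutoClaygent:

TARGET_SCORE = 8.0
MAX_ITERATIONS = 6  # roughly the upper bound from the iteration table later in this lesson

def run_on_sample(prompt, rows):
    """Placeholder: run the Claygent column on 10-20 test rows and collect the outputs."""
    raise NotImplementedError("run the prompt in Clay and export the results")

def score_with_rubric(results):
    """Placeholder: apply the 7-criterion rubric; return (overall_score, issues worst-first)."""
    raise NotImplementedError("score by hand or let AutoClaygent do it")

def improve_prompt(prompt, issue):
    """Placeholder: rewrite the prompt to address a single issue."""
    raise NotImplementedError("edit the prompt and bump the version number")

def iterate(prompt, sample_rows):
    score = 0.0
    for _ in range(MAX_ITERATIONS):
        results = run_on_sample(prompt, sample_rows)   # 2. Test
        score, issues = score_with_rubric(results)     # 3. Evaluate
        if score >= TARGET_SCORE:
            break                                      # 5. stop once you hit 8.0+
        prompt = improve_prompt(prompt, issues[0])     # 4. Improve: biggest issue only
    return prompt, score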

Setting Up Your Test Environment

Step 1: Create Test Data

Don't test on your full table. Create a small test set of 10-20 rows that includes:

  • Easy cases — Companies with obvious signals (clear portal URLs, prominent booking buttons)
  • Hard cases — Companies with sparse websites, custom portals, or no clear tech indicators
  • Edge cases — International sites, rebranded portals, multiple platforms
💡
Test Data Selection

For platform detection, include companies you KNOW use specific platforms (your existing customers) so you can verify accuracy. For CRM validation, include records you suspect are wrong.
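
One way to keep the mix honest is to tag each test row with its case type and, where you know it, the expected answer. A small illustrative sketch in Python (the domains and platform names are placeholders, not recommendations):

# Hypothetical test set: aim for 10-20 rows total, spread across the three categories.
test_rows = [
    # Easy cases: obvious signals you can verify (e.g., existing customers)
    {"domain": "known-customer.com", "case": "easy", "expected_platform": "PortalVendorA"},
    # Hard cases: sparse websites, custom portals, no clear tech indicators
    {"domain": "sparse-site.com", "case": "hard", "expected_platform": None},
    # Edge cases: international sites, rebranded portals, multiple platforms
    {"domain": "example-intl.co.jp", "case": "edge", "expected_platform": "PortalVendorA"},
]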

Step 2: Add Your Claygent Column

Add a Claygent column with your prompt and configure:

  • Model: GPT-4o-mini (recommended for cost/quality balance)
  • Your API Key: Required — Clay doesn't provide one
  • JSON Schema: Paste your validated schema
⚠️
Model Selection

Never use Clay's built-in "Clay" model for production Claygents. Use your own API key with GPT-4o-mini or Claude Haiku.
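
If you want to sanity-check the schema before pasting it into the column, one option is to lint it locally with the jsonschema package. This is a sketch using field names that match the leadership prompt later in this lesson; your validated schema from the earlier lesson is the real source of truth:

import json
from jsonschema import Draft7Validator  # pip install jsonschema

# Sketch of a schema for the CEO/Founder output shown later in this lesson.
claygent_schema = {
    "type": "object",
    "properties": {
        "leader_name":  {"type": ["string", "null"]},
        "title":        {"type": ["string", "null"]},
        "linkedin_url": {"type": ["string", "null"]},
        "source_url":   {"type": ["string", "null"]},
        "is_current":   {"type": "boolean"},
        "confidence":   {"enum": ["high", "medium", "low"]},
    },
    "required": ["leader_name", "title", "is_current", "confidence"],
}

Draft7Validator.check_schema(claygent_schema)  # raises if the schema itself is malformed
print(json.dumps(claygent_schema, indent=2))   # paste this JSON into the Claygent column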

Step 3: Capture Results for Analysis

Add an HTTP Action column after your Claygent to capture results:

{ "prompt_text": "{{Your full Claygent prompt text}}", "json_output": {{claygent_column_output}}, "prompt_version": "platform-detection-v1.0", "changes_from_previous": "Initial version" }

This sends each result to a webhook where you can analyze patterns across the batch.
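
Any webhook endpoint works as the receiver. If you want something local and throwaway, here is a minimal Flask sketch that just collects the batch for later analysis (the /claygent-results path and port are arbitrary choices, not anything Clay requires):

from flask import Flask, request, jsonify  # pip install flask

app = Flask(__name__)
captured = []  # one entry per Claygent row in the test batch

@app.route("/claygent-results", methods=["POST"])
def capture():
    # payload carries prompt_text, json_output, prompt_version, changes_from_previous
    payload = request.get_json(force=True)
    captured.append(payload)
    return jsonify({"received": len(captured)})

if __name__ == "__main__":
    app.run(port=5000)  # point the HTTP Action column at http://<your-host>:5000/claygent-results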

See the Iteration in Action

Compare different versions of the same prompt to see how it improves:

Before/After — Prompt Evolution
v1.0 (Score: 4.2)
Find the CEO of {{domain}} and their email.
Issues:
  • No step-by-step instructions
  • No fallback plan
  • No JSON output format
  • No confidence handling
  • Two tasks combined (violates 3-task rule)
v1.3 (Score: 8.5)
Given the company domain: {{domain}}

Goal: Find the CEO or primary founder.

Research steps:
1. Visit {{domain}}/about, {{domain}}/team, or {{domain}}/leadership
2. Look for titles: CEO, Chief Executive Officer, Founder, Co-Founder
3. Extract their full name

If not found on website:
- Search: "{{domain}}" CEO OR Founder site:linkedin.com
- Look for recent press releases or funding announcements

Verification:
- Confirm they are CURRENT (not "former")
- Check that the company domain matches

Output as JSON:
{
  "leader_name": "Full name of CEO/Founder",
  "title": "Their exact title",
  "linkedin_url": "LinkedIn profile URL or null",
  "source_url": "Where you found this",
  "is_current": true or false,
  "confidence": "high" | "medium" | "low"
}

IMPORTANT:
- Only report verified, current leadership
- If person appears to be former employee, do not include
- Set to null if uncertain
Improvements:
  • Added verification section for 'current' check
  • Added press/funding as secondary sources
  • Added source_url for audit trail
  • Added is_current boolean field
  • Added explicit instructions for handling uncertainty
Score improvement: 4.2 → 8.5 (+4.3)
Production ready!

How Many Iterations?

Starting Score | Expected Iterations | Typical Issues
< 5.0 | 4-6 iterations | Missing detection methods, no decision trees
5.0 - 6.9 | 2-3 iterations | Missing fallbacks, weak confidence definitions
7.0 - 7.9 | 1-2 iterations | Edge case handling, evidence requirements
8.0+ | Done! | Production ready

What to Fix First

When you find multiple issues, fix them in this order:

  1. Accuracy issues (25% weight) — Wrong platform detected, false positives
  2. Completeness issues (25% weight) — Missed platforms that were obviously there
  3. JSON issues (15% weight) — Schema mismatches, invalid enum values
  4. Source issues (15% weight) — Missing evidence URLs, unreliable detection methods
  5. Efficiency issues (10% weight) — Too many steps, redundant checks
💡
Pro Tip

Fix one issue per iteration. Don't try to fix everything at once — you won't know which change worked or made things worse.
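
The fix-first order follows directly from the rubric weights. Here is a quick Python sketch of the arithmetic, using only the five criteria listed above scored 0-10 (the rubric criteria not listed here carry the remaining weight):

# Weights from the priority list above; criteria not listed share the remaining 0.10.
WEIGHTS = {
    "accuracy":     0.25,
    "completeness": 0.25,
    "json":         0.15,
    "sources":      0.15,
    "efficiency":   0.10,
}

def weighted_score(scores):
    """scores maps each criterion to a 0-10 value; unscored criteria count as 0."""
    return sum(weight * scores.get(criterion, 0.0) for criterion, weight in WEIGHTS.items())

# Raising accuracy from 4 to 8 adds 0.25 * 4 = 1.0 to the total;
# the same jump on efficiency adds only 0.10 * 4 = 0.4.
print(weighted_score({"accuracy": 8, "completeness": 7, "json": 9, "sources": 8, "efficiency": 9}))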

Version Your Prompts

Always save each version of your prompt with a version number:

  • platform-detection-v1.0 — Initial version with subdomain patterns
  • platform-detection-v1.1 — Added redirect detection for custom domains
  • platform-detection-v1.2 — Added widget detection for embedded tools
  • platform-detection-v2.0 — Major restructure: added confidence tiers

This helps you track what changed and roll back if something breaks.
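
A lightweight way to do this without extra tooling is to write each version to its own file and keep a one-line changelog, mirroring the prompt_version and changes_from_previous fields sent to the webhook in Step 3. A sketch (the prompts/ folder and naming scheme are just one possible convention):

from pathlib import Path

def save_prompt_version(name, version, prompt_text, changes):
    """Write prompts/<name>-<version>.txt and append one changelog line."""
    prompt_dir = Path("prompts")
    prompt_dir.mkdir(exist_ok=True)
    path = prompt_dir / f"{name}-{version}.txt"
    path.write_text(prompt_text)
    with open(prompt_dir / "CHANGELOG.txt", "a") as log:
        log.write(f"{name}-{version}: {changes}\n")
    return path

save_prompt_version("platform-detection", "v1.1", "...full prompt text...",
                    "Added redirect detection for custom domains")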

Real-World Iteration Example

Here's how a platform detection prompt evolved through testing:

Version | Score | Issue Found | Fix Applied
v1.0 | 6.2 | Missing platforms on custom domains | Added redirect detection step
v1.1 | 7.1 | False positives on similar-looking URLs | Added explicit subdomain pattern list
v1.2 | 7.8 | No evidence for widget detections | Required evidence_url for every platform
v1.3 | 8.3 | N/A — Production ready | Deployed to full table

Key Takeaways

  • Test on 10-20 diverse rows, not your full table
  • Use your own API key (GPT-4o-mini or Claude Haiku)
  • Fix one issue per iteration
  • Target 8.0+ before deploying to production
  • Version your prompts to track changes and enable rollback

Ready to build Claygents that actually work?

Get the complete course with interactive playground and all 9 prompt patterns.