Automated testing and validation system for Freshworks marketplace apps with intelligent error learning.
Validates Platform 3.0 compliance, Crayons UI usage, and learns from validation failures to improve app generation over time.
Two Testing Modes:
- Generate & Test - Create new apps from prompts and validate
- Evaluate - Test existing apps without regeneration
Key Features:
- ✅ Quick Setup Script - Interactive criteria creation and app setup
- ✅ FDK validation with detailed error reporting
- ✅ Platform 3.0 compliance checking (5 criteria)
- ✅ Crayons UI component detection
- ✅ Automated error learning and pattern detection
- ✅ Custom requirements tracking
- ✅ 100-point scoring with letter grades (A-F)
benchmarking/
├── automate_test.py        # Main automation script
├── setup_test.py           # Quick setup for criteria & apps
├── convert_criteria.py     # Plain text to JSON converter (NEW!)
├── error_learner.py        # Error pattern detection & learning
├── requirements.txt        # Python dependencies
├── example-criteria.json   # Example criteria template
├── use-cases/              # Test case definitions
├── test-criteria/          # Validation criteria per app
├── results/                # Benchmarking scores & reports
├── test-apps/              # Sample test applications
└── .dev/                   # Error learning data
    ├── comparison/error_database.json
    └── planning/AUTO_SKILL_UPDATES.md
pip install -r requirements.txt
npm install -g @freshworks/fdk
cd benchmarking
# MODE 1: Generate & Test New Apps
python3 automate_test.py --app APP003
# MODE 2: Evaluate Existing Apps (Quick Setup)
# Step 1: Setup accepts BOTH plain text and JSON!
python3 setup_test.py APP001
# (paste plain text OR JSON, type 'END')
# Step 2: Run evaluation with saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# OR with comma-separated requirements
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "OAuth,Webhooks"
# MODE 3: Evaluate Existing Apps (Direct)
python3 automate_test.py --evaluate test-apps/zapier --app-id APP005
python3 automate_test.py --evaluate test-apps/zapier --requirements "OAuth,Events,Sync"
# Error Learning & Statistics
python3 automate_test.py --show-stats
python3 automate_test.py --generate-skill-updates
Apps are scored on a 100-point scale with letter grades (A-F):
| Category | Weight | Points | Description |
|---|---|---|---|
| FDK Validation | 20% | 20 | Pass/fail FDK validation |
| File Structure | 20% | 20 | Required files present |
| Platform 3.0 Compliance | 40% | 40 | 5 checks × 8 pts each |
| Crayons Usage | 20% | 20 | UI component library usage |
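The weighting above amounts to a simple sum. A minimal sketch of how the categories might combine (function and parameter names are illustrative, not automate_test.py's actual code):

```python
# Hypothetical sketch of the 100-point scheme in the table above.
# Names and the proportional file-structure credit are assumptions.

def compute_score(fdk_passed: bool, files_present: int, files_expected: int,
                  compliance_checks_passed: int, uses_crayons: bool) -> float:
    """Combine the four weighted categories into a 0-100 total."""
    fdk_points = 20.0 if fdk_passed else 0.0
    # File structure: proportional credit for expected files found
    file_points = 20.0 * files_present / files_expected if files_expected else 20.0
    compliance_points = 8.0 * compliance_checks_passed  # 5 checks x 8 pts
    crayons_points = 20.0 if uses_crayons else 0.0
    return fdk_points + file_points + compliance_points + crayons_points
```

A fully compliant app with all files, passing FDK validation and using Crayons, scores the full 100 points.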
Platform 3.0 Compliance Checks:
- Platform version 3.0
- Uses 'modules' structure
- No whitelisted-domains
- Engines block present
- Correct location placement (auto-pass for background/serverless apps)
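A minimal sketch of these five checks against a parsed manifest.json. The manifest field names follow FDK conventions, but the helper and its location heuristic are assumptions, not the script's real logic:

```python
# Illustrative Platform 3.0 compliance checks over a parsed manifest.json.
# The location heuristic is an assumption, not the tool's actual rule.

def check_platform3_compliance(manifest: dict, is_serverless: bool = False) -> dict:
    modules = manifest.get("modules", {})
    return {
        "platform_version_3_0": str(manifest.get("platform-version", "")).startswith("3."),
        "modules_structure": "modules" in manifest,
        "no_whitelisted_domains": "whitelisted-domains" not in manifest,
        "engines_present": "engines" in manifest,
        # Background/serverless apps auto-pass the location check
        "correct_location_placement": (
            is_serverless or all("location" in m for m in modules.values())
        ),
    }
```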
Grade Scale: A (90-100) • B (80-89) • C (70-79) • D (60-69) • F (<60)
Example Result:
{
"app_id": "APP003",
"score": {
"total_score": 85.0,
"percentage": 85.0,
"grade": "B"
},
"validation": {
"success": true,
"platform_errors": [],
"lint_errors": []
},
"platform3_compliance": {
"platform_version_3_0": true,
"modules_structure": true,
"no_whitelisted_domains": true,
"engines_present": true,
"correct_location_placement": true
}
}
What each grade means:
- A (90-100): Production-ready, follows all best practices
- B (80-89): Good quality, minor improvements needed
- C (70-79): Functional but needs attention to standards
- D (60-69): Multiple issues, requires fixes before deployment
- F (<60): Significant problems, major refactoring needed
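The scale maps directly to a threshold lookup, for example:

```python
# Minimal mapping from total score to letter grade, per the scale above.
def letter_grade(score: float) -> str:
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```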
Automatically tracks validation failures and generates improvement suggestions.
How it works:
- Records FDK validation errors with context
- Identifies patterns across multiple apps
- Generates actionable skill improvements
- Tracks which patterns have been resolved
Common patterns tracked:
- deprecated_request_api - Using old request methods
- async_no_await - Async functions without await
- request_schema_error - Incorrect request template structure
- invalid_location - Wrong location placement
- oauth_integrations - OAuth config structure issues
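Pattern detection presumably matches FDK error text against known signatures. A hedged sketch (the regexes are illustrative guesses, not the actual rules in error_learner.py):

```python
# Hypothetical bucketing of FDK error messages into the pattern IDs above.
# The regexes are illustrative guesses, not error_learner.py's real rules.
import re
from typing import Optional

PATTERNS = {
    "deprecated_request_api": re.compile(r"\$request\.(get|post)\b"),
    "async_no_await": re.compile(r"async function.*no await", re.IGNORECASE),
    "request_schema_error": re.compile(r"requests\.json.*(invalid|schema)", re.IGNORECASE),
    "invalid_location": re.compile(r"location .*(invalid|not allowed)", re.IGNORECASE),
    "oauth_integrations": re.compile(r"oauth_config", re.IGNORECASE),
}

def classify_error(message: str) -> Optional[str]:
    """Return the first matching pattern ID, or None if unrecognized."""
    for pattern_id, regex in PATTERNS.items():
        if regex.search(message):
            return pattern_id
    return None
```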
Commands:
# View error statistics
python3 error_learner.py stats
# Generate skill improvement suggestions
python3 error_learner.py suggest
# View generated suggestions
cat .dev/planning/AUTO_SKILL_UPDATES.md
Example output:
📊 Error Learning Statistics
Total errors recorded: 2
Unique patterns: 4
Fixed patterns: 0
Unfixed patterns: 4
Most common patterns:
- request_schema_error: 3 occurrences
- deprecated_request_api: 2 occurrences
- async_no_await: 2 occurrences
- product_field_deprecated: 1 occurrence
Data stored in:
- Error Database: .dev/comparison/error_database.json
- Skill Updates: .dev/planning/AUTO_SKILL_UPDATES.md
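The error database's exact schema isn't documented here. One plausible shape, consistent with the statistics output above (an assumption, not the real file format):

```python
# Plausible (assumed) shape for .dev/comparison/error_database.json.
# The real schema used by error_learner.py may differ.
example_db = {
    "errors": [
        {"app_id": "APP003", "pattern": "request_schema_error",
         "message": "requests.json: invalid schema", "fixed": False},
    ],
    "patterns": {
        "request_schema_error": {"count": 3, "fixed": False},
        "deprecated_request_api": {"count": 2, "fixed": False},
    },
}

def pattern_stats(db: dict) -> dict:
    """Summarize pattern counts the way --show-stats does."""
    patterns = db["patterns"]
    return {
        "unique_patterns": len(patterns),
        "unfixed_patterns": sum(1 for p in patterns.values() if not p["fixed"]),
        "most_common": max(patterns, key=lambda k: patterns[k]["count"]),
    }
```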
# Test APP003 (Freshdesk-GitHub Integration)
python3 automate_test.py --app APP003
# Opens prompt, you generate in separate Cursor window, then validates
# Step 1: Setup with interactive input
python3 setup_test.py APP001
# Option A: Paste plain text (EASIEST!)
# Requirements:
# - OAuth 2.0
# - Webhooks
# - Platform 3.0
#
# Features:
# - Request templates
# - Data methods
# - Custom iParams
#
# Description:
# Ticket automation app
#
# Type 'END' and press Enter
# OR Option B: Paste JSON
# {
# "requirements": ["OAuth 2.0", "Webhooks", "Platform 3.0"],
# "expected_files": ["manifest.json", "server/server.js"],
# "description": "Ticket automation app"
# }
# Type 'END' and press Enter
# Step 2: Copy your app
cp -r /path/to/your/app/* test-apps/APP001/
# Step 3: Run evaluation using saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# Setup and copy app in one command
python3 setup_test.py APP001 --app-path /path/to/your/app
# Then run evaluation
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001
# After running setup_test.py, use the saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# This loads:
# - All requirements from the criteria file
# - Expected files list
# - Automatically uses them for validation
# Result: APP001_result.json with all requirements tracked
# Copy your app to test-apps/
cp -r ~/my-freshdesk-app test-apps/my-app
# Evaluate it
python3 automate_test.py --evaluate test-apps/my-app
# Result: EVAL_MY-APP_result.json with score and grade
# Evaluate with specific requirements
python3 automate_test.py --evaluate test-apps/oauth-app \
--requirements "OAuth 2.0,Token refresh,Webhook support,Crayons UI"
# Requirements are tracked in the results file
# Evaluate version 1
python3 automate_test.py --evaluate test-apps/app-v1 --app-id APP_V1
# Evaluate version 2
python3 automate_test.py --evaluate test-apps/app-v2 --app-id APP_V2
# Compare results/APP_V1_result.json vs results/APP_V2_result.json
# Run multiple tests
python3 automate_test.py --app APP001
python3 automate_test.py --app APP002
python3 automate_test.py --app APP003
# View error statistics
python3 automate_test.py --show-stats
# Generate improvement suggestions
python3 automate_test.py --generate-skill-updates
7 predefined test cases covering various app types:
| ID | Name | Type | Product |
|---|---|---|---|
| APP001 | MS Teams Presence Checker | Frontend | Freshservice |
| APP002 | Freshservice-Asana Sync | Serverless | Freshservice |
| APP003 | Freshdesk-GitHub Integration | Frontend | Freshdesk |
| APP004 | Password Generator | Frontend | Freshservice |
| APP005 | Freshdesk-Zapier Contact Sync | Serverless | Freshdesk |
| APP006 | Jira-Freshdesk OAuth Sync | Serverless | Freshdesk |
| APP007 | Ticket Field Validation | Frontend | Freshdesk |
Interactive mode - paste your criteria in ANY format:
python3 setup_test.py APP001
# Option A: Plain Text (EASIEST!)
# TIP: Include "Frontend" or "Serverless" to auto-detect app type!
Requirements:
- Frontend
- OAuth 2.0
- Webhooks
- Platform 3.0
Features:
- Request templates
- Data methods
- Custom iParams
Description:
Ticket automation with external API
# Type 'END' and press Enter
# Auto-detects: Frontend app (no server/server.js expected)
# Option B: JSON format (also works!)
{
"requirements": ["OAuth 2.0", "Webhooks", "Platform 3.0"],
"expected_files": ["manifest.json", "server/server.js"],
"description": "Ticket automation with external API"
}
# Type 'END' and press Enter
# Script automatically:
# 1. Detects format (plain text or JSON)
# 2. Fixes common typos (erverless → Serverless, iparam → iParam, etc.)
# 3. Detects app type (Frontend/Serverless) from requirements
# 4. Generates appropriate expected files based on app type
# 5. Saves criteria to test-criteria/APP001-criteria.json
# 6. Creates test-apps/APP001/ directory
# 7. Shows evaluation command to run
With existing app (one command):
# Copy app automatically
python3 setup_test.py APP001 --app-path /path/to/your/app
# Or with criteria file
python3 setup_test.py APP001 --criteria-file my-criteria.json --app-path /path/to/app
Then evaluate:
# Use saved criteria file (RECOMMENDED)
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# OR use comma-separated requirements
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "OAuth 2.0,Webhooks,Platform 3.0"
Step 1: Add to use-cases/use_cases.json:
{
"id": "APP008",
"name": "My Custom App",
"app_type": "Frontend",
"product": "freshdesk",
"prompt": "Description of your app",
"expected_files": ["manifest.json", "app/index.html", "app/scripts/app.js"]
}
Step 2: Run python3 automate_test.py --app APP008
Step 1: Copy app to benchmarking/test-apps/YOUR-APP-NAME/
Step 2: Evaluate with requirements:
# Option A: Use existing use case ID
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME --app-id APP008
# Option B: Provide custom requirements
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME --requirements "OAuth,API integration,Crayons UI"
# Option C: Just validate (no requirements)
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME
Step 3: View results in results/ folder
┌─────────────────┐
│ Define Use Case │
│ (use_cases.json)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ Run Test Script │─────▶│ Generate in      │
│ --app APP008    │      │ Separate Cursor  │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         │◀───────────────────────┘
         │ Press ENTER
         ▼
┌─────────────────┐
│ FDK Validation  │
│ + Compliance    │
│ + Scoring       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Results JSON    │
│ + Error Learning│
└─────────────────┘
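The FDK Validation step presumably shells out to the FDK CLI. A sketch of how that invocation and error collection might look (the "error" substring heuristic is an assumption, since FDK output formats vary by version):

```python
# Sketch of running FDK validation from the automation script.
# The output-parsing heuristic is an assumption, not the real parser.
import subprocess

def parse_errors(output: str) -> list:
    """Keep lines that look like error reports."""
    return [line.strip() for line in output.splitlines()
            if "error" in line.lower()]

def run_fdk_validate(app_path: str):
    """Run `fdk validate` in app_path; return (passed, error_lines)."""
    result = subprocess.run(["fdk", "validate"], cwd=app_path,
                            capture_output=True, text=True)
    errors = parse_errors(result.stdout + result.stderr)
    return result.returncode == 0 and not errors, errors
```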
Commands:
# 1. Add use case to use-cases/use_cases.json
# 2. Run test
python3 automate_test.py --app APP008
# 3. Generate in separate Cursor window
# 4. Press ENTER to validate
# 5. Check results in results/APP008_result.json

┌─────────────────┐
│ Run setup_test  │
│ python3 setup_  │
│ test.py APP001  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Paste Criteria  │
│ JSON (or file)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ Criteria Saved  │      │ Directory Created│
│ test-criteria/  │      │ test-apps/APP001 │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         └───────────┬────────────┘
                     │
                     ▼
┌─────────────────────────────────┐
│ Copy App or Generate            │
│ cp -r /path/to/app test-apps/   │
└────────┬────────────────────────┘
         │
         ▼
┌─────────────────┐
│ Run Evaluation  │
│ (command shown  │
│ by setup)       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FDK Validation  │
│ + Compliance    │
│ + Scoring       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Results JSON    │─────▶│ Fix Issues   │
│ + Error Learning│      │ Re-evaluate  │
└─────────────────┘      └──────────────┘
Commands:
# 1. Quick setup with interactive criteria
python3 setup_test.py APP001
# (paste criteria JSON)
# 2. Copy your app
cp -r /path/to/my-app test-apps/APP001/
# 3. Run evaluation (command provided by setup script)
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "..."
# 4. Check results
cat results/APP001_result.json

┌─────────────────┐
│ Existing App    │
│ (any source)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Copy to         │
│ test-apps/      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Run Evaluation  │
│ --evaluate PATH │
│ (+ requirements)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FDK Validation  │
│ + Compliance    │
│ + Scoring       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Results JSON    │─────▶│ Fix Issues   │
│ + Error Learning│      │ Re-evaluate  │
└─────────────────┘      └──────────────┘
Commands:
# 1. Copy app to test-apps/
cp -r /path/to/my-app test-apps/my-app
# 2. Evaluate with requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth,Crayons UI,Platform 3.0"
# 3. Or use existing use case ID
python3 automate_test.py --evaluate test-apps/my-app --app-id APP005
# 4. Check results
cat results/EVAL_MY-APP_result.json
The --evaluate flag enables testing already-generated apps without regeneration:
Benefits:
- ✅ Test apps from any source (manual, AI-generated, production)
- ✅ No need to regenerate or modify existing apps
- ✅ Track custom requirements alongside standard checks
- ✅ Compare multiple versions of the same app
- ✅ Validate apps before deployment
What it checks:
- FDK validation (pass/fail)
- File structure (auto-detected or from use case)
- Platform 3.0 compliance (5 checks)
- Crayons UI usage
- Custom requirements (if provided)
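For the pre-deployment scenario, a results file can drive a simple go/no-go gate. Key names follow the example result JSON shown earlier; the B-or-better threshold is an illustrative policy choice, not part of the tool:

```python
# Illustrative pre-deployment gate over a results JSON file.
# Key names follow the example result shown earlier in this document.
import json

def deployment_ready(result: dict, min_score: float = 80.0) -> bool:
    """Pass only if FDK validation, full compliance, and score all hold."""
    if not result["validation"]["success"]:
        return False
    if not all(result["platform3_compliance"].values()):
        return False
    return result["score"]["total_score"] >= min_score

# e.g. result = json.load(open("results/APP001_result.json"))
```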
Example use cases:
# Validate a production app before update
python3 automate_test.py --evaluate test-apps/prod-app
# Check if app meets specific requirements
python3 automate_test.py --evaluate test-apps/oauth-app --requirements "OAuth 2.0,Token refresh,Error handling"
# Compare two versions
python3 automate_test.py --evaluate test-apps/v1 --app-id APP_V1
python3 automate_test.py --evaluate test-apps/v2 --app-id APP_V2
- Check error stats regularly after test runs
- Generate suggestions when patterns reach 2+ occurrences
- Review and apply skill updates to improve app generation
- Re-run tests to verify improvements
- Use evaluation mode to validate apps from any source
- Track requirements for better quality assurance
python3 automate_test.py [OPTIONS]
Options:
  --app APP_ID              Test a predefined use case (e.g., APP001)
  --evaluate PATH           Evaluate an existing app at PATH
  --app-id ID               Custom app ID for evaluation results
  --requirements "R1,R2"    Comma-separated requirements to track
  --show-stats              Display error learning statistics
  --generate-skill-updates  Generate improvement suggestions
  --benchmark-dir PATH      Custom benchmark directory (default: ~/benchmark-test)
  -h, --help                Show help message
Examples:
# Generate and test
python3 automate_test.py --app APP003
# Evaluate existing
python3 automate_test.py --evaluate test-apps/my-app
# Evaluate with requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth,Webhooks"
# Custom app ID
python3 automate_test.py --evaluate test-apps/my-app --app-id CUSTOM_001
# Statistics
python3 automate_test.py --show-stats
# Generate suggestions
python3 automate_test.py --generate-skill-updates
Issue: "FDK not found"
# Solution: Install FDK globally
npm install -g @freshworks/fdk
Issue: "Use case not found"
# Solution: Check use-cases/use_cases.json for valid IDs
cat use-cases/use_cases.json | grep '"id"'
Issue: "App path not found"
# Solution: Use relative path from benchmarking/ or absolute path
python3 automate_test.py --evaluate test-apps/my-app # relative
python3 automate_test.py --evaluate /full/path/to/app   # absolute
Issue: Validation passes but score is low
- Check Platform 3.0 compliance (40 points)
- Verify Crayons usage (20 points)
- Ensure all expected files are present (20 points)
Issue: Error learning not working
# Check if error_learner.py exists
ls error_learner.py
# Check error database
cat .dev/comparison/error_database.json
Last Updated: February 26, 2026 • Internal Freshworks tool for marketplace app quality assurance