| name | gitlab-job-analyzer |
|---|---|
| description | Analyze GitLab CI/CD job failures, parse logs, identify error patterns, and troubleshoot pipeline issues. This skill should be used when the user asks to investigate failed jobs, debug pipeline failures, analyze job logs, compare job runs, or generate failure reports. Uses the official GitLab CLI (glab) and integrates with the gitlab skill. |
| allowed-tools | Bash(*/gitlab-job-analyzer/scripts/*.sh:*),Bash(*/tools/*/install.sh:*) |
Comprehensive analysis of GitLab CI/CD jobs - identify failures, parse logs, extract errors, compare runs, analyze dependencies, and generate structured failure reports.
Prerequisites:
glabcommand is available- User is already authenticated
- Works with gitlab.com, GitLab Self-Managed, and GitLab Dedicated
- Optional:
jqfor JSON parsing
Installation: Use the centralized tool installation scripts to install dependencies:
# glab CLI (required)
../../../tools/glab/install.sh # Check and install glab
../../../tools/glab/install.sh --check # Check only, don't install
# jq (optional, recommended)
../../../tools/jq/install.sh # Check and install jq
../../../tools/jq/install.sh --check # Check only, don't installThe tool scripts are idempotent - safe to run multiple times. They will only install if the tool is not present or outdated.
Philosophy: Start with job status, identify failures, fetch logs, extract error patterns, and provide actionable insights. Use structured analysis to diagnose root causes.
- Job: Individual unit of work in a CI/CD pipeline (build, test, deploy)
- Pipeline: Collection of jobs organized in stages
- Job Log: Console output from a job execution
- Job Artifacts: Files produced by a job (test reports, binaries, coverage)
- Job Dependencies: Jobs that must complete before a job can start
- Job Trace: Complete log output from a job execution
All analysis scripts output JSON by default to make results easy to parse and minimize model interaction.
analyze_pipeline.sh for comprehensive analysis
For most pipeline investigations, use the comprehensive analyzer to get all data in a single call:
- ✅ Single round-trip - get complete pipeline analysis in one command
- ✅ No multiple script calls - everything included (metadata, jobs, errors, dependencies)
- ✅ Efficient - For typical pipelines (even 50+ jobs), JSON output is manageable (~100KB)
- ✅ Parse what you need - Use
jqto extract specific fields
Recommended approach - get everything in one call:
# Run comprehensive analysis and capture full JSON output
OUTPUT=$(./scripts/analyze_pipeline.sh owner/repo 12345)
# Parse specific fields as needed
echo "$OUTPUT" | jq '.job_statistics'
echo "$OUTPUT" | jq '.failed_jobs[] | {id, name, stage, failure_reason}'
echo "$OUTPUT" | jq '.common_error_patterns'
echo "$OUTPUT" | jq '.stages[] | select(.failed > 0)'# Get all pipeline data as JSON
./scripts/analyze_pipeline.sh owner/repo 12345
# Output structure:
{
"repository": "owner/repo",
"pipeline_id": 12345,
"pipeline_metadata": {
"status": "failed",
"ref": "main",
"sha": "abc123...",
"duration": 1234,
"created_at": "2026-02-09T...",
"user": "username"
},
"job_statistics": {
"total": 20,
"passed": 15,
"failed": 3,
"skipped": 2,
"running": 0
},
"stages": [ ... ],
"analyzed_jobs": [ ... ], // Failed jobs with error analysis
"failure_reasons": [ ... ],
"blocked_jobs": [ ... ],
"common_error_patterns": [ ... ]
}Add --human flag for formatted output:
./scripts/analyze_pipeline.sh owner/repo 12345 --humanThe comprehensive analyzer is recommended for most cases. Use specialized scripts for:
compare_job_logs.sh- Detailed comparison of two specific job runsanalyze_dependencies.sh- Dependency graph visualizationextract_errors.sh- Detailed error categorization of a single log file
All specialized scripts also support --json output.
Most common workflow: Time-based job analysis across ALL pipelines.
Use the following scripts based on your needs:
-
Recent Job Summary (Most Common) →
analyze_recent_jobs.sh- "What jobs ran/failed in the last 24 hours?"
- "Show me all failures this week"
- Analyzes jobs across ALL pipelines in a time range
-
Runner-Specific Analysis →
analyze_by_runner.sh- "Show me failures on aipcc-* runners"
- "Compare performance across different runner types"
- Groups jobs by runner tags and shows success rates
-
Single Pipeline Deep Dive →
analyze_pipeline.sh- "What failed in pipeline #12345?"
- "Analyze errors in this specific pipeline"
- Comprehensive analysis of one pipeline
-
Compare Two Job Runs →
compare_job_logs.sh- "Why did this job start failing?"
- "Compare successful vs failed runs"
- Find differences between job executions
User: "What jobs failed in the last 24 hours?" or "Summarize jobs for the last 24h"
This is the most common use case - analyzing recent activity across ALL pipelines.
Recommended Approach:
# Last 24 hours (default)
./scripts/analyze_recent_jobs.sh owner/repo --hours 24
# Parse specific fields
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 | jq '.job_statistics'
# Output: {"success": 179, "failed": 22, "skipped": 32, ...}
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 | jq '.by_stage'
# Output: Stage-level breakdown with failure counts
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 | jq '.failed_jobs[] | {id, name, stage, failure_reason}'
# Output: List of failed jobs with details
# Human-readable output
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 --humanOther time ranges:
# Last 7 days
./scripts/analyze_recent_jobs.sh owner/repo --days 7
# Since specific date
./scripts/analyze_recent_jobs.sh owner/repo --since "2026-02-09T00:00:00Z"Filter by runner tags:
# Only jobs on aipcc-* runners
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 --runner-tag "aipcc-"
# See how many jobs matched the filter
./scripts/analyze_recent_jobs.sh owner/repo --hours 24 --runner-tag "aipcc-" | jq '.runner_filter'Output includes:
- Total jobs in time period
- Breakdown by status (success/failed/skipped/canceled/running)
- Breakdown by stage with failure counts and success rates
- Pipeline summary (total, successful, failed)
- Detailed list of failed jobs with failure reasons
- Runner filter statistics (if --runner-tag provided)
User: "Show me failures on aipcc-* runners" or "Compare runner performance"
Analyze job performance grouped by runner tags.
Commands:
# Analyze all runners in last 24 hours
./scripts/analyze_by_runner.sh owner/repo --hours 24
# Parse specific runner
./scripts/analyze_by_runner.sh owner/repo --hours 24 | jq '.by_runner_tag[] | select(.tag == "aipcc-small-x86_64")'
# Compare specific runners
./scripts/analyze_by_runner.sh owner/repo --hours 24 --compare "aipcc-small-x86_64,aipcc-small-aarch64"
# Human-readable output
./scripts/analyze_by_runner.sh owner/repo --days 7 --humanOutput includes:
- Jobs grouped by runner tag
- Success rate per runner type
- Average/max/min duration per runner
- Common failures per runner type
- List of failed jobs per runner
Use cases:
- Identify runner-specific issues (e.g., "aipcc-small-x86_64 has 80% failure rate")
- Compare performance across architectures (x86_64 vs aarch64)
- Find slow runners (high avg_duration)
- Track infrastructure problems
User: "What failed in pipeline #12345?" or "Analyze errors in this specific pipeline"
When you need to deeply analyze a specific pipeline (not recent activity across all pipelines).
IMPORTANT - Pipeline ID vs IID:
glab ci listshows:#2315214741 (#2433)- First number (2315214741) is the pipeline ID → USE THIS
- Number in parentheses (2433) is the IID → DO NOT use this
Recommended Approach (JSON-first):
# Step 1: Run comprehensive analysis
OUTPUT=$(./scripts/analyze_pipeline.sh owner/repo 12345)
# Step 2: Extract key information
echo "$OUTPUT" | jq '.job_statistics'
# Output: {"total": 20, "passed": 15, "failed": 3, "skipped": 2, "running": 0}
echo "$OUTPUT" | jq '.analyzed_jobs[] | select(.status == "failed") | {id, name, stage, failure_reason}'
# Output: List of failed jobs with details
echo "$OUTPUT" | jq '.common_error_patterns'
# Output: Error patterns across all failures
echo "$OUTPUT" | jq '.analyzed_jobs[] | select(.status == "failed") | .sample_error_lines[]' | head -20
# Output: Sample error lines from failed jobsBenefits:
- ✅ Single command gets all data (~100KB JSON for typical pipeline)
- ✅ No multiple round-trips to GitLab API
- ✅ Error analysis already included
- ✅ Parse specific fields as needed with jq
Special Cases:
Pipeline with 0 jobs: If a pipeline failed during creation (YAML syntax error, etc.), the script will detect this:
./scripts/analyze_pipeline.sh owner/repo 12345
# Output: {"total_jobs": 0, "message": "Pipeline has no jobs. This typically occurs when..."}User: "Why did job 'test-integration' start failing? It worked before."
Approach:
Use analyze_recent_jobs.sh to find recent jobs, then use compare_job_logs.sh to compare specific runs:
# Step 1: Find recent jobs with the same name
./scripts/analyze_recent_jobs.sh owner/repo --hours 48 | jq -r '.all_jobs[] | select(.name == "test-integration") | "\(.id) \(.status) \(.created_at)"' | head -10
# Step 2: Get job IDs from the output above, then compare
# Use the last successful and last failed job IDs
OUTPUT=$(./scripts/compare_job_logs.sh owner/repo <success-job-id> <failed-job-id> --json)
# Step 3: Extract comparison results
echo "$OUTPUT" | jq '.comparison'
# Output: {duration_difference: -45, same_runner: true, same_commit: false, ...}
echo "$OUTPUT" | jq '.log_analysis.new_errors[]'
# Output: New error lines in the failed job
echo "$OUTPUT" | jq '.recommendation'
# Output: "Job 1 succeeded but Job 2 failed - investigate new errors..."Human-readable output:
./scripts/compare_job_logs.sh owner/repo <success-job-id> <failed-job-id>User: "What errors are causing the test job to fail?"
Approach:
Use analyze_pipeline.sh to get error analysis for all failed jobs in a pipeline:
# Analyze entire pipeline with error extraction
OUTPUT=$(./scripts/analyze_pipeline.sh owner/repo <pipeline-id>)
# Get error patterns across all failures
echo "$OUTPUT" | jq '.common_error_patterns'
# Get specific failed job's error lines
echo "$OUTPUT" | jq '.analyzed_jobs[] | select(.name == "test-integration") | .sample_error_lines'For detailed error categorization of a specific job log:
The extract_errors.sh script can categorize errors if you have a log file saved locally:
# If you have a log file from another source
./scripts/extract_errors.sh job.log --json | jq '.categories'User: "Which jobs are blocking the deployment?"
Approach:
Use the analyze_dependencies.sh script to analyze job dependencies and critical paths:
# Analyze dependencies and get JSON output
OUTPUT=$(./scripts/analyze_dependencies.sh owner/repo <pipeline-id> --json)
# Get blocked jobs (waiting on failed dependencies)
echo "$OUTPUT" | jq '.blocked_jobs'
# Get dependency tree
echo "$OUTPUT" | jq '.stages'
# Get critical path
echo "$OUTPUT" | jq '.critical_path'For visual dependency graph:
# Generate Mermaid graph
./scripts/analyze_dependencies.sh owner/repo <pipeline-id> --graphHuman-readable output:
./scripts/analyze_dependencies.sh owner/repo <pipeline-id>User: "Give me a summary of all failures in the last pipeline"
Approach:
Use analyze_pipeline.sh for JSON output or generate_failure_report.sh for markdown reports:
# Get comprehensive analysis (JSON)
./scripts/analyze_pipeline.sh owner/repo <pipeline-id> | jq '.'
# Get human-readable report
./scripts/analyze_pipeline.sh owner/repo <pipeline-id> --human
# Generate detailed markdown report
./scripts/generate_failure_report.sh owner/repo <pipeline-id> > failure-report.mdThe report includes:
- Summary (total/passed/failed)
- Failed job details
- Error excerpts
- Recommendations
All scripts output JSON by default. Add --human for formatted text. Run any script with --help for full usage.
| Script | Purpose | Key Args |
|---|---|---|
analyze_recent_jobs.sh |
Jobs across ALL pipelines in time range | --hours N, --days N, --runner-tag PREFIX |
analyze_by_runner.sh |
Jobs grouped by runner tags | --hours N, --days N, --compare TAGS |
analyze_pipeline.sh |
Single pipeline deep dive | <pipeline-id>, --include-successful |
compare_job_logs.sh |
Compare two job runs | <job-id-1> <job-id-2> |
analyze_dependencies.sh |
Dependency graph and critical path | <pipeline-id>, --graph, --json |
extract_errors.sh |
Categorize errors from a log file | <log-file>, --json |
generate_failure_report.sh |
Markdown failure report (legacy) | <pipeline-id> |
When combining this skill with other skills (especially aws-log-analyzer):
See the complete optimization guide in the main CLAUDE.md documentation under "Performance Optimization for Cross-Skill Analysis".
Key optimizations:
- Parallel execution - Run GitLab + AWS analysis simultaneously in one message
- Parse JSON once - Capture output, parse multiple times with jq (don't re-run scripts)
- Skip redundant scripts - Extract runner data from
analyze_recent_jobs.shwith jq instead of runninganalyze_by_runner.sh - Smart targeting - Identify problematic runners first, then analyze only those AWS log groups
Example - Optimized cross-skill analysis:
# SINGLE MESSAGE - Parallel execution:
# Tool 1: GitLab analysis
./gitlab-job-analyzer/scripts/analyze_recent_jobs.sh owner/repo --hours 24
# Tool 2: AWS analysis (runs in parallel)
./aws-log-analyzer/scripts/analyze_errors.sh /aws/app/component1 24
# Then parse GitLab output multiple ways without re-running:
echo "$GITLAB_OUTPUT" | jq '.job_statistics'
echo "$GITLAB_OUTPUT" | jq '.by_runner_tag[] | select(.failed > 5)'
echo "$GITLAB_OUTPUT" | jq '.failed_jobs[] | {name, stage, runner_tag}'Expected performance:
- Optimized: 3-4 minutes, $0.75-0.85
- Non-optimized: 10+ minutes, $1.06+
- Focus on the last 100-200 lines of logs where errors typically appear
- Compare jobs from similar time periods and same runner tags
- Distinguish between flaky failures (intermittent) vs. deterministic
- Use
failure_reasonfield for initial categorization, then analyze logs for details - Use
gitlabskill for broader repository operations (MRs, commits, issues)
GitLab provides failure_reason field in job JSON:
script_failure- Job script failed (exit code != 0)stuck_or_timeout_failure- Job timed outrunner_system_failure- Runner infrastructure issuemissing_dependency_failure- Required service not availableapi_failure- API request failed during jobrunner_unsupported- Runner doesn't support job requirementsstale_schedule- Scheduled pipeline was stalejob_execution_timeout- Job exceeded max execution timearchived_failure- Project is archivedunmet_prerequisites- Job prerequisites not metscheduler_failure- Scheduler errordata_integrity_failure- Data integrity issue
Use failure_reason for initial categorization, then analyze logs for details.
No logs available:
- Job may still be running - check status first
- Logs may have been expired (check artifacts_expire_at)
- Job may have been canceled before producing logs
- Runner may have crashed before logs were uploaded
Truncated logs:
- GitLab has max trace size (default 4MB, configurable)
- Use job artifacts to capture full logs
- Look for "Job log exceeded limit" message
- Check runner logs for full output
Job stuck in pending:
- No available runners with matching tags
- Runner capacity exceeded
- Pipeline quota reached
- Protected branch requires specific runners
Flaky tests:
- Compare multiple runs of same job
- Look for race conditions in logs
- Check for timing-dependent assertions
- Review test isolation (shared state issues)
Required: glab (install via tools/glab/install.sh)
Optional: jq (install via tools/jq/install.sh)
For large datasets, scripts save results via tools/memory/scripts/state.sh. View saved state with ./scripts/view_state.sh. Direct JSON output is recommended for typical use cases.