Maintenance Window Management #62

@ivan-gudak

User Story: Maintenance Window Management

Story Overview

As a DevOps Engineer, SRE, Release Manager, or Migration Specialist
I want to schedule and manage maintenance windows in Dynatrace Managed environments through natural language interactions with the MCP server
So that I can reduce alert fatigue during planned deployments, migrations, and infrastructure changes—while maintaining precise audit trails and coordinating activities across multiple Managed environments


🔍 Scope & Architecture

What This MCP Server Does

This is a Dynatrace Managed MCP Server. It:

  • ✅ Connects to one or more Dynatrace Managed environments
  • ✅ Creates, modifies, and deletes maintenance windows in Managed clusters only
  • ✅ Returns maintenance window data and schedules to the NLM (Natural Language Model)
  • ✅ Supports cross-Managed-environment operations (e.g., same window on multiple Managed clusters)

What This MCP Server Does NOT Do

  • ❌ Connect to Dynatrace SaaS environments directly
  • ❌ Communicate with SaaS MCP Server directly
  • ❌ Create, modify, or query maintenance windows in SaaS environments

Multi-Environment Operations

Within Managed (supported):

  • User can create the same maintenance window across multiple Managed clusters
  • NLM calls this MCP once for each target Managed environment
  • Example: "Create maintenance window on managed-prod-us, managed-prod-eu, and managed-staging"

Across Managed and SaaS (NLM-orchestrated):
When coordination between Managed and SaaS is needed:

  1. User makes request to NLM (e.g., "Create maintenance window during migration affecting both environments")
  2. NLM calls this Managed MCP for Managed environment maintenance windows
  3. NLM calls separate SaaS MCP for SaaS environment maintenance windows (if available)
  4. NLM coordinates timing and scope across environments
  5. NLM aggregates results and presents unified confirmation to user

This user story describes the complete end-to-end workflow to show business value, but this MCP Server only implements maintenance window management in Dynatrace Managed environments.


User Value Proposition

1. Rapid Response During Emergencies

  • Instant ad-hoc maintenance windows - "Create a maintenance window NOW for the next 2 hours" when unplanned deployments happen
  • Emergency deployment coverage - quickly suppress noise when critical hotfixes require immediate deployment
  • Forgotten window recovery - retroactively create windows when admins forgot to schedule them before deployment
  • Natural language speed - create windows in seconds vs. minutes through UI navigation

2. Reduce Alert Fatigue

  • Deployment noise elimination - suppress expected availability and performance problems during releases
  • Targeted suppression - scope windows precisely to affected services, hosts, or clusters using tags and management zones
  • Recurring window patterns - set up weekly deployment windows once, never manually create them again
  • Multi-environment consistency - apply the same schedule across dev/staging/production Managed clusters

3. Dynamic Adjustments

  • Extend running windows - "The deployment is taking longer, extend the window by 30 minutes"
  • Early closure - "Deployment finished early, close the maintenance window now"
  • Scope refinement - adjust which entities are covered mid-window if needed
  • Schedule modifications - change recurring patterns without recreating windows

4. Managed-to-SaaS Migration Support

  • Wave-based scoping - create precise windows for each migration wave (app by app, host by host, cluster by cluster)
  • Tag-based targeting - scope windows using entity selectors or tags such as migration:wave1, environment:production, cluster:k8s-prod
  • Management zone isolation - suppress alerts only for migrating entities, not entire environments
  • Correlation with migration events - link maintenance windows to migration events for complete audit trail

5. Intelligent Suggestions

  • NLM-driven recommendations - AI suggests maintenance windows based on detected deployment patterns
  • Optimal window sizing - recommend duration based on historical deployment times
  • Smart recurrence patterns - detect weekly deployment schedules and suggest recurring windows
  • Conflict detection - warn about overlapping windows or critical monitoring gaps

6. Integration with Fleet Updates

  • OneAgent/ActiveGate update coordination - suggest maintenance windows during fleet update operations
  • Cluster upgrade protection - suppress expected problems during Dynatrace platform upgrades
  • Phased rollout support - create staggered windows for host group updates
  • Integration point for future fleet management story

Acceptance Criteria

Core Capabilities

1. Create Scheduled Maintenance Windows

Given a user plans a deployment or migration
When they describe the maintenance window requirements in natural language
Then the MCP server should:

  • Parse the intent to extract:
    • Start date/time (or "now" for immediate)
    • Duration or end date/time
    • Recurrence pattern (one-time, daily, weekly, monthly, custom)
    • Scope (entities, tags, management zones, entity selector)
    • Description/reason
  • Use Dynatrace Settings 2.0 API with builtin:alerting.maintenance-window schema to create the window
  • Support all scope types:
    • Management zone
    • Tag-based filtering
    • Entity selector
    • Combination of filters
  • Return confirmation with window ID, scope summary, and schedule details

Example interactions:

User: "Create a maintenance window every Tuesday from 2 PM to 4 PM UTC 
       for all services tagged deployment:weekly in Production"

MCP: Creating recurring maintenance window...
     
     Schedule:
     ✓ Recurrence: Weekly on Tuesdays
     ✓ Time: 14:00-16:00 UTC (2 hours)
     ✓ Effective from: 2026-02-10 (next Tuesday)
     
     Scope:
     ✓ Management Zone: Production
     ✓ Tag filter: deployment:weekly
     ✓ Matched entities: 12 services
       - Payment Service (SERVICE-ABC123)
       - User Service (SERVICE-ABC124)
       - Checkout Service (SERVICE-ABC125)
       ... (9 more)
     
     Suppression:
     ✓ All problems suppressed during window
     ✓ Notifications paused
     ✓ Alerting profiles: All
     
     Maintenance window created: mw-weekly-deploy-001
     Next occurrence: 2026-02-10 14:00 UTC (in 6 days)

2. Create Ad-Hoc Maintenance Windows

Given an unplanned deployment or emergency change is happening
When a user requests an immediate maintenance window
Then the MCP server should:

  • Support "now" or "immediately" as start time
  • Default to 2-hour duration if not specified (prompt for confirmation)
  • Allow quick scope specification (service name, host name, cluster name)
  • Create window within seconds
  • Confirm what entities are covered

Example interaction:

User: "Emergency hotfix deployment on Payment Service starting NOW. 
       Create a maintenance window for the next 90 minutes."

MCP: Creating emergency maintenance window...
     
     ⚠️  IMMEDIATE WINDOW - Starting now
     
     Schedule:
     ✓ Start: 2026-02-04 14:42 UTC (immediately)
     ✓ End: 2026-02-04 16:12 UTC (90 minutes)
     ✓ Type: One-time (ad-hoc)
     
     Scope:
     ✓ Service: Payment Service (SERVICE-ABC123)
     ✓ Included process groups: 3
     ✓ Included hosts: 5
     
     Reason: Emergency hotfix deployment
     
     Maintenance window ACTIVE: mw-emergency-payment-001
     Alert suppression in effect.
     
     Reminder: Window ends at 16:12 UTC (set a timer?)

3. Create Migration-Wave Maintenance Windows

Given a Managed-to-SaaS migration is happening in waves
When a user describes the migration wave scope
Then the MCP server should:

  • Support entity-selector-based selection for precise targeting
  • Use tags to identify migration wave entities
  • Optionally link to migration events for correlation
  • Create appropriately sized windows for migration activities
  • Support overlapping waves (different entities)

Example interaction:

User: "We're migrating the Payment application (wave 1) to SaaS this Saturday 
       from 8 AM to 2 PM UTC. Create a maintenance window for all entities 
       tagged migration:wave1 and app:payment"

MCP: Creating migration maintenance window...
     
     Migration Wave: Wave 1 - Payment Application
     
     Schedule:
     ✓ Date: Saturday 2026-02-07
     ✓ Time: 08:00-14:00 UTC (6 hours)
     ✓ Type: One-time (migration event)
     
     Scope Analysis:
     ✓ Tag filters: migration:wave1 AND app:payment
     ✓ Matched entities: 23 total
       - 4 services (Payment API, Payment Worker, Payment DB, Payment Cache)
       - 12 process groups
       - 7 hosts
     
     Expected Impact:
     ⚠️  Availability problems suppressed (hosts leaving Managed)
     ⚠️  Connection loss alerts suppressed
     ✓ Other production services NOT affected
     
     Maintenance window created: mw-migration-wave1-payment
     
     Would you like me to create a migration event linked to this window?

User: "Yes, create the migration event."

MCP: ✓ Migration event created: evt-migration-wave1-payment
     ✓ Event linked to maintenance window
     ✓ Both records updated in the Managed environment
     
     Migration tracking complete. Alert suppression ready for Saturday.

4. Create Recurring Maintenance Windows

Given deployments happen on a regular schedule
When a user describes the recurring pattern
Then the MCP server should:

  • Support recurrence patterns:
    • Daily (every day at X time)
    • Weekly (specific days of week)
    • Monthly (specific day of month or "first Monday", etc.)
    • Custom cron-like patterns if supported by Dynatrace
  • Prefer creating ONE recurring window vs. multiple one-time windows
  • Allow specifying end date for recurrence (or infinite)
  • Show preview of next 3-5 occurrences

Example interaction:

User: "Our deployment window is every Tuesday and Thursday from 6 PM to 8 PM 
       for the next 3 months. Scope: all services in Production management zone."

MCP: Creating recurring maintenance window...
     
     Recurrence Pattern:
     ✓ Days: Tuesday, Thursday
     ✓ Time: 18:00-20:00 UTC (2 hours)
     ✓ Duration: 3 months (until 2026-05-04)
     ✓ Total occurrences: ~26 windows
     
     Scope:
     ✓ Management Zone: Production
     ✓ All services (42 services currently)
     
     Next 5 occurrences:
     1. Thu 2026-02-05 18:00-20:00 UTC
     2. Tue 2026-02-10 18:00-20:00 UTC
     3. Thu 2026-02-12 18:00-20:00 UTC
     4. Tue 2026-02-17 18:00-20:00 UTC
     5. Thu 2026-02-19 18:00-20:00 UTC
     
     Maintenance window created: mw-recurring-deploy-prod
     Recurrence active until 2026-05-04.

5. List and Query Maintenance Windows

Given a user wants to see scheduled or active maintenance windows
When they query maintenance windows
Then the MCP server should:

  • List windows filtered by:
    • Status (upcoming, active, completed)
    • Time range
    • Scope (management zone, tags, entities)
    • Environment
  • Show window details:
    • Schedule (start, end, recurrence)
    • Scope (entities covered)
    • Status (scheduled, active, ended)
    • Creator and creation date
  • Present in human-readable format

Example interaction:

User: "What maintenance windows are scheduled for this week?"

MCP: Querying maintenance windows for this week (2026-02-04 to 2026-02-10)...
     
     Found 4 scheduled windows:
     
     1. ACTIVE NOW - Emergency Payment Hotfix
        ID: mw-emergency-payment-001
        End: Today 16:12 UTC (in 1h 30min)
        Scope: Payment Service (SERVICE-ABC123)
        Status: 🔴 ACTIVE - Alerts suppressed
     
     2. Weekly Deployment - Production
        ID: mw-recurring-deploy-prod
        Next: Thursday 2026-02-05 18:00 UTC (in 1 day)
        Recurrence: Every Tue/Thu 18:00-20:00
        Scope: Production management zone (42 services)
        Status: ⏰ SCHEDULED
     
     3. Migration Wave 1 - Payment App
        ID: mw-migration-wave1-payment
        Start: Saturday 2026-02-07 08:00 UTC (in 3 days)
        Duration: 6 hours (08:00-14:00)
        Scope: migration:wave1 AND app:payment (23 entities)
        Status: ⏰ SCHEDULED
     
     4. Weekly Deployment - Production
        ID: mw-recurring-deploy-prod (recurrence)
        Next: Tuesday 2026-02-10 18:00 UTC (in 6 days)
        Scope: Production management zone (42 services)
        Status: ⏰ SCHEDULED
     
     Total: 1 active, 3 upcoming

6. Extend or Shorten Active Maintenance Windows

Given a maintenance window is currently active
When a user needs to adjust the duration
Then the MCP server should:

  • Support extending end time ("add 30 minutes", "extend until 5 PM")
  • Support early closure ("close now", "end this window")
  • Update the window via Settings 2.0 API
  • Add comment explaining the change
  • Confirm new end time

Example interaction:

User: "The deployment is taking longer than expected. 
       Extend the current maintenance window by 45 minutes."

MCP: Finding active maintenance window...
     
     Active window: mw-emergency-payment-001
     Current end time: 2026-02-04 16:12 UTC (in 1h 30min)
     
     Extending by 45 minutes...
     ✓ New end time: 2026-02-04 16:57 UTC (in 2h 15min)
     ✓ Comment added: "Window extended due to deployment delays"
     
     Maintenance window extended.
     Alert suppression now active until 16:57 UTC.

---

User: "Deployment finished successfully. Close the maintenance window early."

MCP: Finding active maintenance window...
     
     Active window: mw-emergency-payment-001
     Originally scheduled until: 16:57 UTC (still 1h 45min remaining)
     
     Closing window NOW...
     ✓ Window ended: 2026-02-04 15:12 UTC
     ✓ Duration: 30min (planned 2h 15min, ended 1h 45min early)
     ✓ Comment added: "Window closed early - deployment completed"
     
     Maintenance window closed.
     ✓ Alert suppression ENDED
     ✓ Normal monitoring resumed
     
     Reminder: Problems may take 1-2 minutes to reappear if still active.

7. Update Maintenance Window Scope

Given a user needs to adjust which entities are covered
When they modify the window scope
Then the MCP server should:

  • Support adding or removing entities, tags, or management zones
  • Update the Settings 2.0 object
  • Show before/after entity count
  • Validate scope changes (prevent empty scope)
  • Apply immediately if window is active
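
The scope rules above (add or remove filters, then reject a change that would leave nothing covered) can be sketched as a pure function. This is only an illustration under stated assumptions: the `Scope` and `ScopeDelta` shapes and the names `mergeList` and `applyScopeChange` are hypothetical, modelled on the tool inputs in this story rather than on any Dynatrace API.

```typescript
// Hypothetical scope shape; field names mirror the tool inputs in this story.
interface Scope {
  managementZones: string[];
  tags: string[];
  entities: string[];
}

type ScopeDelta = Partial<Scope>;

// Returns `current` plus `add` minus `remove`, without duplicates.
function mergeList(current: string[], add: string[] = [], remove: string[] = []): string[] {
  const merged = current.concat(add).filter(function (item, index, all) {
    return all.indexOf(item) === index;
  });
  return merged.filter(function (item) {
    return remove.indexOf(item) === -1;
  });
}

function applyScopeChange(current: Scope, add: ScopeDelta = {}, remove: ScopeDelta = {}): Scope {
  const next: Scope = {
    managementZones: mergeList(current.managementZones, add.managementZones, remove.managementZones),
    tags: mergeList(current.tags, add.tags, remove.tags),
    entities: mergeList(current.entities, add.entities, remove.entities),
  };
  const size = next.managementZones.length + next.tags.length + next.entities.length;
  if (size === 0) {
    // An empty scope is rejected, per the validation criterion above.
    throw new Error("Scope change rejected: the window would no longer cover anything");
  }
  return next;
}
```

Returning a new object instead of mutating the current scope keeps the "before" value available for the before/after entity count.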

Example interaction:

User: "Add all hosts tagged cluster:analytics to the current maintenance window. 
       We're doing unplanned maintenance on them too."

MCP: Finding active maintenance window...
     
     Active window: mw-emergency-payment-001
     Current scope: Payment Service (SERVICE-ABC123)
     Current entities: 8 (1 service, 2 process groups, 5 hosts)
     
     Adding scope: cluster:analytics
     Analyzing impact...
     
     Additional entities: 12 hosts in Analytics cluster
     New total: 20 entities
     
     Updating maintenance window scope...
     ✓ Tag filter added: cluster:analytics
     ✓ Scope updated (effective immediately)
     ✓ Comment added: "Scope expanded to include Analytics cluster hosts"
     
     Maintenance window updated.
     Alert suppression now covers 20 entities (was 8).

8. Delete Maintenance Windows

Given a user needs to cancel or remove a maintenance window
When they request deletion
Then the MCP server should:

  • Support deleting upcoming windows (before they start)
  • Prevent deletion of active windows (must close first, then delete)
  • For recurring windows, offer to delete:
    • Just the next occurrence
    • All future occurrences
    • The entire recurrence pattern
  • Confirm deletion with scope and schedule summary
  • Warn if window is linked to migration events

Example interaction:

User: "Cancel the maintenance window scheduled for Saturday. 
       The migration is delayed."

MCP: Finding maintenance windows scheduled for Saturday 2026-02-07...
     
     Found: mw-migration-wave1-payment
     Schedule: Saturday 2026-02-07 08:00-14:00 UTC
     Scope: migration:wave1 AND app:payment (23 entities)
     Status: SCHEDULED (in 3 days)
     
     ⚠️  Warning: This window is linked to migration event evt-migration-wave1-payment
     
     Delete this maintenance window?

User: "Yes, delete it."

MCP: Deleting maintenance window...
     ✓ Window mw-migration-wave1-payment deleted
     ✓ No longer scheduled for Saturday
     
     Note: Migration event evt-migration-wave1-payment still exists.
          Should I also delete or update the migration event?

User: "Update the event to say migration postponed."

MCP: ✓ Migration event updated with status: postponed
     ✓ Comment added: "Migration postponed, new date TBD"
     
     Maintenance window deleted, migration event updated.

9. Multi-Environment Maintenance Window Creation

Given the MCP server is connected to multiple Managed environments
When a user wants to create the same maintenance window in multiple environments
Then the MCP server should:

  • Accept multiple environment IDs
  • Create identical windows in each environment (same schedule, scope)
  • Adjust entity selectors per environment if needed
  • Report success/failure per environment
  • Support bulk operations for consistency
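
Per-environment reporting, as required above, might be sketched as a small aggregation step that runs after each environment has been attempted. The `EnvOutcome` and `BulkReport` shapes and the function name `summarizeOutcomes` are assumptions made for this illustration only.

```typescript
// Hypothetical result of one creation attempt in one Managed environment.
interface EnvOutcome {
  environmentId: string;
  windowId?: string; // set when creation succeeded
  error?: string;    // set when creation failed
}

interface BulkReport {
  status: "ok" | "partial" | "failed";
  succeeded: string[];
  failed: string[];
  lines: string[];
}

function summarizeOutcomes(outcomes: EnvOutcome[]): BulkReport {
  if (outcomes.length === 0) {
    throw new Error("No environments were targeted");
  }
  const succeeded: string[] = [];
  const failed: string[] = [];
  const lines: string[] = [];
  for (const outcome of outcomes) {
    if (outcome.windowId !== undefined && outcome.error === undefined) {
      succeeded.push(outcome.environmentId);
      lines.push(outcome.environmentId + ": created " + outcome.windowId);
    } else {
      failed.push(outcome.environmentId);
      const reason = outcome.error !== undefined ? outcome.error : "unknown error";
      lines.push(outcome.environmentId + ": FAILED (" + reason + ")");
    }
  }
  // "partial" tells the caller that the clusters are now out of sync.
  let status: BulkReport["status"] = "partial";
  if (failed.length === 0) {
    status = "ok";
  } else if (succeeded.length === 0) {
    status = "failed";
  }
  return { status: status, succeeded: succeeded, failed: failed, lines: lines };
}
```

A distinct "partial" status lets the NLM decide whether to retry the failed environments or roll back the ones that succeeded.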

Example interaction:

User: "Create a maintenance window on Tuesday 6 PM to 8 PM for Production 
       management zone in BOTH managed-prod-us and managed-prod-eu clusters."

MCP: Creating maintenance window in multiple environments...
     
     Target environments:
     - managed-prod-us (Managed US cluster)
     - managed-prod-eu (Managed EU cluster)
     
     Window configuration:
     ✓ Schedule: Tuesday 2026-02-10 18:00-20:00 UTC
     ✓ Scope: Management Zone "Production"
     
     Creating in managed-prod-us...
     ✓ Window created: mw-us-deploy-tue-001
     ✓ Scope: 42 services in Production
     
     Creating in managed-prod-eu...
     ✓ Window created: mw-eu-deploy-tue-001
     ✓ Scope: 38 services in Production
     
     Maintenance windows created in both clusters.
     Same schedule, environment-specific entity counts.
     
     Summary:
     - managed-prod-us: 42 services covered
     - managed-prod-eu: 38 services covered
     - Both active: Tuesday 18:00-20:00 UTC

10. Intelligent Maintenance Window Suggestions

Given the MCP server detects deployment patterns
When a user asks for recommendations
Then the MCP server should:

  • Analyze historical deployment events
  • Detect recurring patterns (same day/time)
  • Calculate average deployment duration
  • Suggest appropriate window schedules
  • Recommend scope based on past deployments

Example interaction:

User: "We deploy Payment Service almost every week. 
       Can you suggest a recurring maintenance window?"

MCP: Analyzing deployment history for Payment Service...
     
     Found 8 deployments in the last 60 days:
     - 6 on Tuesday between 17:00-19:00 UTC
     - 2 on Wednesday between 18:00-20:00 UTC
     
     Deployment duration analysis:
     - Average: 1h 15min
     - Max: 2h 10min
     - Min: 45min
     
     💡 RECOMMENDATION:
     
     Create recurring maintenance window:
     ✓ Day: Every Tuesday
     ✓ Time: 17:00-19:30 UTC (2.5 hours for safety margin)
     ✓ Scope: Payment Service (SERVICE-ABC123)
     
     This window covers 6 of your 8 recent deployments (75%).
     
     Should I create this recurring window?

User: "Yes, create it."

MCP: Creating recurring maintenance window...
     ✓ Recurrence: Every Tuesday 17:00-19:30 UTC
     ✓ Scope: Payment Service
     ✓ First occurrence: Next Tuesday 2026-02-10
     
     Maintenance window created: mw-recurring-payment-deploy
     
     Tip: If deployments shift to a different day, let me know 
          and I'll adjust the recurrence pattern.

11. Fleet Update Coordination (Integration Point)

Given a fleet update (OneAgent, ActiveGate, cluster upgrade) is planned
When a user schedules the fleet update
Then the MCP server should:

  • Recognize that a fleet update usually warrants a maintenance window
  • Suggest creating a maintenance window for the update duration
  • Offer scope based on fleet update targets (all hosts, specific host groups, etc.)
  • Link maintenance window to fleet update activities

Example interaction:

User: "Schedule OneAgent update for all Production hosts this Friday midnight."

MCP: OneAgent fleet update scheduled for Friday 2026-02-06 00:00 UTC.
     
     💡 Suggestion: Create a maintenance window?
     
     Fleet updates often cause:
     - Brief agent disconnections
     - Process group restarts
     - Temporary monitoring gaps
     
     Recommended maintenance window:
     ✓ Start: Friday 2026-02-06 00:00 UTC
     ✓ Duration: 3 hours (typical for ~50 hosts)
     ✓ Scope: All Production hosts
     
     This will suppress expected availability/monitoring problems during the update.
     
     Create this maintenance window?

User: "Yes."

MCP: Creating maintenance window for fleet update...
     ✓ Window: mw-fleet-oneagent-prod-001
     ✓ Schedule: Friday 00:00-03:00 UTC
     ✓ Scope: Management Zone "Production" (52 hosts)
     ✓ Linked to: OneAgent update job fleet-update-20260206
     
     Maintenance window created and linked to fleet update.
     Alert suppression will activate during update window.

Technical Implementation Requirements

MCP Tools to Implement

Tool 1: create-maintenance-window

Purpose: Create a new maintenance window (scheduled or ad-hoc)

Inputs (Zod schema):

{
  environmentId: string | string[],    // One or more Managed environments
  name: string,                         // Display name
  description?: string,                 // Optional description/reason
  schedule: {
    startTime: string,                  // ISO timestamp or "now"
    endTime?: string,                   // ISO timestamp (or use duration)
    duration?: number,                  // Minutes (alternative to endTime)
    recurrence?: {                      // Optional: for recurring windows
      pattern: "daily" | "weekly" | "monthly" | "custom",
      daysOfWeek?: number[],            // For weekly (0=Sunday, 1=Monday, etc.)
      dayOfMonth?: number,              // For monthly
      endDate?: string,                 // When recurrence stops (optional)
      cronExpression?: string           // For custom patterns
    }
  },
  scope: {
    managementZones?: string[],         // Management zone IDs or names
    tags?: string[],                    // Tag filters (e.g., "deployment:weekly")
    entitySelector?: string,            // Entity selector expression
    entities?: string[]                 // Explicit entity IDs
  },
  suppressAlertingProfiles?: string[],  // Specific alerting profiles (or all)
  linkedEventId?: string,               // Optional: link to deployment/migration event
  metadata?: {                          // Optional structured metadata
    reason?: string,                    // "deployment", "migration", "maintenance", etc.
    ticket?: string,
    owner?: string
  }
}

Implementation:

  • Parse natural language to extract schedule and scope
  • If startTime is "now" or relative, convert to absolute timestamp
  • Construct Settings 2.0 payload for builtin:alerting.maintenance-window schema
  • For multiple environments, create window in each sequentially
  • Validate scope (ensure entities exist)
  • POST to /api/v2/settings/objects with appropriate payload
  • Return created window ID(s) and confirmation
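
The schedule-resolution and request-construction steps above might look roughly like the following sketch. It assumes the 2-hour default from the ad-hoc acceptance criteria, and it treats the contents of `value` as opaque because the exact field layout must come from the builtin:alerting.maintenance-window schema on the target cluster. The names `resolveSchedule` and `buildCreateRequest` are hypothetical, and the `"environment"` scope is an assumption.

```typescript
// Hypothetical input shape, reduced from the create-maintenance-window inputs.
interface ScheduleInput {
  startTime: string; // ISO timestamp or the literal "now"
  endTime?: string;  // ISO timestamp
  duration?: number; // minutes, alternative to endTime
}

interface ResolvedSchedule {
  startTime: string;
  endTime: string;
}

// Assumed default, taken from the ad-hoc window acceptance criteria.
const DEFAULT_DURATION_MINUTES = 120;

// Converts "now" and duration-based input into absolute ISO timestamps.
function resolveSchedule(input: ScheduleInput, now: Date): ResolvedSchedule {
  if (input.endTime !== undefined && input.duration !== undefined) {
    throw new Error("Specify either endTime or duration, not both");
  }
  const start = input.startTime === "now" ? new Date(now.getTime()) : new Date(input.startTime);
  if (isNaN(start.getTime())) {
    throw new Error("Invalid startTime: " + input.startTime);
  }
  let end: Date;
  if (input.endTime !== undefined) {
    end = new Date(input.endTime);
  } else {
    const minutes = input.duration !== undefined ? input.duration : DEFAULT_DURATION_MINUTES;
    end = new Date(start.getTime() + minutes * 60000);
  }
  if (isNaN(end.getTime()) || end.getTime() <= start.getTime()) {
    throw new Error("endTime must be a valid timestamp after startTime");
  }
  return { startTime: start.toISOString(), endTime: end.toISOString() };
}

interface SettingsObjectCreate {
  schemaId: string;
  scope: string;
  value: object;
}

// Request body for POST /api/v2/settings/objects, which accepts an array.
// `value` is passed through untouched; its fields must match the schema.
function buildCreateRequest(value: object): SettingsObjectCreate[] {
  return [{ schemaId: "builtin:alerting.maintenance-window", scope: "environment", value: value }];
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes the "now" and retroactive cases easy to test.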

Tool 2: list-maintenance-windows

Purpose: Query and list maintenance windows

Inputs:

{
  environmentId: string,
  filters?: {
    status?: "scheduled" | "active" | "ended" | "all",
    fromDate?: string,                  // Date range filter
    toDate?: string,
    managementZone?: string,            // Filter by scope
    tags?: string[],
    includeRecurring?: boolean          // Show all recurrence instances
  },
  includeDetails?: boolean              // Full details vs. summary
}

Implementation:

  • GET /api/v2/settings/objects?schemaIds=builtin:alerting.maintenance-window
  • Filter results based on schedule and scope
  • Determine status (active, scheduled, ended) based on current time
  • Format for human-readable output
  • For recurring windows, calculate next occurrences
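
Deriving the status from the current time, as described above, can be sketched as follows for one-time windows. The `WindowTimes` shape and the function names are assumptions for illustration; a real implementation would first map the Settings 2.0 objects into this form.

```typescript
type WindowStatus = "scheduled" | "active" | "ended";

// Hypothetical, simplified view of a one-time maintenance window.
interface WindowTimes {
  id: string;
  startTime: string; // ISO timestamp
  endTime: string;   // ISO timestamp
}

// A window is active from its start (inclusive) to its end (exclusive).
function windowStatus(w: WindowTimes, now: Date): WindowStatus {
  const t = now.getTime();
  if (t < Date.parse(w.startTime)) {
    return "scheduled";
  }
  if (t >= Date.parse(w.endTime)) {
    return "ended";
  }
  return "active";
}

// Client-side filter matching the `status` input of list-maintenance-windows.
function filterByStatus(windows: WindowTimes[], status: WindowStatus | "all", now: Date): WindowTimes[] {
  return windows.filter(function (w) {
    return status === "all" || windowStatus(w, now) === status;
  });
}
```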

Tool 3: update-maintenance-window

Purpose: Modify an existing maintenance window

Inputs:

{
  environmentId: string,
  windowId: string,                     // Settings object ID or friendly name
  updates: {
    extendBy?: number,                  // Minutes to extend (for active windows)
    newEndTime?: string,                // Set specific end time
    closeNow?: boolean,                 // Close window immediately
    scope?: {                           // Update scope
      add?: {                           // Add entities/tags
        managementZones?: string[],
        tags?: string[],
        entities?: string[]
      },
      remove?: {                        // Remove entities/tags
        managementZones?: string[],
        tags?: string[],
        entities?: string[]
      }
    },
    reschedule?: {                      // Change start/end times (for scheduled windows)
      startTime?: string,
      endTime?: string
    }
  },
  comment?: string                      // Reason for update
}

Implementation:

  • Fetch existing window from Settings 2.0
  • Apply updates to the settings object
  • Validate changes (e.g., can't move active window to the past)
  • PUT /api/v2/settings/objects/{objectId} with updated payload
  • Add comment/audit trail
  • Return updated window summary
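
The validation step above, applied to the end time only, might be sketched like this. The `EndTimeUpdate` shape is a reduced, hypothetical version of the `updates` input, and `computeNewEndTime` is an illustrative name; the result would still need to be written into the settings object and sent with the PUT request.

```typescript
// Hypothetical subset of the `updates` input that affects the end time.
interface EndTimeUpdate {
  extendBy?: number;   // minutes
  newEndTime?: string; // ISO timestamp
  closeNow?: boolean;
}

function computeNewEndTime(startTime: string, endTime: string, update: EndTimeUpdate, now: Date): string {
  const start = Date.parse(startTime);
  const end = Date.parse(endTime);
  const nowMs = now.getTime();
  const requested =
    (update.extendBy !== undefined ? 1 : 0) +
    (update.newEndTime !== undefined ? 1 : 0) +
    (update.closeNow === true ? 1 : 0);
  if (requested !== 1) {
    throw new Error("Specify exactly one of extendBy, newEndTime, closeNow");
  }
  let next: number;
  if (update.closeNow === true) {
    // Early closure only makes sense while the window is running.
    if (nowMs <= start || nowMs >= end) {
      throw new Error("Only an active window can be closed");
    }
    next = nowMs;
  } else if (update.extendBy !== undefined) {
    if (update.extendBy <= 0) {
      throw new Error("extendBy must be positive");
    }
    next = end + update.extendBy * 60000;
  } else {
    next = Date.parse(update.newEndTime as string);
  }
  if (isNaN(next) || next <= start || next < nowMs) {
    throw new Error("New end time must be after the start and not in the past");
  }
  return new Date(next).toISOString();
}
```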

Tool 4: delete-maintenance-window

Purpose: Cancel or remove a maintenance window

Inputs:

{
  environmentId: string,
  windowId: string,                     // Settings object ID or friendly name
  deleteType?: "this" | "future" | "all", // For recurring windows
  force?: boolean                       // Allow deleting active window (closes first)
}

Implementation:

  • Fetch window from Settings 2.0
  • If active and force=false, return error (must close first)
  • If active and force=true, close then delete
  • For recurring windows, handle deleteType:
    • "this": Cancel only next occurrence (if supported by schema)
    • "future": Cancel all future occurrences
    • "all": Delete entire recurrence pattern
  • DELETE /api/v2/settings/objects/{objectId}
  • Return confirmation
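
The active-window and `force` rules above reduce to a small decision function. The sketch below only plans the steps; `planDeletion`, `WindowState`, and `DeleteStep` are hypothetical names, and the handling of `deleteType` for recurring windows is left out because it depends on what the schema supports.

```typescript
type WindowState = "scheduled" | "active" | "ended";
type DeleteStep = "close" | "delete";

// Returns the ordered steps to run, or throws if deletion is not allowed.
function planDeletion(state: WindowState, force: boolean): DeleteStep[] {
  if (state === "active") {
    if (!force) {
      throw new Error("Window is active: close it first or pass force=true");
    }
    // force=true closes the window before removing the settings object.
    return ["close", "delete"];
  }
  return ["delete"];
}
```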

Tool 5: get-maintenance-window-details

Purpose: Get full details of a specific maintenance window

Inputs:

{
  environmentId: string,
  windowId: string                      // Settings object ID or friendly name
}

Implementation:

  • GET /api/v2/settings/objects/{objectId}
  • Parse and format for display:
    • Schedule (start, end, recurrence)
    • Scope (entities, tags, management zones)
    • Status (active, scheduled, ended)
    • Creator and timestamps
    • Linked events or metadata
  • Show matched entities count
  • If recurring, show next 5 occurrences
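
Computing the next occurrences of a weekly recurrence might be sketched as below. It assumes a fixed UTC start hour and the `daysOfWeek` convention from the tool input (0 = Sunday); `nextWeeklyOccurrences` is a hypothetical helper, and time zones other than UTC, monthly patterns, and recurrence end dates are deliberately ignored.

```typescript
// Returns the ISO start times of the next `count` weekly occurrences
// strictly after `after`. daysOfWeek: 0 = Sunday ... 6 = Saturday.
function nextWeeklyOccurrences(after: Date, daysOfWeek: number[], startHourUtc: number, count: number): string[] {
  const validDays = daysOfWeek.filter(function (d) {
    return d >= 0 && d <= 6 && Math.floor(d) === d;
  });
  if (validDays.length === 0) {
    // Without this guard the loop below would never terminate.
    throw new Error("daysOfWeek must contain at least one value between 0 and 6");
  }
  const result: string[] = [];
  // Start scanning at midnight UTC of the day that contains `after`.
  const day = new Date(Date.UTC(after.getUTCFullYear(), after.getUTCMonth(), after.getUTCDate()));
  while (result.length < count) {
    const start = new Date(day.getTime() + startHourUtc * 3600000);
    if (validDays.indexOf(day.getUTCDay()) !== -1 && start.getTime() > after.getTime()) {
      result.push(start.toISOString());
    }
    day.setUTCDate(day.getUTCDate() + 1);
  }
  return result;
}
```

For example, a Tuesday/Thursday 18:00 UTC pattern evaluated on Wednesday 2026-02-04 yields Thursday 2026-02-05 as the first occurrence, followed by Tuesday 2026-02-10.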

Tool 6: suggest-maintenance-window

Purpose: Analyze deployment history and suggest optimal windows

Inputs:

{
  environmentId: string,
  entitySelector?: string,              // Scope to analyze (service, host group, etc.)
  analysisWindow?: string,              // How far back to look (default: 60 days)
  suggestRecurring?: boolean            // Prefer recurring vs. one-time (default: true)
}

Implementation:

  • Query Events API for historical deployment events
  • Filter by entity selector if provided
  • Analyze patterns:
    • Day of week frequency
    • Time of day clustering
    • Duration statistics (avg, max, p95)
  • Detect recurring patterns
  • Generate recommendation with:
    • Suggested schedule
    • Recommended duration
    • Scope
    • Confidence level
  • Return suggestion for user approval
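
The pattern analysis above could start from something like the following sketch, which computes day-of-week frequency and duration statistics from a list of past deployments. The `Deployment` shape is an assumption (the Events API response would need to be mapped into it), `analyzeDeployments` is a hypothetical name, and p95 uses the nearest-rank method as one reasonable choice.

```typescript
// Hypothetical, simplified deployment record (ISO timestamps, UTC).
interface Deployment {
  start: string;
  end: string;
}

interface HistoryStats {
  busiestDay: number;      // 0 = Sunday ... 6 = Saturday
  busiestDayCount: number;
  avgMinutes: number;
  maxMinutes: number;
  p95Minutes: number;
}

function analyzeDeployments(history: Deployment[]): HistoryStats {
  if (history.length === 0) {
    throw new Error("No deployments to analyze");
  }
  const perDay = [0, 0, 0, 0, 0, 0, 0];
  const durations: number[] = [];
  for (const d of history) {
    const startMs = Date.parse(d.start);
    perDay[new Date(startMs).getUTCDay()] += 1;
    durations.push((Date.parse(d.end) - startMs) / 60000);
  }
  durations.sort(function (a, b) {
    return a - b;
  });
  let busiestDay = 0;
  for (let i = 1; i < 7; i++) {
    if (perDay[i] > perDay[busiestDay]) {
      busiestDay = i;
    }
  }
  const total = durations.reduce(function (sum, m) {
    return sum + m;
  }, 0);
  // Nearest-rank percentile: smallest value with at least 95% at or below it.
  const rank = Math.ceil(0.95 * durations.length);
  return {
    busiestDay: busiestDay,
    busiestDayCount: perDay[busiestDay],
    avgMinutes: total / durations.length,
    maxMinutes: durations[durations.length - 1],
    p95Minutes: durations[rank - 1],
  };
}
```

Sizing the suggested window from p95 rather than the average leaves headroom for slow deployments without always covering the single worst case.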

Tool 7: link-window-to-event

Purpose: Associate a maintenance window with a deployment/migration event

Inputs:

{
  environmentId: string,
  windowId: string,
  eventId: string,                      // Event correlation ID
  linkType?: "deployment" | "migration" | "maintenance" | "fleet_update"
}

Implementation:

  • Update maintenance window metadata with event reference
  • Add comment to event linking to maintenance window
  • Create bidirectional reference
  • Return confirmation

MCP Resources

Resource: maintenance-window-guide

Purpose: Provide documentation on maintenance window features

Content:

  • Overview of maintenance window types (scheduled, ad-hoc, recurring)
  • Example natural language commands
  • Scope targeting strategies (tags, management zones, entity selectors)
  • Migration-specific window patterns
  • Best practices for window sizing and timing
  • Recurrence pattern examples
  • Integration with fleet updates and CI/CD

Resource: maintenance-window-schema-reference

Purpose: Technical reference for Dynatrace maintenance window schema

Content:

  • Settings 2.0 schema: builtin:alerting.maintenance-window
  • Required and optional fields
  • Scope syntax and examples
  • Recurrence rule format
  • Suppression behavior details
  • Limitations and constraints

Example User Interactions

Scenario 1: Weekly Deployment Window

User: "We deploy to production every Tuesday at 6 PM for about 2 hours. 
       Set up a recurring maintenance window for all Production services."

MCP: Creating recurring maintenance window...
     
     Deployment Schedule:
     ✓ Recurrence: Weekly on Tuesdays
     ✓ Time: 18:00-20:00 UTC (2 hours)
     ✓ Starting: Next Tuesday 2026-02-10
     ✓ Ending: Never (infinite recurrence)
     
     Scope:
     ✓ Management Zone: Production
     ✓ All services: 42 services
     ✓ Includes: 67 process groups, 52 hosts
     
     Next 3 occurrences:
     1. Tuesday 2026-02-10 18:00-20:00 UTC
     2. Tuesday 2026-02-17 18:00-20:00 UTC
     3. Tuesday 2026-02-24 18:00-20:00 UTC
     
     Maintenance window created: mw-recurring-prod-deploy
     
     ✓ Alerts will be suppressed during each window
     ✓ Problems on Production services won't trigger notifications
     ✓ You can modify or cancel this recurrence anytime
     
     Setup complete!

Scenario 2: Emergency Hotfix with Forgotten Window

User: "Oh no, we deployed a hotfix 20 minutes ago and forgot to create 
       a maintenance window. We're getting flooded with alerts because 
       services are restarting. Help!"

MCP: Creating retroactive maintenance window...
     
     ⚠️  EMERGENCY WINDOW - Starting retroactively
     
     Schedule:
     ✓ Start: 2026-02-04 14:20 UTC (20 minutes ago)
     ✓ End: 2026-02-04 16:20 UTC (1h 40min from now)
     ✓ Duration: 2 hours total
     
     What scope should this cover?

User: "All services in Production management zone."

MCP: Creating maintenance window...
     
     Scope:
     ✓ Management Zone: Production (42 services)
     
     Maintenance window ACTIVE: mw-emergency-prod-hotfix
     
     ✓ Window backdated to 14:20 UTC
     ✓ Alert suppression NOW ACTIVE
     ✓ Window ends at 16:20 UTC (1h 40min remaining)
     
     Note: Problems that already fired will remain, but:
     - No NEW problems will trigger alerts
     - No notifications will be sent
     - UI will show maintenance window as active
     
     Alert noise should drop immediately!

Scenario 3: Migration Wave with Precise Scoping

User: "Saturday morning we're migrating Wave 2 to SaaS. 
       This includes all k8s workloads in the Analytics cluster, 
       tagged migration:wave2 and cluster:analytics. 
       Create a maintenance window from 6 AM to 12 PM."

MCP: Creating migration maintenance window...
     
     Migration Details:
     ✓ Wave: Wave 2 - Analytics cluster
     ✓ Migration type: Managed → SaaS
     
     Schedule:
     ✓ Date: Saturday 2026-02-07
     ✓ Time: 06:00-12:00 UTC (6 hours)
     ✓ Type: One-time (migration event)
     
     Scope Analysis:
     ✓ Tag filters: migration:wave2 AND cluster:analytics
     
     Matched entities (23 total):
     - 6 services (Analytics API, Stream Processor, Query Engine, etc.)
     - 15 process groups (k8s workloads)
     - 2 hosts (k8s worker nodes: analytics-k8s-01, analytics-k8s-02)
     
     Expected Suppression:
     ⚠️  HOST UNAVAILABLE - suppressed (hosts leaving Managed)
     ⚠️  ONEAGENT CONNECTION LOST - suppressed
     ⚠️  PROCESS GROUP UNAVAILABLE - suppressed
     ⚠️  SERVICE UNAVAILABLE - suppressed
     ✓ Other production workloads NOT affected
     
     Maintenance window created: mw-migration-wave2-analytics
     
     Would you like me to:
     1. Create a migration event linked to this window?
     2. Create a corresponding window in SaaS for arrival tracking?

User: "Yes to both."

Managed MCP: Creating migration artifacts in Managed environment...
     
     ✓ Maintenance window confirmed: mw-migration-wave2-analytics
       - Scope: Tag filters migration:wave2 AND cluster:analytics (23 entities)
       - Tags: migration:wave2, migration:to_saas
       - Schedule: Saturday 6:00-12:00 UTC
     
     ✓ Migration event created: evt-migration-wave2-analytics
       - Type: CUSTOM_INFO
       - Properties: 
         * status=scheduled
         * target=saas-tenant-abc123
         * migration_wave=wave2
       - Correlation ID: mig-wave2-analytics
     
     ✓ Event and window linked together
     
     ═══════════════════════════════════════════════════════
     NEXT STEP: SaaS Environment (via NLM orchestration)
     ═══════════════════════════════════════════════════════
     
     This Managed MCP has completed its tasks. To create a corresponding
     maintenance window in SaaS, ask your NLM to coordinate with SaaS MCP:
     
     Suggested NLM prompt:
     "Create arrival tracking maintenance window in SaaS for migration 
      mig-wave2-analytics on Saturday 6 AM to 12 PM UTC.
      Scope: Incoming Analytics Cluster entities."
     
     Migration window ready for Saturday!
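
The migration event in Scenario 3 could be assembled roughly as below. This is a sketch, assuming the Events API v2 ingest body fields `eventType`, `title`, `entitySelector`, and `properties`; the property keys (`correlation_id`, `maintenance_window`) and the helper name are illustrative choices, not names defined by this story.

```python
import json

def migration_event_payload(title, entity_selector, properties):
    """Build a request body for POST /api/v2/events/ingest.

    Property values are coerced to strings, since event properties
    are sent as string key/value pairs.
    """
    return {
        "eventType": "CUSTOM_INFO",
        "title": title,
        "entitySelector": entity_selector,
        "properties": {key: str(value) for key, value in properties.items()},
    }

payload = migration_event_payload(
    title="Migration wave 2: Analytics cluster",
    entity_selector='type("SERVICE"),tag("migration:wave2"),tag("cluster:analytics")',
    properties={
        "status": "scheduled",
        "target": "saas-tenant-abc123",
        "migration_wave": "wave2",
        "correlation_id": "mig-wave2-analytics",            # hypothetical key
        "maintenance_window": "mw-migration-wave2-analytics",  # hypothetical key
    },
)
print(json.dumps(payload, indent=2))
```

Carrying the window name in an event property is one possible way to "link" the event and the window; the story does not prescribe the mechanism.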

Scenario 4: Deployment Taking Longer - Extend Window

User: "The deployment is running longer than expected. 
       Extend the current window by 1 hour."

MCP: Finding active maintenance window...
     
     Active window: mw-recurring-prod-deploy
     Current schedule: 18:00-20:00 UTC (ends in 25 minutes)
     
     Extending by 1 hour...
     ✓ New end time: 21:00 UTC (ends in 1h 25min)
     ✓ Comment added: "Extended due to deployment delays"
     ✓ Duration: 3 hours (was 2 hours)
     
     Maintenance window extended.
     Alert suppression now active until 21:00 UTC.
     
     Reminder: This is a recurring window. Extension only applies to 
               TODAY'S occurrence. Next Tuesday's window remains 2 hours.
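
A minimal sketch of the extension logic in Scenario 4, assuming a hypothetical `extend_window` helper that only operates on a currently active window. The calendar date is illustrative; only the times come from the transcript.

```python
from datetime import datetime, timedelta, timezone

def extend_window(start, end, extra, now):
    """Return (new_end, new_duration, remaining) for an active window."""
    if not (start <= now < end):
        raise ValueError("window is not active; nothing to extend")
    new_end = end + extra
    return new_end, new_end - start, new_end - now

utc = timezone.utc
start = datetime(2026, 2, 3, 18, 0, tzinfo=utc)
end = datetime(2026, 2, 3, 20, 0, tzinfo=utc)
now = datetime(2026, 2, 3, 19, 35, tzinfo=utc)   # 25 minutes before the end

new_end, duration, remaining = extend_window(start, end, timedelta(hours=1), now)
print(new_end.isoformat(), duration, remaining)
```

The result matches the transcript: the window now ends at 21:00 UTC, lasts 3 hours in total, and has 1h 25min remaining.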

Scenario 5: Multi-Environment Consistency

User: "Create the same weekly deployment window in all 3 of our Managed clusters: 
       managed-us, managed-eu, and managed-apac. 
       Every Wednesday 10 PM local time, 2 hours, Production services."

MCP: Creating maintenance window across 3 Managed environments...
     
     Configuration:
     ✓ Recurrence: Weekly on Wednesdays
     ✓ Duration: 2 hours
     ✓ Scope: Management Zone "Production"
     
     Time zone handling:
     ⚠️  Note: You said "local time" but each cluster is in a different timezone.
     
     Options:
     1. Use each cluster's local timezone (22:00 local)
     2. Use same UTC time (22:00 UTC = different local times)
     3. Specify exact times per cluster
     
     Which approach?

User: "Option 1 - each cluster's local time 10 PM."

MCP: Creating environment-specific windows...
     
     managed-us (UTC-5):
     ✓ Window: mw-us-wed-deploy
     ✓ Schedule: Thursdays 03:00-05:00 UTC (Wednesdays 22:00-00:00 local)
     ✓ Scope: 42 services in Production
     
     managed-eu (UTC+1):
     ✓ Window: mw-eu-wed-deploy
     ✓ Schedule: Wednesdays 21:00-23:00 UTC (22:00-00:00 local)
     ✓ Scope: 38 services in Production
     
     managed-apac (UTC+8):
     ✓ Window: mw-apac-wed-deploy
     ✓ Schedule: Wednesdays 14:00-16:00 UTC (22:00-00:00 local)
     ✓ Scope: 31 services in Production
     
     All 3 maintenance windows created!
     Each activates at 22:00 in its local timezone.
     
     Summary:
     - US: Wednesdays 10 PM EST (03:00 UTC Thursday)
     - EU: Wednesdays 10 PM CET (21:00 UTC)
     - APAC: Wednesdays 10 PM SGT (14:00 UTC)
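
The local-to-UTC conversion in Scenario 5 can be checked with a short sketch. It assumes the fixed offsets quoted in the scenario (UTC-5, UTC+1, UTC+8); a real implementation would more likely use IANA zone IDs so that daylight saving changes are handled. The helper name is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets as stated in the scenario (assumption: no DST handling).
CLUSTER_OFFSETS = {"managed-us": -5, "managed-eu": 1, "managed-apac": 8}

def local_window_in_utc(local_date, hour, duration_hours, offset_hours):
    """Convert a local start hour and duration into UTC start/end datetimes."""
    tz = timezone(timedelta(hours=offset_hours))
    start = datetime(local_date.year, local_date.month, local_date.day, hour, tzinfo=tz)
    end = start + timedelta(hours=duration_hours)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)

wednesday = date(2026, 2, 4)  # a Wednesday
utc_windows = {
    env: local_window_in_utc(wednesday, 22, 2, offset)
    for env, offset in CLUSTER_OFFSETS.items()
}
for env, (start, end) in utc_windows.items():
    print(env, start.strftime("%A %H:%M"), "to", end.strftime("%A %H:%M"), "UTC")
```

Note that 22:00 on Wednesday in a UTC-5 zone is 03:00 on Thursday in UTC, so the UTC weekday of the US window differs from the local one.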

Scenario 6: Fleet Update with Window Suggestion

User: "Schedule OneAgent update for all Production hosts this Sunday at midnight."

MCP: Scheduling OneAgent fleet update...
     
     Fleet Update Configuration:
     ✓ Target: All Production hosts (52 hosts)
     ✓ Schedule: Sunday 2026-02-08 00:00 UTC
     ✓ Update version: OneAgent 1.285.0
     ✓ Rollout strategy: Host-by-host (safer, slower)
     
     Estimated duration: 2-3 hours for 52 hosts
     
     💡 RECOMMENDATION: Create maintenance window?
     
     During OneAgent updates, you may see:
     - Brief host connectivity issues
     - Process group restarts
     - Monitoring data gaps (1-2 min per host)
     - False AVAILABILITY problems
     
     Suggested maintenance window:
     ✓ Start: Sunday 00:00 UTC (same as fleet update)
     ✓ Duration: 4 hours (buffer for safety)
     ✓ Scope: All Production hosts (52 hosts)
     
     This will suppress expected problems during the update.
     Create this window?

User: "Yes, create it."

MCP: Creating coordinated maintenance window...
     
     ✓ Window: mw-fleet-oneagent-prod-20260208
     ✓ Schedule: Sunday 2026-02-08 00:00-04:00 UTC (4 hours)
     ✓ Scope: Production management zone (52 hosts)
     ✓ Linked to: Fleet update job fleet-update-oneagent-20260208
     
     Setup complete:
     ✓ Fleet update scheduled
     ✓ Maintenance window scheduled
     ✓ Both linked for tracking
     
     On Sunday at midnight:
     1. Maintenance window activates (alerts suppressed)
     2. OneAgent update begins
     3. Rollout proceeds host-by-host
     4. Window closes at 04:00 UTC (or earlier if you close manually)
     
     You'll be notified when the fleet update completes.

Scenario 7: Deployment Finished Early - Close Window

User: "Great news, deployment finished in 45 minutes instead of 2 hours. 
       Close the maintenance window now."

MCP: Finding active maintenance window...
     
     Active window: mw-recurring-prod-deploy
     Started: 18:00 UTC (45 minutes ago)
     Scheduled end: 20:00 UTC (1h 15min remaining)
     
     Closing window early...
     ✓ Window closed: 2026-02-04 18:45 UTC
     ✓ Actual duration: 45 minutes (vs. 2 hours planned)
     ✓ Saved: 1h 15min of suppression time
     ✓ Comment added: "Window closed early - deployment completed ahead of schedule"
     
     Maintenance window CLOSED.
     
     ✓ Alert suppression ENDED
     ✓ Normal monitoring RESUMED
     ✓ Problem detection active again
     
     Note: Problems may take 1-2 minutes to re-evaluate and appear if issues exist.
     
     Deployment finished in 37.5% of the planned time (62.5% sooner than planned)! 🚀

Non-Functional Requirements

Performance

  • Maintenance window creation should complete within 3 seconds
  • Multi-environment operations should use parallel API calls
  • Window queries should return within 2 seconds
  • Scope validation (entity count) should complete within 5 seconds
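
The parallel multi-environment requirement could be met along these lines. This is a sketch only: `create_window` is a stand-in coroutine that simulates the HTTP call, and the environment names are taken from Scenario 5.

```python
import asyncio

async def create_window(environment, payload):
    """Stand-in for the real HTTP call to one Managed environment."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"environment": environment, "status": "created", "name": payload["name"]}

async def create_everywhere(environments, payload):
    """Issue all requests concurrently and collect a per-environment result.

    return_exceptions=True keeps one failing environment from hiding
    the outcome of the others.
    """
    outcomes = await asyncio.gather(
        *(create_window(env, payload) for env in environments),
        return_exceptions=True,
    )
    return dict(zip(environments, outcomes))

results = asyncio.run(
    create_everywhere(
        ["managed-us", "managed-eu", "managed-apac"],
        {"name": "weekly-deploy"},
    )
)
print(results)
```

With concurrent calls, total latency is bounded by the slowest environment rather than the sum of all of them, which is what makes the 3-second creation target plausible for multi-cluster requests.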

Security

  • All API tokens remain server-side (not exposed to users)
  • Validate user has appropriate Dynatrace permissions (settings.read, settings.write)
  • Audit all maintenance window operations with user identity
  • Prevent unauthorized scope expansion (e.g., users can't create windows for management zones they don't own)

Error Handling

  • If entity selector matches zero entities, warn user (empty scope)
  • If start time is in the past (except for retroactive emergency windows), confirm intent
  • If overlapping windows exist for same scope, warn about redundancy
  • Gracefully handle Settings 2.0 schema version differences between Managed versions
  • If a recurring pattern cannot occur in every period (e.g., "31st of every month", which skips shorter months), warn the user and suggest alternatives
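
The pre-creation checks listed above can be sketched as plain functions. All names here are hypothetical, and the warning texts are placeholders rather than agreed wording.

```python
import calendar
from datetime import datetime, timedelta, timezone

def months_without_day(day_of_month, year):
    """Months (1-12) of `year` that are too short to contain `day_of_month`."""
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] < day_of_month]

def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share any instant."""
    return a_start < b_end and b_start < a_end

def pre_creation_warnings(start, end, now, matched_entities, existing_windows,
                          retroactive=False):
    """Collect warnings to show the user before a window is created."""
    warnings = []
    if matched_entities == 0:
        warnings.append("empty scope: selector matched zero entities")
    if start < now and not retroactive:
        warnings.append("start time is in the past: confirm intent")
    if any(overlaps(start, end, s, e) for s, e in existing_windows):
        warnings.append("overlaps an existing window for the same scope")
    return warnings

now = datetime(2026, 2, 4, 14, 40, tzinfo=timezone.utc)
start = now - timedelta(minutes=20)
end = start + timedelta(hours=2)
existing = [(now + timedelta(hours=1), now + timedelta(hours=3))]

found = pre_creation_warnings(start, end, now, 0, existing)
skipped_months = months_without_day(31, 2026)
print(found)
print(skipped_months)
```

For 2026, a "31st of every month" pattern would skip February, April, June, September, and November, which is the kind of detail worth surfacing before the window is saved.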

Observability

  • Log all maintenance window lifecycle operations (create, update, extend, close, delete)
  • Track metrics: windows created per day, average duration, extension frequency
  • Monitor API latency and error rates
  • Alert if window creation fails repeatedly

Usability

  • Natural language flexibility: accept "now", "tomorrow", "next Tuesday", "in 2 hours", etc.
  • Support timezone-aware scheduling (UTC, local, specific timezones)
  • Preview scope before creating window (show entity count)
  • Confirm high-impact operations (e.g., multi-environment, long duration, broad scope)
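
In practice the NLM is expected to do most of the language understanding, but a deterministic fallback for simple relative expressions might look like this. The sketch assumes a hypothetical `parse_relative` helper, treats "tomorrow" as midnight of the next day, and covers only a few of the phrases listed above.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_TO_ARG = {"minute": "minutes", "hour": "hours", "day": "days"}

def parse_relative(text, now):
    """Resolve 'now', 'tomorrow', or 'in N minutes/hours/days' against `now`."""
    text = text.strip().lower()
    if text == "now":
        return now
    if text == "tomorrow":
        next_day = now + timedelta(days=1)
        return next_day.replace(hour=0, minute=0, second=0, microsecond=0)
    match = re.fullmatch(r"in (\d+) (minute|hour|day)s?", text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now + timedelta(**{_UNIT_TO_ARG[unit]: amount})
    raise ValueError(f"unsupported time expression: {text!r}")

now = datetime(2026, 2, 4, 14, 40, tzinfo=timezone.utc)
for phrase in ("now", "in 2 hours", "tomorrow"):
    print(phrase, "->", parse_relative(phrase, now).isoformat())
```

Anything the helper cannot resolve raises an error, so an ambiguous phrase leads to a clarifying question instead of a silently guessed schedule.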

Audit and Compliance

  • Every maintenance window operation must include:
    • User identity (who created/modified)
    • Timestamp (when)
    • Reason/description (why)
    • Environment and scope (what/where)
  • Support exporting window history for compliance review
  • Preserve audit trail even after window is deleted
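
One possible shape for an audit entry covering the who/when/why/what fields above. The field names and the JSON-lines output are assumptions made for illustration; the story does not fix a storage format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditRecord:
    user: str          # who created or modified the window
    timestamp: str     # when, as an ISO 8601 UTC timestamp
    operation: str     # create, update, extend, close, or delete
    reason: str        # why
    environment: str   # where
    scope: str         # what

def audit(user, operation, reason, environment, scope, now=None):
    """Create an immutable audit record; `now` is injectable for testing."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(user, now.isoformat(), operation, reason, environment, scope)

record = audit(
    user="jdoe",
    operation="extend",
    reason="Extended due to deployment delays",
    environment="managed-us",
    scope='Management Zone "Production"',
    now=datetime(2026, 2, 3, 19, 35, tzinfo=timezone.utc),
)
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

Writing records to an append-only store that is separate from the window itself is one way to keep the trail intact after a window is deleted.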

Out of Scope (Future Stories)

The following related features are not included in this story but are candidates for future work:

  1. Automatic Window Creation from CI/CD - Directly trigger window creation from Jenkins/GitLab/GitHub Actions
  2. Smart Window Duration Prediction - ML-based prediction of optimal window length based on deployment complexity
  3. Maintenance Window Templates - Pre-defined templates for common scenarios (weekly deploy, monthly maintenance, migration waves)
  4. Window Collision Detection - Prevent scheduling conflicting windows; suggest alternative times
  5. Notification Preferences - Custom notification behavior during windows (suppress all vs. critical only)
  6. Window Effectiveness Metrics - Track how many problems were suppressed, cost savings, etc.
  7. Approval Workflows - Require approval before creating production maintenance windows
  8. Window Handoff Automation - Automatically extend or adjust windows based on real-time deployment progress

Dependencies

Dynatrace APIs Required

  • Settings 2.0 API (/api/v2/settings/...)
    • Schema: builtin:alerting.maintenance-window
    • GET /api/v2/settings/schemas/builtin:alerting.maintenance-window - Schema definition
    • POST /api/v2/settings/objects - Create window
    • PUT /api/v2/settings/objects/{objectId} - Update window
    • DELETE /api/v2/settings/objects/{objectId} - Delete window
    • GET /api/v2/settings/objects?schemaIds=builtin:alerting.maintenance-window - Query windows
  • Entities API v2 (/api/v2/entities) - Validate entity selectors and count entities
  • Events API v2 (/api/v2/events/ingest) - Link to deployment/migration events (optional)
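
As an illustration of the create call, the sketch below builds (but does not send) a Settings 2.0 request. It assumes the usual Managed URL form with an `/e/{environment-id}` prefix and `Api-Token` authorization; the host name, environment ID, and token are placeholders, and the `value` content is deliberately minimal.

```python
import json
import urllib.request

SCHEMA_ID = "builtin:alerting.maintenance-window"

def build_create_request(base_url, token, value):
    """Build a POST /api/v2/settings/objects request without sending it.

    The endpoint accepts a JSON array of objects, each carrying
    schemaId, scope, and value.
    """
    body = [{"schemaId": SCHEMA_ID, "scope": "environment", "value": value}]
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/settings/objects",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Api-Token {token}",
            "Content-Type": "application/json",
        },
    )

request = build_create_request(
    base_url="https://managed.example.com/e/ENVIRONMENT-ID/",  # placeholder
    token="dt0c01.PLACEHOLDER",                                  # placeholder
    value={"enabled": True},
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to unit test and keeps the token handling in one server-side place.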

Existing MCP Server Capabilities

  • Multi-environment connection management (already implemented)
  • Authentication via API tokens (already implemented)
  • NLM integration for natural language parsing
  • Event ingestion (from previous user story) - for linking windows to events
  • Problem lifecycle management (from previous user story) - for coordinating window closure with problem resolution

External Integrations (Future)

  • CI/CD tools (Jenkins, GitLab CI, GitHub Actions) - trigger window creation
  • Fleet management system (future story) - coordinate with OneAgent/ActiveGate updates
  • Calendar systems (Google Calendar, Outlook) - sync maintenance windows to team calendars

Success Metrics

Efficiency Gains

  • Target: Reduce maintenance window creation time by 80% compared to manual UI workflow
  • Measure: Time from user request to active window (avg. <10 seconds via MCP vs. 2-3 minutes in UI)

Alert Fatigue Reduction

  • Target: 70% reduction in unnecessary alerts during deployments
  • Measure: Compare problem notification count during windowed vs. non-windowed deployments

Migration Support

  • Target: 95% of migration waves use maintenance windows
  • Measure: Track migration events with linked maintenance windows

User Adoption

  • Target: 80% of users create at least one maintenance window via MCP within first month
  • Measure: Track create-maintenance-window tool usage

Recurring Window Efficiency

  • Target: 60% of maintenance windows are recurring (not one-time)
  • Measure: Track recurring vs. one-time window creation ratio

Multi-Environment Consistency

  • Target: 50% of windows are created across multiple Managed clusters simultaneously
  • Measure: Track multi-environment window creation

Testing Strategy

Unit Tests

  • Validate Settings 2.0 payload construction for maintenance windows
  • Test natural language parsing for schedules (relative and absolute times)
  • Verify entity selector construction from tags and management zones
  • Test recurrence pattern generation (daily, weekly, monthly)
  • Validate scope validation logic (entity count, empty scope detection)
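
A unit under test for entity selector construction might look like the following. The builder is hypothetical; it relies on the entity selector criteria `type(...)`, `tag(...)`, and `mzName(...)`, joined by commas to mean AND.

```python
def build_entity_selector(entity_type, tags=(), management_zone=None):
    """Compose an entity selector from a type, tags, and an optional zone name."""
    def quoted(value):
        # Kept simple for the sketch: reject values that would need escaping.
        if '"' in value:
            raise ValueError(f"unsupported character in {value!r}")
        return f'"{value}"'

    parts = [f"type({quoted(entity_type)})"]
    parts += [f"tag({quoted(tag)})" for tag in tags]
    if management_zone:
        parts.append(f"mzName({quoted(management_zone)})")
    return ",".join(parts)

selector = build_entity_selector(
    "SERVICE", tags=["migration:wave2", "cluster:analytics"]
)
zone_selector = build_entity_selector("HOST", management_zone="Production")
print(selector)
print(zone_selector)
```

A test can then assert the exact selector string for each scope type and confirm that unsupported input is rejected instead of producing a malformed selector.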

Integration Tests

  • E2E: Create scheduled window, verify in Dynatrace, let it activate, verify suppression
  • E2E: Create ad-hoc window immediately, verify it's active within seconds
  • E2E: Extend active window, verify new end time
  • E2E: Close window early, verify suppression ends
  • E2E: Create recurring window, verify next occurrences calculated correctly
  • Cross-environment: Create same window in multiple Managed clusters

Contract Tests

  • Validate against latest builtin:alerting.maintenance-window schema from Managed
  • Test with different schema versions (ensure backward compatibility)
  • Verify all required fields are included
  • Test optional field combinations

User Acceptance Tests

  • Scenario: DevOps engineer creates weekly deployment window
  • Scenario: SRE creates emergency window during unplanned outage
  • Scenario: Migration specialist creates wave-based migration windows
  • Scenario: Platform engineer extends window mid-deployment
  • Scenario: Team lead queries upcoming windows across environments

Load Tests

  • Create 100+ windows across multiple environments
  • Verify performance remains acceptable (< 5 seconds per operation)
  • Test parallel window creation (multiple users simultaneously)

Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Incorrect scope suppresses too many alerts | High - could hide critical production issues | Always preview entity count before creation; require confirmation for broad scope (>50 entities) |
| Maintenance window forgotten/not closed | Medium - extended suppression hides real problems | Send reminders when window approaches end time; auto-close option after deployment events |
| Timezone confusion causes wrong timing | High - window activates at wrong time | Always confirm timezone; show time in UTC AND local time; support explicit timezone specification |
| Recurring pattern miscalculation | Medium - windows at wrong times | Show preview of next 3-5 occurrences; allow user to verify before creation |
| Schema version differences across Managed clusters | Medium - window creation may fail on older clusters | Detect Managed version; adjust payload to compatible schema version; warn about unsupported features |
| Overlapping windows create confusion | Low - redundant suppression | Detect overlaps; warn user; suggest consolidating windows |
| Retroactive windows don't suppress existing problems | Low - expectations mismatch | Clearly communicate that retroactive windows prevent NEW alerts, don't clear existing problems |
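
The "preview of next occurrences" mitigation for recurring patterns could be sketched like this. The helper is hypothetical and handles only weekly recurrence; weekdays use the ISO convention (1=Monday, 7=Sunday).

```python
from datetime import date, timedelta

def next_weekly_occurrences(after, iso_weekday, count=3):
    """Dates of the next `count` occurrences strictly after `after`."""
    if iso_weekday not in range(1, 8):
        raise ValueError("iso_weekday must be 1 (Monday) through 7 (Sunday)")
    # `or 7` pushes a same-weekday match to the following week.
    days_ahead = (iso_weekday - after.isoweekday()) % 7 or 7
    first = after + timedelta(days=days_ahead)
    return [first + timedelta(weeks=i) for i in range(count)]

today = date(2026, 2, 4)  # a Wednesday
wednesdays = next_weekly_occurrences(today, 3)
tuesdays = next_weekly_occurrences(today, 2)
print([d.isoformat() for d in wednesdays])
print([d.isoformat() for d in tuesdays])
```

Showing these dates back to the user before creation gives them a chance to catch an off-by-one weekday or a timezone-induced date shift.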

Appendix: Maintenance Window Schema Reference

builtin:alerting.maintenance-window

Scope: environment

Key Properties:

{
  enabled: boolean,                    // Active or disabled
  generalProperties: {
    name: string,                      // Display name
    description?: string,              // Optional description
    maintenanceType: "PLANNED" | "UNPLANNED",
    suppression: "DETECT_PROBLEMS_DONT_ALERT" | "DONT_DETECT_PROBLEMS"
  },
  schedule: {
    scheduleType: "ONCE" | "DAILY" | "WEEKLY" | "MONTHLY",
    start: string,                     // ISO 8601 timestamp
    end: string,                       // ISO 8601 timestamp
    zoneId: string,                    // Timezone (e.g., "UTC", "America/New_York")
    recurrence?: {
      daysOfWeek?: number[],           // For weekly (1=Monday, 7=Sunday)
      dayOfMonth?: number,             // For monthly
      durationMinutes?: number         // Window length
    }
  },
  scope: {
    matches: [{
      type: "MANAGEMENT_ZONE" | "ENTITY" | "ENTITY_NAME" | "TAG",
      managementZoneId?: string,
      entityIds?: string[],
      tagKey?: string,
      tagValue?: string
    }]
  }
}

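The listing above is a simplified summary. The authoritative property names and nesting for a given Managed version are available via: GET /api/v2/settings/schemas/builtin:alerting.maintenance-window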

Suppression Options:

  • DETECT_PROBLEMS_DONT_ALERT - Problems are detected but don't trigger notifications
  • DONT_DETECT_PROBLEMS - Problems are not detected at all during window

Supported Scope Types:

  • Management Zone ID
  • Explicit entity IDs
  • Tag-based matching (key + value)
  • Entity name pattern matching
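
To make the property summary concrete, the sketch below assembles a weekly window value using the property names exactly as listed in this appendix. Because the appendix is a summary, these names are assumptions that may not match the schema returned by a given cluster; the helper name and the management zone ID are hypothetical.

```python
def weekly_window_value(name, start_iso, end_iso, zone_id, days_of_week,
                        duration_minutes, management_zone_id):
    """Build a window value following the appendix's property names."""
    if not days_of_week or any(d not in range(1, 8) for d in days_of_week):
        raise ValueError("daysOfWeek must use 1 (Monday) through 7 (Sunday)")
    return {
        "enabled": True,
        "generalProperties": {
            "name": name,
            "maintenanceType": "PLANNED",
            "suppression": "DETECT_PROBLEMS_DONT_ALERT",
        },
        "schedule": {
            "scheduleType": "WEEKLY",
            "start": start_iso,
            "end": end_iso,
            "zoneId": zone_id,
            "recurrence": {
                "daysOfWeek": list(days_of_week),
                "durationMinutes": duration_minutes,
            },
        },
        "scope": {
            "matches": [
                {"type": "MANAGEMENT_ZONE", "managementZoneId": management_zone_id}
            ]
        },
    }

value = weekly_window_value(
    name="mw-us-wed-deploy",
    start_iso="2026-02-04T22:00:00",
    end_iso="2026-12-30T22:00:00",
    zone_id="America/New_York",
    days_of_week=[3],             # Wednesday
    duration_minutes=120,
    management_zone_id="1234567890",  # placeholder ID
)
print(value["schedule"])
```

Passing a `zoneId` such as "America/New_York" rather than a precomputed UTC time lets the schedule follow daylight saving changes, which a fixed UTC offset would not.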
