feat(deploy,infra,security): rate limiting + abuse detection (per Entra OID, CAPTCHA) #1034

@Cataldir

Problem statement

Capability 43's locked rate-limit + abuse-detection contract:

  • Per Entra Object ID: 3 active deployments / 24 h; 10 deployments / 30 d.
  • Pre-flight: 1 / minute per user.
  • CAPTCHA after the third pre-flight attempt in 1 h.
  • Manual-review flag when more than 5 new users from the same tenant arrive in a 5-minute burst.
  • Microsoft / Entra account required (no anonymous).
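The pre-flight and CAPTCHA rules above can be sketched as a small gate keyed by Entra Object ID. This is only an illustration under stated assumptions: `PreflightGate`, its in-memory storage, and the three return strings are hypothetical names, and "after the third attempt" is read here as "attempts 1 to 3 pass without a challenge, the fourth within the hour needs one". A real implementation would keep the timestamps in the shared counter store rather than in process memory.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

# Values copied from the contract above; hard-coded here only for illustration.
MIN_INTERVAL = timedelta(minutes=1)   # pre-flight: 1 / minute per user
CAPTCHA_WINDOW = timedelta(hours=1)   # look-back window for the CAPTCHA rule
FREE_ATTEMPTS = 3                     # attempts allowed before a CAPTCHA


class PreflightGate:
    """Hypothetical in-memory gate keyed by Entra Object ID (OID)."""

    def __init__(self):
        # oid -> timestamps of counted pre-flight attempts, oldest first
        self._attempts = defaultdict(deque)

    def check(self, oid, now, captcha_passed=False):
        """Return 'allowed', 'throttled', or 'captcha_required'."""
        attempts = self._attempts[oid]
        # Forget attempts that have left the one-hour window.
        while attempts and now - attempts[0] >= CAPTCHA_WINDOW:
            attempts.popleft()
        # Rule 1: at most one counted attempt per minute.
        if attempts and now - attempts[-1] < MIN_INTERVAL:
            return "throttled"
        # Rule 2: once three attempts are in the window, require a verified CAPTCHA.
        if len(attempts) >= FREE_ATTEMPTS and not captcha_passed:
            return "captcha_required"
        # Only attempts that pass both rules are counted.
        attempts.append(now)
        return "allowed"


if __name__ == "__main__":
    gate = PreflightGate()
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for minutes in (0, 2, 4, 6):
        print(minutes, gate.check("oid-1", start + timedelta(minutes=minutes)))
```

Because an attempt is appended only after both checks pass, a request that still needs a CAPTCHA is not counted until `captcha_passed` is true.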

This issue ships the APIM rate-limit policies plus the per-Entra-OID counter store and the manual-review-flag pipeline.
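As a rough illustration of the manual-review-flag pipeline, the sketch below detects the tenant burst described in the contract (more than 5 new users from one tenant inside 5 minutes). `TenantBurstDetector` and `observe` are hypothetical names, state is held in memory, and "new" is assumed to mean "first time this tenant and OID pair is seen". Writing the flag record to Cosmos and emitting the App Insights event are left out.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class TenantBurstDetector:
    """Hypothetical detector: flags a tenant when too many new OIDs arrive at once."""

    def __init__(self, threshold=5, window=timedelta(minutes=5)):
        self.threshold = threshold          # flag when MORE than this many new users
        self.window = window                # burst window
        self._seen = set()                  # (tenant_id, oid) pairs already observed
        self._first_seen = defaultdict(deque)  # tenant_id -> first-seen timestamps

    def observe(self, tenant_id, oid, now):
        """Record a sign-in; return True if the tenant should be flagged for review."""
        key = (tenant_id, oid)
        if key in self._seen:
            return False                    # returning user, not part of a burst
        self._seen.add(key)
        arrivals = self._first_seen[tenant_id]
        arrivals.append(now)
        # Drop first-seen events that fell out of the burst window.
        while arrivals and now - arrivals[0] > self.window:
            arrivals.popleft()
        return len(arrivals) > self.threshold


if __name__ == "__main__":
    detector = TenantBurstDetector()
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for i in range(6):
        flagged = detector.observe("tenant-a", f"user-{i}", start + timedelta(seconds=30 * i))
        print(i, flagged)
```

The return value is only a signal for review; per the risk table, nothing here blocks the request.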

Acceptance criteria

  • APIM rate-limit policy on the deploy-portal API:
    • POST /api/preflight: 1 / minute per Entra OID
    • POST /api/deploy: 3 active concurrent + 3 starts / 24 h per Entra OID, 10 starts / 30 d per Entra OID
    • All other endpoints: sane default
  • Per-Entra-OID counters stored in Redis (or Cosmos with TTL); APIM policy queries them via a custom policy fragment.
  • CAPTCHA challenge integrated into the UI on /deploy/preflight after the third pre-flight attempt within 1 h. Use a free provider (hCaptcha or reCAPTCHA v3). The CAPTCHA response is verified server-side before the attempt is counted.
  • Manual-review flag pipeline:
    • Detect "more than 5 new Entra OIDs from the same tenant in a 5-minute window"
    • Write a flag record to Cosmos
    • Surface the flag in App Insights (custom event) for an alert
  • Anonymous access blocked at APIM (no token = 401).
  • Rate-limit responses include a Retry-After header and a human-readable JSON body.
  • Threshold values configurable via Key Vault (no recompile to change limits).
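The deploy quota and the rate-limit response shape from the criteria above could look roughly like this. It is a sketch, not the APIM policy itself: `DeployLimits`, `check_deploy`, the JSON field names, and the 60-second hint used when the active-deployment cap is hit are all assumptions. The default limits mirror the values listed for POST /api/deploy and would be supplied from configuration in practice.

```python
import json
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: when the active cap is hit we cannot know when a deployment
# will finish, so a fixed hint is returned in Retry-After.
ACTIVE_RETRY_HINT_SECONDS = 60


@dataclass(frozen=True)
class DeployLimits:
    max_active: int = 3        # concurrent active deployments per OID
    max_starts_24h: int = 3    # starts per rolling 24 h per OID
    max_starts_30d: int = 10   # starts per rolling 30 d per OID


def seconds_until_slot(starts, now, window, limit):
    """Seconds until the rolling window has room again, or None if it has room now."""
    recent = sorted(t for t in starts if now - t < window)
    if len(recent) < limit:
        return None
    # Room appears once the limit-th most recent start leaves the window.
    frees_at = recent[-limit] + window
    return max(1, math.ceil((frees_at - now).total_seconds()))


def too_many_requests(reason, retry_after):
    """Build a 429 response as (status, headers, body)."""
    body = {
        "error": "rate_limited",
        "reason": reason,
        "retry_after_seconds": retry_after,
        "message": f"Deployment limit reached ({reason}). Try again in {retry_after} seconds.",
    }
    headers = {"Retry-After": str(retry_after), "Content-Type": "application/json"}
    return 429, headers, json.dumps(body)


def check_deploy(starts, active_count, now, limits=DeployLimits()):
    """Decide whether one more deployment may start for a single Entra OID."""
    if active_count >= limits.max_active:
        return too_many_requests("active_limit", ACTIVE_RETRY_HINT_SECONDS)
    windows = (
        ("starts_24h", timedelta(hours=24), limits.max_starts_24h),
        ("starts_30d", timedelta(days=30), limits.max_starts_30d),
    )
    blocked = []
    for reason, window, limit in windows:
        wait = seconds_until_slot(starts, now, window, limit)
        if wait is not None:
            blocked.append((wait, reason))
    if blocked:
        wait, reason = max(blocked)  # report the longest wait so a retry can succeed
        return too_many_requests(reason, wait)
    return 200, {}, json.dumps({"status": "ok"})


if __name__ == "__main__":
    now = datetime(2025, 1, 31, tzinfo=timezone.utc)
    history = [now - timedelta(hours=h) for h in (1, 2, 23)]
    print(check_deploy(history, active_count=0, now=now))
```

Reporting the longest wait across both windows avoids telling a client to retry at a time when the other window would still reject the request.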

Risks and dependencies

| Risk | Mitigation |
| --- | --- |
| Legitimate users with multiple subscriptions hit the 3-active limit. | 3 active is per Entra OID, not per subscription; pilot users with multi-sub needs can request a temporary lift via support. |
| CAPTCHA reduces conversion. | CAPTCHA only triggers after 3 attempts in 1 h; v3 silent challenge first; visible challenge only on score-fail. |
| Per-OID counter store goes down → all deploys blocked. | Fail-open on counter-store error (allow with a warning audit event); pager on counter-store availability. |
| Tenant-burst flag false-positives on legitimate enterprise rollout days. | Flag is for review, not auto-block; reviewed within 1 business day; can be suppressed per tenant. |
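
The fail-open mitigation for a counter-store outage might be wrapped as follows. `CounterStoreError`, `allow_with_fail_open`, and the `rate_limit.fail_open` event name are hypothetical; the sketch only shows the control flow (allow the request and emit a warning audit event when the store cannot be reached), not any particular Redis or Cosmos client.

```python
import logging

logger = logging.getLogger("deploy_portal.rate_limit")


class CounterStoreError(Exception):
    """Hypothetical error raised when the counter store cannot be reached."""


def allow_with_fail_open(check, audit, oid):
    """Run a rate-limit check; if the counter store fails, allow and audit.

    check: zero-argument callable returning True (allow) or False (deny).
    audit: callable(event_name, properties) that records an audit event.
    """
    try:
        return check()
    except CounterStoreError as exc:
        # Fail open: the request proceeds, but the outage is made visible.
        logger.warning("counter store unavailable, failing open for %s", oid)
        audit("rate_limit.fail_open", {"oid": oid, "reason": str(exc)})
        return True


if __name__ == "__main__":
    def broken_check():
        raise CounterStoreError("connection refused")

    events = []
    allowed = allow_with_fail_open(broken_check, lambda name, props: events.append((name, props)), "oid-1")
    print(allowed, events)
```

Only the store-specific error is caught, so a genuine "deny" result or an unrelated bug is not silently turned into an allow.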

Blocked by: epic 41 #1020; C1 #1027 (APIM + Redis/Cosmos); C5 #1031 (OBO/auth); R2 epic #1008; R1 epic #990.

Evidence links

ADR impact

  • ADR-034 (audience-segmented IA) — implementation step.

Branch

feature/<this-issue-id>-deploy-rate-limit-and-abuse-detection per ADR-018.

BPMN process

%%{init: {'theme':'base', 'themeVariables': {
  'primaryColor':'#FFB3BA',
  'primaryTextColor':'#000',
  'primaryBorderColor':'#FF8B94',
  'lineColor':'#BAE1FF',
  'secondaryColor':'#BAE1FF',
  'tertiaryColor':'#FFFFFF'
}}}%%
flowchart LR
  A[Analyze Current Code] --> B[Design Change]
  B --> C[Implement on Issue Branch]
  C --> D[Open PR]
  D --> E[Validation and Fixes]
  E --> F[Merge to Main]
  F --> G[Monitor Workflows]
  G --> H[Close Issue and Cleanup]

Labels

  • area:infra (Infrastructure / IaC concern)
  • area:security (Security posture, OAuth, data residency, compliance)
  • gtm:deploy-portal (Capability 43: One-click deployment portal)
  • priority:high (High priority work)
  • type:feature (New feature or capability)
