Data Compliance & Redaction Framework #83

Description

@tsutomi

A schema-driven data classification and redaction foundation built on the Microsoft.Extensions.Compliance.* stack, with first-class hooks for the Hermodr event model — no reinvented taxonomy, no POCO-only assumptions.

The problem today: Regulated environments (PCI-DSS, healthcare, government) forbid storing or transporting confidential and strictly confidential data in cleartext. The Microsoft.Extensions.Compliance library provides a solid redaction engine and a DataClassification taxonomy, but it is POCO-oriented: it redacts CLR objects by property name. Hermodr publishes CloudEvents whose Data is an opaque byte payload (JSON, XML, Protobuf, etc.); the framework's EventFactory builds the envelope from an annotated CLR class, but the runtime event carries only a serialised blob. As a result, neither the stock Microsoft redaction engine nor any existing Hermodr mechanism can match classification rules against the actual properties inside the CloudEvent.Data payload.
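
To make the gap concrete, here is a minimal sketch using only the BCL. SensitiveAttribute, the OrderPlaced shape, and PayloadGapDemo are hypothetical stand-ins invented for illustration; they are not the Microsoft or Hermodr types. The marker is visible through reflection on the CLR type, but the serialised payload carries the value in clear with no classification metadata attached.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical marker attribute, standing in for a classification annotation.
[AttributeUsage(AttributeTargets.Property)]
public sealed class SensitiveAttribute : Attribute { }

// Hypothetical event data class with one annotated member.
public sealed record OrderPlaced(string OrderId, [property: Sensitive] string CardNumber);

public static class PayloadGapDemo
{
    // On the CLR type, the annotation can be discovered by property name.
    public static string[] SensitiveMembers(Type type) =>
        type.GetProperties()
            .Where(p => p.IsDefined(typeof(SensitiveAttribute), inherit: false))
            .Select(p => p.Name)
            .ToArray();

    // At runtime the event only carries these bytes; the attribute is gone.
    public static byte[] ToPayload(OrderPlaced order) =>
        JsonSerializer.SerializeToUtf8Bytes(order);
}
```

A property-name-based redactor can act on the result of SensitiveMembers, but it has nothing to work with once it is handed only the output of ToPayload.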

What we will build: A new Hermodr.Compliance package that bridges the Microsoft.Extensions.Compliance.* stack to the Hermodr event world, composed of four building blocks:

  • Annotation integration. Hermodr.Annotations takes a dependency on Microsoft.Extensions.Compliance.Abstractions and the framework's DataClassificationAttribute is used directly (no wrapper). EventSchemaFactory.CreateEventProperty reads the attribute from each member and stamps the resulting DataClassification on the produced EventProperty, so reflection-based schema generation already knows which fields are sensitive.
  • Schema-aware IEventRedactor. The default SchemaDrivenCloudEventRedactor deserialises the CloudEvent.Data payload (JSON-compatible content types in v1; other formats delegate to a pluggable redactor) and walks the IEventSchema property tree in parallel, asking IRedactorProvider.GetRedactor(DataClassificationSet) for the configured Redactor and replacing the matching values. The original CloudEvent is never mutated; the redactor always returns a new event.
  • IEventSchemaRegistry in the Schema domain. A small registry abstraction that maps DataSchema URIs and (eventType, version) tuples to IEventSchema instances. The user registers schemas explicitly (services.AddEventSchemaRegistry(r => r.Add<OrderPlaced>())) — Compliance and Audit Trail never register it. Verified during design: Microsoft.Extensions.Compliance does not provide a schema registry, so this is a Hermodr concern, kept inside Hermodr.Schema.
  • IRedactionPolicy and datasensitivity signal. A cheap, allocation-light policy decides whether a given event should be redacted, driven by either the per-channel Compliance.Redaction mode (Disabled, WhenSensitive, Always) or the datasensitivity CloudEvent extension attribute that producers can stamp on outgoing events (or that EventFactory stamps automatically when a [DataClassification] is present on the source CLR type). Missing-schema behaviour (Allow / Block / Fallback) is configurable per channel.
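
The schema-driven walk described in the second building block could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: SchemaProperty is a hypothetical, simplified stand-in for an IEventSchema property node, the classification is reduced to a string, and the redactorFor delegate stands in for the IRedactorProvider lookup. Only System.Text.Json is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical, simplified schema node: a name, an optional classification,
// and optional child properties for nested objects.
public sealed record SchemaProperty(
    string Name,
    string? Classification = null,
    IReadOnlyList<SchemaProperty>? Children = null);

public static class SchemaDrivenRedactionSketch
{
    // Parses the payload into a fresh tree, redacts classified values in that
    // tree, and returns new bytes. The input buffer is never written to.
    public static byte[] Redact(
        ReadOnlySpan<byte> jsonPayload,
        IReadOnlyList<SchemaProperty> schema,
        Func<string, Func<string, string>> redactorFor)
    {
        JsonNode? root = JsonNode.Parse(jsonPayload);
        if (root is JsonObject obj) Walk(obj, schema, redactorFor);
        return Encoding.UTF8.GetBytes(root?.ToJsonString() ?? "null");
    }

    // Walks schema and payload in parallel; properties absent from the
    // schema are left untouched.
    private static void Walk(
        JsonObject obj,
        IReadOnlyList<SchemaProperty> props,
        Func<string, Func<string, string>> redactorFor)
    {
        foreach (SchemaProperty prop in props)
        {
            if (!obj.TryGetPropertyValue(prop.Name, out JsonNode? value) || value is null)
                continue;

            if (prop.Children is { Count: > 0 } && value is JsonObject child)
            {
                Walk(child, prop.Children, redactorFor);
            }
            else if (prop.Classification is not null)
            {
                // Strings are redacted as-is; other values via their JSON text.
                string clear = value is JsonValue v && v.TryGetValue<string>(out var s)
                    ? s
                    : value.ToJsonString();
                obj[prop.Name] = redactorFor(prop.Classification)(clear);
            }
        }
    }
}
```

Arrays and non-JSON content types are deliberately left out of this sketch; the issue text assigns other formats to pluggable redactors.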

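The policy decision in the fourth building block can be sketched as a pure function. The mode names (Disabled, WhenSensitive, Always) and the missing-schema names (Allow, Block, Fallback) come from the description above; everything else is an assumption: the RedactionDecision enum, the Decide signature, reducing the datasensitivity attribute to a boolean, and the reading of Allow as "pass the event through unredacted".

```csharp
public enum RedactionMode { Disabled, WhenSensitive, Always }

public enum MissingSchemaBehaviour { Allow, Block, Fallback }

// Hypothetical outcome type, invented for this sketch.
public enum RedactionDecision { PassThrough, Redact, Block, UseFallbackRedactor }

public static class RedactionPolicySketch
{
    // No allocations: enum and bool comparisons only.
    public static RedactionDecision Decide(
        RedactionMode mode,
        bool hasSensitivityAttribute,   // assumed: datasensitivity extension present
        bool schemaFound,
        MissingSchemaBehaviour onMissingSchema)
    {
        bool redactionWanted =
            mode == RedactionMode.Always ||
            (mode == RedactionMode.WhenSensitive && hasSensitivityAttribute);

        if (!redactionWanted) return RedactionDecision.PassThrough;
        if (schemaFound) return RedactionDecision.Redact;

        // Redaction is wanted but no schema is registered for the event.
        return onMissingSchema switch
        {
            MissingSchemaBehaviour.Allow => RedactionDecision.PassThrough,
            MissingSchemaBehaviour.Block => RedactionDecision.Block,
            _ => RedactionDecision.UseFallbackRedactor,
        };
    }
}
```

Keeping the decision separate from the redactor means the cheap check runs on every event, while payload parsing happens only when the result is Redact or UseFallbackRedactor.
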
Benefits:

  • Reuses the canonical Microsoft.Extensions.Compliance taxonomy and redaction engine — no parallel classification system, no lock-in to a Hermodr-specific redaction API.
  • The schema is the single source of truth for which fields are sensitive: one annotation on the CLR class drives schema generation, runtime redaction, and the datasensitivity extension attribute.
  • The framework boundary is well defined: a new Hermodr.Compliance package depending on two Microsoft packages, plus a new IEventSchemaRegistry in Hermodr.Schema. Nothing in the publish core needs to change.
  • Pluggable IEventRedactor makes the framework format-agnostic — JSON in v1, XML/Avro/Protobuf via custom redactors.
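
One way the pluggable, per-format redactors might be selected is by the event's content type. The sketch below is hypothetical throughout: PayloadRedactor and RedactorDispatchSketch are invented names, and treating a "+json" structured-syntax suffix as JSON-compatible is an assumption about what "JSON-compatible content types" means in v1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical format-specific redactor: payload in, new payload out.
public delegate byte[] PayloadRedactor(byte[] payload);

public sealed class RedactorDispatchSketch
{
    private readonly Dictionary<string, PayloadRedactor> _byMediaType =
        new(StringComparer.OrdinalIgnoreCase);

    public RedactorDispatchSketch Register(string mediaType, PayloadRedactor redactor)
    {
        _byMediaType[mediaType] = redactor;
        return this;
    }

    // Returns null when no redactor is registered for the content type,
    // so the caller can apply its own policy for unsupported formats.
    public PayloadRedactor? Resolve(string? contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return null;

        // Drop parameters such as "; charset=utf-8".
        string mediaType = contentType.Split(';', 2)[0].Trim();
        if (_byMediaType.TryGetValue(mediaType, out var exact)) return exact;

        // Structured-syntax suffix: "application/x+json" falls back to "application/json".
        int slash = mediaType.IndexOf('/');
        int plus = mediaType.LastIndexOf('+');
        if (slash > 0 && plus > slash)
        {
            string generic = mediaType[..(slash + 1)] + mediaType[(plus + 1)..];
            if (_byMediaType.TryGetValue(generic, out var bySuffix)) return bySuffix;
        }
        return null;
    }
}
```

With this shape, adding XML, Avro, or Protobuf support amounts to one more Register call with a redactor for that media type.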

See ROADMAP.md item 23 for the full design.

Labels: enhancement (New feature or request)