
[Design Proposal] Adversarial Input Normalisation Layer (Layer 1) — Canonicalisation & De-obfuscation Engine #8

@Saisharathchandranandnetha


[Design Proposal] Lightweight Input Normalisation Gate (Layer 1 Preprocessing)

Overview

This proposal introduces a lightweight input normalisation step within Layer 1 of the Cognitive Firewall pipeline.

The goal is to standardise incoming text into a consistent, analysable form before it reaches the heuristic (regex) and semantic layers, while maintaining strict low-latency constraints.


Problem

Adversarial inputs often use simple obfuscation techniques such as:

  • Unicode variations (e.g., homoglyphs, decomposed characters)
  • Zero-width or invisible characters
  • Encoded representations (e.g., URL encoding)

These can reduce the effectiveness of downstream detection layers when inputs are processed in raw form.


Proposed Approach

Introduce a minimal, stateless normalisation step that performs safe and low-cost transformations.

Pipeline (Fast-Path)

Raw Input
   │
   ▼
[1] Unicode Normalisation (NFKC)
   ▼
[2] Strip Zero-Width & Invisible Characters
   ▼
[3] Limited Percent-Decoding (URL encoding only, where valid)
   ▼
Normalised Text → Layer 1 Heuristic Scanner
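The fast path above can be sketched with only the standard library. This is a minimal sketch, not a settled implementation: the invisible-character set and the single-byte-only percent-decoding rule are assumptions that correspond to the open questions below.

```python
import re
import unicodedata

# Zero-width/invisible characters commonly abused for obfuscation
# (illustrative subset; the final list is an open design choice).
_INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Only well-formed %XX sequences are decoded; anything else is left as-is.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def normalise(text: str) -> str:
    # [1] NFKC folds compatibility forms (e.g. ligatures) and
    #     composes decomposed characters.
    text = unicodedata.normalize("NFKC", text)
    # [2] Drop zero-width and invisible characters.
    text = "".join(ch for ch in text if ch not in _INVISIBLE)
    # [3] Limited percent-decoding: single-byte sequences only; multi-byte
    #     UTF-8 reassembly is deliberately out of scope for the fast path.
    return _PCT.sub(lambda m: chr(int(m.group(0)[1:], 16)), text)
```

All three transforms are deterministic and stateless, matching the design principles below; invalid sequences such as `%ZZ` pass through unchanged.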

Design Principles

  • Lightweight: No heavy parsing or ML inference
  • Deterministic: Only safe, well-defined transformations
  • Low latency: Designed to add minimal overhead to the pipeline
  • Stateless: No dependency on external systems or memory

Non-Goals (for Phase 1)

To keep the fast-path efficient, the following are intentionally excluded:

  • Full homoglyph mapping across scripts
  • Base64 or complex encoding detection
  • Aggressive text reconstruction
  • Heuristic scoring within this layer

These can be handled in downstream layers if needed.


Output

from dataclasses import dataclass

@dataclass
class NormalisationResult:
    normalised_text: str
    original_text: str

This keeps the interface minimal and avoids adding latency.
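The interface could be exercised roughly like this. `normalise_payload` is a hypothetical wrapper name, and only the NFKC step is applied here for brevity; the point is that retaining `original_text` lets downstream layers audit or diff the transformation.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class NormalisationResult:
    normalised_text: str
    original_text: str


def normalise_payload(raw: str) -> NormalisationResult:
    # Hypothetical wrapper: keep the raw text alongside the normalised
    # form so later layers can see exactly what was changed.
    normalised = unicodedata.normalize("NFKC", raw)  # step [1] only, for brevity
    return NormalisationResult(normalised_text=normalised, original_text=raw)
```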


Integration

  • Runs immediately after the payload is received via UDS

  • Feeds into:

    • Layer 1 heuristic scanner
    • Layer 2 semantic analysis

Alignment with Project Philosophy

This supports the principle of:

“All inputs must be validated before entering reasoning context”

by ensuring inputs are first made consistent and readable before validation.


Open Questions

  1. Should percent-decoding be limited strictly to valid URL-encoded sequences?
  2. Are there specific Unicode normalisation forms preferred beyond NFKC?
  3. Should this layer emit any lightweight signals, or remain purely transformational?

Next Step

Happy to open a draft PR with:

  • minimal normalisation function
  • test cases for edge inputs (Unicode + zero-width + encoding)
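To make the draft PR concrete, the edge-input tests could start from a table like this. The expected outputs assume the three fast-path transforms described above; the cases are illustrative, not exhaustive.

```python
import unicodedata

# (raw input, expected normalised output)
EDGE_CASES = [
    ("ℌello", "Hello"),          # NFKC compatibility mapping (U+210C → H)
    ("ig\u200bnore", "ignore"),  # zero-width space stripped
    ("%41lert", "Alert"),        # valid percent-encoding decoded
    ("100%ZZ", "100%ZZ"),        # invalid sequence left untouched
]
```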

cc: @tharindupr @charithccmc
