Summary
The ODCS currently describes the logical and physical types of schema properties (`logicalType`, `physicalType`) but does not provide a standardized way to declare the character encoding expected for the data (e.g. UTF-8, ISO-8859-1, ASCII, UTF-16).
Motivation
Encoding mismatches are a common and painful source of data pipeline failures — especially with flat files (CSV, TXT) or legacy systems that produce non-UTF-8 data. Today, teams work around this by documenting encoding in free-text descriptions or custom properties, which is inconsistent and not machine-readable.
Adding a top-level `encoding` field would allow:
- Data producers to declare the expected encoding explicitly
- Data consumers to validate and configure their readers accordingly
- Tooling to automate encoding checks as part of data quality
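As a sketch of what the consumer side could look like, assuming the proposed `encoding` key and a hypothetical in-memory data source (not part of the spec today):

```python
import io
import json

# Hypothetical contract document; the "encoding" field follows this proposal.
contract = json.loads("""
{
  "apiVersion": "v3.1.0",
  "kind": "DataContract",
  "name": "my_dataset",
  "encoding": "ISO-8859-1"
}
""")

# The field is optional, so a consumer would pick a sensible fallback.
encoding = contract.get("encoding", "UTF-8")

# Configure the reader from the contract instead of guessing: these bytes
# decode correctly as ISO-8859-1 but would fail or mangle under UTF-8.
raw = "café;münchen\n".encode("ISO-8859-1")
with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as reader:
    line = reader.readline()
```

The same lookup would let a quality check compare the declared encoding against what the file actually decodes as, instead of relying on a free-text description.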
Proposed Change
Add an optional `encoding` field at the contract level (alongside `name`, `domain`, `status`, etc.) with a free-form string value to remain flexible and future-proof:

```json
{
  "apiVersion": "v3.1.0",
  "kind": "DataContract",
  "name": "my_dataset",
  "encoding": "UTF-8",
  ...
}
```
Schema addition (JSON Schema)
```json
"encoding": {
  "type": "string",
  "description": "The expected character encoding of the data (e.g. UTF-8, ISO-8859-1, ASCII, UTF-16). Free-form string to remain flexible across use cases."
}
```
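Even with a free-form string, tooling can do a basic sanity check on the declared value. A minimal sketch (not part of the proposal) using Python's codec registry, which already recognizes the common names and their aliases:

```python
import codecs

def is_known_encoding(name: str) -> bool:
    """Return True if the free-form encoding string maps to a codec
    the runtime knows, e.g. 'UTF-8', 'utf8', or 'ISO-8859-1'."""
    try:
        codecs.lookup(name)  # resolves aliases, case-insensitive
        return True
    except LookupError:
        return False

# Typical declared values resolve; typos and unknown names do not.
checks = {
    "UTF-8": is_known_encoding("UTF-8"),
    "ISO-8859-1": is_known_encoding("ISO-8859-1"),
    "not-an-encoding": is_known_encoding("not-an-encoding"),
}
```

A validator built into a data-contract CLI could warn (rather than fail) on unrecognized values, preserving the flexibility of the free-form string.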
Alternatives Considered
- At the column level (`SchemaBaseProperty`): More granular, but arguably over-engineered for most use cases, where encoding is consistent across the dataset. Could be a follow-up if the community needs it.
- Via `customProperties`: Already possible today, but not standardized — no tooling can rely on it.
Open Questions
- Should common values be documented as examples (UTF-8, ISO-8859-1...) even if the field stays free-form?
- Should this be scoped per `server` instead of (or in addition to) the contract level?
Looking forward to the community's thoughts! 🙌