# Need for a deterministic mechanism to declare the Abstract State Infoset Type of a resource #110

## Summary

LWS currently provides no in-band, machine-actionable mechanism for a client to declare, or for a server to record and enforce, what *kind* of information set a resource's abstract state constitutes. This gap is tolerable in pre-configured, domain-specific servers, where the deployment fixes the information model. It becomes disabling in general LWS storage servers, where the information model is a first-class concern that varies per resource and cannot be assumed in advance.

This issue proposes that LWS define a mechanism to associate an explicit **Abstract State Infoset Type (AIST)** with a resource — both at rest (governing persistence, metadata, constraints, and operation semantics) and in transit (governing representation interpretation during HTTP interactions).

---

## Background: the REST assumption that breaks for general storage

The REST architectural style defines a resource as any identifiable thing, and a representation as a sequence of bytes plus a media type, encoding the resource's current state for transfer. REST components are expected to infer the resource's abstract state from the media type's semantics alone.

This assumption holds when a media type uniquely determines a single information model — `image/png` unambiguously identifies a raster bitmap; `text/html` a document tree. It breaks down for media types capable of encoding structurally *incompatible* information models under the same type identifier.

**`application/ld+json` is the canonical example.** A conforming processor may legitimately treat the same document as:

- A plain JSON object tree
- An RDF dataset, by expanding the document to a set of triples
- An opaque file (a byte sequence) whose media type is `application/ld+json`

These are not alternative serializations of one model. A JSON tree is positional and closed-world; an RDF dataset is graph-oriented and open-world. They require different persistence backends, support different constraint languages, admit different patch mechanisms, and expose different query interfaces.
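
The difference between the three readings can be made concrete with a minimal Python sketch. It compares two byte-wise different serializations of the same document under each reading. The `as_toy_triples` function is an assumption made purely for illustration: it treats a string `@context` as a plain vocabulary prefix, whereas real JSON-LD expansion dereferences the context document and would need a proper JSON-LD processor.

```python
import hashlib
import json

# Two byte-wise different serializations of the same JSON object.
doc_a = b'{"@context": "https://schema.org/", "@type": "Person", "name": "Arun Kumar"}'
doc_b = b'{\n  "name": "Arun Kumar",\n  "@type": "Person",\n  "@context": "https://schema.org/"\n}'


def as_bytes(raw: bytes) -> str:
    """Byte-literal reading: identity is the exact octet sequence."""
    return hashlib.sha256(raw).hexdigest()


def as_json_tree(raw: bytes) -> dict:
    """JSON reading: identity is the parsed tree; "@context" is an ordinary key."""
    return json.loads(raw)


def as_toy_triples(raw: bytes, subject: str = "_:b0") -> set:
    """Toy RDF reading. ASSUMPTION: a string "@context" is used as a plain
    vocabulary prefix. This is NOT real JSON-LD expansion."""
    tree = json.loads(raw)
    vocab = tree["@context"]
    triples = set()
    for key, value in tree.items():
        if key == "@context":
            continue  # consumed while expanding terms, not kept as data
        if key == "@type":
            triples.add((subject, "rdf:type", vocab + value))
        else:
            triples.add((subject, vocab + key, value))
    return triples


same_bytes = as_bytes(doc_a) == as_bytes(doc_b)
same_tree = as_json_tree(doc_a) == as_json_tree(doc_b)
same_triples = as_toy_triples(doc_a) == as_toy_triples(doc_b)
print(same_bytes, same_tree, same_triples)  # prints: False True True
```

The two documents are different resources under the byte-literal reading and the same resource under the other two, and `"@context"` survives as data only in the JSON tree. Which notion of identity applies is exactly what the media type does not say.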

**REST provides no mechanism to declare which model is intended.** The `profile` media type parameter is insufficient: it constrains the document *within* one interpretation (declaring a shape or vocabulary), but does not select *between* fundamentally incompatible information models expressed by the same media type.

---

## The concrete indeterminism: what a server must silently decide

Consider an LWS client sending:

```http
PUT /storage/doc1 HTTP/1.1
Host: storage.example
Content-Type: application/ld+json

{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Arun Kumar",
  "email": "arun@example.org"
}
```

The server must make at least the following decisions, none of which are determinable from the request:

| Decision | Option A | Option B | Option C |
|---|---|---|---|
| **Persistence** | Object store (raw bytes + media type) | Document store (JSON tree) | Triple store (RDF graph) |
| **What is stored** | Exact bytes as-is | JSON key-value tree | Expanded RDF triples |
| **Metadata** | Byte size, checksum | JSON Schema, tree depth | Triple count, named graph IRIs |
| **Valid constraints** | Max-size, allowed media types | JSON Schema, required keys | SHACL shapes, OWL consistency |
| **PATCH semantics** | Byte-range / full replace | JSON Patch, JSON Merge Patch | SPARQL Update, RDF Patch |
| **Content negotiation** | Not applicable — always returns stored bytes | May serve YAML, TOML, CBOR | May serve Turtle, N-Quads, RDF/XML |
| **Query interface** | None | JSON path / document queries | SPARQL |

Each column is architecturally coherent and RFC-conformant. The server is forced to pick one implicitly, the choice is never communicated to the client, and the client has no way to express a preference.

### Downstream failures

These silent choices produce real, hard-to-diagnose failures:

- A client stores the document on a server that chose Option A (byte-literal) and later issues a SPARQL query. The query returns nothing, not because the data is absent, but because no RDF graph was ever stored. No error is raised.
- A client issues a JSON Patch against a server that chose Option C (RDF graph). The server must either reject it without an actionable explanation, or — worse — misapply it to the stored JSON-LD bytes, silently corrupting the dataset's serialized form.
- A client migrates the resource from an Option B server (JSON tree) to an Option C server (RDF graph). On the former, `"@context"` was an inert JSON key. On the latter it becomes a live semantic expansion link, fundamentally altering the meaning of every other statement in the document. No error is raised. The abstract state has changed silently during migration.

> **These are not implementation bugs. They are the predictable consequence of an absent architectural mechanism. No amount of implementation discipline or documentation can compensate.**

---

## Why this is particularly acute for general LWS storage servers

A pre-configured, domain-specific server can paper over this gap. If a server is deployed exclusively to store FOAF profiles as RDF datasets, the operator hard-codes the model. The client and server share out-of-band knowledge, and the gap is never exposed.

A **general LWS storage server** has no such luxury. It must host arbitrary resources whose information models are declared at resource-creation time by clients the server has never seen. Without a mechanism to record and enforce the intended model:

- The server cannot know whether `application/ld+json` bytes should be stored as a JSON document, an RDF graph, or a byte sequence — and cannot make a choice that is correct for all clients.
- Two applications sharing the same storage may silently hold different assumptions about the same resource, producing inconsistent behaviour across reads, writes, and migrations.
- The server cannot expose a principled, resource-specific operation surface (which patch mechanisms are valid? which query interface applies?) because it does not know the resource's information model.

LWS, like REST, assumes the media type is sufficient to determine the information model. General storage is precisely the environment where that assumption structurally cannot hold.

---

## Prior art: the same problem surfaced in the Solid ecosystem

This tension is not hypothetical, and has already surfaced in prior specifications. In [solid/specification#342](https://github.com/solid/specification/issues/342), a user argued that the *complete content* of Turtle and RDFa documents, including whitespace, comments, and formatting, must be wholly preserved by a server. The response was that if byte-literal preservation is required, the client should use `text/plain` or `application/octet-stream` instead.

That response is correct within REST's model — but it exposes exactly the gap described here. The client's real intention is neither "treat this as opaque bytes" nor "treat this as an abstract RDF graph." The client wants the resource treated as a **Turtle file**: simultaneously a byte sequence (whose exact content must be preserved) and an RDF graph (whose semantic content is accessible and queryable). These are two facets of one unified state, not two separate resources.

Under the current LWS model, the client has no architectural mechanism to express this. The only options are:

1. Use `text/turtle` and accept that the server may parse and re-serialize, losing exact byte content
2. Use `application/octet-stream` and lose all RDF semantics

Neither is what the client actually needs. This is a concrete, already-documented instance of the general problem this issue raises.

---

## Proposed mechanism: Abstract State Infoset Type (AIST)

We propose that LWS define a mechanism to associate an explicit, URI-identified **Abstract State Infoset Type (AIST)** with a resource, both at rest and in transit. The analogy is direct: just as `Content-Type` types a *representation*, an AIST types the *abstract state* of a resource.

AISTs are classified into three kinds:
- **Leaf** — a single atomic information model (e.g. byte sequence, JSON tree, RDF dataset)
- **Multi-faceted** — a unified state simultaneously expressible under multiple models, which are *views* of the same information rather than independent sub-infosets (e.g. a Turtle file as bytes *and* as an RDF graph — the `solid/specification#342` case)
- **Multi-part** — a compound state composed of distinct sub-infosets with defined compositional rules
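
One way to picture the three kinds is as a small recursive data structure. The following Python sketch is only an illustration under assumptions: the class names, the `leaf_models` helper, and the `https://example.org/aist/` namespace are hypothetical and not part of any LWS draft.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    uri: str  # a single atomic information model


@dataclass(frozen=True)
class MultiFaceted:
    uri: str
    facets: Tuple["Aist", ...]  # views of the SAME information


@dataclass(frozen=True)
class MultiPart:
    uri: str
    parts: Tuple["Aist", ...]  # distinct sub-infosets composed together


Aist = Union[Leaf, MultiFaceted, MultiPart]


def leaf_models(aist: Aist) -> list:
    """Collect the URIs of all leaf models an AIST ultimately involves."""
    if isinstance(aist, Leaf):
        return [aist.uri]
    children = aist.facets if isinstance(aist, MultiFaceted) else aist.parts
    found = []
    for child in children:
        found.extend(leaf_models(child))
    return found


BASE = "https://example.org/aist/"  # hypothetical namespace
FILE = Leaf(BASE + "File")
RDF_DATASET = Leaf(BASE + "RdfDataset")
TURTLE_FILE = MultiFaceted(BASE + "TurtleFile", (FILE, RDF_DATASET))

print(leaf_models(TURTLE_FILE))
```

Keeping facets and parts as separate types preserves the distinction drawn above: facets describe one state seen several ways, while parts describe several states held together.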

### The `Resource-AIST` HTTP header

A new header communicates the AIST during HTTP interactions. Returning to the opening example:

```http
PUT /storage/doc1 HTTP/1.1
Host: storage.example
Content-Type: application/ld+json
Resource-AIST: <https://example.org/aist/RdfDataset>; known=true

{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Arun Kumar",
  "email": "arun@example.org"
}
```

The server now knows unambiguously: persist in a triple store; accept SPARQL Update and RDF Patch (reject JSON Patch); negotiate content using RDF serialization formats; apply SHACL and OWL-level constraints. All decisions from the indeterminism table above are resolved by a single declared type.
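
How a declared type could resolve those decisions on the server side can be sketched as a simple lookup. This is a sketch under assumptions: the `Behaviour` record, the `REGISTRY` contents, and the `patch_allowed` helper are hypothetical, and the entries only mirror a few cells of the indeterminism table. RDF Patch is left out because this sketch does not assume a media type for it.

```python
from dataclasses import dataclass
from typing import Tuple

BASE = "https://example.org/aist/"  # hypothetical namespace


@dataclass(frozen=True)
class Behaviour:
    persistence: str
    patch_types: Tuple[str, ...]       # patch media types the server accepts
    negotiable_types: Tuple[str, ...]  # media types offered in content negotiation


# Illustrative entries only; a real registry would come from AIST descriptions.
REGISTRY = {
    BASE + "File": Behaviour("object store", (), ()),
    BASE + "JsonInfoSet": Behaviour(
        "document store",
        ("application/json-patch+json", "application/merge-patch+json"),
        ("application/json", "application/cbor"),
    ),
    BASE + "RdfDataset": Behaviour(
        "triple store",
        ("application/sparql-update",),
        ("application/ld+json", "text/turtle", "application/n-quads"),
    ),
}


def patch_allowed(aist_uri: str, patch_type: str) -> bool:
    """True if the declared AIST lists this patch media type."""
    return patch_type in REGISTRY[aist_uri].patch_types


rdf = BASE + "RdfDataset"
print(REGISTRY[rdf].persistence)                            # triple store
print(patch_allowed(rdf, "application/sparql-update"))      # True
print(patch_allowed(rdf, "application/json-patch+json"))    # False
```

With such a lookup, a JSON Patch sent to an `RdfDataset` resource can be refused with an explanation that names the resource's declared type, rather than being misapplied.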

For the `solid/specification#342` case, the client declares a multi-faceted AIST combining `File` (byte-literal preservation) with `RdfDataset` (semantic access):

```http
PUT /storage/profile.ttl HTTP/1.1
Host: storage.example
Content-Type: text/turtle
Resource-AIST: <https://example.org/aist/TurtleFile>; known=true; immutable=true
```

Here `TurtleFile` is a multi-faceted AIST declaring that the byte sequence is canonical and must be preserved exactly; that the RDF graph is derivable from it by parsing and is independently queryable; and that PATCH must maintain consistency between both facets. The client no longer has to choose between losing bytes and losing semantics.
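
One possible way to keep the two facets consistent is to treat the bytes as the only stored state and derive every other facet from them on demand. The Python sketch below assumes that design; `MultiFacetedState` and `toy_triples` are hypothetical names, and `toy_triples` is a deliberately naive stand-in for a Turtle parser, not a real one.

```python
from typing import Callable, Dict


class MultiFacetedState:
    """Canonical bytes plus views derived from them on demand."""

    def __init__(self, raw: bytes, derive: Dict[str, Callable[[bytes], object]]):
        self._raw = raw
        self._derive = derive
        self._cache: Dict[str, object] = {}

    def bytes_facet(self) -> bytes:
        return self._raw  # returned exactly as stored

    def facet(self, name: str) -> object:
        if name not in self._cache:
            self._cache[name] = self._derive[name](self._raw)
        return self._cache[name]

    def replace(self, raw: bytes) -> None:
        """Every write goes through the bytes; derived views are invalidated."""
        self._raw = raw
        self._cache.clear()


def toy_triples(raw: bytes) -> frozenset:
    """ASSUMPTION: one 'subject predicate object .' per line, no spaces
    inside terms, '#' lines are comments. Not a Turtle parser."""
    triples = set()
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(".").split()
        triples.add((s, p, o))
    return frozenset(triples)


raw = b'# hand-written comment\n<#me>   <http://xmlns.com/foaf/0.1/name>   "Arun" .\n'
state = MultiFacetedState(raw, {"rdf": toy_triples})
print(state.bytes_facet() == raw)  # True: comment and spacing survive
print(len(state.facet("rdf")))     # 1
```

A graph-level PATCH would need the opposite direction as well (editing the bytes so that the graph changes as requested), which this sketch does not attempt.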

### Key header parameters

- `known` (boolean, default `false`): whether the server recognizes the AIST and has verified the media type association. `known=false` allows unknown AISTs to be persisted byte-literally with explicit uncertainty signalled in responses.
- `immutable` (boolean, default `false`): whether the AIST is fixed for the resource's lifecycle. When `true`, the server rejects updates whose media type is incompatible with the AIST's declared supported types.
- `rep-asm-facets` / `rep-asm-parts`: space-separated lists of facet or part AIST URIs encoded in the accompanying representation. Enables partial state transfer during content negotiation for multi-faceted and multi-part AISTs respectively.
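
To show how these parameters might be read, here is a minimal Python parsing sketch. The concrete syntax is an assumption extrapolated from the two examples above (a bracketed URI followed by `;`-separated `name=value` pairs); in particular, quoting the space-separated lists with double quotes is a guess, and `ResourceAist` and `parse_resource_aist` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceAist:
    uri: str
    known: bool = False      # default stated in the parameter list
    immutable: bool = False  # default stated in the parameter list
    lists: Dict[str, List[str]] = field(default_factory=dict)


def parse_resource_aist(value: str) -> ResourceAist:
    """Parse '<URI>; name=value; ...'. Naive: breaks on URIs containing ';'."""
    head, *params = [part.strip() for part in value.split(";")]
    if not (head.startswith("<") and head.endswith(">")):
        raise ValueError("expected <URI> as the first element")
    parsed = ResourceAist(uri=head[1:-1])
    for param in params:
        if not param:
            continue
        name, _, raw = param.partition("=")
        name, raw = name.strip().lower(), raw.strip().strip('"')
        if name in ("known", "immutable"):
            if raw not in ("true", "false"):
                raise ValueError(f"{name} must be true or false")
            setattr(parsed, name, raw == "true")
        elif name in ("rep-asm-facets", "rep-asm-parts"):
            parsed.lists[name] = raw.split()
        # Unknown parameters are ignored in this sketch.
    return parsed


header = "<https://example.org/aist/TurtleFile>; known=true; immutable=true"
print(parse_resource_aist(header))
```

A real definition would more likely reuse an existing HTTP field grammar than hand-rolled splitting; the sketch only shows which pieces of information the header carries.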

---

## What LWS would need to specify

1. **The `Resource-AIST` header** — syntax, parameters, and request/response semantics.
2. **Behaviour on resource creation** — when `Resource-AIST` is absent and the server cannot unambiguously determine an AIST from `Content-Type` alone, the server should signal this rather than silently choosing.
3. **Content negotiation governed by AIST** — `Accept`-driven negotiation constrained to the AIST's declared supported media types.
4. **PATCH semantics governed by AIST** — servers must reject patch formats not listed as supported by the resource's AIST.
5. **AIST description document format** — a machine-readable document resolvable from the AIST URI, declaring supported media types, metadata vocabulary, constraint types, and persistence recommendations.
6. **A small set of predefined AISTs** — at minimum: `Null` (non-information resource / 303 redirect), `File` (byte-literal), `RdfDataset` (RDF graph), `JsonInfoSet` (JSON tree), and composites such as `TurtleFile` (File + RdfDataset, multi-faceted) for the `solid/specification#342` class of use cases.
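
Item 2 can be illustrated with a short Python sketch of creation-time resolution. Everything here is an assumption for illustration: the `CANDIDATES` table, the fallback to `File` for unlisted media types, and the exception names are hypothetical. How the server signals the ambiguity on the wire (status code, response fields) is deliberately left open.

```python
from typing import Optional

BASE = "https://example.org/aist/"  # hypothetical namespace

# Candidate AISTs per media type. Illustrative, not normative.
CANDIDATES = {
    "application/octet-stream": [BASE + "File"],
    "application/ld+json": [BASE + "File", BASE + "JsonInfoSet", BASE + "RdfDataset"],
    "text/turtle": [BASE + "File", BASE + "RdfDataset", BASE + "TurtleFile"],
}


class AmbiguousInfosetType(Exception):
    """Raised instead of silently picking a model."""

    def __init__(self, content_type: str, candidates: list):
        super().__init__(f"{content_type} admits several AISTs; declare Resource-AIST")
        self.candidates = candidates


class IncompatibleInfosetType(Exception):
    """Raised when the declared AIST does not fit the media type."""


def resolve_on_create(content_type: str, declared: Optional[str] = None) -> str:
    # ASSUMPTION: media types not listed above fall back to byte-literal File.
    candidates = CANDIDATES.get(content_type, [BASE + "File"])
    if declared is not None:
        if declared not in candidates:
            raise IncompatibleInfosetType(f"{declared} is not valid for {content_type}")
        return declared
    if len(candidates) == 1:
        return candidates[0]
    raise AmbiguousInfosetType(content_type, candidates)


print(resolve_on_create("application/ld+json", BASE + "RdfDataset"))
try:
    resolve_on_create("application/ld+json")
except AmbiguousInfosetType as err:
    print("server must ask:", err.candidates)
```

The point of the sketch is the last branch: when more than one model fits and none is declared, the server reports the candidates instead of choosing one.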

## References

- [solid/specification#342](https://github.com/solid/specification/issues/342): Content of Turtle and RDFa documents should be wholly and entirely preserved
- Fielding, R. T. (2000). [Architectural Styles and the Design of Network-based Software Architectures](https://ics.uci.edu/~fielding/pubs/dissertation/top.htm), §5
- W3C. [Linked Data Platform 1.0](https://www.w3.org/TR/ldp/)
- W3C. [Cool URIs for the Semantic Web](https://www.w3.org/TR/cooluris/)
- IETF RFC 9110. [HTTP Semantics](https://www.rfc-editor.org/rfc/rfc9110)


cc: @pchampin (I recall seeing you express concern somewhere about associating representation metadata with a resource.)


AI-NOTE: The essential content is mine, but I took the help of Claude for phrasing, as the language is not native to me.
