Summary
LWS currently provides no in-band, machine-actionable mechanism for a client to declare — or for a server to record and enforce — what kind of information set a resource's abstract state constitutes. This gap is tolerable in pre-configured, domain-specific servers where the information model is fixed by deployment. It becomes acutely paralyzing in general LWS storage servers, where the information model of each resource is a first-class concern that varies per resource and cannot be assumed in advance.
This issue proposes that LWS define a mechanism to associate an explicit Abstract State Infoset Type (AIST) with a resource — both at rest (governing persistence, metadata, constraints, and operation semantics) and in transit (governing representation interpretation during HTTP interactions).
Background: the REST assumption that breaks for general storage
The REST architectural style defines a resource as any identifiable thing, and a representation as bytes + a media type encoding the resource's current state for transfer. REST components are expected to glean the resource's abstract state from the media type's semantics alone.
This assumption holds when a media type uniquely determines a single information model — image/png unambiguously identifies a raster bitmap; text/html a document tree. It breaks down for media types capable of encoding structurally incompatible information models under the same type identifier.
application/ld+json is the canonical example. A conforming processor may legitimately treat the same document as:
- A plain JSON object tree
- An RDF dataset, by expanding the document to a set of triples
- A file with media type application/ld+json
These are not alternative serializations of one model. A JSON tree is positional and closed-world; an RDF dataset is graph-oriented and open-world. They require different persistence backends, support different constraint languages, admit different patch mechanisms, and expose different query interfaces.
REST provides no mechanism to declare which model is intended. The profile media type parameter is insufficient: it constrains a document within one interpretation (declaring a shape or vocabulary), but it does not select between fundamentally incompatible information models expressed by the same media type.
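To make the difference between two of these models concrete, here is a minimal sketch using only the Python standard library. It compares two serializations of the example document under the byte-sequence model and under the JSON-tree model. The helper names `same_as_bytes` and `same_as_json_tree` are illustrative and not part of any specification.

```python
import json

# Two serializations of one document: they differ only in whitespace and key order.
doc_a = b'{"@context": "https://schema.org/", "@type": "Person", "name": "Arun Kumar"}'
doc_b = b'{"name":"Arun Kumar","@type":"Person","@context":"https://schema.org/"}'


def same_as_bytes(x: bytes, y: bytes) -> bool:
    """Identity under the byte-sequence model: exact octet equality."""
    return x == y


def same_as_json_tree(x: bytes, y: bytes) -> bool:
    """Identity under the JSON-tree model: equality of the parsed values.

    Parsing discards insignificant whitespace, and Python dict equality
    ignores key order, so byte-level differences vanish here.
    """
    return json.loads(x) == json.loads(y)


print(same_as_bytes(doc_a, doc_b))      # False
print(same_as_json_tree(doc_a, doc_b))  # True
```

A server that silently adopted the JSON-tree model could therefore return `doc_b` for a resource created from `doc_a` and consider the state unchanged, while a client assuming the byte-sequence model would see a different resource.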
The concrete indeterminism: what a server must silently decide
Consider an LWS client sending:
```http
PUT /storage/doc1 HTTP/1.1
Host: storage.example
Content-Type: application/ld+json

{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Arun Kumar",
  "email": "arun@example.org"
}
```
The server must make at least the following decisions, none of which are determinable from the request:
| Decision | Option A | Option B | Option C |
|---|---|---|---|
| Persistence | Object store (raw bytes + media type) | Document store (JSON tree) | Triple store (RDF graph) |
| What is stored | Exact bytes as-is | JSON key-value tree | Expanded RDF triples |
| Metadata | Byte size, checksum | JSON Schema, tree depth | Triple count, named graph IRIs |
| Valid constraints | Max-size, allowed media types | JSON Schema, required keys | SHACL shapes, OWL consistency |
| PATCH semantics | Byte-range / full replace | JSON Patch, JSON Merge Patch | SPARQL Update, RDF Patch |
| Content negotiation | Not applicable: always returns stored bytes | May serve YAML, TOML, CBOR | May serve Turtle, N-Quads, RDF/XML |
| Query interface | None | JSON path / document queries | SPARQL |
Each combination is architecturally coherent and RFC-conformant. The server is forced to pick one implicitly. The choice is never communicated to the client. The client has no way to express a preference.
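The indeterminism can be sketched as a lookup that fails to resolve. The snippet below restates three rows of the decision table as data and shows that the media type alone yields three candidates instead of one. `Interpretation`, `CANDIDATES`, and `resolve` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Interpretation:
    option: str
    persistence: str
    patch_mechanisms: Tuple[str, ...]
    query_interface: Optional[str]


# Hypothetical lookup table; the entries restate rows of the decision table.
CANDIDATES = {
    "application/ld+json": (
        Interpretation("A", "object store", ("byte-range", "full replace"), None),
        Interpretation("B", "document store", ("JSON Patch", "JSON Merge Patch"),
                       "JSON path / document queries"),
        Interpretation("C", "triple store", ("SPARQL Update", "RDF Patch"), "SPARQL"),
    ),
}


def resolve(content_type: str) -> Optional[Interpretation]:
    """Return the one interpretation implied by a media type.

    Returns None when zero or several interpretations fit, which is the
    situation a general storage server faces for application/ld+json.
    """
    options = CANDIDATES.get(content_type, ())
    return options[0] if len(options) == 1 else None


options = CANDIDATES["application/ld+json"]
print(len(options))                    # 3
print(resolve("application/ld+json"))  # None
```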
Downstream failures
These silent choices produce real, hard-to-diagnose failures:
- A client stores the document against Option A (byte-literal) and later issues a SPARQL query. The query returns nothing — not because the data is absent, but because no RDF graph was ever stored. No error is raised.
- A client issues a JSON Patch against a server that chose Option C (RDF graph). The server must either reject it without an actionable explanation, or — worse — misapply it to the stored JSON-LD bytes, silently corrupting the dataset's serialized form.
- A client migrates the resource from an Option B server (JSON tree) to an Option C server (RDF graph). On the former, "@context" was an inert JSON key. On the latter it becomes a live semantic expansion link, fundamentally altering the meaning of every other statement in the document. No error is raised. The abstract state has changed silently during migration.
These are not implementation bugs. They are the predictable consequence of an absent architectural mechanism. No amount of implementation discipline or documentation can compensate.
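The "@context" case can be illustrated with a deliberately simplified sketch. It assumes an inline context that only sets `@vocab`, so that plain terms expand to the vocabulary IRI plus the term. This is a narrow special case of JSON-LD expansion, not a general expansion algorithm, and the function names are hypothetical.

```python
def changed_keys(before: dict, after: dict) -> set:
    """JSON-tree view (Option B): which top-level keys differ?"""
    return {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}


def property_iris(doc: dict) -> set:
    """RDF view (Option C), simplified: expand plain terms against @vocab.

    Assumes doc["@context"] is an inline object containing only "@vocab"
    and that no key is a compact IRI or an absolute IRI.
    """
    vocab = doc["@context"]["@vocab"]
    return {vocab + key for key in doc if not key.startswith("@")}


before = {
    "@context": {"@vocab": "http://schema.org/"},
    "name": "Arun Kumar",
    "email": "arun@example.org",
}
# A single-key edit that a JSON-tree server treats as a local change.
after = dict(before, **{"@context": {"@vocab": "http://xmlns.com/foaf/0.1/"}})

print(changed_keys(before, after))                    # {'@context'}
print(property_iris(before) & property_iris(after))   # set()
```

Under the JSON-tree model one key changed. Under the RDF model no property IRI survives the edit, so every statement in the document now means something else.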
Why this is particularly acute for general LWS storage servers
A pre-configured, domain-specific server can paper over this gap. If a server is deployed exclusively to store FOAF profiles as RDF datasets, the operator hard-codes the model. The client and server share out-of-band knowledge, and the gap is never exposed.
A general LWS storage server has no such luxury. It must host arbitrary resources whose information models are declared at resource-creation time by clients the server has never seen. Without a mechanism to record and enforce the intended model:
- The server cannot know whether application/ld+json bytes should be stored as a JSON document, an RDF graph, or a byte sequence, and it cannot make a choice that is correct for all clients.
- Two applications sharing the same storage may silently hold different assumptions about the same resource, producing inconsistent behaviour across reads, writes, and migrations.
- The server cannot expose a principled, resource-specific operation surface (which patch mechanisms are valid? which query interface applies?) because it does not know the resource's information model.
LWS, like REST, assumes the media type is sufficient to determine the information model. General storage is precisely the environment where that assumption structurally cannot hold.
Prior art: the same problem surfaced in the Solid ecosystem
This tension is not hypothetical, and has already surfaced in prior specifications. In [solid/specification#342](https://github.com/solid/specification/issues/342), a user argued that the complete content of Turtle and RDFa documents — including whitespace, comments, and formatting — must be wholly preserved by a server. The response was that if byte-literal preservation is required, the client should use text/plain or application/octet-stream instead.
That response is correct within REST's model — but it exposes exactly the gap described here. The client's real intention is neither "treat this as opaque bytes" nor "treat this as an abstract RDF graph." The client wants the resource treated as a Turtle file: simultaneously a byte sequence (whose exact content must be preserved) and an RDF graph (whose semantic content is accessible and queryable). These are two facets of one unified state, not two separate resources.
Under the current LWS model, the client has no architectural mechanism to express this. The only options are:
- Use text/turtle and accept that the server may parse and re-serialize, losing exact byte content
- Use application/octet-stream and lose all RDF semantics
Neither is what the client actually needs. This is a concrete, already-documented instance of the general problem this issue raises.
Proposed mechanism: Abstract State Infoset Type (AIST)
We propose that LWS define a mechanism to associate an explicit, URI-identified Abstract State Infoset Type (AIST) with a resource, both at rest and in transit. The analogy is direct: just as Content-Type types a representation, an AIST types the abstract state of a resource.
AISTs are classified into three kinds:
- Leaf — a single atomic information model (e.g. byte sequence, JSON tree, RDF dataset)
- Multi-faceted: a unified state simultaneously expressible under multiple models, which are views of the same information rather than independent sub-infosets (e.g. a Turtle file as bytes and as an RDF graph, the solid/specification#342 case)
- Multi-part — a compound state composed of distinct sub-infosets with defined compositional rules
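One possible way to model the three kinds is sketched below, assuming that facets and parts can themselves be AISTs. The class names and the `File` URI are assumptions made for illustration; only the `RdfDataset` and `TurtleFile` URIs appear in this proposal's examples.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    uri: str  # a single atomic information model


@dataclass(frozen=True)
class MultiFaceted:
    uri: str
    facets: Tuple["Aist", ...]  # views of the same underlying state


@dataclass(frozen=True)
class MultiPart:
    uri: str
    parts: Tuple["Aist", ...]  # distinct sub-infosets composed together


Aist = Union[Leaf, MultiFaceted, MultiPart]


def leaf_models(aist: Aist) -> set:
    """Collect the URIs of all leaf models reachable from an AIST."""
    if isinstance(aist, Leaf):
        return {aist.uri}
    children = aist.facets if isinstance(aist, MultiFaceted) else aist.parts
    found = set()
    for child in children:
        found |= leaf_models(child)
    return found


FILE = Leaf("https://example.org/aist/File")  # assumed URI
RDF_DATASET = Leaf("https://example.org/aist/RdfDataset")
TURTLE_FILE = MultiFaceted("https://example.org/aist/TurtleFile", (FILE, RDF_DATASET))

print(sorted(leaf_models(TURTLE_FILE)))
```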
The Resource-AIST HTTP header
A new header communicates the AIST during HTTP interactions. Returning to the opening example:
```http
PUT /storage/doc1 HTTP/1.1
Host: storage.example
Content-Type: application/ld+json
Resource-AIST: <https://example.org/aist/RdfDataset>; known=true

{
  "@context": "https://schema.org/",
  "@type": "Person",
  "name": "Arun Kumar",
  "email": "arun@example.org"
}
```
The server now knows unambiguously: persist in a triple store; accept SPARQL Update and RDF Patch (reject JSON Patch); negotiate content using RDF serialization formats; apply SHACL and OWL-level constraints. All decisions from the indeterminism table above are resolved by a single declared type.
For the solid/specification#342 case, the client declares a multi-faceted AIST combining File (byte-literal preservation) with RdfDataset (semantic access):
```http
PUT /storage/profile.ttl HTTP/1.1
Host: storage.example
Content-Type: text/turtle
Resource-AIST: <https://example.org/aist/TurtleFile>; known=true; immutable=true
```
Here TurtleFile is a multi-faceted AIST declaring that the byte sequence is canonical and must be preserved exactly; that the RDF graph is derivable from it by parsing and is independently queryable; and that PATCH must maintain consistency between both facets. The client no longer has to choose between losing bytes and losing semantics.
Key header parameters
known (boolean, default false): whether the server recognizes the AIST and has verified the media type association. known=false allows unknown AISTs to be persisted byte-literally with explicit uncertainty signalled in responses.
immutable (boolean, default false): whether the AIST is fixed for the resource's lifecycle. When true, the server rejects updates whose media type is incompatible with the AIST's declared supported types.
rep-asm-facets / rep-asm-parts: space-separated lists of the facet or part AIST URIs that are encoded in the accompanying representation. They enable partial state transfer during content negotiation for multi-faceted and multi-part AISTs respectively.
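A parser for the header might look like the sketch below. The header grammar is not yet defined, so this assumes the Link-like shape used in the examples (`<uri>` followed by `; name=value` parameters) and further assumes that list-valued parameters are carried as quoted strings. `ResourceAist` and `parse_resource_aist` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceAist:
    uri: str
    known: bool = False       # default per the parameter description
    immutable: bool = False   # default per the parameter description
    rep_asm_facets: List[str] = field(default_factory=list)
    rep_asm_parts: List[str] = field(default_factory=list)


_HEADER = re.compile(r'^\s*<([^>]+)>\s*(.*)$')
# One parameter: ';' name '=' (quoted-string | token). Quoting is an assumption.
_PARAM = re.compile(r';\s*([A-Za-z0-9-]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))\s*')


def parse_resource_aist(value: str) -> ResourceAist:
    """Parse a Resource-AIST header value under the assumed grammar."""
    head = _HEADER.match(value)
    if head is None:
        raise ValueError("expected '<uri>' followed by optional parameters")
    result = ResourceAist(uri=head.group(1))
    rest, pos = head.group(2), 0
    while pos < len(rest):
        param = _PARAM.match(rest, pos)
        if param is None:
            raise ValueError(f"malformed parameter near {rest[pos:]!r}")
        name = param.group(1).lower()
        raw = param.group(2) if param.group(2) is not None else param.group(3)
        if name in ("known", "immutable"):
            if raw not in ("true", "false"):
                raise ValueError(f"{name} must be 'true' or 'false'")
            setattr(result, name, raw == "true")
        elif name in ("rep-asm-facets", "rep-asm-parts"):
            setattr(result, name.replace("-", "_"), raw.split())
        # Unrecognized parameters are ignored in this sketch.
        pos = param.end()
    return result


parsed = parse_resource_aist(
    "<https://example.org/aist/TurtleFile>; known=true; immutable=true"
)
print(parsed.uri, parsed.known, parsed.immutable)
```

Ignoring unrecognized parameters is a design choice made here for forward compatibility; a specification could equally require rejection.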
What LWS would need to specify
- The Resource-AIST header: syntax, parameters, and request/response semantics.
- Behaviour on resource creation: when Resource-AIST is absent and the server cannot unambiguously determine an AIST from Content-Type alone, the server should signal this rather than silently choosing.
- Content negotiation governed by AIST: Accept-driven negotiation constrained to the AIST's declared supported media types.
- PATCH semantics governed by AIST: servers must reject patch formats not listed as supported by the resource's AIST.
- AIST description document format: a machine-readable document resolvable from the AIST URI, declaring supported media types, metadata vocabulary, constraint types, and persistence recommendations.
- A small set of predefined AISTs: at minimum Null (non-information resource / 303 redirect), File (byte-literal), RdfDataset (RDF graph), JsonInfoSet (JSON tree), and composites such as TurtleFile (File + RdfDataset, multi-faceted) for the solid/specification#342 class of use cases.
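Two of these requirements can be sketched as server-side checks, assuming AIST descriptions have already been resolved into an in-memory registry. The registry contents, the `JsonInfoSet` URI, and the function names are illustrative assumptions. The 415 response with an `Accept-Patch` header follows the general HTTP PATCH convention for unsupported patch documents; how a server should signal an ambiguous creation is left open here, so the sketch only detects it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AistDescription:
    media_types: Tuple[str, ...]  # representations the AIST supports
    patch_types: Tuple[str, ...]  # patch document media types it accepts


RDF_DATASET = "https://example.org/aist/RdfDataset"
JSON_INFOSET = "https://example.org/aist/JsonInfoSet"  # assumed URI

# Hypothetical registry standing in for resolved AIST description documents.
REGISTRY: Dict[str, AistDescription] = {
    RDF_DATASET: AistDescription(
        media_types=("application/ld+json", "text/turtle", "application/n-quads"),
        patch_types=("application/sparql-update",),
    ),
    JSON_INFOSET: AistDescription(
        media_types=("application/json", "application/ld+json"),
        patch_types=("application/json-patch+json", "application/merge-patch+json"),
    ),
}


def infer_aist(content_type: str) -> Optional[str]:
    """Return the only AIST supporting this media type, or None if ambiguous."""
    matches = [uri for uri, d in REGISTRY.items() if content_type in d.media_types]
    return matches[0] if len(matches) == 1 else None


def patch_rejection(aist_uri: str, patch_type: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) if the patch format must be rejected, else None."""
    description = REGISTRY[aist_uri]
    if patch_type in description.patch_types:
        return None
    return 415, {"Accept-Patch": ", ".join(description.patch_types)}


print(infer_aist("application/ld+json"))  # None: the server should not guess
print(infer_aist("text/turtle"))
print(patch_rejection(RDF_DATASET, "application/json-patch+json"))
```

With the AIST recorded on the resource, the JSON Patch failure described earlier becomes an explicit, actionable rejection that tells the client which patch formats are valid.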
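References
- [solid/specification#342](https://github.com/solid/specification/issues/342): Content of Turtle and RDFa documents should be wholly and entirely preserved

cc: @pchampin (I recall seeing you express concern somewhere about associating representation metadata with a resource.)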
AI-NOTE: The essential content is mine, but I used Claude's help with the phrasing, as English is not my native language.