diff --git a/PROTOCOL.md b/PROTOCOL.md
index fd0bfcb64..7cbdbb143 100644
--- a/PROTOCOL.md
+++ b/PROTOCOL.md
@@ -267,12 +267,46 @@ Example:
This is the API to get the metadata of a share.
-HTTP Request | Value
--- | --
-Method | `GET`
-Header | `Authorization: Bearer {token}`
-URL | `{prefix}/shares/{share}`
-URL Parameters | **{share}**: The share name to query. It's case-insensitive.
+
+
+| HTTP Request |
+Value |
+
+
+| Method |
+
+
+`GET`
+
+ |
+
+
+| Headers |
+
+
+`Authorization: Bearer {token}`
+
+Optional: `delta-sharing-capabilities: includeSemanticMetadata=true`, see
+[Delta Sharing Capabilities Header](#delta-sharing-capabilities-header) for details.
+
+ |
+
+
+| URL |
+
+
+`{prefix}/shares/{share}`
+
+ |
+
+
+| URL Parameters |
+
+
+**{share}**: The share name to query. It's case-insensitive.
+ |
+
+
200: The share's metadata was successfully returned.
@@ -288,6 +322,9 @@ URL Parameters | **{share}**: The share name to query. It's case-insensitive.
`Content-Type: application/json; charset=utf-8`
+Optional: `delta-sharing-capabilities: includeSemanticMetadata=true`, see
+[Delta Sharing Capabilities Header](#delta-sharing-capabilities-header) for details
+
@@ -298,12 +335,13 @@ URL Parameters | **{share}**: The share name to query. It's case-insensitive.
{
"share": {
"name": "string",
- "id": "string"
+ "id": "string",
+ "comment": "string"
}
}
```
-Note: the `id` field is optional. If `id` is populated for a share, its value should be unique across the sharing server and stay immutable through the share's lifecycle. The format recommendation of `id` is UUID.
+Note: the `id` field is optional. If `id` is populated for a share, its value should be unique across the sharing server and stay immutable through the share's lifecycle. The format recommendation of `id` is UUID. `comment` is only populated when `includeSemanticMetadata` is set to true in the request and response header.
@@ -1447,9 +1485,11 @@ This is the API for clients to query the table schema and other metadata.
`Authorization: Bearer {token}`
-Optional: `delta-sharing-capabilities: responseformat=delta;readerfeatures=deletionvectors`, see
+Optional: `delta-sharing-capabilities: responseformat=delta;readerfeatures=deletionvectors;includeSemanticMetadata={includeSemanticMetadata}`, see
[Delta Sharing Capabilities Header](#delta-sharing-capabilities-header) for details.
+**{includeSemanticMetadata}** is whether or not the server included semantic metadata in the response.
+
@@ -1491,6 +1531,11 @@ Optional: `delta-sharing-capabilities: responseformat=delta;readerfeatures=delet
**{version}** is a long value which represents the current table version.
+Optional: `delta-sharing-capabilities: includeSemanticMetadata={includeSemanticMetadata}`, see
+[Delta Sharing Capabilities Header](#delta-sharing-capabilities-header) for details.
+
+**{includeSemanticMetadata}** is whether or not the server included semantic metadata in the response.
+
@@ -1501,14 +1546,20 @@ A sequence of JSON strings delimited by newline.
When `responseformat=parquet`, each line is a JSON object defined in [API Response Format in Parquet](#api-response-format-in-parquet).
-The response contains two lines:
+The response can contain three lines:
- The first line is [a JSON wrapper object](#json-wrapper-object-in-each-line) containing the table [Protocol](#protocol) object.
- The second line is [a JSON wrapper object](#json-wrapper-object-in-each-line) containing the table [Metadata](#metadata) object.
+- The third line is [a JSON wrapper object](#json-wrapper-object-in-each-line-in-delta) containing the table [SemanticMetadata](#semanticmetadata) object
+
+The third line is only included if the `includeSemanticMetadata` is set to true in the request and response header.
When `responseformat=delta`, each line is a Json object defined in [API Response Format in Delta](#api-response-format-in-delta).
-The response contains two lines:
+The response can contain three lines:
- The first line is [a JSON wrapper object](#json-wrapper-object-in-each-line-in-delta) containing the delta [Protocol](#protocol-in-delta-format) object.
- The second line is [a JSON wrapper object](#json-wrapper-object-in-each-line-in-delta) containing the delta [Metadata](#metadata-in-delta-format) object.
+- The third line is [a JSON wrapper object](#json-wrapper-object-in-each-line-in-delta) containing the table [SemanticMetadata](#semanticmetadata) object
+
+The third line is only included if the `includeSemanticMetadata` is set to true in the request and response header.
@@ -1677,6 +1728,7 @@ Example (See [API Response Format in Parquet](#api-response-format-in-parquet) f
HTTP/2 200
content-type: application/x-ndjson; charset=utf-8
delta-table-version: 123
+delta-sharing-capabilities: includeSemanticMetadata=true
```
```json
@@ -1697,6 +1749,35 @@ delta-table-version: 123
]
}
}
+{
+ "semanticMetadata": {
+ "comment": "This is a table",
+ "columns": [
+ {
+ "name": "eventTime",
+ "comment": "The time of the event"
+ },
+ {
+ "name": "date",
+ "comment": "The date of the event"
+ }
+ ],
+ "table_constraints": [
+ {
+ "primary_key_constraint": {
+ "name": "pk",
+ "child_columns": ["eventTime"]
+ }
+ }
+ ],
+ "tags": [
+ {
+ "key": "tag1",
+ "value": "value1"
+ }
+ ]
+ }
+}
```
### Read Data from a Table
@@ -2509,7 +2590,7 @@ Each capability is in the format of "key=value1,value2", values are separated by
Example: "responseformat=delta;readerfeatures=deletionvectors,columnmapping". All keys and values should
be case-insensitive when processed by the server.
-This header can be used in the request for [Query Table Metadata](#query-table-metadata),
+This header can be used in the request for [Get Share](#get-share), [Query Table Metadata](#query-table-metadata),
[Query Table](#read-data-from-a-table), and [Query Table Changes](#read-change-data-feed-from-a-table).
**Compatibility**
@@ -2561,6 +2642,12 @@ readerfeatures is only useful when `responseformat=delta`, it includes values fr
features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features). It's set by the
caller of `DeltaSharingClient` to indicate its ability to process delta readerFeatures.
+### includeSemanticMetadata
+If `includeSemanticMetadata=true` is specified by the client, the server should include the new
+semantic metadata in the response. Otherwise, it should not be included. If `includeSemanticMetadata=true`
+header is specified in the server response, the client should expect semantic metadata to be included in
+the response and parse accordingly. It is expected for clients to ignore unrecognized fields when parsing the response.
+
## API Response Format in Parquet
This section discusses the API Response Format in Parquet returned by the server.
@@ -2634,6 +2721,72 @@ Example (for illustration purposes; each JSON object must be a single line in th
}
```
+### SemanticMetadata
+Field Name | Data Type | Description | Optional/Required
+-|-|-|-
+comment | String | User-provided free-form text description
Max length: `65536` | Optional
+columns | Array\<[ColumnMetadata](#ColumnMetadata)> | Table column metadata. Should be a subset of the columns in schemaString
Max length: `32768` | Optional
+table_constraints | Array\<[TableConstraint](#TableConstraint)> | Constraints of the table. We currently only support primary key sharing
Max length: `1` | Optional
+tags | Array\<[TagKeyValue](#TagKeyValue)> | Tags assigned to the table
Max length: `50` | Optional
+
+#### ColumnMetadata
+Field Name | Data Type | Description | Optional/Required
+-|-|-|-
+name | String | Name of column
Max length: `255`
Regex: `[a-zA-Z0-9_@-]+` | Required
+comment | String | User-provided free-form text description
Max length: `65536` | Optional
+tags | Array\<[TagKeyValue](#TagKeyValue)> | Tags assigned to the column
Max length: `50` | Optional
+
+#### TagKeyValue
+Field Name | Data Type | Description | Optional/Required
+-|-|-|-
+key | String | Name of the tag
Max length: 255
Reserved characters: `. , - = / :` | Required
+value | String | Value of the tag associated with the key
Max length: `100` | Optional
+
+#### TableConstraint
+Field Name | Data Type | Description | Optional/Required
+-|-|-|-
+primary_key_constraint | [PrimaryKeyConstraint](#PrimaryKeyConstraint) | Primary key constraint for the table. There should only be 1 primary key | Required
+
+#### PrimaryKeyConstraint
+Field Name | Data Type | Description | Optional/Required
+-|-|-|-
+name | String | Name of the constraint
Max length: `255`
Regex: `[a-zA-Z0-9_@-]+` | Required
+child_columns | Array\ | Columns of the constraint. Columns cannot be nullable
Max name length: `255`
Regex: `[a-zA-Z0-9_@-]+`
Max array length: `32768` | Required
+
+Example (for illustration purposes; each JSON object must be a single line in the response):
+
+```json
+{
+ "semanticMetadata": {
+ "comment": "This is a table",
+ "columns": [
+ {
+ "name": "eventTime",
+ "comment": "The time of the event"
+ },
+ {
+ "name": "date",
+ "comment": "The date of the event"
+ }
+ ],
+ "table_constraints": [
+ {
+ "primary_key_constraint": {
+ "name": "pk",
+ "child_columns": ["eventTime"]
+ }
+ }
+ ],
+ "tags": [
+ {
+ "key": "tag1",
+ "value": "value1"
+ }
+ ]
+ }
+}
+```
+
### File
Field Name | Data Type | Description | Optional/Required