
Tracking issue: integrate with Delta Sharing #7830

Open
@Xuanwo

Description


Summary

Delta Sharing is an open protocol for secure data sharing. This issue tracks the progress of implementing Databend as a Delta Sharing consumer (a "recipient" in the protocol's terms). Once this feature is implemented, our users will be able to run queries such as:

SELECT * from delta.example.ontime;

Implementing Databend as a Delta Sharing provider is far more complex and should be tracked in a separate issue.

NOTE: This will be easier to implement after #7816 is done.

Tasks

  • Possibly needed: a Delta Sharing connector (client) in Rust
  • Add delta catalog
  • Implement list databases
  • Implement list tables
  • Implement Table API
  • Integration tests
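For the connector and catalog tasks, a Rust client could start as a thin layer that turns Databend catalog operations into the protocol's REST calls: listing databases maps to List Schemas, listing tables to List Tables, and a table scan to Read Data. This is a minimal sketch under assumptions: the `SharingRequest` enum and its methods are hypothetical names, not an existing Databend or delta-sharing crate API. Only the HTTP methods and paths come from PROTOCOL.md, and they are relative to the server endpoint in the recipient's profile.

```rust
/// Hypothetical request type for a Rust Delta Sharing client.
/// Paths follow the REST endpoints in PROTOCOL.md, relative to the
/// server endpoint from the recipient's profile.
#[derive(Debug, Clone, PartialEq)]
pub enum SharingRequest {
    /// List Shares
    ListShares,
    /// List Schemas in a Share (Databend: list databases)
    ListSchemas { share: String },
    /// List Tables in a Schema (Databend: list tables)
    ListTables { share: String, schema: String },
    /// Query Table Metadata (Databend: table schema)
    TableMetadata { share: String, schema: String, table: String },
    /// Read Data from a Table (Databend: table scan)
    Query { share: String, schema: String, table: String },
}

impl SharingRequest {
    /// HTTP method for each call: only Read Data is a POST (it carries a JSON body).
    pub fn method(&self) -> &'static str {
        match self {
            SharingRequest::Query { .. } => "POST",
            _ => "GET",
        }
    }

    /// Request path. Names are inserted verbatim in this sketch;
    /// a real client would percent-encode them.
    pub fn path(&self) -> String {
        match self {
            SharingRequest::ListShares => "/shares".to_string(),
            SharingRequest::ListSchemas { share } => format!("/shares/{share}/schemas"),
            SharingRequest::ListTables { share, schema } => {
                format!("/shares/{share}/schemas/{schema}/tables")
            }
            SharingRequest::TableMetadata { share, schema, table } => {
                format!("/shares/{share}/schemas/{schema}/tables/{table}/metadata")
            }
            SharingRequest::Query { share, schema, table } => {
                format!("/shares/{share}/schemas/{schema}/tables/{table}/query")
            }
        }
    }
}
```

For `SELECT * from delta.example.ontime;`, assuming the `delta` catalog is bound to a share named `share1` (a hypothetical name), the scan would become `SharingRequest::Query { share: "share1".into(), schema: "example".into(), table: "ontime".into() }`, i.e. `POST /shares/share1/schemas/example/tables/ontime/query`.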

References

Protocol: https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md
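Per the protocol, a recipient connects using a profile file that carries, among other fields, the server `endpoint` and a `bearerToken`; every request sends that token in an `Authorization: Bearer <token>` header. A minimal sketch, assuming the profile has already been parsed into the hypothetical `Profile` struct below (JSON parsing is omitted to keep it dependency-free):

```rust
/// Hypothetical in-memory form of a Delta Sharing profile file.
/// Only `endpoint` and `bearerToken` are modelled here.
pub struct Profile {
    pub endpoint: String,
    pub bearer_token: String,
}

impl Profile {
    /// Join the server endpoint with a request path such as "/shares",
    /// avoiding a doubled or missing '/' at the boundary.
    pub fn url(&self, path: &str) -> String {
        format!(
            "{}/{}",
            self.endpoint.trim_end_matches('/'),
            path.trim_start_matches('/')
        )
    }

    /// Value of the HTTP `Authorization` header sent with every request.
    pub fn authorization_header(&self) -> String {
        format!("Bearer {}", self.bearer_token)
    }
}
```

Normalizing the slash at the join keeps the result the same whether or not the configured endpoint ends with `/`.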

Highlighted APIs:

  • List Shares: Get available shares
  • List Schemas in a Share: List all schemas (databases in databend) in a share
  • List Tables in a Schema: List all tables in a schema
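  • Query Table Version: returns the table's current version in the Delta-Table-Version response header
  • Query Table Metadata: returns newline-delimited JSON, a protocol line followed by a metaData line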
    {
      "protocol": {
        "minReaderVersion": 1
      }
    }
    {
      "metaData": {
        "id": "f8d5c169-3d01-4ca3-ad9e-7dc3355aedb2",
        "format": {
          "provider": "parquet"
        },
        "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"eventTime\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}},{\"name\":\"date\",\"type\":\"date\",\"nullable\":true,\"metadata\":{}}]}",
        "partitionColumns": [
          "date"
        ]
      }
    }
  • Read Data from a Table
    • The most important API in delta sharing.
    • Request
      {
        "predicateHints": [
          "date >= '2021-01-01'",
          "date <= '2021-01-31'"
        ],
        "limitHint": 1000,
        "version": 123
      }
    • Response (newline-delimited JSON: a protocol line, a metaData line, then one file line per data file)
      {
        "protocol": {
          "minReaderVersion": 1
        }
      }
      {
        "metaData": {
          "id": "f8d5c169-3d01-4ca3-ad9e-7dc3355aedb2",
          "format": {
            "provider": "parquet"
          },
          "schemaString": "{\"type\":\"struct\",\"fields\":[{\"name\":\"eventTime\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}},{\"name\":\"date\",\"type\":\"date\",\"nullable\":true,\"metadata\":{}}]}",
          "partitionColumns": [
            "date"
          ]
        }
      }
      {
        "file": {
          "url": "https://<s3-bucket-name>.s3.us-west-2.amazonaws.com/delta-exchange-test/table2/date%3D2021-04-28/part-00000-8b0086f2-7b27-4935-ac5a-8ed6215a6640.c000.snappy.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210501T010516Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=AKIAISZRDL4Q4Q7AIONA%2F20210501%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=97b6762cfd8e4d7e94b9d707eff3faf266974f6e7030095c1d4a66350cfd892e",
          "id": "8b0086f2-7b27-4935-ac5a-8ed6215a6640",
          "partitionValues": {
            "date": "2021-04-28"
          },
          "size": 573,
          "stats": "{\"numRecords\":1,\"minValues\":{\"eventTime\":\"2021-04-28T23:33:57.955Z\"},\"maxValues\":{\"eventTime\":\"2021-04-28T23:33:57.955Z\"},\"nullCount\":{\"eventTime\":0}}"
        }
      }
      {
        "file": {
          "url": "https://<s3-bucket-name>.s3.us-west-2.amazonaws.com/delta-exchange-test/table2/date%3D2021-04-28/part-00000-591723a8-6a27-4240-a90e-57426f4736d2.c000.snappy.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210501T010516Z&X-Amz-SignedHeaders=host&X-Amz-Expires=899&X-Amz-Credential=AKIAISZRDL4Q4Q7AIONA%2F20210501%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=0f7acecba5df7652457164533a58004936586186c56425d9d53c52db574f6b62",
          "id": "591723a8-6a27-4240-a90e-57426f4736d2",
          "partitionValues": {
            "date": "2021-04-28"
          },
          "size": 573,
          "stats": "{\"numRecords\":1,\"minValues\":{\"eventTime\":\"2021-04-28T23:33:48.719Z\"},\"maxValues\":{\"eventTime\":\"2021-04-28T23:33:48.719Z\"},\"nullCount\":{\"eventTime\":0}}"
        }
      }
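    • Databend needs to read the actual data files from these short-lived pre-signed URLs (the example URLs carry X-Amz-Expires=900, i.e. 15 minutes).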
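predicateHints and limitHint are only hints: the server may apply them on a best-effort basis or ignore them, so it can return more files than requested and Databend still has to filter on its own side. Because every file action carries its partitionValues (and optionally stats with numRecords), partition predicates can be checked before any parquet data is downloaded. The sketch below assumes the file lines have already been decoded into the hypothetical `FileAction` struct and handles only a closed range on one partition column, like the example hints on `date`; the function names are illustrative, not an existing API.

```rust
use std::collections::HashMap;

/// Hypothetical subset of a `file` action from the Read Data response.
#[derive(Debug, Clone)]
pub struct FileAction {
    pub url: String,
    pub id: String,
    pub partition_values: HashMap<String, String>,
    pub size: u64,
    /// `numRecords` from the file's `stats`; `None` when stats are absent.
    pub num_records: Option<u64>,
}

/// Keep files whose partition value for `column` lies in [lo, hi].
/// Values are compared as strings, which orders ISO-8601 `YYYY-MM-DD` dates correctly.
pub fn prune_by_range<'a>(
    files: &'a [FileAction],
    column: &str,
    lo: &str,
    hi: &str,
) -> Vec<&'a FileAction> {
    files
        .iter()
        .filter(|f| match f.partition_values.get(column) {
            Some(v) => v.as_str() >= lo && v.as_str() <= hi,
            // Without the partition value we cannot prove the file is
            // irrelevant, so keep it (conservative).
            None => true,
        })
        .collect()
}

/// Take files in order until their known row counts cover `limit`.
/// Files without `numRecords` are kept but add nothing to the count,
/// so the executor must still apply LIMIT on the rows it reads.
pub fn take_for_limit<'a>(files: &[&'a FileAction], limit: u64) -> Vec<&'a FileAction> {
    let mut picked = Vec::new();
    let mut rows = 0u64;
    for f in files {
        if rows >= limit {
            break;
        }
        picked.push(*f);
        rows += f.num_records.unwrap_or(0);
    }
    picked
}
```

Since every row in a file shares that file's partition values, stopping once numRecords covers the limit is safe for a LIMIT query whose predicates touch only partition columns; predicates on other columns would need the parquet data itself (or the per-file min/max stats).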

Metadata

Assignees: No one assigned

Labels: A-integration (Area: integration), stale (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed)
