Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
e4d0b44
feat(ingest/kinesis): AWS Kinesis Data Streams + Amazon Data Firehose…
treff7es May 27, 2026
108c78b
fix(ingest/kinesis): satisfy CI lint — mypy + markdown formatting
treff7es May 27, 2026
dcb6179
fix(ingest/kinesis): declare extras in pyproject.toml + lazy-import p…
treff7es May 27, 2026
5b31e85
fix(ingest/kinesis): declare protobuf deps in extras (match kafka pat…
treff7es May 27, 2026
ba4d681
fix(ingest/kinesis): reuse kafka_protobuf for GSR PROTOBUF deps + reg…
treff7es May 27, 2026
64c066d
Merge remote-tracking branch 'origin/master' into aws-kinesis-connector
treff7es May 27, 2026
25216d0
feat(ingest/kinesis): set externalUrl on streams, firehose DataJobs, …
treff7es May 27, 2026
b6595e8
Merge remote-tracking branch 'origin/master' into aws-kinesis-connector
treff7es Jun 18, 2026
ccbf353
fix(ingest/kinesis): migrate to auto-wired stale entity removal after…
treff7es Jun 18, 2026
ae4d905
fix(ingest/kinesis): address PR review — require [kinesis] extra in d…
treff7es Jun 19, 2026
d1f0931
docs(ingest/kinesis): correct inaccurate docs — drop deletion-detecti…
treff7es Jun 19, 2026
d433349
refactor(ingest/kinesis): remove built-in tags-as-ownership and share…
treff7es Jun 19, 2026
3567f95
refactor(ingest/kinesis): address review — dedup helpers, typed urn_b…
treff7es Jun 19, 2026
0506956
refactor(ingest/kinesis): address review round 2 — report partial Ice…
treff7es Jun 19, 2026
006daf4
fix(ingest/kinesis): reject use_naming_convention set while GSR disabled
treff7es Jun 19, 2026
a8d9a2b
test(ingest/kinesis): use standard docker_compose_runner + in-process…
treff7es Jun 19, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
10 changes: 10 additions & 0 deletions docs-website/sidebars.js
Original file line number Diff line number Diff line change
Expand Up @@ -884,6 +884,16 @@ module.exports = {
"metadata-ingestion/integration_docs/great-expectations",
"metadata-integration/java/datahub-protobuf/README",
//"metadata-ingestion/source-docs-template",
// Discoverability alias: "Amazon Data Firehose" is ingested by
// the same `kinesis` connector, but users searching for "Firehose"
// expect to see it as its own sidebar entry. Use `ref` so
// docusaurus allows pointing at the same doc twice (the autogen
// list below produces the "Amazon Kinesis Data Streams" entry).
{
type: "ref",
id: "docs/generated/ingestion/sources/kinesis",
label: "Amazon Data Firehose",
},
{
type: "autogenerated",
dirName: "docs/generated/ingestion/sources", // '.' means the current docs folder
Expand Down
12 changes: 12 additions & 0 deletions docs-website/static/img/logos/platforms/kinesis-firehose.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
20 changes: 9 additions & 11 deletions docs-website/static/img/logos/platforms/kinesis.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 6 additions & 0 deletions metadata-ingestion/constraints.txt
Original file line number Diff line number Diff line change
Expand Up @@ -898,8 +898,12 @@ mypy==1.17.1
# sqlalchemy
mypy-boto3-dynamodb==1.40.56
# via boto3-stubs
mypy-boto3-firehose==1.40.63
# via boto3-stubs
mypy-boto3-glue==1.40.75
# via boto3-stubs
mypy-boto3-kinesis==1.40.64
# via boto3-stubs
mypy-boto3-lakeformation==1.40.55
# via boto3-stubs
mypy-boto3-s3==1.40.61
Expand Down Expand Up @@ -1851,7 +1855,9 @@ typing-extensions==4.15.0
# multidict
# mypy
# mypy-boto3-dynamodb
# mypy-boto3-firehose
# mypy-boto3-glue
# mypy-boto3-kinesis
# mypy-boto3-lakeformation
# mypy-boto3-s3
# mypy-boto3-sagemaker
Expand Down
10 changes: 8 additions & 2 deletions metadata-ingestion/docs/sources/integrations_catalog.json
Original file line number Diff line number Diff line change
Expand Up @@ -780,12 +780,18 @@
"img_path": "img/logos/platforms/quicksight.svg"
},
"kinesis": {
"api_connector": true,
"platform_type": "Messaging",
"title": "Amazon Kinesis",
"title": "Amazon Kinesis Data Streams",
"description": "AWS real-time data streaming service for collecting, processing, and analyzing data streams at any scale.",
"img_path": "img/logos/platforms/kinesis.svg"
},
"kinesis-firehose": {
"supported_via": "kinesis",
"platform_type": "Messaging",
"title": "Amazon Data Firehose",
"description": "AWS managed service for delivering streaming data to S3, Redshift, OpenSearch, Snowflake, Apache Iceberg, and MongoDB (formerly Amazon Kinesis Data Firehose). Ingested via the unified `kinesis` connector.",
"img_path": "img/logos/platforms/kinesis-firehose.svg"
},
"redpanda": {
"supported_via": "kafka",
"title": "Redpanda",
Expand Down
33 changes: 33 additions & 0 deletions metadata-ingestion/docs/sources/kinesis/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
## Overview

AWS Kinesis is a real-time streaming and data-delivery service. See the [official AWS Kinesis page](https://aws.amazon.com/kinesis/) for product details.

This connector covers both AWS streaming services with one recipe and one IAM policy:

- **Amazon Kinesis Data Streams (KDS)** — emitted under the `kinesis` platform (display name: _Amazon Kinesis Data Streams_) as **Datasets** (`Stream` subtype).
- **Amazon Data Firehose** (formerly _Amazon Kinesis Data Firehose / KDF_) — emitted under the `kinesis-firehose` platform (display name: _Amazon Data Firehose_) as **DataJobs** under one regional **DataFlow** per recipe, with cross-platform lineage edges to the destination (S3, Redshift, OpenSearch, Snowflake, Apache Iceberg, MongoDB).

AWS resource tags become DataHub tags (and can be turned into ownership via the `extract_ownership_from_tags` transformer). Glue Schema Registry can be opted in to attach Avro / JSON / Protobuf schemas to streams.

:::info Looking specifically for Amazon Data Firehose?

Firehose delivery streams are ingested by this same connector — see the [Concept Mapping](#concept-mapping) table below or the [Limitations](#limitations) section for cross-platform lineage configuration.
:::

## Concept Mapping

| Source Concept | DataHub Concept | Notes |
| ---------------------------------------------------------------------------- | --------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `"kinesis"` / `"kinesis-firehose"` | [Data Platform](../../metamodel/entities/dataPlatform.md) | Two platforms, mirroring AWS's own service split. |
| AWS Region | [Container](../../metamodel/entities/container.md) | Subtype `Region`. One per recipe. |
| Kinesis Data Stream | [Dataset](../../metamodel/entities/dataset.md) | Subtype `Stream`. Parent: the regional Container. |
| Glue Schema Registry schema (per stream) | `SchemaMetadata` aspect | Avro / JSON / Protobuf. Attached when `glue_schema_registry.enabled: true` and a schema is resolved for the stream. |
| AWS Region (for Firehose) | [DataFlow](../../metamodel/entities/dataFlow.md) | Subtype `Firehose`. One per recipe. |
| Firehose delivery stream | [DataJob](../../metamodel/entities/dataJob.md) | Subtype `Firehose Delivery Stream`. Parent: the regional DataFlow. |
| Firehose destination (S3, Redshift, OpenSearch, Snowflake, Iceberg, MongoDB) | Lineage edge | Emitted via `dataJobInputOutput.outputDatasets`. Upstream is the source KDS stream when `DeliveryStreamType=KinesisStreamAsSource`. |
| AWS resource tag (`Key=Value`) | Tag | Tag URN form: `urn:li:tag:Key:Value`. |
| AWS resource tag (via the `extract_ownership_from_tags` transformer) | Owner | Ownership is derived from the emitted tags by the transformer, not by this source directly. |

### Compatibility

Six Firehose destination platforms are supported: S3, Redshift, OpenSearch/Elasticsearch, Snowflake, Apache Iceberg, and MongoDB. Delivery streams targeting other destinations (HTTP, Datadog, Splunk, New Relic, etc.) emit as DataJobs without lineage output edges and surface a warning in the ingestion report. See [Limitations](#limitations) for the `destination_platform_map` configuration required when destination platforms were ingested with a non-default `platform_instance`.
Loading
Loading