Commit 8610bd3
feat: Kafka source and destination connector (#3176)
Thanks to @tullytim, we have a new Kafka source and destination connector. It also works with hosted Kafka via Confluent. Documentation will be added to the Docs repo.
1 parent 2d965fd commit 8610bd3

File tree

27 files changed (+908 lines, -3 lines)

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -1,4 +1,4 @@
-## 0.14.8-dev3
+## 0.14.8-dev4

### Enhancements

@@ -21,6 +21,8 @@

* **Expose conversion functions for tables** Adds public functions to convert tables from HTML to the Deckerd format and back

+* **Adds Kafka Source and Destination** New source and destination connector added to all CLI ingest commands to support reading from and writing to Kafka streams. Also supports Confluent Kafka.
+
### Fixes

* **Fix an error publishing docker images.** Update user in docker-smoke-test to reflect changes made by the amd64 image pull from the "unstructured" "wolfi-base" image.

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ include requirements/ingest/gitlab.in
include requirements/ingest/google-drive.in
include requirements/ingest/hubspot.in
include requirements/ingest/jira.in
+include requirements/ingest/kafka.in
include requirements/ingest/mongodb.in
include requirements/ingest/notion.in
include requirements/ingest/onedrive.in
Makefile

Lines changed: 4 additions & 0 deletions
@@ -169,6 +169,10 @@ install-ingest-reddit:
install-ingest-slack:
	pip install -r requirements/ingest/slack.txt

+.PHONY: install-ingest-kafka
+install-ingest-kafka:
+	python3 -m pip install -r requirements/ingest/kafka.txt
+
.PHONY: install-ingest-wikipedia
install-ingest-wikipedia:
	python3 -m pip install -r requirements/ingest/wikipedia.txt
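With this target in place, the connector's extra dependencies install the same way as the other ingest extras. A minimal usage sketch, run from the repository root:

    # Install the Kafka ingest dependencies via the new Makefile target
    make install-ingest-kafka

    # Equivalent direct invocation, mirroring the target's recipe
    python3 -m pip install -r requirements/ingest/kafka.txt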
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Unstructured Documentation
+==========================
+
+The Unstructured documentation page has moved! Check out our new and improved docs page at
+`https://docs.unstructured.io <https://docs.unstructured.io>`_ to learn more about our
+products and tools.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Unstructured Documentation
+==========================
+
+The Unstructured documentation page has moved! Check out our new and improved docs page at
+`https://docs.unstructured.io <https://docs.unstructured.io>`_ to learn more about our
+products and tools.

examples/ingest/kafka/ingest.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+
+# Processes the PDF specified in the input path
+# and writes the results to a Confluent topic.
+
+SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
+cd "$SCRIPT_DIR"/../../.. || exit 1
+
+PYTHONPATH=. ./unstructured/ingest/main.py \
+  local \
+  --input-path="<path to the file to be processed/partitioned>" \
+  kafka \
+  --bootstrap-server="<bootstrap server fully qualified hostname>" \
+  --port "<port, likely 9092>" \
+  --topic "<destination topic in confluent>" \
+  --kafka-api-key="<confluent api key>" \
+  --secret="<confluent secret>" \
+  --num-processes="<number of processes to be used>"
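The script above covers the destination direction (local file in, Kafka topic out). For the reverse direction, reading messages from a Kafka topic and writing partitioned output locally, a minimal sketch might look like the following. The source-side flag names are assumed by symmetry with the destination flags above, and --output-dir is assumed based on how other source connectors write local output; consult the connector documentation for the authoritative option names.

    #!/usr/bin/env bash

    # Hypothetical source-direction run: consume documents from a Kafka topic
    # and write partitioned results to a local directory. Flag names are
    # assumptions mirroring the destination example, not confirmed by this commit.
    PYTHONPATH=. ./unstructured/ingest/main.py \
      kafka \
      --bootstrap-server="<bootstrap server fully qualified hostname>" \
      --port "<port, likely 9092>" \
      --topic "<source topic in confluent>" \
      --kafka-api-key="<confluent api key>" \
      --secret="<confluent secret>" \
      --num-processes="<number of processes to be used>" \
      --output-dir="<local directory for partitioned output>"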

requirements/ingest/kafka.in

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+-c ../deps/constraints.txt
+-c ../base.txt
+confluent-kafka

requirements/ingest/kafka.txt

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#
+# This file is autogenerated by pip-compile with Python 3.9
+# by the following command:
+#
+#    pip-compile ./ingest/kafka.in
+#
+confluent-kafka==2.4.0
+    # via -r ./ingest/kafka.in
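If the pinned confluent-kafka version later needs a refresh, the file can be regenerated with pip-compile (from pip-tools), mirroring the command recorded in its header. Running it from the requirements/ directory is an assumption based on the relative paths above.

    # Regenerate the pinned requirements for the Kafka connector
    python3 -m pip install pip-tools
    cd requirements
    pip-compile ./ingest/kafka.in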
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+
+set -e
+
+SCRIPT_DIR=$(dirname "$(realpath "$0")")
+
+# Create the Kafka instance
+docker-compose version
+docker-compose -f "$SCRIPT_DIR"/docker-compose.yml up --wait
+docker-compose -f "$SCRIPT_DIR"/docker-compose.yml ps
+
+echo "Instance is live."
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+services:
+  zookeeper:
+    image: confluentinc/cp-zookeeper:latest
+    environment:
+      ZOOKEEPER_CLIENT_PORT: 2181
+      ZOOKEEPER_TICK_TIME: 2000
+    ports:
+      - 22181:2181
+
+  kafka:
+    image: confluentinc/cp-kafka:latest
+    depends_on:
+      - zookeeper
+    ports:
+      - 29092:29092
+    environment:
+      KAFKA_BROKER_ID: 1
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
+      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
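With this stack running, host-side clients reach the broker through the PLAINTEXT_HOST listener advertised at localhost:29092. A quick smoke test might create a topic inside the broker container; this sketch assumes the confluentinc/cp-kafka image exposes the standard kafka-topics CLI, and test-topic is a hypothetical topic name.

    # Create a single-partition test topic on the local broker
    # (run next to docker-compose.yml, or pass -f <path-to-compose-file>)
    docker-compose exec kafka kafka-topics \
      --bootstrap-server localhost:9092 \
      --create --topic test-topic \
      --partitions 1 --replication-factor 1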
