Extremely high level instructions:
- Create a Streams Messaging Manager data hub
- Create a new Source connector under "Connect"
- Select StatelessNifiSourceConnector
- Click the pencil icon next to the flow.snapshot config parameter
- Browse to Stateless_Nifi_Source.json (found in this repo), click Save
- Adjust the configuration
  - give it a name
  - output.port should match the name of the output port in the flow definition: send_to_kafka if you're using the flow defs found in this repo
  - topics is the name of the Kafka topic you want to publish to. It does not need to be created ahead of time
- Click validate, next, deploy
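As a rough sketch of the output.port check described above, the snippet below compares the configured port name against the output ports in a flow definition. The connector name and topic name are hypothetical placeholders, and the JSON layout (ports listed under flowContents.outputPorts) is an assumption about the exported flow definition, not something confirmed by this repo.

```python
import json

# Only output.port and topics are named in the steps above.
# The connector name and topic value are hypothetical placeholders.
source_config = {
    "name": "stateless-nifi-source",
    "output.port": "send_to_kafka",
    "topics": "snifi-demo",
}


def output_port_names(flow_definition):
    """Return the names of the top-level output ports.

    Assumes the flow definition lists them under flowContents.outputPorts.
    """
    ports = flow_definition.get("flowContents", {}).get("outputPorts", [])
    return [port["name"] for port in ports]


def check_output_port(config, flow_definition):
    """True if the configured output.port exists in the flow definition."""
    return config["output.port"] in output_port_names(flow_definition)


# Tiny stand-in for the contents of Stateless_Nifi_Source.json.
sample_flow = json.loads(
    '{"flowContents": {"outputPorts": [{"name": "send_to_kafka"}]}}'
)

print(check_output_port(source_config, sample_flow))
```

To run the check against the real file, replace sample_flow with the result of json.load on Stateless_Nifi_Source.json.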
For the sink connector, follow similar steps to the source connector. Most notably, click Save and Enhance instead of Save in order to bring in the NiFi parameter context. Sample values are included in the parameter values.
- parameter.pcontext:bucket_uri ==> your bucket name, of the form s3a://your-bucket-name
- parameter.pcontext:bucket_folder ==> the subfolder within your bucket, of the form /folder/structure/
- parameter.pcontext:cdp_username ==> your CDP workload username
- parameter.pcontext:cdp_password ==> your CDP workload password
- input.port ==> should be the name of the input port: receive_kafka if you're using the flow def found in this repo
- topics ==> should match the topic name your source connector is publishing to
Click validate, next, deploy.
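The sink settings can be pictured as a flat set of key/value pairs, with the parameter context entries sharing the parameter.pcontext: prefix. The following is a minimal sketch of that shape. The helper build_sink_config, the connector name, and every value are hypothetical placeholders, and the exact form the UI stores may differ.

```python
def build_sink_config(name, input_port, topic, parameters, context="pcontext"):
    """Assemble sink connector settings as a flat dict.

    Each parameter context value gets a key of the form
    parameter.<context>:<parameter name>.
    """
    config = {
        "name": name,
        "input.port": input_port,
        "topics": topic,
    }
    for key, value in parameters.items():
        config[f"parameter.{context}:{key}"] = value
    return config


# All values below are placeholders; substitute your own.
sink_config = build_sink_config(
    name="stateless-nifi-sink",
    input_port="receive_kafka",
    topic="snifi-demo",  # must match the topic the source connector publishes to
    parameters={
        "bucket_uri": "s3a://your-bucket-name",
        "bucket_folder": "/folder/structure/",
        "cdp_username": "your-workload-username",
        "cdp_password": "your-workload-password",
    },
)

for key in sorted(sink_config):
    print(f"{key} = {sink_config[key]}")
```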
This process writes Parquet files to your S3 location. To query that data, run the following from an Impala virtual warehouse, being sure to change the location to your bucket/folder.
CREATE EXTERNAL TABLE default.snifi (id STRING, ts STRING)
STORED AS PARQUET
LOCATION 's3a://goes-se-sandbox01/cnelson2/stateless_nifi_parquet';
SELECT * FROM default.snifi LIMIT 10;
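To avoid hand-editing the LOCATION clause, a small helper can render the statement from your own bucket and folder. This is only a sketch: the function names are hypothetical, and it assumes the sink writes to the bucket URI joined with the bucket folder, which matches the sample location above but is not confirmed elsewhere in this repo.

```python
def parquet_location(bucket_uri, bucket_folder):
    """Join the bucket URI and folder into one path with no trailing slash."""
    return bucket_uri.rstrip("/") + "/" + bucket_folder.strip("/")


def create_table_sql(location, table="default.snifi"):
    """Render the CREATE EXTERNAL TABLE statement for the given location."""
    return (
        f"CREATE EXTERNAL TABLE {table} (id STRING, ts STRING)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}';"
    )


# Placeholder values in the same form as the sink parameters.
location = parquet_location("s3a://your-bucket-name", "/folder/structure/")
print(create_table_sql(location))
```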