Dev tests

To test whether the Couchbase connector works properly, we copied several test buckets from a Couchbase server to Kafka.

What we did - Setup part

After creating the buckets, we used a simple Python script to generate sample documents and save them to the buckets. In addition to writing the documents to Couchbase, the generator saves them to separate Elasticsearch indexes (one index per bucket). During an end-to-end test (Couchbase to Kafka to HDFS), the Elasticsearch indexes make it possible to check that the data saved on HDFS matches the generated data.
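The generator itself is not included in this page; a minimal sketch of what it could look like follows, assuming the couchbase and elasticsearch Python clients, with placeholder addresses and document contents:

import random
import uuid

from couchbase.bucket import Bucket        # Couchbase Python SDK 2.x
from elasticsearch import Elasticsearch

BUCKET_NAME = "cwb"

# Addresses are placeholders for the test cluster.
bucket = Bucket("couchbase://10.240.187.20/" + BUCKET_NAME)
es = Elasticsearch(["http://localhost:9200"])

for _ in range(1000):
    doc_id = str(uuid.uuid4())
    doc = {"value": random.randint(0, 100), "source": "generator"}
    # Save the document on Couchbase...
    bucket.upsert(doc_id, doc)
    # ...and on a per-bucket Elasticsearch index, so that an end-to-end
    # test can compare the generated data with what lands on HDFS.
    es.index(index=BUCKET_NAME, doc_type="doc", id=doc_id, body=doc)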

After generating the sample documents, we created a JSON configuration for every bucket we wanted to load data from; each configuration is used to create one connector:

{
	"name": "couchbase-source-bucket-cwb",
	"config": {
		"connector.class": "org.apache.kafka.connect.couchbase.CouchbaseSourceConnector",
		"topic": "testing-couchbase-cwb",
		"schema.name": "testcouchbase",
		"couchbase.nodes": "10.240.187.20",
		"couchbase.bucket": "cwb",
		"dcp.maximum.drainrate": "500",
		"couchbase.dcpConnectionBufferSize": "0",
		"tasks.max": "1"
	}
}

In this configuration, the topic property defines the topic to which the connector should produce messages. The schema.name property defines the name under which the schema is saved in the schema registry. The next two properties set up the connection to the Couchbase server: couchbase.nodes lists the addresses of the Couchbase server nodes, while couchbase.bucket names the bucket from which documents are read. Finally, dcp.maximum.drainrate and couchbase.dcpConnectionBufferSize tune the speed at which documents are consumed from the DCP stream.
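To create the connector, this JSON can be posted to the Kafka Connect REST API, assuming Connect is running in distributed mode on its default port 8083 (the file name below is hypothetical):

curl -X POST -H "Content-Type: application/json" \
     --data @couchbase-source-bucket-cwb.json \
     http://localhost:8083/connectors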

After the connector is started, it's possible to check whether data is being produced on the topic simply by starting a console consumer.
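For example, with the 0.9-era console consumer flags (addresses are assumptions; since the connector registers an Avro schema, Confluent's kafka-avro-console-consumer can be used instead to get readable output):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic testing-couchbase-cwb --from-beginning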

Offset management and fault tolerance testing

To check that the connector is reliable, it was stopped and restarted several times while it was reading data from Couchbase. When the connector finishes producing data to Kafka, it's possible to check that the message count obtained with the console consumer matches the number of documents in the bucket the connector reads from.
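One way to compare the two counts, assuming default ports and the Couchbase admin REST API's basicStats (addresses as in the configuration above):

# Re-read the whole topic; when interrupted, the console consumer reports
# "Processed a total of N messages".
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
    --topic testing-couchbase-cwb --from-beginning

# Fetch the document count of the bucket from the Couchbase admin REST API.
curl -s http://10.240.187.20:8091/pools/default/buckets/cwb | \
    python -c 'import json,sys; print(json.load(sys.stdin)["basicStats"]["itemCount"])'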

After ingesting all the documents already present in the buckets, the Python generator was started again to check the behaviour of the connector when new documents are added to the bucket. To check fault tolerance, the connector was restarted several times and then left running for hours while the generator kept creating new documents on the bucket. After a day, the generator was stopped and we checked that the count of messages produced to Kafka matched the number of documents present in the bucket.
