BigQuery to Kafka Connector

A Python utility that extracts data from Google BigQuery and publishes it to a Kafka topic using Quix Streams.

Overview

This connector reads data from a BigQuery table and streams it to a Kafka topic, handling various data types and serialization challenges. It's built with Quix Streams for Kafka integration and includes custom serialization to properly handle all BigQuery data types.

Features

Connects to BigQuery using service account authentication
Extracts data using custom SQL queries
Handles complex data types including:
- Datetime objects
- Decimal values
- Binary data
- NULL/NA values
- Time objects
Publishes data to Kafka topics with proper serialization
Error handling for serialization issues

Prerequisites

Python 3.7+
Access to a Google Cloud project with BigQuery
A Kafka cluster configured with Quix
Service account credentials with BigQuery access

Installation

Clone this repository
Install dependencies:

pip install quixstreams google-cloud-bigquery pandas python-dotenv

Configuration

The script uses environment variables for configuration:

output: Kafka topic name to publish data to
service_account: JSON string containing the Google Cloud service account credentials

For local development, you can use a .env file with these variables.

Usage

Set the environment variables or create a .env file
Modify the SQL query in the read_data() function to match your BigQuery table
Run the script:

python bigquery_to_kafka.py

How It Works

The script establishes a connection to BigQuery using the provided service account credentials
It executes the SQL query to fetch data from the specified table
Each row is converted to a JSON-serializable format with custom handling for special data types
The data is serialized and published to the Kafka topic
Error handling captures and logs any serialization issues without stopping the entire process

Custom Data Type Handling

The script includes two custom classes for data serialization:

CustomJSONEncoder: Extends the standard JSON encoder to handle BigQuery-specific data types
BigQuerySerializer: Handles pre-processing of data before serialization

Error Handling

The script includes comprehensive error handling that:

Catches exceptions during serialization
Logs problematic data with detailed information
Continues processing other rows when errors occur

Contributing

Feel free to submit issues or pull requests for improvements to the connector.

License

This project is open source under the Apache 2.0 license.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BigQuery to Kafka Connector

Overview

Features

Prerequisites

Installation

Configuration

Usage

How It Works

Custom Data Type Handling

Error Handling

Contributing

License

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

BigQuery to Kafka Connector

Overview

Features

Prerequisites

Installation

Configuration

Usage

How It Works

Custom Data Type Handling

Error Handling

Contributing

License