A Python utility that extracts data from Google BigQuery and publishes it to a Kafka topic using Quix Streams.
This connector reads data from a BigQuery table and streams it to a Kafka topic, handling various data types and serialization challenges. It's built with Quix Streams for Kafka integration and includes custom serialization to properly handle all BigQuery data types.
- Connects to BigQuery using service account authentication
- Extracts data using custom SQL queries
- Handles complex data types, including:
  - Datetime objects
  - Decimal values
  - Binary data
  - NULL/NA values
  - Time objects
- Publishes data to Kafka topics with proper serialization
- Error handling for serialization issues
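A minimal sketch of how the special types above can be mapped to JSON-safe values via a custom encoder (class and branch details are illustrative, not the connector's exact code; the pandas import is optional here):

```python
import base64
import datetime
import json
from decimal import Decimal

try:
    import pandas as pd
    _NA_SENTINELS = (pd.NA, pd.NaT)
except ImportError:  # pandas is optional for this sketch
    _NA_SENTINELS = ()


class CustomJSONEncoder(json.JSONEncoder):
    """Illustrative encoder for the BigQuery-derived types listed above."""

    def default(self, obj):
        # pandas NA markers become JSON null (checked first: pd.NaT
        # subclasses datetime and would otherwise hit the branch below)
        if any(obj is na for na in _NA_SENTINELS):
            return None
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()  # ISO-8601 string
        if isinstance(obj, Decimal):
            return float(obj)  # may lose precision; str(obj) preserves it
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(obj).decode("ascii")  # binary -> base64 text
        return super().default(obj)
```

Returning base64 for binary data keeps the payload valid JSON; consumers decode it back to bytes on the other side.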
- Python 3.7+
- Access to a Google Cloud project with BigQuery
- A Kafka cluster configured with Quix
- Service account credentials with BigQuery access
- Clone this repository
- Install dependencies:
```
pip install quixstreams google-cloud-bigquery pandas python-dotenv
```

The script uses environment variables for configuration:
- `output`: Kafka topic name to publish data to
- `service_account`: JSON string containing the Google Cloud service account credentials
For local development, you can use a .env file with these variables.
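Reading those two variables could look like the following sketch (the helper name is illustrative; `python-dotenv` can populate the environment from a local `.env` file before this runs):

```python
import os


def load_config() -> dict:
    """Read the connector settings from the environment; during local
    development, python-dotenv's load_dotenv() can populate these from
    a .env file first."""
    return {
        "output": os.environ["output"],                    # Kafka topic name
        "service_account": os.environ["service_account"],  # service-account JSON string
    }
```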
- Set the environment variables or create a `.env` file
- Modify the SQL query in the `read_data()` function to match your BigQuery table
- Run the script:

```
python bigquery_to_kafka.py
```

- The script establishes a connection to BigQuery using the provided service account credentials
- It executes the SQL query to fetch data from the specified table
- Each row is converted to a JSON-serializable format with custom handling for special data types
- The data is serialized and published to the Kafka topic
- Error handling captures and logs any serialization issues without stopping the entire process
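The steps above can be sketched end to end as follows. This is a simplified outline under stated assumptions: the placeholder SQL and table name must be edited, the Quix Streams `Application` reads broker settings from its environment, and `row_to_message` stands in for the connector's fuller custom serialization:

```python
import json
import os


def row_to_message(row: dict) -> bytes:
    """Serialize one row for Kafka; default=str is a blunt fallback for
    anything json cannot handle natively."""
    return json.dumps(row, default=str).encode("utf-8")


def read_data(client):
    """Fetch rows from BigQuery as plain dicts. The query below is a
    placeholder -- edit it to match your table."""
    query = "SELECT * FROM `my_project.my_dataset.my_table`"
    for row in client.query(query).result():
        yield dict(row.items())


def main():
    # Third-party imports kept local so the helpers above stay testable
    from google.cloud import bigquery
    from quixstreams import Application

    client = bigquery.Client.from_service_account_info(
        json.loads(os.environ["service_account"])
    )
    app = Application()  # broker settings come from the Quix environment
    topic = app.topic(os.environ["output"])
    with app.get_producer() as producer:
        for row in read_data(client):
            producer.produce(topic=topic.name, value=row_to_message(row))
```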
The script includes two custom classes for data serialization:
- `CustomJSONEncoder`: extends the standard JSON encoder to handle BigQuery-specific data types
- `BigQuerySerializer`: handles pre-processing of data before serialization
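The pre-processing step might look like this sketch (class and method names mirror the description above, but the normalization rules shown are assumptions, not the connector's exact logic):

```python
import math


class BigQuerySerializer:
    """Illustrative pre-processor: normalizes a row dict before it is
    handed to the JSON encoder."""

    def serialize(self, row: dict) -> dict:
        clean = {}
        for key, value in row.items():
            # NaN floats (pandas' representation of missing numerics)
            # become JSON null alongside ordinary None values
            if value is None or (isinstance(value, float) and math.isnan(value)):
                clean[key] = None
            else:
                clean[key] = value
        return clean
```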
The script includes comprehensive error handling that:
- Catches exceptions during serialization
- Logs problematic data with detailed information
- Continues processing other rows when errors occur
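That behavior — log the bad row, keep going — can be sketched as (function name and log format are illustrative):

```python
import json
import logging

logger = logging.getLogger("bigquery_to_kafka")


def serialize_rows(rows):
    """Serialize each row to JSON bytes; rows that fail are logged with
    their index and contents, and processing continues."""
    messages = []
    for i, row in enumerate(rows):
        try:
            messages.append(json.dumps(row).encode("utf-8"))
        except (TypeError, ValueError) as exc:
            logger.error("Skipping row %d (%r): %s", i, row, exc)
    return messages
```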
Feel free to submit issues or pull requests for improvements to the connector.
This project is open source under the Apache 2.0 license.