Skip to content

QuDET is Python package for data engineering that bridges classical data pipelines with quantum computing.

License

Notifications You must be signed in to change notification settings

satwiksps/qudet-dev

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

QuDET: Quantum Data Engineering Toolkit

QuDET is an enterprise-grade Python library designed to bridge the gap between classical data engineering and quantum machine learning. It provides a robust, modular framework for building hybrid workflows, enabling researchers and engineers to integrate quantum algorithms into production pipelines without deep quantum physics expertise.

Table of Contents

Core Modules

QuDET is built around six specialized modules that handle every stage of the quantum data lifecycle, from ingestion to governance.

1. Connectors

The Bridge to Data Sources This module handles the efficient ingestion of classical data into the quantum pipeline. It supports high-performance loading from modern data formats like Parquet and SQL databases, offering features tailored for large-scale data engineering such as streaming buffers and batch processing. It abstracts away the complexity of getting data ready for quantum conversion.

2. Transforms

Pre-Quantum Feature Engineering Before data can touch a quantum circuit, it must be rigorously prepared. The Transforms module provides quantum-aware feature engineering tools. This includes dimensionality reduction (critical for NISQ devices with limited qubits), feature scaling, and outlier detection. It ensures that the information fed into quantum states is dense, relevant, and normalized.

3. Encoders

Classical-to-Quantum Conversion This is the heart of the "Quantum Data" process. Encoders translate classical numerical data into quantum states. The module offers various strategies—like Amplitude Encoding for compression or Angle Encoding for noise resilience—allowing engineers to choose the best tradeoff between circuit depth and expressibility for their specific dataset.

4. Analytics

Quantum Machine Learning Models The Analytics module contains the actual quantum algorithms used for insight generation. It provides familiar, Scikit-Learn compatible estimators for Classification (QSVC), Regression, and Clustering. These models leverage quantum kernels and variational circuits to find patterns in high-dimensional Hilbert spaces that classical models might miss.

5. Compute

Execution & Resource Management Managing quantum hardware interactions is complex. The Compute layer abstracts this by handling backend connections (to simulators or real QPUs like IBM Quantum), circuit optimization, and distributed processing. It ensures that quantum jobs are executed efficiently, whether you are running a test on a laptop or a massive batch job on a cluster.

6. Governance

Safety, Cost, & Reliability Unique to QuDET, this module addresses the operational challenges of "QuantumOps". It includes tools for:

  • Drift Detection: Monitoring if the data distribution shifts using quantum kernels.
  • Cost Estimation: Predicting the financial cost of running jobs on QPU providers.
  • Integrity Checks: Verifying that data wasn't corrupted during the encoding process.
  • Audit Logging: keeping a record of quantum job execution for compliance.

Installation

git clone https://github.com/meow/quantum-data-engineering.git
cd quantum-data-engineering
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Quick Start

QuDET is designed to be intuitive. Here is a minimal example setting up a simple pipeline:

from qudet.connectors import QuantumDataLoader
from qudet.transforms import QuantumPCA
from qudet.encoders import AngleEncoder
from qudet.analytics import QuantumSVC
from qudet.compute import BackendManager
from qudet.governance import QuantumDriftDetector

# 1. Pipeline Definition
# Load -> Check -> Reduce -> Encode -> Classify
loader = QuantumDataLoader()
drift_detector = QuantumDriftDetector()
pca = QuantumPCA(n_components=4)
encoder = AngleEncoder(n_qubits=4)

# Get backend from Compute module
backend = BackendManager.get_backend("simulator")
classifier = QuantumSVC(n_qubits=4) # Uses simulator by default

# 2. Execution Flow
data = loader.load_csv("data.csv")

# Governance: Check for data drift before processing
drift_detector.fit_reference(data) 

reduced_data = pca.fit_transform(data)
classifier.fit(reduced_data, labels)

print("Pipeline executed successfully.")

Contributing

We welcome contributions! Please fork the repository, create a feature branch, and submit a pull request. Ensure all new code is covered by tests.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

About

QuDET is Python package for data engineering that bridges classical data pipelines with quantum computing.

Topics

Resources

License

Stars

Watchers

Forks

Languages