A comprehensive Python toolkit for exporting, cleaning, processing, and preparing time series data from InfluxDB for machine learning, primarily designed for working with water meter data.
- 🔄 Export Data: Query and export data from InfluxDB with customizable parameters
- 🧹 Clean Data: Interactive data cleaning tool with options to remove columns, filter values, and rename columns
- 🕒 Reformat Timestamps: Convert timestamps to different timezones and formats, with options to keep only time components
- 📊 Data Visualization: Preview data and statistics directly in the console
- 🔒 Secure Credentials: Store InfluxDB credentials securely, protected from Git
- 🏷️ Event Labeling: Interactive event labeling system for past water consumption data
- Python 3.6+
- pandas
- influxdb-client
- pytz
- matplotlib
- scikit-learn
- numpy
- Clone this repository
- Install the required dependencies:
pip install -r requirements.txt
Run the script with:
python influx_data_toolkit.py
The interactive menu will guide you through the available options:
- Export data from InfluxDB
- Clean existing CSV data for machine learning
- Reformat timestamps and adjust timezone
- Launch event labeler tool
- Exit program
The toolkit provides powerful data cleaning capabilities:
- Remove columns from datasets
- Filter data based on column values (equals, less than, greater than)
- Rename individual columns or multiple columns at once
- View summary statistics of your data
- Preview data before and after operations
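The operations above map closely onto standard pandas calls. The following is a minimal sketch of what a cleaning pass might look like, not the toolkit's actual implementation; the sample data and the column names (`_time`, `_value`, `result`, `flow_l_per_min`) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "_time": ["2024-01-01T00:00:00Z", "2024-01-01T00:01:00Z", "2024-01-01T00:02:00Z"],
    "_value": [0.0, 2.5, 7.0],
    "result": ["_result", "_result", "_result"],
})

# Remove a column that carries no information for the model.
cleaned = df.drop(columns=["result"])

# Filter rows on a column value (here: greater than 1.0).
cleaned = cleaned[cleaned["_value"] > 1.0]

# Rename several columns at once.
cleaned = cleaned.rename(columns={"_time": "timestamp", "_value": "flow_l_per_min"})

# Preview the result and print summary statistics.
print(cleaned.head())
print(cleaned.describe())
```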
The timestamp formatter supports several operations:
- Convert timestamps between different timezones
- Remove date components (keep only time values)
- Combine both operations at once
- Clear naming convention for processed files (_time_only, _tz_converted, etc.)
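As a rough illustration of the two operations, the sketch below converts UTC timestamps to a target timezone and then keeps only the time component, using pandas. The column names and the `Europe/Berlin` target zone are assumptions; the toolkit's own code may work differently.

```python
import pandas as pd

# Hypothetical export with UTC timestamps; column names are illustrative.
df = pd.DataFrame({
    "timestamp": ["2024-01-01T12:00:00Z", "2024-07-01T12:00:00Z"],
    "value": [1.0, 2.0],
})

# Parse the strings as timezone-aware UTC timestamps.
ts = pd.to_datetime(df["timestamp"], utc=True)

# Convert to the target timezone (assumed here to be Europe/Berlin).
local = ts.dt.tz_convert("Europe/Berlin")

# Variant 1: keep full timestamps in the new timezone.
tz_converted = df.assign(timestamp=local)

# Variant 2: drop the date and keep only the local time of day.
time_only = df.assign(timestamp=local.dt.strftime("%H:%M:%S"))

print(tz_converted)
print(time_only)
```

Converting before stripping the date matters: the UTC offset depends on the date (daylight saving time), so the same UTC clock time can map to different local times in winter and summer.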
The event labeler tool provides an interactive interface for labeling water consumption events in your time series data:
python event_labeler_launcher.py [optional_csv_file]
Features include:
- Interactive visualization of water consumption data
- Configurable rules for event detection
- Manual labeling of water consumption events
- Export of labeled datasets for further analysis in Google Colab
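To make the idea of a configurable detection rule concrete, here is a minimal sketch of one possible rule: an event is any run of consecutive samples whose flow exceeds a threshold and that lasts at least a minimum number of samples. The function `label_events` and its parameters `threshold` and `min_samples` are hypothetical and are not taken from the event labeler itself.

```python
import pandas as pd


def label_events(flow: pd.Series, threshold: float = 0.5, min_samples: int = 2) -> pd.Series:
    """Return an integer event id per sample (0 means no event).

    Hypothetical rule: a run of consecutive samples above `threshold`
    counts as an event if it spans at least `min_samples` samples.
    """
    active = flow > threshold
    # A new run starts wherever the active flag changes value.
    run_id = (active != active.shift()).cumsum()
    # Length of the run that each sample belongs to.
    run_len = run_id.map(run_id.value_counts())
    is_event = active & (run_len >= min_samples)
    # Number the events in order of appearance.
    starts = is_event & ~is_event.shift(fill_value=False)
    return starts.cumsum().where(is_event, 0).astype(int)


# Illustrative flow readings (units assumed, e.g. litres per minute).
flow = pd.Series([0.0, 0.8, 1.2, 0.0, 0.9, 0.0, 1.5, 1.4, 1.1, 0.0])
labels = label_events(flow, threshold=0.5, min_samples=2)
print(labels.tolist())
```

In this example the single reading of 0.9 is ignored because it is shorter than `min_samples`, while the two longer runs become events 1 and 2. Labels produced this way could then be reviewed and corrected manually.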
A typical end-to-end workflow:
- Export data from InfluxDB (providing credentials if needed)
- Preview the exported data
- Reformat timestamps to your local timezone
- Clean the data by removing unnecessary columns or filtering values
- Use the event labeler to identify and label water consumption events
- Export labeled data for use with machine learning models in Google Colab
This project is licensed under the MIT License - see the LICENSE file for details.