DATAMIMIC Performance Report is a CLI tool designed to run and analyze performance tests for DATAMIMIC operations. The project allows you to execute synthetic data generation scripts (XML based) under different configurations, measure key performance metrics (elapsed time, memory usage, throughput), and generate consolidated HTML reports.
This project is aimed at providing an end-to-end performance testing solution for DATAMIMIC operations. It supports:
- Executing various XML test scripts that mimic real data scenarios.
- Configuring tests (e.g., record counts, exporters, process counts) via command-line options or a YAML configuration file.
- Automatically installing specific versions of the
DATAMIMIC-CE
component from PyPI (or TestPyPI) as part of the test runs. - Capturing performance metrics like elapsed execution time and peak memory usage.
- Aggregating collected data to generate comprehensive HTML reports for throughput and memory consumption analysis.
- Performance Testing: Run tests over multiple configurations such as different record counts, process counts, and DATAMIMIC-CE versions.
- Flexible Configuration: Override default parameters via a YAML configuration file or environment variables (e.g.,
DATAMIMIC_VERSIONS
). - Automated Installation: Dynamically install specified versions of
DATAMIMIC-CE
before running a test. - Detailed Reporting: Generate consolidated HTML reports that include both raw and interpolated performance data.
- Modular & Extensible: Built with an emphasis on maintainability and performance. Core functionality is organized into separate modules for CLI, performance measurement, data processing, and report generation.
- Python: 3.11 or newer
- Dependencies:
typer-cli
(>= 0.15.1)psutil
(>= 5.9.0)loguru
(>= 0.6.0)jinja2
(>= 3.1.5)matplotlib
(>= 3.10.0)pip
(>= 25.0.1)setuptools
(>= 75.8.0)wheel
(>= 0.42.0)
Refer to the pyproject.toml file for the complete dependency list and project configuration.
-
Clone the Repository:
git clone https://your-repo-url.git cd DATAMIMIC-performance-report
-
(Optional) Create and Activate a Virtual Environment:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install Dependencies using uv sync:
uv sync install
The CLI entry point is defined in app/main.py
and is exposed as the datamimic-perf
command (as set in pyproject.toml
).
Run performance tests using the run
command. This command accepts several options:
--config
: Path to a YAML configuration file (to override default parameters).--counts
: List of record counts (default:[100000, 200000, 300000]
).--exporters
: List of exporters (default:["CSV"]
).--num-processes
: List of process counts (default:[1, 4]
).--versions
: List ofDATAMIMIC-ce
versions to test (default:["1.2.1.dev2", "1.2.2.dev23"]
).--iterations
: Number of iterations per configuration (default:1
).--script
: (Optional) Specific script filename to test (e.g.,test_datetime.xml
).
Example command:
datamimic-perf run --config config.yaml --counts 100000 200000 300000 --exporters CSV --num-processes 1 4 --versions 1.2.1.dev2 1.2.2.dev23 --iterations 1 --script test_datetime.xml
This command will:
- Parse JSON result files available in the
results
directory. - Aggregate performance metrics by version.
- Produce an HTML report with data visualizations on throughput and memory consumption.
.
├── pyproject.toml # Project configuration and dependency management.
├── README.md # This file.
└── app
├── main.py # CLI entry point using Typer.
├── report_components # Modules for data processing and report generation.
│ ├── data_processor.py # Processes throughput and memory data.
│ ├── interpolation.py # Provides interpolation utility functions.
│ └── report_generator.py # Generates the consolidated HTML report.
├── scripts # XML test scripts representing various data generation tests.
│ ├── test_datetime.xml
│ ├── test_entities.xml
│ └── test_entity_file.xml
└── utils
└── performance_utils.py # Performs performance measurements and test executions.
You can override the default testing parameters using a YAML configuration file. The configuration can include:
counts
: List of record counts.exporters
: List of exporters to use.num_processes
: List of process counts.iterations
: Number of test iterations.versions
: List ofDATAMIMIC-ce
versions to test.
When the --config
option is provided to the run
command, these settings will replace the default parameters. Additionally, the DATAMIMIC_VERSIONS
environment variable (if set) will override any version setting you pass.
Contributions are welcome! Please adhere to the following guidelines:
- Do not remove or reduce existing documentation strings. Improve them only.
- Do not change input/output behaviors of functions unless explicitly required.
- Preserve the existing logic while proposing improvements for better architecture and maintainability.
- When writing unit tests, follow Ian Cooper’s TDD approach:
- Reuse existing unit test utilities.
- Create new ones if needed.
- Ensure that all your changes are well tested.
- Before submitting pull requests, please run all tests to avoid breaking any functionality.
This project is licensed under the MIT License. See the LICENSE file for details.