11# Integration test data
22
3- The `integration-tests/tests/data` directory contains test datasets for use throughout the
4- integration test system.
3+ The `integration-tests/tests/data` directory contains sample datasets for use throughout the
4+ integration test system. Each sample dataset is loaded into the test system through the
5+ `SampleDataset` class.
56
67## Directory structure
78
89When augmenting or changing the contents of `integration-tests/tests/data`, the following rules
910must be observed:
1011
11- 1. Each dataset must be located in its own directory.
12- 2. Each dataset directory must be a direct child of `integration-tests/tests/data`; nested dataset
13- directories are not permitted.
12+ 1. Each sample dataset must be located in its own directory.
13+ 2. Each sample dataset directory must be a direct child of `integration-tests/tests/data`; nested
14+ dataset directories are not permitted.
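
The layout rules above can be checked mechanically. The following is a minimal sketch, not part of the test system: the function name `find_layout_violations` is hypothetical, and it assumes that a `metadata.json` file (required by the dataset contents rules) is what marks a directory as a sample dataset.

```python
from pathlib import Path


def find_layout_violations(data_root: Path) -> list[str]:
    """Return human-readable violations of the directory layout rules."""
    violations = []

    # Assumption: a metadata.json file marks a sample dataset directory.
    # It must sit exactly at <data_root>/<dataset>/metadata.json.
    for metadata_file in sorted(data_root.rglob("metadata.json")):
        relative = metadata_file.relative_to(data_root)
        if len(relative.parts) != 2:
            violations.append(f"{relative.as_posix()}: nested dataset directory")

    # Every direct child directory is expected to be a sample dataset.
    for child in sorted(data_root.iterdir()):
        if child.is_dir() and not (child / "metadata.json").is_file():
            violations.append(f"{child.name}: missing metadata.json")

    return violations
```

Called with the path to `integration-tests/tests/data`, an empty result would indicate that every dataset directory is a direct child and none are nested.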
1415
1516## Dataset contents
1617
17- When adding or changing a dataset directory within `integration-tests/tests/data`, the following
18- rules must be observed:
18+ When adding or changing a sample dataset directory within `integration-tests/tests/data`, the
19+ following rules must be observed:
1920
20- 1. Each dataset directory must contain a subdirectory that holds all log files for the dataset.
21- 2. Each dataset directory must contain a file called `metadata.json`, which must conform to the
22- following schema:
21+ 1. Each sample dataset directory must contain a subdirectory that holds all log files for the
22+ dataset.
23+ 2. Each sample dataset directory must contain a file called `metadata.json`, which must conform to
24+ the following schema:
2325
2426 ```json
2527 {
@@ -40,7 +42,7 @@ rules must be observed:
4042
4143 | Field | Description |
4244 | --- | --- |
43- | `dataset_name` | The name of the dataset directory. |
45+ | `dataset_name` | The name of the sample dataset directory. |
4446 | `unstructured` | `True` if logs are unstructured, else `False`. |
4547 | `timestamp_key` | The authoritative timestamp key, or `null` if there is no such key. |
4648 | `begin_ts` | The earliest timestamp present in the dataset (ms). |
@@ -49,26 +51,26 @@ rules must be observed:
4951 | `file_names` | A list of the files within `logs_subdir`. |
5052 | `single_match_wildcard_query` | A wildcard query that matches exactly one log message in the dataset. |
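
A few of the fields above can be cross-checked against the directory itself. The sketch below is illustrative only: `check_metadata` is a hypothetical helper, and it assumes that `logs_subdir` is a path relative to the dataset directory and that `file_names` lists the plain file names directly inside it. Neither assumption is confirmed by the schema shown here.

```python
import json
from pathlib import Path


def check_metadata(dataset_dir: Path) -> list[str]:
    """Return a list of problems found in `dataset_dir/metadata.json`."""
    problems = []
    metadata = json.loads((dataset_dir / "metadata.json").read_text())

    # `dataset_name` is documented as the name of the dataset directory.
    if metadata.get("dataset_name") != dataset_dir.name:
        problems.append("dataset_name does not match the directory name")

    # Assumption: `logs_subdir` is relative to the dataset directory.
    logs_subdir = metadata.get("logs_subdir")
    if not logs_subdir or not (dataset_dir / logs_subdir).is_dir():
        problems.append("logs_subdir is missing or is not a directory")
        return problems

    # Assumption: `file_names` holds bare file names, in any order.
    on_disk = sorted(p.name for p in (dataset_dir / logs_subdir).iterdir() if p.is_file())
    if sorted(metadata.get("file_names", [])) != on_disk:
        problems.append("file_names does not match the files in logs_subdir")

    return problems
```

An empty list means the checked fields agree with what is on disk; the timestamp and query fields are not verified by this sketch.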
5153
52- ## Accessing datasets within the testing system
54+ ## Accessing sample datasets within the testing system
5355
54- To access a dataset from within the test system, the following rules should be observed:
56+ To access a sample dataset from within the test system, the following rules should be observed:
5557
56- 1. All datasets should have their own session-scoped fixture in
57- `integration-tests/tests/fixtures/datasets.py`. Test code should access this fixture to
58- access the dataset.
59- 2. Each session-scoped dataset fixture should be given the same name as the dataset directory name,
58+ 1. All sample datasets should have their own session-scoped fixture in
59+ `integration-tests/tests/fixtures/sample_datasets.py`. Test code should access this fixture to
60+ access the sample dataset.
61+ 2. Each session-scoped fixture should be given the same name as the sample dataset directory name,
6062 and should conform to the following format:
6163
6264 ```python
6365 @pytest.fixture(scope="session")
6466 def dataset_name(
6567     integration_test_path_config: IntegrationTestPathConfig,
6668 ) -> SampleDataset:
67-     """Returns an object corresponding to the `dataset_name` test dataset."""
69+     """Returns an object corresponding to the `dataset_name` sample dataset."""
6870     return SampleDataset(
6971         dataset_root_dir=integration_test_path_config.test_data_dir / "dataset_name",
7072     )
7173 ```
7274
73- Tests should use dataset fixtures instead of reading the logs directly, because many verification
74- flows rely on dataset metadata.
75+ Tests should use sample dataset fixtures instead of reading the logs directly, because many
76+ verification flows rely on dataset metadata.