Commit 115925a

Adjust data/README.md.
1 parent 95c8f2a commit 115925a

1 file changed

Lines changed: 22 additions & 20 deletions

@@ -1,25 +1,27 @@
 # Integration test data

-The `integration-tests/tests/data` directory contains test datasets for use throughout the
-integration test system.
+The `integration-tests/tests/data` directory contains sample datasets for use throughout the
+integration test system. Each sample dataset is loaded into the test system through the
+`SampleDataset` class.

 ## Directory structure

 When augmenting or changing the contents of `integration-tests/tests/data`, the following rules
 must be observed:

-1. Each dataset must be located in its own directory.
-2. Each dataset directory must be a direct child of `integration-tests/tests/data`; nested dataset
-directories are not permitted.
+1. Each sample dataset must be located in its own directory.
+2. Each sample dataset directory must be a direct child of `integration-tests/tests/data`; nested
+dataset directories are not permitted.

 ## Dataset contents

-When adding or changing a dataset directory within `integration-tests/tests/data`, the following
-rules must be observed:
+When adding or changing a sample dataset directory within `integration-tests/tests/data`, the
+following rules must be observed:

-1. Each dataset directory must contain a subdirectory that holds all log files for the dataset.
-2. Each dataset directory must contain a file called `metadata.json`, which must conform to the
-following schema:
+1. Each sample dataset directory must contain a subdirectory that holds all log files for the
+dataset.
+2. Each sample dataset directory must contain a file called `metadata.json`, which must conform to
+the following schema:

 ```json
 {
@@ -40,7 +42,7 @@ rules must be observed:

 | Field | Description |
 | --- | --- |
-| `dataset_name` | The name of the dataset directory. |
+| `dataset_name` | The name of the sample dataset directory. |
 | `unstructured` | `True` if logs are unstructured, else `False`. |
 | `timestamp_key` | The authoritative timestamp key, or `null` if there is no such key. |
 | `begin_ts` | The earliest timestamp present in the dataset (ms). |
@@ -49,26 +51,26 @@ rules must be observed:
 | `file_names` | A list of the files within `logs_subdir`. |
 | `single_match_wildcard_query` | A wildcard query that matches exactly one log message in the dataset. |

-## Accessing datasets within the testing system
+## Accessing sample datasets within the testing system

-To access a dataset from within the test system, the following rules should be observed:
+To access a sample dataset from within the test system, the following rules should be observed:

-1. All datasets should have their own session-scoped fixture in
-`integration-tests/tests/fixtures/datasets.py`. Test code should access this fixture to
-access the dataset.
-2. Each session-scoped dataset fixture should be given the same name as the dataset directory name,
+1. All sample datasets should have their own session-scoped fixture in
+`integration-tests/tests/fixtures/sample_datasets.py`. Test code should access this fixture to
+access the sample dataset.
+2. Each session-scoped fixture should be given the same name as the sample dataset directory name,
 and should conform to the following format:

 ```python
 @pytest.fixture(scope="session")
 def dataset_name(
     integration_test_path_config: IntegrationTestPathConfig,
 ) -> SampleDataset:
-    """Returns an object corresponding to the `dataset_name` test dataset."""
+    """Returns an object corresponding to the `dataset_name` sample dataset."""
     return SampleDataset(
         dataset_root_dir=integration_test_path_config.test_data_dir / "dataset_name",
     )
 ```

-Tests should use dataset fixtures instead of reading the logs directly, because many verification
-flows rely on dataset metadata.
+Tests should use sample dataset fixtures instead of reading the logs directly, because many
+verification flows rely on dataset metadata.
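
The "Dataset contents" rules in the README above could be checked mechanically before a new sample dataset is committed. The following is a minimal sketch of such a check and is not part of this commit. The function name `find_metadata_problems` and the constant `VISIBLE_METADATA_FIELDS` are hypothetical. The diff elides part of the `metadata.json` schema, so the sketch checks only the fields visible in the table and should not be read as a complete schema validator.

```python
import json
from pathlib import Path

# Fields visible in the README table. The full schema is elided in the diff,
# so this tuple is deliberately incomplete (an assumption of this sketch).
VISIBLE_METADATA_FIELDS = (
    "dataset_name",
    "unstructured",
    "timestamp_key",
    "begin_ts",
    "file_names",
    "single_match_wildcard_query",
)


def find_metadata_problems(dataset_dir: Path) -> list[str]:
    """Return human-readable problems found in one sample dataset directory."""
    name = dataset_dir.name
    metadata_path = dataset_dir / "metadata.json"
    if not metadata_path.is_file():
        return [f"{name}: metadata.json is missing"]

    try:
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        return [f"{name}: metadata.json is not valid JSON ({err})"]
    if not isinstance(metadata, dict):
        return [f"{name}: metadata.json must contain a JSON object"]

    # Report every visible field that is absent from the metadata.
    problems = [
        f"{name}: missing field `{field}`"
        for field in VISIBLE_METADATA_FIELDS
        if field not in metadata
    ]

    # The table describes `dataset_name` as the name of the dataset directory.
    if "dataset_name" in metadata and metadata["dataset_name"] != name:
        problems.append(f"{name}: `dataset_name` does not match the directory name")

    # The table describes `file_names` as a list of files.
    if "file_names" in metadata and not isinstance(metadata["file_names"], list):
        problems.append(f"{name}: `file_names` must be a list")

    return problems
```

A caller might run this over each direct child directory of `integration-tests/tests/data` and fail if any returned list is non-empty.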
