Commit b8f4602

To/from "state" helpers for tmt state files (#4486)
tmt saves its state info (step data, test data, guest data, etc.) in YAML format, as plenty of `*.yaml` files in a workdir. It's fine: it's readable by both machines and humans, and it works. It turns out YAML is not necessarily the best format out there, as YAML can be costly to generate when dealing with large structures and large collections.

Two things happen in this PR:

* To allow some experimentation, the `TMT_STATE_FORMAT` environment variable tells tmt which format to use when storing its data. As of now, only `yaml` is supported.
* "to/from JSON" helpers are refactored to align with recent changes to "to/from YAML" helpers, to provide similar function names, annotations and so on, and they are made available for selection via `TMT_STATE_FORMAT`, making JSON the second supported format for tmt state files. Moar formats, moar fun!

A set of simple helpers, `to_state` and `from_state`, is defined, based on the aforementioned envvar, and the patch changes all places where tmt stores state on disk to use them instead of `{to,from}_yaml`. `Common.{read,write}_state()` will help with that, instead of the `Common.{read,write}()` methods.

Now we can play with different formats, and even if we learn the format is not the culprit behind the inefficient storage of large collections of results, we will have a nice and tidy internal API to work with.
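The format selection described in the commit message can be pictured with a minimal, stdlib-only sketch. It is not tmt's implementation: the `StateFormat` fields, the `FORMATS` registry and the `ValueError` are assumptions made for illustration, and only JSON is registered here to avoid a YAML dependency (tmt itself defaults to YAML).

```python
import json
import os
from typing import Any, Callable, NamedTuple, Optional


class StateFormat(NamedTuple):
    """Hypothetical stand-in for tmt's StateFormat; the fields are assumptions."""

    name: str
    suffix: str
    to_state: Callable[[Any], str]
    from_state: Callable[[str], Any]


# Hypothetical registry. Only JSON is listed to keep the sketch stdlib-only;
# a YAML entry would plug in the same way.
FORMATS = {
    'json': StateFormat('json', '.json', json.dumps, json.loads),
}


def get_state_format(format: Optional[str] = None, default: str = 'json') -> StateFormat:
    """Pick a format: explicit name first, then TMT_STATE_FORMAT, then the default."""

    name = format or os.environ.get('TMT_STATE_FORMAT', default)

    try:
        return FORMATS[name]

    except KeyError:
        raise ValueError(f"Unsupported state format '{name}'.") from None
```

A caller would then serialize with `get_state_format().to_state(data)` and load with `from_state(text)`, without caring which concrete format is behind the helpers.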
1 parent bc009b0 commit b8f4602

19 files changed

Lines changed: 471 additions & 72 deletions

docs/overview.rst

Lines changed: 11 additions & 0 deletions
@@ -620,6 +620,17 @@ TMT_POLICY_ROOT

     __ https://tmt.readthedocs.io/en/stable/spec/policy.html

+TMT_STATE_FORMAT
+    Which format should tmt use for its on-disk storage of various state
+    information like run or step data, or the collection of discovered
+    tests.
+
+    .. note::
+
+        This does not affect user-provided files, like test, plan and
+        story metadata in fmf files, as well as various exports and
+        conversions. It merely affects files tmt uses for its own needs.
+
 .. _step-variables:

 Step Variables

spec/results.fmf

Lines changed: 13 additions & 0 deletions
@@ -3,6 +3,19 @@ story:
     document the format in which test results are saved on storage.

 description: |
+    .. note::
+
+        Name of the file holding results is affected by the choice of
+        the on-disk format of tmt state files. See
+        :ref:`TMT_STATE_FORMAT<command-variables>` environment variable
+        which controls the selection.
+
+        By default, YAML is used, and the file would be called
+        ``results.yaml``. The specification below will stick to this
+        setting, and will describe the format of results in terms of
+        YAML, but the specification is applicable to other formats
+        as well.
+
     The following text defines a YAML file structure tmt uses for storing
     results. tmt itself will use it when saving results of ``execute`` step,
     and custom test results are required to follow it when creating their

tests/execute/result/custom.sh

Lines changed: 2 additions & 2 deletions
@@ -77,7 +77,7 @@ rlJournalStart
     testName="/test/wrong-json-results-file"
     rlPhaseStartTest "${testName}"
         rlRun -s "${tmt_command} ${testName} 2>&1 >/dev/null" 2 "Test provides 'results.json' in valid JSON but wrong results format"
-        rlAssertGrep "Expected list in json data, got 'dict'." $rlRun_LOG
+        rlAssertGrep "Expected list in JSON data, got 'dict'." $rlRun_LOG
     rlPhaseEnd

     testName="/test/invalid-yaml-results-file"
@@ -89,7 +89,7 @@ rlJournalStart
     testName="/test/invalid-json-results-file"
     rlPhaseStartTest "${testName}"
         rlRun -s "${tmt_command} ${testName} 2>&1 >/dev/null" 2 "Test provides 'results.json' not in JSON format"
-        rlAssertGrep "Invalid json syntax" $rlRun_LOG
+        rlAssertGrep "Invalid JSON syntax." $rlRun_LOG
     rlPhaseEnd

     testName="/test/wrong-yaml-content"

tmt/base/core.py

Lines changed: 84 additions & 7 deletions
@@ -80,6 +80,7 @@
     HasRunWorkdir,
     Path,
     ShellScript,
+    StateFormat,
     WorkdirArgumentType,
     normalize_shell_script,
     to_yaml,
@@ -2934,6 +2935,74 @@ def run_workdir(self) -> Path:

         return self.workdir

+    @functools.cached_property
+    def state_format_marker_filepath(self) -> Path:
+        return self.run_workdir / 'state-format'
+
+    @functools.cached_property
+    def state_format(self) -> StateFormat:
+        try:
+            format_name = self.state_format_marker_filepath.read_text().strip()
+
+        except FileNotFoundError:
+            state_format = tmt.utils.get_state_format()
+
+            self.debug(
+                "No state format marker file found,"
+                f" using the default state format '{state_format.name}'."
+            )
+
+            return state_format
+
+        except Exception as exc:
+            raise GeneralError('Failed to read state format marker.') from exc
+
+        state_format = tmt.utils.get_state_format(format=format_name)
+
+        self.debug(
+            f"State format marker file found, using the '{state_format.name}' state format."
+        )
+
+        return state_format
+
+    def read_state(self, filepath: Path) -> Any:
+        """
+        Read a stored state from the given file.
+
+        .. important::
+
+            No deserialization is performed, it is the responsibility of the
+            caller to turn loaded structure, consisting of built-in-like
+            types, into objects of desired classes, e.g. by the power of
+            :py:meth:`tmt.container.SerializableContainer.deserialize`.
+
+        :param filepath: file to read the state from.
+        :returns: stored state as Python data structure.
+        """
+
+        return self.state_format.from_state(
+            self.read_file(Path(f'{filepath}{self.state_format.suffix}'))
+        )
+
+    def write_state(self, filepath: Path, data: Any) -> None:
+        """
+        Write a state into the given file.
+
+        .. important::
+
+            No serialization is performed, it is the responsibility of the
+            caller to turn internal objects into built-in-like Python types,
+            e.g. by the power of
+            :py:meth:`tmt.container.SerializableContainer.serialize`.
+
+        :param filepath: file to write the state into.
+        :param data: state as Python data structure.
+        """
+
+        return self.write_file(
+            Path(f'{filepath}{self.state_format.suffix}'), self.state_format.to_state(data)
+        )
+
     def load_workdir(self, *, with_logfiles: bool = True) -> None:
         """
         Prepare the run workdir and associated.
@@ -3078,16 +3147,21 @@ def save(self) -> None:
         """
         Save list of selected plans and enabled steps
         """
+
+        self.state_format_marker_filepath.unlink(missing_ok=True)
+        self.state_format_marker_filepath.write_text(self.state_format.name)
+
         assert self.tree is not None  # narrow type
         assert self._cli_context_object is not None  # narrow type
+        assert self.workdir is not None  # narrow type
         data = RunData(
             root=str(self.tree.root) if self.tree.root else None,
             plans=[plan.name for plan in self._plans] if self._plans is not None else None,
             steps=list(self._cli_context_object.steps),
             environment=self.environment,
             remove=self.remove,
         )
-        self.write(Path('run.yaml'), to_yaml(data.to_serialized()))
+        self.write_state(self.workdir / 'run', data.to_serialized())

     def load_from_workdir(self) -> None:
         """
@@ -3105,10 +3179,12 @@ def load_from_workdir(self) -> None:
         # which in turn is only called by status and clean, both cases where we do not want
         # to attach the logfile loggers to.
         self.load_workdir(with_logfiles=False)
+
+        assert self.workdir is not None  # narrow type
+
         try:
-            self.data = RunData.from_serialized(
-                tmt.utils.yaml_to_dict(self.read(Path('run.yaml')))
-            )
+            self.data = RunData.from_serialized(self.read_state(self.workdir / 'run'))
+
         except tmt.utils.FileError:
             self.debug('Run data not found.')
             return
@@ -3137,10 +3213,11 @@ def load(self) -> None:
         """
         from tmt.base.plan import Plan

+        assert self.workdir is not None  # narrow type
+
         try:
-            self.data = RunData.from_serialized(
-                tmt.utils.yaml_to_dict(self.read(Path('run.yaml')))
-            )
+            self.data = RunData.from_serialized(self.read_state(self.workdir / 'run'))
+
         except tmt.utils.FileError:
             self.debug('Run data not found.')
             return
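The interplay of the `state-format` marker file and the suffix-less paths passed to `read_state()` and `write_state()` in the diff above can be sketched roughly as follows. This is a simplified, stdlib-only illustration, not tmt code: the module-level functions, the fixed `SUFFIX` and the JSON default are assumptions, whereas in the diff the suffix comes from the run's `state_format` and the default from `tmt.utils.get_state_format()`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Assumption: the selected format is JSON, so state files end with '.json'.
SUFFIX = '.json'


def resolve_format_name(run_workdir: Path, default: str = 'json') -> str:
    """Prefer the format recorded in the run workdir, else fall back to the default."""

    marker = run_workdir / 'state-format'

    try:
        return marker.read_text().strip()

    except FileNotFoundError:
        return default


def write_state(filepath: Path, data: Any) -> None:
    """Callers pass the path without a suffix; the format's suffix is appended here."""

    Path(f'{filepath}{SUFFIX}').write_text(json.dumps(data))


def read_state(filepath: Path) -> Any:
    """Read back built-in types only; turning them into objects is left to the caller."""

    return json.loads(Path(f'{filepath}{SUFFIX}').read_text())


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)

        write_state(workdir / 'run', {'plans': ['/plan/basic']})
        print(sorted(path.name for path in workdir.iterdir()))
        print(read_state(workdir / 'run'))
```

Recording the format name inside the run workdir is what lets a later invocation read an existing run with the format it was written in, even if `TMT_STATE_FORMAT` has changed in the meantime.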

tmt/cli/about.py

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,5 @@
 """``tmt about`` implementation"""

-import json
 import re
 from typing import Any

@@ -73,7 +72,9 @@ def _ls(context: Context, how: str, content: Any) -> None:
         )

     elif how in ('json', 'yaml'):
-        context.obj.print(json.dumps(content) if how == 'json' else tmt.utils.to_yaml(content))
+        context.obj.print(
+            tmt.utils.to_json(content) if how == 'json' else tmt.utils.to_yaml(content)
+        )


 @about.group(invoke_without_command=True, cls=CustomGroup)

tmt/export/_json.py

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,3 @@
-import json
-
 import tmt.base.core
 import tmt.base.plan
 import tmt.export
@@ -12,4 +10,4 @@
 class JSONExporter(tmt.export.TrivialExporter):
     @classmethod
     def _export(cls, data: tmt.export._RawExported) -> str:
-        return json.dumps(data)
+        return tmt.utils.to_json(data)

tmt/frameworks/beakerlib.py

Lines changed: 1 addition & 1 deletion
@@ -174,7 +174,7 @@ def extract_results(
         beakerlib_results_filepath = invocation.path / 'TestResults'

         try:
-            beakerlib_results = invocation.phase.read(beakerlib_results_filepath, level=3)
+            beakerlib_results = invocation.phase.read(beakerlib_results_filepath, debug_level=3)
         except tmt.utils.FileError:
             logger.debug(f"Unable to read '{beakerlib_results_filepath}'.", level=3)
             note.append('beakerlib: TestResults FileError')

tmt/package_managers/bootc.py

Lines changed: 1 addition & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -12,14 +12,7 @@
1212
PackageManagerEngine,
1313
provides_package_manager,
1414
)
15-
from tmt.utils import (
16-
Command,
17-
CommandOutput,
18-
GeneralError,
19-
Path,
20-
RunError,
21-
ShellScript,
22-
)
15+
from tmt.utils import Command, CommandOutput, GeneralError, Path, RunError, ShellScript
2316

2417
LOCALHOST_BOOTC_IMAGE_PREFIX = "localhost/tmt"
2518

tmt/steps/__init__.py

Lines changed: 18 additions & 5 deletions
@@ -598,7 +598,12 @@ def _preserved_workdir_members(self) -> set[str]:
         A set of members of the step workdir that should not be removed during pruning.
         """

-        return {'step.yaml'}
+        members: set[str] = set()
+
+        if self.plan.my_run:
+            members = {*members, f'step{self.plan.my_run.state_format.suffix}'}
+
+        return members

     def _check_duplicate_names(self, raw_data: list[_RawStepData]) -> None:
         """
@@ -843,7 +848,9 @@ def load(self) -> None:
         """

         try:
-            raw_step_data: dict[Any, Any] = tmt.utils.yaml_to_dict(self.read(Path('step.yaml')))
+            assert self.plan.my_run is not None  # narrow type
+
+            raw_step_data: dict[Any, Any] = self.plan.my_run.read_state(self.step_workdir / 'step')

         except tmt.utils.GeneralError:
             self.debug('Step data not found.', level=2)
@@ -871,7 +878,9 @@ def save(self) -> None:
             'status': self.status(),
             'data': [datum.to_serialized() for datum in self.data],
         }
-        self.write(Path('step.yaml'), tmt.utils.to_yaml(content))
+
+        assert self.plan.my_run is not None  # narrow type
+        self.plan.my_run.write_state(self.step_workdir / 'step', content)

     def _load_results(
         self,
@@ -882,8 +891,10 @@ def _load_results(
         Load results of this step from the workdir
         """

+        assert self.plan.my_run is not None  # narrow type
+
         try:
-            raw_results: list[Any] = tmt.utils.yaml_to_list(self.read(Path('results.yaml')))
+            raw_results: list[Any] = self.plan.my_run.read_state(self.step_workdir / 'results')

             return [result_class.from_serialized(raw_result) for raw_result in raw_results]

@@ -902,10 +913,12 @@ def _save_results(self, results: Sequence['BaseResult']) -> None:
         Save results of this step to the workdir
         """

+        assert self.plan.my_run is not None  # narrow type
+
         try:
             raw_results = [result.to_serialized() for result in results]

-            self.write(Path('results.yaml'), tmt.utils.to_yaml(raw_results))
+            self.plan.my_run.write_state(self.step_workdir / 'results', raw_results)

         except Exception as exc:
             raise GeneralError('Cannot save step results.') from exc
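Since state file names are no longer hard-coded, the set of workdir members preserved during pruning has to be derived from the active format's suffix. The idea can be sketched as below; `preserved_members` is a hypothetical helper written for this illustration, not a tmt function, and `None` stands in for the case where no run is attached to the plan.

```python
from typing import Optional


def preserved_members(suffix: Optional[str], *basenames: str) -> set[str]:
    """Build names of state files to keep from base names and the format suffix."""

    # Assumption: without a run there is no state format to consult,
    # so no file names can be derived and nothing is listed.
    if suffix is None:
        return set()

    return {f'{basename}{suffix}' for basename in basenames}
```

With a `.json` suffix, `preserved_members('.json', 'step', 'results')` yields `step.json` and `results.json`, matching the names `write_state()` would produce for the same base paths.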

tmt/steps/cleanup/__init__.py

Lines changed: 8 additions & 1 deletion
@@ -77,7 +77,14 @@ def _preserved_workdir_members(self) -> set[str]:
         A set of members of the step workdir that should not be removed.
         """

-        return {*super()._preserved_workdir_members, 'results.yaml'}
+        members = {
+            *super()._preserved_workdir_members,
+        }
+
+        if self.plan.my_run:
+            members = {*members, f'results{self.plan.my_run.state_format.suffix}'}
+
+        return members

     def wake(self) -> None:
         """
