Commit fc1c411

improve yaml schema validation error message (#32870)
1 parent ffd7d02 commit fc1c411

3 files changed: +9 -4 lines changed

CHANGES.md

Lines changed: 1 addition & 0 deletions
@@ -130,6 +130,7 @@
 * (Java) Fix BigQuery Storage Write compatibility with Avro 1.8 ([#34281](https://github.com/apache/beam/pull/34281)).
 * Fixed checkpoint recovery and streaming behavior in Spark Classic and Portable runner's Flatten transform by replacing queueStream with SingleEmitInputDStream ([#34080](https://github.com/apache/beam/pull/34080), [#18144](https://github.com/apache/beam/issues/18144), [#20426](https://github.com/apache/beam/issues/20426))
 * (Java) Fixed Read caching of UnboundedReader objects to effectively cache across multiple DoFns and avoid checkpointing unstarted reader. [#34146](https://github.com/apache/beam/pull/34146) [#33901](https://github.com/apache/beam/pull/33901)
+* (YAML) Improved YAML schema validation error message to include the key name ([#32870](https://github.com/apache/beam/issues/32870)).
 
 ## Known Issues
 

sdks/python/apache_beam/yaml/yaml_transform.py

Lines changed: 6 additions & 3 deletions
@@ -70,21 +70,24 @@ def pipeline_schema(strictness):
   return pipeline_schema
 
 
-def _closest_line(o, path):
+def _closest_line_and_key(o, path):
+  best_key = '<root>'
   best_line = SafeLineLoader.get_line(o)
   for step in path:
     o = o[step]
     maybe_line = SafeLineLoader.get_line(o)
+    best_key = step
     if maybe_line != 'unknown':
       best_line = maybe_line
-  return best_line
+  return best_line, best_key
 
 
 def validate_against_schema(pipeline, strictness):
   try:
     jsonschema.validate(pipeline, pipeline_schema(strictness))
   except jsonschema.ValidationError as exn:
-    exn.message += f" around line {_closest_line(pipeline, exn.path)}"
+    line, key =_closest_line_and_key(pipeline, exn.path)
+    exn.message = f"Error found on key '{key}' around line {line}. Cause : {exn.message}."
     raise exn
 
 
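To illustrate what the new helper returns, here is a minimal, self-contained sketch of the same path walk. It is not the Beam implementation: `LineDict`, `get_line`, and `format_message` are hypothetical stand-ins (the real code uses `SafeLineLoader.get_line` and a `jsonschema.ValidationError`), and the sample pipeline and cause text are invented for the example.

```python
class LineDict(dict):
    """Hypothetical stand-in: a dict tagged with the source line it was parsed from."""

    def __init__(self, line, items):
        super().__init__(items)
        self.line = line


def get_line(obj):
    # Stand-in for SafeLineLoader.get_line: objects without a recorded
    # line report 'unknown', as the diff's comparison implies.
    return getattr(obj, "line", "unknown")


def closest_line_and_key(o, path):
    # Walk the error path, remembering the last key visited and the
    # last line number that was actually known.
    best_key = "<root>"
    best_line = get_line(o)
    for step in path:
        o = o[step]
        best_key = step
        maybe_line = get_line(o)
        if maybe_line != "unknown":
            best_line = maybe_line
    return best_line, best_key


def format_message(pipeline, path, cause):
    # Mirrors the message template introduced by the commit.
    line, key = closest_line_and_key(pipeline, path)
    return f"Error found on key '{key}' around line {line}. Cause : {cause}."


# Invented example: `source` is a list, not a single transform object.
pipeline = LineDict(1, {
    "pipeline": LineDict(2, {
        "source": [LineDict(4, {"type": "ReadFromCsv"})],
    }),
})

print(format_message(pipeline, ["pipeline", "source"], "value is not of type 'object'"))
```

Because the list under `source` carries no line information in this sketch, the reported line falls back to the nearest enclosing object that has one, while the key is always the last step of the path. An empty path yields the `'<root>'` key.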

website/www/site/content/en/documentation/sdks/yaml.md

Lines changed: 2 additions & 1 deletion
@@ -277,7 +277,8 @@ pipeline:
 
 As syntactic sugar, you can name the first and last transforms in your pipeline
 as `source` and `sink`. This convention does not change the resulting pipeline,
-but it signals the intent of the source and sink transforms.
+but it signals the intent of the source and sink transforms. Note that `source`
+and `sink` each require a single transform definition (as a YAML object).
 
 ```
 pipeline:
