
Enhanced inventory table #1

@mansenfranzen

The inventory table maintains the state of processed data. It is the backend table of pybatchintory and must support all mandatory functionality, including:

  • relating consecutive jobs via an identifiable job key (compound key of job_name and batch_meta_table)
  • retrying failed batches (processing_status, processing_attempt)
  • transparency and observability (job_config, batch_*)
  • recursive use of incremental/backfill processing on data items already contained in the inventory table, by providing arbitrary data items via processing_result_item and processing_result_weight
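To make the first two requirements concrete, here is a minimal, framework-free sketch of how the job key and the retry columns might be used. The helper names (`InventoryRow`, `related_rows`, `next_batch_start`, `retry_candidates`), the `max_attempts` limit, and the assumption that meta table ids start at 1 are all hypothetical and not part of this issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InventoryRow:
    # Subset of the proposed inventory columns, enough for this illustration.
    job_name: str
    batch_meta_table: str
    batch_id_start: int
    batch_id_end: int
    processing_status: str = "running"
    processing_attempt: int = 1


def related_rows(rows: List[InventoryRow], job_name: str,
                 batch_meta_table: str) -> List[InventoryRow]:
    """Select rows sharing the compound key (job_name, batch_meta_table)."""
    return [
        r for r in rows
        if r.job_name == job_name and r.batch_meta_table == batch_meta_table
    ]


def next_batch_start(rows: List[InventoryRow], job_name: str,
                     batch_meta_table: str) -> int:
    """Return the first meta table id after the last succeeded batch.

    Assumption: ids start at 1, so a job without history starts there.
    """
    ends = [
        r.batch_id_end
        for r in related_rows(rows, job_name, batch_meta_table)
        if r.processing_status == "succeeded"
    ]
    return max(ends, default=0) + 1


def retry_candidates(rows: List[InventoryRow], job_name: str,
                     batch_meta_table: str,
                     max_attempts: int = 3) -> List[InventoryRow]:
    """Return failed batches that have not exhausted max_attempts (hypothetical limit)."""
    return [
        r for r in related_rows(rows, job_name, batch_meta_table)
        if r.processing_status == "failed" and r.processing_attempt < max_attempts
    ]
```

In this sketch, two jobs with the same job_name but different batch_meta_table values are treated as unrelated, which is the reason for the compound key.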

Schema

| Column Name | Description | Type | Constraints |
| --- | --- | --- | --- |
| id | Provide unique row identifier | Integer | primary_key=True, autoincrement=True |
| job_name | Define the name of the job. batch_meta_table and job_name constitute the key for consecutive, related jobs. | String | nullable=False, index=True |
| job_identifier | Store job identifier provided by processing framework for transparency/observability. | String | |
| job_config | Store complete job configuration for transparency/observability. | JSON | nullable=False |
| batch_meta_table | Reference meta table which is used to generate batches of work. | String | nullable=False, index=True |
| batch_id_start | Store the start id of the meta data table defining the lower bound of the generated batch. | Integer | nullable=False |
| batch_id_end | Store the end id of the meta data table defining the upper bound of the generated batch. | Integer | nullable=False |
| batch_weight | Store the accumulated weight of all data items being contained in the given batch. | Float | |
| batch_count | Store the number of data items being contained in the given batch. | Integer | nullable=False |
| processing_start | Store the timestamp when the batch was acquired or processing started. | DateTime | nullable=False, default=TS |
| processing_end | Store the timestamp when the batch was released or processing ended. | DateTime | |
| processing_attempt | Store the number of processing attempts. | Integer | nullable=False, default=1 |
| processing_status | Store the status of the batch (running, succeeded, failed) | Enum | nullable=False, default="running" |
| processing_logging | Store additional logging provided by the processing framework for transparency/observability. | String | |
| processing_result_item | Store batch result provided by processing framework. This can be leveraged for recursive usage of inventory table. | JSON | |
| processing_result_weight | Store result weight provided by processing framework. This can be leveraged for recursive usage of inventory table. | Float | |

Implementation

  • SQLAlchemy Core table
  • Configurable schema and table name
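
The two implementation points could be combined in a small factory function, sketched below against the schema above. The function name `make_inventory_table`, the default table name `"inventory"`, and the Enum type name are hypothetical. The schema's `default=TS` is not defined in this issue, so `func.now()` stands in for it here as an assumption.

```python
from sqlalchemy import (
    JSON, Column, DateTime, Enum, Float, Integer, MetaData, String, Table,
    create_engine, func,
)


def make_inventory_table(metadata, table_name="inventory", schema=None):
    """Build the inventory table as a SQLAlchemy Core Table.

    table_name and schema are parameters so that both stay configurable.
    """
    return Table(
        table_name,
        metadata,
        Column("id", Integer, primary_key=True, autoincrement=True),
        Column("job_name", String, nullable=False, index=True),
        Column("job_identifier", String),
        Column("job_config", JSON, nullable=False),
        Column("batch_meta_table", String, nullable=False, index=True),
        Column("batch_id_start", Integer, nullable=False),
        Column("batch_id_end", Integer, nullable=False),
        Column("batch_weight", Float),
        Column("batch_count", Integer, nullable=False),
        # Assumption: "TS" in the schema means the current timestamp.
        Column("processing_start", DateTime, nullable=False, default=func.now()),
        Column("processing_end", DateTime),
        Column("processing_attempt", Integer, nullable=False, default=1),
        Column(
            "processing_status",
            Enum("running", "succeeded", "failed", name="processing_status"),
            nullable=False,
            default="running",
        ),
        Column("processing_logging", String),
        Column("processing_result_item", JSON),
        Column("processing_result_weight", Float),
        schema=schema,
    )


# Usage: create the table under a custom name in an in-memory SQLite database.
engine = create_engine("sqlite://")
metadata = MetaData()
inventory = make_inventory_table(metadata, table_name="my_inventory")
metadata.create_all(engine)
```

Passing `schema="etl"` would qualify the table as `etl.inventory` on backends that support schemas; SQLite is used here only to keep the sketch self-contained.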
