Status service: provide archive status without archiver #71

@qubicmio

Description

What? Why? Who?

As an operator, I would like to run the query infrastructure (status service + query services) without an archiver, because the archiver status might not be up to date and running an archiver is not feasible on every machine.

Acceptance Criteria

  • The status service can run without an archiver and remains compatible with the query services.
  • The current tick interval information (including the last processed tick) is stored in a central place from which the status service can retrieve it.
  • On machines where the archiver is running, there is still a way to compare the local archiver data against the archived data in Elasticsearch.

Scope

  • Modify ingestion pipeline.
  • Modify status generation.
  • Keep or improve archiver data verification.

Out of Scope

  • Changes to the query services (should not be necessary; open a separate issue if needed).

Technical Sketch (How?)

Provide archive status information to query services

The status service provides important information about the archive status (the last fully processed tick) to the query services.

Currently, it sources the status data partly from the Elasticsearch repository and partly (current epoch tick information) from the archiver.

This should be changed so that the status service can run without an archiver. To achieve this, we need to publish the current epoch's tick interval (initial tick, last processed tick, epoch) to a central location.
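A minimal sketch of what publishing the current epoch's tick interval could look like, assuming a Redis-like key-value store. `KeyValueStore` is a hypothetical in-memory stand-in for Redis hashes (`HSET`/`HGETALL`); the key name `status:current_tick_interval` and the functions `publish_tick_interval` and `read_tick_interval` are illustrative, not existing code. The guard rejects updates that would move progress backwards, so readers only ever observe forward movement.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class TickInterval:
    """Tick interval of the current epoch (fields as listed in the issue)."""
    epoch: int
    initial_tick: int
    last_processed_tick: int


class KeyValueStore:
    """Hypothetical in-memory stand-in for Redis hashes (HSET / HGETALL)."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, str]] = {}
        self._lock = threading.Lock()  # each call is atomic, like a single Redis command

    def hset(self, key: str, mapping: dict[str, str]) -> None:
        with self._lock:
            self._data.setdefault(key, {}).update(mapping)

    def hgetall(self, key: str) -> dict[str, str]:
        with self._lock:
            return dict(self._data.get(key, {}))


# Hypothetical key name; any agreed-upon key would do.
CURRENT_INTERVAL_KEY = "status:current_tick_interval"


def read_tick_interval(store: KeyValueStore) -> TickInterval | None:
    """Read the published interval, or None if nothing was published yet."""
    raw = store.hgetall(CURRENT_INTERVAL_KEY)
    if not raw:
        return None
    return TickInterval(
        epoch=int(raw["epoch"]),
        initial_tick=int(raw["initial_tick"]),
        last_processed_tick=int(raw["last_processed_tick"]),
    )


def publish_tick_interval(store: KeyValueStore, interval: TickInterval) -> bool:
    """Publish the interval unless it would move progress backwards.

    Returns True if the store was updated. This read-then-write is not atomic
    across processes; with a real Redis and several writers it would need a
    Lua script or WATCH/MULTI/EXEC.
    """
    current = read_tick_interval(store)
    if current is not None:
        if interval.epoch < current.epoch:
            return False  # older epoch: stale writer
        if (interval.epoch == current.epoch
                and interval.last_processed_tick < current.last_processed_tick):
            return False  # same epoch, but behind what is already published
    store.hset(CURRENT_INTERVAL_KEY, {
        "epoch": str(interval.epoch),
        "initial_tick": str(interval.initial_tick),
        "last_processed_tick": str(interval.last_processed_tick),
    })
    return True
```

Storing the three fields in one hash keeps them consistent with each other: a reader never sees a new `last_processed_tick` paired with an old `epoch`.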

Architecture

Elasticsearch is not well suited for storing frequently changing values like the last processed tick, and furthermore, this data should not be persisted in Kafka. I suggest using a key-value store like Redis. Since the amount of data is small, it can be colocated on the Elasticsearch machines: 3x Redis with 3x Sentinel for failover, with strict memory and CPU limits.
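A minimal sketch of how the status service could assemble its response once the current epoch's interval comes from the key-value store instead of the archiver. Both inputs are hypothetical stand-ins: `past_intervals` represents what is already read from Elasticsearch, and `current` represents what would be read from Redis (for example a hash fetched with `HGETALL`). `ArchiveStatus` and `build_status` are illustrative names, not the service's actual types.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TickInterval:
    epoch: int
    initial_tick: int
    last_processed_tick: int


@dataclass(frozen=True)
class ArchiveStatus:
    """Hypothetical shape of what the status service hands to query services."""
    last_processed_tick: int
    intervals: tuple[TickInterval, ...]


def build_status(past_intervals: list[TickInterval],
                 current: TickInterval | None) -> ArchiveStatus:
    """Merge archived intervals (Elasticsearch) with the live one (key-value store).

    The live interval replaces any stale entry with the same epoch and initial
    tick, so the same interval is never reported twice.
    """
    merged = {(i.epoch, i.initial_tick): i for i in past_intervals}
    if current is not None:
        merged[(current.epoch, current.initial_tick)] = current
    if not merged:
        raise ValueError("no tick intervals available")
    ordered = tuple(sorted(merged.values(), key=lambda i: (i.epoch, i.initial_tick)))
    # The newest interval determines the last fully processed tick.
    return ArchiveStatus(last_processed_tick=ordered[-1].last_processed_tick,
                         intervals=ordered)
```

With this split, the archiver is no longer on the read path: losing a Redis node is handled by Sentinel failover, and the status service only needs to reconnect.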

Verify archiver data by comparing it with archive data

The status service is also responsible for comparing the archiver data against the data ingested into Elasticsearch, for verification. We need to keep this functionality because it lets us confirm that we ingest the correct data.
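A minimal sketch of the comparison itself, assuming both sides can be fetched as a mapping from tick number to a JSON-like record. How the records are loaded from the archiver and from Elasticsearch is left out; `verify_ticks`, `digest` and `VerificationReport` are hypothetical names. Records are compared via a SHA-256 digest of canonical JSON so that key order does not cause false mismatches.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


def digest(record: dict) -> str:
    """Stable digest of a record: canonical JSON (sorted keys) hashed with SHA-256."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class VerificationReport:
    missing_in_elastic: list[int] = field(default_factory=list)
    missing_in_archiver: list[int] = field(default_factory=list)
    mismatched: list[int] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not (self.missing_in_elastic or self.missing_in_archiver or self.mismatched)


def verify_ticks(archiver: dict[int, dict], elastic: dict[int, dict]) -> VerificationReport:
    """Compare per-tick records from the local archiver and from Elasticsearch."""
    report = VerificationReport()
    for tick in sorted(archiver.keys() | elastic.keys()):
        if tick not in elastic:
            report.missing_in_elastic.append(tick)
        elif tick not in archiver:
            report.missing_in_archiver.append(tick)
        elif digest(archiver[tick]) != digest(elastic[tick]):
            report.mismatched.append(tick)
    return report
```

Because empty ticks are not stored in Elasticsearch, a caller would need to exclude them from the archiver side first; otherwise they show up as `missing_in_elastic`.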

Architecture

This functionality should be made optional so that it can be enabled on machines with a running archiver. Alternatively, it could run as a separate service.
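A minimal sketch of making verification opt-in, assuming configuration via environment variables. The variable names `STATUS_ARCHIVER_ADDRESS` and `STATUS_VERIFY_ARCHIVER`, as well as the task names, are hypothetical. The point is that status publishing never depends on an archiver, while verification only starts when explicitly enabled and an archiver address is configured.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusServiceConfig:
    # Hypothetical settings; names are illustrative only.
    archiver_address: str | None = None
    verification_enabled: bool = False


def config_from_env(env: Mapping[str, str]) -> StatusServiceConfig:
    """Build the config from (hypothetical) environment variables."""
    address = env.get("STATUS_ARCHIVER_ADDRESS") or None
    enabled = env.get("STATUS_VERIFY_ARCHIVER", "false").lower() in {"1", "true", "yes"}
    return StatusServiceConfig(archiver_address=address, verification_enabled=enabled)


def background_tasks(config: StatusServiceConfig) -> list[str]:
    """Return the tasks the status service would start for this config."""
    tasks = ["serve_archive_status"]  # always runs, no archiver needed
    if config.verification_enabled:
        if not config.archiver_address:
            raise ValueError("verification enabled but no archiver address configured")
        tasks.append("verify_against_archiver")
    return tasks
```

Failing fast when verification is enabled without an archiver address avoids a deployment that silently skips verification. If verification is moved to a separate service instead, the same flag simply disappears from the status service.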

Open Issues

  • It is not clear how to handle empty ticks, as they are not stored in Elasticsearch. We could also use Redis for this, for example with expiring keys, so that the status service can skip empty ticks.
  • Can we keep the verification functionality in the status service?
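A minimal sketch of the expiring-keys idea for empty ticks. The ingestion side would mark an empty tick with a key that expires (in Redis, `SET key value EX seconds`); the status service then advances the last processed tick over ticks that are either stored in Elasticsearch or marked empty. `ExpiringStore` is a hypothetical in-memory stand-in with an injectable clock, and the key format `empty_tick:<n>` and `advance_last_processed` are illustrative.

```python
from __future__ import annotations

import time
from typing import Callable


class ExpiringStore:
    """Hypothetical in-memory stand-in for Redis keys set with an expiry (SET ... EX)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: dict[str, float] = {}

    def set_with_ttl(self, key: str, ttl_seconds: float) -> None:
        self._expires_at[key] = self._clock() + ttl_seconds

    def exists(self, key: str) -> bool:
        expires_at = self._expires_at.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expires_at[key]  # expired keys are dropped on access here
            return False
        return True


def empty_tick_key(tick: int) -> str:
    return f"empty_tick:{tick}"  # hypothetical key naming


def advance_last_processed(last_processed: int, stored_ticks: set[int],
                           store: ExpiringStore, upper_bound: int) -> int:
    """Advance over ticks that are stored in Elasticsearch or marked as empty.

    Stops at the first tick that is neither, since it may still be in flight.
    """
    tick = last_processed + 1
    while tick <= upper_bound and (tick in stored_ticks
                                   or store.exists(empty_tick_key(tick))):
        last_processed = tick
        tick += 1
    return last_processed
```

The TTL has to outlive the gap between ingestion and the status service's next pass: if an empty-tick marker expires before it is consumed, the status service stops advancing at that tick.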

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Project status: 📋 Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests