Migration from pulp to createrepo-agent #972

Open
@cottsay


Summary

This ticket tracks migrating the RPM repository management tool used in ros_buildfarm from pulp to a new purpose-built tool called createrepo-agent.

Background

RPM repository metadata consists of a collection of XML files that reside in a subdirectory of the repository root. The root document, repomd.xml, can be signed using a GPG key. Unlike Debian metadata, which uses a "clearsign" signature, the repomd.xml.asc file is a "detached" signature. Any modification to the contents of the repository typically results in changes to each of the ~5 XML files and the signature.
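To make the layout concrete, here is a minimal sketch of how the root document points at the other metadata files. The sample document is hypothetical and heavily trimmed (real files also carry checksums, sizes, and timestamps), and `list_metadata_files` is an illustrative helper, not part of ros_buildfarm or createrepo-agent.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml documents.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical, trimmed-down repomd.xml content for illustration only.
SAMPLE_REPOMD = """\
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <revision>1700000000</revision>
  <data type="primary">
    <location href="repodata/aaaa-primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/bbbb-filelists.xml.gz"/>
  </data>
  <data type="other">
    <location href="repodata/cccc-other.xml.gz"/>
  </data>
</repomd>
"""


def list_metadata_files(repomd_text):
    """Map each metadata type in repomd.xml to the file it references."""
    root = ET.fromstring(repomd_text)
    return {
        data.get("type"): data.find("repo:location", REPO_NS).get("href")
        for data in root.findall("repo:data", REPO_NS)
    }


if __name__ == "__main__":
    for kind, href in list_metadata_files(SAMPLE_REPOMD).items():
        print(kind, href)
```

Because every referenced file is named in repomd.xml, a change to any one of them also changes repomd.xml, and therefore its detached signature.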

Pulp is a general-purpose content management solution with robust plugins specifically targeted at RPMs. It relies on PostgreSQL, Redis, and Django, and stores payload data in content-addressable storage (CAS). It is written in Python and uses several daemon processes, each implementing a different role to service different types of requests.

Motivation for this change

Pulp is a very powerful content management tool, but it is extremely heavyweight and complex. Implementing the queries needed for package invalidation (as required by ros_buildfarm) means that we must perform import operations serially, and performance at our scale has become unsustainable. Central to our performance problems is that metadata generation in Pulp is far too slow.

Additionally, the way RPM repository metadata is hosted is inherently prone to races: several separate files must be updated together, and clients may be downloading them while the update is in progress. Pulp has no mitigation for this problem, and it occasionally causes jobs to fail to download repository metadata.
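One generic way to narrow this window, sketched below, is to write the uniquely named data files first and only then swap repomd.xml into place with a single rename. This is an illustrative assumption, not a description of what Pulp or createrepo-agent actually does; `publish_metadata` and its parameters are hypothetical, and the detached signature would need the same treatment, which is not covered here.

```python
import os
import tempfile


def publish_metadata(repodata_dir, data_files, repomd_bytes):
    """Write data files first, then replace repomd.xml in one rename.

    data_files maps uniquely named files (for example, names prefixed
    with a checksum) to their contents, so new files never overwrite
    files that an older repomd.xml still references.
    """
    os.makedirs(repodata_dir, exist_ok=True)
    for name, content in data_files.items():
        with open(os.path.join(repodata_dir, name), "wb") as f:
            f.write(content)

    # Stage repomd.xml in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=repodata_dir, prefix=".repomd-")
    with os.fdopen(fd, "wb") as f:
        f.write(repomd_bytes)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file owner-only
    os.replace(tmp_path, os.path.join(repodata_dir, "repomd.xml"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        repodata = os.path.join(repo, "repodata")
        publish_metadata(repodata, {"aaaa-primary.xml.gz": b"v1"}, b"<repomd v1/>")
        publish_metadata(repodata, {"bbbb-primary.xml.gz": b"v2"}, b"<repomd v2/>")
        print(sorted(os.listdir(repodata)))
```

A client that fetched the older repomd.xml can still download the files it references, since they are left in place until some later cleanup step removes them.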

Another problem with our current solution is that the serialization of repository operations is tightly coupled to Jenkins, making it difficult to experiment with other orchestration and execution solutions.

After analyzing the performance problems we're currently experiencing with Pulp, it was decided that a new tool should be created which can solve several of the problems holding us back today.

Overview of createrepo-agent

High-level features:

  • Background process which keeps metadata in memory so that it doesn't need to be re-read for each change - only written.
  • Integrated change queue which not only ensures that simultaneous operations do not overwrite each other, but also batches all pending changes in the same metadata write operation.
  • No system provisioning beyond installation of the tool - existing repositories can be used or new ones created as necessary.
  • Process for keeping old metadata files (other than the top-level repomd.xml) and retiring them once they are unlikely to be requested.
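The change queue idea can be sketched roughly as follows. This is a simplified Python illustration of batching, not createrepo-agent's actual design or API; `BatchingWriter`, `submit`, `drain_once`, and the `write_metadata` callback are all hypothetical names.

```python
import queue


class BatchingWriter:
    """Queue changes and apply all pending ones in a single metadata write."""

    def __init__(self, write_metadata):
        self._pending = queue.Queue()  # thread-safe, so any client can submit
        self._write_metadata = write_metadata
        self.packages = set()  # repository state kept in memory

    def submit(self, op, package):
        """Enqueue a change; op is "add" or "remove"."""
        if op not in ("add", "remove"):
            raise ValueError("unknown operation: %r" % (op,))
        self._pending.put((op, package))

    def drain_once(self):
        """Apply every change queued so far, then write metadata once."""
        batch = []
        while True:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return 0  # nothing changed, so nothing is written
        for op, package in batch:
            if op == "add":
                self.packages.add(package)
            else:
                self.packages.discard(package)
        self._write_metadata(sorted(self.packages))
        return len(batch)


if __name__ == "__main__":
    writes = []
    writer = BatchingWriter(writes.append)
    writer.submit("add", "ros-foo-1.0")
    writer.submit("add", "ros-bar-2.0")
    writer.submit("remove", "ros-foo-1.0")
    print(writer.drain_once(), writes)
```

In a long-running process, `drain_once` would be called in a loop by a single background worker, so changes that arrive while one write is in progress are picked up together by the next one.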

Roll out process

See #972 (comment)
