Skip to content

Port GzipZinfo to Go#1874

Draft
kzys wants to merge 1 commit into
awslabs:mainfrom
kzys:purego-gzip-zinfo
Draft

Port GzipZinfo to Go#1874
kzys wants to merge 1 commit into
awslabs:mainfrom
kzys:purego-gzip-zinfo

Conversation

@kzys
Copy link
Copy Markdown
Contributor

@kzys kzys commented Feb 25, 2026

GzipZinfo was implemented in C because;

  1. The original implementation, zran.c was in C
  2. compress/flate didn't expose the necessary APIs to implement the logic in Go

This commit ports GzipZinfo to pure Go by forking compress/flate and exposing the APIs we need.

Issue #, if available:

Description of changes:

Testing performed:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

GzipZinfo was implemented in C because;

1. The original implementation, zran.c was in C
2. compress/flate didn't expose the necessary APIs to implement
   the logic in Go

This commit ports GzipZinfo to pure Go by forking compress/flate and
exposing the APIs we need.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kazuyoshi Kato <kazuyoshi.kato@baseten.co>
@kzys
Copy link
Copy Markdown
Contributor Author

kzys commented Feb 25, 2026

Before connecting the component to the rest of the snapshotter, I'd like to discuss about the following topics;

  • Whether we are fine forking compless/flate or not.
  • How to switch the implementations. I personally want to switch them in runtime, but it is possible to have build flags.
  • Package structure

@prafgup
Copy link
Copy Markdown
Contributor

prafgup commented Feb 25, 2026

A couple of general queries -

  • Will this go based implementation be as performant as C based existing implementations? Did you benchmark the same? would be interesting to see the performance gain/degradation.
  • Will this support pigz based parallel gzip compressions? I was working to add it here - Support concatenated gzip (mgzip/pigz) layers in ztoc #1871

@sondavidb
Copy link
Copy Markdown
Contributor

sondavidb commented Feb 25, 2026

Hey Kaz. Great to see you again 🙂

Whether we are fine forking compless/flate or not.

I don't like it but it also seems to be the best way to achieve the API we want. Based on the change history the package seems to get a few modifications a year so I'm a little concerned about that rate of change, however I don't think drifting is a huge issue for a library that serves a pretty static function. So it's not my favorite but probably the best choice to achieve this.

There is also the question of licensing, though. From my understanding of BSD-3 Clause we need to display the license somewhere in the distribution. In the package is OK by me but I hope I'm not missing anything from a legal standpoint here.

How to switch the implementations. I personally want to switch them in runtime, but it is possible to have build flags.

I don't see any reason to use the C version outside of performance reasons. I'd much rather just choose one or the other. (Unless I'm misunderstanding, which if so, my apologies.)

Package structure

This is something I've been trying to figure out for a while, I would very much like to have a folder of packages that are not actually our code (e.g. we use some id-mapping shenanigans in https://github.com/awslabs/soci-snapshotter/tree/main/idtools which is just copied code from containerd/containerd@83aaa89). I'm not opinionated on where it goes for now, but if it would be easier I can try and get a PR out on the sooner end to consolidate it, so that this work is easier for you.

@kzys
Copy link
Copy Markdown
Contributor Author

kzys commented Mar 3, 2026

Sorry for the late reply.

Will this go based implementation be as performant as C based existing implementations? Did you benchmark the same? would be interesting to see the performance gain/degradation.

No I haven't. Let me add benchmarks in this PR.

Will this support pigz based parallel gzip compressions?

Yes. It should be possible.

@kzys
Copy link
Copy Markdown
Contributor Author

kzys commented Mar 10, 2026

There is also the question of licensing, though. From my understanding of BSD-3 Clause we need to display the license somewhere in the distribution. In the package is OK by me but I hope I'm not missing anything from a legal standpoint here.

Ah right. I probably need to change the header since it is pointing the LICENSE file, which is Apache 2.0 in this repository.

I'm not opinionated on where it goes for now, but if it would be easier I can try and get a PR out on the sooner end to consolidate it, so that this work is easier for you.

Yes. But Go's packages are not hierarchical (foo/bar can't access stuff inside foo and vice versa). Moving packages to another directory wouldn't break stuff much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants