Compress plain GPX uploads with gzip #7124

Open
Rub21 wants to merge 14 commits into openstreetmap:master from Rub21:compress-gpx-uploads

Rub21 commented Jun 3, 2026

Contributor

This PR compresses plain GPX files with gzip before they go to S3, which should cut their stored size substantially.

Files that are already compressed (gzip, bzip2, zip, tar) are left as is.
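
As a rough illustration of the rule above (not the code in this PR), the upload step could be sketched as follows. The helper names and the magic-byte check are assumptions made for this example; tar has no leading magic number, so its detection is left out here, and the real change may well rely on the MIME type instead.

```ruby
require "zlib"

# Leading magic bytes of formats treated as already compressed.
# Illustrative list only; tar is not covered by this check.
GZIP_MAGIC  = "\x1F\x8B".b
BZIP2_MAGIC = "BZh".b
ZIP_MAGIC   = "PK\x03\x04".b

# Hypothetical helper: true when the payload starts with a known magic number.
def already_compressed?(data)
  bytes = data.b
  [GZIP_MAGIC, BZIP2_MAGIC, ZIP_MAGIC].any? { |magic| bytes.start_with?(magic) }
end

# Hypothetical helper: gzip plain payloads and pass everything else through.
# Returns [bytes_to_store, compressed_by_server].
def prepare_for_storage(data)
  return [data, false] if already_compressed?(data)

  [Zlib.gzip(data), true]
end
```

The second return value records whether the server did the compression, which is what a later download needs to know to hand back a plain GPX.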

For new uploads, the file comes back as .gpx.gz rather than plain GPX. This is a problem for Windows users, where .gz is not supported unless they install 7-Zip. To handle this, I have added two download links: one gives the plain GPX and the other gives the compressed file. I did not add new decompression code; I reused the xml_file function, which already decompresses the trace, and linked the plain GPX option to it.

Results: [image]

Ref: #4188

cc. @1ec5

pablobm commented Jun 3, 2026

Contributor

If the majority of downloads are of the uncompressed file, am I right to think that this will translate into infrastructure costs, namely compute for the server-side decompression plus the extra traffic?

If so, can this be significant enough to be a problem? Should the "download gpx" link be slightly more "hidden" to avoid this situation? Just an additional click or something like that.

Comment thread config/locales/en.yml Outdated
Rub21 commented Jun 3, 2026

Contributor Author

@pablobm you're right that this could add real bandwidth and CPU cost. I looked into it a bit more.

I think we can avoid it without hiding the link, using the Content-Encoding: gzip header. The file is already stored gzipped on S3, so we send it as-is and the browser unzips it by itself. The server does nothing and sends the small file.

Browsers send Accept-Encoding: gzip by default, so this happens on its own and the user still gets a normal .gpx file.

I tested it with curl --compressed, which behaves like a browser:

$ curl -s --compressed -D - http://localhost:3000/traces/1/data.gpx -o out.gpx
content-encoding: gzip
content-length: 2025        # 2 KB over the network

$ file out.gpx
out.gpx: XML document text   # plain GPX on disk
$ wc -c out.gpx
15796                        # 15.8 KB unzipped by the client

If a client doesn't support gzip, the server unzips the file and sends it plain, same as today, so nothing breaks. Other formats work the same way too: we still serve whatever the user uploaded. And since the backend decides this from the request header, one download link is enough now, so I kept just one.
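
To make the negotiation described above concrete, here is a minimal sketch in plain Ruby, outside Rails. The helper names, the Content-Type value, and the Vary header are assumptions for illustration and are not taken from the PR; the `*` wildcard in Accept-Encoding is deliberately ignored.

```ruby
require "zlib"

# Hypothetical helper: true when Accept-Encoding lists gzip with a non-zero q-value.
def accepts_gzip?(accept_encoding)
  accept_encoding.to_s.split(",").any? do |entry|
    coding, *params = entry.strip.downcase.split(";").map(&:strip)
    next false unless coding == "gzip"

    q = params.find { |param| param.start_with?("q=") }
    q.nil? || q.delete_prefix("q=").to_f > 0
  end
end

# Hypothetical helper: build [headers, body] for a GPX that is stored gzipped.
def gpx_response(stored_gzip, accept_encoding)
  headers = { "Content-Type" => "application/gpx+xml", "Vary" => "Accept-Encoding" }
  if accepts_gzip?(accept_encoding)
    # Send the stored bytes untouched; the client decompresses them.
    [headers.merge("Content-Encoding" => "gzip"), stored_gzip]
  else
    # Fallback for clients without gzip support: decompress on the server.
    [headers, Zlib.gunzip(stored_gzip)]
  end
end
```

Only the fallback branch costs server CPU; the gzip branch forwards the stored bytes unchanged.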

pablobm left a comment

Contributor

Serving it directly sounds like a good idea! From a quick online lookup, it seems that even MS Edge should support it 👍

Comment thread app/controllers/traces/data_controller.rb Outdated
Comment thread app/models/trace.rb Outdated
Comment thread app/models/trace.rb
Rub21 commented Jun 5, 2026

Contributor Author

I changed the logic a bit.

There is now a server_gzipped: true flag in the metadata, so the download knows if the server gzipped a plain GPX.

The user still gets a plain GPX either way. If their client accepts gzip, the server sends the gzip and the client unzips it. If not, the server unzips it first.

The CPU cost is also not new: it already happens today when clients request data.gpx, because the server unzips every file requested with .gpx.
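
A small sketch of how the flag could drive the download decision, in plain Ruby. The helper name, the string metadata key, and the return symbols are hypothetical. Treating traces without the flag as "decompress first" is an assumption based on the description above, and q-values in Accept-Encoding are ignored for brevity.

```ruby
# Hypothetical decision helper for a data.gpx request. Returns one of:
#   :send_gzip_as_is   stored bytes go out unchanged with Content-Encoding: gzip
#   :decompress_first  the server decompresses and sends plain GPX
def gpx_delivery_mode(metadata, accept_encoding)
  # Only files the server itself gzipped are safe to pass through as gzip.
  server_gzipped = metadata["server_gzipped"] == true

  # Simplified check: looks for a "gzip" coding and ignores q-values.
  client_gzip = accept_encoding.to_s.downcase.split(",").any? do |entry|
    entry.split(";").first.to_s.strip == "gzip"
  end

  server_gzipped && client_gzip ? :send_gzip_as_is : :decompress_first
end
```

For example, `gpx_delivery_mode({ "server_gzipped" => true }, "gzip, br")` picks the passthrough path, while a trace without the flag, or a client that sends no Accept-Encoding, falls back to server-side decompression.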


2 participants