Study Apache setting AllowEncodedSlashes NoDecode #392

@lobis

I'm attempting to add files with slashes in the name part of the DID; e.g., my_scope:some_directory/my-file. The reason is to allow people who use the Zarr data format to store their data in Rucio. Quick summary: Zarr stores data in multiple files where the file names and relative paths are significant. The following are a few of the files from a dataset stored using Zarr:

[...]
spatial_ref/0
spatial_ref/.zarray
spatial_ref/.zattrs
.zmetadata
latitude/0
latitude/.zarray
latitude/.zattrs
.zattrs

One problem I've encountered stems from how the Rucio client/server API encodes a DID within a request URL: as string concatenation with a / separator. For example, the client encodes the DID my_scope:my_name as my_scope/my_name within the URL when making a request. This mostly works fine. The problem starts when either the DID scope or the DID name contains a / character. Simple concatenation leads to an ambiguity: does the DID a/b/c in the request path have the scope a/b with file c, or does it have the scope a with the file b/c?

The ambiguity is broken by percent-encoding any /-characters in the DID name (/ --> %2F), so my_scope/a:some_directory/my-file is encoded as my_scope/a/some_directory%2Fmy-file in the request URL.
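As a minimal sketch of the encoding described above (this is not the actual Rucio client code; did_to_path is a hypothetical helper), Python's standard urllib.parse.quote can produce the same request path when told that / is not a safe character:

```python
from urllib.parse import quote


def did_to_path(scope: str, name: str) -> str:
    """Hypothetical helper: join scope and name with '/', encoding '/' in the name.

    The scope is left untouched, matching the example in this issue.
    """
    # By default quote() leaves '/' alone; safe="" forces it to become %2F.
    encoded_name = quote(name, safe="")
    return f"{scope}/{encoded_name}"


print(did_to_path("my_scope/a", "some_directory/my-file"))
# prints: my_scope/a/some_directory%2Fmy-file
```

With this encoding, the last literal / in the path always separates the scope from the name, which is what removes the ambiguity.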

Theoretically, percent-encoding a / character is correct and should allow the inclusion of a / without implying the normal hierarchy semantics (which is what we want). However, percent-encoded / characters are very poorly handled by software: there are so many broken pieces of software out there that it's often hard to get this to work correctly.

Apache httpd is one example of broken software.

The AllowEncodedSlashes directive controls the behaviour. The default behaviour (equivalent to AllowEncodedSlashes Off directive) is to reject any request with %2F with a 404 response. Always.

The AllowEncodedSlashes On directive accepts such requests but decodes them; this is also arguably wrong, as it treats / and %2F as equivalent, which they are not.
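To illustrate why decoding loses information, here is a small Python sketch. It only mimics the decoding step with urllib.parse.unquote and says nothing about how Apache is implemented; last_segment is a hypothetical helper introduced for the illustration.

```python
from urllib.parse import unquote


def last_segment(path: str) -> str:
    """Hypothetical helper: return everything after the last literal '/'."""
    return path.rsplit("/", 1)[1]


encoded = "my_scope/a/some_directory%2Fmy-file"
decoded = unquote(encoded)  # what a decoding front end would hand on

# Before decoding, the last path segment is the complete (encoded) DID name.
print(last_segment(encoded))  # prints: some_directory%2Fmy-file

# After decoding, the encoded and unencoded slashes are indistinguishable,
# so the boundary between scope and name can no longer be recovered.
print(decoded)                # prints: my_scope/a/some_directory/my-file
print(last_segment(decoded))  # prints: my-file
```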

The AllowEncodedSlashes NoDecode directive would (I believe) pass on the %2F (in my_scope/a/some_directory%2Fmy-file, for example) to Werkzeug for blueprint-based routing (as used by Rucio). This would at least give the Rucio server the chance to handle this situation, although it's not clear (to me, right now) whether this would actually work.
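If the undecoded path really does reach application code (an assumption I have not verified, and one that also depends on the layers between Apache and Werkzeug), recovering the DID could look roughly like the sketch below. path_to_did is a hypothetical helper, not an existing Rucio or Werkzeug function.

```python
from typing import Tuple
from urllib.parse import unquote


def path_to_did(raw_path: str) -> Tuple[str, str]:
    """Hypothetical helper: split an undecoded 'scope/name' path into (scope, name).

    Assumes any '/' inside the name arrived as %2F, so the last literal '/'
    is the scope/name separator.
    """
    scope, sep, encoded_name = raw_path.rpartition("/")
    if not sep or not scope or not encoded_name:
        raise ValueError(f"cannot split {raw_path!r} into scope and name")
    # Decode only after splitting, so %2F in the name becomes '/' again.
    return scope, unquote(encoded_name)


print(path_to_did("my_scope/a/some_directory%2Fmy-file"))
# prints: ('my_scope/a', 'some_directory/my-file')
```

The key design point is the order of operations: split on the literal / first, and decode the name afterwards.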

I note that the rucio repo contains various example Apache configurations (e.g., in the devel Docker files) that include the AllowEncodedSlashes On directive. This (I believe) is wrong, but I haven't reproduced the problem yet.

Therefore, I think we should update the Apache configuration. I would suggest AllowEncodedSlashes NoDecode is the correct setting, but I haven't verified that this actually works for my use-case.

Originally posted by @paulmillar in #387

Additional info: https://httpd.apache.org/docs/2.4/mod/core.html
