Allow a wider range of characters #27

Open
@wojas

Currently simpleblob is quite restrictive about the characters allowed in names, constraining users to alphanumeric names plus a few special characters (".", "-", "_"). The constraint primarily follows from using unescaped filenames in the fs backend.

We have a use case where we want to use a version-specific prefix within a bucket (e.g. "v5/"), and also to be able to write a program that can discover all versions in use. This requires listing blobs with "/" in the name.
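
To illustrate the discovery part of this use case, here is a minimal sketch. It assumes the blob names have already been obtained from a backend listing; `VersionPrefixes` is a hypothetical helper, not an existing simpleblob function.

```go
package main

import (
	"sort"
	"strings"
)

// VersionPrefixes is a hypothetical helper (not part of simpleblob).
// It returns the sorted, unique first path segments (including the
// trailing "/") of all names that contain a "/".
func VersionPrefixes(names []string) []string {
	seen := make(map[string]struct{})
	for _, name := range names {
		i := strings.Index(name, "/")
		if i <= 0 {
			// No "/" at all, or a leading "/": not a version prefix.
			continue
		}
		seen[name[:i+1]] = struct{}{}
	}
	prefixes := make([]string, 0, len(seen))
	for p := range seen {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	return prefixes
}
```

This only works if names such as "v5/foo" can be stored and listed in the first place, which is what this issue asks for.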

Amazon writes the following about S3 limitations: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html
Basically they allow any UTF-8 character in the name, with a few caveats and exceptions.

Ideally we would like to allow as wide a range of names as possible, while also restricting ourselves to names that any current and future backend can safely support. These are conflicting goals. Perhaps we should use the same approach as the S3 documentation: define and recommend 'safe characters' that must work with any backend, while allowing almost all characters.

Potential solution

  • Allow almost all UTF-8 strings
  • But reject non-canonical and non-local paths like ../../foo///bar
  • Perform escaping in the fs backend to make the allowed names safe

Escaping and validation algorithm for fs:

  • Reject any backslash \ in the name
  • Call path.Clean(name) to check for non-canonical paths. Reject if the output differs from the input.
  • Reject if the name starts with .., because it could be something like ../../../etc/passwd (path.Clean will not touch this)
  • Call url.QueryEscape(name) to escape unsafe characters
  • If the name starts with a ., replace that character by %2E to avoid hidden files on UNIX
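
The steps above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing simpleblob API: `ErrInvalidName`, `ValidateName` and `EscapeName` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// ErrInvalidName is a hypothetical sentinel error for this sketch.
var ErrInvalidName = errors.New("invalid blob name")

// ValidateName applies the rejection rules from the list above.
// The empty string is rejected too, because path.Clean("") returns ".".
func ValidateName(name string) error {
	if strings.Contains(name, `\`) {
		return fmt.Errorf("%w: contains a backslash", ErrInvalidName)
	}
	if path.Clean(name) != name {
		return fmt.Errorf("%w: not a canonical path", ErrInvalidName)
	}
	if strings.HasPrefix(name, "..") {
		return fmt.Errorf("%w: starts with ..", ErrInvalidName)
	}
	return nil
}

// EscapeName turns an already validated name into a single filename.
// url.QueryEscape encodes "/" as %2F but leaves "." untouched, so a
// leading dot is replaced separately to avoid hidden files on UNIX.
func EscapeName(name string) string {
	escaped := url.QueryEscape(name)
	if strings.HasPrefix(escaped, ".") {
		escaped = "%2E" + escaped[1:]
	}
	return escaped
}
```

With this sketch, "v5/foo.txt" is accepted and stored as "v5%2Ffoo.txt", while "../../foo///bar" is rejected. Reading names back from disk would go through url.QueryUnescape, which also decodes the %2E.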

The validation and escaping functions can be exposed for other backends to reuse. The validation function should be called by every backend, the escaping function is optional.

There is another issue with special reserved device names on Windows. Go 1.20 introduces a new IsLocal function to check for these, but I don't think we want to depend on this, and it's only available in filepath. Perhaps always prepend _ to the filename to avoid this? This would also solve the UNIX hidden files issue, but it would be a breaking change. It could also be useful for the fs backend to keep producing unescaped files when restricting oneself to safe characters, which an unconditional prefix would prevent.
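
A rough sketch of the prefix idea, to make the trade-off concrete. `FileNameFor` and `NameFromFile` are hypothetical names, and the sketch assumes the name has already been validated.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// filePrefix is the marker prepended to every filename in this sketch.
const filePrefix = "_"

// FileNameFor maps a validated blob name to a filename. Because the
// result always starts with "_", it never starts with "." and never
// equals a bare device name such as NUL.
func FileNameFor(name string) string {
	return filePrefix + url.QueryEscape(name)
}

// NameFromFile reverses FileNameFor. Files without the prefix are
// rejected, which is why existing unprefixed files would no longer be
// found: the breaking change mentioned above.
func NameFromFile(filename string) (string, error) {
	if !strings.HasPrefix(filename, filePrefix) {
		return "", errors.New("filename lacks the expected prefix")
	}
	return url.QueryUnescape(filename[len(filePrefix):])
}
```

The downside is visible in the output: even a name made only of safe characters, such as "foo.txt", ends up on disk as "_foo.txt".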

Cc @ahouene @nvaatstra
