More involved argument parsing #5

@geofft

I think we should do #4 in the short term, but in the longer term, I'd like us to explicitly accept spaces and split arguments in some fashion in a reliable cross-platform way.

I'm thinking a bit about why we need relexec to exist. One way to describe relexec is that it's a tool like env in the common use of #!/usr/bin/env python3 etc., in that it sits at a well-known path and abstracts over not knowing the exact path to your real interpreter in advance. It just has different semantics about how it finds the absolute path; env looks at $PATH but relexec looks relative to the script.
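To make the difference in lookup semantics concrete, here is a minimal sketch of the two strategies. The function names are hypothetical, and the script-relative version assumes the script path is used as given, without resolving symlinks; relexec's actual behavior may differ.

```python
import os
import shutil


def env_style_lookup(name):
    """Search the directories listed in $PATH, as env does.

    Returns None when nothing executable with that name is found.
    """
    return shutil.which(name)


def script_relative_lookup(name, script_path):
    """Resolve name against the directory that contains the script.

    Assumption: symlinks in script_path are not resolved.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, name)
```

With a script at /opt/app/bin/tool, script_relative_lookup("python3", "/opt/app/bin/tool") yields /opt/app/bin/python3 regardless of what $PATH contains.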

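Arguably, if env had supported relative paths, there would be no need for relexec to exist. You could just have done #!/usr/bin/env --relative python3. But that doesn't work on Linux, because the kernel passes everything after the interpreter path as a single argument, and env doesn't split that argument on spaces either. (On other kernels like xnu, which split the shebang line themselves, that would actually work.)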
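The following is a simplified model of the argv the interpreter receives under each behavior. It is an illustration only, not kernel code: the function names are hypothetical, and truncation and other edge cases are ignored.

```python
def linux_style_argv(shebang_line, script):
    """Model of Linux: the text after the interpreter path is ONE argument."""
    parts = shebang_line[2:].strip().split(None, 1)
    return parts + [script]


def splitting_style_argv(shebang_line, script):
    """Model of a kernel that splits the shebang line on whitespace."""
    return shebang_line[2:].split() + [script]


line = "#!/usr/bin/env --relative python3"
print(linux_style_argv(line, "./script"))
print(splitting_style_argv(line, "./script"))
```

In the first model env sees the single argument "--relative python3", which it cannot interpret as an option followed by a command name.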

Also, it's sometimes useful to be able to pass environment variables to processes - think e.g. PYTHONUNBUFFERED or PYTHONIOENCODING (which has no command-line equivalent). We also have at least one internal use case where we need to set environment variables to place build dependencies on PYTHONPATH. But #!/usr/bin/env cannot do this, at least on Linux, despite that kind of being env's whole purpose.

And, of course, you cannot pass arguments to the command in question, as in python -u.

I think we should do something like the following:

  • Read in the shebang from the file to work around any kernel splitting on spaces, truncation, etc., for the reasons described in a comment on #4 ("Refuse to act if argv[1] contains spaces"). If the shebang is longer than 4096 bytes (i.e. if there is no newline in the first 4096 bytes), error.
  • Word-split it with POSIX sh quoting rules, ignoring the variable expansion rules, namely:
    • A backslash escapes the next character
    • A single quote quotes everything until the next single quote
    • A double quote quotes everything until the next double quote, except that a backslash escapes the next double quote or backslash
    • If there is no character following a backslash or no closing quote for an opening quote, error
    • Split on whitespace
  • For each token,
    • If it is the exact string --, discard it, then ignore this rule and the next two for all future tokens
    • If it starts with a -, treat it as an option to relexec (of which there are none currently defined and so this produces an error)
    • If it contains an =, treat it as an environment variable assignment
    • Otherwise add it to argv
  • Locate argv[0] relative to the script, and execve(relative_argv0, argv)
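The word-splitting and token rules above can be sketched as follows. This is a rough illustration under stated assumptions, not relexec's implementation: the names split_words, classify, and ShebangError are hypothetical, "whitespace" is taken to mean whatever Python's str.isspace accepts, and a backslash inside double quotes that precedes any other character is kept literally. The sketch also assumes that, as with env, option and assignment handling stops at the first token added to argv; the list does not say so explicitly, but it seems necessary for a shebang like python3 -u to work.

```python
class ShebangError(ValueError):
    """Raised for any input the rules above define as an error."""


def split_words(line):
    """Split line into words using the quoting rules listed above."""
    words, current = [], []
    in_word = False  # lets an empty quoted word ('') count as a word
    i, n = 0, len(line)
    while i < n:
        c = line[i]
        if c == "\\":
            if i + 1 >= n:
                raise ShebangError("nothing follows the backslash")
            current.append(line[i + 1])
            in_word = True
            i += 2
        elif c == "'":
            end = line.find("'", i + 1)
            if end == -1:
                raise ShebangError("unterminated single quote")
            current.append(line[i + 1:end])
            in_word = True
            i = end + 1
        elif c == '"':
            i += 1
            while True:
                if i >= n:
                    raise ShebangError("unterminated double quote")
                c = line[i]
                if c == '"':
                    break
                if c == "\\" and i + 1 < n and line[i + 1] in '"\\':
                    current.append(line[i + 1])
                    i += 2
                else:
                    current.append(c)
                    i += 1
            in_word = True
            i += 1  # skip the closing quote
        elif c.isspace():
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        else:
            current.append(c)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words


def classify(tokens):
    """Sort tokens into (env, argv) following the per-token rules."""
    env, argv = {}, []
    plain = False  # True once "--" or the command name has been seen
    for tok in tokens:
        if not plain:
            if tok == "--":
                plain = True
                continue
            if tok.startswith("-"):
                raise ShebangError("unknown relexec option: " + tok)
            if "=" in tok:
                name, _, value = tok.partition("=")
                env[name] = value
                continue
            plain = True  # assumption: rules stop at the command name
        argv.append(tok)
    if not argv:
        raise ShebangError("no command given")
    return env, argv
```

For example, classify(split_words("PYTHONIOENCODING=ebcdic python3")) separates the assignment from the command, while classify(split_words("--something python")) raises ShebangError. The final step, resolving argv[0] relative to the script and calling execve, is left out here.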

That lets us subsume the behavior of env and also gives us an affordance for future extensions (e.g., expanding ~, expanding variables, adding wrappers that are not found relatively, etc.) via command-line options.

In particular, at this point, #!/usr/lib/relexec PYTHONIOENCODING=ebcdic python3 and #!/usr/lib/relexec python3 -u will both work, and #!/usr/lib/relexec --something python is defined as an error, so we can redefine it later without breaking compatibility.
