Skip to content

Support determining mode based on shebang interpreter directive#47

Open
iainh wants to merge 3 commits into
zee-editor:masterfrom
iainh:mode_from_shebang
Open

Support determining mode based on shebang interpreter directive#47
iainh wants to merge 3 commits into
zee-editor:masterfrom
iainh:mode_from_shebang

Conversation

@iainh

@iainh iainh commented Jul 18, 2022

Copy link
Copy Markdown
Contributor

Attempt to determine the appropriate mode based on the presence of an interpreter directive on the first line of the file. Interpreter directives are examined first, followed by the filename when determining the mode mirroring what Emacs does by default.

Reopening now that highlight-with-queries has been merged. Feel free to close if there is no interest in supporting this feature.

Attempt to determine the appropriate mode based on the presence of
an interpreter directive on the first line of the file. Interpreter
directives are examined first, followed by the filename when
determining the mode mirroring what Emacs does by default.

@mcobzarenco mcobzarenco left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really awesome! Thanks so much, it's a very useful addition I wished I had before.

Sorry for the multiple comments and being pedantic about not loading the whole file, happy to help with any of them

Comment thread zee-grammar/src/config.rs
pub indentation: IndentationConfig,
pub grammar: Option<GrammarConfig>,
#[serde(default)]
pub shebangs: Vec<String>,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking whether this should be a new field, rather than a new variant of FilenamePattern -- granted, we should probably rename it to FilePattern as it will look inside the file too to determine if the mode applies. I.e. something like FilePattern::Shebang.

The structure I suggest may make it harder to avoid reading the file when the filename would suffice.

Comment thread zee/src/editor/buffer.rs Outdated
Comment on lines +176 to +184
let mode = text
.line(0)
.as_str()
.and_then(|shebang| context.0.mode_by_shebang(shebang))
.or_else(|| {
file_path
.as_ref()
.and_then(|path| context.0.mode_by_filename(path))
})

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may potentially read the whole file in pathological cases, e.g. a minified file that has no new lines. One goal of zee is to keep being fast for any kind of pathological file you can think of and do anything that is potentially blocking in the background (e.g. parsing syntax or writing the file to disk). I've also been trying to avoid doing anything linear in the length of a line in the UI thread.

I think the right solution here long term is to build buffers, i.e. call Buffer::new() in a background thread, rather than in the main, UI thread.

For this PR though, I'd be happy if instead you just bound how much of the file you read, say 256 bytes at most and test the regex for that. You'll have to deal with potentially truncated utf-8...

Maybe a better solution is to read characters until you encounter either 1. a new line or 2. if you don't after X characters, you give you and don't check the shebang -- essentially we only test if shebangs apply up to a certain line length.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 2nd comment is that I think we should test if "mode_by_filename" applies and only then check the shebangs to avoid having to read the file if the name already matches.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestions. I'm looking to see if there is any standards around whether white space is allowed before the #! and what the maximum length is on most platforms. I have some outdated information that 127-512 bytes is the maximum but with a lack of a standard to point at, I think an overly large maximum might be best. FreeBSD for example historically supported 4096. If white space before the shebang is not allowed then a two pass strategy where only the first two characters are examined followed by the remainder of the line up to the maximum discussed might be the most efficient.

Comment thread zee/src/editor/mod.rs Outdated
Comment on lines +97 to +106
pub fn mode_by_filename(&self, filename: impl AsRef<Path>) -> &Mode {
pub fn mode_by_filename(&self, filename: impl AsRef<Path>) -> Option<&Mode> {
self.modes
.iter()
.find(|&mode| mode.matches_by_filename(filename.as_ref()))
.unwrap_or(&PLAIN_TEXT_MODE)
}

pub fn mode_by_shebang(&self, shebang: &str) -> Option<&Mode> {
self.modes
.iter()
.find(|&mode| mode.matches_by_shebang(shebang))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If shebang is a variant of FilenamePattern, there would be one function and we could continue to return &Mode rather than Option<&Mode> -- I would like to continue have Context own coming up with a Mode for any possible file whatsoever and not have a default PLAIN_TEXT_MODE potentially duplicated

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about merging these methods into something like mode_by_file_pattern that would always return a &Mode but I'm stuck on how to handle the difference in parameters. Two of the variants operate on the filename while the other needs a portion of the file content. To have one method we would have to always pass a portion of the file content to the function in addition to the filename which would require reading at least part of the content of every file.

All of that being said, prior to detecting the mode or creating the buffer open_file() is calling Rope::from_reader() which I think might be reading the entire file anyway.

Rope::from_reader(BufReader::new(File::open(&file_path)?))?,
I reading that right? If so we already have the Rope created and available in Buffer::new() so reading a slice should be quite efficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an updated version of this code with a limit on the length of text examined for the interpreter directive, an inversion of the order mode checks are performed (filename checks first, then shebang), and a merging of the mode_by_filename() and mode_by_shebang() methods into one, mode_by_file(), which always returns a &Mode.

I ended up settling on 256 characters for the maximum length of the shebang directive, matching what linux has done since 2018 (https://lore.kernel.org/lkml/20181112160956.GA28472@redhat.com/) (https://github.com/torvalds/linux/blob/master/include/uapi/linux/binfmts.h)

I haven’t found a satisfactory way of adding a shebang case to FilenamePattern, the biggest stumbling blocks being that the shebang line would need to be passed to the matches() method and the pattern list would need to be sorted/filtered if we wanted to ensure that filename patterns were always examined before shebang directives.

iainh added 2 commits July 20, 2022 22:26
1) Constrain the number of characters examine when determining the
   interpreter directive to 256 to avoid performance penalty of scanning
   long lines unnecessarily.
2) Move to a single mode_by_file() function in Context.
3) Always attempt to match the mode based on the filename first,
   falling back to the shebang line only if no match is found.
Comment thread zee/config/config.ron
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants