Change the indent detection algorithm from GCD to compare-lines #11

@roryokane

This is one of the algorithms described in the article Detecting Code Indentation (along with GCD, minimum-width, and neural net). This algorithm is also used in sindresorhus’s detect-indent JavaScript library (which talks a little about its algorithm in its README).

Why switch from the current GCD algorithm to the compare-lines one?

  • As far as I can tell from running the compare-lines algorithm manually on the test files, it would fix all three of the unit tests that are currently failing. That is, for those 12 test cases, it detects the desired settings more accurately.
  • The author of Detecting Code Indentation ultimately chose that algorithm too. In the author’s tests on 500 example files, the compare-lines algorithm was the most likely to be the best, and it was the only algorithm that showed a statistically significant benefit over another algorithm.
  • The algorithm requires fewer language-specific workarounds, simplifying the code and avoiding having to implement some workarounds in the future. For example, there would not need to be any code to specifically handle single-space-indented C comments.
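
To illustrate the last point, here is a rough Python sketch contrasting the two approaches on a C snippet with a single-space-indented block comment. It is not this project's code, nor detect-indent's actual implementation: the function names (`detect_gcd`, `detect_compare_lines`) and the sample are made up for illustration, and the sketch assumes space-only indentation and picks the most frequent indent change between consecutive non-blank lines.

```python
from collections import Counter
from functools import reduce
from math import gcd


def leading_spaces(line: str) -> int:
    """Count the spaces at the start of a line (tabs are not handled here)."""
    return len(line) - len(line.lstrip(" "))


def detect_gcd(text: str) -> int:
    """GCD approach: greatest common divisor of all non-zero indent widths."""
    widths = [leading_spaces(l) for l in text.splitlines() if l.strip()]
    return reduce(gcd, (w for w in widths if w > 0), 0)


def detect_compare_lines(text: str) -> int:
    """Compare-lines approach (simplified): tally the indent change between
    each pair of consecutive non-blank lines and return the most common one."""
    changes = Counter()
    previous = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        current = leading_spaces(line)
        delta = abs(current - previous)
        if delta > 0:
            changes[delta] += 1
        previous = current
    if not changes:
        return 0  # no indentation changes seen
    return changes.most_common(1)[0][0]


# Hypothetical sample: 4-space code preceded by a C block comment whose
# continuation lines are indented by a single space.
SAMPLE = "\n".join([
    "/*",
    " * A block comment.",
    " */",
    "int main(void) {",
    "    if (x) {",
    "        y();",
    "    }",
    "    return 0;",
    "}",
])

print(detect_gcd(SAMPLE))            # the 1-space comment lines drag the GCD to 1
print(detect_compare_lines(SAMPLE))  # 4-space changes outnumber 1-space changes
```

In this sample the comment lines contribute two 1-space changes while the code contributes four 4-space changes, so the simplified compare-lines tally settles on 4 without any comment-specific handling, whereas the GCD of the widths collapses to 1.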

I might be able to fix the test cases by just adding some workarounds described in Detecting Code Indentation, rather than switching algorithms. But either way, fixing the test cases requires work. Given the benefits described above apart from fixing the test cases, changing algorithms sounds better than patching the current one.
