
@DecimalTurn
Contributor

This pull request reports .gitattributes overrides as the detection "strategy" when using the --strategies option. It also adds small integration tests verifying that the CLI output reflects these changes and that normal behavior is preserved when --strategies is not specified.

Language detection improvements:

  • When the language is specified by the linguist-language attribute in .gitattributes, report the strategy as GitAttributes.

Testing and instrumentation enhancements:

  • Added tests to test_basic_instrumenter.rb to verify that the detection strategy and language are correctly tracked when .gitattributes overrides are present, and that the strategy is recorded as GitAttributes.

CLI integration and coverage:

  • Added a new test_cli_integration.rb suite with tests for CLI flags (--strategies, --breakdown, --json) to ensure that .gitattributes overrides are detected, the correct strategy is reported, and JSON output remains accurate.
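
As a rough illustration of this first approach, the sketch below models an instrumenter that records a strategy and language per file path, and reports a linguist-language override as its own "GitAttributes" strategy. It is a self-contained sketch, not Linguist's code: `Blob`, `RecordingInstrumenter`, `record` and `detect_language` are hypothetical names, and the detector is a stand-in lambda returning `[strategy, language]`.

```ruby
# Hypothetical model of the PR's first approach; not Linguist's implementation.
Blob = Struct.new(:name, :gitattributes_language)

class RecordingInstrumenter
  attr_reader :detected_info

  def initialize
    @detected_info = {}
  end

  # Remember the strategy and language reported for each file path.
  def record(name, strategy:, language:)
    @detected_info[name] = { strategy: strategy, language: language }
  end
end

# A linguist-language override is reported as its own "GitAttributes"
# strategy; otherwise the supplied detector returns [strategy, language].
def detect_language(blob, instrumenter, detector)
  if blob.gitattributes_language
    strategy = "GitAttributes"
    language = blob.gitattributes_language
  else
    strategy, language = detector.call(blob)
  end
  # The instrumenter is optional, mirroring a run without --strategies.
  instrumenter&.record(blob.name, strategy: strategy, language: language)
  language
end
```

With an override present, the recorded strategy is "GitAttributes" regardless of what detection would have returned, which is the information the later review comments asked to keep.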

@DecimalTurn requested a review from a team as a code owner on September 23, 2025 05:04
@Alhadis
Collaborator

Alhadis commented Sep 23, 2025

I think it would be clearer to users if we reported "(overridden by [path/to/.gitattributes])" instead of merely "GitAttributes" (as there's really only one that affects Linguist's classification).

So instead of treating linguist-language overrides as a faux-strategy, perhaps tailor the output to include an [overridden] flag, possibly prepended to the strategy that would otherwise have matched in the absence of the override.

@DecimalTurn
Contributor Author

DecimalTurn commented Sep 23, 2025

Sounds good, I've made the change to display the information like so:

<FilePath> [<StrategyName> (overridden by .gitattributes)]

This change means that detection has to run even when there is an override, in order to determine which strategy would have been used. However, the changes in e879628 also add a check on Linguist.instrumenter to see whether it's defined before running the full detection (this is equivalent to checking whether --strategies was specified). This way, we keep the lazy approach when the strategy isn't needed.
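
A minimal sketch of that lazy behavior, under stated assumptions: `Blob`, `FakeDetector`, `resolve` and `format_line` are hypothetical names, and the `instrumenter_enabled:` keyword stands in for checking whether `Linguist.instrumenter` is set. Without an instrumenter, an override short-circuits and detection never runs; with one, detection runs only to find the strategy name for the `<FilePath> [<StrategyName> (overridden by .gitattributes)]` line.

```ruby
# Hypothetical sketch of the lazy check; names are illustrative, not Linguist's API.
Blob = Struct.new(:name, :override_language)

# Stand-in for full detection that counts its calls so laziness is observable.
class FakeDetector
  attr_reader :calls

  def initialize(strategy, language)
    @strategy = strategy
    @language = language
    @calls = 0
  end

  # Returns [strategy_name, language], like a real detection pass might.
  def detect(_blob)
    @calls += 1
    [@strategy, @language]
  end
end

# Returns [language, strategy_label]. `instrumenter_enabled` plays the role of
# checking whether Linguist.instrumenter is set (i.e. --strategies was given).
def resolve(blob, detector, instrumenter_enabled:)
  unless blob.override_language
    strategy, language = detector.detect(blob)
    return [language, strategy]
  end

  # Override present: skip detection entirely unless a strategy is needed.
  return [blob.override_language, nil] unless instrumenter_enabled

  strategy, _ignored = detector.detect(blob)
  [blob.override_language, "#{strategy} (overridden by .gitattributes)"]
end

# Formats one line of --strategies output as "<FilePath> [<StrategyName>]".
def format_line(blob, strategy_label)
  "#{blob.name} [#{strategy_label}]"
end
```

For example, with a detector returning `["Heuristics", "FreeBasic"]` and an illustrative override of `Visual Basic 6.0` on `src/example.bas`, calling `resolve` with `instrumenter_enabled: false` returns the override without touching the detector, while `instrumenter_enabled: true` yields the line `src/example.bas [Heuristics (overridden by .gitattributes)]`.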

Adding the .gitattributes path seems to be too complicated to implement for what it adds, but I'm not against the idea in principle.

@DecimalTurn changed the title from "Add .gitattributes override as a strategy" to "Add .gitattributes override mention when returning the strategy" on Sep 23, 2025
Member

@lildude left a comment

Seems like a good idea to me. Can you please also update the README.md to document and show the output this change introduces.

@DecimalTurn
Contributor Author

DecimalTurn commented Oct 9, 2025

> Seems like a good idea to me. Can you please also update the README.md to document and show the output this change introduces.

In the latest commits, I've added:

  • An intro section about the different modes (Git Repository vs Single file)
  • Documentation about --strategies in Git Repository mode
  • Documentation about all relevant flags in Single file mode

README.md (outdated)

> If a file's language was overridden using `.gitattributes`, the strategy will show the original detection method along with an override note:
Member

"strategy will show the original detection method along with an override note"

🤔 This is confusing given the output below: devcontainer.json is already JSONC, but the output suggests it was overridden to JSONC. Why would you override to the same language?

I think it would be better to use an example where there is a clear difference in languages.

Contributor Author

> Why would you override to the same language?

I don't know why, but it's the only "override" in the linguist repo 🤷

Contributor Author

> I think it would be better to use an example where there is a clear difference in languages.

OK, I'll change it to a better example. I'm thinking of using the .bas file override in https://github.com/tannerhelland/PhotoDemon

Member

@lildude Oct 10, 2025

> I don't know why, but it's the only "override" in the linguist repo 🤷

Whoops. I'm not sure how that slipped past. We shouldn't need that override.

As an aside, we use the test/attributes branch for testing overrides. The .gitattributes in that branch has a lot more overrides.

Contributor Author

@DecimalTurn Oct 10, 2025

This discussion makes me think that the implementation of the --strategies flag should perhaps make the note appear different when detection gives the same language as the override.

Contributor Author

Something like:

```ruby
# Assumed context: the body of a `language` method override, where
# `detected_language` already holds the language from linguist-language.

# Get the original strategy by calling super (which calls Linguist.detect)
original_language = super
original_strategy_info = Linguist.instrumenter.detected_info[self.name]
original_strategy = original_strategy_info ? original_strategy_info[:strategy] : "Unknown"

# Determine whether .gitattributes actually changed the result
strategy_name =
  if original_language == detected_language
    "#{original_strategy} (confirmed by .gitattributes)"
  else
    "#{original_strategy} (overridden by .gitattributes)"
  end
```
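
The labeling decision in that snippet can be isolated as a pure function, which makes it easy to unit-test without running detection. This is a hypothetical helper (`strategy_label` is not part of Linguist), and the strategy and language strings in the example calls are illustrative only.

```ruby
# Hypothetical helper, not part of Linguist: builds the strategy label shown
# when a linguist-language override is present.
def strategy_label(original_strategy, original_language, override_language)
  strategy = original_strategy || "Unknown"
  # Same language either way: the override only confirms what detection found.
  verb = original_language == override_language ? "confirmed" : "overridden"
  "#{strategy} (#{verb} by .gitattributes)"
end

puts strategy_label("Extension", "JSONC", "JSONC")
# prints "Extension (confirmed by .gitattributes)"
puts strategy_label("Heuristics", "FreeBasic", "Visual Basic 6.0")
# prints "Heuristics (overridden by .gitattributes)"
puts strategy_label(nil, nil, "Ruby")
# prints "Unknown (overridden by .gitattributes)"
```

Keeping the comparison separate from detection means the "confirmed" versus "overridden" wording can be tested with plain strings, while the method override only has to supply the three inputs.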

3 participants