Skip to content

Add .iada and .DS_Store to .gitignore#648

Open
slawekjaranowski wants to merge 1 commit intomainfrom
gitignore
Open

Add .iada and .DS_Store to .gitignore#648
slawekjaranowski wants to merge 1 commit intomainfrom
gitignore

Conversation

@slawekjaranowski
Copy link
Member

No description provided.

@sebbASF
Copy link
Contributor

sebbASF commented Mar 7, 2026

Please, no.

Such ignores belong on the client, as they are client-specific (in this case IDEA and macOS).

@sebbASF sebbASF closed this Mar 7, 2026
@slawekjaranowski slawekjaranowski deleted the gitignore branch March 7, 2026 19:18
@potiuk
Copy link
Member

potiuk commented Mar 7, 2026

I would actually support it strongly. This is a trap that is set for people, and adding configuration in the repo is absolutely no problem. @slawekjaranowski - I am happy to accept such change.

This is how most repos that are "contributor friendly" look like nowadays.

@potiuk
Copy link
Member

potiuk commented Mar 7, 2026

More explanation:

  • casual contributors might accidentally commit those files when they are not ignored
  • having it in our repo protects it from the case where someone did not have global configuration
  • if we want more people to contribute, making it more difficult for them and basically expect them to have "extra requriements" to contribute is a bad idea
  • we do not have it described in requirements or prerequisites that whoever needs to contribute, need to ignore their IDE files
  • those are two lines in a .gitignore file that are harmless otherwise

Do you think @sebb making it difficult to contribute by casual contributors is a good idea? What are we trading-off with here ? What's the problem you are trying to solve?

@slawekjaranowski
Copy link
Member Author

@potiuk @sebbASF should we reopen it - to have a possibility to more people to give own opinions?

@potiuk
Copy link
Member

potiuk commented Mar 7, 2026

Push it again -> apparently the branch has been deleted.

@slawekjaranowski slawekjaranowski restored the gitignore branch March 7, 2026 22:01
@slawekjaranowski
Copy link
Member Author

Push it again -> apparently the branch has been deleted.

Two clicks, restore branch and then reopen 😄

@potiuk
Copy link
Member

potiuk commented Mar 7, 2026

Yeah. I would love to hear more opinions on that one indeed. I have mine, @sebbASF has different - I wonder what others think.

@sebbASF
Copy link
Contributor

sebbASF commented Mar 8, 2026

The gitignore file is going to have to include a lot more entries to be certain to catch all spurious files. If committers expect that they will be protected against spurious additions, they are less likely to check.

Are you sure that there are no files that we do want to include in the repo but which are also generated by local software?

If you truly want to be safe, it seems to me that the way to do this would be to use an allow list, rather than a deny list. i.e. exclude all file types except those known to be needed. This would obviously need to be updated from time to time (but so would a deny list)

@potiuk
Copy link
Member

potiuk commented Mar 8, 2026

The gitignore file is going to have to include a lot more entries to be certain to catch all spurious files. If committers expect that they will be protected against spurious additions, they are less likely to check.

Yes. There are well known patterns that are usually added when you start a repo (usually per language) - even when you create a repo in GH github propopses you add such .gitignore - we could (should) add more here indeed.

Here is the collection of commong GitIgnore patterns: https://github.com/github/gitignore

We could likely adopt and copy the right ignores for a subset of those.

If you truly want to be safe, it seems to me that the way to do this would be to use an allow list, rather than a deny list. i.e. exclude all file types except thoseecau known to be needed. This would obviously need to be updated from time to time (but so would a deny list)

This is problematic. We do it in Airflow for example for Dockerignore - we exclude everything and add what we want to add explicitly, because that sometime significantly speeds up building container images. because first thing that happens when you build image Docker will have to create and send "context" to the build engine and in case you have generated node modules or other files, that might be multiple GBs (and sometimes 10s of seconds of context preparation). But it causes multiple problems with collaboration because include pattern is far less obvious for people and difficult to reson for, and people are often surprised by it's behaviour. So our .gitignore in airflow is just regular ignoring everything that we do not want.

That seems to work well for many projects and in our case comparision of .gitignore and .dockerignore approach is that the default exclusion causes way more confusion.

I think it really depends what you want to optimize for. I tend to optimize fore new contributor's experience - make sure that those new contributors have very little surprises and that they are not falling into trap of accidentally committing things they are likely to accidentally commit. And new, casual contributors are more likely than not to use IDEs.

What's your optimisation goal @sebbASF ? What do you think will be the goal achieved by not having those ignores?

@dave2wave
Copy link
Member

dave2wave commented Mar 8, 2026

For ATR we have these additional patterns which if we are being thorough here should be considered:

.env
.mypy_cache/
.pytest_cache/
.ruff_cache/
.venv-poetry/
.venv-uv/
.vscode/
*.pyc
*.tmp.py

@potiuk
Copy link
Member

potiuk commented Mar 8, 2026

Yep. @dave2wave

Really helpful where you have Python code.

It's super annoying (for both maintainers and contributors) to see those committed in the PR and having an extra back-forth when it can be prevented. Especially with the current world of Agent generated contributions, those agents might also generate some of those files accidentally and commit them as part of a PR and having it explicitly defined in .gitignore saves a lot of time, focus on energy on things that might be entirely prevented.

@sebbASF
Copy link
Contributor

sebbASF commented Mar 8, 2026

deny lists are inherently unsafe, and will need constant updating, depending on changes to environments outside the control of the repo developers. And every gitignore file in every repo will need updating for each new file.

Whereas allow lists only depend on factors totally within the control of the repo developers, and will need fewer updates. Also only the relevant repos will need to be updated.

I also don't think it is helpful for users to absolve them of any responsibility for checking what they are committing.

If they get used to not bothering to check what they are committing, they may be in for a rude shock when working on a Git repo without some or all of the the ignores that they have become accustomed to.

And what about files that are not subject to gitignore but which should not be committed?
e.g. a test version of a file that contains some credentials?

@potiuk
Copy link
Member

potiuk commented Mar 8, 2026

deny lists are inherently unsafe, and will need constant updating, depending on changes to environments outside the control of the repo developer ts. And every gitignore file in every repo will need updating for each new file.

Very generic statement, and quite contrary to comon wisdom among ~50% of apache repositories - last that I checked that was a number of project that had .idea or .vscode in their .gitignores. And no - those patterns are well known to start with, and every time when we discover new file being needed we usuallly add a pattern covering it - making it future-proof. I do not think we have that many files to ignore when we have a good baseline.

I also don't think it is helpful for users to absolve them of any responsibility for checking what they are committing.

It's not absolving them from responsibity, it's just preventing mistakes when they are easy to prevent and likely to happen. That's a different thing. Also honestly I would prefer my contributor use their decision making pool on deciding whether the change they contribute make sense, rather than havin to remember things that can be automated and people can forget about it.

If they get used to hinot bothering to check what they are committing, they may be in for a rude shock when working on a Git repo without some or all of the the ignores that they have become accustomed to.

Of course. And first thing they will do is to propose to add .gitignore -as @slawekjaranowski did with his first commit I guess he just (As I do) run git status and saw that red message about some dangling files that should not be committed. That's quite expected and nice way to spread good practices.

And what about files that are not subject to gitignore but which should not be committed?
e.g. a test version of a file that contains some credentials?

I am not sure what it has to do with the main point It's a totally differen story - for that we usually use prek hooks, which could be good. I am about to propose it in the future- there are automated tools to prevent secrets, also GitHub has built in scanner (Which we have enabled for all repos) that prevents pushing something that looks like private key or credentials.

Just a kind reminder. You still haven't answered my question @sebbASF:

What's your optimisation goal @sebbASF ? What do you think will be the goal achieved by not having those ignores?

I think understanding where are you coming from and what goal you want to achieve would help us to understand how we can respond to your needs.

@raboof
Copy link
Member

raboof commented Mar 9, 2026

An allowlist would add more friction than it removes.

I don't think we can name 5 times where people created PRs that included an .idea folder to this repo, so it's not obvious there really is a problem to be solved here in the first place. A conversation can be had about whether it's more helpful to have a complete .gitignore in the repo, or to help new contributors learn about $HOME/.config/git/ignore.

That said, I also don't see strong a reason to deny this contribution, and we want to encourage people to bring their perspectives and use cases and contribute - for me that'd be reason enough to +1 this PR.

@potiuk
Copy link
Member

potiuk commented Mar 9, 2026

@slawekjaranowski : proposal - maybe you could take a look at those https://github.com/github/gitignore collection and compose a comprehensive list of .gitignore entries that fit the repo and update the PR with it :). That would be probably the best outcome of the discussion.

@sebbASF
Copy link
Contributor

sebbASF commented Mar 9, 2026

An allowlist would add more friction than it removes.

How so? I don't follow.

@slawekjaranowski
Copy link
Member Author

@slawekjaranowski : proposal - maybe you could take a look at those https://github.com/github/gitignore collection and compose a comprehensive list of .gitignore entries that fit the repo and update the PR with it :). That would be probably the best outcome of the discussion.

In https://github.com/github/gitignore we have many examples ...
This repo probably is not a Python application ... so I need a hint which one from mentioned collection should be applied here

This one contains to many Python releated items
https://github.com/github/gitignore/blob/main/Python.gitignore

@potiuk
Copy link
Member

potiuk commented Mar 9, 2026

An allowlist would add more friction than it removes.

How so? I don't follow.

I already explained how it works in case of Airflow. With dockerignore - it adds friction because you have to be aware of that unusual pattern, and remember adding it. This adds completely unnecessary friction and is counter-intuitive and people need extra explanation how it works, because it is not obvious or used at all.

Did you see my earlier comment about it @sebbASF ? Was not that enough of an explanation?

@sebbASF
Copy link
Contributor

sebbASF commented Mar 10, 2026

Whilst dockerignore and gitignore files both act by filtering resources, the end goals are rather different.

A dockerignore file is about excluding resources that are already in the repo.
This is not always easy to determine, as there may be many interdependencies, so I agree that using an allow list here can be tricky.

Whereas a gitignore file is about allowing files types that belong in the respository. This is very easy to determine, because the types can readily be extracted from the existing file-set and updated on the rare occasion when a new file type is introduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants