More thorough contribution guideline #15365
Replies: 4 comments 3 replies
-
Thank you @logan-keede for this. There is indeed a lot of refactoring going on and I think we can do much better w.r.t. how we approach refactoring. A few thoughts:
-
Do we have any communication channel for collecting feedback or announcing a feature branch? @alamb, what are your thoughts on this? Maybe we can try out the 'feature branch' approach with the remaining work in #14444.
-
In general I think there will always be a tension between:
-
@logan-keede could you potentially file a ticket with this idea (adding a cargo semver CI check)?
-
I am opening this discussion to talk about how to approach refactoring, and perhaps changes in general, so that they are easier on downstream repos and the review process is more efficient.
This came up while discussing my GSoC 2025 proposal for "Optimizing compile time and binary size" with @ozankabak, which is expected to involve a large amount of refactoring.
After some research, I found that almost no open source repository has something like a refactoring guideline, and that is reasonable: generally it is not needed, and a general contribution guideline is enough. However, DataFusion is perhaps a bit too refactoring happy/needy.


DataFusion :-
A repo with 17 times more commits than DataFusion:-
Perhaps a direct comparison is not fair, because we do need refactoring. So the best we can do is to make it easier for everyone.
Proposed Solution
Make a feature branch and do all the major refactoring there. Publish a roadmap on why the refactoring/change is necessary and what it changes. This is perhaps more useful for refactoring epics like "[Epic] Split datasources out from datafusion crate (datafusion/core)" #14444. (Suggested by @ozankabak over Discord.)
Use 'cargo-semver-checks' to detect unintentional API breakages. Even the smallest changes can break APIs in ways we cannot predict. Here is an article about this.
Add do's and don'ts to the guideline. Start with a tentative version and refine it over time.
DataFusion already has a Contribution Guideline, which explains the general style in which we handle PRs and issues, but it does not go into great detail about what to do and what not to do. While this is not a big problem (if a problem at all) for more experienced members of the community, it is still good to highlight good and bad practices for newer members.
This also makes sure that we have a DataFusion way of dealing with problems and that there are, as far as possible, no unexpected or uninformed API changes/breakages. It will also save some reviewing bandwidth, as reviewers will not have to explain the same common reasons for rejection again and again.
It would be valuable to collect the community's ideas on this in this discussion, along with feedback from downstream maintainers on what kinds of DataFusion issues they face that could be avoided through better policy.
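To make the point about small changes breaking APIs concrete, here is a minimal sketch of the kind of breakage I have in mind. `ScanOptions` and its fields are hypothetical names made up for this example, not real DataFusion APIs, and the two modules only stand in for two releases of the same crate. As far as I understand, this class of change is one that 'cargo-semver-checks' is designed to flag.

```rust
// Hypothetical library type (NOT a real DataFusion API), as published in "v1".
mod v1 {
    #[derive(Debug, Default)]
    pub struct ScanOptions {
        pub batch_size: usize,
    }
}

// The same type after a seemingly harmless refactor: one new public field.
mod v2 {
    #[derive(Debug, Default)]
    pub struct ScanOptions {
        pub batch_size: usize,
        pub collect_stats: bool,
    }
}

// Downstream code written against v1 uses an exhaustive struct literal.
fn downstream_v1() -> v1::ScanOptions {
    v1::ScanOptions { batch_size: 8192 }
}

// The same literal no longer compiles against v2 (error E0063, missing field):
//     v2::ScanOptions { batch_size: 8192 }
// Downstream has to be edited, for example to fall back on `Default`.
fn downstream_v2() -> v2::ScanOptions {
    v2::ScanOptions {
        batch_size: 8192,
        ..Default::default()
    }
}

fn main() {
    println!("{:?}", downstream_v1());
    println!("{:?}", downstream_v2());
}
```

Nothing in the v2 diff looks like an API change at review time, which is why an automated check seems more reliable here than relying on reviewers to spot it.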