Minor changes to Python fileio #24363
The file naming function also takes the destination as an input variable.
When fileio writes several shards, the last shard is named "00123-of-00124", which may look like incomplete processing. This commit numbers shards from 1, so the first shard is named 1 and the last is named N (for N shards).
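To make the naming change concrete, here is a minimal sketch. The helper `shard_name` and its `one_based` flag are hypothetical and written only for this illustration; they are not part of the Beam API. The five-digit zero padding is taken from the "00123-of-00124" example above.

```python
def shard_name(shard_index, total_shards, one_based=False):
    """Format a shard label such as '00123-of-00124'.

    shard_index is always the zero-based position of the shard.
    With one_based=True the label is shifted by one, so the last
    shard of N reads N-of-N.
    """
    label = shard_index + 1 if one_based else shard_index
    return "%05d-of-%05d" % (label, total_shards)


total = 124
# Zero-based labels: the last of 124 shards reads 00123-of-00124.
print(shard_name(total - 1, total))
# One-based labels: the last of 124 shards reads 00124-of-00124.
print(shard_name(total - 1, total, one_based=True))
```

The internal index stays zero-based in this sketch; only the label changes, which is one way to limit the change to file names.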
Codecov Report
@@ Coverage Diff @@
## master #24363 +/- ##
=======================================
Coverage 73.35% 73.35%
=======================================
Files 718 718
Lines 97033 97033
=======================================
+ Hits 71177 71180 +3
+ Misses 24525 24522 -3
Partials 1331 1331
Assigning reviewers. If you would like to opt out of this review, comment R: @AnandInguva for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments).
run python precommit
LGTM, just rerunning python
This makes more sense, obviously. My only worry is that some user apps might assume the (00123-of-00124) naming pattern, and we would break them mysteriously. It may deserve a note under "breaking changes".
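As an illustration of the concern above, here is a sketch of a hypothetical downstream check that assumes zero-based shard names. The function `missing_shards_zero_based`, the `out-` prefix, and the regular expression are assumptions made for this example; nothing like this is known to exist in any user application.

```python
import re

# Matches labels of the form 00123-of-00124 anywhere in a file name.
SHARD_RE = re.compile(r"(\d{5})-of-(\d{5})")


def missing_shards_zero_based(file_names):
    """Return the shard indices a zero-based consumer would consider missing."""
    seen = set()
    total = None
    for name in file_names:
        match = SHARD_RE.search(name)
        if match:
            seen.add(int(match.group(1)))
            total = int(match.group(2))
    if total is None:
        return []
    # A zero-based consumer expects indices 0 .. total - 1.
    return sorted(set(range(total)) - seen)


zero_based = ["out-00000-of-00002", "out-00001-of-00002"]
one_based = ["out-00001-of-00002", "out-00002-of-00002"]
print(missing_shards_zero_based(zero_based))
print(missing_shards_zero_based(one_based))
```

With one-based names, this check would report shard 0 as missing even though both files are present, which is the kind of silent breakage a changes-file note would warn about.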
Agreed with Yi. Can you add a note to the changes file?
I also suspect the same issue exists elsewhere in the code base, e.g. here:
Unfortunately, finding and changing all of these would be tedious...
Sorry, I mean the finding is great; however, because the same issue exists elsewhere, a partial fix would make the indexing inconsistent throughout the code base.
Ah, ok. I have checked that this is used in fileio and in the dataframes module. I will update this PR with more details about the potential impact. So far, in my tests, this does not break anything anywhere else in the Python SDK. Or do you mean that the behavior should be the same in the Java and Python SDKs? I can also add a similar change to the Java SDK, and check where it is used in the rest of the SDK to evaluate the potential impact.
Yes, ideally naming should be consistent throughout the SDKs. Currently it is consistent (though weird). I would appreciate it if you are willing to find the occurrences in both the Java and Python SDKs.
Reminder, please take a look at this PR: @AnandInguva @johnjcasey
Assigning new set of reviewers because PR has gone too long without review. If you would like to opt out of this review, comment R: @tvalentyn for label python. Available commands:
stop reviewer notifications
Stopping reviewer notifications for this pull request: requested by reviewer
I'd like to keep working on this and change the Go and Java SDKs. What should I do with Git? Should I create a new branch from master? It seems that I cannot create a branch based on this branch (iht:minor_python_doc).
Thanks for the interest. I would expect many places to need changes and, imo, limited benefit compared to the risk.
I am going to close this PR now, and I will send a new one with consistent changes between the Python and Java SDKs. After the holidays, this branch is outdated, and I prefer to start over rather than resolve the conflicts.
This PR fixes #24362 and adds a slight change to the documentation, to make it clear that the `file_naming` function also receives `destination` as a parameter (it is mentioned at the beginning of the documentation page, but in `WriteToFiles` that parameter description is missing).
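For readers unfamiliar with the `destination` parameter, here is a minimal sketch of a naming callable that uses it. It assumes the callable is invoked with the positional arguments `(window, pane, shard_index, total_shards, compression, destination)`; that is my understanding of how fileio calls naming functions, but it should be checked against the Beam documentation. The factory `destination_file_naming` and the `"errors"` destination are hypothetical names for this example.

```python
def destination_file_naming(prefix, suffix=""):
    """Return a naming callable that puts the destination in the file name."""

    def _name(window, pane, shard_index, total_shards, compression, destination):
        # window, pane and compression are accepted but ignored in this sketch.
        parts = [prefix]
        if destination is not None:
            parts.append(str(destination))
        if shard_index is not None and total_shards is not None:
            parts.append("%05d-of-%05d" % (shard_index, total_shards))
        return "-".join(parts) + suffix

    return _name


naming = destination_file_naming("output", suffix=".txt")
# Name for the first of two shards routed to the "errors" destination.
print(naming(None, None, 0, 2, None, "errors"))
```

Under the stated assumption, such a callable would be passed as the `file_naming` argument of `WriteToFiles`; the sketch itself runs without Beam installed.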