Update pyarrow and microdata-tools by pawbu · Pull Request #156 · statisticsnorway/microdata-job-executor

pawbu · 2025-03-26T13:13:40Z

One test for partitioned datasets reads the .parquet file directly, which also returns the partition column. Added a test for the usual case, where we ask only for the partition folder "start_year=123" and not the file "start_year=123/04f4164ec1f247f2ad392fa9c03e71fe-0.parquet".

sonarqubecloud · 2025-03-26T13:14:04Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

DanielElisenberg · 2025-03-26T13:27:15Z

~~so reading the whole directory in as a table doesn't include the column that was partitioned upon, but reading one of the partitions directly yields this column?🤔~~

EDIT:

Reading the whole dataset would yield the partitioned column status_date
Reading the partition folder has status_date
Reading the parquet in the partition folder does not have status_date

Would this be correct? And why do we want to test reading from a single partition, since the app never does? 👀 To document pyarrow behavior?

pawbu · 2025-03-26T13:47:00Z

start_year is the partitioned column in this case, so:

Reading the whole dataset would yield the partitioned column start_year
Reading the partition folder does not yield start_year
Reading the (single) parquet file in the partition folder would yield start_year

> And why do we want to test reading from a single partition, since the app never does?

I haven't checked the reason for the test in question. I can do that later, after merging this PR: the new microdata-tools is out, so we should not be running a different version here in job-executor for too long 👍

DanielElisenberg

That makes sense 👍🏻 Reasonable conclusion. Let's look at whether reading the separate folders and containing parquet files is reasonable in a new PR 💯

deps: update pyarrow and microdata-tools

b250da1

pawbu requested a review from a team as a code owner March 26, 2025 13:13

DanielElisenberg approved these changes Mar 26, 2025


pawbu merged commit 180c62e into main Mar 27, 2025
6 checks passed

pawbu deleted the update-pyarrow branch March 27, 2025 07:58
