
recursive_file_lookup doesn't read files from subdirectories when set to True in add_parquet_asset #11017

Open
@jagpsz

Describe the bug
I would like to read all Parquet files in the subdirectories of an S3 prefix, and I am doing this in Databricks. My data is partitioned by yyyy, mm, dd and hh, but I want to validate a whole day at once. However, recursive_file_lookup does not seem to work as expected.

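With the day-level prefix, I get:
TestConnectionError: No file in bucket "my_bucket" with prefix "" and recursive file discovery set to "False" found using delimiter "/" for DataAsset "inventory_parts_asset_".

Note that the error reports recursive file discovery as "False" and an empty prefix (""), even though recursive_file_lookup=True and a non-empty s3_prefix were passed.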
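The error mentions listing with delimiter "/" and non-recursive file discovery. The sketch below is a simplified, hypothetical model of S3 ListObjectsV2 prefix/delimiter semantics (not Great Expectations' implementation). It illustrates why a day-level prefix can yield no files when discovery is non-recursive: with delimiter "/", only objects directly under the prefix are listed, and the Parquet files sit one level deeper in the hh=... directories. The object keys and file names are made up for illustration.

```python
# Simplified, hypothetical model of S3 ListObjectsV2 prefix/delimiter listing.
# It is NOT Great Expectations' code; it only illustrates the listing rules.


def list_keys(keys, prefix, delimiter=None):
    """Return object keys under `prefix`.

    With a delimiter, keys whose remainder after `prefix` still contains the
    delimiter are skipped (S3 would roll them up into CommonPrefixes instead
    of returning them as objects). Without a delimiter, every key under
    `prefix` is returned, i.e. the listing is effectively recursive.
    """
    matched = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            continue  # deeper "subdirectory" object: not listed directly
        matched.append(key)
    return matched


# Hypothetical keys following the yyyy/mm/dd/hh layout from the issue.
KEYS = [
    "my_prefix/yyyy=2025/mm=03/dd=09/hh=00/part-0000.parquet",
    "my_prefix/yyyy=2025/mm=03/dd=09/hh=01/part-0000.parquet",
]

day = "my_prefix/yyyy=2025/mm=03/dd=09/"
hour = "my_prefix/yyyy=2025/mm=03/dd=09/hh=00/"

print(list_keys(KEYS, day, delimiter="/"))   # -> [] (files are one level deeper)
print(list_keys(KEYS, day))                  # -> both keys (recursive listing)
print(list_keys(KEYS, hour, delimiter="/"))  # -> only the hh=00 key
```

In this model, the hour-level prefix finds files even without recursion, which matches the "It works" case in the reproduction below.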

To Reproduce

import great_expectations as gx

# Get the Ephemeral Data Context
context = gx.get_context(mode="ephemeral")
assert type(context).__name__ == "EphemeralDataContext"

# Define the Data Source's parameters:
data_source_name = "source_name"
bucket_name = "my_bucket"
boto3_options = {
    "region_name": "region",  
    "endpoint_url": "endpoint_url",  
    "aws_access_key_id": "key_id",  
    "aws_secret_access_key": "access_key", 
}
# Create the Data Source:
data_source = context.data_sources.add_or_update_spark_s3(
    name=data_source_name, bucket=bucket_name, boto3_options=boto3_options
)

asset_name = "inventory_parts_asset_"

# Variant 1, day-level prefix: does NOT work (fails with the TestConnectionError above).
s3_prefix = "my_prefix/yyyy=2025/mm=03/dd=09/"
data_asset = data_source.add_parquet_asset(name=asset_name, s3_prefix=s3_prefix, recursive_file_lookup=True)

# Variant 2, hour-level prefix: works. Run only one variant at a time, since both use the same asset name.
s3_prefix = "my_prefix/yyyy=2025/mm=03/dd=09/hh=00/"
data_asset = data_source.add_parquet_asset(name=asset_name, s3_prefix=s3_prefix, recursive_file_lookup=True)
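The two prefixes in the reproduction differ only in partition depth: the failing one stops at dd=09, and the working one goes down to hh=00. The hypothetical helper below (not part of Great Expectations) builds both from a date using the same Hive-style yyyy=/mm=/dd=/hh= layout. Assuming one hh= directory per hour, it also lists the hour-level prefixes that a recursive lookup on the day-level prefix would need to cover.

```python
from datetime import date
from typing import Optional


def partition_prefix(base: str, day: date, hour: Optional[int] = None) -> str:
    """Build a Hive-style yyyy=/mm=/dd=(/hh=) S3 prefix ending in "/".

    Hypothetical helper for illustration; it mirrors the prefixes used in
    the reproduction above.
    """
    prefix = f"{base}/yyyy={day:%Y}/mm={day:%m}/dd={day:%d}/"
    if hour is not None:
        prefix += f"hh={hour:02d}/"
    return prefix


d = date(2025, 3, 9)
print(partition_prefix("my_prefix", d))          # my_prefix/yyyy=2025/mm=03/dd=09/
print(partition_prefix("my_prefix", d, hour=0))  # my_prefix/yyyy=2025/mm=03/dd=09/hh=00/

# Hour-level prefixes under the day, assuming one hh= directory per hour.
hour_prefixes = [partition_prefix("my_prefix", d, hour=h) for h in range(24)]
```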

Expected behavior
With recursive_file_lookup=True and the day-level prefix, all Parquet files in the hh=... subdirectories should be read.

Environment

  • Operating System: macOS
  • Great Expectations Version: 1.3.9
  • Cloud environment: AWS
