DuckDB throws Invalid data or Snappy decompression failure while reading parquet from S3 after loading cache_httpfs

**Describe the bug**
A large Parquet file (50M records, size: 4GB) is written with snappy compression using `pq.ParquetWriter(output_file, schema, compression='snappy')` and stored in S3.

When cache_httpfs is loaded, selecting from this file in S3 with read_parquet fails with `TProtocolException: Invalid data` (or, occasionally, `don't know what type:`).



**To Reproduce**
1. Create parquet using snappy compression (50M records and size > 4GB):
```
writer = pq.ParquetWriter(output_file, schema, compression='snappy')
```

2. Install and Load cache_httpfs:
```
INSTALL cache_httpfs FROM community;
LOAD cache_httpfs;
```

3. Set the S3 credentials:
```
D SET s3_region='***';
D SET s3_access_key_id='***';
D SET s3_secret_access_key='***';
```

4. SELECT using read_parquet:
```
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
don't know what type:
D explain analyze SELECT * FROM read_parquet('your-s3-parquet-path');
Invalid Error:
TProtocolException: Invalid data
```


Even after multiple retries, the query does NOT succeed.
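
To help rule out a damaged object, the file can be downloaded and its framing checked locally. The sketch below is only a sanity check, not a full validation: it relies on the Parquet layout (a `PAR1` magic at the start, and a 4-byte little-endian footer length followed by `PAR1` at the end). The function name `check_parquet_framing` is hypothetical, and the check does not apply to files written with an encrypted footer.

```python
import struct
from pathlib import Path

MAGIC = b"PAR1"  # magic for Parquet files with a plaintext footer


def check_parquet_framing(path):
    """Return the footer length in bytes if the file framing looks sane.

    Raises ValueError if the magic bytes are missing or the recorded
    footer length cannot fit inside the file.
    """
    size = Path(path).stat().st_size
    # Smallest possible file: leading magic + footer length + trailing magic.
    if size < 12:
        raise ValueError(f"file too small to be Parquet: {size} bytes")

    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # last 8 bytes: footer length + trailing magic
        tail = f.read(8)

    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic at start or end of file")

    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len + 12 > size:
        raise ValueError(f"footer length {footer_len} exceeds file size {size}")
    return footer_len
```

If this passes on a local copy while the S3 read through cache_httpfs still fails, that suggests the stored object is intact and the bytes are being altered somewhere on the read path.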

**Expected behavior**
The query should execute successfully.

**Screenshots**
NA

**Desktop (please complete the following information):**
MacOS arm64

**Smartphone (please complete the following information):**
NA

**DuckDB Version:**
v1.4.3 (Andium) d1dc88f950

**DuckDB Client:**
CLI

**Additional context**
This was originally filed as an issue against DuckDB: https://github.com/duckdb/duckdb/issues/20167

