
feat: avoid double fetches on ibis.duckdb.connect().read_csv("https://slow_url").cache() #10845

Closed
@NickCrews

Description


Is your feature request related to a problem?

When you call .read_csv() on the duckdb backend, DuckDB fetches part of the data in order to sniff the schema. Then, when you call .cache() on the resulting view, it goes back to the source and fetches the full data. The remote file is therefore requested twice.
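One possible workaround today is to pay for the network fetch exactly once: download the file yourself, then point read_csv() at the local copy so that both the schema sniff and the .cache() read hit local disk. This is a minimal sketch that assumes the CSV fits on local disk. `download_once` and `read_csv_cached` are hypothetical helpers for illustration, not ibis API. Only `ibis.duckdb.connect()`, `read_csv()` and `.cache()` come from the issue itself.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def download_once(url: str) -> Path:
    """Fetch `url` exactly once and return the path of a local copy.

    Hypothetical helper for illustration; it is not part of ibis.
    """
    # Use the last path segment of the URL as the file name, if there is one.
    name = Path(urllib.parse.urlparse(url).path).name or "data.csv"
    local_path = Path(tempfile.mkdtemp()) / name
    # Stream the response body straight to disk: a single request to the source.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def read_csv_cached(url: str):
    """Hypothetical: download `url` once, then read and cache the local copy."""
    # Imported lazily so download_once() can be used without ibis installed.
    import ibis

    con = ibis.duckdb.connect()
    local_path = download_once(url)
    # Both the schema sniff in read_csv() and the full read in cache()
    # now hit the local file instead of the slow URL.
    return con.read_csv(str(local_path)).cache()
```

Usage would look like `t = read_csv_cached("https://slow_url")`. The trade-off is extra local disk usage, and this sketch does not clean up the temporary file.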

This is related to #9931.

What is the motivation behind your request?

I am working with relatively large tables over a slow internet connection. Each fetch takes about 30 seconds, so I would like to avoid fetching the data twice.

Describe the solution you'd like

Since .read_csv() has to return a Table with a known schema, some data must be fetched during that call. So I think we need to either add an optional argument to the function or create an entirely new function. I would vote for adding a parameter if we can come up with something sane. Maybe cache: bool?
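To make the proposal concrete, here is a sketch of what a `cache: bool` argument could look like. It is written as a hypothetical wrapper rather than as a change to ibis itself. It assumes `con` is something like an ibis DuckDB backend (any object whose `read_csv()` returns something with a `.cache()` method). It also assumes that downloading a remote file once to a temporary local path is an acceptable implementation strategy; ibis may well choose a different one.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def read_csv(con, source: str, *, cache: bool = False):
    """Hypothetical wrapper sketching the proposed `cache: bool` argument.

    `con` is assumed to expose `read_csv(path)` returning a table-like
    object with a `.cache()` method, e.g. an ibis DuckDB backend.
    """
    if not cache:
        # Current behaviour: a lazy view. Sniffing touches the source now,
        # and any later .cache() reads the source again.
        return con.read_csv(source)

    if urllib.parse.urlparse(source).scheme in ("http", "https"):
        # Remote source: download it once so that both the schema sniff
        # and the materialization below read from local disk.
        local = Path(tempfile.mkdtemp()) / "data.csv"
        with urllib.request.urlopen(source) as resp, open(local, "wb") as out:
            shutil.copyfileobj(resp, out)
        source = str(local)

    # Materialize immediately so callers get a cached table back.
    return con.read_csv(source).cache()
```

Keeping `cache=False` as the default preserves today's lazy-view behaviour, so only callers who opt in pay for the temporary file.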

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata


Assignees: No one assigned
Labels: feature (Features or general enhancements)
Type: No type
Projects: Status: done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
