Skip to content

feat: avoid double fetches on ibis.duckdb.connect().read_csv("https://slow_url").cache() #10845

Open
@NickCrews

Description

@NickCrews

Is your feature request related to a problem?

When you call .read_csv() on the duckdb backend, this makes duckdb actually go fetch [some] of the data in order to sniff the schema. Then, when you call .cache() on the created view, it actually goes and fetches the full data.

This is related to #9931.

What is the motivation behind your request?

I am working on relatively large tables on a slow internet connection. Each fetch takes about 30 seconds. I would like to avoid this double fetch.

Describe the solution you'd like

Since the result of .read_csv() needs to be a Table with a known schema, it is going to be required to fetch some data during that function call. So, I think we need to add an optional argument to the function, or create entirely new function. I would vote for adding params if we can come up with something sane. Maybe cache: bool?

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureFeatures or general enhancements

    Type

    No type

    Projects

    • Status

      backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions