feat: improve API for choosing a backend on memtable.cache() #10942

### Is your feature request related to a problem?

Consider the following script:

```python
import ibis

conn = ibis.duckdb.connect("mydb.duckdb")
if "my_table" not in conn.tables:
    conn.create_table("my_table", schema={"c": "int64"}, overwrite=False)


def get_new_data():
    t = ibis.memtable([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
    t = t.select(c=t.a + t.b)
    print(t._find_backends())
    # ([], False)
    t = t.cache()
    print(t._find_backends())
    # ([<ibis.backends.duckdb.Backend object at 0x11df823d0>], False)
    t = t.mutate(c=t.c + 1)
    return t

def ingest(conn, new):
    print(conn)
    # <ibis.backends.duckdb.Backend object at 0x11fb4ef10>
    print(f"adding {new.count().execute()} rows to my_table")
    already_there = new.semi_join(conn.table("my_table"), "c")
    print(f"skipping {already_there.count().execute()} rows already in my_table")
    # IbisError: Multiple backends found for this expression
    return conn.insert("my_table", new)
    # Catalog Error: Table with name ibis_cached_ao56qedyyva3hjxvhrew7g3utu does not exist!

ingest(conn, get_new_data())
```

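A freshly created memtable has no backend. But as soon as you call `.cache()` on it, it ends up in the default backend. In the script above, the default backend and `conn` are both DuckDB, but they are separate connections (note the different object addresses in the printed output), so the cached table does not exist in `conn`. This is a problem whenever I want a memtable to interact with a backend that is not the default.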

I have been sporadically running into this issue for literally 2 years, and only now am I realizing what the root cause is. It was so hard to track down because it's such spooky action at a distance: adding a `.cache()` on some distant line of code made the error pop up only much later in the script.

### What is the motivation behind your request?

The workaround is to never `.cache()` any memtables. But some of my intermediate computations are expensive, so I really do want to be able to cache them in the middle of the computation chain.

### Describe the solution you'd like

A few ideas, none of which I love:

### 1. Optional backend param to memtable

Add a `backend=None` param to `ibis.memtable`. Then, whenever a subsequent `.cache()` happens, the expression uses this backend. Unfortunately, this means that if you do `mt = ibis.memtable(..., backend=conn)`, then `mt` is forever compatible only with `conn`; you can't use it with `conn2`. That is a little counterintuitive to my mental model of a memtable: an ibis table that is in-memory and thus works with any backend.

### 2. Optional backend param to .cache()

Really, the only time we need to decide on a backend for a memtable is when we start running computations. Ideally we shouldn't need to specify the backend at expression creation time. So perhaps the API is `Table.cache(backend=None)`. But this is awkward, because you could write `backend1.table("t").cache(backend=backend2)`, which should be illegal. It would be better if the API made this impossible to express.

### 3. Add Backend.cache() method

The backend is chosen by calling a method on the backend itself. This keeps the API of `ibis.memtable` and `Table.cache` from needing the backend param, which is nice.

### 4. On memtable.cache(), you get another backend-agnostic memtable

`memtable.cache()` would use the default backend (usually DuckDB) to compute the result, but then return a special Op that, when required, goes to that backend, calls e.g. `.to_pyarrow()` on the result, and hands you back a new `ibis.memtable` built from that. This is equivalent to `ibis.memtable(t.to_pyarrow(), schema=t.schema())`. Ideally this would be lazy, so that the memtable is materialized only on demand, when crossing the boundary between two backends.
This is the most complex solution, but the best user API.

Of all these, I really want to avoid the middle two options, because I want to make my `expensive_computation` function backend-agnostic. I don't want to have to worry about which backend the given table is in, and both of the middle two options require the user to choose a backend at cache time.

### What version of ibis are you running?

main

### What backend(s) are you using, if any?

_No response_

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
