Skip to content

feat(postgres): more flexible map type implementation #10484

Open
@cpcloud

Description

@cpcloud

Discussed in #10483

Originally posted by augcollet November 13, 2024
Hello,

I need your help to resolve a specific problem...

From the following data with postgresql backend :

import ibis
from ibis import _
import os

con = ibis.postgres.connect(
    user=os.getenv('POSTGRES_USER'),
    password=os.getenv('POSTGRES_PASSWORD'),
    host="postgres",
    port=os.getenv('POSTGRES_PORT'),
    database=os.getenv('POSTGRES_DB'),
)
ibis.set_backend(con)

t=ibis.memtable({
    'client_id':[0,1,0,2,3,0,1,2,3],
    'product':['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b'],
    'amount':[1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3],
})

image

I'm trying to perform the following calculation :
image

I tried the following approach:

data=(
    t.group_by(['client_id', 'product'])
    .agg(
        sum_amount=_['amount'].sum()
    )
    .group_by(['client_id'])
    .agg(
        products_and_sum_amounts=ibis.map(
            _['product'].collect(),
            _['sum_amount'].collect()
        )
    )
)
data.execute()

I get the following error :
image

It seems that ibis uses hstore to store data from a .map, which is incompatible with numeric values.

I have to cast the values ​​to a string before using .collect to get a result.
image

How can I get around this? For example, how can I build a JSON object instead of MapValue?

( My goal is to exploit the resulting pandas dataset to use it with a DictVectorizer under sklearn.
https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.DictVectorizer.html )

Thank you in advance for your support!

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureFeatures or general enhancementspostgresThe PostgreSQL backendrefactorIssues or PRs related to refactoring the codebase

    Type

    No type

    Projects

    • Status

      backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions