
Feature Request: support for xarray objects #59

Open
@andersy005


First, fantastic work! provenance has a lot of features I am looking for 😀.

Second, I would like to extend provenance to support one of my use cases: tracking the provenance of xarray objects. Currently, calling a provenance-decorated function on an xarray object raises an AttributeError.

See this example:
In [1]: import provenance as p 
   ...:  
   ...:  
   ...: p.load_config({'blobstores': 
   ...:                {'disk': {'type': 'disk', 
   ...:                          'cachedir': 'artifacts', 
   ...:                          'read': True, 
   ...:                          'write': True, 
   ...:                          'read_through_write': False, 
   ...:                          'delete': True}}, 
   ...:                'artifact_repos': 
   ...:                {'local': {'type': 'postgres', 
   ...:                           'db': 'postgresql://localhost/provenance-basic-example', 
   ...:                           'store': 'disk', 
   ...:                           'read': True, 
   ...:                           'write': True, 
   ...:                           'create_db': True, 
   ...:                           'read_through_write': False, 
   ...:                           'delete': True}}, 
   ...:                'default_repo': 'local'})                                                                                                                                       

Out[1]: <provenance.repos.Config at 0x116021fd0>

In [2]: import xarray as xr

In [3]: ds = xr.tutorial.open_dataset('rasm')                                                                                                                                          

In [4]: ds                                                                                                                                                                             
Out[4]: 
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 ...
    yc       (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 27 14:15:22 2016: ncatted -a dimension...

In [5]: @p.provenance 
   ...: def anomaly(ds, groupby='time.year'): 
   ...:     group = ds.groupby(groupby) 
   ...:     clim = group.mean() 
   ...:     return ds - clim 
   ...:                                                                                                                                                                                

In [6]: anom = anomaly(ds.Tair)                                                                                                                                                        
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/opt/miniconda3/envs/sandbox/lib/python3.8/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    261         try:
--> 262             object.__setattr__(self, name, value)
    263         except AttributeError as e:

AttributeError: 'DataArray' object has no attribute '_provenance_metadata'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
<ipython-input-6-a0084989764c> in <module>
----> 1 anom = anomaly(ds.Tair)

~/devel/ncar/provenance/provenance/core.py in wrapped(f)
    680         if tags:
    681             _custom_fields['tags'] = tags
--> 682         f._provenance_metadata = {
    683             'version': version,
    684             'name': name,

~/opt/miniconda3/envs/sandbox/lib/python3.8/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    268             ):
    269                 raise
--> 270             raise AttributeError(
    271                 "cannot set attribute %r on a %r object. Use __setitem__ style"
    272                 "assignment (e.g., `ds['name'] = ...`) instead of assigning variables."

AttributeError: cannot set attribute '_provenance_metadata' on a 'DataArray' object. Use __setitem__ styleassignment (e.g., `ds['name'] = ...`) instead of assigning variables.

I would like to help with this, but first I want to confirm whether this is something the provenance devs would be willing to support, or whether I am missing something.
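One possible reading of the traceback, offered as a guess rather than a diagnosis: the failing frame is `wrapped(f)`, and it fails while setting `f._provenance_metadata`, which suggests that the DataArray itself was passed in as `f`. That would happen if `p.provenance` is a decorator factory and is applied without parentheses. The sketch below reproduces that pattern with stand-ins only. The `provenance` function and the `Locked` class here are hypothetical and are not the real provenance or xarray code; `Locked` merely imitates an object that rejects new attributes.

```python
def provenance(version=0, name=None):
    """Hypothetical stand-in for a decorator factory (not the real provenance API)."""
    def wrapped(f):
        # Mirrors the line shown in the traceback: metadata is attached
        # to whatever object is passed in as `f`.
        f._provenance_metadata = {
            'version': version,
            'name': name or getattr(f, '__name__', None),
        }
        return f
    return wrapped


class Locked:
    """Stand-in for an object that rejects arbitrary new attributes."""
    __slots__ = ('value',)

    def __init__(self, value):
        self.value = value


@provenance  # no parentheses: the function is bound to `version`
def double(x):
    return Locked(x.value * 2)


# `double` is now the inner `wrapped`, so calling it passes the data
# object in as `f` and the attribute assignment fails.
try:
    double(Locked(1))
    bare_error = None
except AttributeError as exc:
    bare_error = exc


@provenance()  # with parentheses: `wrapped` receives the function
def double_ok(x):
    return Locked(x.value * 2)


result = double_ok(Locked(21))
```

If this reading applies, using `@p.provenance()` in In [5] may avoid this particular error. Whether xarray objects can then be hashed and serialized by provenance is a separate question that this sketch does not address.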
