New interface to tools #5830
Thanks for the overview! It's nice and clear, making all the individual aspects much easier to grasp. Looking forward to the PR leaving the draft stage. I love especially the Just a quick question:
If I understand how it works from the example, then there would be no need for the I almost tend to say that it is not worth it to specify prefix when the only thing it gives you is the "shortcut". But only almost. The thing I like besides the tool name being the same as the "real" one is that autocompletion in your IDE/Python console then suggests the tools that you are interested in two keys earlier. :-) (summarizing it like this, it seems like too much work for almost nothing, right? Would anybody other than me even be interested in the shortcuts? Because if not, then I can live without them and the implementation of
Nice! The general idea of representing the tools as functions makes more sense than the objects in the pygrass interface.
Intro
I'm not happy with gs.run_command and friends and I'm not happy with the Module class from grass.pygrass either (I can elaborate on that).
I did some work on creating a new API which converged with work I did for making the individual tools more accessible in the command line. To allow for more specific feedback during PR reviews, I need to split the work into multiple PRs. This will hopefully help with clarity in general because, as far as the original goals are concerned, there is a lot of scope creep. On the other hand, individual PRs without the big picture would be hard to evaluate. So, I'm creating this discussion to describe the whole idea, hopefully allowing for better discussion here and more focused discussion in the PRs (once I start creating them).
Python API
Tool call is a function call
Call of a tool is a function call with the corresponding name:
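A minimal sketch of how such a call could work, assuming underscores in the function name stand for dots in the tool name. ToolsSketch is a hypothetical stand-in, not the actual Tools class, and it only builds the command line instead of running the tool:

```python
class ToolsSketch:
    """Hypothetical stand-in for the Tools object; it only builds the command."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e., for tool names.
        if name.startswith("_"):
            raise AttributeError(name)
        # Assumption: underscores in the function name map to dots in the tool name.
        tool_name = name.replace("_", ".")

        def call_tool(**kwargs):
            # A real implementation would run the tool as a subprocess here.
            return [tool_name] + [f"{key}={value}" for key, value in kwargs.items()]

        return call_tool


tools = ToolsSketch()
command = tools.r_random_surface(output="surface", seed=42)
print(command)
```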
While this is syntactic sugar, i.e., it is functionally equivalent to gs.run_command("r.random.surface", output="surface", seed=42) or subprocess.run(["r.random.surface", "output=surface", "seed=42"]), it makes the API look like any other Python API, completely hiding the subprocess nature of the underlying implementation.
This is similar to grass.pygrass.modules.shortcuts, but it uses a single object which is directly exposed to the user, allowing for additional uses of the object (see, e.g., overwrite below). It avoids the need to import the shortcut objects such as grass.pygrass.modules.shortcuts.raster and the need for the "split" syntax for the tool category (prefix) and the rest of the name (raster.random_surface or r.random_surface depending on how it was imported).
Direct indexing of parsed JSON
For text outputs, we have more and more JSON, so the access to it is as simple as possible (similar to the current best shot at this with parse_command(...format="json")):
In the background, a text output is parsed and cached if (and only if) the indexing happens, so there is no additional cost unless you decide to index the result. This is done through a result object returned by the function. All text outputs are captured by default, so there is no need to set anything like stdout=PIPE to make this work.
Result accessed through attributes
If JSON is not the right thing, additional attributes allow for accessing the result in other ways, including plain text:
There is an underlying result object, similar to what subprocess.run returns. Our object has attributes tailored to what typical GRASS text outputs are. Most of these are on-the-fly computed properties. I did this part of the work before JSON became prevalent in GRASS, so I included all sorts of outputs including key-value pairs.
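A rough sketch of such a result object, covering both the indexing that parses JSON on first use and the computed attributes. The class name ResultSketch and the attribute names text, json, and keyval are assumptions made for illustration, not necessarily the names used in the implementation:

```python
import json
from functools import cached_property


class ResultSketch:
    """Hypothetical result object wrapping captured standard output."""

    def __init__(self, stdout):
        self.stdout = stdout

    @property
    def text(self):
        # Plain text without surrounding whitespace, computed on the fly.
        return self.stdout.strip()

    @cached_property
    def json(self):
        # Parsed on first access only; the parsed value is cached afterwards.
        return json.loads(self.stdout)

    @property
    def keyval(self):
        # Key-value pairs from lines such as "north=220750".
        lines = [line for line in self.text.splitlines() if "=" in line]
        pairs = (line.split("=", 1) for line in lines)
        return {key.strip(): value.strip() for key, value in pairs}

    def __getitem__(self, key):
        # Indexing the result indexes the parsed JSON.
        return self.json[key]


json_result = ResultSketch('{"cells": 100, "nsres": 10.0}\n')
print(json_result["cells"])

plain_result = ResultSketch("north=220750\nsouth=220000\n")
print(plain_result.keyval["north"])
```

With functools.cached_property, the parsed value is stored on the instance after the first access, so a result which is never indexed is never parsed.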
Standard input without special parameter
To avoid the need for additional parameters or other method names (wanting to keep the function-tool-name correspondence and to avoid things like stdin=PIPE), I implemented the following syntax:
It looks like method chaining, but it actually returns a completely new object (the objects are lightweight, so the overhead for this is small). While this is okay, it is not as straightforward as I would like. See below for a parameter-based solution.
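A minimal sketch of the idea that a method taking the standard input returns a new object rather than modifying the existing one. The method name feed_input and the class ToolsSketch are assumptions made for this illustration, and the tool is not actually executed:

```python
class ToolsSketch:
    """Hypothetical stand-in; feed_input is an assumed method name."""

    def __init__(self, stdin=None):
        self._stdin = stdin

    def feed_input(self, text):
        # Return a new lightweight object; the original stays unchanged.
        return ToolsSketch(stdin=text)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        tool_name = name.replace("_", ".")

        def call_tool(**kwargs):
            args = [tool_name] + [f"{key}={value}" for key, value in kwargs.items()]
            # A real implementation would pass the text as stdin to a subprocess.
            return args, self._stdin

        return call_tool


tools = ToolsSketch()
args, stdin = tools.feed_input("13.45,29.96,200").v_in_ascii(
    input="-", output="point", separator=","
)
print(args, stdin)
```

Because feed_input returns a new object, later calls through the original tools object are not affected by the text passed earlier.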
Common overwrite
With a common object for different calls, we can set overwrite for this group of calls in one place:
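As a sketch under the assumption that the object takes an overwrite parameter when it is created, the setting could be applied to every call by appending the standard --overwrite flag. ToolsSketch is again a hypothetical stand-in which only builds the command line:

```python
class ToolsSketch:
    """Hypothetical stand-in which remembers the overwrite setting."""

    def __init__(self, overwrite=False):
        self._overwrite = overwrite

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        tool_name = name.replace("_", ".")

        def call_tool(**kwargs):
            args = [tool_name] + [f"{key}={value}" for key, value in kwargs.items()]
            if self._overwrite:
                # Applied to every tool called through this object.
                args.append("--overwrite")
            return args

        return call_tool


tools = ToolsSketch(overwrite=True)
first = tools.r_random_surface(output="surface", seed=42)
second = tools.r_slope_aspect(elevation="surface", slope="slope")
print(first)
print(second)
```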
Interaction with session objects
The Tools class takes a session object as a parameter. At this point, any object with an env attribute will work. This allows for isolated tool calls without the need to pass the env parameter to each tool:
The same can be done for an environment with an env parameter which is also accepted by Tools.
Python - more (syntactic) sugar
Misspelled tool names
We control access to attributes, so we can search through available tools when a tool is not found:
I have a working implementation of this.
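One way such a search could be done is with difflib.get_close_matches from the standard library. This is a sketch, not necessarily how the working implementation does it, and the hard-coded list of tools is a placeholder for whatever a real implementation would discover:

```python
import difflib


class ToolsSketch:
    """Hypothetical stand-in which suggests similar tool names."""

    # Placeholder list; a real implementation would find the available tools.
    _available = ["r.slope.aspect", "r.random.surface", "v.in.ascii"]

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        tool_name = name.replace("_", ".")
        if tool_name not in self._available:
            suggestions = difflib.get_close_matches(tool_name, self._available)
            hint = f" Did you mean {', '.join(suggestions)}?" if suggestions else ""
            raise AttributeError(f"Tool {tool_name} not found.{hint}")

        def call_tool(**kwargs):
            return [tool_name] + [f"{key}={value}" for key, value in kwargs.items()]

        return call_tool


tools = ToolsSketch()
try:
    tools.r_slope_aspec(elevation="elevation", slope="slope")
    message = None
except AttributeError as error:
    message = str(error)
print(message)
```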
pygrass-shortcuts-like prefixes
The API already behaves like grass.pygrass.modules.shortcuts in the sense that tool names are functions (see above). If you like the prefix aspect of the shortcuts (@pesekon2), Tools supports a prefix which allows for creation of objects equivalent to the raster, vector, etc. objects of grass.pygrass.modules.shortcuts:
This, however, loses some of the flexibility of having the Tools object in the user code, like passing of overwrite, session, and env. I implemented the prefix as shown above, but I did not add the actual objects.
Anyway, while at it, we could create just a general tool object linked to the global session (environment):
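The prefix behavior described in this section can be sketched as follows, assuming the prefix is passed as a parameter named prefix and is joined to the rest of the name with a dot. ToolsSketch and the object names raster, vector, and tools are illustrative only:

```python
class ToolsSketch:
    """Hypothetical stand-in which supports an optional tool name prefix."""

    def __init__(self, prefix=None):
        self._prefix = prefix

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        tool_name = name.replace("_", ".")
        if self._prefix:
            # Prepend the tool category, e.g., "r" for raster tools.
            tool_name = f"{self._prefix}.{tool_name}"

        def call_tool(**kwargs):
            return [tool_name] + [f"{key}={value}" for key, value in kwargs.items()]

        return call_tool


# Objects resembling the pygrass shortcuts.
raster = ToolsSketch(prefix="r")
vector = ToolsSketch(prefix="v")
# A general object without a prefix, usable for any tool.
tools = ToolsSketch()

print(raster.random_surface(output="surface"))
print(vector.in_ascii(input="-", output="point"))
print(tools.r_random_surface(output="surface"))
```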
Python API - extras
Standard input as parameter
A parameter which is an instance of io.StringIO is translated to "-" (the customary command line signal to read stdin, which GRASS tools use) and the content of the io.StringIO is used as stdin for the subprocess. This replaces all the different input options for stdin without the need for additional parameters (input="text...", input="-", stdin="text...", stdin_="text...", stdin=PIPE, feed_command(...)).
NumPy arrays as inputs and outputs
By pre-processing the parameters, NumPy arrays can be automatically converted to raster data:
To tell the function I want a return value which is an array, I pass an array type instead of the name. The return value is a single array or a tuple (this is similar to the return values of NumPy universal functions):
This is currently working, although it is not cleaning the data. This came up during the summit in Raleigh (@lrntct) as a natural next step for this API. Now, I really see this also contributing to writing pytest tests.
Pack files as input and output
While more interesting for the "standalone" usage, rasters packed using r.pack can serve as inputs and outputs (using the .grass_raster extension in the example):
I have an implementation of this for rasters without cleaning which is using r.pack and r.unpack to do the work. A natural extension of this is non-native data like GeoTIFF; however, many things need to be resolved even before that, namely supporting anything other than raster.
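A rough sketch of how calls to r.unpack and r.pack could be wrapped around a tool call. The function plan_commands is hypothetical, and passing inputs and outputs as two separate dictionaries is an assumption of this sketch (how the implementation tells inputs from outputs is not covered here). The sketch only lists the commands and does not run them:

```python
from pathlib import Path

PACK_EXTENSION = ".grass_raster"


def plan_commands(tool, inputs, outputs):
    """List the commands needed to run a tool on packed raster files."""
    before, after, params = [], [], {}
    for key, value in {**inputs, **outputs}.items():
        if not value.endswith(PACK_EXTENSION):
            # Not a pack file, so the value is passed to the tool as is.
            params[key] = value
            continue
        # Use the file name without directory and extension as the raster name.
        name = Path(value).name[: -len(PACK_EXTENSION)]
        params[key] = name
        if key in inputs:
            before.append(["r.unpack", f"input={value}", f"output={name}"])
        else:
            after.append(["r.pack", f"input={name}", f"output={value}"])
    main = [tool] + [f"{key}={value}" for key, value in params.items()]
    return before + [main] + after


commands = plan_commands(
    "r.slope.aspect",
    inputs={"elevation": "elevation.grass_raster"},
    outputs={"slope": "slope.grass_raster"},
)
for command in commands:
    print(" ".join(command))
```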
CLI
To be able to add more functionality to the CLI, we need subcommands because adding more modifiers to grass ... --exec would create an ugly interface. The most basic case, which is just a different syntax for what we have now, is:
Alternatively, project parameters can go after the subcommand:
Just like with the current CLI, we can add more ways of handling a project:
Finally, we can provide a CLI which hides the project completely:
This is the simpler access to GRASS tools I aimed for. While it relies on a GRASS-specific format, it does not require knowing about a GRASS project and it is extensible to non-GRASS formats.
The last version without a project works for rasters in my implementation, just with a different main command (python -m grass.app run ...). The above versions will require further refactoring of lib/init/grass.py to avoid code duplication.
Python API again, but standalone
Reusing the implementation of the CLI without a project and the Tools API, a new Python API could work in the same way as Tools, just replacing it by StandaloneTools:
I have this partially implemented. This can hide session and project even now, but it will be even more interesting with FHS and/or conda.
Helper CLI
To handle the GRASS-specific pack format in a straightforward way, a conversion is needed. Leaning into the subcommands, we could do something like this (as a bonus, this is similar to the new GDAL CLI):
Similar commands would be needed for vector and others. This would make sense even with support for non-GRASS formats in the API because pack will likely be faster.
Current status
The Tools API is in an open PR #2923. The StandaloneTools API and the corresponding CLI implementation are in my branch without a PR, which now, for practical reasons, also contains the most up-to-date version of the Tools API:
https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/standalone.py
https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/tests/grass_standalone_tools_test.py
https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/tests/grass_tools_test.py
Questions? Comments?