New interface to tools #5830

wenzeslaus · 2025-06-04T02:40:45Z

wenzeslaus
Jun 4, 2025
Maintainer

Intro

I'm not happy with gs.run_command and friends and I'm not happy with Module class from grass.pygrass either (I can elaborate on that).

I did some work on creating a new API which converged with work I did for making the individual tools more accessible from the command line. To allow for more specific feedback during PR reviews, I need to split the work into multiple PRs. This will hopefully help with clarity in general because, as far as the original goals are concerned, there is a lot of scope creep. On the other hand, individual PRs without the big picture would be hard to evaluate. So, I'm creating this discussion to describe the whole idea, hopefully allowing for better discussion here and more focused discussion in the PRs (once I start creating them).

Python API

Tool call is a function call

Call of a tool is a function call with the corresponding name:

tools = Tools()
tools.r_random_surface(output="surface", seed=42)

While this is syntactic sugar, i.e., it is functionally equivalent to gs.run_command("r.random.surface", output="surface", seed=42) or subprocess.run(["r.random.surface", "output=surface", "seed=42"]), it makes the API look like any other Python API, completely hiding the subprocess nature of the underlying implementation.

This is similar to grass.pygrass.modules.shortcuts, but it is using a single object which is directly exposed to the user, allowing for additional uses of the object (see, e.g., overwrite below). It avoids the need to import the shortcut objects such as grass.pygrass.modules.shortcuts.raster and the need for "split" syntax for tool category (prefix) and rest of the name (raster.random_surface or r.random_surface depending on how it was imported).
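The name mapping can be sketched without GRASS installed. This is a minimal illustration, not the code from the PR: the class ToolsSketch and its command_for helper are hypothetical, and the sketch only builds the argument list instead of starting a subprocess.

```python
class ToolsSketch:
    """Hypothetical stand-in for Tools; builds commands, runs nothing."""

    def command_for(self, tool_name, **parameters):
        # GRASS tools take their parameters as key=value on the command line.
        return [tool_name, *(f"{key}={value}" for key, value in parameters.items())]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails,
        # so r_random_surface resolves to the tool name r.random.surface.
        tool_name = name.replace("_", ".")

        def call(**parameters):
            return self.command_for(tool_name, **parameters)

        return call


tools = ToolsSketch()
command = tools.r_random_surface(output="surface", seed=42)
print(command)
```

The resulting list is what one would otherwise pass to subprocess.run by hand.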

Direct indexing of parsed JSON

For text outputs, we have more and more JSON, so access to it should be as simple as possible (similar to the current best shot at this with parse_command(...format="json")):

tools.r_info(map="streams", format="json")["datatype"]

In the background, the text output is parsed and cached if (and only if) indexing happens, so there is no additional cost unless you decide to index the result. This is done through a result object returned by the function. All text outputs are captured by default, so there is no need to set anything like stdout=PIPE to make this work.
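A minimal sketch of the parse-on-first-index idea, assuming the result object keeps the captured text and parses it only inside __getitem__. The class name ResultSketch and the parse_count counter are hypothetical; the counter exists only to make the caching visible.

```python
import json


class ResultSketch:
    """Hypothetical result object holding captured stdout as text."""

    def __init__(self, stdout):
        self.text = stdout
        self._json = None
        self.parse_count = 0  # only here to make the caching visible

    def __getitem__(self, key):
        # Parse on first indexing only; later lookups reuse the cache.
        if self._json is None:
            self._json = json.loads(self.text)
            self.parse_count += 1
        return self._json[key]


result = ResultSketch('{"datatype": "CELL", "rows": 10}')
print(result.parse_count)  # nothing has been parsed at this point
print(result["datatype"], result["rows"])
print(result.parse_count)  # both lookups shared one parse
```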

Result accessed through attributes

If JSON is not the right thing, additional attributes allow for accessing the result in other ways, including plain text:

tools.g_mapset(flags="p").text == "PERMANENT"

There is an underlying result object similar to what subprocess.run returns. Our object has attributes tailored to typical GRASS text outputs. Most of these are properties computed on the fly. I did this part of the work before JSON became prevalent in GRASS, so I included all sorts of outputs including key-value pairs.
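Properties computed on the fly could look roughly like the following. This is a sketch under assumptions: only the text attribute appears in the example above, while the class name TextResultSketch and the keyval property name are hypothetical.

```python
class TextResultSketch:
    """Hypothetical result object with on-the-fly computed properties."""

    def __init__(self, stdout):
        self.stdout = stdout

    @property
    def text(self):
        # Strip the trailing newline so comparison with a plain string works.
        return self.stdout.strip()

    @property
    def keyval(self):
        # Hypothetical name: parse shell-style key=value lines into a dict.
        pairs = {}
        for line in self.stdout.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                pairs[key.strip()] = value.strip()
        return pairs


mapset = TextResultSketch("PERMANENT\n")
region = TextResultSketch("rows=10\ncols=20\n")
print(mapset.text)
print(region.keyval)
```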

Standard input without special parameter

To avoid the need for additional parameters or other method names (wanting to keep the function-tool-name correspondence and to avoid things like stdin=PIPE), I implemented the following syntax:

tools.feed_input_to("13.45,29.96,200").v_in_ascii(
    input="-", output="point", separator=","
)

It looks like method chaining, but it actually returns a completely new object (the objects are lightweight, so the overhead for this is small). While this is okay, it is not as straightforward as I would like. See below for a parameter-based solution.
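The "returns a completely new object" behavior can be sketched as a shallow copy that carries the input text. The class ToolsSketch and its stdin attribute are hypothetical and only illustrate the pattern, not the actual implementation.

```python
import copy


class ToolsSketch:
    """Hypothetical stand-in for Tools that only tracks configuration."""

    def __init__(self, overwrite=False):
        self.overwrite = overwrite
        self.stdin = None

    def feed_input_to(self, text):
        # Return a modified shallow copy; the original object stays reusable
        # for later calls that should not receive this input.
        new = copy.copy(self)
        new.stdin = text
        return new


tools = ToolsSketch(overwrite=True)
with_input = tools.feed_input_to("13.45,29.96,200")
print(tools.stdin, with_input.stdin)
```

Because the original object is left untouched, the next plain tools call does not accidentally reuse the input.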

Common overwrite

With a common object for different calls, we can set overwrite for this group of calls in one place:

tools = Tools(overwrite=True)
tools.r_random_surface(output="surface", seed=42)
tools.r_random_surface(output="surface", seed=43)
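One way a shared setting can reach every call is for the object to append the standard --overwrite flag to each command it builds. This is only a sketch: ToolsSketch and command_for are hypothetical names, and the real implementation may pass the setting differently.

```python
class ToolsSketch:
    """Hypothetical stand-in for Tools; builds commands, runs nothing."""

    def __init__(self, overwrite=False):
        self.overwrite = overwrite

    def command_for(self, tool_name, **parameters):
        command = [tool_name, *(f"{key}={value}" for key, value in parameters.items())]
        if self.overwrite:
            # Applied to every call made through this object.
            command.append("--overwrite")
        return command


tools = ToolsSketch(overwrite=True)
first = tools.command_for("r.random.surface", output="surface", seed=42)
second = tools.command_for("r.random.surface", output="surface", seed=43)
print(first)
print(second)
```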

Interaction with session objects

The Tools class takes a session object as a parameter. At this point, any object with an env attribute will work. This allows for isolated tool calls without need to pass env parameter to each tool:

with gs.setup.init("data/nc", env=os.environ.copy()) as session:
    tools = Tools(session=session)
    tools.g_region(rows=10, cols=10)
    tools.r_random_surface(output="surface", seed=42)

The same can be done for environment with an env parameter which is also accepted by Tools.
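How the environment might be resolved and handed to the subprocess is sketched here. The helper resolve_env, the precedence of env over session, and the GISRC path are assumptions for illustration; a child Python process stands in for a GRASS tool.

```python
import os
import subprocess
import sys
from types import SimpleNamespace


def resolve_env(session=None, env=None):
    """Hypothetical helper picking the environment for a tool call."""
    if env is not None:
        return env  # explicit env wins (an assumption of this sketch)
    if session is not None:
        return session.env  # any object with an env attribute works
    return None  # None lets subprocess inherit the current environment


# Stand-in for a session object; the GISRC value is a made-up path.
session = SimpleNamespace(env={**os.environ, "GISRC": "/tmp/sketch/gisrc"})

completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['GISRC'])"],
    env=resolve_env(session=session),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = completed.stdout.strip()
print(seen_by_child)
```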

Python - more (syntactic) sugar

Misspelled tool names

We control access to attributes, so we can search through available tools when a tool is not found:

tools.r_sloppy_respect()
# AttributeError: Tool r.sloppy.respect not found. Did you mean: r.slope.aspect?

I have a working implementation of this.
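A suggestion like this can be produced with the standard library. The sketch below uses difflib.get_close_matches, which is an assumption about the approach and not necessarily what the working implementation does; the class ToolsSketch and the hard-coded tool list are hypothetical.

```python
import difflib


class ToolsSketch:
    """Hypothetical stand-in for Tools with a fixed list of known tools."""

    def __init__(self, available_tools):
        self._available = list(available_tools)

    def __getattr__(self, name):
        if name.startswith("_"):
            # Avoid recursion if a private attribute is not set yet.
            raise AttributeError(name)
        tool_name = name.replace("_", ".")
        if tool_name not in self._available:
            message = f"Tool {tool_name} not found."
            close = difflib.get_close_matches(tool_name, self._available, n=1)
            if close:
                message += f" Did you mean: {close[0]}?"
            raise AttributeError(message)
        return lambda **parameters: (tool_name, parameters)


tools = ToolsSketch(["r.slope.aspect", "r.random.surface", "v.in.ascii"])
try:
    tools.r_sloppy_respect()
except AttributeError as error:
    message = str(error)
    print(message)
```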

pygrass-shortcuts-like prefixes

The API already behaves like grass.pygrass.modules.shortcuts in the sense that tool names are functions (see above). If you like the prefix aspect of the shortcuts (@pesekon2), Tools supports a prefix parameter which allows for the creation of objects equivalent to the raster, vector, etc. objects of grass.pygrass.modules.shortcuts:

# grass/experimental/tools/shortcuts.py
raster = Tools(prefix="r")

# my_script.py
from grass.experimental.tools.shortcuts import raster as r

r.mapcalc(expression="streams = if(row() > 1, 1, null())")
r.buffer(input="streams", output="buffer", distance=1)
assert r.info(map="streams", format="json")["datatype"] == "CELL"

This, however, loses some of the flexibility of having the Tools object in the user code, like passing overwrite, session, and env. I implemented the prefix as shown above, but I did not add the actual objects.
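The prefix handling itself amounts to prepending the prefix during name resolution. A minimal sketch, assuming the hypothetical class ToolsSketch and returning the command as a list rather than running anything:

```python
class ToolsSketch:
    """Hypothetical stand-in for Tools supporting an optional prefix."""

    def __init__(self, prefix=None):
        self._prefix = prefix

    def __getattr__(self, name):
        if name.startswith("_"):
            # Avoid recursion if a private attribute is not set yet.
            raise AttributeError(name)
        tool_name = name.replace("_", ".")
        if self._prefix:
            # With prefix="r", info resolves to r.info.
            tool_name = f"{self._prefix}.{tool_name}"

        def call(**parameters):
            return [tool_name, *(f"{key}={value}" for key, value in parameters.items())]

        return call


raster = ToolsSketch(prefix="r")
tools = ToolsSketch()
short = raster.info(map="streams", format="json")
full = tools.r_info(map="streams", format="json")
print(short)
```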

Anyway, while at it, we could create just a general tool object linked to the global session (environment):

# grass/experimental/tools/shortcuts.py
tools = Tools()

# my_script.py
from grass.experimental.tools.shortcuts import tools

tools.r_mapcalc(expression="streams = if(row() > 1, 1, null())")
tools.r_buffer(input="streams", output="buffer", distance=1)
assert tools.r_info(map="streams", format="json")["datatype"] == "CELL"

Python API - extras

Standard input as parameter

A parameter which is an instance of io.StringIO is translated to "-" (the customary command line signaling to read stdin used by GRASS tools) and the content of the io.StringIO is used as stdin for the subprocess. This replaces all the different input options for stdin without the need for additional parameters (input="text...", input="-", stdin="text...", stdin_="text...", stdin=PIPE, feed_command(...)).

tools.v_edit(
    map="points",
    tool="add",
    input=io.StringIO("P 1 0\n  10 20"),
    flags="n",
)
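The translation step can be sketched as a small pre-processing function. The name preprocess and its return shape are hypothetical; the point is only that the io.StringIO value becomes "-" on the command line while its content is set aside for stdin.

```python
import io


def preprocess(parameters):
    """Hypothetical helper: split parameters into CLI values and stdin text."""
    processed = {}
    stdin = None
    for key, value in parameters.items():
        if isinstance(value, io.StringIO):
            processed[key] = "-"  # tells the tool to read from stdin
            stdin = value.getvalue()
        else:
            processed[key] = value
    return processed, stdin


processed, stdin = preprocess(
    {
        "map": "points",
        "tool": "add",
        "input": io.StringIO("P 1 0\n  10 20"),
        "flags": "n",
    }
)
print(processed)
print(repr(stdin))
```

The stdin text would then be passed along when starting the subprocess, for example through the input argument of subprocess.run.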

NumPy arrays as inputs and outputs

By pre-processing the parameters, NumPy arrays can be automatically converted to raster data:

tools.r_slope_aspect(elevation=np.ones((1, 1)), slope="slope")
assert tools.r_info(map="slope", format="json")["datatype"] == "FCELL"

To tell the function I want a return value which is an array, I pass an array type instead of the name. The return value is a single array or a tuple (this is similar to return values for the NumPy universal function):

tools.g_region(rows=2, cols=3)
slope = tools.r_slope_aspect(elevation=np.ones((2, 3)), slope=np.ndarray)
assert slope.shape == (2, 3)

This is currently working, although it does not yet clean up the data it creates. This came up during the summit in Raleigh (@lrntct) as a natural next step for this API. Now, I really see this also contributing to writing pytest tests.

Pack files as input and output

While this is more interesting for the "standalone" usage, rasters packed using r.pack can be used directly as inputs and outputs (using the .grass_raster extension in the example):

tools = Tools()
tools.g_region(rows=3, cols=3)
assert os.path.exists("dem.grass_raster")
tools.r_slope_aspect(elevation="dem.grass_raster", slope="slope.grass_raster")
assert os.path.exists("slope.grass_raster")

I have an implementation of this for rasters, without cleanup, which uses r.pack and r.unpack to do the work. A natural extension of this is non-native data like GeoTIFF. However, many things need to be resolved even before that, namely supporting anything other than rasters.

CLI

To be able to add more functionality to the CLI, we need subcommands because adding more modifiers to grass ... --exec would create an ugly interface. The most basic case, which is just a different syntax for what we have now, is:

grass --project ~/data/nc run r.slope.aspect elevation=elevation slope=slope
grass --project ~/data/nc run r.univar map=slope

Alternatively, project parameters can go after the subcommand:

grass run --project ~/data/nc r.slope.aspect elevation=elevation slope=slope
grass run --project ~/data/nc r.univar map=slope
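Accepting the project both before and after the subcommand can be sketched with argparse. This is an illustration under assumptions, not the code in lib/init/grass.py: the program name grass-sketch and the attribute names are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="grass-sketch")
    parser.add_argument("--project", default=None)
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    run = subparsers.add_parser("run")
    # SUPPRESS keeps the subcommand from resetting a --project
    # which was already given before the subcommand.
    run.add_argument("--project", default=argparse.SUPPRESS)
    run.add_argument("tool")
    # Everything after the tool name is passed on as tool parameters.
    run.add_argument("parameters", nargs=argparse.REMAINDER)
    return parser


parser = build_parser()
before = parser.parse_args(["--project", "nc", "run", "r.univar", "map=slope"])
after = parser.parse_args(["run", "--project", "nc", "r.univar", "map=slope"])
print(before.project, before.tool, before.parameters)
print(after.project, after.tool, after.parameters)
```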

Just like with the current CLI, we can add more ways of handling a project:

grass --tmp-project EPSG:3358 run r.slope.aspect elevation=~/data/elevation.pack slope=~/data/slope.pack
grass --tmp-project EPSG:3358 run r.univar map=~/data/slope.pack

Finally, we can provide a CLI which hides the project completely:

grass run r.slope.aspect elevation=~/data/elevation.pack slope=~/data/slope.pack
grass run r.univar map=~/data/slope.pack

This is the simpler access to GRASS tools I aimed for. While it relies on GRASS-specific format, it does not require knowing about GRASS project and it is extensible to non-GRASS formats.

The last version without a project works for rasters in my implementation just with a different main command (python -m grass.app run ...). The above versions will require further refactoring of lib/init/grass.py to avoid code duplication.

Python API again, but standalone

Reusing the implementation of the CLI without a project and the Tools API, a new Python API could work in the same way as Tools, just replacing it by StandaloneTools:

import numpy as np

tools = StandaloneTools()
result = tools.r_mapcalc_simple(
    expression="A + B", a=np.full((2, 3), 2), b=np.full((2, 3), 5), output=np.array
)
assert result.shape == (2, 3)
assert np.all(result == np.full((2, 3), 7))

I have this partially implemented. This can hide session and project even now, but it will be even more interesting with FHS and/or conda.

Helper CLI

To handle the GRASS-specific pack format in a straightforward way, a conversion is needed. Leaning into the subcommands, we could do something like this (as a bonus, this is similar to the new GDAL CLI):

grass raster convert gdal/supported/elevation.tif grass/specific/elevation.pack
grass raster convert grass/specific/elevation.pack gdal/supported/elevation.tif

Similar commands would be needed for vector and others. This would make sense even with support for non-GRASS formats in the API because pack will likely be faster.

Current status

The Tools API is in an open PR #2923. The StandaloneTools API and the corresponding CLI implementation are in my branch without a PR, which now, for practical reasons, also contains the most up-to-date version of the Tools API:

https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/standalone.py
https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/tests/grass_standalone_tools_test.py
https://github.com/wenzeslaus/grass/blob/cli-with-pack/python/grass/experimental/tests/grass_tools_test.py

Questions? Comments?

pesekon2 · 2025-06-05T04:14:03Z

pesekon2
Jun 5, 2025
Collaborator

Thanks for the overview! It's nice and clear, making all the individual aspects much easier to grasp. Looking forward to the PR leaving the draft stage. I especially love the io.StringIO trick; the current ways of dealing with input=- were always extremely ugly in my eyes.

Just a quick question:

pygrass-shortcuts-like prefixes

If I understand how it works from the example, then there would be no need for the shortcuts as you could just call Tools() with the prefix specified. Is it so? In that case, it probably wouldn't be such an issue for the common overwrite? Also, you can access only raster tools if Tools(prefix="r")? Sorry if it is already implemented in one of the shadow branches and I could test it instead of asking; I have checked just #2923.

I almost tend to say that it is not worth it to specify prefix when the only thing it gives to you is the "shortcut". But only almost. The thing I like beside the tool name being the same as the "real" one is that your IDE whisperers/Python console then whisper the tools that you are interested in two keys earlier. :-)

(Summarizing it like this, it seems like too much work for almost nothing, right? Would anybody other than me even be interested in the shortcuts? Because if not, then I can live without them and the implementation of Tools will be shorter and cleaner.)

4 replies

wenzeslaus Jun 5, 2025
Maintainer Author

If you do it in your user code, then yes, Tools(prefix="r", overwrite=True) will work just fine.

I did not implement the code completion in the prototype yet, but because we start with tools = Tools(), the tools. syntax generally allows consoles and IDEs to provide you with all attributes. In grass.pygrass, this was achieved through the group objects such as raster and vector, so you can get their attributes. The new Tools API does not need those group objects to trigger the code completion with a dot. All tools are accessed in the same way through one object, the Tools object.

With this discussion in place, I will start working on the individual PRs, perhaps open the all-in-one PR, so it will be easier to test.

wenzeslaus Jun 5, 2025
Maintainer Author

Try #5843 for the prefix.

pesekon2 Jun 5, 2025
Collaborator

If you do it in your user code, then yes, Tools(prefix="r", overwrite=True) will work just fine.

Great, thanks. Then I would almost say that the shortcuts are not really necessary (well, they were never necessary, but it was a nice extra feature). Whoever wants them can create them with one command. Good enough for me.

With this discussion in place, I will start working on the individual PRs, perhaps open the all-in-one PR, so it will be easier to test.

That will be more hardcore to review but definitely much more pleasant for testing, without the need for explanations of the type "well, you are right that this is missing but that's just because it will be included in the next PR".

pesekon2 Jun 5, 2025
Collaborator

Try #5843 for the prefix.

Thanks. Tested, seems fine. I'll include the comment in the corresponding PR.

lrntct · 2025-06-05T22:13:17Z

lrntct
Jun 5, 2025

Nice! The general idea of representing the tools as functions makes more sense than the objects in the pygrass interface.
The StringIO and np.ndarray tricks are cool! On the other hand, feed_input_to() looks weird to me.

0 replies
