Gh-497: Update Gremlin deployment Docs #498

Merged
Changes from 2 commits
90 changes: 51 additions & 39 deletions docs/administration-guide/gaffer-deployment/gremlin.md
@@ -28,27 +28,21 @@ traversals are spawned. To do this we recommend utilising the provided
which can be configured to use the Gaffer Tinkerpop implementation so that an
endpoint is available for Gremlin queries.

## Connecting to Any Existing Gaffer Graph
## Connecting to An Existing Accumulo Backed Graph

The simplest way to connect Gremlin to an existing Gaffer instance where you may
not know the Store type or Schema would be via a [Proxy Store](../gaffer-stores/proxy-store.md).
Connecting this way means Gremlin communicates via the Gaffer REST API
(similar to [gafferpy](../../user-guide/apis/python-api.md)) meaning there may
be a performance hit for larger queries.

!!! tip
You can also of course connect directly to an existing instance's storage
layer too (e.g. Accumulo store) but this would require a more complex
configuration and knowledge of the Schema.
The recommended way to provide a Gremlin interface to an existing Gaffer
instance is to connect directly to the same [Accumulo store](../gaffer-stores/accumulo-store.md).
Connecting this way means Gremlin communicates with the store in a similar way
to the Gaffer REST API and gives the fastest performance available when using
Gremlin (although there may still be a performance hit).

The general connection diagram looks something like the following:

```mermaid
flowchart LR
A(["User"])
--> B("Gremlin Server")
--> C(Gaffer Proxy Store)
--> D(Existing Gaffer Instance)
--> C("Accumulo Store")
```

To establish this connection you can make use of the existing `gaffer-gremlin`
@@ -61,44 +55,58 @@ docker pull gchq/gaffer-gremlin:latest
```

!!! note
You will likely need to configure the default `gaffer-gremlin` image to your
You will need to configure the default `gaffer-gremlin` image to your
environment, please continue reading to learn more.

### Configuring the `gaffer-gremlin` Image
### The `gaffer-gremlin` Image

To use the image you will need to provide two configuration files that are specific
to your environment, they are:
To use the image you will need to provide the normal Gaffer configuration files
for your environment along with a new GafferPop-specific file (similar to the
standard graph config JSON). They are:

- `store.properties` - Gaffer store configuration.
- `gafferpop.properties` - Configuration for the Gaffer Tinkerpop library (Gafferpop).
- `store.properties` - Gaffer store configuration, this should match the
existing graph you are connecting to.
- `elements.json` and `types.json` - The schema files for the graph you wish to
connect to.
- `gafferpop.properties` - Configuration for the Gaffer Tinkerpop library
(Gafferpop).

Once these files are configured you can use bind mounts to make them available when running the image:
Please read the subsections below on how to configure these files. Once these
are configured you can use bind mounts to make them available when running the
image:

```bash
docker run \
--name gaffer-gremlin \
--publish 8182:8182 \
--volume store.properties:conf/gaffer/store.properties \
--volume gafferpop.properties:conf/gafferpop/gafferpop.properties \
--volume store.properties:/opt/gremlin-server/conf/gaffer/store.properties \
--volume schema:/opt/gremlin-server/conf/gaffer/schema \
--volume gafferpop.properties:/opt/gremlin-server/conf/gafferpop/gafferpop.properties \
tinkerpop/gremlin-server:latest gremlin-server.yaml
```

#### Configuring the Proxy Store
### Configuring the Store Properties

Starting with the Store properties, this file should be largely identical to
the store properties used on the main Gaffer deployment. The main purpose
of this file is to ensure the same Accumulo cluster is connected to.

Starting with the Proxy Store, this is identical to running a normal [Proxy Store](../gaffer-stores/proxy-store.md)
and involves simply creating a Gaffer `store.properties` file to use. An example
`store.properties` file is given below that will connect to a graph's REST API
running at `https://localhost:8080/rest`:
An example file is given below; please read the specific [Accumulo store](../gaffer-stores/accumulo-store.md)
documentation for more detail:

```properties
gaffer.store.class=uk.gov.gchq.gaffer.proxystore.ProxyStore
# These should be configured to an existing graph deployment
gaffer.host=localhost
gaffer.port=8080
gaffer.context-root=/rest
gaffer.store.class=uk.gov.gchq.gaffer.accumulostore.AccumuloStore
gaffer.store.properties.class=uk.gov.gchq.gaffer.accumulostore.AccumuloProperties
accumulo.instance=accumulo
accumulo.zookeepers=zookeeper
accumulo.user=root
accumulo.password=secret
# General store config
gaffer.cache.service.class=uk.gov.gchq.gaffer.cache.impl.HashMapCacheService
gaffer.store.job.tracker.enabled=true
```

#### Configuring the Gafferpop Library
### Configuring the Gafferpop Library

The `gafferpop.properties` file is the configuration for the Gaffer
implementation of Tinkerpop (a.k.a Gafferpop). Most of the set up here is for
@@ -109,11 +117,15 @@ would look like the following:
```properties
# The Tinkerpop graph class we should use
gremlin.graph=uk.gov.gchq.gaffer.tinkerpop.GafferPopGraph
gaffer.graphId=graphProxy
gaffer.graphId=existingGraph
gaffer.storeproperties=conf/gaffer/store.properties
gaffer.userId=user01
```

!!! note
It is important the `graphId` here matches the ID of the main graph you
wish to connect to as this controls which Accumulo table is connected to.
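
As a rough pre-flight check before starting the container, the `gafferpop.properties` example above could be parsed and compared against the graph ID of the existing deployment. The sketch below is illustrative only: `parse_properties` and `check_gafferpop` are hypothetical helpers (not part of Gaffer or GafferPop), the set of keys treated as required is an assumption drawn from the example, and the simple `key=value` parsing ignores the escapes and line continuations that real Java properties files allow.

```python
# Hypothetical pre-flight check for a gafferpop.properties file.
# Assumption: these keys are treated as required because they appear in the
# example above; GafferPop itself may accept other combinations.
REQUIRED_KEYS = ("gremlin.graph", "gaffer.graphId", "gaffer.storeproperties")


def parse_properties(text):
    """Parse simple `key=value` lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_gafferpop(props, expected_graph_id):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in props]
    graph_id = props.get("gaffer.graphId")
    if graph_id is not None and graph_id != expected_graph_id:
        problems.append(
            f"gaffer.graphId is {graph_id!r}, expected {expected_graph_id!r}"
        )
    return problems


EXAMPLE = """\
# The Tinkerpop graph class we should use
gremlin.graph=uk.gov.gchq.gaffer.tinkerpop.GafferPopGraph
gaffer.graphId=existingGraph
gaffer.storeproperties=conf/gaffer/store.properties
gaffer.userId=user01
"""

if __name__ == "__main__":
    # An empty list means the file matches the expected graph ID.
    print(check_gafferpop(parse_properties(EXAMPLE), "existingGraph"))
```

A mismatch in `gaffer.graphId` is reported rather than raised so that all problems can be listed in one pass.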

Many of the properties in the example above should be self-explanatory; a full breakdown of
the available properties is as follows:

@@ -123,11 +135,11 @@ of the available properties is as follows:
| `gaffer.graphId` | The graph ID of the Tinkerpop graph |
| `gaffer.storeproperties` | The path to the store properties file |
| `gaffer.schemas` | The path to the directory containing the graph schema files |
| `gaffer.userId` | The user ID for the Tinkerpop graph |
| `gaffer.dataAuths` | The data auths for the user to specify what operations can be performed |
| `gaffer.operation.options` | Additional operation options that will be passed to the Tinkerpop graph variables in the form `key:value` |
| `gaffer.userId` | The default user ID for the Tinkerpop graph (see the [authentication section](#user-authentication)) |
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed |
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (these can be overridden per query, see [here](../../user-guide/query/gremlin/gremlin.md#custom-features)) |

#### Configuring the Gremlin Server
### Configuring the Gremlin Server

The underlying Gremlin server can also be configured if required. The `gaffer-gremlin`
image comes with an existing YAML configuration based on the example from the
@@ -155,7 +167,7 @@ uk.gov.gchq.gaffer.tinkerpop.gremlinplugin.GafferPopGremlinPlugin: {}
See the [Tinkerpop docs](https://tinkerpop.apache.org/docs/current/reference/#gremlin-server)
for more information on Gremlin server configuration.

##### User Authentication
#### User Authentication

Full user authentication is possible with the Gremlin server using the framework
provided by standard Tinkerpop. The GafferPop implementation provides a
22 changes: 9 additions & 13 deletions docs/user-guide/query/gremlin/gremlin-limits.md
@@ -6,30 +6,26 @@ but some features may also be yet to be implemented.

Current TinkerPop features not present in the GafferPop implementation:

- Property index for allowing unseeded queries (unseeded queries run a `GetAllElements`).
- Unseeded queries run a `GetAllElements` with a limit applied. The limit can be
set per query and otherwise defaults to 5000.
- Gaffer graphs are readonly to Gremlin queries.
support this.
- TinkerPop Graph Computer is not supported.
- TinkerPop Transactions are not supported.
- TinkerPop Lambdas are not supported.

Current known limitations or bugs:

- Proper user authentication is only available if using a Gremlin server to
connect to the graph.
- Proper user authentication is only available if using a Gremlin server and
the `GafferPopAuthoriser` class.
- Performance compared to standard Gaffer `OperationChain`s will likely be
slower as multiple Gaffer `Operations` may be utilised to perform one Gremlin
step.
- The entity group `id` is reserved for an empty group containing only the
vertex ID, this is currently used as a workaround for other limitations.
- When you get the in or out Vertex directly off an Edge it will not contain any
actual properties or be in correct group/label - it just returns a vertex in
the `id` group. This is due to Gaffer allowing multiple entities to be
associated with the source and destination vertices of an Edge.
- The ID of an Edge follows a specific format that is made up of its source and
destination IDs like `[source, dest]`. To use this in a seeded query you must
format it like `g.E("[source, dest]")` or a list like
`g.E(["[source1, dest1]","[source2, dest2]"])`
- Issues seen using `hasKey()` and `hasValue()` in same query.
- May experience issues using the `range()` query function.
- May experience issues using the `where()` query function.
- The entity group `id` is reserved for an empty group containing only the
vertex ID, this is currently used as a workaround for other limitations.
- Chaining `hasLabel()` calls together like `hasLabel("label1").hasLabel("label2")`
will act like an OR rather than the AND used in standard Gremlin. This means you
may get results back when you would not expect any.
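
The edge ID format described in the list above can be produced with a small helper when building query strings on the client side. This is a minimal sketch, assuming the IDs are plain strings that contain no double quotes; `edge_id` and `seeded_edge_query` are hypothetical names used for illustration and are not part of GafferPop.

```python
def edge_id(source, dest):
    """Format an edge ID in the `[source, dest]` form described above."""
    return f"[{source}, {dest}]"


def seeded_edge_query(pairs):
    """Build a seeded `g.E(...)` query string from (source, dest) pairs.

    A single pair yields `g.E("[source, dest]")`; several pairs yield the
    list form `g.E(["[s1, d1]","[s2, d2]"])`.
    """
    ids = [f'"{edge_id(source, dest)}"' for source, dest in pairs]
    if not ids:
        raise ValueError("at least one (source, dest) pair is required")
    if len(ids) == 1:
        return f"g.E({ids[0]})"
    return f"g.E([{','.join(ids)}])"


if __name__ == "__main__":
    print(seeded_edge_query([("source", "dest")]))
    print(seeded_edge_query([("source1", "dest1"), ("source2", "dest2")]))
```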
1 change: 1 addition & 0 deletions docs/user-guide/query/gremlin/gremlin.md
@@ -270,3 +270,4 @@ for Gaffer specific options:
| --- | --- | --- |
| `operationOptions` | `g.with("operationOptions", "gaffer.federatedstore.operation.graphIds:graphA").V()` | Allows passing options to the underlying Gaffer Operations, this is the same as the `options` field on a standard JSON query. |
| `getAllElementsLimit` | `g.with("getAllElementsLimit", 100).V()` | Limits the number of elements returned when performing an unseeded query, e.g. a `GetAllElements` operation. |
| `hasStepFilterStage` | `g.with("hasStepFilterStage", "PRE_AGGREGATION").V()` | Controls the phase at which the filtering from a Gremlin `has()` step is applied to the results. |
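
The query strings in the table above could be assembled programmatically, for example when a client script needs to apply the same options to many queries. The sketch below only builds Gremlin text and makes no connection to a server; `operation_option` and `with_options` are hypothetical helpers, and quoting string values while leaving integers bare is an assumption based on the examples in the table.

```python
def operation_option(key, value):
    """Render a single Gaffer operation option in the `key:value` form."""
    return f"{key}:{value}"


def render_value(value):
    """Render integers bare and everything else as a double-quoted string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    return f'"{value}"'


def with_options(options, traversal="V()"):
    """Prefix a traversal with one `with()` step per option name."""
    steps = "".join(
        f'.with("{name}", {render_value(value)})' for name, value in options.items()
    )
    return f"g{steps}.{traversal}"


if __name__ == "__main__":
    print(with_options({"getAllElementsLimit": 100}))
    print(with_options({
        "operationOptions": operation_option(
            "gaffer.federatedstore.operation.graphIds", "graphA"
        )
    }))
```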