Commit bf0bc89

Gh-497: Update Gremlin deployment Docs (#498)
* update gremlin docs
1 parent 48d2764 commit bf0bc89

File tree

3 files changed

+64
-51
lines changed

docs/administration-guide/gaffer-deployment/gremlin.md

+54-38
Original file line number | Diff line number | Diff line change
@@ -28,27 +28,25 @@ traversals are spawned. To do this we recommend utilising the provided
2828
which can be configured to use the Gaffer Tinkerpop implementation so that a
2929
endpoint is available for Gremlin queries.
3030

31-
## Connecting to Any Existing Gaffer Graph
31+
## Connecting to An Existing Accumulo Backed Graph
3232

33-
The simplest way to connect Gremlin to an existing Gaffer instance where you may
34-
not know the Store type or Schema would be via a [Proxy Store](../gaffer-stores/proxy-store.md).
35-
Connecting this way means Gremlin communicates via the Gaffer REST API
36-
(similar to [gafferpy](../../user-guide/apis/python-api.md)) meaning there may
37-
be a performance hit for larger queries.
33+
The recommended way to provide a Gremlin interface to an existing Gaffer
34+
instance is to connect directly to the same [Accumulo store](../gaffer-stores/accumulo-store.md).
35+
Connecting this way means Gremlin communicates in a similar way to the Gaffer
36+
REST API and ensures the fastest performance when using Gremlin (there may still
37+
be a performance hit).
3838

39-
!!! tip
40-
You can also of course connect directly to an existing instance's storage
41-
layer too (e.g. Accumulo store) but this would require a more complex
42-
configuration and knowledge of the Schema.
39+
!!! note
40+
It is possible to attach to other store types in a similar manner usually through
41+
a [proxy store](../gaffer-stores/proxy-store.md) or [federated store](../gaffer-stores/federated-store.md).
4342

4443
The general connection diagram looks something like the following:
4544

4645
```mermaid
4746
flowchart LR
4847
A(["User"])
4948
--> B("Gremlin Server")
50-
--> C(Gaffer Proxy Store)
51-
--> D(Existing Gaffer Instance)
49+
--> C("Accumulo Store")
5250
```
5351

5452
To establish this connection you can make use of the existing `gaffer-gremlin`
@@ -61,44 +59,58 @@ docker pull gchq/gaffer-gremlin:latest
6159
```
6260

6361
!!! note
64-
You will likely need to configure the default `gaffer-gremlin` image to your
62+
You will need to configure the default `gaffer-gremlin` image to your
6563
environment, please continue reading to learn more.
6664

67-
### Configuring the `gaffer-gremlin` Image
65+
### The `gaffer-gremlin` Image
6866

69-
To use the image you will need to provide two configuration files that are specific
70-
to your environment, they are:
67+
To use the image you will need to provide the normal Gaffer configuration files
68+
for your environment along with a new GafferPop specific file (similar to the
69+
standard graph config JSON) they are:
7170

72-
- `store.properties` - Gaffer store configuration.
73-
- `gafferpop.properties` - Configuration for the Gaffer Tinkerpop library (Gafferpop).
71+
- `store.properties` - Gaffer store configuration, this should match the
72+
existing graph you are connecting to.
73+
- `elements.json` and `types.json` - The schema files for the graph you wish to
74+
connect to.
75+
- `gafferpop.properties` - Configuration for the Gaffer Tinkerpop library
76+
(Gafferpop).
7477

75-
Once these files are configured you can use bind mounts to make them available when running the image:
78+
Please read the subsections below on how to configure these files. Once these
79+
are configured you can use bind mounts to make them available when running the
80+
image:
7681

7782
```bash
7883
docker run \
7984
--name gaffer-gremlin \
8085
--publish 8182:8182 \
81-
--volume store.properties:conf/gaffer/store.properties \
82-
--volume gafferpop.properties:conf/gafferpop/gafferpop.properties \
86+
--volume store.properties:/opt/gremlin-server/conf/gaffer/store.properties \
87+
--volume schema:/opt/gremlin-server/conf/gaffer/schema \
88+
--volume gafferpop.properties:/opt/gremlin-server/conf/gafferpop/gafferpop.properties \
8389
tinkerpop/gremlin-server:latest gremlin-server.yaml
8490
```
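Docker generally expects an absolute host path for a bind mount; a bare name such as `store.properties` may be treated as a named volume instead. The sketch below assembles the same `docker run` arguments with absolute paths and fails early if a configuration file is missing. The helper name `build_docker_run_args` and the assumption that all three items live in one local directory are illustrative and not part of `gaffer-gremlin`; the image name is left as a parameter.

```python
from pathlib import Path


def build_docker_run_args(config_dir, image, container_conf="/opt/gremlin-server/conf"):
    """Build a `docker run` argument list using absolute bind-mount paths.

    Hypothetical helper: assumes store.properties, gafferpop.properties and a
    schema/ directory all sit directly inside `config_dir`.
    """
    config_dir = Path(config_dir).resolve()
    # Name inside config_dir -> path inside the container (taken from the command above).
    mounts = {
        "store.properties": f"{container_conf}/gaffer/store.properties",
        "schema": f"{container_conf}/gaffer/schema",
        "gafferpop.properties": f"{container_conf}/gafferpop/gafferpop.properties",
    }
    args = ["docker", "run", "--name", "gaffer-gremlin", "--publish", "8182:8182"]
    for host_name, container_path in mounts.items():
        host_path = config_dir / host_name
        if not host_path.exists():
            # Fail before Docker silently creates an empty volume or directory.
            raise FileNotFoundError(f"Missing configuration: {host_path}")
        args += ["--volume", f"{host_path}:{container_path}"]
    return args + [image, "gremlin-server.yaml"]
```

The returned list can be passed to `subprocess.run` or joined for display, for example `build_docker_run_args("./conf", "gchq/gaffer-gremlin:latest")`.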
8591

86-
#### Configuring the Proxy Store
92+
### Configuring the Store Properties
93+
94+
Starting with the Store properties, this file should be largely identical to
95+
the store properties used on the main Gaffer deployment. The main purpose
96+
of this file is to ensure the same Accumulo cluster is connected to.
8797

88-
Starting with the Proxy Store, this is identical to running a normal [Proxy Store](../gaffer-stores/proxy-store.md)
89-
and involves simply creating a Gaffer `store.properties` file to use. An example
90-
`store.properties` file is given below that will connect to a graph's REST API
91-
running at `https://localhost:8080/rest`:
98+
An example file is given below, please read the specific [Accumulo store](../gaffer-stores/accumulo-store.md)
99+
documentation for more detail:
92100

93101
```properties
94-
gaffer.store.class=uk.gov.gchq.gaffer.proxystore.ProxyStore
95-
# These should be configured to an existing graph deployment
96-
gaffer.host=localhost
97-
gaffer.port=8080
98-
gaffer.context-root=/rest
102+
gaffer.store.class=uk.gov.gchq.gaffer.accumulostore.AccumuloStore
103+
gaffer.store.properties.class=uk.gov.gchq.gaffer.accumulostore.AccumuloProperties
104+
accumulo.instance=accumulo
105+
accumulo.zookeepers=zookeeper
106+
accumulo.user=root
107+
accumulo.password=secret
108+
# General store config
109+
gaffer.cache.service.class=uk.gov.gchq.gaffer.cache.impl.HashMapCacheService
110+
gaffer.store.job.tracker.enabled=true
99111
```
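Because the purpose of this file is to point at the same Accumulo cluster as the main deployment, a quick comparison of the two files can catch a mismatch before the container starts. The following is a minimal sketch; the function names and the choice of which keys count as "connection" keys are assumptions made for illustration, and the parser only handles plain `key=value` lines rather than the full Java properties syntax.

```python
# Keys assumed (for this sketch) to identify which store and cluster is used.
CONNECTION_KEYS = ("gaffer.store.class", "accumulo.instance", "accumulo.zookeepers")


def parse_properties(text):
    """Parse simple `key=value` lines, ignoring blank lines and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def connection_mismatches(main_text, gremlin_text, keys=CONNECTION_KEYS):
    """Return {key: (main_value, gremlin_value)} for every key that differs."""
    main = parse_properties(main_text)
    gremlin = parse_properties(gremlin_text)
    return {
        key: (main.get(key), gremlin.get(key))
        for key in keys
        if main.get(key) != gremlin.get(key)
    }
```

An empty result means the two files agree on the checked keys; anything else lists the differing values side by side.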
100112

101-
#### Configuring the Gafferpop Library
113+
### Configuring the Gafferpop Library
102114

103115
The `gafferpop.properties` file is the configuration for the Gaffer
104116
implementation of Tinkerpop (a.k.a Gafferpop). Most of the set up here is for
@@ -109,11 +121,15 @@ would look like the following:
109121
```properties
110122
# The Tinkerpop graph class we should use
111123
gremlin.graph=uk.gov.gchq.gaffer.tinkerpop.GafferPopGraph
112-
gaffer.graphId=graphProxy
124+
gaffer.graphId=existingGraph
113125
gaffer.storeproperties=conf/gaffer/store.properties
114126
gaffer.userId=user01
115127
```
116128

129+
!!! note
130+
It is important the `graphId` here matches the ID of the main graph you
131+
wish to connect to as this controls which Accumulo table is connected to.
132+
117133
Many of these properties in the example above should be self explanatory, a full breakdown of
118134
the available properties is as follows:
119135

@@ -123,11 +139,11 @@ of the available properties is as follows:
123139
| `gaffer.graphId` | The graph ID of the Tinkerpop graph |
124140
| `gaffer.storeproperties` | The path to the store properties file |
125141
| `gaffer.schemas` | The path to the directory containing the graph schema files |
126-
| `gaffer.userId` | The user ID for the Tinkerpop graph |
127-
| `gaffer.dataAuths` | The data auths for the user to specify what operations can be performed |
128-
| `gaffer.operation.options` | Additional operation options that will be passed to the Tinkerpop graph variables in the form `key:value`
142+
| `gaffer.userId` | The default user ID for the Tinkerpop graph (see the [authentication section](#user-authentication)) |
143+
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed |
144+
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (these can be overridden per query, see [here](../../user-guide/query/gremlin/gremlin.md#custom-features)) |
129145

130-
#### Configuring the Gremlin Server
146+
### Configuring the Gremlin Server
131147

132148
The underlying Gremlin server can also be configured if required. The `gaffer-gremlin`
133149
image comes with an existing YAML configuration based on the example from the
@@ -155,7 +171,7 @@ uk.gov.gchq.gaffer.tinkerpop.gremlinplugin.GafferPopGremlinPlugin: {}
155171
See the [Tinkerpop docs](https://tinkerpop.apache.org/docs/current/reference/#gremlin-server)
156172
for more information on Gremlin server configuration.
157173
158-
##### User Authentication
174+
#### User Authentication
159175
160176
Full user authentication is possible with the Gremlin server using the framework
161177
provided by standard Tinkerpop. The GafferPop implementation provides a

docs/user-guide/query/gremlin/gremlin-limits.md

+9-13
@@ -6,30 +6,26 @@ but some features may also be yet to be implemented.
66

77
Current TinkerPop features not present in the GafferPop implementation:
88

9-
- Property index for allowing unseeded queries (unseeded queries run a `GetAllElements`).
9+
- Unseeded queries run a `GetAllElements` with a configured limit applied,
10+
this limit can be configured per query or will default to 5000.
1011
- Gaffer graphs are readonly to Gremlin queries.
11-
support this.
1212
- TinkerPop Graph Computer is not supported.
1313
- TinkerPop Transactions are not supported.
1414
- TinkerPop Lambdas are not supported.
1515

1616
Current known limitations or bugs:
1717

18-
- Proper user authentication is only available if using a Gremlin server to
19-
connect to the graph.
18+
- Proper user authentication is only available if using a Gremlin server and
19+
the `GafferPopAuthoriser` class.
2020
- Performance compared to standard Gaffer `OperationChain`s will likely be
2121
slower as multiple Gaffer `Operations` may be utilised to perform one Gremlin
2222
step.
23-
- The entity group `id` is reserved for an empty group containing only the
24-
vertex ID, this is currently used as a workaround for other limitations.
25-
- When you get the in or out Vertex directly off an Edge it will not contain any
26-
actual properties or be in correct group/label - it just returns a vertex in
27-
the `id` group. This is due to Gaffer allowing multiple entities to be
28-
associated with the source and destination vertices of an Edge.
2923
- The ID of an Edge follows a specific format that is made up of its source and
3024
destination IDs like `[source, dest]`. To use this in a seeded query you must
3125
format it like `g.E("[source, dest]")` or a list like
3226
`g.E(["[source1, dest1]","[source2, dest2]"])`
33-
- Issues seen using `hasKey()` and `hasValue()` in same query.
34-
- May experience issues using the `range()` query function.
35-
- May experience issues using the `where()` query function.
27+
- The entity group `id` is reserved for an empty group containing only the
28+
vertex ID, this is currently used as a workaround for other limitations.
29+
- Chaining `hasLabel()` calls together like `hasLabel("label1").hasLabel("label2")`
30+
will act like an OR rather than the AND used by standard Gremlin. This means you
31+
may get results back when you realistically shouldn't.
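
The Edge ID format described in the list above can be produced programmatically when building seeded queries. This is a small sketch; `edge_id` and `seeded_edge_query` are hypothetical helper names, and the code assumes the source and destination IDs contain no double quotes or square brackets (no escaping is attempted).

```python
def edge_id(source, dest):
    """Format an Edge ID as `[source, dest]`."""
    return f"[{source}, {dest}]"


def seeded_edge_query(pairs):
    """Build a `g.E(...)` query string from (source, dest) pairs.

    One pair gives a single quoted ID; several pairs give a list of quoted IDs.
    Assumes IDs need no escaping.
    """
    quoted = ['"' + edge_id(source, dest) + '"' for source, dest in pairs]
    if not quoted:
        raise ValueError("at least one (source, dest) pair is required")
    if len(quoted) == 1:
        return f"g.E({quoted[0]})"
    return "g.E([" + ",".join(quoted) + "])"
```

For instance, `seeded_edge_query([("source", "dest")])` yields the single-ID form, and passing two pairs yields the list form.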

docs/user-guide/query/gremlin/gremlin.md

+1
@@ -270,3 +270,4 @@ for Gaffer specific options:
270270
| --- | --- | --- |
271271
| `operationOptions` | `g.with("operationOptions", "gaffer.federatedstore.operation.graphIds:graphA").V()` | Allows passing options to the underlying Gaffer Operations, this is the same as the `options` field on a standard JSON query. |
272272
| `getAllElementsLimit` | `g.with("getAllElementsLimit", 100).V()` | Limits the amount of elements returned if performing an unseeded query e.g. a `GetAllElements` operation. |
273+
| `hasStepFilterStage` | `g.with("hasStepFilterStage", "PRE_AGGREGATION").V()` | Controls which phase the filtering from a Gremlin `has()` stage is applied to the results. |
