Commit 4bb5aaa

tb06904, cn337131, and wb36499 authored
Document Gremlin integration in the REST API (#522)
* document the new gremlin integration into rest api
* Apply suggestions from code review
Co-authored-by: cn337131 <[email protected]>
* address comments
* Update docs/administration-guide/security/user-control.md
Co-authored-by: cn337131 <[email protected]>
* Apply suggestions from code review
Co-authored-by: wb36499 <[email protected]>
---------
Co-authored-by: cn337131 <[email protected]>
Co-authored-by: wb36499 <[email protected]>
1 parent 0282b0a commit 4bb5aaa

6 files changed

+304
-271
lines changed

@@ -1,15 +1,22 @@
1-
# Connecting Gremlin to Gaffer
1+
# Using Gremlin in Gaffer
22

33
It is possible to use Gremlin as an alternative querying language in Gaffer. To
44
make Gremlin available however, there are some additional steps that need to be
5-
taken to connect to a Gaffer graph via this interface.
5+
taken to ensure it is set up correctly.
66

77
## Overview
88

99
Gremlin serves as a query layer for a graph that implements the Tinkerpop graph
10-
structure. As of v2.1.0 Gaffer has made available a library that can be utilised
11-
to enable Gremlin queries. This library can be included via maven in any project
12-
using the following dependency definition:
10+
structure. As of v2.3.0, Gremlin support is built into the [Gaffer REST API](./gaffer-docker/gaffer-images.md),
11+
which provides a WebSocket-based traversal source similar to using a normal
12+
[Gremlin server](https://tinkerpop.apache.org/docs/current/reference/#connecting-gremlin-server).
13+
This is the recommended approach and the easiest way to start using Gremlin on
14+
Gaffer.
15+
16+
If you wish to connect via the Java API, you can use the underlying
17+
'GafferPop' library, which enables Gremlin queries. This
18+
library can be included via Maven in any project using the following dependency
19+
definition:
1320

1421
```xml
1522
<dependency>
@@ -19,193 +26,65 @@ using the following dependency definition:
1926
</dependency>
2027
```
2128

22-
The library contains the graph implementation that allows Tinkerpop to talk to a
23-
Gaffer graph and generally is all that is needed to provide the functionality.
24-
To actually utilise Gremlin queries however, a connection to what's known as a
25-
`GraphTraversalSource` is required which is the class from which Gremlin
26-
traversals are spawned. To do this we recommend utilising the provided
27-
[Gremlin server framework](https://tinkerpop.apache.org/docs/current/reference/#connecting-gremlin-server)
28-
which can be configured to use the Gaffer Tinkerpop implementation so that a
29-
endpoint is available for Gremlin queries.
30-
31-
## Connecting to An Existing Accumulo Backed Graph
29+
Both methods (REST API and Java API) utilise the same library that allows
30+
Tinkerpop to talk to a Gaffer graph. To actually spawn a Gremlin query, a
31+
reference to a `GraphTraversalSource` is required; the following sections
32+
outline how to obtain this reference using the REST API.
3233

33-
The recommended way to provide a Gremlin interface to an existing Gaffer
34-
instance is to connect directly to the same [Accumulo store](../gaffer-stores/accumulo-store.md).
35-
Connecting this way means Gremlin communicates in a similar way to the Gaffer
36-
REST API and ensures the fastest performance when using Gremlin (there may still
37-
be a performance hit).
38-
39-
!!! note
40-
It is possible to attach to other store types in a similar manner usually through
41-
a [proxy store](../gaffer-stores/proxy-store.md) or [federated store](../gaffer-stores/federated-store.md).
42-
43-
The general connection diagram looks something like the following:
44-
45-
```mermaid
46-
flowchart LR
47-
A(["User"])
48-
--> B("Gremlin Server")
49-
--> C("Accumulo Store")
50-
```
34+
## Connecting Gremlin
5135

52-
To establish this connection you can make use of the existing `gaffer-gremlin`
53-
OCI image which is an extension of the existing `gremlin-server` image. This
54-
provides the Tinkerpop library which allows users to connect Gaffer graphs as
55-
well as some pre installed configuration to get up and running quickly.
36+
As mentioned previously, the recommended way to use Gremlin queries is via the
37+
WebSocket in the Gaffer REST API. To do this you will need to provide a config
38+
file that sets up the Gaffer Tinkerpop library (a.k.a. 'GafferPop'). The file can
39+
either be added to `/gaffer/gafferpop.properties` in the container, or at a
40+
custom path by setting the `gaffer.gafferpop.properties` key in the
41+
`store.properties` file. This file can be blank but it is still recommended to
42+
set up some default values.
5643

57-
```bash
58-
docker pull gchq/gaffer-gremlin:latest
59-
```
60-
61-
!!! note
62-
You will need to configure the default `gaffer-gremlin` image to your
63-
environment, please continue reading to learn more.
64-
65-
### The `gaffer-gremlin` Image
66-
67-
To use the image you will need to provide the normal Gaffer configuration files
68-
for to your environment along with a new GafferPop specific file (similar to the
69-
standard graph config JSON) they are:
70-
71-
- `store.properties` - Gaffer store configuration, this should match the
72-
existing graph you are connecting to.
73-
- `elements.json` and `types.json` - The schema files for the graph you wish to
74-
connect to.
75-
- `gafferpop.properties` - Configuration for the Gaffer Tinkerpop library
76-
(Gafferpop).
77-
78-
Please read the subsections below on how to configure these files. Once these
79-
are configured you can use bind mounts to make them available when running the
80-
image:
81-
82-
```bash
83-
docker run \
84-
--name gaffer-gremlin \
85-
--publish 8182:8182 \
86-
--volume store.properties:/opt/gremlin-server/conf/gaffer/store.properties \
87-
--volume schema:/opt/gremlin-server/conf/gaffer/schema \
88-
--volume gafferpop.properties:/opt/gremlin-server/conf/gafferpop/gafferpop.properties \
89-
tinkerpop/gremlin-server:latest gremlin-server.yaml
90-
```
91-
92-
### Configuring the Store Properties
44+
!!! tip
45+
Please see the [section below](#configuring-the-gafferpop-library) on how to
46+
configure the GafferPop properties file.
9347

94-
Starting with the Store properties, this file should be largely identical to
95-
the store properties used on the main Gaffer deployment. The main purpose
96-
of this file is to ensure the same Accumulo cluster is connected to.
48+
Once the GafferPop properties file has been added, if you start the REST API a
49+
Gremlin websocket will be available at `localhost:8080/gremlin` by default.
50+
To connect to this socket you must use the [GraphSON v3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson)
51+
format. Most standard Gremlin tools already default to this; however, if
52+
connecting using `gremlinpython` you must set it in the driver connection like so:
9753

98-
An example file is given below, please read the specific [Accumulo store](../gaffer-stores/accumulo-store.md)
99-
documentation for more detail:
54+
```python
55+
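from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.serializer import GraphSONSerializersV3d0
from gremlin_python.process.anonymous_traversal import traversal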
10056

101-
```properties
102-
gaffer.store.class=uk.gov.gchq.gaffer.accumulostore.AccumuloStore
103-
gaffer.store.properties.class=uk.gov.gchq.gaffer.accumulostore.AccumuloProperties
104-
accumulo.instance=accumulo
105-
accumulo.zookeepers=zookeeper
106-
accumulo.user=root
107-
accumulo.password=secret
108-
# General store config
109-
gaffer.cache.service.class=uk.gov.gchq.gaffer.cache.impl.HashMapCacheService
110-
gaffer.store.job.tracker.enabled=true
57+
g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8080/gremlin', 'g', message_serializer=GraphSONSerializersV3d0()))
11158
```
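
As a rough end-to-end sketch of the `gremlinpython` connection described above, the following wraps the connection in two small helpers. The helper names `gremlin_url` and `sample_vertices` are hypothetical and only for illustration; the host, port, and path defaults are taken from the default endpoint mentioned above, and a recent `gremlinpython` (one that provides `with_remote` and `to_list`) is assumed.

```python
def gremlin_url(host="localhost", port=8080, path="/gremlin"):
    """Build the WebSocket URL of the Gremlin endpoint (hypothetical helper)."""
    return f"ws://{host}:{port}{path}"


def sample_vertices(url, limit=5):
    """Connect with GraphSON v3, fetch up to `limit` vertices, then disconnect."""
    # Imported here so the URL helper above works without gremlinpython installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.driver.serializer import GraphSONSerializersV3d0
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(
        url, "g", message_serializer=GraphSONSerializersV3d0()
    )
    try:
        g = traversal().with_remote(connection)
        # An explicit limit keeps this unseeded query small.
        return g.V().limit(limit).to_list()
    finally:
        connection.close()
```

Calling `sample_vertices(gremlin_url())` against a running REST API should return a short list of vertices; closing the connection in the `finally` block avoids leaving the socket open if the query fails.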
11259

113-
### Configuring the Gafferpop Library
60+
## Configuring the GafferPop Library
11461

115-
The `gafferpop.properties`, file is the configuration for the Gaffer
116-
implementation of Tinkerpop (a.k.a Gafferpop). Most of the set up here is for
117-
the construction of the Gafferpop Graph instance which we want to make run with
118-
the `store.properties` we've already configured. An example `gaffer.properties`
119-
would look like the following:
62+
The `gafferpop.properties` file is the configuration for GafferPop. If using
63+
the REST API there are no mandatory properties you need to set, since you will
64+
already have configured the Graph in the existing `store.properties` file. However,
65+
adding some default values for operation modifiers, such as a limit for
66+
`GetAllElements` operations, is good practice.
12067

12168
```properties
122-
# The Tinkerpop graph class we should use
123-
gremlin.graph=uk.gov.gchq.gaffer.tinkerpop.GafferPopGraph
124-
gaffer.graphId=existingGraph
125-
gaffer.storeproperties=conf/gaffer/store.properties
126-
gaffer.userId=user01
127-
```
128-
129-
!!! note
130-
It is important the `graphId` here matches the ID of the main graph you
131-
wish to connect to as this controls which Accumulo table is connected to.
132-
133-
Many of these properties in the example above should be self explanatory, a full breakdown of
134-
of the available properties is as follows:
135-
136-
| Property Key | Description |
137-
| --- | --- |
138-
| `gremlin.graph` | The Tinkerpop graph class we should use |
139-
| `gaffer.graphId` | The graph ID of the Tinkerpop graph |
140-
| `gaffer.storeproperties` | The path to the store properties file |
141-
| `gaffer.schemas` | The path to the directory containing the graph schema files |
142-
| `gaffer.userId` | The default user ID for the Tinkerpop graph (see the [authentication section](#user-authentication)) |
143-
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed |
144-
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (this can be overridden per query see [here](../../user-guide/query/gremlin/gremlin.md#custom-features)) |
145-
146-
### Configuring the Gremlin Server
147-
148-
The underlying Gremlin server can also be configured if required. The `gaffer-gremlin`
149-
image comes with an existing YAML configuration based on the example from the
150-
[Tinkerpop repository](https://github.com/apache/tinkerpop/blob/master/gremlin-server/conf/gremlin-server.yaml).
151-
This file should be suitable for most use cases but a custom one can be provided
152-
via a bind mount. If supplying a custom file please ensure you still include the
153-
following sections:
154-
155-
Ensure the `gafferpop.properties` file is set by modifying the `graphs` section like so:
156-
157-
```yaml
158-
graphs: {
159-
graph: conf/gafferpop/gafferpop.properties
160-
}
69+
# Default operation config
70+
gaffer.elements.getalllimit=5000
71+
gaffer.elements.hasstepfilterstage=PRE_AGGREGATION
16172
```
16273

163-
Ensure the Gaffer plugin is loaded for Gremlin which is achieved by adding the
164-
following to the list of plugins in the `plugins` section:
165-
166-
```yaml
167-
uk.gov.gchq.gaffer.tinkerpop.gremlinplugin.GafferPopGremlinPlugin: {}
168-
```
169-
170-
!!! tip
171-
See the [Tinkerpop docs](https://tinkerpop.apache.org/docs/current/reference/#gremlin-server)
172-
for more information on Gremlin server configuration.
173-
174-
#### User Authentication
175-
176-
Full user authentication is possible with the Gremlin server using the framework
177-
provided by standard Tinkerpop. The GafferPop implementation provides a
178-
functional `Authoriser` class that will handle passing the authenticated user to
179-
the underlying Gaffer graph.
180-
181-
To activate user auth with the Gremlin server you must provide the classes you
182-
wish to use in the Gremlin server's YAML file like so:
183-
184-
```yaml
185-
# This should be a deployment specific class
186-
authentication: {
187-
authenticator: uk.gov.gchq.gaffer.tinkerpop.server.auth.ExampleGafferPopAuthenticator
188-
}
189-
# This class is necessary for correctly forwarding the user to Gaffer
190-
authorization: {
191-
authorizer: uk.gov.gchq.gaffer.tinkerpop.server.auth.GafferPopAuthoriser
192-
}
193-
```
194-
195-
The `authorizer` should always be the `GafferPopAuthoriser` as this is what
196-
handles denying invalid queries for GafferPop and passing the user on to the
197-
Gaffer graph for fine grained security.
74+
A full breakdown of the available properties is as follows:
19875

19976
!!! note
200-
The `GafferPopAuthoriser` will deny attempts to set the user ID via a
201-
`with("userId", <id>)` step in the Gremlin query.
202-
203-
The `authenticator` should be a class specific to the auth mechanism for your
204-
deployment e.g. LDAP. An example class `ExampleGafferPopAuthenticator` is
205-
provided as a start point but does not do any actual authenticating so should
206-
**not** be used in production.
207-
208-
!!! tip
209-
Tinkerpop provides some implementaions of `Authenticators` for standard
210-
mechanisms such as [Kerberos](https://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/server/auth/Krb5Authenticator.html).
211-
Please see the [Tinkerpop documentation](https://tinkerpop.apache.org/docs/current/reference/#security) for more info.
77+
Many of these are for standalone GafferPop Graphs so may be ignored if using
78+
the REST API.
79+
80+
| Property Key | Description | Used in REST API |
81+
| --- | --- | --- |
82+
| `gremlin.graph` | The Tinkerpop graph class we should use for construction. | No |
83+
| `gaffer.graphId` | The graph ID of the Tinkerpop graph. | No |
84+
| `gaffer.storeproperties` | The path to the store properties file. | No |
85+
| `gaffer.schemas` | The path to the directory containing the graph schema files. | No |
86+
| `gaffer.userId` | The default user ID for the Tinkerpop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
87+
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed | No |
88+
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (these can be overridden per query; see [here](../../user-guide/query/gremlin/custom-features.md)). | Yes |
89+
| `gaffer.elements.getalllimit` | The default limit for unseeded queries e.g. `g.V()`. | Yes |
90+
| `gaffer.elements.hasstepfilterstage` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` | Yes |
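
To check which entries of a `gafferpop.properties` file take effect under the REST API, the "Used in REST API" column above can be turned into a small script. This is a minimal sketch, not part of Gaffer: the function names are hypothetical, and the parser only handles plain `key=value` lines and `#` comments rather than the full Java properties syntax.

```python
# Keys marked "Yes" in the "Used in REST API" column of the table above.
REST_API_KEYS = {
    "gaffer.operation.options",
    "gaffer.elements.getalllimit",
    "gaffer.elements.hasstepfilterstage",
}


def parse_properties(text):
    """Parse simple `key=value` lines, skipping blank lines and `#` comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def ignored_by_rest_api(props):
    """Return the configured keys that the table marks as unused by the REST API."""
    return sorted(key for key in props if key not in REST_API_KEYS)


example = """\
# Default operation config
gaffer.userId=user01
gaffer.elements.getalllimit=5000
"""
unused = ignored_by_rest_api(parse_properties(example))
```

Here `unused` contains only `gaffer.userId`, which the table lists as not used in the REST API because the user is always set via the `UserFactory`.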
@@ -1,3 +1,50 @@
1+
# User Authentication
2+
13
!!! info "Work in Progress"
24

3-
This page is under construction.
5+
This page is under construction.
6+
7+
The user authentication layer for Gaffer is currently only enforced by the REST
8+
API. We recommend restricting users such that they do not have access to the
9+
underlying Java API so that all queries are authenticated and executed via the
10+
REST API.
11+
12+
In the REST API the `User` object is constructed via a [`UserFactory`](https://gchq.github.io/Gaffer/uk/gov/gchq/gaffer/rest/factory/UserFactory.html).
13+
In the Spring REST API an abstract implementation of this class,
14+
`AbstractUserFactory`, is used; it is passed the HTTP headers of each request so
15+
that they can be used for authentication.
16+
17+
Currently, there is a single default implementation of this: the
18+
`UnknownUserFactory`, which simply returns a new `User` with `UNKNOWN` as the
19+
user ID. To specify the user
20+
factory class, define the `gaffer.user.factory.class` [REST property](../gaffer-config/config.md#application-properties).
21+
22+
## Writing a User Factory
23+
24+
To authenticate your users you will need to extend the `AbstractUserFactory` class
25+
to add your chosen authentication mechanism. The hooks will already be in the REST API
26+
to pass the current HTTP headers for each request. Your factory will need to parse these
27+
to construct a new `User` object via the `createUser()` method that reflects the user
28+
making the request. This could involve making a call to an LDAP server or similar
29+
authentication service.
30+
31+
For example, you could use the authorisation header in the request:
32+
33+
```java
34+
public class LdapUserFactory extends AbstractUserFactory {
35+
36+
public User createUser() {
37+
final String authHeaderValue = this.httpHeaders.getFirst(HttpHeaders.AUTHORIZATION); // add logic to fetch userId
38+
final String userId = null; // extract from authHeaderValue
39+
final List<String> opAuths = null; // fetch op auths for userId
40+
final List<String> dataAuths = null; // fetch data auths for userId
41+
42+
// Create and return the Gaffer user
43+
return new User.Builder()
44+
.userId(userId)
45+
.opAuths(opAuths)
46+
.dataAuths(dataAuths)
47+
.build();
48+
}
49+
}
50+
```
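
The "extract from authHeaderValue" step above depends on the authentication scheme in use. As a language-neutral illustration, sketched in Python for brevity, the following assumes an HTTP Basic `Authorization` header, which is an assumption and not something the Gaffer REST API prescribes; the function name is hypothetical. It only extracts the user ID and does not verify the password, which a real factory would delegate to LDAP or a similar service.

```python
import base64


def user_id_from_basic_auth(header_value):
    """Extract the user ID from a header of the form 'Basic base64(user:password)'."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        raise ValueError("expected a Basic authorisation header")
    # Invalid base64 raises binascii.Error, which is a subclass of ValueError.
    decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    user_id, separator, _password = decoded.partition(":")
    if not separator or not user_id:
        raise ValueError("credentials must have the form 'user:password'")
    return user_id
```

The same partition-and-validate structure would carry over to a Java `createUser()` implementation, with the operation and data auths then looked up for the returned user ID.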
