Available transformers

SPADE includes a set of transformers to rewrite the responses of provenance queries. They are described below.

NoVersions

When a file is repeatedly written by a process, a corresponding number of artifact vertices (with different version numbers) appear in the provenance graph. This transformer combines all versions of the file into a single one and removes the version annotation.
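
The merging step can be sketched on a toy graph. This is only an illustration, not SPADE's implementation: the dictionary layout, the `type`, `path`, and `version` annotation keys, and the function name `merge_versions` are all assumptions made for the example.

```python
def merge_versions(vertices, edges):
    """Collapse artifact vertices that differ only in their version.

    vertices: dict of vertex id -> annotation dict (hypothetical layout)
    edges: list of (source id, destination id, operation) tuples
    """
    canonical = {}  # versionless annotations -> id of the surviving vertex
    remap = {}      # original vertex id -> id of the surviving vertex
    merged = {}
    for vid, ann in vertices.items():
        if ann.get("type") == "Artifact":
            # Drop the version annotation, then group identical artifacts.
            stripped = {k: v for k, v in ann.items() if k != "version"}
            key = tuple(sorted(stripped.items()))
            remap[vid] = canonical.setdefault(key, vid)
            merged.setdefault(remap[vid], stripped)
        else:
            remap[vid] = vid
            merged[vid] = dict(ann)
    # Redirect every edge to the surviving vertices.
    return merged, [(remap[s], remap[d], op) for s, d, op in edges]
```

In this sketch the edges that pointed at different versions become parallel edges on the single remaining vertex; they are not deduplicated here.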

MergeIO

When a process repeatedly reads (or writes, respectively) a file, a corresponding number of edges are created. In the context of dependency analysis, a single edge suffices. This transformer merges all read (or write, respectively) edges into a single one representing the flow of data from (or to, respectively) the file.
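
As a rough sketch of the idea, repeated edges can be treated as duplicates when they share both endpoints and the operation. The edge tuple layout and the `read`/`write` labels below are assumptions for illustration, not SPADE's internal representation.

```python
IO_OPERATIONS = {"read", "write"}  # hypothetical operation labels


def merge_io(edges):
    """Keep only the first of any repeated read or write edge.

    edges: list of (source id, destination id, operation) tuples
    """
    seen = set()
    merged = []
    for edge in edges:
        if edge[2] in IO_OPERATIONS:
            if edge in seen:
                continue  # same endpoints and operation already represented
            seen.add(edge)
        merged.append(edge)  # non-I/O edges pass through untouched
    return merged
```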

SimpleForks

When a child process (created by a fork or clone call) is replaced by another process (via an execve call), this transformer eliminates the intermediate process from the graph. In particular, "parent ---fork/clone---> intermediate ---execve---> child" is replaced by "parent ---fork/clone---> child".
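
The rewrite of the fork and execve edges can be sketched as follows. The sketch uses the arrow direction of the description above and hypothetical operation labels; it only handles the fork/clone and execve edges, not any other edges that may touch the intermediate process.

```python
FORK_OPERATIONS = {"fork", "clone"}  # hypothetical operation labels


def simplify_forks(edges):
    """Replace parent -fork/clone-> intermediate -execve-> child
    with parent -fork/clone-> child.

    edges: list of (source, destination, operation) tuples
    """
    forked = {d for _, d, op in edges if op in FORK_OPERATIONS}
    # Intermediate processes: created by fork/clone, then replaced by execve.
    replaced_by = {s: d for s, d, op in edges if op == "execve" and s in forked}
    result = []
    for s, d, op in edges:
        if op == "execve" and s in replaced_by:
            continue  # the intermediate's execve edge disappears
        if op in FORK_OPERATIONS and d in replaced_by:
            d = replaced_by[d]  # point the fork edge at the final child
        result.append((s, d, op))
    return result
```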

Blacklist

In some cases, it may be preferable to eliminate some of the file artifacts from the provenance graph. For example, particular files, extensions, or subtrees in the filesystem may be deemed of no interest. In such cases, a blacklist can be specified in the SPADE configuration file cfg/blacklist.transformer.config. Any artifact with a filename that matches the blacklist expression will be removed from the graph (along with all incident edges).
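
The effect on a graph can be sketched as below. The sketch assumes the expression is a regular expression and that the filename lives in a `path` annotation; both are assumptions for illustration, and the format of the configuration file itself is not shown here.

```python
import re


def apply_blacklist(vertices, edges, pattern):
    """Remove artifacts whose path matches pattern, plus incident edges.

    vertices: dict of vertex id -> annotation dict (hypothetical layout)
    edges: list of (source id, destination id, operation) tuples
    pattern: regular expression applied to the 'path' annotation
    """
    regex = re.compile(pattern)
    removed = {
        vid
        for vid, ann in vertices.items()
        if ann.get("type") == "Artifact"
        and "path" in ann
        and regex.search(ann["path"])
    }
    kept_vertices = {v: a for v, a in vertices.items() if v not in removed}
    # An edge survives only if both of its endpoints survive.
    kept_edges = [e for e in edges if e[0] not in removed and e[1] not in removed]
    return kept_vertices, kept_edges
```

For example, a pattern such as `\.log$` would drop every artifact whose path ends in `.log`.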

NoEphemeralWrites

If a file is only modified by a single process and never read by any other process, the writes are deemed ephemeral. This transformer eliminates all such ephemeral writes from the provenance graph.
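
The ephemerality test can be sketched on a flat list of I/O events. The event tuple layout and the function name are assumptions for the example, not part of SPADE.

```python
def drop_ephemeral_writes(events):
    """Remove writes to files with one writer and no other reader.

    events: list of (process, operation, path) tuples, where operation
    is 'read' or 'write' (hypothetical layout)
    """
    writers, readers = {}, {}
    for proc, op, path in events:
        if op == "write":
            writers.setdefault(path, set()).add(proc)
        elif op == "read":
            readers.setdefault(path, set()).add(proc)

    def is_ephemeral(path):
        w = writers.get(path, set())
        # One writer, and nobody other than that writer reads the file.
        return len(w) == 1 and readers.get(path, set()) <= w

    return [e for e in events if not (e[1] == "write" and is_ephemeral(e[2]))]
```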

NoEphemeralReads

If a file is only read by a single process and never modified by any other process, the reads are deemed ephemeral. In general, ephemeral reads are of interest. In the special case that the reads are from "garbage" files (such as applications' predefined temporary files), it may be preferable to eliminate them from the graph. This transformer eliminates such reads, using a list of garbage files specified in the SPADE configuration file cfg/garbage.transformer.config.
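
A minimal sketch of this rule, assuming the garbage list is a set of exact paths (the actual matching rules and file format are not described here, so this is a simplification):

```python
def drop_ephemeral_garbage_reads(events, garbage):
    """Remove ephemeral reads, but only for paths in the garbage set.

    events: list of (process, operation, path) tuples, where operation
    is 'read' or 'write' (hypothetical layout)
    garbage: set of paths treated as garbage files (exact match assumed)
    """
    readers, writers = {}, {}
    for proc, op, path in events:
        if op == "read":
            readers.setdefault(path, set()).add(proc)
        elif op == "write":
            writers.setdefault(path, set()).add(proc)

    def is_ephemeral(path):
        r = readers.get(path, set())
        # One reader, and nobody other than that reader modifies the file.
        return len(r) == 1 and writers.get(path, set()) <= r

    return [
        e
        for e in events
        if not (e[1] == "read" and e[2] in garbage and is_ephemeral(e[2]))
    ]
```

Reads that are ephemeral but not listed as garbage are kept, reflecting the point that ephemeral reads are generally of interest.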

Prune

A query response graph may contain portions that are not of interest. For example, it may be preferable to ignore the provenance of the sudo command when returning that of a file created by the program that was executed via sudo. This transformer takes an expression framed over the annotations on vertices. It will prune the subgraphs that flow to or from all matching vertices (with the direction automatically determined by the query that gave rise to the response graph).
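
One possible reading of this behaviour is sketched below: remove every matching vertex together with everything reachable from it in one direction. The exact semantics of the real transformer may differ (for instance, in how vertices reachable by other paths are treated). Here the direction is passed explicitly rather than derived from the query, the expression is modelled as a Python predicate, and all names are hypothetical.

```python
from collections import deque


def prune(vertices, edges, matches, follow_outgoing=True):
    """Remove matching vertices and everything reachable from them.

    vertices: dict of vertex id -> annotation dict
    edges: list of (source id, destination id) pairs
    matches: predicate over an annotation dict
    follow_outgoing: walk edges source -> destination if True, else reversed
    """
    neighbours = {}
    for s, d in edges:
        start, end = (s, d) if follow_outgoing else (d, s)
        neighbours.setdefault(start, []).append(end)
    doomed = {vid for vid, ann in vertices.items() if matches(ann)}
    queue = deque(doomed)
    while queue:  # breadth-first walk away from the matching vertices
        for nxt in neighbours.get(queue.popleft(), []):
            if nxt not in doomed:
                doomed.add(nxt)
                queue.append(nxt)
    kept = {vid: ann for vid, ann in vertices.items() if vid not in doomed}
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, kept_edges
```

With edges drawn from each vertex to what it depends on, pruning on a predicate such as `lambda ann: ann.get("name") == "sudo"` would drop the sudo vertex and its ancestors while keeping the file and the program that created it.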

LastName

A file may be renamed or linked to, allowing it to subsequently be referred to by a new name. This transformer can be used to retain the write edge from the process that performed the rename or link operation to the new artifact, while eliminating the analogous read edge from the old artifact and the edge between the old and new artifacts. This simplifies the provenance to reflect only the last name of an artifact.

NoUnits

When a program is instrumented with BEEP¹, internal loop execution can be interpreted as unit vertices. In the context of workflow analysis, it may be preferable to abstract away the units. This transformer does this by merging all unit vertices into the vertex of the containing process.
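
The merge can be sketched as a vertex remapping. The sketch assumes each unit vertex names its containing process in a hypothetical `process` annotation and is marked with a hypothetical `Unit` type; how SPADE actually links units to processes is not described here.

```python
def merge_units(vertices, edges):
    """Fold unit vertices into their containing process vertex.

    vertices: dict of vertex id -> annotation dict; unit vertices carry
    a 'process' annotation holding the containing process's vertex id
    edges: list of (source id, destination id, operation) tuples
    """
    owner = {
        vid: ann["process"]
        for vid, ann in vertices.items()
        if ann.get("type") == "Unit"
    }
    kept = {vid: ann for vid, ann in vertices.items() if vid not in owner}
    result = []
    for s, d, op in edges:
        s, d = owner.get(s, s), owner.get(d, d)
        # Skip edges that collapse onto one vertex, and exact repeats.
        if s != d and (s, d, op) not in result:
            result.append((s, d, op))
    return kept, result
```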

NoMemory

When BEEP¹ is used, inter-unit communication may occur through memory addresses that are depicted as artifact vertices in the provenance graph. If this level of detail is not needed, this transformer can be used to abstract away the flows through memory addresses. In particular, memory artifact vertices and the edges representing reads from and writes to them are eliminated.

BEEP

This transformer composes several others in a specific order. It can be used to provide results that match those produced by BEEP¹. Different transformations must be performed, depending on whether an ancestor or descendant lineage query was executed. The specific transformers, arguments, and order used for each type of query are defined in SPADE's configuration files cfg/beep.backward_search.transformers.config and cfg/beep.forward_search.transformers.config, for ancestors and descendants, respectively. This transformer automatically determines which configuration to use based on the query that gave rise to the response graph being processed.
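
The composition pattern can be sketched generically. The sketch does not reproduce the contents of the two configuration files; the transformer lists, the `ancestors`/`descendants` labels, and the function names are placeholders chosen for the example.

```python
def compose(transformers):
    """Return a function that applies the transformers in list order."""

    def run(graph):
        for transform in transformers:
            graph = transform(graph)
        return graph

    return run


def make_direction_aware(backward, forward):
    """Pick a pipeline based on the kind of lineage query.

    backward: transformers to apply for ancestor queries
    forward: transformers to apply for descendant queries
    """
    pipelines = {
        "ancestors": compose(backward),
        "descendants": compose(forward),
    }

    def transform(graph, query_direction):
        return pipelines[query_direction](graph)

    return transform
```

Because each transformer receives the output of the previous one, reordering the lists can change the result, which is why the order is fixed per query type.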

¹Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu, High accuracy attack provenance via binary-based execution partition, 20th Network and Distributed System Security Symposium, 2013.

This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


