Available transformers
SPADE includes a set of transformers to rewrite the responses of provenance queries. They are described below.
When a file is repeatedly written by a process, a corresponding number of artifact vertices (with different version numbers) appear in the provenance graph. This transformer combines all versions of the file into a single one and removes the version annotation.
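The version merge described above can be sketched as follows. This is only an illustrative model, not SPADE's implementation: the graph representation (a dict of vertex annotations plus a list of `(source, destination, operation)` tuples) and the annotation keys `type`, `path`, and `version` are assumptions made for the example.

```python
def merge_versions(vertices, edges):
    """Collapse artifact vertices that share a path and drop 'version'.

    vertices: dict of vertex id -> annotation dict (hypothetical model)
    edges:    list of (source id, destination id, operation) tuples
    """
    canonical = {}  # path -> id of the vertex that survives
    remap = {}      # every vertex id -> id it is replaced by
    merged = {}
    for vid, ann in vertices.items():
        if ann.get("type") == "Artifact" and "path" in ann:
            keep = canonical.setdefault(ann["path"], vid)
            remap[vid] = keep
            if keep == vid:
                # First version seen becomes the single merged vertex.
                merged[vid] = {k: v for k, v in ann.items() if k != "version"}
        else:
            remap[vid] = vid
            merged[vid] = dict(ann)
    new_edges = [(remap[s], remap[d], op) for s, d, op in edges]
    return merged, new_edges


vertices = {
    "p": {"type": "Process", "name": "editor"},
    "f0": {"type": "Artifact", "path": "/tmp/a", "version": "0"},
    "f1": {"type": "Artifact", "path": "/tmp/a", "version": "1"},
}
edges = [("p", "f0", "write"), ("p", "f1", "write")]
merged_vertices, merged_edges = merge_versions(vertices, edges)
```

After the merge, both write edges point at the same artifact vertex. Collapsing such duplicate edges is a separate step.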
When a process repeatedly reads or writes a file, a corresponding number of edges are created. In the context of dependency analysis, a single edge suffices. This transformer merges all read edges between a process and a file into a single edge representing the flow of data from the file, and all write edges into a single edge representing the flow of data to the file.
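As a minimal sketch of this idea, duplicate read and write edges between the same pair of vertices can be collapsed while other edges are left alone. The `(source, destination, operation)` tuple form and the operation names `read` and `write` are assumptions for illustration, not SPADE's data model.

```python
def merge_io_edges(edges):
    """Keep one read edge and one write edge per (source, destination) pair."""
    seen = set()
    result = []
    for src, dst, op in edges:
        key = (src, dst, op)
        if op in ("read", "write"):
            if key in seen:
                continue  # an equivalent I/O edge was already kept
            seen.add(key)
        result.append((src, dst, op))
    return result


edges = [
    ("/etc/hosts", "curl", "read"),
    ("/etc/hosts", "curl", "read"),
    ("curl", "out.html", "write"),
    ("curl", "out.html", "write"),
    ("curl", "out.html", "write"),
]
merged = merge_io_edges(edges)
```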
When a child process (after a fork or clone call) is replaced by another process (via an execve call), the intermediate process is eliminated from the graph. In particular, "parent ---fork/clone---> intermediate ---execve---> child" is replaced by "parent ---fork/clone---> child".
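The rewrite above can be sketched on an edge list that follows the same arrow direction as the text (parent to child). The tuple representation and the operation labels are assumptions. Dropping any other edges incident to the intermediate process is a simplification of this sketch, and the actual transformer may handle them differently.

```python
def collapse_fork_execve(edges):
    """Replace parent -fork/clone-> intermediate -execve-> child
    with parent -fork/clone-> child."""
    # Map each forked or cloned process to (its parent, the call used).
    created_by = {dst: (src, op) for src, dst, op in edges if op in ("fork", "clone")}
    # An intermediate is a forked process that later called execve.
    intermediates = {src for src, _, op in edges if op == "execve" and src in created_by}
    result = []
    for src, dst, op in edges:
        if op == "execve" and src in intermediates:
            parent, how = created_by[src]
            result.append((parent, dst, how))
        elif src not in intermediates and dst not in intermediates:
            result.append((src, dst, op))
    return result


edges = [
    ("bash", "bash-child", "fork"),
    ("bash-child", "ls", "execve"),
    ("ls", "listing.txt", "write"),
]
collapsed = collapse_fork_execve(edges)
```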
In some cases, it may be preferable to eliminate some of the file artifacts from the provenance graph. For example, particular files, extensions, or subtrees in the filesystem may be deemed of no interest. In such cases, a blacklist expression can be specified in the SPADE configuration file cfg/blacklist.transformer.config. Any artifact whose filename matches the expression is removed from the graph, along with all its incident edges.
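The filtering step can be sketched as below. The text does not say which expression syntax the configuration file uses, so this example assumes a regular expression matched anywhere in the path. The graph representation and the annotation keys are likewise hypothetical.

```python
import re


def apply_blacklist(vertices, edges, expression):
    """Remove artifacts whose path matches `expression`, plus incident edges."""
    pattern = re.compile(expression)
    removed = {
        vid
        for vid, ann in vertices.items()
        if ann.get("type") == "Artifact"
        and "path" in ann
        and pattern.search(ann["path"])
    }
    kept_vertices = {vid: ann for vid, ann in vertices.items() if vid not in removed}
    kept_edges = [
        (s, d, op) for s, d, op in edges if s not in removed and d not in removed
    ]
    return kept_vertices, kept_edges


vertices = {
    "p": {"type": "Process", "name": "make"},
    "a": {"type": "Artifact", "path": "/tmp/scratch.log"},
    "b": {"type": "Artifact", "path": "/home/u/report.txt"},
}
edges = [("p", "a", "write"), ("p", "b", "write")]
# Hypothetical expression: anything under /tmp or ending in .log
kept_vertices, kept_edges = apply_blacklist(vertices, edges, r"^/tmp/|\.log$")
```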
If a file is only modified by a single process and never read by any other process, the writes are deemed ephemeral. This transformer eliminates all such ephemeral writes from the provenance graph.
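One way to read this rule: a file's writes are ephemeral when exactly one process writes it and no process other than that writer reads it. The sketch below implements that reading on hypothetical `(process, path, operation)` records. It illustrates the condition and is not SPADE's code.

```python
from collections import defaultdict


def drop_ephemeral_writes(edges):
    """Remove write edges to files touched by only a single process."""
    writers = defaultdict(set)
    readers = defaultdict(set)
    for process, path, op in edges:
        if op == "write":
            writers[path].add(process)
        elif op == "read":
            readers[path].add(process)
    ephemeral = {
        path
        for path, procs in writers.items()
        # One writer, and every reader (if any) is that same writer.
        if len(procs) == 1 and readers.get(path, set()) <= procs
    }
    return [e for e in edges if not (e[2] == "write" and e[1] in ephemeral)]


edges = [
    ("p1", "/tmp/x", "write"),
    ("p1", "/tmp/x", "write"),
    ("p1", "/data/y", "write"),
    ("p2", "/data/y", "read"),
]
remaining = drop_ephemeral_writes(edges)
```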
If a file is only read by a single process and never modified by any other process, the reads are deemed ephemeral. In general, ephemeral reads are of interest. In the special case that the reads are from "garbage" files (such as applications' predefined temporary files), it may be preferable to eliminate them from the graph. This transformer eliminates such reads, using a list of garbage files specified in the SPADE configuration file cfg/garbage.transformer.config.
A query response graph may contain portions that are not of interest. For example, it may be preferable to ignore the provenance of the sudo command when returning that of a file created by the program that was executed via sudo. This transformer takes an expression framed over the annotations on vertices. It prunes the subgraphs that flow to or from all matching vertices (with the direction automatically determined by the query that gave rise to the response graph).
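The pruning can be sketched as a breadth-first traversal from the matching vertices. Several details here are assumptions for illustration: edges are `(cause, effect)` pairs, the set of matching vertices has already been computed from the expression, the query kind is passed in as a string rather than detected automatically, and the matching vertices themselves are removed along with everything reached from them.

```python
from collections import defaultdict, deque


def prune(edges, matches, query):
    """Drop matching vertices and everything upstream ('ancestors' query)
    or downstream ('descendants' query) of them."""
    step = defaultdict(set)
    for cause, effect in edges:
        if query == "ancestors":
            step[effect].add(cause)   # walk towards causes
        else:
            step[cause].add(effect)   # walk towards effects
    doomed = set(matches)
    queue = deque(matches)
    while queue:
        vertex = queue.popleft()
        for neighbour in step[vertex]:
            if neighbour not in doomed:
                doomed.add(neighbour)
                queue.append(neighbour)
    return [(c, e) for c, e in edges if c not in doomed and e not in doomed]


edges = [("login", "sudo"), ("sudo", "make"), ("make", "a.out")]
pruned = prune(edges, {"sudo"}, "ancestors")
```

In this example, an ancestor query for `a.out` keeps only the `make` step, and the provenance of `sudo` is cut away.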
A file may be renamed or linked to, allowing it to subsequently be referred to by a new name. This transformer can be used to retain the write edge from the process that performed the rename or link operation to the new artifact, while eliminating the analogous read edge from the old artifact and the edge between the old and new artifacts. This simplifies the provenance to reflect only the last name of an artifact.
When a program is instrumented with BEEP1, internal loop execution can be interpreted as unit vertices. In the context of workflow analysis, it may be preferable to abstract away the units. This transformer does so by merging all unit vertices into the vertex of the containing process.
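A rough sketch of such a merge follows. How unit vertices are actually distinguished from their containing process is not stated here, so the example assumes hypothetical annotations: a unit carries a `unit` key and shares a `pid` value with its containing process vertex, which has no `unit` key.

```python
def merge_units(vertices, edges):
    """Fold unit vertices into the vertex of their containing process."""
    process_of = {
        ann["pid"]: vid
        for vid, ann in vertices.items()
        if ann.get("type") == "Process" and "unit" not in ann
    }
    remap = {}
    for vid, ann in vertices.items():
        if ann.get("type") == "Process" and "unit" in ann:
            # Units with no known containing process are left untouched.
            remap[vid] = process_of.get(ann["pid"], vid)
        else:
            remap[vid] = vid
    kept = {vid: dict(ann) for vid, ann in vertices.items() if remap[vid] == vid}
    new_edges = []
    for src, dst, op in edges:
        s, d = remap[src], remap[dst]
        # Skip self-loops created by the merge and exact duplicates.
        if s != d and (s, d, op) not in new_edges:
            new_edges.append((s, d, op))
    return kept, new_edges


vertices = {
    "p": {"type": "Process", "pid": "7"},
    "u1": {"type": "Process", "pid": "7", "unit": "1"},
    "u2": {"type": "Process", "pid": "7", "unit": "2"},
    "f": {"type": "Artifact", "path": "/tmp/out"},
}
edges = [("p", "u1", "unit"), ("u1", "f", "write"), ("p", "u2", "unit"), ("u2", "f", "write")]
kept, new_edges = merge_units(vertices, edges)
```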
When BEEP1 is used, inter-unit communication may occur through memory addresses that are depicted as artifact vertices in the provenance graph. If this level of detail is not needed, this transformer can be used to abstract away the flows through memory addresses. In particular, memory artifact vertices and the edges representing reads from and writes to them are eliminated.
This transformer composes several others in a specific order. It can be used to provide results that match those produced by BEEP1. Different transformations must be performed, depending on whether an ancestor or descendant lineage query was executed. The specific transformers, arguments, and order used for each type of query are defined in SPADE's configuration files cfg/beep.backward_search.transformers.config and cfg/beep.forward_search.transformers.config, for ancestors and descendants, respectively. This transformer automatically determines which configuration to use based on the query that gave rise to the response graph being processed.
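The composition can be pictured as choosing an ordered list of transformers by query type and applying them one after another. In the sketch below, the two toy steps, the pipeline contents, and the query labels are all placeholders. The real lists and arguments are those in the two configuration files named above.

```python
def drop_memory_edges(edges):
    """Toy step: remove edges touching artifacts named 'mem:<address>'."""
    return [e for e in edges if not (e[0].startswith("mem:") or e[1].startswith("mem:"))]


def dedupe_edges(edges):
    """Toy step: remove exact duplicate edges, keeping first occurrences."""
    return list(dict.fromkeys(edges))


# Hypothetical pipelines, standing in for the two configuration files.
PIPELINES = {
    "ancestors": [drop_memory_edges, dedupe_edges],
    "descendants": [dedupe_edges],
}


def transform(edges, query):
    """Apply the pipeline selected by the query type, in order."""
    for step in PIPELINES[query]:
        edges = step(edges)
    return edges


edges = [
    ("unit1", "mem:0x10", "write"),
    ("mem:0x10", "unit2", "read"),
    ("unit2", "out.txt", "write"),
    ("unit2", "out.txt", "write"),
]
backward = transform(edges, "ancestors")
forward = transform(edges, "descendants")
```

Keeping the step lists in data rather than code mirrors the configuration-driven design described above: the order can change without touching the composing function.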
1 Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu, High accuracy attack provenance via binary-based execution partition, 20th Network and Distributed System Security Symposium, 2013.
This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.