Authors: Dr. Larry Holder, School of Electrical Engineering and Computer Science

Support: This material is based upon work supported by the National Science Foundation under Grant No. 1646640.

## Running

To generate the streams from an input file:

```
python3 gsg.py <inputFile.json>
```

To export a graph file to GraphML:

```
python3 gExportGraphML.py <graphFile.json>
```

## Input File

The input file is in JSON format. An example is in the file *input.json*.

A vertex appearing in the *vertices* array of a pattern consists of the following properties:

* **id**: String identifier for this vertex.
* **new**: Whether this vertex should be created as new ("true") or mapped to a vertex already written to the stream ("false"). The stream is chosen based on the streams of the incident edges, all of which must be assigned to the same stream.
* **attributes**: A JSON object of name/value pairs, where both the name and value are strings. Attributes are not interpreted in any way, but merely copied to the streams along with the vertex instances. If this vertex is not new, then attributes are ignored (and can be omitted), because the existing attributes on the old vertex will be used.
* **type**: Optional. A string indicating the type of this vertex. If this vertex is not new, then type is ignored, because the existing type of the old vertex will be used.
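The rules in the property list above can be checked mechanically before a pattern is used. The following is a minimal sketch, assuming a pattern vertex has already been parsed from the input JSON into a Python dict; the function name `check_pattern_vertex` is hypothetical and is not part of GSG, which may enforce these rules differently.

```python
from typing import Any


def check_pattern_vertex(vertex: dict[str, Any]) -> list[str]:
    """Return the problems found in one entry of a pattern's "vertices" array."""
    problems: list[str] = []
    if not isinstance(vertex.get("id"), str):
        problems.append("id must be a string")
    new = vertex.get("new")
    if new not in ("true", "false"):
        problems.append('new must be "true" or "false"')
    # Attributes and type only matter for new vertices; an existing vertex keeps its own.
    if new == "true":
        attributes = vertex.get("attributes", {})
        if not isinstance(attributes, dict) or not all(
            isinstance(name, str) and isinstance(value, str)
            for name, value in attributes.items()
        ):
            problems.append("attributes must map strings to strings")
        if "type" in vertex and not isinstance(vertex["type"], str):
            problems.append("type must be a string")
    return problems
```

For example, `check_pattern_vertex({"id": "v1", "new": "true", "attributes": {"label": "person"}})` returns an empty list, while a vertex with `"new": "yes"` is reported.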

A new vertex is written to a stream just before the earliest edge involving this vertex is written to that stream. If edges assigned to different streams connect to the same vertex, then that same vertex is written to each stream.
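To make the write-time rule concrete, here is a minimal sketch that computes, for each (vertex, stream) pair, the time of the earliest edge touching that vertex in that stream; a new vertex would be written just before that edge, once per stream. The `ScheduledEdge` type and its `source`, `target`, `stream`, and `time` fields are hypothetical stand-ins for however `gsg.py` represents scheduled edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledEdge:
    source: str  # hypothetical: pattern id of one end vertex
    target: str  # hypothetical: pattern id of the other end vertex
    stream: int  # the edge's streamNum
    time: int    # time unit at which the edge is written


def vertex_write_times(edges: list[ScheduledEdge]) -> dict[tuple[str, int], int]:
    """Map (vertex id, stream) to the time of the earliest edge touching that vertex in that stream."""
    first: dict[tuple[str, int], int] = {}
    for edge in edges:
        for vertex in (edge.source, edge.target):
            key = (vertex, edge.stream)
            # Keep the earliest edge time seen so far for this vertex in this stream.
            if key not in first or edge.time < first[key]:
                first[key] = edge.time
    return first
```

A vertex whose edges span two streams gets two entries, matching the rule that the same vertex is written to each stream.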

An edge appearing in the *edges* array of a pattern consists of the following properties:

* **maxOffset**: A non-negative integer (as a string) giving the maximum number of time units between the time the pattern is chosen and the time this edge appears in the stream. The actual offset is uniformly distributed over [*minOffset*, *maxOffset*].
* **streamNum**: The stream to which this edge (and its vertices) will be written. Must be an integer (as a string) in the range [1, *numStreams*].
* **attributes**: A JSON object of name/value pairs, where both the name and value are strings. Attributes are not interpreted in any way, but merely copied to the streams along with the edge instances.
* **type**: Optional. A string indicating the type of this edge.
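The offset and stream rules above can be sketched as follows. This assumes *minOffset*, like *maxOffset*, is a non-negative integer encoded as a string, and that *numStreams* is already known; `sample_edge_offset` is a hypothetical helper, not GSG's own code.

```python
import random


def sample_edge_offset(edge: dict, num_streams: int, rng: random.Random) -> tuple[int, int]:
    """Return (offset, stream) for one pattern edge."""
    # Numeric properties are encoded as strings in the input file.
    min_offset = int(edge["minOffset"])
    max_offset = int(edge["maxOffset"])
    if not 0 <= min_offset <= max_offset:
        raise ValueError("need 0 <= minOffset <= maxOffset")
    stream = int(edge["streamNum"])
    if not 1 <= stream <= num_streams:
        raise ValueError(f"streamNum must be in [1, {num_streams}]")
    # randint includes both endpoints, matching the closed interval [minOffset, maxOffset].
    return rng.randint(min_offset, max_offset), stream
```

Passing an explicit `random.Random` instance keeps runs reproducible when a seed is supplied.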

Each edge in a pattern can be assigned to a different stream, except that edges connected to a non-new vertex must all be assigned to the same stream. Using this technique, a pattern can be divided across multiple streams. This is one of the main goals of GSG: to provide test data for checking whether a graph mining system can find the full pattern by analyzing (or fusing) the individual streams. The streams can easily be fused into one large graph using the vertex ids as anchors; that is, two vertices from two different streams with the same id represent the same vertex (or entity).
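Fusion by vertex id can be sketched as below. The streams are assumed to be already parsed into lists of records shaped like the vertex and edge instances described under Output Stream Files (`{"vertex": {...}}` or `{"edge": {...}}`); the `source` and `target` fields on edges are hypothetical names for the ids of an edge's end vertices.

```python
from collections import defaultdict


def fuse_streams(streams: list[list[dict]]) -> tuple[dict[str, dict], dict[str, set[str]]]:
    """Fuse several streams into one graph, treating equal vertex ids as the same vertex."""
    vertices: dict[str, dict] = {}
    neighbors: dict[str, set[str]] = defaultdict(set)
    for stream in streams:
        for record in stream:
            if "vertex" in record:
                vertex = record["vertex"]
                # The id is the anchor: a vertex written to several streams is kept once.
                vertices.setdefault(vertex["id"], vertex)
            elif "edge" in record:
                edge = record["edge"]
                neighbors[edge["source"]].add(edge["target"])
                neighbors[edge["target"]].add(edge["source"])
    return vertices, dict(neighbors)
```

If one stream holds edge 1-2 and another holds edge 2-3, the fused graph connects 1, 2, and 3 through the shared vertex 2, which neither stream shows on its own.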

If vertices and edges are scheduled to appear beyond the *duration* of stream generation, generation continues until all scheduled vertices and edges are written to the streams. No new patterns are triggered beyond the *duration* of stream generation.
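The end-of-duration rule behaves like a small event loop: triggers are honored only while the clock is below *duration*, but the loop keeps draining already scheduled elements afterwards. A minimal sketch, assuming integer time units starting at 0, that "beyond the duration" means at or after time *duration*, and hypothetical inputs `triggers` (time to pattern names) and `pattern_offsets` (pattern name to edge offsets):

```python
import heapq


def simulate(
    duration: int,
    triggers: dict[int, list[str]],
    pattern_offsets: dict[str, list[int]],
) -> list[tuple[int, str, int]]:
    """Return (write_time, pattern, edge_index) for every element, in write order."""
    pending: list[tuple[int, int, str, int]] = []  # min-heap keyed on write time
    written: list[tuple[int, str, int]] = []
    seq = 0  # tie-breaker so equal times keep insertion order
    t = 0
    while t < duration or pending:
        if t < duration:  # no new patterns once the duration is reached
            for pattern in triggers.get(t, []):
                for index, offset in enumerate(pattern_offsets[pattern]):
                    heapq.heappush(pending, (t + offset, seq, pattern, index))
                    seq += 1
        # Write everything scheduled for the current time unit.
        while pending and pending[0][0] <= t:
            when, _, pattern, index = heapq.heappop(pending)
            written.append((when, pattern, index))
        t += 1
    return written
```

With `duration=5`, a pattern triggered at time 3 whose second edge has offset 4 is still written at time 7, while a trigger at time 5 is ignored.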

## Output Stream Files

A vertex instance is a JSON object whose name is "vertex" and whose value is a JSON object with the following properties:

* **id**: An integer id (as a string), generated automatically by GSG, that uniquely identifies this vertex in every stream in which it appears, and in the instances file if the vertex is part of an instance of a tracked pattern.
* **attributes**: A JSON object of name/value pairs, where both the name and value are strings. These are merely copied from the attributes defined for this vertex in the input pattern.
* **timestamp**: A string representing the time at which this vertex was written to the stream. The format is dictated by the *outputTimeFormat* parameter.
* **type**: Optional. A string indicating the type of this vertex. Only present if a type is defined for the corresponding vertex in the pattern.
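A vertex instance, including the optional type, can be assembled as sketched below. `vertex_instance` is a hypothetical helper, and the timestamp is passed in already formatted, since the *outputTimeFormat* options are not covered here.

```python
import json
from typing import Optional


def vertex_instance(
    vertex_id: int,
    attributes: dict[str, str],
    timestamp: str,
    vertex_type: Optional[str] = None,
) -> str:
    """Serialize one vertex instance as {"vertex": {...}}."""
    body = {
        "id": str(vertex_id),  # integer id, written as a string
        "attributes": dict(attributes),  # copied from the pattern vertex
        "timestamp": timestamp,
    }
    # "type" appears only when the pattern vertex defines one.
    if vertex_type is not None:
        body["type"] = vertex_type
    return json.dumps({"vertex": body})
```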

### Edge Instance

An edge instance is a JSON object whose name is "edge" and whose value is a JSON object with the following properties:
* **attributes**: A JSON object of name/value pairs, where both the name and value are strings. These are merely copied from the attributes defined for this edge in the input pattern.
* **timestamp**: A string representing the time at which this edge was written to the stream. The format is dictated by the *outputTimeFormat* parameter.
* **type**: Optional. A string indicating the type of this edge. Only present if a type is defined for the corresponding edge in the pattern.