Commit 2ce0bc2

added optional type property to vertices and edges; updated to Python 3
1 parent 576c2ed commit 2ce0bc2

5 files changed

Lines changed: 61 additions & 12 deletions

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2017 Larry Holder, Washington State University.
+Copyright (c) 2017-2020 Larry Holder, Washington State University.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 11 additions & 1 deletion
@@ -6,6 +6,12 @@ Authors: Dr. Larry Holder, School of Electrical Engineering and Computer Science
 
 Support: This material is based upon work supported by the National Science Foundation under Grant No. 1646640.
 
+## Running
+
+```python3 gsg.py <inputFile.json>```
+
+```python3 gExportGraphML.py <graphFile.json>```
+
 ## Input File
 
 The input file is in JSON format. An example is in the file *input.json*.
@@ -36,6 +42,7 @@ A vertex appearing in the *vertices* array of a pattern consists of the followin
 * **id**: String identifier for this vertex.
 * **new**: Whether this vertex should be created as new ("true"), or mapped to a vertex already written to the stream ("false"). The stream is chosen based on the streams of the incident edges, all of which must be assigned to the same stream.
 * **attributes**: A JSON object of name/value pairs, where both the name and value are strings. Attributes are not interpreted in any way, but merely copied to the streams along with the vertex instances. If this vertex is not new, then attributes are ignored (and can be omitted), because the existing attributes on the old vertex will be used.
+* **type**: Optional. A string indicating the type of this vertex. If this vertex is not new, then type is ignored, because the existing type of the old vertex will be used.
 
 A new vertex is written to a stream just before the earliest edge that involves this vertex is written to the stream. If edges assigned to different streams connect to the same vertex, then that same vertex is written to each stream.
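A rough illustration of the **type** rule added in this hunk: the sketch below parses three hypothetical pattern vertices and reports the type each one would contribute. The helper name `declared_type` and the sample data are assumptions made for this example and do not come from gsg.py.

```python
import json

# Hypothetical pattern vertices in the input format described above.
pattern_vertices = json.loads("""
[
  {"id": "1", "new": "true",  "type": "user", "attributes": {"label": "v1"}},
  {"id": "2", "new": "true",  "attributes": {"label": "v2"}},
  {"id": "3", "new": "false", "type": "ignored"}
]
""")

def declared_type(vertex):
    """Return the type a pattern vertex contributes, or None.

    A vertex that is not new keeps the type of the existing vertex it maps
    to, so any "type" given in the pattern is ignored for it.
    """
    if vertex["new"] != "true":
        return None
    return vertex.get("type")  # None when the optional property is absent

types = [declared_type(v) for v in pattern_vertices]
print(types)  # ['user', None, None]
```

Using `dict.get` keeps the property optional: input files written before this commit, which have no `type` key, still parse.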

@@ -51,10 +58,11 @@ An edge appearing in the *edges* array of a pattern consists of the following pr
 * **maxOffset**: A non-negative integer (as a string) representing the maximum number of time units after the pattern is chosen that this edge appears in the stream. The actual offset is uniformly distributed over [minOffset,maxOffset].
 * **streamNum**: The stream to which this edge (and it's vertices) will be written. Must be an integer (as a string) in the range [1,*numStreams*].
 * **attributes**: A JSON object of name/value pairs, where both the name and value are strings. Attributes are not interpreted in any way, but merely copied to the streams along with the edge instances.
+* **type**: Optional. A string indicating the type of this edge.
 
 Each edge in a pattern can be assigned to a different stream, except that edges connected to a non-new vertex must all be assigned to the same stream. Using this technique, a pattern can be divided up across multiple streams. This is one of the main goals of GSG, that is, to provide test data to see if a graph mining system can find the full pattern by analyzing (or fusing) the individual streams. In terms of fusion, the streams can be easily fused together into one large graph, using the vertex ids as anchors. That is, two vertices from two different streams having the same id, represent the same vertex (or entity).
 
-In the event that vertices and edges are scheduled to appear beyond the *duration* of the stream generation, stream generation will continue until all scheduled vertices and edges are written to streams. No new patterns are trigger beyond the *duration* of the stream generation.
+In the event that vertices and edges are scheduled to appear beyond the *duration* of the stream generation, stream generation will continue until all scheduled vertices and edges are written to streams. No new patterns are triggered beyond the *duration* of the stream generation.
 
 ## Output Stream Files
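The fusion idea described in this hunk, where vertices sharing an id across streams are treated as one entity, can be sketched as follows. This is a minimal illustration, not part of GSG. It assumes each stream has already been loaded into a list of objects shaped like `{"vertex": {...}}` or `{"edge": {...}}`; the `"edge"` wrapper name, the function `fuse_streams`, and the sample data are all assumptions.

```python
def fuse_streams(streams):
    """Merge several streams into one graph, using vertex ids as anchors."""
    vertices = {}  # vertex id -> vertex record (first occurrence is kept)
    edges = []
    for stream in streams:
        for item in stream:
            if "vertex" in item:
                vertex = item["vertex"]
                # The same id in two streams denotes the same vertex.
                vertices.setdefault(vertex["id"], vertex)
            elif "edge" in item:
                edges.append(item["edge"])
    return vertices, edges

# Two hypothetical streams that share vertex "2".
stream1 = [
    {"vertex": {"id": "1", "attributes": {}, "timestamp": "0"}},
    {"vertex": {"id": "2", "attributes": {}, "timestamp": "0"}},
    {"edge": {"id": "1", "source": "1", "target": "2",
              "attributes": {}, "timestamp": "1"}},
]
stream2 = [
    {"vertex": {"id": "2", "attributes": {}, "timestamp": "2"}},
    {"vertex": {"id": "3", "attributes": {}, "timestamp": "2"}},
    {"edge": {"id": "2", "source": "2", "target": "3",
              "attributes": {}, "timestamp": "3"}},
]

vertices, edges = fuse_streams([stream1, stream2])
print(sorted(vertices))  # ['1', '2', '3']
print(len(edges))        # 2
```

Keeping the first occurrence of a shared vertex is one arbitrary choice; a real fusion step might instead keep the earliest timestamp or record every stream in which the vertex appears.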

@@ -67,6 +75,7 @@ A vertex instance is a JSON object with name "vertex" and whose value is a JSON
 * **id**: An integer id (as a string) automatically generated by GSG that uniquely identifies this vertex in all streams in which it appears, and in the instances file if part of an instance of a tracked pattern.
 * **attributes**: A JSON object of name/value pairs, where both the name and value are strings. These are merely copied from the attributes defined for this vertex in the input pattern.
 * **timestamp**: A string representing the time at which this vertex was written to the stream. The format is dictated by the *outputTimeFormat* parameter.
+* **type**: Optional. A string indicating the type of this vertex. Only appears if defined for the corresponding vertex in the pattern.
 
 ### Edge Instance

@@ -80,6 +89,7 @@ object with the following properties.
 * **attributes**: A JSON object of name/value pairs, where both the name and value are strings. These are merely copied from the attributes defined for this edge in the input pattern.
 * **timestamp**: A string representing the time at which this edge was written
 to the stream. The format is dictated by the *outputTimeFormat* parameter.
+* **type**: Optional. A string indicating the type of this edge. Only appears if defined for the corresponding edge in the pattern.
 
 ## Output Instances File
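Because **type** only appears in an output instance when the pattern defines it, a consumer of the stream files has to treat the key as optional. The sketch below shows one way to do that. The sample records, the `"edge"` wrapper name, and the helper `instance_type` are assumptions for illustration, not taken from the repository.

```python
import json

# Hypothetical records: a typed vertex and an untyped edge.
vertex_text = ('{"vertex": {"id": "27", "type": "user", '
               '"attributes": {"label": "v8"}, "timestamp": "5"}}')
edge_text = ('{"edge": {"id": "89", "source": "27", "target": "28", '
             '"attributes": {"label": "e89"}, "timestamp": "6"}}')

def instance_type(instance, default=None):
    """Return (kind, type) for a single-key vertex or edge record."""
    kind, body = next(iter(instance.items()))
    return kind, body.get("type", default)

vertex_result = instance_type(json.loads(vertex_text))
edge_result = instance_type(json.loads(edge_text))
print(vertex_result)  # ('vertex', 'user')
print(edge_result)    # ('edge', None)
```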

gExportGraphML.py

Lines changed: 28 additions & 7 deletions
@@ -15,9 +15,11 @@ def generateGraphML(inputFileName):
 'http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">\n')
 
 target.write('<key id="label" for="node" attr.name="label" attr.type="string"/>\n')
+target.write('<key id="type" for="node" attr.name="type" attr.type="string"/>\n')
 target.write('<key id="timestamp" for="node" attr.name="timestamp" attr.type="string"/>\n')
 
 target.write('<key id="label" for="edge" attr.name="label" attr.type="string"/>\n')
+target.write('<key id="type" for="edge" attr.name="type" attr.type="string"/>\n')
 target.write('<key id="timestamp" for="edge" attr.name="timestamp" attr.type="string"/>\n')
 target.write('<key id="directed" for="edge" attr.name="directed" attr.type="string"/>\n')

@@ -31,9 +33,19 @@ def generateGraphML(inputFileName):
 idLine = next(infile) # "id": "27",\n
 vid = idLine.split('"id": ')[1][:-2] # "27"
 target.write(oneIndent +'<node id=' + vid + '>\n')
-
+
+#Check for 'type' (optional)
+typeLine = next(infile) # "type": "user",\n
+if typeLine.startswith(' "type":'):
+    typeComponents = typeLine.split(' ')
+    vType = typeComponents[-1][:-2]
+    target.write(twoIndent +'<data key="type">' + vType + '</data>\n')
+    attrLine = next(infile)
+else:
+    attrLine = typeLine
+
 #Get Attributes
-attrLine = next(infile) # "attributes": {"label": "v8"},\n
+#attrLine = next(infile) # "attributes": {"label": "v8"},\n
 allAttrs = attrLine.split('"attributes": {')[1][:-3] # "label": "v8"
 allAttrArray = allAttrs.split(',')
 for attr in allAttrArray:
@@ -58,15 +70,24 @@ def generateGraphML(inputFileName):
 srcLine = next(infile)
 src = srcLine.split('"source": ')[1][:-2]
 
-
 # Get Destination
 dstLine = next(infile)
 dst = dstLine.split('"target": ')[1][:-2]
 
 target.write(oneIndent +'<edge id='+ eId + ' source=' + src + ' target=' + dst + '>\n')
 
+#Check for 'type' (optional)
+typeLine = next(infile) # "type": "user",\n
+if typeLine.startswith(' "type":'):
+    typeComponents = typeLine.split(' ')
+    eType = typeComponents[-1][:-2]
+    target.write(twoIndent +'<data key="type">' + eType + '</data>\n')
+    attrLine = next(infile)
+else:
+    attrLine = typeLine
+
 # Get Attributes
-attrLine = next(infile) # {"foo": "bar", "label": "e89"},\n
+#attrLine = next(infile) # {"foo": "bar", "label": "e89"},\n
 allAttrs = attrLine.split('"attributes": {')[1][:-3] # "label": "v8", "label": "e89"
 allAttrArray = allAttrs.split(',')
 for attr in allAttrArray:
@@ -95,22 +116,22 @@ def generateGraphML(inputFileName):
 
     target.write('</graph>\n')
     target.write('</graphml>\n')
-    print "****Finish: Export " + inputFileName + " to Format = " + exportFormat + ". " +inputFileName + ".grpahml generated \n"
+    print("****Finish: Export " + inputFileName + " to Format = " + exportFormat + ". " +inputFileName + ".graphml generated \n")
 
 def main():
 
     global exportFormat
     exportFormat = "GraphML"
 
     if len(sys.argv) < 2:
-        print "Usage: python gExportGraphML.py <input json file> [exportFormat=GraphMl]"
+        print("Usage: python gExportGraphML.py <input json file> [exportFormat=GraphMl]")
 
     if len(sys.argv) > 1:
         inputFileName = sys.argv[1]
     if len(sys.argv) > 2:
         exportFormat = sys.argv[2]
 
-    print "****Start: Export " + inputFileName + " to Format = " + exportFormat + "\n"
+    print("****Start: Export " + inputFileName + " to Format = " + exportFormat + "\n")
 
     if exportFormat == "GraphML":
         generateGraphML(inputFileName)
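The exporter change in this file relies on a one-line lookahead: read the next line, treat it as the optional `type` line if it matches, and otherwise hand the same line on as the attributes line. A standalone sketch of that pattern follows. The helper `read_optional_type` is hypothetical, and its value parsing is a deliberate variation on the commit: it splits on the `"type":` key rather than on spaces, so a type containing spaces stays intact and the surrounding quotes are removed.

```python
def read_optional_type(lines):
    """Consume an optional '"type": ...' line from an iterator of lines.

    Returns (type_value_or_None, attributes_line). When no type line is
    present, the line that was read is returned as the attributes line,
    so nothing is skipped.
    """
    line = next(lines)
    if line.strip().startswith('"type":'):
        raw = line.split('"type":', 1)[1].strip()   # e.g. '"user",'
        value = raw.rstrip(',').strip('"')          # e.g. 'user'
        return value, next(lines)
    return None, line

typed = iter(['    "type": "power user",\n',
              '    "attributes": {"label": "v8"},\n'])
untyped = iter(['    "attributes": {"label": "v9"},\n'])

typed_result = read_optional_type(typed)
untyped_result = read_optional_type(untyped)
print(typed_result[0])    # power user
print(untyped_result[0])  # None
```

Matching on the stripped line avoids depending on the exact number of leading spaces that gsg.py writes before `"type":`.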

gsg.py

Lines changed: 13 additions & 3 deletions
@@ -39,6 +39,8 @@ def ParseVertices(jsonData):
 for jsonVertex in jsonData:
     vertex = Vertex()
     vertex.id = jsonVertex['id']
+    if 'type' in jsonVertex:
+        vertex.type = jsonVertex['type']
     if (jsonVertex['new'] == "true"):
         vertex.new = True
         vertex.attributes = jsonVertex['attributes']
@@ -54,6 +56,8 @@ def ParseEdges(jsonData, vertices):
 for jsonEdge in jsonData:
     edge = Edge()
     edge.id = jsonEdge['id']
+    if 'type' in jsonEdge:
+        edge.type = jsonEdge['type']
     edge.source = jsonEdge['source']
     if (GetVertexById(edge.source, vertices) == None):
         print("Error: Source vertex in edge " + edge.id + " not defined")
@@ -152,6 +156,7 @@ def AddPatternInstance(pattern, timeUnit):
 patternCreated = True
 for edge in pattern.edges:
     edgeInstance = EdgeInstance()
+    edgeInstance.type = edge.type
     edgeInstance.directed = edge.directed
     edgeInstance.attributes = edge.attributes
     edgeInstance.streamNum = edge.streamNum
@@ -234,6 +239,7 @@ def GetVertexInstanceId(vertex, edgeInstance, vertexInstancesDict):
 else:
     # Create new vertex instance
     vertexInstance = VertexInstance()
+    vertexInstance.type = vertex.type
     vertexInstance.attributes = vertex.attributes
     if (vertex.new):
         gNumVertices += 1
@@ -284,9 +290,9 @@ def GenerateStreams():
 global gStreamWrittenTo
 gNumVertices = 0
 gNumEdges = 0
-gStreamVertices = [[] for x in xrange(gParameters.numStreams)] # list of numStreams empty lists
-gStreamSchedules = [[] for x in xrange(gParameters.numStreams)] # list of numStreams empty lists
-gStreamWrittenTo = [False for x in xrange(gParameters.numStreams)]
+gStreamVertices = [[] for x in range(gParameters.numStreams)] # list of numStreams empty lists
+gStreamSchedules = [[] for x in range(gParameters.numStreams)] # list of numStreams empty lists
+gStreamWrittenTo = [False for x in range(gParameters.numStreams)]
 for timeUnit in range(0,gParameters.duration):
     for pattern in gPatterns:
         if (pattern.probability >= random.uniform(0,1)):
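The `xrange` to `range` change in this hunk keeps the same behavior in Python 3, where `range` is lazy in the way `xrange` was in Python 2. The list comprehension itself matters too: it builds one independent list per stream, which a shorter-looking `[[]] * n` would not. A small sketch of the difference, with `num_streams` standing in for `gParameters.numStreams` as an assumption:

```python
num_streams = 3  # stands in for gParameters.numStreams

# One distinct empty list per stream, as in the commit.
independent = [[] for _ in range(num_streams)]
independent[0].append("v1")

# Multiplying the outer list repeats a reference to ONE inner list.
aliased = [[]] * num_streams
aliased[0].append("v1")

print(independent)  # [['v1'], [], []]
print(aliased)      # [['v1'], ['v1'], ['v1']]
```

For the list of `False` flags the distinction does not arise, because booleans are immutable.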
@@ -335,6 +341,8 @@ def WriteVertexInstanceToStream(vertexInstance, streamNum):
 gStreamWrittenTo[streamNum-1] = True
 streamFile.write(' {"vertex": {\n')
 streamFile.write(' "id": "' + str(vertexInstance.id) + '",\n')
+if vertexInstance.type:
+    streamFile.write(' "type": "' + vertexInstance.type + '",\n')
 streamFile.write(' "attributes": ' + DictToJSONString(vertexInstance.attributes) + ',\n')
 streamFile.write(' "timestamp": "' + TimeStr(vertexInstance.streamCreationTimes[streamNum]) + '"}}')

@@ -347,6 +355,8 @@ def WriteEdgeInstanceToStream(edgeInstance, streamNum):
 streamFile.write(' "id": "' + str(edgeInstance.id) + '",\n')
 streamFile.write(' "source": "' + str(edgeInstance.source) + '",\n')
 streamFile.write(' "target": "' + str(edgeInstance.target) + '",\n')
+if edgeInstance.type:
+    streamFile.write(' "type": "' + edgeInstance.type + '",\n')
 streamFile.write(' "attributes": ' + DictToJSONString(edgeInstance.attributes) + ',\n')
 if edgeInstance.directed:
     streamFile.write(' "directed": "true",\n')
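The writers above emit the optional `type` field by string concatenation, which leaves a type value containing a double quote or backslash unescaped. As a hedged alternative sketch, not what the commit does, the same record can be built as a dictionary and serialized with the standard `json` module. The function `vertex_record` and its parameters are hypothetical.

```python
import json

def vertex_record(vertex_id, attributes, timestamp, vtype=None):
    """Build a vertex instance record, adding "type" only when it is set."""
    body = {"id": str(vertex_id)}
    if vtype:
        body["type"] = vtype  # skipped for None or an empty string
    body["attributes"] = attributes
    body["timestamp"] = timestamp
    # json.dumps escapes quotes and backslashes inside string values.
    return json.dumps({"vertex": body})

typed = json.loads(vertex_record(27, {"label": "v8"}, "5", vtype='say "hi"'))
untyped = json.loads(vertex_record(28, {"label": "v9"}, "5"))

print(list(typed["vertex"]))      # ['id', 'type', 'attributes', 'timestamp']
print("type" in untyped["vertex"])  # False
```

The key order matches the order in which the commit writes the fields, since Python dictionaries preserve insertion order.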

gsgClasses.py

Lines changed: 8 additions & 0 deletions
@@ -65,11 +65,14 @@ class Vertex:
     def __init__(self):
         self.id = ""
         self.new = True
+        self.type = None
         self.attributes = {} # dictionary
 
     def prettyprint(self, tab = ''):
         print(tab + 'Vertex:')
         print(tab + ' id = ' + self.id)
+        if self.type:
+            print(tab + ' type = ' + self.type)
         if (self.new):
             print(tab + ' new = true')
         else:
@@ -80,6 +83,7 @@ def prettyprint(self, tab = ''):
 class Edge:
     def __init__(self):
         self.id = ""
+        self.type = None
         self.source = ""
         self.target = ""
         self.directed = False
@@ -93,6 +97,8 @@ def prettyprint(self, tab = ''):
         print(tab + ' id = ' + self.id)
         print(tab + ' source = ' + self.source)
         print(tab + ' target = ' + self.target)
+        if self.type:
+            print(tab + ' type = ' + self.type)
         if (self.directed):
             print(tab + ' directed = true')
         else:
@@ -112,6 +118,7 @@ def __init__(self):
 class VertexInstance:
     def __init__(self):
         self.id = 0
+        self.type = None
         self.attributes = []
         self.streamCreationTimes = {} # dictionary with entries {streamNum: creationTime} in time units; one for each stream needing new vertex (empty means exists)

@@ -121,6 +128,7 @@ def __init__(self):
         self.source = 0
         self.target = 0
         self.directed = False
+        self.type = None
         self.attributes = []
         self.streamNum = 0
         self.creationTime = 0 # in time units
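Each class in this file now defaults `type` to `None` and guards its use with a truthiness check. The guard is needed because concatenating a string with `None` raises `TypeError`. It also means an empty-string type is treated the same as a missing one. A minimal sketch of that behavior, using a cut-down hypothetical class (`TypedItem` and `describe` are not names from gsgClasses.py):

```python
class TypedItem:
    """Cut-down stand-in for Vertex/Edge with an optional type."""

    def __init__(self, item_id, item_type=None):
        self.id = item_id
        self.type = item_type  # None means no type was given

    def describe(self):
        lines = ["id = " + self.id]
        if self.type:  # false for both None and ""
            lines.append("type = " + self.type)
        return lines

with_type = TypedItem("1", "user").describe()
without_type = TypedItem("2").describe()
empty_type = TypedItem("3", "").describe()

print(with_type)     # ['id = 1', 'type = user']
print(without_type)  # ['id = 2']
print(empty_type)    # ['id = 3']
```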
