Commit 7a5c386

committed
code for D3.7
1 parent 2291b26 commit 7a5c386

File tree

153 files changed

+31307
-1537
lines changed

.DS_Store

-2 KB
Binary file not shown.

README.md

+62-2
@@ -1,2 +1,62 @@
1-
# GraphAnalysisToolbox
2-
A collection of tools for graph synthesis, processing and analysis
1+
# Supergraph
2+
3+
<img src="https://github.com/Orieus/supergraph/blob/master/figures/supergraph.png" width="400">
4+
5+
**Supergraph** is a generic software tool for managing and processing an interrelated collection of multiple graphs.
6+
7+
It can be used to process multiple graphs. Functionality includes (but is not limited to):
8+
9+
1. **Similarity graphs**: generated from node attributes, based on different similarity measures (Jensen-Shannon, Hellinger, L1, L2).
10+
* General implementations based on the `neighbors` module from [scikit-learn](https://scikit-learn.org/stable/).
11+
* Specific implementation for fast computation of Hellinger distances using [Numba](https://numba.pydata.org/) and CUDA.
12+
2. **Community detection** algorithms (Louvain, Walktrap, FastGreedy, Label Propagation)
13+
* Implementations based on [igraph](https://igraph.org/python/) and [NetworkX](https://networkx.github.io/).
14+
3. **Bipartite** graphs from attributes
15+
4. **Transductive graphs**: Graphs generated by connecting target nodes from a bipartite graph. Link weights are computed from the links of a graph connecting the source nodes.
16+
5. **Transitive graphs**, computed as the composition of two bipartite graphs.
17+
6. **Analysis of graph partitions**.
18+
7. **Analysis of graph nodes** (centrality measures, PageRank).
19+
* Implementations based on [NetworkX](https://networkx.github.io/).
20+
8. **Editing tools** for the collection of graphs:
21+
* Create, add, remove graphs
22+
* Subsampling
23+
* Reduction to graphs of equivalence classes
24+
9. **Tools for visualization**:
25+
* Graph layout algorithms.
26+
* Export to GEXF format
27+
* Visualization of bipartite graphs (requires [Halo](https://vizuly.io/product/halo/), not included)
28+
29+
30+
## Usage:
31+
32+
### As an application:
33+
34+
The software includes two applications that can be used to generate and manipulate graphs through an interactive menu:
35+
36+
* `mainRDIgraphs.py`: Provides access to the software functionality through an interactive menu. It reads the links to the source data from a configuration file (`parameters.yaml`). You need to edit this file to use other data.
37+
* `mainRDIlab.py`: It uses the software functionality to carry out experiments for analysing RDI corpus collections.
38+
39+
Run
40+
41+
python mainRDIgraphs.py --h
42+
python mainRDIlab.py --h
43+
44+
to see the available options.
45+
46+
### As a software package:
47+
48+
The software includes several class packages that can be used independently. Classes include (but are not limited to):
49+
50+
* `SimGraph`: Generation of similarity graphs
51+
* `CommunityPlus`: Wrapper to community detection algorithms
52+
* `DataGraph` (requires `SimGraph` and `CommunityPlus`): provides tools for graph processing and analysis.
53+
* `SuperGraph` (requires `DataGraph`): provides tools for handling collections of DataGraph objects, including tools for the generation of new datagraphs.
54+
55+
### Additional information
56+
57+
You can find more detailed information about this software in the [Wiki](https://github.com/Orieus/supergraph/wiki).
58+
59+
This project was initially conceived for the processing of multiple corpora of scientific publications, patents and project proposals, inside the project "**Service for Identifying Impact and R&D&I Agent Collaboration Networks**" (*Servicio para Identificar Impacto y Redes de Colaboración de Agentes I+D+i*), funded by the **Secretary of State for the Digital Agenda** (SEAD, Secretaría de Estado para la Agenda Digital), under the umbrella of the Spanish Plan for the Stimulus of Language Technologies (PTL, [*Plan de Impulso de las Tecnologías del Lenguaje*](https://www.plantl.gob.es/Paginas/index.aspx)).
60+
61+
62+
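Note on item 5 of the README ("Transitive graphs, computed as the composition of two bipartite graphs"): the snippet below is a minimal sketch of one way such a composition can be computed, not the package's own code. It assumes that the weight of an A->C link is the sum, over shared B nodes, of the products of the A->B and B->C weights (a product of biadjacency matrices). The function name `compose_bipartite` and the toy data are hypothetical.

```python
from collections import defaultdict


def compose_bipartite(ab, bc):
    """Compose A->B and B->C weighted edges into A->C weighted edges.

    Assumption (not taken from the package): the weight of (a, c) is
    the sum over shared b of w(a, b) * w(b, c).
    """
    # Index the second graph by its source node b for fast lookup.
    bc_by_b = defaultdict(list)
    for (b, c), w_bc in bc.items():
        bc_by_b[b].append((c, w_bc))

    ac = defaultdict(float)
    for (a, b), w_ab in ab.items():
        for c, w_bc in bc_by_b.get(b, []):
            ac[(a, c)] += w_ab * w_bc
    return dict(ac)


# Hypothetical toy data: authors -> papers -> topics.
ab = {("a1", "p1"): 1.0, ("a1", "p2"): 1.0, ("a2", "p2"): 1.0}
bc = {("p1", "t1"): 0.5, ("p2", "t1"): 0.5, ("p2", "t2"): 1.0}
ac = compose_bipartite(ab, bc)
```

With edges stored as dictionaries keyed by node pairs, the composition only visits B nodes that actually appear in both graphs, which keeps the sketch usable for sparse graphs.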

classes.dot

+60
Large diffs are not rendered by default.

classes.png

4.4 MB

classes_tm2.png

File renamed without changes.

options_menu.yaml renamed to config/options_menu.yaml

+36-17
@@ -64,9 +64,6 @@ load:
6464
setup:
6565
title: Activate configuration file
6666

67-
readData:
68-
title: Read dataset
69-
7067
import_data:
7168
title: Import data
7269
options:
@@ -109,12 +106,14 @@ graph_tools:
109106
- largest_community_subgraph
110107
- remove_isolated_nodes
111108
- remove_snode_attributes
109+
- disambiguate_node
112110

113111
gInference:
114112
title: Graph inference tools
115113
options:
116114
- equivalence_graph
117115
- infer_sim_graph
116+
- import_and_infer_sim_graph
118117
- infer_eq_simgraph
119118
- infer_sim_bigraph
120119
- infer_ppr_graph
@@ -193,12 +192,12 @@ display_graphs:
193192
# Options for import_data
194193

195194
import_snode_from_table:
196-
title: Import nodes and edges from table files
195+
title: Import nodes and features from table files
197196
options:
198197
- get_method: get_names_of_dataset_tables
199198

200199
import_nodes_and_model:
201-
title: Import nodes and feature_matrix into a zero-edge graph
200+
title: Import nodes from table files and features from npz files
202201
options:
203202
- path: topicmodels
204203

@@ -245,10 +244,7 @@ import_agents:
245244
showSDBdata:
246245
title: Show SQL data sources
247246
options:
248-
- parameters:
249-
Pu: Publications
250-
Pr: Projects
251-
Pa: Patents
247+
- get_method: get_names_of_SQL_dbs
252248

253249
manage_Neo4J:
254250
title: Manage Neo4J database
@@ -304,6 +300,9 @@ remove_snode_attributes:
304300
- path: graphs
305301
- get_method: get_attributes
306302

303+
disambiguate_node:
304+
title: Disambiguate node
305+
307306
# ######################
308307
# Options for gInference
309308

@@ -313,7 +312,24 @@ equivalence_graph:
313312
- path: topicmodels
314313

315314
infer_sim_graph:
316-
title: 'Similarity graph: from A_X to A-A'
315+
title: 'Similarity graph: from A_X to A-A'
316+
options:
317+
- path: graphs
318+
- parameters:
319+
He: "He: 1 minus squared Hellinger's distance (JS) (sklearn-based)"
320+
He2: 'He2: self implementation of He (faster)'
321+
BC: 'BC: Bhattacharyya coefficient'
322+
l1: 'l1: 1 minus l1 distance'
323+
JS: 'JS: Jensen-Shannon similarity (too slow)'
324+
Gauss: 'Gauss: An exponential function of the squared l2 distance'
325+
He->JS: 'He->JS: JS through He and a theoretical bound'
326+
He2->JS: 'He2->JS: Same as He->JS, but using implementation He2'
327+
l1->JS: 'l1->JS: JS through l1 and a theoretical bound'
328+
cosine: 'cosine: Cosine similarity'
329+
ncosine: 'ncosine: Normalized cosine similarity (rescaled to [0, 1])'
330+
331+
import_and_infer_sim_graph:
332+
title: 'Import and infer Similarity graph: from A_X to A-A'
317333
options:
318334
- path: topicmodels
319335
- parameters:
@@ -327,7 +343,7 @@ infer_sim_graph:
327343
l1->JS: 'l1->JS: JS through l1 and a theoretical bound'
328344

329345
infer_eq_simgraph:
330-
title: 'Equivalent Similarity graph: from A_X to eqA-eqA'
346+
title: 'Equivalent Similarity graph: from A_X to eqA-eqA'
331347
options:
332348
- path: topicmodels
333349
- parameters:
@@ -341,7 +357,7 @@ infer_eq_simgraph:
341357
l1->JS: 'l1->JS: JS through l1 and a theoretical bound'
342358

343359
infer_sim_bigraph:
344-
title: 'Similarity bipartite graph: from A_X, B_X to A-B'
360+
title: 'Similarity bipartite graph: from A_X, B_X to A-B'
345361
options:
346362
- get_method: get_graphs_with_features
347363
- get_method: get_graphs_with_features
@@ -355,13 +371,13 @@ infer_ppr_graph:
355371
- path: graphs
356372

357373
inferBGfromA:
358-
title: 'Bipartite graph from attributes: from A_B to A->B'
374+
title: 'Bipartite graph from attributes: from A_B to A->B'
359375
options:
360376
- path: graphs
361377
- get_method: get_attributes
362378

363379
transduce:
364-
title: 'Transductive graph: from A-A->B to B-B'
380+
title: 'Transductive graph: from A-A->B to B-B'
365381
options:
366382
- path: bigraphs
367383
- parameters:
@@ -372,14 +388,14 @@ transduce:
372388
# title: 'Similarity bipartite Graph: from A_X, B_Y to A-B'
373389

374390
inferTransit:
375-
title: 'Transitive graph: from A->B->C to A->C'
391+
title: 'Transitive graph: from A->B->C to A->C'
376392
options:
377393
- path: bigraphs
378394
- path: bigraphs
379395

380396

381-
# ###########################
382-
# Optionas for display_graphs
397+
# ##########################
398+
# Options for display_graphs
383399

384400
graph_layout:
385401
title: Graph layout
@@ -401,6 +417,9 @@ show_top_nodes:
401417
- path: graphs
402418
- get_method: get_local_features
403419

420+
profile_node:
421+
title: Show profile of a given node
422+
404423
# #############################################################################
405424
# LEVEL 3
406425
# #############################################################################
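Note on the `infer_sim_graph` parameters added in this file: the `He` option is described as "1 minus squared Hellinger's distance" and `BC` as the Bhattacharyya coefficient. The sketch below shows how these two quantities can be computed for a pair of feature vectors. It assumes the vectors are probability distributions (non-negative, summing to one) and uses the normalization H^2 = 0.5 * sum((sqrt(p_i) - sqrt(q_i))^2); the package may use a different definition. The function names are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient: sum of sqrt(p_i * q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger_similarity(p, q):
    """1 minus the squared Hellinger distance.

    Assumed normalization: H^2 = 0.5 * sum((sqrt(p_i) - sqrt(q_i))^2),
    which lies in [0, 1] for probability vectors.
    """
    h2 = 0.5 * sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return 1.0 - h2


# Hypothetical topic distributions for two nodes.
p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]
he = hellinger_similarity(p, q)
bc = bhattacharyya_coefficient(p, q)
```

Under this normalization the two values coincide whenever both vectors sum to one, since H^2 = 1 - BC in that case. Whether the `He` and `BC` menu options differ in definition or only in implementation is not stated in the diff.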

parameters.default.yaml renamed to config/parameters.default.yaml

+29-18
@@ -34,29 +34,39 @@ validate_all_models:
3434
# SQL and Graph DataBases
3535
connections:
3636
SQL:
37-
# Select one and only one db for Pr, Pu and Pa.
37+
# Select the databases to be used in the project.
3838
db_selection:
39-
Pr: db_Pr_FECYT
40-
# Pa: db_Pa_PATSTATS
41-
# Pu: db_Pu_S24Ever # publicacionesScopus
42-
# Co: db_Crunch4Ever
39+
# Each selection must have the form:
40+
# label: db_name
41+
# where label is just a mnemonic used to identify the database, and
42+
# db_name is the name of the database below. For instance, you can
43+
# select a different database for projects, patents, publications and
44+
# companies as
45+
# Pr: db_name01
46+
# Pa: db_name02
47+
# Pu: db_name03
48+
# Co: db_name04
49+
# where db_name01, db_name02, must be the
4350
databases:
44-
# name of the DB as specified when opening the connection
51+
# Here, you can include a complete list of available databases.
52+
# Only those included in db_selection (above) will be connected.
53+
# The key of each DB is the name of the DB as specified when opening
54+
# the connection. For instance:
55+
# db_name01:
56+
# category: Pr # Type of database
57+
# connector: &sql_con mysql # Use & to allow dereferencing
58+
# server: &sql_server hal01.tsc.uc3m.es # Write your server address here
59+
# user: &sql_user username # Write username here
60+
# password: &sql_password xxxxxxxx # Write password here
4561
db_Pr_FECYT:
46-
# the "&"s allow referring to ("dereferencing") the corresponding field below (using "*")
62+
# the "&"s allow referring to ("dereferencing") the corresponding field
63+
# below (using "*")
4764
category: Pr
4865
connector: &sql_con mysql
4966
server: &sql_server hal01.tsc.uc3m.es # Write your server address here
5067
user: &sql_user username # Write username here
5168
password: &sql_password xxxxxxxx # Write password here
5269
# ----
53-
db_Crunch4Ever:
54-
category: Pu
55-
connector: *sql_con
56-
server: localhost
57-
user: *sql_user
58-
password: *sql_password
59-
# ----
6070
db_Pa_PATSTATS:
6171
category: Pa
6272
connector: *sql_con
@@ -95,10 +105,11 @@ connections:
95105
password: *sql_password
96106
port: None,
97107
unix_socket: '/var/run/mysqld/mysqld.sock'
98-
neo4j:
99-
server: xxxxxxx # Write server here
100-
user: neo4j # Write username here
101-
password: xxxxxx # Write password here
108+
# Uncomment and set neo4j parameters if available
109+
# neo4j:
110+
# server: xxxxxxx # Write server here
111+
# user: neo4j # Write username here
112+
# password: xxxxxx # Write password here
102113

103114
# Specify format for the log outputs
104115
logformat:
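Note on the new `db_selection` comments: they state that each selection has the form `label: db_name` and that only the databases listed in `db_selection` will be connected. The sketch below illustrates that lookup on an already-parsed configuration. It is an illustration under assumptions, not the package's code: the function name `selected_databases` is hypothetical, and the dictionary is a trimmed stand-in for the `connections: SQL:` section.

```python
def selected_databases(sql_config):
    """Return {label: settings} for the databases named in db_selection.

    Raises KeyError if a selected name has no entry under 'databases'.
    """
    selection = sql_config.get("db_selection") or {}
    databases = sql_config.get("databases") or {}

    missing = [name for name in selection.values() if name not in databases]
    if missing:
        raise KeyError(f"db_selection refers to unknown databases: {missing}")

    return {label: databases[name] for label, name in selection.items()}


# Trimmed, hypothetical stand-in for the parsed 'connections: SQL:' section.
sql_config = {
    "db_selection": {"Pr": "db_Pr_FECYT"},
    "databases": {
        "db_Pr_FECYT": {"category": "Pr", "connector": "mysql"},
        "db_Pa_PATSTATS": {"category": "Pa", "connector": "mysql"},
    },
}
active = selected_databases(sql_config)
```

Here `db_Pa_PATSTATS` stays in the list of available databases but is not returned, because no label in `db_selection` points to it.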

config/val_menu.yaml

+73
@@ -0,0 +1,73 @@
1+
# This file contains the complete list of options in the main script.
2+
# It must contain at least a root menu with several options, and a description
3+
# of each option in the root menu.
4+
# Each option in any menu should have a description
5+
6+
# ROOT MENU
7+
root:
8+
options:
9+
# - create <Some true options may be missed because the appropriate one
10+
# - load may have been selected by the starting command>
11+
- setup
12+
- show_SuperGraph
13+
- compute_reference_graph
14+
- subsample_reference_graph
15+
- compute_all_sim_graphs
16+
- validate_topic_models
17+
- show_validation_results
18+
- analyze_variability
19+
- show_variability_results
20+
- analyze_scalability
21+
- show_scalability_results
22+
- validate_subtrain_models
23+
- show_subtrain_results
24+
25+
# ##########################
26+
# OPTIONS FROM THE ROOT MENU
27+
create:
28+
title: Create new project
29+
post_opts:
30+
- setup
31+
32+
load:
33+
title: Load existing project
34+
35+
setup:
36+
title: Activate configuration file
37+
38+
show_SuperGraph:
39+
title: Show supergraph structure
40+
41+
compute_reference_graph:
42+
title: Compute reference graph
43+
44+
subsample_reference_graph:
45+
title: Get reference graph by subsampling a large version
46+
47+
compute_all_sim_graphs:
48+
title: Compute all similarity graphs for validation
49+
50+
validate_topic_models:
51+
title: Validate topic models using the reference graph
52+
53+
show_validation_results:
54+
title: Generate graphical validation results
55+
56+
analyze_variability:
57+
title: Validate topic models using semantic variability
58+
59+
show_variability_results:
60+
title: Generate graphical result from the variability analysis
61+
62+
analyze_scalability:
63+
title: Analyze the scalability of graph generation for validation
64+
65+
show_scalability_results:
66+
title: Generate graphical results of the scalability analysis
67+
68+
validate_subtrain_models:
69+
title: Validate subtrained models
70+
71+
show_subtrain_results:
72+
title: Generate graphical results of the subtrained model validation
73+
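Note on the header comment of `config/val_menu.yaml`: it requires a root menu plus a description of each option in that menu. A quick consistency check along those lines could look like the sketch below. It assumes the file has already been parsed into a dictionary and that a "description" means a top-level entry with a `title` key, as in the entries of this file. The function name `undescribed_options` and the option `hypothetical_option` are illustrative only.

```python
def undescribed_options(menu):
    """List root options that have no top-level entry with a 'title'."""
    root_options = menu.get("root", {}).get("options", [])
    return [
        opt
        for opt in root_options
        if not isinstance(menu.get(opt), dict) or "title" not in menu[opt]
    ]


# Trimmed stand-in for the parsed menu; 'hypothetical_option' is made up
# to show what a missing description looks like.
menu = {
    "root": {"options": ["setup", "show_SuperGraph", "hypothetical_option"]},
    "setup": {"title": "Activate configuration file"},
    "show_SuperGraph": {"title": "Show supergraph structure"},
}
missing = undescribed_options(menu)
```

An empty result means every option listed under `root` has a titled entry.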
