- Extract Instances from Ontology:
  - Extract the instances from the England and Wales ontologies and save them, with their class type, in N-Triples files.
  - Input: England.owl and Wales.owl
  - Output: England_triples.nt and Wales_triples.nt

  Example of output:
- Second Data Source (DataSource_1_YAGO):
  - Use OS_matches.nt from DataSource_1_YAGO.
  - Example of the data:
- Extract YAGO with Geometry Data:
  - Extract the YAGO entities together with their geometry data from extend.OS. This is needed for the geometry distance filter. The output contains the URI of each entity and its polygon data.
  - Input: OS_extended.ttl
  - Output: OS_extended_geometry.ttl
This preprocessing step extracts the actual URIs of the entities from the GeoModel, together with their geometry points, and saves them in one file. It involves the following steps:
- Step 1: Extract the OS_id and URI of all instances and save them into an N-Triples file (for Wales and England).
  - Input: Wales.owl, England.owl
  - Output: instance_geomodel_Wales_OS.nt, instance_geomodel_England_OS.nt

  Example of output:
- Step 2: Match the URIs in the geometry folders with the URIs in the .nt files that have an OS_id.
  - For example: instance_geomodel_England_OS.nt
  - The output file should contain the geom URI and the geometry.

  Note: We have geometry folders named GeometryEngland and GeometryWales that contain files with the OS URI and the geometry data (these were extracted in a previous task, see GeomtryExtraction_readme.txt).
  - Input 1: Geometry data files such as GeometryEngland/local_england_ced_geometry.json
  - Input 2: The files extracted in step 1 for England and Wales, such as instance_geomodel_England_OS.nt
  - Output: A file containing the geom URI with the geometry points, such as GeometryEngland/GeoModel_england_ced_geometry.json
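The matching in Step 2 amounts to re-keying the geometry data from OS URIs to GeoModel URIs. The sketch below shows that idea with the standard library only. The layout of both inputs is an assumption: the N-Triples lines are taken to link a GeoModel URI to an OS URI, and the geometry file is taken to map an OS URI to a list of points. All URIs, the predicate, and the helper names are hypothetical.

```python
import json
import re

# Hypothetical step-1 output: <GeoModel URI> <predicate> <OS URI> .
NT_LINES = """
<http://example.org/geomodel/E1> <http://example.org/hasOSId> <http://example.org/os/7000001> .
<http://example.org/geomodel/E2> <http://example.org/hasOSId> <http://example.org/os/7000002> .
"""

# Hypothetical geometry file content: OS URI -> list of [easting, northing] points.
OS_GEOMETRY = {
    "http://example.org/os/7000001": [[530000.0, 180000.0], [530100.0, 180050.0]],
    "http://example.org/os/7000003": [[400000.0, 300000.0]],
}

# Matches a triple whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def os_to_geomodel(nt_text):
    """Build a lookup from OS URI to GeoModel URI."""
    mapping = {}
    for line in nt_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            geomodel_uri, _predicate, os_uri = match.groups()
            mapping[os_uri] = geomodel_uri
    return mapping


def rekey_geometry(os_geometry, mapping):
    """Keep geometries whose OS URI is known and key them by GeoModel URI."""
    return {
        mapping[os_uri]: points
        for os_uri, points in os_geometry.items()
        if os_uri in mapping
    }


geomodel_geometry = rekey_geometry(OS_GEOMETRY, os_to_geomodel(NT_LINES))
print(json.dumps(geomodel_geometry, indent=2))
```

Entries without a counterpart on the other side (here `E2` and OS id `7000003`) are simply dropped, which is one possible policy; the original script may handle them differently.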
In the Similarity folder, run main_task.py, which performs the following steps:
- Install Dependencies:
  - Install rdflib and jellyfish to perform label similarity on the RDF data.
- Jaro-Winkler Similarity:
  - Calculate the similarity score from the number of matching characters and the number of transpositions, with extra weight given to a common prefix up to a certain length.
  - Threshold: label_similarity_threshold = 0.55
  - A) England
    - Input: DataSource_1_YAGO.nt, England_triples.nt
    - Output: matches_England_jaro.csv
    - Number of matched entities for England: 7823615
  - B) Wales
    - Input: DataSource_1_YAGO.nt, Wales_triples.nt
    - Output: matches_Wales_jaro.csv
    - Number of matched entities for Wales: 73013307
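To make the score concrete, here is a small pure-Python sketch of Jaro-Winkler similarity with the 0.55 threshold from the text. The project itself relies on jellyfish (in recent releases the corresponding call is, to the best of our knowledge, `jellyfish.jaro_winkler_similarity`); this version only illustrates the computation, and library implementations may differ in details. The prefix weight of 0.1, the maximum prefix of 4, the boost threshold of 0.7, and the `>=` comparison are assumptions following Winkler's usual formulation.

```python
LABEL_SIMILARITY_THRESHOLD = 0.55  # value taken from the text


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Half the number of matched characters that appear in a different order.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1,
                 max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro score boosted by the length of the common prefix."""
    score = jaro(s1, s2)
    if score <= boost_threshold:  # assumption: no boost for dissimilar strings
        return score
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


def is_label_match(label_a: str, label_b: str) -> bool:
    return jaro_winkler(label_a, label_b) >= LABEL_SIMILARITY_THRESHOLD


for pair in [("Cardiff", "Cardif"), ("Cardiff", "Swansea")]:
    print(pair, round(jaro_winkler(*pair), 4), is_label_match(*pair))
```

A threshold as low as 0.55 accepts fairly loose label matches, which is consistent with the later geometry filter being used to narrow the candidates down.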
- Load Geometry Data for England and Wales:
  - Load the geometry data into a dictionary.
- Coordinate Transformation:
  - Reproject the geometry data so that both sources use the same coordinate reference system.
  - The OS geometry points are in a projected coordinate system (EPSG:27700, British National Grid), while the YAGO geometry is in EPSG:4326. Convert the OS geometry to the geographic coordinate system (WGS 84) for the geometry distance filter.
  - Input: All the geometry files for England and Wales
  - Output: england_polygons_wgs84.json, wales_polygons_wgs84.json
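A reprojection from EPSG:27700 to EPSG:4326 could be sketched as follows. The use of pyproj is an assumption (the text does not name the library), as are the `reproject_polygon` helper and the sample polygon, which is a made-up shape near central London.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order: (easting, northing) in,
# (longitude, latitude) out.
to_wgs84 = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)


def reproject_polygon(points):
    """Convert [[easting, northing], ...] into [[lon, lat], ...]."""
    return [list(to_wgs84.transform(easting, northing))
            for easting, northing in points]


# Hypothetical polygon in British National Grid coordinates (metres).
polygon_bng = [[530000.0, 180000.0], [530500.0, 180000.0], [530500.0, 180500.0]]
polygon_wgs84 = reproject_polygon(polygon_bng)
print(polygon_wgs84)
```

Fixing the axis order explicitly matters here because EPSG:4326 is formally latitude-first, and mixing up the two orders would silently distort every distance computed afterwards.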
- Load YAGO Geometry Data:
  - Input: OS_extended_geometry.ttl (from step 3 of the preprocessing (A) above)
- Apply Geometry Distance Filter:
  - Use the Euclidean distance between (YAGO and England) and (YAGO and Wales) with a threshold of 0.2.
  - Output: matches_england_geometry.csv, matches_wales_geometry.csv
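The distance filter can be sketched with the standard library as below. How polygons are reduced to a single point is not stated in the text, so the mean of the vertices is used here as a simple stand-in for a centroid; that choice, the `<=` comparison, the helper names, and the sample URIs and coordinates are all assumptions.

```python
import math

GEOMETRY_DISTANCE_THRESHOLD = 0.2  # from the text; units follow the coordinates


def representative_point(points):
    """Mean of the vertices: a simple stand-in for a polygon centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def filter_by_distance(yago_geoms, os_geoms,
                       threshold=GEOMETRY_DISTANCE_THRESHOLD):
    """Return (yago_uri, os_uri, distance) for pairs within the threshold."""
    matches = []
    for yago_uri, yago_points in yago_geoms.items():
        yago_point = representative_point(yago_points)
        for os_uri, os_points in os_geoms.items():
            distance = math.dist(yago_point, representative_point(os_points))
            if distance <= threshold:
                matches.append((yago_uri, os_uri, distance))
    return matches


# Hypothetical geometries as [lon, lat] pairs in WGS 84.
yago = {"http://example.org/yago/London": [[-0.12, 51.50]]}
england = {
    "http://example.org/geomodel/E1": [[-0.13, 51.50], [-0.11, 51.52]],
    "http://example.org/geomodel/E2": [[-3.18, 51.48]],
}

matches = filter_by_distance(yago, england)
for yago_uri, os_uri, distance in matches:
    print(yago_uri, os_uri, round(distance, 4))
```

This compares every YAGO entity with every OS entity; in practice the filter would more likely be applied only to the candidate pairs that survived the label similarity step.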
- Filtered Results:
  - Perform an inner merge on both the 'Yago' and 'England' columns (for England) and on both the 'Yago' and 'Wales' columns (for Wales).
  - Result: merged_df_wales_matches.json, merged_df_england_matches.json
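A minimal pandas sketch of such an inner merge is shown below for England. It assumes the two tables being merged are the label similarity matches and the geometry matches, which the text implies but does not state; the 'Yago' and 'England' column names come from the text, while the score columns and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical label similarity matches (e.g. from matches_England_jaro.csv).
label_matches = pd.DataFrame({
    "Yago": ["y:1", "y:2", "y:3"],
    "England": ["e:1", "e:2", "e:9"],
    "label_score": [0.93, 0.61, 0.58],
})

# Hypothetical geometry matches (e.g. from matches_england_geometry.csv).
geometry_matches = pd.DataFrame({
    "Yago": ["y:1", "y:3", "y:4"],
    "England": ["e:1", "e:3", "e:4"],
    "distance": [0.01, 0.15, 0.05],
})

# Keep only pairs present in both tables, i.e. pairs that pass both filters.
merged = pd.merge(label_matches, geometry_matches,
                  on=["Yago", "England"], how="inner")
print(merged.to_json(orient="records"))
```

Merging on both columns means a pair survives only if the same YAGO entity is linked to the same England entity in both tables; merging on 'Yago' alone would also keep pairs where the two filters disagree about the partner.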
Some of the files needed to run the code are large and are available on request: muhajabh@cardiff.ac.uk, abdelmotyai@cardiff.ac.uk
Label similarity output, Geometry Filter, Task_Geometry data and output


