While this tool was originally built to use data pipelines to project participants into "perspective space", I've previously created a "geographic projection" of participants:
https://www.youtube.com/watch?v=IXo-d0tkePE&list=PLMgSnvCsIgoFrVNXlpbEgSDtaJ7q_fx0l&index=11
Prior manual process
I'd like to think through how to take the earlier manual process and make it much faster and easier for any polis convo that has statements about region. Here's the context:
You can find all the links in the "San Juan Islands" row here: https://docs.google.com/spreadsheets/d/1u8AKvHyzYiyq6_Zcwb_g0S8eVxnCrBYsxksfCrA-bP8/edit?gid=0#gid=0
Here's the high-level manual process I followed, which was quite specific to this dataset:
- coded each participant as one specific island or "non-islander"
- when there were multiple options (e.g., people responding "agree" to multiple statements about where they live), I prioritized smaller island identities first
- I extracted the island boundaries from OpenStreetMap using Overpass Turbo: https://overpass-turbo.eu/s/2dw0
- this query gets the boundaries of the islands in question, which we export to a large GeoJSON file:
[out:json][timeout:25];
{{geocodeArea:San Juan County, Washington}}->.searchArea;
(
relation["place"="island"]["name"~"San Juan|Orcas|Lopez|Shaw"](area.searchArea);
way["place"="island"]["name"~"San Juan|Orcas|Lopez|Shaw"](area.searchArea);
);
out geom;
- I used https://mapshaper.org/ to simplify the shapes and reduce the file size (still GeoJSON), then saved the result in the repo: https://github.com/patcon/kedro-polislike-pipelines-san-juan-islands/blob/main/data/r7bhuide6netnbr8fxbyh/01_raw/san-juan-islands.minimal.geojson
- we define a region for "non-islanders": just a circle to the right of the islands in the GeoJSON, representing "mainland".
- on each run, once we have a best-guess island for each participant and have labelled everyone else as "Other", we place each participant at a randomly chosen point within the boundary of their island (or of the "Other" region).
- we render this as a scatter plot, just like we do the other plots
https://github.com/patcon/kedro-polislike-pipelines-san-juan-islands/blob/27d066a1b0395fbd96213357620a1c8d6108f79b/src/kedro_polis_classic/pipelines/geographic/nodes.py#L21-L26
Generally, here's the special pipeline that generates this: https://github.com/patcon/kedro-polislike-pipelines-san-juan-islands/tree/main/src/kedro_polis_classic/pipelines/geographic
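The "random point within the boundary" step can be sketched in plain Python as rejection sampling: draw points uniformly from the polygon's bounding box and keep the first one that passes a ray-casting point-in-polygon test. This is an illustrative sketch, not a copy of the linked nodes.py. The function names are hypothetical, and it assumes a single outer ring of (lon, lat) pairs, ignoring holes and multi-polygons.

```python
import random


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (lon, lat) pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def random_point_in_ring(ring, rng, max_tries=10_000):
    """Rejection-sample one point inside the ring, using its bounding box."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    for _ in range(max_tries):
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_ring(x, y, ring):
            return x, y
    raise RuntimeError("no interior point found; is the ring degenerate?")


# Toy triangle standing in for an island outline (not real coordinates).
toy_ring = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (0.0, 0.0)]
print(random_point_in_ring(toy_ring, random.Random(0)))
```

Passing in a seeded `random.Random` keeps the placement reproducible between runs, which may or may not be what the existing pipeline does.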
Proposed automated process
- fetch all statements from conversation
- fetch conversation metadata (language, topic, description, owner name)
- attempt to infer the overarching geographic area from language and overall statement content
- re-examine overall statements and extract a list of candidate regions mentioned
  - easy: countries, states, cities, neighborhoods
  - hard: regions defined by language spoken, urban vs rural parts of a region, ...
- determine if there is a hierarchy of regions (if any fall within others)
  - first pass, use LLM. e.g., LLM knows that Brockton Village is within Toronto is within Canada.
- write Overpass Turbo API queries to fetch boundaries of each region
  - execute queries for unknown boundaries
  - do a second pass as required, to get the hierarchy of regions
- fetch GeoJSON for all boundaries
- downscale boundary polygons (still GeoJSON)
- now that we have all the regions and their hierarchy, assign each participant/row to a boundary
  - if fully overlapping (nested), the smallest known nested boundary is preferred
  - if partially overlapping, investigate why #todo
  - if non-overlapping, investigate why, but... [indexing on plurality: more voice to core or under-represented identities]
  - assume home > work
  - assume smaller area > larger area
  - ... #todo
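The "write overpass turbo API queries" step above could start from a small template that reproduces the shape of the San Juan query for any list of region names. This is a rough sketch: `build_boundary_query` and its parameters are hypothetical, and the output keeps the `{{geocodeArea:...}}` shortcut, which Overpass Turbo expands, so the text is meant for pasting into Overpass Turbo and not for sending to a raw Overpass endpoint.

```python
def _reject_chars(text: str, forbidden: str) -> str:
    """Return text unchanged, or raise if it is empty or has risky characters."""
    if not text.strip() or set(text) & set(forbidden):
        raise ValueError(f"unsafe or empty value: {text!r}")
    return text


def build_boundary_query(area: str, names: list[str],
                         place: str = "island", timeout: int = 25) -> str:
    """Build an Overpass Turbo query shaped like the San Juan Islands one."""
    if not names:
        raise ValueError("need at least one region name")
    _reject_chars(area, '"\\{}')
    _reject_chars(place, '"\\')
    # Names are joined into a regex alternation, so regex metacharacters
    # are rejected outright instead of escaped (keeps the sketch simple).
    pattern = "|".join(_reject_chars(n, '"\\|()[]{}.*+?^$') for n in names)
    selector = f'["place"="{place}"]["name"~"{pattern}"](area.searchArea);'
    return "\n".join([
        f"[out:json][timeout:{timeout}];",
        "{{geocodeArea:" + area + "}}->.searchArea;",
        "(",
        "relation" + selector,
        "way" + selector,
        ");",
        "out geom;",
    ])


print(build_boundary_query("San Juan County, Washington",
                           ["San Juan", "Orcas", "Lopez", "Shaw"]))
```

The `place` filter is a parameter because regions other than islands (cities, neighborhoods, countries) will likely need a different tag filter, which would have to be checked per region type.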
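The nested-boundary rule above ("smallest nested boundary known is preferred") can be sketched once the hierarchy is available as a child-to-parent mapping. This is only an illustration under assumptions: `assign_region`, the `parent_of` mapping, and the `needs_review` flag are hypothetical names, and claims that do not sit on a single nesting chain are only flagged, since how to resolve them is still a #todo.

```python
def ancestors(region, parent_of):
    """List the regions that contain `region`, nearest first."""
    chain = []
    while parent_of.get(region) is not None:
        region = parent_of[region]
        if region in chain:
            raise ValueError(f"cycle in hierarchy at {region!r}")
        chain.append(region)
    return chain


def assign_region(claimed, parent_of, fallback="Other"):
    """Pick the most deeply nested claimed region.

    Returns (region, needs_review). needs_review is True when some claimed
    region is not the chosen one or one of its ancestors, i.e. the claims
    are not purely nested.
    """
    if not claimed:
        return fallback, False
    # Sort first so ties between equally deep regions resolve deterministically.
    best = max(sorted(claimed), key=lambda r: len(ancestors(r, parent_of)))
    lineage = {best, *ancestors(best, parent_of)}
    needs_review = not set(claimed) <= lineage
    return best, needs_review


# Hierarchy from the LLM example above.
parent_of = {"Brockton Village": "Toronto", "Toronto": "Canada", "Canada": None}
print(assign_region({"Canada", "Toronto", "Brockton Village"}, parent_of))
```

Depth in the hierarchy stands in for "smaller area" here. Comparing actual polygon areas would be an alternative once the boundaries are fetched.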