I use this in conjunction with geo-precincts to build a uniformly coded WKT AZ precinct file for mapping in Looker Studio.
This project standardizes Arizona voting precinct identifiers into a uniform 6‑character pctnum code:
<COUNTY_PREFIX><ZERO-PADDED 4-DIGIT PRECINCT NUMBER>
e.g., PM0025 (Pima County, precinct 25)
It helps with mapping and joining disparate program datasets where precinct fields differ (numeric vs. text, decimals in CSV exports, embedded numbers in names, etc.).
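As a minimal sketch of the format itself (not part of the original scripts), the helper below joins a prefix and a precinct number and checks the result against the 2-letter + 4-digit shape. The names build_pctnum, is_valid_pctnum, and PCTNUM_PATTERN are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern: two uppercase letters followed by exactly four digits.
PCTNUM_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")

def build_pctnum(prefix: str, number: int) -> str:
    """Join a two-letter county prefix and a zero-padded precinct number."""
    return f"{prefix}{number:04d}"

def is_valid_pctnum(code: str) -> bool:
    """Return True when the code has the 2-letter + 4-digit shape."""
    return PCTNUM_PATTERN.fullmatch(code) is not None

print(build_pctnum("PM", 25))     # PM0025
print(is_valid_pctnum("PM0025"))  # True
print(is_valid_pctnum("ERROR"))   # False
```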
- Loads a CSV of precinct‑level records (typically exported from BigQuery).
- Maps county names → two‑letter prefixes (see table below).
- Extracts the numeric precinct code, handling common formats:
  - Plain integers (25)
  - Decimals from CSV exports (25.0)
  - Text names with trailing digits (e.g., "Precinct 87" → 0087); optional regex mode
- Builds pctnum as <prefix><number padded to 4 digits>, e.g., MO0102.
- Flags unparseable rows as ERROR.
- Writes a new CSV with the pctnum column added.

    county_codes = {
        'YUMA': 'YU', 'MARICOPA': 'MC', 'SANTA CRUZ': 'SC', 'GILA': 'GI',
        'PIMA': 'PM', 'PINAL': 'PN', 'APACHE': 'AP', 'GRAHAM': 'GM',
        'LA PAZ': 'LP', 'MOHAVE': 'MO', 'NAVAJO': 'NA', 'COCHISE': 'CH',
        'YAVAPAI': 'YA', 'COCONINO': 'CN', 'GREENLEE': 'GN'
    }

You could also use FIPS county prefixes in this dictionary, but I prefer alphabetical prefixes because they let me see at a glance which county a precinct is in.
Assumption: County names in your CSV match these keys (uppercased, spacing as shown). If not, normalize first (e.g., df['countyname'] = df['countyname'].str.upper().str.strip()).
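One way to test that assumption before building any codes is to normalize the county column and list the values that have no prefix. This is a sketch only: the sample rows are made up, and the variable name unmapped is hypothetical.

```python
import pandas as pd

county_codes = {
    'YUMA': 'YU', 'MARICOPA': 'MC', 'SANTA CRUZ': 'SC', 'GILA': 'GI',
    'PIMA': 'PM', 'PINAL': 'PN', 'APACHE': 'AP', 'GRAHAM': 'GM',
    'LA PAZ': 'LP', 'MOHAVE': 'MO', 'NAVAJO': 'NA', 'COCHISE': 'CH',
    'YAVAPAI': 'YA', 'COCONINO': 'CN', 'GREENLEE': 'GN',
}

# Made-up sample rows; real data would come from pd.read_csv(...).
df = pd.DataFrame({"countyname": [" pima", "Santa Cruz ", "SantaCruz", None]})

# Normalize case and surrounding whitespace, as suggested above.
df["countyname"] = df["countyname"].str.upper().str.strip()

# Any remaining value that is not a dictionary key would become ERROR later.
unmapped = sorted(set(df["countyname"].dropna()) - set(county_codes))
print(unmapped)  # ['SANTACRUZ']
```

Values that show up here need either an upstream fix or an extra key in county_codes.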
- Input CSV: Exported dataset with at least:
  - A county column (e.g., countyname or county_name)
  - A precinct column:
    - Numeric mode: precinctcode like 25, 25.0
    - Regex mode (embedded digits): a string field like svi_registration_national_precinct_code or precinct_name that ends with digits
- Output CSV: Same rows plus a new pctnum column. Rows that cannot be parsed get pctnum = 'ERROR'.
Use this when the precinct column is numeric (or numeric‐formatted strings like 25.0).

    import pandas as pd

    precincts = pd.read_csv('/path/to/RAZA_c4_2024_Door_Attempts.csv')

    county_codes = {
        'YUMA':'YU','MARICOPA':'MC','SANTA CRUZ':'SC','GILA':'GI','PIMA':'PM','PINAL':'PN',
        'APACHE':'AP','GRAHAM':'GM','LA PAZ':'LP','MOHAVE':'MO','NAVAJO':'NA','COCHISE':'CH',
        'YAVAPAI':'YA','COCONINO':'CN','GREENLEE':'GN'
    }

    def extract_pctnum(countyname, precinctcode, county_codes):
        try:
            precinct_int = int(float(precinctcode))  # handles '25' and '25.0'
        except (ValueError, TypeError, OverflowError):
            # non-numeric text, None, NaN, or infinity
            return 'ERROR'
        num_part = str(precinct_int).zfill(4)
        county_prefix = county_codes.get(str(countyname).upper().strip())
        # unknown county names also produce ERROR
        return county_prefix + num_part if county_prefix else 'ERROR'

    precincts['pctnum'] = precincts.apply(
        lambda row: extract_pctnum(row['countyname'], row['precinctcode'], county_codes),
        axis=1
    )
    precincts.to_csv('/content/modified_precincts.csv', index=False)

Examples

- countyname='PIMA', precinctcode='25.0' → PM0025
- countyname='MOHAVE', precinctcode='102' → MO0102
Use this if the precinct identifier is embedded in a string (e.g., "PRECINCT 87"), or for fields like svi_registration_national_precinct_code.
Also skips UNCODED rows (returns ERROR).

    import re
    import pandas as pd

    precincts = pd.read_csv('/path/to/2024_Door_Attempts.csv')
    county_codes = { ... }  # same dict as above

    def extract_pctnum(county_name, precinctcode, county_codes):
        # strip so trailing whitespace does not hide the trailing digits
        s = str(precinctcode).upper().strip()
        if 'UNCODED' in s:
            return 'ERROR'
        # capture 1-3 trailing digits, then zero-pad to 4
        match = re.search(r'(\d{1,3})$', s)
        if not match:
            return 'ERROR'
        num_part = match.group(1).zfill(4)
        county_prefix = county_codes.get(str(county_name).upper().strip())
        return county_prefix + num_part if county_prefix else 'ERROR'

    precincts['pctnum'] = precincts.apply(
        lambda row: extract_pctnum(row['county_name'], row['svi_registration_national_precinct_code'], county_codes),
        axis=1
    )
    precincts.to_csv('/content/modified_c4precincts.csv', index=False)
    print(precincts[['county_name','svi_registration_national_precinct_code','pctnum']].head())

Examples

- county_name='PIMA', svi_registration_national_precinct_code='PRECINCT 7' → PM0007
- county_name='MARICOPA', ...='PC 123' → MC0123
- ...='UNCODED' → ERROR
After writing your output CSV:
- Length check: precincts['pctnum'].str.len().value_counts() should show 6 for valid rows (2 letters + 4 digits).
- Error flag audit: precincts[precincts['pctnum'] == 'ERROR']
  - Missing or mismatched county names?
  - Non-numeric or missing precinct digits?
  - "UNCODED" values?
- Uniqueness (within county): precincts.groupby(['countyname','pctnum']).size().loc[lambda s: s>1] lists repeated codes; duplicates should be investigated.
- Crosswalk spot-check: If you maintain a master pctnum crosswalk, join your output against it and review any non-matches.
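The checks above can be sketched together on a small made-up frame. The crosswalk DataFrame and its contents are hypothetical. For the spot-check, this sketch uses a left join with indicator=True so that rows missing from the crosswalk stay visible instead of being dropped.

```python
import pandas as pd

# Made-up output rows standing in for the CSV written by either mode.
precincts = pd.DataFrame({
    "countyname": ["PIMA", "PIMA", "MOHAVE", "GILA"],
    "pctnum": ["PM0025", "PM0025", "MO0102", "ERROR"],
})
# Hypothetical master crosswalk of known-good codes.
crosswalk = pd.DataFrame({"pctnum": ["PM0025", "GI0001"]})

# 1. Length check: valid codes have 6 characters, ERROR has 5.
length_counts = precincts["pctnum"].str.len().value_counts()

# 2. Error flag audit: rows that could not be parsed.
errors = precincts[precincts["pctnum"] == "ERROR"]

# 3. Uniqueness within county: codes that appear more than once.
dupes = precincts.groupby(["countyname", "pctnum"]).size().loc[lambda s: s > 1]

# 4. Crosswalk spot-check: keep every valid row and flag those with no match.
checked = precincts[precincts["pctnum"] != "ERROR"].merge(
    crosswalk, on="pctnum", how="left", indicator=True
)
unmatched = checked[checked["_merge"] == "left_only"]

print(length_counts)
print(errors)
print(dupes)
print(unmatched)
```

Whether a repeated code is a problem depends on the dataset: a table with one row per door attempt will repeat codes by design, while a one-row-per-precinct table should not.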
- County string mismatches: Normalize with df['countyname'] = df['countyname'].str.upper().str.strip(). Ensure names match keys exactly (e.g., "SANTA CRUZ" not "SantaCruz").
- CSV decimals: BigQuery CSV exports often render integers like 25 as 25.0. The numeric mode handles this via int(float(...)).
- Leading zeros lost: Never store the numeric precinct as an integer alone; always recompute zfill(4) when generating pctnum.
- "UNCODED" rows: Regex mode explicitly returns ERROR for these; decide whether to drop or repair upstream.
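The decimal and leading-zero pitfalls can also be reproduced on the pandas side, independent of how the file was exported: a numeric column with a missing value is read as float. The sketch below uses a made-up in-memory CSV and shows that int(float(...)) followed by zfill(4) recovers the padded number either way.

```python
import io
import pandas as pd

# Made-up CSV: one precinct value is missing, which forces a float column.
csv_text = "countyname,precinctcode\nPIMA,25\nMOHAVE,102\nGILA,\n"

# Default parsing: 25 becomes 25.0 because NaN requires a float dtype.
as_float = pd.read_csv(io.StringIO(csv_text))
print(as_float["precinctcode"].tolist())

# Reading the column as text keeps the values exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"precinctcode": str})
print(as_text["precinctcode"].tolist())

# Either representation pads to the same 4-digit string.
padded_from_float = str(int(float(as_float["precinctcode"].iloc[0]))).zfill(4)
padded_from_text = str(int(float(as_text["precinctcode"].iloc[0]))).zfill(4)
print(padded_from_float, padded_from_text)  # 0025 0025
```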
Different sources encode precincts inconsistently (numbers, decimals, text labels, or missing). A single normalized pctnum key:
- Simplifies joins across voter file outputs, canvassing exports, and survey data.
- Prevents silent mismatches in mapping and aggregation.
- Keeps downstream models and reports stable.
- Column names: Update the .apply(...) call to use your column headers (e.g., county_name vs. countyname).
- County map changes: If a county label differs, either normalize input or add a new mapping key.
- Alternate parsing rules: If your precinct codes don't end with digits, adjust the regex accordingly (e.g., capture the first run of digits).
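As one possible alternate rule, the sketch below captures the first run of digits instead of the trailing one. The function name first_digit_run and the sample labels are hypothetical. Unlike the regex mode above, it does not cap the run at three digits, so callers should still check the length of the result.

```python
import re

def first_digit_run(label):
    """Return the first run of digits in label, zero-padded to 4, or None."""
    match = re.search(r"\d+", str(label))
    return match.group(0).zfill(4) if match else None

print(first_digit_run("87 NORTH PRECINCT"))  # 0087
print(first_digit_run("PCT-12A"))            # 0012
print(first_digit_run("UNCODED"))            # None
```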

    # Basic validation after build
    ok = precincts.query("pctnum != 'ERROR' and pctnum.str.len() == 6", engine="python")
    err = precincts[~precincts.index.isin(ok.index)]
    print(f"Valid rows: {len(ok):,} | Errors: {len(err):,}")
    if not err.empty:
        # display() exists in Jupyter/Colab notebooks; use print() elsewhere
        display(err.head(20))

Maintainer: Christina Marikos christina@ruralazaction.org
Scope: Arizona precinct normalization for mapping/joining
Output key: pctnum (2‑letter county prefix + 4‑digit precinct number)