Commit 1f3152b

Fix Chinese gear pattern to handle spaced names (MIN DONG YU62646-5)
Simplify pattern from `\w+yu\d+-\d+` to `yu\d+-\d+` so it matches both concatenated (MINDONGYU62646-5) and spaced (MIN DONG YU62646-5) variants. The \w+ prefix couldn't cross the space boundary. The simpler pattern is safe because "yu" immediately followed by digits-dash-digits is specific enough to avoid false positives (tested against names like BAYOU QUEEN).
1 parent 74b8fcb commit 1f3152b

1 file changed

Lines changed: 4 additions & 3 deletions

ais/src/atlantes/machine_annotation/data_annotate_utils.py
@@ -89,11 +89,12 @@ def get_ais_vessel_category(activity_descs: Optional[list[str]]) -> list[int]:
 #
 # These patterns are matched case-insensitively against entity names. They cover:
 # - Net identifiers: "net\d+" (NET10), "net\s+\w+" (NET 1, NET D)
-# - Chinese fishing gear suffixes: "\w+yu\d+-\d+" (MINPINGYU63036-1).
+# - Chinese fishing gear suffixes: "yu\d+-\d+" (MINPINGYU63036-1, MIN DONG YU62646-5).
 # Chinese fishing vessels use province+"YU"+registration (e.g. MINPINGYU63036).
 # A -N suffix indicates individual nets/gear, not the vessel itself.
 # Validated against VHS data: -N suffix names are GEAR 35-75% of the time
-# with near-zero FISHING classification.
+# with near-zero FISHING classification. Names may have spaces (MIN DONG YU)
+# or be concatenated (MINDONGYU), so we match on "yu" followed by digits-dash-digits.
 # - Battery/signal indicators: "\d+%" (90%), "\d+V\d+" (8V2 = 8.2V).
 # The NVN voltage format is the second most common battery reporting convention
 # in gear names (~18% of GEAR records in VHS).
@@ -111,7 +112,7 @@ def get_ais_vessel_category(activity_descs: Optional[list[str]]) -> list[int]:
 NAME_PATTERNS_FOR_BUOYS = [
     r"net\d+",
     r"net\s+\w+",
-    r"\w+yu\d+-\d+",
+    r"yu\d+-\d+",
     r"fishing gear",
     r"\d+%",
     r"\d+V\d+",
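
The effect of the pattern change can be checked with a minimal sketch like the one below. It assumes the patterns are applied with `re.search` and `re.IGNORECASE`, which is consistent with the "matched case-insensitively" comment but is not shown in the diff. The `matches` helper and the `SAMPLE_NAMES` list are hypothetical and exist only for illustration; the names themselves come from the commit message and code comments.

```python
import re

# Old and new patterns, copied from the diff. Case-insensitive compilation is
# an assumption based on the source comment, not code shown in the diff.
OLD_PATTERN = re.compile(r"\w+yu\d+-\d+", re.IGNORECASE)
NEW_PATTERN = re.compile(r"yu\d+-\d+", re.IGNORECASE)

# Sample names taken from the commit message and the code comments.
SAMPLE_NAMES = [
    "MINDONGYU62646-5",    # concatenated gear name
    "MIN DONG YU62646-5",  # spaced gear name
    "MINPINGYU63036-1",    # concatenated gear name
    "MINPINGYU63036",      # vessel name without the -N gear suffix
    "BAYOU QUEEN",         # unrelated vessel name
]


def matches(pattern: re.Pattern, name: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in the name."""
    return pattern.search(name) is not None


if __name__ == "__main__":
    for name in SAMPLE_NAMES:
        old = matches(OLD_PATTERN, name)
        new = matches(NEW_PATTERN, name)
        print(f"{name!r}: old={old} new={new}")
```

The old pattern fails on the spaced name because `\w+` needs at least one word character directly before "yu", and in "MIN DONG YU62646-5" that character is a space. Dropping the prefix removes that requirement while still demanding the digits-dash-digits suffix, so a bare vessel name such as MINPINGYU63036 stays unmatched.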