This repository collates the code used to process Vessel Monitoring System (VMS) data and join it to PacFIN fishery landings (i.e., "fish ticket") data for the U.S. West Coast to produce spatial information on fishing activity. These outputs are useful for a wide array of applications in West Coast fishery management, such as describing spatial fishing behavior and dynamics, assessing fisheries overlap with protected resources, and attributing drivers of change in West Coast fisheries.
The repository is organized as a series of required steps that take the VMS and fish ticket data from raw inputs to useful outputs, including several QA/QC steps. Each step is described in detail in its associated .Rmd file in the `code/pipeline_steps` folder. To run the pipeline, run the `main_process.Rmd` file, which calls each data processing step in its own script, creates a `processed_data` folder with your output, and produces a knit document. You can then run `move_markdown_html.R` to move the knit document alongside the rest of the output. Keeping each step in its own module makes the pipeline easier to develop and debug, while the overall framework stays in one document.
Most raw data in this project are large and confidential. This repository therefore does not include any raw data; instead, the code refers to these data through project-relative file paths built with `here::here()` from the here package. Authorized data users who wish to run or use specific pieces of the workflow should obtain the relevant data from one of the moderators of this repository and place it in the `raw_data` folder. All of the code herein should then run without any changes to file path references.
If it’s your first time running the pipeline (ever or after a break), you will need to:
- Pull from the main branch on GitHub
- Set up the files and directories that are not tracked on GitHub
The directory structure you have will mostly be set up from pulling these changes, but there are some exceptions that are not tracked on GitHub (they are in .gitignore) that you will have to set up yourself. Those files are:
- `spatial_data/bathymetry/composite_bath.tif`: This file is quite large and is therefore not tracked on GitHub. Visit the link in `spatial_data/bathymetry/README_bathymetry-file-link.txt` to download it.
- `Confidential/raw_data`: This directory contains confidential information and is therefore not tracked on GitHub. You will need the following files and file structure to run the pipeline:
  - `raw_data/fish_tickets/all_fishtickets_1994_2023.rds`
  - `raw_data/vessel_registration/2009_2023_vesselreg.csv`
  - `raw_data/vms/`
    - `vms chunk 1 2009 2010.rds`
    - `vms chunk 2 2010 2011 2012.rds`
    - `vms chunk 3 2012 2013.rds`
    - `vms chunk 4 2013 2014 2015.rds`
    - `vms chunk 5 2015 2016.rds`
    - `vms chunk 6 2016 2017 2018.rds`
    - `vms chunk 7 2018 2019.rds`
    - `vms chunk 8 2019 2020.rds`
    - `vms chunk 9 2020 2021 2022.rds`
    - `vms chunk 10 2022 2023.rds`
    - `vms chunk 11 2023.rds`
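Before running `main_process.Rmd`, it can help to confirm that these untracked files are in place. The sketch below is written in Python purely for illustration (the pipeline itself is in R). It assumes paths are relative to the repository root and that `raw_data` sits under `Confidential/`, as listed above; `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Untracked files listed above, relative to the repository root.
# Assumption: raw_data lives under Confidential/, per the directory list.
RAW = "Confidential/raw_data"
VMS_CHUNKS = [
    "vms chunk 1 2009 2010.rds",
    "vms chunk 2 2010 2011 2012.rds",
    "vms chunk 3 2012 2013.rds",
    "vms chunk 4 2013 2014 2015.rds",
    "vms chunk 5 2015 2016.rds",
    "vms chunk 6 2016 2017 2018.rds",
    "vms chunk 7 2018 2019.rds",
    "vms chunk 8 2019 2020.rds",
    "vms chunk 9 2020 2021 2022.rds",
    "vms chunk 10 2022 2023.rds",
    "vms chunk 11 2023.rds",
]
REQUIRED_FILES = [
    "spatial_data/bathymetry/composite_bath.tif",
    f"{RAW}/fish_tickets/all_fishtickets_1994_2023.rds",
    f"{RAW}/vessel_registration/2009_2023_vesselreg.csv",
] + [f"{RAW}/vms/{name}" for name in VMS_CHUNKS]


def missing_files(repo_root="."):
    """Return the required relative paths that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if everything is in place.
    for rel in missing_files():
        print(f"missing: {rel}")
```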
This VMS-fish ticket data processing pipeline is organized into six steps required to clean, match, and interpolate the data for each year. Each step is briefly described here, with the details and step-by-step code available for each year in the `pipeline_steps` folder.
1. Clean and organize PacFIN fish tickets. This step takes the raw PacFIN fish tickets, checks for errors, extracts and renames the variables of interest (such as gear type and catches of various species), and, most importantly, defines the target species for each ticket based on both landed pounds and revenue.
2. Assign vessel lengths to the fish tickets, based on a separate database of vessel registrations and a set of decision rules for vessels with multiple conflicting records, missing records, etc.
3. Clean and organize raw VMS data. We add a unique identification number to each VMS record, remove records outside our spatial domain (the U.S. West Coast EEZ), and attach depth from a spatial bathymetry layer.
4. Match the cleaned VMS and fish ticket data. This is one of the more involved steps because it requires important decisions about how to join VMS data to fish ticket data. At its base, the match is made on the unique vessel identification numbers common to both data sources. VMS records are then assigned to individual fish tickets by timestamp: the VMS records associated with a fish ticket are those that fall either between that ticket and the vessel's previous ticket or within the lookback window before the ticket, whichever window of time is shorter. The result of this step is one matched dataset that includes information from both the fish tickets and the VMS data.
5. Filter the matched data. We impose a few filters to remove seemingly erroneous records: removing trips that do not appear to return to the correct landing port; removing VMS segments whose calculated speed is unrealistically high; removing VMS points that appear to be on land (i.e., have a positive depth value); and removing VMS points between fishing trips, i.e., VMS pings from a vessel sitting idle in port between trips.
6. Create an interpolated, regularized version of the matched data. Some analytical methods require spatial records to be evenly distributed in time. We linearly interpolate each fishing trip, placing new VMS points along vessel trajectories so that there is exactly one record every hour.
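The hourly regularization in Step 6 can be illustrated with a short sketch. It is written in Python for illustration only and is not the pipeline's R code. It assumes each trip is a time-sorted list of `(timestamp, lon, lat)` tuples, starts the hourly grid at the trip's first ping, and interpolates longitude and latitude linearly (ignoring Earth curvature, which is reasonable for the short gaps between pings). The function name is hypothetical.

```python
from datetime import datetime, timedelta


def interpolate_hourly(pings, step=timedelta(hours=1)):
    """Linearly interpolate one trip's pings onto a regular time grid.

    pings: list of (timestamp, lon, lat) tuples for a single trip, sorted by time.
    Returns (timestamp, lon, lat) tuples spaced exactly `step` apart, starting at
    the first ping and ending at or before the last one.
    """
    if len(pings) < 2:
        return list(pings)
    out = []
    t, end = pings[0][0], pings[-1][0]
    i = 0  # index of the ping that starts the current segment
    while t <= end:
        # Advance to the segment [pings[i], pings[i + 1]] that contains t.
        while pings[i + 1][0] < t:
            i += 1
        (t0, x0, y0), (t1, x1, y1) = pings[i], pings[i + 1]
        span = (t1 - t0).total_seconds()
        frac = 0.0 if span == 0 else (t - t0).total_seconds() / span
        out.append((t, x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
        t += step
    return out


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0)
    trip = [(start, -124.0, 46.0), (start + timedelta(hours=2), -124.2, 46.4)]
    for record in interpolate_hourly(trip):
        print(record)
```

For a two-ping trip spanning two hours, this yields three records (at 0, 1, and 2 hours), with the middle record at the midpoint of the segment. Whether the real pipeline aligns the grid to the first ping or to clock hours is not stated here, so treat that choice as an assumption.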
After the pipeline has run and the markdown document is ready, run move_markdown_html.R to rename and move the knit document into the Confidential output folder.
Here is a flowchart of the VMS pipeline, indicating each step in the pipeline with its corresponding parameters and outputs:
Each individual processing step (i.e., Steps 01-06 in the `pipeline_steps` folder) documents the analytical choices made within it. Overall, however, the pipeline is designed to be general, and the analyst has few choices to make. The key initial data processing choices are set at the beginning of the `main_process` file:
| Choice | Parameter | Description |
|---|---|---|
| Species | `spp_codes` | Which species do you want landings (weight and revenue) summed for? Note that this does not filter which fish tickets are processed, but adds extra columns to the output. For example, if you choose `SABL` for the species code, the VMS output will include the columns `SABL_revenue` and `SABL_landings`, holding the sablefish revenue and landings for each ticket. See options at PacFIN Species Code List. |
| Gear types | `gear_codes` | Which gear types do you want all landings (weight and revenue) summed for? Again, this does not filter which fish tickets are processed, but adds extra columns to the output. For example, if you choose `CRAB POT` for the gear code, the VMS output will include `all_species_rev` and `all_species_lbs`, holding the crab pot revenue and landings for each fish ticket. See options at PacFIN Gear Code List. |
| Target cutoff | `target_cutoff` | Determines how the target species of each trip is calculated. For trips that land multiple species, how much "more important" does your target need to be than the species with the second-greatest catch? Expressed as a ratio. |
| Revenue metric | `pacfin_revenue_metric` | Which PacFIN-reported revenue metric to use in calculating landings. |
| Weight metric | `pacfin_weight_metric` | Which PacFIN-reported weight metric to use in calculating landings. |
| Lookback window | `lookback_window` | The maximum trip length allowed when attaching VMS data to a fish ticket (i.e., the maximum allowed difference between the first and last VMS pings associated with a trip). |
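The way `lookback_window` interacts with the previous ticket in Step 4 can be sketched as follows. This Python sketch is illustrative only, not the pipeline's R implementation. It assumes tickets and pings are plain tuples (the field layout and the function name are hypothetical) and that a ping belongs to a ticket when its timestamp falls in the half-open interval (window start, landing time]. The window start is the later of the vessel's previous landing and the landing time minus the lookback, which is the same as taking whichever window of time is shorter.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def assign_pings_to_tickets(tickets, pings, lookback):
    """Assign each VMS ping to at most one fish ticket for the same vessel.

    tickets:  list of (ticket_id, vessel_id, landing_time)
    pings:    list of (ping_id, vessel_id, timestamp)
    lookback: timedelta, the maximum trip length (lookback_window)
    Returns {ticket_id: [ping_id, ...]} with pings in time order.
    """
    tickets_by_vessel = defaultdict(list)
    for ticket_id, vessel_id, landed in tickets:
        tickets_by_vessel[vessel_id].append((landed, ticket_id))
    pings_by_vessel = defaultdict(list)
    for ping_id, vessel_id, ts in pings:
        pings_by_vessel[vessel_id].append((ts, ping_id))

    matched = {}
    for vessel_id, vessel_tickets in tickets_by_vessel.items():
        vessel_pings = sorted(pings_by_vessel[vessel_id])
        previous = None
        for landed, ticket_id in sorted(vessel_tickets):
            # Window opens at the later of (previous landing, landing - lookback),
            # i.e., whichever window of time is shorter.
            start = landed - lookback
            if previous is not None and previous > start:
                start = previous
            matched[ticket_id] = [p for ts, p in vessel_pings if start < ts <= landed]
            previous = landed
    return matched


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1)
    tickets = [("T1", "V", t0 + timedelta(hours=10)), ("T2", "V", t0 + timedelta(hours=20))]
    pings = [(f"p{h}", "V", t0 + timedelta(hours=h)) for h in (1, 9, 15, 19)]
    print(assign_pings_to_tickets(tickets, pings, timedelta(hours=8)))
```

With an 8-hour lookback, the ping at hour 1 falls outside the first ticket's window and is left unmatched, while a 24-hour lookback would attach it. The linear scan over pings keeps the logic easy to read; a sorted merge would scale better on a full year of VMS data.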
Parameter details
- Species code refers to the `NOMINAL_TO_ACTUAL_PACFIN_SPECIES_CODE` attribute (column) in the fish ticket data, which does not subdivide groundfish species into additional species codes. For example, Dover sole are listed only as `DOVR`, not as both `DOVR` and `DVR1` as in the `PACFIN_SPECIES_CODE` attribute.
- Target cutoff is a ratio used to determine the target species of a given fishing trip, using revenue for the derived column `TARGET_rev` and weight for the derived column `TARGET_lbs`. For a species to be considered the target, the ratio between the highest and second-highest catch must be greater than or equal to the target cutoff. For example, if the cutoff is set to `1.1`, a species is considered the target if its catch is at least 10% greater than the next-highest catch on that fishing trip.
- Lookback window sets a maximum length for a fishing trip and should be chosen based on the fishery. If the lookback window is too short, the pipeline will miss fishing activity. If it is too long, the pipeline-derived fishing trip will include VMS pings from when vessels were transiting into a given fishery, which are not considered active fishing.
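The target cutoff rule can be written as a few lines of code. The Python sketch below is illustrative, not the pipeline's R code. It takes one ticket's catch by species (either revenue, for `TARGET_rev`, or weight, for `TARGET_lbs`) and returns the top species only if its catch is at least `cutoff` times the runner-up's. Returning `None` when no species qualifies is an assumption; the pipeline may label such trips differently.

```python
def target_species(catch_by_species, cutoff):
    """Return the target species for one trip, or None if no species qualifies.

    catch_by_species: {species_code: catch} for a single fish ticket, where
        catch is either revenue (for TARGET_rev) or landed weight (for TARGET_lbs).
    cutoff: the target_cutoff ratio, e.g. 1.1.
    """
    if not catch_by_species:
        return None
    ranked = sorted(catch_by_species.items(), key=lambda kv: kv[1], reverse=True)
    top_species, top = ranked[0]
    if top <= 0:
        return None
    # A single-species trip has no runner-up, so its species is always the target.
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    # The top species is the target only if it beats the runner-up by the ratio.
    return top_species if top >= cutoff * second else None
```

With a cutoff of `1.1`, a trip landing 1,200 lb of `SABL` and 1,000 lb of `DOVR` targets `SABL`, while 1,050 lb versus 1,000 lb has no target, because 1,050 is only 5% greater than 1,000.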
The main output of this data analysis pipeline is clean fishery landings data (fish tickets), joined to the relevant spatial information (VMS ping locations) associated with each fishing trip. As the pipeline runs, it produces intermediate outputs, which are especially helpful for checking errors and quality assurance/control (QA/QC). Example file names are listed below, where yyyy refers to the year of data the pipeline was run for.
| Output file name suffix | Description | VMS Pipeline Step |
|---|---|---|
| `fish_tickets/fishtix_withFTID_yyyy.rds` | Cleaned fish tickets with associated target species | 1 |
| `vessel_length_keys/vessel_length_key_yyyy.rds` | Derived join key between vessel registration data and PacFIN vessel identifiers, used to get vessel length | 2 |
| `fish_tickets/fishtix_vlengths_withFTID_yyyy.rds` | Cleaned fish tickets joined with vessel lengths | 2 |
| `vms/vms_clean_yyyy.rds` | Cleaned VMS data, cropped to the U.S. EEZ, with bathymetry attached, records on land excluded, and duplicate VMS records removed | 3 |
| `vms/duplicates_only_yyyy.rds` | Same as `vms_clean`, but includes only duplicate records (generated for QA/QC) | 3 |
| `matched/matched_vmstix_only_withFTID_yyyy.rds` | Cleaned fish tickets joined to cleaned VMS data, excluding trips without matched VMS data | 4 |
| `matched/matched_alltix_withFTID_yyyy.rds` | Same as `matched_vmstix_only_withFTID`, but including trips that are not matched to VMS data (generated for QA/QC) | 4 |
| `filtered/matched_filtered_withFTID_length_yyyy.rds` | Cleaned fish tickets joined to cleaned VMS data, with filters calculated and applied | 5 |
| `filtered/matched_unfiltered_yyyy.rds` | Same as `matched_filtered_withFTID_length`, but with filters not applied (generated for QA/QC) | 5 |
| `interpolated/interpolated_yyyy.rds` | Cleaned and filtered fish ticket data joined to VMS data, interpolated to regularize the VMS ping interval | 6 |
Which output file should I use?
- To create fishing activity heatmaps, use either `matched_filtered_withFTID_length` for non-interpolated data or `interpolated` for interpolated data.
- To calculate the proportion of vessels, trips, landings, and revenue tracked by VMS transponders, use `matched_alltix_withFTID`, which adds a row for each fish ticket that could not be matched to a VMS-tracked fishing trip.
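Because matched tickets in `matched_alltix_withFTID` repeat once per VMS ping, coverage shares should be computed after collapsing to one row per ticket. The Python sketch below illustrates that calculation. It assumes the table has already been loaded as a list of dicts, and its keys (`ticket_id`, `vessel_id`, `revenue`, `lbs`, `has_vms`) are hypothetical stand-ins for the real column names. It treats each ticket as one trip.

```python
def vms_coverage(rows):
    """Share of vessels, tickets, revenue, and pounds tracked by VMS.

    rows: records from matched_alltix_withFTID as dicts with the hypothetical keys
        'ticket_id', 'vessel_id', 'revenue', 'lbs', and 'has_vms'. Matched tickets
        repeat once per VMS ping, so rows are first collapsed to one per ticket.
    """
    tickets = {}
    for row in rows:
        tickets.setdefault(row["ticket_id"], row)
    tix = list(tickets.values())
    if not tix:
        return {}

    def share(key):
        # Fraction of the column total that comes from VMS-matched tickets.
        total = sum(t[key] for t in tix)
        tracked = sum(t[key] for t in tix if t["has_vms"])
        return tracked / total if total else float("nan")

    vessels = {t["vessel_id"] for t in tix}
    vms_vessels = {t["vessel_id"] for t in tix if t["has_vms"]}
    return {
        "vessels": len(vms_vessels) / len(vessels),
        "tickets": sum(1 for t in tix if t["has_vms"]) / len(tix),
        "revenue": share("revenue"),
        "lbs": share("lbs"),
    }
```

Counting a vessel as tracked if any of its tickets matched VMS is one reasonable definition; a stricter analysis might require every ticket for that vessel to be matched.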
This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project content is provided on an "as is" basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.
