To support efficient fisheries monitoring, control and surveillance (MCS), Starboard brings in vessel data from external vessel registers.
This data is linked to the corresponding Starboard vessels by matching vessel details in the fishing registers of Regional Fisheries Management Organisations (RFMOs) with vessel details in the Starboard database.
How Starboard matches vessels from registers
Vessel details in RFMO fishing registers are compared with the vessel details in the Starboard database.
For each vessel in an RFMO list, a matching score is calculated to determine its similarity to each possible vessel match in Starboard.
The total matching score is the average of the individual matching scores for each of the following five attributes: vessel name, flag, Maritime Mobile Service Identity (MMSI), International Maritime Organisation (IMO) number, and call sign.
Each individual score ranges from 0% to 100%, depending on how closely the two values match. If a field is missing from either list, it is given a score of 50%.
Table 1: Example matching for a WCPFC vessel
| Field | WCPFC register | Starboard | Matching score |
|---|---|---|---|
| Vessel name | SEISHO MARU NO.35 | SEISYOMARU NO.35 | 92% |
| Flag | JPN | JPN | 100% |
| MMSI | (missing) | 431200208 | 50% |
| IMO number | 9032343 | (missing) | 50% |
| Call sign | JH3210 | JH3210 | 100% |
| Total score (average of individual scores) | | | 78% |
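The averaging rule can be sketched in Python as follows. The field keys and the `total_score` helper are hypothetical names chosen for illustration. `None` stands for a field missing from either list. The vessel name score is kept as the unrounded 11/12 from Table 1.

```python
from typing import Optional

MISSING_FIELD_SCORE = 0.5  # a field missing from either list scores 50%


def total_score(field_scores: dict[str, Optional[float]]) -> float:
    """Average the per-field scores (0.0 to 1.0); None marks a missing field."""
    scores = [MISSING_FIELD_SCORE if s is None else s for s in field_scores.values()]
    return sum(scores) / len(scores)


# Per-field scores from Table 1 (hypothetical key names).
table_1 = {
    "vessel_name": 11 / 12,
    "flag": 1.0,
    "mmsi": None,
    "imo_number": None,
    "call_sign": 1.0,
}
print(f"{total_score(table_1):.0%}")  # 78%
```

The unrounded average is about 78.3%, which displays as the 78% shown in the table.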
How two fields are matched
Matching scores are calculated based on the number of characters that are the same or different between two fields.
To calculate the character differences between two fields, we use the Damerau-Levenshtein distance. This measures the minimum number of operations required to change one string into the other, where an operation is an insertion, deletion or substitution of a single character, or a transposition of two adjacent characters. The result is a “character (edit) distance”.
This number of operations is subtracted from the total number of characters in the RFMO reference field, and the result is divided by that same total. This yields a matching score from 0% (no characters are the same) to 100% (all characters are the same).
For example, for the WCPFC vessel in Table 1, the preprocessed vessel names SEISHOMARU35 and SEISYOMARU35 (see Managing inconsistencies in vessel data) differ by one character (H → Y) out of 12. This gives a matching score of (12 - 1)/12 = 11/12 ≈ 92%.
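Below is a minimal Python sketch of the edit distance and the resulting score, score = (n - d)/n. Here n is the length of the RFMO reference field and d is the edit distance. It implements the restricted Damerau-Levenshtein variant, often called optimal string alignment. That variant can differ from the unrestricted distance on rare inputs, and it is an assumption that this matches Starboard's implementation. The function names are hypothetical. Clamping the score at 0% when the distance exceeds the reference length is also an assumption.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal string alignment distance: single-character insertions,
    deletions and substitutions plus adjacent transpositions, each costing 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def field_score(reference: str, candidate: str) -> float:
    """Score from 0.0 to 1.0: (len(reference) - distance) / len(reference)."""
    if not reference:
        raise ValueError("missing fields are scored separately (50%)")
    n = len(reference)
    distance = damerau_levenshtein(reference, candidate)
    # Assumption: clamp at 0 when the distance exceeds the reference length.
    return max(0.0, (n - distance) / n)


print(damerau_levenshtein("SEISHOMARU35", "SEISYOMARU35"))  # 1
print(f"{field_score('SEISHOMARU35', 'SEISYOMARU35'):.0%}")  # 92%
```

The score divides by the RFMO reference length, so swapping the two arguments can change the score when the strings differ in length.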
Managing inconsistencies in vessel data
AIS data and manually entered register data are often messy, so some additional preprocessing is done before matching:
We ignore common transcription errors, e.g. I vs. 1 and O vs. 0.
Roman numerals are replaced with Arabic numerals.
Punctuation is removed and capitalisation and white space are ignored.
Additionally, for the vessel name field:
We remove abbreviated ship prefixes, as these are used inconsistently and infrequently.
We remove numeric prefixes (e.g. NO. or DAI).
We separate the alphabetic and numeric components of a vessel name and append any numerals to the end of the alphabetic portion prior to matching.
We ignore diacritics (e.g. character accents).
Example:
WCPFC vessel YOKO MARU NO.18 and Starboard vessel NO18 YOUKOUMARU are converted to YOKOMARU18 and YOUKOUMARU18 respectively prior to matching. These have an edit distance of 2 characters (two inserted Us) out of the 10 characters in the WCPFC name. This gives a vessel name similarity score of (10 - 2)/10 = 80%.
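The name preprocessing steps above could be combined roughly as in this Python sketch, which reproduces the example conversions. Several details are assumptions made for illustration, not Starboard's published rules:

- The hypothetical `SHIP_PREFIXES` set.
- Converting only Roman numerals up to XXXIX, which avoids rewriting words such as MIX.
- Mapping 0 → O and 1 → I only inside tokens that also contain letters.

```python
import re
import unicodedata

# Hypothetical abbreviated ship prefixes; the real list is not given in the text.
SHIP_PREFIXES = {"FV", "MV", "FB"}
# Numeric prefixes named in the text (NO. and DAI), optionally followed by digits.
NUMERIC_PREFIX = re.compile(r"(NO|DAI)(\d*)")
# Assumption: only Roman numerals up to XXXIX are converted.
ROMAN = re.compile(r"X{0,3}(IX|IV|V?I{0,3})")
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}
# Common transcription errors, treated as letters inside alphabetic tokens.
TRANSCRIPTION = str.maketrans("01", "OI")


def roman_to_int(numeral: str) -> int:
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller numeral before a larger one (IV, IX).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def normalise_name(name: str) -> str:
    # Ignore capitalisation and strip diacritics (e.g. Ñ becomes N).
    text = unicodedata.normalize("NFKD", name.upper())
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Remove punctuation; keep whitespace for now so the name can be tokenised.
    text = "".join(c for c in text if c.isalnum() or c.isspace())
    alpha, digits = [], []
    for i, token in enumerate(text.split()):
        if i == 0 and token in SHIP_PREFIXES:
            continue  # drop a leading abbreviated ship prefix
        prefix = NUMERIC_PREFIX.fullmatch(token)
        if prefix:
            token = prefix.group(2)  # keep only the number after NO/DAI
            if not token:
                continue
        if any(c.isalpha() for c in token):
            token = token.translate(TRANSCRIPTION)  # treat 0 as O, 1 as I
            if ROMAN.fullmatch(token):
                token = str(roman_to_int(token))
        # Collect letters and numerals separately; numerals go to the end.
        alpha.extend(c for c in token if c.isalpha())
        digits.extend(c for c in token if c.isdigit())
    return "".join(alpha) + "".join(digits)


print(normalise_name("YOKO MARU NO.18"))  # YOKOMARU18
print(normalise_name("NO18 YOUKOUMARU"))  # YOUKOUMARU18
```

Whitespace is removed only at the end, when letters and numerals are joined. This lets whole tokens such as NO18 or III be recognised first.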
Resolving ambiguous matches
Each vessel in an RFMO register can have multiple possible matches, because multiple vessels in the Starboard global vessel database may have similar names or other identifying information. The best match is the Starboard vessel with the highest matching score.
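Selecting the best match amounts to taking the highest-scoring candidate, as in this minimal Python sketch. The `best_match` name and the vessel IDs are hypothetical. Ties go to whichever candidate is encountered first, which is an arbitrary choice the text does not specify.

```python
from typing import Optional


def best_match(candidate_scores: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (vessel_id, score) for the highest-scoring candidate, or None."""
    if not candidate_scores:
        return None  # no candidate vessels: the register entry is unmatched
    vessel_id = max(candidate_scores, key=candidate_scores.get)
    return vessel_id, candidate_scores[vessel_id]


print(best_match({"sb-101": 0.78, "sb-202": 0.55}))  # ('sb-101', 0.78)
```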
A vessel remains unmatched if it is not in the Starboard database, or if its AIS data contains incorrect or insufficient information. In the latter case, a vessel that is in the RFMO fishing register may be incorrectly marked as ‘Not in [Register]’.
Conversely, one or more AIS transceivers sometimes broadcast successive messages with the same MMSI from positions that could not physically be reached in the time between messages. This may be due to message corruption, intentional spoofing, or an inadvertently misprogrammed MMSI.
During ingestion, Starboard assigns these inconsistent tracks separate vessel IDs, which sometimes splits a genuine track into “two” vessels. In this case, if a match was made to two vessels with identical MMSIs, we mark both vessels as “In [Register]”. This avoids the undesirable alternative of showing authorised vessels without their registration information.
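One way to express this rule is the Python sketch below. It assumes that once a register entry is matched to a vessel, every other Starboard vessel ID sharing that vessel's MMSI is also marked “In [Register]”. That reading of the rule is an interpretation. The `register_status` name, the vessel IDs and the second MMSI are hypothetical.

```python
def register_status(matched_vessel_ids: set[str],
                    mmsi_by_vessel: dict[str, str]) -> dict[str, str]:
    """Mark every vessel ID that shares an MMSI with a matched vessel as in the register."""
    matched_mmsis = {mmsi_by_vessel[v] for v in matched_vessel_ids}
    return {
        vessel_id: "In [Register]" if mmsi in matched_mmsis else "Not in [Register]"
        for vessel_id, mmsi in mmsi_by_vessel.items()
    }


# sb-1 and sb-2 are one genuine track split at ingestion (same MMSI).
mmsi_by_vessel = {"sb-1": "431200208", "sb-2": "431200208", "sb-3": "512000001"}
print(register_status({"sb-1"}, mmsi_by_vessel))
```

Here sb-2 inherits the “In [Register]” status even though only sb-1 was matched directly.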
Given these complexities, and the often inaccurate or incomplete AIS data, we supplement automated matching with additional data sources and online research.
Confirming matches with a score threshold
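All matches with a score greater than or equal to 70% are automatically included in Starboard. Generally, this corresponds to matching at least 2 of the 5 fields perfectly. For example, two perfect matches with the other three fields missing gives (100 + 100 + 50 + 50 + 50)/5 = 70%.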
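The threshold check can be sketched in Python as a filter over each register entry's best match. The `accepted_matches` name, the register entry IDs and the vessel IDs are hypothetical. The 70% value comes from the text.

```python
from typing import Optional

MATCH_THRESHOLD = 0.70  # matches scoring 70% or more are included automatically


def accepted_matches(
    best_matches: dict[str, Optional[tuple[str, float]]],
) -> dict[str, tuple[str, float]]:
    """Keep register entries whose best match (vessel_id, score) meets the threshold."""
    return {
        entry: match
        for entry, match in best_matches.items()
        if match is not None and match[1] >= MATCH_THRESHOLD
    }


# Two perfect fields with the other three missing (50% each) sits exactly on the threshold.
on_threshold = (1.0 + 1.0 + 0.5 + 0.5 + 0.5) / 5
print(on_threshold >= MATCH_THRESHOLD)  # True
print(accepted_matches({"r1": ("sb-101", 0.78), "r2": ("sb-202", 0.55), "r3": None}))
```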
Please report any errors or corrections to support@starboard.nz.