How it works: Matching vessels from registers
Written by Kelly Rummins
Updated over a week ago

Starboard brings in vessel data from external vessel registers such as the IMO Ship and Company Number Scheme held by S&P Global Market Intelligence, authorisation lists held by regional fisheries management organisation (RFMO) bodies, and the TMT Combined IUU Vessel List.

Starboard makes this data available for a vessel by matching the vessel details in external registers with the vessel details reported on AIS.

How Starboard matches vessels from registers

Vessel details in external registers are compared with the vessel details reported on AIS.

For each vessel in an external register, a matching score is calculated to determine its similarity to each possible vessel match in Starboard.

The total matching score is the weighted average of the individual matching scores for each of the following six attributes: vessel name, Maritime Mobile Service Identity (MMSI), International Maritime Organization (IMO) number, call sign, flag, and vessel type.

Each individual score can range from 0% to 100%, depending on how well the field matches. If a field is missing in either list, it is given a score of 50%.

Table 1: Example matching for a WCPFC vessel

Attribute      WCPFC                Starboard (AIS)      Matching score   Weight
Vessel name    SEISHO MARU NO.35    SEISYOMARU NO.35     92%              1
MMSI           431200208 (recorded in one source only)   50%              1
IMO number     9032343 (recorded in one source only)     50%              1
Call sign      JH3210               JH3210               100%             1
Total score (average of individual scores)               78%              1
Flag           JPN                  JPN                  100%             0.25
Vessel type    Longliner            Fishing              100%             0.25
Total score (weighted average of individual scores)      76%
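As a rough illustration of the weighted average, the sketch below recomputes the final total score from the individual scores and weights in Table 1. The function and key names are hypothetical and chosen for this example only; the weights and the 50% score for a missing field are taken from the text and table above.

```python
from typing import Optional

# Weights as listed in Table 1. The key names are hypothetical.
WEIGHTS = {
    "vessel_name": 1.0,
    "mmsi": 1.0,
    "imo_number": 1.0,
    "call_sign": 1.0,
    "flag": 0.25,
    "vessel_type": 0.25,
}

# A field that is missing in either source scores 50%.
MISSING_SCORE = 0.5


def total_matching_score(scores: dict[str, Optional[float]]) -> float:
    """Return the weighted average of per-attribute scores (0.0 to 1.0).

    A score of None means the field was missing in one of the two sources.
    """
    weighted_sum = 0.0
    for attribute, weight in WEIGHTS.items():
        score = scores.get(attribute)
        if score is None:
            score = MISSING_SCORE
        weighted_sum += weight * score
    return weighted_sum / sum(WEIGHTS.values())


# Individual scores from Table 1 (MMSI and IMO number are each missing in one source).
table_1 = {
    "vessel_name": 0.92,
    "mmsi": None,
    "imo_number": None,
    "call_sign": 1.0,
    "flag": 1.0,
    "vessel_type": 1.0,
}

print(f"{total_matching_score(table_1):.0%}")  # 76%
```

The weighted sum is 0.92 + 0.5 + 0.5 + 1 + 0.25 + 0.25 = 3.42, and the weights add up to 4.5, so the total score is 3.42 / 4.5 = 76%.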

How two fields are matched

Matching scores for vessel name, MMSI, IMO number and call sign are calculated from the number of characters that differ between the two fields.

To calculate the character differences between two fields, we use the Damerau-Levenshtein distance, which measures the minimum number of operations (insertions, deletions or substitutions of a single character, or transpositions of two adjacent characters) required to change one word into the other. The result is a “character (edit) distance”.

The edit distance is subtracted from the total number of characters in the register's reference field, and the result is divided by that same total. This yields a matching score from 0% (no characters are the same) to 100% (all characters are the same). IMO numbers and MMSI values have a defined number of digits (7 and 9 respectively), so we use these values as the total number of characters.

For example, for the WCPFC vessel in Table 1, the vessel names differ by one character (H → Y) out of 12 once spaces, punctuation and the NO. prefix are removed, resulting in a matching score of 11/12 = 92%.
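The calculation can be sketched as follows. This is a minimal illustration, not Starboard's implementation: it uses the optimal string alignment variant (a restricted form of the Damerau-Levenshtein distance), the function names are hypothetical, and clamping the score at 0% when the distance exceeds the field length is an assumption. The input names are written as they would look with spaces, punctuation and the NO. prefix removed.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                # transposition of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def field_score(register_value: str, ais_value: str,
                total_chars: Optional[int] = None) -> float:
    """Score = (total characters - edit distance) / total characters.

    total_chars defaults to the length of the register value; pass 7 for
    IMO numbers or 9 for MMSI values. Clamping at 0.0 is an assumption.
    """
    total = len(register_value) if total_chars is None else total_chars
    if total <= 0:
        raise ValueError("the reference field must contain at least one character")
    distance = edit_distance(register_value, ais_value)
    return max(0.0, (total - distance) / total)


print(edit_distance("SEISHOMARU35", "SEISYOMARU35"))       # 1
print(f"{field_score('SEISHOMARU35', 'SEISYOMARU35'):.0%}")  # 92%
```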

Managing inconsistencies in vessel data

AIS data and manually entered data from vessel registers are often messy, so some additional preprocessing is applied:

  • Roman numerals are replaced with Arabic numerals.

  • Punctuation is removed and capitalisation and white space are ignored.

  • For RFMO matching, we ignore common transcription errors, e.g. I vs. 1 and O vs. 0.

Additionally for the vessel name field:

  • We remove abbreviated ship prefixes, as these are used inconsistently and infrequently.

  • We remove numeric prefixes (e.g. NO. or DAI).

  • We separate the alphabetic and numeric components of a vessel name and concatenate any numerals to the end of the alpha portion prior to matching.

  • We ignore diacritics (e.g. character accents).

Example:

WCPFC vessel YOKO MARU NO.18 and Starboard vessel NO18 YOUKOUMARU are converted to YOKOMARU18 and YOUKOUMARU18, respectively, prior to matching. The edit distance is 2 characters out of the 10 characters in the register name, resulting in a vessel name similarity score of 80%.
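A simplified sketch of the vessel name preprocessing is shown below. It covers only some of the steps listed above (diacritics, capitalisation, punctuation, white space, NO. and DAI prefixes, and moving numerals to the end). Roman numeral conversion and ship prefix removal are left out, and the regular expression and function name are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical pattern: a number designator such as "NO." or "DAI"
# that appears directly before digits.
NUMBER_PREFIX = re.compile(r"\b(?:NO|DAI)\.?\s*(?=\d)")


def normalise_vessel_name(name: str) -> str:
    """Return a simplified, comparison-ready form of a vessel name."""
    # Decompose accented characters, then drop the combining marks (diacritics).
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore capitalisation.
    text = text.upper()
    # Remove number designators such as "NO." or "DAI" before digits.
    text = NUMBER_PREFIX.sub("", text)
    # Remove punctuation and white space (anything that is not A-Z or 0-9).
    text = re.sub(r"[^A-Z0-9]", "", text)
    # Move all numerals to the end of the alphabetic portion.
    letters = re.sub(r"[0-9]", "", text)
    digits = re.sub(r"[A-Z]", "", text)
    return letters + digits


print(normalise_vessel_name("YOKO MARU NO.18"))  # YOKOMARU18
print(normalise_vessel_name("NO18 YOUKOUMARU"))  # YOUKOUMARU18
```

Note that this sketch drops any letter that has no plain A-Z equivalent after decomposition, which a production version would likely handle more carefully.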

Resolving ambiguous matches

There may be multiple possible matches for any vessel in the external register, because several vessels in the Starboard global vessel database may have similar names or other identifying information. The best match is the Starboard vessel with the highest matching score.

Unmatched vessels occur if the vessel is not in the Starboard database or if the AIS vessel data contains incorrect or insufficient information. As a result, a vessel may be:

  • In the RFMO fishing register but falsely marked as ‘Not in [Register]’.

  • In the IMO database but shown without any IMO information.

  • In the TMT Combined IUU Vessel List but shown without the IUU listed or closely linked risk indicator.

Confirming matches with a score threshold

All matches with a score greater than or equal to 70% are automatically included in Starboard. Generally, this corresponds to matching at least 2 out of 5 fields perfectly.
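Choosing the best match and applying the threshold could look like the sketch below. The function name and vessel identifiers are hypothetical, and how ties between equally scored candidates are broken is not described here, so this example simply keeps the first candidate with the highest score.

```python
from typing import Optional

# Matches scoring at least 70% are included automatically.
MATCH_THRESHOLD = 0.70


def best_match(candidate_scores: dict[str, float]) -> Optional[str]:
    """Return the candidate with the highest total score, or None if no
    candidate reaches the threshold.

    candidate_scores maps a (hypothetical) Starboard vessel identifier to
    its total matching score, expressed as a fraction between 0.0 and 1.0.
    """
    if not candidate_scores:
        return None
    vessel_id, score = max(candidate_scores.items(), key=lambda item: item[1])
    return vessel_id if score >= MATCH_THRESHOLD else None


print(best_match({"vessel_a": 0.76, "vessel_b": 0.61}))  # vessel_a
print(best_match({"vessel_c": 0.69}))                    # None
```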
