Background

Implementing Person-Matching: Key Steps and Considerations

Integrating a person’s electronic records within and across service delivery sites and supporting activities such as laboratory services and civil registration supports both patient care and population health goals. Consolidating records within and across these systems requires a strategy where individual records from each source are correctly identified as belonging to the same person (i.e., matching process) and multiple records linked (i.e., duplicate record adjudication). Well-devised matching processes can reduce medical errors and help control health care costs, such as by avoiding duplicate health testing and services. In addition, person matching processes facilitate increased accuracy in population-based surveillance, program planning, and research.

Health data are collected in different ways, and these differences call for tailored approaches. It is like making a complex dish in different regions of the world. The basic recipe may stay the same (gather and prepare the ingredients, then cook), but the available ingredients, the cooking tools, and local tastes can vary. Matching works the same way: the foundational principles apply everywhere, but the specifics must be tailored to each locale's health data collection systems, norms, and regulations. Therefore, while teams can and should draw from established best practices, they must also be prepared to develop bespoke processes for their specific context.

Duplicate records can also be a challenge within one application, such as an electronic medical record (EMR) that supports a clinic. Within a transactional system, such as an EMR, a robust synchronous matching process that correctly identifies a person’s electronic record during the registration or data-entry process can prevent creation of duplicate records for that person. However, synchronous matching is not always possible, necessitating asynchronous matching processes that periodically review the system data to identify and adjudicate duplicate records. Further, asynchronous matching processes are essential to identify the same person across all systems when individual-level data is brought together from multiple sites and data sources.
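The following Python sketch illustrates the asynchronous case. A periodic batch job compares stored records pairwise and groups the records that match so they can be adjudicated. The record layout, the exact-match rule in `is_match`, and the function names are hypothetical choices made for illustration. They are not part of any particular system.

```python
from itertools import combinations

# Hypothetical record layout: (record_id, first_name, last_name, date_of_birth)
Record = tuple[str, str, str, str]


def is_match(a: Record, b: Record) -> bool:
    """Hypothetical deterministic rule: names and date of birth are identical."""
    return a[1:] == b[1:]


def find_duplicate_groups(records: list[Record]) -> list[list[str]]:
    """Batch (asynchronous) pass: compare every pair, then merge matched pairs into groups."""
    parent = {r[0]: r[0] for r in records}  # union-find: each record starts in its own group

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if is_match(a, b):
            parent[find(a[0])] = find(b[0])  # merge the two groups

    groups: dict[str, list[str]] = {}
    for r in records:
        groups.setdefault(find(r[0]), []).append(r[0])
    # Only groups containing more than one record are candidate duplicates for adjudication
    return [sorted(g) for g in groups.values() if len(g) > 1]


records = [
    ("A1", "ana", "silva", "1990-04-12"),
    ("B7", "ana", "silva", "1990-04-12"),
    ("C3", "joao", "costa", "1985-01-30"),
]
print(find_duplicate_groups(records))  # [['A1', 'B7']]
```

The number of pairwise comparisons grows quadratically with the number of records. For this reason, batch jobs over large datasets typically compare only records that share a blocking key (for example, the same birth year) before they apply the matching rule.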

Several considerations play into the overall matching process and duplicate record adjudication, known as the strategy for data deduplication. These will be discussed further throughout this module and include:

  • Data Variability: Variations can exist in how individuals are identified within and across systems and datasets. Not only can data quality and completeness vary, but different conventions for name, date of birth, and group affiliation might be followed. Further, different types of identifying data, such as national identification numbers, personal characteristics, and biometrics might be used.

  • Personally Identifiable Information (PII): Matching processes require the use of sensitive data that can identify an individual. These data must be secured to prevent their release during the matching process.

  • Collaboration: Effective matching is also a collaborative process that requires interaction among stakeholders such as data managers, health care officials, clinic and other service staff, and the individual. Local, national, or international organizations might also play a role.
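To illustrate data variability, the Python sketch below standardizes names and dates of birth before comparison. Standardization keeps differences in convention (capitalization, accents, punctuation, date format) from blocking a match. The list of accepted date formats is a hypothetical assumption, and each source system's actual conventions would need to be confirmed.

```python
import unicodedata
from datetime import datetime
from typing import Optional

# Hypothetical date conventions seen across source systems; confirm per source
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y"]


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, replace punctuation with spaces, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)  # split accented letters into base + mark
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters = "".join(c if c.isalpha() or c.isspace() else " " for c in no_accents)
    return " ".join(letters.lower().split())


def normalize_dob(value: str) -> Optional[str]:
    """Convert a date of birth to ISO 8601 (YYYY-MM-DD); return None if no known format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next convention
    return None


print(normalize_name("  José  O'Brien "))  # jose o brien
print(normalize_dob("12/04/1990"))         # 1990-04-12
print(normalize_dob("unknown"))            # None
```

The order and content of `DATE_FORMATS` matter. This list reads "01/02/2000" as day-first, which is wrong for a source that records the month first. For that reason, the formats should be set for each source rather than applied uniformly across all sources.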

Despite these challenges, there is substantial need for matching. Meeting this need requires an understanding of the design and implementation of the matching process and awareness of best practices that experienced managers have developed.

There are two common approaches to matching algorithms: deterministic and probabilistic.

  • Deterministic methods use rules for match determination, e.g., "Match if first name and last name are identical; otherwise, not a match."

  • In contrast, probabilistic methods compute a metric, e.g., a match score from 0 to 100, that indicates how likely it is that two records belong to the same person. Because the score can fall anywhere in that range, probabilistic methods need a decision threshold to classify two records as a "match," e.g., "Declare a match if the score exceeds 95."
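The Python sketch below contrasts the two approaches. The deterministic rule is the one quoted above. On the probabilistic side, the sketch uses Fellegi-Sunter-style log-likelihood weights, which are one common probabilistic formulation. The per-field m- and u-probabilities are hypothetical values. The linear rescaling of the total weight to 0-100 is an assumption made here so that the example threshold of 95 applies.

```python
import math

# Hypothetical m- and u-probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
# Real values would be estimated from the data, e.g., from a manually reviewed sample.
FIELD_PROBS = {
    "first_name": (0.95, 0.05),
    "last_name": (0.95, 0.02),
    "date_of_birth": (0.98, 0.01),
}
THRESHOLD = 95.0  # example decision threshold from the text


def deterministic_match(a: dict, b: dict) -> bool:
    """Rule from the text: match only if first and last names are identical."""
    return a["first_name"] == b["first_name"] and a["last_name"] == b["last_name"]


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Fellegi-Sunter log2 weight: positive on agreement, negative on disagreement."""
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))


def match_score(a: dict, b: dict) -> float:
    """Sum field weights, then rescale linearly so all-disagree = 0 and all-agree = 100."""
    total = sum(field_weight(m, u, a[f] == b[f]) for f, (m, u) in FIELD_PROBS.items())
    w_min = sum(field_weight(m, u, False) for m, u in FIELD_PROBS.values())
    w_max = sum(field_weight(m, u, True) for m, u in FIELD_PROBS.values())
    return 100 * (total - w_min) / (w_max - w_min)


def probabilistic_match(a: dict, b: dict) -> bool:
    """Declare a match only if the score exceeds the threshold."""
    return match_score(a, b) > THRESHOLD


a = {"first_name": "ana", "last_name": "silva", "date_of_birth": "1990-04-12"}
b = {"first_name": "ana", "last_name": "silva", "date_of_birth": "1991-04-12"}
print(deterministic_match(a, b))  # True: the names are identical
print(round(match_score(a, b)))   # 60
print(probabilistic_match(a, b))  # False: below the threshold of 95
```

In this example, the deterministic rule declares a match because it looks only at names. The probabilistic score stays near 60 because disagreement on date of birth carries a large negative weight.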

Machine learning models have also gained prominence recently. When traditional deterministic and probabilistic methods do not yield the expected results, machine learning methods are worth considering. These approaches can be computationally complex to implement, but they generally improve match performance substantially compared to deterministic methods. They also require a high level of maintenance to keep the solution up to date. Whatever the method, automated matching without thorough manual review increases the risk of wrongly identifying separate individuals as the same person (i.e., false matches). It can also cause other issues, depending on project objectives.
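One common safeguard against false matches, whatever scoring method is used, is to automate only clear-cut decisions and send borderline pairs to manual review. The Python sketch below shows this two-threshold triage. The threshold values are hypothetical and would in practice be tuned against manually reviewed pairs.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    REVIEW = "manual review"
    NON_MATCH = "non-match"


# Hypothetical thresholds on a 0-100 match score
UPPER_THRESHOLD = 95.0
LOWER_THRESHOLD = 70.0


def triage(score: float, upper: float = UPPER_THRESHOLD, lower: float = LOWER_THRESHOLD) -> Decision:
    """Classify a candidate pair: link automatically, send to a reviewer, or reject automatically."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score > upper:
        return Decision.MATCH      # confident enough to link automatically
    if score >= lower:
        return Decision.REVIEW     # uncertain: a person adjudicates the pair
    return Decision.NON_MATCH      # confident the records belong to different people


print(triage(98.0).value)  # match
print(triage(80.0).value)  # manual review
print(triage(40.0).value)  # non-match
```

Raising the upper threshold reduces false matches but sends more pairs to review. Each project weighs this trade-off against its objectives and its reviewers' capacity.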

This module is organized into three phases.

  • Phase 1: Characterize Data and Determine Matching Algorithm

  • Phase 2: Implement Matching Algorithm and Establish Adjudication Process for Potential Duplicates

  • Phase 3: Review Matching Algorithm Performance and Refine as Needed

Each phase has multiple components, some of which overlap. Furthermore, some components within the phases may flow sequentially whereas others are performed simultaneously. Lastly, some components, especially during the first phase, may be completed once, whereas others may be repeated until the optimum matching algorithm can be selected.

Matching is an ongoing process that is maintained and improved over time, potentially incorporating other elements, such as biometrics, some of which are addressed in other modules in the Identity Management Toolkit. The comprehensive framework provided here is adaptable to different circumstances and environments so that readers can select whichever parts are needed for their situation.

This module also assumes implementation of information security controls in adherence to all applicable regulations designed to protect confidentiality and privacy of personally identifiable information; the module, Governance to protect PII, provides more detail about this area.

The following steps explore how to determine an appropriate matching algorithm, how to implement the matching strategy, and how to monitor and improve its performance over time.
