Frequently Asked Questions (FAQ)

Q: What are blocking schemes?

A: Blocking schemes (also called data blocking methods) make probabilistic methods more efficient by reducing the number of comparisons needed between two or more datasets. A blocking scheme divides the datasets into smaller, manageable subsets known as "blocks." These blocks are built from attributes or features shared between records in both datasets. For example, given two datasets of people's information, we could create blocks based on the first letter of the last name, so that all records with the same initial letter fall into the same block. When a blocking scheme is applied, only records in the same block are compared.

From Bruce Dickie: The best analogy is pairing up all your socks. You start by sorting them into small piles of the same color, so you never compare a red sock with a blue sock, only with other red socks. This quickly reduces the search space.

In the context of patient matching, you can reduce the search space by almost half by splitting on gender at birth. You can reduce it further by looking only within age brackets.

You can add further blocking rules, but if they are too strict you risk accidentally excluding the correct records from comparison (false negatives).
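As a rough illustration of the blocking idea described above, the sketch below groups records by a blocking key and only pairs records that share it. The field names (`id`, `sex`, `birth_year`), the 10-year bracket, and the sample records are assumptions made for this example, not part of any particular system.

```python
from collections import defaultdict

def block_key(record):
    # Hypothetical blocking key: sex at birth plus a 10-year birth-year bracket.
    return (record["sex"], record["birth_year"] // 10)

def candidate_pairs(left, right):
    """Return only the (left id, right id) pairs whose records share a blocking key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)
    pairs = []
    for rec in left:
        for other in blocks.get(block_key(rec), []):
            pairs.append((rec["id"], other["id"]))
    return pairs

# Illustrative records only.
left = [
    {"id": "A1", "sex": "F", "birth_year": 1984},
    {"id": "A2", "sex": "M", "birth_year": 1990},
    {"id": "A3", "sex": "F", "birth_year": 1999},
]
right = [
    {"id": "B1", "sex": "F", "birth_year": 1985},
    {"id": "B2", "sex": "M", "birth_year": 1991},
    {"id": "B3", "sex": "F", "birth_year": 2000},
]

pairs = candidate_pairs(left, right)
print(len(pairs), "of", len(left) * len(right), "possible comparisons:", pairs)
```

Here only 2 of the 9 possible pairs are compared. Note that A3 (1999) and B3 (2000) land in different brackets and are never compared. If they were the same person with a one-year error in the birth year, this blocking rule would produce exactly the kind of false negative described above.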

Q: What are the deterministic and probabilistic linkage methods? How or when can methods be used together?

A: From Bruce Dickie: Deterministic rules are very simple to follow and implement, and much less computationally expensive. On good data they produce great results, but accuracy starts to drop when data isn't captured well.

This is when probabilistic approaches are required. They are more computationally intensive but more flexible in identifying matches, and on messy data they can yield higher accuracy rates.

Deterministic approaches are enough with high-quality data, for example when the data includes a robust unique identifier such as a national ID, SSN, or NHS number. In that case we simply search for other instances of the same identifier.

Likewise, when only a few fields are available, a probabilistic approach will not add much beyond what a deterministic one already covers.

If a unique identifier system like this doesn't exist (which is often the case), we need to compare multiple fields and apply a different weighting to each one, since not all fields can carry the same weight in the decision. The uniqueness, data quality, and data reliability of each field are factored into its weighting, and the weighted results are combined into a total probabilistic score.
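One way the two methods can be combined is sketched below: try a deterministic match on a unique identifier first, and fall back to a weighted score across several fields when no identifier is available. This is a simplified weighted-agreement score, not a full probabilistic linkage model. The field names, the weights, and the threshold of 80 are hypothetical values chosen for illustration.

```python
# Hypothetical field weights (summing to 100): higher for fields assumed to be
# more unique and more reliably captured.
WEIGHTS = {"family_name": 35, "given_name": 25, "birth_date": 30, "sex": 10}

def normalise(value):
    """Trim and lowercase a value so trivial formatting differences are ignored."""
    return str(value).strip().lower() if value is not None else None

def deterministic_match(a, b):
    """True only when both records carry the same non-empty unique identifier."""
    id_a, id_b = normalise(a.get("national_id")), normalise(b.get("national_id"))
    return bool(id_a) and id_a == id_b

def probabilistic_score(a, b):
    """Sum the weights of the fields on which both records agree (0 to 100)."""
    score = 0
    for field, weight in WEIGHTS.items():
        va, vb = normalise(a.get(field)), normalise(b.get(field))
        if va and va == vb:
            score += weight
    return score

def link(a, b, threshold=80):
    """Deterministic rule first; weighted score as the fallback."""
    if deterministic_match(a, b):
        return "match (deterministic)"
    if probabilistic_score(a, b) >= threshold:
        return "match (probabilistic)"
    return "non-match"

# Illustrative records only.
rec1 = {"national_id": None, "family_name": "Banda", "given_name": "Grace",
        "birth_date": "1990-04-12", "sex": "F"}
rec2 = {"national_id": "", "family_name": " banda", "given_name": "Grace",
        "birth_date": "1990-04-12", "sex": "F"}
rec3 = {"national_id": None, "family_name": "Phiri", "given_name": "Grace",
        "birth_date": "1988-01-30", "sex": "F"}

print(link(rec1, rec2))  # no identifier on either record, but all fields agree
print(link(rec1, rec3))  # only given name and sex agree
```

In practice the exact-agreement check would usually be replaced by fuzzy comparisons (for example, tolerating spelling variants in names), and the weights would be derived from the data rather than set by hand.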

Q: What is the strategy to set the match threshold in probabilistic linkages?

A: From Bruce Dickie: Choosing thresholds requires experience, a bit of trial and error, and some manual adjudication: list out the scores and review which scored pairs are true matches and which aren't. From there, you set the thresholds that yield the best results, which depends on which error matters more: accidentally rejecting matching records (false negatives) or approving non-matching records (false positives). Depending on the context, minimizing one can be more important than the other, and it is often the case that you can't minimize both.
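The trial-and-error process above can be sketched as a simple sweep: take a set of manually adjudicated pairs, count the false positives and false negatives each candidate threshold would produce, and pick the threshold with the lowest cost. The scores, the adjudication labels, and the `fp_cost` / `fn_cost` parameters below are all made up for illustration. The relative costs are the context-dependent judgment the answer refers to.

```python
def errors_at(threshold, reviewed):
    """Count (false positives, false negatives) if pairs scoring >= threshold are accepted."""
    fp = sum(1 for score, is_match in reviewed if score >= threshold and not is_match)
    fn = sum(1 for score, is_match in reviewed if score < threshold and is_match)
    return fp, fn

def best_threshold(reviewed, fp_cost=1.0, fn_cost=1.0):
    """Return the observed score that minimises the weighted error cost."""
    candidates = sorted({score for score, _ in reviewed})

    def cost(threshold):
        fp, fn = errors_at(threshold, reviewed)
        return fp * fp_cost + fn * fn_cost

    return min(candidates, key=cost)

# Hypothetical manually adjudicated pairs: (match score, reviewer says it is a true match).
reviewed = [
    (95, True), (90, True), (85, True), (80, False),
    (75, True), (70, False), (60, False), (40, False),
]

# When wrongly merging two patients is the costlier error, the threshold moves up.
strict = best_threshold(reviewed, fp_cost=5, fn_cost=1)
# When missing a true match is the costlier error, the threshold moves down.
lenient = best_threshold(reviewed, fp_cost=1, fn_cost=5)
print(strict, errors_at(strict, reviewed))
print(lenient, errors_at(lenient, reviewed))
```

With this sample, no threshold removes both error types: the stricter choice rejects one true match, while the more lenient one accepts one non-match.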

Suggestions for improvement

  • It might be worth including a step in the overall process to brainstorm a long list of scenarios to validate [the performance of the linkage method using synthetic data]. This would help with both use case definition and test scenario design. Perhaps this doc could itemise some examples of those scenarios. They would need to be tailored to each country setting. - From Jack Hilton

