Use Case Examples
Single-Site Use Case Examples
Example 1: Register or confirm registration of a patient in a facility’s EHR for provision of care.
In this use case, a patient’s information is entered into the EHR so that the individual can receive care at the facility, and the purpose of the deduplication process is to determine whether that individual is already in the system. The process is the same for a new or returning patient. To launch the process, the provider conducts a simple search using key identifiers such as last name, first name, and date of birth. (This search typically uses an exact-match function rather than a more sophisticated matching algorithm.) If the patient is not already in the system, the provider adds that individual. If the patient is already in the system, the existing record is used, so the patient is not registered again and represented more than once in the EHR.
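The search-then-register step can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular EHR: the `Patient` class, the `register_or_confirm` function, the in-memory `registry` dictionary, and the choice to trim whitespace and ignore letter case before comparing are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    last_name: str
    first_name: str
    birth_date: date


def search_key(last_name, first_name, birth_date):
    """Build the exact-match key. Trimming and case-folding are assumptions."""
    return (last_name.strip().casefold(), first_name.strip().casefold(), birth_date)


def register_or_confirm(registry, last_name, first_name, birth_date):
    """Return (patient, created); add the patient only if no exact match exists."""
    key = search_key(last_name, first_name, birth_date)
    if key in registry:
        # Returning patient: reuse the existing record instead of adding another.
        return registry[key], False
    patient = Patient(last_name.strip(), first_name.strip(), birth_date)
    registry[key] = patient
    return patient, True


registry = {}
first, created_first = register_or_confirm(
    registry, "Okafor", "Amara", date(1988, 3, 14)
)
second, created_second = register_or_confirm(
    registry, " okafor", "AMARA ", date(1988, 3, 14)
)
```

The second call finds the record created by the first, so the registry still holds one patient. An exact match of this kind would miss a misspelled name or a mistyped date, which is one way duplicates enter a system.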
Example 2: Deduplicate a site’s patient dataset for the purpose of research.
For instance, a researcher who wants to know the prevalence of diabetes in the site’s population, and who knows that duplicates are sometimes introduced into the EHR, looks for those duplicates and collapses the records so that the dataset contains only unique patients. If the duplicates were not removed, the same patient would be counted more than once and the prevalence estimate could be distorted (overestimated, for example, if patients with diabetes are more likely to be duplicated). Deduplication thus provides a more accurate count.
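A small worked example shows the effect on the estimate. The records, the field names, and the rule that two records are duplicates when last name, first name, and date of birth all agree are invented for illustration; a real research dataset would need a more careful matching rule. In this made-up dataset the duplicated patient has diabetes, so the raw figure is too high.

```python
records = [
    {"last": "Diaz", "first": "Elena", "dob": "1970-05-02", "diabetes": True},
    {"last": "Diaz", "first": "Elena", "dob": "1970-05-02", "diabetes": True},  # duplicate
    {"last": "Mbeki", "first": "Thabo", "dob": "1982-11-19", "diabetes": False},
    {"last": "Singh", "first": "Priya", "dob": "1995-01-30", "diabetes": False},
    {"last": "Novak", "first": "Jan", "dob": "1964-08-08", "diabetes": False},
]


def prevalence(rows):
    """Share of rows flagged as having diabetes."""
    return sum(r["diabetes"] for r in rows) / len(rows)


def collapse_duplicates(rows):
    """Keep one row per (last, first, dob); flag diabetes if any copy has it."""
    merged = {}
    for r in rows:
        key = (r["last"], r["first"], r["dob"])
        if key not in merged:
            merged[key] = dict(r)
        else:
            merged[key]["diabetes"] = merged[key]["diabetes"] or r["diabetes"]
    return list(merged.values())


unique_patients = collapse_duplicates(records)
raw_estimate = prevalence(records)              # 2 of 5 records
deduplicated_estimate = prevalence(unique_patients)  # 1 of 4 patients
```

Before deduplication the estimate is 2/5 = 0.4; after collapsing the duplicate it is 1/4 = 0.25.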
Example 3: Deduplicate a site’s patient dataset for administrative/clinical purposes.
Sometimes a single site or facility wants to deduplicate its patient data for its own purposes. Some organizations, aware that their data has too many duplicates, perform the deduplication process simply to clean up their patient data. This may be done on a one-time basis or periodically, and, depending on the organization’s own resources, it may be performed by in-house analysts or by one of the software companies that offer deduplication services.
Multi-Source Use Case Examples
Example 4: Linking across service sites (clinical, lab, images, pharmacy, etc.) for longitudinal care management.
This use case is to ensure that clinicians have access to as comprehensive a record as possible in providing care for a patient and submitting requests to support patient care. Having a complete record can, for instance, ensure that tests are not repeated unnecessarily and that newly prescribed medications are compatible with ongoing prescriptions.
Example 5: Linking for project-based research.
In this use case, a researcher is conducting a study using a cohort or population of patients that crosses multiple care sites in order to answer one or more research questions. The goal is to have data that is as complete as possible in order to draw the most accurate conclusions.
Example 6: Linking and deduplication for population-based surveillance.
In a syndromic surveillance system for a state or country, for example, the system receives data from emergency department (ED) visits in real time, and the same person may receive care in an ED more than once over a short period. In the surveillance system’s repository, that person should appear as a single person, not as several. In this use case, when each ED visit occurs, the health system sends registration data on the person to the surveillance system, which then uses a matching algorithm to check whether that person already exists in the repository. If so, any additional data is added under that person’s identity; if not, the person is added to the system as a new patient. Duplicates are created when a person who has been registered before is not recognized as such.
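The match-or-add flow described above can be sketched as follows. The matching rule here (identical date of birth plus a name-similarity score from Python's standard `difflib` module at or above an assumed threshold of 0.85) is only a stand-in for whatever matching algorithm a real surveillance system uses, and the `Repository` class and its field names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import count

THRESHOLD = 0.85  # assumed cut-off for illustration; real systems tune this


def name_similarity(a, b):
    """Similarity ratio between two names, ignoring letter case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_match(incoming, existing):
    """Stand-in matching rule: same date of birth and similar full name."""
    if incoming["dob"] != existing["dob"]:
        return False
    full_in = f'{incoming["first"]} {incoming["last"]}'
    full_ex = f'{existing["first"]} {existing["last"]}'
    return name_similarity(full_in, full_ex) >= THRESHOLD


class Repository:
    def __init__(self):
        self.persons = {}  # person_id -> demographics
        self.visits = {}   # person_id -> list of visits
        self._ids = count(1)

    def record_visit(self, demographics, visit):
        """Attach the visit to a matching person, or add a new person."""
        for person_id, person in self.persons.items():
            if is_match(demographics, person):
                self.visits[person_id].append(visit)
                return person_id
        person_id = next(self._ids)
        self.persons[person_id] = demographics
        self.visits[person_id] = [visit]
        return person_id


repo = Repository()
a = repo.record_visit({"first": "Jon", "last": "Andersen", "dob": "1990-04-12"}, "ED visit 1")
b = repo.record_visit({"first": "John", "last": "Andersen", "dob": "1990-04-12"}, "ED visit 2")
c = repo.record_visit({"first": "Maria", "last": "Lopez", "dob": "1990-04-12"}, "ED visit 3")
```

Here the second visit is attached to the person created by the first despite the different spelling of the first name, while the third visit creates a new person. With an exact-match rule instead, "Jon" and "John" would have produced a duplicate.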
Example 7: Linking and deduplication for a national data repository.
In this use case, new clinical data is sent to the national data repository along with the patient’s identity, and the repository searches for or attempts to match that identity to see whether the patient already has data there. If so, the new data is added to the existing record; if no prior record exists, a new identity is added to the repository.
Example 8: Linking and deduplication for a health information exchange.
A health information exchange (HIE) is a network of sites that may or may not use the same record system but have agreed to share patient data at the community, state, regional, national, or organizational level. Though an HIE use case may be performed in the context of care delivery, public health, or analytics, this example falls into the analytics category because it serves administrative purposes. In this use case, the HIE wants to identify the geographic locations where patients are achieving poor outcomes, so that the health system can build a new clinic in the region with the greatest health needs. The datasets for all the HIE’s clinical sites would be linked and deduplicated to provide an analysis that is complete and accurate, without overreporting caused by multiple visits from the same patients.
Possible other examples to be added:
Linking and deduplication for national/regional data networks.
Linking and deduplication for patient registries with a focus on longitudinal data.
Note about multi-site use cases:
In these and other multi-site use cases, it is important to understand that cross-site matching introduces complexities absent in single-site matching. Different sites or data sources may use varying data standards, formats, and identifiers, which may increase matching complexity. Cross-site linking also raises data governance and privacy issues, as well as questions about the reliability of shared identifiers.
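One source of this added complexity, differing formats, can be illustrated with a short sketch. The site names, their date formats, and the field names below are hypothetical; the point is only that records usually need to be brought into a common form before they can be compared across sites.

```python
from datetime import datetime

# Hypothetical sites and the date formats each is assumed to use.
SITE_DATE_FORMATS = {
    "clinic_a": "%Y-%m-%d",
    "lab_b": "%d/%m/%Y",
}


def normalize(site, record):
    """Convert a site-specific record into a common comparison form."""
    dob = datetime.strptime(record["dob"], SITE_DATE_FORMATS[site]).date()
    return {
        # Collapse repeated whitespace and ignore letter case in names.
        "last": " ".join(record["last"].split()).casefold(),
        "first": " ".join(record["first"].split()).casefold(),
        # Store dates in one unambiguous format.
        "dob": dob.isoformat(),
    }


from_clinic = normalize(
    "clinic_a", {"last": "Nguyen", "first": "Thi  Lan", "dob": "1979-02-07"}
)
from_lab = normalize(
    "lab_b", {"last": "NGUYEN ", "first": "thi lan", "dob": "07/02/1979"}
)
```

After normalization the two records are identical, although their raw forms differ in letter case, spacing, and date format. Normalization of this kind does not address the governance, privacy, or identifier-reliability questions, which are not technical formatting problems.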
Use Cases for Privacy-Preserving Record Linkage (PPRL)
In some countries, national laws or policies require patient identifiers to be anonymized before matching. This situation may call for privacy-preserving record linkage (PPRL), a preliminary stage to linking that may be used to support care delivery, analytics, or public health surveillance. Some countries, for example, do not allow a national ID number to be used in health care. Others do not allow identifiers to be stored in clear text or to be transferred out of the point-of-care system. If identifiers must remain in the point-of-care system, PPRL is needed to preserve privacy while still allowing the dataset to be used in multi-source linkages.
PPRL methods protect sensitive and personally identifiable demographic data by obfuscating patient data before a linkage is performed, reducing the exposure of clear-text data and allowing records to be linked without revealing identifying information. These methods may include encryption (which converts plain text into a scrambled, unreadable format known as ciphertext) and hashing (which converts data of any size into a fixed-size hash value or hash code, which acts as a digital fingerprint of the data), as well as PPRL tokens, which are combinations of demographic and other identifier variables that define a match between two records. For example, one token may be defined as agreement on {last name+first initial of first name+sex+birth date}, while another may be defined as agreement on {patient identifier number+first name}. Tokens can serve as inputs to a variety of matching models.
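The two example tokens can be sketched with keyed hashing from Python's standard `hmac` and `hashlib` modules. This is a simplified illustration under stated assumptions, not a complete PPRL method: the shared key, the field names, the normalization, and the helper functions are all hypothetical, and the sketch supports exact agreement on tokens only.

```python
import hashlib
import hmac

# Placeholder secret assumed to be agreed between the linking parties.
SHARED_KEY = b"example-key-agreed-by-linking-parties"


def _token(parts):
    """Hash the normalized parts so the clear-text values are not shared."""
    message = "|".join(p.strip().casefold() for p in parts).encode("utf-8")
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


def make_tokens(rec):
    """Build the two example tokens from one record."""
    return {
        # {last name + first initial of first name + sex + birth date}
        "t1": _token([rec["last"], rec["first"].strip()[:1], rec["sex"], rec["dob"]]),
        # {patient identifier number + first name}
        "t2": _token([rec["patient_id"], rec["first"]]),
    }


def agreeing_tokens(a, b):
    """Names of the tokens on which two records agree."""
    return sorted(name for name in a if a[name] == b[name])


site_a = make_tokens({"last": "Haddad", "first": "Layla", "sex": "F",
                      "dob": "1985-09-21", "patient_id": "A-1001"})
site_b = make_tokens({"last": "haddad", "first": "Leila", "sex": "F",
                      "dob": "1985-09-21", "patient_id": "B-7734"})
agreement = agreeing_tokens(site_a, site_b)
```

In this example the records agree on the first token but not the second, because the first names are spelled differently and the site identifiers differ. Only the hashed tokens would be exchanged; which token agreements count as a match is left to the matching model.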
PPRL can be used for the same types of use cases as clear-text data. It’s usually done in contexts where people are concerned about the risk of reidentification or the exposure of personal identity. Where it is possible to use PPRL, we’re seeing increasing use of it because it mitigates the risk of reidentification while still providing the value of linking multiple datasets or, much less often, deduplicating a single dataset.