Types of PIM Use Cases

PIM use cases fall into two broad types based on their scope:

those involving a single site’s dataset, and
those involving datasets from multiple sites or sources.

The word “site” here refers to a point-of-care system or other facility or service that is a single source of data. A single site could, for example, be a local health care clinic. That clinic may provide its datasets to be analyzed with other clinics’ datasets, resulting in a multi-site analysis. Other sources of data may be a national disease registry or health data exchange.

Another way to think about use cases is to categorize them by their context. Three typical contexts are:

point-of-care delivery (continuity of care),
public health (calculation of population indicators), and
analytics (which includes research, administrative evaluations, and other uses not related to clinical care or public health).

Use cases can also be categorized by their objective: deduplication, linking, or both:

In deduplication, data matching processes are used on a single site’s dataset to determine whether multiple records belong to the same person and to avoid or remove redundant data, so that each person is represented only once in the dataset. This process typically entails using identifiers (such as last name, first name, and date of birth) to determine if, for instance, Kirsten Kumara is the same person as K. H. Kumara. Best practice is to eliminate as many duplicates as possible within one dataset before linking it with other datasets.
In linking, datasets from multiple sites or different data sources are combined to determine patient matches across them. Typically, those datasets consist of records that have already been deduplicated at the site level. Datasets may be from multiple sites where a person has received care or from other data sources such as national disease registries in which the patient is listed.
In deduplication and linking, the linkage system performs both operations on datasets from multiple sources to get the most complete and accurate combined dataset possible. The two are most often done simultaneously on the combined dataset, or sometimes sequentially if deduplication is needed before the datasets are combined and linked. If the linkage system operator has a high degree of confidence that each dataset has no duplicates, then the datasets can be linked without deduplication; if the operator lacks that confidence, then both deduplication and linking are needed. Most data analysts would rather deduplicate each time they link across sites, because it is best practice to look for and remove duplicates whenever possible.
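
The three objectives above can be illustrated with a minimal Python sketch. It is not a production matching method: it assumes a simple deterministic match on last name, first initial, and date of birth, whereas real linkage systems often use fuzzy or probabilistic matching. The names `PatientRecord`, `match_key`, `deduplicate`, `link`, and `trusted_clean` are hypothetical, and the sample records (including the date of birth) are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    site: str
    record_id: str
    first_name: str
    last_name: str
    date_of_birth: date


def match_key(rec):
    """Assumed matching rule: last name, first initial, date of birth."""
    first_initial = rec.first_name.strip().lower()[:1]
    return (rec.last_name.strip().lower(), first_initial, rec.date_of_birth)


def deduplicate(records):
    """Keep the first record seen for each match key within one dataset."""
    seen = {}
    for rec in records:
        seen.setdefault(match_key(rec), rec)
    return list(seen.values())


def link(datasets, trusted_clean=False):
    """Group records from several datasets by match key.

    Each dataset is deduplicated first unless the operator is
    confident it already contains no duplicates (trusted_clean=True).
    """
    groups = {}
    for records in datasets:
        if not trusted_clean:
            records = deduplicate(records)
        for rec in records:
            groups.setdefault(match_key(rec), []).append(rec)
    return groups


# Made-up sample data: one clinic with a duplicate, plus a registry entry.
dob = date(1990, 4, 12)
clinic_a = [
    PatientRecord("clinic_a", "A-1", "Kirsten", "Kumara", dob),
    PatientRecord("clinic_a", "A-2", "K. H.", "Kumara", dob),
]
registry = [PatientRecord("registry", "R-9", "Kirsten", "Kumara", dob)]

single_site = deduplicate(clinic_a)        # deduplication only
multi_site = link([clinic_a, registry])    # deduplication and linking
```

Passing `trusted_clean=True` corresponds to linking without deduplication, which in this sample would carry the clinic's duplicate record into the linked group.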

Categorizing use cases in these distinct ways results in the taxonomy shown in this table, with contexts in columns and objectives in rows. The examples noted are described in the next two sections; SS denotes a single-site example and MS a multi-site example:

CONTEXT: Care Delivery | Public Health | Analytics

Deduplication (SS example 1)
Deduplication (SS examples 2 & 3)
Linking (MS example 4)
Linking (MS example 5)