Patient De-identification

Updated: 19 December, 2018
Overview
Following industry best practices, RECOMIA uses a standards-based approach to the de-identification of DICOM images to ensure that images are free of protected health information (PHI). Our de-identification process is developed in accordance with the requirements set forth by the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA). These requirements are defined in section 164.514(b)(2) of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The standard for de-identification of DICOM objects is defined in DICOM Standard PS 3.15-2011, Digital Imaging and Communications in Medicine (DICOM), Part 15: Security and System Management Profiles.
 

The de-identification process in RECOMIA happens automatically, and all data is de-identified locally at the submitting site before it is uploaded via secure communication to the RECOMIA servers.

What is Protected Health Information (PHI)?
PHI is defined as "individually identifiable health information". In other words, it is information that can be used to directly or indirectly identify an individual in relation to the individual’s past, present or future health condition and the provision of health care to the individual. Common types of PHI include: patient name, address, birth date, social security number, medical and laboratory reports, physician name, hospital name, and date of examination. PHI can be embedded in both DICOM tags and pixel data.
The RECOMIA de-identification process
The process of de-identification, by which PHI is removed from the health information in the data processed by RECOMIA, mitigates privacy risks to individuals and thereby supports the use of data for scientific research and other endeavours. The de-identification process in RECOMIA is a largely automated 3-step process in which two de-identification methods are deployed: 1) automated redaction of individual PHI identifiers (in DICOM tags and pixel data), and 2) formal determination by a qualified expert (i.e., only a qualified individual can determine when PHI has been properly removed). Both methods run locally (in your web browser), which means that no PHI leaves your closed network. Upon successful completion of the de-identification process, the de-identified data is automatically uploaded via secure communication to the RECOMIA servers.
Note, however, that both methods, even when properly applied, yield de-identified data that retains some risk of identification. Although the risk is very small, it is not zero, and there is a possibility that de-identified data could be linked back to the identity of the patient to whom it corresponds. Regardless, neither HIPAA, the EMA, nor the GDPR restricts the use or disclosure of de-identified health information, as it is no longer considered protected health information. Data processed by RECOMIA is an example of de-identified health information.
Step 1: Automated redaction of individual PHI identifiers stored in DICOM tags is the first step in our de-identification process. This step conforms to the current DICOM standard to ensure that data processed in RECOMIA is transformed using approved data-reduction techniques, such as generalisation (grouping values into categories) and suppression/masking (removing specific values, or whole records, from the dataset). See the list of de-identified DICOM tags.
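The two reduction techniques named in Step 1 can be illustrated with a minimal Python sketch. It assumes a dataset represented as a plain dictionary keyed by DICOM keyword rather than a real DICOM parser, and the choice of fields (a year-only study date, a suppressed name and birth date) is purely illustrative, not a description of RECOMIA's actual tag rules.

```python
from typing import Dict, Iterable

Dataset = Dict[str, str]  # DICOM keyword -> value (simplified, hypothetical model)


def generalise_date_to_year(value: str) -> str:
    """Generalisation: replace a DICOM date (YYYYMMDD) by its year category."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {value!r}")
    return value[:4]


def suppress(ds: Dataset, keywords: Iterable[str]) -> Dataset:
    """Suppression: return a copy of the dataset with the given fields removed."""
    drop = set(keywords)
    return {k: v for k, v in ds.items() if k not in drop}


record = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19500412",
    "StudyDate": "20180314",
    "Modality": "CT",
}
cleaned = suppress(record, ["PatientName", "PatientBirthDate"])
cleaned["StudyDate"] = generalise_date_to_year(cleaned["StudyDate"])
print(cleaned)  # {'StudyDate': '2018', 'Modality': 'CT'}
```

A year-only string is not a valid DICOM date value, so a production tool would more likely blank or shift the date; the point here is only the difference between grouping a value into a category and removing it.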
Step 2: The second step in the RECOMIA de-identification process involves our optical character recognition (OCR) engine. In this step, all images of types commonly known to store PHI (such as x-ray and mammography) are scanned for characters embedded directly in the pixel data. This happens automatically, and if PHI (or what our engine believes is PHI) is detected, the affected data is invalidated. Invalidated data cannot be uploaded and requires manual expert review, as explained in Step 3.
Step 3: Invalidated data is automatically sent to manual expert review. During the review, the person responsible for the data upload gets access to an OCR report detailing all detected characters. After careful consideration, the invalidated data can either be omitted from the upload or be manually validated and then proceed to upload. Manual validation is done either by redaction (the detected characters are blanked out) or by acceptance (when the detected characters do not contain PHI).
Steps 1 to 3 are performed locally in your web browser and ensure that no data containing PHI is uploaded to the RECOMIA servers. After successful de-identification, validated data is uploaded via secure communication to the RECOMIA servers.
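The upload decision described in Steps 1 to 3 can be sketched as a small gate function. This is a hypothetical illustration, not RECOMIA's code: it assumes Step 1 has already run on each object, that the OCR engine returns a list of detected text strings per object, and the names `Review`, `may_upload`, and the queue entries are invented for the example.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Review(Enum):
    OMIT = "omit"      # expert excludes the object from the upload
    REDACT = "redact"  # detected characters are blanked out, then uploaded
    ACCEPT = "accept"  # expert judges the detected characters not to be PHI


def may_upload(detected_text: List[str], review: Optional[Review] = None) -> bool:
    """Return True if one (already tag-redacted) DICOM object may be uploaded.

    detected_text is the hypothetical output of the Step 2 OCR scan.
    """
    if not detected_text:
        return True  # nothing detected in the pixel data: object stays valid
    if review is None:
        return False  # invalidated and not yet reviewed: never uploaded
    # Redaction and acceptance both validate the object; omission does not.
    return review in (Review.REDACT, Review.ACCEPT)


# Hypothetical upload queue: object name -> (OCR hits, expert review decision)
queue: Dict[str, Tuple[List[str], Optional[Review]]] = {
    "ct_slice_001": ([], None),
    "xray_chest": (["DOE JANE 1950"], None),
    "mammo_left": (["L CC"], Review.ACCEPT),
    "xray_hand": (["DOE JANE"], Review.OMIT),
}

uploadable = [name for name, (hits, review) in queue.items() if may_upload(hits, review)]
print(uploadable)  # ['ct_slice_001', 'mammo_left']
```

The key property is that an object with OCR hits can only reach the upload list through an explicit expert decision.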
Details

Base level de-identification 

Patient Name and Patient ID are either blanked or modified. RECOMIA does not perform ID mapping between the original Patient ID and the ID that the images will have within RECOMIA. Any mapping performed manually at the submitting site is the sole responsibility of the submitting site, and RECOMIA never sees the original Patient ID. Such data is defined as pseudo-de-identified data. To show that the patient identity has been removed, the value “YES” is written into DICOM tag (0012,0062) Patient Identity Removed.
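A minimal sketch of this base-level step follows, assuming a dataset stored as a dictionary keyed by (group, element) tuples. The class name `SessionAnonymiser` and the `ANON-` prefix are hypothetical. It also assumes that objects belonging to the same patient within one upload session should receive the same replacement ID; that lookup table lives only in memory and is never written or uploaded, so no persistent mapping to the original Patient ID exists.

```python
import secrets
from typing import Dict, Tuple

Tag = Tuple[int, int]  # (group, element)
PATIENT_NAME: Tag = (0x0010, 0x0010)
PATIENT_ID: Tag = (0x0010, 0x0020)
PATIENT_IDENTITY_REMOVED: Tag = (0x0012, 0x0062)


class SessionAnonymiser:
    """Blanks Patient Name and replaces Patient ID for one upload session."""

    def __init__(self) -> None:
        # In memory only: discarded with the session, never stored or uploaded.
        self._replacements: Dict[str, str] = {}

    def _replacement_id(self, original_id: str) -> str:
        if original_id not in self._replacements:
            # Random token, so the new ID cannot be derived from the original.
            self._replacements[original_id] = "ANON-" + secrets.token_hex(8)
        return self._replacements[original_id]

    def deidentify(self, ds: Dict[Tag, str]) -> Dict[Tag, str]:
        out = dict(ds)
        out[PATIENT_NAME] = ""  # blanked
        out[PATIENT_ID] = self._replacement_id(ds.get(PATIENT_ID, ""))  # modified
        out[PATIENT_IDENTITY_REMOVED] = "YES"
        return out


session = SessionAnonymiser()
first = session.deidentify({PATIENT_NAME: "DOE^JANE", PATIENT_ID: "12345"})
second = session.deidentify({PATIENT_NAME: "DOE^JANE", PATIENT_ID: "12345"})
print(first[PATIENT_ID] == second[PATIENT_ID])  # True
```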

Exam Identifiers

DICOM makes extensive use of unique identifiers (UIDs) that could be used to identify a subject if a user had access to the PACS at the institution where the images originated. RECOMIA replaces each original UID with a new UID generated under its own root UID, and the original UID is removed. UIDs have no special meaning other than serving as unique identifiers. Because each original UID is replaced consistently, images stay associated with the appropriate series, study, and subject, and references between images (for example, from secondary capture images, structured reports, or PET/CT) remain valid references to images within RECOMIA.
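One way to replace UIDs consistently is to derive each new UID from the original with a keyed hash, so that the same original UID always maps to the same new UID and cross-references stay intact. The sketch below is an assumption, not the documented RECOMIA method: the root `1.2.3.4` is a placeholder, the secret key is hypothetical, and HMAC-SHA256 is chosen so that someone with access to the original PACS cannot recompute the mapping without the key. The result stays within the 64-character limit on DICOM UIDs.

```python
import hashlib
import hmac

ROOT = "1.2.3.4"  # hypothetical placeholder, not RECOMIA's real root UID
KEY = b"hypothetical site-local secret"  # assumed to never leave the submitting site
MAX_UID_LENGTH = 64  # DICOM limit for UID values


def remap_uid(original: str, root: str = ROOT, key: bytes = KEY) -> str:
    """Map an original UID to a new UID under `root`, deterministically per key."""
    digest = hmac.new(key, original.encode("ascii"), hashlib.sha256).digest()
    # Decimal suffix (no leading zeros) short enough to fit the length limit.
    suffix = int.from_bytes(digest, "big") % 10**38
    new_uid = f"{root}.{suffix}"
    if len(new_uid) > MAX_UID_LENGTH:
        raise ValueError("root UID too long for the chosen suffix length")
    return new_uid


# Made-up UIDs: an image, and a structured report entry that references it.
image_uid = "1.2.999.7.1"
reference_in_report = "1.2.999.7.1"
print(remap_uid(image_uid) == remap_uid(reference_in_report))  # True
```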

Patient Demographics

The Retain Patient Characteristics Option allows some patient demographics to be kept for research purposes. The allowed fields are Patient’s Sex, Patient’s Age, Patient’s Size, Patient’s Weight, Ethnic Group, Smoking Status, and Pregnancy Status. If a subject is over 89 years of age, the age must be listed as 90+. Allergies, Patient State (not where the patient lives, but their condition), Pre-Medication, and Special Needs are defined by the DICOM standard as “clean” attributes; they are kept by RECOMIA and examined for PHI along with all other tags during curation. Other patient demographics, such as birth date, address, and religious affiliation, are removed or emptied.
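The allow-list and age cap described above can be sketched as follows. The sketch assumes patient demographics held in a dictionary keyed by DICOM keyword and a Patient’s Age in the DICOM Age String format (for example `093Y`); it applies the HIPAA Safe Harbor rule that ages over 89 are aggregated into a single 90-or-older category. Function names are hypothetical.

```python
from typing import Dict

# Attributes allowed by the Retain Patient Characteristics Option (DICOM keywords).
RETAINED_CHARACTERISTICS = frozenset({
    "PatientSex", "PatientAge", "PatientSize", "PatientWeight",
    "EthnicGroup", "SmokingStatus", "PregnancyStatus",
})


def cap_age(age: str) -> str:
    """Report ages over 89 years as '90+'; other Age Strings pass through."""
    number, unit = age[:3], age[3:]
    if len(age) == 4 and number.isdigit() and unit == "Y" and int(number) > 89:
        return "90+"
    return age


def keep_patient_characteristics(demographics: Dict[str, str]) -> Dict[str, str]:
    """Keep only allow-listed demographics; everything else is dropped."""
    kept = {k: v for k, v in demographics.items() if k in RETAINED_CHARACTERISTICS}
    if "PatientAge" in kept:
        kept["PatientAge"] = cap_age(kept["PatientAge"])
    return kept


print(keep_patient_characteristics({
    "PatientSex": "F",
    "PatientAge": "093Y",
    "PatientBirthDate": "19250301",
    "PatientAddress": "Example Street 1",
}))  # {'PatientSex': 'F', 'PatientAge': '90+'}
```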

Free Text

The following free-text fields are removed by RECOMIA during the curation process: Allergies, Patient State, Study Description, Series Description, Admitting Diagnoses Description, Admitting Diagnoses Code Sequence, Derivation Description, Identifying Comments, Medical Alerts, Occupation, Additional Patient History, Patient Comments, Contrast Bolus Agent, Protocol Name, Acquisition Device Processing Description, Acquisition Comments, Acquisition Protocol Description, Contribution Description, Image Comments, Frame Comments, Reason for Study, Requested Procedure Description, Requested Contrast Agent, Study Comments, Discharge Diagnosis Description, Service Episode Description, Visit Comments, Scheduled Procedure Step Description, Performed Procedure Step Description, Comments on Performed Procedure Step, Requested Procedure Comments, Reason for Imaging Service Request, Imaging Service Request Comments, Interpretation Text, Interpretation Diagnosis Description, Impressions, and Results Comments.
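Removing free-text fields amounts to deleting attributes by name. The sketch below uses the DICOM keywords for only a subset of the fields listed above and assumes a dataset held in a dictionary keyed by keyword; it is an illustration rather than RECOMIA's script, and `remove_free_text` is a hypothetical name.

```python
from typing import Dict

# DICOM keywords for a subset of the free-text fields listed above.
FREE_TEXT_KEYWORDS = frozenset({
    "StudyDescription", "SeriesDescription", "DerivationDescription",
    "MedicalAlerts", "Occupation", "AdditionalPatientHistory",
    "PatientComments", "ProtocolName", "ImageComments",
    "ReasonForStudy", "RequestedProcedureDescription", "StudyComments",
})


def remove_free_text(ds: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the dataset without any of the free-text attributes."""
    return {k: v for k, v in ds.items() if k not in FREE_TEXT_KEYWORDS}


print(remove_free_text({
    "Modality": "CT",
    "StudyDescription": "CT THORAX, follow-up for J. Doe",
    "ProtocolName": "Chest routine",
}))  # {'Modality': 'CT'}
```

Deleting the whole attribute, rather than scanning its text for names, is the conservative choice: free text can hold PHI in any form, so it is simpler to guarantee its absence than to detect it.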

Devices

The Retain Device Identity Option of the DICOM de-identification standard allows for the retention of information related to the scanner used. The option allows for the following relevant tags to be retained: Station Name, Device Serial Number, Device UID, Plate ID, Generator ID, Cassette ID, Gantry ID, Detector ID, Scheduled Study Location, Scheduled Study Location AE Title, Scheduled Station AE Title, Scheduled Station Name, Scheduled Procedure Step Location, Performed Station AE Title, Performed Station Name, Performed Station Name Code Sequence, Scheduled Station Name Code Sequence, Scheduled Station Geographic Location Code Sequence, and Performed Station Geographic Location Code Sequence.  The tags listed above are retained if they are found to be free of PHI after RECOMIA curation of the submitted DICOM objects.
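Conditional retention of device tags can be sketched with a pluggable check. Here `is_phi_free` is a hypothetical stand-in for the curation step, only the first eight keywords from the list above are included, and the naive surname check in the usage example is purely illustrative.

```python
from typing import Callable, Dict

# DICOM keywords for the first eight device attributes listed above.
DEVICE_KEYWORDS = frozenset({
    "StationName", "DeviceSerialNumber", "DeviceUID", "PlateID",
    "GeneratorID", "CassetteID", "GantryID", "DetectorID",
})


def retain_device_identity(
    ds: Dict[str, str], is_phi_free: Callable[[str, str], bool]
) -> Dict[str, str]:
    """Keep device attributes that pass the PHI check; drop those that fail.

    Attributes that are not device attributes are passed through unchanged.
    """
    return {
        k: v for k, v in ds.items()
        if k not in DEVICE_KEYWORDS or is_phi_free(k, v)
    }


def naive_check(keyword: str, value: str) -> bool:
    # Illustrative only: flag values containing a known patient surname.
    return "DOE" not in value.upper()


print(retain_device_identity(
    {"StationName": "CT01", "DeviceSerialNumber": "DOE-PRIVATE-RIG", "Modality": "CT"},
    naive_check,
))  # {'StationName': 'CT01', 'Modality': 'CT'}
```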

Private Tags

When a submitting site sends DICOM data to RECOMIA, all private tags are removed.

Table 1

All tags with an odd group number (private tags) are deleted. Table 1 details the de-identification performed for tags with an even group number at the submitting site by way of a RECOMIA-supplied de-identification script.
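Because private data elements are exactly those with an odd group number, removing them needs only a parity test on the group. A minimal sketch, assuming tags are represented as (group, element) integer tuples; the function names and the sample values are hypothetical.

```python
from typing import Dict, Tuple

Tag = Tuple[int, int]  # (group, element)


def is_private(tag: Tag) -> bool:
    """Private data elements are those whose group number is odd."""
    return tag[0] % 2 == 1


def drop_private_tags(ds: Dict[Tag, str]) -> Dict[Tag, str]:
    """Return a copy of the dataset keeping only even-group (standard) tags."""
    return {tag: value for tag, value in ds.items() if not is_private(tag)}


dataset = {
    (0x0008, 0x0060): "CT",             # Modality
    (0x0009, 0x0010): "ACME_VENDOR_1",  # private creator (made-up value)
    (0x0029, 0x1010): "vendor blob",    # private data element
    (0x0010, 0x0040): "F",              # Patient's Sex
}
print(sorted(drop_private_tags(dataset)))  # [(8, 96), (16, 64)]
```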