Anonymizing medical images

(in English)

Challenge

The task is to anonymize medical images produced by MRI and PET scanners.

Sort and anonymize 695'799 brain images

The data is stored in DICOM format, the industry standard for storing and transporting medical images. The images come from a multi-hospital clinical trial. On disk, they comprise 695'799 files with meaningless names. The files date from a time when scanner vendors stored DICOM files in their own DICOM "dialects" that deviated from the evolving standard in various ways.
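To make the "sorting" part of the task concrete, here is a minimal Python sketch of how a descriptive path could be derived from header values to replace a meaningless file name. The folder layout and the `sorted_path` function are assumptions made for illustration, not the layout used in the project. The keys are standard DICOM keywords, but the headers are passed in as a plain dictionary instead of being parsed from a real file.

```python
from pathlib import PurePosixPath


def sorted_path(headers):
    """Build a descriptive relative path from header values.

    Hypothetical layout: patient / study date / series / instance.
    """
    series = int(headers["SeriesNumber"])
    instance = int(headers["InstanceNumber"])
    return PurePosixPath(
        headers["PatientID"],
        headers["StudyDate"],
        f"{series:03d}_{headers['Modality']}",
        f"{instance:05d}.dcm",
    )


# Invented example values, already pseudonymized.
example_headers = {
    "PatientID": "SUBJ-001",
    "StudyDate": "20050314",
    "SeriesNumber": "4",
    "Modality": "MR",
    "InstanceNumber": "17",
}
print(sorted_path(example_headers))
```

With vendor "dialects" in play, real header values may be missing or contain characters that are unsafe in file names, so a production version would need validation and fallbacks that this sketch leaves out.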

Each file has hundreds of "header fields" potentially containing personal data

Additionally, each file contains hundreds of so-called "headers" with metadata about the image acquisition. Many of them refer to the patient, to hospital names or to other identifying information. Some headers are important for analysis and must be kept, while others need to be modified or removed during anonymization.
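The split between headers that are kept and headers that are changed can be pictured as a rule table. The following Python sketch is only an illustration: it works on a plain dictionary rather than on parsed DICOM data, and the rule table, the action names and the `anonymize_headers` function are hypothetical, not taken from the project's software. Headers without a rule are dropped, which is a deliberately cautious default assumed here.

```python
# Hypothetical rule table: header keyword -> (action, argument).
RULES = {
    "PatientName": ("replace", "ANONYMOUS"),
    "PatientBirthDate": ("remove", None),
    "InstitutionName": ("remove", None),
    "Modality": ("keep", None),
    "SliceThickness": ("keep", None),
}


def anonymize_headers(headers, rules=RULES, default="remove"):
    """Return a new dict of headers; the input dict is left untouched."""
    result = {}
    for keyword, value in headers.items():
        action, argument = rules.get(keyword, (default, None))
        if action == "keep":
            result[keyword] = value
        elif action == "replace":
            result[keyword] = argument
        elif action == "remove":
            continue
        else:
            raise ValueError(f"unknown action {action!r} for {keyword}")
    return result


# Invented example values.
example = {
    "PatientName": "Doe^Jane",
    "PatientBirthDate": "19500101",
    "InstitutionName": "Example Hospital",
    "Modality": "MR",
    "SliceThickness": "1.0",
    "OperatorsName": "Smith^John",  # no rule, so it is dropped
}
print(anonymize_headers(example))
```

Dropping unknown headers trades completeness for safety: a vendor-specific field that nobody anticipated cannot leak, but a field needed for analysis has to be added to the table explicitly.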

US and EU privacy regulation

Anonymization needs to be done in accordance with HIPAA, but also with the more stringent EU regulation. Since the results are not used for an FDA submission, process validation according to 21 CFR Part 11 is not required. However, the anonymization process needs to be reproducible, and the customer needs to be able to run their own quality checks.

Approach

In this project, we develop our own anonymization software and build it into a web-based and open-source DICOM toolkit. The DICOM files never leave the user's computer and are anonymized locally.

A website to anonymize DICOM files

A lot of effort is spent on making the page fast and memory-efficient, so that a 2014 laptop does not crash while reading, anonymizing, sorting and writing back roughly 700'000 files one by one.
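The principle of handling files one by one can be sketched in a few lines. The actual tool runs in the browser, so the Python code below is not its implementation; it only illustrates the idea of never holding more than one file in memory. The function names `iter_files` and `process_tree` and the `transform` callback are hypothetical.

```python
import os
from pathlib import Path


def iter_files(root):
    """Yield file paths lazily instead of building a list of all files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield Path(dirpath) / name


def process_tree(source_root, target_root, transform):
    """Read, transform and write files one at a time; return the file count.

    target_root must lie outside source_root so outputs are not re-read.
    """
    source_root, target_root = Path(source_root), Path(target_root)
    count = 0
    for path in iter_files(source_root):
        data = path.read_bytes()  # only this one file is held in memory
        out_path = target_root / path.relative_to(source_root)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_bytes(transform(data))
        count += 1
    return count
```

Because `iter_files` is a generator, memory use stays roughly constant regardless of how many files the tree contains; only the size of the largest single file matters.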

Our contribution to the DICOM Standard

The DICOM standard contains its own section on how to anonymize images, and our CEO Stefan in fact authored its first draft. (For the insiders: obviously, David Clunie authored the next and final version.) That section of the standard served as the guideline for this project.

Result

An anonymization pipeline has been developed which is fully characterized by a configuration page. This means that if any aspect of the anonymization needs to change, we just edit the configuration and re-run the process on all original files.

A repeatable image anonymization process to ensure quality

We thus avoid making incremental changes to the data files, which is very important for verifying the quality and reproducibility of the final deliverable.
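A rough way to picture such a repeatable process: every output is derived from the untouched originals and from one configuration, so two runs with the same configuration must produce identical results. The Python sketch below illustrates this under several assumptions. The configuration keys (`keep`, `secret`), the keyed-hash pseudonym scheme and the function names are hypothetical; the source does not say how the project generated replacement identifiers.

```python
import hashlib
import hmac
import json


def config_fingerprint(config):
    """Hash a canonical form of the configuration to label a run."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def pseudonym(patient_id, secret):
    """Map the same patient ID to the same pseudonym on every run."""
    digest = hmac.new(secret.encode("utf-8"), patient_id.encode("utf-8"),
                      hashlib.sha256)
    return "SUBJ-" + digest.hexdigest()[:12]


def run(originals, config):
    """Derive all outputs from the originals in one pass; never edit in place."""
    outputs = []
    for headers in originals:
        kept = {k: v for k, v in headers.items() if k in config["keep"]}
        kept["PatientID"] = pseudonym(headers["PatientID"], config["secret"])
        outputs.append(kept)
    return {"config": config_fingerprint(config), "files": outputs}


# Invented example values.
originals = [{"PatientID": "12345", "PatientName": "Doe^Jane", "Modality": "PT"}]
config = {"keep": ["Modality"], "secret": "example-secret"}

first = run(originals, config)
second = run(originals, config)
print(first == second)
```

Storing the configuration fingerprint alongside the deliverable would let a reviewer check which configuration produced it, and comparing two full runs is a simple quality check the customer could repeat independently.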

An external hard drive with 30 GB of anonymized imaging files has been delivered to the customer.

Since this project, there have been great developments for working with DICOM on the web. Our old site and anonymizer are still up, though. They are located at dcmjs.org.

In case you're curious why we're not mentioning the customer by name: Our General Services Agreement with them prohibits us from doing so.