Data Import & Anonymisation

Table of contents

  1. Importing Images into XNAT
    1. Uploading Data
    2. The Prearchive
  2. De-identification
    1. Project-Specific De-identification with DicomEdit
    2. De-identification with PyDicom

It’s time to get the imaging data off your local machine and into a centrally managed location. At the same time you’ll remove any PHI that you may have noticed in your data exploration.

Your objectives are:

  • Import the images from your local location to a project on the XNAT
  • Write a de-identification script that runs on the XNAT as your data is uploaded
  • Ensure that no PHI is present when the data is archived.

Importing Images into XNAT

Uploading and storing imaging sessions is a key activity in XNAT-based workflows. An image session is composed of a series of DICOM scan files, each containing two components: the image data and the image metadata. XNAT performs a series of operations to take these loosely aggregated files and stitch them together into an image session object. When XNAT receives data, it reads the DICOM metadata in the header of each file and uses it to map the file to a subject, session date, and session label. This creates the interface and hierarchy you saw when navigating the IBASH project on XNAT.

You can find a more detailed overview of Image Session Upload Methods in XNAT in the XNAT documentation.

Uploading Data

There are multiple ways to upload data to XNAT:

  • XNAT Desktop Client
    • The XNAT Desktop client is often the best upload method if you are working with DICOM files and only need to upload a small number of subjects at any one time.
  • Compressed Image Uploader
    • You can upload a .zip or .tar.gz file from your computer containing DICOM or ECAT files. This can include multiple series and subjects. Unlike the XNAT Desktop Client, the compressed uploader can be run from the XNAT web interface without additional software being installed.
  • XNAT REST API
    • If you have large numbers of sessions to archive, you may consider writing a script to manage your uploads. Such a script would make use of the XNAT API to connect to your XNAT instance and perform the upload and archive functions.
  • Using xnatpy (or another library)
    • Software libraries such as xnatpy simplify communication with the XNAT REST API, giving you higher-level access to your project data as well as another route for uploading data to projects.
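
As an illustration of the scripted route, here is a minimal sketch that zips a local folder of DICOM files and sends the archive to a project's prearchive with xnatpy. The function names are hypothetical, and the server URL, credentials and project ID you pass in are your own; the xnatpy calls used are `xnat.connect` and `connection.services.import_`, so check the xnatpy documentation for the version you have installed before relying on it.

```python
import zipfile
from pathlib import Path


def zip_dicom_folder(folder: str, archive_path: str) -> int:
    """Zip every file under `folder`; return the number of files added."""
    root = Path(folder)
    target = Path(archive_path).resolve()
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it lives inside `folder`.
            if not path.is_file() or path.resolve() == target:
                continue
            # Store paths relative to the root so the archive is portable.
            archive.write(path, path.relative_to(root).as_posix())
            count += 1
    return count


def upload_to_prearchive(server: str, user: str, password: str,
                         project: str, archive_path: str) -> None:
    """Send a zipped session to the prearchive (assumes xnatpy is installed)."""
    import xnat  # third-party package: pip install xnat

    with xnat.connect(server, user=user, password=password) as connection:
        connection.services.import_(
            archive_path,
            project=project,
            destination="/prearchive",
        )
```

Sending the data to `/prearchive` rather than straight to the archive leaves room to review the session before it is archived.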

The Prearchive

XNAT will temporarily store uploaded data in the prearchive as a staging point or when it does not know which project or subject to attach the data to. For example, data sent by DICOM push or uploaded via the Compressed File Uploader may appear here in some circumstances. Other studies have all data go to the prearchive so it can be reviewed by study personnel before being included into the XNAT database.

If your uploaded data has not appeared in your project, you should check the prearchive and move the data over. More information can be found in Using the Prearchive.
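
If you prefer to check the prearchive from a script, the sketch below shows one possible approach with xnatpy. It assumes `connection` is an open xnatpy connection, that `connection.prearchive.sessions()` returns objects with `project`, `status` and `label` attributes and an `archive()` method, and that a session waiting to be archived reports the status `READY`. The helper names are hypothetical.

```python
def ready_sessions(connection, project):
    """Return prearchive sessions for `project` that are ready to archive."""
    return [
        session
        for session in connection.prearchive.sessions()
        if session.project == project and session.status == "READY"
    ]


def archive_ready_sessions(connection, project):
    """Archive every ready session and return the labels that were sent."""
    labels = []
    for session in ready_sessions(connection, project):
        # Read the label first: the prearchive entry goes away once archived.
        labels.append(session.label)
        session.archive()
    return labels
```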

De-identification

As you may have noticed when looking through the DICOM headers, the images contain some PHI that needs to be removed. (If you didn’t, take this chance to go back and have a look.)

Every DICOM file will contain this PHI, so it’s inadvisable to try to remove this information by hand! Instead we’ll leverage XNAT’s built-in Project Data Import and Anonymization abilities to make our life easier.

We’ll also take what we learn from using XNAT’s solution and see what we can do with it using pydicom.

Project-Specific De-identification with DicomEdit

DICOM anonymisation happens in several locations in XNAT.

We’ll be looking at project-specific anonymisation. This occurs when the session moves into the archive. Images that go into the prearchive first will not have this anonymisation applied as part of the upload process. Project-specific anonymisation will be applied to prearchived images, but only once the images are manually sent to the archive (see Using the Prearchive).

The process should be as follows:

  1. Upload original data to XNAT and store in your project’s prearchive
  2. Send the data to be archived
  3. Our de-identification runs as the data is being moved
  4. De-identified data arrives and is stored in the project’s archive

The question, then, is how to define our de-identification on XNAT. The answer is to write a DicomEdit script, which is interpreted by the DicomEdit library that XNAT uses for de-identification. A script lets us define manipulations of DICOM tags so that we can comply with whichever de-identification profile we choose.
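
To give a feel for what such a script looks like, here is a small Python sketch that assembles a DicomEdit script from a list of tags to delete and a mapping of tags to replace. It assumes DicomEdit 6.x syntax (a `version` line, `-(gggg,eeee)` to delete a tag, `(gggg,eeee) := "value"` to assign one). The helper name and the tags chosen are illustrative only and do not form a complete de-identification profile.

```python
def build_dicomedit_script(delete_tags, assign_tags):
    """Return DicomEdit script text (assumed 6.x syntax) as a string."""
    lines = ['version "6.1"']
    for tag in delete_tags:
        # A leading minus removes the tag from the file.
        lines.append(f"-{tag}")
    for tag, value in assign_tags.items():
        # := assigns a fixed string value to the tag.
        lines.append(f'{tag} := "{value}"')
    return "\n".join(lines) + "\n"


script = build_dicomedit_script(
    delete_tags=["(0010,0030)", "(0010,1040)"],  # birth date, address
    assign_tags={"(0008,0080)": "ANON"},         # institution name
)
print(script)
```

The printed text is what you would paste into your project's anonymisation script setting; which tags belong in it is for you to decide from your own review of the headers.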

Your task here is to:

  1. Read about DicomEdit and gain familiarity with its syntax
  2. Write a DicomEdit script that will remove PHI from the IBASH study images
  3. Configure your XNAT project with the DicomEdit script
  4. Import your data into the project and run it through the de-identification - doing a couple of studies instead of the whole dataset should be sufficient (you can upload the rest of the dataset later)
  5. Review the DICOM tags in the project archive to ensure your de-identification was a success.

De-identification with PyDicom

DicomEdit is of course not the only option for de-identifying DICOM data. If you need to do more complex manipulations or use logic that DicomEdit can’t handle, you may have to write your own pre-processing pipeline that includes de-identification.

One option for manipulating DICOM with Python is the pydicom library. See if you can use what you know about DICOM and de-identifying tags to come up with a Python script that does something similar to your DicomEdit script.

You may find this tutorial on Anonymising DICOM Data useful.