12. Reading neuroimaging data with Nibabel

The first step in almost any neuroimaging data analysis is reading neuroimaging data from files. We will start dipping our toes into how to do this with the nibabel software library. Many different file formats store neuroimaging data, but the NIfTI format (The acronym stands for Neuroimaging Informatics Technology Initiative, which was the group at the US National Institutes of Health that originally designed the file format) has gained significant traction in recent years. For example, it is the format required by the BIDS standard for storing raw MRI data. Because this file format is so common, we will focus on this format here and we will not go into detail on the other file formats that are supported by nibabel.

Getting acquainted with nibabel and its operations will also allow us to work through some issues that arise when analyzing data from neuroimaging experiments. In the next section (Section 13), we will dive even deeper into the challenge of aligning different measurements to each other. Let’s start by downloading some neuroimaging files and seeing how nibabel works. We’ll use the data downloading functionality implemented in ndslib to get data from the OpenNeuro archive. These two files are part of a study on the effects of learning on brain representations [Beukema et al., 2019]. From this dataset, we will download just one run of a functional MRI experiment and a corresponding T1-weighted image acquired for the same subject in the same session.

from ndslib.data import download_bids_dataset
download_bids_dataset()

After running this code, we should have a valid BIDS dataset inside a folder called ds001233. If you examine your filesystem, you will see that the files have been named and organized according to the BIDS standard. This will come in handy in a little bit.

  ds001233
  ├── dataset_description.json
  └── sub-17
      └── ses-pre
          ├── anat
          │   ├── sub-17_ses-pre_T1w.json
          │   └── sub-17_ses-pre_T1w.nii.gz
          └── func
              ├── sub-17_ses-pre_task-cuedSFM_run-01_bold.json
              └── sub-17_ses-pre_task-cuedSFM_run-01_bold.nii.gz

In the meanwhile, let’s start using nibabel to access the data:

import nibabel as nib
img_bold = nib.load("ds001233/sub-17/ses-pre/func/sub-17_ses-pre_task-cuedSFM_run-01_bold.nii.gz")

What kind of object is img_bold?

type(img_bold)
nibabel.nifti1.Nifti1Image

As you can see, it’s a Nifti1Image object. This is a special nibabel object for handling NIfTI files. One of the principles that guide the design of nibabel is that operations that take time, or that require a lot of the computer’s memory are delayed and only executed when there is no choice. Instead, nibabel starts by reading a relatively small section of the file, called the file’s “header” that contains only meta-data: information about the data, rather than the data itself. For example, the header contains information about the size of the data array that is stored inside the file. So, when we called nibabel.load above none of the actual data was read. Instead, nibabel reads the header information about this data and could be prepared to also give you the data if and when you want it. This is useful for cases where we need to know some property of the data, such as the size of the data array, but we don’t yet need to read any of the data itself. For example:

print(img_bold.shape)
(96, 96, 66, 241)

This tells us that the data in this file is a four-dimensional array with dimensions of 96 by 96 by 66 voxels, collected in 241 time points. There is other information stored in the header, and we’ll look at some of this information a little bit later. First, we will use the object’s get_fdata method to read the data from the file into memory. The ‘f’ in this method name indicates to us that nibabel is going to take the data that is stored in the file and do its best to represent it as floating point numbers, regardless of the way that the data is stored in the file. For example

data_bold = img_bold.get_fdata()
type(data_bold)
numpy.ndarray

Now it’s a regular Numpy array. Everything that applies to the arrays that you already saw in previous chapters applies to this array. You can compute on it, visualize it, and so forth. What data type does it use?

print(data_bold.dtype)
float64

As promised, the data, regardless of how it was stored in the file, is given to us in the 64-bit floating point Numpy data type. There are other ways to read data from nibabel, that allow you more fine-grained control of how much of the data you are reading, how and whether you store it in memory, and so on, but we will not go into that here. We recommend reading the nibabel documentation to get acquainted with these details. One of the reasons that we don’t think this is worth expounding upon is that the get_fdata does the right thing for you in 99% of the cases that we have encountered ourselves. So, this provides a simple way to read the data and know what you are getting.

What shape does this array have?

print(data_bold.shape)
(96, 96, 66, 241)

It’s exactly the shape you would expect from the information provided in the Nifti1Image that we saw above. There are many things we can do with this data, but one of the first things that you might want to do with a new dataset is to assess the quality of the data. This will help guide our analysis strategy and allow us to identify samples (time points, runs, subjects) that should be handled carefully (for example, excluded from further analysis, or considered with a grain of salt).

12.1. Assessing MRI data quality

MRI data quality assessments are important because MRI data can vary quite a bit in terms of its quality and the conclusions from data analysis may be misguided if data quality is not taken into account. For example, the quality of measurements from young children and older adults could be affected by the tendency of participants from these populations to move more in the MRI scanner. There are many different ways to assess MRI data quality. We will start with a commonly-used metric that is also relatively easy to understand, by computing the temporal signal-to-noise ratio (or tSNR) of the measurement in each voxel. This is quantified as the mean signal in the voxel across time points, divided by the standard deviation of the signal across time points. To demonstrate the various concepts that we discussed above, we will use three different ways to compute this value. The first uses Numpy directly on the data that we read from the file:

import numpy as np
tsnr_numpy = np.mean(data_bold, -1) / np.std(data_bold, -1)

The second way to compute relies on the Nipype software. We will not go into much depth about how to use the Nipype library, recommending instead that you refer to one of the resources provided below, but briefly: Nipype interfaces are objects that provide direct Python control of a variety of software, including functionality that implemented in many non-Python libraries, such as AFNI and FSL. Here, we are going to use a Nipype interface for a relatively straightforward Python-based calculation of SNR, very similar to the calculation we just did using Numpy. The way this is done here is by defining the interface object TSNR(), associating this object with an input file, the file that contains the BOLD data, and then calling the .run() method. This creates a file – tsnr.nii.gz – which contains the results of the computation.

from nipype.algorithms.confounds import TSNR

tsnr = TSNR()
tsnr.inputs.in_file = 'ds001233/sub-17/ses-pre/func/sub-17_ses-pre_task-cuedSFM_run-01_bold.nii.gz'
res = tsnr.run()

The output file (“tsnr.nii.gz”) can be immediately loaded with nibabel:

tsnr_img = nib.load("tsnr.nii.gz")
tsnr_data = tsnr_img.get_fdata()

12.1.1. Exercise

Are the results from the Numpy calculation and the Nipype calculation identical? What happens if you exclude all the voxels for which the standard deviation is zero or very small (e.g., smaller than 0.001)?

A third option to calculate quality assessments for MRI data will use a BIDS app. The MRIQC app, already mentioned before in Section 11.2 is implemented as a Docker container. It takes a valid BIDS dataset as input and produces a range of quality assessments as output. Let’s see how this works in practice. In a terminal prompt, we first create a folder for the outputs of processing:

$ mkdir output

Then, we use Docker – much as we did in section Section 4 – to point the MRIQC Docker container to the BIDS dataset:

$ docker run -it --rm -v $(pwd)/ds001233:/data:ro -v $(pwd)/output:/out nipreps/mriqc:latest /data /out participant --participant_label 17

Because the dataset is rather small: one fMRI run and one T1-weighted scan, it takes just a few minutes to run. When this is completed, we should have the following contents in the ‘output’ folder:

  output
  ├── dataset_description.json
  ├── logs
  ├── sub-17
  │   └── ses-pre
  │       ├── anat
  │       │   └── sub-17_ses-pre_T1w.json
  │       └── func
  │           └── sub-17_ses-pre_task-cuedSFM_run-01_bold.json
  ├── sub-17_ses-pre_T1w.html
  └── sub-17_ses-pre_task-cuedSFM_run-01_bold.html

Notice that the outputs of this processing are also organized as a (derivatives) BIDS dataset. There are no neuroimaging data files in here, but the files named sub-17_ses-pre_T1w.json and sub-17_ses-pre_task-cuedSFM_run-01_bold.json each contain a dictionary of quality assessment metrics, including for sub-17_ses-pre_task-cuedSFM_run-01_bold.json, a field with the median tSNR value that was calculated after applying motion correction, in addition to many other metrics. In addition, the html files saved at the top level of the outputs contain some visual diagnostics of image quality. This example again demonstrates the importance of the BIDS standard. Without it, the dockerized MRIQC BIDS app would not know how to find and understand the data that is stored in the input folder, and we wouldn’t necessarily have an immediate intuitive understanding of the outputs that were provided.

12.1.2. Exercise

Is the MRIQC median tSNR larger or smaller than the ones calculated with Nipype and Numpy? Why is that?

12.2. Additional resources

There is extensive literature on approaches to quality assessment of MRI data. The MRIQC documentation is a good place to start reading about the variety of metrics that can be used to automatically characterize image quality. In particular, to see some of the impacts of MRI quality assessments on the conclusions from MRI studies, you can look at some of the original research literature on this topic. Across different MRI modalities, researchers have shown that conclusions from MRI studies can be confounded particularly by the impact that motion has on MRI data quality. This has been identified in diffusion MRI [Yendiki et al., 2014], resting state functional connectivity [Power et al., 2012] and in the analysis of structural MRI [Reuter et al., 2015].