---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.11.5
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

```{code-cell} ipython3
:tags: [remove-cell]

from ndslib.config import jupyter_startup
jupyter_startup()
```

# Data science tools for neuroimaging

In previous chapters, we saw that there is plenty of things that you can do with
general-purpose software and data science tools. Many of the examples used data
from neuroimaging experiments, but the ideas that we introduced -- arrays and
computing with arrays, tables and how to manipulate tabular data, scientific
computing, and data visualization -- were not specific to neuroimaging. In this
chapter, we will discuss data science approaches that are specifically tailored
to neuroimaging data. First (in {numref}`nipy`), we will present a survey of
neuroimaging-specific software implemented in Python. Next (in {numref}`bids`),
we will discuss how to organize data from neuroscience experiments for data
analysis that best takes advantage of these software tools. In the following
chapters, we will dive into some of the applications of these data science tools
to fundamental problems in neuroimaging data analysis.

(nipy)=
## Neuroimaging in Python

```{eval-rst}
.. index::
   single: NiPy
```

Within the broader ecosystem of Python tools for science, there is a family of
tools specifically focused on neuroimaging (we will refer to them collectively
as "NiPy", which stands for "Neuroimaging in Python"). These software tools,
developed by and for neuroimaging researchers, cover a wide range of data
analysis tasks on a variety of different kinds of experimental data. In the next
few sections, we will see in detail how some of these tools are used. But first,
we will provide a broad survey of the different kinds of tools that currently
exist. It is important to emphasize that this is a very dynamically evolving
ecosystem, and some of these tools may evolve into other tools over time, or
even disappear. New tools will inevitably also emerge. So, this survey will be,
by necessity, a bit superficial and a bit dated. That said, we'll try to give
you a sense of how an ecosystem like this one emerges and evolves so that you
can keep an eye on these trends as they play out in the future.

### Reading and writing neuroimaging data

```{eval-rst}
.. index::
   single: Nibabel
```

All of the NiPy tools depend to some degree on the ability to read neuroimaging
data from commonly used file formats into the memory of the computer. The tool
of choice for doing that is called nibabel. The name hints at the plethora of
different file formats that exist to represent and store neuroimaging data.
Nibabel, which has been in continuous development since 2007, provides a
relatively straightforward common route to read many of these formats. Much in
the way that Numpy serves as a basis for all of the scientific computing in
Python, nibabel serves as the basis for the NiPy ecosystem. Because it is so
fundamental, we devote an entire later section ({numref}`nibabel`) to some basic
operations that you can do with nibabel. In {numref}`nibabel_transforms`, we
will go even deeper to see how nibabel enables the integration of different
neuroimaging data types. So, we will not say much more about nibabel right now.

```{eval-rst}
.. index::
   single: Nibabel
```

### Tools for MRI data analysis

```{eval-rst}
.. index::
   single: BrainIak
```

```{eval-rst}
.. index::
   single: Nilearn
```

```{eval-rst}
.. index::
   single: fitlins
```

```{eval-rst}
.. index::
   single: glmsingle
```

Tools for neuroimaging data analysis are usually focused on particular data
types and on particular ways of analyzing the data. Functional MRI (fMRI) is a
very popular data type in cognitive neuroscience, and several tools focus on
functional MRI. For example, both the [BrainIAK](https://brainiak.org/) software
library and the [Nilearn](https://nilearn.github.io/) software library focus on
machine learning analysis of functional MRI data. We will return to talk about
machine learning in more detail in chapter {numref}`core-ml`. Some tools fit
more traditional generalized linear models (GLM) to fMRI data. One of these is
[fitlins](https://fitlins.readthedocs.io/). It relies heavily on the Brain
Imaging Data Structure standard, which we will discuss below in {numref}`bids`.
Another one is [glmsingle](https://glmsingle.readthedocs.io/), implemented in
both Matlab and Python, which focuses on the estimation of single-trial responses.

```{eval-rst}
.. index::
   single: Diffusion Imaging in Python (DIPY)
```

Another popular MRI modality is diffusion MRI (dMRI). It is used to characterize
the properties of brain white matter and to find white matter connections
between different parts of the brain. [DIPY](https://dipy.org) (Diffusion
Imaging in Python) is a software library that implements a variety of methods
for analysis of this kind of data. We will use image processing algorithms that
are implemented in DIPY in chapter {numref}`registration`. Building upon DIPY,
there are several tools that focus on more specific dMRI analysis tasks, such as
[DMIPY](https://dmipy.readthedocs.io/en/latest/), which focuses on various ways
to model the microstructure of brain white matter tissue,
[pyAFQ](http://yeatmanlab.github.io/pyAFQ/) and
[TractSeg](https://github.com/MIC-DKFZ/TractSeg), which focus on delineating the
major white matter tracts, and [NDMG](https://ndmg.neurodata.io/), which focuses
on quantifying the connectivity matrix of the brain, based on dMRI data. Another
widely-used software library for analysis of dMRI is the
[Medical Imaging Toolkit for Diffusion](https://github.com/MIC-DKFZ/MITK-Diffusion).

```{eval-rst}
.. index::
   single: advanced normalization tools (ANTs)
```

Other tools focus on analysis of structural MRI data. The advanced normalization
tools (ANTs) software library implements a variety of algorithms for structural
data, such as image registration and image segmentation, which will be described
in more detail in chapter {numref}`skimage`. The ANTs library, originally
developed in the C programming language, is now also available through a
user-friendly Python interface in the
[antspy library](https://antspy.readthedocs.io/).

### Tools for other neuroimaging modalities

One of the largest and mature projects for neuroimaging data analysis focuses on
electrophysiology data that comes from experiments that use
Magnetoencephalography (MEG) Electroencephalography (EEG), Electrocorticography
(ECoG), and other similar measurements. The [MNE](https://mne.tools/) software
library has been in development since 2011 and contains implementations of many
algorithms that are common across these different types of data. For example,
analysis of the frequency spectrums of these signals. It also includes
algorithms that are specific for particular types of data. For example,
algorithms to localize the brain sources of MEG signals that are measured
outside the skull.

### Nipype, the Python tool to unify them all

```{eval-rst}
.. index::
   single: Nipype
```

One of the main challenges for anyone doing neuroimaging data analysis is that
different algorithms are implemented in different software libraries. Even if
you decide that you would like to use NiPy tools for most of your data analysis,
you might still find yourself needing to use non-Python tools for some steps in
your analysis. For example, to use novel algorithms that are implemented in the
popular FMRIB Software Library (FSL) or Analysis of Functional NeuroImages
(AFNI) frameworks for fMRI, the popular MRTRIX library for dMRI, or in
FreeSurfer for analysis of structural MRI. In theory, if you wanted to construct
an analysis pipeline that glues together operations from different frameworks,
you would have to familiarize yourself with the varying ways in which these
different frameworks expect their input data to appear and the varying ways in
which they produce their output. The complexity can quickly become daunting.
Moreover, creating reproducible data analysis programs that put together such
pipelines becomes cumbersome and can be very difficult. Fortunately, these
challenges have been addressed through the
[Nipype](https://nipype.readthedocs.io/) Python software library. Nipype
implements Python interfaces to many neuroimaging software tools (including
those we mentioned above). One of Nipype's appealing features is that it
provides a very similar programming interface across these different tools. In
addition, it provides the functionality to represent pipelines of operations, where the outputs of one processing step become the inputs to the next processing
steps. Running such a pipeline as one program is not only a matter of convenience but also important for reproducibility. By tracking and recording
the set of operations that a dataset undergoes through analysis, as well as the
parameters that are used in each step, Nipype allows users of the software to
report exactly what they did with their data to come to a particular conclusion.
In addition, Nipype allows a processing pipeline to be used with other datasets.
However, this raises another kind of challenge: to achieve this kind of
extensibility, the input data has to be organized in a particular way that the
software recognizes. This is where community-developed data standards come in.
In the rest of this section, we will discuss the Brain Imaging Data Structure,
or BIDS, which is the most commonly-used data standard for organizing
neuroimaging data for analysis.


(bids)=
## The Brain Imaging Data Structure (BIDS) standard

```{eval-rst}
.. index::
   single: Brain Imaging Data Structure (BIDS)
```

In the previous section, we pointed out that Nipype can be used to create
reproducible analysis pipelines that can be applied across different datasets.
This is true, in principle, but in practice, it also relies on one more idea:
that of a data standard. This is because to be truly transferable, an analysis
pipeline needs to know where to find the data and metadata that it uses for
analysis. Thus, in this section, we will shift our focus to talking about how
entire neuroimaging *projects* are (or should be) laid out. Until recently,
datasets were usually organized in an idiosyncratic way. Each researcher had to
decide on their own what data organization made sense to them. This made data
sharing and reuse quite difficult because if you wanted to use someone else's
data, there was a very good chance you'd first have to spend a few days just
figuring out what you were looking at and how you could go about reading the
data into whatever environment you were comfortable with.

+++

### The BIDS specification

Fortunately, things have improved dramatically in recent years. Recognizing that
working with neuroimaging data would be considerably easier if everyone adopted
a common data representation standard, a group of (mostly) fMRI researchers
convened in 2016 to create something now known as the [*Brain Imaging Data
Standard*, or BIDS](https://bids.neuroimaging.io). BIDS wasn't the *first* data
standard proposed in fMRI, but it has become by far the most widely adopted.
Much of the success of BIDS can be traced to its simplicity: the standard
deliberately insists not only on machine readability but also on human
readability, which means that a machine can ingest a dataset and do all kinds of machine processing with it, but a human looking at the files can also make sense
of the dataset, understanding what kinds of data were collected and what
experiments were conducted. While there are some nuances and complexities to
BIDS, the core of the specification consists of a relatively simple set of rules
a human with some domain knowledge can readily understand and implement.

We won't spend much time describing the details of the BIDS specification in
this book, as there's already excellent documentation for that on the project's
website. Instead, we'll just touch on a couple of core principles. The easiest
way to understand what BIDS is about is to dive right into an example. Here's a
sample directory structure we've borrowed from the BIDS documentation. It shows
a valid BIDS dataset that contains just a single subject:


<pre style="background: #eee;">
project/
    sub-control01/
        anat/
            sub-control01_T1w.nii.gz
            sub-control01_T1w.json
            sub-control01_T2w.nii.gz
            sub-control01_T2w.json
        func/
            sub-control01_task-nback_bold.nii.gz
            sub-control01_task-nback_bold.json
            sub-control01_task-nback_events.tsv
            sub-control01_task-nback_physio.tsv.gz
            sub-control01_task-nback_physio.json
            sub-control01_task-nback_sbref.nii.gz
        dwi/
            sub-control01_dwi.nii.gz
            sub-control01_dwi.bval
            sub-control01_dwi.bvec
        fmap/
            sub-control01_phasediff.nii.gz
            sub-control01_phasediff.json
            sub-control01_magnitude1.nii.gz
            sub-control01_scans.tsv
    code/
        deface.py
    derivatives/
    README
    participants.tsv
    dataset_description.json
    CHANGES
</pre>

There are two important points to note here. First, the BIDS specification
imposes restrictions on how files are organized within a BIDS project directory.
For example, every subject's data goes inside a `sub-[id]` folder below the
project root —- where the `sub-` prefix is required, and the `[id]` is a
researcher-selected string uniquely identifying that subject within the project
(`"control01"` in the example). And similarly, inside each subject directory, we
find subdirectories containing data of different modalities: `anat` for
anatomical images; `func` for functional images; `dwi` for diffusion-weighted
images; and so on. When there are multiple data collection sessions for each
subject, an extra level is introduced to the hierarchy, so that functional data
from the first session acquired from subject `control01` would be stored inside
a folder like `sub-control01/ses-01/func`.

Second, valid BIDS files must follow particular naming conventions. The precise
naming structure of each file depends on what kind of file it is, but the
central idea is that a BIDS filename is always made up of (1) a sequence of
key-value pairs, where each key is separated from its corresponding value by a
dash, and pairs are separated by underscores; (2) a "suffix" that directly
precedes the file extension and describes the type of data contained in the file
(this comes from a controlled vocabulary, meaning that it can only be one of a
few accepted values, such as `"bold"` or `"dwi"`); and (3) an extension that
defines the file format.

For example, if we take a file like
`sub-control01/func/sub-control01_task-nback_bold.nii.gz` and examine its
constituent chunks, we can infer from the filename that the file is a Nifti
image (`.nii.gz` extension) that contains BOLD fMRI data (`bold` suffix) for
task `nback` acquired from subject `control01`.

Besides these conventions, there are several other key elements of the BIDS
specification. We won't discuss them in detail, but it's good to at least be
aware of them:

* Every data file should be accompanied by a JSON "sidecar" containing metadata
  describing that file. For example, a BOLD data file might be accompanied by a
  side-car file that describes acquisition parameters, such as repetition time.
* BIDS follows an "inheritance" principle —- meaning that JSON metadata files
  higher up in the hierarchy automatically apply to relevant files lower in the
  hierarchy unless explicitly overridden. For example, if all of the BOLD data
  in a single dataset was acquired using the same protocol, this metadata
  need not be replicated in each subject's data folder.
* Every project is required to have a `dataset_description.json` file at the
  root level that contains basic information about the project (e.g., the name
  of the dataset and a description of its constituents, as well as citation
  information).
* BIDS doesn't actively prohibit you from including non-BIDS-compliant files in
  a BIDS project -— so you don't have to just throw out files that you can't
  easily shoehorn into the BIDS format. The downside of including non-compliant
  files is just that most BIDS tools and/or human users won't know what to do
  with them, so your dataset might not be quite as useful as it otherwise would
  be.

+++

#### BIDS Derivatives

The BIDS specification was originally created with static representations of
neuroimaging datasets in mind. But it quickly became clear that it would also be
beneficial for the standard to handle *derivatives* of datasets -— that is, new
BIDS datasets generated by applying some transformation to one or more existing
BIDS datasets. For example, suppose we have a BIDS dataset containing raw fMRI
images. Typically, we'll want to preprocess our images (for example, to remove
artifacts, apply motion correction, temporally filter the signal, etc.) before
submitting them to analysis. It's great if our preprocessing pipeline can take
BIDS datasets as inputs, but what should it then do with the output? A naive
approach would be to just construct a new BIDS dataset that's very similar to
the original one, but replace the original (raw) fMRI images with new
(preprocessed) ones. But that's likely to confuse: a user could easily end up
with many different versions of the same BIDS dataset, yet have no formal way to
determine the relationship between them. To address this problem, the
BIDS-Derivatives extension introduces some additional metadata and file naming
conventions that make it easier to chain BIDS-aware tools (see the next section)
without chaos taking hold.

### The BIDS Ecosystem

At this point, you might be wondering: *what is BIDS good for?* Surely the point
of introducing a new data standard isn't just to inconvenience people by forcing
them to spend their time organizing their data a certain way? There must be some
benefits to individual researchers -- and ideally, the community as a whole -—
spending precious time making datasets and workflows BIDS-compliant, right?
Well, yes, there are! The benefits of buying into the BIDS ecosystem are quite
substantial. Let's look at a few.

#### Easier data sharing and reuse

One obvious benefit we alluded to above is that sharing and re-using
neuroimaging data becomes much easier once many people agree to organize their
data the same way. As a trivial example, once you know that BIDS organizes data
according to a fixed hierarchy (i.e., subject --> session --> run), it's easy to
understand other people's datasets. There's no chance of finding time-course
images belonging to subject 1 in, say,
`/imaging/old/NbackTask/rawData/niftis/controlgroup/1/`. But the benefits of
BIDS for sharing and reuse come into full view once we consider the impact on
public data repositories. While neuroimaging repositories have been around for a
long time (for an early review, see {cite}`van2001functional`), their utility
was long hampered by the prevalence of idiosyncratic file formats and project
organizations. By supporting the BIDS standard, data repositories open the door
to a wide range of powerful capabilities.

To illustrate, consider [OpenNeuro](https://openneuro.org) -- currently the
largest and most widely-used repository of brain MRI data. OpenNeuro requires
uploaded datasets to be in BIDS format (though datasets do not have to be fully
compliant). As a result, the platform can automatically extract, display, and
index important metadata. For example, the number of subjects, sessions, and
runs in each dataset; the data modalities and experimental tasks present; a
standardized description of the dataset; and so on. Integration with free
analysis platforms like [BrainLife](https://brainlilfe.io) is possible, as is
structured querying over datasets via OpenNeuro's GraphQL API endpoint.

Perhaps most importantly, the incremental effort required by users to make their
BIDS-compliant datasets publicly available and immediately usable by others is
minimal: in most cases, users have only to click an Upload button and locate the
project they wish to share (there is also a command-line interface, for users
who prefer to interact with OpenNeuro programmatically).

#### BIDS-Apps

```{eval-rst}
.. index::
   single: BIDS apps
```

A second benefit to representing neuroimaging data in BIDS is that one
immediately gains access to a large, and rapidly growing, ecosystem of
BIDS-compatible tools. If you've used different neuroimaging tools in your
research -— for example, perhaps you've tried out both FSL and SPM (the two most
widely used fMRI data analysis suites) —- you'll have probably had to do some
work to get your data into a somewhat different format for each tool. In a world
without standards, tool developers can't be reasonably expected to know how to
read *your* particular dataset, so the onus falls on *you* to get your data
into a compatible format. In the worst case, this means that every time you want
to use a new tool, you have to do some more work.

By contrast, for tools that natively support BIDS, life is simpler. Once we know
that fMRIPrep -— a very popular preprocessing pipeline for fMRI data
{cite}`Esteban2019-md` -— takes valid BIDS datasets as inputs, the odds are very
high that we'll be able to apply fMRIPrep to our own valid BIDS datasets with
little or no additional work. To facilitate the development and use of these
kinds of tools, BIDS developed a lightweight standard for "BIDS Apps"
{cite}`Gorgolewski2017-mb`. A BIDS App is an application that takes one or more
BIDS datasets as input. There is no restriction on what a BIDS App can do, or
what it's allowed to output (though many BIDS Apps output BIDS-Derivatives
datasets); the only requirement is that a BIDS App is containerized (using
Docker or Singularity; see {numref}`docker`), and accept valid BIDS datasets as input. New BIDS Apps
are continuously being developed, and as of this writing, the BIDS Apps website
lists a few dozen [apps](https://bids-apps.neuroimaging.io/apps/).

```{eval-rst}
.. index::
   single: CPAC
```

```{eval-rst}
.. index::
   single: fMRIPrep
```

```{eval-rst}
.. index::
   single: Human Connectome Project Pipelines
```

```{eval-rst}
.. index::
   single: Freesurfer
```

```{eval-rst}
.. index::
   single: MRIQC
```

What's particularly nice about this approach is that it doesn't necessarily require the developers of existing tools to do a lot of extra work themselves to support BIDS. In principle, anyone can write a BIDS-App "wrapper" that mediates between the BIDS format and whatever format a tool natively expects to receive data in. So, for example, the BIDS-Apps registry already contains BIDS-Apps for packages or pipelines like [SPM](https://www.fil.ion.ucl.ac.uk/spm/), [CPAC](https://fcp-indi.github.io/), [Freesurfer](https://surfer.nmr.mgh.harvard.edu/), and the [Human Connectome Project Pipelines](https://www.humanconnectome.org/software/hcp-mr-pipelines). Some of these apps are still fairly rudimentary and don't cover all of the functionality provided by the original tools, but others support much or most of the respective native tool's functionality. And of course, many BIDS-Apps aren't wrappers around other tools at all; they're entirely new tools designed from the very beginning to support only BIDS datasets. We've already mentioned fMRIPrep, which has very quickly become arguably the de facto preprocessing pipeline in fMRI; another widely-used BIDS-App is [MRIQC](https://mriqc.readthedocs.io/en/stable/)  {cite}`Esteban2017-tu`, a tool for automated quality control and quality assessment of structural and functional MRI scans, which we will see in action in {numref}`nibabel`. Although the BIDS-Apps ecosystem is still in its infancy, the latter two tools already represent something close to killer applications for many researchers.

To demonstrate this statement, consider how easy it is to run fMRIPrep once your data is organized in the BIDS format. After installing the software and its dependencies, running the software is as simple as issue this command in the terminal:

    fmriprep data/bids_root/ out/ participant -w work/

where, `data/bids_root` points to a directory that contains a BIDS-organized
dataset that includes fMRI data, `out` points to the directory into which the
outputs (the BIDS derivatives) will be saved and `work` is a directory that will
store some of the intermediate products that will be generated along the way.
Looking at this, it might not be immediately apparent how important BIDS is for
this to be so simple, but consider what software would need to do to find all of
the fMRI data inside of a complex dataset of raw MRI data, that might contain
other data types, other files and so forth. Consider also the complexity that
arises from the fact that fMRI data can be collected using many different
acquisition protocols, and the fact that fMRI processing sometimes uses other
information (for example, measurements of the field map, or anatomical
T1-weighted or T2-weighted scans). The fact that the data complies with BIDS
allows fMRIPrep to locate everything that it needs with the dataset and to make
use of all the information to perform the preprocessing to the best of its
ability given the provided data.

#### Utility libraries

```{eval-rst}
.. index::
   single: pyBIDS
```

Lastly, the widespread adoption of BIDS has also spawned a large number of
utility libraries designed to help developers (rather than end users) build
their analysis pipelines and tools more efficiently. Suppose I'm writing a
script to automate my lab's typical fMRI analysis workflow. It's a safe bet
that, at multiple points in my script, I'll need to interact with the input
datasets in fairly stereotyped and repetitive ways. For instance, I might need
to search through the project directory for all files containing information
about event timing, but only for a particular experimental task. Or, I might
need to extract some metadata containing key parameters for each time-series
image I want to analyze (e.g., the repetition time, or TR). Such tasks are
usually not very complicated, but they're tedious and can slow down development
considerably. Worse, at a community level, they introduce massive inefficiency,
because each person working on their analysis script ends up writing their own
code to solve what are usually very similar problems.

A good utility library abstracts away a lot of this kind of low-level work and
allows researchers and analysts to focus most of their attention on high-level
objectives. By standardizing the representation of neuroimaging data, BIDS makes
it much easier to write good libraries of this sort. Probably the best example
so far is a package called [PyBIDS](https://github.com/bids-standard/pybids),
which provides a convenient Python interface for basic querying and manipulation
of BIDS datasets. To give you a sense of how much work PyBIDS can save you when
you're writing neuroimaging analysis code in Python, let's take a look at some
of the things the package can do.

We start by importing an object called `BIDSLayout`, which we will use to manage
and query the layout of files on disk. We also import a function that knows how
to locate some test data that was installed on our computer together with the
PyBIDS software library.

```{code-cell} ipython3
from bids import BIDSLayout
from bids.tests import get_test_data_path
```

One of the datasets that we have within the test data path is [data set number 5 from OpenNeuro](https://openneuro.org/datasets/ds000005). Note that the software has not actually installed into our hard-drive a bunch of neuroimaging data -- that would be too large! Instead, the software installed a bunch of files that have the right names and are organized in the right way, but are mostly empty. This allows us to demonstrate the way that the software works, but don't try reading the neuroimaging data from any of the files in that directory. We'll work with a more manageable BIDS dataset, including the files in it in {numref}`nibabel`.

For now, we initialize a `BIDSLayout` object, by pointing to the location of the
dataset in our disk. When we do that, the software scans through that part of
our file-system, validates that it is a properly organized BIDS dataset, and
finds all of the files that are arranged according to the specification. This
allows the object to already infer some things about the dataset. For example,
the dataset has 16 subjects and 48 total runs (here a "run" is an individual
fMRI scan). The person who organized this dataset decided not to include a
session folder for each subject. Presumably, because each subject participated in
just one session in this experiment, that information is not useful.

```{code-cell} ipython3
layout = BIDSLayout(get_test_data_path() + "/ds005")
print(layout)
```

The `layout` object now has a method called `get()`, which we can use to gain
access to various parts of the dataset. For example, we can ask to give us a
list of the filenames of all of the anatomical (`"T1w"`) scans that were
collected for subjects `sub-01` and `sub-02`

```{code-cell} ipython3
layout.get(subject=['01', '02'],  suffix="T1w", return_type='filename')
```

Or, using a slightly different logic, all of the functional (`"bold"`) scans collected for subject `sub-03`

```{code-cell} ipython3
layout.get(subject='03', suffix="bold", return_type='filename')
```

In these examples, we asked the `BIDSLayout` object to give us the `'filename'`
return type. This is because if we don't explicitly ask for a return type, we
will get back a list of `BIDSImageFile` objects. For example, selecting the
first one of these for `sub-03`'s fMRI scans:

```{code-cell} ipython3
bids_files = layout.get(subject="03", suffix="bold")
bids_image = bids_files[0]
```

This object is quite useful, of course. For example, it knows how to parse the
file name into meaningful entities, using the `get_entities()` method, which
returns a dictionary with entities such as `subject` and `task` that can be used
to keep track of things in analysis scripts.

```{code-cell} ipython3
bids_image.get_entities()
```

In most cases, you can also get direct access to the imaging data using the
`BIDSImageFile` object. This object has a `get_image` method, which would
usually return a nibabel `Nifti1Image` object. As you will see in
{numref}`nibabel` this object lets you extract metadata, or even read the data
from a file into memory as a Numpy array. However, in this case, calling the
`get_image` method would raise an error, because, as we mentioned above, the
files do not contain any data. So, let's look at another kind of file that you
can read directly in this case. In addition to the neuroimaging data, BIDS
provides instructions on how to organize files that record the behavioral events
that occurred during an experiment. These are stored as tab-separated-values
('.tsv') files, and there is one for each run in the experiment. For example,
for this dataset, we can query for the events that happened during subject
`sub-03`'s 3rd run:

```{code-cell} ipython3
events = layout.get(subject='03', extension=".tsv", task="mixedgamblestask", run="03")
tsv_file = events[0]
print(tsv_file)
```

Instead of a `BIDSImageFile`, the variable `tsv_file` is now a `BIDSDataFile`
object, and this kind of object has a `get_df` method, which returns a Pandas
`DataFrame` object

```{code-cell} ipython3
bids_df = tsv_file.get_df()
bids_df.info()
```

This kind of functionality is useful if you are planning to automate your
analysis over large datasets that can include heterogeneous acquisitions between
subjects and within subjects. At the very least, we hope that the examples have
conveyed to you the power inherent in organizing your data according to a
standard, as a starting point to use, and maybe also develop, analysis pipelines
that expect data in this format. We will see more examples of this in practice
in the next section.

(nipy_ex1)=
#### Exercises

BIDS has a set of example datasets available in a GitHub repository at https://github.com/bids-standard/bids-examples. Clone the repository and use pyBIDS to explore the dataset called "ds011". Using only pyBIDS code, can you figure out how many subjects participated in this study? What are the values of TR that were used in fMRI acquisitions in this dataset?

## Additional resources

There are many places to learn more about the NiPy community. The documentation of each of the software libraries that were mentioned here includes fully worked-out examples of data analysis, in addition to links to tutorials, videos, and so forth. For Nipype, in particular, we recommend [Michael Notter’s Nipype tutorial](https://miykael.github.io/nipype_tutorial/) as a good starting point.

BIDS is not only a data standard but also a community of developers and users
that support the use and further development of the standard. For example, over
the time since the standard was first proposed, the community has added
instructions to support the organization and sharing of new data modalities
(e.g., intracranial EEG) and derivatives of processing. The strength of such a
community is that, like open-source software, it draws on the individual
strengths of many individuals and many groups who are willing to spend time and
effort evolving and improving it. One of the resources that the community has
developed to help people who are new to the standard learn more about it is the
[BIDS Starter Kit](https://bids-standard.github.io/bids-starter-kit/). It is a
website that includes materials (videos, tutorials, explanations, and examples)
to help you get started learning about and eventually using the BIDS standard.

In addition to these relatively static resources, users of both NiPy software
and BIDS can interact with other members of the community through a dedicated
questions and answers website called [Neurostars](https://neurostars.org/). On
this website, anyone can ask questions about neuroscience software and get
answers from experts who frequent the website. In particular, the developers of
many of the projects that we mentioned in this chapter, and many of the people
who work on the BIDS standard often answer questions about these projects
through this website.

```{eval-rst}
.. index::
   single: Neurostars
```