5. A brief introduction to Python#

Researchers in neuroimaging use different programming languages to perform data analysis. In this book, we chose to focus on the Python programming language. As we mentioned in Section 1, this book isn’t meant to be a general introduction to programming; out of necessity, we’re going to assume that you’ve had some prior experience writing code in one or more other programming languages (or possibly even Python itself). That said, we’ll be briefly reviewing key programming concepts as we go, so you don’t need to have had much prior experience. As long as you’ve encountered, say, variables, functions, and for-loops before, you should be just fine. And if you haven’t yet, we recommend some resources at the end of this section, to get you up to speed. Conversely, if you’ve been programming in other languages for years, you should be able to breeze through this section very quickly (though we might still recommend paying attention to the last few sub-sections, which talk about some deeper ideas underlying the Python language).

5.1. What is Python?#

Let’s start by talking a bit about what Python is and why we chose it for this book. First of all, it is a programming language, which means it’s basically a set of rules for writing instructions that a computer can understand and follow. Of course, that alone doesn’t make Python special; there are hundreds of other programming languages out there! But Python isn’t just any programming language; by many rankings, it’s currently (as we write this in 2022) the world’s single most popular language. And it’s particularly dominant in one of this book’s two core areas of focus: data science. We also happen to think it’s by far the best choice for doing serious work in the book’s other core area of focus, neuroimaging, and it is the language that we use most often for our own neuroimaging work. But you don’t have to take our word for that right now; hopefully you’ll be convinced of it as we work through this book.

Why do so many people like Python? There are many answers to that, but here are a few important ones.

First, Python is a high-level, interpreted programming language. This means that, in contrast to lower-level, compiled languages like C or C++, Python features a high level of abstraction. A lot of the things you have to manage yourself in those languages (e.g., allocating and freeing memory) are done for you automatically in Python.

Second, Python’s syntax is readable and (relatively speaking) easy to learn. As you’ll see shortly, many Python operators are just ordinary English words. Python also imposes certain rules on code structure that most languages don’t, which may be a bit annoying at first, but makes it easier to read other people’s code once you acclimate. One of the consequences of these design considerations is that mathematical ideas translate into code in a way that does not obscure the math. That means that mathematical equations implemented in Python look a lot like the same equations written on paper. This ends up being quite useful in many data science applications.
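
As a small illustration of that last point, here is a sketch of the sample standard deviation, \(s = \sqrt{\frac{1}{n-1}\sum_{i}(x_i - \bar{x})^2}\), written in plain Python. The data values and variable names are made up for this example, and the code uses a few constructs (such as importing the math module) that have not been introduced yet.

```python
import math

# hypothetical reaction times (in seconds), made up for illustration
x = [0.52, 0.61, 0.48, 0.55, 0.59]

n = len(x)
x_bar = sum(x) / n  # the sample mean

# the sample standard deviation, written much like the formula
s = math.sqrt(sum((x_i - x_bar) ** 2 for x_i in x) / (n - 1))

print(round(s, 3))  # 0.052
```

Each piece of the formula (the sum, the squared differences, the division by n - 1, the square root) appears in the code in roughly the same place as it does on paper.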

Third, Python is a general-purpose language. In contrast to many other dynamic programming languages designed to serve specific niche uses, Python is well suited for a wide range of applications. It features a comprehensive standard library (i.e., the functionality available out-of-the-box when you install Python) and an enormous ecosystem of third-party packages. It also supports multiple programming paradigms to varying extents (object-oriented, functional, etc.). Consequently, Python is used in many areas of software development. Very few other languages can boast that they have some of the best libraries implemented in any language for tasks as diverse as, say, scientific computing and back-end web development.

Lastly, there’s the sheer size of the Python community. While Python undeniably has many attractive features, we wouldn’t want to argue that it’s a better overall programming language than anything else. To some degree it’s probably true that Python’s popularity is an accident of history. If we could randomly re-run the last two decades, we might be extolling the virtues of Haskell (or Julia, or Ruby, or…) instead of Python. So we’re not saying Python is intrinsically the world’s greatest language. But there’s no denying that there are immense benefits to using a language that so many other people use. For most tasks you might want to take on if you’re doing data science and/or neuroimaging data analysis, finding good libraries, documentation, help, and collaborators is simply going to be much easier if you work in Python than if you work in almost any other language.

Importantly, the set of Python tools for analyzing neuroimaging data specifically has rapidly evolved and matured in the last couple of decades, and these tools have gained substantial popularity through their use in research. The neuroimaging-in-Python ecosystem also has a strong ethos of producing high-quality open-source software, which means that the Python tools that we will describe in this book should be accessible to anyone. More broadly, Python has been adopted across many different research fields and is one of the most commonly used programming languages for data analysis across a broad range of domains, including astronomy, geosciences, natural language processing, and so on. For researchers who are thinking of applying their skills outside of academia, Python is also very popular in industry, and many entry-level data science positions specifically list experience programming in Python as a desired skill.

With that basic sales pitch out of the way, let’s dive right into the Python language. We’ll spend the rest of this chapter working through core programming concepts and showing you how they’re implemented in Python. It should go without saying that this can only hope to be a cursory overview; there just isn’t time and space to provide a full introduction to the language! But we’ll introduce many more features and concepts throughout the rest of the book, and we also end each chapter with a curated list of additional resources you can look into if you want to learn much more.

5.2. Variables and basic types#

It’s common to start programming tutorials by talking about variables, and we won’t break with this convention. A variable, as you probably already know, is a store of data that can take on different values (hence the name variable), and is typically associated with a fixed name.

5.2.1. Declaring variables#

In Python, we declare a variable by writing its name and then assigning it a value with the equals sign (=):

my_favorite_variable = 3

Notice that when we initialize a variable, we don’t declare its type anywhere. If you’re familiar with statically typed languages like C++ or Java, you’re probably used to having to specify what type of data a variable holds when you create it. For example, you might write int my_favorite_variable = 3 to indicate that the variable is an integer. In Python, we don’t need to do this. Python is dynamically typed, meaning that the type of each variable will be determined on the fly, once we start executing our program. It also means we can change the type of the variable on the fly, without anything bad happening. For example, we can overwrite the integer value stored in this variable with a character string:

my_favorite_variable = "zzzzzzz"

5.2.2. Printing variables#

We can examine the contents of a variable at any time using the built-in print() function:

print(my_favorite_variable)
zzzzzzz

If we’re working in an interactive environment like a Jupyter notebook, we may not even need to call print(), as we’ll automatically get the output of the last line evaluated by the Python interpreter:

# this line won't be printed, because it isn't the last line in the notebook cell to be evaluated
"this line won't be printed"

# but this one will
my_favorite_variable
'zzzzzzz'
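
With print() in hand, we can also see the dynamic typing described earlier in action. The sketch below uses the built-in type() function, which reports the type of the object that a variable currently refers to. The names type_before and type_after are made up for this example.

```python
my_favorite_variable = 3
type_before = type(my_favorite_variable)
print(type_before)   # <class 'int'>

# re-assigning the same name to a string changes its type on the fly
my_favorite_variable = "zzzzzzz"
type_after = type(my_favorite_variable)
print(type_after)    # <class 'str'>
```

The variable name stays the same throughout; only the kind of object it refers to changes.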

5.2.3. Built-in types#

All general-purpose programming languages provide the programmer with different types of variables—things like strings, booleans, integers, and so on. These are the most basic building blocks a program is made up of. Python is no different, and provides us with a number of built-in types. Let’s take a quick look at some of these.

5.2.3.1. Integers#

An integer is a numerical data type that can only take on finite whole numbers as its value. For example:

number_of_subjects = 20
number_of_timepoints = 1000
number_of_scans = 10

Any time we see a number written somewhere in Python code, and it’s composed only of digits (no decimals, quotes, etc.), we know we’re dealing with an integer.

In Python, integers support all of the standard arithmetic operators you’re familiar with: addition, subtraction, multiplication, etc. For example, we can multiply the two variables we just defined:

number_of_subjects * number_of_timepoints
20000

Or divide one integer by another:

number_of_timepoints / number_of_scans
100.0

Notice that the result of the above division is not itself an integer! The decimal point in the result gives away that the result is of a different type: a float.

5.2.3.2. Floats#

A float (short for floating point) is a numerical data type used to represent real numbers. As we just saw, floats are identified in Python by the presence of a decimal point.

roughly_pi = 3.14
mean_participant_age = 24.201843727

All of the standard arithmetic operators work on floats just like they do on ints:

print(roughly_pi * 2)
6.28

We can also freely combine ints and floats in most operations:

print(0.001 * 10000 + 1)
11.0

Observe that the output is of type float, even though the value is a whole number, and hence could in principle have been stored as an int without any loss of information. This is a general rule in Python: arithmetic operations involving a mix of int and float operands will almost always return a float. Some operations will return a float even if all operands are ints, as we saw above in the case of division.
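
If we do want an integer result when dividing one integer by another, Python also provides the floor division (//) and remainder (%) operators. These operators are not used elsewhere in this section; the following is only a small sketch, and the value 1003 is made up for the example.

```python
number_of_timepoints = 1000
number_of_scans = 10

# ordinary division always produces a float
true_division = number_of_timepoints / number_of_scans

# floor division of two ints rounds down and produces an int
floor_division = number_of_timepoints // number_of_scans

# the remainder operator gives what is left over after floor division
remainder = 1003 % number_of_scans

print(true_division, floor_division, remainder)   # 100.0 100 3
```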

Exercise

The Python built-in type() function reports to you the type of a variable that is passed to it. Use the type function to verify that number_of_subjects * number_of_timepoints is a Python integer, while number_of_timepoints / number_of_scans is not. Why do you think that Python changes the result of a division into a variable of type float?

5.2.3.3. Strings#

A string is a sequence of characters. In Python, we define strings by enclosing zero or more characters inside a pair of quotes (either single or double quotes work equally well, so you can use whichever you prefer; just make sure the opening and closing quotes match!).

country = "Madagascar"
ex_planet = 'Pluto'

Python has very rich built-in functionality for working with strings. Let’s look at some of the things we can do.

We can calculate the length of a string:

len(country)
10

Or convert it to uppercase (try also .lower() and .capitalize()):

country.upper()
'MADAGASCAR'

We can count the number of occurrences of a substring (in this case, a single letter a):

country.count("a")
4

Or replace a matching substring with another substring:

country.replace("car", "truck")
'Madagastruck'

One thing that you might notice in the above examples is that they seem to use two different syntaxes. In the first example, it looks like len() is a function that takes a string as its parameter (or argument). By contrast, the last 3 examples use a different “dot” notation, where the function comes after the string (as in country.upper()). If you find this puzzling, don’t worry! We’ll talk about the distinction in much more detail below.
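
Strings have many more methods than the ones shown so far. As a sketch, here are a few that are handy for picking apart file names. The file name below is hypothetical, and startswith(), endswith(), find(), and split() are standard string methods that have not appeared in the text above. Note that split() returns a list, a collection type that is introduced in the next section.

```python
# a hypothetical file name, made up for this example
file_name = "sub-01_task-rest_bold.nii.gz"

# check how the string begins and ends
starts_with_sub = file_name.startswith("sub-")
ends_with_nii_gz = file_name.endswith(".nii.gz")
print(starts_with_sub, ends_with_nii_gz)   # True True

# find the position at which a substring first appears
task_position = file_name.find("task")
print(task_position)                       # 7

# break the string apart at every underscore
parts = file_name.split("_")
print(parts)                               # ['sub-01', 'task-rest', 'bold.nii.gz']
```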

5.2.3.4. Booleans#

Booleans operate pretty much the same in Python as in other languages; the main thing to recognize is that they can only take on the values True or False. Not true or false, not "true" or "false". The only values a boolean can take on in Python are True and False, written exactly that way. For example:

enjoying_book = True

One of the ways that boolean values are typically generated in Python programs is through logical or comparison operations. For example, we can ask whether the length of a given string is greater than a particular integer:

is_longer_than_2 = len("apple") > 2
print(is_longer_than_2)
True

Or whether the product of the first two numbers below equals the third…

is_the_product = 719 * 1.0002 == 2000
print(is_the_product)
False

Or, we might want to know whether the conjunction of several sub-expressions is True or False:

("car" in country) and (len("apple") > 2) and (15 / 2 > 7)
True

This last example, simple as it is, illustrates a nice feature of Python: its syntax is more readable than that of most other programming languages. In the above example, we ask if the substring "car" is contained in the string country using the English language word in. Similarly, Python’s logical conjunction operator is the English word and. This means that we can often quickly figure out–or at least, vaguely intuit–what a piece of Python code does.
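
The same readability extends to Python’s other logical operators, or and not, which are also plain English words. A brief sketch (the variable names has_truck and starts_with_m are made up for this example):

```python
country = "Madagascar"

has_truck = "truck" in country            # False
starts_with_m = country.startswith("M")   # True

print(has_truck or starts_with_m)    # True: at least one operand is True
print(has_truck and starts_with_m)   # False: both operands must be True
print(not has_truck)                 # True: the negation of False
```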

5.2.3.5. None#

In addition to these usual suspects, Python also has a special value called None (whose type is NoneType). None indicates that no value has been assigned to a variable. It’s roughly equivalent to the null value found in many other languages.

name = None

Note: None and False are not the same thing!

name == False
False

Also, assigning the value None to a variable is not the same as not defining the variable in the first place. Instead a variable that is set to None is something that we can point to in our program without raising an error, but doesn’t carry any particular value. These are subtle but important points, and in later chapters we’ll write code where the difference becomes important.
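
To make the distinction concrete, here is a minimal sketch. It relies on two things not covered above: the is operator, which is the conventional way to test for None, and a try/except block, which lets us catch an error instead of stopping the program. The name never_defined is deliberately left undefined.

```python
name = None

# a variable set to None exists, and we can test for None with `is`
print(name is None)       # True
print(name is not None)   # False

# by contrast, a name that was never defined raises a NameError
try:
    print(never_defined)
except NameError as error:
    error_message = str(error)
    print("NameError:", error_message)
```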

Exercise

Some integer values are logically equivalent to the Python Boolean values. Use the equality (==) operator to find an integer that is equivalent to True and an integer that is equivalent to False.

5.3. Collections#

Most code we’re going to want to write in Python will require more than just integers, floats, strings, and booleans. We’re going to need more complex data structures, or collections, that can hold other objects (like strings, integers, etc.) and enable us to easily manipulate them in various ways. Python provides built-in support for many common data structures, and others can be found in modules that come installed together with the language itself – the so-called “standard library” (e.g., in the collections module).

5.3.1. Lists#

Lists are the most common collection we’ll work with in Python. A list is a heterogeneous collection of objects. By heterogeneous, we mean that a list can contain elements of different types. It doesn’t have to contain only strings or only integers; it can contain a mix of the two, as well as all kinds of other types.

5.3.1.1. List initialization#

To create a new list, we enclose zero or more values between square brackets ([ and ]). Elements are separated by commas. Here is how we initialize a list containing 4 elements of different types (an integer, a float, and two strings).

random_stuff = [11, "apple", 7.14, "banana"]

5.3.1.2. List indexing#

Lists are ordered collections. By ordered, we mean that a list retains a memory of the position each of its elements was inserted in. The order of elements won’t change unless we explicitly change it. This allows us to access individual elements in the list directly, by specifying their position in the collection, or index.

To access the \(i^{th}\) element in a list, we enclose the index \(i\) in square brackets. Note that Python uses 0-based indexing (i.e., the first element in the sequence has index 0), and not 1-based indexing as in some other data-centric languages (MATLAB, R, etc.). This means that the following operation returns the second item in the list, and not the first.

random_stuff[1]
'apple'

Many bitter wars have been fought on the internet over whether 0-based or 1-based indexing is better. We’re not here to take a philosophical stand on this issue; the fact of the matter is that Python indexing is 0-based, and that’s not going to change. So whether or not you like it, you’ll need to make your peace with the idea that indexing starts from 0 while you’re reading this book.
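
Python also lets us index from the end of a list using negative numbers, which is not shown above. A short sketch, re-creating the random_stuff list so that the example is self-contained (the names last_item and second_to_last are made up):

```python
random_stuff = [11, "apple", 7.14, "banana"]

# index -1 refers to the last element, -2 to the one before it, and so on
last_item = random_stuff[-1]
second_to_last = random_stuff[-2]

print(last_item)        # banana
print(second_to_last)   # 7.14

# with 0-based indexing, the last element is also at index len(...) - 1
print(random_stuff[len(random_stuff) - 1] == last_item)   # True
```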

5.3.1.3. List slicing#

Indexing is nice, but what if we want to pull more than one element at a time out of our list? Can we easily retrieve only part of a list? The answer is yes! We can slice a list, and get back another list that contains multiple contiguous elements of the original list, using the colon (:) operator.

random_stuff[1:3]
['apple', 7.14]

In the list-slicing syntax, the number before the colon indicates the start position, and the number after the colon indicates the end position. Note that the start is inclusive and the end is exclusive. That is, in the above example, we get back the 2nd and 3rd elements in the list, but not the 4th. If it helps, you can read the 1:3 syntax as saying I want all the elements in the list starting at index 1 and stopping just before index 3.
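
Slices are a bit more flexible than the example above suggests. In the sketch below, which goes slightly beyond the text, we leave out the start or the end position (Python then uses the beginning or the end of the list) and add an optional third number, the step. The variable names are made up for this example.

```python
random_stuff = [11, "apple", 7.14, "banana"]

first_two = random_stuff[:2]       # from the beginning, stopping before index 2
from_third = random_stuff[2:]      # from index 2 through the end
every_other = random_stuff[::2]    # the whole list, taking every 2nd element

print(first_two)     # [11, 'apple']
print(from_third)    # [7.14, 'banana']
print(every_other)   # [11, 7.14]
```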

5.3.1.4. Assigning values to list elements#

Lists are mutable objects, meaning that they can be modified after they’ve been created. In particular, we very often want to replace a particular list value with a different value. To overwrite an element at a given index, we assign a value to it, using the same indexing syntax we saw above:

print("Value of first element before re-assignment:", random_stuff[0])

random_stuff[0] = "eleventy"

print("Value of first element after re-assignment:", random_stuff[0])
Value of first element before re-assignment: 11
Value of first element after re-assignment: eleventy

5.3.1.5. Appending to a list#

It’s also very common to keep appending variables to an ever-growing list. We can add a single element to a list via the .append() function (notice again that we are calling a function using the ‘dot’ notation; we promise that we’ll come back to that later!).

random_stuff.append(88)
print(random_stuff)
['eleventy', 'apple', 7.14, 'banana', 88]

Exercise

There are several ways to combine lists together, including the append function you saw above, as well as the extend method. You can also add lists together using the addition (+) operator.

Given the following two lists:

list1 = [1, 2, 3]

list2 = [4, 5, 6]

How would you create a new list called list3 that has the items [6, 5, 1, 2, 3], with as few operations as possible and using only indexing operations and functions associated with the list? (Hint: you can look up these functions in the Python online documentation for lists.)

5.3.2. Dictionaries (dict)#

Dictionaries are another extremely common data structure in Python. A dictionary (or dict) is a mapping from keys to values; we can think of it as a set of key/value pairs, where the keys have to be unique (but the values don’t). Many other languages have structures analogous to Python’s dictionaries, though they’re usually called something like associative arrays, hash tables, or maps.

5.3.2.1. Dictionary initialization#

We initialize a dictionary by specifying comma-delimited key/value pairs inside curly braces. Keys and values are separated by a colon. It looks like this:

fruit_prices = {
    "apple": 0.65,
    "mango": 1.5,
    "strawberry": "$3/lb",
    "durian": "unavailable",
    5: "just to make a point"
}

Notice that both the keys and the values can be heterogeneously typed (observe the last pair, where the key is an integer).

5.3.2.2. Accessing values in a dictionary#

In contrast to lists, you can’t access values stored in a dictionary directly by their serial position. Instead, values in a dictionary are accessed by their key. The syntax is identical to that used for list indexing. We specify the key whose corresponding value we’d like to retrieve in between square brackets:

fruit_prices['mango']
1.5

By contrast, the following example fails, raising a KeyError that tells us there’s no such key in the dictionary:

fruit_prices[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [30], in <cell line: 1>()
----> 1 fruit_prices[0]

KeyError: 0

However, the reason the above key failed is not that integers are invalid keys. To prove that, consider the following:

fruit_prices[5]
'just to make a point'

Superficially, it might look like we’re requesting the 6th element in the dictionary and getting back a valid value. But that’s actually not what’s happening here. If it’s not clear to you why fruit_prices[0] fails while fruit_prices[5] succeeds, go back and look at the code we used to create the fruit_prices dictionary. Carefully inspect the keys and make sure you understand what’s going on.
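
If we are not sure whether a key exists, we can check before asking for its value. The sketch below uses the in operator and the dictionary’s get() method, neither of which is used with dictionaries in the text above. It re-creates a shortened version of fruit_prices, and the "kiwi" key is made up for the example.

```python
fruit_prices = {
    "apple": 0.65,
    "mango": 1.5,
    5: "just to make a point",
}

# `in` checks whether a key is present in the dictionary
print(0 in fruit_prices)   # False
print(5 in fruit_prices)   # True

# get() returns None (or a default we choose) instead of raising a KeyError
missing_value = fruit_prices.get(0)
kiwi_price = fruit_prices.get("kiwi", "not sold")
print(missing_value)       # None
print(kiwi_price)          # not sold
```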

5.3.2.3. Updating a dictionary#

Updating a dictionary uses the same []-based syntax as accessing values, except we now make an explicit assignment. For example, we can add a new entry for the ananas key:

fruit_prices["ananas"] = 0.5

Or overwrite the value for the mango key:

fruit_prices["mango"] = 2.25

And then look at the dict again:

print(fruit_prices)
{'apple': 0.65, 'mango': 2.25, 'strawberry': '$3/lb', 'durian': 'unavailable', 5: 'just to make a point', 'ananas': 0.5}
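
A few other dictionary operations are worth knowing about, although they are not used above. The following sketch works on a shortened version of fruit_prices and shows how we might count the entries, list the keys and values, and delete an entry with the del statement.

```python
fruit_prices = {"apple": 0.65, "mango": 2.25, "ananas": 0.5}

number_of_entries = len(fruit_prices)
all_keys = list(fruit_prices.keys())
all_values = list(fruit_prices.values())

print(number_of_entries)   # 3
print(all_keys)            # ['apple', 'mango', 'ananas']
print(all_values)          # [0.65, 2.25, 0.5]

# remove the entry stored under the "ananas" key
del fruit_prices["ananas"]
print(fruit_prices)        # {'apple': 0.65, 'mango': 2.25}
```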

5.3.3. Tuples#

The last widely-used Python collection we’ll discuss here (though there are many other more esoteric ones) is the tuple. Tuples are very similar to lists in Python. The main difference between lists and tuples is that lists are mutable, meaning, they can change after initialization. Tuples are immutable; once a tuple has been created, it can no longer be modified.

We initialize a tuple in much the same way as a list, except we use parentheses (round brackets) instead of square brackets:

my_tuple = ("a", 12, 4.4)

Just to drive home the immutability of tuples, let’s try replacing a value and see what happens:

my_tuple[1] = 999
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [36], in <cell line: 1>()
----> 1 my_tuple[1] = 999

TypeError: 'tuple' object does not support item assignment

Our attempt to modify the tuple makes the Python interpreter unhappy. Fortunately, we can easily convert any tuple to a list, after which we can modify it to our heart’s content.

converted_from_tuple = list(my_tuple)
converted_from_tuple[1] = 999
print(converted_from_tuple)
['a', 999, 4.4]

In practice, you can use a list almost anywhere you can use a tuple, though there are some important exceptions. One that you can already appreciate is that a tuple can be used as a key to a dictionary, but a list can’t:

dict_with_sequence_keys = {my_tuple : "Access this value using a tuple!"}
dict_with_sequence_keys[converted_from_tuple] = "This will not work"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 dict_with_sequence_keys[converted_from_tuple] = "This will not work"

TypeError: unhashable type: 'list'

Admittedly, the error that this produces is a bit cryptic. It relates directly to mutability: dictionary keys must be hashable, and Python’s built-in mutable collections, such as lists, are not hashable, because their elements can change at any time.
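
To see where tuple keys might come in handy, consider the following sketch, in which a dictionary maps three-dimensional coordinates to labels. The coordinates, the labels, and the variable names are all made up for illustration. The sketch also calls the built-in hash() function directly, to show that it works for a tuple but not for a list.

```python
# hypothetical coordinates and labels, made up for this example
region_labels = {
    (10, 22, 31): "region A",
    (45, 12, 8): "region B",
}
print(region_labels[(10, 22, 31)])   # region A

# hash() works on a tuple (the exact number it returns is not important)
tuple_is_hashable = isinstance(hash((10, 22, 31)), int)

# hash() fails on a list, raising a TypeError
try:
    hash([10, 22, 31])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(tuple_is_hashable, list_is_hashable)   # True False
```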

5.4. Everything in Python is an object#

Our discussion so far might give off the impression that some data types in Python are basic or special in some way. It’s natural to think, for example, that strings, integers, and booleans are “primitive” data types, i.e., that they’re built into the core of the language, behave in special ways, and can’t be duplicated or modified. And this is true in many other programming languages. For example, in Java, there are exactly 8 primitive data types. If you get bored of them, you’re out of luck. You can’t just create new ones, say, a new type of string that behaves just like the primitive strings, but adds some additional functionality you think would be kind of cool to have.

Python is different: it doesn’t really have any primitive data types. Python is a deeply object-oriented programming language, and in Python, everything is an object. Strings are objects, integers are objects, booleans are objects. So are lists. So are dictionaries. Everything is an object in Python. We’ll spend more time talking about what objects are, and the deeper implications of everything being an object, at the end of this chapter. For now, let’s focus on some of the practical implications for the way we write code.

5.4.1. The dot notation#

Let’s start with the dot (.) notation we use to indicate that we’re accessing data or functionality inside an object. You’ve probably already noticed that there are two kinds of constructions we’ve been using in our code to do things with variables. There’s the functional syntax, where we pass an object as an argument to a function:

len([2, 4, 1, 9])
4

And then there’s the object-oriented syntax that uses the dot notation, which we saw when looking at some of the functionality implemented in strings:

phrase = "aPpLeS ArE delICIous"

phrase.lower()
'apples are delicious'

If you have some experience in another object-oriented programming language, the dot syntax will be old hat to you. But if you’ve mostly worked in data-centric languages (e.g., R or MATLAB), you might find it puzzling at first.

What’s happening in the above example is that we’re calling lower(), a function that is attached to the phrase string itself. A function attached to an object in this way is called a “method” of the object. You can think of the dot operator . as expressing a relationship of belonging, or roughly translating as “look inside of”. So, when we write phrase.lower(), we’re essentially saying, “call the lower() method that’s contained inside of phrase”. (We’re being a bit sloppy here for the sake of simplicity, but that’s the gist of it.)

Note that lower() works on strings, but, unlike functions like len() and round(), it isn’t a built-in function in Python. We can’t just call lower() directly:

lower("TrY to LoWer ThIs!")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [42], in <cell line: 1>()
----> 1 lower("TrY to LoWer ThIs!")

NameError: name 'lower' is not defined

Instead, it needs to be called via an instance that contains this function, as we did above with phrase.

Neither is lower() a method that’s available on all objects. For example, this won’t work:

num = 6

num.lower()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [43], in <cell line: 3>()
      1 num = 6
----> 3 num.lower()

AttributeError: 'int' object has no attribute 'lower'

Integers, as it happens, don’t have a method called lower(). And neither do most other types. But strings do. And what the lower() method does, when called from a string, is return a lower-cased version of the string to which it is attached. But that functionality is a feature of the string type itself, and not of the Python language in general.
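
One way to convince ourselves that lower() belongs to the string type is to look it up on the type directly. The sketch below is only an illustration of that idea, not something we would usually write in practice. It also uses the built-in hasattr() function, which reports whether an object has an attribute with a given name. The names via_object and via_type are made up.

```python
phrase = "aPpLeS ArE delICIous"

# the usual way: call the method through the object
via_object = phrase.lower()

# equivalent: find the method on the str type and pass the object to it
via_type = str.lower(phrase)

print(via_object == via_type)   # True

# strings have a lower attribute, but integers do not
print(hasattr(phrase, "lower"), hasattr(6, "lower"))   # True False
```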

Later, we’ll see how we go about defining new types (or classes), and specifying what methods they have. For the moment, the main point to take away is that almost all functionality in Python is going to be accessed via objects. The dot notation is ubiquitous in Python, so you’ll need to get used to it quickly if you’re used to a purely functional syntax.

5.4.1.1. Inspecting objects#

One implication of everything being an object in Python is that we can always find out exactly what data an object contains, and what methods it implements, by inspecting it in various ways.

We won’t look very far under the hood of objects in this chapter, but it’s worth knowing about a couple of ways of interrogating objects that can make your life easier.

First, you can always see the type of an object with the built-in type() function, which you also saw before:

msg = "Hello World!"

type(msg)
str

Second, the built-in dir() function will show you all of the methods implemented on an object, as well as its other attributes, which are variables stored within the object. Be warned that this will often be a long list, and that some of the attribute names you see (mainly those that start and end with two underscores) will look a little wonky. We’ll talk about those briefly later.

dir(msg)
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

That’s a pretty long list! Any name in that list is available to you as an attribute of the object (e.g., msg.upper(), msg.__class__, etc.), meaning that you can access it and call it (if it is a function) using the dot notation. Notice that the list contains all of the string methods we experimented with earlier (including lower), as well as many others.
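
Because dir() returns an ordinary list of strings, we can work with its output like any other list. As a sketch, the code below keeps only the names that do not start with an underscore, and then uses the built-in getattr() function to fetch one attribute by name. It uses a list comprehension, a construct that has not been introduced yet, and the names public_names and upper_method are made up for this example.

```python
msg = "Hello World!"

# keep only the names that do not start with an underscore
public_names = [name for name in dir(msg) if not name.startswith("_")]
print(public_names[:3])   # ['capitalize', 'casefold', 'center']

# getattr() looks up an attribute by its name, given as a string
upper_method = getattr(msg, "upper")
print(upper_method())     # HELLO WORLD!
```

The exact number of names in public_names depends on the version of Python that you are running.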

5.5. Control flow#

Like nearly every other programming language, Python has a number of core language constructs that allow us to control the flow of our code, that is, the order in which functions get called and expressions are evaluated. The two most common ones are conditionals (if-then statements) and for-loops.

5.5.1. Conditionals#

Conditional (or if-then) statements allow our code to branch, meaning that we can execute different chunks of code depending on which of two or more conditions is met. For example:

mango = 0.2

if mango < 0.5:
    print("Mangoes are super cheap; get a bunch of them!")
elif mango < 1.0:
    print("Get one mango from the store.")
else:
    print("Meh. I don't really even like mangoes.")
Mangoes are super cheap; get a bunch of them!

The printed statement will vary depending on the value assigned to the mango variable. Try changing that value and see what happens when you re-run the code.

Notice that there are actually three statements in the above code: if, elif (which in Python stands for “else if”), and else. Only the first of these (i.e., if) is strictly necessary; the elif and else statements are optional.
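
One detail that is easy to miss: the branches are tested from top to bottom, and only the first one whose condition is True is executed, even if a later condition is also True. The sketch below records which branch ran in a list called messages, a name made up for this example.

```python
mango = 0.2

messages = []
if mango < 0.5:
    messages.append("first branch ran")
elif mango < 1.0:
    messages.append("second branch ran")

# both conditions are True for this value of mango...
print(mango < 0.5, mango < 1.0)   # True True

# ...but only the first matching branch was executed
print(messages)                   # ['first branch ran']
```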

Exercise

There can be arbitrarily many elif statements. Try adding another one to the code above that executes only in the case that mangoes are more expensive than 2.0 and less expensive than 5.0.

5.5.2. Loops#

For-loops allow us to iterate (or loop) over the elements of a collection (e.g., a list) and perform the same operation(s) on each one. As with most things Python, the syntax is quite straightforward and readable:

for elem in random_stuff:
    print(elem)
eleventy
apple
7.14
banana
88

Here we loop over the elements in the random_stuff list. In each iteration (i.e., for each element), we assign the value to the variable elem. We can then do whatever we like with elem. In this case, we just print its value. Note that, unlike loop variables in some other languages, elem is not confined to the loop: once the for-loop is done executing, elem still exists and holds the last value that was assigned to it (here, 88).

5.5.2.1. Looping over a range#

While we can often loop directly over the elements in a list (as in the above example), it’s also very common to loop over a range of integer indices, which we can then use to access data stored in some sequential form in one or more collections. To facilitate this type of thing, we can use Python’s built-in range() function, which produces a sequence of integers starting from 0 and stopping before the passed value:

num_elems = len(random_stuff)

for i in range(num_elems):
    val = random_stuff[i]
    print(f"Element {i}: {val}")
Element 0: eleventy
Element 1: apple
Element 2: 7.14
Element 3: banana
Element 4: 88
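Beyond the single-argument form used above, range() also accepts an optional start and step. The following is a small sketch of the three call forms; the variable names are chosen only for illustration, and list() is used just to make the generated integers visible.

```python
# range(stop): integers from 0 up to, but not including, stop
up_to_five = list(range(5))
print(up_to_five)     # [0, 1, 2, 3, 4]

# range(start, stop): begin at start instead of 0
two_to_five = list(range(2, 5))
print(two_to_five)    # [2, 3, 4]

# range(start, stop, step): advance by step each time
every_third = list(range(0, 10, 3))
print(every_third)    # [0, 3, 6, 9]
```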

Exercise

The content that was printed in each iteration of the loop in the last example is formatted using a so-called “f-string”. This is a way to compose strings that change based on information from the code surrounding them. An f-string is a string that has the letter “f” before it, as in this example, and it can contain segments enclosed by curly braces ({ and }) that contain Python expressions. In this case, the expression in each pair of curly braces is a variable name, and the value of the variable at that point in the code is inserted into the string, but you could also insert small calculations that produce a result that then gets inserted into the string at that location. As an exercise, rewrite the code above so that in each iteration through the loop the value of i and the value of i squared are both printed.

5.5.3. Nested control flow#

We can also nest conditionals and for-loops inside one another (as well as inside other compound statements). For example, we can loop over the elements of random_stuff, as above, but keeping only the elements that meet some condition—e.g., only those elements that are strings:

# create an empty list to hold the filtered values
strings_only = []

# loop over the random_stuff list
for elem in random_stuff:
    # if the current element is a string...
    if isinstance(elem, str):
        # ...then append the value to strings_only
        strings_only.append(elem)

print("Only the string values:", strings_only)
Only the string values: ['eleventy', 'apple', 'banana']

5.5.4. Comprehensions#

In Python, for-loops can also be written in a more compact way known as a list comprehension (there are also dictionary comprehensions, but we’ll leave that to you to look up as an exercise). List comprehensions are just a bit of syntactic sugar—meaning, they’re just a different way of writing the same code, but don’t change the meaning in any way. Here’s the list comprehension version of the for-loop we wrote above:

p = [print(elem) for elem in random_stuff]
eleventy
apple
7.14
banana
88
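A list comprehension always builds a new list out of the value of its expression for each element. The sketch below recreates random_stuff from the output shown above so that it runs on its own; type_names is a made-up example. It compares a comprehension with the equivalent explicit loop, and then shows what ends up in p in the example above.

```python
# Recreated here so the example is self-contained; the values match
# the output shown for random_stuff above.
random_stuff = ["eleventy", "apple", 7.14, "banana", 88]

# The comprehension collects the value of the expression for each element.
type_names = [type(elem).__name__ for elem in random_stuff]

# The equivalent explicit for-loop.
type_names_loop = []
for elem in random_stuff:
    type_names_loop.append(type(elem).__name__)

print(type_names)  # ['str', 'str', 'float', 'str', 'int']

# print() returns None, so a comprehension built around print() yields
# a list of None values.
p = [print(elem) for elem in random_stuff]
print(p)  # [None, None, None, None, None]
```

Because of this, a comprehension is usually most natural when we actually want the resulting list; when we only want a side effect such as printing, a plain for-loop tends to be clearer.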

We can also embed conditional statements inside list comprehensions. Here’s a much more compact way of writing the string-filtering snippet we wrote above:

strings_only = [elem for elem in random_stuff if isinstance(elem, str)]

print("Only the string values:", strings_only)
Only the string values: ['eleventy', 'apple', 'banana']

List comprehensions can save you quite a bit of typing once you get used to reading them, and you may eventually even find them clearer to read. It’s also possible to nest list comprehensions (equivalent to for-loops within for-loops), though that power should be used sparingly, as nested list comprehensions can be difficult to understand.
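As a rough illustration of the nesting mentioned above, here is a sketch of a comprehension with two for clauses next to the equivalent pair of nested for-loops. The rows and cols lists are hypothetical data.

```python
rows = [1, 2]
cols = ["a", "b", "c"]

# Nested for-loops...
pairs_loop = []
for r in rows:
    for c in cols:
        pairs_loop.append((r, c))

# ...and the equivalent comprehension; the for clauses appear in the
# same order as the loops above (outer loop first).
pairs = [(r, c) for r in rows for c in cols]

print(pairs)
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')]
```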

5.5.5. Whitespace is syntactically significant#

One thing you might have noticed when reading the conditional statements and for-loops above is that we always seem to indent our code inside these statements. This isn’t a matter of choice; Python is a bit of an odd duck among programming languages, in that it imposes strong rules about how whitespace can be used (i.e., whitespace is syntactically significant). This can take a bit of getting used to, but once you do, it has important benefits: there’s less variability in coding style across different Python programmers, and reading other people’s code is often much easier than it is in languages without syntactically significant whitespace.

The main rule you need to be aware of is that whenever you enter a compound statement (which includes for-loops and conditionals, but also function and class definitions, as we’ll see below), you have to increase the indentation of your code. When you exit the compound statement, you then decrease the indentation by the same amount.

The exact amount you indent each time is technically up to you. But it’s strongly recommended that you use the same convention everyone else does (described in the Python style guide, known as PEP8), which is to always indent or dedent by 4 spaces. Here’s what this looks like in a block with multiple nested conditionals:

num = 800

if num > 500:
    if num < 900:
        if num > 700:
            print("Great number.")
        else:
            print("Terrible number.")
Great number.

Exercise

Modify the above snippet so that you (a) consistently use a different amount of indentation (for example, 3 spaces), and (b) break Python by using invalid indentation.

5.6. Namespaces and imports#

Python is a high-level, dynamic programming language, which people often associate with flexibility and lack of precision (e.g., as we’ve already seen, you don’t have to declare the type of your variables when you initialize them in Python). But in some ways, Python is actually much more of a stickler than most other dynamic languages about the way Python developers write their code. We just saw that Python is very serious about how you indent your code. Another thing that’s characteristic of Python is that it takes namespacing very seriously.

If you’re used to languages like, say, R or MATLAB, you might expect to have hundreds of different functions available to call as soon as you fire up an interactive prompt. By contrast, in Python, the built-in namespace (the set of functions you can invoke when you start running Python) is very small. This is by design: Python expects you to carefully manage the code you use, and it’s particularly serious about making sure you maintain orderly namespaces.

In practice, this means that any time you want to use some functionality that is not built-in and immediately available, you need to explicitly import it from whatever module it’s currently in, via an import statement. This can initially look strange, but once you get used to it, you’ll find that it substantially increases code clarity and almost completely eliminates naming conflicts and confusion about where some functionality came from.

5.6.1. Importing a module#

Conventionally, all import statements in a Python file are consolidated at the very top (though there are some situations where this isn’t possible). Here’s what the most basic usage of import looks like:

import json
dummy = {'a': 1, 'b': 5}
json.dumps(dummy)
'{"a": 1, "b": 5}'

In this case, we begin by importing a module from Python’s standard library–i.e., the set of libraries that come bundled with the Python interpreter in every standard installation. Once we import a module, we can invoke any of the functions located inside it, using the dot syntax you see above. In this case, we import the json module, which provides tools for converting to and from the JSON format. JSON, which stands for JavaScript Object Notation, is a widely-used text-based data representation format. In the above example, we take a Python dictionary and convert (or “dump”) it to a JSON string by calling the dumps() function.

Note that if we hadn’t explicitly imported json, the json.dumps() call would have failed, because json would be undefined in our namespace. You can also try directly calling dumps() alone (without the json prefix) to verify that there is no such function available to you in Python’s root namespace.
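The following sketch shows both situations side by side. It wraps the failing call in a try/except block (a construct not covered in this section) only so that the example can run to completion and display the error message.

```python
import json

dummy = {"a": 1, "b": 5}

# Works: dumps() is reached through the json module's namespace.
print(json.dumps(dummy))  # {"a": 1, "b": 5}

# Fails: there is no bare dumps() among Python's built-ins.
try:
    dumps(dummy)
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)
```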

5.6.2. Importing from a module#

Importing a module by name gives us access to all of its internal attributes. But sometimes we only need to call a single function inside a module, and we might not want to have to type the full module name every time we use that function. In that case, we can import from the module:

# defaultdict is a dictionary that has default values for new keys.
from collections import defaultdict

# When initializing a defaultdict, we specify the default type of new values.
test_dict = defaultdict(int)

# this would fail with a normal dict, but with a defaultdict,
# a new key with the default value is created upon first access.
test_dict['made_up_key']
0

In this case, we import defaultdict directly from the collections module into our current namespace. This makes defaultdict available for our use. Note that collections itself is not available to us unless we explicitly import it (i.e., by running import collections):

import collections
another_test_dict = collections.defaultdict(int)
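To see why a defaultdict can be convenient in practice, here is a minimal sketch that counts repeated items. The fruits list is hypothetical data, and counting is just one possible use.

```python
from collections import defaultdict

# Hypothetical data, used only for illustration.
fruits = ["apple", "banana", "apple", "mango", "apple"]

counts = defaultdict(int)  # missing keys start at int(), which is 0
for fruit in fruits:
    counts[fruit] += 1     # no need to check whether the key exists

print(dict(counts))  # {'apple': 3, 'banana': 1, 'mango': 1}
```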

5.6.3. Renaming variables at import time#

Sometimes the module or function we want to import has an unwieldy name. Python’s import statements allow us to rename the variable we’re importing on-the-fly using the as keyword:

from collections import defaultdict as dd

float_test_dict = dd(float)

For many commonly used packages, there are strong conventions about naming abbreviations, and you should make sure to respect these in your code. For example, it’s standard to see import numpy as np and import pandas as pd (libraries that you will learn about in Section 8 and Section 9, respectively). Your code will still work fine if you use other variable names, but other programmers will have a slightly more difficult time understanding what you’re doing. So be kind to others and respect the conventions.

5.7. Functions#

Python would be of limited use to us if we could only run our code linearly from top to bottom. Fortunately, as in almost every other modern programming language, Python has functions: blocks of code that only run when explicitly called. Some of these are built into the language itself (or contained in the standard library’s many modules we can import from, as we saw above):

approx_pi = 3.141592

round(approx_pi, 2)
3.14

Here we use the round() function to round a float to the desired precision (2 decimal places). The round() function happens to be one of the few dozen “built-ins” included in the root Python namespace out of the box, but we can easily define our own functions, which we can then call just like the built-in ones. Functions are defined like this:

def print_useless_message():
    print("This is a fairly useless message.")

Here, we’re defining a new function called print_useless_message, which can print a fairly useless message (truth in advertising!). Notice that nothing happens when we run the above block of code. That’s because all we’ve done is define the function; we haven’t yet called or invoked it. We can do that like this:

print_useless_message()
This is a fairly useless message.

5.7.1. Function arguments and return values#

Functions can accept arguments (or parameters) that alter their behavior. When we called round() above, we passed two arguments: the float we wanted to round, and the number of decimal places we wanted to round it to. The first argument is mandatory in the case of round(); if we try calling round() without any arguments (feel free to give it a shot), we’ll get an error. This should make intuitive sense to you, because it would be pretty strange to try to round no value at all.

Functions can also explicitly return values to the user. To do this, we have to explicitly end our function with a return statement, followed by the variable(s) we want to return. If a function doesn’t explicitly end with a return statement, then the special value None we encountered earlier will be returned.
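The difference between printing a value and returning it is easy to miss, so here is a small sketch. Both greet() and make_greeting() are hypothetical functions written only for this illustration.

```python
def greet(name):
    """Hypothetical function with no return statement."""
    print(f"Hello, {name}!")


def make_greeting(name):
    """Hypothetical function that returns its result instead."""
    return f"Hello, {name}!"


result_without_return = greet("Ada")       # prints: Hello, Ada!
result_with_return = make_greeting("Ada")  # prints nothing

print(result_without_return)  # None
print(result_with_return)     # Hello, Ada!
```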

Let’s illustrate the use of arguments by writing a small function that takes a number as input, adds Gaussian noise (generated by the standard library’s random module) with a given mean and standard deviation, and returns the result.

import random

def add_noise(x, mu, sd):
    """Adds gaussian noise to the input.

    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float
        The mean of the gaussian noise distribution.
    sd : float
        The standard deviation of the noise distribution.

    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise

The add_noise() function has three required parameters: The first (x) is the number we want to add noise to; the second (mu) is the mean of the Gaussian distribution to sample from; and the third (sd) is that distribution’s standard deviation.

Notice that we’ve documented the function’s behavior inside the function definition itself using what’s called a docstring. This is a good habit to get into, as good documentation is essential if you expect other people (including yourself in the future) to be able to use the code you write. In this case, the docstring clearly indicates to the user what the expected type of each argument is, what the argument means, and what the function returns. In case you are wondering why it is organized in just this way, that is because we are following the conventions of docstrings established by the numpy project (and described in the numpy docstring guide).

Now that we’ve defined our noise-adding function, we can start calling it. Note that because we’re sampling randomly from a distribution, we’ll get a different output every time we re-run the function, even if the inputs are the same.

add_noise(4, 1, 2)
4.26589608501196
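If we ever need the same "random" result twice (for example, while debugging), one option is to seed the random module before each call. This is a sketch rather than something the example above depends on; the seed value 42 is arbitrary, and add_noise() is repeated in shortened form so that the block runs on its own.

```python
import random


def add_noise(x, mu, sd):
    """Same behavior as above, repeated so this block runs on its own."""
    return x + random.normalvariate(mu, sd)


random.seed(42)            # the seed value itself is arbitrary
first = add_noise(4, 1, 2)

random.seed(42)            # re-seeding restarts the same random sequence
second = add_noise(4, 1, 2)

print(first == second)  # True
```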

Exercise

Based on the function definition provided above, define a new function that produces a sample of n numbers each of which is x with Gaussian noise of mean mu and standard deviation sd added to it. The return value should be a list of length n, itself a parameter to the function.

5.7.2. Function arguments#

Python functions can have two kinds of arguments: positional arguments, and keyword (or named) arguments.

5.7.2.1. Positional arguments#

Positional arguments, as their name suggests, are defined by position, and they must be passed when the function is called. The values passed inside the parentheses are mapped one-to-one onto the arguments, as we saw above for add_noise(). That is, inside the add_noise() function, the first value is referenced by x, the second by mu, and so on.

If the caller fails to pass the right number of arguments (either too few or too many), an error will be generated:

add_noise(7)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [62], in <cell line: 1>()
----> 1 add_noise(7)

TypeError: add_noise() missing 2 required positional arguments: 'mu' and 'sd'

In this case, the call to the function fails because the function has 3 positional arguments, and we only pass one.
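The opposite mistake, passing too many values, fails in the same way. In the sketch below, add_noise() is repeated in shortened form so that the block runs on its own, and the call is wrapped in try/except only so that the error message can be displayed without stopping the program.

```python
import random


def add_noise(x, mu, sd):
    """Same behavior as above, repeated so this block runs on its own."""
    return x + random.normalvariate(mu, sd)


try:
    add_noise(7, 0, 1, 99)  # four values for three parameters
except TypeError as err:
    too_many_error = str(err)
    print("TypeError:", too_many_error)
```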

5.7.2.2. Keyword arguments#

Keyword arguments are arguments that are assigned a default value in the function signature (i.e., the top line of the function definition, that looks like def my_function(...)). Unlike positional arguments, keyword arguments are optional: if the caller doesn’t pass a value for the keyword argument, the corresponding variable will still be available inside the function, but it will have whatever value is defined as the default in the signature.

To see how this works, let’s rewrite our add_noise() function so that the parameters of the Gaussian distribution are now optional:

def add_noise_with_defaults(x, mu=0, sd=1):
    """Adds gaussian noise to the input.

    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float, optional
        The mean of the gaussian noise distribution.
        Default: 0
    sd : float, optional
        The standard deviation of the noise distribution.
        Default: 1

    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise

This looks very similar, but we can now call the function without filling in mu or sd. If we don’t pass in those values explicitly, the function will internally use the defaults (i.e., 0 in the case of mu, and 1 in the case of sd). Now we can call this function with only one argument:

add_noise_with_defaults(10)
9.384491150137645

Keyword arguments don’t have to be filled in order, as long as we explicitly name them. For example, we can specify a value for sd but not for mu:

# we specify x and sd, but not mu
add_noise_with_defaults(5, sd=100)
-6.777664068105974

Note that if we didn’t specify the name of the argument (i.e., if we called add_noise_with_defaults(5, 100)), the function would still work, but the second value we pass would be interpreted as mu rather than sd, because that’s the order they were introduced in the function definition.

It’s also worth noting that we can always explicitly name any of our arguments, including positional ones. This is extremely handy in cases where we’re calling functions whose argument names we remember, but where we don’t necessarily remember the exact order of the arguments. For example, suppose we remember that add_noise() takes the three arguments x, mu, and sd, but we don’t remember if x comes before or after the distribution parameters. We can guarantee that we get the result we expect by explicitly specifying all the argument names:

add_noise(mu=1, sd=2, x=100)
102.4788002146504
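Because the noise functions return random values, it can be hard to see from their output which value ended up in which parameter. The sketch below uses a hypothetical stand-in, show_binding(), that has the same signature as add_noise_with_defaults() but simply reports the values its parameters received.

```python
def show_binding(x, mu=0, sd=1):
    """Hypothetical stand-in for add_noise_with_defaults() that returns
    the values its parameters received, so the result is deterministic."""
    return {"x": x, "mu": mu, "sd": sd}


print(show_binding(5, sd=100))          # {'x': 5, 'mu': 0, 'sd': 100}
print(show_binding(5, 100))             # {'x': 5, 'mu': 100, 'sd': 1}
print(show_binding(mu=1, sd=2, x=100))  # {'x': 100, 'mu': 1, 'sd': 2}
```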

To summarize, functions let us define a piece of code that can be called as needed and reused. We can define a default behavior and override it as necessary. There is a bit more to it, of course. For a few more details, see the discussion of argument unpacking below.


Argument unpacking with *args and **kwargs

It sometimes happens that a function needs to be able to accept an unknown number of arguments. A very common scenario like this is where we’ve written a “wrapper” function that takes some input, does some operation that relies on only some of the arguments the user passed, and then hands off the rest of the arguments to a different function.

For example, suppose we want to write an arg_printer() function that we can use to produce a standardized display of the positional and keyword arguments used when calling some other arbitrary function. Python handles this scenario elegantly via special *args and **kwargs syntax in function signatures, also known as argument unpacking.

The best way to understand what *args and **kwargs do is to see them in action. Here’s an example:

    def arg_printer(func, *args, **kwargs):
        """
        A wrapper that takes any other function plus arguments to pass
        to that function. The arguments are printed out before the
        function is called and the result returned.

        Parameters
        ----------
        func : callable
            The function to call with the passed args.
        args : tuple, optional
            Positional arguments to pass into func.
        kwargs : dict, optional
            Dict of keyword arguments to pass into func.

        Returns
        -------
        The result of func() when called with the passed arguments.
        """
        print("Calling function:", func.__name__)
        print("Positional arguments:", args)
        print("Keyword arguments:", kwargs)
        return func(*args, **kwargs)

This may seem a bit mysterious, and there are parts we won’t explain right now (e.g., func.__name__). But try experimenting a bit with calling this function, and things may start to click. Here’s an example to get you rolling:

arg_printer(add_noise, 17, mu=0, sd=5)
Calling function: add_noise
Positional arguments: (17,)
Keyword arguments: {'mu': 0, 'sd': 5}
15.926281360374386

What’s happening here is that the first argument to arg_printer() is the add_noise() function we defined earlier. Remember: everything in Python is secretly an object–even functions! You can pass functions as arguments to other functions too; it’s no different than passing a string or a list. A key point to note, however, is that what we’re passing in is, in a sense, the definition of the function. Notice how we didn’t add parentheses to add_noise when we passed it to arg_printer()? That’s because we don’t want to actually call add_noise yet; we’re leaving it to the arg_printer() function to do that internally.

All the other arguments to arg_printer() after the first one are arguments that we actually want to pass onto the add_noise() function when it’s called internally by arg_printer(). The first thing arg_printer() does is print out the name of the function we just gave it, as well as all of the positional and keyword arguments we passed in. Once it’s done that, it calls the function we passed in (add_noise()) and passes along all the arguments.
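It may help to look at the two roles of * and ** separately. In the sketch below, collect() is a hypothetical helper that does nothing except hand back what it received. In a function signature, the stars pack incoming arguments into a tuple and a dictionary; in a function call, they unpack a sequence or dictionary into separate arguments.

```python
def collect(*args, **kwargs):
    """Hypothetical helper: return whatever was packed into args/kwargs."""
    return args, kwargs


# In a signature, * and ** pack the incoming arguments.
args, kwargs = collect(17, "a", mu=0, sd=5)
print(args)    # (17, 'a')
print(kwargs)  # {'mu': 0, 'sd': 5}

# In a call, * and ** unpack a sequence or dict into separate arguments.
positional = [1, 2]
named = {"sd": 3}
args2, kwargs2 = collect(*positional, **named)
print(args2)    # (1, 2)
print(kwargs2)  # {'sd': 3}
```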

If the above doesn’t make sense, don’t worry! As we mentioned above, we’re moving quickly, and the concepts from here on out start to get quite a bit denser. A good way to explore these ideas a bit better is to write your own code and experiment with things until they start to make sense.


Exercise

Replace add_noise with built-in functions like min, len, or list. What other arguments do you need to change?



5.8. Classes#

The material we’ve covered so far in this chapter provides a brief overview of the most essential concepts in Python, and should be sufficient to get you started reading and writing code in the language. If you’re impatient to start working with scientific computing libraries and playing with neuroimaging data, you could stop here and move on to the next chapter. That said, there are a number of concepts we haven’t talked about yet that are quite central to understanding how Python really works under the hood, and you’ll probably get more out of this book if you understand them well. They are, however, quite a bit more difficult conceptually.

In particular, the object-oriented programming (OOP) paradigm, and Python’s internal data model (which revolves around something called magic methods), are not very intuitive, and it can take some work to wrap your head around them if you haven’t seen OOP before. We’ll do our best to introduce these concepts here, but don’t be surprised or alarmed if they don’t immediately make sense to you–that’s completely normal! You’ll probably find that much of this material starts to “click” as you work your way through the various examples in this book and start to write your own Python analysis code.

We’ve said several times now that everything in Python is actually an object. We’re now in a position to unpack that statement. What does it mean to say that something is an object? How do objects get defined? And how do we specify what an object should do when a certain operation is applied to it? To answer these questions, we need to introduce the notion of classes and, more generally, the object-oriented programming (OOP) paradigm.

5.8.1. What is a class?#

A class is, in a sense, a kind of template for an object. You can think of it as a specification, or a set of instructions that determine what an object of a given kind can do. In a sense, it’s very close in meaning to what we’ve already been referring to as the type of an object. There is technically a difference between types and classes in Python, but it’s quite subtle, and in day-to-day usage, you can use the terms interchangeably and nobody is going to yell at you.

5.8.2. Defining classes#

So a class is a kind of template; okay, what does it look like? Well, minimally, it looks like very little. Here’s a fully functional class definition:

class Circle:
    pass

That’s it! We’ve defined a new Python class. In case you’re wondering, the pass statement does nothing – it is used as a placeholder that tells the Python interpreter it shouldn’t expect any more code to follow.

5.8.3. Creating instances#

You might think that this empty Circle class definition isn’t very interesting, because we obviously can’t do anything with it. But that isn’t entirely true. We can already instantiate this class if we like, which is to say, we can create new objects whose behavior is defined by the Circle class. A good way to think about it is in terms of an “X-is-a-particular-Y” relationship. If we were to draw 3 different circles on a piece of paper, we could say that each one is an instance of a circle, but none of them would be the actual definition of a circle. That should give you an intuitive sense of the distinction between a class definition and instances of that class.

The syntax for creating an instance in Python is simple:

my_circle = Circle()

That’s it again! We now have a new my_circle variable on our hands, which is an instance of class Circle.

If you don’t believe that, it’s easy to prove:

type(my_circle)
__main__.Circle

In case you are wondering, the reason that this appears to be a __main__.Circle object is that while we are running the code in this notebook, it is defined within a namespace (yes, namespaces again) called __main__, and the type function recognizes this fact. You can learn more about this particular namespace in the Python documentation.
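Here is a short sketch of a few other ways to check the relationship between a class and its instances. The names circle_a and circle_b are made up for this example, and the empty Circle class is repeated so that the block runs on its own.

```python
class Circle:
    pass


circle_a = Circle()
circle_b = Circle()

is_instance = isinstance(circle_a, Circle)
same_type = type(circle_a) is Circle
same_object = circle_a is circle_b

print(is_instance)  # True
print(same_type)    # True
print(same_object)  # False: two distinct instances of the same class
```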

5.8.3.1. A note on nomenclature#

You may have already noticed the naming convention we’ve used throughout this tutorial: our variable names are always composed of lower-case characters, with words separated by underscores. This is called snake_case. You’ll also note that class names are capitalized (technically, they’re in CamelCase). Both of these are standard conventions in the Python community, and you should get in the habit of following both.

5.8.4. Making it do things#

The Circle class definition we wrote above was perfectly valid, but not terribly useful. It didn’t define any new behavior, so any instances of the class we created wouldn’t do anything more than base objects in Python can do (which isn’t very much).

Let’s fix that by filling in the class a bit.

from math import pi

class Circle:

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return pi * self.radius**2

There’s not much code to see here, but conceptually, a lot is going on. Let’s walk through this piece by piece.

We start by importing the variable pi from the standard library math module.

Then, we start defining the class. First, observe that we’ve defined what look like two new functions inside the class. Technically, these are methods and not functions, because they’re bound to a particular object. But the principle is the same: they’re chunks of code that take arguments, do some stuff, and then (possibly) return something to the caller.

You’ll also note that both methods have self as their first argument. This is a requirement: all instance methods have to take a reference to the current instance (conventionally named self) as their first argument (there are also class methods and static methods, which behave differently, but we won’t cover those). This reference is what will allow us to easily maintain state (that is, to store information that can vary over time) inside the instance.

Now let’s walk through each of the defined methods. First, there’s __init__(). The leading and trailing double underscores indicate that this is a special kind of method called a magic method; we’ll talk about those a bit more later. For the moment, we just have to know that __init__() is a special method that gets called whenever we create a new instance of the class. So, when we write a line like my_circle = Circle(4), what happens under the hood is that the __init__() method of Circle gets executed.

Observe that, in this case, __init__() takes a single argument (other than self, that is): a radius argument. And further, the only thing that happens inside __init__() is that we store the value of the input argument radius in an instance attribute called radius. We do this by assigning to self. Remember: self is a reference to the current instance, so the newly-created instance that’s returned by Circle() will have that radius value set in the .radius attribute.

This code should make this a bit clearer:

my_circle = Circle(4)
print(my_circle.radius)
4

Next, let’s look at the area() method. This one takes no arguments (again, self is passed automatically; we don’t need to, and shouldn’t, pass it ourselves). That means we can just call it and see what happens. Let’s do that:

my_circle.area()
50.26548245743669

When we call area(), what we get back is the area of our circle—based on the radius stored in the instance at that moment. Note that this area is only computed when we actually call area(), and isn’t computed in advance. This means that if the circle’s radius changes, so too will the result of area():

my_circle.radius = 9
print(my_circle.area())
254.46900494077323
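To connect this back to self, the sketch below creates two instances (small and large, names chosen for illustration) and shows that each keeps its own state. It also shows that calling a method on an instance is equivalent to calling the method through the class and passing the instance explicitly as self.

```python
from math import pi


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return pi * self.radius**2


small = Circle(1)
large = Circle(10)

# Each instance keeps its own radius.
print(small.radius, large.radius)  # 1 10

# Calling a method on an instance passes that instance as self;
# the two calls below are equivalent.
print(small.area() == Circle.area(small))  # True
```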

Exercises

  1. Add to the implementation of Circle another method that calculates the circumference of the circle.

  2. Implement a class called Square that has a single attribute .side, and two methods for area and circumference.

5.8.5. Magic methods#

There’s a lot more we could say about how classes work in Python, and about object-oriented programming in general, but this is just a brief introduction, so we have to be picky. Let’s introduce just one other big concept: magic methods. The concepts we’ll discuss in this last section are actually fairly advanced, and aren’t usually discussed in introductions to Python – so, as we’ve said several times now, don’t worry if they don’t immediately click for you. We promise you’ll still be able to get through the rest of the book just fine! The reason we decided to cover magic methods here is that, once you do understand them well, you’ll have a substantially deeper grasp on how Python actually works, and you’ll probably see patterns and make connections that you otherwise wouldn’t.

Magic methods of objects, as we’ve seen a couple of times now, start and end with a double underscore: __init__, __getattr__, __new__, and so on. As their names suggest, these methods are magic, at least in the sense that they appear to add some magic to a class’s behavior. We’ve already talked about __init__, which is a magic method that gets called any time we create a new instance of a class. But there are many others.

The key to understanding how magic methods work is to recognize that they’re usually called implicitly when a certain operation is applied to an object, even if it doesn’t look like the magic method and the operation being applied have anything to do with each other (that’s what makes them magic!).

Remember how we said earlier that everything in Python is an object? Well now we’re going to explore one of the deeper implications of that observation, which is that all operators in Python are actually just cleverly-disguised method calls. That means that when we write even an expression as seemingly basic as 4 * 3 in Python, it’s actually implicitly converted to a call to a magic method on the first operand (4), with the second operand (3) being passed in as an argument.

This is a bit hard to explain abstractly, so let’s dive into an example. Start with this naive arithmetic operation:

4 * 3
12

No surprises there. But here’s an equivalent way to write the same line, which makes clearer what’s actually happening under the hood when we multiply one number by another:

# 4 is a number, so we have to wrap it in parentheses to prevent a syntax error.
# but we wouldn't have to do this for other types of variables (e.g., strings).
(4).__mul__(3)
12

Remember the dot notation? Here, __mul__ is actually a (magic) method implemented in the integer class. When Python evaluates the expression 4 * 3, it actually calls __mul__ on the first integer, and hands it the second one as an argument. See, we weren’t messing around when we said everything is an object in Python. Even something as seemingly basic as the multiplication operator is actually just an alias to a method called on an integer object!
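The same mechanism is available to classes we write ourselves. As a minimal sketch, the hypothetical Money class below (not part of any library) implements __mul__, which is enough for the * operator to work on its instances.

```python
class Money:
    """Hypothetical class used only to illustrate __mul__."""

    def __init__(self, amount):
        self.amount = amount

    def __mul__(self, factor):
        # Called implicitly when we write: some_money * factor
        return Money(self.amount * factor)


price = Money(2.5)

via_operator = price * 4        # uses the * operator
via_method = price.__mul__(4)   # the equivalent explicit method call

print(via_operator.amount)  # 10.0
print(via_method.amount)    # 10.0
```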

5.8.6. The semantics of *#

Once we recognize that Python’s * operator is just an alias to the __mul__ magic method, we might start to wonder if this is always true. Does every valid use of * as a multiplication operator imply that the object just before the * must be an instance of a class that implements the __mul__ method? The answer is, to a very good approximation, yes! The result of an expression that includes the * operator depends on the receiver object’s implementation of __mul__ (the one wrinkle is that if that method declines the operation by returning NotImplemented, Python falls back on the right-hand object’s __rmul__ method). The same is true of every other operator in Python, including things like == and &: each one is dispatched to its own corresponding magic method.
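
To make this concrete, here is a minimal sketch using a hypothetical Money class (it is not defined anywhere else in this chapter) whose __mul__ method decides what * means for its instances. It also illustrates one refinement: when the left operand’s __mul__ returns NotImplemented, as the integer version does when handed a Money object, Python gives the right operand a chance through its __rmul__ method before raising a TypeError.

```python
class Money:
    """Hypothetical class, used only to illustrate operator dispatch."""

    def __init__(self, amount):
        self.amount = amount

    def __mul__(self, factor):
        # Called for: Money(...) * factor
        return Money(self.amount * factor)

    def __rmul__(self, factor):
        # Called for: factor * Money(...), after int.__mul__ declines
        return self.__mul__(factor)


price = Money(5)
doubled = price * 2    # dispatches to Money.__mul__
tripled = 3 * price    # int.__mul__ returns NotImplemented, so Money.__rmul__ runs

print(doubled.amount, tripled.amount)
```

Here doubled.amount should be 10 and tripled.amount should be 15.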

Just to make it clear how far-reaching the implications of this principle are, let’s look at how a couple of other built-in Python types deal with the * operator. Let’s start with strings. What do you think will happen when we multiply a string object by 2?

"apple" * 2
'appleapple'

There’s a good chance this was not the behavior you expected. Many people intuitively expect an expression like "apple" * 2 to produce an error, because we don’t normally think of strings as a kind of thing that can be multiplied. But remember: in Python, the multiplication operator is just an alias for a __mul__ call. And there’s no particular reason a string class shouldn’t implement the __mul__ method; why not define some behavior for it, even if it’s counterintuitive? That way users have a super easy way to repeat strings if that’s what they want to do.

What about a list?

random_stuff * 3
['eleventy',
 'apple',
 7.14,
 'banana',
 88,
 'eleventy',
 'apple',
 7.14,
 'banana',
 88,
 'eleventy',
 'apple',
 7.14,
 'banana',
 88]

List multiplication behaves a lot like string multiplication: the result is that the list is repeated \(n\) times.
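
Both kinds of repetition can also be requested by calling __mul__ directly, which is a handy way to convince yourself that the operator and the method are the same thing. The small list below is a stand-in made up for this sketch.

```python
# Strings and lists both implement __mul__ as repetition.
repeated_string = "apple".__mul__(2)
repeated_list = [1, 2].__mul__(3)

print(repeated_string)
print(repeated_list)
print(repeated_string == "apple" * 2, repeated_list == [1, 2] * 3)
```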

What about dictionary multiplication?

fruit_prices * 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [80], in <cell line: 1>()
----> 1 fruit_prices * 2

TypeError: unsupported operand type(s) for *: 'dict' and 'int'

Finally, we encounter an outright failure! It appears Python dictionaries can’t be multiplied. Which presumably means that the dict class doesn’t implement __mul__ (you can verify this for yourself by inspecting any dictionary using dir()).
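
One way to run that check is sketched below. The contents of fruit_prices are made up here so that the snippet runs on its own; any dictionary would do.

```python
# Stand-in dictionary; the keys and values are illustrative only.
fruit_prices = {"apple": 0.65, "banana": 0.25}

print("__mul__" in dir(fruit_prices))   # dict does not implement __mul__
print("__mul__" in dir("apple"))        # str does
print("__mul__" in dir([1, 2, 3]))      # list does too
```

The first line should print False and the other two True, which matches the behavior we just observed for each type.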

5.8.7. Other magic methods#

Most of the magic methods in Python do something very much like what we saw for the multiplication operator. Consider the following operators: +, &, /, %, and <. These map, respectively, onto the magic methods __add__, __and__, __truediv__, __mod__, and __lt__. There are many others that follow the same pattern.
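
The correspondence can be checked directly on a pair of integers. In the sketch below (the values 7 and 2 are arbitrary), each line prints the result of the operator next to the result of the explicit method call, and the two should always agree.

```python
a, b = 7, 2

# Each pair below should show the same value twice.
print(a + b, a.__add__(b))
print(a & b, a.__and__(b))
print(a / b, a.__truediv__(b))
print(a % b, a.__mod__(b))
print(a < b, a.__lt__(b))
```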

There are also a number of magic methods tied to built-in functions rather than operators (for example, when you call len(obj), that’s equivalent to calling obj.__len__()), or that are triggered by certain events (for example, __getattr__ is called when a requested attribute isn’t found in an object).
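
Here is a minimal sketch of both cases, using a hypothetical Playlist class invented for this example. Its __len__ method is what the built-in len() ends up calling, and its __getattr__ method only runs when the ordinary attribute lookup comes up empty.

```python
class Playlist:
    """Hypothetical class, used only for illustration."""

    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        # Called by the built-in len()
        return len(self.songs)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        return f"no attribute called {name!r}"


mix = Playlist(["song a", "song b", "song c"])
print(len(mix))     # goes through Playlist.__len__
print(mix.songs)    # found normally, so __getattr__ is not called
print(mix.genre)    # not found, so __getattr__ handles the request
```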

In practice, you won’t really need to know much about magic methods unless you start writing a lot of classes of your own (but if you want a full description of all the magic methods, you can peruse the official docs). We spent a lot of time talking about them mainly because they’re a good way to convey some deep insights about the data model at the core of the Python language.

5.8.8. Hungry Circles#

Let’s come full circle now (awful pun intended) and revisit the Circle class we defined earlier. The last thing we’ll do in this tutorial is add a magic method to our Circle class. This will nicely tie together a lot of different threads we’ve covered.

What we’re going to do is give instances of class Circle the ability to “eat” other circles. When given Python code like this:

    c1 = Circle(4)
    c2 = Circle(2)
    c1 * c2

we want the first circle to “grow” its radius by exactly the amount required for its new area to equal the sum of the two circles’ previous areas. Here’s our updated implementation:

from math import pi, sqrt

class Circle:

    def __init__(self, radius):
        self.radius = radius

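    def __mul__(self, prey):
        # Grow this circle so that its new area equals the two areas combined.
        # Note that this changes self in place and returns nothing.
        new_area = self.area() + prey.area()
        self.radius = sqrt(new_area / pi)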

    def area(self):
        return pi * self.radius**2

The only change here is the addition of the __mul__ method.

Let’s see if the above did what we wanted:

c1 = Circle(4)
c2 = Circle(2)

# Now the important part: c1 eats c2!
c1 * c2

Well, we didn’t get an error, so that’s a good sign. Let’s inspect c1 and see if it’s been updated as we expect. Remember: we expect c1 to have “eaten” c2, which means its radius should grow, and its area should be the sum of both previous areas.

print("Radius of c1 after gorging on c2:", c1.radius)
print("Area of c1 after gorging on c2:", c1.area())
Radius of c1 after gorging on c2: 4.47213595499958
Area of c1 after gorging on c2: 62.83185307179588

It worked!
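
If you would rather not trust the printed numbers by eye, the sketch below performs the same check programmatically. It repeats the Circle definition from above so that it runs on its own, records the combined area before the meal, and then compares with a tolerance, because floating-point results are rarely exactly equal. The name expected_area is ours.

```python
from math import isclose, pi, sqrt


class Circle:
    # Same definition as above, repeated so this snippet is self-contained.
    def __init__(self, radius):
        self.radius = radius

    def __mul__(self, prey):
        new_area = self.area() + prey.area()
        self.radius = sqrt(new_area / pi)

    def area(self):
        return pi * self.radius**2


c1 = Circle(4)
c2 = Circle(2)
expected_area = c1.area() + c2.area()   # total area before c1 eats c2

c1 * c2

# Compare floating-point values with a tolerance rather than with ==.
print(isclose(c1.area(), expected_area))
print(isclose(c1.radius, sqrt(20)))
```

Both lines should print True: the combined area is 16π + 4π = 20π, so the new radius is the square root of 20.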

The only slightly dissatisfying feature of our implementation is that, after c1 eats c2 and expands itself accordingly, c2 is somehow still around to tell the tale. This probably violates some physical conservation law, but we’ll overlook that here. For reasons we won’t get into, it’s not trivial to delete c2 from inside c1. (There are good reasons for this, and the fact that we can’t easily make some of our circles wink out of existence from inside the belly of other circles might lead us to suspect we’ve architected our code suboptimally. But that’s a problem for a different book.)

Exercise

Add a __mul__ method to your implementation of the Square class that follows the same principles as the __mul__ method of the Circle class, changing both the area and the side attributes of the calling object as it swallows the other object.

5.8.9. Additional resources#

This chapter provided a high-level look at some of the main features of the Python language, some basic, some more advanced. To really develop a working familiarity with the language, you will, of course, need to roll up your sleeves and start writing some code. One of the best ways to learn is to pick a small problem that actually interests you or matters to you in some way (e.g., parsing some text data you have lying around), and to search the web for help every time you run into problems (there’s no shame in consulting the internet! All programmers do it!).

If you prefer to have more structure than that, there are hundreds of excellent, and mostly free, resources online to help you on your way. A couple of good ones include:

  1. A Whirlwind Tour of Python is an excellent intro to Python by Jake VanderPlas; Jupyter notebooks are available in the book’s GitHub repo.

  2. Allen Downey’s “Think Python” is another excellent introduction to the language.