A brief introduction to Python
5. A brief introduction to Python¶
Researchers in neuroimaging use different programming languages to perform data analysis. In this book, we chose to focus on the Python programming language. However, as we mentioned in Section 1, this book isn’t meant to be a general introduction to programming; out of necessity, we’re going to assume that you’ve had some prior experience writing code in one or more other programming languages (or possibly even Python itself). That said, we’ll be briefly reviewing key programming concepts as we go, so you don’t need to have had much prior experience. As long as you’ve encountered, say, variables, functions, and for-loops before, you should be just fine. And if you haven’t yet, we recommend some resources at the end of this section, to get you up to speed. Conversely, if you’ve been programming in other languages for years, you should be able to breeze through this section very quickly, though we might still recommend paying attention to the last few sub-sections, which talk about some deeper ideas underlying the Python language.
5.1. What is Python?¶
Let’s start by talking a bit about what Python is and why we chose it for this book. First of all, it is a programming language, which means it’s a set of rules for writing instructions that a computer can understand and follow. Of course, that alone doesn’t make Python special; there are hundreds of other programming languages out there! But Python isn’t just any programming language; by many rankings, it’s currently (as we write this in 2022) the world’s single most popular language. And it’s particularly dominant in one of this book’s two core areas of focus: data science. We also happen to think it’s by far the best choice for doing serious work in the book’s other core area of focus–neuroimaging, and it is the language that we use most often for our neuroimaging work. But you don’t have to take our word for that right now; hopefully, you’ll be convinced of it as we work through this book.
Why do so many people like Python? There are many answers to that, but here are a few important ones.
First, Python is a high-level, interpreted programming language. This means that, in contrast to lower-level, compiled languages like C or C++, Python features a high level of abstraction. A lot of the things you have to worry about in those languages (e.g., manually allocating and freeing memory) are done for you automatically in Python.
Second, Python’s syntax is readable and (relatively speaking) easy to learn. As you’ll see shortly, many Python operators are ordinary English words. Python imposes certain rules on code structure that most languages don’t. This may be a bit annoying at first, but it also makes it easier to read other people’s code once you acclimate. One of the consequences of these design considerations is that mathematical ideas translate into code in a way that does not obscure the math. That means that mathematical equations implemented in Python look a lot like the same equations written on paper. This is useful in data science applications, where ideas are expressed in mathematical terms.
Third, Python is a general-purpose language. In contrast to many other dynamic programming languages designed to serve specific niche uses, Python is well-suited for a wide range of applications. It features a comprehensive standard library (i.e., the functionality available out-of-the-box when you install Python) and an enormous ecosystem of third-party packages. It also supports multiple programming paradigms to varying extents (object-oriented, functional, etc.). Consequently, Python is used in many areas of software development. Very few other languages can boast that they have some of the best libraries implemented in any language for tasks as diverse as, say, scientific computing and back-end web development.
Lastly, there’s the sheer size of the Python community. While Python undeniably has many attractive features, we wouldn’t want to argue that it’s a better overall programming language than anything else. To some degree, it’s probably true that Python’s popularity is an accident of history. If we could randomly re-run the last two decades, we might be extolling the virtues of Haskell (or Julia, or Ruby, or…) instead of Python. So we’re not saying Python is intrinsically the world’s greatest language. But there’s no denying that there are immense benefits to using a language that so many other people use. For most tasks you might want to take on if you’re doing data science and/or neuroimaging data analysis, finding good libraries, documentation, help, and collaborators is simply going to be much easier if you work in Python than if you work in almost any other language.
Importantly, the set of tools in Python specifically for the analysis of neuroimaging data has rapidly evolved and matured in the last couple of decades, and these tools have gained substantial popularity through their use in research (we’ll dive into these tools in Section 11.1). The Neuroimaging in Python ecosystem also has a strong ethos of producing high-quality open-source software, which means that the Python tools that we will describe in this book should be accessible to anyone. More broadly, Python has been adopted across many different research fields and is one of the most commonly used programming languages in the analysis of data across a broad range of domains, including astronomy, geosciences, natural language processing, and so on. For researchers who are thinking of applying their skills outside of academia, Python is also very popular in industry, with many entry-level data science positions specifically listing experience programming in Python as a desired skill.
With that basic sales pitch out of the way, let’s dive right into the Python language. We’ll spend the rest of this chapter working through core programming concepts and showing you how they’re implemented in Python. It should go without saying that this can only hope to be a cursory overview; there just isn’t time and space to provide a full introduction to the language! But we’ll introduce many more features and concepts throughout the rest of the book, and we also end each chapter with a curated list of additional resources you can look into if you want to learn much more.
5.2. Variables and basic types¶
It’s common to introduce programming by first talking about variables, and we won’t break with this convention. A variable, as you probably already know, is a store of data that can take on different values (hence the name variable), and is typically associated with a fixed name.
5.2.1. Declaring variables¶
In Python, we declare a variable by writing its name and then assigning it a value with the equals (=) sign:
my_favorite_variable = 3
Notice that when we initialize a variable, we don’t declare its type anywhere. If you’re familiar with statically typed languages like C++ or Java, you’re probably used to having to specify what type of data a variable holds when you create it. For example, you might write int my_favorite_number = 3 to indicate that the variable is an integer. In Python, we don’t need to do this. Python is dynamically typed, meaning that the type of each variable will be determined on the fly, once we start executing our program. It also means we can change the type of the variable on the fly, without anything bad happening. For example, we can overwrite the integer value that was stored in this variable with a character string:
my_favorite_variable = "zzzzzzz"
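To make the idea of dynamic typing a little more concrete, here is a minimal sketch that records the type of the same variable before and after re-assignment. It relies on the built-in type() and print() functions, both of which are discussed later in this chapter; the names type_before and type_after are ours and are used only for illustration.

```python
# The same name can be bound to values of different types over time.
my_favorite_variable = 3
type_before = type(my_favorite_variable)  # the int type

my_favorite_variable = "zzzzzzz"
type_after = type(my_favorite_variable)   # the str type

print(type_before, type_after)
```

Running this should show that the type follows the value currently stored in the variable, rather than being fixed when the variable is first created.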
5.2.2. Printing variables¶
We can examine the contents of a variable at any time using the built-in print() function:
print(my_favorite_variable)
zzzzzzz
If we’re working in an interactive environment like a Jupyter notebook, we may not even need to call print(), as we’ll automatically get the output of the last line evaluated by the Python interpreter:
# this line won't be printed, because it isn't the last line in the notebook cell to be evaluated
"this line won't be printed"
# but this one will
my_favorite_variable
'zzzzzzz'
As you can see in this example, the hash sign (#) is used for comments. That means that any text that is after a # is ignored by the Python interpreter and not evaluated.
5.2.3. Built-in types¶
All general-purpose programming languages provide the programmer with different types of variables, such as strings, integers, Booleans, and so on. These are the most basic building blocks a program is made up of. Python is no different and provides us with several [built-in types](https://docs.python.org/3/library/stdtypes.html). Let’s take a quick look at some of these.
5.2.3.1. Integers¶
An integer is a numerical data type that can only take on finite whole numbers as its value. For example:
number_of_subjects = 20
number_of_timepoints = 1000
number_of_scans = 10
Any time we see a number written somewhere in Python code, and it’s composed only of digits (no decimals, quotes, etc.), we know we’re dealing with an integer.
In Python, integers support all of the standard arithmetic operators you’re familiar with–addition, subtraction, multiplication, etc. For example, we can multiply the two variables we just defined:
number_of_subjects * number_of_timepoints
20000
Or divide one integer by another:
number_of_timepoints / number_of_scans
100.0
Notice that the result of the above division is not itself an integer! The decimal point in the result gives away that the result is of a different type – a float.
5.2.3.2. Floats¶
A float (short for floating point number) is a numerical data type used to represent real numbers. As we just saw, floats are identified in Python by the presence of a decimal point.
roughly_pi = 3.14
mean_participant_age = 24.201843727
All of the standard arithmetic operators work on floats just like they do on ints:
print(roughly_pi * 2)
6.28
We can also freely combine ints and floats in most operations:
print(0.001 * 10000 + 1)
11.0
Observe that the output is of type float, even though the value is a whole number, and hence could in principle have been stored as an int without any loss of information. This is a general rule in Python: arithmetic operations involving a mix of int and float operands will almost always return a float. Some operations will return a float even if all operands are ints, as we saw above in the case of division.
5.2.3.3. Exercise¶
The Python built-in type() function reports to you the type of a variable that is passed to it. Use the type function to verify that number_of_subjects * number_of_timepoints is a Python integer, while number_of_timepoints / number_of_scans is not. Why do you think that Python changes the result of a division into a variable of type float?
5.2.3.4. Strings¶
A string is a sequence of characters. In Python, we define strings by enclosing zero or more characters inside a pair of quotes (either single or double quotes work equally well, so you can use whichever you prefer; just make sure the opening and closing quotes match!).
country = "Madagascar"
ex_planet = 'Pluto'
Python has very rich built-in functionality for working with strings. Let’s look at some of the things we can do.
We can calculate the length of a string:
len(country)
10
Or convert it to uppercase (try also .lower() and .capitalize()):
country.upper()
'MADAGASCAR'
We can count the number of occurrences of a substring (in this case, a single letter a):
country.count("a")
4
Or replace a matching substring with another substring:
country.replace("car", "truck")
'Madagastruck'
One thing that you might notice in the above examples is that they seem to use two different syntaxes. In the first example, it looks like len() is a function that takes a string as its parameter (or argument). By contrast, the last 3 examples use a different “dot” notation, where the function comes after the string (as in country.upper()). If you find this puzzling, don’t worry! We’ll talk about the distinction in much more detail below.
5.2.3.5. Exercise¶
Write code to count how many times the combination “li” appears in the string “supercalifragilisticexpialidocious”. Assign this value into a new variable named number_of_li and print its value.
5.2.3.6. Booleans¶
Booleans operate pretty much the same in Python as in other languages; the main thing to recognize is that they can only take on the values True or False. Not true or false, not "true" or "false". The only values a boolean can take on in Python are True and False, written exactly that way. For example:
enjoying_book = True
One of the ways that boolean values are typically generated in Python programs is through logical or comparison operations. For example, we can ask whether the length of a given string is greater than a particular integer:
is_longer_than_2 = len("apple") > 2
print(is_longer_than_2)
True
Or whether the product of the first two numbers below equals the third…
is_the_product = 719 * 1.0002 == 2000
print(is_the_product)
False
Or, we might want to know whether the conjunction of several sub-expressions is True or False:
("car" in country) and (len("apple") > 2) and (15 / 2 > 7)
True
This last example, simple as it is, illustrates a nice feature of Python: its syntax is more readable than that of most other programming languages. In the above example, we ask if the substring "car" is contained in the string country using the English language word in. Similarly, Python’s logical conjunction operator is the English word and. This means that we can often quickly figure out (or at least, vaguely intuit) what a piece of Python code does.
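The same readability carries over to Python’s other logical operators, which are also English words: or, not, and the combination not in. The following is a small sketch of how they can be combined; the variable names has_car, no_bus, either, and neither are made up for this illustration.

```python
country = "Madagascar"

has_car = "car" in country                 # substring test
no_bus = "bus" not in country              # negated substring test
either = has_car or (len(country) > 100)   # True if at least one operand is True
neither = not (has_car or no_bus)          # not flips a boolean value

print(has_car, no_bus, either, neither)
```

This prints True True True False.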
5.2.3.7. Exercise¶
Some integer values are equivalent to the Python Boolean values. Use the equality (==) operator to find integers that are equivalent to True and that are equivalent to False.
5.2.3.8. None¶
In addition to these usual suspects, Python also has a special value called None (it is the only value of its own type, NoneType). None is typically used to indicate that a variable does not hold any meaningful value. It’s roughly equivalent to the null value found in many other languages.
name = None
Note: None and False are not the same thing!
name == False
False
Also, assigning the value None to a variable is not the same as not defining the variable in the first place. Instead, a variable that is set to None is something that we can point to in our program without raising an error but doesn’t carry any particular value. These are subtle but important points, and in later chapters, we’ll use code where the difference becomes important.
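As a rough sketch of the difference between a variable set to None and a variable that was never defined, consider the following. It uses the is operator, which is the conventional way to test for None, and a try/except block (a construct not covered in this section) to catch the error raised by the undefined name. The names undefined_name and error_message are hypothetical and exist only for this example.

```python
name = None

# A variable set to None exists and can be inspected without error.
print(name is None)    # True
print(name == False)   # False: None and False are different values

# A name that was never assigned raises a NameError when we refer to it.
try:
    undefined_name
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)
```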
5.3. Collections¶
Most code we’re going to want to write in Python will require more than just integers, floats, strings, and booleans. We’re going to need more complex data structures, or collections, that can hold other objects (like strings, integers, etc.) and enable us to easily manipulate them in various ways. Python provides built-in support for many common data structures, and others can be found in modules that come installed together with the language itself – the so-called “standard library” (e.g., in the collections module).
5.3.1. Lists¶
Lists are the most common collection we’ll work with in Python. A list is a heterogeneous collection of objects. By heterogeneous, we mean that a list can contain elements of different types. It doesn’t have to contain only strings or only integers; it can contain a mix of the two, as well as all kinds of other types.
5.3.1.1. List initialization¶
To create a new list, we enclose one or more values between square brackets ([ and ]). Elements are separated by commas. Here is how we initialize a list containing 4 elements of different types (an integer, a float, and two strings).
random_stuff = [11, "apple", 7.14, "banana"]
5.3.1.2. List indexing¶
Lists are ordered collections, by which we mean that a list retains a memory of the position each of its elements was inserted in. The order of elements won’t change unless we explicitly change it. This allows us to access individual elements in the list directly, by specifying their position in the collection, or index.
To access the \(i^{th}\) element in a list, we enclose the index \(i\) in square brackets. Note that Python uses 0-based indexing (i.e., the first element in the sequence has index 0), and not 1 as in some other data-centric languages (Julia, R, etc.). For example, it means that the following operation returns the second item in the list and not the first.
random_stuff[1]
'apple'
Many bitter wars have been fought on the internet over whether 0-based or 1-based indexing is better. We’re not here to take a philosophical stand on this issue; the fact of the matter is that Python indexing is 0-based, and that’s not going to change. So whether you like it or not, you’ll need to make your peace with the idea that indexing starts from 0 while you’re reading this book.
5.3.1.3. Exercise¶
Indexing with negative numbers: in addition to indexing from the beginning of the list, we can index from the end of the list using negative numbers (e.g., random_stuff[-1]). Experiment with indexing into the list random_stuff using negative numbers. What is the negative number index for the last item in the list? What is the negative number index for the first item in the list? Can you write code that would use a negative number to index the first item in the list, without advance knowledge of its length?
5.3.1.4. List slicing¶
Indexing is nice, but what if we want to pull more than one element at a time out of our list? Can we easily retrieve only part of a list? The answer is yes! We can slice a list, and get back another list that contains multiple contiguous elements of the original list, using the colon (:) operator.
random_stuff[1:3]
['apple', 7.14]
In the list-slicing syntax, the number before the colon indicates the start position, and the number after the colon indicates the end position. Note that the start is inclusive and the end is exclusive. That is, in the above example, we get back the 2nd and 3rd elements in the list, but not the 4th. If it helps, you can read the 1:3 syntax as saying “I want all the elements in the list starting at index 1 and stopping just before index 3”.
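Either side of the colon can also be left out, in which case the slice runs from the start of the list or to its end, respectively. Here is a brief sketch of these variations; the variable names first_two, from_second, and whole_copy are ours.

```python
random_stuff = [11, "apple", 7.14, "banana"]

first_two = random_stuff[:2]     # same as random_stuff[0:2]
from_second = random_stuff[1:]   # from index 1 through the end of the list
whole_copy = random_stuff[:]     # a new list containing all of the elements

print(first_two)
print(from_second)
print(whole_copy)
```

Note that a slice always gives back a new list, so whole_copy has the same contents as random_stuff but is a separate object.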
5.3.1.5. Assigning values to list elements¶
Lists are mutable objects, meaning that they can be modified after they’ve been created. In particular, we very often want to replace a particular list value with a different value. To overwrite an element at a given index, we assign a value to it, using the same indexing syntax we saw above:
print("Value of first element before re-assignment:", random_stuff[0])
random_stuff[0] = "eleventy"
print("Value of first element after re-assignment:", random_stuff[0])
Value of first element before re-assignment: 11
Value of first element after re-assignment: eleventy
5.3.1.6. Appending to a list¶
It’s also very common to keep appending variables to an ever-growing list. We can add a single element to a list via the .append() function (notice again that we are calling a function using the “dot” notation; we promise that we’ll come back to that later!).
random_stuff.append(88)
print(random_stuff)
['eleventy', 'apple', 7.14, 'banana', 88]
5.3.1.7. Exercise¶
There are several ways to combine lists, including the append function you saw above, as well as the extend method. You can also add lists together using the addition (+) operator.
Given the following two lists:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
How would you create a new list called list3 that has the items: [6, 5, 1, 2, 3], with as few operations as possible and only using indexing operations and functions associated with the list? (Hint: you can look up these functions in the Python online documentation for lists.)
5.3.2. Dictionaries (dict)¶
Dictionaries are another extremely common data structure in Python. A dictionary (or dict) is a mapping from keys to values; we can think of it as a set of key/value pairs, where the keys have to be unique (but the values don’t). Many other languages have structures analogous to Python’s dictionaries, though they’re usually called something like associative arrays, hash tables, or maps.
5.3.2.1. Dictionary initialization¶
We initialize a dictionary by specifying comma-delimited key/value pairs inside curly braces. Keys and values are separated by a colon. It looks like this:
fruit_prices = {
    "apple": 0.65,
    "mango": 1.5,
    "strawberry": "$3/lb",
    "durian": "unavailable",
    5: "just to make a point"
}
Notice that both the keys and the values can be heterogeneously typed (observe the last pair, where the key is an integer).
5.3.2.2. Accessing values in a dictionary¶
In contrast to lists, you can’t access values stored in a dictionary directly by their serial position. Instead, values in a dictionary are accessed by their key. The syntax is identical to that used for list indexing. We specify the key whose corresponding value we’d like to retrieve in between square brackets:
fruit_prices['mango']
1.5
By contrast, trying to access a value by its serial position fails, raising a KeyError telling us there’s no such key in the dictionary:
fruit_prices[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [30], in <cell line: 1>()
----> 1 fruit_prices[0]
KeyError: 0
However, the reason the above key failed is not that integers are invalid keys. To prove that, consider the following:
fruit_prices[5]
'just to make a point'
Superficially, it might look like we’re requesting the 6th element in the dictionary and getting back a valid value. But that is not what is actually happening here. If it’s not clear to you why fruit_prices[0] fails while fruit_prices[5] succeeds, go back and look at the code we used to create the fruit_prices dictionary. Carefully inspect the keys and make sure you understand what’s going on.
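If we aren’t sure whether a key exists, we can check before accessing it rather than risk a KeyError. The sketch below uses the in operator and the dictionary’s .get() method, which returns None (or a fallback of our choosing) when the key is missing. The dictionary is a shortened copy of the one above, and the "kiwi" key and the names price and fallback are made up for illustration.

```python
fruit_prices = {"apple": 0.65, "mango": 1.5, 5: "just to make a point"}

# The in operator checks for the presence of a key, not a position.
print(0 in fruit_prices)   # False
print(5 in fruit_prices)   # True

# .get() returns None instead of raising a KeyError for a missing key...
price = fruit_prices.get("kiwi")
# ...or a default value, if we provide one as the second argument.
fallback = fruit_prices.get("kiwi", "unknown")

print(price, fallback)
```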
5.3.2.3. Updating a dictionary¶
Updating a dictionary uses the same []-based syntax as accessing values, except we now make an explicit assignment. For example, we can add a new entry for the ananas key:
fruit_prices["ananas"] = 0.5
Or overwrite the value for the mango key:
fruit_prices["mango"] = 2.25
And then look at the dict again:
print(fruit_prices)
{'apple': 0.65, 'mango': 2.25, 'strawberry': '$3/lb', 'durian': 'unavailable', 5: 'just to make a point', 'ananas': 0.5}
5.3.2.4. Exercise¶
Add another fruit to the dictionary. This fruit should have several different values associated with it, organized as a list. How can you access the second item in this list in one single call?
5.3.3. Tuples¶
The last widely-used Python collection we’ll discuss here (though there are many other more esoteric ones) is the tuple. Tuples are very similar to lists in Python. The main difference between lists and tuples is that lists are mutable, meaning, they can change after initialization. Tuples are immutable; once a tuple has been created, it can no longer be modified.
We initialize a tuple in much the same way as a list, except we use parentheses (round brackets) instead of square brackets:
my_tuple = ("a", 12, 4.4)
Just to drive home the immutability of tuples, let’s try replacing a value and see what happens:
my_tuple[1] = 999
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [36], in <cell line: 1>()
----> 1 my_tuple[1] = 999
TypeError: 'tuple' object does not support item assignment
Our attempt to modify the tuple raises an error. Fortunately, we can easily convert any tuple to a list, after which we can modify it to our heart’s content.
converted_from_tuple = list(my_tuple)
converted_from_tuple[1] = 999
print(converted_from_tuple)
['a', 999, 4.4]
In practice, you can use a list almost anywhere you can use a tuple, though there are some important exceptions. One that you can already appreciate is that a tuple can be used as a key to a dictionary, but a list can’t:
dict_with_sequence_keys = {my_tuple : "Access this value using a tuple!"}
dict_with_sequence_keys[converted_from_tuple] = "This will not work"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 dict_with_sequence_keys[converted_from_tuple] = "This will not work"
TypeError: unhashable type: 'list'
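Admittedly, the error that this produces is a bit cryptic. Dictionary keys must be hashable, meaning that Python can compute from them a fixed value that it uses to look the key up. Built-in mutable objects like lists are not hashable, because their contents could change after they were used as a key, whereas a tuple whose elements are themselves immutable is hashable.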
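To see where the word “unhashable” in the error message comes from, we can call the built-in hash() function directly on a tuple and on a list. This is only a sketch meant to expose what the dictionary does behind the scenes; the try/except construct is not covered in this section, and the names tuple_hash and message are ours.

```python
my_tuple = ("a", 12, 4.4)

# A tuple of immutable elements can be hashed; the result is an integer.
tuple_hash = hash(my_tuple)
print(type(tuple_hash))

# A list cannot be hashed, which is why it can't serve as a dictionary key.
try:
    hash(list(my_tuple))
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```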
5.4. Everything in Python is an object¶
Our discussion so far might give off the impression that some data types in Python are basic or special in some way. It’s natural to think, for example, that strings, integers, and booleans are “primitive” data types (i.e., that they’re built into the core of the language, behave in special ways, and can’t be duplicated or modified). And this is true in many other programming languages. For example, in Java, there are exactly 8 primitive data types. If you get bored of them, you’re out of luck. You can’t just create new ones: say, a new type of string that behaves just like the primitive strings, but adds some additional functionality you think would be kind of cool to have.
Python is different: it doesn’t really have any primitive data types. Python is a deeply object-oriented programming language, and in Python, everything is an object. Strings are objects, integers are objects, booleans are objects. So are lists. So are dictionaries. Everything is an object in Python. We’ll spend more time talking about what objects are, and the deeper implications of everything being an object, at the end of this chapter. For now, let’s focus on some of the practical implications for the way we write code.
5.4.1. The dot notation¶
Let’s start with the dot (.) notation we use to indicate that we’re accessing data or functionality inside an object. You’ve probably already noticed that there are two kinds of constructions we’ve been using in our code to do things with variables. There’s the functional syntax, where we pass an object as an argument to a function:
len([2, 4, 1, 9])
4
And then there’s the object-oriented syntax that uses the dot notation, which we saw when looking at some of the functionality implemented in strings:
phrase = "aPpLeS ArE delICIous"
phrase.lower()
'apples are delicious'
If you have some experience in another object-oriented programming language, the dot syntax will be old hat to you. But if you’ve mostly worked in data-centric languages (e.g., R or Matlab), you might find it puzzling at first.
What’s happening in the above example is that we’re calling lower(), a function attached to the phrase string itself (a function that is attached to an object is called a “method” of that object). You can think of the dot operator . as expressing a relationship of belonging, or roughly translating as “look inside of”. So, when we write phrase.lower(), we’re essentially saying, “call the lower() method that’s contained inside of phrase”. (We’re being a bit sloppy here for the sake of simplicity, but that’s the gist of it.)
Note that lower() works on strings, but, unlike functions like len() and round(), it isn’t a built-in function in Python. We can’t just call lower() directly:
lower("TrY to LoWer ThIs!")
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [42], in <cell line: 1>()
----> 1 lower("TrY to LoWer ThIs!")
NameError: name 'lower' is not defined
Instead, it needs to be called via an instance that contains this function, as we did above with phrase.
Neither is lower() a method that’s available on all objects. For example, this won’t work:
num = 6
num.lower()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [43], in <cell line: 3>()
1 num = 6
----> 3 num.lower()
AttributeError: 'int' object has no attribute 'lower'
Integers, as it happens, don’t have a method called lower(). And neither do most other types. But strings do. And what the lower() method does, when called from a string, is return a lower-cased version of the string to which it is attached. But that functionality is a feature of the string type itself, and not of the Python language in general.
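One way to convince ourselves that lower() really belongs to the string type is to reach it through the type itself (str) rather than through a particular string. The sketch below also uses the built-in hasattr() function to ask whether an object has an attribute with a given name. The variable names via_instance and via_type are made up for this illustration.

```python
phrase = "aPpLeS ArE delICIous"

via_instance = phrase.lower()   # the usual dot notation on a string
via_type = str.lower(phrase)    # the same method, looked up on the str type

print(via_instance == via_type)   # True

# hasattr() reports whether an object has an attribute with the given name.
print(hasattr(phrase, "lower"))   # True
print(hasattr(6, "lower"))        # False
```

In everyday code, the first form (phrase.lower()) is the one you will almost always see.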
Later, we’ll see how we go about defining new types (or classes), and specifying what methods they have. For the moment, the main point to take away is that almost all functionality in Python is going to be accessed via objects. The dot notation is ubiquitous in Python, so you’ll need to get used to it quickly if you’re used to a purely functional syntax.
5.4.1.1. Inspecting objects¶
One implication of everything being an object in Python is that we can always find out exactly what data an object contains, and what methods it implements, by inspecting it in various ways.
We won’t look very far under the hood of objects in this chapter, but it’s worth knowing about a couple of ways of interrogating objects that can make your life easier.
First, you can always see the type of an object with the built-in type() function, which you also saw before:
msg = "Hello World!"
type(msg)
str
Second, the built-in dir() function will show you all of the methods implemented on an object, as well as static attributes, which are variables stored within the object. Be warned that this will often be a long list and that some of the attribute names you see (mainly those that start and end with two underscores) will look a little wonky. We’ll talk about those briefly later.
dir(msg)
['__add__',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getitem__',
'__getnewargs__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__iter__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'capitalize',
'casefold',
'center',
'count',
'encode',
'endswith',
'expandtabs',
'find',
'format',
'format_map',
'index',
'isalnum',
'isalpha',
'isascii',
'isdecimal',
'isdigit',
'isidentifier',
'islower',
'isnumeric',
'isprintable',
'isspace',
'istitle',
'isupper',
'join',
'ljust',
'lower',
'lstrip',
'maketrans',
'partition',
'removeprefix',
'removesuffix',
'replace',
'rfind',
'rindex',
'rjust',
'rpartition',
'rsplit',
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']
That’s a pretty long list! Any name in that list is available to you as an attribute of the object (e.g., msg.upper(), msg.__class__, etc.), meaning that you can access it and call it (if it is a function) using the dot notation. Notice that the list contains all of the string methods we experimented with earlier (including lower), as well as many others.
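Because dir() returns an ordinary list of strings, we can trim it down with the tools we have already seen. As a rough sketch, the following keeps only the names that do not start with an underscore, which tends to leave the methods we are most likely to call directly. The variable name public_names is ours, and the exact contents of the list may differ slightly between Python versions.

```python
msg = "Hello World!"

# Keep only the attribute names that don't start with an underscore.
public_names = []
for name in dir(msg):
    if not name.startswith("_"):
        public_names.append(name)

print(len(public_names))
print(public_names[:5])
```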
5.4.1.2. Exercise¶
Find the methods associated with “int” objects. Are they different from the methods associated with “float” objects?
5.5. Control flow¶
Like nearly every other programming language, Python has several core language constructs that allow us to control the flow of our code – the order in which functions get called and expressions are evaluated. The two most common ones are conditionals (if-then statements) and for-loops.
5.5.1. Conditionals¶
Conditional (or if-then) statements allow our code to branch, meaning that we can execute different chunks of code depending on which of two or more conditions is met. For example:
mango = 0.2
if mango < 0.5:
    print("Mangoes are super cheap; get a bunch of them!")
elif mango < 1.0:
    print("Get one mango from the store.")
else:
    print("Meh. I don't really even like mangoes.")
Mangoes are super cheap; get a bunch of them!
The printed statement will vary depending on the value assigned to the mango variable. Try changing that value and see what happens when you re-run the code.
Notice that there are three statements in the above code: if, elif (which in Python stands for “else if”), and else. Only the first of these (i.e., if) is strictly necessary; the elif and else statements are optional.
5.5.1.1. Exercise¶
There can be arbitrarily many elif statements. Try adding another one to the code above that executes only in the case that mangoes are more expensive than 2.0 and less expensive than 5.0.
5.5.2. Loops¶
For-loops allow us to iterate (or loop) over the elements of a collection (e.g., a list) and perform the same operation(s) on each one. As with most things Python, the syntax is quite straightforward and readable:
for elem in random_stuff:
    print(elem)
eleventy
apple
7.14
banana
88
Here we loop over the elements in the random_stuff list. In each iteration (i.e., for each element), we assign the value to the variable elem. Note that, unlike in some other languages, elem is not confined to the scope of the for statement: once the for-loop is done executing, elem still exists and holds the last value that was assigned to it. Inside the loop, we can do whatever we like with elem. In this case, we just print its value.
5.5.2.1. Looping over a range¶
While we can often loop directly over the elements in a list (as in the above example), it’s also very common to loop over a range of integer indices, which we can then use to access data stored in some sequential form in one or more collections. To facilitate this type of thing, we can use Python’s built-in range() function, which produces a sequence of integers starting from 0 and stopping before the passed value:
num_elems = len(random_stuff)
for i in range(num_elems):
    val = random_stuff[i]
    print(f"Element {i}: {val}")
Element 0: eleventy
Element 1: apple
Element 2: 7.14
Element 3: banana
Element 4: 88
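The range() function can also take an explicit start and step, and when we need both the index and the element, the built-in enumerate() function is a common alternative to indexing by hand. The sketch below assumes that random_stuff holds the five values printed above; we redefine it here only so that the snippet runs on its own.

```python
# Assumed contents of random_stuff, reconstructed from the printed output.
random_stuff = ["eleventy", "apple", 7.14, "banana", 88]

# range(start, stop, step): start at 1, stop before 5, move in steps of 2.
odd_indices = list(range(1, 5, 2))
print(odd_indices)

# enumerate() yields (index, element) pairs, so no manual indexing is needed.
pairs = []
for i, val in enumerate(random_stuff):
    pairs.append((i, val))
    print(f"Element {i}: {val}")
```

Both approaches produce the same pairing of indices and values; enumerate() simply saves us the call to len() and the explicit lookup.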
5.5.2.2. Exercise¶
The content that was printed in each iteration of the loop in the last example is formatted using a so-called “f-string”. This is a way to compose strings that change based on information from the code surrounding them. An f-string is a string that has the letter “f” before it, as in this example, and it can contain segments enclosed by curly braces ({ and }) that contain Python expressions. In this case, the expression inside each pair of curly braces is a variable name, and the value of that variable at that point in the code is inserted into the string, but you could also insert small calculations that produce a result that then gets inserted into the string at that location. As an exercise, rewrite the code above so that in each iteration through the loop the value of i and the value of i squared are both printed. Hint: powers of Python numbers are calculated using the ** operator.
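Without giving away the exercise, here is a small sketch of an f-string that contains a calculation rather than a bare variable name. The variables price and count are made up for illustration, and the optional :.2f suffix (a format specifier that rounds the displayed value to two decimal places) goes slightly beyond what we have covered so far.

```python
# Hypothetical values, used only for illustration.
price = 0.2
count = 12

# The expression price * count is evaluated first; :.2f then formats
# the result with two digits after the decimal point.
message = f"{count} mangoes cost {price * count:.2f} in total"
print(message)
```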
5.5.3. Nested control flow¶
We can also nest conditionals and for-loops inside one another (as well as inside other compound statements). For example, we can loop over the elements of random_stuff, as above, but keep only the elements that meet some condition (e.g., only those elements that are strings):
# create an empty list to hold the filtered values
strings_only = []
# loop over the random_stuff list
for elem in random_stuff:
    # if the current element is a string...
    if isinstance(elem, str):
        # ...then append the value to strings_only
        strings_only.append(elem)
print("Only the string values:", strings_only)
Only the string values: ['eleventy', 'apple', 'banana']
5.5.4. Comprehensions¶
In Python, for-loops can also be written in a more compact way known as a list comprehension (there are also dictionary comprehensions, but we’ll leave that to you to look up as an exercise). List comprehensions are just a bit of syntactic sugar, meaning that they’re a different way of writing the same code but don’t change the meaning in any way. Here’s the list comprehension version of the for-loop we wrote above (the resulting list p just holds the value None that each print() call returns):
p = [print(elem) for elem in random_stuff]
eleventy
apple
7.14
banana
88
We can also embed conditional statements inside list comprehensions. Here’s a much more compact way of writing the string-filtering snippet we wrote above:
strings_only = [elem for elem in random_stuff if isinstance(elem, str)]
print("Only the string values:", strings_only)
Only the string values: ['eleventy', 'apple', 'banana']
List comprehensions can save you quite a bit of typing once you get used to reading them, and you may eventually even find them clearer to read. It’s also possible to nest list comprehensions (equivalent to for-loops within for-loops), though that power should be used sparingly, as nested list comprehensions can be difficult to understand.
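To show what nesting looks like, here is a minimal sketch that flattens a list of lists, written once with nested for-loops and once as a nested list comprehension. The grid variable is made up for illustration.

```python
# A hypothetical list of lists.
grid = [[1, 2, 3], [4, 5, 6]]

# Nested for-loops: the outer loop walks over rows, the inner over values.
flat_loop = []
for row in grid:
    for value in row:
        flat_loop.append(value)

# The equivalent nested comprehension. The for clauses appear in the
# same order as in the loops above: outer first, then inner.
flat_comp = [value for row in grid for value in row]

print(flat_comp)
```

Even in this small case, the loop version is arguably easier to follow, which is why we suggest using nested comprehensions sparingly.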
5.5.4.1. Exercise¶
Using a comprehension, create a list, where each element is a tuple. The first element in each tuple should be the index of the element in random_stuff and the second element of the tuple should be the square of that index.
5.5.5. Whitespace is syntactically significant¶
One thing you might have noticed when reading the conditional statements and for-loops above is that we always seem to indent our code inside these statements. This isn’t a matter of choice; Python is a bit of an odd duck among programming languages, in that it imposes strong rules about how whitespace can be used (i.e., whitespace is syntactically significant). This can take a bit of getting used to, but once you do, it has important benefits: there’s less variability in coding style across different Python programmers, and reading other people’s code is often much easier than it is in languages without syntactically significant whitespace.
The main rule you need to be aware of is that whenever you enter a compound statement (which includes for-loops and conditionals, but also function and class definitions, as we’ll see below), you have to increase the indentation of your code. When you exit the compound statement, you then decrease the indentation by the same amount.
The exact amount you indent each time is technically up to you. But it’s strongly recommended that you use the same convention everyone else does (described in the Python style guide, known as PEP8), which is to always indent or dedent by 4 spaces. Here’s what this looks like in a block with multiple nested conditionals:
num = 800
if num > 500:
    if num < 900:
        if num > 700:
            print("Great number.")
        else:
            print("Terrible number.")
Great number.
5.5.5.1. Exercise¶
Modify the above snippet so that you (a) consistently use a different amount of indentation (for example, 3 spaces), and (b) break Python by using invalid indentation.
5.6. Namespaces and imports¶
Python is a high-level, dynamic programming language, which people often associate with flexibility and lack of precision (e.g., as we’ve already seen, you don’t have to declare the type of your variables when you initialize them in Python). But in some ways, Python is much more of a stickler than most other dynamic languages about the way Python developers write their code. We just saw that Python is very serious about how you indent your code. Another thing that’s characteristic of Python is that it takes namespacing very seriously.
If you’re used to languages like, say, R or MATLAB, you might expect to have hundreds of different functions available to call as soon as you fire up an interactive prompt. By contrast, in Python, the built-in namespace (the set of functions you can invoke when you start running Python) is very small. This is by design: Python expects you to carefully manage the code you use, and it’s particularly serious about making sure you maintain orderly namespaces.
In practice, this means that any time you want to use some functionality that is not built-in and immediately available, you need to explicitly import it from whatever module it’s currently in, via an import statement. This can initially look strange, but once you get used to it, you’ll find that it substantially increases code clarity and almost completely eliminates naming conflicts and confusion about where some functionality came from.
5.6.1. Importing a module¶
Conventionally, all import statements in a Python file are consolidated at the very top (though there are some situations where this isn’t possible). Here’s what the most basic usage of import looks like:
import json
dummy = {'a': 1, 'b': 5}
json.dumps(dummy)
'{"a": 1, "b": 5}'
In this case, we begin by importing a module from Python’s standard library (i.e., the set of libraries that come bundled with the Python interpreter in every standard installation). Once we import a module, we can invoke any of the functions located inside it, using the dot syntax you see above. In this case, we import the json module, which provides tools for converting to and from the JSON format. JSON, which stands for JavaScript Object Notation, is a widely-used text-based data representation format. In the above example, we take a Python dictionary and convert (or “dump”) it to a JSON string by calling the dumps() function.
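The conversion also works in the other direction. As a minimal sketch, the json module’s loads() function parses a JSON string back into a Python object, so a round trip through dumps() and loads() should give us back a dictionary equal to the one we started with. The variable names as_text and restored are our own.

```python
import json

dummy = {"a": 1, "b": 5}

# Python dictionary -> JSON string
as_text = json.dumps(dummy)

# JSON string -> Python dictionary
restored = json.loads(as_text)

print(type(as_text).__name__, restored == dummy)
```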
Note that if we hadn’t explicitly imported json, the json.dumps() call would have failed, because json would be undefined in our namespace. You can also try directly calling dumps() alone (without the json prefix) to verify that there is no such function available to you in Python’s root namespace.
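If you would rather see that failure without halting your session, one option is to catch the error, as in the sketch below. It assumes that nothing named dumps has been defined or imported earlier in the session; the found variable is our own and only records what happened.

```python
import json

try:
    # No json. prefix here, so Python looks for a bare name called dumps.
    dumps({"a": 1})
    found = True
except NameError as err:
    # Importing the module does not place its functions in our namespace.
    found = False
    print("NameError:", err)
```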
5.6.2. Importing from a module¶
Importing a module by name gives us access to all of its internal attributes. But sometimes we only need to call a single function inside a module, and we might not want to have to type the full module name every time we use that function. In that case, we can import from the module:
# defaultdict is a dictionary that has default values for new keys.
from collections import defaultdict
# When initializing a defaultdict, we specify the default type of new values.
test_dict = defaultdict(int)
# this would fail with a normal dict, but with a defaultdict,
# a new key with the default value is created upon first access.
test_dict['made_up_key']
0
In this case, we import defaultdict directly from the collections module into our current namespace. This makes defaultdict available for our use. Note that collections itself is not available to us unless we explicitly import it (i.e., if we run import collections):
import collections
another_test_dict = collections.defaultdict(int)
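To see why a default value for new keys is handy, here is a small sketch of a common use of defaultdict: counting how often each item appears in a list. The list of fruits is made up for illustration.

```python
from collections import defaultdict

# Missing keys are created with the value int(), which is 0.
counts = defaultdict(int)

for fruit in ["apple", "banana", "apple"]:
    # With a plain dict, the first += for a new key would raise a KeyError.
    counts[fruit] += 1

print(dict(counts))
```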
5.6.3. Renaming variables at import time¶
Sometimes the module or function we want to import has an unwieldy name. Python’s import statements allow us to rename the variable we’re importing on the fly using the as keyword:
from collections import defaultdict as dd
float_test_dict = dd(float)
For many commonly used packages, there are strong conventions about naming abbreviations, and you should make sure to respect these in your code. For example, it’s standard to see import numpy as np and import pandas as pd (libraries that you will learn about in Section 8 and Section 9, respectively). Your code will still work fine if you use other variable names, but other programmers will have a slightly more difficult time understanding what you’re doing. So be kind to others and respect the conventions.
5.7. Functions¶
Python would be of limited use to us if we could only run our code linearly from top to bottom. Fortunately, as in almost every other modern programming language, Python has functions: blocks of code that only run when explicitly called. Some of these are built into the language itself (or contained in the standard library’s many modules we can import from, as we saw above):
approx_pi = 3.141592
round(approx_pi, 2)
3.14
Here we use the round() function to round a float to the desired precision (2 decimal places). The round() function happens to be one of the few dozen “built-ins” included in the root Python namespace out of the box, but we can easily define our own functions, which we can then call just like the built-in ones. Functions are defined like this:
def print_useless_message():
    print("This is a fairly useless message.")
Here, we’re defining a new function called print_useless_message, which, as you might expect, can print a fairly useless message. Notice that nothing happens when we run the above block of code. That’s because all we’ve done is define the function; we haven’t yet called or invoked it. We can do that like this:
print_useless_message()
This is a fairly useless message.
5.7.1. Function arguments and return values¶
Functions can accept arguments (or parameters) that alter their behavior. When we called round() above, we passed two arguments: the float we wanted to round, and the number of decimal places we wanted to round it to. The first argument is mandatory in the case of round(); if we try calling round() without any arguments (feel free to give it a shot), we’ll get an error. This should make intuitive sense to you because it would be pretty strange to try to round no value at all.
Functions can also explicitly return values to the user. To do this, we have to explicitly end our function with a return statement, followed by the variable(s) we want to return. If a function doesn’t explicitly end with a return statement, then the special value None we encountered earlier will be returned.
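The difference between printing a value and returning it is easy to miss, so here is a minimal sketch with two made-up functions. The first only prints; the second hands its result back to the caller.

```python
def shout(text):
    # Prints the result, but there is no return statement.
    print(text.upper())


def loud(text):
    # Returns the result to whoever called the function.
    return text.upper()


result_a = shout("hello")   # the text is printed, and None comes back
result_b = loud("hello")    # nothing is printed, and "HELLO" comes back

print(result_a, result_b)
```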
Let’s illustrate the use of arguments by writing a small function that takes a single float as input, adds Gaussian noise (generated by the standard library’s random module), and returns the result.
import random

def add_noise(x, mu, sd):
    """Adds gaussian noise to the input.
    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float
        The mean of the gaussian noise distribution.
    sd : float
        The standard deviation of the noise distribution.
    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise
The add_noise() function has three required parameters: The first (x) is the number we want to add noise to. The second (mu) is the mean of the Gaussian distribution to sample from. The third (sd) is the distribution’s standard deviation.
Notice that we’ve documented the function’s behavior inside the function definition itself using what’s called a docstring. This is a good habit to get into, as good documentation is essential if you expect other people to be able to use the code you write (including yourself in the future). In this case, the docstring indicates to the user what the expected type of each argument is, what the argument means, and what the function returns. In case you are wondering why it is organized in just this way, that is because we are following the conventions of docstrings established by the numpy project (and described in the numpy docstring guide).
Now that we’ve defined our noise-adding function, we can start calling it. Note that because we’re sampling randomly from a distribution, we’ll get a different output every time we re-run the function, even if the inputs are the same.
add_noise(4, 1, 2)
3.446093099301067
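When we do want repeatable results (for example, while debugging), one option is to seed the random number generator before calling the function. The sketch below uses the standard library’s random.seed() function; the seed value 42 is an arbitrary choice, and add_noise() is redefined in compact form only so that the snippet runs on its own.

```python
import random


def add_noise(x, mu, sd):
    """Add Gaussian noise to x (same logic as the function defined above)."""
    return x + random.normalvariate(mu, sd)


random.seed(42)               # arbitrary seed value
first = add_noise(4, 1, 2)

random.seed(42)               # resetting the seed replays the same sequence
second = add_noise(4, 1, 2)

print(first == second)
```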
5.7.1.1. Exercise¶
Based on the function definition provided above, define a new function that produces a sample of n numbers, each of which is x with Gaussian noise of mean mu and standard deviation sd added to it. The return value should be a list of length n, itself a parameter to the function.
5.7.2. Function arguments¶
Python functions can have two kinds of arguments: positional arguments, and keyword (or named) arguments.
5.7.2.1. Positional arguments¶
Positional arguments, as their name suggests, are defined by position, and they must be passed when the function is called. The values passed inside the parentheses are mapped one-to-one onto the arguments, as we saw above for add_noise(). That is, inside the add_noise() function, the first value is referenced by x, the second by mu, and so on.
If the caller fails to pass the right number of arguments (either too few or too many), an error will be generated:
add_noise(7)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [62], in <cell line: 1>()
----> 1 add_noise(7)
TypeError: add_noise() missing 2 required positional arguments: 'mu' and 'sd'
In this case, the call to the function fails because the function has three positional arguments, and we only pass one.
5.7.2.2. Keyword arguments¶
Keyword arguments are arguments that are assigned a default value in the function signature (i.e., the top line of the function definition, that looks like def my_function(...)). Unlike positional arguments, keyword arguments are optional: if the caller doesn’t pass a value for the keyword argument, the corresponding variable will still be available inside the function, but it will have whatever value is defined as the default in the signature.
To see how this works, let’s rewrite our add_noise() function so that the parameters of the Gaussian distribution are now optional:
def add_noise_with_defaults(x, mu=0, sd=1):
    """Adds gaussian noise to the input.
    Parameters
    ----------
    x : number
        The number to add noise to.
    mu : float, optional
        The mean of the gaussian noise distribution.
        Default: 0
    sd : float, optional
        The standard deviation of the noise distribution.
        Default: 1
    Returns
    -------
    float
    """
    noise = random.normalvariate(mu, sd)
    return x + noise
This looks very similar, but we can now call the function without filling in mu or sd. If we don’t pass in those values explicitly, the function will internally use the defaults (i.e., 0 in the case of mu, and 1 in the case of sd). Now, when we call this function with only one argument:
add_noise_with_defaults(10)
9.81883420856302
Keyword arguments don’t have to be filled in order, as long as we explicitly name them. For example, we can specify a value for sd but not for mu:
# we specify x and sd, but not mu
add_noise_with_defaults(5, sd=100)
-32.43831889011323
Note that if we didn’t specify the name of the argument (i.e., if we called add_noise_with_defaults(5, 100)), the function would still work, but the second value we pass would be interpreted as mu rather than sd because that’s the order they were introduced in the function definition.
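Because the noise is random, it is hard to tell from the output alone which parameter received which value. The sketch below uses a made-up helper, show_args(), that has the same signature as add_noise_with_defaults() but simply reports the value each parameter ended up with.

```python
def show_args(x, mu=0, sd=1):
    """Hypothetical helper: return the value bound to each parameter."""
    return {"x": x, "mu": mu, "sd": sd}


named = show_args(5, sd=100)    # 100 is explicitly bound to sd
unnamed = show_args(5, 100)     # 100 goes to the second parameter, mu

print(named)
print(unnamed)
```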
It’s also worth noting that we can always explicitly name any of our arguments, including positional ones. This is extremely handy in cases where we’re calling functions whose argument names we remember, but where we don’t necessarily remember the exact order of the arguments. For example, suppose we remember that add_noise() takes the three arguments x, mu, and sd, but we don’t remember if x comes before or after the distribution parameters. We can guarantee that we get the result we expect by explicitly specifying all the argument names:
add_noise(mu=1, sd=2, x=100)
99.91417120282422
To summarize, functions let us define a piece of code that can be called as needed and reused. We can define a default behavior and override it as necessary. There is a bit more to it, of course. For a few more details, you can go into more depth in Section 5.7.2.3, or skip forward to Section 5.8.
5.7.2.3. Argument unpacking with *args and **kwargs¶
It sometimes happens that a function needs to be able to accept an unknown number of arguments. A very common scenario like this is where we’ve written a “wrapper” function that takes some input, does some operation that relies on only some of the arguments the user passed, and then hands off the rest of the arguments to a different function.
For example, suppose we want to write an arg_printer() function that we can use to produce a standardized display of the positional and keyword arguments used when calling some other arbitrary function. Python handles this scenario elegantly via special *args and **kwargs syntax in function signatures, also known as argument unpacking.
The best way to understand what *args and **kwargs do is to see them in action. Here’s an example:
def arg_printer(func, *args, **kwargs):
    """
    A wrapper that takes any other function plus arguments to pass
    to that function. The arguments are printed out before the
    function is called and the result returned.
    Parameters
    ----------
    func : callable
        The function to call with the passed args.
    args : tuple, optional
        Positional arguments to pass into func.
    kwargs : dict, optional
        Dict of keyword arguments to pass into func.
    Returns
    -------
    The result of func() when called with the passed arguments.
    """
    print("Calling function:", func.__name__)
    print("Positional arguments:", args)
    print("Keyword arguments:", kwargs)
    return func(*args, **kwargs)
This may seem a bit mysterious, and there are parts we won’t explain right now (e.g., func.__name__). But try experimenting a bit with calling this function, and things may start to click. Here’s an example to get you rolling:
arg_printer(add_noise, 17, mu=0, sd=5)
Calling function: add_noise
Positional arguments: (17,)
Keyword arguments: {'mu': 0, 'sd': 5}
16.617894244934316
What’s happening here is that the first argument to arg_printer() is the add_noise() function we defined earlier. Remember: everything in Python is secretly an object, even functions! You can pass functions as arguments to other functions too; it’s no different than passing a string or a list. A key point to note, however, is that what we’re passing in is, in a sense, the definition of the function. Notice how we didn’t add parentheses to add_noise when we passed it to arg_printer()? That’s because we don’t want to call add_noise yet; we’re leaving it to the arg_printer() function to do that internally.
All the other arguments to arg_printer() after the first one are arguments that we actually want to pass onto the add_noise() function when it’s called internally by arg_printer(). The first thing arg_printer() does is print out the name of the function we just gave it, as well as all of the positional and keyword arguments we passed in. Once it’s done that, it calls the function we passed in (add_noise()) and passes along all the arguments.
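It can help to look at the two halves of this mechanism separately. In the sketch below, a made-up function called collect() shows how *args and **kwargs gather incoming arguments into a tuple and a dictionary, and the final line shows the same syntax working in the other direction, spreading a tuple and a dictionary out into a call to the built-in print() function.

```python
def collect(*args, **kwargs):
    """Hypothetical helper: return the arguments exactly as collected."""
    return args, kwargs


# In a signature, * and ** gather the arguments that were passed...
positional, named = collect(17, "a", mu=0, sd=5)
print(positional)   # a tuple of the positional arguments
print(named)        # a dict of the keyword arguments

# ...and at a call site, they spread a tuple and a dict back out.
values = (2, 8)
options = {"sep": "-"}
print(*values, **options)   # same as print(2, 8, sep="-")
```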
If the above doesn’t make sense, don’t worry! As we mentioned above, we’re moving quickly, and the concepts from here on out start to get quite a bit denser. A good way to explore these ideas a bit better is to write your own code and experiment with things until they start to make sense.
5.7.2.4. Exercise¶
Replace add_noise with built-in functions like min, len, or list. What other arguments do you need to change?
5.8. Classes¶
The material we’ve covered so far in this chapter provides a brief overview of the most essential concepts in Python and should be sufficient to get you started reading and writing code in the language. If you’re impatient to start working with scientific computing libraries and playing with neuroimaging data, you could stop here and move on to the next chapter. That said, there are some concepts we haven’t talked about yet that are quite central to understanding how Python really works under the hood, and you’ll probably get more out of this book if you understand them well. They are, however, quite a bit more difficult conceptually.
In particular, the object-oriented programming (OOP) paradigm, and Python’s internal data model (which revolves around something called magic methods), are not very intuitive, and it can take some work to wrap your head around them if you haven’t seen OOP before. We’ll do our best to introduce these concepts here, but don’t be surprised or alarmed if they don’t immediately make sense to you – that’s completely normal! You’ll probably find that much of this material starts to “click” as you work your way through the various examples in this book and start to write your own Python analysis code.
We’ve said several times now that everything in Python is actually an object. We’re now in a position to unpack that statement. What does it mean to say that something is an object? How do objects get defined? And how do we specify what an object should do when a certain operation is applied to it? To answer these questions, we need to introduce the notion of classes – and, more generally, the object-oriented programming (OOP) paradigm.
5.8.1. What is a class?¶
A class is, in a sense, a kind of template for an object. You can think of it as a specification or a set of instructions that determine what an object of a given kind can do. In a sense, it’s very close in meaning to what we’ve already been referring to as the type of an object. There is technically a difference between types and classes in Python, but it’s quite subtle, and in day-to-day usage, you can use the terms interchangeably and nobody is going to yell at you.
5.8.2. Defining classes¶
So a class is a kind of template; okay, what does it look like? Well, minimally, it looks like very little. Here’s a fully functional class definition:
class Circle:
    pass
That’s it! We’ve defined a new Python class. In case you’re wondering, the pass statement does nothing – it is used as a placeholder that tells the Python interpreter it shouldn’t expect any more code to follow.
5.8.3. Creating instances¶
You might think that this empty Circle class definition isn’t very interesting because we obviously can’t do anything with it. But that isn’t entirely true. We can already instantiate this class if we like – which is to say, we can create new objects whose behavior is defined by the Circle class. A good way to think about it is in terms of an “X-is-a-particular-Y” relationship. If we were to draw 3 different circles on a piece of paper, we could say that each one is an instance of a circle, but none of them would be the actual definition of a circle. That should give you an intuitive sense of the distinction between a class definition and instances of that class.
The syntax for creating an instance in Python is simple:
my_circle = Circle()
That’s it again! We now have a new my_circle variable on our hands, which is an instance of class Circle.
If you don’t believe that, it’s easy to prove:
type(my_circle)
__main__.Circle
In case you are wondering, the reason that this appears to be a __main__.Circle object is that while we are running the code in this notebook, it is defined within a namespace (yes, namespaces again) called __main__, and the type function recognizes this fact. You can learn more about this particular namespace in the Python documentation.
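Going back to the picture of several circles drawn on a piece of paper, the sketch below creates two instances of the empty class and checks two things: that each one is an instance of Circle (using the built-in isinstance() function), and that the two instances are nevertheless distinct objects. The names circle_a and circle_b are our own.

```python
class Circle:
    pass


circle_a = Circle()
circle_b = Circle()

# Both objects were created from the same class...
both_circles = isinstance(circle_a, Circle) and isinstance(circle_b, Circle)

# ...but they are two separate objects, like two drawings of a circle.
same_object = circle_a is circle_b

print(both_circles, same_object)
```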
5.8.3.1. A note on nomenclature¶
You may have already noticed the naming convention we’ve used throughout this tutorial: our variable names are always composed of lower-case characters, with words separated by underscores. This is called snake_case. You’ll also note that class names are capitalized (technically, they’re in CamelCase). Both of these are standard conventions in the Python community, and you should get in the habit of following both.
5.8.4. Making it do things¶
The Circle class definition we wrote above was perfectly valid, but not terribly useful. It didn’t define any new behavior, so any instances of the class we created wouldn’t do anything more than base objects in Python can do (which isn’t very much).
Let’s fix that by filling in the class a bit.
from math import pi

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return pi * self.radius**2
There’s not much code to see here, but conceptually, a lot is going on. Let’s walk through this piece by piece.
We start by importing the variable pi from the standard library math module.
Then, we start defining the class. First, observe that we’ve defined what looks like two new functions inside the class. Technically, these are methods and not functions, because they’re bound to a particular object. But the principle is the same: they’re chunks of code that take arguments, do some stuff, and then (possibly) return something to the caller.
You’ll also note that both methods have self as their first argument. This is a requirement: all instance methods have to take a reference to the current instance (conventionally named self) as their first argument (there are also class methods and static methods, which behave differently, but we won’t cover those). This reference is what will allow us to easily maintain state (that is, to store information that can vary over time) inside the instance.
Now let’s walk through each of the defined methods. First, there’s __init__(). The leading and trailing double underscores indicate that this is a special kind of method called a magic method; we’ll talk about those a bit more later. For the moment, we just have to know that __init__() is a special method that gets called whenever we create a new instance of the class. So, when we write a line like my_circle = Circle(), what happens under the hood is that the __init__() method of Circle gets executed.
Observe that, in this case, __init__() takes a single argument (other than self, that is): a radius argument. And further, the only thing that happens inside __init__() is that we store the value of the input argument radius in an instance attribute called radius. We do this by assigning to self.radius. Remember: self is a reference to the current instance, so the newly-created instance that’s returned by Circle() will have that radius value set in the .radius attribute.
This code should make this a bit clearer:
my_circle = Circle(4)
print(my_circle.radius)
4
Next, let’s look at the area() method. This one takes no arguments (again, self is passed automatically; we don’t need to, and shouldn’t, pass it ourselves). That means we can just call it and see what happens. Let’s do that:
my_circle.area()
50.26548245743669
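One way to see what “passed automatically” means is to call the method through the class rather than through the instance. In that form, nothing is filled in for us, so we have to hand over the instance ourselves. This is only a sketch for building intuition (the variable names via_instance and via_class are our own); in everyday code you would use the first form.

```python
from math import pi


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return pi * self.radius**2


my_circle = Circle(4)

# Usual form: Python passes my_circle as self behind the scenes.
via_instance = my_circle.area()

# Called through the class: we supply the instance as self explicitly.
via_class = Circle.area(my_circle)

print(via_instance == via_class)
```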
When we call area(), what we get back is the area of our circle, based on the radius stored in the instance at that moment. Note that this area is only computed when we call area(), and isn’t computed in advance. This means that if the circle’s radius changes, so too will the result of area():
my_circle.radius = 9
print(my_circle.area())
254.46900494077323
5.8.4.1. Exercises¶
Add to the implementation of Circle another method that calculates the circumference of the circle.
Implement a class called Square that has a single attribute .side, and two methods for area and circumference.
5.8.5. Magic methods¶
There’s a lot more we could say about how classes work in Python, and about object-oriented programming in general, but this is just a brief introduction, so we have to be picky. Let’s introduce just one other big concept: magic methods. The concepts we’ll discuss in this last section are fairly advanced, and aren’t usually discussed in introductions to Python – so, as we’ve said several times now, don’t worry if they don’t immediately click for you. We promise you’ll still be able to get through the rest of the book just fine! The reason we decided to cover magic methods here is that, once you do understand them well, you’ll have a substantially deeper grasp on how Python works, and you’ll probably see patterns and make connections that you otherwise wouldn’t.
Magic methods of objects, as we’ve seen a couple of times now, start and end with a double underscore: __init__, __getattr__, __new__, and so on. As their names suggest, these methods are magic, at least in the sense that they appear to add some magic to a class’s behavior. We’ve already talked about __init__, which is a magic method that gets called any time we create a new instance of a class. But there are many others.
The key to understanding how magic methods work is to recognize that they’re usually called implicitly when a certain operation is applied to an object, even if it doesn’t look like the magic method and the operation being applied have anything to do with each other (that’s what makes them magic!).
Remember how we said earlier that everything in Python is an object? Well, now we’re going to explore one of the deeper implications of that observation, which is that all operators in Python are just cleverly-disguised method calls. That means that when we write even an expression as seemingly basic as 4 * 3 in Python, it’s implicitly converted to a call to a magic method on the first operand (4), with the second operand (3) being passed in as an argument.
This is a bit hard to explain abstractly, so let’s dive into an example. Start with this naive arithmetic operation:
4 * 3
12
No surprises there. But here’s an equivalent way to write the same line, which makes clearer what’s happening under the hood when we multiply one number by another:
# 4 is an integer literal, so we have to wrap it in parentheses to prevent a
# syntax error (4.__mul__ would be read as the start of a decimal number).
# We wouldn't have to do this for other types of objects (e.g., strings).
(4).__mul__(3)
12
Remember the dot notation? Here, __mul__ is a (magic) method implemented in the integer class. When Python evaluates the expression 4 * 3, it calls __mul__ on the first integer, and hands it the second one as an argument. See, we weren’t messing around when we said everything is an object in Python. Even something as seemingly basic as the multiplication operator is an alias for a method called on an integer object.
5.8.6. The semantics of *¶
Once we recognize that Python’s * operator is just an alias for the __mul__ magic method, we might start to wonder if this is always true. Does every valid use of * as a multiplication operator imply that the object just before the * must be an instance of a class that implements the __mul__ method? To a very good approximation, the answer is yes! The result of an expression that includes the * operator is determined by the receiver object’s implementation of __mul__, and the same holds for other operators such as == and &, each of which has its own corresponding magic method. (The one wrinkle is that if the left operand’s __mul__ doesn’t know how to handle the right operand, Python gives the right operand a chance to respond through its __rmul__ method before raising an error.)
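To make the dispatch rule concrete, here is a minimal sketch built around a hypothetical Scaler class (it is not part of this chapter’s code and exists only for illustration). It defines __mul__ for the case where a Scaler is on the left of the *, and __rmul__ for the case where it is on the right and the left operand’s own __mul__ gives up by returning the special value NotImplemented.

```python
class Scaler:
    """Hypothetical class, used only to illustrate operator dispatch."""

    def __init__(self, factor):
        self.factor = factor

    def __mul__(self, other):
        # Called for `scaler * other`.
        return self.factor * other

    def __rmul__(self, other):
        # Called for `other * scaler` when other's __mul__ can't handle a Scaler.
        return other * self.factor


double = Scaler(2)

left_result = double * 5             # dispatches to double.__mul__(5)
right_result = 5 * double            # int's __mul__ gives up, so double.__rmul__(5) runs
direct_result = (5).__mul__(double)  # calling the int method directly shows it giving up

print(left_result, right_result, direct_result)
```

Both operator expressions produce a result, even though the integer class knows nothing about Scaler objects; the explicit call to the integer’s __mul__ shows the NotImplemented signal that triggers the fallback.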
Just to make it clear how far-reaching the implications of this principle are, let’s look at how a couple of other built-in Python types deal with the * operator. Let’s start with strings. What do you think will happen when we multiply a string object by 2?
"apple" * 2
'appleapple'
There’s a good chance this was not the behavior you expected. Many people intuitively expect an expression like "apple" * 2 to produce an error, because we don’t normally think of strings as a kind of thing that can be multiplied. But remember: in Python, the multiplication operator is just an alias for a __mul__ call. And there’s no particular reason a string class shouldn’t implement the __mul__ method; why not define some behavior for it, even if it’s counterintuitive? That way users have a super easy way to repeat strings if that’s what they want to do.
What about a list?
random_stuff * 3
['eleventy',
'apple',
7.14,
'banana',
88,
'eleventy',
'apple',
7.14,
'banana',
88,
'eleventy',
'apple',
7.14,
'banana',
88]
List multiplication behaves a lot like string multiplication: the result is that the list is repeated \(n\) times.
What about dictionary multiplication?
fruit_prices * 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [80], in <cell line: 1>()
----> 1 fruit_prices * 2
TypeError: unsupported operand type(s) for *: 'dict' and 'int'
Finally, we encounter an outright failure! It appears Python dictionaries can’t be multiplied. This presumably means that the dict class doesn’t implement __mul__ (you can verify this for yourself by inspecting any dictionary using dir()).
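As a quick check of the pattern we’ve just seen, the sketch below asks each built-in type whether it defines __mul__, and then calls the method explicitly on a string and a list. The variable names are ours, chosen only for this illustration.

```python
# Ask each built-in type whether it defines a __mul__ method.
supports_mul = {cls.__name__: hasattr(cls, "__mul__") for cls in (int, str, list, dict)}
print(supports_mul)

# The same check, using dir() on a dictionary instance.
print("__mul__" in dir({}))

# Calling the magic method explicitly gives the same result as using *.
repeated_string = "apple".__mul__(2)
repeated_list = [1, 2].__mul__(2)
print(repeated_string, repeated_list)
```

The dict class is the only one of the four that lacks the method, which matches the TypeError we got above.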
5.8.7. Other magic methods¶
Most of the magic methods in Python do something very much like what we saw for the multiplication operator. Consider the following operators: +, &, /, %, and <. These map, respectively, onto the magic methods __add__, __and__, __truediv__, __mod__, and __lt__. Many others follow the same pattern.
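Here is a small sketch that puts each of these operators side by side with the explicit method call it stands for. The integer values are arbitrary, picked only so that the results are easy to check by hand.

```python
# Each pair holds the operator form and the equivalent explicit method call.
pairs = {
    "+": (7 + 3, (7).__add__(3)),
    "&": (6 & 3, (6).__and__(3)),
    "/": (7 / 2, (7).__truediv__(2)),
    "%": (7 % 3, (7).__mod__(3)),
    "<": (2 < 3, (2).__lt__(3)),
}

for symbol, (operator_result, method_result) in pairs.items():
    print(symbol, operator_result, method_result)
```

In every row, the two values are identical.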
There are also magic methods that are tied to built-in functions rather than operators (for example, when you call len(obj), that’s equivalent to calling obj.__len__()), or that are triggered by certain events (for example, __getattr__ is called when a requested attribute isn’t found in an object).
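The sketch below illustrates both cases with a hypothetical Playlist class (invented for this example; nothing else in the book depends on it). Its __len__ method is what the built-in len() calls, and its __getattr__ method only runs when ordinary attribute lookup fails.

```python
class Playlist:
    """Hypothetical class, used only to illustrate __len__ and __getattr__."""

    def __init__(self, songs):
        self.songs = songs

    def __len__(self):
        # Called by the built-in len().
        return len(self.songs)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        return f"no attribute named {name!r}"


playlist = Playlist(["intro", "verse", "outro"])

n_songs = len(playlist)    # equivalent to playlist.__len__()
found = playlist.songs     # found normally, so __getattr__ is not called
missing = playlist.artist  # not found, so __getattr__("artist") is called

print(n_songs, found, missing)
```

Returning a string for every missing attribute is a choice made to keep the example short; a real class would more typically raise an AttributeError for names it doesn’t recognize.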
In practice, you won’t need to know much about magic methods unless you start writing a lot of classes of your own (if you do, you can find a full description of all the magic methods in the official Python docs). We spent a lot of time talking about them mainly because they’re a good way to convey some deep insights about the data model at the core of the Python language.
5.8.8. Hungry Circles¶
Let’s come full circle now (awful pun intended) and revisit the Circle class we defined earlier. The last thing we’ll do in this chapter is to add a magic method to our Circle class. This will nicely tie together a lot of different threads we’ve covered.
What we’re going to do is give instances of class Circle the ability to “eat” other circles. When given Python code like this:
c1 = Circle(4)
c2 = Circle(2)
c1 * c2
we want the first circle to “grow” its radius by exactly the amount required for its new area to equal the sum of the two circles’ previous areas. Here’s our updated implementation:
from math import pi, sqrt


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def __mul__(self, prey):
        # Grow this circle so that its area equals the sum of both areas.
        new_area = self.area() + prey.area()
        self.radius = sqrt(new_area / pi)

    def area(self):
        return pi * self.radius**2
The only change here is the addition of the __mul__ method.
Let’s see if the above did what we wanted:
c1 = Circle(4)
c2 = Circle(2)
# Now the important part: c1 eats c2!
c1 * c2
Well, we didn’t get an error, so that’s a good sign. Let’s inspect c1 and see if it’s been updated as we expect. Remember: we expect c1 to have “eaten” c2, which means its radius should grow, and its area should be the sum of both previous areas.
print("Radius of c1 after gorging on c2:", c1.radius)
print("Area of c1 after gorging on c2:", c1.area())
Radius of c1 after gorging on c2: 4.47213595499958
Area of c1 after gorging on c2: 62.83185307179588
It worked!
The only slightly dissatisfying feature of our implementation is that, after c1 eats c2 and expands itself accordingly, c2 is somehow still around to tell the tale. This probably violates some physical conservation law, but we’ll overlook that here. For reasons we won’t get into, it’s not trivial to delete c2 from inside c1. (There are good reasons for this, and the fact that we can’t easily make some of our circles wink out of existence from inside the belly of other circles might lead us to suspect we’ve architected our code suboptimally. But that’s a problem for a different book.)
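If you’re curious, here is a small sketch of what happens if we naively try anyway. It repeats the Circle class from above with one extra line, del prey, which is our addition for the sake of illustration and not something we’d recommend keeping. Inside the method, prey is just a local name for the object; deleting that name does nothing to the variable c2 in the calling code, which still refers to the same circle.

```python
from math import pi, sqrt


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def __mul__(self, prey):
        new_area = self.area() + prey.area()
        self.radius = sqrt(new_area / pi)
        # This only removes the local name `prey`. The object it referred to
        # is still reachable through the caller's variable.
        del prey

    def area(self):
        return pi * self.radius**2


c1 = Circle(4)
c2 = Circle(2)
c1 * c2

# c1 has grown, but c2 is still alive and unchanged.
print("Radius of c1:", c1.radius)
print("Radius of c2:", c2.radius)
```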
5.8.8.1. Exercise¶
Add a __mul__ method to your implementation of the Square class that follows the same principles as the __mul__ method of the Circle class, changing both the side attribute of the calling object and the return value of area as it swallows the other object.
5.8.9. Additional resources¶
This chapter provided a high-level look at some of the main features of the Python language—some basic, some more advanced. To develop a stronger working familiarity with the language, you will need to roll up your sleeves and start writing some code. One of the best ways to learn is to pick a small problem that interests or matters to you in some way (e.g., parsing some text data you have lying around), and search the web for help every time you run into problems (there’s no shame in consulting the internet! All programmers do it!).
If you prefer to have more structure than that, there are hundreds of excellent, and mostly free, resources online to help you on your way. A couple of good ones include:
A Whirlwind Tour of Python is an excellent intro to Python by Jake VanderPlas; Jupyter notebooks are available in the book GitHub repo.
Allen Downey’s “Think Python” is another excellent introduction to the language.