You can do many things with desktop GIS software such as QGIS, but if you work with spatial data for long, you’ll inevitably want to do something that isn’t available through the software’s interface. If you know how to program, and are clever enough, you can write code that does exactly what you need. Another common scenario is the need to automate a repetitive processing task instead of using the point-and-click method over and over again. Not only is coding more fun and intellectually stimulating than pointing and clicking, but it’s also much more efficient when it comes to repetitive tasks. You have no shortage of languages you could learn and work with, but because Python is used with many GIS software packages, including QGIS and ArcGIS, it’s an excellent language for working with spatial data. It’s also powerful, but at the same time a relatively easy-to-learn language, so that makes it a good choice if you’re starting out with programming.
Another reason for using Python is that it’s an interpreted language, so programs written in Python will run on any computer with an interpreter, and interpreters exist for any operating system you’re likely to use. To run a Python script, you need the script and an interpreter, which is different from running an .exe file, for example, where you only need one file. But if you have an .exe file, you can only run it under the Windows operating system, which is a bummer if you want to run it on a Mac or Linux. However, if you have a Python script, you can run it anywhere that has an interpreter, so you’re no longer limited to a single operating system.
Another advantage of interpreted languages is that you can use them interactively. This is great for playing around and learning a language, because you can type a line of code and see the results instantly. You can run the Python interpreter in a terminal window, but it’s probably easier to use IDLE, which is a simple development environment installed with Python. Two different types of windows exist in IDLE: shells and edit windows. A shell is an interactive window in which you can type Python code and get immediate results. You’ll know that you’re looking at an interactive window if you see a >>> prompt, like that in figure 2.1. You can type code after this prompt and execute it by pressing Enter. Many of the examples in this book are run this way to show results. However, this is an inefficient way to run more than a few lines of code, and it doesn’t save your code for later use. This is where the edit window comes in. You can use the File menu in IDLE to open a new window, which will contain an empty file. You can type your code in there and then execute the script using the Run menu, although you’ll need to save it with a .py extension first. The output from the script will be sent to the interactive window. Speaking of output, in many of the interactive examples in this book I type a variable name to see what the variable contains, but this won’t work if you’re running the code from a script. Instead, you need to use print to explicitly tell Python to send information to the output window.
Figure 2.1. An IDLE shell window
In figure 2.1 the string I typed, 'Hello world!', and the output are color coded. This syntax highlighting is useful because it helps you pick out keywords, built-in functions, strings, and error messages at a glance. It can also help you find spelling mistakes if something doesn’t change color when you expect it to. Another useful feature of IDLE is tab completion. If you start typing a variable or function name and then press the Tab key, a list of options will pop up, as shown in figure 2.2. You can keep typing, and it will narrow the search. You can also use arrow keys to scroll through the list. When the word you want is highlighted, press Tab again, and the word will appear on your screen.
Figure 2.2. Start typing and press the Tab key in order to get a list of possible variables or functions that match what you were typing.
Because Python scripts are plain text files, you aren’t forced to use IDLE if you don’t want to. You can write scripts in whatever text editor you prefer. Many editors are easy to configure, so you can run a Python script directly without leaving the editor. See the documentation for your favorite editor to learn how to do this. Packages that are designed specifically for working with Python code are Spyder, PyCharm, Wing IDE, and PyScripter. Everybody has their own favorite development environment, and you may need to play with a few different ones before you find an environment that you like.
Some of the first things you’ll see right at the top of most Python scripts are import statements. These lines of code load additional modules so that the scripts can use them. A module is basically a library of code that you can access and use from your scripts, and the large ecosystem of specialized modules is another advantage to using Python. You’d have a difficult time working with GIS data in Python without extra modules that are designed for this, similar to the way tools such as GIMP and Photoshop make it easier to work with digital images. The whole point of this book is to teach you how to use these tools for working with GIS data. Along the way, you’ll also use several of the modules that come with Python because they’re indispensable for tasks such as working with the file system.
Let’s look at a simple example that uses one of the built-in modules. The first thing you need to do to use a module is load it using import. Then you can access objects in the module by prefixing them with the module name so that Python knows where to find them. This example loads the random module and then uses the gauss function contained in that module to get a random number from the standard normal distribution:
>>> import random
>>> random.gauss(0, 1)
-0.22186423850882403
Another thing you might notice in a Python script is the lack of semicolons and curly braces, which are commonly used in other languages for ending lines and setting off blocks of code. Python uses whitespace to do these things. Instead of using a semicolon to end a line, press Enter and start a new line. Sometimes one line of code is too long to fit comfortably on one line in your file, however. In this case, break your line at a sensible place, such as right after a comma, and the Python interpreter will know that the lines belong together. As for the missing curly braces, Python uses indentation to define blocks of code instead. This may seem weird at first if you’re used to using braces or end statements, but indentation works as well and forces you to write more readable code. Because of this, you need to be careful with your indentations. In fact, it’s common for beginners to run into syntax errors because of wayward indentations. For example, even an extra space at the beginning of a line of code will cause an error. You’ll see examples of how indentation is used in section 2.5.
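To make the whitespace rules a little more concrete, here is a minimal sketch. The variable names total and message are made up for illustration. It breaks one long line after a comma and uses indentation to mark which lines belong to the if and else blocks.

```python
# A long call can be split after a comma; the open parenthesis tells
# Python that the statement continues on the next line.
total = sum([1, 2, 3,
             4, 5, 6])

# Indentation defines the block that belongs to the if statement.
if total > 20:
    message = 'total is large'      # part of the if block
else:
    message = 'total is small'      # part of the else block

# This line is not indented, so it runs no matter which branch ran.
print(total, message)
```

If you add a stray space in front of the final print line, Python should refuse to run the script and report an indentation error.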
# This is a comment
Unless your script is extremely simple, it will need a way to store information as it runs, and this is where variables come in. Think about what happens when you use software to open a file, no matter what kind of file it is. The software displays an Open dialog, you select a file and click OK, and then the file is opened. When you press OK, the name of the selected file is stored as a variable so that the software knows what file to open. Even if you’ve never programmed anything in your life, you’re probably familiar with this concept in the mathematical sense. Think back to algebra class and computing the value of y based on the value of x. The x variable can take on any value, and y changes in response. A similar concept applies in programming. You’ll use many different variables, or x’s, that will affect the outcome of your script. The outcome can be anything you want it to be and isn’t limited to a single y value, however. It might be a number, if your goal is to calculate a statistic on your data, but it could as easily be one or more entirely new datasets.
>>> n = 10
>>> n
10
Although you can store whatever you want in a variable without worrying about data type, you will run into trouble if you try to use the variable in a way that’s inconsistent with the kind of data stored in it. Because the data types aren’t checked until runtime, the error won’t happen until that line of the script is executed, so you won’t get any warning beforehand. You’ll get the same errors in the Python interactive window that would occur in a script, so you can always test examples there if you’re not sure if something will work. For example, you can’t add strings and integers together, and this shows what happens if you try:
>>> n = 'Hello world'
>>> msg = n + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
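One way around this kind of error is to convert a value to the type you need before you combine it with something else. The following is only a sketch, and the variable names count, text, and answer are made up for illustration. It uses the built-in str and int functions to do the conversions.

```python
n = 'Hello world'
count = 10

# Convert the integer to a string before joining it to another string.
msg = n + ' ' + str(count)
print(msg)

# Convert a numeric string to an integer before doing arithmetic.
text = '41'
answer = int(text) + 1
print(answer)
```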
As your code becomes more complex, you’ll find that it’s extremely difficult to store all of the information that your script needs as numbers and strings. Fortunately, you can use many different types of data structures, ranging from simple numbers to complex objects that can contain many different types of data themselves. Although an infinite number of these object types can be used (because you can define your own), only a small number of core data types exist from which the more complex ones are built. I’ll briefly discuss several of those here. Please see a more comprehensive set of Python documentation for more details, because this leaves out much information.
2.4.1. Booleans
A Boolean variable denotes true or false values. Two case-sensitive keywords, True and False, are used to denote these values. They can be used in standard Boolean operations, like these:
>>> True or False
True
>>> not False
True
>>> True and False
False
>>> True and not False
True
2.4.2. Numeric types
As you’d expect, you can use Python to work with numbers. What you might not expect, however, is that distinct kinds of numbers exist. Integers are whole numbers, such as 5, 27, or 592. Floating-point numbers, on the other hand, are numbers with decimal points, such as 5.3, 27.0, or 592.8. Would it surprise you to know that 27 and 27.0 are different? For one, they might take up different amounts of memory, although the details depend on your operating system and version of Python. If you’re using Python 2.7 there’s a major difference in how the two numbers are used for mathematical operations, because integers don’t take decimal places into account. Take a look at this Python 2.7 example:
>>> 27 / 7
3
>>> 27.0 / 7.0
3.857142857142857
>>> 27 / 7.0
3.857142857142857
>>> 27 / 7
3.857142857142857
>>> 27 // 7
3
>>> float(27)
27.0
>>> int(27.9)
27
>>> round(27.9)
28
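The difference between truncating, flooring, and rounding is easiest to see with a negative number. Here is a small sketch, assuming Python 3, that puts the conversions side by side. The variable names are only for illustration.

```python
# int() drops the fractional part (truncates toward zero).
truncated = int(-27.9)

# Floor division rounds down, which differs from truncation for negatives.
floored = -27 // 7

# round() accepts an optional number of decimal places.
rounded = round(27.456, 1)

# Mixing an integer and a float in an expression produces a float.
mixed = 27 + 1.0

print(truncated, floored, rounded, mixed)
```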
2.4.3. Strings
Strings are text values, such as 'Hello world'. You create a string by surrounding the text with either single or double quotes—it doesn’t matter which, although if you start a string with one type, you can’t end it with the other because Python won’t recognize it as the end of the string. The fact that either one works makes it easy to include quotes as part of your string. For example, if you need single quotes inside your string, as you would in a SQL statement, surround the entire string with double quotes, like this:
sql = "SELECT * FROM cities WHERE country = 'Canada'"
>>> 'Don't panic!'
  File "<stdin>", line 1
    'Don't panic!'
         ^
SyntaxError: invalid syntax
>>> 'Don\'t panic!'
"Don't panic!"
>>> print('Don\'t panic!')
Don't panic!
Joining strings
>>> 'Beam me up ' + 'Scotty'
'Beam me up Scotty'
>>> 'I wish I were as smart as {0} {1}'.format('Albert', 'Einstein')
'I wish I were as smart as Albert Einstein'
>>> 'I wish I were as smart as {1}, {0}'.format('Albert', 'Einstein')
'I wish I were as smart as Einstein, Albert'
Escape characters
>>> print('Title:\tMoby Dick\nAuthor:\tHerman Melville')
Title:  Moby Dick
Author: Herman Melville
>>> import os
>>> os.path.exists('d:\temp\cities.csv')
False
>>> print('d:\temp\cities.csv')
d:      emp\cities.csv
>>> os.path.exists('d:/temp/cities.csv')
True
>>> os.path.exists('d:\\temp\\cities.csv')
True
>>> os.path.exists(r'd:\temp\cities.csv')
True
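If you build paths from pieces, you can sidestep most of the escape character problem by letting the os.path module add the separators for you. This is a minimal sketch. The folder and filename are hypothetical and don't need to exist for the code to run.

```python
import os

# Hypothetical folder and filename, used only for illustration.
folder = r'd:\temp'
filename = 'cities.csv'

# A raw string keeps the backslash and the letter t as two characters,
# while a regular string turns them into a single tab character.
print(len('\t'), len(r'\t'))

# os.path.join inserts the separator for the current operating system.
path = os.path.join(folder, filename)
print(path)

# basename pulls the filename back out of the full path.
print(os.path.basename(path))
```

The separator that os.path.join inserts depends on the operating system you run this on, so the printed path will look slightly different on Windows than on a Mac or Linux.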
2.4.4. Lists and tuples
A list is an ordered collection of items that are accessed via their index. The first item in the list has index 0, the second has index 1, and so on. The items don’t even have to all be the same data type. You can create an empty list with a set of square brackets, [], or you can populate it right off the bat. For example, this creates a list with a mixture of numbers and strings and then accesses some of them:
>>> data = [5, 'Bob', 'yellow', -43, 'cat']
>>> data[0]
5
>>> data[2]
'yellow'
>>> data[-1]
'cat'
>>> data[-3]
'yellow'
>>> data[1:3]
['Bob', 'yellow']
>>> data[-4:-1]
['Bob', 'yellow', -43]
>>> data[2] = 'red'
>>> data
[5, 'Bob', 'red', -43, 'cat']
>>> data[0:2] = [2, 'Mary']
>>> data
[2, 'Mary', 'red', -43, 'cat']
>>> data.append('dog')
>>> data
[2, 'Mary', 'red', -43, 'cat', 'dog']
>>> del data[1]
>>> data
[2, 'red', -43, 'cat', 'dog']
>>> len(data)
5
>>> 2 in data
True
>>> 'Mary' in data
False
>>> data = (5, 'Bob', 'yellow', -43, 'cat')
>>> data[1:3]
('Bob', 'yellow')
>>> len(data)
5
>>> 'Bob' in data
True
>>> data[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
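Because a tuple can't be changed in place, one possible workaround is to copy it into a list, edit the list, and convert the result back to a tuple. This sketch uses made-up variable names and also shows tuple unpacking, which isn't covered above.

```python
point = (5, 'Bob', 'yellow')

# Build a list from the tuple, change it, and convert back.
as_list = list(point)
as_list[0] = 10
updated = tuple(as_list)

print(point)      # the original tuple is unchanged
print(updated)

# Tuples can also be unpacked into separate variables.
number, name, color = updated
print(name)
```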
2.4.5. Sets
Sets are unordered collections of items in which each value can occur only once, which makes them an easy way to remove duplicates from a list. For example, this set is created using a list that contains two instances of the number 13, but only one is in the resulting set:
>>> data = set(['book', 6, 13, 13, 'movie'])
>>> data
{'movie', 6, 'book', 13}
>>> data.add('movie')
>>> data.add('game')
>>> data
{'movie', 'game', 6, 'book', 13}
>>> 13 in data
True
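Here is a short sketch of the duplicate-removal idea applied to a list of strings. The country names are invented sample data. It also shows two set operations, intersection and difference, that aren't described above but come with the built-in set type.

```python
# Hypothetical list of country names with duplicates.
countries = ['Canada', 'Mexico', 'Canada', 'Peru', 'Mexico']

# Converting to a set drops the duplicates; sorted() returns a list
# in a predictable order, because sets themselves are unordered.
unique = sorted(set(countries))
print(unique)

# Sets also support operations such as intersection and difference.
visited = {'Peru', 'Chile'}
print(set(countries) & visited)   # values in both sets
print(set(countries) - visited)   # values only in countries
```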
2.4.6. Dictionaries
Dictionaries are indexed collections, like lists and tuples, except that the indices aren’t offsets like they are in lists. Instead, you get to choose the index value, called a key. Keys can be numbers, strings, or other data types, as can the values they reference. Use curly braces to create a new dictionary:
>>> data = {'color': 'red', 'lucky number': 42, 1: 'one'}
>>> data
{1: 'one', 'lucky number': 42, 'color': 'red'}
>>> data[1]
'one'
>>> data['lucky number']
42
As with lists, you can add, change, and remove items:
>>> data[5] = 'candy'
>>> data
{1: 'one', 'lucky number': 42, 5: 'candy', 'color': 'red'}
>>> data['color'] = 'green'
>>> data
{1: 'one', 'lucky number': 42, 5: 'candy', 'color': 'green'}
>>> del data[1]
>>> data
{'lucky number': 42, 5: 'candy', 'color': 'green'}
>>> 'color' in data
True
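Two other things you'll probably want to do with a dictionary are look up a key that might not exist and loop over everything it contains. This sketch uses the get, items, and keys methods of the built-in dict type. The 'shape' key and the 'unknown' default are made up for illustration.

```python
data = {'color': 'green', 'lucky number': 42}

# Indexing with a missing key raises KeyError, but get() returns a
# default value instead.
shape = data.get('shape', 'unknown')
print(shape)

# items() yields each key together with its value.
for key, value in data.items():
    print('{0}: {1}'.format(key, value))

# keys() can be sorted into a list when you need a predictable order.
keys = sorted(data.keys())
print(keys)
```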
The first script you write will probably consist of a sequence of statements that are executed in order, like all of the examples we have looked at so far. The real power of programming, however, is the ability to change what happens based on different conditions. Similar to the way you might use sale prices to decide which veggies to buy at the supermarket, your code should use data, such as whether it’s working with a point or a line, to determine exactly what needs to be done. Control flow is the concept of changing this order of code execution.
2.5.1. If statements
Perhaps the simplest way to change execution order is to test a condition and do something different depending on the outcome of the test. This can be done with an if statement. Here’s a simple example:
if n == 1:
    print('n equals 1')
else:
    print('n does not equal 1')
n = 1
if n == 1:
    print('n equals 1')
else:
    print('n does not equal 1')
print('This is not part of the condition')
n equals 1
This is not part of the condition
You can also test multiple conditions like this:
if n == 1:
    print('n equals 1')
elif n == 3:
    print('n equals 3')
elif n > 5:
    print('n is greater than 5')
else:
    print('what is n?')
In this case, n is first compared to 1. If it’s not equal to 1, then it’s compared to 3. If it’s not equal to that, either, then it checks to see if n is greater than 5. If none of those conditions are true, then the code under the else statement is executed. You can have as many elif statements as you want, but only one if and no more than one else. Similar to the way the elif statements aren’t required, neither is an else statement. You can use an if statement all by itself if you’d like.
>>> if '':
...     print('a blank string acts like True')
... else:
...     print('a blank string acts like false')
...
a blank string acts like false
>>> if [1]:
...     print('a non-empty list acts like True')
... else:
...     print('a non-empty list acts like False')
...
a non-empty list acts like True
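The same behavior applies to other built-in values. As a rough sketch, the hypothetical helper function below (describe is not part of Python) reports how a handful of values behave when used as a condition.

```python
def describe(value):
    """Return a short message saying how value behaves in an if test."""
    if value:
        return 'acts like True'
    return 'acts like False'

# Zero, None, and empty containers behave like False.
for item in [0, None, '', [], {}]:
    print(repr(item), describe(item))

# Nonzero numbers and non-empty containers behave like True, even a
# list whose only item is 0.
for item in [1, -2.5, 'a', [0]]:
    print(repr(item), describe(item))
```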
2.5.2. While statements
A while statement executes a block of code as long as a condition is True. The condition is evaluated, and if it’s True, then the code is executed. Then the condition is checked again, and if it’s still True, then the code executes again. This continues until the condition is False. If the condition never becomes False, then the code will run forever, which is called an infinite loop and is a scenario you definitely want to avoid. Here’s an example of a while loop:
>>> n = 0
>>> while n < 5:
...     print(n)
...     n += 1
...
0
1
2
3
4
2.5.3. For statements
A for statement allows you to iterate over a sequence of values and do something for each one. When you write a for statement, you not only provide the sequence to iterate over, but you also provide a variable name. Each time through the loop, this variable contains a different value from the sequence. This example iterates through a list of names and prints a message for each one:
>>> names = ['Chris', 'Janet', 'Tami']
>>> for name in names:
...     print('Hello {}!'.format(name))
...
Hello Chris!
Hello Janet!
Hello Tami!
The range function
>>> n = 0
>>> for i in range(20):
...     n += 1
...
>>> print(n)
20
>>> n = 1
>>> for i in range(1, 21):
...     n = n * i
...
>>> print(n)
2432902008176640000
2.5.4. break, continue, and else
A few statements apply to while and for loops. The first one, break, will kick execution completely out of the loop, as in this example that stops the loop when i is equal to 3:
>>> for i in range(5):
...     if i == 3:
...         break
...     print(i)
...
0
1
2
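For comparison, here is a sketch of the other two statements named in this section's title. It assumes the standard Python behavior: continue skips to the next pass through the loop, and an else block attached to a loop runs only when the loop ends without hitting break. The variable names printed and found are made up for illustration.

```python
# continue skips the rest of the current pass but keeps looping.
printed = []
for i in range(5):
    if i == 3:
        continue
    printed.append(i)
print(printed)

# The else block runs only if the loop finishes without hitting break.
found = None
for i in range(5):
    if i == 10:
        found = i
        break
else:
    found = 'no match'
print(found)
```

Because i never equals 10 in the second loop, break is never reached and the else block sets found.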
If you find that you reuse the same bits of code over and over, you can create your own function and call that instead of repeating the same code. This makes things much easier and also less error-prone, because you won’t have nearly as many places to make typos. When you create a function, you need to give it a name and tell it what parameters the user needs to provide to use it. Let’s create a simple function to calculate a factorial:
def factorial(n):
    answer = 1
    for i in range(1, n + 1):
        answer = answer * i
    return answer
>>> fact5 = factorial(5)
def factorial(n, print_it=False):
    answer = 1
    for i in range(1, n + 1):
        answer = answer * i
    if print_it:
        print('{0}! = {1}'.format(n, answer))
    return answer
>>> fact5 = factorial(5, True)
5! = 120
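Optional parameters can also be passed by name, which some people find easier to read than a bare True. The following sketch repeats the factorial function so that it runs on its own, and then calls it both ways.

```python
def factorial(n, print_it=False):
    answer = 1
    for i in range(1, n + 1):
        answer = answer * i
    if print_it:
        print('{0}! = {1}'.format(n, answer))
    return answer

# Passing the optional parameter by name documents what True means.
fact5 = factorial(5, print_it=True)

# Leaving the optional parameter out uses its default value (False),
# so nothing is printed by this call.
fact3 = factorial(3)
print(fact5, fact3)
```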
It’s easy to reuse your functions by saving them in a .py file and then importing them the way you would any other module. The one hitch is that your file needs to be in a location where Python can find it. One way to do this is to put it in the same folder as the script that you’re running. For example, if the factorial function was saved in a file called myfuncs.py, you could import myfuncs (notice there’s no .py extension) and then call the function inside of it:
import myfuncs
fact5 = myfuncs.factorial(5)
As you work through this book, you’ll come across variables that have other data and functions attached to them. These are objects created from classes. Although we won’t cover how to create your own classes in this book, you need to be aware of them because you’ll still use ones defined by someone else. Classes are an extremely powerful concept, but all you need to understand for the purposes of this book is that they’re data types that can contain their own internal data and functions. An object or variable that is of this type contains these data and functions, and the functions operate on that particular object. You saw this with several of the data types we looked at earlier, such as lists. You can have a variable of type list, and that variable contains all of the functions, such as append, that come with being a list. When you call append on a list, it only appends data to that particular list and not to any other list variables you might have.
>>> import datetime
>>> datetype =
>>> mydate =
>>> mydate, 5, 18)
>>> mydate.weekday()
6
>>> newdate = mydate.replace(year=2010)
>>> newdate, 5, 18)
>>> newdate.weekday()
1
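If you'd like something you can run as a script, here is a self-contained sketch along the same lines. It uses the date class from the standard datetime module with a fixed date that I picked so the output is repeatable. That date is an assumption for illustration and isn't necessarily the one used in the session above.

```python
import datetime

# A fixed, hypothetical date so the output is repeatable.
mydate = datetime.date(2014, 5, 18)

# weekday() counts from Monday as 0, so 6 means Sunday.
print(mydate.weekday())

# replace() returns a new date object; mydate itself is unchanged.
newdate = mydate.replace(year=2010)
print(newdate.weekday())
print(mydate.year, newdate.year)
```

The weekday function operates only on the object it's called on, which is why mydate and newdate report different days of the week.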
You’ll use objects created from classes throughout this book. For example, whenever you open a dataset, you’ll get an object that represents that dataset. Depending on the type of data, that object will have different information and functions associated with it. Obviously, you need to know about the classes being used to create these objects, so that you know what data and functions they contain. The GDAL modules contain fairly extensive classes, which are documented in appendixes B, C, and D. (Appendixes C through E are available online on the Manning Publications website at
