Reading Notes on "Dive Into Python"

Dive Into Python
20 May 2004
Copyright © 2000, 2001, 2002, 2003, 2004 Mark Pilgrim (mailto:[email protected])
This book lives at http://diveintopython.org/. If you're reading it somewhere else, you may not have the latest version.
Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free
Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in
Appendix G, GNU Free Documentation License.
The example programs in this book are free software; you can redistribute and/or modify them under the terms of the
Python license as published by the Python Software Foundation. A copy of the license is included in Appendix H,
Python license.


13.2. Diving in
Now that you've completely defined the behavior you expect from your conversion functions, you're going to do
something a little unexpected: you're going to write a test suite that puts these functions through their paces and makes
sure that they behave the way you want them to. You read that right: you're going to write code that tests code that
you haven't written yet.
This is called unit testing, since the set of two conversion functions can be written and tested as a unit, separate from
any larger program they may become part of later. Python has a framework for unit testing, the appropriately-named
unittest module.
unittest is included with Python 2.1 and later. Python 2.0 users can download it from
pyunit.sourceforge.net (http://pyunit.sourceforge.net/).
Unit testing is an important part of an overall testing-centric development strategy. If you write unit tests, it is
important to write them early (preferably before writing the code that they test), and to keep them updated as code and
requirements change. Unit testing is not a replacement for higher-level functional or system testing, but it is important
in all phases of development:
· Before writing code, it forces you to detail your requirements in a useful fashion.
· While writing code, it keeps you from over-coding. When all the test cases pass, the function is complete.
· When refactoring code, it assures you that the new version behaves the same way as the old version.
· When maintaining code, it helps you cover your ass when someone comes screaming that your latest change
broke their old code. ("But sir, all the unit tests passed when I checked it in...")
· When writing code in a team, it increases confidence that the code you're about to commit isn't going to break
other people's code, because you can run their unit tests first. (I've seen this sort of thing in code sprints. A
team breaks up the assignment, everybody takes the specs for their task, writes unit tests for it, then shares
their unit tests with the rest of the team. That way, nobody goes off too far into developing code that won't
play well with others.)




13.4. Testing for success
The most fundamental part of unit testing is constructing individual test cases. A test case answers a single question
about the code it is testing.
A test case should be able to...
· ...run completely by itself, without any human input. Unit testing is about automation.
· ...determine by itself whether the function it is testing has passed or failed, without a human interpreting the
results.
· ...run in isolation, separate from any other test cases (even if they test the same functions). Each test case is an
island.
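The three properties above can be sketched with the unittest module. This is a minimal illustration, not the book's romantest.py: to_roman is a hypothetical stand-in for the conversion function (the real one is developed later in roman.py), and the known values are chosen only for the example.

```python
import unittest

# Hypothetical stand-in for the function under test (an assumption for this
# sketch; it is not the book's roman.py).
NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
               ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
               ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman(n):
    """Convert an integer from 1 to 3999 to a Roman numeral string."""
    result = ""
    for numeral, value in NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result

class KnownValues(unittest.TestCase):
    # Known integer/numeral pairs, picked for illustration.
    known_values = ((1, "I"), (4, "IV"), (9, "IX"),
                    (1946, "MCMXLVI"), (3999, "MMMCMXCIX"))

    def test_to_roman_known_values(self):
        """to_roman should give the known result for known input."""
        for integer, numeral in self.known_values:
            # assertEqual decides pass or fail; nobody has to read output.
            self.assertEqual(to_roman(integer), numeral)

# Run the test case with no human input. Each test method runs on a fresh
# KnownValues instance, so it does not depend on any other test.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(KnownValues)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The test answers a single question: does to_roman return the known numeral for each known integer?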



13.5. Testing for failure
It is not enough to test that functions succeed when given good input; you must also test that they fail when given bad
input. And not just any sort of failure; they must fail in the way you expect.
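As a rough sketch of "failing in the way you expect", unittest's assertRaises checks that a call raises one specific exception. The OutOfRangeError class and this to_roman are assumptions made up for the example, not the book's code.

```python
import unittest

class OutOfRangeError(ValueError):
    """Hypothetical exception for integers outside the range 1..3999."""

NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
               ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
               ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman(n):
    """Toy converter that validates its input before converting."""
    if not isinstance(n, int):
        raise TypeError("non-integers cannot be converted")
    if not 0 < n < 4000:
        raise OutOfRangeError("number out of range (must be 1..3999)")
    result = ""
    for numeral, value in NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        """to_roman should fail with large input."""
        self.assertRaises(OutOfRangeError, to_roman, 4000)

    def test_zero(self):
        """to_roman should fail with 0 input."""
        self.assertRaises(OutOfRangeError, to_roman, 0)

    def test_negative(self):
        """to_roman should fail with negative input."""
        self.assertRaises(OutOfRangeError, to_roman, -1)

    def test_non_integer(self):
        """to_roman should fail with non-integer input."""
        # A different kind of bad input must raise a different exception.
        self.assertRaises(TypeError, to_roman, 0.5)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanBadInput)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If to_roman raised a plain Exception, or returned a value, each of these tests would fail, which is the point.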


13.6. Testing for sanity
Often, you will find that a unit of code contains a set of reciprocal functions, usually in the form of conversion
functions where one converts A to B and the other converts B to A. In these cases, it is useful to create a "sanity
check" to make sure that you can convert A to B and back to A without losing precision, incurring rounding errors, or
triggering any other sort of bug.
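A sanity check of this kind can be sketched as a round trip over every value in range. Both functions below are hypothetical stand-ins written for this example; from_roman only handles well-formed numerals.

```python
NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
               ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
               ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def to_roman(n):
    """Convert an integer (1..3999) to a Roman numeral string."""
    result = ""
    for numeral, value in NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result

def from_roman(s):
    """Convert a well-formed Roman numeral string back to an integer."""
    result = 0
    index = 0
    for numeral, value in NUMERAL_MAP:
        # Consume this numeral as many times as it appears at the cursor.
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result

# Sanity check: A -> B -> A must give back the original value every time.
mismatches = [n for n in range(1, 4000) if from_roman(to_roman(n)) != n]
```

An empty mismatches list means the two functions agree with each other across the whole range; it does not prove that either one is correct on its own.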

Chapter 14. Test-First Programming
The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the unit tests
for a function pass, stop coding the function. When all the unit tests for an entire module pass, stop coding the
module.



Appendix B. A 5-minute review
Chapter 1. Installing Python
1.1. Which Python is right for you?
The first thing you need to do with Python is install it. Or do you?
·
1.2. Python on Windows
On Windows, you have a couple choices for installing Python.
·
1.3. Python on Mac OS X
On Mac OS X, you have two choices for installing Python: install it, or don't install it. You
probably want to install it.
·
1.4. Python on Mac OS 9
Mac OS 9 does not come with any version of Python, but installation is very simple, and there
is only one choice.
·
1.5. Python on RedHat Linux
Download the latest Python RPM by going to http://www.python.org/ftp/python/ and
selecting the highest version number listed, then selecting the rpms/ directory within that.
Then download the RPM with the highest version number. You can install it with the rpm
command, as shown here:
·
1.6. Python on Debian GNU/Linux
If you are lucky enough to be running Debian GNU/Linux, you install Python through the apt
command.
·
1.7. Python Installation from Source
If you prefer to build from source, you can download the Python source code from
http://www.python.org/ftp/python/. Select the highest version number listed, download the
.tgz file, and then do the usual configure, make, make install dance.
·
1.8. The Interactive Shell
Now that you have Python installed, what's this interactive shell thing you're running?
·
1.9. Summary
You should now have a version of Python installed that works for you.
·
Chapter 2. Your First Python Program
2.1. Diving in
Here is a complete, working Python program.
·
2.2. Declaring Functions
Python has functions like most other languages, but it does not have separate header files like
C++ or interface/implementation sections like Pascal. When you need a function,
just declare it, like this:
·
2.3. Documenting Functions
You can document a Python function by giving it a doc string.
·
2.4. Everything Is an Object
A function, like everything else in Python, is an object.
·
2.5. Indenting Code
Python functions have no explicit begin or end, and no curly braces to mark where the
function code starts and stops. The only delimiter is a colon (:) and the indentation of the
code itself.
·
2.6. Testing Modules
Python modules are objects and have several useful attributes. You can use this to easily test
your modules as you write them. Here's an example that uses the if __name__ trick.
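A minimal sketch of the if __name__ trick follows. The function build_greeting is a made-up example; the idea is that the block under the if runs only when the file is executed directly, not when it is imported.

```python
def build_greeting(name):
    """Return a greeting for name (hypothetical example function)."""
    return "Hello, %s" % name

# __name__ is "__main__" only when this file is run as a script. When the
# file is imported as a module, __name__ is the module's own name and this
# block is skipped.
if __name__ == "__main__":
    print(build_greeting("world"))
```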
·
Chapter 3. Native Datatypes
3.1. Introducing Dictionaries
One of Python's built-in datatypes is the dictionary, which defines one-to-one relationships
between keys and values.
·
3.2. Introducing Lists
Lists are Python's workhorse datatype. If your only experience with lists is arrays in Visual
Basic or (God forbid) the datastore in Powerbuilder, brace yourself for Python lists.
·
3.3. Introducing Tuples
A tuple is an immutable list. A tuple can not be changed in any way once it is created.
·
3.4. Declaring variables
Python has local and global variables like most other languages, but it has no explicit variable
declarations. Variables spring into existence by being assigned a value, and they are
automatically destroyed when they go out of scope.
·
3.5. Formatting Strings
Python supports formatting values into strings. Although this can include very complicated
expressions, the most basic usage is to insert values into a string with the %s placeholder.
·
3.6. Mapping Lists
One of the most powerful features of Python is the list comprehension, which provides a
compact way of mapping a list into another list by applying a function to each of the elements
of the list.
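For instance, a list comprehension that doubles each element might look like this (the list values are arbitrary):

```python
li = [1, 9, 8, 4]

# Build a new list by applying an expression to each element of li.
doubled = [elem * 2 for elem in li]

# The original list is left untouched; the comprehension returns a new list.
```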
·
3.7. Joining Lists and Splitting Strings
You have a list of key-value pairs in the form key=value, and you want to join them into a
single string. To join any list of strings into a single string, use the join method of a string
object.
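A short sketch of join and its counterpart split, using made-up connection parameters (this assumes Python 3.7 or later, where dictionaries keep insertion order):

```python
# Hypothetical parameters, used only for illustration.
params = {"server": "mpilgrim", "database": "master", "uid": "sa"}

# Build the key=value strings, then join them with a semicolon.
pairs = ["%s=%s" % (key, value) for key, value in params.items()]
joined = ";".join(pairs)

# split reverses the join, giving back the list of key=value strings.
parts = joined.split(";")
```

Note that join is a method of the separator string, not of the list.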
·
3.8. Summary
The odbchelper.py program and its output should now make perfect sense.
·
Chapter 4. The Power Of Introspection
4.1. Diving In
Here is a complete, working Python program. You should understand a good deal about it just
by looking at it. The numbered lines illustrate concepts covered in Chapter 2, Your First
Python Program. Don't worry if the rest of the code looks intimidating; you'll learn all about
it throughout this chapter.
·
4.2. Using Optional and Named Arguments
Python allows function arguments to have default values; if the function is called without the
argument, the argument gets its default value. Furthermore, arguments can be specified in any
order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this,
so if you're a SQL Server scripting guru, you can skim this part.
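As an illustration, here is a hypothetical function with two optional arguments, called four different ways:

```python
def describe(obj, spacing=10, collapse=True):
    """Return repr(obj), padded to spacing columns (example function)."""
    text = repr(obj)
    if collapse:
        # Collapse runs of whitespace into single spaces.
        text = " ".join(text.split())
    return text.ljust(spacing)

a = describe("a")                   # both optional arguments use defaults
b = describe("a", 5)                # spacing given by position
c = describe("a", collapse=False)   # spacing skipped, collapse given by name
d = describe(spacing=5, obj="a")    # named arguments in any order
```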
·
4.3. Using type, str, dir, and Other Built-In Functions
Python has a small set of extremely useful built-in functions. All other functions are
partitioned off into modules. This was actually a conscious design decision, to keep the core
language from getting bloated like other scripting languages (cough cough, Visual Basic).
·
4.4. Getting Object References With getattr
You already know that Python functions are objects. What you don't know is that you can get
a reference to a function without knowing its name until run-time, by using the getattr
function.
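A small sketch: the method name below is held in a string, as it would be if it were only known at run-time.

```python
li = ["Larry", "Curly"]

# The name of the method is just a string here.
method_name = "append"

# getattr returns a reference to the bound method li.append.
append = getattr(li, method_name)
append("Moe")

# A third argument supplies a default when the attribute does not exist.
missing = getattr(li, "no_such_method", None)
```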
·
4.5. Filtering Lists
As you know, Python has powerful capabilities for mapping lists into other lists, via list
comprehensions (Section 3.6, Mapping Lists). This can be combined with a filtering
mechanism, where some elements in the list are mapped while others are skipped entirely.
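For example, adding an if clause to a list comprehension keeps only the elements that pass the test (the list contents are arbitrary):

```python
li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"]

# Keep only the elements longer than one character.
long_names = [elem for elem in li if len(elem) > 1]

# Filtering and mapping can be combined in one expression.
shouted = [elem.upper() for elem in li if len(elem) > 1]
```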
·
4.6. The Peculiar Nature of and and or
In Python, and and or perform boolean logic as you would expect, but they do not return
boolean values; instead, they return one of the actual values they are comparing.
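A few concrete cases show what that means:

```python
# and returns the first false value, or the last value if all are true.
first = "a" and "b"
second = "" and "b"

# or returns the first true value, or the last value if all are false.
third = "a" or "b"
fourth = "" or [] or {}
```

Here first is "b", second is the empty string, third is "a", and fourth is the empty dictionary.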
·
4.7. Using lambda Functions
Python supports an interesting syntax that lets you define one-line mini-functions on the fly.
Borrowed from Lisp, these so-called lambda functions can be used anywhere a function is
required.
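Two small lambda functions, with names made up for this sketch:

```python
# A lambda has no name of its own; assigning it to a variable lets you
# call it like any other function.
double = lambda x: x * 2

# A slightly more useful one: collapse runs of whitespace in a string.
collapse_whitespace = lambda s: " ".join(s.split())

# A lambda can also be used inline, without assigning it to anything.
inline_result = (lambda x: x * 2)(3)
```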
·
4.8. Putting It All Together
The last line of code, the only one you haven't deconstructed yet, is the one that does all the
work. But by now the work is easy, because everything you need is already set up just the
way you need it. All the dominoes are in place; it's time to knock them down.
·
4.9. Summary
The apihelper.py program and its output should now make perfect sense.
·
Chapter 5. Objects and Object-Orientation
5.1. Diving In
Here is a complete, working Python program. Read the doc strings of the module, the
classes, and the functions to get an overview of what this program does and how it works. As
usual, don't worry about the stuff you don't understand; that's what the rest of the chapter is
for.
·
5.2. Importing Modules Using from module import
Python has two ways of importing modules. Both are useful, and you should know when to
use each. One way, import module, you've already seen in Section 2.4, Everything Is an
Object. The other way accomplishes the same thing, but it has subtle and important
differences.
·
5.3. Defining Classes
Python is fully object-oriented: you can define your own classes, inherit from your own or
built-in classes, and instantiate the classes you've defined.
·
5.4. Instantiating Classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class
as if it were a function, passing the arguments that the __init__ method defines. The return
value will be the newly created object.
·
5.5. Exploring UserDict: A Wrapper Class
As you've seen, FileInfo is a class that acts like a dictionary. To explore this further, let's
look at the UserDict class in the UserDict module, which is the ancestor of the
FileInfo class. This is nothing special; the class is written in Python and stored in a .py
file, just like any other Python code. In particular, it's stored in the lib directory in your
Python installation.
·
5.6. Special Class Methods
In addition to normal class methods, there are a number of special methods that Python
classes can define. Instead of being called directly by your code (like normal methods),
special methods are called for you by Python in particular circumstances or when specific
syntax is used.
·
5.7. Advanced Special Class Methods
Python has more special methods than just __getitem__ and __setitem__. Some of
them let you emulate functionality that you may not even know about.
·
5.8. Introducing Class Attributes
You already know about data attributes, which are variables owned by a specific instance of a
class. Python also supports class attributes, which are variables owned by the class itself.
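The difference can be sketched with a hypothetical Counter class that keeps one shared count on the class and a separate label on each instance:

```python
class Counter:
    count = 0  # class attribute: owned by the class, shared by all instances

    def __init__(self):
        # Update the shared class attribute through the class itself.
        self.__class__.count += 1
        # label is a data attribute: owned by this one instance.
        self.label = "instance %d" % self.__class__.count

a = Counter()
b = Counter()
```

After both instances are created, Counter.count, a.count, and b.count all refer to the same value, while a.label and b.label differ.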
·
5.9. Private Functions
Unlike in most languages, whether a Python function, method, or attribute is private or public
is determined entirely by its name.
·
5.10. Summary
That's it for the hard-core object trickery. You'll see a real-world application of special class
methods in Chapter 12, which uses getattr to create a proxy to a remote web service.
·
Chapter 6. Exceptions and File Handling
6.1. Handling Exceptions
Like many other programming languages, Python has exception handling via
try...except blocks.
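A minimal sketch, assuming a path that does not exist on your system; read_first_line is a made-up helper:

```python
def read_first_line(path):
    """Return the first line of the file at path, or None if it can't be opened."""
    try:
        f = open(path)
    except IOError:
        # Opening failed (for example, the file does not exist).
        return None
    try:
        return f.readline()
    finally:
        # Runs whether or not readline raised an exception.
        f.close()

# Hypothetical path, assumed not to exist.
missing = read_first_line("/no/such/dir/no_such_file.txt")
```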
·
6.2. Working with File Objects
Python has a built-in function, open, for opening a file on disk. open returns a file object,
which has methods and attributes for getting information about and manipulating the opened
file.
·
6.3. Iterating with for Loops
Like most other languages, Python has for loops. The only reason you haven't seen them
until now is that Python is good at so many other things that you don't need them as often.
·
6.4. Using sys.modules
Modules, like everything else in Python, are objects. Once imported, you can always get a
reference to a module through the global dictionary sys.modules.
·
6.5. Working with Directories
The os.path module has several functions for manipulating files and directories. Here,
we're looking at handling pathnames and listing the contents of a directory.
·
6.6. Putting It All Together
Once again, all the dominoes are in place. You've seen how each line of code works. Now
let's step back and see how it all fits together.
·
6.7. Summary
The fileinfo.py program introduced in Chapter 5 should now make perfect sense.
·
Chapter 7. Regular Expressions
7.1. Diving In
If what you're trying to do can be accomplished with string functions, you should use them.
They're fast and simple and easy to read, and there's a lot to be said for fast, simple, readable
code. But if you find yourself using a lot of different string functions with if statements to
handle special cases, or if you're combining them with split and join and list
comprehensions in weird unreadable ways, you may need to move up to regular expressions.
·
7.2. Case Study: Street Addresses
This series of examples was inspired by a real-life problem I had in my day job several years
ago, when I needed to scrub and standardize street addresses exported from a legacy system
before importing them into a newer system. (See, I don't just make this stuff up; it's actually
useful.) This example shows how I approached the problem.
·
7.3. Case Study: Roman Numerals
You've most likely seen Roman numerals, even if you didn't recognize them. You may have
seen them in copyrights of old movies and television shows ("Copyright MCMXLVI" instead
of "Copyright 1946"), or on the dedication walls of libraries or universities ("established
MDCCCLXXXVIII" instead of "established 1888"). You may also have seen them in
outlines and bibliographical references. It's a system of representing numbers that really does
date back to the ancient Roman empire (hence the name).
·
7.4. Using the {n,m} Syntax
In the previous section, you were dealing with a pattern where the same character could be
repeated up to three times. There is another way to express this in regular expressions, which
some people find more readable. First look at the method we already used in the previous
example.
·
7.5. Verbose Regular Expressions
So far you've just been dealing with what I'll call "compact" regular expressions. As you've
seen, they are difficult to read, and even if you figure out what one does, that's no guarantee
that you'll be able to understand it six months later. What you really need is inline
documentation.
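With the re.VERBOSE flag, whitespace in the pattern is ignored and # starts a comment, so the documentation can live inside the expression. The pattern below is a sketch for validating Roman numerals; treat the exact expression as an illustration rather than a quotation.

```python
import re

roman_pattern = re.compile(r"""
    ^                   # beginning of string
    M{0,3}              # thousands: 0 to 3 M's
    (CM|CD|D?C{0,3})    # hundreds: 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #           or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens: 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #       or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones: 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #       or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """, re.VERBOSE)

valid = roman_pattern.search("MCMXLVI")
invalid = roman_pattern.search("MMMM")
```

The pattern also uses the {n,m} syntax: M{0,3} matches between zero and three M characters.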
·
7.6. Case study: Parsing Phone Numbers
So far you've concentrated on matching whole patterns. Either the pattern matches, or it
doesn't. But regular expressions are much more powerful than that. When a regular
expression does match, you can pick out specific pieces of it. You can find out what matched
where.
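Parentheses in a pattern define groups, and the match object's groups method returns what each group matched. The pattern and the sample phone numbers here are illustrative assumptions:

```python
import re

# Three digits, three digits, four digits, and an optional extension,
# separated by any number of non-digit characters.
phone_pattern = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$")

match = phone_pattern.search("work 1-(800) 555.1212 #1234")
groups = match.groups()

# With no extension, the last group matches the empty string.
no_extension = phone_pattern.search("800-555-1212").groups()
```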
·
7.7. Summary
This is just the tiniest tip of the iceberg of what regular expressions can do. In other words,
even though you're completely overwhelmed by them now, believe me, you ain't seen nothing
yet.
·
Chapter 8. HTML Processing
8.1. Diving in
I often see questions on comp.lang.python
(http://groups.google.com/groups?group=comp.lang.python) like "How can I list all the
[headers|images|links] in my HTML document?" "How do I parse/translate/munge the text of
my HTML document but leave the tags alone?" "How can I add/remove/quote attributes of all
my HTML tags at once?" This chapter will answer all of these questions.
·
8.2. Introducing sgmllib.py
HTML processing is broken into three steps: breaking down the HTML into its constituent
pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step
is done by sgmllib.py, a part of the standard Python library.
·
8.3. Extracting data from HTML documents
To extract data from HTML documents, subclass the SGMLParser class and define methods
for each tag or entity you want to capture.
·
8.4. Introducing BaseHTMLProcessor.py
SGMLParser doesn't produce anything by itself. It parses and parses and parses, and it calls
a method for each interesting thing it finds, but the methods don't do anything. SGMLParser
is an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As
you saw in the previous section, you can subclass SGMLParser to define classes that catch
specific tags and produce useful things, like a list of all the links on a web page. Now you'll
take this one step further by defining a class that catches everything SGMLParser throws at
it and reconstructs the complete HTML document. In technical terms, this class will be an
HTML producer.
·
8.5. locals and globals
Let's digress from HTML processing for a minute and talk about how Python handles
variables. Python has two built-in functions, locals and globals, which provide
dictionary-based access to local and global variables.
·
8.6. Dictionary-based string formatting
There is an alternative form of string formatting that uses dictionaries instead of tuples of
values.
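In this form, each placeholder names a dictionary key in parentheses. The dictionary contents and the report function below are made up for the example:

```python
params = {"server": "mpilgrim", "uid": "sa"}

# %(key)s looks up key in the dictionary on the right-hand side.
message = "%(uid)s is connecting to %(server)s" % params

def report(name, count):
    # locals() returns a dictionary of the function's local variables,
    # so it can feed dictionary-based formatting directly.
    return "%(name)s has %(count)d items" % locals()

summary = report("inbox", 3)
```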
·
8.7. Quoting attribute values
A common question on comp.lang.python
(http://groups.google.com/groups?group=comp.lang.python) is "I have a bunch of HTML
documents with unquoted attribute values, and I want to properly quote them all. How can I
do this?" (This is generally precipitated by a project manager who has found the
HTML-is-a-standard religion joining a large project and proclaiming that all pages must
validate against an HTML validator. Unquoted attribute values are a common violation of the
HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding
HTML through BaseHTMLProcessor.
·
8.8. Introducing dialect.py
Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks
of text through a series of substitutions, but it makes sure that anything within a
<pre>...</pre> block passes through unaltered.
·
8.9. Putting it all together
It's time to put everything you've learned so far to good use. I hope you were paying attention.
·
8.10. Summary
Python provides you with a powerful tool, sgmllib.py, to manipulate HTML by turning
its structure into an object model. You can use this tool in many different ways.
·
Chapter 9. XML Processing
9.1. Diving in
There are two basic ways to work with XML. One is called SAX ("Simple API for XML"),
and it works by reading the XML a little bit at a time and calling a method for each element it
finds. (If you read Chapter 8, HTML Processing, this should sound familiar, because that's
how the sgmllib module works.) The other is called DOM ("Document Object Model"),
and it works by reading in the entire XML document at once and creating an internal
representation of it using native Python classes linked in a tree structure. Python has standard
modules for both kinds of parsing, but this chapter will only deal with using the DOM.
·
9.2. Packages
Actually parsing an XML document is very simple: one line of code. However, before you
get to that line of code, you need to take a short detour to talk about packages.
·
9.3. Parsing XML
As I was saying, actually parsing an XML document is very simple: one line of code. Where
you go from there is up to you.
·
9.4. Unicode
Unicode is a system to represent characters from all the world's different languages. When
Python parses an XML document, all data is stored in memory as unicode.
·
9.5. Searching for elements
Traversing XML documents by stepping through each node can be tedious. If you're looking
for something in particular, buried deep within your XML document, there is a shortcut you
can use to find it quickly: getElementsByTagName.
·
9.6. Accessing element attributes
XML elements can have one or more attributes, and it is incredibly simple to access them
once you have parsed an XML document.
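Searching for elements and reading their attributes can be sketched with the standard minidom module. The tiny XML document below is invented for the example:

```python
from xml.dom import minidom

# A made-up document with two ref elements, each holding p elements.
xml_text = """<grammar>
  <ref id="bit"><p>0</p><p>1</p></ref>
  <ref id="byte"><p>bits</p></ref>
</grammar>"""

doc = minidom.parseString(xml_text)

# Find every ref element anywhere in the document.
refs = doc.getElementsByTagName("ref")

# Read the id attribute of each ref element.
ref_ids = [ref.getAttribute("id") for ref in refs]

# The search can also start from a single element instead of the document.
all_p = doc.getElementsByTagName("p")
first_ref_p = refs[0].getElementsByTagName("p")
```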
·
9.7. Segue
OK, that's it for the hard-core XML stuff. The next chapter will continue to use these same
example programs, but focus on other aspects that make the program more flexible: using
streams for input processing, using getattr for method dispatching, and using
command-line flags to allow users to reconfigure the program without changing the code.
·
Chapter 10. Scripts and Streams
10.1. Abstracting input sources
One of Python's greatest strengths is its dynamic binding, and one powerful use of dynamic
binding is the file-like object.
·
10.2. Standard input, output, and error
UNIX users are already familiar with the concept of standard input, standard output, and
standard error. This section is for the rest of you.
·
10.3. Caching node lookups
kgp.py employs several tricks which may or may not be useful to you in your XML
processing. The first one takes advantage of the consistent structure of the input documents to
build a cache of nodes.
·
10.4. Finding direct children of a node
Another useful technique when parsing XML documents is finding all the direct child elements
of a particular element. For instance, in the grammar files, a ref element can have several p
elements, each of which can contain many things, including other p elements. You want to
find just the p elements that are children of the ref, not p elements that are children of other
p elements.
·
10.5. Creating separate handlers by node type
The third useful XML processing tip involves separating your code into logical functions,
based on node types and element names. Parsed XML documents are made up of various
types of nodes, each represented by a Python object. The root level of the document itself is
represented by a Document object. The Document then contains one or more Element
objects (for actual XML tags), each of which may contain other Element objects, Text
objects (for bits of text), or Comment objects (for embedded comments). Python makes it
easy to write a dispatcher to separate the logic for each node type.
·
10.6. Handling command-line arguments
Python fully supports creating programs that can be run on the command line, complete with
command-line arguments and either short- or long-style flags to specify various options.
None of this is XML-specific, but this script makes good use of command-line processing,
so it seemed like a good time to mention it.
·
10.7. Putting it all together
You've covered a lot of ground. Let's step back and see how all the pieces fit together.
·
10.8. Summary
Python comes with powerful libraries for parsing and manipulating XML documents. The
minidom takes an XML file and parses it into Python objects, providing for random access
to arbitrary elements. Furthermore, this chapter shows how Python can be used to create a
"real" standalone command-line script, complete with command-line flags, command-line
arguments, error handling, even the ability to take input from the piped result of a previous
program.
·
Chapter 11. HTTP Web Services
11.1. Diving in
You've learned about HTML processing and XML processing, and along the way you saw
how to download a web page and how to parse XML from a URL, but let's dive into the more
general topic of HTTP web services.
·
11.2. How not to fetch data over HTTP
Let's say you want to download a resource over HTTP, such as a syndicated Atom feed. But
you don't just want to download it once; you want to download it over and over again, every
hour, to get the latest news from the site that's offering the news feed. Let's do it the
quick-and-dirty way first, and then see how you can do better.
·
11.3. Features of HTTP
There are five important features of HTTP which you should support.
·
11.4. Debugging HTTP web services
First, let's turn on the debugging features of Python's HTTP library and see what's being sent
over the wire. This will be useful throughout the chapter, as you add more and more features.
·
11.5. Setting the User-Agent
The first step to improving your HTTP web services client is to identify yourself properly
with a User-Agent. To do that, you need to move beyond the basic urllib and dive into
urllib2.
·
11.6. Handling Last-Modified and ETag
Now that you know how to add custom HTTP headers to your web service requests, let's look
at adding support for Last-Modified and ETag headers.
·
11.7. Handling redirects
You can support permanent and temporary redirects using a different kind of custom URL
handler.
·
11.8. Handling compressed data
The last important HTTP feature you want to support is compression. Many web services
have the ability to send data compressed, which can cut down the amount of data sent over
the wire by 60% or more. This is especially true of XML web services, since XML data
compresses very well.
·
11.9. Putting it all together
You've seen all the pieces for building an intelligent HTTP web services client. Now let's see
how they all fit together.
·
11.10. Summary
The openanything.py module and its functions should now make perfect sense.
·
Chapter 12. SOAP Web Services
12.1. Diving In
You use Google, right? It's a popular search engine. Have you ever wished you could
programmatically access Google search results? Now you can. Here is a program to search
Google from Python.
·
12.2. Installing the SOAP Libraries
Unlike the other code in this book, this chapter relies on libraries that do not come
pre-installed with Python.
·
12.3. First Steps with SOAP
The heart of SOAP is the ability to call remote functions. There are a number of public access
SOAP servers that provide simple functions for demonstration purposes.
·
12.4. Debugging SOAP Web Services
The SOAP libraries provide an easy way to see what's going on behind the scenes.
·
12.5. Introducing WSDL
The SOAPProxy class proxies local method calls and transparently turns them into
invocations of remote SOAP methods. As you've seen, this is a lot of work, and SOAPProxy
does it quickly and transparently. What it doesn't do is provide any means of method
introspection.
·
12.6. Introspecting SOAP Web Services with WSDL
Like many things in the web services arena, WSDL has a long and checkered history, full of
political strife and intrigue. I will skip over this history entirely, since it bores me to tears.
There were other standards that tried to do similar things, but WSDL won, so let's learn how
to use it.
·
12.7. Searching Google
Let's finally turn to the sample code that you saw at the beginning of this chapter, which
does something more useful and exciting than get the current temperature.
·
12.8. Troubleshooting SOAP Web Services
Of course, the world of SOAP web services is not all happiness and light. Sometimes things
go wrong.
·
12.9. Summary
SOAP web services are very complicated. The specification is very ambitious and tries to
cover many different use cases for web services. This chapter has touched on some of the
simpler use cases.
·
Chapter 13. Unit Testing
13.1. Introduction to Roman numerals
In previous chapters, you "dived in" by immediately looking at code and trying to understand
it as quickly as possible. Now that you have some Python under your belt, you're going to
step back and look at the steps that happen before the code gets written.
·
13.2. Diving in
Now that you've completely defined the behavior you expect from your conversion functions,
you're going to do something a little unexpected: you're going to write a test suite that puts
these functions through their paces and makes sure that they behave the way you want them
to. You read that right: you're going to write code that tests code that you haven't written yet.
·
13.3. Introducing romantest.py
This is the complete test suite for your Roman numeral conversion functions, which are yet to
be written but will eventually be in roman.py. It is not immediately obvious how it all fits
together; none of these classes or methods reference any of the others. There are good reasons
for this, as you'll see shortly.
·
13.4. Testing for success
The most fundamental part of unit testing is constructing individual test cases. A test case
answers a single question about the code it is testing.
·
13.5. Testing for failure
It is not enough to test that functions succeed when given good input; you must also test that
they fail when given bad input. And not just any sort of failure; they must fail in the way you
expect.
·
13.6. Testing for sanity
Often, you will find that a unit of code contains a set of reciprocal functions, usually in the
form of conversion functions where one converts A to B and the other converts B to A. In
these cases, it is useful to create a "sanity check" to make sure that you can convert A to B
and back to A without losing precision, incurring rounding errors, or triggering any other sort
of bug.
·
Chapter 14. Test-First Programming
14.1. roman.py, stage 1
Now that the unit tests are complete, it's time to start writing the code that the test cases are
attempting to test. You're going to do this in stages, so you can see all the unit tests fail, then
watch them pass one by one as you fill in the gaps in roman.py.
·
14.2. roman.py, stage 2
Now that you have the framework of the roman module laid out, it's time to start writing
code and passing test cases.
·
14.3. roman.py, stage 3
Now that toRoman behaves correctly with good input (integers from 1 to 3999), it's time to
make it behave correctly with bad input (everything else).
·
14.4. roman.py, stage 4
Now that toRoman is done, it's time to start coding fromRoman. Thanks to the rich data
structure that maps individual Roman numerals to integer values, this is no more difficult than
the toRoman function.
·
14.5. roman.py, stage 5
Now that fromRoman works properly with good input, it's time to fit in the last piece of the
puzzle: making it work properly with bad input. That means finding a way to look at a string
and determine if it's a valid Roman numeral. This is inherently more difficult than validating
numeric input in toRoman, but you have a powerful tool at your disposal: regular
expressions.
·
Chapter 15. Refactoring
15.1. Handling bugs
Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by
"bug"? A bug is a test case you haven't written yet.
·
15.2. Handling changing requirements
Despite your best efforts to pin your customers to the ground and extract exact requirements
from them on pain of horrible nasty things involving scissors and hot wax, requirements will
change. Most customers don't know what they want until they see it, and even if they do, they
aren't that good at articulating what they want precisely enough to be useful. And even if they
do, they'll want more in the next release anyway. So be prepared to update your test cases as
requirements change.
·
15.3. Refactoring
The best thing about comprehensive unit testing is not the feeling you get when all your test
cases finally pass, or even the feeling you get when someone else blames you for breaking
their code and you can actually prove that you didn't. The best thing about unit testing is that
it gives you the freedom to refactor mercilessly.
·
15.4. Postscript
A clever reader read the previous section and took it to the next level. The biggest headache
(and performance drain) in the program as it is currently written is the regular expression,
which is required because you have no other way of breaking down a Roman numeral. But
there are only 5000 of them; why don't you just build a lookup table once, then simply read
that? This idea gets even better when you realize that you don't need to use regular
expressions at all. As you build the lookup table for converting integers to Roman numerals,
you can build the reverse lookup table to convert Roman numerals to integers.
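Here is a minimal sketch of the two-table idea. The upper bound of 4999, the numeral map, and every name in it are assumptions made for illustration; this is not the book's actual listing.

```python
# Numeral/value pairs in descending order (assumed for this sketch).
ROMAN_NUMERAL_MAP = (
    ("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
    ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
    ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1),
)
MAX_ROMAN_NUMERAL = 4999  # assumed upper bound

def _convert(n):
    """Build one numeral the slow way; used only while filling the tables."""
    result = ""
    for numeral, value in ROMAN_NUMERAL_MAP:
        while n >= value:
            result += numeral
            n -= value
    return result

# Fill both tables in a single pass.
to_roman_table = [None]   # index 0 is unused; there is no Roman zero
from_roman_table = {}
for i in range(1, MAX_ROMAN_NUMERAL + 1):
    numeral = _convert(i)
    to_roman_table.append(numeral)
    from_roman_table[numeral] = i

def to_roman(n):
    """Integer -> Roman numeral, by table lookup."""
    if not isinstance(n, int) or not 0 < n <= MAX_ROMAN_NUMERAL:
        raise ValueError("number out of range")
    return to_roman_table[n]

def from_roman(s):
    """Roman numeral -> integer; anything not in the table is invalid."""
    if s not in from_roman_table:
        raise ValueError("invalid Roman numeral: %r" % (s,))
    return from_roman_table[s]
```

Validation comes for free: a string is a valid Roman numeral exactly when it is a key in the reverse table, so no regular expression is needed.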
·
15.5. Summary
Unit testing is a powerful concept which, if properly implemented, can both reduce
maintenance costs and increase flexibility in any long-term project. It is also important to
understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet.
Writing good test cases is hard, and keeping them up to date takes discipline (especially when
customers are screaming for critical bug fixes). Unit testing is not a replacement for other
forms of testing, including functional testing, integration testing, and user acceptance testing.
But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever
got along without it.
·
Chapter 16. Functional Programming
16.1. Diving in
In Chapter 13, Unit Testing, you learned about the philosophy of unit testing. In Chapter 14,
Test-First Programming, you stepped through the implementation of basic unit tests in
Python. In Chapter 15, Refactoring, you saw how unit testing makes large-scale refactoring
easier. This chapter will build on those sample programs, but here we will focus more on
advanced Python-specific techniques, rather than on unit testing itself.
·
16.2. Finding the path
When running Python scripts from the command line, it is sometimes useful to know where
the currently running script is located on disk.
·
16.3. Filtering lists revisited
You're already familiar with using list comprehensions to filter lists. There is another way to
accomplish this same thing, which some people feel is more expressive.
·
16.4. Mapping lists revisited
You're already familiar with using list comprehensions to map one list into another. There is
another way to accomplish the same thing, using the built-in map function. It works much
the same way as the filter function.
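A short side-by-side comparison may help. The helper functions is_odd and double are made up for this sketch. Note that in Python 3, filter and map return iterators rather than lists, so the example wraps them in list().

```python
numbers = [1, 2, 3, 4, 5, 6]

def is_odd(n):
    """Hypothetical predicate used for filtering."""
    return n % 2 == 1

def double(n):
    """Hypothetical transformation used for mapping."""
    return n * 2

# Filtering: list comprehension versus the built-in filter function.
odds_comprehension = [n for n in numbers if is_odd(n)]
odds_filter = list(filter(is_odd, numbers))

# Mapping: list comprehension versus the built-in map function.
doubled_comprehension = [double(n) for n in numbers]
doubled_map = list(map(double, numbers))
```

Both spellings produce the same lists; which one reads better is largely a matter of taste.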
·
16.5. Data-centric programming
By now you're probably scratching your head wondering why this is better than using for
loops and straight function calls. And that's a perfectly valid question. Mostly, it's a matter of
perspective. Using map and filter forces you to center your thinking around your data.
·
16.6. Dynamically importing modules
OK, enough philosophizing. Let's talk about dynamically importing modules.
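As a rough sketch of the idea, a module can be imported from a name held in a string. The module names below are arbitrary standard library modules chosen for illustration. The built-in __import__ function does the job, and newer Pythons also offer importlib.import_module.

```python
import importlib

# Module names known only at runtime, as plain strings.
module_names = ["os", "sys", "re"]

# importlib.import_module takes a name and returns the module object.
modules = [importlib.import_module(name) for name in module_names]

# The built-in __import__ function does the same for a top-level module.
json_module = __import__("json")

for module in modules:
    print(module.__name__)
```

Once imported this way, the module objects can be stored in lists, passed to map and filter, and otherwise treated like any other value.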
·
16.7. Putting it all together
You've learned enough now to deconstruct the first seven lines of this chapter's code sample:
reading a directory and importing selected modules within it.
·
16.8. Summary
The regression.py program and its output should now make perfect sense.
·
Chapter 17. Dynamic functions
17.1. Diving in
I want to talk about plural nouns. Also, functions that return other functions, advanced regular
expressions, and generators. Generators are new in Python 2.3. But first, let's talk about how
to make plural nouns.
·
17.2. plural.py, stage 1
So you're looking at words, which at least in English are strings of characters. And you have
rules that say you need to find different combinations of characters, and then do different
things to them. This sounds like a job for regular expressions.
·
17.3. plural.py, stage 2
Now you're going to add a level of abstraction. You started by defining a list of rules: if this,
then do that, otherwise go to the next rule. Let's temporarily complicate part of the program
so you can simplify another part.
·
17.4. plural.py, stage 3
Defining separate named functions for each match and apply rule isn't really necessary. You
never call them directly; you define them in the rules list and call them through there. Let's
streamline the rules definition by anonymizing those functions.
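The shape of such a rules list might look like the sketch below. The specific patterns are simplified assumptions that cover only a few English nouns; they are not meant as a complete set of pluralization rules.

```python
import re

# Each rule is a (match, apply) pair of anonymous functions.
rules = [
    (lambda word: re.search("[sxz]$", word),
     lambda word: word + "es"),
    (lambda word: re.search("[^aeioudgkprt]h$", word),
     lambda word: word + "es"),
    (lambda word: re.search("[^aeiou]y$", word),
     lambda word: word[:-1] + "ies"),
    (lambda word: True,                      # fallback: always matches
     lambda word: word + "s"),
]

def plural(noun):
    """Apply the first rule whose match function accepts the noun."""
    for match_rule, apply_rule in rules:
        if match_rule(noun):
            return apply_rule(noun)

for noun in ("box", "coach", "vacancy", "day"):
    print(noun, "->", plural(noun))
```

The functions never get names of their own; the only way to reach them is through the rules list, which is exactly how they are used.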
·
17.5. plural.py, stage 4
Let's factor out the duplication in the code so that defining new rules can be easier.
·
17.6. plural.py, stage 5
You've factored out all the duplicate code and added enough abstractions so that the
pluralization rules are defined in a list of strings. The next logical step is to take these strings
and put them in a separate file, where they can be maintained separately from the code that
uses them.
·
17.7. plural.py, stage 6
Now you're ready to talk about generators.
·
17.8. Summary
You talked about several different advanced techniques in this chapter. Not all of them are
appropriate for every situation.
·
Chapter 18. Performance Tuning
18.1. Diving in
There are so many pitfalls involved in optimizing your code, it's hard to know where to start.
·
18.2. Using the timeit Module
The most important thing you need to know about optimizing Python code is that you
shouldn't write your own timing function.
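Instead, let the timeit module do the measuring. This is a minimal sketch; the statement being timed is an arbitrary placeholder, and the repeat and number values are picked only to keep the run short.

```python
import timeit

# The statement to time is passed as a string (a placeholder in this sketch).
timer = timeit.Timer(stmt="'-'.join(str(n) for n in range(100))")

# Run the statement 1000 times per trial, for 3 trials.
timings = timer.repeat(repeat=3, number=1000)

# The minimum is usually the most meaningful figure: the other trials
# were slowed down by whatever else the machine was doing.
best = min(timings)
print("best of 3: %.6f seconds per 1000 runs" % best)
```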
·
18.3. Optimizing Regular Expressions
The first thing the Soundex function checks is whether the input is a non-empty string of
letters. What's the best way to do this?
·
18.4. Optimizing Dictionary Lookups
The second step of the Soundex algorithm is to convert characters to digits in a specific
pattern. What's the best way to do this?
·
18.5. Optimizing List Operations
The third step in the Soundex algorithm is eliminating consecutive duplicate digits. What's
the best way to do this?
·
18.6. Optimizing String Manipulation
The final step of the Soundex algorithm is padding short results with zeros, and truncating
long results. What is the best way to do this?
·
18.7. Summary
This chapter has illustrated several important aspects of performance tuning in Python, and
performance tuning in general.
·


Appendix C. Tips and tricks
Chapter 1. Installing Python
Chapter 2. Your First Python Program
2.1. Diving in
In the ActivePython IDE on Windows, you can run the Python program you're editing by choosing
File->Run... (Ctrl-R). Output is displayed in the interactive window.
In the Python IDE on Mac OS, you can run a Python program with Python->Run window... (Cmd-R), but
there is an important option you must set first. Open the .py file in the IDE, pop up the options menu by
clicking the black triangle in the upper-right corner of the window, and make sure the Run as __main__
option is checked. This is a per-file setting, but you'll only need to do it once per file.
On UNIX-compatible systems (including Mac OS X), you can run a Python program from the command
line: python odbchelper.py
·
2.2. Declaring Functions
In Visual Basic, functions (that return a value) start with function, and subroutines (that do not return a
value) start with sub. There are no subroutines in Python. Everything is a function, all functions return a
value (even if it's None), and all functions start with def.
In Java, C++, and other statically-typed languages, you must specify the datatype of the function return
value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on
what value you assign, Python keeps track of the datatype internally.
·
2.3. Documenting Functions
Triple quotes are also an easy way to define a string with both single and double quotes, like qq/.../ in
Perl.
Many Python IDEs use the doc string to provide context-sensitive documentation, so that when you
type a function name, its doc string appears as a tooltip. This can be incredibly helpful, but it's only as
good as the doc strings you write.
·
2.4. Everything Is an Object
import in Python is like require in Perl. Once you import a Python module, you access its functions
with module.function; once you require a Perl module, you access its functions with
module::function.
·
2.5. Indenting Code
Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++
and Java use semicolons to separate statements and curly braces to separate code blocks.
·
2.6. Testing Modules
Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not support in-line
assignment, so there's no chance of accidentally assigning the value you thought you were comparing.
On MacPython, there is an additional step to make the if __name__ trick work. Pop up the module's
options menu by clicking the black triangle in the upper-right corner of the window, and make sure Run as
__main__ is checked.
·
Chapter 3. Native Datatypes
3.1. Introducing Dictionaries
A dictionary in Python is like a hash in Perl. In Perl, variables that store hashes always start with a %
character. In Python, variables can be named anything, and Python keeps track of the datatype internally.
A dictionary in Python is like an instance of the Hashtable class in Java.
A dictionary in Python is like an instance of the Scripting.Dictionary object in Visual Basic.
·
3.1.2. Modifying Dictionaries
Dictionaries have no concept of order among elements. It is incorrect to say that the elements are "out of
order"; they are simply unordered. This is an important distinction that will annoy you when you want to
access the elements of a dictionary in a specific, repeatable order (like alphabetical order by key). There are
ways of doing this, but they're not built into the dictionary.
·
3.2. Introducing Lists
A list in Python is like an array in Perl. In Perl, variables that store arrays always start with the @ character;
in Python, variables can be named anything, and Python keeps track of the datatype internally.
A list in Python is much more than an array in Java (although it can be used as one if that's really all you
want out of life). A better analogy would be to the ArrayList class, which can hold arbitrary objects and
can expand dynamically as new items are added.
·
3.2.3. Searching Lists
Before version 2.2.1, Python had no separate boolean datatype. To compensate for this, Python accepted
almost anything in a boolean context (like an if statement), according to the following rules:
¨ 0 is false; all other numbers are true.
¨ An empty string ("") is false; all other strings are true.
¨ An empty list ([]) is false; all other lists are true.
¨ An empty tuple (()) is false; all other tuples are true.
¨ An empty dictionary ({}) is false; all other dictionaries are true.
These rules still apply in Python 2.2.1 and beyond, but now you can also use an actual boolean, which has a
value of True or False. Note the capitalization; these values, like everything else in Python, are
case-sensitive.
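These rules are easy to check for yourself. The sample values below are arbitrary; the built-in bool function shows how each one is treated in a boolean context.

```python
# Values that count as false in a boolean context.
falsy_values = [0, "", [], (), {}]

# Non-zero numbers and non-empty containers count as true.
truthy_values = [1, -1, "a", [0], (0,), {"key": 0}]

for value in falsy_values + truthy_values:
    print(repr(value), "->", bool(value))
```

Note that [0] is true even though its only element is false: the rule looks at whether the container is empty, not at what it contains.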
·
3.3. Introducing Tuples
Tuples can be converted into lists, and vice-versa. The built-in tuple function takes a list and returns a
tuple with the same elements, and the list function takes a tuple and returns a list. In effect, tuple
freezes a list, and list thaws a tuple.
·
3.4. Declaring variables
When a command is split among several lines with the line-continuation marker ("\"), the continued lines
can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python
IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason
not to.
·
3.5. Formatting Strings
String formatting in Python uses the same syntax as the sprintf function in C.
·
3.7. Joining Lists and Splitting Strings
join works only on lists of strings; it does not do any type coercion. Joining a list that has one or more
non-string elements will raise an exception.
anystring.split(delimiter, 1) is a useful technique when you want to search a string for a
substring and then work with everything before the substring (which ends up in the first element of the
returned list) and everything after it (which ends up in the second element).
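Both points can be seen in a few lines. The dictionary contents and the sample strings below are made up for illustration.

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa"}

# join works because every element of the list is a string.
pairs = ["%s=%s" % (key, value) for key, value in params.items()]
joined = ";".join(pairs)

# join does no type coercion: one non-string element raises TypeError.
join_error = None
try:
    ";".join(["a", 1])
except TypeError as exc:
    join_error = exc

# split(delimiter, 1) splits on the first delimiter only, giving
# everything before it and everything after it.
before, after = "key=value=with=equals".split("=", 1)
print(joined)
print(before, "|", after)
```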
·
Chapter 4. The Power Of Introspection
4.2. Using Optional and Named Arguments
The only thing you need to do to call a function is specify a value (somehow) for each required argument; the
manner and order in which you do that is up to you.
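For instance, take a function with one required argument and two optional ones. The function describe and its arguments are hypothetical, invented for this sketch.

```python
def describe(obj, spacing=10, collapse=True):
    """Hypothetical function: obj is required, the other two are optional."""
    return (obj, spacing, collapse)

calls = [
    describe("odbchelper"),                  # required argument only
    describe("odbchelper", 12),              # second argument by position
    describe("odbchelper", collapse=False),  # skip spacing, name collapse
    describe(spacing=15, obj="odbchelper"),  # all named, in any order
]

for result in calls:
    print(result)
```

Every one of these calls is legal, because each supplies a value for obj one way or another.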
·
4.3.3. Built-In Functions
Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules
Python has to offer. But unlike most languages, where you would find yourself referring back to the manuals
or man pages to remind yourself how to use these modules, Python is largely self-documenting.
·
4.7. Using lambda Functions
lambda functions are a matter of style. Using them is never required; anywhere you could use them, you
could define a separate normal function and use that instead. I use them in places where I want to encapsulate
specific, non-reusable code without littering my code with a lot of little one-line functions.
·
4.8. Putting It All Together
In SQL, you must use IS NULL instead of = NULL to compare a null value. In Python, you can use either
== None or is None, but is None is faster.
·
Chapter 5. Objects and Object-Orientation
5.2. Importing Modules Using from module import
from module import * in Python is like use module in Perl; import module in Python is like
require module in Perl.
from module import * in Python is like import module.* in Java; import module in Python
is like import module in Java.
Use from module import * sparingly, because it makes it difficult to determine where a particular
function or attribute came from, and that makes debugging and refactoring more difficult.
·
5.3. Defining Classes
The pass statement in Python is like an empty set of braces ({}) in Java or C.
In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is
no special keyword like extends in Java.
·
5.3.1. Initializing and Coding Classes
By convention, the first argument of any Python class method (the reference to the current instance) is called
self. This argument fills the role of the reserved word this in C++ or Java, but self is not a reserved
word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a
very strong convention.
·
5.3.2. Knowing When to Use self and __init__
__init__ methods are optional, but when you define one, you must remember to explicitly call the
ancestor's __init__ method (if it defines one). This is more generally true: whenever a descendant wants
to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the
proper time, with the proper arguments.
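A small sketch of the pattern follows. The class names Record and TimestampedRecord are hypothetical, chosen only to show an ancestor and a descendant that both define __init__.

```python
class Record:
    """Hypothetical ancestor class."""
    def __init__(self, name):
        self.name = name

class TimestampedRecord(Record):
    """Hypothetical descendant that extends the ancestor's __init__."""
    def __init__(self, name, timestamp):
        # Python does not call the ancestor's __init__ for you;
        # the descendant has to do it explicitly.
        Record.__init__(self, name)
        self.timestamp = timestamp

record = TimestampedRecord("backup", 1085011200)
print(record.name, record.timestamp)
```

If the explicit call were left out, record.name would never be set, and reading it would raise an AttributeError.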
·
5.4. Instantiating Classes
In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit
new operator like C++ or Java.
·
5.5. Exploring UserDict: A Wrapper Class
In the ActivePython IDE on Windows, you can quickly open any module in your library path by selecting
File->Locate... (Ctrl-L).
Java and Powerbuilder support function overloading by argument list, i.e. one class can have multiple
methods with the same name but a different number of arguments, or arguments of different types. Other
languages (most notably PL/SQL) even support function overloading by argument name; i.e. one class can
have multiple methods with the same name and the same number of arguments of the same type but different
argument names. Python supports neither of these; it has no form of function overloading whatsoever.
Methods are defined solely by their name, and there can be only one method per class with a given name. So
if a descendant class has an __init__ method, it always overrides the ancestor __init__ method, even
if the descendant defines it with a different argument list. And the same rule applies to any other method.
Guido, the original author of Python, explains method overriding this way: "Derived classes may override
methods of their base classes. Because methods have no special privileges when calling other methods of the
same object, a method of a base class that calls another method defined in the same base class, may in fact
end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python
are effectively virtual.)" If that doesn't make sense to you (it confuses the hell out of me), feel free to ignore
it. I just thought I'd pass it along.
Always assign an initial value to all of an instance's data attributes in the __init__ method. It will save
you hours of debugging later, tracking down AttributeError exceptions because you're referencing
uninitialized (and therefore non-existent) attributes.
In versions of Python prior to 2.2, you could not directly subclass built-in datatypes like strings, lists, and
dictionaries. To compensate for this, Python comes with wrapper classes that mimic the behavior of these
built-in datatypes: UserString, UserList, and UserDict. Using a combination of normal and special
methods, the UserDict class does an excellent imitation of a dictionary. In Python 2.2 and later, you can
inherit classes directly from built-in datatypes like dict. An example of this is given in the examples that
come with this book, in fileinfo_fromdict.py.
·
5.6.1. Getting and Setting Items
When accessing data attributes within a class, you need to qualify the attribute name: self.attribute.
When calling other methods within a class, you need to qualify the method name: self.method.
·
5.7. Advanced Special Class Methods
In Java, you determine whether two string variables reference the same physical memory location by using
str1 == str2. This is called object identity, and it is written in Python as str1 is str2. To compare
string values in Java, you would use str1.equals(str2); in Python, you would use str1 == str2.
Java programmers who have been taught to believe that the world is a better place because == in Java
compares by identity instead of by value may have a difficult time adjusting to Python's lack of such
"gotchas".
While other object-oriented languages only let you define the physical model of an object ("this object has a
GetLength method"), Python's special class methods like __len__ allow you to define the logical model
of an object ("this object has a length").
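Both ideas fit in one small class. Playlist is a hypothetical class written for this sketch; it uses __eq__ to define equality by value and __len__ to give the object a length.

```python
class Playlist:
    """Hypothetical class with a logical length and value equality."""
    def __init__(self, tracks):
        self.tracks = list(tracks)

    def __len__(self):
        # Called by the built-in len function.
        return len(self.tracks)

    def __eq__(self, other):
        # Called by the == operator: compare by value.
        return isinstance(other, Playlist) and self.tracks == other.tracks

a = Playlist(["intro", "outro"])
b = Playlist(["intro", "outro"])

print(a == b)   # same value
print(a is b)   # not the same object
print(len(a))   # the playlist "has a length"
```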
·
5.8. Introducing Class Attributes
In Java, both static variables (called class attributes in Python) and instance variables (called data attributes
in Python) are defined immediately after the class definition (one with the static keyword, one without).
In Python, only class attributes can be defined here; data attributes are defined in the __init__ method.
There are no constants in Python. Everything can be changed if you try hard enough. This fits with one of the
core principles of Python: bad behavior should be discouraged but not banned. If you really want to change
the value of None, you can do it, but don't come running to me when your code is impossible to debug.
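The difference between the two kinds of attributes shows up as soon as you create more than one instance. The Counter class below is a sketch written for illustration.

```python
class Counter:
    count = 0                      # class attribute, shared by all instances

    def __init__(self, label):
        self.label = label         # data attribute, one per instance
        self.__class__.count += 1  # update the shared class attribute

first = Counter("first")
second = Counter("second")

print(Counter.count)               # visible on the class itself
print(first.count, second.count)   # and through every instance
print(first.label, second.label)   # data attributes differ per instance
```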
·
5.9. Private Functions
In Python, all special methods (like __setitem__) and built-in attributes (like __doc__) follow a
standard naming convention: they both start with and end with two underscores. Don't name your own
methods and attributes this way, because it will only confuse you (and others) later.
·
Chapter 6. Exceptions and File Handling
6.1. Handling Exceptions
Python uses try...except to handle exceptions and raise to generate them. Java and C++ use
try...catch to handle exceptions, and throw to generate them.
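Here is a brief sketch of both keywords at work. The function parse_port and its range check are hypothetical, made up for this example.

```python
def parse_port(text):
    """Hypothetical helper: convert text to a TCP port number."""
    port = int(text)               # int raises ValueError on bad input
    if not 0 < port < 65536:
        # raise generates an exception of your own.
        raise ValueError("port out of range: %d" % port)
    return port

results = []
for candidate in ("8080", "http", "70000"):
    try:
        results.append(parse_port(candidate))
    except ValueError as exc:
        # except handles the exception instead of letting it propagate.
        results.append("error: %s" % exc)

for line in results:
    print(line)
```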
·
6.5. Working with Directories
Whenever possible, you should use the functions in os and os.path for file, directory, and path
manipulations. These modules are wrappers for platform-specific modules, so functions like
os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.
·
Chapter 7. Regular Expressions
7.4. Using the {n,m} Syntax
There is no way to programmatically determine that two regular expressions are equivalent. The best you can
do is write a lot of test cases to make sure they behave the same way on all relevant inputs. You'll talk more
about writing test cases later in this book.
·
Chapter 8. HTML Processing
8.2. Introducing sgmllib.py
Python 2.0 had a bug where SGMLParser would not recognize declarations at all (handle_decl would
never be called), which meant that DOCTYPEs were silently ignored. This is fixed in Python 2.1.
In the ActivePython IDE on Windows, you can specify command line arguments in the "Run script" dialog.
Separate multiple arguments with spaces.
·
8.4. Introducing BaseHTMLProcessor.py
The HTML specification requires that all non-HTML (like client-side JavaScript) must be enclosed in
HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they
don't). BaseHTMLProcessor is not forgiving; if script is improperly embedded, it will be parsed as if it
were HTML. For instance, if the script contains less-than and equals signs, SGMLParser may incorrectly
think that it has found tags and attributes. SGMLParser always converts tags and attribute names to
lowercase, which may break the script, and BaseHTMLProcessor always encloses attribute values in
double quotes (even if the original HTML document used single quotes or no quotes), which will certainly
break the script. Always protect your client-side script within HTML comments.
·
8.5. locals and globals
Python 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes.
In versions of Python prior to 2.2, when you reference a variable within a nested function or lambda
function, Python will search for that variable in the current (nested or lambda) function's namespace, then
in the module's namespace. Python 2.2 will search for the variable in the current (nested or lambda)
function's namespace, then in the parent function's namespace, then in the module's namespace. Python 2.1
can work either way; by default, it works like Python 2.0, but you can add the following line of code at the
top of your module to make your module work like Python 2.2:
from __future__ import nested_scopes
Using the locals and globals functions, you can get the value of arbitrary variables dynamically,
providing the variable name as a string. This mirrors the functionality of the getattr function, which
allows you to access arbitrary functions dynamically by providing the function name as a string.
·
8.6. Dictionary-based string formatting
Using dictionary-based string formatting with locals is a convenient way of making complex string
formatting expressions more readable, but it comes with a price. There is a slight performance hit in making
the call to locals, since locals builds a copy of the local namespace.
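A quick sketch shows both forms: formatting from an ordinary dictionary, and formatting from locals. The function describe and the sample values are made up for illustration.

```python
# Formatting from an ordinary dictionary: names in the format string
# are looked up as keys.
params = {"server": "mpilgrim", "uid": "sa"}
connection = "%(uid)s@%(server)s" % params

def describe(name, version):
    """Hypothetical function that formats its own local variables."""
    # locals() returns a dictionary of the function's local variables,
    # so the format string can refer to them by name.
    return "%(name)s version %(version)s" % locals()

print(connection)
print(describe("sgmllib", "1.0"))
```

The second form is more readable than a long tuple of positional values, at the cost of the extra call to locals.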
·
Chapter 9. XML Processing
9.2. Packages
A package is a directory with the special __init__.py file in it. The __init__.py file defines the
attributes and methods of the package. It doesn't need to define anything; it can just be an empty file, but it
has to exist. But if __init__.py doesn't exist, the directory is just a directory, not a package, and it can't
be imported or contain modules or nested packages.
·
9.6. Accessing element attributes
This section may be a little confusing, because of some overlapping terminology. Elements in an XML
document have attributes, and Python objects also have attributes. When you parse an XML document, you
get a bunch of Python objects that represent all the pieces of the XML document, and some of these Python
objects represent attributes of the XML elements. But the (Python) objects that represent the (XML)
attributes also have (Python) attributes, which are used to access various parts of the (XML) attribute that the
object represents. I told you it was confusing. I am open to suggestions on how to distinguish these more
clearly.
Like a dictionary, attributes of an XML element have no ordering. Attributes may happen to be listed in a
certain order in the original XML document, and the Attr objects may happen to be listed in a certain order
when the XML document is parsed into Python objects, but these orders are arbitrary and should carry no
special meaning. You should always access individual attributes by name, like the keys of a dictionary.
·
Chapter 10. Scripts and Streams
Chapter 11. HTTP Web Services
11.6. Handling Last-Modified and ETag
In these examples, the HTTP server has supported both Last-Modified and ETag headers, but not all
servers do. As a web services client, you should be prepared to support both, but you must code defensively
in case a server only supports one or the other, or neither.
·
Chapter 12. SOAP Web Services
Chapter 13. Unit Testing
13.2. Diving in
unittest is included with Python 2.1 and later. Python 2.0 users can download it from
pyunit.sourceforge.net (http://pyunit.sourceforge.net/).
·
Chapter 14. Test-First Programming
14.3. roman.py, stage 3
The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the
unit tests for a function pass, stop coding the function. When all the unit tests for an entire module pass, stop
coding the module.
·
14.5. roman.py, stage 5
When all of your tests pass, stop coding.
·
Chapter 15. Refactoring
15.3. Refactoring
Whenever you are going to use a regular expression more than once, you should compile it to get a pattern
object, then call the methods on the pattern object directly.
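In practice that looks something like the sketch below. The pattern is an assumed Roman numeral validator written for illustration, and is_valid_roman is a hypothetical helper name.

```python
import re

# Compile once, at import time, and keep the pattern object around.
roman_pattern = re.compile(r"""
    ^                   # beginning of string
    M{0,4}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    """, re.VERBOSE)

def is_valid_roman(s):
    """Hypothetical helper: call search on the compiled pattern object."""
    # The pattern also matches the empty string, so rule that out first.
    return bool(s) and roman_pattern.search(s) is not None

for candidate in ("MCMXCIV", "IIII", ""):
    print(repr(candidate), is_valid_roman(candidate))
```

The pattern is parsed only once, no matter how many times is_valid_roman is called.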
·
Chapter 16. Functional Programming
16.2. Finding the path
The pathnames and filenames you pass to os.path.abspath do not need to exist.
os.path.abspath not only constructs full path names, it also normalizes them. That means that if you
are in the /usr/ directory, os.path.abspath('bin/../local/bin') will return
/usr/local/bin. It normalizes the path by making it as simple as possible. If you just want to normalize
a pathname like this without turning it into a full pathname, use os.path.normpath instead.
Like the other functions in the os and os.path modules, os.path.abspath is cross-platform. Your
results will look slightly different than my examples if you're running on Windows (which uses backslash as
a path separator) or Mac OS (which uses colons), but they'll still work. That's the whole point of the os
module.
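The difference between the two functions is easy to see side by side. The pathnames below are made up and do not need to exist; the exact output depends on your platform and current directory.

```python
import os.path

relative = "bin/../local/bin"

# normpath only simplifies the path; it stays relative.
normalized = os.path.normpath(relative)

# abspath simplifies the path and also makes it absolute, based on the
# current working directory.
absolute = os.path.abspath(relative)

# Neither function checks whether the path actually exists.
missing = os.path.abspath("no/such/directory/file.txt")

print(normalized)
print(absolute)
print(missing)
```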
·
Chapter 17. Dynamic functions
Chapter 18. Performance Tuning
18.2. Using the timeit Module
You can use the timeit module on the command line to test an existing Python program, without
modifying the code. See http://docs.python.org/lib/node396.html for documentation on the command-line
flags.
The timeit module only works if you already know what piece of code you need to optimize. If you have
a larger Python program and don't know where your performance problems are, check out the hotshot
module. (http://docs.python.org/lib/module-hotshot.html)
