Django Master Class

The point of this tutorial is to introduce you to the features of Django that the experts know. The three of us have each picked three “tricks of the trade” to share with you.

1

So here’s what we’ve got on the plate:

  1. Unit testing (Simon). First because it’s important, dammit!
  2. Stupid middleware tricks (Jacob). Make middleware work for you, not against you.
  3. Signals (Jeremy). Get notified when important things happen.
  4. Forms & AJAX (Simon). Django’s newforms library rocks, and it goes great with this whole “AJAX” thing. Now you can finally show your face in those cool “Web 2.0” cliques.
  5. Template tag patterns (Jacob). Save time writing those repetitive tags by factoring out common tasks.
  6. Custom fields (Jeremy). Because not every piece of data is a simple, primitive type.
  7. OpenID (Simon). Learn the straight dope about OpenID, and see how to integrate it into your Django site.
  8. The “rest” of the stack (Jacob). Also known as “how to scale your website by copying LiveJournal.”
  9. GIS (Jeremy). Store data about our planet. Or another one.

2

First up: testing. If you pay attention to only one part of this tutorial, make it this one.

3

Test Driven Development By Example (by Kent Beck) is the bible here.

If you have the discipline for it, this is a really rewarding way of programming. It works particularly well if you are pair programming with someone who can keep you on the straight and narrow.

4

The other end of the spectrum. Write tests only when you need them. This is a really great way to tackle tricky bugs, where the hardest problem is often replicating them. Replicate them with a test, then solve them. The test will guarantee they don't come back to haunt you again later.

5

I speak from experience here. I had a project with a beautiful test suite. I let things lapse while dashing for a deadline. I still haven’t got all the tests working again, which discourages me from running the tests at all, which massively devalues the test suite.

6

Until I saw Ruby on Rails, I had basically resigned myself to the idea that testing web apps was too hard to be worth doing, thanks to the difficulties involved in testing something with external persistent state (a database) and most interactions happening over HTTP.

Rails used fixtures to tackle the database testing problem, and included a bunch of clever hooks for making everything else easy. Django has since evolved a similar set of features, albeit with a distinctly Pythonic flavour.

7

Doctests are used extensively by Django itself for unit testing the ORM - they have the nice side-effect of doubling as documentation, automatically generated for the website:

  • http://www.djangoproject.com/documentation/models/

You are encouraged to use them for testing your own models as well; Django's built-in test runner will detect and execute them.

8
9
10
11

A naive approach.

12

It doesn't work for the edge cases. That's why tests should always target the edge cases. You can try using 365.25 instead, but it still won't pass every test.

13

This passes all the tests. I'm ashamed to admit how long it took me to get here; the tests were invaluable.
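
The slide code is not reproduced in these notes. As a hedged sketch, assuming the function under test computes a person's age in whole years (the names naive_age and age_in_years are illustrative, not taken from the slides), the gap between counting days and comparing calendar fields looks roughly like this:

```python
from datetime import date


def naive_age(born, today):
    """Naive approach: count the days and divide by a fixed year length."""
    return (today - born).days // 365


def age_in_years(born, today):
    """Compare calendar fields instead of counting days.

    >>> age_in_years(date(1980, 6, 15), date(2007, 6, 14))
    26
    >>> age_in_years(date(1980, 6, 15), date(2007, 6, 15))
    27
    """
    years = today.year - born.year
    # Take a year off if this year's birthday has not happened yet.
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years
```

The day before that 27th birthday, naive_age already answers 27, because the leap days accumulated since 1980 push the day count past 27 * 365. Dividing by 365.25 fixes that case but breaks others: someone born on 1 March 2000 is exactly one year old 365 days later, and 365 / 365.25 rounds down to zero.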

14
15
16

http://www.kottke.org/04/10/normalized-data discusses the quote in more detail. Cal is the lead engineer on Flickr, and knows exactly what it takes to build a system that scales to millions of users. Denormalisation is an excellent way to speed up your queries - at a cost of added complexity in your application code. It's an ideal case study for unit testing.

17

Many online forums have a view which shows the most recent 20 or so threads along with a count of the number of replies to each. This can be a pretty expensive SQL query, and can be dramatically sped up by denormalising the data.

18

Here, num_replies is the denormalised field. It stores the number of replies that are attached to that thread - information that already exists in the database (and is now stored twice).

19

Fixtures provide a way of pre-populating a database with test data - great for writing tests against. These fixtures are saved in a file called forum/fixtures/threadcount.json. You can easily generate your own fixtures using this command:

./manage.py dumpdata

To pretty-print the JSON, use this:

./manage.py dumpdata --indent 2

And for XML instead, do this:

./manage.py dumpdata --indent 2 --format xml

If you’ve got PyYAML installed, you can also use --format yaml.

20

This test case clears the database and loads our threadcount.json fixtures before each test. It contains two tests: one that adds a new reply to a thread, and another that deletes a reply. The tests check that num_replies accurately reflects the number of replies associated with a thread.

21

Our tests fail, because we don't have a mechanism for keeping the counter in sync with the actual data yet.

22

By overriding the save and delete methods on the Reply model, we can update the num_replies of the parent thread when a reply is added or deleted.
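
As a framework-free sketch of that idea (plain Python classes standing in for the Django models; the replies list is a hypothetical substitute for the reply table, and only the names Reply and num_replies come from the slides):

```python
class Thread:
    def __init__(self, title):
        self.title = title
        self.replies = []      # stand-in for the rows in the reply table
        self.num_replies = 0   # denormalised copy of len(self.replies)


class Reply:
    def __init__(self, thread, body):
        self.thread = thread
        self.body = body

    def save(self):
        # Store the reply, then bring the parent's counter back in sync.
        if self not in self.thread.replies:
            self.thread.replies.append(self)
        self.thread.num_replies = len(self.thread.replies)

    def delete(self):
        # Remove the reply, then bring the parent's counter back in sync.
        if self in self.thread.replies:
            self.thread.replies.remove(self)
        self.thread.num_replies = len(self.thread.replies)
```

Recounting on every change, rather than adding or subtracting one, means that saving the same reply twice cannot inflate the counter.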

23

The tests pass! It says 7 because I ran the test runner against a project, which included a couple of other simple applications.

24

Bonus slide: here's an alternative way of solving the denormalised counter problem, this time using signals instead of custom delete() and save() methods.

If you’ve not yet learned about signals, don’t fret; Jeremy is going to cover them in a little bit. Mark this slide and come back to it then...

25
26

Two tests here. The first simply checks that Django's trailing-slash-adding middleware is configured correctly, and that /register/ returns a 200 (“OK”) status code.

The second checks that /register/ uses the register.html template, demonstrating both the verbose way of doing this and the self.assertTemplateUsed shortcut.

27

A more complex example. This illustrates two useful concepts: POSTing to a form using client.post(), and intercepting sent e-mails using mail.outbox.

Test cases that inherit from django.test.TestCase automatically hook into Django's email framework and intercept messages, so that instead of being sent out via SMTP they are stored in a queue and made available for assertion testing.

A good rule for tests is that they should never interact with external services, unless the services themselves are being tested.

28

More on testing with Django:

  • http://www.djangoproject.com/documentation/testing/

I've also used BeautifulSoup for running tests against the structure of my HTML before, but I generally find this counter-productive as HTML frequently changes during development without having much of an impact on the functionality of the application.

29

Part the second: Middleware.

30

Most people understand a request/response cycle along these lines. This is correct, of course, but it’s also overly simplistic; there are a number of steps that this simple Request/View/Response understanding leaves out.

In particular, it suggests that if we want certain behavior to happen on each request, we’re forced to write it into a view (since the view is the only part of this cycle that Django doesn’t control internally).

Often we want to perform tasks on each and every request -- think about returning gzipped content, for example. If Django were really this simplistic, something along those lines would be basically impossible.

31

So this is how a request “really” works (and actually even this outline simplifies things somewhat).

I don’t have time to go over all the intricate details here, but notice the pieces of “middleware” that let you hook in at various points in the cycle and override the default behavior. For example, you can see that the request middleware can return a response and “short-circuit” the entire view phase. This is how the caching framework is able to work so fast: if the page is in the cache, the view doesn’t even need to be called.

A note on terminology: we call this feature “middleware”, though this term can be a bit misleading to folks with an “enterprisy” background. Ruby on Rails calls its similar feature “filters”, which would work well for Django (if not for the conflict with the naming of template filters).

If you’re confused, think of middleware as essentially callbacks at particular moments in a single request cycle.

32

Here’s a simple piece of middleware (modified from a post to DjangoSnippets by “Leonidas”). In particular, this is a piece of “request middleware.”

There’s really nothing special about middleware; a piece of middleware is just a Python class that defines a particular API. Here, by defining process_request(), this object can be used as a piece of request middleware.

33

A piece of middleware can define multiple handlers. It can also save state as instance attributes (e.g. self.foo = ‘whatever’), but note that for performance a single middleware instance is reused for all requests, so any such state is shared between requests.

34

“Installing” a piece of middleware is as simple as registering it in MIDDLEWARE_CLASSES. This example shows some built-in Django middleware along with the piece of middleware from the previous slide.

35

The order of MIDDLEWARE_CLASSES is important; middleware is processed “top-down” during the request phase, and “bottom-up” during the response phase.

36

Here’s another way of looking at it. Middleware is the onion skin around the view; you can think of each middleware class as a “layer” that wraps the view and can intercept data on its way in or out.
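
To make the ordering concrete, here is a simplified sketch of the onion in plain Python. It is not Django's actual handler: handle and Recorder are hypothetical names, and only the process_request and process_response hook names follow the middleware API described in these notes.

```python
def handle(request, middleware, view):
    """Run request hooks top-down, the view, then response hooks bottom-up."""
    response = None
    for mw in middleware:
        hook = getattr(mw, "process_request", None)
        if hook is not None:
            response = hook(request)
            if response is not None:
                break  # short-circuit: skip remaining hooks and the view
    if response is None:
        response = view(request)
    for mw in reversed(middleware):
        hook = getattr(mw, "process_response", None)
        if hook is not None:
            response = hook(request, response)
    return response


class Recorder:
    """Toy middleware that logs when each of its hooks runs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def process_request(self, request):
        self.log.append("request:" + self.name)
        return None  # carry on with the normal cycle

    def process_response(self, request, response):
        self.log.append("response:" + self.name)
        return response
```

With two recorders listed as "outer" then "inner", the log reads request:outer, request:inner, response:inner, response:outer, which is the layering the onion picture describes.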

37

The four types of middleware callbacks.

38

It’s nasty and slimy, but a great example of request middleware is a three-click paywall like some news sites use. That is, you get free access to the site, but after your third page you get redirected to a login/registration page.

Pretty straightforward, but note that request middleware may return an HttpResponse or suitable subclass. If so, the rest of the request is short-circuited and the view is never called. However, the middleware may also return None, which signals that the normal request cycle should continue.
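
The slide code isn't included in these notes, so here is a rough, framework-free sketch of the paywall idea. Request and Redirect are hypothetical stand-ins for Django's request and redirect-response objects, and the session keys and the /login/ URL are assumptions made for illustration.

```python
class Redirect:
    """Stand-in for an HTTP redirect response."""

    def __init__(self, location):
        self.status_code = 302
        self.location = location


class Request:
    """Stand-in for a request carrying a path and a session dictionary."""

    def __init__(self, path, session):
        self.path = path
        self.session = session


class ThreeClickPaywall:
    free_views = 3
    login_url = "/login/"

    def process_request(self, request):
        # Never block the login page itself, or users who are signed in.
        if request.path == self.login_url or request.session.get("logged_in"):
            return None
        views = request.session.get("page_views", 0) + 1
        request.session["page_views"] = views
        if views > self.free_views:
            return Redirect(self.login_url)  # short-circuits the view
        return None  # continue with the normal request cycle
```

The view counter lives in the session rather than on self because the middleware instance is shared by every visitor.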

39

View middleware... isn’t really very useful, honestly. It’s mostly there for a debugging hook -- it’s a nice place to hook in if you’d like to wrap and profile a view, for example.

I’m going to skip showing an example, because you probably won’t ever need to use it.

40

You’ll use response middleware any time you need to modify the output before it gets sent to the browser.

41

Cute, eh?

42

Like view middleware, exception middleware isn’t all that useful in end-user code; it’s mostly there as a hook for doing frameworky stuff.

So I’ve cheated and taken an example from Django itself: the built-in TransactionMiddleware that handles keeping each request in its own transaction. Here we can see the rollback step taken by the exception hook. (There’s of course a similar commit step in the response middleware, but that’s not shown here.)

43

More:

  • http://www.djangoproject.com/documentation/middleware/
  • http://www.djangobook.com/en/beta/chapter16/
  • http://code.djangoproject.com/wiki/ContributedMiddleware
  • http://www.djangosnippets.org/tags/middleware/

44

Jacob’s discussion of middleware showed that it provides hooks for additional processing of HTTP requests and responses. I’m going to cover signals, which provide similar hooks in Django’s lifecycle and ORM.

45

You’ve probably used something like Django’s signaling tools before in the form of the Observer pattern from the book “Design Patterns” (a.k.a. the Gang of Four), or from Qt, Java, or .NET programming.

Something so popular has got to be useful, right?

When you’re first starting out with a toolset, it’s common to just make things work.

But, when your codebase grows or you wish to start combining and layering components, directly referencing other modules and applications leads to circular dependencies, tight coupling, difficulties in testing, and, yes, sadness.

You can use Django’s stock signals to hook into other apps and to customize ORM behavior. You can also provide your own signals for use in other applications.

46

The core idea is that signals provide a way to communicate and coordinate without directly expressing dependencies.

Note that it’s possible to have multiple handlers per signal. They’ll run sequentially, but their order is undefined. You shouldn’t write signal handlers with the expectation that an earlier handler has altered state.

47

Here’s a simple example from the Django codebase.

We want Django’s ORM to be useful without the HTTP handler, and vice versa. But we also want to make sure that when an HTTP request is finished, the DB connection is closed.

The core.request_finished signal is used to notify the ORM that the connection is no longer needed.

48

Using signals starts with choosing one. You can either use a stock Django signal or publish your own.

Defining your own signal is as simple as creating an object to represent it.

Once you’ve chosen your signal, you’ll write a handler based on the arguments the signal’s sender provides.

Finally, you’ll connect your handler to the signal.
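
Those three steps can be sketched with a toy dispatcher. This is not Django's dispatcher (which is built on PyDispatcher and called differently); Signal, order_shipped and record_shipment are made-up names used only to show the shape of the pattern.

```python
class Signal:
    """Minimal observer registry: handlers subscribe, senders publish."""

    def __init__(self):
        self.handlers = []

    def connect(self, handler):
        if handler not in self.handlers:
            self.handlers.append(handler)

    def send(self, sender, **named):
        # Call every handler; the sender never needs to know who listens.
        return [(h, h(sender=sender, **named)) for h in self.handlers]


# Step 1: choose a signal, or define your own by creating an object.
order_shipped = Signal()

# Step 2: write a handler based on the arguments the sender provides.
shipped_log = []


def record_shipment(sender, order_id, **kwargs):
    shipped_log.append(order_id)


# Step 3: connect the handler to the signal.
order_shipped.connect(record_shipment)

# Elsewhere, the publishing code announces the event.
order_shipped.send(sender="warehouse", order_id=42)
```

The publishing code imports only the signal object, never the handler's module, which is what keeps the two sides decoupled.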

49

Django includes a number of signals which it uses internally.

50

django.core.signals is home to a couple more request-related signals.

request_started is sent when the request handler first begins processing, and is used internally to reset db.connection.queries, a list of all queries executed by Django’s ORM which is kept when settings.DEBUG == True.

got_request_exception is used to indicate an exception occurred while processing a request. It’s used internally to roll back any pending database transaction as well as for exception reporting in django.test .

51

Now we get to the good stuff.

The class_prepared signal indicates that a Model class has been constructed. It’s used internally for some housekeeping such as ensuring that every model has a Manager and resolving recursive model relationships. This signal is very early in the life of a Model, so some pretty radical features are possible.

The pre and post init signals allow signal handlers to munge data just as a model instance is created. We’ll see an example in GenericForeignKey a bit later.

The pre and post save signals allow a signal handler to do additional processing in response to the model being saved. The pre and post delete signals serve a similar purpose.

post_syncdb is sent by django.core.management just after an app’s models have been added to the database. It’s used for interactive prompting, as seen in auth’s initial superuser prompt.

52

One nice use of signals is to add additional functionality to existing code.

Suppose we want to get an email any time a model is saved with a pub_date attribute set in the future.

53

Note that if you just want this type of handling on a single model which you control, you’d probably be better off overriding the save method in your model definition rather than using a signal.

But in this case, we want to handle multiple models. We’ll need to listen to a save signal. We can use either pre- or post-save in this case, since the signal will not be manipulating the data about to be saved. We’ll use pre_save.

Django dispatches the pre_save signal with the keyword arguments sender (the model class) and instance (the model object).

We’ll need to define a signal handler to use these parameters.

54

Connecting to a signal is pretty simple: just call dispatcher.connect, passing in the handler and the signal for which it should be called.

55

Recall that pre_save offers both the model class and instance as parameters. In this case, we care about the model instance, but not the class.

Django’s dispatching system will match up the published arguments with the subscribed handlers. There’s no need to accept all parameters explicitly in the handlers.
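
One way such argument matching could be done, sketched with the standard library's inspect module. This is an illustration of the idea only, not the dispatcher's real implementation; call_with_accepted and wants_instance_only are hypothetical names.

```python
import inspect


def call_with_accepted(handler, **named):
    """Call handler with only the keyword arguments it declares."""
    params = inspect.signature(handler).parameters
    takes_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if takes_any:
        # The handler has **kwargs, so it can absorb everything.
        return handler(**named)
    accepted = {key: value for key, value in named.items() if key in params}
    return handler(**accepted)


def wants_instance_only(instance):
    return "saw %r" % (instance,)
```

Here the sender publishes both sender and instance, but a handler that declares only instance is still called without error.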

56

Since we’re trying to handle many different models, we’ll have to assume some common interface.

Here, we check whether the model has the attributes we expect, and if not, we stop processing the signal.
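
A hedged sketch of what such a handler might look like, written as plain Python so it runs without Django. The outbox list stands in for really sending e-mail, and the now parameter is an addition of mine to make the function easy to test; only the names mail_on_future and pub_date come from the talk.

```python
from datetime import datetime

outbox = []  # stand-in for real e-mail delivery


def mail_on_future(instance, now=None):
    """Queue a notice when instance.pub_date lies in the future."""
    # Models without the expected attribute are none of our business.
    if not hasattr(instance, "pub_date"):
        return
    now = now or datetime.now()
    if instance.pub_date > now:
        outbox.append(
            "%s scheduled for %s" % (type(instance).__name__, instance.pub_date)
        )
```

Because the handler only relies on the pub_date attribute, it works for any model that has one and silently ignores the rest.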

57

Now, whenever a model instance is saved, mail_on_future will be called.

58

Another use of signals is to adapt from one form in an API call to another.

59

GenericForeignKey makes it possible to refer to any kind of related instance using ForeignKey-like semantics. It does this by storing the related instance’s content type and primary key value.

But there’s a hitch: models with regular ForeignKey fields can be constructed with references to the related model instance. In this example, we’re assigning an author to a story.

GenericForeignKey, however, requires both a content type and a foreign key. The API would be more consistent with ForeignKey if we had a way to hide that complexity. In this example, we’d like to assign a target object for a Comment.

60

To accomplish this, GenericForeignKey listens for the pre_init signal and alters the model construction call from the nice form to the ugly (but necessary) form.

61

In the pre_init handler, GenericForeignKey inspects the constructor kwargs for the desired usage.

62

And then it replaces the given model instance with its related content type and primary key.
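
The rewriting step can be sketched as a plain function over the constructor's keyword arguments. This is an illustration, not Django's code: rewrite_kwargs is a hypothetical name, the field names target, content_type and object_id are assumptions, and the lowercased class name stands in for a real ContentType lookup.

```python
def rewrite_kwargs(kwargs, name="target",
                   ct_field="content_type", fk_field="object_id"):
    """Swap a related instance for its (content type, primary key) pair."""
    if name in kwargs:
        value = kwargs.pop(name)
        # Stand-in for looking up the ContentType of the related model.
        kwargs[ct_field] = type(value).__name__.lower()
        kwargs[fk_field] = value.pk
    return kwargs


class Story:
    """Toy related object with a primary key."""

    def __init__(self, pk):
        self.pk = pk
```

Called with target=Story(7), the constructor arguments come out the other side carrying content_type and object_id instead, so the caller never has to spell out the ugly form.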

63

This reduces the lines of code needed to use the GenericForeignKey, and makes the API more like a standard ForeignKey.

Nice!

64

You can find further information on signals as implemented in Django with these links:

  • http://en.wikipedia.org/wiki/Observer_pattern
  • http://code.djangoproject.com/wiki/Signals
  • http://pydispatcher.sourceforge.net/

65

These projects, available on http://code.google.com/, all use signals.

django-multilingual, in particular, is very ambitious; it uses signals to dynamically create models featuring parallel texts for originally-specified models. Additionally, it swaps in its own custom (oldforms) manipulators to facilitate data entry of multilingual text.

Have a look and have fun.

66
67
68
69
70

This view has three return values: the empty string, if it was given an empty username; the text 'Unavailable', if it was given a username that is unavailable; and the text 'Available' for usernames that are available.
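
Stripped of the HTTP and database plumbing, the decision logic is roughly the following. This is a sketch: check_username and the taken set are hypothetical, and the real view would query the user table and wrap the string in a response.

```python
def check_username(username, taken):
    """Return the text fragment the Ajax call will inject into the page."""
    if not username:
        return ""  # nothing typed yet, so show nothing
    if username in taken:
        return "Unavailable"
    return "Available"
```

Returning an empty string for empty input means the message area simply clears when the user deletes everything they typed.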

71

The jQuery function takes a CSS selector as its first argument; here we are passing a selector for the span element with id="msg", but it supports all sorts of advanced selectors including ones from CSS 2 and 3, XPath and a few that are unique to jQuery.

The function returns a wrapper object around the collection of elements matched by the selector. jQuery methods can then be called on the wrapper; in this case we are calling the load method, which uses Ajax to retrieve a fragment of HTML from a URL and then injects it into the element(s) on which it was called.

72

For convenience, jQuery sets up $() as an alias to itself. jQuery and $ are the only two symbols it adds to your global namespace, and you can revert $ back to what it was before if you want to (for compatibility with Prototype, for example).

73

Here we're binding a function to the keyup event of the input field. Every time a key is released it performs the Ajax request.

74

Finally, we set the whole thing to run when the page has finished loading. This ensures that the input element has been loaded into the browser's DOM. $(document).ready() fires after the DOM has been loaded but before all of the images have been loaded - this means it's a better way to attach JavaScript behaviours than the more traditional window.onload, which can take a lot longer to fire.

75

The $ function also acts as a shortcut for $(document).ready, if you pass it a function instead of a selector string.

76

All Web applications need server-side validation, to ensure the integrity (and security) of data submitted by the client. Application usability can be enhanced by adding JavaScript client-side validation, but this often leads to duplicated validation logic - the same rules expressed once in Python and once in JavaScript.

With Ajax, we can reuse the server-side code for client-side validation.

77

Django's newforms library allows us to define form validation logic in a similar way to Django models - declaratively, using a subclass of newforms.Form.

78

Here's the server-side code that goes with that form. If the form has been POSTed, it checks if it is valid. If it is, it sends an e-mail (in this case) and redirects the user. If the form is invalid or has not yet been submitted, the contact page is displayed.

79

The template looks like this. form.as_p provides a simple default layout for the form; the template can be extended to define exactly how the form should look if a custom display is required.

80

Let's add client-side validation, reusing our ContactForm for validation. This view expects to be POSTed either the whole form or just one of the fields; if just one field is provided, the field= GET variable is used to specify which one.

The view returns a Python dictionary rendered to JSON, a useful data format for Ajax as it can be evaluated as regular JavaScript.

It makes use of a custom JsonResponse class, which knows how to render a Python object as JSON.
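
The slide code isn't part of these notes, so the following is only a sketch of the general shape such a view's logic might take. The field names, the validators and the {"valid": ..., "errors": ...} layout are all assumptions of mine, not the ContentForm or JSON format used in the talk.

```python
import json

# Hypothetical per-field validators; each returns a list of error messages.
VALIDATORS = {
    "name": lambda v: [] if v.strip() else ["This field is required."],
    "email": lambda v: [] if "@" in v else ["Enter a valid e-mail address."],
}


def validate(post, field=None):
    """Validate one named field, or the whole form when no field is given."""
    names = [field] if field in VALIDATORS else list(VALIDATORS)
    errors = {}
    for name in names:
        problems = VALIDATORS[name](post.get(name, ""))
        if problems:
            errors[name] = problems
    return json.dumps({"valid": not errors, "errors": errors})
```

Validating a single field lets the page check each input as the user leaves it, without complaining about fields they haven't reached yet.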

81

Here's JsonResponse. I often include this utility class in my applications when I'm working with JSON. Note that it sets the correct Content-Type header, "application/json". This can make debugging difficult as the browser will attempt to download the content directly; an improved version could check for settings.DEBUG and serve using "text/plain".
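
A minimal sketch of that improved version, written as a plain function rather than a response class so that it runs without Django. Here json_response is a hypothetical name and the debug flag stands in for settings.DEBUG.

```python
import json


def json_response(data, debug=False):
    """Serialise data and pick a Content-Type that suits the environment."""
    # Browsers display text/plain inline, which is handy while debugging.
    content_type = "text/plain" if debug else "application/json"
    headers = {"Content-Type": content_type}
    return headers, json.dumps(data)
```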

82

This is the accompanying JavaScript. The validateInput function is called for an input field, and performs an HTTP POST (using jQuery's Ajax features) against the view we just defined.

It makes use of the jquery.form.js plugin, which adds the formToArray() method to the jQuery object. jQuery plugins provide a clever mechanism for extending jQuery's functionality without needing to increase the size of the main jquery.js file.

The validateInput function is attached to every input field on the page, using jQuery's handy custom :input selector.

83

Here's the showErrors function, which displays any errors in the errorlist associated with the form element. relatedErrorList() uses jQuery's DOM traversal functions to find the error list associated with the input element, and creates one if there isn't one already.

84
85
86

Bonus slide: here’s that validate_contact method repackaged as a generic view.

87

More:

  • http://www.djangoproject.com/documentation/newforms/
  • http://jquery.com/
  • http://dojotoolkit.org/
  • http://developer.yahoo.com/yui/
  • http://www.prototypejs.org/
  • http://www.djangosnippets.org/tags/ajax/

88

Custom template tags are supremely useful. Write ‘em for a while, however, and you start to discover some patterns you use over and over again. In this part, I’ll go over five common needs, and the patterns I use to handle them.

89

The first use case: simple data (i.e. a list, text, etc.) in, simple data out.

When you’ve got one of these tasks, think “filter!”

90

An example filter to “piratize” text. Filters really are damn simple, so there’s not much more to say about this.
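
Since a filter is just a function from a value to a value, the core of it can be shown without any template machinery. The word list below is invented for illustration and is not the one from the slide; registering the function with the template library is left out.

```python
import re

# Hypothetical substitutions, applied in order.
REPLACEMENTS = [
    (r"\bmy\b", "me"),
    (r"\byou\b", "ye"),
    (r"\bhello\b", "ahoy"),
    (r"ing\b", "in'"),
]


def piratize(value):
    """Simple data in, simple data out: rewrite text in pirate-speak."""
    for pattern, replacement in REPLACEMENTS:
        value = re.sub(pattern, replacement, value)
    return value + " Arrr!"
```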

91

Use case #2: you’ve got some programmatically-generated data (e.g. from the results of a database lookup, or system call, or ...) that you’d like to render into the template.

In this case, the @simple_tag decorator is your friend.

92

Here’s a pretty simple example: display a server’s uptime. Not a very useful tag, but shows the basic pattern pretty well.

93

Use case #3: you’ve got something you want to display in a template tag, but it’s expensive and you don’t want template authors killing your servers.

The solution is to cache the results of template tags.

94

I’ve written a set of node subclasses that illustrate one way you could use caching with template tags. It’s a useful idea even if you don’t use these specific bits.
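
The node subclasses themselves aren't reproduced here. As a rough illustration of the underlying idea only, this sketch wraps an expensive render function in a time-limited in-process cache; CachedRenderer and its parameters are hypothetical, and a real site would more likely go through Django's cache framework.

```python
import time


class CachedRenderer:
    """Remember rendered output per key for a limited time."""

    def __init__(self, render, timeout=60, clock=time.time):
        self.render = render
        self.timeout = timeout
        self.clock = clock
        self._cache = {}  # key -> (expires_at, output)

    def __call__(self, key):
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive work
        output = self.render(key)
        self._cache[key] = (now + self.timeout, output)
        return output
```

The clock is injectable so that expiry can be tested without sleeping.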

95

This is a use case that doesn’t come up very often, but sometimes you need to do pretty complex stuff.

96

Here’s an example (also available at djangosnippets.org) of what I’m talking about. These tags depend on each other, and you’ll need to handle the child tokens “inside” the switch tag correctly.

97

The important parts to notice here are the three commented lines. First we gather all the child nodes until the {% endswitch %} tag; then we delete that {% endswitch %} tag; then we pull out just {% case %} nodes. From there, it’s a matter of returning the node type.

The case handler is very similar; it just doesn’t have to do the get_nodes_by_type() call.

98

Here’s (the render method of) the switch node. Notice all that it does is delegate rendering off to the case node after doing some checks.

99

Finally, this is the interesting part of the case node. Pretty simple: check for equality, and (when requested) render all the child nodes passed in.

Again, the full code’s available online at http://www.djangosnippets.org/snippets/300/.

100

This is a common complaint: “I’ve got this cool tag, but I hate having to {% load %} it everywhere!” The solution is to make it a builtin.

101

And here’s how. You can stick this code anywhere that’ll get loaded on startup; I suggest installing it in a top-ish-level __init__.py.

102

More resources:

  • http://djangoproject.com/documentation/templates_python/ - the official template documentation.
  • http://code.google.com/p/django-template-utils/ - James’ template utils have some good examples.
  • http://www.djangosnippets.org/ - There are lots of good resources here.

103

There are two different kinds of fields in Django: newforms.Field (which Simon covered earlier), and db.models.Field. Here I’ll cover model fields.

Model fields provide a way to customize the behavior of the ORM and to provide a richer interface when dealing with model instances.

104

There are many model fields that come with Django. Here are a few that run the spectrum of sophistication.

A CharField requires a maxlength argument, and otherwise supports common validation parameters like blank, null, and default. I’m sure you’ve used one before.

Note that each of those parameters could be implemented as a validator given in validator_list. They’re included in the CharField implementation because they are so commonly useful.

Next on the spectrum is URLField, which is a CharField with a larger default maxlength and an additional option to validate that the resource identified actually exists.

A FileField goes further by contributing helper functions, such as get_FIELD_url, to the associated model.

As I covered earlier, GenericForeignKey provides an abstraction layer over the ContentType package in order to make model instances refer to any other model.

Developers using Django can tap into this power, too.

105

We’ll start with a validating ISBNField.

An ISBN is a unique identifier assigned to each edition (or sometimes printing) of a book. ISBNs come in 10- and 13-digit varieties; 13 digits is the new standard.

The last digit is a check digit and can be used to verify validity.
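
For reference, here is a sketch of the two check-digit rules in plain Python (the function names are mine, and this is not necessarily how the validator on the slides is written). ISBN-10 weights the digits 10 down to 1 and requires the sum to be divisible by 11, with X standing for 10 in the last position; ISBN-13 alternates weights of 1 and 3 and requires the sum to be divisible by 10.

```python
DIGITS = "0123456789"


def is_valid_isbn10(isbn):
    """Weighted sum with weights 10..1 must be divisible by 11."""
    if len(isbn) != 10 or any(c not in DIGITS for c in isbn[:9]):
        return False
    last = isbn[9]
    if last not in DIGITS + "Xx":
        return False
    values = [int(c) for c in isbn[:9]] + [10 if last in "Xx" else int(last)]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0


def is_valid_isbn13(isbn):
    """Sum with alternating weights 1 and 3 must be divisible by 10."""
    if len(isbn) != 13 or any(c not in DIGITS for c in isbn):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(isbn):
    """Accept either variety, ignoring hyphens and spaces."""
    isbn = isbn.replace("-", "").replace(" ", "")
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)
```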

106

We need to subclass an existing Field class. The base Field class provides hooks needed for Django to manage persistence.

We’ll usually want to override the Field.__init__ in order to set constraints, and we need to map our Field into a database column.

107

Before we get to the actual field, a little warning about validation.

Form processing is in flux on trunk right now. Oldforms is being replaced with Newforms. Oldforms used manipulators, which validated, in part, using a field’s validator_list.

There’s some debate right now whether validation logic belongs in models, forms, or both.

Rather than get sidelined with that debate and the many ways to currently do it, I’m going to cheat and not use forms here. Instead, I’ll rely on Model.validate, which, at least on trunk right now, calls validate for each of the fields on the model.

Watch this space.

108

Let’s get started.

We’ll inherit from CharField to start with, since ISBNs are a string of characters.

Here’s our custom validator. If you’re not familiar, validators must raise a ValidationError exception to indicate failure.

109

In the ISBNField’s __init__, we’ll force maxlength to be 13, since all ISBNs are at most that many characters.

We also add the isISBN validator to validator_list, as an example of how we could support oldforms.

110

Finally we add get_internal_type() to tell Django to map ISBNField to the same column type as a CharField. get_internal_type() is also used by Django’s admin to determine which widget to show.

In this example, that’s all we need. However, when creating Field subclasses that are significantly different from any existing Django field, you’ll also want to implement db_type(). db_type() is used during model creation, and should return the SQL fragment needed to determine the database engine-specific column type.

111

Now we can use the ISBNField like any stock field.

We can give it a valid ISBN and have it pass, or a bad ISBN and have it fail.

112

Given an ISBN, it’s common to want related information about a book such as the title.

Let’s change ISBNField so that it contributes a CharField for the title in addition to its own field.

113

So, I’ve written a method that, given an ISBN, returns the title of that book.

I’ve also tweaked the ISBNField.__init__ to take an optional title_field argument. This is used to determine the name of the title field on this model.

114

Every Field has a contribute_to_class method, which Django uses to help define the Model class.

In the last example, we just let the standard Field.contribute_to_class do its thing, but now we want to alter the model class definition to include an extra Field for the title.

The tricky part here is incrementing the creation counter. The creation counter is used to maintain field order when one Django model inherits from another one. But it also affects the order of field value assignment in the model’s constructor.

We want ISBN to be set after the title field so that we can fill in the title based on the ISBN value. If the ISBN field occurred before the title in the model definition, the title set by the ISBNField might be overwritten.

Finally, we contribute the new title field to the model we’re helping to build.

115

Actually, there’s one more step to the contribution.

We’d like the title attribute to be derived from the given ISBN.

If you want to control what happens on an attribute access, you typically use a property.

In Django, the Field instance is attached to the Model class. This is important to realize, because a single Field instance can’t manage the model instances. Instead, we need to use a “descriptor”.

116

Descriptors are objects that take a class or instance as a parameter, and resolve attribute lookup using both that reference and internal state.

See Guido’s discussion here: http://www.python.org/download/releases/2.2.3/descrintro/

Since serving the attribute resolution is tightly related to the Field itself, I’ve made the Field instance itself serve as the descriptor for the Model class.

117

Here’s the descriptor “set” method for setting the value of the field on the model.

We ensure that the call is for a model instance rather than the model class. This prevents overriding the field on the class in outside code.

Then, if the ISBN is a string or None, the ISBN is stashed in the model instance’s dictionary, and the title is set to correspond to the ISBN.

118

Finally, when the ISBNField’s attribute is accessed, we return the value from the model instance’s dictionary. This is the descriptor’s “getter” method.
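
Here is a framework-free sketch of the descriptor mechanics described over the last few slides. It is not the ISBNField from the talk: ISBNWithTitle, Book and FAKE_CATALOGUE are made-up names, the catalogue entry is invented data, and raising TypeError for other value types is my assumption, since the notes don't say what happens in that case.

```python
class ISBNWithTitle:
    """Descriptor that stores an ISBN per instance and fills in a title."""

    def __init__(self, name, title_name, lookup_title):
        self.name = name
        self.title_name = title_name
        self.lookup_title = lookup_title

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: hand back the descriptor
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, str):
            raise TypeError("ISBN must be a string or None")  # assumption
        # Stash the value on the instance, not on the shared descriptor.
        instance.__dict__[self.name] = value
        title = self.lookup_title(value) if value else None
        setattr(instance, self.title_name, title)


FAKE_CATALOGUE = {"0000000000": "An Example Book"}  # invented data


class Book:
    # One descriptor object on the class serves every Book instance.
    isbn = ISBNWithTitle("isbn", "title", FAKE_CATALOGUE.get)
```

Because the descriptor defines __set__, Python consults it even when the instance dictionary holds an entry with the same name, which is why the value can safely be stashed under "isbn".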

119

There we have it: an ISBNField that manages a related title field.

120

More resources on Django’s model creation lifecycle:

  • http://code.djangoproject.com/wiki/DevModelCreation
  • http://toys.jacobian.org/presentations/2007/pycon/tutorials/advanced/#s22

The TagField that’s part of django-tagging (http://code.google.com/p/django-tagging/) is a good example.

And more information about the Python magic that lets this work:

  • http://www.python.org/download/releases/2.2.3/descrintro/
  • http://docs.python.org/ref/attribute-access.html
121
122
123

It solves the “too many passwords” problem - with OpenID, you don’t have to come up with a brand new username and password on every site that you need an account.

It’s decentralised, which means that there’s no central entity controlling everyone’s identity - unlike Microsoft Passport or Six Apart’s TypeKey.

It’s an open standard, supported by Open Source libraries. For a much more detailed introduction, watch the video of my Google Tech Talk (or read through the slides):

  • http://video.google.com/videoplay?docid=2288395847791059857
  • http://www.slideshare.net/simon/implications-of-openid-google-tech-talk/
124

These are some of mine. It’s perfectly normal for people to have more than one (people have maintained multiple online personas since the early days of the Internet), but in practise most people will pick one and use it on most sites.

If you have a LiveJournal or AOL account, you have an OpenID already. If you don’t have one, there are plenty of places that you can get one: http://openid.net/wiki/index.php/OpenIDServers

125

You can watch a screencast of OpenID in action here: http://simonwillison.net/2006/openid-screencast/

126
127

If you view the HTML source of a page that is an OpenID, you’ll find this in the <head> section.

This tells the OpenID consumer (the site you are signing in to) where your provider’s server is. This is the URL that you will be redirected to to “prove” that you own that OpenID. Proof is often done by signing in to that site with a username and password, but other forms of authentication are possible as well.

The consumer also establishes a shared secret with the provider, if they haven’t communicated before. This lets them communicate securely despite your browser handing the information back and forth between the two of them.
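To make the discovery step concrete, here is a rough sketch of how a consumer might pull the provider’s address out of a page. It assumes the OpenID 1.x convention of `<link rel="openid.server">` and `<link rel="openid.delegate">` tags; the page and URLs are made-up placeholders, and `OpenIDLinkParser` and `discover` are hypothetical names. A real consumer should rely on an OpenID library rather than hand-rolled parsing.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect <link rel="openid.*" href="..."> tags from a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        href = attrs.get("href")
        if rel.startswith("openid.") and href:
            self.links[rel] = href


def discover(html):
    """Return a dict mapping rel values such as 'openid.server' to URLs."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    return parser.links


# Made-up example page; the URLs are placeholders.
PAGE = """
<html><head>
<title>Example</title>
<link rel="openid.server" href="http://provider.example.com/server">
<link rel="openid.delegate" href="http://user.provider.example.com/">
</head><body></body></html>
"""

print(discover(PAGE))
```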

128
129

This essentially acts as a way of helping you to pre-fill a registration form. As part of the OpenID sign in process, the consumer can ask your provider for this information. Your provider will explicitly ask your permission before passing it back. There are no guarantees that complete (or indeed any) information will be passed back at all, so consumers can’t rely on this working.

More here: http://simonwillison.net/2007/Jun/30/sreg/

130
131

The reference implementation is the JanRain OpenID library: http://www.openidenabled.com/openid/libraries/python/. It’s a great library, and really isn’t that hard to use. But there is an easier way...

132

The models are used by the JanRain library for persistence; you don’t have to worry about them at all.

Full instructions here: http://django-openid.googlecode.com/svn/trunk/openid.html

133

The full middleware line is 'django_openidconsumer.middleware.OpenIDMiddleware', but that didn't fit on the slide. You need to add this somewhere after the session middleware, which must be activated for the OpenID functionality to work.
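As a sketch of that ordering requirement, a settings fragment might look like the following. Only the OpenID entry comes from the notes above; the other entries are Django’s stock common and session middleware, and the `openid_after_sessions` helper is a hypothetical sanity check, not part of either library.

```python
# settings.py fragment (illustrative; your list will contain other entries too).
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django_openidconsumer.middleware.OpenIDMiddleware',
)


def openid_after_sessions(middleware):
    """Return True if the OpenID middleware is listed after the session middleware."""
    names = list(middleware)
    session = 'django.contrib.sessions.middleware.SessionMiddleware'
    openid = 'django_openidconsumer.middleware.OpenIDMiddleware'
    return (session in names and openid in names
            and names.index(session) < names.index(openid))
```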

134

The first URL will be your sign-in page, where users are directed to begin signing in with OpenID.

The second is the URL that the user will be redirected back to upon successful sign in with their OpenID provider.

The third is the signout page, which users can use to sign out of your application.

135
136
137

It may not be instantly obvious why it is useful to have users sign in with more than one OpenID at once. There are a number of reasons, but the most interesting is that sites may well start to offer API services around the OpenIDs they provide - for example, a last.fm OpenID may be used to retrieve that user's last.fm music preferences, while an Upcoming.org OpenID could provide access to their calendar. Supporting multiple OpenIDs allows services to be developed that can take advantage of these site-specific APIs.

138
139

By "coming soon", I mean really soon. There's a small chance I'll have released the first of these before giving this tutorial.

140

More info:

  • http://openid.net/ - the official OpenID site; also home to the OpenID mailing lists.
  • http://www.openidenabled.com/ - a directory of OpenID-enabled applications.
  • http://simonwillison.net/tags/openid/ - all of Simon’s writings on OpenID.
  • http://code.google.com/p/django-openid/ - home of the django-openid library.
141
142

So: diagrammed loosely, this is what a typical website looks like, right?

143

Ahem.

144

This is more like it.

This is LiveJournal’s current architecture, taken from slides by Brad Fitzpatrick. Yes, LiveJournal is a big site, but 90% of good scaling is foresight. Planning ahead to an architecture like this is the only way we’ll actually get there without too much trouble.

145

The thing is, this is the only part of that cluster that’s LiveJournal-specific. In any big application, there’s a bunch of other code that does infrastructure-related activities, and all that is reusable.

In fact, poke under the hood at most big web sites (MySpace, Facebook, Slashdot, etc.) and you’ll find many tools crop up over and over again. The wonder of the LAMP-ish stack these days is that you can use the same tools the big boys use. The fact that MySpace gets 6000 hits/second out of Memcached makes me not worry at all about my 60.

I’m going to go over a few of these tools that’ll give you the most “bang for your buck.”

146

The first tool I’ll look at is Perlbal. Perlbal is a “reverse proxy load balancer and web server”, which is a fancy way of describing a tool that mediates between web browsers and backend web servers.

Perlbal can do a whole bunch more, actually — including acting as a part of MogileFS, which is awesome but which I can’t cover in this tutorial — but I’ll just focus on its role as a reverse proxy.

There are, of course, other load balancers -- Apache’s mod_proxy and nginx come to mind -- and much of the following applies to them. I use Perlbal, so that’s what I’m gonna talk about.

147

So why use a reverse proxy at all?

Well, even if you’ve only got a single web server, Perlbal can still save your butt. Although it takes only fractions of a second to generate a page, a slow client can take a relatively long time to download that content. In most situations even your faster clients have far smaller pipes than your server; this leaves the server to spend the majority of its time “spoonfeeding” rendered data down to clients. Perlbal (and other reverse proxies) will cache a certain amount of content and trickle it down to clients, leaving your backend free to handle more requests.

Second, if all your requests go through a proxy, it’s amazingly easy to swap out backend web servers, add more as traffic increases, or otherwise move things around. Without a proxy, you’d spend a bunch of time rebinding IP addresses, and possibly end up locked into a server you don’t like.

Finally, if you’re lucky you’ll get to the point that a single server won’t handle all the traffic you’re throwing at it. Perlbal makes it incredibly easy to add more backend servers if and when that happens.

148

Unfortunately, Perlbal isn’t documented all that well. The docs in SVN are pretty good, and the mailing list is a great place to get help. I’ll also show some example configs over the next few slides.

  • http://danga.com/perlbal/
  • http://code.sixapart.com/svn/perlbal/trunk/doc/
  • http://lists.danga.com/mailman/listinfo/perlbal
149

Here’s a stripped down version of the Perlbal config for ljworld.com. We’re using the virtual host plugin to delegate based on domain name. The domain name points to a “service”, which (since it’s a proxy) points to a “pool” of servers.

We’re using a cute trick for the pool here; instead of listing the servers in the config file, we point to a “nodefile” of backend web servers.

150

This is that node file; one IP:port per line. The clever thing is that Perlbal notices if this file changes and automatically reconfigures the pool; this means that changing the pool is as simple as changing this file.
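The reload-on-change idea is easy to picture in code. The sketch below is not how Perlbal itself is implemented (Perlbal is written in Perl); it is a hypothetical Python illustration of watching a file of `IP:port` lines and rebuilding a pool when the file’s modification time changes. Skipping blank lines and `#` comments is an assumption of this sketch.

```python
import os


def parse_nodefile(text):
    """Parse one 'IP:port' entry per line, skipping blanks and # comments."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, port = line.rsplit(":", 1)
        nodes.append((host, int(port)))
    return nodes


class NodePool(object):
    """Reload a list of backends when the node file changes on disk."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.nodes = []

    def refresh(self):
        """Re-read the file only if its modification time has changed."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            self.mtime = mtime
            with open(self.path) as handle:
                self.nodes = parse_nodefile(handle.read())
        return self.nodes
```

Calling `refresh()` periodically (or before picking a backend) is enough to pick up an edited file without restarting anything.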

151

A couple of tricks we’ve learned over a few years of using Perlbal:

  • Because you’re now behind a proxy, REMOTE_ADDR won’t be correct (it’ll always be set to the IP of Perlbal itself). Django’s included SetRemoteAddrFromForwardedFor middleware will correctly set REMOTE_ADDR for you.
  • Perlbal has some neat tricks; check out X-Reproxy-File and X-Reproxy-URL.
  • It’s often useful to know which backend server actually handled a request. We use a special X-header to keep track of that (X-Beatles).
  • If you’ve got a change you’re not sure about, you can always deploy it to a single server and let Perlbal hand just a portion of requests to that server.
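The first trick in the list above boils down to a few lines. In Django’s `request.META` the client address lives under the `REMOTE_ADDR` key and the proxy’s header shows up as `HTTP_X_FORWARDED_FOR`. The function below is a simplified, hypothetical sketch of the idea operating on a plain dict, not Django’s actual middleware code.

```python
def set_remote_addr_from_forwarded_for(meta):
    """Overwrite REMOTE_ADDR with the client address reported by the proxy.

    `meta` is a dict shaped like Django's request.META. Only do this when
    every request really arrives through a proxy you control, because
    clients can send their own X-Forwarded-For header.
    """
    forwarded = meta.get("HTTP_X_FORWARDED_FOR")
    if forwarded:
        # The header may hold a comma-separated chain; take the first entry.
        meta["REMOTE_ADDR"] = forwarded.split(",")[0].strip()
    return meta
```

For example, a request that reached the backend from a proxy at 10.0.0.5 with the header `203.0.113.7, 10.0.0.5` would end up with `REMOTE_ADDR` set to `203.0.113.7`.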
152

The next tool on our little micro-tour is memcached. It’s an in-memory object caching system, and it’s the secret to making your sites run fast. Django’s caching framework will use memcached, and for any serious production-quality site you should let it.

153

Really, there’s no reason not to use memcached, so I’m not going to spend much time advocating it. If you choose a different cache backend you deserve what you get.

154

This is how easy it is to start memcached.

155

And this is all you need to do to make Django use it (well, besides installing the memcached client library, which is pure Python and will run anywhere). Since it confuses some people, the second line shows how to use multiple cache backends.
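For reference, the setting in question looks roughly like this in a Django settings file of this era, which uses a single `CACHE_BACKEND` URI. The addresses are placeholders, and `memcached_servers` is a hypothetical helper included only to show how the semicolon-separated server list breaks down.

```python
# settings.py fragment: pick ONE of these. Addresses are placeholders.
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
# Several memcached servers are separated by semicolons:
# CACHE_BACKEND = 'memcached://10.0.0.1:11211;10.0.0.2:11211/'


def memcached_servers(backend_uri):
    """Split a 'memcached://host:port;host:port/' URI into its server list."""
    scheme, rest = backend_uri.split("://", 1)
    if scheme != "memcached":
        raise ValueError("not a memcached backend: %r" % backend_uri)
    return rest.rstrip("/").split(";")
```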

156

Some tricks:

  • More memcached servers generally means better performance (i.e. four 1 GB servers will perform better than one 4 GB server). That’s because the memcached protocol hashes twice: once on the client to determine the server, and once on the server. This leaves an equal distribution of keys across servers, and hence better performance. You do want roughly equal cache sizes on each server so that key expiration isn’t abnormal.
  • You want to make sure to use unique keys if you’re running multiple sites against the same cache. Otherwise one.example.com/A/ could get the same key as two.example.com/A/, and that’s bad. We use DJANGO_SETTINGS_MODULE as the key prefix, and it works well.
  • Memcached has no namespaces, so try to design keys that don’t need ‘em. In a bind, you can use some external value that you increment when you need a “new” namespace.
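The last two tricks in the list above, a per-site key prefix and an incrementing value that fakes a namespace, can be sketched together. Everything here is hypothetical: `DictCache` is a tiny in-process stand-in for a memcached client, and the key layout is just one possible convention.

```python
class DictCache(object):
    """Tiny in-process stand-in for a memcached client (get/set/incr only)."""

    def __init__(self):
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


def make_key(cache, site_prefix, namespace, name):
    """Build 'prefix:namespace.vN:name' using the namespace's current version."""
    version_key = "%s:%s:version" % (site_prefix, namespace)
    version = cache.get(version_key)
    if version is None:
        version = 1
        cache.set(version_key, version)
    return "%s:%s.v%d:%s" % (site_prefix, namespace, version, name)


def invalidate_namespace(cache, site_prefix, namespace):
    """Bump the version so keys built with the old version are never read again."""
    make_key(cache, site_prefix, namespace, "")  # make sure a version exists
    return cache.incr("%s:%s:version" % (site_prefix, namespace))
```

Old entries are not deleted; they simply stop being requested and eventually fall out of the cache. The cost is one extra cache read per key lookup to fetch the current version.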
157

The final tool I’ll look at is Capistrano. Although it’s classified as a deployment utility, you can really think of Capistrano as a tool to run the same command on a bunch of servers at once. The most useful command is svn update, but you can really run anything.

158

Once you end up with multiple web servers, keeping ‘em in sync is hard, and NFS is failure-prone. Deployment tools keep you sane.

159

Yes, it’s Ruby :)

The Capistrano DSL, though, is pretty sweet; here I’m defining a remote command I can easily run with cap upgrade_project. I can’t really show many more code examples since each site will be different, but I suggest just reading through the manual and playing around; it’s really not very hard.

160

A couple of tricks we’ve learned:

  • If you’ve got a “restart” task (to reload Apache or whatever), make sure to stagger the restarts so you don’t have any downtime.
  • Capistrano is great to combine with a build process. We use it to crunch and combine JavaScript, and it rocks. deploy javascript combines the build process and the roll-out process.
  • It’s also a good idea to bake cache-busting into your code deployment task.
161

http://www.unessa.net/en/hoyci/2007/06/using-capistrano-deploy-django-apps/ has a good introduction to using Capistrano with Django.

162
163

GIS - Geographic Information System - has existed in various forms since the earliest innovators used the stars and seasons to build and travel.

For the last couple centuries, it’s been used by governments and corporations (especially petroleum) for planning and managing real-world development and prospecting.

A modern GIS system offers primitives like point, line, and polygon in order to facilitate spatial operations.

These spatial operations are typically containment, area, intersection, and the like.
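To give a feel for what these operations involve, here is a small planar sketch of two of them, area and containment, on a polygon given as a list of (x, y) pairs. It ignores everything that makes real GIS hard (coordinate systems, holes, points exactly on the boundary), and the function names are made up for illustration.

```python
def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula (planar coordinates)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(ring, point):
    """Ray-casting test: is `point` inside the polygon `ring`?

    Points lying exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```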

In addition, a good GIS handles various coordinate systems, which are simply ways of agreeing on measurement and description. You can think of a coordinate system as being something like character encodings, but messier, and without an obvious winner like UTF-8.

164

The earth, while not flat, is also not round. It’s round-ish, with lots of inconvenient bumps and dips.

An ellipsoid is a sort-of-earthy shape that’s easier on the computer processor. Since it’s not quite right, datums are used to choose a specific area of the earth for which the distortion is minimized for the chosen ellipsoid. (Diagram needed)

Unit systems are things like metric, lat/lon, and radian. They’re various ways to describe a given point or distance.

Axes and Origins are the orientation and starting point for a coordinate system. They choose where 0,0 is and which direction 10,20 is from that.

Projections are various ways of distorting the ellipsoid onto a flat surface. Mercator is pretty popular these days, and Google Maps uses it. Mercator is good for directions because it preserves angles, but bad for comparisons because it distorts area sizes. There are several other classes of projections, and dozens of projections within those classes.
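The trade-off between preserving angles and distorting area shows up directly in the formulas. The sketch below uses the spherical form of Mercator, with the WGS 84 semi-major axis as the sphere’s radius; treat the function names as illustrative only.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to spherical Mercator x/y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def linear_scale(lat_deg):
    """How much Mercator stretches lengths at a latitude (areas stretch by the square)."""
    return 1.0 / math.cos(math.radians(lat_deg))
```

At 60 degrees of latitude the linear scale is about 2, so shapes there are drawn roughly four times larger in area than the same shapes at the equator. The projection is undefined at the poles, where the scale grows without bound.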

The governments, a while back, wanted to accurately fire long range missiles at each other, and GPS came out of that. GPS uses WGS 84, an ellipsoid/datum that’s pretty accurate everywhere on the earth.

165

ESRI dominates the closed-source and commercial GIS landscape. Local and state governments generally use ESRI products. ESRI’s shapefile (.shp), like Microsoft’s Doc (.doc), is the de facto standard for data interchange.

OGC, the Open Geospatial Consortium, has been working on standards for years, and there is now a good bit of mature open-source GIS software. It’s not ready to replace ESRI for all things, but it makes sense for a lot of cases, and I think it’s a safe bet that the situation will continue in that direction.

GEOS provides a cross-platform C++ library (with C interface!) for many useful GIS operations. GDAL/OGR is a package for manipulating raster (i.e. image) and vector (i.e. shapefile) data in many formats. It uses PROJ.4, which deals with adapting between the many coordinate systems.

GIS is becoming popularized, spurred by Google Earth and Google Maps.

166

Geo Django is in early development. This presentation shows the APIs as they are today, but they’re likely to change and improve.

I just couldn’t help sharing what we’ve got so far.

167

PostgreSQL is a popular database for use with Django, and PostGIS is a mature implementation of the OGC’s Simple Features specification. Initial Geo Django work uses this as a basis.

168

To create a GIS-enabled model, import the models module from the GIS contrib package. That package includes a custom GeoManager, which adds spatial lookup types to the standard Django QuerySet object.

Also use one of the supplied geometry fields as required by your data. Note that you can override the spatial reference system, which defaults to WGS84. Just using the default generally makes sense.

169

Geo Django includes a utility, LayerMapping, to load data out of any GDAL-supported format, such as shapefiles.

LayerMapping is a bridge between the GDAL datasource and the Django model.

170

Once you have data loaded, you can check containment, area, overlap, union, and more.

If you want to get or assign well-known text, you have access to that as well.
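Well-known text (WKT) is just a plain-text spelling of a geometry, such as `POINT (x y)` or `POLYGON ((x y, x y, ...))`. The helpers below are a hypothetical, dependency-free sketch of what the format looks like; they are not the Geo Django API, which handles WKT for you.

```python
def polygon_to_wkt(ring):
    """Format a list of (x, y) pairs as a WKT POLYGON, closing the ring if needed."""
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])
    coords = ", ".join("%s %s" % (x, y) for x, y in points)
    return "POLYGON ((%s))" % coords


def point_from_wkt(wkt):
    """Parse 'POINT (x y)' into an (x, y) tuple of floats."""
    body = wkt.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError("not a WKT point: %r" % wkt)
    x, y = body[body.index("(") + 1:body.rindex(")")].split()
    return float(x), float(y)
```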

171

QuerySets work just like you’d expect.

Basic spatial queries are supported. Here we’re finding any schools within the boundary of a given district.

172

A sitemap is an XML file intended to give crawlers links to discover “greynet” pages. Django already has excellent support for sitemap generation based on querysets.

173
The GIS package adds support for KML generation for any model which includes a GeometryField.
174

We have a bunch of features on the drawing board and in various states of usefulness.

If you have what you think is a common need, we’d like to hear from you.

175

Get started here: http://code.djangoproject.com/wiki/GeoDjango . The docs are thin at this point, but the wiki page provides some good pointers. There are unit tests to show more advanced usage.

Getting GEOS and GDAL installed will give you some nice toys.

176

We’re not alone in the open-source GIS space, and there are lots of useful tools. Many of them are even in Python. These will help you start making sense of GIS problems and solutions:

  • Python Cartographic Library.
  • Quantum GIS (spatial data browser).
  • GRASS (GIS manipulation app).
  • Mapnik (WMS w/Agg).
  • OpenLayers (in-browser GIS toolkit).
177
