Bubble-Up Method
This section provides a conversational explanation of how the repository actually stores and versions file trees. It's not critical knowledge for a programmer using the Subversion Filesystem API, but most people probably still want to know what's going on “under the hood” of the repository.
Suppose we have a new project, at revision 1, looking like this (using CVS syntax):
    prompt$ svn checkout myproj
    U myproj/
    U myproj/B
    U myproj/A
    U myproj/A/fish
    U myproj/A/fish/tuna
    prompt$

Only tuna is a regular file; everything else in myproj is a directory.
Let's see what this looks like as an abstract data structure in the repository, and how that structure works in various operations (such as update, commit, and branch).
In the diagrams that follow, lines represent parent-to-child connections in a directory hierarchy. Boxes are "nodes". A node is either a file or a directory – a letter in the upper left indicates which kind. A file node has a byte-string for its content, whereas directory nodes have a list of dir_entries, each pointing to another node.
Parent-child links go both ways (i.e., a child knows who all its parents are), but a node's name is stored only in its parent, because a node with multiple parents may have different names in different parents.
At the top of the repository is an array of revision numbers, stretching off to infinity. Since the project is at revision 1, only index 1 points to anything; it points to the root node of revision 1 of the project:
       ( myproj's revision array )
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |      /* Two dir_entries, `A' and `B'. */
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |   /* One dir_entry, `fish'. */
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |  /* One dir_entry, `tuna'. */
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |   /* (Contents of tuna not shown.) */
                   |________|
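The same structure can be sketched in a few lines of Python. This is only an illustration, not the Subversion Filesystem API: the `Node` class, the `add_entry` helper, and the choice to leave index 0 of the array empty are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    kind: str                                     # "F" for file, "D" for directory
    content: bytes = b""                          # byte-string, used by file nodes
    entries: dict = field(default_factory=dict)   # dir_entries: name -> child Node
    parents: list = field(default_factory=list)   # back-links; there may be several

def add_entry(parent, name, child):
    """Record a dir_entry. The name lives in the parent, not in the child."""
    parent.entries[name] = child
    child.parents.append(parent)

# Build revision 1 of myproj from the leaves upward.
tuna = Node("F", content=b"(contents of tuna)")
fish = Node("D")
add_entry(fish, "tuna", tuna)
dir_a = Node("D")
add_entry(dir_a, "fish", fish)
dir_b = Node("D")
root1 = Node("D")
add_entry(root1, "A", dir_a)
add_entry(root1, "B", dir_b)

# The revision array. Index 0 is left empty in this sketch; index 1 points
# at the root node of revision 1.
revisions = [None, root1]

print(sorted(revisions[1].entries))   # ['A', 'B']
print(tuna.parents[0] is fish)        # True
```

Note that `tuna` itself carries no name: the string "tuna" appears only as a key in its parent's `entries`.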
What happens when we modify tuna and commit? First, we make a new tuna node, containing the latest text. The new node is not connected to anything yet; it's just hanging out there in space:

                 ________
                |F       |
                |        |
                |        |
                |________|
Next, we create a new revision of its parent directory:
                 ________
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
We continue up the line, creating a new revision of the next parent directory:
                 ________
                |D       |
                |        |
                | fish   |
                |___\____|
                     \
                      \
                    ___\____
                   |D       |
                   |        |
                   | tuna   |
                   |___\____|
                        \
                         \
                       ___\____
                      |F       |
                      |        |
                      |        |
                      |________|
Now it gets trickier: we need to create a new revision of the root directory. This new root directory needs an entry to point to the “new” directory A, but directory B hasn't changed at all. Therefore, our new root directory also has an entry that still points to the old directory B node!
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____             ________
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____
                |D       |            |D       |
                |        |            |        |
                | tuna   |            | tuna   |
                |___\____|            |___\____|
                     \                     \
                      \                     \
                    ___\____              ___\____
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|
Finally, after all our new nodes are written, we finish the “bubble up” process by linking this new tree to the next available revision in the history array. In this case, the new tree becomes revision 2 in the repository.
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \__________
       ___|_____             __\_____
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____
                |D       |            |D       |
                |        |            |        |
                | tuna   |            | tuna   |
                |___\____|            |___\____|
                     \                     \
                      \                     \
                    ___\____              ___\____
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|
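The whole commit can be sketched as a short recursive function. This is a minimal illustration under assumed names (`Node`, `bubble_up`, `commit`), not how the Subversion filesystem is actually coded. The recursion walks down to the file, then builds new nodes on the way back up, copying each directory's entries and replacing only the one on the modified path.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, eq=False)
class Node:
    kind: str                                     # "F" or "D"
    content: bytes = b""
    entries: dict = field(default_factory=dict)   # name -> child Node

def bubble_up(node, path, new_content):
    """Return a replacement for `node` in which the file at `path` has new text.

    Only nodes along `path` are recreated; every other entry is shared.
    """
    if not path:                        # reached the file itself
        return Node("F", content=new_content)
    name, rest = path[0], path[1:]
    new_child = bubble_up(node.entries[name], rest, new_content)
    new_entries = dict(node.entries)    # copies pointers, not subtrees
    new_entries[name] = new_child
    return Node("D", entries=new_entries)

def commit(revisions, path, new_content):
    new_root = bubble_up(revisions[-1], path, new_content)
    revisions.append(new_root)          # the final link into the history array
    return len(revisions) - 1

tuna = Node("F", content=b"old text")
root1 = Node("D", entries={
    "A": Node("D", entries={"fish": Node("D", entries={"tuna": tuna})}),
    "B": Node("D"),
})
revisions = [None, root1]               # index 0 left empty in this sketch

new_rev = commit(revisions, ["A", "fish", "tuna"], b"new text")
print(new_rev)                                                  # 2
print(revisions[2].entries["B"] is revisions[1].entries["B"])   # True
```

Revision 1 is untouched by the commit, and directory B is the very same node in both trees.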
Generalizing from this example, you can now see that each “revision” in the repository history represents the root node of a unique tree (and an atomic commit to the whole filesystem). There are many trees in the repository, and many of them share nodes.
Many nice behaviors come from this model:
- Easy reads. If a filesystem reader wants to locate revision X of file foo.c, it need only traverse the repository's history, locate revision X's root node, then walk down the tree to foo.c.
- Writers don't interfere with readers. Writers can continue to create new nodes, bubbling their way up to the top, and concurrent readers cannot see the work in progress. The new tree only becomes visible to readers after the writer makes its final “link” to the repository's history.
- File structure is versioned. Unlike CVS, the very structure of each tree is being saved from revision to revision. File and directory renames, additions, and deletions are part of the repository's history.
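The first two points can be sketched as follows. The `Node` class and the `lookup` helper are hypothetical names for this illustration, and a plain Python list stands in for the repository's history.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, eq=False)
class Node:
    kind: str                                     # "F" or "D"
    content: bytes = b""
    entries: dict = field(default_factory=dict)   # name -> child Node

def lookup(revisions, rev, path):
    """Find revision `rev`'s root node, then walk down the tree along `path`."""
    node = revisions[rev]
    for name in (part for part in path.split("/") if part):
        node = node.entries[name]
    return node

root1 = Node("D", entries={"foo.c": Node("F", content=b"v1")})
revisions = [None, root1]               # index 0 left empty in this sketch

# A writer builds a new tree off to the side. Nothing in `revisions`
# refers to it yet, so readers cannot reach it.
root2 = Node("D", entries={"foo.c": Node("F", content=b"v2")})
visible_before = len(revisions) - 1     # readers still see only revision 1

revisions.append(root2)                 # the final "link" publishes the tree

print(lookup(revisions, 1, "foo.c").content)   # b'v1'
print(lookup(revisions, 2, "foo.c").content)   # b'v2'
```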
Let's demonstrate the last point by renaming tuna to book.
We start by creating a new parent “fish” directory, except that this parent directory has a different dir_entry, one which points to the same old file node, but has a different name:
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \__________
       ___|_____             __\_____
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____      ________
                |D       |            |D       |    |D       |
                |        |            |        |    |        |
                | tuna   |            | tuna   |    | book   |
                |___\____|            |___\____|    |_/______|
                     \                     \         /
                      \                     \       /
                    ___\____              ___\____ /
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|
From here, we finish with the bubble-up process. We make new parent directories up to the top, culminating in a new root directory with two dir_entries (one points to the old “B” directory node we've had all along, the other to the new revision of “A”), and finally link the new tree to the history as revision 3:
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \        \_________________
          |         \__________               \
       ___|_____             __\_____        __\_____
      |D        |           |D       |      |D       |
      |         |           |        |      |        |
      |   A     |           |   A    |      |   A    |
      |    \    |           |    \   |      |    \   |
      |   B \   |           |   B \  |      |   B \  |
      |__/___\__|           |__/___\_|      |__/___\_|
        /  ___________________/_____\_________/     \
       |  / ___\_____________/       \               \
       | / /    \                     \               \
    ___|/_/   ___\____              ___\____      _____\__
   |D      | |D       |            |D       |    |D       |
   |       | |        |            |        |    |        |
   |       | | fish   |            | fish   |    | fish   |
   |_______| |___\____|            |___\____|    |___\____|
                  \                     \             \
                   \                     \             \
                 ___\____              ___\____      ___\____
                |D       |            |D       |    |D       |
                |        |            |        |    |        |
                | tuna   |            | tuna   |    | book   |
                |___\____|            |___\____|    |_/______|
                     \                     \         /
                      \                     \       /
                    ___\____              ___\____ /
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|
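A rename, then, is just a bubble-up in which the bottom-most new directory keeps the same child node under a new name. Here is a minimal sketch; `Node` and `rename` are names invented for the example, and the roots are held in plain variables rather than a history array.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, eq=False)
class Node:
    kind: str                                     # "F" or "D"
    content: bytes = b""
    entries: dict = field(default_factory=dict)   # name -> child Node

def rename(node, dir_path, old_name, new_name):
    """Return a new tree in which dir_path/old_name is called new_name."""
    entries = dict(node.entries)        # copies pointers, not subtrees
    if not dir_path:
        # Same child node, different name in the new directory.
        entries[new_name] = entries.pop(old_name)
    else:
        name, rest = dir_path[0], dir_path[1:]
        entries[name] = rename(node.entries[name], rest, old_name, new_name)
    return Node("D", entries=entries)

tuna = Node("F", content=b"latest text")
root2 = Node("D", entries={
    "A": Node("D", entries={"fish": Node("D", entries={"tuna": tuna})}),
    "B": Node("D"),
})

root3 = rename(root2, ["A", "fish"], "tuna", "book")

new_fish = root3.entries["A"].entries["fish"]
print(sorted(new_fish.entries))                      # ['book']
print(new_fish.entries["book"] is tuna)              # True
print(root3.entries["B"] is root2.entries["B"])      # True
```

No file content is copied: the file node gains a second parent, and the old tree still knows it as tuna.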
For our last example, we'll demonstrate the way “tags” and “branches” are implemented in the repository.
In a nutshell, they're one and the same thing. Because nodes are so easily shared, we simply create a new directory entry that points to an existing directory node. It's an extremely cheap way of copying a tree; we call this new entry a clone, or more colloquially, a “cheap copy”.
Let's go back to our original tree, assuming that we're at revision 6 to begin with:
       ______________________________________________________
    ...___6_______7________8________9________10_________11_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
Let's “tag” directory A. To make the clone, we create a new dir_entry T in our root, pointing to A's node:
       ______________________________________________________
      |___6_______7________8________9________10_________11_____...
          |        \
          |         \
       ___|_____   __\______
      |D        | |D        |
      |         | |         |
      |   A     | |   A     |
      |    \    | |   |     |
      |   B \   | |  B |  T |
      |__/___\__| |_/__|__|_|
        /     \    /   |  |
       |    ___\__/   /  /
       |   /    \    /  /
    ___|__/   ___\__/_ /
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
Now we're all set. In the future, the contents of directories A and B may change quite a lot. However, assuming we never make any changes to directory T, it will always point to a particular pristine revision of directory A at some point in time. Thus, T is a tag.
(In theory, we can use some kind of authorization system to prevent anyone from writing to directory T. In practice, a well-laid out repository should encourage “tag directories” to live in one place, so that it's clear to all users that they're not meant to change.)
However, if we do decide to allow commits in directory T, and our repository tree increments to revision 8, then T becomes a branch. Specifically, it's a branch of directory A that shares history with A up to a certain point, and then “broke off” from the main line at revision 8.
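Both cases fit in a short sketch. As before, this is an illustration under assumed names (`Node`, `with_entry`, `write_file`), with a dictionary keyed by revision number standing in for the history array. The tag in revision 7 is a single new dir_entry; the commit in revision 8 bubbles up through T only, so A keeps pointing at the original nodes.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True, eq=False)
class Node:
    kind: str                                     # "F" or "D"
    content: bytes = b""
    entries: dict = field(default_factory=dict)   # name -> child Node

def with_entry(directory, name, child):
    """New directory node: the same dir_entries, plus name -> child."""
    return Node("D", entries={**directory.entries, name: child})

def write_file(node, path, content):
    """Bubble-up write: recreate only the nodes along `path`."""
    if not path:
        return Node("F", content=content)
    new_child = write_file(node.entries[path[0]], path[1:], content)
    return with_entry(node, path[0], new_child)

tuna = Node("F", content=b"tuna v1")
dir_a = Node("D", entries={"fish": Node("D", entries={"tuna": tuna})})
revisions = {6: Node("D", entries={"A": dir_a, "B": Node("D")})}

# Revision 7: the tag is one new dir_entry T pointing at A's existing node.
revisions[7] = with_entry(revisions[6], "T", revisions[6].entries["A"])

# Revision 8: a commit under T turns it into a branch; A is left untouched.
revisions[8] = write_file(revisions[7], ["T", "fish", "tuna"], b"tuna on branch")

print(revisions[7].entries["T"] is dir_a)    # True
print(revisions[8].entries["A"] is dir_a)    # True
print(revisions[8].entries["T"] is dir_a)    # False
```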
License — Copyright
Copyright © 2000-2006 Collab.Net. All rights reserved.
This software is licensed as described in the file COPYING, which you should have received as part of this distribution. The terms are also available at http://subversion.tigris.org/license-1.html. If newer versions of this license are posted there, you may use a newer version instead, at your option.