https://git.wiki.kernel.org/index.php/GitSvnComparison
Note: This page is currently a work in progress. It started out as a private email to someone who currently uses Subversion. I decided to make it available and try to extend it further. I'll remove this comment when the page is improved. :) -- Shawn Pearce
Although this page is hosted on a Git-specific wiki, it tries to provide a fair and unbiased comparison of Git and Subversion to help prospective users of both tools better evaluate their choices. This page describes only base Subversion and does not discuss the benefits and drawbacks of using SVK, a distributed wrapper around Subversion.
Git was designed from the ground up as a distributed version control system. Being a distributed version control system means that multiple redundant repositories and branching are first class concepts of the tool.
In a distributed VCS like Git every user has a complete copy of the repository data stored locally, thereby making access to file history extremely fast, as well as allowing full functionality when disconnected from the network. It also means every user has a complete backup of the repository. Have 20 users? You probably have more than 20 complete backups of the repository as some users tend to keep more than one repository for the same project. If any repository is lost due to system failure only the changes which were unique to that repository are lost. If users frequently push and fetch changes with each other this tends to be a small amount of loss, if any.
In a centralized VCS like Subversion, only the central repository has the complete history. This means that users must communicate over the network with the central repository to obtain history about a file. Backups must be maintained independently of the VCS. If the central repository is lost due to system failure, it must be restored from backup, and changes made since the last backup are likely to be lost. Depending on the backup policies in place, this could be several man-weeks' worth of work.
(Note that even SVK doesn't do quite the same thing as git. SVK downloads a complete history and allows disconnected commits, but there is still a unique "upstream" repository. Two SVK users can't merge with each other and then push the changes to the upstream.)
Due to being distributed, you inherently do not have to give commit access to other people in order for them to use the versioning features. Instead, you decide when to merge what from whom.
For example, because Subversion controls access centrally, a user needs commit access before being able to make daily check-ins. In Git, users have version control over their own work, while the source is controlled by the repository owner.
(There are different mechanisms of control in case you do want a repository that multiple people can push to. -not covered here yet-)
Branches in Git are a core concept used every day by every user. In Subversion they are more cumbersome and often used sparingly.
The reason branches are so central to Git is that every developer's working directory is itself a branch. Even if two developers are modifying two different, unrelated files at the same time, it's easy to view these two working directories as different branches stemming from the same common base revision of the project.
Consequently Git:
This is different to Subversion's handling of branches. As of Subversion 1.5:
Git is extremely fast. Since all operations (except for push and fetch) are local there is no network latency involved to:
FIXME: Include actual comparisons, e.g. load Git code into both Git and SVN.
Git's repository and working directory sizes are extremely small when compared to SVN.
For example, the Mozilla repository is reported to be almost 12 GB when stored in SVN using the fsfs backend. Previously, the fsfs backend also required over 240,000 files in one directory to record all 240,000 commits made over the 10-year project history. This was fixed in SVN 1.5, where every 1000 revisions are placed in a separate directory. The exact same history is stored in Git in only two files totaling just over 420 MB. This means that SVN requires roughly 30x the disk space to store the same history.
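The ratio quoted above can be checked with a few lines of arithmetic. This is only a sketch using the two sizes reported in the text; treating 1 GB as 1024 MB is an assumption, and the variable names are illustrative.

```python
# Sizes as reported in the text; binary units (1 GB = 1024 MB) are an assumption.
svn_size_mb = 12 * 1024  # "almost 12 GB" stored in SVN (fsfs backend)
git_size_mb = 420        # "just over 420 MB" stored in Git

# How many times larger the SVN repository is than the Git one.
ratio = svn_size_mb / git_size_mb
print(f"SVN needs about {ratio:.1f}x the disk space of Git")
```

Because both inputs are rounded ("almost" and "just over"), the result is best read as "close to 30x" rather than as an exact figure.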
One of the reasons for the smaller repo size is that an SVN working directory always contains two copies of each file: one for the user to actually work with and another hidden in .svn/ to aid operations such as status, diff and commit. In contrast a Git working directory requires only one small index file that stores about 100 bytes of data per tracked file. On projects with a large number of files this can be a substantial difference in the disk space required per working copy.
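To get a feel for the per-working-copy difference described above, here is a rough back-of-the-envelope sketch. The file count and average file size are hypothetical, the function names are made up for illustration, and the model deliberately ignores other metadata (and, for Git, the repository stored in .git); it only compares the second copy of each file kept by SVN with an index of about 100 bytes per tracked file.

```python
def svn_pristine_copy_bytes(total_file_bytes: int) -> int:
    """Extra space for the second copy of every file kept under .svn/."""
    return total_file_bytes


def git_index_bytes(num_files: int, bytes_per_entry: int = 100) -> int:
    """Approximate index size at roughly 100 bytes per tracked file."""
    return num_files * bytes_per_entry


# Hypothetical project: these two numbers are assumptions, not measurements.
num_files = 50_000
avg_file_bytes = 8 * 1024

svn_extra = svn_pristine_copy_bytes(num_files * avg_file_bytes)
git_extra = git_index_bytes(num_files)

print(f"SVN pristine copies: {svn_extra / 1024**2:.1f} MiB")
print(f"Git index:           {git_extra / 1024**2:.1f} MiB")
```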
As a full Git clone is often smaller than a full checkout, Git working directories (including the repositories) are typically smaller than the corresponding SVN working directories. There are even ways in Git to share one repository across many working directories, but in contrast to SVN, this requires the working directories to be colocated.
Subversion can easily be configured to convert line endings automatically to CRLF or LF, depending on the native line ending used by the client's operating system. This conversion feature is useful when Windows and UNIX users collaborate on the same source code. It is also possible to configure a fixed line ending independent of the native operating system: files such as a Makefile must use only LFs, even when they are accessed from Windows. This can be set in a global config and overridden in user configs. Subversion also allows the user to specify line ending conversion on a file-by-file basis. Binary files are checked in with a binary flag (as with CVS, except that SVN almost always sets it automatically) and as such are never converted or keyword-substituted. Subversion prints, for every added file, whether it recognized the file as binary; if the detection is wrong and the user does not set the binary flag when adding the file, binary content might get corrupted.
Git versions prior to 1.5.1, by contrast, never convert files and always assume that every file is opaque and should not be modified. Git 1.5.1 and onwards make line ending conversion configurable. Git's advantage over Subversion is that you do not have to manually specify which files this conversion should be applied to; it happens automatically (hence autocrlf).
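The idea of automatic line ending conversion can be sketched in a few lines of Python. This is not Git's or Subversion's actual implementation: the NUL-byte test, the 8000-byte sample size, and the function names are assumptions chosen purely for illustration.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Illustrative heuristic: a NUL byte near the start suggests binary content."""
    return b"\0" in data[:sample_size]


def to_lf(data: bytes) -> bytes:
    """Normalize CRLF to LF for text; leave binary content untouched."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert text to CRLF; leave binary content untouched."""
    if looks_binary(data):
        return data
    # Normalize to LF first so that existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


text = b"all:\r\n\techo hi\r\n"
image = b"\x89PNG\r\n\x1a\n\x00\x00"

print(to_lf(text))             # text is normalized
print(to_lf(image) == image)   # binary content is passed through unchanged
```

The sketch also shows why a wrong binary/text guess is dangerous: if the image bytes were treated as text, the CRLF pair inside them would be rewritten and the file corrupted.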
Subversion has some notable features that Git currently doesn't have or will never have.
Currently Subversion has a wider range of user interface tools than Git. For example there are SVN plugins available for most popular IDEs. There is a Windows Explorer shell extension. There are a number of native Windows and Mac OS X GUI tools available in ready-to-install packages.
Git's primary user interface is the command line. There are two graphical interfaces: git-gui (distributed with Git) and qgit, which is making great strides towards providing another feature-complete graphical interface. Also gitk, the graphical history browser, can be more than just a fancy log reader. git-gui and gitk usually work out of the box on common operating systems, and qgit is being ported to Qt4, which improves its portability. There are some user interface tools in development for Git, namely TortoiseGit, a port of TortoiseSVN. There is also Git Extensions, another Explorer shell extension.
Since Subversion only supports a single repository there is little doubt about where something is stored. Once a user knows the repository URL they can reasonably assume that all materials and all branches related to that project are always available at that location. Backup to tape/CD/DVD is also simple as there is exactly one location that needs to be backed up regularly.
Since Git is distributed by nature not everything related to a project may be stored in the same location. Therefore there may be some degree of confusion about where to obtain a particular branch, unless repository location is always explicitly specified. There may also be some confusion about which repositories are backed up to tape/CD/DVD regularly, and which aren't.
Since Subversion has a single central repository it is possible to specify read and write access controls in a single location and have them be enforced across the entire project.
Subversion can be used with binary files (they are detected automatically; if that detection fails, you have to mark the file as binary yourself), just like Git.
With Git, however, the default is to treat files as binary to begin with. If you _have_ to have CR+LF line endings (even though most modern programs grok the saner LF-only line endings just fine), you have to tell Git so. Git will then autodetect whether a file is text (just like Subversion) and act accordingly. As in Subversion, you can correct an erroneous autodetection, in this case by setting a git attribute.
In an earlier version of git [number?] seemingly minor changes to binary files, such as adjusting brightness on an image, could be different enough that Git interprets them as a new file, causing the content history to split. Since Subversion tracks by file, history for such changes is maintained.
With Subversion, you can check out just a subdirectory of a repository. This is not possible with Git. For a large project, this means that you always have to download the whole repository, even if you only need the current version of some subdirectory. As long as fast Internet connections are available mainly in cities and traffic over mobile connections is expensive, Git can cost much more time and money in rural areas or on mobile devices. This is arguably mitigated by the small size of Git repositories.
In other cases, requirements other than the raw repository size provide the motivation for wanting a partial checkout, e.g. access control (you can't restrict read access to part of the repository with Git) or directory layout requirements. There is no general solution for this problem other than to split the original Git repository into multiple repositories and then clone one of the new repositories. (Git subprojects can mitigate some of the difficulties of managing the collection of new repositories.)
First, as SVN assigns revision numbers sequentially (starting from 1) even very old projects such as Mozilla have short unique revision numbers (Mozilla is only up to 6 digits in length). Many users find this convenient when entering revisions for historical research purposes. They also find this number easy to embed into their product, supposedly making it easy to determine which sources were used to create a particular executable. However since the revision number is global to the entire repository, including all branches, there is still a question of which branch the revision number corresponds to.
The exception is when the last committed revision is recorded: since revisions are global to a repository, the last committed revision makes it possible to determine which branch was used.
As Git uses a SHA1 to uniquely identify a commit, each specific revision can only be described by a 40-character hexadecimal string. However, this string identifies not only the revision but also, through the history it records, the branch it came from. In practice the first 8 characters tend to be unique for a project; however, most users try not to rely on this over the long term. Rather than embedding long commit SHA1s into executables, Git users generate a uniquely named tag. This is an additional step, but a simple one.
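The point about abbreviated identifiers can be explored with a small sketch. The IDs below are not real Git commits: they are SHA-1 digests of made-up strings, and both helper functions are hypothetical names introduced only for this illustration. The sketch finds the shortest prefix length at which every ID in a set is still distinguishable.

```python
import hashlib
from typing import Iterable


def fake_commit_id(payload: str) -> str:
    """Stand-in for a commit ID: the 40-character SHA-1 hex digest of some text."""
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def shortest_unique_prefix_len(ids: Iterable[str], minimum: int = 4) -> int:
    """Smallest prefix length (>= minimum) at which all IDs are distinct."""
    unique_ids = set(ids)
    if not unique_ids:
        return minimum
    longest = max(len(i) for i in unique_ids)
    for length in range(minimum, longest + 1):
        if len({i[:length] for i in unique_ids}) == len(unique_ids):
            return length
    return longest


# 10,000 made-up "commits"; a real project would read its IDs from the repository.
ids = [fake_commit_id(f"commit number {n}") for n in range(10_000)]
n = shortest_unique_prefix_len(ids)
print(f"{len(ids)} IDs are unique after {n} hex characters")
```

A prefix that is unique today can stop being unique as more commits are added, which is why relying on short prefixes over the long term is discouraged.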
Secondly, SVN's revision numbers are predictable. If the current commit is 435, the next one will be 436. It is then very easy to step through a few sequential revisions, e.g. to look at differences or to revert to an old revision to find when a regression was introduced. Furthermore, without looking up any additional information, you know that commit 436 was made after 435. Similar actions and knowledge in Git require looking at the log.
Git provides shorthand syntax to partially compensate for this by allowing you to add any number of ^ after a revision to indicate how far back to go. e8fa9c^^^..e8fa9c, for instance, would show the history from e8fa9c back to, but not including, its third-generation ancestor e8fa9c^^^; in a linear history that is e8fa9c and its two preceding revisions. (However, Git does not provide any shorthand syntax for going forward in time.)
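How the ^ suffix and the A..B range behave can be modeled on a toy history. This is a simplified sketch, not Git's implementation: the history is strictly linear, all commit IDs other than e8fa9c are made up, and the function names are hypothetical. Each ^ follows one parent link, and A..B selects the commits reachable from B that are not reachable from A.

```python
# Toy linear history: each commit maps to its parent (None for the root).
# All IDs except e8fa9c are invented for this illustration.
PARENT = {
    "e8fa9c": "d41c07",
    "d41c07": "b7a3e2",
    "b7a3e2": "9f0c11",
    "9f0c11": None,
}


def resolve(rev: str) -> str:
    """Resolve 'id^^...' by following one parent link per caret."""
    base = rev.rstrip("^")
    commit = base
    for _ in range(len(rev) - len(base)):
        commit = PARENT[commit]
        if commit is None:
            raise ValueError(f"{rev} goes back past the root commit")
    return commit


def ancestors(commit):
    """Return the commit and all of its ancestors, newest first."""
    found = []
    while commit is not None:
        found.append(commit)
        commit = PARENT[commit]
    return found


def rev_range(spec: str):
    """Commits reachable from the right side of 'A..B' but not from the left."""
    left, right = spec.split("..")
    excluded = set(ancestors(resolve(left)))
    return [c for c in ancestors(resolve(right)) if c not in excluded]


print(resolve("e8fa9c^^^"))
print(rev_range("e8fa9c^^^..e8fa9c"))
```

In this model the range contains three commits: e8fa9c itself and the two commits before it. The commit named on the left of the two dots is excluded, and there is no corresponding operator that walks from a commit to its children.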
The benefits of Git's branch and merge handling that are mentioned above come with a downside that can occasionally surface: they slightly restrict your freedom. For instance, it is not possible to commit to two branches at once with Git, but it is in Subversion. There are other restrictions as well; [[1]] has a brief, unfortunately not-terribly-informative discussion of some of the issues that the team working on cvs2git ran into in terms of Git being less flexible than CVS and Subversion.
[[2]] has another brief discussion, including this quote: "CVS allows a branch or tag to be created from arbitrary combinations of source revisions from multiple source branches. It even allows file revisions that were never contemporaneous to be added to a single branch/tag. Git, on the other hand, only allows the full source tree, as it existed at some instant in the history, to be branched or tagged as a unit. Moreover, the ancestry of a git revision makes implications about the contents of that revision. This difference means that it is fundamentally impossible to represent an arbitrary CVS history in a git repository 100% faithfully."
As demonstrated by the large number of large projects that use Git for version control, none of these operations are at all essential; however, it's certainly believable that there are use cases where Git's restrictions are an actual impediment. Perhaps more seriously, these restrictions mean that switching from CVS or Subversion to Git may result in a loss of fidelity in terms of the history of the project.
However, in most cases the extra freedom in creating revisions and tags in Subversion may bring nothing but a possibility of mishandling and confusion.
Provided as reference, until this page is cleaned up.
The key things that I like about Git are: