There are two object abstractions: the "object database", and the
"current directory cache" aka "index".
The Object Database
~~~~~~~~~~~~~~~~~~~
The object database is literally just a content-addressable collection
of objects. All objects are named by their content, which is
approximated by the SHA1 hash of the object itself. Objects may refer
to other objects (by referencing their SHA1 hash), and so you can
build up a hierarchy of objects.
All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (i.e. how it is used, and how it can refer to other
objects). There are currently four different object types: "blob",
"tree", "commit" and "tag".
A "blob" object cannot refer to any other object, and is, as the type name
implies, a pure storage object containing some user data. It is used to
actually store the file data, i.e. a blob object is associated with some
particular version of some file.
A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.
A "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to one or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.
As a special case, a commit object with no parents is called the "root"
object, and marks the initial commit of a project. Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people. So aim for the notion of "one root object
per project", even if git itself does not enforce that.
A "tag" object symbolically identifies and can be used to sign other
objects. It contains the identifier and type of another object, a
symbolic name (of course!) and, optionally, a signature.
Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
that is used to name the object is the hash of the original,
uncompressed data, header included.
(Historical note: in the dawn of the age of git, the hash
was the SHA1 of the _compressed_ object.)
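To make the layout concrete, here is a minimal Python sketch of naming and storing an object. It assumes, as current git does, that the SHA1 is taken over the uncompressed header plus data and that those same bytes are then deflated; `encode_object` is a hypothetical helper, not a git function.

```python
import hashlib
import zlib


def encode_object(obj_type: str, data: bytes) -> tuple[str, bytes]:
    """Return (hex_name, deflated_bytes) for an object of the given type.

    Hypothetical helper: mirrors the header layout described in the text.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = header + data
    name = hashlib.sha1(raw).hexdigest()  # named by the uncompressed bytes
    stored = zlib.compress(raw)           # what would be written to disk
    return name, stored


name, stored = encode_object("blob", b"hello\n")
print(name)
print(zlib.decompress(stored)[:7])  # the "blob 6" header and its NUL byte
```

With this layout an empty blob gets the name `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same name current git gives an empty file.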
As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.
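The two checks can be sketched as follows, assuming the object is already in memory as its deflated bytes and that the name covers the uncompressed header plus data. `verify_object` is a hypothetical helper; it checks only general consistency and never looks inside the data itself.

```python
import hashlib
import zlib


def verify_object(name: str, stored: bytes) -> tuple[str, bytes]:
    """Check an object's general consistency and return (type, data).

    Hypothetical helper; raises ValueError on an inconsistent object.
    """
    raw = zlib.decompress(stored)  # (b) must inflate; raises zlib.error if not
    if hashlib.sha1(raw).hexdigest() != name:
        raise ValueError("(a) hash does not match content")
    header, nul, data = raw.partition(b"\0")
    tag, space, size = header.partition(b" ")
    if not nul or not space or not tag or not size.isdigit():
        raise ValueError("(b) malformed header")
    if int(size) != len(data):
        raise ValueError("(b) size field does not match data length")
    return tag.decode("ascii"), data


raw = b"blob 6\0hello\n"
stored = zlib.compress(raw)
print(verify_object(hashlib.sha1(raw).hexdigest(), stored))
```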
The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
the "git-fsck-cache" program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).
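The connectivity part of such a check boils down to a graph walk. This sketch assumes a hypothetical, already-parsed store mapping each object name to the names it refers to; `git-fsck-cache` itself also has to inflate and parse every object, which is skipped here.

```python
def missing_references(objects: dict[str, list[str]]) -> set[str]:
    """Names that some object refers to but that are not in the store."""
    return {ref for refs in objects.values() for ref in refs if ref not in objects}


def reachable(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """All stored objects reachable from `roots` by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen or name not in objects:
            continue
        seen.add(name)
        stack.extend(objects[name])
    return seen


# Hypothetical, already-parsed store: name -> names it refers to.
objects = {
    "commit1": ["tree1"],
    "tree1": ["blob1", "blob2"],
    "blob1": [],
    "stray": [],
}
print(missing_references(objects))                     # blob2 is referenced but absent
print(set(objects) - reachable(objects, ["commit1"]))  # stray is unreachable
```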
The object types in some more detail:
Blob Object
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
verification of the data, so while the object is consistent (it _is_
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
contents").
In particular, since the blob is entirely defined by its data, if two
files in a directory tree (or in multiple different versions of the
repository) have the same contents, they will share the same blob
object. The object is totally independent of its location in the
directory tree, and renaming a file does not change the object that
file is associated with in any way.
A blob is typically created when link:git-update-cache.html[git-update-cache]
is run, and its data can be accessed by link:git-cat-file.html[git-cat-file].
Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
is a list of mode/name/blob data, sorted by name. Alternatively, the
mode data may specify a directory mode, in which case instead of
naming a blob, that name is associated with another TREE object.
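A sketch of serializing such a list, assuming the entry layout current git uses (`<octal mode> <name>`, a NUL byte, then the 20-byte binary SHA1) and its rule that a subdirectory sorts as if its name ended in "/". The helper names are hypothetical.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_tree(entries: list[tuple[str, str, str]]) -> str:
    """entries: (octal mode, name, hex SHA1 of a blob or subtree)."""
    def sort_key(entry: tuple[str, str, str]) -> str:
        mode, name, _ = entry
        # Assumption borrowed from current git: a subtree sorts as if
        # its name ended in "/".
        return name + "/" if mode == "40000" else name

    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=sort_key)
    )
    return hash_object("tree", body)


blob = hash_object("blob", b"hello\n")
src = make_tree([("100644", "main.c", blob)])
root = make_tree([("40000", "src", src), ("100644", "README", blob)])
print(root)
```

Because the entries are sorted before hashing, the order in which they are handed in does not matter, and identical directory contents always produce the same tree name.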
Like the "blob" object, a tree object is uniquely determined by the
contents it holds, and so two separate but identical trees will always
share the exact same object. This is true at all levels, i.e. it's
true for a "leaf" tree (which does not refer to any other trees, only
blobs) as well as for a whole subdirectory.
For that reason a "tree" object is just a pure data abstraction: it
has no history, no signatures, no verification of validity, except
that since the contents are again protected by the hash itself, we can
trust that the tree is immutable and its contents never change.
So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
contents _came_ from.
Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
actually having to unpack two trees. Just ignore all common parts,
and your diff will look right. In other words, you can effectively
(and efficiently) tell the difference between any two random trees in
O(n) time, where "n" is the size of the difference, rather than the size of
the tree.
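A sketch of that merge walk, using a hypothetical in-memory `store` and short placeholder names instead of real SHA1s. Entries are sorted by plain name here (git's "/" tie-break for directories is ignored). The key point is the `old == new` test: an identical subtree is never opened, so the work is confined to the parts that differ.

```python
def diff_trees(store, old, new, prefix=""):
    """Return (path, old_sha, new_sha) for every entry that differs.

    `store` maps a tree name to its entries, each (name, sha, is_tree),
    sorted by name. None marks the side where a path does not exist.
    """
    if old == new:
        return []  # identical subtree: nothing to open
    a, b = store[old], store[new]
    changes = []
    i = j = 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            name, sha, _ = a[i]
            changes.append((prefix + name, sha, None))  # only in the old tree
            i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            name, sha, _ = b[j]
            changes.append((prefix + name, None, sha))  # only in the new tree
            j += 1
        else:
            name, old_sha, old_is_tree = a[i]
            _, new_sha, new_is_tree = b[j]
            if old_sha != new_sha:  # same name and same hash: skipped outright
                if old_is_tree and new_is_tree:
                    changes += diff_trees(store, old_sha, new_sha, prefix + name + "/")
                else:
                    changes.append((prefix + name, old_sha, new_sha))
            i += 1
            j += 1
    return changes


store = {
    "T1": [("Makefile", "m1", False), ("src", "S1", True)],
    "S1": [("a.c", "a1", False), ("b.c", "b1", False)],
    "T2": [("Makefile", "m1", False), ("src", "S2", True)],
    "S2": [("a.c", "a1", False), ("b.c", "b2", False), ("c.c", "c1", False)],
}
print(diff_trees(store, "T1", "T2"))
```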
Side note 2 on trees: since the name of a "blob" depends entirely and
exclusively on its contents (i.e. there are no names or permissions
involved), you can see trivial renames or permission changes by
noticing that the blob stayed the same. However, renames with data
changes need a smarter "diff" implementation.
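The trivial case can be sketched by matching vanished paths against newly appeared paths that carry the same blob name. The function name and the flattened path-to-blob maps are assumptions for illustration; a file that was renamed _and_ edited is deliberately not found, which is exactly where a smarter diff is needed.

```python
def trivial_renames(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Pair paths that vanished with paths that appeared carrying the same blob.

    `old` and `new` map path -> blob name for two (flattened) trees.
    """
    vanished: dict[str, list[str]] = {}
    for path, blob in sorted(old.items()):
        if path not in new:
            vanished.setdefault(blob, []).append(path)
    renames = []
    for path, blob in sorted(new.items()):
        if path not in old and vanished.get(blob):
            renames.append((vanished[blob].pop(0), path))
    return renames


old = {"README": "b1", "foo.c": "b2", "util.c": "b3"}
new = {"README": "b1", "bar.c": "b2", "helpers.c": "b4"}  # util.c renamed and edited
print(trivial_renames(old, new))
```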
A tree is created with link:git-write-tree.html[git-write-tree] and
its data can be accessed by link:git-ls-tree.html[git-ls-tree].
Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
history into the picture. In contrast to the other objects, it
doesn't just describe the physical state of a tree, it describes how
we got there, and why.
A "commit" is defined by the tree-object that it results in, the
parent commits (zero, one or more) that led up to that point, and a
comment on what happened. Again, a commit is not trusted per se:
the contents are well-defined and "safe" due to the cryptographically
strong signatures at all levels, but there is no reason to believe
that the tree is "good" or that the merge information makes sense.
The parents do not have to actually have any relationship with the
result, for example.
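A sketch of what a commit body can look like. The `tree` and `parent` lines follow directly from the description above; the `author`/`committer` lines and their timestamp format are modelled on current git and are an assumption about the version described here. The identity string is hypothetical.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_commit(tree: str, parents: list[str], ident: str, message: str) -> tuple[str, bytes]:
    """Build a commit body; layout modelled on what current git writes."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parents: a root commit
    lines += [f"author {ident}", f"committer {ident}"]
    body = ("\n".join(lines) + "\n\n" + message).encode()
    return hash_object("commit", body), body


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ident = "A U Thor <author@example.com> 1112911993 -0700"  # hypothetical
root, _ = make_commit(EMPTY_TREE, [], ident, "Initial commit\n")
child, body = make_commit(EMPTY_TREE, [root], ident, "Second commit\n")
print(body.decode())
```

Note that nothing here checks that the parent has any relationship to the tree: the object format only records the claim.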
Note on commits: unlike real SCMs, commits do not contain
rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.
A commit is created with link:git-commit-tree.html[git-commit-tree] and
its data can be accessed by link:git-cat-file.html[git-cat-file].
Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
everything is hashed with SHA1, you _can_ trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state - just not a state that
you may want to trust.
Furthermore, since the SHA1 signature of a commit refers to the
SHA1 signatures of the tree it is associated with and the signatures
of the parent, a single named commit specifies uniquely a whole set
of history, with full contents. You can't later fake any step of the
way once you have the name of a commit.
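The effect can be demonstrated with a chain of simplified commits (only `tree` and `parent` lines plus a message; the tree names are made-up stand-ins). Rewriting any early step changes that commit's name, which changes its child's `parent` line, and so on up to the tip.

```python
import hashlib


def commit_name(tree: str, parent, message: str) -> str:
    """Name of a simplified commit whose hashed bytes include its parent's name."""
    text = f"tree {tree}\n"
    if parent is not None:
        text += f"parent {parent}\n"
    data = (text + "\n" + message).encode()
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()


def tip_of(trees: list[str]) -> str:
    """Chain one commit per tree and return the name of the last one."""
    tip = None
    for step, tree in enumerate(trees):
        tip = commit_name(tree, tip, f"step {step}\n")
    return tip


# Hypothetical tree names standing in for real snapshots.
trees = [hashlib.sha1(b"snapshot %d" % i).hexdigest() for i in range(5)]
honest = tip_of(trees)
forged = list(trees)
forged[1] = hashlib.sha1(b"forged snapshot").hexdigest()  # rewrite an early step
print(tip_of(forged) != honest)  # the forgery shows up in the tip's name
```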
So to introduce some real trust in the system, the only thing you need
to do is to digitally sign just _one_ special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.
In other words, you can easily validate a whole archive by just
sending out a single email that tells the people the name (SHA1 hash)
of the top commit, and digitally sign that email using something
like GPG/PGP.
To assist in this, git also provides the tag object...
Tag Object
~~~~~~~~~~
Git provides the "tag" object to simplify creating, managing and
exchanging symbolic and signed tokens. In its simplest form, the "tag"
object symbolically identifies another object by containing the
sha1, type and symbolic name.
However, it can optionally contain additional signature information
(which git doesn't care about as long as there's less than 8k of
it). This can then be verified externally to git.
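A sketch of a tag body, loosely modelled on the fields listed above: object name, object type, symbolic name, then an opaque signature that git never interprets. Current git also writes a `tagger` line, omitted here; applying the 8k limit to the whole body is a simplifying assumption, and the commit name and signature text are hypothetical.

```python
import hashlib

MAX_TAG_SIZE = 8192  # the text's "less than 8k", applied to the whole body


def make_tag(obj: str, obj_type: str, tag: str, signature: str = "") -> tuple[str, bytes]:
    """Build a tag body: object name, its type, the symbolic name, then an
    optional opaque signature block."""
    body = f"object {obj}\ntype {obj_type}\ntag {tag}\n".encode()
    if signature:
        body += b"\n" + signature.encode()
    if len(body) >= MAX_TAG_SIZE:
        raise ValueError("tag object too large")
    header = f"tag {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest(), body


commit = "0123456789abcdef0123456789abcdef01234567"  # hypothetical commit name
name, body = make_tag(commit, "commit", "v0.1", "hypothetical signature block\n")
print(body.decode())
```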
Note that despite the tag features, "git" itself only handles content
integrity; the trust framework (and signature provision and
verification) has to come from outside.
A tag is created with link:git-mktag.html[git-mktag] and
its data can be accessed by link:git-cat-file.html[git-cat-file].
The "index" aka "Current Directory Cache"
-----------------------------------------
The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time. It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together. The cache is
always kept ordered by name, and names are unique (with a few very
specific rules) at any point in time, but the cache has no long-term
meaning, and can be partially updated at any time.
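An in-memory sketch of that ordered array, keeping only a name, modification time, mode and blob name per entry. The real index is a binary file with more fields, and the "very specific rules" on duplicate names (used during merges) are not modelled; all names here are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: str
    mtime: float
    mode: int
    blob: str  # name of the "blob" object holding this path's contents


class Index:
    """Toy index: entries kept ordered by name, one entry per name."""

    def __init__(self) -> None:
        self._names: list[str] = []  # sorted, unique
        self._by_name: dict[str, IndexEntry] = {}

    def update(self, entry: IndexEntry) -> None:
        # Partial update: add or replace a single path, keeping the order.
        if entry.name not in self._by_name:
            bisect.insort(self._names, entry.name)
        self._by_name[entry.name] = entry

    def remove(self, name: str) -> None:
        if name in self._by_name:
            del self._by_name[name]
            self._names.pop(bisect.bisect_left(self._names, name))

    def entries(self) -> list[IndexEntry]:
        return [self._by_name[n] for n in self._names]


idx = Index()
idx.update(IndexEntry("src/main.c", 0.0, 0o100644, "b1"))
idx.update(IndexEntry("README", 0.0, 0o100644, "b2"))
idx.update(IndexEntry("README", 1.0, 0o100644, "b3"))  # replaces, no duplicate
print([e.name for e in idx.entries()])
```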
In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index _not_ be consistent with the directory
hierarchy), but it has three very important attributes:
'(a) it can re-generate the full state it caches (not just the
directory structure: it contains pointers to the "blob" objects so
that it can regenerate the data too)'
As a special case, there is a clear and unambiguous one-way mapping
from a current directory cache to a "tree object", which can be
efficiently created from just the current directory cache without
actually looking at any other data. So a directory cache at any one
time uniquely specifies one and only one "tree" object (but has
additional data to make it easy to match up that tree object with what
has happened in the directory).
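The one-way mapping can be sketched like this: group the flat index paths by their first component, recurse into subdirectories, and hash each level bottom-up. It assumes the tree entry layout of current git and no path that is both a file and a directory; only index data is read, never file contents. The helper names are hypothetical.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def write_tree(index: dict[str, tuple[str, str]]) -> str:
    """Map flat index paths (path -> (mode, blob name)) to a root tree name.

    Only the index itself is consulted; no file contents are read.
    Assumes no path is both a file and a directory.
    """
    children = {}
    for path, entry in index.items():
        head, slash, rest = path.partition("/")
        if slash:
            children.setdefault(head, {})[rest] = entry  # belongs to a subtree
        else:
            children[head] = entry
    entries = []
    for name, value in children.items():
        if isinstance(value, dict):
            entries.append(("40000", name, write_tree(value)))  # recurse
        else:
            mode, blob = value
            entries.append((mode, name, blob))
    entries.sort(key=lambda e: e[1] + "/" if e[0] == "40000" else e[1])
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in entries
    )
    return hash_object("tree", body)


blob = hash_object("blob", b"hello\n")
index = {"src/main.c": ("100644", blob), "README": ("100644", blob)}
print(write_tree(index))
```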
'(b) it has efficient methods for finding inconsistencies between that
cached state ("tree object waiting to be instantiated") and the
current state.'
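A sketch of the cheap-then-expensive check, assuming each entry caches the file's modification time (in nanoseconds) and size next to its blob name. Real git records more stat information than this; the helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def blob_name(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def changed_paths(index: dict[str, tuple[int, int, str]], root: str) -> list[str]:
    """index: path -> (mtime_ns, size, blob name) cached when the path was added."""
    changed = []
    for path, (mtime_ns, size, blob) in sorted(index.items()):
        full = os.path.join(root, path)
        try:
            st = os.stat(full)
        except FileNotFoundError:
            changed.append(path)  # removed from the working directory
            continue
        if st.st_mtime_ns == mtime_ns and st.st_size == size:
            continue  # cached stat data still matches: assume unchanged
        with open(full, "rb") as f:
            if blob_name(f.read()) != blob:  # stat differs: confirm by hashing
                changed.append(path)
    return changed


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "hello.txt")
with open(path, "wb") as f:
    f.write(b"hello\n")
st = os.stat(path)
index = {"hello.txt": (st.st_mtime_ns, st.st_size, blob_name(b"hello\n"))}

os.utime(path, ns=(1, 1))  # touched, but contents unchanged
after_touch = changed_paths(index, workdir)
with open(path, "wb") as f:
    f.write(b"hello, world\n")  # contents really changed
after_edit = changed_paths(index, workdir)
print(after_touch, after_edit)
```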
'(c) it can additionally efficiently represent information about merge
conflicts between different tree objects, allowing each pathname to be
associated with sufficient information about the trees involved that
you can create a three-way merge between them.'
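A sketch of how one path can carry several versions at once, using git's stage numbers: 0 for a merged entry, 1 for the common ancestor, 2 and 3 for the two sides. The trivial-resolution rules below are a simplified assumption rather than git-read-tree's exact behaviour, and the trees are flattened to path-to-blob maps with placeholder names.

```python
def three_way_index(base, ours, theirs):
    """Each argument maps path -> blob name. Returns sorted (path, stage, blob)."""
    entries = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (possibly both deleted it)
        elif o == b:
            result = t  # only "theirs" changed it
        elif t == b:
            result = o  # only "ours" changed it
        else:
            # Real conflict: keep every version that exists, one per stage.
            for stage, blob in ((1, b), (2, o), (3, t)):
                if blob is not None:
                    entries.append((path, stage, blob))
            continue
        if result is not None:
            entries.append((path, 0, result))
    return entries


base = {"a": "a1", "b": "b1", "c": "c1", "d": "d1"}
ours = {"a": "a2", "b": "b1", "c": "c2"}               # ours also deleted d
theirs = {"a": "a1", "b": "b2", "c": "c3", "d": "d1"}
for entry in three_way_index(base, ours, theirs):
    print(entry)
```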
Those are the ONLY three things that the directory cache does. It's a
cache, and the normal operation is to re-generate it completely from a
known tree object, or update/compare it with a live tree that is being
developed. If you blow the directory cache away entirely, you generally
haven't lost any information as long as you have the name of the tree
that it described.
At the same time, the directory index is also the staging area
for creating new trees, and creating a new tree always
involves a controlled modification of the index file. In particular,
the index file can have the representation of an intermediate tree that
has not yet been instantiated. So the index can be thought of as a
write-back cache, which can contain dirty information that has not yet
been written back to the backing store.