Swift Architectural Overview

Proxy Server

The Proxy Server is responsible for tying together the rest of the Swift architecture. For each request, it will look up the location of the account, container, or object in the ring (see below) and route the request accordingly. For Erasure Code type policies, the Proxy Server is also responsible for encoding and decoding object data. See Erasure Code Support for complete information on Erasure Code support. The public API is also exposed through the Proxy Server.

A large number of failures are also handled in the Proxy Server. For example, if a server is unavailable for an object PUT, it will ask the ring for a handoff server and route there instead.
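
The primary-then-handoff routing described above can be sketched as follows. `ToyRing` and `route_put` are hypothetical names with a deliberately naive placement scheme used only for illustration; the real ring (described below) places replicas far more carefully, though `get_nodes` and `get_more_nodes` mirror the names of the actual ring methods:

```python
class ToyRing:
    """A toy stand-in for Swift's ring: maps a partition to devices."""

    def __init__(self, devices, replicas=3):
        self.devices = devices      # e.g. ["dev0", "dev1", ...]
        self.replicas = replicas

    def get_nodes(self, part):
        # Primary devices for a partition (toy placement: consecutive slots).
        return [self.devices[(part + i) % len(self.devices)]
                for i in range(self.replicas)]

    def get_more_nodes(self, part):
        # Handoff devices: every non-primary device, in a deterministic order.
        primaries = set(self.get_nodes(part))
        for i in range(len(self.devices)):
            dev = self.devices[(part + i) % len(self.devices)]
            if dev not in primaries:
                yield dev


def route_put(ring, part, node_is_up):
    """Pick a device for a PUT: primaries first, then handoffs."""
    for node in ring.get_nodes(part):
        if node_is_up(node):
            return node
    for node in ring.get_more_nodes(part):
        if node_is_up(node):
            return node
    raise RuntimeError("no device available for partition %d" % part)
```

With all devices up, the first primary is chosen; when primaries are unreachable, the request falls through to a handoff device exactly as the paragraph describes.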

When objects are streamed to or from an object server, they are streamed directly through the proxy server to or from the user – the proxy server does not spool them.

The Ring

A ring represents a mapping between the names of entities stored on disk and their physical location. There are separate rings for accounts, containers, and one object ring per storage policy. When other components need to perform any operation on an object, container, or account, they need to interact with the appropriate ring to determine its location in the cluster.

环表示存储在硬盘上的实体名称和物理位置间的映射。帐号、容器、对象都有相应的环。当swift的其它组件(比如复制)要对帐号、容器或对象操作时,需要查询相应的环来确定它在集群上的位置。

The Ring maintains this mapping using zones, devices,partitions, and replicas. Each partition in the ring is replicated, by default,3 times across the cluster, and the locations for a partition are stored in themapping maintained by the ring. The ring is also responsible for determiningwhich devices are used for handoff in failure scenarios.

环使用区域、设备、虚节点和副本来维护这些映射信息。环中每个虚节点在集群中都(默认)3个副本。每个虚节点的位置由环来维护,并存储在映射中。当代理服务器转发的客户端请求失败时,环也负责决定由哪一个设备来接手请求。

The replicas of each partition will be isolated onto asmany distinct regions, zones, servers and devices as the capacity of thesefailure domains allow. If there are less failure domains at a given tier thanreplicas of the partition assigned within a tier (e.g. a 3 replica cluster with2 servers), or the available capacity across the failure domains within a tierare not well balanced it will not be possible to achieve both even capacitydistribution (balance) as well as complete isolation of replicas acrossfailure domains (dispersion). When this occurs the ring management toolswill display a warning so that the operator can evaluate the cluster topology.

在域故障处理能力下,每个分区的副本将被隔离到尽可能多不同的区域,区域,服务器和设备。如果故障冗余设备少于要求的分区的副本数(例如一个副本3集群只有2台服务器),或可用容量是不平衡它将不可能实现容量分配(平衡),以及完全隔离的副本在失效域(分散)的副本。当这个发生时,环管理工具将显示一个警告,以便操作员可以评估群集拓扑结构。

Data is evenly distributed across the capacity availablein the cluster as described by the devices weight. Weights can be used tobalance the distribution of partitions on drives across the cluster. This canbe useful, for example, when different sized drives are used in a cluster.Device weights can also be used when adding or removing capacity or failuredomains to control how many partitions are reassigned during a rebalance to bemoved as soon as replication bandwidth allows.

数据最终根据设备权重在集群的空用容量是均匀分布的。权重可以用来平衡分区的分布在整个集群的驱动器上。这是很有用的,例如,当不同大小的驱动器在一个集群使用。设备权重也可使用添加或删除容量,控制多少分区重新分配,控制再平衡过程中允许在带宽范围最大速度复制。

Note

 Prior to Swift 2.1.0 it was not possible to restrict partition movement bydevice weight when adding new failure domains, and would allow extremelyunbalanced rings. The greedy dispersion algorithm is now subject to theconstraints of the physical capacity in the system, but can be adjusted with-inreason via the overload option. Artificially unbalancing the partitionassignment without respect to capacity can introduce unexpected full deviceswhen a given failure domain does not physically support its share of the usedcapacity in the tier.

When partitions need to be moved around (for example if adevice is added to the cluster), the ring ensures that a minimum number ofpartitions are moved at a time, and only one replica of a partition is moved ata time.

swift安装的时候,环的虚节点会均衡地划分到所有的设备中。当虚节点需要移动时(例如新设备被加入到集群),环会确保一次移动最少数量的虚节点数,并且一次只移动一个虚节点的一个副本。

The ring is used by the Proxy server and severalbackground processes (like replication).

ring被代理服务器和一些后台程序使用(如replication)。

Storage Policies存储策略

Storage Policies provide a way for object storageproviders to differentiate service levels, features and behaviors of a Swiftdeployment. Each Storage Policy configured in Swift is exposed to the clientvia an abstract name. Each device in the system is assigned to one or moreStorage Policies. This is accomplished through the use of multiple objectrings, where each Storage Policy has an independent object ring, which mayinclude a subset of hardware implementing a particular differentiation.

存储策略为对象存储提供了一种方法来区分服务级别、功能和行为的快速部署。每一个存储策略通过客户端来配置。在系统中的每个设备被分配给一个或多个存储策略。这是通过使用多个对象的环,其中每个存储策略有一个独立的对象环。

For example, one might have the default policy with 3xreplication, and create a second policy which, when applied to new containersonly uses 2x replication. Another might add SSDs to a set of storage nodes andcreate a performance tier storage policy for certain containers to have theirobjects stored there. Yet another might be the use of Erasure Coding to definea cold-storage tier.

例如,默认的副本策略是3份,当创建新的副本策略2份时,当应用到新的容器,仅用2份复制。另一个可能添加固态硬盘到一组存储节点,创建一个性能分层策略用于特定的容器存储。另一种可能是使用Erasure Coding来定义一个冷存储层。

This mapping is then exposed on a per-container basis,where each container can be assigned a specific storage policy when it iscreated, which remains in effect for the lifetime of the container.Applications require minimal awareness of storage policies to use them; once acontainer has been created with a specific policy, all objects stored in itwill be done so in accordance with that policy.

此映射是然后在每个容器基础上暴露的,其中每个容器可以在创建时指定一个特定的存储策略,跟随容器的生命周期。应用程序需要最小的存储策略来使用它们,一旦一个容器被创建了一个特定的策略,所有存储的对象都将按照该策略进行。

The Storage Policies feature is implemented throughoutthe entire code base so it is an important concept in understanding Swiftarchitecture.

存储策略的功能是在整个代码库中实现的,因此它是理解Swift架构的一个重要概念。

Object Server 对象服务器

The Object Server is a very simple blob storage serverthat can store, retrieve and delete objects stored on local devices. Objectsare stored as binary files on the filesystem with metadata stored in the file’sextended attributes (xattrs). This requires that the underlying filesystemchoice for object servers support xattrs on files. Some filesystems, like ext3,have xattrs turned off by default.

对象服务器是一个简单的二进制大对象存储服务器,可以用来存储、检索和删除本地设备上的对象。在文件系统上,对象以二进制文件的形式存储,它的元数据存储在文件系统的扩展属性(xattrs)中。这要求用于对象服务器的文件系统需要支持文件有扩展属性。一些文件系统,如ext3,它的xattrs属性默认是关闭的。

Each object is stored using a path derived from theobject name’s hash and the operation’s timestamp. Last write always wins, andensures that the latest object version will be served. A deletion is alsotreated as a version of the file (a 0 byte file ending with ”.ts”, which standsfor tombstone). This ensures that deleted files are replicated correctly andolder versions don’t magically reappear due to failure scenarios.

每个对象使用对象名称的哈希值和操作的时间戳组成的路径来存储。最后一次写操作总可以成功,并确保最新一次的对象版本将会被处理。删除也被视为文件的一个版本(一个以".ts"结尾的0字节文件,ts表示墓碑)。这确保了被删除的文件被正确地复制并且不会因为遭遇故障场景导致早些的版本神奇再现。

Container Server 容器服务器

The Container Server’s primary job is to handle listingsof objects. It doesn’t know where those object’s are, just what objects are ina specific container. The listings are stored as sqlite database files, andreplicated across the cluster similar to how objects are. Statistics are alsotracked that include the total number of objects, and total storage usage forthat container.

容器服务器的首要工作是处理对象的列表。容器服务器并不知道对象存在哪,只知道指定容器里存的哪些对象。这些对象信息以sqlite数据库文件的形式存储,和对象一样在集群上做类似的备份。容器服务器也做一些跟踪统计,比如对象的总数,容器的使用情况。

Account Server 帐户服务器

The Account Server is very similar to the ContainerServer, excepting that it is responsible for listings of containers rather thanobjects.

帐户服务器与容器服务器非常相似,除了它是负责容器的列表而不是对象。

Replication 复制

Replication is designed to keep the system in aconsistent state in the face of temporary error conditions like network outagesor drive failures.

复制是设计在面临如网络中断或者驱动器故障等临时性故障情况时来保持系统的一致性。

The replication processes compare local data with eachremote copy to ensure they all contain the latest version. Object replicationuses a hash list to quickly compare subsections of each partition, andcontainer and account replication use a combination of hashes and shared highwater marks.

复制进程将本地数据与每个远程拷贝比较以确保它们都包含有最新的版本。对象复制使用一个哈希列表来快速地比较每个虚节点的子段,容器和帐号的复制使用哈希值和共享的高水位线的组合进行版本比较。

Replication updates are push based. For objectreplication, updating is just a matter of rsyncing files to the peer. Accountand container replication push missing records over HTTP or rsync wholedatabase files.

复制更新基于推模式的。对于对象的复制,更新只是使用rsync同步文件到对等节点。帐号和容器的复制通过HTTPrsync来推送整个数据库文件上丢失的记录。

The replicator also ensures that data is removed from thesystem. When an item (object, container, or account) is deleted, a tombstone isset as the latest version of the item. The replicator will see the tombstoneand ensure that the item is removed from the entire system.

复制器也确保数据已从系统中移除。当有一项(对象、容器、或者帐号)被删除,则一个墓碑文件被设置作为该项的最新版本。复制器将会检测到该墓碑文件并确保将它从整个系统中移除。

Reconstruction 重建

The reconstructor is used by Erasure Code policies and isanalogous to the replicator for Replication type policies. See Erasure Code Support for complete information on both Erasure Code support as well as thereconstructor.

重建仅用于Erasure Code,类似于复制型政策的复制。重构和ErasureCode更多的信息请看Erasure Code Support 

Updaters 更新器

There are times when container or account data can not beimmediately updated. This usually occurs during failure scenarios or periods ofhigh load. If an update fails, the update is queued locally on the filesystem,and the updater will process the failed updates. This is where an eventualconsistency window will most likely come in to play. For example, suppose acontainer server is under load and a new object is put in to the system. Theobject will be immediately available for reads as soon as the proxy serverresponds to the client with success. However, the container server did notupdate the object listing, and so the update would be queued for a laterupdate. Container listings, therefore, may not immediately contain the object.

在一些情况下,容器或帐号中的数据不会被立即更新。这种情况经常发生在系统故障或者是高负荷的情况下。如果更新失败,该次更新在本地文件系统上会被加入队列,然后更新器会继续处理这些失败了的更新工作。最终,一致性窗口将会起作用。例如,假设一个容器服务器处于负荷之下,此时一个新的对象被加入到系统。当代理服务器成功地响应客户端的请求,这个对象将变为直接可用的。但是容器服务器并没有更新对象列表,因此此次更新将进入队列等待延后的更新。所以,容器列表不可能马上就包含这个新对象。

In practice, the consistency window is only as large asthe frequency at which the updater runs and may not even be noticed as theproxy server will route listing requests to the first container server whichresponds. The server under load may not be the one that serves subsequentlisting requests – one of the other two replicas may handle the listing.

在实际使用中,一致性窗口的大小和更新器的运行频度一致,因为代理服务器会转送列表请求给第一个响应的容器服务器,所以可能不会被注意到。当然,负载下的服务器不应该再去响应后续的列表请求,其他2个副本中的一个应该处理这些列表请求。

Auditors审计器

Auditors crawl the local server checking the integrity ofthe objects, containers, and accounts. If corruption is found (in the case ofbit rot, for example), the file is quarantined, and replication will replacethe bad file from another replica. If other errors are found they are logged(for example, an object’s listing can’t be found on any container server itshould be).

审计器会在本地服务器上反复地爬取来检测对象、容器、帐号的完整性。一旦发现不完整的数据(例如,发生了bit rot的情况:可能改变代码),该文件就会被隔离,然后复制器会从其他的副本那里把问题文件替换。如果其他错误出现(比如在任何一个容器服务器中都找不到所需的对象列表),还会记录进日志。

 

 

你可能感兴趣的:(Swift,OpenStack,swift)