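OpenStack Object Storage (Swift) is one of the subprojects of the OpenStack open source cloud computing project. Known as object storage, it offers strong scalability, redundancy, and durability. This article covers Swift from the angles of architecture, principles, and practice.
Swift is not a file system or a real-time data storage system. It is an object store, intended for long-term storage of permanent, static data that can be retrieved, adjusted, and updated when necessary. Data that fits it best includes virtual machine images, photo storage, mail storage, and archive backups. Because it has no central unit or master node, Swift provides greater scalability, redundancy, and durability. Swift grew out of the Rackspace Cloud Files project; when Rackspace joined the OpenStack community, it contributed Swift to OpenStack in July 2010 as part of the open source project. At the time of writing, the latest Swift release is OpenStack Essex 1.5.1.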
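Swift Functionality
Swift provides the same kind of service as AWS S3 and can be used for the following purposes:
• As the storage service of an IaaS platform
• Integrated with OpenStack Compute, to store its images
• Document storage
• Storing data that must be kept for a long time, such as logs
• Storing website images, thumbnails, and the like
Swift exposes its service through a RESTful API. Version 1.4.6 provides the following features:
• GET and HEAD on an Account (storage account)
• GET, PUT, HEAD, and DELETE on a Container (storage container, equivalent to an S3 bucket)
• GET, PUT, HEAD, and DELETE on an Object (storage object)
• Metadata support for Accounts, Containers, and Objects
• Large objects (no upper limit on total size; a single object is limited to 5 GB, and a file larger than 5 GB is split on the client, uploaded in segments, and accompanied by a manifest file)
• Access control and permission control
• Temporary object storage (expired objects are deleted automatically)
• Rate limiting of storage requests
• Temporary URLs (let any user access an object without a Token)
• Form submission (upload files to Swift directly from an HTML form; depends on temporary URLs)
• Static web sites (use Swift as the web server for a static site)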
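The large-object rule in the list above (one object is capped at 5 GB, and anything bigger is split on the client and described by a manifest) can be sketched roughly as follows. This is only an illustration and not Swift client code: the function name plan_segments, the segment naming scheme, and the reading of "5G" as 5 GiB are all assumptions.

```python
SEGMENT_LIMIT = 5 * 1024 ** 3  # assumption: "5G" is read as 5 GiB


def plan_segments(object_name, total_size, segment_size=SEGMENT_LIMIT):
    """Return (segment_name, offset, length) tuples that cover total_size bytes."""
    if total_size < 0 or segment_size <= 0:
        raise ValueError("total_size must be >= 0 and segment_size must be > 0")
    segments = []
    offset = 0
    index = 0
    while offset < total_size:
        length = min(segment_size, total_size - offset)
        # Hypothetical naming scheme: <object>/<zero-padded segment index>
        segments.append(("%s/%08d" % (object_name, index), offset, length))
        offset += length
        index += 1
    return segments


# A 12 GiB file needs three segments: 5 GiB, 5 GiB, and 2 GiB.
plan = plan_segments("backup.tar", 12 * 1024 ** 3)
for name, offset, length in plan:
    print(name, offset, length)
```

A client would upload each planned segment as an ordinary object and then upload the manifest that ties them together; how the manifest is expressed is left out of this sketch.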
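Swift Features
The OpenStack website lists more than 20 Swift features. The ones that draw the most attention are described below.
Extremely high data durability
People often confuse data durability with system availability. Durability, which can also be understood as data reliability, is the probability that data stored in the system will at some point be lost. For example, Amazon S3 offers 11 nines of data durability: if you store 10,000 (four zeros) files in S3, you can expect to lose one of them after 10 million (seven zeros) years. So how many nines can Swift offer as an SLA? For the Swift deployment in Sina's test environment, we estimated theoretically that with 5 Zones, 5×10 storage nodes, and a replica count of 3, Swift's data durability SLA reaches 10 nines.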
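The arithmetic behind the "11 nines" example can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes the durability figure is an annual, per-object probability and that losses are independent; the function name expected_losses is made up for the illustration.

```python
from fractions import Fraction


def expected_losses(objects, years, nines):
    """Expected number of lost objects at a durability of `nines` nines per year."""
    annual_loss_probability = Fraction(1, 10 ** nines)
    return objects * years * annual_loss_probability


# 10,000 objects kept for 10,000,000 years at 11 nines
s3_style = expected_losses(10_000, 10_000_000, 11)
# The same workload at 10 nines
ten_nines = expected_losses(10_000, 10_000_000, 10)
print(s3_style, ten_nines)
```

Under these assumptions the first case works out to one expected loss, which matches the example in the text, and each nine removed multiplies the expected loss by ten.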
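Fully symmetric system architecture
“Symmetric” means that all nodes in Swift can be complete peers, which greatly reduces system maintenance costs.
Unlimited scalability
Scalability here has two aspects: storage capacity can grow without limit, and Swift's performance (QPS, throughput, and so on) can increase linearly. Because Swift's architecture is fully symmetric, expanding capacity only requires adding machines; the system automatically migrates data and brings the storage nodes back into balance.
No single point of failure
In large-scale Internet services, a single point in storage has always been a hard problem. With databases, for example, the usual HA approach is limited to master-slave, and there is generally only one master. In some other open source storage systems, metadata storage has long been a headache: it can usually be stored only on a single node, which easily becomes a bottleneck, and once that node fails the whole cluster is often affected. HDFS is a typical example. Swift's metadata, by contrast, is distributed completely uniformly and at random and, like object data, is stored in multiple copies. No role in a Swift cluster is a single point, and the architecture and design ensure that the absence of single points holds in practice.
Simple and dependable
Simplicity shows in the elegant architecture, clean code, and easy-to-understand implementation; Swift does not rely on sophisticated distributed storage theory, only on very simple principles. Dependable means that, once tested and analyzed, Swift can be used with confidence for the most critical storage workloads without fear of it causing trouble, because any problem that arises can be resolved quickly through the logs and by reading the code.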
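Swift Architecture Overview
Swift has three main components: Proxy Server, Storage Server, and Consistency Server. Figure 1 shows the architecture; the Storage and Consistency services both run on Storage Nodes. The Auth service has been split out of Swift and replaced by Keystone, the OpenStack identity service, in order to unify authentication management across OpenStack projects.
Main Components
Proxy Server
The Proxy Server is the server process that provides the Swift API and ties together the rest of the Swift components. For each client request, it looks up the location of the Account, Container, or Object in the Ring and forwards the request accordingly. The Proxy exposes a RESTful API that conforms to standard HTTP, so developers can quickly build custom clients to interact with Swift.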
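The Ring lookup that the Proxy performs can be pictured with a toy model: hash the request path, keep the top bits as a partition number, and map that partition to a set of devices. The sketch below is a simplification under stated assumptions. The names partition_for and devices_for, the device labels, and the round-robin placement are hypothetical; a real ring is produced by swift-ring-builder, takes zones and weights into account, and also mixes the cluster's swift_hash_path_prefix and swift_hash_path_suffix values into the hash.

```python
import hashlib
import struct

PART_POWER = 18  # assumed partition power: 2**18 partitions
REPLICAS = 3     # assumed replica count


def partition_for(account, container=None, obj=None, part_power=PART_POWER):
    """Map an account/container/object path to a partition number."""
    parts = [p for p in (account, container, obj) if p is not None]
    path = "/" + "/".join(parts)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes as a big-endian integer and keep the top bits.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


def devices_for(partition, devices, replicas=REPLICAS):
    """Toy placement: pick `replicas` consecutive devices, wrapping around."""
    return [devices[(partition + r) % len(devices)] for r in range(replicas)]


devices = ["z%d-sdb1" % zone for zone in range(1, 6)]  # hypothetical devices
part = partition_for("AUTH_system", "myfiles", "bigfile1.tgz")
print(part, devices_for(part, devices))
```

Because the partition depends only on the path, any proxy can compute the same answer without consulting a central metadata server.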
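Storage Server
The Storage Servers provide storage services on disk devices. Swift has three kinds of storage server: Account, Container, and Object. The Container server handles listings of Objects. It does not know where objects are stored, only which Objects are in a given Container. This listing information is stored as SQLite database files. The Container server also tracks statistics such as the total number of Objects and the storage usage of the Container.
Consistency Servers
Storing data on disk and exposing a RESTful API is not the hard part; the main challenge is handling failures. Swift's Consistency Servers find and resolve errors caused by data corruption and hardware failures. There are three main servers: Auditor, Updater, and Replicator.
The Auditor runs in the background on every Swift server and continually scans the disks to check the integrity of objects, Containers, and accounts. If it finds corrupted data, the Auditor moves the file to a quarantine area, and the Replicator then replaces it with an intact copy. Figure 2 shows the processing flow for a quarantined object.
Under high load or when failures occur, data in a Container or account is not updated immediately. If an update fails, it is queued on the local file system, and the Updaters later process these failed updates. The Object Updater and the Container Updater are responsible for updating Container listings and Account listings, respectively.
The Replicator makes sure data is in the correct location and maintains the proper number of copies. It is designed to keep the system consistent when Swift servers face temporary failures such as network outages or drive failures.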
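The Auditor's check-and-quarantine step can be sketched as a checksum comparison followed by a move. This is a minimal illustration, not the Swift auditor itself: the function name audit_file, the use of an MD5 checksum supplied by the caller, and the directory layout are assumptions made for the example.

```python
import hashlib
import os
import shutil
import tempfile


def audit_file(path, expected_md5, quarantine_dir):
    """Return True if the file is intact; otherwise quarantine it and return False."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    if md5.hexdigest() == expected_md5:
        return True
    # Corrupted: move the file aside so a replicator can restore a good copy.
    os.makedirs(quarantine_dir, exist_ok=True)
    shutil.move(path, os.path.join(quarantine_dir, os.path.basename(path)))
    return False


# Demo with throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
qdir = os.path.join(workdir, "quarantined")
good = os.path.join(workdir, "good.data")
bad = os.path.join(workdir, "bad.data")
with open(good, "wb") as fh:
    fh.write(b"hello")
with open(bad, "wb") as fh:
    fh.write(b"hellp")  # simulated corruption of the same payload
checksum = hashlib.md5(b"hello").hexdigest()
good_ok = audit_file(good, checksum, qdir)
bad_ok = audit_file(bad, checksum, qdir)
print(good_ok, bad_ok)
```

Moving the damaged file out of the way, instead of repairing it in place, keeps the audit step simple and leaves recovery to replication.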
This document shows a cluster made up of Proxy nodes and Storage nodes.
This document refers to two networks: an external network for connecting to the Proxy server, and a storage network that is not accessible from outside the cluster, to which all of the nodes are connected. All of the Swift services, as well as the rsync daemon on the Storage nodes, are configured to listen on their STORAGE_LOCAL_NET IP addresses.
Install the baseline Ubuntu Server 10.04 LTS on all nodes.
Install common Swift software prereqs:
apt-get install python-software-properties
add-apt-repository ppa:swift-core/release
apt-get update
apt-get install swift python-swiftclient openssh-server
Create and populate configuration directories:
mkdir -p /etc/swift
chown -R swift:swift /etc/swift/
On the first node only, create /etc/swift/swift.conf:
cat >/etc/swift/swift.conf <<EOF
[swift-hash]
# random unique strings that can never change (DO NOT LOSE)
swift_hash_path_prefix = `od -t x8 -N 8 -A n </dev/random`
swift_hash_path_suffix = `od -t x8 -N 8 -A n </dev/random`
EOF
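The two od commands above each read 8 random bytes and print them as hexadecimal. For readers who prefer to generate the values separately, an equivalent in Python might look like the following; the helper name new_hash_path_value is made up, and the output is meant to be pasted into swift.conf by hand.

```python
import secrets


def new_hash_path_value(num_bytes=8):
    """Return num_bytes of cryptographically strong randomness as a hex string."""
    return secrets.token_hex(num_bytes)


prefix = new_hash_path_value()
suffix = new_hash_path_value()
print("[swift-hash]")
print("swift_hash_path_prefix = " + prefix)
print("swift_hash_path_suffix = " + suffix)
```

Whichever way the values are produced, they must be generated once and then copied unchanged to every node.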
On the second and subsequent nodes, copy that file over. It must be the same on every node in the cluster:
scp firstnode.example.com:/etc/swift/swift.conf /etc/swift/
Publish the local network IP address for use by scripts found later in this documentation:
export STORAGE_LOCAL_NET_IP=10.1.2.3
export PROXY_LOCAL_NET_IP=10.1.2.4
Create the directory /var/run/swift and change its ownership to the user and group that the Swift services will run under. Because the directory is needed only at runtime, it disappears when the system shuts down and must be recreated when the system restarts. To do that, also add the following lines to /etc/rc.local before the line “exit 0”:
mkdir -p /var/run/swift
chown swift:swift /var/run/swift
Create the directories /var/cache/swift and /srv/node. Change the ownership of /var/cache/swift to the user and group that the Swift account, container, or object services will run under. These directories are needed only on storage nodes (account, container, or object servers). The ownership of /srv/node should be root:root, which ensures that if a storage disk is unmounted unexpectedly, Swift objects are not written into the /srv/node directory itself. If a node runs only the proxy server, you can skip this step:
mkdir -p /var/cache/swift /srv/node/
chown swift:swift /var/cache/swift
Note
It is assumed that all commands are run as the root user.
Install swift-proxy service:
apt-get install swift-proxy memcached
Create self-signed cert for SSL:
cd /etc/swift
openssl req -new -x509 -nodes -out cert.crt -keyout cert.key
Note
If you don’t create the cert files, Swift silently uses http internally rather than https. This document assumes that you have created these certs, so if you’re following along step-by-step, create them. In a production cluster, you should terminate SSL before the proxy server. SSL support is provided for testing purposes only.
Modify memcached to listen on the default interfaces. Preferably this should be on a local, non-public network. Edit the IP address in /etc/memcached.conf, for example:
perl -pi -e "s/-l 127.0.0.1/-l $PROXY_LOCAL_NET_IP/" /etc/memcached.conf
Restart the memcached server:
service memcached restart
Create /etc/swift/proxy-server.conf:
cat >/etc/swift/proxy-server.conf <<EOF
[DEFAULT]
cert_file = /etc/swift/cert.crt
key_file = /etc/swift/cert.key
bind_port = 8080
workers = 8
user = swift
[pipeline:main]
pipeline = healthcheck proxy-logging cache tempauth proxy-logging proxy-server
[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true
[filter:proxy-logging]
use = egg:swift#proxy_logging
[filter:tempauth]
use = egg:swift#tempauth
user_system_root = testpass .admin https://$PROXY_LOCAL_NET_IP:8080/v1/AUTH_system
[filter:healthcheck]
use = egg:swift#healthcheck
[filter:cache]
use = egg:swift#memcache
memcache_servers = $PROXY_LOCAL_NET_IP:11211
EOF
Note
If you run multiple memcache servers, list every IP:port pair in the [filter:cache] section of the proxy-server.conf file, for example: 10.1.2.3:11211,10.1.2.4:11211. Only the proxy server uses memcache.
Create the account, container, and object rings. The create command writes a builder file from three parameters. The value 18 is the partition power: the ring is divided into 2^18 partitions. Set this “partition power” value based on the total amount of storage you expect your entire ring to use. The value 3 is the number of replicas of each object, and the last value, 1, is the number of hours during which a partition may not be moved more than once.
cd /etc/swift
swift-ring-builder account.builder create 18 3 1
swift-ring-builder container.builder create 18 3 1
swift-ring-builder object.builder create 18 3 1
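To get a feel for what the parameters 18 and 3 imply, the following sketch computes the partition count and the average number of partition replicas each device would hold. The function name ring_stats is made up, and the device count of 5 with equal weights is an assumption chosen to match a cluster of five single-disk storage nodes.

```python
def ring_stats(part_power, replicas, device_count):
    """Return (partitions, replica assignments, average assignments per device)."""
    partitions = 2 ** part_power
    assignments = partitions * replicas
    return partitions, assignments, assignments / device_count


partitions, assignments, per_device = ring_stats(18, 3, 5)
print(partitions)   # 262144 partitions
print(assignments)  # 786432 partition replicas to place
print(per_device)   # 157286.4 per device on average
```

A larger partition power spreads data more evenly across many disks but makes the ring files bigger, which is why the value should follow the expected size of the cluster.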
Note
For more information on building rings, see The Rings.
For every storage device in /srv/node on each node add entries to each ring:
export ZONE= # set the zone number for that storage device
export STORAGE_LOCAL_NET_IP= # and the IP address
export WEIGHT=100 # relative weight (higher for bigger/faster disks)
export DEVICE=sdb1
swift-ring-builder account.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6002/$DEVICE $WEIGHT
swift-ring-builder container.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6001/$DEVICE $WEIGHT
swift-ring-builder object.builder add z$ZONE-$STORAGE_LOCAL_NET_IP:6000/$DEVICE $WEIGHT
Note
Assuming there are 5 zones with 1 node per zone, ZONE should start at 1 and increment by one for each additional node.
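For a cluster with several nodes, typing the three add commands per device is error-prone, so it can help to generate them. The sketch below prints the commands for five nodes, one zone each, in the same format as the commands above. The function name ring_add_commands and the 10.1.2.x addresses are hypothetical; the ports, device name, and weight are taken from the example values in this document.

```python
# Ring name and storage server port, as used in the add commands above.
PORTS = (("account", 6002), ("container", 6001), ("object", 6000))


def ring_add_commands(node_ips, device="sdb1", weight=100):
    """Build swift-ring-builder add commands, one zone per node, zones from 1."""
    commands = []
    for zone, ip in enumerate(node_ips, start=1):
        for ring, port in PORTS:
            commands.append(
                "swift-ring-builder %s.builder add z%d-%s:%d/%s %d"
                % (ring, zone, ip, port, device, weight)
            )
    return commands


node_ips = ["10.1.2.%d" % n for n in range(11, 16)]  # hypothetical addresses
commands = ring_add_commands(node_ips)
for command in commands:
    print(command)
```

The printed lines can be reviewed and then run from /etc/swift on the node that holds the builder files.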
Verify the ring contents for each ring:
swift-ring-builder account.builder
swift-ring-builder container.builder
swift-ring-builder object.builder
Rebalance the rings:
swift-ring-builder account.builder rebalance
swift-ring-builder container.builder rebalance
swift-ring-builder object.builder rebalance
Note
Rebalancing rings can take some time.
Copy the account.ring.gz, container.ring.gz, and object.ring.gz files to /etc/swift on each of the Proxy and Storage nodes.
Make sure all the config files are owned by the swift user:
chown -R swift:swift /etc/swift
Start Proxy services:
swift-init proxy start
Note
Swift should work on any modern filesystem that supports Extended Attributes (XATTRS). We currently recommend XFS as it demonstrated the best overall performance for the swift use case after considerable testing and benchmarking at Rackspace. It is also the only filesystem that has been thoroughly tested. These instructions assume that you are going to devote /dev/sdb1 to an XFS filesystem.
Install Storage node packages:
apt-get install swift-account swift-container swift-object xfsprogs
For every device on the node, set up the XFS volume (/dev/sdb is used as an example). Add the mount option inode64 when the disk is larger than 1TB to achieve better performance:
fdisk /dev/sdb  # set up a single partition
mkfs.xfs -i size=512 /dev/sdb1
echo "/dev/sdb1 /srv/node/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0" >> /etc/fstab
mkdir -p /srv/node/sdb1
mount /srv/node/sdb1
chown swift:swift /srv/node/sdb1
Create /etc/rsyncd.conf:
cat >/etc/rsyncd.conf <<EOF
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = $STORAGE_LOCAL_NET_IP
[account]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/account.lock
[container]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/container.lock
[object]
max connections = 2
path = /srv/node/
read only = false
lock file = /var/lock/object.lock
EOF
Edit the RSYNC_ENABLE= line in /etc/default/rsync:
perl -pi -e 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/' /etc/default/rsync
Start rsync daemon:
service rsync start
Note
The rsync daemon requires no authentication, so it should be run on a local, private network.
Create /etc/swift/account-server.conf:
cat >/etc/swift/account-server.conf <<EOF
[DEFAULT]
bind_ip = $STORAGE_LOCAL_NET_IP
workers = 2
[pipeline:main]
pipeline = account-server
[app:account-server]
use = egg:swift#account
[account-replicator]
[account-auditor]
[account-reaper]
EOF
Create /etc/swift/container-server.conf:
cat >/etc/swift/container-server.conf <<EOF
[DEFAULT]
bind_ip = $STORAGE_LOCAL_NET_IP
workers = 2
[pipeline:main]
pipeline = container-server
[app:container-server]
use = egg:swift#container
[container-replicator]
[container-updater]
[container-auditor]
[container-sync]
EOF
Create /etc/swift/object-server.conf:
cat >/etc/swift/object-server.conf <<EOF
[DEFAULT]
bind_ip = $STORAGE_LOCAL_NET_IP
workers = 2
[pipeline:main]
pipeline = object-server
[app:object-server]
use = egg:swift#object
[object-replicator]
[object-updater]
[object-auditor]
EOF
Start the storage services. If you use this command, it will try to start every service for which a configuration file exists, and throw a warning for any configuration files which don’t exist:
swift-init all start
Or, if you want to start them one at a time, run them as below. Note that if the server program in question generates any output on its stdout or stderr, swift-init has already redirected the command’s output to /dev/null. If you encounter any difficulty, stop the server and run it by hand from the command line. Any server may be started using “swift-$SERVER-$SERVICE /etc/swift/$SERVER-config”, where $SERVER might be object, container, or account, and $SERVICE might be server, replicator, updater, or auditor.
swift-init object-server start
swift-init object-replicator start
swift-init object-updater start
swift-init object-auditor start
swift-init container-server start
swift-init container-replicator start
swift-init container-updater start
swift-init container-auditor start
swift-init account-server start
swift-init account-replicator start
swift-init account-auditor start
You run these commands from the Proxy node.
Get an X-Storage-Url and X-Auth-Token:
curl -k -v -H 'X-Storage-User: system:root' -H 'X-Storage-Pass: testpass' https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0
Check that you can HEAD the account:
curl -k -v -H 'X-Auth-Token: <token-from-x-auth-token-above>' <url-from-x-storage-url-above>
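The two curl calls above form a small protocol: send the user and key as request headers, read the storage URL and token from the response headers, then present the token on later requests. The sketch below models only the header handling and makes no network calls. The function names are hypothetical, and the sample response values are made up for illustration.

```python
def auth_request_headers(user, key):
    """Headers sent with the GET request to /auth/v1.0."""
    return {"X-Storage-User": user, "X-Storage-Pass": key}


def parse_auth_response(headers):
    """Return (storage_url, token) from response headers, ignoring header case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered["x-storage-url"], lowered["x-auth-token"]


def account_head_headers(token):
    """Headers sent with the HEAD request to the storage URL."""
    return {"X-Auth-Token": token}


# Made-up response values, for illustration only.
sample_response = {
    "X-Storage-Url": "https://10.1.2.4:8080/v1/AUTH_system",
    "x-auth-token": "AUTH_tk_example",
}
storage_url, token = parse_auth_response(sample_response)
print(auth_request_headers("system:root", "testpass"))
print(storage_url, account_head_headers(token))
```

The swift command-line client used in the following steps performs the same exchange when it is given the -A, -U, and -K options.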
Check that swift works (at this point, expect zero containers, zero objects, and zero bytes):
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass stat
Use swift to upload a few files named ‘bigfile[1-2].tgz’ to a container named ‘myfiles’:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass upload myfiles bigfile1.tgz
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass upload myfiles bigfile2.tgz
Use swift to download all files from the ‘myfiles’ container:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass download myfiles
Use swift to save a backup of your builder files to a container named ‘builders’. It is very important not to lose your builder files:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass upload builders /etc/swift/*.builder
Use swift to list your containers:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass list
Use swift to list the contents of your ‘builders’ container:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass list builders
Use swift to download all files from the ‘builders’ container:
swift -A https://$PROXY_LOCAL_NET_IP:8080/auth/v1.0 -U system:root -K testpass download builders