Built on extensive research and development, combined with active feedback from a thriving open source community, Amanda Enterprise (AE) 3.3 is here! AE 3.3 has significant architecture and feature updates and is a robust, scalable and feature-rich platform that meets the backup needs of heterogeneous environments, across Linux, Windows, OS X and Solaris-based systems.
As we worked to further develop Amanda Enterprise, it was important to us that the architecture and feature updates would provide better control and management for backup administration. Our main goal was to deliver a scalable platform which enables you to perform and manage backups your way.
Advanced Cloud Backup Management: AE 3.3 now supports use of many new and popular cloud storage platforms as backup repositories. We have also added cloud backup features to give users more control over their backups for speed and data priority.
Backup Storage Devices Supported by Amanda Enterprise 3.3
Platforms supported now include Amazon S3, Google Cloud Storage, HP Cloud Storage, Japan's IIJ GIO Storage Service, and private and public storage clouds built on OpenStack Swift. Notably, AE 3.3 supports all current Amazon S3 locations, including the US (with GovCloud), EU, Asia, Brazil and Australia.
Cloud Storage Locations Supported by Amanda Enterprise
In addition to new platforms, you can now control how many parallel backup (upload) or restore (download) streams to run based on your available bandwidth. You can even throttle upload or download speeds at the backup-set level; for example, you can give higher priority to the backup of your most important data.
Optimized SQL Server and Exchange Backups: If you are running multiple SQL Server or Exchange databases on a Windows server, AE 3.3 allows selective backup or recovery of an individual database. This enables you to optimize the use of your backup resources by selecting only the databases you want to back up, or to improve recovery time by recovering only a selected database. Of course, the ability to do an express backup and recovery of all databases on a server is still available.
To streamline this further, the Zmanda Management Console (the GUI for Amanda Enterprise) now automatically discovers the databases on a given Windows server, letting you simply pick the ones you want to back up.
Improved Virtual Tape and Physical Tape Management: Our developers have done extensive work in this area to enhance usability, including seamless management of available disk space. With extensive concurrency added to the Amanda architecture, you can eliminate the staging disk from backup-to-disk configurations: AE 3.3 will write parallel streams of backups directly to disk without going through a staging disk. You can still optionally configure a staging disk for backups to tape or cloud to improve fault tolerance and data streaming.
Better Fault Tolerance: When backing up to tape, AE 3.3 can automatically withstand the failure of a tape drive. Simply configure a backup set to use more than one drive in your tape library; if one of the drives becomes unavailable, AE will automatically start using one of the remaining drives.
NDMP Management Improvements: AE 3.3 allows for selective restore of a file or a directory from a Network Data Management Protocol (NDMP) based backup. Now, you can also recover to an alternative path or an alternative filer directly from the GUI. Support for compression and encryption for NDMP-based backups has also been added to the GUI. Plus, in addition to devices from NetApp and Oracle, AE now also supports NDMP-enabled devices from EMC.
Scalability, Concurrency and Parallelism: Many more operations can now be executed in parallel. For example, you can run a restore operation while active backups are in progress. Parallelism has also been added to various operations, including backup to disk, cloud and tape.
Expanded Platform Support: Our goal is to provide a backup solution which supports all of the key platforms deployed in today's data centers. We have updated AE 3.3 to support the latest versions of Windows Server, Red Hat Enterprise Linux, CentOS, Fedora, Ubuntu, Debian and OS X. With AE, you have the flexibility to choose the platforms best suited for each application in your environment, without having to worry about the backup infrastructure.
There are many new enhancements to leverage! To help you dive in, we hosted a live demonstration of Amanda Enterprise 3.3. The session provides insights on best practices for setting up a backup configuration for a modern data center.
During the OpenStack Folsom Design Summit in April 2012, there was an interesting workshop discussion on Swift Quota. This topic has been actively and formally discussed in many forums (Link1, Link2) and is also the subject of an OpenStack Swift blueprint. Here are some of our key takeaways and insights on what this means for your storage cloud.
Swift Quota: Business Values
The business value of implementing Swift Quota is two-fold:
(1) Protect the Cluster: Cloud operators can conveniently set effective limits (e.g., a limit on the number of objects per container) to protect the Swift cluster from malicious behaviors, such as creating millions of 0-byte objects to slow down the container database, or creating thousands of empty containers to overload the account database.
(2) Manage Storage Capacity: Cloud storage providers can sell their storage capacity upfront, similar to the Amazon EC2 reserved-instance pricing model. The provider sells a fixed amount of storage capacity (e.g., 1 TB) to a customer by setting a capacity limit for that customer, and need not be concerned with how the customer uses it (e.g., 100% of capacity all the time, or 50% today and 95% next month). The provider simply charges for the fixed capacity (and possibly other resource usage, such as the number of PUT, GET and DELETE operations) and does not have to precisely track how much capacity each customer consumes on an ongoing basis.
In summary, Swift Quota is interesting to cloud storage operators and providers because it enables effective and robust resource (e.g., capacity) management and improves the overall usability of a Swift-based storage cloud.
Today, we would like to introduce an interesting Swift Quota project that we have been focusing on, which is already used in StackLab, a production public cloud where users can try out OpenStack for free. (Details about StackLab can be found at http://freedomhui.com/stacklab/.)
Swift Quota Introduction
Swift Quota is a production-ready project that is mainly used for controlling the usage of accounts and containers in OpenStack Swift. In the current version of Swift Quota, users can set quotas on the following three items:
(1) Number of containers per account (example: an account cannot have more than 5 containers)
(2) Number of objects per container (example: a container cannot have more than 100 objects)
(3) Storage capacity per container (example: the size of a container cannot be larger than 100 GB)
Swift Quota is implemented as a middleware layer in Swift, so it is simple and straightforward to integrate and merge with the mainstream Swift code. The idea behind Swift Quota is not to create new, separate counters to keep track of resource usage, but to utilize the existing metadata associated with containers and accounts, which makes it very lightweight in a production environment. A sketch of the general idea follows.
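To make the mechanism concrete, here is a minimal, self-contained sketch of how such quota middleware can sit in the proxy pipeline: it intercepts container PUTs and rejects them once the account's container count, read from existing account metadata rather than a separate counter, exceeds the configured limit. The get_account_info() helper and the overall class layout are our illustrative stand-ins, not the actual Swift Quota source.

class QuotaMiddleware(object):
    def __init__(self, app, conf):
        self.app = app
        # Quota levels as parsed from proxy-server.conf, e.g. {"default": 5, "L1": 10}
        self.container_count = conf.get('container_count', {'default': 5})

    def get_account_info(self, env, account):
        # Stand-in: the real middleware reads the cached account metadata
        # (current container count and assigned quota level) kept by Swift.
        return {'container_count': 3, 'quota_level': 'default'}

    def __call__(self, env, start_response):
        parts = env.get('PATH_INFO', '').lstrip('/').split('/')
        # A PUT to /v1/<account>/<container> creates a container.
        if env.get('REQUEST_METHOD') == 'PUT' and len(parts) == 3:
            info = self.get_account_info(env, parts[1])
            limit = self.container_count.get(info['quota_level'],
                                             self.container_count['default'])
            if info['container_count'] >= limit:
                start_response('403 Forbidden', [('Content-Type', 'text/plain')])
                return [b'Container quota exceeded\n']
        return self.app(env, start_response)

def filter_factory(global_conf, **local_conf):
    # Matches the paste-deploy entry-point style used by the [filter:quota] section.
    def quota_filter(app):
        return QuotaMiddleware(app, local_conf)
    return quota_filter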
Swift Quota Installation
Before we go any further, we'd like to thank AlexYangYu for his contribution to this project. The project is available at Alex's github repository.
To install Swift Quota, you can either check out the modified Swift code from the github repository above (git clone git://github.com/AlexYangYu/StackLab-swift.git), switch to the branch called "dev-quota" (git checkout dev-quota), and install the modified Swift software on the cluster nodes; or follow the commit history to figure out which changes are new and merge them into your existing Swift code base.
Configuration File
To enable Swift Quota, /etc/swift/proxy-server.conf should be adjusted as follows (the quota-related entries are the new configuration settings):
[pipeline:main]
pipeline = catch_errors cache token auth quota proxy-server

[filter:quota]
use = egg:swift#quota
cache_timeout = 30
# If precise_mode is set to true, the quota middleware will bypass the cache.
precise_mode = true
set log_name = quota
quota = {
    "container_count": {
        "default": 5,
        "L1": 10,
        "L2": 25
    },
    "object_count": {
        "default": 200000,
        "L1": 500000,
        "L2": 1000000
    },
    "container_usage": {
        "default": 2147483648,
        "L1": 10737418240,
        "L2": 53687091200
    }
}
In the configuration above, each of the three resource quotas has three levels of limits: default, L1 and L2. The intent is to give the cloud operator (e.g., the reseller_admin) a flexible, configurable interface for specifying a quota level for each account. For example, the operator can assign the "L1" quota level to one account and "L2" to another. If no quota level is explicitly specified, an account strictly follows the "default" level. Cloud operators are free to define as many quota levels as they want for their own use cases. Next, we will show how to specify the quota level for an account.
Assigning Quota Level to an Account
We assume only the reseller_admin can modify the quota level of an account, so make sure you have a reseller_admin login in your authentication system. For example:
[filter:tempauth]
use = egg:swift#tempauth
user_system_root = testpass .admin http://your_swift_ip:8080/v1/AUTH_system
user_reseller_reseller = reseller .reseller_admin http://your_swift_ip:8080/v1/AUTH_reseller
Then, we use this curl command to retrieve the X-Auth-Token of the reseller_admin:
curl -k -v -H 'X-Storage-User: reseller:reseller' -H 'X-Storage-Pass: reseller' http://your_swift_ip:8080/auth/v1.0
Next, we use a curl command to set the quota level of an account called "system". For example:
curl -v -X POST http://your_swift_ip:8080/v1/AUTH_system -H 'X-Auth-Token: your reseller_admin token' -H 'X-Account-Meta-Quota: L1'
Note that in the above curl command, the header 'X-Account-Meta-Quota: L1' assigns the L1 quota level to the account called "system". Similarly, the following curl command updates the quota level to L2:
curl -v -X POST http://your_swift_ip:8080/v1/AUTH_system -H 'X-Auth-Token: your reseller_admin token' -H 'X-Account-Meta-Quota: L2'
If everything works correctly, you will receive a “204 No Content” response from the server after you issue the above curl commands.
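For scripted administration, the same two-step flow can be sketched with the Python requests library; the host, port and credentials below are placeholders matching the curl examples above.

import requests

BASE = 'http://your_swift_ip:8080'

# Step 1: authenticate as the reseller_admin to obtain a token.
resp = requests.get(BASE + '/auth/v1.0',
                    headers={'X-Storage-User': 'reseller:reseller',
                             'X-Storage-Pass': 'reseller'})
token = resp.headers['X-Auth-Token']

# Step 2: POST account metadata to assign the quota level of account "system".
resp = requests.post(BASE + '/v1/AUTH_system',
                     headers={'X-Auth-Token': token,
                              'X-Account-Meta-Quota': 'L1'})
assert resp.status_code == 204  # "204 No Content" signals success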
Trade-off between Cluster Performance and Quota Accuracy
It is possible to trigger a quota check upon each PUT request to guarantee that no quota violation is allowed. However, when hardware resources are in short supply and the workload becomes very intensive, a check upon each PUT request may affect the performance of the Swift cluster. So, in the current design of Swift Quota, two parameters, precise_mode and cache_timeout under [filter:quota] in /etc/swift/proxy-server.conf, balance cluster performance against quota accuracy.
When precise_mode is set to true, cache_timeout has no effect and the Swift cluster checks the quota upon each PUT request by reading the current container and account usage from the server. When precise_mode is set to false, the Swift cluster only reads the container and account usage cached in memory, and cache_timeout decides how often the cached information is refreshed from the server.
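The logic of this trade-off can be sketched as follows; the usage lookup below is a stand-in for Swift's internal metadata read, and the structure is illustrative rather than the project's actual code.

import time

class UsageReader(object):
    def __init__(self, precise_mode=True, cache_timeout=30):
        self.precise_mode = precise_mode
        self.cache_timeout = cache_timeout
        self._cache = {}  # account -> (usage, fetched_at)

    def read_usage_from_server(self, account):
        # Stand-in for a fresh read of container/account usage from the servers.
        return {'container_count': 3, 'bytes_used': 1024}

    def get_usage(self, account):
        if self.precise_mode:
            # Every PUT pays for a fresh lookup: exact quota enforcement,
            # but more load on the container/account servers.
            return self.read_usage_from_server(account)
        cached = self._cache.get(account)
        if cached and time.time() - cached[1] < self.cache_timeout:
            return cached[0]  # possibly stale, so quotas may briefly overshoot
        usage = self.read_usage_from_server(account)
        self._cache[account] = (usage, time.time())
        return usage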
Closing Comments
We are happy to see that Swift Quota has been in production in the StackLab environment for almost six months, and we believe its neat and clean design will be adopted by more Swift users.
If you are thinking of putting together a storage cloud, or thinking of introducing Quota to your Swift cluster, we would love to discuss your challenges and share our observations. Please drop us a note at [email protected].
In a previous blog, we proposed a method to enable Cyberduck to work with Keystone-based Swift, which was to upgrade the java-cloudfiles API in Cyberduck to 2.0. We received a lot of feedback on it, and we appreciate it. Today, we move one step forward and propose a more reliable and straightforward way to make older Swift clients, such as Cyberduck, work with Keystone-based Swift.
The high-level idea of this new method is to add v1.0 authentication middleware to Keystone, while keeping the client, in this case Cyberduck, unchanged. Thanks to AlexYangYu for providing the v1.0-enabled Keystone code base; it's available at:
https://github.com/AlexYangYu/StackLab-Ketystone/tree/dev-protocol-convertor
If you prefer to keep your own version of Keystone, rather than replacing it with the one from the location above, you need to follow the steps below:
First, add the following files to your existing Keystone code base:
https://github.com/AlexYangYu/StackLab-Ketystone/commit/9e126d6716912e8822de3884c32f5b9509ef0994
Then, after incorporating the middleware that supports v1.0 authentication into Keystone, you need to rebuild and reinstall the modified Keystone code base.
Next, change the Keystone configuration file (/etc/keystone/keystone.conf) as follows (the v1.0-related lines are the additions to the default keystone.conf):
[composite:main]
use = egg:Paste#urlmap
/v2.0 = public_api
/v1.0 = public_api_v1
/ = public_version_api
[pipeline:public_api_v1]
pipeline = protocol_converter token_auth admin_token_auth xml_body json_body debug ec2_extension public_service
[filter:protocol_converter]
paste.filter_factory = keystone.contrib.protocol_converter:ProtocolConverter.factory
Finally, you need to restart the keystone service.
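Conceptually, the protocol_converter middleware intercepts the old v1.0 handshake and replays it as a Keystone v2.0 token request, so legacy clients never have to speak Keystone's JSON API. Here is an illustrative pure-WSGI sketch of that idea using the Python requests library; it is our reading of the approach, not the code from the repository above, and the Keystone endpoint is a placeholder.

import json
import requests

KEYSTONE_V2_TOKENS = 'http://127.0.0.1:5000/v2.0/tokens'  # placeholder endpoint

class ProtocolConverter(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, env, start_response):
        if env.get('PATH_INFO', '').rstrip('/') != '/auth/v1.0':
            return self.app(env, start_response)  # not a v1.0 request
        tenant, _, user = env.get('HTTP_X_STORAGE_USER', '').partition(':')
        password = env.get('HTTP_X_STORAGE_PASS', '')
        # Replay the v1.0 credentials as a v2.0 password authentication.
        body = {'auth': {'tenantName': tenant,
                         'passwordCredentials': {'username': user,
                                                 'password': password}}}
        reply = json.loads(requests.post(
            KEYSTONE_V2_TOKENS, data=json.dumps(body),
            headers={'Content-Type': 'application/json'}).text)
        token = reply['access']['token']['id']
        # Pick the object-store endpoint out of the catalog for X-Storage-Url.
        storage_url = next(
            svc['endpoints'][0]['publicURL']
            for svc in reply['access']['serviceCatalog']
            if svc['type'] == 'object-store')
        start_response('200 OK', [('X-Auth-Token', token),
                                  ('X-Storage-Url', storage_url)])
        return [b'']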
On the client side, follow the standard configuration procedures traditionally used with v1.0 authentication. For Cyberduck, you can follow the steps here to set the Authenticate Context Path (ch.sudo.cyberduck cf.authentication.context /auth/v1.0).
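Before pointing Cyberduck at the cluster, a quick sanity check is to perform the same v1.0 handshake a legacy client would; the host, port and credentials below are placeholders for your deployment.

import requests

resp = requests.get('http://your_keystone_ip:5000/auth/v1.0',
                    headers={'X-Storage-User': 'tenant:user',
                             'X-Storage-Pass': 'password'})
print(resp.status_code)                   # expect 200 if the converter works
print(resp.headers.get('X-Auth-Token'))   # token the client will reuse
print(resp.headers.get('X-Storage-Url'))  # Swift endpoint handed to the client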
We have verified this method on both PC and Mac platforms with the latest version of Cyberduck, as well as with other v1.0-authentication-based Swift clients.
If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at [email protected].
We just came back from the OpenStack Summit 2012 in San Diego. The Summit was full of energy, and the rapid progress of the OpenStack project, on both technical and business fronts, was palpable.
Our participation was focused around OpenStack Swift, and here are three notable sessions (including our own!) on the topic:
(1) COSBench: A Benchmark Tool for Cloud Object Storage Service: Folks from Intel presented how they designed and implemented a cloud storage benchmark tool for OpenStack Swift, called COSBench (Cloud Object Storage Benchmark). In our previous blog, we briefly introduced COSBench and our expectation that it may become the de facto Swift benchmarking tool. In this session, the presenter also demonstrated how to use COSBench to find the bottleneck of a Swift cluster under a given workload. The most promising point in this session was the indication that COSBench is going to be released to the open-source community. The slides for the session are available here.
(2) Building Applications with OpenStack Swift: In this very interesting talk from SwiftStack, a primer was provided on how to build web-based applications on top of OpenStack Swift. The presenters went down to code level to explain how to extend and customize Swift authentication and how to develop custom Swift middleware, with the goal of seamlessly integrating web applications with the Swift infrastructure. A very useful presentation for developers who are thinking about how to build applications for Swift.
(3) How Swift is your Swift?: The goal of this presentation (from Zmanda) was to shed light on the provisioning problem for Swift infrastructure. We looked at almost every hardware and software component in Swift and discussed how to pick appropriate hardware and software settings to optimize upfront cost and performance. We also talked about the performance degradation that occurs when a failure (e.g., a node or HDD failure) happens. Our slides are available here.
All in all, the Summit was a great step forward in the evolution of Swift.
If you are thinking of putting together a storage cloud, we would love to discuss your challenges and share our observations. Please drop us a note at [email protected].
The OpenStack Swift project has been developing at a tremendous pace. Version 1.6.0 was released in August, followed by 1.7.4 (Folsom) just two months later! These two recent releases implement many important features, for example optimizations for SSDs, object versioning, StatsD logging and much more; many of these features have significant implications for performance planning by cloud builders and operators.
As an integral part of deploying a cloud storage platform based on OpenStack Swift, benchmarking a Swift cluster implementation is essential before the cluster is deployed for production use. Preferably the benchmark should simulate the eventual workload that the cluster will be subjected to.
In this blog, we discuss the following Swift benchmarking concepts:
(1) Benchmark dimensions for a Swift cluster: performance, scalability and degraded-mode performance (i.e., when hardware or software failures happen).
(2) Sample workloads for a Swift cluster
Benchmark Tools for Swift
There are currently two Swift benchmark tools available: swift-bench and COSBench.
swift-bench is a command-line benchmark tool that ships with the Swift distribution. Recently, we improved swift-bench to allow for random object sizes and better usability.
COSBench is a fairly new web-based benchmark tool led by researchers at Intel. Fortunately, we obtained a trial version of COSBench. Based on our initial experience, we believe it is a very helpful tool that may become the de facto Swift benchmarking tool in the future.
Benchmark Dimensions
Dimension 1 – Performance
The performance dimension measures how the Swift cluster performs under a given load. The performance metrics can be specified in many ways, but in most cases cloud operators will be interested in the following four metrics (a small computation sketch follows the list):
(1) Average throughput (number of operations per second)
(2) Average bandwidth (MB/s)
(3) Average response time across all requests
(4) Response time at a given percentile of requests (e.g., the 95th percentile)
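As a minimal sketch, assuming per-request latencies (in seconds) and payload sizes (in bytes) were collected over a run of known duration, the four metrics can be computed like this:

def summarize(latencies, sizes, duration_secs):
    n = len(latencies)
    throughput = n / float(duration_secs)                # (1) ops/sec
    bandwidth = sum(sizes) / float(duration_secs) / 1e6  # (2) MB/s
    avg_response = sum(latencies) / float(n)             # (3) seconds
    ordered = sorted(latencies)
    p95 = ordered[min(n - 1, int(0.95 * n))]             # (4) 95th percentile
    return throughput, bandwidth, avg_response, p95

# Example: three 4 MB requests completing within a 2-second window
print(summarize([0.21, 0.35, 0.90], [4e6] * 3, 2.0))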
To measure the performance, we first need to populate a Swift cluster with some data (i.e. objects) to simulate an initial stage. The size of the initially loaded objects can be controlled by the inputs of the benchmark client. Subsequently, a pre-defined workload is executed against the Swift cluster while the performance is measured.
When measuring the performance, there is one key issue to pay attention to: the number of threads, because it determines how much load the benchmark clients generate against the Swift cluster. Since we want to measure the performance of the Swift cluster when it is under load or saturated, we need to increase the number of threads until the bandwidth/throughput plateaus and the average response time starts to increase sharply.
As the number of threads increases, the benchmark client gets busier. We need to make sure that it has enough resources (CPU, memory, network bandwidth) and does not itself become the performance bottleneck.
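The saturation search described above can be sketched as a simple sweep. run_load() is a hypothetical hook that, in practice, would wrap swift-bench or COSBench and return measured figures; the placeholder model below exists only to make the sketch runnable.

def run_load(threads):
    # Placeholder model: throughput saturates near 64 threads and
    # latency climbs quickly past that point.
    throughput = min(threads * 25.0, 1600.0)
    latency = 0.02 * (1 + max(0, threads - 64) ** 1.5)
    return throughput, latency

def find_saturation(start=8, factor=2, max_threads=512):
    prev_tp = None
    threads = start
    while threads <= max_threads:
        tp, lat = run_load(threads)
        print('%4d threads: %7.1f ops/s, %7.3f s avg latency' % (threads, tp, lat))
        # Stop once throughput gains stall (<5%) while latency keeps rising.
        if prev_tp is not None and tp < prev_tp * 1.05:
            return threads
        prev_tp = tp
        threads *= factor
    return max_threads

print('saturation near %d client threads' % find_saturation())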
While the performance of the client software (Cyberduck, cloud backup software, etc.) that connects to Swift is an important factor in the overall usability of the storage cloud, the scope of this blog is the performance of the storage cloud platform itself.
Dimension 2 – Scalability
The scalability benchmark tests whether a Swift cluster can scale out gracefully when more servers and other resources are added. We can conduct this benchmark in the following steps: proportionally add more servers for each type of node in the Swift cluster (for example, double the number of storage nodes and proxy nodes with the same hardware and software configurations), then run the same workloads and measure the performance. If a Swift cluster scales out nicely, its bandwidth/throughput will increase in proportion to the number of new servers added; otherwise, the cloud operators should analyze which bottleneck prevents it from scaling well. A quick way to express the result is sketched below.
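As a minimal sketch, scaling efficiency can be expressed as measured throughput after scaling divided by the linear expectation; the numbers below are made up for illustration.

def scaling_efficiency(base_throughput, scaled_throughput, scale_factor):
    # 1.0 means perfectly linear scaling; lower values hint at a bottleneck.
    return scaled_throughput / (base_throughput * scale_factor)

# e.g. 1600 ops/s before, 2900 ops/s after doubling the cluster
print('%.0f%% scaling efficiency' % (100 * scaling_efficiency(1600.0, 2900.0, 2.0)))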
To simulate a real-world scenario, we need to test the scalability of a Swift cluster while it is running. As suggested by a blog from SwiftStack, cloud operators may consider adding new servers gradually to avoid the performance degradation caused by data movement between the existing and new servers. During the measurement, we want to observe: (1) whether the Swift cluster operates normally (i.e., no period of service disruption) and (2) the increase in performance as the new servers are added to the Swift cluster.
Dimension 3 – Degraded Mode Performance
Cloud operators will face hardware or software failures at some point. If their objective is to ensure that their clusters perform at a certain level (e.g., abide by a performance SLA) even in the face of failures, they should benchmark their Swift cluster accordingly upfront.
The most straightforward way to measure the availability of a Swift cluster is to intentionally shut down some nodes and measure the number of errors (e.g., failed operations) and the performance degradation while Swift runs in degraded mode.
Some factors increase the complexity of benchmarking a degraded Swift cluster. Failures can happen at every possible system level: I/O devices, the OS, Swift processes, or an entire server, and the impact differs depending on the level, so failure scenarios at all system levels need to be considered. For example, to simulate a disk failure, we may intentionally unmount the disk; to simulate a Swift process failure, we kill some or all Swift processes on a node; to simulate an OS or whole-server failure, the server can be temporarily powered off; and a whole zone could be powered off to simulate the power failure of an entire rack of servers.
Putting these considerations together, the total problem space for analyzing all failure scenarios can be enormous for a large-scale Swift cluster. So it is more practical to prioritize the failure scenarios, evaluating the worst or most common scenarios first.
In our presentation at the upcoming OpenStack Summit, we will present empirical results showing how a Swift cluster performs when hardware failures occur.
Sample Workloads
The COSBench tool allows users to define a Swift workload based on the following two aspects: (1) the range of object sizes in the workload (e.g., from 1 MB to 10 MB), and (2) the ratio of PUT, GET and DELETE operations (e.g., 1:8:1).
The object sizes in a workload may follow various distributions, for example uniform, Zipfian and others. At this point, based on our experience with COSBench, it assumes object sizes are uniformly distributed within the pre-defined range, and that all objects are equally likely to be accessed by GET operations. Adding more choices of distribution for object size and access pattern would be a good direction for COSBench; the contrast between the two is sketched below.
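As a small illustration (our own sketch, not COSBench code), here is the difference between uniform size sampling and a skewed, Zipf-like access pattern; numpy is used for the Zipf draw.

import random
import numpy as np

def uniform_size(lo=1000000, hi=10000000):
    # COSBench-style sampling: every size in [lo, hi] is equally likely.
    return random.randint(lo, hi)

def zipf_object_index(num_objects, a=1.5):
    # Skewed access: low-numbered ("popular") objects are drawn far more often.
    idx = np.random.zipf(a)
    while idx > num_objects:
        idx = np.random.zipf(a)
    return idx - 1

print([uniform_size() for _ in range(3)])           # three random object sizes
print([zipf_object_index(1000) for _ in range(5)])  # five skewed object picks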
We provide some sample Swift workloads in the following table.