Redis requires that the whole dataset be loaded into main memory at all times. (Redis Virtual Memory, which we’ll discuss later, relaxes this requirement, but still needs all keys to always be in memory.) Guaranteed in-memory access to most of the dataset is Redis' main performance driver, and is also responsible for its main limitations.
RAM is the gold of cloud computing. (Cloud servers are primarily priced based on the amount of available RAM. By comparison, disk and CPU are cheap.)
The amount of RAM that Redis needs is proportional to the size of the dataset. Large datasets in Redis are going to be fast, but expensive.
Redis persistence is highly configurable but the implementation makes extremely heavy use of I/O resources. Furthermore, most save operations require additional memory to complete successfully, and, in some cases, asynchronous saves can block the server for lengthy periods of time. (These points are discussed in more detail, below; see the Persistence section.)
Redis' internal design typically trades off memory for speed. For some workloads, there can be an order of magnitude difference between the raw number of bytes handed off to Redis to store, and the amount of memory that Redis uses.
Regardless of the data type, the data is always identified by a key, and the key is always a string.
For example, using the string data type:
redis> SET foo bar
OK
redis> GET foo
"bar"
redis> GET dne
(nil)
Keys can be marked for expiry. For example:
redis> EXPIRE foo 2
(integer) 1
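The remaining time to live can be checked with the TTL command (continuing the example above; the reply is the number of seconds left before the key expires):
redis> TTL foo
(integer) 2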
After waiting for 2 seconds:
redis> GET foo
(nil)
So far, this looks quite similar to memcached (GET/SET API, in-memory storage, etc.). However, there are a few important differences: Redis values are not limited to plain strings (there are several rich data types), Redis can persist its dataset to disk, and Redis supports master-slave replication. Each of these points will be addressed in more detail, below.
Redis' replication capabilities are powerful yet straightforward.
A master can have any number of slaves, and each slave can have any number of slaves of its own, and so on. Any topology is possible.
To point a slave to a specific master, issue the SLAVEOF command on the slave. Slaves will block until the initial synchronization with the master is complete.
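For example (a sketch; the host and port are placeholders for your master's address):
redis> SLAVEOF 192.168.1.10 6379
OK
Issuing SLAVEOF NO ONE turns the slave back into a master.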
This initial synchronization process consists of the master asynchronously snapshotting the current state of the database, transferring the snapshot to the slave, and then streaming all commands received since the snapshot was initiated.
Redis has configurable persistence settings, enabling durability to be tweaked depending on the problem domain.
If durability is not important:
Redis can be configured in “snapshotting mode”. In this mode, Redis saves a binary dump of the contents of the database every x seconds or every y operations. If either criterion is met, Redis forks the process. The child process writes the dump file to disk while the parent continues to service requests.
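These thresholds are controlled by the save directives in redis.conf. The values below are only illustrative; each line means "take a snapshot if at least this many keys changed within this many seconds":
save 900 1
save 300 10
save 60 10000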
This procedure can be memory-efficient due to the way that Copy-On-Write works when forking. (Here, a snapshot of the database is saved as it existed exactly at the time of forking; extra memory is required only to store the keys that change during the snapshot procedure. If every key changes in value over the course of the snapshot, then roughly 2x the amount of memory used by Redis before the save is required to complete the save operation. This is the upper bound on the memory usage required for saving.)
Of course, in this mode, any data that has not yet been written to a snapshot is lost if the server is killed.
If durability is important:
Redis can be configured to use an Append-Only File (AOF). Here, every command is written to a file. To recover from a crash or other server restart, the append-only file is replayed. There are three modes: fsync() on every new command, fsync() once per second, or never fsync() and leave flushing entirely to the operating system.
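In redis.conf these modes correspond to the appendfsync setting. The sketch below enables the AOF with the once-per-second mode (exactly one appendfsync value should be active):
appendonly yes
appendfsync everysec
# appendfsync always
# appendfsync no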
Using the BGREWRITEAOF command, Redis will rewrite the Append-Only File from the current in-memory dataset in order to shorten it. Like snapshotting, this is done asynchronously, in the background.
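For example (the exact wording of the status reply may vary between Redis versions):
redis> BGREWRITEAOF
Background append only file rewriting started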
More advanced configurations:
Persistence can be turned off completely. This is useful in a number of scenarios.
For example, if performance is very critical and your application demands extremely tight control over RAM usage, the following configuration is possible: turn off automatic persistence entirely (no snapshotting, no AOF), and produce a dump only when explicitly requested via the SAVE command.
The advantage of this set-up is that it requires no extra memory to complete a save, regardless of the number and frequency of writes. In this way, you are trading off durability for extremely tight control over memory usage.
No extra memory is required to complete the save because the SAVE command performs a synchronous save operation, thereby blocking the server that the command is issued against until the saving process completes. (Asynchronous saves, as discussed above, require extra memory proportional to the number of writes performed during the save.)
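A minimal sketch of this arrangement: leave appendonly no and omit all save directives from redis.conf so that no automatic saves ever run, then trigger a dump by hand (or from a scheduled job) whenever one is wanted:
redis> SAVE
OK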
Other variations on this theme are possible, for example AOF can be enabled on the slave only while persistence remains off on the master.
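A sketch of that variation, with the master's address as a placeholder:
# master's redis.conf: no "save" lines, appendonly no
# slave's redis.conf:
slaveof 192.168.1.10 6379
appendonly yes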
The binary dumps (i.e. those produced by the snapshot operations) are stored in a very efficient manner on disk.
Once a binary dump is loaded, Redis will use several factors more memory than the on-disk representation requires. The exact factor increase depends primarily on the data types that are in use. For example, Sorted Sets use significantly more memory than Sets, even though both data structures require similar amounts of space when serialized to disk.
This is expected behaviour, given that Redis heavily optimizes both read and write performance.
Note that optimizations are continually being made to reduce the amount of memory required to represent each of the data types in memory.
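One rough way to observe this (a sketch; the numbers shown are purely illustrative) is to compare the size of the dump file on disk against the used_memory figure reported by the INFO command once the dump has been loaded:
redis> INFO
...
used_memory:1218341
used_memory_human:1.16M
...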
Redis exhibits the following issues with persistence:
Most save operations require additional memory to complete successfully (as previously discussed). Depending on the size of the dataset, the frequency of writes, and the amount of RAM you are comfortable reserving, this may or may not be an issue.
Redis persistence requires extremely heavy I/O usage. This is discussed in detail here. Also see Salvatore’s response.
In some cases, asynchronous saves can block the server for lengthy periods of time. See this post on the mailing list for an interesting discussion.
Although the issues with Redis persistence are hard problems to solve, they are beginning to be discussed at length. We should continue to see improvements in this area.