Redis, from the Ground Up (2)

Key Disadvantages

Redis requires that the whole dataset be loaded into main memory at all times. (Redis Virtual Memory, which we’ll discuss later, relaxes this requirement, but still needs all keys to always be in memory.) Guaranteed in-memory access to most of the dataset is Redis' main performance driver, and it is also responsible for its main limitations.

RAM

RAM is the gold of cloud computing. (Cloud servers are primarily priced based on the amount of available RAM. By comparison, disk and CPU are cheap.)

The amount of RAM that Redis needs is proportional to the size of the dataset. Large datasets in Redis are going to be fast, but expensive.

Persistence

Redis persistence is highly configurable, but the implementation makes extremely heavy use of I/O resources. Furthermore, most save operations require additional memory to complete successfully, and, in some cases, asynchronous saves can block the server for lengthy periods of time. (These points are discussed in more detail below; see the Persistence section.)

Memory Bloat

Redis' internal design typically trades off memory for speed. For some workloads, there can be an order of magnitude difference between the raw number of bytes handed off to Redis to store, and the amount of memory that Redis uses.

Diving In

String Keys

Regardless of the data type, the data is always identified by a key, and the key is always a string.

For example, using the string data type:

    redis> SET foo bar
    OK
    redis> GET foo
    "bar"
    redis> GET dne
    (nil)

Expiry

Keys can be marked for expiry. For example:

    redis> EXPIRE foo 2
    (integer) 1

After waiting for 2 seconds:

    redis> GET foo
    (nil)
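To make the expiry behaviour concrete, here is a minimal Python sketch of a toy key-value store that mimics the session above. It is not how Redis is implemented; the `ExpiringStore` class, its method names, and the injectable `clock` parameter are all made up for illustration. It assumes keys are checked for expiry lazily (when next touched) and that a plain SET clears any pending expiry.

```python
import time


class ExpiringStore:
    """Toy key-value store with per-key expiry (illustrative, not Redis internals)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, so behaviour can be checked without sleeping
        self._data = {}          # key -> value
        self._deadlines = {}     # key -> absolute expiry time

    def _purge_if_expired(self, key):
        # Lazy expiry: an expired key is dropped the next time it is touched.
        deadline = self._deadlines.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._deadlines.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._deadlines.pop(key, None)   # assumption: SET clears a pending expiry
        return "OK"

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self._data:
            return 0                     # no such key, nothing to expire
        self._deadlines[key] = self._clock() + seconds
        return 1

    def get(self, key):
        self._purge_if_expired(key)
        return self._data.get(key)       # None plays the role of (nil)
```

Calling `set("foo", "bar")`, then `expire("foo", 2)`, and reading the key once two seconds have passed on the chosen clock returns `None`, matching the `(nil)` reply above.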

Sidenote: memcached?

So far, this looks quite similar to memcached (GET/SET API, in-memory storage, etc.). However, there are a few important things to note:

  1. Redis supports replication out of the box. Any sort of topology is possible, so you can create replication trees.
  2. Redis supports persistence, so you don’t lose everything that’s in memory when the server restarts.
  3. Redis supports a rich set of data types (far beyond memcached’s simple key-value-pairs).

Each of these points will be addressed in more detail, below.

Replication

Redis' replication capabilities are powerful yet straightforward.

A master can have any number of slaves, and each slave can have any number of their own slaves, and so on and so forth. Any topology is possible.

To point a slave to a specific master, issue the SLAVEOF command on the slave. Slaves will block until the initial synchronization with the master is complete.

This initial synchronization process consists of the master asynchronously snapshotting the current state of the database, then transferring the snapshot to the slave, and then subsequently streaming all commands received after initiating the snapshot.
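The three steps of the initial synchronization (snapshot, transfer, stream) can be sketched with plain Python dictionaries. This is a toy model under stated assumptions, not Redis' wire protocol: the `apply` and `initial_sync` helpers and the tuple-based command format are hypothetical, and the "transfer" is just a dictionary copy.

```python
def apply(db, command):
    """Apply a toy ("SET", key, value) or ("DEL", key) command to a dict."""
    op, key, *rest = command
    if op == "SET":
        db[key] = rest[0]
    elif op == "DEL":
        db.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


def initial_sync(master_db, writes_during_snapshot):
    """Return the slave's dataset after a toy initial synchronization."""
    # 1. The master takes a point-in-time snapshot of its dataset.
    snapshot = dict(master_db)

    # While the snapshot is produced and shipped, the master keeps serving
    # writes and buffers every command it receives.
    backlog = []
    for command in writes_during_snapshot:
        apply(master_db, command)
        backlog.append(command)

    # 2. The slave loads the transferred snapshot.
    slave_db = dict(snapshot)

    # 3. The buffered commands are streamed to the slave, in order.
    for command in backlog:
        apply(slave_db, command)
    return slave_db
```

Because the slave applies the same commands in the same order on top of the same snapshot, it ends up with the same dataset as the master.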

Persistence

Redis has configurable persistence settings, enabling durability to be tweaked depending on the problem domain.

Options

If durability is not important:

Redis can be configured in “snapshotting mode”. In this mode, Redis saves a binary dump of the contents of the database once x seconds have passed and at least y write operations have occurred since the last save; several such x/y pairs can be configured. When one of these criteria is met, Redis forks the process. The child process writes the dump file to disk while the parent process continues to service requests.

This procedure can be memory-efficient due to the way that Copy-On-Write works when forking. (Here, a snapshot of the database is saved as it existed exactly at the time of forking; extra memory is required only to store the keys that change during the snapshot procedure. If every key changes in value over the course of the snapshot, then roughly 2x the amount of memory used by Redis before the save is required to complete the save operation. This is the upper bound on the memory usage required for saving.)
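The bound described above is simple arithmetic, sketched here in Python. The function name and the linear model are assumptions for illustration: in practice the kernel copies whole memory pages rather than individual keys, so real usage depends on how the writes are spread across pages.

```python
def peak_memory_during_save(dataset_bytes, changed_fraction):
    """Estimate peak memory while a forked snapshot is in progress.

    Simplifying assumption: the copy-on-write cost grows linearly with the
    fraction of the dataset that is modified during the save.
    """
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    extra_bytes = int(dataset_bytes * changed_fraction)
    return dataset_bytes + extra_bytes
```

With nothing changing during the save, the estimate equals the dataset size; with every key changing (`changed_fraction=1.0`), it reaches the 2x upper bound.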

Of course, in this mode, any data that is not written in the snapshot is immediately lost if the server is killed.
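The trigger for a snapshot can be sketched as a small Python check. It assumes save points shaped like the `save <seconds> <changes>` lines in redis.conf, where a save point fires once at least that many seconds have passed and at least that many writes have occurred since the last save. The function name and the example values are illustrative only.

```python
def should_snapshot(seconds_since_save, changes_since_save, save_points):
    """Return True if any (seconds, changes) save point is satisfied."""
    return any(
        seconds_since_save >= seconds and changes_since_save >= changes
        for seconds, changes in save_points
    )


# Example save points: (seconds elapsed, minimum number of changes).
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]
```

A busy server hits the short-interval, high-change save point; a quiet one still gets saved eventually through the long-interval, low-change one. With no changes at all, no snapshot is taken.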

If durability is important:

Redis can be configured to use an Append-Only File (AOF). Here, every command is written to a file. To recover from a crash or other server restart, the append-only file is replayed. There are three modes:

  • fsync() on every new command
  • fsync() every second
  • Let the OS decide when to fsync()
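The trade-off between the three modes is easier to see in code. Below is a minimal Python sketch of an append-only log with replay. It is a toy model, not Redis' AOF format: the `AppendOnlyLog` class, the `replay` function, and the one-command-per-line text format are assumptions for illustration. The policy names mirror the values of Redis' `appendfsync` setting (`always`, `everysec`, `no`), and the "every second" mode is simplified to a check at write time.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log with three fsync policies (illustrative only)."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self._file = open(path, "a", encoding="utf-8")
        self._last_sync = time.monotonic()

    def append(self, command):
        self._file.write(command + "\n")
        self._file.flush()                     # hand the bytes to the OS
        now = time.monotonic()
        if self.policy == "always":
            os.fsync(self._file.fileno())      # on disk before we return
        elif self.policy == "everysec" and now - self._last_sync >= 1.0:
            os.fsync(self._file.fileno())      # roughly one second of writes at risk
            self._last_sync = now
        # policy "no": the OS writes to disk on its own schedule

    def close(self):
        self._file.close()


def replay(path):
    """Rebuild a dict by replaying 'SET key value' / 'DEL key' lines in order."""
    db = {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "SET":
                db[parts[1]] = parts[2]
            elif parts[0] == "DEL":
                db.pop(parts[1], None)
    return db
```

The stricter the policy, the fewer writes can be lost in a crash and the more often the server waits on the disk.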

Using the BGREWRITEAOF command, Redis will update the snapshot and re-write the Append-Only File to shorten it. Like snapshotting, this is done asynchronously, in the background.
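The idea behind shortening the file can be sketched in a few lines of Python: a long command history is replaced by the shortest list of commands that rebuilds the same final dataset. This is a simplified model under stated assumptions; the `rewrite_log` function and the text command format are hypothetical, and the real rewrite runs in the background rather than inline like this.

```python
def rewrite_log(commands):
    """Return a minimal command list that rebuilds the same final dataset."""
    db = {}
    for command in commands:
        parts = command.split()
        if parts[0] == "SET":
            db[parts[1]] = parts[2]
        elif parts[0] == "DEL":
            db.pop(parts[1], None)
    # One SET per surviving key replaces the full history.
    return [f"SET {key} {value}" for key, value in db.items()]
```

A key that was overwritten many times contributes a single command to the rewritten log, and a key that was deleted contributes none.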

More advanced configurations:

Persistence can be turned off completely. This is useful in a number of scenarios.

For example, if performance is very critical and your application demands extremely tight control over RAM usage, the following configuration is possible:

  • One master, persistence off, and
  • One slave, persistence off, and
  • Periodic synchronous saves, issued against the slave only

The advantage of this set-up is that it requires no extra memory to complete a save, regardless of the number and frequency of writes. In this way, you are trading off durability for extremely tight control over memory usage.

No extra memory is required to complete the save because the SAVE command performs a synchronous save operation, thereby blocking the server that the command is issued against until the saving process completes. (Asynchronous saves, as discussed above, require extra memory proportional to the number of writes performed during the save.)

Other variations on this theme are possible. For example, AOF can be enabled on the slave only while persistence remains off on the master.

Binary Dumps and In-Memory Representation

The binary dumps (i.e. those produced by the snapshot operations) are stored in a very efficient manner on disk.

Once a binary dump is loaded, Redis will use several times more memory than the on-disk representation requires. The exact factor depends primarily on the data types that are in use. For example, Sorted Sets use significantly more memory than Sets, even though both data structures require similar amounts of space when serialized to disk.

This is expected behaviour, given that Redis optimizes heavily for both read and write performance.

Note that optimizations are continually being made to reduce the amount of memory required to represent each of the data types in memory.

Problems

Redis exhibits the following issues with persistence:

  • Most save operations require additional memory to complete successfully (as previously discussed). Depending on the size of the dataset, the frequency of writes, and the amount of RAM you are comfortable reserving, this may or may not be an issue.

  • Redis persistence requires extremely heavy I/O usage. This is discussed in detail here. Also see Salvatore’s response.

  • In some cases, asynchronous saves can block the server for lengthy periods of time. See this post on the mailing list for an interesting discussion.

Although the issues with Redis persistence are hard problems to solve, the issues are beginning to be discussed at length. We should continue to see improvements in this area.
