Following up on yesterday’s post, “200,000,000 Keys in Redis 2.0.0-rc3,” which was a worst-case test of the overhead for top-level keys in Redis, I decided to push the boundaries in a different way. I wanted to use the new Hash data type to see if I could store over 1 billion values on a single 32GB box. To do that, I modified my previous script to create 25,000,000 top-level hashes, each holding 50 key/value pairs.
The code for redisStressHash was this:
    #!/usr/bin/perl -w
    $|++;

    use strict;
    use lib 'perl-Redis/lib';
    use Redis;

    my $r = Redis->new(server => 'localhost:63790') or die "$!";

    ## 1.25B values (25M hashes x 50 fields each)
    for my $key (1..25_000_000) {
        my @vals;

        # Build a flat (field, value) list for this hash
        for my $k (1..50) {
            my $v = int(rand($key));
            push @vals, $k, $v;
        }

        $r->hmset("$key", @vals) or die "$!";
    }

    exit;

    __END__
Note the use lib line in there: it loads a modified Redis Perl library that speaks the multi-bulk protocol used throughout the Redis 2.0 series.
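For readers who haven't looked at the wire format, here is a minimal sketch of how a command such as HMSET is framed as a multi-bulk request: a line with the argument count (*N), then a length line ($len) and the payload for each argument, all terminated by CRLF. It is written in Python rather than Perl purely for illustration, and encode_command and hmset_request are hypothetical helper names, not part of any Redis client library.

```python
def encode_command(*args):
    """Frame a command as a multi-bulk request: *<argc>, then $<len> + payload per argument."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


def hmset_request(key, pairs):
    """Flatten (field, value) pairs into a single HMSET request (hypothetical helper)."""
    flat = []
    for field, value in pairs:
        flat.extend([field, value])
    return encode_command("HMSET", key, *flat)


# A one-field hash, to show the framing in full.
small = hmset_request("1", [(1, 0)])
print(small)

# A 50-field hash like the ones the stress script writes:
# "HMSET" + key + 50 fields + 50 values = 102 arguments.
full = hmset_request("42", [(k, 0) for k in range(1, 51)])
print(full[:6])
```

Each HMSET issued by the stress script therefore carries 102 arguments in a single request, which is why the client library needs to speak this format.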
If you do the math, that yields 1.25 billion (1,250,000,000) key/value pairs stored. This time I remembered to time the execution as well:
    real    160m17.479s
    user    58m55.577s
    sys     5m53.178s
So it took about 2 hours and 40 minutes to complete. The resulting dump file (.rdb file) was 13GB in size (compared to the previous 1.8GB) and the memory usage was roughly 17GB.
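To put the timing in perspective, here is a rough back-of-the-envelope calculation (in Python, just as a calculator) using only the figures above. It works out to roughly 2,600 HMSET calls per second, or about 130,000 key/value pairs per second. The user and sys figures are the CPU time of the Perl client itself, so the last number is only a hint at how much of the wall-clock time the client spent computing rather than waiting.

```python
# Workload, taken from the script: 25M hashes with 50 fields each.
hashes = 25_000_000
fields_per_hash = 50
total_pairs = hashes * fields_per_hash

# Timings, taken from the `time` output.
real_s = 160 * 60 + 17.479
user_s = 58 * 60 + 55.577
sys_s = 5 * 60 + 53.178

hmset_per_s = hashes / real_s
pairs_per_s = total_pairs / real_s
client_cpu_fraction = (user_s + sys_s) / real_s

print(f"total pairs:      {total_pairs:,}")
print(f"HMSET per second: {hmset_per_s:,.0f}")
print(f"pairs per second: {pairs_per_s:,.0f}")
print(f"client CPU share: {client_cpu_fraction:.0%}")
```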
Here’s the INFO output again on the master:
    redis_version:1.3.16
    redis_git_sha1:00000000
    redis_git_dirty:0
    arch_bits:64
    multiplexing_api:epoll
    process_id:21426
    uptime_in_seconds:12807
    uptime_in_days:0
    connected_clients:1
    connected_slaves:1
    blocked_clients:0
    used_memory:18345759448
    used_memory_human:17.09G
    changes_since_last_save:774247
    bgsave_in_progress:1
    last_save_time:1280092860
    bgrewriteaof_in_progress:0
    total_connections_received:22
    total_commands_processed:32937310
    expired_keys:0
    hash_max_zipmap_entries:64
    hash_max_zipmap_value:512
    pubsub_channels:0
    pubsub_patterns:0
    vm_enabled:0
    role:master
    db0:keys=25000000,expires=0
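The interesting number hiding in that output is the memory cost per stored value. As a small sketch (Python again, with parse_info being a hypothetical helper and only the relevant INFO lines pasted in), dividing used_memory by the key count gives roughly 734 bytes per hash, or a bit under 15 bytes per key/value pair. Note that 50 fields per hash is below hash_max_zipmap_entries:64, so presumably the compact zipmap encoding is what keeps this figure so low.

```python
# Relevant lines copied from the INFO output above.
info_text = """\
used_memory:18345759448
hash_max_zipmap_entries:64
db0:keys=25000000,expires=0
"""


def parse_info(text):
    """Turn 'name:value' lines into a dict of strings (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            info[name.strip()] = value.strip()
    return info


info = parse_info(info_text)
used_memory = int(info["used_memory"])

# db0 looks like "keys=25000000,expires=0"
db0 = dict(item.split("=") for item in info["db0"].split(","))
keys = int(db0["keys"])

fields_per_hash = 50  # from the stress script
bytes_per_hash = used_memory / keys
bytes_per_pair = bytes_per_hash / fields_per_hash

print(f"used memory:    {used_memory / 2**30:.2f} GiB")
print(f"bytes per hash: {bytes_per_hash:.1f}")
print(f"bytes per pair: {bytes_per_pair:.2f}")
```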
Not bad, really. This provides a slightly more reasonable use case of storing many values in Redis. In most applications, I suspect people will have a number of “complex” values stored behind their top-level keys (unlike my previous simple test).
I’m kind of tempted to re-run this test using LISTS, then SETS, then SORTED SETS just to see how they all compare from a storage point of view.
In any case, a 10-machine cluster could handle roughly 12.5 billion key/value pairs this way. Food for thought.
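As a rough sketch of that cluster arithmetic (in Python, under the assumption that each machine holds the same load as the single box tested here and that the client spreads keys evenly): the sharding function below, node_for_key, is purely hypothetical. It hashes the top-level key with CRC32 and takes it modulo the node count, which is just one simple way a client could pick a machine.

```python
import zlib

NODES = 10  # assumed cluster size

# Per-node figures measured in the single-box test.
pairs_per_node = 25_000_000 * 50
used_memory_per_node = 18345759448  # bytes, from INFO

cluster_pairs = pairs_per_node * NODES
cluster_mem_gib = used_memory_per_node * NODES / 2**30


def node_for_key(key, nodes=NODES):
    """Pick a node for a top-level key via CRC32 modulo node count (hypothetical scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % nodes


print(f"cluster pairs:  {cluster_pairs:,}")
print(f"cluster memory: {cluster_mem_gib:.0f} GiB")
print(f"key 42 lives on node {node_for_key(42)}")
```

A plain modulo scheme like this reshuffles most keys whenever the node count changes, so it is only meant to make the capacity estimate concrete.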