Storing and delivering static content is still a very relevant problem. Many people need large, reliable storage for static images and other static files, plus a way to deliver them to end users. The most popular solution is still NFS-mounted storage that is accessible from all front-ends, but this approach has serious bottlenecks.
Now let's dig deeper:
Hard to back up: Some of you will say that this is not so! But imagine that you have 10TB of small images that your application uses regularly and that are critical. A standard rsync and/or tar run could take a lot of time and system resources, which is definitely not what we want.
Everything relies on NAS: So what? We can buy a reliable NAS/SAN with cool RAID (1-10) storage and use it. But on closer look, to get 10TB of space with, for example, 1TB 15k RPM SAS drives, we will need at least 11 drives plus a RAID controller. Everything is good so far, but wait: what does that cost? After digging through online shops and price lists you will see that this is quite expensive, especially if your data is critical and you need a hot backup, that is, a second NAS/SAN. Another bottleneck is that this solution can only scale vertically, which is expensive and hard to achieve. And last but not least, all clients have to share the same I/O device. This is truly a problem for large-scale deployments.
Statically mounted external storage is needed: This means that your whole system relies on an externally mounted device, and no matter how reliable that device is, it is some kind of SPOF (single point of failure).
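The drive count in the NAS point above is simple arithmetic. Here is a minimal sketch, assuming the 11-drive figure corresponds to RAID 5 (one drive's worth of parity). The function name drives_needed and the RAID 6 and RAID 10 branches are illustrative additions, not part of the original setup:

```shell
# drives_needed USABLE_TB DRIVE_TB RAID_LEVEL
# Prints the minimum number of drives for the requested usable capacity.
drives_needed() {
    dn_usable=$1
    dn_drive=$2
    dn_level=$3
    # Data drives, rounded up to whole drives.
    dn_data=$(( (dn_usable + dn_drive - 1) / dn_drive ))
    case $dn_level in
        5)  echo $(( dn_data + 1 )) ;;   # one drive of parity
        6)  echo $(( dn_data + 2 )) ;;   # two drives of parity
        10) echo $(( dn_data * 2 )) ;;   # every data drive is mirrored
        *)  echo "unsupported RAID level: $dn_level" >&2; return 1 ;;
    esac
}

# drives_needed 10 1 5    prints 11
# drives_needed 10 1 10   prints 20
```

Hot spares, the controller and a second NAS/SAN for hot backup come on top of these numbers.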
So combining all of this shows that the classical shared-storage architecture is hard to implement, expensive and slow for large deployments. This may not be a big deal if you run IT at a bank and your management has lots of money and very little “imagination”. In that case this article is not for you.
So for everyone else, let's summarize what we need:
After spending a lot of time looking for a solution to the problems mentioned above, we found what seems to be an ideal one:
Before starting let’s summarize what these two tools will give us:
Riak: A wonderful, fully clustered NoSQL server written in Erlang. It works asynchronously, has great performance and is easy to access via REST, Protocol Buffers and lots of other interfaces. It also has a built-in real-time search index and a MapReduce implementation. But for now we will use only a small part of Riak: storage for static files. In this scenario we should look at several benefits over the shared-storage solution.
So let's start my favorite part: installation and configuration of the tools mentioned above. As I'm a Debian fan, I will do this on the current stable release, Debian 6.0 Squeeze.
First you need to download and install Riak. At the time of writing this was the latest version of Riak, but before copy-pasting, check for the latest version here: http://basho.com/resources/downloads/.
Download and Install Riak:
# cd /usr/local/src
# wget http://s3.amazonaws.com/downloads.basho.com/riak/CURRENT/debian/6/riak_1.2.1-1_amd64.deb
# dpkg -i riak_1.2.1-1_amd64.deb
Done! Riak is installed. Do not start it yet. Just in case it is already running, stop it:
# /etc/init.d/riak stop
Now we need to cluster it and make some configuration changes. By default Riak binds to 127.0.0.1, which is not a good idea for clusters, so change it to the internal IP address of the server. Do not bind Riak to the server's public IP, if one exists.
Edit /etc/riak/app.config and change:
{pb_ip, "192.168.235.111" },
and
{http, [ {"192.168.235.111", 8098 } ]},
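After editing, it may be worth confirming that no active listener still points at loopback. The following is a small sketch. The function name check_riak_bind is hypothetical, and it assumes that Erlang comment lines (starting with %) can be ignored and that any remaining "127.0.0.1" in the file is a listener address:

```shell
# check_riak_bind [CONFIG]
# Prints every non-comment line that still mentions 127.0.0.1.
# Returns 0 when there is none, 1 otherwise.
check_riak_bind() {
    crb_config=${1:-/etc/riak/app.config}
    # Drop Erlang comment lines first, then look for the loopback address.
    if grep -v '^[[:space:]]*%' "$crb_config" | grep '"127\.0\.0\.1"'; then
        echo "loopback address still configured in $crb_config" >&2
        return 1
    fi
    return 0
}

# Usage: check_riak_bind && echo "bind addresses look fine"
```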
127.0.0.1 localhost
192.168.235.111 riak1.your-domain.com riak1

192.168.235.112 riak2.your-domain.com riak2
192.168.235.113 riak3.your-domain.com riak3
192.168.235.11N riakN.your-domain.com riakN
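For more than a handful of nodes, the host entries above can be generated instead of typed. This is a sketch that assumes the nodes are numbered consecutively from 192.168.235.111, as in the example. The function name riak_hosts_entries is hypothetical:

```shell
# riak_hosts_entries COUNT [DOMAIN]
# Prints one hosts-file line per node: riak1 gets .111, riak2 gets .112, ...
riak_hosts_entries() {
    rhe_count=$1
    rhe_domain=${2:-your-domain.com}
    rhe_i=1
    while [ "$rhe_i" -le "$rhe_count" ]; do
        printf '192.168.235.%d riak%d.%s riak%d\n' \
            "$((110 + rhe_i))" "$rhe_i" "$rhe_domain" "$rhe_i"
        rhe_i=$((rhe_i + 1))
    done
}

# Usage: riak_hosts_entries 3
# 192.168.235.111 riak1.your-domain.com riak1
# 192.168.235.112 riak2.your-domain.com riak2
# 192.168.235.113 riak3.your-domain.com riak3
```

Review the output before appending it to the hosts file on each node.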
# mkfs.xfs /dev/sdb1
# mount /dev/sdb1 /mnt
# mv /var/lib/riak/* /mnt/
# umount /mnt
# mount /dev/sdb1 /var/lib/riak
# chown -R riak.riak /var/lib/riak
# mount /dev/sdb1 /opt
# mkdir /opt/riak
# chown riak.riak /opt/riak
# mv /var/lib/riak/* /opt/riak
{riak_core, [
    {ring_state_dir, "/opt/riak/riak/ring"},
...--------...
{bitcask, [{data_root, "/opt/riak/bitcask"} ]},
{eleveldb, [{data_root, "/opt/riak/leveldb"}]},
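Whichever layout is used, it may be worth confirming that the Riak data directory really sits on the new XFS volume before the node is started. This sketch reads a mounts table in /proc/mounts format. The function name fs_type_of is hypothetical:

```shell
# fs_type_of MOUNTPOINT [MOUNTS_FILE]
# Prints the filesystem type mounted at MOUNTPOINT. Fails if nothing is mounted there.
fs_type_of() {
    # Fields in /proc/mounts: device, mount point, filesystem type, options, ...
    # The last matching entry wins, since later mounts shadow earlier ones.
    awk -v mp="$1" '
        $2 == mp { fstype = $3 }
        END { if (fstype == "") exit 1; print fstype }
    ' "${2:-/proc/mounts}"
}

# Usage: [ "$(fs_type_of /var/lib/riak)" = "xfs" ] || echo "data directory is not on XFS"
```

For the /opt layout, pass /opt as the mount point.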
{https, [ {"192.168.235.111", 8069 } ]},
{ssl, [
    {certfile, "/etc/riak/ssl/riak.crt"},
    {keyfile, "/etc/riak/ssl/riak.pem"}
]},
-name [email protected] to -name [email protected]
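The node name change can also be scripted. This is a sketch that assumes the -name line lives in /etc/riak/vm.args, the usual location for the Debian package. The function name set_riak_node_name and the node name in the usage line are illustrative assumptions, so substitute the name your node should actually use:

```shell
# set_riak_node_name NODE [VM_ARGS_FILE]
# Rewrites the "-name ..." line in vm.args and leaves every other line untouched.
set_riak_node_name() {
    srn_node=$1
    srn_file=${2:-/etc/riak/vm.args}
    srn_tmp=$(mktemp) || return 1
    # Write to a temporary file first, then copy back so the original
    # file keeps its owner and permissions.
    sed "s/^-name .*/-name $srn_node/" "$srn_file" > "$srn_tmp" \
        && cat "$srn_tmp" > "$srn_file"
    srn_status=$?
    rm -f "$srn_tmp"
    return "$srn_status"
}

# Usage (hypothetical node name): set_riak_node_name riak@192.168.235.111
```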
/etc/init.d/riak restart
riak-admin cluster join [email protected]
riak-admin cluster plan
riak-admin cluster commit
curl -v -X PUT http://192.168.235.111:8098/riak/images/foo.jpg -H "Content-type: image/jpeg" --data-binary @./foo.jpg
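The same upload, and the matching download, can be wrapped in small helpers for scripting. This sketch uses the /riak/bucket/key URL form and port 8098 from the command above. The function names are hypothetical:

```shell
# riak_object_url HOST BUCKET KEY
# Builds the HTTP URL of an object on a Riak node listening on port 8098.
riak_object_url() {
    printf 'http://%s:8098/riak/%s/%s\n' "$1" "$2" "$3"
}

# riak_put_file HOST BUCKET KEY FILE CONTENT_TYPE
# Uploads FILE as the object BUCKET/KEY. Fails on HTTP errors (-f).
riak_put_file() {
    curl -s -f -X PUT "$(riak_object_url "$1" "$2" "$3")" \
        -H "Content-Type: $5" --data-binary "@$4"
}

# riak_get_file HOST BUCKET KEY OUTPUT_FILE
# Downloads the object BUCKET/KEY into OUTPUT_FILE.
riak_get_file() {
    curl -s -f -o "$4" "$(riak_object_url "$1" "$2" "$3")"
}

# Usage:
#   riak_put_file 192.168.235.111 images foo.jpg ./foo.jpg image/jpeg
#   riak_get_file 192.168.235.112 images foo.jpg /tmp/foo.jpg
```

Fetching the file back from a different node than the one it was uploaded to is a quick way to see that the cluster is replicating data.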
upstream riak {
    server 192.168.235.111:8098 fail_timeout=30s;
    server 192.168.235.112:8098 fail_timeout=30s;
    server 192.168.235.113:8098 fail_timeout=30s;
}

server {
    listen 80;
    server_name your.public.domain;

    if ( $uri !~ \. ) { return 403; }        # Require URI with file extension
    if ( $uri !~ ^/.*/.* ) { return 403; }   # Disable access to Riak /
    if ( $uri ~ ^/.*/.*/.* ) { return 403;}  # Disable Link walk MR etc

    location / {
        if ($request_method = GET){
            proxy_pass http://riak;
            rewrite ^/(.*) /riak/$1 break;   # Remove /riak from external URL (Hide Riak)
        }
        proxy_redirect off;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Referer "";         # Zero up referer or Riak will 403 all requests
        proxy_hide_header X-Riak-Vclock;     # Remove Riak specific headers
        proxy_hide_header Link;              # Remove Riak specific headers
        proxy_hide_header ETag;              # Remove Another Riak header
        proxy_hide_header Server;
    }
}
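To see which external URLs this configuration lets through, the three checks and the rewrite can be modelled in a few lines of shell. This is only an approximation for GET requests, not a replacement for testing against the real server. The function name map_uri is hypothetical:

```shell
# map_uri URI
# Prints the path that would be requested from Riak, or 403 when one of
# the three "if" checks in the server block would reject the request.
map_uri() {
    mu_uri=$1
    case $mu_uri in *.*) ;; *) echo 403; return 0 ;; esac   # needs a file extension
    case $mu_uri in /*/*/*) echo 403; return 0 ;; esac      # deeper than /bucket/key
    case $mu_uri in /*/*) ;; *) echo 403; return 0 ;; esac  # needs /bucket/key
    echo "/riak$mu_uri"                                     # rewrite ^/(.*) /riak/$1
}

# map_uri /images/foo.jpg        prints /riak/images/foo.jpg
# map_uri /images                prints 403
# map_uri /riak/images/foo.jpg   prints 403
```

So a file stored as images/foo.jpg is served publicly as http://your.public.domain/images/foo.jpg, while the internal /riak/ prefix stays hidden from the outside.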