How would HBase compare to Facebook's Haystack for photo storage?

link: http://www.quora.com/Apache-Hadoop/How-would-HBase-compare-to-Facebooks-Haystack-for-photo-storage?q=hbase+bl


The comparison isn't very fair, since they're different things. Haystack is software that runs on a single machine and stores data without replication, peer awareness, or anything else that makes distributed storage distributed. It's no more distributed than XFS, and sometimes it isn't even used when the content being stored (video, for example) isn't a good fit for Haystack.
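To make the "single machine, no replication" point concrete, here is a minimal sketch of that kind of local store. It assumes an append-only data file plus an in-memory index from key to (offset, length). The class and method names are hypothetical and this is not Haystack's actual code. Nothing in it knows about other machines, and because the index lives only in memory, this sketch would not survive a restart.

```python
import os
import tempfile
from typing import Dict, Optional, Tuple


class SingleNodeBlobStore:
    """Hypothetical single-machine blob store: one append-only data file plus
    an in-memory index of key -> (offset, length). No replication, no peers."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, Tuple[int, int]] = {}
        open(self.path, "ab").close()  # create the data file if it is missing

    def put(self, key: str, data: bytes) -> None:
        # Write-once: an existing key is never overwritten in place.
        if key in self.index:
            raise ValueError(f"{key!r} is already stored; blobs are write-once")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # the new blob starts at the current end of file
            f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> Optional[bytes]:
        # One lookup in memory, then one seek and one read on disk.
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SingleNodeBlobStore(os.path.join(tmp, "volume.dat"))
        store.put("photo:1", b"first photo bytes")
        store.put("photo:2", b"second photo bytes")
        print(store.get("photo:2"))  # b'second photo bytes'
```

Everything that would make this "distributed" (choosing which machine gets a write, keeping copies elsewhere, recovering a lost disk) has to live in some other layer.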

There's a system that sits on top of Haystack (and others) that figures out where writes go, where reads go, how replication happens, how recovery happens, and all that fun stuff. That's closer to HBase in this context. That system is a distributed key/value binary blob store optimized for reads, datacenter replication, and cost. I'm no HBase expert, but I think the built-in datacenter replication gives it some advantages (especially where cost is concerned). Plus, it's made for fast reads and writes (not updates), which works for photos and video.
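The layer described above can be sketched roughly as follows, assuming each single-machine store is a simple in-memory object and each key gets one replica per datacenter, chosen by hashing the key. `LocalStore`, `BlobRouter`, and the placement rule are hypothetical illustrations, not Facebook's system, and recovery and rebalancing are left out entirely.

```python
import hashlib
from typing import Dict, List, Optional


class LocalStore:
    """Stand-in for a single-machine store: it knows nothing about its peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)


class BlobRouter:
    """Hypothetical coordination layer that decides where writes and reads go."""

    def __init__(self, datacenters: Dict[str, List[LocalStore]]) -> None:
        self.datacenters = datacenters

    @staticmethod
    def _pick(key: str, stores: List[LocalStore]) -> LocalStore:
        # Deterministic placement: the same key always maps to the same store.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return stores[int.from_bytes(digest[:8], "big") % len(stores)]

    def replicas(self, key: str) -> List[LocalStore]:
        # One replica per datacenter, in the order the datacenters were given.
        return [self._pick(key, stores) for stores in self.datacenters.values()]

    def put(self, key: str, data: bytes) -> None:
        targets = self.replicas(key)
        if any(not s.alive for s in targets):
            raise RuntimeError(f"a replica for {key!r} is down; write refused")
        # Write-once: blobs are stored, never updated.
        if any(s.get(key) is not None for s in targets):
            raise ValueError(f"{key!r} is already stored; blobs are write-once")
        for store in targets:
            store.put(key, data)

    def get(self, key: str) -> Optional[bytes]:
        # Serve the read from the first live replica that has the blob.
        for store in self.replicas(key):
            if store.alive:
                data = store.get(key)
                if data is not None:
                    return data
        return None


if __name__ == "__main__":
    router = BlobRouter({
        "dc1": [LocalStore("dc1-a"), LocalStore("dc1-b")],
        "dc2": [LocalStore("dc2-a"), LocalStore("dc2-b")],
    })
    router.put("photo:42", b"jpeg bytes")
    router.replicas("photo:42")[0].alive = False  # lose the dc1 copy
    print(router.get("photo:42"))  # b'jpeg bytes', served from dc2
```

Refusing a write when a replica is down keeps this sketch simple and every copy complete; a production system would more likely place the write elsewhere and repair the missing copy later, which is the "recovery" part mentioned above.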

It works great for us, but that doesn't mean it's the correct way and other ways are incorrect :)


Please have a look at "Apache Hadoop: Is HBase appropriate for indexed blob storage in HDFS?", where I explain what one solution could look like. But that is far from comparable to what Facebook has optimized for its own requirements. HBase is not custom-made to store and serve images; it was built as a generic data store to serve "key/value" style data. My view is that you can use it in combination with other systems to build a photo store.

This reminds me a bit of other custom-made solutions that do one thing very well but not others. Trying to fit one of those into another use case may not work at all. Haystack was built for one purpose, while HBase is too generic to be as good at it.
