That is the common question I hear: Which one is better, Solr or Elasticsearch? Which one is faster? Which one scales better? Which one can do X, and Y, and Z? Which one is easier to manage? Which one should we use? Which one do you recommend? These are all great questions, though not always with clear and definite, universally applicable answers.
So which one do we recommend you use? How do you choose in the end? Well, let me share how I see the past, present, and future of Apache Solr and Elasticsearch, do a bit of comparing and contrasting, and hopefully help you make the right choice for your particular needs.
Apache Solr is a mature project with a large and active development and user community behind it, as well as the Apache brand. First released as open source in 2006, Solr has long dominated the search engine space and was the go-to engine for anyone needing search functionality. Its maturity translates to rich functionality beyond vanilla text indexing and searching, such as faceting, grouping (aka field collapsing), powerful filtering, pluggable document processing, pluggable search chain components, language detection, etc.
Solr dominated the search scene for several years. Then, around 2010, Elasticsearch appeared as another option on the market. Back then, it was nowhere near as stable as Solr, did not have Solr’s feature depth, did not have the mindshare, brand, and so on.
But it had a few other things going for it: Elasticsearch was young and built on more modern principles, aimed at more modern use cases, and was built to make handling of large indices and high query rates easier. Moreover, because it was so young and without a community to work with, it had the freedom to move forward in leaps and bounds, without requiring any sort of consensus or cooperation with others (users or developers), backwards compatibility, or anything else that more mature software typically has to handle.
As such, it exposed certain highly sought-after functionality (e.g., Near Real-Time Search) before Solr did. Technically speaking, the ability to have NRT Search really came from Lucene, the underlying search library that both Solr and Elasticsearch use. The irony is that because Elasticsearch exposed NRT Search first, people associated NRT Search with Elasticsearch, even though Solr and Lucene are both part of the same Apache project and, as such, one would expect Solr to have such highly demanded functionality first.
Elasticsearch, being more modern, appealed to several groups of people and organizations:
Of course, let’s admit it, there will always be those who like jumping on new shiny objects, too.
Fast forward to 2015. Elasticsearch is no longer new, but it’s still shiny. It closed the feature gap with Solr and, in some cases, surpassed it. It certainly has more buzz around it. At this point, both projects are very mature. Both have lots of features. Both are stable. I have to say, though, that I do see more Elasticsearch clusters with issues, and I think there are a few reasons for that:
Although this may sound scary, let me put it this way — Elasticsearch exposes a ton of control knobs one can play with to control the beast. Of course, the key bit is that one has to be aware of all possible knobs, know what they do, and make use of that. For example, despite what you just read about Elasticsearch, we rely on it in our organization for several different products, even though we know Solr just as well as we know Elasticsearch.
What about Solr? Solr hasn’t exactly stood still. The appearance of Elasticsearch was actually great for Solr and its community of developers and users. Despite being almost 10 years old, Solr development is going faster than ever. It, too, has a friendly API now. It, too, has the ability to more easily grow and shrink clusters, create indices more dynamically, shard them on the fly, route documents and queries, etc., etc. Note: when people refer to SolrCloud they specifically mean this form of very distributed, Elasticsearch-like Solr deployment.
I recently attended a Lucene/Solr Revolution conference in Washington, D.C., and was pleasantly surprised by what I saw: a strong community, a healthy project, and lots of big-name companies not only using Solr but also investing in it through adoption, contributions of development/engineering time, etc.
If you follow just the news, you’d be led to believe Solr is dead and everyone is flocking to Elasticsearch. That is actually not the case. Elasticsearch, being newer, is naturally more interesting to write about. Solr was news 5-plus years ago. And of course some people went from Solr to Elasticsearch when Elasticsearch appeared; in the beginning, there were simply no Elasticsearch users.
So which is better? Which one should you use? Where do Solr and Elasticsearch differ? What does the future hold?
Here are some other things you should keep in mind:
In conclusion, here are the bits that I think make the most difference for anyone having to make a choice:
If you expected a single definitive winner, I’m sorry to disappoint. We don’t have one here. However, I hope this quick comparison of the two leading open-source search engines provides enough information and guidance to help you make the right choice for your organization.
About the author: Otis Gospodnetić is a Lucene, Solr, and Elasticsearch expert, co-author of Lucene in Action (1st and 2nd editions), and the founder and CEO of Sematext. Sematext is a globally distributed organization that builds innovative cloud and on-premises solutions for performance monitoring, alerting, and anomaly detection (SPM), log management and analytics (Logsene), search analytics (SSA), and search enhancement. The company also provides Search and Big Data consulting services and offers 24/7 production support for Solr and Elasticsearch to clients worldwide.