With earlier versions of Solr, you had to set up your own load balancer. Now each individual node load balances requests across the replicas in a cluster. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client. (Solr provides a smart Java SolrJ client called CloudSolrServer.)
A smart client understands how to read and interact with ZooKeeper, and only needs the ZooKeeper ensemble's address to start discovering which nodes it should send requests to.
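As a minimal sketch of such a smart client, the SolrJ snippet below connects through the ZooKeeper ensemble rather than through any single Solr node. The ZooKeeper address and collection name are placeholder assumptions, and the class names follow the SolrJ 4.x API mentioned above (CloudSolrServer).

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SmartClientExample {
    public static void main(String[] args) throws Exception {
        // The smart client only needs the ZooKeeper ensemble address; it reads the
        // cluster state from ZooKeeper to find live nodes and replicas.
        // "zk1:2181,zk2:2181,zk3:2181" and "collection1" are placeholder values.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        // Requests are routed to the right shards and load balanced across replicas.
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("Found " + rsp.getResults().getNumFound() + " documents");

        server.shutdown();
    }
}
```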
SolrCloud supports near real-time search and indexing, elasticity, high availability, and fault tolerance. What this means, basically, is that when you have a large cluster, you can always make requests to the cluster, and if a request is acknowledged you can be sure it is durable; i.e., you won't lose data. Updates can be seen right after they are made, and the cluster can be expanded or contracted.
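To illustrate the near real-time behavior, here is a sketch of indexing a document with a commitWithin hint so that it becomes searchable within about a second without an explicit commit. The ZooKeeper address, collection name, and field names are placeholder assumptions.

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class NearRealTimeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper address and collection name.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "SolrCloud near real-time example");

        // The add is acknowledged once the shard leader has accepted it and forwarded
        // it to the replicas; commitWithin=1000 asks Solr to make it searchable within
        // about one second without an explicit commit call.
        server.add(doc, 1000);

        server.shutdown();
    }
}
```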
A Transaction Log is created for each node so that every change to content or organization is noted. The log is used to determine which content in the node should be included in a replica. When a new replica is created, it refers to the Leader and the Transaction Log to know which content to include. If this recovery fails, it retries.
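The transaction log is the update log that Solr keeps per core. A minimal sketch of the stock configuration, assuming the default DirectUpdateHandler2 section of solrconfig.xml, looks like this (the directory value shown is the standard default):

```xml
<!-- solrconfig.xml: inside the <updateHandler class="solr.DirectUpdateHandler2"> section.
     The update log (transaction log) must be enabled for SolrCloud replica recovery
     and real-time get; the directory value below is the stock default. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
```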
Since the Transaction Log consists of a record of updates, it allows for more robust indexing: if indexing is interrupted, the uncommitted updates can be replayed from the log.
If a leader goes down, it may have sent requests to some replicas and not others. So when a new potential leader is identified, it runs a sync process against the other replicas. If this is successful, everything should be consistent, the leader registers as active, and normal actions proceed. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
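While the failover itself is handled internally, a client can observe which replica currently holds the leader role by reading the cluster state from ZooKeeper. The sketch below uses the SolrJ 4.x cloud classes; the collection and shard names are placeholders, and exact method availability may vary across Solr versions.

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.ZkStateReader;

public class LeaderLookupExample {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.connect();  // open the ZooKeeper session and start watching cluster state

        ZkStateReader zkStateReader = server.getZkStateReader();
        ClusterState state = zkStateReader.getClusterState();

        // Ask the cluster state which replica is currently the leader of shard1.
        // "collection1" and "shard1" are placeholder names.
        Replica leader = state.getLeader("collection1", "shard1");
        System.out.println("Current leader: " + leader.getStr(ZkStateReader.BASE_URL_PROP));

        server.shutdown();
    }
}
```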
If an update fails because cores are reloading schemas and some have finished but others have not, the leader tells the nodes that the update failed and starts the recovery procedure.