SolrCloud Wiki Translation (2): Nodes, Cores, Clusters & Leaders

Nodes and Cores

In SolrCloud, a node is a Java Virtual Machine instance running Solr, commonly called a server. Each Solr core can also be considered a node. Any node can contain both an instance of Solr and various kinds of data.

A Solr core is basically an index of the text and fields found in documents. A single Solr instance can contain multiple "cores", which are separate from each other based on local criteria. It might be that they are going to provide different search interfaces to users (customers in the US and customers in Canada, for example), or they have security concerns (some users cannot have access to some documents), or the documents are really different and just won't mix well in the same index (a shoe database and a dvd database).

When you start a new core in SolrCloud mode, it registers itself with ZooKeeper. This involves creating an Ephemeral node that will go away if the Solr instance goes down, as well as registering information about the core and how to contact it (such as the base Solr URL, core name, etc). Smart clients and nodes in the cluster can use this information to determine who they need to talk to in order to fulfill a request.
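
For instance, this registration can be inspected with ZooKeeper's own command-line client (a sketch, assuming a stand-alone ZooKeeper at localhost:2181 and the Solr 4.x clusterstate.json layout; the two commands after connecting run inside the interactive client):

zkCli.sh -server localhost:2181
ls /live_nodes            # one ephemeral entry per live Solr node
get /clusterstate.json    # base URLs, core names, shard and replica assignments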

New Solr cores may also be created and associated with a collection via CoreAdmin. Additional cloud-related parameters are discussed in the Parameter Reference page. Terms used for the CREATE action are:

  • collection: the name of the collection to which this core belongs. Default is the name of the core.
  • shard: the shard id this core represents. (Optional: normally you want to be auto-assigned a shard id.)
  • collection.<param>=<value>: causes a property of <param>=<value> to be set if a new collection is being created. For example, use collection.configName=<configname> to point to the config for a new collection.

For example:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2'
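
If the target collection does not exist yet, a config set for it can be named in the same call; here mycollection and myconf are hypothetical names, and myconf must already have been uploaded to ZooKeeper:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=mycollection&collection.configName=myconf'
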
Clusters

A cluster is a set of Solr nodes managed by ZooKeeper as a single unit. When you have a cluster, you can always make requests to the cluster and if the request is acknowledged, you can be sure that it will be managed as a unit and be durable, i.e., you won't lose data. Updates can be seen right after they are made and the cluster can be expanded or contracted.

Creating a Cluster

A cluster is created as soon as you have more than one Solr instance registered with ZooKeeper. The section Getting Started with SolrCloud reviews how to set up a simple cluster.

Resizing a Cluster

Clusters contain a settable number of shards. You set the number of shards for a new cluster by passing a system property, numShards, when you start up Solr. The numShards parameter must be passed on the first startup of any Solr node, and is used to auto-assign which shard each instance should be part of. Once you have started up more Solr nodes than numShards, the nodes will create replicas for each shard, distributing them evenly across the nodes, as long as they all belong to the same collection.
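
For example, the first node of a two-shard cluster might be started like this (a sketch, assuming the Solr 4.x example Jetty layout and a stand-alone ZooKeeper at localhost:2181):

java -DzkHost=localhost:2181 -DnumShards=2 -jar start.jar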

To add more cores to your collection, simply start the new core. You can do this at any time and the new core will sync its data with the current replicas in the shard before becoming active.
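
For instance, reusing the earlier CoreAdmin example (mycore2 is a hypothetical core name; with no shard parameter, a shard is auto-assigned):

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore2&collection=collection1'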

You can also avoid numShards and manually assign a core a shard ID if you choose.

The number of shards determines how the data in your index is broken up, so you cannot change the number of shards of the index after initially setting up the cluster.

However, you do have the option of breaking your index into multiple shards to start with, even if you are only using a single machine. You can then expand to multiple machines later. To do that, follow these steps:

  1. Set up your collection by hosting multiple cores on a single physical machine (or group of machines). Each of these cores will be the leader of its shard (see the sketch after this list).
  2. When you're ready, you can migrate shards onto new machines by starting up a new replica for a given shard on each new machine.
  3. Remove the shard from the original machine. ZooKeeper will promote the replica to the leader for that shard.
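
As a sketch of step 1, two cores created on the same instance can take both shard slots of a two-shard collection (mycoll is a hypothetical collection name, and the URLs follow the earlier CREATE example):

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycoll_shard1&collection=mycoll&shard=shard1'
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycoll_shard2&collection=mycoll&shard=shard2'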

Leaders and Replicas

The concept of a leader is similar to that of master when thinking of traditional Solr replication. The leader is responsible for making sure the replicas are up to date with the same information stored in the leader.

However, with SolrCloud, you don't simply have one master and one or more "slaves", instead you likely have distributed your search and index traffic to multiple machines. If you have bootstrapped Solr with numShards=2, for example, your indexes are split across both shards. In this case, both shards are considered leaders. If you start more Solr nodes after the initial two, these will be automatically assigned as replicas for the leaders.

Replicas are assigned to shards in the order they are started the first time they join the cluster. This is done in a round-robin manner, unless the new node is manually assigned to a shard with the shardId parameter during startup. This parameter is used as a system property, as in -DshardId=1, the value of which is the ID number of the shard the new node should be attached to.
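
For example (a sketch under the same Solr 4.x startup assumptions as above), a new node can be pinned to shard 1 at startup:

java -DzkHost=localhost:2181 -DshardId=1 -jar start.jar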

On subsequent restarts, each node joins the same shard that it was assigned to the first time the node was started (whether that assignment happened manually or automatically). A node that was previously a replica, however, may become the leader if the previously assigned leader is not available.

Consider this example:

  • Node A is started with the bootstrap parameters, pointing to a stand-alone ZooKeeper, with the numShards parameter set to 2.
  • Node B is started and pointed to the stand-alone ZooKeeper.
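
In startup terms, this might look as follows (a sketch, assuming both nodes run on one machine from copies of the Solr 4.x example directory, with jetty.port keeping their ports distinct):

java -DzkHost=localhost:2181 -DnumShards=2 -jar start.jar            # Node A
java -Djetty.port=7574 -DzkHost=localhost:2181 -jar start.jar        # Node B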

Nodes A and B are both shards, and have fulfilled the 2 shard slots we defined when we started Node A. If we look in the Solr Admin UI, we'll see that both nodes are considered leaders (indicated with a solid black circle).

  • Node C is started and pointed to the stand-alone ZooKeeper.

Node C will automatically become a replica of Node A because we didn't specify any other shard for it to belong to, and it cannot become a new shard because we only defined two shards and those have both been taken.

  • Node D is started and pointed to the stand-alone ZooKeeper.

Node D will automatically become a replica of Node B, for the same reasons why Node C is a replica of Node A.

Upon restart, suppose that Node C starts before Node A. What happens? Node C will become the leader, while Node A becomes a replica of Node C.

End of article.
