Recently we had a production-deployment requirement: set up SolrCloud with multiple cores. I searched online and found very little, so I pieced things together from scattered material and tinkered until it actually worked!
This post covers Solr's distributed deployment, i.e. SolrCloud, along with Solr Replication and Solr sharding.
These days big data is a problem most of us have to face. Putting a sufficiently large dataset on a single Solr node invites all kinds of trouble: node crashes, data loss, slow queries, and so on. SolrCloud already solves this whole class of problems: it automatically distributes both the indexing process and queries across shards, ZooKeeper provides failover and load balancing, and each shard can carry multiple replicas for extra robustness.
1. Preparation
Environment:
Two servers:
10.68.237.21 website1
10.68.237.22 website2
ZooKeeper installation
We use ZooKeeper 3.4.5 here, installed on a single node.
Below is the zoo.cfg configuration; the default client port is 2181. (A freshly extracted ZooKeeper does not ship with zoo.cfg; cp zoo_sample.cfg to zoo.cfg first.)
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/projects/clusters/zookeeper-3.4.5/data
dataLogDir=/data/projects/clusters/zookeeper-3.4.5/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=10.68.237.105:2888:3888

For multiple nodes, once zoo.cfg is configured, just copy it between the servers with scp; the sketch below shows the extra per-node pieces that a multi-node setup needs.
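One caveat for a multi-node ensemble that the scp alone does not cover: each member needs its own server.N line in zoo.cfg, plus a myid file in its dataDir containing that N. A minimal sketch (the .106/.107 addresses here are hypothetical placeholders, not part of this deployment):

# zoo.cfg: one line per ensemble member
server.1=10.68.237.105:2888:3888
server.2=10.68.237.106:2888:3888
server.3=10.68.237.107:2888:3888

# then, on server.1 (write 2 on server.2, 3 on server.3):
echo 1 > /data/projects/clusters/zookeeper-3.4.5/data/myid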
scp command syntax:
scp -r localpath user@hostname:remotepath
Once everything is copied to each server, run the following on every server to bring up the ZooKeeper cluster:
./zkServer.sh start
./zkServer.sh status
./zkCli.sh -server hostname:port
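For the single-node setup in this post, connecting looks like this (hbase1 is the ZooKeeper host used throughout):

./zkCli.sh -server hbase1:2181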
Once the client connects, you get an interactive shell:
Connecting to hbase1:2181
2013-12-26 14:49:53,146 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013-12-26 14:49:53,151 [myid:] - INFO [main:Environment@100] - Client environment:host.name=hbase1
2013-12-26 14:49:53,151 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_17
2013-12-26 14:49:53,152 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2013-12-26 14:49:53,152 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-oracle-1.7.0.17/jre
2013-12-26 14:49:53,153 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/data/projects/clusters/zookeeper-3.4.5/bin/../build/classes:/data/projects/clusters/zookeeper-3.4.5/bin/../build/lib/*.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/data/projects/clusters/zookeeper-3.4.5/bin/../conf:.:/usr/lib/jvm/java-7-oracle/lib
2013-12-26 14:49:53,153 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2013-12-26 14:49:53,154 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2013-12-26 14:49:53,154 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2013-12-26 14:49:53,155 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2013-12-26 14:49:53,155 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2013-12-26 14:49:53,156 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.2.0-23-generic
2013-12-26 14:49:53,156 [myid:] - INFO [main:Environment@100] - Client environment:user.name=appadmin
2013-12-26 14:49:53,157 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/appadmin
2013-12-26 14:49:53,157 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/data/projects/clusters/zookeeper-3.4.5/bin
2013-12-26 14:49:53,159 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=hbase1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@4f83e6d4
Welcome to ZooKeeper!
2013-12-26 14:49:53,189 [myid:] - INFO [main-SendThread(hbase1:2181):ClientCnxn$SendThread@966] - Opening socket connection to server hbase1/10.68.237.105:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2013-12-26 14:49:53,197 [myid:] - INFO [main-SendThread(hbase1:2181):ClientCnxn$SendThread@849] - Socket connection established to hbase1/10.68.237.105:2181, initiating session
[zk: hbase1:2181(CONNECTING) 0] 2013-12-26 14:49:53,242 [myid:] - INFO [main-SendThread(hbase1:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server hbase1/10.68.237.105:2181, sessionid = 0x1431d32bbc3107a, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: hbase1:2181(CONNECTED) 0]

And with that, ZooKeeper is up and running!
For a more detailed walkthrough of building a ZooKeeper cluster, see my earlier post: http://blog.csdn.net/weijonathan/article/details/8591117
Next, upload the Solr configuration files to ZooKeeper:
webadmin@website1:/data/tomcats/apprank.solr.com/bin$ java -classpath .:/data/projects/apprank.solr.com/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost hbase1:2181 -confdir /data/projects/apprank.solr.com/solr/googleplayrank/conf -confname googleplayconf
webadmin@website1:/data/tomcats/apprank.solr.com/bin$ java -classpath .:/data/projects/apprank.solr.com/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection googleplayrank -confname googleplayconf -zkhost hbase1:2181

For multiple collections, just run this pair of commands once per collection, making sure each collection gets its own confdir path and confname; see the sketch below.
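For instance, a second collection might be uploaded and linked like this (a sketch only; the appstorerank path and the appstoreconf name are hypothetical stand-ins for your actual conf directory and config name):

java -classpath .:/data/projects/apprank.solr.com/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost hbase1:2181 -confdir /data/projects/apprank.solr.com/solr/appstorerank/conf -confname appstoreconf
java -classpath .:/data/projects/apprank.solr.com/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection appstorerank -confname appstoreconf -zkhost hbase1:2181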
After the upload, we can verify it with the ZooKeeper client. Once connected to the ZooKeeper server, run help to list the available commands:
[zk: hbase1:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
	connect host:port
	get path [watch]
	ls path [watch]
	set path data [version]
	rmr path
	delquota [-n|-b] path
	quit
	printwatches on|off
	create [-s] [-e] path data acl
	stat path [watch]
	close
	ls2 path [watch]
	history
	listquota path
	setAcl path acl
	getAcl path
	sync path
	redo cmdno
	addauth scheme auth
	delete path [version]
	setquota -n|-b val path
Here we use the ls command to check what was uploaded.
The /configs path:
[zk: hbase1:2181(CONNECTED) 8] ls /configs
[appstoreapps, googleplayapps]

The /collections path:
[zk: hbase1:2181(CONNECTED) 0] ls /collections
[appstoreapps, googleplayapps]
Copy the Tomcat project to the other server:
scp -r /data/projects/apprank.solr.com/* webadmin@website2:/data/projects/apprank.solr.com
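One thing this walkthrough takes for granted is that each Tomcat-hosted Solr already knows where ZooKeeper lives. If yours does not, a common route on Tomcat is the zkHost system property; a minimal sketch, assuming you add a bin/setenv.sh yourself (the file name is standard Tomcat, the contents are an assumption for this deployment):

# bin/setenv.sh in each Tomcat
export JAVA_OPTS="$JAVA_OPTS -DzkHost=hbase1:2181"

(zkHost can alternatively be set in solr.xml.)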
Start Tomcat on each server and open the Solr admin UI.
A Cloud menu now appears on the left-hand side, and the files we committed to ZooKeeper can be browsed under its Tree view.
As it turned out, the two replicas on the 122 node were stuck in a Recovering state. I fought with this for a long time without getting anywhere, so I switched approaches and created the shards directly instead.
Before describing how to create the shards, let's first look at what the files Solr generates under ZooKeeper are for.
After starting Tomcat, list the ZooKeeper root again:
[zk: hbase1:2181(CONNECTED) 0] ls /
[configs, hbase, zookeeper, clusterstate.json, aliases.json, live_nodes, overseer, overseer_elect, collections]

Several new znodes have appeared. clusterstate.json is the most interesting: it records each collection's shards, the replicas within each shard, their state, and which replica is the leader:
[zk: hbase1:2181(CONNECTED) 3] get /clusterstate.json
{"appstoreapps":{
    "shards":{"shard1":{
        "range":null,
        "state":"active",
        "parent":null,
        "replicas":{
          "core_node1":{
            "state":"active",
            "base_url":"http://10.68.237.121:8983/solr",
            "core":"appstoreapps",
            "node_name":"10.68.237.121:8983_solr",
            "leader":"true"},
          "core_node2":{
            "state":"recovering",
            "base_url":"http://10.68.237.122:8983/solr",
            "core":"appstoreapps",
            "node_name":"10.68.237.122:8983_solr"}}}},
    "router":{"name":"implicit"}},
  "googleplayapps":{
    "shards":{"shard1":{
        "range":null,
        "state":"active",
        "parent":null,
        "replicas":{
          "core_node1":{
            "state":"active",
            "base_url":"http://10.68.237.121:8983/solr",
            "core":"googleplayapps",
            "node_name":"10.68.237.121:8983_solr",
            "leader":"true"},
          "core_node2":{
            "state":"recovering",
            "base_url":"http://10.68.237.122:8983/solr",
            "core":"googleplayapps",
            "node_name":"10.68.237.122:8983_solr"}}}},
    "router":{"name":"implicit"}}}
cZxid = 0x6da2
ctime = Thu Dec 26 15:12:30 CST 2013
mZxid = 0x6e28
mtime = Thu Dec 26 15:15:06 CST 2013
pZxid = 0x6da2
cversion = 0
dataVersion = 11
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 1217
numChildren = 0
/live_nodes holds one ephemeral entry per Solr node currently up; an entry disappears when its node goes down:

[zk: hbase1:2181(CONNECTED) 1] ls /live_nodes
[10.68.237.121:8983_solr, 10.68.237.122:8983_solr]
/overseer_elect is used to elect the Overseer (the node that maintains the cluster state), while /overseer holds the work queues the Overseer processes:

[zk: hbase1:2181(CONNECTED) 4] ls /overseer_elect
[election, leader]
[zk: hbase1:2181(CONNECTED) 1] ls /overseer
[queue, queue-work, collection-queue-work]
/aliases.json stores collection aliases; it is null here because we have not defined any:

[zk: hbase1:2181(CONNECTED) 1] get /aliases.json
null
cZxid = 0x6da3
ctime = Thu Dec 26 15:12:30 CST 2013
mZxid = 0x6da3
mtime = Thu Dec 26 15:12:30 CST 2013
pZxid = 0x6da3
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0
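If you ever do want an alias (say, one name that fans out to both collections), the Collections API has a CREATEALIAS action. A sketch; the alias name apps is made up, and the host/port assume the same endpoint as the CREATE call later in this post:

curl 'http://website1:8081/admin/collections?action=CREATEALIAS&name=apps&collections=appstoreapps,googleplayapps'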
Because the earlier setup kept misbehaving, we recreate the collection (core) shards through Solr's REST interface instead, calling the Collections API with the CREATE action:
curl 'http://website1:8081/admin/collections?action=CREATE&name=appstoreapps&numShards=2&replicationFactor=1'
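The second collection is created the same way (same numShards and replicationFactor assumed here):

curl 'http://website1:8081/admin/collections?action=CREATE&name=googleplayapps&numShards=2&replicationFactor=1'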
With this approach, call the endpoint above, then upload the matching config files to ZooKeeper and restart, and the shards come up successfully. The upload procedure was covered earlier; scroll back up if you need it.
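A quick way to confirm a collection is answering queries (a sketch; it assumes the webapp is mounted at the root context, matching the CREATE call above):

curl 'http://website1:8081/appstoreapps/select?q=*:*&rows=0'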
Let's look at the result of creating the shards this way.
For multiple cores, create additional collection (core) directories under Tomcat, edit each one's core.properties to the corresponding collection name, and then repeat the REST steps above; a sketch of such a file follows.
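A minimal core.properties for such a directory might look like this (a sketch; the name is only an example, and which properties you need depends on your layout):

# core.properties in the core's directory
name=googleplayapps
collection=googleplayapps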
Finally, here is the configuration I ended up with.
Because I have not finished configuring Replication on my side, one replica still shows Recovering; I will publish the working replication setup in a follow-up post!
That wraps up the multi-core SolrCloud + Tomcat 7 installation and configuration.
For the details of Solr's REST interface operations, see:
http://www.wxdl.cn/index/solrcloud.html
Other references:
http://blog.csdn.net/shirdrn/article/details/9718387
http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble