Advanced Usage of the Distributed Search Engine ElasticSearch (Part 5)

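1. IK Analyzer

Install the IK analysis plugin
Download link

Run the installation

This walkthrough installs from a local file. From the plugins directory under the ES installation directory, run the plugin install command:

[elsearch@localhost plugins]$ ../bin/elasticsearch-plugin install file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip

When the installation succeeds, the tool prints output like the following: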

-> Installing file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip
-> Downloading file:///usr/local/elasticsearch-7.10.2/elasticsearch-analysis-ik-7.10.2.zip
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed analysis-ik

Restart the ElasticSearch service

Test the IK analyzer

Standard analyzer:

GET _analyze?pretty
{
  "analyzer": "standard",
  "text": "我爱我的祖国"
}

IK smart analyzer (ik_smart):

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "我爱我的祖国"
}

IK maximum-granularity analyzer (ik_max_word):

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "我爱我的祖国与家乡"
}
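
The three requests above differ only in the analyzer name and the text, so they are easy to script. The following is a minimal sketch, not part of the original walkthrough: the helper names and the base URL http://localhost:9200 are assumptions, and the request is only built, never sent. It uses POST, which the _analyze API accepts alongside GET.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(base_url: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        url=f"{base_url}/_analyze?pretty",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response_body: str) -> List[str]:
    """Extract the token strings from an _analyze JSON response."""
    return [entry["token"] for entry in json.loads(response_body)["tokens"]]


if __name__ == "__main__":
    # Assumed base URL; point it at your own node before sending anything.
    for analyzer in ("standard", "ik_smart", "ik_max_word"):
        req = build_analyze_request("http://localhost:9200", analyzer, "我爱我的祖国")
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

To actually run a request, pass it to urllib.request.urlopen and feed the decoded response body to tokens_from_response.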

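Best practice for the IK analyzer
analyzer specifies the analyzer used when building the index; search_analyzer specifies the analyzer applied to the search keywords.
In practice, build the index with ik_max_word so that text is split at the finest granularity, and query with ik_smart for coarse-grained tokenization. This combination lets queries match as many relevant results as possible.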
```
PUT /orders_test
{
    "mappings": {
        "properties": {            
            "goodsName": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
    }
}
```
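
Why the index-time and search-time analyzers can differ is easier to see with a toy inverted index. The sketch below is an illustration only: the token lists are written by hand and are NOT real IK output, and the function names are hypothetical. It shows that a document indexed with fine-grained tokens can be found both by a short term and by the longer compound term that a coarse-grained search analyzer might produce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hand-written, hypothetical token lists (not real IK output).
# Each document is stored with fine-grained tokens, in the spirit of ik_max_word.
INDEX_TOKENS: Dict[int, List[str]] = {
    1: ["智能手机", "智能", "手机"],
    2: ["手机壳", "手机", "壳"],
}


def build_inverted_index(doc_tokens: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Map every token to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, tokens in doc_tokens.items():
        for token in tokens:
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query_tokens: Iterable[str]) -> Set[int]:
    """Return ids of documents that contain at least one query token."""
    hits: Set[int] = set()
    for token in query_tokens:
        hits |= index.get(token, set())
    return hits


index = build_inverted_index(INDEX_TOKENS)
# Coarse-grained query tokens, in the spirit of ik_smart.
print(search(index, ["手机"]))      # both documents contain this token
print(search(index, ["智能手机"]))  # only document 1 contains the compound term
```

Because the fine-grained index holds both the compound term and its parts, a coarse query token still finds an exact entry, while the query itself stays precise.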

2. Full Index Build

Download logstash
Download link
Install the logstash-input-jdbc plugin
Go to the logstash directory and run:
```
bin/logstash-plugin install logstash-input-jdbc
```

Set up the MySQL driver package

[root@localhost bin]# mkdir mysql 
[root@localhost bin]# cp mysql-connector-java-5.1.34.jar /usr/local/logstash-7.10.2/bin/mysql/

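Configure the JDBC connection

The index data is read from MySQL with a select statement and then imported into Elasticsearch through a logstash-input-jdbc configuration file.

Create the jdbc.conf and jdbc.sql files in the /usr/local/logstash-7.10.2/bin/mysql/ directory.

jdbc.conf file: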

input {
    stdin {
    }
    jdbc {
        # MySQL connection string; users is the database name
        jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/users"
        # User name and password
        jdbc_user => "root"
        jdbc_password => "654321"
        # Driver jar
        jdbc_driver_library => "/usr/local/logstash-7.10.2/bin/mysql/mysql-connector-java-5.1.34.jar"
        # Driver class name
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
        # Path and name of the SQL file to execute
        statement_filepath => "/usr/local/logstash-7.10.2/bin/mysql/jdbc.sql"
        # Polling schedule in cron syntax. Fields from left to right: minute, hour, day of month, month, day of week. All * means run every minute
        schedule => "* * * * *"
    }
}

output {
    elasticsearch {
        # ES connection information
        hosts => ["10.10.20.28:9200"]
        # Index name
        index => "users"
        document_type => "_doc"
        # Document ID: use the id column of the database row as the _id of the index document
        document_id => "%{id}"
    }
    stdout {
        # Print events as JSON lines
        codec => json_lines
    }
}
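
With the schedule set to every minute, the same rows are read and sent to ES again on each run. Using the database id as the document ID means each run overwrites the earlier copy instead of creating duplicates. The sketch below models that behavior with a plain dictionary; it is an illustration under that assumption, not Logstash or Elasticsearch code, and the sample rows are made up.

```python
from typing import Dict, List


def index_rows(store: Dict[str, dict], rows: List[dict]) -> None:
    """Store each row under its database id, overwriting any earlier copy."""
    for row in rows:
        store[str(row["id"])] = row


# Hypothetical sample rows standing in for the query result.
rows = [
    {"id": 1, "menuName": "Users"},
    {"id": 2, "menuName": "Roles"},
]

store: Dict[str, dict] = {}
index_rows(store, rows)  # first scheduled run
index_rows(store, rows)  # second scheduled run re-sends the same rows
print(len(store))        # still one document per database row
```

Without a stable document ID, every run would add a fresh copy of each row to the index.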

jdbc.sql file:

select `id`, `institutionTypeId`, `menuCode`, `menuName`, `menuUri`, `menuLevel`, `componentSrc` from t_authority_menu
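
Because jdbc_paging_enabled is on in jdbc.conf, the statement is not fetched in one pass; the plugin breaks it into several queries that use limits and offsets. The sketch below imitates that idea with Python's sqlite3 module. The wrapping query, the table contents, and the function name are assumptions for illustration; the exact SQL that Logstash generates may differ.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_pages(conn: sqlite3.Connection, statement: str, page_size: int) -> Iterator[List[Tuple]]:
    """Yield the result of `statement` one page at a time using LIMIT/OFFSET."""
    offset = 0
    while True:
        page = conn.execute(
            f"SELECT * FROM ({statement}) AS t1 LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            break
        yield page
        offset += page_size


# In-memory stand-in for the MySQL table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_authority_menu (id INTEGER PRIMARY KEY, menuName TEXT)")
conn.executemany(
    "INSERT INTO t_authority_menu VALUES (?, ?)",
    [(i, f"menu-{i}") for i in range(1, 8)],
)

pages = list(fetch_in_pages(conn, "SELECT id, menuName FROM t_authority_menu ORDER BY id", 3))
print([len(page) for page in pages])  # 7 rows in pages of 3
```

A large jdbc_page_size such as 50000 keeps the number of round trips low while still bounding how many rows are held in memory at once.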

Create the ES index

PUT /users?include_type_name=false
{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "id": {
                "type": "integer"
            },
            "institutionTypeId": {
                "type": "text",
                "analyzer": "whitespace",
                "fielddata": true
            },
            "menuCode": {
                "type": "text"
            },
            "menuName": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "menuUri": {
                "type": "text"
            },
            "menuLevel": {
                "type": "integer"
            },
            "componentSrc": {
                "type": "text"
            }
        }
    }
}

Run the full synchronization
Run the command:
```
./logstash -f mysql/jdbc.conf
```
Check the result:
```
GET /users/_search
```

3. Incremental Index Synchronization

Modify the jdbc.conf configuration file:

Add the following settings:

input {
    jdbc{
        # Set the time zone
        jdbc_default_timezone => "Asia/Shanghai"
        ...
        # File that records the last-run value used for incremental sync
        last_run_metadata_path => "/usr/local/logstash-7.10.2/bin/mysql/last_value"
    }
}

Modify the jdbc.sql file

Here the incremental sync is driven by the last update time:

select `id`, `institutionTypeId`, `menuCode`, `menuName`, `menuUri`, `menuLevel`, `componentSrc` from t_authority_menu where last_update_time > :sql_last_value

Create the file that records the last sync time

vi /usr/local/logstash-7.10.2/bin/mysql/last_value

Enter an initial time:

2020-01-01 00:00:00

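Verification
1) Start logstash; it loads the rows whose update time is later than the initial time.
2) When a row's update time changes, logstash detects it on the next scheduled run and syncs the incremental data to ES.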
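
The two verification steps can be reproduced in miniature with Python's sqlite3 module. This is a sketch under stated assumptions: the table contents and timestamps are made up, the column is assumed to be a last-update timestamp named last_update_time, and the checkpoint advances to the time of the run, which is how the plugin is understood to behave when no tracking column is configured.

```python
import sqlite3
from typing import List, Tuple


def run_once(conn: sqlite3.Connection, last_value: str, now: str) -> Tuple[List[tuple], str]:
    """Fetch rows updated after `last_value` and return them with the new checkpoint."""
    rows = conn.execute(
        "SELECT id, menuName FROM t_authority_menu "
        "WHERE last_update_time > :sql_last_value ORDER BY id",
        {"sql_last_value": last_value},
    ).fetchall()
    return rows, now  # the checkpoint becomes the time of this run


# In-memory stand-in for the MySQL table, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_authority_menu (id INTEGER PRIMARY KEY, menuName TEXT, last_update_time TEXT)"
)
conn.executemany(
    "INSERT INTO t_authority_menu VALUES (?, ?, ?)",
    [(1, "Users", "2020-03-01 10:00:00"), (2, "Roles", "2020-03-02 10:00:00")],
)

# Step 1: the first run starts from the initial time and loads everything newer.
first, checkpoint = run_once(conn, "2020-01-01 00:00:00", now="2020-03-03 00:00:00")

# Step 2: one row changes, so only that row is picked up by the next run.
conn.execute(
    "UPDATE t_authority_menu SET menuName = 'Accounts', "
    "last_update_time = '2020-03-03 08:00:00' WHERE id = 1"
)
second, checkpoint = run_once(conn, checkpoint, now="2020-03-03 09:00:00")

print(first)
print(second)
```

The comparison only works if every write to the table also refreshes the update-time column; rows changed without touching it are invisible to the incremental query.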

This article was written and shared by mirson. For further discussion, add vx ID soft_art or visit www.softart.cn
