Elasticsearch

ES: Elasticsearch, a search engine

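1. Introduction

What is search?
Everyday examples: the search boxes on Baidu and Taobao.
What happens if you build search on top of a database?
Anyone who develops software, or knows a little about IT, knows that application data lives in a database: product information for an e-commerce site, job postings for a recruiting site, articles for a news site. Searching that data with plain database queries runs into two problems:
1. The text in a given field can be very long. A "product description" field, for example, may hold thousands or even tens of thousands of characters, and every search has to scan the full text of every record to decide whether it contains the keyword, for example "toothpaste".
2. The search term cannot be split into parts to match a wider set of relevant results. For example, searching for "生化机" will not find "生化危机".
Conclusion:
Implementing search with a database is unreliable, and performance is usually poor.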
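The second problem can be reproduced with a plain substring test, which is roughly what a database `LIKE '%keyword%'` filter does. This is only an illustrative sketch in POSIX shell; the function name `contains_substring` is made up for the example.

```shell
# Hypothetical helper: succeed only if $2 occurs in $1 as a contiguous substring,
# which is how a naive text filter decides whether a row matches.
contains_substring() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

title="生化危机"

if contains_substring "$title" "生化"; then
  echo "'生化' matches"
fi

# "生化机" is not a contiguous substring of "生化危机", so the naive test misses it.
if ! contains_substring "$title" "生化机"; then
  echo "'生化机' does not match"
fi
```

A full-text engine avoids this by splitting both the stored text and the search term into tokens before matching.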
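What are full-text search and Lucene?
Lucene is a jar library that packages the code and algorithms for building inverted indexes and searching them. In a Java project you add the Lucene jar and develop against the Lucene API: you index your existing data, and Lucene organizes the index data structures on the local disk. You can then use the same API to search the index data stored on disk.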
What is Elasticsearch?
1) Elasticsearch features

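2. Deployment and installation

Installation methods
Each method with its pros and cons:
1) docker
   Pros: easy to deploy; works out of the box; starts quickly
   Cons: requires Docker knowledge; changing the configuration is awkward because a new image has to be built; data storage needs a mounted directory
2) tar
   Pros: flexible deployment; small footprint on the system
   Cons: you have to write the startup and management scripts yourself; the directory layout has to be planned in advance
3) RPM|DEB
   Pros: easy to deploy; startup scripts work right after installation; standardized directory layout
   Cons: the components are scattered across different directories; uninstalling may leave files behind; the default configuration needs to be changed
4) ansible
   Pros: extremely flexible; does everything you need; fast batch deployment
   Cons: you have to learn Ansible syntax and rules; all standards have to be planned in advance; needs a dedicated maintainer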

Installation
1) Installing from the RPM package

# Install Java
yum install -y java-1.8.0-openjdk.x86_64
# Download and install the package
mkdir  -p /data/es_soft/
cd /data/es_soft/
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.0.rpm
rpm -ivh elasticsearch-6.6.0.rpm

# Enable and start the service
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl start elasticsearch.service
systemctl status elasticsearch.service

# Check that it started successfully
ps -ef|grep elastic
lsof -i:9200

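2) Directory and file layout

rpm -ql elasticsearch     # list the files and directories installed by the elasticsearch package
rpm -qc elasticsearch     # list all configuration files of the elasticsearch package

/etc/elasticsearch/elasticsearch.yml    # main configuration file
/etc/elasticsearch/jvm.options          # JVM options file
/etc/init.d/elasticsearch         # init startup script
/etc/sysconfig/elasticsearch      # environment variable file
/usr/lib/sysctl.d/elasticsearch.conf  # sysctl settings file (kernel limits such as vm.max_map_count)
/usr/lib/systemd/system/elasticsearch.service  # systemd unit file
/var/lib/elasticsearch        # data directory
/var/log/elasticsearch        # log directory
/var/run/elasticsearch        # pid directory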

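3) Editing the configuration file

# About the configuration file
# Elasticsearch ships with good defaults, especially for performance-related settings. Other databases may need tuning, but in general Elasticsearch does not. If you run into performance problems, the fix is usually a better data layout or more nodes.
egrep -v "^#" /etc/elasticsearch/elasticsearch.yml
cluster.name: dba5        # cluster name
node.name: node-1     # node name
path.data: /data/elasticsearch    # data directory
path.logs: /var/log/elasticsearch # log directory
bootstrap.memory_lock: true   # lock the process memory
network.host: localhost       # bind address
http.port: 9200           # HTTP port
discovery.zen.ping.unicast.hosts: ["localhost"]   # hosts used for cluster discovery
discovery.zen.minimum_master_nodes: 2     # minimum number of master-eligible nodes
# With memory locking enabled, the final configuration file looks like this
grep "^[a-z]" /etc/elasticsearch/elasticsearch.yml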
node.name: node-1
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.0.0.181
http.port: 9200


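# After enabling memory locking, adjust the JVM heap settings
Heap sizing rules
1) Do not exceed 32 GB
2) Set the minimum and maximum heap to the same value
3) Enable memory locking in the configuration file
4) Leave at least 50% of the memory to the server itself

# The test machine has 1 GB of memory, so the heap is set to 512 MB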
vim /etc/elasticsearch/jvm.options
-Xms512m
-Xmx512m
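
The sizing rules above can be turned into a small calculation. The sketch below is an assumption-laden illustration, not an official formula: the function name `recommend_heap_mb` is made up, and the 31 GB cap is simply one way to stay under the 32 GB limit.

```shell
# Hypothetical helper: suggest a JVM heap size in MB for a machine with $1 MB of RAM.
# Assumptions: half of the RAM goes to the heap, capped at 31 GB to stay under 32 GB.
recommend_heap_mb() {
  total_mb=$1
  cap_mb=$((31 * 1024))
  heap_mb=$((total_mb / 2))
  if [ "$heap_mb" -gt "$cap_mb" ]; then
    heap_mb=$cap_mb
  fi
  echo "$heap_mb"
}

# The 1 GB test machine: same value for minimum and maximum heap.
heap=$(recommend_heap_mb 1024)
printf -- '-Xms%sm\n-Xmx%sm\n' "$heap" "$heap"
```

For the 1 GB test machine this yields the same 512 MB values that are written into jvm.options.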

# After changing the configuration, create the data directory and restart the service
mkdir /data/elasticsearch
chown -R elasticsearch:elasticsearch /data/elasticsearch/
systemctl restart elasticsearch
systemctl status elasticsearch

# The restart may fail at this point; the log will probably show that memory locking failed. The official fix is described here:
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/setup-configuration-memory.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/setting-system-settings.html#sysconfig


# Edit the systemd unit file or create an override file
Option 1: systemctl edit elasticsearch
Option 2: vim /usr/lib/systemd/system/elasticsearch.service
# Add the following setting
[Service]
LimitMEMLOCK=infinity
# Restart
systemctl daemon-reload
systemctl restart elasticsearch
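
Once the service is back up, it is worth confirming that the lock actually took effect. The node info API reports an `mlockall` flag for each node; the sketch below wraps a check around that output. The function name `all_nodes_locked` and the grep-based JSON matching are assumptions made for this example, not part of Elasticsearch.

```shell
# Hypothetical helper: read the JSON from GET _nodes?filter_path=**.mlockall on stdin
# and succeed only if at least one node is listed and none reports mlockall=false.
all_nodes_locked() {
  json=$(cat)
  if printf '%s' "$json" | grep -q '"mlockall" *: *false'; then
    return 1
  fi
  printf '%s' "$json" | grep -q '"mlockall" *: *true'
}

# Usage against a running node (address assumed to be localhost:9200):
#   curl -s 'localhost:9200/_nodes?filter_path=**.mlockall' | all_nodes_locked \
#     && echo "memory is locked" || echo "memory is NOT locked"
```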

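3. Interacting with Elasticsearch: the head plugin

# Since version 5.0 the head plugin is installed differently: it needs a Node.js environment, or you can use a prebuilt Docker image
# Official plugin repository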
https://github.com/mobz/elasticsearch-head

# Deploy elasticsearch-head with Docker
docker pull alivv/elasticsearch-head
docker run --name es-head -p 9100:9100 -dit alivv/elasticsearch-head

# Build and install elasticsearch-head with Node.js
yum install nodejs npm openssl screen -y
node -v
npm  -v
npm install -g cnpm --registry=https://registry.npm.taobao.org
cd /opt/
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head/
cnpm install
screen -S es-head
cnpm run start
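# Detach from the screen session: press Ctrl+A, then D
# Alternatively, install the elasticsearch-head extension from the Chrome Web Store, open it, and enter the address of the Elasticsearch node
# Edit the Elasticsearch configuration file to allow cross-origin requests (CORS)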
http.cors.enabled: true 
http.cors.allow-origin: "*"
# Then restart Elasticsearch
systemctl restart elasticsearch
# Connect to the cluster from the plugin, as shown in the screenshot (image.png)

4. Elasticsearch CRUD operations

# Create an index
curl -XPUT '10.0.0.181:9200/vipinfo?pretty'
# Insert documents
curl -XPUT '127.0.0.1:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d'
{
    "first_name" : "John",
    "last_name": "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing", "interests": [ "sports", "music" ]
}
'

curl -XPUT  '127.0.0.1:9200/vipinfo/user/2?pretty' -H 'Content-Type: application/json' -d' {
"first_name": "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums", "interests": [ "music" ]
}'

curl -XPUT  '127.0.0.1:9200/vipinfo/user/3?pretty' -H 'Content-Type: application/json' -d' {
"first_name": "Douglas", "last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets", "interests": [ "forestry" ]
}'



curl -XPOST  '127.0.0.1:9200/dajia/renyuan/?pretty' -H 'Content-Type: application/json' -d' {
"name": "bb",
"age" : 29,
"about" : "I like to collect rock albums", "interests": [ "music" ]
}'


curl -XPUT  '127.0.0.1:9200/dajia/renyuan/1?pretty' -H 'Content-Type: application/json' -d' {
"name": "Ly",
"age" : 29,
"about" : "I like to collect rock albums", "interests": [ "music" ]
}'

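## Interview question / production case
# With a lot of data, IDs must not repeat. Specifying the ID manually costs performance, because every insert has to check whether the ID already exists. Auto-generated random IDs avoid that check, but a random ID is hard to use for finding your own data afterwards.
# Solution: when inserting, add a field of your own (here sid) to the document so that it is indexed and can be used for lookups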

curl -XPOST  '127.0.0.1:9200/userinfo/test/?pretty' -H 'Content-Type: application/json' -d' {
"name": "Zxl",
"age" : 29,
"sid" : 1,
"about" : "I like to collect rock albums", "interests": [ "music" ]
}'

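# Query all documents in the index
curl -XGET 'localhost:9200/vipinfo/user/_search?pretty'

# Get a specific document
curl -XGET 'localhost:9200/vipinfo/user/1?pretty'
curl -XGET 'localhost:9200/vipinfo/user/2?pretty'

# Query documents by condition
# Find the documents that match a condition: employees whose last name is Smith
curl -XGET 'localhost:9200/vipinfo/user/_search?q=last_name:Smith&pretty'

# The same search with the Query DSL in the request body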
curl -XGET 'localhost:9200/vipinfo/user/_search?pretty' -H 'Content-Type: application/json' -d'           
{
  "query" : { 
    "match" : {
        "last_name" : "Smith"
     }
  } 
}
'
# Query with a filter
# Search for employees whose last name is Smith, but this time only those older than 30.
# The query changes slightly: it uses a filter, which executes a structured query efficiently.

curl -XGET 'localhost:9200/vipinfo/user/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query" : {
    "bool" : {
      "must" : {
        "match" : { "last_name" : "smith" }
      },
      "filter" : {
        "range" : { "age" : { "gt" : 30 } }
      }
    }
  }
}'

# Update a document
# Updating with PUT replaces the document, so the complete document must be sent
curl -XPUT 'localhost:9200/vipinfo/user/1?pretty' -H 'Content-Type: application/json' -d'
{
    "first_name" : "John",
    "last_name": "Smith",
    "age" : 40,
    "about" : "I love to go rock climbing", "interests": [ "sports", "music" ]
}'

# Updating with POST through the _update endpoint only needs the fields that change
curl -XPOST 'localhost:9200/vipinfo/user/1/_update?pretty' -H 'Content-Type: application/json' -d'
{
    "doc" : { "age" : 29 }
}'

# Delete a specific document
curl -XDELETE 'localhost:9200/vipinfo/user/1?pretty'
# Response:
{
  "_index" : "vipinfo",
  "_type" : "user",
  "_id" : "1",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 2
}

# Delete an index
curl -XDELETE 'localhost:9200/vipinfo?pretty'
# Response:
{
  "acknowledged" : true
}

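5. Elasticsearch review

Elasticsearch defaults:
    5 shards
    1 replica, stored on another machine as a backup

    Use cases:
    1. Search with highlighting
    2. E-commerce site search
    3. Log collection, analysis, and display

    Cluster status colors:
    Green: all conditions are met; the data is complete and all replicas are allocated
    Yellow: the data is complete, but some replicas are not allocated
    Red: the data of at least one index is incomplete
    Purple: shards are currently being synchronized (not a status reported by _cluster/health, which only returns green, yellow, or red)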
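
The green, yellow, and red rules listed above can be written as a tiny decision function. This is a simplified model for illustration only; the function name `health_color` and its two inputs are assumptions, and a real cluster reports its status through the health API.

```shell
# Hypothetical helper: derive the status color from the number of unassigned
# primary shards ($1) and unassigned replica shards ($2).
health_color() {
  unassigned_primaries=$1
  unassigned_replicas=$2
  if [ "$unassigned_primaries" -gt 0 ]; then
    echo red       # some index is missing data
  elif [ "$unassigned_replicas" -gt 0 ]; then
    echo yellow    # data is complete, but copies are missing
  else
    echo green     # every primary and replica is allocated
  fi
}

health_color 0 0   # green
health_color 0 5   # yellow, e.g. a single node holding 5 primaries with 1 replica each
health_color 1 0   # red
```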
    
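    A single node is already a cluster by itself; the default cluster name is elasticsearch

    Installation notes:
    1. Memory locking requires a configuration change
    2. Set the JVM minimum and maximum heap to the same value
    3. Do not give the heap more than 30 GB
    4. If you change the data directory, give ownership of it to the elasticsearch user
    5. Elasticsearch starts slowly

    Data operations:
    Create, read, update, delete
    1. Inserting data does not require creating the index (the "database") in advance
    2. index -- database
       type -- table
       field -- column
    3. By default a random _id is generated as the unique key

    Interfaces:
    1. curl
    2. The es-head plugin
    3. Kibana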

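6. Elasticsearch cluster

About clusters
1. Multiple machines
2. All in the same group (the same cluster name)
Cluster configuration file:

[root@db01 elasticsearch]# grep "^[a-z]" elasticsearch.yml 
cluster.name: Linux         # cluster name; must be identical on every node in the cluster
node.name: node-1           # node name; must be unique within the cluster
path.data: /data/elasticsearch          # data directory
path.logs: /var/log/elasticsearch       # log directory
bootstrap.memory_lock: true             # memory locking
network.host: 10.0.0.51,127.0.0.1       # addresses to bind and listen on
http.port: 9200                         # default HTTP port
discovery.zen.ping.unicast.hosts: ["10.0.0.51", "10.0.0.52"]    # nodes used for cluster discovery
discovery.zen.minimum_master_nodes: 2   # election-related setting; formula: (master-eligible nodes / 2) + 1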
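
The formula for discovery.zen.minimum_master_nodes is easy to check with shell integer arithmetic. The helper names `quorum` and `tolerated_failures` below are made up for this sketch.

```shell
# Hypothetical helper: (master-eligible nodes / 2) + 1, using integer division.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: how many master-eligible nodes can fail
# while a quorum is still reachable.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 5; do
  echo "nodes=$n minimum_master_nodes=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 nodes the quorum is 2 and no failure is tolerated, while 3 nodes keep a quorum of 2 and survive the loss of one node.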

Steps to configure a new node:

#1. Install the package
rpm -ivh elasticsearch-6.6.0.rpm
#2. Edit the configuration file
[root@db02 elasticsearch]# cat /etc/elasticsearch/elasticsearch.yml 
cluster.name: Linux
node.name: node-2
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.0.0.52,127.0.0.1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.51", "10.0.0.52"]
discovery.zen.minimum_master_nodes: 2
#3. Allow memory locking
[root@db02 ~]# systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity
#4. Create the data directory and set ownership
mkdir -p /data/elasticsearch
chown -R elasticsearch:elasticsearch /data/elasticsearch
#5. Start the service
systemctl daemon-reload
systemctl start elasticsearch
#6. Check the log and the port
tail -f /var/log/elasticsearch/Linux.log
netstat -lntup|grep 9200

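About the cluster
Node roles:
Master node: coordinates requests and returns data
Worker node: processes data
By default:
1. Every node is a worker node
2. The master node both coordinates and processes data
Commands: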

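# Query the cluster name
curl -XGET 'http://localhost:9200/_nodes/process?human&pretty' 
# Query node configuration (JVM and process information)
curl -XGET 'http://localhost:9200/_nodes/_all/info/jvm,process?human&pretty'
# Query cluster node information
curl -XGET 'http://localhost:9200/_cat/nodes?human&pretty' 
# Query cluster health (node and shard counts)
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
# Query index information
curl -XGET 'localhost:9200/_cat/indices?pretty'
# Failure case
With 2 nodes and discovery.zen.minimum_master_nodes set to 2, the failure of one node makes the cluster unavailable
Solution:
On the surviving node, comment out the election-related setting in the configuration file, or change it to 1
discovery.zen.minimum_master_nodes: 1
Restart the service

Conclusion:
If the two nodes hold inconsistent data, query results will be inconsistent
Find the inconsistent data, wipe one node, and treat the other node's data as authoritative
Then manually re-insert the corrected data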

Adding a third node:

# Steps to configure the new node:
#1. Install the package
rpm -ivh elasticsearch-6.6.0.rpm
#2. Edit the configuration file
[root@db03 elasticsearch]# cat /etc/elasticsearch/elasticsearch.yml 
cluster.name: Linux
node.name: node-3
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 10.0.0.53,127.0.0.1
http.port: 9200
discovery.zen.ping.unicast.hosts: ["10.0.0.51", "10.0.0.53"]
discovery.zen.minimum_master_nodes: 2
#3. Allow memory locking
[root@db03 ~]# systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity
#4. Create the data directory and set ownership
mkdir /data/elasticsearch -p 
chown -R elasticsearch:elasticsearch /data/elasticsearch
#5. Start the service
systemctl daemon-reload
systemctl start elasticsearch
#6. Check the log and the port
tail -f /var/log/elasticsearch/Linux.log
netstat -lntup|grep 9200
#7. Insert test data
curl -XPUT 'localhost:9200/zzz/zhang/1?pretty' -H 'Content-Type: application/json' -d'
{
    "first_name" : "John",
    "last_name": "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing", "interests": [ "sports", "music" ]
}
'

curl -XPUT  'localhost:9200/zzz/zhang/2?pretty' -H 'Content-Type: application/json' -d' {
    "first_name": "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums", "interests": [ "music" ]
}
'

curl -XPUT  'localhost:9200/zzz/zhang/3?pretty' -H 'Content-Type: application/json' -d' {
    "first_name": "Douglas", "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets", "interests": [ "forestry" ]
}'

Elasticsearch cluster summary

Default data distribution:
5 shards
1 replica
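
With these defaults, the number of shard copies the cluster has to place is the primaries multiplied by one plus the replica count. The helper name `total_shards` is made up for this small sketch.

```shell
# Hypothetical helper: total shard copies = primaries * (1 + replicas).
total_shards() {
  primaries=$1
  replicas=$2
  echo $(( primaries * (1 + replicas) ))
}

total_shards 5 1   # the defaults: 5 primaries plus 5 replica copies
total_shards 3 2   # 3 primaries with 2 replicas each
```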


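Monitoring
Raise an alert when either condition holds:
1. The cluster health status is not green
2. The number of nodes in the cluster is not 3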
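
The two monitoring conditions can be combined in one check. The sketch below assumes the input is a single "status node-count" line, such as the one printed by `_cat/health?h=status,node.total`; the function name `check_cluster` and the alert messages are made up for the example.

```shell
# Hypothetical helper: $1 is one line of "status node.total", $2 is the expected
# node count. Prints a message and returns non-zero when an alert should fire.
check_cluster() {
  expected=$2
  # split the line on whitespace into status and node count
  set -- $1
  status=$1
  nodes=$2
  if [ "$status" != "green" ]; then
    echo "ALERT: cluster status is $status"
    return 1
  fi
  if [ "$nodes" -ne "$expected" ]; then
    echo "ALERT: $nodes of $expected nodes are in the cluster"
    return 1
  fi
  echo "OK"
}

# Usage against a running node (address assumed to be localhost:9200):
#   check_cluster "$(curl -s 'localhost:9200/_cat/health?h=status,node.total')" 3
```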


Worst-case failure:
3 nodes
2 of the nodes fail

Create an index with the default settings:
curl -XPUT 'localhost:9200/index1?pretty'
# Response:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "index1"
}

Specify the number of shards and replicas when creating an index
curl -XPUT 'localhost:9200/index2?pretty' -H 'Content-Type: application/json' -d'       
{
   "settings" : { 
   "number_of_shards" : 3, 
   "number_of_replicas" : 1
 } 
}'


The number of shards cannot be changed once the index has been created, but the number of replicas can be adjusted
curl -XPUT 'localhost:9200/index2/_settings?pretty' -H 'Content-Type: application/json' -d'         
{
  "settings" : { 
  "number_of_replicas" : 2
 } 
}'


curl -XPUT  'localhost:9200/linux2/user/3?pretty' -H 'Content-Type: application/json' -d' {
    "first_name": "Douglas", "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets", "interests": [ "forestry" ]
}'


curl -XPUT  'localhost:9200/linux2/user/3?pretty' -H 'Content-Type: application/json' -d' {
    "first_name": "Douglas", "last_name" : "ssss",
    "age" : 35,
    "about": "I like to build cabinets", "interests": [ "forestry" ]
}'

Chinese analyzer:

# By default Elasticsearch splits Chinese text into single characters instead of the words we actually search for
## Test
curl -XPOST 'localhost:9200/index/fulltext/1?pretty' -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST 'localhost:9200/index/fulltext/2?pretty' -H 'Content-Type:application/json' -d'
{"content":"公安部：各地校车将享最高路权"}
'
curl -XPOST 'localhost:9200/index/fulltext/3?pretty' -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}
'
curl -XPOST 'localhost:9200/index/fulltext/4?pretty' -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'   
# Test search
curl -XPOST 'localhost:9200/index/fulltext/_search?pretty'  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'
# The result matches every document containing the character 中 or the character 国, rather than the word 中国, so a Chinese analyzer needs to be installed
"content" : [
            "美<tag1>国</tag1>留给伊拉克的是个烂摊子吗"
          ] 
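## Installation: the Chinese analyzer must be installed on every node

# Official repository
https://github.com/medcl/elasticsearch-analysis-ik

# Install the analyzer (the plugin version must match the installed Elasticsearch version, 6.6.0 here)
cd /usr/share/elasticsearch/bin
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.0/elasticsearch-analysis-ik-6.6.0.zip
# After installation, restart every node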
systemctl restart elasticsearch 
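
One way to see what an analyzer does to a piece of text is the `_analyze` API, which returns the tokens an analyzer produces. The sketch below only builds the request body; the helper name `analyze_body` is made up, and it assumes that neither argument contains double quotes or backslashes.

```shell
# Hypothetical helper: print the JSON body for POST _analyze.
# $1: analyzer name (e.g. standard, ik_max_word), $2: text to analyze.
# Assumption: neither argument contains double quotes or backslashes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}\n' "$1" "$2"
}

# Usage against a running node (address assumed to be localhost:9200):
#   curl -XPOST 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' \
#     -d "$(analyze_body standard '中国')"
#   curl -XPOST 'localhost:9200/_analyze?pretty' -H 'Content-Type: application/json' \
#     -d "$(analyze_body ik_max_word '中国')"
```

Comparing the two responses should show single-character tokens for the standard analyzer and whole words for ik_max_word.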
# Test the analyzer; it has to be specified in the mapping when the index is set up
# Create the index
curl -XPUT 'localhost:9200/index3?pretty'

# Create the mapping
curl -XPOST 'localhost:9200/index3/fulltext/_mapping?pretty' -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
}'
# Insert test data
curl -XPOST 'localhost:9200/index3/fulltext/1?pretty' -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST 'localhost:9200/index3/fulltext/2?pretty' -H 'Content-Type:application/json' -d'
{"content":"公安部：各地校车将享最高路权"}
'
curl -XPOST 'localhost:9200/index3/fulltext/3?pretty' -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}
'
curl -XPOST 'localhost:9200/index3/fulltext/4?pretty' -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'

# Test search
curl -XPOST 'localhost:9200/index3/fulltext/_search?pretty'  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'
