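Query: a broad concept; anything that retrieves some data counts as a query.
Exact query:
Fuzzy query (e.g. SQL LIKE):
Search: a specific kind of query. Search usually means retrieving the information related to a given keyword.
A search engine is not well suited to storing its data in a relational database.
Reasons: ① When searching, the user types only a keyword and expects every record that matches it. In a database this requires a fuzzy (LIKE) query. A LIKE pattern with a leading wildcard cannot use the index, so it causes a full table scan, which is slow.
select xxx from xxx where xxx like '%aaa%'  -- leading wildcard: the index exists, but the query engine will not use it
select xxx from xxx where xxx like 'aaa%'   -- prefix match: the index can be used to speed up the query
② A relational database cannot tokenize the input or suggest related terms when querying, so the results are often not what the user expects.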
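The index argument in ① can be made concrete with a toy model. The sketch assumes a sorted map (`TreeMap`) stands in for a B+tree index on one column. `LikeDemo` and its methods are hypothetical names, not MySQL internals. A prefix pattern turns into a single range lookup, while an infix pattern has to visit every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: a TreeMap (sorted, like a B+tree) stands in for an index on a text column,
// mapping column value -> row id.
class LikeDemo {
    // LIKE 'aaa%': keys sharing a prefix are adjacent in sort order,
    // so one range lookup on the sorted index finds them.
    static List<Integer> prefixLookup(TreeMap<String, Integer> index, String prefix) {
        return new ArrayList<>(index.subMap(prefix, true, prefix + Character.MAX_VALUE, false).values());
    }

    // LIKE '%aaa%': sort order does not help, so every entry is inspected (full scan).
    // visited[0] counts how many entries were examined.
    static List<Integer> infixScan(TreeMap<String, Integer> index, String part, int[] visited) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : index.entrySet()) {
            visited[0]++;
            if (e.getKey().contains(part)) out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("aaa1", 1);
        index.put("aaab", 2);
        index.put("baaa", 3);
        index.put("ccc", 4);
        int[] visited = {0};
        System.out.println(prefixLookup(index, "aaa"));       // rows whose value starts with "aaa"
        System.out.println(infixScan(index, "aaa", visited)); // rows whose value contains "aaa"
        System.out.println("entries scanned: " + visited[0]);
    }
}
```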
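Solr: serves the same purpose as ES; both are used for search.
Solr is typically used for static search over small to medium data volumes (data that rarely changes).
ES can handle dynamic search over petabyte-scale data (data that keeps being added and changed).
Performance: for small data volumes and static search, Solr (the veteran) outperforms ES.
When Solr inserts data, building the index causes IO blocking, which makes writes slow.
For large data volumes and dynamic search, ES (the newcomer) outperforms Solr.
When ES inserts data, building the index does not block. Search is not real-time but near-real-time: new data becomes searchable after a delay of about a second (the default refresh interval is 1s).
Dependencies: Solr (in SolrCloud cluster mode) depends on ZooKeeper.
ES does not depend on any other framework.
Data formats: Solr supports several, e.g. XML and JSON.
ES supports only one: JSON.
Scalability: ES scales out more easily; it is clustered by design.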
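Lucene: a collection of the APIs commonly needed for search.
It is essentially a framework (library) that can be embedded in a project. It provides the APIs commonly used in search scenarios and so simplifies development.
A search toolkit.
Widely recognized in the industry as an excellent search framework.
Nutch: a ready-to-use product built on Lucene that provides web search (a small-scale Google).
ES: ES embeds Lucene and makes Lucene easier to use. ES is operated through a RESTful API:
you can send REST requests directly (e.g. from a browser or any HTTP client) to perform CRUD on the data in ES.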
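Full-text search:
Original meaning: given a keyword, search a whole article for the passages that match it.
Meaning in application development: given a keyword, search the whole database for the records that match it.
Full-text search requires an inverted index.
Index: a data structure that speeds up queries.
It works like the table of contents of an encyclopedia, which lets you jump straight to the page you are interested in.
Forward index: the indexes created in MySQL and in HBase are forward indexes.
Example: 《唐诗三百首》 (Three Hundred Tang Poems) is the database
Table of contents (forward index): poem title ------> page ------> poem content
Search for 《静夜思》 (Quiet Night Thoughts)
Inverted index:
Example: 《唐诗三百首》 (Three Hundred Tang Poems) is the database
Table of contents (inverted index): it does not store the mapping from poem title to page. Instead it stores
word ------> which poems it appears in, and on which page
明月 (bright moon) --------> 《静夜思》 page 200, 《xxx》 page 300
Search: which poems contain 明月?
Search engines all use inverted indexes.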
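Here is a minimal sketch of the inverted index from the poem example. It assumes a deliberately naive tokenizer that emits every overlapping pair of characters; real engines use an analyzer such as IK. The class `PoemIndex` and the poem snippets are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for the poem example: term -> titles of the poems containing it.
class PoemIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Hypothetical tokenizer: every overlapping pair of characters is a term.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) terms.add(text.substring(i, i + 2));
        return terms;
    }

    // Indexing: record the poem under every term it contains.
    void add(String title, String text) {
        for (String term : bigrams(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(title);
        }
    }

    // Searching: a single map lookup by term, no scan over all poems.
    Set<String> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        PoemIndex index = new PoemIndex();
        index.add("静夜思", "床前明月光");
        index.add("山居秋暝", "明月松间照");
        index.add("春晓", "春眠不觉晓");
        System.out.println(index.search("明月")); // poems containing 明月
    }
}
```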
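Built-in sharding: data is split into several shards when it is written, and the shards are distributed across different nodes of the cluster.
Benefits: horizontal scaling, load balancing, and higher parallel IO throughput.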
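This sketch shows how a document can be assigned to a shard. The idea is `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. As a simplifying assumption, `String.hashCode()` replaces ES's actual hash function (Murmur3) and its extra routing factors. `ShardRouter` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of shard routing: shard = hash(routing) mod primaryShards.
// String.hashCode() is a stand-in for ES's real Murmur3-based routing.
class ShardRouter {
    static int shardFor(String routing, int primaryShards) {
        // floorMod keeps the result in [0, primaryShards) even for negative hash codes
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    // How many of the ids 1..docCount land on each shard.
    static Map<Integer, Integer> distribution(int docCount, int primaryShards) {
        Map<Integer, Integer> perShard = new TreeMap<>();
        for (int id = 1; id <= docCount; id++) {
            perShard.merge(shardFor(String.valueOf(id), primaryShards), 1, Integer::sum);
        }
        return perShard;
    }

    public static void main(String[] args) {
        System.out.println(distribution(1000, 5)); // shard -> number of documents
    }
}
```

The modulus is the primary shard count, so changing that count would move existing ids to different shards. This is why an index's number of primary shards is fixed when the index is created.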
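Built-in clustering: even a single ES instance forms a cluster, which makes scaling out easy. To add a node to the cluster,
just install ES on the new machine with the same cluster.name and start it. It finds the ES cluster on the network (in 6.x through the hosts listed in discovery.zen.ping.unicast.hosts) and joins it automatically.
Built-in indexing: MySQL and other databases require indexes to be created manually, whereas ES indexes data automatically when it is inserted.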
Documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.6/index.html
REST is an architectural idea. It advocates standard URL paths that express operations on resources, and its purpose is to simplify and standardize how URL paths are written.
Before REST, the URL sent from a browser could be written any way you liked.
Example: query employee No. 1
http://hadoop102:8088/gmall/getEmployeeById?id=1
http://hadoop102:8088/gmall/findEmployeeById?id=1
http://hadoop102:8088/gmall/retrieveEmployeeById?id=1
http://hadoop102:8088/gmall/queryEmployeeById?id=1
http://hadoop102:8088/gmall/tongguoidchaxunyuangong?id=1   (pinyin for "query employee by id")
Convention: /resource/id
The HTTP method expresses the intended operation on the resource.
REST: /Employee/1
GET means query
POST means create
PUT means update
DELETE means delete
HEAD means check whether it exists
http://hadoop102:8088/gmall/Emp/1 GET
ES follows this RESTful design philosophy and supports REST-style API operations.
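To make the verb-per-operation convention concrete, here is a toy in-memory handler for `/Employee/{id}`. `EmployeeResource` is hypothetical and follows the textbook mapping above (POST creates, PUT updates). ES itself is looser: as the document examples below show, POST and PUT with an id both create or overwrite a document.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory server for the /Employee/{id} resource: the URL names the resource,
// the HTTP method names the operation. Returns HTTP-style status codes.
class EmployeeResource {
    private final Map<String, String> employees = new HashMap<>();

    int handle(String method, String path, String body) {
        String[] parts = path.split("/");                 // "/Employee/1" -> ["", "Employee", "1"]
        if (parts.length != 3 || !parts[1].equals("Employee")) return 404;
        String id = parts[2];
        switch (method) {
            case "GET":                                    // query
            case "HEAD":                                   // existence check (same status, no body)
                return employees.containsKey(id) ? 200 : 404;
            case "POST":                                   // create
                if (employees.containsKey(id)) return 409; // already exists
                employees.put(id, body);
                return 201;
            case "PUT":                                    // update
                if (!employees.containsKey(id)) return 404;
                employees.put(id, body);
                return 200;
            case "DELETE":                                 // delete
                return employees.remove(id) != null ? 200 : 404;
            default:
                return 405;                                // method not allowed
        }
    }
}
```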
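B-tree: a multi-way balanced (self-balancing) search tree
B+tree: an improved B-tree that keeps all data in linked leaf nodes; MySQL's InnoDB indexes use it
LSM tree (log-structured merge tree): used by HBase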
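The LSM tree can be sketched in a few lines. This is a simplified model under stated assumptions: no write-ahead log, no deletes, no compaction. Writes land in a sorted in-memory memtable. A full memtable is frozen into an immutable sorted segment. Reads consult the memtable first, then the segments from newest to oldest. Lucene, which ES builds on, also writes immutable segments in a broadly similar way. `TinyLsm` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified LSM tree: no write-ahead log, no deletes, no compaction.
class TinyLsm {
    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    // immutable sorted segments, newest first
    private final Deque<SortedMap<String, String>> segments = new ArrayDeque<>();

    TinyLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    // Writes only touch memory; a full memtable is written out as a whole, already sorted.
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) flush();
    }

    // Freeze the memtable into a read-only segment and start a new, empty one.
    void flush() {
        if (memtable.isEmpty()) return;
        segments.addFirst(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    // Newest data wins: memtable first, then segments from newest to oldest.
    String get(String key) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (SortedMap<String, String> segment : segments) {
            if (segment.containsKey(key)) return segment.get(key);
        }
        return null;
    }

    int segmentCount() { return segments.size(); }
}
```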
Official site: https://www.elastic.co/cn/downloads/elasticsearch
These notes are based on version 6.6.0
# 1. Extract elasticsearch-6.6.0.tar.gz to /opt/module
tar -zxvf elasticsearch-6.6.0.tar.gz -C /opt/module/
# 2. Create a data directory under /opt/module/elasticsearch-6.6.0
mkdir /opt/module/elasticsearch-6.6.0/data
# 3. Edit the config file (config/elasticsearch.yml)
#-----------------------Cluster-----------------------
cluster.name: my-application
#-----------------------Node-----------------------
node.name: node-102
#-----------------------Paths-----------------------
path.data: /opt/module/elasticsearch-6.6.0/data
path.logs: /opt/module/elasticsearch-6.6.0/logs
#-----------------------Memory-----------------------
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#-----------------------Network-----------------------
network.host: hadoop102
#-----------------------Discovery-----------------------
discovery.zen.ping.unicast.hosts: ["hadoop102","hadoop103","hadoop104"]
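# 4. Distribute /opt/module/elasticsearch-6.6.0 to all nodes
xsync /opt/module/elasticsearch-6.6.0
# 5. On hadoop103 and hadoop104, edit the config file (change node.name and network.host)
Raising the Linux limits that ES requires (reference: http://blog.csdn.net/satiling/article/details/59697916)
# 1. As root, edit /etc/security/limits.conf and add lines like the following (do not omit the *)
# (ES 6.x checks at startup for at least 65536 open files and 4096 threads)
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096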
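# 2. As root, edit /etc/sysctl.conf
# add the following setting (ES requires vm.max_map_count >= 262144)
vm.max_map_count=655360
# then apply it
sysctl -p
# 3. Distribute the modified files to all nodes
xsync /etc/security/limits.conf
xsync /etc/sysctl.conf
# 4. Reboot Linux on every node so the new limits and kernel settings take effect
Start ES as a regular (non-root) user; ES refuses to run as root
[atguigu@hadoop102 elasticsearch-6.6.0]$ bin/elasticsearch
Open hadoop102:9200 in a browser
Cluster start/stop script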
[atguigu@hadoop102 bin]$ vi es.sh
#!/bin/bash
es_home=/opt/module/elasticsearch-6.6.0
case $1 in
"start") {
for i in hadoop102 hadoop103 hadoop104
do
echo "==============$i=============="
ssh $i "source /etc/profile;${es_home}/bin/elasticsearch >/dev/null 2>&1 &"
sleep 4s;
done
};;
"stop") {
for i in hadoop102 hadoop103 hadoop104
do
echo "==============$i=============="
ssh $i "ps -ef|grep $es_home |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
done
};;
esac
#1. Extract kibana-6.6.0-linux-x86_64.tar.gz to /opt/module
tar -zxvf kibana-6.6.0-linux-x86_64.tar.gz -C /opt/module/
mv /opt/module/kibana-6.6.0-linux-x86_64/ /opt/module/kibana/
#2. Edit the config file
vim config/kibana.yml
server.port: 5601
server.host: "hadoop102"
elasticsearch.hosts: ["http://hadoop102:9200"]
[atguigu@hadoop102 kibana]$ bin/kibana
Open hadoop102:5601 in a browser
Combined start/stop script for the ES cluster and Kibana:
#!/bin/bash
es_home=/opt/module/elasticsearch-6.6.0
kibana_home=/opt/module/kibana
case $1 in
"start") {
for i in hadoop102 hadoop103 hadoop104
do
echo "==============$i=============="
ssh $i "source /etc/profile;${es_home}/bin/elasticsearch >/dev/null 2>&1 &"
sleep 4s;
done
sleep 2s;
nohup ${kibana_home}/bin/kibana > kibana.log 2>&1 &
};;
"stop") {
ps -ef | grep ${kibana_home} | grep -v grep | awk '{print $2}'| xargs kill
for i in hadoop102 hadoop103 hadoop104
do
echo "==============$i=============="
ssh $i "ps -ef|grep $es_home |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
done
};;
esac
GET /_cat
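# Names starting with _ (such as _cat) are built-in system endpoints
# View node status
GET /_cat/nodes?v
# View cluster health
GET /_cat/health
# List all indexes
GET /_cat/indices
# An index (comparable to a database)
# Query indexes
# View information about one index
GET /_cat/indices/.kibana_1
# View one index's metadata (aliases, mappings, settings)
GET /stu1
# View one index's mapping (table structure)
GET /.kibana_1/_mapping
# Create an index
# Manual creation: specify the mapping when creating the index
# In 6.x an index can contain only one type, and its name is up to you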
PUT stu
{
"mappings": {
"table1":{
"properties":{
"id":{
"type":"keyword"
},
"name":{
"type":"text"
},
"sex":{
"type":"integer"
},
"birth":{
"type":"date"
}
}
}
}
}
# Automatic creation: write data directly into an index that does not exist. ES infers each field's type from the data and creates the mapping automatically
# POST /indexname/typename/id
POST /stu1/table1/1
{
"id":"1001",
"name":"jack"
}
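# Delete an index
DELETE /stu1
# Modifying an index (e.g. changing its mapping) requires a migration: read the data from the old index and write it into a new one (e.g. with the _reindex API)
# Check whether an index exists: 404 - Not Found means it does not exist, 200 means it does
HEAD /stu
# A type is equivalent to its index
# Types are deprecated from 7.0 on (each index has the single type _doc) and removed in 8.0. 6.x allows only one type per index, so index and type are equivalent
# Querying a type is the same as querying the index
# Deleting the type means deleting the index
# Creating the type means creating the index
# Checking whether a type exists returns 405 - Method Not Allowed; check the index instead
# Query
# Query all documents
GET /stu/table1/_search
# Get a single document: GET /indexname/typename/id
# _id (not the "id" field in the source) is the unique identifier
GET /stu/table1/1
# Create
#POST /indexname/typename/id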
POST /stu/table1/2
{
"id":"tom",
"name":"tom"
}
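# POST with an id can also update: if no document with that _id exists it is inserted, otherwise it is replaced. This is a full update: the new body replaces the whole old _source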
POST /stu/table1/2
{
"id":"1003"
}
# POST without an id: ES generates a random _id
POST /stu/table1/
{
"id":"tom",
"name":"tom"
}
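# Partial (incremental) update: only the fields inside doc are changed
# 400: the parameters sent by the client do not meet the requirements
# 404: the URL path sent by the client does not match any endpoint
# 405: the HTTP method is not allowed for the URL sent by the client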
POST /stu/table1/rx4wNHwBb4g3p3m-lruA/_update
{
"doc": {
"id":"1003"
}
}
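The full update (`POST /stu/table1/2` with a body) and the partial `_update` with `doc` behave differently, and an in-memory map can model the difference. `UpdateDemo` is a hypothetical sketch, not ES code, and it merges only top-level fields.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory document store illustrating the two update styles (not ES code).
class UpdateDemo {
    final Map<String, Map<String, Object>> store = new HashMap<>();

    // POST/PUT /index/type/id with a body: the body replaces the whole _source (full update).
    void indexDoc(String id, Map<String, Object> source) {
        store.put(id, new HashMap<>(source));
    }

    // POST /index/type/id/_update {"doc": {...}}: only the given fields change (partial update).
    // This is a shallow merge; ES also merges nested objects field by field.
    boolean partialUpdate(String id, Map<String, Object> doc) {
        Map<String, Object> current = store.get(id);
        if (current == null) return false;   // ES responds 404 when the document does not exist
        current.putAll(doc);
        return true;
    }
}
```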
# Update with PUT
# PUT can also create a document, but the id is mandatory
PUT /stu/table1/3
{
"id":"1003",
"name":"marry"
}
# 405: /stu/table1/ (without an id) allows only POST, not PUT
PUT /stu/table1/
{
"id":"1003",
"name":"marry"
}
# If the id exists PUT updates, otherwise it inserts; like POST, this is a full update by default
PUT /stu/table1/3
{
"name":"jack"
}
# PUT cannot be used for a partial update (405)
PUT /stu/table1/rx4wNHwBb4g3p3m-lruA/_update
{
"doc": {
"id":"1004"
}
}
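# Status codes starting with 4xx are client errors
# 405: wrong request method, e.g. only POST is allowed but PUT was sent
# 400: malformed request parameters, i.e. the body does not follow the required format
# Delete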
DELETE /stu/table1/rx4wNHwBb4g3p3m-lruA
# Check whether a document exists
HEAD /stu/table1/rx4wNHwBb4g3p3m-lruA
HEAD /stu/table1/1
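# text fields are analyzed (tokenized); keyword fields are not
# The default analyzer (standard) splits English text on word boundaries such as spaces and punctuation, and lowercases the tokens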
GET /_analyze
{
"text": "I am a teacher!"
}
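# The keyword analyzer does not tokenize: the whole input becomes a single token
GET /_analyze
{
"analyzer": "keyword",
"text": "I am a teacher!"
}
# The standard analyzer splits Chinese into single characters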
GET /_analyze
{
"text": "国庆节快乐"
}
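Here is a rough imitation of the behaviour just demonstrated, for intuition only. `SimpleStandardAnalyzer` is a hypothetical class. It lowercases, splits on anything that is not a letter or digit, and emits every Han character as its own token. Its `keyword` method keeps the input whole. The real standard analyzer implements the Unicode text-segmentation rules and covers many more cases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Rough imitation of the standard and keyword analyzers (simplified, for intuition only).
class SimpleStandardAnalyzer {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(current, tokens);
                tokens.add(new String(Character.toChars(cp)));      // each Han character is its own token
            } else if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp)); // build up a word, lowercased
            } else {
                flush(current, tokens);                             // space or punctuation ends the word
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // keyword analyzer: no tokenization, the whole input is one token
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```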
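#ik_smart: smart (coarse-grained) segmentation. Tokens do not overlap, so the output's total character count equals the input's (ignoring punctuation)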
GET /_analyze
{
"analyzer": "ik_smart",
"text": "国庆节快乐"
}
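#ik_max_word: finest-grained segmentation. It emits every word it can find, including overlapping ones, so input characters <= output characters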
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "国庆节快乐"
}
#It only splits words: there is no NLP (natural language processing), and it has no understanding of what the text means
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "爱好抽烟喝酒烫头洗屁股眼子"
}
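A dictionary-based segmenter can illustrate the ik_smart / ik_max_word contrast. This sketch assumes a tiny hypothetical dictionary. Its "smart" mode uses forward maximum matching; its "max_word" mode emits every dictionary word at every position. `DictSegmenter` is not the IK plugin's actual algorithm or dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DictSegmenter {
    private final Set<String> dict;   // hypothetical toy dictionary
    private final int maxLen;         // length of the longest dictionary word

    DictSegmenter(Set<String> dict) {
        this.dict = dict;
        int longest = 1;
        for (String w : dict) longest = Math.max(longest, w.length());
        this.maxLen = longest;
    }

    // "smart" style: forward maximum matching. At each position take the longest dictionary word
    // (or a single character if none matches) and continue after it, so tokens never overlap.
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) len--;
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    // "max_word" style: emit every dictionary word starting at every position (tokens may overlap);
    // a character not covered by any emitted word is kept as a single-character token.
    List<String> maxWord(String text) {
        List<String> out = new ArrayList<>();
        int coveredUntil = 0;
        for (int i = 0; i < text.length(); i++) {
            boolean found = false;
            for (int len = Math.min(maxLen, text.length() - i); len >= 1; len--) {
                String w = text.substring(i, i + len);
                if (dict.contains(w)) {
                    out.add(w);
                    found = true;
                    coveredUntil = Math.max(coveredUntil, i + len);
                }
            }
            if (!found && i >= coveredUntil) {
                out.add(text.substring(i, i + 1));
                coveredUntil = i + 1;
            }
        }
        return out;
    }
}
```

With the dictionary {国庆节, 国庆, 节, 快乐}, `smart("国庆节快乐")` yields [国庆节, 快乐] (5 characters in, 5 out), and `maxWord` yields [国庆节, 国庆, 节, 快乐] (8 characters out).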
In Java:
public class Person{
public String name;
public Address address;
}
public class Address{
public String provinceName;
}
provinceName is a cascading (hierarchical) property of Person, i.e. a sub-property (a property of a property).
In JSON:
person:
{
"age": 20,
"address": {
"provinceName": "广东"
}
}
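In ES a sub-property is referred to by its dotted path, e.g. `address.provinceName`, and ordinary object fields are also indexed under that path. The hypothetical helper below models a nested document as `Map<String, Object>` and flattens it into such paths.

```java
import java.util.Map;
import java.util.TreeMap;

// Flattens nested objects into dotted field paths, e.g. address.provinceName.
class FieldFlattener {
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new TreeMap<>();
        walk("", doc, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                walk(path, (Map<String, Object>) e.getValue(), out); // descend into the sub-object
            } else {
                out.put(path, e.getValue());                        // leaf field
            }
        }
    }
}
```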
Note:
"name" : {
"type" : "text",
"fields" : {
"aaa" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
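If a text field will need to be aggregated later, give it a sub-field of type keyword (like aaa above) and aggregate on that sub-field (name.aaa).
#Importing data:
#_bulk means a batch write
#Format: {"action": {metadata}}\n{data}  (one JSON object per line)
#action: create (insert only; fails if the document already exists), index (upsert: update if it exists, insert otherwise), update, delete (delete has no data line)
#metadata: which index, type and id to write to
#_id: document id, _index: index name, _type: type name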
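Here is a sketch of assembling a `_bulk` body by hand. It assumes the document sources are already serialized JSON strings and that the ids need no escaping. `BulkBodyBuilder` is a hypothetical helper. Each operation gets one action line. A source line follows `index` but not `delete`. Every line ends with a newline, including the last.

```java
// Builds a _bulk request body (newline-delimited JSON).
class BulkBodyBuilder {
    private final StringBuilder body = new StringBuilder();

    // index = upsert: action/metadata line followed by the document source line
    BulkBodyBuilder index(String id, String sourceJson) {
        body.append("{\"index\":{\"_id\":\"").append(id).append("\"}}\n")
            .append(sourceJson).append('\n');
        return this;
    }

    // delete: action/metadata line only, no source line
    BulkBodyBuilder delete(String id) {
        body.append("{\"delete\":{\"_id\":\"").append(id).append("\"}}\n");
        return this;
    }

    String build() { return body.toString(); }
}
```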
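Keyword | Meaning | SQL analogy |
---|---|---|
query | the query | select |
bool | combination of several conditions | select xxx from xxx where age=20 and gender='male' |
filter | a filter condition (does not affect the score) | where |
term | exact match; the input is not analyzed | = |
match | full-text search; the input is analyzed | |
must | inside bool: the condition must match | |
fuzzy | fuzzy match by edit distance | e.g. dick also matches nick, pick |
from | offset of the first hit to return, starting at 0 | offset |
size | number of hits to return | limit |
_source | return only some fields | select field list |
match_phrase | phrase match: the input is analyzed, and its terms must appear adjacent and in the same order | |
multi_match | match the query against several fields at once | |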
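Aggregations: the key is aggregations, or its short form aggs
"aggregations" : {                                   -- the index to aggregate (comparable to tablea in SQL) is not given here but in the URL
    "<aggregation_name>" : {                         -- a name you choose; the result is returned under it
        "<aggregation_type>" : {                     -- type of computation, e.g. terms (group by + count), sum, avg, min, max
            <aggregation_body>                       -- parameters, e.g. "field": which field to aggregate on
        }
        [,"meta" : { [<meta_data_body>] } ]?           -- optional: arbitrary metadata, returned unchanged with the result
        [,"aggregations" : { [<sub_aggregation>]+ } ]? -- optional: sub-aggregations, computed within each bucket of this one
    }
    [,"<aggregation_name_2>" : { ... } ]*            -- optional: more aggregations at the same level
}
A terms aggregation corresponds to GROUP BY + count:
count(*) of the group gender = 'male' ======== sum(if(gender = 'male', 1, 0)) over the whole table
A sub-aggregation is computed on the result of the outer grouping; in SQL terms:
select
a, max(sum_num)   -- sub-aggregation
from
(select
a, b, sum(num) sum_num, max(num) max_num
from tablea
where xxx
group by a, b) tmp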
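group by a
Aggregating directly on a text field fails with:
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [gender] in order to load fielddata in memory by uninverting the inverted index.
Note that this can however use significant memory. Alternatively use a keyword field instead."
A TEXT field is analyzed (split into tokens), so by default it cannot be aggregated.
Solution: aggregate on a KEYWORD field instead (e.g. the gender.keyword sub-field).
e.g. a_column (text):
中国人 ------> 中国, 国人, 中国人 (one value becomes several terms, so it would be counted in several buckets)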
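Plain Java streams can show the point about text fields. A terms aggregation behaves like `GROUP BY value` plus `count(*)`. Grouping a keyword field uses the whole value as the bucket key, whereas grouping analyzed text counts each token separately. `TermsAggDemo` and the analyzer function passed to it are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// terms aggregation ~ GROUP BY bucket_key + count(*)
class TermsAggDemo {
    // keyword field: the whole value is the bucket key
    static Map<String, Long> byKeyword(List<String> values) {
        return values.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // text field: every token produced by the analyzer becomes a bucket key,
    // so one value is counted in several buckets
    static Map<String, Long> byTokens(List<String> values, Function<String, List<String>> analyzer) {
        return values.stream()
                .flatMap(v -> analyzer.apply(v).stream())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```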
- See the comprehensive exercises in Chapter 5
#Import the test data
#Create the index (table)
PUT /test
{
"mappings" : {
"emps" : {
"properties" : {
"empid" : {
"type" : "long"
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "double"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"hobby" : {
"type" : "text",
"analyzer":"ik_max_word",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
#Load the data
POST /test/emps/_bulk
{"index":{"_id":"1"}}
{"empid":1001,"age":20,"balance":2000,"name":"李三","gender":"男","hobby":"吃饭睡觉"}
{"index":{"_id":"2"}}
{"empid":1002,"age":30,"balance":2600,"name":"李小三","gender":"男","hobby":"吃粑粑睡觉"}
{"index":{"_id":"3"}}
{"empid":1003,"age":35,"balance":2900,"name":"张伟","gender":"女","hobby":"吃,睡觉"}
{"index":{"_id":"4"}}
{"empid":1004,"age":40,"balance":2600,"name":"张伟大","gender":"男","hobby":"打篮球睡觉"}
{"index":{"_id":"5"}}
{"empid":1005,"age":23,"balance":2900,"name":"大张伟","gender":"女","hobby":"打乒乓球睡觉"}
{"index":{"_id":"6"}}
{"empid":1006,"age":26,"balance":2700,"name":"张大喂","gender":"男","hobby":"打排球睡觉"}
{"index":{"_id":"7"}}
{"empid":1007,"age":29,"balance":3000,"name":"王五","gender":"女","hobby":"打牌睡觉"}
{"index":{"_id":"8"}}
{"empid":1008,"age":28,"balance":3000,"name":"王武","gender":"男","hobby":"打桥牌"}
{"index":{"_id":"9"}}
{"empid":1009,"age":32,"balance":32000,"name":"王小五","gender":"男","hobby":"喝酒,吃烧烤"}
{"index":{"_id":"10"}}
{"empid":1010,"age":37,"balance":3600,"name":"赵六","gender":"男","hobby":"吃饭喝酒"}
{"index":{"_id":"11"}}
{"empid":1011,"age":39,"balance":3500,"name":"张小燕","gender":"女","hobby":"逛街,购物,买"}
{"index":{"_id":"12"}}
{"empid":1012,"age":42,"balance":3400,"name":"李三","gender":"男","hobby":"逛酒吧,购物"}
{"index":{"_id":"13"}}
{"empid":1013,"age":42,"balance":3400,"name":"李球","gender":"男","hobby":"体育场,购物"}
{"index":{"_id":"14"}}
{"empid":1014,"age":22,"balance":3400,"name":"李健身","gender":"男","hobby":"体育场,购物"}
{"index":{"_id":"15"}}
{"empid":1015,"age":22,"balance":3400,"name":"Nick","gender":"男","hobby":"坐飞机,购物"}
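#0. Two ways to query
#① RESTful (URI search): the parameters are appended to the URL
#② ES's DSL (domain-specific language): the parameters are written in the request body following the DSL syntax
#1. Query all documents, sorted by age descending
#① RESTful: you need to know what each URL parameter does in ES (q = query, sort = sort order)
GET /test/emps/_search?q=*&sort=age:desc
#② DSL: you need to learn the DSL syntax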
GET /test/emps/_search
{
"query": {
"match_all": {
}
},
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
#2. Query all documents, sort by age descending and then by balance descending, and return only empid, age and balance of the first 5 hits
GET /test/emps/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"age": {
"order": "desc"
}
},
{
"balance": {
"order": "desc"
}
}
],
"from": 0,
"size": 5,
"_source": ["empid","age","balance"]
}
#3. match (analyzed match): find employees whose hobby matches 吃饭睡觉
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "吃饭睡觉"
}
GET /test/emps/_search
{
"query": {
"match": {
"hobby": "吃饭睡觉"
}
}
}
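#4. match/term on a field that is not analyzed: find employees whose balance is 2000
#Only text fields are analyzed; balance is a double and cannot be tokenized
#For fields that cannot be analyzed, term is the more natural choice than match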
GET /test/emps/_search
{
"query": {
"match": {
"balance": 2000
}
}
}
# term (not analyzed): find employees whose balance is 2000
GET /test/emps/_search
{
"query": {
"term": {
"balance": 2000
}
}
}
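#5. match on the keyword sub-field (no analysis): find employees whose hobby is exactly 吃饭睡觉
# A keyword field is not tokenized, so querying hobby.keyword matches only the whole value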
GET /test/emps/_search
{
"query": {
"match": {
"hobby.keyword": "吃饭睡觉"
}
}
}
#6. Phrase match: find employees whose hobby contains the phrase 吃饭睡觉 (its terms adjacent and in order)
GET /test/emps/_search
{
"query": {
"match_phrase": {
"hobby": "吃饭睡觉"
}
}
}
#7. Multi-field match: find employees with 球 in name or hobby
GET /test/emps/_search
{
"query": {
"multi_match": {
"query": "球",
"fields": ["name","hobby"]
}
}
}
#8. Multiple conditions: find male employees who like shopping (购物)
GET /test/emps/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"hobby": "购物"
}
},
{
"term": {
"gender": {
"value": "男"
}
}
}
]
}
}
}
#9. Multiple conditions: find male employees who like shopping but do not like going to bars (酒吧)
GET /test/emps/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"hobby": "购物"
}
},
{
"term": {
"gender": {
"value": "男"
}
}
}
],
"must_not": [
{
"match": {
"hobby": "酒吧"
}
}
]
}
}
}
#10. Multiple conditions: as in 9, preferably aged between 20 and 30
#should: optional; documents that match it get a higher score
GET /test/emps/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"hobby": "购物"
}
},
{
"term": {
"gender": {
"value": "男"
}
}
}
],
"must_not": [
{
"match": {
"hobby": "酒吧"
}
}
],
"should": [
{
"range": {
"age": {
"gt": 20,
"lt": 30
}
}
}
]
}
}
}
#11. Multiple conditions: as in 10, but excluding employees older than 40
GET /test/emps/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"hobby": "购物"
}
},
{
"term": {
"gender": {
"value": "男"
}
}
}
],
"must_not": [
{
"match": {
"hobby": "酒吧"
}
},
{
"range": {
"age": {
"gt": 40
}
}
}
],
"should": [
{
"range": {
"age": {
"gt": 20,
"lt": 30
}
}
}
]
}
}
}
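#11b. Same as 11, but the age limit is a filter (age <= 40; filters do not affect the score) instead of a must_not clause
GET /test/emps/_search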
{
"query": {
"bool": {
"must": [
{
"match": {
"hobby": "购物"
}
},
{
"term": {
"gender": {
"value": "男"
}
}
}
],
"must_not": [
{
"match": {
"hobby": "酒吧"
}
}
],
"should": [
{
"range": {
"age": {
"gt": 20,
"lt": 30
}
}
}
],
"filter": {
"range": {
"age": {
"lte": 40
}
}
}
}
}
}
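The roles of the four bool clauses in queries 8 to 11 can be summarized in a toy in-memory model. `BoolQuery` and `EmpRow` are hypothetical. The score here is a simple count of satisfied must/should clauses, not ES's relevance scoring. What the sketch does reflect is which clauses are required, which exclude, and which only add to the score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// One row of the test data (only the fields the queries use).
class EmpRow {
    final String gender;
    final int age;
    final String hobby;
    EmpRow(String gender, int age, String hobby) { this.gender = gender; this.age = age; this.hobby = hobby; }
}

// must = required and scored, filter = required but not scored, must_not = excluded,
// should = optional bonus when must/filter exist, otherwise at least one should clause must match.
class BoolQuery<T> {
    final List<Predicate<T>> must = new ArrayList<>();
    final List<Predicate<T>> filter = new ArrayList<>();
    final List<Predicate<T>> mustNot = new ArrayList<>();
    final List<Predicate<T>> should = new ArrayList<>();

    /** Score of the document, or -1 if it does not match. */
    double score(T doc) {
        for (Predicate<T> p : must) if (!p.test(doc)) return -1;
        for (Predicate<T> p : filter) if (!p.test(doc)) return -1;
        for (Predicate<T> p : mustNot) if (p.test(doc)) return -1;
        int shouldHits = 0;
        for (Predicate<T> p : should) if (p.test(doc)) shouldHits++;
        if (must.isEmpty() && filter.isEmpty() && !should.isEmpty() && shouldHits == 0) return -1;
        return must.size() + shouldHits;   // filter and must_not never change the score
    }
}
```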
#12. Fuzzy matching on a field: a search for Dick also finds Nick (fuzzy matches terms within a small edit distance)
GET /test/emps/_search
{
"query": {
"fuzzy": {
"name": "Dick"
}
}
}
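Fuzzy matching is based on edit distance. The sketch below computes the plain Levenshtein distance and applies the `AUTO` fuzziness rule: terms of 0-2 characters must match exactly, 3-5 characters allow one edit, and longer terms allow two. By default ES also counts an adjacent transposition as a single edit; the sketch omits that. `Nick` is indexed as the lowercased term `nick`, one substitution away from `Dick`. `FuzzyDemo` is hypothetical.

```java
// Levenshtein edit distance plus the AUTO fuzziness rule (transpositions not handled).
class FuzzyDemo {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;      // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // deletion, insertion, substitution
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // AUTO: 0-2 chars -> exact, 3-5 chars -> 1 edit, more than 5 -> 2 edits
    static int autoFuzziness(String term) {
        int n = term.length();
        return n <= 2 ? 0 : (n <= 5 ? 1 : 2);
    }

    static boolean fuzzyMatches(String query, String indexedTerm) {
        return levenshtein(query, indexedTerm) <= autoFuzziness(query);
    }
}
```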
#13. Single aggregation: count the male and female employees
#To get every bucket, size must be >= the number of groups
GET /test/emps/_search
{
"aggs": {
"gendercount": {
"terms": {
"field": "gender.keyword",
"size": 2
}
}
}
}
#14. Query, then aggregate: count the male and female employees who like shopping
GET /test/emps/_search
{
"query": {
"match": {
"hobby": "购物"
}
},
"aggs": {
"gendercount": {
"terms": {
"field": "gender.keyword",
"size": 2
}
}
}
}
#15. Multiple aggregations: count the male and female employees who like shopping, plus the average age of all of them
GET /test/emps/_search
{
"query": {
"match": {
"hobby": "购物"
}
},
"aggs": {
"gendercount": {
"terms": {
"field": "gender.keyword",
"size": 2
}
},
"avgage":{
"avg": {
"field": "age"
}
}
}
}
#16. Multiple and nested aggregations: count the male and female employees who like shopping, plus the average age per gender
GET /test/emps/_search
{
"query": {
"match": {
"hobby": "购物"
}
},
"aggs": {
"gendercount": {
"terms": {
"field": "gender.keyword",
"size": 2
},
"aggs": {
"avgage": {
"avg": {
"field": "age"
}
}
}
}
}
}
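For comparison, here is query 16 expressed with Java streams over a few rows of the test data. `filter` plays the role of the match query, `groupingBy` the terms aggregation, and `summarizingInt` yields each bucket's doc count and average age. The substring test simplifies full-text matching, and `NestedAggDemo` is hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NestedAggDemo {
    // the three fields of the test data used by query 16
    static final class Row {
        final String gender;
        final int age;
        final String hobby;
        Row(String gender, int age, String hobby) { this.gender = gender; this.age = age; this.hobby = hobby; }
    }

    static Map<String, IntSummaryStatistics> genderStats(List<Row> rows, String hobbyWord) {
        return rows.stream()
                .filter(r -> r.hobby.contains(hobbyWord))            // "query": simplified match on hobby
                .collect(Collectors.groupingBy((Row r) -> r.gender,   // "terms" on gender: one bucket per value
                        TreeMap::new,
                        Collectors.summarizingInt((Row r) -> r.age))); // doc_count + "avg" sub-aggregation on age
    }
}
```

For rows 1, 3 and 11 to 15 of the test data, the 男 bucket holds 4 employees with an average age of 32.0, and the 女 bucket holds 1 with 39.0.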
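Aliases and indexes have a many-to-many relationship:
one alias can point to N indexes;
one index can have multiple aliases.
Main use case for aliases:
Hive has partitioned tables, commonly partitioned by date. For example, table ods_a is partitioned by dt:
/ods_a/dt=2021-07-07
/ods_a/dt=2021-07-08
To query only one day's data, filter on the partition column:
where dt='2021-07-07'
To query the whole table, omit the where filter.
How can ES achieve the effect of a partitioned table?
To get partitioning:
each day's data has to go into its own index
2021-07-07 ----------> ods_a_2021-07-07_index
2021-07-08 ----------> ods_a_2021-07-08_index
To query only one day's data, query only the corresponding index:
2021-07-07 ------> GET ods_a_2021-07-07_index/_search
To query all of this month's data:
when each of this month's indexes is created, give it the alias 2021-07_index
Query through the alias: GET 2021-07_index/_search
To query the data of all days:
when each index is created, give it the alias ods_a_index
Query through the alias: GET ods_a_index/_search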
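The alias scheme above can be modelled as a name-resolution table. The assumption is that a concrete index name resolves to itself and an alias resolves to every index it points at. `AliasCatalog` is hypothetical; ES performs this resolution on the server.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AliasCatalog {
    private final Set<String> indexes = new TreeSet<>();
    private final Map<String, Set<String>> aliases = new HashMap<>();

    // Create an index and attach the given aliases to it (like the "aliases" section of PUT /index).
    void createIndex(String index, String... aliasNames) {
        indexes.add(index);
        for (String alias : aliasNames) {
            aliases.computeIfAbsent(alias, k -> new TreeSet<>()).add(index);
        }
    }

    // Indexes a search on `name` would read: the index itself, or every index behind the alias.
    Set<String> resolve(String name) {
        if (indexes.contains(name)) return Collections.singleton(name);
        return Collections.unmodifiableSet(aliases.getOrDefault(name, Collections.emptySet()));
    }
}
```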
#Query aliases
#List all aliases
GET /_cat/aliases?v
#Aliases of one index
GET /movie_index/_alias
#Create
#Specify the aliases directly when creating the index
PUT movie_index
{
"aliases": {
"movie1": {},
"movie2": {}
},
"mappings": {
"movie_type":{
"properties": {
"id":{
"type": "long"
},
"name":{
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
#Add an alias to an existing index
POST _aliases
{
"actions": [
{
"add": {
"index": "movie_index",
"alias": "movie3"
}
}
]
}
#Use an alias to expose a subset of an index (filtered alias)
POST _aliases
{
"actions": [
{
"add": {
"index": "test",
"alias": "man",
"filter": {
"term": {
"gender": "男"
}
}
}
}
]
}
GET /man/_search
#Remove alias movie3 from movie_index and add movie3 to test (the actions in one _aliases request are applied atomically)
POST _aliases
{
"actions": [
{
"remove": {
"index": "movie_index",
"alias": "movie3"
}
},
{
"add": {
"index": "test",
"alias": "movie3"
}
}
]
}
#View
#List all defined templates
GET /_cat/templates
#Create
#index_patterns: when a newly created index's name matches one of these patterns, the template is used to create that index
PUT /_template/template_movie2020
{
"index_patterns": ["movie_test*"],
"aliases" : {
"{index}-query": {},
"movie_test-query":{}
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "keyword"
},
"movie_name": {
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
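Here is a sketch of how a template is applied when an index is created. The new index name is matched against `index_patterns` (only the `*` wildcard is handled here), then `{index}` is expanded in the template's alias names. `TemplateDemo` is a hypothetical helper, not ES code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TemplateDemo {
    // true if indexName matches a pattern such as "movie_test*" ('*' = any run of characters)
    static boolean matches(String pattern, String indexName) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
                .map(Pattern::quote)                 // everything except '*' is matched literally
                .collect(Collectors.joining(".*"));
        return indexName.matches(regex);
    }

    // Expand "{index}" in the template's alias names for the newly created index.
    static List<String> aliasesFor(String indexName, List<String> templateAliases) {
        List<String> out = new ArrayList<>();
        for (String alias : templateAliases) out.add(alias.replace("{index}", indexName));
        return out;
    }
}
```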
GET /test
#Error: Rejecting mapping update to [movie_index] as the final mapping would have more than 1 type: [movie_type, t1]
#movie2 is an alias pointing to movie_index,
#so this is the same as PUT /movie_index/t1/1
#movie_index's only type is movie_type; specifying t1 conflicts with it
PUT /movie2/t1/1
{
"name":"jack"
}
GET /_cat/aliases
GET /movie_index
#Writing to an index that does not exist creates it automatically
PUT /hahah/t1/1
{
"name":"jack"
}
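#movie_test2 matches the pattern movie_test*, so when it is created (by the PUT below) template_movie2020 supplies its aliases and mapping
GET /movie_test2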
PUT /movie_test2/_doc/1
{
"name":"jack"
}
#Check whether the template exists
HEAD /_template/template_movie2020
Create a Maven project and add the following dependencies to pom.xml
<dependencies>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpmime</artifactId>
        <version>4.3.6</version>
    </dependency>
    <dependency>
        <groupId>io.searchbox</groupId>
        <artifactId>jest</artifactId>
        <version>5.3.3</version>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
        <version>2.7.8</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>6.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
JavaBean (Emp.java)
package com.atgugu.esdemo.pojo;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
@NoArgsConstructor
@AllArgsConstructor
@Data
public class Emp {
private String empid;
private Integer age;
private Double balance;
private String name;
private String gender;
private String hobby;
}
package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import java.io.IOException;
import java.util.List;
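/**
* General steps:
* 1. Create a client
* 2. Connect to the server
* 3. Prepare the command
* 4. Send the command
* 5. For a query, receive the result returned by the server
* -------------------------------------
* The Jest client makes heavy use of two patterns:
* Factory pattern: new XxxFactory().getXxx()
* Builder pattern: new XxxBuilder().build()
* Builder methods return the builder itself (a.b() returns a),
* which is what allows the calls to be chained.
* -------------------------------------
*/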
public class ReadDemo01 {
public static void main(String[] args) throws IOException {
// Create the factory
JestClientFactory jestClientFactory = new JestClientFactory();
// Set the address of the cluster to connect to
HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
jestClientFactory.setHttpClientConfig(httpClientConfig);
// Get a client
JestClient jestClient = jestClientFactory.getObject();
String queryString = "{\n" +
" \"query\": {\n" +
" \"match\": {\n" +
" \"hobby\": \"购物\"\n" +
" }\n" +
" },\n" +
" \"aggs\": {\n" +
" \"gendercount\": {\n" +
" \"terms\": {\n" +
" \"field\": \"gender.keyword\",\n" +
" \"size\": 2\n" +
" },\n" +
" \"aggs\": {\n" +
" \"avgage\": {\n" +
" \"avg\": {\n" +
" \"field\": \"age\"\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
// Equivalent to GET /test/emps/_search
Search search = new Search.Builder(queryString)
.addIndex("test")
.addType("emps")
.build();
SearchResult searchResult = jestClient.execute(search);
// Print the results
System.out.println("total:"+ searchResult.getTotal());
System.out.println("max_score:"+ searchResult.getMaxScore());
List<SearchResult.Hit<Emp, Void>> hits = searchResult.getHits(Emp.class);
for (SearchResult.Hit<Emp, Void> hit : hits) {
System.out.println("_index:"+hit.index);
System.out.println("_type:"+hit.type);
System.out.println("_id:"+hit.id);
System.out.println("_source:"+hit.source);
}
// Close the client
jestClient.shutdownClient();
}
}
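The Javadoc above notes that Jest relies on the builder pattern, where each builder method returns the builder itself so calls can be chained. A stripped-down, hypothetical `SearchRequest` shows the mechanics. It only mirrors the shape of `new Search.Builder(query).addIndex(..).addType(..).build()` and is not Jest's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical request object built with the builder pattern.
class SearchRequest {
    private final String query;
    private final List<String> indexes;
    private final List<String> types;

    private SearchRequest(Builder b) {                 // only the Builder creates instances
        this.query = b.query;
        this.indexes = Collections.unmodifiableList(new ArrayList<>(b.indexes));
        this.types = Collections.unmodifiableList(new ArrayList<>(b.types));
    }

    String query() { return query; }

    // The URL path this request would target, e.g. /test/emps/_search
    String path() {
        return "/" + String.join(",", indexes) + "/" + String.join(",", types) + "/_search";
    }

    static class Builder {
        private final String query;
        private final List<String> indexes = new ArrayList<>();
        private final List<String> types = new ArrayList<>();

        Builder(String query) { this.query = query; }

        Builder addIndex(String index) { indexes.add(index); return this; } // returns the same Builder,
        Builder addType(String type) { types.add(type); return this; }      // so calls can be chained

        SearchRequest build() { return new SearchRequest(this); }
    }
}
```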
package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.AvgAggregation;
import io.searchbox.core.search.aggregation.MetricAggregation;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import java.io.IOException;
import java.util.List;
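/**
* General steps:
* 1. Create a client
* 2. Connect to the server
* 3. Prepare the command
* 4. Send the command
* 5. For a query, receive the result returned by the server
* -------------------------------------
* The Jest client makes heavy use of two patterns:
* Factory pattern: new XxxFactory().getXxx()
* Builder pattern: new XxxBuilder().build()
* Builder methods return the builder itself (a.b() returns a),
* which is what allows the calls to be chained.
* -------------------------------------
*/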
public class ReadDemo02 {
public static void main(String[] args) throws IOException {
// Create the factory
JestClientFactory jestClientFactory = new JestClientFactory();
// Set the address of the cluster to connect to
HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
jestClientFactory.setHttpClientConfig(httpClientConfig);
// Get a client
JestClient jestClient = jestClientFactory.getObject();
// Build the query conditions with ES's builder objects instead of a hand-written JSON string
// match query: hobby matches 购物
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("hobby", "购物");
// aggs: terms on gender.keyword with an avg(age) sub-aggregation
TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("gendercount").field("gender.keyword").size(2)
.subAggregation(AggregationBuilders.avg("avgage").field("age"));
// Put the match into query and the aggregation into aggs, then serialize the whole request to JSON
String querySource = new SearchSourceBuilder().query(matchQueryBuilder).aggregation(aggregationBuilder).toString();
// Equivalent to GET /test/emps/_search
Search search = new Search.Builder(querySource)
.addIndex("test")
.addType("emps")
.build();
SearchResult searchResult = jestClient.execute(search);
// Print the results
System.out.println("total:"+ searchResult.getTotal());
System.out.println("max_score:"+ searchResult.getMaxScore());
List<SearchResult.Hit<Emp, Void>> hits = searchResult.getHits(Emp.class);
for (SearchResult.Hit<Emp, Void> hit : hits) {
System.out.println("_index:"+hit.index);
System.out.println("_type:"+hit.type);
System.out.println("_id:"+hit.id);
System.out.println("_source:"+hit.source);
}
MetricAggregation aggregations = searchResult.getAggregations();
TermsAggregation genderCount = aggregations.getTermsAggregation("gendercount");
List<TermsAggregation.Entry> buckets = genderCount.getBuckets();
for (TermsAggregation.Entry bucket : buckets) {
System.out.println(bucket.getKey() + ":" + bucket.getCount());
AvgAggregation avgage = bucket.getAvgAggregation("avgage");
System.out.println(avgage.getAvg());
}
// Close the client
jestClient.shutdownClient();
}
}
package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.DocumentResult;
import io.searchbox.core.Index;
import java.io.IOException;
/**
* Create or update: Index
* Delete: Delete
*
*/
public class WriteDemo01 {
public static void main(String[] args) throws IOException {
// Create the factory
JestClientFactory jestClientFactory = new JestClientFactory();
// Set the address of the cluster to connect to
HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
jestClientFactory.setHttpClientConfig(httpClientConfig);
// Get a client
JestClient jestClient = jestClientFactory.getObject();
// Wrap the data to be written in an object
Emp emp = new Emp("1018", 30, 22.22, "jack", "男", "吃饭");
// Equivalent to PUT /test/emps/18
Index index = new Index.Builder(emp)
.type("emps")
.index("test")
.id("18")
.build();
DocumentResult result = jestClient.execute(index);
System.out.println(result.getResponseCode());
// Close the client
jestClient.shutdownClient();
}
}
package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.*;
import java.io.IOException;
/**
* Create or update: Index
* Delete: Delete
* Batch write: Bulk
*
*/
public class WriteDemo02 {
public static void main(String[] args) throws IOException {
// Create the factory
JestClientFactory jestClientFactory = new JestClientFactory();
// Set the address of the cluster to connect to
HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
jestClientFactory.setHttpClientConfig(httpClientConfig);
// Get a client
JestClient jestClient = jestClientFactory.getObject();
// Wrap the data to be written in an object
Emp emp = new Emp("1018", 30, 22.22, "jack", "男", "吃饭");
// Equivalent to PUT /test/emps/16
Index index = new Index.Builder(emp)
.type("emps")
.index("test")
.id("16")
.build();
Delete delete = new Delete.Builder("18").index("test").type("emps").build();
// Combine several operations into one Bulk request
Bulk bulk = new Bulk.Builder()
.addAction(index)
.addAction(delete).build();
BulkResult bulkResult = jestClient.execute(bulk);
System.out.println(bulkResult.getResponseCode());
// Close the client
jestClient.shutdownClient();
}
}