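Tags (space-separated): ELS
Installation and configuration
Version installed: elasticsearch-rtf from GitHub
Start on Windows: bin\elasticsearch.bat
Install elasticsearch-head
Start: cnpm run start
Once it is running, open http://localhost:9100/ in a browser
Problem:
elasticsearch-head shows that Elasticsearch is not connected
Cause: Elasticsearch's own security settings; cross-origin requests from outside are not permitted by default
Fix: edit the Elasticsearch configuration file (config/elasticsearch.yml)
Add: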
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-methods: OPTIONS,HEAD,GET,POST,PUT,DELETE
http.cors.allow-headers: "X-Requested-With,Content-Type,Content-Length,X-User"
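Concepts
Elasticsearch: a search engine that stores its own data, combining data storage with data analysis services.
Cluster: one or more nodes organized together
Node: a single server within the cluster
Shard: an index can be split into multiple pieces, each of which is a shard
Replica: one or more copies of a shard; if a node fails, the copies on other nodes can take over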
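How a document ends up on a particular shard can be sketched roughly as `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The Python below is a simplification under stated assumptions: `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch actually uses, and `pick_shard` is a hypothetical helper name, not part of any client library.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document ID to a primary shard number (simplified routing)."""
    # crc32 is only a stand-in for Elasticsearch's real Murmur3 hash.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


ids = [str(i) for i in range(1, 11)]
with_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in ids}
with_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in ids}

# Changing the shard count changes the result of the modulo for many IDs,
# so existing documents would no longer be found where they were stored.
moved = [doc_id for doc_id in ids if with_5[doc_id] != with_6[doc_id]]
print(f"{len(moved)} of {len(ids)} documents would land on a different shard")
```

Because the shard number depends on the shard count, changing the count after documents have been indexed would break lookups, which is why the number of primary shards is fixed at index creation while the number of replicas is not.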
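Inverted index (think in reverse: from terms to documents) (TF-IDF):
Each document is tokenized, the tokens become the index keys, and each key records which documents contain that token.
Problems an inverted index has to solve:
1. Case normalization
2. Stemming
3. Tokenization mode
4. Compressed encoding of the inverted index file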
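The idea can be illustrated with a toy inverted index in Python. This is a minimal sketch, not how Lucene stores its index: the regex tokenizer only handles space-separated Latin text (Chinese would need an analyzer such as ik), stemming and compression are left out, and the function names `tokenize`, `build_inverted_index`, and `tf_idf` are made up for the example.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text (case normalization) and split it into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Return a mapping: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return dict(index)


def tf_idf(index, term, doc_id, total_docs):
    """Classic TF-IDF: term frequency times log(total docs / docs containing term)."""
    postings = index.get(term.lower(), {})
    if doc_id not in postings:
        return 0.0
    return postings[doc_id] * math.log(total_docs / len(postings))


docs = {
    1: "Python Django developer",
    2: "python crawler developer",
    3: "Java developer",
}
index = build_inverted_index(docs)
print(sorted(index["python"]))                    # documents containing "python"
print(tf_idf(index, "developer", 1, len(docs)))   # term appears everywhere
print(tf_idf(index, "Python", 1, len(docs)))      # rarer term scores higher
```

A term that occurs in every document gets an IDF of zero, so it contributes nothing to ranking, while a rarer term such as "python" scores higher.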
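Usage
# CRUD operations on Elasticsearch documents and indices
# Index initialization
# Specify the number of shards and replicas
# The number of shards cannot be changed once it is set
# Create an index
PUT lagou
{
"settings": {
"index": {
"number_of_shards":5, # number of shards; cannot be changed later
"number_of_replicas":1 # number of replicas; can be changed later
}
}
}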
# Look up index settings
GET lagou/_settings
GET _all/_settings
GET .kibana,lagou/_settings
GET _settings
# Change index settings
PUT lagou/_settings
{
"number_of_replicas": 2
}
# Save a document with PUT (the ID is given explicitly)
PUT lagou/job/1
{
"title": "python爬虫开发工程师",
"salary_min": 15000,
"city": "北京",
"company": {
"name":"百度",
"company_addr":"北京软件园"
},
"publish_date":"2020-4-16",
"comments":20
}
# Save a document with POST (the ID is generated automatically)
POST lagou/job/
{
"title": "python django 开发工程师",
"salary_min": 15000,
"city": "北京",
"company": {
"name":"美团",
"company_addr":"北京软件园"
},
"publish_date":"2020-4-16",
"comments":20
}
GET lagou/job/1
GET lagou/job/1?_source=title
GET lagou/job/1?_source=title,city
# Update a document: PUT overwrites it completely, so the full document body must be supplied
PUT lagou/job/1
{
"title": "python爬虫开发工程师",
"salary_min": 15000,
"city": "北京",
"company": {
"name":"百度",
"company_addr":"北京软件园"
},
"publish_date":"2020-4-16",
"comments":20
}
# Update a single field
POST lagou/job/1/_update
{
"doc":{
"comments":21
}
}
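The difference between the two update styles above can be mimicked with plain Python dictionaries. This is only an in-memory sketch with hypothetical helper names (`put_document`, `update_document`); it assumes the documented behavior that a partial `doc` is merged recursively into the stored document, while PUT replaces it wholesale.

```python
import copy


def put_document(store, doc_id, body):
    """Full overwrite: the stored document becomes exactly `body`."""
    store[doc_id] = copy.deepcopy(body)


def update_document(store, doc_id, partial):
    """Partial update: merge `partial` into the stored document, recursing into objects."""
    def merge(target, changes):
        for key, value in changes.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                merge(target[key], value)
            else:
                target[key] = copy.deepcopy(value)

    merge(store[doc_id], partial)


store = {}
put_document(store, 1, {
    "title": "python爬虫开发工程师",
    "company": {"name": "百度", "company_addr": "北京软件园"},
    "comments": 20,
})

update_document(store, 1, {"comments": 21})   # other fields are kept
after_update = copy.deepcopy(store[1])

put_document(store, 1, {"comments": 21})      # other fields are lost
after_put = copy.deepcopy(store[1])

print(after_update)
print(after_put)
```

After the partial update the title and company are still present; after the PUT with the same small body they are gone, which is why a PUT needs the complete document.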
# Delete a document, then delete the whole index
DELETE lagou/job/1
DELETE lagou
GET _mget
{
"docs":[
{
"_index": "testdb",
"_type":"job",
"_id":1
},
{
"_index": "testdb",
"_type":"job2",
"_id":2
}
]
}
GET testdb/_mget
{
"docs":[
{
"_type":"job",
"_id":1
},
{
"_type":"job2",
"_id":2
}
]
}
GET testdb/job/_mget
{
"docs":[
{
"_id":1
},
{
"_id":2
}
]
}
# Equivalent to the request above
GET testdb/job/_mget
{
"ids":[1,2]
}
POST _bulk
{"index":{"_index":"lagou","_type":"job","_id":1}}
{"title": "python django 开发工程师","salary_min": 15000,"city": "北京","company": {"name":"美团","company_addr":"北京软件园"},"publish_date":"2020-4-16","comments":20}
{"index":{"_index":"lagou","_type":"job","_id":2}}
{"title": "python django ","salary_min": 15000,"city": "北京","company": {"name":"alibaba","company_addr":"北京软件园"},"publish_date":"2020-4-16","comments":20}
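The `_bulk` body is newline-delimited JSON: an action line, then the document source on the next line, each on a single line, with a trailing newline at the end. A small Python sketch for generating such a body is shown below; `build_bulk_body` is a hypothetical helper for illustration, and sending the body to the server is left out.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Serialize {doc_id: source} into the newline-delimited body used by _bulk."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        # json.dumps without indent keeps each object on one line.
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The body has to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("lagou", "job", {
    1: {"title": "python django 开发工程师", "city": "北京", "comments": 20},
    2: {"title": "python django ", "city": "北京", "comments": 20},
})
print(body, end="")
```

Pretty-printed JSON would break the request, because the line breaks are what separate one action or document from the next.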
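Mapping
When creating an index, the type and related attributes of each field can be defined in advance.
If no mapping is given when a type is created, Elasticsearch guesses the field mapping from the types in the JSON source data. A mapping defines the field types explicitly and tells Elasticsearch how to index the data.
Purpose: makes the resulting index more precise and complete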
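The guessing step can be pictured with a rough Python imitation. This is a heavily simplified sketch, not the real dynamic-mapping logic: the rules in `guess_field_type` (a hypothetical name) and the strict `yyyy-MM-dd` date pattern are assumptions chosen to show why an explicit mapping is often preferable.

```python
import re

# Assumption: only strictly formatted dates such as 2020-04-16 are detected.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def guess_field_type(value):
    """Guess a field type from a JSON value, loosely imitating dynamic mapping."""
    if isinstance(value, bool):      # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


doc = {
    "title": "python django 开发工程师",
    "salary_min": 15000,
    "company": {"name": "美团"},
    "publish_date": "2020-4-16",
}
guessed = {field: guess_field_type(value) for field, value in doc.items()}
print(guessed)
```

In this sketch `salary_min` is guessed as `long` rather than `integer`, and the loosely formatted `publish_date` falls back to `text`; an explicit mapping avoids both surprises.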
Built-in types:
a) Numeric types: long, integer, short, byte, double, float
b) Date type: date
c) Boolean type: boolean
d) Binary type: binary
e) Complex types: object, nested
f) Geo types: geo_point, geo_shape
g) Specialized types: ip, completion
h) String types: text, keyword
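Difference between text and keyword:
A text field is analyzed (tokenized) when it is indexed, whereas a keyword field is not analyzed and is indexed as a single exact value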
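A small Python model makes the distinction concrete. It is only a sketch: `analyze` is a stand-in that lowercases and splits on non-alphanumeric characters (a real analyzer such as ik does far more), and the helper names are invented for the example.

```python
import re


def analyze(value):
    """Stand-in analyzer: lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", value.lower())


def index_text(value):
    """A text field stores the analyzed tokens."""
    return set(analyze(value))


def index_keyword(value):
    """A keyword field stores the whole value as one term."""
    return {value}


def term_query(indexed_terms, term):
    """A term query looks the term up exactly as given, without analyzing it."""
    return term in indexed_terms


def match_query(indexed_terms, text):
    """A match query on a text field analyzes the query, then matches any token."""
    return any(token in indexed_terms for token in analyze(text))


title_terms = index_text("Python Django Developer")
city_terms = index_keyword("Beijing")

print(term_query(title_terms, "python"))           # lowercase token exists
print(term_query(title_terms, "Python"))           # exact form was never indexed
print(match_query(title_terms, "PYTHON engineer"))  # query is analyzed first
print(term_query(city_terms, "Beijing"))           # whole value matches
print(term_query(city_terms, "beijing"))           # keyword is case-sensitive
```

This also explains why a term query against a text field can miss a value that a match query finds: the term query skips analysis, so it has to use the exact token that was indexed.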
Example of defining mappings:
PUT lagou
{
"mappings":{
"job":{
"properties":{
"title":{
"type":"text"
},
"salary_min":{
"type":"integer"
},
"city":{
"type":"keyword"
},
"company":{
"properties":{
"name":{
"type":"text"
},
"company_addr":{
"type":"text"
},
"employee_count":{
"type":"integer"
}
}
},
"publish_date":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}
}
}
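# The ik analyzer
1. match query
# The ik analyzer tokenizes the query keywords; matching is case-insensitive
GET lagou/job/_search
{
"stored_fields": ["title","company_name"], # fields to return; only fields mapped with store set to true are returned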
"query": {
"match": {
"title": "python"
}
}
}
2. term query
# A term query does not tokenize the keyword
GET lagou/job/_search
{
"query": {
"term": {
"title": "爬取"
}
}
}
3. terms query
# Takes an array as the condition; matching any one value is enough
GET lagou/job/_search
{
"query": {
"terms": {
"title":["python","爬虫"]
}
}
, "from": 1 # offset of the first hit to return (0-based)
, "size": 1 # number of hits to return
}
4. match_phrase query (phrase query)
GET lagou/job/_search
{
"query": {
"match_phrase": {
"title":{
"query": "python开发", # the query text is tokenized and every token must match
"slop":5 # maximum distance allowed between the tokens
}
}
}
}
5. multi_match query
GET lagou/job/_search
{
"query": {
"multi_match": {
"fields": ["title","desc"] # fields to search; title^3 would give title a weight of 3 so that its matches rank higher
, "query": "python" # query text
}
}
}
6. Sorting results with sort
GET lagou/job/_search
{
"query": {
"match_all": {}
}
, "sort": [
{
"comments": {
"order": "desc"
}
}
]
}
7. Restricting results with a range query
GET lagou/job/_search
{
"query": {
"range": {
"comments": {
"gte": 10, # lower bound
"lte": 14 # upper bound
, "boost": 1 # weight
}
}
}
}
# range query on a date field
GET lagou/job/_search
{
"query": {
"range": {
"add_time": {
"gt": "2019-8-1", # lower bound
"lte": "now" # upper bound
}
}
}
}
8. wildcard query (pattern matching)
GET lagou/job/_search
{
"query": {
"wildcard": {
"title": {
"value": "pytho*"
}
}
}
}
9. fuzzy query (approximate matching)
GET lagou/_search
{
"query":{
"fuzzy": {"title": "Open"}
},
"_source":["title"]
}
bool query
# Syntax:
"bool":{
"filter":[], # filtering; does not affect the score
"must":[], # every clause must match => and
"should":[], # clauses may match => or
"must_not":[] # no clause may match
}
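As an illustration of how the clauses combine, the Python sketch below assembles a bool query body out of the query types shown earlier. `bool_query` is a hypothetical helper, and the field names and values are taken from the sample documents in these notes; the resulting JSON would be used as the body of a `_search` request.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Build a search body with a bool query, leaving out empty clause lists."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"title": "python"}}],               # and
    must_not=[{"term": {"salary_min": 15000}}],          # not
    filters=[{"range": {"comments": {"gte": 10}}}],      # filtered, not scored
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Each list holds ordinary query clauses, so any of the match, term, terms, or range queries above can be nested inside a bool query.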