Search & Analyze Data in Real Time
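Elasticsearch's core function is search. It is a full-text search engine built on Lucene, and it is near real time: newly indexed or updated documents become searchable almost immediately.
Advantages:
1. Distributed, horizontally scalable, and highly available
2. Near-real-time search with built-in text analysis (tokenization)
3. A powerful RESTful API
A typical use case is terabytes of data that must support full-text search and return matching results in real time, as in the following example.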
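For example, a user types a combination of keywords into a single search box and gets back a list of the best matches in real time. Suppose the index holds a very large number of products (TB scale) and we search its title field with the keywords 火锅 辣 (hot pot, spicy). The results look like this:
1. 【通州区】麻合辣重庆九宫格火锅
2. 【平谷城区】北京嗨辣激情火锅
The analyzer splits a title such as 【通州区】麻合辣重庆九宫格火锅 into tokens: [通,州,区,麻,合,辣,重,庆,九,宫,格,火,锅]. The query terms are then matched against these tokens, each matching document is given a relevance score, and the results are sorted by that score so that the best matches are returned first.
For more details about analyzers, see the official documentation.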
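The tokenize-then-score idea described above can be illustrated with a small stand-alone sketch. It does not use Elasticsearch or Lucene: the class `SingleCharAnalyzerSketch` and its methods are hypothetical names, and the "score" is simply the number of query tokens found in the title, which is far cruder than real relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SingleCharAnalyzerSketch {

    // Emit one token per CJK ideograph; punctuation such as 【】 and spaces are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isIdeographic)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Toy relevance (an assumption for illustration): the number of distinct
    // query tokens that also occur in the title.
    static int score(String query, String title) {
        Set<String> titleTokens = new HashSet<>(tokenize(title));
        Set<String> queryTokens = new HashSet<>(tokenize(query));
        queryTokens.retainAll(titleTokens);
        return queryTokens.size();
    }

    public static void main(String[] args) {
        String title = "【通州区】麻合辣重庆九宫格火锅";
        System.out.println(tokenize(title));
        System.out.println(score("火锅 辣", title));
    }
}
```

Both sample titles contain 火, 锅 and 辣, so this toy score cannot tell them apart. A real engine also weighs term frequency, term rarity and field length.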
https://www.elastic.co/downloads/elasticsearch
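After downloading, extract the archive and change into the bin directory.
Run ./elasticsearch
When the console shows the startup log (the original post included a screenshot of it here), Elasticsearch has started successfully.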
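Elasticsearch introduces several terms, such as node, document, index, type, and id. Understanding them gives you a good start.
node: a single node (server) in the cluster
index: a collection of documents that share similar characteristics
type: within an index you can define one or more types; a type is a logical category of the data in that index
document: the basic unit of information that can be indexed
The concepts map onto MySQL roughly as follows:
mysql: db - table - row
es: index - type - id
A MySQL database corresponds to an ES index, a table to a type, and a row to a document identified by its id.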
Sample data (1,000 JSON documents in bulk format): https://github.com/bly2k/files/blob/master/accounts.zip?raw=true
Extract it into the current working directory and run:
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
This uploads the 1,000 documents into the account type of the bank index.
_bulk: the bulk (batch) API endpoint
?pretty: returns pretty-printed JSON
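A file sent to _bulk is newline-delimited JSON: each document is preceded by an action line, and every line ends with a newline. The sketch below builds such a body by hand. The class `BulkBodySketch` is a hypothetical helper, and the two documents are made-up values that only mimic the field names used in the queries later in this article.

```java
import java.util.Arrays;
import java.util.List;

class BulkBodySketch {

    // Build a _bulk request body: one "index" action line followed by one
    // source line per document, each terminated by '\n'.
    static String bulkBody(List<String> ids, List<String> docs) {
        if (ids.size() != docs.size()) {
            throw new IllegalArgumentException("ids and docs must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < docs.size(); i++) {
            sb.append("{\"index\":{\"_id\":\"").append(ids.get(i)).append("\"}}\n");
            sb.append(docs.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative documents only, not taken from accounts.json.
        List<String> ids = Arrays.asList("1", "2");
        List<String> docs = Arrays.asList(
            "{\"account_number\":1,\"balance\":100}",
            "{\"account_number\":2,\"balance\":200}");
        System.out.print(bulkBody(ids, docs));
    }
}
```

The trailing newline after the last document matters, which is also why the curl command uses --data-binary rather than -d: -d would strip the newlines from the file.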
The following examples show the syntax of the most common kinds of query.
Pagination:
curl -XPOST 'localhost:9200/hotelswitch/_search?pretty' -d ' { "query": { "match_all": {} }, "from": 10, "size": 10 }'
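In the paging request, from is the offset of the first hit and size is the page length, so "from": 10 with "size": 10 is the second page. A minimal sketch of that arithmetic, assuming 1-based page numbers (the class `PagingSketch` is a hypothetical helper, not part of any Elasticsearch client):

```java
class PagingSketch {

    // Offset of the first hit on a 1-based page.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Request body for a match_all search on the given page.
    static String searchBody(int page, int size) {
        return String.format(
            "{ \"query\": { \"match_all\": {} }, \"from\": %d, \"size\": %d }",
            from(page, size), size);
    }

    public static void main(String[] args) {
        System.out.println(searchBody(2, 10));
    }
}
```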
Sorting:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{ "query": { "match_all": {} }, "sort": { "balance": { "order": "desc" } } }'
Return only some fields (list them in _source):
curl -XPOST 'localhost:9200/hotelswitch/_search?pretty' -d ' { "query": { "match": {"account_number":20} }, "_source": ["account_number", "email"] }'
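Full-text match (documents whose address contains mill or lane):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": { "address": "mill lane" } } }'
Compound (bool) query (address must contain both mill and lane):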
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
Range filter:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }'
Aggregation (similar to GROUP BY in SQL):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" } } } }'
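Conceptually, the terms aggregation on state does what SELECT state, COUNT(*) ... GROUP BY state does in SQL: it buckets documents by field value and counts each bucket. The in-memory sketch below shows only that grouping step. `TermsAggSketch` is a hypothetical name and the state values are made up. The real aggregation additionally orders buckets by count and returns only the top buckets by default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TermsAggSketch {

    // Count how many documents fall into each state bucket.
    static Map<String, Long> countByState(List<String> states) {
        return states.stream()
            .collect(Collectors.groupingBy(s -> s, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Illustrative values only; one entry per document's state field.
        List<String> states = Arrays.asList("TX", "CA", "TX", "NY", "TX");
        System.out.println(countByState(states));
    }
}
```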
For more details on the RESTful API, see the official documentation.
Java client. Maven dependency (groupId:artifactId:version): org.elasticsearch:elasticsearch:2.4.0
import java.net.InetAddress;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class elasticSearch_local {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_local.class);
    private static Random r = new Random();
    static int[] typeConstant = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    static String[] roomTypeNameConstant = new String[]{"标准大床房", "标准小床房", "豪华大房", "主题情侣房间"};

    public static void main(String[] args) throws Exception {
        //http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // on startup
        // Create the client and connect to the local ES node on transport port 9300
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            // Index one document: first argument is the index, second is the type, the source is the document body
            IndexResponse response = client.prepareIndex("hotel", "room")
                    .setSource(getEsDataString())
                    .get();
        }
        logger.info(" run 1000 index consume time : " + (System.currentTimeMillis() - startTime));
        client.close();
    }

    public static XContentBuilder getEsDataString() throws Exception {
        SimpleDateFormat sp = new SimpleDateFormat("yyyy-MM-dd");
        Date d = new Date();
        int offset = r.nextInt(15);
        // Shift the timestamp back by `offset` days; 86_400_000L is milliseconds per day (long avoids int overflow)
        long ts = System.currentTimeMillis() - 86_400_000L * offset;
        // The native ES API builds JSON for us: jsonBuilder().startObject().field(key, value).endObject()
        XContentBuilder object = jsonBuilder()
                .startObject().field("gmtCreate", ts + "").field("gmtModified", ts + "")
                .field("sourceType", typeConstant[r.nextInt(10)] + "").field("partnerId", r.nextInt(999999999) + "").field("poiId", r.nextInt(999999999) + "")
                .field("roomType", r.nextInt(999999999) + "").field("roomName", roomTypeNameConstant[r.nextInt(4)]).field("bizDay", r.nextInt(999999999) + "")
                .field("status", typeConstant[r.nextInt(10)] + "").field("freeCount", r.nextInt(99999) + "").field("soldPrice", r.nextInt(99999) + "")
                .field("marketPrice", r.nextInt(99999) + "").field("ratePlanId", r.nextInt(99999) + "").field("accessCode", r.nextInt(999999999) + "")
                .field("basePrice", r.nextInt(999999999) + "").field("memPrice", r.nextInt(999999999) + "").field("priceCheck", typeConstant[r.nextInt(10)] + "")
                .field("shardPart", typeConstant[r.nextInt(10)] + "").field("sourceCode", typeConstant[r.nextInt(10)] + "").field("realRoomType", r.nextInt(999999999) + "")
                .field("typeLimitValue", typeConstant[r.nextInt(10)] + "").field("openInventoryByAccessCodeList", "").field("closeInventoryByAccessCodeList", "")
                .field("openOrClose", "1").field("openInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("openInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("closeInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("closeInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("datetime", sp.format(d))
                .endObject();
        return object;
    }
}
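import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

public class elasticSearch_formeituanSearch {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_formeituanSearch.class);

    public static void main(String[] args) throws Exception {
        //http://bj1.lc.data.sankuai.com/ test 80 online 9300
        // on startup
        // Connect to the cluster and initialize the client
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        /*QueryBuilder queryBuilder = QueryBuilders
                .disMaxQuery()
                .add(QueryBuilders.termQuery("roomName", "豪华大床"))
                .add(QueryBuilders.termQuery("status", "0"));*/
        // Query condition: always use matchQuery when matching text. termQuery is for exact
        // matches such as numbers (long); a term query does not analyze its input.
        QueryBuilder qb = boolQuery().must(matchQuery("roomName", "豪华大房"));
        /* QueryBuilder qb = boolQuery()
                .must(matchQuery("roomName", "豪华大房"))
                .must(matchQuery("status", "0"))
                .must(matchQuery("sourceCode", "4"))
                .must(matchQuery("typeLimitValue", "5"))
                .must(matchQuery("soldPrice", "11673"));*/
        SearchResponse response = client.prepareSearch("hotel") // the hotel index
                .setTypes("room") // the room type
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH) // search type
                .setQuery(qb) // query
                // Filter the hits by date after the query has run
                .setPostFilter(QueryBuilders.rangeQuery("datetime").gte("2016-10-20").lte("2016-10-21").format("yyyy-MM-dd"))
                .setFrom(0).setSize(10).setExplain(true) // pagination
                .execute() // execute
                .actionGet();
        long count = response.getHits().getTotalHits(); // total number of hits
        System.out.println(count);
        SearchHit[] hits = response.getHits().getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSource());
        }
        client.close();
    }
}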
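import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

public class elasticSearch_fordelete {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_fordelete.class);

    public static void main(String[] args) throws Exception {
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        // Match everything and walk the data with scroll, 50 documents per batch. Each pass of
        // the while loop pulls the next batch. Scroll is recommended for large data sets.
        QueryBuilder qb = matchAllQuery();
        SearchResponse response = client.prepareSearch("hotelindex")
                .setTypes("poidata")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
                .setScroll(new TimeValue(60000))
                .setQuery(qb)
                .setFrom(0)
                .setSize(50)
                .execute()
                .actionGet();
        long count = response.getHits().getTotalHits();
        logger.info("documents to delete: " + count);
        // Delete the current batch, then fetch the next one; stop when a batch comes back empty.
        // A failed scroll request propagates as an exception instead of looping forever.
        while (response.getHits().getHits().length > 0) {
            for (SearchHit hit : response.getHits().getHits()) {
                client.prepareDelete(hit.getIndex(), hit.getType(), hit.getId()).get();
            }
            response = client.prepareSearchScroll(response.getScrollId())
                    .setScroll(new TimeValue(60000)).execute().actionGet();
        }
        client.close();
    }
}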
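Difference between match and term queries:
Always use matchQuery when matching text; termQuery is for exact matches such as numbers (long).
match query: full-text search; the query text is analyzed (tokenized) first
term query: exact lookup; the query text is not analyzed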
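Why a term query on a text field often finds nothing can be seen in a toy inverted index. The sketch below assumes the field was indexed with the single-character tokenization described earlier in this article. `MatchVsTermSketch` and its methods are hypothetical names and do not call Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MatchVsTermSketch {

    // One token per CJK ideograph, as in the analyzer example.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isIdeographic)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Inverted index: token -> ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : tokenize(docs.get(id))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // term query: look the input up as-is, with no analysis.
    static Set<Integer> termQuery(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    // match query: analyze the input first, then OR the per-token lookups together.
    static Set<Integer> matchQuery(Map<String, Set<Integer>> index, String text) {
        Set<Integer> result = new TreeSet<>();
        for (String token : tokenize(text)) {
            result.addAll(termQuery(index, token));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(Arrays.asList("标准大床房", "豪华大房"));
        System.out.println(termQuery(index, "豪华大房"));  // no token equals the whole string
        System.out.println(matchQuery(index, "豪华大房")); // any document sharing a token
    }
}
```

The index contains only single characters, so the whole string 豪华大房 is never a key and the term lookup comes back empty. The match lookup returns every document that shares at least one token, and a real engine would then rank 豪华大房 first by score.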
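Mappings:
A mapping defines the fields of a type and their data types; many data types are supported.
Note: an existing index cannot be remapped; the mapping of a field that already exists cannot be changed.
For the different ways of creating an index mapping, see: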
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html