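1. Background
While building merchant search and recommendation based on banquet-hall availability schedules, I found that a traditional flat mapping structure could not meet the requirements, so I turned to the nested query that Elasticsearch supports.
2. How regular objects and nested objects are indexed
If an object is not of the nested type, take the following source document as an example: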
PUT /my_index/blogpost/1
{
"title":"Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments":[
{
"name": "John Smith",
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"name": "Alice White",
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
Because this is a structured JSON document, ES flattens it into simple key-value pairs inside the index, as follows:
{
"title": [ eggs, nest ],
"body": [ making, money, work, your ],
"tags": [ cash, shares ],
"comments.name": [ alice, john, smith, white ],
"comments.comment": [ article, great, like, more, please, this ],
"comments.age": [ 28, 31 ],
"comments.stars": [ 4, 5 ],
"comments.date": [ 2014-09-01, 2014-10-22 ]
}
As a result, the association between values such as john/28 and alice/31 is lost. Nested objects exist to solve this problem.
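To make the lost association concrete, here is a minimal plain-Java sketch (no Elasticsearch client involved) that imitates the flattened layout above. The class and method names are hypothetical, and names are simplified to a single token each; it only illustrates why a query for "alice AND 28" would match even though no single comment has both values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FlattenedMatchDemo {

    // Hypothetical helper: builds one comment with a single-token name.
    static Map<String, Object> comment(String name, int age) {
        Map<String, Object> c = new HashMap<>();
        c.put("name", name);
        c.put("age", age);
        return c;
    }

    // The two comments from the example document, reduced to name and age.
    static List<Map<String, Object>> sampleComments() {
        List<Map<String, Object>> comments = new ArrayList<>();
        comments.add(comment("john", 28));
        comments.add(comment("alice", 31));
        return comments;
    }

    // Collects the values of one field across all comments, the way a
    // non-nested object array ends up as a single multi-valued field.
    static Set<Object> flatten(List<Map<String, Object>> comments, String field) {
        Set<Object> values = new LinkedHashSet<>();
        for (Map<String, Object> c : comments) {
            values.add(c.get(field));
        }
        return values;
    }

    // "name AND age" evaluated against the flattened fields: each condition
    // is checked against the whole document, not against one comment.
    static boolean matchesFlattened(List<Map<String, Object>> comments, String name, int age) {
        return flatten(comments, "name").contains(name)
                && flatten(comments, "age").contains(age);
    }

    public static void main(String[] args) {
        // No single comment is alice/28, yet the flattened match succeeds.
        System.out.println(matchesFlattened(sampleComments(), "alice", 28)); // prints true
    }
}
```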
Declare comments as the nested type with the following mapping (the "type":"nested" setting on comments is what marks the field as nested):
curl -XPUT 'localhost:9200/my_index' -d '
{
"mappings":{
"blogpost":{
"properties":{
"comments":{
"type":"nested",
"properties":{
"name": {"type":"string"},
"comment": { "type": "string"},
"age": { "type": "short"},
"stars": { "type": "short"},
"date": { "type": "date"}
}
}
}
}
}
}'
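This way, each nested object is indexed as a separate hidden document, which preserves the relationships between the fields inside each nested object, as follows: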
{ ①
"comments.name": [ john, smith ],
"comments.comment": [ article, great ],
"comments.age": [ 28 ],
"comments.stars": [ 4 ],
"comments.date": [ 2014-09-01 ]
}
{
"comments.name": [ alice, white ],
"comments.comment": [ like,more,please,this],
"comments.age": [ 31 ],
"comments.stars": [ 5 ],
"comments.date": [ 2014-10-22 ]
}
{
"title": [ eggs, nest ],
"body": [ making, money, work, your ],
"tags": [ cash, shares ]
}
① Nested object (the first two blocks are nested objects; the last block is the root document)
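The effect of indexing each nested object separately can be sketched in plain Java as follows. This is only an illustration under simplifying assumptions (hypothetical class and method names, single-token names, no Elasticsearch client): both conditions must hold inside the same comment, so a cross-object combination such as alice/28 no longer matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NestedMatchDemo {

    // Hypothetical helper: builds one comment with a single-token name.
    static Map<String, Object> comment(String name, int age) {
        Map<String, Object> c = new HashMap<>();
        c.put("name", name);
        c.put("age", age);
        return c;
    }

    // The two comments from the example document, reduced to name and age.
    static List<Map<String, Object>> sampleComments() {
        List<Map<String, Object>> comments = new ArrayList<>();
        comments.add(comment("john", 28));
        comments.add(comment("alice", 31));
        return comments;
    }

    // "name AND age" evaluated per comment: every comment is treated as its
    // own small document, so both conditions must hold in the same object.
    static boolean matchesNested(List<Map<String, Object>> comments, String name, int age) {
        for (Map<String, Object> c : comments) {
            boolean sameName = name.equals(c.get("name"));
            boolean sameAge = Integer.valueOf(age).equals(c.get("age"));
            if (sameName && sameAge) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesNested(sampleComments(), "alice", 28)); // prints false
        System.out.println(matchesNested(sampleComments(), "alice", 31)); // prints true
    }
}
```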
3. Querying nested objects
Command-line query (output 1):
curl -XGET 'localhost:9200/yzsshopv1/shop/_search?pretty' -d '{"query" : {"bool" : {"filter" : {"nested" : {"path":"hallList","query":{"bool":{"filter":{"term":{"hallList.capacityMin" : "11"}}}}}}}}}'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.0,
"hits" : [ {
"_index" : "yzsshopv1",
"_type" : "shop",
"_id" : "89999988",
"_score" : 0.0,
"_source" : {
"cityId" : "1",
"shopName" : "xxxx婚宴(yyyy店)",
"shopId" : "89999988",
"categoryId" : [ "55", "165", "2738" ],
"hallList" : [ {
"hallId" : "20625",
"schedule" : ["2017-11-10", "2017-11-09"],
"capacityMax" : 16,
"capacityMin" : 12
}, {
"hallId" : "21080",
"schedule" : [ "2017-12-10", "2017-09-09", "2017-02-25"],
"capacityMax" : 20,
"capacityMin" : 11
} ],
"wedHotelTagValue" : [ "12087", "9601", "9603", "9602" ],
"regionId" : [ "9", "824" ]
}
} ]
}
}
Building the query with the Java API:
BoolQueryBuilder boolBuilder = new BoolQueryBuilder();
NestedQueryBuilder nestedQuery = new NestedQueryBuilder("hallList", new TermQueryBuilder("hallList.capacityMin","11")); // Note: besides the path argument, the field name must also be prefixed with the path (hallList)
boolBuilder.filter(nestedQuery);
searchRequest.setQuery(boolBuilder); // set the query condition
Selecting output fields with the Java API:
searchRequest.addField("shopId");
searchRequest.addField("hallList.hallId");
searchRequest.addField("hallList.schedule");
searchRequest.addField("hallList.capacityMin");
searchRequest.addField("hallList.capacityMax");
If the output field is added with searchRequest.addField("hallList"), the request fails with illegal_argument_exception, reason: field [hallList] isn't a leaf field;
If the output field is added with searchRequest.addField("capacityMin"), no error is raised, but no value is returned for capacityMin;
Output after calling search correctly (output 2):
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.0,
"hits" : [{
"_index" : "yzsshopv1",
"_type" : "shop",
"_id" : "89999988",
"_score" : 0.0,
"fields" : {
"shopId" : [ "89999988" ],
"hallList.hallId" : [ "20625", "21080"],
"hallList.capacityMin" : [12, 11 ],
"hallList.capacityMax" : [16, 20 ],
"hallList.schedule" : [ "2017-11-10", "2017-11-09", "2017-12-10", "2017-09-09", "2017-02-25"]
}
}]
}
}
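Comparing outputs 1 and 2: the command-line output 1 returns the nested objects intact, but in output 2 from the Java API the relationships inside the nested objects are scrambled. For the hallList.schedule field, for instance, there is no way to tell which values belong to hallList.hallId 20625 and which belong to 21080.
//============ Update 2017-03-31 ============
After further debugging, I found that to get correctly structured nested objects back from the Java API, you cannot use searchRequest.addField, because a nested object is not a leaf field. Instead, select the output fields like this:
searchRequest.setFetchSource(new String[]{"shopId","hallList"},new String[]{});
One remaining drawback: a nested query returns the whole document, not just the nested documents that matched.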
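One possible workaround is to filter the returned hallList on the client side. The sketch below is plain Java and assumes the hit's _source has already been read into a Map shaped like output 1; the class name, the method names, and the sample data helper are hypothetical, and schedule is left out for brevity. Elasticsearch also has an inner_hits option on nested queries for returning only the matching nested objects, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MatchingHallFilter {

    // Hypothetical helper: one hallList entry (schedule omitted for brevity).
    static Map<String, Object> hall(String hallId, int capacityMin, int capacityMax) {
        Map<String, Object> h = new HashMap<>();
        h.put("hallId", hallId);
        h.put("capacityMin", capacityMin);
        h.put("capacityMax", capacityMax);
        return h;
    }

    // A _source map with the same shopId and halls as output 1.
    static Map<String, Object> sampleSource() {
        List<Map<String, Object>> halls = new ArrayList<>();
        halls.add(hall("20625", 12, 16));
        halls.add(hall("21080", 11, 20));
        Map<String, Object> source = new HashMap<>();
        source.put("shopId", "89999988");
        source.put("hallList", halls);
        return source;
    }

    // Keeps only the halls whose capacityMin equals the queried value,
    // repeating on the client the condition the nested query applied.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> hallsWithCapacityMin(Map<String, Object> source, int capacityMin) {
        List<Map<String, Object>> matched = new ArrayList<>();
        Object halls = source.get("hallList");
        if (!(halls instanceof List)) {
            return matched; // no hallList in this document
        }
        for (Object o : (List<?>) halls) {
            if (!(o instanceof Map)) {
                continue;
            }
            Map<String, Object> h = (Map<String, Object>) o;
            Object min = h.get("capacityMin");
            if (min instanceof Number && ((Number) min).intValue() == capacityMin) {
                matched.add(h);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Only hall 21080 has capacityMin 11.
        for (Map<String, Object> h : hallsWithCapacityMin(sampleSource(), 11)) {
            System.out.println(h.get("hallId")); // prints 21080
        }
    }
}
```

The trade-off is that the match condition is duplicated between the query and the client code, so the two have to be kept in sync.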
4. References
- https://www.elastic.co/guide/en/elasticsearch/guide/master/nested-objects.html
- http://stackoverflow.com/questions/23562192/unable-to-retrieve-nested-objects-using-elasticsearch-java-api
- http://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/nested-aggregation.html