Elasticsearch DSL Command Reference (Part 1)

Contents

        • Preface:
          • Create:
          • Delete:
          • Update:
          • Search:
          • Aggregations:
          • Global bucket:
          • Combining queries:
          • Returning specific fields:

All commands below were run in the Kibana console provided by Alibaba Cloud Elasticsearch.

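Preface:

I used to query ES directly with curl on the server. After the company moved to Alibaba Cloud ES, running the same command on the server Alibaba provided returned an error: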

[root@Alihui ~]# curl -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}

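Fix: Alibaba Cloud enforces access control, so you have to supply a username and password as described in its documentation before you can connect. For details see https://help.aliyun.com/document_detail/57877.html?spm=a2c4g.11186623.6.548.AAW08d
The correct way to connect: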

[root@Ali98 ~]# curl -u hui:hui -XGET es-cn-huiiiiiiiiiiiii.elasticsearch.aliyuncs.com:9200
{
  "name" : "huihui",
  "cluster_name" : "es-cn-huiiiiiiiiiiiii",
  "cluster_uuid" : "huiiiiiiiiiiiii_iiiii",
  "version" : {
    "number" : "5.5.3",
    "build_hash" : "930huihui",
    "build_date" : "2017-09-07T15:56:59.599Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
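
The -u hui:hui flag makes curl send HTTP Basic authentication, which is what the 401 response asked for in its WWW-Authenticate header. As a rough sketch of what ends up on the wire, the Java snippet below builds the same Authorization header value by hand. The class and method names are made up for illustration, and hui:hui is just the placeholder credential from the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of any Elasticsearch client.
class BasicAuthSketch {
    // Builds the value of the HTTP "Authorization" header for Basic auth:
    // the literal "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials taken from the curl example.
        System.out.println("Authorization: " + basicAuthHeader("hui", "hui"));
    }
}
```

Any HTTP client that can set request headers could send this value instead of relying on curl's -u flag.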

 
List all indices on the server:

GET _cat/indices?v

List the indices matching a given name pattern:

GET _cat/indices/xiao-2018-9-*?v

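Supplement: getting the index list with the Java API (from http://www.hillfly.com/2016/131.html)
For index-related documentation, see the official Indices APIs reference. In the Java API, the index-related calls are grouped under IndicesAdminClient, which you obtain through client.admin().indices().

So how do we get information about the current indices, for example how many there are, as a list?
The answer is awkward. As of that article's date (2016.11.26), its author could not find an API on the official site that returns only the index list, and admits it may simply have been overlooked.

It can be obtained indirectly through other APIs, though:
For example through this HTTP API: http://10.1.109.77:9200/_all
The corresponding Java API is: GetIndexResponse resp = client.admin().indices().prepareGetIndex().execute().actionGet();
The output is not pasted here; try it yourself. The response contains a lot besides the index list, such as the mappings.

I prefer another API for getting the index list, however:
HTTP: http://10.1.109.77:9200/_stats
Java API: IndicesStatsResponse resp = client.admin().indices().prepareStats().execute().actionGet();
Why this one? Because it also returns storage information for each index, including cache size and index size, which is handy as a reference for tuning.

As an aside, to get the storage information of a single index through this API, say the index named HILL-INDEX-TEST1, you can do this:
HTTP: http://10.1.109.77:9200/HILL-INDEX-TEST1/_stats
Java API:
IndicesStatsResponse resp = client.admin().indices().prepareStats().setIndices("HILL-INDEX-TEST1").execute().actionGet();
setIndices also accepts wildcards. For example, to query the storage information of every index whose name starts with HILL-: setIndices("HILL-*").

If you only need the index names and nothing else, iterate over the keys of the response. The snippet below goes one step further and picks the most recent date-suffixed index: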

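		// resp is the IndicesStatsResponse obtained above; df.parse can throw ParseException
		Map<String, IndexStats> indexStatsMap = resp.getIndices();
		String prefix = "xiaoqiang-";
		TreeSet<Long> set = new TreeSet<>();
		SimpleDateFormat df = new SimpleDateFormat("yyyy-M-d"); // date format of the index name suffix
		for (String key : indexStatsMap.keySet()) {
			// strip the prefix and parse the date suffix, e.g. "2018-11-6"
			long timestamp = df.parse(key.substring(prefix.length())).getTime();
			set.add(timestamp);
		}
		// the largest timestamp belongs to the most recent index
		String maxIndex = prefix + df.format(new Date(set.last()));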

Get the mapping of an index:

GET /xiao-2018-6-12/Socials/_mapping

 

Create:

1. Add a document:

POST hualong_word/word/AWM6zjWeB-kQcwLD8Zjp?routing=word
{
    "text" : "科技服务客服"
}

2. Add a field named name with the value xiaoqiang:
Note: the document AWM6zjWeB-kQcwLD8Zjp must already exist; it does not matter whether it already has a name field.

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : "ctx._source.name = \"xiaoqiang\""
}

 

Delete:

1. Remove a field from a document:

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : "ctx._source.remove(\"name_of_new_field\")"
}

2. Delete a single document:

DELETE mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news

3. Bulk delete by several conditions:

POST mei_toutiao/News/_delete_by_query?routing=news
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        { "term" : { "mediaNameZh" : "5time悦读" } }, 
                        { "term" : { "codeName" : "美发" } }
                    ]
                }
            }
        }
    }
}

 

Update:

1. Partial update:

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
   "doc" : {
      "userName": "hao"  // updated if the field exists, added if it does not
   }
}

2. Update an array of strings:

POST mei_toutiao/News/AWPN8pLjs4TGXdjfL8_b/_update?routing=news
{
   "doc" : {
      "littleUrls": [
          "http://shishanghui.oss-cn-beijing.aliyuncs.com/700d2d2936f40fabe5a70b1449f07f9df080.jpg?x-oss-process=image/format,jpg/interlace,1",
          "http://shishanghui.oss-cn-beijing.aliyuncs.com/ed7ad5d1e23441880c59abf0cfd7a89df080.jpg?x-oss-process=image/format,jpg/interlace,1"
      ]
   }
}

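3. Full update:
(Whether or not the document already has these fields, it ends up containing only the content below, i.e. the whole document is replaced. Use with caution!)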

PUT mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp?routing=news
{
    "counter" : 1,
    "tags" : ["red"]
}

4. Bulk reset the comment count to 0 for every article whose comment count is greater than 0:

POST mei_toutiao/News/_update_by_query?routing=news
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "atdCnt": {
              "gt": 0
            }
          }
        }
      ]
    }
  },
  "script": {
    "inline":"ctx._source.atdCnt = 0"
  }
}

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

5. Bulk add a field and assign it a value:

POST hui/News/_update_by_query
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "hui": "hehe"
                        }
                    }
                }
            }
        }
    },
    "script": {
        "inline":"ctx._source.name = \"xiaoqiang\""
    }
}

6. Update with a script:
If the document exists, its counter field is set to 3; if it does not exist, a new document is inserted with counter set to 2.

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{  
   "script":{  
      "inline":"ctx._source.counter = 3"
   },
   "upsert":{"counter":2}
}
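
The two branches of that request can be easy to mix up, so here is a minimal in-memory sketch of the behavior described above. It is an illustration only: the map stands in for the index, and the class and method names are hypothetical. It assumes the default behavior where the script is not run when the upsert body is inserted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of an update request with "script" + "upsert".
class UpsertSketch {
    static void scriptedUpdate(Map<String, Map<String, Object>> index, String id) {
        Map<String, Object> doc = index.get(id);
        if (doc == null) {
            // Document missing: the "upsert" body is indexed as the new document.
            Map<String, Object> upsert = new HashMap<>();
            upsert.put("counter", 2);
            index.put(id, upsert);
        } else {
            // Document present: the script runs, i.e. ctx._source.counter = 3.
            doc.put("counter", 3);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> index = new HashMap<>();
        scriptedUpdate(index, "AWM6zjWeB-kQcwLD8Zjp"); // first call inserts
        System.out.println(index);
        scriptedUpdate(index, "AWM6zjWeB-kQcwLD8Zjp"); // second call runs the script
        System.out.println(index);
    }
}
```

Sending the same request twice therefore leaves counter at 2 after the first call and 3 after the second.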

Add 4 to the counter field:
Reference (the documentation is for version 6.4 and uses "source"; my Alibaba Cloud ES is 5.5.3, where only "inline" works): https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

POST mei_toutiao/News/AWM6zjWeB-kQcwLD8Zjp/_update?routing=news
{
    "script" : {
        "inline": "ctx._source.counter += 4"
    }
}

Or, with the same request line and a parameterized script:

{
    "script" : {
        "inline": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
            "count" : 4
        }
    }
}

 

Search:
GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {             
                "term" : {
                    "_id": "AWNcz4IrB-kQcwLDJ93q"
                }
            }
        }
    }
}

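Notes:
1. For what "constant_score" is for, see https://blog.csdn.net/dm_vincent/article/details/42157577
2. For the difference between match and term, see https://www.cnblogs.com/yjf512/p/4897294.html
3. The term clause can also target an ordinary document field (such as "newType" : 1). Querying by a field may return many documents, whereas querying by _id returns only one.

1. Fetch a single document: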

GET mei_toutiao/hui/AWNcz4IrB-kQcwLDJ93q?routing=hui

2. Search all documents:

GET mei_toutiao/_search

Note: the query matches every document, but only 10 hits are returned by default.

3. Search all documents whose newType field is 1:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
}

Search all documents whose newType field is not 1:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must_not" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
}

Note:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "match_phrase" : {
                            "userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
                        }
                    }
                }
            }
        }
    }
}

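The query above finds the matching documents, but the one below does not. A likely cause, assuming userId is mapped as an analyzed text field: term looks up the exact, unanalyzed value, while the indexed tokens were lowercased and split at the hyphens, so the full uppercase string never matches a token.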

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "userId": "1C210E82-21B7-4220-B267-ED3DA6635F6F"
                        }
                    }
                }
            }
        }
    }
}
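
To make the difference between the two queries concrete, here is a rough sketch. It assumes userId is an analyzed text field using the standard analyzer, which the mapping shown in this article does not confirm, and it approximates that analyzer crudely by lowercasing and splitting on anything that is not a letter or digit. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Crude, hypothetical approximation of how an analyzed text field is tokenized.
class AnalyzerSketch {
    // Lowercase, then split on every run of non-letter, non-digit characters.
    // The real standard analyzer follows Unicode segmentation rules and is subtler.
    static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A term query compares its input, unanalyzed, against the indexed tokens.
    static boolean termMatches(String indexedText, String queryTerm) {
        return roughTokens(indexedText).contains(queryTerm);
    }

    public static void main(String[] args) {
        String userId = "1C210E82-21B7-4220-B267-ED3DA6635F6F";
        System.out.println(roughTokens(userId));
        System.out.println(termMatches(userId, userId));
        System.out.println(termMatches(userId, "1c210e82"));
    }
}
```

Under these assumptions the full id is never stored as a single token, which is why match_phrase (which analyzes its input the same way) succeeds where term fails. If the field has a keyword sub-field, a term query against that sub-field would be the usual alternative.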

4. Documents in which the field exists:

GET mei_toutiao/_search
{
    "query":{
          "exists": {
                "field": "newType"
           }
    }
}

Documents in which the field does not exist:

GET mei_toutiao/_search
{
    "query":{
        "bool": {
            "must_not": {
                "exists": {
                    "field": "newType"
                }
            }
        }
    }
}

5. Query on several fields:

GET mei_toutiao/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        { "term" : { "sourceType" : "FORUM" } }, 
                        { "term" : { "flwCnt" : 0 } } 
                    ]
                }
            }
        }
    }
}

6. Sort by the pubTime field in descending order (use asc for ascending):

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "newType": "1"
                        }
                    }
                }
            }
        }
    }
    , "sort": [
      {
          "pubTime": "desc"
      }
    ]
}

7. Exclude 抖音 (Douyin) from the 视频 (video) category:

GET mei_toutiao/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "codeName": "视频"
                        }
                    },
                    "must_not" : {
                        "term" : {
                            "mediaNameZh": "抖音"
                        }
                    }
                }
            }
        }
    }
    , "sort": [
      {
          "pubTime": "desc"
      }
    ]
}

Corresponding Java API:

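// fqb is the bool query; client is assumed to be a SearchRequestBuilder
BoolQueryBuilder fqb = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("codeName", "视频"))
        // note: the DSL above uses term here, while this code uses match
        .mustNot(QueryBuilders.matchQuery("mediaNameZh", "抖音"));
client.setQuery(fqb).addSort("pubTime", SortOrder.DESC);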

Paging plus sorting:

client.setQuery(fqb).setFrom((message.getInt("pageNo")-1)*10).setSize(10).addSort("pubTime", SortOrder.DESC);
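
The setFrom((pageNo-1)*10) expression above converts a 1-based page number into a document offset. A minimal sketch of that arithmetic follows, with a hypothetical helper name and a configurable page size (the original hard-codes 10):

```java
// Hypothetical helper for illustration; not part of the Elasticsearch client.
class PagingSketch {
    // Converts a 1-based page number into the "from" offset of a search request.
    static int fromOffset(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        return (pageNo - 1) * pageSize;
    }

    public static void main(String[] args) {
        // Page 3 with 10 hits per page skips the first 20 hits.
        System.out.println(fromOffset(3, 10));
    }
}
```

The result would be passed to setFrom, with the page size going to setSize.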

8. Search by time range:
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html

GET mei_toutiao/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "pubDay": {
              "gte": "2018-05-11",
              "lte": "2018-05-12"
            }
          }
        }
      ]
    }
  }
}

All of yesterday (from the start of yesterday up to, but not including, the start of today):

GET mei_toutiao/_search
{
    "query": {
        "range" : {
            "pubDay" : {
                "gte" : "now-1d/d",
                "lt" :  "now/d"
            }
        }
    }
}
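
The date math expressions now-1d/d and now/d can be read as "subtract, then round down to the day". The java.time sketch below computes the same two bounds, assuming rounding happens in UTC (the default when the range query specifies no time_zone). The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of the bounds used by the range query above.
class DateMathSketch {
    // "now-1d/d": go back one day, then round down to the start of that day (UTC).
    static Instant startOfYesterday(Instant now) {
        return now.atZone(ZoneOffset.UTC)
                .minusDays(1)
                .truncatedTo(ChronoUnit.DAYS)
                .toInstant();
    }

    // "now/d": round down to the start of the current day (UTC).
    static Instant startOfToday(Instant now) {
        return now.atZone(ZoneOffset.UTC)
                .truncatedTo(ChronoUnit.DAYS)
                .toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-05-12T10:30:00Z");
        System.out.println("gte " + startOfYesterday(now));
        System.out.println("lt  " + startOfToday(now));
    }
}
```

With gte on the first bound and lt on the second, the range covers exactly one calendar day and excludes anything published today.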

 
Query with an explicit date format:

GET mei_toutiao/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "pubDay": {
              "gte": "2018-05-29 00:00:00",
              "lte": "2018-05-30 00:00:00",
              "format": "yyyy-MM-dd HH:mm:ss"
            }
          }
        }
      ]
    }
  }
}

Or:

GET mei_toutiao/_search
{
    "query": {
        "range" : {
            "pubDay" : {
                "gte": "30/05/2018",
                "lte": "2019",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}

Corresponding Java API:

QueryBuilder fqb = QueryBuilders.boolQuery().filter(new RangeQueryBuilder("pubDay").gte("2018-05-29 12:00:00").lte("2018-05-30 00:00:00").format("yyyy-MM-dd HH:mm:ss")).filter(filterQuery(message));

9. Script query for WeChat documents whose url field contains __biz:

GET xiaoqiang-2018-11-6/Socials/_search
{
    "query": {
        "bool" : {
            "must" : [
                {
                    "term": {
                        "sourceType":"weixin"
                    }
                },
                {
                    "script" : {
                        "script" : "if (doc['url'].value.length() > 31) {doc['url'].value.substring(26,31) == '__biz';}"
                    }
                }
            ]
        }
    }
}

 

Aggregations:

1. Terms aggregation by category:

GET mei_toutiao/_search
{
    "size" : 0,
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 3,    // without this, only 10 buckets are returned by default
              "field" : "codeName"
           }
        }
    }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 52766,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "per_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "视频",
          "doc_count": 17258
        },
        {
          "key": "旅游",
          "doc_count": 10132
        },
        {
          "key": "娱乐",
          "doc_count": 8867
        }
      ]
    }
  }
}

Note: see the official guide at https://www.elastic.co/guide/cn/elasticsearch/guide/current/cardinality.html

2. Aggregate media names for documents whose sourceType is FORUM:

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 10000,
              "field" : "website.keyword"
           }
        }
    }
}

3. Aggregate by the name field and get the maximum view count in each bucket:

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "aggs" : {
        "per_count" : {
           "terms" : {
              "size" : 10000,
              "field" : "name"
           },
           "aggs" : {
                "max_count" : {
                    "max" : {
                       "field" : "view"
                    }
                }
            }
        }
    }
}

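stats: returns all the stat metrics of the aggregation. Which metrics count as stat metrics is defined by ES; there are five: count, min, max, avg, and sum. (Reference: https://blog.csdn.net/zxjiayou1314/article/details/53741586)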

{
    "aggs" : {
    "grades_stats" : { "stats" : { "field" : "grade" } }
    }
}

Result:

{
    "aggregations": {
    "grades_stats": {
        "count": 6,
        "min": 60,
        "max": 98,
        "avg": 78.5,
        "sum": 471
    }
    }
}
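
The same five metrics are available in plain Java through DoubleSummaryStatistics, which can help when checking an aggregation result by hand. The grade values below are invented so that they reproduce the numbers in the result above; they are not the data behind the original example.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical illustration: the five "stats" metrics computed locally.
class StatsSketch {
    static DoubleSummaryStatistics stats(double... values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        // Invented grades chosen to match count=6, min=60, max=98, sum=471.
        DoubleSummaryStatistics s = stats(60, 70, 75, 80, 88, 98);
        System.out.println("count=" + s.getCount());
        System.out.println("min=" + s.getMin());
        System.out.println("max=" + s.getMax());
        System.out.println("avg=" + s.getAverage());
        System.out.println("sum=" + s.getSum());
    }
}
```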

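Extended Stats: returns all metrics of the aggregation, i.e. the five above plus sum of squares, variance, standard deviation, and the standard deviation bounds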

{
    "aggs" : {
    "grades_stats" : { "extended_stats" : { "field" : "grade" } }
    }
}

Result:

{
    "aggregations": {
    "grades_stats": {
        "count": 9,
        "min": 72,
        "max": 99,
        "avg": 86,
        "sum": 774,
        # sum of squares
        "sum_of_squares": 67028,
        # variance
        "variance": 51.55555555555556,
        # standard deviation
        "std_deviation": 7.180219742846005,
        # interval of the mean plus/minus two standard deviations, useful for visualizing the spread of the data
        "std_deviation_bounds": {
        "upper": 100.36043948569201,
        "lower": 71.63956051430799
        }
    }
    }
}
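
The extra metrics follow from count, sum, and sum_of_squares alone. As a sketch under that reading, the snippet below recomputes variance (the population variance, which is what the numbers in the result above correspond to), the standard deviation, and the two-sigma bounds. The class and method names are hypothetical.

```java
// Hypothetical illustration of how the extended metrics relate to each other.
class ExtendedStatsSketch {
    // Population variance: E[x^2] - (E[x])^2.
    static double variance(long count, double sum, double sumOfSquares) {
        double avg = sum / count;
        return sumOfSquares / count - avg * avg;
    }

    static double stdDeviation(long count, double sum, double sumOfSquares) {
        return Math.sqrt(variance(count, sum, sumOfSquares));
    }

    // Returns {lower, upper} = avg -/+ 2 standard deviations.
    static double[] bounds(long count, double sum, double sumOfSquares) {
        double avg = sum / count;
        double twoSigma = 2 * stdDeviation(count, sum, sumOfSquares);
        return new double[] { avg - twoSigma, avg + twoSigma };
    }

    public static void main(String[] args) {
        // count, sum and sum_of_squares are taken from the result above.
        System.out.println(variance(9, 774, 67028));
        System.out.println(stdDeviation(9, 774, 67028));
        double[] b = bounds(9, 774, 67028);
        System.out.println(b[0] + " .. " + b[1]);
    }
}
```

Floating-point rounding may make the last digits differ slightly from what Elasticsearch prints.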

Supplement: some commonly used aggregation APIs (reference: https://blog.csdn.net/earthhour/article/details/79602809). The type names below match the 2.x Java API; 5.x renamed them, for example TermsBuilder became TermsAggregationBuilder.

(1) Count the values of a field
ValueCountBuilder vcb = AggregationBuilders.count("count_uid").field("uid");
(2) Count the distinct values of a field (approximate, with a small error)
CardinalityBuilder cb = AggregationBuilders.cardinality("distinct_count_uid").field("uid");
(3) Filter aggregation
FilterAggregationBuilder fab = AggregationBuilders.filter("uid_filter").filter(QueryBuilders.queryStringQuery("uid:001"));
(4) Group by a field
TermsBuilder tb = AggregationBuilders.terms("group_name").field("name");
(5) Sum
SumBuilder sumBuilder = AggregationBuilders.sum("sum_price").field("price");
(6) Average
AvgBuilder ab = AggregationBuilders.avg("avg_price").field("price");
(7) Maximum
MaxBuilder mb = AggregationBuilders.max("max_price").field("price");
(8) Minimum
MinBuilder min = AggregationBuilders.min("min_price").field("price");
(9) Group by date interval
DateHistogramBuilder dhb = AggregationBuilders.dateHistogram("dh").field("date");
(10) Get the top hits inside an aggregation
TopHitsBuilder thb = AggregationBuilders.topHits("top_result");
(11) Nested aggregation
NestedBuilder nb = AggregationBuilders.nested("nested_path").path("quests");
(12) Reverse nested aggregation
AggregationBuilders.reverseNested("res_nested").path("kps");

4. Daily document volume of print media (平媒) over recent days plus the number of distinct sources per day (distinct count, sorted):

GET xiao-2018-4-1/News/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "term" : {
                               "mediaTname": "平媒"
                            }
                        },
                        {
                            "range": {
                                "pubDay": {
                                    "gt": "2018-08-31",
                                    "lt": "2018-09-09"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : {
            "terms" : {
                "field" : "pubDay",
                "order" : { "distinct_mediaNameZh" : "desc" }
            },
            "aggs" : {
                "distinct_mediaNameZh" : {
                    "cardinality" : {
                       "field" : "mediaNameZh"
                    }
                }
            }
        }
    }
}

Result:

{
  "took": 1067,
  "timed_out": false,
  "_shards": {
    "total": 350,
    "successful": 350,
    "failed": 0
  },
  "hits": {
    "total": 98312,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_interests": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1536278400000,
          "key_as_string": "2018-09-07",
          "doc_count": 20946,
          "distinct_mediaNameZh": {
            "value": 389
          }
        },
        {
          "key": 1535932800000,
          "key_as_string": "2018-09-03",
          "doc_count": 14651,
          "distinct_mediaNameZh": {
            "value": 383
          }
        },
        {
          "key": 1536019200000,
          "key_as_string": "2018-09-04",
          "doc_count": 18325,
          "distinct_mediaNameZh": {
            "value": 381
          }
        },
        {
          "key": 1536192000000,
          "key_as_string": "2018-09-06",
          "doc_count": 20659,
          "distinct_mediaNameZh": {
            "value": 378
          }
        },
        {
          "key": 1536105600000,
          "key_as_string": "2018-09-05",
          "doc_count": 12752,
          "distinct_mediaNameZh": {
            "value": 321
          }
        },
        {
          "key": 1536364800000,
          "key_as_string": "2018-09-08",
          "doc_count": 8071,
          "distinct_mediaNameZh": {
            "value": 246
          }
        },
        {
          "key": 1535760000000,
          "key_as_string": "2018-09-01",
          "doc_count": 1706,
          "distinct_mediaNameZh": {
            "value": 147
          }
        },
        {
          "key": 1535846400000,
          "key_as_string": "2018-09-02",
          "doc_count": 1202,
          "distinct_mediaNameZh": {
            "value": 112
          }
        }
      ]
    }
  }
}

Notes:
1. Order by the number of matching documents

"order" : {  "_count" : "desc" }

api:

.order(Terms.Order.count(false));

2. Order by the aggregated field itself (sorts the result by the pubDay field, whose values look like "2018-08-24")

"order" : {  "_term" : "desc" }

api:

AggregationBuilder aggregationBuilder = AggregationBuilders.terms("timeinterval")
					.script(new Script("String he=new SimpleDateFormat('HH').format(new Date(doc['timeHour'].value)); if(he.equals('01')){return he;}else{return null;}"))
					.size(24).order(Terms.Order.term(false));

Note: (1) false means desc, true means asc (2) .script can also be replaced with .field("timeHour")
3. Order by the result of a sub-aggregation

"order" : { "distinct_mediaNameZh" : "desc" }

api:

.order(Terms.Order.aggregation("distinct_mediaNameZh", false));

5. Aggregate media names for documents whose sourceType is FORUM:
(and fetch the url of one article for each media name)

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : { 
            "terms" : {
                "size" : 10000,   // the statement itself is fine, but at this size the cluster cannot cope (the nested aggregation makes the amount of data to process blow up) and the request keeps timing out
                "field" : "website.keyword" 
            }, 
            "aggs" : {
                "per_count" : {    // this name is arbitrary
                    "terms" : {
                       "size" : 1,
                       "field" : "url"
                    }
                }
            }
        }
    }
}

Working around the performance problem above (a different approach):
See the official documentation: Top Hits Aggregation

GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "all_interests" : { 
            "terms" : {
                "size" : 10000,
                "field" : "website.keyword" 
            }, 
            "aggs": {
                "top_age": {
                    "top_hits": {
                        "_source": {
                            "includes": [
                                "url"
                             ]
                        },
                        "size": 1,
                        "from" : 0
                    }
                }
            }
        }
    }
}

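Options under top_hits:
size: the number of hits to show per bucket
from: the offset of the first hit to fetch. Note that this is an offset, not a page number!
sort: how the hits in each bucket are sorted
_source.includes: which fields to show for each hit
Note: I later tried to do this with field collapsing (see https://elasticsearch.cn/article/132), but it could not cover everything. It satisfies "fetch the url of one article for each media name", but not "aggregate media names for documents whose sourceType is FORUM", so the number of distinct media names after aggregation is not available. Field collapsing seems to fit only specific scenarios.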
 

Global bucket:
GET xiao-2018-4-1/Socials/_search
{
    "size" : 0,
    "query" : {
        "constant_score" : {
            "filter" : {
                "bool" : {
                    "must" : {
                        "term" : {
                            "sourceType" : "FORUM"
                        }
                    }
                }
            }
        }
    },
    "aggs" : {
        "per_count": {
            "terms" : { "field" : "website.keyword" } 
        },
        "all": {
            "global" : {}, 
            "aggs" : {
                "per_count": {
                    "terms" : { "field" : "website.keyword" } 
                }
            }
        }
    }
}

See: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_scoping_aggregations.html
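
In the request above, the top-level per_count aggregation only sees documents matched by the query, while the per_count nested under the global bucket ignores the query and sees every document. A small in-memory sketch of that scoping difference follows; the documents and names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of query-scoped vs global aggregation scope.
class GlobalBucketSketch {
    // Each document is {sourceType, website}. A null filter means "global scope".
    static Map<String, Integer> countByWebsite(List<String[]> docs, String sourceTypeFilter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : docs) {
            if (sourceTypeFilter == null || sourceTypeFilter.equals(doc[0])) {
                counts.merge(doc[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = Arrays.asList(
                new String[] { "FORUM", "a.com" },
                new String[] { "FORUM", "a.com" },
                new String[] { "NEWS", "a.com" },
                new String[] { "NEWS", "b.com" });
        // Scoped to the query, like the top-level "per_count".
        System.out.println(countByWebsite(docs, "FORUM"));
        // Ignores the query, like "all" > "per_count" under "global".
        System.out.println(countByWebsite(docs, null));
    }
}
```

Comparing the two maps side by side is the typical use of a global bucket: how a filtered subset relates to the whole index.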
 

Combining queries:
{
    "bool": {
        "must": { "match":   { "email": "business opportunity" }},
        "should": [
            { "match":       { "starred": true }},
            { "bool": {
                "must":      { "match": { "folder": "inbox" }},
                "must_not":  { "match": { "spam": true }}
            }}
        ],
        "minimum_should_match": 1
    }
}

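Note: the logic of the query above is fairly involved and worth thinking through (it finds starred emails whose body contains business opportunity, or non-spam emails in the inbox whose body contains business opportunity). The example comes from the official guide: https://www.elastic.co/guide/cn/elasticsearch/guide/current/query-dsl-intro.html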
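
One way to check your reading of that bool query is to write it as a plain boolean expression. The sketch below does that, treating each match clause as an already-evaluated boolean. It only models whether a document matches, not how it is scored, and the names are hypothetical.

```java
// Hypothetical boolean model of the combined bool query above.
class BoolQuerySketch {
    static boolean matches(boolean emailMatches, boolean starred, boolean inInbox, boolean spam) {
        // must: the email clause has to match.
        boolean must = emailMatches;
        // should clause 1: starred. should clause 2: nested bool (inbox and not spam).
        int shouldHits = (starred ? 1 : 0) + ((inInbox && !spam) ? 1 : 0);
        // minimum_should_match: 1 requires at least one should clause to match.
        return must && shouldHits >= 1;
    }

    public static void main(String[] args) {
        System.out.println(matches(true, true, false, true));   // starred, even though spam
        System.out.println(matches(true, false, true, false));  // inbox and not spam
        System.out.println(matches(true, false, true, true));   // inbox but spam, not starred
        System.out.println(matches(false, true, true, false));  // email clause fails
    }
}
```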
 

Returning specific fields:

1. store: return the stored codeName and view values for documents that have a newType field

GET mei_toutiao/_search
{
    "stored_fields" : ["codeName", "view"],
    "query":{
          "exists": {
                "field": "newType"
           }
    }
}

Corresponding Java API:

SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
				.setRouting(routing).storedFields(new String[] {"titleZh", "uuid"});

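Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html
Prerequisite: the store parameter of the corresponding fields is true in the mapping.
(See https://blog.csdn.net/napoay/article/details/73100110?locationNum=9&fps=1#323-store) By default, field values are indexed so that they can be searched, but they are not stored. That is usually fine, because the _source field keeps a copy of the original document. In some cases the store parameter is useful: for example, if a document has title, date, and a very large content field and you only want title and date, you can do this: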

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

GET my_index/_search
{
  "stored_fields": [ "title", "date" ] 
}

Query result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "fields": {
          "date": [
            "2015-01-01T00:00:00.000Z"
          ],
          "title": [
            "Some short title"
          ]
        }
      }
    ]
  }
}

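Stored fields are always returned as arrays; if you want the field in its original form, read it from _source instead.
Note: in Java code, read the field's values as a list; otherwise only the first element of the array is returned.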

JSONObject hitJson = JSONObject.fromObject(hit.getFields());
String[] fields = { "keywordsZh", "littleUrls" };
for (String field : fields) {
    if (hit.getFields().containsKey(field)) {
        if (field.equals("keywordsZh")) {
            // read "values" to get the whole array
            @SuppressWarnings("unchecked")
            List<Object> keywordsZh = (List<Object>) hitJson.getJSONObject(field).get("values");
            json.put(field, keywordsZh);
            // json.put(field, hitJson.getJSONObject(field).get("value")); // returns only the first value of the array
        }
    }
}

2. Return a single specified field:

GET mei_toutiao/_search
{
    "_source": "newType",
    "query":{
          "term": {
                "uuid": "b6a0d42731c94db1a75383c192b5544a"
           }
    }
}

Or:

GET mei_toutiao/_search
{
    "_source": {
        "includes": "newType"
    },
    "query":{
        "term": {
            "uuid": "b6a0d42731c94db1a75383c192b5544a"
        }
    }
}

3. Return only the newType and keywordsZh fields:

GET mei_toutiao/_search
{
    "_source": [ "newType", "keywordsZh" ]
}

Or:

GET mei_toutiao/_search
{
    "_source": {
        "includes": [ "newType", "keywordsZh" ]
    }
}

4. Return the fields whose names start with t:

GET mei_toutiao/_search
{
    "_source": "t*"
}

5. Return every field except newType and keywordsZh:

GET mei_toutiao/_search
{
    "_source": {
        "excludes": [ "newType", "keywordsZh" ]
    }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-source-filtering.html

SearchRequestBuilder request = getTransportClient().prepareSearch(esProperties.getES_Index()).setTypes(type)
				.setRouting(routing).setFetchSource(new String[] {"titleZh", "uuid"} , null);

Note: if both includes and excludes are present, a field is returned only when it matches includes and does not match excludes.
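
The interaction of includes, excludes, and wildcards such as t* can be sketched on a flat list of field names. This is a simplification: real source filtering also handles nested object paths, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified model of _source includes/excludes on flat field names.
class SourceFilterSketch {
    // Translates a simple wildcard pattern ("t*") into a regex; only '*' is special.
    static Pattern wildcard(String pattern) {
        StringBuilder regex = new StringBuilder();
        String[] literals = pattern.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matchesAny(String field, List<String> patterns) {
        for (String p : patterns) {
            if (wildcard(p).matcher(field).matches()) {
                return true;
            }
        }
        return false;
    }

    // Keeps a field if it matches includes (or includes is empty) and matches no exclude.
    static List<String> filter(List<String> fields, List<String> includes, List<String> excludes) {
        List<String> kept = new ArrayList<>();
        for (String f : fields) {
            boolean included = includes.isEmpty() || matchesAny(f, includes);
            if (included && !matchesAny(f, excludes)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("title", "tags", "newType", "keywordsZh");
        System.out.println(filter(fields, Arrays.asList("t*"), Arrays.asList("tags")));
        System.out.println(filter(fields, Collections.<String>emptyList(),
                Arrays.asList("newType", "keywordsZh")));
    }
}
```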
 
