在之前的文章“Elasticsearch:Elasticsearch SQL介绍及实例”里,我们简要介绍了新的Elasticsearch SQL功能以及_translate API。 这篇特定的文章通过探索更复杂的功能来继续该系列。如果你还没准备好自己的数据,请先阅读我前面指出来的文章。
Elasticsearch的聚合框架(能够汇总数十亿个数据点)代表了堆栈中最强大和最受欢迎的功能之一。 从功能的角度来看,它与SQL中的GROUP BY运算符具有自然的等效性。 除了提供一些GROUP BY功能的示例外,我们还将再次使用translation API来显示等效的聚合。
“找到飞往伦敦的每个来源目的地国家的平均飞行时间。 按照国家的字母顺序排列。”
sql> SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY OriginCountry LIMIT 5;
Avg_Flight_Time | OriginCountry
------------------+---------------
9.342180244924574 |AE
13.49582274385201 |AR
4.704097126921018 |AT
15.081367354940724|AU
7.998943401875511 |CA
检查此查询的DSL将显示“composite aggregation”的使用。
GET flights/_search
{
"size": 0,
"_source": false,
"stored_fields": "_none_",
"aggs": {
"groupby": {
"composite": {
"size": 1000,
"sources": [
{
"3471": {
"terms": {
"field": "OriginCountry.keyword",
"order": "asc"
}
}
}
]
},
"aggs": {
"3485": {
"avg": {
"field": "FlightTimeHour"
}
}
}
}
}
}
这里使用的是composite aggregation。它可以帮我实现在aggregration里的scroll功能。如果大家对这个不是很明白的话,请参阅我的另外一篇文章“在Elasticsearch中的Composite Aggregation”。上面查询的结果返回的是:
{
"took" : 21,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"groupby" : {
"after_key" : {
"3471" : "ZA"
},
"buckets" : [
{
"key" : {
"3471" : "AE"
},
"doc_count" : 385,
"3485" : {
"value" : 9.342180244924574
}
},
{
"key" : {
"3471" : "AR"
},
"doc_count" : 258,
"3485" : {
"value" : 13.49582274385201
}
},
{
"key" : {
"3471" : "AT"
},
"doc_count" : 120,
"3485" : {
"value" : 4.704097126921018
}
},
{
"key" : {
"3471" : "AU"
},
"doc_count" : 518,
"3485" : {
"value" : 15.081367354940724
}
},
...
我们还可以使用函数对select中定义的别名字段进行分组。
“查找每月航班的数量和平均飞行时间。”
POST /_sql?format=txt
{
"query":"SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, AVG(FlightTimeHour) AS Avg_Flight_Time FROM flights GROUP BY month_of_year"
}
上面的查询结果是:
COUNT(*) | month_of_year | Avg_Flight_Time
---------------+---------------+-----------------
5687 |4 |8.578573065474027
7372 |5 |8.472684454688286
Composite aggregation的使用具有一个主要优点-可以确保GROUP BY实现甚至可扩展用于高基数字段,并提供一种机制来流传输特定聚合的所有存储桶,类似于滚动对文档所做的操作。 这也确保了实现不会像使用术语聚合那样遭受相同的内存限制。 我们可以通过如下命令来翻译相对应的composite aggregation:
POST /_sql/translate
{
"query":"SELECT AVG(FlightTimeHour) Avg_Flight_Time, OriginCountry FROM flights GROUP BY OriginCountry ORDER BY Avg_Flight_Time"
}
相应的翻译的结果是:
{
"size" : 0,
"_source" : false,
"stored_fields" : "_none_",
"aggregations" : {
"groupby" : {
"composite" : {
"size" : 1000,
"sources" : [
{
"bee1e422" : {
"terms" : {
"field" : "OriginCountry.keyword",
"missing_bucket" : true,
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"803ccc93" : {
"avg" : {
"field" : "FlightTimeHour"
}
}
}
}
}
}
为了过滤组,我们可以利用HAVING运算符,该运算符也可以利用SELECT子句中指定的别名。 这对于某些SQL专家可能是不寻常的,因为在基于RDBMS的实现中通常是不可能的,因为SELECT是在HAVING之后执行的。 在这里,HAVING子句使用的是在执行阶段声明的别名。 但是,我们的分析器足够聪明,可以向前看,并选择要在HAVING中使用的声明。
“找到每个出发城市的航班数量,平均飞行距离和第95个百分位,平均距离在3000到4000英里之间。”
sql> SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance BETWEEN 3000 AND 4000;
OriginCityName | avg_distance | c |percentile_distance
---------------+---------------+---------------+-------------------
Verona |3078.0 |120 |7927.0
Vienna |3596.0 |120 |7436.0
Xi'an |3842.0 |114 |7964.0
为了实现HAVING功能,SQL Elasticsearch利用Bucket Selector管道聚合,使用参数化的painless 脚本过滤值。 请注意下面的内容,将自动为聚合选择OriginCityName字段的关键字变体,而不是尝试使用标准文本变体,这可能由于未启用字段数据而失败。 avg和percentile指标聚合提供与SQL变体等效的功能。
POST /_sql/translate
{
"query": """
SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM flights GROUP BY OriginCityName HAVING avg_distance BETWEEN 3000 AND 4000
"""
}
上面翻译的结果是:
{
"size" : 0,
"_source" : false,
"stored_fields" : "_none_",
"aggregations" : {
"groupby" : {
"composite" : {
"size" : 1000,
"sources" : [
{
"ff6ca116" : {
"terms" : {
"field" : "OriginCityName.keyword",
"missing_bucket" : true,
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"b54e054" : {
"avg" : {
"field" : "DistanceKilometers"
}
},
"7171c519" : {
"percentiles" : {
"field" : "DistanceKilometers",
"percents" : [
95.0
],
"keyed" : true,
"tdigest" : {
"compression" : 100.0
}
}
},
"having.8bcff206" : {
"bucket_selector" : {
"buckets_path" : {
"a0" : "b54e054",
"a1" : "b54e054"
},
"script" : {
"source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.and(InternalSqlScriptUtils.gte(InternalSqlScriptUtils.round(params.a0,params.v0), params.v1), InternalSqlScriptUtils.lte(InternalSqlScriptUtils.round(params.a1,params.v2), params.v3)))",
"lang" : "painless",
"params" : {
"v0" : null,
"v1" : 3000,
"v2" : null,
"v3" : 4000
}
},
"gap_policy" : "skip"
}
}
}
}
}
}
与传统的RDBMS相比,Elasticsearch作为搜索引擎的独特功能之一是它能够通过使用相关性计算来考虑文本数据的属性,从而对匹配进行评分,而不仅仅是简单的“是/否”。扩展SQL语法使我们可以公开此功能,并且超越了传统RDBMS可能提供的功能。
因此,我们引入了两个新的运算符:QUERY和MATCH。对于熟悉Elasticsearch的人员,这些等效于基础的multi_match和query_string运算符。 Kibana的用户将熟悉query_string运算符的行为,因为它用于为默认搜索栏提供动力。它提供了智能的解析功能,并允许自然的语言风格的查询。这两个运算符的详细信息不在本博客的讨论范围之内,但是权威的指南条目对这些概念进行了很好的介绍。
例如,请考虑以下内容:
“查找按日期排序的2018-06-06至2018-06-17之间所有往返Kastrup机场的延迟航班。”
Edmonton一座服务于加拿大阿尔伯塔省埃德蒙顿市及周边地区的国际机场,全称是“Edmonton International Airport”。 使用QUERY运算符,我们只需搜索Edmonton。
sql> SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp;
timestamp | FlightNum |OriginCityName | DestCityName
------------------------+---------------+---------------+---------------
2020-04-14T22:19:48.000Z|1C0ZWE9 |Cologne |Edmonton
2020-04-16T04:55:07.000Z|48DVRFT |Edmonton |Torino
2020-04-16T19:17:14.000Z|14KTFQB |Edmonton |Oslo
2020-04-19T06:25:17.000Z|EN9FHUD |Detroit |Edmonton
2020-04-21T20:35:16.000Z|H5Y0MJK |Edmonton |Palermo
2020-04-23T02:03:18.000Z|KCNMKVI |Edmonton |Erie
2020-04-23T09:34:02.000Z|XH9H5H3 |Paris |Edmonton
2020-04-25T04:22:28.000Z|GJTJ47T |Edmonton |Bangalore
2020-04-26T13:23:09.000Z|PPZN0Y7 |Edmonton |Indianapolis
2020-04-27T00:20:57.000Z|IKFEGFL |Edmonton |Warsaw
2020-04-27T22:11:51.000Z|300JHDQ |Green Bay |Edmonton
2020-04-30T15:02:33.000Z|PK1ETRA |Rome |Edmonton
2020-05-01T17:52:50.000Z|A2NRDPQ |Edmonton |Manchester
2020-05-01T22:19:38.000Z|S9AY152 |Edmonton |Buenos Aires
2020-05-03T15:52:05.000Z|PJXXO9P |Edmonton |Buenos Aires
2020-05-05T09:00:47.000Z|QTPABGR |Edmonton |Jeju City
2020-05-05T18:49:49.000Z|YVEUZNO |Edmonton |Ottawa
2020-05-06T12:46:16.000Z|TCPDEBY |Edmonton |Bergamo
2020-05-07T00:00:00.000Z|SW1HB5M |Abu Dhabi |Edmonton
2020-05-07T12:47:25.000Z|0HZ3PHM |Cape Town |Edmonton
2020-05-08T15:26:39.000Z|T5YFSWW |Paris |Edmonton
2020-05-08T16:35:16.000Z|E92FNK2 |Edmonton |Vienna
2020-05-09T02:34:40.000Z|PB8BSSH |Edmonton |Tokyo
2020-05-10T14:06:58.000Z|ADWMNQL |Edmonton |Zurich
2020-05-11T15:21:31.000Z|YB4FNOI |Edmonton |Vienna
2020-05-12T22:16:10.000Z|TCE99LO |Copenhagen |Edmonton
2020-05-14T00:19:45.000Z|RBJT1ZG |Edmonton |Palermo
2020-05-15T12:35:39.000Z|M1NHZTB |Edmonton |Guangzhou
2020-05-17T15:23:49.000Z|WC862JS |Dublin |Edmonton
2020-05-18T19:39:08.000Z|99R1VXK |Edmonton |Naples
2020-05-21T05:30:11.000Z|PJP5R9L |Edmonton |Portland
2020-05-21T07:59:04.000Z|PK7R8IF |Edmonton |Winnipeg
2020-05-22T00:00:00.000Z|RLMOSMO |Edmonton |Rome
2020-05-22T17:10:22.000Z|K0SUJFG |Tokoname |Edmonton
2020-05-22T19:06:34.000Z|ECEIAND |Edmonton |Treviso
2020-05-23T01:20:52.000Z|VG2K3M9 |Amsterdam |Edmonton
2020-05-23T22:34:45.000Z|8FXIRFY |Edmonton |Miami
注意,这里没有要求指定该字段。 只需使用QUERY运算符搜索“Edmonton”就足够了。 此外,请注意,我们往返卡斯特鲁普的航班都延迟了。 Elasticsearch查询在这里:
POST /_sql/translate
{
"query": """
SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM flights WHERE QUERY('Edmonton') AND FlightDelay=true AND timestamp > '2018-06-20' AND timestamp < '2020-06-27' ORDER BY timestamp
"""
}
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [
{
"bool" : {
"must" : [
{
"query_string" : {
"query" : "Edmonton",
"fields" : [ ],
"type" : "best_fields",
"default_operator" : "or",
"max_determinized_states" : 10000,
"enable_position_increments" : true,
"fuzziness" : "AUTO",
"fuzzy_prefix_length" : 0,
"fuzzy_max_expansions" : 50,
"phrase_slop" : 0,
"escape" : false,
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
{
"term" : {
"FlightDelay" : {
"value" : true,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
{
"range" : {
"timestamp" : {
"from" : "2018-06-20",
"to" : "2020-06-27",
"include_lower" : false,
"include_upper" : false,
"boost" : 1.0
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"FlightNum",
"OriginCityName",
"DestCityName"
],
"excludes" : [ ]
},
"docvalue_fields" : [
{
"field" : "timestamp",
"format" : "epoch_millis"
}
],
"sort" : [
{
"timestamp" : {
"order" : "asc",
"missing" : "_last",
"unmapped_type" : "date"
}
}
]
}
对于Elasticsearch的新用户来说,这代表了一个相对复杂的查询。 我们有一个带有嵌套范围,术语限制和查询字符串运算符的布尔查询。 对于从SQL迁移应用程序的用户而言,这在传统上可能是一项相当艰巨的任务,甚至在担心最终查询在功能上是否正确和最佳之前也是如此。 实际的query_string运算符已嵌套在过滤器中,因为不需要相关性(我们按日期排序),从而使我们能够利用过滤器缓存,跳过评分并缩短响应时间。
这些运算符的参数也在SQL中公开。 最后一个示例说明了如何将MATCH查询与跨多个字段的多个搜索词一起使用以限制结果。
“找到往返巴塞罗那的天气晴朗的航班”
出于示例目的,我们还通过Score() 函数进行排序并显示相关性得分。
sql> SELECT Score(), timestamp, FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM flights WHERE MATCH('*Weather,*City*', 'Lightning Barcelona', 'type=cross_fields;operator=AND') ORDER BY Score() DESC LIMIT 5;
Score() | timestamp | FlightNum |OriginCityName | DestCityName | DestWeather | OriginWeather
---------------+------------------------+---------------+---------------+---------------+---------------+-------------------
6.917009 |2020-04-16T06:00:41.000Z|L637ISB |Barcelona |Santiago |Rain |Thunder & Lightning
6.917009 |2020-04-16T01:58:51.000Z|ZTOD7RQ |Barcelona |Dubai |Sunny |Thunder & Lightning
6.917009 |2020-04-22T14:02:34.000Z|QSQA5CT |Barcelona |Naples |Rain |Thunder & Lightning
6.917009 |2020-04-29T12:23:44.000Z|0GIHB62 |Barcelona |Buenos Aires |Clear |Thunder & Lightning
6.917009 |2020-04-30T07:42:21.000Z|L09W9TV |Barcelona |Dubai |Cloudy |Thunder & Lightning
我们使用通配符模式来指定要匹配的字段,并要求匹配为布尔AND。 跨字段参数不需要术语全部出现在一个字段中,而是允许它们出现在不同的字段中,前提是两个字段都存在。 给定数据的结构,这对于匹配至关重要。
这里的示例返回列和组。 但是,QUERY和MATCH运算符也可以与GROUP BY一起使用-有效地过滤到Elasticsearch的聚合。
到目前为止,我们的查询仅针对单个表/索引。 如果我们复制flights索引,并通过reindex请求将文档复制到新的命名版本,则只要两个索引具有相同的映射,就可以同时查询这两个索引。 映射中的任何差异都可能导致查询在分析时出错。 为了一起查询多个索引,用户可以将它们添加到Elasticsearch别名中,也可以在WHERE子句中使用通配符。如果大家还记得的话,在上一篇文章“Elasticsearch:Elasticsearch SQL介绍及实例”中,我们已经把之前的索引“kibana_sample_data_flights”通过reindex的方法导入到flight1索引中。现在我们也可以通过如下的方法复制这个索引到索引flight2中。
POST _reindex
{
"source": {
"index": "flight1"
},
"dest": {
"index": "flight2"
}
}
我们可以通过如下的方法来设置flight1和flight2的别名为f_alias:
POST /_aliases
{
"actions": [
{
"add": {
"index": "flight1",
"alias": "f_alias"
}
},
{
"add": {
"index": "flight2",
"alias": "f_alias"
}
}
]
}
那么我们可以通过如下的方法来查询:
sql> SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM f_alias ORDER BY timestamp DESC LIMIT 2;
FlightNum |OriginCityName | DestCityName | DestWeather | OriginWeather
---------------+---------------+---------------+---------------+---------------
GDZWNB0 |London |Shanghai |Rain |Clear
GDZWNB0 |London |Shanghai |Rain |Clear
传统RDBMS SQL实现中的JOIN允许通过单独的表格响应中的相关列来合并不同的表格。 与Elasticsearch本地可用的选项相比,这允许数据的关系建模,并且代表了一个重要的主题。 尽管Elasticsearch SQL当前不支持JOIN运算符,但它确实允许用户利用嵌套文档,该文档提供了一对多的简单关系建模。 嵌套文档的查询对用户是透明的。 为了演示此功能,我们需要一个包含此类数据的索引。 该索引的文档代表电子商务网站的订单,并包含诸如order_date,billing_city和customer_last_name之类的字段。 此外,“产品”字段包含订单中每个产品的嵌套子文档。 为了加载这个文档,我们安装之前文章“Elasticsearch:Elasticsearch SQL介绍及实例”中介绍的那样,只不过这次我们加载的是eCommerce的数据:
一旦数据加载完毕,我们可以在Kibana中找到一个叫做kibana_sample_data_ecommerce的索引。它的文档的一个例子:
{
"category" : [
"Men's Clothing"
],
"currency" : "EUR",
"customer_first_name" : "Eddie",
"customer_full_name" : "Eddie Underwood",
"customer_gender" : "MALE",
"customer_id" : 38,
"customer_last_name" : "Underwood",
"customer_phone" : "",
"day_of_week" : "Monday",
"day_of_week_i" : 0,
"email" : "[email protected]",
"manufacturer" : [
"Elitelligence",
"Oceanavigations"
],
"order_date" : "2020-05-04T09:28:48+00:00",
"order_id" : 584677,
"products" : [
{
"base_price" : 11.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Elitelligence",
"tax_amount" : 0,
"product_id" : 6283,
"category" : "Men's Clothing",
"sku" : "ZO0549605496",
"taxless_price" : 11.99,
"unit_discount_amount" : 0,
"min_price" : 6.35,
"_id" : "sold_product_584677_6283",
"discount_amount" : 0,
"created_on" : "2016-12-26T09:28:48+00:00",
"product_name" : "Basic T-shirt - dark blue/white",
"price" : 11.99,
"taxful_price" : 11.99,
"base_unit_price" : 11.99
},
{
"base_price" : 24.99,
"discount_percentage" : 0,
"quantity" : 1,
"manufacturer" : "Oceanavigations",
"tax_amount" : 0,
"product_id" : 19400,
"category" : "Men's Clothing",
"sku" : "ZO0299602996",
"taxless_price" : 24.99,
"unit_discount_amount" : 0,
"min_price" : 11.75,
"_id" : "sold_product_584677_19400",
"discount_amount" : 0,
"created_on" : "2016-12-26T09:28:48+00:00",
"product_name" : "Sweatshirt - grey multicolor",
"price" : 24.99,
"taxful_price" : 24.99,
"base_unit_price" : 24.99
}
],
"sku" : [
"ZO0549605496",
"ZO0299602996"
],
"taxful_total_price" : 36.98,
"taxless_total_price" : 36.98,
"total_quantity" : 2,
"total_unique_products" : 2,
"type" : "order",
"user" : "eddie",
"geoip" : {
"country_iso_code" : "EG",
"location" : {
"lon" : 31.3,
"lat" : 30.1
},
"region_name" : "Cairo Governorate",
"continent_name" : "Africa",
"city_name" : "Cairo"
}
}
通常,查询这些文档将要求用户理解为什么我们要对产品字段使用嵌套的数据类型,并且还要了解嵌套的查询语法。 但是,通过Elasticsearch SQL,我们能够查询这些嵌套文档,就好像每个嵌套文档都使用其父级字段代表一个单独的行一样(即,我们有效地扁平化了表示结构)。 考虑上面有两个产品的订单。 当从产品子文档中请求字段时,查询时将其显示为两行。 如果需要,每一行还可以包含父订单的字段。 例如:
“查找航班584677所使用的帐单名称和购买的产品。”
如果我们查看一下kibana_sample_data_ecommerce,我们发现这个索引的products字段并不是我们想象的nested类型。为此,我们需要重新定义它的mapping:
PUT orders
{
"mappings": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"currency": {
"type": "keyword"
},
"customer_birth_date": {
"type": "date"
},
"customer_first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"customer_full_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"customer_gender": {
"type": "keyword"
},
"customer_id": {
"type": "keyword"
},
"customer_last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"customer_phone": {
"type": "keyword"
},
"day_of_week": {
"type": "keyword"
},
"day_of_week_i": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"geoip": {
"properties": {
"city_name": {
"type": "keyword"
},
"continent_name": {
"type": "keyword"
},
"country_iso_code": {
"type": "keyword"
},
"location": {
"type": "geo_point"
},
"region_name": {
"type": "keyword"
}
}
},
"manufacturer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"order_date": {
"type": "date"
},
"order_id": {
"type": "keyword"
},
"products": {
"type": "nested",
"properties": {
"_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"base_price": {
"type": "half_float"
},
"base_unit_price": {
"type": "half_float"
},
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"created_on": {
"type": "date"
},
"discount_amount": {
"type": "half_float"
},
"discount_percentage": {
"type": "half_float"
},
"manufacturer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"min_price": {
"type": "half_float"
},
"price": {
"type": "half_float"
},
"product_id": {
"type": "long"
},
"product_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "english"
},
"quantity": {
"type": "integer"
},
"sku": {
"type": "keyword"
},
"tax_amount": {
"type": "half_float"
},
"taxful_price": {
"type": "half_float"
},
"taxless_price": {
"type": "half_float"
},
"unit_discount_amount": {
"type": "half_float"
}
}
},
"sku": {
"type": "keyword"
},
"taxful_total_price": {
"type": "half_float"
},
"taxless_total_price": {
"type": "half_float"
},
"total_quantity": {
"type": "integer"
},
"total_unique_products": {
"type": "integer"
},
"type": {
"type": "keyword"
},
"user": {
"type": "keyword"
}
}
}
}
在上面,我们对原有的mapping做了如下的修改:
"products": {
"type": "nested",
"properties": {
"_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"base_price": {
"type": "half_float"
},
"base_unit_price": {
"type": "half_float"
},
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"created_on": {
"type": "date"
},
"discount_amount": {
"type": "half_float"
},
"discount_percentage": {
"type": "half_float"
},
"manufacturer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"min_price": {
"type": "half_float"
},
"price": {
"type": "half_float"
},
"product_id": {
"type": "long"
},
"product_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "english"
},
"quantity": {
"type": "integer"
},
"sku": {
"type": "keyword"
},
"tax_amount": {
"type": "half_float"
},
"taxful_price": {
"type": "half_float"
},
"taxless_price": {
"type": "half_float"
},
"unit_discount_amount": {
"type": "half_float"
}
}
}
在上面我加入了如下的一句:
"type": "nested",
这样我们把products这个字段设置为nested数据类型。如果大家对nested数据类型还是不太清楚的话,请参阅我之前的文字“Elasticsearch: nested对象”。我们使用如下命令来做reindex:
POST _reindex
{
"source": {
"index": "kibana_sample_data_ecommerce"
},
"dest": {
"index": "orders"
}
}
我们通过如下的方式来继续查询:
sql> SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677;
customer_last_name|customer_first_name| products.price |products.product_id
------------------+-------------------+------------------+-------------------
Underwood |Eddie |11.989999771118164|6283
Underwood |Eddie |24.989999771118164|19400
_translate API将显示如何使用嵌套查询构造此查询:
POST /_sql/translate
{
"query": """
SELECT customer_last_name, customer_first_name, products.price, products.product_id FROM orders WHERE order_id=584677
"""
}
上面的显示结果是:
{
"size" : 1000,
"query" : {
"bool" : {
"must" : [
{
"term" : {
"order_id" : {
"value" : 584677,
"boost" : 1.0
}
}
},
{
"nested" : {
"query" : {
"match_all" : {
"boost" : 1.0
}
},
"path" : "products",
"ignore_unmapped" : false,
"score_mode" : "none",
"boost" : 1.0,
"inner_hits" : {
"name" : "products_1",
"ignore_unmapped" : false,
"from" : 0,
"size" : 99,
"version" : false,
"seq_no_primary_term" : false,
"explain" : false,
"track_scores" : false,
"_source" : {
"includes" : [
"products.product_id",
"products.price"
],
"excludes" : [ ]
}
}
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"_source" : {
"includes" : [
"customer_last_name",
"customer_first_name"
],
"excludes" : [ ]
},
"sort" : [
{
"_doc" : {
"order" : "asc"
}
}
]
}
相反,如果仅查询父字段,则仅显示一行:
“查找航班用于订单584677的帐单名称”
sql> SELECT customer_last_name, customer_first_name FROM orders WHERE order_id=584677;
customer_last_name|customer_first_name
------------------+-------------------
Underwood |Eddie