Elasticsearch API

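A RESTful API built on HTTP that uses JSON as its data exchange format.

You talk to Elasticsearch through this RESTful API on port 9200. The examples here use the curl command.

A request to Elasticsearch is made up of the same parts as any ordinary HTTP request: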

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

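·         VERB the HTTP method: GET, POST, PUT, HEAD, or DELETE

·         PROTOCOL http or https (https is available only when an HTTPS proxy sits in front of Elasticsearch)

·         HOST the hostname of any node in the Elasticsearch cluster; for a node on the local machine, use localhost

·         PORT the port of the Elasticsearch HTTP service, 9200 by default

·         PATH the API path (for example, _count returns the number of documents in the cluster)

·         QUERY_STRING optional query-string parameters; for example, ?pretty makes the response return pretty-printed, easier-to-read JSON

·         BODY a JSON-encoded request body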
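
The parts listed above can also be put together programmatically. The following is a minimal Python sketch using only the standard library; the helper name build_curl and its default arguments are invented for illustration and are not part of any Elasticsearch client.

```python
import json
import shlex
from urllib.parse import urlencode


def build_curl(verb, path, query=None, body=None,
               protocol="http", host="localhost", port=9200):
    """Assemble a curl command line from the request parts (hypothetical helper)."""
    url = "%s://%s:%d/%s" % (protocol, host, port, path.lstrip("/"))
    if query:
        # QUERY_STRING: optional parameters such as pretty=true
        url += "?" + urlencode(query)
    parts = ["curl", "-X" + verb, shlex.quote(url)]
    if body is not None:
        # BODY: the request body serialized as JSON
        parts += ["-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


print(build_curl("GET", "_count", {"pretty": "true"},
                 {"query": {"match_all": {}}}))
```

Quoting the URL and the body with shlex.quote keeps the generated command safe to paste into a shell even when the JSON contains spaces or quotes.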

For example, to count the documents in the cluster:

curl -XGET 'http://localhost:9200/_count?pretty' -d '

{

    "query": {

        "match_all": {}

    }

}'

The same count is also available, in plain-text form, from the _cat API:

curl http://localhost:9200/_cat/count

 

Count one day of F5 access log entries:

[root@shnh-bak002 ~]# curl -XGET 'http://127.0.0.1:9200/f5-access-2016.03.05/_count?pretty' -d '

{

    "query": {

        "match_all": {}

    }

}'

Response:

{

  "count" : 13471904,

  "_shards" : {

    "total" : 5,

    "successful" : 5,

    "failed" : 0

  }

}
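
A script that consumes this response would normally check the _shards section before trusting the count. The sketch below parses the response shown above with the standard json module; the helper name complete_count is an assumption made for this example.

```python
import json

# Response text copied from the example above
RESPONSE = """
{
  "count" : 13471904,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}
"""


def complete_count(response_text):
    """Return the count, or None if not every shard answered (hypothetical helper)."""
    data = json.loads(response_text)
    shards = data["_shards"]
    if shards["failed"] > 0 or shards["successful"] < shards["total"]:
        return None
    return data["count"]


print(complete_count(RESPONSE))
```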

 

 

I. Data operations in Elasticsearch through the RESTful interface (Search APIs)

  http://localhost:9200/index_file_name/type_name/_search?q=field_name:Hello&pretty=true

  The request above searches the index index_file_name, within the type type_name, for documents whose field_name field contains the string Hello.

Simple queries:

1. Query a specific index and a specific type (one index and one type):
curl -XGET 'http://localhost:9200/f5-access-2016.03.05/f5-access/_search?q=domainname:www.pinganfang.com&pretty=true'

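Roughly equivalent to the following request body (the q= parameter is parsed as a query string, so results can differ from a term query on analyzed fields):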

curl -XGET 'http://localhost:9200/f5-access-2016.03.05/f5-access/_search?pretty=true' -d '{

 "query": {

   "term": {"domainname": "www.pinganfang.com"}

  }

}'

2. Query all types within a specific index (one index, no type):

curl -XGET 'http://localhost:9200/f5-access-2016.03.05/_search?q=domainname:www.pinganfang.com&pretty=true'

3. Query all indices (no index or type specified); this is comparatively slow:

curl -XGET 'http://localhost:9200/_search?q=domainname:www.pinganfang.com&pretty=true'

4. Query all types across several indices (several index names, no type):

curl -XGET 'http://localhost:9200/nginx-access-2016.03.05,f5-access-2016.03.05/_search?q=domainname:www.pinganfang.com&pretty=true'

5. Query several types across several indices (several index names and several types):

curl -XGET 'http://localhost:9200/nginx-access-2016.03.05,f5-access-2016.03.05/nginx-access,f5-access/_search?q=domainname:www.pinganfang.com&pretty=true'
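
The five URL shapes above differ only in how index and type names are joined into the path. A small Python sketch of that rule follows; the helper search_path is hypothetical, and it uses the _all placeholder for the case where a type is given without an index.

```python
def search_path(indices=(), types=()):
    """Build the path part of a _search request (hypothetical helper).

    Index and type names are joined with commas; an empty list means "all".
    """
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type cannot be given without an index; _all stands for every index
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["f5-access-2016.03.05"]))
print(search_path(["nginx-access-2016.03.05", "f5-access-2016.03.05"],
                  ["nginx-access", "f5-access"]))
```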

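6. Paginated queries: the request body accepts two extra properties.
  from: the offset of the first result to return (default 0)

  size: the maximum number of documents to return (default 10)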

curl -XGET 'http://localhost:9200/f5-access-2016.03.05/f5-access/_search?pretty=true' -d '{

 "from":10,

 "size":2,

 "query": {

   "term": {"domainname": "www.pinganfang.com"}

  }

}'
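
Callers usually think in page numbers rather than offsets, so from has to be derived from the page number and the page size. A minimal sketch of that arithmetic, assuming 1-based page numbers; the helper name page_body is invented for this example.

```python
import json


def page_body(query, page, page_size=10):
    """Wrap a query with from/size for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,               # number of hits per page
        "query": query,
    }


term = {"term": {"domainname": "www.pinganfang.com"}}
# Page 6 with 2 hits per page starts at offset 10, as in the curl example
print(json.dumps(page_body(term, page=6, page_size=2), indent=1))
```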

7. Return only a subset of fields:

curl -XGET 'http://localhost:9200/f5-access-2016.03.05/f5-access/_search?pretty=true' -d '{

 "query": {

   "term": {"domainname": "www.pinganfang.com"}

  },

  "_source": ["domainname","status"]

}'

 

 

Query DSL (Domain Specific Language): a structured query, wrapped in a JSON object and sent to Elasticsearch

API  Query  DSL

Script:


[root@shnh-bak001 test]# cat data.py

#!/usr/bin/env python

#coding=utf-8

#date:2015-01-28

#writer: hechen

#curl -XPOST 'http://127.0.0.1:9200/f5-access-2016.01.29/_search?pretty=true' -d '

from elasticsearch import Elasticsearch

import sys

import json

# build a match_all query that returns only the geoip.ip and status fields

def data_search():

    rets = """{

                "query": {

                  "match_all": {}

                },

                "_source": [

                  "geoip.ip",

                  "status"

                ]

    }"""

    return rets

 

if __name__ == '__main__':

    index_day = "f5-access-2016.02.28"

    es = Elasticsearch(["http://10.10.108.23:9200/"])

#    print(help(es.search))

    # size=2: return only 2 records
    data = es.search(index=index_day, body=data_search(), size=2)

    result = json.dumps(data, indent=5)

    print(result)

[root@shnh-bak001 test]# python data.py

{

     "hits": {

          "hits": [

               {

                    "_score": 1.0,

                    "_type": "f5-access",

                    "_id": "AVMnVlEkm-B6xxJMSVeA",

                    "_source": {

                         "status": 200,

                         "geoip": {

                              "ip": "1.207.169.175"

                         }

                    },

                    "_index": "f5-access-2016.02.28"

               }

          ],

          "total": 8777026,

          "max_score": 1.0

     },

     "_shards": {

          "successful": 5,

          "failed": 0,

          "total": 5

     },

     "took": 85,

     "timed_out": false

}




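1. Term query (matches only documents that contain the given term in the given field)
A term query is an exact match: the search text is not run through an analyzer, so the document must contain the whole search term.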

{

                "query": {

                  "term": {

                    "domainname": "www.pinganfang.com"

                  }

                }

 }

2. Terms query (matches documents that contain any of the given terms)

{

            "query": {

              "terms": {

                "domainname": ["www.pinganfang.com","member.pinganfang.com"]  // contains www.pinganfang.com or member.pinganfang.com

              }

            }

  }

3. Wildcard query

{

                "query": {

                  "wildcard": {

                    "domainname": "www.d*"

                  }

                }

 }

4. Match query: the match clause accepts text, numbers, and dates

Match documents whose domainname field contains xxx.pinganfang.com or www.pinganfang.com

{

               "query": {

                "match" : {

                    "domainname" : "xxx.pinganfang.com www.pinganfang.com"

                }

              }

 }

5. Boolean match query

Match documents whose domainname field contains both xxx.pinganfang.com and www.pinganfang.com

{

               "query": {

                "match" : {

                    "domainname" : {

                        "query" : "xxx.pinganfang.com www.pinganfang.com",

                        "operator" : "and"

                    }

                }

              }

    }
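
The difference between the default "or" behaviour and "operator": "and" can be illustrated with a toy model. The sketch below splits text on whitespace only, which is an assumption made for illustration; a real Elasticsearch analyzer tokenizes differently, so this shows the boolean logic and nothing more.

```python
def toy_match(field_text, query_text, operator="or"):
    """Toy model of match semantics using whitespace tokens (not a real analyzer)."""
    field_terms = set(field_text.lower().split())
    query_terms = query_text.lower().split()
    hits = [term in field_terms for term in query_terms]
    # "and" requires every query term, "or" is satisfied by any one of them
    return all(hits) if operator == "and" else any(hits)


query = "xxx.pinganfang.com www.pinganfang.com"
print(toy_match("www.pinganfang.com", query))         # one matching term is enough
print(toy_match("www.pinganfang.com", query, "and"))  # both terms are required
```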

6. Match_all (matches every document in the given index, comparable to select * from)

{

                "query": {

                   "match_all": {} 

                }

}

7. Match all documents, sort the results by a given field, and return 2 records

{

                "query": {

                   "match_all": {}

                },

                "sort": {

                   "request_body": {      // sort field

                      "order": "desc"     // descending; use asc for ascending

                   }

                },

                "size":2                  // number of results to return

  }

8. Multi_match: search across several fields at once

Match documents whose domainname or request field contains www.pinganfang.com

 

{

        "query": {

            "multi_match": {

               "query": "www.pinganfang.com",

               "fields": ["domainname", "request"]

            }

        }

}

9. Fuzzy query

A fuzzy query finds results by computing the edit distance between the given term and the terms in the documents.

 

{

                "query": {

                  "fuzzy": {

                    "domainname": "www.pingnafang.com"

                  }

                }

}
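
To make "edit distance" concrete, the sketch below computes the classic Levenshtein distance between the misspelled term used in the query above and the real domain name. It is a plain textbook implementation, not Elasticsearch's own; Elasticsearch may count differently, for example by treating a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


print(edit_distance("www.pingnafang.com", "www.pinganfang.com"))
```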

10. Range query

Find data from the last minute (note the use of now)

 

{

        "query": {

               "range" : {

                  "@timestamp": {

                    "from": "now-1m",  // lower bound of the range

                    "to": "now"        // upper bound of the range

                   }

                }

        }

}

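Roughly equivalent to the following range clause (shown without the enclosing "query" object; gt is exclusive, whereas from is inclusive by default):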

{

               "range" : {

                  "@timestamp": {

                     "gt" : "now-1m"

                   }

                }

       

 }
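
When the time window varies, the range body can be generated instead of written by hand. A minimal sketch, assuming the @timestamp field from the examples above; the helper name last_minutes_query is invented here.

```python
def last_minutes_query(minutes, field="@timestamp"):
    """Build a range query for the last N minutes using date math (hypothetical helper)."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    return {
        "query": {
            "range": {
                field: {
                    "gt": "now-%dm" % minutes,  # strictly after N minutes ago
                    "lte": "now",               # up to and including now
                }
            }
        }
    }


print(last_minutes_query(1))
```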

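Filtering query results

To use a filter in any search, add a filter field under the query property; the filtered clause then combines the query and the filter.

In f5-access-2016.03.05, query for member.pinganfang.com and filter on status 302: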

{

                "query": {

                    "filtered": {

                        "query": {

                          "match": {

                            "domainname": "member.pinganfang.com"

                          }

                        },

                        "filter": {

                          "term": {

                            "status": "302"

                          }

                        }

                    }

                }

    }
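
The filtered query is deprecated as of Elasticsearch 2.0 in favor of a bool query with a filter clause. The sketch below rewrites a request body of the shape shown above into that form; the function name filtered_to_bool is made up for this example, and it only handles a filtered query at the top level of "query".

```python
def filtered_to_bool(body):
    """Rewrite a top-level filtered query as bool must + filter (sketch)."""
    filtered = body["query"]["filtered"]
    bool_clause = {}
    if "query" in filtered:
        bool_clause["must"] = filtered["query"]     # scored part
    if "filter" in filtered:
        bool_clause["filter"] = filtered["filter"]  # unscored part
    rewritten = dict(body)  # shallow copy keeps other keys such as size or aggs
    rewritten["query"] = {"bool": bool_clause}
    return rewritten


old_body = {
    "query": {
        "filtered": {
            "query": {"match": {"domainname": "member.pinganfang.com"}},
            "filter": {"term": {"status": "302"}},
        }
    }
}
print(filtered_to_bool(old_body))
```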

1. And Filter / Or Filter (filters combined with logical AND or logical OR)

 {

                    "filter": {

                      "or": [

                       {

                          "term": {

                            "domainname": "member.pinganfang.com"

                          }

                       },

                       {

                          "term": {

                            "status": "302"

                          }

                       }

                      ]

                    }

    }


2. Bool Filter (combines several query clauses through must, must_not, and so on)

 

{

                "query": {

                     "bool": {

                       "must_not": {     // condition that must be excluded

                         "term": {

                           "domainname": "member.pinganfang.com"

                         }

                       },

                       "must": {        // condition that must be satisfied

                           "term": {

                             "status": "302"

                           }

                      }

                     }

                }

 

  }

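 3. Compound queries (wrap several sub-queries together)

 should --- a logical-OR boolean clause

 must ---- the clause must match in every returned document

 must_not --- the clause must not match in any returned document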

{

                 "query": {

                     "bool": {

                       "must":[   

                           { "term": {"domainname": "member.pinganfang.com"} },

                           { "term": {"request": "member.pinganfang.com"}}

                       ]

                     }

                }

}
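
Compound bodies like the one above are easy to assemble from lists of sub-queries. A minimal Python sketch; the helper bool_query is hypothetical and simply omits clauses that were not supplied.

```python
def bool_query(must=(), should=(), must_not=()):
    """Combine sub-queries into one bool query, omitting empty clauses (hypothetical helper)."""
    clauses = {}
    for name, queries in (("must", must), ("should", should), ("must_not", must_not)):
        if queries:
            clauses[name] = list(queries)
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"term": {"status": "302"}}],
    must_not=[{"term": {"domainname": "member.pinganfang.com"}}],
)
print(body)
```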

  

Aggregations (a second-level summary of the query results)

1. Metrics Aggregations

Find the largest number of bytes sent to a client

{

              "query": {

                   "match_all": {}

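                },

              "sort": {

                  "bytes": {      // sort field

                      "order": "desc"     // descending; use asc for ascending

                   }

              },

              "aggs" : {

                   "max_bytes" : {          // name of the aggregation

                        "max" : {             // computes the maximum; swap in stats for multi-value statistics

                             "field" : "bytes"   // field to aggregate on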

                        }

                    }

               }

 }

2. Terms Aggregations (distribution of the values in a given field)

Distribution of HTTP status codes

 {

 

                 "query": {

                    "match_all": {}

                 },

                 "aggs": {

                   "http_code": {

                      "terms": {

                            "field": "status"

                       }

                   }

                 }

    }

3. Nested buckets

First bucket the documents by domain name, then bucket each domain's documents by HTTP status code.

{

                "query": {

                   "match_all": {}

                },

                "aggs" : {

                    "domainname" : {

                        "terms" : {

                            "field" : "domainname"

                        },

                        "aggs" : {

                            "http_code" : { "terms" : { "field" : "status" } }

                        }

                    }

               }

} 
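
A nested terms aggregation comes back as buckets inside buckets, each carrying a key and a doc_count. The sketch below flattens such a response into rows; the sample data is invented for illustration, and the aggregation names match the request above.

```python
def flatten_buckets(response, outer="domainname", inner="http_code"):
    """Yield (outer key, inner key, doc_count) rows from a nested terms aggregation."""
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            yield outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"]


# Made-up sample in the shape of a nested terms aggregation response
sample = {
    "aggregations": {
        "domainname": {
            "buckets": [
                {"key": "www.pinganfang.com", "doc_count": 5,
                 "http_code": {"buckets": [{"key": 200, "doc_count": 4},
                                           {"key": 302, "doc_count": 1}]}},
                {"key": "member.pinganfang.com", "doc_count": 2,
                 "http_code": {"buckets": [{"key": 302, "doc_count": 2}]}},
            ]
        }
    }
}

for row in flatten_buckets(sample):
    print("%s\t%s\t%d" % row)
```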

4. Range Aggregations

Range statistics, used to aggregate ordinary numeric fields

Count bytes in three buckets: below 200, 200 up to 1000, and 1000 and above

{

                "query": {

                   "match_all": {}

                },

                "aggs": {

                      "code_ranges": {

                          "range": {

                              "field": "bytes",

                              "ranges": [

                                  {

                                    "to": 200

                                  },

                                  {

                                    "from": 200,

                                    "to": 1000

                                  },

                                  {

                                    "from": 1000

                                  }

                              ]

                          }

                      }

                  }

    } 
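
In a range aggregation each bucket includes its from value and excludes its to value, which decides where boundary values such as 200 and 1000 land. A toy Python model of that assignment, run on made-up byte counts:

```python
def bucket_counts(values, ranges):
    """Count values per range; "from" is inclusive and "to" is exclusive."""
    counts = [0] * len(ranges)
    for value in values:
        for i, r in enumerate(ranges):
            if r.get("from", float("-inf")) <= value < r.get("to", float("inf")):
                counts[i] += 1
    return counts


ranges = [{"to": 200}, {"from": 200, "to": 1000}, {"from": 1000}]
# Sample values are invented; 200 falls in the second bucket, 1000 in the third
print(bucket_counts([150, 200, 999, 1000, 4096], ranges))
```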

 5. Date_range Aggregations

 Bucket statistics over date-type fields

 Count the total number of requests in the last minute

{

                "query": {

                   "match_all": {}

                },

                "aggs": {

                      "range": {

                          "date_range": {

                              "field": "@timestamp",

                              "format": "yyyy/MM/dd",

                              "ranges": [

                                  { "from": "now-1m" }   // from one minute ago until now

                              ]

                          }

                      }

                  }

    }

Production example:

Find IPs in the F5 logs with more than 100 requests in one minute

[root@shnh-bak001 custon_scripts]# cat top_ip.py

#!/usr/bin/env python

#coding=utf-8

#date:2015-01-28

#curl -XPOST 'http://127.0.0.1:9200/f5-access-2016.01.29/_search?pretty=true' -d '

 

from elasticsearch import Elasticsearch

import sys

def usage():

    print('Usage: python ' + sys.argv[0] + ' <"warn_ip_count">')

    sys.exit()

# top access IPs in the last minute

def top_search(query_str):

    rets = """{

        "query": {

          "filtered": {

            "filter": {

              "range": {

                "@timestamp": {

                  "gt" : "now-1m"

                }

              }

            }

          }

        },

 

        "aggs": {

          "%s": {

            "terms": {

              "field": "%s",

              "size": 10

            }

          }

        }

    }""" % (query_str, query_str + ".raw")

 

    return rets

#print top_search(q_ip)

 

if __name__ == '__main__':

    if len(sys.argv) < 2:

        usage()

   

    warn_ip_count = int(sys.argv[1])

    index_day = "f5-access-*"

    q_ip = "geoip.ip"

    flag = 0

 

    es = Elasticsearch(["http://10.10.108.23:9200/"])

    top_ip = es.search(index=index_day, body=top_search(q_ip))

    data = top_ip["aggregations"]["geoip.ip"]["buckets"]

    for i in data:

        ip = i["key"]

        ip_count = i["doc_count"]

        if ip_count > warn_ip_count:

            flag = 1

            print("ip:%s, count:%s" % (ip, ip_count))

       

    if flag == 0:

        print("OK")

 

[root@shnh-bak001 test]# python top_ip.py 100
ip:183.57.148.147, count:149
ip:203.208.60.85, count:118
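
The same check can be written without string formatting by building the request body as a dict and keeping the threshold logic in a function of its own. The following is a sketch under assumptions: both helper names are invented, no Elasticsearch connection is made, and the sample response reuses the two IPs from the output above plus one made-up bucket.

```python
import json


def top_terms_body(field, size=10):
    """Build the last-minute terms aggregation body as a dict instead of a string."""
    return {
        "query": {"filtered": {"filter": {
            "range": {"@timestamp": {"gt": "now-1m"}}}}},
        "aggs": {field: {"terms": {"field": field + ".raw", "size": size}}},
    }


def over_threshold(response, agg_name, limit):
    """Return (key, doc_count) pairs whose count exceeds limit."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets if b["doc_count"] > limit]


# Sample response: the first two buckets come from the output above,
# the third bucket is invented to show the filter
sample = {"aggregations": {"geoip.ip": {"buckets": [
    {"key": "183.57.148.147", "doc_count": 149},
    {"key": "203.208.60.85", "doc_count": 118},
    {"key": "192.0.2.1", "doc_count": 7},
]}}}

print(json.dumps(top_terms_body("geoip.ip"), indent=2))
print(over_threshold(sample, "geoip.ip", 100))
```

Building the body as a dict avoids the escaping problems of % formatting inside a JSON string, and the dict can be passed to the client in place of the string body.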

 
  
  
  
  


Reference: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/index.html
