Information out: search and analyze
While you can use Elasticsearch as a document store and retrieve documents and their metadata, the real power comes from easy access to the full suite of search capabilities built on the Apache Lucene search engine library.
Building on Lucene, Elasticsearch provides a simple, coherent REST API for managing your cluster and for indexing and searching your data. For testing purposes, you can submit requests directly from the command line or through the Developer Console in Kibana. From your applications, you can use the Elasticsearch client for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python, or Ruby.
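As a rough illustration of how thin the REST layer is, the sketch below builds a search request with nothing but the Python standard library. The address `http://localhost:9200`, the index name `employee`, and the helper names `build_search_request` and `send` are assumptions made for this example. A secured cluster would also need TLS and credentials, which are left out here.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build (but do not send) a POST <index>/_search request."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON reply (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical cluster address and index name.
request = build_search_request(
    "http://localhost:9200",
    "employee",
    {"query": {"match_all": {}}},
)
```

Calling `send(request)` against a running node returns the parsed search response. In a real application the official client for your language is usually the better choice.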
Searching your data
The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the `gender` and `age` fields in your `employee` index and sort the matches by the `hire_date` field. Full-text queries find all documents that match the query string and return them sorted by relevance, that is, by how good a match they are for your search terms. What counts as a better match is decided by relevance scoring, which is covered later in the documentation and not expanded on here.
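The two query styles described above might look like the following request bodies, written here as Python dictionaries. This is a sketch only: the field values, the age range, and the `bio` text field are invented for illustration, and `gender` is assumed to be mapped as a `keyword` field.

```python
import json

# Structured query: exact filters on gender and age, sorted by hire_date.
structured_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"gender": "F"}},                   # assumed keyword field
                {"range": {"age": {"gte": 30, "lte": 40}}},  # hypothetical range
            ]
        }
    },
    "sort": [{"hire_date": {"order": "desc"}}],
}

# Full-text query: matches are scored and returned by relevance.
# "bio" is a hypothetical analyzed text field.
full_text_query = {
    "query": {"match": {"bio": "distributed search"}},
}

# Either body can be sent as JSON to POST /employee/_search.
payloads = [json.dumps(structured_query), json.dumps(full_text_query)]
```

Clauses under `filter` do not contribute to the relevance score, which is why they suit the SQL-like structured case, while `match` is scored.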
In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions. Pretty powerful, right?
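Each of these search types has a counterpart in the Query DSL. The bodies below are a minimal sketch: the `title` and `title_suggest` fields and the search strings are hypothetical, "similarity search" is read here as typo-tolerant fuzzy matching (one possible interpretation), and the completion suggester only works on a field mapped with type `completion`.

```python
import json

search_bodies = {
    # Phrase search: the terms must appear together and in order.
    "phrase": {"query": {"match_phrase": {"title": "needle in a haystack"}}},
    # Fuzzy search: tolerates small spelling differences.
    "fuzzy": {
        "query": {"fuzzy": {"title": {"value": "hastack", "fuzziness": "AUTO"}}}
    },
    # Prefix search: matches terms that start with the given characters.
    "prefix": {"query": {"prefix": {"title": {"value": "hay"}}}},
    # Autocomplete: completion suggester on a field of type "completion".
    "autocomplete": {
        "suggest": {
            "title_completions": {
                "prefix": "hay",
                "completion": {"field": "title_suggest"},
            }
        }
    },
}

# Every body serializes to JSON and can be sent to POST /<index>/_search.
serialized = {name: json.dumps(body) for name, body in search_bodies.items()}
```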
Have geospatial or other numerical data that you want to search? As described in the previous installment, Elasticsearch does not simply store such data as text: it indexes non-textual data in optimized data structures that support high-performance geo and numerical queries, which is also why these searches are fast.
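To make the geo and numeric case concrete, here is a small sketch of a mapping and a query that combine both. The field names `location` and `price`, the coordinates, and the distances are made up for the example. The part that matters is that `location` is mapped as `geo_point` and `price` as a numeric type, so that Elasticsearch indexes them in its specialized structures.

```python
import json

# Hypothetical mapping: a geo_point field and a numeric field.
index_mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},
            "price": {"type": "float"},
        }
    }
}

# Documents within 5 km of a point whose price falls in [10, 50).
geo_and_numeric_query = {
    "query": {
        "bool": {
            "filter": [
                {
                    "geo_distance": {
                        "distance": "5km",
                        "location": {"lat": 40.7128, "lon": -74.0060},
                    }
                },
                {"range": {"price": {"gte": 10, "lt": 50}}},
            ]
        }
    }
}

mapping_json = json.dumps(index_mapping)
query_json = json.dumps(geo_and_numeric_query)
```

The mapping body would be sent when creating the index (`PUT /<index>`), and the query body to `POST /<index>/_search`.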
You can access all of these search capabilities using Elasticsearch's comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
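For the SQL route, a request to the `_sql` endpoint carries the statement in a small JSON body. The helper below is a sketch: `sql_request` is a name invented for this example, and the statement assumes an `employee` index with `gender` and `age` fields.

```python
import json


def sql_request(statement, response_format="json"):
    """Return the path and JSON body for a POST to the SQL endpoint."""
    path = f"/_sql?format={response_format}"
    body = {"query": statement}
    return path, body


# Hypothetical statement: average age per gender.
path, body = sql_request(
    "SELECT gender, AVG(age) AS avg_age FROM employee GROUP BY gender"
)
payload = json.dumps(body)
```

Using `format=txt` instead returns a plain-text table, which is convenient when experimenting from the command line.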
Analyzing your data
Elasticsearch aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial "needle in a haystack", aggregations enable you to answer questions like:
- How many needles are in the haystack?
- What is the average length of the needles?
- What is the median length of the needles, broken down by manufacturer?
- How many needles were added to the haystack in each of the last six months?
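The four questions above can be answered by a single search request. The following is a sketch under assumptions: the fields `length_mm`, `manufacturer` (keyword), and `added_at` (date) are invented, and `calendar_interval` assumes a reasonably recent Elasticsearch version (7.x or later). Note that the `percentiles` aggregation returns an approximate median.

```python
import json


def needle_summary_request():
    """One request body that answers all four haystack questions."""
    return {
        "size": 0,                 # return no documents, only summaries
        "track_total_hits": True,  # exact total: how many needles?
        "aggs": {
            # Average needle length.
            "average_length": {"avg": {"field": "length_mm"}},
            # Median (50th percentile) length per manufacturer.
            "by_manufacturer": {
                "terms": {"field": "manufacturer"},
                "aggs": {
                    "median_length": {
                        "percentiles": {"field": "length_mm", "percents": [50]}
                    }
                },
            },
            # Needles added per month over the last six months.
            "last_six_months": {
                "filter": {"range": {"added_at": {"gte": "now-6M/M"}}},
                "aggs": {
                    "added_per_month": {
                        "date_histogram": {
                            "field": "added_at",
                            "calendar_interval": "month",
                        }
                    }
                },
            },
        },
    }


summary_body = needle_summary_request()
summary_json = json.dumps(summary_body)
```

The total count is read from `hits.total.value` in the response, and the other answers from the matching entries under `aggregations`.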
You can also use aggregations to answer more subtle questions, such as:
- What are your most popular needle manufacturers?
- Are there any unusual or anomalous clumps of needles?
Because aggregations leverage the same data structures used for search, they are also very fast. This enables you to analyze and visualize your data in near real time. Your reports and dashboards update as your data changes so you can take action based on the latest information.
What's more, aggregations operate alongside search requests. You can search documents, filter results, and perform analytics at the same time, on the same data, in a single request. And because aggregations are calculated in the context of a particular search, you're not just displaying a count of all size 70 needles, you're displaying a count of the size 70 needles that match your users' search criteria, for example, all size 70 non-stick embroidery needles.
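A sketch of that idea: the query narrows the documents to what the user searched for, and the aggregation counts needles per size within those matches only. The field names `description` and `size`, the helper names, and the sample response values are all made up for illustration. The response shape (`aggregations` → `buckets` with `key` and `doc_count`) follows what a `terms` aggregation returns.

```python
def needles_search_with_counts(user_text):
    """Search for the user's text and count matching needles per size."""
    return {
        "query": {
            "match": {"description": {"query": user_text, "operator": "and"}}
        },
        "aggs": {"by_size": {"terms": {"field": "size"}}},
    }


def count_for_size(response, size):
    """Read the count for one size out of a terms-aggregation response."""
    for bucket in response["aggregations"]["by_size"]["buckets"]:
        if bucket["key"] == size:
            return bucket["doc_count"]
    return 0


request_body = needles_search_with_counts("non-stick embroidery")

# Hand-written sample response with invented numbers, for illustration only.
sample_response = {
    "aggregations": {
        "by_size": {
            "buckets": [
                {"key": 70, "doc_count": 12},
                {"key": 80, "doc_count": 5},
            ]
        }
    }
}
size_70_matches = count_for_size(sample_response, 70)
```

Because the aggregation runs in the context of the query, the count for size 70 covers only non-stick embroidery needles, not every size 70 needle in the index.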
But wait, there's more
Want to automate the analysis of your time-series data? You can use machine learning features to create accurate baselines of normal behavior in your data and identify anomalous patterns. With machine learning, you can detect:
- Anomalies related to temporal deviations in values, counts, or frequencies
- Statistical rarity
- Unusual behaviors for a member of a population
And the best part? You can do this without having to specify algorithms, models, or other data science-related configurations. Powerful and sophisticated, isn't it?
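To give a feel for how little configuration is involved, the sketch below builds one anomaly detection job body for each kind of anomaly listed above. The job names, the fields `response_time`, `status_code`, and `client_ip`, the 15-minute bucket span, and the helper `anomaly_job` are assumptions for this example. Each body is meant for the create-job endpoint (`PUT _ml/anomaly_detectors/<job_id>`), and a datafeed would still be needed to supply data.

```python
import json


def anomaly_job(detector, bucket_span="15m", time_field="@timestamp"):
    """Build a minimal anomaly detection job body with a single detector."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
        },
        "data_description": {"time_field": time_field},
    }


# Hypothetical jobs, one per anomaly type from the list above.
jobs = {
    # Temporal deviation in a value: unusual mean response time.
    "unusual-response-time": anomaly_job(
        {"function": "mean", "field_name": "response_time"}
    ),
    # Statistical rarity: status codes that are rarely seen.
    "rare-status-code": anomaly_job(
        {"function": "rare", "by_field_name": "status_code"}
    ),
    # Population analysis: a client whose event count differs from its peers.
    "unusual-client": anomaly_job(
        {"function": "count", "over_field_name": "client_ip"}
    ),
}

job_payloads = {name: json.dumps(body) for name, body in jobs.items()}
```

Each detector only names a function and the fields to watch. The baseline modeling itself is handled by Elasticsearch.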