Flink SQL: Queries (Pattern Recognition)

Pattern Recognition 模式识别

Streaming

It is a common use case to search for a set of event patterns, especially in case of data streams. Flink comes with a complex event processing (CEP) library which allows for pattern detection in event streams. Furthermore, Flink’s SQL API provides a relational way of expressing queries with a large set of built-in functions and rule-based optimizations that can be used out of the box.
搜索一组事件模式是一种常见的用例,尤其是在数据流的情况下。Flink附带了一个复杂事件处理(CEP)库,允许在事件流中进行模式检测。此外,Flink的SQL API提供了一种关系型的查询表达方式,其中包含大量内置函数和基于规则的优化,可以开箱即用。

In December 2016, the International Organization for Standardization (ISO) released a new version of the SQL standard which includes Row Pattern Recognition in SQL (ISO/IEC TR 19075-5:2016). It allows Flink to consolidate CEP and SQL API using the MATCH_RECOGNIZE clause for complex event processing in SQL.
2016年12月,国际标准化组织(ISO)发布了新版本的SQL标准,其中包括SQL中的行模式识别 (ISO/IEC TR 19075-5:2016)。它允许Flink使用MATCH_RECOGNIZE 子句合并CEP和SQL API,以便在SQL中进行复杂事件处理。

A MATCH_RECOGNIZE clause enables the following tasks:
MATCH_RECOGNIZE 子句支持以下任务:

  • Logically partition and order the data that is used with the PARTITION BY and ORDER BY clauses.
    对与PARTITION BY和ORDER BY子句一起使用的数据进行逻辑分区和排序。
  • Define patterns of rows to seek using the PATTERN clause. These patterns use a syntax similar to that of regular expressions.
    使用PATTERN子句定义要查找的行的模式。这些模式使用与正则表达式类似的语法。
  • Define the logical conditions required to map a row to a row pattern variable in the DEFINE clause.
    在DEFINE子句中定义将行映射到行模式变量所需的逻辑条件。
  • Define measures, which are expressions usable in other parts of the SQL query, in the MEASURES clause.
    在MEASURES子句中定义度量值,这些度量值是SQL查询的其他部分中可用的表达式。

The following example illustrates the syntax for basic pattern recognition:
以下示例说明了基本模式识别的语法:

SELECT T.aid, T.bid, T.cid
FROM MyTable
    MATCH_RECOGNIZE (
      PARTITION BY userid
      ORDER BY proctime
      MEASURES
        A.id AS aid,
        B.id AS bid,
        C.id AS cid
      PATTERN (A B C)
      DEFINE
        A AS name = 'a',
        B AS name = 'b',
        C AS name = 'c'
    ) AS T

This page will explain each keyword in more detail and will illustrate more complex examples.
本页将更详细地解释每个关键字,并将说明更复杂的示例。

Flink’s implementation of the MATCH_RECOGNIZE clause is a subset of the full standard. Only those features documented in the following sections are supported. Additional features may be supported based on community feedback, please also take a look at the known limitations.
Flink对MATCH_RECOGNIZE子句的实现是完整标准的子集。仅支持以下章节中记录的功能。根据社区反馈,可能会支持其他功能,请查看已知的限制。

Introduction and Examples

Installation Guide

The pattern recognition feature uses the Apache Flink’s CEP library internally. In order to be able to use the MATCH_RECOGNIZE clause, the library needs to be added as a dependency to your Maven project.
模式识别功能在内部使用Apache Flink的CEP库。为了能够使用MATCH_RECOGNIZE子句,需要将库作为依赖项添加到Maven项目中。

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-cep</artifactId>
  <version>1.15.2</version>
</dependency>

Alternatively, you can also add the dependency to the cluster classpath (see the dependency section for more information).
或者,您也可以将依赖项添加到集群类路径(有关详细信息,请参见依赖项部分)。

If you want to use the MATCH_RECOGNIZE clause in the SQL Client, you don’t have to do anything as all the dependencies are included by default.
如果要在SQL客户端中使用MATCH_RECOGNIZE子句,则无需执行任何操作,因为默认情况下包含所有依赖项。

SQL Semantics

Every MATCH_RECOGNIZE query consists of the following clauses:
每个MATCH_RECOGNIZE查询都包含以下子句:

  • PARTITION BY - defines the logical partitioning of the table; similar to a GROUP BY operation.
    PARTITION BY - 定义表的逻辑分区;类似于GROUP BY操作。
  • ORDER BY - specifies how the incoming rows should be ordered; this is essential as patterns depend on an order.
    ORDER BY -指定传入行的排序方式;这是至关重要的,因为模式取决于顺序。
  • MEASURES - defines output of the clause; similar to a SELECT clause.
    MEASURES -定义语句的输出;类似于SELECT子句。
  • ONE ROW PER MATCH - output mode which defines how many rows per match should be produced.
    ONE ROW PER MATCH - 输出模式,定义每个匹配应该产生多少行。
  • AFTER MATCH SKIP - specifies where the next match should start; this is also a way to control how many distinct matches a single event can belong to.
    AFTER MATCH SKIP - 指定下一个匹配的开始位置;这也是一种控制单个事件可以属于多少不同匹配的方法。
  • PATTERN - allows constructing patterns that will be searched for using a regular expression-like syntax.
    PATTERN - 允许使用类似正则表达式的语法构建要搜索的模式。
  • DEFINE - this section defines the conditions that the pattern variables must satisfy.
    DEFINE - 定义模式变量必须满足的条件。

Attention Currently, the MATCH_RECOGNIZE clause can only be applied to an append table. Furthermore, it always produces an append table as well.
注意目前,MATCH_RECOGNIZE子句只能应用于追加表。此外,它还总是生成一个追加表。

Examples

For our examples, we assume that a table Ticker has been registered. The table contains prices of stocks at a particular point in time.
对于我们的示例,我们假设表Ticker已经注册。该表包含特定时间点的股票价格

The table has the following schema:
该表具有以下schema:

Ticker
     |-- symbol: String                           # symbol of the stock 股票的符号
     |-- price: Long                              # price of the stock
     |-- tax: Long                                # tax liability of the stock 股票的纳税义务
     |-- rowtime: TimeIndicatorTypeInfo(rowtime)  # point in time when the change to those values happened

For simplification, we only consider the incoming data for a single stock ACME. A ticker could look similar to the following table where rows are continuously appended.
为了简化,我们只考虑单个股票ACME的传入数据。其行情数据可能类似于下表,其中行被持续追加。

symbol         rowtime         price    tax
======  ====================  ======= =======
'ACME'  '01-Apr-11 10:00:00'   12      1
'ACME'  '01-Apr-11 10:00:01'   17      2
'ACME'  '01-Apr-11 10:00:02'   19      1
'ACME'  '01-Apr-11 10:00:03'   21      3
'ACME'  '01-Apr-11 10:00:04'   25      2
'ACME'  '01-Apr-11 10:00:05'   18      1
'ACME'  '01-Apr-11 10:00:06'   15      1
'ACME'  '01-Apr-11 10:00:07'   14      2
'ACME'  '01-Apr-11 10:00:08'   24      2
'ACME'  '01-Apr-11 10:00:09'   25      2
'ACME'  '01-Apr-11 10:00:10'   19      1

The task is now to find periods of a constantly decreasing price of a single ticker. For this, one could write a query like:
现在的任务是找到一个股票价格不断下降的时期。为此,可以编写如下查询:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            START_ROW.rowtime AS start_tstamp,
            LAST(PRICE_DOWN.rowtime) AS bottom_tstamp,
            LAST(PRICE_UP.rowtime) AS end_tstamp
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST PRICE_UP
        PATTERN (START_ROW PRICE_DOWN+ PRICE_UP)
        DEFINE
            PRICE_DOWN AS
                (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR
                    PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1),
            PRICE_UP AS
                PRICE_UP.price > LAST(PRICE_DOWN.price, 1)
    ) MR;

The query partitions the Ticker table by the symbol column and orders it by the rowtime time attribute.
该查询按symbol列对Ticker表进行分区,并按行时间属性对其排序。

The PATTERN clause specifies that we are interested in a pattern with a starting event START_ROW that is followed by one or more PRICE_DOWN events and concluded with a PRICE_UP event. If such a pattern can be found, the next pattern match will be sought starting at the last PRICE_UP event, as indicated by the AFTER MATCH SKIP TO LAST clause.
PATTERN子句指定我们感兴趣的模式:它以START_ROW事件开始,其后是一个或多个PRICE_DOWN事件,并以一个PRICE_UP事件结束。如果找到了这样的模式,则按照AFTER MATCH SKIP TO LAST子句的指示,下一次模式匹配将从最后一个PRICE_UP事件处开始寻找。

The DEFINE clause specifies the conditions that need to be met for a PRICE_DOWN and PRICE_UP event. Although no condition is defined for the START_ROW pattern variable, it has an implicit condition that always evaluates to TRUE.
DEFINE子句指定PRICE_DOWN和PRICE_UP事件需要满足的条件。尽管没有为START_ROW模式变量定义条件,但它有一个隐式条件,其求值始终为TRUE。

A pattern variable PRICE_DOWN is defined as a row with a price that is smaller than the price of the last row that met the PRICE_DOWN condition. For the initial case or when there is no last row that met the PRICE_DOWN condition, the price of the row should be smaller than the price of the preceding row in the pattern (referenced by START_ROW).
模式变量PRICE_DOWN定义为:其价格小于满足PRICE_DOWN条件的最后一行的价格的行。对于初始情况,或者当没有满足PRICE_DOWN条件的最后一行时,该行的价格应小于模式中前一行(由START_ROW引用)的价格。

A pattern variable PRICE_UP is defined as a row with a price that is larger than the price of the last row that met the PRICE_DOWN condition.
模式变量PRICE_UP定义为价格大于满足PRICE_DOWN条件的最后一行的价格的行。

This query produces a summary row for each period in which the price of a stock was continuously decreasing.
此查询为股票价格持续下降的每个期间生成一个摘要行。

The exact representation of the output rows is defined in the MEASURES part of the query. The number of output rows is defined by the ONE ROW PER MATCH output mode.
输出行的精确表示在查询的MEASURES部分中定义。输出行数由“ONE ROW PER MATCH”输出模式定义。

 symbol       start_tstamp       bottom_tstamp         end_tstamp
=========  ==================  ==================  ==================
ACME       01-APR-11 10:00:04  01-APR-11 10:00:07  01-APR-11 10:00:08

The resulting row describes a period of falling prices that started at 01-APR-11 10:00:04 and achieved the lowest price at 01-APR-11 10:00:07 that increased again at 01-APR-11 10:00:08.
结果行描述了从2011年4月1日10:00:04开始的一段价格下跌时期,并在2011年4月1日的10:00:07达到最低价格,之后在2011年4月1日的10:00:08再次上涨。

Partitioning

It is possible to look for patterns in partitioned data, e.g., trends for a single ticker or a particular user. This can be expressed using the PARTITION BY clause. The clause is similar to using GROUP BY for aggregations.
可以在分区数据中查找模式,例如,单个股票或特定用户的趋势。这可以使用PARTITION BY子句来表示。该子句类似于使用GROUP BY进行聚合。

It is highly advised to partition the incoming data because otherwise the MATCH_RECOGNIZE clause will be translated into a non-parallel operator to ensure global ordering.
强烈建议对传入数据进行分区,否则MATCH_RECOGNIZE子句将被转换为非并行运算符,以确保全局排序。

Order of Events

Apache Flink allows for searching for patterns based on time; either processing time or event time.
Apache Flink允许基于时间(处理时间或事件时间)搜索模式。

In case of event time, the events are sorted before they are passed to the internal pattern state machine. As a consequence, the produced output will be correct regardless of the order in which rows are appended to the table. Instead, the pattern is evaluated in the order specified by the time contained in each row.
在事件时间的情况下,在将事件传递到内部模式状态机之前,会对事件进行排序。因此,无论行被附加到表中的顺序如何,生成的输出都是正确的。相反,将按照每行中包含的时间指定的顺序计算模式。

The MATCH_RECOGNIZE clause assumes a time attribute with ascending ordering as the first argument of the ORDER BY clause.
MATCH_RECOGNIZE子句假定ORDER BY子句的第一个参数是按升序排列的时间属性。

For the example Ticker table, a definition like ORDER BY rowtime ASC, price DESC is valid but ORDER BY price, rowtime or ORDER BY rowtime DESC, price ASC is not.
对于示例Ticker表,像ORDER BY rowtime ASC, price DESC这样的定义是有效的,但ORDER BY price, rowtime或ORDER BY rowtime DESC, price ASC是无效的。
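
For instance, the following sketch orders the Ticker rows by the rowtime time attribute first and uses price as a secondary sort key. The pattern and its conditions are purely illustrative:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime ASC, price DESC  -- time attribute first and ascending
        MEASURES
            A.price AS priceA,
            B.price AS priceB
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B)
        DEFINE
            A AS A.price > 10,
            B AS B.price < A.price
    ) AS T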

Define & Measures

The DEFINE and MEASURES keywords have similar meanings to the WHERE and SELECT clauses in a simple SQL query.
DEFINE和MEASURES关键字与简单SQL查询中的WHERE和SELECT子句具有相似的含义。

The MEASURES clause defines what will be included in the output of a matching pattern. It can project columns and define expressions for evaluation. The number of produced rows depends on the output mode setting.
MEASURES子句定义了匹配模式输出中包含的内容。它可以投影列并定义表达式以进行计算。生成的行数取决于输出模式设置。

The DEFINE clause specifies conditions that rows have to fulfill in order to be classified to a corresponding pattern variable. If a condition is not defined for a pattern variable, a default condition will be used which evaluates to true for every row.
DEFINE子句指定了行必须满足的条件,才能被分类到相应的模式变量。如果没有为模式变量定义条件,则将使用默认条件,该条件对每一行的求值都为true。

For a more detailed explanation about expressions that can be used in those clauses, please have a look at the event stream navigation section.
有关可以在这些子句中使用的表达式的更详细解释,请查看事件流导航部分。

Aggregations

Aggregations can be used in DEFINE and MEASURES clauses. Both built-in and custom user defined functions are supported.
聚合可以用在DEFINE和MEASURES子句中。支持内置和自定义用户定义函数。

Aggregate functions are applied to each subset of rows mapped to a match. In order to understand how those subsets are evaluated have a look at the event stream navigation section.
聚合函数应用于映射到匹配的行的每个子集。为了了解如何评估这些子集,请查看事件流导航部分。

The task of the following example is to find the longest period of time for which the average price of a ticker stayed below a certain threshold. It shows how expressive MATCH_RECOGNIZE can become in combination with aggregations. This task can be performed with the following query:
下面示例的任务是找出某只股票的平均价格保持低于某个阈值的最长时间段。它展示了MATCH_RECOGNIZE与聚合结合后的表达能力。可以使用以下查询执行此任务:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            FIRST(A.rowtime) AS start_tstamp,
            LAST(A.rowtime) AS end_tstamp,
            AVG(A.price) AS avgPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) < 15
    ) MR;

Given this query and following input values:
给定此查询和以下输入值:

symbol         rowtime         price    tax
======  ====================  ======= =======
'ACME'  '01-Apr-11 10:00:00'   12      1
'ACME'  '01-Apr-11 10:00:01'   17      2
'ACME'  '01-Apr-11 10:00:02'   13      1
'ACME'  '01-Apr-11 10:00:03'   16      3
'ACME'  '01-Apr-11 10:00:04'   25      2
'ACME'  '01-Apr-11 10:00:05'   2       1
'ACME'  '01-Apr-11 10:00:06'   4       1
'ACME'  '01-Apr-11 10:00:07'   10      2
'ACME'  '01-Apr-11 10:00:08'   15      2
'ACME'  '01-Apr-11 10:00:09'   25      2
'ACME'  '01-Apr-11 10:00:10'   25      1
'ACME'  '01-Apr-11 10:00:11'   30      1

The query accumulates events as part of the pattern variable A as long as their average price does not exceed 15. For example, such an excess first happens at 01-Apr-11 10:00:04. The following period exceeds the average price of 15 again at 01-Apr-11 10:00:11. Thus the results for said query will be:
只要事件的平均价格不超过15,查询就会将这些事件累积为模式变量A的一部分。例如,在2011年4月1日10:00:04首次超过该限制。接下来的时段在2011年4月1日10:00:11再次超过平均价格15。因此,该查询的结果为:

 symbol       start_tstamp       end_tstamp          avgPrice
=========  ==================  ==================  ============
ACME       01-APR-11 10:00:00  01-APR-11 10:00:03     14.5
ACME       01-APR-11 10:00:05  01-APR-11 10:00:10     13.5

Aggregations can be applied to expressions, but only if they reference a single pattern variable. Thus SUM(A.price * A.tax) is a valid one, but AVG(A.price * B.tax) is not.
聚合可以应用于表达式,但仅当它们引用单个模式变量时。因此,SUM(A.price * A.tax)是有效的,而AVG(A.price * B.tax)则不是。
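
For instance, the aggregation query from above could additionally expose such an expression in its MEASURES clause. This is only a sketch; the sumTaxedPrice measure is illustrative and assumes the Ticker schema shown earlier:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            FIRST(A.rowtime) AS start_tstamp,
            LAST(A.rowtime) AS end_tstamp,
            AVG(A.price) AS avgPrice,
            SUM(A.price * A.tax) AS sumTaxedPrice  -- valid: the expression references only A
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) < 15
    ) MR;

An expression such as AVG(A.price * B.tax) would be rejected because it mixes two pattern variables.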

DISTINCT aggregations are not supported. 不支持DISTINCT聚合。

Defining a Pattern

The MATCH_RECOGNIZE clause allows users to search for patterns in event streams using a powerful and expressive syntax that is somewhat similar to the widespread regular expression syntax.
MATCH_RECOGNIZE子句允许用户使用功能强大且富有表现力的语法来搜索事件流中的模式,该语法与广泛使用的正则表达式语法有些相似。

Every pattern is constructed from basic building blocks, called pattern variables, to which operators (quantifiers and other modifiers) can be applied. The whole pattern must be enclosed in brackets.
每个模式都是由称为模式变量的基本构建块构建的,可以对其应用运算符(量词和其他修饰符)。整个模式必须用括号括起来。

An example pattern could look like:
示例模式如下:

PATTERN (A B+ C* D)

One may use the following operators:
可以使用以下运算符:

  • Concatenation - a pattern like (A B) means that the contiguity is strict between A and B. Therefore, there can be no rows that were not mapped to A or B in between.
    串联 - 类似(A B)的模式意味着A和B之间的连续性是严格的。因此,其中不可能存在未映射到A或B的行。
  • Quantifiers - modify the number of rows that can be mapped to the pattern variable. 量词-修改可以映射到模式变量的行数。
    * — 0 or more rows
    + — 1 or more rows
    ? — 0 or 1 rows
    { n } — exactly n rows (n > 0)
    { n, } — n or more rows (n ≥ 0)
    { n, m } — between n and m (inclusive) rows (0 ≤ n ≤ m, 0 < m)
    { , m } — between 0 and m (inclusive) rows (m > 0)
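
For example, the falling-price pattern from the first example could be restricted with a bounded quantifier so that a match contains at least two and at most four PRICE_DOWN rows. The following is a sketch of the PATTERN and DEFINE parts only; the conditions are taken from the earlier query:

PATTERN (START_ROW PRICE_DOWN{2,4} PRICE_UP)
DEFINE
    PRICE_DOWN AS
        (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR
            PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1),
    PRICE_UP AS
        PRICE_UP.price > LAST(PRICE_DOWN.price, 1)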

Patterns that can potentially produce an empty match are not supported. Examples of such patterns are PATTERN (A*), PATTERN (A? B*), PATTERN (A{0,} B{0,} C*), etc.
不支持可能产生空匹配的模式。这种模式的示例是PATTERN (A*), PATTERN (A? B*), PATTERN (A{0,} B{0,} C*)等。

Greedy & Reluctant Quantifiers 贪婪与不情愿量词

Each quantifier can be either greedy (default behavior) or reluctant. Greedy quantifiers try to match as many rows as possible while reluctant quantifiers try to match as few as possible.
每个量词可以是贪婪的(默认行为),也可以是不情愿的。贪婪的量词试图匹配尽可能多的行,而不情愿的量词尝试匹配尽可能少的行。

In order to illustrate the difference, one can view the following example with a query where a greedy quantifier is applied to the B variable:
为了说明差异,可以通过查询查看以下示例,其中贪婪量词应用于B变量:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE(
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            C.price AS lastPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B* C)
        DEFINE
            A AS A.price > 10,
            B AS B.price < 15,
            C AS C.price > 12
    )

Given we have the following input:
鉴于我们有以下输入:

 symbol  tax   price          rowtime
======= ===== ======== =====================
 XYZ     1     10       2018-09-17 10:00:02
 XYZ     2     11       2018-09-17 10:00:03
 XYZ     1     12       2018-09-17 10:00:04
 XYZ     2     13       2018-09-17 10:00:05
 XYZ     1     14       2018-09-17 10:00:06
 XYZ     2     16       2018-09-17 10:00:07

The pattern above will produce the following output:
上述模式将产生以下输出:

 symbol   lastPrice
======== ===========
 XYZ      16

The same query where B* is modified to B*?, which means that B* should be reluctant, will produce:
将B*修改为B*?(即B*是不情愿的)的相同查询将产生:

 symbol   lastPrice
======== ===========
 XYZ      13
 XYZ      16

The pattern variable B matches only the row with price 12 instead of swallowing the rows with prices 12, 13, and 14.
模式变量B仅匹配价格为12的行,而不是吞下价格为12、13和14的行。

It is not possible to use a greedy quantifier for the last variable of a pattern. Thus, a pattern like (A B*) is not allowed. This can be easily worked around by introducing an artificial state (e.g. C) that has a negated condition of B. So you could use a query like:
不可能对模式的最后一个变量使用贪婪的量词。因此,不允许类似 (A B*)的模式。通过引入一个具有否定条件B的人工状态(例如C),可以很容易地解决这个问题。因此,可以使用如下查询:

PATTERN (A B* C)
DEFINE
    A AS condA(),
    B AS condB(),
    C AS NOT condB()

Attention The optional reluctant quantifier (A?? or A{0,1}?) is not supported right now.
注意:可选的不情愿量词(A??或A{0,1}?)目前不受支持。

Time constraint 时间限制

Especially for streaming use cases, it is often required that a pattern finishes within a given period of time. This allows for limiting the overall state size that Flink has to maintain internally, even in case of greedy quantifiers.
特别是对于流式用例,通常要求模式在给定的时间段内完成。这允许限制Flink必须在内部维护的总体状态大小,即使是贪婪的量词。

Therefore, Flink SQL supports the additional (non-standard SQL) WITHIN clause for defining a time constraint for a pattern. The clause can be defined after the PATTERN clause and takes an interval of millisecond resolution.
因此,Flink SQL支持额外的(非标准SQL)WITHIN子句来定义模式的时间约束。该子句可以在PATTERN子句之后定义,并采用毫秒分辨率的间隔。

If the time between the first and last event of a potential match is longer than the given value, such a match will not be appended to the result table.
如果潜在匹配的第一个和最后一个事件之间的时间长于给定值,则不会将此类匹配附加到结果表中。

Note It is generally encouraged to use the WITHIN clause as it helps Flink with efficient memory management. Underlying state can be pruned once the threshold is reached.
注意:通常建议使用WITHIN子句,因为它有助于Flink进行有效的内存管理。一旦达到阈值,就可以修剪基础状态。

Attention However, the WITHIN clause is not part of the SQL standard. The recommended way of dealing with time constraints might change in the future.
注意然而,WITHIN子句不是SQL标准的一部分。建议的处理时间限制的方法将来可能会改变。

The use of the WITHIN clause is illustrated in the following example query:
以下示例查询中说明了WITHIN子句的用法:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE(
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            C.rowtime AS dropTime,
            A.price - C.price AS dropDiff
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A B* C) WITHIN INTERVAL '1' HOUR
        DEFINE
            B AS B.price > A.price - 10,
            C AS C.price < A.price - 10
    )

The query detects a price drop of 10 that happens within an interval of 1 hour.
该查询检测在1小时内发生的幅度为10的价格下跌。

Let’s assume the query is used to analyze the following ticker data:
让我们假设该查询用于分析以下股票数据:

symbol         rowtime         price    tax
======  ====================  ======= =======
'ACME'  '01-Apr-11 10:00:00'   20      1
'ACME'  '01-Apr-11 10:20:00'   17      2
'ACME'  '01-Apr-11 10:40:00'   18      1
'ACME'  '01-Apr-11 11:00:00'   11      3
'ACME'  '01-Apr-11 11:20:00'   14      2
'ACME'  '01-Apr-11 11:40:00'   9       1
'ACME'  '01-Apr-11 12:00:00'   15      1
'ACME'  '01-Apr-11 12:20:00'   14      2
'ACME'  '01-Apr-11 12:40:00'   24      2
'ACME'  '01-Apr-11 13:00:00'   1       2
'ACME'  '01-Apr-11 13:20:00'   19      1

The query will produce the following results:
查询将产生以下结果:

symbol         dropTime         dropDiff
======  ====================  =============
'ACME'  '01-Apr-11 13:00:00'      14

The resulting row represents a price drop from 15 (at 01-Apr-11 12:00:00) to 1 (at 01-Apr-11 13:00:00). The dropDiff column contains the price difference.
结果行表示价格从15 (at 01-Apr-11 12:00:00)下降到1 (at 01-Apr-11 13:00:00)。dropDiff列包含价差。

Notice that even though prices also drop by higher values, for example, by 11 (between 01-Apr-11 10:00:00 and 01-Apr-11 11:40:00), the time difference between those two events is larger than 1 hour. Thus, they don’t produce a match.
请注意,尽管价格也下降了更高的值,例如,下降了11 (between 01-Apr-11 10:00:00 and 01-Apr-11 11:40:00),但这两个事件之间的时间差大于1小时。因此,它们不会产生匹配项。

Output Mode

The output mode describes how many rows should be emitted for every found match. The SQL standard describes two modes:
输出模式描述了对于每个找到的匹配应该发出多少行。SQL标准描述了两种模式:

  • ALL ROWS PER MATCH
    每个匹配的所有行
  • ONE ROW PER MATCH.
    每个匹配一行。

Currently, the only supported output mode is ONE ROW PER MATCH that will always produce one output summary row for each found match.
目前,唯一支持的输出模式是ONE ROW PER MATCH,它将始终为每个找到的匹配生成一个输出摘要行。

The schema of the output row will be a concatenation of [partitioning columns] + [measures columns] in that particular order.
输出行的模式将是按特定顺序的[partitioning columns] + [measures columns]的串联。

The following example shows the output of a query defined as:
以下示例显示了定义的查询的输出:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE(
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            FIRST(A.price) AS startPrice,
            LAST(A.price) AS topPrice,
            B.price AS lastPrice
        ONE ROW PER MATCH
        PATTERN (A+ B)
        DEFINE
            A AS LAST(A.price, 1) IS NULL OR A.price > LAST(A.price, 1),
            B AS B.price < LAST(A.price)
    )

For the following input rows:
对于以下输入行:

 symbol   tax   price          rowtime
======== ===== ======== =====================
 XYZ      1     10       2018-09-17 10:00:02
 XYZ      2     12       2018-09-17 10:00:03
 XYZ      1     13       2018-09-17 10:00:04
 XYZ      2     11       2018-09-17 10:00:05

The query will produce the following output:
查询将生成以下输出:

 symbol   startPrice   topPrice   lastPrice
======== ============ ========== ===========
 XYZ      10           13         11

The pattern recognition is partitioned by the symbol column. Even though not explicitly mentioned in the MEASURES clause, the partitioned column is added at the beginning of the result.
模式识别由symbol列分区。尽管在MEASURES子句中没有明确提及,但分区列还是添加在结果的开头。

Pattern Navigation 模式导航

The DEFINE and MEASURES clauses allow for navigating within the list of rows that (potentially) match a pattern.
DEFINE和MEASURES子句允许在(可能)匹配模式的行列表中进行导航。

This section discusses this navigation for declaring conditions or producing output results.
本节讨论如何使用这种导航来声明条件或生成输出结果。

Pattern Variable Referencing

A pattern variable reference allows a set of rows mapped to a particular pattern variable in the DEFINE or MEASURES clauses to be referenced.
模式变量引用允许引用一组映射到DEFINE或MEASURES子句中特定模式变量的行。

For example, the expression A.price describes a set of rows mapped so far to A plus the current row if we try to match the current row to A. If an expression in the DEFINE/MEASURES clause requires a single row (e.g. A.price or A.price > 10), it selects the last value belonging to the corresponding set.
例如,如果我们尝试将当前行匹配到A,表达式A.price描述的是迄今为止映射到A的行的集合加上当前行。如果DEFINE/MEASURES子句中的表达式需要单行(例如A.price或A.price > 10),它将选择属于相应集合的最后一个值。

If no pattern variable is specified (e.g. SUM(price)), an expression references the default pattern variable * which references all variables in the pattern. In other words, it creates a list of all the rows mapped so far to any variable plus the current row.
如果未指定模式变量(e.g. SUM(price)),表达式将引用默认模式变量*,该变量引用模式中的所有变量。换句话说,它创建了一个列表,其中包含迄今为止映射到任何变量的所有行以及当前行。

Example

For a more thorough example, one can take a look at the following pattern and corresponding conditions:
对于更彻底的示例,可以查看以下模式和相应的条件:

PATTERN (A B+)
DEFINE
  A AS A.price >= 10,
  B AS B.price > A.price AND SUM(price) < 100 AND SUM(B.price) < 80

The following table describes how those conditions are evaluated for each incoming event.
下表描述了如何为每个传入事件评估这些条件。

The table consists of the following columns:
该表由以下列组成:

  • # - the row identifier that uniquely identifies an incoming row in the lists [A.price]/[B.price]/[price].
    #-[A.price]/[B.price]/[price]列表中输入行的唯一行标识符。
  • price - the price of the incoming row.
    price-传入行的价格。
  • [A.price]/[B.price]/[price] - describe lists of rows which are used in the DEFINE clause to evaluate conditions.
    [A.price]/[B.price]/[price]-描述DEFINE子句中用于评估条件的行列表。
  • Classifier - the classifier of the current row which indicates the pattern variable the row is mapped to.
    分类器-当前行的分类器,指示该行映射到的模式变量。
  • A.price/B.price/SUM(price)/SUM(B.price) - describes the result after those expressions have been evaluated.
    A.price/B.price/SUM(price)/SUM(B.price) - 描述这些表达式求值后的结果。

[table image omitted: evaluation of the conditions for each incoming row]
As can be seen in the table, the first row is mapped to pattern variable A and subsequent rows are mapped to pattern variable B. However, the last row does not fulfill the B condition because the sum over all mapped rows SUM(price) and the sum over all rows in B exceed the specified thresholds.
如表所示,第一行映射到模式变量A,随后的行映射到模式变量B。但是,最后一行不满足B条件,因为所有映射行的总和SUM(price)和B中所有行的总和超过了指定的阈值。

Logical Offsets

Logical offsets enable navigation within the events that were mapped to a particular pattern variable. This can be expressed with two corresponding functions:
逻辑偏移允许在映射到特定模式变量的事件中进行导航。这可以用两个相应的函数表示:

Offset functions:

  • LAST(variable.field, n) - Returns the value of the field from the event that was mapped to the n-th last element of the variable. The counting starts at the last element mapped. 返回映射到变量的倒数第n个元素的事件的字段值。计数从最后一个映射的元素开始。
  • FIRST(variable.field, n) - Returns the value of the field from the event that was mapped to the n-th element of the variable. The counting starts at the first element mapped. 返回映射到变量的第n个元素的事件的字段值。计数从第一个映射的元素开始。

Examples

For a more thorough example, one can take a look at the following pattern and corresponding conditions:
对于更彻底的示例,可以查看以下模式和相应的条件:

PATTERN (A B+)
DEFINE
  A AS A.price >= 10,
  B AS (LAST(B.price, 1) IS NULL OR B.price > LAST(B.price, 1)) AND
       (LAST(B.price, 2) IS NULL OR B.price > 2 * LAST(B.price, 2))

The following table describes how those conditions are evaluated for each incoming event.
下表描述了如何为每个传入事件评估这些条件。

The table consists of the following columns:
该表由以下列组成:

  • price - the price of the incoming row.
    price-传入行的价格。
  • Classifier - the classifier of the current row which indicates the pattern variable the row is mapped to.
    分类器-当前行的分类器,指示该行映射到的模式变量。
  • LAST(B.price, 1)/LAST(B.price, 2) - describes the result after those expressions have been evaluated.
    LAST(B.price, 1)/LAST(B.price, 2) - 描述这些表达式计算后的结果。
[table image omitted: evaluation of LAST(B.price, 1) and LAST(B.price, 2) for each incoming row]

It might also make sense to use the default pattern variable with logical offsets.
使用带有逻辑偏移的默认模式变量也可能有意义。

In this case, an offset considers all the rows mapped so far:
在这种情况下,偏移量考虑到迄今为止映射的所有行:

PATTERN (A B? C)
DEFINE
  B AS B.price < 20,
  C AS LAST(price, 1) < C.price

[table image omitted: evaluation with the default pattern variable and logical offsets]

If the second row did not map to the B variable, we would have the following results:
如果第二行没有映射到B变量,我们将得到以下结果:

[table image omitted: results if the second row had not been mapped to B]

It is also possible to use multiple pattern variable references in the first argument of the FIRST/LAST functions. This way, one can write an expression that accesses multiple columns. However, all of them must use the same pattern variable. In other words, the value of the LAST/FIRST function must be computed in a single row.
也可以在FIRST/LAST函数的第一个参数中使用多个模式变量引用。这样,可以编写访问多个列的表达式。但是,它们都必须使用相同的模式变量。换句话说,LAST/FIRST函数的值必须在一行中计算。

Thus, it is possible to use LAST(A.price * A.tax), but an expression like LAST(A.price * B.tax) is not allowed.
因此,可以使用LAST(A.price * A.tax),但不允许使用类似LAST(A.price * B.tax)的表达式。

After Match Strategy

The AFTER MATCH SKIP clause specifies where to start a new matching procedure after a complete match was found.
AFTER MATCH SKIP子句指定在找到完全匹配后在何处开始新的匹配过程。

There are four different strategies:
有四种不同的策略:

  • SKIP PAST LAST ROW - resumes the pattern matching at the next row after the last row of the current match.
    SKIP PAST LAST ROW - 在当前匹配的最后一行之后的下一行恢复模式匹配。
  • SKIP TO NEXT ROW - continues searching for a new match starting at the next row after the starting row of the match.
    SKIP TO NEXT ROW - 继续搜索从匹配开始行后的下一行开始的新的匹配。
  • SKIP TO LAST variable - resumes the pattern matching at the last row that is mapped to the specified pattern variable.
    SKIP TO LAST variable - 在映射到指定模式变量的最后一行恢复模式匹配。
  • SKIP TO FIRST variable - resumes the pattern matching at the first row that is mapped to the specified pattern variable.
    SKIP TO FIRST variable - 在映射到指定模式变量的第一行的恢复模式匹配。

This is also a way to specify how many matches a single event can belong to. For example, with the SKIP PAST LAST ROW strategy every event can belong to at most one match.
这也是一种指定单个事件可以属于多少个匹配的方法。例如,使用SKIP PAST LAST ROW策略,每个事件最多只能属于一个匹配。

Examples

In order to better understand the differences between those strategies one can take a look at the following example.
为了更好地理解这些策略之间的差异,我们可以看看下面的例子。

For the following input rows:
对于以下输入行:

 symbol   tax   price         rowtime
======== ===== ======= =====================
 XYZ      1     7       2018-09-17 10:00:01
 XYZ      2     9       2018-09-17 10:00:02
 XYZ      1     10      2018-09-17 10:00:03
 XYZ      2     5       2018-09-17 10:00:04
 XYZ      2     10      2018-09-17 10:00:05
 XYZ      2     7       2018-09-17 10:00:06
 XYZ      2     14      2018-09-17 10:00:07

We evaluate the following query with different strategies:
我们使用不同的策略评估以下查询:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE(
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            SUM(A.price) AS sumPrice,
            FIRST(rowtime) AS startTime,
            LAST(rowtime) AS endTime
        ONE ROW PER MATCH
        [AFTER MATCH STRATEGY]
        PATTERN (A+ C)
        DEFINE
            A AS SUM(A.price) < 30
    )

The query returns the sum of the prices of all rows mapped to A and the first and last timestamp of the overall match.
查询返回映射到A的所有行的价格之和以及整个匹配的第一个和最后一个时间戳。

The query will produce different results based on which AFTER MATCH strategy was used:
根据使用的AFTER MATCH策略,查询将产生不同的结果:

AFTER MATCH SKIP PAST LAST ROW #

 symbol   sumPrice        startTime              endTime
======== ========== ===================== =====================
 XYZ      26         2018-09-17 10:00:01   2018-09-17 10:00:04
 XYZ      17         2018-09-17 10:00:05   2018-09-17 10:00:07

The first result matched against the rows #1, #2, #3, #4.
第一个结果与行#1、#2、#3、#4匹配。

The second result matched against the rows #5, #6, #7.
第二个结果与行#5、#6、#7匹配。

AFTER MATCH SKIP TO NEXT ROW #

 symbol   sumPrice        startTime              endTime
======== ========== ===================== =====================
 XYZ      26         2018-09-17 10:00:01   2018-09-17 10:00:04
 XYZ      24         2018-09-17 10:00:02   2018-09-17 10:00:05
 XYZ      25         2018-09-17 10:00:03   2018-09-17 10:00:06
 XYZ      22         2018-09-17 10:00:04   2018-09-17 10:00:07
 XYZ      17         2018-09-17 10:00:05   2018-09-17 10:00:07

Again, the first result matched against the rows #1, #2, #3, #4.
同样,第一个结果与行#1、#2、#3、#4匹配。

Compared to the previous strategy, the next match includes row #2 again for the next matching. Therefore, the second result matched against the rows #2, #3, #4, #5.
与之前的策略相比,下一个匹配再次包括第2行,用于下一个搜索。因此,第二个结果与行#2、#3、#4、#5匹配。

The third result matched against the rows #3, #4, #5, #6.
第三个结果与行#3、#4、#5、#6匹配。

The fourth result matched against the rows #4, #5, #6, #7.
第四个结果与行#4、#5、#6、#7匹配。

The last result matched against the rows #5, #6, #7.
最后一个结果与行#5、#6、#7匹配。

AFTER MATCH SKIP TO LAST A #

 symbol   sumPrice        startTime              endTime
======== ========== ===================== =====================
 XYZ      26         2018-09-17 10:00:01   2018-09-17 10:00:04
 XYZ      25         2018-09-17 10:00:03   2018-09-17 10:00:06
 XYZ      17         2018-09-17 10:00:05   2018-09-17 10:00:07

Again, the first result matched against the rows #1, #2, #3, #4.
同样,第一个结果与行#1、#2、#3、#4匹配。

Compared to the previous strategy, the next match includes only row #3 (mapped to A) again for the next matching. Therefore, the second result matched against the rows #3, #4, #5, #6.
与上一个策略相比,下一个匹配只包括第3行(映射到A),再次用于下一次匹配。因此,第二个结果与行#3、#4、#5、#6匹配。

The last result matched against the rows #5, #6, #7.
最后一个结果与行#5、#6、#7匹配。

AFTER MATCH SKIP TO FIRST A #
This combination will produce a runtime exception because one would always try to start a new match where the last one started. This would produce an infinite loop and, thus, is prohibited.
这种组合将产生运行时异常,因为总是会尝试在上一个匹配开始的地方开始新的匹配。这将产生无限循环,因此被禁止。

One has to keep in mind that in case of the SKIP TO FIRST/LAST variable strategy it might be possible that there are no rows mapped to that variable (e.g. for pattern A*). In such cases, a runtime exception will be thrown as the standard requires a valid row to continue the matching.
必须记住,在SKIP TO FIRST/LAST variable策略的情况下,可能没有映射到该变量的行(例如模式A*)。在这种情况下,将引发runtime异常,因为标准要求存在有效行才能继续匹配。

Time attributes

In order to apply subsequent queries on top of the MATCH_RECOGNIZE clause, it might be required to use time attributes. The following two functions are available to select them:
为了在MATCH_RECOGNIZE之上应用后续查询,可能需要使用时间属性。可以使用以下两个函数来选择时间属性:

  • MATCH_ROWTIME([rowtime_field]) - Returns the timestamp of the last row that was mapped to the given pattern. The function accepts zero or one operand which is a field reference with a rowtime attribute. If there is no operand, the function returns a rowtime attribute with TIMESTAMP type. Otherwise, the return type is the same as the operand type. The resulting attribute is a rowtime attribute that can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. 返回映射到给定模式的最后一行的时间戳。该函数接受零个或一个操作数,该操作数是具有rowtime属性的字段引用。如果没有操作数,函数将返回TIMESTAMP类型的rowtime属性。否则,返回类型将与操作数类型相同。生成的属性是一个rowtime属性,可用于后续基于时间的操作,如间隔联接、分组窗口或over窗口聚合。
  • MATCH_PROCTIME() - Returns a proctime attribute that can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. 返回一个proctime属性,该属性可用于后续基于时间的操作,如间隔联接、分组窗口或over窗口聚合。
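
The following sketch shows how MATCH_ROWTIME() could be exposed in the MEASURES clause so that the result keeps a rowtime attribute. The pattern and conditions are illustrative only and reuse the Ticker table:

SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            MATCH_ROWTIME() AS matchRowtime,  -- rowtime attribute of the last mapped row
            LAST(A.price) AS lastPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (A+ B)
        DEFINE
            A AS A.price < 15
    ) MR;

Because matchRowtime is a rowtime attribute, the result of this query could then be used in subsequent time-based operations such as an interval join or a group window aggregation.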

Controlling Memory Consumption 控制内存消耗

Memory consumption is an important consideration when writing MATCH_RECOGNIZE queries, as the space of potential matches is built in a breadth-first-like manner. Having that in mind, one must make sure that the pattern can finish. Preferably with a reasonable number of rows mapped to the match as they have to fit into memory.
当编写MATCH_RECOGNIZE查询时,内存消耗是一个重要的考虑因素,因为潜在匹配的空间是以宽度优先的方式构建的。考虑到这一点,必须确保模式能够完成。最好有合理数量的行映射到匹配项,因为它们必须适合内存。

For example, the pattern must not have a quantifier without an upper limit that accepts every single row. Such a pattern could look like this:
例如,模式中不能有一个没有上限且会接受每一行的量词。这样的模式可能如下:

PATTERN (A B+ C)
DEFINE
  A as A.price > 10,
  C as C.price > 20

The query will map every incoming row to the B variable and thus will never finish. This query could be fixed, e.g., by negating the condition for C:
查询将把每个传入行映射到B变量,因此永远不会完成。这个查询可以被修复,例如,通过否定C的条件:

PATTERN (A B+ C)
DEFINE
  A as A.price > 10,
  B as B.price <= 20,
  C as C.price > 20

Or by using the reluctant quantifier:
或者使用不情愿的量词:

PATTERN (A B+? C)
DEFINE
  A as A.price > 10,
  C as C.price > 20

Attention Please note that the MATCH_RECOGNIZE clause does not use a configured state retention time. One may want to use the WITHIN clause for this purpose.
注意请注意,MATCH_RECOGNIZE子句不使用配置的状态保留时间。为此,可能需要使用WITHIN子句。
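
For instance, the reluctant variant above could additionally be bounded in time with a WITHIN clause, which lets Flink prune partial matches once the interval has passed. This is a sketch only, reusing the hypothetical price conditions from above:

PATTERN (A B+? C) WITHIN INTERVAL '1' HOUR
DEFINE
    A as A.price > 10,
    C as C.price > 20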

Known Limitations

Flink’s implementation of the MATCH_RECOGNIZE clause is an ongoing effort, and some features of the SQL standard are not yet supported.
Flink对MATCH_RECOGNIZE子句的实现是一项持续的工作,SQL标准的一些特性还不受支持。

Unsupported features include:
不支持的功能包括:

  • Pattern expressions: 模式表达式:
    - Pattern groups - this means that e.g. quantifiers can not be applied to a subsequence of the pattern. Thus, (A (B C)+) is not a valid pattern.
      Pattern groups - 这意味着例如量词不能应用于模式的子序列。因此,(A (B C)+)不是有效的模式。
    - Alternation - patterns like PATTERN((A B | C D) E), which means that either a subsequence A B or C D has to be found before looking for the E row.
      Alternation - 像PATTERN((A B | C D) E)这样的模式,这意味着在查找E行之前必须找到子序列A B或C D。
    - PERMUTE operator - which is equivalent to all permutations of the variables that it was applied to, e.g. PATTERN (PERMUTE (A, B, C)) = PATTERN (A B C | A C B | B A C | B C A | C A B | C B A).
      PERMUTE operator - 它等效于所应用变量的所有排列,例如PATTERN (PERMUTE (A, B, C)) = PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)。
    - Anchors - ^, $, which denote the beginning/end of a partition; those do not make sense in the streaming context and will not be supported.
      Anchors - ^、$,表示分区的开始/结束,这些在流上下文中没有意义,也不受支持。
    - Exclusion - PATTERN ({- A -} B) meaning that A will be looked for but will not participate in the output. This works only for the ALL ROWS PER MATCH mode.
      Exclusion - PATTERN ({- A -} B)表示将查找A,但A不会参与输出。这仅适用于ALL ROWS PER MATCH模式。
    - Reluctant optional quantifier - PATTERN (A??); only the greedy optional quantifier is supported.
      不情愿的可选量词 - PATTERN (A??);只支持贪婪的可选量词。
  • ALL ROWS PER MATCH output mode - which produces an output row for every row that participated in the creation of a found match. This also means:
    ALL ROWS PER MATCH output mode - 为参与创建找到的匹配的每一行生成一个输出行。这也意味着:
    - the only supported semantic for the MEASURES clause is FINAL
      MEASURES子句唯一支持的语义是FINAL
    - the CLASSIFIER function, which returns the pattern variable that a row was mapped to, is not yet supported.
      CLASSIFIER函数返回一行映射到的模式变量,目前尚不受支持。
  • SUBSET - which allows creating logical groups of pattern variables and using those groups in the DEFINE and MEASURES clauses.
    SUBSET - 允许创建模式变量的逻辑组,并在DEFINE和MEASURES子句中使用这些组。
  • Physical offsets - PREV/NEXT, which indexes all events seen rather than only those that were mapped to a pattern variable (as in logical offsets case).
    物理偏移量-PREV/NEXT,它索引所有看到的事件,而不仅仅是映射到模式变量的事件(如逻辑偏移量情况)。
  • Extracting time attributes - there is currently no possibility to get a time attribute for subsequent time-based operations.
    提取时间属性-目前无法为后续基于时间的操作获取时间属性。
  • MATCH_RECOGNIZE is supported only for SQL. There is no equivalent in the Table API.
    仅SQL支持MATCH_RECOGNIZE。表API中没有等效项。
  • Aggregations:
    - DISTINCT aggregations are not supported.
