1. Return all airports in the "airports" dataset:
FOR airport IN airports
RETURN airport
2. Return only the airports in California:
FOR airport IN airports
FILTER airport.state == "CA"
RETURN airport
3. Return the number of airports per state:
FOR airport IN airports
COLLECT state = airport.state
WITH COUNT INTO counter
RETURN {state, counter}
In the code examples above, all keywords such as COLLECT, WITH and RETURN are written in uppercase, but this is only a convention. You can also write all keywords in lowercase or in mixed case. Variable names, attribute names and collection names, however, are case-sensitive.
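For instance, the California query from earlier works just as well with lowercase keywords (note that the collection name airports and the attribute name state must keep their exact casing):

for airport in airports
filter airport.state == "CA"
return airport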
1. Return all airports that can be reached from Los Angeles International Airport (LAX):
FOR airport IN OUTBOUND 'airports/LAX' flights
RETURN DISTINCT airport
2. Return 10 flights departing from LAX together with their destination airports:
FOR airport, flight IN OUTBOUND 'airports/LAX' flights
LIMIT 10
RETURN {airport, flight}
For traversals with a minimum depth greater than 2, there are two options to choose from:
Depth-first (default): follow the edges from the start vertex to the last vertex on that path, or until the maximum traversal depth is reached, then walk down the other paths.
Breadth-first (optional): follow all edges from the start vertex to the next level, then follow all edges of those neighbors to the next level, and continue this pattern until there are no more edges to follow or the maximum traversal depth is reached.
Return all airports that LAX has direct flights to:
FOR airport IN OUTBOUND 'airports/LAX' flights
OPTIONS {bfs: true, uniqueVertices: 'global'}
RETURN airport
Return the same airports and compare the execution time with the previous query:
FOR airport IN OUTBOUND 'airports/LAX' flights
RETURN DISTINCT airport
FOR airport IN OUTBOUND 'airports/LAX' flights
OPTIONS {bfs: true, uniqueVertices: 'global'}
RETURN DISTINCT airport
Comparing the results of the two runs, you will see a significant performance improvement. That is, in certain scenarios a breadth-first traversal speeds up the query considerably.
The results of simple expressions as well as entire subqueries can be stored in variables. To declare a variable, use the LET keyword, followed by the variable name, an equals sign and the expression. If the expression is a subquery, the code must be enclosed in parentheses.
In the following example, the hours and minutes of the departure time are computed in advance and stored in the variables h and m.
FOR f IN flights
FILTER f._from == 'airports/BIS'
LIMIT 100
LET h = FLOOR(f.DepTime / 100)
LET m = f.DepTime % 100
RETURN {
year: f.Year,
month: f.Month,
day: f.DayofMonth,
time: f.DepTime,
iso: DATE_ISO8601(f.Year, f.Month, f.DayofMonth, h, m)
}
A shortest path query finds the connection between two given documents that uses the fewest number of edges.
Find the shortest path between the airports BIS and JFK:
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
Using the query explainer, we can inspect the execution plan of this default traversal: the start node and target node appear in the node comments, while no indexes are used and no optimizer rules are applied:
Query String (81 chars, cacheable : true):
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 ShortestPathNode 0 - FOR v /* vertex */ IN OUTBOUND SHORTEST_PATH 'airports/BIS' /* startnode */ TO 'airports/JFK' /* targetnode */ flights
3 ReturnNode 0 - RETURN v
Indexes used:
none
Shortest paths on graphs:
Id Vertex collections Edge collections
2 flights
Optimization rules applied:
none
Return the minimum number of flights from BIS to JFK:
LET airports = (
FOR v IN OUTBOUND
SHORTEST_PATH 'airports/BIS'
TO 'airports/JFK' flights
RETURN v
)
RETURN LENGTH(airports) - 1
The LENGTH() function returns the number of records in a result set; here it expresses the shortest path depth (the number of vertices minus one is the number of flights).
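As a minimal illustration (the intermediate stop DEN is made up for this sketch), LENGTH() simply counts the elements of an array, so a path of three vertices corresponds to two flights:

LET path = [ 'airports/BIS', 'airports/DEN', 'airports/JFK' ]
RETURN LENGTH(path) - 1

This returns 2, the number of edges on the path.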
Goal: find the path between BIS and JFK with the shortest travel time.
Filter all paths from BIS to JFK. Since the shortest path depth found above is 2, we can directly use "IN 2 OUTBOUND":
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
LIMIT 5
RETURN p
Filter for connections within a single day, using January 1 as an example:
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
LIMIT 5
RETURN p
Use the DATE_DIFF() function to compute the difference between the departure time and the arrival time, then sort the results in ascending order. The DATE_ADD() filter additionally requires a layover of at least 20 minutes before the connecting flight departs:
FOR v, e, p IN 2 OUTBOUND 'airports/BIS' flights
FILTER v._id == 'airports/JFK'
FILTER p.edges[*].Month ALL == 1
FILTER p.edges[*].DayofMonth ALL == 1
FILTER DATE_ADD(p.edges[0].ArrTimeUTC, 20, 'minutes') < p.edges[1].DepTimeUTC
LET flightTime = DATE_DIFF(p.edges[0].DepTimeUTC, p.edges[1].ArrTimeUTC, 'i')
SORT flightTime ASC
LIMIT 5
RETURN { flight: p, time: flightTime }
Let's take a look at the various nodes of this AQL query:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 TraversalNode 1002001 - FOR v /* vertex */, p /* paths */ IN 2..2 /* min..maxPathDepth */ OUTBOUND 'airports/BIS' /* startnode */ flights
3 CalculationNode 1002001 - LET #10 = ((v.`_id` == "airports/JFK") && (p.`edges`[1].`DepTimeUTC` > DATE_ADD(p.`edges`[0].`ArrTimeUTC`, 20, "minutes"))) /* simple expression */
4 FilterNode 1002001 - FILTER #10
11 CalculationNode 1002001 - LET flightTime = DATE_DIFF(p.`edges`[0].`DepTimeUTC`, p.`edges`[1].`ArrTimeUTC`, "i") /* simple expression */
12 SortNode 1002001 - SORT flightTime ASC /* sorting strategy: constrained heap */
13 LimitNode 5 - LIMIT 0, 5
14 CalculationNode 5 - LET #18 = { "flight" : p, "time" : flightTime } /* simple expression */
15 ReturnNode 5 - RETURN #18
Indexes used:
By Name Type Collection Unique Sparse Selectivity Fields Ranges
2 edge edge flights false false 0.10 % [ `_from` ] base OUTBOUND
Functions used:
Name Deterministic Cacheable Uses V8
DATE_ADD true true false
DATE_DIFF true true false
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
2 2..2 flights uniqueVertices: none, uniqueEdges: path FILTER ((p.`edges`[*].`Month` all == 1) && (p.`edges`[*].`DayofMonth` all == 1))
Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-filters-up
3 move-calculations-up-2
4 move-filters-up-2
5 optimize-traversals
6 remove-filter-covered-by-traversal
7 remove-unnecessary-calculations-2
8 fuse-filters
9 sort-limit
In this example, the query has to traverse a great many edges, some of which do not actually need to be visited. We will optimize this with a vertex-centric index.
Try adding a hash index on these three fields:
_from, Month, DayofMonth
In the Web UI, we find that hash and skiplist indexes cannot be added. The official explanation is as follows:
The hash index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type hash is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.
The skiplist index type is deprecated for the RocksDB storage engine. It is the same as the persistent type when using RocksDB. The type skiplist is still allowed for backward compatibility in the APIs, but the web interface does not offer this type anymore.
Here is another passage from the official documentation, on persistent indexes:
The persistent index type is deprecated for the MMFiles storage engine. Use the RocksDB storage engine instead, where all indexes are persistent.
The index types hash, skiplist and persistent are equivalent when using the RocksDB storage engine. The types hash and skiplist are still allowed for backward compatibility in the APIs, but the web interface does not offer these types anymore.
Finally, after adding a persistent index on these 3 fields via the Web UI and re-running the query from STEP 3, performance improved by nearly 30%.
Explanation of the principle:
Without a vertex-centric index, all outgoing edges of the departure airport have to be followed and then checked against our conditions (departing on a certain day, arriving at the desired destination, with a feasible layover).
The new index allows a fast lookup of the outgoing edges of an airport (the _from attribute) on a certain day (the Month and DayofMonth attributes), which eliminates the need to fetch and filter the edges of all the other days. It reduces the number of edges that have to be checked compared with the original index and saves a considerable amount of time.
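The access pattern that the combined index covers corresponds to an equality lookup such as the following sketch (not one of the original queries), which fetches only the edges leaving BIS on January 1:

FOR f IN flights
FILTER f._from == 'airports/BIS'
AND f.Month == 1
AND f.DayofMonth == 1
RETURN f

The traversal's internal edge lookups benefit in the same way, because all three attributes are compared with the equality operator.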
Here are some excerpts from the official documentation on hash index usage:
A hash index can be used to quickly find documents with specific attribute values. The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
A hash index can be created on one or multiple document attributes. A hash index will only be used by a query if all index attributes are present in the search condition, and if all attributes are compared using the equality (==) operator. Hash indexes are used from within AQL and several query functions, e.g. byExample, firstExample etc.
Hash indexes can optionally be declared unique, then disallowing saving the same value(s) in the indexed attribute(s). Hash indexes can optionally be sparse.
The different types of hash indexes have the following characteristics:
unique hash index: all documents in the collection must have different values for the attributes covered by the unique index. Trying to insert a document with the same key value as an already existing document will lead to a unique constraint violation.
This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. A key value of null may only occur once in the index, so this type of index cannot be used for optional attributes.
The unique option can also be used to ensure that no duplicate edges are created, by adding a combined index for the fields _from and _to to an edge collection.
unique, sparse hash index: all documents in the collection must have different values for the attributes covered by the unique index. Documents in which at least one of the index attributes is not set or has a value of null are not included in the index. This type of index can be used to ensure that there are no duplicate keys in the collection for documents which have the indexed attributes set. As the index will exclude documents for which the indexed attributes are null or not set, it can be used for optional attributes.
non-unique hash index: all documents in the collection will be indexed. This type of index is not sparse. Documents that do not contain the index attributes or that have a value of null in the index attribute(s) will still be indexed. Duplicate key values can occur and do not lead to unique constraint violations.
non-unique, sparse hash index: only those documents will be indexed that have all the indexed attributes set to a value other than null. It can be used for optional attributes.
https://www.arangodb.com/docs/stable/indexing-index-basics.html
https://www.cnblogs.com/minglex/p/9383849.html