HQL & JPQL - Part Ⅲ

Joins, reporting queries, and subselects

在抓取策略这篇文章中有提到dynamic fetching，用HQL/Criteria来动态抓取，最终是利用JOIN - 连接查询。

Joining relations and associations

首先要明白inner join和outer join，inner join返回两张表中都满足条件的结果集，left outer join表示主表数据全部返回，但是副表只返回满足条件的数据，如果主表数据在副表中没有，副表字段用null代替。

如下两张表

inner join的结果，注意ID为3的item没有返回：

left join的结果：

HQL and JPQL join options

HQL/JPQL提供了四种JOIN的表达方式：

An implicit association join
An ordinary join in the FROM clause
A fetch join in the FROM cluase
A theta-style join in the WHERE clause

Implicit association joins

隐式连接，使用点操作符来完成。在HQL/JPQL中点操作符的两种主要用途：

Querying components - 查询组件
Expressing implicit association joins - 表达隐式连接

// 查询组件，此时不会有表关联
// Address类属性对应的字段都在User表中
from User u where u.homeAddress.city = 'Bangkok'

// 放在SELECT子句中
select distinct u.homeAddress.city from User u

Implicit association join:

// 从BID到ITEM是many-to-one的关系
from Bid bid where bid.item.description like '%Foo%'

Implicit joins are always directed along many-to-one or one-to-one associations, never through a collection-valued association (you can’t write item.bids.amount).

隐式连接总是从many指向one或从one指向one来引用，永远不要从one指向many.

多张表隐式连接：

// 这里假设从ITEM到CATEGORY是多对一关系
from Bid bid where bid.item.category.name like 'Laptop%' and bid.item.successfulBid.amount > 100

虽然这样写可以，但是要一下子看明白到底关联了几张表还是很费劲，不够简单直白。
如果需要多张表关联，还是不要使用隐式连接。

select ...
from BID B
inner join ITEM I on B.ITEM_ID = I.ITEM_ID
inner join CATEGORY C on I.CATEGORY_ID = C.CATEGORY_ID
inner join BID SB on I.SUCCESSFUL_BID_ID = SB.BID_ID
where C.NAME like 'Laptop%'
and SB.AMOUNT > 100

Joins expressed in the FROM clause

在HQL/JPQL的FROM子句中使用JOIN，首先来看inner join:

// 在FROM子句中使用JOIN，为关联集合设置别名
// join默认为inner join
from Item i join i.bids b where i.description like '%Foo%' and b.amount > 100

生成的SQL如下：

select i.DESCRIPTION, i.INITIAL_PRICE, ...
    b.BID_ID, b.AMOUNT, b.ITEM_ID, b.CREATED_ON
from ITEM i
inner join BID b on i.ITEM_ID = b.ITEM_ID
where i.DESCRIPTION like '%Foo%' and b.AMOUNT > 100

获取查询结果代码如下：

Query q = session.createQuery("from Item i join i.bids b");
Iterator pairs = q.list().iterator();
while (pairs.hasNext()) {
    Object[] pair = (Object[]) pairs.next();
    Item item = (Item) pair[0];
    Bid bid = (Bid) pair[1];
}

返回List中的元素是对象数组，根据HQL中对象的顺序，第一个是Item第二个是Bid。
这里要注意，从ITEM到BID是一对多的关系，所以查询inner join的查询结果中，很可能存在重复的ITEM行，但是在Hibernate中只是重复的对象引用，不是重复的Item对象。

A particular Item may appear multiple times, once for each associated Bid. These duplicate items are duplicate in-memory references, not duplicate instances!

如果只想获取ITEM表数据：

// 添加SELECT子句来指定要查询的对象
select i from Item i join i.bids b where i.description like '%Foo%' and b.amount > 100

获取查询结果代码如下：

Query q = session.createQuery("select i from Item i join i.bids b");
Iterator items = q.list().iterator();
while ( items.hasNext() ) {
    Item item = (Item) items.next();
}

以上都是inner join，left join通常用于动态抓取策略(Dynamic fetch strategy)，动态抓取下节会讲。
下面看个使用left join，但没有使用动态抓取策略：

// 注意with关键字
from Item i left join i.bids b with b.amount > 100 where i.description like '%Foo%'

首先要知道LEFT JOIN和LEFT OUTER JOIN是一回事，通常不使用RIGHT JOIN。此条HQL仍然返回元素是对象数组的List，数组中第一个元素是Item，第二个是Bid。
注意with条件，如果将b.amount > 100放到WHERE子句中，则是对LEFT JOIN之后的集合做过滤，所以没有BID的ITEM就被过滤掉了；如果是放在JOIN之后，则只过滤BID，此时LEFT JOIN，没有BID的ITEM记录最终仍然会被查询出来。

Dynamic fetching strategies with joins

Hibernate默认是fetch on demand的抓取策略，所以默认会为关联实体创建代理，为集合创建collection wrapper，只有当真正需要时才去初始化；上一节的所有查询如果使用默认的抓取策略，则关联属性和collection都不会真正初始化。

你可以修改全局的抓取策略来改变默认行为；还可以保持默认行为，通过Dynamic fetching strategy为某些CASE提供eager fetch。

HQL/JPQL中可以通过FETCH关键字来实现eager fetch:

from Item i left join fetch i.bids where i.description like '%Foo%'

上面HQL返回元素是Item对象的List，并且其bids集合属性已经被初始化。
生成的SQL如下：

select i.DESCRIPTION, i.INITIAL_PRICE, ...
        b.BID_ID, b.AMOUNT, b.ITEM_ID, b.CREATED_ON
from ITEM i
left outer join BID b on i.ITEM_ID = b.ITEM_ID
where i.DESCRIPTION like '%Foo%'

上面HQL中，从Item到Bid是one-to-many，对于many-to-one或one-to-one关联关系，也可以使用FETCH关键字：

from Bid bid
    left join fetch bid.item
    left join fetch bid.bidder where bid.amount > 100

生成的SQL如下：

select b.BID_ID, b.AMOUNT, b.ITEM_ID, b.CREATED_ON
        i.DESCRIPTION, i.INITIAL_PRICE, ...
        u.USERNAME, u.FIRSTNAME, u.LASTNAME, ...
from BID b
left outer join ITEM i on i.ITEM_ID = b.ITEM_ID
left outer join USER u on u.USER_ID = b.BIDDER_ID
where b.AMOUNT > 100

HQL/JPQL注意事项：

You never assign an alias to any fetch-joined association or collection for further restriction or projection. So left join fetch i.bids b where b =...is invalid, whereas left join fetch i.bids b join fetch b.bidder is valid.
You shouldn’t fetch more than one collection in parallel; otherwise you create a Cartesian product. You can fetch as many single-valued associated objects as you like without creating a product.
HQL and JPA QL ignore any fetching strategy you’ve defined in mapping metadata. For example, mapping the bids collection in XML with fetch="join", has no effect on any HQL or JPA QL statement.
If you eager-fetch a collection, duplicates may be returned. Look at figure 14.3: This is exactly the SQL operation that is executed for a select i from Item i join fetch i.bids HQL or JPA QL query. Each Item is duplicated on the left side of the result table as many times as related Bid data is present. The List returned by the HQL or JPA QL query preserves these duplicates as references. If you prefer to filter out these duplicates you need to either wrap the List in a Set (for example, with Set noDupes = new LinkedHashSet(resultList)) or use the DISTINCT keyword: select distinct i from Item i join fetch i.bids —note that in this case the DISTINCT doesn’t operate at the SQL level, but forces Hibernate to filter out duplicates in memory when marshaling the result into objects. Clearly, duplicates can’t be avoided in the SQL result.
Query execution options that are based on the SQL result rows, such as pagination with setMaxResults()/setFirstResult(), are semantically incorrect if a collection is eagerly fetched. If you have an eager fetched collection in your query, at the time of writing, Hibernate falls back to limiting the result in-memory, instead of using SQL. This may be less efficient, so we don’t recommend the use of JOIN FETCH with setMaxResults()/setFirstResult(). Future versions of Hibernate may fall back to a different SQL query strategy (such as two queries and subselect fetching) if setMaxResults()/setFirstResult() is used in combination with a JOIN FETCH.

Theta-style joins

from User user, LogRecord log where user.username = log.username

这就是Theta-style join，两张表的关联条件在WHERE子句中指定。在HQL/JPQL中，当两个实体没有做外键关联时，可以使用Theta-style join，最终是inner join的结果。

查询结果取值：

Iterator i = session.createQuery("from User user, LogRecord log where user.username = log.username").list().iterator();
while (i.hasNext()) {
    Object[] pair = (Object[]) i.next();
    User user = (User) pair[0];
    LogRecord log = (LogRecord) pair[1];
}

Comparing identifiers

HQL/JPQL比较对象引用：

from Item i, User u where i.seller = u and u.username = 'steve'

与此theta-style join等价的from clause join:

from Item i join i.seller u where u.username = 'steve'

与以上两条HQL等价的比较主键(标识符)方式：

// 比较主键
from Item i, User u where i.seller.id = u.id and u.username = 'steve'

HQL guarantees that id always refers to any arbitrarily named identifier property; JPA QL doesn’t. HQL中id总是指向实体的主键属性，不管其主键属性名称是什么。

下面这条theta-style join没有等价的from clause join:

// i.seller和b.bidder都外键与USER表
from Item i, Bid b where i.seller = b.bidder

但是有等价的比较主键方式：

from Item i, Bid b where i.seller.id = b.bidder.id

主键做为查询参数：

Long userId = ...
Query q = session.createQuery("from Comment c where c.fromUser.id = :userId");
q.setLong("userId", userId);
List result = q.list();

下面两条HQL，第一条没有表连接；第二条最终会生成表连接。

// no joins at all
from Bid b where b.item.id = 1

// use a implicit table join
from Bid b where b.item.description like '%Foo%'

Reporting queries

利用group by和聚合函数来查询数据，称为reporting query.
报表查询出的不是实体对象，所以Hibernate不用为其查询结果维护状态(翻译的不好，原文如下)。

You don’t need transactional entity instances and can save the overhead of automatic dirty checking and caching in the persistence context.

Projection with aggregation functions

HQL/JPQL识别的聚合函数有：count(),min(),max(),sum(),avg()。当不使用GROUP BY时，SELECT子句中使用了聚合函数，则SELECT子句中只能有聚合函数，不同有其他字段

Long count = (Long) session.createQuery("select count(i) from Item i").uniqueResult();

// 统计successfulBid属性不为null的数量，为null的被忽略
select count(i.successfulBid) from Item i

// 忽略重复的
select count(distinct i.description) from Item i

// 因为amount属性是BigDecimal类型，所以SUM返回的也是BigDecimal类型
// 这里使用了implict join，从Item到Bid
select sum(i.successfulBid.amount) from Item i

// 返回对象数组Object[]，包含两个BigDecimal对象
select min(bid.amount), max(bid.amount) from Bid bid where bid.item.id = 1

Grouping aggregated results

group by分组：

select u.lastname, count(u) from User u group by u.lastname

生成如下SQL：

// 生成的SQL自动count实体类的主键
select u.LAST_NAME, count(u.USER_ID) from USER u group by u.LAST_NAME

// implicit join
select bid.item.id, count(bid), avg(bid.amount) from Bid bid where bid.item.successfulBid is null group by bid.item.id

// 等价的from clause join
select bidItem.id, count(bid), avg(bid.amount)
    from Bid bid join bid.item bidItem 
    where bidItem.successfulBid is null group by bidItem.id

Restricting groups with having

使用HAVING子句：

select user.lastname, count(user) from User user group by user.lastname having user.lastname like 'A%'

select item.id, count(bid), avg(bid.amount)
    from Item item join item.bids bid
    where item.successfulBid is null group by item.id having count(bid) > 10

Hibernate将查询结果的每一行生成一个对象数组Object[]。

Utilizing dynamic instantiation

动态实例化：

// 将查询结果直接封装成ItemBidSummary对象
select new ItemBidSummary(
    bid.item.id, count(bid), avg(bid.amount)
)
from Bid bid where bid.item.successfulBid is null group by bid.item.id

返回的ItemBidSummary对象是transient state；ItemBidSummary类名要含包名。

Improving performance with report queries

Report queries using projection in HQL and JPA QL let you specify which properties you wish to retrieve. For report queries, you aren’t selecting entities in managed state, but only properties or aggregated values.

This query doesn’t return persistent entity instances, so Hibernate doesn’t add any persistent object to the persistence context cache. This means that no object must be watched for dirty state either.

由于report query不是查询整个实体，所以Hibernate不用维护其查询结果的状态，所以更加高效。

Using subselects

HQL/JPQL不支持SELECT子句的子查询，支持WHERE子句的子查询。

Correlated and uncorrelated nesting

相关子查询 - correlated subquery：

from User u where 10 < (
    select count(i) from u.items i where i.successfulBid is not null
)

uncorrelated subquery:

// 查询出所有和最高amount相差在1块之内的bid
from Bid bid where bid.amount + 1 >= (
    select max(b.amount) from Bid b
)

Quantification

量词关键字：ALL, ANY/SOME, IN:

// 返回这些Item：Item对应的所有Bid的amount都小于100
from Item i where 100 > all ( select b.amount from i.bids b )

// 返回这些Item: Item对应的所有Bid的amount只要有一个大于等于100
from Item i where 100 <= any ( select b.amount from i.bids b )

// 返回这些Item: Item对应的所有Bid的amount只要有一个等于100
from Item i where 100 = some ( select b.amount from i.bids b )

// 返回这些Item: Item对应的所有Bid的amount只有有一个等于100(同上)
from Item i where 100 in ( select b.amount from i.bids b )

此文是对《Java Persistence with Hibernate》第14章第三部分的归纳。