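I have spent the past week doing evaluation analysis. Analysis inevitably means queries and statistics, so here are a few lessons learned that I would like to share.
Row-to-column conversion (pivoting) is probably one of the most common operations in statistical queries. The examples here use the Northwind database; since most readers already know it well, I will not describe the tables involved in detail.
Suppose we want the quantity of each product ordered on each order. We can write: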
SELECT dbo.[Order Details].OrderID,dbo.Categories.CategoryName,dbo.Products.ProductName,dbo.[Order Details].Quantity
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
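For statistics, though, this layout is often not enough. Suppose I now want the quantity of each product category on each order (assuming we already know the categories in advance). This is where row-to-column conversion comes in: values that appear as rows in the result are displayed as columns instead. How is that done?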
SELECT
dbo.Orders.OrderID,
SUM(CASE WHEN CategoryName='Beverages' THEN Quantity ELSE 0 END) AS Beverages,
SUM(CASE WHEN CategoryName='Condiments' THEN Quantity ELSE 0 END) AS Condiments,
SUM(CASE WHEN CategoryName='Confections' THEN Quantity ELSE 0 END) AS Confections,
SUM(CASE WHEN CategoryName='Dairy Products' THEN Quantity ELSE 0 END) AS DairyProducts,
SUM(CASE WHEN CategoryName='Grains/Cereals' THEN Quantity ELSE 0 END) AS GrainsCereals,
SUM(CASE WHEN CategoryName='Meat/Poultry' THEN Quantity ELSE 0 END) AS MeatPoultry,
SUM(CASE WHEN CategoryName='Produce' THEN Quantity ELSE 0 END) AS Produce,
SUM(CASE WHEN CategoryName='Seafood' THEN Quantity ELSE 0 END) AS Seafood
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY orders.OrderID
ORDER BY dbo.Orders.OrderID
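To make the conditional-aggregation pattern easy to try without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The `order_lines` table is a hypothetical, flattened stand-in for the four-table Northwind join, and only two categories are pivoted. It also runs the same query with MAX so the difference from SUM is visible when one order holds several products of the same category.

```python
import sqlite3

# Hypothetical flattened stand-in for the Orders / Order Details / Products /
# Categories join: one row per order line.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [
        (1, "Beverages", 10),
        (1, "Beverages", 5),  # a second Beverages product on the same order
        (1, "Seafood", 2),
        (2, "Seafood", 7),
    ],
)


def pivot(aggregate):
    """Pivot two categories into columns using the given aggregate name."""
    if aggregate not in ("SUM", "MAX"):
        raise ValueError("aggregate must be SUM or MAX")
    sql = f"""
        SELECT order_id,
               {aggregate}(CASE WHEN category = 'Beverages' THEN quantity ELSE 0 END) AS beverages,
               {aggregate}(CASE WHEN category = 'Seafood' THEN quantity ELSE 0 END) AS seafood
        FROM order_lines
        GROUP BY order_id
        ORDER BY order_id
    """
    return conn.execute(sql).fetchall()


sum_rows = pivot("SUM")  # total quantity per category and order
max_rows = pivot("MAX")  # only the largest single line per category and order
print(sum_rows)
print(max_rows)
```

For order 1 the SUM version counts both Beverages lines, while the MAX version keeps only the larger one.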
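That is one approach (note that SUM cannot be replaced with MAX here, because each category contains several products). There are other ways too. If you are on SQL Server 2005 or later, you could write: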
SELECT *
FROM
(SELECT dbo.[Order Details].OrderID,dbo.Categories.CategoryName,dbo.[Order Details].Quantity
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID)
AS A
PIVOT
(
SUM(A.Quantity) FOR A.CategoryName IN ([Beverages],[Condiments],[Confections],[Dairy Products],[Grains/Cereals],[Meat/Poultry],[Produce],[Seafood])
)
AS B
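This shows how powerful the PIVOT operator is; I will not go into its usage in detail. However, as far as I can tell, this approach stops working once the situation gets slightly more complex (perhaps there is a way that I simply do not know yet). For example, suppose the product categories above are further divided into grades, with each category having three levels: Good, General, and Bad. These levels are stored in another table; assume it is called ProductLevel(LevelID, LevelName). The result we want to display looks like this: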
Order ID | Beverages            | Condiments           | …
         | Good | General | Bad | Good | General | Bad | …
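At this point I do not think PIVOT can do the job (as I said above, if anyone knows how to achieve this with PIVOT, please let me know, thank you). With the first method, however, it is easy: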
SELECT
dbo.Orders.OrderID,
SUM(CASE WHEN CategoryName='Beverages' AND LevelName='Good' THEN Quantity ELSE 0 END),
SUM (CASE WHEN CategoryName='Beverages' AND LevelName='General' THEN Quantity ELSE 0 END),
SUM (CASE WHEN CategoryName='Beverages' AND LevelName='Bad' THEN Quantity ELSE 0 END),
…
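Note that this only works with the searched form, "CASE WHEN condition". The simple form, "CASE expression WHEN value", cannot express it (or only very awkwardly). I will not go into the differences between the two forms of CASE here.
Of course, once row-to-column conversion comes up, the next question is what to do when the columns vary. In other words, what if the number of product categories is not fixed at eight?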
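As a rough illustration of the two-level layout, the sketch below runs the searched-CASE pattern in Python's built-in sqlite3 module. The flattened `order_lines` table and its `level` column are assumptions made for the example; in the scenario described above the level name would come from joining the hypothetical ProductLevel table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened rows (order_id, category, level, quantity); the real
# schema would join to a ProductLevel table to obtain the level name.
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, category TEXT, level TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
    [
        (1, "Beverages", "Good", 4),
        (1, "Beverages", "Good", 6),
        (1, "Beverages", "Bad", 1),
        (2, "Condiments", "General", 3),
    ],
)

# One searched-CASE column per (category, level) pair. Each WHEN tests two
# columns at once, which the simple "CASE expression WHEN value" form cannot do.
sql = """
    SELECT order_id,
           SUM(CASE WHEN category = 'Beverages' AND level = 'Good' THEN quantity ELSE 0 END) AS beverages_good,
           SUM(CASE WHEN category = 'Beverages' AND level = 'General' THEN quantity ELSE 0 END) AS beverages_general,
           SUM(CASE WHEN category = 'Beverages' AND level = 'Bad' THEN quantity ELSE 0 END) AS beverages_bad,
           SUM(CASE WHEN category = 'Condiments' AND level = 'Good' THEN quantity ELSE 0 END) AS condiments_good,
           SUM(CASE WHEN category = 'Condiments' AND level = 'General' THEN quantity ELSE 0 END) AS condiments_general,
           SUM(CASE WHEN category = 'Condiments' AND level = 'Bad' THEN quantity ELSE 0 END) AS condiments_bad
    FROM order_lines
    GROUP BY order_id
    ORDER BY order_id
"""
two_level_rows = conn.execute(sql).fetchall()
print(two_level_rows)
```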
DECLARE
@tempStr VARCHAR(max)
SET @tempStr='SELECT dbo.Orders.OrderID'
SELECT @tempStr=@tempStr+', SUM(CASE WHEN CategoryName='''+REPLACE(CategoryName,'''','''''')+''' THEN Quantity ELSE 0 END) AS '+QUOTENAME(CategoryName) FROM (SELECT DISTINCT dbo.Categories.CategoryName FROM dbo.Categories) AS T -- build one SUM(CASE ...) column per category, escaping quotes and naming each column
SELECT @tempStr=@tempStr+' FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY orders.OrderID
ORDER BY dbo.Orders.OrderID'
EXEC(@tempStr)
By assembling the SQL dynamically, we can still solve the problem.
Next, let us look at the following SQL:
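The same idea can be sketched outside T-SQL. The example below, written with Python's built-in sqlite3 module, reads the distinct categories first and then generates one SUM(CASE ...) column per category. The `order_lines` table and the `pivot_by_category` helper are hypothetical names for illustration; unlike the string concatenation above, the category values are bound as parameters rather than spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the Northwind join.
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, "Beverages", 10), (1, "Seafood", 2), (2, "Produce", 7)],
)


def pivot_by_category(connection):
    """Return (header, rows) with one column per category found in the data."""
    categories = [
        row[0]
        for row in connection.execute(
            "SELECT DISTINCT category FROM order_lines ORDER BY category"
        )
    ]
    # One placeholder-based CASE column per category; the names are bound as
    # parameters, so a quote inside a category name cannot break the statement.
    select_list = ["order_id"] + [
        "SUM(CASE WHEN category = ? THEN quantity ELSE 0 END)" for _ in categories
    ]
    sql = (
        "SELECT " + ", ".join(select_list)
        + " FROM order_lines GROUP BY order_id ORDER BY order_id"
    )
    rows = connection.execute(sql, categories).fetchall()
    return ["order_id"] + categories, rows


header, rows = pivot_by_category(conn)
print(header)
print(rows)
```

Adding a row with a new category changes the number of output columns without any change to the code.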
SELECT dbo.[Order Details].OrderID,dbo.Categories.CategoryName
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
ORDER BY orders.OrderID
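Its result shows which product categories each order contains. Since not every order includes every existing category, we sometimes want to see the result in this form: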
Order ID | Categories
10248    | Dairy Products Grains/Cereals Dairy Products
A result like this is hard to produce with simple SQL alone, but SQL Server has more to offer: it is easy to achieve by creating a function.
First, create a function:
CREATE FUNCTION dbo.GetCategorys(@orderID INT)
RETURNS VARCHAR(1000)
AS
BEGIN
-- Accumulate the category name of every line item on the order, separated by spaces.
-- VARCHAR(1000) leaves room for orders with many line items.
DECLARE @tempStr VARCHAR(1000)
SET @tempStr=''
SELECT @tempStr=@tempStr+' '+dbo.Categories.CategoryName FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.[Order Details].OrderID=@orderID
RETURN(LTRIM(@tempStr))
END
Then run:
-- One row per order; joining Order Details here would repeat each order once per line item.
SELECT dbo.Orders.OrderID,dbo.GetCategorys(dbo.Orders.OrderID) AS Categories
FROM dbo.Orders
ORDER BY dbo.Orders.OrderID
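This gives us the result we want.
The title of this part may not be entirely apt, so let me explain. We all know join queries (whether LEFT JOIN, INNER JOIN, RIGHT JOIN, or matching primary and foreign keys in the WHERE clause), but often that is not enough and we need a virtual-table (derived-table) query: join several tables on one side, join several on the other, and then join the two results. This also covers joining one virtual table to one real table.
Suppose we want the largest single order quantity for each category across all orders. That is simple: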
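For comparison, the sketch below produces the same kind of one-row-per-order list with a string-aggregation function instead of a scalar function. It uses `GROUP_CONCAT` from SQLite through Python's built-in sqlite3 module, which is a swapped-in technique and not the method used above; SQL Server 2017 and later offer `STRING_AGG` for a similar purpose. The `order_lines` table is a hypothetical flattened stand-in, and duplicates are removed here, which the function above does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in: one row per order line with its category.
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?)",
    [
        (10248, "Dairy Products"),
        (10248, "Grains/Cereals"),
        (10248, "Dairy Products"),
        (10249, "Produce"),
    ],
)

# DISTINCT in the inner query drops repeated categories within an order;
# GROUP_CONCAT then joins the remaining names into one string per order.
sql = """
    SELECT order_id, GROUP_CONCAT(category, ', ') AS categories
    FROM (SELECT DISTINCT order_id, category FROM order_lines)
    GROUP BY order_id
    ORDER BY order_id
"""
# SQLite does not promise the order of names inside each string, so they are
# split and sorted here before being compared or displayed.
categories_by_order = {
    order_id: sorted(names.split(", ")) for order_id, names in conn.execute(sql)
}
print(categories_by_order)
```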
SELECT dbo.Categories.CategoryName,MAX(dbo.[Order Details].Quantity)
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY dbo.Categories.CategoryName
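But now we also want to know which order holds the highest quantity for each category. In short, the query above has no OrderID, and we need to display it as well. That is a problem: adding the column produces the error "Column 'dbo.Orders.OrderID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.", because we are not grouping by OrderID. We cannot simply add the column to the GROUP BY either, since that would not give the result we want. Here is how to do it: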
SELECT dbo.[Order Details].OrderID,dbo.Categories.CategoryName,dbo.[Order Details].Quantity
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
INNER JOIN
(
SELECT dbo.Categories.CategoryName,MAX(dbo.[Order Details].Quantity) AS MQuantity
FROM dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY dbo.Categories.CategoryName
) AS T ON T.CategoryName=dbo.Categories.CategoryName AND T.MQuantity=dbo.[Order Details].Quantity
ORDER BY dbo.Categories.CategoryName
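This query returns the result we want. Note that it returns ten rows rather than eight, because of ties. The result comes from joining against a virtual table (and in fact that virtual table is itself a join of the same tables).
The power of SQL hardly needs stating. Many things we tend to implement in a programming language can be done in SQL itself, but most of the time we do not make full use of it.
Suppose we know the profit percentage for each product category. How do we compute our profit from it? First let us assume some percentages: for the eight categories Beverages, Condiments, Confections, Dairy Products, Grains/Cereals, Meat/Poultry, Produce, and Seafood, assume profit rates of 1%, 2%, 3%, 4%, 5%, 6%, 7%, and 8% respectively. Now I want to find the profit of a particular order: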
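The derived-table join can be tried on a small data set. The following is a minimal sketch using Python's built-in sqlite3 module; the flattened `order_lines` table is an assumption standing in for the Northwind join, and the sample rows deliberately contain a tie so that one category yields two result rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the Northwind join.
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [
        (1, "Beverages", 10),
        (2, "Beverages", 10),  # ties with order 1 for the Beverages maximum
        (3, "Beverages", 4),
        (1, "Seafood", 2),
        (3, "Seafood", 9),
    ],
)

# The derived table t holds one row per category with its maximum quantity;
# joining it back on both category and quantity recovers the matching orders.
sql = """
    SELECT l.order_id, l.category, l.quantity
    FROM order_lines AS l
    INNER JOIN (
        SELECT category, MAX(quantity) AS max_quantity
        FROM order_lines
        GROUP BY category
    ) AS t ON t.category = l.category AND t.max_quantity = l.quantity
    ORDER BY l.category, l.order_id
"""
top_rows = conn.execute(sql).fetchall()
print(top_rows)
```

Two categories produce three rows here, for the same reason the Northwind query returns ten rows for eight categories.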
DECLARE
@BeveragesCount INT,
@CondimentsCount INT,
@ConfectionsCount INT,
@DairyProductsCount INT,
@GrainsCerealsCount INT,
@MeatPoultryCount INT,
@ProduceCount INT,
@SeafoodCount INT
SELECT @BeveragesCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Beverages'
SELECT @CondimentsCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Condiments'
SELECT @ConfectionsCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Confections'
SELECT @DairyProductsCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Dairy Products'
SELECT @GrainsCerealsCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Grains/Cereals'
SELECT @MeatPoultryCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Meat/Poultry'
SELECT @ProduceCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Produce'
SELECT @SeafoodCount=ISNULL(SUM(dbo.[Order Details].Quantity),0) FROM dbo.Orders INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
WHERE dbo.Orders.OrderID='10248' AND dbo.Categories.CategoryName='Seafood'
SELECT '10248',@BeveragesCount*0.01+@CondimentsCount*0.02+@ConfectionsCount*0.03+@DairyProductsCount*0.04+@GrainsCerealsCount*0.05+@MeatPoultryCount*0.06+@ProduceCount*0.07+@SeafoodCount*0.08
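The SQL above looks tedious, but it is all basic SQL: we use variables to arrive at the result we want. If you look carefully, you will realize there is a simpler way that also yields the profit of every order, using the virtual table and the row-to-column conversion discussed earlier: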
SELECT
OrderID,Beverages*0.01+Condiments*0.02+Confections*0.03+DairyProducts*0.04+GrainsCereals*0.05+MeatPoultry*0.06+Produce*0.07+Seafood*0.08
FROM
(
SELECT
dbo.Orders.OrderID,
SUM(CASE WHEN dbo.Categories.CategoryName='Beverages' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Beverages,
SUM(CASE WHEN dbo.Categories.CategoryName='Condiments' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Condiments,
SUM(CASE WHEN dbo.Categories.CategoryName='Confections' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Confections,
SUM(CASE WHEN dbo.Categories.CategoryName='Dairy Products' THEN dbo.[Order Details].Quantity ELSE 0 END) AS DairyProducts,
SUM(CASE WHEN dbo.Categories.CategoryName='Grains/Cereals' THEN dbo.[Order Details].Quantity ELSE 0 END) AS GrainsCereals,
SUM(CASE WHEN dbo.Categories.CategoryName='Meat/Poultry' THEN dbo.[Order Details].Quantity ELSE 0 END) AS MeatPoultry,
SUM(CASE WHEN dbo.Categories.CategoryName='Produce' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Produce,
SUM(CASE WHEN dbo.Categories.CategoryName='Seafood' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Seafood
FROM
dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY dbo.Orders.OrderID
) AS T
ORDER BY OrderID
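Simple, isn't it? Of course, the point of this part was SQL variables, which are another way to drive statistical queries dynamically. The variable approach can also be extended to find the profit of every order, but I will not go into that here.
INNER JOIN, LEFT JOIN, and RIGHT JOIN combine columns. To combine rows, use UNION or UNION ALL. Totals come up all the time in statistical queries, so below we append a total row to the earlier pivot result, giving the total quantity of each of the eight categories across all orders.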
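A small-scale version of the profit calculation is sketched below with Python's built-in sqlite3 module. It keeps the same shape as the query above (a pivot inside a derived table, weighted in the outer SELECT) but uses only two categories, with the assumed rates of 1% and 2%, and a hypothetical flattened `order_lines` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the Northwind join.
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, "Beverages", 100), (1, "Condiments", 50), (2, "Condiments", 200)],
)

# Inner query: pivot quantities into one column per category.
# Outer query: weight each column by its assumed profit rate.
sql = """
    SELECT order_id, beverages * 0.01 + condiments * 0.02 AS profit
    FROM (
        SELECT order_id,
               SUM(CASE WHEN category = 'Beverages' THEN quantity ELSE 0 END) AS beverages,
               SUM(CASE WHEN category = 'Condiments' THEN quantity ELSE 0 END) AS condiments
        FROM order_lines
        GROUP BY order_id
    ) AS t
    ORDER BY order_id
"""
profit_rows = conn.execute(sql).fetchall()
for order_id, profit in profit_rows:
    print(order_id, round(profit, 2))
```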
SELECT
dbo.Orders.OrderID,
SUM(CASE WHEN dbo.Categories.CategoryName='Beverages' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Beverages,
SUM(CASE WHEN dbo.Categories.CategoryName='Condiments' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Condiments,
SUM(CASE WHEN dbo.Categories.CategoryName='Confections' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Confections,
SUM(CASE WHEN dbo.Categories.CategoryName='Dairy Products' THEN dbo.[Order Details].Quantity ELSE 0 END) AS DairyProducts,
SUM(CASE WHEN dbo.Categories.CategoryName='Grains/Cereals' THEN dbo.[Order Details].Quantity ELSE 0 END) AS GrainsCereals,
SUM(CASE WHEN dbo.Categories.CategoryName='Meat/Poultry' THEN dbo.[Order Details].Quantity ELSE 0 END) AS MeatPoultry,
SUM(CASE WHEN dbo.Categories.CategoryName='Produce' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Produce,
SUM(CASE WHEN dbo.Categories.CategoryName='Seafood' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Seafood
FROM
dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY dbo.Orders.OrderID
UNION ALL
SELECT
0 AS OrderID,
SUM(Beverages) AS BeveragesCount,
SUM(Condiments) AS CondimentsCount,
SUM(Confections) AS ConfectionsCount,
SUM(DairyProducts) AS DairyProductsCount,
SUM(GrainsCereals) AS GrainsCerealsCount,
SUM(MeatPoultry) AS MeatPoultryCount,
SUM(Produce) AS ProduceCount,
SUM(Seafood) AS SeafoodCount
FROM
(
SELECT
dbo.Orders.OrderID,
SUM(CASE WHEN dbo.Categories.CategoryName='Beverages' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Beverages,
SUM(CASE WHEN dbo.Categories.CategoryName='Condiments' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Condiments,
SUM(CASE WHEN dbo.Categories.CategoryName='Confections' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Confections,
SUM(CASE WHEN dbo.Categories.CategoryName='Dairy Products' THEN dbo.[Order Details].Quantity ELSE 0 END) AS DairyProducts,
SUM(CASE WHEN dbo.Categories.CategoryName='Grains/Cereals' THEN dbo.[Order Details].Quantity ELSE 0 END) AS GrainsCereals,
SUM(CASE WHEN dbo.Categories.CategoryName='Meat/Poultry' THEN dbo.[Order Details].Quantity ELSE 0 END) AS MeatPoultry,
SUM(CASE WHEN dbo.Categories.CategoryName='Produce' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Produce,
SUM(CASE WHEN dbo.Categories.CategoryName='Seafood' THEN dbo.[Order Details].Quantity ELSE 0 END) AS Seafood
FROM
dbo.Orders
INNER JOIN dbo.[Order Details] ON dbo.Orders.OrderID = dbo.[Order Details].OrderID
INNER JOIN dbo.Products ON dbo.[Order Details].ProductID = dbo.Products.ProductID
INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID
GROUP BY dbo.Orders.OrderID
) AS B
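With this approach the original query is left almost untouched; adding the total row is all it takes to complete the merge.
That is all for now. If I have more to summarize later, I will keep this post updated.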
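One detail worth checking when appending a total row is its position: a UNION ALL without an ORDER BY does not promise that the total comes last. The sketch below, using Python's built-in sqlite3 module, adds a hypothetical `is_total` sort key for that purpose. It also uses a common table expression to avoid writing the pivot twice, which is a variation on the query above and not the original form; the `order_lines` table is again an assumed flattened stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened stand-in for the Northwind join.
conn.execute("CREATE TABLE order_lines (order_id INTEGER, category TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, "Beverages", 10), (1, "Seafood", 2), (2, "Seafood", 7)],
)

# per_order is the pivot; the UNION ALL appends one total row with order_id 0.
# is_total is an extra sort key so the total row is always listed last.
sql = """
    WITH per_order AS (
        SELECT order_id,
               SUM(CASE WHEN category = 'Beverages' THEN quantity ELSE 0 END) AS beverages,
               SUM(CASE WHEN category = 'Seafood' THEN quantity ELSE 0 END) AS seafood
        FROM order_lines
        GROUP BY order_id
    )
    SELECT order_id, beverages, seafood
    FROM (
        SELECT 0 AS is_total, order_id, beverages, seafood FROM per_order
        UNION ALL
        SELECT 1 AS is_total, 0 AS order_id, SUM(beverages), SUM(seafood) FROM per_order
    ) AS combined
    ORDER BY is_total, order_id
"""
rows_with_total = conn.execute(sql).fetchall()
print(rows_with_total)
```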