Scenario: when joining and unioning multiple tables, the following error occurred:
Error in query: Resolved attribute(s) complex_flag_code#6549,quantity#6551L,pay_time_date#6547,sales_price#6553,oms_code#6548,retail_price#6550,promotion_sku_code#6552 missing from retail_price#6178,source_platform_code#6384,promotion_policy_code#6402,pay_amount#6329,sku_code#6322,complex_flag_code#6177,order_id#6530,sales_price#6181,promotion_sku_code#6180,pay_time_date#6175,sku_type#6331,quantity#6179L,oms_code#6176,is_gift#6333L in operator !Project [order_id#6530, pay_time_date#6547, cast(oms_code#6548 as string) AS oms_code#6518, sku_code#6322, cast(complex_flag_code#6549 as string) AS complex_flag_code#6519, retail_price#6550, pay_amount#6329, cast(quantity#6551L as decimal(22,2)) AS quantity#6520, sku_type#6331, promotion_policy_code#6402, source_platform_code#6384, is_gift#6333L, promotion_sku_code#6552, sales_price#6553]. Attribute(s) with the same name appear in the operation: complex_flag_code,quantity,pay_time_date,sales_price,oms_code,retail_price,promotion_sku_code. Please check if the right attribute(s) are used.;;
By commenting out each part of the code in turn and rerunning, I narrowed the problem down to the following snippet:
...
...
(
  SELECT
    comp_sku.order_id
    ,comp_sku.quantity
    ,comp_sku.sales_price
    ,comp_sku.promotion_sku_code
    ,sales_tmp.order_id
  FROM
  (
    SELECT
      order_id
      ,promotion_sku_code
      ,quantity
      ,sales_price
    FROM all_detail
    WHERE is_gift = 1 AND promotion_sku_code IS NOT NULL
  ) comp_sku
  LEFT JOIN
  (
    SELECT
      order_id
      ,sku_code
    FROM sales
  ) sales_tmp
  ON comp_sku.order_id = sales_tmp.order_id AND comp_sku.promotion_sku_code = sales_tmp.sku_code
)
...
...
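My hypothesis from the error message: when the code above is used as a subquery whose result feeds the parent query, the parent query cannot resolve the columns in that result.
This reminded me of something I had once read on the official Hive website: when using join or union, you must specify column aliases, otherwise data is lost.
I suspected that columns such as comp_sku.order_id had ended up with default names like column1 in the result, so the parent query could not find order_id.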
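Whatever names the engine actually assigns, one concrete detail in the original select list is easy to check: it references order_id twice, once from comp_sku and once from sales_tmp. Below is a small hypothetical Python helper (not part of the original workflow) that lists duplicate output column names in a SELECT list. It assumes a simplified naming rule, an explicit AS alias wins and otherwise the bare column name after the last dot is used, which is an assumption for illustration and not a statement of how any particular engine names unaliased columns.

```python
from collections import Counter


def output_name(item: str) -> str:
    """Hypothetical, simplified naming rule (an assumption): an explicit
    AS alias wins; otherwise use the bare column name after the last dot,
    so 'comp_sku.order_id' -> 'order_id'."""
    parts = item.split()
    if len(parts) >= 3 and parts[-2].upper() == "AS":
        return parts[-1]
    return parts[0].split(".")[-1]


def duplicate_output_names(select_items):
    """Return, sorted, every output column name that appears more than once."""
    counts = Counter(output_name(item) for item in select_items)
    return sorted(name for name, n in counts.items() if n > 1)


# Select list of the failing subquery (no aliases).
original = [
    "comp_sku.order_id",
    "comp_sku.quantity",
    "comp_sku.sales_price",
    "comp_sku.promotion_sku_code",
    "sales_tmp.order_id",
]

# Select list of the fixed subquery (explicit aliases, second ID renamed).
fixed = [
    "comp_sku.order_id AS order_id",
    "comp_sku.quantity AS quantity",
    "comp_sku.sales_price AS sales_price",
    "comp_sku.promotion_sku_code AS promotion_sku_code",
    "sales_tmp.order_id AS s_order_id",
]

print(duplicate_output_names(original))  # ['order_id']
print(duplicate_output_names(fixed))     # []
```

Under that assumed rule the unaliased list produces two columns called order_id, while the aliased list produces none. Either explanation points to the same remedy: give every output column an explicit, unique alias.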
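So I gave every output column an explicit alias, renaming sales_tmp.order_id to s_order_id so that it no longer shares a name with comp_sku.order_id: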
...
...
(
  SELECT
    comp_sku.order_id AS order_id
    ,comp_sku.quantity AS quantity
    ,comp_sku.sales_price AS sales_price
    ,comp_sku.promotion_sku_code AS promotion_sku_code
    ,sales_tmp.order_id AS s_order_id
  FROM
  (
    SELECT
      order_id
      ,promotion_sku_code
      ,quantity
      ,sales_price
    FROM all_detail
    WHERE is_gift = 1 AND promotion_sku_code IS NOT NULL
  ) comp_sku
  LEFT JOIN
  (
    SELECT
      order_id
      ,sku_code
    FROM sales
  ) sales_tmp
  ON comp_sku.order_id = sales_tmp.order_id AND comp_sku.promotion_sku_code = sales_tmp.sku_code
)
...
...
Problem solved.
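To sanity-check the fixed subquery end to end, here is a minimal sketch that runs it with Python's built-in sqlite3 module. SQLite is only a stand-in: it does not reproduce the error above, and the table contents and the outer query are hypothetical. What it does show is that with explicit aliases the subquery's output names are exactly the AS names, so a parent query can select order_id and s_order_id without ambiguity, and that the LEFT JOIN leaves s_order_id NULL when a gift's promotion SKU has no matching sale.

```python
import sqlite3

# Hypothetical minimal schemas and rows: only the columns the subquery touches.
SCHEMA = """
CREATE TABLE all_detail (
    order_id TEXT, promotion_sku_code TEXT,
    quantity INTEGER, sales_price REAL, is_gift INTEGER
);
CREATE TABLE sales (order_id TEXT, sku_code TEXT);
INSERT INTO all_detail VALUES
    ('o1', 'p1', 1, 0.0, 1),   -- gift whose promotion SKU was sold
    ('o2', 'p2', 2, 0.0, 1),   -- gift with no matching sale
    ('o3', NULL, 1, 10.0, 0);  -- regular item, filtered out by WHERE
INSERT INTO sales VALUES ('o1', 'p1'), ('o3', 's3');
"""

# The fixed subquery from the post: every output column has an explicit alias.
SUBQUERY = """
SELECT
    comp_sku.order_id AS order_id
    ,comp_sku.quantity AS quantity
    ,comp_sku.sales_price AS sales_price
    ,comp_sku.promotion_sku_code AS promotion_sku_code
    ,sales_tmp.order_id AS s_order_id
FROM (
    SELECT order_id, promotion_sku_code, quantity, sales_price
    FROM all_detail
    WHERE is_gift = 1 AND promotion_sku_code IS NOT NULL
) comp_sku
LEFT JOIN (
    SELECT order_id, sku_code FROM sales
) sales_tmp
ON comp_sku.order_id = sales_tmp.order_id
   AND comp_sku.promotion_sku_code = sales_tmp.sku_code
"""


def run_fixed_subquery():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        cur = conn.execute(SUBQUERY)
        # Output column names of the subquery, as a parent query would see them.
        names = [col[0] for col in cur.description]
        cur.fetchall()
        # Hypothetical parent query: refers to both IDs by their aliases.
        rows = conn.execute(
            f"SELECT order_id, s_order_id FROM ({SUBQUERY}) t ORDER BY order_id"
        ).fetchall()
    finally:
        conn.close()
    return names, rows


names, rows = run_fixed_subquery()
print(names)  # ['order_id', 'quantity', 'sales_price', 'promotion_sku_code', 's_order_id']
print(rows)   # [('o1', 'o1'), ('o2', None)]
```

A NULL s_order_id marks a gift row whose promotion SKU was not found in sales, which is the information the LEFT JOIN was written to expose.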
Summary: