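Merge the two arrays outlined in red in the figure above into a single array and remove duplicates; in other words, take the union of the city_tags and hotel_tags that belong to the same productid.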
Step 1: pull every element out of the arrays. LATERAL VIEW combined with the explode function splits an array-typed value into one row per element.
SELECT t.productID, t.cityID, t.airlineCode, t.hotelID, tagv
FROM (SELECT productID, cityID, airlineCode, hotelID, tagids FROM product_pbs.origin_pbs_product) t
LATERAL VIEW explode(t.tagids) v AS tagv
UNION ALL
SELECT t.productID, t.cityID, t.airlineCode, t.hotelID, tagv
FROM (SELECT productID, cityID, airlineCode, hotelID, hotelTags FROM product_pbs.origin_pbs_product) t
LATERAL VIEW explode(t.hotelTags) v AS tagv;
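To see what explode does on its own, here is a minimal sketch that runs against inline data instead of the real table. The productID value and the tag numbers are made up for illustration; only LATERAL VIEW, explode, and array() are standard Hive.

```sql
-- Hypothetical one-row input: product 1 carries two tag ids.
SELECT t.productID, tagv
FROM (SELECT 1 AS productID, array(101, 102) AS tagids) t
LATERAL VIEW explode(t.tagids) v AS tagv;
-- Yields one row per array element: (1, 101) and (1, 102).
```

Note that a plain LATERAL VIEW drops input rows whose array is empty or NULL. If those rows must be kept, LATERAL VIEW OUTER returns them with a NULL tagv instead.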
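Step 2: use the collect_set function together with GROUP BY to gather the tagv values that belong to the same productid back into a single array. collect_set discards duplicates, so the result is the union of the two original arrays.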
SELECT h.productID, h.cityID, h.airlineCode, h.hotelID, collect_set(h.tagv) AS tags
FROM
(SELECT t.productID, t.cityID, t.airlineCode, t.hotelID, tagv
FROM (SELECT productID, cityID, airlineCode, hotelID, tagids FROM product_pbs.origin_pbs_product) t
LATERAL VIEW explode(t.tagids) v AS tagv
UNION ALL
SELECT t.productID, t.cityID, t.airlineCode, t.hotelID, tagv
FROM (SELECT productID, cityID, airlineCode, hotelID, hotelTags FROM product_pbs.origin_pbs_product) t
LATERAL VIEW explode(t.hotelTags) v AS tagv) h
GROUP BY h.productID, h.cityID, h.airlineCode, h.hotelID;
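The whole approach can be checked on a tiny input before running it against the real table. The sketch below assumes a hypothetical CTE named sample with one product whose two arrays share the tag 102; the names and values are illustrative only.

```sql
-- Hypothetical sample row: tag 102 appears in both arrays.
WITH sample AS (
  SELECT 1 AS productID,
         array(101, 102) AS tagids,
         array(102, 203) AS hotelTags
)
SELECT h.productID, collect_set(h.tagv) AS tags
FROM (
  -- Explode each array into rows, then stack the two row sets.
  SELECT s.productID, tagv
  FROM sample s LATERAL VIEW explode(s.tagids) v AS tagv
  UNION ALL
  SELECT s.productID, tagv
  FROM sample s LATERAL VIEW explode(s.hotelTags) v AS tagv
) h
GROUP BY h.productID;
-- tags contains 101, 102 and 203 exactly once each;
-- collect_set does not guarantee the order of the elements.
```

UNION ALL is enough here because collect_set already removes the duplicates, so there is no need to pay for the extra deduplication of a plain UNION.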