Due to a business-specific requirement, the same DataFrame is joined twice (a self-join), which raises the following error:
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) goods_name#1139 missing from order_id#498,goods_name#476 in operator !TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#498.toString, goods_name#1139.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true)). Attribute(s) with the same name appear in the operation: goods_name. Please check if the right attribute(s) are used.;;
Join Inner
:- Project [txt_code#6, order_record_code#7, business_type#8, cust_id#9, net_cash_account#10, to_account_date#11, goods_name#476]
:  +- Join Inner, (order_code#569 = order_record_code#7)
:     :- LogicalRDD [txt_code#6, order_record_code#7, business_type#8, cust_id#9, net_cash_account#10, to_account_date#11], false
:     +- Join Inner, (order_record_code#770 = order_code#569)
:        :- Join Inner, (order_id#471 = id#568)
:        :  :- LogicalRDD [id#568, order_code#569], false
:        :  +- Project [order_id#471, goods_name#476, news_package#490, brand_package#494]
:        :     +- Join Inner, (order_id#471 = order_id#503)
:        :        :- Project [order_id#471, goods_name#476, news_package#490]
:        :        :  +- Join Inner, (order_id#471 = order_id#498)
:        :        :     :- TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#471.toString, goods_name#476.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true))
:        :        :     :  +- Project [order_id#471, UDF(goods_name#472) AS goods_name#476]
:        :        :     :     +- TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#471.toString, goods_name#472.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true))
:        :        :     :        +- LogicalRDD [order_id#471, goods_name#472], false
```
Solution: before the second join, give every column of one side an alias, then join. This resolves the error.
```scala
// Alias every column before the second join:
val tmpJoinDf = achievementDf.selectExpr("txt_code as txt_code2", "order_record_code as order_record_code2", "business_type as business_type2", "cust_id as cust_id2", "min_to_account_date", "groupCol").as("joinDf")

val joinExprs = functions.col("joinDf.txt_code2") === functions.col("txt_code") &&
  functions.col("joinDf.order_record_code2") === functions.col("order_record_code") &&
  functions.col("joinDf.business_type2") === functions.col("business_type") &&
  functions.col("joinDf.cust_id2") === functions.col("cust_id")

// firstJoinDf stands for the DataFrame produced by the first (successful) join;
// fieldArr is the predefined list of output columns.
result1Df = firstJoinDf.join(tmpJoinDf, joinExprs).select(fieldArr: _*)
```
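The workaround can be reduced to a self-contained sketch. Note that the session setup, `ordersDf`, and the column names here are illustrative, not taken from the original job:

```scala
import org.apache.spark.sql.{Row, SparkSession, functions}

object SelfJoinAliasDemo {
  // Returns the row count of the aliased self-join.
  def run(): Long = {
    val spark = SparkSession.builder().master("local[*]").appName("selfJoinAliasDemo").getOrCreate()
    import spark.implicits._
    try {
      // A DataFrame that, like the failing plan above, passes through a typed filter.
      val ordersDf = Seq(("o1", "apple"), ("o2", "pear")).toDF("order_id", "goods_name")
        .filter((r: Row) => r.getAs[String]("goods_name").nonEmpty) // compiles to a TypedFilter node

      // Joining ordersDf against a derivative of itself can raise the
      // "Resolved attribute(s) ... missing" error, because both sides of the
      // join carry the same attribute ids. The workaround: alias every column
      // of one side first, so the analyzer sees distinct attributes.
      val tmpJoinDf = ordersDf
        .selectExpr("order_id as order_id2", "goods_name as goods_name2")
        .as("joinDf")

      val joined = ordersDf.join(tmpJoinDf,
        functions.col("order_id") === functions.col("joinDf.order_id2"))
      joined.count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because `selectExpr` assigns fresh attribute ids to the renamed columns, the analyzer no longer confuses the two sides of the join.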