Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s)

Due to a business requirement, the same DataFrame has to be joined twice, which raises the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Resolved attribute(s) goods_name#1139 missing from order_id#498,goods_name#476 in operator !TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#498.toString, goods_name#1139.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true)). Attribute(s) with the same name appear in the operation: goods_name. Please check if the right attribute(s) are used.;;
Join Inner
:- Project [txt_code#6, order_record_code#7, business_type#8, cust_id#9, net_cash_account#10, to_account_date#11, goods_name#476]
:  +- Join Inner, (order_code#569 = order_record_code#7)
:     :- LogicalRDD [txt_code#6, order_record_code#7, business_type#8, cust_id#9, net_cash_account#10, to_account_date#11], false
:     +- Join Inner, (order_record_code#770 = order_code#569)
:        :- Join Inner, (order_id#471 = id#568)
:        :  :- LogicalRDD [id#568, order_code#569], false
:        :  +- Project [order_id#471, goods_name#476, news_package#490, brand_package#494]
:        :     +- Join Inner, (order_id#471 = order_id#503)
:        :        :- Project [order_id#471, goods_name#476, news_package#490]
:        :        :  +- Join Inner, (order_id#471 = order_id#498)
:        :        :     :- TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#471.toString, goods_name#476.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true))
:        :        :     :  +- Project [order_id#471, UDF(goods_name#472) AS goods_name#476]
:        :        :     :     +- TypedFilter , interface org.apache.spark.sql.Row, [StructField(order_id,StringType,true), StructField(goods_name,StringType,true)], createexternalrow(order_id#471.toString, goods_name#472.toString, StructField(order_id,StringType,true), StructField(goods_name,StringType,true))
:        :        :     :        +- LogicalRDD [order_id#471, goods_name#472], false

Solution: before the second join, give every column an alias (and alias the DataFrame itself), then perform the join. This breaks the shared attribute lineage and resolves the error.

    // Alias every column, and the DataFrame itself, before the second join
    val tmpJoinDf = achievementDf.selectExpr(
      "txt_code as txt_code2",
      "order_record_code as order_record_code2",
      "business_type as business_type2",
      "cust_id as cust_id2",
      "min_to_account_date",
      "groupCol").as("joinDf")

    val joinExprs = functions.col("joinDf.txt_code2") === functions.col("txt_code") &&
      functions.col("joinDf.order_record_code2") === functions.col("order_record_code") &&
      functions.col("joinDf.business_type2") === functions.col("business_type") &&
      functions.col("joinDf.cust_id2") === functions.col("cust_id")

    // firstJoinDf is the DataFrame produced by the first (successful) join;
    // fieldArr holds the output columns to keep
    result1Df = firstJoinDf.join(tmpJoinDf, joinExprs).select(fieldArr: _*)
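As a minimal, self-contained sketch of the same pattern (hypothetical table and column names, assuming a local SparkSession is available), the idea is: the second join against the original `df` reuses attributes with the same internal IDs, so we rename every column first:

```scala
import org.apache.spark.sql.{SparkSession, functions}

object SelfJoinAliasDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("self-join-alias-demo")
      .getOrCreate()
    import spark.implicits._

    val df    = Seq(("o1", "phone"), ("o2", "book")).toDF("order_id", "goods_name")
    val other = Seq(("o1", 10), ("o2", 20)).toDF("order_id", "amount")

    // The first join works fine
    val first = other.join(df, Seq("order_id"))

    // Joining df a second time against `first` may fail with
    // "Resolved attribute(s) ... missing" because both sides
    // carry attributes from the same lineage.
    // Fix: alias every column (and the DataFrame) before the second join.
    val df2 = df
      .selectExpr("order_id as order_id2", "goods_name as goods_name2")
      .as("joinDf")

    val result = first.join(
      df2,
      functions.col("joinDf.order_id2") === functions.col("order_id"))

    result.show()
    spark.stop()
  }
}
```

This requires a Spark runtime on the classpath; the renamed columns (`order_id2`, `goods_name2`) get fresh attribute IDs in the plan, so the analyzer no longer confuses them with the columns already present in `first`.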
