Error: It appears that you are attempting to reference SparkContext from a broadcast variable, action

Error message

_pickle.PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation.
 SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.


Code

def decode_itemid(p):
    hbase_util = HappyBaseUtil(hbase_host_bc.value, hbase_port_bc.value)
    for row in p:
        userid = row[0]
        rowkey_user = 'map_user_rev_' + str(userid)
        user_json = hbase_util.get_row(self.hbase_table_for_decode_itemid, rowkey_user,
                                       columns=["info:message"])
        user_decode_id = json.loads(user_json.get("info:message"))
        # ........

rec_result1 = predictions.mapPartitions(decode_itemid)

Cause

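The error comes from self.sc ending up in code that runs on the workers. decode_itemid reads self.hbase_table_for_decode_itemid, so Spark has to serialize the whole self object, including self.sc and self.spark, in order to ship the function to the workers. self.spark and self.sc can only be used on the driver (master).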
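
The capture can be reproduced without a cluster. The following is a minimal sketch using only the standard library, not PySpark itself: FakeSparkContext, Job, captured_values and can_pickle are hypothetical names made up for illustration, and the table name is a placeholder. It only inspects what each nested function captures and whether those captured values can be pickled.

```python
import pickle


class FakeSparkContext:
    """Hypothetical stand-in for SparkContext: it refuses to be pickled."""

    def __reduce__(self):
        raise RuntimeError("SparkContext can only be used on the driver")


class Job:
    def __init__(self):
        self.sc = FakeSparkContext()
        self.hbase_table_for_decode_itemid = "some_table"  # placeholder value

    def bad_closure(self):
        def decode_itemid(p):
            # Reading an attribute through self captures the whole object.
            return self.hbase_table_for_decode_itemid
        return decode_itemid

    def good_closure(self):
        table = self.hbase_table_for_decode_itemid  # plain str copied on the driver

        def decode_itemid(p):
            return table
        return decode_itemid


def captured_values(func):
    """Map each captured variable name to the object the function holds."""
    cells = func.__closure__ or ()
    return dict(zip(func.__code__.co_freevars, (c.cell_contents for c in cells)))


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


job = Job()
bad = captured_values(job.bad_closure())
good = captured_values(job.good_closure())
print(sorted(bad), all(can_pickle(v) for v in bad.values()))    # ['self'] False
print(sorted(good), all(can_pickle(v) for v in good.values()))  # ['table'] True
```

The first function holds self, and self holds the context, so serialization fails. The second holds only a string.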

Offending code

user_json = hbase_util.get_row(self.hbase_table_for_decode_itemid, rowkey_user,
                               columns=["info:message"])
# TODO: the reference to self.hbase_table_for_decode_itemid is what triggers the error

Solution

Change the code as follows:

# Outside decode_itemid, add the following (on the driver)
hbase_table_for_decode_itemid = self.hbase_table_for_decode_itemid
hbase_table_for_decode_itemid_bc = self.sc.broadcast(hbase_table_for_decode_itemid)

# Inside decode_itemid, read the table name from the broadcast variable instead of self
user_json = hbase_util.get_row(hbase_table_for_decode_itemid_bc.value, rowkey_user,
                               columns=["info:message"])
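
Put together, the fixed partition function might look like the sketch below. It is an illustration under assumptions, not the original project code: FakeBroadcast and FakeHBaseClient are hypothetical stand-ins for the object returned by sc.broadcast and for HappyBaseUtil, make_decode_itemid is a made-up helper, and the host, port and table values are placeholders. It runs on a plain list so the shape of the fix can be checked without Spark or HBase.

```python
import json


class FakeBroadcast:
    """Hypothetical stand-in for the handle returned by sc.broadcast(...)."""

    def __init__(self, value):
        self.value = value


class FakeHBaseClient:
    """Hypothetical stand-in for HappyBaseUtil; returns canned rows."""

    def __init__(self, host, port):
        self.host, self.port = host, port

    def get_row(self, table, rowkey, columns=None):
        return {"info:message": json.dumps({"table": table, "rowkey": rowkey})}


def make_decode_itemid(host_bc, port_bc, table_bc):
    """Build the partition function from broadcast handles only, never from self."""

    def decode_itemid(p):
        # One client per partition, created where the function runs.
        client = FakeHBaseClient(host_bc.value, port_bc.value)
        for row in p:
            rowkey_user = 'map_user_rev_' + str(row[0])
            user_json = client.get_row(table_bc.value, rowkey_user,
                                       columns=["info:message"])
            yield json.loads(user_json.get("info:message"))

    return decode_itemid


decode_itemid = make_decode_itemid(
    FakeBroadcast("localhost"), FakeBroadcast(9090), FakeBroadcast("some_table")
)
decoded = list(decode_itemid([(1,), (2,)]))
print(decoded[0]["rowkey"])  # map_user_rev_1
```

In the real job the resulting function would be passed to predictions.mapPartitions(decode_itemid). For a value as small as a table name, copying it into a local variable before defining the function is likely enough, since the function then captures the string rather than self; the broadcast mainly pays off for larger data.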
