【Spark】Issues encountered during development

I. Batch writes to Redis

1. Problem description

        When extracting a fixed set of fields from a Dataset and pushing them to Redis with the pipeline API, the job initially kept failing with a Redis serialization error. The cause turned out to be that the Jedis client was obtained outside foreachPartition but used inside it: Spark then has to serialize the Jedis/Pipeline objects into the task closure and ship them to the executors, and since they are not serializable the job fails with this error.

2. Incorrect code

        // Incorrect: the Jedis client and pipeline are created on the driver
        // but used inside foreachPartition, so Spark must serialize them into
        // the task closure, which fails because they are not serializable.
        val jedis = RedisUtils.getJedisClient(redisHost, redisPort)
        // Redis password
        jedis.auth(redisPwd)
        val pipeline: Pipeline = jedis.pipelined()
        pipeline.select(redisIndex)
        ds_redis.foreachPartition(row => {
          val map = new util.HashMap[String, String]()
          row.foreach(line => {
            val url_domain: String = line.getAs[String]("url_domain")
            map.clear()
            map.put("taskId", taskId)
            map.put("userId", userId)
            map.put("url", url_domain)
            pipeline.xadd(redisStreamKey, StreamEntryID.NEW_ENTRY, map)
          })
          pipeline.sync()
          pipeline.close()
        })

3. Correct approach

        ds_redis.foreachPartition(row => {
          // Create the connection inside foreachPartition so it lives on the
          // executor and never has to be serialized with the task closure.
          val jedis = RedisUtils.getJedisClient(redisHost, redisPort)
          // Redis password
          jedis.auth(redisPwd)
          val pipeline: Pipeline = jedis.pipelined()
          pipeline.select(redisIndex)
          val map = new util.HashMap[String, String]()
          row.foreach(line => {
            val url_domain: String = line.getAs[String]("url_domain")
            map.clear()
            map.put("taskId", taskId)
            map.put("userId", userId)
            map.put("url", url_domain)
            pipeline.xadd(redisStreamKey, StreamEntryID.NEW_ENTRY, map)
          })
          // Flush the buffered commands, then release the connection.
          pipeline.sync()
          pipeline.close()
          jedis.close()
        })
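
        Opening a new Jedis connection for every partition works, but on jobs with many small partitions it can be wasteful. A common refinement is to keep one lazily initialized client per executor JVM. The sketch below is not part of the original code: ExecutorRedis is a hypothetical holder object, the parameter types (e.g. redisPort as Int) are assumptions, and it reuses the RedisUtils helper and connection parameters from the snippets above.

        import redis.clients.jedis.Jedis

        // Hypothetical per-executor connection holder (a sketch, not from the
        // original code). Scala objects are not captured in task closures; each
        // executor JVM initializes its own instance on first use.
        // Reconnection handling for dropped connections is omitted here.
        object ExecutorRedis {
          private var client: Jedis = _
          def get(host: String, port: Int, pwd: String): Jedis = synchronized {
            if (client == null) {
              client = RedisUtils.getJedisClient(host, port)
              client.auth(pwd)
            }
            client
          }
        }

        ds_redis.foreachPartition(row => {
          // Reuse the executor-local client; only the pipeline is per-partition.
          val jedis = ExecutorRedis.get(redisHost, redisPort, redisPwd)
          val pipeline = jedis.pipelined()
          pipeline.select(redisIndex)
          row.foreach(line => {
            val map = new util.HashMap[String, String]()
            map.put("taskId", taskId)
            map.put("userId", userId)
            map.put("url", line.getAs[String]("url_domain"))
            pipeline.xadd(redisStreamKey, StreamEntryID.NEW_ENTRY, map)
          })
          pipeline.sync()
          pipeline.close()
          // Do not call jedis.close() here, so the connection can be reused by
          // later partitions running on the same executor.
        })

        The trade-off is that the connection stays open for the lifetime of the executor, so this only pays off when the same executors process many partitions.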
