Using a database connection pool for Spark writes to a database

Spark writes batches of data to the database from multiple threads (multiple tasks share each executor JVM), so we create a database connection pool to optimize how database Connections are obtained and reused.

  • The configuration file db.properties

Other parameters such as MAX_IDLE and MIN_IDLE also exist; they are omitted here (a sketch of those keys follows the file below).

jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/test
jdbc.user=spark
jdbc.password=spark
# maximum number of connections in the pool
jdbc.max.active=10
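The MAX_IDLE / MIN_IDLE settings mentioned above would follow the same naming scheme. A sketch with hypothetical key names (the pool code below does not read them):

# hypothetical keys, shown only to illustrate the naming scheme
jdbc.max.idle=5
jdbc.min.idle=2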
  • The constants interface Constants

When there are many constants, keeping them in a dedicated Constants interface makes them easy to extend and modify.

public interface Constants {
    String JDBC_DRIVER = "jdbc.driver";
    String JDBC_URL = "jdbc.url";
    String JDBC_USER = "jdbc.user";
    String JDBC_PASSWORD = "jdbc.password";
    String JDBC_MAX_ACTIVE = "jdbc.max.active";
}
  • The connection pool ConnectionPool
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.LinkedList;
import java.util.Properties;

public class ConnectionPool {

    private static final LinkedList<Connection> pool = new LinkedList<>();

    private ConnectionPool() {}

    static { // one-time initialization: register the driver and pre-create connections
        try {
            Properties properties = new Properties();
            properties.load(ConnectionPool.class.getClassLoader().getResourceAsStream("db.properties"));
            // register the JDBC driver
            Class.forName(properties.getProperty(Constants.JDBC_DRIVER));
            int maxActive = Integer.parseInt(properties.getProperty(Constants.JDBC_MAX_ACTIVE));
            String url = properties.getProperty(Constants.JDBC_URL);
            String user = properties.getProperty(Constants.JDBC_USER);
            String password = properties.getProperty(Constants.JDBC_PASSWORD);

            // pre-create maxActive connection objects
            for (int i = 0; i < maxActive; i++) {
                pool.push(DriverManager.getConnection(url, user, password));
            }
        } catch (Exception e) {
            throw new ExceptionInInitializerError("Failed to initialize the connection pool: " + e);
        }
    }

    // hand out a connection; Spark tasks call this from multiple threads, so
    // access to the pool is synchronized and an empty pool is handled with wait/notify
    public static Connection getConnection() {
        synchronized (pool) {
            while (pool.isEmpty()) {
                try {
                    System.out.println("Connection pool is empty, please wait...");
                    pool.wait(2000); // releases the lock until a connection is returned
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            return pool.poll();
        }
    }

    // called when a Spark write task finishes: the pool takes the connection back
    public static void release(Connection connection) {
        synchronized (pool) {
            pool.push(connection);
            pool.notifyAll(); // wake up tasks waiting in getConnection()
        }
    }
}
  • Calling code
private def foreachOps5(rbkRDD: RDD[(String, Int)]) = {
    rbkRDD.foreachPartition(partition => {
        val connection:Connection = ConnectionPool.getConnection()
        val sql = "insert into wordcount(word, `count`) values(?, ?)"
        val ps = connection.prepareStatement(sql)
        // batch the whole partition, then execute once at the end
        partition.foreach { case (word, count) =>
            ps.setString(1, word)
            ps.setInt(2, count)
            ps.addBatch()
        }
        ps.executeBatch()
        ps.close()
        ConnectionPool.release(connection)
    })
}
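
One caveat with the call above: if prepareStatement or one of the inserts throws, the connection is never released and the pool permanently shrinks by one. A minimal sketch of the same method hardened with try/finally, using the same ConnectionPool (foreachOps5Safe is an illustrative name):

import java.sql.Connection
import org.apache.spark.rdd.RDD

private def foreachOps5Safe(rbkRDD: RDD[(String, Int)]) = {
    rbkRDD.foreachPartition(partition => {
        val connection: Connection = ConnectionPool.getConnection()
        try {
            val ps = connection.prepareStatement("insert into wordcount(word, `count`) values(?, ?)")
            try {
                partition.foreach { case (word, count) =>
                    ps.setString(1, word)
                    ps.setInt(2, count)
                    ps.addBatch()
                }
                ps.executeBatch()
            } finally {
                ps.close() // always close the statement
            }
        } finally {
            ConnectionPool.release(connection) // always return the connection to the pool
        }
    })
}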

When a partition holds a large amount of data (which you can estimate from the input), the buffer behind ps.addBatch() may grow too large, so while adding to the batch you can call ps.executeBatch() every 50,000 records (or whatever value you estimate). All it takes is a counter variable such as var count = 0, as sketched below.
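A minimal sketch of that pattern, assuming the same ConnectionPool and a hypothetical flush threshold of 50,000 (foreachOpsChunked and flushEvery are illustrative names; tune the value to your data and JDBC driver):

import java.sql.Connection
import org.apache.spark.rdd.RDD

private def foreachOpsChunked(rbkRDD: RDD[(String, Int)]) = {
    val flushEvery = 50000 // hypothetical threshold; estimate from your partition sizes
    rbkRDD.foreachPartition(partition => {
        val connection: Connection = ConnectionPool.getConnection()
        val ps = connection.prepareStatement("insert into wordcount(word, `count`) values(?, ?)")
        var count = 0
        partition.foreach { case (word, wc) =>
            ps.setString(1, word)
            ps.setInt(2, wc)
            ps.addBatch()
            count += 1
            if (count % flushEvery == 0) {
                ps.executeBatch() // flush before the batch buffer grows too large
            }
        }
        ps.executeBatch() // flush the remainder
        ps.close()
        ConnectionPool.release(connection)
    })
}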
