A Simple Integration of Hadoop, Spark, and Spring

This demo builds on the "Multi-language Basic Management System" (http://fmfl.iteye.com/blog/2286414) and is integrated into that system as a component.

Basic functions of the component (see the HDFS sketch after this list):

1. File upload

2. File reading

3. Simple database persistence

4. Simple query
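
Functions 1 and 2 presumably go through HDFS. Below is a minimal sketch using the standard Hadoop FileSystem API; the fs.defaultFS address and the file paths are placeholders, not values from the demo:

package p.minn.spark.jdbc;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileDemo {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder address; point this at your own NameNode.
    conf.set("fs.defaultFS", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);

    // 1. File upload: copy a local file into HDFS.
    fs.copyFromLocalFile(new Path("/tmp/users.txt"), new Path("/demo/users.txt"));

    // 2. File reading: stream the file back line by line.
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(new Path("/demo/users.txt"))))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}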

Main technologies and frameworks: Hadoop 2.7.x, Spark 1.6.x, Spring, as

Database: MySQL

Application server: Tomcat 7.x

Environments: OS X, Red Hat, Fedora

 

Source code for this part: https://github.com/394286006/minn-hadoop.git

 

Table structure:

CREATE TABLE `hadoopspark` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT NULL,
  `email` varchar(100) DEFAULT NULL,
  `qq` varchar(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
);
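
The createDataFrame(rdd, clz) call in the helper class later in this post infers the DataFrame schema from a JavaBean, so the table needs a matching bean. A minimal sketch (the class name and package are assumptions, not taken from the project):

package p.minn.spark.jdbc;

import java.io.Serializable;

// Hypothetical JavaBean mirroring the hadoopspark table; Spark SQL derives
// the DataFrame columns from its getters.
public class Hadoopspark implements Serializable {

  private int id;
  private String name;
  private String email;
  private String qq;

  public int getId() { return id; }
  public void setId(int id) { this.id = id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public String getEmail() { return email; }
  public void setEmail(String email) { this.email = email; }
  public String getQq() { return qq; }
  public void setQq(String qq) { this.qq = qq; }
}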

 

The figure below is an abstract representation of this system's architecture.

[Figure: abstract architecture of the Hadoop + Spark + Spring integration]
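
On the Spring side, the integration amounts to exposing the Spark contexts as beans and injecting them into the helper class shown later. A minimal Java-config sketch (bean names, master URL, and JDBC settings are assumptions; the actual project may wire this via XML instead):

package p.minn.spark.jdbc;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SparkConfig {

  @Bean
  public JavaSparkContext javaSparkContext() {
    // Placeholder master/appName; point master at your cluster in production.
    return new JavaSparkContext("local[2]", "minn-hadoop");
  }

  @Bean
  public SQLContext sqlContext() {
    return new SQLContext(javaSparkContext());
  }

  @Bean
  public BaseSparkJDBC<Hadoopspark> baseSparkJDBC() {
    // Placeholder JDBC settings; match them to your MySQL instance.
    BaseSparkJDBC<Hadoopspark> dao = new BaseSparkJDBC<Hadoopspark>(
        "jdbc:mysql://localhost:3306/test", "com.mysql.jdbc.Driver", "root", "password");
    dao.setJavaSparkContext(javaSparkContext());
    dao.setSqlContext(sqlContext());
    return dao;
  }
}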

The features are shown in the figures below:

Figure 1: File upload

Figure 2: Viewing and comparing file names via the command line

Figure 3: Comparing the file contents

[Figure: feature screenshots]

Figure 4: Comparing the data after it has been written to the database

 

Selected Java code:

package p.minn.spark.jdbc;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

import p.minn.common.utils.Page;

/**
 * Generic Spark-JDBC helper: writes beans to and reads rows from a
 * relational table through Spark SQL.
 *
 * @author minn
 * @QQ:3942986006
 */
public class BaseSparkJDBC<T> implements Serializable {

  // Spark contexts are not serializable themselves; keep them transient so
  // instances of this helper can be captured by Spark closures safely.
  private transient JavaSparkContext javaSparkContext;

  private transient SQLContext sqlContext;

  // JDBC settings (url, driver, user, password) passed to Spark's JDBC reader/writer.
  private Properties options;
  
  
  
  public BaseSparkJDBC(String url, String driver, String user, String password) {
    super();
    options = new Properties();
    options.put("url", url);
    options.put("driver", driver);
    options.put("user", user);
    options.put("password", password);
  }

  /** Appends the given beans to the target table via Spark SQL's JDBC writer. */
  public void save(List<T> list, Class<T> clz, String targettable) {
    JavaRDD<T> jrdd = javaSparkContext.parallelize(list);
    DataFrame df = sqlContext.createDataFrame(jrdd, clz);
    df.write().mode("append").jdbc(options.getProperty("url"), targettable, options);
  }

  /** Convenience overload: appends a single bean. */
  public void save(T hs, Class<T> clz, String targettable) {
    List<T> list = new ArrayList<T>();
    list.add(hs);
    save(list, clz, targettable);
  }
  
  
  /**
   * Runs the given SQL over the table (registered as a temp table) and maps
   * up to page.getRp() rows to beans with the supplied row-mapping function.
   */
  protected List<T> pageSql(Function<Row, T> mapper, Page page, String targettable, String sqltxt) throws Exception {
    DataFrame jdbcDF = sqlContext.read().jdbc(options.getProperty("url"), targettable, options);
    jdbcDF.registerTempTable(targettable);
    return jdbcDF.sqlContext().sql(sqltxt).limit(page.getTotal()).javaRDD().map(mapper).take(page.getRp());
  }

  /** Runs a count query and returns the first column of the first row. */
  public int getTotal(String targettable, String sqltxt) {
    int count = 0;
    DataFrame jdbcDF = sqlContext.read().jdbc(options.getProperty("url"), targettable, options);
    jdbcDF.registerTempTable(targettable);
    Row[] rows = jdbcDF.sqlContext().sql(sqltxt).collect();
    // Guard against an empty result set, not just a null array.
    if (rows != null && rows.length > 0) {
      count = (int) rows[0].getLong(0);
    }
    return count;
  }
  public void setJavaSparkContext(JavaSparkContext javaSparkContext) {
    this.javaSparkContext = javaSparkContext;
  }

  public void setSqlContext(SQLContext sqlContext) {
    this.sqlContext = sqlContext;
  }

}
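
A quick usage sketch of the helper, assuming the Hadoopspark bean and SparkConfig sketched earlier (the SQL string and sample values are illustrative only):

package p.minn.spark.jdbc;

import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class SparkJdbcDemo {

  public static void main(String[] args) {
    AnnotationConfigApplicationContext ctx =
        new AnnotationConfigApplicationContext(SparkConfig.class);
    @SuppressWarnings("unchecked")
    BaseSparkJDBC<Hadoopspark> dao = ctx.getBean(BaseSparkJDBC.class);

    Hadoopspark user = new Hadoopspark();
    user.setName("minn");
    user.setEmail("minn@example.com");
    user.setQq("12345678");

    // Simple persistence: appends one row to the hadoopspark table.
    dao.save(user, Hadoopspark.class, "hadoopspark");

    // Simple query: counts rows via Spark SQL over the JDBC source.
    int total = dao.getTotal("hadoopspark", "SELECT COUNT(*) FROM hadoopspark");
    System.out.println("total rows: " + total);

    ctx.close();
  }
}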

 

 
