1. Project Background
A study of the audience ratings of music talent shows such as 《中国好声音》, 《快乐男声》, 《最美和声》 and 《中国梦之声》. Drawing on a base of 3.3 million two-way HD interactive digital-TV subscribers in Beijing, 25,000 households were randomly sampled for the statistics.
2. Project Requirements
Here we work from the programme dimension and report, for every programme, the average audience, average reach, rating, reach rate and market share. From each day's viewing data of the sampled users, these five indicators are computed per programme by day, by hour and by minute.
3. System Features (one dimension shown here as an example)
Mainly: viewing overview, rating trend analysis, indicator comparison, and side-by-side viewing-data comparison.
5. Definitions of the Viewing Indicators
Number of viewers
Overall: viewers on a given day (S11): count(distinct stbnum) WHERE the given date.
Channel: viewers on a given day (S21): count(distinct stbnum) WHERE the given date AND the given channel.
Programme (viewers who watched the programme within that day): viewers on a given day (S31): count(distinct stbnum) WHERE the given date AND the given programme.
Average audience
This indicator is the average number of distinct user IDs per minute over the selected period (a short code sketch at the end of this section shows how these definitions combine).
Overall:
per minute (X11): count(distinct stbnum)
per minute (X12): …
per minute (X1n): …
average audience: (X11 + X12 + … + X1n)/n
Channel:
per minute (X21): count(distinct stbnum) WHERE channel = the given channel
per minute (X22): …
per minute (X2n): …
average audience: (X21 + X22 + … + X2n)/n
Programme:
per minute (X31): count(distinct stbnum) WHERE programme = the given programme
per minute (X32): …
per minute (X3n): …
average audience: (X31 + X32 + … + X3n)/n
Rating
Average audience / total number of IDs in the system.
CONSTANT total IDs IDNUM = count(distinct stbnum).
Overall, channel, programme:
per-minute rating Y1: X1/IDNUM
per-minute rating Y2: …
per-minute rating Yn: Xn/IDNUM
rating over a period: (Y1 + Y2 + … + Yn)/n
Market share
Average audience of the channel / average audience across all channels.
Overall:
100%
Channel:
per minute (Z21): X21 / count(distinct stbnum) at that minute
…
per minute (Z2n): X2n / count(distinct stbnum) at that minute
market share: (Z21 + … + Z2n)/n
Programme:
per minute (Z31): X31 / count(distinct stbnum) at that minute
…
per minute (Z3n): X3n / count(distinct stbnum) at that minute
market share: (Z31 + … + Z3n)/n
Average reach
By default, user IDs that stay on a channel (or in the whole system) for less than 60 s are excluded (60 s itself is kept). The only difference from average audience is that raw records with a dwell time under 60 s are dropped.
Overall:
per minute (U11): count(distinct stbnum) WHERE (a_e - a_s) >= 60
per minute (U12): …
per minute (U1n): …
average reach: (U11 + U12 + … + U1n)/n
Channel:
per minute (U21): count(distinct stbnum) WHERE channel = the given channel AND (a_e - a_s) >= 60
per minute (U22): …
per minute (U2n): …
average reach: (U21 + U22 + … + U2n)/n
Programme:
per minute (U31): count(distinct stbnum) WHERE programme = the given programme AND (a_e - a_s) >= 60
per minute (U32): …
per minute (U3n): …
average reach: (U31 + U32 + … + U3n)/n
Reach rate
Average reach / total number of IDs in the system.
CONSTANT total IDs IDNUM = count(distinct stbnum).
Overall, channel, programme:
per minute (V1): U1/IDNUM
…
per minute (Vn): Un/IDNUM
reach rate over a period: (V1 + V2 + … + Vn)/n
Per-capita viewing time
All channels: total daily viewing time of all user IDs / number of user IDs. A specific channel: total daily viewing time of all user IDs that visited the channel / number of user IDs on that channel that day. A specific programme: total viewing time of all user IDs that watched any episode of the programme / number of user IDs for that programme.
Overall:
per-capita viewing time for a day (W11): SUM(a_e - a_s)/S11
Channel:
per-capita viewing time for a day (W21): SUM(a_e - a_s)/S21
Programme:
per-capita viewing time for a day (W31): SUM(a_e - a_s)/S31
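To tie these definitions together, here is a minimal sketch (not part of the project code) of how the per-minute counts roll up into the daily figures; it assumes the per-minute distinct-viewer counts have already been computed, and it uses the 25,000-household sample size as IDNUM, the same constant the MapReduce jobs below hard-code.

public class IndicatorSketch {
    // Total number of sample IDs (the 25,000 sampled households).
    static final int IDNUM = 25000;

    // Average audience: mean of the per-minute distinct viewer counts X1..Xn.
    static double avgViewers(int[] perMinuteViewers) {
        double sum = 0;
        for (int x : perMinuteViewers) sum += x;
        return sum / perMinuteViewers.length;
    }

    // Rating (%): average audience divided by IDNUM.
    static double rating(int[] perMinuteViewers) {
        return avgViewers(perMinuteViewers) / IDNUM * 100;
    }

    // Market share (%): mean of the per-minute ratio channel viewers / all-channel viewers.
    static double marketShare(int[] channelPerMinute, int[] totalPerMinute) {
        double sum = 0;
        for (int i = 0; i < channelPerMinute.length; i++) {
            sum += (double) channelPerMinute[i] / totalPerMinute[i];
        }
        return sum / channelPerMinute.length * 100;
    }

    public static void main(String[] args) {
        int[] channel = {120, 130, 125};    // hypothetical per-minute viewer counts for one channel
        int[] total = {4000, 4100, 3950};   // hypothetical per-minute viewer counts across all channels
        System.out.printf("avg=%.1f rating=%.2f%% share=%.2f%%%n",
                avgViewers(channel), rating(channel), marketShare(channel, total));
    }
}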
6. Development Workflow
1. Use Flume to collect the raw user data into the HDFS distributed file system.
2. Write MapReduce programs to parse and clean the raw viewing data and extract the fields the business needs.
3. Use Hive to load the MapReduce output into the data warehouse and run the statistical analysis there.
4. Write an application, or use Sqoop, to export the final Hive results into a database such as MySQL.
5. Query from the front end to visualise the data.
7. Source Data
Use the HDFS small-file merger MergeSmallFilesToHDFS.java to combine each day's small files into one large file; see http://blog.csdn.net/zoeyen_/article/details/78947676 for details.
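The actual MergeSmallFilesToHDFS.java is in the post linked above; purely to illustrate the idea, here is a minimal sketch (class name and paths are placeholders, not the real implementation) that concatenates one day's small local log files into a single file on HDFS through the FileSystem API.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeSmallFilesSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // One day's small log files on the local disk (placeholder path).
        FileSystem localFs = FileSystem.getLocal(conf);
        FileStatus[] smallFiles = localFs.listStatus(new Path("/home/hadoop/tvdata/2012-09-17"));
        // Target HDFS cluster and merged output file (placeholder address and path).
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://pc1:9000"), conf);
        try (FSDataOutputStream out = hdfs.create(new Path("/home/tvdata/2012-09-17.log"))) {
            for (FileStatus file : smallFiles) {
                // Append each small file back to back into the single large file.
                try (FSDataInputStream in = localFs.open(file.getPath())) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
            }
        }
    }
}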
8. Uploading the Source Data to HDFS
Flume is used as the collector here; it is installed on the master node (pc1), with a single agent tier.
① Start the cluster
② Edit the Flume configuration file
[hadoop@pc1 conf]$ vi flume-conf.properties
agent1.sources = source1
agent1.channels = ch1
agent1.sinks = sink1
# Define and configure a spooling directory source (spooldir watches the log directory)
agent1.sources.source1.channels = ch1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /home/hadoop/tvdata
#The three settings above are mandatory; see the official documentation for the remaining parameters
agent1.sources.source1.ignorePattern = event(_\d{4}\-\d{2}\-\d{2}_\d{2}_\d{2})?\.log(\.COMPLETED)?
agent1.sources.source1.deserializer.maxLineLength = 10240
#Maximum length of a collected line
# Configure the channel (a file channel is chosen to avoid data loss)
agent1.channels.ch1.type = file
#A memory channel could be used instead
agent1.channels.ch1.checkpointDir = /home/hadoop/app/flume/checkpointDir
agent1.channels.ch1.dataDirs = /home/hadoop/app/flume/dataDirs
#Create the two directories above under the Flume installation directory
# Define and configure an HDFS sink (deliver the data to HDFS)
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://pc1:9000/home/app/tvdata/%Y%m%d
#On a cluster, use the externally advertised service address here
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.hdfs.rollInterval = 300
agent1.sinks.sink1.hdfs.rollSize = 67108864
agent1.sinks.sink1.hdfs.rollCount = 0
#agent1.sinks.sink1.hdfs.codeC = snappy #snappy compression is not enabled here
③ Create the directories
④ Upload the source data into the tvtest directory
⑤ Go to the Flume installation directory and start the agent
[hadoop@node2 flume]$bin/flume-ng agent -n agent1 -c conf -f conf/flume-conf.properties
⑥ Check the result
The collected data shows up garbled.
Per the official documentation, hdfs.fileType defaults to SequenceFile; setting agent1.sinks.sink1.hdfs.fileType = DataStream writes the data to HDFS as-is.
Delete the data already collected into HDFS and collect it again.
9. Writing MapReduce Programs to Parse and Clean the Raw Viewing Data and Extract the Fields the Business Needs
① Pre-process the source data and extract the required fields. This is a map-only MapReduce program that calls a DataUtil helper, which uses the jsoup jar to parse each line of the source data; the set-top box number and the date become the output key and everything else becomes the output value. Date/time parsing is handled by the TimeUtil class.
/*
* Parse the raw set-top box log data
*/
public class ParseAndFilterLog extends Configured implements Tool {
/*
* Only a Mapper is needed to parse the raw data
*/
public static class ExtractTVMsgLogMapper extends
//Mapper<LongWritable, BytesWritable, Text, Text> {
Mapper<LongWritable, Text, Text, Text> {
//public void map(LongWritable key, BytesWritable value, Context context)
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// The raw record (one line of input)
//String data = new String(value.getBytes(), 0, value.getLength());
String data = value.toString();
// Call the helper to parse the record into the format we need:
// stbNum + "@" + date + "@" + sn + "@" + p+ "@" + s + "@" + e + "@"
// + duration
DataUtil.transData(data, context);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: ParseAndFilterLog [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set("mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ParseAndFilterLog.class);
job.setMapperClass(ExtractTVMsgLogMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
//job.setInputFormatClass(SequenceFileInputFormat.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),new ParseAndFilterLog(), args);
System.exit(ec);
}
}
/**
*
* Parse the raw set-top box log data
*
*/
public class DataUtil {
@SuppressWarnings("unchecked")
public static void transData(String text,Context context) {
try {
//Parse the line with Jsoup
Document doc = Jsoup.parse(text);
//Get the WIC tag; each line contains exactly one WIC tag
Elements content = doc.getElementsByTag("WIC");
//Extract the set-top box number
String stbNum = content.get(0).attr("stbNum");
if(stbNum == null||"".equals(stbNum)){
return ;
}
//Extract the date
String date = content.get(0).attr("date");
if(date == null||"".equals(date)){
return ;
}
//Parse the A tags
Elements els = doc.getElementsByTag("A");
for (Element el : els) {
//End time
String e = el.attr("e");
if(e ==null||"".equals(e)){
break;
}
//Start time
String s = el.attr("s");
if(s == null||"".equals(s)){
break;
}
//Programme
String p = el.attr("p");
if(p == null||"".equals(p)){
break;
}
//Channel
String sn = el.attr("sn");
if(sn ==null||"".equals(sn)){
break ;
}
//URL-decode the programme name
p = URLDecoder.decode(p, "utf-8");
//Normalise the programme name, e.g. 天龙八部(1) and 天龙八部(2) belong to the same programme
int index = p.indexOf("(");
if (index != -1) {
p = p.substring(0, index);
}
//Start time in seconds
int startS = TimeUtil.TimeToSecond(s);
//End time in seconds
int startE = TimeUtil.TimeToSecond(e);
if (startE < startS) {
startE = startE + 24 * 3600;
}
//Viewing duration of this record
int duration = startE - startS;
context.write(new Text(stbNum + "@" + date), new Text(sn + "@" + p+ "@" + s + "@" + e + "@" + duration));
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
import java.util.ArrayList;
import java.util.List;
/**
*
* Time utilities
*
*/
public class TimeUtil {
/**
* Convert a time of the form 00:00:00 to seconds (int)
*
* @param time
* @return
*/
public static int TimeToSecond(String time) {
if (time == null||time.equals("")) {
return 0;
}
String[] my = time.split(":");
int hour = Integer.parseInt(my[0]);
int min = Integer.parseInt(my[1]);
int sec = Integer.parseInt(my[2]);
int totalSec = hour * 3600 + min * 60 + sec;
return totalSec;
}
/**
* Convert a time of the form 00:00:00 to seconds, returned as a String
*
* @param time
* @return
*/
public static String TimeToSecond2(String time) {
if (time == null) {
return "";
}
String[] my = time.split(":");
int hour = Integer.parseInt(my[0]);
int min = Integer.parseInt(my[1]);
int sec = Integer.parseInt(my[2]);
int totalSec = hour * 3600 + min * 60 + sec;
return totalSec + "";
}
/**
* Difference between two times given as second strings
* @param a_e
* @param a_s
* @return
*/
public static String getDuration(String a_e, String a_s) {
if (a_e == null || a_s == null) {
return 0 + "";
}
int ae = Integer.parseInt(a_e);
int as = Integer.parseInt(a_s);
return (ae - as) + "";
}
/**
* Convert a time of the form 00:00 to seconds (int)
*
* @param time
* @return
*/
public static int Time2ToSecond(String time) {
if (time == null) {
return 0;
}
String[] my = time.split(":");
int hour = Integer.parseInt(my[0]);
int min = Integer.parseInt(my[1]);
int totalSec = hour * 3600 + min * 60;
return totalSec;
}
/**
* List the minute marks between start and end
*
* @param time
* @return
*/
public static List<String> getTimeSplit(String start, String end) {
List<String> list = new ArrayList<String>();
String[] s = start.split(":");
int sh = Integer.parseInt(s[0]);
int sm = Integer.parseInt(s[1]);
String[] e = end.split(":");
int eh = Integer.parseInt(e[0]);
int em = Integer.parseInt(e[1]);
if (eh < sh) {
eh = 24;
}
if (sh == eh) {
for (int m = sm; m <= em; m++) {
int am = m + 1;
int ah = sh;
if (am == 60) {
am = 0;
ah += 1;
}
String hstr = "";
String mstr = "";
if (sh < 10) {
hstr = "0" + sh;
} else {
hstr = sh + "";
}
if (m < 10) {
mstr = "0" + m;
} else {
mstr = m + "";
}
String time = hstr + ":" + mstr ;
list.add(time);
}
} else {
for (int h = sh; h <= eh; h++) {
if (h == 24) {
break;
}
if (h == sh) {
for (int m = sm; m <= 59; m++) {
int am = m + 1;
int ah = h;
if (am == 60) {
am = 0;
ah += 1;
}
String hstr = "";
String mstr = "";
if (h < 10) {
hstr = "0" + h;
} else {
hstr = h + "";
}
if (m < 10) {
mstr = "0" + m;
} else {
mstr = m + "";
}
String time = hstr + ":" + mstr ;
list.add(time);
}
} else if (h == eh) {
for (int m = 0; m <= em; m++) {
int am = m + 1;
int ah = h;
if (am == 60) {
am = 0;
ah += 1;
}
String hstr = "";
String mstr = "";
if (h < 10) {
hstr = "0" + h;
} else {
hstr = h + "";
}
if (m < 10) {
mstr = "0" + m;
} else {
mstr = m + "";
}
String time = hstr + ":" + mstr ;
list.add(time);
}
} else {
for (int m = 0; m <= 59; m++) {
int am = m + 1;
int ah = h;
if (am == 60) {
am = 0;
ah += 1;
}
String hstr = "";
String mstr = "";
if (h < 10) {
hstr = "0" + h;
} else {
hstr = h + "";
}
if (m < 10) {
mstr = "0" + m;
} else {
mstr = m + "";
}
String time = hstr + ":" + mstr ;
list.add(time);
}
}
}
}
return list;
}
}
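To make the minute splitting concrete, here is a small hypothetical driver (not part of the original project) that runs TimeUtil on the start and end times of the first parsed sample record shown just below:

import java.util.List;

public class TimeUtilDemo {
    public static void main(String[] args) {
        // The first sample record below runs from 23:58:04 to 00:03:05, crossing midnight.
        List<String> minutes = TimeUtil.getTimeSplit("23:58:04", "00:03:05");
        // Prints [23:58, 23:59]; note that this implementation stops at midnight
        // and does not emit the minutes after 00:00.
        System.out.println(minutes);
        // Duration in seconds, wrapping past midnight: prints 301, matching the record's last field.
        int start = TimeUtil.TimeToSecond("23:58:04");
        int end = TimeUtil.TimeToSecond("00:03:05") + 24 * 3600;
        System.out.println(end - start);
    }
}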
After parsing, the data looks like:
01050908200000218@2012-09-22@浙江卫视@综艺精选@23:58:04@00:03:05@301
01050908200000218@2012-09-22@浙江卫视@综艺精选@00:18:03@00:23:03@300
The helper produces records in the format
stbNum + "@" + date + "@" + sn + "@" + p + "@" + s + "@" + e + "@" + duration
where stbNum is the set-top box number, date the date, s the record's start time, e its end time, p the programme and sn the channel.
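As a quick illustration of that layout (a hypothetical check, not project code), splitting one of the parsed lines on '@' yields the seven fields in exactly that order:

public class ParsedRecordDemo {
    public static void main(String[] args) {
        String line = "01050908200000218@2012-09-22@浙江卫视@综艺精选@23:58:04@00:03:05@301";
        String[] f = line.split("@");
        // f[0]=stbNum, f[1]=date, f[2]=sn (channel), f[3]=p (programme),
        // f[4]=s (start), f[5]=e (end), f[6]=duration in seconds
        System.out.println(f[2] + " / " + f[3] + " watched for " + f[6] + "s");
    }
}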
② From the output of ①, count the current viewers: an MR program named ExtractCurrentNum computes, for every minute, how many set-top boxes are currently watching.
/**
*
* From the output of the previous step, count the boxes currently watching in each minute
*
*/
public class ExtractCurrentNum extends Configured implements Tool {
public static class ExtractCurrentNumMapper extends
Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// stbNum0+"@"+date1+"@"+sn2+"@"+p3+"@"+s4+"@"+e5+"@"+duration6
String[] kv = StringUtils.split(value.toString(), "@");
// Drop malformed records
if (kv.length != 7) {
return;
}
// Set-top box number
String stbnum = kv[0].trim();
// Date
String date = kv[1].trim();
// Split the time span into per-minute marks, e.g. every minute between 23:51:45 and 23:56:45
List<String> list = TimeUtil.getTimeSplit(kv[4], kv[5]);
int size = list.size();
// Emit one record per minute
for (int i = 0; i < size; i++) {
// The minute mark derived from start and end
String min = list.get(i);
// Emit this box as currently watching in this minute
context.write(new Text(date + "@" + min), new Text(stbnum));
}
}
}
public static class ExtractCurrentNumReduce extends
Reducer<Text, Text, Text, Text> {
private Text result = new Text();
// Set of boxes currently watching
private Set<String> set_curnum = new HashSet<String>();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
set_curnum.clear();
for (Text value : values) {
set_curnum.add(value.toString());
}
// Number of boxes currently watching in this minute
result.set(set_curnum.size()+"");
context.write(key, result);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err
.println("Usage: ExtractProgramCurrentNum [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ExtractCurrentNum.class);
job.setMapperClass(ExtractCurrentNumMapper.class);
job.setReducerClass(ExtractCurrentNumReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new ExtractCurrentNum(), args);
System.exit(ec);
}
}
③ From the output of ①, compute each channel's daily number of viewers and per-capita viewing time
/**
*
* Compute each channel's daily number of viewers and per-capita viewing time
*
*/
public class ExtractChannelNumAndTimelen extends Configured implements Tool {
public static class ExtractChannelNumAndTimelenMapper extends
Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// stbNum0+"@"+date1+"@"+sn2+"@"+p3+"@"+s4+"@"+e5+"@"+duration6
String[] kv = StringUtils.split(value.toString(), "@");
// Drop malformed records
if (kv.length != 7) {
return;
}
// Set-top box number
String stbnum = kv[0].trim();
// Date
String date = kv[1].trim();
// Channel
String channel = kv[2].trim();
String duration = kv[6].trim();
// Emit the box number and viewing duration of each record
context.write(new Text(channel + "@" + date), new Text(stbnum + "@"
+ duration));
}
}
public static class ExtractChannelNumAndTimelenReduce extends
Reducer<Text, Text, Text, Text> {
private Text result = new Text();
// Set of distinct viewers
private Set<String> set_num = new HashSet<String>();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
set_num.clear();
int timelen = 0;
for (Text value : values) {
String[] arr = StringUtils.split(value.toString(), "@");
set_num.add(arr[0]);
// Duration field present
if (arr.length > 1) {
timelen += Integer.parseInt(arr[1]);
}
}
int num = set_num.size();
// Daily number of viewers and per-capita viewing time
result.set(num + "@" + timelen / num);
context.write(key, result);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err
.println("Usage: ExtractChannelNumAndTimelen [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ExtractChannelNumAndTimelen.class);
job.setMapperClass(ExtractChannelNumAndTimelenMapper.class);
job.setReducerClass(ExtractChannelNumAndTimelenReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new ExtractChannelNumAndTimelen(), args);
System.exit(ec);
}
}
④ From the output of ①, compute each channel's per-minute average audience and average reach
/**
*
* From the output of the previous step, compute each channel's per-minute average audience and average reach
*
*/
public class ExtractChannelAvgAndReachNum extends Configured implements Tool {
public static class ExtractChannelAvgAndReachNumMapper extends
Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// stbNum0+"@"+date1+"@"+sn2+"@"+p3+"@"+s4+"@"+e5+"@"+duration6
String[] kv = StringUtils.split(value.toString(), "@");
// Drop malformed records
if (kv.length != 7) {
return;
}
// Set-top box number
String stbnum = kv[0].trim();
// Date
String date = kv[1].trim();
// Channel
String channel = kv[2].trim();
// Start time in seconds
int start = TimeUtil.TimeToSecond(kv[4].trim());
// End time in seconds
int end = TimeUtil.TimeToSecond(kv[5].trim());
// Split the time span into per-minute marks, e.g. every minute between 23:51:45 and 23:56:45
List<String> list = TimeUtil.getTimeSplit(kv[4], kv[5]);
int size = list.size();
// Emit per-minute records
for (int i = 0; i < size; i++) {
// The minute mark derived from start and end
String min = list.get(i);
// Emit the per-minute audience mark and, if the dwell is long enough, the reach mark
if ((end - start) > 60) {
// Records longer than 60 s count towards reach
context.write(new Text(channel + "@" + date + "@" + min),
new Text(stbnum + "@" + stbnum));
}
context.write(new Text(channel + "@" + date + "@" + min),
new Text(stbnum + "@"));
}
}
}
public static class ExtractChannelAvgAndReachNumReduce extends
Reducer<Text, Text, Text, Text> {
private Text result = new Text();
// Set of viewers (audience)
private Set<String> set_avgnum = new HashSet<String>();
// Set of viewers who met the reach condition
private Set<String> set_reachnum = new HashSet<String>();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
set_avgnum.clear();
set_reachnum.clear();
for (Text value : values) {
String[] arr = StringUtils.split(value.toString(), "@");
set_avgnum.add(arr[0]);
// The reach condition was met
if (arr.length > 1) {
set_reachnum.add(arr[1]);
}
}
// Per-minute audience and reach counts
result.set(set_avgnum.size() + "@" + set_reachnum.size());
context.write(key, result);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err
.println("Usage: ExtractChannelAvgAndReachNum [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ExtractChannelAvgAndReachNum.class);
job.setMapperClass(ExtractChannelAvgAndReachNumMapper.class);
job.setReducerClass(ExtractChannelAvgAndReachNumReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new ExtractChannelAvgAndReachNum(), args);
System.exit(ec);
}
}
⑤ From the output of ①, compute each programme's per-minute average audience and average reach
/**
*
* From the output of the previous step, compute each programme's per-minute average audience and average reach
*
*/
public class ExtractProgramAvgAndReachNum extends Configured implements Tool {
public static class ExtractAvgAndReachNumMapper extends
Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// stbNum0+"@"+date1+"@"+sn2+"@"+p3+"@"+s4+"@"+e5+"@"+duration6
String[] kv = StringUtils.split(value.toString(), "@");
// Drop malformed records
if (kv.length != 7) {
return;
}
// Set-top box number
String stbnum = kv[0].trim();
// Date
String date = kv[1].trim();
// Programme
String column = kv[3].trim();
// Start time in seconds
int start = TimeUtil.TimeToSecond(kv[4].trim());
// End time in seconds
int end = TimeUtil.TimeToSecond(kv[5].trim());
// Split the time span into per-minute marks, e.g. every minute between 23:51:45 and 23:56:45
List<String> list = TimeUtil.getTimeSplit(kv[4], kv[5]);
int size = list.size();
// Emit per-minute records
for (int i = 0; i < size; i++) {
// The minute mark derived from start and end
String min = list.get(i);
// Emit the per-minute audience mark and, if the dwell is long enough, the reach mark
if ((end - start) > 60) {
// Records longer than 60 s count towards reach
context.write(new Text(column + "@" + date + "@" + min),
new Text(stbnum + "@" + stbnum));
}
context.write(new Text(column + "@" + date + "@" + min),
new Text(stbnum + "@"));
}
}
}
public static class ExtractAvgAndReachNumReduce extends
Reducer<Text, Text, Text, Text> {
private Text result = new Text();
// Set of viewers (audience)
private Set<String> set_avgnum = new HashSet<String>();
// Set of viewers who met the reach condition
private Set<String> set_reachnum = new HashSet<String>();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
set_avgnum.clear();
set_reachnum.clear();
for (Text value : values) {
String[] arr = StringUtils.split(value.toString(), "@");
set_avgnum.add(arr[0]);
// The reach condition was met
if (arr.length > 1) {
set_reachnum.add(arr[1]);
}
}
// Per-minute audience and reach counts
result.set(set_avgnum.size() + "@" + set_reachnum.size());
context.write(key, result);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: ExtractProgramAvgAndReachNum [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ExtractProgramAvgAndReachNum.class);
job.setMapperClass(ExtractAvgAndReachNumMapper.class);
job.setReducerClass(ExtractAvgAndReachNumReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new ExtractProgramAvgAndReachNum(), args);
System.exit(ec);
}
}
The StringUtils helper class wraps splitting a string into an array, treating a null input as an empty string.
public class StringUtils {
/*
* Split a string into an array
*/
public static String[] split(String value,String regex){
if(value==null)
value = "";
String[] valueItems = value.split(regex);
return valueItems;
}
}
⑥ From the output of ①, compute each programme's daily number of viewers and per-capita viewing time.
/**
*
* From the output of the previous step, compute each programme's daily number of viewers and per-capita viewing time
*
*/
public class ExtractProgramNumAndTimelen extends Configured implements Tool {
public static class ExtractProgramNumAndTimelenMapper extends
Mapper<LongWritable, Text, Text, Text> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// stbNum0+"@"+date1+"@"+sn2+"@"+p3+"@"+s4+"@"+e5+"@"+duration6
String[] kv = StringUtils.split(value.toString(), "@");
// Drop malformed records
if (kv.length != 7) {
return;
}
// Set-top box number
String stbnum = kv[0].trim();
// Date
String date = kv[1].trim();
// Programme
String column = kv[3].trim();
String duration = kv[6].trim();
// Emit the box number and viewing duration of each record
context.write(new Text(column + "@" + date),
new Text(stbnum + "@" + duration));
}
}
public static class ExtractProgramNumAndTimelenReduce extends
Reducer<Text, Text, Text, Text> {
private Text result = new Text();
// Set of distinct viewers
private Set<String> set_num = new HashSet<String>();
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
set_num.clear();
int timelen = 0;
for (Text value : values) {
String[] arr = StringUtils.split(value.toString(), "@");
set_num.add(arr[0]);
// Duration field present
if (arr.length > 1) {
timelen += Integer.parseInt(arr[1]);
}
}
int num = set_num.size();
// Daily number of viewers and per-capita viewing time
result.set(num + "@" + timelen/num);
context.write(key, result);
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: ExtractProgramNumAndTimelen [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
job.setJarByClass(ExtractProgramNumAndTimelen.class);
job.setMapperClass(ExtractProgramNumAndTimelenMapper.class);
job.setReducerClass(ExtractProgramNumAndTimelenReduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new ExtractProgramNumAndTimelen(), args);
System.exit(ec);
}
}
⑦ Combine the per-minute currently-watching counts with each channel's per-minute average audience and average reach to compute the channel's per-day, per-minute viewing indicators: AnalyzeCountChannelRating.
public class AnalyzeCountChannelRating extends Configured implements Tool {
public static class AnalyzeCountChannelRatingMapper extends
Mapper<Object, Text, Text, Text> {
// Map of currently-watching counts, keyed by date@minute
private Map<String, String> curNumMap = new HashMap<String, String>();
/**
* Read the distributed cache files
*/
@SuppressWarnings("deprecation")
protected void setup(Context context) throws IOException,
InterruptedException {
BufferedReader br;
String infoAddr = null;
// Paths of the cached files
Path[] cacheFilesPaths = context.getLocalCacheFiles();
for (Path path : cacheFilesPaths) {
String pathStr = path.toString();
br = new BufferedReader(new FileReader(pathStr));
while (null != (infoAddr = br.readLine())) {
// Read and parse the currently-watching data line by line
String[] tvjoin = StringUtils.split(infoAddr.toString(),
"@");
if (tvjoin.length == 3) {
curNumMap.put(
tvjoin[0].trim() + "@" + tvjoin[1].trim(),
tvjoin[2].trim());
}
}
}
}
@Override
protected void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// channel0 + "@" + date1 + "@" + min2+avgnum3 +reachnum4
String[] kv = StringUtils.split(value.toString(), "@");
if (kv.length != 5) {
return;
}
// Average audience in this minute
int avgnum = Integer.parseInt(kv[3].trim());
// Reach count in this minute
int reachnum = Integer.parseInt(kv[4].trim());
// Total boxes currently watching in this minute
int currentStbnum = Integer.parseInt(curNumMap.get(kv[1].trim()
+ "@" + kv[2].trim()));
// Rating
float tvrating = (float) avgnum / 25000 * 100;
// Market share
float marketshare = (float) avgnum / currentStbnum * 100;
// Reach rate
float reachrating = (float) reachnum / 25000 * 100;
// Emit all the computed indicators
context.write(value, new Text(tvrating + "@" + reachrating + "@"
+ marketshare));
}
}
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length < 2) {
System.err
.println("Usage: AnalyzeCountChannelRating cache in [...] " );
System.exit(2);
}
Job job = Job.getInstance();
// Set the separator between the output key and value
job.getConfiguration().set(
"mapreduce.output.textoutputformat.separator", "@");
// Add the cache files to the distributed cache
FileSystem fs = FileSystem.get(conf);
FileStatus[] dirstatus = fs.listStatus(new Path(otherArgs[0]));
for (FileStatus file : dirstatus) {
job.addCacheFile(file.getPath().toUri());
}
job.setJarByClass(AnalyzeCountChannelRating.class);
job.setMapperClass(AnalyzeCountChannelRatingMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// Input paths
for (int i = 1; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
// Output path
FileOutputFormat.setOutputPath(job, new Path(
otherArgs[otherArgs.length - 1]));
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int ec = ToolRunner.run(new Configuration(),
new AnalyzeCountChannelRating(), args);
System.exit(ec);
}
}
The input paths are
hdfs://pc2:9000/home/ECN/part-r-00000
hdfs://pc2:9000/home/ECAARN/part-r-00000
and the output path is
hdfs://pc2:9000/home/ACCR
[hadoop@pc1 hadoop]$ bin/hadoop jar ACCR.jar com.pc.hadoop.tvdata.AnalyzeCountChannelRating hdfs://pc2:9000/home/ECN/part-r-00000 hdfs://pc2:9000/home/ECAARN/part-r-00000 hdfs://pc2:9000/home/ACCR/
Press Enter to run; an error is thrown:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster1:9000/home/hadoop/ECAARN/part-r-00000, expected: hdfs://cluster1
The code does not recognise this HDFS file system; it expects hdfs://cluster1.
A fix was eventually found at http://blog.csdn.net/a2011480169/article/details/51804139.
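The linked post explains the cause: FileSystem.get(conf) returns the default file system from the configuration, which does not have to match the scheme and authority of the path passed on the command line. A common workaround (sketched here as an assumption, not necessarily the exact fix from the post) is to derive the FileSystem from the Path itself, for example with a small helper like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileHelper {
    // Hypothetical helper, not part of the original project: register every file under
    // cacheDirPath with the job's distributed cache. The FileSystem is obtained from the
    // Path itself instead of FileSystem.get(conf), which returns the default file system
    // and can throw the "Wrong FS" IllegalArgumentException seen above.
    public static void addDirToCache(Job job, Configuration conf, String cacheDirPath)
            throws IOException {
        Path dir = new Path(cacheDirPath);
        FileSystem fs = dir.getFileSystem(conf);
        for (FileStatus file : fs.listStatus(dir)) {
            job.addCacheFile(file.getPath().toUri());
        }
    }
}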
Re-run the job.
⑧ Combine the per-minute currently-watching counts with each programme's per-minute average audience and average reach to compute the programme's per-day, per-minute viewing indicators: AnalyzeCountProgramRating. The class is the same as AnalyzeCountChannelRating above except that it is keyed on the programme (column) instead of the channel, so the code is not repeated here.
The input paths are
hdfs://pc2:9000/home/ECN/part-r-00000
hdfs://pc2:9000/home/EPRAAN/part-r-00000
and the output path is
hdfs://pc2:9000/home/ACPR
[hadoop@pc1 hadoop]$ bin/hadoop jar ACPR.jar com.pc.hadoop.tvdata.AnalyzeCountProgramRating hdfs://pc1:9000/home/ECN/part-r-00000 hdfs://pc1:9000/home/EPRAAN/part-r-00000 hdfs://pc1:9000/home/ACPR/
Since Maven is not installed, the jar is exported manually from the IDE, and the fully qualified class name is then used on the command line.
10. Loading the MapReduce Output into the Data Warehouse with Hive and Analysing It There
① First create the Hive tables (external tables, as is usual)
Create the channellog_min external table
create external table channellog_min(tvchannel string,tvtime string,tvmin string,avgnum int ,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/channellog_min/';
Create the channellog_hour external table
create external table channellog_hour(tvchannel string,tvtime string,tvhour string,avgnum int,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/channellog_hour/';
Create the channellog_day external table
create external table channellog_day(tvchannel string,tvtime string,avgnum int,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/channellog_day/';
Create the columnlog_min external table
create external table columnlog_min(tvcolumn string,tvtime string,tvmin string,avgnum int ,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/columnlog_min/';
Create the columnlog_hour external table
create external table columnlog_hour(tvcolumn string,tvtime string,tvhour string,avgnum int,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/columnlog_hour/';
Create the columnlog_day external table
create external table columnlog_day(tvcolumn string,tvtime string,avgnum int,reachnum int ,tvrating double,reachrating double,marketshare double) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/columnlog_day/';
Create the channellog_count external table
create external table channellog_count(tvchannel string,tvtime string,num int ,timelen int ) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/channellog_count/';
Create the columnlog_count external table
create external table columnlog_count(tvcolumn string,tvtime string,num int ,timelen int ) PARTITIONED BY(tvdate string) row format delimited fields terminated by '@' location '/home/hive/warehouse/columnlog_count/';
Start MySQL, then start Hive, and create the tables one after another.
② With all eight tables created, load the corresponding data into each of them.
1. Load the channel per-minute indicator data into channellog_min:
load data inpath '/home/ACCR/part-r-00000' into table channellog_min
partition(tvdate='2012-09-17');
hive> select * from channellog_min;
2. Insert from the minute table into the hour table, computing the hourly averages of audience, rating, market share, reach rate and so on in the same statement; Hive runs the underlying MapReduce itself, so no extra MapReduce code is needed.
insert overwrite table channellog_hour partition (tvdate='2012-09-17')
select
tvchannel,tvtime,concat(substr(tvmin,0,2),':00'),sum(avgnum)/count(*),sum(reachnum)/count(*),sum(tvrating)/count(*),sum(reachrating)/count(*),sum(marketshare)/count(*)
from channellog_min where tvdate='2012-09-17' group by
tvchannel,tvtime,concat(substr(tvmin,0,2),':00');
3. Query the channel minute table and insert into the channel day table
insert overwrite table channellog_day partition (tvdate='2012-09-17')
select tvchannel,tvtime,
sum(avgnum)/count(*),sum(reachnum)/count(*),sum(tvrating)/count(*),sum(reachrating)/count(*),sum(marketshare)/count(*)
from channellog_min where tvdate='2012-09-17' group by
tvchannel,tvtime;
4. Load the viewer-count and viewing-time data into the channellog_count table
load data inpath '/home/ECNAT/part-r-00000' into table
channellog_count partition(tvdate='2012-09-17');
5. Load the programme per-minute indicator data into columnlog_min:
load data inpath '/home/ACPR/part-r-00000' into table columnlog_min
partition(tvdate='2012-09-17');
6. Query the programme minute table and insert into the programme hour table
insert overwrite table columnlog_hour partition (tvdate='2012-09-17')
select
tvcolumn,tvtime,concat(substr(tvmin,0,2),':00'),sum(avgnum)/count(*),sum(reachnum)/count(*),sum(tvrating)/count(*),sum(reachrating)/count(*),sum(marketshare)/count(*)
from columnlog_min where tvdate='2012-09-17' group by
tvcolumn,tvtime,concat(substr(tvmin,0,2),':00');
7. Query the programme minute table and insert into the programme day table
insert overwrite table columnlog_day partition (tvdate='2012-09-17')
select tvcolumn,tvtime,
sum(avgnum)/count(*),sum(reachnum)/count(*),sum(tvrating)/count(*),sum(reachrating)/count(*),sum(marketshare)/count(*)
from columnlog_min where tvdate='2012-09-17' group by
tvcolumn,tvtime;
8. Load the viewer-count and viewing-time data into the columnlog_count table
load data inpath '/home/EPNAT/part-r-00000' into table columnlog_count
partition(tvdate='2012-09-17');
11. Exporting the Final Hive Results into MySQL with an Application or with Sqoop
1. Use the SQLyog database management tool to connect to MySQL and create a dedicated database for this broadcast-TV project.
See http://blog.csdn.net/zoeyen_/article/details/78722197 for how to use SQLyog.
2. Create the corresponding tables in MySQL.
Note: the column types must match, otherwise the export fails.
①channellog_min
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table channellog_min --columns tvchannel,tvtime,tvhour,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/channellog_min/tvdate=2012-09-17 --input-fields-terminated-by '@'
②channellog_hour
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table channellog_hour --columns tvchannel,tvtime,tvhour,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/channellog_hour/tvdate=2012-09-17 --input-fields-terminated-by '@'
③channellog_day
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table channellog_day --columns tvchannel,tvtime,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/channellog_day/tvdate=2012-09-17 --input-fields-terminated-by '@'
④channellog_count
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table channellog_count --columns tvchannel,tvtime,num,timelen --export-dir /home/hive/warehouse/channellog_count/tvdate=2012-09-17 --input-fields-terminated-by '@'
⑤columnlog_min
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table columnlog_min --columns tvcolumn,tvtime,tvhour,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/columnlog_min/tvdate=2012-09-17 --input-fields-terminated-by '@'
⑥columnlog_hour
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table columnlog_hour --columns tvcolumn,tvtime,tvhour,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/columnlog_hour/tvdate=2012-09-17 --input-fields-terminated-by '@'
⑦columnlog_day
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table columnlog_day --columns tvcolumn,tvtime,avgnum,reachnum,tvrating,reachrating,marketshare --export-dir /home/hive/warehouse/columnlog_day/tvdate=2012-09-17 --input-fields-terminated-by '@'
⑧columnlog_count
sqoop export --connect 'jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8' --username hive --password hive --table columnlog_count --columns tvcolumn,tvtime,num,timelen --export-dir /home/hive/warehouse/columnlog_count/tvdate=2012-09-17 --input-fields-terminated-by '@'
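As section 11 notes, a hand-written application can replace Sqoop. A minimal hypothetical JDBC sketch for one table is shown below; the connection details are copied from the Sqoop commands above, and the input file is assumed to be a local copy of the exported partition file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ChannelDayExporter {
    // Hypothetical alternative to the sqoop export: read a '@'-delimited result
    // file from the channellog_day partition and insert it into MySQL.
    public static void main(String[] args) throws Exception {
        String resultFile = args[0]; // e.g. a local copy of part-r-00000
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://pc1/tv?useUnicode=true&characterEncoding=utf-8", "hive", "hive");
             PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO channellog_day (tvchannel,tvtime,avgnum,reachnum,tvrating,reachrating,marketshare) "
                + "VALUES (?,?,?,?,?,?,?)");
             BufferedReader br = new BufferedReader(new FileReader(resultFile))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.split("@");
                if (f.length != 7) continue;    // skip malformed rows
                ps.setString(1, f[0]);
                ps.setString(2, f[1]);
                ps.setInt(3, Integer.parseInt(f[2]));
                ps.setInt(4, Integer.parseInt(f[3]));
                ps.setDouble(5, Double.parseDouble(f[4]));
                ps.setDouble(6, Double.parseDouble(f[5]));
                ps.setDouble(7, Double.parseDouble(f[6]));
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}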
12. Automating the Workflow with Azkaban
For installing and configuring Azkaban, see
http://blog.csdn.net/zoeyen_/article/details/79301684
Log in to Azkaban and create a new project.
Upload the zip package to the project.
Check the Sqoop workflow.
Check the results in the database.
Done!