Time: 2017.9.14
Targets: user-activity data
- Process the 2016 log data;
- Import from HDFS into HBase;
Processing the logs
Data format
hadoop fs -ls /warehouse/orc_elapsed_log
/warehouse/orc_elapsed_log/dt=20160101
Run the job (a Hive job written in Java):
cd wangchenlong/workspace/user-profile/processor/profile
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160101 20160131
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160201 20160229
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160301 20160331
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160401 20160430
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160501 20160731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160801 20161031
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20161101 20161231
hadoop fs -ls /tmp/wangchenlong/log_event
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160601 20160731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160912 20161031
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20161214 20161231
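Each invocation takes an inclusive start/end date in yyyyMMdd form, and the end argument is the last day of the month (note the leap-year 20160229 above). A minimal sketch of that date arithmetic with java.time; the class and method names are illustrative, not from the repo:

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Illustrative helper: build the yyyyMMdd start/end pair for one month,
// matching the per-month arguments passed to log_analysis.Main above.
public class MonthRange {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String[] range(int year, int month) {
        YearMonth ym = YearMonth.of(year, month);
        return new String[] { ym.atDay(1).format(FMT), ym.atEndOfMonth().format(FMT) };
    }

    public static void main(String[] args) {
        String[] feb = range(2016, 2);
        System.out.println(feb[0] + " " + feb[1]); // leap year: 20160201 20160229
    }
}
```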
The Hive Maven jar conflicts with the ORC jar: the versions differ, so the classes differ and some methods cannot be found. For reference:
java.lang.Exception: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1044)
The cause is that hive-exec and orc-mapreduce depend on different versions of hive-storage-api, so the VectorizedRowBatch class resolves inconsistently.
Test:
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_process.HiveMRDemo /tmp/wangchenlong/orc
Solution: add hive-storage-api explicitly, forcing the newer classes to be used:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-storage-api</artifactId>
    <version>2.4.0</version>
</dependency>
Importing into HBase
Import from HDFS into HBase; inspect the tables:
hbase shell
list
desc 'cy_event'
scan 'cy_event', {LIMIT=>5} # show the first 5 rows
Table data:
user_time|1488384000000|29768601 column=info:assess_num, timestamp=1505384441438, value=3
user_time|1488384000000|29768601 column=info:duration, timestamp=1505384441438, value=42654
user_time|1488384000000|29768601 column=info:event_name, timestamp=1505384441438, value=user_time
user_time|1488384000000|29768601 column=info:event_time, timestamp=1505384441438, value=20170302_000000
user_time|1488384000000|29768601 column=info:login_zone, timestamp=1505384441438, value=0
user_time|1488384000000|29768601 column=info:uid, timestamp=1505384441438, value=29768601
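The rowkey layout here is event_name|epoch_millis|uid. A standalone sketch decoding the first rowkey from the scan above; Asia/Shanghai is an assumption about the logs' timezone:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Decode one rowkey from the scan output above. The middle field is the
// event time in epoch milliseconds; formatting it in Asia/Shanghai (an
// assumption) reproduces the info:event_time value 20170302_000000.
public class RowKeyDecode {
    static String decode(String rowKey) {
        String[] parts = rowKey.split("\\|");
        String when = Instant.ofEpochMilli(Long.parseLong(parts[1]))
                .atZone(ZoneId.of("Asia/Shanghai"))
                .format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        return parts[0] + " uid=" + parts[2] + " time=" + when;
    }

    public static void main(String[] args) {
        System.out.println(decode("user_time|1488384000000|29768601"));
    }
}
```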
Run the import from HDFS into HBase:
cd wangchenlong/workspace/user-profile/processor/profile
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170101 20170331
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170401 20170731
hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170731 20170909
The Processor business class
public class UserTimeHBaseProcessor extends BaseSavedProcessor {
    public static final String DEF_NAME = UserTimeHBaseProcessor.class.getSimpleName();

    @Override
    protected void onProcess(LogEntity entity) {
        super.onProcess(entity);
        String line = entity.original_line;
        // Expected format: uid|event_name|time|login_zone|duration|assess_num
        String[] items = line.split("\\|");
        if (items.length != 6) {
            return;
        }
        Map<String, String> map = new HashMap<>();
        String uid = items[0];
        String event_name = items[1];
        String time = items[2];
        Date date = LaDateUtils.parseWriteDate(time);
        if (date == null) {
            return;
        }
        String login_zone = items[3];
        String duration = items[4];
        String assess_num = items[5];
        // RowKey layout: event_name|epoch_millis|uid
        String rowKey = event_name + "|" + date.getTime() + "|" + uid;
        map.put("uid", uid);
        map.put("event_name", event_name);
        map.put("event_time", time);
        map.put("login_zone", login_zone);
        map.put("duration", duration);
        map.put("assess_num", assess_num);
        saveHBase(rowKey, map);
    }
}
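BaseSavedProcessor and LaDateUtils are repo classes not shown here, so the split + rowkey step above can be exercised standalone. In this sketch the yyyyMMdd_HHmmss time format, the Asia/Shanghai zone, and the sample line are all assumptions chosen to match the table data shown earlier:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Standalone sketch of the parsing done in onProcess. The time format and
// Asia/Shanghai zone are assumptions; LaDateUtils.parseWriteDate in the
// repo may differ.
public class RowKeyBuild {
    static String rowKey(String line) {
        String[] items = line.split("\\|");
        if (items.length != 6) {
            return null; // same guard as onProcess
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd_HHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai"));
        try {
            Date date = fmt.parse(items[2]);
            return items[1] + "|" + date.getTime() + "|" + items[0];
        } catch (ParseException e) {
            return null; // mirrors the null-date guard in onProcess
        }
    }

    public static void main(String[] args) {
        // Invented sample line matching the scan output above.
        System.out.println(rowKey("29768601|user_time|20170302_000000|0|42654|3"));
    }
}
```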
Registering the Processor
public class ProcessorRegister extends BaseMainManager {
    private static class Holder {
        private static ProcessorRegister sInstance = new ProcessorRegister();
    }

    public static ProcessorRegister getInstance() {
        return Holder.sInstance;
    }

    private ProcessorRegister() {
        super();
        //++++++++++++++++++++ add processors here ++++++++++++++++++++/
        // registerProcessor(UserTimeProcessor.DEF_NAME, new UserTimeProcessor());
        registerProcessor(UserTimeHBaseProcessor.DEF_NAME, new UserTimeHBaseProcessor());
        //++++++++++++++++++++ add processors here ++++++++++++++++++++/
    }
}
Execution
case "-p":
    main = new ProcessMain(args[1], args[2], LaValues.PathFormat.USER_TIME_PATH_FORMAT); // process mode
    break;
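Putting the three invocation styles in this note together (no flag, -h, -p), the dispatch in Main presumably looks like the sketch below. Only the "-p" case appears in the source; the other branches and the mode names are placeholders, not repo identifiers:

```java
// Illustrative flag dispatch. Only the "-p" case is shown in the source;
// "-h" and the no-flag default are inferred from the commands above, and
// the returned mode names are placeholders.
public class FlagDispatch {
    static String mode(String flag) {
        switch (flag) {
            case "-h": return "hive";    // placeholder for the -h pipeline
            case "-p": return "process"; // HDFS -> HBase import, as above
            default:   return "default"; // the plain date-range runs
        }
    }

    public static void main(String[] args) {
        System.out.println(mode("-p"));
    }
}
```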
This uses the Log_Analysis analysis framework.
OK, that's all!