在debug hive的问题时,我们经常需要在源码里面增加必要的日志信息,来跟踪变量的变化或者是方法执行的情况,性能等。。hive里面有下面几个日志的实现方式。
1种是常用的通过
org.apache.commons.logging.Log
和
org.apache.commons.logging.LogFactory
类实现的。
比如要打印类的info信息时,只需要下面这样就可以。
static final private Log LOG = LogFactory.getLog(CLASS_NAME) LOG.info("xxxxxx");
如果要跟踪一个方法的执行时间,只需要在在方法调用前后各加一条日志信息记录当前时间戳,然后取差值即可。
另一种是LogHelper
org.apache.hadoop.hive.ql.session.SessionState.LogHelper
LogHelper类是SessionState的内部类
输出日志最终调用的是LOG.xxx和对应日志输出的println方法(比如System.out.println/log4j等)
但是日志只有两个级别,info和error。
以printInfo方法为例,具体实现:
public void printInfo(String info, String detail) { if (!getIsSilent()) { getInfoStream().println(info); } LOG.info(info + StringUtils.defaultString(detail)); //调用LOG.info(即第一种方法) }
使用的sample:
static final private LogHelper console = new LogHelper (LOG); console.printError("Authorization failed:" + authExp.getMessage() + ". Use SHOW GRANT to get more details." );
还有一种重要的类是PerfLogger类,可以实现记录方法执行时间的信息,在一些情况下可以取代第一种记录性能的方法。
PerfLogger类主要是用在Driver类的compile方法中,用来记录各个step使用的时间。
看下具体实现:
首先声明存储starttime和endtime的map:
protected final Map<String, Long> startTimes = new HashMap<String, Long>(); protected final Map<String, Long> endTimes = new HashMap<String, Long>();
主要方法:
1).构造方法可以传入一个boolean参数,控制是否reset(默认是false)
protected static final ThreadLocal<PerfLogger> perfLogger = new ThreadLocal<PerfLogger>(); public static PerfLogger getPerfLogger(boolean resetPerfLogger) { if (SessionState.get() == null) { if (perfLogger.get() == null || resetPerfLogger) { perfLogger.set(new PerfLogger()); } return perfLogger.get(); } else { return SessionState.get().getPerfLogger(resetPerfLogger); } }
2).PerfLogBegin 用来记录开始时间,主要是使用了LOG.info(第一种记录日志的方法)
public void PerfLogBegin(String callerName, String method) { long startTime = System.currentTimeMillis(); LOG.info( "<PERFLOG method=" + method + " from=" + callerName + ">"); startTimes.put(method, new Long(startTime)); //将方法和对应的startTime放在startTimes的map中 }
3).相应的PerfLogEnd用来记录结束时间
public long PerfLogEnd(String callerName, String method) { Long startTime = startTimes.get(method); long endTime = System. currentTimeMillis(); long duration = -1; endTimes.put(method, new Long(endTime)); StringBuilder sb = new StringBuilder("</PERFLOG method=" ).append(method); if (startTime != null) { sb.append( " start=").append(startTime); } sb.append(" end=").append(endTime); if (startTime != null) { duration = endTime - startTime.longValue(); sb.append( " duration=").append(duration); } sb.append(" from=").append(callerName).append( ">"); LOG.info(sb); return duration; }
getStartTime和getEndTime分别通过对应的map来获取starttime和endtime
4.是不是所有的method都可以用这种方法获取执行的性能?答案是no,在PerfLogger类中定义了部分属性,对应于method
public static final String ACQUIRE_READ_WRITE_LOCKS = "acquireReadWriteLocks"; public static final String COMPILE = "compile"; public static final String PARSE = "parse"; public static final String ANALYZE = "semanticAnalyze"; public static final String DO_AUTHORIZATION = "doAuthorization"; public static final String DRIVER_EXECUTE = "Driver.execute"; public static final String INPUT_SUMMARY = "getInputSummary"; public static final String GET_SPLITS = "getSplits"; public static final String RUN_TASKS = "runTasks"; public static final String SERIALIZE_PLAN = "serializePlan"; public static final String DESERIALIZE_PLAN = "deserializePlan"; public static final String CLONE_PLAN = "clonePlan"; public static final String TASK = "task."; public static final String RELEASE_LOCKS = "releaseLocks"; public static final String PRUNE_LISTING = "prune-listing"; public static final String PARTITION_RETRIEVING = "partition-retrieving"; public static final String PRE_HOOK = "PreHook."; public static final String POST_HOOK = "PostHook."; public static final String FAILURE_HOOK = "FailureHook."; public static final String DRIVER_RUN = "Driver.run"; public static final String TIME_TO_SUBMIT = "TimeToSubmit"; public static final String TEZ_SUBMIT_TO_RUNNING = "TezSubmitToRunningDag"; public static final String TEZ_BUILD_DAG = "TezBuildDag"; public static final String TEZ_SUBMIT_DAG = "TezSubmitDag"; public static final String TEZ_RUN_DAG = "TezRunDag"; public static final String TEZ_CREATE_VERTEX = "TezCreateVertex."; public static final String TEZ_RUN_VERTEX = "TezRunVertex."; public static final String TEZ_INITIALIZE_PROCESSOR = "TezInitializeProcessor"; public static final String TEZ_RUN_PROCESSOR = "TezRunProcessor"; public static final String TEZ_INIT_OPERATORS = "TezInitializeOperators"; public static final String LOAD_HASHTABLE = "LoadHashtable"; public static final String ORC_GET_SPLITS = "OrcGetSplits";
再来看一下其调用方式:
比如org.apache.hadoop.hive.ql.Driver类中的compile方法: static final private String CLASS_NAME = Driver. class.getName(); public int compile(String command, boolean resetTaskIds) { PerfLogger perfLogger = PerfLogger.getPerfLogger(); //实例化一个PerfLogger的对象 ....... perfLogger.PerfLogBegin( CLASS_NAME, PerfLogger.PARSE); //记录开始时间,传入本类的类名记录的方法(方法对应了PerfLogger类中对应的属性) ParseDriver pd = new ParseDriver(); ASTNode tree = pd.parse(command, ctx); tree = ParseUtils.findRootNonNullToken(tree); perfLogger.PerfLogEnd( CLASS_NAME, PerfLogger.PARSE);
记录parse消耗时间的对应日志:
14/07/31 14:33:21 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver> ..... 14/07/31 14:33:21 INFO log.PerfLogger: </PERFLOG method=parse start=1406788401914 end=1406788401916 duration=2 from=org.apache.hadoop.hive.ql.Driver>
3种方法可以根据需要选择使用。