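Logging Conventions

This article does not discuss which logging framework to use (typically slf4j with log4j, log4j2, or logback) or how to use it. It focuses on when to log, how to log, which log level to use, what key information to include, and what format to print it in.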

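Why standardize logging

Why do we need logs? Because we need to know the running state inside the code and to diagnose problems quickly. Why standardize them? Standardized logs are easier to read, easier to search from a terminal, and easier to feed into log analysis tools later.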

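How to produce standardized logs

Log output (in Java) has two parts. One is the metadata printed by the logging library (time, method, process, line number, and so on). The other is the important business flow and variable values. The metadata can be standardized through the logging framework's conversion pattern. The business flow and variables need agreed conventions, improved continuously through code review and by chasing bugs together with peers.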

Logging recommendations

1. What developers (RD) should watch for (the English text below is quoted from the Google blog post "Optimal Logging")
Good things to log:

Important startup configuration
Changes to persistent data
Requests and responses between major system components
Significant state changes (微餐厅 logs all of its important state transitions)
User interaction
Calls with a known risk of failure
Waits on conditions that could take measurable time to satisfy
Periodic progress during long-running tasks
Significant branch points of logic and conditions that led to the branch
Summaries of processing steps or events from high level functions - Avoid logging every step of a complex process in low-level functions. In other words: for a multi-step operation, summarize outside the loop and emit one concise log line instead of logging every step.
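
The last point can be sketched as follows. This is only an illustration: BatchSummary and importOrders are hypothetical names, and treating an empty order ID as a failure is an assumption made to keep the example small. The loop only updates counters, and the caller logs one summary line at the end.

```java
import java.util.List;

/** Hypothetical helper: collects per-item outcomes and renders one summary line. */
class BatchSummary {
    private final String task;
    private int succeeded;
    private int failed;

    BatchSummary(String task) {
        this.task = task;
    }

    void recordSuccess() {
        succeeded++;
    }

    void recordFailure() {
        failed++;
    }

    /** Renders the outcome as space-separated key=value pairs. */
    String toLogLine() {
        return "task=" + task
                + " total=" + (succeeded + failed)
                + " succeeded=" + succeeded
                + " failed=" + failed;
    }

    /** Illustrative high-level function: no log call inside the loop body. */
    static String importOrders(List<String> orderIds) {
        BatchSummary summary = new BatchSummary("importOrders");
        for (String id : orderIds) {
            // Assumption for the sketch: an empty ID stands for a failed item.
            if (id.isEmpty()) {
                summary.recordFailure();
            } else {
                summary.recordSuccess();
            }
        }
        return summary.toLogLine();
    }
}
```

The caller would then pass the returned line to a single logger.info call once the loop has finished.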

Bad things to log:

Function entry - Don't log a function entry unless it is significant or logged at the debug level. (微餐厅 should adopt this.)
Data within a loop - Avoid logging from many iterations of a loop. It is OK to log from iterations of small loops or to log periodically from large loops.
Content of large messages or files - Truncate or summarize the data in some way that will be useful to debugging.
Benign errors - Errors that are not really errors can confuse the log reader. This sometimes happens when exception handling is part of successful execution flow. In short: some errors are not real errors, so think before logging at error level. An exception does not automatically warrant the error level.
Repetitive errors - Do not repetitively log the same or similar error. This can quickly fill a log and hide the actual cause. Frequency of error types is best handled by monitoring. Logs only need to capture detail for some of those errors.
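
One possible way to act on the "Repetitive errors" point is to count occurrences per error type and log full detail only for a few of them. The sketch below is an assumption-laden illustration, not something from the Google article: ErrorThrottle is a hypothetical class, and the policy (log the first N occurrences, then every Nth) is just one reasonable choice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: decides which occurrences of a repeated error get logged. */
class ErrorThrottle {
    private final long firstN;
    private final long everyNth;
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    ErrorThrottle(long firstN, long everyNth) {
        if (everyNth <= 0) {
            throw new IllegalArgumentException("everyNth must be positive");
        }
        this.firstN = firstN;
        this.everyNth = everyNth;
    }

    /** Counts the occurrence and returns true if it should be logged in detail. */
    boolean shouldLog(String errorKey) {
        long n = counts.computeIfAbsent(errorKey, k -> new AtomicLong()).incrementAndGet();
        return n <= firstN || n % everyNth == 0;
    }

    /** Total occurrences seen so far; this is the number a monitor would read. */
    long count(String errorKey) {
        AtomicLong c = counts.get(errorKey);
        return c == null ? 0 : c.get();
    }
}
```

A call site would wrap its logger.error call in if (throttle.shouldLog("gearman.timeout")), while the counter keeps the true frequency available for monitoring.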

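Appendix: the Google article "Optimal Logging". If the original cannot be reached from behind the firewall, a copy is available on 360doc: http://www.360doc.com/content/17/0502/14/42565517_650321240.shtml

Some common logging issues:

Print important information in key=value format, with pairs separated by spaces. Many log processing tools (Splunk, Logentries) recommend this. Netflix's zuul separates key and value with ":", and we also have many examples that use ":", but here we standardize on the k=v format.
Build log messages with {} placeholders rather than + concatenation. This improves performance somewhat, because the message is only formatted when the log level is enabled. Netflix's zuul still uses + concatenation in many places, but Netflix eureka uses {} throughout.
Merge logs that can be merged into a single line instead of printing them in several separate calls.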

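// Avoid: three separate calls, each concatenating strings
logger.debug("att:" + matcher.group());
logger.debug(", att key:" + key);
logger.debug(", att value:" + value);
// Recommended: one call, {} placeholders, key=value pairs separated by spaces
logger.debug("att={} att_key={} att_value={}", matcher.group(), key, value);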
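
The performance benefit of {} comes from deferring message construction until the logger knows the level is enabled. To show that idea without assuming slf4j is on the classpath, the sketch below swaps in the JDK's own java.util.logging and its Supplier overload (Logger.fine(Supplier<String>)), which defers construction in a comparable way. LazyLoggingDemo and its methods are hypothetical names used only for this illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical demo: counts how often a log message is actually built. */
class LazyLoggingDemo {
    static final AtomicInteger RENDER_COUNT = new AtomicInteger();

    /** Stands in for any message that is costly to build. */
    static String describe(String key, String value) {
        RENDER_COUNT.incrementAndGet();
        return "att_key=" + key + " att_value=" + value;
    }

    /** Logs one FINE message on a logger set to the given level and reports how many times it was built. */
    static int renderCountAt(Level level) {
        Logger logger = Logger.getLogger("demo.lazy." + level.getName());
        logger.setUseParentHandlers(false); // keep the demo silent on the console
        logger.setLevel(level);
        RENDER_COUNT.set(0);
        // The supplier is only invoked if FINE is loggable for this logger.
        logger.fine(() -> describe("color", "red"));
        return RENDER_COUNT.get();
    }
}
```

With the logger at INFO the message is never built. At FINE it is built once. Note that with either approach the arguments themselves (such as matcher.group()) are still evaluated at the call site; only the string formatting is skipped.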

Within the same package or class, if many logs share the same format, print them through one common method, for example:

// Avoid: the same error written in two different ad hoc formats
logger.error("Gearman timeout exception for request " + getRequestID() + " value: " + value, e);
logger.error("RequestID: " + getRequestID() + ", Error Message: Gearman timeout exception: " + e);
// Recommended: one shared formatting method
logger.error(getErrorMessage(getRequestID(), getErrorMessage(), e));
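
The snippet above does not show what the shared method looks like. A minimal sketch might be the following, assuming the k=v format adopted earlier. ErrorMessages.format and its parameter names are hypothetical, not the original getErrorMessage.

```java
/** Hypothetical shared formatter so every error in a package has the same shape. */
class ErrorMessages {
    /** Renders request ID, message, and exception type as space-separated key=value pairs. */
    static String format(String requestId, String message, Throwable cause) {
        String causeName = (cause == null) ? "none" : cause.getClass().getSimpleName();
        return "requestId=" + requestId
                + " error=\"" + message + "\""   // quoted because the value contains spaces
                + " cause=" + causeName;
    }
}
```

A call site could then read logger.error(ErrorMessages.format(getRequestID(), "Gearman timeout exception", e), e), passing the exception as the last argument so the stack trace is still printed.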

For state transitions, log4j2 recommends EventLogger; see http://logging.apache.org/log4j/2.x/manual/eventlogging.html
More will be added as new cases come up.

2. What project owners should watch for

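Requests coming from outside (web, app) should carry a unique identifier (preferably one with semantic meaning) that ties together the log entries from every system. It can be stored in MDC/NDC/ThreadContext or carried by the RPC framework, which makes log tracing easier (tracing systems for large distributed systems are now an established concept).
Unless a parameter value has to be in Chinese, write all log content in English; it looks a bit more polished too.
Use code review to correct bad logging habits at the source.
The system should support adjusting log levels dynamically (co-market-api already does).
The logging library's conversion pattern standardizes business-independent metadata and can be set to suit the situation. Here are the patterns used by a few well-known libraries for reference:
netflix zuul: %5p %d{HH:mm:ss,SSS} %m%n
netflix eureka: %d %-5p %C:%L [%t] [%M] %m%n
In our experience, printing line numbers degrades service performance noticeably. If logging causes a performance problem, try removing the line number from the pattern first.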
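
The unique request identifier mentioned in the first point can be sketched as follows. To keep the example free of dependencies it uses a plain ThreadLocal as a stand-in for what MDC (slf4j) or ThreadContext (log4j2) provide. In a real project one would more likely call MDC.put and MDC.remove and reference the key from the conversion pattern. TraceContext and the ID format (a source prefix plus a UUID) are assumptions for illustration.

```java
import java.util.UUID;

/** Hypothetical stand-in for MDC/ThreadContext: one trace ID per thread. */
class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Called when an external request arrives; the prefix gives the ID some meaning. */
    static String begin(String source) {
        String id = source + "-" + UUID.randomUUID();
        TRACE_ID.set(id);
        return id;
    }

    /** Returns the current trace ID, or "-" when no request is in progress. */
    static String current() {
        String id = TRACE_ID.get();
        return id == null ? "-" : id;
    }

    /** Must run in a finally block so pooled threads do not leak an old ID. */
    static void end() {
        TRACE_ID.remove();
    }

    /** Prefixes a message with the trace ID in k=v form. */
    static String line(String message) {
        return "traceId=" + current() + " " + message;
    }
}
```

A request handler would call TraceContext.begin("web") on entry, log through TraceContext.line(...) (or let the framework's pattern add the ID), and call TraceContext.end() in a finally block.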

References: Logentries logging best practices
https://logentries.com/doc/best-practices-user-tracking/
https://logentries.com/doc/best-practices-logs/
