之前的一系列文章中主要追踪的是代码的执行逻辑和处理思路,进而熟悉代码,加深理解及使用。但是zap为什么快?它是怎么设计的?在熟悉代码后,再回头看看zap的设计思路,看看能否解答之前的一些疑惑。
zap README的Performance一节中提到:
For applications that log in the hot path, reflection-based serialization and
string formatting are prohibitively expensive — they're CPU-intensive and
make many small allocations. Put differently, using encoding/json and
fmt.Fprintf to log tons of interface{}s makes your application slow.
Zap takes a different approach. It includes a reflection-free, zero-allocation
JSON encoder, and the base Logger strives to avoid serialization overhead
and allocations wherever possible. By building the high-level SugaredLogger
on that foundation, zap lets users choose when they need to count every
allocation and when they'd prefer a more familiar, loosely typed API.
As measured by its own benchmarking suite, not only is zap more performant
than comparable structured logging packages — it's also faster than the
standard library. Like all benchmarks, take these with a grain of salt.1
挑重点说下:
基于反射的序列化和字符串格式化代价高昂,这些操作占用大量的CPU,并进行许多小的内存分配。所以用encoding/json和fmt.Fprintf来log大量的interface会使应用程序变慢(两者均涉及反射的处理)。zap的思路就是尽可能避免此类的序列化开销和分配,构造不基于反射的编码器(ReflectType的Filed除外,ReflectType建议仅用作扩展使用)。
zap的核心是encoder的实现,encoder负责log的序列化/格式化,其性能决定了整体的性能。
编码器实现的主思路是放弃直接对interface的支持,直接处理确切的类型,避免标准库中涉及的反射处理,提高性能。
func (enc *jsonEncoder) EncodeEntry(ent Entry, fields []Field) (*buffer.Buffer, error) {
final := enc.clone()
final.buf.AppendByte('{')
if final.LevelKey != "" {
final.addKey(final.LevelKey)
cur := final.buf.Len()
final.EncodeLevel(ent.Level, final)
if cur == final.buf.Len() {
// User-supplied EncodeLevel was a no-op. Fall back to strings to keep
// output JSON valid.
final.AppendString(ent.Level.String())
}
}
if final.TimeKey != "" {
final.AddTime(final.TimeKey, ent.Time)
}
if ent.LoggerName != "" && final.NameKey != "" {
final.addKey(final.NameKey)
cur := final.buf.Len()
nameEncoder := final.EncodeName
// if no name encoder provided, fall back to FullNameEncoder for backwards
// compatibility
if nameEncoder == nil {
nameEncoder = FullNameEncoder
}
nameEncoder(ent.LoggerName, final)
if cur == final.buf.Len() {
// User-supplied EncodeName was a no-op. Fall back to strings to
// keep output JSON valid.
final.AppendString(ent.LoggerName)
}
}
if ent.Caller.Defined && final.CallerKey != "" {
final.addKey(final.CallerKey)
cur := final.buf.Len()
final.EncodeCaller(ent.Caller, final)
if cur == final.buf.Len() {
// User-supplied EncodeCaller was a no-op. Fall back to strings to
// keep output JSON valid.
final.AppendString(ent.Caller.String())
}
}
if final.MessageKey != "" {
final.addKey(enc.MessageKey)
final.AppendString(ent.Message)
}
if enc.buf.Len() > 0 {
final.addElementSeparator()
final.buf.Write(enc.buf.Bytes())
}
addFields(final, fields)
final.closeOpenNamespaces()
if ent.Stack != "" && final.StacktraceKey != "" {
final.AddString(final.StacktraceKey, ent.Stack)
}
final.buf.AppendByte('}')
if final.LineEnding != "" {
final.buf.AppendString(final.LineEnding)
} else {
final.buf.AppendString(DefaultLineEnding)
}
ret := final.buf
putJSONEncoder(final)
return ret, nil
}
以上是JSON encoder的入口源码。从这些源码可以知道:
func (enc *jsonEncoder) addKey(key string) {
enc.addElementSeparator()
enc.buf.AppendByte('"')
enc.safeAddString(key)
enc.buf.AppendByte('"')
enc.buf.AppendByte(':')
if enc.spaced {
enc.buf.AppendByte(' ')
}
}
func (enc *jsonEncoder) AppendString(val string) {
enc.addElementSeparator()
enc.buf.AppendByte('"')
enc.safeAddString(val)
enc.buf.AppendByte('"')
}
直接根据对应的参数及类型直接处理,无需反射等过程,性能更高。
根据源码可知,顺序如下:
LevelKey->TimeKey->NameKey->CallerKey->MessageKey->[]Field->StacktraceKey->LineEnding
注意: 当对应的key为空时,不参与序列化。
除MessageKey对应的内容及[]Field外,其他参数均限定在Entry内,在创建logger时指定。
func (f Field) AddTo(enc ObjectEncoder) {
var err error
switch f.Type {
case ArrayMarshalerType:
err = enc.AddArray(f.Key, f.Interface.(ArrayMarshaler))
case ObjectMarshalerType:
err = enc.AddObject(f.Key, f.Interface.(ObjectMarshaler))
case BinaryType:
enc.AddBinary(f.Key, f.Interface.([]byte))
case BoolType:
enc.AddBool(f.Key, f.Integer == 1)
case ByteStringType:
enc.AddByteString(f.Key, f.Interface.([]byte))
case Complex128Type:
enc.AddComplex128(f.Key, f.Interface.(complex128))
case Complex64Type:
enc.AddComplex64(f.Key, f.Interface.(complex64))
case DurationType:
enc.AddDuration(f.Key, time.Duration(f.Integer))
case Float64Type:
enc.AddFloat64(f.Key, math.Float64frombits(uint64(f.Integer)))
case Float32Type:
enc.AddFloat32(f.Key, math.Float32frombits(uint32(f.Integer)))
case Int64Type:
enc.AddInt64(f.Key, f.Integer)
case Int32Type:
enc.AddInt32(f.Key, int32(f.Integer))
case Int16Type:
enc.AddInt16(f.Key, int16(f.Integer))
case Int8Type:
enc.AddInt8(f.Key, int8(f.Integer))
case StringType:
enc.AddString(f.Key, f.String)
case TimeType:
if f.Interface != nil {
enc.AddTime(f.Key, time.Unix(0, f.Integer).In(f.Interface.(*time.Location)))
} else {
// Fall back to UTC if location is nil.
enc.AddTime(f.Key, time.Unix(0, f.Integer))
}
case Uint64Type:
enc.AddUint64(f.Key, uint64(f.Integer))
case Uint32Type:
enc.AddUint32(f.Key, uint32(f.Integer))
case Uint16Type:
enc.AddUint16(f.Key, uint16(f.Integer))
case Uint8Type:
enc.AddUint8(f.Key, uint8(f.Integer))
case UintptrType:
enc.AddUintptr(f.Key, uintptr(f.Integer))
case ReflectType:
err = enc.AddReflected(f.Key, f.Interface)
case NamespaceType:
enc.OpenNamespace(f.Key)
case StringerType:
err = encodeStringer(f.Key, f.Interface, enc)
case ErrorType:
encodeError(f.Key, f.Interface.(error), enc)
case SkipType:
break
default:
panic(fmt.Sprintf("unknown field type: %v", f))
}
if err != nil {
enc.AddString(fmt.Sprintf("%sError", f.Key), err.Error())
}
}
func (enc *jsonEncoder) AppendString(val string) {
enc.addElementSeparator()
enc.buf.AppendByte('"')
enc.safeAddString(val)
enc.buf.AppendByte('"')
}
type jsonEncoder struct {
*EncoderConfig
buf *buffer.Buffer
spaced bool // include spaces after colons and commas
openNamespaces int
// for encoding generic values by reflection
reflectBuf *buffer.Buffer
reflectEnc *json.Encoder
}
type ObjectEncoder interface {
...
// AddReflected uses reflection to serialize arbitrary objects, so it's slow
// and allocation-heavy.
AddReflected(key string, value interface{}) error
...
}
func (enc *jsonEncoder) AddReflected(key string, obj interface{}) error {
enc.resetReflectBuf()
err := enc.reflectEnc.Encode(obj)
if err != nil {
return err
}
enc.reflectBuf.TrimNewline()
enc.addKey(key)
_, err = enc.buf.Write(enc.reflectBuf.Bytes())
return err
}
zap对Field对应的类型均有相应的封装处理,除ReflectType外,可直接将信息拼接至缓存中。ReflectType会调用标准库的序列化过程,涉及到了反射的使用,因此使用zap是在非必要的情况下强烈建议不要使用ReflectType类型的Field,这根本无法发挥zap的优势,反而因为多了处理过程,进一步影响性能。
if final.LevelKey != "" {
final.addKey(final.LevelKey)
cur := final.buf.Len()
final.EncodeLevel(ent.Level, final)
if cur == final.buf.Len() {
// User-supplied EncodeLevel was a no-op. Fall back to strings to keep
// output JSON valid.
final.AppendString(ent.Level.String())
}
}
func (c consoleEncoder) EncodeEntry(ent Entry, fields []Field) (*buffer.Buffer, error) {
line := bufferpool.Get()
// We don't want the entry's metadata to be quoted and escaped (if it's
// encoded as strings), which means that we can't use the JSON encoder. The
// simplest option is to use the memory encoder and fmt.Fprint.
//
// If this ever becomes a performance bottleneck, we can implement
// ArrayEncoder for our plain-text format.
arr := getSliceEncoder()
if c.TimeKey != "" && c.EncodeTime != nil {
c.EncodeTime(ent.Time, arr)
}
if c.LevelKey != "" && c.EncodeLevel != nil {
c.EncodeLevel(ent.Level, arr)
}
if ent.LoggerName != "" && c.NameKey != "" {
nameEncoder := c.EncodeName
if nameEncoder == nil {
// Fall back to FullNameEncoder for backward compatibility.
nameEncoder = FullNameEncoder
}
nameEncoder(ent.LoggerName, arr)
}
if ent.Caller.Defined && c.CallerKey != "" && c.EncodeCaller != nil {
c.EncodeCaller(ent.Caller, arr)
}
for i := range arr.elems {
if i > 0 {
line.AppendByte('\t')
}
fmt.Fprint(line, arr.elems[i])
}
putSliceEncoder(arr)
// Add the message itself.
if c.MessageKey != "" {
c.addTabIfNecessary(line)
line.AppendString(ent.Message)
}
// Add any structured context.
c.writeContext(line, fields)
// If there's no stacktrace key, honor that; this allows users to force
// single-line output.
if ent.Stack != "" && c.StacktraceKey != "" {
line.AppendByte('\n')
line.AppendString(ent.Stack)
}
if c.LineEnding != "" {
line.AppendString(c.LineEnding)
} else {
line.AppendString(DefaultLineEnding)
}
return line, nil
}
func (c consoleEncoder) writeContext(line *buffer.Buffer, extra []Field) {
context := c.jsonEncoder.Clone().(*jsonEncoder)
defer context.buf.Free()
addFields(context, extra)
context.closeOpenNamespaces()
if context.buf.Len() == 0 {
return
}
c.addTabIfNecessary(line)
line.AppendByte('{')
line.Write(context.buf.Bytes())
line.AppendByte('}')
}
CONSOLE encoder与JSON encoder在处理key的顺序上是一致的,也是通过直接拼接字符串实现的高性能。有以下几点可以注意下:
看过encoder的处理过程后,回过头来看看Config中的EncoderConfig的设计。
type EncoderConfig struct {
// Set the keys used for each log entry. If any key is empty, that portion
// of the entry is omitted.
MessageKey string `json:"messageKey" yaml:"messageKey"`
LevelKey string `json:"levelKey" yaml:"levelKey"`
TimeKey string `json:"timeKey" yaml:"timeKey"`
NameKey string `json:"nameKey" yaml:"nameKey"`
CallerKey string `json:"callerKey" yaml:"callerKey"`
StacktraceKey string `json:"stacktraceKey" yaml:"stacktraceKey"`
LineEnding string `json:"lineEnding" yaml:"lineEnding"`
// Configure the primitive representations of common complex types. For
// example, some users may want all time.Times serialized as floating-point
// seconds since epoch, while others may prefer ISO8601 strings.
EncodeLevel LevelEncoder `json:"levelEncoder" yaml:"levelEncoder"`
EncodeTime TimeEncoder `json:"timeEncoder" yaml:"timeEncoder"`
EncodeDuration DurationEncoder `json:"durationEncoder" yaml:"durationEncoder"`
EncodeCaller CallerEncoder `json:"callerEncoder" yaml:"callerEncoder"`
// Unlike the other primitive type encoders, EncodeName is optional. The
// zero value falls back to FullNameEncoder.
EncodeName NameEncoder `json:"nameEncoder" yaml:"nameEncoder"`
}
EncoderConfig限定了日志中常用的Level、Time、Msg、Caller等信息的key,且提供了对应的自定义的名称及encoder,均是为了通过限定了log的key、存储数据类型等信息,在编码时可以直接根据数据类型处理数据,无需反射,快捷高效。同时也是限定了规范,减少滥用参数的情况。
Logger通过Config限定了log常用参数的类型及key,通过这些key可以快速地进行json的序列化。整体上来看,看似Logger以牺牲一定的自由性换取足够高的性能,但考虑到log的格式化及Filed的扩展,足以满足绝大多数情况下的需求,而性能上的提高可以带来很大的优势。当然,zap为满足更大的灵活性,zap提供了SugaredLogger,不过性能上会比Logger差一点,在后面的章节中会再讨论。