Go ReadFile in depth: the buffer

When ReadFile fails, we need to determine whether the file simply does not exist or the call failed for some other reason.

Note: when the file does not exist, the returned err is non-nil (an *os.PathError), and os.IsNotExist(err) identifies that case.

The data returned here is a []byte.
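
A minimal sketch of how a caller can tell the two failure modes apart; config.txt is a made-up name, and this uses the pre-Go 1.16 io/ioutil API discussed here:

package main

import (
	"fmt"
	"io/ioutil"
	"os"
)

func main() {
	data, err := ioutil.ReadFile("config.txt") // hypothetical file name
	if os.IsNotExist(err) {
		fmt.Println("file does not exist")
		return
	}
	if err != nil {
		fmt.Println("read failed for another reason:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}

ReadFile itself delegates the reading to readAll: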

func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	var buf bytes.Buffer
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	if int64(int(capacity)) == capacity {
		buf.Grow(int(capacity))
	}
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}
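
The deferred function converts the bytes.ErrTooLarge panic raised inside the buffer into an ordinary error and re-raises anything else. Below is a stripped-down sketch of the same pattern; safeGrow is an illustrative name, not a standard library function:

package main

import (
	"bytes"
	"fmt"
)

// safeGrow is an illustrative helper, not part of the standard library.
func safeGrow(buf *bytes.Buffer, n int) (err error) {
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr // translate the known panic into an error
		} else {
			panic(e) // any other panic keeps propagating
		}
	}()
	buf.Grow(n) // panics with bytes.ErrTooLarge if the buffer cannot grow
	return nil
}

func main() {
	var buf bytes.Buffer
	// Assuming a 64-bit platform, this request cannot be allocated,
	// so Grow panics and safeGrow reports bytes.ErrTooLarge.
	fmt.Println(safeGrow(&buf, 1<<62))
}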

// MinRead is the minimum slice size passed to a Read call by
// Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond
// what is required to hold the contents of r, ReadFrom will not grow the
// underlying buffer.
const MinRead = 512

// ReadFrom reads data from r until EOF and appends it to the buffer, growing
// the buffer as needed. The return value n is the number of bytes read. Any
// error except io.EOF encountered during the read is also returned. If the
// buffer becomes too large, ReadFrom will panic with ErrTooLarge.
func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error) {
	b.lastRead = opInvalid
	for {
		i := b.grow(MinRead)
		b.buf = b.buf[:i]
		m, e := r.Read(b.buf[i:cap(b.buf)])
		if m < 0 {
			panic(errNegativeRead)
		}

		b.buf = b.buf[:i+m]
		n += int64(m)
		if e == io.EOF {
			return n, nil // e is EOF, so return nil explicitly
		}
		if e != nil {
			return n, e
		}
	}
}

First the buffer tries to grow. grow returns i, the index where new bytes should be written, which is just the current length of the buffered data. The slice b.buf[i:cap(b.buf)] then hands Read every remaining byte of capacity at once, so a single read can run right up to the end of the allocation; m is the number of bytes actually read on this iteration.
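
For reference, here is roughly how ReadFrom is driven by hand; a usage sketch with a made-up file name:

package main

import (
	"bytes"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("data.txt") // made-up file name
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	var buf bytes.Buffer
	n, err := buf.ReadFrom(f) // loops over f.Read until io.EOF
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("read %d bytes, cap is now %d\n", n, buf.Cap())
}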

The r.Read being called here is *os.File's implementation, which ultimately comes down to a read system call; we will look at it shortly. First, ReadFile itself:

// ReadFile reads the file named by filename and returns the contents.
// A successful call returns err == nil, not err == EOF. Because ReadFile
// reads the whole file, it does not treat an EOF from Read as an error
// to be reported.
func ReadFile(filename string) ([]byte, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// It's a good but not certain bet that FileInfo will tell us exactly how much to
	// read, so let's try it but be prepared for the answer to be wrong.
	var n int64 = bytes.MinRead

	if fi, err := f.Stat(); err == nil {
		// As initial capacity for readAll, use Size + a little extra in case Size
		// is zero, and to avoid another allocation after Read has filled the
		// buffer. The readAll call will read into its allocated internal buffer
		// cheaply. If the size was wrong, we'll either waste some space off the end
		// or reallocate as needed, but in the overwhelmingly common case we'll get
		// it just right.
		if size := fi.Size() + bytes.MinRead; size > n {
			n = size
		}
	}
	return readAll(f, n)
}

For a large file this uses the file's size (plus a little extra) as the buffer's initial capacity, sparing us the interference of the buffer having to grow over and over.
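
The same pre-sizing idea applied by hand, so the effect is visible; a sketch assuming a hypothetical big.log:

package main

import (
	"bytes"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("big.log") // hypothetical large file
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// Mirror ReadFile's sizing: Size() plus MinRead of slack.
	n := int64(bytes.MinRead)
	if fi, err := f.Stat(); err == nil {
		if size := fi.Size() + bytes.MinRead; size > n {
			n = size
		}
	}

	var buf bytes.Buffer
	buf.Grow(int(n)) // one up-front allocation instead of many grows
	if _, err := buf.ReadFrom(f); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("len=%d cap=%d\n", buf.Len(), buf.Cap())
}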

// Bytes returns a slice of length b.Len() holding the unread portion of the buffer.
// The slice is valid for use only until the next buffer modification (that is,
// only until the next call to a method like Read, Write, Reset, or Truncate).
// The slice aliases the buffer content at least until the next buffer modification,
// so immediate changes to the slice will affect the result of future reads.
func (b *Buffer) Bytes() []byte { return b.buf[b.off:] }

readAll returns buf.Bytes(), which exposes only the unread portion of the buffer; the off offset effectively gives a Buffer the ability to resume reading where it left off. That feature is of little use here, though, since ReadFile creates a fresh Buffer for every file it reads.
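
A quick demonstration of both properties: Bytes returns only the unread portion, and the returned slice aliases the buffer's storage:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	buf.WriteString("hello world")

	head := make([]byte, 6)
	buf.Read(head) // consume "hello "

	b := buf.Bytes()
	fmt.Printf("%q\n", b) // "world": only the unread portion

	b[0] = 'W'                       // the slice aliases the buffer's array...
	fmt.Printf("%q\n", buf.String()) // ...so the buffer itself now holds "World"
}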

// Read reads up to len(b) bytes from the File.
// It returns the number of bytes read and any error encountered.
// At end of file, Read returns 0, io.EOF.
func (f *File) Read(b []byte) (n int, err error) {
	if err := f.checkValid("read"); err != nil {
		return 0, err
	}
	n, e := f.read(b)
	return n, f.wrapErr("read", e)
}

File's exported Read does little itself: checkValid merely rejects an invalid handle (a nil *File), after which the internal read performs the actual system call.
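
For instance, calling Read on a *File that was never opened fails checkValid immediately; a tiny sketch:

package main

import (
	"fmt"
	"os"
)

func main() {
	var f *os.File // never opened: an invalid handle
	_, err := f.Read(make([]byte, 8))
	fmt.Println(err) // invalid argument
}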

As for why each grow during ReadFrom works in units of at least 512 bytes: from the disk's physical structure, the smallest unit of storage that can be read or written is a sector, and a sector is traditionally 512 bytes.

// grow grows the buffer to guarantee space for n more bytes.
// It returns the index where bytes should be written.
// If the buffer can't grow it will panic with ErrTooLarge.
func (b *Buffer) grow(n int) int {
	m := b.Len()
	// If buffer is empty, reset to recover space.
	if m == 0 && b.off != 0 {
		b.Reset()
	}
	// Try to grow by means of a reslice.
	if i, ok := b.tryGrowByReslice(n); ok {
		return i
	}
	if b.buf == nil && n <= smallBufferSize {
		b.buf = make([]byte, n, smallBufferSize)
		return 0
	}
	c := cap(b.buf)
	if n <= c/2-m {
		// We can slide things down instead of allocating a new
		// slice. We only need m+n <= c to slide, but
		// we instead let capacity get twice as large so we
		// don't spend all our time copying.
		copy(b.buf, b.buf[b.off:])
	} else if c > maxInt-c-n {
		panic(ErrTooLarge)
	} else {
		// Not enough space anywhere, we need to allocate.
		buf := makeSlice(2*c + n)
		copy(buf, b.buf[b.off:])
		b.buf = buf
	}
	// Restore b.off and len(b.buf).
	b.off = 0
	b.buf = b.buf[:m+n]
	return m
}

grow first calls tryGrowByReslice, passing n (here MinRead, i.e. 512):

// tryGrowByReslice is an inlineable version of grow for the fast-case where the
// internal buffer only needs to be resliced.
// It returns the index where bytes should be written and whether it succeeded.
func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
	if l := len(b.buf); n <= cap(b.buf)-l {
		b.buf = b.buf[:l+n]
		return l, true
	}
	return 0, false
}

Case 1: if the buffer's remaining capacity can already hold n more bytes, there is obviously no need to allocate; a reslice suffices.

Note that tryGrowByReslice does not consider off at all: as long as each read consumes little, the already-read bytes before off stay in place. That is precisely what makes off meaningful: if every grow discarded those bytes, the offset would serve no purpose.

Case 2: if the buffer is nil and the request is no larger than smallBufferSize, allocate a 64-byte backing array directly; the 64 is presumably related to memory page or cache-line granularity.

Case 3: if n <= c/2 - m, i.e. m+n <= c/2, the unread data can slide down to the front of the existing array: copy(b.buf, b.buf[b.off:]) overwrites the already-read prefix (whose length is b.off = len(b.buf) - m) and reclaims that space (a sketch that hits this path follows the case list).

Strictly speaking, m+n <= c would be enough room to slide, so why demand half the capacity?

Because we still do not want to spend all our time copying: by insisting on comfortably more space than the request, the buffer avoids having to copy again almost immediately.

Case 4: the c > maxInt-c-n check guards against c growing so large that 2*c + n would overflow; in that case grow panics with ErrTooLarge.

Case 5: there is not enough space anywhere, so allocate a new slice of length 2*c + n, copy the unread data into it, and drop the already-read prefix. That unread portion is exactly b.Len() bytes, moved with the same pattern as the slide:

copy(buf, b.buf[b.off:])
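
Here is a small experiment that hits the case 3 slide path: after part of the buffer has been consumed, a write that fits within half the capacity reuses the existing array instead of allocating. The concrete numbers assume Grow(128) allocates exactly 128 bytes, which holds for common size classes:

package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	buf.Grow(128) // one 128-byte backing array

	buf.Write(make([]byte, 100))          // len(b.buf)=100, off=0
	buf.Read(make([]byte, 80))            // unread m=20, off=80
	fmt.Println("cap before:", buf.Cap()) // 128

	// m=20, c=128, n=30: n <= c/2-m holds (30 <= 44), so grow
	// slides the 20 unread bytes to the front and reuses the
	// array instead of allocating a bigger one.
	buf.Write(make([]byte, 30))
	fmt.Println("cap after:", buf.Cap(), "len:", buf.Len()) // still 128, len 50
}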

This doubling strategy for growing a buffer turns out to be near-universal across domains.

Tightening the slide condition is also understandable: if sliding were allowed whenever the remaining space exactly matched the request, the buffer would run out of room again almost at once, and on the whole the number of copies forced by insufficient space would grow dramatically. Hence the comment in the source:

don't spend all our time copying

For a file read where off never advances, the capacity climbs from one 512-byte block to three, then to seven (each round allocates 2*c + n and copies everything across): copy after copy, which is very time-consuming.
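
That 1, 3, 7 progression can be observed by logging how large a slice ReadFrom offers to each Read call; traceReader below is instrumentation invented for this demo, and the capacities assume the grow implementation shown above (later Go versions round capacities differently):

package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// traceReader logs how much room ReadFrom offers on each Read call.
type traceReader struct{ r io.Reader }

func (t *traceReader) Read(p []byte) (int, error) {
	fmt.Printf("Read was handed a %d-byte slice\n", len(p))
	return t.r.Read(p)
}

func main() {
	var buf bytes.Buffer
	src := strings.NewReader(strings.Repeat("x", 3000))
	buf.ReadFrom(&traceReader{r: src})
	// With the grow shown above the caps go 512 -> 1536 -> 3584,
	// i.e. 1, 3 and 7 blocks of 512 bytes.
	fmt.Println("final cap:", buf.Cap())
}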

Seen in that light, it is clear how much the earlier ReadFile gains by pre-sizing the buffer and sidestepping grow's allocations.
