We need to distinguish whether a read failed because the file does not exist or for some other reason. Note: when the file does not exist, the returned err is non-nil (an *os.PathError; test it with errors.Is(err, os.ErrNotExist)), and the returned data is a []byte (nil on failure).
func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	var buf bytes.Buffer
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	if int64(int(capacity)) == capacity {
		buf.Grow(int(capacity))
	}
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}
// MinRead is the minimum slice size passed to a Read call by
// Buffer.ReadFrom. As long as the Buffer has at least MinRead bytes beyond
// what is required to hold the contents of r, ReadFrom will not grow the
// underlying buffer.
const MinRead = 512
// ReadFrom reads data from r until EOF and appends it to the buffer, growing
// the buffer as needed. The return value n is the number of bytes read. Any
// error except io.EOF encountered during the read is also returned. If the
// buffer becomes too large, ReadFrom will panic with ErrTooLarge.
func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error) {
	b.lastRead = opInvalid
	for {
		i := b.grow(MinRead)
		b.buf = b.buf[:i]
		m, e := r.Read(b.buf[i:cap(b.buf)])
		if m < 0 {
			panic(errNegativeRead)
		}
		b.buf = b.buf[:i+m]
		n += int64(m)
		if e == io.EOF {
			return n, nil // e is EOF, so return nil explicitly
		}
		if e != nil {
			return n, e
		}
	}
}
First the buffer tries to grow. Reading into buf[i:cap(b.buf)] hands Read the entire free tail of the slice, so a single call can fill it right up to capacity; m is the number of bytes actually read this iteration. i is the write index returned by grow, i.e. the buffer's length before the reslice (len(b.buf)). The Read here is *os.File's implementation, which is essentially a thin wrapper around the read system call.
// ReadFile reads the file named by filename and returns the contents.
// A successful call returns err == nil, not err == EOF. Because ReadFile
// reads the whole file, it does not treat an EOF from Read as an error
// to be reported.
func ReadFile(filename string) ([]byte, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// It's a good but not certain bet that FileInfo will tell us exactly how much to
	// read, so let's try it but be prepared for the answer to be wrong.
	var n int64 = bytes.MinRead
	if fi, err := f.Stat(); err == nil {
		// As initial capacity for readAll, use Size + a little extra in case Size
		// is zero, and to avoid another allocation after Read has filled the
		// buffer. The readAll call will read into its allocated internal buffer
		// cheaply. If the size was wrong, we'll either waste some space off the end
		// or reallocate as needed, but in the overwhelmingly common case we'll get
		// it just right.
		if size := fi.Size() + bytes.MinRead; size > n {
			n = size
		}
	}
	return readAll(f, n)
}
For a large file, ReadFile uses the file's size (plus bytes.MinRead) as the buffer's initial capacity, which avoids the repeated grow/copy cycles the buffer would otherwise go through.
// Bytes returns a slice of length b.Len() holding the unread portion of the buffer.
// The slice is valid for use only until the next buffer modification (that is,
// only until the next call to a method like Read, Write, Reset, or Truncate).
// The slice aliases the buffer content at least until the next buffer modification,
// so immediate changes to the slice will affect the result of future reads.
func (b *Buffer) Bytes() []byte { return b.buf[b.off:] }
readAll returns buffer.Bytes(), which yields only the unread portion; because the buffer tracks a read offset, it can also resume reading where a previous Read left off. That capability goes unused here, though: as we can see, every ReadFile call creates a fresh buffer to read the file.
// Read reads up to len(b) bytes from the File.
// It returns the number of bytes read and any error encountered.
// At end of file, Read returns 0, io.EOF.
func (f *File) Read(b []byte) (n int, err error) {
	if err := f.checkValid("read"); err != nil {
		return 0, err
	}
	n, e := f.read(b)
	return n, f.wrapErr("read", e)
}
The File.Read wrapper adds little beyond validation: checkValid just verifies that the *File itself is usable (it returns ErrInvalid for a nil file) before delegating to the platform-specific read.
As for why each grow request comes in units of 512 bytes: in terms of the disk's physical structure, the smallest unit of storage access is a sector, and a sector is traditionally 512 bytes.
// grow grows the buffer to guarantee space for n more bytes.
// It returns the index where bytes should be written.
// If the buffer can't grow it will panic with ErrTooLarge.
func (b *Buffer) grow(n int) int {
	m := b.Len()
	// If buffer is empty, reset to recover space.
	if m == 0 && b.off != 0 {
		b.Reset()
	}
	// Try to grow by means of a reslice.
	if i, ok := b.tryGrowByReslice(n); ok {
		return i
	}
	if b.buf == nil && n <= smallBufferSize {
		b.buf = make([]byte, n, smallBufferSize)
		return 0
	}
	c := cap(b.buf)
	if n <= c/2-m {
		// We can slide things down instead of allocating a new
		// slice. We only need m+n <= c to slide, but
		// we instead let capacity get twice as large so we
		// don't spend all our time copying.
		copy(b.buf, b.buf[b.off:])
	} else if c > maxInt-c-n {
		panic(ErrTooLarge)
	} else {
		// Not enough space anywhere, we need to allocate.
		buf := makeSlice(2*c + n)
		copy(buf, b.buf[b.off:])
		b.buf = buf
	}
	// Restore b.off and len(b.buf).
	b.off = 0
	b.buf = b.buf[:m+n]
	return m
}
grow first tries tryGrowByReslice, with n = MinRead (512) passed in from ReadFrom.
// tryGrowByReslice is a inlineable version of grow for the fast-case where the
// internal buffer only needs to be resliced.
// It returns the index where bytes should be written and whether it succeeded.
func (b *Buffer) tryGrowByReslice(n int) (int, bool) {
	if l := len(b.buf); n <= cap(b.buf)-l {
		b.buf = b.buf[:l+n]
		return l, true
	}
	return 0, false
}
Case 1: if the buffer's remaining capacity (cap - len) can already hold n more bytes, no allocation is needed; tryGrowByReslice simply extends the length. Note that this fast path never touches b.off, so with many small reads the consumed prefix stays in place and the read offset keeps its meaning; if every grow discarded it, the offset would be pointless.
Case 2: if the buffer is nil and the request is at most smallBufferSize (64 bytes), allocate a 64-byte buffer directly, a fast path for small allocations.
Case 3: if n <= c/2 - m, where m = b.Len() is the unread portion (so len - m bytes have already been read), slide the unread bytes to the front and overwrite the consumed prefix:
	copy(b.buf, b.buf[b.off:])
Strictly, m+n <= c would be enough to slide, so why require n <= c/2 - m? Because we still don't want to spend all our time copying: the stricter bound guarantees at least half the capacity is free after the slide, instead of barely fitting the request and copying again on the very next grow.
Case 4: the c > maxInt-c-n branch guards against integer overflow when the capacity would get too large, panicking with ErrTooLarge.
Case 5: otherwise there is not enough space anywhere, so allocate a new slice of 2*c + n bytes, copy the unread portion (b.Len() bytes) into it, and drop the already-read prefix.
This doubling-style buffer growth turns out to be near-universal across domains; it is what makes appends amortized constant time. Tightening the condition on the copy (slide) path is also understandable: if a slide were allowed whenever the reclaimed space exactly matched the request, the buffer would run out of room and hit the reallocate-and-copy path far more often; hence the comment
	don't spend all our time copying
For a file read where off never advances, each reallocation is 2*c + n: the capacity goes from one unit to three to seven, copying the contents every time, which is expensive. This makes clear how much the earlier ReadFile gains by pre-sizing the buffer and avoiding grow's reallocations.