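Forced type conversions show up everywhere in day-to-day Go code, but do we really understand them?
Go's string handling has some particular characteristics. Let's start with an example.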
// Benchmark 2733 ns/op
var b string
for i := 0; i < 10; i++ {
b += "hello world"
b += " "
b += "welcome"
b += "\n"
}
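Nothing special on the surface: it just keeps concatenating. But this is a careless way to build a string. Every += in the loop produces a new string and copies the previous contents into it.
At the very least, it could be written like this: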
// Benchmark 688.1 ns/op
var b string
for i := 0; i < 10; i++ {
tmp := "hello world" + " " + "welcome" + "\n"
b += tmp
}
In fact, Go has a built-in facility for string concatenation, strings.Builder, which can be used as follows:
// Benchmark 326.3 ns/op
var b strings.Builder
for i := 0; i < 10; i++ {
b.WriteString("hello world")
b.WriteString(" ")
b.WriteString("welcome")
b.WriteString("\n")
}
return b.String()
The common conversion between string and []byte:
// Benchmark 27.72 ns/op
s := "this is source string"
b := []byte(s)
fmt.Println(b)
b[0] = 100
fmt.Println(b)
c := string(b)
fmt.Println(c)
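This conversion essentially initializes a byte slice from the string, so it involves a new memory allocation and a copy.
Now consider b.String() from the string concatenation above. Its source implementation is: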
// String returns the accumulated string.
func (b *Builder) String() string {
return *(*string)(unsafe.Pointer(&b.buf)) // buf []byte
}
Evidently, a string and a []byte can be converted into each other without copying, by using a forced conversion:
s := "this is source string"
// Caution: a string header has no Cap field, so cap(b) is undefined after this cast.
b := *(*[]byte)(unsafe.Pointer(&s))
c := *(*string)(unsafe.Pointer(&b))
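This approach performs far better than the copying conversion, especially when parsing file contents: a large []byte read from a raw file can be turned into a long string efficiently.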
1. What happens if the result b of the s -> b conversion is modified?
2. Why can strings and slices be force-converted into each other like this at all?
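A minimal sketch related to the first question, assuming Go 1.20 or later, where unsafe.String, unsafe.StringData and unsafe.SliceData are available (unsafe.Slice exists since Go 1.17). The helper names bytesToString and stringToBytes are hypothetical. The sketch shows that both views share the same memory, so a write through the slice is visible through the string. The bytes of a string literal may live in read-only memory, so writing through a slice obtained from a literal can crash the program; the sketch only writes to a heap- or stack-backed []byte.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString views b as a string without copying.
// The caller must not modify b while the string is still in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes views s as a []byte without copying.
// The result must be treated as read-only.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	buf := []byte("hello")
	s := bytesToString(buf)
	buf[0] = 'j' // the string s observes this write: memory is shared
	fmt.Println(s)

	ro := stringToBytes("read only")
	fmt.Println(len(ro), cap(ro)) // length and capacity are both well defined here
}
```

Unlike the raw pointer cast, unsafe.Slice sets the capacity explicitly, so the resulting slice header is fully defined.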
How Go describes slice and string at runtime (reflect.StringHeader and reflect.SliceHeader):
// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
Data uintptr
Len int
}
// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
As shown, the two runtime representations differ only by the Cap field; a string is actually stored as an array of bytes.
// runtime/slice.go
type slice struct {
array unsafe.Pointer
len int
cap int
}
// runtime/string.go
type stringStruct struct {
str unsafe.Pointer
len int
}
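The layouts above can be checked with unsafe.Sizeof. This is a minimal sketch, assuming the standard gc toolchain; the helper name headerWords is hypothetical. It reports the header sizes in machine words, so the result does not depend on whether the platform is 32-bit or 64-bit.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words a string header and a
// []byte header occupy.
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	// A string header is pointer + len; a slice header adds cap.
	fmt.Println(sw, bw)
}
```

The one-word difference is the Cap field, which is why reading a string header as a slice header leaves the capacity undefined.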
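I was curious about the difference between stringStruct and StringHeader. The source comment says StringHeader is the runtime representation of a string. From what I have read, when a string is passed as an argument only this header (pointer and length) is passed; in other words, the underlying bytes are not copied.
stringStruct is the struct the runtime uses internally when operating on strings, and it is essentially the same as StringHeader. Let's step through a []byte -> string conversion, the reverse of the string -> []byte example above, in the debugger to see what actually happens:
func main() {
b := []byte("this is source string")
s := string(b)
fmt.Println(s)
}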
(dlv) c
> main.main() ./main.go:9 (hits goroutine(1):1 total:1) (PC: 0x10abe6f)
4:
5: import (
6: "fmt"
7: )
8:
=> 9: func main() {
10: b := []byte("this is source string")
11: s := string(b)
12: fmt.Println(s)
13: }
(dlv) disassemble
TEXT main.main(SB) /Users/machao/go/src/collection/main.go
main.go:9 0x10abe60 4c8d6424e0 lea r12, ptr [rsp-0x20]
main.go:9 0x10abe65 4d3b6610 cmp r12, qword ptr [r14+0x10]
main.go:9 0x10abe69 0f8628010000 jbe 0x10abf97
=> main.go:9 0x10abe6f* 4881eca0000000 sub rsp, 0xa0
main.go:9 0x10abe76 4889ac2498000000 mov qword ptr [rsp+0x98], rbp
main.go:9 0x10abe7e 488dac2498000000 lea rbp, ptr [rsp+0x98]
main.go:10 0x10abe86 488d54241b lea rdx, ptr [rsp+0x1b]
main.go:10 0x10abe8b 4889542440 mov qword ptr [rsp+0x40], rdx
main.go:10 0x10abe90 8402 test byte ptr [rdx], al
main.go:10 0x10abe92 488d1583ab0100 lea rdx, ptr [rip+0x1ab83]
main.go:10 0x10abe99 8402 test byte ptr [rdx], al
main.go:10 0x10abe9b 48ba7468697320697320 mov rdx, 0x2073692073696874
main.go:10 0x10abea5 488954241b mov qword ptr [rsp+0x1b], rdx
main.go:10 0x10abeaa 48ba697320736f757263 mov rdx, 0x6372756f73207369
main.go:10 0x10abeb4 4889542420 mov qword ptr [rsp+0x20], rdx
main.go:10 0x10abeb9 48ba6520737472696e67 mov rdx, 0x676e697274732065
main.go:10 0x10abec3 4889542428 mov qword ptr [rsp+0x28], rdx
main.go:10 0x10abec8 488b5c2440 mov rbx, qword ptr [rsp+0x40]
main.go:10 0x10abecd 8403 test byte ptr [rbx], al
main.go:10 0x10abecf eb00 jmp 0x10abed1
main.go:10 0x10abed1 48895c2468 mov qword ptr [rsp+0x68], rbx
main.go:10 0x10abed6 48c744247015000000 mov qword ptr [rsp+0x70], 0x15
main.go:10 0x10abedf 48c744247815000000 mov qword ptr [rsp+0x78], 0x15
main.go:11 0x10abee8 31c0 xor eax, eax
main.go:11 0x10abeea b915000000 mov ecx, 0x15
main.go:11 0x10abeef e82cfef9ff call $runtime.slicebytetostring
main.go:11 0x10abef4 4889442448 mov qword ptr [rsp+0x48], rax
main.go:11 0x10abef9 48895c2450 mov qword ptr [rsp+0x50], rbx
main.go:12 0x10abefe 440f117c2458 movups xmmword ptr [rsp+0x58], xmm15
main.go:12 0x10abf04 488d542458 lea rdx, ptr [rsp+0x58]
main.go:12 0x10abf09 4889542438 mov qword ptr [rsp+0x38], rdx
main.go:12 0x10abf0e 488b442448 mov rax, qword ptr [rsp+0x48]
main.go:12 0x10abf13 488b5c2450 mov rbx, qword ptr [rsp+0x50]
main.go:12 0x10abf18 e8e3daf5ff call $runtime.convTstring
main.go:12 0x10abf1d 4889442430 mov qword ptr [rsp+0x30], rax
main.go:12 0x10abf22 488b542438 mov rdx, qword ptr [rsp+0x38]
main.go:12 0x10abf27 8402 test byte ptr [rdx], al
main.go:12 0x10abf29 488d35d07a0000 lea rsi, ptr [rip+0x7ad0]
main.go:12 0x10abf30 488932 mov qword ptr [rdx], rsi
main.go:12 0x10abf33 488d7a08 lea rdi, ptr [rdx+0x8]
main.go:12 0x10abf37 833dc25f0d0000 cmp dword ptr [runtime.writeBarrier], 0x0
main.go:12 0x10abf3e 6690 data16 nop
main.go:12 0x10abf40 7402 jz 0x10abf44
main.go:12 0x10abf42 eb06 jmp 0x10abf4a
main.go:12 0x10abf44 48894208 mov qword ptr [rdx+0x8], rax
main.go:12 0x10abf48 eb07 jmp 0x10abf51
main.go:12 0x10abf4a e8713dfbff call $runtime.gcWriteBarrier
main.go:12 0x10abf4f eb00 jmp 0x10abf51
main.go:12 0x10abf51 488b442438 mov rax, qword ptr [rsp+0x38]
main.go:12 0x10abf56 8400 test byte ptr [rax], al
main.go:12 0x10abf58 eb00 jmp 0x10abf5a
main.go:12 0x10abf5a 4889842480000000 mov qword ptr [rsp+0x80], rax
main.go:12 0x10abf62 48c784248800000001000000 mov qword ptr [rsp+0x88], 0x1
main.go:12 0x10abf6e 48c784249000000001000000 mov qword ptr [rsp+0x90], 0x1
main.go:12 0x10abf7a bb01000000 mov ebx, 0x1
main.go:12 0x10abf7f 4889d9 mov rcx, rbx
main.go:12 0x10abf82 e879a8ffff call $fmt.Println
main.go:13 0x10abf87 488bac2498000000 mov rbp, qword ptr [rsp+0x98]
main.go:13 0x10abf8f 4881c4a0000000 add rsp, 0xa0
main.go:13 0x10abf96 c3 ret
main.go:9 0x10abf97 e8441dfbff call $runtime.morestack_noctxt
main.go:9 0x10abf9c 0f1f4000 nop dword ptr [rax], eax
main.go:9 0x10abfa0 e9bbfeffff jmp $main.main
The conversion itself is performed in call $runtime.slicebytetostring:
// slicebytetostring converts a byte slice to a string.
// It is inserted by the compiler into generated code.
// ptr is a pointer to the first element of the slice;
// n is the length of the slice.
// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) {
...
stringStructOf(&str).str = p
stringStructOf(&str).len = n
memmove(p, unsafe.Pointer(ptr), uintptr(n))
return
}
func stringStructOf(sp *string) *stringStruct {
return (*stringStruct)(unsafe.Pointer(sp))
}
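The memmove in slicebytetostring is what makes the ordinary conversion a copy. A minimal sketch of the observable effect follows; the function name copyOnConvert is hypothetical.

```go
package main

import "fmt"

// copyOnConvert converts a byte slice to a string, then mutates the slice.
// It returns the string made before the mutation and one made after it.
func copyOnConvert() (before, after string) {
	b := []byte("this is source string")
	before = string(b) // copies the bytes into a new string
	b[0] = 'T'         // does not affect the earlier copy
	after = string(b)
	return before, after
}

func main() {
	before, after := copyOnConvert()
	fmt.Println(before)
	fmt.Println(after)
}
```

Because the string owns its own copy, later writes to the slice cannot break string immutability, which is the guarantee the unsafe conversion gives up.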
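len(s) returns the length of the string's underlying byte array, not the number of characters in the string, whereas range iterates over a string character by character (rune by rune). To get the number of characters in a string, use either of the following: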
s := "你好 T"
// Convert to a []rune slice
len([]rune(s))
// Call utf8.RuneCountInString
utf8.RuneCountInString(s)
for i, o := range s {
fmt.Println(i, " -- ", o)
}
This is because Go strings are UTF-8 encoded, so a Chinese character occupies more than one byte.
On a 64-bit operating system, a []float64 can be reinterpreted as a []int and sorted as integers:
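import (
"sort"
"unsafe"
)
var a = []float64{4, 2, 5, 7, 2, 1, 88, 1}
func SortFloat64FastV1(a []float64) {
// Forced type conversion: view the same memory as []int.
// Only valid when len(a) > 0 and cap(a) <= 1<<20.
var b []int = ((*[1 << 20]int)(unsafe.Pointer(&a[0])))[:len(a):cap(a)]
// Sort the float64 values as if they were ints
sort.Ints(b)
}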
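A float64 is the byte representation of an IEEE 754 double-precision floating-point number. For non-negative values, the numeric order of the floats matches the order of the corresponding signed integers. Negative floats would come out in reverse order, so the trick only applies to non-negative data.
var a = []float64{4, 2.21, 5, 7, 2.22, 1.0001, 88, 1.0002}
runtimeA := *(*reflect.SliceHeader)(unsafe.Pointer(&a)) // the runtime representation of a slice mentioned earlier
// The two addresses are identical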
fmt.Printf("%x\n", runtimeA.Data)
fmt.Printf("%x\n", &a[0])
var b []int = ((*[1 << 20]int)(unsafe.Pointer(&a[0])))[:len(a):cap(a)]
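The conversion works step by step:
1. &a[0] is the address of the first element of the []float64.
2. unsafe.Pointer turns it into a generic pointer.
3. *[1 << 20]int, i.e. *[1048576]int, is a pointer to an int array of length 1048576.
4. The *[1048576]int array is sliced using the len and cap of the original []float64.
5. The 8-byte binary sequence underlying each element of a is unchanged; each 8-byte value is simply treated as an int for sorting.
6. Once the int slice is sorted, the contents of a are sorted as well.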