Go Notes 7: Type Conversion

Preface

Type conversions show up everywhere in day-to-day Go code, but do we really understand them?

Strings

Go's string handling has a few peculiarities. Let's start with an example.

// Benchmark 2733 ns/op
var b string
for i := 0; i < 10; i++ {
   b += "hello world"
   b += " "
   b += "welcome"
   b += "\n"
} 

String concatenation

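It looks harmless: just repeated concatenation. But it is a careless way to build a string, because every += in the loop creates a new string and copies the accumulated contents again.
At the very least it could be written like this: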

// Benchmark 688.1 ns/op
var b string
for i := 0; i < 10; i++ {
   tmp := "hello world" + " " + "welcome" + "\n"
   b += tmp
} 

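Go's standard library also provides a dedicated type for string concatenation, strings.Builder, which can be used as follows: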

// Benchmark 326.3 ns/op
var b strings.Builder
for i := 0; i < 10; i++ {
   b.WriteString("hello world")
   b.WriteString(" ")
   b.WriteString("welcome")
   b.WriteString("\n")
}
return b.String()
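
To compare the two approaches outside a benchmark, here is a minimal sketch. The helper names concatPlus and concatBuilder and the constant line are made up for this illustration. It also calls Builder.Grow to reserve the final size up front, which the original snippet does not do.

```go
package main

import (
	"fmt"
	"strings"
)

// line is the text appended on every iteration (20 bytes).
const line = "hello world" + " " + "welcome" + "\n"

// concatPlus builds the result with +=, which copies the accumulated
// string into a new allocation as it grows.
func concatPlus(n int) string {
	var s string
	for i := 0; i < n; i++ {
		s += line
	}
	return s
}

// concatBuilder reserves the final size once and appends into that buffer.
func concatBuilder(n int) string {
	var b strings.Builder
	b.Grow(n * len(line))
	for i := 0; i < n; i++ {
		b.WriteString(line)
	}
	return b.String()
}

func main() {
	fmt.Println(concatPlus(10) == concatBuilder(10)) // true
	fmt.Println(len(concatBuilder(10)))              // 200
}
```

Both functions return the same string; only the number of intermediate copies differs.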

Converting strings

The common conversion between string and []byte:

// Benchmark 27.72 ns/op
s := "this is source string"
b := []byte(s)
fmt.Println(b)
b[0] = 100
fmt.Println(b)
c := string(b)
fmt.Println(c)

This conversion essentially initializes a byte slice from the string, so it involves a new memory allocation and a copy.
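
A small sketch of what the copy means in practice: changing the converted slice leaves the original string untouched. The helper name mutateCopy is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// mutateCopy converts s to []byte, overwrites the first byte of that copy,
// and returns the original string together with the modified copy.
func mutateCopy(s string) (string, string) {
	b := []byte(s) // allocates a new backing array and copies the bytes of s
	if len(b) > 0 {
		b[0] = 'd' // same as b[0] = 100 in the snippet above
	}
	return s, string(b)
}

func main() {
	orig, changed := mutateCopy("this is source string")
	fmt.Println(orig)    // this is source string
	fmt.Println(changed) // dhis is source string
}
```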
As for the b.String() call used in the concatenation above, take a look at its source:

// String returns the accumulated string.
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf)) // buf  []byte
}

It turns out that a string and a []byte can be converted into each other without copying, by casting pointers:

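s := "this is source string"
// Reinterprets the 2-word string header as a 3-word slice header:
// Data and Len are valid, but Cap is read from whatever follows s in memory.
b := *(*[]byte)(unsafe.Pointer(&s))
// The reverse direction only reads Data and Len of the slice header.
c := *(*string)(unsafe.Pointer(&b))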

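This approach is much faster than the copying conversion, especially when parsing file contents: a large []byte read from a file can be turned into a long string cheaply.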
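
Since Go 1.20 the unsafe package has helpers for exactly this, which avoid reading a Cap field that does not exist. The sketch below assumes Go 1.20 or later. The wrapper names bytesToString and stringToBytes are made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares b's memory (no copy).
// The caller must not modify b afterwards.
func bytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// stringToBytes returns a slice that shares s's memory (no copy).
// The result must be treated as read-only.
func stringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := []byte("this is source string")
	s := bytesToString(b)
	fmt.Println(s) // this is source string

	v := stringToBytes(s)
	fmt.Println(len(v), cap(v)) // 21 21
}
```

Unlike the pointer cast, the slice returned here has a well-defined capacity equal to its length.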

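1. What happens if we modify the contents of the b obtained from s -> b? (For a string literal the bytes normally live in read-only memory, so the write typically crashes the program with an unrecoverable fault.)
2. Why can a string and a byte slice be converted into each other this way?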

This is how the reflect package describes slice and string at runtime:

// StringHeader is the runtime representation of a string.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type StringHeader struct {
	Data uintptr
	Len  int
}
// SliceHeader is the runtime representation of a slice.
// It cannot be used safely or portably and its representation may
// change in a later release.
// Moreover, the Data field is not sufficient to guarantee the data
// it references will not be garbage collected, so programs must keep
// a separate, correctly typed pointer to the underlying data.
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

As you can see, the two runtime representations differ only by the Cap field; the data of a string is stored as an array of bytes.

// runtime/slice.go
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}

// runtime/string.go
type stringStruct struct {
	str unsafe.Pointer
	len int
}
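
A quick way to check the "differs only by Cap" observation is to compare the sizes of the two header types. This is a minimal sketch; the helper name headerWords is made up for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words a string value and a []byte
// value occupy: pointer+len for string, pointer+len+cap for a slice.
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	fmt.Println(sw, bw) // 2 3
}
```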

stringStruct and StringHeader

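I was curious about the difference between stringStruct and StringHeader. The source comment says StringHeader is the runtime representation of a string. From what I have read, passing a string as an argument only passes this header (pointer and length), so the underlying bytes are not copied.
stringStruct is the structure the runtime itself uses when operating on strings; in essence it is the same as StringHeader. Let's debug the []byte -> string conversion to see what actually happens:

package main

import "fmt"

func main() {
	b := []byte("this is source string")
	s := string(b)
	fmt.Println(s)
}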
(dlv) c
> main.main() ./main.go:9 (hits goroutine(1):1 total:1) (PC: 0x10abe6f)
     4: 
     5: import (
     6:         "fmt"
     7: )
     8: 
=>   9: func main() {
    10:         b := []byte("this is source string")
    11:         s := string(b)
    12:         fmt.Println(s)
    13: }
(dlv) disassemble
TEXT main.main(SB) /Users/machao/go/src/collection/main.go
        main.go:9       0x10abe60       4c8d6424e0                      lea r12, ptr [rsp-0x20]
        main.go:9       0x10abe65       4d3b6610                        cmp r12, qword ptr [r14+0x10]
        main.go:9       0x10abe69       0f8628010000                    jbe 0x10abf97
=>      main.go:9       0x10abe6f*      4881eca0000000                  sub rsp, 0xa0
        main.go:9       0x10abe76       4889ac2498000000                mov qword ptr [rsp+0x98], rbp
        main.go:9       0x10abe7e       488dac2498000000                lea rbp, ptr [rsp+0x98]
        main.go:10      0x10abe86       488d54241b                      lea rdx, ptr [rsp+0x1b]
        main.go:10      0x10abe8b       4889542440                      mov qword ptr [rsp+0x40], rdx
        main.go:10      0x10abe90       8402                            test byte ptr [rdx], al
        main.go:10      0x10abe92       488d1583ab0100                  lea rdx, ptr [rip+0x1ab83]
        main.go:10      0x10abe99       8402                            test byte ptr [rdx], al
        main.go:10      0x10abe9b       48ba7468697320697320            mov rdx, 0x2073692073696874
        main.go:10      0x10abea5       488954241b                      mov qword ptr [rsp+0x1b], rdx
        main.go:10      0x10abeaa       48ba697320736f757263            mov rdx, 0x6372756f73207369
        main.go:10      0x10abeb4       4889542420                      mov qword ptr [rsp+0x20], rdx
        main.go:10      0x10abeb9       48ba6520737472696e67            mov rdx, 0x676e697274732065
        main.go:10      0x10abec3       4889542428                      mov qword ptr [rsp+0x28], rdx
        main.go:10      0x10abec8       488b5c2440                      mov rbx, qword ptr [rsp+0x40]
        main.go:10      0x10abecd       8403                            test byte ptr [rbx], al
        main.go:10      0x10abecf       eb00                            jmp 0x10abed1
        main.go:10      0x10abed1       48895c2468                      mov qword ptr [rsp+0x68], rbx
        main.go:10      0x10abed6       48c744247015000000              mov qword ptr [rsp+0x70], 0x15
        main.go:10      0x10abedf       48c744247815000000              mov qword ptr [rsp+0x78], 0x15
        main.go:11      0x10abee8       31c0                            xor eax, eax
        main.go:11      0x10abeea       b915000000                      mov ecx, 0x15
        main.go:11      0x10abeef       e82cfef9ff                      call $runtime.slicebytetostring
        main.go:11      0x10abef4       4889442448                      mov qword ptr [rsp+0x48], rax
        main.go:11      0x10abef9       48895c2450                      mov qword ptr [rsp+0x50], rbx
        main.go:12      0x10abefe       440f117c2458                    movups xmmword ptr [rsp+0x58], xmm15
        main.go:12      0x10abf04       488d542458                      lea rdx, ptr [rsp+0x58]
        main.go:12      0x10abf09       4889542438                      mov qword ptr [rsp+0x38], rdx
        main.go:12      0x10abf0e       488b442448                      mov rax, qword ptr [rsp+0x48]
        main.go:12      0x10abf13       488b5c2450                      mov rbx, qword ptr [rsp+0x50]
        main.go:12      0x10abf18       e8e3daf5ff                      call $runtime.convTstring
        main.go:12      0x10abf1d       4889442430                      mov qword ptr [rsp+0x30], rax
        main.go:12      0x10abf22       488b542438                      mov rdx, qword ptr [rsp+0x38]
        main.go:12      0x10abf27       8402                            test byte ptr [rdx], al
        main.go:12      0x10abf29       488d35d07a0000                  lea rsi, ptr [rip+0x7ad0]
        main.go:12      0x10abf30       488932                          mov qword ptr [rdx], rsi
        main.go:12      0x10abf33       488d7a08                        lea rdi, ptr [rdx+0x8]
        main.go:12      0x10abf37       833dc25f0d0000                  cmp dword ptr [runtime.writeBarrier], 0x0
        main.go:12      0x10abf3e       6690                            data16 nop
        main.go:12      0x10abf40       7402                            jz 0x10abf44
        main.go:12      0x10abf42       eb06                            jmp 0x10abf4a
        main.go:12      0x10abf44       48894208                        mov qword ptr [rdx+0x8], rax
        main.go:12      0x10abf48       eb07                            jmp 0x10abf51
        main.go:12      0x10abf4a       e8713dfbff                      call $runtime.gcWriteBarrier
        main.go:12      0x10abf4f       eb00                            jmp 0x10abf51
        main.go:12      0x10abf51       488b442438                      mov rax, qword ptr [rsp+0x38]
        main.go:12      0x10abf56       8400                            test byte ptr [rax], al
        main.go:12      0x10abf58       eb00                            jmp 0x10abf5a
        main.go:12      0x10abf5a       4889842480000000                mov qword ptr [rsp+0x80], rax
        main.go:12      0x10abf62       48c784248800000001000000        mov qword ptr [rsp+0x88], 0x1
        main.go:12      0x10abf6e       48c784249000000001000000        mov qword ptr [rsp+0x90], 0x1
        main.go:12      0x10abf7a       bb01000000                      mov ebx, 0x1
        main.go:12      0x10abf7f       4889d9                          mov rcx, rbx
        main.go:12      0x10abf82       e879a8ffff                      call $fmt.Println
        main.go:13      0x10abf87       488bac2498000000                mov rbp, qword ptr [rsp+0x98]
        main.go:13      0x10abf8f       4881c4a0000000                  add rsp, 0xa0
        main.go:13      0x10abf96       c3                              ret
        main.go:9       0x10abf97       e8441dfbff                      call $runtime.morestack_noctxt
        main.go:9       0x10abf9c       0f1f4000                        nop dword ptr [rax], eax
        main.go:9       0x10abfa0       e9bbfeffff                      jmp $main.main

The conversion itself happens inside call $runtime.slicebytetostring:

// slicebytetostring converts a byte slice to a string.
// It is inserted by the compiler into generated code.
// ptr is a pointer to the first element of the slice;
// n is the length of the slice.
// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) {
	...
	stringStructOf(&str).str = p
	stringStructOf(&str).len = n
	memmove(p, unsafe.Pointer(ptr), uintptr(n))
	return
}

func stringStructOf(sp *string) *stringStruct {
	return (*stringStruct)(unsafe.Pointer(sp))
}
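
The memmove in slicebytetostring is why a string produced by string(b) does not change when b is modified later. A minimal sketch follows; the helper name convertThenMutate is made up for this illustration.

```go
package main

import "fmt"

// convertThenMutate converts b to a string and then overwrites every byte
// of b. The returned string keeps the old contents because the conversion
// copied them.
func convertThenMutate(b []byte) string {
	s := string(b) // copying conversion
	for i := range b {
		b[i] = 'x'
	}
	return s
}

func main() {
	fmt.Println(convertThenMutate([]byte("this is source string"))) // this is source string
}
```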

Iterating over characters

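len(s) returns the length in bytes of the string's underlying array, not the number of characters, whereas range iterates over the string one character (rune) at a time. To get the number of characters, use one of the following: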

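s := "你好 T"
// Convert to a []rune slice and take its length
fmt.Println(len([]rune(s)))
// Or call utf8.RuneCountInString
fmt.Println(utf8.RuneCountInString(s))
// range yields the byte offset i and the rune o that starts there
for i, o := range s {
	fmt.Println(i, " -- ", o)
}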

This is because Go string literals are UTF-8 encoded, so a Chinese character occupies several bytes.
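
The difference between byte length and rune count can be made concrete with a short program. This is a minimal sketch; the helper name runeOffsets is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each rune of s starts,
// which is exactly the index that range yields.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

func main() {
	s := "你好 T"
	fmt.Println(len(s))                    // 8
	fmt.Println(utf8.RuneCountInString(s)) // 4
	fmt.Println(runeOffsets(s))            // [0 3 6 7]
}
```

Each of the two Chinese characters takes 3 bytes, while the space and T take 1 byte each, which gives 8 bytes for 4 runes.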

Converting between float64 and int

Sorting a float64 slice

This applies to 64-bit platforms, where int is 8 bytes: reinterpret the []float64 as a []int and sort that.

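import (
	"sort"
	"unsafe"
)

var a = []float64{4, 2, 5, 7, 2, 1, 88, 1}

func SortFloat64FastV1(a []float64) {
	// Reinterpret the float64 backing array as []int (assumes 8-byte int,
	// a non-empty a, and a capacity of at most 1<<20 elements).
	var b []int = ((*[1 << 20]int)(unsafe.Pointer(&a[0])))[:len(a):cap(a)]

	// Sort the float64 values by their integer bit patterns.
	sort.Ints(b)
}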

Some Tips

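  1. Go floating-point numbers follow the IEEE 754 standard: single precision is 4 bytes, double precision is 8 bytes, and float64 is double precision.
  2. For IEEE 754 values that are non-negative and not NaN, numeric order matches the order of the same bits read as a signed integer. Negative values compare in reverse, so the trick is only safe for non-negative input.
  3. Go's int is a signed integer of at least 4 bytes. Its actual size depends on the target platform: 8 bytes on 64-bit, 4 bytes on 32-bit.
  4. For a non-empty slice, the Data field of the slice header holds the address of the first element.

var a = []float64{4, 2.21, 5, 7, 2.22, 1.0001, 88, 1.0002}
runtimeA := *(*reflect.SliceHeader)(unsafe.Pointer(&a)) // the runtime representation of a slice shown earlier
// The two values printed below are identical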
fmt.Printf("%x\n", runtimeA.Data)
fmt.Printf("%x\n", &a[0])

How the conversion sort works

var b []int = ((*[1 << 20]int)(unsafe.Pointer(&a[0])))[:len(a):cap(a)]
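  1. &a[0] takes the address of the first element of the []float64.
  2. That address is converted to unsafe.Pointer, the generic pointer type.
  3. It is then cast to *[1 << 20]int, i.e. *[1048576]int, a pointer to an int array of length 1048576.
  4. The *[1048576]int array is sliced using the len and cap of the original []float64.
    In effect the 8-byte bit pattern of every element in a's backing array stays unchanged; each 8-byte value is simply read as an int.
  5. Finally sort.Ints(b) from the standard library sorts b, which also leaves the contents of a sorted.
    Reference: https://github.com/chai2010/advanced-go-programming-book/blob/master/ch1-basic/ch1-03-array-string-and-slice.md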
