Environment
Backend
- Language: Go
- Database: Elasticsearch
- Web framework: a wrapper around gin
- WebSocket library: gorilla/websocket
- Logging: zap
- Errors: pkg/errors
Frontend
- Framework: React
- Server-side rendering: Next.js
- UI: Material UI
- Login auth: cookies, via nookies
- WebSocket: supported natively by the browser
The chat features are modeled on https://getstream.io/
Login
GitHub: OAuth documentation
Google: Sign-In for Websites
How the connection is established
- The WebSocket protocol
WebSocket reuses the HTTP handshake channel: the client negotiates a protocol upgrade with the WebSocket server via an HTTP request, and once the upgrade completes, all subsequent data exchange follows the WebSocket protocol.
1. Client: requests the protocol upgrade
Connection: Upgrade
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
Sec-WebSocket-Key: m3xNMIzhueJyd3N66EAK6w==
Sec-WebSocket-Version: 13
Upgrade: websocket

Connection: Upgrade signals that the client wants to upgrade the protocol.
Upgrade: websocket says which protocol to upgrade to.
Sec-WebSocket-Version: 13 is the WebSocket protocol version; if the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it does support.
Sec-WebSocket-Key: m3xNMIzhueJyd3N66EAK6w== pairs with the Sec-WebSocket-Accept header in the server's response and provides basic protection against, for example, malicious or accidental connections.
2. Server: responds with the protocol upgrade
HTTP/1.1 101 Switching Protocols
Connection: upgrade
Sec-WebSocket-Accept: qgWIKoKflyd9H8L/jCNa8XP4CYQ=
Upgrade: websocket
Status code 101 indicates a protocol switch. The upgrade is now complete, and all subsequent data exchange follows the new protocol.
3. Computing Sec-WebSocket-Accept
Sec-WebSocket-Accept is computed from the Sec-WebSocket-Key in the client's request headers: concatenate the Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, take the SHA-1 digest, and encode it as a Base64 string. In pseudocode:
> toBase64( sha1( Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 ) )
Verifying the response value shown above:
const crypto = require('crypto');
const magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const secWebSocketKey = 'm3xNMIzhueJyd3N66EAK6w==';
let secWebSocketAccept = crypto.createHash('sha1')
.update(secWebSocketKey + magic)
.digest('base64');
console.log(secWebSocketAccept);
// qgWIKoKflyd9H8L/jCNa8XP4CYQ=
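The same check can be sketched in Go, the backend language here (a standalone snippet for illustration, not part of the project's code):

package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

func main() {
	const magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
	key := "m3xNMIzhueJyd3N66EAK6w=="
	// SHA-1 over key+magic, then Base64, per RFC 6455
	sum := sha1.Sum([]byte(key + magic))
	fmt.Println(base64.StdEncoding.EncodeToString(sum[:]))
	// qgWIKoKflyd9H8L/jCNa8XP4CYQ=
}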
Frontend pain points
Using cookies with a server-side rendering framework
Next.js is a server-side rendering framework: for each request it first renders the DOM once on the server. There are two ways to get at the cookie:
- read the cookie from the incoming request and pass it down to the component;
- after the client-side render completes, load the cookie from the browser in a hook.
The cookie library used is nookies.
import { parseCookies } from 'nookies';

React.useEffect(() => {
  const all = parseCookies();
  if (all.douyacun) {
    const douyacun = JSON.parse(all.douyacun);
    setCook(douyacun);
  }
}, []);
The other issue is cross-origin cookies during local development, with the frontend at localhost:3000 and the backend at localhost:9000. I first tried http-proxy-middleware (version 1.0.3) to solve the cross-origin problem, but couldn't get cookies working through its WebSocket proxying. In the end I went with nginx. How to unify everything under one domain? location blocks: all backend routes start with /api.
server {
    listen 80;
    server_name douyacun.io;
    location /api/ {
        if ($request_method = 'OPTIONS') {
            return 204;
        }
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-Nginx-Proxy true;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        # No load balancing needed, so no upstream block is required.
        proxy_pass http://127.0.0.1:9003;
    }
    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-Nginx-Proxy true;
        proxy_set_header Connection "";
        # No load balancing needed, so no upstream block is required.
        proxy_pass http://127.0.0.1:3000;
    }
}
Then map douyacun.io => 127.0.0.1 in local DNS (e.g. the hosts file).
How to ensure the WebSocket is initialized only once
Again because the app uses the Next.js server-side rendering framework, the WebSocket cannot be initialized globally; it can only be created inside useEffect.
useEffect(() => {
  initWebsocket();
  return () => {
  };
});

const initWebsocket = () => {
  conn = new WebSocket(WS_ADDRESS);
  conn.onmessage = handlerMessage;
  conn.onclose = function () {
    dispatch({ type: "dialog_open", dialogOpen: true });
  };
  conn.onerror = function () {
    console.log("connection failed");
  };
};
This has a problem: useEffect runs again on every state change, so the WebSocket gets re-initialized in a loop.
TODO: how can the backend recognize this kind of repeated initialization and reject it? (One possible direction is sketched after the corrected code below.)
The React Hooks docs give the answer:
> the previous effect is cleaned up before executing the next effect. In our example, this means a new subscription is created on every update. To avoid firing an effect on every update, refer to the next section.
That is, every component update creates a new subscription. And:
> If you pass an empty array ([]), the props and state inside the effect will always have their initial values,
So pass [] as useEffect's second argument and it no longer fires on every state change.
useEffect(() => {
  initWebsocket();
  return () => {
  };
}, []);
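Back to the backend TODO above: one possible approach (a sketch of my own, not the article's implementation) is to keep at most one connection per authenticated client and close the previous one when a duplicate registers:

import (
	"sync"

	"github.com/gorilla/websocket"
)

type Hub struct {
	mu    sync.Mutex
	conns map[string]*websocket.Conn // key: client id from the auth cookie
}

// Register replaces any existing connection for the same client,
// so a connection created by a re-run effect simply supersedes the stale one.
func (h *Hub) Register(clientID string, conn *websocket.Conn) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if old, ok := h.conns[clientID]; ok {
		old.Close()
	}
	h.conns[clientID] = conn
}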
Why doesn't conn.onmessage(() => { setMessages() }) trigger a re-render of the component?
I never fully figured this out, but React Hooks offer another form of state management, useReducer(), which works like Redux. Per the docs, it suits cases where the state logic is complex and involves multiple sub-values, or where the next state depends on the previous one; useReducer can also help optimize performance for components that trigger deep updates.
Scrollbar position
- When a new message arrives, scroll to the bottom
Hooks provide useRef for manipulating the component: when a message comes in, push the scrollbar to the bottom.
const scrollToBottom = () => {
  // keep the newest message in view
  contentRef.current.scrollTop = contentRef.current.scrollHeight;
};

const handlerMessage = function (evt) {
  let msg = JSON.parse(evt.data);
  dispatch({ type: "new_message", msg: msg, channel_id: msg['channel_id'] });
  scrollToBottom();
};
...
- Pull up to load more message history
scrollbar offset = document height after the history loads - document height before loading
// Adjust the scrollbar only after loading completes. Without hooks this would be
// setState(state, () => {}) with the work in the completion callback; with hooks,
// useEffect replaces that pattern.
useEffect(() => {
  // loadMore calls the backend API
  let t = setTimeout(loadMore, 1000);
  if (!loading) {
    // scrollbar offset = document height after loading - height before loading
    contentRef.current.scrollTop = contentRef.current.scrollHeight - messagesHeight;
  }
  return () => {
    clearTimeout(t);
  };
}, [loading]);

// contentRef.onScroll
const upScrollLoadMore = () => {
  let scrollTop = contentRef.current.scrollTop;
  // record the document height before loading
  let channelMessages = state.messages[state.currentId];
  if (scrollTop == 0 && !loading && channelMessages && channelMessages.length > 0) {
    let channel = getChannelById(state.currentId);
    if (channel.total > 0) {
      setloading(true);
    }
  }
};
- While viewing history, an incoming new message forces the scrollbar back to the bottom
TODO:
How does nginx reverse-proxy WebSocket?
This already appeared in the dev-environment config above:
proxy_http_version 1.1; sets the HTTP version used when proxying.
proxy_set_header Upgrade $http_upgrade; forwards the protocol-upgrade Upgrade header to the backend.
proxy_set_header Connection 'upgrade'; forwards the protocol-upgrade Connection header to the backend.
How group chat and private chat are implemented
Whether the chat is private or group, a channel is created to hold the members. Each message is sent with a channel.id; when the server receives a message, it pushes it to channel.members based on that channel.id (see the sketch after this list).
There are three channel types:
global: everyone subscribes
public: group chat, more than 2 participants
private: one-on-one chat, exactly 2 participants
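A minimal sketch of that push logic, assuming gorilla/websocket and a hypothetical Channel type (field names here are illustrative, not the actual implementation):

import (
	"sync"

	"github.com/gorilla/websocket"
)

type Channel struct {
	ID      string
	Type    string // "global" | "public" | "private"
	mu      sync.RWMutex
	Members map[string]*websocket.Conn // member id -> connection
}

// Broadcast pushes one message to every member of the channel.
func (c *Channel) Broadcast(msg []byte) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, conn := range c.Members {
		// a real implementation would handle errors and evict dead members
		_ = conn.WriteMessage(websocket.TextMessage, msg)
	}
}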
What does one connection cost?
Testing gorilla/websocket with 10,000 connections: 147.93 MB of RAM, roughly 15 KB per connection on average. Test code: github gwebsocket.
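The model being measured is one goroutine per connection, roughly like this (a minimal sketch, not the exact gwebsocket test code):

var upgrader = websocket.Upgrader{}

func wsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	// one goroutine per connection; its stack plus the per-Conn
	// bufio read/write buffers is where the ~15 KB each comes from
	go func() {
		defer conn.Close()
		for {
			mt, msg, err := conn.ReadMessage()
			if err != nil {
				return
			}
			_ = conn.WriteMessage(mt, msg)
		}
	}()
}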
(pprof) top
Showing nodes accounting for 137.93MB, 93.24% of 147.93MB total
Dropped 6 nodes (cum <= 0.74MB)
Showing top 10 nodes out of 51
flat flat% sum% cum cum%
73.79MB 49.88% 49.88% 73.79MB 49.88% bufio.NewWriterSize
34.63MB 23.41% 73.29% 34.63MB 23.41% bufio.NewReaderSize
11MB 7.44% 80.73% 11MB 7.44% runtime.malg
4MB 2.70% 83.44% 5.50MB 3.72% net/textproto.(*Reader).ReadMIMEHeader
3MB 2.03% 85.46% 3.50MB 2.37% github.com/gorilla/websocket.newConn
3MB 2.03% 87.49% 10.50MB 7.10% net/http.readRequest
2.50MB 1.69% 89.18% 16.50MB 11.16% net/http.(*conn).readRequest
2.50MB 1.69% 90.87% 3.50MB 2.37% context.propagateCancel
2MB 1.35% 92.23% 2MB 1.35% syscall.anyToSockaddr
1.50MB 1.01% 93.24% 1.50MB 1.01% net.newFD
(pprof) list flat
Total: 147.93MB
There are 10,003 goroutines; each goroutine costs about 4 KB of memory:
(pprof) top
Showing nodes accounting for 10001, 100% of 10003 total
Dropped 24 nodes (cum <= 50)
Showing top 10 nodes out of 19
flat flat% sum% cum cum%
10001 100% 100% 10001 100% runtime.gopark
0 0% 100% 9998 100% bufio.(*Reader).Peek
0 0% 100% 9998 100% bufio.(*Reader).fill
0 0% 100% 9999 100% github.com/gorilla/websocket.(*Conn).NextReader
0 0% 100% 9999 100% github.com/gorilla/websocket.(*Conn).ReadMessage
0 0% 100% 9999 100% github.com/gorilla/websocket.(*Conn).advanceFrame
0 0% 100% 9998 100% github.com/gorilla/websocket.(*Conn).read
0 0% 100% 9999 100% internal/poll.(*FD).Read
0 0% 100% 10001 100% internal/poll.(*pollDesc).wait
0 0% 100% 10001 100% internal/poll.(*pollDesc).waitRead (inline)
(pprof) list flat
Total: 10003
(pprof)
Optimized following Eran Yanay's Gophercon Israel talk (https://www.youtube.com/watch?reload=9&v=LI1YTFMi8W4); the code is on GitHub.
- Use epoll to avoid a goroutine per connection (sketched below). Goroutines suit CPU-bound work, while epoll suits I/O-bound work, so epoll is used here to multiplex connections over a few goroutines. With 10,000 connections, goroutine stacks alone come to 4 KB * 10000 / 1024 ≈ 39 MB. For how epoll works and how to use it, see 了解一下高大上的epoll.
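The core of that approach, condensed (a sketch of the talk's technique; getFD, which the talk's code implements via reflection on *websocket.Conn, is a hypothetical helper here):

import (
	"sync"

	"github.com/gorilla/websocket"
	"golang.org/x/sys/unix"
)

type epoll struct {
	fd    int
	mu    sync.RWMutex
	conns map[int]*websocket.Conn
}

// Add registers a connection's fd with the epoll instance.
func (e *epoll) Add(conn *websocket.Conn) error {
	fd := getFD(conn) // hypothetical helper; the talk extracts the fd via reflection
	err := unix.EpollCtl(e.fd, unix.EPOLL_CTL_ADD, fd,
		&unix.EpollEvent{Events: unix.POLLIN | unix.POLLHUP, Fd: int32(fd)})
	if err != nil {
		return err
	}
	e.mu.Lock()
	e.conns[fd] = conn
	e.mu.Unlock()
	return nil
}

// Wait blocks until some fds are readable and returns their connections;
// a single goroutine then calls ReadMessage only on those.
func (e *epoll) Wait() ([]*websocket.Conn, error) {
	events := make([]unix.EpollEvent, 128)
	n, err := unix.EpollWait(e.fd, events, -1)
	if err != nil {
		return nil, err
	}
	ready := make([]*websocket.Conn, 0, n)
	e.mu.RLock()
	for i := 0; i < n; i++ {
		ready = append(ready, e.conns[int(events[i].Fd)])
	}
	e.mu.RUnlock()
	return ready, nil
}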
Using epoll here saves 147.93 - 79.94 = 67.99 MB of memory:
(pprof) top
Showing nodes accounting for 79.94MB, 100% of 79.94MB total
Showing top 10 nodes out of 37
flat flat% sum% cum cum%
38.65MB 48.35% 48.35% 38.65MB 48.35% bufio.NewReaderSize (inline)
30.12MB 37.67% 86.02% 30.12MB 37.67% bufio.NewWriterSize
4.50MB 5.63% 91.65% 5MB 6.26% github.com/gorilla/websocket.newConn
2.50MB 3.13% 94.78% 2.50MB 3.13% net.sockaddrToTCP
2MB 2.50% 97.28% 2MB 2.50% syscall.anyToSockaddr
0.67MB 0.84% 98.12% 0.67MB 0.84% main.(*epoll).Add
0.50MB 0.63% 98.75% 0.50MB 0.63% fmt.(*pp).handleMethods
0.50MB 0.63% 99.37% 0.50MB 0.63% net.newFD
0.50MB 0.63% 100% 0.50MB 0.63% github.com/gorilla/websocket.(*Conn).SetPingHandler
0 0% 100% 38.65MB 48.35% bufio.NewReader
(pprof) list flat
Total: 79.94MB
Only 5 goroutines are running:
(pprof) top
Showing nodes accounting for 5, 100% of 5 total
Showing top 10 nodes out of 35
flat flat% sum% cum cum%
2 40.00% 40.00% 2 40.00% runtime.gopark
1 20.00% 60.00% 1 20.00% runtime/pprof.writeRuntimeProfile
1 20.00% 80.00% 1 20.00% syscall.Syscall
1 20.00% 100% 1 20.00% syscall.Syscall6
0 0% 100% 2 40.00% internal/poll.(*FD).Accept
0 0% 100% 1 20.00% internal/poll.(*FD).Read
0 0% 100% 2 40.00% internal/poll.(*pollDesc).wait
0 0% 100% 2 40.00% internal/poll.(*pollDesc).waitRead (inline)
0 0% 100% 2 40.00% internal/poll.runtime_pollWait
0 0% 100% 1 20.00% main.(*epoll).Wait
(pprof) list flat
Total: 5
From the profiles above, the bufio buffers still account for most of the memory, so the next step is to optimize buffer usage. Two Go WebSocket libraries are widely recommended: the first is gorilla/websocket, used above; the other is gobwas/ws. gobwas has a few notable features (an upgrade sketch follows the list):
- Zero-copy upgrade
- No intermediate allocations during I/O
- Low-level API which allows to build your own logic of packet handling and buffers reuse
- High-level wrappers and helpers around API in wsutil package, which allow to start fast without digging the protocol internals
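Combined with the epoll loop above, the upgrade and read paths might look like this (a sketch; poller is the hypothetical epoll wrapper from earlier, adapted to hold net.Conn):

import (
	"net"
	"net/http"

	"github.com/gobwas/ws"
	"github.com/gobwas/ws/wsutil"
)

func wsHandle(w http.ResponseWriter, r *http.Request) {
	// zero-copy upgrade: no per-connection bufio buffers are retained
	conn, _, _, err := ws.UpgradeHTTP(r, w)
	if err != nil {
		return
	}
	// no read goroutine here; the connection is handed to the epoll loop
	if err := poller.Add(conn); err != nil {
		conn.Close()
	}
}

// readOne is called from the epoll loop when the fd becomes readable.
func readOne(conn net.Conn) error {
	msg, op, err := wsutil.ReadClientData(conn)
	if err != nil {
		return err
	}
	return wsutil.WriteServerMessage(conn, op, msg)
}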
10,000 connections now take 3 MB of memory and 5 goroutines:
(pprof) top
Showing nodes accounting for 3328.46kB, 100% of 3328.46kB total
Showing top 10 nodes out of 20
flat flat% sum% cum cum%
1024.12kB 30.77% 30.77% 1024.12kB 30.77% net.newFD (inline)
1024.05kB 30.77% 61.54% 1024.05kB 30.77% net.sockaddrToTCP
768.26kB 23.08% 84.62% 768.26kB 23.08% main.(*epoll).Add
512.03kB 15.38% 100% 512.03kB 15.38% syscall.anyToSockaddr
0 0% 100% 512.03kB 15.38% internal/poll.(*FD).Accept
0 0% 100% 512.03kB 15.38% internal/poll.accept
0 0% 100% 2560.20kB 76.92% main.main
0 0% 100% 768.26kB 23.08% main.wsHandle
0 0% 100% 2560.20kB 76.92% net.(*TCPListener).Accept
0 0% 100% 2560.20kB 76.92% net.(*TCPListener).accept
(pprof) list flat
Total: 3.25MB
(pprof) top
Showing nodes accounting for 5, 100% of 5 total
Showing top 10 nodes out of 35
flat flat% sum% cum cum%
2 40.00% 40.00% 2 40.00% runtime.gopark
1 20.00% 60.00% 1 20.00% runtime/pprof.writeRuntimeProfile
1 20.00% 80.00% 1 20.00% syscall.Syscall
1 20.00% 100% 1 20.00% syscall.Syscall6
0 0% 100% 2 40.00% internal/poll.(*FD).Accept
0 0% 100% 1 20.00% internal/poll.(*FD).Read
0 0% 100% 2 40.00% internal/poll.(*pollDesc).wait
0 0% 100% 2 40.00% internal/poll.(*pollDesc).waitRead (inline)
0 0% 100% 2 40.00% internal/poll.runtime_pollWait
0 0% 100% 1 20.00% main.(*epoll).Wait
(pprof) list flat
Total: 5
Heartbeat
After rebuilding the chat room with gobwas, connections were dropped automatically whenever no message was sent for a while. The problem was pinned down to nginx: its proxy timeout mechanism, specifically the timeout between two successive reads (the nginx config is the same one shown earlier).
nginx has two parameters related to timeouts:
proxy_read_timeout (default 60s) sets the read timeout against the proxied server, i.e. how long nginx waits for a response. It is not the time to receive the whole response, but the time between two successive read operations; if nothing arrives within it, nginx closes the connection.
proxy_send_timeout (default 60s) sets the timeout for sending the request to the upstream server. Likewise, it covers not the entire send but the gap between two successive write operations; if the upstream has received no new data within it, nginx closes the connection.
How to fix it?
Send ping frames periodically to keep the connection alive and confirm it is still in use.
Send a ping to the socket on a timer?
If every connection held its own goroutine, a ticker firing a ping every N seconds would be enough, as in the sketch below; the catch is that we adopted epoll precisely to avoid those goroutines.
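The per-connection-goroutine version would be straightforward (a sketch using gorilla/websocket's control-frame API; the 30s/10s intervals are illustrative):

import (
	"time"

	"github.com/gorilla/websocket"
)

// writePump sends a ping every 30s; if a write fails, the peer is gone.
func writePump(conn *websocket.Conn) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
		if err := conn.WriteMessage(websocket.PingMessage, nil); err != nil {
			return
		}
	}
}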
TODO: how to implement periodic pings on top of epoll.
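One possible direction (my sketch, not a settled answer): a single shared ticker goroutine that walks every registered connection and writes a ping control frame via gobwas/ws; conns() is a hypothetical accessor for the epoll wrapper's connection map:

import (
	"time"

	"github.com/gobwas/ws"
	"github.com/gobwas/ws/wsutil"
)

// pingAll pings every connection from one shared goroutine.
func pingAll(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		for _, conn := range conns() {
			// on write error, close the connection; the epoll loop
			// would then remove it from the set
			if err := wsutil.WriteServerMessage(conn, ws.OpPing, nil); err != nil {
				conn.Close()
			}
		}
	}
}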