Varnish (used mainly as a cache)

1. Introduction

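HTTP is the dominant protocol for today's services, including RESTful APIs.
Caches:
           Proxy cache (proxy): works like a recursive lookup
           Look-aside cache: works like an iterative lookup
        (Recursive: the cache returns the answer directly; if it does not have it, it fetches the answer on the client's behalf)
        (Iterative: the client asks each level itself, one after another)
Caches rely on an expiration mechanism and on conditional requests.
Varnish is a proxy-style cache.
               Forward proxy: sends requests on behalf of the client
               Reverse proxy: answers on behalf of the server
httpd can act as both a forward proxy and a reverse proxy.
Nginx is a proxy at its core and can also cache.
Analogy: Squid is to Varnish what httpd is to Nginx
                 Nginx uses epoll for event-driven I/O
C10K: 10k concurrent connections
C100K: 100k concurrent connections
Varnish sits between clients and servers as a reverse proxy and serves as a cache.
MySQL is a relational database.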

Day-to-day operations:

clipboard.png

clipboard1.png

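2. Web Page Cache:

             squid --> varnish
              Program execution exhibits locality:
                          Temporal locality: data that has been accessed is likely to be accessed again soon
                          Spatial locality: when data is accessed, nearby data is likely to be accessed too
               Cache: hit
                            Hot spots: locality (product recommendations on a shopping site, for example, show both timeliness and locality)
                                          Timeliness:
                                                Cache space exhausted: LRU (least recently used) eviction
                                                Expiry: expired entries are cleaned out
Cache hit ratio: hit / (hit + miss)
                    Ranges from 0 to 1. hit + miss does not necessarily equal the total number of requests, because some requests bypass the cache.
                    Page hit ratio: measured by the number of pages
                    Byte hit ratio: measured by the size of the pages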
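The two hit ratios above can be sketched in a few lines of Python. This is only an illustration: the function names and the traffic sample are made up for the example and are not taken from Varnish.

```python
def hit_ratio(hits, misses):
    """Page (object) hit ratio: hits / (hits + misses)."""
    total = hits + misses
    return hits / total if total else 0.0


def byte_hit_ratio(requests):
    """Byte hit ratio for an iterable of (size_in_bytes, was_hit) pairs."""
    hit_bytes = sum(size for size, was_hit in requests if was_hit)
    all_bytes = sum(size for size, _ in requests)
    return hit_bytes / all_bytes if all_bytes else 0.0


# Hypothetical sample: three small objects served from cache, one large miss.
sample = [(100, True), (100, True), (100, True), (700, False)]

print(hit_ratio(hits=3, misses=1))  # counts objects
print(byte_hit_ratio(sample))       # counts bytes
```

With this sample the two numbers differ a lot, which is why both are usually tracked: many small hits can hide one large miss that dominates backend traffic.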
Cacheable or not:
               Private data: private; may be stored only in a private cache
               Public data: public; may be stored in a public or a private cache
Cache-related Header Fields
            The most important caching header fields are:
               Expires: expiration time, for example
               Expires: Thu, 22 Oct 2026 06:34:30 GMT
               Cache-Control: max-age=

               ETag
               If-None-Match

               Last-Modified
               If-Modified-Since

               Vary
               Age
 
How cache validity is determined:
Expiration time:
HTTP/1.0
Expires: absolute expiration time
HTTP/1.1
Cache-Control: max-age=
Cache-Control: s-maxage=
Conditional requests:
Last-Modified / If-Modified-Since: compares the file's modification timestamp
ETag / If-None-Match: compares a checksum (validator) of the file
Example response headers:
Expires: Thu, 13 Aug 2026 02:05:12 GMT
Cache-Control: max-age=315360000
ETag: "1ec5-502264e2ae4c0"
Last-Modified: Wed, 03 Sep 2014 10:00:27 GMT
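A minimal Python sketch of the two mechanisms, expiration and conditional validation, using the max-age and ETag values from the headers above. The function names are hypothetical, and a real cache also weighs Expires, Age and Last-Modified.

```python
def is_fresh(age_seconds, max_age):
    """Expiration model: a copy is fresh while its age is below max-age."""
    return age_seconds < max_age


def conditional_status(current_etag, if_none_match):
    """Validation model: 304 if the client's ETag still matches, else 200."""
    return 304 if if_none_match == current_etag else 200


MAX_AGE = 315360000            # from the Cache-Control header above
ETAG = '"1ec5-502264e2ae4c0"'  # from the ETag header above

print(is_fresh(3600, MAX_AGE))                  # an hour-old copy
print(conditional_status(ETAG, ETAG))           # validator unchanged
print(conditional_status(ETAG, '"stale-tag"'))  # hypothetical outdated validator
```

A 304 carries no body, so revalidation costs a round trip but saves the transfer of the object itself.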
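Cache levels:
      Private cache: the local cache built into the user agent
      Public cache: the cache provided by a reverse proxy server
      User-Agent <--> private cache <--> public cache <--> public cache 2 <--> Origin Server
Request directives tell the cache how it may use cached content to answer the request:
(the following directives are available in requests)
cache-request-directive =
"no-cache"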
| "no-store"                         
| "max-age" "=" delta-seconds        
| "max-stale" [ "=" delta-seconds ]  
| "min-fresh" "=" delta-seconds      
| "no-transform"                    
| "only-if-cached"                  
| cache-extension                    
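Response directives tell the cache server how to store the content returned by the upstream server:
(the following directives are available in responses)
cache-response-directive =
"public"
| "private" [ "=" <"> 1#field-name <"> ]
| "no-cache" [ "=" <"> 1#field-name <"> ]   may be stored, but must be revalidated with a conditional request before being served to a client
| "no-store"   the response must not be stored in the cache
| "no-transform"
| "must-revalidate"
| "proxy-revalidate"
| "max-age" "=" delta-seconds   maximum cache lifetime (expiration time)
| "s-maxage" "=" delta-seconds   maximum cache lifetime, applies only to public (shared) caches
| cache-extension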
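As a rough illustration of how a shared cache might read the response directives listed above, here is a simplified Python sketch. The function names are made up, and the rules are reduced to the bare minimum: no-store and private forbid storing, no-cache forces revalidation, and s-maxage wins over max-age. Real caches, Varnish included, apply more rules than this.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg.strip('"') if arg else None
    return directives


def shared_cache_ttl(value):
    """Return a TTL in seconds for a shared cache.

    None means this sketch would not store the response;
    0 means it may be stored but must be revalidated before every reuse.
    """
    d = parse_cache_control(value)
    if "no-store" in d or "private" in d:
        return None
    if "no-cache" in d:
        return 0
    for name in ("s-maxage", "max-age"):  # s-maxage takes precedence
        arg = d.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None


print(shared_cache_ttl("public, max-age=600, s-maxage=60"))
print(shared_cache_ttl("private, max-age=600"))
```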
Open-source solutions:
   squid:
   varnish:
   Official Varnish site: http://www.varnish-cache.org/
Community
Enterprise
 This is Varnish Cache, a high-performance HTTP accelerator.

clipboard2.png

Request processing flow in Varnish 2.0 and 3.0

clipboard3.png

Varnish 4.0
Varnish program environment (by default only GET and HEAD requests are cached):

varnish (1)4.png

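3. Varnish program architecture:

Varnish consists of a Manager process and a Cacher process, plus a shared memory log component.
Manager process (the master process)
Cacher process, which contains several types of threads:
accept, worker, expiry, ...
(the Cacher handles all cache work: processing requests, managing the cache, and clearing expired entries)
shared memory log:
(to keep logging from becoming a performance bottleneck, log data is written straight to memory)
Statistics: counters
Log area: log records
varnishlog, varnishncsa, varnishstat...
Configuration interface: VCL
Varnish Configuration Language
VCL compiler --> C compiler --> shared object
/etc/varnish/varnish.params: configures how the varnish service process works, such as the listening address and port and the cache storage mechanism
/etc/varnish/default.vcl: configures the caching policy applied by the Child/Cache threads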
Main program:
/usr/sbin/varnishd
CLI interface:
/usr/bin/varnishadm
Tools that read the Shared Memory Log:
/usr/bin/varnishhist
/usr/bin/varnishlog
/usr/bin/varnishncsa
/usr/bin/varnishstat
/usr/bin/varnishtop
Test tool:
/usr/bin/varnishtest
VCL configuration reload program:
/usr/sbin/varnish_reload_vcl
Systemd unit files:
/usr/lib/systemd/system/varnish.service
the varnish service
/usr/lib/systemd/system/varnishlog.service
/usr/lib/systemd/system/varnishncsa.service
services that persist the logs
Varnish cache storage mechanisms (Storage Types):
-s [name=]type[,options]
· malloc[,size]
In-memory storage; [,size] sets the amount of space; all cached objects are lost on restart.
· file[,path[,size[,granularity]]]
Disk file storage, a black box; all cached objects are lost on restart.
· persistent,path,size
File storage, a black box; cached objects survive a restart; experimental and not yet usable.

clipboard5.png

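Varnish program options:

Program options: the /etc/varnish/varnish.params file
-a address[:port][,address[:port][...]: defaults to port 6081
-T address[:port]: defaults to port 6082
-s [name=]type[,options]: defines the cache storage mechanism
-u user
-g group
-f config: the VCL configuration file
-F: run in the foreground
...
Run-time parameters: DAEMON_OPTS in the /etc/varnish/varnish.params file
DAEMON_OPTS="-p thread_pool_min=5 -p thread_pool_max=500 -p thread_pool_timeout=300"
-p param=value: sets a run-time parameter and its value; may be given multiple times
-r param[,param...]: makes the listed parameters read-only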

clipboard6.png

Reloading the VCL configuration file:

~ ]# varnish_reload_vcl
# varnishadm (the Varnish CLI client)
-S /etc/varnish/secret -T [ADDRESS:]PORT
help []
ping []
auth
quit
banner
status
start
stop
vcl.load   does what varnish_reload_vcl does: loads a VCL file
vcl.inline
vcl.use
vcl.discard
vcl.list
param.show [-l] []
param.set
panic.show
panic.clear
storage.list
vcl.show [-v]
backend.list []
backend.set_health
ban [&& ]...
ban.list

clipboard7.png

clipboard8.png

clipboard9.png

clipboard10.png

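Configuration-related commands:

vcl.list
vcl.load: load and compile a VCL file
vcl.use: activate
vcl.discard: delete
vcl.show [-v]: show the details of the named configuration
Run-time parameters:
param.show -l: list the parameters
param.show
param.set
Cache storage:
storage.list
Backend servers:
backend.list
VCL:
A domain-specific configuration language
state engine:
VCL has multiple state engines. The states are related to one another, but the engines are isolated from each other. Each engine uses return(x) to name the next engine to hand off to, and each engine corresponds to one section of the VCL file, that is, a subroutine.
vcl_hash --> return(hit) --> vcl_hit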
Default (built-in) vcl_recv:
sub vcl_recv {
    if (req.method == "PRI") {
        /* We do not support SPDY or HTTP/2.0 */
        return (synth(405));
    }
    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE") {
        /* Non-RFC2616 or CONNECT which is weird. */
        return (pipe);
    }
    if (req.method != "GET" && req.method != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }
    return (hash);
}

Client Side：

vcl_recv, vcl_pass, vcl_hit, vcl_miss, vcl_pipe, vcl_purge, vcl_synth, vcl_deliver
vcl_recv：
hash：vcl_hash
pass: vcl_pass
pipe: vcl_pipe
synth: vcl_synth
purge: vcl_hash --> vcl_purge
vcl_hash：
lookup：
hit: vcl_hit
miss: vcl_miss
pass, hit_for_pass: vcl_pass
purge: vcl_purge
Backend Side：
vcl_backend_fetch, vcl_backend_response, vcl_backend_error
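Two special engines:
vcl_init: VCL code run before any request is processed; mainly used to initialize VMODs
vcl_fini: called when the VCL configuration is discarded, after all requests have finished; mainly used to clean up VMODs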

VCL syntax:

(1) VCL files start with "vcl 4.0;", which declares the version
(2) //, # and /* foo */ for comments; // and # start single-line comments, /* */ encloses multi-line comments
(3) Subroutines are declared with the sub keyword, for example sub vcl_recv { ... }
(4) No loops, state-limited variables (built-in variables are available only in certain engines)
(5) Terminating statements with a keyword for next action as argument of the return() function, i.e.: return(action); this is how the state engine moves to the next state
(6) Domain-specific
The VCL Finite State Machine
(1) Each request is processed separately
(2) Each request is independent from others at any given time
(3) States are related, but isolated
(4) return(action); exits one state and instructs Varnish to proceed to the next state
(5) Built-in VCL code is always present and appended below your own VCL
The built-in VCL acts as the default VCL

Three main syntax forms:

sub subroutine {
...
}
if CONDITION {
...
} else {
...
}
return(), hash_data()
VCL Built-in Functions and Keywords
Functions:
regsub(str, regex, sub)
regsuball(str, regex, sub)
ban(boolean expression)
hash_data(input)
synthetic(str)
Keywords:
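call subroutine, return(action), new, set, unset
Operators:
==, !=, ~, >, >=, <, <=
Logical operators: &&, ||, !
Assignment: =
Example: obj.hits is a built-in variable that holds the number of times a cached object has been served from the cache:
if (obj.hits > 0) {
    # fixed string followed by the Varnish server's IP
    set resp.http.X-Cache = "HIT via " + server.ip;
} else {
    # the object was not served from the cache
    set resp.http.X-Cache = "MISS from " + server.ip;
}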

clipboard11.png

clipboard12.png

clipboard13.png

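Common variables:

bereq.*, req.*:
bereq.http.HEADERS
bereq.method: request method
bereq.url: requested URL
bereq.proto: protocol version of the request
bereq.backend: the backend host to use
req.http.Cookie: value of the Cookie header in the client's request
req.http.User-Agent ~ "chrome"
beresp.*, resp.*:
beresp.http.HEADERS
beresp.status: response status code
beresp.proto: protocol version
beresp.backend.name: name of the backend host
beresp.ttl: remaining time for which the backend's response may be cached
obj.*
obj.hits: number of times this object has been served from the cache
obj.ttl: the object's TTL, the time left before it expires
server.*
server.ip: IP address of the Varnish host
server.hostname: hostname of the Varnish host
client.*
client.ip: IP address of the client that sent the request to Varnish
User-defined:
set
unset
Example 1: force requests for certain resources to bypass the cache:
sub vcl_recv {
    # (?i) makes the match case-insensitive; req.url is the URI only, without host name or port
    if (req.url ~ "(?i)^/(login|admin)") {
        return(pass);
    }
}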

clipboard14.png

vcl.show test2   (view its contents)
vcl.use test2

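Example 2: for certain kinds of resources, such as public images, remove the private marker and force a lifetime for which Varnish may cache them; defined in vcl_backend_response:
(a response that carries a cookie is not necessarily uncacheable; the cookie can be stripped)
if (beresp.http.cache-control !~ "s-maxage") {
    # the dot is escaped so that it matches a literal "." before the extension
    if (bereq.url ~ "(?i)\.(jpg|jpeg|png|gif|css|js)$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 3600s;
    }
}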
Example 3: defined in vcl_recv:
if (req.restarts == 0) {                  # the request has not been restarted
    if (req.http.X-Forwarded-For) {       # the request already carries an X-Forwarded-For header
        set req.http.X-Forwarded-For = req.http.X-Forwarded-For + "," + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
}
On the backend host: # vim /etc/httpd/conf/httpd.conf

clipboard15.png

purge: manually remove one specific cached object
ban: invalidate a whole class of cached objects
(1) Make the purge operation available
sub vcl_purge {
return (synth(200,"Purged"));
}
(2) When to perform a purge
sub vcl_recv {
if (req.method == "PURGE") {
return(purge);
}
...
}
Add access control rules for this kind of request:
acl purgers {
"127.0.0.0"/8;
"10.1.0.0"/16;
}
sub vcl_recv {
if (req.method == "PURGE") {
if (!client.ip ~ purgers) {
return(synth(405,"Purging not allowed for " + client.ip));
}
return(purge);
}
...
}

clipboard16.png

clipboard17.png

Banning：

(1) varnishadm:
ban
Example:
ban req.url ~ ^/javascripts
(2) Defined in the configuration file with the ban() function:
Example:
if (req.method == "BAN") {
ban("req.http.host == " + req.http.host + " && req.url == " + req.url);
# Throw a synthetic page so the request won't go to the backend.
return(synth(200, "Ban added"));
}
ban req.http.host==www.ilinux.io && req.url==/test1.html

Configuring multiple backend hosts:

backend default {    # one backend block per backend host; "default" is the backend's name
    .host = "172.16.100.6";    # address of the real host
    .port = "80";              # port of the real host
}
backend appsrv {
.host = "172.16.100.7";
.port = "80";
}
sub vcl_recv {
if (req.url ~ "(?i)\.php$") {
set req.backend_hint = appsrv;
} else {
set req.backend_hint = default;
}
...
}

clipboard18.png

clipboard19.png

clipboard20.png

clipboard21.png

Director:

A Varnish module (VMOD);
it must be imported before use:
import directors;

Two basic scheduling algorithms: round-robin and random

Example:
import directors; # load the directors
backend server1 {
.host =
.port =
}
backend server2 {
.host =
.port =
}
sub vcl_init {
new GROUP_NAME = directors.round_robin();
GROUP_NAME.add_backend(server1);
GROUP_NAME.add_backend(server2);
}
sub vcl_recv {
# send all traffic to the GROUP_NAME director (the name chosen in vcl_init):
set req.backend_hint = GROUP_NAME.backend();
}
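What a round-robin director does can be modelled in a few lines of Python. This is a conceptual sketch only, not how vmod_directors is implemented; the class and the backend names are hypothetical, and it ignores backend health.

```python
class RoundRobinDirector:
    """Hands out backends in the order they were added, wrapping around."""

    def __init__(self):
        self._backends = []
        self._next = 0

    def add_backend(self, name):
        self._backends.append(name)

    def backend(self):
        chosen = self._backends[self._next % len(self._backends)]
        self._next += 1
        return chosen


group = RoundRobinDirector()
group.add_backend("server1")
group.add_backend("server2")

# Four consecutive requests alternate between the two backends.
print([group.backend() for _ in range(4)])
```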

clipboard22.png

Three ways to preserve sessions:

                            Session binding (sticky sessions): by source IP or at the application layer
                            Session replication
                            Session server

Cookie-based sticky sessions:

sub vcl_init {
new h = directors.hash();
h.add_backend(one, 1); // backend 'one' with weight '1'
h.add_backend(two, 1); // backend 'two' with weight '1'
}
sub vcl_recv {
// pick a backend based on the cookie header of the client
set req.backend_hint = h.backend(req.http.cookie);
}
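The idea behind hashing on the cookie is that the same key always maps to the same backend. The Python sketch below shows that property under simplifying assumptions: SHA-256 and the weight-expanded slot list are illustrative choices, not the algorithm Varnish uses, and the cookie value is made up.

```python
import hashlib


def pick_backend(backends, key):
    """Pick a backend for `key` from a list of (name, weight) pairs.

    Each backend gets `weight` slots; the key's hash selects one slot,
    so a given key always lands on the same backend.
    """
    slots = [name for name, weight in backends for _ in range(weight)]
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return slots[int.from_bytes(digest[:8], "big") % len(slots)]


backends = [("one", 1), ("two", 1)]
cookie = "sessionid=abc123"  # hypothetical cookie header value

# Repeated requests with the same cookie reach the same backend.
print(pick_backend(backends, cookie) == pick_backend(backends, cookie))
```

The stickiness only holds while the backend list stays the same; adding or removing a backend can remap existing keys.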
BE Health Check:
backend BE_NAME {
.host =
.port =
.probe = {
.url=
.timeout=
.interval=
.window=
.threshold=
}
}
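.probe: defines how the health check is performed
.url: the URL requested by the check; defaults to "/"
.request: the exact request to send
.request =
"GET /.healthtest.html HTTP/1.1"
"Host: www.magedu.com"
"Connection: close"
.window: how many of the most recent checks are considered when judging health
.threshold: how many of the last .window checks must succeed for the backend to count as healthy
.interval: how often the check runs
.timeout: how long to wait before a check times out
.expected_response: expected response status code; defaults to 200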
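The interplay of .window and .threshold can be pictured as a sliding window over recent probe results. A small Python sketch of that idea follows; the class is hypothetical and leaves out details of the real probe, such as its initial state.

```python
from collections import deque


class ProbeWindow:
    """Keeps the last `window` probe results; healthy if enough succeeded."""

    def __init__(self, window, threshold):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # old results fall off automatically

    def record(self, ok):
        self.results.append(bool(ok))

    def healthy(self):
        return sum(self.results) >= self.threshold


probe = ProbeWindow(window=5, threshold=4)
for ok in (True, True, True, True, False):
    probe.record(ok)
print(probe.healthy())  # 4 of the last 5 succeeded

probe.record(False)     # window is now T, T, T, F, F
print(probe.healthy())
```

One failed check does not take the backend out of rotation here, but two failures inside the same window do.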

clipboard23.png

Ways to configure health checks:

(1) probe PB_NAME { }
backend NAME {
.probe = PB_NAME;
...
}
(2) backend NAME {
.probe = {
...
}
}
Example:
probe check {
.url = "/.healthcheck.html";
.window = 5;
.threshold = 4;
.interval = 2s;
.timeout = 1s;
}
backend default {
.host = "10.1.0.68";
.port = "80";
.probe = check;
}
backend appsrv {
.host = "10.1.0.69";
.port = "80";
.probe = check;
}
Manually setting the state of a BE host:
sick: administratively down
healthy: administratively up
auto: let the probe decide

clipboard24.png

Setting backend host attributes:

backend BE_NAME {
...
.connect_timeout = 0.5s;
.first_byte_timeout = 20s;
.between_bytes_timeout = 5s;    # maximum gap between two bytes; exceeding it also marks the backend as down
.max_connections = 50;
}
Varnish run-time parameters:
Thread model:
cache-worker
cache-main
ban lurker
acceptor:
epoll/kqueue:
...
Thread-related parameters: threads are managed through thread pools.
Within a thread pool, each request is handled by one thread; the maximum number of worker threads determines how many requests Varnish can serve concurrently.
Each parameter is introduced with -p.
thread_pools: Number of worker thread pools. Best kept at or below the number of CPU cores.
thread_pool_max: maximum number of threads per pool
thread_pool_min: The minimum number of worker threads in each pool. It also acts as the maximum number of idle threads.
Maximum concurrent connections = thread_pools * thread_pool_max
thread_pool_timeout: Thread idle threshold. Threads in excess of thread_pool_min, which have been idle for at least this long, will be destroyed.
thread_pool_add_delay: Wait at least this long after creating a thread. The default value is fine.
thread_pool_destroy_delay: Wait this long after destroying a thread.
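A quick worked example of the concurrency formula, in Python. The thread_pool_min and thread_pool_max values come from the DAEMON_OPTS line earlier in these notes; thread_pools = 2 is an assumption made for the example, so check the real value with param.show.

```python
def max_concurrency(thread_pools, thread_pool_max):
    """Upper bound on requests served at once: one worker thread per request."""
    return thread_pools * thread_pool_max


def min_worker_threads(thread_pools, thread_pool_min):
    """Worker threads kept alive across all pools even when idle."""
    return thread_pools * thread_pool_min


THREAD_POOLS = 2        # assumed for this example
THREAD_POOL_MIN = 5     # from DAEMON_OPTS
THREAD_POOL_MAX = 500   # from DAEMON_OPTS

print(max_concurrency(THREAD_POOLS, THREAD_POOL_MAX))
print(min_worker_threads(THREAD_POOLS, THREAD_POOL_MIN))
```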
Timer-related parameters:
send_timeout：Send timeout for client connections. If the HTTP response hasn't been transmitted in this many seconds the session is closed.
timeout_idle：Idle timeout for client connections.
timeout_req： Max time to receive clients request headers, measured from first non-white-space character to double CRNL.
cli_timeout：Timeout for the childs replies to CLI requests from the mgt_param.

How to set them:
vcl.param
param.set
To make them permanent:
varnish.params
DAEMON_OPTS="-p PARAM1=VALUE -p PARAM2=VALUE"
Note on run-time parameters: restarting Varnish invalidates the cache

clipboard25.png

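Varnish log area:

shared memory log
Counters
Log records
1. varnishstat - Varnish Cache statistics
-1: print one batch of statistics and exit
-1 -f FIELD_NAME
-f FIELD_NAME: show a single field
-l: list the field names that can be used with -f

MAIN.cache_hit
MAIN.cache_miss   cache misses
# varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss
Shows the current statistics for the named fields.
# varnishstat -l -f MAIN -f MEMPOOL
Lists what each field in the named sections means.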

clipboard26.png

2. varnishtop - Varnish log entry ranking

-1 Instead of a continuously updated display, print the statistics once and exit.
-i taglist: may be given several times, and one option may carry several tags
-I <[taglist:]regex>
-x taglist: exclusion list; everything except the listed tags is shown
-X <[taglist:]regex>

clipboard27.png

clipboard28.png

3. varnishlog - Display Varnish logs
4. varnishncsa - Display Varnish logs in Apache / NCSA combined log format

clipboard29.png

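Built-in functions:

hash_data():
names the data fed into the hash computation; reducing variation in it raises the hit ratio
regsub(str,regex,sub):
replaces the first match of regex in str with sub; mainly used for URL rewriting
regsuball(str,regex,sub):
replaces every match of regex in str with sub
return():
ban(expression)
ban_url(regex):
bans every cached object whose URL matches regex
synth(status,"STRING"): returns a synthetic response, as used in the purge example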
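The difference between regsub and regsuball is simply "first match" versus "every match". It can be mimicked with Python's re module as below. The helpers only borrow the VCL names for illustration; VCL uses PCRE, so regex details may differ, and the URL is made up.

```python
import re


def regsub(s, regex, sub):
    """Replace only the first match of regex in s."""
    return re.sub(regex, sub, s, count=1)


def regsuball(s, regex, sub):
    """Replace every match of regex in s."""
    return re.sub(regex, sub, s)


url = "/img/cats/img.png"  # hypothetical request URL

print(regsub(url, "img", "static"))
print(regsuball(url, "img", "static"))
print(regsub("www.example.com", r"^www\.", ""))  # typical host normalization
```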

Summary
varnish: state engine, vcl
varnish 4.0:
vcl_init
vcl_recv
vcl_hash
vcl_hit
vcl_pass
vcl_miss
vcl_pipe
vcl_waiting
vcl_purge
vcl_deliver
vcl_synth
vcl_fini
vcl_backend_fetch
vcl_backend_response
vcl_backend_error
sub VCL_STATE_ENGINE {
...
}
backend BE_NAME {}
probe PB_NAME {}
acl ACL_NAME {}

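Hands-on project: deploy WordPress on two LAMP servers, reverse-proxy them with Nginx, and load-test; then deploy a Varnish cache behind Nginx, tune the VCL, and load-test repeatedly;
ab, http_load, webbench, siege, jmeter, loadrunner, ...
Further reading: the Varnish Book
http://book.varnish-software.com/4.0/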

Example:
vcl 4.0;
import directors;

backend imgsrv1 {
    .host = "192.168.10.11";
    .port = "80";
}
backend imgsrv2 {
    .host = "192.168.10.12";
    .port = "80";
}
backend appsrv1 {
    .host = "192.168.10.21";
    .port = "80";
}
backend appsrv2 {
    .host = "192.168.10.22";
    .port = "80";
}
sub vcl_init {
    # images: weighted random
    new imgsrvs = directors.random();
    imgsrvs.add_backend(imgsrv1, 10);
    imgsrvs.add_backend(imgsrv2, 20);
    # static files: round-robin
    new staticsrvs = directors.round_robin();
    staticsrvs.add_backend(appsrv1);
    staticsrvs.add_backend(appsrv2);
    # application: hash on the cookie
    new appsrvs = directors.hash();
    appsrvs.add_backend(appsrv1, 1);
    appsrvs.add_backend(appsrv2, 1);
}
sub vcl_recv {
    if (req.url ~ "(?i)\.(css|js)$") {
        set req.backend_hint = staticsrvs.backend();
    } else if (req.url ~ "(?i)\.(jpg|jpeg|png|gif)$") {
        set req.backend_hint = imgsrvs.backend();
    } else {
        set req.backend_hint = appsrvs.backend(req.http.cookie);
    }
}