Reference:
https://www.varnish-cache.org/docs/4.1/users-guide/increasing-your-hitrate.html
Achieving a high hitrate
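Translated content:
By now Varnish is up and running, and you can reach your web application through it. Unless your application was written from the start to sit behind a web accelerator, you will probably need to change either the configuration or the application itself to get a high cache hit rate.
Varnish will not cache your data unless it is absolutely sure that doing so is safe. You therefore need to understand how Varnish decides whether and how to cache a page. We will walk you through a couple of tools that are useful for understanding what is happening in your Varnish setup.
Note: you need a tool to see the HTTP headers that travel between Varnish and the backend. On the Varnish server the easiest way is to use varnishlog and varnishtop, but sometimes a client-side tool (such as curl) is clearer. These are the ones we commonly use: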
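Tool: varnishtop
You can use varnishtop to see which URLs are hitting the backend the most. varnishtop -i BereqURL is an essential command: it shows the top requests Varnish is sending to the backend. varnishtop can be used in other ways as well.
The ones I like to use:
varnishtop -i ReqURL    # which URLs clients request most often
varnishtop -i BereqURL  # which requests pass through Varnish to the backend most often; usually used to track down the cause of a high MISS rate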
Tool: varnishlog
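Once you have identified a URL that is frequently sent to the backend, you can use varnishlog to look at the request:
varnishlog -q 'ReqURL ~ "^/foo/bar"'
This shows the requests coming from clients whose URL matches /foo/bar.
For more information on how varnishlog works, see the Logging in Varnish chapter or the varnishlog man page.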
https://www.varnish-cache.org/docs/4.1/users-guide/operation-logging.html
Tool: lwp-request
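lwp-request is a tool that is part of The World-Wide Web library for Perl.
It is an open-source tool written in Perl; on CentOS it can be installed with yum install perl-libwww-perl.
We mostly use its GET and HEAD commands, which show the HTTP request headers and the HTTP response headers in detail.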
$ GET -H 'Host: www.vg.no' -Used http://vg.no/
GET http://vg.no/
Host: www.vg.no
User-Agent: lwp-request/5.834 libwww-perl/5.834

200 OK
Cache-Control: must-revalidate
Refresh: 600
Title: VG Nett - Forsiden - VG Nett
X-Age: 463
X-Cache: HIT
X-Rick-Would-Never: Let you down
X-VG-Jobb: http://www.finn.no/finn/job/fulltime/result?keyword=vg+multimedia Merk:HeaderNinja
X-VG-Korken: http://www.youtube.com/watch?v=Fcj8CnD5188
X-VG-WebCache: joanie
X-VG-WebServer: leon
The options used are:
-H adds a request header
-U prints the request headers
-s prints the response status
-e prints the response headers
-d discards the actual content
Tool: Live HTTP Headers
A Firefox add-on that shows the headers being sent and received.
The role of HTTP Headers
Every HTTP request and response carries metadata in its headers. Varnish looks at these headers to decide whether it is appropriate to cache the content and for how long.
Please note that when Varnish considers these headers, it actually considers itself part of the web server. The rationale is that both are under your control.
The term "surrogate origin cache" is not well defined by the IETF or in RFC 2616, so the way Varnish works may differ from what you expect.
Let's look at the important headers you should be aware of.
Cookies
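With the default configuration, Varnish will not cache an object coming from the backend with a 'Set-Cookie' header present. Likewise, if the client sends a 'Cookie' header, Varnish bypasses the cache and passes the request straight to the backend.
This default can be overly conservative. Many sites use Google Analytics (GA) to analyze their traffic. GA sets a cookie to track you. That cookie is used by the client-side JavaScript and is therefore of no interest to the server.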
Cookies from the client
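For many web applications it makes sense to disregard cookies completely unless you are accessing a special part of the site. This VCL snippet in vcl_recv disregards cookies unless you are accessing /admin/:
if (!(req.url ~ "^/admin/")) {
    unset req.http.Cookie;
}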
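Quite simple. If, however, you need to do something more complicated, such as removing one cookie out of several, things get harder. Unfortunately Varnish doesn't have good tools for manipulating cookies, so we have to use regular expressions to do the work. If you are familiar with regular expressions you will understand what is going on. If you aren't, we recommend that you pick up a book on the subject, read the regular expression man pages, or read one of the many online guides.
Let's use the Varnish Software (VS) website as an example. Very simplified, the VS setup can be described as a Drupal-based backend with a Varnish cache in front. VS uses some cookies for Google Analytics tracking and similar tools. These cookies are all set and used by JavaScript. Varnish and Drupal don't need to see them, and since Varnish stops caching pages when the client sends cookies, we discard these unnecessary cookies in VCL.
In the following VCL we discard all cookies whose names start with an underscore: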
# Remove has_js and Google Analytics __* cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
# Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
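Note on the functions used:
regsub(str, regex, sub)     // replaces the first match of regex in str with sub
regsuball(str, regex, sub)  // same as regsub, but replaces every match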
The following example removes every cookie except the ones named "COOKIE1" and "COOKIE2", and you can marvel at the "beauty" of it:
sub vcl_recv {
    if (req.http.Cookie) {
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(COOKIE1|COOKIE2)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}
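For example, suppose the cookies are:
cityid=1; _ga=GA1.2.924326379.1469841379; PHPSESSID=b47kj3n2cekhku28ohgg5u84a0; hfcityid=1;
After step 1 (prepend ";"):
;cityid=1; _ga=GA1.2.924326379.1469841379; PHPSESSID=b47kj3n2cekhku28ohgg5u84a0; hfcityid=1;
After step 2 (remove the spaces after each ";"):
;cityid=1;_ga=GA1.2.924326379.1469841379;PHPSESSID=b47kj3n2cekhku28ohgg5u84a0;hfcityid=1;
Step 3 (put a space back in front of COOKIE1 and COOKIE2) matches nothing here, because neither cookie is present, so the string is unchanged.
After step 4 (remove every cookie that is not preceded by a space):
;
After step 5 (trim leading and trailing ";" and spaces) the string is empty, so the Cookie header is unset.
Simulating step 2 with sed:
echo ";cityid=1; _ga=GA1.2.924326379.1469841379; PHPSESSID=b47kj3n2cekhku28ohgg5u84a0; hfcityid=1;" | sed -r "s/; +/;/g"
Simulating step 4:
echo ";cityid=1;_ga=GA1.2.924326379.1469841379;PHPSESSID=b47kj3n2cekhku28ohgg5u84a0;hfcityid=1;" | sed -r 's/;[^ ][^;]*//g'
Simulating step 5:
echo ";" | sed -r 's/^[; ]+|[; ]+$//g'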
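The same steps are easier to follow when one cookie actually survives. The sketch below is only an illustration: it assumes PHPSESSID is the cookie to keep and uses a shortened, made-up Cookie header. The comments show the value of req.http.Cookie after each statement.

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Assumed input, for illustration only:
        #   cityid=1; _ga=GA1.2.3; PHPSESSID=abc123; hfcityid=1

        # Step 1: prepend ";" so every cookie starts with a separator.
        #   ;cityid=1; _ga=GA1.2.3; PHPSESSID=abc123; hfcityid=1
        set req.http.Cookie = ";" + req.http.Cookie;

        # Step 2: remove the spaces that follow each ";".
        #   ;cityid=1;_ga=GA1.2.3;PHPSESSID=abc123;hfcityid=1
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");

        # Step 3: mark the cookie to keep by putting a space back in front of it.
        #   ;cityid=1;_ga=GA1.2.3; PHPSESSID=abc123;hfcityid=1
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");

        # Step 4: delete every cookie that is NOT preceded by a space.
        #   ; PHPSESSID=abc123
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");

        # Step 5: trim leading and trailing ";" and spaces.
        #   PHPSESSID=abc123
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
}
```

The space inserted in step 3 is the whole trick: it is the only thing that distinguishes a cookie to keep from one to drop when step 4 runs.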
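The following example achieves much the same thing. Instead of filtering out the unwanted cookies, it extracts the one cookie that is needed.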
sub vcl_recv {
    # Save the original cookie header so we can mangle it.
    set req.http.X-Varnish-PHP_SID = req.http.Cookie;
    # Using a capturing sub-pattern, extract the continuous string of
    # alphanumerics that immediately follows "PHPSESSID=".
    # Caveat: this only works cleanly when PHPSESSID is the first cookie
    # and is followed by a separator.
    set req.http.X-Varnish-PHP_SID = regsuball(req.http.X-Varnish-PHP_SID, ";? ?PHPSESSID=([a-zA-Z0-9]+)( |;| ;).*", "\1");
    set req.http.Cookie = req.http.X-Varnish-PHP_SID;
    unset req.http.X-Varnish-PHP_SID;
}
Other ways of doing this can be found on the Varnish Cache wiki.
Cookies coming from the backend
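If your backend server sets a cookie using the 'Set-Cookie' header, Varnish will not cache the page when using the default configuration. A hit-for-pass object is created instead. So if the backend misbehaves and sets cookies it should not set, simply unset the 'Set-Cookie' header.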
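As a minimal sketch of unsetting 'Set-Cookie', assume the unwanted cookies only matter on static assets. The list of file extensions is a hypothetical choice, not something taken from the Varnish documentation, so adjust it to your own site.

```vcl
sub vcl_backend_response {
    # Hypothetical rule: static assets never need a cookie, so drop
    # any Set-Cookie the backend attaches to them before caching.
    if (bereq.url ~ "\.(css|js|png|gif|jpg|ico)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
    }
}
```

Only do this for responses that really are identical for every visitor. Removing 'Set-Cookie' from a login or session page would break the application.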
Cache-Control
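The 'Cache-Control' header instructs caches how to handle the content. Varnish cares about the max-age parameter and uses it to calculate the TTL of an object. So make sure your backend issues a 'Cache-Control' header with a max-age. You can check it like this: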
$ GET -Used http://www.varnish-software.com/|grep ^Cache-Control
Cache-Control: public, max-age=600
Age
Varnish adds an 'Age' header to indicate how long the object has been kept inside Varnish.
You can filter 'Age' out of varnishlog like this:
varnishlog -I RespHeader:^Age
Pragma
An HTTP 1.0 server might send the header Pragma: nocache. Varnish ignores this header. You can easily add support for it in VCL.
In vcl_backend_response:
if (beresp.http.Pragma ~ "nocache") {
    set beresp.uncacheable = true;
    set beresp.ttl = 120s; # how long not to cache this url.
}
Authorization
If Varnish sees an 'Authorization' header it will pass the request. If this is not what you want, you can unset the header.
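A minimal sketch of unsetting the header, assuming a hypothetical /public/ path whose content is the same for every user:

```vcl
sub vcl_recv {
    # Hypothetical path: content under /public/ does not depend on
    # who is asking, so the credentials can be dropped and the
    # request becomes cacheable again.
    if (req.url ~ "^/public/") {
        unset req.http.Authorization;
    }
}
```

Once the header is removed the backend no longer receives the credentials, so this is only safe for URLs that need no authentication.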
Overriding the time-to-live (TTL)
Sometimes your backend sets an unreasonable TTL. It is easy to override it in Varnish.
You need VCL to identify the objects you want and then you set the 'beresp.ttl' to whatever you want:
sub vcl_backend_response {
    if (bereq.url ~ "^/legacy_broken_cms/") {
        set beresp.ttl = 5d;
    }
}
This example will set the TTL to 5 days for the old legacy stuff on your site.
Forcing caching for certain requests and certain responses
Since you might still have a cumbersome backend that isn't very friendly to work with, you might want to override more of its responses in Varnish. We recommend that you rely as much as you can on the default caching rules. It is perfectly easy to force Varnish to look up an object in the cache, but it isn't really recommended.
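For completeness, here is a rough sketch of what forcing a lookup can look like. The /static/ prefix and the one hour TTL are assumptions made up for this illustration, and the warning above still applies.

```vcl
sub vcl_recv {
    # Hypothetical rule: always try the cache for GET/HEAD requests
    # under /static/, even if the client sent cookies.
    if (req.url ~ "^/static/" &&
        (req.method == "GET" || req.method == "HEAD")) {
        unset req.http.Cookie;
        return (hash);
    }
}

sub vcl_backend_response {
    # Make the matching responses cacheable regardless of what the
    # backend said about cookies or lifetime.
    if (bereq.url ~ "^/static/") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```

Returning from vcl_recv this way skips the built-in safety checks for these requests, which is exactly why forcing should stay the exception.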
Normalizing your namespace
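Normalize your namespace. Some sites are reachable through many hostnames, for example:
http://www.varnish-software.com/
http://varnish-software.com/
http://varnishsoftware.com/
All three point to the same site. Varnish does not know they are the same, so it caches a separate copy of every page for every hostname. You can avoid this duplication with redirects in your web server configuration or with the following VCL:
if (req.http.host ~ "(?i)^(www\.)?varnish-?software\.com") {
    set req.http.host = "varnish-software.com";
}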
HTTP Vary
HTTP Vary is not a trivial concept. It is by far the most misunderstood HTTP header.
A lot of the response headers tell the client something about the HTTP object being delivered. Clients can request different variants of an HTTP object, based on their preferences. These preferences might cover things like encoding or language. When a client prefers UK English, this is indicated through Accept-Language: en-uk. Caches need to keep these different variants apart, and this is done through the HTTP response header 'Vary'.
When a backend server issues a Vary: Accept-Language header, it tells Varnish that it needs to cache a separate version for every different Accept-Language value coming from the clients.
If two clients say they accept the languages "en-us, en-uk" and "da, de" respectively, Varnish will cache and serve two different versions of the page if the backend indicated that Varnish needs to vary on the 'Accept-Language' header.
Please note that the headers that 'Vary' refers to need to match exactly for there to be a match. So Varnish will keep two copies of a page if one of them was created for "en-us, en-uk" and the other for "en-us,en-uk". Just the lack of a whitespace will force Varnish to cache another version.
To achieve a high hitrate whilst using Vary, it is therefore crucial to normalize the headers the backend varies on. Remember, just a difference in casing can force different cache entries.
The following VCL code normalizes the 'Accept-Language' header to "en", "de" or "fr":
if (req.http.Accept-Language) {
    if (req.http.Accept-Language ~ "en") {
        set req.http.Accept-Language = "en";
    } elsif (req.http.Accept-Language ~ "de") {
        set req.http.Accept-Language = "de";
    } elsif (req.http.Accept-Language ~ "fr") {
        set req.http.Accept-Language = "fr";
    } else {
        # unknown language. Remove the accept-language header and
        # use the backend default.
        unset req.http.Accept-Language;
    }
}
Vary parse errors
Varnish will return a "503 internal server error" page when it fails to parse the 'Vary' header, or if any of the client headers listed in the Vary header exceeds the limit of 65k characters. An 'SLT_Error' log entry is added in these cases.
Pitfall - Vary: User-Agent
Some applications or application servers send Vary: User-Agent along with their content. This instructs Varnish to cache a separate copy for every variation of 'User-Agent' there is, and there are plenty. Even a single patchlevel of the same browser will generate at least 10 different 'User-Agent' headers based just on what operating system they are running.
So if you really need to vary based on 'User-Agent' be sure to normalize the header or your hit rate will suffer badly. Use the above code as a template.
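Following the same template, a User-Agent normalization could look roughly like this. The two buckets ("mobile" and "desktop") and the matching pattern are assumptions chosen for illustration; a real site should pick the classes its backend actually distinguishes.

```vcl
sub vcl_recv {
    if (req.http.User-Agent) {
        # Hypothetical classification: collapse every User-Agent
        # into one of two values so Vary: User-Agent yields at most
        # two cached variants per page.
        if (req.http.User-Agent ~ "(?i)(mobile|android|iphone|ipod)") {
            set req.http.User-Agent = "mobile";
        } else {
            set req.http.User-Agent = "desktop";
        }
    }
}
```

Because the header itself is overwritten, the backend only ever sees "mobile" or "desktop", so it must be able to choose its output from those values alone.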