Web Cache

    Program execution exhibits locality:

        Temporal locality

        Spatial locality


    key-value:

        key: the access path (URL) or its hash

        value: web content


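    Data has hot spots: a small portion of the content receives most of the requests.


    Cache hit ratio: hit/(hit+miss)

        Document hit ratio: measured by the number of documents;

        Byte hit ratio: measured by the size of the content;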
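
    The two ratios can diverge sharply when object sizes differ. A minimal Python sketch of both calculations; the function name and the (was_hit, size) tuple layout are assumptions made for illustration:

```python
def hit_ratios(requests):
    """Compute (document hit ratio, byte hit ratio).

    requests: iterable of (was_hit, size_in_bytes) tuples (hypothetical layout).
    """
    requests = list(requests)
    total_docs = len(requests)
    total_bytes = sum(size for _, size in requests)
    hit_docs = sum(1 for hit, _ in requests if hit)
    hit_bytes = sum(size for hit, size in requests if hit)
    doc_ratio = hit_docs / total_docs if total_docs else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return doc_ratio, byte_ratio


# Three small hits and one large miss.
sample = [(True, 1000), (True, 1000), (True, 1000), (False, 7000)]
doc_ratio, byte_ratio = hit_ratios(sample)
print(doc_ratio, byte_ratio)
```

    Here three of four documents are hits, yet only 3000 of 10000 bytes come from the cache, so the byte hit ratio is much lower than the document hit ratio.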


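    Cache lifecycle:
        Cache eviction: triggered when cache entries expire or when cache space is exhausted;
    

    Cacheable or not:
        Private (per-user) data: must not be stored in a shared cache;


    Cache processing steps:
        receive request --> parse request (extract the request URL and headers) --> look up cache --> check freshness --> build response message --> send response --> write log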


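    Cache control mechanisms:
        1. Expiration date:

                HTTP/1.0 Expires

                    Example: Expires: Thu, 04 Jun 2015 23:38:18 GMT           an absolute time (always expressed in GMT), so it is sensitive to clock differences between the server and the cache;

                HTTP/1.1 Cache-Control: max-age

                    Example: Cache-Control: max-age=3600     expires after 3600 seconds; a relative duration;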
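
    The difference between the absolute and the relative form can be sketched in Python. This is a simplified illustration, not a full HTTP freshness algorithm: the function name and header dict are assumptions, and the Age and Date headers are ignored.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, stored_at, now):
    """Return True while a cached response may be served without revalidation.

    headers   -- dict of response headers (hypothetical shape)
    stored_at -- timezone-aware datetime at which the response was cached
    now       -- timezone-aware current time
    """
    # Cache-Control: max-age (relative) takes precedence over Expires (absolute).
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return now - stored_at < timedelta(seconds=int(value))
    expires = headers.get("Expires")
    if expires:
        return now < parsedate_to_datetime(expires)
    return False  # no explicit lifetime: treated as stale in this sketch


stored = datetime(2015, 6, 4, 22, 0, 0, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "max-age=3600"}, stored, stored + timedelta(minutes=30)))
print(is_fresh({"Expires": "Thu, 04 Jun 2015 23:38:18 GMT"}, stored, stored + timedelta(hours=2)))
```

    With max-age only the elapsed time on the cache matters; with Expires the result depends on the cache's clock agreeing with the server's.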


    Cache-Control   = "Cache-Control" ":" 1#cache-directive

    cache-directive = cache-request-directive

         | cache-response-directive

    cache-request-directive =                 request

           "no-cache"                                              

         | "no-store" (backup)                          

         | "max-age" "=" delta-seconds         

         | "max-stale" [ "=" delta-seconds ]  

         | "min-fresh" "=" delta-seconds      

         | "no-transform"                      

         | "only-if-cached"                   

         | cache-extension                   

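     cache-response-directive =               response

           "public"                                         may be stored by shared caches

         | "private" [ "=" <"> 1#field-name <"> ]                    may be stored by private caches only

         | "no-cache" [ "=" <"> 1#field-name <"> ]                cacheable, but the cache must revalidate the entry before using it to answer a client

         | "no-store"                                                                  must not be cached

         | "no-transform"                         

         | "must-revalidate"                                                       stale entries must be revalidated before reuse

         | "proxy-revalidate"                    

         | "max-age" "=" delta-seconds            

         | "s-maxage" "=" delta-seconds                                   lifetime in shared caches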

         | cache-extension
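
    To make the grammar above concrete, here is a rough Python sketch that splits a Cache-Control value into directives and checks whether a shared cache may store the response. Both function names are hypothetical, and the parser is deliberately naive: it does not handle commas inside quoted field-name lists.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(directives):
    """A shared cache must not store no-store or private responses."""
    return "no-store" not in directives and "private" not in directives


parsed = parse_cache_control("public, max-age=3600, s-maxage=600")
print(parsed)
print(shared_cache_may_store(parsed))
```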

    

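        2. Revalidation: revalidate

                If the original content has not changed, only the headers are returned (no body), with status 304 (Not Modified);

                If the original content has changed, a normal response is returned with status 200;

                If the original content no longer exists, 404 is returned, and the corresponding cache entry should also be removed;


                    Conditional request headers:

                        If-Modified-Since: based on the last-modified timestamp of the original content; if the content has changed since the given time, the new content is returned with status 200; otherwise 304 is returned;

                        If-Unmodified-Since:

                        If-Match:

                        If-None-Match: based on comparing the ETag (entity tag); if the tag does not match, the new content is returned with status 200; otherwise 304 is returned

                                Example: ETag: "faiy89345"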
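
    The 200/304 decision described above can be sketched from the origin server's point of view. This is a simplified Python illustration under stated assumptions: the function name and arguments are hypothetical, ETags are compared as plain strings (weak validators with a W/ prefix are not handled), and If-None-Match is checked before If-Modified-Since.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def revalidate(request_headers, current_etag, last_modified):
    """Return the status code for a conditional GET: 304 or 200."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        tags = [t.strip() for t in inm.split(",")]
        # A matching tag means the cached copy is still valid.
        return 304 if inm.strip() == "*" or current_etag in tags else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        # Unchanged since the given time: no body needs to be sent.
        return 304 if last_modified <= parsedate_to_datetime(ims) else 200
    return 200


modified = datetime(2015, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
print(revalidate({"If-None-Match": '"faiy89345"'}, '"faiy89345"', modified))
print(revalidate({"If-Modified-Since": "Thu, 04 Jun 2015 23:38:18 GMT"}, '"x"', modified))
```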


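Web Cache: common open-source caching solutions: squid, varnish

    varnish website: https://www.varnish-cache.org


    varnish writes its log to a shared memory region that can be accessed through a file-system interface (shared memory log); 90M by default; it is split into two parts:
            the first part holds counters
            the second part holds client request data


    vcl: Varnish Configuration Language

    The interface for configuring cache policy;

    A simple domain-specific programming language;

    

    Storage backends for varnish cache content:
        (1) file: a self-managed file system, a black box; not persistent;

        (2) malloc: uses the malloc() library call to obtain memory, capped at the specified size; not persistent;

        (3) persistent: same function as file, but file-based persistent storage; still experimental;

            

                Specifying cache storage: -s  [name=]type[,options]

                        malloc[,size]

                        file[,path[,size[,granularity]]]

                        persistent,path,size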


    varnishd v4 options fall into two classes:

            Program options:  

                    -s,-f,...

            Runtime parameters:

                    -p param=value


    Process settings: centos7: /etc/varnish/varnish.params

    Cache policy: centos7: /etc/varnish/default.vcl

    Service: systemctl  start varnish.service



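    vcl: a domain-specific programming language built around a state engine;
    VCL has multiple state engines; the states are related to each other but isolated from one another; each engine exits its state with return(x) and moves on to the next state; the valid values of x differ from state to state;


    Request processing flow:
        (1) The request is cacheable:
                (a) Hit: respond from the local cache;
                (b) Miss: fetch the response from the backend server;
                            Cacheable object: cache it first, then respond; a cache lifetime and a custom cache key can be defined;
                            Non-cacheable object: respond directly without caching;

        (2) The request is not cacheable: fetch from the backend server and respond directly;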
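
    The decision tree above can be modelled with a toy in-memory cache. This Python sketch only illustrates the hit / miss / pass branches; the class, its method names and the fetch callback signature are all hypothetical and have nothing to do with Varnish internals.

```python
class TinyCache:
    """Toy cache: fetch(url) must return (body, cacheable, ttl_seconds)."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.store = {}  # url -> (body, expiry_time)

    def handle(self, url, request_cacheable, now):
        if not request_cacheable:
            # (2) non-cacheable request: go straight to the backend.
            body, _, _ = self.fetch(url)
            return "pass", body
        entry = self.store.get(url)
        if entry and entry[1] > now:
            # (1)(a) hit: answer from the local cache.
            return "hit", entry[0]
        # (1)(b) miss: fetch, store if the object is cacheable, then respond.
        body, cacheable, ttl = self.fetch(url)
        if cacheable:
            self.store[url] = (body, now + ttl)
        return "miss", body


calls = []


def fake_fetch(url):
    """Stand-in backend: everything except /private is cacheable for 60 s."""
    calls.append(url)
    return "body of " + url, url != "/private", 60


cache = TinyCache(fake_fetch)
print(cache.handle("/a.jpg", True, now=0)[0])
print(cache.handle("/a.jpg", True, now=10)[0])
```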


    varnish finite state machine:

        vcl_recv:

                    hit: vcl_hit
                    miss: vcl_miss

                    purge: vcl_purge

                    pipe: vcl_pipe

                    pass, hit_for_pass: vcl_pass

        vcl_hash: lookup

        

        vcl_backend_fetch

        vcl_backend_response

        vcl_backend_error


        vcl_synth


        vcl_deliver


    Request processing paths:

            vcl_recv --> vcl_hash -->

                    (1) vcl_hit -->

                            (a) vcl_deliver

                            (b) vcl_pass --> vcl_backend_fetch

                    (2) vcl_miss -->

                            (a) vcl_pass

                            (b) vcl_backend_fetch

                    (3) vcl_purge --> vcl_synth

                    (4) vcl_pipe -->


            vcl_backend_fetch -->

                    vcl_backend_response --> vcl_deliver

                    vcl_backend_error
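
    The paths listed above can be written down as a transition table and checked mechanically. The Python sketch below encodes only the edges named in these notes; it is a simplification for study purposes, not the complete Varnish state diagram, and the table and function names are made up for illustration.

```python
# Edges taken from the outline above (simplified, not the full Varnish diagram).
FLOW = {
    "vcl_recv": ["vcl_hash"],
    "vcl_hash": ["vcl_hit", "vcl_miss", "vcl_purge", "vcl_pipe"],
    "vcl_hit": ["vcl_deliver", "vcl_pass"],
    "vcl_miss": ["vcl_pass", "vcl_backend_fetch"],
    "vcl_pass": ["vcl_backend_fetch"],
    "vcl_purge": ["vcl_synth"],
    "vcl_backend_fetch": ["vcl_backend_response", "vcl_backend_error"],
    "vcl_backend_response": ["vcl_deliver"],
}


def is_valid_path(path):
    """True if every consecutive pair of states is an edge in FLOW."""
    return all(b in FLOW.get(a, []) for a, b in zip(path, path[1:]))


hit_path = ["vcl_recv", "vcl_hash", "vcl_hit", "vcl_deliver"]
miss_path = ["vcl_recv", "vcl_hash", "vcl_miss", "vcl_backend_fetch",
             "vcl_backend_response", "vcl_deliver"]
print(is_valid_path(hit_path), is_valid_path(miss_path))
```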


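    vcl syntax:

            (1) //, #, /*...*/: comments

            (2) sub $name: defines a subroutine;

            (3) no loops; conditionals are supported;

            (4) built-in variables are available;

            (5) a subroutine ends with the terminating statement return(action); there is no return value;

            (6) operators: =, ==, !=, ~, &&, ||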


    Test example 1: report whether the request was a cache hit, together with the server IP;

            sub vcl_deliver{

                        if  (obj.hits>0) {

                                set resp.http.X-Cache = "HIT" + " " + server.ip;

                        } else {

                                set resp.http.X-Cache = "MISS" + " " + server.ip;

                        }

            }


     Test example 2: prevent test.html from being served from the cache;

            sub vcl_recv{

                    if  (req.url ~ "^/test\.html$") {

                            return(pass);

                    }

            }


    Test example 3: force requests for certain resources to bypass the cache lookup:

             sub vcl_recv{

                    if  (req.url ~ "(?i)^/login" || req.url ~ "(?i)^/admin") {                       //   (?i) makes the match case-insensitive

                            return(pass);

                    }

            }


    For certain resource types, such as public images, remove the private marker (Set-Cookie) and force the length of time varnish may cache them;

            sub vcl_backend_response{

                    if   (bereq.url ~ "(?i)\.jpg$") {

                            set beresp.ttl = 7200s;

                            unset beresp.http.Set-Cookie;

                    }

                   if   (bereq.url ~ "(?i)\.css$") {

                            set beresp.ttl = 3600s;

                            unset beresp.http.Set-Cookie;

                    }

            }



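     Variables:

        Built-in variables:
                    req.*: the HTTP request sent by the client;

                            req.http.*: the request headers;

                    bereq.*: the HTTP request sent by varnish to the backend host;

                    beresp.*: the HTTP response received from the backend host;

                    resp.*: the HTTP response sent by varnish to the client;

                            resp.http.*: the response headers;

                    obj.*: attributes of the object stored in the cache; read-only;

                    client.*, server.*, storage.*: available in all client-side subroutines;

        User-defined: set                    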

                    

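        Commonly used variables:

                bereq.http.HEADERS:

                bereq.method: the request method;

                bereq.url: the requested URL;
                bereq.proto: the protocol version
                bereq.backend: the backend host to use;

                

                beresp.proto:

                beresp.status: the response status code;

                beresp.reason:

                beresp.backend.name:

                beresp.http.HEADERS:

                beresp.ttl: the remaining time to live of the content in the backend response;

        

                obj.hits: the number of times this object has been served from the cache;

                obj.ttl: the object's TTL;

        

                server.ip

                server.hostname

                server.port


                req.method: the request method;

                req.url: the requested URL;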


        Ways to remove cached objects: purge, ban

            (1)purge

                        acl purgers {

                                "127.0.0.1";

                                "172.20.120.0"/23;

                        }                                    // access control for purge: only the listed hosts may send the PURGE request method;

                       sub vcl_purge {

                            return(synth(200,"Purged."));

                        }

                        sub vcl_recv {

                            if (req.method == "PURGE")   {

                                    if (!client.ip ~ purgers) {

                                            return(synth(405,"Purging not allowed for " + client.ip));

                                    }

                                    return(purge);

                            }

                        }

                Test: curl   -X  PURGE  http://172.20.120.40/night.jpg
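
    The access check performed by the purgers ACL can be reproduced in Python to see which client addresses would be allowed. This is only a model of the membership test in the VCL above; the function name and the None return for non-PURGE requests are assumptions for illustration.

```python
import ipaddress

# Same entries as the "purgers" ACL above.
PURGERS = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("172.20.120.0/23"),
]


def purge_status(method, client_ip):
    """Return 200 if the purge is allowed, 405 if denied, None if not a PURGE."""
    if method != "PURGE":
        return None
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in PURGERS):
        return 405
    return 200


print(purge_status("PURGE", "172.20.121.7"))
print(purge_status("PURGE", "10.0.0.1"))
```

    Note that the /23 prefix covers 172.20.120.0 through 172.20.121.255, so hosts in 172.20.121.x are allowed as well.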


            

    Defining multiple backend servers:

            backend appsrv {

                        .host = "172.20.120.40";

                        .port = "80";

            }

            backend default {

                        .host = "172.20.120.41";

                        .port = "80";

            }


            sub vcl_recv {

                    if (req.url ~ "(?i)\.php$") {

                            set req.backend_hint = appsrv;

                    }  else {

                            set req.backend_hint = default;

                    }

            }

      

        

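        Backend health checks:

            probe name {

                     .attribute = "value";

            }


                .url: the URL to request when checking whether the backend is healthy; 

                .timeout =    : request timeout

                .interval =   : interval between probes

                .window =    : number of most recent probes to consider

                .threshold =     : minimum number of successful probes within the window for the backend to be considered healthy;


                .request=

                    "GET / HTTP/1.1"

                    "Host: 172.20.120.41"

                    "Connection: close"

                .expected_response = 200;        : expected response status code; defaults to 200;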
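
    How .window and .threshold interact can be illustrated with a sliding window of probe results. The Python sketch below is a simplified model only; the class name is hypothetical, and details such as the probe's initial state are not modelled.

```python
from collections import deque


class ProbeWindow:
    """Keep the last `window` probe results; healthy if enough succeeded."""

    def __init__(self, window=8, threshold=4):
        self.results = deque(maxlen=window)  # oldest results fall out automatically
        self.threshold = threshold

    def record(self, ok):
        self.results.append(bool(ok))

    def healthy(self):
        return sum(self.results) >= self.threshold


probe = ProbeWindow(window=8, threshold=4)
for _ in range(4):
    probe.record(True)
print(probe.healthy())
for _ in range(5):
    probe.record(False)
print(probe.healthy())
```

    After four successes the backend counts as healthy; after five further failures only three successes remain inside the window of eight, so it is marked sick.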


                Example 1:

                backend websrv1 {

                    .host = "172.16.100.68";

                    .port = "80";

                    .probe = {

                        .url = "/test1.html";

                        .timeout = 2s;   

                        .interval =  1s; 

                        .window =   8;

                        .threshold =  4;

                    }

                }

                

                backend websrv2 {

                    .host = "172.16.100.69";

                    .port = "80";

                    .probe = {

                        .url = "/test1.html";

                        .timeout = 2s;   

                        .interval =  1s; 

                        .window =   8;

                        .threshold =  4;

                    }

                }

                

                sub vcl_recv {

                    if (req.url ~ "(?i)\.(jpg|png|gif)$") {

                        set req.backend_hint = websrv1;

                    } else {

                        set req.backend_hint = websrv2;

                    }

                }


                Example 2: load balancing across backend servers:

                import directors; 

                sub vcl_init {

                    new mycluster = directors.round_robin();

                    mycluster.add_backend(websrv1);

                    mycluster.add_backend(websrv2);

                }

                

                sub vcl_recv {

                    set req.backend_hint = mycluster.backend();

                }

                

                Load-balancing algorithms:

                fallback, random, round_robin, hash
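
    The round_robin behaviour used in Example 2 can be pictured with a few lines of Python. This is a conceptual sketch, not the directors VMOD: the class is hypothetical, and skipping unhealthy backends via a callback is an assumption about how health state is consulted.

```python
class RoundRobin:
    """Hand out backends in rotation, skipping ones reported unhealthy."""

    def __init__(self):
        self.backends = []
        self._next = 0

    def add_backend(self, name):
        self.backends.append(name)

    def backend(self, healthy=lambda name: True):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if healthy(candidate):
                return candidate
        return None  # no healthy backend available


mycluster = RoundRobin()
mycluster.add_backend("websrv1")
mycluster.add_backend("websrv2")
print([mycluster.backend() for _ in range(4)])
```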

     

                                       

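    varnish command-line tools:

            varnishadm -S /etc/varnish/secret -T IP:PORT        : connects to the varnish management interface to inspect its state;

                     varnish>

                            param.show    : show runtime parameters

                            param.set        : set a runtime parameter

            varnishtop            : ranks varnish log entries, most frequent first

            varnishncsa            : displays log data in Apache/NCSA format; can run as a service: varnishncsa.service

            varnishlog                : displays the raw log; can run as a service: varnishlog.service

            varnishstat            : displays varnish cache statistics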


    varnish thread model:

            cache-worker threads

            cache-main thread: there is only one; it starts the cache;

            ban lurker:

            acceptor:

            epoll: thread pool manager

            expire: removes expired cache objects

            

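varnish maximum concurrent connections: the thread pool model:

            thread_pools: number of thread pools; defaults to 2;

            thread_pool_max: maximum number of threads allowed in a single pool;

            thread_pool_min

            thread_pool_timeout: threads beyond thread_pool_min that stay idle for this long are destroyed;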
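
    Since thread_pool_max applies per pool, the upper bound on worker threads is the product of the two parameters. A trivial Python sketch of that arithmetic; the function name is hypothetical and the value 5000 is an example input, not a claim about the default on any given installation:

```python
def max_worker_threads(thread_pools, thread_pool_max):
    """Upper bound on worker threads across all pools."""
    return thread_pools * thread_pool_max


# Example values only: 2 pools with up to 5000 threads each.
print(max_worker_threads(2, 5000))
```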