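(1) How object updates are validated:
HTTP checks whether an object has changed by means of two request headers: If-None-Match and If-Modified-Since. The client includes them in a request to ask the server. When a response carries an ETag header, the browser should revalidate with If-None-Match; when it carries a Last-Modified header, the browser should revalidate with If-Modified-Since. The HTTP/1.1 specification recommends ETag (falling back to Last-Modified when ETag cannot be used), but in practice many modern servers still rely on Last-Modified. When a server sends both ETag and Last-Modified, the browser should send both If-None-Match and If-Modified-Since. Under RFC 2616 the server must check both, and may return 304 only if both indicate that the object is unchanged. (The later RFC 7232 revision instead has the server ignore If-Modified-Since whenever If-None-Match is present.)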
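The server-side check described above can be sketched as follows. This is a minimal illustration of the RFC 2616 rule, not any particular server's implementation; the function names `normalize_etag` and `is_not_modified` are made up for this example, and splitting If-None-Match on commas is a simplification.

```python
from email.utils import parsedate_to_datetime


def normalize_etag(tag):
    """Drop the weak-validator prefix so that W/"x" and "x" compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, current_etag, current_last_modified):
    """Return True when a 304 would be consistent with every conditional
    header in the request (hypothetical helper, RFC 2616 behaviour)."""
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is None and if_modified_since is None:
        return False  # not a conditional request at all

    if if_none_match is not None:
        if current_etag is None:
            return False  # this server has no ETag to compare against
        client_tags = [normalize_etag(t) for t in if_none_match.split(",")]
        if "*" not in client_tags and normalize_etag(current_etag) not in client_tags:
            return False

    if if_modified_since is not None:
        if current_last_modified is None:
            return False
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(current_last_modified)
        if modified > since:
            return False

    return True


stamp = "Mon, 16 Mar 2009 11:47:52 GMT"
request = {"If-None-Match": '"abc"', "If-Modified-Since": stamp}
print(is_not_modified(request, '"abc"', stamp))  # both validators match
print(is_not_modified(request, '"xyz"', stamp))  # ETag differs
```

Note that a server with no ETag at all (`current_etag=None`) can never answer 304 to a request that carries If-None-Match in this sketch.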
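(2) Cache control:
1. Cache-control headers used in requests
Pragma: no-cache: kept for compatibility with early HTTP versions such as 1.0+
Cache-Control: no-cache: the client does not want a cached copy. This is only a request; a cache device may ignore it.
Cache-Control: no-store: devices between the client and the server must not cache the response, and should delete any copy they already hold.
Cache-Control: only-if-cached: the client accepts only content that is already cached.
2. Headers used in responses to control caching
Cache-Control: max-age=3600: how long (in seconds) the response may be cached, counted from the time it was received
Cache-Control: s-maxage=3600: similar to the above, except that s-maxage is generally aimed at cache servers and applies only to shared (public) caches
Expires: Fri, 05 Jul 2002 05:00:00 GMT: an absolute time expressed in GMT; this header is easily affected by a wrong local clock
Cache-Control: must-revalidate: the content may be cached, but once it becomes stale the cache must check with the server for an update before reusing it.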
Cache-Control directives and their meanings:
Cache-Control header directives
Directive | Message type | Description
no-cache | Request | Do not return a cached copy of the document without first revalidating it with the server.
no-store | Request | Do not return a cached copy of the document. Do not store the response from the server.
max-age | Request | The document in the cache must not be older than the specified age.
max-stale | Request | The document may be stale based on the server-specified expiration information, but it must not have been expired for longer than the value in this directive.
min-fresh | Request | The document's freshness lifetime must be no less than its current age plus the specified amount. In other words, the response must remain fresh for at least the specified amount of time.
no-transform | Request | The document must not be transformed before being sent.
only-if-cached | Request | Send the document only if it is in the cache, without contacting the origin server.
public | Response | Response may be cached by any cache.
private | Response | Response may be cached such that it can be accessed only by a single client.
no-cache | Response | If the directive is accompanied by a list of header fields, the content may be cached and served to clients, but the listed header fields must first be removed. If no header fields are specified, the cached copy must not be served without revalidation with the server.
no-store | Response | Response must not be cached.
no-transform | Response | Response must not be modified in any way before being served.
must-revalidate | Response | Once the response becomes stale, it must be revalidated with the server before being served.
proxy-revalidate | Response | Shared caches must revalidate the response with the origin server before serving. This directive can be ignored by private caches.
max-age | Response | Specifies the maximum length of time the document can be cached and still considered fresh.
s-maxage | Response | Specifies the maximum age of the document as it applies to shared caches (overriding the max-age directive, if one is present). This directive can be ignored by private caches.
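To make the precedence between these response directives concrete, here is a small sketch of how a cache might derive a freshness lifetime. The helper names `parse_cache_control` and `freshness_lifetime` are invented for this example, and the parser is deliberately simplified: it splits on commas, so a quoted field list such as no-cache="a, b" is not handled.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Turn 'max-age=3600, must-revalidate' into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers, shared_cache=False):
    """Return the freshness lifetime in seconds, 0 if the response must not
    be reused without a check, or None if no explicit lifetime is given."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0
    if shared_cache and "private" in cc:
        return 0
    if shared_cache and "s-maxage" in cc:
        return int(cc["s-maxage"])  # s-maxage overrides max-age for shared caches
    if "max-age" in cc:
        return int(cc["max-age"])  # max-age takes precedence over Expires
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


response = {"Cache-Control": "max-age=3600, s-maxage=60"}
print(freshness_lifetime(response))                     # browser cache view
print(freshness_lifetime(response, shared_cache=True))  # proxy cache view
```

When the function returns None, a cache has to fall back to a heuristic, typically one based on Last-Modified.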
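(3) Two special HTTP methods: OPTIONS and TRACE
1. TRACE can be used to find out how many proxy servers sit between the client and the server, provided that the proxies support setting the Via header. Usage:
Request: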
trace /tttt.gif HTTP/1.1
host:www.sohu.com
The server returns the following headers:
HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:47:52 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Content-Type: message/http
X-Cache: MISS from 19709705.29867846.28603073.sohu.com
Via: 1.0 19709705.29867846.28603073.sohu.com:80 (squid)
Connection: close
The server returns the following body (it reflects the headers that the intermediate proxy sent to the origin web server, OWS):
TRACE / HTTP/1.0
Cache-Control: max-age=36288000
Connection: keep-alive
Host: www.sohu.com
Via: 1.1 19709705.29867846.28603073.sohu.com:80 (squid)
X-Forwarded-For: 58.31.225.229
From the above we can see that the request passed through the proxy server 19709705.29867846.28603073.sohu.com, and that this server only supports HTTP 1.0.
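Reading the proxy chain out of a Via header can be automated. The sketch below assumes the common "protocol-version received-by (comment)" layout shown in the trace above; `parse_via` is a hypothetical helper, and comments that themselves contain commas are not handled.

```python
def parse_via(value):
    """Split a Via header into (protocol, received_by, comment) tuples,
    one per proxy hop, in the order the hops were added."""
    hops = []
    for item in value.split(","):
        fields = item.strip().split(None, 2)
        if len(fields) < 2:
            continue  # skip malformed entries
        protocol, received_by = fields[0], fields[1]
        comment = fields[2].strip("()") if len(fields) == 3 else ""
        hops.append((protocol, received_by, comment))
    return hops


via = "1.0 19709705.29867846.28603073.sohu.com:80 (squid)"
for protocol, host, software in parse_via(via):
    print(f"hop {host} speaks HTTP/{protocol} ({software})")
```

The number of tuples returned is the number of Via-aware proxies on the path; proxies that do not add a Via entry remain invisible.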
2. OPTIONS can be used to probe which HTTP methods the server supports for a given object
OPTIONS /ssss.gif HTTP/1.1
host:www.sohu.com
HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:59:17 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Cache-Control: max-age=5184000
Expires: Fri, 15 May 2009 11:59:17 GMT
Content-Length: 0
Allow: GET, HEAD, OPTIONS, TRACE
X-Cache: MISS from 32583031.43658676.41464477.sohu.com
Via: 1.1 32583031.43658676.41464477.sohu.com:80 (squid)
Connection: close
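(4) HTTP connection control:
HTTP connections fall into three kinds: 1. serial connections, 2. parallel connections, 3. persistent connections
Serial connections: a separate TCP connection is set up for each object, one after another, which adds a large amount of TCP setup and teardown time to the transfer
Parallel connections: several TCP connections are opened at the same time and objects are transferred in parallel, so the TCP setup times overlap and the overall latency drops. Parallel connections do, however, place higher demands on both client and server. The HTTP specification (RFC 2616) recommends that a client keep no more than 2 connections to any one server; in practice modern browsers allow anywhere from 6 to 10
Persistent connections:
By keeping the TCP connection open and transferring objects over it one after another, the overhead of TCP setup and the impact of TCP slow start can be reduced effectively.
The keep-alive concept was introduced in HTTP/1.0+ and became "persistent connections" in HTTP/1.1. The difference is that in HTTP/1.0 keep-alive must be requested explicitly in a header, whereas in HTTP/1.1 persistence is the default behaviour unless Connection: close is used to ask for the connection to be closed.
Points to note when using keep-alive or persistent connections:
In HTTP/1.0 keep-alive must be declared explicitly, and every subsequent request on the same connection must also include it; otherwise the server assumes that the client wants to close the connection. The server can include a Connection header in its response to say whether it agrees to keep-alive or wants to close the connection.
A persistent connection requires each response to carry the correct entity length or to use chunked encoding; otherwise the end of one object cannot be told apart from the start of the next response on the same connection.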
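The HTTP/1.0 versus HTTP/1.1 rule above boils down to a few lines. This is only a sketch of the decision as described in this section; `keeps_connection_open` is an invented name, and real servers also consider timeouts and request limits.

```python
def keeps_connection_open(http_version, headers):
    """Decide whether the connection stays open after this message."""
    tokens = {
        token.strip().lower()
        for token in headers.get("Connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False  # an explicit close always wins
    if http_version == "HTTP/1.1":
        return True   # persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 must opt in on every request


print(keeps_connection_open("HTTP/1.0", {}))                            # False
print(keeps_connection_open("HTTP/1.0", {"Connection": "Keep-Alive"}))  # True
print(keeps_connection_open("HTTP/1.1", {}))                            # True
print(keeps_connection_open("HTTP/1.1", {"Connection": "close"}))       # False
```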
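(5) According to the HTTP specification, if a request contains no Accept-Encoding header, the server may assume that the client accepts any content coding (for example gzip compression). Actual testing shows that this does not always hold.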
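(6) Chunked
This is a transfer encoding. Normally HTTP needs to know the size of an object before sending it, so that the receiver can tell when the transfer ends. If the server cannot report the size of the object in advance and the connection is persistent, chunked transfer must be used. Once chunked is enabled (by setting Transfer-Encoding: chunked in the response headers), the object is cut into several pieces for transfer. Each piece is preceded by its own length, and a final piece of length 0 marks the end of the transfer.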
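The wire format is easy to see in a small encoder and decoder. This is a minimal sketch: `encode_chunked` and `decode_chunked` are names invented for the example, trailers are not supported, and chunk extensions after a ";" are simply ignored.

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would end the body early
        out += f"{len(piece):x}\r\n".encode("ascii")  # size in hexadecimal
        out += piece + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk of size 0, then an empty trailer
    return bytes(out)


def decode_chunked(data):
    """Reassemble the original body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)                  # b'7\r\nHello, \r\n5\r\nworld\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'Hello, world'
```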
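(7) Range requests
HTTP allows a client to request a specified range of a document. If an HTTP download fails partway for some reason, the next request can use the Range header to resume where it stopped. Range is also widely used in P2P-style downloads, where the same content is fetched from several servers at once to speed up the download.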
GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)
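Including Range: bytes=4000- in the request headers means that 4000 bytes have already been downloaded and that this request only needs to start at byte offset 4000.
In the response, the server can set Accept-Ranges: bytes to indicate that it accepts range requests and that the unit of measurement is the byte.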
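How a server might turn a Range header into concrete offsets is sketched below. `resolve_byte_range` is a hypothetical helper that handles a single byte range only; multi-range requests such as bytes=0-99,200-299 are rejected for simplicity.

```python
def resolve_byte_range(range_header, total_length):
    """Map 'bytes=first-last' onto inclusive (start, end) offsets, or
    return None when the range cannot be satisfied (a 416 in HTTP terms)."""
    unit, _, spec = range_header.partition("=")
    spec = spec.strip()
    if unit.strip() != "bytes" or "," in spec or total_length <= 0:
        return None
    first, sep, last = spec.partition("-")
    if not sep:
        return None
    if first == "":
        # Suffix form 'bytes=-N': the final N bytes of the document.
        if not last.isdigit() or int(last) == 0:
            return None
        return (max(0, total_length - int(last)), total_length - 1)
    if not first.isdigit() or (last and not last.isdigit()):
        return None
    start = int(first)
    end = int(last) if last else total_length - 1
    if start >= total_length or start > end:
        return None
    return (start, min(end, total_length - 1))


print(resolve_byte_range("bytes=4000-", 10000))  # (4000, 9999)
print(resolve_byte_range("bytes=-500", 10000))   # (9500, 9999)
```

A successful result would be served with status 206 and a matching Content-Range header.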
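(8) Delta Encoding
This is a way of reducing the amount of data HTTP has to transfer. Normally, once a document is updated on the server, the next client request causes the server to send the entire new document. If only a small part of the document changed, retransmitting the whole document wastes resources. With delta encoding, HTTP transfers only the part that changed. It works as follows:
1. In its first response the server includes an ETag header, which is a unique version identifier for the document.
2. On its next request the client includes an If-None-Match header to ask the server whether the document has been updated. It also sets the A-IM (Accept-Instance-Manipulation) header in the request to indicate that it can accept deltas.
3. On receiving the request, the server finds that it holds a newer version of the document (because the document's ETag has changed). It therefore includes the IM, ETag, and Delta-Base headers in the response to tell the client how the document was updated. The IM value names the delta algorithm used, the ETag header carries the new ETag, and Delta-Base names the version the delta was computed against (normally equal to the If-None-Match value in the request).
4. On receiving the response, the client runs the delta algorithm to update its local copy and sets the local copy's ETag to the new value.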
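The server's decision in step 3 can be sketched as below. This illustrates only which status code and headers are chosen; it does not compute an actual delta. The function `choose_response`, the `known_versions` set of ETags the server still holds, and the `supported` tuple are all assumptions made for the example.

```python
def choose_response(request_headers, current_etag, known_versions, supported=("vcdiff",)):
    """Return (status, headers) for a request that may accept a delta."""
    base_etag = request_headers.get("If-None-Match")
    if base_etag == current_etag:
        return 304, {"ETag": current_etag}  # client copy is up to date

    accepted = [
        item.strip().split(";")[0]
        for item in request_headers.get("A-IM", "").split(",")
        if item.strip()
    ]
    for manipulation in accepted:
        # A delta is only possible against a base version the server still has.
        if manipulation in supported and base_etag in known_versions:
            return 226, {
                "ETag": current_etag,
                "IM": manipulation,
                "Delta-Base": base_etag,
            }
    return 200, {"ETag": current_etag}  # fall back to the full document


request = {"If-None-Match": '"v1"', "A-IM": "vcdiff, gzip"}
status, headers = choose_response(request, '"v2"', {'"v1"'})
print(status, headers)
```

A client that does not send A-IM, or whose base version the server no longer holds, simply receives the full document with status 200.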
Headers used in delta encoding:
Delta-encoding headers
Header | Description
ETag | Unique identifier for each instance of a document. Sent by the server in the response; used by clients in subsequent requests in If-Match and If-None-Match headers.
If-None-Match | Request header sent by the client, asking the server for a document if and only if the client's version of the document is different from the server's.
A-IM | Client request header indicating types of instance manipulations accepted.
IM | Server response header specifying the type of instance manipulation applied to the response. This header is sent when the response code is 226 IM Used.
Delta-Base | Server response header that specifies the ETag of the base document used for generating the delta (should be the same as the ETag in the client request's If-None-Match header).
Values that may appear in the A-IM and IM headers (that is, the available delta algorithms):
IANA registered types of instance manipulations
Type | Description
vcdiff | Delta using the vcdiff algorithm[14]
diffe | Delta using the Unix diff -e command
gdiff | Delta using the gdiff algorithm[15]
gzip | Compression using the gzip algorithm
deflate | Compression using the deflate algorithm
range | Used in a server response to indicate that the response is partial content as the result of a range selection
identity | Used in a client request's A-IM header to indicate that the client is willing to accept an identity instance manipulation
(9) HTTP status codes at a glance:
Status codes
Status code | Reason phrase | Meaning
100 | Continue | An initial part of the request was received, and the client should continue.
101 | Switching Protocols | The server is changing protocols, as specified by the client, to one listed in the Upgrade header.
200 | OK | The request is okay.
201 | Created | The resource was created (for requests that create server objects).
202 | Accepted | The request was accepted, but the server has not yet performed any action with it.
203 | Non-Authoritative Information | The transaction was okay, except the information contained in the entity headers was not from the origin server, but from a copy of the resource.
204 | No Content | The response message contains headers and a status line, but no entity body.
205 | Reset Content | Another code primarily for browsers; basically means that the browser should clear any HTML form elements on the current page.
206 | Partial Content | A partial request was successful.
300 | Multiple Choices | A client has requested a URL that actually refers to multiple resources. This code is returned along with a list of options; the user can then select which one he wants.
301 | Moved Permanently | The requested URL has been moved. The response should contain a Location URL indicating where the resource now resides.
302 | Found | Like the 301 status code, but the move is temporary. The client should use the URL given in the Location header to locate the resource temporarily.
303 | See Other | Tells the client that the resource should be fetched using a different URL. This new URL is in the Location header of the response message.
304 | Not Modified | Clients can make their requests conditional by the request headers they include. This code indicates that the resource has not changed.
305 | Use Proxy | The resource must be accessed through a proxy; the location of the proxy is given in the Location header.
306 | (Unused) | This status code currently is not used.
307 | Temporary Redirect | Like the 301 status code; however, the client should use the URL given in the Location header to locate the resource temporarily.
400 | Bad Request | Tells the client that it sent a malformed request.
401 | Unauthorized | Returned along with appropriate headers that ask the client to authenticate itself before it can gain access to the resource.
402 | Payment Required | Currently this status code is not used, but it has been set aside for future use.
403 | Forbidden | The request was refused by the server.
404 | Not Found | The server cannot find the requested URL.
405 | Method Not Allowed | A request was made with a method that is not supported for the requested URL. The Allow header should be included in the response to tell the client what methods are allowed on the requested resource.
406 | Not Acceptable | Clients can specify parameters about what types of entities they are willing to accept. This code is used when the server has no resource matching the URL that is acceptable for the client.
407 | Proxy Authentication Required | Like the 401 status code, but used for proxy servers that require authentication for a resource.
408 | Request Timeout | If a client takes too long to complete its request, a server can send back this status code and close down the connection.
409 | Conflict | The request is causing some conflict on a resource.
410 | Gone | Like the 404 status code, except that the server once held the resource.
411 | Length Required | Servers use this code when they require a Content-Length header in the request message. The server will not accept requests for the resource without the Content-Length header.
412 | Precondition Failed | If a client makes a conditional request and one of the conditions fails, this response code is returned.
413 | Request Entity Too Large | The client sent an entity body that is larger than the server can or wants to process.
414 | Request URI Too Long | The client sent a request with a request URL that is larger than what the server can or wants to process.
415 | Unsupported Media Type | The client sent an entity of a content type that the server does not understand or support.
416 | Requested Range Not Satisfiable | The request message requested a range of a given resource, and that range either was invalid or could not be met.
417 | Expectation Failed | The request contained an expectation in the Expect request header that could not be satisfied by the server.
500 | Internal Server Error | The server encountered an error that prevented it from servicing the request.
501 | Not Implemented | The client made a request that is beyond the server's capabilities.
502 | Bad Gateway | A server acting as a proxy or gateway encountered a bogus response from the next link in the request response chain.
503 | Service Unavailable | The server cannot currently service the request but will be able to in the future.
504 | Gateway Timeout | Similar to the 408 status code, except that the response is coming from a gateway or proxy that has timed out waiting for a response to its request from another server.
505 | HTTP Version Not Supported | The server received a request in a version of the protocol that it can't or won't support.
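The first digit of a status code identifies its class, which a client can use even for codes it does not recognise. A small sketch using Python's standard `http.HTTPStatus` enum follows; `describe_status` is a name made up for this example, and the reason phrases come from the Python library, so a few may differ from the table above depending on the Python version.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code):
    """Return (class name, reason phrase) for a numeric status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not defined in this Python version
    return STATUS_CLASSES.get(code // 100, "Unknown"), phrase


print(describe_status(304))
print(describe_status(404))
```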
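(10) [Original] Case study: a conflict between load balancing and the ETag header that degrades caching:
Behind the load balancer are the web servers, but they are heterogeneous: some run Linux and some run Windows.
The Linux servers are configured so that HTTP responses contain a Last-Modified header but no ETag header.
The Windows servers are configured so that responses contain both Last-Modified and ETag headers.
On the first visit to the site, image a was downloaded from a Windows server and image b from a Linux server.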
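The second visit to the site took place after the cache lifetime had expired. (Because the site's responses carry only a Last-Modified header and no explicit expiry, the browser uses a heuristic to work out how long an object may be cached. The heuristic depends on a coefficient; the 50% factor in the assembly policy on WA controls exactly this.)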
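The heuristic mentioned in the parenthetical above is commonly a fraction of the time between the response's Date and its Last-Modified value. A rough sketch, assuming that formula: `heuristic_freshness` is an invented helper, and the default factor of 0.1 is just a typical choice, while 0.5 corresponds to the 50% coefficient mentioned above.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header, last_modified_header, factor=0.1):
    """Estimate a freshness lifetime in seconds when the response has
    Last-Modified but no max-age or Expires (assumed formula)."""
    fetched = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    unchanged_for = (fetched - modified).total_seconds()
    return max(0, int(unchanged_for * factor))


# An image last modified two days before it was fetched.
lifetime = heuristic_freshness(
    "Mon, 16 Mar 2009 12:00:00 GMT",
    "Sat, 14 Mar 2009 12:00:00 GMT",
    factor=0.5,
)
print(lifetime)  # 86400 seconds, i.e. one day
```

The longer an object has gone unchanged, the longer the cache assumes it will stay fresh.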
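This time the browser's request for image a was routed to a Linux server, and its request for image b to a Windows server.
Because image a came with both an ETag and a Last-Modified value when it was first downloaded, the browser's second request is a conditional GET with two conditions,
If-None-Match and If-Modified-Since. According to the HTTP specification, 304 is returned only when both conditions indicate that nothing has changed. Unfortunately the second request was routed to a Linux server, which has no ETag configured, so an image that could have been served from the local cache was downloaded again.
Further analysis:
Using ETag turns out to be a very bad thing in this setup:
Different servers compute different ETag values for the same file. If the ETag is used as the validation condition, cache validation easily fails once requests are load-balanced across different servers.
The same image has different ETags on different servers, which leads to a re-download.
As for server selection, this time the image happened to be routed to another Windows server, so both the ETag and Last-Modified headers were present, and the modification time can be seen to be unchanged. Unfortunately the ETags did not match, so the image was downloaded again.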
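Why identical files can end up with different ETags is easy to reproduce. The sketch below assumes, for illustration only, that one kind of server derives the ETag from per-machine file metadata (inode number, size, modification time) while an alternative scheme hashes the content; `metadata_etag` and `content_etag` are hypothetical helpers, not the algorithm of any server in this case study.

```python
import hashlib


def metadata_etag(inode, size, mtime):
    """ETag built from file metadata; the inode differs on every machine."""
    return '"%x-%x-%x"' % (inode, size, mtime)


def content_etag(body):
    """ETag built from the bytes of the file; identical on every machine."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


image = b"\x89PNG...same bytes on both servers"
size, mtime = len(image), 1237204072

# The same file deployed to two servers lands on different inodes.
server_a = metadata_etag(1001, size, mtime)
server_b = metadata_etag(2002, size, mtime)
print(server_a == server_b)                        # False: revalidation fails
print(content_etag(image) == content_etag(image))  # True: safe behind a balancer
```

Under this assumption, the fix is either to make every backend compute ETags from data that is identical across machines or to disable ETags and rely on Last-Modified alone.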