HTTP Protocol Knowledge Points (Collected)


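(1)   Validating whether an object has been updated:

HTTP offers two ways to check whether an object has been updated: If-None-Match and If-Modified-Since. The client includes these headers in a request to ask the server. When a response carries an ETag header, the browser should validate with If-None-Match; when a response carries a Last-Modified header, the browser should validate with If-Modified-Since. The HTTP/1.1 specification recommends the ETag approach (falling back to Last-Modified when ETag cannot be used), but in practice many modern servers still rely on Last-Modified. When a server sends both ETag and Last-Modified, the browser should send both If-None-Match and If-Modified-Since, and the server should check both headers, returning a 304 response only when both indicate that the object is unchanged. (This is the RFC 2616 rule; the later RFC 7232 instead has the server ignore If-Modified-Since whenever If-None-Match is present.)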
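
The "both conditions must hold" rule described above can be sketched roughly as follows. This is a minimal illustration, not a server implementation: the function name `is_not_modified` and the plain-dict shape of `request_headers` are assumptions, and weak ETags (`W/"..."`) are not handled.

```python
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers, current_etag, last_modified):
    """Return True when a 304 Not Modified response would be appropriate.

    request_headers: dict of header name -> value (hypothetical shape).
    current_etag: the server's current ETag for the object, or None.
    last_modified: the object's Last-Modified HTTP-date string, or None.
    """
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is None and ims is None:
        return False  # not a conditional request

    if inm is not None:
        if current_etag is None:
            return False  # this server has no ETag to compare against
        tags = [t.strip() for t in inm.split(",")]
        if "*" not in tags and current_etag not in tags:
            return False  # the ETag changed

    if ims is not None:
        if last_modified is None:
            return False
        if parsedate_to_datetime(last_modified) > parsedate_to_datetime(ims):
            return False  # modified after the client's copy

    return True  # every condition that was sent says "unchanged"
```

A request carrying both headers gets a 304 only when the ETag matches and the modification time is not newer; a server with no ETag at all answers such a request with a full response.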



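(2)   Cache control:

1.       Cache-control headers used in requests

Pragma: no-cache: kept for compatibility with earlier HTTP versions such as HTTP/1.0+.

Cache-Control: no-cache: the client does not want a cached copy served without revalidation. This is only a request; some cache devices may ignore it.

Cache-Control: no-store: devices between the client and the server must not cache the response and should delete any copy they already hold.

Cache-Control: only-if-cached: the client accepts only content that is already cached.

2.       Headers used in responses to control caching

Cache-Control: max-age=3600: how long, in seconds, the response may be cached, counted from the time it was received.
Cache-Control: s-maxage=3600: like max-age, but s-maxage is normally used by cache servers and applies only to shared (public) caches.
Expires: Fri, 05 Jul 2002 05:00:00 GMT: an absolute, GMT-based expiry time; this header is easily thrown off by an incorrect local clock.

Cache-Control: must-revalidate: the content may be cached, but once it becomes stale the cache must revalidate it with the origin server before serving it again.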

Cache-Control header values and their meanings:

Cache-Control header directives

| Directive | Message type | Description |
|---|---|---|
| no-cache | Request | Do not return a cached copy of the document without first revalidating it with the server. |
| no-store | Request | Do not return a cached copy of the document. Do not store the response from the server. |
| max-age | Request | The document in the cache must not be older than the specified age. |
| max-stale | Request | The document may be stale based on the server-specified expiration information, but it must not have been expired for longer than the value in this directive. |
| min-fresh | Request | The document's current age plus the specified amount must not exceed its freshness lifetime. In other words, the response must be fresh for at least the specified amount of time. |
| no-transform | Request | The document must not be transformed before being sent. |
| only-if-cached | Request | Send the document only if it is in the cache, without contacting the origin server. |
| public | Response | Response may be cached by any cache. |
| private | Response | Response may be cached such that it can be accessed only by a single client. |
| no-cache | Response | If the directive is accompanied by a list of header fields, the content may be cached and served to clients, but the listed header fields must first be removed. If no header fields are specified, the cached copy must not be served without revalidation with the server. |
| no-store | Response | Response must not be cached. |
| no-transform | Response | Response must not be modified in any way before being served. |
| must-revalidate | Response | Once stale, the response must be revalidated with the server before being served. |
| proxy-revalidate | Response | Shared caches must revalidate the response with the origin server before serving. This directive can be ignored by private caches. |
| max-age | Response | Specifies the maximum length of time the document can be cached and still considered fresh. |
| s-maxage | Response | Specifies the maximum age of the document as it applies to shared caches (overriding the max-age directive, if one is present). This directive can be ignored by private caches. |
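
As a rough illustration of how a cache might read these response directives, the sketch below parses a Cache-Control value and picks a freshness lifetime. The function names are hypothetical, and the parser is deliberately simplified: it splits on commas, so a quoted field list such as no-cache="A, B" is not handled.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(cache_control, shared_cache):
    """Return the freshness lifetime in seconds.

    0 means the response must not be reused without revalidation
    (or not stored at all); None means no explicit lifetime was given.
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d or "no-cache" in d:
        return 0
    if shared_cache and "private" in d:
        return 0  # a shared cache must not reuse a private response
    if shared_cache and "s-maxage" in d:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    if "max-age" in d:
        return int(d["max-age"])
    return None
```

For the value `max-age=3600, s-maxage=600`, a shared cache would use 600 seconds and a browser cache 3600 seconds.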





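(3)   Two special HTTP methods: OPTIONS and TRACE

1.       TRACE can be used to find out how many proxy servers sit between the client and the server, provided the proxies support setting the Via header. Usage:

Send:

TRACE /tttt.gif HTTP/1.1
Host: www.sohu.com

The server returns the following headers:

HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:47:52 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Content-Type: message/http
X-Cache: MISS from 19709705.29867846.28603073.sohu.com
Via: 1.0 19709705.29867846.28603073.sohu.com:80 (squid)
Connection: close

The server returns the following body (it reflects the headers that the intermediate proxy sent to the OWS, the origin web server):

TRACE / HTTP/1.0
Cache-Control: max-age=36288000
Connection: keep-alive
Host: www.sohu.com
Via: 1.1 19709705.29867846.28603073.sohu.com:80 (squid)
X-Forwarded-For: 58.31.225.229

From the above we can see that the request passed through the proxy server 19709705.29867846.28603073.sohu.com, and that this proxy supports only HTTP/1.0 (it forwarded the request as HTTP/1.0).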
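
Counting the proxies from a Via header, as done by eye above, could be automated along these lines. This is only a sketch: `parse_via` is a hypothetical helper, and it assumes comments contain no commas.

```python
def parse_via(value):
    """Split a Via header value into (version, received_by, comment) tuples."""
    hops = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        comment = None
        if "(" in entry and entry.endswith(")"):
            # Separate the trailing "(comment)" from the rest of the entry.
            entry, _, rest = entry.partition("(")
            comment = rest[:-1].strip()
            entry = entry.strip()
        version, _, received_by = entry.partition(" ")
        hops.append((version, received_by.strip(), comment))
    return hops


hops = parse_via("1.1 19709705.29867846.28603073.sohu.com:80 (squid)")
print(len(hops), "proxy hop(s):", hops)
```

Each tuple is one intermediary, so the length of the list is the number of proxies that added themselves to the header.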

2.       OPTIONS can be used to discover which HTTP methods the server supports for a given object:

OPTIONS /ssss.gif HTTP/1.1
Host: www.sohu.com

HTTP/1.0 200 OK
Date: Mon, 16 Mar 2009 11:59:17 GMT
Server: Apache/1.3.37 (Unix) mod_gzip/1.3.26.1a
Cache-Control: max-age=5184000
Expires: Fri, 15 May 2009 11:59:17 GMT
Content-Length: 0
Allow: GET, HEAD, OPTIONS, TRACE
X-Cache: MISS from 32583031.43658676.41464477.sohu.com
Via: 1.1 32583031.43658676.41464477.sohu.com:80 (squid)
Connection: close



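(4)   HTTP connection control:

HTTP connections fall into three kinds: 1. serial connections, 2. parallel connections, 3. persistent connections.

Serial connections: a separate TCP connection is set up for each object, one after another, which adds a large amount of TCP setup and teardown time to the transfer.



Parallel connections: several TCP connections are opened at once and objects are transferred in parallel, so the connection setup times overlap and the overall latency drops. Parallel connections place higher demands on both client and server, however. The HTTP/1.1 specification (RFC 2616) says a client should keep no more than 2 parallel connections to a server; in practice modern browsers open anywhere from 6 to 10.



Persistent connections:

By keeping the TCP connection open and transferring objects over it one after another, the overhead of TCP setup and the effect of TCP slow start can be reduced effectively.



The keep-alive concept was introduced in HTTP/1.0+ and became the persistent connection in HTTP/1.1. The difference is that in HTTP/1.0 keep-alive must be requested explicitly in a header, whereas in HTTP/1.1 persistence is the default unless Connection: close explicitly asks for the connection to be closed.

Points to note when using keep-alive or persistent connections:

In HTTP/1.0, keep-alive must be declared explicitly and must also be included in every subsequent request on the same connection; otherwise the server assumes the client wants to close the connection. The server can include a Connection header in its response to indicate whether it agrees to keep-alive or wants to close the connection.

With a persistent connection, the response must either carry the correct length of the entity body or use chunked encoding; otherwise subsequent HTTP requests on the connection cannot tell whether the previous object has finished transferring.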
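
The keep-alive rules above can be condensed into a small decision sketch. Both function names are hypothetical, and header names are assumed to be in their canonical capitalization.

```python
def connection_persists(http_version, connection_header):
    """Decide whether the connection stays open after this message.

    http_version: "HTTP/1.0" or "HTTP/1.1".
    connection_header: the Connection header value, or None if absent.
    """
    tokens = {t.strip().lower()
              for t in (connection_header or "").split(",") if t.strip()}
    if "close" in tokens:
        return False          # an explicit close always wins
    if http_version == "HTTP/1.1":
        return True           # persistent by default
    return "keep-alive" in tokens  # HTTP/1.0 must opt in every time


def body_is_delimited(response_headers):
    """Check that the receiver can tell where the entity body ends."""
    transfer_encoding = response_headers.get("Transfer-Encoding", "").lower()
    return "Content-Length" in response_headers or "chunked" in transfer_encoding
```

A response for which `body_is_delimited` is false can only be ended by closing the connection, which is why it cannot be used on a persistent connection.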



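(5)   According to the HTTP specification, a request without an Accept-Encoding header means the client accepts any encoding (for example, gzip compression). In actual testing this does not always hold.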



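(6)   Chunked

This is a transfer encoding. Normally HTTP needs to know the size of an object before transferring it, so that the receiver knows when the transfer ends. If the server cannot report the size of the object and the connection is persistent, chunked transfer must be used. Once chunked is enabled (by setting Transfer-Encoding: chunked in the response headers), the object is split into several chunks; each chunk states its own length, and a final chunk of length 0 marks the end of the transfer.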
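
The chunk framing described above (a hexadecimal length line, the data, and a terminating zero-length chunk) can be sketched as follows. The function names and the tiny default chunk size are illustrative assumptions, and trailers after the last chunk are not handled.

```python
def encode_chunked(data, chunk_size=8):
    """Encode bytes as an HTTP chunked body."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk)   # chunk length in hexadecimal
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"                 # zero-length chunk ends the body
    return bytes(out)


def decode_chunked(payload):
    """Decode an HTTP chunked body back into the original bytes."""
    body = bytearray()
    pos = 0
    while True:
        line_end = payload.index(b"\r\n", pos)
        # Ignore any chunk extension after ';' on the size line.
        size = int(payload[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += payload[pos:pos + size]
        pos += size + 2                 # skip the data and its trailing CRLF
    return bytes(body)


wire = encode_chunked(b"hello world", chunk_size=5)
print(wire)
print(decode_chunked(wire))
```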





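(7)   Range requests

HTTP allows a client to request a specified range of a document. If an HTTP download fails partway for some reason, the next request can carry a Range header, which makes resumable downloads possible. Range is also widely used in P2P-style downloading, where the same content is fetched from several servers at once to speed up the download.

GET /bigfile.html HTTP/1.1
Host: www.joes-hardware.com
Range: bytes=4000-
User-Agent: Mozilla/4.61 [en] (WinNT; I)

Including Range: bytes=4000- in the request headers means that 4000 bytes have already been downloaded and this request only needs to start from byte offset 4000.

In the response, the server can set Accept-Ranges: bytes to indicate that it accepts range requests and that the unit is bytes.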
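
How a server might interpret a header such as Range: bytes=4000- can be sketched like this. The helper names are hypothetical, and only a single byte range is handled.

```python
def parse_byte_range(header, total_length):
    """Return the inclusive (start, end) offsets for a single-range header.

    Returns None when the range cannot be satisfied for a document
    of total_length bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the document.
        length = int(last)
        if length == 0:
            return None
        start = max(total_length - length, 0)
        end = total_length - 1
    else:
        start = int(first)
        end = int(last) if last else total_length - 1
        end = min(end, total_length - 1)
    if start >= total_length or start > end:
        return None
    return start, end


def content_range(start, end, total_length):
    """Build the Content-Range value for a 206 Partial Content response."""
    return "bytes %d-%d/%d" % (start, end, total_length)


print(parse_byte_range("bytes=4000-", 10000))
```

For a 10000-byte document, `bytes=4000-` resolves to offsets 4000 through 9999, so the resumed download fetches exactly the missing tail.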







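(8)   Delta Encoding

A way to reduce the amount of data HTTP transfers. Normally, once a document changes on the server, the next client request causes the server to send the entire new document. If only a small part of the document changed, retransmitting all of it wastes resources. With delta encoding, HTTP transfers only the changed part. It works as follows:

1.       In its first response the server includes an ETag header, a unique version identifier for the document.

2.       In the next request the client includes an If-None-Match header to ask the server whether the document has been updated, and also sets the A-IM (Accept-Instance-Manipulation) header to signal that it accepts deltas.

3.       On receiving the request the server finds that it holds a newer version of the document (because the document's ETag has changed), so it includes the IM, ETag, and Delta-Base headers in the response to tell the client how the document changed. The IM value names the delta algorithm used, the ETag header carries the new ETag, and Delta-Base identifies the version the delta was computed against (normally equal to the If-None-Match value in the request).

4.       On receiving the response the client applies the delta algorithm to update its local document and sets the local document's ETag to the new value.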
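
The four steps above can be walked through with a toy model. Everything here is an assumption made for illustration: `TOY_IM` is a made-up instance-manipulation name (not a registered one), the "delta" is a simple common-prefix/common-suffix replacement rather than vcdiff or diff -e, and `serve` stands in for a server that still keeps old versions keyed by ETag.

```python
TOY_IM = "x-toy-delta"  # hypothetical IM value, not a registered type


def make_delta(old, new):
    """Toy delta: (common prefix length, common suffix length, new middle)."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]


def apply_delta(old, delta):
    """Rebuild the new document from the old one plus a toy delta."""
    p, s, middle = delta
    return old[:p] + middle + old[len(old) - s:]


def serve(store, current_etag, headers):
    """Return (status, response headers, body) for a request.

    store maps ETag -> document bytes for the versions the server keeps.
    """
    base_etag = headers.get("If-None-Match")
    if base_etag == current_etag:
        return 304, {"ETag": current_etag}, None
    accepted = [t.strip() for t in headers.get("A-IM", "").split(",")]
    if base_etag in store and TOY_IM in accepted:
        delta = make_delta(store[base_etag], store[current_etag])
        response_headers = {"ETag": current_etag, "IM": TOY_IM,
                            "Delta-Base": base_etag}
        return 226, response_headers, delta
    return 200, {"ETag": current_etag}, store[current_etag]
```

A client holding version "v1" that sends If-None-Match and A-IM would receive a 226 response whose body, applied to its local copy with `apply_delta`, yields the current version.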



Headers used in delta encoding:

Delta-encoding headers

| Header | Description |
|---|---|
| ETag | Unique identifier for each instance of a document. Sent by the server in the response; used by clients in subsequent requests in If-Match and If-None-Match headers. |
| If-None-Match | Request header sent by the client, asking the server for a document if and only if the client's version of the document is different from the server's. |
| A-IM | Client request header indicating types of instance manipulations accepted. |
| IM | Server response header specifying the type of instance manipulation applied to the response. This header is sent when the response code is 226 IM Used. |
| Delta-Base | Server response header that specifies the ETag of the base document used for generating the delta (should be the same as the ETag in the client request's If-None-Match header). |

Values that can appear in the A-IM and IM headers (that is, the algorithms available for deltas):

IANA registered types of instance manipulations

| Type | Description |
|---|---|
| vcdiff | Delta using the vcdiff algorithm[14] |
| diffe | Delta using the Unix diff -e command |
| gdiff | Delta using the gdiff algorithm[15] |
| gzip | Compression using the gzip algorithm |
| deflate | Compression using the deflate algorithm |
| range | Used in a server response to indicate that the response is partial content as the result of a range selection |
| identity | Used in a client request's A-IM header to indicate that the client is willing to accept an identity instance manipulation |





(9)   HTTP status codes at a glance:

Status codes

| Status code | Reason phrase | Meaning |
|---|---|---|
| 100 | Continue | An initial part of the request was received, and the client should continue. |
| 101 | Switching Protocols | The server is changing protocols, as specified by the client, to one listed in the Upgrade header. |
| 200 | OK | The request is okay. |
| 201 | Created | The resource was created (for requests that create server objects). |
| 202 | Accepted | The request was accepted, but the server has not yet performed any action with it. |
| 203 | Non-Authoritative Information | The transaction was okay, except the information contained in the entity headers was not from the origin server, but from a copy of the resource. |
| 204 | No Content | The response message contains headers and a status line, but no entity body. |
| 205 | Reset Content | Another code primarily for browsers; basically means that the browser should clear any HTML form elements on the current page. |
| 206 | Partial Content | A partial request was successful. |
| 300 | Multiple Choices | A client has requested a URL that actually refers to multiple resources. This code is returned along with a list of options; the user can then select which one he wants. |
| 301 | Moved Permanently | The requested URL has been moved. The response should contain a Location URL indicating where the resource now resides. |
| 302 | Found | Like the 301 status code, but the move is temporary. The client should use the URL given in the Location header to locate the resource temporarily. |
| 303 | See Other | Tells the client that the resource should be fetched using a different URL. This new URL is in the Location header of the response message. |
| 304 | Not Modified | Clients can make their requests conditional by the request headers they include. This code indicates that the resource has not changed. |
| 305 | Use Proxy | The resource must be accessed through a proxy; the location of the proxy is given in the Location header. |
| 306 | (Unused) | This status code currently is not used. |
| 307 | Temporary Redirect | Like the 301 status code; however, the client should use the URL given in the Location header to locate the resource temporarily. |
| 400 | Bad Request | Tells the client that it sent a malformed request. |
| 401 | Unauthorized | Returned along with appropriate headers that ask the client to authenticate itself before it can gain access to the resource. |
| 402 | Payment Required | Currently this status code is not used, but it has been set aside for future use. |
| 403 | Forbidden | The request was refused by the server. |
| 404 | Not Found | The server cannot find the requested URL. |
| 405 | Method Not Allowed | A request was made with a method that is not supported for the requested URL. The Allow header should be included in the response to tell the client what methods are allowed on the requested resource. |
| 406 | Not Acceptable | Clients can specify parameters about what types of entities they are willing to accept. This code is used when the server has no resource matching the URL that is acceptable for the client. |
| 407 | Proxy Authentication Required | Like the 401 status code, but used for proxy servers that require authentication for a resource. |
| 408 | Request Timeout | If a client takes too long to complete its request, a server can send back this status code and close down the connection. |
| 409 | Conflict | The request is causing some conflict on a resource. |
| 410 | Gone | Like the 404 status code, except that the server once held the resource. |
| 411 | Length Required | Servers use this code when they require a Content-Length header in the request message. The server will not accept requests for the resource without the Content-Length header. |
| 412 | Precondition Failed | If a client makes a conditional request and one of the conditions fails, this response code is returned. |
| 413 | Request Entity Too Large | The client sent an entity body that is larger than the server can or wants to process. |
| 414 | Request URI Too Long | The client sent a request with a request URL that is larger than what the server can or wants to process. |
| 415 | Unsupported Media Type | The client sent an entity of a content type that the server does not understand or support. |
| 416 | Requested Range Not Satisfiable | The request message requested a range of a given resource, and that range either was invalid or could not be met. |
| 417 | Expectation Failed | The request contained an expectation in the Expect request header that could not be satisfied by the server. |
| 500 | Internal Server Error | The server encountered an error that prevented it from servicing the request. |
| 501 | Not Implemented | The client made a request that is beyond the server's capabilities. |
| 502 | Bad Gateway | A server acting as a proxy or gateway encountered a bogus response from the next link in the request response chain. |
| 503 | Service Unavailable | The server cannot currently service the request but will be able to in the future. |
| 504 | Gateway Timeout | Similar to the 408 status code, except that the response is coming from a gateway or proxy that has timed out waiting for a response to its request from another server. |
| 505 | HTTP Version Not Supported | The server received a request in a version of the protocol that it can't or won't support. |
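
The codes in the table fall into five classes by their first digit. A client that meets a code it does not recognize can fall back on the class, roughly as in this sketch (`status_class` is a hypothetical helper, and the class names follow common usage rather than the table above).

```python
def status_class(code):
    """Map an HTTP status code to its class name by the first digit."""
    classes = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError("not an HTTP status code: %r" % code)
    return classes[code // 100]


for code in (101, 206, 304, 404, 503):
    print(code, status_class(code))
```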





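(10)   [Original] A case study: a conflict between load balancing and the ETag header that degraded caching:

Behind the load balancer sit the web servers, but they are heterogeneous: some run Linux and some run Windows.
On the Linux servers, HTTP responses carry a Last-Modified header but no ETag header:

[Screenshot: linux.jpg, 2009-3-13 19:02]

On the Windows servers, responses carry both Last-Modified and ETag headers.

[Screenshot: windows.jpg, 2009-3-13 19:02]

On the first visit to the site, image a was downloaded from a Windows server and image b from a Linux server.
On the second visit (by then the cache lifetime had passed; because this site's responses contain only a Last-Modified header, the browser uses a heuristic to compute how long an object may be cached. The heuristic lifetime is governed by a coefficient;
the 50% coefficient in the assembly policy on the WA controls exactly this),
the browser's request for image a was routed to a Linux server and its request for image b to a Windows server.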
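
The heuristic mentioned above is commonly computed as a fraction of the time between the response's Date and its Last-Modified value. The sketch below assumes that formula and uses the 50% coefficient from the text as the default; the function name and the sample timestamps are hypothetical.

```python
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header, last_modified_header, factor=0.5):
    """Estimate a cache lifetime in seconds when no explicit expiry is given.

    Assumes lifetime = factor * (Date - Last-Modified).
    """
    unchanged_for = (parsedate_to_datetime(date_header)
                     - parsedate_to_datetime(last_modified_header))
    return max(unchanged_for.total_seconds(), 0) * factor


# An object last modified two hours before it was served is treated
# as fresh for one hour with a 50% coefficient.
print(heuristic_lifetime("Mon, 16 Mar 2009 12:00:00 GMT",
                         "Mon, 16 Mar 2009 10:00:00 GMT"))
```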

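Image a had both ETag and Last-Modified attributes when it was first downloaded, so on the second request the browser issues a GET with two conditions,
If-None-Match and If-Modified-Since. Under the HTTP specification, 304 is returned only when both conditions indicate no change. Unfortunately the second request was routed to a Linux server, which does not set ETag, so an image that could have been served from the local cache was downloaded again:

[Screenshot: 截图00.jpg, 2009-3-13 19:02]

Further analysis:
Using ETag can be a very bad thing:
Different servers compute different ETag values for the same object. If ETag is used as the validation condition, the cache is easily invalidated once requests are load-balanced to different servers.

[Screenshot: 截图01.jpg, 2009-3-13 19:11]

In the screenshot above, the same image has different ETags on different servers, which causes it to be downloaded again.
As for server selection, this time the image happened to be routed to another Windows server, so both the ETag and Last-Modified headers are present, and the timestamp can be seen to be unchanged. Unfortunately the ETag mismatch still forces a re-download.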
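
The effect can be reproduced with a small simulation. The way each server derives its ETag here (a hash over a per-server identifier plus the content) is purely an assumption chosen to make the values differ; the article does not say how the real servers computed theirs, and the server names are made up.

```python
import hashlib


def make_etag(server_id, content):
    """Hypothetical per-server ETag: depends on the server as well as the content."""
    digest = hashlib.sha256(server_id.encode("utf-8") + content).hexdigest()
    return '"%s"' % digest[:8]


def handle(server_etag, if_none_match):
    """Return 304 only when the server has an ETag and it matches the client's."""
    if server_etag is not None and if_none_match == server_etag:
        return 304
    return 200


image = b"same image bytes on every server"
etag_win1 = make_etag("windows-1", image)   # first visit: cached with this ETag
etag_win2 = make_etag("windows-2", image)   # same file, different server

print(handle(etag_win1, etag_win1))  # same server again: 304
print(handle(etag_win2, etag_win1))  # another Windows server: 200, re-download
print(handle(None, etag_win1))       # Linux server without ETag: 200, re-download
```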



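(11)   Using Yahoo's YSlow tool to analyze a site's HTTP optimizations:

The Yahoo web application development team both advocates and practices HTTP optimization. Drawing on years of experience, the team distilled a set of site optimization rules and built them into a program. The program is popular among testers and integrates with the powerful Firebug tool, making it a strong tool for developers and testers.

Installation:

1. Download and install the Firefox browser

2. Download and install the Firefox extension Firebug

3. Download and install YSlow

Usage is simple and similar to HttpWatch: when a site is opened, the program analyzes and grades it automatically:



The results show that the site scores poorly overall, at grade F (A is best). They also list the specific items that can be optimized, with a grade for each: for example, whether the number of HTTP requests can be reduced further, which items could be served through a CDN, which could use an Expires header or gzip compression, and so on. Judging from the results, the first five items all have considerable room for improvement. The details can be viewed by expanding the triangular arrow next to each item, for example the CDN section:



Object expiry optimization:



Optimization of the placement of some objects: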

http://www.cppblog.com/age100/archive/2010/06/25/118688.aspx
