A quick reminder about the HTTP protocol

Translated from:
http://www.haproxy.org/#doc1.4

1. Quick reminder about HTTP


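When HAProxy runs in HTTP mode, both the request and the response are fully analyzed and indexed, so matching rules can be built on almost anything found in an HTTP message.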

Understanding how HTTP requests and responses are formed makes it much easier to write correct rules in the configuration.

1.1. The HTTP transaction model


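The HTTP protocol is transaction-driven: each request leads to one and only one response. Traditionally, the client establishes a TCP connection to the server, sends its HTTP request over that connection, the server replies with the response, and the connection is closed. A new request requires a new connection: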

[CON1] [REQ1] ... [RESP1] [CLO1] [CON2] [REQ2] ... [RESP2] [CLO2] ...

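This is called the "HTTP close" mode: there are as many connection establishments as there are HTTP transactions. Because the server closes the connection right after sending the response, the client does not need to know the content length.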

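Thanks to the transactional nature of the protocol, it was possible to improve on this: between two consecutive transactions, the server does not close the connection right after the first response.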

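In this mode, however, the server must tell the client the length of each response so that the client does not wait indefinitely. A special header is used for this: "Content-length". This mode is called the "keep-alive" mode: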

[CON] [REQ1] ... [RESP1] [REQ2] ... [RESP2] [CLO] ...

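This mode reduces the latency between two transactions and spares the server the work of setting up and tearing down connections. It is generally better than the "HTTP close" mode, but not always, because clients often limit their number of concurrent connections to a fairly small value.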
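As a rough illustration of why "Content-length" matters in keep-alive mode, the sketch below reads two responses back to back from one byte stream. The function name read_response and the in-memory stream are assumptions made for this example, and the parser deliberately ignores everything else a real client has to handle.

```python
import io


def read_response(stream):
    """Read one response from a binary stream (hypothetical helper).

    The body length is taken from the Content-length header, which is what
    lets the reader stop at the right byte on a connection that stays open.
    """
    status_line = stream.readline().rstrip(b"\r\n")
    if not status_line:
        return None  # nothing left to read on this connection
    headers = {}
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:  # empty line: end of the headers
            break
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get(b"content-length", b"0"))
    body = stream.read(length)
    return status_line, body


# Two responses sent one after the other on a single keep-alive connection.
wire = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-length: 3\r\n\r\nbye"
)
first = read_response(wire)
second = read_response(wire)
third = read_response(wire)
```

Without the length, the reader would have no way to tell where "hello" ends and the next status line begins, short of waiting for the connection to close.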

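The last improvement is the "pipelining" mode. It still uses keep-alive, but the client does not wait for the first response before sending the second request. This is useful for fetching the large number of images that make up a page: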

[CON] [REQ1] [REQ2] ... [RESP1] [RESP2] [CLO] ...

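The performance benefit is obvious, because the network latency between one request and the next is eliminated. Many HTTP agents do not correctly support pipelining, since HTTP offers no way to associate a response with its request. For this reason, the server must send its responses in exactly the order in which the requests were received.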
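Because responses carry no identifier, a pipelining client can only pair them with requests by order of arrival. A minimal sketch of that bookkeeping, using a FIFO queue (the class PipelinedClient and its methods are hypothetical names chosen for this example):

```python
from collections import deque


class PipelinedClient:
    """Pairs responses with requests purely by order (hypothetical helper)."""

    def __init__(self):
        self.pending = deque()  # requests sent but not yet answered

    def send(self, request):
        # Record the request; with pipelining, several can be outstanding.
        self.pending.append(request)

    def receive(self, response):
        # The oldest outstanding request is the one being answered.
        request = self.pending.popleft()
        return request, response


client = PipelinedClient()
client.send("REQ1")
client.send("REQ2")
pair1 = client.receive("RESP1")
pair2 = client.receive("RESP2")
```

If a server answered out of order, this pairing would silently attach the wrong response to each request, which is why strict ordering is mandatory.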

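By default, HAProxy operates in a "tunnel-like" mode with regard to persistent connections: for each connection, it processes the first request and then forwards everything that follows (including additional requests) to the selected server. Once established, the connection is persistent on both the client and the server side.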

With "option http-server-close", connections stay persistent on the client side while every incoming request is processed individually and dispatched to the backend servers; the server side works in "HTTP close" mode.

With "option httpclose", both the client side and the server side work in "HTTP close" mode.

If a server misbehaves in "HTTP close" mode, "option forceclose" or "option http-pretend-keepalive" may help work around the problem.

1.2. HTTP request


First, let's consider this HTTP request:

  Line     Contents
  number
     1     GET /serv/login.php?lang=en&profile=2 HTTP/1.1
     2     Host: www.mydomain.com
     3     User-agent: my small browser
     4     Accept: image/jpeg, image/gif
     5     Accept: image/png

1.2.1. The Request line


Line 1 is the "request line". It is always composed of 3 fields, which are delimited by LWS, usually spaces:

  • a METHOD : GET
  • a URI : /serv/login.php?lang=en&profile=2
  • a version tag : HTTP/1.1

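This structure is easy to parse, and HAProxy parses it by itself, so there is no need to write complex regular expressions to extract the individual fields.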

Note: LWS (linear white spaces) are commonly spaces, but can also be tabs or line feeds/carriage returns followed by spaces/tabs.
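To make the three-field structure concrete, here is a small sketch that splits a request line on whitespace. The function parse_request_line is a hypothetical helper for this example and says nothing about how HAProxy implements its own parser; Python's str.split() with no argument treats any run of spaces or tabs as one separator, which is a convenient approximation of LWS.

```python
def parse_request_line(line):
    """Split a request line into (method, uri, version). Hypothetical helper."""
    fields = line.split()  # any run of spaces/tabs counts as one separator
    if len(fields) != 3:
        raise ValueError("a request line must have exactly 3 fields")
    method, uri, version = fields
    return method, uri, version


method, uri, version = parse_request_line(
    "GET /serv/login.php?lang=en&profile=2 HTTP/1.1"
)
```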

The URI itself can take several forms:

  • a "relative URI" :

    /serv/login.php?lang=en&profile=2

    This is a complete URL without the host part. It is generally what servers, reverse proxies and transparent proxies receive.

  • an "absolute URI", also called a "URL" :

    http://192.168.0.12:8080/serv/login.php?lang=en&profile=2

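    It is composed of:
      scheme:        the protocol name followed by '://'
      host:          a host name or IP address
      port:          optional, written as ":PORT"
      relative URI:  starts at the first '/' after the address part

    This is generally what proxies receive, but a server supporting HTTP/1.1 must accept this form too.

  • a star ('*') :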

    This form is only accepted in association with the OPTIONS method and is not relayable. It is used to query the capabilities of the next hop.

  • an address:port combination : 192.168.0.12:80

    This form is used with the CONNECT method to establish TCP tunnels through HTTP proxies, generally for HTTPS, but sometimes for other protocols too.
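The four forms above can be told apart with a few string checks. The sketch below is only an illustration: uri_form is a hypothetical name, the labels it returns are made up for this example, and the checks are far looser than real URI validation.

```python
def uri_form(method, uri):
    """Return a label for the form of a request URI (hypothetical helper)."""
    if uri == "*":
        return "star"  # only meaningful together with OPTIONS
    if method == "CONNECT":
        return "address:port"  # e.g. 192.168.0.12:80
    if uri.startswith("/"):
        return "relative"  # checked first: a query string may contain "://"
    if "://" in uri:
        return "absolute"  # scheme://host[:port]/relative-uri
    return "unknown"


forms = [
    uri_form("GET", "/serv/login.php?lang=en&profile=2"),
    uri_form("GET", "http://192.168.0.12:8080/serv/login.php?lang=en&profile=2"),
    uri_form("OPTIONS", "*"),
    uri_form("CONNECT", "192.168.0.12:80"),
]
```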

The relative URI /serv/login.php?lang=en&profile=2 has two sub-parts.

/serv/login.php, the part before the question mark, is the "path". It is the relative path of a file on the server.

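lang=en&profile=2, the part after the question mark, is the "query string". It is mostly used with GET requests sent to dynamic scripts, and its meaning is specific to the language, framework or application in use.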
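Splitting the example URI at the question mark can be sketched as follows. The helper split_relative_uri is a hypothetical name; parse_qs comes from Python's standard urllib.parse module and is used here only to show one common way of interpreting a query string.

```python
from urllib.parse import parse_qs


def split_relative_uri(uri):
    """Split a relative URI into (path, query string) at the first '?'."""
    path, _, query = uri.partition("?")
    return path, query


path, query = split_relative_uri("/serv/login.php?lang=en&profile=2")
params = parse_qs(query)  # each name maps to a list of values
```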

1.2.2. The request headers

The headers start at the second line. They are composed of a name at the
beginning of the line, immediately followed by a colon (':'). Traditionally,
an LWS is added after the colon but that's not required. Then come the values.
Multiple identical headers may be folded into one single line, delimiting the
values with commas, provided that their order is respected. This is commonly
encountered in the "Cookie:" field. A header may span over multiple lines if
the subsequent lines begin with an LWS. In the example in 1.2, lines 4 and 5
define a total of 3 values for the "Accept:" header.

In the example in 1.2, the headers run from line 2 to the empty line, each in the form header_name: value.

 2     Host: www.mydomain.com
 3     User-agent: my small browser
 4     Accept: image/jpeg, image/gif
 5     Accept: image/png
 <empty line>

Lines 4 and 5 can be folded into a single line:

Accept: image/jpeg, image/gif, image/png

Contrary to a common mis-conception, header names are not case-sensitive, and
their values are not either if they refer to other header names (such as the
"Connection:" header).

In other words, header names are case-insensitive.
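The folding of identical headers and the case-insensitivity of their names can be sketched together. The function merge_headers below is a hypothetical helper for illustration only: it lowercases names to compare them and joins repeated values with commas in the order they appear.

```python
def merge_headers(lines):
    """Merge repeated headers into one comma-separated value per name."""
    merged = {}
    for line in lines:
        name, _, value = line.partition(":")
        key = name.strip().lower()  # header names are not case-sensitive
        value = value.strip()
        if key in merged:
            merged[key] = merged[key] + ", " + value  # keep original order
        else:
            merged[key] = value
    return merged


headers = merge_headers([
    "Host: www.mydomain.com",
    "User-agent: my small browser",
    "Accept: image/jpeg, image/gif",
    "ACCEPT: image/png",
])
```

Here the fourth line is written in upper case on purpose: it still lands in the same entry as the third one.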

The end of the headers is indicated by the first empty line. People often say
that it's a double line feed, which is not exact, even if a double line feed
is one valid form of empty line.

The headers end at the first empty line. A double line feed (LF LF) is one valid form of empty line, but not the only one.
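One way to find the end of the headers without assuming a double line feed is to accept an optional carriage return before each line feed. The sketch below is an illustration under that assumption; split_head_and_body and END_OF_HEADERS are hypothetical names.

```python
import re

# An empty line: a line break immediately followed by another line break.
# Each break may be LF alone or CR LF.
END_OF_HEADERS = re.compile(rb"\r?\n\r?\n")


def split_head_and_body(message):
    """Return (head, rest) split at the first empty line, or None."""
    match = END_OF_HEADERS.search(message)
    if match is None:
        return None  # headers are not complete yet
    return message[:match.start()], message[match.end():]


crlf = split_head_and_body(b"GET / HTTP/1.1\r\nHost: a\r\n\r\nbody")
lf = split_head_and_body(b"GET / HTTP/1.0\nHost: a\n\nbody")
partial = split_head_and_body(b"GET / HTTP/1.1\r\nHost: a\r\n")
```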

Fortunately, HAProxy takes care of all these complex combinations when indexing
headers, checking values and counting them, so there is no reason to worry
about the way they could be written, but it is important not to accuse an
application of being buggy if it does unusual, valid things.

In short, HAProxy parses all of these forms correctly.

Important note:
As suggested by RFC2616, HAProxy normalizes headers by replacing line breaks
in the middle of headers by LWS in order to join multi-line headers. This
is necessary for proper analysis and helps less capable HTTP parsers to work
correctly and not to be fooled by such complex constructs.
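The effect of this normalization can be approximated with a single substitution: a line break followed by spaces or tabs marks a continuation line and is replaced by one space. This is a simplified sketch, not HAProxy's implementation; the function name unfold is an assumption made for the example.

```python
import re


def unfold(header_block):
    """Join multi-line headers by replacing each fold with one space."""
    # A fold is a line break followed by at least one space or tab.
    return re.sub(r"\r?\n[ \t]+", " ", header_block)


folded = "Accept: image/jpeg,\r\n   image/gif\r\nHost: www.mydomain.com"
unfolded = unfold(folded)
```

The line break before "Host:" is left alone because the next line does not start with a space or a tab, so it begins a new header.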

1.3. HTTP response

Here is an HTTP response:

  Line     Contents
  number
     1     HTTP/1.1 200 OK
     2     Content-length: 350
     3     Content-Type: text/html

As a special case, HTTP supports so called "Informational responses" as status
codes 1xx. These messages are special in that they don't convey any part of the
response, they're just used as sort of a signaling message to ask a client to
continue to post its request for instance. In the case of a status 100 response
the requested information will be carried by the next non-100 response message
following the informational one. This implies that multiple responses may be
sent to a single request, and that this only works when keep-alive is enabled
(1xx messages are HTTP/1.1 only). HAProxy handles these messages and is able to
correctly forward and skip them, and only process the next non-100 response. As
such, these messages are neither logged nor transformed, unless explicitly
stated otherwise. Status 101 messages indicate that the protocol is changing
over the same connection and that haproxy must switch to tunnel mode, just as
if a CONNECT had occurred. Then the Upgrade header would contain additional
information about the type of protocol the connection is switching to.
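The handling described above can be summarized as a small decision loop over the status codes received for one request. The sketch is only a reading of the paragraph, not HAProxy's code; final_response and the "tunnel" / "final" labels are hypothetical.

```python
def final_response(status_codes):
    """Pick the status that ends the wait for a response (hypothetical helper)."""
    for code in status_codes:
        if code == 101:
            return ("tunnel", code)  # protocol switch: behave like CONNECT
        if 100 <= code < 200:
            continue  # other informational messages are skipped
        return ("final", code)  # first non-1xx message is the real response
    return None  # nothing but informational messages so far


after_continue = final_response([100, 200])
after_upgrade = final_response([101])
still_waiting = final_response([100])
```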

1.3.1. The Response line

Line 1 is the "response line". It is always composed of 3 fields :

  • a version tag : HTTP/1.1
  • a status code : 200
  • a reason : OK

The status code is always 3-digit. The first digit indicates a general status :

  • 1xx = informational message to be skipped (eg: 100, 101)
  • 2xx = OK, content is following (eg: 200, 206)
  • 3xx = OK, no content following (eg: 302, 304)
  • 4xx = error caused by the client (eg: 401, 403, 404)
  • 5xx = error caused by the server (eg: 500, 502, 503)

Please refer to RFC2616 for the detailed meaning of all such codes. The
"reason" field is just a hint, but is not parsed by clients. Anything can be
found there, but it's a common practice to respect the well-established
messages. It can be composed of one or multiple words, such as "OK", "Found",
or "Authentication Required".
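As an illustration of the response line layout, the sketch below splits it into its three fields and maps the first digit of the status code to the general classes listed above. The names parse_response_line and STATUS_CLASSES are assumptions for this example; the reason is kept as free text since it may contain several words.

```python
STATUS_CLASSES = {
    "1": "informational message to be skipped",
    "2": "OK, content is following",
    "3": "OK, no content following",
    "4": "error caused by the client",
    "5": "error caused by the server",
}


def parse_response_line(line):
    """Return (version, code, reason, general status). Hypothetical helper."""
    parts = line.split(None, 2)  # split twice at most: the reason may have spaces
    if len(parts) < 2:
        raise ValueError("a response line needs a version and a status code")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) > 2 else ""
    if len(code) != 3 or not code.isdigit():
        raise ValueError("the status code is always 3-digit")
    return version, int(code), reason, STATUS_CLASSES.get(code[0], "unknown")


ok = parse_response_line("HTTP/1.1 200 OK")
auth = parse_response_line("HTTP/1.1 401 Authentication Required")
```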

HAProxy may emit the following status codes by itself:

  Code  When / reason
  200   access to stats page, and when replying to monitoring requests
  301   when performing a redirection, depending on the configured code
  302   when performing a redirection, depending on the configured code
  303   when performing a redirection, depending on the configured code
  307   when performing a redirection, depending on the configured code
  308   when performing a redirection, depending on the configured code
  400   for an invalid or too large request
  401   when an authentication is required to perform the action (when
        accessing the stats page)
  403   when a request is forbidden by a "block" ACL or "reqdeny" filter
  408   when the request timeout strikes before the request is complete
  500   when haproxy encounters an unrecoverable internal error, such as a
        memory allocation failure, which should never happen
  502   when the server returns an empty, invalid or incomplete response, or
        when an "rspdeny" filter blocks the response.
  503   when no server was available to handle the request, or in response to
        monitoring requests which match the "monitor fail" condition
  504   when the response timeout strikes before the server responds

The 4xx and 5xx error codes above may be customized (see "errorloc" in section
4.2).

1.3.2. The response headers

Response headers work exactly like request headers, and as such, HAProxy uses
the same parsing function for both. Please refer to paragraph 1.2.2 for more
details.
