Contents
1. Introduction
2. Protocol Parameters
3. HTTP Message
4. Request
5. Response
6. How HTTP Method and Content-Type Affect HTTP Packet Parsing
1. Introduction
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. HTTP has been in use by the World-Wide Web global information initiative since 1990.
1. The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet.
2. HTTP/1.0, as defined by RFC 1945, improved the protocol by allowing messages to be in the format of MIME-like messages, containing metainformation about the data transferred and modifiers on the request/response semantics. However, HTTP/1.0 does not sufficiently take into consideration
    1) the effects of hierarchical proxies
    2) caching
    3) the need for persistent connections
    4) virtual hosts
   In addition, the proliferation of incompletely-implemented applications calling themselves "HTTP/1.0" has necessitated a protocol version change in order for two communicating applications to determine each other's true capabilities.
3. HTTP/1.1 includes more stringent requirements than HTTP/1.0 in order to ensure reliable implementation of its features.
Practical information systems require more functionality than simple retrieval, including search, front-end update, and annotation. HTTP allows an open-ended set of methods and headers that indicate the purpose of a request. It builds on the discipline of reference provided by the Uniform Resource Identifier (URI), as a location (URL) or name (URN), for indicating the resource to which a method is to be applied. Messages are passed in a format similar to that used by Internet mail as defined by the Multipurpose Internet Mail Extensions (MIME).
HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the SMTP, NNTP, FTP, Gopher, and WAIS protocols. In this way, HTTP allows basic hypermedia access to resources available from diverse applications.
0x1: Overall Operation
The HTTP protocol is a request/response protocol.
//Client-Server
A client sends a request to the server in the form of
    1) a request method
    2) URI
    3) protocol version
followed by a MIME-like message containing
    1) request modifiers
    2) client information
    3) possible body content
over a connection with a server.
//Server-Client
The server responds with a status line, including
    1) the message's protocol version
    2) a success or error code
followed by a MIME-like message containing
    1) server information
    2) entity metainformation
    3) possible entity-body content.
Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection between the user agent (UA) and the origin server (O).
Relevant Link:
https://www.ietf.org/rfc/rfc2616.txt
2. Protocol Parameters
0x1: HTTP Version
HTTP uses a "<major>.<minor>" numbering scheme to indicate versions of the protocol. The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further HTTP communication, rather than the features obtained via that communication.
Proxy and gateway applications need to be careful when forwarding messages in protocol versions different from that of the application.
Since the protocol version indicates the protocol capability of the sender, a proxy/gateway MUST NOT send a message with a version indicator which is greater than its actual version. If a higher version request is received, the proxy/gateway MUST either downgrade the request version, or respond with an error, or switch to tunnel behavior.
0x2: Uniform Resource Identifiers
URIs have been known by many names:
1. WWW addresses
2. Universal Document Identifiers
3. Universal Resource Identifiers
4. Uniform Resource Locators (URL)
5. Names (URN)
As far as HTTP is concerned, Uniform Resource Identifiers are simply formatted strings which identify--via name, location, or any other characteristic--a resource.
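In other words, the HTTP URL format is a loose specification: how a URL is interpreted depends largely on the implementation logic of the backend web container.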
1. http URL
The "http" scheme is used to locate network resources via the HTTP protocol
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
2. URI Comparison
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
1. A port that is empty or not given is equivalent to the default port (80) for that URI-reference;
2. Comparisons of host names MUST be case-insensitive;
3. Comparisons of scheme names MUST be case-insensitive;
4. An empty abs_path is equivalent to an abs_path of "/".
5. Characters other than those in the "reserved" and "unsafe" sets are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
    1) http://abc.com:80/~smith/home.html
    2) http://ABC.com/%7Esmith/home.html
    3) http://ABC.com:/%7esmith/home.html
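The comparison rules above can be sketched in Python. This is a minimal sketch rather than a complete implementation: `normalize_http_uri` and `uris_equivalent` are hypothetical helper names, and the set of characters that may be safely decoded is assumed to be the RFC 3986 "unreserved" set, which is not the exact definition used by RFC 2616.

```python
import re
from urllib.parse import urlsplit

# Assumption: characters that are safe to decode (RFC 3986 "unreserved" set).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _decode_unreserved(path: str) -> str:
    """Decode %HH escapes of unreserved characters; upper-case the others."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def normalize_http_uri(uri: str):
    """Return a tuple that is equal for equivalent http URIs."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()            # scheme is case-insensitive
    host = (parts.hostname or "").lower()    # host is case-insensitive
    port = parts.port or 80                  # empty or missing port means 80
    path = _decode_unreserved(parts.path) or "/"  # empty abs_path means "/"
    return (scheme, host, port, path, parts.query)


def uris_equivalent(a: str, b: str) -> bool:
    return normalize_http_uri(a) == normalize_http_uri(b)
```

A WAF that compares URLs octet-by-octet without a normalization step of this kind would treat the three example URIs as different resources, while the web container treats them as the same one.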
0x3: Date/Time Formats
HTTP applications have historically allowed three different formats for the representation of date/time stamps:
1. Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
2. Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
3. Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format
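A receiver that wants to accept all three formats can simply try them in turn. The following is a minimal sketch; `parse_http_date` is a hypothetical helper name, and the code assumes the process runs with the default C locale so that English day and month names are recognized.

```python
from datetime import datetime, timezone

# The three historical formats, in the order listed above.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",   # RFC 850
    "%a %b %d %H:%M:%S %Y",        # ANSI C asctime()
)


def parse_http_date(value: str) -> datetime:
    """Parse an HTTP date in any of the three formats; result is in UTC."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue  # not this format, try the next one
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unrecognized HTTP date: {value!r}")
```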
0x4: Character Sets
HTTP uses the same definition of the term "character set" as that described for MIME: The term "character set" is used in this document to refer to a method used with one or more tables to convert a sequence of octets into a sequence of characters
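Note: This use of the term "character set" is more commonly referred to as a "character encoding." However, since HTTP and MIME share the same registry, it is important that the terminology also be shared.
In other words, within the MIME framework an HTTP request packet can undergo arbitrary "encoding/format transformations" in the broad sense.
//One major cause of WAF bypass is that HTTP requires the sender and receiver to understand the various transformation encodings of the MIME format. An attacker can craft an HTTP request packet that is specially encoded yet still understood by the web container; if the WAF cannot understand it, or misinterprets it, a bypass results.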
HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry
Although HTTP allows an arbitrary token to be used as a charset value, any token that has a predefined value within the IANA Character Set registry MUST represent the character set defined by that registry.
Applications SHOULD limit their use of character sets to those defined by the IANA registry. Implementors should be aware of IETF character set requirements
1. Missing Charset
Some HTTP/1.0 software has interpreted a Content-Type header without charset parameter incorrectly to mean "recipient should guess." Senders wishing to defeat this behavior MAY include a charset parameter even when the charset is ISO-8859-1 and SHOULD do so when it is known that it will not confuse the recipient.
Unfortunately, some older HTTP/1.0 clients did not deal properly with an explicit charset parameter. HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset, rather than the recipient's preference, when initially displaying a document.
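Another major cause of WAF bypass is the need to stay compatible with older HTTP versions (0.9, 1.0). This lets an attacker craft "special encodings" or "HTTP packets" that the web container must accept for compatibility; if the WAF cannot understand them, or misinterprets them, a bypass results.
Content-Type reference table
1. .* (binary stream, unknown file type): application/octet-stream
2. .htm: text/html
3. .html: text/html
4. .gif: image/gif
..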
Relevant Link:
http://www.iana.org/assignments/character-sets/character-sets.xhtml
http://tool.oschina.net/commons
0x5: Content Codings (client: Accept-Encoding, server: Content-Encoding)
Content coding values indicate an encoding transformation that has been or can be applied to an entity. Content codings are primarily used to allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information.
To avoid losing data in transit, the sender and receiver agree on an encoding (a transformation, typically compression) that is used to transfer the data returned by the server.
1. gzip: "gzip" (GNU zip) as described in RFC 1952. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.
2. compress: The encoding format produced by the common UNIX file compression program "compress". This format is an adaptive Lempel-Ziv-Welch coding (LZW).
3. deflate: The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951.
4. identity: The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.
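An inspection component has to undo the content coding before it can look at the entity. The sketch below shows one way to do that with the Python standard library; `decode_content` is a hypothetical helper name. The "compress" (LZW) coding is deliberately left unsupported because the standard library has no decoder for it.

```python
import gzip
import zlib


def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Undo a single content coding named in a Content-Encoding header."""
    coding = content_encoding.strip().lower()
    if coding in ("", "identity"):
        return body                      # no transformation applied
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)     # RFC 1952 container
    if coding == "deflate":
        return zlib.decompress(body)     # RFC 1950 zlib wrapper around RFC 1951
    # "compress" and unknown codings are rejected in this sketch.
    raise ValueError(f"unsupported content coding: {content_encoding!r}")
```

If a WAF skips this step while the web container performs it, a payload that is harmless-looking in its compressed form reaches the application in decoded form.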
0x6: Transfer Codings
0x7: Media Types
HTTP uses Internet Media Types in the Content-Type and Accept header fields in order to provide open and extensible data typing and type negotiation.
media-type = type "/" subtype *( ";" parameter )
type       = token
subtype    = token
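The grammar above maps onto a small parser. This is a minimal sketch: `parse_media_type` is a hypothetical helper name, and the code assumes that no quoted parameter value contains a ";" character, which a complete parser would have to handle.

```python
def parse_media_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters)."""
    pieces = [piece.strip() for piece in value.split(";")]
    type_, _, subtype = pieces[0].partition("/")
    params = {}
    for piece in pieces[1:]:
        if not piece:
            continue  # tolerate a trailing ";"
        name, _, raw = piece.partition("=")
        # Parameter names are case-insensitive; values may be quoted.
        params[name.strip().lower()] = raw.strip().strip('"')
    # Type and subtype are case-insensitive as well.
    return type_.strip().lower(), subtype.strip().lower(), params
```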
1. Canonicalization and Text Defaults
When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body.
HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP. In addition, if the text is represented in a character set that does not use octets 13 and 10 for CR and LF respectively, as is the case for some multi-byte character sets, HTTP allows the use of whatever octet sequences are defined by that character set to represent the equivalent of CR and LF for line breaks
2. Multipart Types
MIME provides for a number of "multipart" types -- encapsulations of one or more entities within a single message-body. All multipart types share a common syntax, as defined in RFC 2046, and MUST include a boundary parameter as part of the media type value.
The message body is itself a protocol element and MUST therefore use only CRLF to represent line breaks between body-parts.
Unlike in RFC 2046, the epilogue of any multipart message MUST be empty; HTTP applications MUST NOT transmit the epilogue (even if the original multipart contains an epilogue). These restrictions exist in order to preserve the self-delimiting nature of a multipart message-body, wherein the "end" of the message-body is indicated by the ending multipart boundary.
When a web container processes multipart/form-data, the only way it can determine the end of the body is by detecting the closing multipart boundary in the HTTP packet.
In general, HTTP treats a multipart message-body no differently than any other media type: strictly as payload. The one exception is the "multipart/byteranges" type when it appears in a 206 (Partial Content) response, which will be interpreted by some HTTP caching mechanisms. In all other cases, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type.
The MIME header fields within each body-part of a multipart message-body do not have any significance to HTTP beyond that defined by their MIME semantics.
In general, an HTTP user agent SHOULD follow the same or similar behavior as a MIME user agent would upon receipt of a multipart type. If an application receives an unrecognized multipart subtype, the application MUST treat it as being equivalent to "multipart/mixed".
Note: The "multipart/form-data" type has been specifically defined for carrying form data suitable for processing via the POST request method, as described in RFC 1867
0x8: Product Tokens (client: User-Agent, server: Server)
Product tokens are used to allow communicating applications to identify themselves by software name and version.
User-Agent: CERN-LineMode/2.15 libwww/2.17b3
Server: Apache/0.8.4
0x9: Quality Values
0x10: Language Tags (client: Accept-Language, server: Content-Language)
0x11: Entity Tags
Entity tags are used for comparing two or more entities from the same requested resource. HTTP/1.1 uses entity tags in the following header fields:
1. ETag
2. If-Match
3. If-None-Match
4. If-Range
The definition of how they are used and compared as cache validators is given in RFC 2616.
0x12: Range Units
HTTP/1.1 allows a client to request that only part (a range of) the response entity be included within the response. HTTP/1.1 uses range units in the Range and Content-Range header fields. An entity can be broken down into subranges according to various structural units.
range-unit       = bytes-unit | other-range-unit
bytes-unit       = "bytes"
other-range-unit = token
The only range unit defined by HTTP/1.1 is "bytes". HTTP/1.1 implementations MAY ignore ranges specified using other units.
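As an illustration of how a server might apply this rule, the sketch below resolves a Range header value against an entity of known length and ignores any unit other than "bytes". `parse_byte_ranges` is a hypothetical helper name, and the byte-range syntax (first-last, first-, -suffix) is assumed from the Range header definition in RFC 2616, which this section does not reproduce.

```python
def parse_byte_ranges(header_value: str, entity_length: int):
    """Return a list of inclusive (start, end) offsets, or None if ignored."""
    unit, _, spec = header_value.partition("=")
    if unit.strip().lower() != "bytes":
        return None  # implementations MAY ignore other range units
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form "-N": the last N bytes of the entity.
            start = max(entity_length - int(last), 0)
            end = entity_length - 1
        else:
            start = int(first)
            end = int(last) if last else entity_length - 1
            end = min(end, entity_length - 1)
        if start <= end:
            ranges.append((start, end))  # unsatisfiable ranges are dropped
    return ranges
```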
Relevant Link:
https://www.ietf.org/rfc/rfc2616.txt
3. HTTP Message
0x1: Message Types
HTTP messages consist of requests from client to server and responses from server to client.
HTTP-message = Request | Response ; HTTP/1.1 messages
Request and Response messages use the generic message format of RFC 822 for transferring entities (the payload of the message). Both types of message consist of
1. a start-line: Request-Line | Status-Line
2. zero or more header fields (also known as "headers")
3. an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
4. possibly a message-body.
In the interest of robustness, servers SHOULD ignore any empty line(s) received where a Request-Line is expected. In other words, if the server is reading the protocol stream at the beginning of a message and receives a CRLF first, it should ignore the CRLF.
The web server ignores empty lines at the start of the packet until it reads the HTTP Request-Line.
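The generic message layout, together with the rule about leading empty lines, can be sketched as follows. `split_http_message` is a hypothetical helper name; the sketch assumes the whole message is already in memory and does not handle header values folded across several lines.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # Robustness rule: ignore empty lines received before the start-line.
    while raw.startswith(b"\r\n"):
        raw = raw[2:]
    # The first empty line separates the header fields from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name, value.strip(" \t")))
    return start_line, headers, body
```

A WAF that expects the Request-Line in the very first bytes of the stream, and gives up when it sees a CRLF there instead, would miss a request that the web server still accepts.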
0x2: Message Headers
HTTP header fields include:
1. general-header
2. request-header
3. response-header
4. entity-header
Each header field consists of a name followed by a colon (":") and the field value.
1. Field names are case-insensitive.
2. The field value MAY be preceded by any amount of LWS, though a single SP is preferred.
Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT. Applications ought to follow "common form", where one is known or indicated, when generating HTTP constructs, since there might exist some implementations that fail to accept anything beyond the common forms.
It is precisely because an HTTP header (key: value) allows the value to span multiple lines that the PHP multipart/form-data remote DoS vulnerability became possible.
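Header folding can be undone in a single pass over the header lines. The sketch below is one way to do it; `unfold_headers` is a hypothetical helper name, and continuation lines are joined to the previous value with a single space. Appending to the last field in place keeps the work linear in the number of lines, which matters when a hostile request contains very many continuation lines.

```python
def unfold_headers(header_block: str):
    """Parse CRLF-separated header lines, joining folded continuation lines."""
    fields = []
    for line in header_block.split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and fields:
            # Continuation line: it starts with SP or HT and extends
            # the value of the previous header field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip(" \t"))
        else:
            name, _, value = line.partition(":")
            # Field names are case-insensitive, so normalize to lower case.
            fields.append((name.strip().lower(), value.strip(" \t")))
    return fields
```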
0x3: Message Body
The message-body (if any) of an HTTP message is used to carry the entity-body associated with the request or response. The message-body differs from the entity-body only when a transfer-coding has been applied, as indicated by the Transfer-Encoding header field
message-body = entity-body | <entity-body encoded as per Transfer-Encoding>
Transfer-Encoding MUST be used to indicate any transfer-codings applied by an application to ensure safe and proper transfer of the message. Transfer-Encoding is a property of the message, not of the entity, and thus MAY be added or removed by any application along the request/response chain.
0x4: Message Length
The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any transfer-codings have been applied.
0x5: General Header Fields
There are a few header fields which have general applicability for both request and response messages, but which do not apply to the entity being transferred.
general-header = Cache-Control | Connection | Date | Pragma | Trailer | Transfer-Encoding | Upgrade | Via | Warning
General-header field names can be extended reliably only in combination with a change in the protocol version. However, new or experimental header fields may be given the semantics of general header fields if all parties in the communication recognize them to be general-header fields. Unrecognized header fields are treated as entity-header fields.
Relevant Link:
http://www.cnblogs.com/LittleHann/p/5044140.html
4. Request
0x1: Request-Line
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
1. Method
The Method token indicates the method to be performed on the resource identified by the Request-URI. The method is case-sensitive.
Method = "OPTIONS" | "GET" | "HEAD" | "POST" | "PUT" | "DELETE" | "TRACE" | "CONNECT" | extension-method
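When handling non-standard methods such as extension-method, web containers tend to be more fault-tolerant, which in turn creates the possibility of a bypass.
Some Apache versions extract the GET (query string) parameters regardless of the method value. If a WAF extracts data strictly according to the method (GET, POST, and so on), Apache's lenient request handling can lead to a bypass.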
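A parser for the Request-Line that flags extension methods instead of rejecting them might look like the sketch below. `parse_request_line` and the `is_extension` flag are hypothetical names. The point of the design is that the Request-URI is returned whatever the method is, so a caller can still inspect the query string of a request with an unknown method.

```python
STANDARD_METHODS = {
    "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT",
}


def parse_request_line(line: str) -> dict:
    """Parse 'Method SP Request-URI SP HTTP-Version CRLF'."""
    if line.endswith("\r\n"):
        line = line[:-2]
    parts = line.split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed Request-Line: {line!r}")
    method, uri, version = parts
    return {
        "method": method,
        "uri": uri,
        "version": version,
        # The method is case-sensitive, so "get" counts as an extension method.
        "is_extension": method not in STANDARD_METHODS,
    }
```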
2. Request-URI
The Request-URI is a Uniform Resource Identifier and identifies the resource upon which to apply the request.
Request-URI = "*" | absoluteURI | abs_path | authority
The four options for Request-URI are dependent on the nature of the request.
1. The asterisk "*" means that the request does not apply to a particular resource, but to the server itself, and is only allowed when the method used does not necessarily apply to a resource.
    //OPTIONS * HTTP/1.1
2. The absoluteURI form is REQUIRED when the request is being made to a proxy. The proxy is requested to forward the request or service it from a valid cache, and return the response. Note that the proxy MAY forward the request on to another proxy or directly to the server.
    //GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
3. The authority form is only used by the CONNECT method.
4. The most common form of Request-URI is that used to identify a resource on an origin server or gateway. In this case the absolute path of the URI MUST be transmitted as the Request-URI, and the network location of the URI (authority) MUST be transmitted in a Host header field. For example, a client wishing to retrieve the resource above directly from the origin server would create a TCP connection to port 80 of the host "www.w3.org" and send the lines:
    /*
    GET /pub/WWW/TheProject.html HTTP/1.1
    Host: www.w3.org
    */
If the Request-URI is encoded using the "% HEX HEX" encoding, the origin server MUST decode the Request-URI in order to properly interpret the request. Servers SHOULD respond to invalid Request-URIs with an appropriate status code.
0x2: The Resource Identified by a Request
0x3: Request Header Fields
The request-header fields allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers, with semantics equivalent to the parameters on a programming language method invocation.
request-header = Accept | Accept-Charset | Accept-Encoding | Accept-Language | Authorization | Expect | From | Host | If-Match | If-Modified-Since | If-None-Match | If-Range | If-Unmodified-Since | Max-Forwards | Proxy-Authorization | Range | Referer | TE | User-Agent
Relevant Link:
https://www.ietf.org/rfc/rfc2616.txt
5. Response
After receiving and interpreting a request message, a server responds with an HTTP response message.
Response = Status-Line
           *(( general-header
            | response-header
            | entity-header ) CRLF)
           CRLF
           [ message-body ]
0x1: Status-Line
The first line of a Response message is the Status-Line, consisting of the protocol version followed by a numeric status code and its associated textual phrase, with each element separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
1. Status Code and Reason Phrase
The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.
The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:
- 1xx: Informational - Request received, continuing process
- 2xx: Success - The action was successfully received, understood, and accepted
- 3xx: Redirection - Further action must be taken in order to complete the request
- 4xx: Client Error - The request contains bad syntax or cannot be fulfilled
- 5xx: Server Error - The server failed to fulfill an apparently valid request
The individual values of the numeric status codes defined for HTTP/1.1, and an example set of corresponding Reason-Phrase's, are presented below. The reason phrases listed here are only recommendations -- they MAY be replaced by local equivalents without affecting the protocol.
Status-Code =
      "100" : Continue
    | "101" : Switching Protocols
    | "200" : OK
    | "201" : Created
    | "202" : Accepted
    | "203" : Non-Authoritative Information
    | "204" : No Content
    | "205" : Reset Content
    | "206" : Partial Content
    | "300" : Multiple Choices
    | "301" : Moved Permanently
    | "302" : Found
    | "303" : See Other
    | "304" : Not Modified
    | "305" : Use Proxy
    | "307" : Temporary Redirect
    | "400" : Bad Request
    | "401" : Unauthorized
    | "402" : Payment Required
    | "403" : Forbidden
    | "404" : Not Found
    | "405" : Method Not Allowed
    | "406" : Not Acceptable
    | "407" : Proxy Authentication Required
    | "408" : Request Time-out
    | "409" : Conflict
    | "410" : Gone
    | "411" : Length Required
    | "412" : Precondition Failed
    | "413" : Request Entity Too Large
    | "414" : Request-URI Too Large
    | "415" : Unsupported Media Type
    | "416" : Requested range not satisfiable
    | "417" : Expectation Failed
    | "500" : Internal Server Error
    | "501" : Not Implemented
    | "502" : Bad Gateway
    | "503" : Service Unavailable
    | "504" : Gateway Time-out
    | "505" : HTTP Version not supported
    | extension-code
HTTP status codes are extensible. HTTP applications are not required to understand the meaning of all registered status codes, though such understanding is obviously desirable. However, applications MUST understand the class of any status code, as indicated by the first digit, and treat any unrecognized response as being equivalent to the x00 status code of that class, with the exception that an unrecognized response MUST NOT be cached.
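The Status-Line grammar and the fallback rule for unrecognized codes can be sketched together. The function names below are hypothetical, and `KNOWN_STATUS_CODES` contains only the codes listed in this section.

```python
STATUS_CLASSES = {
    1: "Informational", 2: "Success", 3: "Redirection",
    4: "Client Error", 5: "Server Error",
}

# Codes defined for HTTP/1.1 in the table above.
KNOWN_STATUS_CODES = (
    {100, 101, 307}
    | set(range(200, 207)) | set(range(300, 306))
    | set(range(400, 418)) | set(range(500, 506))
)


def parse_status_line(line: str):
    """Parse 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF'."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"malformed Status-Line: {line!r}")
    code = parts[1]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"Status-Code must be 3 digits: {code!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(code), reason


def effective_status(code: int) -> int:
    """Treat an unrecognized code as the x00 code of its class."""
    return code if code in KNOWN_STATUS_CODES else code // 100 * 100


def status_class(code: int) -> str:
    return STATUS_CLASSES.get(code // 100, "Unknown")
```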
0x2: Response Header Fields
The response-header fields allow the server to pass additional information about the response which cannot be placed in the Status-Line. These header fields give information about the server and about further access to the resource identified by the Request-URI.
response-header = Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | WWW-Authenticate
Response-header field names can be extended reliably only in combination with a change in the protocol version. However, new or experimental header fields MAY be given the semantics of response-header fields if all parties in the communication recognize them to be response-header fields. Unrecognized header fields are treated as entity-header fields.
Relevant Link:
https://www.ietf.org/rfc/rfc2616.txt
6. How HTTP Method and Content-Type Affect HTTP Packet Parsing
0x0: PUT or POST
1. Create, Update, and HTTP Idempotence
Developers usually map each HTTP method one-to-one to a CRUD operation:
CRUD HTTP
Create POST
Read GET
Update PUT
Delete DELETE
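The operations for GET and DELETE are clear-cut, but which HTTP method corresponds to create and which to update depends on idempotence.
2. Idempotence
Idempotence is an important concept in the HTTP specification. It states that executing the same HTTP request multiple times leaves the resource on the server in the same state. GET, HEAD, PUT, and DELETE all have this property, but POST does not.
As Dino Chiesa puts it, PUT means submitting a resource: completely replacing everything accessible at the given URL with a different thing. To use a PUT request you must send all accessible properties/values, not just the ones you want to change. Idempotence is a fundamental property of the HTTP specification and must be upheld to ensure the interoperability and scalability of the web.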
3. HTTP POST vs HTTP PUT
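1. Create
    1) Use POST to create a resource when you do not know its identifier. When creating a resource with POST, it is good practice to return a "201 Created" status together with the location of the new resource, because that location is unknown at submission time. This lets the client access the newly created resource later if it needs to.
    2) Use PUT when you allow the client to specify the identifier of the new resource. Remember that because PUT is idempotent, you must send all possible values.
2. Update
    1) You can use POST to update all or only some of the values.
    2) If you want to update a resource with PUT, you must update all of its properties. You must send every property value in the PUT request to preserve idempotence.
You can also send all values with POST, so that the resulting server state is the same as after a PUT request, although the HTTP specification does not require this. Note that idempotence is closely tied to caching by HTTP cache servers, and POST requests are generally not cached. If the side effects of caching are a concern, you can use POST for full or partial updates.
POST is currently the only one of these methods that is not idempotent. The HTTP specification defines it very broadly, essentially as a "server-side processing directive". This means that it is "safe" to do any kind of processing in a POST request, in the sense that the specification permits it.
Traffic-based intrusion detection WAFs often run into performance problems when HTTP traffic volume is high. To address this, HTTP packets can be filtered by the Method and Content-Type fields of the request, so that packets of no interest are skipped without time-consuming deep parameter parsing.
A good approach is to classify HTTP methods into levels and apply a different depth of parameter parsing to each level.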
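0x1: Method = GET (Safe Methods), Content-Type = *
1. The web server receives and processes the GET parameters in the request URL
2. The web server ignores the request body (i.e. the POST parameters)
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent
0x2: Method = HEAD (Safe Methods), Content-Type = *
The HEAD method lets a client request only the response headers for a resource, without actually downloading the resource itself. The response headers returned by the server should be identical to those returned when the client requests the same resource with GET; compared with GET, only the response body is omitted.
1. The web server receives and processes the GET parameters in the request URL
2. The web server ignores the request body (i.e. the POST parameters)
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent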
0x3: Method = POST, Content-Type = application/x-www-form-urlencoded
POST /test/test.php?op=code HTTP/1.1
Host: localhost
Proxy-Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Cookie: admincp_language=zh; sYCB_2132_ulastactivity=f0edrXcuyr3SWGarcZ6cqkWc3Q7vEBwM3aoWG0%2BHG8kRjP5bb3le; sYCB_2132_nofavfid=1; sYCB_2132_smile=1D1; ECS[visit_times]=5
Content-Type: application/x-www-form-urlencoded;charset=utf-8

a=1&b=payload
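This is the most common way to submit data with POST. A native browser form that does not set the enctype attribute submits its data as application/x-www-form-urlencoded.
First, Content-Type is set to application/x-www-form-urlencoded; second, the submitted data is encoded as key1=val1&key2=val2, with both key and val URL-encoded. Most server-side languages support this format well. In PHP, for example, $_POST['title'] returns the value of title and $_POST['sub'] returns the sub array. This format is also frequently used when submitting data via Ajax. For example, in jQuery and QWrap Ajax the default Content-Type is "application/x-www-form-urlencoded;charset=utf-8".
1. The web server receives and processes the GET parameters in the request URL
2. The web server receives and processes the request body (i.e. the POST parameters)
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent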
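For this content type, both the query string and the body use the same key=value encoding, so one standard-library function decodes both. The sketch below mirrors the list above; `extract_params` is a hypothetical helper name, and cookie and header handling are left out.

```python
from urllib.parse import parse_qs, urlsplit


def extract_params(request_uri: str, body: str, content_type: str):
    """Return (query_params, post_params) for a form-urlencoded request."""
    # The query string is decoded regardless of the content type.
    query = parse_qs(urlsplit(request_uri).query, keep_blank_values=True)
    media_type = content_type.split(";", 1)[0].strip().lower()
    post = {}
    if media_type == "application/x-www-form-urlencoded":
        # The body uses the same key1=val1&key2=val2 encoding.
        post = parse_qs(body, keep_blank_values=True)
    return query, post
```

Applied to the request shown in this subsection, the function yields `op` from the URL and `a` and `b` from the body.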
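0x4: Method = POST, Content-Type = multipart/form-data
This is another common way to submit POST data. When uploading files through a form, the form's enctype must be set to multipart/form-data.
A boundary is generated to separate the fields; to avoid colliding with the body content, the boundary is long and complex. The Content-Type header then states that the data is encoded as multipart/form-data. The message body is divided into several similarly structured parts, one per field. Each part is laid out as follows:
1. It starts with --boundary
2. followed by the content description
    1) Content-Disposition
    2) Content-Type
3. then an empty line
4. and finally the field content (text or binary). If a file is being transferred, the file name and file type are also included
5. The message body as a whole ends with the --boundary-- marker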
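Effect on HTTP packet parsing
1. The web server receives and processes the GET parameters in the request URL
2. The web server receives and processes the request body. For a POST body sent as multipart/form-data, each POST parameter is contained in its own boundary-delimited "block"
    1) the parameter name is given by the name value in Content-Disposition
    2) the parameter value is the content of the block
    /*
    ------WebKitFormBoundary4zmcCcRJqhRDBsOq
    Content-Disposition: form-data; name="filename"

    hello
    ------WebKitFormBoundary4zmcCcRJqhRDBsOq
    Content-Disposition: form-data; name="number"

    123
    ------WebKitFormBoundary4zmcCcRJqhRDBsOq
    This is equivalent to sending two POST parameters, filename and number
    */
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server receives and processes the FILES parameters in the request body (the blocks whose Content-Disposition carries a filename)
5. The web server receives and processes HTTP header fields such as User-Agent
HTTP request packet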
POST /test/test.php?op=code HTTP/1.1
Host: localhost
Proxy-Connection: keep-alive
Content-Length: 1561
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: http://localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary4zmcCcRJqhRDBsOq
Referer: http://localhost/test/upload.html
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Cookie: admincp_language=zh; sYCB_2132_ulastactivity=f0edrXcuyr3SWGarcZ6cqkWc3Q7vEBwM3aoWG0%2BHG8kRjP5bb3le; sYCB_2132_nofavfid=1; sYCB_2132_smile=1D1; ECS[visit_times]=5

------WebKitFormBoundary4zmcCcRJqhRDBsOq
Content-Disposition: form-data; name="file"; filename="xorcipher.c"
Content-Type: text/plain

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>

void xorcipher(const unsigned char *key, FILE *infp, FILE *outfp)
{
    int ch;
    const unsigned char *keyp=key;
    while((ch=getc(infp))>=0)
    {
        putc(ch^*keyp++, outfp);
        if(!*keyp) keyp=key;
    }
}

int main(int argc, char *argv[])
{
    char *infname;
    char *outfname;
    const unsigned char *key;
    FILE *infp, *outfp;
    key = argv[3];
    if(argc>1) infname=argv[1];
    //else usage(argv[0]);
    if(!(infp=fopen(infname, "rb")))
    {
        fprintf(stderr, "ERROR: fopen(%s)\n", argv[1]);
        exit(1);
    }
    if(argc>2) outfname=argv[2];
    else
    {
        if(!(outfname=malloc(strlen(infname)+5)))
        {
            fprintf(stderr, "ERROR: malloc failed\n");
            exit(1);
        }
        strcpy(outfname, argv[1]);
        strcat(outfname, ".xor");
    }
    if(!(outfp=fopen(outfname, "wb")))
    {
        fprintf(stderr, "ERROR: fopen(%s)\n", outfname);
        exit(1);
    }
    //key= "this is naru catch me if you can";
    xorcipher(key, infp, outfp);
    fclose(outfp);
    fclose(infp);
    if(argc<4) free(outfname);
    return 0;
}
------WebKitFormBoundary4zmcCcRJqhRDBsOq
Content-Disposition: form-data; name="filename"

hello
------WebKitFormBoundary4zmcCcRJqhRDBsOq
Content-Disposition: form-data; name="number"

123
------WebKitFormBoundary4zmcCcRJqhRDBsOq
Content-Disposition: form-data; name="submit"

Submit
------WebKitFormBoundary4zmcCcRJqhRDBsOq--
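The block structure of a multipart/form-data body can be walked with a few string operations. The following is a minimal sketch, not a hardened parser: `parse_multipart_form` is a hypothetical helper name, the code assumes CRLF line endings and quoted name/filename values, and it assumes the boundary string never occurs inside a field value.

```python
import re


def parse_multipart_form(body: bytes, boundary: str) -> dict:
    """Map each field name to (filename or None, raw value bytes)."""
    delimiter = b"--" + boundary.encode("ascii")
    fields = {}
    # Anything before the first delimiter (the preamble) is dropped.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--": end of the body
        # Drop the CRLF after the delimiter and the one before the next.
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]
        if chunk.endswith(b"\r\n"):
            chunk = chunk[:-2]
        # An empty line separates the part headers from the part content.
        head, _, value = chunk.partition(b"\r\n\r\n")
        name = re.search(rb';\s*name="([^"]*)"', head)
        filename = re.search(rb';\s*filename="([^"]*)"', head)
        if name is None:
            continue  # not a form-data part
        fields[name.group(1).decode("latin-1")] = (
            filename.group(1).decode("latin-1") if filename else None,
            value,
        )
    return fields
```

Parts that carry a filename correspond to uploaded files, and parts without one correspond to ordinary POST parameters.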
0x5: Method = POST, Content-Type = application/json
POST /test/test.php?op=code HTTP/1.1
Host: localhost
Proxy-Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: zh-CN,zh;q=0.8
Cookie: admincp_language=zh; sYCB_2132_ulastactivity=f0edrXcuyr3SWGarcZ6cqkWc3Q7vEBwM3aoWG0%2BHG8kRjP5bb3le; sYCB_2132_nofavfid=1; sYCB_2132_smile=1D1; ECS[visit_times]=5
Content-Type: application/json;charset=utf-8

{"title":"test","sub":[1,2,3]}
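This approach makes it easy to submit complex structured data and is particularly well suited to RESTful interfaces. Packet capture tools such as Chrome's built-in developer tools, Firebug, and Fiddler display JSON data as a tree, which is very convenient. However, some server-side languages do not yet support this format; PHP, for example, cannot obtain the content of the request above through $_POST. In that case you need to handle it yourself: when the request's Content-Type is application/json, read the raw input stream from php://input and json_decode it into an object. Some PHP frameworks have already started doing this.
1. The web server receives and processes the GET parameters in the request URL
2. The web server ignores the request body (i.e. the POST parameters), because it cannot recognize it
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent
Other types such as text/xml are handled in the same way as application/json.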
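The manual handling described for PHP (read the raw body, then decode it when the Content-Type is application/json) has a direct equivalent in Python. This is a minimal sketch; `decode_json_body` is a hypothetical helper name, and UTF-8 is assumed as the body encoding.

```python
import json


def decode_json_body(content_type: str, raw_body: bytes):
    """Decode the raw request body when it is declared as JSON."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        return None  # not JSON: leave the body to another handler
    # Assumption: the body is UTF-8 encoded.
    return json.loads(raw_body.decode("utf-8"))
```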
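0x6: Method = PUT, Content-Type = *
1. The web server receives and processes the GET parameters in the request URL; the file name in the URL is the file whose content will be written or deleted
2. Whether the web server accepts the request body depends on the server
    1) Apache: PUT handling is disabled by default
    2) IIS: when WebDAV is enabled, the PUT request body is written to disk as the content of the corresponding file
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent
0x7: Method = DELETE, Content-Type = *
1. The web server receives and processes the GET parameters in the request URL
2. The web server ignores the request body (i.e. the POST parameters), because it cannot recognize it
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent
0x8: Method = TRACE/CONNECT, Content-Type = *
Apache's default configuration does not accept these methods and performs no parameter parsing for them.
0x9: Method = OPTIONS, Content-Type = *
1. The web server receives and processes the GET parameters in the request URL
2. The web server ignores the request body (i.e. the POST parameters), because it cannot recognize it
3. The web server receives and processes the COOKIE parameters in the request headers
4. The web server ignores the FILES parameters in the request body
5. The web server receives and processes HTTP header fields such as User-Agent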
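The per-method rules in this section can be condensed into a single lookup that a WAF consults before deciding how deeply to parse a request. The sketch below is one possible encoding of those rules; `parse_plan` and its dictionary keys are hypothetical names, and a PUT body is treated as file content rather than as parameters.

```python
def parse_plan(method: str, content_type: str = "") -> dict:
    """Decide which parts of a request are worth parsing for parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    plan = {"query": True, "body_params": False, "cookies": True, "headers": True}
    if method in ("TRACE", "CONNECT"):
        # Not accepted by default: skip parameter parsing entirely.
        return dict.fromkeys(plan, False)
    if method == "POST" and media_type in (
        "application/x-www-form-urlencoded",
        "multipart/form-data",
    ):
        # Only these two body formats are decoded into POST parameters.
        plan["body_params"] = True
    # GET, HEAD, DELETE, OPTIONS, PUT and POST with other content types
    # (application/json, text/xml, ...) keep body_params = False.
    return plan
```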
Relevant Link:
http://www.cnblogs.com/ziyunfei/archive/2012/11/17/2775421.html
http://blog.csdn.net/mad1989/article/details/7918267
http://www.oschina.net/translate/put-or-post
https://imququ.com/post/four-ways-to-post-data-in-http.html
http://stackoverflow.com/questions/2934554/how-to-enable-and-use-http-put-and-delete-with-apache2-and-php
http://www.runoob.com/http/http-content-type.html
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Copyright (c) 2015 LittleHann All rights reserved