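Generally speaking, a server parses every incoming request to decide whether it is valid and how it should be routed. This section therefore explains how to obtain the data carried by a request. (If you repost this article, please credit breaksoftware's CSDN blog.)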
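Using the method from the article "Server Setup Notes: Compiling Apache and Its Modules", we create a handler project named get_request. In this project, the entry function we can work with is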
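static int get_request_handler(request_rec *r)
{
    r->content_type = "text/html";

Through this entry function, the one piece of data we receive directly is r, a pointer to a request_rec structure. Looking it up in the source code, we find its definition: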
/**
 * @brief A structure that represents the current request
 */
struct request_rec {
    /** The pool associated with the request */
    apr_pool_t *pool;
    /** The connection to the client */
    conn_rec *connection;
    /** The virtual host for this request */
    server_rec *server;

    /** Pointer to the redirected request if this is an external redirect */
    request_rec *next;
    /** Pointer to the previous request if this is an internal redirect */
    request_rec *prev;

    /** Pointer to the main request if this is a sub-request
     * (see http_request.h) */
    request_rec *main;

    /* Info about the request itself... we begin with stuff that only
     * protocol.c should ever touch...
     */
    /** First line of request */
    char *the_request;
    /** HTTP/0.9, "simple" request (e.g. GET /foo\n w/no headers) */
    int assbackwards;
    /** A proxy request (calculated during post_read_request/translate_name)
     *  possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE,
     *                  PROXYREQ_RESPONSE
     */
    int proxyreq;
    /** HEAD request, as opposed to GET */
    int header_only;
    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

    /** Time when the request started */
    apr_time_t request_time;

    /** Status line, if set by script */
    const char *status_line;
    /** Status line */
    int status;

    /* Request method, two ways; also, protocol, etc..  Outside of protocol.c,
     * look, but don't touch.
     */

    /** M_GET, M_POST, etc. */
    int method_number;
    /** Request method (eg. GET, HEAD, POST, etc.) */
    const char *method;

    /**
     *  'allowed' is a bitvector of the allowed methods.
     *
     *  A handler must ensure that the request method is one that
     *  it is capable of handling.  Generally modules should DECLINE
     *  any request methods they do not handle.  Prior to aborting the
     *  handler like this the handler should set r->allowed to the list
     *  of methods that it is willing to handle.  This bitvector is used
     *  to construct the "Allow:" header required for OPTIONS requests,
     *  and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes.
     *
     *  Since the default_handler deals with OPTIONS, all modules can
     *  usually decline to deal with OPTIONS.  TRACE is always allowed,
     *  modules don't need to set it explicitly.
     *
     *  Since the default_handler will always handle a GET, a
     *  module which does *not* implement GET should probably return
     *  HTTP_METHOD_NOT_ALLOWED.  Unfortunately this means that a Script GET
     *  handler can't be installed by mod_actions.
     */
    apr_int64_t allowed;
    /** Array of extension methods */
    apr_array_header_t *allowed_xmethods;
    /** List of allowed methods */
    ap_method_list_t *allowed_methods;

    /** byte count in stream is for body */
    apr_off_t sent_bodyct;
    /** body byte count, for easy access */
    apr_off_t bytes_sent;
    /** Last modified time of the requested resource */
    apr_time_t mtime;

    /* HTTP/1.1 connection-level features */

    /** The Range: header */
    const char *range;
    /** The "real" content length */
    apr_off_t clength;
    /** sending chunked transfer-coding */
    int chunked;

    /** Method for reading the request body
     * (eg. REQUEST_CHUNKED_ERROR, REQUEST_NO_BODY,
     *  REQUEST_CHUNKED_DECHUNK, etc...) */
    int read_body;
    /** reading chunked transfer-coding */
    int read_chunked;
    /** is client waiting for a 100 response? */
    unsigned expecting_100;
    /** The optional kept body of the request. */
    apr_bucket_brigade *kept_body;
    /** For ap_body_to_table(): parsed body */
    /* XXX: ap_body_to_table has been removed. Remove body_table too or
     * XXX: keep it to reintroduce ap_body_to_table without major bump? */
    apr_table_t *body_table;
    /** Remaining bytes left to read from the request body */
    apr_off_t remaining;
    /** Number of bytes that have been read  from the request body */
    apr_off_t read_length;

    /* MIME header environments, in and out.  Also, an array containing
     * environment variables to be passed to subprocesses, so people can
     * write modules to add to that environment.
     *
     * The difference between headers_out and err_headers_out is that the
     * latter are printed even on error, and persist across internal redirects
     * (so the headers printed for ErrorDocument handlers will have them).
     *
     * The 'notes' apr_table_t is for notes from one module to another, with no
     * other set purpose in mind...
     */

    /** MIME header environment from the request */
    apr_table_t *headers_in;
    /** MIME header environment for the response */
    apr_table_t *headers_out;
    /** MIME header environment for the response, printed even on errors and
     * persist across internal redirects */
    apr_table_t *err_headers_out;
    /** Array of environment variables to be used for sub processes */
    apr_table_t *subprocess_env;
    /** Notes from one module to another */
    apr_table_t *notes;

    /* content_type, handler, content_encoding, and all content_languages
     * MUST be lowercased strings.  They may be pointers to static strings;
     * they should not be modified in place.
     */
    /** The content-type for the current request */
    const char *content_type;   /* Break these out --- we dispatch on 'em */
    /** The handler string that we use to call a handler function */
    const char *handler;        /* What we *really* dispatch on */

    /** How to encode the data */
    const char *content_encoding;
    /** Array of strings representing the content languages */
    apr_array_header_t *content_languages;

    /** variant list validator (if negotiated) */
    char *vlist_validator;

    /** If an authentication check was made, this gets set to the user name. */
    char *user;
    /** If an authentication check was made, this gets set to the auth type. */
    char *ap_auth_type;

    /* What object is being requested (either directly, or via include
     * or content-negotiation mapping).
     */

    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;
    /* XXX: What does this mean? Please define "canonicalize" -aaron */
    /** The true filename, we canonicalize r->filename if these don't match */
    char *canonical_filename;
    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

    /**
     * Flag for the handler to accept or reject path_info on
     * the current request.  All modules should respect the
     * AP_REQ_ACCEPT_PATH_INFO and AP_REQ_REJECT_PATH_INFO
     * values, while AP_REQ_DEFAULT_PATH_INFO indicates they
     * may follow existing conventions.  This is set to the
     * user's preference upon HOOK_VERY_FIRST of the fixups.
     */
    int used_path_info;

    /** A flag to determine if the eos bucket has been sent yet */
    int eos_sent;

    /* Various other config info which may change with .htaccess files
     * These are config vectors, with one void* pointer for each module
     * (the thing pointed to being the module's business).
     */

    /** Options set in config files, etc. */
    struct ap_conf_vector_t *per_dir_config;
    /** Notes on *this* request */
    struct ap_conf_vector_t *request_config;

    /** Optional request log level configuration. Will usually point
     *  to a server or per_dir config, i.e. must be copied before
     *  modifying */
    const struct ap_logconf *log;

    /** Id to identify request in access and error log. Set when the first
     *  error log entry for this request is generated.
     */
    const char *log_id;

    /**
     * A linked list of the .htaccess configuration directives
     * accessed by this request.
     * N.B. always add to the head of the list, _never_ to the end.
     * that way, a sub request's list can (temporarily) point to a parent's list
     */
    const struct htaccess_result *htaccess;

    /** A list of output filters to be used for this request */
    struct ap_filter_t *output_filters;
    /** A list of input filters to be used for this request */
    struct ap_filter_t *input_filters;

    /** A list of protocol level output filters to be used for this
     *  request */
    struct ap_filter_t *proto_output_filters;
    /** A list of protocol level input filters to be used for this
     *  request */
    struct ap_filter_t *proto_input_filters;

    /** This response can not be cached */
    int no_cache;
    /** There is no local copy of this response */
    int no_local_copy;

    /** Mutex protect callbacks registered with ap_mpm_register_timed_callback
     * from being run before the original handler finishes running
     */
    apr_thread_mutex_t *invoke_mtx;

    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;
    /**  finfo.protection (st_mode) set to zero if no such file */
    apr_finfo_t finfo;

    /** remote address information from conn_rec, can be overridden if
     * necessary by a module.
     * This is the address that originated the request.
     */
    apr_sockaddr_t *useragent_addr;
    char *useragent_ip;

    /** MIME trailer environment from the request */
    apr_table_t *trailers_in;
    /** MIME trailer environment from the response */
    apr_table_t *trailers_out;
};

This is a very large structure that covers almost everything about a request. For a beginner it is hard to fully understand what every field means. Our needs are simple, so we list only the fields we are likely to care about.
    /** First line of request */
    char *the_request;

The first line of the request.
    /** Protocol version number of protocol; 1.1 = 1001 */
    int proto_num;
    /** Protocol string, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Host, as set by full URI or Host: */
    const char *hostname;

The protocol version number, the protocol string, and the host name the request was addressed to.
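The comment "1.1 = 1001" indicates that proto_num packs the version as major * 1000 + minor. The sketch below illustrates that arithmetic in plain C. The helper names are hypothetical and exist only for this illustration; as far as I know, httpd.h ships its own HTTP_VERSION family of macros for the same purpose, and a real module should prefer those.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; they are not part of httpd. */

/* Pack a major/minor pair the way the proto_num comment describes. */
static int pack_http_version(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 1000);
    return major * 1000 + minor;
}

/* Recover the major part of a packed version number. */
static int http_version_major(int proto_num)
{
    return proto_num / 1000;
}

/* Recover the minor part of a packed version number. */
static int http_version_minor(int proto_num)
{
    return proto_num % 1000;
}
```

With these helpers, pack_http_version(1, 1) gives 1001, and a handler could compare r->proto_num against such a value to tell HTTP/1.0 from HTTP/1.1 clients.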
    /** Time when the request started */
    apr_time_t request_time;

The time at which the request started.
    /** The URI without any parsing performed */
    char *unparsed_uri;
    /** The path portion of the URI, or "/" if no path provided */
    char *uri;
    /** The filename on disk corresponding to this response */
    char *filename;

The URI before URL-decoding, the URL-decoded path portion of the URI, and the path of the file on disk that the request maps to.
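To make the difference between unparsed_uri and uri concrete, here is a minimal percent-decoding sketch in plain C. The function url_decode is a hypothetical helper written for this illustration, not the routine Apache itself uses, and it only handles %XX escapes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Convert one hex digit to its value; the caller guarantees isxdigit(c). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return tolower((unsigned char)c) - 'a' + 10;
}

/*
 * Hypothetical helper: percent-decode src into dst.
 * dst must have room for strlen(src) + 1 bytes, because the decoded
 * text is never longer than the input.
 * Returns 0 on success and -1 on a malformed escape sequence.
 */
static int url_decode(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (*src == '%') {
            /* An escape needs exactly two hex digits after the '%'. */
            if (!isxdigit((unsigned char)src[1]) ||
                !isxdigit((unsigned char)src[2])) {
                return -1;
            }
            *dst++ = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return 0;
}
```

Decoding "/AP%26AC%3aHE" with this helper yields "/AP&AC:HE", since %26 is '&' and %3a is ':'.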
    /** The PATH_INFO extracted from this request */
    char *path_info;
    /** The QUERY_ARGS extracted from this request */
    char *args;

The PATH_INFO and the query string of the request.
    /** A struct containing the components of URI */
    apr_uri_t parsed_uri;

The detailed result of parsing the request URI.
    char *useragent_ip;

The IP address the request came from.
    /** MIME header environment from the request */
    apr_table_t *headers_in;

The HTTP request headers, stored as a table.
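Conceptually, such a header table is a sequence of key/value string pairs in which keys are matched without regard to case. The sketch below models that idea in plain C so it can be compiled without APR. The header_entry type and the header_get function are hypothetical stand-ins; in a real module the equivalent lookup would be apr_table_get(r->headers_in, "Host").

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for one table entry: a key/value pair. */
typedef struct {
    const char *key;
    const char *val;
} header_entry;

/* Compare two strings ignoring ASCII case; returns 1 when they are equal. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Return the first value stored under key, or NULL when the key is absent. */
static const char *header_get(const header_entry *entries, size_t count,
                              const char *key)
{
    assert(entries != NULL && key != NULL);
    for (size_t i = 0; i < count; i++) {
        if (equals_ignore_case(entries[i].key, key)) {
            return entries[i].val;
        }
    }
    return NULL;
}
```

A linear scan is enough for an illustration because a request rarely carries more than a few dozen headers.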
For the fields of basic data types, a sample routine is easy to write:
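    /* String fields may be NULL, so check them before printing. */
    if (r->the_request) {
        ap_rprintf(r, "the request : %s\n", r->the_request);
    } else {
        ap_rprintf(r, "the request is NULL\n");
    }

    if (r->protocol) {
        ap_rprintf(r, "protocol : %s\n", r->protocol);
    } else {
        ap_rprintf(r, "protocol is NULL\n");
    }

    ap_rprintf(r, "proto_num is %d\n", r->proto_num);

The request time is of type apr_time_t; for background, see the discussion in "Server Setup Notes: Apache Module Development Basics". After checking the source code, we can write the following routine: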
static void print_time(request_rec* r) {
    /* Without a request there is nowhere to write output, so just return.
     * (Passing a NULL r to ap_rprintf would dereference a null pointer.) */
    if (!r) {
        return;
    }

    /* Human-readable form of the request time. */
    char data_str[128] = {0};
    apr_status_t status = apr_ctime(data_str, r->request_time);
    if (APR_SUCCESS != status) {
        ap_rprintf(r, "apr_ctime error\n");
    } else {
        ap_rprintf(r, "ctime\t:\t%s\n", data_str);
    }

    /* Broken-down (exploded) form of the request time, in GMT. */
    apr_time_exp_t exp_t;
    memset(&exp_t, 0, sizeof(exp_t));
    status = apr_time_exp_gmt(&exp_t, r->request_time);
    if (APR_SUCCESS != status) {
        ap_rprintf(r, "apr_time_exp_gmt error\n");
    } else {
        ap_rprintf(r, "exp time\t:\n");
        ap_rprintf(r, "\ttm_usec\t:\t%d\n", exp_t.tm_usec);
        ap_rprintf(r, "\ttm_sec\t:\t%d\n", exp_t.tm_sec);
        ap_rprintf(r, "\ttm_min\t:\t%d\n", exp_t.tm_min);
        ap_rprintf(r, "\ttm_hour\t:\t%d\n", exp_t.tm_hour);
        ap_rprintf(r, "\ttm_mday\t:\t%d\n", exp_t.tm_mday);
        ap_rprintf(r, "\ttm_mon\t:\t%d\n", exp_t.tm_mon);
        ap_rprintf(r, "\ttm_year\t:\t%d\n", exp_t.tm_year);
        ap_rprintf(r, "\ttm_wday\t:\t%d\n", exp_t.tm_wday);
        ap_rprintf(r, "\ttm_yday\t:\t%d\n", exp_t.tm_yday);
        ap_rprintf(r, "\ttm_isdst\t:\t%d\n", exp_t.tm_isdst);
        ap_rprintf(r, "\ttm_gmtoff\t:\t%d\n", exp_t.tm_gmtoff);
    }
}

The definition of apr_time_exp_t is in apr_time.h:
/**
 * a structure similar to ANSI struct tm with the following differences:
 *  - tm_usec isn't an ANSI field
 *  - tm_gmtoff isn't an ANSI field (it's a BSDism)
 */
struct apr_time_exp_t {
    /** microseconds past tm_sec */
    apr_int32_t tm_usec;
    /** (0-61) seconds past tm_min */
    apr_int32_t tm_sec;
    /** (0-59) minutes past tm_hour */
    apr_int32_t tm_min;
    /** (0-23) hours past midnight */
    apr_int32_t tm_hour;
    /** (1-31) day of the month */
    apr_int32_t tm_mday;
    /** (0-11) month of the year */
    apr_int32_t tm_mon;
    /** year since 1900 */
    apr_int32_t tm_year;
    /** (0-6) days since Sunday */
    apr_int32_t tm_wday;
    /** (0-365) days since January 1 */
    apr_int32_t tm_yday;
    /** daylight saving time */
    apr_int32_t tm_isdst;
    /** seconds east of UTC */
    apr_int32_t tm_gmtoff;
};

The routine for the already-parsed URI structure, apr_uri_t, is just as simple, so I will not list it here and only paste the structure definition, which is self-explanatory:
/**
 * A structure to encompass all of the fields in a uri
 */
struct apr_uri_t {
    /** scheme ("http"/"ftp"/...) */
    char *scheme;
    /** combined [user[:password]\@]host[:port] */
    char *hostinfo;
    /** user name, as in http://user:passwd\@host:port/ */
    char *user;
    /** password, as in http://user:passwd\@host:port/ */
    char *password;
    /** hostname from URI (or from Host: header) */
    char *hostname;
    /** port string (integer representation is in "port") */
    char *port_str;
    /** the request path (or NULL if only scheme://host was given) */
    char *path;
    /** Everything after a '?' in the path, if present */
    char *query;
    /** Trailing "#fragment" string, if present */
    char *fragment;

    /** structure returned from gethostbyname() */
    struct hostent *hostent;

    /** The port number, numeric, valid only if port_str != NULL */
    apr_port_t port;

    /** has the structure been initialized */
    unsigned is_initialized:1;

    /** has the DNS been looked up yet */
    unsigned dns_looked_up:1;
    /** has the dns been resolved yet */
    unsigned dns_resolved:1;
};

The troublesome part of these routines is walking an apr_table_t. Code that iterates over such a table is hard to find online, so I had to work from the code in apr_table_clone and arrived at the following:
static void print_table(request_rec *r, const apr_table_t* t) {
    /* A table is backed by an array of apr_table_entry_t key/value pairs. */
    const apr_array_header_t* array = apr_table_elts(t);
    apr_table_entry_t* elts = (apr_table_entry_t*)array->elts;
    for (int i = 0; i < array->nelts; i++) {
        ap_rprintf(r, "\t%s : %s\n", elts[i].key, elts[i].val);
    }
}

We request the URL http://192.168.191.129/AP%26AC%3aHE?a=b#c
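The response is shown below. Note that a browser does not send the #c fragment to the server, so it does not appear in the request line.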
headers_in start
    Host : 192.168.191.129
    Connection : keep-alive
    Cache-Control : max-age=0
    Accept : text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    User-Agent : Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36
    Accept-Encoding : gzip,deflate,sdch
    Accept-Language : zh-CN,zh;q=0.8
headers_in end
headers_out start
headers_out end
the request : GET /AP%26AC%3aHE?a=b HTTP/1.1
protocol : HTTP/1.1
proto_num is 1001
method : GET
host name : 192.168.191.129
unparsed uri : /AP%26AC%3aHE?a=b
uri : /AP&AC:HE
filename : /usr/local/apache2/htdocs/AP&AC:HE
path info :
args : a=b
user is NULL
log id is NULL
useragent ip : 192.168.191.1
ctime : Mon Feb 16 18:20:39 2015
exp time :
    tm_usec : 200039
    tm_sec : 39
    tm_min : 20
    tm_hour : 10
    tm_mday : 16
    tm_mon : 1
    tm_year : 115
    tm_wday : 1
    tm_yday : 46
    tm_isdst : 0
    tm_gmtoff : 0
scheme is NULL
hostinfo is NULL
user is NULL
password is NULL
hostname is NULL
port_str is NULL
path : /AP&AC:HE
query : a=b
fragment is NULL
The sample page from mod_get_request.c
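In the output above, ctime reports 18:20:39 while tm_hour is 10. The exploded fields are in GMT, whereas the ctime string is presumably rendered in the server's local time zone, which appears to be UTC+8. An apr_time_t counts microseconds since the Unix epoch, so the same breakdown can be sketched with the standard C library alone. The exploded_time type and explode_gmt function below are hypothetical stand-ins for apr_time_exp_t and apr_time_exp_gmt, and the timestamp quoted afterwards is reconstructed from the sample output, not taken from the original run.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical mirror of the printed fields; this is not the APR type. */
typedef struct {
    int32_t usec;   /* microseconds past tm.tm_sec */
    struct tm tm;   /* broken-down calendar time in GMT */
} exploded_time;

/*
 * Split a microsecond timestamp into broken-down GMT plus microseconds.
 * Returns 0 on success and -1 if the C library cannot convert the value.
 */
static int explode_gmt(int64_t micros, exploded_time *out)
{
    assert(out != NULL && micros >= 0);

    /* Whole seconds go to gmtime(); the remainder is the tm_usec analogue. */
    time_t secs = (time_t)(micros / 1000000);
    struct tm *gmt = gmtime(&secs);
    if (gmt == NULL) {
        return -1;
    }
    out->tm = *gmt;
    out->usec = (int32_t)(micros % 1000000);
    return 0;
}
```

Feeding in 1424082039200039 (Mon Feb 16 10:20:39 2015 GMT plus 200039 microseconds) reproduces the values listed under exp time: tm_year 115 is 2015 minus 1900, tm_mon 1 is February because months count from 0, and tm_yday 46 counts days since January 1.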