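I have recently been studying Apache module development in preparation for some experiments on Apache. The article below describes how a request is processed in an Apache module, and how to add request-processing and response-processing hooks. Its diagrams in particular explain very clearly the framework Apache uses to process requests and return results. Note that the article is mainly conceptual and contains no working code; it is meant only to help developers understand.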
Processing HTTP requests is central to most web applications. In this article, we present an overview of request handling in Apache, and how modules may insert hooks into the request processing to build custom applications and components.
This article should help developers on the learning curve to working with Apache modules, and equip you to work comfortably with the API documentation and code examples shipped with Apache itself.
The Apache architecture comprises a common core, a platform-dependent layer (the Apache Portable Runtime, or APR), and a number of modules. Any Apache-based application, even one as simple as serving Apache's default "it worked" page, requires several modules. Users of Apache need not be aware of this, but for applications developers, understanding modules and Apache's module API is the key to working with Apache.
Most, though by no means all, modules are concerned with some aspect of processing an HTTP request. But there is rarely if ever a reason for a module to concern itself with every aspect of HTTP: that is the business of the httpd. The advantage of a modular approach is that it is straightforward for a module to focus on a particular task but ignore aspects of HTTP that are not relevant to the task at hand.
In this article, we present the Apache request processing architecture, and show how a module can hook in to - and optionally control - different parts of the request cycle.
The simplest possible formulation of a webserver is a program that listens for HTTP requests and returns a response when it receives one. In Apache, this is fundamentally the business of a content generator, the core of the webserver. Exactly one content generator must be run for every HTTP request. Any module may register content generators, normally by defining a function referenced by a handler that can be configured using the SetHandler or AddHandler directives in httpd.conf. Any request for which no generator is provided by some module is handled by the default generator, which simply returns a file mapped directly from the request to the filesystem. Modules that implement one or more content generators may be known as content generator or handler modules.
In principle, a content generator can handle all the functions of a webserver: for example, a CGI program gets the request and produces the response, and can take full control of what happens between them. But in common with other webservers, Apache splits the request into different phases. So, for example, it checks whether the user is authorised to do something before the content generator does that thing.
There are several request phases before the content generator. These serve to examine and perhaps manipulate the request headers, and determine what to do with the request. For example:
In addition, there is a request logging phase that comes after the content generator has sent a reply to the browser.
A module may hook its own handlers into any of these processing hooks. Modules that concern themselves with the phases before content generation are known as metadata modules. Those that deal with logging are known as logging modules.
What we have described above is essentially the architecture of every general-purpose webserver. There are differences in the detail, but the request processing metadata->generator->logger phases are common.
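The metadata->generator->logger ordering can be sketched as a toy model in plain C. This is not Apache code: toy_request, the TOY_ status values and the "/private/" rule are all hypothetical stand-ins, and the sketch simplifies by skipping the generator outright when the metadata phase refuses the request.

```c
#include <string.h>

/* Hypothetical status codes for this sketch only. */
#define TOY_OK        0
#define TOY_FORBIDDEN 403

/* Hypothetical stand-in for a request: just enough to trace the phases. */
typedef struct {
    const char *uri;
    int status;
    char trace[64];   /* records which phases ran, in order */
} toy_request;

/* Append a phase name to the trace, separated by '>'. */
static void note(toy_request *r, const char *phase) {
    if (r->trace[0] != '\0')
        strncat(r->trace, ">", sizeof(r->trace) - strlen(r->trace) - 1);
    strncat(r->trace, phase, sizeof(r->trace) - strlen(r->trace) - 1);
}

/* Metadata phase: examine the request and decide what to do with it. */
static int metadata_phase(toy_request *r) {
    note(r, "metadata");
    return strncmp(r->uri, "/private/", 9) == 0 ? TOY_FORBIDDEN : TOY_OK;
}

/* Content generator: produces the response. */
static int generator_phase(toy_request *r) {
    note(r, "generator");
    return TOY_OK;
}

/* Logging phase: runs after the response, whatever the outcome. */
static void logging_phase(toy_request *r) {
    note(r, "logger");
}

/* Run the phases strictly in order. */
static int process_request(toy_request *r) {
    r->trace[0] = '\0';
    r->status = metadata_phase(r);
    if (r->status == TOY_OK)
        r->status = generator_phase(r);
    logging_phase(r);
    return r->status;
}
```

In this sketch a request for "/index.html" passes through all three phases, while a request under "/private/" is refused in the metadata phase and goes straight to the logger.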
The major innovation in Apache 2 that transforms it from a 'mere' webserver (like Apache 1.3 and others) into a powerful applications platform is the filter chain. This can be represented as a data axis, orthogonal to the request processing axis. The request data may be processed by input filters before reaching the content generator, and the response may be processed by output filters before being sent to the client. Filters enable a far cleaner and more efficient implementation of data processing than was possible in the past, as well as separating it from content generation. Examples of filters include server-side includes (SSI), XML and XSLT processing, gzip compression, and encryption (SSL).
Before proceeding to discuss how a module hooks itself in to any of the stages of processing a request / data, let's pause to clear up a matter that often causes confusion amongst new module developers: namely, the order of processing.
The request processing axis is straightforward: the phases happen strictly in order. But confusion arises in the data axis. For maximum efficiency, this is pipelined, so the content generator and filters do not run in a deterministic order. So, for example, you cannot in general set something in an input filter and expect it to apply in the generator or output filters.
The order of processing is in fact centred on the content generator, which is responsible for pulling data from the input filter stack and pushing data to the output filters (where applicable, in both cases). When a generator or filter needs to set something affecting the request as a whole, it must do so before passing any data down the chain (generator and output filters), or before returning data to the caller (input filters). Techniques for this will be discussed in another article.
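A toy model in plain C may make the ordering easier to picture. It is only a sketch under simplifying assumptions: the toy_ names are hypothetical, there is a single input filter and a single output filter, and "headers" are reduced to one content type field that is frozen once the first chunk is passed down the output side.

```c
#include <string.h>

/* Hypothetical stand-in for a request with a chunked body. */
typedef struct {
    const char *body[3];     /* request body chunks, NULL-terminated */
    int next_chunk;
    int headers_sent;        /* set once the first data leaves the output side */
    char content_type[32];
    char log[64];            /* order in which the stages ran: G, I, O */
} toy_request;

static void note(toy_request *r, const char *what) {
    strncat(r->log, what, sizeof(r->log) - strlen(r->log) - 1);
}

/* Input side: hands over the next chunk only when the generator asks. */
static const char *toy_input_filter(toy_request *r) {
    note(r, "I");
    return r->body[r->next_chunk] ? r->body[r->next_chunk++] : NULL;
}

/* Output side: the first chunk passed down commits the response headers. */
static void toy_output_filter(toy_request *r, const char *chunk) {
    (void)chunk;
    note(r, "O");
    r->headers_sent = 1;
}

/* Returns 1 if the header was set, 0 if data has already been passed on. */
static int toy_set_content_type(toy_request *r, const char *type) {
    if (r->headers_sent)
        return 0;
    strncpy(r->content_type, type, sizeof(r->content_type) - 1);
    r->content_type[sizeof(r->content_type) - 1] = '\0';
    return 1;
}

/* Generator: pulls from the input side and pushes to the output side,
   one chunk at a time, so input and output work is interleaved. */
static void toy_generator(toy_request *r) {
    const char *chunk;
    note(r, "G");
    while ((chunk = toy_input_filter(r)) != NULL)
        toy_output_filter(r, chunk);
}
```

With a two-chunk body the stages interleave rather than running one after another, and a content type set before the generator starts is accepted while one set afterwards is rejected. That is the point of the rule about setting things before passing any data down the chain.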
Now that we have an overview of request processing in Apache, we can proceed to show how a module hooks into it to play a part.
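The Apache module structure declares several (optional) data and function members:
module AP_MODULE_DECLARE_DATA my_module = {
    STANDARD20_MODULE_STUFF,
    my_dir_conf,      /* create per-directory configuration */
    my_dir_merge,     /* merge per-directory configuration */
    my_server_conf,   /* create per-server configuration */
    my_server_merge,  /* merge per-server configuration */
    my_cmds,          /* configuration directives */
    my_hooks          /* register hooks */
};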
The relevant function for the module to create request processing hooks is the final member:
static void my_hooks(apr_pool_t* pool)
{
    /* create request processing hooks as required */
}
Which hooks we need to create here depends on what part or parts of the request our module is interested in. For example, a module that implements a content generator (handler) will need a handler hook, looking something like:
ap_hook_handler(my_handler, NULL, NULL, APR_HOOK_MIDDLE);
Now my_handler will be called when a request reaches the content generation phase. Hooks for other request phases are similar; a few commonly used ones are:
Between the general post_read_request and fixups hooks are several other hooks designated for specific purposes: for example, access and authentication modules have specific hooks for checking permissions. All these hooks take exactly the same form as the handler hook. For further details, see http_config.h.
The prototype for a handler for any of these phases is:
static int my_handler(request_rec* r)
{
    /* do something with the request */
    return OK;
}
The request_rec is the main Apache data structure representing all aspects of an HTTP request.
The return value of my_handler is one of:
Implementation of the handlers will be discussed in other articles.
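To illustrate how a handler's return value interacts with the hooks, here is a toy dispatcher in plain C. It is a sketch, not Apache's implementation: the toy_ names, the fixed-size hook table and the ordering constants are hypothetical. The sketch assumes that a handler which declines a request lets the next registered handler try, and that a handler decides whether a request is its own by comparing the configured handler name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes and ordering hints for this sketch. */
#define TOY_OK        0
#define TOY_DECLINED (-1)
enum { TOY_HOOK_FIRST = 0, TOY_HOOK_MIDDLE = 10, TOY_HOOK_LAST = 20 };

typedef struct {
    const char *handler;    /* handler name chosen by configuration */
    const char *served_by;  /* filled in by whichever handler accepts */
} toy_request;

typedef int (*toy_handler_fn)(toy_request *r);

#define TOY_MAX_HOOKS 8
static struct { toy_handler_fn fn; int order; } toy_hooks[TOY_MAX_HOOKS];
static int toy_hook_count = 0;

/* Register a handler, keeping the table sorted by order;
   handlers with equal order keep their registration order. */
static int toy_hook_handler(toy_handler_fn fn, int order) {
    int i;
    if (toy_hook_count == TOY_MAX_HOOKS)
        return -1;
    i = toy_hook_count++;
    while (i > 0 && toy_hooks[i - 1].order > order) {
        toy_hooks[i] = toy_hooks[i - 1];
        i--;
    }
    toy_hooks[i].fn = fn;
    toy_hooks[i].order = order;
    return 0;
}

/* Call handlers in order until one returns something other than DECLINED. */
static int toy_run_handler(toy_request *r) {
    int i;
    for (i = 0; i < toy_hook_count; i++) {
        int rv = toy_hooks[i].fn(r);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;  /* nobody claimed the request */
}

/* A module's handler: only accepts requests configured for it. */
static int my_handler(toy_request *r) {
    if (r->handler == NULL || strcmp(r->handler, "my-handler") != 0)
        return TOY_DECLINED;   /* not ours: let another handler try */
    r->served_by = "my_handler";
    return TOY_OK;
}

/* A catch-all registered last, playing the role of a default generator. */
static int fallback_handler(toy_request *r) {
    r->served_by = "fallback_handler";
    return TOY_OK;
}
```

Registering fallback_handler with TOY_HOOK_LAST and my_handler with TOY_HOOK_MIDDLE means my_handler is always asked first, whatever the registration order; requests it declines fall through to the catch-all.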
Filters are also normally registered in the my_hooks function, but the API is rather different:
ap_register_output_filter("my-output-filter-name", my_output_filter, NULL, AP_FTYPE_RESOURCE);
ap_register_input_filter("my-input-filter-name", my_input_filter, NULL, AP_FTYPE_RESOURCE);
with the filter function prototypes
static apr_status_t my_output_filter(ap_filter_t* f, apr_bucket_brigade* bb)
{
    /* read a chunk of data from bb, process it, pass it to the next filter */
    return APR_SUCCESS;
}

static apr_status_t my_input_filter(ap_filter_t* f, apr_bucket_brigade* bb,
        ap_input_mode_t mode, apr_read_type_e block, apr_off_t nbytes)
{
    /* pull a chunk of data from the next filter (f->next, the one closer
       to the network), process it, return it in bb */
    return APR_SUCCESS;
}
Filter functions will normally return APR_SUCCESS, either explicitly as above or as the return code from the next filter via an ap_pass_brigade or ap_get_brigade call. Any other return value is an internal server error and should only happen when the request is unrecoverable.
As with handlers, implementation of filters will be discussed in other articles. The API documentation is in util_filter.h.
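The convention of returning the next filter's status can be sketched with a toy output chain in plain C. Everything here is a hypothetical stand-in: toy_filter and toy_pass only mimic the roles of ap_filter_t and ap_pass_brigade, and a writable string takes the place of a bucket brigade.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
#define TOY_SUCCESS  0
#define TOY_EGENERAL 1

typedef struct toy_filter toy_filter;
typedef int (*toy_filter_fn)(toy_filter *f, char *data);

struct toy_filter {
    toy_filter_fn fn;
    toy_filter *next;  /* next filter down the output chain */
    void *ctx;         /* per-filter state */
};

/* Hand data to the given filter; the end of the chain succeeds trivially. */
static int toy_pass(toy_filter *next, char *data) {
    return next ? next->fn(next, data) : TOY_SUCCESS;
}

/* Processing filter: upper-case the chunk, pass it on, and return
   whatever status the rest of the chain reports. */
static int upper_filter(toy_filter *f, char *data) {
    char *p;
    for (p = data; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    return toy_pass(f->next, data);
}

/* Stand-in for the client connection: a small fixed buffer. */
typedef struct { char buf[16]; } toy_client;

/* Final filter: "writes" to the client, failing if the chunk does not fit. */
static int sink_filter(toy_filter *f, char *data) {
    toy_client *c = f->ctx;
    if (strlen(data) >= sizeof(c->buf))
        return TOY_EGENERAL;
    strcpy(c->buf, data);
    return TOY_SUCCESS;
}
```

Because upper_filter returns the result of toy_pass unchanged, a failure in the final filter propagates back up the chain to whoever started the write, without the intermediate filter having to know what went wrong.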
The central data structure that represents an HTTP request is the request_rec. It is created when Apache accepts the request, and is provided to all request processing functions, as shown in the prototype my_handler above. In a content filter, the request_rec is available as f->r.
The request_rec is a large struct containing, directly or indirectly, all the data fields a handler needs to process the request. Any metadata handler works by accessing and updating fields in the request_rec; a content generator or filter may do the same but additionally processes I/O; and a logger gets its information from the request_rec. For full details, see the API header file httpd.h.
We'll conclude this article with a few quick tips about using the request_rec. You'll need to look at the API - or other articles where available - for details of how to use them, but these deal with frequently asked questions.
Original article: http://www.apachetutor.org/dev/request