CherryProxy - a filtering HTTP proxy extensible in Python

CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.

It was not designed for production use, and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. It should nevertheless be useful for testing, demos, prototyping, and teaching.

Why a new proxy

There are already quite a few HTTP proxies developed in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. They were either too simple and missing features I needed, or too complex to extend, so I decided to develop a different one.

I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.

News:

  • 2011-11-22: moved CherryProxy to its own bitbucket project
  • 2011-11-15 v0.12: added parent proxy support

Download:

Get the zip archive from here, or use Mercurial to get the latest source code from here.

Install

On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.

License:

Open-source, BSD-style

Usage as a tool (simple proxy):

1) run CherryProxy.py [options]

Options:

  -h, --help            show this help message and exit
  -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
  -a ADDRESS, --address=ADDRESS
                        IP address of interface for HTTP proxy (0.0.0.0 for
                        all, default=localhost)
  -f PROXY, --forward=PROXY
                        Forward requests to parent proxy, specified as
                        hostname[:port] or IP address[:port]
  -v, --verbose

2) set up your browser to use localhost:8070 as its HTTP proxy
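
Step 2 can also be done from a script instead of a browser. The sketch below builds a urllib opener that routes plain HTTP requests through the proxy. The helper name make_proxy_opener is made up for this illustration, and the import fallback is there only because CherryProxy targets Python 2.x while the same code also runs on Python 3.

```python
try:
    from urllib.request import ProxyHandler, build_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, build_opener  # Python 2


def make_proxy_opener(host='localhost', port=8070):
    """Return a urllib opener that sends HTTP requests through the proxy.

    Hypothetical helper, not part of CherryProxy.
    """
    proxy_url = 'http://%s:%d' % (host, port)
    # Only the 'http' scheme is routed: HTTPS is not supported by the proxy.
    return build_opener(ProxyHandler({'http': proxy_url}))


opener = make_proxy_opener()
# With CherryProxy running on localhost:8070, a request would look like:
# html = opener.open('http://example.com/').read()
```

Nothing is sent over the network until opener.open() is called, so the opener can be built before the proxy is started.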

Usage in a Python Application:

- import cherryproxy

- create a subclass of cherryproxy.CherryProxy

- implement methods filter_request and/or filter_response to enable filtering as needed

- see provided examples

Filtering API:

CherryProxy: class implementing a filtering HTTP proxy

To use it, create a class inheriting from CherryProxy and implement the methods filter_request and filter_response as desired. Then call the start method to start the proxy.

Note: the logging module needs to be initialized before creating a CherryProxy object.

See the example scripts for more information.
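
The steps described here (subclass, override a filter method, initialize logging, call start) can be sketched as follows. This is a minimal illustration, not one of the provided example scripts: LoggingProxy, describe_request and main are names invented here, and the import is guarded only so the helper can be read and run on a machine where the Python 2.x library is not installed.

```python
import logging

try:
    import cherryproxy  # Python 2.x library; may not be installed
except ImportError:
    cherryproxy = None


def describe_request(method, full_url):
    """Format one log line for a proxied request (hypothetical helper)."""
    return '%s %s' % (method, full_url)


if cherryproxy is not None:
    class LoggingProxy(cherryproxy.CherryProxy):
        """Logs every request line, then lets the request through unchanged."""

        def filter_request_headers(self):
            logging.info(describe_request(self.req.method, self.req.full_url))


def main():
    if cherryproxy is None:
        raise SystemExit('cherryproxy is not installed')
    # The logging module must be initialized before the proxy is created.
    logging.basicConfig(level=logging.INFO)
    proxy = LoggingProxy(address='localhost', port=8070)
    proxy.start()
```

Calling main() from a script starts the proxy on localhost:8070, the documented defaults.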

__init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
CherryProxy constructor

 

address: IP address of interface to listen to, or 0.0.0.0 for all (localhost by default)

port: TCP port for the proxy (8070 by default)

server_name: server name used in HTTP responses

debug: enable debugging messages if set to True

log_level: logging level (use constants from the logging module; the default 20 is logging.INFO)

options: None or optparse.OptionParser object to provide additional options

parent_proxy: parent proxy, either IP address or hostname, with optional port (example: 'myproxy.local:8080')
filter_request(self)
Method to be overridden:

Called to analyse/filter/modify the request received from the client, after reading the full request with its body if there is one, before it is sent to the server.

This method may call set_response() if the request needs to be blocked before being sent to the server.

The following attributes can be read and MODIFIED:

    self.req.data: data sent with the request (POST or PUT)

    (and also all listed in filter_request_headers)
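
As an illustration of body filtering, the sketch below rejects requests whose body contains a forbidden word. The word list, find_forbidden_word and BodyFilterProxy are hypothetical names chosen for this example, and the body is assumed to be a text string (a Python 2 str). The import is guarded so the helper can be tried without the library installed.

```python
try:
    import cherryproxy  # Python 2.x library; may not be installed
except ImportError:
    cherryproxy = None

FORBIDDEN_WORDS = ('password', 'secret')  # illustrative values only


def find_forbidden_word(data, words=FORBIDDEN_WORDS):
    """Return the first forbidden word found in the request body, or None."""
    if not data:
        return None
    lowered = data.lower()  # case-insensitive match
    for word in words:
        if word in lowered:
            return word
    return None


if cherryproxy is not None:
    class BodyFilterProxy(cherryproxy.CherryProxy):
        def filter_request(self):
            # self.req.data holds the POST or PUT body, if there is one.
            if find_forbidden_word(self.req.data) is not None:
                # Block the request before it reaches the server.
                self.set_response_forbidden(
                    data='Request body contains a forbidden word')
```

Keeping the check in a plain function makes it easy to test without starting a proxy.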
filter_request_headers(self)
Method to be overridden:

Called to analyse/filter/modify the request received from the client, before reading the full request with its body if there is one, before it is sent to the server.

This method may call set_response() if the request needs to be blocked before being sent to the server.

The following attributes can be read and MODIFIED:

    self.req.headers: dictionary of HTTP headers, with lowercase names

    self.req.method: HTTP method, e.g. 'GET', 'POST', etc

    self.req.scheme: protocol from URL, e.g. 'http' or 'https'

    self.req.netloc: IP address or hostname of server, with optional port, for example 'www.google.com' or '1.2.3.4:8000'

    self.req.path: path in URL, for example '/folder/index.html'

    self.req.query: query string, found after question mark in URL

The following attributes can be READ only:

    self.req.environ: dictionary of request attributes following WSGI format (PEP 333)

    self.req.url: partial URL containing 'path?query'

    self.req.full_url: full URL containing 'scheme:netloc/path?query'

    self.req.length: length of request data in bytes, 0 if none

    self.req.content_type: content-type, for example 'text/html'

    self.req.charset: charset, for example 'UTF-8'

    self.req.url_filename: filename extracted from URL path
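
To show how these attributes might be used, here is a sketch that blocks a list of hosts and strips the cookie header from everything else. BLOCKED_HOSTS, hostname_of, is_blocked and HostFilterProxy are hypothetical names for this example, the host list is a placeholder, and IPv6 literals in netloc are not handled.

```python
try:
    import cherryproxy  # Python 2.x library; may not be installed
except ImportError:
    cherryproxy = None

BLOCKED_HOSTS = ('ads.example.com',)  # placeholder value


def hostname_of(netloc):
    """Strip the optional ':port' suffix from a netloc such as '1.2.3.4:8000'."""
    return netloc.split(':')[0].lower()


def is_blocked(netloc, blocked=BLOCKED_HOSTS):
    """True if the host is a blocked host or one of its subdomains."""
    host = hostname_of(netloc)
    return any(host == b or host.endswith('.' + b) for b in blocked)


if cherryproxy is not None:
    class HostFilterProxy(cherryproxy.CherryProxy):
        def filter_request_headers(self):
            if is_blocked(self.req.netloc):
                # Rejected before the request body is even read.
                self.set_response_forbidden()
                return
            # Header names are lowercase in self.req.headers.
            self.req.headers.pop('cookie', None)
```

Doing the host check in filter_request_headers rather than filter_request means a blocked request is rejected without waiting for its body.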
filter_response(self)
Method to be overridden:

Called to analyse/filter/modify the response received from the server, after reading the full response with its body if there is one, before it is sent back to the client.

This method may call set_response() if the response needs to be blocked (e.g. replaced by a simple response) before being sent to the client.
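
A possible use of filter_response is to block downloads by inspecting the body. The sketch below rejects responses that start with the 'MZ' signature of Windows executables. Note the assumption: this page does not list the response attributes, so self.resp.data as the name of the response body is a guess by analogy with self.req.data and should be checked against the library source. The other names (looks_like_windows_exe, ExeBlockProxy) are invented for the example.

```python
try:
    import cherryproxy  # Python 2.x library; may not be installed
except ImportError:
    cherryproxy = None


def looks_like_windows_exe(data):
    """True if the body starts with 'MZ', the DOS/PE executable signature."""
    # Accept both text and byte strings so the helper works on Python 2 and 3.
    return bool(data) and data[:2] in ('MZ', b'MZ')


if cherryproxy is not None:
    class ExeBlockProxy(cherryproxy.CherryProxy):
        def filter_response(self):
            # ASSUMPTION: the response body is exposed as self.resp.data.
            if looks_like_windows_exe(self.resp.data):
                # Replace the server response with a simple 403.
                self.set_response_forbidden(
                    data='Executable downloads are blocked')
```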
filter_response_headers(self)
Method to be overridden:

Called to analyse/filter/modify the response received from the server, before reading the full response with its body if there is one, before it is sent back to the client.

This method may call set_response() if the response needs to be blocked (e.g. replaced by a simple response) before being sent to the client.
set_response(self, status, reason=None, data=None, content_type='text/plain')
Set an HTTP response to be sent to the client instead of the one from the server.

- status: int, HTTP status code (see RFC 2616)

- reason: str, optional text for the response line, standard text by default

- data: str, optional body for the response, default="status reason"

- content_type: str, content-type corresponding to data
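
For example, set_response can return an HTML page explaining why a request was blocked. In this sketch the page builder build_block_page, the class BlockPageProxy and the '.exe' rule are all hypothetical. Only the set_response call follows the signature documented here.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape  # Python 2

try:
    import cherryproxy  # Python 2.x library; may not be installed
except ImportError:
    cherryproxy = None


def build_block_page(url, why):
    """Return a small HTML page; the URL and reason are HTML-escaped."""
    return ('<html><body><h1>Blocked</h1><p>%s: %s</p></body></html>'
            % (escape(url), escape(why)))


if cherryproxy is not None:
    class BlockPageProxy(cherryproxy.CherryProxy):
        def filter_request_headers(self):
            if self.req.path.lower().endswith('.exe'):
                page = build_block_page(self.req.full_url,
                                        'executable downloads are not allowed')
                # content_type must match the body that is sent.
                self.set_response(403, reason='Forbidden', data=page,
                                  content_type='text/html')
```

Escaping the URL matters because it is controlled by the client and would otherwise be echoed into the page as raw HTML.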
set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
Set an HTTP 403 Forbidden response to be sent to the client instead of the one from the server.

- status: int, HTTP status code (see RFC 2616)

- reason: str, optional text for the response line, standard text by default

- data: str, optional body for the response, default="status reason"

- content_type: str, content-type corresponding to data
start(self)
start proxy server
stop(self)
stop proxy server
