CherryProxy - a filtering HTTP proxy extensible in Python | Decalage
CherryProxy is a simple HTTP proxy written in Python 2.x, based on the CherryPy WSGI server and httplib, extensible for content analysis and filtering.
It has not been designed for operational use and the current version lacks some HTTP features (such as HTTPS support), so some websites will not display properly. However, it should be very useful for testing / demo / prototyping / educational purposes.
Why a new proxy
There are already quite a few HTTP proxies developed in Python, as shown by the excellent list maintained by xhaus. I needed a simple proxy with filtering features, so I looked at several of them. They were either too simple and lacking features, or too complex to extend, so I decided to develop a different one.
I chose the CherryPy WSGI server because it provides a robust, thread-pooled HTTP server with good HTTP 1.1 support.
News:
- 2011-11-22: moved CherryProxy to its own bitbucket project
- 2011-11-15 v0.12: added parent proxy support
Download:
Get the zip archive from here, or use Mercurial to get the latest source code from here.
Install
On Windows, double-click on install.bat. On other systems, run "python setup.py install" from a shell.
License:
Open-source, BSD-style
Usage as a tool (simple proxy):
1) run CherryProxy.py [options]
Options:
  -h, --help            show this help message and exit
  -p PORT, --port=PORT  port for HTTP proxy, 8070 by default
  -a ADDRESS, --address=ADDRESS
                        IP address of interface for HTTP proxy (0.0.0.0 for
                        all, default=localhost)
  -f PROXY, --forward=PROXY
                        Forward requests to parent proxy, specified as
                        hostname[:port] or IP address[:port]
  -v, --verbose
2) set up your browser to use localhost:8070 as proxy
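To check the running proxy without configuring a browser, a script can send requests through it. The following is a minimal sketch using only the standard library; the function names are illustrative and not part of CherryProxy, and it assumes the proxy is listening on the default localhost:8070.

```python
try:
    from urllib.request import ProxyHandler, build_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, build_opener  # Python 2


def make_proxy_opener(host='localhost', port=8070):
    """Return a URL opener that sends plain HTTP requests through the proxy."""
    proxy_url = 'http://%s:%d' % (host, port)
    # Only 'http' is mapped: the current CherryProxy version has no HTTPS support.
    return build_opener(ProxyHandler({'http': proxy_url}))


def fetch_via_proxy(url, host='localhost', port=8070, timeout=10):
    """Fetch url through the proxy and return (status code, body)."""
    response = make_proxy_opener(host, port).open(url, timeout=timeout)
    try:
        return response.getcode(), response.read()
    finally:
        response.close()
```

With CherryProxy.py running, calling fetch_via_proxy('http://example.com/') should return the status code and body as relayed by the proxy.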
Usage in a Python Application:
- import cherryproxy
- create a subclass of cherryproxy.CherryProxy
- implement methods filter_request and/or filter_response to enable filtering as
needed.
- see provided examples
Filtering API:
CherryProxy: class implementing a filtering HTTP proxy
To use it, create a class inheriting from CherryProxy and implement the
methods filter_request and filter_response as desired.
Then call the start method to start the proxy.
Note: the logging module needs to be initialized before creating a
CherryProxy object.
See the example scripts for more information.
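The steps described above could look like the following minimal sketch. The class name BlockingProxy, the helper is_blocked_host and the host list are hypothetical; the fallback to object exists only so the sketch can be imported without the package installed. It assumes start() blocks until the server stops and lets KeyboardInterrupt propagate.

```python
import logging

try:
    import cherryproxy
    ProxyBase = cherryproxy.CherryProxy
except ImportError:
    # Fallback so this sketch can be imported without the package;
    # a working proxy needs the real cherryproxy.CherryProxy base class.
    ProxyBase = object

# Hypothetical block list, for illustration only.
BLOCKED_HOSTS = ('ads.example.com', 'tracker.example.net')


def is_blocked_host(netloc, blocked=BLOCKED_HOSTS):
    """Return True if netloc ('host' or 'host:port') names a blocked host."""
    host = netloc.split(':', 1)[0].lower()
    return host in blocked


class BlockingProxy(ProxyBase):
    """Proxy that answers 403 Forbidden for requests to blocked hosts."""

    def filter_request(self):
        # Called after the full request is read, before it is sent on.
        if is_blocked_host(self.req.netloc):
            self.set_response_forbidden(reason='Host blocked by proxy')


def main():
    # The logging module must be initialized before creating the proxy.
    logging.basicConfig(level=logging.INFO)
    proxy = BlockingProxy(address='localhost', port=8070)
    try:
        proxy.start()
    except KeyboardInterrupt:
        proxy.stop()
```

Calling main() should start the proxy on localhost:8070 and stop it on Ctrl+C. Keeping the decision in a plain function such as is_blocked_host makes the rule easy to test without starting a server.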
- __init__(self, address='localhost', port=8070, server_name='CherryProxy/0.12', debug=False, log_level=20, options=None, parent_proxy=None)
- CherryProxy constructor
address: IP address of interface to listen to, or 0.0.0.0 for all
(localhost by default)
port: TCP port for the proxy (8070 by default)
server_name: server name used in HTTP responses
debug: enable debugging messages if set to True
log_level: logging level (use constants from logging module)
options: None or optparse.OptionParser object to provide additional options
parent_proxy: parent proxy, either IP address or hostname, with optional
port (example: 'myproxy.local:8080')
- filter_request(self)
- Method to be overridden:
Called to analyse/filter/modify the request received from the client,
after reading the full request with its body if there is one,
before it is sent to the server.
This method may call set_response() if the request needs to be blocked
before being sent to the server.
The following attributes can be read and MODIFIED:
self.req.data: data sent with the request (POST or PUT)
(and also all listed in filter_request_headers)
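As an illustration of body filtering, the helper below is a sketch meant to be called from a filter_request() override with self as argument. The keyword list and both function names are hypothetical; only req.data and set_response_forbidden() come from the API described here.

```python
# Hypothetical keywords, for illustration only.
FORBIDDEN_KEYWORDS = ('secret', 'confidential')


def find_forbidden_keyword(data, keywords=FORBIDDEN_KEYWORDS):
    """Return the first keyword found in the request body, or None."""
    if not data:
        return None
    for keyword in keywords:
        if keyword in data:
            return keyword
    return None


def check_request_body(proxy):
    """Block the request if its body (POST or PUT data) has a forbidden keyword.

    Intended use inside a CherryProxy subclass:
        def filter_request(self):
            check_request_body(self)
    """
    if find_forbidden_keyword(proxy.req.data) is not None:
        proxy.set_response_forbidden(reason='Forbidden keyword in request body')
```

The check runs in filter_request() rather than filter_request_headers() because the body has only been read at that stage.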
- filter_request_headers(self)
- Method to be overridden:
Called to analyse/filter/modify the request received from the client,
before reading the full request with its body if there is one,
before it is sent to the server.
This method may call set_response() if the request needs to be blocked
before being sent to the server.
The following attributes can be read and MODIFIED:
self.req.headers: dictionary of HTTP headers, with lowercase names
self.req.method: HTTP method, e.g. 'GET', 'POST', etc
self.req.scheme: protocol from URL, e.g. 'http' or 'https'
self.req.netloc: IP address or hostname of server, with optional
port, for example 'www.google.com' or '1.2.3.4:8000'
self.req.path: path in URL, for example '/folder/index.html'
self.req.query: query string, found after question mark in URL
The following attributes can be READ only:
self.req.environ: dictionary of request attributes following WSGI
format (PEP 333)
self.req.url: partial URL containing 'path?query'
self.req.full_url: full URL containing 'scheme://netloc/path?query'
self.req.length: length of request data in bytes, 0 if none
self.req.content_type: content-type, for example 'text/html'
self.req.charset: charset, for example 'UTF-8'
self.req.url_filename: filename extracted from URL path
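A header-stage filter can act before any request body is read. The sketch below is meant to be called from a filter_request_headers() override with self as argument; the extension list, header list and function names are hypothetical, while req.url_filename, req.headers and set_response_forbidden() are the attributes and method listed here.

```python
# Hypothetical policy values, for illustration only.
BLOCKED_EXTENSIONS = ('.exe', '.scr')
PRIVATE_HEADERS = ('cookie', 'referer')  # header names are lowercase


def has_blocked_extension(filename, extensions=BLOCKED_EXTENSIONS):
    """Return True if filename ends with one of the blocked extensions."""
    return bool(filename) and filename.lower().endswith(extensions)


def remove_headers(headers, names=PRIVATE_HEADERS):
    """Delete the given header names from the dict in place and return it."""
    for name in names:
        headers.pop(name, None)
    return headers


def check_request_headers(proxy):
    """Block unwanted file types, otherwise strip private headers.

    Intended use inside a CherryProxy subclass:
        def filter_request_headers(self):
            check_request_headers(self)
    """
    if has_blocked_extension(proxy.req.url_filename):
        proxy.set_response_forbidden(reason='File type not allowed')
        return
    remove_headers(proxy.req.headers)
```

Rejecting a request at this stage avoids reading a potentially large body that would be discarded anyway.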
- filter_response(self)
- Method to be overridden:
Called to analyse/filter/modify the response received from the server,
after reading the full response with its body if there is one,
before it is sent back to the client.
This method may call set_response() if the response needs to be blocked
(e.g. replaced by a simple response) before being sent to the client.
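As a sketch of response filtering, the helper below replaces downloads that look like Windows executables with a 403 response. It is meant to be called from a filter_response() override with self as argument. Note the assumption: the response body is taken to be available as self.resp.data, an attribute name that is not listed in this documentation, so check the example scripts before relying on it. The function names are hypothetical.

```python
def looks_like_windows_executable(data):
    """Return True if data starts with 'MZ', the DOS/PE executable signature."""
    # On Python 2, bytes is str, so a plain str body is accepted there.
    return isinstance(data, bytes) and data.startswith(b'MZ')


def check_response_body(proxy):
    """Replace executable downloads with a 403 Forbidden response.

    ASSUMPTION: the response body is exposed as proxy.resp.data; this
    attribute name is not confirmed by the documentation above.

    Intended use inside a CherryProxy subclass:
        def filter_response(self):
            check_response_body(self)
    """
    if looks_like_windows_executable(proxy.resp.data):
        proxy.set_response_forbidden(reason='Executable download blocked')
```

Checking the first bytes of the body is more reliable than trusting the declared content-type or the file extension, since both can be set freely by the server.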
- filter_response_headers(self)
- Method to be overridden:
Called to analyse/filter/modify the response received from the server,
before reading the full response with its body if there is one,
before it is sent back to the client.
This method may call set_response() if the response needs to be blocked
(e.g. replaced by a simple response) before being sent to the client.
- set_response(self, status, reason=None, data=None, content_type='text/plain')
- set an HTTP response to be sent to the client instead of the one from
the server.
- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data
- set_response_forbidden(self, status=403, reason='Forbidden', data=None, content_type='text/plain')
- set an HTTP 403 Forbidden response to be sent to the client instead of
the one from the server.
- status: int, HTTP status code (see RFC 2616)
- reason: str, optional text for the response line, standard text by default
- data: str, optional body for the response, default="status reason"
- content_type: str, content-type corresponding to data
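The documented defaults (standard reason text, body equal to "status reason") can be illustrated with a small sketch. This is not CherryProxy's own code: default_reason, default_body and deny_with_html are hypothetical helpers, and the standard phrases are taken from the standard library's table of HTTP responses.

```python
try:
    from http.client import responses  # Python 3
except ImportError:
    from httplib import responses  # Python 2


def default_reason(status):
    """Standard reason phrase for a status code, e.g. 403 -> 'Forbidden'."""
    return responses.get(status, 'Unknown')


def default_body(status, reason=None):
    """Body in the documented default form 'status reason'."""
    if reason is None:
        reason = default_reason(status)
    return '%d %s' % (status, reason)


def deny_with_html(proxy, message):
    """Send a 403 response with an HTML body instead of the default text.

    Meant to be called from a filter method, passing self. The message is
    assumed to be trusted text: no HTML escaping is done here.
    """
    html = '<html><body><h1>%s</h1></body></html>' % message
    proxy.set_response(403, data=html, content_type='text/html')
```

Passing content_type together with data matters when the body is not plain text, since the default content-type is 'text/plain'.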
- start(self)
- start proxy server
- stop(self)
- stop proxy server