protlib - Easily implement binary network protocols

protlib builds on the struct and SocketServer modules in the standard library to make it easy to implement binary network protocols. It provides support for default and constant struct fields, nested structs, arrays of structs, better handling for strings and arrays, struct inheritance, and convenient syntax for instantiating and using your custom structs.

Here’s an example of defining, instantiating, writing, and reading a struct using file i/o:

from protlib import *
class Point(CStruct):
    x = CInt()
    y = CInt()

p1 = Point(5, 6)
p2 = Point(x=5, y=6)
p3 = Point(y=6, x=5)
assert p1 == p2 == p3

with open("point.dat", "wb") as f:
    f.write( p1.serialize() )

with open("point.dat", "rb") as f:
    p4 = Point.parse(f)

assert p1 == p4

You may use the socket.makefile method to apply this file I/O approach to sockets.
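As a rough illustration of the socket.makefile approach, the sketch below uses only the standard library rather than protlib itself. It assumes a Point is laid out as two big-endian 4-byte integers; the helper names send_point and read_point are hypothetical and are not part of protlib.

```python
import socket
import struct

# Assumed wire layout for a Point: two big-endian 4-byte signed integers.
POINT_FORMAT = ">ii"

def send_point(sock, x, y):
    """Pack the two coordinates and write them to the socket."""
    sock.sendall(struct.pack(POINT_FORMAT, x, y))

def read_point(f):
    """Read exactly one packed Point from a file-like object."""
    size = struct.calcsize(POINT_FORMAT)
    buf = f.read(size)
    if len(buf) != size:
        raise ValueError("expected {0} bytes, got {1}".format(size, len(buf)))
    return struct.unpack(POINT_FORMAT, buf)

# A connected pair of sockets stands in for a real client and server.
left, right = socket.socketpair()
send_point(left, 5, 6)
with right.makefile("rb") as f:
    point = read_point(f)
left.close()
right.close()
```

The file object returned by makefile has a read method, which is all a file-based parser needs.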

Installation

protlib is free under the BSD license. It requires Python 2.6 or later and has no other dependencies. Because protlib supports Python 3, the code snippets in this documentation are copied from a Python 3 interpreter.

You may click here to download protlib. You may also run easy_install protlib if you have EasyInstall on your system. The project page for protlib in the Cheese Shop (aka the Python Package Index or PyPI) may be found here.

You may also check out the development version of protlib with this command:

svn checkout http://courtwright.org/svn/protlib

You may download older versions of protlib and view older versions of the protlib documentation here.

Data Types

class CType ( **kwargs )

This is the root class of all classes representing C data types in the protlib library. It may not be directly instantiated; you must always use one of its subtypes instead. A CType accepts the following optional keyword arguments:

  • length: Only valid for the CString, CUnicode, and CArray data types, for which it is required. This may be one of three things: an integer which represents the length of the string; the special value protlib.AUTOSIZED, which indicates that the string is null-terminated and can be any size; or a string denoting the field where the actual length value may be found. For example:

    >>> from protlib import *
    >>> class Person(CStruct):
    ...     state    = CString(length = 2)
    ...     name_len = CShort()
    ...     name     = CString(length = "name_len")
    ...
    >>> Person(state="VA", name_len=3, name="Eli")
    Person(state=b'VA', name_len=3, name=b'Eli')
    >>>
    >>> class Person(CStruct):
    ...     state = CString(length = 2)
    ...     name  = CString(length = AUTOSIZED)
    ...
    >>> Person.parse(b"VAEli\0")
    Person(state=b'VA', name=b'Eli')
    >>> Person(state="VA", name="Eli").serialize()
    b'VAEli\x00'
    
  • always: Use this to set a constant value for a field. You won’t need to specify this value, and a CWarning will be triggered if this field is ever assigned a different value. For example:

    >>> import warnings
    >>> warnings.simplefilter("always")
    >>>
    >>> from protlib import *
    >>> class OriginPoint(CStruct):
    ...     x = CInt(always = 0)
    ...     y = CInt(always = 0)
    ...
    >>>
    >>> op1 = OriginPoint()
    >>> op1
    OriginPoint(x=0, y=0)
    >>> op1.x = 5
    /home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:733: CWarning: OriginPoint.x should always be 0 but was given a value of 5
      warn("{0}.{1} should always be {2!r} but was given a value of {3!r}".format(self.__class__.__name__, name, field.always, value), CWarning)
    >>>
    >>> buf = op1.serialize()
    >>> op2 = OriginPoint.parse(buf)
    /home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:733: CWarning: OriginPoint.x should always be 0 but was given a value of 5
      warn("{0}.{1} should always be {2!r} but was given a value of {3!r}".format(self.__class__.__name__, name, field.always, value), CWarning)
    >>>
    >>> assert op1 == op2
    
  • default: Like the always parameter, except that no warnings are raised when a different value is parsed or serialized. Also, a default parameter may be either a value or a callable object. For example:

    >>> from protlib import *
    >>> class Point(CStruct):
    ...     x = CInt(default = 0)
    ...     y = CInt(default = lambda: 5)
    ...
    >>> p = Point()
    >>> p
    Point(x=0, y=5)
    
  • full_string: Unlike the struct module, protlib right-strips strings when they’re parsed, starting with the first null byte. This default behavior can be overridden by setting this parameter to True. For example:

    >>> raw = b"foo\0\0"
    >>> import struct
    >>> s = struct.unpack(b"5s", raw)[0]
    >>> assert s == b"foo\0\0"
    >>>
    >>> from protlib import *
    >>> s = CString(length = 5).parse(raw)
    >>> assert s == b"foo"
    >>>
    >>> raw = b"foo\0!"
    >>> s = CString(length = 5).parse(raw)
    >>> assert s == b"foo"
    >>>
    >>> raw = b"foo\0!"
    >>> s = CString(length = 5, full_string = True).parse(raw)
    >>> assert s == b"foo\0!"
    
  • encoding: This is required for CUnicode objects but invalid for all other types. It specifies the encoding to use when translating to and from unicode and raw bytes. For example:

    >>> from protlib import *
    >>> CUnicode(length=6, encoding="utf8").serialize("andré")
    b'andr\xc3\xa9'
    >>> assert "andré" == CUnicode(length=6, encoding="utf8").parse(b"andr\xc3\xa9")
    
  • enc_errors: This optional parameter is only valid for CUnicode objects. It defines how encoding errors are handled, e.g. by being passed as the errors argument to the unicode builtin. If omitted, it defaults to “strict”. For example:

    >>> CUnicode(length=3, encoding="utf8", enc_errors="ignore").serialize(b"\x80")
    b'\x00\x00\x00'
    >>> CUnicode(length=3, encoding="utf8", enc_errors="replace").serialize(b"\x80")
    b'\xef\xbf\xbd'
    >>> CUnicode(length=3, encoding="utf8", enc_errors="strict").serialize(b"\x80")
    Traceback (most recent call last):
      File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 374, in serialize
        encoded = self.convert(val).encode(self.encoding, self.enc_errors)
      File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 274, in convert
        return x if isinstance(x, str) else str(_to_bytes(x), self.encoding, self.enc_errors)
      File "/home/eli/protlib/examples/../env3/lib/python3.1/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "", line 1, in 
      File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 378, in serialize
        raise CError(cerror).with_traceback(tb)
      File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 374, in serialize
        encoded = self.convert(val).encode(self.encoding, self.enc_errors)
      File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 274, in convert
        return x if isinstance(x, str) else str(_to_bytes(x), self.encoding, self.enc_errors)
      File "/home/eli/protlib/examples/../env3/lib/python3.1/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    protlib.CError: unicode error serializing b'\x80': 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
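To make the length options described above more concrete, here is a minimal standard-library sketch of the two variable-length layouts: a string whose length is stored in an earlier field, and a null-terminated string. It assumes big-endian packing with no padding; the function names are hypothetical and this is not protlib's own implementation.

```python
import struct

def parse_length_prefixed(buf):
    """Parse a 2-byte state, a 2-byte length, then a name of that length."""
    header = ">2sh"
    state, name_len = struct.unpack_from(header, buf, 0)
    offset = struct.calcsize(header)
    (name,) = struct.unpack_from(">{0}s".format(name_len), buf, offset)
    return state, name_len, name

def parse_null_terminated(buf, offset=0):
    """Return the bytes from offset up to (not including) the first null byte."""
    end = buf.index(b"\0", offset)
    return buf[offset:end]

# A length-prefixed record: state, name_len, then name_len bytes of name.
packed = struct.pack(">2sh3s", b"VA", 3, b"Eli")
person = parse_length_prefixed(packed)

# A null-terminated name following a fixed 2-byte state.
name = parse_null_terminated(b"VAEli\0", offset=2)
```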
    

Warning

The length parameter of the CUnicode class indicates the maximum length of the raw serialized bytes of the CUnicode field. It does not indicate the number of unicode characters. For example, a 5-character unicode string might serialize to more than 5 bytes:

>>> from __future__ import unicode_literals
>>> from protlib import *
>>> CUnicode(length=5, encoding="utf8").serialize("andré")
/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:384: CWarning: CUnicode value has length 5 and was told to serialize an encoded string of length 6 b'andr\xc3\xa9'
  warn("CUnicode value has length {0} and was told to serialize an encoded string of length {1} {2!r}".format(self.real_length(cstruct), len(encoded), encoded), CWarning)
b'andr\xc3'
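The difference between character count and byte count can be checked with plain Python, independent of protlib. This short sketch shows why a 5-byte limit cuts the encoded string in the middle of a character:

```python
text = "andr\u00e9"               # "andré", five characters
encoded = text.encode("utf8")     # é takes two bytes in UTF-8

char_count = len(text)
byte_count = len(encoded)

# Truncating to 5 bytes splits the two-byte sequence for é.
truncated = encoded[:5]

# Decoding the truncated bytes loses the final character entirely.
recovered = truncated.decode("utf8", "ignore")
```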

Warning

Some unicode character encodings commonly contain null bytes, which makes it inadvisable to use those encodings with an AUTOSIZED string. For example:

>>> from protlib import *
>>> s = "Hello World".encode("utf-32")
>>> s.count(b"\0")
35
>>> CUnicode(length=AUTOSIZED, encoding="utf-32").parse(s)
Traceback (most recent call last):
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 366, in parse
    return s.decode(self.encoding, self.enc_errors)
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-1: truncated data
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 370, in parse
    raise CError(cerror).with_traceback(tb)
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 366, in parse
    return s.decode(self.encoding, self.enc_errors)
protlib.CError: unicode error parsing b'\xff\xfe': 'utf32' codec can't decode bytes in position 0-1: truncated data

sizeof

The size of the packed binary data representing this CType. Note that this is a classmethod for subclasses of CStruct.

struct_format

The format string used by the underlying struct module to represent the packed binary data format. Note that this is a classmethod for subclasses of CStruct.

parse ( f )

Accepts either a string or a file-like object (anything with a read method) and returns a Python object with the appropriate value.

>>> raw = b"\x00\x00\x00\x05"
>>> i = CInt().parse(raw)
>>> assert i == 5

Note that this is a classmethod on subclasses of CStruct.
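The "string or file-like object" convention can be sketched with the standard library alone. The helper below is hypothetical and only illustrates the idea of checking for a read method; it assumes a big-endian 4-byte integer, matching the example bytes above.

```python
import io
import struct

def parse_int(source):
    """Parse one big-endian 4-byte integer from bytes or a file-like object."""
    fmt = ">i"
    size = struct.calcsize(fmt)
    if hasattr(source, "read"):
        buf = source.read(size)      # file-like: read exactly what we need
    else:
        buf = source[:size]          # raw bytes: take a slice
    if len(buf) != size:
        raise ValueError("expected {0} bytes, got {1}".format(size, len(buf)))
    return struct.unpack(fmt, buf)[0]

from_bytes = parse_int(b"\x00\x00\x00\x05")
from_file = parse_int(io.BytesIO(b"\x00\x00\x00\x05"))
```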

serialize ( x )

Serializes the value according to the specific CType class. Note that this takes no argument when called on a CStruct instance.

Basic Data Types

Because protlib is built on top of the struct module, each basic data type in protlib uses a struct format string. The list of struct format strings is here and the protlib types which use them are listed below. These sizes are constant on all processor architectures by default, but this will change if you change the value of protlib.BYTE_ORDER.

C data type     protlib class  struct format string      size in bytes
char            CChar          b                         1
unsigned char   CUChar         B                         1
short           CShort         h                         2
unsigned short  CUShort        H                         2
int             CInt           i                         4
unsigned int    CUInt          I                         4
long            CLong          q                         8
unsigned long   CULong         Q                         8
float           CFloat         f                         4
double          CDouble        d                         8
char[]          CString        Xs (e.g. 5s for char[5])  1 * length
char[]          CUnicode       Xs (e.g. 5s for char[5])  1 * length
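The sizes in this table can be checked directly against the struct module. The sketch below assumes that the default byte order corresponds to the ">" prefix (big-endian, standard sizes); the FORMATS dictionary is only an illustration built from the table and is not a protlib object.

```python
import struct

# Format characters taken from the table above.
FORMATS = {
    "CChar": "b", "CUChar": "B",
    "CShort": "h", "CUShort": "H",
    "CInt": "i", "CUInt": "I",
    "CLong": "q", "CULong": "Q",
    "CFloat": "f", "CDouble": "d",
}

# With an explicit byte-order prefix, struct uses standard sizes that do
# not depend on the processor architecture.
sizes = dict((name, struct.calcsize(">" + fmt)) for name, fmt in FORMATS.items())

# A fixed-length string of 5 bytes.
string_size = struct.calcsize(">5s")

# Native mode ("@") uses the platform's own sizes and alignment, so this
# value can differ between machines.
native_long_size = struct.calcsize("@l")
```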

Creating Custom CTypes

Some projects might require you to write custom parsing and serializing code; protlib makes this easy by allowing you to subclass CType classes. Here’s an example, which you can find in examples/ctype_subclassing/testing.py:

import json
from protlib import *

class JsonCString(CString):
    def parse(self, f, cstruct=None):
        return json.loads(CString.parse(self, f, cstruct).decode("utf8"))
    
    def serialize(self, s, cstruct=None):
        return CString.serialize(self, json.dumps(s).encode("utf8"), cstruct)
    
    def convert(self, x):
        return x

class Person(CStruct):
    name = CUnicode(encoding = "utf8", length = 6)
    data = JsonCString(length = AUTOSIZED)

eli = Person("Eli", {"age": 28})
assert eli.data == {"age": 28}
assert eli.serialize() == b'Eli\0\0\0{"age": 28}\0'

This code works in both Python 2 and Python 3 and demonstrates the three methods you can override to define your custom parsing and serialization:

  • The parse method calls json.loads, which requires a unicode string in Python 3 but can take either a unicode or a regular byte string in Python 2. Since CString.parse returns a byte string, we make sure to decode it before passing it to json.loads.
  • The serialize method calls json.dumps, which returns a unicode string in Python 3 and a byte string in Python 2. Because the serialize method must return a byte string, we always encode our result; on Python 3 this will encode as we expect, and when called on a regular byte string in Python 2 the original string is returned.
  • The convert method defines what happens when we assign a value to our struct field. As mentioned in the CStruct.__setattr__ documentation below, protlib automatically does type coercion, so if you assign 5 to a CString field it will be converted to "5", etc. This behavior is defined by the convert method, and in our case if someone assigns a value to a JsonCString field we don’t want that value to be converted to a string as it would for a regular CString, so we simply return the object unchanged.
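The decode and encode steps described above can be tried without protlib. This is a minimal sketch of the same round trip for a null-terminated JSON payload; the function names are hypothetical.

```python
import json

def serialize_json(obj):
    """Encode an object as UTF-8 JSON followed by a null terminator."""
    return json.dumps(obj).encode("utf8") + b"\0"

def parse_json(buf):
    """Decode the bytes before the first null byte and load them as JSON."""
    end = buf.index(b"\0")
    return json.loads(buf[:end].decode("utf8"))

raw = serialize_json({"age": 28})
data = parse_json(raw)
```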

Arrays

class CArray ( length, ctype )

You can make an array of any CType. Arrays pack and unpack to and from Python lists. For example:

>>> ca = CArray(5, CInt)
>>> raw = ca.serialize( [5,6,7,8,9] )
>>> xs = ca.parse(raw)
>>> assert xs == [5,6,7,8,9]
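At the struct level, an array of five integers is simply five integer fields packed back to back. The sketch below shows this with the standard library only, assuming big-endian 4-byte integers; the helper names are hypothetical.

```python
import struct

def serialize_ints(xs):
    """Pack a list of integers as consecutive big-endian 4-byte values."""
    return struct.pack(">{0}i".format(len(xs)), *xs)

def parse_ints(buf, count):
    """Unpack count consecutive big-endian 4-byte integers into a list."""
    return list(struct.unpack(">{0}i".format(count), buf))

raw = serialize_ints([5, 6, 7, 8, 9])
xs = parse_ints(raw, 5)
```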

Arrays may either be given default/always values themselves or use the default/always values of the CType they are given. For example:

>>> class Triangle(CStruct):
...     xcoords = CArray(3, CInt(default=0))
...     ycoords = CArray(3, CInt, default=[0,0,0])
...
>>> tri = Triangle()
>>> assert tri.xcoords == tri.ycoords == [0,0,0]

Nested arrays work as you might expect:

>>> class Matrix(CStruct):
...     xs = CArray(3, CArray(2, CInt(default=0)))
...
>>> assert Matrix().xs == [[0,0], [0,0], [0,0]]

Custom Structs

class CStruct

This should never be instantiated directly. Instead, you should subclass this when defining a custom struct. Your subclass will be given a constructor which takes the fields of your struct as positional and/or keyword arguments. However, you don’t have to provide values for your fields at this time. For example:

>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> p1 = Point(5, 6)
>>> p2 = Point()
>>> p2.x = 5
>>> p2.y = 6
>>> assert p1 == p2

classmethod sizeof ( cstruct = None )

Returns the size of the packed binary data needed to hold this CStruct. This method takes no arguments on a fixed-size struct, but if any of this struct’s fields has a variable length, this method will throw an exception if called with no arguments. You can pass an instance of this CStruct to get the size of that particular instance, for example:

>>> from protlib import *
>>>
>>> class Person(CStruct):
...     name = CString(length = 5)
...
>>> Person.sizeof()
5
>>>
>>> class Person(CStruct):
...     name = CString(length = AUTOSIZED)
...
>>> Person.sizeof()
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 654, in sizeof
    return cls.get_type(cached=True).sizeof(cstruct)
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 254, in sizeof
    return struct.calcsize(BYTE_ORDER + self.struct_format(cstruct))
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 486, in struct_format
    return b"".join(ctype.struct_format(cstruct) for name,ctype in self.subclass.get_fields())
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 486, in 
    return b"".join(ctype.struct_format(cstruct) for name,ctype in self.subclass.get_fields())
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 245, in struct_format
    CString:  _to_bytes("{0}s".format(self.real_length(cstruct))),
  File "/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py", line 210, in real_length
    raise CError("cstruct not provided to resolve variable-length field with length attribute {0!r}".format(self.length))
protlib.CError: cstruct not provided to resolve variable-length field with length attribute 'AUTOSIZED'
>>>
>>> eli = Person(name = "Eli")
>>> Person.sizeof(eli)
4

classmethod parse ( f )

Accepts a string or file-like object and returns an instance of this CStruct drawn from that data source.

serialize ( )

Returns the packed binary data representing this CStruct. This is what should be written to files and sockets.

__str__ ( )

Alias for __repr__

__repr__ ( )

Returns a literal representation of the CStruct. For example:

>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> p = Point(x=5, y=6)
>>> p
Point(x=5, y=6)

__setattr__ ( self, name, val )

When you assign a value to one of a struct’s fields, protlib converts the value to the proper data type for that field. For example:

>>> class Point(CStruct):
...     code = CChar()
...     x = CInt()
...     y = CInt()
...
>>> p = Point(code="A", x="5")
>>> assert p.code == ord("A") == 65
>>> assert p.x == 5
>>>
>>> p.y = 6.25
/home/eli/protlib/env3/lib/python3.1/site-packages/protlib-1.4-py3.1.egg/protlib.py:746: CWarning: Loss of precision when converting a float (6.25) to an integer field
  warn("Loss of precision when converting a float ({0}) to an integer field".format(x), CWarning)
>>> assert p.y == 6

classmethod get_type ( **kwargs )

Returns an object which may be used to declare a CStruct as a field in another CStruct. This accepts the same default and always parameters as the CType constructor. For example:

>>> class Point(CStruct):
...     x = CInt()
...     y = CInt()
...
>>> class Vector(CStruct):
...     p1 = Point.get_type()
...     p2 = Point.get_type(default = Point(0,0))
...
>>> v = Vector(p1 = Point(5,6))

classmethod get_fields ( )

Returns a list of the CType objects which define the fields of this struct in the order in which they were declared.

Warning

The order of struct fields is defined by the order in which the CType subclasses for those fields were instantiated. In other words, if you say

from protlib import *

y_field = CInt()
x_field = CInt()

class Point(CStruct):
    x = x_field
    y = y_field

then when you serialize your struct, the y field will come before the x field because its CInt value was instantiated first. Similarly, if you say

from protlib import *

class Point(CStruct):
    x = y = CInt()

then the order of the x and y fields is undefined since they share the same CInt instance. In this second case, a CWarning will be triggered, but the first case is not automatically detected by the protlib library.
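One common way to obtain instantiation-order behavior like this, which may or may not match protlib's internals, is a counter that is incremented each time a field object is created. The sketch below uses a hypothetical Field class to show why the first case above orders y before x.

```python
import itertools

class Field(object):
    """Hypothetical field type that records when it was instantiated."""
    _counter = itertools.count()

    def __init__(self):
        self.order = next(Field._counter)

def ordered_field_names(cls):
    """Return the names of a class's Field attributes by instantiation order."""
    fields = [(value.order, name)
              for name, value in vars(cls).items()
              if isinstance(value, Field)]
    return [name for order, name in sorted(fields)]

y_field = Field()   # instantiated first
x_field = Field()   # instantiated second

class Point(object):
    x = x_field     # declared first in the class body
    y = y_field

names = ordered_field_names(Point)
```

Because the ordering key belongs to the field object rather than to the class body, two names bound to the same object cannot be told apart, which is consistent with the second case being undefined.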

Protocol Handlers

protlib also provides a convenient framework for implementing servers which receive and send CStruct objects. This makes it easy to implement custom binary protocols in which structs are passed back and forth over socket connections. This is based on the SocketServer module in the Python standard library.

In order to use these examples, you must do only two things.

  • First, make sure that each struct which represents a message begins with a constant value which uniquely identifies that struct.
  • Second, define a subclass of the appropriate handler class, either TCPHandler or UDPHandler, and define a handler method for each message type you wish to respond to.

An example client/server

Let’s walk through a simple example. We’ll define several structs to represent geometric concepts: a Point, a Vector, and a Rectangle. Each of these structs is a message which can be sent between the client and server. We’ll also define a variable-length message called PointGroup, which demonstrates using variable-length arrays.

Note that the first field in each of these messages is a constant value that uniquely identifies the message.

This entire example can be found in the examples/geocalc directory. Here’s the common.py file, which is imported by both the server.py and client.py programs:

import logging
logging.basicConfig(level = logging.INFO)

from protlib import *

SERVER_ADDR = ("127.0.0.1", 32123)

class Point(CStruct):
    code = CShort(always = 1)
    x    = CFloat()
    y    = CFloat()

class Vector(CStruct):
    code = CShort(always = 2)
    p1   = Point.get_type()
    p2   = Point.get_type()

class Rectangle(CStruct):
    code   = CShort(always = 4)
    points = CArray(4, Point)

class PointGroup(CStruct):
    code   = CShort(always = 3)
    count  = CInt()
    points = CArray("count", Point)
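To see why the leading constant matters, here is a standard-library sketch of identifying a message from its first two bytes. It assumes big-endian packing; the MESSAGE_NAMES table and the identify function are hypothetical and simply mirror the codes declared above. The sample bytes are the serialized Point that appears in the raw log example later in this documentation.

```python
import struct

# Codes copied from the struct definitions above.
MESSAGE_NAMES = {1: "Point", 2: "Vector", 3: "PointGroup", 4: "Rectangle"}

def identify(buf):
    """Return the message name for the 2-byte code at the start of buf."""
    (code,) = struct.unpack_from(">h", buf, 0)
    return MESSAGE_NAMES.get(code)

raw_point = b"\x00\x01B\x84\x00\x00A\xd8\x00\x00"
name = identify(raw_point)

# A Point is a short followed by two floats: 2 + 4 + 4 = 10 bytes.
code, x, y = struct.unpack(">hff", raw_point)

# Data that does not start with a known code cannot be matched to a message.
unknown = identify(b"invalid but with enough data")
```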

For our server, we define a handler class with a handler method for each message we wish to accept. The name of each handler method should be the name of the message class in lower case with the words separated by underscores. For example, the Vector class is handled by the vector method, and the PointGroup class is handled by the point_group method. Each of these handler methods takes a single parameter other than self, which is the actual message read and parsed from the socket.
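The naming rule can be expressed as a small function. This is only a sketch of the convention as described; the handler_name function is hypothetical, and how protlib itself treats class names containing acronyms or digits is not covered here.

```python
import re

def handler_name(class_name):
    """Convert a CamelCase class name to lower case with underscores."""
    # Insert an underscore before every capital letter except the first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()

vector_method = handler_name("Vector")
group_method = handler_name("PointGroup")
```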

Here’s the server.py file, which uses our subclasses of the SocketServer module classes to accept and handle incoming messages:

from math import sqrt

from common import *

class Handler(TCPHandler):
    LOG_TO_SCREEN = True
    
    def vector(self, v):
        """returns the mid-point of the line segment"""
        return Point(x = (v.p1.x + v.p2.x) / 2,
                     y = (v.p1.y + v.p2.y) / 2)
    
    def rectangle(self, r):
        """returns the endpoint closest to the origin"""
        # use a key function so that points are never compared to each other on ties
        return min(r.points, key=lambda p: sqrt(p.x**2 + p.y**2))
    
    def point_group(self, pg):
        """returns a rectangle which encompasses all points in the group"""
        xmin = min(p.x for p in pg.points)
        xmax = max(p.x for p in pg.points)
        ymin = min(p.y for p in pg.points)
        ymax = max(p.y for p in pg.points)
        return Rectangle(points=[
            Point(x=xmin, y=ymin), Point(x=xmin, y=ymax),
            Point(x=xmax, y=ymin), Point(x=xmax, y=ymax)
        ])

if __name__ == "__main__":
    LoggingTCPServer(SERVER_ADDR, Handler).serve_forever()

To test this server, we have a simple client which sends a series of messages to the server and then reads back the responses, logging everything with our protlib.Logger class. Here’s our client.py script:

import socket
from random import randrange

from common import *

def rand_point():
    return Point(x=randrange(100), y=randrange(100))

logger = Logger(also_print = True)
parser = Parser(logger)
sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("rwb", 0)

vec = Vector(p1=rand_point(), p2=rand_point())
logger.log_and_write(f, vec)
pt = parser.parse(f)
# non-strict comparisons, since the two random points may share a coordinate
assert vec.p1.x <= pt.x <= vec.p2.x or vec.p1.x >= pt.x >= vec.p2.x
assert vec.p1.y <= pt.y <= vec.p2.y or vec.p1.y >= pt.y >= vec.p2.y

rect = Rectangle(points=[Point(x=1, y=1),
                         Point(x=1, y=5),
                         Point(x=5, y=1),
                         Point(x=5, y=5)])
logger.log_and_write(f, rect)
pt = parser.parse(f)
assert pt.x == pt.y == 1

points = [rand_point() for i in range(10)]
logger.log_and_write(f, PointGroup(count=10, points=points))
rect = parser.parse(f)
assert rect.code == Rectangle.code.always

sock.close()

Our server does all of our logging automatically, but we need to manually invoke the logger on the client. The logs created and their format are explained below.

Logging

protlib uses the logging module to provide 5 different logs, each with its own suffix: hex, raw, struct, error, and stack. By default, the prefix of these logs will be the name of the current script. A RotatingFileHandler is created for each of these logs if no handlers already exist when the logs are first accessed by protlib.

For example, if you’re running the script server.py then these will be the log names, log file names, logging level used for the log messages, and type of messages written to each log:

log name       default filename    level     messages
server.hex     server.hex_log      DEBUG     nicely formatted hex dumps of the binary data sent and received
server.raw     server.raw_log      INFO      Python string literals of the binary data sent and received
server.struct  server.struct_log   WARNING   literal representations of each struct sent and received
server.error   server.error_log    ERROR     error messages
server.stack   server.stack_log    CRITICAL  stack traces of uncaught exceptions thrown by handler methods

Each log message generated by one of our protocol handlers contains a unique identifier which indicates the binary protocol message received. This makes it easy to match the log messages in the different files to one another, since this unique message identifier will be present in each of the 5 logs.

Log examples

Here’s a description of each log:

struct

This contains the literal representation of each request and response, for example:

2010-03-15 18:54:07,664: (1268693647_0) received Vector(code=2, p1=Point(code=1, x=39.0, y=41.0), p2=Point(code=1, x=93.0, y=13.0))
2010-03-15 18:54:07,664: (1268693647_0) sending Point(code=1, x=66.0, y=27.0)

This is convenient because the structs are logged with the Python code which represents them. Therefore we can paste them directly into a Python command prompt to inspect and play around with them:

>>> from common import *
>>> p = Point(code=1, x=66.0, y=27.0)
>>> p
Point(code=1, x=66.0, y=27.0)

raw

This contains the raw data in the form of a Python string of each request and response, for example:

2010-03-15 18:54:07,664: (1268693647_0) sending b'\x00\x01B\x84\x00\x00A\xd8\x00\x00'
2010-03-15 18:54:07,667: (1268693647_1) received b'\x00\x04\x00\x01?\x80\x00\x00?\x80\x00\x00\x00\x01?\x80\x00\x00@\xa0\x00\x00\x00\x01@\xa0\x00\x00?\x80\x00\x00\x00\x01@\xa0\x00\x00@\xa0\x00\x00'

This is convenient because we can paste these strings into a Python command prompt and play around with them. If they are valid then we can parse them into structs, and if they aren’t then we can examine exactly why; this log will always contain what we receive even in the case of unparsable binary data:

>>> from common import *
>>> s = b'\x00\x01B\x84\x00\x00A\xd8\x00\x00'
>>> p = Point.parse(s)
>>> p
Point(code=1, x=66.0, y=27.0)
>>>
>>> s = b"bad"
>>> p = Point.parse(s)
Traceback (most recent call last):
  File "", line 1, in 
  File "protlib.py", line 230, in parse
    return cls.get_type(cached=True).parse(f)
  File "protlib.py", line 141, in parse
    raise CError("{0} requires {1} bytes and was only given {2} ({3!r})".format(self.subclass.__name__, self.sizeof, len(buf), buf))
protlib.CError: Point requires 10 bytes and was only given 3 ('bad')
>>>
>>> s = b"invalid but with enough data"
>>> p = Point.parse(s)
../../protlib.py:526: CWarning: Point.code should always be 1 but was given a value of 26990
  warn("{0}.{1} should always be {2!r} but was given a value of {3!r}".format(self.__class__.__name__, name, field.always, value), CWarning)
>>> p
Point(code=26990, x=1.1430328245747994e+33, y=1.1834294514326081e+22)

hex

This contains nicely-formatted tables of the binary data sent and received in hexadecimal notation. For example:

2010-03-15 18:38:50,978: (1268692730_0) received
     0  1  2  3  4  5  6  7
  0  00 02 00 01 42 30 00 00
  8  42 74 00 00 00 01 42 aa
 16  00 00 42 18 00 00
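A table in this style is straightforward to produce. The following sketch is not protlib's own formatter; the hex_table function is hypothetical and the exact column spacing is an approximation of the layout shown above.

```python
def hex_table(data, width=8):
    """Format binary data as rows of hex bytes, each prefixed by its offset."""
    # Header row: the column index of each byte within a row.
    rows = ["     " + "  ".join(str(i) for i in range(width))]
    data = bytearray(data)
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join("{0:02x}".format(b) for b in chunk)
        rows.append("{0:>3}  {1}".format(offset, hex_bytes))
    return "\n".join(rows)

# The 22 bytes of the Vector message from the example above.
sample = bytes.fromhex(
    "00 02 00 01 42 30 00 00 42 74 00 00 00 01 42 aa 00 00 42 18 00 00"
)
table = hex_table(sample)
lines = table.split("\n")
```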

error

This contains messages for common errors, such as when a message is too short, or when we have no handler to match a message we’ve received, etc. These messages contain as much information as possible to help reconstruct the problem, which usually includes the raw data involved (also present in the raw log).

stack

This contains stack traces from exceptions thrown in your handler methods.

Logger objects

Although logging is performed automatically when using SocketServer classes, you may find it useful to instantiate your own logger objects, then manually make use of the 5 logs listed above. Use this object to do that; note that this class uses but does not inherit from the logging.Logger class.

class Logger ( [ prefix [, also_print=False ] ] )

A logging object which uses the 5 logs listed above.

Parameters:
  • prefix – Pass a string as this parameter to replace the default prefix (which is the name of the script being executed). For example, if you pass the string "foo" as this parameter, then your logs will be named foo.hex, foo.raw, etc.
  • also_print – whether to also print log messages to the screen

log_struct ( inst [, trans_type="received" ] )

Logs the repr of an instance of a CStruct subclass to the struct log.

Parameters:
  • inst – the instance of the struct to be logged
  • trans_type – a prefix to the log message, generally this should be either "sending" or "received"

log_binary ( data [, trans_type="received" ] )

Logs the repr of the packed binary data to the raw log, then logs a nicely formatted table of the data to the hex log.

Parameters:
  • data – the packed binary data, such as what’s produced by calling s.serialize() on an instance of a CStruct subclass
  • trans_type – a prefix to the log message, generally this should be either "sending" or "received"

log_error ( message, *args, **kwargs )

Logs the message to the error log. The message parameter should be a string, and the *args and **kwargs to this method are used as the parameters to str.format.
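The way positional and keyword arguments feed into str.format can be shown in isolation. The format_error function below is a hypothetical stand-in that only builds the message text; it does not write to any log.

```python
def format_error(message, *args, **kwargs):
    """Build an error message the way str.format combines its arguments."""
    return message.format(*args, **kwargs)

text = format_error("{0} requires {size} bytes and was only given {1}",
                    "Point", 3, size=10)
```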

log_stacktrace ( )

Logs the value of traceback.format_exc() to the stack log.

log_and_write ( f, data )

Logs a string or CStruct instance to the appropriate logs, then writes it to a file.

Parameters:
  • f – a file object to which data will be written
  • data – a string or CStruct instance

Advanced logging

As mentioned above, protlib automatically sets up a RotatingFileHandler when you instantiate protlib.Logger on each of the 5 logs for which no other logging handlers are defined. Because protlib uses the logging module from the standard library, you can use your own configuration, handlers, formatters, etc. This is demonstrated by the following example, which is included as the file examples/custom_logging/testing.py, although you’ll need to replace the string "smtp.example.com" with a valid outgoing mail server for the code to run properly.

import sys
import time
import logging
from logging.handlers import SMTPHandler, TimedRotatingFileHandler

from protlib import *

class Point(CStruct):
    code = CShort(always = 0x1234)
    x = CInt()
    y = CInt()

logging.basicConfig(level = logging.DEBUG)

trfh = TimedRotatingFileHandler("testing.rotating_log", "s", 1)
logging.getLogger("testing.hex").addHandler(trfh)

logger = Logger()
parser = Parser(logger)

smtp = SMTPHandler("smtp.example.com", "[email protected]", ["[email protected]"], "Stack Trace")
logging.getLogger("testing.stack").addHandler(smtp)

if __name__ == "__main__":
    # open the file in binary mode, since the serialized struct is raw bytes
    with open("point.dat", "wb") as f:
        p1 = Point(x=5, y=6)
        logger.log_and_write(f, p1)
    
    time.sleep(2)
    
    with open("point.dat", "rb") as f:
        p2 = parser.parse(f)
    
    try:
        Point(x = "not an integer")
    except CError:
        logger.log_stacktrace()

Here’s an explanation of the customizations made to our logging:

  • The logging level is set to logging.DEBUG, which differs from the default value of logging.WARNING.
  • We use a TimedRotatingFileHandler for our hex log. Because we add this handler before instantiating protlib.Logger, this handler is used instead of the default RotatingFileHandler.
  • We use a SMTPHandler for our stack log. Because we add this handler after instantiating protlib.Logger, this is used in addition to the default RotatingFileHandler.
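The "before versus after" distinction follows from a simple rule: a default handler is attached only when a log has no handlers yet. The sketch below reproduces that rule with the standard logging module. The ensure_default_handler function and the sketch.* log names are hypothetical, and NullHandler stands in for the RotatingFileHandler so that no files are created.

```python
import logging

def ensure_default_handler(name, make_default):
    """Attach a default handler to the named log only if it has none."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        logger.addHandler(make_default())
    return logger

# Custom handler added BEFORE the default would be set up:
# the default is skipped, so the custom handler is the only one.
custom_hex = logging.NullHandler()
logging.getLogger("sketch.hex").addHandler(custom_hex)
hex_log = ensure_default_handler("sketch.hex", logging.NullHandler)

# Custom handler added AFTER the default was set up:
# both handlers end up attached.
stack_log = ensure_default_handler("sketch.stack", logging.NullHandler)
custom_stack = logging.NullHandler()
stack_log.addHandler(custom_stack)
```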

Protocol Handler Classes

As mentioned above, you should always have your protocol classes extend either the TCPHandler or UDPHandler class, depending on what type of SocketServer you’re using. Each of these classes inherits from ProtHandler, and you may use these methods and fields to affect the behavior of your custom protocol handlers:

class ProtHandler

The user does not instantiate this class or any of its subclasses directly. Instead, you declare your own handler class which subclasses either TCPHandler or UDPHandler, which are themselves subclasses of ProtHandler. They also extend the StreamRequestHandler and DatagramRequestHandler classes of the SocketServer module, respectively.

This class also inherits from the protlib.Logger class, so you can call the log functions listed above from your handler methods by simply calling self.log_stack(), self.log_error("Boo!"), etc.

STRUCT_MOD

By default, your handler will detect all messages present in the same module where the handler class itself is defined. So you can either define your handler in the same module where your structs are defined, or you can import those structs into the handler’s module. This is the recommended way to integrate your handlers with your struct definitions.

However, you may instead set the STRUCT_MOD field to the module where the structs are declared. (Technically this can be anything with __dict__ and __name__ fields.) You may also set this to a string which is the name of the module where they are declared. For example:

import module_with_structs

class SomeHandler(TCPHandler):
    STRUCT_MOD = module_with_structs

    # handler methods would go here

class AnotherHandler(UDPHandler):
    STRUCT_MOD = "module_with_structs"

    # handler methods would go here
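
Accepting either a module object or a module name can be sketched with the standard library as follows. This is only an illustration of the idea, not protlib's own lookup code; the helper names resolve_struct_module and classes_defined_in are hypothetical, and the collections module stands in for a module containing struct definitions.

```python
import collections
import importlib
import inspect

def resolve_struct_module(struct_mod):
    """Accept a module object or a module name and return the module."""
    if isinstance(struct_mod, str):
        return importlib.import_module(struct_mod)
    return struct_mod

def classes_defined_in(module):
    """Return the sorted names of all classes visible in the module's namespace."""
    return sorted(name for name, obj in vars(module).items()
                  if inspect.isclass(obj))

# Both spellings resolve to the same module object.
by_name = resolve_struct_module("collections")
by_object = resolve_struct_module(collections)
print(by_name is by_object)
print("OrderedDict" in classes_defined_in(by_name))
```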

LOG_TO_SCREEN

This is False by default, but if set to True, every log message will be printed to the screen in addition to being written to the appropriate log.

LOG_PREFIX

Changes the prefix of each log from the name of the current script to whatever is specified. For example, if you set the LOG_PREFIX to "foo", then your logs will be foo.hex, foo.raw, etc.

These attributes are best set where your custom handler class is defined, for example:

class Handler(TCPHandler):
    LOG_TO_SCREEN = True
    LOG_PREFIX = "unified"

    # handler methods would go here

raw_data ( data )

This is the default handler for any message for which no struct has been defined. By default this logs an error message and sends no reply. Override this if you wish to have your own handler for unclassified binary messages; the data parameter is a string containing the binary data of the message.

reply ( data )

Anything you return from a handler method is sent back to the client, whether it’s a struct or just binary data in a string. However, sometimes you may need to send multiple messages back to the client. You can manually concatenate the binary data strings, or you can use the reply method, for example:

class RepeatRequest(CStruct):
    code = CShort(always = 1)
    name = CString(length = 25)
    repetitions = CInt()

class Handler(TCPHandler):
    def repeat_request(self, rr):
        # send one greeting per requested repetition
        for i in range(rr.repetitions):
            self.reply(b"Hello " + rr.name + b"!\n")

class LoggingTCPServer ( addr, handler_class )

class LoggingUDPServer ( addr, handler_class )

These classes extend the TCPServer and the UDPServer classes from the SocketServer module, respectively. There are only two differences between these and their parent classes:

  • The allow_reuse_address field is set to True for these classes.
  • When your protocol handler is used with one of these classes, the logging level of the default RotatingFileHandler objects is set to INFO. When it’s used with other classes, it’s set to CRITICAL + 1. Note that this is the level of the handlers, which is independent of the level of the loggers themselves, as explained here.

So basically, using these classes simply provides sensible default settings for your logs and sockets.
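
The distinction between handler levels and logger levels comes from the standard logging module and can be shown without protlib. In the minimal sketch below, the ListHandler class and the logger name are hypothetical; a handler set to CRITICAL + 1 stays silent even though its logger accepts every message.

```python
import logging

class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of writing them anywhere."""
    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

log = logging.getLogger("sketch.levels")
log.setLevel(logging.DEBUG)      # the logger itself lets everything through
log.propagate = False            # keep the root logger out of the picture

silent = ListHandler(logging.CRITICAL + 1)   # higher than any standard level
verbose = ListHandler(logging.INFO)
log.addHandler(silent)
log.addHandler(verbose)

log.critical("boom")
log.debug("detail")

print(silent.messages, verbose.messages)
```

The silent handler records nothing, and the INFO handler records only the critical message because the debug message falls below its threshold.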

class Parser ( [ logger [, module ] ] )

If you know what struct you want, then you can use the CStruct.parse classmethod to read an instance of that struct from a file, e.g. p = Point.parse(f). However, in some cases you want to read some data from a file or socket but aren’t sure what message is coming across. This class’s parse method figures out which message is being read and returns an instance of the correct struct.

Parameters:
  • module – This is exactly the same as the ProtHandler.STRUCT_MOD field; if present then it indicates which module contains the struct classes you want to use. If omitted, then the module where this class is instantiated is used.
  • logger – The instance of the Logger class to use to perform logging. If omitted, the logging level of each default RotatingFileHandler will be CRITICAL + 1.

parse ( f )

This method accepts a string or file and returns an instance of the struct it reads from that string/file. If the data it finds cannot be parsed into a struct, then it just returns all of the data it is able to read. This may be an empty string if no data is available. Any data returned will be written to the appropriate logs.

None will be returned in the case of an incomplete message. In this case a message will be written to the error log.
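
The three outcomes described above (a parsed struct, raw data, or None) can be pictured with a small dispatcher built on the struct module. This is a sketch under stated assumptions rather than protlib's algorithm: it assumes every message starts with a constant 2-byte code, and the MESSAGES registry and the parse_message function are hypothetical names.

```python
import io
import struct

# Hypothetical registry: constant leading code -> (name, format of the remaining fields)
MESSAGES = {
    0x1234: ("Point", "!ii"),
}

def parse_message(f):
    """Return (name, fields), the raw bytes for unknown data, or None if truncated."""
    header = f.read(2)
    if len(header) < 2:
        return header                    # nothing recognizable; may be b""
    (code,) = struct.unpack("!H", header)
    if code not in MESSAGES:
        return header + f.read()         # unclassified data is returned as-is
    name, fmt = MESSAGES[code]
    size = struct.calcsize(fmt)
    body = f.read(size)
    if len(body) < size:
        return None                      # recognized message, but incomplete
    return name, struct.unpack(fmt, body)

data = struct.pack("!Hii", 0x1234, 5, 6)
print(parse_message(io.BytesIO(data)))
print(parse_message(io.BytesIO(data[:5])))
```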

Struct Inheritance

Many binary protocols have many message types, but every message has exactly the same fields, even if those fields have different constant values. It would be annoying if you had to write a bunch of mostly-identical struct definitions, so protlib lets you subclass your custom structs and only override the fields which are different in some way, such as having a default value in some subclasses but not others, etc.

Let’s walk through a simple example, which is available in the examples/struct_inheritance directory. First, we define our messages in common.py:

from random import randrange
from datetime import datetime

from protlib import *

SERVER_ADDR = ("127.0.0.1", 5665)

class Message(CStruct):
    code      = CInt()
    timestamp = CString(length=20, default=lambda: datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    comment   = CString(length=100, default="")
    params    = CArray(20, CInt(default=0))

class ErrorMessage(Message): code = CInt(always = 0)
class CCRequest(Message):    code = CInt(always = 1)
class CCResponse(Message):   code = CInt(always = 2)
class ZipRequest(Message):   code = CInt(always = 3)
class ZipResponse(Message):  code = CInt(always = 4)

In this case we have a standard message format, and the only thing that varies is the value of the code field, so we need only specify that field in our subclasses. If we needed to override other fields, we could do so in any order; the order of fields would remain as declared in the parent class.

Since these messages all have different constant values in their first field, we can write a normal handler class in our server.py:
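
The rule that overridden fields keep the parent's order can be illustrated with ordinary dictionaries, which preserve insertion order in Python 3.7 and later. This is a conceptual sketch only: merge_fields is a hypothetical helper, and the field specifications are plain strings rather than real protlib types.

```python
def merge_fields(parent_fields, overrides):
    """Return the parent's fields with overrides applied, keeping the parent's order."""
    merged = dict(parent_fields)
    for name, spec in overrides.items():
        if name not in merged:
            raise KeyError("unknown field: " + name)
        merged[name] = spec              # replacing a value keeps the key's position
    return merged

parent = {
    "code": "CInt()",
    "timestamp": "CString(length=20)",
    "comment": "CString(length=100)",
    "params": "CArray(20, CInt())",
}

# Overrides are given in a different order than the parent declares them.
child = merge_fields(parent, {
    "comment": "CString(length=100, default='n/a')",
    "code": "CInt(always=1)",
})

print(list(child))
```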

from common import *

def credit_card_lookup(ssn):
    if ssn != [0] * 9:
        return [randrange(10) for i in range(12)]

def zip_lookup(ssn):
    if ssn != [0] * 9:
        return [randrange(10) for i in range(5)]

class Handler(TCPHandler):
    LOG_TO_SCREEN = True
    
    def cc_request(self, ccr):
        """return the credit card number of the person with the given SSN"""
        ssn = ccr.params[:9]
        cc_num = credit_card_lookup(ssn)
        if cc_num:
            return CCResponse(params = cc_num)
        else:
            return ErrorMessage(params=ssn, comment="No matching SSN")
    
    def zip_request(self, zr):
        """return the zip code of the person with the given SSN"""
        ssn = zr.params[:9]
        zip_code = zip_lookup(ssn)
        if zip_code:
            return ZipResponse(params = zip_code)
        else:
            return ErrorMessage(params=ssn, comment="No matching SSN")

if __name__ == "__main__":
    LoggingTCPServer(SERVER_ADDR, Handler).serve_forever()

Since our handler can return different types of messages depending on whether our lookup was successful, our client.py uses the Parser class to parse all incoming messages:

import socket

from common import *

logger = Logger(also_print = True)
parser = Parser(logger)

def rand_ssn():
    return [randrange(10) for i in range(9)]

sock = socket.create_connection(SERVER_ADDR)
f = sock.makefile("rwb", 0)

logger.log_and_write(f, CCRequest(params=rand_ssn()))
ccresp = parser.parse(f)
assert ccresp.code == CCResponse.code.always

logger.log_and_write(f, ZipRequest(params=rand_ssn()))
zresp = parser.parse(f)
assert zresp.code == ZipResponse.code.always

logger.log_and_write(f, ZipRequest())
err = parser.parse(f)
assert err.code == ErrorMessage.code.always

sock.close()

Miscellaneous classes, methods, and constants

class CError

All exceptions raised by the protlib module will be instances of this class, which extends BaseException.

class CWarning

All warnings triggered by the protlib module will be instances of this class, which extends UserWarning.

underscorize ( name )

This is the function used to convert between camelCased and separated_with_underscores names. Pass it a string and it returns an all-lower-case string with underscores inserted in the appropriate places. You never have to call this method yourself, but you can use it as a test if you’re unsure of the correct handler method name for one of your CStruct classes. If your struct names are already lower case then this function will just return the original string, whether or not you are already using underscores. To make things even clearer, here are some examples:

SomeStruct    -> some_struct
SSNLookup     -> ssn_lookup
RS485Adaptor  -> rs485_adaptor
Rot13Encoded  -> rot13_encoded
RequestQ      -> request_q
John316       -> john316
rs485adaptor  -> rs485adaptor
rot13_encoded -> rot13_encoded
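
For readers curious how such a conversion can work, the two-pass regular expression below reproduces every example listed above. It is a common idiom and only an approximation; it is not claimed to be protlib's implementation and may differ from it on inputs not shown here. The name underscorize_sketch is hypothetical.

```python
import re

def underscorize_sketch(name):
    """Convert a CamelCase name to lower_case_with_underscores."""
    # Split before a capitalized word: "SSNLookup" becomes "SSN_Lookup"
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital: "RequestQ" becomes "Request_Q"
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()

for example in ["SomeStruct", "SSNLookup", "RS485Adaptor", "RequestQ", "John316"]:
    print(example, "->", underscorize_sketch(example))
```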

hexdump ( data )

Takes a string and returns a string containing a nicely formatted table of the hexadecimal values of the data in that string. For example:

>>> from protlib import *
>>> print(hexdump(b"Hello World!"))
     0  1  2  3  4  5  6  7
  0  48 65 6c 6c 6f 20 57 6f
  8  72 6c 64 21
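
A table like this is straightforward to build by hand. The sketch below is not protlib's hexdump; the column layout is inferred from the example output above, so the exact spacing is an assumption, and hexdump_sketch is a hypothetical name.

```python
def hexdump_sketch(data, width=8):
    """Format bytes as rows of hex values, each row prefixed with its offset."""
    header = " " * 5 + " ".join("%-2d" % i for i in range(width))
    lines = [header.rstrip()]
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # iterating over bytes yields integers in Python 3
        lines.append("%3d  " % offset + " ".join("%02x" % b for b in chunk))
    return "\n".join(lines)

print(hexdump_sketch(b"Hello World!"))
```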

BYTE_ORDER

The first character of the format string passed to the struct module, which determines the byte order used to parse and serialize our structs. By default this is set to "!", which indicates network byte order. You may change it to any of the options available in the struct module.
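
The effect of that first character can be seen with the struct module directly. This short sketch packs two 4-byte integers, roughly the payload of a struct with two integer fields, in network order and in little-endian order; the variable names are illustrative only.

```python
import struct

x, y = 5, 6

network = struct.pack("!ii", x, y)   # "!" = network (big-endian) byte order
little = struct.pack("<ii", x, y)    # "<" = little-endian byte order

print(network.hex())
print(little.hex())

# The same prefix must be used when unpacking, or the values come back wrong.
print(struct.unpack("!ii", network))
```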

AUTOSIZED

Special constant value which can be passed to the length attribute of a CString or CUnicode object to indicate that the string is null-terminated and may have any length.
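
Reading a null-terminated string of unknown length from a stream can be sketched as follows. This shows the general technique rather than protlib's code; read_null_terminated is a hypothetical helper, and returning None when the terminator never arrives is a choice made for this sketch.

```python
import io

def read_null_terminated(f):
    """Read bytes up to (not including) the next null byte; None if it never arrives."""
    chunks = []
    while True:
        byte = f.read(1)
        if byte == b"":
            return None                  # stream ended before a terminator was seen
        if byte == b"\x00":
            return b"".join(chunks)
        chunks.append(byte)

stream = io.BytesIO(b"hello\x00world\x00")
first = read_null_terminated(stream)
second = read_null_terminated(stream)
print(first, second)
```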
