Writing High-Efficiency Large Python Systems--Lesson #2: Use nothing but local syslog
07/03/08
You want to log everything, but you'll find that even in the simplest requests with the fastest response times, a simple file-based access log can add 10% to your response time (which usually means ~91% as many requests per second). The fastest substitute we've found for file-based logging in Python is syslog. Here's how easy it is:
    import syslog
    syslog.syslog(facility | priority, msg)
Nothing's faster, at least nothing that doesn't require you telling Operations to compile a new C module on their production servers.
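To make the `facility | priority` argument concrete, here is a minimal sketch of how the two values combine. It assumes the conventional syslog encoding, where the severity occupies the low three bits and the facility the bits above them; the helper names `make_priority` and `split_priority` are made up for illustration and are not part of the syslog module.

```python
import syslog

def make_priority(facility, level):
    """Combine a facility and a severity into one syslog priority value."""
    return facility | level

def split_priority(value):
    """Split a combined value back into (facility, severity).

    Assumes the severity lives in the low three bits, as in the
    conventional syslog encoding.
    """
    return value & ~0x07, value & 0x07

combined = make_priority(syslog.LOG_LOCAL0, syslog.LOG_INFO)
facility, level = split_priority(combined)
assert facility == syslog.LOG_LOCAL0 and level == syslog.LOG_INFO

# One call, no handler objects, no formatting layers.
syslog.syslog(combined, "GET /index.html 200")
```

Because the two fields never overlap, a plain bitwise OR is all it takes to pick both the destination and the severity for a message.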
"But wait!" you say, "Python's builtin logging module has a SysLogHandler! Use that!" Well, no. There are two reasons why not. First, because Python's logging module in general is bog-slow--too slow for high-efficiency apps. It can make many function calls just to decide it's not going to log a message. Second, the SysLogHandler in the stdlib uses a UDP socket by default. You can pass it a string for the address (probably '/dev/log') and it will use a UNIX socket just like syslog.syslog, but it'll still do it in Python, not C, and you still have all the logging module overhead.
Here's a SysLogLibHandler if you're stuck with the stdlib logging module:
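    import logging
    import syslog

    class SysLogLibHandler(logging.Handler):
        """A logging handler that emits messages to syslog.syslog."""

        # Map stdlib logging levels to syslog priorities.
        priority_map = {
            10: syslog.LOG_NOTICE,   # DEBUG
            20: syslog.LOG_NOTICE,   # INFO
            30: syslog.LOG_WARNING,  # WARNING
            40: syslog.LOG_ERR,      # ERROR
            50: syslog.LOG_CRIT,     # CRITICAL
            0: syslog.LOG_NOTICE,    # NOTSET
        }

        def __init__(self, facility):
            self.facility = facility
            logging.Handler.__init__(self)

        def emit(self, record):
            # Fall back to LOG_NOTICE for custom levels missing from the map.
            priority = self.priority_map.get(record.levelno, syslog.LOG_NOTICE)
            syslog.syslog(self.facility | priority, self.format(record))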
I suggest using one of syslog.LOG_LOCAL0 through syslog.LOG_LOCAL7 for the facility arg. If you're writing a server, use one facility for access log messages and a different one for error/debug logs. Then you can configure syslogd to handle them differently (e.g., send them to /var/log/myapp/access.log and /var/log/myapp/error.log).
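As a rough sketch of that split, the following uses LOG_LOCAL0 for access messages and LOG_LOCAL1 for errors. Which facilities you pick, the function names, and the sample messages are all assumptions for this example; the syslogd rules in the comments use the classic syslog.conf selector syntax and the paths mentioned above.

```python
import syslog

# Assumed facility assignment for this sketch; any two of
# LOG_LOCAL0..LOG_LOCAL7 that are free on your hosts would do.
ACCESS_FACILITY = syslog.LOG_LOCAL0
ERROR_FACILITY = syslog.LOG_LOCAL1

def log_access(msg):
    """Send one access-log line on the access facility."""
    syslog.syslog(ACCESS_FACILITY | syslog.LOG_INFO, msg)

def log_error(msg):
    """Send one error-log line on the error facility."""
    syslog.syslog(ERROR_FACILITY | syslog.LOG_ERR, msg)

# Matching rules for a classic syslogd (/etc/syslog.conf):
#   local0.*    /var/log/myapp/access.log
#   local1.*    /var/log/myapp/error.log

log_access("GET /index.html 200")
log_error("upstream timeout while handling /index.html")
```

The application only chooses a facility; where the lines end up stays a syslogd configuration decision that Operations can change without touching the code.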
9 comments
Comment from: raz [Visitor]
Good one. And there are other reasons to use syslog, too. For example, in a distributed environment you'll want to aggregate logs at some point in time (i.e. "the logserver") and syslog can do that - probably more efficiently than your reinvented wheel would.
Btw... when will the messy cherrypy logging be fixed to use syslog? ;-)
07/07/08 @ 20:42
Comment from: Patrick Mézard [Visitor]
Perhaps the column name should be suffixed like "Writing High-Efficiency Large Python Systems--Lesson Under Unix".
07/08/08 @ 01:20
Comment from: Alec Munro [Visitor]
As Patrick pointed out, what about us poor Windows users? (Or was Lesson #1: Don't use Windows?)
Is there a decent free alternative, if logging is so bad?
07/08/08 @ 11:51
Comment from: fumanchu [Member]
I honestly haven't found a good solution on Windows, because I've never been on a team that deployed high-load apps on Windows. I develop almost exclusively on Windows (Vista, at the moment), and therefore end up hacking in alternatives in our code based on sys.platform--usually, this is just a file logger. But it's only exercised in tests and never benchmarked. I'd love to hear some ideas and stats on what's best for win32.
07/08/08 @ 18:06
Comment from: Rene Dudfield [Visitor] · http://rene.f0o.com/
hi,
Another trick is to use something like multilog from daemontools.
That fails your requirement of not needing something to be installed by operations most likely (unless the machine runs qmail or djbdns).
Multilog is way faster than syslog.
http://cr.yp.to/daemontools/multilog.html
It also rotates log files based on size, so you can limit the total amount of logs you have. That is nice when you don't care all that much about your logs -- and you don't want your machine running out of disk space because of heaps of logs.
07/13/08 @ 19:04
Comment from: Japherwocky [Visitor]
I patched cherrypy to use syslog for a work project a while back; it's actually pretty painless.
10/24/08 @ 13:37
Comment from: Anton [Visitor]
If you are writing high-efficiency applications, consider avoiding Python's logging module altogether. It has a single lock per handler, and all threads contend on it.
Or patch the SysLogHandler as recommended in this post:
http://kpoxit.blogspot.com/2008/11/python-logging-in-threaded-application.html
11/27/08 @ 13:59
Comment from: Shane C. Mason [Visitor] · http://wrongdog.net
I am afraid this looks more like a 'common misconception' than truth, or is based off some old version of Python. I put together a series of tests to try your assertions out, fully expecting them to be correct, and only disproved them.
See this article:
http://wrongdog.net/component/content/article/1-technology/6-comparing-python-logging-facilities
From what I saw, when logging unique messages, Python's file based logging was about 17 times faster than the syslog module.
03/12/09 @ 18:58
Comment from: Derek Simkowiak [Visitor]
I can confirm that the standard Python "logging" module is very slow.
I wrote a vanilla UDP packet server to see how fast Python can be with sockets. I just used the standard SocketServer class.
The server just reads a UDP packet, logs the contents, and sends a reply back that says "ACK".
My dev box can process 63,000 UDP packets per second with logging turned off. (w00t!)
It can only process 16,000 UDP packets per second with logging turned on. Boo! This is true even if I write to a RAM file under /dev/shm/, so it's not disk I/O.
I haven't tried the syslog module to compare. Using /dev/log sounds like a nice way to handle it for Linux-based systems.
02/25/10 @ 17:38