Embedding Python in Your C Programs

From Issue #142
February 2006

C, meet Python. Python, this is C. With surprisingly little effort, the Python interpreter can be integrated into your program to add features quickly that could take months if written entirely in C.

The language of choice for large, high-performance applications in Linux is almost always C, or somewhat less often C++. Both are powerful languages that allow you to create high-performance natively compiled programs. However, they are not languages that lend themselves to runtime flexibility. Once a C/C++ application is compiled, its code is pretty much static. At times, that can be a real hindrance. For example, if you want to allow users of a program to create plugins easily that extend the application's functionality, you have to deal with complex dynamic linking issues that can cause no end of headaches. Additionally, your users will have to know C/C++ in order to extend the application, which severely limits the number of people capable of writing extensions.

A much better solution is to provide your users with a scripting language they can use to extend your application. With a scripting language, you will tend to have much more runtime flexibility, as well as shorter development times and a lower learning curve that will extend the base of users capable of creating extensions.

Unfortunately, creating a scripting language is very much a nontrivial task that easily could become a major portion of your program. Fortunately, you don't need to create a scripting language. With Python, you can embed the interpreter directly into your application and expose the full power and flexibility of Python without adding very much code at all to your application.

Including Python in an Application

Including the Python interpreter in your program is extremely simple. Python provides a single header file for including all of the definitions you need when embedding the interpreter into your application, aptly named Python.h. This contains a lot of stuff, including several of the standard headers. For compiling efficiency, it might be nice if you could include only those parts of the interface that you actually intend to use, but unfortunately Python doesn't really give you that option. If you take a look at the Python.h file, you'll see that it defines several important macros and includes a number of common headers that are required by the individual components included later in the file.

To link your application to the Python interpreter at compile time, you should run the python-config program to get a list of the linking options that should be passed to the compiler. On my system, those are:

-lpython2.3 -lm -L/usr/lib/python2.3/config

A Very Simple Embedded App

So, how much code does it take to run the Python interpreter from a C app? As it turns out, very little. In fact, if you look at Listing 1, you'll see that it can be done in as little as three lines of code, which initialize the interpreter, send it a string of Python code to execute and then shut the interpreter back down.

Or, you could embed an interactive Python terminal in your program by calling Py_Main() instead, as in Listing 2. This brings up the interpreter just as if you'd run Python directly from the command line. Control is returned to your application after the user exits from the interpreter shell.

The Python Environment

Embedding the interpreter in three lines of code is easy enough, but let's face it, just executing arbitrary strings of Python code inside a program is neither interesting nor all that useful. Fortunately, it's also far from the extent of what Python allows. Before I get too deep into what it can do though, let's take a look at initializing the environment that Python executes within.

When you run the Python interpreter, the main environment context is stored in the __main__ module's namespace dictionary. All functions, classes and variables that are defined globally can be found in this dictionary. When running Python interactively or on a script file, you rarely need to care about this global namespace. However, when running the embedded interpreter, you'll often need to access this dictionary to get references to functions or classes in order to call or construct them. You also may find that you occasionally want to copy the global dictionary so that different bits of code can be run in distinct environments. For instance, you might want to create a new environment for each plugin that you load.

To get at the __main__ module's dictionary, you first need to get a reference to the module. You can do this by calling the PyImport_AddModule() function, which looks up the module name you supply and returns a PyObject pointer to that object. Why a PyObject? All Python data types derive from PyObject, which makes it a handy lowest common denominator. Therefore, almost all of the functions that you'll deal with when interacting with the Python interpreter will take or return pointers to PyObjects rather than another more specific Python data type.

Once you have the __main__ module referenced by a PyObject, you can use the PyModule_GetDict() function to get a reference to the main module's dictionary, which again is returned as a PyObject pointer. You can then pass the dictionary reference when you execute other Python commands. For example, Listing 3 shows how you could duplicate the global environment and execute two different Python files in separate environments.

I'll get into the details of how PyRun_File() works in a little bit, but if you look carefully at Listing 3, you should notice something interesting. When I call PyRun_File() to execute the files, the dictionary gets passed in twice. The reason for this is that Python code actually has two environmental contexts when it is executed. The first is the global context, which I've already talked about. The second context is the local context, which contains any locally defined variables or functions. In this case, those are the same, because the code being executed is top-level code. On the other hand, if you were to execute a function dynamically using multiple C-level calls, you might want to create a local context and use that instead of the global dictionary. For the most part though, it's generally safe to pass the global environment for both the global and local parameters.

Manipulating Python Data Structures in C/C++

At this point, I'm sure you've noticed the Py_DECREF() calls that popped up in the Listing 3 example. Those fun little guys are there for memory management purposes. Inside the interpreter, Python handles memory management automatically by keeping track of all references to memory, transparently to the programmer. As soon as it determines that all references to a given chunk of memory have been released, it deallocates the no-longer-needed chunk. This can be a problem when you start working on the C side though. Because C is not a memory-managed language, as soon as a Python data structure ends up referenced from C, all ability to track the references automatically is lost to Python. The C application can make as many copies of the reference as it wants, and hold on to them indefinitely without Python knowing anything about it.

The solution is to have C code that gets a reference to a Python object handle all of the reference counting manually. Generally, when a Python call hands an object out to a C program, it increments the reference count by one. The C code can then do what it likes with the object without worrying that it will be deleted out from under it. Then when the C program is done with the object, it is responsible for releasing its reference by making a call to Py_DECREF().

It's important, though, to remember that when you make a copy of a pointer within your C program, and that copy may outlast the pointer from which you're copying, you need to increment the reference count manually by calling Py_INCREF(). For example, if you make a copy of a PyObject pointer to store inside an array, you'll probably want to call Py_INCREF() to ensure that the pointed-to object won't get garbage-collected after the original PyObject reference is decremented.

Executing Code from a File

Now let's take a look at a slightly more useful example to see how Python can be embedded into a real program. If you take a look at Listing 4, you'll see a small program that allows the user to specify short expressions on the command line. The program then calculates the results of those expressions and displays them in the output. To add a little spice to the mix, the program also lets users specify a file of Python code that will be loaded before the expressions are executed. This way, the user can define functions that will be available to the command-line expressions.

Two basic Python API functions are used in this program, PyRun_SimpleString() and PyRun_AnyFile(). You've seen PyRun_SimpleString() before. All it does is execute the given Python expression in the global environment. PyRun_SimpleFile() is similar to the PyRun_File() function that I discussed earlier, but it runs things in the global environment by default. Because everything is run in the global environment, the results of each executed expression or group of expressions will be available to those that are executed later.

Getting a Callable Function Object

Now, let's say that instead of having our expression calculator execute a list of expressions, you'd rather have it load a function f() from the Python file and execute it a variable number of times to calculate an aggregate total, based on a number provided on the command line. You could execute the function simply by running PyRun_SimpleString("f()"), but that's really not very efficient, as it requires the interpreter to parse and evaluate the string every time it's called. It would be much better if we could reference the function directly to call it.

If you recall, Python stores all globally defined functions in the global dictionary. Therefore, if you can get a reference to the global dictionary, you can extract a reference to any of the defined functions. Fortunately, the Python API provides functions for doing just that. You can see it in use by taking a look at Listing 5.

To obtain the function reference, the program first gets a reference to the main module by “importing” it using the PyImport_AddModule("__main__") function. Once it has this reference to the main module, the program uses the PyModule_GetDict() function to extract its dictionary. From there, it's simply a matter of calling PyDict_GetItemString(global_dict, "f") to extract the function from the dictionary.

Now that the program has a reference to the function, it can call it using the PyObject_CallObject() function. As you can see, this takes a pointer to the function object to call. Because the function itself already exists in the Python environment, it is already compiled. That means when you perform the call, there is no parsing and little or no compilation overhead, which means the function can be executed quite quickly.

Passing Data in Function Calls

At this point, I'm sure you're starting to think, “Gee whiz, this is great but it would be a whole lot better if I could actually pass some data to these functions I'm calling.” Well, you need wonder no longer. As it turns out, you can do exactly that. One way is through the use of that mysterious NULL value that you saw being passed to PyObject_CallObject() in Listing 5. I'll talk about how that works in a bit, but first there is a much easier way to call functions with arguments that are in the form of C/C++ data types: PyObject_CallFunction(). Instead of requiring you to perform C-to-Python conversions, this handy function takes a format string and a variable number of arguments, much like the printf() family of functions.

Looking back at our calculator program, let's say you want to evaluate an expression over a range of noncontiguous values. If the expression to evaluate is defined in a function provided by the loaded Python file, you can get a reference as normal and then iterate over the range. For each value, simply call PyObject_CallFunction(expression, "i", num). The "i" string tells Python that you will be passing an integer as the only argument. If the function you were calling took two integers and a string instead, you could make the function call as PyObject_CallFunction(expression, "iis", num1, num2, string). If the function has a return value, it will be passed to you in the return value of PyObject_CallFunction(), as a PyObject pointer.

That's the easiest way to pass arguments to a Python function, but it's not actually the most flexible. Think about it for a second. What happens if you are dynamically choosing the function to call? The odds are that you're going to want the flexibility to call a variety of functions that accept different numbers and types of arguments. However, with PyObject_CallFunction(), you have to choose the number and type of the arguments at compile time, which hardly fits with the spirit of flexibility inherent in embedding a scripting language.

The solution is to use PyObject_CallObject() instead. This function allows you to pass a single tuple of Python objects instead of the variable-length list of native C data items. The downside here is that you will need to convert native C values to Python objects first, but what you lose in execution speed is made up for in flexibility. Of course, before you can pass values to your function as a Python tuple, you'll need to know how to create the tuple, which brings me to the next section.

Converting Between Python and C Data Types

Python data structures are returned from and passed to the Python interpreter in the form of PyObjects. To get to a specific type, you need to perform a cast to the correct type. For instance, you can get to a PyIntObject pointer by casting a PyObject pointer. If you don't know for sure what the variable's type is, though, blindly performing a cast could have disastrous results. In such a case, you can call one of the many Check() functions to see if an object is indeed of an appropriate type, such as the PyFloat_Check() function that returns true if the object could indeed be cast to a float. In other words, it returns true if the object is a float or a subtype of a float. If you'd rather know whether the object is exactly a float, not a subclass, you can use PyFloat_CheckExact().

The opaque PyObject structure isn't actually useful to a C program though. In order to access Python data in your program, you'll need to use a variety of conversion functions that will return a native C type. For example, if you want to convert a PyObject to a long int, you can run PyInt_AsLong(). PyInt_AsLong() is a safe function, and will perform a checked casting to PyIntObject before extracting the long int value. If you know for sure that the value you're converting is indeed an int, it may be wasteful to perform the extra checking, especially if it's inside of a tight loop. In that case, the unchecked macro form, PyInt_AS_LONG(), extracts the value without the type check.

Often, Python functions ask for or return Python sequence objects, such as tuples or lists. These objects don't have directly corresponding types in C, but Python provides functions that allow you to build them from C data types. As an example, let's take a look at building a tuple since you'll need to be able to do that to call a function using PyObject_CallObject().

The first step to creating a new tuple is to construct an empty tuple with PyTuple_New(), which takes the length of the tuple and returns a PyObject pointer to a new tuple. You can then use PyTuple_SetItem() to set the values of the tuple items, passing each value as a PyObject pointer.

Conclusion

You should now have enough to get started with embedding Python scripts inside your own applications. For more information, take a look at the Python documentation. “Extending and Embedding the Python Interpreter” goes into more detail on going the other direction and embedding C functions inside Python. The “Python/C API Reference Manual” also has detailed reference documentation on all of the functions available for embedding Python in your program. The Linux Journal archives also contain an excellent article from Ivan Pulleyn that discusses issues for multithreaded programs that embed Python.

Resources for this article: /article/8714.

William Nagel is the Chief Software Engineer for StageLogic, LLC, a small software development company, where he develops real-time systems based on Linux. He is also the author of “Subversion Version Control: Using the Subversion Version Control System in Development Projects”.




Comments

The Python/C API is very low level, verbose, painful to work with, and highly error prone. With C++ code in particular it just sucks - you'll spend half your time writing const_cast<char*>("blah") to work around "interesting" APIs or writing piles of "extern C" wrapper functions, and the rest of your time writing verbose argument encoding/decoding or reference counting code. It's immensely frustrating to use plain C to write a Python object to wrap a C++ object.

Do yourself a favour, and once you've got embedding working, expose the Python interface to your program using a higher level tool. I hear SWIG is pretty good, especially for plain C code, but haven't used it myself (I work with heavily templated C++ with Qt). SIP (used to make PyQt) from Riverbank Computing has its advantages also, and is good if you want your Qt app's API to integrate cleanly into PyQt. Otherwise, I'd suggest the amazing Boost::Python for C++ users, as its ability to almost transparently wrap your C++ interfaces, mapping them to quite sensible Python semantics, is pretty impressive.

Boost::Python has the added advantage that you can write some very nice C++ code that integrates cleanly with Python. For example, you can iterate over a Python list much like a C++ list, throw exceptions between C++ and Python, build Python lists as easily as (oversimplified example):


#include <boost/python.hpp>
using namespace boost::python;


boost::python::list make_list()
{
    list pylist;
    for (int i = 0; i != 10; ++i)
        pylist.append( make_tuple( "fred", 10 ) );
    return pylist;
}

The equivalent Python/C API code is longer, filled with dangerous reference count juggling, contains a lot of manual error checking that's often ignored, and is a lot uglier.

With regards to the article above, it's good to see things like this written. I had real trouble getting started with embedding Python, and I think this is a pretty well written intro. I do take issue with one point, though, and that's duplicating the environment. Cloning the main dict does not provide separate program environments - very far from it. It only gives them different global namespaces. Interpreter-wide state changes still affect both programs. For example, if one program imports a module, the other one can see it in sys.modules ; if one program changes a setting in a module, the other one is affected. Locale settings come to mind. Most well designed modules will be fine, but you'll run into the odd one that thinks that module-wide globals are a good idea and consequently chokes.

Unfortunately, the alternative is to use sub-interpreters. Sub-interpreters are a less than well documented part of Python's API, and as they rely on thread local storage they're hopeless in single threaded programs. They can be made to work (see Scribus, for example) but it's not overly safe, and will abort if you use a Python debug build.

When you combine this with a GUI toolkit like Qt3 that only permits GUI operations from the main thread (thankfully, the limitation is relieved by Qt4), this becomes very frustrating. If you're not stuck with this limitation, you can just spawn off a thread for your users' scripts, and should consider designing your interface that way right from the start.

Embedding Python is handy. Have fun.

P.S: LJ staff, please fix your comment engine's braindeath about leading whitespace, < and > chars in <code> sections, and blank lines in <code> sections. Thank you. Repeated &nbsp; entities are not a fun way to format text.

--
Craig Ringer

