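These are my study notes on 1. Extending Python with C or C++, the first chapter of the Extending and Embedding the Python Interpreter series.
Python is approachable but comparatively slow, so we sometimes want to implement certain functionality in C and call it from Python.
As an example, suppose we want to write a Python package named spam and call it as follows: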
import spam
status = spam.system("ls -l")
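As you might expect, we first have to implement the system function in C and package it before it can be called from Python. This breaks down into the following steps:
Implement the C function
Define the method table and add the C function to it
Define the Python module and associate the method table with it
Define the module initialization function
Compile into a .so or .pyd file
Note, however, that the approach described here only works for the CPython implementation of Python. Quoting the tutorial: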
The C extension interface is specific to CPython
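Before writing any C, it can help to pin down the intended behavior in plain Python. The following is only a reference sketch under my own assumptions: the SpamError class and the system function here are hypothetical stand-ins built on os.system, not the extension itself.

```python
import os


class SpamError(Exception):
    """Hypothetical stand-in for the spam.error exception."""


def system(command: str) -> int:
    """Run a shell command and return its status, like the planned spam.system."""
    if not isinstance(command, str):
        # The C version rejects non-string arguments in a similar way.
        raise TypeError("command must be a str")
    status = os.system(command)
    if status < 0:
        raise SpamError("System command failed")
    return status


if __name__ == "__main__":
    print(system("exit 0"))
```

The C extension built in the rest of these notes is meant to expose the same call shape: one string in, one integer status out.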
First create a file named spam.c, then fill it in by following the steps below.
To write a C function that can be called from Python, first include the Python.h header:
#define PY_SSIZE_T_CLEAN
#include <Python.h>
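Note: #define PY_SSIZE_T_CLEAN makes the PyArg_ParseTuple function (covered later) treat the length argument of s# as Py_ssize_t rather than int; for details see "PY_SSIZE_T_CLEAN macro的作用".
Because the function is called as spam.system on the Python side, we name it spam_system here and define it as follows: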
// the function that will be called by spam.system(string) from python
static PyObject*
spam_system(PyObject* self, PyObject* args) {
// The C function always has two arguments, conventionally named self and args
// The self argument points to the module object for module-level functions; for a method it would point to the object instance
// The args argument will be a pointer to a Python tuple object containing the arguments
const char* command;
int sts;
// checks the argument types and converts Python objects to C values
// on success, the string value of the argument will be copied to the local variable command, and returns true
// returns false, and may raise PyExc_TypeError on failure
if(!PyArg_ParseTuple(args, "s", &command))
// for functions returning object pointers, NULL is the error indicator
return NULL;
sts = system(command);
if(sts < 0){
// raise a spam.error exception defined in PyInit_spam
PyErr_SetString(SpamError, "System command failed");
return NULL;
}
// return an integer object
return PyLong_FromLong(sts);
// for function without return value
// Method 1:
// Py_INCREF(Py_None);
// return Py_None;
// Note: Py_None is the C name for the special Python object None
// Method 2:
// Py_RETURN_NONE
}
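This code contains several key points, analyzed one by one below.
The C function always takes two parameters, self and args:
self: points to the module object for a module-level function, or to the object instance for a method
args: points to a Python tuple containing the function's arguments
After receiving args, PyArg_ParseTuple(args, "s", &command) parses it into command as a string ("s"); on failure the function returns the null pointer NULL. For more on PyArg_ParseTuple, see Extracting Parameters in Extension Functions.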
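On success, the system library function is called next: sts = system(command); yields the return value sts of type int. If system fails (that is, sts is less than 0), error handling is needed, as described later.
The function's return value is a pointer to a Python object (PyObject). Before returning anything, a C function must convert the C value to a PyObject* using the functions declared in Python.h.
Here, PyObject *PyLong_FromLong(long v) converts sts (a C int) into a Python int object (PyObject*).
Next, register spam_system in the method table SpamMethods. This method table is later associated with the module named spam, so that the function can be called from Python as spam.system.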
// method table
static PyMethodDef SpamMethods[] = {
{"system", // name
spam_system, // address
METH_VARARGS, // or "METH_VARARGS | METH_KEYWORDS"
// METH_VARARGS: expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple()
// METH_KEYWORDS: the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse
"Execute a shell command."},
{NULL, NULL, 0, NULL} // sentinel
};
The third field, METH_VARARGS, means the Python function accepts positional arguments.
See PyMethodDef for the exact meaning of each field.
PyDoc_STRVAR creates a variable named spam_doc that can be used as a docstring.
// Creates a variable with name name that can be used in docstrings. If Python is built without docstrings, the value will be empty.
PyDoc_STRVAR(spam_doc, "Spam module that calls the system function.");
We want to write a Python package named spam, so here we define the C counterpart of the spam Python module and name it spammodule:
// module definition structure
static struct PyModuleDef spammodule = {
PyModuleDef_HEAD_INIT,
"spam", // name of module
spam_doc, // module documentation, may be NULL // Docstring for the module; usually a docstring variable created with PyDoc_STRVAR is used.
-1, // size of per-interpreter state of the module, or -1 if the module keeps state in global variables.
SpamMethods // the method table
};
See PyModuleDef for the exact meaning of each field.
The PyInit_spam function is responsible for initializing the module:
// PyInit_spam is module’s initialization function
// must be named PyInit_name
// it will be called when python program imports module spam for the first time
// should be the only non-static item defined in the module file!
// if adding "static", variables and functions can only be used in the specific file, can't be linked through "extern"
// PyMODINIT_FUNC declares the function as PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C"
PyMODINIT_FUNC
PyInit_spam(void){
PyObject* m;
// returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition
// The init function must return the module object to its caller, so that it then gets inserted into sys.modules
m = PyModule_Create(&spammodule);
if(m == NULL)
return NULL;
// if the last 2 arguments are NULL, then it creates a class whose base class is Exception
// exception type, exception instance, and a traceback object
SpamError = PyErr_NewException("spam.error", NULL, NULL);
// retains a reference to the newly created exception class
// Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing SpamError to become a dangling pointer
// Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects
Py_XINCREF(SpamError);
if(PyModule_AddObject(m, "error", SpamError) < 0){
// clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator
// Decrement the reference count for object o. The object may be NULL, in which case the macro has no effect; otherwise the effect is the same as for Py_DECREF(), and the same warning applies.
Py_XDECREF(SpamError);
// Decrement the reference count for object o. The object may be NULL, in which case the macro has no effect; otherwise the effect is the same as for Py_DECREF(), except that the argument is also set to NULL.
Py_CLEAR(SpamError);
// Decrement the reference count for object o.
// If the reference count reaches zero, the object’s type’s deallocation function (which must not be NULL) is invoked.
Py_DECREF(m);
return NULL;
}
return m;
}
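A module's initialization function is called the first time a Python program imports the module. It must be named PyInit_<name>, where <name> is the module name.
The key call in the initialization function is PyModule_Create, which creates a module object and inserts the method table referenced by the module definition into the newly created module object.
PyInit_spam ultimately returns the module object m to its caller. Note the PyMODINIT_FUNC in front of the function name: its main job is to declare the return type as PyObject *; for C++ it also declares the function as extern "C"; and it adds whatever linkage declarations each platform requires.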
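As a loose pure-Python analogy of what the method table, the module definition, and the initialization function accomplish together, the sketch below builds a module object by hand and registers it. The names spam_py, SPAM_METHODS, and init_spam_py are hypothetical, and the placeholder function only echoes its argument instead of running a command.

```python
import sys
import types


def spam_system(command):
    """Placeholder for the C function; it only echoes the command here."""
    return f"would run: {command}"


# Rough analogue of the method table: exported name -> callable.
SPAM_METHODS = {"system": spam_system}


def init_spam_py():
    """Rough analogue of PyInit_spam: create the module and attach the methods."""
    module = types.ModuleType("spam_py", "Pure-Python stand-in for the spam module.")
    for name, func in SPAM_METHODS.items():
        setattr(module, name, func)
    return module


# The import machinery stores the module returned by the init function
# in sys.modules; here that step is done by hand.
sys.modules["spam_py"] = init_spam_py()

import spam_py

print(spam_py.system("ls -l"))  # would run: ls -l
```

The real extension differs in that the interpreter locates and calls the init function itself; this sketch only mirrors the resulting structure.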
This code uses the SpamError object, which is defined as follows:
// define your own new exception
static PyObject* SpamError;
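PyErr_NewException creates and initializes the SpamError exception class. Because external code could later remove SpamError from the module, Py_XINCREF is used to manually increment its reference count.
Next, PyModule_AddObject tries to add SpamError to the module m. On success, it can then be accessed as spam.error; on failure, the reference counts of SpamError and m must be decremented to clean up.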
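Py_XDECREF and Py_CLEAR both decrement an object's reference count, so why call both on SpamError? At that point two references are held: the one returned by PyErr_NewException and the extra one added by Py_XINCREF. PyModule_AddObject steals a reference only on success, so on failure both must be released, and Py_CLEAR additionally resets SpamError to NULL.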
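The effect of taking and releasing an extra reference can be observed from Python with sys.getrefcount. This is only an illustration on an ordinary object in CPython, not the exception class from the C code, and the variable names are my own.

```python
import sys

obj = object()
before = sys.getrefcount(obj)        # includes the temporary reference held by the call

alias = obj                          # a second owner, similar in spirit to Py_XINCREF
after_incref = sys.getrefcount(obj)

del alias                            # giving the reference back, similar to Py_XDECREF
after_decref = sys.getrefcount(obj)

print(after_incref - before, after_decref - before)  # 1 0
```

Each owner of an object accounts for exactly one count, which is why every reference taken in C has to be matched by exactly one release on the error path.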
Once SpamError has been initialized successfully, the C function spam_system can raise the exception with PyErr_SetString:
PyErr_SetString(SpamError, "System command failed");
The last step is to write a main function that registers the PyInit_spam function defined above so that the module can be initialized:
int main(int argc, char* argv[]){
wchar_t* program = Py_DecodeLocale(argv[0], NULL);
if(program == NULL){
fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
exit(1);
}
//add a built-in module, before Py_Initialize
//When embedding Python, the PyInit_spam() function is not called automatically unless there’s an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module
if(PyImport_AppendInittab("spam", PyInit_spam) == -1){
fprintf(stderr, "Error: could not extend in-built modules table\n");
exit(1);
}
// Pass argv[0] to the Python interpreter
Py_SetProgramName(program);
//Initialize the Python interpreter. Required
//If this step fails, it will be a fatal error.
Py_Initialize();
// Optionally import the module; alternatively,
// import can be deferred until the embedded script imports it.
PyObject* pmodule = PyImport_ImportModule("spam");
if(!pmodule){
PyErr_Print();
fprintf(stderr, "Error: could not import module 'spam'\n");
}
PyMem_RawFree(program);
return 0;
}
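Before calling Py_Initialize to initialize the Python interpreter, PyInit_spam has to be added to the PyImport_Inittab table with PyImport_AppendInittab. Only then can the interpreter find PyInit_spam and call it to initialize the spam module when it is imported.
To check whether module initialization succeeded, the program finally tries to import the spam module with PyImport_ImportModule.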
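From the Python side, the modules compiled into the interpreter can be listed through sys.builtin_module_names. As far as I know, in CPython this tuple is derived from the same initialization table, so a module registered with PyImport_AppendInittab in an embedding program should show up there; the helper is_builtin below is a hypothetical name used only for this sketch.

```python
import sys


def is_builtin(name: str) -> bool:
    """Return True if the module is compiled into this interpreter."""
    return name in sys.builtin_module_names


print(is_builtin("sys"))   # True
print(is_builtin("spam"))  # False in a stock interpreter
```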
Putting it all together, create a file named spam.c with the following contents:
// pulls in the Python API
#define PY_SSIZE_T_CLEAN // Make "s#" use Py_ssize_t rather than int
#include <Python.h> // must be included before any standard headers
// define your own new exception
static PyObject* SpamError;
// the function that will be called by spam.system(string) from python
static PyObject*
spam_system(PyObject* self, PyObject* args) {
// The C function always has two arguments, conventionally named self and args
// The self argument points to the module object for module-level functions; for a method it would point to the object instance
// The args argument will be a pointer to a Python tuple object containing the arguments
const char* command;
int sts;
// checks the argument types and converts Python objects to C values
// on success, the string value of the argument will be copied to the local variable command, and returns true
// returns false, and may raise PyExc_TypeError on failure
if(!PyArg_ParseTuple(args, "s", &command))
// for functions returning object pointers, NULL is the error indicator
return NULL;
sts = system(command);
if(sts < 0){
// raise a spam.error exception defined in PyInit_spam
PyErr_SetString(SpamError, "System command failed");
return NULL;
}
// return an integer object
return PyLong_FromLong(sts);
// for function without return value
// Method 1:
// Py_INCREF(Py_None);
// return Py_None;
// Note: Py_None is the C name for the special Python object None
// Method 2:
// Py_RETURN_NONE
}
// method table
static PyMethodDef SpamMethods[] = {
{"system", // name
spam_system, // address
METH_VARARGS, // or "METH_VARARGS | METH_KEYWORDS"
// METH_VARARGS: expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple()
// METH_KEYWORDS: the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse
"Execute a shell command."},
{NULL, NULL, 0, NULL} // sentinel
};
// Creates a variable with name name that can be used in docstrings. If Python is built without docstrings, the value will be empty.
PyDoc_STRVAR(spam_doc, "Spam module that calls the system function.");
// module definition structure
static struct PyModuleDef spammodule = {
PyModuleDef_HEAD_INIT,
"spam", // name of module
spam_doc, // module documentation, may be NULL // Docstring for the module; usually a docstring variable created with PyDoc_STRVAR is used.
-1, // size of per-interpreter state of the module, or -1 if the module keeps state in global variables.
SpamMethods // the method table
};
// PyInit_spam is module’s initialization function
// must be named PyInit_name
// it will be called when python program imports module spam for the first time
// should be the only non-static item defined in the module file!
// if adding "static", variables and functions can only be used in the specific file, can't be linked through "extern"
// PyMODINIT_FUNC declares the function as PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C"
PyMODINIT_FUNC
PyInit_spam(void){
PyObject* m;
// returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition
// The init function must return the module object to its caller, so that it then gets inserted into sys.modules
m = PyModule_Create(&spammodule);
if(m == NULL)
return NULL;
// if the last 2 arguments are NULL, then it creates a class whose base class is Exception
// exception type, exception instance, and a traceback object
SpamError = PyErr_NewException("spam.error", NULL, NULL);
// retains a reference to the newly created exception class
// Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing SpamError to become a dangling pointer
// Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects
Py_XINCREF(SpamError);
if(PyModule_AddObject(m, "error", SpamError) < 0){
// clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator
// Decrement the reference count for object o. The object may be NULL, in which case the macro has no effect; otherwise the effect is the same as for Py_DECREF(), and the same warning applies.
Py_XDECREF(SpamError);
// Decrement the reference count for object o. The object may be NULL, in which case the macro has no effect; otherwise the effect is the same as for Py_DECREF(), except that the argument is also set to NULL.
Py_CLEAR(SpamError);
// Decrement the reference count for object o.
// If the reference count reaches zero, the object’s type’s deallocation function (which must not be NULL) is invoked.
Py_DECREF(m);
return NULL;
}
return m;
}
int main(int argc, char* argv[]){
wchar_t* program = Py_DecodeLocale(argv[0], NULL);
if(program == NULL){
fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
exit(1);
}
//add a built-in module, before Py_Initialize
//When embedding Python, the PyInit_spam() function is not called automatically unless there’s an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module
if(PyImport_AppendInittab("spam", PyInit_spam) == -1){
fprintf(stderr, "Error: could not extend in-built modules table\n");
exit(1);
}
// Pass argv[0] to the Python interpreter
Py_SetProgramName(program);
//Initialize the Python interpreter. Required
//If this step fails, it will be a fatal error.
Py_Initialize();
// Optionally import the module; alternatively,
// import can be deferred until the embedded script imports it.
PyObject* pmodule = PyImport_ImportModule("spam");
if(!pmodule){
PyErr_Print();
fprintf(stderr, "Error: could not import module 'spam'\n");
}
PyMem_RawFree(program);
return 0;
}
Following Python进阶笔记C语言拓展篇(二)动态链接库pyd+存根文件pyi——学会优雅地使用C\C++拓展Python模块的正确/官方姿势 and Build .so file from .c file using gcc command line, compile and link with the commands below.
This takes two steps. First compile to get spam.o:
gcc -c -I /usr/include/python3.8 -o spam.o -fPIC spam.c
Then link to get spam.so:
gcc -shared -L /usr/lib/python3.8/config-3.8-x86_64-linux-gnu -lpython3.8 -o spam.so spam.o
Note: on Windows, change .so to .pyd.
The two steps can be combined into:
gcc -shared -I /usr/include/python3.8 -L /usr/lib/python3.8/config-3.8-x86_64-linux-gnu -o spam.so -fPIC spam.c
We can also rely on the flags reported by python3.8-config, so that we do not have to locate Python's include and lib directories by hand.
In two steps:
gcc -c $(python3.8-config --includes) -o spam.o -fPIC spam.c
gcc -shared $(python3.8-config --ldflags) -o spam.so -fPIC spam.o
Combined into one:
gcc -shared $(python3.8-config --includes) $(python3.8-config --ldflags) -o spam.so -fPIC spam.c
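The same locations can also be queried from inside Python with the standard sysconfig module, which is handy when python3.8-config is not on the PATH. This is a small sketch; the variable names are my own, and LIBDIR may be unset on some platforms.

```python
import sysconfig

# Directory that contains Python.h for the running interpreter.
include_dir = sysconfig.get_paths()["include"]

# Directory of the Python library; may be None on some platforms.
lib_dir = sysconfig.get_config_var("LIBDIR")

print(f"-I{include_dir}")
if lib_dir:
    print(f"-L{lib_dir}")
```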
Following 如Py似C:Python 與 C 的共生法則 and 4. Building C and C++ Extensions, create setup.py with the following contents:
from distutils.core import setup, Extension
spammodule = Extension('spam', sources=['spam.c'])
setup(name='Spam',
description='',
ext_modules=[spammodule],
)
Then build the .so with the following command:
python3 setup.py build_ext --inplace
The result looks like this:
.
├── build
│ └── temp.linux-x86_64-3.8
│ └── spam.o
└── spam.cpython-38-x86_64-linux-gnu.so
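The long file name comes from the interpreter's extension suffix. A short sketch, assuming a standard CPython build, shows how to ask the running interpreter which suffix it uses and which suffixes the import system accepts:

```python
import importlib.machinery
import sysconfig

# Suffix the build tools append to the module name, or "" if unavailable.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or ""
print("spam" + ext_suffix)

# All suffixes the import system tries when looking for an extension module.
print(importlib.machinery.EXTENSION_SUFFIXES)
```

On the Python 3.8 Linux setup used in these notes, the first line would be expected to match the file name in the tree above; the plain .so suffix is normally also in the accepted list, which is why a hand-built spam.so can be imported too.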
Once the C code has been compiled into a .so or .pyd (a dynamic link library, equivalent to a dll), Python can import it directly:
>>> import spam
>>> spam.system("ls")
spam.c spam.o spam.so
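0 # return value of spam.system
A .so built with distutils is imported and called in exactly the same way.
What if we want to change the package name from spam to spammodule?
Because the initialization function must be named PyInit_<name>, first rename PyInit_spam to PyInit_spammodule. The first argument of PyImport_AppendInittab and the argument of PyImport_ImportModule are also Python module names, so they need to be changed to spammodule as well. After these changes the build succeeds, but importing fails: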
>>> import spam
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit_spam)
>>> import spammodule
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'spammodule'
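See Unable to solve "ImportError: dynamic module does not define module export function":
In PyInit_<modname> modname should match the filename.
Next, rename the file from spam.c to spammodule.c, remembering to update the second argument of Extension as well. The result is still the same as above.
That is still not enough. According to distutils.core.Extension, its name and sources parameters mean: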
name: the full name of the extension, including any packages — ie. not a filename or pathname, but Python dotted name
sources: list of source filenames, relative to the distribution root (where the setup script lives), in Unix form (slash-separated) for portability. Source files may be C, C++, SWIG (.i), platform-specific resource files, or whatever else is recognized by the build_ext command as source for a Python extension.
The name parameter of Extension is the name the library is imported under in Python, so it has to be changed too. After that, everything runs:
>>> import spam
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'spam'
>>> import spammodule
>>> spammodule.system("ls")
build setup.py spammodule.c spammodule.cpython-38-x86_64-linux-gnu.so
0
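The naming rule that caused the errors above can be written down as a small check: the part of the file name before the first dot is the module name, and the shared object must export PyInit_ followed by that name. The helpers below are hypothetical and assume a top-level module with an ASCII name.

```python
import os


def module_name_from_filename(filename: str) -> str:
    """Return the module name implied by an extension file name."""
    base = os.path.basename(filename)
    return base.split(".", 1)[0]


def expected_init_symbol(filename: str) -> str:
    """Return the init function name the importer will look for."""
    return "PyInit_" + module_name_from_filename(filename)


print(expected_init_symbol("spam.so"))
# PyInit_spam
print(expected_init_symbol("spammodule.cpython-38-x86_64-linux-gnu.so"))
# PyInit_spammodule
```

If the C source defines PyInit_spammodule but the built file is still named spam...so, the two no longer agree, which matches the ImportError shown earlier.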
As for the first argument of distutils.core.setup, see Distutils Examples - Pure Python distribution:
Note that the name of the distribution is specified independently with the name option, and there’s no rule that says it has to be the same as the name of the sole module in the distribution
It determines the name of the distribution (package) at release time, so it may differ from the module name.
Following How to run .so files using through python script, load the .so file with cdll.LoadLibrary:
>>> from ctypes import cdll, c_char_p, c_wchar_p
>>> spam = cdll.LoadLibrary("spam.cpython-38-x86_64-linux-gnu.so")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: spam.cpython-38-x86_64-linux-gnu.so: cannot open shared object file: No such file or directory
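The .so in the current directory is not found, but prefixing the name with ./ fixes this. A name without a slash is looked up along the dynamic loader's library search path, which does not include the current directory by default, whereas a name containing a slash is treated as a path: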
>>> spam = cdll.LoadLibrary("./spam.cpython-38-x86_64-linux-gnu.so")
>>> spam = cdll.LoadLibrary("./spam.so")
Calling spam.system with a string argument passed in directly then produces an error:
>>> spam.system("ls")
sh: 1: l: not found
32512
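The number 32512 is not arbitrary. Assuming the conventional Linux encoding of the status returned by system, the exit code sits in the high byte and the terminating signal in the low bits, so the value can be decoded as below. The helper decode_wait_status is a hypothetical name for this sketch.

```python
def decode_wait_status(status: int):
    """Split a POSIX-style wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: signal that killed it, if any
    return exit_code, signal_number


print(decode_wait_status(32512))  # (127, 0)
print(decode_wait_status(0))      # (0, 0)
```

Exit code 127 is what the shell reports for a command it cannot find. Note that such a status is positive, so a check of the form sts < 0 would not treat it as a failure.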
Consulting the ctypes documentation:
None, integers, bytes objects and (unicode) strings are the only native Python objects that can directly be used as parameters in these function calls. None is passed as a C NULL pointer, bytes objects and strings are passed as pointer to the memory block that contains their data (char * or wchar_t *). Python integers are passed as the platforms default C int type, their value is masked to fit into the C type.
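This shows that only None, integers, bytes, and (unicode) strings can be passed directly as arguments. A str is passed as wchar_t *, so a function expecting char * reads only the first character before hitting a zero byte, which is why the shell reported l: not found. Other types have to be converted with the types listed under Fundamental data types.
Note that although c_char_p represents a char*, it only accepts a bytes object; passing a str raises an error: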
>>> c_char_p("ls")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bytes or integer address expected instead of str instance
c_wchar_p, on the other hand, does accept a str, but it represents a wchar_t*, so the call fails in the same way:
>>> spam.system(c_wchar_p("ls"))
sh: 1: l: not found
32512
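Why only the letter l survives can be reproduced without any C code. The sketch below assumes a 4-byte little-endian wchar_t, as on typical Linux systems, and the helper as_seen_by_c is a hypothetical name: it lays the text out as wide characters and then reads it the way a char* consumer would, stopping at the first zero byte.

```python
def as_seen_by_c(text: str) -> bytes:
    """Return what a char* reader sees when handed a wchar_t* buffer."""
    # Wide-character layout plus the terminating wide NUL.
    wide = text.encode("utf-32-le") + b"\x00\x00\x00\x00"
    # A C string ends at the first zero byte.
    return wide.split(b"\x00", 1)[0]


print("ls".encode("utf-32-le"))  # b'l\x00\x00\x00s\x00\x00\x00'
print(as_seen_by_c("ls"))        # b'l'
```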
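Following Different behaviour of ctypes c_char_p?, the correct approach is to convert the Python string to a bytes object first and pass it to c_char_p to obtain a char* before making the call. (Since spam_system is declared static, the system symbol that ctypes resolves here is presumably the C library's system, reached through the shared object's dependencies, rather than the extension function itself.) There are two ways to do the conversion, following 【Python】隨記:印出來常見「b」,但究竟什麼是b呢?: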
>>> spam.system(c_char_p(b"ls"))
build demo setup.py spam.c spam.cpython-38-x86_64-linux-gnu.so system.c test.py
0
Or:
>>> spam.system(c_char_p("ls".encode('utf-8')))
build demo setup.py spam.c spam.cpython-38-x86_64-linux-gnu.so system.c test.py
0
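One way to make this kind of mistake fail loudly is to declare the argument and return types on the foreign function, so that ctypes rejects a str instead of silently passing a wide-character pointer. The sketch below is an assumption-laden illustration: it loads the C library's system through ctypes.CDLL(None), which works on Linux and macOS but not on Windows, and the wrapper run is a hypothetical helper.

```python
import ctypes

# Assumption: a POSIX platform, where CDLL(None) exposes the C library symbols.
libc = ctypes.CDLL(None)
libc.system.argtypes = [ctypes.c_char_p]  # expects a bytes object (char *)
libc.system.restype = ctypes.c_int


def run(command: str) -> int:
    """Encode the command to bytes and pass it to the C system function."""
    return libc.system(command.encode("utf-8"))


print(run("exit 0"))  # 0

try:
    libc.system("exit 0")  # a str is now rejected up front
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```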
What is the difference between prefixing a string with b and appending .encode('utf-8')? Let's experiment:
>>> s = "str"
>>> s.encode("utf-8")
b'str'
>>> s.encode("ascii")
b'str'
>>> s.encode("utf-8") == b'str'
True
>>> s.encode("ascii") == b'str'
True
>>> s.encode("utf-8") == s.encode("ascii")
True
>>> s = "str"
>>> c_char_p(s.encode('utf-8'))
c_char_p(140096259724976)
>>> c_char_p(b"str")
c_char_p(140096243429648)
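The strings encoded with UTF-8 and with ASCII are both equal to the b-prefixed literal. Why do the two encodings give the same byte string? Because the first 128 code points are encoded in UTF-8 exactly as in ASCII, so for a pure-English string both encodings produce the same bytes.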
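The two encodings only diverge once a character outside the ASCII range appears. A minimal sketch with a sample string of my own choosing:

```python
text = "naïve"  # contains one non-ASCII character

utf8_bytes = text.encode("utf-8")
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_bytes)  # b'na\xc3\xafve'
print(ascii_ok)    # False
print("str".encode("utf-8") == "str".encode("ascii") == b"str")  # True
```

UTF-8 spends two bytes on ï, while the ASCII codec cannot represent it at all; a bytes literal written with the b prefix is likewise limited to ASCII characters.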