The Python GIL and Its Release/Acquire Functions

Introduction

Python uses the CPython interpreter by default, and CPython has a GIL. See GlobalInterpreterLock:

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.

In the CPython interpreter, the GIL (global interpreter lock) is a mutex that protects Python objects from being accessed by multiple threads at the same time.

In hindsight, the GIL is not ideal, since it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Luckily, many potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck.

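When a multithreaded program spends most of its time inside the GIL, interpreting CPython bytecode, the GIL prevents it from taking full advantage of a multicore system.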
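As a rough illustration of the quoted point about blocking operations, the sketch below runs the same blocking call in several Python threads. It assumes time.sleep as a stand-in for a blocking operation that releases the GIL; the worker count and sleep length are arbitrary choices, not values from the original text. Because the GIL is released while each thread waits, the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLEEP_SECONDS = 0.2   # arbitrary stand-in for a blocking call
NUM_WORKERS = 4       # arbitrary thread count

def blocking_task(_):
    # time.sleep releases the GIL while waiting, so other threads can run.
    time.sleep(SLEEP_SECONDS)
    return True

def run_threads():
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        done = list(executor.map(blocking_task, range(NUM_WORKERS)))
    elapsed = time.perf_counter() - start
    return done, elapsed

if __name__ == "__main__":
    done, elapsed = run_threads()
    # Serial execution would take about NUM_WORKERS * SLEEP_SECONDS.
    print(f"{len(done)} tasks finished in {elapsed:.2f}s")
```

A CPU-bound pure-Python function in place of blocking_task would not overlap this way, which is the bottleneck case described above.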

Fortunately, pybind11 provides a mechanism for releasing the GIL. See pybind11 documentation - Global Interpreter Lock (GIL):

The classes gil_scoped_release and gil_scoped_acquire can be used to acquire and release the global interpreter lock in the body of a C++ function call. In this way, long-running C++ code can be parallelized using multiple Python threads, 

pybind11 provides an API for releasing and reacquiring the GIL, namely the two classes gil_scoped_release and gil_scoped_acquire. They can be used to parallelize long-running C++ code.

Because gil_scoped_release and gil_scoped_acquire call PyEval_SaveThread, PyEval_RestoreThread, PyEval_AcquireThread, and PyThreadState_Clear under the hood, we look at these low-level APIs first.

PyEval_SaveThread/PyEval_RestoreThread

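PyEval_SaveThread and PyEval_RestoreThread are called by the constructor and destructor of gil_scoped_release, respectively. PyEval_AcquireThread is called by the constructor of gil_scoped_acquire; its destructor calls PyThreadState_Clear (through dec_ref) and then PyEval_SaveThread.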

PyEval_SaveThread

See the official documentation for PyEval_SaveThread:

PyThreadState *PyEval_SaveThread()
Part of the Stable ABI.
Release the global interpreter lock (if it has been created) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it.

PyEval_SaveThread releases the GIL, resets the current thread state to NULL, and returns the previous thread state.

PyEval_RestoreThread

See the official documentation for PyEval_RestoreThread:

void PyEval_RestoreThread(PyThreadState *tstate)
Part of the Stable ABI.
Acquire the global interpreter lock (if it has been created) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues.

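PyEval_RestoreThread acquires the GIL and sets the thread state to tstate. Comparing with the gil_scoped_release code shows that tstate is the return value of PyEval_SaveThread, that is, the thread state from before the GIL was released.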

PyEval_AcquireThread

See the official documentation for PyEval_AcquireThread:

void PyEval_AcquireThread(PyThreadState *tstate)
Part of the Stable ABI.
Acquire the global interpreter lock and set the current thread state to tstate, which must not be NULL. The lock must have been created earlier. If this thread already has the lock, deadlock ensues.

Note Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use _Py_IsFinalizing() or sys.is_finalizing() to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination.
Changed in version 3.8: Updated to be consistent with PyEval_RestoreThread(), Py_END_ALLOW_THREADS(), and PyGILState_Ensure(), and terminate the current thread if called while the interpreter is finalizing.

PyEval_RestoreThread() is a higher-level function which is always available (even when threads have not been initialized).

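Acquires the GIL and sets the current thread state to tstate (which must not be NULL). The GIL must have been created earlier. If this thread already holds the GIL, a deadlock results.

Its role is similar to that of PyEval_RestoreThread.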

PyThreadState_Clear

See the official documentation for PyThreadState_Clear:

void PyThreadState_Clear(PyThreadState *tstate)
Part of the Stable ABI.
Reset all information in a thread state object. The global interpreter lock must be held.

Changed in version 3.9: This function now calls the PyThreadState.on_delete callback. Previously, that happened in PyThreadState_Delete().

Resets all information in the thread state object tstate. The GIL must be held.

gil_scoped_release/gil_scoped_acquire

See pybind11 documentation - Global Interpreter Lock (GIL):

but great care must be taken when any gil_scoped_release appear: if there is any way that the C++ code can access Python objects, gil_scoped_acquire should be used to reacquire the GIL. 

After a gil_scoped_release, any C++ code that wants to access Python objects must first reacquire the GIL with gil_scoped_acquire.

This is what the example given in How to use pybind11 in multithreaded application does:

pybind11::gil_scoped_release release;

while (true)
{
    // do something and break
}

pybind11::gil_scoped_acquire acquire;

gil_scoped_release

To verify this, look at the constructor of gil_scoped_release in pybind11/gil.h:

constructor

    explicit gil_scoped_release(bool disassoc = false) : disassoc(disassoc) {
        // `get_internals()` must be called here unconditionally in order to initialize
        // `internals.tstate` for subsequent `gil_scoped_acquire` calls. Otherwise, an
        // initialization race could occur as multiple threads try `gil_scoped_acquire`.
        auto &internals = detail::get_internals();
        // NOLINTNEXTLINE(cppcoreguidelines-prefer-member-initializer)
        tstate = PyEval_SaveThread();
        if (disassoc) {
            // Python >= 3.7 can remove this, it's an int before 3.7
            // NOLINTNEXTLINE(readability-qualified-auto)
            auto key = internals.tstate;
            PYBIND11_TLS_DELETE_VALUE(key);
        }
    }

It calls PyEval_SaveThread, which releases the GIL.

destructor

    ~gil_scoped_release() {
        if (!tstate) {
            return;
        }
        // `PyEval_RestoreThread()` should not be called if runtime is finalizing
        if (active) {
            PyEval_RestoreThread(tstate);
        }
        if (disassoc) {
            // Python >= 3.7 can remove this, it's an int before 3.7
            // NOLINTNEXTLINE(readability-qualified-auto)
            auto key = detail::get_internals().tstate;
            PYBIND11_TLS_REPLACE_VALUE(key, tstate);
        }
    }

The destructor, in turn, calls PyEval_RestoreThread, which reacquires the GIL. This is the RAII idiom.

As a post on the official pybind11 discussion forum puts it:

Maybe this will help you: https://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock
py::gil_scoped_release is an RAII version of Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, while py::gil_scoped_acquire is an RAII version of PyGILState_Ensure/PyGILState_Release.

gil_scoped_release is the RAII version of PyEval_SaveThread/PyEval_RestoreThread: when its destructor runs, it calls PyEval_RestoreThread and reacquires the GIL.

Note: the py namespace here is pybind11; see Header and namespace conventions.

The documentation gives the following example, which uses only gil_scoped_release and no gil_scoped_acquire:

    m.def("call_go", [](Animal *animal) -> std::string {
        // GIL is held when called from Python code. Release GIL before
        // calling into (potentially long-running) C++ code
        py::gil_scoped_release release;
        return call_go(animal);
    });

This works because the destructor of gil_scoped_release runs when the lambda returns and automatically reacquires the GIL, so there is no need to create a gil_scoped_acquire.
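The scope-based behaviour can be mimicked in pure Python. The following is only a toy model, not pybind11 code: a threading.Lock named fake_gil stands in for the GIL, and a hypothetical context manager scoped_release plays the role of gil_scoped_release, releasing the lock on entry and reacquiring it on exit, even if the body raises.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the GIL; the real GIL is not a threading.Lock.
fake_gil = threading.Lock()

@contextmanager
def scoped_release(lock):
    """Toy analogue of gil_scoped_release: release on entry, reacquire on exit."""
    lock.release()          # like PyEval_SaveThread() in the constructor
    try:
        yield
    finally:
        lock.acquire()      # like PyEval_RestoreThread() in the destructor

def call_go():
    # The caller holds the lock, just as Python code holds the GIL.
    states = []
    with scoped_release(fake_gil):
        states.append(fake_gil.locked())   # released inside the scope
    states.append(fake_gil.locked())       # reacquired on scope exit
    return states

if __name__ == "__main__":
    fake_gil.acquire()
    print(call_go())
    fake_gil.release()
```

The caller never reacquires the lock explicitly; leaving the with block does it, which mirrors why the documentation example needs no gil_scoped_acquire.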

gil_scoped_acquire

constructor

    PYBIND11_NOINLINE gil_scoped_acquire() {
        auto &internals = detail::get_internals();
        tstate = (PyThreadState *) PYBIND11_TLS_GET_VALUE(internals.tstate);

        if (!tstate) {
            /* Check if the GIL was acquired using the PyGILState_* API instead (e.g. if
               calling from a Python thread). Since we use a different key, this ensures
               we don't create a new thread state and deadlock in PyEval_AcquireThread
               below. Note we don't save this state with internals.tstate, since we don't
               create it we would fail to clear it (its reference count should be > 0). */
            tstate = PyGILState_GetThisThreadState();
        }

        if (!tstate) {
            tstate = PyThreadState_New(internals.istate);
#        if defined(PYBIND11_DETAILED_ERROR_MESSAGES)
            if (!tstate) {
                pybind11_fail("scoped_acquire: could not create thread state!");
            }
#        endif
            tstate->gilstate_counter = 0;
            PYBIND11_TLS_REPLACE_VALUE(internals.tstate, tstate);
        } else {
            release = detail::get_thread_state_unchecked() != tstate;
        }

        if (release) {
            PyEval_AcquireThread(tstate);
        }

        inc_ref();
    }

The PyEval_AcquireThread call here acquires the GIL.

destructor

    PYBIND11_NOINLINE ~gil_scoped_acquire() {
        dec_ref();
        if (release) {
            PyEval_SaveThread();
        }
    }

where dec_ref is:

    PYBIND11_NOINLINE void dec_ref() {
        --tstate->gilstate_counter;
#        if defined(PYBIND11_DETAILED_ERROR_MESSAGES)
        if (detail::get_thread_state_unchecked() != tstate) {
            pybind11_fail("scoped_acquire::dec_ref(): thread state must be current!");
        }
        if (tstate->gilstate_counter < 0) {
            pybind11_fail("scoped_acquire::dec_ref(): reference count underflow!");
        }
#        endif
        if (tstate->gilstate_counter == 0) {
#        if defined(PYBIND11_DETAILED_ERROR_MESSAGES)
            if (!release) {
                pybind11_fail("scoped_acquire::dec_ref(): internal error!");
            }
#        endif
            PyThreadState_Clear(tstate);
            if (active) {
                PyThreadState_DeleteCurrent();
            }
            PYBIND11_TLS_DELETE_VALUE(detail::get_internals().tstate);
            release = false;
        }
    }

The PyThreadState_Clear call resets all information in the thread state object tstate.

PyEval_SaveThread, called in the destructor, then releases the GIL.

Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS

The forum explanation also mentions Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.

Py_BEGIN_ALLOW_THREADS

Py_BEGIN_ALLOW_THREADS:

Py_BEGIN_ALLOW_THREADS
Part of the Stable ABI.
This macro expands to { PyThreadState *_save; _save = PyEval_SaveThread();. Note that it contains an opening brace; it must be matched with a following Py_END_ALLOW_THREADS macro. See above for further discussion of this macro.

Declares a PyThreadState pointer named _save, calls PyEval_SaveThread, and stores the returned thread state in it. It must be paired with Py_END_ALLOW_THREADS.

Py_END_ALLOW_THREADS

Py_END_ALLOW_THREADS:

Py_END_ALLOW_THREADS
Part of the Stable ABI.
This macro expands to PyEval_RestoreThread(_save); }. Note that it contains a closing brace; it must be matched with an earlier Py_BEGIN_ALLOW_THREADS macro. See above for further discussion of this macro.

Calls PyEval_RestoreThread with the saved thread state.

Releasing the GIL from extension code gives the following example:

Most extension code manipulating the GIL has the following simple structure:

Save the thread state in a local variable.
Release the global interpreter lock.
... Do some blocking I/O operation ...
Reacquire the global interpreter lock.
Restore the thread state from the local variable.

This is so common that a pair of macros exists to simplify it:

Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS

In other words, wrap the C++ code that should run in multiple threads between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS.

gil_scoped_release calls PyEval_SaveThread in its constructor and PyEval_RestoreThread in its destructor, which is why the forum post calls it the RAII version of Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS.

PyGILState_Ensure/PyGILState_Release

The forum post says that gil_scoped_acquire is the RAII version of PyGILState_Ensure and PyGILState_Release.

PyGILState_Ensure

See the official documentation for PyGILState_Ensure:

PyGILState_STATE PyGILState_Ensure()
Part of the Stable ABI.
Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of the global interpreter lock. This may be called as many times as desired by a thread as long as each call is matched with a call to PyGILState_Release(). In general, other thread-related APIs may be used between PyGILState_Ensure() and PyGILState_Release() calls as long as the thread state is restored to its previous state before the Release(). For example, normal usage of the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is acceptable.

The return value is an opaque “handle” to the thread state when PyGILState_Ensure() was called, and must be passed to PyGILState_Release() to ensure Python is left in the same state. Even though recursive calls are allowed, these handles cannot be shared - each unique call to PyGILState_Ensure() must save the handle for its call to PyGILState_Release().

When the function returns, the current thread will hold the GIL and be able to call arbitrary Python code. Failure is a fatal error.

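PyGILState_Ensure makes the current thread hold the GIL and returns an opaque handle to the thread state at the time of the call. That handle must later be passed to the matching PyGILState_Release call.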

PyGILState_Release

See the official documentation for PyGILState_Release:

void PyGILState_Release(PyGILState_STATE)
Part of the Stable ABI.
Release any resources previously acquired. After this call, Python’s state will be the same as it was prior to the corresponding PyGILState_Ensure() call (but generally this state will be unknown to the caller, hence the use of the GILState API).

Every call to PyGILState_Ensure() must be matched by a call to PyGILState_Release() on the same thread.

Releases any resources previously acquired.

The constructor of gil_scoped_acquire calls PyEval_AcquireThread, which plays a role similar to PyGILState_Ensure; its destructor calls PyEval_SaveThread, which plays a role similar to PyGILState_Release.
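The pairing rule for PyGILState_Ensure and PyGILState_Release can be tried from Python itself through ctypes.pythonapi, which calls C API functions while holding the GIL. This is a sketch for experimentation, assuming a regular GIL-enabled CPython build; it is not how extension code would normally use these functions, and the helper name with_gilstate is made up for this example.

```python
import ctypes

api = ctypes.pythonapi  # PyDLL handle: calls are made with the GIL held

# Declare the signatures of the C API functions used below.
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Ensure.argtypes = []
api.PyGILState_Release.restype = None
api.PyGILState_Release.argtypes = [ctypes.c_int]
api.PyGILState_Check.restype = ctypes.c_int
api.PyGILState_Check.argtypes = []

def with_gilstate(depth):
    """Nest `depth` Ensure calls, then release them in reverse order."""
    handles = []
    for _ in range(depth):
        # Each call returns its own opaque handle, which must be kept.
        handles.append(api.PyGILState_Ensure())
    held = api.PyGILState_Check()  # nonzero if this thread holds the GIL
    while handles:
        # Every Ensure is matched by exactly one Release with its handle.
        api.PyGILState_Release(handles.pop())
    return held

if __name__ == "__main__":
    print("GIL held inside Ensure/Release:", with_gilstate(3))
```

Keeping one handle per Ensure call and releasing them in reverse order follows the rule in the quoted documentation that handles cannot be shared between calls.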

An example from PyTorch

PyTorch's torch/csrc/autograd/generated/python_torch_functions_0.cpp contains:

        auto dispatch_rand = [](c10::SymIntArrayRef size, at::TensorOptions options) -> at::Tensor {
          pybind11::gil_scoped_release no_gil;
          return torch::rand_symint(size, options);
        };
        return wrap(dispatch_rand(_r.symintlist(0), options));

Note that only gil_scoped_release is used here, not gil_scoped_acquire. No code inside the lambda after torch::rand_symint needs to access Python objects, and because gil_scoped_release is RAII, the GIL is reacquired automatically in its destructor.

Examples

PyEval_SaveThread/PyEval_RestoreThread

Following Releasing the GIL from extension code and pybind11 multithreading parallellism in python, write spam.c as follows:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

long fib_raw(long n) {
    if (n < 2) {
        return 1;
    }
    return fib_raw(n-2) + fib_raw(n-1);
}

long fib_cpp(long n) {
    long res = 0;
    
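#ifdef SAVE_THREAD
    /* Release the GIL before the long-running computation. */
    PyThreadState *_save;

    _save = PyEval_SaveThread();
#endif
    res = fib_raw(n);
#ifdef SAVE_THREAD
    /* Reacquire the GIL before returning to Python. */
    PyEval_RestoreThread(_save);
#endif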
    
    return res;
}

static PyObject*
fib(PyObject* self, PyObject* args) {
    long n;

    if(!PyArg_ParseTuple(args, "l", &n))
        return NULL;
    
    return PyLong_FromLong(fib_cpp(n));
}

static PyMethodDef SpamMethods[] = {
    {"fib",
     fib,
     METH_VARARGS,
     "Compute the n-th Fibonacci number."},
    {NULL, NULL, 0, NULL}
};

PyDoc_STRVAR(spam_doc, "Spam module that calculates fib.");

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",
    spam_doc,
    -1,
    SpamMethods
};

PyMODINIT_FUNC
PyInit_spam(void){
    PyObject* m;

    m = PyModule_Create(&spammodule);
    if(m == NULL)
        return NULL;

    return m;
}

int main(int argc, char* argv[]){
    wchar_t* program = Py_DecodeLocale(argv[0], NULL);
    if(program == NULL){
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }

    if(PyImport_AppendInittab("spam", PyInit_spam) == -1){
        fprintf(stderr, "Error: could not extend in-built modules table\n");
        exit(1);
    }

    Py_SetProgramName(program);

    Py_Initialize();

    PyObject* pmodule = PyImport_ImportModule("spam");
    if(!pmodule){
        PyErr_Print();
        fprintf(stderr, "Error: could not import module 'spam'\n");
    }

    PyMem_RawFree(program);
    return 0;
}

Build one shared library with SAVE_THREAD defined and one without:

g++ -DSAVE_THREAD -shared $(python3.8-config --includes) $(python3.8-config --ldflags) -o spam_save_shared.so -fPIC spam.c
g++ -shared $(python3.8-config --includes) $(python3.8-config --ldflags) -o spam_save_shared_raw.so -fPIC spam.c

Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS

Write spam.c as follows:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

long fib_raw(long n) {
    if (n < 2) {
        return 1;
    }
    return fib_raw(n-2) + fib_raw(n-1);
}

long fib_cpp(long n) {
    long res = 0;
    
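#ifdef ALLOW_THREADS
    /* Opens a block and releases the GIL via PyEval_SaveThread(). */
    Py_BEGIN_ALLOW_THREADS
#endif
    res = fib_raw(n);
#ifdef ALLOW_THREADS
    /* Reacquires the GIL via PyEval_RestoreThread() and closes the block. */
    Py_END_ALLOW_THREADS
#endif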
    
    return res;
}

static PyObject*
fib(PyObject* self, PyObject* args) {
    long n;

    if(!PyArg_ParseTuple(args, "l", &n))
        return NULL;
    
    return PyLong_FromLong(fib_cpp(n));
}

static PyMethodDef SpamMethods[] = {
    {"fib",
     fib,
     METH_VARARGS,
     "Compute the n-th Fibonacci number."},
    {NULL, NULL, 0, NULL}
};

PyDoc_STRVAR(spam_doc, "Spam module that calculates fib.");

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",
    spam_doc,
    -1,
    SpamMethods
};

PyMODINIT_FUNC
PyInit_spam(void){
    PyObject* m;

    m = PyModule_Create(&spammodule);
    if(m == NULL)
        return NULL;

    return m;
}

int main(int argc, char* argv[]){
    wchar_t* program = Py_DecodeLocale(argv[0], NULL);
    if(program == NULL){
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }

    if(PyImport_AppendInittab("spam", PyInit_spam) == -1){
        fprintf(stderr, "Error: could not extend in-built modules table\n");
        exit(1);
    }

    Py_SetProgramName(program);

    Py_Initialize();

    PyObject* pmodule = PyImport_ImportModule("spam");
    if(!pmodule){
        PyErr_Print();
        fprintf(stderr, "Error: could not import module 'spam'\n");
    }

    PyMem_RawFree(program);
    return 0;
}

Build one shared library with ALLOW_THREADS defined and one without:

g++ -DALLOW_THREADS -shared $(python3.8-config --includes) $(python3.8-config --ldflags) -o spam_allow_threads.so -fPIC spam.c
g++ -shared $(python3.8-config --includes) $(python3.8-config --ldflags) -o spam_raw.so -fPIC spam.c

gil_scoped_release/gil_scoped_acquire

Following pybind11 documentation - First steps and pybind11 documentation - Global Interpreter Lock (GIL), rewrite the example above as a pybind11 version that uses pybind11::gil_scoped_release:

#include <pybind11/pybind11.h>

long fib_raw(long n) {
    if (n < 2) {
        return 1;
    }
    return fib_raw(n-2) + fib_raw(n-1);
}

long fib(long n) {
    long res = 0;
    
#ifdef NOGIL1
    pybind11::gil_scoped_release no_gil;
#endif
    res = fib_raw(n);
    
    return res;
}

PYBIND11_MODULE(spam, m) {
    m.doc() = "pybind11 fibonacci plugin"; // optional module docstring

#ifdef NOGIL
    m.def("fib", [](long n) -> long {
        // GIL is held when called from Python code. Release GIL before
        // calling into (potentially long-running) C++ code
        pybind11::gil_scoped_release release;
        return fib(n);
    }, "A function that calculates fibonacci");
#elif defined(NOGIL2)
    // call_guard constructs the guard before fib runs and destroys it afterwards
    m.def("fib", &fib, pybind11::call_guard<pybind11::gil_scoped_release>(), "A function that calculates fibonacci");
#else
    m.def("fib", &fib, "A function that calculates fibonacci");
#endif
}

As the pybind11 documentation notes:

pybind11 is a header-only library, hence it is not necessary to link against any special libraries and there are no intermediate (magic) translation steps.

Build one shared library with NOGIL defined and one without:

g++ -DNOGIL -shared $(python3.8-config --includes) $(python3 -m pybind11 --includes) $(python3.8-config --ldflags) -o spam_pybind11_nogil.so -fPIC spam.c
g++ -shared $(python3.8-config --includes) $(python3 -m pybind11 --includes) $(python3.8-config --ldflags) -o spam_pybind11_raw.so -fPIC spam.c

Calling the module

Pick one of the spam_xxx.so files, copy it to spam.so, and write run.py as follows:

import time
import spam
import sys
from concurrent.futures import ThreadPoolExecutor

if __name__ == "__main__":
    max_workers = 5
    num = 40
    results = []
    
    start = time.time()
    if len(sys.argv) > 1 and sys.argv[1]:
        print("multiple thread")
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            results = executor.map(spam.fib, [num] * max_workers)
    else:
        print("single thread")
        for _ in range(max_workers):
            results.append(spam.fib(num))
    end = time.time()
    print(end - start)
    print(list(results))

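Depending on whether a command-line argument is passed, this script calls the C++ function from multiple threads or from a single thread.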

Results

The output looks like this:

$ python run.py 1
multiple thread
1.391195297241211
[165580141, 165580141, 165580141, 165580141, 165580141]
$ python run.py 
single thread
3.9591684341430664
[165580141, 165580141, 165580141, 165580141, 165580141]
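The value 165580141 can be cross-checked without building the extension. The sketch below is an iterative Python version that follows the same convention as fib_raw (it returns 1 for n < 2); the function name fib_iter is introduced here only for illustration.

```python
def fib_iter(n):
    """Iterative equivalent of fib_raw: fib(0) = fib(1) = 1."""
    prev, curr = 1, 1          # fib(0), fib(1)
    for _ in range(n - 1):
        prev, curr = curr, prev + curr
    return curr

if __name__ == "__main__":
    # Same argument as `num = 40` in run.py.
    print(fib_iter(40))
```

Because fib_raw starts from 1 rather than 0, its result for n is the (n+1)-th number of the usual Fibonacci sequence that starts with 0 and 1.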

We have three different ways of releasing the GIL. Their run times, in seconds, are summarized below:

Configuration              Single thread   Multiple threads
PyEval_SaveThread off      3.976           4.018
PyEval_SaveThread on       3.959           1.391
ALLOW_THREADS off          4.036           4.079
ALLOW_THREADS on           4.025           1.453
gil_scoped_release off     4.088           4.041
gil_scoped_release on      4.031           1.464

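As the numbers show, with PyEval_SaveThread, ALLOW_THREADS, or gil_scoped_release turned off, single-threaded and multithreaded runs take about the same time. With the GIL released and 5 threads, the multithreaded run is nearly 3 times faster (for example, 1.391 s instead of 4.018 s).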
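The speedup figures can be recomputed from the measurements. The snippet below simply divides the multithreaded time with the GIL held by the multithreaded time with the GIL released, using the numbers reported above; the dictionary layout is just one convenient way to hold them.

```python
# Multithreaded run times in seconds, copied from the measurements above:
# (GIL held, GIL released) for each mechanism.
timings = {
    "PyEval_SaveThread": (4.018, 1.391),
    "ALLOW_THREADS": (4.079, 1.453),
    "gil_scoped_release": (4.041, 1.464),
}

def speedups(table):
    """Return the held/released ratio for each mechanism."""
    return {name: held / released for name, (held, released) in table.items()}

if __name__ == "__main__":
    for name, ratio in speedups(timings).items():
        print(f"{name}: {ratio:.2f}x")
```

All three ratios fall between roughly 2.76 and 2.89, which is where the "nearly 3 times" figure comes from.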
