大师兄的Python源码学习笔记(十一): Python的虚拟机框架

大师兄的Python源码学习笔记(十): Python的编译过程
大师兄的Python源码学习笔记(十二): Python虚拟机中的一般表达式(一)

一、关于字节码虚拟机

  • 字节码虚拟机是Python的核心。
  • 在源码被编译为字节码指令序列后,字节码虚拟机将接手整个工作。
  • 字节码虚拟机会从编译得到的PyCodeObject对象中依次读入并执行每一条字节码指令

二、可执行文件的运作方式

1. 可执行文件在x86机器上的大致运行原理
void a(int n)
{
    printf("%d\n",n);
}

void b()
{
    a(1);
}

int main()
{
    b();
}
  • 当程序进入a()时,调用者的栈b()的栈帧,当前帧a()的栈帧。
  • 函数所有的局部变量操作都在自己的栈帧中完成,函数之间的调用则通过创建新的栈帧完成。
  • 运行时栈是从地址空间的高地址向低地址延伸的,当b()调用a()时,系统就会在地址空间中,b()的栈帧之后创建a()的栈帧,并在a()中保存b()栈指针esp帧指针ebp
  • a()执行完成后,系统会把espebp的值恢复为创建a()的栈帧之前的值,这样程序流程又回到b()中,程序工作的空间又回到了b()的栈帧中。
2. 关于执行环境
  • 在Python中,PyCodeObject中储存着所有字节码指令和静态信息,但不包含程序运行的动态信息,即执行环境
>>>i = 1
>
>>>def a():
>>>    i = 2
>>>    print(i)
>
>>>a()
>>>print(i)
2
1
  • 上面的代码之所以i的值不同,是因为它们在不同的命名空间,而命名空间就是执行环境的一部分。
  • 在开始执行.py程序时,Python会建立一个执行环境A,当函数调用时,会重新创建一个新的执行环境B,B实际就是一个新的栈帧。
  • 所以,在Python真正执行的时候,对应的不是一个PyCodeObject,而是一个执行环境,即PyFrameObject

三、关于PyFrameObject

1. PyFrameObject源码
Include\frameobject.h

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */
    char f_trace_lines;         /* Emit per-line trace events? */
    char f_trace_opcodes;       /* Emit per-opcode trace events? */

    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;
参数 含义
*f_back 执行环境链的前一个frame
*f_code PyCodeObject对象
*f_builtins 内置命名空间
*f_globals global命名空间
*f_locals local命名空间
**f_valuestack 栈底
**f_stacktop 栈顶
f_lasti 上一条字节码指令在f_code中的偏移位置
f_lineno 当前字节码对应的源代码行
f_executing 正在运行的帧位置
*f_localsplus[1] 动态所需空间
  • 包含一个PyObject_VAR_HEAD,表示这是一个变长对象。

  • f_back参数可以看出,许多PyFrameObject形成了一个链表结构,这是模拟栈帧关系中的esp和ebp指针。

  • f_code中存放了一个待执行的PyCodeObject对象。

  • *f_builtins*f_globals*f_locals分别维护着builtin、global和local的键值对对应关系。

  • 类型对象如下:

Objects\frameobject.c

PyTypeObject PyFrame_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "frame",
    sizeof(PyFrameObject),
    sizeof(PyObject *),
    (destructor)frame_dealloc,                  /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_reserved */
    (reprfunc)frame_repr,                       /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    PyObject_GenericSetAttr,                    /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
    0,                                          /* tp_doc */
    (traverseproc)frame_traverse,               /* tp_traverse */
    (inquiry)frame_tp_clear,                    /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    frame_methods,                              /* tp_methods */
    frame_memberlist,                           /* tp_members */
    frame_getsetlist,                           /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
};
2. PyFrameObject动态内存空间
  • PyFrameObject源码中,我们看到*f_localsplus[1]维护动态所需空间。
  • 从创建PyFrameObject的过程可见,这段内存不只给栈使用。
Objects\frameobject.c

PyFrameObject*
PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
            PyObject *globals, PyObject *locals)
{
    PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
    if (f)
        _PyObject_GC_TRACK(f);
    return f;
}

PyFrameObject* _Py_HOT_FUNCTION
_PyFrame_New_NoTrack(PyThreadState *tstate, PyCodeObject *code,
                     PyObject *globals, PyObject *locals)
{
    PyFrameObject *back = tstate->frame;
    PyFrameObject *f;
... ...
        Py_ssize_t extras, ncells, nfrees;
        ncells = PyTuple_GET_SIZE(code->co_cellvars);
        nfrees = PyTuple_GET_SIZE(code->co_freevars);
        extras = code->co_stacksize + code->co_nlocals + ncells +
            nfrees;
        if (free_list == NULL) {
            f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type,
            extras);
            if (f == NULL) {
                Py_DECREF(builtins);
                return NULL;
            }
        }
... ...
    f->f_stacktop = f->f_valuestack;
... ...
}
  • 可以看到由code->co_stacksizecode->co_nlocalsncellsnfrees四部分构成了维护的动态内存区,与闭包实现相关,其大小由extra确定,而另一部分才是给运行栈使用的。
  • 所以PyFrameObject对象的栈底由f_valuestack维护,栈顶由f_stacktop维护。
3. 在Python中访问PyFrameObject对象
  • 在Python中可以使用sys._getframe()函数获得当前调用函数的函数信息。
>>>import sys
>
>>>def sample():
>>>    f= sys._getframe()
>>>    for a in dir(f):
>>>        print(a,':',eval(f"f.{a}"))
>
>>>if __name__ == '__main__':
>>>    sample()
__class__ : 
__delattr__ : 
__dir__ : 
__doc__ : None
__eq__ : 
__format__ : 
__ge__ : 
__getattribute__ : 
__gt__ : 
__hash__ : 
__init__ : 
__init_subclass__ : 
__le__ : 
__lt__ : 
__ne__ : 
__new__ : 
__reduce__ : 
__reduce_ex__ : 
__repr__ : 
__setattr__ : 
__sizeof__ : 
__str__ : 
__subclasshook__ : 
clear : 
f_back : >
f_builtins : {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': , '__spec__': ModuleSpec(name='builtins', loader=), '__build_class__': , '__import__': , 'abs': , 'all': , 'any': , 'ascii': , 'bin': , 'breakpoint': , 'callable': , 'chr': , 'compile': , 'delattr': , 'dir': , 'divmod': , 'eval': , 'exec': , 'format': , 'getattr': , 'globals': , 'hasattr': , 'hash': , 'hex': , 'id': , 'input': , 'isinstance': , 'issubclass': , 'iter': , 'len': , 'locals': , 'max': , 'min': , 'next': , 'oct': , 'ord': , 'pow': , 'print': , 'repr': , 'round': , 'setattr': , 'sorted': , 'sum': , 'vars': , 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': , 'memoryview': , 'bytearray': , 'bytes': , 'classmethod': , 'complex': , 'dict': , 'enumerate': , 'filter': , 'float': , 'frozenset': , 'property': , 'int': , 'list': , 'map': , 'object': , 'range': , 'reversed': , 'set': , 'slice': , 'staticmethod': , 'str': , 'super': , 'tuple': , 'type': , 'zip': , '__debug__': True, 'BaseException': , 'Exception': , 'TypeError': , 'StopAsyncIteration': , 'StopIteration': , 'GeneratorExit': , 'SystemExit': , 'KeyboardInterrupt': , 'ImportError': , 'ModuleNotFoundError': , 'OSError': , 'EnvironmentError': , 'IOError': , 'WindowsError': , 'EOFError': , 'RuntimeError': , 'RecursionError': , 'NotImplementedError': , 'NameError': , 'UnboundLocalError': , 'AttributeError': , 'SyntaxError': , 'IndentationError': , 'TabError': , 'LookupError': , 'IndexError': , 'KeyError': , 'ValueError': , 'UnicodeError': , 'UnicodeEncodeError': , 'UnicodeDecodeError': , 'UnicodeTranslateError': , 'AssertionError': , 'ArithmeticError': , 'FloatingPointError': , 'OverflowError': , 'ZeroDivisionError': , 'SystemError': , 'ReferenceError': , 'MemoryError': , 'BufferError': , 'Warning': , 'UserWarning': , 'DeprecationWarning': , 'PendingDeprecationWarning': , 'SyntaxWarning': , 'RuntimeWarning': , 'FutureWarning': , 'ImportWarning': , 'UnicodeWarning': , 'BytesWarning': , 'ResourceWarning': , 'ConnectionError': , 'BlockingIOError': , 'BrokenPipeError': , 'ChildProcessError': , 'ConnectionAbortedError': , 'ConnectionRefusedError': , 'ConnectionResetError': , 'FileExistsError': , 'FileNotFoundError': , 'IsADirectoryError': , 'NotADirectoryError': , 'InterruptedError': , 'PermissionError': , 'ProcessLookupError': , 'TimeoutError': , 'open': , 'quit': Use quit() or Ctrl-Z plus Return to exit, 'exit': Use exit() or Ctrl-Z plus Return to exit, 'copyright': Copyright (c) 2001-2018 Python Software Foundation.
All Rights Reserved.

Copyright (c) 2000 BeOpen.com.
All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits':     Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
    for supporting Python development.  See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object.}
f_code : 
f_globals : {'__name__': '__main__', '__doc__': '\n@File    :   temp.py    \n@Contact :   [email protected]\n\n@Modify Time      @Author           @Version    @Desciption\n------------      ---------------   --------    -----------\n2021/x/x 16:54   大师兄(superkmi)      1.0         None\n', '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x000002B852DD9518>, '__spec__': None, '__annotations__': {}, '__builtins__': , '__file__': 'xx/temp.py', '__cached__': None, 'sys': , 'sample': }
f_lasti : 38
f_lineno : 18
f_locals : {'f': , 'a': 'f_locals'}
f_trace : None
f_trace_lines : True
f_trace_opcodes : False

四、虚拟机的运行框架

  • Python虚拟机由PyEval_EvalFram函数为入口,调用了一个巨大的函数EvalFrameDefault
ceval.c

/* Interpreter main loop */

PyObject *
PyEval_EvalFrame(PyFrameObject *f) {
    /* This is for backward compatibility with extension modules that
       used this API; core interpreter code should call
       PyEval_EvalFrameEx() */
    return PyEval_EvalFrameEx(f, 0);
}

PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    PyThreadState *tstate = PyThreadState_GET();
    return tstate->interp->eval_frame(f, throwflag);
}

pystate.c

PyInterpreterState *
PyInterpreterState_New(void)
{
    PyInterpreterState *interp = (PyInterpreterState *)
                                 PyMem_RawMalloc(sizeof(PyInterpreterState));

    if (interp == NULL) {
        return NULL;
    }

... ...
    interp->eval_frame = _PyEval_EvalFrameDefault;
... ...
ceval.c

PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
... ...
    co = f->f_code;
    names = co->co_names;
    consts = co->co_consts;
    fastlocals = f->f_localsplus;
    freevars = f->f_localsplus + co->co_nlocals;
    first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
... ...
    next_instr = first_instr;
    if (f->f_lasti >= 0) {
        next_instr += f->f_lasti / sizeof(_Py_CODEUNIT) + 1;
    }
    stack_pointer = f->f_stacktop;
    f->f_stacktop = NULL;       /* remains NULL unless yield suspends frame */
... ...
}
  • EvalFrameDefault中初始化了一些变量,包括PyFrameObjectPyCodeObject对象的重要信息。
  • 同时,也初始化了栈顶指针,指向f->f_stacktop
  • Python虚拟机执行字节码指令序列的过程就是从头到尾遍历整个co_code,而co_codePyCodeObject对象中保存字节码指令和字节码指令的参数,所以这个过程就是在依次执行字节码指令的过程。
  • Python虚拟机利用3个char*类型变量来完成整个遍历过程,first_instr指向字节码指令序列的开始位置,next_instr指向下一条执行的字节码指令位置,f_lasti指向上一条已经执行过的字节码指令的位置。
  • Python虚拟机执行字节码指令的整体架构,是一个for循环加上巨大的switch/case结构:
ceval.c

PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
 why = WHY_NOT;

    if (throwflag) /* support for generator.throw() */
        goto error;

#ifdef Py_DEBUG
    /* PyEval_EvalFrameEx() must not be called with an exception set,
       because it can clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!PyErr_Occurred());
#endif

    for (;;) {
        assert(stack_pointer >= f->f_valuestack); /* else underflow */
        assert(STACK_LEVEL() <= co->co_stacksize);  /* else overflow */
        assert(!PyErr_Occurred());

        /* Do periodic things.  Doing this every time through
           the loop would add too much overhead, so we do it
           only every Nth instruction.  We also do it if
           ``pendingcalls_to_do'' is set, i.e. when an asynchronous
           event needs attention (e.g. a signal handler or
           async I/O handler); see Py_AddPendingCall() and
           Py_MakePendingCalls() above. */

        if (_Py_atomic_load_relaxed(&_PyRuntime.ceval.eval_breaker)) {
            opcode = _Py_OPCODE(*next_instr);
            if (opcode == SETUP_FINALLY ||
                opcode == SETUP_WITH ||
                opcode == BEFORE_ASYNC_WITH ||
                opcode == YIELD_FROM) {
                /* Few cases where we skip running signal handlers and other
                   pending calls:
                   - If we're about to enter the 'with:'. It will prevent
                     emitting a resource warning in the common idiom
                     'with open(path) as file:'.
                   - If we're about to enter the 'async with:'.
                   - If we're about to enter the 'try:' of a try/finally (not
                     *very* useful, but might help in some cases and it's
                     traditional)
                   - If we're resuming a chain of nested 'yield from' or
                     'await' calls, then each frame is parked with YIELD_FROM
                     as its next opcode. If the user hit control-C we want to
                     wait until we've reached the innermost frame before
                     running the signal handler and raising KeyboardInterrupt
                     (see bpo-30039).
                */
                goto fast_next_opcode;
            }
            if (_Py_atomic_load_relaxed(
                        &_PyRuntime.ceval.pending.calls_to_do))
            {
                if (Py_MakePendingCalls() < 0)
                    goto error;
            }
            if (_Py_atomic_load_relaxed(
                        &_PyRuntime.ceval.gil_drop_request))
            {
                /* Give another thread a chance */
                if (PyThreadState_Swap(NULL) != tstate)
                    Py_FatalError("ceval: tstate mix-up");
                drop_gil(tstate);

                /* Other threads may run now */

                take_gil(tstate);

                /* Check if we should make a quick exit. */
                if (_Py_IsFinalizing() &&
                    !_Py_CURRENTLY_FINALIZING(tstate))
                {
                    drop_gil(tstate);
                    PyThread_exit_thread();
                }

                if (PyThreadState_Swap(tstate) != NULL)
                    Py_FatalError("ceval: orphan tstate");
            }
            /* Check for asynchronous exceptions. */
            if (tstate->async_exc != NULL) {
                PyObject *exc = tstate->async_exc;
                tstate->async_exc = NULL;
                UNSIGNAL_ASYNC_EXC();
                PyErr_SetNone(exc);
                Py_DECREF(exc);
                goto error;
            }
        }

    fast_next_opcode:
        f->f_lasti = INSTR_OFFSET();

        if (PyDTrace_LINE_ENABLED())
            maybe_dtrace_line(f, &instr_lb, &instr_ub, &instr_prev);

        /* line-by-line tracing support */

        if (_Py_TracingPossible &&
            tstate->c_tracefunc != NULL && !tstate->tracing) {
            int err;
            /* see maybe_call_line_trace
               for expository comments */
            f->f_stacktop = stack_pointer;

            err = maybe_call_line_trace(tstate->c_tracefunc,
                                        tstate->c_traceobj,
                                        tstate, f,
                                        &instr_lb, &instr_ub, &instr_prev);
            /* Reload possibly changed frame fields */
            JUMPTO(f->f_lasti);
            if (f->f_stacktop != NULL) {
                stack_pointer = f->f_stacktop;
                f->f_stacktop = NULL;
            }
            if (err)
                /* trace function raised an exception */
                goto error;
        }

        /* Extract opcode and argument */

        NEXTOPARG();
    dispatch_opcode:
#ifdef DYNAMIC_EXECUTION_PROFILE
#ifdef DXPAIRS
        dxpairs[lastopcode][opcode]++;
        lastopcode = opcode;
#endif
        dxp[opcode]++;
#endif

#ifdef LLTRACE
        /* Instruction tracing */

        if (lltrace) {
            if (HAS_ARG(opcode)) {
                printf("%d: %d, %d\n",
                       f->f_lasti, opcode, oparg);
            }
            else {
                printf("%d: %d\n",
                       f->f_lasti, opcode);
            }
        }
#endif

        switch (opcode) {
      ...  ...        
    }
... ...
}
  • 在这个执行架构中,对字节码的遍历是通过几个宏来实现的。
ceval.c

/* The integer overflow is checked by an assertion below. */
#define INSTR_OFFSET()  \
    (sizeof(_Py_CODEUNIT) * (int)(next_instr - first_instr))
#define NEXTOPARG()  do { \
        _Py_CODEUNIT word = *next_instr; \
        opcode = _Py_OPCODE(word); \
        oparg = _Py_OPARG(word); \
        next_instr++; \
    } while (0)
#define JUMPTO(x)       (next_instr = first_instr + (x) / sizeof(_Py_CODEUNIT))
#define JUMPBY(x)       (next_instr += (x) / sizeof(_Py_CODEUNIT))
  • 判断字节码是否带参是通过宏HAS_ARG实现的。
Include\opcode.h

#define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
  • Python在获得一条字节码指令和指令参数后,会对字节码指令利用switch进行判断,根据判断结果选择不同的case语句,每一条字节码会对应一个case语句,在case语句中,就是Python对字节码指令的实现。
ceval.c

#define TARGET(op) \
    case op:

... ...
PyObject* _Py_HOT_FUNCTION
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
... ...
      TARGET(STORE_SUBSCR) {
            PyObject *sub = TOP();
            PyObject *container = SECOND();
            PyObject *v = THIRD();
            int err;
            STACKADJ(-3);
            /* container[sub] = v */
            err = PyObject_SetItem(container, sub, v);
            Py_DECREF(v);
            Py_DECREF(container);
            Py_DECREF(sub);
            if (err != 0)
                goto error;
            DISPATCH();
        }

        TARGET(DELETE_SUBSCR) {
            PyObject *sub = TOP();
            PyObject *container = SECOND();
            int err;
            STACKADJ(-2);
            /* del container[sub] */
            err = PyObject_DelItem(container, sub);
            Py_DECREF(container);
            Py_DECREF(sub);
            if (err != 0)
                goto error;
            DISPATCH();
        }
... ...
Include\opcode.h

    /* Instruction opcodes for compiled code */
#define POP_TOP                   1
#define ROT_TWO                   2
#define ROT_THREE                 3
#define DUP_TOP                   4
#define DUP_TOP_TWO               5
#define NOP                       9
#define UNARY_POSITIVE           10
#define UNARY_NEGATIVE           11
#define UNARY_NOT                12
#define UNARY_INVERT             15
#define BINARY_MATRIX_MULTIPLY   16
#define INPLACE_MATRIX_MULTIPLY  17
#define BINARY_POWER             19
#define BINARY_MULTIPLY          20
#define BINARY_MODULO            22
#define BINARY_ADD               23
#define BINARY_SUBTRACT          24
#define BINARY_SUBSCR            25
#define BINARY_FLOOR_DIVIDE      26
#define BINARY_TRUE_DIVIDE       27
#define INPLACE_FLOOR_DIVIDE     28
#define INPLACE_TRUE_DIVIDE      29
#define GET_AITER                50
#define GET_ANEXT                51
#define BEFORE_ASYNC_WITH        52
#define INPLACE_ADD              55
#define INPLACE_SUBTRACT         56
#define INPLACE_MULTIPLY         57
#define INPLACE_MODULO           59
#define STORE_SUBSCR             60
#define DELETE_SUBSCR            61
#define BINARY_LSHIFT            62
#define BINARY_RSHIFT            63
#define BINARY_AND               64
#define BINARY_XOR               65
#define BINARY_OR                66
#define INPLACE_POWER            67
#define GET_ITER                 68
#define GET_YIELD_FROM_ITER      69
#define PRINT_EXPR               70
#define LOAD_BUILD_CLASS         71
#define YIELD_FROM               72
#define GET_AWAITABLE            73
#define INPLACE_LSHIFT           75
#define INPLACE_RSHIFT           76
#define INPLACE_AND              77
#define INPLACE_XOR              78
#define INPLACE_OR               79
#define BREAK_LOOP               80
#define WITH_CLEANUP_START       81
#define WITH_CLEANUP_FINISH      82
#define RETURN_VALUE             83
#define IMPORT_STAR              84
#define SETUP_ANNOTATIONS        85
#define YIELD_VALUE              86
#define POP_BLOCK                87
#define END_FINALLY              88
#define POP_EXCEPT               89
#define HAVE_ARGUMENT            90
#define STORE_NAME               90
#define DELETE_NAME              91
#define UNPACK_SEQUENCE          92
#define FOR_ITER                 93
#define UNPACK_EX                94
#define STORE_ATTR               95
#define DELETE_ATTR              96
#define STORE_GLOBAL             97
#define DELETE_GLOBAL            98
#define LOAD_CONST              100
#define LOAD_NAME               101
#define BUILD_TUPLE             102
#define BUILD_LIST              103
#define BUILD_SET               104
#define BUILD_MAP               105
#define LOAD_ATTR               106
#define COMPARE_OP              107
#define IMPORT_NAME             108
#define IMPORT_FROM             109
#define JUMP_FORWARD            110
#define JUMP_IF_FALSE_OR_POP    111
#define JUMP_IF_TRUE_OR_POP     112
#define JUMP_ABSOLUTE           113
#define POP_JUMP_IF_FALSE       114
#define POP_JUMP_IF_TRUE        115
#define LOAD_GLOBAL             116
#define CONTINUE_LOOP           119
#define SETUP_LOOP              120
#define SETUP_EXCEPT            121
#define SETUP_FINALLY           122
#define LOAD_FAST               124
#define STORE_FAST              125
#define DELETE_FAST             126
#define RAISE_VARARGS           130
#define CALL_FUNCTION           131
#define MAKE_FUNCTION           132
#define BUILD_SLICE             133
#define LOAD_CLOSURE            135
#define LOAD_DEREF              136
#define STORE_DEREF             137
#define DELETE_DEREF            138
#define CALL_FUNCTION_KW        141
#define CALL_FUNCTION_EX        142
#define SETUP_WITH              143
#define EXTENDED_ARG            144
#define LIST_APPEND             145
#define SET_ADD                 146
#define MAP_ADD                 147
#define LOAD_CLASSDEREF         148
#define BUILD_LIST_UNPACK       149
#define BUILD_MAP_UNPACK        150
#define BUILD_MAP_UNPACK_WITH_CALL 151
#define BUILD_TUPLE_UNPACK      152
#define BUILD_SET_UNPACK        153
#define SETUP_ASYNC_WITH        154
#define FORMAT_VALUE            155
#define BUILD_CONST_KEY_MAP     156
#define BUILD_STRING            157
#define BUILD_TUPLE_UNPACK_WITH_CALL 158
#define LOAD_METHOD             160
#define CALL_METHOD             161
  • 在成功执行完一条字节码指令后,Python的执行流程会跳转到fast_next_opcode或for循环处。
  • 无论如何,Python接下来的动作都是获得下一条字节码指令和指令参数,执行下一条指令。
  • 就这样一条条遍历co_code中的所有字节码指令,最终完成对Python程序的执行。
  • _PyEval_EvalFrameDefault函数中的why变量,指示了退出for循环时的状态,比如异常。
ceval.c

/* Status code for main loop (reason for stack unwind) */
enum why_code {
        WHY_NOT =       0x0001, /* No error */
        WHY_EXCEPTION = 0x0002, /* Exception occurred */
        WHY_RETURN =    0x0008, /* 'return' statement */
        WHY_BREAK =     0x0010, /* 'break' statement */
        WHY_CONTINUE =  0x0020, /* 'continue' statement */
        WHY_YIELD =     0x0040, /* 'yield' operator */
        WHY_SILENCED =  0x0080  /* Exception silenced by 'with' */
};

五、Python的运行时环境

1. 操作系统中的进程和线程
  • 原生的win32可执行文件,多会在一个进程(Process)中运行。
  • 但与机器指令序列相对应的活动对象是由线程Thread来进行抽象的,进程则是线程的活动环境。
  • 对于单线程可执行文件,在执行时操作系统会创建一个进程,在进程中,又会有一个主线程
  • 对于多线程的可执行文件,操作系统会创建一个进程和多个线程,CPU在线程中不断切换,在切换时需要执行线程环境的保存工作,以实现线程的同步。

2. Python中的进程和线程

2.1 Python中的线程环境
  • Python实现了对多线程的支持,并且Python中的每一个线程对应操作系统上的原生线程
  • 虚拟机就是Python对CPU的抽象,负责所有线程的计算工作。
  • Python中的任务切换,就是不同线程轮流使用虚拟机的机制。
  • Python使用PyThreadState保存当前的线程信息,每个线程都拥有一个PyThreadState对象,所以也可以将PyThreadState看做是线程状态的抽象。
Include\pystate.h

typedef struct _ts {
    /* See Python/ceval.c for comments explaining most fields */

    struct _ts *prev;
    struct _ts *next;
    PyInterpreterState *interp;

    struct _frame *frame;
    int recursion_depth;
    char overflowed; /* The stack has overflowed. Allow 50 more calls
                        to handle the runtime error. */
    char recursion_critical; /* The current calls must not cause
                                a stack overflow. */
    int stackcheck_counter;

    /* 'tracing' keeps track of the execution depth when tracing/profiling.
       This is to prevent the actual trace/profile code from being recorded in
       the trace/profile. */
    int tracing;
    int use_tracing;

    Py_tracefunc c_profilefunc;
    Py_tracefunc c_tracefunc;
    PyObject *c_profileobj;
    PyObject *c_traceobj;

    /* The exception currently being raised */
    PyObject *curexc_type;
    PyObject *curexc_value;
    PyObject *curexc_traceback;

    /* The exception currently being handled, if no coroutines/generators
     * are present. Always last element on the stack referred to be exc_info.
     */
    _PyErr_StackItem exc_state;

    /* Pointer to the top of the stack of the exceptions currently
     * being handled */
    _PyErr_StackItem *exc_info;

    PyObject *dict;  /* Stores per-thread state */

    int gilstate_counter;

    PyObject *async_exc; /* Asynchronous exception to raise */
    unsigned long thread_id; /* Thread id where this tstate was created */

    int trash_delete_nesting;
    PyObject *trash_delete_later;

    /* Called when a thread state is deleted normally, but not when it
     * is destroyed after fork().
     * Pain:  to prevent rare but fatal shutdown errors (issue 18808),
     * Thread.join() must wait for the join'ed thread's tstate to be unlinked
     * from the tstate chain.  That happens at the end of a thread's life,
     * in pystate.c.
     * The obvious way doesn't quite work:  create a lock which the tstate
     * unlinking code releases, and have Thread.join() wait to acquire that
     * lock.  The problem is that we _are_ at the end of the thread's life:
     * if the thread holds the last reference to the lock, decref'ing the
     * lock will delete the lock, and that may trigger arbitrary Python code
     * if there's a weakref, with a callback, to the lock.  But by this time
     * _PyThreadState_Current is already NULL, so only the simplest of C code
     * can be allowed to run (in particular it must not be possible to
     * release the GIL).
     * So instead of holding the lock directly, the tstate holds a weakref to
     * the lock:  that's the value of on_delete_data below.  Decref'ing a
     * weakref is harmless.
     * on_delete points to _threadmodule.c's static release_sentinel() function.
     * After the tstate is unlinked, release_sentinel is called with the
     * weakref-to-lock (on_delete_data) argument, and release_sentinel releases
     * the indirectly held lock.
     */
    void (*on_delete)(void *);
    void *on_delete_data;

    int coroutine_origin_tracking_depth;

    PyObject *coroutine_wrapper;
    int in_coroutine_wrapper;

    PyObject *async_gen_firstiter;
    PyObject *async_gen_finalizer;

    PyObject *context;
    uint64_t context_ver;

    /* Unique thread state id. */
    uint64_t id;

    /* XXX signal handlers should also be here */

} PyThreadState;
  • 在结构中包含了*frame对象,这表示在PyThreadState对象中,维护着一个帧栈列表。
  • 而在虚拟机源码中,也可以看到会将当前线程状态设置为当前的执行环境。
ceval.c

PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    PyThreadState *tstate = PyThreadState_GET();
    return tstate->interp->eval_frame(f, throwflag);
}
pystate.c

PyThreadState *
PyThreadState_Get(void)
{
    PyThreadState *tstate = GET_TSTATE();
    if (tstate == NULL)
        Py_FatalError("PyThreadState_Get: no current thread");

    return tstate;
}
  • 在创建PyFrameObject对象时,也会调用当前线程对象。
Objects\frameobject.c

PyFrameObject*
PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
            PyObject *globals, PyObject *locals)
{
    PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
    if (f)
        _PyObject_GC_TRACK(f);
    return f;
}
2.2 Python中的进程环境
  • 而对于进程的概念,Python是以PyInterpreterState对象来实现的。
  • Python可以有多个逻辑上的PyInterpreterState,对应系统中的多进程。
  • 但在通常环境下,Python中只有一个interpreter,它维护多个PyThreadState对象,而这些线程对象轮流使用一个字节码执行引擎。
  • 而Python中使用全局解释器锁(Global Interpreter Lock,GIL)来实现所有线程的同步。
Include\pystate.h

typedef struct _is {

    struct _is *next;
    struct _ts *tstate_head;

    int64_t id;
    int64_t id_refcount;
    PyThread_type_lock id_mutex;

    PyObject *modules;
    PyObject *modules_by_index;
    PyObject *sysdict;
    PyObject *builtins;
    PyObject *importlib;

    /* Used in Python/sysmodule.c. */
    int check_interval;

    /* Used in Modules/_threadmodule.c. */
    long num_threads;
    /* Support for runtime thread stack size tuning.
       A value of 0 means using the platform's default stack size
       or the size specified by the THREAD_STACK_SIZE macro. */
    /* Used in Python/thread.c. */
    size_t pythread_stacksize;

    PyObject *codec_search_path;
    PyObject *codec_search_cache;
    PyObject *codec_error_registry;
    int codecs_initialized;
    int fscodec_initialized;

    _PyCoreConfig core_config;
    _PyMainInterpreterConfig config;
#ifdef HAVE_DLOPEN
    int dlopenflags;
#endif

    PyObject *builtins_copy;
    PyObject *import_func;
    /* Initialized to PyEval_EvalFrameDefault(). */
    _PyFrameEvalFunction eval_frame;

    Py_ssize_t co_extra_user_count;
    freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];

#ifdef HAVE_FORK
    PyObject *before_forkers;
    PyObject *after_forkers_parent;
    PyObject *after_forkers_child;
#endif
    /* AtExit module */
    void (*pyexitfunc)(PyObject *);
    PyObject *pyexitmodule;

    uint64_t tstate_next_unique_id;
} PyInterpreterState;
  • 综上所述,可以猜测Python的运行时环境如下:


你可能感兴趣的:(大师兄的Python源码学习笔记(十一): Python的虚拟机框架)