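The docx.oxml.text.paragraph module defines CT_P, the element class for paragraphs, yet CT_P itself defines no add_r or other methods for working with CT_R runs. While digging through the source I traced how suitable methods get registered on a class automatically. The xmlchemy module is really well designed!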
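Using the creation of the docx.oxml.text.paragraph.CT_P class as an example, this article documents in detail what the MetaOxmlElement metaclass and the _BaseChildElement class do. Note: the version referenced throughout is python-docx 1.1.0.
The MetaOxmlElement metaclass is defined in the source as follows: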
class MetaOxmlElement(type):
    """Metaclass for BaseOxmlElement."""

    def __init__(cls, clsname: str, bases: Tuple[type, ...], namespace: Dict[str, Any]):
        dispatchable = (
            OneAndOnlyOne,
            OneOrMore,
            OptionalAttribute,
            RequiredAttribute,
            ZeroOrMore,
            ZeroOrOne,
            ZeroOrOneChoice,
        )
        for key, value in namespace.items():
            if isinstance(value, dispatchable):
                value.populate_class_members(cls, key)
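
The dispatch loop can be tried out in isolation. What follows is a minimal sketch, not the python-docx implementation: Marker, Meta, Base and Paragraph are hypothetical names standing in for a dispatchable type, the metaclass, a shared base class and an element class. It only illustrates that the loop runs when a class is defined, not when an instance is created.

```python
from typing import Any, Dict, Tuple

calls = []  # records every populate_class_members() call, for inspection


class Marker:
    """Hypothetical stand-in for ZeroOrMore and the other dispatchable types."""

    def populate_class_members(self, element_cls: type, prop_name: str) -> None:
        calls.append((element_cls.__name__, prop_name))


class Meta(type):
    """Simplified counterpart of MetaOxmlElement (assumption: one marker type)."""

    def __init__(cls, clsname: str, bases: Tuple[type, ...], namespace: Dict[str, Any]):
        super().__init__(clsname, bases, namespace)
        for key, value in namespace.items():
            if isinstance(value, Marker):
                value.populate_class_members(cls, key)


class Base(metaclass=Meta):
    """Shared base class; it declares no markers itself."""


class Paragraph(Base):
    r = Marker()   # dispatched as soon as this class statement completes
    tag = "w:p"    # ordinary attribute, ignored by the metaclass


print(calls)  # [('Paragraph', 'r')]

Paragraph()   # creating an instance does not run Meta.__init__ again
print(calls)  # [('Paragraph', 'r')]
```

Meta.__init__ runs once for Base and once for Paragraph, but only Paragraph has a Marker in its namespace, so only one call is recorded.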
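In the MetaOxmlElement loop for key, value in namespace.items(), whenever an attribute value of the class being created is an instance of one of the dispatchable types, its populate_class_members method is called. Note that the cls passed in is the parent node (the element class being created) and key is the name under which the dispatchable object was declared.
BaseOxmlElement is the base class of every element class in the docx.oxml subpackage; its role is similar to that of etree.ElementBase. It is defined in the source as follows: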
class BaseOxmlElement(  # pyright: ignore[reportGeneralTypeIssues]
    etree.ElementBase, metaclass=MetaOxmlElement
):
    """Effective base class for all custom element classes.

    Adds standardized behavior to all classes in one place.
    """
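Defining a subclass of BaseOxmlElement therefore runs MetaOxmlElement.__init__, whereas instantiating such a subclass goes through the initialization of etree.ElementBase.
_BaseChildElement is the base class of all child-element declarations; ZeroOrMore and its sibling classes all inherit from it. It defines many shared methods. A few are introduced first here, and the rest are covered step by step alongside the creation of CT_P.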
class _BaseChildElement:
    """Base class for the child-element classes.

    The child-element sub-classes correspond to varying cardinalities, such as ZeroOrOne
    and ZeroOrMore.
    """

    def __init__(self, nsptagname: str, successors: Tuple[str, ...] = ()):
        super(_BaseChildElement, self).__init__()
        self._nsptagname = nsptagname
        self._successors = successors

    def populate_class_members(
        self, element_cls: MetaOxmlElement, prop_name: str
    ) -> None:
        """Baseline behavior for adding the appropriate methods to `element_cls`."""
        self._element_cls = element_cls
        self._prop_name = prop_name
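ZeroOrMore is one kind of child-element class; it declares that a parent node may have any number of child nodes of the given type. This is the most common kind of child element in a Word document: a document may contain any number of paragraphs, and a single paragraph may contain any number of run nodes. ZeroOrMore is defined in the source as follows: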
class ZeroOrMore(_BaseChildElement):
    """Defines an optional repeating child element for MetaOxmlElement."""

    def populate_class_members(
        self, element_cls: MetaOxmlElement, prop_name: str
    ) -> None:
        """Add the appropriate methods to `element_cls`."""
        super(ZeroOrMore, self).populate_class_members(element_cls, prop_name)
        self._add_list_getter()
        self._add_creator()
        self._add_inserter()
        self._add_adder()
        self._add_public_adder()
        delattr(element_cls, prop_name)
It inherits from _BaseChildElement and overrides populate_class_members so that the appropriate methods are added to the parent element class.
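
The final delattr(element_cls, prop_name) is easy to overlook: once the generated members are in place, the declaration itself is removed from the class. The sketch below is a much-reduced analogue under stated assumptions. ChildList, Node and Para are hypothetical names, children are kept in a plain Python list instead of an lxml tree, and the dispatch is done by hand instead of by a metaclass.

```python
class ChildList:
    """Hypothetical, much-reduced analogue of ZeroOrMore."""

    def __init__(self, tag: str):
        self._tag = tag

    def populate_class_members(self, element_cls: type, prop_name: str) -> None:
        tag = self._tag

        def get_list(obj):
            return [c for c in obj.children if c.tag == tag]

        def add(obj):
            child = Node(tag)
            obj.children.append(child)
            return child

        setattr(element_cls, prop_name + "_lst", property(get_list))
        setattr(element_cls, "add_" + prop_name, add)
        # Remove the declaration itself, as ZeroOrMore does with delattr().
        delattr(element_cls, prop_name)


class Node:
    def __init__(self, tag: str):
        self.tag = tag
        self.children = []


class Para(Node):
    r = ChildList("w:r")


# Stand-in for the metaclass: dispatch by hand over a snapshot of the class dict.
for name, value in list(vars(Para).items()):
    if isinstance(value, ChildList):
        value.populate_class_members(Para, name)

p = Para("w:p")
p.add_r()
p.add_r()
print(hasattr(Para, "r"))        # False
print([c.tag for c in p.r_lst])  # ['w:r', 'w:r']
```

After dispatch, Para exposes r_lst and add_r but no longer has an attribute named r, which mirrors why CT_P ends up without r.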
CT_P represents the <w:p> paragraph element. An abridged version of its source definition is:
class CT_P(BaseOxmlElement):
    """`<w:p>` element, containing the properties and text for a paragraph."""

    add_r: Callable[[], CT_R]
    get_or_add_pPr: Callable[[], CT_PPr]
    hyperlink_lst: List[CT_Hyperlink]
    r_lst: List[CT_R]
    ...
    r = ZeroOrMore("w:r")
    ...
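
The annotation-only lines at the top of CT_P (add_r: Callable[[], CT_R] and so on) do not create class attributes by themselves. They are type hints, presumably there so that type checkers and editors know about the members generated later. A small experiment shows what actually reaches the namespace argument of a metaclass. RecordingMeta and Demo are hypothetical names used only for this illustration.

```python
seen = {}  # class name -> copy of the namespace its metaclass received


class RecordingMeta(type):
    """Hypothetical metaclass that only records the namespace it is given."""

    def __init__(cls, clsname, bases, namespace):
        super().__init__(clsname, bases, namespace)
        seen[clsname] = dict(namespace)


class Demo(metaclass=RecordingMeta):
    """Docstring."""

    r_lst: list          # annotation only: no attribute is created
    r = "placeholder"    # real class attribute

    def method(self):
        return None


ns = seen["Demo"]
print("__module__" in ns, "__qualname__" in ns)  # True True
print("r" in ns, "method" in ns)                 # True True
print("r_lst" in ns)                             # False
print(hasattr(Demo, "r_lst"))                    # False
```

The exact set of dunder keys in the namespace varies between Python versions, so the sketch checks only a few of them.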
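Creating the CT_P class triggers a call equivalent to MetaOxmlElement.__init__(cls, clsname="CT_P", bases=(BaseOxmlElement,), namespace={...r: ZeroOrMore...}). Note that the cls passed to MetaOxmlElement.__init__ is CT_P, the class object being created. namespace is a dict holding every class attribute and method defined in the body of CT_P, plus some module information. It is abbreviated here because this article is only concerned with how suitable methods are added to CT_P automatically. When the loop reaches key="r" and value=ZeroOrMore("w:r"), it calls populate_class_members(CT_P, "r") on that ZeroOrMore instance. The five statements in the middle of that method are described one by one below:
def populate_class_members(
    self, element_cls: MetaOxmlElement, prop_name: str
) -> None:
    """Add the appropriate methods to `element_cls`."""
    ...
    self._add_list_getter()
    self._add_creator()
    self._add_inserter()
    self._add_adder()
    self._add_public_adder()
    ...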
_add_list_getter is defined in _BaseChildElement as follows:
def _add_list_getter(self):
    """Add a read-only ``{prop_name}_lst`` property to the element class to retrieve
    a list of child elements matching this type."""
    prop_name = "%s_lst" % self._prop_name
    property_ = property(self._list_getter, None, None)
    setattr(self._element_cls, prop_name, property_)
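At this point self._prop_name holds the property name "r", so prop_name is "r_lst". In the third statement, self._element_cls holds the parent element class CT_P, so that statement installs the function returned by self._list_getter as a read-only property named r_lst on CT_P. self._list_getter is likewise defined in _BaseChildElement: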
@property
def _list_getter(self):
    """Return a function object suitable for the "get" side of a list property
    descriptor."""

    def get_child_element_list(obj: BaseOxmlElement):
        return obj.findall(qn(self._nsptagname))

    get_child_element_list.__doc__ = (
        "A list containing each of the ``<%s>`` child elements, in the o"
        "rder they appear." % self._nsptagname
    )
    return get_child_element_list
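
The combination of a getter-only property and a namespace-qualified findall can be reproduced with the standard library alone. This is a sketch under assumptions: the qn below is a simplified stand-in written for this example rather than the docx.oxml.ns function, xml.etree.ElementTree replaces lxml, and Para is a hypothetical wrapper class (the real CT_P is itself the element).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}


def qn(tag: str) -> str:
    """Simplified stand-in for qn: 'w:r' -> '{namespace-uri}r'."""
    prefix, local = tag.split(":")
    return "{%s}%s" % (NSMAP[prefix], local)


def make_list_getter(nsptagname: str):
    """Build the function used as the 'get' side of the list property."""

    def get_child_element_list(obj):
        return obj.element.findall(qn(nsptagname))

    return get_child_element_list


class Para:
    """Hypothetical wrapper around a plain ElementTree element."""

    def __init__(self, element):
        self.element = element


# Same three-argument form as in _add_list_getter: getter only, so read-only.
setattr(Para, "r_lst", property(make_list_getter("w:r"), None, None))

xml = '<w:p xmlns:w="%s"><w:pPr/><w:r/><w:r/></w:p>' % W_NS
p = Para(ET.fromstring(xml))
print(qn("w:r"))     # {http://schemas.openxmlformats.org/wordprocessingml/2006/main}r
print(len(p.r_lst))  # 2
```

Because no setter is supplied, assigning to p.r_lst raises AttributeError, which is what makes the generated property read-only.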
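The body of CT_P contains r = ZeroOrMore("w:r"), so self._nsptagname equals "w:r". The qn function converts a namespace-prefixed tag name into a qualified name, turning "w:r" into "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}r".
The _add_creator method is also defined in _BaseChildElement; it adds to the parent element class a method that creates a child element of the right type. Its source is listed below, followed by the helpers it uses and the remaining three methods (_add_inserter, _add_adder and _add_public_adder):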
def _add_creator(self):
    """Add a ``_new_{prop_name}()`` method to the element class that creates a new,
    empty element of the correct type, having no attributes."""
    creator = self._creator
    creator.__doc__ = (
        'Return a "loose", newly created ``<%s>`` element having no attri'
        "butes, text, or children." % self._nsptagname
    )
    self._add_to_class(self._new_method_name, creator)

@property
def _creator(self) -> Callable[[BaseOxmlElement], BaseOxmlElement]:
    """Callable that creates an empty element of the right type, with no attrs."""
    from docx.oxml.parser import OxmlElement

    def new_child_element(obj: BaseOxmlElement):
        return OxmlElement(self._nsptagname)

    return new_child_element

def _add_to_class(self, name: str, method: Callable[..., Any]):
    """Add `method` to the target class as `name`, unless `name` is already defined
    on the class."""
    if hasattr(self._element_cls, name):
        return
    setattr(self._element_cls, name, method)
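
The hasattr guard in _add_to_class means a method written by hand on the element class takes precedence over the generated one. A minimal sketch of that rule, where Element and the generated_* functions are hypothetical names used only for illustration:

```python
def add_to_class(cls, name, method):
    """Same guard as _add_to_class: never overwrite an existing attribute."""
    if hasattr(cls, name):
        return
    setattr(cls, name, method)


class Element:
    def _new_r(self):
        return "custom r"  # hand-written method, defined before generation


def generated_new_r(obj):
    return "generated r"


def generated_new_t(obj):
    return "generated t"


add_to_class(Element, "_new_r", generated_new_r)  # skipped: already defined
add_to_class(Element, "_new_t", generated_new_t)  # installed

e = Element()
print(e._new_r())  # custom r
print(e._new_t())  # generated t
```

Since hasattr also sees inherited attributes, a method defined on a base class suppresses generation in the same way.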
def _add_inserter(self):
    """Add an ``_insert_x()`` method to the element class for this child element."""

    def _insert_child(obj: BaseOxmlElement, child: BaseOxmlElement):
        obj.insert_element_before(child, *self._successors)
        return child

    _insert_child.__doc__ = (
        "Return the passed ``<%s>`` element after inserting it as a chil"
        "d in the correct sequence." % self._nsptagname
    )
    self._add_to_class(self._insert_method_name, _insert_child)
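
The source of insert_element_before is not shown in this article. The sketch below assumes it places the new child before the first successor element that is present and appends it when none is found. It is written against xml.etree.ElementTree with plain, un-namespaced tag names, so it illustrates the idea rather than the library code.

```python
import xml.etree.ElementTree as ET


def insert_element_before(parent, child, *successors):
    """Insert `child` before the first successor found in `parent`;
    append it when no successor is present (assumed behavior)."""
    for tag in successors:
        found = parent.find(tag)
        if found is not None:
            parent.insert(list(parent).index(found), child)
            return child
    parent.append(child)
    return child


body = ET.Element("body")
ET.SubElement(body, "p")
ET.SubElement(body, "sectPr")

insert_element_before(body, ET.Element("p"), "sectPr")
insert_element_before(body, ET.Element("tbl"), "sectPr")
insert_element_before(body, ET.Element("note"))  # no successors: appended

print([child.tag for child in body])
# ['p', 'p', 'tbl', 'sectPr', 'note']
```

For r = ZeroOrMore("w:r") no successors are declared, so under this assumption a new run simply lands at the end of the paragraph.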
def _add_adder(self):
    """Add an ``_add_x()`` method to the element class for this child element."""

    def _add_child(obj: BaseOxmlElement, **attrs: Any):
        new_method = getattr(obj, self._new_method_name)
        child = new_method()
        for key, value in attrs.items():
            setattr(child, key, value)
        insert_method = getattr(obj, self._insert_method_name)
        insert_method(child)
        return child

    _add_child.__doc__ = (
        "Add a new ``<%s>`` child element unconditionally, inserted in t"
        "he correct sequence." % self._nsptagname
    )
    self._add_to_class(self._add_method_name, _add_child)
def _add_public_adder(self):
    """Add a public ``add_x()`` method to the parent element class."""

    def add_child(obj: BaseOxmlElement):
        private_add_method = getattr(obj, self._add_method_name)
        child = private_add_method()
        return child

    add_child.__doc__ = (
        "Add a new ``<%s>`` child element unconditionally, inserted in t"
        "he correct sequence." % self._nsptagname
    )
    self._add_to_class(self._public_add_method_name, add_child)
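
Putting the four generated methods together, a call to add_r() goes through _add_r(), which calls _new_r() and then _insert_r(). The following sketch reproduces that chain with the standard library. The install function and the Para class are hypothetical, the method names follow the patterns given in the docstrings above (_new_x, _insert_x, _add_x, add_x), and attributes are set with Element.set rather than with setattr as in the real code.

```python
import xml.etree.ElementTree as ET


class Para:
    """Hypothetical parent class holding a plain ElementTree element."""

    def __init__(self):
        self.element = ET.Element("p")


def install(cls, prop_name, tag):
    """Install the four generated methods for one child type on `cls`."""

    def new_child(obj):
        return ET.Element(tag)      # loose element, no parent yet

    def insert_child(obj, child):
        obj.element.append(child)   # no successors assumed, so just append
        return child

    def _add_child(obj, **attrs):
        # Look the helpers up by name at call time, as the real code does.
        child = getattr(obj, "_new_" + prop_name)()
        for key, value in attrs.items():
            child.set(key, value)   # simplification of setattr(child, key, value)
        getattr(obj, "_insert_" + prop_name)(child)
        return child

    def add_child(obj):
        return getattr(obj, "_add_" + prop_name)()

    for name, func in (
        ("_new_" + prop_name, new_child),
        ("_insert_" + prop_name, insert_child),
        ("_add_" + prop_name, _add_child),
        ("add_" + prop_name, add_child),
    ):
        if not hasattr(cls, name):
            setattr(cls, name, func)


install(Para, "r", "r")

p = Para()
run = p.add_r()    # add_r -> _add_r -> _new_r, then _insert_r
p._add_r(id="7")   # the private adder also accepts attributes
print([child.tag for child in p.element])  # ['r', 'r']
print(p.element[1].get("id"))              # 7
```

Because every step is resolved with getattr at call time, replacing any single link (for example a hand-written _new_r) changes the behavior of add_r without touching the other generated methods.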