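OPC (Open Packaging Conventions) is defined in ISO/IEC 29500-2. ISO/IEC 29500 consists of four documents that together make up the Office Open XML standard, and OPC is the one that specifies how a package is read and written. These notes record the key OPC concepts as a basis for further study.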
byte: sequence of 8 bits treated as a unit.
A bit is the basic binary unit in computing, and a byte consists of 8 bits.
stream: linearly ordered sequence of bytes.
In other words, a byte stream whose bytes have a fixed linear order.
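The two definitions above can be illustrated with Python's built-in types. This is only a small sketch: the four byte values are chosen for illustration (they happen to be the signature that starts a ZIP local file header), and the variable names are not taken from the standard.

```python
import io

# Four bytes; each one is 8 bits, so each value lies in the range 0..255.
data = bytes([0x50, 0x4B, 0x03, 0x04])

# A stream exposes the same bytes as a linearly ordered sequence.
stream = io.BytesIO(data)

first_two = stream.read(2)  # reading consumes bytes in order
rest = stream.read()        # the remaining bytes follow in the same order
```

Reading twice never returns the same bytes again, which is what "linearly ordered" means in practice for a reader.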
abstract package model: abstract model that defines abstract packages.
The abstract package model is a package abstraction that holds a collection of parts and relationships.
An abstract package is the counterpart of a physical package.
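abstract package: logical entity that holds a collection of parts and relationships.
An abstract package is made up of parts and relationships. Each part can be viewed as a node in the abstract package and each relationship as an edge between two nodes, with the package itself acting as a virtual root node, so the whole abstract package resembles a graph data structure. In the third-party docx library, the initializer of docx.opc.package.OpcPackage is defined as: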
class OpcPackage(object):
    """Main API class for |python-opc|.
    A new instance is constructed by calling the :meth:`open` class method with a path
    to a package file or file-like object containing one.
    """
    def __init__(self):
        super(OpcPackage, self).__init__()
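The graph view described above can be sketched in a few lines. This is a minimal illustration and not how the docx library models a package: the `Node` class, the `walk` function and the sample part names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A part, or the virtual root when name is '/' (illustrative only)."""
    name: str
    rels: dict = field(default_factory=dict)  # rId -> target Node


def walk(root):
    """Return the names of all nodes reachable from root, depth first."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        order.append(node.name)
        # Push in reverse so relationships are followed in insertion order.
        stack.extend(reversed(list(node.rels.values())))
    return order


package = Node("/")  # virtual root node
document = Node("/word/document.xml")
styles = Node("/word/styles.xml")
core = Node("/docProps/core.xml")
package.rels = {"rId1": document, "rId2": core}
document.rels = {"rId1": styles}  # rIds are scoped to their source

reachable = walk(package)
```

Starting from the virtual root, `walk` follows relationship edges and visits each part once, which is the same traversal a reader performs when loading a package.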
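part: stream with a name, a MIME media type and associated common properties.
A part needs a name that is unique within the package, a MIME media type that tells a program how to handle the part's content, and the byte stream that the part holds. In the third-party docx library, the initializer of docx.opc.part.Part is defined as: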
class Part(object):
    """
    Base class for package parts. Provides common properties and methods, but
    intended to be subclassed in client code to implement specific part
    behaviors.
    """
    def __init__(self, partname, content_type, blob=None, package=None):
        super(Part, self).__init__()
        self._partname = partname
        self._content_type = content_type
        self._blob = blob
        self._package = package
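The requirement that a part name be unique within its package can be sketched as a small registry. `PartStore` is a hypothetical class written for this note and is not part of the docx library; comparing names case-insensitively is an assumption made here, so check it against the standard before relying on it.

```python
class PartStore:
    """Hypothetical registry that keeps part names unique within one package."""

    def __init__(self):
        self._parts = {}

    def add(self, partname, content_type, blob=b""):
        # Assumption: two names that differ only in letter case collide.
        key = partname.lower()
        if key in self._parts:
            raise ValueError("duplicate part name: %s" % partname)
        self._parts[key] = (partname, content_type, blob)

    def content_type_of(self, partname):
        return self._parts[partname.lower()][1]


store = PartStore()
store.add("/word/document.xml", "application/xml", b"<doc/>")

try:
    store.add("/Word/Document.xml", "application/xml")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```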
source: part or package from which a connection is established by a relationship.
The node at which a relationship edge starts. It can be a part or the virtual root node of the abstract package.
target: part or external resource to which a connection is established by a relationship.
The node at which a relationship edge ends. It can be a part or an external resource.
relationship: package relationship or part relationship. A package relationship is a connection from a package to a specific part in the same package, or to an external resource. A part relationship is a connection from a part in a package to another part in the same package, or to an external resource.
Relationships connect the nodes of an abstract package, so that starting from the virtual root node a user can reach any node that is linked through relationships. The initializer of docx.opc.rel._Relationship is defined as follows:
class _Relationship(object):
    """
    Value object for relationship to part.
    """
    def __init__(self, rId, reltype, target, baseURI, external=False):
        super(_Relationship, self).__init__()
        self._rId = rId
        self._reltype = reltype
        self._target = target
        self._baseURI = baseURI
        self._is_external = bool(external)
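Within an abstract package, each relationship has an rId that is unique among the relationships of its source, and a reltype that identifies its role. The reltype itself need not be unique: several relationships may share the same type.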
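The `baseURI` argument above suggests that an internal target is written relative to the location of its source. A minimal sketch of that resolution, using only `posixpath` from the standard library, might look like this. `resolve_target` is a hypothetical helper, not the docx library's own implementation, and external targets are deliberately left out.

```python
import posixpath


def resolve_target(base_uri, target_ref):
    """Resolve a relative target reference against the source's base URI."""
    # join() keeps target_ref as-is when it is already absolute;
    # normpath() then collapses any "." and ".." segments.
    return posixpath.normpath(posixpath.join(base_uri, target_ref))


from_package = resolve_target("/", "word/document.xml")
sibling = resolve_target("/word", "styles.xml")
parent = resolve_target("/word", "../docProps/core.xml")
absolute = resolve_target("/word", "/customXml/item1.xml")
```

Here `"/"` stands for the package as source and `"/word"` for the folder that holds a source part such as `/word/document.xml`.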
relationship type: absolute IRI for specifying the role of a relationship.
For the related IRI definition, see pack IRI.
pack IRI: IRI that conforms to the pack scheme.
The pack scheme is the URI scheme that allows IRIs to be used as a uniform mechanism for addressing parts within a package.
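relationships part: part containing an XML representation of relationships.
Package relationships are stored in "/_rels/.rels". The relationships of an individual part are stored in a "_rels" folder next to that part, for example "/word/_rels/document.xml.rels" for "/word/document.xml".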
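To make the XML representation concrete, here is a small sketch that parses a relationships part with the standard library. The sample XML is made up for illustration and `parse_rels` is a hypothetical helper; only the element names, attribute names and namespace follow the relationships markup.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Illustrative content of a relationships part (not copied from a real file).
SAMPLE_RELS = b"""<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
      Target="word/document.xml"/>
  <Relationship Id="rId2"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
      Target="https://example.com/" TargetMode="External"/>
</Relationships>"""


def parse_rels(xml_bytes):
    """Map each relationship Id to its type, target and external flag."""
    root = ET.fromstring(xml_bytes)
    rels = {}
    for el in root.findall("{%s}Relationship" % RELS_NS):
        rels[el.get("Id")] = {
            "type": el.get("Type"),
            "target": el.get("Target"),
            # TargetMode is optional; a missing value means an internal target.
            "external": el.get("TargetMode", "Internal") == "External",
        }
    return rels


rels = parse_rels(SAMPLE_RELS)
```

The resulting dictionary carries the same fields as the `rId`, `reltype`, `target` and `external` arguments of `_Relationship` shown earlier.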
physical format: specific file format, or other persistence or transport mechanism.
physical package: result of mapping an abstract package to a physical format.
The physical package model persists an abstract package to disk according to the requirements of a chosen physical format, for example docx, which follows the ZIP standard.
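The mapping to a ZIP-based physical format can be sketched with the standard `zipfile` module. The three entries below are a simplified illustration: the XML bodies are placeholders, `build_package` is a hypothetical helper, and the result is not claimed to be a document that a word processor would open.

```python
import io
import zipfile

# Placeholder bodies; a real package carries full XML documents here.
ENTRIES = {
    "[Content_Types].xml": b"<Types/>",
    "_rels/.rels": b"<Relationships/>",
    "word/document.xml": b"<document/>",
}


def build_package(entries):
    """Write each entry as one ZIP item and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, blob in entries.items():
            zf.writestr(name, blob)
    return buf.getvalue()


package_bytes = build_package(ENTRIES)

# Reading the archive back recovers the item names and their byte streams.
with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
    names = zf.namelist()
    document_blob = zf.read("word/document.xml")
```

Note that ZIP item names carry no leading slash, whereas part names such as "/word/document.xml" do.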
stream in a physical package representing an XML document that specifies the media type of each part in the package.
The MIME media types of parts are generally defined in the "/[Content_Types].xml" file.
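A lookup against that file can be sketched as follows. The sample XML is illustrative and `media_type_of` is a hypothetical helper; the sketch assumes that an `Override` entry for an exact part name takes precedence over a `Default` entry keyed by file extension, and that both comparisons ignore letter case.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Illustrative content of "/[Content_Types].xml" (not copied from a real file).
SAMPLE_TYPES = b"""<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
      ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml"
      ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def media_type_of(partname, xml_bytes):
    """Return the media type declared for partname, or None if undeclared."""
    root = ET.fromstring(xml_bytes)
    # Assumption: an exact part name match wins over an extension default.
    for el in root.findall("{%s}Override" % CT_NS):
        if el.get("PartName").lower() == partname.lower():
            return el.get("ContentType")
    basename = partname.rsplit("/", 1)[-1]
    ext = basename.rsplit(".", 1)[-1].lower() if "." in basename else ""
    for el in root.findall("{%s}Default" % CT_NS):
        if el.get("Extension").lower() == ext:
            return el.get("ContentType")
    return None


document_type = media_type_of("/word/document.xml", SAMPLE_TYPES)
styles_type = media_type_of("/word/styles.xml", SAMPLE_TYPES)
rels_type = media_type_of("/_rels/.rels", SAMPLE_TYPES)
image_type = media_type_of("/word/media/image1.png", SAMPLE_TYPES)
```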
《Document description and processing languages — Office Open XML file formats》