Python Built-in Types (6): bytes, bytearray, memoryview

4.8. Binary Sequence Types — `bytes`, `bytearray`, `memoryview`

The core built-in types for manipulating binary data are bytes and bytearray. They are supported by memoryview, which uses the buffer protocol to access the memory of other binary objects without needing to make a copy.
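As a minimal sketch of the copy-free access described above (the variable names are illustrative, not part of the original text), a memoryview over a bytearray writes straight through to the underlying object:

```python
# Hypothetical buffer; any writable object supporting the buffer protocol would do.
data = bytearray(b"hello world")
view = memoryview(data)

# Slicing a memoryview yields another view onto the same memory, not a copy.
head = view[0:5]
head[:] = b"HELLO"   # same length as the slice, so the write is allowed

print(data)          # the original bytearray reflects the change
```

Because the view shares memory with data, no intermediate bytes object is created for the slice.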

The array module supports efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values.

4.8.1. Bytes Objects

Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.

class bytes([source[, encoding[, errors]]])

Firstly, the syntax for bytes literals is largely the same as that for string literals, except that a b prefix is added:

Single quotes: b'still allows embedded "double" quotes'
Double quotes: b"still allows embedded 'single' quotes".
Triple quoted: b'''3 single quotes''', b"""3 double quotes"""

Only ASCII characters are permitted in bytes literals (regardless of the declared source code encoding). Any binary values over 127 must be entered into bytes literals using the appropriate escape sequence.

As with string literals, bytes literals may also use a r prefix to disable processing of escape sequences. See String and Bytes literals for more about the various forms of bytes literal, including supported escape sequences.

While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError). This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption).

In addition to the literal forms, bytes objects can be created in a number of other ways:

A zero-filled bytes object of a specified length: bytes(10)
From an iterable of integers: bytes(range(20))
Copying existing binary data via the buffer protocol: bytes(obj)
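The construction forms listed above can be sketched as follows; the variable names are illustrative, and the last step shows the ValueError raised for a value outside 0 <= x < 256:

```python
zeros = bytes(4)                      # zero-filled bytes object of length 4
from_ints = bytes([72, 105, 33])      # from an iterable of integers
copied = bytes(bytearray(b"Hi!"))     # copy of existing data via the buffer protocol

error = None
try:
    bytes([256])                      # 256 violates 0 <= x < 256
except ValueError as exc:
    error = exc

print(zeros, from_ints, copied, error)
```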

Also see the bytes built-in.

Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytes type has an additional class method to read data in that format:

classmethod fromhex(string)

This bytes class method returns a bytes object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored.

>>> bytes.fromhex('2Ef0 F1f2 ')
b'.\xf0\xf1\xf2'

A reverse conversion function exists to transform a bytes object into its hexadecimal representation.

hex()

Return a string object containing two hexadecimal digits for each byte in the instance.

>>> b'\xf0\xf1\xf2'.hex()
'f0f1f2'

New in version 3.5.

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)

The representation of bytes objects uses the literal format (b'...') since it is often more useful than e.g. bytes([46, 46, 46]). You can always convert a bytes object into a list of integers using list(b).
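A short sketch of the indexing and slicing behaviour just described, using the same three-byte value (46 is the code of an ASCII full stop); the names are illustrative:

```python
b = b"..."

first = b[0]       # indexing yields an integer
chunk = b[0:1]     # slicing yields a bytes object of length 1
as_list = list(b)  # convert to a list of integers

print(first, chunk, as_list)
```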

Note

For Python 2.x users: In the Python 2.x series, a variety of implicit conversions between 8-bit strings (the closest thing 2.x offers to a built-in binary data type) and Unicode strings were permitted. This was a backwards compatibility workaround to account for the fact that Python originally only supported 8-bit text, and Unicode text was a later addition. In Python 3.x, those implicit conversions are gone - conversions between 8-bit binary data and Unicode text must be explicit, and bytes and string objects will always compare unequal.

4.8.2. Bytearray Objects

bytearray objects are a mutable counterpart to bytes objects.

class bytearray([source[, encoding[, errors]]])

There is no dedicated literal syntax for bytearray objects, instead they are always created by calling the constructor:

Creating an empty instance: bytearray()
Creating a zero-filled instance with a given length: bytearray(10)
From an iterable of integers: bytearray(range(20))
Copying existing binary data via the buffer protocol: bytearray(b'Hi!')

As bytearray objects are mutable, they support the mutable sequence operations in addition to the common bytes and bytearray operations described in Bytes and Bytearray Operations.
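To illustrate a few of the mutable sequence operations, here is a small sketch; the buffer contents are arbitrary, and each step modifies the same object in place:

```python
buf = bytearray(b"Hi!")

buf[0] = ord("h")     # item assignment takes an integer      -> hi!
buf.append(0x21)      # append a single byte value            -> hi!!
buf.extend(b"??")     # extend with bytes-like data           -> hi!!??
del buf[-1]           # delete the last byte                  -> hi!!?
buf[1:2] = b"ello"    # slice assignment may change the length -> hello!!?

print(buf)
```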

Also see the bytearray built-in.

Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytearray type has an additional class method to read data in that format:

classmethod fromhex(string)

This bytearray class method returns a bytearray object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored.

>>> bytearray.fromhex('2Ef0 F1f2 ')
bytearray(b'.\xf0\xf1\xf2')

A reverse conversion function exists to transform a bytearray object into its hexadecimal representation.

hex()

Return a string object containing two hexadecimal digits for each byte in the instance.

>>> bytearray(b'\xf0\xf1\xf2').hex()
'f0f1f2'

New in version 3.5.

Since bytearray objects are sequences of integers (akin to a list), for a bytearray object b, b[0] will be an integer, while b[0:1] will be a bytearray object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)

The representation of bytearray objects uses the bytes literal format (bytearray(b'...')) since it is often more useful than e.g. bytearray([46, 46, 46]). You can always convert a bytearray object into a list of integers using list(b).

4.8.3. Bytes and Bytearray Operations

Both bytes and bytearray objects support the common sequence operations. They interoperate not just with operands of the same type, but with any bytes-like object. Due to this flexibility, they can be freely mixed in operations without causing errors. However, the return type of the result may depend on the order of operands.
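The dependence on operand order can be seen with concatenation. In this sketch (names are illustrative), the result takes the type of the left operand, which is the behaviour observed in CPython:

```python
left_bytes = b"ab" + bytearray(b"cd")       # bytes on the left
left_bytearray = bytearray(b"ab") + b"cd"   # bytearray on the left

# Comparison works across the two types regardless of order.
equal = b"abcd" == bytearray(b"abcd")

print(type(left_bytes).__name__, type(left_bytearray).__name__, equal)
```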

Note

The methods on bytes and bytearray objects don’t accept strings as their arguments, just as the methods on strings don’t accept bytes as their arguments. For example, you have to write:

a = "abc"
b = a.replace("a", "f")

and:

a = b"abc"
b = a.replace(b"a", b"f")

Some bytes and bytearray operations assume the use of ASCII compatible binary formats, and hence should be avoided when working with arbitrary binary data. These restrictions are covered below.

Note

Using these ASCII based operations to manipulate binary data that is not stored in an ASCII based format may lead to data corruption.

The following methods on bytes and bytearray objects can be used with arbitrary binary data.

bytes.count(sub[, start[, end]])

bytearray.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of subsequence sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.

Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence.

bytes.decode(encoding="utf-8", errors="strict")

bytearray.decode(encoding="utf-8", errors="strict")

Return a string decoded from the given bytes. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.

Note

Passing the encoding argument to str allows decoding any bytes-like object directly, without needing to make a temporary bytes or bytearray object.
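A brief sketch of both decoding routes, assuming UTF-8 data; the sample values are made up for illustration:

```python
raw = bytearray("naïve".encode("utf-8"))

via_method = raw.decode("utf-8")             # decode() on the object itself
via_str = str(memoryview(raw), "utf-8")      # str() accepts any bytes-like object

# With errors="replace", undecodable input becomes U+FFFD instead of raising.
lossy = b"caf\xe9".decode("utf-8", errors="replace")

print(via_method, via_str, lossy)
```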

Changed in version 3.1: Added support for keyword arguments.

bytes.endswith(suffix[, start[, end]])

bytearray.endswith(suffix[, start[, end]])

Return True if the binary data ends with the specified suffix, otherwise return False. suffix can also be a tuple of suffixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

The suffix(es) to search for may be any bytes-like object.
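For instance, a tuple of suffixes and the optional start and end positions might be used like this (the file name is a made-up example):

```python
name = b"archive.tar.gz"

is_compressed = name.endswith((b".gz", b".bz2"))   # any suffix in the tuple may match
is_tar_inside = name.endswith(b".tar", 0, 11)      # compares against name[0:11]
is_zip = name.endswith(b".zip")

print(is_compressed, is_tar_inside, is_zip)
```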

bytes.find(sub[, start[, end]])

bytearray.find(sub[, start[, end]])

Return the lowest index in the data where the subsequence sub is found, such that sub is contained in the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.

Note

The find() method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the in operator:

>>> b'Py' in b'Python'
True

Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence.

bytes.index(sub[, start[, end]])

bytearray.index(sub[, start[, end]])

Like find(), but raise ValueError when the subsequence is not found.

The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.

Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence.
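The difference between find() and index() shows up only when the subsequence is missing. A small sketch with an illustrative key/value string:

```python
data = b"key=value"

pos = data.find(b"=")        # position of the first match
by_int = data.find(0x3D)     # an integer 0..255 is accepted too (0x3D is '=')
missing = data.find(b"?")    # find() signals failure with -1

raised = False
try:
    data.index(b"?")         # index() raises instead
except ValueError:
    raised = True

print(pos, by_int, missing, raised)
```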

bytes.join(iterable)

bytearray.join(iterable)

Return a bytes or bytearray object which is the concatenation of the binary data sequences in iterable. A TypeError will be raised if there are any values in iterable that are not bytes-like objects, including str objects. The separator between elements is the contents of the bytes or bytearray object providing this method.
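As an illustration (the path components are arbitrary), the separator's type decides the result type, and a str element is rejected:

```python
parts = [b"usr", bytearray(b"local"), memoryview(b"bin")]

path = b"/".join(parts)                 # bytes separator gives a bytes result
path_ba = bytearray(b"/").join(parts)   # bytearray separator gives a bytearray result

rejected = False
try:
    b"/".join([b"usr", "local"])        # "local" is a str, not bytes-like
except TypeError:
    rejected = True

print(path, path_ba, rejected)
```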

static bytes.maketrans(from, to)

static bytearray.maketrans(from, to)

This static method returns a translation table usable for bytes.translate() that will map each character in from into the character at the same position in to; from and to must both be bytes-like objects and have the same length.

New in version 3.1.
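A minimal sketch of building a table and applying it with translate(); the mapping chosen here is arbitrary:

```python
# Map a->x, b->y, c->z; every other byte value maps to itself.
table = bytes.maketrans(b"abc", b"xyz")

translated = b"aabbcc--".translate(table)

print(len(table), translated)
```

The table is an ordinary bytes object with one entry per possible byte value.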

bytes.partition(sep)

bytearray.partition(sep)

Split the sequence at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself or its bytearray copy, and the part after the separator. If the separator is not found, return a 3-tuple containing a copy of the original sequence, followed by two empty bytes or bytearray objects.

The separator to search for may be any bytes-like object.
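For example, splitting a header-like line once at its first separator (the header value is made up for illustration):

```python
header = b"Content-Type: text/plain"

name, sep, value = header.partition(b": ")   # separator found
missing = header.partition(b"=")             # separator not found

print(name, sep, value)
print(missing)
```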

bytes.replace(old, new[, count])

bytearray.replace(old, new[, count])

Return a copy of the sequence with all occurrences of subsequence old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

The subsequence to search for and its replacement may be any bytes-like object.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
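The following sketch, with arbitrary sample data, shows both the count argument and the fact that the bytearray is left untouched:

```python
original = bytearray(b"spam spam spam")

replaced = original.replace(b"spam", b"eggs", 2)   # only the first two occurrences
unchanged = original.replace(b"ham", b"eggs")      # nothing to replace

print(original, replaced, unchanged is original)
```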

bytes.rfind(sub[, start[, end]])

bytearray.rfind(sub[, start[, end]])

Return the highest index in the sequence where the subsequence sub is found, such that sub is contained within s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.

The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.

Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence.

bytes.rindex(sub[, start[, end]])

bytearray.rindex(sub[, start[, end]])

Like rfind() but raises ValueError when the subsequence sub is not found.

The subsequence to search for may be any bytes-like object or an integer in the range 0 to 255.

Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence.

bytes.rpartition(sep)

bytearray.rpartition(sep)

Split the sequence at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself or its bytearray copy, and the part after the separator. If the separator is not found, return a 3-tuple containing a copy of the original sequence, followed by two empty bytes or bytearray objects.

The separator to search for may be any bytes-like object.

bytes.startswith(prefix[, start[, end]])

bytearray.startswith(prefix[, start[, end]])

Return True if the binary data starts with the specified prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test beginning at that position. With optional end, stop comparing at that position.

The prefix(es) to search for may be any bytes-like object.

bytes.translate(table, delete=b'')

bytearray.translate(table, delete=b'')

Return a copy of the bytes or bytearray object where all bytes occurring in the optional argument delete are removed, and the remaining bytes have been mapped through the given translation table, which must be a bytes object of length 256.

You can use the bytes.maketrans() method to create a translation table.

Set the table argument to None for translations that only delete characters:

>>> b'read this short text'.translate(None, b'aeiou')
b'rd ths shrt txt'

Changed in version 3.6: delete is now supported as a keyword argument.

The following methods on bytes and bytearray objects have default behaviours that assume the use of ASCII compatible binary formats, but can still be used with arbitrary binary data by passing appropriate arguments. Note that all of the bytearray methods in this section do not operate in place, and instead produce new objects.

bytes.center(width[, fillbyte])

bytearray.center(width[, fillbyte])

Return a copy of the object centered in a sequence of length width. Padding is done using the specified fillbyte (default is an ASCII space). For bytes objects, the original sequence is returned if width is less than or equal to len(s).

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.ljust(width[, fillbyte])

bytearray.ljust(width[, fillbyte])

Return a copy of the object left justified in a sequence of length width. Padding is done using the specified fillbyte (default is an ASCII space). For bytes objects, the original sequence is returned if width is less than or equal to len(s).

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.lstrip([chars])

bytearray.lstrip([chars])

Return a copy of the sequence with specified leading bytes removed. The chars argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or None, the chars argument defaults to removing ASCII whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped:

>>> b' spacious '.lstrip()
b'spacious '
>>> b'www.example.com'.lstrip(b'cmowz.')
b'example.com'

The binary sequence of byte values to remove may be any bytes-like object.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.rjust(width[, fillbyte])

bytearray.rjust(width[, fillbyte])

Return a copy of the object right justified in a sequence of length width. Padding is done using the specified fillbyte (default is an ASCII space). For bytes objects, the original sequence is returned if width is less than or equal to len(s).

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
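The three justification methods, center(), ljust() and rjust(), can be compared side by side. In this sketch the fill bytes are arbitrary choices; passing a non-space fillbyte is what makes these methods usable on non-ASCII binary data:

```python
word = b"id"

centered = word.center(6, b"*")    # padding split across both sides
left = word.ljust(6, b".")         # padding on the right
right = word.rjust(6, b"\x00")     # padding on the left, here with NUL bytes
default = word.center(4)           # the default fill is an ASCII space
too_narrow = word.rjust(1)         # width <= len(word): content is unchanged

print(centered, left, right, default, too_narrow)
```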

bytes.rsplit(sep=None, maxsplit=-1)

bytearray.rsplit(sep=None, maxsplit=-1)

Split the binary sequence into subsequences of the same type, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done, the rightmost ones. If sep is not specified or None, any subsequence consisting solely of ASCII whitespace is a separator. Except for splitting from the right, rsplit() behaves like split() which is described in detail below.
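The effect of splitting from the right is visible only when maxsplit limits the number of splits, as this small sketch with illustrative data shows:

```python
record = b"a,b,c"

from_right = record.rsplit(b",", 1)   # the rightmost split is made
from_left = record.split(b",", 1)     # the leftmost split is made
on_space = b"  a  b  ".rsplit()       # no sep: runs of ASCII whitespace separate

print(from_right, from_left, on_space)
```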

bytes.rstrip([chars])

bytearray.rstrip([chars])

Return a copy of the sequence with specified trailing bytes removed. The chars argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or None, the chars argument defaults to removing ASCII whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped:

>>> b' spacious '.rstrip()
b' spacious'
>>> b'mississippi'.rstrip(b'ipz')
b'mississ'

The binary sequence of byte values to remove may be any bytes-like object.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.split(sep=None, maxsplit=-1)

bytearray.split(sep=None, maxsplit=-1)

Split the binary sequence into subsequences of the same type, using sep as the delimiter string. If maxsplit is given and non-negative, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or is -1, then there is no limit on the number of splits (all possible splits are made).

If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty subsequences (for example, b'1,,2'.split(b',') returns [b'1', b'', b'2']). The sep argument may consist of a multibyte sequence (for example, b'1<>2<>3'.split(b'<>') returns [b'1', b'2', b'3']). Splitting an empty sequence with a specified separator returns [b''] or [bytearray(b'')] depending on the type of object being split. The sep argument may be any bytes-like object.

For example:

>>> b'1,2,3'.split(b',')
[b'1', b'2', b'3']
>>> b'1,2,3'.split(b',', maxsplit=1)
[b'1', b'2,3']
>>> b'1,2,,3,'.split(b',')
[b'1', b'2', b'', b'3', b'']

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive ASCII whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the sequence has leading or trailing whitespace. Consequently, splitting an empty sequence or a sequence consisting solely of ASCII whitespace without a specified separator returns [].

For example:

>>> b'1 2 3'.split()
[b'1', b'2', b'3']
>>> b'1 2 3'.split(maxsplit=1)
[b'1', b'2 3']
>>> b' 1 2 3 '.split()
[b'1', b'2', b'3']

bytes.strip([chars])

bytearray.strip([chars])

Return a copy of the sequence with specified leading and trailing bytes removed. The chars argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or None, the chars argument defaults to removing ASCII whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

>>> b' spacious '.strip()
b'spacious'
>>> b'www.example.com'.strip(b'cmowz.')
b'example'

The binary sequence of byte values to remove may be any bytes-like object.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

The following methods on bytes and bytearray objects assume the use of ASCII compatible binary formats and should not be applied to arbitrary binary data. Note that all of the bytearray methods in this section do not operate in place, and instead produce new objects.

bytes.capitalize()

bytearray.capitalize()

Return a copy of the sequence with each byte interpreted as an ASCII character, and the first byte capitalized and the rest lowercased. Non-ASCII byte values are passed through unchanged.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.expandtabs(tabsize=8)

bytearray.expandtabs(tabsize=8)

Return a copy of the sequence where all ASCII tab characters are replaced by one or more ASCII spaces, depending on the current column and the given tab size. Tab positions occur every tabsize bytes (default is 8, giving tab positions at columns 0, 8, 16 and so on). To expand the sequence, the current column is set to zero and the sequence is examined byte by byte. If the byte is an ASCII tab character (b'\t'), one or more space characters are inserted in the result until the current column is equal to the next tab position. (The tab character itself is not copied.) If the current byte is an ASCII newline (b'\n') or carriage return (b'\r'), it is copied and the current column is reset to zero. Any other byte value is copied unchanged and the current column is incremented by one regardless of how the byte value is represented when printed:

>>> b'01\t012\t0123\t01234'.expandtabs()
b'01      012     0123    01234'
>>> b'01\t012\t0123\t01234'.expandtabs(4)
b'01  012 0123    01234'

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.isalnum()

bytearray.isalnum()

Return true if all bytes in the sequence are alphabetical ASCII characters or ASCII decimal digits and the sequence is not empty, false otherwise. Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. ASCII decimal digits are those byte values in the sequence b'0123456789'.

For example:

>>> b'ABCabc1'.isalnum()
True
>>> b'ABC abc1'.isalnum()
False

bytes.isalpha()

bytearray.isalpha()

Return true if all bytes in the sequence are alphabetic ASCII characters and the sequence is not empty, false otherwise. Alphabetic ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'.

For example:

>>> b'ABCabc'.isalpha()
True
>>> b'ABCabc1'.isalpha()
False

bytes.isdigit()

bytearray.isdigit()

Return true if all bytes in the sequence are ASCII decimal digits and the sequence is not empty, false otherwise. ASCII decimal digits are those byte values in the sequence b'0123456789'.

For example:

>>> b'1234'.isdigit()
True
>>> b'1.23'.isdigit()
False

bytes.islower()

bytearray.islower()

Return true if there is at least one lowercase ASCII character in the sequence and no uppercase ASCII characters, false otherwise.

For example:

>>> b'hello world'.islower()
True
>>> b'Hello world'.islower()
False

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

bytes.isspace()

bytearray.isspace()

Return true if all bytes in the sequence are ASCII whitespace and the sequence is not empty, false otherwise. ASCII whitespace characters are those byte values in the sequence b' \t\n\r\x0b\f' (space, tab, newline, carriage return, vertical tab, form feed).

bytes.istitle()

bytearray.istitle()

Return true if the sequence is ASCII titlecase and the sequence is not empty, false otherwise. See bytes.title() for more details on the definition of “titlecase”.

For example:

>>> b'Hello World'.istitle()
True
>>> b'Hello world'.istitle()
False

bytes.isupper()

bytearray.isupper()

Return true if there is at least one uppercase alphabetic ASCII character in the sequence and no lowercase ASCII characters, false otherwise.

For example:

>>> b'HELLO WORLD'.isupper()
True
>>> b'Hello world'.isupper()
False

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

bytes.lower()

bytearray.lower()

Return a copy of the sequence with all the uppercase ASCII characters converted to their corresponding lowercase counterpart.

For example:

>>> b'Hello World'.lower()
b'hello world'

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.splitlines(keepends=False)

bytearray.splitlines(keepends=False)

Return a list of the lines in the binary sequence, breaking at ASCII line boundaries. This method uses the universal newlines approach to splitting lines. Line breaks are not included in the resulting list unless keepends is given and true.

For example:

>>> b'ab c\n\nde fg\rkl\r\n'.splitlines()
[b'ab c', b'', b'de fg', b'kl']
>>> b'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True)
[b'ab c\n', b'\n', b'de fg\r', b'kl\r\n']

Unlike split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line:

>>> b"".split(b'\n'), b"Two lines\n".split(b'\n')
([b''], [b'Two lines', b''])
>>> b"".splitlines(), b"One line\n".splitlines()
([], [b'One line'])

bytes.swapcase()

bytearray.swapcase()

Return a copy of the sequence with all the lowercase ASCII characters converted to their corresponding uppercase counterpart and vice-versa.

For example:

>>> b'Hello World'.swapcase()
b'hELLO wORLD'

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

Unlike str.swapcase(), it is always the case that bin.swapcase().swapcase() == bin for the binary versions. Case conversions are symmetrical in ASCII, even though that is not generally true for arbitrary Unicode code points.
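A short sketch of that round-trip property. The German sharp s is used here only as one illustrative Unicode character whose case mapping is not symmetrical:

```python
data = b"Hello World"
round_trip = data.swapcase().swapcase()     # ASCII case mapping is symmetrical

text = "ß"
text_round_trip = text.swapcase().swapcase()   # "ß" -> "SS" -> "ss"

print(round_trip == data, text_round_trip == text)
```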

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.title()

bytearray.title()

Return a titlecased version of the binary sequence where words start with an uppercase ASCII character and the remaining characters are lowercase. Uncased byte values are left unmodified.

For example:

>>> b'Hello world'.title()
b'Hello World'

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. All other byte values are uncased.

The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:

>>> b"they're bill's friends from the UK".title()
b"They'Re Bill'S Friends From The Uk"

A workaround for apostrophes can be constructed using regular expressions:

>>> import re
>>> def titlecase(s):
...     return re.sub(rb"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0)[0:1].upper() +
...                              mo.group(0)[1:].lower(),
...                   s)
...
>>> titlecase(b"they're bill's friends.")
b"They're Bill's Friends."

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.upper()

bytearray.upper()

Return a copy of the sequence with all the lowercase ASCII characters converted to their corresponding uppercase counterpart.

For example:

>>> b'Hello World'.upper()
b'HELLO WORLD'

Lowercase ASCII characters are those byte values in the sequence b'abcdefghijklmnopqrstuvwxyz'. Uppercase ASCII characters are those byte values in the sequence b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

bytes.zfill(width)

bytearray.zfill(width)

Return a copy of the sequence left filled with ASCII b'0' digits to make a sequence of length width. A leading sign prefix (b'+'/b'-') is handled by inserting the padding after the sign character rather than before. For bytes objects, the original sequence is returned if width is less than or equal to len(seq).

For example:

>>> b"42".zfill(5)
b'00042'
>>> b"-42".zfill(5)
b'-0042'

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.

4.8.4. `printf`-style Bytes Formatting

Note

The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). If the value being printed may be a tuple or dictionary, wrap it in a tuple.

Bytes objects (bytes/bytearray) have one unique built-in operation: the % operator (modulo). This is also known as the bytes formatting or interpolation operator. Given format % values (where format is a bytes object), % conversion specifications in format are replaced with zero or more elements of values. The effect is similar to using the sprintf() in the C language.

If format requires a single argument, values may be a single non-tuple object. [5] Otherwise, values must be a tuple with exactly the number of items specified by the format bytes object, or a single mapping object (for example, a dictionary).

A conversion specifier contains two or more characters and has the following components, which must occur in this order:

The '%' character, which marks the start of the specifier.
Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
Conversion flags (optional), which affect the result of some conversion types.
Minimum field width (optional). If specified as an '*' (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
Precision (optional), given as a '.' (dot) followed by the precision. If specified as '*' (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
Length modifier (optional).
Conversion type.

When the right argument is a dictionary (or other mapping type), then the formats in the bytes object must include a parenthesised mapping key into that dictionary inserted immediately after the '%' character. The mapping key selects the value to be formatted from the mapping. For example:

>>> print(b'%(language)s has %(number)03d quote types.' %
...       {b'language': b"Python", b"number": 2})
b'Python has 002 quote types.'

In this case no * specifiers may occur in a format (since they require a sequential parameter list).
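The different shapes the right-hand operand can take are sketched below. The format strings and values are illustrative; note that mapping keys must themselves be bytes:

```python
single = b"%d bytes" % 42                         # one non-tuple value
several = b"%s=%05.1f" % (b"ratio", 3.14159)      # tuple, one item per specifier
star = b"[%*d]" % (5, 42)                         # '*' reads the width from the tuple
mapped = b"%(key)s:%(n)d" % {b"key": b"x", b"n": 7}

# A tuple that should be formatted as a single value is wrapped in another tuple.
wrapped = b"%a" % ((1, 2),)

print(single, several, star, mapped, wrapped)
```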

The conversion flag characters are:

Flag	Meaning
`'#'`	The value conversion will use the “alternate form” (where defined below).
`'0'`	The conversion will be zero padded for numeric values.
`'-'`	The converted value is left adjusted (overrides the `'0'` conversion if both are given).
`' '`	(a space) A blank should be left before a positive number (or empty string) produced by a signed conversion.
`'+'`	A sign character (`'+'` or `'-'`) will precede the conversion (overrides a “space” flag).

A length modifier (h, l, or L) may be present, but is ignored as it is not necessary for Python – so e.g. %ld is identical to %d.
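The flag characters and the ignored length modifier can be tried out with a short sketch like the one below (assuming Python 3.5 or later; the variable names are illustrative only):

```python
alt_hex = b'%#x' % 255        # alternate form adds the 0x prefix
alt_oct = b'%#o' % 8          # alternate form adds the 0o prefix
zero_pad = b'%05d' % 42       # zero padding to width 5
left = b'%-5d|' % 42          # left adjusted within width 5
space = b'% d' % 42           # blank before a positive number
plus = b'%+d' % 42            # explicit sign
minus_wins = b'%-05d|' % 42   # '-' overrides '0'
plus_wins = b'%+ d' % 42      # '+' overrides the space flag
same = (b'%ld' % 42) == (b'%d' % 42)  # length modifier is ignored

print(alt_hex, alt_oct, zero_pad, left, space, plus, minus_wins, plus_wins, same)
```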

The conversion types are:

Conversion	Meaning	Notes
`'d'`	Signed integer decimal.
`'i'`	Signed integer decimal.
`'o'`	Signed octal value.	(1)
`'u'`	Obsolete type – it is identical to `'d'`.	(8)
`'x'`	Signed hexadecimal (lowercase).	(2)
`'X'`	Signed hexadecimal (uppercase).	(2)
`'e'`	Floating point exponential format (lowercase).	(3)
`'E'`	Floating point exponential format (uppercase).	(3)
`'f'`	Floating point decimal format.	(3)
`'F'`	Floating point decimal format.	(3)
`'g'`	Floating point format. Uses lowercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.	(4)
`'G'`	Floating point format. Uses uppercase exponential format if exponent is less than -4 or not less than precision, decimal format otherwise.	(4)
`'c'`	Single byte (accepts integer or single byte objects).
`'b'`	Bytes (any object that follows the buffer protocol or has `__bytes__()`).	(5)
`'s'`	`'s'` is an alias for `'b'` and should only be used for Python2/3 code bases.	(6)
`'a'`	Bytes (converts any Python object using `repr(obj).encode('ascii', 'backslashreplace')`).	(5)
`'r'`	`'r'` is an alias for `'a'` and should only be used for Python2/3 code bases.	(7)
`'%'`	No argument is converted, results in a `'%'` character in the result.

Notes:

(1) The alternate form causes a leading octal specifier ('0o') to be inserted before the first digit.
(2) The alternate form causes a leading '0x' or '0X' (depending on whether the 'x' or 'X' format was used) to be inserted before the first digit.
(3) The alternate form causes the result to always contain a decimal point, even if no digits follow it. The precision determines the number of digits after the decimal point and defaults to 6.
(4) The alternate form causes the result to always contain a decimal point, and trailing zeroes are not removed as they would otherwise be. The precision determines the number of significant digits before and after the decimal point and defaults to 6.
(5) If precision is N, the output is truncated to N characters.
(6) b'%s' is deprecated, but will not be removed during the 3.x series.
(7) b'%r' is deprecated, but will not be removed during the 3.x series.
(8) See PEP 237.
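A few of the conversion types and notes above, shown as a hedged sketch (assuming Python 3.5 or later; the names are illustrative and not part of the reference text):

```python
byte_from_int = b'%c' % 65                # integer in range(256)
byte_from_bytes = b'%c' % b'A'            # single byte object
raw = b'%b' % bytearray(b'raw')           # any buffer-protocol object
ascii_repr = b'%a' % 'caf\u00e9'          # repr() with backslashreplace
lower_hex = b'%x' % 255
upper_hex = b'%X' % 255
exponent = b'%e' % 1234.5                 # precision defaults to 6
fixed = b'%.2f' % 1234.5
general_small = b'%g' % 0.00001234        # exponent below -4
general_large = b'%g' % 1234567.0         # exponent not less than precision
truncated = b'%.3b' % b'abcdef'           # note (5): precision truncates
percent = b'100%%' % ()                   # no argument is consumed

print(byte_from_int, raw, ascii_repr, exponent, general_small, truncated, percent)
```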

Note

The bytearray version of this method does not operate in place - it always produces a new object, even if no changes were made.
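The note above can be checked with a short sketch (assuming Python 3.5 or later; `template` and `plain` are illustrative names): formatting a bytearray leaves the original untouched and returns a separate bytearray, even when the format contains no specifiers.

```python
template = bytearray(b'id=%d')
formatted = template % 7

# The template is unchanged and the result is a new bytearray.
print(template, formatted, type(formatted).__name__)

# Even with nothing to substitute, a distinct object comes back.
plain = bytearray(b'static')
copy = plain % ()
print(copy == plain, copy is plain)
```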

4.8.5. Memory Views

memoryview objects allow Python code to access the internal data of an object that supports the buffer protocol without copying.

class memoryview(obj)

Create a memoryview that references obj. obj must support the buffer protocol. Built-in objects that support the buffer protocol include bytes and bytearray.

A memoryview has the notion of an element, which is the atomic memory unit handled by the originating object obj. For many simple types such as bytes and bytearray, an element is a single byte, but other types such as array.array may have bigger elements.

len(view) is equal to the length of the list returned by tolist(). If view.ndim = 0, the length is 1. If view.ndim = 1, the length is equal to the number of elements in the view. For higher dimensions, the length is equal to the length of the nested list representation of the view. The itemsize attribute will give you the number of bytes in a single element.
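To make the relationship between elements, len() and itemsize concrete, here is a minimal sketch (assuming Python 3.3 or later; the names are illustrative). It covers only the one-dimensional and two-dimensional cases:

```python
import array

# bytes: one element is one byte.
byte_view = memoryview(b'abcefg')
print(byte_view.itemsize, len(byte_view))

# array of unsigned shorts: one element is two bytes.
short_view = memoryview(array.array('H', [1, 2, 3]))
print(short_view.itemsize, len(short_view), short_view.nbytes)

# Two-dimensional view: len() is the length of the outer nested list.
grid = memoryview(bytes(6)).cast('B', shape=[2, 3])
print(len(grid), len(grid.tolist()))
```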

A memoryview supports slicing and indexing to expose its data. One-dimensional slicing will result in a subview:

>>> v = memoryview(b'abcefg')
>>> v[1]
98
>>> v[-1]
103
>>> v[1:4]
<memory at 0x7f3ddc9f4350>
>>> bytes(v[1:4])
b'bce'

If format is one of the native format specifiers from the struct module, indexing with an integer or a tuple of integers is also supported and returns a single element with the correct type. One-dimensional memoryviews can be indexed with an integer or a one-integer tuple. Multi-dimensional memoryviews can be indexed with tuples of exactly ndim integers where ndim is the number of dimensions. Zero-dimensional memoryviews can be indexed with the empty tuple.
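Integer and tuple indexing can be sketched as below (assuming Python 3.5 or later, where tuple indexing is available; `flat` and `grid` are illustrative names):

```python
flat = memoryview(bytes(range(6)))      # one-dimensional, format 'B'
grid = flat.cast('B', shape=[2, 3])     # same memory viewed as 2 rows x 3 columns

by_int = flat[4]          # plain integer index
by_tuple = flat[(4,)]     # one-integer tuple on a 1-D view
cell = grid[1, 2]         # exactly ndim integers on a 2-D view

print(by_int, by_tuple, cell)
```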

Here is an example with a non-byte format:

>>> import array
>>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444])
>>> m = memoryview(a)
>>> m[0]
-11111111
>>> m[-1]
44444444
>>> m[::2].tolist()
[-11111111, -33333333]

If the underlying object is writable, the memoryview supports one-dimensional slice assignment. Resizing is not allowed:

>>> data = bytearray(b'abcefg')
>>> v = memoryview(data)
>>> v.readonly
False
>>> v[0] = ord(b'z')
>>> data
bytearray(b'zbcefg')
>>> v[1:4] = b'123'
>>> data
bytearray(b'z123fg')
>>> v[2:3] = b'spam'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview assignment: lvalue and rvalue have different structures
>>> v[2:6] = b'spam'
>>> data
bytearray(b'z1spam')
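As a complement to the session above, this sketch (assuming Python 3.3 or later; names are illustrative) contrasts a read-only view with a writable sub-view that modifies the underlying object without copying:

```python
# A view of an immutable bytes object is read-only.
readonly_view = memoryview(b'abc')
try:
    readonly_view[0] = 0
    write_rejected = False
except TypeError:
    write_rejected = True

# A slice of a writable view shares memory with the original bytearray.
data = bytearray(b'0123456789')
tail = memoryview(data)[5:]
tail[:2] = b'AB'

print(readonly_view.readonly, write_rejected, data)
```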

One-dimensional memoryviews of hashable (read-only) types with formats ‘B’, ‘b’ or ‘c’ are also hashable. The hash is defined as hash(m) == hash(m.tobytes()):

>>> v = memoryview(b'abcefg')
>>> hash(v) == hash(b'abcefg')
True
>>> hash(v[2:4]) == hash(b'ce')
True
>>> hash(v[::-2]) == hash(b'abcefg'[::-2])
True
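The hashing rule has a practical consequence that this sketch tries to show (assuming Python 3.3 or later; `lookup` is an illustrative name): a read-only byte view can serve as a dictionary key, while a view of a writable object cannot be hashed.

```python
# Read-only view of bytes: hashable, and hashes like the bytes it exposes.
key = memoryview(b'key')
lookup = {key: 1}
same_hash = hash(key) == hash(b'key')

# View of a writable bytearray: hashing is refused.
try:
    hash(memoryview(bytearray(b'key')))
    writable_hashable = True
except ValueError:
    writable_hashable = False

print(same_hash, lookup[key], writable_hashable)
```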

Changed in version 3.3: One-dimensional memoryviews can now be sliced. One-dimensional memoryviews with formats ‘B’, ‘b’ or ‘c’ are now hashable.

Changed in version 3.4: memoryview is now registered automatically with collections.abc.Sequence.

Changed in version 3.5: memoryviews can now be indexed with a tuple of integers.

memoryview has several methods:

__eq__(exporter)

A memoryview and a PEP 3118 exporter are equal if their shapes are equivalent and if all corresponding values are equal when the operands’ respective format codes are interpreted using struct syntax.

For the subset of struct format strings currently supported by tolist(), v and w are equal if v.tolist() == w.tolist():

>>> import array
>>> a = array.array('I', [1, 2, 3, 4, 5])
>>> b = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0])
>>> c = array.array('b', [5, 3, 1])
>>> x = memoryview(a)
>>> y = memoryview(b)
>>> x == a == y == b
True
>>> x.tolist() == a.tolist() == y.tolist() == b.tolist()
True
>>> z = y[::-2]
>>> z == c
True
>>> z.tolist() == c.tolist()
True

If either format string is not supported by the struct module, then the objects will always compare as unequal (even if the format strings and buffer contents are identical):

>>> from ctypes import BigEndianStructure, c_long
>>> class BEPoint(BigEndianStructure):
...     _fields_ = [("x", c_long), ("y", c_long)]
...
>>> point = BEPoint(100, 200)
>>> a = memoryview(point)
>>> b = memoryview(point)
>>> a == point
False
>>> a == b
False

Note that, as with floating point numbers, v is w does not imply v == w for memoryview objects.

Changed in version 3.3: Previous versions compared the raw memory disregarding the item format and the logical array structure.

tobytes()

Return the data in the buffer as a bytestring. This is equivalent to calling the bytes constructor on the memoryview.

>>> m = memoryview(b"abc")
>>> m.tobytes()
b'abc'
>>> bytes(m)
b'abc'

For non-contiguous arrays the result is equal to the flattened list representation with all elements converted to bytes. tobytes() supports all format strings, including those that are not in struct module syntax.
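For the non-contiguous case, a minimal sketch (assuming Python 3.3 or later; names are illustrative) shows that tobytes() gathers the selected elements into a fresh, flat bytes object:

```python
view = memoryview(b'abcdef')

# A strided slice is not contiguous in memory.
every_other = view[::2]
flattened = every_other.tobytes()

# A reversed slice is flattened in iteration order as well.
backwards = view[::-1].tobytes()

print(every_other.contiguous, flattened, backwards)
```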

hex()

Return a string object containing two hexadecimal digits for each byte in the buffer.

>>> m = memoryview(b"abc")
>>> m.hex()
'616263'

New in version 3.5.

tolist()

Return the data in the buffer as a list of elements.

>>> memoryview(b'abc').tolist()
[97, 98, 99]
>>> import array
>>> a = array.array('d', [1.1, 2.2, 3.3])
>>> m = memoryview(a)
>>> m.tolist()
[1.1, 2.2, 3.3]

Changed in version 3.3: tolist() now supports all single character native formats in struct module syntax as well as multi-dimensional representations.

release()

Release the underlying buffer exposed by the memoryview object. Many objects take special actions when a view is held on them (for example, a bytearray would temporarily forbid resizing); therefore, calling release() is handy to remove these restrictions (and free any dangling resources) as soon as possible.

After this method has been called, any further operation on the view raises a ValueError (except release() itself which can be called multiple times):

>>> m = memoryview(b'abc')
>>> m.release()
>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object

The context management protocol can be used for a similar effect, using the with statement:

>>> with memoryview(b'abc') as m:
...     m[0]
...
97
>>> m[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object
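The bytearray restriction mentioned in the description of release() can be observed with a sketch like this one (assuming Python 3.2 or later; names are illustrative): resizing fails while a view is held and succeeds once the view has been released.

```python
data = bytearray(b'abc')
view = memoryview(data)

# While the view is alive, the bytearray refuses to change size.
try:
    data.extend(b'def')
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Releasing the view lifts the restriction.
view.release()
data.extend(b'def')

# The with statement releases the view automatically on exit.
with memoryview(data) as scoped:
    first = scoped[0]
data.append(ord('!'))

print(resized_while_exported, first, data)
```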

New in version 3.2.

cast(format[, shape])

Cast a memoryview to a new format or shape. shape defaults to [byte_length//new_itemsize], which means that the result view will be one-dimensional. The return value is a new memoryview, but the buffer itself is not copied. Supported casts are 1D -> C-contiguous and C-contiguous -> 1D.

The destination format is restricted to a single element native format in struct syntax. One of the formats must be a byte format (‘B’, ‘b’ or ‘c’). The byte length of the result must be the same as the original length.
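The restrictions above can be illustrated with a hedged sketch (assuming Python 3.5 or later and a platform where the native 'h' format is two bytes wide; names are illustrative). It also shows that the cast view shares memory with the source:

```python
import array

source = array.array('i', [1, 2, 3, 4])
ints = memoryview(source)

# Neither 'i' nor 'h' is a byte format, so a direct cast is rejected.
try:
    ints.cast('h')
    direct_cast_ok = True
except TypeError:
    direct_cast_ok = False

# Going through a byte format works, and no data is copied.
as_bytes = ints.cast('B')
as_shorts = as_bytes.cast('h')

# The byte length must divide evenly into the new item size.
try:
    memoryview(b'abc').cast('H')
    odd_length_ok = True
except TypeError:
    odd_length_ok = False

# Writing through the byte view changes the original array.
as_bytes[:] = bytes(len(as_bytes))

print(direct_cast_ok, odd_length_ok, len(as_shorts), source.tolist())
```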

Cast 1D/long to 1D/unsigned bytes:

>>> import array
>>> a = array.array('l', [1,2,3])
>>> x = memoryview(a)
>>> x.format
'l'
>>> x.itemsize
8
>>> len(x)
3
>>> x.nbytes
24
>>> y = x.cast('B')
>>> y.format
'B'
>>> y.itemsize
1
>>> len(y)
24
>>> y.nbytes
24

Cast 1D/unsigned bytes to 1D/char:

>>> b = bytearray(b'zyz')
>>> x = memoryview(b)
>>> x[0] = b'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: invalid value for format "B"
>>> y = x.cast('c')
>>> y[0] = b'a'
>>> b
bytearray(b'ayz')

Cast 1D/bytes to 3D/ints to 1D/signed char:

>>> import struct
>>> buf = struct.pack("i"*12, *list(range(12)))
>>> x = memoryview(buf)
>>> y = x.cast('i', shape=[2,2,3])
>>> y.tolist()
[[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
>>> y.format
'i'
>>> y.itemsize
4
>>> len(y)
2
>>> y.nbytes
48
>>> z = y.cast('b')
>>> z.format
'b'
>>> z.itemsize
1
>>> len(z)
48
>>> z.nbytes
48

Cast 1D/unsigned char to 2D/unsigned long:

>>> buf = struct.pack("L"*6, *list(range(6)))
>>> x = memoryview(buf)
>>> y = x.cast('L', shape=[2,3])
>>> len(y)
2
>>> y.nbytes
48
>>> y.tolist()
[[0, 1, 2], [3, 4, 5]]

New in version 3.3.

Changed in version 3.5: The source format is no longer restricted when casting to a byte view.

There are also several readonly attributes available:

obj

The underlying object of the memoryview:

>>> b = bytearray(b'xyz')
>>> m = memoryview(b)
>>> m.obj is b
True

New in version 3.3.

nbytes

nbytes == product(shape) * itemsize == len(m.tobytes()). This is the amount of space in bytes that the array would use in a contiguous representation. It is not necessarily equal to len(m):

>>> import array
>>> a = array.array('i', [1,2,3,4,5])
>>> m = memoryview(a)
>>> len(m)
5
>>> m.nbytes
20
>>> y = m[::2]
>>> len(y)
3
>>> y.nbytes
12
>>> len(y.tobytes())
12

Multi-dimensional arrays:

>>> import struct
>>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)])
>>> x = memoryview(buf)
>>> y = x.cast('d', shape=[3,4])
>>> y.tolist()
[[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]]
>>> len(y)
3
>>> y.nbytes
96

New in version 3.3.

readonly

A bool indicating whether the memory is read only.

format

A string containing the format (in struct module style) for each element in the view. A memoryview can be created from exporters with arbitrary format strings, but some methods (e.g. tolist()) are restricted to native single element formats.

Changed in version 3.3: format 'B' is now handled according to the struct module syntax. This means that memoryview(b'abc')[0] == b'abc'[0] == 97.
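A brief sketch of the format attribute (assuming Python 3.3 or later; names are illustrative) shows how the format code determines the type returned by indexing:

```python
import array

byte_view = memoryview(b'abc')
char_view = byte_view.cast('c')
double_view = memoryview(array.array('d', [1.5]))

# 'B' yields integers, 'c' yields length-1 bytes objects, 'd' yields floats.
print(byte_view.format, byte_view[0])
print(char_view.format, char_view[0])
print(double_view.format, double_view[0])
```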

itemsize

The size in bytes of each element of the memoryview:

>>> import array, struct
>>> m = memoryview(array.array('H', [32000, 32001, 32002]))
>>> m.itemsize
2
>>> m[0]
32000
>>> struct.calcsize('H') == m.itemsize
True

ndim

An integer indicating how many dimensions of a multi-dimensional array the memory represents.

shape

A tuple of integers the length of ndim giving the shape of the memory as an N-dimensional array.

Changed in version 3.3: An empty tuple instead of None when ndim = 0.

strides

A tuple of integers the length of ndim giving the size in bytes to access each element for each dimension of the array.

Changed in version 3.3: An empty tuple instead of None when ndim = 0.
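The ndim, shape and strides attributes can be inspected together, as in this minimal sketch (assuming Python 3.3 or later; names are illustrative):

```python
import array

# 12 bytes viewed as 3 rows of 4 one-byte elements.
grid = memoryview(bytes(12)).cast('B', shape=[3, 4])
print(grid.ndim, grid.shape, grid.strides)

# Six doubles: each step along the single axis is 8 bytes.
doubles = memoryview(array.array('d', [0.0] * 6))
print(doubles.shape, doubles.strides)

# Taking every second element doubles the stride and halves the shape.
sparse = doubles[::2]
print(sparse.shape, sparse.strides)
```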

suboffsets

Used internally for PIL-style arrays. The value is informational only.

c_contiguous

A bool indicating whether the memory is C-contiguous.

New in version 3.3.

f_contiguous

A bool indicating whether the memory is Fortran contiguous.

New in version 3.3.

contiguous

A bool indicating whether the memory is contiguous.

New in version 3.3.
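The three contiguity flags can be compared side by side in a short sketch (assuming Python 3.3 or later; names are illustrative):

```python
flat = memoryview(bytes(6))
grid = flat.cast('B', shape=[2, 3])
strided = flat[::2]

def flags(view):
    """Return (c_contiguous, f_contiguous, contiguous) for a view."""
    return (view.c_contiguous, view.f_contiguous, view.contiguous)

print(flags(flat))     # a contiguous 1-D view satisfies both layouts
print(flags(grid))     # a row-major 2-D view is C-contiguous only
print(flags(strided))  # a strided slice is not contiguous at all
```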
