3. NumPy basics

1. Data types

See also: Data type objects

Array types and conversions between types


NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modify an array’s data-type.
Since many of these have platform-dependent definitions, a set of fixed-size aliases is provided.



NumPy numerical types are instances of dtype (data-type) objects, each having unique characteristics. The examples below assume NumPy has been imported with:

import numpy as np

There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint), floating point (float) and complex.
Data-types can be used as functions to convert python numbers to array scalars (see the array scalar section for an explanation), python sequences of numbers to arrays of that type, or as arguments to the dtype keyword that many numpy functions or methods accept. Some examples:
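A few illustrative conversions (array reprs on this page follow the older, space-padded NumPy print style; recent versions print more compactly):

>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1, 2, 4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)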


Array types can also be referred to by character codes, mostly to retain backward compatibility with older packages such as Numeric. Some documentation may still refer to these, for example:
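For instance, the single-character code 'f' selects float32:

>>> np.array([1, 2, 3], dtype='f')
array([ 1.,  2.,  3.], dtype=float32)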

To convert the type of an array, use the .astype() method (preferred) or the type itself as a function. For example:
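Reusing the uint8 array z defined above:

>>> z.astype(float)
array([ 0.,  1.,  2.])
>>> np.int8(z)
array([0, 1, 2], dtype=int8)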

To determine the type of an array, look at the dtype attribute:
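Again with z from above:

>>> z.dtype
dtype('uint8')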

dtype objects also contain information about the type, such as its bit-width and its byte-order. The data type can also be used indirectly to query properties of the type, such as whether it is an integer:
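For example (the default integer width is platform-dependent, so the first repr may read int32 or int64):

>>> d = np.dtype(int)
>>> d
dtype('int64')
>>> np.issubdtype(d, np.integer)
True
>>> np.issubdtype(d, np.floating)
False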

Array Scalars


NumPy generally returns elements of arrays as array scalars (a scalar with an associated dtype). Array scalars differ from Python scalars, but for the most part they can be used interchangeably (the primary exception is for versions of Python older than v2.x, where integer array scalars cannot act as indices for lists and tuples). There are some exceptions, such as when code requires very specific attributes of a scalar or when it checks specifically whether a value is a Python scalar. Generally, problems are easily fixed by explicitly converting array scalars to Python scalars, using the corresponding Python type function (e.g., int, float, complex, str, unicode).

Overflow Errors


The fixed size of NumPy numeric types may cause overflow errors when a value requires more memory than is available in the data type. For example, numpy.power evaluates 100 ** 8 correctly for 64-bit integers, but gives 1874919424 (incorrect) for a 32-bit integer.
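A minimal demonstration:

>>> np.power(100, 8, dtype=np.int64)
10000000000000000
>>> np.power(100, 8, dtype=np.int32)
1874919424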


The behaviour of NumPy and Python integer types differs significantly for integer overflows and may confuse users expecting NumPy integers to behave similarly to Python’s int. Unlike NumPy, the size of Python’s int is flexible. This means Python integers may expand to accommodate any integer and will not overflow.
NumPy provides numpy.iinfo and numpy.finfo to verify the minimum and maximum values of NumPy integer and floating point types, respectively.
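For example, on a machine whose default integer is 64-bit:

>>> np.iinfo(int)        # bounds of the default integer on this system
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
>>> np.iinfo(np.int32)   # bounds of a 32-bit integer
iinfo(min=-2147483648, max=2147483647, dtype=int32)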

If 64-bit integers are still too small, the result may be cast to a floating point number. Floating point numbers offer a larger, but inexact, range of possible values.
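For instance, 100**100 overflows even int64, but is representable (approximately) as a float:

>>> np.power(100, 100, dtype=np.int64)   # incorrect even with a 64-bit integer
0
>>> np.power(100, 100, dtype=np.float64)
1e+200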

Extended Precision

Python’s floating-point numbers are usually 64-bit floating-point numbers, nearly equivalent to np.float64. In some unusual situations it may be useful to use floating-point numbers with more precision. Whether this is possible in numpy depends on the hardware and on the development environment: specifically, x86 machines provide hardware floating-point with 80-bit precision, and while most C compilers provide this as their long double type, MSVC (standard for Windows builds) makes long double identical to double (64 bits). NumPy makes the compiler’s long double available as np.longdouble (and np.clongdouble for the complex numbers). You can find out what your numpy provides with np.finfo(np.longdouble).

NumPy does not provide a dtype with more precision than C’s long double; in particular, the 128-bit IEEE quad precision data type (FORTRAN’s REAL*16) is not available.

For efficient memory alignment, np.longdouble is usually stored padded with zero bits, either to 96 or 128 bits. Which is more efficient depends on hardware and development environment; typically on 32-bit systems they are padded to 96 bits, while on 64-bit systems they are typically padded to 128 bits. np.longdouble is padded to the system default; np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds.

Be warned that even if np.longdouble offers more precision than Python float, it is easy to lose that extra precision, since Python often forces values to pass through float. For example, the % formatting operator requires its arguments to be converted to standard Python types, and it is therefore impossible to preserve extended precision even if many decimal places are requested. It can be useful to test your code with the value 1 + np.finfo(np.longdouble).eps.
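A quick probe, assuming a platform where longdouble is wider than float64 (e.g. 80-bit extended precision on x86 Linux); on standard Windows builds, where longdouble equals double, both comparisons would agree:

>>> x = 1 + np.finfo(np.longdouble).eps
>>> x == 1                # still distinguishable from 1 in longdouble
False
>>> float(x) == 1         # forcing through Python float loses the extra precision
True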

2. Array creation

Introduction

There are 5 general mechanisms for creating arrays:

  • Conversion from other Python structures (e.g., lists, tuples)
  • Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
  • Reading arrays from disk, either from standard or custom formats
  • Creating arrays from raw bytes through the use of strings or buffers
  • Use of special library functions (e.g., random)

Converting Python array_like Objects to NumPy Arrays


In general, numerical data arranged in an array-like structure in Python can be converted to arrays through the use of the array() function. The most obvious examples are lists and tuples. See the documentation for array() for details for its use. Some objects may support the array-protocol and allow conversion to arrays this way. A simple way to find out if the object can be converted to a numpy array using array() is simply to try it interactively and see if it works! (The Python Way).
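Two examples:

>>> x = np.array([2, 3, 1, 0])
>>> x = np.array([[1, 2.0], [0, 0], (1+1j, 3.)])  # a mix of tuples and lists, and of types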

Intrinsic NumPy Array Creation


NumPy has built-in functions for creating arrays from scratch:
zeros(shape) will create an array filled with 0 values with the specified shape. The default dtype is float64.
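For example:

>>> np.zeros((2, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])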



ones(shape) will create an array filled with 1 values. It is identical to zeros in all other respects.

arange() will create arrays with regularly incrementing values. Check the docstring for complete information on the various ways it can be used. A few examples will be given here:
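The last call below relies on a float step, which is the usage the docstring caveat mentioned afterwards refers to:

>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> np.arange(2, 3, 0.1)
array([ 2. ,  2.1,  2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9])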



Note that there are some subtleties regarding the last usage that the user should be aware of that are described in the arange docstring.

linspace() will create arrays with a specified number of elements, and spaced equally between the specified beginning and end values. For example:
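Six evenly spaced values from 1 to 4 inclusive:

>>> np.linspace(1., 4., 6)
array([ 1. ,  1.6,  2.2,  2.8,  3.4,  4. ])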



The advantage of this creation function is that one can guarantee the number of elements and the starting and end point, which arange() generally will not do for arbitrary start, stop, and step values.
indices() will create a set of arrays (stacked as a one-higher dimensioned array), one per dimension with each representing variation in that dimension. An example illustrates much better than a verbal description:
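For a 3x3 grid, the first sub-array varies along rows and the second along columns:

>>> np.indices((3, 3))
array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])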



This is particularly useful for evaluating functions of multiple dimensions on a regular grid.

Reading Arrays From Disk


This is presumably the most common case of large array creation. The details, of course, depend greatly on the format of data on disk and so this section can only give general pointers on how to handle various formats.

Standard Binary Formats


Various fields have standard formats for array data. The following lists the ones with known Python libraries to read them and return NumPy arrays (there may be others for which it is possible to read and convert to NumPy arrays, so check the last section as well):

HDF5: h5py
FITS: Astropy

Examples of formats that cannot be read directly but for which it is not hard to convert are those formats supported by libraries like PIL (able to read and write many image formats such as jpg, png, etc).

Common ASCII Formats


Comma Separated Value files (CSV) are widely used (and an export and import option for programs like Excel). There are a number of ways of reading these files in Python. There are CSV functions in Python and functions in pylab (part of matplotlib).

More generic ASCII files can be read using the io package in SciPy.

Custom Binary Formats


There are a variety of approaches one can use. If the file has a relatively simple format then one can write a simple I/O library and use the numpy fromfile() function and .tofile() method to read and write numpy arrays directly (mind your byteorder though!) If a good C or C++ library exists that read the data, one can wrap that library with a variety of techniques though that certainly is much more work and requires significantly more advanced knowledge to interface with C or C++.
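A minimal round trip under that first approach (assuming a writable working directory and a hypothetical file name data.bin; the file carries no dtype or shape metadata, so the reader must know both):

>>> a = np.arange(4, dtype=np.int32)
>>> a.tofile('data.bin')                     # raw bytes, native byte order, no header
>>> np.fromfile('data.bin', dtype=np.int32)  # dtype must be supplied by the caller
array([0, 1, 2, 3], dtype=int32)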

Use of Special Libraries


There are libraries that can be used to generate arrays for special purposes and it isn’t possible to enumerate all of them. The most common uses are use of the many array generation functions in random that can generate arrays of random values, and some utility functions to generate special matrices (e.g. diagonal).

3. I/O with NumPy


Importing data with genfromtxt


NumPy provides several functions to create arrays from tabular data. We focus here on the genfromtxt function.
In a nutshell, genfromtxt runs two main loops. The first loop converts each line of the file into a sequence of strings. The second loop converts each string to the appropriate data type. This mechanism is slower than a single loop, but gives more flexibility. In particular, genfromtxt is able to take missing data into account, when other faster and simpler functions like loadtxt cannot.

import numpy as np
from io import StringIO

Defining the input


The only mandatory argument of genfromtxt is the source of the data. It can be a string, a list of strings, a generator, or an open file-like object with a read method (for example, a file or io.StringIO object). If a single string is provided, it is assumed to be the name of a local or remote file. If a list of strings or a generator returning strings is provided, each string is treated as one line in a file. When the URL of a remote file is passed, the file is automatically downloaded to the current directory and opened.

Recognized file types are text files and archives. Currently, the function recognizes gzip and bz2 (bzip2) archives. The type of the archive is determined from the extension of the file: if the filename ends with '.gz', a gzip archive is expected; if it ends with '.bz2', a bzip2 archive is assumed.

Splitting the lines into columns


The delimiter argument


Once the file is defined and open for reading, genfromtxt splits each non-empty line into a sequence of strings. Empty or commented lines are just skipped. The delimiter keyword is used to define how the splitting should take place.

Quite often, a single character marks the separation between columns. For example, comma-separated files (CSV) use a comma (,) or a semicolon (;) as delimiter:
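Using the StringIO import from above:

>>> data = "1, 2, 3\n4, 5, 6"
>>> np.genfromtxt(StringIO(data), delimiter=",")
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])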


Another common separator is "\t", the tabulation character. However, we are not limited to a single character, any string will do. By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.

Alternatively, we may be dealing with a fixed-width file, where columns are defined as a given number of characters. In that case, we need to set delimiter to a single integer (if all the columns have the same size) or to a sequence of integers (if columns can have different sizes):
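First a uniform width of 3, then widths of 4, 3 and 2:

>>> data = "  1  2  3\n  4  5 67\n890123  4"
>>> np.genfromtxt(StringIO(data), delimiter=3)
array([[   1.,    2.,    3.],
       [   4.,    5.,   67.],
       [ 890.,  123.,    4.]])
>>> data = "123456789\n   4  7 9\n   4567 9"
>>> np.genfromtxt(StringIO(data), delimiter=(4, 3, 2))
array([[ 1234.,   567.,    89.],
       [    4.,     7.,     9.],
       [    4.,   567.,     9.]])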

The autostrip argument


By default, when a line is decomposed into a series of strings, the individual entries are not stripped of leading or trailing white spaces. This behavior can be overridden by setting the optional argument autostrip to a value of True:
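Comparing the two behaviours on string data:

>>> data = "1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5")
array([['1', ' abc ', ' 2'],
       ['3', ' xxx', ' 4']], dtype='<U5')
>>> # With autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)
array([['1', 'abc', '2'],
       ['3', 'xxx', '4']], dtype='<U5')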


The comments argument


The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored:
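Both full-line comments and trailing comments are dropped:

>>> data = """#
... # Skip me !
... # Skip me too !
... 1, 2
... 3, 4
... 5, 6 #This is the third line of the data
... 7, 8
... # And here comes the last line
... 9, 0
... """
>>> np.genfromtxt(StringIO(data), comments="#", delimiter=",")
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.],
       [ 7.,  8.],
       [ 9.,  0.]])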


New in version 1.7.0: When comments is set to None, no lines are treated as comments.

Note:
There is one notable exception to this behavior: if the optional argument names=True, the first commented line will be examined for names.

Skipping lines and choosing columns


The skip_header and skip_footer arguments


The presence of a header in the file can hinder data processing. In that case, we need to use the skip_header optional argument. The value of this argument must be an integer which corresponds to the number of lines to skip at the beginning of the file, before any other action is performed. Similarly, we can skip the last n lines of the file by using the skip_footer argument and giving it a value of n:
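With and without skipping:

>>> data = "\n".join(str(i) for i in range(10))
>>> np.genfromtxt(StringIO(data))
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> np.genfromtxt(StringIO(data), skip_header=3, skip_footer=5)
array([ 3.,  4.])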


By default, skip_header=0 and skip_footer=0, meaning that no lines are skipped.

The usecols argument


In some cases, we are not interested in all the columns of the data but only a few of them. We can select which columns to import with the usecols argument. This argument accepts a single integer or a sequence of integers corresponding to the indices of the columns to import. Remember that by convention, the first column has an index of 0. Negative integers behave the same as regular Python negative indexes.

For example, if we want to import only the first and the last columns, we can use usecols=(0, -1):
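Like so:

>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data), usecols=(0, -1))
array([[ 1.,  3.],
       [ 4.,  6.]])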



If the columns have names, we can also select which columns to import by giving their name to the usecols argument, either as a sequence of strings or a comma-separated string:
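Both spellings are equivalent:

>>> data = "1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data), names="a, b, c", usecols=("a", "c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data), names="a, b, c", usecols=("a, c"))
array([(1.0, 3.0), (4.0, 6.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])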


Choosing the data type

The main way to control how the sequences of strings we have read from the file are converted to other types is to set the dtype argument. Acceptable values for this argument are:

  • a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the names argument (see below). Note that dtype=float is the default for genfromtxt.
  • a sequence of types, such as dtype=(int, float, float).
  • a comma-separated string, such as dtype="i4,f8,|U3".
  • a dictionary with two keys 'names' and 'formats'.
  • a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
  • an existing numpy.dtype object.
  • the special value None. In that case, the type of the columns will be determined from the data itself (see below).

In all the cases but the first one, the output will be a 1D array with a structured dtype. This dtype has as many fields as items in the sequence. The field names are defined with the names keyword.

When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a string can be converted to a boolean (that is, if the string matches true or false in lower cases); then whether it can be converted to an integer, then to a float, then to a complex and eventually to a string. This behavior may be changed by modifying the default mapper of the StringConverter class.

The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.

Setting the names


The names argument


A natural approach when dealing with tabular data is to allocate a name to each column. A first possibility is to use an explicit structured dtype, as mentioned previously:
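For instance (the integer field width, here '<i8', depends on the platform's default int):

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=[(_, int) for _ in "abc"])
array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])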



Another simpler possibility is to use the names keyword with a sequence of strings or a comma-separated string:
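For example:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, names="A, B, C")
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
      dtype=[('A', '<f8'), ('B', '<f8'), ('C', '<f8')])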



In the example above, we used the fact that by default, dtype=float. By giving a sequence of names, we are forcing the output to a structured dtype.

We may sometimes need to define the column names from the data itself. In that case, we must use the names keyword with a value of True. The names will then be read from the first line (after the skip_header ones), even if the line is commented out:
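Here the commented second line supplies the names:

>>> data = StringIO("So it goes\n#a b c\n1 2 3\n 4 5 6")
>>> np.genfromtxt(data, skip_header=1, names=True)
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])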



The default value of names is None. If we give any other value to the keyword, the new names will overwrite the field names we may have defined with the dtype:
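For instance:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> ndtype = [('a', int), ('b', float), ('c', int)]
>>> names = ["A", "B", "C"]
>>> np.genfromtxt(data, names=names, dtype=ndtype)
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('A', '<i8'), ('B', '<f8'), ('C', '<i8')])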


The defaultfmt argument


If names=None but a structured dtype is expected, names are defined with the standard NumPy default of "f%i", yielding names like f0, f1 and so forth:
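For example:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int))
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8')])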


In the same way, if we don’t give enough names to match the length of the dtype, the missing names will be defined with this default template:
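Here only the first column receives the explicit name:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), names="a")
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('a', '<i8'), ('f0', '<f8'), ('f1', '<i8')])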

We can overwrite this default with the defaultfmt argument, that takes any format string:
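For example:

>>> data = StringIO("1 2 3\n 4 5 6")
>>> np.genfromtxt(data, dtype=(int, float, int), defaultfmt="var_%02i")
array([(1, 2.0, 3), (4, 5.0, 6)],
      dtype=[('var_00', '<i8'), ('var_01', '<f8'), ('var_02', '<i8')])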

Note:
We need to keep in mind that defaultfmt is used only if some names are expected but not defined.

Validating names


NumPy arrays with a structured dtype can also be viewed as recarray, where a field can be accessed as if it were an attribute. For that reason, we may need to make sure that the field name doesn’t contain any space or invalid character, or that it does not correspond to the name of a standard attribute (like size or shape), which would confuse the interpreter. genfromtxt accepts three optional arguments that provide a finer control on the names:

deletechars
Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are !@#$%^&*()-=+|]}[{';: /?.>,<.
excludelist
Gives a list of the names to exclude, such as return, file, print… If one of the input names is part of this list, an underscore character ('_') will be appended to it.
case_sensitive
Whether the names should be case-sensitive (case_sensitive=True), converted to upper case (case_sensitive=False or case_sensitive='upper') or to lower case (case_sensitive='lower').

Tweaking the conversion


The converters argument


Usually, defining a dtype is sufficient to define how the sequence of strings must be converted. However, some additional control may sometimes be required. For example, we may want to make sure that a date in a format YYYY/MM/DD is converted to a datetime object, or that a string like xx% is properly converted to a float between 0 and 1. In such cases, we should define conversion functions with the converters argument.

The value of this argument is typically a dictionary with column indices or column names as keys and conversion functions as values. These conversion functions can either be actual functions or lambda functions. In any case, they should accept only a string as input and output only a single element of the wanted type.

In the following example, the second column is converted from a string representing a percentage to a float between 0 and 1:
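Without a converter, the percentages come back as nan. (Recent NumPy passes str values to converters; on older versions they arrive as bytes, so the strip argument would be b"%".)

>>> convertfunc = lambda x: float(x.strip("%")) / 100.
>>> data = "1, 2.3%, 45.\n6, 78.9%, 0"
>>> names = ("i", "p", "n")
>>> # General case .....
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names)
array([(1.0, nan, 45.0), (6.0, nan, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])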



We need to keep in mind that by default, dtype=float. A float is therefore expected for the second column. However, the strings ' 2.3%' and ' 78.9%' cannot be converted to float and we end up having np.nan instead. Let’s now use a converter:
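Reusing data, names and convertfunc from above:

>>> # Converted case ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={1: convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.789, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])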



The same results can be obtained by using the name of the second column ("p") as key instead of its index (1):
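Like so:

>>> # Using a name for the converter ...
>>> np.genfromtxt(StringIO(data), delimiter=",", names=names,
...               converters={"p": convertfunc})
array([(1.0, 0.023, 45.0), (6.0, 0.789, 0.0)],
      dtype=[('i', '<f8'), ('p', '<f8'), ('n', '<f8')])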

Converters can also be used to provide a default for missing entries. In the following example, the converter convert transforms a stripped string into the corresponding float or into -999 if the string is empty. We need to explicitly strip the string of white spaces as it is not done by default:
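For instance:

>>> data = "1, , 3\n 4, 5, 6"
>>> convert = lambda x: float(x.strip() or -999)
>>> np.genfromtxt(StringIO(data), delimiter=",",
...               converters={1: convert})
array([[   1., -999.,    3.],
       [   4.,    5.,    6.]])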


Using missing and filling values


Some entries may be missing in the dataset we are trying to import. In a previous example, we used a converter to transform an empty string into a float. However, user-defined converters may rapidly become cumbersome to manage.

The genfromtxt function provides two other complementary mechanisms: the missing_values argument is used to recognize missing data and a second argument, filling_values, is used to process these missing data.

missing_values


By default, any empty string is marked as missing. We can also consider more complex strings, such as "N/A" or "???", to represent missing or invalid data. The missing_values argument accepts three kinds of values:

a string or a comma-separated string
This string will be used as the marker for missing data for all the columns.
a sequence of strings
In that case, each item is associated to a column, in order.
a dictionary
Values of the dictionary are strings or sequence of strings. The corresponding keys can be column indices (integers) or column names (strings). In addition, the special key None can be used to define a default applicable to all columns.

filling_values


We know how to recognize missing data, but we still need to provide a value for these missing entries. By default, this value is determined from the expected dtype according to this table:

Expected type   Default
bool            False
int             -1
float           np.nan
complex         np.nan+0j
string          '???'

We can get a finer control on the conversion of missing values with the filling_values optional argument. Like missing_values, this argument accepts different kinds of values:

a single value
This will be the default for all columns
a sequence of values
Each entry will be the default for the corresponding column
a dictionary
Each key can be a column index or a column name, and the corresponding value should be a single object. We can use the special key None to define a default for all columns.
In the following example, we suppose that the missing values are flagged with "N/A" in the first column and by "???" in the third column. We wish to transform these missing values to 0 if they occur in the first and second column, and to -999 if they occur in the last column:
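One way to express this:

>>> data = "N/A, 2, 3\n4, ,???"
>>> kwargs = dict(delimiter=",",
...               dtype=int,
...               names="a,b,c",
...               missing_values={0: "N/A", 'b': " ", 2: "???"},
...               filling_values={0: 0, 'b': 0, 2: -999})
>>> np.genfromtxt(StringIO(data), **kwargs)
array([(0, 2, 3), (4, 0, -999)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])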

usemask


We may also want to keep track of the occurrence of missing data by constructing a boolean mask, with True entries where data was missing and False otherwise. To do that, we just have to set the optional argument usemask to True (the default is False). The output array will then be a MaskedArray.

Shortcut functions


In addition to genfromtxt, the numpy.lib.io module provides several convenience functions derived from genfromtxt. These functions work the same way as the original, but they have different default values.
recfromtxt
Returns a standard numpy.recarray (if usemask=False) or a MaskedRecords array (if usemask=True). The default dtype is dtype=None, meaning that the types of each column will be automatically determined.
recfromcsv
Like recfromtxt, but with a default delimiter=",".

4. Indexing


See also
Indexing
Indexing routines
Array indexing refers to any use of the square brackets ([]) to index array values. There are many options to indexing, which give numpy indexing great power, but with power comes some complexity and the potential for confusion. This section is just an overview of the various options and issues related to indexing. Aside from single element indexing, the details on most of these options are to be found in related sections.

Assignment vs referencing


Most of the following examples show the use of indexing when referencing data in an array. The examples work just as well when assigning to an array. See the section at the end for specific examples and explanations on how assignments work.

Single element indexing


Single element indexing for a 1-D array is what one expects. It works exactly like that for other standard Python sequences. It is 0-based, and accepts negative indices for indexing from the end of the array.
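For example:

>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8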


Unlike lists and tuples, numpy arrays support multidimensional indexing for multidimensional arrays. That means that it is not necessary to separate each dimension’s index into its own set of square brackets.
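Continuing with x from above:

>>> x.shape = (2, 5)  # now x is 2-dimensional
>>> x[1, 3]
8
>>> x[1, -1]
9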



Note that if one indexes a multidimensional array with fewer indices than dimensions, one gets a subdimensional array. For example:
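>>> x[0]
array([0, 1, 2, 3, 4])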


That is, each index specified selects the array corresponding to the rest of the dimensions selected. In the above example, choosing 0 means that the remaining dimension of length 5 is being left unspecified, and that what is returned is an array of that dimensionality and size. It must be noted that the returned array is not a copy of the original, but points to the same values in memory as does the original array. In this case, the 1-D array at the first position (0) is returned. So using a single index on the returned array, results in a single element being returned. That is:
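>>> x[0][2]
2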


So note that x[0, 2] == x[0][2], though the second case is less efficient: a new temporary array is created after the first index, and it is then indexed by 2.

Note to those used to IDL or Fortran memory order as it relates to indexing: NumPy uses C-order indexing. That means that the last index usually represents the most rapidly changing memory location, unlike Fortran or IDL, where the first index represents the most rapidly changing location in memory. This difference represents a great potential for confusion.

Other indexing options


It is possible to slice and stride arrays to extract arrays of the same number of dimensions, but of different sizes than the original. The slicing and striding works exactly the same way it does for lists and tuples except that they can be applied to multiple dimensions as well. A few examples illustrate this best:
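(The array y defined here is reused in several of the examples that follow.)

>>> x = np.arange(10)
>>> x[2:5]
array([2, 3, 4])
>>> x[:-7]
array([0, 1, 2])
>>> x[1:7:2]
array([1, 3, 5])
>>> y = np.arange(35).reshape(5, 7)
>>> y[1:5:2, ::3]
array([[ 7, 10, 13],
       [21, 24, 27]])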



Note that slices of arrays do not copy the internal array data but only produce new views of the original data. This is different from list or tuple slicing and an explicit copy() is recommended if the original data is not required anymore.

It is possible to index arrays with other arrays for the purposes of selecting lists of values out of arrays into new arrays. There are two different ways of accomplishing this. One uses one or more arrays of index values. The other involves giving a boolean array of the proper shape to indicate the values to be selected. Index arrays are a very powerful tool that allow one to avoid looping over individual elements in arrays and thus greatly improve performance.

It is possible to use special features to effectively increase the number of dimensions in an array through indexing so the resulting array acquires the shape needed for use in an expression or with a specific function.

Index arrays


NumPy arrays may be indexed with other arrays (or any other sequence-like object that can be converted to an array, such as lists, with the exception of tuples; see the end of this document for why this is). The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.

Index arrays must be of integer type. Each value in the array indicates which value in the array to use in place of the index. To illustrate:
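For instance:

>>> x = np.arange(10, 1, -1)
>>> x
array([10,  9,  8,  7,  6,  5,  4,  3,  2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])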


The index array consisting of the values 3, 3, 1 and 8 correspondingly creates an array of length 4 (the same as the index array), where each index is replaced by the value the index array has in the array being indexed.
Negative values are permitted and work as they do with single indices or slices:
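>>> x[np.array([3, 3, -3, 8])]
array([7, 7, 4, 2])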



It is an error to have index values out of bounds:
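(The exact error message varies with the NumPy version.)

>>> x[np.array([3, 3, 20, 8])]
Traceback (most recent call last):
  ...
IndexError: index 20 is out of bounds for axis 0 with size 9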


Generally speaking, what is returned when index arrays are used is an array with the same shape as the index array, but with the type and values of the array being indexed. As an example, we can use a multidimensional index array instead:
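>>> x[np.array([[1, 1], [2, 3]])]
array([[9, 9],
       [8, 7]])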


Indexing Multi-dimensional arrays


Things become more complex when multidimensional arrays are indexed, particularly with multidimensional index arrays. These tend to be more unusual uses, but they are permitted, and they are useful for some problems. We’ll start with the simplest multidimensional case (using the array y from the previous examples):
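With y = np.arange(35).reshape(5, 7) as before:

>>> y[np.array([0, 2, 4]), np.array([0, 1, 2])]
array([ 0, 15, 30])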



In this case, if the index arrays have a matching shape, and there is an index array for each dimension of the array being indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each position in the index arrays. In this example, the first index value is 0 for both index arrays, and thus the first value of the resultant array is y[0,0]. The next value is y[2,1], and the last is y[4,2].

If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised:
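For example (again, the wording of the error differs between versions):

>>> y[np.array([0, 2, 4]), np.array([0, 1])]
Traceback (most recent call last):
  ...
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)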



The broadcasting mechanism permits index arrays to be combined with scalars for other indices. The effect is that the scalar value is used for all the corresponding values of the index arrays:
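>>> y[np.array([0, 2, 4]), 1]
array([ 1, 15, 29])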



Jumping to the next level of complexity, it is possible to only partially index an array with index arrays. It takes a bit of thought to understand what happens in such cases. For example if we just use one index array with y:
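>>> y[np.array([0, 2, 4])]
array([[ 0,  1,  2,  3,  4,  5,  6],
       [14, 15, 16, 17, 18, 19, 20],
       [28, 29, 30, 31, 32, 33, 34]])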

What results is the construction of a new array where each value of the index array selects one row from the array being indexed and the resultant array has the resulting shape (number of index elements, size of row).

An example of where this may be useful is for a color lookup table where we want to map the values of an image into RGB triples for display. The lookup table could have a shape (nlookup, 3). Indexing such an array with an image with shape (ny, nx) with dtype=np.uint8 (or any integer type so long as values are within the bounds of the lookup table) will result in an array of shape (ny, nx, 3) where a triple of RGB values is associated with each pixel location.

In general, the shape of the resultant array will be the concatenation of the shape of the index array (or the shape that all the index arrays were broadcast to) with the shape of any unused dimensions (those not indexed) in the array being indexed.

Boolean or "mask" index arrays


Boolean arrays used as indices are treated in a different manner entirely than index arrays. Boolean arrays must be of the same shape as the initial dimensions of the array being indexed. In the most straightforward case, the boolean array has the same shape:
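Still using y from above:

>>> b = y > 20
>>> y[b]
array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])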


Unlike in the case of integer index arrays, in the boolean case, the result is a 1-D array containing all the elements in the indexed array corresponding to all the true elements in the boolean array. The elements in the indexed array are always iterated and returned in row-major (C-style) order. The result is also identical to y[np.nonzero(b)]. As with index arrays, what is returned is a copy of the data, not a view as one gets with slices.
The result will be multidimensional if y has more dimensions than b. For example:
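>>> b[:, 5]  # a 1-D boolean whose length matches the first dimension of y
array([False, False, False,  True,  True])
>>> y[b[:, 5]]
array([[21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])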


Here the 4th and 5th rows are selected from the indexed array and combined to make a 2-D array.

In general, when the boolean array has fewer dimensions than the array being indexed, this is equivalent to y[b, …], which means y is indexed by b followed by as many : as are needed to fill out the rank of y. Thus the shape of the result is one dimension containing the number of True elements of the boolean array, followed by the remaining dimensions of the array being indexed.

For example, using a 2-D boolean array of shape (2,3) with four True elements to select rows from a 3-D array of shape (2,3,5) results in a 2-D result of shape (4,5):
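>>> x = np.arange(30).reshape(2, 3, 5)
>>> b = np.array([[True, True, False],
...               [False, True, True]])
>>> x[b]
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])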



For further details, consult the numpy reference documentation on array indexing.

Combining index arrays with slices


Index arrays may be combined with slices. For example:
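>>> y[np.array([0, 2, 4]), 1:3]
array([[ 1,  2],
       [15, 16],
       [29, 30]])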



In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2).

Likewise, slicing can be combined with broadcasted boolean indices:
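>>> b = y > 20
>>> y[b[:, 5], 1:3]
array([[22, 23],
       [29, 30]])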


Structural indexing tools


To facilitate easy matching of array shapes with expressions and in assignments, the np.newaxis object can be used within array indices to add new dimensions with a size of 1. For example:
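>>> y.shape
(5, 7)
>>> y[:, np.newaxis, :].shape
(5, 1, 7)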



Note that there are no new elements in the array, just that the dimensionality is increased. This can be handy to combine two arrays in a way that otherwise would require explicitly reshaping operations. For example:
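>>> x = np.arange(5)
>>> x[:, np.newaxis] + x[np.newaxis, :]
array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])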


The ellipsis syntax may be used to indicate selecting in full any remaining unspecified dimensions. For example:
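(This array z is reused in the examples further below.)

>>> z = np.arange(81).reshape(3, 3, 3, 3)
>>> z[1, ..., 2]
array([[29, 32, 35],
       [38, 41, 44],
       [47, 50, 53]])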


Assigning values to indexed arrays


As mentioned, one can select a subset of an array to assign to using a single index, slices, and index and mask arrays. The value being assigned to the indexed array must be shape consistent (the same shape or broadcastable to the shape the index produces). For example, it is permitted to assign a constant to a slice:
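>>> x = np.arange(10)
>>> x[2:7] = 1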



or an array of the right size:
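>>> x[2:7] = np.arange(5)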



Note that assignments may result in changes if assigning higher types to lower types (like floats to ints) or even exceptions (assigning complex to floats or ints):
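Continuing with the integer array x (the TypeError text differs between versions):

>>> x[1] = 1.2
>>> x[1]          # the fractional part is silently truncated
1
>>> x[1] = 1.2j
Traceback (most recent call last):
  ...
TypeError: can't convert complex to int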

Unlike some of the references (such as array and mask indices) assignments are always made to the original data in the array (indeed, nothing else would make sense!). Note though, that some actions may not work as one may naively expect. This particular example is often surprising to people:
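>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1
>>> x
array([ 0, 11, 20, 31, 40])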



Where people expect that the 1st location will be incremented by 3. In fact, it will only be incremented by 1. The reason is because a new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1 is added to the temporary, and then the temporary is assigned back to the original array. Thus the value of the array at x[1]+1 is assigned to x[1] three times, rather than being incremented 3 times.

Dealing with variable numbers of indices within programs


The index syntax is very powerful but limiting when dealing with a variable number of indices. For example, if you want to write a function that can handle arguments with various numbers of dimensions without having to write special case code for each number of possible dimensions, how can that be done? If one supplies to the index a tuple, the tuple will be interpreted as a list of indices. For example (using the previous definition for the array z):
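>>> indices = (1, 1, 1, 1)
>>> z[indices]
40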

So one can use code to construct tuples of any number of indices and then use these within an index.

Slices can be specified within programs by using the slice() function in Python. For example:
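>>> indices = (1, 1, 1, slice(0, 2))  # same as [1, 1, 1, 0:2]
>>> z[indices]
array([39, 40])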



Likewise, ellipsis can be specified by code by using the Ellipsis object:
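>>> indices = (1, Ellipsis, 1)  # same as [1, ..., 1]
>>> z[indices]
array([[28, 31, 34],
       [37, 40, 43],
       [46, 49, 52]])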



For this reason it is possible to use the output from the np.nonzero() function directly as an index since it always returns a tuple of index arrays.

Because of the special treatment of tuples, they are not automatically converted to an array as a list would be. As an example:
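(Output truncated for the first call.)

>>> z[[1, 1, 1, 1]]   # produces a large array: z[1] repeated four times
array([[[[27, 28, 29],
         [30, 31, 32],
         ...
>>> z[(1, 1, 1, 1)]   # returns a single value
40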


5. Broadcasting


See also:
numpy.broadcast
Array Broadcasting in Numpy
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2.,  4.,  6.])

NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2.,  4.,  6.])

The result is equivalent to the previous example where b was an array. We can think of the scalar b being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. The stretching analogy is only conceptual. NumPy is smart enough to use the original scalar value without actually making copies so that broadcasting operations are as memory and computationally efficient as possible.

The code in the second example is more efficient than that in the first because broadcasting moves less memory around during the multiplication (b is a scalar rather than an array).

General Broadcasting Rules


When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions and works its way forward. Two dimensions are compatible when
  1. they are equal, or
  2. one of them is 1.
If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:
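Aligned from the trailing axis:

Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3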



When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

In the following example, both the A and B arrays have axes with length one that are expanded to a larger size during the broadcast operation:
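A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5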


Here are some more examples:
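A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5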


Here are examples of shapes that do not broadcast:
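A      (1d array):  3
B      (1d array):  4            # trailing dimensions do not match

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3    # second-from-last dimensions mismatch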


An example of broadcasting in practice:

>>> x = np.arange(4)
>>> xx = x.reshape(4,1)
>>> y = np.ones(5)
>>> z = np.ones((3,4))

>>> x.shape
(4,)

>>> y.shape
(5,)

>>> x + y
ValueError: operands could not be broadcast together with shapes (4,) (5,)

>>> xx.shape
(4, 1)

>>> y.shape
(5,)

>>> (xx + y).shape
(4, 5)

>>> xx + y
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.]])

>>> x.shape
(4,)

>>> z.shape
(3, 4)

>>> (x + z).shape
(3, 4)

>>> x + z
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]])

Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The following example shows an outer addition operation of two 1-d arrays:
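>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[  1.,   2.,   3.],
       [ 11.,  12.,  13.],
       [ 21.,  22.,  23.],
       [ 31.,  32.,  33.]])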



Here the newaxis index operator inserts a new axis into a, making it a two-dimensional 4x1 array. Combining the 4x1 array with b, which has shape (3,), yields a 4x3 array.

6. Byte-swapping


Introduction to byte ordering and ndarrays


The ndarray is an object that provides a Python array interface to data in memory.

It often happens that the memory that you want to view with an array is not of the same byte ordering as the computer on which you are running Python.

For example, I might be working on a computer with a little-endian CPU - such as an Intel Pentium, but I have loaded some data from a file written by a computer that is big-endian. Let’s say I have loaded 4 bytes from a file written by a Sun (big-endian) computer. I know that these 4 bytes represent two 16-bit integers. On a big-endian machine, a two-byte integer is stored with the Most Significant Byte (MSB) first, and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:
  1. MSB integer 1
  2. LSB integer 1
  3. MSB integer 2
  4. LSB integer 2
Let’s say the two integers were in fact 1 and 770. Because 770 = 256 * 3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2. The bytes I have loaded from the file would have these contents:
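One way to stage those bytes in memory is a bytearray:

>>> big_end_buffer = bytearray([0, 1, 3, 2])
>>> big_end_buffer
bytearray(b'\x00\x01\x03\x02')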



We might want to use an ndarray to access these integers. In that case, we can create an array around this memory, and tell numpy that there are two integers, and that they are 16 bit and big-endian:
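Scalar reprs are shown in the classic style; NumPy 2.x prints them as np.int16(1) and so on:

>>> big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_buffer)
>>> big_end_arr[0]
1
>>> big_end_arr[1]
770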


Note the array dtype above of >i2. The > means ‘big-endian’ (< is little-endian) and i2 means ‘signed 2-byte integer’. For example, if our data represented a single unsigned 4-byte little-endian integer, the dtype string would be '<u4'.

In fact, why don’t we try that?
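Reading the same four bytes as one little-endian unsigned 4-byte integer:

>>> little_end_u4 = np.ndarray(shape=(1,), dtype='<u4', buffer=big_end_buffer)
>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
True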


Returning to our big_end_arr - in this case our underlying data is big-endian (data endianness) and we’ve set the dtype to match (the dtype is also big-endian). However, sometimes you need to flip these around.

Changing byte ordering


As you can imagine from the introduction, there are two ways you can affect the relationship between the byte ordering of the array and the underlying memory it is looking at:

  • Change the byte-ordering information in the array dtype so that it interprets the underlying data as being in a different byte order. This is the role of arr.newbyteorder().
  • Change the byte-ordering of the underlying data, leaving the dtype interpretation as it was. This is what arr.byteswap() does.
The common situations in which you need to change byte ordering are:

  • Your data and dtype endianness don’t match, and you want to change the dtype so that it matches the data.
  • Your data and dtype endianness don’t match, and you want to swap the data so that they match the dtype.
  • Your data and dtype endianness match, but you want the data swapped and the dtype to reflect this.

Data and dtype endianness don’t match, change dtype to match data


We make something where they don’t match:
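Interpreting the big-endian bytes with a little-endian dtype:

>>> wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype='<i2', buffer=big_end_buffer)
>>> wrong_end_dtype_arr[0]   # the bytes 0, 1 read little-endian
256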


The obvious fix for this situation is to change the dtype so it gives the correct endianness:
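(arr.newbyteorder() was removed in NumPy 2.0; there, use arr.view(arr.dtype.newbyteorder()) instead.)

>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder()
>>> fixed_end_dtype_arr[0]
1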


Note the array has not changed in memory:
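>>> fixed_end_dtype_arr.tobytes() == big_end_buffer
True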


Data and type endianness don’t match, change data to match dtype


You might want to do this if you need the data in memory to be a certain ordering. For example you might be writing the memory out to a file that needs a certain byte ordering.
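Swapping the underlying bytes while keeping the (little-endian) dtype:

>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
>>> fixed_end_mem_arr[0]
1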


Now the array has changed in memory:
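>>> fixed_end_mem_arr.tobytes() == big_end_buffer
False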


Data and dtype endianness match, swap data and dtype


You may have a correctly specified array dtype, but you need the array to have the opposite byte order in memory, and you want the dtype to match so the array values make sense. In this case you just do both of the previous operations:
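Again using the pre-2.0 newbyteorder() method noted above:

>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder()
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False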


An easier way of casting the data to a specific dtype and byte ordering can be achieved with the ndarray astype method:
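>>> swapped_end_arr = big_end_arr.astype('<i2')
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False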


7. Structured arrays


Introduction


Structured Datatypes


Structured Datatype Creation


Manipulating and Displaying Structured Datatypes


Automatic Byte Offsets and Alignment


Field Title


Union types


Indexing and Assignment to Structured arrays


Assigning data to a Structured Array


Assignment from Python Native Types (Tuples)


Assignment from Scalars


Assignment from other Structured Arrays


Assignment involving subarrays


Indexing Structured Arrays


Accessing Individual Fields


Accessing Multiple Fields


Indexing with an Integer to get a Structured Scalar


Viewing Structured Arrays Containing Objects


Structure Comparison


Record Arrays


Recarray Helper Functions


8. Writing custom array containers


9. Subclassing ndarray


Introduction


ndarrays and object creation


View casting


Creating new from template


Relationship of view casting and new-from-template


Implications for subclassing


A brief Python primer on __new__ and __init__


The role of __array_finalize__


Simple example - adding an extra attribute to ndarray


Slightly more realistic example - attribute added to existing array


__array_ufunc__ for ufuncs


__array_wrap__ for ufuncs and other functions


Extra gotchas - custom __del__ methods and ndarray.base


Subclassing and Downstream Compatibility

