05 API reference

Version queries

This module contains information about the library version.

reikna.version.version

A tuple with version numbers, major components first.

reikna.version.full_version

A string fully identifying the current build.

reikna.version.git_revision

A string with Git SHA identifying the revision used to create this build.

reikna.version.release

A boolean variable, equals True if current version is a release version.

Helpers

This module contains various auxiliary functions which are used throughout the library.

reikna.helpers.bounding_power_of_2(num)

Returns the smallest number of the form 2**m that is greater than or equal to num.

reikna.helpers.default_strides(shape, itemsize)

Return the default strides (corresponding to a contiguous array) for an array of shape shape and elements of size itemsize bytes.
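A minimal pure-Python sketch of what "default strides" means, assuming row-major (C-contiguous) layout. The function below is an illustrative stand-in written for this reference, not reikna's implementation, and the tuple return type is an assumption.

```python
def default_strides(shape, itemsize):
    """Byte strides of a C-contiguous array (illustrative stand-in)."""
    strides = []
    step = itemsize
    # Walk from the innermost (last) axis outwards: each axis steps over
    # everything that lies inside it.
    for length in reversed(shape):
        strides.append(step)
        step *= length
    return tuple(reversed(strides))


strides_3d = default_strides((2, 3, 4), 4)
```

For a (2, 3, 4) array of 4-byte elements, the innermost axis steps by one element and each outer axis steps by the full size of the axes inside it.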

reikna.helpers.factors(num, limit=None)

Returns the list of pairs (factor, num/factor) for all factors of num (including 1 and num), sorted by factor. If limit is set, only pairs with factor <= limit are returned.
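The behavior described above can be sketched in a few lines. This is a naive illustrative version (a linear scan), not reikna's code; only the described output format is taken from the text.

```python
def factors(num, limit=None):
    """List of (factor, num // factor) pairs, sorted by factor (illustrative)."""
    if limit is None or limit > num:
        limit = num
    pairs = []
    for factor in range(1, limit + 1):
        if num % factor == 0:
            pairs.append((factor, num // factor))
    return pairs


all_pairs = factors(12)
small_pairs = factors(12, limit=4)
```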

class reikna.helpers.ignore_integer_overflow

Context manager for ignoring integer overflow in numpy operations on scalars (not ignored by default because of a bug in numpy).

reikna.helpers.log2(num)

Integer-valued logarithm with base 2. If num is not a power of 2, the result is rounded down.
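The two power-of-2 helpers, log2() and bounding_power_of_2(), can be sketched with integer bit operations. This is an illustrative re-implementation that assumes positive integer input and that log2() rounds down; it is not taken from the library source.

```python
def log2(num):
    """Integer base-2 logarithm, rounded down (illustrative)."""
    if num < 1:
        raise ValueError("num must be a positive integer")
    return num.bit_length() - 1


def bounding_power_of_2(num):
    """Smallest power of two that is >= num (illustrative)."""
    if num < 1:
        raise ValueError("num must be a positive integer")
    return 1 << (num - 1).bit_length()
```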

reikna.helpers.make_axes_innermost(ndim, axes)

Given the total number of array axes and a list of axes in this range, produce a transposition plan (suitable e.g. for numpy.transpose()) that will make the given axes innermost (in the order they’re given). Returns the transposition plan, and the plan to transpose the resulting array back to the original axes order.
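A sketch of how such a pair of plans could be built, assuming "innermost" means "last" as in row-major order. The function is an illustrative stand-in, and the plans are applied here to a plain shape tuple rather than to a real array.

```python
def make_axes_innermost(ndim, axes):
    """Return (plan, inverse_plan) moving `axes` to the end (illustrative)."""
    axes = tuple(axes)
    outer = tuple(a for a in range(ndim) if a not in axes)
    plan = outer + axes
    # The inverse plan sends every original axis back to its old position.
    inverse = tuple(plan.index(a) for a in range(ndim))
    return plan, inverse


plan, inverse = make_axes_innermost(4, (2, 0))

# Apply the plans to a shape tuple the way numpy.transpose() permutes axes.
shape = (10, 20, 30, 40)
moved = tuple(shape[a] for a in plan)
restored = tuple(moved[a] for a in inverse)
```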

reikna.helpers.min_blocks(length, block)

Returns the minimum number of blocks of length block needed to cover an array of length length.
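This is a ceiling division. A one-line illustrative equivalent, assuming positive block and non-negative length:

```python
def min_blocks(length, block):
    """Ceiling of length / block using integer arithmetic (illustrative)."""
    return (length + block - 1) // block
```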

reikna.helpers.min_buffer_size(shape, itemsize, strides=None, offset=0)

Return the minimum memory buffer size (in bytes) that can fit an array with given parameters, starting at an offset bytes from the beginning of the buffer.
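One way to compute such a size is to find the byte address just past the last element and add the offset. The sketch below assumes non-negative strides and a non-empty array; it is an illustration of the idea, not the library's code.

```python
def min_buffer_size(shape, itemsize, strides=None, offset=0):
    """Smallest buffer (in bytes) holding the array at `offset` (illustrative)."""
    if strides is None:
        # Default to C-contiguous strides.
        strides, step = [], itemsize
        for length in reversed(shape):
            strides.append(step)
            step *= length
        strides = tuple(reversed(strides))
    # Start of the last element, relative to the start of the array data.
    last_element = sum(stride * (length - 1) for stride, length in zip(strides, shape))
    return offset + last_element + itemsize
```

With strides larger than the contiguous ones (for example, rows padded to 32 bytes), the required size grows even though the number of elements stays the same.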

reikna.helpers.normalize_axes(ndim, axes)

Transform an iterable of array axes (which can be negative) or a single axis into a tuple of non-negative axes.
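A minimal sketch of this normalization, assuming negative axes count from the end as in numpy and that out-of-range axes are an error (the error type is an assumption):

```python
def normalize_axes(ndim, axes):
    """Turn one axis or an iterable of axes into non-negative axes (illustrative)."""
    if isinstance(axes, int):
        axes = (axes,)
    result = []
    for axis in axes:
        if not -ndim <= axis < ndim:
            raise ValueError("axis %d is out of range for %d dimensions" % (axis, ndim))
        result.append(axis % ndim)
    return tuple(result)
```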

reikna.helpers.padded_buffer_parameters(shape, itemsize, pad=0)

For an array of shape shape, padded on all sides with pad elements, return a tuple of (strides, offset, size (in bytes) of the required memory buffer), which would have to be requested when allocating such an array.
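The three returned values can be illustrated as follows, assuming that pad is a single integer applied to both ends of every axis and that the padded array is C-contiguous. This is a sketch of the idea, not the library's implementation.

```python
def padded_buffer_parameters(shape, itemsize, pad=0):
    """Return (strides, offset, size) for a padded array (illustrative)."""
    padded_shape = [length + 2 * pad for length in shape]
    strides = []
    size = itemsize
    for length in reversed(padded_shape):
        strides.append(size)
        size *= length
    strides = tuple(reversed(strides))
    # The data starts `pad` elements into every axis of the padded array.
    offset = sum(pad * stride for stride in strides)
    return strides, offset, size


strides, offset, size = padded_buffer_parameters((2, 3), 4, pad=1)
```

Here the padded array has shape (4, 5), so the unpadded data begins one row and one element into the buffer.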

reikna.helpers.product(seq)

Returns the product of elements in the iterable seq.

reikna.helpers.template_def(signature, code)

Returns a Mako template with the given signature.

Parameters:
  • signature – a list of positional argument names, or a Signature object from the funcsigs module.
  • code – the body of the template.

reikna.helpers.template_for(filename)

Returns the Mako template object created from the file which has the same name as filename and the extension .mako. Typically used in computation modules as template_for(__file__).

reikna.helpers.template_from(template)

Creates a Mako template object from a given string. If template already has a render() method, does nothing.

reikna.helpers.wrap_in_tuple(seq_or_elem)

If seq_or_elem is a sequence, converts it to a tuple, otherwise returns a tuple with a single element seq_or_elem.
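A sketch of this behavior. Treating a string as a single element rather than a sequence of characters is an assumption made here for illustration.

```python
from collections.abc import Iterable


def wrap_in_tuple(seq_or_elem):
    """Convert a sequence to a tuple, or wrap a single element (illustrative)."""
    if isinstance(seq_or_elem, str):
        # Assumption: a string counts as one element, not as a sequence.
        return (seq_or_elem,)
    if isinstance(seq_or_elem, Iterable):
        return tuple(seq_or_elem)
    return (seq_or_elem,)
```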

CLUDA layer

CLUDA is the foundation of reikna. It provides the unified access to basic features of CUDA and OpenCL, such as memory operations, compilation and so on. It can also be used by itself, if you want to write GPU API-independent programs and happen to only need a small subset of GPU API. The terminology is borrowed from OpenCL, since it is a more general API.

class reikna.cluda.Module(template_src, render_kwds=None)

Contains a CLUDA module. See Tutorial: modules and snippets for details.

Parameters:
  • template_src (str or Mako template.) – a Mako template with the module code, or a string with the template source.
  • render_kwds – a dictionary which will be used to render the template. Can contain other modules and snippets.

classmethod create(func_or_str, render_kwds=None)

Creates a module from the Mako def:

  • if func_or_str is a function, then the def has the same signature as func_or_str (prefix will be passed as the first positional parameter), and the body equal to the string it returns;
  • if func_or_str is a string, then the def has a single positional argument prefix, and the body equal to that string.

exception reikna.cluda.OutOfResourcesError

Thrown by compile_static() if the provided local_size is too big, or one cannot be found.

class reikna.cluda.Snippet(template_src, render_kwds=None)

Contains a CLUDA snippet. See Tutorial: modules and snippets for details.

Parameters:
  • template_src (str or Mako template.) – a Mako template with the snippet code, or a string with the template source.
  • render_kwds – a dictionary which will be used to render the template. Can contain other modules and snippets.

classmethod create(func_or_str, render_kwds=None)

Creates a snippet from the Mako def:

  • if func_or_str is a function, then the def has the same signature as func_or_str, and the body equal to the string it returns;
  • if func_or_str is a string, then the def has empty signature.

reikna.cluda.any_api()

Returns one of the API modules supported by the system or raises an Exception if there are not any.

reikna.cluda.api_ids()

Returns a list of identifiers for all known (not necessarily available for the current system) APIs.

reikna.cluda.cuda_api()

Returns the PyCUDA-based API module.

reikna.cluda.cuda_id()

Returns the identifier of the PyCUDA-based API.

reikna.cluda.find_devices(api, include_devices=None, exclude_devices=None, include_platforms=None, exclude_platforms=None, include_duplicate_devices=True, include_pure_only=False)

Find platforms and devices meeting certain criteria.

Parameters:
  • api – a CLUDA API object.
  • include_devices – a list of masks for a device name which will be used to pick devices to include in the result.
  • exclude_devices – a list of masks for a device name which will be used to pick devices to exclude from the result.
  • include_platforms – a list of masks for a platform name which will be used to pick platforms to include in the result.
  • exclude_platforms – a list of masks for a platform name which will be used to pick platforms to exclude from the result.
  • include_duplicate_devices – if False, will only include a single device from the several with the same name available on a platform.
  • include_pure_only – if True, will include devices with maximum group size equal to 1.
Returns: a dictionary with found platform numbers as keys, and lists of device numbers as values.

reikna.cluda.get_api(api_id)

Returns an API module with the generalized interface reikna.cluda.api for the given identifier.

reikna.cluda.ocl_api()

Returns the PyOpenCL-based API module.

reikna.cluda.ocl_id()

Returns the identifier of the PyOpenCL-based API.

reikna.cluda.supported_api_ids()

Returns a list of identifiers of supported APIs.

reikna.cluda.supports_api(api_id)

Returns True if given API is supported.

API module

Modules for all APIs have the same generalized interface. It is referred to here (and referenced from other parts of this documentation) as reikna.cluda.api.

class reikna.cluda.api.Buffer

Low-level untyped memory allocation. Actual class depends on the API: pycuda.driver.DeviceAllocation for CUDA and pyopencl.Buffer for OpenCL.

size

class reikna.cluda.api.Array

A subclass of the corresponding API’s native array (pycuda.gpuarray.GPUArray for CUDA and pyopencl.array.Array for OpenCL), with some additional functionality.

shape

dtype

strides

offset

The start of the array data in the memory buffer (in bytes).

base_data

The memory buffer where the array is located.

nbytes

The total size of the array data plus the offset (in bytes).

get()

Returns numpy.ndarray with the contents of the array. Synchronizes the parent Thread.

thread

The Thread object for which the array was created.

class reikna.cluda.api.DeviceParameters(device)

An assembly of device parameters necessary for optimizations.

api_id

Identifier of the API this device belongs to.

max_work_group_size

Maximum block size for kernels.

max_work_item_sizes

List with maximum local_size for each dimension.

max_num_groups

List with maximum number of workgroups for each dimension.

warp_size

Warp size (NVIDIA), wavefront size (AMD), or SIMD width: the number of threads that are executed simultaneously on the same computation unit (so you can assume that they are perfectly synchronized).

local_mem_banks

The number of local (shared in CUDA) memory banks, that is, the number of successive 32-bit words you can access without getting bank conflicts.

local_mem_size

Size of the local (shared in CUDA) memory per workgroup, in bytes.

min_mem_coalesce_width

Dictionary {word_size:elements}, where elements is the number of elements with size word_size in global memory that allow coalesced access.

compute_units

The value of MULTIPROCESSOR_COUNT in CUDA and MAX_COMPUTE_UNITS in OpenCL.

supports_dtype(self, dtype)

Checks if given numpy dtype can be used in kernels compiled using this thread.

class reikna.cluda.api.Platform

A vendor-specific implementation of the GPGPU API.

name

Platform name.

vendor

Vendor name.

version

Platform version.

get_devices()

Returns a list of device objects available in the platform.

class reikna.cluda.api.Kernel(thr, program, name, static=False)

An object containing GPU kernel.

max_work_group_size

Maximum size of the work group for the kernel.

__call__(*args, **kwds)

A shortcut for successive calls to prepare() and prepared_call(). In case of the OpenCL backend, returns a pyopencl.Event object.

prepare(global_size, local_size=None, local_mem=0)

Prepare the kernel for execution with given parameters.

Parameters:
  • global_size – an integer or a tuple of integers, specifying total number of work items to run.
  • local_size – an integer or a tuple of integers, specifying the size of a single work group. Should have the same number of dimensions as global_size. If None is passed, some local_size will be picked internally.
  • local_mem – (CUDA API only) amount of dynamic local memory (in bytes)

prepared_call(*args)

Execute the kernel. Array objects are allowed as arguments. In case of the OpenCL backend, returns a pyopencl.Event object.

set_constant(name, arr)

Load a constant array (arr can be either a numpy array or an Array object) corresponding to the symbol name to the device. Note that all the kernels belonging to the same Program object share constant arrays.

class reikna.cluda.api.Program(thr, src, static=False, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)

An object with compiled GPU code.

source

Contains module source code.

kernel_name

Contains Kernel object for the kernel kernel_name.

set_constant(name, arr)

Load a constant array (arr can be either a numpy array or an Array object) corresponding to the symbol name to the device.

class reikna.cluda.api.StaticKernel(thr, template_src, name, global_size, local_size=None, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)

An object containing a GPU kernel with fixed call sizes.

source

Contains the source code of the program.

__call__(*args)

Execute the kernel. Array objects are allowed as arguments. In case of the OpenCL backend, returns a pyopencl.Event object.

set_constant(name, arr)

Load a constant array (arr can be either a numpy array or an Array object) corresponding to the symbol name to the device.

class reikna.cluda.api.Thread(cqd, async_=True, temp_alloc=None)

Wraps an existing context in the CLUDA thread object.

Parameters:
  • cqd – a Context, Device or Stream/CommandQueue object to base on. If a context is passed, a new stream/queue will be created internally.
  • async_ – whether to execute all operations with this thread asynchronously (you would generally want to set it to False only for profiling purposes).

Note

If you are using the CUDA API, you must keep in mind the stateful nature of CUDA calls. Briefly, this means that there is a context stack, with the current context on top of it. When create() is called, the PyCUDA context gets pushed to the stack and made current. When the thread object goes out of scope (and the thread object owns the context), the context is popped, and it is the user’s responsibility to make sure the popped context is the correct one. In simple single-context programs this only means that one should avoid reference cycles involving the thread object.

Warning

Do not pass one Stream/CommandQueue object to several Thread objects.

api

Module object representing the CLUDA API corresponding to this Thread.

device_params

Instance of DeviceParameters class for this thread’s device.

temp_alloc

Instance of TemporaryManager which handles allocations of temporary arrays (see temp_array()).

allocate(size)

Creates an untyped memory allocation object of type Buffer with size size.

array(shape, dtype, strides=None, offset=0, nbytes=None, base=None, base_data=None, allocator=None)

Creates an Array on GPU with given shape, dtype, strides and offset.

If nbytes is None, the size of the allocated memory buffer is chosen to be the minimum one to fit all the elements of the array, based on shape, dtype and strides (if provided). If offset is not 0, an additional offset bytes is added at the beginning of the buffer.

Note

Reikna computations (including the template functions load_idx(), store_idx() etc), high-level PyCUDA/PyOpenCL functions and PyCUDA kernels take offset into account automatically and address arrays starting from the position of the actual data. Reikna kernels (created with compile() and compile_static()) and PyOpenCL kernels receive base addresses of arrays, and thus have to add offsets manually.

If base, base_data and nbytes are None, the total allocated size will be the minimum size required for the array data (based on shape and strides) plus offset.

If base and base_data are None, but nbytes is not, nbytes bytes will be allocated for the array (this includes the offset).

If base_data (a memory buffer) is not None, it will be used as the underlying buffer for the array, with the actual data starting at the offset bytes from the beginning of base_data. No size checking to make sure the array and the offset fit it will be performed.

If base (an Array object) is not None, its buffer is used as the underlying buffer for the array, with the actual data starting at the offset bytes from the beginning of base.base_data. base_data will be ignored.

Optionally, an allocator is a callable returning any object castable to int representing the physical address on the device (for instance, Buffer).
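The sizing rules for a fresh allocation (the cases where neither base nor base_data is given) can be summarized in a small sketch. The helper name allocated_size is hypothetical and exists only for this illustration; it assumes non-negative strides and C-contiguous defaults.

```python
def allocated_size(shape, itemsize, strides=None, offset=0, nbytes=None):
    """Bytes allocated when neither base nor base_data is given (hypothetical helper)."""
    if nbytes is not None:
        # An explicit nbytes already includes the offset.
        return nbytes
    if strides is None:
        strides, step = [], itemsize
        for length in reversed(shape):
            strides.append(step)
            step *= length
        strides = tuple(reversed(strides))
    data_size = sum(s * (n - 1) for s, n in zip(strides, shape)) + itemsize
    # Without nbytes: minimum size of the data plus the leading offset.
    return data_size + offset
```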

compile(template_src, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)

Creates a module object from the given template.

Parameters:
  • template_src – Mako template source to render
  • render_args – an iterable with positional arguments to pass to the template.
  • render_kwds – a dictionary with keyword parameters to pass to the template.
  • fast_math – whether to enable fast mathematical operations during compilation.
  • compiler_options – a list of strings to be passed to the compiler as arguments.
  • constant_arrays – (CUDA only) a dictionary {name: metadata} of constant memory arrays to be declared in the compiled program. metadata can be either an array-like object (possessing shape and dtype attributes), or a pair (shape, dtype).
  • keep – if True, preserve the source file being compiled and the accompanying binaries (if any). With PyCUDA backend, it is used as the keep option when creating SourceModule. With PyOpenCL backend, it is used as the cache_dir option for Program.build() (and, additionally, the kernel source itself is put there).
Returns: a Program object.

compile_static(template_src, name, global_size, local_size=None, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)

Creates a kernel object with fixed call sizes, which allows one to overcome some backend limitations. Global and local sizes can have any length, provided that len(global_size) >= len(local_size), and the total number of work items and work groups is less than the corresponding total number available for the device. In order to get IDs and sizes in such kernels, virtual size functions have to be used (see VIRTUAL_SKIP_THREADS and others for details).

Parameters:
  • template_src – Mako template or a template source to render
  • name – name of the kernel function
  • global_size – global size to be used, in row-major order.
  • local_size – local size to be used, in row-major order. If None, some suitable one will be picked.
  • local_mem – (CUDA API only) amount of dynamically allocated local memory to be used (in bytes).

The rest of the keyword parameters are the same as for compile().

Returns: a StaticKernel object.

copy_array(arr, dest=None, src_offset=0, dest_offset=0, size=None)

Copies array on device.

Parameters:
  • dest – the effect is the same as in to_device().
  • src_offset – offset (in items of arr.dtype) in the source array.
  • dest_offset – offset (in items of arr.dtype) in the destination array.
  • size – how many elements of arr.dtype to copy.

classmethod create(interactive=False, device_filters=None, **thread_kwds)

Creates a new Thread object with its own context and queue inside. Intended for cases when you want to base your whole program on CLUDA.

Parameters:
  • interactive – ask a user to choose a platform and a device from the ones found. If there is only one platform/device available, they will be chosen automatically.
  • device_filters – keywords to filter devices (see the keywords for find_devices()).
  • thread_kwds – keywords to pass to Thread constructor.

empty_like(arr)

Allocates an array on GPU with the same attributes (shape, dtype, strides, offset and nbytes) as arr.

Warning

Note that pycuda.GPUArray objects do not have the offset attribute.

from_device(arr, dest=None, async_=False)

Transfers the contents of arr to a numpy.ndarray object. The effect of the dest parameter is the same as in to_device(). If async_ is True, the transfer is asynchronous (the thread-wide asynchronicity setting does not apply here).

Alternatively, one can use Array.get().

release()

Forcefully free critical resources (rendering the object unusable). In most cases you can rely on the garbage collector taking care of things. Calling this method explicitly may be necessary in case of CUDA API when you want to make sure the context got popped.

synchronize()

Forcefully synchronize this thread with the main program.

temp_array(shape, dtype, strides=None, offset=0, nbytes=None, dependencies=None)

Creates an Array on GPU with given shape, dtype, strides, offset and nbytes (see array() for details). In order to reduce the memory footprint of the program, the temporary array manager will allow these arrays to overlap. Two arrays will not overlap, if one of them was specified in dependencies for the other one. For a list of values dependencies takes, see the reference entry for TemporaryManager.

to_device(arr, dest=None)

Copies an array to the device memory. If dest is specified, it is used as the destination, and the method returns None. Otherwise the destination array is created internally and returned from the method.

reikna.cluda.api.get_id()

Returns the identifier of this API.

reikna.cluda.api.get_platforms()

Returns a list of available Platform objects. In case of OpenCL returned objects are actually instances of pyopencl.Platform.

Temporary Arrays

Each Thread contains a special allocator for arrays with data that does not have to be persistent all the time. In many cases you only want some array to keep its contents between several kernel calls. This can be achieved by manually allocating and deallocating such arrays every time, but it slows the program down, and you have to synchronize the queue because allocation commands are not serialized. Therefore it is advantageous to use temp_array() method to get such arrays. It takes a list of dependencies as an optional parameter which gives the allocator a hint about which arrays should not use the same physical allocation.

class reikna.cluda.tempalloc.TemporaryManager(thr, pack_on_alloc=False, pack_on_free=False)

Base class for a manager of temporary allocations.

Parameters:
  • thr – an instance of Thread.
  • pack_on_alloc – whether to repack allocations when a new allocation is requested.
  • pack_on_free – whether to repack allocations when an allocation is freed.

array(shape, dtype, strides=None, offset=0, nbytes=None, dependencies=None)

Returns a temporary array.

Parameters:
  • shape – shape of the array.
  • dtype – data type of the array.
  • strides – tuple of bytes to step in each dimension when traversing an array.
  • offset – the array offset (in bytes)
  • nbytes – the buffer size for the array (if None, the minimum required size will be used).
  • dependencies – can be an Array instance (the ones containing persistent allocations will be ignored), an iterable with valid values, or an object with the attribute __tempalloc__ which is a valid value (the last two will be processed recursively).

pack()

Packs the real allocations possibly reducing total memory usage. This process can be slow.

class reikna.cluda.tempalloc.TrivialManager(*args, **kwds)

Trivial manager — allocates a separate buffer for each allocation request.

class reikna.cluda.tempalloc.ZeroOffsetManager(*args, **kwds)

Tries to assign several allocation requests to a single real allocation, if dependencies allow that. All virtual allocations start from the beginning of real allocations.
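The idea of sharing one real allocation between several virtual ones can be shown with a toy model. The class below is hypothetical and greatly simplified (no freeing, no packing, sizes only); it only mirrors the stated rule that two allocations must not share memory if one lists the other as a dependency, and that every virtual allocation starts at the beginning of its real buffer.

```python
class ToyZeroOffsetManager:
    """Toy model of dependency-aware buffer sharing (hypothetical)."""

    def __init__(self):
        self.real_sizes = []  # size in bytes of every real buffer
        self.tenants = []     # ids of the virtual allocations in each real buffer
        self.conflicts = {}   # id -> ids it must not share memory with

    def allocate(self, size, dependencies=()):
        new_id = len(self.conflicts)
        self.conflicts[new_id] = set(dependencies)
        for dep in dependencies:
            self.conflicts[dep].add(new_id)  # the relation is symmetric
        # Reuse the first real buffer that holds no conflicting allocation.
        for index, ids in enumerate(self.tenants):
            if not ids & self.conflicts[new_id]:
                ids.add(new_id)
                self.real_sizes[index] = max(self.real_sizes[index], size)
                return new_id, index
        # Otherwise create a new real buffer.
        self.tenants.append({new_id})
        self.real_sizes.append(size)
        return new_id, len(self.tenants) - 1


manager = ToyZeroOffsetManager()
a, buf_a = manager.allocate(100)
b, buf_b = manager.allocate(200, dependencies=[a])  # must not overlap with a
c, buf_c = manager.allocate(50)                     # free to share with a
total_real = sum(manager.real_sizes)
```

In this run 350 bytes are requested but only 300 bytes of real memory are needed, because the third allocation reuses the buffer of the first.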

Function modules

This module contains Module factories which are used to compensate for the lack of complex number operations in OpenCL, and the lack of C++ syntax which would allow one to write them.

reikna.cluda.functions.add(*in_dtypes, out_dtype=None)

Returns a Module with a function of len(in_dtypes) arguments that adds values of types in_dtypes. If out_dtype is given, it will be set as a return type for this function.

This is necessary since on some platforms the + operator for a complex and a real number works in an unexpected way (returning (a.x + b, a.y + b) instead of (a.x + b, a.y)).
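The difference between the two results can be made concrete by modelling a complex value as an (x, y) pair. Both function names below are hypothetical and exist only to contrast the two behaviors; they are not part of the library.

```python
def componentwise_add(a, b):
    """What a vector-style + does: the real number is added to both components."""
    return (a[0] + b, a[1] + b)


def complex_real_add(a, b):
    """The mathematically expected result: only the real part changes."""
    return (a[0] + b, a[1])


z = (1.0, 2.0)  # stands for 1 + 2i
unexpected = componentwise_add(z, 3.0)
expected = complex_real_add(z, 3.0)
```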

reikna.cluda.functions.cast(out_dtype, in_dtype)

Returns a Module with a function of one argument that casts values of in_dtype to out_dtype.

reikna.cluda.functions.conj(dtype)

Returns a Module with a function of one argument that conjugates the value of type dtype (must be a complex data type).

reikna.cluda.functions.div(in_dtype1, in_dtype2, out_dtype=None)

Returns a Module with a function of two arguments that divides values of in_dtype1 and in_dtype2. If out_dtype is given, it will be set as a return type for this function.

reikna.cluda.functions.exp(dtype)

Returns a Module with a function of one argument that exponentiates the value of type dtype (must be a real or complex data type).

reikna.cluda.functions.mul(*in_dtypes, out_dtype=None)

Returns a Module with a function of len(in_dtypes) arguments that multiplies values of types in_dtypes. If out_dtype is given, it will be set as a return type for this function.

reikna.cluda.functions.norm(dtype)

Returns a Module with a function of one argument that returns the squared 2-norm of the value of type dtype (product by the complex conjugate if the value is complex, square otherwise).

reikna.cluda.functions.polar(dtype)

Returns a Module with a function of two arguments that returns the complex-valued rho * exp(i * theta) for values rho, theta of type dtype (must be a real data type).

reikna.cluda.functions.polar_unit(dtype)

Returns a Module with a function of one argument that returns a complex number (cos(theta), sin(theta)) for a value theta of type dtype (must be a real data type).
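The mathematical meaning of norm, polar and polar_unit can be checked on the host with Python's complex numbers. These host-side functions are only illustrations of the formulas quoted above; the actual factories return Module objects with device functions.

```python
import cmath
import math


def norm(value):
    """Product by the conjugate for complex values, square otherwise."""
    if isinstance(value, complex):
        return (value * value.conjugate()).real
    return value * value


def polar_unit(theta):
    """The complex number (cos(theta), sin(theta))."""
    return complex(math.cos(theta), math.sin(theta))


def polar(rho, theta):
    """The complex number rho * exp(i * theta)."""
    return rho * polar_unit(theta)
```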

reikna.cluda.functions.pow(dtype, exponent_dtype=None, output_dtype=None)

Returns a Module with a function of two arguments that raises the first argument of type dtype to the power of the second argument of type exponent_dtype (an integer or real data type). If exponent_dtype or output_dtype are not given, they default to dtype. If dtype is not the same as output_dtype, the input is cast to output_dtype before exponentiation. If exponent_dtype is real, but both dtype and output_dtype are integer, a ValueError is raised.

Kernel toolbox

The tools available to a kernel passed for compilation fall into two groups.

First, there are several objects available at the template rendering stage, namely numpy, reikna.cluda.dtypes (as dtypes), and reikna.helpers (as helpers).

Second, there is a set of macros attached to any kernel depending on the API it is being compiled for:

CUDA

If defined, specifies that the kernel is being compiled for CUDA API.

COMPILE_FAST_MATH

If defined, specifies that the compilation for this kernel was requested with fast_math == True.

LOCAL_BARRIER

Synchronizes threads inside a block.

WITHIN_KERNEL

Modifier for a device-only function declaration.

KERNEL

Modifier for a kernel function declaration.

GLOBAL_MEM

Modifier for a global memory pointer argument.

LOCAL_MEM

Modifier for a statically allocated local memory variable.

LOCAL_MEM_DYNAMIC

Modifier for a dynamically allocated local memory variable.

LOCAL_MEM_ARG

Modifier for a local memory argument in device-only functions.

CONSTANT_MEM

Modifier for a statically allocated constant memory variable.

CONSTANT_MEM_ARG

Modifier for a constant memory argument in device-only functions.

INLINE

Modifier for inline functions.

SIZE_T

The type of local/global IDs and sizes. Equal to unsigned int for CUDA, and size_t for OpenCL (which can be 32- or 64-bit unsigned integer, depending on the device).

SIZE_T get_local_id(int dim)

SIZE_T get_group_id(int dim)

SIZE_T get_global_id(int dim)

SIZE_T get_local_size(int dim)

SIZE_T get_num_groups(int dim)

SIZE_T get_global_size(int dim)

Local, group and global identifiers and sizes. In the case of CUDA, they mimic the behavior of the corresponding OpenCL functions.

VSIZE_T

The type of local/global IDs in the virtual grid. It is separate from SIZE_T because the former is intended to be equivalent to what the backend is using, while VSIZE_T is a separate type and can be made larger than SIZE_T in the future if necessary.

ALIGN(int)

Used to specify an explicit alignment (in bytes) for fields in structures, as

typedef struct {
    char ALIGN(4) a;
    int b;
} MY_STRUCT;

VIRTUAL_SKIP_THREADS

This macro should start any kernel compiled with compile_static(). It skips all the empty threads resulting from fitting call parameters into backend limitations.

VSIZE_T virtual_local_id(int dim)

VSIZE_T virtual_group_id(int dim)

VSIZE_T virtual_global_id(int dim)

VSIZE_T virtual_local_size(int dim)

VSIZE_T virtual_num_groups(int dim)

VSIZE_T virtual_global_size(int dim)

VSIZE_T virtual_global_flat_id()

VSIZE_T virtual_global_flat_size()

Only available in StaticKernel objects obtained from compile_static(). Since the virtual dimensions of such a kernel can differ from the actual call dimensions, these functions have to be used instead of the regular ID and size functions.

Datatype tools

This module contains various convenience functions which operate with numpy.dtype objects.

reikna.cluda.dtypes.align(dtype)

Returns a new struct dtype with the field offsets changed to the ones a compiler would use (without being given any explicit alignment qualifiers). Ignores all existing explicit itemsizes and offsets.
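The offsets a compiler typically chooses can be sketched without numpy. The helper below is hypothetical and assumes that every field is a primitive type whose alignment equals its size, which holds for common C scalar types on mainstream platforms but is not guaranteed in general.

```python
def aligned_offsets(field_sizes):
    """Return (offsets, itemsize) for primitive fields of the given byte sizes.

    Hypothetical helper; assumes alignment == size for every field.
    """
    offsets = []
    position = 0
    max_align = 1
    for size in field_sizes:
        align = size
        # Round the current position up to a multiple of the alignment.
        position = -(-position // align) * align
        offsets.append(position)
        position += size
        max_align = max(max_align, align)
    # The struct size is padded to a multiple of the largest alignment.
    itemsize = -(-position // max_align) * max_align
    return offsets, itemsize


# A struct with a 1-byte field followed by a 4-byte field.
offsets, itemsize = aligned_offsets([1, 4])
```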

reikna.cluda.dtypes.c_constant(val, dtype=None)

Returns a C-style numerical constant. If val has a struct dtype, the generated constant will have the form { ... } and can be used as an initializer for a variable.

reikna.cluda.dtypes.c_path(path)

Returns a string corresponding to the path to a struct element in C. The path is the sequence of field names/array indices returned from flatten_dtype().

reikna.cluda.dtypes.cast(dtype)

Returns function that takes one argument and casts it to dtype.

reikna.cluda.dtypes.complex_ctr(dtype)

Returns name of the constructor for the given dtype.

reikna.cluda.dtypes.complex_for(dtype)

Returns complex dtype corresponding to given floating point dtype.

reikna.cluda.dtypes.ctype(dtype)

For a built-in C type, returns a string with the name of the type.

reikna.cluda.dtypes.ctype_module(dtype, ignore_alignment=False)

For a struct type, returns a Module object with the typedef of a struct corresponding to the given dtype (with its name set to the module prefix); falls back to ctype() otherwise.

The structure definition includes the alignment required to produce field offsets specified in dtype; therefore, dtype must be either a simple type, or have proper offsets and dtypes (the ones that can be reproduced in C using explicit alignment attributes, but without additional padding) and the attribute isalignedstruct == True. An aligned dtype can be produced either by standard means (aligned flag in numpy.dtype constructor and explicit offsets and itemsizes), or created out of an arbitrary dtype with the help of align().

If ignore_alignment is True, all of the above is ignored. The C structures produced will not have any explicit alignment modifiers. As a result, the field offsets of dtype may differ from the ones chosen by the compiler.

Modules are cached and the function returns a single module instance for equal dtypes. Therefore inside a kernel it will be rendered with the same prefix everywhere it is used. This results in behavior characteristic of a structural type system, the same as for the basic dtype-ctype conversion.

Warning

As of numpy 1.8, the isalignedstruct attribute is not enough to ensure a mapping between a dtype and a C struct with only the fields that are present in the dtype. Therefore, ctype_module will make some additional checks and raise ValueError if it is not the case.

reikna.cluda.dtypes.detect_type(val)

Find out the data type of val.

reikna.cluda.dtypes.extract_field(arr, path)

Extracts an element from an array of struct dtype. The path is the sequence of field names/array indices returned from flatten_dtype().

reikna.cluda.dtypes.flatten_dtype(dtype)

Returns a list of tuples (path, dtype) for each of the basic dtypes in a (possibly nested) dtype. path is a list of field names/array indices leading to the corresponding element.

reikna.cluda.dtypes.is_complex(dtype)

Returns True if dtype is complex.

reikna.cluda.dtypes.is_double(dtype)

Returns True if dtype is double precision floating point.

reikna.cluda.dtypes.is_integer(dtype)

Returns True if dtype is an integer.

reikna.cluda.dtypes.is_real(dtype)

Returns True if dtype is a real number type.

reikna.cluda.dtypes.min_scalar_type(val)

Wrapper for numpy.min_scalar_type which takes into account types supported by GPUs.

reikna.cluda.dtypes.normalize_type(dtype)

Function for wrapping all dtypes coming from the user. numpy uses two different classes to represent dtypes, and one of them does not have some important attributes.

reikna.cluda.dtypes.normalize_types(dtypes)

Same as normalize_type(), but operates on a list of dtypes.

reikna.cluda.dtypes.real_for(dtype)

Returns floating point dtype corresponding to given complex dtype.

reikna.cluda.dtypes.result_type(*dtypes)

Wrapper for numpy.result_type which takes into account types supported by GPUs.

reikna.cluda.dtypes.zero_ctr(dtype)

Returns the string with constructed zero value for the given dtype.

Core functionality

Classes necessary to create computations and transformations are exposed from the core module.

Computation signatures

class reikna.core.Type(dtype, shape=None, strides=None, offset=0, nbytes=None)

Represents an array or, as a degenerate case, scalar type of a computation parameter.

shape

A tuple of integers. Scalars are represented by an empty tuple.

dtype

A numpy.dtype instance.

ctype

A string with the name of C type corresponding to dtype, or a module if it is a struct type.

strides

Tuple of bytes to step in each dimension when traversing an array.

offset

The initial offset (in bytes).

nbytes

The total size of the memory buffer (in bytes)

__call__(val)

Casts the given value to this type.

classmethod from_value(val)

Creates a Type object corresponding to the given value.

classmethod padded(dtype, shape, pad=0)

Creates a Type object corresponding to an array padded from all dimensions by pad elements.
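The relation between the shape, strides, offset and nbytes attributes can be sketched in plain Python. The helper names below are hypothetical stand-ins (they are not the library functions), and the sketch assumes non-negative strides and non-empty dimensions.

```python
def default_strides_sketch(shape, itemsize):
    """Strides (in bytes) of a contiguous row-major array."""
    strides = []
    step = itemsize
    for length in reversed(shape):
        strides.append(step)
        step *= length
    return tuple(reversed(strides))


def min_buffer_size_sketch(shape, itemsize, strides=None, offset=0):
    """Smallest buffer (in bytes) that holds the array starting at `offset`."""
    if strides is None:
        strides = default_strides_sketch(shape, itemsize)
    # Byte position of the last element relative to the first one.
    last = sum((length - 1) * stride for length, stride in zip(shape, strides))
    return offset + last + itemsize


contiguous_strides = default_strides_sketch((3, 4), 4)
contiguous_nbytes = min_buffer_size_sketch((3, 4), 4)
# Rows padded to 32 bytes, data starting 8 bytes into the buffer.
padded_nbytes = min_buffer_size_sketch((3, 4), 4, strides=(32, 4), offset=8)
```

A scalar corresponds to an empty shape, for which the sketch returns a buffer of exactly one item.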

class reikna.core.Annotation(type_, role=None, constant=False)

Computation parameter annotation, in the same sense as it is used for functions in the standard library.

Parameters:
  • type_ – a Type object.
  • role – any of 'i' (input), 'o' (output), 'io' (input/output), 's' (scalar). Defaults to 's' for scalars, 'io' for regular arrays and 'i' for constant arrays.
  • constant – if True, corresponds to a constant (cached) array.

class reikna.core.Parameter(name, annotation, default=)

Computation parameter, in the same sense as it is used for functions in the standard library. In its terms, all computation parameters have kind POSITIONAL_OR_KEYWORD.

Parameters:
  • name – parameter name.
  • annotation – an Annotation object.
  • default – default value for the parameter, can only be specified for scalars.

rename(new_name)

Creates a new Parameter object with the new name and the same annotation and default value.

class reikna.core.Signature(parameters)

Computation signature, in the same sense as it is used for functions in the standard library.

Parameters: parameters – a list of Parameter objects.

parameters

An OrderedDict with Parameter objects indexed by their names.

bind_with_defaults(args, kwds, cast=False)

Binds passed positional and keyword arguments to parameters in the signature and returns the resulting BoundArguments object.
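Since the signature classes are described by analogy with the standard library, the analogous behavior of inspect.Signature may help as a reference point. This sketch uses only the standard library; the parameter names output, input and scale are made up for illustration, and whether Reikna's binding matches this in every detail is an assumption.

```python
import inspect

kind = inspect.Parameter.POSITIONAL_OR_KEYWORD

# Two array-like parameters and one scalar with a default value.
parameters = [
    inspect.Parameter('output', kind),
    inspect.Parameter('input', kind),
    inspect.Parameter('scale', kind, default=1.0),
]
signature = inspect.Signature(parameters)

# Bind a mix of positional and keyword arguments, then fill in defaults.
bound = signature.bind('out_array', input='in_array')
bound.apply_defaults()
bound_arguments = dict(bound.arguments)
```

The bound arguments keep the parameter order of the signature, regardless of how the arguments were passed.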

Core classes

class reikna.core.Computation(root_parameters)

A base class for computations, intended to be subclassed.

Parameters: root_parameters – a list of Parameter objects.

signature

A Signature object representing current computation signature (taking into account connected transformations).

parameter

A named tuple of ComputationParameter objects corresponding to parameters from the current signature.

_build_plan(plan_factory, device_params, *args)

Derived classes override this method. It is called by compile() and is supposed to return a ComputationPlan object.

Parameters:
  • plan_factory – a callable returning a new ComputationPlan object.
  • device_params – a DeviceParameters object corresponding to the thread the computation is being compiled for.
  • args – KernelArgument objects, corresponding to parameters specified during the creation of this computation object.

_update_attributes()

Updates signature and parameter attributes. Called by the methods that change the signature.

compile(thread, fast_math=False, compiler_options=None, keep=False)

Compiles the computation with the given Thread object and returns a ComputationCallable object. If fast_math is enabled, the compilation of all kernels is performed using the compiler options for fast and imprecise mathematical functions. compiler_options can be used to pass a list of strings as arguments to the backend compiler. If keep is True, the generated kernels and binaries will be preserved in temporary directories.

connect(_comp_connector, _trf, _tr_connector, **tr_from_comp)

Connect a transformation to the computation.

Parameters:
  • _comp_connector – connection target — a ComputationParameter object belonging to this computation object, or a string with its name.
  • _trf – a Transformation object.
  • _tr_connector – connector on the side of the transformation — a TransformationParameter object belonging to _trf, or a string with its name.
  • tr_from_comp – a dictionary with the names of new or old computation parameters as keys, and TransformationParameter objects (or their names) as values. The keys of tr_from_comp cannot include the name of the connection target.
Returns: this computation object (modified).

Note

The resulting parameter order is determined by traversing the graph of connections depth-first (starting from the initial computation parameters), with the additional condition: the nodes do not change their order in the same branching level (i.e. in the list of computation or transformation parameters, both of which are ordered).

For example, consider a computation with parameters (a, b, c, d). If you connect a transformation (a', c) -> a, the resulting computation will have the signature (a', b, c, d) (as opposed to (a', c, b, d) it would have for the pure depth-first traversal).

class reikna.core.Transformation(parameters, code, render_kwds=None, connectors=None)

A class containing a pure parallel transformation of arrays. Some restrictions apply:

  • it cannot use local memory;
  • it cannot use global/local id getters (and depends only on externally passed indices);
  • it cannot have 'io' arguments;
  • it has at least one argument that uses load_same or store_same, and does it only once.
Parameters:
  • parameters – a list of Parameter objects.
  • code – a source template for the transformation. Will be wrapped in a template def with positional arguments with the names of objects in parameters.
  • render_kwds – a dictionary with render keywords that will be passed to the snippet.
  • connectors – a list of parameter names suitable for connection. Defaults to all non-scalar parameters.

Result and attribute classes

class reikna.core.Indices(shape)

Encapsulates the information about index variables available for the snippet.

__getitem__(dim)

Returns the name of the index variable for the dimension dim.

all()

Returns the comma-separated list of all index variable names (useful for passing the guiding indices verbatim in a load or store call).

class reikna.core.computation.ComputationCallable(thread, parameters, kernel_calls, internal_args, temp_buffers)

A result of calling compile() on a computation. Represents a callable opaque GPGPU computation.

thread

A Thread object used to compile the computation.

signature

A Signature object.

parameter

A named tuple of Type objects corresponding to the callable’s parameters.

__call__(*args, **kwds)

Execute the computation. In case of the OpenCL backend, returns a list of pyopencl.Event objects from nested kernel calls.

class reikna.core.computation.ComputationParameter(computation, name, type_)

Bases: Type

Represents a typed computation parameter. Can be used as a substitute for an array in functions which are only interested in array metadata.

connect(_trf, _tr_connector, **tr_from_comp)

Shortcut for connect() with this parameter as the first argument.

class reikna.core.computation.KernelArgument(name, type_)

Bases: Type

Represents an argument suitable to pass to planned kernel or computation call.

class reikna.core.computation.ComputationPlan(tr_tree, translator, thread, fast_math, compiler_options, keep)

Computation plan recorder.

computation_call(computation, *args, **kwds)

Adds a nested computation call. The computation value must be a Computation object. args and kwds are values to be passed to the computation.

constant_array(arr)

Adds a constant GPU array to the plan, and returns the corresponding KernelArgument.

kernel_call(template_def, args, global_size, local_size=None, render_kwds=None, kernel_name='_kernel_func')

Adds a kernel call to the plan.

Parameters:
  • template_def – Mako template def for the kernel.
  • args – a list consisting of KernelArgument objects, or scalar values wrapped in numpy.ndarray, that are going to be passed to the kernel during execution.
  • global_size – global size to use for the call, in row-major order.
  • local_size – local size to use for the call, in row-major order. If None, the local size will be picked automatically.
  • render_kwds – dictionary with additional values used to render the template.
  • kernel_name – the name of the kernel function.

persistent_array(arr)

Adds a persistent GPU array to the plan, and returns the corresponding KernelArgument.

temp_array(shape, dtype, strides=None, offset=0, nbytes=None)

Adds a temporary GPU array to the plan, and returns the corresponding KernelArgument. See array() for the information about the parameters.

Temporary arrays can share physical memory, but in such a way that their contents are guaranteed to persist between the first and the last use in a kernel during the execution of the plan.

temp_array_like(arr)

Same as temp_array(), taking the array properties from array or array-like object arr.

Warning

Note that pycuda.GPUArray objects do not have the offset attribute.

class reikna.core.transformation.TransformationParameter(trf, name, type_)

Bases: Type

Represents a typed transformation parameter. Can be used as a substitute for an array in functions which are only interested in array metadata.

class reikna.core.transformation.KernelParameter(name, type_, load_idx=None, store_idx=None, load_same=None, store_same=None, load_combined_idx=None, store_combined_idx=None)

Provides an interface for accessing kernel arguments in a template. Depending on the parameter type, and whether it is used inside a computation or a transformation template, can have different load/store attributes available.

name

Parameter name

shape

dtype

ctype

strides

offset

Same as in Type.

__str__()

Returns the C kernel parameter name corresponding to this parameter. It is the only method available for scalar parameters.

load_idx

A module providing a macro with the signature (idx0, idx1, ...), returning the corresponding element of the array.

store_idx

A module providing a macro with the signature (idx0, idx1, ..., val), saving val into the specified position.

load_combined_idx(slices)

A module providing a macro with the signature (cidx0, cidx1, ...), returning the element of the array corresponding to the new slicing of indices (e.g. an array with shape (2, 3, 4, 5, 6) sliced as slices=(2, 2, 1) is indexed as an array with shape (6, 20, 6)).

store_combined_idx(slices)

A module providing a macro with the signature (cidx0, cidx1, ..., val), saving val into the specified position corresponding to the new slicing of indices.
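The effect of slices on the shape seen by the combined-index macros can be checked with a few lines of Python. The function name combined_shape is hypothetical; it only reproduces the arithmetic of the example above, where consecutive groups of axes are merged into one.

```python
import math


def combined_shape(shape, slices):
    """Merge consecutive groups of axes; slices gives the size of each group."""
    if sum(slices) != len(shape):
        raise ValueError("slices must cover every axis of the shape exactly once")
    result = []
    start = 0
    for count in slices:
        # The merged axis has as many elements as the product of its group.
        result.append(math.prod(shape[start:start + count]))
        start += count
    return tuple(result)


merged = combined_shape((2, 3, 4, 5, 6), (2, 2, 1))
```

For the shape and slices from the description above, merged is (6, 20, 6).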

load_same

A module providing a macro that returns the element of the array corresponding to the indices used by the caller of the transformation.

store_same

A module providing a macro with the signature (val) that stores val using the indices used by the caller of the transformation.

Computations

Algorithms

General purpose algorithms.

Pure parallel computations

class reikna.algorithms.PureParallel(parameters, code, guiding_array=None, render_kwds=None)

Bases: Computation

A general class for pure parallel computations (i.e. with no interaction between threads).

Parameters:
  • parameters – a list of Parameter objects.
  • code – a source code for the computation. Can be a Snippet object which will be passed Indices object for the guiding_array as the first positional argument, and KernelParameter objects corresponding to parameters as the rest of positional arguments. If it is a string, such Snippet will be created out of it, with the parameter names idxs for the first one and the names of parameters for the remaining ones.
  • guiding_array – a tuple with the array shape, or the name of one of parameters. By default, the first parameter is chosen.
  • render_kwds – a dictionary with render keywords for the code.

compiled_signature(*args)

Parameters: args – corresponds to the given parameters.

classmethod from_trf(trf, guiding_array=None)

Creates a PureParallel instance from a Transformation object. guiding_array can be a string with a name of an array parameter from trf, or the corresponding TransformationParameter object.

Transposition (permutation)

class reikna.algorithms.Transpose(arr_t, output_arr_t=None, axes=None, block_width_override=None)

Bases: Computation

Changes the order of axes in a multidimensional array. Works analogously to numpy.transpose.

Parameters:
  • arr_t – an array-like defining the initial array.
  • output_arr_t – an array-like defining the output array. If None, its shape will be derived based on the shape of arr_t, its dtype will be equal to that of arr_t, and any non-default offset or strides of arr_t will be ignored.
  • axes – tuple with the new axes order. If None, then axes will be reversed.

compiled_signature(output:o, input:i)

Parameters:
  • output – an array with all the attributes of arr_t, with the shape permuted according to axes.
  • input – an array with all the attributes of arr_t.
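Because the computation is described as an analogue of numpy.transpose, a CPU reference is a convenient way to see what the axes parameter means. This sketch uses NumPy only and does not call Reikna; the array contents are arbitrary.

```python
import numpy

arr = numpy.arange(24).reshape(2, 3, 4)

# axes=None: the order of axes is reversed.
reversed_axes = numpy.transpose(arr)

# Explicit axes: output axis n takes its data from input axis axes[n].
custom_axes = numpy.transpose(arr, (1, 2, 0))
```

With axes (1, 2, 0), the element custom_axes[j, k, i] equals arr[i, j, k], and the output shape is the input shape permuted the same way.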

Reduction

class reikna.algorithms.Reduce(arr_t, predicate, axes=None, output_arr_t=None)

Bases: Computation

Reduces the array over given axis using given binary operation.

Parameters:
  • arr_t – an array-like defining the initial array.
  • predicate – a Predicate object.
  • axes – a list of non-repeating axes to reduce over. If None, the whole array will be reduced (in which case the shape of the output array is (1,)).
  • output_arr_t – output array metadata (the shape must still correspond to the result of reducing the original array over given axes, but offset and strides can be set to the desired ones).

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t, with its shape missing axes from axes.
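The output shapes described above can be illustrated with a NumPy reference for the special case of a sum predicate. This is a CPU sketch that does not call Reikna; it only mirrors the documented shape rules.

```python
import numpy

arr = numpy.arange(24, dtype=numpy.int64).reshape(2, 3, 4)

# Reducing over axes 0 and 2 leaves only axis 1 in the output.
partial = arr.sum(axis=(0, 2))

# Reducing the whole array: the documented output shape is (1,), not ().
full = arr.sum().reshape(1)
```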

Scan

class reikna.algorithms.Scan(arr_t, predicate, axes=None, exclusive=False, max_work_group_size=None, seq_size=None)

Bases: Computation

Scans the array over given axis using given binary operation. Namely, from an array [a, b, c, d, ...] and an operation ., produces [a, a.b, a.b.c, a.b.c.d, ...] if exclusive is False and [0, a, a.b, a.b.c, ...] if exclusive is True (here 0 is the operation’s identity element).

Parameters:
  • arr_t – an array-like defining the initial array.
  • predicate – a Predicate object.
  • axes – a list of non-repeating axes to scan over. (Note that the result will depend on the order of the axes). If None, the whole array will be scanned over. This means that the selected axes will be flattened (in the specified order) and treated like a single axis for the purposes of the scan.
  • exclusive – whether to perform an exclusive scan (see above).
  • max_work_group_size – the maximum workgroup size to be used for the scan kernel.
  • seq_size – the number of elements to be scanned sequentially. If not given, Reikna will attempt to choose the one resulting in the best performance, but sometimes a manual choice may be better.

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t.
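The difference between the inclusive and the exclusive scan is easy to see in a sequential reference. The sketch below uses only the standard library; the function name scan_reference is hypothetical, and the identity argument plays the role of the operation's identity element.

```python
import operator
from itertools import accumulate


def scan_reference(values, operation, identity, exclusive=False):
    """Sequential scan of a list with a binary operation."""
    values = list(values)
    if not values:
        return []
    inclusive = list(accumulate(values, operation))
    if exclusive:
        # Shift right by one position and start from the identity element.
        return [identity] + inclusive[:-1]
    return inclusive


inclusive_sum = scan_reference([1, 2, 3, 4], operator.add, 0)
exclusive_sum = scan_reference([1, 2, 3, 4], operator.add, 0, exclusive=True)
exclusive_product = scan_reference([1, 2, 3, 4], operator.mul, 1, exclusive=True)
```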

Predicates

class reikna.algorithms.Predicate(operation, empty)

A predicate used in some of Reikna algorithms (e.g. Reduce or Scan).

Parameters:
  • operation – a Snippet object with two parameters which will take the names of two arguments to join.
  • empty – a numpy scalar with the empty value of the argument (the one which, being joined by another argument, does not change it).

reikna.algorithms.predicate_sum(dtype)

Returns a Predicate object which sums its arguments.

Linear algebra

Linear algebra algorithms.

Matrix multiplication (dot product)

class reikna.linalg.MatrixMul(a_arr, b_arr, out_arr=None, block_width_override=None, transposed_a=False, transposed_b=False)

Bases: Computation

Multiplies two matrices using last two dimensions and batching over remaining dimensions. For batching to work, the products of remaining dimensions should be equal (then the multiplication will be performed piecewise), or one of them should equal 1 (then the multiplication will be batched over the remaining dimensions of the other matrix).

Parameters:
  • a_arr – an array-like defining the first argument.
  • b_arr – an array-like defining the second argument.
  • out_arr – an array-like defining the output; if not given, both shape and dtype will be derived from a_arr and b_arr.
  • block_width_override – if provided, it will be used as a block size of the multiplication kernel.
  • transposed_a – if True, the first matrix will be transposed before the multiplication.
  • transposed_b – if True, the second matrix will be transposed before the multiplication.

compiled_signature(output:o, matrix_a:i, matrix_b:i)

Parameters:
  • output – the output of matrix multiplication.
  • matrix_a – the first argument.
  • matrix_b – the second argument.
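The batching behavior can be compared against numpy.matmul on the CPU. This is only an approximate analogue: NumPy requires the batch dimensions to be broadcastable, which is a stricter condition than the "equal products" rule stated above, so the sketch sticks to cases where both rules agree. The array contents are arbitrary.

```python
import numpy

a = numpy.arange(30, dtype=numpy.float64).reshape(5, 2, 3)

# One matrix on the right: the product is batched over the 5 matrices in a.
b_single = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)
batched = numpy.matmul(a, b_single)

# Equal batch sizes: the multiplication is performed piecewise.
b_batch = numpy.arange(60, dtype=numpy.float64).reshape(5, 3, 4)
piecewise = numpy.matmul(a, b_batch)

# A transposed first argument corresponds to swapping its last two axes.
a_stored_transposed = numpy.swapaxes(a, -1, -2)
from_transposed = numpy.matmul(numpy.swapaxes(a_stored_transposed, -1, -2), b_single)
```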

Matrix norms

class reikna.linalg.EntrywiseNorm(arr_t, order=2, axes=None)

Bases: Computation

Calculates the entrywise matrix norm (same as numpy.linalg.norm) of an arbitrary order $r$:

$$\|A\|_r = \left( \sum_{i,j,\ldots} |A_{i,j,\ldots}|^r \right)^{1/r}$$

Parameters:
  • arr_t – an array-like defining the initial array.
  • order – the order $r$ (any real number).
  • axes – a list of non-repeating axes to sum over. If None, the norm of the whole array will be calculated.

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t, with its shape missing axes from axes.
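A CPU reference for the entrywise norm takes only a few lines of NumPy. The function name entrywise_norm_reference is hypothetical and the sketch does not call Reikna; it implements the sum of absolute values raised to the given order, followed by the inverse power.

```python
import numpy


def entrywise_norm_reference(arr, order=2, axes=None):
    """(sum |A|^order)^(1/order), over the given axes or the whole array."""
    arr = numpy.asarray(arr)
    axis = None if axes is None else tuple(axes)
    return (numpy.abs(arr) ** order).sum(axis=axis) ** (1.0 / order)


matrix = numpy.array([[3.0, -4.0], [0.0, 0.0]])
norm_2 = entrywise_norm_reference(matrix)
norm_1 = entrywise_norm_reference(matrix, order=1)
row_norms = entrywise_norm_reference(matrix, axes=(1,))
```

For order 2 and no axes, the result coincides with the default (Frobenius) result of numpy.linalg.norm for a matrix.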

Fast Fourier transform and related utilities

Fast Fourier Transform

class reikna.fft.FFT(arr_t, axes=None)

Bases: Computation

Performs the Fast Fourier Transform. The interface is similar to numpy.fft.fftn. The inverse transform is normalized so that IFFT(FFT(X)) = X.

Parameters:
  • arr_t – an array-like defining the problem array.
  • axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes.

Note

The current algorithm works most effectively when array dimensions are powers of 2. This mostly applies to the axes over which the transform is performed, because otherwise the computation falls back to Bluestein’s algorithm, which effectively halves the performance.

compiled_signature(output:o, input:i, inverse:s)

output and input may be the same array.

Parameters:
  • output – an array with the attributes of arr_t.
  • input – an array with the attributes of arr_t.
  • inverse – a scalar value castable to integer. If 0, output contains the forward FFT of input, if 1, the inverse one.
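The normalization convention can be checked against NumPy on the CPU, since the interface is described as similar to numpy.fft.fftn. This sketch does not call Reikna; the array shape, the seed and the choice of axes are arbitrary.

```python
import numpy

rng = numpy.random.RandomState(0)
x = rng.normal(size=(4, 8)) + 1j * rng.normal(size=(4, 8))

# Transform over the last axis only, analogous to axes=(1,).
forward = numpy.fft.fftn(x, axes=(1,))

# The inverse transform carries the 1/N factor, so the round trip restores x.
restored = numpy.fft.ifftn(forward, axes=(1,))
```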

FFT frequency shift

class reikna.fft.FFTShift(arr_t, axes=None)

Bases: Computation

Shift the zero-frequency component to the center of the spectrum. The interface is similar to numpy.fft.fftshift, and the output is the same for the same array shape and axes.

Parameters:
  • arr_t – an array-like defining the problem array.
  • axes – a tuple with axes over which to perform the shift. If not given, the shift is performed over all the axes.

compiled_signature(output:o, input:i)

output and input may be the same array.

Parameters:
  • output – an array with the attributes of arr_t.
  • input – an array with the attributes of arr_t.
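Since the output is stated to match numpy.fft.fftshift, the NumPy function can serve as a reference for what the shift does with even and odd lengths. This is a CPU-only sketch with arbitrary small arrays.

```python
import numpy

# Even length: the two halves are swapped.
even = numpy.fft.fftshift(numpy.arange(6))

# Odd length: the array is rotated by n // 2 positions.
odd = numpy.fft.fftshift(numpy.arange(5))

# Shifting only the last axis of a 2D array.
grid = numpy.arange(12).reshape(3, 4)
shifted_last = numpy.fft.fftshift(grid, axes=(1,))
```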

Discrete harmonic transform

reikna.dht.get_spatial_grid(modes, order, add_points=0)

Returns the spatial grid required to calculate the order power of a function defined in the harmonic mode space of the size modes. If add_points is 0, the grid has the minimum size required for exact transformation back to the mode space.

reikna.dht.harmonic(mode)

Returns an eigenfunction of order $n = \mathrm{mode}$ for the harmonic oscillator:

$$\phi_n = \frac{1}{\sqrt[4]{\pi} \sqrt{2^n n!}} H_n(x) \exp(-x^2/2),$$

where $H_n$ is the $n$-th order “physicists’” Hermite polynomial. The normalization is chosen so that $\int \phi_n^2(x) dx = 1$.
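The normalization of the harmonic oscillator eigenfunctions can be verified numerically with the standard library alone. The function names below are hypothetical and independent of Reikna; the Hermite polynomials are evaluated with the usual recurrence for the physicists' convention, and the integral is approximated by a plain Riemann sum on an assumed interval of [-10, 10].

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial via H_{k+1} = 2 x H_k - 2 k H_{k-1}."""
    previous, current = 0.0, 1.0
    for k in range(n):
        previous, current = current, 2.0 * x * current - 2.0 * k * previous
    return current


def harmonic_reference(n, x):
    """Normalized harmonic oscillator eigenfunction of order n at point x."""
    norm = math.pi ** 0.25 * math.sqrt(2.0 ** n * math.factorial(n))
    return hermite(n, x) * math.exp(-x * x / 2.0) / norm


def overlap(m, n, half_width=10.0, step=0.01):
    """Riemann-sum approximation of the integral of phi_m(x) * phi_n(x)."""
    count = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(count + 1):
        x = -half_width + i * step
        total += harmonic_reference(m, x) * harmonic_reference(n, x)
    return total * step


norm_0 = overlap(0, 0)
norm_3 = overlap(3, 3)
cross_1_3 = overlap(1, 3)
```

The diagonal overlaps are close to 1 and the off-diagonal one is close to 0, which is the orthonormality expected of these functions.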

class reikna.dht.DHT(mode_arr, add_points=None, inverse=False, order=1, axes=None)

Bases: Computation

Discrete transform to and from harmonic oscillator modes. With inverse=True transforms a function defined by its expansion $C_m, m = 0 \ldots M-1$ in the mode space with mode functions from harmonic(), to the coordinate space ($F(x)$ on the grid $x$ from get_spatial_grid()). With inverse=False guarantees to recover first $M$ modes of $F^k(x)$, where $k$ is the order parameter.

For multiple dimensions the operation is the same, and the mode functions are products of 1D mode functions, i.e. $\phi^{3D}_{l,m,n}(x,y,z) = \phi_l(x) \phi_m(y) \phi_n(z)$.

For the detailed description of the algorithm, see Dion & Cances, PRE 67(4) 046706 (2003).

Parameters:
  • mode_arr – an array-like object defining the shape of mode space. If inverse=False, its shape is used to define the mode space size.
  • inverse – False for forward (coordinate space -> mode space) transform, True for inverse (mode space -> coordinate space) transform.
  • axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes.
  • order – if F is a function in mode space, the number of spatial points is chosen so that the transformation DHT[(DHT^{-1}[F])^order] could be performed.
  • add_points – a list of the same length as mode_arr shape, specifying the number of points in x-space to use in addition to minimally required (0 by default).

compiled_signature_forward(modes:o, coords:i)

compiled_signature_inverse(coords:o, modes:i)

Depending on inverse value, either of these two will be created.

Parameters:
  • modes – an array with the attributes of mode_arr.
  • coords – an array with the shape depending on mode_arr, axes, order and add_points, and the dtype of mode_arr.

Counter-based random number generators

This module is based on the paper by Salmon et al., P. Int. C. High. Perform. 16 (2011), and the source code of the Random123 library.

A counter-based random-number generator (CBRNG) is a parametrized function $f_k(c)$, where $k$ is the key, $c$ is the counter, and the function $f_k$ defines a bijection in the set of integer numbers. Being applied to successive counters, the function produces a sequence of pseudo-random numbers. The key is an analogue of the seed of stateful RNGs; if the CBRNG is used to generate random numbers in parallel threads, the key is a combination of a seed and a unique thread number.

There are two types of generators available, threefry (uses a large number of simple functions), and philox (uses a smaller number of more complicated functions). The latter one is generally faster on GPUs; see the paper above for detailed comparisons. These generators can be further specialized to use counters made of words=2 or words=4 words, each bitness=32 or bitness=64 bits wide. The period of the generator equals the cardinality of the set of possible counters. For example, if the counter consists of 4 64-bit numbers, then the period of the generator is $2^{256}$. As for the key size, in case of threefry the key has the same size as the counter, and for philox the key is half its size.

The CBRNG class sets one of the words of the key (except for philox-2x64, where 32 bits of the only word in the key are used); the rest are the same for all threads and are derived from the provided seed. This limits the maximum number of number-generating threads (size). philox-2x32 has a 32-bit key and therefore cannot be used in CBRNG (although it can be used separately with the help of the kernel API).

The CBRNG class itself is stateless, same as other computations in Reikna, so you have to manage the generator state yourself. The state is created by the create_counters() method and contains size counters. This state is then passed to, and updated by, a CBRNG object.
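The counter-based idea can be illustrated with a deliberately tiny toy model in pure Python. Everything below is hypothetical: the 16-bit mixing function is neither threefry nor philox and has no statistical quality, and the key layout (seed in the high byte, thread number in the low byte) is an assumption made only for this sketch. It shows the three properties discussed above: the keyed function is a bijection on counters, the key combines a seed with a thread number, and the caller owns the counter state.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1
ODD_MULTIPLIER = 0x9E37  # odd, hence invertible modulo 2**16


def toy_bijection(key, counter):
    """A keyed bijection on 16-bit counters (toy example, not a real CBRNG)."""
    mixed = (counter ^ key) & WORD_MASK            # xor with the key is invertible
    mixed = (mixed * ODD_MULTIPLIER) & WORD_MASK   # odd multiplication is invertible
    return mixed ^ (mixed >> 7)                    # xor-shift is invertible


def make_key(seed, thread_id):
    """Hypothetical key layout: seed in the high byte, thread number in the low byte."""
    return ((seed & 0xFF) << 8) | (thread_id & 0xFF)


def generate(seed, thread_id, counter, count):
    """Return `count` values and the updated counter; the caller keeps the state."""
    key = make_key(seed, thread_id)
    values = [toy_bijection(key, (counter + i) & WORD_MASK) for i in range(count)]
    return values, (counter + count) & WORD_MASK


first, state = generate(seed=1, thread_id=0, counter=0, count=4)
second, state = generate(seed=1, thread_id=0, counter=state, count=4)
all_at_once, _ = generate(seed=1, thread_id=0, counter=0, count=8)

# Every counter maps to a distinct output, so the period equals the counter space.
distinct_outputs = len({toy_bijection(0x1234, c) for c in range(1 << WORD_BITS)})
```

Generating four numbers twice while passing the returned counter back in gives the same sequence as generating eight numbers at once, which is the sense in which the state lives outside the generator.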

class reikna.cbrng.CBRNG(randoms_arr, generators_dim, sampler, seed=None)

Bases: Computation

Counter-based pseudo-random number generator class.

Parameters:
  • randoms_arr – an array intended for storing generated random numbers.
  • generators_dim – the number of dimensions (counting from the end) which will use independent generators. For example, if randoms_arr has the shape (100, 200, 300) and generators_dim is 2, then in every sub-array (j, :, :), j = 0 .. 99, every element will use an independent generator.
  • sampler – a Sampler object.
  • seed – None for a random seed, or an integer.

classmethod sampler_name(randoms_arr, generators_dim, sampler_kwds=None, seed=None)

A convenience constructor for the sampler sampler_name from samplers. The contents of the dictionary sampler_kwds will be passed to the sampler constructor function (with bijection being created automatically, and dtype taken from randoms_arr).

compiled_signature(counters:io, randoms:o)

Parameters:
  • counters – the RNG “state”. All attributes are equal to the ones of the result of create_counters().
  • randoms – generated random numbers. All attributes are equal to the ones of randoms_arr from the constructor.

create_counters()

Create a counter array for use in CBRNG.

Kernel API

class reikna.cbrng.bijections.Bijection(module, word_dtype, key_dtype, counter_dtype)

Contains a CBRNG bijection module and accompanying metadata. Supports __process_modules__ protocol.

word_dtype

The data type of the integer word used by the generator.

key_words

The number of words used by the key.

counter_words

The number of words used by the counter.

key_dtype

The numpy.dtype object representing a bijection key. Contains a single array field v with key_words of word_dtype elements.

counter_dtype

The numpy.dtype object representing a bijection counter. Contains a single array field v with counter_words of word_dtype elements.

raw_functions

A dictionary dtype:function_name of available functions function_name in module that produce a random full-range integer dtype from a State, advancing it. Available functions: get_raw_uint32(), get_raw_uint64().

module

The module containing the CBRNG function. It provides the C functions below.

COUNTER_WORDS

Contains the value of counter_words.

KEY_WORDS

Contains the value of key_words.

Word

Contains the type corresponding to word_dtype.

Key

Describes the bijection key. Alias for the structure generated from key_dtype.

Word v[KEY_WORDS]

Counter

Describes the bijection counter, or its output. Alias for the structure generated from counter_dtype.

Word v[COUNTER_WORDS]

Counter make_counter_from_int(int x)

Creates a counter object from an integer.

Counter bijection(Key key, Counter counter)

The main bijection function.

State

A structure containing the CBRNG state which is used by samplers.

State make_state(Key key, Counter counter)

Creates a new state object.

Counter get_next_unused_counter(State state)

Extracts a counter which has not been used in random sampling.

uint32

A type of unsigned 32-bit word, corresponds to numpy.uint32.

uint64

A type of unsigned 64-bit word, corresponds to numpy.uint64.

uint32 get_raw_uint32(State *state)

Returns uniformly distributed unsigned 32-bit word and updates the state.

uint64 get_raw_uint64(State *state)

Returns uniformly distributed unsigned 64-bit word and updates the state.

reikna.cbrng.bijections.philox(bitness, counter_words, rounds=10)

A CBRNG based on a low number of slow rounds (multiplications).

Parameters:
  • bitness – 32 or 64, corresponds to the size of generated random integers.
  • counter_words – 2 or 4, number of integers generated in one go.
  • rounds – 1 to 12, the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG.
Returns: a Bijection object.

reikna.cbrng.bijections.threefry(bitness, counter_words, rounds=20)

A CBRNG based on a big number of fast rounds (bit rotations).

Parameters:
  • bitness – 32 or 64, corresponds to the size of generated random integers.
  • counter_words – 2 or 4, number of integers generated in one go.
  • rounds – 1 to 72, the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG.
Returns: a Bijection object.

class reikna.cbrng.samplers.Sampler(bijection, module, dtype, randoms_per_call=1, deterministic=False)

Contains a random distribution sampler module and accompanying metadata. Supports __process_modules__ protocol.

deterministic

If True, every sampled random number consumes the same amount of counters.

randoms_per_call

How many random numbers one call to sample creates.

dtype

The data type of one random value produced by the sampler.

module

The module containing the distribution sampling function. It provides the C functions below.

RANDOMS_PER_CALL

Contains the value of randoms_per_call.

Value

Contains the type corresponding to dtype.

Result

Describes the sampling result.

Value v[RANDOMS_PER_CALL]

Result sample(State *state)

Performs the sampling, updating the state.

reikna.cbrng.samplers.gamma(bijection, dtype, shape=1, scale=1)

Generates random numbers from the gamma distribution $P(x) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \Gamma(k)}$, where $k$ is shape, and $\theta$ is scale. Supported dtypes: float(32/64). Returns a Sampler object.

reikna.cbrng.samplers.normal_bm(bijection, dtype, mean=0, std=1)

Generates normally distributed random numbers with the mean mean and the standard deviation std using the Box-Muller transform. Supported dtypes: float(32/64), complex(64/128). Produces two random numbers per call for real types and one number for complex types. Returns a Sampler object.

Note

In case of a complex dtype, std refers to the standard deviation of the complex numbers (same as numpy.std() returns), not real and imaginary components (which will be normally distributed with the standard deviation std / sqrt(2)). Consequently, while mean is of type dtype, std must be real.
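The Box-Muller transform and the convention for complex std can be sketched in pure Python. The function names are hypothetical and this is not the kernel code used by the sampler; it only shows why a pair of uniform numbers yields two real normal values or one complex value, and how std is split between the real and imaginary components.

```python
import math


def box_muller(u1, u2, mean=0.0, std=1.0):
    """Map two uniform numbers (u1 in (0, 1], u2 in [0, 1)) to two normal values."""
    radius = std * math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return mean + radius * math.cos(angle), mean + radius * math.sin(angle)


def box_muller_complex(u1, u2, mean=0j, std=1.0):
    """One complex value; each component has standard deviation std / sqrt(2)."""
    re, im = box_muller(u1, u2, 0.0, std / math.sqrt(2.0))
    return mean + complex(re, im)


z0, z1 = box_muller(0.5, 0.25)
c = box_muller_complex(0.5, 0.25)
```

For the same pair of uniform inputs, the squared magnitude of the complex value is half the squared radius of the real pair, which is what makes the standard deviation of the complex numbers equal to std.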

reikna.cbrng.samplers.uniform_float(bijection, dtype, low=0, high=1)

Generates uniformly distributed floating-point numbers in the interval [low, high). Supported dtypes: float(32/64). A fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.uniform_integer(bijection, dtype, low, high=None)

Generates uniformly distributed integer numbers in the interval [low, high). If high is None, the interval is [0, low). Supported dtypes: any numpy integers. If the size of the interval is a power of 2, a fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.vonmises(bijection, dtype, mu=0, kappa=1)

Generates random numbers from the von Mises distribution $P(x) = \frac{\exp(\kappa \cos(x - \mu))}{2 \pi I_0(\kappa)}$, where $\mu$ is the mode, $\kappa$ is the dispersion, and $I_0$ is the modified Bessel function of the first kind. Supported dtypes: float(32/64). Returns a Sampler object.

class reikna.cbrng.tools.KeyGenerator(module, base_key)

Contains a key generator module and accompanying metadata. Supports __process_modules__ protocol.

module

A module with the key generator function:

Key key_from_int(int idx)

Generates and returns a key, suitable for the bijection which was given to the constructor.

classmethod create(bijection, seed=None, reserve_id_space=True)

Creates a generator.

Parameters:
  • bijection – a Bijection object.
  • seed – an integer, or numpy array of 32-bit unsigned integers.
  • reserve_id_space – if True, the last 32 bits of the key will be reserved for the thread identifier. As a result, the total size of the key should be 64 bits or more. If False, the thread identifier will be just added to the key, which will still result in different keys for different threads, with the danger that different seeds produce the same sequences.

reference(idx)

Reference function that returns the key given the thread identifier. Uses the same algorithm as the module.

Computations

Algorithms

General purpose algorithms.

Pure parallel computations

class reikna.algorithms.PureParallel(parameters, code, guiding_array=None, render_kwds=None)

Bases: Computation

A general class for pure parallel computations (i.e. with no interaction between threads).

Parameters:
  • parameters – a list of Parameter objects.
  • code – the source code of the computation. Can be a Snippet object, which will be passed an Indices object for the guiding_array as the first positional argument and KernelParameter objects corresponding to parameters as the remaining positional arguments. If it is a string, such a Snippet will be created out of it, with the parameter name idxs for the first argument and the names of parameters for the remaining ones.
  • guiding_array – a tuple with the array shape, or the name of one of the parameters. By default, the first parameter is chosen.
  • render_kwds – a dictionary with render keywords for the code.

compiled_signature(*args)

Parameters: args – corresponds to the given parameters.

classmethod from_trf(trf, guiding_array=None)

Creates a PureParallel instance from a Transformation object. guiding_array can be a string with a name of an array parameter from trf, or the corresponding TransformationParameter object.
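
The execution model can be pictured on the CPU as a loop over every index of the guiding shape, where the body sees only its own index and never the result of another call. The sketch below is a plain-Python emulation of that idea, not Reikna code; pure_parallel_emulation and add_body are hypothetical names introduced for illustration.

```python
from itertools import product


def pure_parallel_emulation(guiding_shape, body):
    """Call body(idxs) once for every index tuple of guiding_shape.

    On a GPU each call would run in its own thread, so no call may
    depend on the result of another one.
    """
    for idxs in product(*(range(n) for n in guiding_shape)):
        body(idxs)


# Element-wise sum of two 2x3 arrays stored as nested lists.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
out = [[0] * 3 for _ in range(2)]


def add_body(idxs):
    # idxs plays the role of the index object passed to the snippet.
    i, j = idxs
    out[i][j] = a[i][j] + b[i][j]


pure_parallel_emulation((2, 3), add_body)
print(out)
```

Here the tuple (2, 3) plays the role of guiding_array: it only determines which indices are visited.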

Transposition (permutation)

class reikna.algorithms.Transpose(arr_t, output_arr_t=None, axes=None, block_width_override=None)

Bases: Computation

Changes the order of axes in a multidimensional array. Works analogously to numpy.transpose.

Parameters:
  • arr_t – an array-like defining the initial array.
  • output_arr_t – an array-like defining the output array. If None, its shape will be derived based on the shape of arr_t, its dtype will be equal to that of arr_t, and any non-default offset or strides of arr_t will be ignored.
  • axes – tuple with the new axes order. If None, then axes will be reversed.

compiled_signature(output:o, input:i)

Parameters:
  • output – an array with all the attributes of arr_t, with the shape permuted according to axes.
  • input – an array with all the attributes of arr_t.
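
As a small illustration of how axes determines the output shape, the following sketch applies the numpy.transpose rule (output dimension i takes the size of input dimension axes[i]) to a shape tuple. The helper transposed_shape is a hypothetical name and is not part of Reikna.

```python
def transposed_shape(shape, axes=None):
    """Return the shape produced by permuting `shape` according to `axes`.

    With axes=None the order of the axes is reversed.
    """
    if axes is None:
        axes = tuple(reversed(range(len(shape))))
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(len(shape))")
    return tuple(shape[axis] for axis in axes)


print(transposed_shape((2, 3, 4)))             # axes reversed
print(transposed_shape((2, 3, 4), (1, 0, 2)))  # first two axes swapped
```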

Reduction

class reikna.algorithms.Reduce(arr_t, predicate, axes=None, output_arr_t=None)

Bases: Computation

Reduces the array over the given axes using the given binary operation.

Parameters:
  • arr_t – an array-like defining the initial array.
  • predicate – a Predicate object.
  • axes – a list of non-repeating axes to reduce over. If None, the whole array will be reduced (in which case the shape of the output array is (1,)).
  • output_arr_t – an output array metadata (the shape must still correspond to the result of reducing the original array over given axes, but offset and strides can be set to the desired ones).

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t, with its shape missing axes from axes.

Scan

class reikna.algorithms.Scan(arr_t, predicate, axes=None, exclusive=False, max_work_group_size=None, seq_size=None)

Bases: Computation

Scans the array over the given axes using the given binary operation. Namely, from an array [a, b, c, d, ...] and an operation ., produces [a, a.b, a.b.c, a.b.c.d, ...] if exclusive is False and [0, a, a.b, a.b.c, ...] if exclusive is True (here 0 is the operation’s identity element).

Parameters:
  • arr_t – an array-like defining the initial array.
  • predicate – a Predicate object.
  • axes – a list of non-repeating axes to scan over. (Note that the result will depend on the order of the axes). If None, the whole array will be scanned over. This means that the selected axes will be flattened (in the specified order) and treated like a single axis for the purposes of the scan.
  • exclusive – whether to perform an exclusive scan (see above).
  • max_work_group_size – the maximum workgroup size to be used for the scan kernel.
  • seq_size – the number of elements to be scanned sequentially. If not given, Reikna will attempt to choose the one resulting in the best performance, but sometimes a manual choice may be better.

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t.
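
The difference between the inclusive and the exclusive scan can be checked with a sequential reference in plain Python. This is only a sketch of the semantics described above for a single axis, not the parallel algorithm used on the device; the function name scan_reference is hypothetical.

```python
from itertools import accumulate
import operator


def scan_reference(values, operation, identity, exclusive=False):
    """Sequential scan of a 1D sequence with a binary operation."""
    inclusive = list(accumulate(values, operation))
    if not exclusive:
        return inclusive
    # Shift right by one position and start from the identity element.
    return ([identity] + inclusive[:-1]) if inclusive else []


data = [1, 2, 3, 4]
print(scan_reference(data, operator.add, 0))                  # inclusive
print(scan_reference(data, operator.add, 0, exclusive=True))  # exclusive
```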

Predicates

class reikna.algorithms.Predicate(operation, empty)

A predicate used in some of Reikna's algorithms (e.g. Reduce or Scan).

Parameters:
  • operation – a Snippet object with two parameters which will take the names of two arguments to join.
  • empty – a numpy scalar with the empty value of the argument (the one which, being joined by another argument, does not change it).

reikna.algorithms.predicate_sum(dtype)

Returns a Predicate object which sums its arguments.
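
To see why a predicate needs both an operation and an empty value, consider a sequential reference written in plain Python. This is a sketch of the concept only: in Reikna the operation is a Snippet rendered into device code, whereas here it is an ordinary Python function, and the names reduce_reference and reduce_rows are hypothetical.

```python
from functools import reduce


def reduce_reference(values, operation, empty):
    """Join all values with `operation`, starting from the empty value."""
    return reduce(operation, values, empty)


def reduce_rows(matrix, operation, empty):
    """Reduce a 2D nested list over its last axis."""
    return [reduce_reference(row, operation, empty) for row in matrix]


def add(a, b):
    return a + b


matrix = [[1, 2, 3], [4, 5, 6]]
print(reduce_rows(matrix, add, 0))   # one sum per row
print(reduce_reference([], add, 0))  # nothing to join: the empty value
```

Joining any argument with the empty value leaves it unchanged, which is what makes it a safe starting point for partial reductions.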

Linear algebra

Linear algebra algorithms.

Matrix multiplication (dot product)

class reikna.linalg.MatrixMul(a_arr, b_arr, out_arr=None, block_width_override=None, transposed_a=False, transposed_b=False)

Bases: Computation

Multiplies two matrices using last two dimensions and batching over remaining dimensions. For batching to work, the products of remaining dimensions should be equal (then the multiplication will be performed piecewise), or one of them should equal 1 (then the multiplication will be batched over the remaining dimensions of the other matrix).

Parameters:
  • a_arr – an array-like defining the first argument.
  • b_arr – an array-like defining the second argument.
  • out_arr – an array-like defining the output; if not given, both shape and dtype will be derived from a_arr and b_arr.
  • block_width_override – if provided, it will be used as the block size of the multiplication kernel.
  • transposed_a – if True, the first matrix will be transposed before the multiplication.
  • transposed_b – if True, the second matrix will be transposed before the multiplication.

compiled_signature(output:o, matrix_a:i, matrix_b:i)

Parameters:
  • output – the output of matrix multiplication.
  • matrix_a – the first argument.
  • matrix_b – the second argument.
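
The batching rule can be illustrated with a plain-Python reference in which the remaining (batch) dimensions are already flattened into a list of matrices. This is a sketch under that assumption, not Reikna's kernel; matmul and batched_matmul are hypothetical names.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def batched_matmul(a_batch, b_batch):
    """Multiply two lists of matrices following the batching rule.

    Equal batch sizes are multiplied piecewise; a batch of size 1 is
    reused for every matrix of the other batch.
    """
    if len(a_batch) == len(b_batch):
        pairs = zip(a_batch, b_batch)
    elif len(a_batch) == 1:
        pairs = ((a_batch[0], b) for b in b_batch)
    elif len(b_batch) == 1:
        pairs = ((a, b_batch[0]) for a in a_batch)
    else:
        raise ValueError("batch sizes must be equal, or one must be 1")
    return [matmul(a, b) for a, b in pairs]


a_batch = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
identity_batch = [[[1, 0], [0, 1]]]
print(batched_matmul(a_batch, identity_batch))  # identity reused for both
print(batched_matmul(a_batch, a_batch))         # piecewise squares
```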

Matrix norms

class reikna.linalg.EntrywiseNorm(arr_t, order=2, axes=None)

Bases: Computation

Calculates the entrywise matrix norm (same as numpy.linalg.norm) of an arbitrary order $r$:

$\|A\|_r = \left( \sum_{i,j,\ldots} |A_{i,j,\ldots}|^r \right)^{1/r}$

Parameters:
  • arr_t – an array-like defining the initial array.
  • order – the order $r$ (any real number).
  • axes – a list of non-repeating axes to sum over. If None, the norm of the whole array will be calculated.

compiled_signature(output:o, input:i)

Parameters:
  • input – an array with the attributes of arr_t.
  • output – an array with the attributes of arr_t, with its shape missing axes from axes.
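
The formula above can be evaluated directly for a flattened array. The following plain-Python reference is a sketch for checking small cases by hand, not the device implementation; entrywise_norm_reference is a hypothetical name.

```python
def entrywise_norm_reference(values, order=2):
    """Return (sum |v|^order)^(1/order) over a flat sequence of numbers."""
    return sum(abs(v) ** order for v in values) ** (1 / order)


print(entrywise_norm_reference([3, -4]))           # Euclidean norm
print(entrywise_norm_reference([3, -4], order=1))  # sum of absolute values
print(entrywise_norm_reference([3 + 4j]))          # complex entries use |v|
```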

Fast Fourier transform and related utilities

Fast Fourier Transform

class reikna.fft.FFT(arr_t, axes=None)

Bases: Computation

Performs the Fast Fourier Transform. The interface is similar to numpy.fft.fftn. The inverse transform is normalized so that IFFT(FFT(X)) = X.

Parameters:
  • arr_t – an array-like defining the problem array.
  • axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes.

Note

The current algorithm works most effectively when the array dimensions are powers of 2. This mostly applies to the axes over which the transform is performed, because otherwise the computation falls back to Bluestein’s algorithm, which effectively halves the performance.

compiled_signature(output:o, input:i, inverse:s)

output and input may be the same array.

Parameters:
  • output – an array with the attributes of arr_t.
  • input – an array with the attributes of arr_t.
  • inverse – a scalar value castable to integer. If 0, output contains the forward FFT of input, if 1, the inverse one.
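
The meaning of the inverse flag and of the normalization can be illustrated with a naive O(N^2) discrete Fourier transform in plain Python. The sketch assumes the numpy.fft convention (negative exponent and no scaling for the forward transform, a 1/N factor for the inverse one), which is consistent with IFFT(FFT(X)) = X; dft_reference is a hypothetical name and this is not the algorithm Reikna runs on the device.

```python
import cmath


def dft_reference(values, inverse=0):
    """Naive 1D discrete Fourier transform; inverse=1 gives the inverse."""
    n = len(values)
    sign = 1 if inverse else -1
    result = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
            for m, v in enumerate(values))
        for k in range(n)
    ]
    if inverse:
        # All the normalization is applied on the inverse transform.
        result = [r / n for r in result]
    return result


x = [1, 2, 3, 4]
spectrum = dft_reference(x)
restored = dft_reference(spectrum, inverse=1)
print(spectrum[0])   # zero-frequency component: the sum of x
print(restored)      # x again, up to rounding errors
```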

FFT frequency shift

class reikna.fft.FFTShift(arr_t, axes=None)

Bases: Computation

Shift the zero-frequency component to the center of the spectrum. The interface is similar to numpy.fft.fftshift, and the output is the same for the same array shape and axes.

Parameters:
  • arr_t – an array-like defining the problem array.
  • axes – a tuple with axes over which to perform the shift. If not given, the shift is performed over all the axes.

compiled_signature(output:o, input:i)

output and input may be the same array.

Parameters:
  • output – an array with the attributes of arr_t.
  • input – an array with the attributes of arr_t.
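
For a single axis the shift is a cyclic rotation by half of the axis length, the same rule numpy.fft.fftshift uses. A plain-Python sketch for a 1D list follows; fftshift_1d is a hypothetical name.

```python
def fftshift_1d(values):
    """Rotate a list so that the zero-frequency element moves to the center."""
    n = len(values)
    split = n - n // 2  # index of the first element that moves to the front
    return values[split:] + values[:split]


# Frequency indices in FFT output order for n = 5 and n = 4.
print(fftshift_1d([0, 1, 2, -2, -1]))
print(fftshift_1d([0, 1, -2, -1]))
```

In both cases the frequencies come out in ascending order, with 0 at index n // 2.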

Discrete harmonic transform

reikna.dht.get_spatial_grid(modes, order, add_points=0)

Returns the spatial grid required to calculate the order power of a function defined in the harmonic mode space of the size modes. If add_points is 0, the grid has the minimum size required for exact transformation back to the mode space.

reikna.dht.harmonic(mode)

Returns an eigenfunction of order $n = \mathrm{mode}$ for the harmonic oscillator: $\phi_n = \frac{1}{\sqrt[4]{\pi} \sqrt{2^n n!}} H_n(x) \exp(-x^2/2)$, where $H_n$ is the $n$-th order “physicists’” Hermite polynomial. The normalization is chosen so that $\int \phi_n^2(x) \, dx = 1$.
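
The normalization can be verified numerically. The sketch below evaluates the eigenfunction in plain Python, using the standard three-term recurrence for the physicists' Hermite polynomials, and integrates its square with the trapezoidal rule. The names hermite, harmonic_value and integrate_square are hypothetical and independent of the library's own implementation.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def harmonic_value(n, x):
    """Value of the n-th harmonic oscillator eigenfunction at x."""
    norm = 1.0 / (math.pi ** 0.25 * math.sqrt(2.0 ** n * math.factorial(n)))
    return norm * hermite(n, x) * math.exp(-x * x / 2.0)


def integrate_square(n, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal estimate of the integral of phi_n(x)^2 over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.5 * (harmonic_value(n, lo) ** 2 + harmonic_value(n, hi) ** 2)
    total += sum(harmonic_value(n, lo + i * dx) ** 2 for i in range(1, steps))
    return total * dx


for n in range(4):
    print(n, integrate_square(n))  # each value should be close to 1
```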

class reikna.dht.DHT(mode_arr, add_points=None, inverse=False, order=1, axes=None)

Bases: Computation

Discrete transform to and from harmonic oscillator modes. With inverse=True, transforms a function defined by its expansion $C_m, m = 0 \ldots M-1$ in the mode space with mode functions from harmonic(), to the coordinate space ($F(x)$ on the grid $x$ from get_spatial_grid()). With inverse=False, it is guaranteed to recover the first $M$ modes of $F^k(x)$, where $k$ is the order parameter.

For multiple dimensions the operation is the same, and the mode functions are products of 1D mode functions, i.e. $\phi^{3D}_{l,m,n}(x, y, z) = \phi_l(x) \phi_m(y) \phi_n(z)$.

For the detailed description of the algorithm, see Dion & Cances, PRE 67(4) 046706 (2003).

Parameters:
  • mode_arr – an array-like object defining the shape of mode space. If inverse=False, its shape is used to define the mode space size.
  • inverse – False for forward (coordinate space -> mode space) transform, True for inverse (mode space -> coordinate space) transform.
  • axes – a tuple with axes over which to perform the transform. If not given, the transform is performed over all the axes.
  • order – if F is a function in mode space, the number of spatial points is chosen so that the transformation DHT[(DHT^{-1}[F])^order] could be performed.
  • add_points – a list of the same length as mode_arr shape, specifying the number of points in x-space to use in addition to minimally required (0 by default).

compiled_signature_forward(modes:o, coords:i)

compiled_signature_inverse(coords:o, modes:i)

Depending on inverse value, either of these two will be created.

Parameters:
  • modes – an array with the attributes of mode_arr.
  • coords – an array with the shape depending on mode_arr, axes, order and add_points, and the dtype of mode_arr.

Counter-based random number generators

This module is based on the paper by Salmon et al., P. Int. C. High. Perform. 16 (2011), and the source code of the Random123 library.

A counter-based random-number generator (CBRNG) is a parametrized function $f_k(c)$, where $k$ is the key, $c$ is the counter, and the function $f_k$ defines a bijection in the set of integer numbers. Applied to successive counters, the function produces a sequence of pseudo-random numbers. The key is an analogue of the seed of stateful RNGs; if the CBRNG is used to generate random numbers in parallel threads, the key is a combination of a seed and a unique thread number.

There are two types of generators available: threefry (uses a large number of simple functions) and philox (uses a smaller number of more complicated functions). The latter is generally faster on GPUs; see the paper above for detailed comparisons. These generators can be further specialized to use counters made of words=2 or words=4 words, each bitness=32 or bitness=64 bits wide. The period of the generator equals the cardinality of the set of possible counters. For example, if the counter consists of four 64-bit numbers, the period of the generator is $2^{256}$. As for the key size, for threefry the key has the same size as the counter, and for philox the key is half that size.
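
The size arithmetic in the paragraph above can be written out as a small helper. It only restates the rules given in the text (counter size = bitness × words; threefry key as large as the counter, philox key half of it); cbrng_sizes is a hypothetical name, not a Reikna function.

```python
def cbrng_sizes(name, bitness, words):
    """Return (counter_bits, key_bits) for a generator variant."""
    counter_bits = bitness * words
    if name == "threefry":
        key_bits = counter_bits
    elif name == "philox":
        key_bits = counter_bits // 2
    else:
        raise ValueError("unknown generator: " + name)
    return counter_bits, key_bits


counter_bits, key_bits = cbrng_sizes("threefry", 64, 4)
print(counter_bits, key_bits)
print(2 ** counter_bits)                # period of threefry-4x64
print(cbrng_sizes("philox", 32, 2))     # philox-2x32: 64-bit counter, 32-bit key
```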

The CBRNG class uses one of the words of the key to distinguish threads (except for philox-2x64, where 32 bits of the only word in the key are used); the rest are the same for all threads and are derived from the provided seed. This limits the maximum number of number-generating threads (size). philox-2x32 has a 32-bit key and therefore cannot be used in CBRNG (although it can be used separately with the help of the kernel API).

The CBRNG class itself is stateless, same as other computations in Reikna, so you have to manage the generator state yourself. The state is created by the create_counters() method and contains size counters. This state is then passed to, and updated by, a CBRNG object.
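
The idea of a keyed bijection with an externally managed counter can be demonstrated with a toy example. The sketch below uses a small Feistel network, which is a bijection for any round function; it is neither Threefry nor Philox, its constants are arbitrary, and it must not be used as a real generator. All names (toy_round, toy_bijection, generate) are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def toy_round(half, round_key):
    """Mix a 32-bit half with a round key (arbitrary constants)."""
    x = (half * 0x9E3779B1 + round_key) & MASK32
    return x ^ (x >> 15)


def toy_bijection(key, counter, rounds=4):
    """Keyed Feistel network on a 64-bit counter: a bijection for every key."""
    counter &= MASK64
    left, right = counter >> 32, counter & MASK32
    for r in range(rounds):
        left, right = right, left ^ toy_round(right, (key + r) & MASK32)
    return (left << 32) | right


def generate(key, counter, count):
    """Return `count` numbers and the new counter (the whole generator state)."""
    randoms = [toy_bijection(key, (counter + i) & MASK64) for i in range(count)]
    return randoms, (counter + count) & MASK64


# The caller owns the state: pass the counter in, keep the updated one.
first, state = generate(key=7, counter=0, count=3)
second, state = generate(key=7, counter=state, count=2)
whole, _ = generate(key=7, counter=0, count=5)
print(first + second == whole)
```

Because the function itself keeps no state, generating in two steps gives the same sequence as generating in one, provided the updated counter is passed back in.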

class reikna.cbrng.CBRNG(randoms_arr, generators_dim, sampler, seed=None)

Bases: Computation

Counter-based pseudo-random number generator class.

Parameters:
  • randoms_arr – an array intended for storing generated random numbers.
  • generators_dim – the number of dimensions (counting from the end) which will use independent generators. For example, if randoms_arr has the shape (100, 200, 300) and generators_dim is 2, then in every sub-array (j, :, :), j = 0 .. 99, every element will use an independent generator.
  • sampler – a Sampler object.
  • seed – None for a random seed, or an integer.

classmethod sampler_name(randoms_arr, generators_dim, sampler_kwds=None, seed=None)

A convenience constructor for the sampler sampler_name from samplers. The contents of the dictionary sampler_kwds will be passed to the sampler constructor function (with bijection being created automatically, and dtype taken from randoms_arr).

compiled_signature(counters:io, randoms:o)

Parameters:
  • counters – the RNG “state”. All attributes are equal to the ones of the result of create_counters().
  • randoms – generated random numbers. All attributes are equal to the ones of randoms_arr from the constructor.

create_counters()

Create a counter array for use in CBRNG.

Kernel API

class reikna.cbrng.bijections.Bijection(module, word_dtype, key_dtype, counter_dtype)

Contains a CBRNG bijection module and accompanying metadata. Supports __process_modules__ protocol.

word_dtype

The data type of the integer word used by the generator.

key_words

The number of words used by the key.

counter_words

The number of words used by the counter.

key_dtype

The numpy.dtype object representing a bijection key. Contains a single array field v with key_words elements of word_dtype.

counter_dtype

The numpy.dtype object representing a bijection counter. Contains a single array field v with counter_words elements of word_dtype.

raw_functions

A dictionary dtype:function_name of available functions function_name in module that produce a random full-range integer dtype from a State, advancing it. Available functions: get_raw_uint32(), get_raw_uint64().

module

The module containing the CBRNG function. It provides the C functions below.

COUNTER_WORDS

Contains the value of counter_words.

KEY_WORDS

Contains the value of key_words.

Word

Contains the type corresponding to word_dtype.

Key

Describes the bijection key. Alias for the structure generated from key_dtype.

Word v[KEY_WORDS]

Counter

Describes the bijection counter, or its output. Alias for the structure generated from counter_dtype.

Word v[COUNTER_WORDS]

Counter make_counter_from_int(int x)

Creates a counter object from an integer.

Counter bijection(Key key, Counter counter)

The main bijection function.

State

A structure containing the CBRNG state which is used by samplers.

State make_state(Key key, Counter counter)

Creates a new state object.

Counter get_next_unused_counter(State state)

Extracts a counter which has not been used in random sampling.

uint32

A type of unsigned 32-bit word, corresponds to numpy.uint32.

uint64

A type of unsigned 64-bit word, corresponds to numpy.uint64.

uint32 get_raw_uint32(State *state)

Returns uniformly distributed unsigned 32-bit word and updates the state.

uint64 get_raw_uint64(State *state)

Returns uniformly distributed unsigned 64-bit word and updates the state.

reikna.cbrng.bijections.philox(bitness, counter_words, rounds=10)

A CBRNG based on a low number of slow rounds (multiplications).

Parameters:
  • bitness – 32 or 64, corresponds to the size of generated random integers.
  • counter_words – 2 or 4, number of integers generated in one go.
  • rounds – 1 to 12; the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG.
Returns:

a Bijection object.

reikna.cbrng.bijections.threefry(bitness, counter_words, rounds=20)

A CBRNG based on a big number of fast rounds (bit rotations).

Parameters:
  • bitness – 32 or 64, corresponds to the size of generated random integers.
  • counter_words – 2 or 4, number of integers generated in one go.
  • rounds – 1 to 72; the more rounds, the better randomness is achieved. Default values are big enough to qualify as PRNG.
Returns:

a Bijection object.

class reikna.cbrng.samplers.Sampler(bijection, module, dtype, randoms_per_call=1, deterministic=False)

Contains a random distribution sampler module and accompanying metadata. Supports __process_modules__ protocol.

deterministic

If True, every sampled random number consumes the same amount of counters.

randoms_per_call

How many random numbers one call to sample creates.

dtype

The data type of one random value produced by the sampler.

module

The module containing the distribution sampling function. It provides the C functions below.

RANDOMS_PER_CALL

Contains the value of randoms_per_call.

Value

Contains the type corresponding to dtype.

Result

Describes the sampling result.

Value v[RANDOMS_PER_CALL]

Result sample(State *state)

Performs the sampling, updating the state.

reikna.cbrng.samplers.gamma(bijection, dtype, shape=1, scale=1)

Generates random numbers from the gamma distribution $P(x) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}$, where $k$ is shape, and $\theta$ is scale. Supported dtypes: float(32/64). Returns a Sampler object.

reikna.cbrng.samplers.normal_bm(bijection, dtype, mean=0, std=1)

Generates normally distributed random numbers with the mean mean and the standard deviation std using Box-Muller transform. Supported dtypes: float(32/64), complex(64/128). Produces two random numbers per call for real types and one number for complex types. Returns a Sampler object.

Note

In case of a complex dtype, std refers to the standard deviation of the complex numbers (same as numpy.std() returns), not real and imaginary components (which will be normally distributed with the standard deviation std / sqrt(2)). Consequently, while mean is of type dtype, std must be real.
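
The Box-Muller transform and the convention for complex numbers can be sketched in plain Python. This is an illustration of the standard transform and of the std / sqrt(2) rule stated in the note, not the sampler's device code; box_muller and box_muller_complex are hypothetical names, and Python's random module stands in for the counter-based generator.

```python
import math
import random


def box_muller(u1, u2, mean=0.0, std=1.0):
    """Turn two uniforms (u1 in (0, 1], u2 in [0, 1)) into two normal samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return (mean + std * radius * math.cos(angle),
            mean + std * radius * math.sin(angle))


def box_muller_complex(u1, u2, mean=0j, std=1.0):
    """One complex sample; each component has deviation std / sqrt(2)."""
    re, im = box_muller(u1, u2, 0.0, std / math.sqrt(2.0))
    return mean + complex(re, im)


rng = random.Random(0)
samples = [box_muller_complex(1.0 - rng.random(), rng.random(), std=2.0)
           for _ in range(20000)]
center = sum(samples) / len(samples)
spread = math.sqrt(sum(abs(z - center) ** 2 for z in samples) / len(samples))
print(spread)  # should be close to the requested std of 2.0
```

One pair of uniform numbers yields two real samples or one complex sample, which matches the "two numbers per call for real types and one for complex types" behaviour described above.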

reikna.cbrng.samplers.uniform_float(bijection, dtype, low=0, high=1)

Generates uniformly distributed floating-point numbers in the interval [low, high). Supported dtypes: float(32/64). A fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.uniform_integer(bijection, dtype, low, high=None)

Generates uniformly distributed integer numbers in the interval [low, high). If high is None, the interval is [0, low). Supported dtypes: any numpy integers. If the size of the interval is a power of 2, a fixed number of counters is used in each thread. Returns a Sampler object.

reikna.cbrng.samplers.vonmises(bijection, dtype, mu=0, kappa=1)

Generates random numbers from the von Mises distribution $P(x) = \frac{\exp(\kappa \cos(x - \mu))}{2 \pi I_0(\kappa)}$, where $\mu$ is the mode, $\kappa$ is the dispersion, and $I_0$ is the modified Bessel function of the first kind. Supported dtypes: float(32/64). Returns a Sampler object.

class reikna.cbrng.tools.KeyGenerator(module, base_key)

Contains a key generator module and accompanying metadata. Supports __process_modules__ protocol.

module

A module with the key generator function:

Key key_from_int(int idx)

Generates and returns a key, suitable for the bijection which was given to the constructor.

classmethod create(bijection, seed=None, reserve_id_space=True)

Creates a generator.

Parameters:
  • bijection – a Bijection object.
  • seed – an integer, or numpy array of 32-bit unsigned integers.
  • reserve_id_space – if True, the last 32 bit of the key will be reserved for the thread identifier. As a result, the total size of the key should be 64 bit or more. If False, the thread identifier will be just added to the key, which will still result in different keys for different threads, with the danger that different seeds produce the same sequences.

reference(idx)

Reference function that returns the key given the thread identifier. Uses the same algorithm as the module.
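
The trade-off behind reserve_id_space can be illustrated with a toy model in which a key is a tuple of 32-bit words. The word layout below is an assumption made for illustration only and may differ from what the library does internally; toy_key_from_int is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def toy_key_from_int(base_key, idx, reserve_id_space=True):
    """Combine seed-derived words with a thread identifier.

    base_key is a tuple of 32-bit words derived from the seed
    (hypothetical layout, for illustration only).
    """
    words = list(base_key)
    if reserve_id_space:
        # The last word holds nothing but the thread identifier.
        words[-1] = idx & MASK32
    else:
        # The thread identifier is added to seed-derived bits.
        words[-1] = (words[-1] + idx) & MASK32
    return tuple(words)


# Without reserved space, (seed 5, thread 1) and (seed 4, thread 2) collide.
print(toy_key_from_int((5,), 1, reserve_id_space=False))
print(toy_key_from_int((4,), 2, reserve_id_space=False))

# With reserved space (a key of at least 64 bits) they stay distinct.
print(toy_key_from_int((5, 0), 1))
print(toy_key_from_int((4, 0), 2))
```

In both modes different threads of the same seed get different keys; only the reserved layout keeps the keys of different seeds apart.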

Transformations

This module contains a number of pre-created transformations.

reikna.transformations.add_const(arr_t, param)

Returns an addition transformation with a fixed parameter (1 output, 1 input): output = input + param.

reikna.transformations.add_param(arr_t, param_dtype)

Returns an addition transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input + param.

reikna.transformations.broadcast_const(arr_t, val)

Returns a transformation that broadcasts the given constant to the array output (1 output): output = val.

reikna.transformations.broadcast_param(arr_t)

Returns a transformation that broadcasts the free parameter to the array output (1 output, 1 param): output = param.

reikna.transformations.combine_complex(output_arr_t)

Returns a transformation that joins two real inputs into complex output (1 output, 2 inputs): output = real + 1j * imag.

reikna.transformations.copy(arr_t, out_arr_t=None)

Returns an identity transformation (1 output, 1 input): output = input. Output array type out_arr_t may have different strides, but must have the same shape and data type.

reikna.transformations.div_const(arr_t, param)

Returns a scaling transformation with a fixed parameter (1 output, 1 input): output = input / param.

reikna.transformations.div_param(arr_t, param_dtype)

Returns a scaling transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input / param.

reikna.transformations.ignore(arr_t)

Returns a transformation that ignores the output it is attached to.

reikna.transformations.mul_const(arr_t, param)

Returns a scaling transformation with a fixed parameter (1 output, 1 input): output = input * param.

reikna.transformations.mul_param(arr_t, param_dtype)

Returns a scaling transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input * param.

reikna.transformations.norm_const(arr_t, order)

Returns a transformation that calculates the order-norm (1 output, 1 input): output = abs(input) ** order.

reikna.transformations.norm_param(arr_t)

Returns a transformation that calculates the order-norm (1 output, 1 input, 1 param): output = abs(input) ** order.

reikna.transformations.split_complex(input_arr_t)

Returns a transformation that splits complex input into two real outputs (2 outputs, 1 input): real = Re(input), imag = Im(input).
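
The element-wise formulas given above can be checked with plain-Python references operating on lists. These helpers only mirror the stated formulas on the CPU; they are hypothetical, do not create Transformation objects, and carry a _ref suffix to avoid confusion with the library functions.

```python
def norm_const_ref(values, order):
    """output = abs(input) ** order, element by element."""
    return [abs(v) ** order for v in values]


def split_complex_ref(values):
    """real = Re(input), imag = Im(input)."""
    return [v.real for v in values], [v.imag for v in values]


def combine_complex_ref(real, imag):
    """output = real + 1j * imag."""
    return [r + 1j * i for r, i in zip(real, imag)]


data = [3 + 4j, -2 + 0j]
real, imag = split_complex_ref(data)
print(real, imag)
print(combine_complex_ref(real, imag) == data)  # splitting then combining
print(norm_const_ref(data, 2))                  # squared magnitudes
```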

 

 
