This module contains information about the library version.
reikna.version.
version
A tuple with version numbers, major components first.
reikna.version.full_version
A string fully identifying the current build.
reikna.version.
git_revision
A string with Git SHA identifying the revision used to create this build.
reikna.version.
release
A boolean variable, equals True
if current version is a release version.
This module contains various auxiliary functions which are used throughout the library.
reikna.helpers.
bounding_power_of_2
(num)
Returns the minimal number of the form 2**m
such that it is greater than or equal to num
.
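As a rough illustration of this definition, the following pure-Python sketch computes the same quantity for positive integers. The name bounding_power_of_2_sketch is hypothetical, and the code is not taken from the library.

```python
def bounding_power_of_2_sketch(num):
    """Return the smallest power of two that is >= num (assumes num >= 1)."""
    result = 1
    while result < num:
        result *= 2
    return result


print([bounding_power_of_2_sketch(n) for n in (1, 3, 8, 9)])
```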
reikna.helpers.
default_strides
(shape, itemsize)
Return the default strides (corresponding to a contiguous array) for an array of shape shape
and elements of size itemsize
bytes.
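The strides of a contiguous array can be worked out by hand. The sketch below assumes row-major (C) ordering, as used by numpy by default; default_strides_sketch is a hypothetical name, not the library function.

```python
def default_strides_sketch(shape, itemsize):
    """Byte strides of a C-contiguous array with the given shape."""
    strides = []
    step = itemsize  # the innermost axis advances by one element
    for length in reversed(shape):
        strides.append(step)
        step *= length  # each outer axis skips a whole inner block
    return tuple(reversed(strides))


print(default_strides_sketch((2, 3, 4), 4))
```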
reikna.helpers.
factors
(num, limit=None)
Returns the list of pairs (factor, num/factor)
for all factors of num
(including 1 and num
), sorted by factor
. If limit
is set, only pairs with factor <= limit
are returned.
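A minimal sketch of the behavior described above, written as a brute-force search; factors_sketch is a hypothetical name and the library may compute the list differently.

```python
def factors_sketch(num, limit=None):
    """List (factor, num // factor) pairs sorted by factor, optionally capped."""
    if limit is None or limit > num:
        limit = num
    return [(f, num // f) for f in range(1, limit + 1) if num % f == 0]


print(factors_sketch(12))
print(factors_sketch(12, limit=4))
```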
class reikna.helpers.
ignore_integer_overflow
Context manager for ignoring integer overflow in numpy operations on scalars (not ignored by default because of a bug in numpy).
reikna.helpers.
log2
(num)
Integer-valued logarithm with base 2. If num
is not a power of 2, the result is rounded down.
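The sketch below shows one way to obtain such an integer logarithm, assuming that the rounding is toward the smaller integer (floor). log2_sketch is a hypothetical name.

```python
def log2_sketch(num):
    """Floor of the base-2 logarithm of a positive integer."""
    # bit_length() is the position of the highest set bit, counted from 1
    return num.bit_length() - 1


print([log2_sketch(n) for n in (1, 8, 9, 16)])
```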
reikna.helpers.
make_axes_innermost
(ndim, axes)
Given the total number of array axes and a list of axes in this range, produce a transposition plan (suitable e.g. for numpy.transpose()
) that will make the given axes innermost (in the order they’re given). Returns the transposition plan, and the plan to transpose the resulting array back to the original axes order.
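The two plans are permutations that are inverse to each other. The following sketch builds them under the assumption that the remaining (outer) axes keep their original relative order; make_axes_innermost_sketch is a hypothetical name.

```python
def make_axes_innermost_sketch(ndim, axes):
    """Return (plan, inverse plan) that moves `axes` to the innermost positions."""
    outer = [i for i in range(ndim) if i not in axes]
    to_inner = tuple(outer + list(axes))
    # Inverse permutation: where each original axis ended up
    from_inner = tuple(to_inner.index(i) for i in range(ndim))
    return to_inner, from_inner


to_inner, from_inner = make_axes_innermost_sketch(4, (2, 0))
print(to_inner, from_inner)
```

Both tuples are in the form expected by the axes argument of numpy.transpose(); applying one and then the other restores the original axis order.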
reikna.helpers.
min_blocks
(length, block)
Returns minimum number of blocks with length block
necessary to cover the array with length length
.
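This is a ceiling division, which can be sketched with integer arithmetic only. min_blocks_sketch is a hypothetical name.

```python
def min_blocks_sketch(length, block):
    """Number of blocks of size `block` needed to cover `length` elements."""
    return (length + block - 1) // block


print(min_blocks_sketch(10, 4), min_blocks_sketch(8, 4))
```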
reikna.helpers.
min_buffer_size
(shape, itemsize, strides=None, offset=0)
Return the minimum memory buffer size (in bytes) that can fit an array with given parameters, starting at an offset
bytes from the beginning of the buffer.
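One way to arrive at such a size is to locate the last byte of the last element. The sketch below assumes non-negative strides and a non-empty array, and falls back to C-contiguous strides when none are given; min_buffer_size_sketch is a hypothetical name.

```python
def min_buffer_size_sketch(shape, itemsize, strides=None, offset=0):
    """Smallest buffer (in bytes) holding the array placed at `offset`."""
    if strides is None:
        # Contiguous (row-major) strides
        strides, step = [], itemsize
        for length in reversed(shape):
            strides.insert(0, step)
            step *= length
    # Byte position of the last element relative to the first one
    last = sum((length - 1) * stride for length, stride in zip(shape, strides))
    return offset + last + itemsize


print(min_buffer_size_sketch((2, 3), 4))
print(min_buffer_size_sketch((2, 3), 4, strides=(16, 4), offset=8))
```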
reikna.helpers.
normalize_axes
(ndim, axes)
Transform an iterable of array axes (which can be negative) or a single axis into a tuple of non-negative axes.
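A small sketch of this normalization, assuming that negative axes count from the end as in numpy; normalize_axes_sketch is a hypothetical name.

```python
def normalize_axes_sketch(ndim, axes):
    """Turn a single axis or an iterable of axes into non-negative axes."""
    if isinstance(axes, int):
        axes = (axes,)
    return tuple(axis + ndim if axis < 0 else axis for axis in axes)


print(normalize_axes_sketch(3, -1), normalize_axes_sketch(3, (0, -2)))
```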
reikna.helpers.
padded_buffer_parameters
(shape, itemsize, pad=0)
For an array of shape shape
, padded from all sides with pad
elements, return a tuple of (strides, offset, size (in bytes) of the required memory buffer), which would have to be requested when allocating such an array.
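The sketch below shows how such parameters could be derived. It assumes a scalar pad applied on both sides of every dimension and a contiguous layout of the padded array; these are assumptions for illustration, and padded_buffer_parameters_sketch is a hypothetical name.

```python
def padded_buffer_parameters_sketch(shape, itemsize, pad=0):
    """Return (strides, offset, size) for an array padded by `pad` elements."""
    padded_shape = [length + 2 * pad for length in shape]
    strides, step = [], itemsize
    for length in reversed(padded_shape):
        strides.insert(0, step)
        step *= length
    # The first real element sits `pad` elements in along every axis
    offset = sum(pad * stride for stride in strides)
    size = step  # itemsize times the number of padded elements
    return tuple(strides), offset, size


print(padded_buffer_parameters_sketch((2, 3), 4, pad=1))
```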
reikna.helpers.
product
(seq)
Returns the product of elements in the iterable seq
.
reikna.helpers.
template_def
(signature, code)
Returns a Mako
template with the given signature
.
Parameters: | signature – a list of positional argument names, or a Signature object from funcsigs module. |
---|---|
Code: | a body of the template. |
reikna.helpers.
template_for
(filename)
Returns the Mako template object created from the file which has the same name as filename
and the extension .mako
. Typically used in computation modules as template_for(__file__)
.
reikna.helpers.
template_from
(template)
Creates a Mako template object from a given string. If template
already has render()
method, does nothing.
reikna.helpers.
wrap_in_tuple
(seq_or_elem)
If seq_or_elem
is a sequence, converts it to a tuple
, otherwise returns a tuple with a single element seq_or_elem
.
CLUDA is the foundation of reikna
. It provides unified access to basic features of CUDA
and OpenCL
, such as memory operations, compilation and so on. It can also be used by itself, if you want to write GPU API-independent programs and happen to only need a small subset of GPU API. The terminology is borrowed from OpenCL
, since it is a more general API.
class reikna.cluda.
Module
(template_src, render_kwds=None)
Contains a CLUDA module. See Tutorial: modules and snippets for details.
classmethod create
(func_or_str, render_kwds=None)
Creates a module from the Mako
def:
if func_or_str
is a function, then the def has the same signature as func_or_str
(prefix will be passed as the first positional parameter), and the body equal to the string it returns;
if func_or_str
is a string, then the def has a single positional argument prefix
and the body equal to that string.
exception reikna.cluda.OutOfResourcesError
Thrown by compile_static()
if the provided local_size
is too big, or one cannot be found.
class reikna.cluda.
Snippet
(template_src, render_kwds=None)
Contains a CLUDA snippet. See Tutorial: modules and snippets for details.
classmethod create
(func_or_str, render_kwds=None)
Creates a snippet from the Mako
def:
if func_or_str
is a function, then the def has the same signature as func_or_str
, and the body equal to the string it returns;
if func_or_str
is a string, then the def has an empty signature.
reikna.cluda.any_api
()
Returns one of the API modules supported by the system or raises an Exception
if there are not any.
reikna.cluda.api_ids
()
Returns a list of identifiers for all known (not necessarily available for the current system) APIs.
reikna.cluda.cuda_api
()
Returns the PyCUDA
-based API module.
reikna.cluda.cuda_id
()
Returns the identifier of the PyCUDA
-based API.
reikna.cluda.
find_devices
(api, include_devices=None, exclude_devices=None, include_platforms=None, exclude_platforms=None, include_duplicate_devices=True, include_pure_only=False)
Find platforms and devices meeting certain criteria.
Parameters: |
|
---|---|
Returns: | a dictionary with found platform numbers as keys, and lists of device numbers as values. |
reikna.cluda.
get_api
(api_id)
Returns an API module with the generalized interface reikna.cluda.api
for the given identifier.
reikna.cluda.
ocl_api
()
Returns the PyOpenCL
-based API module.
reikna.cluda.
ocl_id
()
Returns the identifier of the PyOpenCL
-based API.
reikna.cluda.
supported_api_ids
()
Returns a list of identifiers of supported APIs.
reikna.cluda.
supports_api
(api_id)
Returns True
if given API is supported.
Modules for all APIs have the same generalized interface. It is referred to here (and in references from other parts of this documentation) as reikna.cluda.api
.
class reikna.cluda.api.
Buffer
Low-level untyped memory allocation. Actual class depends on the API: pycuda.driver.DeviceAllocation
for CUDA
and pyopencl.Buffer
for OpenCL
.
size
class reikna.cluda.api.
Array
A superclass of the corresponding API’s native array (pycuda.gpuarray.GPUArray
for CUDA
and pyopencl.array.Array
for OpenCL
), with some additional functionality.
shape
dtype
strides
offset
The start of the array data in the memory buffer (in bytes).
base_data
The memory buffer where the array is located.
nbytes
The total size of the array data plus the offset (in bytes).
get
()
Returns numpy.ndarray
with the contents of the array. Synchronizes the parent Thread
.
thread
The Thread
object for which the array was created.
class reikna.cluda.api.
DeviceParameters
(device)
An assembly of device parameters necessary for optimizations.
api_id
Identifier of the API this device belongs to.
max_work_group_size
Maximum block size for kernels.
max_work_item_sizes
List with maximum local_size for each dimension.
max_num_groups
List with maximum number of workgroups for each dimension.
warp_size
Warp size (NVIDIA), wavefront size (AMD), or SIMD width. This is supposed to be the number of threads that are executed simultaneously on the same computation unit (so you can assume that they are perfectly synchronized).
local_mem_banks
The number of local (shared in CUDA) memory banks, that is, the number of successive 32-bit words you can access without getting bank conflicts.
local_mem_size
Size of the local (shared in CUDA) memory per workgroup, in bytes.
min_mem_coalesce_width
Dictionary {word_size:elements}
, where elements
is the number of elements with size word_size
in global memory that allow coalesced access.
compute_units
The value of MULTIPROCESSOR_COUNT
in CUDA and MAX_COMPUTE_UNITS
in OpenCL.
supports_dtype
(self, dtype)
Checks if given numpy
dtype can be used in kernels compiled using this thread.
class reikna.cluda.api.
Platform
A vendor-specific implementation of the GPGPU API.
name
Platform name.
vendor
Vendor name.
version
Platform version.
get_devices
()
Returns a list of device objects available in the platform.
class reikna.cluda.api.
Kernel
(thr, program, name, static=False)
An object containing a GPU kernel.
max_work_group_size
Maximum size of the work group for the kernel.
__call__
(*args, **kwds)
A shortcut for successive calls to prepare()
and prepared_call()
. In case of the OpenCL backend, returns a pyopencl.Event
object.
prepare
(global_size, local_size=None, local_mem=0)
Prepare the kernel for execution with given parameters.
prepared_call
(*args)
Execute the kernel. Array
objects are allowed as arguments. In case of the OpenCL backend, returns a pyopencl.Event
object.
set_constant
(name, arr)
Load a constant array (arr
can be either numpy
array or an Array
object) corresponding to the symbol name
to device. Note that all the kernels belonging to the same Program
object share constant arrays.
class reikna.cluda.api.
Program
(thr, src, static=False, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)
An object with compiled GPU code.
source
Contains module source code.
kernel_name
Contains Kernel
object for the kernel kernel_name
.
set_constant
(name, arr)
Load a constant array (arr
can be either numpy
array or an Array
object) corresponding to the symbol name
to device.
class reikna.cluda.api.
StaticKernel
(thr, template_src, name, global_size, local_size=None, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)
An object containing a GPU kernel with fixed call sizes.
source
Contains the source code of the program.
__call__
(*args)
Execute the kernel. Array
objects are allowed as arguments. In case of the OpenCL backend, returns a pyopencl.Event
object.
set_constant
(name, arr)
Load a constant array (arr
can be either numpy
array or an Array
object) corresponding to the symbol name
to device.
class reikna.cluda.api.
Thread
(cqd, async_=True, temp_alloc=None)
Wraps an existing context in the CLUDA thread object.
Note
If you are using CUDA
API, you must keep in mind the stateful nature of CUDA calls. Briefly, this means that there is a context stack, with the current context on top of it. When create()
is called, the PyCUDA
context gets pushed to the stack and made current. When the thread object goes out of scope (and the thread object owns it), the context is popped, and it is the user’s responsibility to make sure the popped context is the correct one. In simple single-context programs this only means that one should avoid reference cycles involving the thread object.
Warning
Do not pass one Stream
/CommandQueue
object to several Thread
objects.
api
Module object representing the CLUDA API corresponding to this Thread
.
device_params
Instance of DeviceParameters
class for this thread’s device.
temp_alloc
Instance of TemporaryManager
which handles allocations of temporary arrays (see temp_array()
).
allocate
(size)
Creates an untyped memory allocation object of type Buffer
with size size
.
array
(shape, dtype, strides=None, offset=0, nbytes=None, base=None, base_data=None, allocator=None)
Creates an Array
on GPU with given shape
, dtype
, strides
and offset
.
If nbytes
is None
, the size of the allocated memory buffer is chosen to be the minimum one to fit all the elements of the array, based on shape
, dtype
and strides
(if provided). If offset
is not 0, an additional offset
bytes is added at the beginning of the buffer.
Note
Reikna computations (including the template functions load_idx()
, store_idx()
etc), high-level PyCUDA/PyOpenCL functions and PyCUDA kernels take offset
into account automatically and address arrays starting from the position of the actual data. Reikna kernels (created with compile()
and compile_static()
) and PyOpenCL kernels receive base addresses of arrays, and thus have to add offsets manually.
If base
, base_data
and nbytes
are None
, the total allocated size will be the minimum size required for the array data (based on shape
and strides
) plus offset
.
If base
and base_data
are None
, but nbytes
is not, nbytes
bytes will be allocated for the array (this includes the offset).
If base_data
(a memory buffer) is not None
, it will be used as the underlying buffer for the array, with the actual data starting at the offset
bytes from the beginning of base_data
. No size check is performed to make sure that the array and the offset fit into it.
If base
(an Array
object) is not None
, its buffer is used as the underlying buffer for the array, with the actual data starting at the offset
bytes from the beginning of base.base_data
. base_data
will be ignored.
Optionally, an allocator
is a callable returning any object castable to int
representing the physical address on the device (for instance, Buffer
).
compile
(template_src, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)
Creates a module object from the given template.
Parameters: |
|
---|---|
Returns: | a |
compile_static
(template_src, name, global_size, local_size=None, render_args=None, render_kwds=None, fast_math=False, compiler_options=None, constant_arrays=None, keep=False)
Creates a kernel object with fixed call sizes, which makes it possible to overcome some backend limitations. Global and local sizes can have any length, provided that len(global_size) >= len(local_size)
, and the total number of work items and work groups is less than the corresponding total number available for the device. In order to get IDs and sizes in such kernels, virtual size functions have to be used (see VIRTUAL_SKIP_THREADS
and others for details).
The rest of the keyword parameters are the same as for compile()
.
Returns: | a StaticKernel object. |
---|
copy_array
(arr, dest=None, src_offset=0, dest_offset=0, size=None)
Copies an array on the device.
classmethod create
(interactive=False, device_filters=None, **thread_kwds)
Creates a new Thread
object with its own context and queue inside. Intended for cases when you want to base your whole program on CLUDA.
empty_like
(arr)
Allocates an array on GPU with the same attributes (shape
, dtype
, strides
, offset
and nbytes
) as arr
.
Warning
Note that pycuda.gpuarray.GPUArray
objects do not have the offset
attribute.
from_device
(arr, dest=None, async_=False)
Transfers the contents of arr
to a numpy.ndarray
object. The effect of dest
parameter is the same as in to_device()
. If async_
is True
, the transfer is asynchronous (the thread-wide asynchronicity setting does not apply here).
Alternatively, one can use Array.get()
.
release
()
Forcefully free critical resources (rendering the object unusable). In most cases you can rely on the garbage collector taking care of things. Calling this method explicitly may be necessary in case of CUDA API when you want to make sure the context got popped.
synchronize
()
Forcefully synchronize this thread with the main program.
temp_array
(shape, dtype, strides=None, offset=0, nbytes=None, dependencies=None)
Creates an Array
on GPU with given shape
, dtype
, strides
, offset
and nbytes
(see array()
for details). In order to reduce the memory footprint of the program, the temporary array manager will allow these arrays to overlap. Two arrays will not overlap, if one of them was specified in dependencies
for the other one. For a list of values dependencies
takes, see the reference entry for TemporaryManager
.
to_device
(arr, dest=None)
Copies an array to the device memory. If dest
is specified, it is used as the destination, and the method returns None
. Otherwise the destination array is created internally and returned from the method.
reikna.cluda.api.
get_id
()
Returns the identifier of this API.
reikna.cluda.api.
get_platforms
()
Returns a list of available Platform
objects. In case of OpenCL
returned objects are actually instances of pyopencl.Platform
.
Each Thread
contains a special allocator for arrays with data that does not have to be persistent all the time. In many cases you only want some array to keep its contents between several kernel calls. This can be achieved by manually allocating and deallocating such arrays every time, but it slows the program down, and you have to synchronize the queue because allocation commands are not serialized. Therefore it is advantageous to use temp_array()
method to get such arrays. It takes a list of dependencies as an optional parameter which gives the allocator a hint about which arrays should not use the same physical allocation.
class reikna.cluda.tempalloc.
TemporaryManager
(thr, pack_on_alloc=False, pack_on_free=False)
Base class for a manager of temporary allocations.
array
(shape, dtype, strides=None, offset=0, nbytes=None, dependencies=None)
Returns a temporary array.
pack
()
Packs the real allocations possibly reducing total memory usage. This process can be slow.
class reikna.cluda.tempalloc.
TrivialManager
(*args, **kwds)
Trivial manager — allocates a separate buffer for each allocation request.
class reikna.cluda.tempalloc.
ZeroOffsetManager
(*args, **kwds)
Tries to assign several allocation requests to a single real allocation, if dependencies allow that. All virtual allocations start from the beginning of real allocations.
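To illustrate the idea of sharing real allocations between requests that do not depend on each other, here is a greedy sketch. It is not the library's algorithm; the function assign_to_buffers, the request format and the names in the example are all hypothetical.

```python
def assign_to_buffers(requests):
    """Greedily place (name, size, dependencies) requests into shared buffers.

    Two requests never share a buffer if either one lists the other
    as a dependency. Every virtual allocation starts at offset zero.
    """
    conflicts = {name: set(deps) for name, _, deps in requests}
    for name, _, deps in requests:
        for dep in deps:  # make the relation symmetric
            conflicts.setdefault(dep, set()).add(name)

    buffers, assignment = [], {}
    for name, size, _ in requests:
        for index, buf in enumerate(buffers):
            if not (buf["users"] & conflicts[name]):
                buf["users"].add(name)
                buf["size"] = max(buf["size"], size)
                assignment[name] = index
                break
        else:
            buffers.append({"size": size, "users": {name}})
            assignment[name] = len(buffers) - 1
    return assignment, [buf["size"] for buf in buffers]


assignment, sizes = assign_to_buffers(
    [("a", 100, []), ("b", 200, ["a"]), ("c", 50, [])]
)
print(assignment, sizes)
```

In this example "a" and "c" share one real allocation, so 300 bytes are reserved in total, compared with 350 bytes if every request received a separate buffer.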
This module contains Module
factories which are used to compensate for the lack of complex number operations in OpenCL, and the lack of C++ syntax which would allow one to write them.
reikna.cluda.functions.
add
(*in_dtypes, out_dtype=None)
Returns a Module
with a function of len(in_dtypes)
arguments that adds values of types in_dtypes
. If out_dtype
is given, it will be set as a return type for this function.
This is necessary since on some platforms the +
operator for a complex and a real number works in an unexpected way (returning (a.x + b, a.y + b)
instead of (a.x + b, a.y)
).
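The difference between the two results can be seen in a host-side sketch that models a complex number as an (x, y) pair. Both function names are hypothetical and only illustrate the arithmetic, not the generated device code.

```python
def add_complex_real(a, b):
    """Expected sum of a complex pair a = (x, y) and a real number b."""
    return (a[0] + b, a[1])


def add_componentwise(a, b):
    """The unexpected result: b is added to both components."""
    return (a[0] + b, a[1] + b)


a, b = (1.0, 2.0), 3.0
print(add_complex_real(a, b), add_componentwise(a, b))
```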
reikna.cluda.functions.
cast
(out_dtype, in_dtype)
Returns a Module
with a function of one argument that casts values of in_dtype
to out_dtype
.
reikna.cluda.functions.
conj
(dtype)
Returns a Module
with a function of one argument that conjugates the value of type dtype
(must be a complex data type).
reikna.cluda.functions.
div
(in_dtype1, in_dtype2, out_dtype=None)
Returns a Module
with a function of two arguments that divides values of in_dtype1
and in_dtype2
. If out_dtype
is given, it will be set as a return type for this function.
reikna.cluda.functions.
exp
(dtype)
Returns a Module
with a function of one argument that exponentiates the value of type dtype
(must be a real or complex data type).
reikna.cluda.functions.
mul
(*in_dtypes, out_dtype=None)
Returns a Module
with a function of len(in_dtypes)
arguments that multiplies values of types in_dtypes
. If out_dtype
is given, it will be set as a return type for this function.
reikna.cluda.functions.
norm
(dtype)
Returns a Module
with a function of one argument that returns the 2-norm of the value of type dtype
(product by the complex conjugate if the value is complex, square otherwise).
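For reference, the quantity described here can be reproduced on the host with Python's built-in numeric types. norm_sketch is a hypothetical name used only for this illustration.

```python
def norm_sketch(value):
    """Product with the complex conjugate for complex values, square otherwise."""
    if isinstance(value, complex):
        return (value * value.conjugate()).real
    return value * value


print(norm_sketch(3 + 4j), norm_sketch(-3.0))
```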
reikna.cluda.functions.
polar
(dtype)
Returns a Module
with a function of two arguments that returns the complex-valued rho * exp(i * theta)
for values rho, theta
of type dtype
(must be a real data type).
reikna.cluda.functions.
polar_unit
(dtype)
Returns a Module
with a function of one argument that returns a complex number (cos(theta), sin(theta))
for a value theta
of type dtype
(must be a real data type).
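The values produced by polar and polar_unit can be checked on the host with the standard library. The sketch below uses hypothetical function names and compares the result with cmath.rect, which computes the same quantity.

```python
import cmath
import math


def polar_unit_sketch(theta):
    """Complex number (cos(theta), sin(theta)) on the unit circle."""
    return complex(math.cos(theta), math.sin(theta))


def polar_sketch(rho, theta):
    """Complex number rho * exp(i * theta)."""
    return rho * polar_unit_sketch(theta)


value = polar_sketch(2.0, math.pi / 2)
print(value, cmath.rect(2.0, math.pi / 2))
```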
reikna.cluda.functions.
pow
(dtype, exponent_dtype=None, output_dtype=None)
Returns a Module
with a function of two arguments that raises the first argument of type dtype
to the power of the second argument of type exponent_dtype
(an integer or real data type). If exponent_dtype
or output_dtype
are not given, they default to dtype
. If dtype
is not the same as output_dtype
, the input is cast to output_dtype
before exponentiation. If exponent_dtype
is real, but both dtype
and output_dtype
are integer, a ValueError
is raised.
The facilities available to a kernel passed for compilation consist of two parts.
First, there are several objects available at the template rendering stage, namely numpy
, reikna.cluda.dtypes
(as dtypes
), and reikna.helpers
(as helpers
).
Second, there is a set of macros attached to any kernel depending on the API it is being compiled for:
CUDA
If defined, specifies that the kernel is being compiled for CUDA API.
COMPILE_FAST_MATH
If defined, specifies that the compilation for this kernel was requested with fast_math == True
.
LOCAL_BARRIER
Synchronizes threads inside a block.
WITHIN_KERNEL
Modifier for a device-only function declaration.
KERNEL
Modifier for a kernel function declaration.
GLOBAL_MEM
Modifier for a global memory pointer argument.
LOCAL_MEM
Modifier for a statically allocated local memory variable.
LOCAL_MEM_DYNAMIC
Modifier for a dynamically allocated local memory variable.
LOCAL_MEM_ARG
Modifier for a local memory argument in device-only functions.
CONSTANT_MEM
Modifier for a statically allocated constant memory variable.
CONSTANT_MEM_ARG
Modifier for a constant memory argument in device-only functions.
INLINE
Modifier for inline functions.
SIZE_T
The type of local/global IDs and sizes. Equal to unsigned int
for CUDA, and size_t
for OpenCL (which can be 32- or 64-bit unsigned integer, depending on the device).
SIZE_T get_local_id
(int dim)
SIZE_T get_group_id
(int dim)
SIZE_T get_global_id
(int dim)
SIZE_T get_local_size
(int dim)
SIZE_T get_num_groups
(int dim)
SIZE_T get_global_size
(int dim)
Local, group and global identifiers and sizes. In case of CUDA, they mimic the behavior of the corresponding OpenCL functions.
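The relation between these quantities along one dimension follows the OpenCL convention and can be sketched in plain Python. The sketch assumes a zero global offset, and the function names are hypothetical.

```python
def global_id(group_id, local_size, local_id):
    """Global ID of a work item from its group ID and its ID inside the group."""
    return group_id * local_size + local_id


def global_size(num_groups, local_size):
    """Total number of work items along one dimension."""
    return num_groups * local_size


# Third work item (local ID 2) of the second group (group ID 1), groups of 64
print(global_id(1, 64, 2), global_size(4, 64))
```

In CUDA terms this corresponds to blockIdx * blockDim + threadIdx and gridDim * blockDim for the chosen dimension.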
VSIZE_T
The type of local/global IDs in the virtual grid. It is separate from SIZE_T
because SIZE_T is intended to be equivalent to what the backend is using, while VSIZE_T
is a separate type and can be made larger than SIZE_T
in the future if necessary.
ALIGN
(int)
Used to specify an explicit alignment (in bytes) for fields in structures, as
typedef struct {
char ALIGN(4) a;
int b;
} MY_STRUCT;
VIRTUAL_SKIP_THREADS
This macro should start any kernel compiled with compile_static()
. It skips all the empty threads resulting from fitting call parameters into backend limitations.
VSIZE_T virtual_local_id
(int dim)
VSIZE_T virtual_group_id
(int dim)
VSIZE_T virtual_global_id
(int dim)
VSIZE_T virtual_local_size
(int dim)
VSIZE_T virtual_num_groups
(int dim)
VSIZE_T virtual_global_size
(int dim)
VSIZE_T virtual_global_flat_id
()
VSIZE_T virtual_global_flat_size
()
Only available in StaticKernel
objects obtained from compile_static()
. Since its dimensions can differ from actual call dimensions, these functions have to be used.
This module contains various convenience functions which operate with numpy.dtype
objects.
reikna.cluda.dtypes.
align
(dtype)
Returns a new struct dtype with the field offsets changed to the ones a compiler would use (without being given any explicit alignment qualifiers). Ignores all existing explicit itemsizes and offsets.
reikna.cluda.dtypes.
c_constant
(val, dtype=None)
Returns a C-style numerical constant. If val
has a struct dtype, the generated constant will have the form { ... }
and can be used as an initializer for a variable.
reikna.cluda.dtypes.
c_path
(path)
Returns a string corresponding to the path
to a struct element in C. The path
is the sequence of field names/array indices returned from flatten_dtype()
.
reikna.cluda.dtypes.
cast
(dtype)
Returns function that takes one argument and casts it to dtype
.
reikna.cluda.dtypes.
complex_ctr
(dtype)
Returns name of the constructor for the given dtype
.
reikna.cluda.dtypes.
complex_for
(dtype)
Returns complex dtype corresponding to given floating point dtype
.
reikna.cluda.dtypes.
ctype
(dtype)
For a built-in C type, returns a string with the name of the type.
reikna.cluda.dtypes.
ctype_module
(dtype, ignore_alignment=False)
For a struct type, returns a Module
object with the typedef
of a struct corresponding to the given dtype
(with its name set to the module prefix); falls back to ctype()
otherwise.
The structure definition includes the alignment required to produce field offsets specified in dtype
; therefore, dtype
must be either a simple type, or have proper offsets and dtypes (the ones that can be reproduced in C using explicit alignment attributes, but without additional padding) and the attribute isalignedstruct == True
. An aligned dtype can be produced either by standard means (aligned
flag in numpy.dtype
constructor and explicit offsets and itemsizes), or created out of an arbitrary dtype with the help of align()
.
If ignore_alignment
is True, all of the above is ignored. The C structures produced will not have any explicit alignment modifiers. As a result, the field offsets of dtype
may differ from the ones chosen by the compiler.
Modules are cached and the function returns a single module instance for equal dtype
’s. Therefore inside a kernel it will be rendered with the same prefix everywhere it is used. This results in a behavior characteristic for a structural type system, same as for the basic dtype-ctype conversion.
Warning
As of numpy
1.8, the isalignedstruct
attribute is not enough to ensure a mapping between a dtype and a C struct with only the fields that are present in the dtype. Therefore, ctype_module
will make some additional checks and raise ValueError
if it is not the case.
reikna.cluda.dtypes.
detect_type
(val)
Find out the data type of val
.
reikna.cluda.dtypes.
extract_field
(arr, path)
Extracts an element from an array of struct dtype. The path
is the sequence of field names/array indices returned from flatten_dtype()
.
reikna.cluda.dtypes.
flatten_dtype
(dtype)
Returns a list of tuples (path, dtype)
for each of the basic dtypes in a (possibly nested) dtype
. path
is a list of field names/array indices leading to the corresponding element.
reikna.cluda.dtypes.
is_complex
(dtype)
Returns True
if dtype
is complex.
reikna.cluda.dtypes.
is_double
(dtype)
Returns True
if dtype
is double precision floating point.
reikna.cluda.dtypes.
is_integer
(dtype)
Returns True
if dtype
is an integer.
reikna.cluda.dtypes.
is_real
(dtype)
Returns True
if dtype
is real.
reikna.cluda.dtypes.
min_scalar_type
(val)
Wrapper for numpy.min_scalar_type
which takes into account types supported by GPUs.
reikna.cluda.dtypes.
normalize_type
(dtype)
Function for wrapping all dtypes coming from the user. numpy
uses two different classes to represent dtypes, and one of them does not have some important attributes.
reikna.cluda.dtypes.
normalize_types
(dtypes)
Same as normalize_type()
, but operates on a list of dtypes.
reikna.cluda.dtypes.
real_for
(dtype)
Returns floating point dtype corresponding to given complex dtype
.
reikna.cluda.dtypes.
result_type
(*dtypes)
Wrapper for numpy.result_type
which takes into account types supported by GPUs.
reikna.cluda.dtypes.
zero_ctr
(dtype)
Returns the string with constructed zero value for the given dtype
.
Classes necessary to create computations and transformations are exposed from the core
module.
class reikna.core.
Type
(dtype, shape=None, strides=None, offset=0, nbytes=None)
Represents an array or, as a degenerate case, scalar type of a computation parameter.
shape
A tuple of integers. Scalars are represented by an empty tuple.
dtype
A numpy.dtype
instance.
ctype
A string with the name of C type corresponding to dtype
, or a module if it is a struct type.
strides
Tuple of bytes to step in each dimension when traversing an array.
offset
The initial offset (in bytes).
nbytes
The total size of the memory buffer (in bytes).
__call__
(val)
Casts the given value to this type.
classmethod from_value
(val)
Creates a Type
object corresponding to the given value.
classmethod padded
(dtype, shape, pad=0)
Creates a Type
object corresponding to an array padded from all dimensions by pad elements.
class reikna.core.
Annotation
(type_, role=None, constant=False)
Computation parameter annotation, in the same sense as it is used for functions in the standard library.
class reikna.core.
Parameter
(name, annotation, default=
Computation parameter, in the same sense as it is used for functions in the standard library. In its terms, all computation parameters have kind POSITIONAL_OR_KEYWORD
.
rename
(new_name)
Creates a new Parameter
object with the new name and the same annotation and default value.
class reikna.core.
Signature
(parameters)
Computation signature, in the same sense as it is used for functions in the standard library.
Parameters: | parameters – a list of Parameter objects. |
---|
parameters
An OrderedDict
with Parameter
objects indexed by their names.
bind_with_defaults
(args, kwds, cast=False)
Binds passed positional and keyword arguments to parameters in the signature and returns the resulting BoundArguments
object.
class reikna.core.
Computation
(root_parameters)
A base class for computations, intended to be subclassed.
Parameters: | root_parameters – a list of Parameter objects. |
---|
signature
A Signature
object representing current computation signature (taking into account connected transformations).
parameter
A named tuple of ComputationParameter
objects corresponding to parameters from the current signature
.
_build_plan
(plan_factory, device_params, *args)
Derived classes override this method. It is called by compile()
and supposed to return a ComputationPlan
object.
_update_attributes
()
Updates signature
and parameter
attributes. Called by the methods that change the signature.
compile
(thread, fast_math=False, compiler_options=None, keep=False)
Compiles the computation with the given Thread
object and returns a ComputationCallable
object. If fast_math
is enabled, the compilation of all kernels is performed using the compiler options for fast and imprecise mathematical functions. compiler_options
can be used to pass a list of strings as arguments to the backend compiler. If keep
is True
, the generated kernels and binaries will be preserved in temporary directories.
connect
(_comp_connector, _trf, _tr_connector, **tr_from_comp)
Connect a transformation to the computation.
Parameters: |
|
---|---|
Returns: | this computation object (modified). |
Note
The resulting parameter order is determined by traversing the graph of connections depth-first (starting from the initial computation parameters), with the additional condition: the nodes do not change their order in the same branching level (i.e. in the list of computation or transformation parameters, both of which are ordered).
For example, consider a computation with parameters (a, b, c, d)
. If you connect a transformation (a', c) -> a
, the resulting computation will have the signature (a', b, c, d)
(as opposed to (a', c, b, d)
it would have for the pure depth-first traversal).
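The ordering rule in this example can be mimicked with a short sketch that handles a single connection only. It is a simplified model for illustration, not the traversal used by the library, and connect_sketch is a hypothetical name.

```python
def connect_sketch(signature, target, trf_params):
    """Replace `target` by the transformation parameters, keeping existing order.

    Parameters that already occur in the signature stay where they are.
    """
    result = []
    for name in signature:
        if name == target:
            for param in trf_params:
                if param not in signature and param not in result:
                    result.append(param)
        else:
            result.append(name)
    return result


print(connect_sketch(["a", "b", "c", "d"], "a", ["a'", "c"]))
```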
class reikna.core.
Transformation
(parameters, code, render_kwds=None, connectors=None)
A class containing a pure parallel transformation of arrays. Some restrictions apply:
'io'
arguments;
load_same
or store_same
, and does it only once.
class reikna.core.
Indices
(shape)
Encapsulates the information about index variables available for the snippet.
__getitem__
(dim)
Returns the name of the index variable for the dimension dim
.
all
()
Returns the comma-separated list of all index variable names (useful for passing the guiding indices verbatim in a load or store call).
class reikna.core.computation.
ComputationCallable
(thread, parameters, kernel_calls, internal_args, temp_buffers)
A result of calling compile()
on a computation. Represents a callable opaque GPGPU computation.
thread
A Thread
object used to compile the computation.
signature
A Signature
object.
parameter
A named tuple of Type
objects corresponding to the callable’s parameters.
__call__
(*args, **kwds)
Execute the computation. In case of the OpenCL backend, returns a list of pyopencl.Event
objects from nested kernel calls.
class reikna.core.computation.
ComputationParameter
(computation, name, type_)
Bases: Type
Represents a typed computation parameter. Can be used as a substitute for an array in functions which are only interested in array metadata.
connect
(_trf, _tr_connector, **tr_from_comp)
Shortcut for connect()
with this parameter as the first argument.
class reikna.core.computation.
KernelArgument
(name, type_)
Bases: Type
Represents an argument suitable for passing to a planned kernel or computation call.
class reikna.core.computation.
ComputationPlan
(tr_tree, translator, thread, fast_math, compiler_options, keep)
Computation plan recorder.
computation_call
(computation, *args, **kwds)
Adds a nested computation call. The computation
value must be a Computation
object. args
and kwds
are values to be passed to the computation.
constant_array
(arr)
Adds a constant GPU array to the plan, and returns the corresponding KernelArgument
.
kernel_call
(template_def, args, global_size, local_size=None, render_kwds=None, kernel_name='_kernel_func')
Adds a kernel call to the plan.
Parameters: |
|
---|
persistent_array
(arr)
Adds a persistent GPU array to the plan, and returns the corresponding KernelArgument
.
temp_array
(shape, dtype, strides=None, offset=0, nbytes=None)
Adds a temporary GPU array to the plan, and returns the corresponding KernelArgument
. See array()
for the information about the parameters.
Temporary arrays can share physical memory, but only in such a way that the contents of each array are guaranteed to persist between its first and last use in a kernel during the execution of the plan.
temp_array_like
(arr)
Same as temp_array()
, taking the array properties from the array or array-like object arr
.
Warning
Note that pycuda.GPUArray
objects do not have the offset
attribute.
class reikna.core.transformation.
TransformationParameter
(trf, name, type_)
Bases: Type
Represents a typed transformation parameter. Can be used as a substitute for an array in functions which are only interested in array metadata.
class reikna.core.transformation.
KernelParameter
(name, type_, load_idx=None, store_idx=None, load_same=None, store_same=None, load_combined_idx=None, store_combined_idx=None)
Provides an interface for accessing kernel arguments in a template. Depending on the parameter type, and on whether it is used inside a computation or a transformation template, it can have different load/store attributes available.
name
Parameter name
shape
dtype
ctype
strides
offset
Same as in Type
.
__str__
()
Returns the C kernel parameter name corresponding to this parameter. It is the only method available for scalar parameters.
load_idx
A module providing a macro with the signature (idx0, idx1, ...)
, returning the corresponding element of the array.
store_idx
A module providing a macro with the signature (idx0, idx1, ..., val)
, saving val
into the specified position.
load_combined_idx
(slices)
A module providing a macro with the signature (cidx0, cidx1, ...)
, returning the element of the array corresponding to the new slicing of indices (e.g. an array with shape (2, 3, 4, 5, 6)
sliced as slices=(2, 2, 1)
is indexed as an array with shape (6, 20, 6)
).
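The regrouping of axes in the example above can be checked with a small plain-Python sketch. The helper names are hypothetical, and the row-major unpacking of a combined index within each group is an assumption made for illustration rather than a statement about the generated macro.

```python
from math import prod


def combined_shape(shape, slices):
    """Group consecutive axes of `shape` according to `slices`."""
    if sum(slices) != len(shape):
        raise ValueError("slices must cover every axis exactly once")
    result = []
    start = 0
    for count in slices:
        result.append(prod(shape[start:start + count]))
        start += count
    return tuple(result)


def split_combined_index(shape, slices, combined_idx):
    """Map one index per group back to one index per original axis."""
    indices = []
    start = 0
    for count, cidx in zip(slices, combined_idx):
        group = shape[start:start + count]
        sub = []
        # Peel off the fastest-varying (last) axis of the group first.
        for dim in reversed(group):
            sub.append(cidx % dim)
            cidx //= dim
        indices.extend(reversed(sub))
        start += count
    return tuple(indices)


print(combined_shape((2, 3, 4, 5, 6), (2, 2, 1)))  # (6, 20, 6)
```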
store_combined_idx
(slices)
A module providing a macro with the signature (cidx0, cidx1, ..., val)
, saving val
into the specified position corresponding to the new slicing of indices.
load_same
A module providing a macro that returns the element of the array corresponding to the indices used by the caller of the transformation.
store_same
A module providing a macro with the signature (val)
that stores val
using the indices used by the caller of the transformation.
General purpose algorithms.
class reikna.algorithms.
PureParallel
(parameters, code, guiding_array=None, render_kwds=None)
Bases: Computation
A general class for pure parallel computations (i.e. with no interaction between threads).
Parameters: |
|
---|
compiled_signature
(*args)
Parameters: args – corresponds to the given parameters.
classmethod from_trf
(trf, guiding_array=None)
Creates a PureParallel
instance from a Transformation
object. guiding_array
can be a string with a name of an array parameter from trf
, or the corresponding TransformationParameter
object.
class reikna.algorithms.
Transpose
(arr_t, output_arr_t=None, axes=None, block_width_override=None)
Bases: Computation
Changes the order of axes in a multidimensional array. Works analogously to numpy.transpose
.
Parameters: |
|
---|
compiled_signature
(output:o, input:i)
Parameters: |
|
---|
class reikna.algorithms.
Reduce
(arr_t, predicate, axes=None, output_arr_t=None)
Bases: Computation
Reduces the array over the given axes using the given binary operation.
Parameters: |
|
---|
compiled_signature
(output:o, input:i)
Parameters: |
|
---|
class reikna.algorithms.
Scan
(arr_t, predicate, axes=None, exclusive=False, max_work_group_size=None, seq_size=None)
Bases: Computation
Scans the array over the given axes using the given binary operation. Namely, from an array [a, b, c, d, ...]
and an operation .
, produces [a, a.b, a.b.c, a.b.c.d, ...]
if exclusive
is False
and [0, a, a.b, a.b.c, ...]
if exclusive
is True
(here 0
is the operation’s identity element).
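The difference between the inclusive and the exclusive scan can be sketched on the host with the standard library. This is only a sequential illustration of the definition, not the parallel algorithm; the function `scan` and its arguments are hypothetical names.

```python
from itertools import accumulate
import operator


def scan(values, operation, identity, exclusive=False):
    """Sequential reference for an inclusive or exclusive scan."""
    values = list(values)
    if not values:
        return []
    inclusive = list(accumulate(values, operation))
    if not exclusive:
        return inclusive
    # Shift right by one position and start from the identity element.
    return [identity] + inclusive[:-1]


print(scan([1, 2, 3, 4], operator.add, 0))                  # [1, 3, 6, 10]
print(scan([1, 2, 3, 4], operator.add, 0, exclusive=True))  # [0, 1, 3, 6]
```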
Parameters: |
|
---|
compiled_signature
(output:o, input:i)
Parameters: |
|
---|
class reikna.algorithms.
Predicate
(operation, empty)
A predicate used in some Reikna algorithms (e.g. Reduce
or Scan
).
Parameters: |
|
---|
reikna.algorithms.
predicate_sum
(dtype)
Returns a Predicate
object which sums its arguments.
Linear algebra algorithms.
class reikna.linalg.
MatrixMul
(a_arr, b_arr, out_arr=None, block_width_override=None, transposed_a=False, transposed_b=False)
Bases: Computation
Multiplies two matrices using the last two dimensions and batching over the remaining dimensions. For batching to work, the products of the remaining dimensions should be equal (then the multiplication will be performed piecewise), or one of them should equal 1 (then the multiplication will be batched over the remaining dimensions of the other matrix).
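The batching condition can be expressed as a short shape check. The sketch below is a hypothetical helper that works on shape tuples only; it ignores the `transposed_a` and `transposed_b` flags and is not part of Reikna.

```python
from math import prod


def matmul_batches(a_shape, b_shape):
    """Return the number of batched products for two array shapes."""
    if a_shape[-1] != b_shape[-2]:
        raise ValueError("inner matrix dimensions do not match")
    a_batch = prod(a_shape[:-2])  # prod of an empty tuple is 1
    b_batch = prod(b_shape[:-2])
    # Either both batch sizes agree (piecewise multiplication),
    # or one operand is a single matrix reused for every batch.
    if a_batch == b_batch or 1 in (a_batch, b_batch):
        return max(a_batch, b_batch)
    raise ValueError("batch sizes are incompatible")


print(matmul_batches((3, 4, 10, 20), (12, 20, 5)))  # 12
print(matmul_batches((10, 20), (7, 20, 5)))         # 7
```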
Parameters: |
|
---|
compiled_signature
(output:o, matrix_a:i, matrix_b:i)
Parameters: |
|
---|
class reikna.linalg.
EntrywiseNorm
(arr_t, order=2, axes=None)
Bases: Computation
Calculates the entrywise matrix norm (same as numpy.linalg.norm
) of an arbitrary order r
:
$$\|A\|_r = \left( \sum_{i,j,\ldots} |A_{i,j,\ldots}|^r \right)^{1/r}$$
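As a host-side reference for the formula, the norm over all entries of a nested list can be computed as follows. This sketch uses a hypothetical helper name and reduces over every axis, so it does not model the `axes` argument.

```python
def entrywise_norm(values, order=2):
    """Entrywise norm of a (possibly nested) list of real or complex numbers."""

    def flatten(item):
        if isinstance(item, (list, tuple)):
            for sub in item:
                yield from flatten(sub)
        else:
            yield item

    return sum(abs(x) ** order for x in flatten(values)) ** (1 / order)


print(entrywise_norm([[3, 4]]))  # 5.0
```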
Parameters: |
|
---|
|
compiled_signature
(output:o, input:i)
Parameters: |
|
---|
class reikna.fft.
FFT
(arr_t, axes=None)
Bases: Computation
Performs the Fast Fourier Transform. The interface is similar to numpy.fft.fftn
. The inverse transform is normalized so that IFFT(FFT(X)) = X
.
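The normalization convention can be illustrated with a naive discrete Fourier transform in plain Python. The sketch assumes the numpy convention (unscaled forward transform, inverse scaled by 1/N) and is unrelated to the GPU algorithm; `dft` is a hypothetical helper.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            x * cmath.exp(sign * 2j * cmath.pi * m * k / n)
            for m, x in enumerate(values)
        )
        # Only the inverse transform carries the 1/N factor.
        result.append(total / n if inverse else total)
    return result


data = [1.0, 2.0, 3.0, 4.0]
roundtrip = dft(dft(data), inverse=True)
print(all(abs(a - b) < 1e-9 for a, b in zip(data, roundtrip)))  # True
```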
Parameters: |
|
---|
Note
The current algorithm works most effectively when the array dimensions are powers of 2. This mostly applies to the axes over which the transform is performed, because otherwise the computation falls back to Bluestein’s algorithm, which effectively halves the performance.
compiled_signature
(output:o, input:i, inverse:s)
output
and input
may be the same array.
Parameters: |
|
---|
class reikna.fft.
FFTShift
(arr_t, axes=None)
Bases: Computation
Shift the zero-frequency component to the center of the spectrum. The interface is similar to numpy.fft.fftshift
, and the output is the same for the same array shape and axes.
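For a single axis, the shift performed by numpy.fft.fftshift is a rotation by half the length (rounded down). A minimal plain-Python sketch, with a hypothetical helper name:

```python
def fftshift_1d(values):
    """Rotate a sequence so that the zero-frequency entry moves to the center."""
    values = list(values)
    shift = len(values) // 2
    if shift == 0:
        return values
    return values[-shift:] + values[:-shift]


# Frequencies in FFT order for N = 5, then in centered order.
print(fftshift_1d([0, 1, 2, -2, -1]))  # [-2, -1, 0, 1, 2]
```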
Parameters: |
|
---|
compiled_signature
(output:o, input:i)
output
and input
may be the same array.
Parameters: |
|
---|
reikna.dht.
get_spatial_grid
(modes, order, add_points=0)
Returns the spatial grid required to calculate the order
power of a function defined in the harmonic mode space of the size modes
. If add_points
is 0, the grid has the minimum size required for exact transformation back to the mode space.
reikna.dht.
harmonic
(mode)
Returns an eigenfunction of order $n = \mathrm{mode}$ for the harmonic oscillator:
$$\phi_n(x) = \frac{1}{\sqrt[4]{\pi} \sqrt{2^n n!}} H_n(x) \exp(-x^2/2),$$
where $H_n$ is the $n$-th order “physicists’” Hermite polynomial. The normalization is chosen so that $\int \phi_n^2(x)\,dx = 1$.
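The eigenfunction and its normalization can be checked numerically in plain Python. The sketch below evaluates the Hermite polynomial by its three-term recurrence; the names `hermite` and `harmonic_value` are hypothetical and do not mirror the signature of reikna.dht.harmonic.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        # H_{k+1}(x) = 2 x H_k(x) - 2 k H_{k-1}(x)
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def harmonic_value(n, x):
    """Value of the n-th harmonic oscillator eigenfunction at x."""
    norm = 1.0 / (math.pi ** 0.25 * math.sqrt(2.0 ** n * math.factorial(n)))
    return norm * hermite(n, x) * math.exp(-x * x / 2.0)


def squared_integral(n, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal estimate of the integral of phi_n(x)**2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (harmonic_value(n, lo) ** 2 + harmonic_value(n, hi) ** 2)
    for i in range(1, steps):
        total += harmonic_value(n, lo + i * h) ** 2
    return total * h
```

Evaluating `squared_integral(n)` for small `n` should give values close to 1, in line with the stated normalization.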
class reikna.dht.
DHT
(mode_arr, add_points=None, inverse=False, order=1, axes=None)
Bases: Computation
Discrete transform to and from harmonic oscillator modes. With inverse=True
transforms a function defined by its expansion $C_m,\ m = 0 \ldots M-1$
in the mode space with mode functions from harmonic()
, to the coordinate space ($F(x)$ on the grid $x$ from get_spatial_grid()
). With inverse=False
is guaranteed to recover the first $M$ modes of $F^k(x)$, where $k$
is the order
parameter.
For multiple dimensions the operation is the same, and the mode functions are products of 1D mode functions, i.e. $\phi^{3D}_{l,m,n}(x, y, z) = \phi_l(x) \phi_m(y) \phi_n(z)$
.
For a detailed description of the algorithm, see Dion & Cances, PRE 67(4) 046706 (2003).
Parameters: |
|
---|
compiled_signature_forward
(modes:o, coords:i)
compiled_signature_inverse
(coords:o, modes:i)
Depending on inverse
value, either of these two will be created.
Parameters: |
|
---|
This module is based on the paper by Salmon et al., P. Int. C. High. Perform. 16 (2011), and the source code of the Random123 library.
A counter-based random-number generator (CBRNG) is a parametrized function $f_k(c)$, where $k$ is the key, $c$ is the counter, and the function $f_k$
defines a bijection in the set of integer numbers. Applied to successive counters, the function produces a sequence of pseudo-random numbers. The key is an analogue of the seed of stateful RNGs; if the CBRNG is used to generate random numbers in parallel threads, the key is a combination of a seed and a unique thread number.
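The idea can be sketched with a toy keyed bijection on 16-bit integers. This is explicitly not Threefry or Philox and has no statistical quality guarantees; the round constants and the names `toy_bijection`, `make_key` and `random_stream` are made up for illustration.

```python
MASK = 0xFFFF  # 16-bit words keep the example small enough to check exhaustively


def toy_bijection(key, counter):
    """A toy keyed bijection on 16-bit integers (NOT Threefry or Philox)."""
    x = counter & MASK
    for round_key in (key & MASK, (key >> 16) & MASK, 0xA5A5):
        x ^= round_key           # XOR with a constant is invertible
        x = (x * 0x9E37) & MASK  # multiplying by an odd constant is invertible mod 2**16
        x ^= x >> 7              # a right xorshift is invertible
    return x


def make_key(seed, thread_id):
    """Combine a seed (high word) with a unique thread number (low word)."""
    return ((seed & MASK) << 16) | (thread_id & MASK)


def random_stream(seed, thread_id, count, start=0):
    """Apply the bijection to successive counters for one thread."""
    key = make_key(seed, thread_id)
    return [toy_bijection(key, c) for c in range(start, start + count)]
```

Because the function is stateless, any part of a thread's stream can be regenerated from the key and the starting counter alone, which is what makes the approach convenient for parallel generation.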
There are two types of generators available, threefry
(uses a large number of simple functions), and philox
(uses a smaller number of more complicated functions). The latter is generally faster on GPUs; see the paper above for detailed comparisons. These generators can be further specialized to use counters consisting of words=2
or words=4
words, each bitness=32
or bitness=64
bits wide. The period of the generator equals the cardinality of the set of possible counters. For example, if the counter consists of 4 64-bit numbers, then the period of the generator is $2^{256}$. As for the key size, in case of threefry
the key has the same size as the counter, and for philox
the key is half its size.
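The arithmetic behind the period and the key size is simple enough to spell out. The two helpers below are hypothetical and only restate the rules given in the paragraph above.

```python
def counter_period(words, bitness):
    """Number of distinct counters, i.e. the period of the generator."""
    return 2 ** (words * bitness)


def key_bits(generator, words, bitness):
    """Key size in bits for the given generator family."""
    counter_bits = words * bitness
    if generator == "threefry":
        return counter_bits       # same size as the counter
    if generator == "philox":
        return counter_bits // 2  # half the size of the counter
    raise ValueError("unknown generator: " + repr(generator))


print(counter_period(4, 64) == 2 ** 256)  # True
print(key_bits("philox", 2, 32))          # 32
```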
The CBRNG
class sets one of the words of the key (except for philox-2x64
, where 32 bits of the only word in the key are used); the remaining words are the same for all threads and are derived from the provided seed
. This limits the maximum number of number-generating threads (size
). philox-2x32
has a 32-bit key and therefore cannot be used in CBRNG
(although it can be used separately with the help of the kernel API).
The CBRNG
class itself is stateless, same as other computations in Reikna, so you have to manage the generator state yourself. The state is created by the create_counters()
method and contains size
counters. This state is then passed to, and updated by a CBRNG
object.
class reikna.cbrng.
CBRNG
(randoms_arr, generators_dim, sampler, seed=None)
Bases: Computation
Counter-based pseudo-random number generator class.
Parameters: |
|
---|
classmethod sampler_name
(randoms_arr, generators_dim, sampler_kwds=None, seed=None)
A convenience constructor for the sampler sampler_name
from samplers
. The contents of the dictionary sampler_kwds
will be passed to the sampler constructor function (with bijection
being created automatically, and dtype
taken from randoms_arr
).
compiled_signature
(counters:io, randoms:o)
Parameters: |
|
---|
create_counters
()
Create a counter array for use in CBRNG
.
class reikna.cbrng.bijections.
Bijection
(module, word_dtype, key_dtype, counter_dtype)
Contains a CBRNG bijection module and accompanying metadata. Supports __process_modules__
protocol.
word_dtype
The data type of the integer word used by the generator.
key_words
The number of words used by the key.
counter_words
The number of words used by the counter.
key_dtype
The numpy.dtype
object representing a bijection key. Contains a single array field v
with key_words
of word_dtype
elements.
counter_dtype
The numpy.dtype
object representing a bijection counter. Contains a single array field v
with counter_words
of word_dtype
elements.
raw_functions
A dictionary dtype:function_name
of available functions function_name
in module
that produce a random full-range integer dtype
from a State
, advancing it. Available functions: get_raw_uint32()
, get_raw_uint64()
.
module
The module containing the CBRNG function. It provides the C functions below.
COUNTER_WORDS
Contains the value of counter_words
.
KEY_WORDS
Contains the value of key_words
.
Word
Contains the type corresponding to word_dtype
.
Key
Describes the bijection key. Alias for the structure generated from key_dtype
.
Word v[KEY_WORDS]
Counter
Describes the bijection counter, or its output. Alias for the structure generated from counter_dtype
.
Word v[COUNTER_WORDS]
Counter make_counter_from_int
(int x)
Creates a counter object from an integer.
Counter bijection
(Key key, Counter counter)
The main bijection function.
State
A structure containing the CBRNG state which is used by samplers
.
State make_state
(Key key, Counter counter)
Creates a new state object.
Counter get_next_unused_counter
(State state)
Extracts a counter which has not been used in random sampling.
uint32
A type of unsigned 32-bit word, corresponds to numpy.uint32
.
uint64
A type of unsigned 64-bit word, corresponds to numpy.uint64
.
uint32 get_raw_uint32
(State *state)
Returns uniformly distributed unsigned 32-bit word and updates the state.
uint64 get_raw_uint64
(State *state)
Returns uniformly distributed unsigned 64-bit word and updates the state.
reikna.cbrng.bijections.
philox
(bitness, counter_words, rounds=10)
A CBRNG based on a low number of slow rounds (multiplications).
Parameters: |
|
---|---|
Returns: | a |
reikna.cbrng.bijections.
threefry
(bitness, counter_words, rounds=20)
A CBRNG based on a big number of fast rounds (bit rotations).
Parameters: |
|
---|---|
Returns: | a |
class reikna.cbrng.samplers.
Sampler
(bijection, module, dtype, randoms_per_call=1, deterministic=False)
Contains a random distribution sampler module and accompanying metadata. Supports __process_modules__
protocol.
deterministic
If True
, every sampled random number consumes the same number of counters.
randoms_per_call
How many random numbers one call to sample
creates.
dtype
The data type of one random value produced by the sampler.
module
The module containing the distribution sampling function. It provides the C functions below.
RANDOMS_PER_CALL
Contains the value of randoms_per_call
.
Value
Contains the type corresponding to dtype
.
Result
Describes the sampling result.
value v[RANDOMS_PER_CALL]
Result sample
(State *state)
Performs the sampling, updating the state.
reikna.cbrng.samplers.
gamma
(bijection, dtype, shape=1, scale=1)
Generates random numbers from the gamma distribution $P(x) = x^{k-1} \frac{e^{-x/\theta}}{\theta^k \Gamma(k)}$, where $k$ is shape
, and $\theta$ is scale
. Supported dtypes: float(32/64)
. Returns a Sampler
object.
reikna.cbrng.samplers.
normal_bm
(bijection, dtype, mean=0, std=1)
Generates normally distributed random numbers with the mean mean
and the standard deviation std
using the Box-Muller transform. Supported dtypes: float(32/64)
, complex(64/128)
. Produces two random numbers per call for real types and one number for complex types. Returns a Sampler
object.
Note
In case of a complex dtype
, std
refers to the standard deviation of the complex numbers (same as numpy.std()
returns), not real and imaginary components (which will be normally distributed with the standard deviation std / sqrt(2)
). Consequently, while mean
is of type dtype
, std
must be real.
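The relation between `std` for complex numbers and the standard deviation of the components can be seen in a plain-Python sketch of the Box-Muller transform. The function names are hypothetical, and the uniform inputs come from the standard library rather than from a CBRNG.

```python
import math
import random


def box_muller(u1, u2, mean=0.0, std=1.0):
    """Turn two uniforms (u1 in (0, 1], u2 in [0, 1)) into two normal values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return (mean + std * radius * math.cos(angle),
            mean + std * radius * math.sin(angle))


def complex_normal(u1, u2, mean=0j, std=1.0):
    """One complex normal value whose overall standard deviation is `std`."""
    # Each component gets std / sqrt(2) so that the complex value has std `std`.
    re, im = box_muller(u1, u2, 0.0, std / math.sqrt(2.0))
    return mean + complex(re, im)


rng = random.Random(0)
# 1 - random() lies in (0, 1], which keeps log() away from zero.
samples = [complex_normal(1.0 - rng.random(), rng.random()) for _ in range(20000)]
center = sum(samples) / len(samples)
spread = math.sqrt(sum(abs(s - center) ** 2 for s in samples) / len(samples))
```

Here `spread` estimates the standard deviation of the complex samples and should be close to 1, while the real parts alone have a spread close to 1 / sqrt(2).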
reikna.cbrng.samplers.
uniform_float
(bijection, dtype, low=0, high=1)
Generates uniformly distributed floating-point numbers in the interval [low, high)
. Supported dtypes: float(32/64)
. A fixed number of counters is used in each thread. Returns a Sampler
object.
reikna.cbrng.samplers.
uniform_integer
(bijection, dtype, low, high=None)
Generates uniformly distributed integer numbers in the interval [low, high)
. If high
is None
, the interval is [0, low)
. Supported dtypes: any numpy integers. If the size of the interval is a power of 2, a fixed number of counters is used in each thread. Returns a Sampler
object.
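One common way to obtain unbiased integers from raw 32-bit words shows why a power-of-2 interval can use a fixed number of raw values while other sizes cannot. The sketch below uses masking and rejection sampling; whether the sampler works exactly this way is an assumption, and the helper signature (a callable returning raw words) is hypothetical.

```python
import random


def uniform_integer(next_raw32, low, high=None):
    """Return (value, raw_words_consumed) for an integer in [low, high)."""
    if high is None:
        low, high = 0, low
    size = high - low
    if size <= 0 or size > 2 ** 32:
        raise ValueError("interval size must be in [1, 2**32]")
    if size & (size - 1) == 0:
        # Power of two: masking one raw word is already unbiased.
        return low + (next_raw32() & (size - 1)), 1
    # Otherwise reject raw words that fall into the incomplete last bucket,
    # so the number of raw words consumed varies from call to call.
    limit = (2 ** 32 // size) * size
    consumed = 0
    while True:
        raw = next_raw32()
        consumed += 1
        if raw < limit:
            return low + raw % size, consumed


rng = random.Random(0)
value, consumed = uniform_integer(lambda: rng.getrandbits(32), 5, 12)
```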
reikna.cbrng.samplers.
vonmises
(bijection, dtype, mu=0, kappa=1)
Generates random numbers from the von Mises distribution $P(x) = \frac{\exp(\kappa \cos(x - \mu))}{2 \pi I_0(\kappa)}$, where $\mu$ is the mode, $\kappa$ is the dispersion, and $I_0$ is the modified Bessel function of the first kind. Supported dtypes: float(32/64)
. Returns a Sampler
object.
class reikna.cbrng.tools.
KeyGenerator
(module, base_key)
Contains a key generator module and accompanying metadata. Supports __process_modules__
protocol.
module
A module with the key generator function:
Key key_from_int
(int idx)
Generates and returns a key, suitable for the bijection which was given to the constructor.
classmethod create
(bijection, seed=None, reserve_id_space=True)
Creates a generator.
Parameters: |
|
---|
reference
(idx)
Reference function that returns the key given the thread identifier. Uses the same algorithm as the module.
This module contains a number of pre-created transformations.
reikna.transformations.
add_const
(arr_t, param)
Returns an addition transformation with a fixed parameter (1 output, 1 input): output = input + param
.
reikna.transformations.
add_param
(arr_t, param_dtype)
Returns an addition transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input + param
.
reikna.transformations.
broadcast_const
(arr_t, val)
Returns a transformation that broadcasts the given constant to the array output (1 output): output = val
.
reikna.transformations.
broadcast_param
(arr_t)
Returns a transformation that broadcasts the free parameter to the array output (1 output, 1 param): output = param
.
reikna.transformations.
combine_complex
(output_arr_t)
Returns a transformation that joins two real inputs into complex output (1 output, 2 inputs): output = real + 1j * imag
.
reikna.transformations.
copy
(arr_t, out_arr_t=None)
Returns an identity transformation (1 output, 1 input): output = input
. Output array type out_arr_t
may have different strides, but must have the same shape and data type.
reikna.transformations.
div_const
(arr_t, param)
Returns a scaling transformation with a fixed parameter (1 output, 1 input): output = input / param
.
reikna.transformations.
div_param
(arr_t, param_dtype)
Returns a scaling transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input / param
.
reikna.transformations.
ignore
(arr_t)
Returns a transformation that ignores the output it is attached to.
reikna.transformations.
mul_const
(arr_t, param)
Returns a scaling transformation with a fixed parameter (1 output, 1 input): output = input * param
.
reikna.transformations.
mul_param
(arr_t, param_dtype)
Returns a scaling transformation with a dynamic parameter (1 output, 1 input, 1 scalar): output = input * param
.
reikna.transformations.
norm_const
(arr_t, order)
Returns a transformation that calculates the order
-norm (1 output, 1 input): output = abs(input) ** order
.
reikna.transformations.
norm_param
(arr_t)
Returns a transformation that calculates the order
-norm (1 output, 1 input, 1 param): output = abs(input) ** order
.
reikna.transformations.
split_complex
(input_arr_t)
Returns a transformation that splits complex input into two real outputs (2 outputs, 1 input): real = Re(input), imag = Im(input)
.