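I have been reading through the material on the KALDI website lately, adding annotations as I go to help my own understanding.
My study notes are mostly reproduced from the official KALDI site, http://kaldi.sourceforge.net, with my own annotations added. I state this here for the record.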
The CUDA Matrix library
Note: CUDA is NVIDIA's parallel computing architecture. By exploiting the processing power of the GPU, it can greatly improve computational performance.
The CUDA matrix library is a seamless wrapper of CUDA computation.
Note: the CUDA Matrix library wraps CUDA computation seamlessly.
Its purpose is to separate the low level CUDA-dependent routines from the high level C++ code.
Note: the goal is to separate the low-level CUDA-dependent routines from the high-level C++ code.
The library can be compiled either with or without the CUDA libraries, depending on the HAVE_CUDA==1 macro. Without CUDA, the library falls back to computation on the host processor. The host processor is also used when the toolkit is compiled with CUDA and no suitable GPU is detected. This is particularly useful in heterogeneous "grid-like" environments.
Note: the CUDA Matrix library compiles whether or not the CUDA libraries are present, depending on whether HAVE_CUDA==1 is set. Without CUDA, the computation runs on the host processor.
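The two-level fallback described above (a compile-time switch plus a run-time check) can be sketched roughly as follows. Only the HAVE_CUDA macro comes from the text; `Path`, `GpuIsAvailable` and `AddScaled` are hypothetical names made up for this illustration and are not Kaldi functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HAVE_CUDA is the macro named in the text. Everything else in this file is
// a hypothetical stand-in, not Kaldi code.
#ifndef HAVE_CUDA
#define HAVE_CUDA 0
#endif

enum class Path { kHost, kGpu };

// Hypothetical run-time check. This sketch contains no GPU code, so it
// always reports that no suitable GPU was found.
inline bool GpuIsAvailable() { return false; }

// Hypothetical operation y += alpha * x. Returns which path did the work.
inline Path AddScaled(float alpha, const std::vector<float>& x,
                      std::vector<float>* y) {
  assert(y != nullptr && x.size() == y->size());
#if HAVE_CUDA == 1
  if (GpuIsAvailable()) {
    // A real build would call a GPU routine here. In this sketch the branch
    // is never taken because GpuIsAvailable() always returns false.
    return Path::kGpu;
  }
#endif
  // Host fallback: reached when compiled without CUDA, or when compiled
  // with CUDA but no suitable GPU is detected.
  for (std::size_t i = 0; i < x.size(); ++i) (*y)[i] += alpha * x[i];
  return Path::kHost;
}
```

The point of this shape is that the caller writes `AddScaled(alpha, x, &y)` once and the same binary can run on grid nodes with and without a GPU.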
Computationally, the library is based on CUBLAS linear algebra operations and on manually implemented grid-like kernels for the non-linear operations, which conform to the "Map" pattern. Most of the "Reduce" kernels, in contrast, use a tree-like computational pattern together with extensive use of shared memory.
Note: CUBLAS is NVIDIA's BLAS library for the GPU; the computational functions it provides all execute on the GPU.
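To make the difference between the two patterns concrete, here is a host-side emulation in plain C++. It is only a sketch of the access patterns, not the actual kernels: the function names are my own, the sigmoid is just one example of a non-linear operation, and the scratch vector merely plays the role that shared memory would play on the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// "Map": every output element depends only on the matching input element,
// so on a GPU each thread could process one element independently.
inline void MapSigmoid(const std::vector<float>& in, std::vector<float>* out) {
  assert(out != nullptr);
  out->resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    (*out)[i] = 1.0f / (1.0f + std::exp(-in[i]));
}

// "Reduce" with a tree pattern: in each round the active half of the buffer
// adds in the other half, so n values are summed in about log2(n) rounds.
// The by-value copy `buf` stands in for a shared-memory scratch area.
inline float TreeSum(std::vector<float> buf) {
  if (buf.empty()) return 0.0f;
  // Pad with zeros up to the next power of two so the halving is exact.
  std::size_t n = 1;
  while (n < buf.size()) n *= 2;
  buf.resize(n, 0.0f);
  for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
    // On a GPU the iterations of this inner loop would run in parallel,
    // with a barrier between rounds.
    for (std::size_t i = 0; i < stride; ++i) buf[i] += buf[i + stride];
  }
  return buf[0];
}
```

For example, `TreeSum({1, 2, 3, 4, 5})` pads to eight elements and needs three rounds (strides 4, 2, 1) instead of four sequential additions.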
classes
The most important classes are: CuDevice, CuMatrix, CuVector and CuStlVector.
Note: the main classes are CuDevice, CuMatrix, CuVector and CuStlVector.
CuDevice : an abstraction of the GPU board. It is a singleton object that initializes the CUBLAS library at application startup and releases it at the end. It is also used to collect profiling statistics.
CuMatrix : a GPU analogue of the Matrix class. It holds a buffer in the GPU global memory, as well as a backup CPU buffer. It implements a subset of the Matrix interface. Host-GPU transfers are done by the CopyFromMat and CopyToMat methods, which may internally reallocate the buffers.
CuVector : a GPU analogue of the Vector class. It holds a buffer in the GPU global memory, as well as a backup CPU buffer. It implements a subset of the Vector interface. Host-GPU transfers are done by the CopyFromVec and CopyToVec methods, which may internally reallocate the buffers.
CuStlVector : particularly useful for creating vectors of indices (int32).
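The roles of these classes can be illustrated with a small mock. This is not the Kaldi implementation: `MockDevice` and `MockCuVector` are hypothetical classes, the "device" buffer is an ordinary std::vector, and counting transfers is only a toy stand-in for profiling. Only the method names CopyFromVec and CopyToVec are taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the device singleton: exactly one instance per
// process, created on first use and destroyed at program exit.
class MockDevice {
 public:
  static MockDevice& Instance() {
    static MockDevice device;
    return device;
  }
  void CountTransfer() { ++num_transfers_; }
  std::size_t NumTransfers() const { return num_transfers_; }

 private:
  MockDevice() = default;
  MockDevice(const MockDevice&) = delete;
  MockDevice& operator=(const MockDevice&) = delete;
  std::size_t num_transfers_ = 0;
};

// Hypothetical stand-in for a GPU vector. device_buf_ plays the role of the
// buffer in GPU global memory.
class MockCuVector {
 public:
  // Host -> "device". assign() reallocates the buffer when the size differs.
  void CopyFromVec(const std::vector<float>& host) {
    device_buf_.assign(host.begin(), host.end());
    MockDevice::Instance().CountTransfer();
  }
  // "Device" -> host. The host vector is resized to fit.
  void CopyToVec(std::vector<float>* host) const {
    assert(host != nullptr);
    host->assign(device_buf_.begin(), device_buf_.end());
    MockDevice::Instance().CountTransfer();
  }
  std::size_t Dim() const { return device_buf_.size(); }

 private:
  std::vector<float> device_buf_;
};
```

A round trip would look like `v.CopyFromVec(h); v.CopyToVec(&back);`, after which `back` holds the same values as `h` and the device object has recorded two transfers.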
mathematical operations
cu-math.h contains math functions that cannot be associated solely with a vector or a matrix. They are concentrated in the namespace cu::, in order to separate them from the global namespace.
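As an illustration of why such functions live in a namespace rather than in a class, here is a sketch of a free function that needs both a matrix and an index vector. The namespace `sketch_cu`, the `Matrix` alias and the function `CopyRows` are all hypothetical names chosen for this example; I am not claiming this is what cu-math.h declares.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch_cu {  // hypothetical namespace, mirroring the idea of cu::

// Stand-in for a matrix class: a vector of rows.
using Matrix = std::vector<std::vector<float>>;

// Fills dst so that row i of dst is row indices[i] of src. The operation
// involves a matrix and a vector of int32 indices, so it does not belong
// naturally to either type and is written as a free function.
inline void CopyRows(const Matrix& src,
                     const std::vector<std::int32_t>& indices, Matrix* dst) {
  assert(dst != nullptr && dst != &src);
  dst->clear();
  dst->reserve(indices.size());
  for (std::int32_t idx : indices) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < src.size());
    dst->push_back(src[static_cast<std::size_t>(idx)]);
  }
}

}  // namespace sketch_cu
```

Callers then write `sketch_cu::CopyRows(src, indices, &dst)`, and the short function name does not clash with anything in the global namespace.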
kernels
The CUDA kernels are concentrated in the cu-kernels.cu file. Since the CUDA code is compiled by NVCC, and the rest of the code is compiled by a different compiler, the only possible way for the two to interact was to employ the ANSI C interface cu-kernels.h, which represents a low-level interface to CUDA. The high-level interface is via CuMatrix, CuVector and the functions in the cu:: namespace.
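The layering described here (a C-linkage function in the middle, a C++ class on top) can be sketched in a single file as below. The function `sketch_scale` and the class `SketchVector` are hypothetical, and the "kernel" is an ordinary host loop so that the example compiles without NVCC; in the real layout the declaration would sit in a header and the definition in a .cu file.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// --- Plain C interface (the role cu-kernels.h plays in the text) ---------
// extern "C" disables C++ name mangling, so object files produced by two
// different compilers can agree on the symbol name.
extern "C" void sketch_scale(float* data, int dim, float alpha);

// --- Implementation side (would live in a .cu file built by NVCC) --------
// Here it is a host loop standing in for a kernel launch.
extern "C" void sketch_scale(float* data, int dim, float alpha) {
  for (int i = 0; i < dim; ++i) data[i] *= alpha;
}

// --- High-level C++ side --------------------------------------------------
// User code sees only this class and never touches the C interface.
class SketchVector {
 public:
  explicit SketchVector(std::vector<float> v) : data_(std::move(v)) {}

  void Scale(float alpha) {
    if (data_.empty()) return;
    sketch_scale(data_.data(), static_cast<int>(data_.size()), alpha);
  }

  float operator()(std::size_t i) const {
    assert(i < data_.size());
    return data_[i];
  }

 private:
  std::vector<float> data_;
};
```

The C interface passes only raw pointers and plain integers, which is why the high-level classes are needed on top of it to keep track of dimensions and ownership.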