42. cuBLAS Developer Guide (Chinese Edition): the Level-3 Function gemm() in cuBLAS

2.7.1. cublas<t>gemm()

```c
cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float           *alpha,
                           const float           *A, int lda,
                           const float           *B, int ldb,
                           const float           *beta,
                           float                 *C, int ldc);

cublasStatus_t cublasDgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const double          *alpha,
                           const double          *A, int lda,
                           const double          *B, int ldb,
                           const double          *beta,
                           double                *C, int ldc);

cublasStatus_t cublasCgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const cuComplex       *alpha,
                           const cuComplex       *A, int lda,
                           const cuComplex       *B, int ldb,
                           const cuComplex       *beta,
                           cuComplex             *C, int ldc);

cublasStatus_t cublasZgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const cuDoubleComplex *alpha,
                           const cuDoubleComplex *A, int lda,
                           const cuDoubleComplex *B, int ldb,
                           const cuDoubleComplex *beta,
                           cuDoubleComplex       *C, int ldc);

cublasStatus_t cublasHgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const __half          *alpha,
                           const __half          *A, int lda,
                           const __half          *B, int ldb,
                           const __half          *beta,
                           __half                *C, int ldc);
```

This function performs the matrix-matrix multiplication

$$C = \alpha\,\mathrm{op}(A)\,\mathrm{op}(B) + \beta C$$

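where $\alpha$ and $\beta$ are scalars, and $A$, $B$ and $C$ are matrices stored in column-major format, with $\mathrm{op}(A)$ of dimension $m \times k$, $\mathrm{op}(B)$ of dimension $k \times n$, and $C$ of dimension $m \times n$. Also, for matrix $A$: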

$$
\mathrm{op}(A) = \begin{cases}
A & \text{if } \texttt{transa == CUBLAS\_OP\_N},\\
A^T & \text{if } \texttt{transa == CUBLAS\_OP\_T},\\
A^H & \text{if } \texttt{transa == CUBLAS\_OP\_C}
\end{cases}
$$

and $\mathrm{op}(B)$ is defined similarly for matrix $B$.
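To make the column-major semantics and the roles of op(A), op(B) and the leading dimensions concrete, here is a minimal CPU reference sketch in C for the real single-precision case. The names `ref_sgemm`, `ref_op_t` and `REF_OP_*` are hypothetical stand-ins, not part of cuBLAS; the sketch assumes real data, for which the conjugate transpose (CUBLAS_OP_C) is the same as the plain transpose. It is meant for checking results on small inputs, not for performance.

```c
#include <stddef.h>

/* Hypothetical operation codes standing in for CUBLAS_OP_N / _T / _C.
   For real (float) data, conjugate transpose equals transpose. */
typedef enum { REF_OP_N, REF_OP_T, REF_OP_C } ref_op_t;

/* Element (row, col) of a column-major matrix with leading dimension ld. */
static float ref_at(const float *M, int ld, int row, int col)
{
    return M[(size_t)col * ld + row];
}

/* Element (i, p) of op(X), where X is stored column-major with leading dimension ld. */
static float ref_op_at(const float *X, int ld, ref_op_t op, int i, int p)
{
    return op == REF_OP_N ? ref_at(X, ld, i, p) : ref_at(X, ld, p, i);
}

/* Hypothetical CPU reference for C = alpha * op(A) * op(B) + beta * C.
   op(A) is m x k, op(B) is k x n, C is m x n; all stored column-major. */
void ref_sgemm(ref_op_t transa, ref_op_t transb,
               int m, int n, int k,
               float alpha, const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += ref_op_at(A, lda, transa, i, p) * ref_op_at(B, ldb, transb, p, j);
            float *c = &C[(size_t)j * ldc + i];
            /* When beta == 0, C is only written, never read, matching
               "C does not have to be a valid input". */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * (*c);
        }
    }
}
```

For example, with A = [1 2 3; 4 5 6] stored as {1, 4, 2, 5, 3, 6} (lda = 2) and B = [7 8; 9 10; 11 12] stored as {7, 9, 11, 8, 10, 12} (ldb = 3), calling `ref_sgemm(REF_OP_N, REF_OP_N, 2, 2, 3, 1.0f, A, 2, B, 3, 0.0f, C, 2)` leaves C = {58, 139, 64, 154}, which is [58 64; 139 154] read in column-major order.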

| Param. | Memory | In/out | Meaning |
| --- | --- | --- | --- |
| handle | | input | Handle to the cuBLAS library context. |
| transa | | input | Operation op(A): non-transpose, transpose, or conjugate transpose. |
| transb | | input | Operation op(B): non-transpose, transpose, or conjugate transpose. |
| m | | input | Number of rows of matrix op(A) and C. |
| n | | input | Number of columns of matrix op(B) and C. |
| k | | input | Number of columns of op(A) and rows of op(B). |
| alpha | host or device | input | Scalar used for multiplication. |
| A | device | input | Array of dimensions lda x k with lda >= max(1, m) if transa == CUBLAS_OP_N, and lda x m with lda >= max(1, k) otherwise. |
| lda | | input | Leading dimension of the two-dimensional array used to store matrix A. |
| B | device | input | Array of dimensions ldb x n with ldb >= max(1, k) if transb == CUBLAS_OP_N, and ldb x k with ldb >= max(1, n) otherwise. |
| ldb | | input | Leading dimension of the two-dimensional array used to store matrix B. |
| beta | host or device | input | Scalar used for multiplication. If beta == 0, C does not have to be a valid input. |
| C | device | in/out | Array of dimensions ldc x n with ldc >= max(1, m). |
| ldc | | input | Leading dimension of the two-dimensional array used to store matrix C. |
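As a worked illustration of the A row in the table above (transa == CUBLAS_OP_N), take m = 2, k = 3 and a deliberately padded lda = 4 (any lda >= max(1, m) = 2 is allowed). In column-major storage, element (i, j) (0-based) sits at offset i + j * lda, so the lda x k = 12-element array holds each column at a stride of 4, with two slots per column that are not part of the matrix (marked $\ast$):

$$
A = \begin{pmatrix} a_{00} & a_{01} & a_{02}\\ a_{10} & a_{11} & a_{12} \end{pmatrix},
\qquad \mathrm{offset}(i, j) = i + j \cdot \mathrm{lda}
$$

$$
\texttt{A[0..11]} = [\,a_{00},\ a_{10},\ \ast,\ \ast,\ a_{01},\ a_{11},\ \ast,\ \ast,\ a_{02},\ a_{12},\ \ast,\ \ast\,]
$$

The same rule applies to B with ldb and to C with ldc; choosing lda = m gives the tightly packed case with no padding.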

The error values this function may return, and their meanings, are listed in the following table:

| Error value | Meaning |
| --- | --- |
| CUBLAS_STATUS_SUCCESS | The operation completed successfully. |
| CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized. |
| CUBLAS_STATUS_INVALID_VALUE | m < 0, n < 0 or k < 0; or transa or transb is not one of CUBLAS_OP_N, CUBLAS_OP_C, CUBLAS_OP_T; or lda < max(1, m) when transa == CUBLAS_OP_N (lda < max(1, k) otherwise); or ldb < max(1, k) when transb == CUBLAS_OP_N (ldb < max(1, n) otherwise); or ldc < max(1, m); or alpha or beta is NULL; or C is NULL when C needs to be scaled. |
| CUBLAS_STATUS_EXECUTION_FAILED | The function failed to launch on the GPU. |
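The CUBLAS_STATUS_INVALID_VALUE conditions can be mirrored in a small host-side check run before calling cublas<t>gemm(), which helps catch mismatched leading dimensions early. This is a hypothetical helper: `ref_gemm_check`, `ref_op_t` and `ref_status_t` are not cuBLAS names, and it assumes that "C needs to be scaled" means the m x n output is non-empty, which may not match the library's exact internal rule.

```c
#include <stddef.h>

/* Hypothetical stand-ins for cublasOperation_t and cublasStatus_t values. */
typedef enum { REF_OP_N, REF_OP_T, REF_OP_C } ref_op_t;
typedef enum { REF_STATUS_SUCCESS, REF_STATUS_INVALID_VALUE } ref_status_t;

static int ref_max1(int x) { return x > 1 ? x : 1; }

static int ref_is_valid_op(ref_op_t op)
{
    return op == REF_OP_N || op == REF_OP_T || op == REF_OP_C;
}

/* Returns REF_STATUS_INVALID_VALUE under the conditions listed for
   CUBLAS_STATUS_INVALID_VALUE in the error table, else REF_STATUS_SUCCESS. */
ref_status_t ref_gemm_check(ref_op_t transa, ref_op_t transb,
                            int m, int n, int k,
                            const void *alpha, int lda, int ldb,
                            const void *beta, const void *C, int ldc)
{
    if (m < 0 || n < 0 || k < 0)
        return REF_STATUS_INVALID_VALUE;
    if (!ref_is_valid_op(transa) || !ref_is_valid_op(transb))
        return REF_STATUS_INVALID_VALUE;
    /* Stored A has m rows when not transposed, k rows otherwise. */
    if (lda < ref_max1(transa == REF_OP_N ? m : k))
        return REF_STATUS_INVALID_VALUE;
    /* Stored B has k rows when not transposed, n rows otherwise. */
    if (ldb < ref_max1(transb == REF_OP_N ? k : n))
        return REF_STATUS_INVALID_VALUE;
    if (ldc < ref_max1(m))
        return REF_STATUS_INVALID_VALUE;
    if (alpha == NULL || beta == NULL)
        return REF_STATUS_INVALID_VALUE;
    /* Assumption: C "needs to be scaled" whenever the m x n output is non-empty. */
    if (C == NULL && m > 0 && n > 0)
        return REF_STATUS_INVALID_VALUE;
    return REF_STATUS_SUCCESS;
}
```

Note how the required lda depends on transa: for transa == CUBLAS_OP_T the stored A is k x m, so lda must cover k rows rather than m.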

For reference, see:

sgemm, dgemm, cgemm, zgemm
