cublas<t>gemm()
cublasStatus_t cublasSgemm(cublasHandle_t handle,
cublasOperation_t transa, cublasOperation_t transb,
int m, int n, int k,
const float *alpha,
const float *A, int lda,
const float *B, int ldb,
const float *beta,
float *C, int ldc)
cublasStatus_t cublasDgemm(cublasHandle_t handle,
cublasOperation_t transa, cublasOperation_t transb,
int m, int n, int k,
const double *alpha,
const double *A, int lda,
const double *B, int ldb,
const double *beta,
double *C, int ldc)
cublasStatus_t cublasCgemm(cublasHandle_t handle,
cublasOperation_t transa, cublasOperation_t transb,
int m, int n, int k,
const cuComplex *alpha,
const cuComplex *A, int lda,
const cuComplex *B, int ldb,
const cuComplex *beta,
cuComplex *C, int ldc)
cublasStatus_t cublasZgemm(cublasHandle_t handle,
cublasOperation_t transa, cublasOperation_t transb,
int m, int n, int k,
const cuDoubleComplex *alpha,
const cuDoubleComplex *A, int lda,
const cuDoubleComplex *B, int ldb,
const cuDoubleComplex *beta,
cuDoubleComplex *C, int ldc)
cublasStatus_t cublasHgemm(cublasHandle_t handle,
cublasOperation_t transa, cublasOperation_t transb,
int m, int n, int k,
const __half *alpha,
const __half *A, int lda,
const __half *B, int ldb,
const __half *beta,
__half *C, int ldc)
This function performs the matrix-matrix multiplication
$$C = \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta C$$
where $\alpha$ and $\beta$ are scalars, and A, B, and C are matrices stored in column-major format with dimensions op(A) m x k, op(B) k x n, and C m x n, respectively. Also, for matrix A:
$$
\mathrm{op}(A) =
\begin{cases}
A & \text{if transa == CUBLAS\_OP\_N}, \\
A^T & \text{if transa == CUBLAS\_OP\_T}, \\
A^H & \text{if transa == CUBLAS\_OP\_C}
\end{cases}
$$
and op(B) is defined similarly for matrix B, using transb.
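
To make the formula concrete, here is a minimal CPU reference sketch of the same computation for real single-precision data. It is not how cuBLAS implements the routine; `ref_sgemm`, `ref_op_at`, and the `ref_op_t` enum are hypothetical names introduced only for illustration (the enum stands in for `cublasOperation_t` so the sketch compiles without CUDA). The conjugate transpose is left out because it coincides with the plain transpose for real data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cublasOperation_t (real data only). */
typedef enum { REF_OP_N, REF_OP_T } ref_op_t;

/* Element (i, j) of op(M), where M is stored column-major
   with leading dimension ld. */
static float ref_op_at(const float *M, int ld, ref_op_t op, int i, int j)
{
    return (op == REF_OP_N) ? M[(size_t)j * ld + i]   /* M(i, j) */
                            : M[(size_t)i * ld + j];  /* M(j, i) */
}

/* C = alpha * op(A) * op(B) + beta * C, all matrices column-major.
   op(A) is m x k, op(B) is k x n, C is m x n. */
void ref_sgemm(ref_op_t transa, ref_op_t transb, int m, int n, int k,
               float alpha, const float *A, int lda,
               const float *B, int ldb,
               float beta, float *C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += ref_op_at(A, lda, transa, i, p) *
                       ref_op_at(B, ldb, transb, p, j);

            float *c = &C[(size_t)j * ldc + i];
            /* When beta == 0 the old contents of C are never read. */
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *c;
        }
    }
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]] stored column-major as {1, 4, 2, 5, 3, 6} (lda = 2) and B = [[7, 8], [9, 10], [11, 12]] stored as {7, 9, 11, 8, 10, 12} (ldb = 3), calling the sketch with alpha = 1 and beta = 0 fills C with {58, 139, 64, 154}, which is [[58, 64], [139, 154]] in column-major order.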
Param. | Memory | In/out | Meaning |
---|---|---|---|
handle | | input | Handle to the cuBLAS library context. |
transa | | input | Operation op(A) that is non- or (conj.) transpose. |
transb | | input | Operation op(B) that is non- or (conj.) transpose. |
m | | input | Number of rows of matrix op(A) and C. |
n | | input | Number of columns of matrix op(B) and C. |
k | | input | Number of columns of op(A) and rows of op(B). |
alpha | host or device | input | Scalar used for multiplication. |
A | device | input | Array of dimensions lda x k with lda>=max(1,m) if transa == CUBLAS_OP_N and lda x m with lda>=max(1,k) otherwise. |
lda | | input | Leading dimension of the two-dimensional array used to store matrix A. |
B | device | input | Array of dimensions ldb x n with ldb>=max(1,k) if transb == CUBLAS_OP_N and ldb x k with ldb>=max(1,n) otherwise. |
ldb | | input | Leading dimension of the two-dimensional array used to store matrix B. |
beta | host or device | input | Scalar used for multiplication. If beta==0, C does not have to be a valid input. |
C | device | in/out | Array of dimensions ldc x n with ldc>=max(1,m). |
ldc | | input | Leading dimension of the two-dimensional array used to store matrix C. |
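
The leading-dimension parameters are easiest to understand through column-major indexing. The sketch below is a host-side illustration only: `IDX2C` follows the indexing macro used in NVIDIA's cuBLAS example code, while `submatrix` is a hypothetical helper introduced here. It shows why lda may be larger than the number of rows: a block inside a bigger matrix keeps the parent's leading dimension.

```c
#include <stddef.h>

/* Linear offset of element (i, j) in a column-major array whose
   consecutive columns are ld elements apart. */
#define IDX2C(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Hypothetical helper: pointer to the block that starts at row r0,
   column c0 of a column-major matrix with leading dimension ld.
   The block is addressed with the same ld as its parent. */
const float *submatrix(const float *M, int ld, int r0, int c0)
{
    return M + IDX2C(r0, c0, ld);
}
```

Under these assumptions, a 2 x 2 block taken from a 4 x 3 parent would be described to a gemm-style routine by the pointer returned from `submatrix(parent, 4, 1, 1)`, two rows, and a leading dimension of 4, which satisfies the lda>=max(1,m) requirement from the table.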
The possible error values returned by this function and their meanings are listed in the table below:
Error Value | Meaning |
---|---|
CUBLAS_STATUS_SUCCESS | The operation completed successfully. |
CUBLAS_STATUS_NOT_INITIALIZED | The library was not initialized. |
CUBLAS_STATUS_INVALID_VALUE | m, n, or k < 0; or transa or transb is not one of CUBLAS_OP_N, CUBLAS_OP_T, CUBLAS_OP_C; or lda < max(1, m) if transa == CUBLAS_OP_N and lda < max(1, k) otherwise; or ldb < max(1, k) if transb == CUBLAS_OP_N and ldb < max(1, n) otherwise; or ldc < max(1, m); or alpha or beta is NULL; or C is NULL when C needs to be scaled. |
CUBLAS_STATUS_EXECUTION_FAILED | The function failed to launch on the GPU. |
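
The CUBLAS_STATUS_INVALID_VALUE conditions can be checked on the host before making the call. The following is a rough sketch of such a pre-check, not the library's own validation code: `gemm_args_ok`, `chk_max`, and the `chk_op_t` enum are hypothetical names (the enum stands in for `cublasOperation_t` so the sketch compiles without CUDA). As a labeled simplification, the condition "C is NULL when C needs to be scaled" is approximated by rejecting a NULL C whenever the output is non-empty.

```c
#include <stddef.h>

/* Hypothetical stand-in for cublasOperation_t. */
typedef enum { CHK_OP_N, CHK_OP_T, CHK_OP_C } chk_op_t;

static int chk_max(int a, int b) { return a > b ? a : b; }

/* Returns 1 if the arguments pass the checks listed for
   CUBLAS_STATUS_INVALID_VALUE, 0 otherwise. */
int gemm_args_ok(chk_op_t transa, chk_op_t transb, int m, int n, int k,
                 const void *alpha, int lda, int ldb,
                 const void *beta, const void *C, int ldc)
{
    if (m < 0 || n < 0 || k < 0) return 0;

    if (transa != CHK_OP_N && transa != CHK_OP_T && transa != CHK_OP_C) return 0;
    if (transb != CHK_OP_N && transb != CHK_OP_T && transb != CHK_OP_C) return 0;

    /* Stored A has m rows when not transposed, k rows otherwise. */
    if (lda < chk_max(1, transa == CHK_OP_N ? m : k)) return 0;
    /* Stored B has k rows when not transposed, n rows otherwise. */
    if (ldb < chk_max(1, transb == CHK_OP_N ? k : n)) return 0;
    if (ldc < chk_max(1, m)) return 0;

    if (alpha == NULL || beta == NULL) return 0;

    /* Simplifying assumption: require C whenever the output is non-empty. */
    if (C == NULL && m > 0 && n > 0) return 0;

    return 1;
}
```

A check like this does not cover CUBLAS_STATUS_NOT_INITIALIZED or CUBLAS_STATUS_EXECUTION_FAILED, so the status returned by the actual call still has to be inspected.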
For references please refer to:
sgemm, dgemm, cgemm, zgemm