- Scalars, Vectors, Matrices and Tensors
- Multiplying Matrices and Vectors
- Identity and Inverse Matrices
- Linear Dependence and Span
- Norms
- Special Kinds of Matrices and Vectors
- Eigendecomposition
- Singular Value Decomposition
- The Moore-Penrose Pseudoinverse
- The Trace Operator
- The Determinant
Scalars, Vectors, Matrices and Tensors
- Scalars: A scalar is just a single number. We usually give scalars lower-case variable names.
- Vector: A vector is an array of numbers. We give vectors lower-case names written in bold typeface, such as x.
\[\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{bmatrix} \]To access \(x_1, x_3, x_6\), we define the set S = {1, 3, 6} and write \(\boldsymbol{x}_S\). We use the - sign to index the complement of a set: \(\boldsymbol{x}_{-1}\) is the vector containing all elements of x except \(x_1\).
- Matrices: A matrix is a 2-D array of numbers. We usually give matrices upper-case variable names with bold typeface, such as A.
\[\mathbf{A} = \begin{bmatrix} A_{1, 1} & A_{1, 2} \\ A_{2, 1} & A_{2, 2} \\ \end{bmatrix} \]We can identify all of the numbers with vertical coordinate \(i\) by writing a ":" for the horizontal coordinate; \(\boldsymbol{A}_{i, :}\) is known as the i-th row of A. Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression, such as \(f(\boldsymbol{A})_{i, j}\), which gives element (i, j) of the matrix computed by applying the function \(f\) to A.
- Tensors: In some cases we will need an array with more than two axes. An array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with this typeface: A.
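As a concrete illustration (not part of the original notes), here is a minimal NumPy sketch of these objects; the array shapes and the set-based indexing are the only points of the example, and the specific values are arbitrary.

```python
import numpy as np

s = 3.5                                   # scalar: a single number
x = np.array([1.0, 2.0, 3.0])             # vector: shape (3,)
A = np.array([[1.0, 2.0], [3.0, 4.0]])    # matrix: shape (2, 2)
T = np.zeros((2, 3, 4))                   # tensor with 3 axes: shape (2, 3, 4)

print(x.shape, A.shape, T.shape)

# Indexing a subset S of the vector (NumPy indices are 0-based):
S = [0, 2]
print(x[S])             # analogous to x_S
print(np.delete(x, 0))  # analogous to x_{-1}: all elements except the first
```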
Multiplying Matrices and Vectors
The matrix product of matrices A and B is a third matrix C = AB, whose entries are given by
\[C_{i, j} = \sum_k A_{i, k}B_{k, j} \]
Matrix product operations have many properties:
- distributive: A(B + C) = AB + AC
- associative: A(BC) = (AB)C
- not commutative: AB = BA does not always hold
Note that the standard product of two matrices is not just a matrix containing the products of the individual elements. Such an operation is called the element-wise product, and is denoted as A \(\odot\) B.
The dot product between two vectors x and y of the same dimensionality is the matrix product \(\mathbf{x}^T\mathbf{y}\).
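A small NumPy sketch contrasting the matrix product, the element-wise product, and the dot product; the matrices and vectors here are arbitrary examples.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

C = A @ B          # matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
E = A * B          # element-wise product: E[i, j] = A[i, j] * B[i, j]

x = np.array([1., 2.])
y = np.array([3., 4.])
d = x @ y          # dot product x^T y, a scalar

# Matrix multiplication is distributive and associative, but not commutative:
print(np.allclose(A @ (B + C), A @ B + A @ C))   # True
print(np.allclose(A @ (B @ C), (A @ B) @ C))     # True
print(np.allclose(A @ B, B @ A))                 # generally False
```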
Identity and Inverse Matrices
The identity matrix that preserves n-dimensional vectors is denoted as \(\boldsymbol{I}_n\). Formally, \(\mathbf{I}_n \in \mathbb{R}^{n \times n}\), and
\[\forall \mathbf{x} \in \mathbb{R}^{n}, \; \mathbf{I}_n\mathbf{x} = \mathbf{x} \]
The matrix inverse of A is denoted as \(\mathbf{A}^{-1}\), and it is defined as the matrix such that
\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n \]
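A minimal sketch of using the inverse to solve Ax = b, assuming an arbitrary invertible example matrix; in practice np.linalg.solve is preferred for numerical stability.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([5., 10.])

I = np.eye(2)                  # identity matrix I_2
A_inv = np.linalg.inv(A)       # matrix inverse A^{-1}

print(np.allclose(A_inv @ A, I))    # A^{-1} A = I
x = A_inv @ b                       # x = A^{-1} b solves Ax = b
print(np.allclose(A @ x, b))

# Numerically preferable: solve the system directly without forming A^{-1}.
print(np.allclose(np.linalg.solve(A, b), x))
```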
Linear Dependence and Span
A linear combination of some set of vectors \(\{\mathbf v^{(1)},\dots, \mathbf v^{(n)}\}\) is given by multiplying each vector \(\mathbf v^{(i)}\) by a corresponding scalar coefficient and adding the results:
\[\sum_i c_i \mathbf{v}^{(i)} \]
The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.
Determining whether \(\mathbf{Ax = b}\) has a solution thus amounts to testing whether b is in the span of the columns of A. This particular span is known as the column space or the range of A.
A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors. A square matrix with linearly dependent columns is known as singular.
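One way to check linear dependence numerically is to compare the matrix rank with the number of columns; this sketch uses an arbitrary example whose third column is the sum of the first two.

```python
import numpy as np

# Third column equals the sum of the first two, so the columns are linearly dependent.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])

rank = np.linalg.matrix_rank(A)
print(rank)                  # 2, which is less than the 3 columns
print(rank < A.shape[1])     # True: the columns are dependent, so A is singular

# A singular square matrix has no inverse; np.linalg.inv(A) would raise LinAlgError.
```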
Norms
Sometimes we need to measure the size of a vector. In machine learning, we usually measure the size of vectors using a function called a norm. Formally, the \(L^p\) norm is given by
\[||\mathbf{x}||_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}} \]
for \(p \in \mathbb{R}, p\geq1\).
A norm is any function \(f\) that satisfies the following properties:
- \(f(\mathbf{x}) = 0 \Rightarrow \mathbf{x}=\mathbf{0}\)
- \(f(\mathbf{x} + \mathbf{y}) \leq f(\mathbf{x}) + f(\mathbf{y})\)
- \(\forall \alpha \in \mathbb{R}, f(\alpha \mathbf{x}) = |\alpha|f(\mathbf{x})\)
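The common special cases (the \(L^1\), \(L^2\), and max norms) can be computed with np.linalg.norm; the vector below is an arbitrary example, and the direct \(L^p\) implementation is included only for comparison.

```python
import numpy as np

x = np.array([3., -4., 0.])

l1   = np.linalg.norm(x, ord=1)       # sum of absolute values = 7
l2   = np.linalg.norm(x, ord=2)       # Euclidean norm = 5
linf = np.linalg.norm(x, ord=np.inf)  # max absolute value = 4

# Direct implementation of the general L^p norm:
def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(np.isclose(lp_norm(x, 2), l2))  # True
```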
Special Kinds of Matrices and Vectors
- Diagonal matrices consist mostly of zeros and have non-zero entries only along the main diagonal. We write diag(v) to denote a square diagonal matrix whose diagonal entries are given by the entries of the vector v. Then we have
\[diag(\mathbf{v})\mathbf{x} = \mathbf{v} \odot \mathbf{x} \]
- A symmetric matrix is any matrix that is equal to its own transpose:
\[\mathbf{A} = \mathbf{A}^T \]
- A unit vector is a vector with unit norm:
\[||\mathbf{x}||_2 = 1 \]
- A vector x and a vector y are orthogonal to each other if \(\mathbf{x}^T\mathbf{y} = 0\). If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.
An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:
\[\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I} \]
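A short sketch checking these definitions numerically; the orthogonal matrix Q is obtained from a QR factorization of an arbitrary random matrix, which is just one convenient way to produce one.

```python
import numpy as np

v = np.array([1., 2., 3.])
x = np.array([4., 5., 6.])

# Diagonal matrix: multiplying by diag(v) is the same as an element-wise product.
D = np.diag(v)
print(np.allclose(D @ x, v * x))

# Symmetric matrix: equal to its own transpose.
S = np.array([[1., 2.], [2., 3.]])
print(np.allclose(S, S.T))

# Orthogonal matrix: rows and columns are orthonormal, so Q^T Q = Q Q^T = I.
Q, _ = np.linalg.qr(np.random.rand(3, 3))
print(np.allclose(Q.T @ Q, np.eye(3)))
print(np.allclose(Q @ Q.T, np.eye(3)))
```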
Eigendecomposition
An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:
\[\mathbf{A}\mathbf{v} = \lambda\mathbf{v} \]
The scalar \(\lambda\) is known as the eigenvalue corresponding to this eigenvector.
Suppose that a matrix A has n linearly independent eigenvectors, \(\{\mathbf{v}^{(1)},\dots,\mathbf{v}^{(n)}\}\), with corresponding eigenvalues \(\{\lambda_1,\dots,\lambda_n\}\). We may concatenate all of the eigenvectors to form a matrix V with one eigenvector per column: V = \([\mathbf{v}^{(1)},\dots,\mathbf{v}^{(n)}]\). Likewise, we can concatenate the eigenvalues to form a vector λ = \([\lambda_1,\dots,\lambda_n]\). The eigendecomposition of A is then given by
\[\mathbf{A} = \mathbf{V}diag(\boldsymbol{\lambda})\mathbf{V}^{-1} \]
Every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:
\[\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T \]
where Q is an orthogonal matrix composed of eigenvectors of A, and Λ is a diagonal matrix of the corresponding eigenvalues.
A matrix whose eigenvalues are all positive is called positive definite. A matrix whose eigenvalues are all positive or zero-valued is called positive semidefinite. If all eigenvalues are negative, the matrix is negative definite.
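A sketch of the eigendecomposition with NumPy; np.linalg.eigh is used because the example matrix (an arbitrary choice) is symmetric, so its eigenvectors are real and orthogonal.

```python
import numpy as np

A = np.array([[4., 1.],
              [1., 3.]])          # an arbitrary real symmetric matrix

eigvals, V = np.linalg.eigh(A)    # columns of V are eigenvectors, eigvals their eigenvalues

# Each eigenvector only changes scale under A: A v = lambda v.
print(np.allclose(A @ V[:, 0], eigvals[0] * V[:, 0]))

# Reconstruct A = V diag(lambda) V^{-1} (here V^{-1} = V^T, since V is orthogonal).
print(np.allclose(V @ np.diag(eigvals) @ V.T, A))

# All eigenvalues are positive, so this matrix is positive definite.
print(np.all(eigvals > 0))
```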
Singular Value Decomposition
In the last section, we saw how to decompose a matrix into eigenvectors and eigenvalues. The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. However, the SVD is more generally applicable: every real matrix has a singular value decomposition, but the same is not true of the eigendecomposition (for example, if a matrix is not square, the eigendecomposition is not defined).
Using the singular value decomposition, we can rewrite A as
\[\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T \]
Suppose that A is an \(m \times n\) matrix. Then U is defined to be an \(m \times m\) matrix, D to be an \(m \times n\) matrix, and V to be an \(n \times n\) matrix. Each of these matrices is defined to have a special structure. The matrices U and V are both defined to be orthogonal matrices. The matrix D is defined to be a diagonal matrix. Note that D is not necessarily square.
The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors.
We can actually interpret the singular value decomposition of A in terms of the eigendecomposition of functions of A. The left-singular vectors of A are the eigenvectors of \(\boldsymbol{AA}^T\). The right-singular vectors of A are the eigenvectors of \(\boldsymbol{A}^T\boldsymbol{A}\). The non-zero singular values of A are the square roots of the eigenvalues of \(\boldsymbol{A}^T\boldsymbol{A}\). The same is true for \(\boldsymbol{AA}^T\).
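A sketch with np.linalg.svd on an arbitrary non-square matrix; note that NumPy returns \(\mathbf{V}^T\) (here called Vh) rather than V, and with full_matrices=True the singular values must be embedded in an m × n diagonal matrix by hand.

```python
import numpy as np

A = np.random.rand(4, 3)               # an arbitrary non-square matrix

U, s, Vh = np.linalg.svd(A, full_matrices=True)   # U: 4x4, s: singular values, Vh: 3x3

D = np.zeros((4, 3))
D[:3, :3] = np.diag(s)                 # embed the singular values in an m x n matrix

print(np.allclose(U @ D @ Vh, A))      # A = U D V^T

# The squared singular values are the eigenvalues of A^T A (up to ordering).
eigvals = np.linalg.eigvalsh(A.T @ A)
print(np.allclose(np.sort(s**2), np.sort(eigvals)))
```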
The Moore-Penrose Pseudoinverse
Matrix inversion is not defined for matrices that are not square, but the Moore-Penrose pseudoinverse lets us make progress anyway. The pseudoinverse of A is defined as
\[\mathbf{A}^{+} = \lim_{\alpha \to 0} (\mathbf{A}^T\mathbf{A} + \alpha\mathbf{I})^{-1}\mathbf{A}^T \]
In practice, it is computed from the SVD as \(\mathbf{A}^{+} = \mathbf{V}\mathbf{D}^{+}\mathbf{U}^T\), where \(\mathbf{D}^{+}\) is obtained by taking the reciprocal of the non-zero singular values of D and transposing the result.
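A sketch using np.linalg.pinv on an arbitrary tall (overdetermined) system, where the pseudoinverse gives the least-squares solution; the comparison with np.linalg.lstsq is only a sanity check.

```python
import numpy as np

A = np.random.rand(5, 3)          # tall matrix: 5 equations, 3 unknowns
b = np.random.rand(5)

A_pinv = np.linalg.pinv(A)        # computed internally via the SVD
x = A_pinv @ b                    # minimizes ||Ax - b||_2

# Matches NumPy's dedicated least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))
```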
The Trace Operator
The trace operator gives the sum of all the diagonal entries of a matrix:
\[Tr(\mathbf{A}) = \sum_i A_{i, i} \]
The Determinant
The determinant of a square matrix, denoted det(A), is a function mapping matrices to real scalars. The determinant is equal to the product of all the eigenvalues of the matrix.
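A quick numerical check that the determinant equals the product of the eigenvalues, using an arbitrary example matrix.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])

det = np.linalg.det(A)
eigvals = np.linalg.eigvals(A)

print(det)                                  # 5.0
print(np.isclose(det, np.prod(eigvals)))    # True: det(A) equals the product of eigenvalues
```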