Linear Algebra

Contents
  • Scalars, Vectors, Matrices and Tensors
  • Multiplying Matrices and Vectors
  • Identity and Inverse Matrices
  • Linear Dependence and Span
  • Norms
  • Special Kinds of Matrices and Vectors
  • Eigendecomposition
  • Singular Value Decomposition
  • The Moore-Penrose Pseudoinverse
  • The Trace Operator
  • The Determinant

Scalars, Vectors, Matrices and Tensors

  • Scalars: A scalar is just a single number. We usually give scalars lower-case variable names.

  • Vectors: A vector is an array of numbers. We give vectors lower-case names written in bold typeface, such as x.

    \[\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{bmatrix} \]

    To access \(x_1, x_3, x_6\), we define the set S = {1, 3, 6} and write \(\boldsymbol{x}_S\). We use the − sign to index the complement of a set: \(\boldsymbol{x}_{-1}\) is the vector containing all elements of x except \(x_1\).

  • Matrices: A matrix is a 2-D array of numbers. We usually give matrices upper-case variable names with bold typeface, such as A.

    \[\mathbf{A} = \begin{bmatrix} A_{1, 1} & A_{1, 2} \\ A_{2, 1} & A_{2, 2} \\ \end{bmatrix} \]

    We can identify all of the numbers with vertical coordinate \(i\) by writing a ":" for the horizontal coordinate; for example, \(\boldsymbol{A}_{i, :}\) is the i-th row of A. Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression: \(f(\boldsymbol{A})_{i, j}\) gives element (i, j) of the matrix computed by applying the function \(f\) to A.

  • Tensors: In some cases we will need an array with more than two axes. An array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with this typeface: A.
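
A quick NumPy sketch (the values are illustrative, not from the text) showing how each of these objects maps to an array with a different number of axes:

```python
import numpy as np

s = 3.5                          # scalar: a single number
x = np.array([1.0, 2.0, 3.0])    # vector: a 1-D array
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # matrix: a 2-D array
T = np.zeros((2, 3, 4))          # tensor: an array with more than two axes

print(x[[0, 2]])                 # indexing a subset of entries, like x_S with S = {1, 3}
print(A[0, :])                   # the first row of A, i.e. A_{1,:}
```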

Multiplying Matrices and Vectors

The matrix product of matrices A and B is a third matrix C, C = AB and

\[C_{i, j} = \sum_kA_{i, k}B_{k, j} \]

Matrix product operations have many properties:

  • distributive: A(B + C) = AB + AC
  • associative: A(BC) = (AB)C
  • not commutative: the condition AB = BA does not always hold

Note that the standard product of two matrices is not just a matrix containing the products of the individual elements. Such an operation is called the element-wise product (or Hadamard product) and is denoted as A \(\odot\) B.

The dot product between two vectors x and y of the same dimensionality is the matrix product \(\mathbf{x}^T\mathbf{y}\).
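
A short NumPy sketch (the matrices are assumed examples) contrasting the matrix product, the element-wise product, and the dot product:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

C = A @ B                        # matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
H = A * B                        # element-wise (Hadamard) product, A ⊙ B
print(np.allclose(A @ B, B @ A)) # False here: the matrix product is not commutative

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(x @ y)                     # dot product x^T y, a scalar (11.0)
```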

Identity and Inverse Matrices

An identity matrix is a matrix that does not change any vector when we multiply the vector by it. The identity matrix that preserves n-dimensional vectors is denoted \(\boldsymbol{I}_n\). Formally, \(\mathbf{I}_n \in \mathbb{R}^{n \times n}\), and

\[\forall \mathbf{x} \in \mathbb{R}^n, \mathbf{I}_n\mathbf{x} = \mathbf{x}. \]

The matrix inverse of A is denoted as \(\mathbf{A}^{-1}\), and it is defined as the matrix such that

\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n \]
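
A minimal NumPy sketch (assuming an invertible example matrix) verifying this definition and using the inverse to solve \(\mathbf{Ax} = \mathbf{b}\):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
I = np.eye(2)                       # the identity matrix I_2

A_inv = np.linalg.inv(A)            # the matrix inverse A^{-1}
print(np.allclose(A_inv @ A, I))    # True: A^{-1} A = I_n

b = np.array([1.0, 2.0])
x = A_inv @ b                       # solve A x = b via the inverse
print(np.allclose(A @ x, b))        # True (np.linalg.solve is preferred numerically)
```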

Linear Dependence and Span

A linear combination of some set of vectors \(\{\mathbf v^{(1)},\dots, \mathbf v^{(n)}\}\) is given by multiplying each vector \(\mathbf v^{(i)}\) by a corresponding scalar coefficient and adding the results:

\[\sum_i c_i \mathbf v^{(i)} \]

The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.

Determining whether \(\mathbf{Ax = b}\) has a solution thus amounts to testing whether b is in the span of the columns of A. This particular span is known as the column space or the range of A.

A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors. A square matrix with linearly dependent columns is known as singular.
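
These conditions can be checked numerically through the matrix rank; the sketch below uses an assumed singular example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # the second column is twice the first -> singular
print(np.linalg.matrix_rank(A))         # 1: the columns are linearly dependent

# b is in the span (column space) of A iff appending b does not increase the rank
b = np.array([1.0, 1.0])
augmented = np.column_stack([A, b])
print(np.linalg.matrix_rank(augmented)) # 2: b is outside the column space, so Ax = b has no solution
```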

Norms

Sometimes we need to measure the size of a vector. In machine learning, we usually measure the size of vectors using a function called a norm. Formally, the \(L^p\) norm is given by

\[||x||_p = \Big(\sum_i|x_i|^p\Big)^{\frac{1}{p}} \]

for \(p \in \mathbb{R}, p\geq1\).

A norm is any function \(f\) that satisfies the following properties:

  • \(f(\mathbf{x}) = 0 \Rightarrow \mathbf{x}=\mathbf{0}\)
  • \(f(\mathbf{x} + \mathbf{y}) \leq f(\mathbf{x}) + f(\mathbf{y})\)
  • \(\forall \alpha \in \mathbb{R}, f(\alpha \mathbf{x}) = |\alpha|f(\mathbf{x})\)
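
A small NumPy sketch (values chosen only for illustration) computing a few common norms, including the general \(L^p\) formula above:

```python
import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))               # L^1 norm: |3| + |-4| = 7
print(np.linalg.norm(x, 2))               # L^2 (Euclidean) norm: sqrt(9 + 16) = 5
print(np.linalg.norm(x, np.inf))          # max norm: largest absolute entry, 4
print((np.abs(x) ** 3).sum() ** (1 / 3))  # the general L^p formula with p = 3
```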

Special Kinds of Matrices and Vectors

  • Diagonal matrices consist mostly of zeros and have non-zero entries only along the main diagonal. We write diag(v) to denote a square diagonal matrix whose diagonal entries are given by the entries of the vector v. Multiplying diag(v) by a vector x only requires scaling each element \(x_i\) by \(v_i\):

    \[diag(\mathbf{v})\mathbf{x} = \mathbf{v} \odot \mathbf{x} \]

  • A symmetric matrix is any matrix that is equal to its own transpose:

    \[\mathbf{A} = \mathbf{A}^T \]

  • A unit vector is a vector with unit norm:

    \[||\mathbf{x}||_2 = 1 \]

  • A vector x and a vector y are orthogonal to each other if \(\mathbf{x}^T\mathbf{y} = 0\). If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.

    An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

    \[\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I} \]
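
The sketch below (using a rotation matrix as an assumed orthogonal example) verifies the diagonal-matrix identity and the orthogonality condition numerically:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])
D = np.diag(v)                            # square diagonal matrix diag(v)
print(np.allclose(D @ x, v * x))          # True: diag(v) x = v ⊙ x

theta = 0.3                               # a 2-D rotation matrix is orthogonal
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: Q^T Q = Q Q^T = I, so Q^T = Q^{-1}
```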

Eigendecomposition

An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A alters only the scale of v:

\[\mathbf{Av} = \lambda \mathbf{v} \]

Suppose that a matrix A has n linearly independent eigenvectors, \(\{\mathbf{v}^{(1)},\dots,\mathbf{v}^{(n)}\}\), with corresponding eigenvalues \(\{\lambda_1,\dots,\lambda_n\}\). We may concatenate all of the eigenvectors to form a matrix V with one eigenvector per column: V = \([\mathbf{v}^{(1)},\dots,\mathbf{v}^{(n)}]\). Likewise, we can concatenate the eigenvalues to form a vector λ = \([\lambda_1,\dots,\lambda_n]\). The eigendecomposition of A is then given by

\[\mathbf{A} = \mathbf{V}diag(\boldsymbol{\lambda})\mathbf{V}^{-1} \]

Every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

\[\boldsymbol{A} = \boldsymbol{Q \Lambda Q}^T \]

where \(\boldsymbol{Q}\) is an orthogonal matrix composed of eigenvectors of A, and \(\boldsymbol{\Lambda}\) is a diagonal matrix whose entries are the corresponding eigenvalues.

A matrix whose eigenvalues are all positive is called positive definite. A matrix whose eigenvalues are all positive or zero-valued is called positive semidefinite. If all eigenvalues are negative, the matrix is negative definite.
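
A minimal NumPy sketch (the symmetric, positive definite matrix is an assumed example) reconstructing A from its eigendecomposition:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                # real symmetric, so A = Q Λ Q^T with real values

eigenvalues, Q = np.linalg.eigh(A)        # eigh is specialized for symmetric matrices
Lambda = np.diag(eigenvalues)
print(np.allclose(A, Q @ Lambda @ Q.T))   # True: reconstruct A from eigenvectors and eigenvalues

print(np.all(eigenvalues > 0))            # True: all eigenvalues positive -> A is positive definite
```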

Singular Value Decomposition

In the last section, we saw how to decompose a matrix into eigenvectors and eigenvalues. The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. The SVD is more generally applicable: every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition (for example, a matrix that is not square has no eigendecomposition).

In singular value decomposition, we can rewrite A as

\[\boldsymbol{A} = \boldsymbol{UDV}^T \]

Suppose that A is an \(m \times n\) matrix. Then U is defined to be an \(m \times m\) matrix, D to be an \(m \times n\) matrix, and V to be an \(n \times n\) matrix. Each of these matrices is defined to have a special structure. The matrices U and V are both defined to be orthogonal matrices. The matrix D is defined to be a diagonal matrix. Note that D is not necessarily square.

The elements along the diagonal of D are known as the singular values of the matrix A. The columns of U are known as the left-singular vectors. The columns of V are known as the right-singular vectors.

We can actually interpret the singular value decomposition of A in terms of the eigendecomposition of functions of A. The left-singular vectors of A are the eigenvectors of \(\boldsymbol{AA}^T\). The right-singular vectors of A are the eigenvectors of \(\boldsymbol{A}^T\boldsymbol{A}\). The non-zero singular values of A are the square roots of the eigenvalues of \(\boldsymbol{A}^T\boldsymbol{A}\). The same is true for \(\boldsymbol{AA}^T\).
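
The following sketch (using a random non-square matrix as an assumed example) verifies the factorization and its relationship to the eigendecomposition of \(\boldsymbol{A}^T\boldsymbol{A}\):

```python
import numpy as np

A = np.random.randn(3, 2)                 # every real matrix has an SVD, square or not

U, s, Vt = np.linalg.svd(A)               # NumPy returns the diagonal of D as the vector s, plus V^T
D = np.zeros((3, 2))
D[:2, :2] = np.diag(s)
print(np.allclose(A, U @ D @ Vt))         # True: A = U D V^T

# non-zero singular values are the square roots of the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)
print(np.allclose(np.sort(s ** 2), np.sort(eigvals)))   # True
```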

The Moore-Penrose Pseudoinverse

Matrix inversion is not defined for matrices that are not square, but the Moore-Penrose pseudoinverse \(\boldsymbol{A}^+\) of A is defined as

\[\boldsymbol{A}^+ = \lim_{\alpha \to 0}(\boldsymbol{A}^T\boldsymbol{A}+ \alpha\boldsymbol{I})^{-1}\boldsymbol{A}^T \]

Practical algorithms compute the pseudoinverse via the singular value decomposition:

\[\boldsymbol{A}^+ = \boldsymbol{VD}^+\boldsymbol{U}^T \]

where \(\boldsymbol{D}^+\) is obtained by taking the reciprocal of the non-zero elements of D and transposing the resulting matrix.
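
A brief sketch (the tall matrix and right-hand side are assumed examples) computing the pseudoinverse with NumPy and comparing it against a least-squares solution:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])               # non-square, so A^{-1} is not defined

A_pinv = np.linalg.pinv(A)               # Moore-Penrose pseudoinverse, computed via the SVD
print(A_pinv.shape)                      # (2, 3)

# For a tall matrix with independent columns, x = A^+ b is the least-squares solution of Ax = b
b = np.array([1.0, 2.0, 3.0])
x = A_pinv @ b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```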

The Trace Operator

The trace operator gives the sum of all of the diagonal entries of a matrix:

\[Tr(\boldsymbol{A})=\sum_i\boldsymbol{A}_{i,i} \]
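
A tiny sketch (values are illustrative) confirming the definition and the fact that the trace is unchanged by transposition:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.trace(A))                            # 5.0 = A[0, 0] + A[1, 1]
print(np.isclose(np.trace(A), np.trace(A.T))) # True: Tr(A) = Tr(A^T)
```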

The Determinant

The determinant of a square matrix, denoted det(A), is a function mapping matrices to real scalars. The determinant is equal to the product of all the eigenvalues of the matrix.
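
A short sketch (the matrix is an assumed example) checking this relationship numerically:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.linalg.det(A))                       # 5.0 = 2*3 - 1*1

eigenvalues = np.linalg.eigvals(A)
print(np.isclose(np.linalg.det(A), np.prod(eigenvalues)))   # True: det(A) is the product of the eigenvalues
```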
