Schur Complement: Consider \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix} with A symmetric and invertible, C symmetric.
Have Ax + By = u \implies x = A^{-1} (u - By)
also B^Tx + Cy = v \implies \begin{align*} v = & B^T A^{-1} (u - By) + Cy \\ = & B^T A^{-1} u + (C - B^T A^{-1}B)y \\ = & B^T A^{-1} u + Sy \end{align*}
where S := C - B^T A^{-1}B is the Schur complement of A in the matrix X := \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}
Consider \inf _u \begin{bmatrix} u^T & v^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \inf _u u^T A u + 2v^T B^T u + v^T C v
for fixed v, with A, C symmetric.
gradient (in u) = 2Au + 2Bv, optimum achieved when Au = -Bv.
Assume A^{-1} exists, optimum achieved at u = -A^{-1} B v
optimum value of the \inf: v^T B^T A^{-1} A A^{-1} B v - 2v^T B^T A^{-1} B v + v^T C v = -v^T B^T A^{-1} B v + v^T C v = v^T(C - B^T A^{-1} B)v = v^T S v.
Lessons: If S \not\succeq 0, then X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \not\succeq 0: pick v with v^T S v \lt 0 and the minimizing u above.
And if A \succ 0, then S \succeq 0 \implies X \succeq 0, since the \inf over u of the quadratic form is exactly v^T S v \ge 0 for every v.
Conclusion: A \succ 0 \implies (X \succeq 0 \iff S \succeq 0)
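The conclusion is easy to sanity-check numerically. Below is a minimal numpy sketch (not from the notes; the sizes, seed, and the `is_psd` helper are my own choices): draw a random A \succ 0, B, and symmetric C, then confirm that the two flags "X \succeq 0" and "S \succeq 0" agree.

```python
# Sanity check of: A ≻ 0 implies (X ⪰ 0 iff S ⪰ 0). Sizes/seed are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # A ≻ 0 by construction
B = rng.standard_normal((n, m))
C0 = rng.standard_normal((m, m))
C = C0 @ C0.T                          # symmetric C

S = C - B.T @ np.linalg.inv(A) @ B     # Schur complement of A in X
X = np.block([[A, B], [B.T, C]])

def is_psd(W, tol=1e-9):
    return np.linalg.eigvalsh(W).min() >= -tol

assert is_psd(X) == is_psd(S)          # must agree whenever A ≻ 0
print("X ⪰ 0:", is_psd(X), "| S ⪰ 0:", is_psd(S))
```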
Example: Consider f(x, Y) = x^T Y^{-1} x \;\; dom \, f = \mathbb{R}^n \times S_{++}^n
Claim: f is convex (meaning: \forall (x_1, Y_1), (x_2, Y_2) \in dom \, f and \forall \theta \in [0, 1], f(\theta x_1 + (1 - \theta) x_2, \theta Y_1 + (1 - \theta) Y_2) \le \theta f(x_1, Y_1) + (1 - \theta) f(x_2, Y_2)).
Proof idea: by the Schur complement lemma above with A = Y \succ 0, x^T Y^{-1} x \le t \iff \begin{bmatrix} Y & x \\ x^T & t \end{bmatrix} \succeq 0, so epi \, f is described by a matrix inequality whose entries are linear in (x, Y, t), hence epi \, f is convex and so is f.
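As a quick numerical check of the claim (a hedged sketch, not part of the notes; the dimension, seed, and helper names are mine), sample two points of dom \, f and verify the inequality along the segment:

```python
# Check convexity of f(x, Y) = xᵀY⁻¹x along a random segment in dom f.
import numpy as np

rng = np.random.default_rng(1)
n = 4

def f(x, Y):
    return x @ np.linalg.solve(Y, x)   # xᵀ Y⁻¹ x without forming Y⁻¹

def random_point():
    M = rng.standard_normal((n, n))
    return rng.standard_normal(n), M @ M.T + np.eye(n)   # Y ≻ 0

(x1, Y1), (x2, Y2) = random_point(), random_point()
for theta in np.linspace(0.0, 1.0, 11):
    lhs = f(theta * x1 + (1 - theta) * x2, theta * Y1 + (1 - theta) * Y2)
    rhs = theta * f(x1, Y1) + (1 - theta) * f(x2, Y2)
    assert lhs <= rhs + 1e-9
print("convexity inequality held at all sampled theta")
```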
Jensen’s Inequality:
1. If f is a convex function then f(\theta _1 x_1 + \cdots + \theta _n x_n) \le \theta _1 f(x_1) + \cdots + \theta _n f(x_n) for all x_1, \cdots, x_n \in dom \, f, \theta _1 + \cdots + \theta _n = 1, \theta _1, \cdots, \theta _n \ge 0
2. f(E[x]) \le E[f(x)] for any random vector x s.t. x \in dom \, f with probability 1.
Recall: if C is convex and x_1, x_2, \cdots \in C, then \sum_{i = 1}^n \theta _i x_i \in C for \theta _i \ge 0, \theta _1 + \theta _2 + \cdots + \theta _n = 1.
Say x is a random vector s.t. x = x_i with probability \theta _i.
Let y_i = f(x_i), epi(f) = \{(x, t) : f(x) \le t\}.
Each (x_i, y_i) \in epi(f), and epi(f) is convex (since f is convex), so the convex combination (\sum \theta _i x_i, \sum \theta_i y_i) \in epi(f).
That is, f(\sum \theta _i x_i) \le \sum \theta_i f(x_i), which is exactly f(E[x]) \le E[f(x)] for this discrete x, so forms 1 and 2 agree.
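A quick numerical illustration of the finite form (a hedged sketch; the choice f = exp and the seed are mine, any convex f would do):

```python
# Check f(Σθᵢxᵢ) ≤ Σθᵢ f(xᵢ) for a sample convex f (here exp) and random weights.
import numpy as np

rng = np.random.default_rng(2)
xs = rng.standard_normal(6)
theta = rng.random(6)
theta /= theta.sum()                   # θᵢ ≥ 0, Σθᵢ = 1

f = np.exp                             # convex on all of ℝ
assert f(theta @ xs) <= theta @ f(xs) + 1e-12
print(f(theta @ xs), "<=", theta @ f(xs))
```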
Example: since f(x) = x^2 is convex, take a, b \gt 0 and let x = \sqrt{a} or \sqrt{b}, each with probability \frac{1}{2}:
\begin{align*} f(E[x]) = & f(\frac{\sqrt{a} + \sqrt{b}}{2}) \\ = & \frac{a + b + 2\sqrt{ab}}{4} \\ \le & E[f(x)] \\ = & \frac{f(\sqrt{a}) + f(\sqrt{b})}{2} \\ = & \frac{a + b}{2} \end{align*}
which rearranges to \sqrt{ab} \le \frac{a + b}{2}, the two-number AM-GM inequality.
Example: Let x_1, \cdots, x_n \gt 0, and \theta _1, \cdots, \theta _n \ge 0 with \theta _1 + \cdots + \theta _n = 1,
then x_1^{\theta _1} x_2^{\theta _2} \cdots x_n^{\theta _n} \le \theta _1 x_1 + \theta _2 x_2 + \cdots + \theta _n x_n (weighted AM-GM: apply Jensen's inequality to the convex function -\log and exponentiate).
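A hedged numerical spot check of weighted AM-GM (the sizes and seed are arbitrary choices of mine):

```python
# Check x₁^θ₁ ⋯ xₙ^θₙ ≤ Σθᵢxᵢ on random positive inputs and convex weights.
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(5) + 0.1                # xᵢ > 0
theta = rng.random(5)
theta /= theta.sum()                   # θᵢ ≥ 0, Σθᵢ = 1

geometric = np.prod(x ** theta)
arithmetic = theta @ x
assert geometric <= arithmetic + 1e-12
print(geometric, "<=", arithmetic)
```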
Example: Let a, b \gt 0, p, q \gt 1, \frac{1}{p} + \frac{1}{q} = 1. Then a^{\frac{1}{p}} b^{\frac{1}{q}} \le \frac{1}{p} a + \frac{1}{q} b (Young's inequality: the previous example with n = 2, \theta_1 = \frac{1}{p}, \theta_2 = \frac{1}{q}).
Hölder’s inequality: x^T y \le \lVert x \rVert _p \cdot \lVert y \rVert _q for \frac{1}{p} + \frac{1}{q} = 1, p, q \gt 1.
1. Can normalize: \lVert x \rVert _p = 1, \lVert y \rVert _q = 1 (both sides are homogeneous in x and in y); may also assume x_i, y_i \ge 0, since replacing entries by absolute values only increases the left side.
2. x^T y = \sum_{i = 1}^n x_i y_i = \sum_{i = 1}^n (x_i^p)^{\frac{1}{p}} (y_i^q)^{\frac{1}{q}} \le \sum_{i = 1}^n \left( \frac{1}{p} x_i^p + \frac{1}{q} y_i^q \right) = \frac{1}{p} + \frac{1}{q} = 1 = \lVert x \rVert _p \cdot \lVert y \rVert _q, using Young's inequality termwise.
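A hedged numerical check of Hölder's inequality (p, the vector length, and the seed are my own arbitrary choices):

```python
# Check xᵀy ≤ ‖x‖ₚ·‖y‖_q for random nonnegative vectors and conjugate p, q.
import numpy as np

rng = np.random.default_rng(4)
p = 3.0
q = p / (p - 1)                        # conjugate exponent: 1/p + 1/q = 1
x, y = rng.random(8), rng.random(8)    # nonnegative entries

lhs = x @ y
rhs = np.sum(x ** p) ** (1 / p) * np.sum(y ** q) ** (1 / q)
assert lhs <= rhs + 1e-12
print(lhs, "<=", rhs)
```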
a^\theta b^{1 - \theta} \le \theta a + (1 - \theta) b. Equivalently, setting a = x^{\frac{1}{\theta}}, b = y^{\frac{1}{1 - \theta}}:
\begin{align*} xy \le & \theta x^{\frac{1}{\theta}} + (1 - \theta) y^{\frac{1}{1 - \theta}} \\ = & \int _0^x t^{\frac{1}{\theta} - 1} dt + \int _0^y t^{\frac{1}{1 - \theta} - 1} dt \\ = & \int _0^x t^{\frac{1 - \theta}{\theta}} dt + \int _0^y t^{\frac{\theta}{1 - \theta}} dt \end{align*}
Geometrically, the first integral is the area under the curve s = t^{\frac{1 - \theta}{\theta}} up to t = x, the second is the area to its left (under the inverse curve) up to height y; together they cover the x \times y rectangle.
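A hedged check of the integral form (a sketch using scipy; the particular \theta, x, y are arbitrary):

```python
# Verify ∫₀ˣ t^{1/θ-1} dt = θx^{1/θ} and that x·y ≤ (sum of the two areas).
import numpy as np
from scipy.integrate import quad

theta, x, y = 0.3, 1.7, 0.9            # any θ ∈ (0,1), x, y > 0
area_x, _ = quad(lambda t: t ** (1 / theta - 1), 0, x)
area_y, _ = quad(lambda t: t ** (1 / (1 - theta) - 1), 0, y)

assert abs(area_x - theta * x ** (1 / theta)) < 1e-9
assert abs(area_y - (1 - theta) * y ** (1 / (1 - theta))) < 1e-9
assert x * y <= area_x + area_y + 1e-9
print(x * y, "<=", area_x + area_y)
```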
Let \mathcal{F} be some collection of convex functions.
Let g(x) = \sup _{f \in \mathcal{F}} f(x).
g(\theta x + (1 - \theta) y) = \sup _{f \in \mathcal{F}} f(\theta x + (1 - \theta) y) \le \sup _{f \in \mathcal{F}} \left[ \theta f(x) + (1 - \theta) f(y) \right] \le \theta \sup _{f \in \mathcal{F}} f(x) + (1 - \theta) \sup _{f \in \mathcal{F}} f(y) = \theta g(x) + (1 - \theta) g(y), so g is convex: a pointwise supremum of convex functions is convex.
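A hedged spot check with a small finite family (the three functions and the seed are my own picks):

```python
# Check that g(x) = max over a family of convex functions satisfies convexity.
import numpy as np

family = [lambda x: x ** 2, lambda x: np.exp(x), lambda x: abs(x - 1)]
g = lambda x: max(f(x) for f in family)   # pointwise supremum

rng = np.random.default_rng(5)
for _ in range(100):
    x, y, th = rng.standard_normal(), rng.standard_normal(), rng.random()
    assert g(th * x + (1 - th) * y) <= th * g(x) + (1 - th) * g(y) + 1e-9
print("g passed the convexity inequality at all samples")
```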
Similarly, if f is convex and g is convex and nondecreasing, then the composition g \circ f is convex: g(f(\theta x + (1 - \theta) y)) \le g(\theta f(x) + (1 - \theta) f(y)) \le \theta g(f(x)) + (1 - \theta) g(f(y)), where the first inequality uses convexity of f plus monotonicity of g, and the second uses convexity of g.
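And a matching hedged check of the composition rule (again with example functions of my choosing):

```python
# Check: f convex, g convex and nondecreasing ⇒ g∘f convex.
import numpy as np

f = lambda x: (x - 1) ** 2             # convex
g = lambda t: np.exp(t)                # convex and nondecreasing
h = lambda x: g(f(x))                  # composition g∘f

rng = np.random.default_rng(6)
for _ in range(100):
    x, y, th = rng.standard_normal(), rng.standard_normal(), rng.random()
    assert h(th * x + (1 - th) * y) <= th * h(x) + (1 - th) * h(y) + 1e-9
print("g∘f passed the convexity inequality at all samples")
```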