Part 1.11 Norms
Every norm for vectors or functions or matrices must share these two properties of the absolute value $|c|$ of a number: $\|cv\| = |c|\,\|v\|$ and $\|v + w\| \le \|v\| + \|w\|$.
Euclidean norm, 1-norm, and infinity norm for a vector all come from $\|v\|_p = (|v_1|^p + \cdots + |v_n|^p)^{1/p}$, with $p = 2$, $p = 1$, and the limit $p \to \infty$ giving $\|v\|_\infty = \max_i |v_i|$.
$v^T w = \|v\|\,\|w\|\cos\theta$
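A quick numerical check of these norms and the cosine formula (a minimal numpy sketch; the example vectors are arbitrary):

```python
import numpy as np

v = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])

# p-norms: ||v||_1, ||v||_2 (Euclidean), ||v||_inf
print(np.linalg.norm(v, 1))       # 7.0 = |3| + |-4|
print(np.linalg.norm(v, 2))       # 5.0 = sqrt(9 + 16)
print(np.linalg.norm(v, np.inf))  # 4.0 = max(|3|, |-4|)

# v^T w = ||v|| ||w|| cos(theta) recovers the angle between v and w
cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.degrees(np.arccos(cos_theta)))
```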
S-norms
Choose any symmetric positive definite matrix $S$. The S-norm is $\|v\|_S^2 = v^T S v$, and the S-inner product is $(v, w)_S = v^T S w$.
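A small sketch with one assumed positive definite $S$:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite (eigenvalues 1 and 3)
v = np.array([1.0, -1.0])
w = np.array([1.0, 1.0])

s_norm_sq = v @ S @ v   # ||v||_S^2 = v^T S v
s_inner   = v @ S @ w   # (v, w)_S  = v^T S w
print(s_norm_sq, s_inner)   # 2.0 and 0.0: v and w happen to be S-orthogonal
```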
Norms of functions
A function f(x) is a “vector in function space”.
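For example, on the interval $[0, 1]$ the $L^2$ norm is $\|f\|^2 = \int_0^1 |f(x)|^2\,dx$, coming from the inner product $(f, g) = \int_0^1 f(x)\,g(x)\,dx$.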
For a vector space to be “complete”, every Cauchy sequence $v_n$ (with $\|v_n - v_m\| \to 0$) must have a limit $v_\infty$ in the space, so that $\|v_n - v_\infty\| \to 0$. A complete normed space is a Banach space. A Hilbert space is a Banach space whose norm comes from an inner product: $(v, v) = \|v\|^2$.
The Frobenius Norm
A matrix norm follows the rules for a vector norm, plus $\|AB\| \le \|A\|\,\|B\|$.
$\|A\|_F^2 = |a_{11}|^2 + \cdots + |a_{mn}|^2$
When $Q$ is an orthogonal matrix, we know that $Qx$ has the same $\ell^2$ length as $x$. Then $\|QB\|_F = \|B\|_F$.
$\|A\|_F^2$ = trace of $A^T A$ = sum of the eigenvalues of $A^T A$ (the squared singular values $\sigma_i^2$).
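These identities are easy to check numerically (a minimal numpy sketch; the random matrices are only illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random orthogonal Q

print(np.linalg.norm(A, 'fro')**2)                     # ||A||_F^2
print(np.trace(A.T @ A))                               # trace of A^T A
print(np.sum(np.linalg.svd(A, compute_uv=False)**2))   # sum of sigma_i^2
print(np.linalg.norm(Q @ A, 'fro'))                    # equals ||A||_F
```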
Matrix Norms from Vector Norms
If we choose the vector $v$ with the largest growth factor, $\|A\| = \max_{v \ne 0} \|Av\| / \|v\|$.
$\ell^2$ norm: largest singular value $\sigma_1$ of $A$
$\ell^1$ norm: largest $\ell^1$ norm of the columns of $A$
$\ell^\infty$ norm: largest $\ell^1$ norm of the rows of $A$
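numpy implements all three induced norms directly (a quick check on one small example matrix):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 2))       # largest singular value sigma_1
print(np.linalg.norm(A, 1))       # largest column sum of |a_ij|: 2 + 4 = 6
print(np.linalg.norm(A, np.inf))  # largest row sum of |a_ij|:    3 + 4 = 7
```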
The Nuclear Norm
Also called the trace norm.
$\|A\|_N = \sigma_1 + \cdots + \sigma_n$. It is the minimum value of $\|U\|_F\,\|V\|_F$ over all factorizations $UV = A$.
$\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$
$\|A\|_2 = \sigma_1$
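All three norms come straight from the singular values (a minimal numpy check on a random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

print(np.sum(sigma),             np.linalg.norm(A, 'nuc'))   # nuclear norm
print(np.sqrt(np.sum(sigma**2)), np.linalg.norm(A, 'fro'))   # Frobenius norm
print(sigma[0],                  np.linalg.norm(A, 2))       # l2 (spectral) norm
```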
Factoring Matrices and Tensors
A matrix is just a two-way tensor.
To compute a factorization A = UV, we introduce a simple alternating iteration. Update U with V fixed, then update V with U fixed. This UV idea also fits the famous k-means algorithm.
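A minimal sketch of that alternating iteration for $A \approx UV$ in the Frobenius norm, with no constraints on the factors (the rank $k$, the random start, and the iteration count are assumptions for illustration; each half-step is an ordinary least-squares solve):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
k = 2                               # target rank (assumed)
U = rng.standard_normal((8, k))
V = rng.standard_normal((k, 6))

for _ in range(100):
    # Update V with U fixed: minimize ||A - UV||_F over V
    V = np.linalg.lstsq(U, A, rcond=None)[0]
    # Update U with V fixed: minimize ||A - UV||_F over U
    U = np.linalg.lstsq(V.T, A.T, rcond=None)[0].T

# The error approaches the best rank-k error (Eckart-Young)
print(np.linalg.norm(A - U @ V, 'fro'))
```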
NMF
The goal of NMF (nonnegative matrix factorization) is to approximate a nonnegative matrix $A \ge 0$ by a lower-rank product $UV$ of two nonnegative matrices $U \ge 0$ and $V \ge 0$.
When $A \ge 0$ is symmetric positive definite, we hope for a matrix $B \ge 0$ that satisfies $B^T B = A$. Often there is no exact solution, and then we look for a $B \ge 0$ with $B^T B$ close enough to $A$.
SPCA (sparse PCA): Find sparse low-rank matrices $B$ and $C$ so that $BC$ is close to $A$.
Facial Feature Extraction: Each column vector of the data matrix A will represent a face. Its components are the intensities of the pixels in that image. The goal is to find a few “basic faces” in B, so that their combinations come close to the many faces in A.
Text Mining and Document Classification
Each column of $A$ represents a document. Each row of $A$ represents a word. NMF identifies topics and classifies the whole set of documents with respect to those topics.
$a_j \approx \sum_i c_{ij} b_i$, where each $b_i$ is a topic vector and $c_{ij}$ is the importance of topic $i$ in document $j$.
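A hedged sketch with scikit-learn's NMF on a toy word-document matrix (the words, counts, and the choice of 2 topics are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows = words, columns = documents (nonnegative word counts)
A = np.array([[3, 2, 0, 0],    # "matrix"
              [2, 3, 0, 1],    # "norm"
              [0, 0, 3, 2],    # "gene"
              [0, 1, 2, 3]])   # "protein"

model = NMF(n_components=2, init='random', random_state=0, max_iter=1000)
B = model.fit_transform(A)   # words x topics: the topic vectors b_i
C = model.components_        # topics x documents: the importances c_ij
print(np.round(B @ C, 1))    # close to A: each document is a mix of topics
```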
NMF is an NP-hard problem.
Computing the Factors: A central idea is alternating factorization. Hold one factor fixed and optimize the other; then hold that one fixed, optimize the first factor, and repeat. Using the Frobenius norm, each step is a form of least squares.
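One standard scheme that keeps both factors nonnegative throughout is the Lee-Seung multiplicative update (a minimal sketch, not the only option; the small eps guards against division by zero):

```python
import numpy as np

def nmf(A, k, steps=500, eps=1e-10):
    """Approximate nonnegative A by U @ V with U >= 0, V >= 0."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    for _ in range(steps):
        V *= (U.T @ A) / (U.T @ U @ V + eps)   # hold U fixed, improve V
        U *= (A @ V.T) / (U @ V @ V.T + eps)   # hold V fixed, improve U
    return U, V

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])   # nonnegative, rank 2
U, V = nmf(A, k=2)
print(np.round(U @ V, 2))        # close to A, with both factors nonnegative
```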
Tensors
A column vector is a 1-way tensor. A matrix is a 2-way tensor. Then a 3-way tensor has elements $T_{ijk}$, indexed by row number $i$, column number $j$, and “tube number” $k$.
The rows and columns and tubes are the fibers of T, with only one varying index.
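In numpy terms, fixing all but one index of a 3-way array extracts a fiber (a small sketch):

```python
import numpy as np

T = np.arange(24).reshape(3, 4, 2)   # 3 rows, 4 columns, 2 tubes

column_fiber = T[:, 1, 0]   # only the row index i varies
row_fiber    = T[0, :, 1]   # only the column index j varies
tube_fiber   = T[2, 3, :]   # only the tube index k varies
print(column_fiber, row_fiber, tube_fiber)
```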
A Color Image is a Tensor with 3 Slices
A black and white image is just a matrix of pixels. The numbers in the matrix are the grayscales of each pixel. Normally those numbers are between zero (black) and 255 (white). A color image is a tensor. It has 3 slices corresponding to red-green-blue. Each slice shows the density of one of the colors RGB.
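In code, a color image is an $I \times J \times 3$ array, and each 2-way slice is one color channel (a minimal sketch with a random "image"):

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(4, 6, 3))   # 4x6 pixels, 3 slices (R, G, B)

red  = image[:, :, 0]        # the red slice: a 4x6 matrix of intensities
gray = image.mean(axis=2)    # averaging the slices gives one grayscale matrix
print(image.shape, red.shape, gray.shape)
```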
The Joint Probability Tensor
Suppose we measure age a in years and height h in inches and weight w in pounds. We put N children into I age groups and J height groups and K weight groups.
The joint probabilities are $p_{ijk} = N_{ijk}/N$. For each combination $i, j, k$ we count only the $N_{ijk}$ children who are in age group $i$ and also height group $j$ and also weight group $k$.
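numpy's histogramdd builds exactly this tensor of counts (a sketch; the children's data here is randomly invented):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
age    = rng.uniform(5, 18, N)     # years
height = rng.uniform(40, 75, N)    # inches
weight = rng.uniform(40, 200, N)   # pounds

# I = 3 age groups, J = 4 height groups, K = 5 weight groups
counts, _ = np.histogramdd((age, height, weight), bins=(3, 4, 5))
p = counts / N                     # p_ijk = N_ijk / N
print(p.shape, p.sum())            # (3, 4, 5) and 1.0: probabilities sum to 1
```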
The Norm and Rank of a Tensor
A tensor can multiply vectors, matrices, or tensors. Then it is a linear operator. A tensor can contain data. Its entries could give the brightness of pixels in an image. A color image is 3-way, stacking RGB. A color video will be a 4-way tensor.