Every norm for vectors or functions or matrices must share these two properties of the absolute value $|c|$ of a number: $\|cv\| = |c|\,\|v\|$ and $\|v + w\| \le \|v\| + \|w\|$.
The Euclidean norm, 1-norm, and infinity norm for a vector are instances of $\|v\|_p = (|v_1|^p + \cdots + |v_n|^p)^{1/p}$ (with $p = 2$, $p = 1$, and the limit $p \to \infty$).
$v^T w = \|v\|\,\|w\|\cos\theta$
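A quick NumPy check of these formulas (the vectors here are arbitrary examples):

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])
w = np.array([1.0, 2.0, 2.0])

# p-norms: ||v||_p = (|v_1|^p + ... + |v_n|^p)^(1/p)
print(np.linalg.norm(v, 1))       # 1-norm: 7.0
print(np.linalg.norm(v, 2))       # Euclidean norm: 5.0
print(np.linalg.norm(v, np.inf))  # infinity norm: 4.0

# the angle between v and w from v^T w = ||v|| ||w|| cos(theta)
cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)
```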

S-norms

Choose any symmetric positive definite matrix $S$. The S-norm is $\|v\|_S^2 = v^T S v$ and the S-inner product is $(v, w)_S = v^T S w$.
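A minimal sketch in NumPy; the SPD matrix $S$ is an arbitrary example:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # eigenvalues 1 and 3 are positive, so S is SPD
v = np.array([1.0, -1.0])
w = np.array([2.0, 0.0])

s_norm_sq = v @ S @ v   # ||v||_S^2 = v^T S v  (here 2.0)
s_inner   = v @ S @ w   # (v, w)_S  = v^T S w
```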

Norms of functions

A function f(x) is a “vector in function space”.

For a vector space to be "complete", every Cauchy sequence $v_n$ must converge to a limit $v_\infty$ inside the space: $\|v_n - v_\infty\| \to 0$.
A complete normed vector space is a Banach space. A Hilbert space is a Banach space whose norm comes from an inner product: $(v, v) = \|v\|^2$.

The Frobenius Norm

A matrix norm follows the rules for a vector norm, plus $\|AB\| \le \|A\|\,\|B\|$.
$\|A\|_F^2 = |a_{11}|^2 + \cdots + |a_{mn}|^2$
When $Q$ is an orthogonal matrix, we know that $Qx$ has the same $\ell^2$ length as $x$. Then $\|QB\|_F = \|B\|_F$.
$\|A\|_F^2 = \operatorname{trace}(A^T A)$ = sum of the eigenvalues of $A^T A$ (the squared singular values of $A$).
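These identities are easy to verify numerically; a minimal sketch with random matrices:

```python
import numpy as np

A = np.random.randn(4, 3)
Q, _ = np.linalg.qr(np.random.randn(4, 4))   # a random orthogonal Q

fro2 = np.linalg.norm(A, 'fro')**2
# ||A||_F^2 = trace(A^T A) = sum of squared singular values
assert np.isclose(fro2, np.trace(A.T @ A))
assert np.isclose(fro2, np.sum(np.linalg.svd(A, compute_uv=False)**2))
# an orthogonal Q leaves the Frobenius norm unchanged: ||QA||_F = ||A||_F
assert np.isclose(np.linalg.norm(Q @ A, 'fro')**2, fro2)
```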

Matrix Norms from Vector Norms

If we choose the vector $v$ with the largest growth factor $\|Av\| / \|v\|$, then $\|A\| = \max_{v \ne 0} \|Av\| / \|v\|$.

$\ell^2$ norm: largest singular value $\sigma_1$
$\ell^1$ norm: largest $\ell^1$ norm of the columns of $A$
$\ell^\infty$ norm: largest $\ell^1$ norm of the rows of $A$
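A NumPy check of all three identifications (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# l2 norm = largest singular value
assert np.isclose(np.linalg.norm(A, 2),
                  np.linalg.svd(A, compute_uv=False)[0])
# l1 norm = largest l1 norm of the columns (max absolute column sum)
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
# l-infinity norm = largest l1 norm of the rows (max absolute row sum)
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())
```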

The Nuclear Norm

Also called the trace norm.

$\|A\|_N = \sigma_1 + \cdots + \sigma_n$. It is the minimum value of $\|U\|_F\,\|V\|_F$ over all factorizations $UV = A$.
$\|A\|_F^2 = \sigma_1^2 + \cdots + \sigma_n^2$
$\|A\|_2 = \sigma_1$
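A sketch verifying both facts: NumPy computes the nuclear norm directly, and splitting the SVD symmetrically between $U$ and $V$ attains the minimum of $\|U\|_F\,\|V\|_F$:

```python
import numpy as np

A = np.random.randn(5, 3)
sigma = np.linalg.svd(A, compute_uv=False)
nuclear = sigma.sum()                          # sigma_1 + ... + sigma_n
assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))

# split the SVD symmetrically: U = U_svd sqrt(Sigma), V = sqrt(Sigma) V_svd^T
U_svd, s, Vt = np.linalg.svd(A, full_matrices=False)
U = U_svd * np.sqrt(s)          # scale column k by sqrt(sigma_k)
V = np.sqrt(s)[:, None] * Vt    # scale row k by sqrt(sigma_k)
assert np.allclose(U @ V, A)
# this factorization achieves ||U||_F ||V||_F = ||A||_N
assert np.isclose(np.linalg.norm(U, 'fro') * np.linalg.norm(V, 'fro'), nuclear)
```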

Factoring Matrices and Tensors

A matrix is just a two-way tensor.

To compute a factorization A = UV, we introduce a simple alternating iteration. Update U with V fixed, then update V with U fixed. This UV idea also fits the famous k-means algorithm.
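A minimal sketch of that alternating iteration for the Frobenius objective, with no constraints on the factors (the function name, rank, and step count are my choices):

```python
import numpy as np

def alternating_ls(A, k, steps=100):
    """Alternate two least-squares solves for A ≈ UV:
    update U with V fixed, then V with U fixed."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((k, n))
    for _ in range(steps):
        # best U for this V: minimize ||A - UV||_F over U
        U = np.linalg.lstsq(V.T, A.T, rcond=None)[0].T
        # best V for this U: minimize ||A - UV||_F over V
        V = np.linalg.lstsq(U, A, rcond=None)[0]
    return U, V
```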

NMF

The goal of NMF is to approximate a nonnegative matrix $A \ge 0$ by a lower-rank product $UV$ of two nonnegative matrices $U \ge 0$ and $V \ge 0$.
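scikit-learn's NMF solves this approximation problem; a minimal usage sketch (the matrix sizes and parameter settings are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import NMF

A = np.abs(np.random.randn(20, 10))        # a nonnegative data matrix
model = NMF(n_components=4, init='random', random_state=0, max_iter=500)
U = model.fit_transform(A)                 # U >= 0, shape (20, 4)
V = model.components_                      # V >= 0, shape (4, 10)
error = np.linalg.norm(A - U @ V, 'fro')   # how close UV comes to A
```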

When $A \ge 0$ is symmetric positive definite, we hope for a matrix $B \ge 0$ that satisfies $B^T B = A$. Often there is no exact solution, and then we look for a $B^T B$ that comes close enough.

SPCA: Find sparse low-rank matrices $B$ and $C$ so that $BC$ is close to $A$.

Facial Feature Extraction: Each column vector of the data matrix A will represent a face. Its components are the intensities of the pixels in that image. The goal is to find a few “basic faces” in B, so that their combinations come close to the many faces in A.

Text Mining and Document Classification

Each column of A represents a document, and each row of A represents a word. NMF identifies topics and classifies the whole set of documents with respect to those topics.

$a_j \approx \sum_i c_{ij} b_i$, where each $b_i$ is a topic and $c_{ij}$ measures the importance of topic $i$ in document $j$.

NMF is an NP-hard problem.

Computing the Factors: A central idea is alternating factorization: Hold one factor fixed, and optimize the other factor. Hold that one fixed, optimize the first factor, and repeat. Using the Frobenius norm, each step is a form of least squares.
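A minimal sketch of that alternation using SciPy's nonnegative least squares solver `scipy.optimize.nnls` (the rank and step count are my choices):

```python
import numpy as np
from scipy.optimize import nnls

def nmf_als(A, k, steps=20):
    """Alternating nonnegative least squares for A ≈ UV with
    U >= 0 and V >= 0. Each half-step solves a least-squares
    problem one column (or row) at a time."""
    m, n = A.shape
    U = np.random.default_rng(0).random((m, k))
    V = None
    for _ in range(steps):
        # hold U fixed: column j of V solves min ||A[:, j] - U v||, v >= 0
        V = np.column_stack([nnls(U, A[:, j])[0] for j in range(n)])
        # hold V fixed: row i of U solves min ||A[i, :] - u V||, u >= 0
        U = np.vstack([nnls(V.T, A[i, :])[0] for i in range(m)])
    return U, V
```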

Tensors

A column vector is a 1-way tensor. A matrix is a 2-way tensor. A 3-way tensor has elements $T_{ijk}$ with a row number $i$, a column number $j$, and a "tube number" $k$.

The rows and columns and tubes are the fibers of T, with only one varying index.
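In NumPy a fiber is a slice with one free index (the tensor here is an arbitrary example):

```python
import numpy as np

T = np.arange(24.0).reshape(3, 4, 2)   # 3 rows, 4 columns, 2 tubes

column_fiber = T[:, 1, 0]   # row index varies: a column of T
row_fiber    = T[0, :, 1]   # column index varies: a row of T
tube_fiber   = T[2, 3, :]   # tube index varies: a tube of T
```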

A Color Image is a Tensor with 3 Slices

A black and white image is just a matrix of pixels. The numbers in the matrix are the grayscales of each pixel. Normally those numbers are between zero (black) and 255 (white). A color image is a tensor. It has 3 slices corresponding to red-green-blue. Each slice shows the density of one of the colors RGB.
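A minimal sketch of this layout in NumPy (the unweighted average at the end is one crude way to reduce color to grayscale; standard conversions weight R, G, B differently):

```python
import numpy as np

# a color image as a 3-way tensor: height x width x 3 slices for R, G, B
image = np.zeros((256, 256, 3), dtype=np.uint8)
image[:, :, 0] = 255            # fill the red slice: a pure red image

red_slice = image[:, :, 0]      # each slice is an ordinary matrix of pixels
grayscale = image.mean(axis=2)  # averaging the slices gives one gray matrix
```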

The Joint Probability Tensor

Suppose we measure age a in years and height h in inches and weight w in pounds. We put N children into I age groups and J height groups and K weight groups.

The joint probabilities are $p_{ijk} = N_{ijk}/N$. For each combination $i, j, k$ we count only the children who are in age group $i$ and also height group $j$ and also weight group $k$.
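A sketch of building this tensor from raw measurements; the data and the group counts $I, J, K$ are invented for illustration, and `np.histogramdd` does the counting:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
ages    = rng.integers(5, 18, N)    # years
heights = rng.normal(55, 8, N)      # inches
weights = rng.normal(90, 25, N)     # pounds

# N_ijk = number of children in age group i, height group j, weight group k
counts, _ = np.histogramdd(np.column_stack([ages, heights, weights]),
                           bins=(4, 5, 6))   # I = 4, J = 5, K = 6 groups

P = counts / N                  # joint probability tensor p_ijk = N_ijk / N
assert np.isclose(P.sum(), 1.0)
```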

The Norm and Rank of a Tensor

A tensor can multiply vectors, matrices, or tensors. Then it is a linear operator. A tensor can contain data. Its entries could give the brightness of pixels in an image. A color image is 3-way, stacking RGB. A color video will be a 4-way tensor.
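A minimal sketch of both roles, with `np.einsum` doing the contraction (shapes are arbitrary; the last line is the Frobenius-style norm, the square root of the sum of squared entries):

```python
import numpy as np

T = np.random.randn(3, 4, 5)   # a 3-way tensor
x = np.random.randn(5)

# T as a linear operator: contract with x along the tube index,
# M_ij = sum_k T_ijk x_k, which produces a matrix
M = np.einsum('ijk,k->ij', T, x)

# T as data: its size measured like the Frobenius norm of a matrix
norm_T = np.sqrt((T ** 2).sum())
```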