Part 1.11 Norms
Every norm for vectors or functions or matrices must share these two properties of the absolute value $|c|$ of a number: $\|cv\| = |c|\,\|v\|$ and $\|v + w\| \le \|v\| + \|w\|$.
Euclidean norm, 1-norm, and infinity norm for a vector all come from $\|v\|_p = (|v_1|^p + \cdots + |v_n|^p)^{1/p}$, with $p = 2$, $p = 1$, and the limit $p \to \infty$ giving $\|v\|_\infty = \max_i |v_i|$.
$v^T w = \|v\|\,\|w\|\cos\theta$
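A quick numerical check of these norms and the cosine formula (a minimal numpy sketch; the example vectors are arbitrary):

```python
import numpy as np

v = np.array([3.0, -4.0])
w = np.array([1.0, 2.0])

# p-norms: ||v||_1, ||v||_2 (Euclidean), ||v||_inf
print(np.linalg.norm(v, 1))       # 7.0 = |3| + |-4|
print(np.linalg.norm(v, 2))       # 5.0 = sqrt(9 + 16)
print(np.linalg.norm(v, np.inf))  # 4.0 = max(|3|, |-4|)

# v^T w = ||v|| ||w|| cos(theta) recovers the angle between v and w
cos_theta = (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.degrees(np.arccos(cos_theta)))
```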
S-norms
Choose any symmetric positive definite matrix $S$. The S-norm is $\|v\|_S^2 = v^T S v$, and the S-inner product is $(v, w)_S = v^T S w$.
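A small sketch with one assumed positive definite $S$:

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite (eigenvalues 1 and 3)
v = np.array([1.0, -1.0])
w = np.array([1.0, 1.0])

s_norm_sq = v @ S @ v   # ||v||_S^2 = v^T S v
s_inner   = v @ S @ w   # (v, w)_S  = v^T S w
print(s_norm_sq, s_inner)   # 2.0 and 0.0: v and w happen to be S-orthogonal
```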
Norms of functions
A function f(x) is a “vector in function space”.
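For example, on the interval $[0, 1]$ the $L^2$ norm is $\|f\|^2 = \int_0^1 |f(x)|^2\,dx$, coming from the inner product $(f, g) = \int_0^1 f(x)\,g(x)\,dx$.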
For a vector space to be “complete”, every Cauchy sequence $v_n$ (with $\|v_n - v_m\| \to 0$) must have a limit $v_\infty$ in the space, so that $\|v_n - v_\infty\| \to 0$. A complete normed space is a Banach space. A Hilbert space is a Banach space whose norm comes from an inner product: $(v, v) = \|v\|^2$.
The Frobenius Norm
A matrix norm follows the rules for a vector norm, plus $\|AB\| \le \|A\|\,\|B\|$.
$\|A\|_F^2 = |a_{11}|^2 + \cdots + |a_{mn}|^2$
When $Q$ is an orthogonal matrix, we know that $Qx$ has the same $\ell^2$ length as $x$. Then $\|QB\|_F = \|B\|_F$.
$\|A\|_F^2$ = trace of $A^T A$ = sum of the eigenvalues of $A^T A$ (the squared singular values $\sigma_i^2$).
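These identities are easy to check numerically (a minimal numpy sketch; the random matrices are only illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random orthogonal Q

print(np.linalg.norm(A, 'fro')**2)                     # ||A||_F^2
print(np.trace(A.T @ A))                               # trace of A^T A
print(np.sum(np.linalg.svd(A, compute_uv=False)**2))   # sum of sigma_i^2
print(np.linalg.norm(Q @ A, 'fro'))                    # equals ||A||_F
```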
Matrix Norms from Vector Norms
If we choose the vector $v$ with the largest growth factor, $\|A\| = \max_{v \ne 0} \|Av\| / \|v\|$.
$\ell^2$ norm: largest singular value $\sigma_1$ of $A$
$\ell^1$ norm: largest $\ell^1$ norm of the columns of $A$
$\ell^\infty$ norm: largest $\ell^1$ norm of the rows of $A$
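numpy implements all three induced norms directly (a quick check on one small example matrix):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 2))       # largest singular value sigma_1
print(np.linalg.norm(A, 1))       # largest column sum of |a_ij|: 2 + 4 = 6
print(np.linalg.norm(A, np.inf))  # largest row sum of |a_ij|:    3 + 4 = 7
```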
The Nuclear Norm
Also called the trace norm.
$\|A\|_N = \sigma_1 + \cdots + \sigma_n$. It is the minimum value of $\|U\|_F\,\|V\|_F$ over all factorizations $UV = A$.
$\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_n^2}$
$\|A\|_2 = \sigma_1$
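All three norms come straight from the singular values (a minimal numpy check on a random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

print(np.sum(sigma),             np.linalg.norm(A, 'nuc'))   # nuclear norm
print(np.sqrt(np.sum(sigma**2)), np.linalg.norm(A, 'fro'))   # Frobenius norm
print(sigma[0],                  np.linalg.norm(A, 2))       # l2 (spectral) norm
```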
Factoring Matrices and Tensors
A matrix is just a two-way tensor.
To compute a factorization A = UV, we introduce a simple alternating iteration. Update U with V fixed, then update V with U fixed. This UV idea also fits the famous k-means algorithm.
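A minimal sketch of that alternating iteration for $A \approx UV$ in the Frobenius norm, with no constraints on the factors (the rank $k$, the random start, and the iteration count are assumptions for illustration; each half-step is an ordinary least-squares solve):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
k = 2                               # target rank (assumed)
U = rng.standard_normal((8, k))
V = rng.standard_normal((k, 6))

for _ in range(100):
    # Update V with U fixed: minimize ||A - UV||_F over V
    V = np.linalg.lstsq(U, A, rcond=None)[0]
    # Update U with V fixed: minimize ||A - UV||_F over U
    U = np.linalg.lstsq(V.T, A.T, rcond=None)[0].T

# The error approaches the best rank-k error (Eckart-Young)
print(np.linalg.norm(A - U @ V, 'fro'))
```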
NMF
The goal of NMF (nonnegative matrix factorization) is to approximate a nonnegative matrix $A \ge 0$ by a lower-rank product $UV$ of two nonnegative matrices $U \ge 0$ and $V \ge 0$.
When $A \ge 0$ is symmetric positive definite, we hope for a matrix $B \ge 0$ that satisfies $B^T B = A$. Often there is no exact solution, and then we look for a $B \ge 0$ with $B^T B$ close enough to $A$.
SPCA (sparse PCA): Find sparse low-rank matrices $B$ and $C$ so that $BC$ is close to $A$.
Facial Feature Extraction: Each column vector of the data matrix A will represent a face. Its components are the intensities of the pixels in that image. The goal is to find a few “basic faces” in B, so that their combinations come close to the many faces in A.
Text Mining and Document Classification
Each column of $A$ represents a document. Each row of $A$ represents a word. NMF identifies topics and classifies the whole set of documents with respect to those topics.
$a_j \approx \sum_i c_{ij} b_i$, where each $b_i$ is a topic vector and $c_{ij}$ is the importance of topic $i$ in document $j$.
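A hedged sketch with scikit-learn's NMF on a toy word-document matrix (the words, counts, and the choice of 2 topics are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import NMF

# Rows = words, columns = documents (nonnegative word counts)
A = np.array([[3, 2, 0, 0],    # "matrix"
              [2, 3, 0, 1],    # "norm"
              [0, 0, 3, 2],    # "gene"
              [0, 1, 2, 3]])   # "protein"

model = NMF(n_components=2, init='random', random_state=0, max_iter=1000)
B = model.fit_transform(A)   # words x topics: the topic vectors b_i
C = model.components_        # topics x documents: the importances c_ij
print(np.round(B @ C, 1))    # close to A: each document is a mix of topics
```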
NMF is an NP-hard problem.
Computing the Factors: A central idea is alternating factorization. Hold one factor fixed and optimize the other; then hold that one fixed, optimize the first factor, and repeat. Using the Frobenius norm, each step is a form of least squares.
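One standard scheme that keeps both factors nonnegative throughout is the Lee-Seung multiplicative update (a minimal sketch, not the only option; the small eps guards against division by zero):

```python
import numpy as np

def nmf(A, k, steps=500, eps=1e-10):
    """Approximate nonnegative A by U @ V with U >= 0, V >= 0."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    for _ in range(steps):
        V *= (U.T @ A) / (U.T @ U @ V + eps)   # hold U fixed, improve V
        U *= (A @ V.T) / (U @ V @ V.T + eps)   # hold V fixed, improve U
    return U, V

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])   # nonnegative, rank 2
U, V = nmf(A, k=2)
print(np.round(U @ V, 2))        # close to A, with both factors nonnegative
```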
Tensors
A column vector is a 1-way tensor. A matrix is a 2-way tensor. Then a 3-way tensor has elements $T_{ijk}$, indexed by row number $i$, column number $j$, and “tube number” $k$.
The rows and columns and tubes are the fibers of T, with only one varying index.
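In numpy terms, fixing all but one index of a 3-way array extracts a fiber (a small sketch):

```python
import numpy as np

T = np.arange(24).reshape(3, 4, 2)   # 3 rows, 4 columns, 2 tubes

column_fiber = T[:, 1, 0]   # only the row index i varies
row_fiber    = T[0, :, 1]   # only the column index j varies
tube_fiber   = T[2, 3, :]   # only the tube index k varies
print(column_fiber, row_fiber, tube_fiber)
```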
A Color Image is a Tensor with 3 Slices
A black and white image is just a matrix of pixels. The numbers in the matrix are the grayscales of each pixel. Normally those numbers are between zero (black) and 255 (white). A color image is a tensor. It has 3 slices corresponding to red-green-blue. Each slice shows the density of one of the colors RGB.
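In code, a color image is an $I \times J \times 3$ array, and each 2-way slice is one color channel (a minimal sketch with a random "image"):

```python
import numpy as np

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(4, 6, 3))   # 4x6 pixels, 3 slices (R, G, B)

red  = image[:, :, 0]        # the red slice: a 4x6 matrix of intensities
gray = image.mean(axis=2)    # averaging the slices gives one grayscale matrix
print(image.shape, red.shape, gray.shape)
```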
The Joint Probability Tensor
Suppose we measure age a in years and height h in inches and weight w in pounds. We put N children into I age groups and J height groups and K weight groups.
The joint probabilities are $p_{ijk} = N_{ijk}/N$. For each combination $i, j, k$ we count only the $N_{ijk}$ children who are in age group $i$ and also height group $j$ and also weight group $k$.
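numpy's histogramdd builds exactly this tensor of counts (a sketch; the children's data here is randomly invented):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
age    = rng.uniform(5, 18, N)     # years
height = rng.uniform(40, 75, N)    # inches
weight = rng.uniform(40, 200, N)   # pounds

# I = 3 age groups, J = 4 height groups, K = 5 weight groups
counts, _ = np.histogramdd((age, height, weight), bins=(3, 4, 5))
p = counts / N                     # p_ijk = N_ijk / N
print(p.shape, p.sum())            # (3, 4, 5) and 1.0: probabilities sum to 1
```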
The Norm and Rank of a Tensor
A tensor can multiply vectors, matrices, or tensors. Then it is a linear operator. A tensor can contain data. Its entries could give the brightness of pixels in an image. A color image is 3-way, stacking RGB. A color video will be a 4-way tensor.