Sketching as a Tool for Numerical Linear Algebra (Part 2)
David P. Woodruff
presented by Sepehr Assadi
o(n) Big Data Reading Group
University of Pennsylvania
February, 2015
Sepehr Assadi (Penn) Sketching for Numerical Linear Algebra Big Data Reading Group 1 / 21

Goal
New survey by David Woodruff:
  - Sketching as a Tool for Numerical Linear Algebra
Topics:
  - Subspace Embeddings
  - Least Squares Regression
  - Least Absolute Deviation Regression
  - Low Rank Approximation
  - Graph Sparsification
  - Sketching Lower Bounds

Introduction
You have “Big” data!
  - Computationally expensive to deal with
  - Excessive storage requirement
  - Hard to communicate
  - ...
Summarize your data
  - Sampling
      - A representative subset of the data
  - Sketching
      - An aggregate summary of the whole data

Model
Input:
  - matrix $A \in \mathbb{R}^{n \times d}$
  - vector $b \in \mathbb{R}^{n}$
Output: function $F(A, b, \ldots)$
  - e.g. least squares regression
Different goals:
  - Faster algorithms
  - Streaming
  - Distributed

Linear Sketching
Input:
  - matrix $A \in \mathbb{R}^{n \times d}$
Let $r \ll n$ and let $S \in \mathbb{R}^{r \times n}$ be a random matrix
Let $S \cdot A$ be the sketch
Compute $F(S \cdot A)$ instead of $F(A)$

Linear Sketching (cont.)
Pros:
  - Compute on an $r \times d$ matrix instead of an $n \times d$ matrix
  - Smaller representation and faster computation
  - Linearity:
      - $S \cdot (A + B) = S \cdot A + S \cdot B$
      - We can compose linear sketches!
Cons:
  - $F(S \cdot A)$ is only an approximation of $F(A)$

Approximate $\ell_2$-regression
Input:
  - matrix $A \in \mathbb{R}^{n \times d}$ (full column rank)
  - vector $b \in \mathbb{R}^{n}$
  - parameter $0 < \varepsilon < 1$
Output $\hat{x} \in \mathbb{R}^{d}$ such that
  $\|A\hat{x} - b\|_2 \le (1 + \varepsilon) \min_{x} \|Ax - b\|_2$

Subspace Embedding
Definition ($\ell_2$-subspace embedding)
A $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for a matrix $A \in \mathbb{R}^{n \times d}$ is a matrix $S$ such that, for all $x \in \mathbb{R}^{d}$,
  $\|SAx\|_2^2 = (1 \pm \varepsilon) \|Ax\|_2^2$
This is really a subspace embedding for the column space of $A$

Previous Session
Oblivious $\ell_2$-subspace embedding
  - The distribution from which $S$ is chosen is oblivious to $A$
  - One very common tool: Johnson-Lindenstrauss transform (JLT)
  - Immediately yields an algorithm for the approximate $\ell_2$-regression problem

Today
Non-oblivious $\ell_2$-subspace embedding
  - The distribution from which $S$ is chosen depends on $A$
  - One very common tool: Leverage Score Sampling
  - Can still be used for the approximate $\ell_2$-regression problem

Leverage Scores
Thin Singular Value Decomposition (SVD) of $A$:
  - $A_{n \times d} = U_{n \times d} \cdot \Sigma_{d \times d} \cdot V^{T}_{d \times d}$
  - The columns of $U$ form an orthonormal basis of the column space of $A$
Leverage score of the $i$-th row of $A$: $\ell_i = \|U_{(i)}\|_2$, where $U_{(i)}$ is the $i$-th row of $U$
Properties:
  - Independent of the choice of basis (a property of the column space)
  - Form a probability distribution after simple normalization (the $\ell_i^2$ sum to $d$)
  - Let $H = A(A^{T}A)^{-1}A^{T}$; then $\ell_i^2 = H_{i,i}$

Leverage Score Sampling
Definition (SampleRescale(n, s, p))
The procedure $S = \mathrm{SampleRescale}(n, s, p)$ returns $S_{s \times n} = D \cdot \Omega$, where each row of $\Omega$ is a random standard basis vector of $\mathbb{R}^{n}$ chosen according to the probability distribution $p$, and $D$ is a diagonal matrix with $D_{i,i} = 1/\sqrt{p_j s}$ if $e_j$ is chosen for the $i$-th row of $\Omega$.
Leverage Score Sampling ($p = \text{LS-Sampling}(A, \beta)$):
  - $p = (p_1, \ldots, p_n)$ is a probability distribution satisfying $p_i \ge \beta \cdot \ell_i^2 / d$, where $\ell_i$ is the $i$-th leverage score of $A_{n \times d}$
  - Compute $S = \mathrm{SampleRescale}(n, s, p)$
  - Return $S \cdot A$

Subspace Embedding via LS-Sampling
Theorem
Let $s = \Theta\!\left(\frac{d \log d}{\beta \varepsilon^2}\right)$, let $S = \mathrm{SampleRescale}(n, s, p)$ for $p = \text{LS-Sampling}(A, \beta)$, and let $U$ be a matrix whose columns form an orthonormal basis of the column space of $A$; then with probability at least 0.99, simultaneously for all $i \in [d]$,
  $1 - \varepsilon \le \sigma_i^2(S \cdot U) \le 1 + \varepsilon$
This immediately implies that $S$ is a subspace embedding for $A$: every vector $Ax$ equals $Uy$ for some $y$, and $\|SUy\|_2^2 = (1 \pm \varepsilon)\|y\|_2^2 = (1 \pm \varepsilon)\|Uy\|_2^2$.

Subspace Embedding via LS-Sampling (cont.)
Proof.
Matrix Chernoff: Suppose $X_1, \ldots, X_s$ are independent copies of a symmetric random matrix $X \in \mathbb{R}^{d \times d}$ with $E[X] = 0$, $\|X\| \le \gamma$, and $\|E[X^{T}X]\| \le \sigma^2$, and let $W = \frac{1}{s}\sum_{i=1}^{s} X_i$; then
  $\Pr(\|W\| > \varepsilon) \le 2d \cdot \exp\!\left(-s\varepsilon^2 / (2\sigma^2 + 2\gamma\varepsilon/3)\right)$

Linear Regression via LS-Sampling
Theorem
Let $s = \Theta\!\left(\frac{d \log d}{\beta \varepsilon^2}\right)$, let $S = \mathrm{SampleRescale}(n, s, p)$ for $p = \text{LS-Sampling}(A, \beta)$, and let $\hat{x} = \arg\min_{x} \|SAx - Sb\|_2$; then with probability at least 0.99,
  $\|A\hat{x} - b\|_2 \le (1 + \varepsilon) \min_{x} \|Ax - b\|_2$

Linear Regression via LS-Sampling (cont.)
Theorem (Approximate Matrix Multiplication)
For an orthonormal matrix $C_{n \times m}$, an arbitrary vector $d_{n \times 1}$, and probabilities $p = (p_1, \ldots, p_n)$ such that
  $p_k \ge \beta \, \|C_{(k)}\|_2^2 / \|C\|_F^2$,
let $S = \mathrm{SampleRescale}(n, s, p)$; then, with probability at least 0.99,
  $\|(SC)^{T}(Sd) - C^{T}d\|_F \le O\!\left(\sqrt{\tfrac{1}{s\beta}}\right) \|C\|_F \, \|d\|_F$
Warning: this statement is neither general nor precise! See [DKM06]

Linear Regression via LS-Sampling (cont.)
Proof of the regression theorem: on the board.

Approximating Leverage Scores
Computing leverage scores exactly is as hard as solving the regression problem!
Can we approximate them?
  - For $\beta = 1/2$, in time $O(nd \log n + d^3)$ [DMIMW12]
  - Improved to $O(\mathrm{nnz}(A) \log n + d^3)$ [CW13]

Questions?

References
[CW13] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pages 81–90. ACM, 2013.
[DKM06] Petros Drineas, Ravi Kannan, and Michael W. Mahoney. Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication. SIAM Journal on Computing, 36(1):132–157, 2006.
[DMIMW12] Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, and David P. Woodruff. Fast approximation of matrix coherence and statistical leverage. The Journal of Machine Learning Research, 13(1):3475–3506, 2012.
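The leverage-score and SampleRescale definitions from the slides can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the survey's implementation: it assumes A has full column rank, uses the exact distribution p_i = l_i^2 / d (so beta = 1), and replaces the thin SVD with Gram-Schmidt, which is valid only because leverage scores do not depend on which orthonormal basis of the column space is used. All function names are hypothetical.

```python
import math
import random


def orthonormal_basis(A):
    """Modified Gram-Schmidt on the columns of A (a list of n rows of length d).

    Assumes A has full column rank. Returns an n x d matrix U (list of rows)
    whose columns are an orthonormal basis of the column space of A.
    """
    n, d = len(A), len(A[0])
    basis = []  # orthonormal columns found so far, each of length n
    for j in range(d):
        w = [A[i][j] for i in range(n)]
        for q in basis:
            dot = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - dot * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return [[basis[j][i] for j in range(d)] for i in range(n)]


def squared_leverage_scores(A):
    """Squared leverage scores l_i^2 = ||U_(i)||_2^2 (squared row norms of U)."""
    return [sum(u * u for u in row) for row in orthonormal_basis(A)]


def ls_sampling(A):
    """Exact leverage-score distribution p_i = l_i^2 / d (the beta = 1 case)."""
    d = len(A[0])
    return [l2 / d for l2 in squared_leverage_scores(A)]


def sample_rescale(n, s, p, rng):
    """Sparse form of S = D * Omega: one (j, scale) pair per row of S.

    Row i of Omega is the basis vector e_j with j drawn from p, and the
    matching diagonal entry of D is 1 / sqrt(p_j * s).
    """
    chosen = rng.choices(range(n), weights=p, k=s)
    return [(j, 1.0 / math.sqrt(p[j] * s)) for j in chosen]


def apply_sketch(S, A):
    """Compute S * A: each sketched row is a rescaled copy of a row of A."""
    return [[scale * a for a in A[j]] for j, scale in S]


# Small example: the columns of A are orthogonal, each with squared norm 3,
# so the squared leverage scores are 1/3, 1/3, 2/3, 2/3 (they sum to d = 2).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
scores = squared_leverage_scores(A)
p = ls_sampling(A)
rng = random.Random(0)  # fixed seed so the run is repeatable
S = sample_rescale(len(A), 6, p, rng)
SA = apply_sketch(S, A)
```

Storing S as (index, scale) pairs rather than as a dense s x n matrix reflects that each row of S has a single nonzero entry, so S * A only needs to touch the sampled rows of A.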