A Large-Deviation Analysis for the Maximum-Likelihood Learning of Tree Structures

Vincent Tan†, Animashree Anandkumar†∗, Lang Tong∗, and Alan Willsky†
† Stochastic Systems Group, LIDS, Massachusetts Institute of Technology
∗ School of ECE, Cornell University
ISIT (Jun 30, 2009)

Motivation

- Given a set of n i.i.d. samples drawn from P, a tree distribution.
- Maximum-likelihood (ML) learning of the structure.
- Large-deviation analysis of the error in learning the set of edges, i.e., the structure.
- Questions:
  1. When does the error probability decay exponentially with n?
  2. What is the exact rate of decay of the probability of error?
  3. How does the rate (error exponent) depend on the parameters of the distribution?

Main Results

- Almost every (true) tree distribution results in exponential decay.
- We quantify the exact rate of decay for a given P.
- The rate of decay is approximately a signal-to-noise ratio (SNR) for learning, which is intuitively appealing.

Notation and Background

- Assume we have a set of samples $x^n = \{x_1, x_2, \ldots, x_n\}$, each drawn i.i.d. from $P \in \mathcal{P}(\mathcal{X}^d)$, where $\mathcal{X}$ is a finite set.
- $x = (x_1, \ldots, x_d) \in \mathcal{X}^d$.
- Vertex set: $V := \{1, \ldots, d\}$. Edge set: $E_P \subset \binom{V}{2}$.
- $P(x)$ is Markov on $T_P = (V, E_P)$, a tree; that is, $P(x)$ factorizes according to $T_P$.
- Example for $P$ with $d = 4$ (a star with node 1 at the center):
\[
P(x) = P_1(x_1) \times \frac{P_{1,2}(x_1, x_2)}{P_1(x_1)} \times \frac{P_{1,3}(x_1, x_3)}{P_1(x_1)} \times \frac{P_{1,4}(x_1, x_4)}{P_1(x_1)}.
\]
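To make the factorization concrete, here is a minimal Python sketch, not from the paper, that builds the d = 4 star distribution above from a root marginal and one conditional per edge (the numerical values are hypothetical) and checks that the product is a valid joint pmf.

```python
import numpy as np

# Hypothetical parameters for the d = 4 star with node 1 at the center,
# binary alphabet X = {0, 1}.
P1 = np.array([0.6, 0.4])                          # marginal of node 1
# One conditional P(x_k | x_1) per edge (1, k); rows are indexed by x_1.
cond = {k: np.array([[0.7, 0.3], [0.2, 0.8]]) for k in (2, 3, 4)}

def P(x):
    """Joint pmf via the tree factorization
    P(x) = P1(x1) * prod_k P_{1,k}(x1, xk) / P1(x1)."""
    x1, x2, x3, x4 = x
    return P1[x1] * cond[2][x1, x2] * cond[3][x1, x3] * cond[4][x1, x4]

# Sanity check: the factorized joint sums to one over X^4.
total = sum(P((a, b, c, d))
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
print(f"sum over X^4 = {total:.6f}")               # -> 1.000000
```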
ML Learning of Tree Distribution (Chow-Liu) I

- Solve the ML problem given $x^n = \{x_1, x_2, \ldots, x_n\}$:
\[
P_{\mathrm{ML}} = \operatorname*{argmax}_{Q \in \mathrm{Trees}} \log Q^n(x^n).
\]
- $\hat{P}(x) = \hat{P}_{x^n}(x)$: the empirical distribution of $x^n$.
- $\hat{P}_e$: the pairwise marginal of $\hat{P}$ on edge $e = (i, j)$.
- The problem reduces to a max-weight spanning tree problem (Chow-Liu 1968):
\[
E_{\mathrm{ML}} = \operatorname*{argmax}_{E_Q : \, Q \in \mathrm{Trees}} \sum_{e \in E_Q} I(\hat{P}_e),
\qquad
I(\hat{P}_e) := \sum_{x_i, x_j} \hat{P}_{i,j}(x_i, x_j) \log \frac{\hat{P}_{i,j}(x_i, x_j)}{\hat{P}_i(x_i)\, \hat{P}_j(x_j)}.
\]
- Pipeline: samples $\Rightarrow$ empirical mutual informations $\{I(\hat{P}_e) : e \in \binom{V}{2}\}$ $\Rightarrow$ max-weight spanning tree.
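Since the pipeline above is the standard Chow-Liu algorithm, a short sketch is easy to give. The following Python is a minimal implementation, assuming discrete samples arrive as an (n, d) NumPy array; the function names and the plain Kruskal/union-find routine are ours, not the paper's.

```python
import itertools
import numpy as np

def empirical_mi(xi, xj):
    """Empirical mutual information I(P^_e) of two discrete sample columns."""
    mi = 0.0
    for a in np.unique(xi):
        for b in np.unique(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(xi == a) * np.mean(xj == b)))
    return mi

def chow_liu_edges(X):
    """Chow-Liu: max-weight spanning tree over empirical pairwise MIs.
    X is an (n, d) array of i.i.d. samples; returns the ML edge set E_ML."""
    d = X.shape[1]
    weights = {(i, j): empirical_mi(X[:, i], X[:, j])
               for i, j in itertools.combinations(range(d), 2)}
    parent = list(range(d))                 # union-find for Kruskal
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    edges = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:                        # keep (i, j) only if acyclic
            parent[ri] = rj
            edges.append((i, j))
    return edges
```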
ML Learning of Tree Distribution (Chow-Liu) II

- True mutual informations $\{I(P_e)\}$ $\Rightarrow$ max-weight spanning tree $\Rightarrow$ $E_P$.
- Empirical mutual informations $\{I(\hat{P}_e)\}$ from $x^n$ $\Rightarrow$ max-weight spanning tree $\Rightarrow$ $E_{\mathrm{ML}}$, possibly with $E_{\mathrm{ML}} \neq E_P$.

Problem Statement

- Define $E_{\mathrm{ML}}$ to be the ML edge set and the error event to be $\{E_{\mathrm{ML}} \neq E_P\}$.
- Find the error exponent $K_P$:
\[
K_P := \lim_{n \to \infty} -\frac{1}{n} \log P\left(\{E_{\mathrm{ML}} \neq E_P\}\right).
\]
- Alternatively, $P(\{E_{\mathrm{ML}} \neq E_P\}) \doteq \exp(-n K_P)$.
- It is easier to consider crossover events first.

The Crossover Rate I

- Given two node pairs $e, e' \in \binom{V}{2}$ with distribution $P_{e,e'} \in \mathcal{P}(\mathcal{X}^4)$ such that $I(P_e) > I(P_{e'})$.
- Consider the crossover event of the empirical mutual informations, $\{I(\hat{P}_e) \le I(\hat{P}_{e'})\}$.
- Definition (crossover rate):
\[
J_{e,e'} := \lim_{n \to \infty} -\frac{1}{n} \log P\left(\{I(\hat{P}_e) \le I(\hat{P}_{e'})\}\right).
\]
- This event may potentially lead to an error in structure learning. Why? If the empirical mutual information of a non-edge overtakes that of a true edge, the max-weight spanning tree can swap them, producing $E_{\mathrm{ML}} \neq E_P$.

The Crossover Rate II

Theorem. The crossover rate for empirical mutual informations is
\[
J_{e,e'} = \min_{Q \in \mathcal{P}(\mathcal{X}^4)} \left\{ D(Q \,\|\, P_{e,e'}) : I(Q_{e'}) = I(Q_e) \right\}.
\]
- Proof: Sanov's theorem and the contraction principle.
- [Figure: $Q^*_{e,e'}$ is the I-projection of $P_{e,e'}$ onto the constraint set $\{I(Q_e) = I(Q_{e'})\}$, at divergence $D(Q^*_{e,e'} \| P_{e,e'})$.]
- The expression is exact but not very intuitive, and the optimization is non-convex. (A Monte Carlo sanity check of the crossover probability follows.)
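Before turning to the approximation, the crossover event itself is easy to probe by simulation. The following sketch is our own construction, not the paper's: two independent symmetric binary pairs, with pair e more strongly dependent than pair e', and a Monte Carlo estimate of the crossover probability at a fixed sample size n.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pair(n, rho):
    """n i.i.d. draws of a symmetric binary pair: X uniform, and
    Y = X with probability (1 + rho)/2, flipped otherwise."""
    x = rng.integers(0, 2, n)
    flip = rng.random(n) > (1 + rho) / 2
    return x, x ^ flip

def emp_mi(x, y):
    """Empirical mutual information of two binary sample columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

# Pair e (rho = 0.5) is more dependent than pair e' (rho = 0.4), so
# I(P_e) > I(P_e'); estimate P(I(P^_e) <= I(P^_e')) by simulation.
n, trials = 200, 20000
crossovers = 0
for _ in range(trials):
    xe, ye = sample_pair(n, 0.5)      # samples for pair e
    xf, yf = sample_pair(n, 0.4)      # samples for pair e' (independent)
    crossovers += emp_mi(xe, ye) <= emp_mi(xf, yf)

p_hat = crossovers / trials
print(f"P(crossover) ~ {p_hat:.4f},  -log(p)/n ~ {-np.log(p_hat) / n:.5f}")
```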
The Crossover Rate III

- Euclidean information theory [Borade & Zheng '08]: if $Q \approx P$, then
\[
D(Q \,\|\, P) \approx \frac{1}{2}\, \|Q - P\|_P^2.
\]
- Definition (very noisy learning condition on $P_{e,e'}$): $P_e \approx P_{e'}$, which implies $I(P_e) \approx I(P_{e'})$.
- Definition: given $P_e = P_{i,j}$, the information density function is
\[
s_e(x_i, x_j) := \log \frac{P_{i,j}(x_i, x_j)}{P_i(x_i)\, P_j(x_j)}, \qquad \forall\, (x_i, x_j) \in \mathcal{X}^2.
\]

The Crossover Rate IV

Recall that $J_{e,e'} := \lim_{n \to \infty} -\frac{1}{n} \log P(\{I(\hat{P}_e) \le I(\hat{P}_{e'})\})$.

Theorem. The approximate crossover rate is
\[
\widetilde{J}_{e,e'} = \frac{\bigl(I(P_{e'}) - I(P_e)\bigr)^2}{2\, \mathrm{Var}(s_{e'} - s_e)}.
\]
- This is a signal-to-noise ratio for structure learning: $\mathrm{SNR} = (\mathrm{mean}/\mathrm{stddev})^2$.

The Crossover Rate V

- Convexify the optimization problem by linearizing the constraint: instead of the KL projection of $P_{e,e'}$ onto the non-convex set $\{I(Q_e) = I(Q_{e'})\}$, take the weighted least-squares projection $\frac{1}{2}\|Q^*_{e,e'} - P_{e,e'}\|^2_{P_{e,e'}}$ onto the linearized constraint set $\mathcal{Q}(P_{e,e'})$.
- The non-convex problem becomes a least-squares problem.
- How good is the approximation? [Figure: for a binary model, the true rate and the approximate rate $J_{e,e'}$ are plotted against $I(P_e) - I(P_{e'})$.] (A numerical sketch of $\widetilde{J}_{e,e'}$ follows.)
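The approximate rate is a closed-form functional of $P_{e,e'}$, so it can be evaluated directly. Below is a minimal Python sketch, assuming a strictly positive joint pmf over $\mathcal{X}^4$ with disjoint pairs $e = (0, 1)$ and $e' = (2, 3)$; the example distribution is hypothetical.

```python
import numpy as np

def mi_and_density(pxy):
    """Mutual information and information density table s_e of a 2-D pmf
    (assumed strictly positive)."""
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    s = np.log(pxy / np.outer(px, py))             # s_e(x_i, x_j)
    return float((pxy * s).sum()), s

def approx_crossover_rate(P4):
    """Approximate rate J~ = (I(P_e') - I(P_e))^2 / (2 Var(s_e' - s_e))
    for pairs e = (0, 1) and e' = (2, 3) of a pmf P4 on X^4."""
    Ie, se = mi_and_density(P4.sum(axis=(2, 3)))   # marginal on pair e
    If, sf = mi_and_density(P4.sum(axis=(0, 1)))   # marginal on pair e'
    diff = sf[None, None, :, :] - se[:, :, None, None]
    mean = (P4 * diff).sum()                       # equals I(P_e') - I(P_e)
    var = (P4 * (diff - mean) ** 2).sum()          # Var under P_{e,e'}
    return (If - Ie) ** 2 / (2 * var)

# Hypothetical example: pair e (strongly correlated) independent of
# pair e' (weakly correlated).
Pe = np.array([[0.4, 0.1], [0.1, 0.4]])            # I(P_e)  ~ 0.19 nats
Pf = np.array([[0.3, 0.2], [0.2, 0.3]])            # I(P_e') ~ 0.02 nats
P4 = np.einsum('ab,cd->abcd', Pe, Pf)              # product joint on X^4
print(f"approx rate = {approx_crossover_rate(P4):.5f}")
```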
Error Exponent for Structure Learning I

- We have characterized the rate $J_{e,e'}$ for the crossover event $\{I(\hat{P}_e) \le I(\hat{P}_{e'})\}$.
- Crossovers: $\{I(\hat{P}_e) \le I(\hat{P}_{e'})\}$. Error event: $\{E_{\mathrm{ML}} \neq E_P\}$.
- How do the crossover rates relate to $K_P$, the overall error exponent in $P(\{E_{\mathrm{ML}} \neq E_P\}) \doteq \exp(-n K_P)$?

Error Exponent for Structure Learning II

Theorem.
\[
K_P = \min_{e' \notin E_P} \; \min_{e \in \mathrm{Path}(e'; E_P)} J_{e,e'},
\]
where $\mathrm{Path}(e'; E_P)$ denotes the edges of $E_P$ on the unique path connecting the endpoints of the non-edge $e'$. (A sketch of this computation closes the document.)

Positivity of the Error Exponent

Theorem. The following statements are equivalent:
(a) The error probability decays exponentially, i.e., $K_P > 0$.
(b) $T_P$ is a spanning tree, i.e., not a proper forest.

Conclusions

- Goal: find the rate of decay of the probability of error,
\[
K_P := \lim_{n \to \infty} -\frac{1}{n} \log P\left(\{E_{\mathrm{ML}} \neq E_P\}\right).
\]
- Employed tools from large-deviation theory and basic properties of trees.
- Found the dominant error tree that relates crossover events to the overall error event.
- Used Euclidean information theory to obtain an intuitive signal-to-noise ratio expression for the crossover rate.
- More details are available on arXiv (http://arxiv.org/abs/0905.0940).
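As a closing illustration, here is a minimal sketch of the error-exponent formula $K_P = \min_{e' \notin E_P} \min_{e \in \mathrm{Path}(e'; E_P)} J_{e,e'}$, assuming a crossover-rate oracle J(e, e') is supplied (for instance the approximate rate above); the helper names and the constant-rate usage example are ours.

```python
import itertools

def path_edges(tree_edges, u, v):
    """Edges on the unique path between u and v in a tree (edge list)."""
    adj = {}
    for a, b in tree_edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent, stack = {u: None}, [u]     # DFS from u, recording parents
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                stack.append(nbr)
    path = []
    while v != u:                      # walk back from v to u
        path.append(tuple(sorted((v, parent[v]))))
        v = parent[v]
    return path

def error_exponent(tree_edges, nodes, J):
    """K_P = min over non-edges e' of min over e in Path(e'; E_P) of J(e, e')."""
    E = {tuple(sorted(e)) for e in tree_edges}
    K = float('inf')
    for ep in itertools.combinations(nodes, 2):
        if ep in E:
            continue                   # e' ranges over non-edges only
        for e in path_edges(tree_edges, *ep):
            K = min(K, J(e, ep))
    return K

# Toy usage: the path 0 - 1 - 2 with a constant-rate oracle returns that rate.
print(error_exponent([(0, 1), (1, 2)], range(3), lambda e, ep: 0.1))  # -> 0.1
```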