Data Mining Seminar: Matrix Sketching
Instructors: Jeff Phillips and Mina Ghashami
Spring 2015 | Fridays 1:45 pm - 3:00 pm
Location: MEB 3147 (the LCR)
Catalog number: CS 7931 or CS 6961
Description:
A very common way to represent a very large data set is as a matrix. For instance, if there are n data points and each data point has d attributes, then the data can be thought of as an n x d matrix A with n rows and d columns. While matrix approximation and decomposition have been studied in numerical linear algebra for many decades, these methods often require more space and time than is feasible in very large scale settings, and often provide more precision than is required. The last decade has witnessed an explosion of work in matrix sketching, where an input matrix A is efficiently approximated by a more compact matrix B (or a product of a few matrices) so that B preserves most of the properties of A up to some guaranteed approximation ratio. This class will attempt to survey the large and growing literature on this topic, focusing on simple algorithms, intuition for error bounds, and practical performance.
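To make the notion of a sketch concrete, here is a minimal illustration (not part of the course materials): it forms a small sketch B of a tall matrix A with a Gaussian random projection, one of the simplest techniques in the "Random Projection and Hashing" session, and checks how well B^T B approximates A^T A. The function name gaussian_sketch and all sizes below are chosen only for this example.

    # A minimal sketch of matrix sketching via Gaussian random projection
    # (illustration only; sizes and names are assumptions, not course code).
    import numpy as np

    def gaussian_sketch(A, ell, seed=None):
        """Return an ell x d sketch B = S A of the n x d matrix A."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        # Entries of S are N(0, 1/ell), so E[S^T S] = I_n and E[B^T B] = A^T A.
        S = rng.standard_normal((ell, n)) / np.sqrt(ell)
        return S @ A

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((10000, 50))   # n = 10000 rows, d = 50 columns
        B = gaussian_sketch(A, ell=500, seed=1)  # only 500 rows kept
        # Relative error of the sketched covariance, in spectral norm.
        err = np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A.T @ A, 2)
        print(f"sketch rows: {B.shape[0]}, relative covariance error: {err:.3f}")

Column sampling, hashing-based projections, and the deterministic Frequent Directions algorithm (GLPW) trade this simplicity for different error guarantees, space bounds, and practical performance; these trade-offs are the subject of the seminar.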
This 1-credit seminar will meet once a week. Instructors will give most lectures. Students will be expected to carry out a small project that explores one or more of the topics we discuss in more depth, pushing the boundaries of research. They will give a short presentation of their results at the end of the semester.
Schedule: (subject to change)
Date | Topic | References | Speaker
Fri 1.16 | Overview | | Jeff Phillips
Fri 1.23 | Column Sampling | Woodruff 2.4; DGP 2.1, 3.1, 5.1; Mahoney | Jeff Phillips
Fri 1.30 | Random Projection and Hashing | Woodruff 2.1; DGP 2.2, 5.2 | Mina Ghashami
Fri 2.06 | Iterative (Frequent Directions) | GLPW; DGP 2.3, 3.2, 5.3 | Jeff Phillips
Fri 2.13 | CUR Decompositions | Woodruff 4.1, 4.2; Mahoney | Mina Ghashami
Fri 2.20 | (No Class - Grad Visit Day) | |
Fri 2.27 | Matrix Concentration Bounds | Tropp (Ch. 5 and 6) | Mina Ghashami
Fri 3.06 | Lower Bounds | Woodruff 6 | Jeff Phillips
Fri 3.13 | Sparsification (@ 3:15 in WEB 1705) | | Mina Ghashami
Fri 3.20 | (Spring Break - No Class) | |
Fri 3.27 | Regression and L1 (and Lp) Bounds (@ 3:15 in WEB 1705) | Woodruff 2.5, 3; YMM | Jeff Phillips
Fri 4.03 | Distributed Models | Woodruff 4.4; GLPW | Mina Ghashami
Fri 4.10 | Tensor Decompositions | | Mina Ghashami
Fri 4.17 | Project Presentations | |
Fri 4.24 | Project Presentations | |
Useful references:
Woodruff: David P. Woodruff. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science, Vol. 10 (2014), pages 1-157.
Tropp: Joel A. Tropp. An Introduction to Matrix Concentration Inequalities. arXiv:1501.01571. To appear in Foundations and Trends in Machine Learning.
GLPW: Mina Ghashami, Edo Liberty, Jeff M. Phillips, and David Woodruff. Frequent Directions: Simple and Deterministic Matrix Sketching. arXiv:1501.01711.
DGP: Amey Desai, Mina Ghashami, and Jeff M. Phillips. Improved Practical Matrix Sketching with Guarantees. arXiv:1501.06561.
Mahoney: Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, Vol. 3 (2011), pages 123-224.
YMM: Jiyan Yang, Xiangrui Meng, and Michael W. Mahoney. Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments. arXiv:1502.03032.