November 27, 2006
The ACM Computing Classification: G.1.3 Numerical Linear Algebra, F.2.1 Numerical Algorithms and Problems
.
The first author is partially supported by the NSF grant DMS 0245380. The second author is an Alfred P. Sloan Research Fellow. He is also partially supported by the NSF grant DMS 0401032.
Sampling from large matrices: an approach through geometric functional analysis
Mark Rudelson
Roman Vershynin
Department of Mathematics, University of Missouri, Columbia, MO 65211, U.S.A. E-mail address : rudelson@math.missouri.edu Department of Mathematics, University of California, Davis, CA 95616, U.S.A. E-mail address : vershynin@math.ucdavis.edu
-
Abstract.
We study random submatrices of a large matrix
. We show how to approximately compute
from its random submatrix of size
with a small error in the spectral norm, where
is the ratio of the squares of the Frobenius (Hilbert-Schmidt) and the spectral norms. One can view
as the “average sparsity” of the matrix; it is always bounded by the rank. This yields an effective algorithm for computing low-rank approximations of
. We also estimate norms of random submatrices of
. We improve known estimates on the cut-norm (the
operator norm). This yields an approximation algorithm for all MAX-2CSP problems (which include many graph problems, such as the MAX-CUT). The solution to a MAX-2SCP problem on
variables can be approximated within the additive error
by the solution of the problem induced by a randomly chosen
variables. The best known sample complexity was
. We improve it to
.
We also find an optimal estimate on the spectral norm of random submatrices of
(the
operator norm). All our results are essentially dimension-free; the picture being only controlled by the norms of the matrix and not by its size or rank. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
1 Introduction
This paper studies random submatrices of a large matrix
. The study of random submatrices spans several decades and is related to diverse areas of mathematics and computer science. Two main reasons for the interest in random submatrices are:
-
(1)
one can learn a matrix
from its random submatrices;
-
(2)
properties of a matrix
may improve by passing to its random submatrices.
We address both aspects of random submatrices in this paper. We show how to approximate
by its random submatrix in the spectral norm, and we compute the asymptotics of the spectral and the cut norms of random submatrices. We look for a dimension-free picture, which is only controlled by the norms of the matrix
and not by its size (or rank). This yields improvements upon known algorithms for computing low rank approximations, Singular Value Decompositions, and approximations to MAX-2CSP problems.
1.1 The spectral norm: low rank approximations and SVD
Can one approximate
by only knowing a random submatrix of
of a fixed size?
One is especially interested in the sample complexity of this problem, the minimal size of a submatrix which yields a good approximation with a small error in some natural norm, and with high probability.
This problem belongs to a class of problems the Statistical Learning Theory is concerned with. These problems inevitably bear the assumption that the the object to be learned belongs to a relatively small “target” class. To be able to learn
from a matrix of small size thus of small rank, we have to assume that
itself has small rank–or can be approximated by an (unknown) matrix of a small rank. We thus strive to find a low rank approximation to a matrix
, whenever such an approximation exists, from only knowing a small random submatrix of
.
Solving this problem is essential for development of fast Monte-Carlo algorithms for computations on large matrices. An extremely large matrix
– say, of the order of
– is impossible to upload into the Random Access Memory (RAM) of a computer; it is instead stored in an external memory. On the other hand, sampling a small submatrix of
, storing it in RAM and computing its small rank approximation is feasible.
The crucial assumption that
is essentially a low rank matrix holds in many applications. For example, this is a model hypothesis in the Latent Semantic Indexing (see [6, 21, 4, 9, 5, 2] ). There
is the “document-term matrix”, which is formed of the frequencies of occurrence of various terms in the documents of a large collection. The hypothesis that the documents are related to a small number of (unknown) topics translates into the assumption that
can be approximated by an (unknown) low rank matrix. Finding such an approximation would determine the “best” topics the collection is really about. Other examples where this problem arises include clustering of graphs [10] , DNA microarray data, facial recognition, web search (see [12] ), lossy data compression and cryptography (see [5] ).
The best fixed rank approximation to
is obviously given by the partial sums of the Singular Value Decomposition (SVD)
where
is the nonincreasing and nonnegative sequence of the singular values of
, and
and
are left and right singular vectors of
respectively. The best rank
approximation to
in both the spectral and Frobenius norms1
is thus
, where
is the orthogonal projection onto the top
left singular vectors of
. In particular, for the spectral norm we have
|
(1)
|
However, computing
, which gives the first elements of the SVD of a
matrix
, is often impossible in practice because (1) it would take many passes through
, which is prohibitively slow for a matrix stored in an external memory; (2) this would take superlinear time in
. Instead, it was proposed in [16, 11, 12, 13] to use the Monte-Carlo methodology: namely, approximate the
-th partial sum of the SVD of
by the
-th partial sum of the SVD of a random submatrix of
. In this paper, we show that this can be done:
-
(1)
with sample complexity
, that is by sampling only
random rows of
, if
is is approximable by a rank
matrix;
-
(2)
in one pass through
if the matrix is stored row-by-row, and in two passes if its entries are stored in arbitrary order;
-
(3)
using RAM space and time
(and polynomial in
and
).
Theorem 1.1.
Let
be an
matrix, and
be an integer. For
, and let
|
(2)
|
Let
be a
matrix consisting of
normalized rows of
, picked independently with replacement, with probabilities proportional to the squares of their Euclidean lengths. Let
be the orthogonal projection onto the top
left singular vectors of
. Then with high probability
|
(3)
|
Here and in the sequel,
denote positive absolute constants.
Comparing 3 with the best approximation 1 given by the SVD, we see an additional error
which can be made small by increasing the size
of the sample.
1.1.1 Sample complexity.
The approximation scheme in Theorem 1.1 is not new; it was developed in [16, 11, 12, 13] .
However, the sample complexity
in Theorem 1.1 improves significantly upon the previously known estimates. The sample complexity is essentially linear in the “average sparsity”
of the matrix;
is obviously bounded by the rank of
. We can view
as a relaxation of the rank, which is stable under small perturbations of
(unlike the rank itself !) Thus for matrices
which are essentially low-rank matrices,
is also small.
This complexity is optimal in many natural cases. For example, if
is a matrix of an orthogonal projection, then one must have
for a meaningful approximation of
by
. But
, thus
. One can also show that the logarithm is needed in 2 .
1.1.2 Law of large numbers for operator-valued random variables
The new feature in our proof of Theorem 1.1 is a use of the first author's argument about random vectors in the isotropic position [22] . It yields a law of large numbers for operator-valued random variables. We apply it for independent copies of a rank one random operator, which is given by a random row of the matrix
.
1.2 The cut-norm: decay, approximation to MAX-CSP problems
Alon, Fernandez de la Vega, Kannan and Karpinski [1, 3] reduced the question of finding additive approximations to the MAX-CSP problems (which are NP-hard) to computing the cut-norm of random submatrices.
The input of a MAX-2CSP problem is a set
of
distinct Boolean functions
on
Boolean variables
, where each function
depends on only two variables. The output
is the maximal number of functions that can be simultaneously set to
by a truth assignment to the variables. This class of problems includes many graph problems, in particular the MAX-CUT problem, in which one wants to maximize, for a given graph, the number of edges that join a subset of vertices with its complement.
The exact problem is NP-hard. To approximately solve MAX-2CSP, it was proposed in [1, 3] to use the Monte-Carlo methodology, that is to approximate the problem by the induced problem on a small random sample of variables. For a subset
of the variables
, then induced problem
consists of the functions from
that depend on the variables from
. Our goal is to find the minimal
(the sample complexity) such that for random subset
of
variables,
|
(4)
|
(note that the maximal number of formulas in
and
is of order of
and
respectively).
Goldreich, Goldwasser and Ron [17] were first to find a polynomial time algorithm for approximating MAX-2CSP within the additive error
. The sample complexity in [17] was
. Independently, Fernandez de la Vega [14] developed a different polynomial time algorithm for MAX-CUT. Alon, Fernandez de la Vega, Kannan and Karpinski [1, 3] improved the sample complexity in 4 to
, which has remained best known up to now. Actually, the same sample complexity of this order was shown in [3] to be sufficient for all MAX-rCSP problems (in which the functions dpend on
variables). By improving the central part of the argument of [3] , which is an estimate of the cut-norm of random submatrices, we shall reduce the sample complexity for MAX-2CSP to
.
We shall call the random subset of expected cardinality
a subset of an
-element set formed by including each element independently with probability
.
Theorem 1.2.
For any
, there exists a positive integer
which satisfies the following. Let
be a collection of Boolean functions on
variables
, each function depending on two variables. Let
be a random subset of the variables
of expected cardinality
.
Then 4 holds with high probability.
This result follows, by the argument of [3] , from an estimate of the cut-norm of random submatrices. The cut norm of an
matrix
is the maximum sum of the entries of its sumbatrix,
and it is equivalent to the
operator norm. We want to see how the cut norm of
decreases when we pass to its random submatrix
, where
is a random subset of
of expected cardinality
. Intuitively,
is
times smaller thatn
if
is diagonal-free, but only
times smaller than
if
is a diagonal matrix. The general case combines both types of decay:
Theorem 1.3.
Let
be an
matrix. Let
be a random subset of
of expected cardinality
. Then
where
is the sum of the Euclidean lengths of the columns of
, and
is the diagonal part of
.
Remark 1.4.
Note that by Cauchy-Schwartz inequality
.
This result improves in several ways the estimate from [3] . The argument of [3] is combinatorial, while our proof uses the technique of probability in normed spaces, and includes decoupling, symmetrization, and application of a version of Slepian's lemma for Bernoulli random variables due to Talagrand.
1.3 The spectral norm: decay
Perhaps the most important matrix norm is the spectral norm (the
operator norm). Nevertheless, its decay under passing to submatrices has not been sufficiently understood. Given an
matrix
, its submatrix
consists of the rows of
indexed by a subset
. If
is a random subset of epected cardinality
(as above), the length of any vector
reduces by the factor of
when one orthogonally projects
onto
. One should then expect similar type of decay of the spectral norm–something like
. However, similarly to the previous section, the decay for diagonal matrices is different–for example, there is no decay at all for the identity matrix. The correct asymptotics in this case is given by
with
. For diagonal matrices, it is the average of
biggest absolute values of the entries. General matrices again combines both types of decay:
Theorem 1.5.
Let
be an
matrix. Let
be a random subset of
of expected cardinality
. Then the random submatrix
formed by the columns of
indexed by
satisfies
Remark 1.6.
The example considered in Remark 2.2 below shows that the coefficient
is necessary here. The same example implies that the logarithmic term is required in Theorem 1.1 as well.
Generalizing an earlier result of Lunin [20] , Kashin and Tzafriri [18] (see [25] ) essentially proved the existence of a subset
of cardinality
and such that
Note that
is the average of the lengths of all columns of
.
As the example of diagonal operators shows, for random subsets
this term has to be replaced by the average of the few biggest columns. Talagrand [23] proved deep results on the more general operator norms
, where
is a
-smooth Banach space.
However, the decay on
in his results is logarithmic.
1.4 Dimension-free approach
Many problems on random submatrices, of both theoretical and practical importance, have functional-analytic rather than linear-algebraic nature. These problems, like those this paper considers, are about estimating operator norms. We thus see a matrix
as a linear operator
between finite dimensional normed spaces – say, between
and
for the spectral norm, and between
and
for the cut norm.
From this perspective, the dimension
of the underlying normed space should play a minor role, while the real control of the picture should be held by (hopefully few) quantities tied to the operator rather than the space. As a trivial example, if
is not of full rank then the dimension
of the range space is useless comparing to the rank of
. Further, we are looking for stable results, those not ruined by small perturbations of the linear operators – this is a natural demand in applications, and this differs our analytic perspective from the linear algebraic one. It would thus be natural to look for stable quantities tied to linear operators, which govern the picture. For example, operator norms are stable quantities, while the rank is not.
This paper advances a dimension free approach to matrices. The low rank approximations in Theorem 1.1 are only controlled by the average sparsity
of the matrix, which is a stable relaxation of the rank. The norms of random matrices in Theorems 1.3 and 1.5 are essentially controlled by the norms of the original matrix (and naturally by the sampling factor, the ratio of the size of the submatrix to the size of the original matrix). The size
of the matrix, which is the dimension of the space, does not play a separate role in these results.
Another part of this picture which we omitted in this paper is the invertibility of submatrices. Dimension-free versions of the invertibility results due to Bourgain and Tzafriri can be found in [25] , se also [8] for random submatrices.
Acknowledgement. This project started when the authors participated in the PIMS Thematic Programme on Asymptotic Geometric Analysis at the University of British Columbia in Summer 2002. The first author was a PIMS postdoctoral fellow at that time. We are grateful to PIMS for its hospitality. The final part of this research was done when the first authour visited University of California, Davis. We are also grateful for R. Kannan for his comments on the initial version of this manuscript.
2 Low rank approximations
In this section, we prove Theorem 1.1 and discuss the algorithm for finding low rank approximations. Our argument is based on the law of large numbers for operator-valued random variables.
2.1 Law of large numbers for operator-valued random variables
Theorem 1.1 is about random independent sampling the rows of the matrix
. The result of such sampling can be viewed as an empirical process taking values in the set of rows. If we sample enough rows, then the matrix constructed from them would nicely approximate the original matrix
in the spectral norm. For the scalar random variables, this effect is the classical Law of Large Numbers. For example, let
be a bounded random variable and let
be independent copies of
. Then Markov inequality implies
The operator-valued analog of this inequality is harder to prove. In this case instead of estimating the absolute value of the difference between the sample mean and the expectation, we have to estimate the norm. Thus instead the large deviation estimate for a single random variable, we have to deal with estimating the supremum of a random process. This requires deeper probabilistic techniques. The following Theorem generalizes the main result of [22] .
Theorem 2.1.
Let
be a random vector in
. Assume for normalization that
. Let
be a natural number and let
be independent copies of
. Then
provided that the last expression is smaller than
.
Remark 2.2.
This estimate is in general optimal. A simplest example is
, the identity matrix in
. Let us sample random rows of
(scaled by
for normalization); this amounts to the random vector
taking values
each with probability
, where
is the canonical basis of
.
Then
. Here and in the sequel, by
we denote the linear operator whose matrix is
. We wish to replace this mean by the sample mean
, where
are independent copies of
. Then
If we want this quantity to be
, then it is not hard to check that
should be of order at least
. Therefore, the coefficient
in Theorem 2.1 is optimal.
2.2 Proof of Theorem 2.1 .
The proof consists of two steps. First we introduce a Rademacher series that majorizes the expectation of the norm. Then we use the main Lemma of [22] to obtain a bound for it.
The first step is relatively standard. Let
be independent Bernoulli variables taking values
with probability
and let
be independent copies of
. Denote
the expectation according to
and
respectively. Since
is a symmetric random variable, we have
| |
| |
To estimate the last expectation, we need the following Lemma [22] .
Lemma 2.3.
Let
be vectors in
and let
be independent Bernoulli variables taking values
with probability
. Then
Remark 2.4.
We can consider the vectors
as vectors in their linear span, so the dimension of the ambient space may be always assumed less or equal to
.
Applying the Lemma, we get
| |
|
(5)
|
We have
| |
| |
Thus, denoting
we obtain by 5
| |
| |
If
we get
which completes the proof of Theorem 1.
2.3 Proof of Theorem 1.1
By the homogeneity, we can assume
.
The following lemma of Drineas and Kannan [11] (see also [12] ) reduces Theorem 1.1 to a comparison of
and a sample
in the spectral norm.
Lemma 2.5 (Drineas, Kannan).
|
(6)
|
-
Proof.
| |
| |
| |
By a result of perturbation theory,
. This proves the claim.
For
let
be the
-th row of the matrix
. Then
We shall view the matrix
as the true mean of a bounded operator valued random variable, whereas
will be its empirical mean; then we shall apply the Law of Large Numbers for operator-valued random variables, that is Theorem 2.1 . To this end, define a random vector
as
and let
be independent copies of
. Let the matrix
consist of rows
. (The normalization of
is different than in the statement of Theorem 1.1 : in the proof, it is convenient to multiply
by the factor
. However note that the singular vectors of
and thus
do not change.) Then
and
Applying Theorem 2.1 we get
whenever the last quantity is less than 1. Let
be an absolute constant. Choose
, then
Hence, by Chebychev's inequality,
with probability at least
. Finally, by 6
This proves Theorem 1.1 .
2.4 Algorithmic aspects of Theorem 1.1 .
Finding a good low rank approximation to a matrix
amounts, due to Theorem 1.1 , to sampling a random submatrix
and computing its SVD (actually, only left singular vectors are needed). The algorithm works well if the “average sparsity”
is small. This is the case, in particular, when
is essentially a low-rank matrix, because
.
First, the algorithm samples
random rows of
. Namely, it should take
independent samples of the random variable
whose law is
where
is the
-th row of
, and
is the Euclidean norm. This sampling can be done in one pass through
if the matrix is stored row-by-row, and in two passes if its entries are stored in arbitrary order ([10,Section5.1] .
Then the algorithm computes the SVD of the
matrix
. This can be done in time
the time needed to compute the SVD of a
matrix. The latter can be done by one of the known methods. This takes significantly less time than computing SVD of the original
matrix
. In particular, the running time of this algorithm is linear in the dimensions of the matrix (and polynomial in
).
3 The cut norm and MAX-CSP problems
We will first prove Theorem 1.3 on the cut norm of random submatrices and then indicate how it, together with the argument of [3] , implies Theorem 1.2 on approximating MAX-2CSP with the sample complexity
.
3.1 Decay of the cut-norm
The proof of Theorem 1.3 uses tools of Probability in Banach spaces: decoupling, symmetrization, and Slepian's Lemma (more precisely, its version for the Rademacher random variables due to M.Talagrand). It is easy to show that the cut norm of any operator
is equivalent to
, which we denote by
. We start with a decoupling lemma due to Bourgain and Tzafriri [7] .
Lemma 3.1.
Let
be a finite sequence of bounded i.i.d. random variables, and
be its independent copy. Then for any sequence of vectors
in a Banach space with
,
Let
be independent Bernoulli random variables taking value 1 with probability
. For
denote by
the coordinate projection on the random set of coordinates
.
Let
be the diagonal part of
. Write
We apply the Lemma 3.1 to estimate the first summand, taking
if
and
if
. Then by the triangle inequality
where
is an independent copy of the sequence
. Then to complete the proof it is enough to assume that the diagonal of
is zero and prove the inequality in the stated in the theorem for
.
Denoting the unit ball of
by
, we write
| |
| |
Since
are mean zero, one can replace
by
, an independent copy of
. More precisely, the first term does not exceed
The random variable
is symmetric, hence it is distributed identically with
, where
are Rademacher random variables independent of everything else (i.e.
valued symmetric r.v.'s). Therefore the expression above is bounded by
To estimate it, we use Slepian's inequality for Rademacher random variables, proved by Talagrand (see [19] ). This estimate, which allows to remove the absolue values, is known as Gine-Zinn argument. Namely, for any
,
So,
| |
| |
| |
| |
| |
Hence we proved that
|
(7)
|
By essentially repeating the above argument, we obtain
| |
| |
and interchanging the expectation and the sum, we then apply Cauchy-Schwartz inequality:
| |
| |
| |
Also,
Putting this into 7 , we complete the proof.
3.2 Approximation of MAX-2CSP
Let us briefly indicate how Theorem 1.2 on approximating MAX-2CSP reduces to Theorem 1.3 on the cut norm of random submatrices. This reduction was done by Alon, Fernandez de la Vega, Kannan and Karpinski [1, 3] .
For a
, one sets up the
matrix
whose
-th entry is the number of functions that depend on
and
and that are made true by the truth assignment
,
. Then the solution
can be computed as the maximum of the polynomial
over the Boolean cube
. Similarly, the solution
to the induced problem is the maximum of the polynomial
determined by the submatrix
.
The problem is then to compare the maxima of the two polynomials,
and
.
This can be proved (although still non-trivially, see (34) in [3] ) in the case when
are “cut-matrices” (or their scalar multiples). A cut matrix is the indicator function of the cartesian product
for some
. To pass from cut matrices to general matrices, one uses the algorithmic version of Szemeredi's Regularity Lemma [15] , which allows to approximate any matrix in the cut norm by a linear combination of a small number of cut matrices. It is easy to check that the maximum of
is stable with respect to small variations of
in the cut norm. But it is a nontrivial question if small variations of
will result in small variations in the induced problem, that is of the random submatrix
. We thus have to show that the cut norm of a matrix decreases proportinally when one passes to its random submatrix. This is where Theorem 1.3 is used.
Specifically, for a matrix
, the algorithmic version of Szemeredi's Regularity Lemma [15] states that there exists at most
cut-matrices whose sum, denoted by
, satisfies
|
(8)
|
where
denotes the maximal absolute value of the entries. Note that
|
(9)
|
because the number of different Boolean functions that can depend on two fixed variables is
. Thus the approximation is good in the cut norm,
.
To prove 4 , it remains to show that this error will decrease to
for the induced problem. This is done by using Theorem 1.3 (see remark below it) along with 8 and 9 :
| |
| |
This establishes nice approximations of both the original and the induced problem, which leads to Theorem 1.2 as described in the previous paragraph.
4 The decay of the spectral norm
In this section we prove Theorem 1.5 .
By homogeneity we can assume that
. The matrix
with columns
can be written as
We have to compute
where
are
-valued independent random variables with
. This will be done by a usual symmetrization and applying Lemma 2.3 .
Now we pass to a detailed proof. The standard symmetrization procedure (see [19] Lemma 6.3) yields
| |
| |
Denote
. We apply Lemma 2.3 to bound
.
By Remark 2.4 , we can assume
in this Lemma equal
Then using the Cauchy-Schwartz inequality we obtain
| |
| |
To estimate the fist term in the product we use the following
Lemma 4.1.
Let
and let
be independent Bernoulli random variables taking value
with probability
. Then
-
Proof.
To prove the upper estimate note that
Hence,
|
(11)
|
Jensen's inequality yields
By the linearity of expectation, the first term equals
Here we estimated
replacing
by 1. Taking the expectation first with respect to
and then with respect to the other
, we obtain that the last expression is bounded by
Finally, substituting this bound into 11 , we obtain
To prove the lower bound, we estimate the product in Lemma 4.1 from below to make the terms independent. We have
| |
|
(12)
|
These terms will be estimated separately. Since
,
Let
. Denote by
the event
for
.
Then
Hence,
Substituting this estimate into 12 finishes the proof.
Now we can complete the proof of Theorem 1.5 . Combining Lemma 4.1 and 10 , we get
Recall that
. It can be easily checked that
implies
.
Hence,
This completes the proof.
References
-
N. Alon, W. de la Vega, R. Kannan, M. Karpinski, Random Sampling and approximation of MAX-CSPs, 34'th ACM STOC (2002)
-
Y. Azar, A. Fiat, A. Karlin, F. McSsherry, J. Saia, Spectral analysis of data, Proceedings of the 33rd ACM Symposium on THeory of Computing, 2001
-
N. Alon, W. de la Vega, R. Kannan, M. Karpinski, Random Sampling and approximation of MAX-CSPs, Journal of Computer and System Sciences 67 (2003), 212-243
-
M. W. Berry, S. T. Dumais, G. W. O'Brian, Using linear algebra for intelligent information retrieval, SIAM Review 37 (1995), 573–595
-
M. W. Berry, Z. Drmac, E. R. Jessup, Matrices, vector spaces and information retrieval, SIAM Review 41 (1999), 335–362
-
M. J. Berry, G. Linoff, Data mining techniques. John-Wiley, 1997
-
J. Bourgain and L. Tzafriri: Invertibility of “large” sumatricies with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math., 57 (1987) 137–223
-
P. G. Casazza, R. Vershynin, Kadison-Singer meets Bourgain-Tzafriri, submitted
-
S. T. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, R. H. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science 41 (1990), 391–407
-
P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, Clustering in large graphs via Singular Value Decomposition, Journal of Machine Learning, to appear (2004)
-
P. Drineas, R. Kannan, Pass efficient algorithms for approximating large matrices, Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD, 2003), 223–232, ACM, New York, 2003
-
P. Drineas, R. Kannan, M. Mahoney, Fast Monte-Carlo Algorithms for Matrices II: Computing a low-rank approximation to a matrix, preprint
-
P. Drineas, M. Mahoney, R. Kannan, Fast Monte-Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix, preprint
-
W. Fernandez de la Vega, MAX-CUT has a randomized approximation scheme in dense graphs, Random Structures and Algorithms 8 (1996), 187–199
-
A. Frieze and R. Kannan, The Regularity Lemma and approximation schemes for dense problems, Proceedings of the 37th Annual IEEE Symposium on Foundations of Computing (1996), 12–20
-
A. Frieze, R. Kannan and S. Vempala, Fast Monte-Carlo Algorithms for finding low-rank approximations, Proceedings of the Foundations of Computer Science, 1998, pp. 378–390, journal version in Journal of the ACM 51 (2004), 1025-1041
-
O. Goldreich, S. Goldwasser, D. Ron, Property testing and its connection to learning and approximation, Proc. 37th IEEE FOCS 96, 339–348; journal version in Journal of the ACM 45 (1998), 653–750
-
B. Kashin, L. Tzafriri, Some remarks on the restrictions of operators to coordinate subspaces, unpublished notes
-
M. Ledoux and M. Talagrand, Probability in Banach spaces, Springer, 1991
-
A. A. Lunin, On operator norms of submatrices, Math. USSR Sbornik 27 (1975), 481–502
-
C. H. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: A probabilistic analysis, Proceedings of the ACM symposium on principles of database systems, 1998
-
M. Rudelson, Random vectors in isotropipc position, J. Funct. Anal. 164 (1999), no. 1, 60–72.
-
M. Talagrand, Sections of smooth convex bodies via majorizing measures, Acta MAth. 175 (1995), 273–300
-
M. Talagrand, Majorizing measures: the generic chaining, Ann. Probab. 24 (1996), 1049–1103
-
R. Vershynin, John's decompositions: selecting a large part, Israel Journal of Mathematics 122 (2001), 253–277
Department of Mathematics, University of Missouri, Columbia, MO 65211, U.S.A. E-mail address : rudelson@math.missouri.edu Department of Mathematics, University of California, Davis, CA 95616, U.S.A. E-mail address : vershynin@math.ucdavis.edu