Papers

The titles below are links to arXiv versions; the bibliographic entries are links to the published versions.

Concentration and convergence rates for spectral measures of random matrices (with Mark Meckes) -- submitted.

Abstract: The topic of this paper is the typical behavior of the spectral measures of large random matrices drawn from several ensembles of interest, including in particular matrices drawn from Haar measure on the classical Lie groups, random compressions of random Hermitian matrices, and the so-called random sum of two independent random matrices. In each case, we estimate the expected Wasserstein distance from the empirical spectral measure to a deterministic reference measure, and prove a concentration result for that distance. As a consequence we obtain almost sure convergence of the empirical spectral measures in all cases.

Personal Note: I think of this paper as being unofficially dedicated to our children: Peter, who stubbornly refused to be born while most of the work in this paper was done; and Juliette, who told me one morning that it would make her happy if I proved a theorem that day (I'm pretty sure it was what became Theorem 3.5).

Projections of probability distributions: A measure-theoretic Dvoretzky theorem -- to appear in Geometric Aspects of Functional Analysis: Papers from the Israel Seminar (GAFA), 2011.

Abstract: Many authors have studied the phenomenon of typically Gaussian marginals of high-dimensional random vectors; e.g., for a probability measure on R^d, under mild conditions, most one-dimensional marginals are approximately Gaussian if d is large. In earlier work, the author used entropy techniques and Stein's method to show that this phenomenon persists for k-dimensional marginals of d-dimensional distributions, if k=o(√(log(d))). In this paper, a somewhat different approach is used to show that this phenomenon persists if k<2log(d)/log(log(d)), and that this estimate is best possible.

Note: This paper was previously titled "Projections of random vectors via the generic chaining". Michel Talagrand has kindly pointed out that in fact the generic chaining is not necessary for the proof and that Dudley's entropy bound suffices to obtain the result. Also, since that version, additional discussion on the parallels with Dvoretzky's theorem has been added; the connection had previously been left for the cognoscenti to notice on their own.

Limit theorems for Betti numbers of random simplicial complexes (with Matthew Kahle) -- submitted.

Abstract: There have been several recent articles studying homology of various types of random simplicial complexes. Several theorems have concerned thresholds for vanishing of homology, and in some cases expectations of the Betti numbers. However little seems known so far about limiting distributions of random Betti numbers. In this article we establish Poisson and normal approximation theorems for Betti numbers of different kinds of random simplicial complex: Erdős-Rényi random clique complexes, random Vietoris-Rips complexes, and random Čech complexes. These results may be of practical interest in topological data analysis.

Another observation about operator compressions (with Mark Meckes) -- Proc. Amer. Math. Soc. 139 (2011).

Abstract: Let T be a self-adjoint operator on a finite dimensional Hilbert space. It is shown that the distribution of the eigenvalues of a compression of T to a subspace of a given dimension is almost the same for almost all subspaces. This is a coordinate-free analogue of a recent result of Chatterjee and Ledoux on principal submatrices. The proof is based on measure concentration and entropy techniques, and the result improves on some aspects of the result of Chatterjee and Ledoux.

Approximation of projections of random vectors -- to appear in J. Theoret. Probab.

Abstract: Let X be a d-dimensional random vector and X_θ its projection onto the span of a set of orthonormal vectors {θ_1,...,θ_k}. Conditions on the distribution of X are given such that if θ is chosen according to Haar measure on the Stiefel manifold, the bounded-Lipschitz distance from X_θ to a Gaussian distribution is concentrated at its expectation; furthermore, an explicit bound is given for the expected distance, in terms of d, k, and the distribution of X, allowing consideration not just of fixed k but of k growing with d. The results are applied in the setting of projection pursuit, showing that most k-dimensional projections of n data points in R^d are close to Gaussian, when n and d are large and k=c√(log(d)) for a small constant c.

Please note: In the published version of this paper, there is a misprint in the last sentence of the abstract; it says there that k=c(log(d)). The statement above is in fact what follows from the proofs in this paper. For the sharp rate, see the paper "Projections of probability distributions: A measure-theoretic Dvoretzky theorem" above.

On Stein's method for multivariate normal approximation -- in High Dimensional Probability V: The Luminy Volume (2009).

Abstract: The purpose of this paper is to synthesize the approaches taken by Chatterjee-Meckes and Reinert-Röllin in adapting Stein's method of exchangeable pairs for multivariate normal approximation. The more general linear regression condition of Reinert-Röllin allows for wider applicability of the method, while the method of bounding the solution of the Stein equation due to Chatterjee-Meckes allows for improved convergence rates. Two abstract normal approximation theorems are proved, one for use when the underlying symmetries of the random variables are discrete, and one for use in contexts in which continuous symmetry groups are present. The application to runs on the line from Reinert-Röllin is reworked to demonstrate the improvement in convergence rates, and a new application to joint value distributions of eigenfunctions of the Laplace-Beltrami operator on a compact Riemannian manifold is presented.

Quantitative asymptotics of graphical projection pursuit -- Electron. Comm. Probab. 14 (2009).

Abstract: There is a result of Diaconis and Freedman which says that, in a limiting sense, for large collections of high-dimensional data most one-dimensional projections of the data are approximately Gaussian. This paper gives quantitative versions of that result. For a set of deterministic vectors {x_i}_{i=1}^n in R^d with n and d fixed, let θ be a random point of the sphere S^{d-1} and let μ_n^θ denote the random measure which puts mass 1/n at each of the points {x_1·θ,...,x_n·θ}. For a fixed bounded Lipschitz test function f, Z a standard Gaussian random variable and σ^2 a suitable constant, an explicit bound is derived for the probability that the integral of f with respect to μ_n^θ differs from the expected value of f(σZ) by more than ε. A bound is also given for the deviations about zero of the bounded Lipschitz distance between μ_n^θ and the law of σZ, which yields a lower bound on the waiting time to finding a non-Gaussian projection of the {x_i} if directions are tried independently and uniformly on S^{d-1}.

Two multivariate central limit theorems -- preprint (one of the two was improved and incorporated into "On Stein's method for multivariate normal approximation" above).

Abstract: In this paper, explicit error bounds are derived in the approximation of rank k projections of certain n-dimensional random vectors by standard k-dimensional Gaussian random vectors. The bounds are given in terms of k, n, and a basis of the k-dimensional space onto which we project. The random vectors considered are two generalizations of the case of a vector with independent, identically distributed components. In the first case, the random vector has components which are independent but need not have the same distribution. The second case deals with finite exchangeable sequences of random variables.

On the approximate normality of eigenfunctions of the Laplacian -- Trans. Amer. Math. Soc. 361, no. 10 (2009).

Abstract: The main result of this paper is a bound on the distance between the distribution of an eigenfunction of the Laplacian on a compact Riemannian manifold and the Gaussian distribution. If X is a random point on a manifold M and f is an eigenfunction of the Laplacian with L^2-norm one and eigenvalue -μ, then the total variation distance between f(X) and a standard Gaussian random variable is bounded by (2/μ) E| |∇f(X)|^2 - E|∇f(X)|^2 |. This result is applied to construct specific examples of spherical harmonics of arbitrary (odd) degree which are close to Gaussian in distribution. A second application is given to random linear combinations of eigenfunctions on flat tori.

Multivariate normal approximation using exchangeable pairs (with Sourav Chatterjee) -- ALEA 4 (2008).

Abstract: Since the introduction of Stein's method in the early 1970s, much research has been done in extending and strengthening it; however, there does not exist a version of Stein's original method of exchangeable pairs for multivariate normal approximation. The aim of this article is to fill this void. We present three abstract normal approximation theorems using exchangeable pairs in multivariate contexts, one for situations in which the underlying symmetries are discrete, and real and complex versions of a theorem for situations involving continuous symmetry groups. Our main applications are proofs of the approximate normality of rank k projections of Haar measure on the orthogonal and unitary groups, when k=o(n).

Linear functions on the classical matrix groups -- Trans. Amer. Math. Soc. 360, no. 10 (2008).

Abstract: Let M be a random matrix in the group of n × n orthogonal matrices over R, distributed according to Haar measure, and let A be a fixed n × n matrix over R such that Tr(AA^t)=n. Then the total variation distance of the random variable Tr(AM) to standard normal is bounded by (2√3)/(n-1), and this rate is sharp up to the constant. Analogous results are obtained for M a random unitary matrix and A a fixed n × n matrix over C. The proofs are applications of a new abstract normal approximation theorem which extends Stein's method of exchangeable pairs to situations in which continuous symmetries are present.
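
For readers who want to see the statement numerically, here is a rough simulation sketch (not taken from the paper). It samples Haar-distributed orthogonal matrices by the standard QR-with-sign-correction recipe and checks that Tr(AM) has mean near 0 and variance near 1. The choice A = identity, the dimension n = 20, the sample size, and the function names `haar_orthogonal` and `trace_samples` are all illustrative assumptions; the sketch compares only the first two moments and does not estimate total variation distance.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from Haar measure (QR of a Gaussian matrix)."""
    z = rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    # QR is unique only up to column signs; rescaling each column of q by the
    # sign of the matching diagonal entry of r makes the law of q exactly Haar.
    return q * np.sign(np.diag(r))


def trace_samples(a, num_samples, rng):
    """Draw num_samples independent copies of Tr(A M) with M Haar orthogonal."""
    n = a.shape[0]
    return np.array(
        [np.trace(a @ haar_orthogonal(n, rng)) for _ in range(num_samples)]
    )


rng = np.random.default_rng(0)
n = 20                      # illustrative dimension (assumption)
a = np.eye(n)               # illustrative A with Tr(A A^t) = n (assumption)
samples = trace_samples(a, 2000, rng)

# Empirical mean and variance, to be compared with 0 and 1 for a standard normal.
print("mean:", samples.mean())
print("variance:", samples.var())
```

Any other fixed A normalized so that Tr(AA^t)=n can be substituted for the identity; a histogram of `samples` against the standard normal density gives a quick visual check.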

The central limit problem for random vectors with symmetries (with Mark Meckes) -- J. Theoret. Probab. 20, no. 4 (2007).

Abstract: Motivated by the central limit problem for convex bodies, we study normal approximation of linear functionals of high-dimensional random vectors with various types of symmetries. In particular, we obtain results for distributions which are coordinatewise symmetric, uniform in a regular simplex, or spherically symmetric. Our proofs are based on Stein's method of exchangeable pairs; as far as we know, this approach has not previously been used in convex geometry and we give a brief introduction to the classical method. The spherically symmetric case is treated by a variation of Stein's method which is adapted for continuous symmetries.

An Infinitesimal Version of Stein's Method of Exchangeable Pairs (Ph.D. thesis under Persi Diaconis, 2006).

Abstract: The central theme of this dissertation is an extension of Stein's method of exchangeable pairs for use in proving the approximate normality of random variables which are invariant under a continuous group of symmetries. A key feature of the technique is that, for univariate approximation, it provides convergence rates in the total variation metric as opposed to the weaker notions of distance obtained via the classical versions of Stein's method. This new technique is applied to projections of Haar measure on the classical matrix groups as well as spherically symmetric distributions on Euclidean space. The technique is also used in studying eigenfunctions of the Laplacian on a large class of Riemannian manifolds. A multivariate version of the method is developed and applied to higher dimensional projections of Haar measure on the classical matrix groups and spherically symmetric distributions on Euclidean space.

Exchangeable pairs and Poisson approximation (with Sourav Chatterjee and Persi Diaconis) -- Probab. Surv. 2 (2005).

Abstract: This is a survey paper on Poisson approximation using Stein's method of exchangeable pairs. We illustrate using Poisson-binomial trials and many variations on three classical problems of combinatorial probability: the matching problem, the coupon collector's problem, and the birthday problem. While many details are new, the results are closely related to a body of work developed by Andrew Barbour, Louis Chen, Richard Arratia, Lou Gordon, Larry Goldstein, and their collaborators. Some comparison with these other approaches is offered.

Talks

Here are slides from a more or less random selection of some of my talks. If you want slides from a talk that isn't posted here, just send me email and I can probably dig them up.

Slides from "Another observation about operator compressions" at the AMS Special Session on Random Matrices and Applications, University of New Mexico, April 2010.

Slides from "When is normal normal? Quantitative asymptotics of graphical projection pursuit" at Cleveland State University, November 13, 2009.

Slides from "Stein's method and infinitesimal symmetries" at the Workshop on Stein's Method at the National University of Singapore in April 2008.

Slides from my talks "Stein's method: the discrete case" and "Stein's method: the continuous case" at Carnegie Mellon in February 2008.

Slides from my talk "Exchangeable pairs and multivariate normal approximation" at the Third Cornell Probability Summer School in June 2007.
(Somehow I lost track of the page numbering -- pages 13-15 do not exist.)

Slides from my talks "Approximating by the Normal Distribution" and "Stein's Method for Continuous Symmetries" at the Conference on Number Theory and Random Phenomena at the University of Bristol in March 2007.

Slides from my talk "Linear functions on the compact classical groups" at the Conference on Number Theory and Random Matrix Theory at the University of Rochester.
(These are a little difficult to read in places.)