93JCGS02\P0225-------------------------------------------------------
Projection Pursuit Indexes Based On Orthonormal Function Expansions
Dianne Cook, Andreas Buja, and Javier Cabrera
Projection pursuit describes a procedure for searching high-dimensional
data for ``interesting'' low-dimensional projections via the optimization
of a criterion function called the projection pursuit index. By
empirically examining the optimization process for several projection
pursuit indexes, we observed differences in the types of structure that
maximized each index. We were especially curious about differences
between two indexes based on expansions in terms of orthogonal polynomials:
the Legendre index and the Hermite index. Being fast to compute, these
indexes are ideally suited for dynamic graphics implementations.
Both Legendre and Hermite indexes are weighted $L^2$ distances between
the density of the projected data and a standard normal density. A
general form for this type of index is introduced that encompasses
both indexes. The form clarifies the effects of the weight function
on the index's sensitivity to differences from normality, highlighting
some conceptual problems with the Legendre and Hermite indexes. A new
index, called the Natural Hermite index, which alleviates some of these
problems, is introduced.
A polynomial expansion of the data density reduces the form of the index
to a sum of squares of the coefficients used in the expansion. This
drew our attention to examining these coefficients as indexes in their
own right. We found that the first two coefficients, and the lowest-order
indexes produced by them, are the most useful ones for practical
data exploration because they respond to structure that can be analytically
identified, and because they have ``long-sighted'' vision that enables
them to ``see'' large structure from a distance. Complementing this
low-order behavior, the higher-order indexes are ``short-sighted''.
They are able to see intricate structure, but only when they are close to it.
We also show some practical use of projection pursuit using the polynomial
indexes, including a discovery of previously unseen structure in a set of
telephone usage data, and two cautionary examples which illustrate that
structure found is not always meaningful.
Key Words: Clustering; Density estimation; Exploratory multivariate data
analysis; Nonnormality; Principal component analysis; Skewness.
93JCGS02\P0251-------------------------------------------------------
Performance of the Gibbs, Hit-and-Run, and Metropolis Samplers
Ming-Hui Chen and Bruce Schmeiser
We consider the performance of three Monte Carlo Markov-chain
samplers---the Gibbs sampler, which cycles through coordinate directions;
the Hit-and-Run (H\&R) sampler, which randomly moves in any direction;
and the Metropolis sampler, which moves with a probability that is a ratio
of likelihoods. We obtain several analytical results. We provide a
sufficient condition for geometric convergence of the H\&R sampler on a
bounded region $S$. For a general region $S$, we review the Schervish
and Carlin sufficient condition for geometric convergence of the Gibbs sampler.
We show that this Gibbs sufficient condition holds for a multivariate
normal distribution, and that for a bivariate normal distribution the Gibbs
marginal sample paths are each an AR(1) process; we obtain the standard errors
of sample means and sample variances, which we later use to verify empirical
Monte Carlo results. We empirically compare the Gibbs and H\&R samplers
on bivariate normal examples. For zero correlation, the Gibbs sampler provides
independent data, resulting in better performance than H\&R. As the
absolute value of the correlation increases, H\&R performance improves,
with H\&R substantially better for correlations above .9. We also suggest
and study methods for choosing the number of replications, for estimating
the standard error of point estimators, and for reducing point-estimator
variance. We suggest using a single long run rather than multiple separate
iid runs, and using overlapping batch statistics (obs) to estimate the
standard errors; additional empirical results show that obs is accurate.
Finally, we review the geometric convergence of the
Metropolis algorithm and develop a Metropolized H\&R sampler. This
sampler works well for high-dimensional and complicated integrands or
Bayesian posterior densities.
Key Words: AR(1) process; Bayesian posteriors; Geometric convergence; Markov
chain; Monte Carlo; Multidimensional integration; Overlapping batch
statistics; Simulation.
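The bivariate normal AR(1) result mentioned above is easy to reproduce. With correlation $\rho$, each full conditional is normal (e.g., $x \mid y \sim N(\rho y, 1 - \rho^2)$), and substituting one update into the next shows that the marginal path of either coordinate is AR(1) with autoregression coefficient $\rho^2$. A minimal sketch (the parameter values and run length are our own illustration):

```python
import math
import random

def gibbs_bivariate_normal(rho, n, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternates the full conditionals x | y ~ N(rho*y, 1 - rho^2) and
    y | x ~ N(rho*x, 1 - rho^2).  The returned x-path satisfies
    x_{t+1} = rho^2 * x_t + noise, i.e. it is an AR(1) process with
    coefficient rho**2.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)   # conditional standard deviation
    x = y = 0.0
    xs = []
    for _ in range(n):
        x = rho * y + s * rng.gauss(0, 1)
        y = rho * x + s * rng.gauss(0, 1)
        xs.append(x)
    return xs

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

rho = 0.9
xs = gibbs_bivariate_normal(rho, 200_000)
print(lag1_autocorr(xs))  # close to rho**2 = 0.81
```

The AR(1) structure is also what makes the standard-error calculations tractable: all the usual time-series formulas for autocorrelated means apply directly.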
93JCGS02\P0273-------------------------------------------------------
Empirical Likelihood Confidence Bands in Density Estimation
Peter Hall and Art B. Owen
Empirical likelihood methods are developed for constructing confidence
bands in problems of nonparametric density estimation. These techniques
have an advantage over more conventional methods in that the shape of the
bands is determined solely by the data. We show how to construct an empirical
likelihood functional, rather than a function, and contour it to produce
the confidence bands. Analogs of Wilks's theorem are established in this
infinite-parameter setting and may be used to select the appropriate contour.
An alternative calibration, based on the bootstrap, is also suggested.
Large-sample theory is developed to show that the bands have asymptotically
correct coverage, and a numerical example is presented to demonstrate the
technique. Comparisons are made with the use of bootstrap replications to
choose both the shape and size of the bands.
Key Words: Bootstrap; Confidence bands; Empirical likelihood; Hypothesis
test; Nonparametric density estimation; Nonparametric regression;
Wilks's theorem.
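For context on the comparisons mentioned above, the following is a minimal sketch of the conventional pointwise percentile bootstrap band for a kernel density estimate, i.e., the comparator construction, not the empirical likelihood functional itself. The bandwidth, grid, sample, and number of resamples are our illustrative choices.

```python
import math
import random

def kde(data, x, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    c = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

def bootstrap_band(data, grid, h, B=200, alpha=0.10, seed=0):
    """Pointwise percentile bootstrap band for a kernel density estimate.

    Draws B bootstrap resamples, re-estimates the density curve on the
    grid for each, and returns the alpha/2 and 1 - alpha/2 pointwise
    percentiles at every grid point.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(B):
        boot = [rng.choice(data) for _ in range(len(data))]
        curves.append([kde(boot, x, h) for x in grid])
    lower, upper = [], []
    for j in range(len(grid)):
        vals = sorted(c[j] for c in curves)
        lower.append(vals[int(B * alpha / 2)])
        upper.append(vals[int(B * (1 - alpha / 2)) - 1])
    return lower, upper

random.seed(2)
data = [random.gauss(0, 1) for _ in range(200)]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
h = 0.4
lo, hi = bootstrap_band(data, grid, h)
```

Note how the shape and size of the band here both come from the bootstrap; the empirical likelihood approach above instead lets the data determine the shape and a Wilks-type contour determine the size.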
93JCGS02\P0291-------------------------------------------------------
On Generating Random Intervals and Hyperrectangles
Luc Devroye, Peter Epstein, and J\"org-R\"udiger Sack
We look at the problem of generating a random hyperrectangle in a unit
hypercube such that each point of the hypercube has probability $p$ of being
covered. For random intervals of $[0,1]$, we compare various methods based
either on the distribution of the length or the distribution of the midpoint.
It is shown that no constant-length solution exists. Nice versatility is
achieved when a random interval is chosen from among the spacings defined
by a Dirichlet process. A small simulation is included.
Key Words: Computational geometry; Dirichlet process; Monte Carlo;
Random cover; Random variate generation; Uniform coverage probability.
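The nonexistence of a constant-length solution is easy to see in the simplest case: if the interval has fixed length $l$ and its left endpoint $U$ is uniform on $[0, 1-l]$, the coverage probability of a point $x$ is the length of $[x-l, x] \cap [0, 1-l]$ divided by $1-l$, which drops off near the edges of $[0,1]$. A minimal check (the uniform endpoint distribution is our illustrative choice of placement rule):

```python
def coverage(x, l):
    """P(x is covered by [U, U + l]) with U uniform on [0, 1 - l].

    The point x is covered iff U lies in [x - l, x] intersected with
    [0, 1 - l]; coverage is that intersection's length divided by 1 - l.
    """
    left = max(0.0, x - l)
    right = min(x, 1.0 - l)
    return max(0.0, right - left) / (1.0 - l)

l = 0.3
print(coverage(0.5, l))   # 3/7 at the center
print(coverage(0.05, l))  # only 1/14 near the edge
```

This edge deficit is exactly what the length- and midpoint-based constructions in the article are designed to repair, so that every point of the cube is covered with the same probability $p$.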
93JCGS02\P0309-------------------------------------------------------
Asymptotic Corrections for Multivariate Posterior Moments With Factored
Likelihood Functions
Neal Thomas
Asymptotic corrections are used to compute the means and the
variance-covariance matrix of multivariate posterior distributions that are
formed from a normal prior distribution and a likelihood function that factors
into separate functions for each variable in the posterior distribution.
The approximations are illustrated using data from the National Assessment of
Educational Progress (NAEP). These corrections produce much more accurate
approximations than those produced by two different normal approximations. In
a second potential application, the computational methods are applied to
logistic regression models for severity adjustment of hospital-specific
mortality rates.
Key Words: EM algorithm; Item response model; Laplace expansion.
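As a baseline for the corrections discussed above, the following sketches the first-order Laplace (normal) approximation that such corrections improve upon, in one dimension: approximate the posterior mean by the mode and the variance by the inverse negative Hessian of the log posterior at the mode, then compare against brute-force quadrature. The model (a normal prior with a logistic-binomial likelihood) and all numbers are our own illustration, not data from the article.

```python
import math

# Illustrative model (our own choice): Normal(0, 1) prior on theta;
# y = 7 successes in n = 10 Bernoulli trials with p = sigmoid(theta).
Y, N = 7, 10

def log_post(t):
    """Unnormalized log posterior: normal prior + logistic-binomial likelihood."""
    return -0.5 * t * t + Y * t - N * math.log(1.0 + math.exp(t))

def posterior_mode():
    """Newton's method for the mode; returns (mode, Hessian at the mode)."""
    t = 0.0
    for _ in range(50):
        s = 1.0 / (1.0 + math.exp(-t))
        grad = -t + Y - N * s
        hess = -1.0 - N * s * (1.0 - s)
        t -= grad / hess
    s = 1.0 / (1.0 + math.exp(-t))
    return t, -1.0 - N * s * (1.0 - s)

m, h = posterior_mode()
laplace_mean = m          # first-order Laplace: posterior mean ~ mode
laplace_var = -1.0 / h    # inverse negative Hessian at the mode

# "Exact" moments by brute-force quadrature on a fine grid
grid = [-8.0 + 16.0 * i / 20000 for i in range(20001)]
w = [math.exp(log_post(t)) for t in grid]
Z = sum(w)
exact_mean = sum(t * wi for t, wi in zip(grid, w)) / Z
exact_var = sum((t - exact_mean) ** 2 * wi for t, wi in zip(grid, w)) / Z

print(laplace_mean, exact_mean)
print(laplace_var, exact_var)
```

The gap between the Laplace values and the quadrature values is the error that higher-order asymptotic corrections of the kind described in the article are designed to shrink; in a factored multivariate likelihood, the same mode-and-Hessian computation can be carried out variable by variable.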