97JCGS04\P0355-----------------------------------------------
On the Visualization of Outliers via Self-Organizing Maps
Jorge Muruz\'{a}bal and Alberto Mu\~{n}oz
We consider an exploratory approach to multivariate outlier
detection based on the neural network introduced by Kohonen
and generally known as the self-organizing map. A few
meaningful 2-D images (readily derived from the trained map),
working in cooperation with each other, are shown to provide
an inexpensive, partly interactive framework in which various
types of outlying patterns can be detected. Some robust
aspects of the key underlying notion of self-organization
are discussed.
Key Words: Atypical data; Dimensionality reduction; Neural networks;
Nonlinear projections; Self-organization.
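As a minimal sketch of the idea behind this abstract, the following trains a small self-organizing map with NumPy and flags the observation with the largest quantization error. All parameters (grid size, learning-rate and neighborhood schedules, the planted outlier) are illustrative assumptions, not the authors' settings, and only one of the article's several 2-D views is mimicked here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a 2-D Gaussian cloud plus one planted outlier.
X = rng.normal(size=(200, 2))
X = np.vstack([X, [[10.0, 10.0]]])            # index 200 is the outlier

# Train a small SOM (all parameters below are illustrative choices).
rows, cols, steps = 5, 5, 2000
W = rng.normal(scale=0.5, size=(rows * cols, 2))               # codebook vectors
grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)

for t in range(steps):
    x = X[rng.integers(len(X))]
    bmu = int(np.argmin(((W - x) ** 2).sum(axis=1)))           # best-matching unit
    alpha = 0.5 * (1.0 - t / steps)                            # decaying learning rate
    sigma = 2.0 * (1.0 - t / steps) + 0.3                      # decaying neighborhood
    h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
    W += alpha * h[:, None] * (x - W)

# Quantization error: distance from each observation to its BMU.
# An unusually large value marks a candidate outlier.
qe = np.array([np.linalg.norm(W - x, axis=1).min() for x in X])
```

Because the outlier is sampled rarely relative to the cloud, the codebook settles near the bulk of the data and the outlier keeps a large distance to its best-matching unit.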
97JCGS04\P0383--------------------------------------------------
Variable Resolution Bivariate Plots
Chisheng Huang, John Alan McDonald, & Werner Stuetzle
Scatterplots are the method of choice for displaying the
distribution of points in two dimensions. They are used to
discover patterns such as holes, outliers, modes, and
association between the two variables. A common problem
is overstriking---the overlap of the plotting surface of
glyphs representing individual observations. Overstriking
can create a misleading impression of the data distribution.
The variable resolution bivariate plots (\textit{Varebi plots})
proposed in this article deal with the problem of overstriking
by mixing display of a density estimate and display of
individual observations. The idea is to determine the display
format by analyzing the actual amount of overstriking on the
screen. Thus, the display format will depend on the sample
size, the distribution of the observations, the size and
shape of individual icons, and the size of the window. It
may change automatically when the window is resized. Varebi
plots show detail wherever possible, and the overall trend
when displaying detail is not feasible.
Key Words: Histogram; Scatterplot; Visualization.
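The abstract's key step, analyzing the actual amount of overstriking on screen, can be sketched as a pixel-count heuristic. The glyph size, threshold, and decision rule below are hypothetical stand-ins for the article's display logic, not the authors' implementation.

```python
import numpy as np

def overstrike_fraction(x, y, width, height, glyph=3):
    """Fraction of painted pixels covered by more than one glyph: a
    rough pixel-count proxy for on-screen overlap analysis."""
    counts = np.zeros((height, width), dtype=int)
    half = glyph // 2
    cx = ((x - x.min()) / (np.ptp(x) or 1.0) * (width - glyph)).astype(int) + half
    cy = ((y - y.min()) / (np.ptp(y) or 1.0) * (height - glyph)).astype(int) + half
    for r, c in zip(cy, cx):
        counts[r - half:r + half + 1, c - half:c + half + 1] += 1
    painted = counts > 0
    return (counts > 1).sum() / painted.sum()

def choose_display(x, y, width=200, height=200, threshold=0.25):
    """Hypothetical decision rule: raw glyphs while overlap is low,
    a density display (e.g., a 2-D histogram) once it is high."""
    if overstrike_fraction(x, y, width, height) <= threshold:
        return "scatter"
    return "density"
```

Note how the decision depends on exactly the factors the abstract lists: the sample size, the distribution of points, the glyph size, and the window dimensions.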
97JCGS04\P0397----------------------------------------------------
Using Complex Integration to Compute Multivariate
Normal Probabilities
W. C. Soong and Jason C. Hsu
The execution of most multiple comparison methods involves, at
least in part, the computation of the probability that a
multivariate normal or multivariate $t$ random vector is in
a hyper-rectangle. In multiple comparison with a control as
well as multiple comparison with the best (of normal populations
or multinomial cell probabilities), the correlation matrix
\textbf{R} of the random vector is nonsingular and of the
form \textbf{R} = \textbf{D} + \underbar{$\lambda\,\lambda'$},
where \textbf{D} is a diagonal matrix and \underbar{$\lambda$}
is a known vector. It is well known that, in this case,
the multivariate normal rectangular probability can be
expressed as a one-dimensional integral and successfully
computed using Gaussian quadrature techniques. However, in
multiple comparison with the mean (sometimes called analysis
of means) of normal distributions, all-pairwise comparisons
of three normal distributions, as well as simultaneous inference
on multinomial cell probabilities themselves, the correlation
matrix is singular and of the form \textbf{R} = \textbf{D}
- \underbar{$\eta\,\eta'$}. It is not well known that, in
this latter case, the multivariate normal rectangular
probability can still be expressed as a single integral,
albeit one with complex variables in its integrand.
Previously published proofs of the validity of this expression
either contained a gap or relied on a numerical demonstration,
and this article will provide an analytic proof. Furthermore,
we explain how this complex integral can be computed
accurately, using Romberg integration of complex variables
when the dimension is low, and using \v{S}id\'{a}k's
inequality as an approximation when the dimension is at
least moderate.
Key Words: Complex integration; Multiple comparisons; Multivariate
normal; Rectangular probability.
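The nonsingular case described above has a well-known one-dimensional form: if $R = D + \lambda\lambda'$ with $D = \mathrm{diag}(1-\lambda_i^2)$, each component can be written $X_i = \lambda_i Z_0 + \sqrt{1-\lambda_i^2}\,Z_i$ with independent standard normals, so the rectangle probability is a single integral of a product of normal CDFs. A generic quadrature sketch of this classical formula (Gauss-Hermite nodes here, standing in for the Gaussian quadrature the abstract mentions; not the authors' code):

```python
import math
import numpy as np

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rect_prob(a, b, lam, nodes=60):
    """P(a_i < X_i < b_i) for a normal vector with correlation
    R = D + lam lam', D = diag(1 - lam_i^2), evaluated as a
    one-dimensional integral by Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(nodes)
    total = 0.0
    for tk, wk in zip(t, w):
        z = math.sqrt(2.0) * tk          # change of variables for the N(0,1) weight
        prod = 1.0
        for ai, bi, li in zip(a, b, lam):
            s = math.sqrt(1.0 - li * li)
            prod *= Phi((bi - li * z) / s) - Phi((ai - li * z) / s)
        total += wk * prod
    return total / math.sqrt(math.pi)
```

As a sanity check, for two equicorrelated components with $\rho = \lambda_i^2 = 0.5$ the negative orthant probability should match the closed form $1/4 + \arcsin(\rho)/(2\pi)$.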
97JCGS04\P0416---------------------------------------------------
Variable Selection in Regression Via Repeated Data Splitting
Peter F. Thall, Kathy E. Russell, & Richard M. Simon
A new algorithm---backward elimination via repeated data splitting
(BERDS)---is proposed for variable selection in regression.
Initially, the data are partitioned into two sets \{$E,V$\},
and an exhaustive backward elimination (BE) is performed in $E$.
For each $p$ value cutoff $a$ used in BE, the corresponding
fitted model from $E$ is validated in $V$ by computing the sum
of squared deviations of observed from predicted values. This
is repeated $m$ times, and the $a$ minimizing the sum of the
$m$ sums of squares is used as the cutoff in a final BE on the
entire data set. BERDS is a modification of the algorithm
BECV proposed by Thall, Simon, and Grier (1992). An extensive
simulation study shows that, compared to BECV, BERDS has a
smaller model error and higher probabilities of excluding
noise variables, of selecting each of several uncorrelated
true predictors, and of selecting exactly one of two or three
highly correlated true predictors. BERDS is also superior to
standard BE with cutoffs .05 or .10, and this superiority
increases with the number of noise variables in the data
and the degree of correlation among true predictors. An
application is provided for illustration.
Key Words: Cross validation; Data splitting; Monte Carlo simulation;
Regression; Variable selection.
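The split-validate-rescore loop the abstract describes can be sketched as follows. This is a simplified reading, not the authors' code: p-values use a normal approximation to the t reference distribution for brevity, the cutoff grid and split count are arbitrary, and ties or rank deficiency are ignored.

```python
import numpy as np
from math import erf, sqrt

def pvalues(X, y):
    """Two-sided p-values for OLS coefficients (normal approximation
    to the t reference distribution, for brevity)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return np.array([1.0 - erf(abs(t) / sqrt(2.0)) for t in beta / se])

def backward_eliminate(X, y, cutoff):
    """Drop the least significant variable until all p-values <= cutoff."""
    keep = list(range(X.shape[1]))
    while keep:
        p = pvalues(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= cutoff:
            break
        keep.pop(worst)
    return keep

def berds(X, y, cutoffs, m=10, seed=0):
    """For each of m random splits {E, V}, run BE in E at every cutoff
    and score the fitted model by its squared prediction error in V;
    the cutoff with the smallest total score is then used in a final
    BE on the full data set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    score = np.zeros(len(cutoffs))
    for _ in range(m):
        idx = rng.permutation(n)
        E, V = idx[: n // 2], idx[n // 2 :]
        for j, a in enumerate(cutoffs):
            keep = backward_eliminate(X[E], y[E], a)
            if keep:
                beta, *_ = np.linalg.lstsq(X[E][:, keep], y[E], rcond=None)
                pred = X[V][:, keep] @ beta
            else:
                pred = np.zeros(len(V))
            score[j] += ((y[V] - pred) ** 2).sum()
    best = cutoffs[int(np.argmin(score))]
    return backward_eliminate(X, y, best), best
```

With strong true predictors the final model reliably retains them while the data-driven cutoff guards against keeping noise variables.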
97JCGS04\P0435---------------------------------------------------
Some Q-Q Probability Plots to Test Spherical and Elliptical Symmetry
Run-Ze Li, Kai-Tai Fang, & Li-Xing Zhu
This article proposes some probability plots to test
spherical and elliptical symmetry in terms of statistics
invariant under orthogonal transformations. Some
correlation coefficients as numerical measures of detecting
deviation from spherical or elliptical symmetry are
recommended, and the empirical percentiles of these
correlation coefficients are calculated by simulation. The
simulation results for data sets from 12 different populations
show that the new plots are useful for testing spherical and
elliptical symmetry. Some discussion is also given.
Key Words: Beta distribution; Elliptical distribution; $F$-distribution;
Invariant statistics; Probability plot; Q-Q plot; Robust
statistics; Spherical distribution; $t$-distribution.
97JCGS04\P0451------------------------------------------------------
Regularization of Ill-Posed Problems by Envelope Guided Conjugate Gradients
Linda Kaufman and Arnold Neumaier
We propose a new way to iteratively solve large scale ill-posed
problems by exploiting the relation between Tikhonov regularization
and multiobjective optimization to obtain, iteratively,
approximations to the Tikhonov L-curve and its corner. Monitoring
the change of the approximate L-curves allows us to adjust the
regularization parameter adaptively during a preconditioned
conjugate gradient iteration, so that the desired solution can
be reconstructed with a low number of iterations. We apply the
technique to an idealized image reconstruction problem in
positron emission tomography.
Key Words: Envelope; Ill-posed; L-curve; Multiobjective optimization;
Preconditioned conjugate gradients; Tikhonov
regularization.
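The objects this abstract manipulates, Tikhonov solutions and the L-curve they trace, can be sketched on a toy problem. The Hilbert matrix below is a standard severely ill-conditioned test case, not the PET problem from the article, and the sketch only tabulates the exact L-curve on a parameter grid rather than implementing the envelope-guided iteration.

```python
import numpy as np

# Toy ill-posed problem: Hilbert matrix times a smooth signal, plus noise.
n = 32
A = np.array([[1.0 / (i + j + 1.0) for j in range(n)] for i in range(n)])
rng = np.random.default_rng(3)
b = A @ np.sin(np.linspace(0.0, np.pi, n)) + 1e-3 * rng.normal(size=n)

# Tikhonov solutions x_mu = argmin ||Ax - b||^2 + mu ||x||^2 via SVD
# filter factors; (log residual norm, log solution norm) pairs trace
# the L-curve whose corner balances the two competing objectives.
U, s, Vt = np.linalg.svd(A)
beta = U.T @ b
mus = np.logspace(-8, 0, 30)
log_res, log_norm = [], []
for mu in mus:
    x_mu = Vt.T @ ((s / (s ** 2 + mu)) * beta)
    log_res.append(np.log(np.linalg.norm(A @ x_mu - b)))
    log_norm.append(np.log(np.linalg.norm(x_mu)))
```

As the regularization parameter grows, the residual norm rises while the solution norm falls; the multiobjective trade-off between these two monotone branches is exactly what the corner of the L-curve resolves.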
97JCGS04\P0464-------------------------------------------------------------
Manual Controls for High-Dimensional Data Projections
Dianne Cook and Andreas Buja
Projections of high-dimensional data onto low-dimensional
subspaces provide insightful views for understanding multivariate
relationships. This article discusses how to manually control
the variable contributions to the projection. The user has
control of the way a particular variable contributes to the
viewed projection and can interactively adjust the variable's
contribution. These manual controls complement the automatic
views provided by a grand tour, or a guided tour, and give
greatly improved flexibility to data analysts.
Key Words: Data visualization; Dynamic graphics; Grand tour; Multivariate
analysis; Projection pursuit.
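The core mechanical requirement behind such manual controls is that after the user drags one variable's contribution, the projection must be restored to an orthonormal frame. A minimal sketch of that step, assuming a p x 2 frame and plain Gram-Schmidt renormalization (the authors' exact manipulation algorithm may differ):

```python
import numpy as np

def set_contribution(F, var, cx, cy):
    """Set variable `var`'s row of a p x 2 projection frame to
    (cx, cy), then restore orthonormality by Gram-Schmidt so the
    result is again a valid projection frame."""
    F = np.array(F, dtype=float)
    F[var] = (cx, cy)
    F[:, 0] /= np.linalg.norm(F[:, 0])
    F[:, 1] -= (F[:, 0] @ F[:, 1]) * F[:, 0]
    F[:, 1] /= np.linalg.norm(F[:, 1])
    return F
```

Interpolating such frames over successive small adjustments is what lets the user interactively steer a variable into or out of the viewed projection between tour steps.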