96JCGS02\P0113----------------------------------------------------
Interactive Graphics for Data Sets With Missing Values --- MANET
Antony Unwin, George Hawkins, Heike Hofmann, and Bernd Siegl
Missing values are a problem for statistical methods. This applies
just as much to modern methods such as interactive graphics as to
more classical methods. The MANET software has been developed for
keeping track of missing values in interactive graphics analyses
and for investigating new interactive graphics tools.
Key Words: Interactive graphics; Missing values.
96JCGS02\P0123----------------------------------------------------
The Visual Design and Control of Trellis Display
Richard A. Becker, William S. Cleveland, and Ming-Jen Shyu
Trellis display is a framework for the visualization of data. Its most
prominent aspect is an overall visual design, reminiscent of a garden
trelliswork, in which panels are laid out into rows, columns, and pages.
On each panel of the trellis, a subset of the data is graphed by a
display method such as a scatterplot, curve plot, boxplot, 3-D
wireframe, normal quantile plot, or dot plot. Each panel shows the
relationship of certain variables conditional on the values of other
variables. A number of display methods employed in the visual design
of Trellis display enable it to succeed in uncovering the structure
of data even when the structure is quite complicated. For example,
Trellis display provides a powerful mechanism for understanding
interactions in studies of how a response depends on explanatory
variables. Three examples demonstrate this; in each case, we make
important discoveries not appreciated in the original analyses.
Several control methods are also essential to Trellis display. A
control method is a technique for specifying information so that a
display can be drawn. The control methods of Trellis display form a
basic conceptual framework that can be used in designing software. We
have demonstrated the viability of the control methods by implementing
them in the S/S-PLUS system for graphics and data analysis, but they
can be implemented in any software system with a basic capability for
drawing graphs.
Key Words: Aspect ratio control by banking; Conditioning plots; Multipanel
graphs; Shingle data structure.
96JCGS02\P0156---------------------------------------------------------
Treed Regression
William P. Alexander and Scott D. Grimshaw
Given a data set consisting of $n$ observations on $p$ independent
variables and a single dependent variable, treed regression creates
a binary tree with a simple linear regression function at each of
the leaves. Each node of the tree consists of an inequality
condition on one of the independent variables. The tree is
generated from the training data by a recursive partitioning
algorithm. Treed regression models are more parsimonious than
CART models because there are fewer splits. Additionally,
monotonicity in some or all of the variables can be imposed.
Key Words: CART; MARS; Nonlinear regression models; Recursive partitioning;
Tree-structured regression.
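
The recursive partitioning described in the abstract above can be
illustrated with a small sketch. This is not the authors' algorithm: the
exhaustive threshold search, the sum-of-squared-errors criterion, the
depth and min_leaf stopping rules, and all function names are assumptions
chosen for illustration. Each leaf holds a simple (one-predictor) linear
regression, and each internal node holds an inequality on one variable.

```python
def fit_simple(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # constant predictor: fall back to the mean
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_leaf(X, y):
    """Choose the single predictor whose simple regression has the smallest SSE."""
    best = None
    for j in range(len(X[0])):
        a, b, sse = fit_simple([row[j] for row in X], y)
        if best is None or sse < best["sse"]:
            best = {"var": j, "a": a, "b": b, "sse": sse}
    return best


def grow(X, y, depth=1, min_leaf=3):
    """Recursively split on 'x_j <= t' while the split lowers the total SSE."""
    leaf = best_leaf(X, y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return leaf
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse_l = best_leaf([X[i] for i in left], [y[i] for i in left])["sse"]
            sse_r = best_leaf([X[i] for i in right], [y[i] for i in right])["sse"]
            if best is None or sse_l + sse_r < best[0]:
                best = (sse_l + sse_r, j, t, left, right)
    if best is None or best[0] >= leaf["sse"]:
        return leaf  # no admissible split improves on a single regression
    _, j, t, left, right = best
    return {
        "split_var": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf),
    }


def predict(tree, row):
    """Follow the inequality conditions to a leaf, then apply its regression."""
    while "split_var" in tree:
        go_left = row[tree["split_var"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["a"] + tree["b"] * row[tree["var"]]
```

Calling grow(X, y, depth=2) on a list of predictor rows X and a list of
responses y returns a nested dictionary that predict can evaluate. Because
each leaf fits a line rather than a constant, a piecewise-linear response
needs only one split per change of slope.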
96JCGS02\P0176----------------------------------------------------------
Gibbs Sampling Will Fail in Outlier Problems With Strong Masking
Ana Justel and Daniel Pena
This article discusses the convergence of the Gibbs sampling algorithm
when it is applied to the problem of outlier detection in regression
models. Given any vector of initial conditions, theoretically, the
algorithm converges to the true posterior distribution. However, the
speed of convergence may slow down in a high-dimensional parameter
space where the parameters are highly correlated. We show that the
effect of leverage in regression models makes convergence of the
Gibbs sampling algorithm very difficult in data sets with strong
masking. The problem is illustrated with examples.
Key Words: Bayesian analysis; Leverage; Linear regression; Scale contamination
96JCGS02\P0190-----------------------------------------------------------
Hazard Rate Regression Using Ordinary Nonparametric Regression Smoothers
Robert J. Gray
This article proposes a method for nonparametric estimation of hazard
rates as a function of time and possibly multiple covariates. The
method is based on dividing the time axis into intervals and
calculating the number of events and the follow-up time contributed by
each interval. The event counts and follow-up times are then
separately smoothed on time and the covariates, and the hazard rate
estimators are obtained by taking the ratio. Pointwise consistency and
asymptotic normality are shown for the hazard rate estimators for a
certain class of smoothers, which includes some standard approaches
to locally weighted regression and kernel regression. It is shown
through simulation that a variance estimator based on this asymptotic
distribution is reasonably reliable in practice. The problem of how to
select the smoothing parameter is considered, but a satisfactory
resolution to this problem has not been identified. The method is
illustrated using data from several breast cancer clinical trials.
Key Words: Censored data; Kernel regression; Locally weighted linear
regression; Survival analysis.
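
The interval-and-ratio construction described in the abstract above can
be sketched for the case of time only, with no covariates. The Gaussian
kernel (Nadaraya-Watson) smoother, the convention that an event at time t
belongs to the interval (lo, hi], and all function names are assumptions
made for illustration; the article itself covers a broader class of
smoothers and does not settle the choice of smoothing parameter.

```python
import math


def interval_contributions(times, events, edges):
    """Per-interval event counts and follow-up time.

    times  : observed time for each subject (event or censoring time)
    events : 1 if the observed time is an event, 0 if censored
    edges  : increasing interval boundaries, e.g. [0, 1, 2, 3]
    """
    k_max = len(edges) - 1
    counts = [0.0] * k_max
    exposure = [0.0] * k_max
    for t, e in zip(times, events):
        for k in range(k_max):
            lo, hi = edges[k], edges[k + 1]
            if t <= lo:
                break  # subject left the risk set before this interval
            exposure[k] += min(t, hi) - lo
            if e and t <= hi:
                counts[k] += 1.0  # event falls in (lo, hi]
    return counts, exposure


def kernel_smooth(points, values, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel, evaluated at points."""
    smoothed = []
    for p in points:
        w = [math.exp(-0.5 * ((p - q) / bandwidth) ** 2) for q in points]
        smoothed.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return smoothed


def hazard_estimate(times, events, edges, bandwidth):
    """Smooth counts and exposure separately, then take their ratio."""
    counts, exposure = interval_contributions(times, events, edges)
    mids = [(edges[k] + edges[k + 1]) / 2 for k in range(len(edges) - 1)]
    smooth_counts = kernel_smooth(mids, counts, bandwidth)
    smooth_exposure = kernel_smooth(mids, exposure, bandwidth)
    return mids, [c / x for c, x in zip(smooth_counts, smooth_exposure)]
```

As the bandwidth grows, the estimate at every midpoint approaches the
overall rate (total events divided by total follow-up time); a small
bandwidth approaches the raw per-interval rates. Events observed after
the last boundary in edges are not counted in this sketch.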