#####
#
# This is a shell archive, containing a discussion of strategies
# for local customization of S and some functions and shell
# scripts that may be of some use.
#
# Unpacking it will create 1 ordinary ASCII file, "discussion",
# and three directories, "src", "doc", and "scripts",
# which respectively contain the ASCII definitions of some S functions,
# their corresponding .Help documentation,
# and two shell scripts, whose uses are described in "discussion".
#
# To unpack the archive, create the needed directories by
# "mkdir src doc scripts", then remove any material preceding the #####
# line in this file and run /bin/sh on the resulting file.
#
echo "unpacking discussion"
cat > "discussion" << "!!!EOF!!!"
I'd like to contribute some suggestions to the discussion
of modifying or customizing S functions. The ideas I shall
discuss have been implemented in the New S installation
at the Columbia University Department of Psychology.
They were developed in collaboration with Ted Wright.
We are happy to share shell scripts, functions, and documentation.
Handling of NA's was certainly one motive for customizing S,
but by no means the only one.
At least five different kinds of LOCAL customization
seemed to be needed:
1. CENTRAL CHANGES:
We modified the operation of a few standard S functions,
and added quite a few new ones that we wanted to share with
everyone using S on our network. The directories
$SHOME/local/src
$SHOME/local/.Data
$SHOME/local/.Data/.Help
were created for the new code, functions, and documentation.
The modified functions mask the old versions, and the new
functions become available when $SHOME/local/.Data is placed
in position 2 on a user's search vector.
(See below re modification of search vectors.)
We try to modify a standard S function by adding one or more
arguments, in such a way that, when the added arguments take
default values, the behavior of the modified function is
the same as that of the standard function. Rare exceptions
are cases where the standard behavior of the function seems
clearly unsatisfactory.
2. LIBRARY SECTIONS:
These are subdirectories of $SHOME/library, accessed through
the S library() function. This method is a good way to add
data analysis packages, since several different data analysts
may put together somewhat different library sections, say,
with different t-test functions, or different ANOVA packages.
Users can select the versions that work best for their needs.
(We have actually made little use of this method of customization,
but we still hope to in the future.)
3. USER-CONTROLLED CENTRAL STORAGE:
Individual directories named $HOME/.SFunctions are defined,
where users can put their own new general-purpose S functions
(and other related objects, such as standardized search vectors
and option lists; see below). In addition to new general-purpose
functions, users can put their private modifications of standard
S functions here, but I suggest the use of a distinctive name,
e.g., "OKmean" instead of "mean". Each $HOME/.SFunctions should
be attached at a suitable position on each of that user's search vectors
(position 4 or 5 is apt to be best).
We wrote shell scripts that set these things up with a delicate
compromise between automaticity and flexibility
(see description of "Sinit0" and "Sinit" below).
4. USER-CONTROLLED PROJECT-LEVEL STORAGE:
One often creates functions that have broad applicability within
one large project, but which are too specialized to be used
for other projects or by other data analysts.
For example, a function that accepts data in a particular form
and recodes it according to rules formulated in the context of
a particular theory might be highly useful in all the work
devoted to testing that theory, but it might not qualify as
a general data-analysis function.
We assume that such functions will typically be stored
in a .Data subdirectory of one of the directories devoted to
that project and we have therefore provided mechanisms that
readily attach such project-storage locations to search vectors.
5. USER-CONTROLLED ENVIRONMENT STRUCTURES:
We use specialized .Options lists (e.g., .Options.Anova, .Options.Reg)
to modify the operation of certain S functions.
To have options lists function as a specification
for a statistical computing environment, however,
S functions must be written or modified to seek guidance from them,
taking default actions only when such guidance is absent.
Such new or modified S functions must of course be stored by one
of the methods listed in points 1 - 4 above.
To take a somewhat trivial but clear example,
consider someone who generally prefers to calculate 99%-confidence
intervals instead of 95% intervals.
One can program t-test or ANOVA functions to look at an appropriate
options list; if a specification of default confidence levels
is found there, this overrides the standard default levels.
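In S, this lookup pattern can be sketched as follows (a hypothetical
fragment; the name OKttest and the component name conf.level are
illustrative only, not our actual code):

```s
# Hypothetical sketch: an options list overrides a built-in default.
OKttest <- function(x, y, conf.level = NULL, ...)
{
        if(is.null(conf.level)) {
                # consult .Options.Anova if it is on the search path
                opt <- if(exists(".Options.Anova"))
                        .Options.Anova$conf.level else NULL
                conf.level <- if(is.null(opt)) 0.95 else opt
        }
        # ... carry out the t-test at level conf.level ...
}
```

An explicit conf.level given in the call still overrides both the
options list and the built-in default.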
MODIFICATIONS OF STANDARD S FUNCTIONS
Since S adjusts search vectors by the functions attach()
and detach(), and adjusts an options list by the options() function,
we had to modify these three functions.
These modifications were done by adding arguments;
the defaults produce the standard functions.
Here are the args() outputs for these three modified functions:
> args(attach)
Attach new directories to session or permanent search vector
USAGE:
attach(file=NULL, pos=2, Perm=F, Both=F, Write=F)
> args(detach)
Detach directories from session or permanent search vector
USAGE:
detach(what=2, Perm=F, Both=F)
> args(options)
Set, modify or report (session or permanent) list of options
USAGE:
options(..., Suffix, Perm=F, Rm.opt=F, Write=F, Names=F)
This function is a modification of the standard one, which
lacked the arguments Suffix, Perm, Rm.opt, Write, and
Names. When Suffix is not given, and the other four new
arguments take their default values (FALSE), the function
operates just like the standard one.
The function can be used to modify a session or a permanent
named list structure, either the standard one,
called ".Options", or any user-created one with a name of
the form ".Options.Suffix". The function can also be used
merely to report the values or just the names of specified
components of such a list.
The only other standard S function that we modified is apply();
but in this case, the changes were not restricted to nondefault
values of new arguments.
INITIALIZATION SCRIPTS
Sinit0 sets up the directory $HOME/.SFunctions, runs S
to create a .Search.list with
./.Data in position 1,
$SHOME/local/.Data in position 2,
$SHOME/s/.Functions in position 3,
$HOME/.SFunctions in position 4,
$SHOME/s/.Datasets in position 5.
It then stores this .Search.list in $HOME/.SFunctions.
This script has to be run only once by each S user.
(Running it again later does no harm: if $HOME/.SFunctions already
exists, Sinit0 does not tamper with it.)
A different default search order might be considered
for facilities used mainly by highly experienced programmers.
Note that this search vector has ./.Data in position 1.
If it were used with $HOME/.Data as the working data directory,
it would fail whenever the current directory is not $HOME;
therefore $HOME/.SFunctions/.Search.list should NEVER be
copied to $HOME/.Data!!
The second script, Sinit, is designed to be used
once in each directory where S is run.
It initializes that directory for S
by creating subdirectory ./.Data to serve as the working directory
and placing an appropriate .Search.list in ./.Data
(and perhaps one or more option lists as well).
When it is used without additional arguments, Sinit tests
whether ./.Data is present and, if not, creates it;
then it tests whether ./.Data contains a structure called .Search.list;
if not, it copies .Search.list from $HOME/.SFunctions to ./.Data
(assuming that .Search.list has previously been constructed, by Sinit0
or otherwise). The exception occurs when Sinit is run
from the user's $HOME directory; in that case, Sinit runs S
to create a .Search.list with $HOME/.Data rather than ./.Data in position 1.
Finally, Sinit tests whether ./.Data contains any structures
with names beginning with the string ".Options";
if not, and if it finds a structure called ".Options",
or any other structures whose names begin with ".Options"
in $HOME/.SFunctions, then it copies all such structures to ./.Data.
Sinit accepts arguments that copy a .Search.list and/or option lists
from some other directory. The syntax is
Sinit [-search dirname] [-option dirname]
If the argument -search is used, then .Search.list is copied
from the .Data subdirectory of the succeeding named directory.
The -option argument works analogously with respect to the option lists.
Note that when "-search" is used, there is no test to see
whether .Search.list already exists in ./.Data,
hence, an existing .Search.list can be overwritten.
This will not happen when Sinit is used without "-search",
since then the test is performed.
Similar considerations apply to "-option";
thus, the -option argument can be used to replace outdated option lists.
To summarize: Sinit0 should be run once, before any use of S,
to set up $HOME/.SFunctions and to construct the appropriate
search vector, which becomes $HOME/.SFunctions/.Search.list;
then Sinit should be run once in each new directory from which
S will be called, to set up the .Data subdirectory and to
put in it the appropriate search vector and possibly option lists.
Sinit can be rerun at any time, most likely with appropriate
arguments of form "-search dirname" and/or "-option dirname",
to copy some altered search vector or options lists.
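The default behavior of Sinit described above might be sketched as
follows (a hypothetical fragment; the real script also handles the
$HOME special case and the -search/-option flags, and the variable
name SFUNCS is illustrative only):

```shell
# Sketch of the default Sinit behavior (hypothetical; the real script
# also treats $HOME specially and accepts -search/-option arguments).
SFUNCS="${SFUNCS:-$HOME/.SFunctions}"

# 1. create the working data directory if it is absent
test -d .Data || mkdir .Data

# 2. install the standard search vector only if none exists here yet
if test ! -f .Data/.Search.list && test -f "$SFUNCS/.Search.list"
then
        cp "$SFUNCS/.Search.list" .Data
fi

# 3. copy option lists only when .Data has none of its own
if ls .Data/.Options* >/dev/null 2>&1
then
        :       # existing option lists are left untouched
else
        for f in "$SFUNCS"/.Options*
        do
                if test -f "$f"
                then
                        cp "$f" .Data
                fi
        done
fi
```

Note that steps 2 and 3 test before copying, which is why an existing
.Search.list or option list is never overwritten in the default case.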
DEALING WITH MISSING DATA
The problems of data-analysis strategy raised in the
discussion of NA's in snews are not going to be resolved
by any simple pronouncement. It is indeed good to force
data analysts to think about their NA's; it is also excellent
to permit them to handle NA's routinely once they've done
the thinking; and short of thought police, there is no
way to ensure that both will take place.
We have used a mixture of three strategies:
(1) modifying standard S functions, using distinctive nomenclature
to prevent masking the standard functions and to highlight calls
to the modified functions in other S code
(OKmean, OKvar, etc.);
(2) writing some special functions to be used by themselves
and to be called by other functions--count.na(), fill.na(), rm.na();
(3) introducing an argument called "nachk" into functions such
as rm.na() and ttest(). The functions provide a variety of different
actions, depending on the value of this argument. The default
values are tailored to our most common data analysis problems.
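A wrapper of the sort used in strategy (1) might be sketched in S as
follows (illustrative only; the warning text is not our actual message):

```s
# Sketch of an OK-style wrapper (strategy 1); details are illustrative.
OKmean <- function(x, Warn = T, ...)
{
        nmiss <- sum(is.na(x))
        if(nmiss > 0) {
                x <- x[!is.na(x)]
                if(Warn)
                        warning(paste(nmiss, "missing value(s) omitted"))
        }
        mean(x, ...)    # extra arguments, e.g. trim, pass through to mean
}
```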
In experimental psychology, we usually want functions of vectors,
such as mean() and ttest(), to use all the nonmissing data, and to issue
a warning, hence the CU Psych default is nachk="warn"; for matrix
functions, such as cor() and lsfit(), the issues are more complex.
Default actions of functions can be modified by setting appropriate
options, e.g., the function ttest() consults .Options.Anova
for an overriding value of nachk.
Strategy (3) extends readily to other choices that one does
not wish to respecify repeatedly. For example, ttest() looks
for an overriding value of eqvar in .Options.Anova;
this is a logical argument that determines whether
equal variance is assumed in the two-sample case.
If .Options.Anova$eqvar is absent, then the CU Psych default
(eqvar=T, in deference to the majority) is used.
Another function, OKcor(), consults .Options.Cor for a choice
between cases="complete" and cases="pairwise"
to determine whether the correlation matrix
should be calculated for the rows with complete data or whether
each pairwise correlation should be calculated for all the
complete pairs; if .Options.Cor$cases is absent, the CU Psych
default, cases="complete", is used.
In all these cases, specifying default actions invites
their thoughtless use. But it is simply too inefficient
for each user to be required to write and debug programs
that accomplish these simple tasks.
!!!EOF!!!
echo "unpacking doc/OKcases"
cat > "doc/OKcases" << "!!!EOF!!!"
.BG
.FN OKcases
.TL
Explanation of "cases" arguments in function OKcor
.CS
OKcor(x, y=x, cases, case.record=F, trim=0)
.AG cases
character string, must be specified and must have one of two values,
"complete" or "pairwise".
.AG cases="complete"
produces correlation matrix based on only those cases
(rows of the `x, y' matrix) with complete data,
i.e., no `NA's in any column.
The sample size will be constant for all correlations,
equal to the number of complete rows.
.AG cases="pairwise"
produces correlation matrix based on all the available cases
for each pair of columns correlated.
The sample size may vary from correlation to correlation,
depending on how many cases (rows) have a `NA' for one or
both columns being correlated.
.AG case.record
logical, default `FALSE'. If `case.record=T',
then for `cases="complete"', the output will include a vector
of the row numbers omitted because of one or more `NA's in the row.
The default produces only a report of the count of such omitted rows,
not which ones they are.
This argument has no effect when `cases="pairwise"'.
.PP
The values of `cases' and/or `case.record' can be specified
as options that remain in effect for a data-analysis session,
or as part of a permanent options list that automatically takes effect
in each data-analysis session in the given directory.
Such specification is overridden in any instance when the
argument values are explicitly given in the call to `OKcor'.
.EX
options(Suffix="Cor", cases="pairwise")
.sp .5
options(Suffix="Cor", cases="complete", case.record=T, Perm=T)
.sp .5
options(Suffix="Cor", cases="pairwise", case.record=T, Perm=T)
.PP
The first example specifies cases="pairwise"; by default (Perm=F),
the option is effective only for the session.
The second specifies cases="complete" and case.record=T, on the
permanent options list called `.Options.Cor'.
The third choice seems strange, since case.record=T has no effect
when cases="pairwise", but in fact, case.record=T will come into
effect on occasions when the default "pairwise" is overridden
by giving the argument value "complete" on a call to `OKcor'.
.WR
!!!EOF!!!
echo "unpacking doc/OKcor"
cat > "doc/OKcor" << "!!!EOF!!!"
.BG
.FN OKcor
.TL
Correlation matrix with missing data
.CS
OKcor(x, y=x, cases, case.record=F, trim=0)
.AG x
matrix or vector; missing values (`NA's) allowed.
.AG y
matrix or vector; missing values (`NA's) allowed.
If omitted, same as `x'.
.AG cases
character, must be either "complete" or "pairwise".
If `cases="complete"', then rows in which any entry from
either `x' or `y' is `NA' are dropped, and the correlation
matrix is computed from the remaining rows.
If `cases="pairwise"', then each entry in the correlation
matrix is computed from all available pairs for the two
columns in question.
There is no standard default; if `cases' is missing,
and there are `NA's, an error results.
A default can be specified in the list `.Options.Cor'.
.AG case.record
logical; if `TRUE', then for `cases="complete"',
the row numbers of dropped rows are preserved in the output;
if `FALSE' (the default), then the count of rows dropped
is reported in the output.
The default can be altered in the list `.Options.Cor'.
Has no effect when `cases="pairwise"'.
.AG trim
proportion trimmed; this argument is passed to a call to `cor'.
.RT
If `cases' is misspecified or missing, and there are `NA's,
a call to `help(OKcases)' provides an explanation of what is needed.
Otherwise, a list is returned with two components.
.PP
The first component (named "corr") is a correlation matrix,
in which the `[j, k]' element corresponds
to the `j'th column of `x' and the `k'th column of `y'.
A correlation matrix with only one row or only one column
is returned as a vector.
If `y' is not given, then the correlation matrix of `x' is returned.
If a matrix with at least 2 rows and 2 columns is returned,
it uses column names of `x' and `y' as `dimnames'.
.PP
The second component varies depending on `cases'.
For "complete", it is named "miss" and is either a character string
reporting how many rows of `cbind(x, y)' were omitted
or (for `case.record=T')
the vector of row numbers of the omitted rows.
For "pairwise", it is named "sampsize" and gives
the matrix of sample sizes corresponding to the correlation matrix;
if this matrix has only one row or one column,
it, like the correlation matrix, is returned as a vector.
.EX
z <- OKcor(x, cases="complete", case.record=T)
.sp .5
z <- OKcor(x[ ,1], x[ ,-1], cases="pairwise")
.PP
The first example deletes all the rows of matrix `x' that have missing
data and then correlates each pair of columns of `x'.
`z$corr' is the resulting correlation matrix.
Since `case.record' has been set `TRUE', the second component, `z$miss',
is a vector of the row numbers of rows deleted from `x'.
In the second example, the first column of `x' is correlated in turn
with each succeeding column.
The returned value is a list of two vectors:
`z$corr[i]' is the correlation between column `1' and column `i+1'
while `z$sampsize[i]' is the number of rows in which the entries
in columns `1' and `i+1' are both nonmissing.
.SA
var, cor
.WR
!!!EOF!!!
echo "unpacking doc/OKhist"
cat > "doc/OKhist" << "!!!EOF!!!"
.BG
.FN hist
.TL
Plot frequencies (or relative frequencies) of data
.CS
hist(x, nclass, breaks, plot=T, relative=F, ...,
xlab=deparse(substitute(x)), ylab="")
.PP
This function modifies the standard `hist' function in two ways:
it omits missing data (`NA's) and it offers an option
to plot relative frequencies.
.AG x
numeric vector of data for histogram.
Missing values (`NA's) are simply omitted;
if any are found, a warning is issued.
.sp 1
.AG nclass
optional recommendation for the number of classes (i.e., bars) the
histogram should have.
Default is of the order of log to the base 2 of the number of nonmissing
data in `x'.
.sp 1
.AG breaks
optional vector of the break points for the bars of the histogram.
The count in the
.Co i -th
bar is
.Cs
sum( breaks[i] < x & x <= breaks[i+1] )
.Ce
If omitted, evenly-spaced break points
are determined from `nclass' and the extremes of the data.
.sp 1
.AG plot
logical; if `TRUE' (default), the histogram is plotted;
if `FALSE', a list giving breakpoints and counts is returned.
.sp 1
.AG relative
logical; if `TRUE', the ordinate shows relative frequencies,
that is, `counts' divided by `sum(counts)'.
(The value of `plot' is ignored when `relative=T';
the histogram is always plotted.)
In this plot, the axis is labelled with
the midpoints of the intervals and the raw frequency counts.
Default is `FALSE'
(in order to maintain compatibility with the standard `hist' function).
.sp 1
.AG ...
Additional arguments to be passed to `barplot'.
The `hist' function uses the function `barplot'
(with argument `histo=T')
to do the actual plotting.
Consequently, arguments to `barplot' that control bar shading, etc.,
such as `angle', `density', `col', and `inside'
(see `barplot' documentation)
can be passed from `hist' as part of ` ... '.
If this is not done, the `barplot' defaults are used.
Other graphic parameters, such as `ylim' (see example __ below)
can also be passed to `barplot' using ` ... '.
Certain parameter specifications are already included in the
specification `histo=T'.
The call to `barplot' is modified for `relative=T'.
.RT
When the histogram is plotted, the bar midpoints
are returned (invisibly).
See example __.
Otherwise (when both `plot=F' and `relative=F')
the value returned is a list with two components:
.RC counts
count or density in each bar of the histogram.
.RC breaks
break points between histogram classes.
.EX
hist(x)
.sp 1
par(mfrow=c(4,1))
for(i in 1:4) hist(x[, i], relative=T, xlim=range(x), ylim=c(0,.4))
# the example plot is produced by:
my.sample <- rt(50,5)
lab <- "50 samples from a t distribution with 5 d. f."
hist(my.sample,main=lab)
.PI
.KW hplot
.KW dplot
.WR
!!!EOF!!!
echo "unpacking doc/OKlength"
cat > "doc/OKlength" << "!!!EOF!!!"
.BG
.FN OKlength
.TL
Length of data omitting missing values
.CS
OKlength(x, Warn=T)
.AG x
any object
.AG Warn
logical; if `TRUE' (the default),
and if `x' has 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.RT
The length of the structure after missing values are deleted.
.EX
sqrt(OKvar(x)/OKlength(x, Warn=F))
# Calculates standard error estimate for a vector `x', with
# warning for missing values generated only from the `OKvar' call.
.SA
length
.WR
!!!EOF!!!
echo "unpacking doc/OKlsfit"
cat > "doc/OKlsfit" << "!!!EOF!!!"
.BG
.FN OKlsfit
.TL
Least-squares fit for complete cases, with options to print summaries
.CS
OKlsfit(x, y, wt, anova=F, partials=F, intercept=T,
coefnames=NULL, case.record=F, Print=T, ...)
.PP
`OKlsfit' removes rows of `x' and `y' which have
any missing values (`NA' s) for \fIeither\fR `x' or `y',
then calls the standard S function `lsfit' to perform least-squares fits
of linear combinations of the columns of `x'
to each column of `y'.
Weighted fits can be obtained by using the `wt' argument
to pass a weight vector to `lsfit'.
`anova=T' produces analysis-of-variance tables for
omnibus F-tests (all coefficients of `x' null), one for each column of `y'.
`partials=T' evaluates, for each column of `y', the losses of fit incurred
when each coefficient of `x' is in turn constrained to be null.
.sp .5
If a list named ".Options.Reg" can be found,
`OKlsfit' checks it for possible alternative default values
for each argument except `x', `y', `wt', and `coefnames'.
.AG x
vector or matrix; `NA' s allowed.
A vector is treated as a matrix with one column.
.AG y
vector or matrix; `NA' s allowed.
A vector is treated as a matrix with one column.
The number of rows must match the number in `x'.
.AG wt
optional vector of weights, as in `lsfit'.
If not given, an unweighted regression is carried out.
If given, it must have length equal to the number of rows in `x';
weights for rows omitted because of missing data are automatically deleted.
.AG anova
logical; default `FALSE'.
If `TRUE', and if `intercept' is also `TRUE',
then an analysis of variance table is produced
for each column of `y'.
.AG partials
logical; default `FALSE'.
If `TRUE', then a table of coefficients is produced
(for each column of `y' )
for the linear combination of columns of `x'
that fits best in the least-squares sense.
The table includes an estimated standard error for each coefficient
and the proportion of residual variance accounted for
by removing the constraint that the coefficient be 0.
When `intercept' is `TRUE', the table also includes increments
in proportion of total variance accounted for.
.AG intercept
logical; default `TRUE'.
If `TRUE', the linear combinations of `x' are nonhomogeneous,
i.e., they include an intercept.
.AG coefnames
character vector of coefficient names to label the table(s)
produced by `partials=T'.
If not given, then names of the form "x1", "x2", etc. are used,
unless the matrix `x' already has column names,
in which case these are utilized.
.AG case.record
logical; default `FALSE'.
If `TRUE', then one component of the returned value
contains the row numbers of the rows of `x' and `y' deleted
because of missing data.
Otherwise, the return includes only the count of rows deleted.
.AG Print
logical; default `TRUE'.
If `TRUE', then tables produced by `anova=T' and/or `partials=T'
are printed at the terminal despite assignment of the returned value
as a named data structure.
.AG ...
additional arguments passed through to all calls to `lsfit',
including the main one and additional ones that arise in connection
with `anova' and `partials'.
At present, the only argument passed in this way is `tolerance'.
.RT
The function returns a list whose first four components,
`coef', `residuals', `qr', and `intercept' are the same as for `lsfit'.
Additional components are:
.RC miss
Character string reporting how many rows were omitted because of
missing `x' or `y' values; or, if `case.record=T', a 1-row matrix
reporting the row numbers of the omitted rows.
.RC anova
When `anova=T' and `intercept=T',
an array with dimension `c(3, 4, ncol(y))'
is returned (or just a `3 \(mu 4' matrix, if `y' has only one column),
in which each `3 \(mu 4' section is an analysis-of-variance table
for a given column of `y'.
Such a table compares the least-squares fit of the "null" model
(intercept only, all other coefficients zero)
to the least-squares fit of the best linear combination of the
columns of `x', including an intercept.
The first 3 columns of the matrix give degrees of freedom, sums of squares
and mean squares; the last gives the F statistic, the estimated
error standard deviation for the more general model,
and the proportion of the null-model variance accounted
for in the more general model.
.RC partials
When `partials=T',
an array with dimension `c(M, 4, ncol(y))' is returned
(or just an `M \(mu 4' matrix, if `y' has only one column),
where `M' is the number of coefficients
(number of columns of `x', plus `1' if `intercept=T').
Each `M \(mu 4' section gives estimates of each coefficient (column 1),
the standard error for that coefficient (column 2),
the proportion of residual variance accounted for by removing the
constraint that that coefficient be `0' (column 3),
and, if `intercept=T', the incremental proportion of
the null-model variance accounted for by removing the constraint
that that coefficient be `0' (column 4).
Column 3 is labelled "Cond Rsq" because it computes the proportion
of residual variance accounted for by each variable
conditional on reducing the residual variance as far as possible first,
using all the other variables.
.EX
1.A attach("/u/dhk/manx/.Data")
1.B reg1 <- OKlsfit(mat1[, 1:4], mat1[, 5:6],
anova=T, partials=T, Print=F)
1.C reg1$miss
1.D reg1$anova
1.E reg1$partials
2.A pred1 <- cbind((100*longley.x[, 2])/longley.x[, 1],
longley.x[, c(3,4,6)])
2.B dimnames(pred1)[[2]][1] <- "Adjusted GNP"
2.C targ1 <- cbind(longley.x[, 5], longley.y)
2.D dimnames(targ1)[[2]] <- c("Population", "Employed")
2.E .Options.Reg <- list(anova=T, partials=T, Print=T)
2.F OKlsfit(pred1, targ1)
2.G OKlsfit(pred1[, 2:4], targ1)
.PP
Example 1 (A-E) analyzes questionnaire and other data
obtained from 1366 entering college students.
The 6 variables involved in the analysis are Verbal Scholastic Aptitude
Test score (VSAT), Quantitative SAT score (QSAT), a scale constructed to assess
interest in working with "things" rather than people (t/p.int),
mathematics anxiety/confidence (manx), degree of interest in chemistry
or physics as a career (Icp) and degree of interest in engineering (Ien).
The analysis shows attempts to predict each of Icp and Ien from linear
combinations of the first four variables.
The statistical reliability is high, because of the large sample size,
but the interrelations are modest.
It is clear that for the population under study,
mathematics anxiety/confidence bears a more important relation to
science career interest than does mathematics aptitude.
.PP
As the output shows, 25 of the students had missing data.
Note also that the column names of the input matrix, `mat1',
are used to label subarrays ("Icp", "Ien") and coefficients.
The incremental R-squared (proportion of variance accounted for)
for the intercept does not have a natural definition and is never
calculated by `OKlsfit'.
.PP
Output from Examples 1.C to 1.E.
.sp 1
[1] "25 row(s) omitted for missing data."
.sp 1
.ne 12
.TS
l s s s s
r r r r r.
, , Icp
df ssq ssq/df Fval/sdev/rsq
.sp .5
regression 4 232.15 58.03800 65.2300
error 1336 1188.60 0.88969 0.9432
total 1340 1420.80 1.06030 0.1634
.sp 1
.T&
l s s s s
r r r r r.
, , Ien
df ssq ssq/df Fval/sdev/rsq
.sp .5
regression 4 214.23 53.55900 79.6400
error 1336 898.45 0.67249 0.8201
total 1340 1112.70 0.83036 0.1925
.TE
.sp 1
.ne 16
.TS
l s s s s
r r r r r.
, , Icp
Coefficient Std Error Cond Rsq Incr Rsq
Intercept 4.1600000 0.303000 0.124 NA
QSAT 0.0002449 0.000401 0.000 0.000
VSAT -0.0001278 0.000391 0.000 0.000
t/p.int 0.0990100 0.017600 0.023 0.020
manx -0.0721500 0.005610 0.110 0.103
.sp 1
.T&
l s s s s
r r r r r.
, , Ien
Coefficient Std Error Cond Rsq Incr Rsq
Intercept 3.7200000 0.263000 0.130 NA
QSAT 0.0001981 0.000348 0.000 0.000
VSAT 0.0005437 0.000340 0.002 0.002
t/p.int 0.1332000 0.015300 0.054 0.046
manx -0.0599000 0.004880 0.101 0.091
.TE
.sp 1
.PP
Example 2 (A-G) analyzes the Longley data on employment during the 16-year
span from 1947 to 1962.
Parts 2.A - 2.D show preliminaries of forming variables and
adding names that will be picked up as labels on the output.
2.E shows how to initialize `.Options.Reg',
which subsequently controls the defaults in calls to `OKlsfit'.
Since "Adjusted GNP" and "Year" correlate about `0.99' for these years,
2.F shows a very high degree of multicollinearity,
which is arbitrarily reduced (though not eliminated) in 2.G.
(Note that the `$partials' output in 2.G shows
a large effect of "Year"; however, if "Unemployed" and "Armed Forces"
were also dropped from the analysis, "Year" would, in isolation,
account for 99% of the variance in "Population" and 94% in "Employed".)
.PP
Output from Example 2.F:
.sp 1
.ne 12
.TS
l s s s s
r r r r r.
, , Population
df ssq ssq/df Fval/sdev/rsq
regression 4 719.9500 179.99000 337.8000
error 11 5.8604 0.53276 0.7299
total 15 725.8100 48.38700 0.9919
.sp 1
.T&
l s s s s
r r r r r.
, , Employed
df ssq ssq/df Fval/sdev/rsq
regression 4 183.7500 45.93900 402.9000
error 11 1.2542 0.11402 0.3377
total 15 185.0100 12.33400 0.9932
.TE
.sp 1
.ne 16
.TS
l s s s s
r r r r r.
, , Population
Coefficient Std Error Cond Rsq Incr Rsq
Intercept -1.371e+03 1.40e+03 0.081 NA
Adjusted GNP 5.506e-02 5.21e-02 0.092 0.001
Unemployed 5.235e-03 7.23e-03 0.046 0.000
Armed Forces -6.878e-03 4.04e-03 0.208 0.002
Year 7.511e-01 7.25e-01 0.089 0.001
.sp 1
.T&
l s s s s
r r r r r.
, , Employed
Coefficient Std Error Cond Rsq Incr Rsq
Intercept -2.297e+03 6.45e+02 0.535 NA
Adjusted GNP -1.875e-02 2.41e-02 0.052 0.000
Unemployed -1.694e-02 3.34e-03 0.700 0.016
Armed Forces -7.794e-03 1.87e-03 0.612 0.011
Year 1.216e+00 3.35e-01 0.544 0.008
.TE
.sp 1
Output from Example 2.G (dropping Adjusted GNP):
.sp 1
.ne 12
.TS
l s s s s
r r r r r.
, , Population
df ssq ssq/df Fval/sdev/rsq
regression 3 719.350 239.780 445.7000
error 12 6.456 0.538 0.7335
total 15 725.810 48.387 0.9911
.sp 1
.T&
l s s s s
r r r r r.
, , Employed
df ssq ssq/df Fval/sdev/rsq
regression 3 183.6900 61.22800 555.2000
error 12 1.3234 0.11028 0.3321
total 15 185.0100 12.33400 0.9928
.TE
.sp 1
.ne 14
.TS
l s s s s
r r r r r.
, , Population
Coefficient Std Error Cond Rsq Incr Rsq
Intercept -2.838e+03 1.52e+02 0.967 NA
Unemployed -1.347e-03 3.69e-03 0.011 0.000
Armed Forces -7.087e-03 4.06e-03 0.203 0.002
Year 1.513e+00 7.85e-02 0.969 0.276
.sp 1
.T&
l s s s s
r r r r r.
, , Employed
Coefficient Std Error Cond Rsq Incr Rsq
Intercept -1.797e+03 68.60000 0.983 NA
Unemployed -1.470e-02 0.00167 0.866 0.046
Armed Forces -7.723e-03 0.00184 0.596 0.011
Year 9.564e-01 0.03550 0.984 0.432
.TE
.br
.ne 3
.SA
lsfit, qr, ls1fit
.WR
!!!EOF!!!
echo "unpacking doc/OKmean"
cat > "doc/OKmean" << "!!!EOF!!!"
.BG
.FN OKmean
.TL
Arithmetic mean of nonmissing values
.CS
OKmean(x, Warn=T, ...)
.AG x
numeric; missing values (`NA's) are allowed.
.AG Warn
logical; if `TRUE' (the default),
and if `x' has 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.AG ...
other arguments to be passed to a call to `mean';
at present, the only one that makes sense is `trim'.
.RT
mean of the nonmissing values of `x'.
.EX
apply(x, 1, OKmean, Warn=F, trim=.05)
.PP
If `x' is a matrix,
this expression yields the 5% trimmed mean for the nonmissing entries
in each row of `x', omitting the many warnings
that might be generated if `x' has many rows with `NA's.
Note that `apply' passes the arguments `Warn' and `trim' to OKmean,
and `OKmean' then passes `trim' to `mean'.
.SA
mean
.WR
!!!EOF!!!
echo "unpacking doc/OKprod"
cat > "doc/OKprod" << "!!!EOF!!!"
.BG
.FN OKprod
.TL
Product, omitting missing values.
.CS
OKprod(..., Warn=T)
.AG ...
numeric or complex objects. Missing values (NA's) are allowed.
.AG Warn
logical; if `TRUE' (the default),
and if the arguments have 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.RT
the product of all the nonmissing elements of all the arguments.
If all elements of all arguments are missing, `OKprod' returns `1'.
.EX
OKprod(x > 0) # returns 1 if all nonmissing values of x are positive,
              # 0 if any nonmissing value is nonpositive.
.SA
prod
.WR
!!!EOF!!!
echo "unpacking doc/OKquantile"
cat > "doc/OKquantile" << "!!!EOF!!!"
.BG
.FN OKquantile
.TL
Empirical quantiles based on nonmissing data
.CS
OKquantile(x, probs=seq(0,1,.25), Warn=T)
.AG x
numeric; missing values (`NA's) are allowed.
.AG probs
vector of desired probability levels. Values must be
between 0 and 1 inclusive. The default produces a five
number summary: the minimum, lower quartile, median, upper
quartile, and maximum of `x'.
.AG Warn
logical; if `TRUE' (the default),
and if `x' has 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.RT
vector of empirical quantiles corresponding to the `probs'
levels in the sorted `x' data,
after deletion of missing values.
Results are currently computed
to single-precision accuracy only.
.sp 1
The algorithm linearly interpolates between order statistics
of `x', assuming that the ith order statistic is the
`(i-.5)/length(x)' quantile. The algorithm uses partial
sorting, hence is quickly able to find a few quantiles
even of large datasets.
.EX
OKquantile(mydata) # five number summary
.sp .5
OKquantile(xxx, c(.33,.67)) # 33% and 67% points of xxx
.sp .5
apply(x, 1, OKquantile, Warn=F)
.PP
If `x' in the third example is a matrix,
this expression yields
a 5-rowed matrix in which the `i'th column is
the 5-quantile summary for the nonmissing entries
in the `i'th row of `x'.
Note that `apply' passes the argument `Warn' to `OKquantile',
thus avoiding the many warnings
that might be generated if `x' has many rows with `NA's.
.SA
quantile
.WR
!!!EOF!!!
echo "unpacking doc/OKse"
cat > "doc/OKse" << "!!!EOF!!!"
.BG
.FN OKse
.TL
Standard error estimate based on nonmissing values
.CS
OKse(x, Warn=T)
.AG x
numeric vector; missing values (`NA's) allowed.
.AG Warn
logical; if `TRUE' (the default),
and if `x' has 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.RT
square root of OKvar(x)/OKlength(x),
where `x' is first coerced to a vector.
.EX
apply(x, 1, OKse, Warn=F)
.PP
If `x' is a matrix,
this expression yields the estimated standard error of the mean
for the nonmissing entries
in each row of `x', omitting the many warnings
that might be generated if `x' has many rows with `NA's.
.WR
!!!EOF!!!
echo "unpacking doc/OKsum"
cat > "doc/OKsum" << "!!!EOF!!!"
.BG
.FN OKsum
.TL
Sum, omitting missing values.
.CS
OKsum(..., Warn=T)
.AG ...
numeric or complex objects. Missing values (NA's) are allowed.
.AG Warn
logical; if `TRUE' (the default),
and if the arguments have 1 or more missing values (`NA's),
a warning message is printed
telling how many missing values were omitted.
.RT
the sum of all the nonmissing elements of all the arguments.
If all elements of all arguments are missing, `OKsum' returns `0'.
.EX
OKsum(x > 0) # counts the number of positive nonmissing values of `x'.
.SA
sum
.WR
!!!EOF!!!
echo "unpacking doc/OKvar"
cat > "doc/OKvar" << "!!!EOF!!!"
.BG
.FN OKvar
.TL
Variance and covariance, based on rows with no missing data.
.CS
OKvar(x, y = x, Warn=T)
.AG x
matrix or vector; missing values (`NA's) allowed.
.AG y
matrix or vector; missing values (`NA's) allowed.
If omitted, same as `x'.
.AG Warn
logical; if `TRUE' (the default),
and if `x' or `y' has missing values (`NA's),
a warning message is printed
telling how many rows of `cbind(x, y)' were omitted.
.RT
covariance matrix, in which the `[j, k]' element corresponds
to the `j'th column of `x' and the `k'th column of `y'.
If the matrix has only one row or only one column,
it is returned as a vector.
If `y' is not given, then the variance-covariance matrix of `x'
(or just the variance of vector `x') is returned.
.PP
If any item in the `i'th row of either `x' or `y' is `NA',
then that row is omitted in the calculation.
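.PP
The row-omission rule can be illustrated outside S. A minimal Python
sketch for the vector case follows (illustrative only; `None' stands in
for `NA', the usual n-1 denominator is assumed, and the matrix case and
warnings are omitted):

```python
def ok_var(x, y=None):
    """Complete-case variance/covariance of two vectors, with None
    standing in for NA: pairs where either value is missing are
    dropped, then the (n-1)-denominator covariance is computed."""
    if y is None:
        y = x
    # keep only complete pairs, mirroring the row-omission rule
    pairs = [(a, b) for a, b in zip(x, y)
             if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    return sum((a - mx) * (b - my) for a, b in pairs) / (n - 1)
```

With a single vector argument this reduces to the variance of its
nonmissing values, as in the example that follows.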
.EX
sqrt(OKvar(x)) # standard deviation of a vector, omitting missing data
.SA
var
.WR
!!!EOF!!!
echo "unpacking doc/apply"
cat > "doc/apply" << "!!!EOF!!!"
.BG
.FN apply
.TL
Apply a Function to Sections of an Array
.CS
apply(X, MARGIN, FUN, ..., Lcomp)
.PP
The version of `apply' documented here has been changed
substantially from the standard version in S.
It has the capability of preserving complex array structures
returned by the applied function `FUN';
or if `FUN' returns several list components,
`apply' returns a list with some or all of those same components.
.PP
The standard `apply' is still available in directory $SHOME/s/.Functions,
for use with scripts that were designed to take advantage
of the features (deficiencies) of the standard version.
.PP
Since `apply' is designed to handle virtually any `FUN', it is
implemented in a general way, using a `for' loop; it is therefore
quite slow when the loop is long.
Special functions can sometimes be applied much more efficiently,
as illustrated in the seventh example below.
.AG X
array. Missing values (`NA's) are allowed if `FUN' accepts them.
.AG MARGIN
the subscripts that define array sections
over which the function is to be applied.
For example, if `X' is a matrix, 1 indicates
that `FUN' is to be applied to each row,
2 means apply to each column.
For a more complex example of the use of `MARGIN', see
the fourth example below.
In general, a subarray is extracted from `X' for each
combination of the levels of the subscripts named in
`MARGIN'. The function `FUN' is invoked for each of these
subarrays, and the results, if any, concatenated into a new
array. Note that `MARGIN' tells which dimensions of `X'
are retained in the result.
.AG FUN
function (or
character string giving the name of the function) to be
applied to the specified array sections.
The character form is necessary only for functions with
unusual names, e.g., "%*%".
.AG ...
optional, any arguments to `FUN'; they are passed unchanged
to each call of `FUN'.
Each section of the input array is passed as the first argument to an
invocation of `FUN'. It is passed without a keyword modifier, so, by
attaching suitable keywords in `...', it should be possible
to make the array section correspond to any argument to `FUN'.
.AG Lcomp
optional numeric or character vector,
with numbers or names of list components
that are to be preserved when `FUN' produces a list.
If not given, all list components are preserved.
.RT
If each call to `FUN' returns an array of dimension `dA',
or a vector of length `dA', then
`apply' returns an array of dimension
.Cs
c(dA, dim(X)[MARGIN])
.Ce
But dimension components with length 1 are dropped;
for example, if `dA==1' and `MARGIN'
has length > 1, the value is an array of dimension `dim(X)[MARGIN]';
if `dA==1' and `MARGIN' has length `1', the value is a simple vector.
.PP
If the calls to `FUN' return results that are irregularly sized,
then `apply' returns a list of the results, with one list
component for each section of the input array.
.PP
If `FUN' returns a list,
then `Lcomp' is used to select the list components to be retained
from each call to `FUN'.
If only one component is selected,
and that component is an array or vector
then `apply' returns
a vector, array, or list following the above rules.
If several components are selected,
then `apply' returns a list with those same components;
the preceding rules for vector, array, or list output
are implemented separately within each list component.
.EX
apply(x,2,mean,trim=.25) # 25% trimmed column means from matrix `x'.
# The result is a vector of length `ncol(x)'.
apply(x,2,sort) # sort columns of matrix `x'.
t(apply(x,1,sort)) # transpose result of row sort.
apply(z,c(1,3),sum)
apply(x,1, function(y)lsfit(y[,-1], y[,1]), Lcomp=c("coef","residuals"))
# produces list, components named "coef" and "residuals".
z <- apply(x,1,ttest, Lcomp="CI", nachk="omit", hyp=F, Table=F, Print=F)
#produces a 3-way array.
apply(x, 2, sum) # a slow way to sum the columns of a matrix
rep(1, nrow(x)) %*% x # a faster way
.PP
The sorting examples show the difference between
row and column operations when the
results returned by `FUN' are vectors. The returned value
becomes the
.I first
dimension of the result, hence the transpose
is necessary with row sorts.
.PP
Suppose, in the fourth example, that `z' is a 4-way
array with dimension vector
(2,3,4,5). The expression computes the 2 by 4 matrix obtained by
summing over the second and fourth extents of `z' (i.e., `sum' is
called 8 times, each time on 15 values).
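.PP
The bookkeeping in the fourth example can be checked outside S. The
following Python sketch (illustrative only) takes a dimension vector
and a `MARGIN' (1-based, as in S) and reports the retained dimensions,
the number of calls to `FUN', and the number of values per call:

```python
def apply_sum(dim, margin):
    """Mimic the call-count bookkeeping of apply(z, MARGIN, sum):
    MARGIN names the 1-based dimensions retained in the result;
    the remaining dimensions are summed over within each call."""
    kept = [dim[m - 1] for m in margin]
    summed = [d for i, d in enumerate(dim, start=1) if i not in margin]
    n_calls = 1
    for d in kept:
        n_calls *= d          # one call of FUN per retained cell
    per_call = 1
    for d in summed:
        per_call *= d         # values handed to each call
    return kept, n_calls, per_call
```

For a (2,3,4,5) array and `MARGIN=c(1,3)' this gives a 2 by 4 result,
8 calls, and 15 values per call, as stated above.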
.PP
In the fifth example, suppose `x' is a 3-way array with dimension vector
(20, 100, 4). The "coef" component of the result will be a matrix
with dimension (4, 20) (since each least-squares fit will include
an intercept and 3 regression coefficients) and the "residuals" component
will be a matrix with dimension (100, 20).
.PP
In the sixth example, also explained in `help(ttest)',
the `ttest' function is applied to each of the `m' sections
of the matrix or array `x', where `m = dim(x)[1]',
and the `CI' component of the value is selected.
By default, this value is a `3 \(mu 2' matrix, containing
the upper and lower 90%-, 95%-, and 99%-confidence limits for the mean,
and so the resulting array will be `3 \(mu 2 \(mu m'.
The remaining keywords are passed to `ttest' to suppress other
possible computations by `ttest' and to suppress warnings about
omission of missing values;
since only the `CI' computation is used,
this makes the operation appreciably faster and more compact for large `m'.
.br
.ne 3
.SA
Functions `lapply' and `sapply' apply a function to each element of a list.
Function `tapply' applies a function to a ragged array, defined by
categories.
.PP
At this writing, only the standard versions of these are available.
.KW iteration
.KW algebra
.KW array
.WR
!!!EOF!!!
echo "unpacking doc/attach"
cat > "doc/attach" << "!!!EOF!!!"
.BG
.FN attach
.TL
Attach new directories to session or permanent search vector
.CS
attach(file=NULL, pos=2, Perm=F, Both=F, Write=F)
.AG file
character string giving the name of a new data directory to be
accessed. The name must be a correct UNIX file-system pathname,
specifying an S data directory.
The named directory is added to either the session search vector
(.Search.list) or to the corresponding object in the current
working directory, depending on the value of arguments `Perm' and `Both'.
(See below.)
If `file' is omitted, `attach' has no effect on the search vector;
instead, `attach' simply returns the current value of the search vector.
.AG pos
positive integer, giving position in search vector that `file' should occupy.
The data directories after `file', i.e.,
those originally in position `pos' or beyond,
are pushed to positions one higher.
If `pos=1', `file' will
be attached as the working directory (you must have write permission).
If `pos' exceeds the length of the search vector, then the new data
directory is attached as the last component of the vector.
.AG Perm
logical; if `TRUE', `file' is added, in the specified position, to the
structure .Search.list in the current working directory, but the session
search vector is not modified. Default is `FALSE'.
If `Perm' is `TRUE' and `file' is NULL, then the value of the
structure .Search.list in the working directory is given (if found),
rather than the session search vector.
.AG Both
logical; if `TRUE', and if the session search vector is identical to the
one stored in the current working directory, then `file' is added to both.
If the two are not equal, an error message is generated and nothing is modified.
Default is `FALSE'.
.AG Write
logical; if `TRUE', and if no other arguments are given,
then the current session search vector is written to the
working directory, overwriting the previous .Search.list, if any.
If `TRUE', and other arguments are given, an error message is generated.
Default is `FALSE'.
.RT
a character vector giving the old value (prior to the attachment)
of the session or permanent search vector,
depending on whether `Perm' is `FALSE' or `TRUE'.
.PP
This function is a modification of the standard one,
which lacked the arguments `Perm', `Both', and `Write'.
When all three of these new arguments take their default values (`FALSE')
the function is exactly the same as in standard implementations of S.
.PP
If no structure `.Search.list' is found in the working directory,
then the initial session search vector has
three data directories: working, system functions,
and system data, in that order.
The working directory will be the subdirectory `.Data' of the
current directory; if that does not exist it will be `.Data' in your
home directory; and if that does not exist S will create `.Data' in
your home directory.
If `.Search.list' does exist in the working directory,
it becomes the initial session search vector.
.PP
In the Columbia Psychology S, there are two scripts, Sinit0 and Sinit,
which initialize S globally and in each directory respectively.
Sinit0 creates a directory $HOME/.SFunctions and places in it
a .Search.list in which $SHOME/local/.Data occupies position 2
(after ./.Data), and $HOME/.SFunctions occupies position 4
(between the standard system function directory and the
system data directory). Sinit creates ./.Data and by default
copies $HOME/.SFunctions/.Search.list into it.
.PP
Only directories containing S datasets should be attached as S data directories;
ordinary ASCII data files should be read by the `scan' function.
.SA
See `detach' for removing a data directory from the search list.
.SE
The object `.Search.list' is a character vector containing
the current list of directories to be searched; this object is changed
in frame 0, if `Perm' is `FALSE',
or in the current working directory, if `Perm' is `TRUE',
or in both, if `Both' is `TRUE',
as a side effect of `attach'.
.EX
attach()
[1] "./.Data" "/usr/local/s/local/.Data"
[3] "/usr/local/s/s/.Functions" "/usr/local/s/s/.Datasets"
attach("/u/dhk/.SFunctions", pos=4, Both=T)
attach(Perm=T)
[1] "./.Data" "/usr/local/s/local/.Data"
[3] "/usr/local/s/s/.Functions" "/u/dhk/.SFunctions"
[5] "/usr/local/s/s/.Datasets"
attach("/u/cew/exper1/analysis/.Data", pos=5)
attach()
[1] "./.Data" "/usr/local/s/local/.Data"
[3] "/usr/local/s/s/.Functions" "/u/dhk/.SFunctions"
[5] "/u/cew/exper1/analysis/.Data" "/usr/local/s/s/.Datasets"
attach(Write=T)
.PP
The first example displays a possible default search vector.
The second attaches dhk's .SFunctions directory in position 4 on both
the permanent .Search.list (in the working directory, ordinarily ./.Data)
and the session .Search.list.
The third confirms that the permanent .Search.list has been modified
as expected.
The fourth attaches a particular .Data directory in position 5
on the session .Search.list only.
The fifth confirms that this has been done as expected.
The last example makes the newly modified session search vector permanent.
.KW data
.WR
!!!EOF!!!
echo "unpacking doc/chitab"
cat > "doc/chitab" << "!!!EOF!!!"
.BG
.FN chitab
.TL
Compare fits of models for a frequency table
.CS
chitab(table, models=list("ind", "sat"), save=F, Print=T, ...)
.AG table
array of nonnegative numbers to be fit by the models.
.sp 1
.AG models
list of models fit to `table'.
Each model is specified in one of two ways:
by providing the margins to be fit in a call to `loglin',
or by providing the fitted array
together with a parameter count and label for the model.
The list should be in hierarchical order, that is,
each model should be a generalization of the preceding one.
.PP
When a model is specified by a set of margins for `loglin',
the several margins are separated by `0's as delimiters.
For example, if `table' is a 4-way array,
then `c(1,2,0,3,4)' specifies a model in which the 1,2 margin is
fitted exactly (summing over 3,4) and likewise the 3,4 margin.
The model is log-additive in parameters
based on the 1,2 and the 3,4 margins;
the 1,2 and 3,4 interactions are fit exactly.
(The syntax is exactly the same as in `loglin'.)
.PP
The character string `"ind"' serves
as an abbreviation for the multiway independence model,
in which the only parameters are the 1-way marginals;
for a 4-way array, it translates as `c(1,0,2,0,3,0,4)'.
The character string `"sat"' stands for the saturated model,
in which the entire table is fit exactly,
e.g., `c(1,2,3,4)' or simply `1:4' in a 4-way array.
.PP
Each specification of margins is passed, with `table',
to a separate call to `loglin'.
.PP
The second way to specify a model is by a list
with three components, named `fit', `par', and `src'.
Component `fit' is an array with dimension vector
identical to `table'
(presumably obtained by a previous call to `loglin' or to some other
function that fits a frequency table).
Component `par' gives the number of parameters
estimated in fitting the model.
(By convention, the parameter for overall level is omitted;
thus, `par' is `sum(dim(table) - 1)' for the "ind" model
and `prod(dim(table)) - 1' for the "sat" model.)
(When the model is specified by margins, the function `loglin.par'
supplies the value of `par'.)
Component `src' is a character string that identifies the model
and is used to label the proper row in the table of chi-square statistics.
(When the model is specified by margins, the specification string itself
is used to label the row.)
.sp 1
.AG save
logical; when `TRUE', the fitted models are saved
as a sublist of the returned value.
(See below.)
Default `FALSE'.
.sp 1
.AG Print
logical; when `TRUE', the function prints the matrix described below
with columns rounded so that they can be read easily.
Default `TRUE'.
.sp 1
.AG ...
Optional arguments that are passed to all the calls to `loglin'.
(The possibilities are `start', `eps', `iter', and `print'.)
.RT
When `save=F', the function returns
a matrix with one row for each model that is fit,
and with six columns.
The model specifications are used as labels for the matrix rows.
The columns contain different information for each fit.
(1) For all but the last row of the table, column one contains
the incremental likelihood-ratio chi-square statistic
comparing the model with the next one in the list.
(2) The degrees of freedom for that chi-square statistic.
The df are calculated as the difference in number of
parameters estimated in the two models
(supplied by a call to the function `loglin.par'
or by component `par' of a fitted model).
(3) The descriptive level (\fIp\fP-value) associated with
the statistic.
(4) The likelihood-ratio chi-square statistic for the overall goodness
of fit of the model.
(5) The df remaining after fitting the model.
(6) The descriptive level (\fIp\fP-value) associated with
the goodness-of-fit statistic.
.PP
When `save=T', the function returns
a list with components `tab' and `fit'.
Component `tab' is the matrix just described.
Component `fit' is in turn a list,
with a subcomponent for each model fit to `table'.
The subcomponents in turn are lists with components
`fit', `par', and `src' as described above;
thus, the component `fit' can be used
as the `models' argument to a new call to `chitab'.
In practice, a new call to `chitab' might use some
previous `fit' subcomponents,
mixed with specifications of new models.
(See third example below.)
.EX
chitab(tab1)
.sp .5
chitab(tab2, list("ind", c(1,2,0,1,3,0,2,3), "sat"), print=F)
.sp .5
y <- chitab(tab2, save=T)
chitab(tab2, c(y$fit[1], list(c(1,2,0,1,3,0,2,3)), y$fit[2]))
.br
.PP
Example 1 produces the common chi-square test of independence.
For the 2\(mu2 table
.sp 1
.TS
center;
l r r
r r r.
tab1
Group A Group B
Yes 40 15
No 20 15
.TE
.sp 1
the printed output from example 1 is as follows:
.sp 1
.TS
center;
r r r r r r r
r n n n n n n.
INCRchisq INCRdf INCRp chisq df p
ind 2.31 1 0.13 2.31 1 0.13
sat NA NA NA 0.00 0 NA
.TE
.sp 1
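.PP
The first row of that output can be verified outside S. The following
Python sketch (illustrative only; `chitab' itself obtains its fits from
`loglin') computes the likelihood-ratio chi-square,
2 * sum(obs * log(obs/fit)), for the independence fit to `tab1',
whose fitted counts are the products of the margins divided by the
grand total:

```python
from math import log

def lr_chisq(obs, fit):
    """Likelihood-ratio chi-square: 2 * sum(obs * log(obs / fit)).
    Cells with a zero observed count contribute nothing."""
    return 2 * sum(o * log(o / f) for o, f in zip(obs, fit) if o > 0)

# tab1 in row-major order: (Yes,A) (Yes,B) (No,A) (No,B)
obs = [40, 15, 20, 15]
rows, cols, total = (55, 35), (60, 30), 90
fit = [r * c / total for r in rows for c in cols]  # independence fit
```

This reproduces the chi-square of 2.31 on 1 df shown above.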
Example 2 tests the triple interaction
and (simultaneously) the three pairwise interactions in a 3-way table.
(The argument `print=F' is passed to the calls to `loglin',
suppressing the reports about convergence.)
For the data
.sp 1
.TS
center;
l r r
r c s
r r r.
tab2
Region I
Group A Group B
Yes 40 15
No 20 15
.sp 1
.T&
r c s
r r r.
Region II
Group A Group B
Yes 25 20
No 20 30
.TE
.sp 1
the printed output from example 2 would be
.sp 1
.TS
center;
r r r r r r r
r n n n n n n.
INCRchisq INCRdf INCRp chisq df p
ind 15.21 3 0.0016 15.22 4 0.0043
1 2 0 1 3 0 2 3 0.01 1 0.9200 0.01 1 0.9200
sat NA NA NA 0.00 0 NA
.TE
.sp 1
This output reflects the fact that there is virtually no
observed interaction between "Region" and "Group"
in determining "Yes" vs. "No" (chi-square = 0.011).
.sp 1
Example 3 produces essentially the same result as example 2,
but in two stages. First, the independence model is tested
against the saturated model, with `save=T' used to preserve
the fits in `y$fit'.
Then the model with pairwise interactions is added,
without recalculating the earlier fits.
The printed output is exactly the same as above.
.ne 3
.SA
loglin, loglin.par
.WR
!!!EOF!!!
echo "unpacking doc/count.na"
cat > "doc/count.na" << "!!!EOF!!!"
.BG
.FN count.na
.TL
Count NA's in each row of a matrix
.CS
count.na(x)
.AG x
matrix or vector
.RT
vector with length equal to the number of rows of `x',
with entries equal to the number of missing values (NA's)
in each row.
A nonmatrix argument is treated as a matrix with one row,
so `count.na' returns the total number of missing values in the structure.
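.PP
The behavior can be mimicked outside S; a small Python sketch follows
(illustrative only; `None' stands in for `NA' and a list of lists for a
matrix):

```python
def count_na(x):
    """Per-row count of missing values (None standing in for NA).
    A plain vector is treated as a one-row matrix, so the result is
    its total count of missing values."""
    rows = x if x and isinstance(x[0], list) else [x]
    return [sum(v is None for v in row) for row in rows]
```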
.EX
x[count.na(x)==0, ] # Selects rows of x with no missing values
.WR
!!!EOF!!!
echo "unpacking doc/detach"
cat > "doc/detach" << "!!!EOF!!!"
.BG
.FN detach
.TL
Detach directories from session or permanent search vector
.CS
detach(what=2, Perm=F, Both=F)
.AG what
character string, logical vector, or numeric subscripts.
If `what' is character, then it is the name
of the directory to be taken out of the search vector.
If `what' is a logical vector, it should be the same length as the
current search vector; any `TRUE' values correspond to directories
that are to be detached.
If `what' is numeric, it gives the positions of directories in the
search vector to be detached.
.AG Perm
logical; if `TRUE', then the directories specified by `what' are
removed from the .Search.list (if found) in the working directory,
rather than from the session object. Default `FALSE'.
.AG Both
logical; if `TRUE', and if the .Search.list in the working directory
is identical to the current session .Search.list,
then the directories specified by `what' are removed from both.
Default `FALSE'.
.RT
a character vector giving the old value (prior to the detachment)
of the session or permanent search vector,
depending on whether `Perm' is `FALSE' or `TRUE'.
.PP
This function is a modification of the standard one,
which lacked the arguments `Perm' and `Both'.
When these new arguments take their default values (`FALSE')
the function is exactly the same as in standard implementations of S.
.SE
The object `.Search.list' is a character vector containing
the current list of directories to be searched; this object is changed
in frame 0, if `Perm' is `FALSE',
or in the current working directory, if `Perm' is `TRUE',
or in both, if `Both' is `TRUE',
as a side effect of `detach'.
.SA
`attach', `assign'.
.EX
detach("/u/cew/exper1/analysis/.Data")
attach()
[1] "./.Data" "/usr/local/s/local/.Data"
[3] "/usr/local/s/s/.Functions" "/u/dhk/.SFunctions"
[5] "/usr/local/s/s/.Datasets"
detach(what=2, Both=T)
.PP
The first example removes a named data directory from the
session search vector. This might be done, for example,
prior to attaching "/u/cew/exper2/analysis/.Data", to avert
a situation in which you are uncertain whether a named data
structure comes from exper1 or exper2.
The subsequent `attach()' displays the revised session search vector.
The final example removes the component in position 2 from both
the session and permanent search vectors; referring back to the
preceding display, this would inactivate the local modifications
of standard S functions, so in particular, the version of `detach()'
here described would become unavailable, and the function and its
documentation would revert to the standard `detach()'.
.KW data
.WR
!!!EOF!!!
echo "unpacking doc/fill.na"
cat > "doc/fill.na" << "!!!EOF!!!"
.BG
.FN fill.na
.TL
Replace missing values by a designated constant of the same mode
.CS
fill.na(x, nomiss=F, case.record=F, Warn=T)
.AG x
any data structure, missing values (`NA's) permitted.
.AG nomiss
constant used to replace each `NA' in structure `x'.
Must have the same mode as `x'
(i.e., numeric, complex, or logical).
Default is `FALSE', since the most useful application of `fill.na'
is likely to be to obtain strict comparisons (see example 2 below).
The `.Options' list is consulted for a possible overriding default.
.AG case.record
logical; if `TRUE', the returned value contains a component named "miss",
the vector of case numbers of `NA's that were replaced.
If none, this component takes the value `NA'.
Default `FALSE'.
The `.Options' list is consulted for a possible overriding default.
.AG Warn
logical; if `TRUE' (the default), then if `case.record' is `FALSE',
a warning message reports the count of `NA's replaced.
The `.Options' list is consulted for a possible overriding default.
.RT
In the default situation, a structure like `x' is returned,
in which each `NA' has been replaced by the value of `nomiss'.
But if `case.record=T', then a list is returned,
with components named "struc" and "miss";
the former is the same as the value returned when `case.record=F',
while the latter is a vector of case numbers of cases replaced,
that is, it is the vector `(1:length(x))[is.na(x)]'.
(In case there are no `NA's, this component is `NA'.)
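.PP
A Python sketch of the returned value follows (illustrative only;
`None' stands in for `NA', and the `Warn' and `.Options' machinery is
omitted):

```python
def fill_na(x, nomiss=False, case_record=False):
    """Replace missing values (None standing in for NA) by `nomiss'.
    With case_record, also return the 1-based case numbers of the
    replacements, or None (standing in for NA) if there were none."""
    filled = [nomiss if v is None else v for v in x]
    if not case_record:
        return filled
    miss = [i for i, v in enumerate(x, start=1) if v is None]
    return {"struc": filled, "miss": miss or None}
```

With the default `nomiss', `fill_na' of a comparison vector yields the
strict-comparison result described in example 2.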
.EX
y <- fill.na(x, 0)
.sp .5
y <- fill.na(x > 0)
.sp .5
y <- fill.na(x <= 0, nomiss=T)
.sp .5
y <- fill.na(x < sqrt(x), case.record=T)
.sp .5
options(nomiss=0, Warn=F)
y <- apply(x, 2, fill.na, case.record=T)
.PP
The first example simply replaces missing values in `x' by `0'
(this is rarely useful in S).
The second example illustrates the use of `fill.na' for strict comparisons,
which motivated writing this function.
In this instance, the returned vector `y' is `T' at every index
where `x' is strictly positive, `F' everywhere else, including
indices where `x' is `NA'.
The third example generates the complement of the value returned
in the second example:
the `nomiss' argument is used to put `T' rather than `F' for the `NA's.
Obviously `!fill.na(x > 0)' would do the same thing.
In the fourth example, `y$struc' is `T' only where `x' is strictly
between 0 and 1, `F' everywhere else, and `y$miss'
records the cases where the `F' is assigned because
either `x' or `sqrt(x)' is missing (so it includes
all cases where `x' is negative).
In the final example, the `options' function is used to set new defaults
for `nomiss' and `Warn'
(the latter is useful if the function is to be used in a loop such as `apply';
one avoids generating multiple warnings).
In fact, the subsequent call to `apply' passes `case.record=T'
to each application of `fill.na', so the value of `Warn' does not matter.
.br
.ne 30
.PP
The following is a more complete version of the final example;
it displays an input structure `x' and contrasts the outputs
when `fill.na' is used once, on the full structure,
versus when it is used within `apply' on each column of the structure.
.sp 1
.TS
l r r r.
`x'
[,1] [,2] [,3]
[1,] 1 1 NA
[2,] NA 1 NA
.TE
.sp 1
.nf
`options(nomiss=0, Warn=F)'
`y1 <- fill.na(x, c=T)'
.fi
.sp 1
.TS
l r r r.
`y1'
$struc:
[,1] [,2] [,3]
[1,] 1 1 0
[2,] 0 1 0
.TE
.sp 1
.nf
$miss:
[1] 2 5 6
.sp 1
`y2 <- apply(x, 2, fill.na, c=T)'
.fi
.sp 1
.TS
l r r r.
`y2'
$struc:
[,1] [,2] [,3]
[1,] 1 1 0
[2,] 0 1 0
.TE
.sp 1
.nf
$miss:
[1] 2 NA 1 2
.fi
.PP
Note that `y1$struc' and `y2$struc' are identical: both come from
replacing the `NA's by `0', the first globally, the second column-wise.
But in `y1$miss' the vector of indices of missing values is obtained
globally on a structure of length 6,
while in `y2$miss', the first `2' is the index of the missing value
in column 1 of `x', the `NA' comes from the absence of missing values
in column 2 of `x', and the final `1 2' comes from the missing values
in column 3.
.br
.ne 3
.SA
OKcases, count.na, rm.na
.WR
!!!EOF!!!
echo "unpacking doc/loglin.par"
cat > "doc/loglin.par" << "!!!EOF!!!"
.BG
.FN loglin.par
.TL
Parameter count for loglinear fit
.CS
loglin.par(margin, DIM)
.PP
This function is called in `chitab' to provide parameter counts
used for degrees of freedom of chi-square statistics.
.AG margin
vector specifying margins to be fitted in call to `loglin'.
The syntax is the same as in `loglin' and `chitab'.
.sp 1
.AG DIM
dimension vector of table fit by call to `loglin'.
.RT
number of parameters estimated in the loglinear fit.
By convention, the overall frequency is not counted as a parameter.
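.PP
The counting convention can be sketched outside S. The following
Python fragment is illustrative only; it assumes the hierarchical
convention that each fitted margin contributes all of its nonempty
subsets as interaction terms, a term over a set S of factors carrying
the product of (dim - 1) over S as its parameter count:

```python
from itertools import combinations

def loglin_par(margins, dim):
    """Count parameters in a hierarchical loglinear fit.
    margins: list of tuples of 1-based factor indices (the 0-delimited
    spec of `loglin', split into separate margins).
    dim: dimension vector of the table.
    The overall-level parameter is not counted."""
    terms = set()
    for m in margins:
        for k in range(1, len(m) + 1):
            for sub in combinations(sorted(m), k):
                terms.add(sub)      # distinct interaction terms
    total = 0
    for t in terms:
        p = 1
        for i in t:
            p *= dim[i - 1] - 1     # parameters for this term
        total += p
    return total
```

For the examples that follow, this rule gives 4, 12, 15, and 80,
respectively.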
.EX
loglin.par(c(1,0,2,0,3,0,4), c(2,2,2,2))
# answer = 4
.sp .5
loglin.par(c(1,2,3,0,1,2,4,0,3,4), c(2,2,2,2))
# answer = 12
.sp .5
loglin.par(1:4, c(2,2,2,2))
# answer = 15
.sp .5
loglin.par(1:4, c(3,3,3,3))
# answer = 80
.sp .5
loglin.par(c(1,0,2,0,3,0,4), dim(x))
# answer = 4
.sp .5
loglin.par(1:4, dim(x))
# answer = `prod(dim(x)) - 1', assuming `length(dim(x))==4'
.br
.ne 3
.SA
loglin, chitab
.WR
!!!EOF!!!
echo "unpacking doc/options"
cat > "doc/options" << "!!!EOF!!!"
.BG
.FN options
.TL
Set, modify or report (session or permanent) list of options
.CS
options(..., Suffix, Perm=F, Rm.opt=F, Write=F, Names=F)
.PP
This function is a modification of the standard one,
which lacked the arguments `Suffix', `Perm', `Rm.opt', `Write', and `Names'.
When `Suffix' is not given, and the other four
new arguments take their default values (`FALSE'),
the function operates just like the standard one.
.PP
The function can be used to modify a session or a permanent
named list structure,
either the standard one, called ".Options", or any user-created one
with a name of the form ".Options.Suffix".
The function can also be used merely to report
the values or just the names of specified components of such a list.
.AG ...
Any arguments (except `Suffix', `Perm', `Rm.opt', `Write', `Names').
Arguments `...' can be given in 4 different syntactic forms,
depending on the desired action.
First, to set or modify options, provide arguments in the `name=value' form.
For example,
.Cs
options(digits=3, check=T, greeting="Hello")
.Ce
modifies the `digits' option to `3' and the `check' option to `TRUE'
and sets a silly option named `greeting' to `"Hello"'.
Newly set options, such as `greeting', are placed at the end of the list.
A list of the previous values of any modified
options is returned.
In the example, the return would be a list of length 2,
with components `digits' and `check'
(since `greeting' was not previously set).
This list can be used in a subsequent call to `options'
to restore the previous values of options.
Second, to get a partial report of options on a given list,
give a single unnamed character vector as the argument `...',
e.g.,
.Cs
options(c("digits", "check", "greeting"))
.Ce
If no argument `...' is given,
`options' returns the entire current options list.
Third, in conjunction with `Rm.opt=T' (see below),
the arguments `...' should be the names of the options to be removed.
For example,
.Cs
options("check", "greeting", Rm.opt=T)
.Ce
would remove the "check" and "greeting" options altogether from the list.
(It is not advisable to remove standard options such as "check";
see below for a list of these.)
A list of the values of the removed options is returned,
which can be used in a subsequent call to `options' to restore them.
Fourth, if an unnamed object of mode `list'
is given as the only `...' argument,
its components will be copied in
as the values for options with the corresponding names.
For example,
.Cs
options(list(check=T, greeting="Hello"))
.Ce
would restore the "check" and "greeting" options
that were first set and then removed in the previous examples.
The sequence
.Cs
oldopt <- options("check", "greeting", Rm.opt=T)
options(oldopt)
.Ce
would first remove the two options named, returning the list `oldopt',
and then would use the latter to restore those options.
A similar sequence could be based on the first syntactic form
described above, in which options are modified, rather than removed.
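The four calling forms can be summarized in a small sketch. Since S itself
cannot be run here, the following is a rough Python analogue only: a plain
dict stands in for the `.Options' list, and the names `_options' and
`Rm_opt' are illustrative, not part of S.

```python
# Rough Python analogue of the four calling forms of the modified
# `options' function; a dict stands in for the S list `.Options'.
_options = {"digits": 7, "check": False}   # stand-in session list

def options(*args, Rm_opt=False, **kwargs):
    if Rm_opt:
        # Third form: remove the named options, returning the removed
        # name/value pairs so that they can be restored later.
        return {k: _options.pop(k) for k in args if k in _options}
    if args and isinstance(args[0], dict):
        # Fourth form: copy a saved list back in, restoring its entries;
        # previous values of any entries that already existed are returned.
        prev = {k: _options[k] for k in args[0] if k in _options}
        _options.update(args[0])
        return prev
    if args:
        # Second form: partial report for a vector of option names.
        names = args[0] if isinstance(args[0], (list, tuple)) else args
        return {k: _options.get(k) for k in names}
    if kwargs:
        # First form: set or modify options, returning the previous
        # values of those options that already existed.
        prev = {k: _options[k] for k in kwargs if k in _options}
        _options.update(kwargs)
        return prev
    return dict(_options)   # no arguments: the entire current list
```

As in the S function, the value returned by a modifying or removing call
can be fed back in later to restore the previous state.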
.PP
Since `...' is placed first, other arguments must be given in full;
for example, `Suff="Anova"' would be interpreted as setting a new
option at the end of the list `.Options',
having the name `Suff' and taking the value `"Anova"',
while `Suffix="Anova"' means that `.Options.Anova' will be used
as the options list in place of `.Options'.
.AG Suffix
character string, giving the identifying suffix of the options list to be used.
If not given, the list `.Options' is used;
otherwise, `.Options.' is prepended to `Suffix',
for example, `Suffix="Anova"' results in use of `.Options.Anova'
as the list to be set, modified, or reported.
.AG Perm
logical; if `TRUE', then the action or the report uses the list
(if found)
in the working data directory rather than the session list.
Default `FALSE'.
.AG Rm.opt
logical; if `TRUE', and if `...' is given as character strings
with names of options, the named options are removed from the list.
Default `FALSE'.
.AG Write
logical; if `TRUE', and if no argument other than `Suffix'
is given, then the session list is assigned in the working data directory,
overwriting the corresponding list already there, if any.
Default `FALSE'.
.AG Names
logical; if `TRUE', then the vector of names
for the specified options list is reported.
In this case, arguments other than `Suffix' are ignored;
nothing is modified.
Default `FALSE'.
.PP
The standard `.Options' list has the following entries:
.AG echo
if `TRUE', each complete expression will be echoed before
it is evaluated.
.AG width
the width (in print positions) of the user's
terminal.
.AG length
the length (in lines) of an output page.
.AG prompt
the string to be printed by the S parser to prompt for
an expression.
.AG continue
the string to be printed by the S parser to prompt for the continuation of
an expression.
.AG digits
number of significant digits to use in `print' (and therefore in automatic
printing).
.AG check
if `TRUE' S performs various internal checks during evaluation.
This provides more information
about warning messages and reloading, and
may help track down mysterious bugs (such as S
terminating abnormally).
Evaluation will be somewhat slower with this option turned on.
.AG memory
the maximum total size (in bytes) for all in-memory data.
If this limit is exceeded, the session will be terminated (to
avoid runaway computations that may slow down or crash the computing system).
If total memory reaches half the allowed limit, S will automatically reload
at the completion of the next expression, to get rid of garbage.
See function `reload'.
.AG object.size
the maximum size (in bytes) for any single S object.
If this limit is exceeded, an error is generated, but the session continues.
.AG keep
determines what mode of objects are to be kept in an internal table once
accessed in a session.
Default strategy is `keep="function"', so that future calls will be faster
from not having to find the functions on the system database.
The only likely other strategy is `keep=NULL' which turns off keeping,
and might be a good idea if you were about to access a large number of
functions sequentially.
.AG error
function to be called when an error or interrupt occurs.
Likely alternatives are `dump.calls', `dump.frames' or `NULL' (no error action).
.AG audit.size
the maximum size (in characters) for the audit file.
If this limit is exceeded at the beginning of a session, a warning message
will be printed.
You should then use the shell-level utility
.Co "S TRUNC_AUDIT"
to reduce the size of the audit file.
.AG show
should graphics be shown directly or returned as a graphics object?
.PP
The initial values for these options are as follows:
.Cs
echo=F width=80 length=48
prompt="> " continue="+ " digits=7
check=F memory=5e7 object.size=5e6
keep="function" error=dump.calls
audit.size=5e5 show=T
.Ce
.PP
Other options lists to be used with this function must be named
with `.Options.' prepended to an identifying suffix.
Users may create any lists they need;
such lists will be useful insofar as there are functions
programmed to consult them.
For example, a simple version of `.Options.Anova' might look like:
.Cs
conf=c(.90,.99) eqvar=F Print=T
.Ce
specifying 90%- and 99%-confidence intervals as defaults,
no default equality-of-variance assumption,
and printing of output by default, even when returned values
are assigned as new data objects.
Suitably programmed data-analysis functions would consult
this list and use it to override their preprogrammed defaults,
though the user could in turn override any defaults by supplying
argument values appropriate to the occasion.
.PP
It is assumed that departures from the standard `.Options',
and any new option lists, will be stored in working data directories.
They are to be changed with `Perm=T' or `Write=T'.
Setting `Rm.opt=T' is a convenient way to prune a list.
Setting `Names=T' is a quick way to be reminded what sorts of things
are on a given list.
.PP
Note that `Perm=T' modifies the permanent list (which must already exist),
but not the session list.
In many applications, it will be simplest to modify the session list
until something suitable for permanent storage is obtained, then
use `Write=T' to record it.
(The argument `Both', used with modified `attach' and `detach' functions,
has not been made available here, since it is too hard to check in general
whether session lists match permanent lists.)
.PP
The initialization script `Sinit' looks for files
whose names begin with ".Options" and copies them
into the (possibly newly created) current .Data subdirectory.
By default, it copies from $HOME/.SFunctions (created by script Sinit0),
but does so only if there are no ".Options*" files already in .Data.
However, if the -option flag is given, Sinit copies
all ".Options*" files from the .Data subdirectory
of the directory named after the flag,
whether or not ".Options*" files already exist in the target .Data;
existing structures with the same name as a copied structure
will be overwritten.
.RT
Except when `Names=T',
`options' returns a list,
whose components give either the previous or the current options, as
determined by the arguments supplied.
Thus, the expression
.Cs
options(show=F, echo=T, audit.size=3e4)$show
.Ce
gives the previous value of the option `show'.
(Options that have never
been set yield the value `NULL'.)
The returned list is based on the permanent or the session version
of the list, depending on whether `Perm' is `TRUE' or `FALSE'.
For `Names=T', the return is the vector of names from the appropriate list.
.SE
A permanent or session object, named `.Options' or `.Options.Suffix',
is modified.
.EX
options(width=50)
.sp .5
options(audit.size=3e4, Perm=T)
.sp .5
temp <- options(prompt="Say something! ",continue="\\t")
.sp .5
options(temp)
.sp .5
temp.opt.anova <- options(Suffix="Anova", conf=c(.8, .95, .999), eqvar=T)
.sp .5
old.opt.anova <- options(Suffix="Anova", Write=T)
.sp .5
options(Suffix="Anova", temp.opt.anova)
.sp .5
options(Suffix="Anova", old.opt.anova, Perm=T)
.PP
The first example temporarily sets a 50-character wide line.
The second one causes a warning when `.Data/.Audit' exceeds 30000 characters.
This is a reasonable permanent change, which encourages the user to do
something with the Audit file before it gets out of hand.
In examples 3 and 4, the prompt and continue character are temporarily
changed, but after some computations, they are restored,
using the saved list of modified options.
The fifth example temporarily modifies the `.Options.Anova' list,
specifying computation of 3 confidence intervals by default,
with equal variance assumed.
The previous values of changed options are saved as `temp.opt.anova'.
Next, the temporary .Options.Anova is made permanent,
by assigning it in the working directory, but the old version in the
working directory is saved as `old.opt.anova'.
The last two examples illustrate different reversals of previous changes.
.KW error
.WR
!!!EOF!!!
echo "unpacking doc/rm.na"
cat > "doc/rm.na" << "!!!EOF!!!"
.BG
.FN rm.na
.TL
Treat `NA's in data according to specified method
.CS
rm.na(x, nachk="warn")
.AG x
vector or array, possibly containing `NA's.
.AG nachk
character, method for treating `NA's.
Can take one of four values:
.AG nachk="omit"
The function returns the vector of nonmissing elements of `x',
with indices adjusted accordingly.
.AG nachk="warn"
Same as `"omit"' except that a warning message is generated
telling how many missing elements were omitted.
This is the default.
.AG nachk="stop"
Returns an error; a structure with `NA's is not accepted
by any function that incorporates `rm.na' with this value for `nachk'.
.AG nachk="returnNA"
Returns the character string `"returnNA"',
which can then be used in the calling function to perform
some non-error action.
.RT
Either a vector without `NA's (namely `x[!is.na(x)]')
or the string `"returnNA"' or an error, depending on `nachk'
as described above.
.PP
This function does not consult an options list to
reset the default for `nachk',
but the default can be overridden by functions that call `rm.na',
and some of these other functions may reset their default for `nachk'
by consulting an appropriate options list.
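The contract just described can be sketched in a few lines. Since S is not
available here, the sketch below is Python, with `None' standing in for
`NA'; the name `rm_na' mirrors the S function but the code is otherwise
only an illustration.

```python
def rm_na(x, nachk="warn"):
    # `None' stands in for S's NA; `x' is a plain list.
    ok = [v for v in x if v is not None]
    n_missing = len(x) - len(ok)
    if n_missing == 0:
        return ok
    if nachk == "stop":
        # A structure with missing values is not accepted at all.
        raise ValueError("missing values not accepted")
    if nachk == "returnNA":
        # The caller tests for this string and takes a non-error path.
        return "returnNA"
    if nachk == "warn":
        print("warning: %d missing value(s) omitted" % n_missing)
    return ok   # both "omit" and "warn" drop the missing elements
```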
.EX
y <- rm.na(x)
.sp .5
ttest(x)
.sp .5
ttest(x, nachk="omit")
.PP
The first example is equivalent to `y <- x[!is.na(x)]' except that
a warning is returned if `x' has missing data.
The latter two examples reflect the fact that
the `ttest' function has a call to `rm.na'.
In the second example, `ttest' uses a default value for `nachk',
which may be the system default `"warn"' or the value
specified in `.Options.Anova' (if any).
Whatever the value, it is passed to `rm.na';
for example, if it is `"stop"', then `ttest' will not accept `NA's.
In the last example, `ttest' passes `"omit"' to `rm.na';
the latter returns to `ttest' the vector `x' with missing values
deleted, without generating any warning.
.WR
!!!EOF!!!
echo "unpacking doc/ttest"
cat > "doc/ttest" << "!!!EOF!!!"
.BG
.FN ttest
.TL
One- or two-sample t test
.CS
ttest(x, y, conf=c(.90,.95,.99), conf0, hyp=0, tails=2, eqvar=T,
nachk="warn", Table=T, Print=T)
.PP
The principal arguments are data vectors, `x' and `y'.
The output includes an obligatory vector of descriptive statistics,
and three optional tables, one with formatted descriptive statistics,
one with confidence intervals, and one with hypothesis tests.
Defaults for subsidiary arguments (all but `x' and `y')
can be reset in `.Options.Anova'.
.AG x
numeric vector, `NA's allowed. Must be supplied.
.AG y
numeric vector, `NA's allowed.
If omitted, one-sample t test is performed using argument `x' only.
If `y' is supplied, a two-sample test compares location parameters
of two gaussian populations treating `x' and `y' as respective
random samples.
.AG conf
numeric vector (values between 0 and 1), or `FALSE'.
Confidence level(s) used in interval estimation;
if `FALSE', no table of interval estimates is returned,
though a single interval estimate may be obtained from argument `conf0'.
Default gives .90, .95, and .99 intervals;
it can be reset in `.Options.Anova'.
.AG conf0
a number between 0 and 1, or `FALSE'.
Confidence level for the interval estimate included
with the obligatory output and with table of descriptive statistics.
If `FALSE', no interval estimate is given in those outputs.
Default is the minimum level of `conf', or .95 if `conf=F';
can be reset in `.Options.Anova'.
.AG hyp
numeric vector, or `FALSE'.
If numeric, it specifies a finite number of hypotheses to be tested,
with t-statistics and attained significance levels printed for each.
Default is 0 ("null" hypothesis is tested).
If `FALSE', no table of hypothesis tests is returned.
The default can be reset in `.Options.Anova'.
.AG tails
1 or 2. If 2 (the default), attained significance levels for
hypothesis tests are based on two-tailed symmetric rejection regions.
If 1, attained significance levels are halved
(useful for those who take significance levels very seriously).
The default can be reset in `.Options.Anova'.
.AG eqvar
logical. If `TRUE' (default), the classical Student assumption
of equal-variance gaussian populations is used.
If `FALSE', the Satterthwaite approximation is used.
Default can be (and probably should be) reset in `.Options.Anova'.
Equality-of-variance tests, which we consider to be useless
in conjunction with two-sample t-tests, are not provided.
.AG nachk
character, must be one of
`"omit"', `"warn"' (the default), `"stop"', or `"returnNA"'.
The value is passed to a call to `rm.na'.
If `"omit"' or `"warn"' is used, `NA's are omitted and the
calculations proceed;
`"warn"' produces a message telling how many were omitted.
With `"stop"', `NA's are not accepted at all.
With `"returnNA"', an appropriately sized and labelled
vector of `NA's is returned.
Default can be reset in `.Options.Anova'.
.AG Table
logical; if `TRUE' (default), a formatted table of descriptive
statistics is generated, in addition to the obligatory output.
The default can be reset in `.Options.Anova'.
.AG Print
logical; if `TRUE' (default), then each generated table is printed,
even if the returned output is assigned a name.
If `Print=T', then `Table' is automatically set `TRUE' also.
If `FALSE', and if an assignment is made, tables can be printed
selectively later (see below).
The default can be reset in `.Options.Anova'.
.br
.ne 4
.RT
In the one-sample case, the obligatory return is a vector
giving the mean, estimated standard error of the mean, and sample size.
If `conf0' is numeric, then lower and upper confidence limits are
added to this vector.
For the two-sample procedure, two means, two standard errors,
and two sample sizes are returned;
if `conf0' is numeric, then lower and upper confidence limits
for the difference in means are given also.
.sp .5
If any of the following tables are prepared,
the obligatory return becomes
the first component, `$Calc', of a list.
Otherwise it is a simple vector.
.PP
If `Table=T' these same statistics are also prepared as a formatted table,
given in component `$Desc' of the returned list.
If `eqvar=T', then in the two-sample case, estimates of proportion of
variance accounted for and standard deviation of the residuals
are also included with `$Desc'.
.PP
If `conf' is numeric, the return includes component `$CI',
a table with one or more confidence intervals.
.PP
If `hyp' is numeric, the return includes component `$Test',
a table with one or more hypothesis tests.
.EX
ttest(x, y)
z <- ttest(x, y, Print=F)
z <- ttest(x, conf=c(.9, .99), conf0=.95, hyp=F)
z <- apply(x, 1, ttest, Lcomp="CI", nachk="omit", hyp=F, Table=F, Print=F)
options(Suffix="Anova", Print=F)
z <- ttest(rnorm(15), rnorm(6, 3, 5), eqvar=F)
.PP
The first is a two-sample comparison with all defaults in place.
In the second, the output (consisting of named components)
is assigned to `z', which will have all four components,
`z$Calc', `z$Desc', `z$CI', `z$Test'.
Nothing will appear at the terminal, since `Print=F'.
The third example changes the confidence levels from the defaults
and suppresses the `Test' component entirely; `z$Desc' and `z$CI'
will be printed at the terminal, since `Print' is `TRUE' by default.
.PP
The fourth example uses `ttest' within `apply'.
If `x' is a 100 x 30 matrix, with some missing data,
each t-test will be applied to the nonmissing data in a vector
of length 30, and the output of each will be a 3 x 2 matrix of
lower and upper confidence limits for the mean, using confidence
levels .90, .95, .99 by default.
This part of the output is selected by `Lcomp="CI"';
setting the other arguments prevents generation of up to 100
warnings about missing data (one for each row) and prevents
useless calculation and printing.
The return will be a 3 x 2 x 100 array.
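The flow of this example, calling a list-returning function once per row
and keeping only one named component, can be sketched as follows. This is
a Python illustration under assumed names (`row_apply', `ci'), not the S
`apply' itself.

```python
def ci(row):
    # Toy stand-in for a ttest-like function: it returns several named
    # components, of which the caller may want only one.
    ok = [v for v in row if v is not None]   # drop missing data silently
    m = sum(ok) / len(ok)
    return {"Calc": (m, len(ok)),
            "CI": [(m - 1, m + 1), (m - 2, m + 2)]}

def row_apply(mat, fun, Lcomp):
    # Apply `fun' to each row, keeping only the component named `Lcomp',
    # as with apply(x, 1, ttest, Lcomp="CI", ...) in the example above.
    return [fun(row)[Lcomp] for row in mat]

x = [[1, 2, None, 3], [4, None, 6, None]]
out = row_apply(x, ci, "CI")    # one interval table per row of x
```

The other components (`Calc' here) are computed but discarded, just as
`Lcomp="CI"' keeps only the confidence-interval table from each call.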
.PP
The `options' call before the last example changes the default
for `Print' for the remainder of the session (adding `Perm=T' would
change it for future sessions, but not for this one).
Thus, in the last example, unequal variance is (correctly)
assumed, and the output will not appear at the terminal because
of the preceding change in `.Options.Anova'.
One can later print `z' or any of its 4 components.
When we tried this,
the full printout of `z' looked as follows:
.br
.ne 32
.DS
.TS
l s s s s s s s
r r r r r r r r.
$Calc:
mean1 se1 N1 mean2 se2 N2 L90% U90%
.sp .3
0.1774428 0.2900104 15 2.432034 2.939021 6 -8.181635 3.672452
.TE
.DE
.DS
.TS
l s s s
l s s s
r r r r.
$Desc:
$Desc$"M/SE/DF":
mean se df
.sp .3
rnorm(15) 0.17744 0.290 14.0
rnorm(6, 3, 5) 2.43200 2.939 5.0
diff -2.25460 2.953 5.1
.T&
l s s s
r r r r.
$Desc$"RSQ/SD.RESID/CI LIMITS":
rsq sd.resid L90% U90%
.sp .3
NA NA -8.1816 3.6725
.T&
l s s s.
$Desc$ASSUMPTION:
[1] " Unequal variance assumed; rsq & sd.resid omitted."
.TE
.DE
.DS
.TS
l s s
r r r.
$CI:
Lower limit Upper limit
.sp .3
.9 Interval -8.1816 3.6725
.95 Interval -9.8041 5.2950
.99 Interval -14.0430 9.5343
.TE
.TS
l s s
r r r.
$Test:
hypothesis t(5.1) Pr(Type I error), 2-tailed
.sp .3
0 -0.763 0.48
.TE
.DE
.sp 1
.SA
options, rm.na, apply
.WR
!!!EOF!!!
echo "unpacking scripts/Sinit"
cat > "scripts/Sinit" << "!!!EOF!!!"
### initialize (NEW) S in current directory
# make subdirectory .Data
# copy standard or optional search list into .Data
# copy option list(s), if any, into .Data
#############################################
HSF=$HOME/.SFunctions
xx=0
Sarg=0
Oarg=0
#
if test -d .Data
then echo ".Data already exists in `pwd`."; xx=1
else mkdir .Data
echo "Subdirectory .Data created in `pwd`."
fi
#
while [ $# -gt 0 ]
do
case $1 in
-search) Sarg=1
if [ ! -f $2/.Data/.Search.list ]
then echo "No .Search.list found in $2/.Data"; xx=1
else cp $2/.Data/.Search.list .Data
echo ".Search.list copied from $2/.Data"
fi
shift; shift
;;
-option) Oarg=1
trap 'rm -f /tmp/opt$$' 0
ls $2/.Data/.Options $2/.Data/.Options.* >/tmp/opt$$ \
2>/dev/null
if [ -s /tmp/opt$$ ]
then cp `cat /tmp/opt$$` .Data
echo "Option lists copied from $2/.Data"
else echo "No option lists found in $2/.Data"; xx=1
fi
shift; shift
;;
*) echo "Illegal argument: $1"
echo "Usage: Sinit [-search dirname] [-option dirname]"
shift
;;
esac
done
#
if [ $Sarg -eq 0 ]
then if [ -f .Data/.Search.list ]
then echo ".Search.list already exists in .Data"; xx=1
elif test `pwd` = $HOME
then echo 'attach(pos=1,unix("echo $HOME/.Data")); detach("./.Data"); attach(pos=3,unix("echo $HSF")); temp <- .Search.list' | S NEW
mv .Data/temp .Data/.Search.list
echo "Special HOME .Search.list made in $HOME/.Data"
elif test ! -f $HSF/.Search.list
then echo "No .Search.list found to copy."; xx=1
else cp $HSF/.Search.list .Data
echo ".Search.list copied from $HSF"
fi
fi
#
if [ $Oarg -eq 0 ]
then trap 'rm -f /tmp/exist$$' 0
ls .Data/.Options .Data/.Options.* >/tmp/exist$$ 2>/dev/null
if [ -s /tmp/exist$$ ]
then echo "Option list(s) already exist in .Data"
echo "No option lists were copied from $HSF"
else trap 'rm -f /tmp/exist$$ /tmp/opt$$' 0
ls $HSF/.Options $HSF/.Options.* >/tmp/opt$$ 2>/dev/null
if [ -s /tmp/opt$$ ]
then cp `cat /tmp/opt$$` .Data
echo "Option lists copied from $HSF"
else echo "No option lists were found in $HSF"
echo " so option lists have not been customized."
fi
fi
fi
exit $xx
!!!EOF!!!
echo "unpacking scripts/Sinit0"
cat > "scripts/Sinit0" << "!!!EOF!!!"
### set up user-controlled central storage for S structures
# make subdirectory $HOME/.SFunctions
# make .Search.list that looks in $HOME/.SFunctions and store it
############################################
HSF=$HOME/.SFunctions
export HSF
if [ -d $HSF ]
then echo "$HSF already exists."; exit 1
fi
mkdir $HSF
cd $HSF
trap "rm -r $HSF/temp$$" 0
mkdir temp$$
cd temp$$
mkdir .Data
echo 'attach(pos=3,unix("echo $HSF")); temp <- .Search.list' | S NEW
cp .Data/temp $HSF/.Search.list
echo "Made directory $HSF;\ncreated $HSF/.Search.list"
!!!EOF!!!
echo "unpacking src/OKcor"
cat > "src/OKcor" << "!!!EOF!!!"
OKcor <- function(x, y=x, cases, case.record=F, trim=0)
{
# Force specification of cases to exclude for missing data;
# provide help if not specified correctly.
#
if(exists(".Options.Cor"))
topt <- .Options.Cor
else topt <- vector("list", 1)
if(missing(cases)) {
if(length(topt$cases) > 0)
cases <- topt$cases
else if(count.na(c(x,y)) > 0) {
help(OKcases)
stop("Argument `cases' must be specified.")
}
else cases <- "complete"
}
if(cases != "complete" & cases != "pairwise") {
help(OKcases)
stop("Illegal value for `cases' argument.")
}
if(missing(case.record) & length(topt$case.record) > 0)
case.record <- topt$case.record
#
# Coerce arguments to matrices and obtain dimensions.
#
x <- as.matrix(x)
y <- as.matrix(y)
nx <- nrow(x)
if(nx != nrow(y))
stop("Arguments must have same length or nrow.")
ncx <- ncol(x)
ncy <- ncol(y)
#
# Calculation for cases (rows) with complete data;
# record row numbers or total of cases dropped.
#
if(cases=="complete") {
OK <- (1:nx)[count.na(cbind(x,y)) == 0]
z <- cor(x[OK, ], y[OK, ], trim=trim)
if(!case.record)
M <- paste(nx - length(OK),
"rows eliminated for incomplete data.")
else M <- (1:nx)[-OK]
}
#
# Calculation for all available cases for each pair of variables;
# record number of cases for each correlation.
#
if(cases=="pairwise") {
z <- N <- matrix(NA, nr=ncx, nc=ncy)
for(j in 1:ncx) for(k in 1:ncy) {
z[j, k] <- OKcor(x[,j], y[,k], "complete", trim=trim)$corr
N[j, k] <- sum(!is.na(x[,j] + y[,k]))
}
}
#
# Change matrices with 1 row or column to vectors;
# pick up dimnames from column names for 2x2 or larger matrices
# return correlations and missing data record or sample sizes as list.
#
if(min(ncx, ncy) == 1) {
z <- as.vector(z)
if(cases=="pairwise")
N <- as.vector(N)
}
if(min(ncx, ncy) > 1) {
dimnames(z) <- list(dimnames(x)[[2]], dimnames(y)[[2]])
if(cases=="pairwise")
dimnames(N) <- dimnames(z)
}
ans <- list(corr=z, NULL)
if(cases=="complete") {
ans[[2]] <- M
names(ans)[2] <- "miss"
}
if(cases=="pairwise") {
ans[[2]] <- N
names(ans)[2] <- "sampsize"
}
ans
}
!!!EOF!!!
echo "unpacking src/OKhist"
cat > "src/OKhist" << "!!!EOF!!!"
OKhist <- function(
x, nclass, breaks, plot=T, relative=F, ...,
xlab=deparse(substitute(x)), ylab="")
{
miss <- (1:length(x))[is.na(x)]
if(length(miss) > 0) {
warning(paste(length(miss),
"missing value(s) omitted in histogram."))
x <- x[-miss]
}
if(missing(breaks)) {
if(missing(nclass))
nclass <- log(length(x), base=2) + 1
if(length(x)==1)
breaks <- x + c(-1, 1)
else breaks <- pretty(x, nclass)
if(any(x <= breaks[1]))
breaks <- c(breaks[1] - diff(breaks)[1], breaks)
}
bin <- cut(x, breaks)
if(any(is.na(bin)))
stop("breaks do not span the range of x.")
L <- length(levels(bin))
counts <- tabulate(bin, L)
if(relative) {
total <- sum(counts)
midp <- signif(d=3, (breaks[-1] + breaks[-(L+1)])/2)
left <- 1.01*min(breaks) - .01*max(breaks)
if(missing(ylab))
ylab <- "relative frequency"
oldpar <- par(mar=c(6.1, 5.1, 4.1, 0))
invisible(barplot(counts/total, width=breaks, histo=T,
axes=F, ..., xlab=""))
mtext(paste(midp), side=1, line=1, at=midp)
mtext("midpoints", side=1, line=1, at=left, adj=1)
mtext(paste(counts), side=1, line=3, at=midp)
mtext("counts", side=1, line=3, at=left, adj=1)
mtext(xlab, side=1, line=5)
axis(2)
mtext(ylab, side=2, line=4)
par(oldpar)
invisible(midp)
}
else if(plot) {
if(missing(ylab))
ylab <- "frequency"
invisible(barplot(counts, width=breaks, histo=T,
..., xlab=xlab, ylab=ylab))
}
else list(breaks=breaks, counts=counts)
}
!!!EOF!!!
echo "unpacking src/OKlength"
cat > "src/OKlength" << "!!!EOF!!!"
OKlength <- function(x, Warn=T)
{
y <- x[!is.na(x)]
z <- length(y)
if(Warn) {
n <- length(x) - z
if(n > 1) warning(paste(n, "missing data omitted."))
if(n==1) warning("1 missing datum omitted.")
}
z
}
!!!EOF!!!
echo "unpacking src/OKlsfit"
cat > "src/OKlsfit" << "!!!EOF!!!"
OKlsfit <- function(x, y, wt, anova=F, partials=F, intercept=T,
coefnames=NULL, case.record=F, Print=T, ...)
{
# Check options list for alternative defaults
#
if(exists(".Options.Reg"))
topt <- .Options.Reg
else topt <- vector("list", 1)
if(missing(anova) & length(topt$anova)==1)
anova <- topt$anova
if(missing(partials) & length(topt$partials)==1)
partials <- topt$partials
if(missing(intercept) & length(topt$intercept)==1)
intercept <- topt$intercept
if(missing(case.record) & length(topt$case.record)==1)
case.record <- topt$case.record
if(missing(Print) & length(topt$Print)==1)
Print <- topt$Print
#
# Remove cases with missing data in x or y
#
ncx <- ncol(as.matrix(x))
nrx <- nrow(as.matrix(x))
mat <- cbind(x, y)
OK <- (1:nrx)[count.na(mat)==0]
nR <- length(OK)
if(nR < nrx) {
if(case.record) miss <- matrix((1:nrx)[-OK], nr=1,
dimnames=list("Row numbers of deleted rows:", NULL))
else miss <- paste(nrx-nR, "row(s) omitted for missing data.")
}
else miss <- "No missing data."
xOK <- mat[OK, 1:ncx]
yOK <- mat[OK, -(1:ncx)]
yOK <- as.matrix(yOK)
ncy <- ncol(yOK)
if(missing(wt))
wOK <- rep(1, nR)
else wOK <- wt[OK]
#
# Perform least-squares fits to each column of yOK;
# Use ... to pass parameters (such as tolerance) to lsfit
#
I <- intercept
reg <- lsfit(xOK, yOK, wOK, int=I, ...)
r1 <- as.matrix(reg$residuals)
SS1 <- apply(r1^2, 2, sum)
df1 <- nR - I - ncx
MSE <- SS1/df1
#
# Set names for dependent variables
#
if(length(dimnames(y)[[2]]) == 0)
ynames <- paste("y", 1:ncy, sep="")
else ynames <- dimnames(y)[[2]]
#
# If requested, compute omnibus ANOVA table(s), one for each column of yOK
#
if(anova & !I) warning("No intercept; anova table not computed.")
if(anova & I) {
reg0 <- lsfit(rep(1, nR), yOK, wOK, int=F, ...)
r0 <- as.matrix(reg0$residuals)
SS0 <- apply(r0^2, 2, sum)
rsq <- (1 - SS1/SS0)
df0 <- nR - 1
Fnum <- (SS0 - SS1)/(df0 - df1)
Fval <- Fnum/MSE
tab1 <- array(NA, dim = c(3, 4, ncy))
for(k in 1:ncy) {
tab1[, , k] <- cbind(c(df0 - df1, df1, df0),
signif(c(SS0[k] - SS1[k], SS1[k], SS0[k]), 5),
signif(c(Fnum[k], MSE[k], SS0[k]/df0), 5),
signif(c(Fval[k], sqrt(MSE[k]), rsq[k]), 4))
}
dimnames(tab1) <- list(c("regression", "error", "total"),
c(" df", " ssq", " ssq/df", " Fval/sdev/rsq"),
ynames)
if(ncy==1) tab1 <- tab1[ , ,1]
}
else {
tab1 <- matrix(signif(sqrt(MSE), 4), nc=1)
dimnames(tab1) <- list(ynames, "sdev")
}
#
# If requested, evaluate each column of x in regard to each column of y
#
if(partials & (I+ncx) > 1) {
if(!I) warning("No intercept; partial rsq's not computed.")
ncy <- ncol(yOK)
tab2 <- array(NA, dim=c(ncx+I, 4, ncy))
c1 <- as.matrix(reg$coef)
tab2[, 1, ] <- signif(d=4, c1)
reg0 <- lsfit(rep(1, nR), yOK, wOK, int=F, ...)
r0 <- as.matrix(reg0$residuals)
SS0 <- apply(r0^2, 2, sum)
for(i in (!I):ncx) {
if(i==0) preg <- lsfit(xOK, yOK, wOK, int=F, ...)
else if(i==1 & ncx==1)
preg <- reg0
else preg <- lsfit(xOK[, -i], yOK, wOK, int=I, ...)
pr0 <- as.matrix(preg$residuals)
pSS0 <- apply(pr0^2, 2, sum)
Fnum <- pSS0 - SS1
Fval <- Fnum/MSE
se <- abs(tab2[i+I, 1, ])/sqrt(Fval)
tab2[i+I, 2, ] <- signif(d=3, se)
tab2[i+I, 3, ] <- round(d=3, Fnum/pSS0)
if(I & i>0) tab2[i+1, 4, ] <- round(d=3, Fnum/SS0)
}
if(length(coefnames)==0 & length(dimnames(x)[[2]])==0)
coefnames <- paste("x", 1:ncx, sep="")
if(length(coefnames)==0 & length(dimnames(x)[[2]]) > 0)
coefnames <- dimnames(x)[[2]]
if(I) coefnames <- c("Intercept", coefnames)
head2 <- c("Coefficient", " Std Error", "Cond Rsq", " Incr Rsq")
if(length(dimnames(y)[[2]]) == 0)
ynames <- paste("y", 1:ncy, sep="")
else ynames <- dimnames(y)[[2]]
dimnames(tab2) <- list(coefnames, head2, ynames)
if(ncy==1) tab2 <- tab2[ , ,1]
}
else tab2 <- NULL
#
# Construct answer
ans <- list(coef=reg$coef,
residuals=reg$residuals,
qr=reg$qr,
intercept=reg$intercept,
miss=miss,
sdev=tab1,
partials=tab2)
if(anova & I)
names(ans)[6] <- "anova"
#
#
# If requested, print summary tables;
# return tables plus usual lsfit components invisibly;
# else return list with usual lsfit components plus summary tables
#
if(Print) {
print(tab1)
cat("\n")
if(!is.null(tab2)) print(tab2)
invisible(ans)
}
else ans
}
!!!EOF!!!
echo "unpacking src/OKmean"
cat > "src/OKmean" << "!!!EOF!!!"
OKmean <- function(x, Warn=T, ...)
{
y <- x[!is.na(x)]
z <- mean(y, ...)
if(Warn) {
n <- length(x) - length(y)
if(n > 1) warning(paste(n, "missing data omitted in mean."))
if(n==1) warning("1 missing datum omitted in mean.")
}
z
}
!!!EOF!!!
echo "unpacking src/OKprod"
cat > "src/OKprod" << "!!!EOF!!!"
OKprod <- function(..., Warn=T)
{
x <- c(...)
y <- x[!is.na(x)]
z <- prod(y)
if(Warn) {
n <- length(x) - length(y)
if(n > 1) warning(paste(n, "missing data omitted in prod."))
if(n==1) warning("1 missing datum omitted in prod.")
}
z
}
!!!EOF!!!
echo "unpacking src/OKquantile"
cat > "src/OKquantile" << "!!!EOF!!!"
OKquantile <- function(x, probs=seq(0,1,.25), Warn=T)
{
y <- x[!is.na(x)]
z <- quantile(y, probs)
names(z) <- paste("Q" , signif(d=2, probs), sep="")
if(Warn) {
n <- length(x) - length(y)
if(n > 1) warning(paste(n, "missing data omitted in quantile."))
if(n==1) warning("1 missing datum omitted in quantile.")
}
z
}
!!!EOF!!!
echo "unpacking src/OKse"
cat > "src/OKse" << "!!!EOF!!!"
OKse <- function(x, Warn=T)
{
y <- OKvar(as.vector(x), Warn=F)/OKlength(x, Warn=Warn)
sqrt(y)
}
!!!EOF!!!
echo "unpacking src/OKsum"
cat > "src/OKsum" << "!!!EOF!!!"
OKsum <- function(..., Warn=T)
{
x <- c(...)
y <- x[!is.na(x)]
z <- sum(y)
if(Warn) {
n <- length(x) - length(y)
if(n > 1) warning(paste(n, "missing data omitted in sum."))
if(n==1) warning("1 missing datum omitted in sum.")
}
z
}
!!!EOF!!!
echo "unpacking src/OKvar"
cat > "src/OKvar" << "!!!EOF!!!"
OKvar <- function(x, y = x, Warn=T)
{
x <- as.matrix(x)
y <- as.matrix(y)
nx <- nrow(x)
if(nx != nrow(y))
stop("Arguments must have same length or nrow.")
OK <- (1:nx)[count.na(cbind(x,y)) == 0]
nOK <- length(OK)
if(nOK < 2) {
if(Warn && nOK < 1)
warning("All rows have missing data.")
if(Warn && nOK==1)
warning("Only 1 row has no missing data.")
z <- matrix(NA, nr=ncol(x), nc=ncol(y))
if(min(dim(z)) > 1)
return(z)
else return(as.vector(z))
}
xm <- as.double(rep(1, nOK))
dim(xm) <- c(nOK, 1)
z <- (.Fortran("dqr",
qr = xm,
as.integer(c(nOK, 1)),
pivot = as.integer(1),
qraux = as.double(0),
as.double(1e-07),
as.double(0),
rank = as.integer(1)))[c("qr", "qraux", "rank", "pivot")]
x0 <- qr.resid(z, x[OK, ])
if(nargs()==3 | (nargs()==2 & missing(Warn))) {
y0 <- qr.resid(z, y[OK, ])
z <- x0 %c% y0
}
else z <- crossprod(x0)
if(Warn) {
m <- nx - nOK
if(m > 1) warning(paste(m, "rows omitted for missing data."))
if(m==1) warning("1 row omitted for missing data.")
}
if(min(ncol(x), ncol(y)) > 1)
z/(nOK - 1)
else as.vector(z)/(nOK - 1)
}
!!!EOF!!!
echo "unpacking src/apply"
cat > "src/apply" << "!!!EOF!!!"
apply <- function(X, MARGIN, FUN, ..., Lcomp)
{
# Set up basic elements and transform X to a matrix, newX,
# whose columns will provide the variable arguments to the FUN loop:
#
if(is.character(FUN))
FUN <- get(FUN)
d <- dim(X)
dn <- dimnames(X)
if(!is.null(dn))
dn <- dn[MARGIN]
newX <- aperm(X, c(seq(1, length(d))[-MARGIN], MARGIN))
subdim <- d[-MARGIN]
nc <- prod(d[MARGIN])
dim(newX) <- c(prod(subdim), nc)
#
# Test the effect of FUN by applying it to column 1 of newX;
# if it is a list, find which components to extract;
# set up a simple or compound list in which to assign results;
# and assign the result computed from column 1.
#
if(length(subdim) > 1)
ans1 <- FUN(array(newX[, 1], subdim), ...)
else
ans1 <- FUN(newX[, 1], ...)
if(missing(Lcomp))
Lcomp <- 1:length(ans1)
if(is.list(ans1)) {
name1 <- names(ans1)
if(is.character(Lcomp))
Rcomp <- match(Lcomp, name1)
else if (is.numeric(Lcomp) & any(Lcomp < 0)) {
if (!all(Lcomp < 0))
stop("Values <0 and >0 cannot be mixed in Lcomp")
Rcomp <- (1:length(ans1))[Lcomp]
} else Rcomp <- Lcomp
if(length(Rcomp[!is.na(Rcomp)]) < length(Lcomp))
stop("Arg \"Lcomp\" asks for nonexistent components.")
}
else Rcomp <- 1
L <- length(Rcomp)
if(L > 1) {
ans <- vector("list", L)
names(ans) <- name1[Rcomp]
ans1 <- ans1[Rcomp]
ans0 <- vector("list", nc)
if(length(MARGIN)==1 && length(dn[[1]]) > 0)
names(ans0) <- dn[[1]]
for(k in 1:L) {
ans[[k]] <- ans0
ans[[k]][[1]] <- ans1[[k]]
}
}
else {
ans <- vector("list", nc)
if(length(MARGIN)==1 && length(dn[[1]]) > 0)
names(ans) <- dn[[1]]
if(is.list(ans1))
ans[[1]] <- ans1[[Rcomp]]
else ans[[1]] <- ans1
}
#
# Computation loop:
#
if(nc > 1) {
if(length(subdim) > 1)
for(i in 2:nc) {
t.ans <- FUN(array(newX[, i], subdim), ...)
if(L > 1) {
t.ans <- t.ans[Rcomp]
for(k in 1:L)
ans[[k]][[i]] <- t.ans[[k]]
}
else if(is.list(ans1))
ans[[i]] <- t.ans[[Rcomp]]
else ans[[i]] <- t.ans
}
else for(i in 2:nc) {
t.ans <- FUN(newX[, i], ...)
if(L > 1) {
t.ans <- t.ans[Rcomp]
for(k in 1:L)
ans[[k]][[i]] <- t.ans[[k]]
}
else if(is.list(ans1))
ans[[i]] <- t.ans[[Rcomp]]
else ans[[i]] <- t.ans
}
}
#
# Restructure result and assign names:
#
if(L==1) return(apply.struc(ans, nc, MARGIN, d, dn))
else lapply(ans, FUN=apply.struc, nc, MARGIN, d, dn)
}
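#
# Usage sketch (hypothetical data; Lcomp is the locally added argument
# that selects components when FUN returns a list):
#   m <- matrix(1:6, nr=2)
#   apply(m, 1, sum)      # behaves like standard apply
#   apply(m, 2, function(z) list(mn=mean(z), rg=range(z)), Lcomp="mn")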
!!!EOF!!!
echo "unpacking src/apply.struc"
cat > "src/apply.struc" << "!!!EOF!!!"
apply.struc <- function(LIST, NC, MAR, DIM, DN)
{
# Define a function to restructure simple lists, to be called in apply()
#
ans0 <- unlist(LIST, recursive = F)
if(is.recursive(ans0))
return(LIST)
OKdim <- function(y)
{
if(is.array(y)) dOK <- dim(y)
else dOK <- length(y)
dOK
}
dlist <- vector("list", NC)
m0 <- rep(NA, NC)
for(j in 1:NC) {
dlist[[j]] <- OKdim(LIST[[j]])
m0[j] <- length(dlist[[j]])
}
if(!is.constant(m0))
return(LIST)
d0 <- matrix(unlist(dlist), nr=m0[1], nc=NC)
t0 <- rep(NA, m0[1])
for(i in 1:m0[1])
t0[i] <- !is.constant(d0[i,])
if(sum(t0) > 0)
return(LIST)
else d1 <- d0[,1]
if(prod(d1)==1) {
if(length(MAR)==1) {
newA <- ans0
if(length(DN[[1]]) > 0)
names(newA) <- DN[[1]]
}
else newA <- array(ans0, DIM[MAR], DN)
}
else {
A <- as.array(LIST[[1]])
n1 <- dimnames(A)
if(is.null(n1)) n1 <- vector("list", length(d1))
if(is.null(DN)) DN <- vector("list", length(MAR))
if(prod(d1)==max(d1)) {
Imax <- (1:length(d1))[d1 > 1]
d1 <- d1[Imax]
n1 <- n1[Imax]
}
newA <- array(ans0, c(d1, DIM[MAR]), c(n1, DN))
}
return(newA)
}
!!!EOF!!!
echo "unpacking src/attach"
cat > "src/attach" << "!!!EOF!!!"
attach <- function(file=NULL, pos=2, Perm=F, Both=F, Write=F)
{
tlist <- .Search.list
if(exists(".Search.list", where=1))
plist <- get(".Search.list", where=1)
else if(Perm | Both)
stop("No permanent .Search.list in working directory.")
else plist <- NULL
if(Write) {
if(!is.null(file) | !missing(pos) | Perm | Both)
stop("Use \"Write=T\" without other arguments to record current session .Search.list in working directory.")
assign(".Search.list", tlist, where=1)
if(exists(".Search.list", 1))
return(invisible(plist))
else return(NULL)
}
#
# (re)define Perm and temp;
#
if(Both) Perm <- T
temp <- ifelse(Both, T, !Perm)
#
# check that session and permanent search vectors match if Both=T
#
if(Both) {
c0 <- max(length(plist), length(tlist))
c1 <- 1:c0
c2 <- match(plist, tlist)
c3 <- max(abs(c1-c2))
if(is.na(c3) || c3 != 0)
stop("Permanent and session .Search.lists don't match.")
}
#
# with no 'file', return permanent or session search vector unchanged
#
if(is.null(file))
if(Perm) return(plist)
else return(tlist)
#
# attach 'file' to permanent search vector when required;
# when temp == FALSE, exit with session search vector unchanged
#
if(Perm) {
pN <- length(plist)
if(pos <= 1)
pnew <- c(file, plist)
else if (pos > pN)
pnew <- c(plist, file)
else pnew <- c(plist[1:(pos-1)], file, plist[pos:pN])
assign(".Search.list", pnew, where=1)
if(!temp) {
assign(".Search.list", tlist, frame=0)
warning("Session .Search.list unchanged; use \"Both=T\" to attach to both permanent and session .Search.lists.")
return(invisible(plist))
}
}
#
# attach 'file' to session search vector when required
#
if(temp) {
tN <- length(tlist)
if(pos <= 1)
tnew <- c(file, tlist)
else if(pos > tN)
tnew <- c(tlist, file)
else tnew <- c(tlist[1:(pos - 1)], file, tlist[pos:tN])
assign(".Search.list", tnew, frame = 0)
return(invisible(tlist))
}
}
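#
# Usage sketch (directory names are hypothetical):
#   attach("/usr/s/local/.Data")          # session search vector only
#   attach("/usr/s/local/.Data", Both=T)  # session and permanent vectors
#   attach(Write=T)   # record current session .Search.list in working .Data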
!!!EOF!!!
echo "unpacking src/chitab"
cat > "src/chitab" << "!!!EOF!!!"
chitab <- function(table, models=list("ind","sat"), save=F, ...)
{
# Definitions and blank structures
#
L <- length(models)
ftab <- vector("list", L)
chi <- df <- numeric(L)
src <- character(L)
df[L] <- chi[L] <- 0
DIM <- dim(table)
M <- length(DIM) - 1
#
# Calculation loop
#
for(i in L:1) {
if(is.list(models[[i]])) {
ftab[[i]] <- models[[i]]
src[i] <- models[[i]]$src
}
else {
thyp <- models[[i]]
src[i] <- paste(thyp, collapse=" ")
if(length(thyp)==1 && thyp=="ind")
thyp <- c(rbind(1:M, rep(0,M)), M+1)
if(length(thyp)==1 && thyp=="sat")
thyp <- 1:(M+1)
tfit <- loglin(table, thyp, ...)
tpar <- loglin.par(thyp, dim(table))
ftab[[i]] <- list(fit=tfit, par=tpar, src=src[i])
}
if(i < L) {
tLLR <- log(ftab[[i+1]]$fit/ftab[[i]]$fit)
chi[i] <- 2 * OKsum(table*tLLR)
df[i] <- ftab[[i+1]]$par - ftab[[i]]$par
}
}
ans <- cbind(df, chi)
dimnames(ans) <- list(src, c("df", "chisq"))
if(save) list(tab=ans, fit=ftab)
else ans
}
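#
# Usage sketch (hypothetical 2x2x2 table; "ind" and "sat" are the two
# built-in model labels, other models use the loglin margin format):
#   tab <- array(c(10, 20, 30, 40, 15, 25, 35, 45), c(2, 2, 2))
#   chitab(tab)    # chisq table: independence vs saturated
#   chitab(tab, models=list(c(1,2,0,1,3,0,2,3), "sat"))  # no-3-way model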
!!!EOF!!!
echo "unpacking src/count.na"
cat > "src/count.na" << "!!!EOF!!!"
count.na <- function(x)
{
if(!is.matrix(x))
x <- t(as.matrix(x))
n <- ncol(x)
I <- rep(1, n)
as.vector(is.na(x) %*% I)
}
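#
# Usage sketch (hypothetical data):
#   m <- cbind(c(1, NA, 3), c(NA, NA, 6))
#   count.na(m)    # per-row NA counts: 1 2 0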
!!!EOF!!!
echo "unpacking src/detach"
cat > "src/detach" << "!!!EOF!!!"
detach <- function(what=2, Perm=F, Both=F)
{
tlist <- .Search.list
if(exists(".Search.list", where=1))
plist <- get(".Search.list", where=1)
else if(Perm | Both)
stop("No permanent .Search.list in working directory.")
#
# (re)define Perm and temp;
#
if(Both) Perm <- T
temp <- ifelse(Both, T, !Perm)
#
# check that session and permanent search vectors match if Both=T
#
if(Both) {
c0 <- max(length(plist), length(tlist))
c1 <- 1:c0
c2 <- match(plist, tlist)
c3 <- max(abs(c1-c2))
if(is.na(c3) || c3 != 0)
stop("Permanent and session .Search.lists don't match.")
}
#
# if required, remove specified element(s) from permanent search vector;
# unless temp == TRUE, exit with session search vector unchanged
#
if(Perm) {
switch(mode(what),
character = {
subs <- match(what, plist)
if(any(is.na(subs)))
stop(paste("Database", what,
"isn't on .Data/.Search.list"))
else subs <- -subs
},
logical = subs <- !what,
subs <- -what)
assign(".Search.list", plist[subs], where = 1)
if(!temp) {
assign(".Search.list", tlist, frame = 0)
warning("Session .Search.list unchanged; use \"Both=T\" to detach from both permanent and session .Search.lists.")
return(invisible(plist))
}
}
#
# if required, remove specified element(s) from session search vector
#
if(temp) {
switch(mode(what),
character = {
subs <- match(what, tlist)
if(any(is.na(subs)))
stop(paste("Database", what,
"isn't being searched"))
else subs <- -subs
},
logical = subs <- !what,
subs <- -what)
assign(".Search.list", tlist[subs], frame = 0)
return(invisible(tlist))
}
}
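#
# Usage sketch (directory name is hypothetical):
#   detach(2)                      # drop position 2 from session search vector
#   detach("/usr/s/local/.Data")   # drop by name
#   detach(2, Both=T)              # drop from session and permanent vectors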
!!!EOF!!!
echo "unpacking src/fill.na"
cat > "src/fill.na" << "!!!EOF!!!"
fill.na <- function(x, nomiss=F, case.record=F, Warn=T)
{
if(missing(nomiss) & length(.Options$nomiss)==1)
nomiss <- .Options$nomiss
if(missing(case.record) & length(.Options$case.record)==1)
case.record <- .Options$case.record
if(missing(Warn) & length(.Options$Warn)==1)
Warn <- .Options$Warn
miss <- (1:length(x))[is.na(x)]
if(length(miss) > 0) {
if(mode(x) != mode(nomiss))
stop(paste(
"mode(", deparse(substitute(x)), ")=", mode(x),
" but mode(nomiss)=", mode(nomiss), ".", sep=""))
else x[miss] <- nomiss
}
else miss <- numeric(0)
if(case.record) return(list(struc=x, miss=miss))
else {
if(Warn && length(miss) > 0)
warning(paste(OKlength(miss, W=F),
" NA(s) replaced by ", nomiss, ".", sep=""))
return(x)
}
}
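#
# Usage sketch (hypothetical data; nomiss must match mode(x)):
#   x <- c(1, NA, 3)
#   fill.na(x, nomiss=0)                 # returns c(1, 0, 3), with a warning
#   fill.na(x, nomiss=0, case.record=T)  # list: filled vector plus NA positions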
!!!EOF!!!
echo "unpacking src/loglin.par"
cat > "src/loglin.par" << "!!!EOF!!!"
loglin.par <- function(margin, DIM)
{
# set up matrix of binary digits encoding possible margins (bitm)
# vector of binary digit sums (bits)
# and vector of margin parameter counts (mpc)
#
D <- length(DIM)
bitm <- matrix(NA, nr=D, nc=2^D)
for(j in 1:D)
bitm[j, ] <- rep(rep(0:1, rep(2^(D-j), 2)), 2^(j-1))
bits <- rep(1,D) %*% bitm
dm <- DIM * bitm
dm[dm==0] <- 1
mpc <- apply(dm, 2, prod) - 1
#
# find breaks in margin specification (brk)
# and set up matrix of occurrences of margins (marm)
#
N <- length(margin)
indic <- margin==0
brk <- c(0, (1:N)[indic], N+1)
M <- sum(indic)
marm <- matrix(NA, nr=M+1, nc=2^D)
#
# calculate occurrences of margins (marm) and summary vector (mar)
#
for(i in 0:M) {
tmar <- margin[(brk[i+1]+1):(brk[i+2]-1)]
thyp <- rep(0,D)
thyp[tmar] <- 1
marm[i+1, ] <- rep(1,D) %*% (1-(thyp >= bitm))
}
mar <- rep(1, M+1) %*% (marm==0)
#
# calculate parameter counts recursively
#
parv <- rep(0, 2^D)
for(i in 1:D) {
K <- (1:(2^D))[bits==i & mar > 0]
for(k in K) {
tsub <- rep(1,D) %*% (1-(bitm[,k] >= bitm))
parv[k] <- mpc[k] - sum(parv[tsub==0])
}
}
sum(parv)
}
!!!EOF!!!
echo "unpacking src/options"
cat > "src/options" << "!!!EOF!!!"
options <- function(..., Suffix, Perm=F, Rm.opt=F, Write=F, Names=F)
{
if(missing(Suffix))
elist <- ".Options"
else elist <- paste(".Options.", Suffix, sep="")
tlist <- get(elist)
if(exists(elist, where=1))
plist <- get(elist, where=1)
else if(Perm)
stop(paste("No permanent", elist,
"list in working directory."))
else plist <- NULL
###
if(Write) {
if(missing(Suffix) & nargs() > 1)
stop(paste("Use \"Write=T\" without other arguments",
"to record current session", elist,
"list in working directory."))
else if(nargs() > 2)
stop(paste("Use `Write=T' and `Suffix=\"", Suffix,
"\"' without further arguments ",
"to record current session ", elist,
" list in working directory.", sep=""))
else {
assign(elist, tlist, where=1)
if(exists(elist, 1))
return(invisible(plist))
else return(invisible(NULL))
}
}
###
if(nargs()==0 | (nargs()==1 & !missing(Suffix)))
return(tlist)
if((nargs()==1 & Perm) | (nargs()==2 & Perm & !missing(Suffix)))
return(plist)
###
if(Perm) current <- plist
else current <- tlist
if(Names) return(names(current))
###
if(Rm.opt) {
temp <- c(...)
if(!is.character(temp))
stop("Give character names of options to remove.")
remove <- match(temp, names(current), nomatch = 0)
changed <- current[remove]
current <- current[-remove]
if(!Perm)
assign(elist, current, frame=0)
else {
assign(elist, current, where = 1)
assign(elist, tlist, frame=0)
}
return(invisible(changed))
}
###
temp <- list(...)
if(length(temp)==1 && is.null(names(temp))) {
arg <- temp[[1]]
switch(mode(arg),
list = temp <- arg,
character = return(current[arg]),
stop(paste("invalid argument:", arg)))
}
if(length(temp) == 0)
return()
n <- names(temp)
if(is.null(n))
stop("options must be given by name")
changed <- current[n]
current[n] <- temp
if(!Perm)
assign(elist, current, frame=0)
else {
assign(elist, current, where = 1)
assign(elist, tlist, frame=0)
}
invisible(changed)
}
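#
# Usage sketch (option names hypothetical; the Suffix call assumes an
# .Options.Anova list already exists on the search path):
#   options(digits=4)                 # set a session option
#   options("digits")                 # query it
#   options(digits=4, Perm=T)         # also record it in the working .Data
#   options(Suffix="Anova", tails=1)  # set an option in .Options.Anova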
!!!EOF!!!
echo "unpacking src/rm.na"
cat > "src/rm.na" << "!!!EOF!!!"
rm.na <- function(x, nachk="warn")
{
if(is.na(match(nachk, c("omit", "warn", "returnNA", "stop"))))
stop(
"nachk must be one of \"omit\", \"warn\", \"returnNA\", or \"stop\""
)
nax <- sum(is.na(x))
if(nax > 0) {
if(nachk=="stop")
stop(paste("x vector contains", nax, "NAs"))
else if(nachk=="returnNA")
return(nachk)
else if(nachk=="warn")
warning(paste("x vector contains", nax,
"NAs omitted from the computation"))
}
x[!is.na(x)]
}
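#
# Usage sketch (hypothetical data):
#   rm.na(c(1, NA, 3))                    # warns, returns c(1, 3)
#   rm.na(c(1, NA, 3), nachk="returnNA")  # returns the flag "returnNA"
#   rm.na(c(1, 2, 3), nachk="stop")       # no NAs, so returns x unchanged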
!!!EOF!!!
echo "unpacking src/ttest"
cat > "src/ttest" << "!!!EOF!!!"
ttest <- function(
x, y, conf=c(.90,.95,.99), conf0, hyp=0,
tails=2, eqvar=T, nachk="warn", Table=T, Print=T
)
{
# Parameter checks
#
if(missing(x))
stop("Argument \"x\" must be given.")
if(is.array(x))
x <- as.vector(x)
if(missing(y))
type <- 1
else type <- 2
if(exists(".Options.Anova"))
topt <- .Options.Anova
else topt <- vector("list", 1)
if(missing(conf) & length(topt$conf) > 0)
conf <- topt$conf
if(missing(conf0)) {
if(length(topt$conf0)==1)
conf0 <- topt$conf0
else if(is.numeric(conf))
conf0 <- min(conf)
else conf0 <- .95
}
if(length(conf0) != 1) stop(
"Argument \"conf0\" should be a single confidence level, e.g., .95, or FALSE.")
if(missing(hyp) & length(topt$hyp) > 0)
hyp <- topt$hyp
if(missing(tails) & length(topt$tails)==1)
tails <- topt$tails
if(tails!=1 & tails!=2) stop(
"Argument \"tails\" must equal 1 or 2")
if(missing(eqvar) & length(topt$eqvar)==1)
eqvar <- topt$eqvar
if(eqvar!=T & eqvar!=F) stop(
"Argument \"eqvar\" must equal T or F.")
if(missing(nachk) & length(topt$nachk)==1)
nachk <- topt$nachk
if(missing(Table) & length(topt$Table)==1)
Table <- topt$Table
if(missing(Print) & length(topt$Print)==1)
Print <- topt$Print
if(Print) Table <- T
ci0 <- is.numeric(conf0)
ci1 <- is.numeric(conf)
hy1 <- is.numeric(hyp)
#
# Prepare obligatory return component
#
names1 <- c("mean", " se", " N")
if(ci0) names2 <- paste(c(" L", " U"), 100*conf0, "%", sep="")
else names2 <- NULL
if(type==1) {
NoCalc <- rep(NA, ifelse(ci0, 5, 3))
names(NoCalc) <- c(names1, names2)
}
if(type==2) {
NoCalc <- rep(NA, ifelse(ci0, 8, 6))
names(NoCalc) <- c(paste(rep(names1,2),
rep(1:2, rep(3,2)), sep=""), names2)
}
#
# Deal with NAs in x and y
#
x <- rm.na(x, nachk)
if(x[1]=="returnNA")
return(NoCalc)
if(length(x)==0)
stop("All values of x are NA.")
if(length(x)==1 & (type==1 | eqvar==F))
stop("Only 1 nonmissing value in x.")
if(length(x)==1)
warning("Only 1 nonmissing value in x.")
if(type==2) {
if(is.array(y))
y <- as.vector(y)
y <- rm.na(y, nachk)
if(y[1]=="returnNA")
return(NoCalc)
if(length(y)==0)
stop("All values of y are NA.")
if(length(y)==1 & eqvar==F)
stop("Only 1 nonmissing value in y.")
if(length(c(x,y))==2)
stop("Only 1 nonmissing value in each vector.")
if(length(y)==1)
warning("Only 1 nonmissing value in y.")
}
#
# Perform the basic calculations
#
v.xm <- mean(x)
v.xv <- var(x)
v.xn <- length(x)
v.xse <- sqrt(v.xv/v.xn)
if(type==2) {
v.ym <- mean(y)
v.yv <- var(y)
v.yn <- length(y)
v.yse <- sqrt(v.yv/v.yn)
v.diff <- v.xm - v.ym
if(eqvar) {
SSerr <- (v.xn-1)*v.xv + (v.yn-1)*v.yv
v.df <- v.xn + v.yn - 2
sd.rsid <- sqrt(SSerr/v.df)
v.se <- sd.rsid * sqrt((1/v.xn) + (1/v.yn))
SStot <- var(c(x,y))*(v.df + 1)
v.rsq <- (SStot - SSerr)/SStot
}
else {
tv <- v.xv/v.xn
tw <- v.yv/v.yn
tu <- tv + tw
v.se <- sqrt(tu)
v.df <- tu^2/(tv^2/(v.xn - 1) + tw^2/(v.yn - 1))
v.df <- round(d=2, v.df)
sd.rsid <- NA
v.rsq <- NA
}
}
else v.diff <- v.se <- v.df <- NULL
ctr <- ifelse(type==2, v.diff, v.xm)
v.se <- ifelse(type==2, v.se, v.xse)
v.df <- ifelse(type==2, v.df, v.xn-1)
#
# Confidence limits to be included in "Calc"
#
if(ci0) {
hw0 <- v.se * qt(.5 + conf0/2, v.df)
v.CI <- c(ctr - hw0, ctr + hw0)
}
else v.CI <- NULL
#
# Check for reasonable error estimate
#
if(ci0 | ci1 | hy1)
if(v.se==0 || abs(ctr)/v.se > 1e10) {
if(ci0) v.CI <- c(NA, NA)
ci1 <- hy1 <- F
warning("Error estimate effectively 0; t-stat(s) and c.i.(s) not calculated.")
}
#
# Fill in values of "Calc" (unrounded statistics)
#
if(type==1)
Calc <- c(v.xm, v.xse, v.xn, v.CI)
else Calc <- c(v.xm, v.xse, v.xn, v.ym, v.yse, v.yn, v.CI)
names(Calc) <- names(NoCalc)
#
# Description for readable printout (if Table==T)
#
if(Table & type==1) {
Desc <- c(signif(d=5, v.xm), signif(d=4, v.xse), v.xn)
if(ci0) Desc <- c(Desc, signif(d=5, v.CI))
dim(Desc) <- c(1, ifelse(ci0, 5, 3))
xname <- paste(" ", deparse(substitute(x)), " ")
dimnames(Desc) <- list(xname, names(Calc))
}
if(Table & type==2) {
Dmat <- c(signif(d=5, v.xm), signif(d=4, v.xse), v.xn-1,
signif(d=5, v.ym), signif(d=4, v.yse), v.yn-1,
signif(d=5, v.diff), signif(d=4, v.se), v.df)
Dmat <- matrix(Dmat, byr=T, nr=3)
xname <- paste(" ", deparse(substitute(x)), " ")
yname <- paste(" ", deparse(substitute(y)), " ")
names3 <- c("mean", " se", " df")
dimnames(Dmat) <- list(c(xname, yname, "diff "), names3)
Dvec <- c(round(d=3, v.rsq), signif(d=5, sd.rsid),
signif(d=5, v.CI))
names(Dvec) <- c("rsq", " sd.resid", names2)
if(eqvar)
Comment <- " Equality of variance assumed."
else Comment <- " Unequal variance assumed; rsq & sd.resid omitted."
Desc <- list(Dmat, Dvec, Comment)
names(Desc) <- c("M/SE/DF", "RSQ/SD.RESID/CI LIMITS",
"ASSUMPTION")
}
if(!Table) Desc <- NULL
#
# Confidence Intervals (if conf is numeric)
#
if(ci1) {
hw <- v.se * qt(.5 + conf/2, v.df)
CI <- signif(d=5, matrix(c(ctr - hw, ctr + hw), nc=2))
dimnames(CI) <- list(paste(" ", conf, "Interval "),
c("Lower limit", " Upper limit"))
}
else CI <- NULL
#
# Hypothesis tests (if hyp is numeric)
#
if(hy1) {
v.t <- (ctr - hyp)/v.se
v.p <- (1 - pt(abs(v.t), v.df)) * tails
Test <- c(hyp, round(v.t, 3), signif(v.p, 2))
dim(Test) <- c(length(hyp), 3)
dimnames(Test) <- list(rep(" ", length(hyp)),
c("hypothesis",
paste(" t(", v.df, ")", sep=""),
paste(" Pr(Type I error) ", tails, "-tailed", sep="")))
}
else Test <- NULL
#
# Value to be returned
#
val <- list(Calc=Calc, Desc=Desc, CI=CI, Test=Test)
val <- val[c(T, Table, ci1, hy1)]
if(length(val)==1) val <- Calc
#
# Print results if requested
#
if(Print) {
print(Desc)
cat("\n")
if(ci1) print(CI)
cat("\n")
if(hy1) print(Test)
return(invisible(val))
}
else return(val)
}
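#
# Usage sketch (hypothetical data; defaults may be overridden through
# the .Options.Anova list, as read at the top of the function):
#   x <- rnorm(20); y <- rnorm(20) + 1
#   ttest(x)                   # one-sample test against hyp=0
#   ttest(x, y, eqvar=F)       # two-sample, Welch-type df
#   z <- ttest(x, y, Print=F)  # suppress printing, keep full value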
!!!EOF!!!