info 0 MACRO
) This file contains MacAnova macros used in multivariate analysis
)
) Many use features available only in MacAnova 4.10 or later
)
)) File version 000905
)) All macros are the "new" format (no line count, macro body terminated
)) by %macroname%)
) It contains the following macros.
) & = may be pre-defined and need not be read in
) * Not meaningful for all versions of MacAnova
) + Obsolete
) Covariance matrix related
) covar distcomp groupcovar oldcovar +
) oldgroupcov +
) Factor analysis related macros
)   goodfit stepuls stepgls stepml ulsfactor ulsresids glsfactor
) Discriminant analysis related macros
)   discrim discrimquad probsquad jackknife dastepsetup
)   dastepstatus daentervar daremovevar dasteplook
) oldjackknife + compf + forstep + backstep +
) Miscellaneous data analysis macros
)   standardize hotellval hotell2val
) Plotting macros
) chiqqplot
) Other miscellaneous macros
) mvngen mathhelp
===> covar <===
covar MACRO DOLLARS
) Macro to compute sample covariance matrix and mean vector
) Usage:
) d <- covar(x), x a REAL matrix with no MISSING values
) Value is structure with components 'n' (scalar), 'mean' (row vector),
) and 'covariance' (matrix)
)) Version of 000531, uses tabs(y,covar:T,n:T,mean:T)
# usage: d <- $S(x), where x is a REAL matrix with no MISSING values
@R <- tabs(argvalue($1,"argument to $S","real matrix nonmissing"),\
mean:T,count:T,covar:T)
@R <- structure(n:@R$count[1], mean:@R$mean', covariance:@R$covar)
delete(@R, return:T)
%covar%
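As a cross-check on the algebra, here is a NumPy sketch (Python, not MacAnova; the function name and data are illustrative only) of what covar returns via tabs(): the count, the mean vector, and the covariance matrix with divisor n - 1.

```python
import numpy as np

def covar(x):
    """Sample size, mean vector, and covariance matrix (divisor n - 1),
    mirroring the n/mean/covariance components the macro returns."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    mean = x.mean(axis=0)
    dev = x - mean
    # (x - mean)'(x - mean)/(n - 1), as computed by tabs(x, covar:T)
    cov = dev.T @ dev / (n - 1)
    return n, mean, cov
```

In MacAnova, `d <- covar(x)` yields the same three quantities as structure components.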
===> groupcovar <===
groupcovar MACRO DOLLARS
) macro to compute group means and pooled sample covariance matrix
) usage: groupcovar(groups, y)
) groups a factor or vector of positive integers
) y a REAL matrix with nrows(y) = length(groups)
) Result:
) structure(n:N, means:ybar, covariance:V)
) N is an integer vector of group sizes, length(N) = g =
) max(groups)
) ybar is a g by ncols(y) matrix of group mean vectors, with
) MISSING values for any empty groups
) V is the pooled variance-covariance matrix
)) 000601 new version using tabs(y,means:T,count:T,covar:T)
@groups <- argvalue($1, "group numbers", "positive integer vector")
@n <- tabs(,@groups)
if (max(@n) < 2){
error("no groups have 2 or more cases")
}
@R <- tabs(argvalue($2,"argument to $S","real matrix nonmissing"),\
delete(@groups, return:T), mean:T, covar:T)
@ng <- length(@n)
@cov <- 0
for(@i,1,@ng){
if (@n[@i] > 1){
@cov <-+ (@n[@i]-1)*matrix(@R$covar[@i,,])
}
}
@cov <-/ sum(@n[@n > 0] - 1)
delete(@ng,@i)
@R <- structure(n:delete(@n,return:T),means:@R$mean, \
covariance:delete(@cov,return:T))
delete(@R,return:T)
%groupcovar%
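The pooled covariance computed by the loop above can be checked against this NumPy sketch (illustrative names, not MacAnova): each group with at least 2 cases contributes (n_j - 1) times its covariance, and the sum is divided by the total within-group degrees of freedom.

```python
import numpy as np

def groupcovar(groups, y):
    """Group sizes, group means (NaN for empty groups), and the pooled
    variance-covariance matrix, as the macro computes them."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    g = groups.max()
    n = np.array([(groups == j).sum() for j in range(1, g + 1)])
    means = np.full((g, y.shape[1]), np.nan)
    pooled = np.zeros((y.shape[1], y.shape[1]))
    for j in range(1, g + 1):
        yj = y[groups == j]
        if len(yj) > 0:
            means[j - 1] = yj.mean(axis=0)
        if len(yj) > 1:
            d = yj - yj.mean(axis=0)
            pooled += d.T @ d          # (n_j - 1) * S_j
    pooled /= (n[n > 0] - 1).sum()     # total within-group d.f.
    return n, means, pooled
```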
===> distcomp <===
distcomp MACRO DOLLARS
) Macro to compute vector of generalized distances
) Usage:
) d <- distcomp(y), y a REAL matrix with no MISSING values
) value is a vector d of length nrows(y)
) d[i] is (yi - ybar)' %*% solve(s) %*%(yi - ybar),
) where yi' is the row i of y and s is the sample covariance
) matrix of the data y. s is computed using macro covar.
)) Version of 000826, uses tabs() instead of macro covar
# usage: d <- $S(y) where y is a REAL matrix with no MISSING values
@y <- matrix(argvalue($1, "data matrix", "real matrix nonmissing"))
@tmp <- tabs(@y, covar:T, mean:T)
@y <- @y' - @tmp$mean # now p by n
@s <- @tmp$covar # p by p
delete(@tmp)
@d <- vector(sum(@y * solve(delete(@s,return:T), @y)))
delete(@y)
delete(@d, return:T)
%distcomp%
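The same computation in NumPy (an illustrative sketch, not part of this file): each squared generalized distance is a quadratic form in the inverse sample covariance, and, as the macro does, the sketch uses a linear solve rather than an explicit inverse. A useful identity for checking: the distances always sum to (n - 1) * p.

```python
import numpy as np

def distcomp(y):
    """Squared generalized (Mahalanobis) distances of each row from the
    mean vector: d[i] = (y_i - ybar)' S^{-1} (y_i - ybar)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean(axis=0)
    s = dev.T @ dev / (len(y) - 1)
    # solve() instead of an explicit inverse, as in the macro
    return np.sum(dev * np.linalg.solve(s, dev.T).T, axis=1)
```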
===> chiqqplot <===
chiqqplot MACRO DOLLARS
) Macro to make a chisquare or sqrt(chisquare) QQ plot of generalized
) distance of residuals from 0
) usage:
) chiqqplot(dsq, df [,sqrt:T] [, symbol:sym])
) dsq is a REAL vector or matrix of squared distances >= 0
) df is a scalar > 0
) or
) chiqqplot(y [,sqrt:T] [, symbol:sym])
) y is a REAL data matrix with no MISSING values from which a vector
) dsq of squared generalized distances from the mean vector is
) computed and df = ncols(y).
) Ordered values of dsq are plotted against equally spaced quantiles
) of chi-squared on df degrees of freedom. When dsq is a matrix, each
) column is plotted
)
) With symbol:0, when ncols(d) = 1, the symbols are the case numbers, and
) when ncols(d) > 1, the symbols are the column numbers. Otherwise,
) symbol:sym works as for plot().
)
) With 'sqrt:T', sqrt(dsq) is plotted against sqrt(chi-squared quantiles)
)
) The usual graphics keywords such as 'xlab', 'ylab' and 'title' may also
) be used
))version 000601
#$S(d, df [, sqrt:T] [,symbol:sym] [,graphics keyword phrases])
if ($v > 2){
	error("usage: $S(y [keyword phrases]) or $S(d, df [keyword phrases])")
}
if ($v == 1){ # need to compute vector of distances
if (!ismacro(covar)){
getmacros(covar,quiet:T)
}
if (!ismacro(distcomp)){
getmacros(distcomp,quiet:T)
}
@d <- matrix(argvalue($1,"data","nonmissing real matrix"))
@df <- ncols(@d)
@d <- distcomp(@d)
}else{ # arg 1 is matrix of distance; arg2 is d.f.
@d <- matrix(argvalue($1,"distances","nonmissing real matrix"))
@df <- argvalue($2,"degrees of freedom","positive number")
}
@sqrt <- keyvalue($K,"sqrt","TF",default:F)
@n <- nrows(@d)
@m <- ncols(@d)
@J <- grade(@d)
@ch <- vector("\1","\2","\3","\4","\5","\6","\7","\10")[1 + run(0,@m-1)%%8]
@symbols <- keyvalue($K,"symbol*","matrix",default:@ch)
if (!anytrue(ischar(@symbols),\
alltrue(isreal(@symbols), sum(vector(@symbols!=floor(@symbols))) == 0))){
error("Value of 'symbols' not CHARACTER or integer")
}
if (alltrue(isscalar(@symbols,real:T),@symbols == 0)){
@symbols <- if (@m == 1){
@J
}else{
run(@m)'
}
}
@d <- sort(@d)
@vals <- invchi((run(@n) - .5)/@n,@df)
if (@sqrt){
@d <- sqrt(@d)
@vals <- sqrt(@vals)
@what <- "Chi"
@ylab <- "Distances"
}else{
@what <- "Chi-squared"
@ylab <- "Squared distances"
}
@xlab <- paste(@what,"(",@df,") probability points",sep:"")
@title <- paste(@what, "Q-Q plot of $1")
plot(delete(@vals,return:T), delete(@d,return:T),\
symbols:delete(@symbols,return:T), $K,\
xlab:delete(@xlab,return:T),ylab:delete(@ylab,return:T),\
title:delete(@title,return:T),xmin:0, ymin:0)
%chiqqplot%
===> mvngen <===
mvngen MACRO DOLLARS
) Macro to compute multivariate normal sample
) Usage:
) mvngen(n , p) generates MVN(0, I_p) sample
) n is sample size and must be a positive integer
) p is dimension and must be a positive integer
) mvngen(n, sigma) generates MVN(0, sigma) sample
) covariance matrix sigma positive definite symmetric, nrows(sigma)>1
) mvngen(n, p or sigma, mu) generates MVN(mu, I_p or sigma) sample
) mu a REAL vector with length(mu) = nrows(sigma)
)
) Use mu + sigma*rnorm(n) to generate a univariate sample
)
)) Version 991120, C. Bingham, kb@stat.umn.edu
#$S(n,sigma [,mu])
@n <- argvalue($1,"sample size","positive count")
@sigma <- argvalue($2,"covariance matrix","nonmissing real square")
if (isscalar(@sigma)){
@p <- argvalue(@sigma,"dimension","positive count")
@sigma <- NULL
}else{
if(max(abs(vector(@sigma-@sigma')/vector(@sigma))) > 1e-12){
error("Covariance matrix not symmetric")
}
@sigma <- cholesky(@sigma, nonposok:T)
if (isnull(@sigma)){
error("covariance not positive definite")
}
@p <- nrows(@sigma)
}
@mu <- if ($v > 2){
argvalue($3,"mean vector","nonmissing real vector")
}else{
rep(0,@p)
}
if(length(@mu) != @p){
error("mean vector length != nrows(sigma)")
}
@y <- matrix(rnorm(@n*@p),@n)
if (!isnull(@sigma)){
@y <- @y %*% @sigma
}
delete(@n,@sigma,@p)
delete(@y,return:T) + delete(@mu, return:T)'
%mvngen%
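The Cholesky construction the macro uses can be sketched in NumPy (illustrative only, not MacAnova): with sigma = R'R, the rows of z %*% R have covariance sigma, so adding mu gives an MVN(mu, sigma) sample.

```python
import numpy as np

def mvngen(n, sigma, mu=None, seed=0):
    """Draw an MVN(mu, sigma) sample of size n via the Cholesky factor,
    y = z R + mu with sigma = R'R, as the macro does."""
    rng = np.random.default_rng(seed)
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    r = np.linalg.cholesky(sigma).T      # upper-triangular, sigma = r.T @ r
    y = rng.standard_normal((n, p)) @ r
    if mu is not None:
        y = y + np.asarray(mu, dtype=float)
    return y
```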
===> discrim <===
discrim MACRO DOLLARS
) Macro to compute coefficients and offsets for multi-group linear
) classification, assuming multivariate normal populations with equal
) covariance matrices.
) Usage:
) discrim(groups, y)
) groups a vector of positive integers defining group membership
) y a N by p REAL matrix with no MISSING values, N = nrows(groups)
) Result:
) structure(coefs:L, addcon:C), L p by g matrix, C 1 by g vector
) where linear discriminant function for group j is
) y' %*% L[,j] + C[j]
)
)) 000825 uses tabs() instead of macro covar
# usage: $S(groups, y), N by 1 vector groups, N by p matrix y
if($v != 2){
error("usage is $S(groups, y)",macroname:F)
}
@groups <- argvalue($1,"group numbers","positive integer vector")
@y <- argvalue($2, "data", "nonmissing matrix")
@g <- max(@groups) # number of groups
@p <- ncols(@y) # number of variables
@n <- nrows(@y) # number of cases
if (length(@groups) != @n){
error("length of group index differs from number of cases")
}
if (@n <= @p){
error("number of cases not larger than dimension")
}
@varlabs <- if (haslabels(@y)){
getlabels(@y,2)
}else{
"@"
}
@facname <- "$1"
if (!isname(@facname) || match("@*",@facname,0,exact:F) != 0){
@facname <- "Group "
}
@ybar <- matrix(rep(0,@p*@g),@p,labels:structure(@varlabs,@facname))
@spooled <- matrix(rep(0,@p*@p),@p,labels:structure(@varlabs,@varlabs))
@stuff <- tabs(@y,@groups,covar:T,count:T,mean:T)
@N <- vector(@stuff$count[,1])
@ybar <- matrix(@stuff$mean',labels:structure(@varlabs,@facname))
@spooled <-\
matrix(sum((@N-1)*matrix(@stuff$covar,length(@N))),@p,\
labels:structure(@varlabs,@varlabs))/sum(@N-1)
delete(@stuff,@N,@y,@p)
@coefs <- solve(delete(@spooled, return:T), @ybar) # p x g
@addcon <- -0.5*sum(@coefs * delete(@ybar, return:T)) # 1 x g
structure(coefs:delete(@coefs,return:T), addcon:delete(@addcon,return:T))
%discrim%
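For readers checking the linear-rule algebra outside MacAnova, here is a NumPy sketch (illustrative names and data, not part of this file) of the coefficients L = S_pooled^{-1} ybar and constants C = -0.5 diag(ybar' L) that discrim returns; each case is then assigned to the group with the largest score x' L[,j] + C[j].

```python
import numpy as np

def discrim(groups, y):
    """Linear discriminant coefficients (p x g) and additive constants
    (length g) from group means and the pooled covariance matrix."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    g, p = groups.max(), y.shape[1]
    ybar = np.zeros((p, g))
    e = np.zeros((p, p))
    for j in range(1, g + 1):
        yj = y[groups == j]
        ybar[:, j - 1] = yj.mean(axis=0)
        d = yj - yj.mean(axis=0)
        e += d.T @ d                       # pooled error SSCP
    spooled = e / (len(y) - g)
    coefs = np.linalg.solve(spooled, ybar)          # p x g
    addcon = -0.5 * np.sum(coefs * ybar, axis=0)    # length g
    return coefs, addcon
```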
===> jackknife <===
jackknife MACRO DOLLARS
) Macro to do jackknife validation of linear discrimination
) probs <- jackknife(groups,y [,prior:P])
) groups is a vector of positive integers defining group membership
) all groups must have at least two cases
) y is a n by p REAL data matrix with n = length(groups)
) P is a REAL positive vector of length g = max(groups), default rep(1,g)
)
) probs is a n by g+1 matrix
) Column j of probs, 1 <= j <= g contains the estimated posterior
) probabilities of group j for each case
) Column g+1 of probs contains the leave-one-out classification of each case
)
)) Version of 991120; uses downdating formula for computing coefficients
# usage: $S(groups, y [,prior:P]), factor groups, matrix y, positive vector P
if ($v != 2 || $k > 1){
error("Usage: $S(groups, y [,prior:P]), factor groups, matrix y, positive vector P",\
macroname:F)
}
@groups <- argvalue($1, "group numbers","positive integer vector")
@y <- argvalue($2, "data","nonmissing matrix")
@g <- max(@groups)
@P <- keyvalue($K,"prior","positive vector", default:rep(1,@g))
if (length(@P) != @g){
error("number of prior probabilities != number of groups")
}
@n <- tabs(,@groups)
if(min(@n) < 2){
error("min(group sizes) < 2")
}
@N <- nrows(@y)
@p <- ncols(@y)
@fe1 <- @N - 1 - @g
#compute mean vectors and pooled error matrix
@ybar <- matrix(rep(0,@p*@g),@p)#,labels:structure(@varlabs,@facname))
@E <- matrix(rep(0,@p*@p),@p)#,labels:structure(@varlabs,@varlabs))
for(@i,run(@g)){
@yi <- @y[@groups == @i,]
@ybar1 <- sum(@yi)/@n[@i]
@ybar[,@i] <- @ybar1
@yi <-- @ybar1
@E <-+ @yi %c% @yi
}
@n1 <- @n - 1
	@nn1 <- (@n*@n1) # n*(n-1)
@probs <- matrix(rep(0,@N*@g),@g) #g by N
@coefs <- solve(delete(@E, return:T),hconcat(@ybar,dmat(@p,1)))
@Einv <- @coefs[,-run(@g)] # p by p
@coefs <- @coefs[,run(@g)] # p by g
for(@i,run(@N)){
@ybar1 <- @ybar
@j <- @groups[@i]
@yi <- @y[@i,]
@d <- (@yi' - @ybar[,@j])/@n1[@j]
@ybar1[,@j] <- @ybar[,@j] - @d
@tmp <- @Einv %*% @d
@c <- 1 - @nn1[@j] * @tmp %c% @d
@coefs1 <- @coefs + @nn1[@j] * @tmp * (@d %c% @coefs)/@c
@coefs1[,@j] <- @coefs1[,@j] - @tmp/@c
@probs[,@i] <- @fe1*vector(@yi %*% @coefs1 - .5*sum(@ybar1 * @coefs1))
}
@probs <- @P * exp(@probs - @probs[1,])
@probs <- vconcat(@probs/sum(@probs),grade(@probs)[@g,])'
delete(@groups, @y, @n, @g, @N, @n1, @nn1, @fe1, @tmp, @i, @j,\
@Einv,@P,@ybar,@coefs,@ybar1,@coefs1)
delete(@probs, return:T)
%jackknife%
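The macro avoids refitting by a rank-one downdating formula; the direct (slow but simple) equivalent is to refit the linear rule without each case in turn and classify that case with the refitted rule. A self-contained NumPy sketch of that brute-force version (illustrative names, not MacAnova):

```python
import numpy as np

def _lda_fit(groups, y):
    # pooled-covariance linear discriminant coefficients and constants
    g = groups.max()
    ybar = np.column_stack([y[groups == j].mean(axis=0) for j in range(1, g + 1)])
    e = sum((y[groups == j] - ybar[:, j - 1]).T @ (y[groups == j] - ybar[:, j - 1])
            for j in range(1, g + 1))
    coefs = np.linalg.solve(e / (len(y) - g), ybar)
    return coefs, -0.5 * np.sum(coefs * ybar, axis=0)

def jackknife_loo(groups, y):
    """Leave-one-out classification by refitting without each case, the
    direct version of what the macro speeds up by downdating."""
    groups, y = np.asarray(groups), np.asarray(y, dtype=float)
    pred = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coefs, addcon = _lda_fit(groups[keep], y[keep])
        pred[i] = np.argmax(y[i] @ coefs + addcon) + 1
    return pred
```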
===> discrimquad <===
discrimquad MACRO DOLLARS
) Macro to compute coefficients for quadratic discrimination,
) assuming multivariate normal.
)
) Usage: discrimquad(groups, y)
) where groups is a vector of positive integers defining populations
) and y is a n by p data matrix with no missing data and n = nrows(groups)
) It is an error if the smallest group has p or fewer members
)
) Value is structure(Q:q, L:l, addcon:c, grandmean:ybar)
) where q is a structure of g p by p REAL matrices,
) l is a structure of g length p REAL vectors,
) c is a vector of length g.
) ybar is a vector of length p
)
) If x is a vector of length p to be classified, the quadratic score
) for group j is
) qs[j] = (x-ybar)' %*% q[j] %*% (x-ybar) + (x-ybar)' %*% l[j] + c[j]
) and posterior probabilities (assuming equal prior) are exp(qs)/sum(exp(qs)).
)
)) Macro written 991119 by C. Bingham, kb@stat.umn.edu
)) Version of 000826 uses tabs() instead of covar
#$S(groups,y)
@groups <- argvalue($1,"group index","positive integer vector")
@y <- argvalue($2, "data", "nonmissing matrix")
@p <- ncols(@y)
@g <- max(@groups)
@N <- nrows(@y)
if (length(@groups) != @N){
error("length of group index differs from number of cases")
}
@stats <- tabs(@y, @groups,count:T,mean:T,covar:T)
@n <- vector(@stats$count[,1]) # group sizes
@means <- @stats$mean # row i is mean of group i
@covars <- @stats$covar # @covars[i,,] is s for group i
delete(@stats)
if (min(@n) <= @p){
error("All groups must have at least p+1 cases")
}
@varlabs <- if (haslabels(@y)){
getlabels(@y,2)
}else{
"@"
}
delete(@y)
@facname <- "$1"
if (!isname(@facname) || match("@*",@facname,0,exact:F) != 0){
@facname <- "Group"
}
@facname <- paste(@facname," ",sep:"")
@grandmean <- vector(sum(@n*@means)/@N,labels:@varlabs)
#make structures and row vector to fill
@l <- @q <- strconcat(split(rep(0,@g)'),labels:@facname)
@addcon <- matrix(rep(0,@g)',labels:structure("@",@facname))
for(@i,run(@g)){
@s <- matrix(@covars[@i,,],labels:structure(@varlabs,@varlabs))
		@offset <- vector(@means[@i,] - @grandmean')
@q[@i] <- -.5*solve(@s)
@l[@i] <- vector(solve(@s,@offset),labels:@varlabs)
@addcon[@i] <- -.5*@l[@i] %c% @offset - .5*log(det(@s))
}
delete(@groups,@p,@g,@i,@N,@varlabs,@facname,@means,@n)
structure(Q:delete(@q,return:T),L:delete(@l,return:T),\
addcon:delete(@addcon,return:T),grandmean:delete(@grandmean,return:T))
%discrimquad%
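The quadratic score in the help text above can be checked end-to-end with this NumPy sketch (illustrative, not MacAnova): for each group it forms Q_j = -0.5 S_j^{-1}, l_j = S_j^{-1}(mean_j - grandmean), and c_j = -0.5 l_j'(mean_j - grandmean) - 0.5 log det S_j, then converts scores to posterior probabilities with equal priors.

```python
import numpy as np

def quad_scores(x, groups, y):
    """Posterior probabilities from quadratic discriminant scores
    qs_j = (x-gm)' Q_j (x-gm) + (x-gm)' l_j + c_j, centered at the
    grand mean gm, with equal priors: exp(qs)/sum(exp(qs))."""
    groups, y = np.asarray(groups), np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    g = groups.max()
    gm = y.mean(axis=0)   # grand mean = weighted mean of group means
    qs = np.empty(g)
    for j in range(1, g + 1):
        yj = y[groups == j]
        s = np.cov(yj, rowvar=False)
        off = yj.mean(axis=0) - gm
        l = np.linalg.solve(s, off)
        c = -0.5 * l @ off - 0.5 * np.log(np.linalg.det(s))
        d = x - gm
        qs[j - 1] = -0.5 * d @ np.linalg.solve(s, d) + d @ l + c
    probs = np.exp(qs - qs.max())
    return probs / probs.sum()
```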
===> probsquad <===
probsquad MACRO DOLLARS
) Macro to compute posterior probabilities based on a quadratic
) discriminant functions with coefficients computed by macro discrimquad
)
) Usage:
) probsquad(x,structure(Q,L,addcon,grandmean) [, prior:P])
)
) The structure is as returned by macro discrimquad.
) Q is a structure of g square matrices
) L is a structure of g vectors of length p
) addcon is a 1 by g row vector
) grandmean is a vector of length p
)
) x must be a REAL matrix or vector with no missing values
) P is a positive REAL vector proportional to prior probabilities,
) with default rep(1,g), where g = ncomps(Q)
)
) If x is a vector and p = length(grandmean) > 1, x must have length p
) Otherwise x must be a matrix with p columns and probabilities are
) for each row of x.
)
) The value is a matrix of g columns with one row for each vector
) classified.
)) Version 991119, C. Bingham, kb@stat.umn.edu
#$S(x,structure(Q,L,addcon,grandmean) [,prior:p])
@x <- argvalue($1,"data","real matrix nonmissing")
@stuff <- argvalue($2,"quadratic discriminant info","structure")
if (!isstruc(@stuff[1]) || !isstruc(@stuff[2]) ||\
!isvector(@stuff[3]',real:T) ||\
!isvector(@stuff[4],real:T)){
error("Argument 2 to $S has wrong form")
}
@p <- nrows(@stuff[1][1]) # dimension
@g <- ncomps(@stuff[1]) # number of groups
@P <- keyvalue($K,"prior","positive vector",default:rep(1,@g))
if (!isvector(@x) && ncols(@x) != @p ||\
isvector(@x) && @p > 1 && length(@x) != @p){
error("Wrong number of variables in data")
}
if(length(@P) != @g){
error("Wrong length prior probability vector")
}
@n <- if(!isvector(@x) || @p == 1){
nrows(@x)
}else{
@x <- vector(@x)'
1
}
@x <- @x' - @stuff$grandmean #transpose x and center it
# The following is arcane, but it takes advantage of MacAnova's ability
# to do matrix arithmetic with structures of matrices
# @stuff$Q and @stuff$L are both structures with g components
# @stuff$addcon is a 1 by g matrix (row vector)
@probs <- matrix(vector(sum(@x * (@stuff$Q %*% @x + @stuff$L))),@n) +\
@stuff$addcon
if (haslabels(@stuff$Q)){
@probs <- matrix(@probs,labels:structure("@",getlabels(@stuff$Q)))
}
delete(@stuff,@x,@p,@g,@n)
@probs <- @P'*exp(@probs - @probs[,1])
@probs <- @probs/sum(@probs')'
delete(@P)
delete(@probs,return:T)
%probsquad%
===> hotellval <===
hotellval MACRO DOLLARS
# $S(x [, pval:T])
@x <- matrix(argvalue($1,"x","real matrix nonmissing"))
@pval <- keyvalue($K,"pval*","TF",default:F)
@n <- nrows(@x)
@p <- ncols(@x)
if (@n <= @p){
error("nrows(x) <= ncols(x)")
}
@xbar <- sum(@x)/@n
@x <-- @xbar
@result <- @n * (@n - 1) * @xbar %*% solve(@x %c% @x,@xbar')
if (@pval){
@fe <- @n - 1
@fe1 <- @fe - @p + 1
	@P <- 1 - cumF(@fe1*@result/(@fe*@p),@p,@fe1)
@result <- strconcat(hotelling:@result, pvalue:delete(@P,return:T))
delete(@fe, @fe1)
}
delete(@x,@n,@p,@xbar,@pval)
delete(@result,return:T)
%hotellval%
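A NumPy sketch of the statistic hotellval computes (illustrative, not MacAnova): T2 = n xbar' S^{-1} xbar for H0: mu = 0, with (n-p)/(p(n-1)) * T2 ~ F(p, n-p) under H0. For p = 1 it reduces to the squared one-sample t statistic, and it is invariant under nonsingular linear transformations of the data, two handy checks.

```python
import numpy as np

def hotelling_T2(x):
    """One-sample Hotelling T-squared for H0: mu = 0,
    T2 = n * xbar' S^{-1} xbar."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xbar = x.mean(axis=0)
    s = np.asarray(np.cov(x, rowvar=False)).reshape(p, p)
    return n * xbar @ np.linalg.solve(s, xbar)
```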
===> hotell2val <===
hotell2val MACRO DOLLARS
# $S(x1, x2 [, pval:T])
@x1 <- argvalue($1,"x1","real matrix nonmissing")
@x2 <- argvalue($2,"x2","real matrix nonmissing")
@pval <- keyvalue($K,"pval*","TF",default:F)
@n1 <- nrows(@x1)
@n2 <- nrows(@x2)
@p <- ncols(@x1)
if (ncols(@x2) != @p){
error("ncols(x1) != ncols(x2)")
}
@fe <- @n1 + @n2 - 2
if (@fe <= @p - 1){
	error("n1 + n2 < ncols(x1) + 2")
}
@xbar1 <- describe(@x1,mean:T)
@xbar2 <- describe(@x2,mean:T)
@d <- @xbar1 - @xbar2
@x1 <-- @xbar1'
@x2 <-- @xbar2'
@vhat <- (1/@n1 + 1/@n2)*(@x1 %c% @x1 + @x2 %c% @x2)/@fe
delete(@x1,@x2,@xbar1,@xbar2,@n1,@n2)
@result <- @d %c% solve(@vhat, @d)
if (@pval){
	@P <- 1 - cumF((@fe - @p + 1)*@result/(@fe*@p),@p,@fe-@p+1)
@result <- strconcat(hotelling:@result,pvalue:delete(@P,return:T))
}
delete(@pval,@p,@vhat,@d)
delete(@result,return:T)
%hotell2val%
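And the two-sample counterpart in NumPy (illustrative sketch): T2 = d' Vhat^{-1} d with d = xbar1 - xbar2 and Vhat = (1/n1 + 1/n2) S_pooled; with fe = n1 + n2 - 2, (fe-p+1)/(fe*p) * T2 ~ F(p, fe-p+1). For p = 1 it equals the squared pooled two-sample t statistic.

```python
import numpy as np

def hotelling2_T2(x1, x2):
    """Two-sample Hotelling T-squared for H0: mu1 = mu2."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0)
    e1 = (x1 - x1.mean(axis=0)).T @ (x1 - x1.mean(axis=0))
    e2 = (x2 - x2.mean(axis=0)).T @ (x2 - x2.mean(axis=0))
    vhat = (1.0/n1 + 1.0/n2) * (e1 + e2) / (n1 + n2 - 2)
    return d @ np.linalg.solve(vhat, d)
```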
===> dastepsetup <===
dastepsetup MACRO DOLLARS
) Setup stepwise discriminant
) Usage:
) dastepsetup(Model [,allin:T or in:logvec] [,silent:T]), where
) Model is a glm model, usually of the form "y = groups"
) dastepsetup([allin:T] [,silent:T]), with no model, is equivalent
) to dastepsetup(STRMODEL [,allin:T] [,silent:T])
)
) dastepsetup initializes an invisible structure _DASTEPSTATE and,
) except with silent:T, prints F-to-enter with P-values
)
) With allin:T, stepsetup starts with all dependent variables "in".
) With in:logvec, logvec must be a LOGICAL vector of length ncols(y)
) with logvec[j] True if and only if y[,j] is "in".
) allin:T and in:rep(T,ncols(y)) have the same effect.
)
)) Version 000530
#$S(model [,silent:T] [,in:logVector or allin:T])
@model <- if ($v > 0){
$01
}else{
NULL
}
@model <- if (!isnull(@model)){
argvalue(@model,"model","string")
}elseif(isscalar(STRMODEL,char:T)){
STRMODEL
}else{
error("No model provided and STRMODEL does not exist")
}
if(match("*=*",@model, 0, exact:F) == 0){
error(paste("Illegal model \"",@model,"\"",sep:""))
}
@allin <- keyvalue($K,"allin","TF",default:F)
@instart <- keyvalue($K,"in","logical vector nonmissing")
if (@allin && !isnull(@instart)){
error("keyword 'in' illegal with 'allin:T'")
}
@nocon <- match("*-1",@model,0, exact:F) != 0 ||\
match("*- 1",@model,0, exact:F) != 0
manova(@model,silent:T) #compute side effect variables
@nvars <- dim(SS)[2]
if (!isnull(@instart)){
if(length(@instart) != @nvars){
error("Length of 'instart' != number of variables")
}
@allin <- (sum(@instart) == @nvars)
}else{
@instart <- rep(@allin,@nvars)
}
@nterms <- length(DF) - 1
@fe <- DF[@nterms+1]
@fh <- sum(DF[run(2,@nterms)])
@e <- matrix(SS[@nterms+1,,])
@hpluse <- matrix(sum(SS[-1,,]))
if (@nocon){
@fh <-+ DF[1]
@hpluse <-+ matrix(SS[1,,])
}
if(haslabels(SS)){
@varnames <- getlabels(SS,2)
if(!isname(@varnames[1])){
delete(@varnames)
}
}
if (!isvector(@varnames)){
@varnames <- getlabels(vector(rep(0,@nvars),labels:"X"))
}
@Fs <- matrix(rep(0, 2*@nvars), 2, labels:structure(vector("F", "P"),\
@varnames))
if (!@allin){
if (sum(@instart) > 0){
@hpluse <- swp(@hpluse, run(@nvars)[@instart])
@e <- swp(@e, run(@nvars)[@instart])
@fe <-- sum(@instart)
}
}else{
@hpluse <- solve(@hpluse)
@e <- solve(@e)
@fe <-- @nvars
}
@history <- if(sum(@instart) > 0){
run(@nvars)[@instart]
}else{
NULL
}
setlabels(@hpluse, structure(@varnames,@varnames))
setlabels(@e, structure(@varnames,@varnames))
@fe <- vector(@fe,@fe+1,labels:vector("enter", "remove"))
_DASTEPSTATE <- structure(model:delete(@model,return:T),\
hpluse:delete(@hpluse,return:T),\
e:delete(@e,return:T),\
in:vector(delete(@instart,return:T),labels:@varnames),\
F:delete(@Fs,return:T), fh:delete(@fh,return:T), fe:delete(@fe,return:T),\
history:delete(@history,return:T) )
delete(@nocon,@nvars,@varnames,@allin)
if(!ismacro(dastepstatus)){
getmacros(dastepstatus,quiet:T)
}
dastepstatus($K)
%dastepsetup%
===> dasteplook <===
dasteplook MACRO DOLLARS
) Macro to access elements of _DASTEPSTATE
) Usage:
) dasteplook(name1, name2, ...) names selected from 'model', 'hpluse',
) 'e', 'in', 'F', 'fh', 'fe', and 'history'
) These names are the component names of _DASTEPSTATE
)
) When there is one argument, dasteplook returns a scalar, vector or matrix
)
) dasteplook(all) returns all of _DASTEPSTATE
) Otherwise, dasteplook returns a structure.
)) Version 000826
# $S(name1, name2, ...) names from 'model', 'hpluse', 'e', 'in', 'F', 'fh', 'fe', 'history'
if ($k > 0){
error("no keywords allowed as arguments")
}
@names <- $A
for(@i,run($v)){
# strip off double quotes, if any
if (match("\"*\"",@names[@i], 0,exact:F) != 0){
@names[@i] <- <<@names[@i]>>
}
}
@all <- match("all",@names,0) != 0
if(delete(@all,return:T)){
if ($v > 1){
error("component names can't be specified with 'all'")
}
delete(@names,@i)
@result <- _DASTEPSTATE
} else {
@compnames <- compnames(_DASTEPSTATE)
@ncomps <- length(@names)
@result <- split(rep(0,@ncomps)',compnames:@names)
for(@i,1,@ncomps){
@k <- match(@names[@i],@compnames,0)
if(@k == 0){
error(paste("\"",@names[@i],"\" not a legal name",sep:""))
}
@result[@i] <- _DASTEPSTATE[@k]
}
delete(@compnames,@names,@i,@k)
if (delete(@ncomps,return:T) == 1){
@result <- @result[1]
}
}
delete(@result,return:T)
%dasteplook%
===> dastepstatus <===
dastepstatus MACRO DOLLARS
) Macro to report status of current stage of stepwise discriminant
) analysis
) Usage:
) dastepstatus([silent:T])
)) Version 000530
# $S([silent:T])
@silent <- keyvalue($K,"silent","TF",default:F)
@hpluse <- _DASTEPSTATE$hpluse
@e <- _DASTEPSTATE$e
@nvars <- nrows(@e)
@Fs <- _DASTEPSTATE$F
@in <- _DASTEPSTATE$in
@out <- !@in
@u <- diag(@hpluse)
@v <- diag(@e)
@fh <- _DASTEPSTATE$fh
@fe <- _DASTEPSTATE$fe
if (sum(@out) > 0){
@Fs[1,@out] <- (@fe[1]/@fh)*(@u[@out]/@v[@out] - 1)
@Fs[2,@out] <- 1 - cumF(@Fs[1,@out],@fh,@fe[1])
}
if (sum(@in) > 0){
@Fs[1,@in] <- ((@fe[2])/@fh)*(@v[@in]/@u[@in] - 1)
@Fs[2,@in] <- 1 - cumF(@Fs[1,@in],@fh,@fe[2])
}
@ii <- match("F",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- delete(@Fs,return:T)
delete(@ii)
if (!delete(@silent,return:T)){
@nin <- sum(@in)
print(paste("Model: \"",_DASTEPSTATE$model,"\"",sep:""))
if(@nin > 0) {
print(paste("F(",@fh, ",",@fe[2],") to delete",sep:""))
print(_DASTEPSTATE$F[, @in], header:F)
}else{
print("No variables are \"in\"")
}
print("")
if(@nin < @nvars) {
print(paste("F(",@fh,",",@fe[1],") to enter",sep:""))
print(_DASTEPSTATE$F[, !@in], header:F)
}else{
print("All variables are \"in\"")
}
print("")
delete(@nin)
}
delete(@fe,@fh,@in,@nvars)
_DASTEPSTATE # invisible so it won't print but can be assigned
%dastepstatus%
===> daentervar <===
daentervar MACRO DOLLARS
) Macro to take one or more steps forward, entering one or more
) response variables into the current stepwise discriminant
) analysis
) Usage:
) daentervar(var1 [,var2 ...] [,silent:T]), var1, var2 ... names
) or numbers of variables to be entered
)) Version 000530
# $S(var1 [,var2 ...] [,silent:T]), varj a variable name or number
if ($v == 0 || $k > 1){
error("usage: $S(var1 [,var2 ...] [,silent:T])", macroname:F)
}
if (!isstruc(_DASTEPSTATE)){
error("You must run dastepsetup(Model) before using $S",macroname:F)
}
@varnames <- getlabels(_DASTEPSTATE$in)
@nvars <- length(@varnames)
@vars <- $A
if(!ismacro(dastepstatus)){
getmacros(dastepstatus,quiet:T)
}
for (@j, 1, length(@vars)){
@var <- @vars[@j]
if(match("*:*",@var,0,exact:F) != 0){
next
}
# not a keyword phrase
if (match("\"*\"",@var,0,exact:F) != 0){
@var <- <<@var>> #strip off quotes if present
}
if (!isname(@var)){
@newvar <- vecread(string:@var,byfields:T,badvalue:-.5)
if (!isscalar(@newvar) || @newvar[1] <= 0 || @newvar[1] > @nvars ||\
@newvar[1] != ceiling(@newvar[1])){
error(paste(@var,"not positive integer <= number of variables"))
}
@var <- @varnames[@newvar]
}else{
@newvar <- match(@var, @varnames, 0)
if(@newvar == 0) {
error(paste("Unrecognized variable \"",@var,"\"",sep:""))
}
}
if (_DASTEPSTATE$in[@newvar]){
error(paste("Variable \"",@var,"\" already 'in'", sep:""))
}
delete(@var)
@ii <- match("hpluse",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$hpluse, @newvar)
@ii <- match("e",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$e, @newvar)
@in <- _DASTEPSTATE$in
@in[@newvar] <- T
@ii <- match("in",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- @in
@ii <- match("fe",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- _DASTEPSTATE$fe - 1
@ii <- match("history",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- vector(_DASTEPSTATE$history,@newvar)
delete(@ii,@newvar,@in)
dastepstatus($K)
} #for (@j, 1, length(@vars))
delete(@vars, @j, @varnames)
_DASTEPSTATE
%daentervar%
===> daremovevar <===
daremovevar MACRO DOLLARS
) Macro to take one or more steps backward, removing a variable or
) several variables from the current stepwise discriminant analysis
) Usage:
) daremovevar(var1 [,var2 ...] [,silent:T]), var1, var2 ... names
) or numbers of variables to be removed
) Adapted from a macro by Gary Oehlert
)) Version 000530
# $S(var1 [,var2 ...] [,silent:T]), varj a variable name or number
if ($v == 0 || $k > 1){
error("usage is $S(var1 [,var2 ...] [,silent:T])")
}
if (!isstruc(_DASTEPSTATE)){
error("You must run dastepsetup(Model) before using $S",macroname:F)
}
@nvars <- nrows(_DASTEPSTATE$e)
@varnames <- getlabels(_DASTEPSTATE$in)
@vars <- $A
if(!ismacro(dastepstatus)){
getmacros(dastepstatus,quiet:T)
}
for (@j, 1, length(@vars)){
@var <- @vars[@j]
if(match("*:*",@var,0,exact:F) != 0){
next
}
# not a keyword phrase
if (match("\"*\"",@var,0,exact:F) != 0){
@var <- <<@var>> #strip off quotes if present
}
if (!isname(@var)){
@outvar <- vecread(string:@var,byfields:T,badvalue:-.5)
if (!isscalar(@outvar) || @outvar[1] <= 0 || @outvar[1] > @nvars ||\
@outvar[1] != ceiling(@outvar[1])){
error(paste(@var,"not positive integer <= number of variables"))
}
@var <- @varnames[@outvar]
}else{
@outvar <- match(@var, @varnames, 0)
if(@outvar == 0) {
error(paste("Unrecognized variable \"",@var,"\"",sep:""))
}
}
if (!_DASTEPSTATE$in[@outvar]){
error(paste("Variable \"",@var,"\" not currently 'in'",sep:""))
}
delete(@var)
@ii <- match("hpluse",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$hpluse, @outvar)
@ii <- match("e",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$e, @outvar)
@in <- _DASTEPSTATE$in
@in[@outvar] <- F
@ii <- match("in",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- @in
@ii <- match("fe",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- _DASTEPSTATE$fe + 1
@ii <- match("history",compnames(_DASTEPSTATE))
_DASTEPSTATE[@ii] <- vector(_DASTEPSTATE$history,-@outvar)
dastepstatus($K)
}
delete(@ii,@outvar, @varnames, @vars, @j, @nvars)
_DASTEPSTATE
%daremovevar%
===> ulsfactor <===
ulsfactor MACRO DOLLARS
) Macro to use nonlinear least squares macro levmar to find
) ULS estimates of uniquenesses and loadings. It uses macro ulsresids
) to compute residuals used by levmar.
)
) Usage:
) ulsfactor(s, m [, quiet:T,silent:T,start:psi0,uselogs:T],\
) maxit:nmax, minit:nmin, print:T,crit:crvec])
) s: a positive definite covariance or correlation matrix
) m: a positive integer < (2*p + 1 - sqrt(8*p+1))/2
) nmax: nonnegative integer, maximum number of iterations permitted [30]
) nmin: nonnegative integer, minimum of iterations required [0]
) crvec: vector(numsig, nsigsq, delta), 3 criteria for convergence,
) a negative criterion is not used
)
) There is no guarantee that min(psihat) >= 0. In that case you can try
) using uselogs:T to parametrize in terms of log(psi). This ensures
) min(psi) > 0 if it converges.
) However, if the minimum is actually outside the permissible
) range, it will probably lead to an 'argument to solve() singular'
) error message.
)
) With print:T, something is printed on each iteration
)
) The result is an "invisible" structure
) structure(psihat:psi,loadings:l,criterion:crit,\
) eigenvals:vals,status:iconv,iter:n)
) where
) psi is the vector of estimated uniquenesses
) l is the p by m matrix of estimated loadings
) crit is the sum of squared differences between s and sigmahat
) vals are eigenvalues of s - psi
) iconv = 0 or 4 not converged
) = 1, converged with relative change in coeffs < relcon = 10^-numsig
) = 2, converged with relative change in rss <= relssq = 10^-nsigsq
) = 3, converged with ||gradient|| < delta
)  = 4, iteration stopped; could not reduce rss.
)
) Macro ulsresids saves a copy of psi on every call in invisible variable
) _PSIULS. Type 'print(_PSIULS)' to see this value.
)) Version of 991114, written by C. Bingham, kb@stat.umn.edu
)) It uses macro levmar to minimize the sum of squared residuals.
# $S(s or r, m [,start:psi0,maxit:m,print:T,quiet:T])
@s <- argvalue($1,"variance or correlation","square real nonmissing")
@quiet <- keyvalue($K, "quiet","TF",default:F)
@silent <- keyvalue($K, "silent","TF",default:F)
@maxit <- keyvalue($K, "maxit*","positive integer",default:30)
__USELOGSULS <- keyvalue($K, "uselogs","TF",default:F)
@m <- argvalue($2,"number of factors","positive integer")
@p <- nrows(@s)
@kind <- if(sum(diag(@s) == 1) == @p){
"correlation"
}else{
"covariance"
}
if (min(eigenvals(@s)) <= 0){
error(paste(@kind,"matrix is not positive definite"))
}
if (2*@m >= 2*@p + 1 - sqrt(8*@p+1)){
error(paste("With m = ",@m," and p = ",@p,\
", m > (2*p + 1 - sqrt(8*p+1))/2",sep:""))
}
if (!ismacro(levmar)){
getmacros(levmar,quiet:T)
}
if (!ismacro(ulsresids)){
getmacros(ulsresids,quiet:T)
}
@psi0 <- keyvalue($K,"start","positive vector",\
default:1/diag(solve(@s))) # starting value
if (length(@psi0) != @p){
	error("Wrong number of starting uniquenesses")
}
if (__USELOGSULS){
@psi0 <- log(@psi0)
}
@result <- levmar(@psi0,0,vector(@s),@m,resid:ulsresids,$K)
@psihat <- if (__USELOGSULS){
exp(@result$coefs)
}else{
@result$coefs
}
@iconv <- @result$iconv
@iter <- @result$iter
if (@iconv == 0){
print(paste("WARNING: failed to converge in",@maxit,"iterations"))
}
@eigs <- eigen(@s - dmat(@psihat))
@vals <- @eigs$values
@J <- run(@m)
@lhat <- if (@vals[@m] < 0){
NULL
}else{
@eigs$vectors[,@J] * sqrt(@vals[@J]')
}
delete(@eigs,@J)
if (!@silent){
if (isnull(@lhat)){
print(paste("WARNING: eigenvalue",@m,"of (R or S) - Psi is negative"))
}
if (min(@psihat) < 0){
print("WARNING: psi not acceptable since min(psihat) < 0")
}
if (@iconv == 4){
		print(paste("WARNING: could not reduce RSS on iteration",@iter))
}elseif (@iconv <= 0){
print(paste("WARNING: no convergence in",@iter,"iterations"))
}
}
if (!@quiet && !@silent){
if(@iconv > 0 && @iconv < 4){
print(paste("Convergence in",@iter,"iterations by criterion",@iconv))
}
print(psihat:@psihat,loadings:@lhat,criterion:@result$rss,eigenvals:@vals)
}
delete(@quiet,@silent,@p,@m,@psi0,__USELOGSULS)
@result <- structure(psihat:delete(@psihat,return:T),\
loadings:delete(@lhat,return:T),criterion:delete(@result$rss,return:T),\
eigenvals:delete(@vals,return:T),status:delete(@iconv,return:T),\
iter:delete(@iter,return:T))
delete(@result,return:T,invisible:T)
%ulsfactor%
===> ulsresids <===
ulsresids MACRO DOLLARS OUTLINE
) Macro to be used with levmar to do ULS factor extraction
)
) levmar usage:
) levmar(if(__USELOGSULS){log(psi)}else{psi},0,\
) vector(r),m,resid:ulsresids)
)) Version of 000828
# $S(psi,0,vector(r),m)
_PSIULS <- if(__USELOGSULS){
exp($1)
}else{
$1
}
@psi <- dmat(_PSIULS)
@r <- $3
@p <- sqrt(length(@r))
@J <- run($4) #run(m)
@eigs <- eigen(matrix(@r,@p) - @psi)
@l <- @eigs$vectors[,@J]
@res <- delete(@r,return:T) - vector((@l * @eigs$values[@J]') %C% @l + @psi)
delete(@psi,@p,@J,@eigs,@l)
delete(@res, return:T)
%ulsresids%
====> glsfactor <====
glsfactor MACRO DOLLARS
) Macro to use nonlinear least squares macro levmar to find
) GLS estimates of uniquenesses and loadings. It uses macro glsresids
) to compute residuals used by levmar.
)
) Usage:
) glsfactor(s, m [, quiet:T, silent:T, start:psi0, uselogs:T,\
) maxit:nmax, minit:nmin, print:T, crit:crvec])
) s: a positive definite covariance or correlation matrix
) m: a positive integer < (2*p + 1 - sqrt(8*p+1))/2
) nmax: nonnegative integer, maximum number of iterations permitted [30]
) nmin: nonnegative integer, minimum number of iterations required [0]
) crvec: vector(numsig, nsigsq, delta), 3 criteria for convergence,
) a negative criterion is not used
) With print:T, something is printed on each iteration
)
) There is no guarantee that min(psihat) >= 0. In that case you can try
) using uselogs:T to parametrize in terms of log(psi). This ensures
) min(psi) > 0 if it converges.
) However, if the minimum is actually outside the permissible
) range, it will probably lead to an 'argument to solve() singular'
) error message.
)
) The result is an "invisible" structure
) structure(psihat:psi,loadings:l,criterion:crit,\
) eigenvals:vals,status:iconv,iter:n)
) where
) psi is the vector of estimated uniquenesses
) l is the p by m matrix of estimated loadings
) crit is the minimized GLS criterion (sum of squared glsresids residuals)
) vals are eigenvalues of s - psi
) iconv = 0 or 4 not converged
) = 1, converged with relative change in coeffs < relcon = 10^-numsig
) = 2, converged with relative change in rss <= relssq = 10^-nsigsq
) = 3, converged with ||gradient|| < delta
) = 4, iteration stopped; could not reduce rss.
)
) Macro glsresids saves a copy of psi on every call in invisible variable
) _PSIGLS. Type 'print(_PSIGLS)' to see this value.
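) Example (assuming r is a positive definite sample correlation matrix):
)   fit <- glsfactor(r, 2, uselogs:T)
)   fit$psihat # estimated uniquenesses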
)) Version of 991115, written by C. Bingham, kb@stat.umn.edu
)) It uses macro levmar to minimize the sum of squared residuals.
# $S(s or r, m [,start:psi0,maxit:m,print:T,quiet:T])
@s <- argvalue($1,"variance or correlation","square real nonmissing")
@quiet <- keyvalue($K, "quiet","TF",default:F)
@silent <- keyvalue($K, "silent","TF",default:F)
@maxit <- keyvalue($K, "maxit*","positive integer",default:30)
__USELOGSGLS <- keyvalue($K, "uselog*","TF",default:F)
@m <- argvalue($2,"number of factors","positive integer")
@p <- nrows(@s)
@kind <- if(sum(diag(@s) == 1) == @p){
@what <- "r"
"correlation"
}else{
@what <- "s"
"covariance"
}
if (min(eigenvals(@s)) <= 0){
error(paste(@kind,"matrix is not positive definite"))
}
if (2*@m >= 2*@p + 1 - sqrt(8*@p+1)){
error(paste("With m = ",@m," and p = ",@p,\
", m > (2*p + 1 - sqrt(8*p+1))/2",sep:""))
}
if (!ismacro(levmar)){
getmacros(levmar,quiet:T)
}
if (!ismacro(glsresids)){
getmacros(glsresids,quiet:T)
}
@psi0 <- keyvalue($K,"start","positive vector",\
default:1/diag(solve(@s))) # starting value
if (length(@psi0) != @p){
error("Wrong number of starting uniqueness")
}
if (__USELOGSGLS){
@psi0 <- log(@psi0)
}
@result <- levmar(@psi0,vector(@s),vector(@s),@m,resid:glsresids,$K)
@psihat <- if (__USELOGSGLS){
exp(@result$coefs)
}else{
@result$coefs
}
@iconv <- @result$iconv
@iter <- @result$iter
if (@iconv == 0){
print(paste("WARNING: failed to converge in",@maxit,"iterations"))
}
@eigs <- releigen(solve(@s), dmat(1/@psihat))
@J <- run(@m)
@vals <- 1/reverse(@eigs$values) - 1
@lhat <- if (@vals[@m] < 0){
NULL
}else{
@eigs$vectors[,@p + 1 - @J] * sqrt(@vals[@J]')
}
delete(@eigs,@J)
if (!@silent){
if (isnull(@lhat)){
print(paste("WARNING: eigenvalue",@m,"is negative"))
}
if (min(@psihat) < 0){
print("WARNING: psi not acceptable since min(psihat) < 0")
} elseif (min(eigenvals(@s - dmat(@psihat))) < 0) {
print(paste("WARNING:",@what,\
"- dmat(psihat) not non-negative definite"),macroname:T)
}
if (@iconv == 4){
print("WARNING: could not reduce RSS on iteration",@iter)
}elseif (@iconv <= 0){
print(paste("WARNING: no convergence in",@iter,"iterations"))
}
}
if (!@quiet && !@silent){
if(@iconv > 0 && @iconv < 4){
print(paste("Convergence in",@iter,"iterations by criterion",@iconv))
}
print(psihat:@psihat,loadings:@lhat,criterion:@result$rss,eigenvals:@vals)
}
delete(@quiet,@silent,@p,@m,@psi0,__USELOGSGLS)
@result <- structure(psihat:delete(@psihat,return:T),\
loadings:delete(@lhat,return:T),criterion:delete(@result$rss,return:T),\
eigenvals:delete(@vals,return:T),status:delete(@iconv,return:T),\
iter:delete(@iter,return:T))
delete(@result,return:T,invisible:T)
%glsfactor%
====> glsresids <====
glsresids MACRO DOLLARS
) Macro to be used with levmar to do GLS factor extraction
) This is very much a work in progress
) Usage:
) stuff <- levmar(if(__USELOGSGLS){log(psi)}else{psi},\
) vector(r),vector(r),m,resid:glsresids), m = number of factors
) glsresids itself returns vector(Ip - solve(r,sigmahat))
) It saves its most recent argument as _PSIGLS
)
) You need to set __USELOGSGLS to T or F before calling levmar
) and use log(psi) as an argument to levmar when __USELOGSGLS is True
) This is done automatically by macro glsfactor
)
) The result from levmar is the structure returned by levmar.
) stuff$coefs is psihat (or log(psihat))
) stuff$rss is the sum(vector(Ip - solve(r,sigmahat))^2)
) stuff$iconv is convergence status -- 0 means not converged
) stuff$iter is the number of iterations
)) Version 991115
# $S(psi,vector(r),vector(r),m)
# residuals based on (Ip - solve(s,sigmahat))^2
_PSIGLS <- if(__USELOGSGLS){
exp($1)
}else{
$1
}
@r <- $2 #x
@p <- sqrt(length(@r))
@r <- matrix(@r,@p)
@m <- $4
@eigs <- releigen(solve(@r), dmat(1/_PSIGLS))
@J <- run(@p, @p - @m + 1)
@tmpl <- @eigs$vectors[,@J]
@vals <- 1/@eigs$values[@J] - 1
@sighat <- (@tmpl * @vals') %C% @tmpl + dmat(_PSIGLS)
@resids <- sqrt(abs(dmat(@p,1) - solve(@r, @sighat)))
@resids <- @resids * @resids'
delete(@m,@J,@eigs,@tmpl,@vals,@p,@r,@sighat)
vector(delete(@resids, return:T))
%glsresids%
====> goodfit <====
goodfit MACRO DOLLARS
) Usage: goodfit(s,lhat,psihat), s a covar or correl matrix
) Compute three goodness of fit criteria for a factor analysis
) solution: the maximum absolute deviation, the sum of squared
) deviations, and the sum of absolute deviations of the fitted
) covariance matrix lhat %*% lhat' + dmat(psihat) from the actual
) covariance matrix s
))Version 991114
# usage: $S(s,lhat,psihat), s a covar or correl matrix
@lhat <- argvalue($2,"lhat","real matrix nonmissing")
@dev <- vector(argvalue($1,"s","real square") -\
(@lhat %C% @lhat) -\
dmat(argvalue($3[[1]],"psi","nonmissing real vector")))
@tmp <- vector(max(abs(@dev)),sum(@dev^2), sum(abs(@dev)),\
labels:vector("max(|dev|)","sum(dev^2)","sum(|dev|)"))
delete(@dev)
delete(@tmp,return:T)
%goodfit%
====> stepuls <====
stepuls MACRO DOLLARS
) Macro to do 1 step of crude iteration for ULS factor analysis
) usage:
) stepuls(s, psi, m [,print:T])
) s = covariance or correlation matrix
) psi = vector of current values for uniquenesses or a structure
) whose first component is such a vector
) m = number of factors
) With print:T, the new values will be printed
) The output is a structure with components 'psi' and 'loadings'
) stepuls also creates side effect variables PSI, LOADINGS, and
) CRITERION
))Version 000531
# usage: $S(r,psi,m [,print:T]), s=covar or corr, psi=current value
# m = order = number of factors
if ($v != 3 || $k > 1){
error("usage: $S(s, psi, m [print:T])")
}
@s <- matrix(argvalue($1,"s","real square"))
@psi <- argvalue($2[[1]],"psi","positive vector")
@M <- argvalue($3,"number of factors","positive integer scalar")
@print <- keyvalue($K,"print","TF",default:F)
@eigs <- eigen(@s - dmat(@psi))
@vals <- @eigs$values[run(@M)]
if(min(@vals)<= 0){
error("s - psi not positive definite")
}
@l <- sqrt(@vals)' * @eigs$vectors[,run(@M)]
delete(@vals)
@psi <- diag(delete(@s,return:T)) - vector(sum((@l * @l)'))
if(min(@psi) <= 0){
error("min(psi) <= 0")
}
CRITERION <- sum(@eigs$values[-run(@M)]^2)
delete(@eigs)
PSI <- delete(@psi, return:T)
LOADINGS <- delete(@l, return:T)
if(delete(@print,return:T)){ # print goodness of fit & psi
print(psi:PSI ,criterion:CRITERION)
}
structure(psi:PSI, loadings:LOADINGS, crit:CRITERION)
%stepuls%
====> stepml <====
stepml MACRO DOLLARS
) Macro to do 1 step of crude iteration for MLE factor analysis
) usage:
) stepml(s,psi,m [, print:T])
) s = covariance or correlation matrix
) psi = vector of current values for uniquenesses or structure
) whose first component is such a vector
) m = number of factors
) With print:T, the new values will be printed
) The output is a structure with components 'psi' and 'loadings'
) stepml also creates side effect variables PSI, LOADINGS, and
) CRITERION
))Version 000531
# usage: $S(s,psi,m [,print:T]), s=covar or corr, psi=current value
# m=order, print=T or F
if ($v != 3 || $k > 1){
error("usage: $S(s, psi, m [print:T])")
}
@s <- matrix(argvalue($1,"argument 1","real square"))
@M <- argvalue($3,"number of factors","positive integer scalar")
@psi <- argvalue($2[[1]],"psi","positive vector")
@print <- keyvalue($K,"print","TF",default:F)
@eigs <- releigen(@s, dmat(@psi))
@vals <- @eigs$values[run(@M)] - 1
if(min(@vals)<= 0){
error("s - psi not positive definite")
}
@l <- @psi * @eigs$vectors[,run(@M)] * sqrt(@vals)'
delete(@vals)
@psi <- diag(@s) - vector(sum((@l*@l)'))
if(min(@psi) <= 0){
error("non-positive psi; cannot continue")
}
if (haslabels(@s)){
@psi <- vector(@psi,labels:getlabels(@s,1))
}
CRITERION <- sum((@eigs$values - log(@eigs$values)-1)[-run(@M)])
delete(@eigs, @s)
PSI <- delete(@psi, return:T)
LOADINGS <- delete(@l, return:T)
if(delete(@print, return:T)){ # print goodness of fit & psi
print(psi:PSI ,criterion:CRITERION)
}
structure(psi:PSI, loadings:LOADINGS, crit:CRITERION)
%stepml%
====> stepgls <====
stepgls MACRO DOLLARS
) Macro to do 1 step of crude iteration for GLS factor analysis
) usage:
) stepgls(s, psi, m [, print:T])
) s = covariance or correlation matrix
) psi = vector of current values for uniquenesses or a structure
) whose first component is such a vector
) m = number of factors
) With print:T, the new values will be printed
) The output is a structure with components 'psi' and 'loadings'
) stepgls also creates side effect variables PSI, LOADINGS, and
) CRITERION
))Version 000531
# usage: $S(r, psi, m [,print:T]), s=covar or corr, psi=current value
# m = order = number of factors
if ($v != 3 || $k > 1){
error("usage: $S(s, psi, m [print:T])")
}
@s <- matrix(argvalue($1,"argument 1","real square"))
@psi <- argvalue($2[[1]],"psi","positive vector")
@M <- argvalue($3,"number of factors","positive integer scalar")
@print <- keyvalue($K,"print","TF",default:F)
@eigs <- releigen(@s, dmat(@psi))
@vals <- @eigs$values[run(@M)] - 1
if(min(@vals)<= 0){
error("s - psi not positive definite")
}
@l <- @psi * @eigs$vectors[,run(@M)] * sqrt(@vals)'
@sinv <- solve(@s)
@psi <- vector(solve(@sinv^2,\
diag(@sinv) - vector(sum((@l %c% @sinv)^2))))
if(min(@psi) <= 0){
error("non-positive psi; cannot continue")
}
CRITERION <- sum((1-1/@eigs$values[-run(@M)])^2)
delete(@eigs, @sinv, @s)
PSI <- delete(@psi, return:T)
LOADINGS <- delete(@l, return:T)
if(delete(@print,return:T)){ # print goodness of fit & psi
print(psi:PSI ,criterion:CRITERION)
}
structure(psi:PSI, loadings:LOADINGS, crit:CRITERION)
%stepgls%
===> compf <===
compf MACRO DOLLARS
) f <- compf(h,e,fh,fe) ; INS must be defined
) Compute F-to-enter for "out" variables in a forward stepwise
) discriminant analysis, when variables listed in integer vector
) INS are "in".
)
) h and e must be the overall p by p hypothesis and error matrices
) with associated degrees of freedom fh and fe.
) Vector INS must be defined as a list of variables already swept.
) INS[1]==0 means none is in.
)
) compf is obsolete. Use dastepsetup, dastepstatus, daentervar,
) daremovevar and dasteplook to do stepwise discriminant analysis
)) Version 000529, use argvalue()
# usage: f <- $S(h,e,fh,fe) ; INS must be defined
@h <- matrix(argvalue($1,"hypothesis matrix", "real square nonmissing"))
@e <- matrix(argvalue($2,"error matrix", "real square nonmissing"))
@fh <- argvalue($3, "hypothesis DF", "positive count")
@fe <- argvalue($4, "error DF", "positive count")
@p <- nrows(@h)
if (nrows(@e) != @p){
error("hypothesis and error matrices are different sizes")
}
if (alltrue(!isvector(INS,real:T), !isnull(INS))){
print("WARNING: INS not defined; initialized to 0")
INS <- 0
} elseif (!isnull(INS)) {
if(sum(INS != ceiling(INS)) != 0 || min(INS) < 0 || max(INS) > @p ||\
!isscalar(INS) && min(INS) == 0){
error("INS not 0 or vector of positive integers <= nrows(h)")
}
if(length(unique(INS)) != length(INS)){
error("duplicate elements in INS")
}
if(@p < length(INS)) {
error("length(INS) > p = nrows(h); not possible")
}
}
if (alltrue(!isnull(INS), @p == length(INS))){
print("WARNING: all variables IN", macroname:T)
@out <- structure(f:NULL,df:vector(@fh,@fe - @p),ins:INS,outs:NULL)
} else {
if(anytrue(isnull(INS), INS[1]==0)){ # initialize
@h <- diag(@h)
@e <- diag(@e)
OUTS <- run(@p)
} else {
OUTS <- run(@p)[-INS]
@h <- diag(swp(@h + @e,INS))[OUTS]
@e <- diag(swp(@e,INS))[OUTS]
@h <-- @e
@fe <-- length(INS)
}
@out <- structure(f:(@h/@fh)/(@e/@fe),df:vector(@fh,@fe),ins:INS,outs:OUTS)
}
delete(@p,@fh,@fe,@h,@e)
delete(@out,return:T)
%compf%
===> forstep <===
forstep MACRO DOLLARS
) forstep(i,h,e,fh,fe)
) Macro to add i-th variable to list of "in" variables in multigroup
) stepwise discriminant analysis.
) Integer vector INS must be defined and is a list of numbers of
) variables already "in". INS = 0 means no variables are "in"
)
) Returns structure(f, df, ins, outs) where f is the vector of
) F-to-enter values, df = vector(fh,fe) contains their degrees
) of freedom, ins and outs are vectors of variable numbers that
) are "in" and "out" respectively. INS = 0 means no variables are "in"
)
) forstep is obsolete. Use dastepsetup, dastepstatus, daentervar,
) daremovevar and dasteplook to do stepwise discriminant analysis
)) Version 000826 use argvalue
# usage: $S(i,h,e,fh,fe) ; INS must be defined
if ($v != 5){
error("usage is $S(ivar,h,e,fh,fe)")
}
@h <- matrix(argvalue($2,"hypothesis matrix","nonmissingreal square"))
@e <- matrix(argvalue($3,"error matrix","nonmissingreal square"))
if(nrows(@h) != nrows(@e)){
error("hypothesis and error matrices are different sizes")
}
@fh <- argvalue($4,"hypothesis DF","positive count")
@fe <- argvalue($5,"error DF","positive count")
@i <- argvalue($1,"entering variable number","positive count")
if (!isvector(INS, real:T) && !isnull(INS)){
print("WARNING: initializing INS to 0")
INS <- 0
}
if (alltrue(!isnull(INS), match(@i, INS, 0) != 0)){
error("variable already IN")
}
INS <- if(alltrue(!isnull(INS), INS[1] != 0)){
vector(INS,@i)
}else{
@i
}
OUTS <- if(isscalar(OUTS) == 1){
0
}else{
OUTS[OUTS != @i]
}
if (!ismacro(compf)){
getmacros(compf)
}
@out <- compf(@h, @e, @fh, @fe)
delete(@h, @e, @fh, @fe, @i)
delete(@out, return:T)
%forstep%
===> backstep <===
backstep MACRO DOLLARS
) Macro for doing a back step in stepwise discriminant analysis.
) Usage:
) backstep(h,e,fh,fe)
) INS must be defined and contain indices of in variables
) h and e are hypothesis and error matrices with fh and fe degrees of
) freedom
) backstep deletes the variable with smallest F-to-delete, and returns
) structure(f:F_to_delete,df:vector(fh,fe),ins:INS,outs:OUTS)
) INS is updated and OUTS is created
)) Version 000826 Use argvalue()
# usage: $S(h,e,fh,fe)
if ($v != 4){
error("usage is $S(h,e,fe,fh)")
}
@h <- matrix(argvalue($1,"hypothesis matrix","nonmissing square real"))
@e <- matrix(argvalue($2,"error matrix","nonmissing square real"))
@fh <- argvalue($3,"hypothesis DF","positive count")
@fe <- argvalue($4,"error DF","positive count")
@p <- nrows(@h)
if (!isvector(INS,real:T)){
print(paste("WARNING: INS not defined; set to run(",@p,")",sep:""))
INS <- run(@p)
}
if(anytrue(isnull(INS), INS[1] == 0)){
error("no variables left IN")
}
# get submatrices of h & e for variables in INS
@h <- @h[INS,INS]
@e <- @e[INS,INS]
@fe <-- length(INS) - 1
@u <- 1/diag(solve(@h + @e))
@v <- 1/diag(solve(@e))
@f <- ((@u - @v)/@fh)/(@v/@fe)
delete(@u,@v,@h,@e)
@deleted <- grade(@f)[1]
INS <- if(isscalar(INS)){
0
}else{
INS[-@deleted]
}
delete(@deleted)
OUTS <- if(INS[1] == 0){
run(@p)
}else{
run(@p)[-INS]
}
delete(@p)
structure(f:delete(@f,return:T),df:vector(delete(@fh,return:T),\
delete(@fe,return:T)),ins:INS,outs:OUTS)
%backstep%
===> mulvarhelp <===
mulvarhelp MACRO
) Macro to get help on macros in mulvar.mac
# usage $S(topic1 [, topic2 ...] [help keywords])
if(!ismacro(_gethelp)){
getmacros(_gethelp,quiet:T)
}
___HELPFILE_ <- "mulvar.mac"
___INDEXTOP_ <- "mulvar_index"
___MACRO_ <- "$S"
_gethelp($0)
%mulvarhelp%
_E_N_D_O_F_M_A_C_R_O_S_ Marker for end of macros and data
Help file for mulvar.mac for MacAnova
(C) 2000 by Gary W. Oehlert and Christopher Bingham
Updated 000825 CB
!!!! Starting marker for message of the day
!!!! Ending marker for message of the day
???? Starting marker for list of up to 32 comma/newline separated keys
???? Ending marker for keys
====backstep()#
%%%%
backstep(H,E,fh,fe), H and E symmetric REAL matrices of the same size
with no MISSING values
%%%%
NOTE: This macro is OBSOLETE and is retained only for backward
compatibility because it was in file MacAnova.mac in earlier versions of
MacAnova. For doing stepwise variable selection in discriminant analysis
you should use newer macros dastepsetup, daentervar, daremovevar,
dastepstatus and dasteplook.
You can use macro backstep to perform a variable elimination step in
backwards stepwise variable selection in linear discriminant analysis.
Macro backstep is intended to be used after you have used manova() to
compute hypothesis and error matrices H and E, with fh and fe degrees of
freedom respectively.
Status information about the variables currently "in" and "out" is
maintained in integer vectors INS and OUTS containing numbers of variables
currently included and currently excluded. When no variables are "in", INS
= NULL (INS = 0 means the same thing); when all variables are "in", OUTS =
NULL. INS must be initialized, usually to run(p), before backstep can be
used.
backstep(H,E,fh,fe) computes F-to-delete for all variables currently
included in vector INS. It then updates INS by removing the index of
the variable with the smallest F-to-delete, setting INS to 0 if no
variable is left "in". OUTS is computed as run(p) when there are no
variables "in" and as run(p)[-INS] otherwise.
The value returned is structure(f:F_to_delete, df:vector(fh,fe-k+1),
ins:INS,outs:OUTS), where F_to_delete is the vector of F-to-delete
statistics and k = length(INS) before deletion. INS and OUTS are
copies of the updated INS and OUTS vectors. The F-to-delete statistics
have nominal degrees of freedom fh and fe - k + 1.
The variable that is deleted is the one with the smallest F-to-delete
statistic. If this is large enough, you may want to reenter the
variable using forstep. See 'forstep'.
You can initialize things by
Cmd> manova("y = groups",silent:T) # response matrix y, factor groups
Cmd> H <- matrix(SS[2,,]); E <- matrix(SS[3,,])
Cmd> fh <- DF[2]; fe <- DF[3]
Cmd> INS <- run(nrows(H)) # all variables "in"
Macro forstep is available for doing a forward step (variable inclusion).
One difference between backstep and forstep is that backstep determines the
variable to eliminate, and then updates INS and OUTS; you must tell forstep
which variable to include. See 'forstep' for details. See also compf
which computes F-to-enter for variables not in INS. compf does not compute
F-to-delete.
Both forstep and compf are OBSOLETE and are retained only for backward
compatibility.
See also manova(), 'daentervar', 'daremovevar', 'dastepsetup',
'dastepstatus' and 'dasteplook'.
====chiqqplot()#
%%%%
chiqqplot(y [,sqrt:T] [,graphics keywords phrases]), REAL matrix y
chiqqplot(dsq, df [,sqrt:T] [, graphic keyword phrases]), REAL vector
or matrix dsq of squared distances, REAL scalar df > 0
%%%%
chiqqplot is a macro to make a chisquare or sqrt(chisquare) quantile-
quantile (Q-Q) plot of ordered generalized distances against chisquare
or sqrt(chisquare) quantiles.
chiqqplot(y [, graphics keyword phrases]), where y is a REAL matrix
with no MISSING elements, plots the ordered values of generalized
distances against x[i] = invchi((i - 1/2)/n, df), where df = p =
ncols(y). Common graphics keywords are 'xlab', 'ylab', 'title' and
'symbols'. 'symbols' may be used as with chplot(), except that
'symbols:0' labels the points by row number.
The generalized distance of y[i,] from ybar = sum(y)/nrows(y), the
sample mean row vector, is
   dsq[i] = (y[i,] - ybar) %*% solve(s) %*% (y[i,] - ybar)'
where s is the sample variance-covariance matrix with divisor n-1.
When the rows of y are a random sample from a multivariate normal
distribution, dsq[i] is distributed approximately as chi-squared on p
degrees of freedom and the plot should approximate a straight line with
slope 1.
Very commonly, the "data" matrix is RESIDUALS, the matrix of residuals
computed by manova(). The Q-Q plot is a way to assess the multivariate
normality of the errors.
chiqqplot(y, sqrt:T ...) does the same, except the ordered values of
sqrt(dsq[i]) are plotted against sqrt(invchi((i - 1/2)/n, df)).
Again the plot should be linear when y is multivariate normal.
chiqqplot(dsq, df [,sqrt:T] ...) does the same except the ordered
columns of REAL vector or matrix dsq or sqrt(dsq) are plotted against
quantiles of chisquare or sqrt(chisquare) with df degrees of freedom.
When dsq is computed by distcomp(y) and df = ncols(y), this usage is
equivalent to chiqqplot(y [,sqrt:T] ...)
When dsq is a matrix, the columns are separately ordered and plotted
with different symbols. With symbols:0, the symbols are the row numbers
when ncols(dsq) = 1 and are the column numbers otherwise.
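For example, to check multivariate normality of MANOVA residuals
(assuming response matrix y and factor groups already exist):
Cmd> manova("y = groups",silent:T)
Cmd> chiqqplot(RESIDUALS,sqrt:T,title:"Q-Q plot of residual distances")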
chiqqplot uses macros covar and distcomp.
====compf()#
%%%%
f <- compf(h,e,fh,fe), REAL symmetric matrices h and e, positive
integers fh and fe; integer vector INS must be defined
%%%%
NOTE: This macro is OBSOLETE and is retained only for backward
compatibility because it was in file MacAnova.mac in earlier versions of
MacAnova. For doing stepwise variable selection in discriminant analysis
you should use newer macros dastepsetup, daentervar, daremovevar,
dastepstatus and dasteplook.
Macro compf computes Fs-to-enter at any stage in stepwise variable
selection in linear discriminant analysis.
You can use compf after manova() has computed hypothesis and error
matrices H and E, with fh and fe degrees of freedom respectively.
Integer vector INS must be defined, containing the variable numbers
that are "in". INS = 0 means no variables are in and the Fs-to-enter
are simply the separate ANOVA Fs. When INS != 0, the Fs-to-enter are
the analysis of covariance Fs for each "out" variable, with the "in"
variables being used as covariates.
compf(H,E,fh,fe) returns structure(f:f_to_enter,df:vector(fh,fe-k),
ins:INS, outs:OUTS), where OUTS is run(p) when INS = 0, is NULL when
INS contains all integers 1, ..., p, and is run(p)[-INS] otherwise,
where p = ncols(H). k is the number of variables "in". The F-to-enter
statistics have nominal degrees of freedom fh and fe - k.
Here is an example of starting forward stepwise variable selection.
Cmd> manova("y = groups",silent:T)#, response matrix y, factor groups
Cmd> H <- matrix(SS[2,,]); E <- matrix(SS[3,,])
Cmd> fh <- DF[2]; fe <- DF[3]
Cmd> INS <- 0 # no variables in
Cmd> results <- compf(H,E,fh,fe)
Cmd> j <- grade(results$f,down:T)[1] # index of largest F
Cmd> # now continue with forstep(j,H,E,fh,fe)
====covar()#
%%%%
covar(x), REAL matrix x with no MISSING values, returns
structure(n:sampleSize,mean:xbar,covariance:s)
%%%%
covar(x), where x is a REAL matrix with no MISSING values computes the
sample mean and variance-covariance matrix of x. It returns
structure(n:N, mean:xbar, covariance:s)
where N = nrows(x) is the sample size, xbar is a row vector of the
sample means of the columns, and s is the sample variance-covariance
matrix (with divisor N - 1).
Macro covar is obsolete but is retained for backward compatibility.
Essentially the same output can be obtained from tabs(y, count:T,
covar:T, mean:T) which returns structure(mean:means, covar:s,count:n).
The components of covar() output differ from those of tabs() in three
ways:
(a) Component names ('covariance' instead of 'covar' and 'n'
instead of 'count')
(b) The value of 'mean' is a row vector (1 by ncols(x)) for covar and
a (column) vector for tabs()
(c) The value of covar component 'n' is the scalar N, while tabs()
component 'count' is rep(N,ncols(x)), with a count for each column
of x
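For example, assuming x is an n by p data matrix with no MISSING values:
Cmd> stats <- covar(x)
Cmd> stats$mean # 1 by p row vector of column means
Cmd> stats$covariance # p by p matrix with divisor stats$n - 1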
See also tabs() and groupcovar.
====daentervar()#
%%%%
daentervar(var1 [,var2 ...] [,silent:T]), var1, var2 ... names
or numbers of variables to be entered
%%%%
daentervar(j), where j is a positive integer, enters dependent variable
j as part of a stepwise dependent variable selection, usually as one
stage in stepwise discriminant analysis. This is what is sometimes
called a "forward" step. It is an error if variable j is already an
"in" variable.
daentervar updates variable _DASTEPSTATE which encapsulates the current
state of the variable selection process and prints a report with the
new values of F-to-enter or F-to-remove, and their P-values. See topics
'_DASTEPSTATE' and 'dastepsetup' for more details. It returns a copy of
the updated _DASTEPSTATE as an invisible variable that can be assigned
but is not normally printed.
You normally choose the variable to enter as the variable with the
largest value of F-to-enter as printed by macro dastepsetup,
dastepstatus or a preceding use of daentervar.
daentervar(varname), where varname is the quoted or unquoted name of
a variable in the model, does the same. Thus if the original data
matrix had column labels "SepLen", "SepWid", "PetLen" and "PetWid",
either daentervar(SepWid) or daentervar("SepWid") would be equivalent
to daentervar(2).
daentervar(j1, j2, ...) and daentervar(varname1, varname2, ...)
successively enter several variables, printing a report after each is
entered. The value returned is _DASTEPSTATE after all the variables
have been entered.
daentervar(j1 [,j2, ...], silent:T) and daentervar(varname1 [,varname2,
...], silent:T) do the same, except no report is printed. This would
normally be followed by dastepstatus() to print a report after all the
variables have been entered.
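For example, assuming dastepsetup("y = groups") has already been run:
Cmd> daentervar(2) # enter variable 2 and print a report
Cmd> daentervar(1, 3, silent:T); dastepstatus() # enter 1 and 3, then report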
See also 'dastepstatus', 'daremovevar' and 'dasteplook'.
====daremovevar()#
%%%%
daremovevar(var1 [,var2 ...] [,silent:T]), var1, var2 ... names or
numbers of variables to be removed
%%%%
daremovevar(j), where j is a positive integer, removes dependent
variable j as part of a stepwise dependent variable selection, usually
as one stage in stepwise discriminant analysis. This is what is
sometimes called a "backward" step. It is an error if variable j is
not an "in" variable (is already an "out" variable).
daremovevar updates variable _DASTEPSTATE which encapsulates the
current state of the variable selection process and prints a report with
the new values of F-to-enter or F-to-remove, and their P-values. See
topics '_DASTEPSTATE' and 'dastepsetup' for more details. It returns a
copy of the updated _DASTEPSTATE as an invisible variable that can be
assigned but is not normally printed.
You normally choose which variable to remove as the variable with the
smallest value of F-to-remove as printed by macro dastepsetup,
dastepstatus or a preceding use of daremovevar.
daremovevar(varname), where varname is the quoted or unquoted name of a
variable in the model, does the same. Thus if the original data matrix
had column labels "SepLen", "SepWid", "PetLen" and "PetWid", either
daremovevar(SepWid) or daremovevar("SepWid") would be equivalent to
daremovevar(2).
daremovevar(j1, j2, ...) and daremovevar(varname1, varname2, ...)
successively remove several variables, printing a report after each is
removed. The value returned is _DASTEPSTATE after all the variables
have been removed.
daremovevar(j1 [,j2, ...], silent:T) and daremovevar(varname1
[,varname2,...], silent:T) do the same, except no report is printed.
This would normally be followed by dastepstatus() to print a report
after all the variables have been removed.
See also 'dastepstatus', 'daentervar' and 'dasteplook'.
====dasteplook()#
%%%%
dasteplook(name1, name2, ...), names selected from 'model', 'hpluse',
'e', 'in', 'F', 'fh', 'fe', and 'history'
dasteplook(all)
%%%%
dasteplook(compname), where compname is the name of a component of
variable _DASTEPSTATE, that is one of 'model', 'hpluse', 'e', 'in',
'F', 'fh', 'fe', and 'history', returns that component of
_DASTEPSTATE. See topic '_DASTEPSTATE' for details on these components.
compname can either be quoted, as in dasteplook("hpluse"), or unquoted,
as in dasteplook(hpluse).
dasteplook(compname1, compname2, ...), where the compnames are quoted
or unquoted _DASTEPSTATE component names, returns a structure
consisting of those components.
dasteplook(all) or dasteplook("all") return all of _DASTEPSTATE as
value.
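For example, assuming a stepwise selection is in progress:
Cmd> Fstats <- dasteplook(F) # just the F statistics
Cmd> dasteplook(in, history) # structure with components 'in' and 'history'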
dasteplook is designed for use in a macro which controls the use of
macros dastepsetup, daentervar, daremovevar to carry out an entire
stepwise dependent variable selection.
See also topics 'dastepsetup', 'daentervar', 'daremovevar' and
'dastepstatus'.
====dastepsetup()#
%%%%
dastepsetup([Model] [,allin:T or in:logvec] [,silent:T]), Model a
CHARACTER scalar glm model, usually of the form "y = groups"
%%%%
You use macro dastepsetup at the start of a forward or backward
stepwise selection of dependent variables in a discriminant analysis or
more generally in a multivariate linear model.
What it actually does is create and initialize invisible variable
_DASTEPSTATE which encapsulates information on which dependent
variables are "in" and which are "out" at any stage of the variable
selection process. See topic '_DASTEPSTATE'.
The most common use is in stepwise linear discriminant analysis where
you are trying to select a subset of response variables that effectively
discriminate among two or more groups. It can also be used in any
linear model when you are trying to select a subset of response
variables that are responsible for any violation of the overall null
hypothesis H0: all model coefficients except constant term are 0.
dastepsetup(Model), where Model is a CHARACTER scalar specifying a GLM
model, initializes _DASTEPSTATE so that no variables are "in" and all
are "out". This is appropriate at the start of forward stepwise
dependent variable selection. In linear discriminant analysis, Model
has the form "y = groups", where groups is a factor defining the groups
to be discriminated.
A report of the current status is printed. This includes all the
F-to-enter statistics and their P-values.
A copy of _DASTEPSTATE is returned as an "invisible" variable which can
be assigned but is not automatically printed.
dastepsetup(Model, silent:T) does the same, except the printed report
is suppressed.
dastepsetup([silent:T]) does the same, except variable STRMODEL,
usually the most recent GLM model used, is taken as Model.
dastepsetup([Model], allin:T [,silent:T]) does the same, except that
all response variables are "in" and no variables are "out". Component
'history' of _DASTEPSTATE is initialized to run(p), where p is the
number of variables.
dastepsetup([Model], in:ins [,silent:T]), where ins is a LOGICAL vector
of length p, does the same, except only variables j1, j2 , ... are "in"
where ins[j1], ins[j2] ... are T and the remaining elements are F.
Component 'history' is initialized to vector(j1,j2,...).
After dastepsetup, your next step is to use daentervar to enter a new
variable or daremovevar to remove a variable. The choice of which
variable to enter or remove is usually made on the basis of the
F-to-enter and/or F-to-remove statistics in the printed report.
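A forward selection might begin like this (assuming response matrix y
and factor groups already exist):
Cmd> dastepsetup("y = groups") # all variables "out"; F-to-enter printed
Cmd> daentervar(3) # enter variable 3 if it has the largest F-to-enter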
See also topics 'daentervar', 'daremovevar', 'dastepstatus' and
'dasteplook'.
====_DASTEPSTATE%#
%%%%
Invisible variable _DASTEPSTATE encapsulates the current state of a
process of stepwise variable selection in linear discriminant analysis.
You use macro dastepsetup to initialize _DASTEPSTATE, macros daentervar
and daremovevar to update it, macro dastepstatus to report F-to-enter
and F-to-remove statistics in _DASTEPSTATE and macro dasteplook to
extract information from _DASTEPSTATE.
%%%%
_DASTEPSTATE is an invisible variable which encapsulates the current
state of a process of stepwise response variable selection for a
multivariate linear model. The model is most commonly of the form "y =
groups", groups a factor, and the stepwise process corresponds to
stepwise linear discriminant analysis. You normally don't need to be
concerned with _DASTEPSTATE itself, since all interaction with it can
be done using macros.
In the following, p = number of variables, E = p by p error matrix from
manova() and H = p by p hypothesis matrix. For a model of the form "y
= groups", H = matrix(SS[2,,]) and E = matrix(SS[3,,]). H excludes
contributions from a constant term. See topic 'models'.
At a particular stage there are from 0 to p "in" variables, variables
tentatively considered important in rejecting the null hypothesis H0
associated with H; the remainder are "out" and either have not yet been
considered or are tentatively considered unimportant in rejecting H0.
At each stage there is an F-to-enter statistic for each "out"
variable, if any. This is the F-statistic in an analysis of covariance
of the variable with the "in" variables as covariates. When no
variables are "in", this is just the usual ANOVA F-statistic for the
variable.
Similarly, at each stage, there is an F-to-remove statistic for each
"in" variable, if any. This is the F-to-enter statistic for the
variable that would be computed if it were to be removed and turned
into an "out" variable.
_DASTEPSTATE has the form
structure(model:glmModel, hpluse:(H+E)swept, e:Eswept, in:ins,\
F:F_stats, fh:hypDF, fe:errDF, history:hist)
glmModel CHARACTER scalar specifying a GLM model, usually of the
form "y = groups", groups a factor
ins LOGICAL vector of length p with ins[j] = T when variable
j is "in"
(H+E)swept H+E when no variables are "in"; swp(H+E,run(p)[ins]),
otherwise; that is, any "in" variables have been swept
Eswept E when no variables are "in"; swp(E,run(p)[ins])
otherwise
F_stats A REAL vector of F-statistics, with F[ins] being values
of F-to-remove and F[!ins] being values of F-to-enter
hypDF Numerator d.f. of F-to-enter and F-to-remove = fh, the
degrees of freedom associated with H
errDF Denominator d.f. of F_statistics = fe - k - 1 for "in"
variables and fe - k otherwise, where fe are the error d.f.
and k = sum(ins) = number of "in" variables
hist An integer vector summarizing the path followed to
reach the current state. When hist[i] = j > 0, variable
j was entered at step i; when hist[i] = -j < 0, variable
j was removed at step i.
You normally use macro dastepsetup to initialize _DASTEPSTATE.
dastepsetup uses manova() to find E, H, fh and fe. When variables j1,
j2, ... jk are specified as being "in", hist is initialized to
vector(j1,...,jk); when no variables are "in" at the start, hist is
NULL. dastepsetup calls dastepstatus to compute component 'F' and
optionally report on the initial status.
You use macros daentervar and daremovevar to update _DASTEPSTATE by
changing an "out" variable to an "in" or vice versa. These optionally
print a report of the new status.
You use macro dastepstatus to compute component 'F' and optionally
print a report of the information in _DASTEPSTATE.
You use macro dasteplook to extract components of _DASTEPSTATE without
changing it.
Macros daentervar, daremovevar, dastepsetup and dastepstatus return
the new value of _DASTEPSTATE as an "invisible" result which can be
assigned but is not printed.
See also topics 'dastepsetup', 'daentervar', 'daremovevar',
'dasteplook' and 'dastepstatus'.
====dastepstatus()#
%%%%
dastepstatus([silent:T])
%%%%
dastepstatus() computes and prints out the values of F-to-enter or
F-to-remove, and their P-values at the current stage of stepwise
dependent variable selection. You can use this information to select
the next variable to be entered (the one with the largest F-to-enter)
or remove (the one with the smallest F-to-remove). Component
'F' of variable _DASTEPSTATE is updated.
dastepstatus returns as value an "invisible" copy of variable
_DASTEPSTATE. This can be assigned but is not normally printed.
_DASTEPSTATE must have been previously initialized or updated by macros
dastepsetup, daentervar or daremovevar.
dastepstatus(silent:T) sets component 'F' of _DASTEPSTATE
and returns an invisible copy of _DASTEPSTATE but does not print the
F-statistics.
You ordinarily don't need to use dastepstatus directly since macros
daentervar, daremovevar and dastepsetup all use dastepstatus to
report F-statistic values and their P-values. An exception is when you
want to enter or remove several variables at once and want to see a
report only at the end. Suppose, for example, that you want to add
variables 2 and 4 together, but don't want to see a report after
variable 2 is entered.
Cmd> daentervar(2,4,silent:T) # silently enter variables 2 and 4
Cmd> dastepstatus() # print report
See also topics 'dastepsetup', 'daentervar', 'daremovevar',
'dasteplook' and '_DASTEPSTATE'.
====discrim()#
%%%%
discrim(groups, y), vector of positive integers groups, REAL matrix y
with no MISSING values
%%%%
discrim(groups, y), where groups is a factor or an integer vector, and
y is a REAL data matrix with no MISSING elements, computes the
coefficients of linear discriminant functions that can be used to
classify an observation into one of the populations specified by
argument groups.
The functions being estimated are optimal in the case when the
distribution in each population is multivariate normal and the
variance-covariance matrices are the same for all populations.
When there are g = max(groups) populations, and p = ncols(y) variables,
the value returned is structure(coefs:L, add:C) where L is a REAL p by
g matrix and C is a 1 by g row vector.
If y is a length p vector of data to be classified to one of the
populations, then f = L' %*% y + C' is the vector of discriminant
function values (scores) for the g populations.
If f[j] = max(f) is the largest element of f, then, assuming the g
populations are equally probable (each has prior probability 1/g),
population j is the most probable population based on y.
If P is a length g vector such that P[j] = prior probability a randomly
selected case belongs to population j, then the estimated
posterior probability that y belongs to population k is
P[k]*exp(f[k])/sum(P * exp(f)) =
P[k]*exp(f[k] - f[1])/sum(P * exp(f - f[1]))
The second form is preferred since exp(f[k]) can be too large to
compute.
When Y is an m by p data matrix whose rows are to be classified, F = Y
%*% L + C is an m by g matrix, with F[i,j] containing the value of the
discriminant function for population j evaluated with the data in row i
of Y. An m by g matrix of posterior probabilities for each group and
case can be computed by
P * exp(F - F[,1])/((P * exp(F - F[,1])) %*% rep(1,g))
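As an illustration outside MacAnova, here is a minimal pure-Python
sketch of the stabilized posterior-probability formula for a single
case; the scores f and priors P below are hypothetical numbers, not
output of discrim.

```python
import math

# Hypothetical discriminant scores f for g = 3 groups, equal priors P
f = [2.0, 5.0, 3.5]
P = [1.0 / 3, 1.0 / 3, 1.0 / 3]

# Stabilized form: subtracting f[0] before exponentiating avoids
# overflow when the scores are large; the ratios are unchanged.
w = [Pj * math.exp(fj - f[0]) for Pj, fj in zip(P, f)]
post = [wj / sum(w) for wj in w]   # posterior probabilities, sum to 1
```

The group with the largest score also has the largest posterior
probability, whatever constant is subtracted.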
NOTE: It is well known that posterior probabilities computed for a case
that is in the "training set", the data set from which a classification method
was estimated, are biased in an "optimistic" direction: The estimated
posterior probability for its actual population is biased upward. For
this reason posterior probabilities should be estimated only for cases
that are not in the training set. See macro jackknife for a partial
remedy.
====discrimquad()#
%%%%
discrimquad(groups, y), factor or vector of positive integers groups,
REAL matrix y with nrows(y) = length(groups)
%%%%
discrimquad(groups, y), where groups is a factor or an integer vector,
and y is a REAL data matrix, computes the coefficients of quadratic
discriminant functions that can be used to classify an observation into
one of the populations specified by argument groups.
It is an error if the smallest group has p or fewer members or if y has
any MISSING elements.
When there are g = max(groups) populations, and p = ncols(y) variables,
the value returned is structure(Q:q, L:l, addcon:c, grandmean:ybar),
where the components are as follows:
q structure(Q1,Q2,...Qg), each Qj a REAL p by p matrix
l structure(L1,L2,...Lg), each Lj a REAL vector of length p
c vector(c1,c2,...cg), cj REAL scalars
ybar vector(ybar1,...ybarp), the vector of column means
When x is a vector of length p to be classified, the quadratic score
for group j is
qs[j] = (x-ybar)' %*% q[j] %*% (x-ybar) + (x-ybar)' %*% l[j] + c[j]
The functions are optimal in the case when the distribution in each
population is multivariate normal with no assumption that the variance-
covariance matrices are the same for all populations.
When P = vector(P1,P2,...,Pg) is a vector of prior probabilities that a
randomly selected case comes from the various populations, then the
posterior probabilities are the elements of the vector
P*exp(qs)/sum(P*exp(qs)) = P*exp(qs - qs[1])/sum(P*exp(qs - qs[1]))
The latter form is usually preferable since it is possible for
exp(qs[1]) to be so large as to be uncomputable. These probabilities
can be computed using macro probsquad.
NOTE: It is well known that posterior probabilities computed for a case
that is in the "training set", the data set from which a classification method
was estimated, are biased in an "optimistic" direction: The estimated
posterior probability for its actual population is biased upward. For this
reason posterior probabilities should be estimated only for cases that are
not in the training set.
See also discrim and probsquad.
====distcomp()#
%%%%
distcomp(y), REAL matrix y with no MISSING values
%%%%
distcomp(y) computes the generalized distances of each row of REAL
matrix y from the mean of all the rows. y must not have any missing
elements.
When y is n by p, the result is the length nrows(y) vector
d = diag((y - ybar') %*% solve(s) %*% (y - ybar')'),
where ybar is the vector of column means and s is the sample covariance
matrix of y. The individual elements d[i] are
d[i] = (y[i,]' - ybar)' %*% solve(s) %*% (y[i,]' - ybar).
Very commonly matrix y is the matrix RESIDUALS computed by manova().
See also chiqqplot.
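For illustration, here is a pure-Python sketch of the same computation
for p = 2 variables; the data matrix y is hypothetical, and the 2 by 2
inverse is written out by hand instead of calling solve().

```python
# Hypothetical 4 x 2 data matrix
y = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.5], [4.0, 8.0]]
n, p = len(y), 2

ybar = [sum(row[j] for row in y) / n for j in range(p)]

# Sample covariance matrix s (divisor n - 1)
s = [[sum((row[i] - ybar[i]) * (row[j] - ybar[j]) for row in y) / (n - 1)
      for j in range(p)] for i in range(p)]

# Explicit inverse of the 2 x 2 matrix s
det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
sinv = [[ s[1][1] / det, -s[0][1] / det],
        [-s[1][0] / det,  s[0][0] / det]]

# d[i] = (y[i,]' - ybar)' %*% solve(s) %*% (y[i,]' - ybar)
d = []
for row in y:
    c = [row[j] - ybar[j] for j in range(p)]
    d.append(sum(c[i] * sinv[i][j] * c[j]
                 for i in range(p) for j in range(p)))
```

A useful check on such a computation: when s is the sample covariance
matrix of y itself, the distances always sum to (n - 1)*p.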
====forstep()#
%%%%
forstep(i,H,E,fh,fe), integer i > 0, fh > 0, fe > 0, REAL symmetric
matrices H and E with no MISSING values
%%%%
NOTE: This macro is OBSOLETE and is retained only for backward
compatibility because it was in file MacAnova.mac in earlier versions of
MacAnova. For doing stepwise variable selection in discriminant analysis
you should use newer macros dastepsetup, daentervar, daremovevar,
dastepstatus and dasteplook.
Macro forstep performs a variable inclusion step in forward stepwise
variable selection in linear discriminant analysis.
forstep is intended to be used after you have used manova("y =
groups"), where y is a data matrix and groups is a factor, to compute
hypothesis and error matrices H = matrix(SS[2,,]) and E =
matrix(SS[3,,]), with fh = DF[2] and fe = DF[3] degrees of freedom
respectively.
Status information about the variables currently "in" and "out" is
maintained in integer vectors INS and OUTS containing numbers of
variables currently included and currently excluded. When no variables
are "in", INS = 0; when all variables are "in", OUTS = NULL. INS must
be initialized, usually to 0, before forstep can be used.
forstep(j,H,E,fh,fe), where j is the number of a variable not currently
"in", adds j to INS and removes it from OUTS, and then uses macro compf
to compute F-to-enter for all variables included in the updated INS.
The Fs-to-enter are the analysis of covariance Fs for each "out"
variable, with the "in" variables being used as covariates. See topic
'compf'. When no variables are "in", the Fs-to-enter are the ordinary
ANOVA F-statistics for each variable.
The value returned (which will normally be printed if not assigned) is
structure(f:F_to_enter, df:vector(fh,fe-k), ins:INS, outs:OUTS), where
F_to_enter is the vector of F-to-enter statistics, one for each
variable not in INS, and INS and OUTS are copies of the status vectors
INS and OUTS. k is the number of variables currently "in".
The F-to-enter statistics have nominal degrees of freedom fh and fe -
k. The next variable to be entered, if any, is normally the variable
with the largest F-to-enter. The decision to enter it is based on the
size of F-to-enter.
You can somewhat automate the start of this process as follows:
Cmd> manova("y = groups", silent:T) # response matrix y, factor groups
Cmd> H <- matrix(SS[2,,]); E <- matrix(SS[3,,])
Cmd> fh <- DF[2]; fe <- DF[3]
Cmd> INS <- 0; stuff <- compf(H,E,fh,fe)
Cmd> stuff <- forstep(OUTS[grade(stuff$f,down:T)[1]],H,E,fh,fe);
The last step can be repeated to enter additional variables. Of course,
in practice, you want to examine the computed F-to-enter statistics to
see if another variable *should* be entered.
You can do a backward step (variable deletion) using macro backstep. One
difference between backstep and forstep is that backstep determines the
variable to eliminate, and then updates INS and OUTS; you must tell forstep
which variable to include. See 'backstep' for details. See also 'compf'
which computes F-to-enter for variables not in INS.
Both backstep and compf are OBSOLETE and are retained only for backward
compatibility.
See also manova(), 'daentervar', 'daremovevar', 'dastepsetup',
'dastepstatus' and 'dasteplook'.
====glsfactor()#
%%%%
glsfactor(s, m [, quiet:T, silent:T, start:psi0, uselogs:T, maxit:nmax,
minit:nmin, print:T, crit:crvec]), REAL symmetric matrix s with no
MISSING values, integers nmax > 0 and nmin > 0, crvec =
vector(numsig, nsigsq, delta)
%%%%
glsfactor is a macro to find GLS (generalized least squares) estimates
of the vector psi of factor analysis uniquenesses and the loading
matrix L by means of nonlinear least squares. It uses macro glsresids
to compute residuals used by non-linear least squares macro levmar.
glsfactor(r, m) estimates uniquenesses and loadings for an m-factor
analysis based on p by p symmetric correlation or variance-covariance
matrix r. m > 0 must be an integer < (2*p + 1 - sqrt(8*p+1))/2. The
default starting value for the uniquenesses is psi0 = 1/diag(solve(r)).
glsfactor prints the estimated uniquenesses psihat and loading matrix,
the minimized criterion value, and vals, a vector of numbers computed
from certain eigenvalues; see below.
The result is an "invisible" (assignable but not automatically printed)
variable of the form
structure(psihat:psi,loadings:L,criterion:crit,eigenvals:vals,\
status:iconv,iter:n)
The values of these components are as follows
psi length p vector of estimated uniquenesses
L p by m matrix of estimated loadings satisfying
L' %*% dmat(1/psi) %*% L is diagonal
crit sum of squared "residuals" of r from rhat
vals 1/v - 1, where v is the vector of eigenvalues of
dmat(psi) %*% solve(r) in increasing order
iconv integer >= 0 indicating convergence status, with 0 and 4
indicating non-convergence
n number of iterations
There are several keyword phrases that can be used as arguments to
control the behavior of glsfactor.
start:psi0 psi0 a REAL vector of length p with min(psi0) > 0
to be used as starting values for the iteration
uselogs:T log(psi) is used in the iteration instead of psi;
this ensures psi does not become negative and may
speed convergence
maxit:nmax No more than nmax iterations are to be used; the
default is 30
minit:nmin At least nmin iterations will be performed; the
default is 1
crit:crvec crvec = vector(numsig, nsigsq, delta), 3 criteria
for convergence; a negative criterion is ignored;
see macro levmar for details
quiet:T uniquenesses, loadings and vals are not printed;
only certain warning messages are printed
silent:T nothing is printed except error messages
print:T macro levmar will print a status report on every
iteration
Values of iconv indicate the following situations
iconv Meaning
0 Not converged
1 Converged with relative change in psi or log(psi) < 10^-numsig
2 Converged with relative change in rss <= 10^-nsigsq
3 Converged with ||gradient|| < delta
4 Iteration stopped; could not reduce rss.
Without uselogs:T, there is no guarantee that min(psihat) >= 0.
Even with uselogs:T, there is no guarantee that the minimum is inside
the permissible range (r - dmat(psihat) positive semi-definite);
however, this often leads to an 'argument to solve() singular' error
message.
Every time it is called by levmar, macro glsresids saves a copy of the
current value of psi in invisible variable _PSIGLS. Type
'print(_PSIGLS)' to see this value.
glsfactor is very much a work in progress that may in fact be
abandoned in favor of other methods using optimization macros in macro
file Math.mac distributed with MacAnova.
See also ulsfactor and glsresids.
====glsresids()#
%%%%
glsresids is used by macro glsfactor to do GLS factor extraction by a
nonlinear least squares method.
%%%%
glsresids is used by macro glsfactor which does generalized least
squares (GLS) factor extraction.
glsresids(theta,0,vector(r),m) returns a length p^2 "residual vector"
computed from but not identical with dmat(p,1) - solve(r) %*% rhat, where
r is a REAL p by p correlation or variance-covariance matrix. m > 0 is
an integer specifying the number of factors to extract.
When __USELOGSGLS, a LOGICAL scalar defined by glsfactor, has value
False, argument theta is interpreted as psi, the length(p) REAL vector
of uniquenesses. When __USELOGSGLS is True, theta is interpreted as
log(psi). glsresids saves a copy of psi in variable _PSIGLS.
rhat = dmat(psi) + V, where V is the best rank m approximation to r -
dmat(psi) in a generalized least squares sense.
====goodfit()#
%%%%
goodfit(s,lhat,psihat), s a REAL symmetric matrix, lhat a REAL vector
or matrix, psihat a REAL vector, all with no MISSING values
%%%%
goodfit(r, L, psi), where r is a p by p symmetric correlation or
variance-covariance matrix, L is a REAL p by m matrix of purported
factor loadings, and psi is a length p REAL vector of factor analysis
uniquenesses, computes three measures of lack of fit of r to the factor
analytic approximation rhat = L %*% L' + dmat(psi).
If dev = vector(r - rhat) is the length p^2 vector of deviations of r
from rhat, then goodfit returns
vector(max(abs(dev)), sum(dev^2), sum(abs(dev)))
When L and psi have been computed by some factor analysis estimation
method, one or more large elements of the result may indicate lack of
fit of r to the factor analytic model.
These are intended to describe how bad the fit is, not to test it.
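To show the three measures concretely, here is a pure-Python sketch
(not MacAnova) using a hypothetical 3 by 3 correlation matrix that a
one-factor model with these L and psi fits exactly, so all three
measures come out essentially 0.

```python
# Hypothetical inputs: r fits a 1-factor model with L and psi exactly
r   = [[1.00, 0.63, 0.45],
       [0.63, 1.00, 0.35],
       [0.45, 0.35, 1.00]]
L   = [0.9, 0.7, 0.5]          # p by 1 loading "matrix"
psi = [0.19, 0.51, 0.75]       # uniquenesses, psi[i] = 1 - L[i]^2
p = len(L)

# rhat = L %*% L' + dmat(psi)
rhat = [[L[i] * L[j] + (psi[i] if i == j else 0.0) for j in range(p)]
        for i in range(p)]

dev = [r[i][j] - rhat[i][j] for i in range(p) for j in range(p)]
measures = [max(abs(dv) for dv in dev),   # max absolute deviation
            sum(dv * dv for dv in dev),   # sum of squared deviations
            sum(abs(dv) for dv in dev)]   # sum of absolute deviations
```

With real estimates the deviations are nonzero, and the three numbers
summarize how far r is from the fitted rhat.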
====groupcovar()#
%%%%
groupcovar(groups, y), factor or vector of positive integers groups,
REAL matrix y with no MISSING values
%%%%
groupcovar(G, y), where G is a length N factor or vector of positive
integers and y is a REAL N by p data matrix with no MISSING values,
computes group means and the pooled sample covariance matrix for the
groups defined by the elements of G. It is an error if no group has at
least 2 members.
The result is structure(n:n, means:ybar, covariance:V).
n = vector(n1, ...,ng) is the vector of group sample sizes, where g =
max(G).
ybar is a g by p REAL matrix with ybar[i,] = sample mean for
group i. When group i is empty, ybar[i,] is all MISSING.
V is the pooled variance-covariance matrix. When all the groups are
non-empty, V = ((n1-1)*S1 + (n2-1)*S2 + ... + (ng-1)*Sg)/(N - g)
where S1, ..., Sg are the sample variance-covariance matrices for
the g groups.
See also tabs().
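To make the pooled formula concrete, here is a small pure-Python sketch
(not MacAnova) for the case p = 1, where each Sj is a scalar variance;
the grouping vector and data are hypothetical.

```python
# Hypothetical grouping vector and univariate data (p = 1)
groups = [1, 1, 1, 2, 2]
y = [1.0, 2.0, 3.0, 10.0, 14.0]
g, N = max(groups), len(y)

means = []
pooled_ss = 0.0                 # accumulates (nj - 1) * Sj
for j in range(1, g + 1):
    yj = [yi for yi, gi in zip(y, groups) if gi == j]
    mj = sum(yj) / len(yj)
    means.append(mj)
    pooled_ss += sum((yi - mj) ** 2 for yi in yj)

V = pooled_ss / (N - g)         # pooled variance (a 1 x 1 "matrix")
```

Note the divisor N - g, not N - 1: one degree of freedom is lost per
group mean.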
====hotellval()#
%%%%
hotellval(x [, pval:T]), REAL matrix x with no MISSING elements
%%%%
hotellval(x), where x is a n by p REAL matrix with no MISSING elements,
returns a REAL scalar containing Hotelling's T^2 statistic for testing
the null hypothesis H0: mu = 0. mu' is the expectation of each row of
x when the rows are assumed to be a random sample from a multivariate
population.
It is an error if n <= p.
hotellval(x, pval:T) does the same, except a P-value is also computed
under the assumption that the rows of x are independent multivariate
normal. The result is structure(hotelling:tsq, pvalue:pval), where tsq
and pval are REAL scalars.
Examples:
Cmd> hotellval(x - mu0', pval:T)
where mu0 is a REAL vector of length p, computes a test statistic and
associated P-value for testing H0: mu = mu0.
Cmd> hotellval(x1 - x2, pval:T)
where x1 and x2 are both n by p, computes Hotelling's paired T^2
statistic for testing H0: E(x1 - x2) = 0, together with a P-value.
See also hotell2val, tval() and t2val().
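For intuition, when p = 1 Hotelling's T^2 reduces to the square of the
ordinary one-sample t statistic, T^2 = n*xbar^2/s^2. A pure-Python
sketch with a hypothetical sample:

```python
# Hypothetical univariate sample (p = 1); testing H0: mu = 0
x = [1.2, 0.8, 1.5, 0.9, 1.1]
n = len(x)

xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance

tsq = n * xbar ** 2 / s2   # square of the one-sample t statistic
```

For p > 1 the scalar 1/s2 is replaced by the inverse covariance matrix,
giving T^2 = n * xbar' %*% solve(s) %*% xbar.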
====hotell2val()#
%%%%
hotell2val(x1, x2 [, pval:T]), REAL matrices x1 and x2 with no MISSING
elements and ncols(x1) = ncols(x2)
%%%%
hotell2val(x1, x2), where x1 and x2 are REAL matrices with n1 and n2
rows and p columns, returns Hotelling's two-sample T^2 statistic for
testing the null hypothesis that the rows of x1 have the same mean as
the rows of x2 when they are considered as independent random samples
from two multivariate populations. The statistic assumes that the
variance-covariance matrix is the same in the two populations.
It is an error if n1 + n2 <= p.
hotell2val(x1, x2, pval:T) does the same, except it also computes a
P-value under the assumption that the rows of x1 and of x2 constitute
independent random samples from multivariate normal distributions with
the same variance-covariance matrix. The result is
structure(hotelling:tsq, pvalue:pval).
See also hotellval, tval() and t2val().
====mulvar_index*#
%%%%
Macros related to discriminant analysis
backstep*, compf*, daentervar, daremovevar, dasteplook, dastepsetup,
dastepstatus, discrim, discrimquad, forstep*, jackknife, probsquad
Macros related to Hotelling's T^2
hotellval, hotell2val
Macros related to factor analysis
glsfactor, glsresid, goodfit, stepgls, stepml, stepuls, ulsfactor,
ulsresids
Data generation macro
mvngen
Other data analysis macros
chiqqplot, covar*, distcomp, groupcovar
Other macros
mulvarhelp
* Obsolete; retained for backward compatibility
%%%%
Entries related to discriminant analysis
daentervar Macro to enter a variable in stepwise variable selection
daremovevar Macro to remove a variable in stepwise variable selection
dasteplook Macro to access components of structure _DASTEPSTATE
dastepsetup Macro to initialize structure _DASTEPSTATE
dastepstatus Macro to summarize the contents of structure _DASTEPSTATE
_DASTEPSTATE Description of structure _DASTEPSTATE which summarizes
the current state of the variable selection process
discrim Macro to compute a linear discriminant function, assuming
multivariate normal distributions with same variance-
covariance matrices
discrimquad Macro to compute coefficients for quadratic discrimination,
assuming multivariate normal distributions with differing
variance-covariance matrices
jackknife Macro to estimate mis-classification probabilities using
the leave-one-out method
probsquad Macro to compute posterior probabilities based on quadratic
discriminant functions computed by macro discrimquad
The following are obsolete but are retained for backward compatibility
backstep Macro to do single backward step in variable selection
compf Macro to compute F-to-enter in stepwise variable selection
forstep Macro to do single forward step in variable selection
Macros related to Hotelling's T^2
hotellval Macro to compute single sample Hotelling's T^2
hotell2val Macro to compute two sample Hotelling's T^2
Macros related to factor analysis
glsfactor Macro to do generalized least squares (GLS) factor
extraction using a method based on a non-linear least
squares computation
glsresid Macro used by glsfactor
ulsfactor Macro to do unweighted least squares (ULS) factor
extraction using a method based on a non-linear least
squares computation
ulsresid Macro used by ulsfactor
goodfit Macro that computes several descriptive measures of the
closeness of the sample correlation or covariance matrix
to an estimate based on the factor analytic model
stepgls Macro to compute one step of an iterative method of
generalized least squares (GLS) factor extraction.
stepml Macro to compute one step of an iterative method of
maximum likelihood (ML) factor extraction.
stepuls Macro to compute one step of an iterative method of
unweighted least squares (ULS) factor extraction. This
is essentially the iterated principal factor method.
Data generation macro
mvngen Macro to compute a pseudorandom sample from a multivariate
normal distribution
Other data analysis macros
chiqqplot Macro to make a chi-squared or sqrt(chi-squared) Q-Q
plot using generalized distances, usually computed
from residuals from a multivariate linear model
covar Macro to compute the sample mean and covariance matrix
from a multivariate sample
distcomp Macro to compute generalized distances of the
rows of a matrix from the vector of column means
groupcovar Macro to compute the group means and pooled variance-
covariance matrix from multi-group data
Other macros
mulvarhelp Macro to print help and usage information related to
macros in file Mulvar.mac.
====jackknife()#
%%%%
probs <- jackknife(groups,y [,prior:P]), factor or vector of positive
integers groups, REAL matrix y and positive vector P with no MISSING
elements
%%%%
jackknife(G, y), where G is a factor or vector of positive integers of
length n and y is a REAL n by p matrix with no MISSING elements, carries
out a jackknife validation of linear discriminant functions designed to
discriminate among the g groups defined by the levels of G.
When you try to estimate the error rate of a classification method by
counting the errors it makes in classifying the cases in the "training
sample", the data set you are using to estimate the method, your estimate
is biased in an optimistic direction. That is, the proportion of cases
misclassified in the training sample tends to be smaller than the
proportion of cases misclassified in a new sample (validation sample).
jackknife attempts to avoid this bias by classifying each case in the
training sample with linear discriminant functions computed from all the
other cases in the training sample. This is the "leave-one-out" method,
sometimes called the Lachenbruch-Mickey method.
Macro jackknife returns an n by g+1 matrix probs.
probs[i,j], for j = 1,...,g is an estimate of the posterior probability
that the data in y[i,] were derived from population j.
probs[i,g+1] is an integer from 1 to g indicating the population in
which the case should be classified, that is, the population for which
the posterior probability is largest.
For each i, 1 <= i <= n, the posterior probabilities probs[i,j], j =
1,..., g are computed as follows.
The linear discriminant function based on y[-i,], that is using all the
data except row i, and the discriminant functions scores for the data
in y[i,] are computed. From these the posterior probabilities are
computed assuming equal prior probability 1/g for each of the groups.
Each group is assumed to be multivariate normal with the same variance-
covariance matrix in each group.
Because the discriminant functions used to classify y[i,] are computed
without using y[i,], the method is close to unbiased.
jackknife(G,y,prior:P), where P is a REAL vector of length n with no
MISSING elements, does the same except the posterior probabilities are
computed using P.
Here is how you might use jackknife to estimate the expected
probability of misclassification, assuming the prior probability that a
randomly selected case comes from population j is P[j].
Cmd> probs <- jackknife(G, y, prior:P)
Cmd> n <- tabs(,G,count:T) # vector of sample sizes
Cmd> misclassprob <- tabs(,G,probs[,g+1],count:T)/n
Cmd> misclassprob[hconcat(run(g),run(g))] <- 0 # set diags to 0
Cmd> sum(misclassprob' %*% P)
The last line computes the estimated probability that a case randomly
selected from a group with prior probabilities P will be misclassified
by the linear discriminant functions estimated from y.
misclassprob[i,j] with i != j is an estimate of the probability that a
case from population i is misclassified as population j.
This version of jackknife is relatively fast since it computes the
successive leave-one-out discriminant functions by modification of the
discriminant functions using all the data, rather than starting fresh.
====mulvarhelp()#
%%%%
mulvarhelp(topic1 [, topic2 ...] [,usage:T])
mulvarhelp(index:T)
%%%%
mulvarhelp(topicname) prints help on a topic related to file
Mulvar.mac. Usually topicname is the name of a macro in the file.
When quoted, topicname may contain "wildcard" characters "*" and "?".
You can also use help keyword 'key'. See help() for details.
mulvarhelp(topicname1, topicname2, ...) prints help on more than one
topic.
mulvarhelp(topicname1 [, topicname2 ...], usage:T) prints just a brief
summary of usage for each topic.
====mvngen()#
%%%%
mvngen(n , p), integers n > 0, p > 1
mvngen(n, sigma), REAL positive definite symmetric matrix sigma,
nrows(sigma) > 1
mvngen(n, p or sigma, mu), REAL vector mu with length(mu) = p or
ncols(sigma)
%%%%
mvngen(n, p), where n > 0 and p >= 2 are integer scalars, returns an n
by p matrix whose rows are a random sample from the standard
multivariate normal distribution MVN(0, I_p), where I_p is the p by p
identity matrix.
mvngen(n, p, mu) where mu is a REAL vector of length p does the same,
except the rows of the result are MVN(mu, I_p).
mvngen(n, sigma [,mu]), where sigma is a positive definite p by p
symmetric matrix, does the same, except the rows of the result are
MVN(0, sigma) or MVN(mu, sigma).
When p = 1, use mu + sd*rnorm(n) to generate a normal sample with mean
mu and standard deviation sd.
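The standard construction behind such a generator: if A is any matrix
with A %*% A' = sigma, then mu + A %*% z is MVN(mu, sigma) when z is a
vector of independent standard normals. A pure-Python sketch (not
MacAnova) for p = 2, with hypothetical sigma and mu and the Cholesky
factor A written out by hand:

```python
import math, random

# Hypothetical 2 x 2 positive definite sigma and mean vector mu
sigma = [[4.0, 2.0], [2.0, 3.0]]
mu = [10.0, -5.0]

# Lower-triangular Cholesky factor A of sigma, written out for p = 2
a11 = math.sqrt(sigma[0][0])              # 2.0
a21 = sigma[1][0] / a11                   # 1.0
a22 = math.sqrt(sigma[1][1] - a21 ** 2)   # sqrt(2)

def mvngen2(n, seed=0):
    """Return n rows sampled from MVN(mu, sigma)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append([mu[0] + a11 * z1, mu[1] + a21 * z1 + a22 * z2])
    return out

sample = mvngen2(5)
```

The identities a11*a21 = sigma[2,1] and a21^2 + a22^2 = sigma[2,2]
confirm A %*% A' = sigma.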
====probsquad()#
%%%%
probsquad(x,structure(Q,L,addcon,grandmean) [, prior:P]), REAL matrix x
with no MISSING elements, argument 2 a structure as returned by macro
discrimquad, REAL vector P of prior probabilities with P[i] > 0,
default = rep(1/g,g)
%%%%
Suppose info = structure(Q,L,addcon,grandmean) as computed by
discrimquad(groups, y) summarizes quadratic discriminant functions
estimated from a n by p matrix of data from g groups.
probsquad(x, info), where x is a REAL vector of length p, returns the
length g vector of posterior probabilities, assuming multivariate
normality in each group and equal prior probability 1/g for each group.
probsquad(x, info, prior:P), where P is a length g vector with P[j] >
0, does the same using P[j]/sum(P) as prior probability of group j.
In both these usages x can be an m by p matrix, each row of which is an
observation, and probsquad returns an m by g matrix of posterior
probabilities.
See discrimquad for information about the form of info.
====stepgls()#
%%%%
stepgls(s, psi, m [, print:T]), REAL positive definite symmetric matrix
s, REAL vector psi or structure with psi[1] a REAL vector, integer m
> 0
%%%%
stepgls(r, psi, m) performs one step of an iteration which attempts to
extract m factors in factor analysis. r is a p by p correlation or
variance-covariance matrix, psi is a REAL vector of trial uniquenesses
of length p with min(psi) > 0 and integer m > 0.
The value returned is structure(psi:newpsi, loadings:L, crit:criterion),
where newpsi is an updated vector of uniquenesses, L is a p by m matrix
of factor loadings satisfying L' %*% dmat(1/psi) %*% L is diagonal, and
criterion is a value of the criterion being minimized; see below.
In addition, stepgls creates side effect variables PSI, LOADINGS, and
CRITERION containing newpsi, L and criterion.
Note that L contains the loadings that minimize the criterion for the
argument psi, not the output newpsi, and criterion is the criterion
associated with psi and L.
Each step reduces the generalized least squares (GLS) criterion =
trace((I - solve(r) %*% rhat) %*% (I - solve(r) %*% rhat)), where rhat
has the form rhat = dmat(psi) + V, where V is the unique rank m matrix
that minimizes this criterion for given psi.
Argument psi can also have the form structure(psi1, ...) where psi1 is
the vector of uniquenesses. This means you can do many steps of the
iteration as follows.
Cmd> psi <- psi0 # psi0 a vector of starting values
Cmd> for (i,1,500){psi <- stepgls(r,psi,m);;} # 500 steps
stepgls(r, psi, m, print:T) does the same except the updated psi and the
criterion value are printed. For example,
Cmd> for (i,1,500){psi <- stepgls(r,psi,m,print:i %% 50 == 0);;}
prints out psi and the criterion every 50 steps.
The iteration performed by stepgls is similar but not identical to what
is known as Principal Factor Iteration. It is not unusual for it to
converge very slowly. In addition, it is possible for a step to
produce a psi with min(psi) < 0 or r - dmat(psi) not non-negative
definite. When this happens stepgls aborts. You can retrieve the most
recent uniquenesses and loadings from variables PSI and LOADINGS.
It is usually preferable to use a method that more directly minimizes
the criterion, such as macro glsfactor. In addition, one of the
function minimization macros, dfp, bfs or broyden could be used to
minimize the criterion as a function of psi or log(psi).
====stepml()#
%%%%
stepml(r, psi, m [, print:T]), REAL positive definite symmetric matrix
r, REAL vector psi or structure with psi[1] a REAL vector, integer m
> 0
%%%%
stepml(r, psi, m) performs one step of an iteration which attempts to
extract m factors in factor analysis. r is a p by p correlation or
variance-covariance matrix, psi is a REAL vector of trial uniquenesses
of length p with min(psi) > 0 and integer m > 0.
The value returned is structure(psi:newpsi, loadings:L, crit:criterion),
where newpsi is an updated vector of uniquenesses, L a p by m matrix of
factor loadings satisfying L' %*% dmat(1/psi) %*% L is diagonal, and
criterion is a value of the criterion being minimized; see below.
In addition, stepml also creates side effect variables PSI, LOADINGS,
and CRITERION containing newpsi, L and criterion.
Actually L is the loadings matrix that minimizes the criterion for
argument psi, not the output newpsi, and criterion is the criterion
associated with psi and L.
Each step reduces the likelihood (ML) criterion = log(det(rhat)) +
trace(r %*% solve(rhat)) - log(det(r)) - p, where rhat has the form
rhat = dmat(psi) + V where V is the unique rank m matrix that minimizes
this criterion for given psi.
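This is the standard ML discrepancy function for factor analysis (with
a + sign on the trace term, so it is zero when rhat equals r).  A
minimal numpy evaluation of it, for illustration only:

```python
import numpy as np

def ml_crit(r, rhat):
    """ML criterion log(det(rhat)) + trace(r @ solve(rhat)) - log(det(r)) - p.

    Zero when rhat == r; positive for any other positive definite rhat.
    """
    p = r.shape[0]
    # slogdet avoids overflow/underflow in the determinants
    _, ld_rhat = np.linalg.slogdet(rhat)
    _, ld_r = np.linalg.slogdet(r)
    return ld_rhat + np.trace(r @ np.linalg.inv(rhat)) - ld_r - p
```

Note that at rhat = r the trace term equals p and the log-determinants
cancel, so the criterion is 0, its minimum over positive definite rhat.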
Argument psi can also have the form structure(psi1, ...) where psi1 is
the vector of uniquenesses. This means you can do many steps of the
iteration as follows.
Cmd> psi <- psi0 # psi0 a vector of starting values
Cmd> for (i,1,500){psi <- stepml(r,psi,m);;}
stepml(r, psi, m, print:T) does the same except the updated psi and the
criterion value are printed. For example,
Cmd> for (i,1,500){psi <- stepml(r,psi,m,print:i %% 50 == 0);;}
prints out psi and the criterion every 50 steps.
The iteration performed by stepml is similar but not identical to what
is known as Principal Factor Iteration. It is not unusual for it to
converge very slowly. In addition, it is possible for a step to produce
a psi with min(psi) < 0 or r - dmat(psi) not non-negative definite. When
this happens stepml aborts. You can retrieve the most recent
uniquenesses and loadings from variables PSI and LOADINGS.
It is usually preferable to use a method that more directly minimizes
the criterion. At present, none such is distributed with MacAnova.
However, one of the function minimization macros, dfp, bfs or broyden
could be used to minimize the criterion as a function of psi or
log(psi).
====stepuls()#
%%%%
stepuls(r, psi, m [, print:T]), REAL positive definite symmetric matrix
r, REAL vector psi or structure with psi[1] a REAL vector, integer m
> 0
%%%%
stepuls(r, psi, m) performs one step of an iteration which attempts to
extract m factors in factor analysis. r is a p by p correlation or
variance-covariance matrix, psi is a REAL vector of trial uniquenesses
of length p with min(psi) > 0 and integer m > 0.
The value returned is structure(psi:newpsi, loadings:L, crit:criterion),
where newpsi is an updated vector of uniquenesses, L a p by m matrix of
factor loadings satisfying L' %*% L is diagonal, and criterion is a
value of the criterion being minimized; see below.
In addition, stepuls also creates side effect variables PSI, LOADINGS,
and CRITERION containing newpsi, L and criterion.
Actually L is the loadings matrix that minimizes the criterion for
argument psi, not the output newpsi, and criterion is the criterion
associated with psi and L.
Each step reduces the unweighted least squares (ULS) criterion =
trace((r - rhat) %*% (r - rhat)), where rhat has the form rhat =
dmat(psi) + V where V is the unique rank m matrix that minimizes this
criterion for given psi.
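For the ULS criterion the best rank m matrix V for given psi is the
truncated eigendecomposition of r - dmat(psi), which is what makes one
principal-factor step easy to write down.  A plain numpy sketch of one
such step (an illustration of the mathematics, not the stepuls macro
itself):

```python
import numpy as np

def uls_step(r, psi, m):
    """One ULS / principal-factor step.

    Computes the best rank-m V = L @ L.T for the current psi via the
    m largest eigenpairs of r - diag(psi), the ULS criterion for the
    resulting rhat, and the updated uniquenesses diag(r - L @ L.T).
    Columns of L are scaled eigenvectors, so L.T @ L is diagonal.
    """
    vals, vecs = np.linalg.eigh(r - np.diag(psi))
    idx = np.argsort(vals)[::-1][:m]          # m largest eigenvalues
    L = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
    rhat = np.diag(psi) + L @ L.T
    crit = np.trace((r - rhat) @ (r - rhat))  # ULS criterion
    newpsi = np.diag(r - L @ L.T)             # updated uniquenesses
    return newpsi, L, crit
```

When r is exactly dmat(psi) + L0 %*% L0' for some rank m loadings L0,
one step reproduces psi and drives the criterion to zero.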
Argument psi can also have the form structure(psi1, ...) where psi1 is
the vector of uniquenesses. This means you can do many steps of the
iteration as follows.
Cmd> psi <- psi0 # psi0 a vector of starting values
Cmd> for (i,1,500){psi <- stepuls(r,psi,m);;}
stepuls(r, psi, m, print:T) does the same except the updated psi and the
criterion value are printed. For example,
Cmd> for (i,1,500){psi <- stepuls(r,psi,m,print:i %% 50 == 0);;}
prints out psi and the criterion every 50 steps.
The iteration performed by stepuls is also known as Principal Factor
Iteration. It is not unusual for it to converge very slowly. In
addition, it is possible for a step to produce a psi with min(psi) < 0
or r - dmat(psi) not non-negative definite. When this happens stepuls
aborts. You can retrieve the most recent uniquenesses and loadings
from variables PSI and LOADINGS.
It is usually preferable to use a method that more directly minimizes
the criterion, such as macro ulsfactor. In addition, one of the
function minimization macros, dfp, bfs or broyden could be used to
minimize the criterion as a function of psi or log(psi).
====ulsfactor()#
%%%%
ulsfactor(s, m [, quiet:T, silent:T, start:psi0, uselogs:T, maxit:nmax,
minit:nmin, print:T, crit:crvec]), REAL symmetric matrix s with no
MISSING values, integers nmax > 0 and nmin > 0, crvec =
vector(numsig, nsigsq, delta)
%%%%
ulsfactor is a macro to find ULS (unweighted least squares) estimates
of the vector psi of factor analysis uniquenesses and of the loading
matrix L by means of nonlinear least squares. It uses macro ulsresids
to compute the residuals required by non-linear least squares macro
levmar.
ulsfactor(r, m) estimates uniquenesses and loadings for an m-factor
analysis based on the p by p symmetric correlation or variance-
covariance matrix r. m > 0 must be an integer < (2*p + 1 -
sqrt(8*p+1))/2. The
default starting value for the uniquenesses is psi0 = 1/diag(solve(r)).
ulsfactor prints the estimated uniquenesses psihat and loading matrix,
the minimized criterion value, and vals, a vector of numbers computed
from certain eigenvalues; see below.
The result is an "invisible" (assignable but not automatically printed)
variable of the form
structure(psihat:psi,loadings:L,criterion:crit,eigenvals:vals,\
status:iconv,iter:n)
The values of these components are as follows
psi length p vector of estimated uniquenesses
L p by m matrix of estimated loadings satisfying
L' %*% L is diagonal
crit sum of squared "residuals" = vector(r - rhat)
vals v, where v is the vector of eigenvalues of r - dmat(psi)
iconv integer >= 0 indicating convergence status, with 0 and 4
indicating non-convergence
n number of iterations
There are several keyword phrases that can be used as arguments to
control the behavior of ulsfactor.
start:psi0 psi0 a REAL vector of length p with min(psi0) > 0
to be used as starting values for the iteration
uselogs:T log(psi) is used in the iteration instead of psi;
this ensures psi does not become negative
maxit:nmax No more than nmax iterations are to be used; the
default is 30
minit:nmin At least nmin iterations will be performed; the
default is 1
crit:crvec crvec = vector(numsig, nsigsq, delta), 3 criteria
for convergence; a negative criterion is ignored;
see macro levmar for details
quiet:T uniquenesses, loadings and vals are not printed;
only certain warning messages are printed
silent:T nothing is printed except error messages
print:T macro levmar will print a status report on every
iteration
Values of iconv indicate the following situations
iconv Meaning
0 Not converged
1 Converged with relative change in psi or log(psi) < 10^-numsig
2 Converged with relative change in rss <= 10^-nsigsq
3 Converged with ||gradient|| < delta
4 Iteration stopped; could not reduce rss.
With or without uselogs:T, there is no guarantee that the solution
found is admissible in the sense that r - dmat(psihat) is positive
semi-definite. A warning is printed when it is not. With uselogs:T this
situation is sometimes indicated by an 'argument to solve() singular'
error message.
Without uselogs:T, there is no guarantee that min(psihat) >= 0. A
warning is printed if it is not.
Every time macro ulsresids is called by levmar, it saves a copy of the
current value of psi in invisible variable _PSIULS. Type
print(_PSIULS) to see this value.
ulsfactor is very much a work in progress that may in fact be abandoned
in favor of other methods using optimization macros in macro file
Math.mac distributed with MacAnova.
====ulsresids()#
%%%%
ulsresids is used by macro ulsfactor to do ULS factor extraction by a
nonlinear least squares method.
%%%%
ulsresids is used by macro ulsfactor to do unweighted least squares
(ULS) factor extraction.
ulsresids(theta,0,vector(r),m) returns the residual matrix r - rhat as a
vector of length p^2, where r is a REAL p by p correlation or variance-
covariance matrix. m > 0 is an integer specifying the number of
factors to extract.
When __USELOGSULS, a LOGICAL scalar defined by ulsfactor, has value
False, argument theta is interpreted as psi, the length p REAL vector
of uniquenesses. When __USELOGSULS is True, theta is interpreted as
log(psi). ulsresids saves a copy of psi in variable _PSIULS.
rhat = dmat(psi) + V, where V is the best rank m approximation to r -
dmat(psi) in the least squares sense.
_E_O_F_#This should be the last line, an internal End Of File marker