fpca                   package:psy                   R Documentation

_F_o_c_u_s_e_d _P_r_i_n_c_i_p_a_l _C_o_m_p_o_n_e_n_t_s _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Graphical representation similar to a principal components
     analysis, but adapted to data structured as one dependent variable
     and several independent (explanatory) variables.

_U_s_a_g_e:

     fpca(datafile, y, x, cx=0.75, namesvar=attributes(datafile)$names, pvalues="No", partial="Yes", input="data", contraction="No", sample.size=1)

_A_r_g_u_m_e_n_t_s:

datafile: name of the data frame (a correlation matrix if input="Cor")

       y: column number of the dependent variable

       x: column numbers of the independent (explanatory) variables

      cx: size of the lettering (0.75 by default, 1 for bigger letters,
          0.5 for smaller)

namesvar: labels of the variables (column names by default)

 pvalues: vector of prespecified p values, one per explanatory
          variable (pvalues="No" by default) (see below)

 partial: partial="Yes" by default, corresponds to the original method
          (see below)

   input: input="Cor" for a correlation matrix (input="data" by
          default)

contraction: changes the aspect of the diagram; contraction="Yes" is
          convenient for large data sets (contraction="No" by default)

sample.size: sample size; must be specified if input="Cor"
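     The examples mention a red circle that encloses the variables
     significantly correlated with the dependent variable at p<0.05.
     Assuming this circle rests on the usual t test of a zero
     correlation, its threshold depends on the number of observations,
     which would explain why sample.size is needed when only a
     correlation matrix is supplied. The sketch below computes that
     threshold, r_crit = t / sqrt(n - 2 + t^2), where t is the
     1 - alpha/2 quantile of Student's t with n - 2 degrees of freedom.
     critical_r is a hypothetical helper, not part of psy, and how fpca
     maps the threshold onto a radius on the plot is not shown here.

```r
## Hypothetical helper (not part of psy): smallest |r| that is
## significant at level alpha for a two-sided t test of H0: rho = 0,
## given a sample of size n. Obtained by solving
## t = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
critical_r <- function(n, alpha = 0.05) {
  stopifnot(n > 2, alpha > 0, alpha < 1)
  t <- qt(1 - alpha / 2, df = n - 2)   # critical t value
  t / sqrt(n - 2 + t^2)                # corresponding critical |r|
}

## Threshold for the sample size used in the second example below
critical_r(60)
```

     For instance, with sample.size=60 as in the second example,
     correlations whose absolute value exceeds critical_r(60) would be
     the ones reaching p<0.05 under this assumption.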

_D_e_t_a_i_l_s:

     This representation is close to a Principal Components Analysis
     (PCA). Unlike PCA, however, it represents faithfully the
     correlations between the dependent variable and each of the other
     variables. The relationships between the non-dependent variables
     are interpreted as in a PCA: correlated variables lie close
     together, or diametrically opposite for negative correlations, and
     uncorrelated variables form a right angle with the origin. The
     focus on the dependent variable formally amounts to partialling
     the dependent variable out of the correlations between the
     non-dependent variables (see reference). To avoid this
     partialisation, use partial="No". It may also be useful to
     represent graphically the strength of association between the
     dependent variable and the other variables using p values obtained
     from a model; in this case a vector of p values can be supplied
     through the pvalues argument.
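     The partialisation mentioned above corresponds to the classical
     first-order partial correlation: for two explanatory variables i
     and j and the dependent variable y,
     r_ij.y = (r_ij - r_iy * r_jy) / sqrt((1 - r_iy^2) * (1 - r_jy^2)).
     The sketch below applies this formula to a whole correlation
     matrix. partial_given_y is a hypothetical helper, not part of psy;
     it only shows which correlations are partialised, not how fpca
     builds its plot.

```r
## Hypothetical helper (not part of psy): correlations between the
## explanatory variables x after partialling out the dependent variable y.
## R is a correlation matrix (or a data frame holding one);
## y and x are column indices into R.
partial_given_y <- function(R, y, x) {
  R <- as.matrix(R)
  ry <- R[x, y]                              # correlation of each x with y
  num <- R[x, x] - outer(ry, ry)             # r_ij - r_iy * r_jy
  den <- sqrt(outer(1 - ry^2, 1 - ry^2))     # sqrt((1 - r_iy^2)(1 - r_jy^2))
  num / den                                  # diagonal equals 1
}

## Small simulated example: column 1 plays the role of y
set.seed(1)
d <- matrix(rnorm(200), ncol = 4)
d[, 2] <- d[, 2] + d[, 1]
d[, 3] <- d[, 3] - 0.5 * d[, 1]
partial_given_y(cor(d), 1, 2:4)
```

     With the psy sleep data loaded, the same helper could be applied to
     the pairwise correlation matrix of the second example, e.g.
     partial_given_y(cor(sleep[,2:11], use="pairwise.complete.obs"), 4,
     c(1:3,6:10)). The result matches the correlations between the
     residuals of each explanatory variable regressed on y.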

_V_a_l_u_e:

     A plot (q plots in fact).

_A_u_t_h_o_r(_s):

     Bruno Falissard, Bill Morphey

_R_e_f_e_r_e_n_c_e_s:

     Falissard B, Focused Principal Components Analysis: looking at a
     correlation matrix with a particular interest in a given variable.
     Journal of Computational and Graphical Statistics (1999), 8(4):
     906-912.

_E_x_a_m_p_l_e_s:

     data(sleep)
     fpca(sleep,5,c(2:4,7:11)) 
     ## focused PCA of the duration of paradoxical sleep (dreams, 5th column)
     ## against constitutional variables in mammals (columns 2, 3, 4, 7, 8, 9, 10, 11).
     ## Variables inside the red circle are significantly correlated
     ## with the dependent variable (p<0.05).
     ## Green variables are positively correlated with the dependent variable,
     ## yellow variables are negatively correlated with it.
     ## There are three clear clusters of independent variables.

     corsleep <- as.data.frame(cor(sleep[,2:11],use="pairwise.complete.obs"))
     fpca(corsleep,4,c(1:3,6:10),input="Cor",sample.size=60) 
     ## when missing data are numerous, a pairwise correlation matrix may be
     ## preferred, even though its mathematical properties are weaker (such a
     ## matrix is not guaranteed to be positive semi-definite)

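     numer <- c(2:4,7:11)
     resu <- numeric(length(numer))
     for(i in seq_along(numer))
     {
     int <- sleep[,numer[i]]
     mod <- lm(sleep$Paradoxical.sleep~int)
     ## p value of the slope, multiplied by the sign of the slope
     ## (negative values indicate a negative association)
     resu[i] <- coef(summary(mod))[2,4]*sign(coef(summary(mod))[2,1])
     }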
     fpca(sleep,5,c(2:4,7:11),pvalues=resu)
     ## A representation with p values
     ## When input="Cor" or a vector of p values is supplied, partial is set to "No"

     mod <- lm(Paradoxical.sleep~Body.weight+Brain.weight+Slow.wave.sleep+
     Maximum.life.span+Gestation.time+Predation+Sleep.exposure+Danger,
     data=sleep)
     ## signed p values of the 8 slopes (row 1 of the table is the intercept)
     resu <- coef(summary(mod))[2:9,4]*sign(coef(summary(mod))[2:9,1])
     fpca(sleep,5,c(2:4,7:11),pvalues=resu)
     ## A representation with p values which come from a multiple linear model
     ## (here results are difficult to interpret)

