sn.em                   package:sn                   R Documentation

_F_i_t_t_i_n_g _S_k_e_w-_n_o_r_m_a_l _v_a_r_i_a_b_l_e_s _u_s_i_n_g _t_h_e _E_M _a_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a skew-normal (SN) distribution to data, or fits a linear
     regression model with skew-normal errors, using the EM algorithm
     to locate the maximum likelihood estimate (MLE). The maximization
     can be global, or it can hold some components of the parameter
     vector fixed.

_U_s_a_g_e:

     sn.em(X, y, fixed, p.eps=0.0001, l.eps=0.01, trace=FALSE, data=FALSE)

_A_r_g_u_m_e_n_t_s:

       y: a vector containing the observed variable. This is the
          response variable in the case of linear regression.

       X: a matrix of explanatory variables. If 'X' is missing, then a
          one-column matrix of all 1's is created. If 'X' is supplied,
          and an intercept term is required, then it must include a
          column of 1's. 

   fixed: a vector of length 3, indicating which components of the
          parameter vector must be regarded as fixed. With
          'fixed=c(NA,NA,NA)', which is the default setting, a global
          maximization is performed. If the 3rd component is given a
          value, then maximization is performed keeping the shape
          parameter fixed at that value. If the 2nd and 3rd components
          are both given values, then the scale and the shape
          parameters are kept fixed. No other patterns of fixed
          values are allowed.

   p.eps: numerical value which regulates the parameter convergence
          tolerance. 

   l.eps: numerical value which regulates the log-likelihood
          convergence tolerance. 

   trace: logical value which controls printing of information on the
          convergence of the algorithm. If 'trace=TRUE', details are
          printed. Default value is 'trace=FALSE'.

    data: logical value. If 'data=TRUE', the returned list includes the
          original data. Default value is 'data=FALSE'. 
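
     The patterns allowed for 'fixed' can be checked before the fit is
     started. The sketch below is illustrative only: 'check_fixed' is a
     hypothetical helper written in base R, not a function of package
     'sn', and it encodes just the three patterns described above.

```r
# Hypothetical helper (not part of package 'sn'): returns TRUE when
# 'fixed' follows one of the patterns documented for sn.em().
check_fixed <- function(fixed = c(NA, NA, NA)) {
  if (length(fixed) != 3) return(FALSE)
  is_free <- is.na(fixed)
  # Allowed: nothing fixed; shape fixed; scale and shape fixed.
  # The first component must always be left free (NA).
  allowed <- list(c(TRUE, TRUE, TRUE),
                  c(TRUE, TRUE, FALSE),
                  c(TRUE, FALSE, FALSE))
  any(vapply(allowed, function(p) identical(is_free, p), logical(1)))
}

check_fixed(c(NA, NA, NA))  # global maximization
check_fixed(c(NA, NA, 3))   # shape fixed at 3
check_fixed(c(NA, 2, 3))    # scale fixed at 2, shape fixed at 3
check_fixed(c(NA, 2, NA))   # scale fixed alone: not an allowed pattern
```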

_D_e_t_a_i_l_s:

     The function works using the direct parametrization; on
     convergence, the output is then given in both parametrizations.
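
     For a single SN variable (no covariates), the two parametrizations
     are linked by the standard moment formulas of the SN distribution:
     the centred parameters are the mean, the standard deviation and
     the skewness index. The sketch below illustrates this in base R,
     assuming the direct parameters are ordered as location, scale,
     shape; 'dp_to_cp_sketch' is a hypothetical name, and the package's
     own conversion functions (see 'cp.to.dp') are the ones to use in
     practice.

```r
# Hypothetical illustration (not the package's code): maps the direct
# parameters (location, scale, shape) of a scalar SN variable to its
# mean, standard deviation and skewness index.
dp_to_cp_sketch <- function(dp) {
  dp <- unname(dp)
  location <- dp[1]; scale <- dp[2]; shape <- dp[3]
  delta <- shape / sqrt(1 + shape^2)
  mu_z  <- sqrt(2 / pi) * delta   # mean of the standardized SN variable
  sd_z  <- sqrt(1 - mu_z^2)       # its standard deviation
  c(mean     = location + scale * mu_z,
    sd       = scale * sd_z,
    skewness = (4 - pi) / 2 * (mu_z / sd_z)^3)
}

# With shape = 0 the SN distribution is the normal one, so the centred
# parameters are the location, the scale and a zero skewness.
dp_to_cp_sketch(c(1, 2, 0))
```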

     This function is based on the EM algorithm; it is generally quite
     slow, but it appears to be very robust. See 'sn.mle' for an
     alternative method, which also returns standard errors.

_V_a_l_u_e:

     a list with the following components:

      dp: a vector of the direct parameters, as explained in the
          references below. 

      cp: a vector of the centred parameters, as explained in the
          references below. 

    logL: the log-likelihood at convergence.

    data: optionally (if 'data=TRUE'), a list containing 'X' and 'y',
          as supplied on input, and a vector of 'residuals', which
          should have an approximate SN distribution with 'location=0'
          and 'scale=1', in the direct parametrization.
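
     One way to judge the residuals is to compare them with the SN
     density having location 0 and scale 1, which Azzalini (1985) gives
     as 2*dnorm(z)*pnorm(shape*z). The sketch below writes that density
     in base R; 'dsn_std_sketch' is a hypothetical name (the package
     function 'dsn' serves the same purpose), and the shape value 3 is
     purely illustrative.

```r
# Hypothetical illustration: SN density with location 0 and scale 1.
dsn_std_sketch <- function(z, shape = 0) {
  2 * dnorm(z) * pnorm(shape * z)
}

# The density integrates to 1 for any shape value.
integrate(dsn_std_sketch, -Inf, Inf, shape = 3)$value

# With shape = 0 it reduces to the standard normal density.
all.equal(dsn_std_sketch(0.5, shape = 0), dnorm(0.5))
```

     For a fit obtained with 'data=TRUE', a histogram of the returned
     residuals could be overlaid with this curve, taking as shape the
     estimated value (the last component of 'dp').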

_B_a_c_k_g_r_o_u_n_d:

     Background information on the SN distribution is given by Azzalini
     (1985). See Azzalini and Capitanio (1999) for a more detailed
     discussion of the direct and centred parametrizations.

_R_e_f_e_r_e_n_c_e_s:

     Azzalini, A. (1985). A class of distributions which includes the
     normal ones. _Scand. J. Statist._ *12*, 171-178.

     Azzalini, A. and Capitanio, A. (1999). Statistical applications of
     the multivariate skew-normal distribution. _J. Roy. Statist. Soc.
     B_ *61*, 579-602.

_S_e_e _A_l_s_o:

     'dsn', 'sn.mle', 'cp.to.dp'

_E_x_a_m_p_l_e_s:

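     data(ais)
     attach(ais)
     # fit an SN distribution to a single variable
     a <- sn.em(y=bmi)
     # regression with SN errors; the column of 1's gives the intercept
     a <- sn.em(X=cbind(1,lbm,lbm^2), y=bmi)
     # design matrix built with model.matrix, which adds the intercept
     M <- model.matrix(~lbm+I(ais$sex))
     b <- sn.em(M, bmi)
     # scale fixed at 2 and shape fixed at 3; only the location is estimated
     fit <- sn.em(y=bmi, fixed=c(NA, 2, 3), l.eps=0.001)
     detach(ais)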

