mmglm              package:HiddenMarkov              R Documentation

_M_a_r_k_o_v _M_o_d_u_l_a_t_e_d _G_L_M _O_b_j_e_c_t

_D_e_s_c_r_i_p_t_i_o_n:

     Creates a Markov modulated generalised linear model object with
     class '"mmglm"'.

_U_s_a_g_e:

     mmglm(x, Pi, delta, family, link, beta, glmformula = formula(y~x1),
           sigma = NA, nonstat = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: a dataframe containing the observed variable (i.e. the
          response variable in the generalised linear model) and the
          covariate. Currently, the response variable must be named 'y'
          and the covariate 'x1'.  Alternatively, 'x' could be
          specified as 'NULL', meaning that the data will be added
          later (e.g. simulated). See Details below for the binomial
          case.

      Pi: is the m times m transition probability matrix of the hidden
          Markov chain.

   delta: is the marginal probability distribution of the m hidden
          states at the first time point.

  family: character string, the GLM family, one of '"gaussian"',
          '"poisson"', '"Gamma"' or '"binomial"'.

    link: character string, the link function. If 'family ==
          "binomial"', then one of '"logit"', '"probit"' or
          '"cloglog"'; else one of '"identity"', '"inverse"' or
          '"log"'.

    beta: a 2 times m matrix containing parameter estimates. The first
          row contains the m intercepts in the linear predictor, one
          for each Markov state, and the second row contains the m
          slope coefficients, one for each Markov state.

glmformula: currently the only model formula is 'y~x1'.

   sigma: a value for each of the m Markov states: if 'family ==
          "gaussian"', it is the variance; if 'family == "Gamma"', it
          is '1/sqrt(shape)'.

 nonstat: logical; if 'TRUE' (the default), the homogeneous Markov
          chain is assumed to be non-stationary.

_D_e_t_a_i_l_s:

     This model assumes that the observed responses are ordered in
     time, together with a covariate at each point. The model is based
     on a simple regression model within the 'glm' framework (see
     McCullagh & Nelder, 1989), but where the coefficients beta_0 and
     beta_1 in the linear predictor vary according to a hidden Markov
     state. The responses are assumed to be conditionally independent
     given the value of the Markov chain.
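     As a quick illustration (parameter values chosen arbitrarily, the
     same as those used in the Examples below), the state-dependent
     linear predictor can be evaluated directly from the columns of
     'beta':

     ```r
     # Sketch only: row 1 of 'beta' holds the intercepts beta_0 and
     # row 2 the slopes beta_1, one column per Markov state.
     beta <- rbind(c(0.1, -0.1), c(1, 5))
     x1 <- 0.5
     eta <- beta[1, ] + beta[2, ] * x1   # one linear predictor per state
     # eta is c(0.6, 2.4): the same covariate value maps to a
     # different mean depending on the hidden state.
     ```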

     If 'family == "binomial"' then the response variable 'y' is
     interpreted as the number of successes. The dataframe 'x' must
     also contain a variable called 'size' being the number of
     Bernoulli trials. This is different from the format used by the
     function 'glm', where 'y' would be a matrix with two columns
     containing the numbers of successes and failures, respectively.
     The different format here allows one to specify the number of
     Bernoulli trials _only_, so that the number of successes or
     failures can be simulated later.
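     A minimal sketch of the dataframe layout for the binomial case
     (the column names 'y', 'x1' and 'size' are as required above; the
     data values are invented for illustration):

     ```r
     # 'y' is the number of successes and 'size' the number of
     # Bernoulli trials; glm() would instead take
     # cbind(successes, failures) as the response.
     x1 <- seq(0.1, 0.9, by = 0.2)
     size <- rep(20, length(x1))
     y <- rbinom(length(x1), size = size, prob = plogis(0.1 + x1))
     dat <- data.frame(y = y, x1 = x1, size = size)
     # Specifying the number of trials only, with the successes to be
     # simulated later:
     dat$y <- NA
     ```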

     When the density function of the response variable is from the
     exponential family (Charnes et al, 1976, Eq. 2.1), the likelihood
     function (Charnes et al, 1976, Eq. 2.4) can be maximised by using
     iterative weighted least squares (Charnes et al, 1976, Eq. 1.1 and
     1.2). This is the method used by the R function 'glm'. In this
     Markov modulated version of the model, the third term of the
     complete data likelihood, as given in Harte (2006, Section 2.3), needs
     to be maximised. This is simply the sum of the individual
     log-likelihood contributions of the response variable weighted by
     the Markov state probabilities calculated in the E-step. This can
     also be maximised using iterative least squares by passing these
     additional weights (Markov state probabilities) into the 'glm'
     function.
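     The weighted M-step can be sketched with an ordinary 'glm' call;
     this is an illustration with invented data and weights, not the
     package's internal code:

     ```r
     set.seed(1)
     n <- 200
     dat <- data.frame(x1 = runif(n))
     dat$y <- rpois(n, lambda = exp(0.2 + 1.5 * dat$x1))
     # 'w' stands in for the E-step state probabilities of one Markov
     # state; here it is arbitrary.
     w <- runif(n)
     # Passing the state probabilities as prior weights reweights each
     # observation's log-likelihood contribution in the IWLS fit.
     fit <- glm(y ~ x1, family = poisson(link = "log"), data = dat,
                weights = w)
     coef(fit)   # weighted estimates of beta_0 and beta_1
     ```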

_V_a_l_u_e:

     A 'list' object with class '"mmglm"', containing the above
     arguments as named components.

_R_e_f_e_r_e_n_c_e_s:

     Charnes, A.; Frome, E.L. & Yu, P.L. (1976). The equivalence of
     generalized least squares and maximum likelihood estimates in the
     exponential family. _JASA_ *71(353)*, 169-171. URL:
     http://dx.doi.org/10.2307/2285762

     Harte, D. (2006). _Mathematical Background Notes for Package
     "HiddenMarkov"._ Statistics Research Associates, Wellington. URL:
     http://homepages.paradise.net.nz/david.harte/SSLib/Manuals/notes.pdf

     McCullagh, P. & Nelder, J.A. (1989). _Generalized Linear Models
     (2nd Edition)._ Chapman and Hall, London.

_E_x_a_m_p_l_e_s:

     delta <- c(0,1)

     Pi <- matrix(c(0.8, 0.2,
                    0.3, 0.7),
                  byrow=TRUE, nrow=2)

     #--------------------------------------------------------
     #     Poisson with log link function

     x <- mmglm(NULL, Pi, delta, family="poisson", link="log",
                beta=rbind(c(0.1, -0.1), c(1, 5)))

     x <- simulate(x, nsim=5000, seed=10)

     y <- BaumWelch(x)

     hist(residuals(y))
     print(summary(y))
     print(logLik(y))

     #--------------------------------------------------------
     #     Binomial with logit link function

     x <- mmglm(NULL, Pi, delta, family="binomial", link="logit",
                beta=rbind(c(0.1, -0.1), c(1, 5)))

     x <- simulate(x, nsim=5000, seed=10)

     y <- BaumWelch(x)

     hist(residuals(y))
     print(summary(y))
     print(logLik(y))

     #--------------------------------------------------------
     #     Gaussian with identity link function

     x <- mmglm(NULL, Pi, delta, family="gaussian", link="identity",
                beta=rbind(c(0.1, -0.1), c(1, 5)), sigma=c(1, 2))

     x <- simulate(x, nsim=5000, seed=10)

     y <- BaumWelch(x)

     hist(residuals(y))
     print(summary(y))
     print(logLik(y))

     #--------------------------------------------------------
     #     Gamma with log link function

     x <- mmglm(NULL, Pi, delta, family="Gamma", link="log",
                beta=rbind(c(2, 1), c(-2, 1.5)), sigma=c(0.2, 0.1))

     x1 <- seq(0.01, 0.99, 0.01)
     plot(x1, exp(x$beta[1,2] + x$beta[2,2]*x1), type="l",
          xlim=c(0,1),ylim=c(0, 10), col="red", lwd=3)
     points(x1, exp(x$beta[1,1] + x$beta[2,1]*x1), type="l",
            col="blue", lwd=3)

     x <- simulate(x, nsim=1000, seed=10)

     points(x$x$x1, x$x$y)

     x$beta[2,] <- c(-3, 4)
     y <- BaumWelch(x, bwcontrol(posdiff=FALSE))

     hist(residuals(y))
     print(summary(y))
     print(logLik(y))

