breakpoints           package:strucchange           R Documentation

_D_a_t_i_n_g _B_r_e_a_k_s

_D_e_s_c_r_i_p_t_i_o_n:

     Computation of breakpoints in regression relationships. Given a
     number of breaks the function computes the optimal breakpoints.

_U_s_a_g_e:

     ## S3 method for class 'formula':
     breakpoints(formula, h = 0.15, breaks = NULL,
         data = list(), ...)
     ## S3 method for class 'breakpointsfull':
     breakpoints(obj, breaks = NULL, ...)
     ## S3 method for class 'breakpointsfull':
     summary(object, breaks = NULL, sort = TRUE,
         format.times = NULL, ...)
     ## S3 method for class 'breakpoints':
     lines(x, breaks = NULL, lty = 2, ...)

     ## S3 method for class 'breakpointsfull':
     coef(object, breaks = NULL, names = NULL, ...)
     ## S3 method for class 'breakpointsfull':
     fitted(object, breaks = NULL, ...)
     ## S3 method for class 'breakpointsfull':
     residuals(object, breaks = NULL, ...)
     ## S3 method for class 'breakpointsfull':
     vcov(object, breaks = NULL, names = NULL,
         het.reg = TRUE, het.err = TRUE, vcov = NULL, sandwich = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a symbolic description for the model in which breakpoints
          will be estimated.

       h: minimal segment size either given as fraction relative to the
          sample size or as an integer giving the minimal number of
          observations in each segment.

  breaks: integer specifying the maximal number of breaks to be
          calculated. By default the maximal number allowed by 'h' is
          used.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from the environment which
          'breakpoints' is called from.

     ...: currently not used.

obj, object: an object of class '"breakpointsfull"'.

    sort: logical. If set to 'TRUE' 'summary' tries to match the
          breakpoints from partitions with different numbers of breaks.

format.times: logical. If set to 'TRUE' a vector of strings with the
          formatted breakdates is printed. See 'breakdates' for more
          information.

       x: an object of class '"breakpoints"'.

     lty: line type.

   names: a character vector giving the names of the segments. If of
          length 1 it is taken to be a generic prefix, e.g.
          '"segment"'.

 het.reg: logical. Should heterogenous regressors be assumed? If set to
          'FALSE' the distribution of the regressors is assumed to be
          homogenous over the segments.

 het.err: logical. Should heterogenous errors be assumed? If set to
          'FALSE' the distribution of the errors is assumed to be
          homogenous over the segments.

    vcov: a function to extract the covariance matrix for the
          coefficients of a fitted model of class '"lm"'.

sandwich: logical. Is the function 'vcov' the sandwich estimator or
          only the middle part?

_D_e_t_a_i_l_s:

     All procedures in this package are concerned with testing or
     assessing deviations from stability in the classical linear
     regression model


                          y_i = x_i' b + u_i


     In many applications it is reasonable to assume that there are m
     breakpoints, where the coefficients shift from one stable
     regression relationship to a different one. Thus, there are m+1
     segments in which the regression coefficients are constant, and
     the model can be rewritten as


 y_i = x_i' b_j + u_i   (i = i_{j-1} + 1, ..., i_j,   j = 1, ..., m+1)


     where j denotes the segment index. In practice the breakpoints i_j
     are rarely given exogenously, but have to be estimated.
     'breakpoints' estimates these breakpoints by minimizing the
     residual sum of squares (RSS) of the equation above.

     The foundation for estimating breaks in time series regression
     models was given by Bai (1994) and was extended to multiple breaks
     by Bai (1997ab) and Bai & Perron (1998). 'breakpoints' implements
     the algorithm described in Bai & Perron (2003) for simultanous
     estimation of multiple breakpoints. The distribution function used
     for the confidence intervals for the breakpoints is given in Bai
     (1997b). The ideas behind this implementation are described in
     Zeileis et al. (2003).

     The algorithm for computing the optimal breakpoints given the
     number of breaks is based on a dynamic programming approach. The
     underlying idea is that of the Bellman principle. The main
     computational effort is to compute a triangular RSS matrix, which
     gives the residual sum of squares for a segment starting at
     observation i and ending at i' with i < i'.

     Given a 'formula' as the first argument, 'breakpoints' computes an
     object of class '"breakpointsfull"' which inherits from
     '"breakpoints"'. This contains in particular the triangular RSS
     matrix and functions to extract an optimal segmentation. A
     'summary' of this object will give the breakpoints (and
     associated) breakdates for all segmentations up to the maximal
     number of breaks together with the associated RSS and BIC. These
     will be plotted if 'plot' is applied and thus visualize the
     minimum BIC estimator of the number of breakpoints. From an object
     of class '"breakpointsfull"' an arbitrary number of 'breaks'
     (admissable by the minimum segment size 'h') can be extracted by
     another application of 'breakpoints', returning an object of class
     '"breakpoints"'. This contains only the breakpoints for the
     specified number of breaks and some model properties (number of
     observations, regressors, time series properties and the
     associated RSS) but not the triangular RSS matrix and related
     extractor functions. The set of breakpoints which is associated by
     default with a '"breakpointsfull"' object is the minimum BIC
     partition.

     Breakpoints are the number of observations that are the last in
     one segment, it is also possible to compute the corresponding
     'breakdates' which are the breakpoints on the underlying time
     scale. The breakdates can be formatted which enhances readability
     in particular for quarterly or monthly time series. For example
     the breakdate '2002.75' of a monthly time series will be formatted
     to '"2002(10)"'. See 'breakdates' for more details.

     From a '"breakpointsfull"' object confidence intervals for the
     breakpoints can be computed using the method of 'confint'. The
     breakdates corresponding to the breakpoints can again be computed
     by 'breakdates'. The breakpoints and their confidence intervals
     can be visualized by 'lines'. Convenience functions are provided
     for extracting the coefficients and covariance matrix, fitted 
     values and residuals of segmented models.

     The log likelihood as well as some information criteria can be
     computed using the methods for the 'logLik' and 'AIC'. As for
     linear models the log likelihood is computed on a normal model and
     the degrees of freedom are the number of regression coefficients
     multiplied by the number of segements plus the number of estimated
     breakpoints plus 1 for the error variance. More details can be
     found on the help page of the method 'logLik.breakpoints'.

     As the maximum of a sequence of F statistics is equivalent to the
     minimum OLS estimator of the breakpoint in a 2-segment partition
     it can be extracted by 'breakpoints' from an object of class
     '"Fstats"' as computed by 'Fstats'. However, this cannot be used
     to extract a larger number of breakpoints.

     For illustration see the commented examples below and Zeileis et
     al. (2003).

_v_a_l_u_e:

     An object of class '"breakpoints"' is a list with the following
     elements:

     _b_r_e_a_k_p_o_i_n_t_s the breakpoints of the optimal partition with the
          number of breaks specified,

     _R_S_S the associated RSS,

     _n_o_b_s the number of observations,

     _n_r_e_g the number of regressors,

     _c_a_l_l the function call,

     _d_a_t_a_t_s_p the time series properties 'tsp' of the data, if any,
          'c(0, 1, nobs)' otherwise.

     If applied to a 'formula' as first argument, 'breakpoints' returns
     an object of class '"breakpointsfull"' (which inherits from
     '"breakpoints"'), that contains some additional (or slightly
     different) elements such as:

     _b_r_e_a_k_p_o_i_n_t_s the breakpoints of the minimum BIC partition,

     _R_S_S a function which takes two arguments 'i,j' and computes the
          residual sum of squares for a segment starting at observation
          'i' and ending at 'j' by looking up the corresponding element
          in the triangular RSS matrix 'RSS.triang',

     _R_S_S._t_r_i_a_n_g a list encoding the triangular RSS matrix.

_R_e_f_e_r_e_n_c_e_s:

     Bai J. (1994), Least Squares Estimation of a Shift in Linear
     Processes, _Journal of Time Series Analysis_, *15*, 453-472.

     Bai J. (1997a), Estimating Multiple Breaks One at a Time,
     _Econometric Theory_, *13*, 315-352.

     Bai J. (1997b), Estimation of a Change Point in Multiple
     Regression Models, _Review of Economics and Statistics_, *79*,
     551-563.

     Bai J., Perron P. (1998), Estimating and Testing Linear Models
     With Multiple Structural Changes, _Econometrica_, *66*, 47-78.

     Bai J., Perron P. (2003), Computation and Analysis of Multiple
     Structural Change Models, _Journal of Applied Econometrics_, *18*,
     1-22.

     Zeileis A., Kleiber C., Krmer W., Hornik K. (2003), Testing and
     Dating of Structural Changes in Practice, _Computational
     Statistics and Data Analysis_, forthcoming.

_E_x_a_m_p_l_e_s:

     if(! "package:stats" %in% search()) library(ts)

     ## Nile data with one breakpoint: the annual flows drop in 1898
     ## because the first Ashwan dam was built
     data(Nile)
     plot(Nile)

     ## F statistics indicate one breakpoint
     fs.nile <- Fstats(Nile ~ 1)
     plot(fs.nile)
     breakpoints(fs.nile)
     lines(breakpoints(fs.nile))

     ## or
     bp.nile <- breakpoints(Nile ~ 1)
     summary(bp.nile)

     ## the BIC also chooses one breakpoint
     plot(bp.nile)
     breakpoints(bp.nile)

     ## fit null hypothesis model and model with 1 breakpoint
     fm0 <- lm(Nile ~ 1)
     fm1 <- lm(Nile ~ breakfactor(bp.nile, breaks = 1))
     plot(Nile)
     lines(fitted(fm0), col = 3)
     lines(fitted(fm1), col = 4)
     lines(bp.nile)

     ## confidence interval
     ci.nile <- confint(bp.nile)
     ci.nile
     lines(ci.nile)

     ## UK Seatbelt data: a SARIMA(1,0,0)(1,0,0)_12 model
     ## (fitted by OLS) is used and reveals (at least) two
     ## breakpoints - one in 1973 associated with the oil crisis and
     ## one in 1983 due to the introduction of compulsory
     ## wearing of seatbelts in the UK.
     data(UKDriverDeaths)
     seatbelt <- log10(UKDriverDeaths)
     seatbelt <- cbind(seatbelt, lag(seatbelt, k = -1), lag(seatbelt, k = -12))
     colnames(seatbelt) <- c("y", "ylag1", "ylag12")
     seatbelt <- window(seatbelt, start = c(1970, 1), end = c(1984,12))
     plot(seatbelt[,"y"], ylab = expression(log[10](casualties)))

     ## testing
     re.seat <- efp(y ~ ylag1 + ylag12, data = seatbelt, type = "RE")
     plot(re.seat)

     ## dating
     bp.seat <- breakpoints(y ~ ylag1 + ylag12, data = seatbelt, h = 0.1)
     summary(bp.seat)
     lines(bp.seat, breaks = 2)

     ## minimum BIC partition
     plot(bp.seat)
     breakpoints(bp.seat)
     ## the BIC would choose 0 breakpoints although the RE and supF test
     ## clearly reject the hypothesis of structural stability. Bai &
     ## Perron (2003) report that the BIC has problems in dynamic regressions.
     ## due to the shape of the RE process of the F statistics choose two
     ## breakpoints and fit corresponding models
     bp.seat2 <- breakpoints(bp.seat, breaks = 2)
     fm0 <- lm(y ~ ylag1 + ylag12, data = seatbelt)
     fm1 <- lm(y ~ breakfactor(bp.seat2)/(ylag1 + ylag12) - 1, data = seatbelt)

     ## plot
     plot(seatbelt[,"y"], ylab = expression(log[10](casualties)))
     time.seat <- as.vector(time(seatbelt))
     lines(time.seat, fitted(fm0), col = 3)
     lines(time.seat, fitted(fm1), col = 4)
     lines(bp.seat2)

     ## confidence intervals
     ci.seat2 <- confint(bp.seat, breaks = 2)
     ci.seat2
     lines(ci.seat2)

