boston                 package:spdep                 R Documentation

_C_o_r_r_e_c_t_e_d _B_o_s_t_o_n _H_o_u_s_i_n_g _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The 'boston.c' data frame has 506 rows and 20 columns. It contains
     the Harrison and Rubinfeld (1978) data corrected for a few minor
     errors and augmented with the latitude and longitude of the
     observations. Gilley and Pace (1996) also point out that MEDV is
     censored, in that median values at or over USD 50,000 are set to
     USD 50,000. The original data set without the corrections is also
     included in package 'mlbench' as 'BostonHousing'. In addition, a
     matrix of tract point coordinates projected to UTM zone 19 is
     included as 'boston.utm', and a sphere of influence neighbours
     list as 'boston.soi'.
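
     Because MEDV is top-coded, it can help to flag the censored
     observations before modelling. The following is a minimal sketch
     of that check; the vector 'medv' holds made-up illustrative
     values in USD 1000, not rows of 'boston.c'.

```r
# Illustrative values in USD 1000 (assumed for this sketch, not real data).
medv <- c(24.0, 21.6, 50.0, 34.7, 50.0)

# Median values at or over USD 50,000 are recorded as 50.
censored <- medv >= 50

n_censored <- sum(censored)     # number of top-coded observations
uncensored <- medv[!censored]   # values that are not top-coded
```

     With the package data loaded, the same flag would be
     'boston.c$MEDV >= 50'.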

_U_s_a_g_e:

     data(boston)

_F_o_r_m_a_t:

     This data frame contains the following columns:

     _T_O_W_N a factor with levels given by town names

     _T_O_W_N_N_O a numeric vector corresponding to TOWN

     _T_R_A_C_T a numeric vector of tract ID numbers

     _L_O_N a numeric vector of tract point longitudes in decimal degrees

     _L_A_T a numeric vector of tract point latitudes in decimal degrees

     _M_E_D_V a numeric vector of median values of owner-occupied housing
          in USD 1000

     _C_M_E_D_V a numeric vector of corrected median values of
          owner-occupied housing in USD 1000

     _C_R_I_M a numeric vector of per capita crime

     _Z_N a numeric vector of proportions of residential land zoned for
          lots over 25000 sq. ft per town (constant for all Boston
          tracts)

     _I_N_D_U_S a numeric vector of proportions of non-retail business acres
          per town (constant for all Boston tracts)

     _C_H_A_S a factor with levels 1 if tract borders Charles River; 0
          otherwise

     _N_O_X a numeric vector of nitric oxides concentration (parts per 10
          million) per town

     _R_M a numeric vector of average numbers of rooms per dwelling

     _A_G_E a numeric vector of proportions of owner-occupied units built
          prior to 1940

     _D_I_S a numeric vector of weighted distances to five Boston
          employment centres

     _R_A_D a numeric vector of an index of accessibility to radial
          highways per town (constant for all Boston tracts)

     _T_A_X a numeric vector full-value property-tax rate per USD 10,000
          per town (constant for all Boston tracts)

     _P_T_R_A_T_I_O a numeric vector of pupil-teacher ratios per town
          (constant for all Boston tracts)

     _B a numeric vector of '1000*(Bk - 0.63)^2' where Bk is the
          proportion of blacks

     _L_S_T_A_T a numeric vector of percentage values of lower status
          population
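
     The transformation given for column B can be illustrated with a
     short sketch. The helper 'b_transform' and the input proportions
     are hypothetical names and values chosen for illustration; they
     are not part of the package or the data.

```r
# Hypothetical helper implementing the formula stated for column B.
b_transform <- function(bk) 1000 * (bk - 0.63)^2

# Illustrative proportions (assumed, not taken from the data).
bk <- c(0.00, 0.13, 0.63, 1.00)
b <- b_transform(bk)

# Proportions equidistant from 0.63 give the same B, so Bk cannot be
# recovered uniquely from B.
same_b <- isTRUE(all.equal(b_transform(0.53), b_transform(0.73)))
```

     Since the transformation is quadratic around 0.63, B is not a
     monotone function of Bk, which matters when interpreting its
     coefficient.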

_S_o_u_r_c_e:

     <URL: http://lib.stat.cmu.edu/datasets/boston_corrected.txt>

_R_e_f_e_r_e_n_c_e_s:

     Harrison, David, and Daniel L. Rubinfeld, Hedonic Housing Prices
     and the Demand for Clean Air, _Journal of Environmental Economics
     and Management_, Volume 5, (1978), 81-102. Original data.

     Gilley, O.W., and R. Kelley Pace, On the Harrison and Rubinfeld
     Data, _Journal of Environmental Economics and Management_, 31
     (1996), 403-405. Provided corrections and examined censoring.

     Pace, R. Kelley, and O.W. Gilley, Using the Spatial Configuration
     of the Data to Improve Estimation, _Journal of the Real Estate
     Finance and Economics_, 14 (1997), 333-340.

_E_x_a_m_p_l_e_s:

     library(spdep)
     data(boston)
     # OLS fit of the hedonic specification using the original MEDV
     hr0 <- lm(log(MEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
      AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data=boston.c)
     summary(hr0)
     logLik(hr0)
     # Same specification using the corrected values CMEDV
     gp0 <- lm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
      AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT), data=boston.c)
     summary(gp0)
     logLik(gp0)
     # Moran's I test for spatial autocorrelation in the OLS residuals
     lm.morantest(hr0, nb2listw(boston.soi))
     # Spatial error model on the sphere of influence neighbours
     gp1 <- errorsarlm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2)
      +  AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT),
      data=boston.c, listw=nb2listw(boston.soi), method="SparseM",
      tol.opt = .Machine$double.eps^(1/4))
     summary(gp1)
     # Spatial lag model with the same neighbours
     gp2 <- lagsarlm(log(CMEDV) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2)
      +  AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT),
      data=boston.c, listw=nb2listw(boston.soi), method="SparseM")
     summary(gp2)
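
     The statistic reported by the Moran test on regression residuals
     above has the form I = (n/S0) * (e'We)/(e'e), where e are the OLS
     residuals, W the spatial weights matrix and S0 the sum of all
     weights. The following base R sketch computes it by hand on a toy
     example; the five locations, their neighbour structure and the
     values of 'x' and 'y' are assumptions made for illustration and
     are unrelated to the Boston data.

```r
# Toy setting: 5 locations on a line, each neighbouring the adjacent ones.
n <- 5
A <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  A[i, i + 1] <- 1
  A[i + 1, i] <- 1
}

# Row-standardised weights: each row sums to 1.
W <- A / rowSums(A)

# Illustrative response and regressor (assumed values).
x <- seq_len(n)
y <- c(1.2, 1.9, 3.4, 3.8, 5.1)
e <- residuals(lm(y ~ x))

# Moran's I of the residuals; with row-standardised weights S0 equals n.
S0 <- sum(W)
moran_I <- (n / S0) * as.numeric(t(e) %*% W %*% e) / sum(e^2)
```

     The sketch only reproduces the statistic itself; the expectation
     and variance used for the test depend on the regression design
     and are not computed here.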

