awsdens                 package:aws                 R Documentation

_l_o_c_a_l _c_o_n_s_t_a_n_t _d_e_n_s_i_t_y _e_s_t_i_m_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     This function uses adaptive weights smoothing for density
     estimation. The estimate is constructed using poisson regression
     and the structural assumption of a local constant density.
     Implementation is for 1D, 2D and 3D problems, although it 
     applications may be interesting only in 2D and 3D.

_U_s_a_g_e:

     awsdens(y, ngrid = NULL, nempty = NULL, qlambda = NULL, eta = 0.5, 
             lkern = "Triangle", fu = NULL, hinit = 1, hincr = 1.2, 
         hmax = NULL, graph = FALSE, demo = FALSE, symmetric = TRUE)

_A_r_g_u_m_e_n_t_s:

       y: 'y' contains the observed random variables or vectors.  In
          case of random vectors 'y' is supposed to be an observation
          matrix with columns containing the observed vectors

   ngrid: 'ngrid' determines the size of the grid used for binning.  If
          'ngrid' is a vector its components are supposed to determine 
          the number of intervals for the corresponding component of
          'y'; if it is a scalar 'rep(ngrid,dim(y)[1])' is used. If
          'is.null(ngrid)' a default of '2*n^{1/d}', with 
          'dim(y)=c(d,n)', is chosen, i.e. a grid with '2^d*n' bins.

  nempty: determines the width of a boundary region of the grid
          containing only empty bins. 'nempty' defaults to '0.1*ngrid'.
          The grid generated is equispaced with  'prod(ngrid)' bins and
          grid-coordinates '(nempty+1)[i]' and '(ngrid+1-nempty)[i]'
          determined by 'min(y[i,])' and 'max(y[i,])'.

 qlambda: 'qlambda' determines the scale parameter 'qlambda'  for the
          stochastic penalty. The scaling parameter in the stochastic 
          penalty 'lambda' is choosen as the 'qlambda'-quantile of a
          Chi-square-distribution with number of parameters in the
          polynomial  model as degrees of freedom. If 'qlambda=NULL' a
          standard value  depending on 'model' and 'symmetric' is
          choosen. 

     eta: 'eta' is a memory parameter used to stabilize the procedure. 
          'eta'  has to be between '0' and '1', with  'eta=.5' being
          the default. 

   lkern: 'lkern' determines the location kernel to be used. Options 
          are '"Uniform"', '"Triangle"', '"Quadratic"',  '"Cubic"' and
          '"Exponential"'. Default is '"Triangle"'.  The Kernel
          operates on the squared distance, so '"Triangle"' corresponds
          to the use of an Epanechnikov kernel in kernel smoothing. 
          '"Exponential"' requires larger values of 'hmax' and 
          therefore more iterations to reach comparable results. 

      fu: 'fu' can be used to specify a function to calculate the 
          values of the true density on the grid. This may be used for
          test purposes to calculate  Mean Squared Error (MSE) and 
          Mean Absolute Error (MAE) 

   hinit: 'hinit' Initial bandwidth for the location penalty. 
          Appropriate value is choosen in case of 'hinit==NULL'  

   hincr: 'hincr' 'hincr^(1/d)', with 'd' the  dimensionality of the
          design, is used as a factor to increase the  bandwidth
          between iterations. Defauts to 'hincr=1.2' 

    hmax: 'hmax' Maximal bandwidth to be used. Determines the  number
          of iterations and is used as the stopping rule. 

   graph: 'graph' if 'TRUE' results are displayed after each  iteration
          step. 

    demo: 'demo' if 'TRUE' after each iteration step results  are
          displayed and the process waits for user interaction. 

symmetric: If 'symmetric==TRUE' the stochastic penalty is symmetrized,
          i.e. '(sij + sji)/lambda' is used instead of  'sij/lambda'.
          See references for details.

_D_e_t_a_i_l_s:

     The density estimate ist constructed using an approach proposed by
      Lindsay (1974a,b). 1, 2 or 3-dimensional binning is used to
     produce counts, i.e. numbers of observations, per bin, with bins
     defined by a regular grid. THe bin-counts are then considered as
     poisson random variables with intensity depending on the location
     within the grid. Adaptive Weights Smoothing for poisson regression
     models, i.e. function  'laws' with parameter 'model="Poisson"', is
     used  to construct a location dependent intensity estimate which
     is then  transformed into a density estimate, see Section 6 in
     Polzehl and Spokoiny (2002b) for details.

_V_a_l_u_e:

     The returned object is a list with components 

    bin : Bin counts

   dens : density values for the bin. Values correspond to the
          grid-centers determined by component 'xgrid'

  xgrid : list with components containing the coordinates of
          bin-centers. The 'dim(y)[1]' components of 'xgrid' correspond
          to the components of the grid-coordinates. 

   call : actual function call

_A_u_t_h_o_r(_s):

     Joerg Polzehl, polzehl@wias-berlin.de 

_R_e_f_e_r_e_n_c_e_s:

  Polzehl, J. and Spokoiny, V. (2002).  _Local likelihood modelling by
     adaptive weights smoothing_,  WIAS-Preprint 787  

  Polzehl, J. and Spokoiny, V. (2000). _Adaptive Weights Smoothing with
     applications to image restoration_, J.R.Statist.Soc. B, 62, Part
     2, pp.335-354

  Lindsay, J. (1974a). _Comparison of probabbility distributions_, J.
     Royal Statist. Soc. Ser. B 36, 38-47.

  Lindsay, J. (1974b). _Construction and comparison of statistical
     models_, J. Royal Statist. Soc. Ser. B 36, 418-425. 

_S_e_e _A_l_s_o:

     SEE ALSO 'aws', 'laws'

_E_x_a_m_p_l_e_s:

     ###
     ###   univariate density estimation
     ###
     f0 <- function(x) 1.5*(x>=0)-(x>=.25)+(x>=.75)-1.5*(x>1)
     set.seed(1)
     m <- 1000
     x1 <- runif(m)
     ind <- (runif(m)<f0(x1)/1.5)
     n <- 200
     y <- x1[ind][1:n]
     tmp0 <- awsdens(y,440,20,hmax=2000)
     plot(tmp0$xgrid[[1]],tmp0$dens,type="l",lwd=2,
          ylim =range(0,1.5,tmp0$dens),xlab="x",ylab="Density")
     lines(tmp0$xgrid[[1]],f0(tmp0$xgrid[[1]]),lty=3,col=2,lwd=2)
     title("Estimated and true density (n=200)")
     ###
     ###   bivariate density estimation
     ###
     f1 <- function(x,y) 7.5*pmax(x*(1-x^2-y^2),0)*(x>0)*(y>0)
     set.seed(1)
     m <- 10000
     x1 <- runif(m)
     x2 <- runif(m)
     fx12 <- f1(x1,x2)
     ind <- (runif(m)<f1(x1,x2)/.385/7.5)
     n <- 2500
     y <- rbind(x1[ind],x2[ind])[,1:n]
     tmp <- awsdens(y,120,10,hmax=5)
     image(tmp$xgrid[[1]],tmp$xgrid[[2]],tmp$dens,xlab="x1",ylab="x2",col=gray(.5+(255:0)/511),lwd=2)
     #   thats the estimated density on the grid
     lines(seq(0,1,.01),sqrt(1-seq(0,1,.01)^2),col=2,lty=2,lwd=2)
     lines(c(0,1),c(0,0),col=2,lty=2,lwd=2)
     lines(c(0,0),c(0,1),col=2,lty=2,lwd=2)
     #    thats the boundary of the support
     title("Estimated density (n=2500)")
     #    now add contour lines of the estimated density
     contour(tmp$xgrid[[1]],tmp$xgrid[[2]],tmp$dens,nlevels=20,add=TRUE,col=1,lty=1,labcex=.1)
     rm(f0,m,x1,ind,n,y,tmp0,f1,x2,fx12,tmp)

