\name{bkde}
\alias{bkde}
\title{
Compute a Binned Kernel Density Estimate
}
\description{
Returns x and y coordinates of the binned
kernel density estimate of the probability
density of the data.  
}
\usage{
bkde(x, kernel="normal", canonical=FALSE, bandwidth,
     gridsize=401, range.x, truncate=TRUE)
}
\arguments{
\item{x}{
vector of observations from the distribution whose density is to
be estimated.
Missing values are not allowed.
}
\item{bandwidth}{
the kernel bandwidth smoothing parameter.
Larger values of \code{bandwidth} produce smoother estimates;
smaller values produce less smooth estimates.
}
\item{kernel}{
character string which determines the smoothing kernel.
\code{kernel} can be:

 \code{"normal"} - the Gaussian density function (the default).

 \code{"box"} - a rectangular box.

 \code{"epanech"} - the centred beta(2,2) density.

 \code{"biweight"} - the centred beta(3,3) density.

 \code{"triweight"} - the centred beta(4,4) density.
}
\item{canonical}{
logical flag: if \code{TRUE}, canonically scaled kernels are used.
}
\item{gridsize}{
the number of equally spaced points at which to estimate
the density.
}
\item{range.x}{
vector containing the minimum and maximum values of \code{x}
at which to compute the estimate.
The default is the minimum and maximum data values, extended by the
support of the kernel.
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the
range specified by \code{range.x} are ignored.
}}
\value{
a list containing the following components:

\item{x}{
vector of sorted \code{x} values at which the estimate was computed.
}
\item{y}{
vector of density estimates
at the corresponding \code{x}.
}}
\details{
This is the binned approximation to the ordinary kernel density estimate.
Linear binning is used to obtain the bin counts.  
For each \code{x} value on the grid, the kernel is
centered on that \code{x} and the heights of the kernel at each datapoint are summed.
This sum, after a normalization, is the corresponding \code{y} value in the output.
}
\section{Background}{
Density estimation is a smoothing operation.
Inevitably there is a trade-off between bias in the estimate and the
estimate's variability: large bandwidths will produce smooth estimates that
may hide local features of the density; small bandwidths may introduce
spurious bumps into the estimate.
}
\references{
Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\seealso{
  \code{\link{density}}, \code{\link{dpik}}, \code{\link{hist}},
  \code{\link{ksmooth}}.
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
est <- bkde(x, bandwidth=0.25)
plot(est, type="l")
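
# Illustrative sketch only, in base R, of the computation outlined in Details:
# linear binning followed by a count-weighted sum of Gaussian kernel heights.
# It is not the packaged implementation; linbin_sketch, kde_sketch, the
# simulated data and the grid below are assumptions made for this example.
linbin_sketch <- function(x, g) {
  pos <- (x - g[1]) / (g[2] - g[1]) + 1  # fractional grid index of each point
  lo <- floor(pos)
  w <- pos - lo                          # share passed to the right-hand grid point
  counts <- numeric(length(g))
  for (i in seq_along(x)) {
    counts[lo[i]] <- counts[lo[i]] + 1 - w[i]
    if (w[i] > 0) counts[lo[i] + 1] <- counts[lo[i] + 1] + w[i]
  }
  counts
}
kde_sketch <- function(x, bandwidth, g) {
  counts <- linbin_sketch(x, g)
  # at each grid point, sum the kernel heights over all bins, weighted by count
  s <- sapply(g, function(u) sum(counts * dnorm((u - g) / bandwidth)))
  s / (length(x) * bandwidth)            # normalise so the estimate integrates to 1
}
set.seed(1)
xs <- rnorm(200)
g <- seq(min(xs) - 1, max(xs) + 1, length.out = 401)
fhat_sketch <- kde_sketch(xs, bandwidth = 0.3, g = g)
sum(fhat_sketch) * (g[2] - g[1])         # approximately 1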
}
\keyword{distribution}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{bkde2D}
\alias{bkde2D}
\title{
Compute a 2D Binned Kernel Density Estimate
}
\description{
Returns the set of grid points in each coordinate direction,
and the matrix of density estimates over the mesh induced by
the grid points. The kernel is the standard bivariate normal
density. 
}
\usage{
bkde2D(x, bandwidth, gridsize=c(51, 51), range.x,
       truncate=TRUE)
}
\arguments{
\item{x}{
a two-column matrix containing the observations from the  
distribution whose density is to be estimated.
Missing values are not allowed.
}
\item{bandwidth}{
vector containing the bandwidth to be used in each coordinate
direction.
}
\item{gridsize}{
vector containing the number of equally spaced points in each direction
over which the density is to be estimated.
}
\item{range.x}{
a list containing two vectors, where each vector 
contains the minimum and maximum values of \code{x}
at which to compute the estimate for each direction.
The default minimum in each direction is the minimum
data value minus 1.5 times the bandwidth for
that direction. The default maximum is the maximum
data value plus 1.5 times the bandwidth for
that direction.
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the
range specified by \code{range.x} are ignored.
}}
\value{
a list containing the following components:

\item{x1}{
vector of values of the grid points in the first coordinate
direction at which the estimate was computed. 
}
\item{x2}{
vector of values of the grid points in the second coordinate
direction at which the estimate was computed. 
}
\item{fhat}{
matrix of density estimates 
over the mesh induced by \code{x1} and \code{x2}.
}}
\details{
This is the binned approximation to the 2D kernel density estimate.
Linear binning is used to obtain the bin counts and the
Fast Fourier Transform is used to perform the discrete convolutions.
For each \code{x1},\code{x2} pair the bivariate Gaussian kernel is
centered on that location and the heights of the 
kernel, scaled by the bandwidths, at each datapoint are summed.
This sum, after a normalization, is the corresponding 
\code{fhat} value in the output.
}
\references{
Wand, M. P. (1994).
Fast Computation of Multivariate Kernel Estimators.
\emph{Journal of Computational and Graphical Statistics,}
\bold{3}, 433--445.


Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\seealso{
  \code{\link{bkde}}, \code{\link{density}}, \code{\link{hist}},
  \code{\link{ksmooth}}.
}
\examples{
data(geyser, package="MASS")
x <- cbind(geyser$duration, geyser$waiting)
est <- bkde2D(x, bandwidth=c(0.7,7))
contour(est$x1, est$x2, est$fhat)
persp(est$fhat)
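
# Illustrative sketch only: a direct evaluation in base R of the
# product-Gaussian estimate described in Details, without the binning and
# FFT steps that make bkde2D fast. kde2d_sketch, grid_sketch and the
# simulated data are assumptions made for this example; the grid follows the
# documented default of extending the data range by 1.5 bandwidths.
kde2d_sketch <- function(x, bandwidth, g1, g2) {
  # kernel heights between each grid value and each observation, per direction
  k1 <- outer(g1, x[, 1], function(u, xi) dnorm((u - xi) / bandwidth[1]))
  k2 <- outer(g2, x[, 2], function(u, xi) dnorm((u - xi) / bandwidth[2]))
  # sum over observations of the product kernel, then normalise
  tcrossprod(k1, k2) / (nrow(x) * prod(bandwidth))
}
grid_sketch <- function(v, h, m = 51) {
  seq(min(v) - 1.5 * h, max(v) + 1.5 * h, length.out = m)
}
set.seed(1)
xs <- cbind(rnorm(200), rnorm(200))
bw <- c(0.4, 0.4)
g1 <- grid_sketch(xs[, 1], bw[1])
g2 <- grid_sketch(xs[, 2], bw[2])
fhat_sketch <- kde2d_sketch(xs, bw, g1, g2)
dim(fhat_sketch)
sum(fhat_sketch) * (g1[2] - g1[1]) * (g2[2] - g2[1])  # close to 1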
}
\keyword{distribution}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{bkfe}
\alias{bkfe}
\title{
Compute a Binned Kernel Functional Estimate 
}
\description{
Returns an estimate of a binned approximation to
the kernel estimate of the specified density functional. 
The kernel is the standard normal density.
}
\usage{
bkfe(x, drv, bandwidth, gridsize=401, range.x, binned=FALSE, truncate=TRUE)
}
\arguments{
\item{x}{
vector of observations from the distribution whose density functional
is to be estimated.
Missing values are not allowed.
}
\item{drv}{
order of derivative in the density functional. Must be a
non-negative even integer.
}
\item{bandwidth}{
the kernel bandwidth smoothing parameter.
}
\item{gridsize}{
the number of equally-spaced points over which binning is
performed.
}
\item{range.x}{
vector containing the minimum and maximum values of \code{x}
at which to compute the estimate.
The default is the minimum and maximum data values, extended by the
support of the kernel.
}
\item{binned}{
logical flag: if \code{TRUE}, then \code{x} is taken to be grid counts
rather than raw data.
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the
range specified by \code{range.x} are ignored.
}}
\value{
the estimated functional.
}
\details{
The density functional of order \code{drv} is the integral of the
product of the density and its \code{drv}th derivative. 
The kernel estimates
of such quantities are computed using a binned implementation,
and the kernel is the standard normal density.
}
\section{Background}{
Estimates of this type were proposed by Sheather and
Jones (1991).
}
\references{
Sheather, S. J. and Jones, M. C. (1991).
A reliable data-based bandwidth selection method for
kernel density estimation.
\emph{Journal of the Royal Statistical Society, Series B},
\bold{53}, 683--690.

Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
est <- bkfe(x, drv=4, bandwidth=0.3)
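
# Illustrative sketch only: the unbinned double-sum form of the kernel
# functional estimate with a standard normal kernel, which the binned code
# is understood to approximate (see Wand and Jones, 1995). kfe_sketch and
# the simulated data are assumptions; only drv = 0, 2 and 4 are handled.
kfe_sketch <- function(x, drv, bandwidth) {
  u <- outer(x, x, "-") / bandwidth      # all scaled pairwise differences
  # polynomial factor of the drv-th derivative of the standard normal density
  hermite <- switch(as.character(drv),
                    "0" = 1,
                    "2" = u^2 - 1,
                    "4" = u^4 - 6 * u^2 + 3,
                    stop("only drv = 0, 2 or 4 in this sketch"))
  sum(hermite * dnorm(u)) / (length(x)^2 * bandwidth^(drv + 1))
}
set.seed(1)
xs <- rnorm(300)
psi_sketch <- sapply(c(0, 2, 4), function(r) kfe_sketch(xs, r, bandwidth = 0.5))
psi_sketch                               # signs alternate for normal-like data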
}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{dpih}
\alias{dpih}
\title{
Select a Histogram Bin Width 
}
\description{
Uses direct plug-in methodology to select the bin width of 
a histogram.
}
\usage{
dpih(x, scalest="minim", level=2, gridsize=401, 
     range.x=range(x), truncate=TRUE)
}
\arguments{
\item{x}{
vector containing the sample on which the
histogram is to be constructed.
}
\item{scalest}{
estimate of scale.

 \code{"stdev"} - standard deviation is used.

 \code{"iqr"} - inter-quartile range divided by 1.349 is used.

 \code{"minim"} - minimum of \code{"stdev"} and \code{"iqr"} is used.
}
\item{level}{
number of levels of functional estimation used in the
plug-in rule.
}
\item{gridsize}{
number of grid points used in the binned approximations
to functional estimates.
}
\item{range.x}{
range over which functional estimates are obtained.
The default is the minimum and maximum data values.
}
\item{truncate}{
if \code{truncate} is \code{TRUE} then observations outside
of the interval specified by \code{range.x} are omitted.
Otherwise, they are used to weight the extreme grid points.
}}
\value{
the selected bin width.
}
\details{
The direct plug-in approach, where unknown functionals
that appear in expressions for the asymptotically
optimal bin width and bandwidths
are replaced by kernel estimates, is used.
The normal distribution is used to provide an
initial estimate.
}
\section{Background}{
This method for selecting the bin width of a histogram is
described in Wand (1995). It is an extension of the
normal scale rule of Scott (1979) and uses plug-in ideas
from bandwidth selection for kernel density estimation
(e.g. Sheather and Jones, 1991).
}
\references{
Scott, D. W. (1979). 
On optimal and data-based histograms.
\emph{Biometrika},
\bold{66}, 605--610.

Sheather, S. J. and Jones, M. C. (1991).
A reliable data-based bandwidth selection method for
kernel density estimation.
\emph{Journal of the Royal Statistical Society, Series B},
\bold{53}, 683--690. 

Wand, M. P. (1995).
Data-based choice of histogram binwidth.
\emph{University of New South Wales},
Australian Graduate School of Management 
Working Paper Series No. 95--011.
}
\seealso{
  \code{\link{hist}}
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
h <- dpih(x)
bins <- seq(min(x)-0.1, max(x)+0.1+h, by=h)
hist(x, breaks=bins)
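
# Illustrative sketch only: the three scale estimates listed under scalest,
# and the normal scale bin width of Scott (1979) that the plug-in rule
# extends. This is not the value dpih returns; scale_sketch and the
# simulated data are assumptions made for this example.
scale_sketch <- function(x, scalest = c("minim", "stdev", "iqr")) {
  scalest <- match.arg(scalest)
  s <- sd(x)
  q <- IQR(x) / 1.349
  switch(scalest, stdev = s, iqr = q, minim = min(s, q))
}
set.seed(1)
xs <- rnorm(200)
# Scott normal scale rule: h = (24 * sqrt(pi) / n)^(1/3) * sigma
h_scott <- (24 * sqrt(pi) / length(xs))^(1/3) * scale_sketch(xs)
h_scott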
}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{dpik}
\alias{dpik}
\title{
Select a Bandwidth for Kernel Density Estimation
}
\description{
Use direct plug-in methodology to select the bandwidth
of a kernel density estimate.
}
\usage{
dpik(x, scalest="minim", level=2, kernel="normal",   
     canonical=FALSE, gridsize=401, range.x=range(x), 
     truncate=TRUE)
}
\arguments{
\item{x}{
vector containing the sample on which the
kernel density estimate is to be constructed.
}
\item{scalest}{
estimate of scale.

 \code{"stdev"} - standard deviation is used.

 \code{"iqr"} - inter-quartile range divided by 1.349 is used.

 \code{"minim"} - minimum of \code{"stdev"} and \code{"iqr"} is used.
}
\item{level}{
number of levels of functional estimation used in the
plug-in rule.
}
\item{kernel}{
character string which determines the smoothing kernel.
\code{kernel} can be:

 \code{"normal"} - the Gaussian density function (the default).

 \code{"box"} - a rectangular box.

 \code{"epanech"} - the centred beta(2,2) density.

 \code{"biweight"} - the centred beta(3,3) density.

 \code{"triweight"} - the centred beta(4,4) density.
}
\item{canonical}{
logical flag: if \code{TRUE}, canonically scaled kernels are used.
}
\item{gridsize}{
the number of equally-spaced points over which binning is 
performed to obtain kernel functional approximation. 
}
\item{range.x}{
vector containing the minimum and maximum values of \code{x}
at which to compute the estimate.
The default is the minimum and maximum data values.
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the
range specified by \code{range.x} are ignored.
}}
\value{
the selected bandwidth.
}
\details{
The direct plug-in approach, where unknown functionals
that appear in expressions for the asymptotically
optimal bandwidths
are replaced by kernel estimates, is used.
The normal distribution is used to provide an
initial estimate.
}
\section{Background}{
This method for selecting the bandwidth of a kernel
density estimate was proposed by Sheather and
Jones (1991) and is described in Section 3.6 of
Wand and Jones (1995).
}
\references{
Sheather, S. J. and Jones, M. C. (1991).
A reliable data-based bandwidth selection method for
kernel density estimation.
\emph{Journal of the Royal Statistical Society, Series B},
\bold{53}, 683--690.

Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\seealso{
\code{\link{bkde}}, \code{\link{density}}, \code{\link{ksmooth}}
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
h <- dpik(x)
est <- bkde(x,bandwidth=h)
plot(est,type="l")
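
# Illustrative sketch only: the zero-stage version of the plug-in idea for a
# Gaussian kernel. The unknown functional psi4 in the asymptotically optimal
# bandwidth is replaced by its value under a normal density, which reproduces
# the normal scale rule. dpik itself goes further and estimates functionals
# by kernel methods. The names and simulated data are assumptions.
set.seed(1)
xs <- rnorm(200)
n <- length(xs)
sigma <- min(sd(xs), IQR(xs) / 1.349)        # the "minim" scale estimate
psi4_normal <- 3 / (8 * sqrt(pi) * sigma^5)  # integral of f''(x)^2 for a normal f
# asymptotically optimal bandwidth with R(K) = 1 / (2 * sqrt(pi)) and mu2(K) = 1
h_ns <- (1 / (2 * sqrt(pi) * psi4_normal * n))^(1/5)
all.equal(h_ns, (4 / (3 * n))^(1/5) * sigma)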
}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{dpill}
\alias{dpill}
\title{
Select a Bandwidth for Local Linear Regression
}
\description{
Use direct plug-in methodology to select the bandwidth
of a local linear Gaussian kernel regression estimate, as described
by Ruppert, Sheather and Wand (1995). 
}
\usage{
dpill(x, y, blockmax=5, divisor=20, trim=0.01, proptrun=0.05,
      gridsize=401, range.x, truncate=TRUE)
}
\arguments{
\item{x}{
vector of x data.
Missing values are not accepted.
}
\item{y}{
vector of y data.
This must be the same length as \code{x}, and
missing values are not accepted.
}
\item{blockmax}{
the maximum number of blocks of the data for construction
of an initial parametric estimate. 
}
\item{divisor}{
the value that the sample size is divided by to determine
a lower limit on the number of blocks of the data for
construction of an initial parametric estimate.
}
\item{trim}{
the proportion of the sample trimmed from each end in the
\code{x} direction before application of the plug-in methodology.
}
\item{proptrun}{
the proportion of the range of \code{x} at each end truncated in the
functional estimates.
}
\item{gridsize}{
number of equally-spaced grid points over which the
function is to be estimated.
}
\item{range.x}{
vector containing the minimum and maximum values of \code{x} at which to
compute the estimate.
For density estimation the default is the minimum and maximum data values
with 5\% of the range added to each end.
For regression estimation the default is the minimum and maximum data values.
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the range specified
by \code{range.x} are ignored.
}}
\value{
the selected bandwidth.
}
\details{
The direct plug-in approach, where unknown functionals
that appear in expressions for the asymptotically
optimal bandwidths
are replaced by kernel estimates, is used.
The kernel is the standard normal density.
Least squares quartic fits over blocks of data are used to
obtain an initial estimate. Mallows' \eqn{C_p}{Cp} is used to select
the number of blocks.
}
\section{Warning}{
If there are severe irregularities (e.g. outliers, sparse regions)
in the \code{x} values then the local polynomial smooths required for the
bandwidth selection algorithm may become degenerate and the function
will crash. Outliers in the \code{y} direction may lead to deterioration
of the quality of the selected bandwidth.
}
\references{
Ruppert, D., Sheather, S. J. and Wand, M. P. (1995).
An effective bandwidth selector for local least squares
regression.
\emph{Journal of the American Statistical Association},
\bold{90}, 1257--1270.

Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\seealso{
\code{\link{ksmooth}}, \code{\link{locpoly}}.
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
y <- geyser$waiting
plot(x, y)
h <- dpill(x, y)
fit <- locpoly(x, y, bandwidth=h)
lines(fit)
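
# Illustrative sketch only of the initial blocked fit mentioned in Details:
# the sorted data are split into blocks, a least squares quartic is fitted
# in each block, and the number of blocks is chosen with a Cp-type criterion.
# The Cp expression is taken to be the one in Ruppert, Sheather and Wand
# (1995); treat it, block_rss_sketch and the simulated data as assumptions.
block_rss_sketch <- function(x, y, nblocks) {
  ord <- order(x)
  block <- ceiling(seq_along(x) * nblocks / length(x))  # near-equal block sizes
  rss <- 0
  for (b in seq_len(nblocks)) {
    xb <- x[ord][block == b]
    yb <- y[ord][block == b]
    rss <- rss + sum(residuals(lm(yb ~ poly(xb, 4)))^2)  # quartic fit per block
  }
  rss
}
set.seed(1)
xs <- runif(200)
ys <- sin(6 * xs) + rnorm(200, sd = 0.2)
n <- length(xs)
nmax <- max(min(floor(n / 20), 5), 1)     # divisor = 20, blockmax = 5
rss <- sapply(seq_len(nmax), function(k) block_rss_sketch(xs, ys, k))
cp <- rss / (rss[nmax] / (n - 5 * nmax)) - (n - 10 * seq_len(nmax))
which.min(cp)                             # chosen number of blocks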
}
\keyword{smooth}
% Converted by Sd2Rd version 0.2-a5.

\eof
\name{locpoly}
\alias{locpoly}
\title{
Estimate Functions Using Local Polynomials
}
\description{
Estimates a probability density function,  
regression function or their derivatives
using local polynomials. A fast binned implementation
over an equally-spaced grid is used.
}
\usage{
locpoly(x, y, drv=0, degree, kernel="normal",
        bandwidth, gridsize=401, bwdisc=25,
        range.x, binned=FALSE, truncate=TRUE)
}
\arguments{
\item{x}{
vector of x data.
Missing values are not accepted.
}
\item{bandwidth}{
the kernel bandwidth smoothing parameter.
It may be a single number or an array having
length \code{gridsize}, representing a bandwidth
that varies according to the location of
estimation.
}
\item{y}{
vector of y data.
This must be the same length as \code{x}, and
missing values are not accepted.
}
\item{drv}{
order of derivative to be estimated.
}
\item{degree}{
degree of local polynomial used. Its value
must be greater than or equal to the value
of \code{drv}. The default value of \code{degree} is
\code{drv} + 1.
}
\item{kernel}{
\code{"normal"} - the Gaussian density function.
}
\item{gridsize}{
number of equally-spaced grid points over which the 
function is to be estimated.
}
\item{bwdisc}{
number of logarithmically-equally-spaced bandwidths
on which \code{bandwidth} is discretised, to speed up
computation.
}
\item{range.x}{
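vector containing the minimum and maximum values of \code{x} at which to
compute the estimate.
For density estimation the default is the minimum and maximum data values
with 5\% of the range added to each end.
For regression estimation the default is the minimum and maximum data values.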
}
\item{binned}{
logical flag: if \code{TRUE}, then \code{x} and \code{y} are taken to be grid counts
rather than raw data. 
}
\item{truncate}{
logical flag: if \code{TRUE}, data with \code{x} values outside the range specified
by \code{range.x} are ignored.
}}
\value{
if \code{y} is specified, a local polynomial regression estimate of 
E[Y|X] (or its derivative) is computed.
If \code{y} is missing, a local polynomial estimate of the density
of \code{x} (or its derivative) is computed.


a list containing the following components:

\item{x}{
vector of sorted x values at which the estimate was computed.
}
\item{y}{
vector of smoothed estimates for either the density or the regression
at the corresponding \code{x}.
}}
\details{
Local polynomial fitting with a kernel weight is used to
estimate either a density, regression function or their
derivatives. In the case of density estimation, the 
data are binned and the local fitting procedure is applied to 
the bin counts. In either case, binned approximations over
an equally-spaced grid are used for fast computation. The
bandwidth may be either scalar or a vector of length
\code{gridsize}.
}
\references{
Wand, M. P. and Jones, M. C. (1995).
\emph{Kernel Smoothing.}
Chapman and Hall, London.
}
\seealso{
  \code{\link{bkde}}, \code{\link{density}}, \code{\link{dpill}},
  \code{\link{ksmooth}}, \code{\link{loess}}, \code{\link{smooth}},
  \code{\link{supsmu}}.
}
\examples{
data(geyser, package="MASS")
x <- geyser$duration
# local linear density estimate
est <- locpoly(x, bandwidth=0.25)
plot(est, type="l")
# local linear regression estimate
y <- geyser$waiting
plot(x, y)
fit <- locpoly(x, y, bandwidth=0.25)
lines(fit)
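
# Illustrative sketch only: a direct, unbinned local linear regression in
# base R, fitted by kernel-weighted least squares at each grid point. It
# shows the idea behind the estimator, not the fast binned code used by
# locpoly. loclin_sketch and the simulated data are assumptions made for
# this example; noise-free linear data are used so the result can be checked.
loclin_sketch <- function(x, y, bandwidth, g) {
  sapply(g, function(u) {
    w <- dnorm((x - u) / bandwidth)        # Gaussian kernel weights centred at u
    fit <- lm.wfit(cbind(1, x - u), y, w)  # weighted least squares line
    fit$coefficients[[1]]                  # intercept is the fitted value at u
  })
}
set.seed(1)
xs <- runif(100)
ys <- 2 + 3 * xs
g <- seq(0.1, 0.9, length.out = 9)
fit_sketch <- loclin_sketch(xs, ys, bandwidth = 0.1, g = g)
all.equal(fit_sketch, 2 + 3 * g)           # a straight line is reproduced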
}
\keyword{smooth}
\keyword{regression}
% Converted by Sd2Rd version 0.2-a5.

\eof
