\name{BostonHousing}
\alias{BostonHousing}
\title{Boston Housing Data}
\usage{data(BostonHousing)}
\description{Concerns housing values in suburbs of Boston.
}
\format{A data frame with 506 observations on 14 variables,
    the last one \code{medv} being the target variable:
    \tabular{rll}{
 [,1] \tab crim \tab per capita crime rate by town \cr
 [,2] \tab zn \tab proportion of residential land zoned for lots over 25,000 sq.ft \cr
 [,3] \tab indus \tab proportion of non-retail business acres per town \cr
 [,4] \tab chas \tab Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) \cr
 [,5] \tab nox \tab nitric oxides concentration (parts per 10 million) \cr
 [,6] \tab rm \tab average number of rooms per dwelling \cr
 [,7] \tab age \tab proportion of owner-occupied units built prior to 1940 \cr
 [,8] \tab dis \tab weighted distances to five Boston employment centres \cr
 [,9] \tab rad \tab index of accessibility to radial highways \cr
[,10] \tab tax \tab full-value property-tax rate per USD 10,000 \cr
[,11] \tab ptratio \tab pupil-teacher ratio by town \cr
[,12] \tab b \tab \eqn{1000(B - 0.63)^2} where \eqn{B} is the proportion of blacks by town\cr
[,13] \tab lstat \tab percentage of lower status of the population \cr
[,14] \tab medv \tab median value of owner-occupied homes in USD 1000's
    }
}
\source{
    Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the 
    demand for clean air', J. Environ. Economics & Management,
    vol.5, 81-102, 1978.
    
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}
\keyword{datasets}


\eof
\name{BreastCancer}
\title{Wisconsin Breast Cancer Database}
\usage{data(BreastCancer)}
\alias{BreastCancer}
\format{A data frame with 699 observations on 11 variables, one being a
    character variable, 9 being ordered or nominal, and 1 target class.

    \tabular{cll}{ 
 [,1] \tab Id \tab Sample code number\cr
 [,2] \tab Cl.thickness \tab Clump Thickness\cr
 [,3] \tab Cell.size \tab Uniformity of Cell Size\cr
 [,4] \tab Cell.shape \tab Uniformity of Cell Shape\cr
 [,5] \tab Marg.adhesion  \tab Marginal Adhesion\cr
 [,6] \tab Epith.c.size \tab Single Epithelial Cell Size\cr
 [,7] \tab Bare.nuclei \tab Bare Nuclei\cr
 [,8] \tab Bl.cromatin \tab Bland Chromatin\cr
 [,9] \tab Normal.nucleoli \tab Normal Nucleoli\cr
[,10] \tab Mitoses \tab Mitoses\cr
[,11] \tab Class \tab Class
}
    }
\description{
    The objective is to identify each of a number of benign or malignant
    classes. Samples arrive periodically as
    Dr. Wolberg reports his clinical cases.
    The database therefore reflects this chronological grouping of the
    data.  This grouping information appears immediately below, having been
    removed from the data itself.  Each variable except for the first was
    converted into 11 primitive numerical attributes with values ranging
    from 0 through 10.  There are 16 missing attribute values. See the
    references cited below for more details.}
\source{
    \itemize{
       	\item Creator: Dr. William H. Wolberg (physician); University of
	Wisconsin Hospital; Madison; Wisconsin; USA
        \item Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)
	\item Received: David W. Aha (aha@cs.jhu.edu)
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
     }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\references{   
   1. Wolberg,W.H., \& Mangasarian,O.L. (1990). Multisurface method of 
      pattern separation for medical diagnosis applied to breast cytology. In
      Proceedings of the National Academy of Sciences, 87,
      9193-9196.\cr
      - Size of data set: only 369 instances (at that point in time)\cr
      - Collected classification results: 1 trial only\cr
      - Two pairs of parallel hyperplanes were found to be consistent with
         50\% of the data\cr
         - Accuracy on remaining 50\% of dataset: 93.5\%\cr
      - Three pairs of parallel hyperplanes were found to be consistent with
         67\% of data\cr
         - Accuracy on remaining 33\% of dataset: 95.9\%

   2. Zhang,J. (1992). Selecting typical instances in instance-based
      learning.  In Proceedings of the Ninth International Machine
      Learning Conference (pp. 470-479).  Aberdeen, Scotland: Morgan
      Kaufmann.\cr
      - Size of data set: only 369 instances (at that point in time)\cr
      - Applied 4 instance-based learning algorithms\cr
      - Collected classification results averaged over 10 trials\cr
      - Best accuracy result: \cr
         - 1-nearest neighbor: 93.7\%\cr
         - trained on 200 instances, tested on the other 169\cr
      - Also of interest:\cr
         - Using only typical instances: 92.2\% (storing only 23.1 instances)\cr
         - trained on 200 instances, tested on the other 169

}
\keyword{datasets}
    

\eof
\name{DNA}
\title{Primate splice-junction gene sequences (DNA)} 
\usage{data(DNA)}
\alias{DNA}
\format{A data frame with 3,186 observations on 180 variables, all
nominal and a target class.}

\description{It consists of 3,186 data points (splice junctions). The
    data points are described by 180 indicator binary
    variables and the problem is to recognize the 3 classes (ei, ie,
    neither), i.e., the boundaries between exons (the parts of the DNA
    sequence retained after splicing) and introns (the parts of the DNA
    sequence that are spliced out).
    
    The StatLog dna dataset is a processed version of the Irvine 
    database described below. The main difference is that the 
    symbolic variables representing the nucleotides (only A,G,T,C) 
    were replaced by 3 binary indicator variables. Thus the original 
    60 symbolic attributes were changed into 180 binary attributes.  
    The names of the examples were removed. The examples with 
    ambiguities were removed (there were very few of them, 4).   
    The StatLog version of this dataset was produced by Ross King
    at Strathclyde University. For original details see the Irvine 
    database documentation.

    The nucleotides A,C,G,T were given indicator values as follows:
    \tabular{cl}{
    	\tab A -> 1 0 0\cr
    	\tab C -> 0 1 0\cr
    	\tab G -> 0 0 1\cr
    	\tab T -> 0 0 0\cr
    }
    Hint. Much better performance is generally observed if attributes
    closest to the junction are used. In the StatLog version, this
    means using attributes A61 to A120 only.   
}
\source{
    \itemize{
       	\item Source:\cr
  	- all examples taken from Genbank 64.1 (ftp site:
	genbank.bio.net)\cr
       	- categories "ei" and "ie" include every "split-gene" 
        for primates in Genbank 64.1\cr
       	- non-splice examples taken from sequences known not to include
        a splicing site\cr
   	\item Donor: G. Towell, M. Noordewier, and J. Shavlik, 
        {towell,shavlik}@cs.wisc.edu, noordewi@cs.rutgers.edu
    }
    These data have been taken from: 
    \itemize{
    	\item ftp.stams.strath.ac.uk/pub/Statlog
    	    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\references{
     machine learning:\cr
       	-- M. O. Noordewier and G. G. Towell and J. W. Shavlik, 1991; 
           "Training Knowledge-Based Neural Networks to Recognize Genes in 
           DNA Sequences".  Advances in Neural Information Processing Systems,
           volume 3, Morgan Kaufmann.

	-- G. G. Towell and J. W. Shavlik and M. W. Craven, 1991;  
           "Constructive Induction in Knowledge-Based Neural Networks",  
           In Proceedings of the Eighth International Machine Learning
	   Workshop, Morgan Kaufmann.

        -- G. G. Towell, 1991;
           "Symbolic Knowledge and Neural Networks: Insertion, Refinement, and
           Extraction", PhD Thesis, University of Wisconsin - Madison.

        -- G. G. Towell and J. W. Shavlik, 1992;
           "Interpretation of Artificial Neural Networks: Mapping 
           Knowledge-based Neural Networks into Rules", In Advances in Neural
           Information Processing Systems, volume 4, Morgan Kaufmann.  
}
\keyword{datasets}
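\examples{
## A minimal sketch of the indicator coding described above. It works on a
## plain character string rather than on the data set, and the helper name
## encode_nucleotides is hypothetical (it is not part of any package).
encode_nucleotides <- function(s) {
  ## one row of three indicators per nucleotide, as in the table above
  codes <- rbind(A = c(1, 0, 0), C = c(0, 1, 0),
                 G = c(0, 0, 1), T = c(0, 0, 0))
  bases <- strsplit(s, "")[[1]]
  ## transpose so that the three indicators of each position stay adjacent
  as.vector(t(codes[bases, , drop = FALSE]))
}
encode_nucleotides("ACGT")
## 60 sequence positions with 3 indicators each give 180 binary attributes
length(encode_nucleotides(paste(rep("A", 60), collapse = "")))
}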
    

\eof
\name{Glass}
\alias{Glass}
\title{Glass Identification Database}
\usage{data(Glass)}
\keyword{datasets}
\description{A data frame with 214 observations containing examples of
  the chemical analysis of 7 different types of glass. The problem is to
  predict the type of glass on the basis of the chemical analysis.  The
  study of classification of types of glass was motivated by
  criminological investigation.  At the scene of the crime, the glass left
  can be used as evidence (if it is correctly identified!).
}
\format{
    A data frame with 214 observations on 10 variables:
    \tabular{cll}{
 [,1] \tab RI \tab refractive index\cr
 [,2] \tab Na \tab Sodium\cr
 [,3] \tab Mg \tab Magnesium\cr
 [,4] \tab Al \tab Aluminum\cr
 [,5] \tab Si \tab Silicon\cr
 [,6] \tab K  \tab Potassium\cr
 [,7] \tab Ca \tab Calcium\cr
 [,8] \tab Ba \tab Barium\cr
 [,9] \tab Fe \tab Iron \cr
[,10] \tab Type \tab Type of glass (class attribute) \cr 
}
}   
\source{
    \itemize{
       	\item Creator: B. German, Central Research Establishment, Home
	Office Forensic Science Service, Aldermaston, Reading, Berkshire
	RG7 4PN 
   	\item Donor: Vina Spiehler, Ph.D., DABFT, Diagnostic Products
	Corporation
    }
    
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}









\eof
\name{HouseVotes84}
\alias{HouseVotes84}
\title{United States Congressional Voting Records 1984}
\usage{data(HouseVotes84)}
\description{
    This data set includes votes for each of the U.S. House of
    Representatives Congressmen on the 16 key votes identified by the
    CQA.  The CQA lists nine different types of votes: voted for, paired
    for, and announced for (these three simplified to yea), voted
    against, paired against, and announced against (these three
    simplified to nay), voted present, voted present to avoid conflict
    of interest, and did not vote or otherwise make a position known
    (these three simplified to an unknown disposition).
}
\keyword{datasets}
\format{
    A data frame with 435 observations on 17 variables:
    \tabular{rl}{
   1 \tab Class Name: 2 (democrat, republican)\cr
   2 \tab handicapped-infants: 2 (y,n)\cr
   3 \tab water-project-cost-sharing: 2 (y,n)\cr
   4 \tab adoption-of-the-budget-resolution: 2 (y,n)\cr
   5 \tab physician-fee-freeze: 2 (y,n)\cr
   6 \tab el-salvador-aid: 2 (y,n)\cr
   7 \tab religious-groups-in-schools: 2 (y,n)\cr
   8 \tab anti-satellite-test-ban: 2 (y,n)\cr
   9 \tab aid-to-nicaraguan-contras: 2 (y,n)\cr
  10 \tab mx-missile: 2 (y,n)\cr
  11 \tab immigration: 2 (y,n)\cr
  12 \tab synfuels-corporation-cutback: 2 (y,n)\cr
  13 \tab education-spending: 2 (y,n)\cr
  14 \tab superfund-right-to-sue: 2 (y,n)\cr
  15 \tab crime: 2 (y,n)\cr
  16 \tab duty-free-exports: 2 (y,n)\cr
  17 \tab export-administration-act-south-africa: 2 (y,n)\cr
  }
}
\source{
    \itemize{
	\item Source: Congressional Quarterly Almanac, 98th Congress,
	2nd session 1984, Volume XL: Congressional Quarterly Inc.,
	Washington, D.C., 1985
	\item Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
    }

    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}
    

\eof
\name{Ionosphere}
\title{Johns Hopkins University Ionosphere database}
\usage{data(Ionosphere)}
\alias{Ionosphere}
\format{A data frame with 351 observations on 35 independent variables, some 
    numerical and 2 nominal, and one last defining the class.}

\description{
    This radar data was collected by a system in Goose Bay, Labrador.  This
   system consists of a phased array of 16 high-frequency antennas with a
   total transmitted power on the order of 6.4 kilowatts.  See the paper
   for more details.  The targets were free electrons in the ionosphere.
   "good" radar returns are those showing evidence of some type of structure 
   in the ionosphere.  "bad" returns are those that do not; their signals pass
   through the ionosphere.  

   Received signals were processed using an autocorrelation function whose
   arguments are the time of a pulse and the pulse number.  There were 17
   pulse numbers for the Goose Bay system.  Instances in this database are
   described by 2 attributes per pulse number, corresponding to the complex
   values returned by the function resulting from the complex electromagnetic
   signal. See the references cited below for more details.}
\source{
    \itemize{
       	\item Source: Space Physics Group; Applied Physics Laboratory;
	Johns Hopkins University; Johns Hopkins Road; Laurel; MD 20723 
        \item Donor: Vince Sigillito (vgs@aplcen.apl.jhu.edu)
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\references{
   Sigillito, V. G., Wing, S. P., Hutton, L. V., \& Baker, K. B. (1989).
      Classification of radar returns from the ionosphere using neural 
      networks. Johns Hopkins APL Technical Digest, 10, 262-266.

      They investigated using backprop and the perceptron training algorithm
      on this database.  Using the first 200 instances for training, which
      were carefully split almost 50\% positive and 50\% negative, they found
      that a "linear" perceptron attained 90.7\%, a "non-linear" perceptron
      attained 92\%, and backprop an average of over 96\% accuracy on the 
      remaining 150 test instances, consisting of 123 "good" and only 24 "bad"
      instances.  (There was a counting error or some mistake somewhere; there
      are a total of 351 rather than 350 instances in this domain.) Accuracy
      on "good" instances was much higher than for "bad" instances.  Backprop
      was tested with several different numbers of hidden units (in [0,15])
      and incremental results were also reported (corresponding to how well
      the different variants of backprop did after a periodic number of 
      epochs).

      David Aha (aha@ics.uci.edu) briefly investigated this database.
      He found that nearest neighbor attains an accuracy of 92.1\%, that
      Ross Quinlan's C4 algorithm attains 94.0\% (no windowing), and that
      IB3 (Aha \& Kibler, IJCAI-1989) attained 96.7\% (parameter settings:
      70\% and 80\% for acceptance and dropping respectively).

}
\keyword{datasets}
    

\eof
\name{LetterRecognition}
\title{Letter Image Recognition Data}
\usage{data(LetterRecognition)}
\alias{LetterRecognition}
\format{A data frame with 20,000 observations on 17 variables, the first
    is a factor with levels A-Z, the remaining 16 are numeric.

    \tabular{rll}{
 [,1] \tab lettr \tab  capital letter\cr
 [,2] \tab x.box \tab  horizontal position of box\cr
 [,3] \tab y.box \tab  vertical position of box\cr
 [,4] \tab width \tab  width of box\cr
 [,5] \tab high  \tab  height of box\cr
 [,6] \tab onpix \tab  total number of on pixels\cr
 [,7] \tab x.bar \tab  mean x of on pixels in box\cr
 [,8] \tab y.bar \tab  mean y of on pixels in box\cr
 [,9] \tab x2bar \tab  mean x variance\cr
[,10] \tab y2bar \tab  mean y variance\cr
[,11] \tab xybar \tab  mean x y correlation\cr
[,12] \tab x2ybr \tab  mean of \eqn{x^2 y} \cr
[,13] \tab xy2br \tab  mean of \eqn{x y^2} \cr
[,14] \tab x.ege \tab  mean edge count left to right\cr
[,15] \tab xegvy \tab  correlation of x.ege with y\cr
[,16] \tab y.ege \tab  mean edge count bottom to top\cr
[,17] \tab yegvx \tab  correlation of y.ege with x\cr
    }
}
\description{
   The objective is to identify each of a large number of black-and-white
   rectangular pixel displays as one of the 26 capital letters in the English
   alphabet.  The character images were based on 20 different fonts and each
   letter within these 20 fonts was randomly distorted to produce a file of
   20,000 unique stimuli.  Each stimulus was converted into 16 primitive
   numerical attributes (statistical moments and edge counts) which were then
   scaled to fit into a range of integer values from 0 through 15.  We
   typically train on the first 16000 items and then use the resulting model
   to predict the letter category for the remaining 4000.  See the article
   cited below for more details.
}
\source{
    \itemize{
       	\item Creator: David J. Slate
     	\item Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL 60201
   	\item Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867   
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}
\references{
    P. W. Frey and D. J. Slate (Machine Learning Vol 6/2 March 91):
    "Letter Recognition Using Holland-style Adaptive Classifiers".

    The research for this article investigated the ability of several
    variations of Holland-style adaptive classifier systems to learn to
    correctly guess the letter categories associated with vectors of 16
    integer attributes extracted from raster scan images of the letters.
    The best accuracy obtained was a little over 80\%.  It would be
    interesting to see how well other methods do with the same data.
}
\keyword{datasets}
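\examples{
## A minimal sketch of the train/test split mentioned in the description,
## assuming the rows are kept in their original order. Only row indices are
## built here; the names train_rows and test_rows are illustrative.
n_total <- 20000
train_rows <- seq_len(16000)
test_rows <- setdiff(seq_len(n_total), train_rows)
range(test_rows)
## With the data loaded, LetterRecognition[train_rows, ] and
## LetterRecognition[test_rows, ] would select the two parts.
}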
    

\eof
\name{Ozone}
\alias{Ozone}
\title{Los Angeles ozone pollution data, 1976}
\usage{data(Ozone)}
\keyword{datasets}
\description{A data frame with 366 observations on 13 variables; each
  observation is one day.}
\format{
    \tabular{rl}{
   1 \tab Month: 1 = January, ..., 12 = December\cr
   2 \tab Day of month\cr
   3 \tab Day of week: 1 = Monday, ..., 7 = Sunday\cr
   4 \tab Daily maximum one-hour-average ozone reading\cr
   5 \tab 500 millibar pressure height (m) measured at Vandenberg AFB\cr
   6 \tab Wind speed (mph) at Los Angeles International Airport (LAX)\cr
   7 \tab Humidity (\%) at LAX\cr
   8 \tab Temperature (degrees F) measured at Sandburg, CA\cr
   9 \tab Temperature (degrees F) measured at El Monte, CA\cr
  10 \tab Inversion base height (feet) at LAX\cr
  11 \tab Pressure gradient (mm Hg) from LAX to Daggett, CA\cr
  12 \tab Inversion base temperature (degrees F) at LAX\cr
  13 \tab Visibility (miles) measured at LAX\cr
  }
}
\details{
The problem is to predict the daily maximum one-hour-average
ozone reading (V4).
}
\source{
    Leo Breiman, Department of Statistics, UC Berkeley.  Data used in
    Leo Breiman and Jerome H. Friedman (1985), Estimating optimal
    transformations for multiple regression and correlation, JASA, 80, pp.
    580-598.
}
    

\eof
\name{PimaIndiansDiabetes}
\alias{PimaIndiansDiabetes}
\title{Pima Indians Diabetes Database}
\usage{data(PimaIndiansDiabetes)}
\keyword{datasets}
\description{
    A data frame with 768 observations on 9 variables.}
\format{
    \tabular{rl}{
   1 \tab Number of times pregnant\cr
   2 \tab Plasma glucose concentration (glucose tolerance test)\cr
   3 \tab Diastolic blood pressure (mm Hg)\cr
   4 \tab Triceps skin fold thickness (mm)\cr
   5 \tab 2-Hour serum insulin (mu U/ml)\cr
   6 \tab Body mass index (weight in kg/(height in m)^2)\cr
   7 \tab Diabetes pedigree function\cr
   8 \tab Age (years)\cr
   9 \tab Class variable (test for diabetes)\cr
  }
}
\source{
    \itemize{
	\item Original owners: National Institute of Diabetes and Digestive and
        Kidney Diseases
	\item Donor of database: Vincent Sigillito
	(vgs@aplcen.apl.jhu.edu)
    }
    
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}
    

\eof
\name{Satellite}
\alias{Satellite}
\title{Landsat Multi-Spectral Scanner Image Data}
\description{
  The database consists of the multi-spectral values of pixels in 3x3
  neighbourhoods in a satellite image, and the classification associated
  with the central pixel in each neighbourhood.  The aim is to predict
  this classification, given the multi-spectral values.
}
\usage{data(Satellite)}
\format{
  A data frame with 36 inputs (\code{x.1 \ldots x.36}) and one target
  (\code{classes}).
}
\details{
  One frame of Landsat MSS imagery consists of four digital images of
  the same scene in different spectral bands.  Two of these are in the
  visible region (corresponding approximately to green and red regions
  of the visible spectrum) and two are in the (near) infra-red.  Each
  pixel is an 8-bit binary word, with 0 corresponding to black and 255 to
  white. The spatial resolution of a pixel is about 80m x 80m.  Each
  image contains 2340 x 3380 such pixels.
    
  The database is a (tiny) sub-area of a scene, consisting of 82 x 100
  pixels. Each line of data corresponds to a 3x3 square neighbourhood of
  pixels completely contained within the 82x100 sub-area.  Each line
  contains the pixel values in the four spectral bands (converted to
  ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number
  indicating the classification label of the central pixel.

  The classes are
  \tabular{l}{
    red soil\cr
    cotton crop\cr
    grey soil\cr
    damp grey soil\cr
    soil with vegetation stubble\cr
    very damp grey soil\cr
  }

  The data is given in random order and certain lines of data have been
  removed so you cannot reconstruct the original image from this
  dataset.
	
  In each line of data the four spectral values for the top-left pixel
  are given first followed by the four spectral values for the
  top-middle pixel and then those for the top-right pixel, and so on
  with the pixels read out in sequence left-to-right and top-to-bottom.
  Thus, the four spectral values for the central pixel are given by
  attributes 17,18,19 and 20.  If you like you can use only these four
  attributes, while ignoring the others.  This avoids the problem which
  arises when a 3x3 neighbourhood straddles a boundary.
}
\section{Origin}{
  The original Landsat data for this database was generated from data
  purchased from NASA by the Australian Centre for Remote Sensing, and
  used for research at: The Centre for Remote Sensing, University of New
  South Wales, Kensington, PO Box 1, NSW 2033, Australia.

  The sample database was generated taking a small section (82 rows and
  100 columns) from the original data.  The binary values were converted
  to their present ASCII form by Ashwin Srinivasan.  The classification
  for each pixel was performed on the basis of an actual site visit by
  Ms. Karen Hall, when working for Professor John A. Richards, at the
  Centre for Remote Sensing at the University of New South Wales,
  Australia. Conversion to 3x3 neighbourhoods and splitting into test
  and training sets was done by Alistair Sutherland.
}
\section{History}{
  The Landsat satellite data is one of the many sources of information
  available for a scene. The interpretation of a scene by integrating
  spatial data of diverse types and resolutions including multispectral
  and radar data, maps indicating topography, land use etc. is expected
  to assume significant importance with the onset of an era characterised
  by integrative approaches to remote sensing (for example, NASA's Earth
  Observing System commencing this decade). Existing statistical methods 
  are ill-equipped for handling such diverse data types. Note that this
  is not true for Landsat MSS data considered in isolation (as in
  this sample database). This data satisfies the important requirements
  of being numerical and at a single resolution, and standard
  maximum-likelihood classification performs very well. Consequently,
  for this data, it should be interesting to compare the performance
  of other methods against the statistical approach.
}
\source{
  Ashwin Srinivasan,
  Department of Statistics and Data Modeling,
  University of Strathclyde,
  Glasgow,
  Scotland,
  UK,
  \email{ross@uk.ac.turing}

  These data have been taken from the UCI Repository Of Machine Learning
  Databases at
  \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
  }
  and were converted to R format by Friedrich.Leisch@ci.tuwien.ac.at.
}
\keyword{datasets}
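\examples{
## A minimal sketch of the attribute layout described in the details,
## assuming pixels are stored row by row with four band values each.
## The helper name pixel_attributes is hypothetical.
pixel_attributes <- function(row, col, bands = 4) {
  pixel <- (row - 1) * 3 + col          # position 1..9 within the 3x3 block
  (pixel - 1) * bands + seq_len(bands)  # attribute numbers of that pixel
}
pixel_attributes(2, 2)                  # central pixel
## matching input names, following the x.1 ... x.36 naming given in the format
paste("x", pixel_attributes(2, 2), sep = ".")
}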

\eof
\name{Servo}
\title{Servo Data}
\usage{data(Servo)}
\alias{Servo}
\format{A data frame with 167 observations on 5 variables, 4 nominal and
    1 as the target class.}

\description{This data set is from a simulation of a servo system
    involving a servo amplifier, a motor, a lead screw/nut, and a
    sliding carriage of some sort. It may have been one of the
    translational axes of a robot on the 9th floor of the AI lab. In any
    case, the output value is almost certainly a rise time, or the time
    required for the system to respond to a step change in a position
    set point. The variables that describe the data set and their values
    are the following:
    
    \tabular{cll}{
	[,1] \tab Motor \tab A,B,C,D,E\cr
	[,2] \tab Screw \tab A,B,C,D,E\cr
    	[,3] \tab Pgain \tab 3,4,5,6\cr
    	[,4] \tab Vgain \tab 1,2,3,4,5\cr
    	[,5] \tab Class \tab 0.13 to 7.10
    }
}
\source{
    \itemize{
       	\item Creator: Karl Ulrich (MIT) in 1986
	\item Donor: Ross Quinlan 
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\references{
    1. Quinlan, J.R., "Learning with continuous classes", Proc. 5th
    Australian Joint Conference on AI (eds A. Adams and L. Sterling),
    Singapore: World Scientific, 1992.

    2. Quinlan, J.R., "Combining instance-based and model-based
    learning", Proc. ML'93 (ed P.E. Utgoff), San Mateo: Morgan Kaufmann, 1993.
}
\keyword{datasets}
    


\eof
\name{Shuttle}
\title{Shuttle Dataset (Statlog version)}
\usage{data(Shuttle)}
\alias{Shuttle}
\format{A data frame with 58,000 observations on 9 numerical independent
    variables and 1 target class.}

\description{The shuttle dataset contains 9 attributes all of which are
    numerical with the first one being time.  The last column is the class
    with the following 7 levels: Rad.Flow, Fpv.Close, Fpv.Open, High, Bypass,
    Bpv.Close, Bpv.Open.
    
    Approximately 80\% of the data belongs to class 1. Therefore the
    default accuracy is about 80\%. The aim here is to obtain an
    accuracy of 99 - 99.9\%.

}
\source{
    \itemize{
       	\item Source: Jason Catlett of Basser Department of Computer
	Science; University of Sydney; N.S.W.; Australia.
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\keyword{datasets}
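\examples{
## A minimal sketch of the default accuracy mentioned above: the share of
## the most frequent class. The label vector is a made-up toy example,
## not taken from the data set.
toy_class <- factor(rep(c("Rad.Flow", "High"), times = c(8, 2)))
default_accuracy <- max(table(toy_class)) / length(toy_class)
default_accuracy
}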
    


\eof
\name{Sonar}
\title{Sonar, Mines vs. Rocks}
\usage{data(Sonar)}
\alias{Sonar}
\format{A data frame with 208 observations on 61 variables, all numerical and one (the Class) nominal.}

\description{This is the data set used by Gorman and Sejnowski in their
    study of the classification of sonar signals using a neural network
    [1]. The task is to train a network to discriminate between sonar
    signals bounced off a metal cylinder and those bounced off a roughly
    cylindrical rock.  
    
    Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each
    number represents the energy within a particular frequency band,
    integrated over a certain period of time. The integration aperture
    for higher frequencies occurs later in time, since these frequencies
    are transmitted later during the chirp.
    
    The label associated with each record contains the letter "R" if the
    object is a rock and "M" if it is a mine (metal cylinder). The
    numbers in the labels are in increasing order of aspect angle, but
    they do not encode the angle directly. 
}

\source{
    \itemize{
       	\item Contribution: Terry Sejnowski, Salk Institute and
	University of California, San Diego.
	\item Development: R. Paul Gorman, Allied-Signal Aerospace
	Technology Center. 
	\item Maintainer: Scott E. Fahlman 
	
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}

\references{
    1. Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden
    Units in a Layered Network Trained to Classify Sonar Targets" in
    Neural Networks, Vol. 1, pp. 75-89. 
    }
    
    \keyword{datasets}
    


\eof
\name{Soybean}
\title{Soybean Database}
\usage{data(Soybean)}
\alias{Soybean}
\format{A data frame with 683 observations on 36 variables. There are 35
    categorical attributes, all numerical and a nominal denoting the
    class.
    \tabular{cll}{
	[,1] \tab Class \tab the 19 classes\cr
	[,2] \tab date \tab
	apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).\cr
	[,3] \tab plant.stand \tab normal(0),lt-normal(1).\cr
    	[,4] \tab precip \tab lt-norm(0),norm(1),gt-norm(2).\cr
    	[,5] \tab temp \tab lt-norm(0),norm(1),gt-norm(2).\cr
    	[,6] \tab hail \tab yes(0),no(1).\cr
    	[,7] \tab crop.hist \tab dif-lst-yr(0),s-l-y(1),s-l-2-y(2),
	s-l-7-y(3).\cr
    	[,8] \tab area.dam \tab
	scatter(0),low-area(1),upper-ar(2),whole-field(3).\cr
    	[,9] \tab sever \tab minor(0),pot-severe(1),severe(2).\cr
    	[,10] \tab seed.tmt \tab none(0),fungicide(1),other(2).\cr
   	[,11] \tab germ \tab 90-100\%(0),80-89\%(1),lt-80\%(2).\cr
   	[,12] \tab plant.growth \tab norm(0),abnorm(1).\cr
   	[,13] \tab leaves \tab norm(0),abnorm(1).\cr
   	[,14] \tab leaf.halo \tab
	absent(0),yellow-halos(1),no-yellow-halos(2).\cr
   	[,15] \tab leaf.marg \tab w-s-marg(0),no-w-s-marg(1),dna(2).\cr
   	[,16] \tab leaf.size \tab lt-1/8(0),gt-1/8(1),dna(2).\cr
   	[,17] \tab leaf.shread \tab absent(0),present(1).\cr
   	[,18] \tab leaf.malf \tab absent(0),present(1).\cr
   	[,19] \tab leaf.mild \tab absent(0),upper-surf(1),lower-surf(2).\cr
   	[,20] \tab stem \tab norm(0),abnorm(1).\cr
   	[,21] \tab lodging \tab	yes(0),no(1).\cr
   	[,22] \tab stem.cankers \tab
	absent(0),below-soil(1),above-s(2),ab-sec-nde(3).\cr
   	[,23] \tab canker.lesion \tab dna(0),brown(1),dk-brown-blk(2),tan(3).\cr
   	[,24] \tab fruiting.bodies \tab absent(0),present(1).\cr
   	[,25] \tab ext.decay \tab absent(0),firm-and-dry(1),watery(2).\cr
   	[,26] \tab mycelium \tab absent(0),present(1).\cr
   	[,27] \tab int.discolor \tab none(0),brown(1),black(2).\cr
   	[,28] \tab sclerotia \tab absent(0),present(1).\cr
   	[,29] \tab fruit.pods \tab norm(0),diseased(1),few-present(2),dna(3).\cr
   	[,30] \tab fruit.spots \tab
	absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).\cr
   	[,31] \tab seed \tab norm(0),abnorm(1).\cr
   	[,32] \tab mold.growth \tab absent(0),present(1).\cr
   	[,33] \tab seed.discolor \tab absent(0),present(1).\cr
   	[,34] \tab seed.size \tab norm(0),lt-norm(1).\cr
   	[,35] \tab shriveling \tab absent(0),present(1).\cr
   	[,36] \tab roots \tab norm(0),rotted(1),galls-cysts(2).

}
    }

\description{
    There are 19 classes, only the first 15 of which have been used in prior
    work.  The folklore seems to be that the last four classes are
    unjustified by the data since they have so few examples.
    There are 35 categorical attributes, some nominal and some ordered.  The
    value ``dna'' means ``does not apply''.  The values for attributes are
    encoded numerically, with the first value encoded as ``0,'' the second as
    ``1,'' and so forth. 
  }
\source{
    \itemize{
       	\item Source: R.S. Michalski and R.L. Chilausky "Learning by
	Being Told and Learning from Examples: An Experimental
	Comparison of the Two Methods of Knowledge Acquisition in the
	Context of Developing an Expert System for Soybean Disease
	Diagnosis", International Journal of Policy Analysis and
	Information Systems, Vol. 4, No. 2, 1980.
        \item Donor: Ming Tan & Jeff Schlimmer (Jeff.Schlimmer\%cs.cmu.edu)
    }
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}
\references{
    Tan, M., & Eshelman, L. (1988). Using weighted networks to represent
    classification knowledge in noisy domains.  Proceedings of the Fifth
    International Conference on Machine Learning (pp. 121-134). Ann Arbor,
    Michigan: Morgan Kaufmann.
    -- IWN recorded a 97.1\% classification accuracy 
    -- 290 training and 340 test instances
	    
    Fisher, D.H. & Schlimmer, J.C. (1988). Concept Simplification and
    Predictive Accuracy. Proceedings of the Fifth
    International Conference on Machine Learning (pp. 22-28). Ann Arbor,
    Michigan: Morgan Kaufmann.
    -- Notes why this database is highly predictable
}
\keyword{datasets}
    

\eof
\name{Vehicle}
\alias{Vehicle}
\title{Vehicle Silhouettes}
\usage{data(Vehicle)}

\keyword{datasets}
\format{
    A data frame with 846 observations on 19 variables: 18 numerical
    features and one nominal variable defining the class of the objects.
    
    \tabular{cll}{
   [,1] \tab Comp \tab Compactness\cr
   [,2] \tab Circ \tab Circularity\cr
   [,3] \tab D.Circ \tab Distance Circularity\cr
   [,4] \tab Rad.Ra \tab Radius ratio\cr
   [,5] \tab Pr.Axis.Ra \tab pr.axis aspect ratio\cr
   [,6] \tab Max.L.Ra \tab max.length aspect ratio\cr
   [,7] \tab Scat.Ra \tab scatter ratio\cr
   [,8] \tab Elong \tab elongatedness\cr
   [,9] \tab Pr.Axis.Rect \tab pr.axis rectangularity\cr
  [,10] \tab Max.L.Rect \tab max.length rectangularity\cr
  [,11] \tab Sc.Var.Maxis \tab scaled variance along major axis\cr
  [,12] \tab Sc.Var.maxis \tab scaled variance along minor axis\cr
  [,13] \tab Ra.Gyr \tab scaled radius of gyration\cr
  [,14] \tab Skew.Maxis \tab skewness about major axis\cr
  [,15] \tab Skew.maxis \tab skewness about minor axis\cr
  [,16] \tab Kurt.maxis \tab kurtosis about minor axis\cr
  [,17] \tab Kurt.Maxis \tab kurtosis about major axis\cr
  [,18] \tab Holl.Ra \tab hollows ratio\cr
  [,19] \tab Class \tab type
  }
}
\description{
    The purpose is to classify a given silhouette as one of four types
    of vehicle, using a set of features extracted from the
    silhouette. The vehicle may be viewed from one of many different
    angles. The features were extracted from the silhouettes by the HIPS
    (Hierarchical Image Processing System) extension BINATTS, which
    extracts a combination of scale independent features utilising both
    classical moments based measures such as scaled variance, skewness
    and kurtosis about the major/minor axes and heuristic measures such
    as hollows, circularity, rectangularity and compactness. 
    
    Four "Corgi" model vehicles were used for the experiment: a double
    decker bus, a Chevrolet van, a Saab 9000 and an Opel Manta 400. This
    particular combination of vehicles was chosen with the expectation
    that the bus, van and either one of the cars would be readily
    distinguishable, but it would be more difficult to distinguish
    between the cars. 
}
\source{
    \itemize{
	\item Creator: Drs. Pete Mowforth and Barry Shepherd, Turing
	Institute, Glasgow, Scotland.   
    }

    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}

\references{
    Turing Institute Research Memorandum TIRM-87-018 "Vehicle
    Recognition Using Rule Based Methods" by Siebert, J.P. (March 1987)
    }


\eof
\name{Vowel}
\alias{Vowel}
\title{Vowel Recognition (Deterding data)}
\usage{data(Vowel)}
\keyword{datasets}
\format{
    A data frame with 990 observations on 10 independent variables, one
    nominal and the others numerical, and one target class variable.}
   
\description{Speaker-independent recognition of the eleven steady-state
    vowels of British English using a specified training set of LPC
    derived log area ratios. The vowels are indexed by integers
    0-10. For each utterance, there are ten floating-point input values,
    with array indices 0-9. The vowels are the following: hid, hId, hEd,
    hAd, hYd, had, hOd, hod, hUd, hud, hed. 
}
\source{
    \itemize{
	\item Creator: Tony Robinson 
	\item Maintainer: Scott E. Fahlman, CMU
    }
    
    These data have been taken from the UCI Repository Of Machine Learning
    Databases at
    \itemize{
      \item \url{ftp://ftp.ics.uci.edu/pub/machine-learning-databases}
      \item \url{http://www.ics.uci.edu/~mlearn/MLRepository.html}
    }
    and were converted to R format by Evgenia.Dimitriadou@ci.tuwien.ac.at.
}

\references{
    D. H. Deterding, 1989, University of Cambridge, "Speaker
    Normalisation for Automatic Speech Recognition", submitted for PhD.
    
    M. Niranjan and F. Fallside, 1988, Cambridge University Engineering
    Department, "Neural Networks and Radial Basis Functions in
    Classifying Static Speech Patterns", CUED/F-INFENG/TR.22.
    
    Steve Renals and Richard Rohwer, "Phoneme Classification Experiments
    Using Radial Basis Functions", Submitted to the International Joint
    Conference on Neural Networks, Washington, 1989.
}




\eof
\name{as.data.frame.mlbench}
\alias{as.data.frame.mlbench}
\title{Convert an mlbench object to a dataframe}
\usage{
as.data.frame.mlbench(x, row.names=NULL, optional=FALSE)
}
\arguments{
  \item{x}{Object of class \code{"mlbench"}.}
  \item{row.names,optional}{currently ignored.}
}
\description{
    Converts \code{x} (which is basically a list) to a data frame.  }
\examples{
p <- mlbench.xor(5)
p
as.data.frame(p) }


\keyword{manip}

\eof
\name{bayesclass}
\alias{bayesclass}
\alias{bayesclass.noerr}
\alias{bayesclass.mlbench.2dnormals}
\alias{bayesclass.mlbench.circle}
\alias{bayesclass.mlbench.xor}
\alias{bayesclass.mlbench.cassini}
\alias{bayesclass.mlbench.cuboids}
\alias{bayesclass.mlbench.twonorm}
\alias{bayesclass.mlbench.threenorm}
\alias{bayesclass.mlbench.ringnorm}

\title{Bayes classifier}
\usage{
bayesclass(z)
}
\arguments{
 \item{z}{An object of class \code{"mlbench"}.}
}
\description{
    Returns the decision of the (optimal) Bayes classifier for a given
    data set. This is a generic function, i.e., there are different
    methods for the various mlbench problems.

    If the classes of the problem do not overlap, then the Bayes
    decision is identical to the true classification, which is
    implemented as the dummy function \code{bayesclass.noerr} (which
    simply returns \code{z$classes} and is used for all problems with
    disjoint classes).
}
\examples{
# 6 overlapping classes
p <- mlbench.2dnormals(500,6)
plot(p)

plot(p$x, col=as.numeric(bayesclass(p)))
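
# Sketch: estimate the Bayes error as the fraction of points whose Bayes
# decision differs from the true class. Labels are compared as character
# strings, assuming both factors use the same class labels.
err <- mean(as.character(bayesclass(p)) != as.character(p$classes))
err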
}

\keyword{classif}

\eof
\name{mlbench.2dnormals}
\alias{mlbench.2dnormals}
\title{2-dimensional Gaussian Problem}
\usage{
mlbench.2dnormals(n, cl=2, r=sqrt(cl), sd=1)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{cl}{number of classes}
    \item{r}{radius at which the centers of the classes are located}
    \item{sd}{standard deviation of the Gaussians}
}
\value{Returns an object of class \code{"mlbench.2dnormals"} with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    Each of the \code{cl} classes consists of a 2-dimensional
    Gaussian. The centers are equally spaced on a circle around the
    origin with radius \code{r}.
}
\examples{
# 2 classes
p <- mlbench.2dnormals(500,2)
plot(p)
# 6 classes
p <- mlbench.2dnormals(500,6)
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.cassini}
\alias{mlbench.cassini}
\title{Cassini: A 2 Dimensional Problem}
\usage{
mlbench.cassini(n, relsize=c(2,2,1))
}
\arguments{
    \item{n}{number of patterns to create}
    \item{relsize}{relative size of the classes (vector of length 3)}
}
\value{Returns an object of class \code{"mlbench.cassini"}  with components
    \item{x}{input values}
    \item{classes}{vector of length \code{n} with target classes} 
}
\description{
    The inputs of the cassini problem are uniformly distributed on
    a \code{2}-dimensional space within 3 structures. The 2 outer
    structures (classes) are banana-shaped; the middle structure (class),
    located between them, is a circle.
}

\author{Evgenia Dimitriadou and Andreas Weingessel}

\examples{
p <- mlbench.cassini(5000)
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.circle}
\alias{mlbench.circle}
\title{Circle in a Square Problem}
\usage{
mlbench.circle(n, d=2)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{d}{dimension of the circle problem}
}
\value{Returns an object of class \code{"mlbench.circle"}  with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the circle problem are uniformly distributed on
    the \code{d}-dimensional cube with corners \eqn{\{\pm 1\}}{\{+-1\}}. 
    This is a 2-class problem: The first class is a \code{d}-dimensional
    ball in the middle of the cube, the remainder forms the second
    class. The size of the ball is chosen such that both classes have equal
    prior probability 0.5.
}
\examples{
# 2d example
p<-mlbench.circle(300,2)
plot(p)
#
# 3d example
p<-mlbench.circle(300,3)
plot(p)
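#
# Sketch: the ball is sized so that both classes have prior probability
# 0.5, so the empirical class frequencies should be close to 0.5 each
p<-mlbench.circle(2000,2)
table(p$classes)/2000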
}
\keyword{datagen}

\eof
\name{mlbench.cuboids}
\alias{mlbench.cuboids}
\title{Cuboids: A 3 Dimensional Problem}
\usage{
mlbench.cuboids(n, relsize=c(2,2,2,1))
}
\arguments{
    \item{n}{number of patterns to create}
    \item{relsize}{relative size of the classes (vector of length 4)}
}
\value{Returns an object of class \code{"mlbench.cuboids"}  with components
    \item{x}{input values}
    \item{classes}{vector of length \code{n} with target classes} 
}
\description{
    The inputs of the cuboids problem are uniformly distributed on
    a \code{3}-dimensional space within 3 cuboids and a small
    cube in the middle of them. 
}

\author{Evgenia Dimitriadou and Andreas Weingessel}

\examples{
p <- mlbench.cuboids(7000)
plot(p)
\dontrun{
library(Rggobi)
g <- ggobi(p$x)
g$setColors(p$class)
g$setMode("2D Tour")
}}
\keyword{datagen}

\eof
\name{mlbench.friedman1}
\alias{mlbench.friedman1}
\title{Benchmark Problem Friedman 1}
\usage{
mlbench.friedman1(n, sd=1)
}
\arguments{
\item{n}{number of patterns to create}
\item{sd}{Standard deviation of noise}
}
\description{
The regression problem Friedman 1 as described in Friedman (1991) and
Breiman (1996). Inputs are 10 independent variables uniformly
distributed on the interval \eqn{[0,1]}; only 5 of these 10 are actually
used. Outputs are created according to
the formula
\deqn{y = 10 \sin(\pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + e}{
  y = 10 sin(\pi x1 x2) + 20 (x3 - 0.5)^2
  + 10 x4 + 5 x5 + e}

where e is N(0,sd).
}
\value{Returns a list with components
\item{x}{input values (independent variables)}
\item{y}{output values (dependent variable)}
}
\references{
Breiman, Leo (1996) Bagging predictors. Machine Learning 24, pages
123-140.

Friedman, Jerome H. (1991) Multivariate adaptive regression
splines. The Annals of Statistics 19 (1), pages 1-67. 
}
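\examples{
# Sketch (assumes sd=0 generates noise-free outputs): the outputs should
# then reproduce the formula in the description, so this comparison is
# expected to report equality
p <- mlbench.friedman1(200, sd=0)
x <- p$x
f <- 10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
     10 * x[,4] + 5 * x[,5]
all.equal(as.vector(p$y), f)
}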
\keyword{datagen}

\eof
\name{mlbench.friedman2}
\alias{mlbench.friedman2}
\title{Benchmark Problem Friedman 2}
\usage{
mlbench.friedman2(n, sd=125)
}
\arguments{
\item{n}{number of patterns to create}
\item{sd}{Standard deviation of noise. The default value of 125 gives
a signal to noise ratio (i.e., the ratio of the standard deviations) of
3:1. Thus, the variance of the function itself (without noise)
accounts for 90\% of the total variance.}
}
\description{
The regression problem Friedman 2 as described in Friedman (1991) and
Breiman (1996). Inputs are 4 independent variables uniformly
distributed over the ranges
\deqn{0 \le x1 \le 100}
\deqn{40 \pi \le x2 \le 560 \pi}
\deqn{0 \le x3 \le 1}
\deqn{1 \le x4 \le 11}

The outputs are created according to the formula
\deqn{y = (x1^2 + (x2 x3 - (1/(x2 x4)))^2)^{0.5} + e}
where e is N(0,sd).
}
\value{Returns a list with components
\item{x}{input values (independent variables)}
\item{y}{output values (dependent variable)}
}
\references{
Breiman, Leo (1996) Bagging predictors. Machine Learning 24, pages
123-140.

Friedman, Jerome H. (1991) Multivariate adaptive regression
splines. The Annals of Statistics 19 (1), pages 1-67. 
}
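\examples{
# Sketch (assumes sd=0 generates noise-free outputs): the standard
# deviation of the noise-free function, divided by the default noise
# level of 125, can be compared with the 3:1 ratio stated for sd
p <- mlbench.friedman2(5000, sd=0)
sd(p$y) / 125
}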
\keyword{datagen}


\eof
\name{mlbench.friedman3}
\alias{mlbench.friedman3}
\title{Benchmark Problem Friedman 3}
\usage{
mlbench.friedman3(n, sd=0.1)
}
\arguments{
\item{n}{number of patterns to create}
\item{sd}{Standard deviation of noise. The default value of 0.1 gives
a signal to noise ratio (i.e., the ratio of the standard deviations) of
3:1. Thus, the variance of the function itself (without noise)
accounts for 90\% of the total variance.}
}
\description{
The regression problem Friedman 3 as described in Friedman (1991) and
Breiman (1996). Inputs are 4 independent variables uniformly
distributed over the ranges
\deqn{0 \le x1 \le 100}
\deqn{40 \pi \le x2 \le 560 \pi}
\deqn{0 \le x3 \le 1}
\deqn{1 \le x4 \le 11}

The outputs are created according to the formula
\deqn{y = \mbox{atan}((x2 x3 - (1/(x2 x4)))/x1) + e}{
  y = atan ((x2 x3 - (1/(x2 x4)))/x1) + e}

where e is N(0,sd).
}
\value{Returns a list with components
\item{x}{input values (independent variables)}
\item{y}{output values (dependent variable)}
}
\references{
Breiman, Leo (1996) Bagging predictors. Machine Learning 24, pages
123-140.

Friedman, Jerome H. (1991) Multivariate adaptive regression
splines. The Annals of Statistics 19 (1), pages 1-67. 
}
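\examples{
# Sketch (assumes sd=0 generates noise-free outputs): recompute the
# formula from the description on the generated inputs and compare
p <- mlbench.friedman3(200, sd=0)
x <- p$x
f <- atan((x[,2] * x[,3] - 1 / (x[,2] * x[,4])) / x[,1])
all.equal(as.vector(p$y), f)
}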
\keyword{datagen}

\eof
\name{mlbench.peak}
\alias{mlbench.peak}
\title{Peak Benchmark Problem}
\usage{
mlbench.peak(n, d=20)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{d}{dimension of the problem}
}
\description{
    Let \eqn{r=3u} where u is uniform on
    [0,1]. Take x to be uniformly distributed on the d-dimensional
    sphere of radius r. Let \eqn{y = 25 \exp(-0.5 r^2)}{y = 25 exp(-0.5 r^2)}. This data set is not a
    classification problem but a regression problem where y is the
    dependent variable.
}
\value{Returns a list with components
\item{x}{input values (independent variables)}
\item{y}{output values (dependent variable)}
}
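
\examples{
# Sketch: recover the radius r of each point from x and compare y with
# 25 exp(-0.5 r^2); d=2 is chosen here only to keep the example small
p <- mlbench.peak(500, d=2)
r <- sqrt(rowSums(p$x^2))
all.equal(as.vector(p$y), 25 * exp(-0.5 * r^2))
plot(r, p$y)
}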

\keyword{datagen}

\eof
\name{mlbench.ringnorm}
\alias{mlbench.ringnorm}
\title{Ringnorm Benchmark Problem}
\usage{
mlbench.ringnorm(n, d=20)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{d}{dimension of the ringnorm problem}
}
\value{Returns an object of class \code{"mlbench.ringnorm"} with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the ringnorm problem are points from two Gaussian
    distributions. Class 1 is multivariate normal with mean 0 and
    covariance 4 times the identity matrix. Class 2 has unit covariance
    and mean \eqn{(a,a,\ldots,a)}, \eqn{a=d^{-0.5}}.

}
\references{
    Breiman, L. (1996). Bias, variance, and arcing classifiers.
    Tech. Rep. 460, Statistics Department, University of California,
    Berkeley, CA, USA.
}
\examples{
p<-mlbench.ringnorm(1000, d=2)
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.smiley}
\alias{mlbench.smiley}
\title{The Smiley}
\usage{
mlbench.smiley(n=500, sd1 = 0.1, sd2 = 0.05)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{sd1}{standard deviation for eyes}
    \item{sd2}{standard deviation for mouth}
}
\value{Returns an object of class \code{"mlbench.smiley"}  with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The smiley consists of 2 Gaussian eyes, a trapezoid nose and a
    parabola mouth (with vertical Gaussian noise).
}
\examples{
p<-mlbench.smiley()
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.spirals}
\alias{mlbench.spirals}
\alias{mlbench.1spiral}
\title{Two Spirals Benchmark Problem}
\usage{
mlbench.spirals(n, cycles=1, sd=0)
mlbench.1spiral(n, cycles=1, sd=0)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{cycles}{the number of cycles each spiral makes}
    \item{sd}{standard deviation of data points around the spirals}
}
\value{Returns an object of class \code{"mlbench.spirals"} with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the spirals problem are points on two entangled spirals. If
    \code{sd>0}, then Gaussian noise is added to each data
    point. \code{mlbench.1spiral} creates a single spiral.
}
\examples{
# 1 cycle each, no noise
p<-mlbench.spirals(300)
plot(p)
#
# 1.5 cycles each, with noise
p<-mlbench.spirals(300,1.5,0.05)
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.threenorm}
\alias{mlbench.threenorm}
\title{Threenorm Benchmark Problem}
\usage{
mlbench.threenorm(n, d=20)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{d}{dimension of the threenorm problem}
}
\value{Returns an object of class \code{"mlbench.threenorm"} with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the threenorm problem are points from two Gaussian
    distributions with unit covariance matrix. Class 1 is drawn with
    equal probability from a unit multivariate normal with mean
    \eqn{(a,a,\ldots,a)} and from a unit multivariate normal with mean 
    \eqn{(-a,-a,\ldots,-a)}. Class 2 is drawn from a multivariate normal
    with mean at \eqn{(a,-a,a, \ldots,-a)}, \eqn{a=2/d^{0.5}}.

}
\references{
    Breiman, L. (1996). Bias, variance, and arcing classifiers.
    Tech. Rep. 460, Statistics Department, University of California,
    Berkeley, CA, USA.
}
\examples{
p<-mlbench.threenorm(1000, d=2)
plot(p)
}
\keyword{datagen}

\eof
\name{mlbench.twonorm}
\alias{mlbench.twonorm}
\title{Twonorm Benchmark Problem}
\usage{
mlbench.twonorm(n, d=20)
}
\arguments{
    \item{n}{number of patterns to create}
    \item{d}{dimension of the twonorm problem}
}
\value{Returns an object of class \code{"mlbench.twonorm"} with components
    \item{x}{input values}
    \item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the twonorm problem are points from two Gaussian
    distributions with unit covariance matrix. Class 1 is multivariate
    normal with mean \eqn{(a,a,\ldots,a)} and class 2 with mean
    \eqn{(-a,-a,\ldots,-a)}, \eqn{a=2/d^{0.5}}.

}
\references{
    Breiman, L. (1996). Bias, variance, and arcing classifiers.
    Tech. Rep. 460, Statistics Department, University of California,
    Berkeley, CA, USA.
}
\examples{
p<-mlbench.twonorm(1000, d=2)
plot(p)
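#
# Sketch: per-class column means of the inputs, printed next to
# 2/sqrt(d) (d=2 here) for comparison with the class means +a and -a
p<-mlbench.twonorm(5000, d=2)
m <- apply(p$x, 2, function(v) tapply(v, p$classes, mean))
m
2/sqrt(2)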
}
\keyword{datagen}

\eof
\name{mlbench.waveform}
\alias{mlbench.waveform}
\title{Waveform Database Generator (written in C)}
\usage{
  mlbench.waveform(n)
}
\arguments{
  \item{n}{number of patterns to create}
}

\value{
  Returns an object of class \code{"mlbench.waveform"} with components
  \item{x}{input values}
  \item{classes}{factor vector of length \code{n} with target classes}
}

\description{
    The generated data set consists of 21 attributes with continuous
    values and a variable showing the 3 classes (33\% for each of 3
    classes). Each class is generated from a combination of 2 of 3
    "base" waves. 
}
\references{
  Breiman, L. (1996). Bias, variance, and arcing
  classifiers. Tech. Rep. 460, Statistics Department, University of
  California, Berkeley, CA, USA.
}

\examples{
  p<-mlbench.waveform(100)
  plot(p)
}

\keyword{datagen}

\eof
\name{mlbench.xor}
\alias{mlbench.xor}
\title{Continuous XOR Benchmark Problem}
\usage{
mlbench.xor(n, d=2)
}
\arguments{
\item{n}{number of patterns to create}
\item{d}{dimension of the XOR problem}
}
\value{Returns an object of class \code{"mlbench.xor"} with components
\item{x}{input values}
\item{classes}{factor vector of length \code{n} with target classes} 
}
\description{
    The inputs of the XOR problem are uniformly distributed on
    the \code{d}-dimensional cube with corners \eqn{\{\pm 1\}}{\{+-1\}}. Each pair of
    opposite corners forms one class, hence the total number of classes is
    \eqn{2^{d-1}}{2^(d-1)}.
}
\examples{
# 2d example
p<-mlbench.xor(300,2)
plot(p)
#
# 3d example
p<-mlbench.xor(300,3)
plot(p)
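#
# Sketch: count the distinct classes that occur in a sample; for d=3
# the result can be compared with 2^(d-1) = 4
p<-mlbench.xor(300,3)
length(unique(p$classes))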
}
\keyword{datagen}

\eof
\name{plot.mlbench}
\alias{plot.mlbench}
\title{Plot mlbench objects}
\usage{
plot.mlbench(x, xlab="", ylab="", ...)
}
\arguments{
 \item{x}{Object of class \code{"mlbench"}.}
 \item{xlab}{Label for x-axis.}
 \item{ylab}{Label for y-axis.}
 \item{\dots}{Further plotting options.}
}
\description{
    Plots the data of an mlbench object using different colors for each
    class. If the dimension of the input space is larger than 2, a pairs
    plot is issued.
}
\examples{
# 6 normal classes
p <- mlbench.2dnormals(500,6)
plot(p)
# 4-dimensional XOR
p <- mlbench.xor(500,4)
plot(p)
}

\keyword{hplot}

\eof
