============================
GGPlot Architecture & Design
============================

The basic structure of the GGPlot package is that there is a data definition
and basic visual parameter configuration (aka a "template") which can be
attached to a "ggplot" object.  Then various geometry specifications are added
to it, like geom_point and geom_line, and they inherit the definition from
their upstream pipeline node, and potentially override any parameter with
arguments they were given.

Data parameters and visual parameters are both encapsulated in the "aesthetic"
(which makes sense in kind of a bizarre way), and are injected into the
pipeline by passing the return value of a call to aes() as an argument into
either the ggplot() or geom_*() functions.

It should be noted that different kind of geoms can have different aesthetic
parameters.  So the aesthetic parameters that don't apply to a given geom is
just ignored.  The aesthetic can be thought of as a stylesheet on which
geometry just looks up visual attributes.

Useful information can be found in:
http://www.ling.upenn.edu/~joseff/rstudy/summer2010_ggplot2_intro.html


Aesthetics ----------

Whereas facets are data-driven layout, the Aesthetic specification system in
ggplot is a data-driven stylesheet mechanism.  Renderers that reference the
aesthetic will use its visual parameters.  Additionally, by assigning
categorical data nodes out of the datagraph to visual parameters, the Aesthetic
hierarchy further expands the number of unique renderer configurations that
appear in the plot, just as giving a categorical data node to facet_wrap causes
a multiplicity of renderers to be invoked.

The same group= argument for aes() doesn't apply to facet() because whereas
there is an obvious way to visually merge two aesthetically different
renderers/geoms in the same panel, there is not a similar kind of guidance on
how two panels should always line up.  Hence, facet_grid and facet_wrap
hardcode particular kinds of inter-panel alignment.

Recognizing that the role of aes() is two fold, we can tackle one of its roles,
the rendering generation part, by using list comprehensions or some other
vectorial expression.  This same vectorial expression/expansion mechanism can
then be used to specify a multiplicity of panels, in a way that is more
flexible than just facet_grid and facet_wrap.

Plots convey information through various aspects of their aesthetics. Some
aesthetics that plots use are:
    x position
    y position
    size of elements
    shape of elements
    color of elements

The elements in a plot are geometric shapes, like
    points
    lines
    line segments
    bars
    text

Some of these geometries have their own particular aesthetics. For instance:
    points
        point shape
        point size
    lines
        line type
        line weight
    bars
        y minimum
        y maximum
        fill color
        outline color
    text
        label value

There are other basics of these graphics that you can adjust, like the scaling
of the aesthetics, and the positions of the geometries.

Statistics
----------

The values represented in the plot are the product of various statistics. If
you just plot the raw data, you can think of each point representing the
identity statistic. Many bar charts represent the mean or median statistic.
Histograms are bar charts where the bars represent the binned count or density
statistic.

To add stats annotations to a plot, you use the stats_*() functions. These
process the input data and then render them with a default geom, which can be
changed.

Grouping
--------

If a categorical factor is used for any of the aes() parameters, then the
aes node is tagged as having a grouping, and that grouping is used by
all downstream geoms and stats to vectorize over the appropriate subsetting
of the source data.

Different aesthetic parameters can each have their own grouping; in that case,
groups are defined as unique combinations of each set of factors.

A grouping can also be explicitly created with the group= parameter for aes().
In this case, no visual attributes are being specified, but the geoms and
stats are being vectorially created over some sharding of the dataset.

Faceting
--------

Two faceting functions just control layout: facet_wrap() and facet_grid().
facet_wrap() is a one-factor form that uses the given factor to produce a 2D
wrapping of a 1D array of plots. A 2D grid of plots can be created using
facet_grid(factorA, factorB).

They will share axes by default; to free this restriction, just pass in the
'scales="free"' parameter.  If you only want one or the other axis to be
free, then use "free_y" or "free_x".

Scales
------

scale_*() functions share a common set of arguments:
    name
    limits (min, max)
    breaks (labeled breaks for the data)
    labels (labels for the breaks)
    trans (transformations to use on the data)

Scale functions are formatted as:
    scale_AESTHETIC-NAME_SCALE-NAME()

So:
    p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

    p + scale_x_continuous(label = "Engine Displacement in Liters")
    #or
    p + xlab("Engine Displacement in Liters")

    p + scale_x_continuous(limits = c(2,4))
    #or
    p + xlim(2, 4)

    p + scale_x_continuous(trans = "log10)
    #or
    p + scale_x_log10()

It's important to note that functions like xlim() actually throw away data.

For color and fill scales:

    p <- p + aes(color = factor(cyl))
    p + scale_color_hue(label = "Cylinders")
        
    scale_color_brewer(pal = 'palette_name')
    scale_fill_gradient(high="colorname", low="colorname")

