TITLE

Perlpp: cpp on steroids.

RESOURCES

Perlpp, and the examples of this article, are available as a tar file by
anonymous ftp:

[cw]
  ftp://zot.mesastate.edu/pub/wmacevoy/perlpp/lj-article.tar.gz
  ftp://zot.mesastate.edu/pub/wmacevoy/perlpp/perlpp-0.5.tar.gz
[ecw]

The distribution includes installation instructions for perlpp.  You
may also check the TROUBLESHOOTING section of this article if you have
problems running perlpp.

You must have perl 5, or later, installed to use perlpp.  Most Linux
distributions have perl available as a package in some form.  The web page

[cw]
  http://www.perl.org
[ecw]

is a great place to begin if you want to learn more about perl.

ABOUT THE AUTHOR

Dr. Warren MacEvoy has been a happy Linux user since 1993.  
He enjoys reading with his son, Bryce, eating really hot food, and
listening to disturbingly loud music.  He also hates putting commas
inside of "quotes," especially when the comma is not part of what he's
quoting! 

0. PRELIMINARIES

Using perlpp, the perl-pre-processor, requires at least rudimentary
knowledge about programming in perl.  Perl 5, or later, must be
installed on your system as well.  

Since perl is such a useful language, almost every programmer should
know a little about it.  The point of this section is to cover some of
the rudiments of perl that we use in the examples.  If you are fairly
comfortable with perl, then move on to the next section!

If you are still reading, here are the basic basics of perl:

  <B>Variables.<B>  Scalar variables, which can take on values of
  strings, integers, or doubles, always have a [cw]$[ecw]-sign as the
  first character.  List variables, which are simple lists of scalars,
  always have an [cw]@[ecw]-sign as the first character.  All
  variables are global, unless preceded by 'my' when first used within
  a block. 

  <B>String quoting.<B>  We make a great deal of use of the three ways
  strings can be quoted in perl.  They can be [cw]'quoted almost
  exactly'[ecw], [cw]"quoted with interpolation"[ecw], or [cw]`system
  quotes`[ecw].  We will present  more detail on this later, but: 

    Single-quoted strings are subject to minimal translation. For
    example, [cw]'\n'[ecw] is a backslash followed by the letter n.

    Double-quoted strings have a great deal of translation.  For
    example, [cw]"i=$i\n"[ecw] is the characters [cw]i=[ecw],
    followed by the value of the variable [cw]$i[ecw], followed by a
    newline character.  In perl parlance, double-quoted strings are
    said to be <I>interpolated<I>. 

    Back-quoted strings are interpolated like double-quoted strings,
    but the value of a back-quoted string is the output (whatever
    is sent to STDOUT) of executing the translated string as
    a shell command. For example, [cw]`ls $dir`[ecw] is the output of
    running the [cw]ls[ecw] command with the value of [cw]$dir[ecw] as
    an argument. 

  <B>Loops.<B> Perl supports the csh-style loop of the form

[cw]
      foreach $index (@LIST) { statement1; statement2; .... }
[ecw]

    as well as the C-style loop

[cw]
      for (do-once; check-first-each-time; do-last-each-time) { .... }
[ecw]

  We use both kinds.
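
  The short script below is a minimal sketch that pulls these basics
  together.  The variable names are made up for illustration, and the
  [cw]use strict[ecw] line (which forces every variable to be declared
  with [cw]my[ecw]) is an addition that the perlpp examples do not use.
  It assumes a Unix shell with an [cw]echo[ecw] command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $i     = 3;                       # scalar: starts with $
my @words = ('uno', 'dos', 'tres');  # list: starts with @

my $exact  = 'i=$i\n';               # single quotes: no interpolation
my $interp = "i=$i\n";               # double quotes: $i and \n are translated
my $system = `echo i=$i`;            # back quotes: output of a shell command

print length($exact), "\n";          # six characters: i = $ i \ n
print $interp;
print $system;

# csh-style loop over a list
my @lines;
foreach my $w (@words) {
    push @lines, "word: $w";
}

# C-style loop with an explicit index
for (my $k = 0; $k < @words; ++$k) {
    push @lines, "entry $k is $words[$k]";
}
print "$_\n" for @lines;
```

  Note that the single-quoted string keeps its dollar sign and
  backslash, while the double-quoted and back-quoted strings both see
  the value of [cw]$i[ecw].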

In fact, the basic syntax of perl mimics C in many respects, so that
C programmers can read perl scripts pretty easily.  No, that is too
bold: a C programmer can write C-looking perl, and it will mostly work
as expected.  A perl programmer would solve the same problem in a
completely different manner.  In doing so, she may perhaps accomplish
something difficult to imagine: she may write a program that is more
obscure than what can readily be done in C. 

If you don't believe me, look at the perlpp source, which is itself
a perl script.

Perl is a great deal more than this tiny view I've presented here, but
these ideas should be enough to understand the examples.  See the
resources box for more information about perl.  Now on to perlpp!

1. INTRODUCTION

The point of this article is to introduce a tool which I call perlpp,
the perl-preprocessor.  Since I only just wrote it, perlpp is not
available in any Linux distribution.  Please see the resources box for
information on obtaining perlpp and the examples described in this
article.

Perlpp is a beefy sister of cpp, the C-preprocessor:  she can do what cpp
can do, and much more.  For example, introducing the idea of code templates
in any programming language is easy using perlpp.

Let's begin by talking about perlpp's older brother, cpp.
C programmers don't get far before learning that C programs, at least
logically, pass through two stages of translation.  The first stage,
the pre-processing stage, is when commands such as

[cw]
  #include <stdio.h>
[ecw]

and

[cw]
  #define FOO(x) bar(x)
[ecw]

are used to translate the hybrid C/cpp input file into a pure C input
file, which is then the input to the pure C compiler.  Pictorially,

  input file ==> cpp ==> cc1 ==> object file

While the intended job of cpp is to pre-process input files for a
C (or C++) compiler, it is also used to pre-process other files.  For
example, xrdb uses cpp to pre-process X11 resource files before
loading them.

Cpp is a very useful tool, but a programmer can quickly run into
limitations.  Essentially, this is because cpp is a macro-processor
with limited facilities for computation and the manipulation of
text. 

The reason I came up with perlpp was to overcome these limitations
for a scientific computation problem at Pacific Northwest National
Laboratories:  I wrote the chemical equilibrium portion of a ground
water transport model. For the sake of compatibility with the rest 
of the model, it had to be programmed in FORTRAN.   For the sake of
compatibility with Linux, Sun, and SGI development environments, it
had to be FORTRAN 77.  The problem statement was roughly this:

  Given the chemical equilibrium equations for a given set of
  species, automatically generate an efficient reliable solver
  for these equations.

This created a need to go from chemical equilibrium equations in
symbolic form, to the generation of a Maple V (a symbolic mathematics
package) batch file from a template, followed by the inclusion of the
results from that batch file into a template-generated Fortran
subroutine library that satisfied the requirements of the project.

This environment required the automatic generation of several kinds of
programs from templates, and was a natural breeding ground for thoughts
about useful pre-processors.  Although it took me most of a week to
come up with the alpha version of perlpp, it easily saved that amount
of time just for that project.  Solving the same problem without it
may have taken four or five weeks longer than it did.  Furthermore,
without perlpp, the project would be much harder to maintain.

So much for pep talk and history!  Let's go about the business of describing
what perlpp does and how to use it.

2. WHAT PERLPP DOES

Perlpp takes input files and generates perl scripts which, when run,
create similar---but hopefully better---output files.

2.1 Example 1: Hello World!

Do this.  Create a file called [cw]hello.c.ppp[ecw] that contains the
lines 

[cw]
   #include <stdio.h>
   int main()
   {
     printf("Hello World!\n");
     return 0;
   }
[ecw]

Now run the perlpp command with

[cw]
   perlpp -pl hello.c.ppp
[ecw]

We will worry about the [cw]-pl[ecw] option in a moment; just be
assured it is important for this example.  If you check, perlpp
created [cw]hello.c.pl[ecw], which contains the following perl script:

[cw]
   #!/usr/bin/perl
   print '#include <stdio.h>
   ';
   print 'int main()
   ';
   print '{
   ';
   print '  printf("Hello World!\\n");
   ';
   print '  return 0;
   ';
   print '}
   ';
[ecw]

Your mileage may vary on the exact contents of the first line.  See
the TROUBLESHOOTING section if you have problems generating this
script. 

Running [cw]hello.c.pl[ecw] generates the same text as the original
input file, [cw]hello.c.ppp[ecw].  In this way, perlpp can be viewed
as an obscure and computationally expensive way to copy text files.

The [cw]-pl[ecw] option means "create a perl program."  If you leave
it off, it simply runs the program and saves the output in
[cw]hello.c[ecw].  This means

[cw]
  perlpp hello.c.ppp
[ecw]

is equivalent to

[cw]
  perlpp -pl hello.c.ppp
  ./hello.c.pl > hello.c
  rm hello.c.pl
[ecw]

except that the file [cw]hello.c.pl[ecw] is never explicitly created.

So our first example, [cw]hello.c.ppp[ecw], when normally processed by
perlpp, creates a copy of itself, [cw]hello.c[ecw].  While this
shouldn't excite you, it shouldn't surprise you either.  After all, if
you ran cpp on a text file that had no cpp directives in it, you would
get back essentially what you put in.

Cpp is only interesting when the input file contains cpp directives.
Perlpp is only slightly interesting when the input file contains no
perlpp directives, because it generates a perl script that regenerates
the input file using print statements. To get any further, we need to
discuss the perlpp directives.

Perlpp has only four directives, plus a default rule for lines that
carry no directive.  Each describes how a given line of input will be
translated into the perl script:

  1)  <B>!<B> Perl source rule.  If the first character of a line is 
      a ! (bang), then copy the remaining part of the line to the
      generated perl script verbatim.

  2)  <B>'<B> Print exact.  If the first character of a line is a
      ' (single quote), then generate a single-quoted (uninterpolated)
      print statement.  Executing this print statement will produce the
      remaining part of the input line exactly. 

  3)  <B>"<B> Print interpolated.   If the first character of a line
      is a " (double quote), then generate a double-quoted
      (interpolating) print statement.  Quoting from the perlop(1)
      man page, interpolating strings do the following for you:

        For constructs that do interpolation, variables beginning
        with "$" or "@" are interpolated, as are the following
        sequences:

           \t          tab             (HT, TAB)
           \n          newline         (LF, NL)
           \r          return          (CR)
           \f          form feed       (FF)
           \b          backspace       (BS)
           \a          alarm (bell)    (BEL)
           \e          escape          (ESC)
           \033        octal char
           \x1b        hex char
           \c[         control char
           \l          lowercase next char
           \u          uppercase next char
           \L          lowercase till \E
           \U          uppercase till \E
           \E          end case modification
           \Q          quote regexp metacharacters till \E

        If use locale is in effect, the case map used by \l, \L,
        \u and \U is taken from the current locale.  See the
        perllocale manpage.

      It should be noted that \\ (two backslashes) in an interpolated
      string translates into a single backslash, so \\n interpolates
      to \n in the output.  This will show up in our next example.

  4)  <B>`<B> Print system.  If the first character of a line is a `
      (back quote), then generate a back-quoted (system) print
      statement.  Executing this print statement will produce the
      output of, first, interpolating the remainder of the line like
      rule 3 above, then running the interpolated text as a shell
      command.

If none of the characters BANG(!), SINGLE-QUOTE('), DOUBLE-QUOTE("),
or BACK-QUOTE(`) begin a line, then a default translation occurs:

   With no [cw]-qq[ecw] option, perlpp treats these lines as if they
   began with a single quote---i.e., use the "print exact" rule 2.

   With the [cw]-qq[ecw] option, perlpp treats these lines as if they
   began with a double quote---i.e., use the "print interpolated" 
   rule 3.
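
To make the rules concrete, here is a minimal sketch of a line
translator that follows them.  It is not taken from the perlpp source:
the function name [cw]translate_line[ecw] and its [cw]$default[ecw]
argument are hypothetical, and the exact text perlpp emits for the
back-quote rule is an assumption.  The quoting for the first three
rules is chosen to agree with the generated scripts shown in this
article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from the perlpp source: translate one input
# line into perl source.  $default is the rule applied to lines that start
# with no directive: "'" normally, '"' when imitating the -qq option.
sub translate_line {
    my ($line, $default) = @_;
    chomp $line;
    my ($rule, $rest) = ($default, $line);
    if ($line =~ /^([!'"`])(.*)$/s) {
        ($rule, $rest) = ($1, $2);
    }
    if ($rule eq '!') {                  # rule 1: copy perl source verbatim
        return "$rest\n";
    }
    if ($rule eq "'") {                  # rule 2: print exact
        $rest =~ s/([\\'])/\\$1/g;       # protect backslashes and single quotes
        return "print '$rest\n';\n";
    }
    if ($rule eq '"') {                  # rule 3: print interpolated
        $rest =~ s/"/\\"/g;              # protect double quotes only
        return "print \"$rest\n\";\n";
    }
    $rest =~ s/`/\\`/g;                  # rule 4: print system (assumed form)
    return "print `$rest`;\n";
}

# A small input file, held in a here-document so backslashes stay literal.
my $input = <<'END';
#include <stdio.h>
!foreach $s ('Hello World!','Hola Mundo!') {
"  printf("$s\\n");
!}
END

# Translate line by line using the default (no -qq) rule.
my $script = join '', map { translate_line($_, "'") } split /^/, $input;
print $script;
```

Passing a double quote as the second argument imitates the
[cw]-qq[ecw] option: lines without a directive are then translated by
the "print interpolated" rule instead of the "print exact" rule.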

The next example will put these directives to work.

2.2 Example 2: Salutations

Create a file called [cw]salutations.c.ppp[ecw] that contains the lines

[cw]
  #include <stdio.h>
  int main()
  {
  !foreach $s ('Hello World!','Hola Mundo!','Caio!') {
  "  printf("$s\\n");
  !}
    return 0;
  }
[ecw]

Let's look at the generated perl script first.  Type

[cw]
  perlpp -pl salutations.c.ppp
[ecw]

In [cw]salutations.c.pl[ecw], you will find

[cw]
  print '#include <stdio.h>
  ';
  print 'int main()
  ';
  print '{
  ';
  foreach $s ('Hello World!','Hola Mundo!','Caio!') {
  print "  printf(\"$s\\n\");
  ";
  }
  print '  return 0;
  ';
  print '}
  ';
[ecw]

Look carefully at the print statement generated by the printf statement
in [cw]salutations.c.ppp[ecw]:

[cw]
  print "  printf(\"$s\\n\");
  ";
[ecw]

Perlpp goes to the trouble of adding backslashes where appropriate so
that double quotes do not prematurely terminate the string.  The same
idea applies to the other forms of quoted print statements perlpp
generates.

Let perlpp run this script for us with

[cw]
  perlpp salutations.c.ppp
[ecw]

This generates the file [cw]salutations.c[ecw],

[cw]
  #include <stdio.h>
  int main()
  {
    printf("Hello World!\n");
    printf("Hola Mundo!\n");
    printf("Caio!\n");
    return 0;
  }
[ecw]

How would you get cpp to do that for you?  And we've only just
begun....

2.3 Example 3: Fast Point Template

This last example uses perlpp to generate a template for 
fixed-length vector classes in C++, where loops are unwound.
Unwinding a loop means, for example, replacing the code,

  for (int i=0; i<3; ++i) a[i]=i;

with

  a[0]=0; a[1]=1; a[2]=2;

Unwinding the loop does not change the effect of the code, but
it can make it faster.  This is because the index variable does
not have to be incremented and compared between assignments.

Such a fixed-length template class would be useful, for example,
in a graphics library where 2-dimensional and 3-dimensional vectors
of fixed types (float, int, double) would be used by the package.
All of these would be essentially the same---and thus a candidate
for a template class---except that the performance overhead for
the looping may not be acceptable in such a high-end application.

Here is where perlpp can help.  Perlpp is first used to generate a
perl program (using the [cw]-pl[ecw] option) from a template file,
[cw]Point.Template.ppp[ecw].  The [cw]Point.Template.pl[ecw] script is
designed to create different fixed-length vector classes depending on
what arguments are passed to it.  Using the back-quote print system
directive, this script is then used in the primary source file,
[cw]testPoint.cpp.ppp[ecw], to generate the specific desired class.

The file [cw]Point.Template.ppp[ecw] is fairly long, and available by
anonymous ftp as noted in the resources box.  Consequently, I will
only consider the portions of this file which illustrate something
interesting about how to use perlpp. 

The first interesting line of [cw]Point.Template.ppp[ecw] is

[cw]
  ! eval join(";",@ARGV);
[ecw]

This of course will translate into the perl statement

[cw]
  eval join(";",@ARGV);
[ecw]

Only the leading bang is deleted.  Executing this joins all the
command-line arguments of the script, separated by semi-colons,
and evaluates that as a sequence of perl statements.  This is an
extremely crude form of command-line argument processing, but it
serves our purposes.

The next few lines check that the previous command-line evaluation
actually defined three crucial variables: 

  [cw]$name[ecw], the desired name of the class.

  [cw]$dim[ecw], the dimension of the vector.

  [cw]$type[ecw], the element type of the vector.

If they were not, the script whines to [cw]STDERR[ecw] about it and
exits with an exit code of 1. 
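
The following is a minimal sketch of this style of argument handling,
written as a function so that it can be tried on its own.  The name
[cw]read_settings[ecw] and the wording of the messages are
hypothetical; the template itself works on global variables and exits
with code 1 when something is missing.  Keep in mind that
[cw]eval[ecw] runs whatever perl code it is handed, so this approach
only suits arguments you write yourself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: evaluate the arguments as perl assignments and
# report which of the three required settings were left undefined.
sub read_settings {
    my @argv = @_;
    my ($name, $dim, $type);              # lexicals the string eval can assign to
    eval join(";", @argv);
    die "could not evaluate arguments: $@" if $@;
    my %set = (name => $name, dim => $dim, type => $type);
    my @missing = grep { !defined $set{$_} } sort keys %set;
    return (\%set, \@missing);
}

# Stand-in for @ARGV, using the arguments from the test program.
my ($set, $missing) = read_settings('$name="FixVect"', '$dim=2', '$type="float"');
if (@$missing) {
    # The real template complains to STDERR and exits with code 1 here.
    print STDERR "missing settings: @$missing\n";
} else {
    print "$set->{type} vector $set->{name} of dimension $set->{dim}\n";
}
```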

After this, the template gets about the business of generating the
desired class.  This begins with 

[cw]
  "class $name {
  "public:
  !#
  !# Declare an internal array of the desired type and size.
  !#
  "  $type a[$dim];
  "  static const int dim=$dim;
[ecw]

Here [cw]$name[ecw], [cw]$type[ecw] and [cw]$dim[ecw] are used to
create specific text in the class definition.  In perl, [cw]#[ecw]
denotes a comment, so [cw]!#[ecw] is effectively a comment in perlpp.

We see the first instance of loop unwinding in the default
constructor for the class.  The lines  

[cw]
  !  for ($i=0; $i<$dim; ++$i) {
  "    a[$i]=0;
  !  }
[ecw]

translate into the perl segment

[cw]
  for ($i=0; $i<$dim; ++$i) {
     print("    a[$i]=0;
  ");
  }
[ecw]

The loop is executed by the perl script, acting as the pre-processor,
so the single assignment line is expanded into a sequence of
assignments in the C++ class source.  Loops are unwound in a
similar fashion in other parts of the class definition.
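
As a small self-contained sketch of the same idea, the perl below
collects the unwound assignments for a three-dimensional vector.  The
function name [cw]unwound_zero[ecw] is made up for this illustration;
in the template the loop simply prints as it goes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one "a[i]=0;" line per dimension, which is
# the unwound form of "for (int i=0; i<dim; ++i) a[i]=0;".
sub unwound_zero {
    my ($dim) = @_;
    my $text = '';
    for (my $i = 0; $i < $dim; ++$i) {
        $text .= "    a[$i]=0;\n";
    }
    return $text;
}

print unwound_zero(3);
# prints:
#     a[0]=0;
#     a[1]=0;
#     a[2]=0;
```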

Efficiency aside, the next block of the perlpp source provides
a class constructor which would be impossible to declare using
standard template facilities: one with as many arguments as the
dimension of the vector class to be constructed.  

[cw]
  !  @arg=(); for ($i=0; $i<$dim; ++$i) { $arg[$i]="$type a$i"; } 
  !  $args=join(',',@arg);
  !
  "  $name($args)
[ecw]

If you are new to perl, the first line may be difficult to understand:
it starts by setting the [cw]@arg[ecw] list to an empty list, then
loops to build [cw]$dim[ecw] entries in [cw]@arg[ecw]: [cw]"$type
a0"[ecw], [cw]"$type a1"[ecw], etc.  The reason elements of
[cw]@arg[ecw] are denoted by [cw]$arg[$i][ecw] in the for loop is that
[cw]@arg[ecw], once subscripted, refers to the scalar variable
available as the ith entry of [cw]@arg[ecw].  Remember: scalar
variables always start with a [cw]$[ecw]-sign---even those tucked
inside a list!

Following this declaration, the constructor is defined to initialize
the vector with its arguments:

[cw]
  "  {
  !    for ($i=0; $i<$dim; ++$i) {
  "      a[$i]=a$i;
  !    }
  "  }
[ecw]
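
To see what these two template fragments produce together, here is a
minimal sketch in plain perl.  The function name
[cw]constructor_text[ecw] is hypothetical, and the values passed to it
are the ones used by the test program later in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the text of a constructor that takes one
# argument per vector component.
sub constructor_text {
    my ($name, $dim, $type) = @_;

    # Build the argument list: "$type a0", "$type a1", ...
    my @arg = ();
    for (my $i = 0; $i < $dim; ++$i) { $arg[$i] = "$type a$i"; }
    my $args = join(',', @arg);

    # Emit the signature, then one unwound assignment per component.
    my $text = "  $name($args)\n  {\n";
    for (my $i = 0; $i < $dim; ++$i) {
        $text .= "      a[$i]=a$i;\n";
    }
    $text .= "  }\n";
    return $text;
}

print constructor_text('FixVect', 2, 'float');
# prints:
#   FixVect(float a0,float a1)
#   {
#       a[0]=a0;
#       a[1]=a1;
#   }
```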

This is followed by the definition of subscript operators, which are
perfectly standard.  After this, another feature of perlpp is
illustrated: the code for defining all the assignment operators
is generated using a loop structure:

[cw]
  !  foreach $op ("=","+=","-=","*=","/=") {
      .
      . # define the $op assignment operator
      .
  !  }
[ecw]

Since all the assignment operators are defined in essentially the
same way, the loop allows the template to be written more compactly
than with the standard template facilities.  This makes the template
faster to write, maintain, and debug.
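
The body of that loop is not reproduced in this article, so the
following is only a sketch of what such a loop could generate, not the
contents of [cw]Point.Template.ppp[ecw].  The function name
[cw]assignment_operators[ecw] and the shape of the generated C++ member
functions are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: generate one member function per assignment
# operator, with the per-component loop unwound.
sub assignment_operators {
    my ($name, $dim) = @_;
    my $text = '';
    foreach my $op ("=", "+=", "-=", "*=", "/=") {
        $text .= "  $name& operator$op(const $name& b)\n  {\n";
        for (my $i = 0; $i < $dim; ++$i) {
            # ${op} keeps perl from reading "$opb" as a variable name.
            $text .= "    a[$i]${op}b.a[$i];\n";
        }
        $text .= "    return *this;\n  }\n";
    }
    return $text;
}

print assignment_operators('FixVect', 2);
```

One loop body written once yields five operators, which is the
compactness the template is after.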

A similar loop follows this to define the various binary operators for
the class: addition, subtraction, etc.  These loops reduce the
redundancy of effort in defining the template, which, amusingly, is
itself a tool to reduce redundancy of effort.  Okay, I admit that I am
easily amused....

The rest of the template declares friendship with three operators you
might hope for in such a class: I/O functions, and a scalar multiply,
then defines them.  They do what they are supposed to do, and nothing
new about perlpp is divined from going over them.  So I won't.

Let's move on to using [cw]Point.Template.ppp[ecw].  First, convert
it to a perl script with 

[cw]
   perlpp -pl Point.Template.ppp
[ecw]

Now look in the test program source file, [cw]testPoint.cpp.ppp[ecw].
The only interesting line is

[cw]
  ` ./Point.Template.pl '\$name="FixVect"' '\$dim=2' '\$type="float"'
[ecw]

This runs the [cw]Point.Template.pl[ecw] script we just generated with
the arguments 

[cw]
   $name="FixVect"  $dim=2 $type="float"
[ecw]

With these arguments, the template script prints out a
[cw]FixVect[ecw] class, which represents two-dimensional vectors of
floats.  The back-quote perlpp directive includes this into the
[cw]testPoint.cpp[ecw] source file. 
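
The two layers of quoting in that line can be checked with a small
sketch.  Here [cw]echo[ecw] stands in for the
[cw]Point.Template.pl[ecw] script, which is an assumption made so that
the example runs on any Unix shell without the template present.  The
backslashes stop perl from interpolating the dollar signs, and the
single quotes stop the shell from doing the same.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The command text as perl sees it after interpolation; echo is a
# stand-in for ./Point.Template.pl.
my $cmd = "echo '\$name=\"FixVect\"' '\$dim=2' '\$type=\"float\"'";
print "shell sees: $cmd\n";

# Back quotes run the command and return whatever it wrote to STDOUT.
my $out = `$cmd`;
print "command received: $out";
```

The second line printed shows the arguments exactly as the script
would receive them.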

Generating template classes in this way is not completely satisfying,
because the declaration and the definition of a class usually must be
separated.  However, this can be corrected by modifications of the
template file.  Essentially, a fourth variable could be set on calling
the script, [cw]$use[ecw], which has a value of either
[cw]"declare"[ecw] or [cw]"define"[ecw].  Using if clauses, the script
would then provide either the definition or declaration portion of the
class.  This is yet another way in which the redundancy of a template
can be reduced using perlpp.
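
A minimal sketch of that modification follows.  Everything in it is
hypothetical: the function name [cw]class_text[ecw], the single
[cw]clear[ecw] member function, and the layout of the generated C++
are invented to show the if clauses, and are not part of the
distributed template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: emit either the declaration or the definition
# portion of a tiny class, depending on $use.
sub class_text {
    my ($name, $dim, $type, $use) = @_;
    my $text = '';
    if ($use eq 'declare') {
        $text .= "class ${name} {\npublic:\n";
        $text .= "  $type a[$dim];\n";
        $text .= "  void clear();\n};\n";
    } elsif ($use eq 'define') {
        # ${name} keeps perl from reading "$name::clear" as a package variable.
        $text .= "void ${name}::clear()\n{\n";
        $text .= "  a[$_]=0;\n" for 0 .. $dim - 1;
        $text .= "}\n";
    } else {
        die "\$use must be \"declare\" or \"define\"\n";
    }
    return $text;
}

print class_text('FixVect', 2, 'float', 'declare');
print class_text('FixVect', 2, 'float', 'define');
```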

3. CONCLUSIONS

I don't want you to leave thinking of perlpp as sort of a "compression
algorithm."  Keeping ideas together in a project simplifies
maintaining them.  The goal of perlpp is to prevent "concept leakage,"
where several parts of source files redundantly represent an idea, and
those source files have to be maintained separately.

Essentially, perlpp replaces the rather rigid (but simple!)
text-processing language available as cpp with the expressive
(but complex) text-processing language available as perl.
Many programmers use perl in any case, so knowing the syntax
of perl pays twice: once as a language in and of itself, and
once as a powerful macro language for any programming language.

If you don't know perl, then perlpp is just another good reason
to learn it!

4. ACKNOWLEDGEMENTS

Thanks to the Linux community for providing such a wonderful
environment for reliable scientific computations.  I try very hard not
to taunt every time a colleague of mine tries to accomplish something
useful on a machine which crashes so often they have come to expect
it.

I have a story to tell.

Over the summer, I left my RedHat 5.0 machine running in my Mesa
State College, Grand Junction, Colorado office.  I then went to
Pacific Northwest National Laboratories in Richland, Washington,
where I dreamed up perlpp.

I used my Linux box remotely for the whole summer: web-browsing, 
email, obtaining old source files, using emacs, Maple, TeX, Perl,
or the FORTRAN compiler.  It's true that I used these tools on the
PNNL machines as well, but sometimes a license was not available, or
the Linux tool was better for my purposes than what I could obtain at
the lab. 

For six weeks I used that machine remotely at least once each day.
Only once did I have a problem with connecting to it for a resource.
After the summer, I learned that my Colorado office, which is in a
building that is being remodeled, had experienced several power
failures. Apparently, my machine had restarted each time without a
hitch, and I had only noticed the single time I requested something
during an outage.

Now that is far more reliability---and accessibility---than many of
my colleagues experience with other operating systems.

I also want to thank Mike Littlejohn for test-driving perlpp and this
article, as well as Karl Castleton, Steve Yabusaki and Ashok
Chilakapati for getting me on the ground water modeling project.

Finally, thanks to Pacific Northwest National Laboratories, the
Associated Western Universities fellowship program, and Mesa State
College for allowing me the time, resources and opportunity to develop
perlpp. 

TROUBLESHOOTING

Perlpp is a perl script that generates perl scripts.  To use it, you
must have perl installed, and perlpp must be able to find it.  If
perlpp does not work, check that the first two lines of perlpp
reflect the actual location of your perl executable.  

If these are correct, make sure that execute permissions are set for
the script, [cw]chmod 755 perlpp[ecw], and that perlpp is visible from
your [cw]PATH[ecw].

If you just installed perlpp, you may have to refresh your shell PATH
directory cache with [cw]hash -r[ecw] (if you use bash) or [cw]rehash[ecw]
(if you use csh).
