- check that mert-moses.pl emits devset score after every iteration
  - correctly for whichever metric we are optimizing
  - even when using --pairwise-ranked (PRO)
    - this may make use of 'evaluator', soon to be added by Matous Machacek

- check that --pairwise-ranked is compatible with all optimization metrics

- Replace the standard rand() currently used in MERT and PRO with better
  random generators such as Boost's random generators (e.g., boost::mt19937).
  - create a Random class to hide the details, i.e., how to generate
    random numbers, which allows us to use custom random generators more
    easily.

  Pros:
  - In MERT, you might want to use the random restarting technique to avoid
    local optima.
  - PRO uses a sampling technique to choose candidate translation pairs
    from N-best lists, which means the choice of random generators seems to
    be important.

  Cons:
  - This change will require us to re-create the truth results for regression
    testing related to MERT and PRO because the new random generator will
    generate different numbers from the current generator does.
