2018-18-03 Kim Walisch  <kim.walisch@gmail.com>

  Version 4.3 released.

  * Support big-endian CPUs.
  * Speed up x86 without POPCNT.
  * libdivide.h: update to libdivide 1.0.
  * calculator.hpp: Fix integer overflow in pow().
  * Required CMake version is now 3.4 (previously 3.1).
  * CMakeLists.txt: Fix make install issue.
  * CMakeLists.txt: Get rid of int128_t, __int128_t checks.
  * isqrt_constexpr.cpp: Add test for compile time square root.

2017-12-02 Kim Walisch  <kim.walisch@gmail.com>

  Version 4.2 released.

  The computation of the second partial sieve function
  P2(x, a) has been speed up by 10% due to an upgrade to the
  latest primesieve-6.3 library.

  * lib/primesieve: upgrade to version 6.3.
  * src/backup.cpp: Speed up resume from backup.
  * test/RiemannR.cpp: Fix bash/ubuntu on Windows issue.
  * CMakeLists.txt: Silence GCC 7 warnings.
  * .travis.yml: Update to Ubuntu 14.

2017-07-19 Kim Walisch  <kim.walisch@gmail.com>

  Version 4.1 released.

  This version improves the backup & resume functionality
  and uses up to 8% less memory due to a reduced alpha
  tuning factor.

  * S2_easy.cpp: Fix severe backup scaling issues.
  * S2_easy_libdivide.cpp: Fix severe backup scaling issues.
  * S2_hard.cpp: Faster resume.
  * LoadBalancer.cpp: Simplify backup.
  * results.txt: Fix incorrect time elapsed.
  * primecount.cpp: smaller alpha tuning factor.

2017-07-06 Kim Walisch  <kim.walisch@gmail.com>

  Version 4.0 released.

  This version features a new highly optimized sieve of
  Eratosthenes implementation for the computation of the
  hard special leaves that speeds up the S2_hard algorithm
  by up to 40%, giving an overall speed up of up to 20%.

  * Sieve.cpp: New sieve of Eratosthenes.
  * S2_hard.cpp: Use new sieve.
  * S2_hard_mpi.cpp: Use new sieve.
  * lmo5.cpp: Use new sieve.
  * pi_lmo_parallel.cpp: Use new sieve.
  * LoadBalancer.cpp: L1 cache size optimization.
  * MpiLoadBalancer.cpp: L1 cache size optimization.

2017-06-27 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.8 released.

  This version reduces the memory usage by a factor of 2
  above 10^20! Furthermore the S2_hard algorithm for
  computing the hard special leaves has been improved: The
  binary indexed tree data structure (random memory access
  pattern) has been removed. This fixes the scaling issues
  above 10^24 primecount had previously.

  * libdivide.h: Reduce memory usage, pack structs.
  * BitSieve.hpp: Reduce memory usage, store odd integers.
  * S2_hard.cpp: Remove binary indexed tree.
  * S2_hard_mpi.cpp: Remove binary indexed tree.
  * primecount.cpp: Update alpha tuning factor formula.

2017-06-07 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.7 released.

  This version features a new multi-threading load balancer
  for the computation of the hard special leaves. The new
  load balancer requires no synchronization between threads
  and achieves 100% CPU cores usage. The new load balancer
  is based on ideas from Xavier Gourdon and Douglas Staple.

  * LoadBalancer.cpp: New multi-threading load balancer.
  * generate_phi.hpp: Part of new load balancer.
  * S2_hard.cpp: Use new load balancer.
  * BitSieve.cpp: Get rid of AVX2 (use POPCNT).
  * Li.cpp: Riemann R and inverse Riemann R implementations.
  * fast_div.hpp: Refactor using template metaprogramming.
  * src/test: Added 27 unit tests.
  * phi(x, a) now scales > 8 threads.
  * BinaryIndexedTree.hpp: Do not store multiples of 2.
  * CMakeLists.txt: Faster C++ compilation.
  * S2Status.cpp: Reduce --status overhead.

2017-03-04 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.6 released.

  This version features a new AVX2 popcount algorithm which
  computes the hard special leaves up to 15% faster on x86 CPUs
  with AVX2 support (2013 or later).

  * BitSieve-popcnt.cpp: New AVX2 popcount algorithm.
  * popcnt.hpp: Fix clang performance bug.
  * primecount.cpp: Fix clang time measuring.
  * CMakeLists.txt: Add AVX2 check.

2016-12-16 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.5 released.

  * CMake: Use CMake build system instead of Autotools.
  * README.md: Update build instructions.

2016-08-06 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.4 released.

  * Makfile.msvc: Use libdivide also with MSVC compiler.
  * S2LoadBalancer.cpp: Improved load balancing.
  * P2.cpp: Improved load balancing.
  * P2_mpi.cpp: Improved load balancing.
  * .travis.yml: Add static C++ code analysis using cppcheck.

2016-07-16 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.3 released.

  * README.md: Fix error in "C++ API" section.
  * S2_hard.cpp: Refactoring.
  * S2_hard_mpi.cpp: Refactoring.
  * P2.cpp: Refactoring.
  * P2_mpi.cpp: Refactoring.

2016-05-09 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.2 released.

  This version runs up to 5% faster due to an improved P2(x, a)
  implementation.

  * P2.cpp: 30% faster.
  * P2_mpi.cpp: 30% faster.
  * libdivide.h: Update to latest version.
  * autogen.sh: Print error msg if Autotools is not installed.

2016-04-02 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.1 released.

  This version runs up to 20% faster! The speed up is due to
  the usage of libdivide in S2_easy, libdivide allows to replace
  expensive integer divides with comparatively cheap
  multiplication and bitshifts.

  * S2_easy_libdivide.cpp: 40% speed up due to libdivide.
  * S2_easy_mpi_libdivide.cpp: 40% speed up due to libdivide.
  * BitSieve.cpp: Fix broken Big-Endian CPU support.

2016-03-12 Kim Walisch  <kim.walisch@gmail.com>

  Version 3.0 released.

  In this release I have merged the MPI (Message Passing Interface)
  branch into the master branch. primecount can now distribute
  computations onto cluster nodes if MPI is enabled in the build
  process (--enable-mpi).

  * doc/primecount-MPI.md: primecount MPI documentation.
  * src/mpi: primecount MPI source code.
  * include/FactorTable.hpp: Multi-threaded initialization.
  * build.sh: Improved build script with support for primecount MPI.
  * README.md: Updated build instructions.

2016-02-14 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.6 released.

  primecount has been distributed using MPI (Message Passing Interface)
  and can now run computations on large clusters!
  The distributed version of primecount is named primecount-mpi and
  the code can be found at:

  https://github.com/kimwalisch/primecount/tree/mpi

  * src/phi.cpp: 2x speed up due to pi(x) lookup table optimization.
  * scripts/build.sh: Easily build primecount.

2016-01-24 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.5 released.

  This version adds logging to primecount-backup and fixes 2
  integer overflow bugs in primecount-backup.

  * --log: Logs results into results.txt and primecount.log.
  * scripts/worktodo.sh: Script for batch processing via worktodo.txt.
  * src/S1.cpp: Fix integer overflow bug (in backup branch).
  * src/deleglise-rivat/S2_trivial.cpp: Fix integer overflow bug (in backup branch).

2015-12-27 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.4 released.

  * README.md: Simplified build instructions.
  * appveyor.yml: Automated Windows (MSVC++) testing.
  * configure.ac: Removed usage of buggy ax_gcc_builtin.m4 macro.
  * src/P2.cpp: Port to primesieve-5.5.0 library.
  * src/test.cpp: Faster nth prime testing.

2015-10-07 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.3 released.

  This version runs about 10% faster for x <= 10^21. Because of the
  sieving optimizations implemented in primecount-2.2 the sieving
  limit has now been increased which reduces the time to compute the
  easy special leaves.

  I have now also officially published primecount-backup binaries
  which save intermediate results to a file once per hour and which
  can resume from these files after e.g. a crash:
  https://github.com/kimwalisch/primecount/tree/backup

  * Renamed to --P2, --S1, --S2_trivial, --S2_easy, --S2_hard.
  * find_fastest_alpha.sh: Benchmark which finds fastest alpha factors.
  * src/primecount.cpp: Alpha factor tuning.
  * src/P2.cpp: Use multi-threading for initialization.
  * src/S2Status.cpp: --status[=N], N digits after decimal point.
  * src/pi_lehmer.cpp: Remove pi_lehmer2(x).

2015-09-19 Kim Walisch  <kim.walisch@gmail.com>

  This version runs about 10% faster due to newly added pre-sieving
  of small primes and wheel factorization.

  Version 2.2 released.

  * src/primecount.cpp: Increase pi(x) limit to 10^31 (previously 10^27).
  * src/BitSieve.cpp: Add pre-sieving of small primes.
  * src/P2.cpp: Use pre-sieving and wheel factorization.
  * src/deleglise-rivat/*: Use pre-sieving and wheel factorization.
  * src/lmo/*: Use pre-sieving and wheel factorization.
  * include/popcount.hpp: Faster popcount algorithm for CPUs without POPCNT.
  * include/Wheel.hpp: New wheel factorization data structures.

2015-04-12 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.1 released.

  * src/S1.cpp: Fix Windows performance regression.
  * src/S2LoadBalancer.cpp: Refactoring.
  * Makefile.am: Add autogen.sh to EXTRA_DIST.

2015-03-26 Kim Walisch  <kim.walisch@gmail.com>

  Version 2.0 released.

  This version improves the POPCNT algorithm in the computation of the
  hard special leaves and contains a new algorithm for the computation
  of the ordinary leaves which uses half as much memory.

  * src/S1.cpp: New implementation based on Douglas Staple's algorithm.
  * src/S2LoadBalancer.cpp: Improved load balancing.
  * src/deleglise-rivat/S2_hard.cpp: Also use POPCNT if high < y.
  * src/lmo/pi_lmo5.cpp: Optimized sieving, up to 10% faster.
  * src/lmo/pi_lmo_parallel3.cpp: Optimized sieving, up to 10% faster.
  * include/BitSieve.hpp: Add count backwards optimization.

2015-03-08 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.9 released.

  This version introduces new command-line options for calculating
  intermediate formulas of the Deleglise-Rivat algorithm. This allows
  to distribute the computation of large pi(x) values on multiple
  systems. This version also fixes two non-critical bugs.

  * src/app: New options: --p2, --s1, --s2_trivial, --s2_easy, --s2_hard.
  * src/deleglise-rivat/S2_trivial.cpp: Fix off by 1 bug.
  * src/deleglise-rivat/*: Bugfix, set 1 and unset 2 in sieve.
  * src/deleglise-rivat/S2_hard.cpp: Improved scaling for large x.
  * src/deleglise-rivat/*: Alpha factor tuning.
  * src/lmo/*: Bugfix, set 1 and unset 2 in sieve.
  * src/lmo/*: Alpha factor tuning.
  * src/primecount.cpp: Improved alpha formula.
  * src/print.cpp: New print functions for terminal output.

2015-02-28 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.8 released.

  This version reduces the memory usage of the Deleglise-Rivat
  implementation by up to 30 percent. Instead of using the same set of
  memory intense data structures for all formulas (P2, S1, S2_trvial,
  S2_easy, S2_hard), each individual formula now generates its own set
  of data structures and the size of each data structure is the
  smallest possible for the given formula.

  * -a<N>, --alpha=<N>: Manually set the tuning factor.
  * src/primecount.cpp: Improved alpha factor formula.
  * src/deleglise-rivat/*: Reduce memory usage.
  * src/lmo/*: Update S1(x) function calls.
  * src/nth_prime.cpp: Add optimization for small primes.
  * src/app/*: Add --alpha option.
  * avoid_128bit_div.patch: merged into master branch.

2015-01-26 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.7.1 released.

  * Makefile.am: Add avoid_128bit_div.patch to EXTRA_DIST
  * avoid_128bit_div.patch: Renamed

2015-01-24 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.7 released.

  This version runs to 20% faster on Windows for numbers >= 2^63 by
  using 64-bit divisions instead of 128-bit divisions whenever
  possible. While primecount-1.6 has already been very fast on Linux
  it ran slower on Windows, primecount-1.7 now achieves the same
  speed on both Windows and Linux.

  * fast_div.patch: Avoid 128-bit divisions.

2015-01-05 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.6 released.

  This version calculates hard special leaves much faster, above a
  certain threshold the algorithm switches to POPCNT for counting the
  number of unsieved elements (instead of Tomás Oliveira's special
  tree data structure) which dramatically improves memory efficiency.
  This version also fixes a serious bug in P2(x, a) for x > 10^21,
  thanks to Dana Jacobsen for reporting it.

  * src/deleglise-rivat/S2_hard.cpp: Add POPCNT optimization.
  * src/lmo/pi_lmo_parallel3: Add POPCNT optimization.
  * src/lmo/pi_lmo5: Add POPCNT optimization.
  * src/P2.cpp: Bug fix for numbers > 10^21.
  * src/BitSieve.hpp: Improved memory efficiency.
  * include/isqrt.hpp: Fixed C++11 bug in isqrt(x).
  * include/min.hpp: Refactoring.
  * include/int128.hpp: Add int128_t trait functions.

2014-12-27 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.5 released.

  This version runs up to 30 percent faster than primecount-1.4 for
  numbers > 2^64. The speed up is achieved using a clever trick:
  Instead of calculating xn = x / (primes[b] * primes[l]) which uses
  a 128-bit division, x2 = x / primes[b] is calculated upfront and
  then xn = x2 / primes[l]. If x2 is < 2^64 then xn can be calculated
  using a 64-bit division which is much faster.

  * src/deleglise-rivat/S2_*.cpp: Avoid 128-bit divisions.
  * src/S2Status.cpp: Improved status precision.
  * src/app/cmdoptions.cpp: Set -l = --lmo instead of --lehmer.
  * README.md: Add command-line options summary.

2014-12-15 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.4 released.

  * src/deleglise-rivat/*.cpp: Improved computation of special leaves.
  * src/P2.cpp: Use unsigned division.
  * src/S1.cpp: Use unsigned division.
  * src/S2LoadBalancer.cpp: Update documentation.
  * src/PhiCache.cpp: Update documentation.
  * include/PiTable.hpp: Update documentation.

2014-11-04 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.3 released.

  * Fixed bug in m4-ax_popcnt.m4 for QEMU virtual machines.
  * Limit number of threads in phi.cpp to 8.
  * Improve scaling of P2_lehmer(x, a).
  * Improve scaling of P3(x, a).

2014-08-24 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.2 released.

  * Add --status (-s) command-line option.
  * src/S2LoadBalancer.cpp: New faster load balancer.
  * src/S2Status.cpp: Print S2 progress.
  * Fixed integer overflow bugs in: Li_inverse(x), isqrt(x), iroot(x)

2014-08-06 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.1 released.

  * pi_deleglise_rivat4.cpp: 128-bit implementation.
  * pi_deleglise_rivat_parallel4.cpp: 128-bit implementation.
  * Add --time option to print elapsed seconds.
  * include/S1.hpp: Added multi-threading and templates.
  * include/FactorTable.hpp: template implementation.
  * src/PiTable.cpp: Faster initialization.
  * src/balance_S2_load.cpp: Improved load balancer.
  * src/generate.cpp: Contains all generate_*() functions.

2014-07-19 Kim Walisch  <kim.walisch@gmail.com>

  Version 1.0 released.

  * src/BitSieve.cpp: Fix big-endian CPU bug.
  * src/lmo/*.cpp: Improved alpha factor.
  * src/deleglise-rivat/*.cpp: Improved alpha factor.
  * include/pmath.hpp: Add max3(a, b, c).
  * m4/m4-ax_popcnt.m4: Support non x86 CPU architectures.
  * Readme.md: Update documentation.

2014-07-06 Kim Walisch  <kim.walisch@gmail.com>

  Version 0.21 released.

  * pi_deleglise_rivat_parallel3.cpp: Uses 10 times less memory!
  * include/FactorTable.hpp: Compressed lookup table for lpf and Moebius.
  * include/PiTable.hpp: Compressed lookup table for prime counts.
  * include/popcount.hpp: Count bits using the POPCNT instruction.
  * configure.ac: Check if CPU supports POPCNT.

2014-06-15  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.20 released.

  * src/balance_S2_load.cpp: New load balancing algorithm.
  * src/deleglise-rivat/*: Deleglise & Rivat implementations.
  * src/lmo/pi_lmo6.cpp: New LMO implementation.
  * src/lmo/pi_lmo_parallel6.cpp: New parallel LMO implementation.
  * src/lmo/init_square_free.cpp: For special leaves of type (prime * square_free).
  * src/popcount.cpp: Fast bit population count algorithms for BitSieve.
  * src/test.cpp: Add tests for pi_lmo*6.cpp.
  * include/aligned_vector.hpp: vector without false sharing issues.
  * include/BitSieve.hpp: bit vector with 8 numbers per byte.

2014-05-26  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.19 released.

  * src/lmo/pi_lmo_parallel1.cpp: Simple parallel LMO implementation.
  * src/lmo/pi_lmo_parallel2.cpp: Advanced parallel LMO implementation.
  * src/lmo/pi_lmo_parallel3.cpp: Improved load balancing.
  * src/P2.cpp: New parallel P2(x, y) implementation.
  * src/phi.cpp: Fix race condition.
  * src/PhiTiny.cpp: Make thread safe.
  * src/test.cpp: Add phi(x, a) thread safety test.

2014-05-10  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.18 released.

  * docs/computing-special-leaves.md: How to compute special leaves (LMO).
  * src/pi_lmo5.cpp: Faster new LMO implementation.
  * src/Pk.cpp: Faster new P2(x, a) implementation.
  * src/test.cpp: Add test for pi_lmo5(x).
  * src/pi_lmo*.cpp: Refactoring.

2014-04-19  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.17 released.

  * .travis.yml: Travis continuous integration testing.
  * configure.ac: Make silent building default.
  * Makefile.msvc: Add Microsoft Visual C++ Makefile.
  * src/pi_lmo2.cpp: Speed improvement.
  * src/pi_lmo3.cpp: Speed improvement.
  * src/pi_lmo4.cpp: Speed improvement.
  * src/nth_prime.cpp: Port to primesieve-5.2.
  * src/test.cpp: Display testing status.

2014-03-30  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.16 released.

  * src/pi_lmo2.cpp: LMO using sieve of Eratosthenes.
  * src/pi_lmo3.cpp: LMO using segmented sieve of Eratosthenes.
  * src/pi_lmo4.cpp: LMO using special tree data structure.

2014-02-02  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.15 released.

  * Precompiled binaries for Windows 64-bit and Linux x64-64.
  * Readme.md: New "Precompiled binaries" section.
  * Readme.md: Use svg formula images.
  * src/pi_lmo_simple.cpp: Code cleanup.

2014-01-05  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.14 released.

  * Moved build system to GNU Autotools.
  * Renamed *.h to *.hpp
  * src/test.cpp: Refactoring.

2013-12-31  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.10 released.

  * src/PhiCache.h: Faster phi(x, a).
  * src/test.cpp: Update pi_lmo_simple(x) test.
  * src/pi_primesieve.cpp: Fix primesieve API changes.
  * src/nth_prime.cpp: Fix primesieve API changes.
  * README: Updated timings.

2013-12-26  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.9 released.

  * src/pi_lmo_simple.cpp: Parallelized algorithm.
  * src/PhiTiny.h: New class, cache of phi(x, a) values for a < 7.
  * src/PhiCache.h:: Add PhiTiny optimization.
  * src/phi.cpp: Add PhiTiny optimization.
  * src/pi_bsearch.h: Refactoring.
  * src/Pk.cpp: Refactoring.

2013-12-14  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.8 released.

  * src/cmdoptions.cpp: Fixed a calculator bug.
  * src/calculator.hpp: New expression parser for command-line options.

2013-10-24  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.7 released.

  * src/*: Ported to primesieve-5.0.
  * include/primecount.h: Ported to primesieve-5.0.
  * Readme.md: Updated build instructions.
  * Makefile: Updated.

2013-09-04  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.6 released.

  * src/phi.cpp: Added phi binary search optimization.
  * Makefile: Fixed ldconfig bug.
  * Readme.md: Updated benchmark timings.
  * TODO: Updated.
  * THANKS: Updated.

2013-09-01  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.5 released.

  * src/primecount.cpp: New functions: phi(x, a), pi_lmo_simple(x).
  * src/cmdoptions.cpp: New command-line options: '--phi', '--lmo_simple'.
  * src/help.cpp: Added phi and Lagarias-Miller-Odlyzko info.
  * src/pi_lmo_simple.cpp: Simple Lagarias-Miller-Odlyzko implementation.
  * src/phi.cpp: Fixed bug, uses less memory.
  * include/primecount.h: Added phi(x, a) to the public API.
  * Makefile: Added pi_lmo.o, pi_lmo_simple.o.

2013-07-30  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.4 released.

  * README.md: High definition math formula images.
  * src/Pk.cpp: More accurate math formulas.
  * src/pi_meissel.cpp: Simpler algorithm.
  * src/pi_lehmer.cpp: Simpler algorithm.
  * images/*.png: High definition math formula images.

2013-07-26  Kim Walisch  <kim.walisch@gmail.com>

  Version 0.3 released.

  * README: Text version of README.md for offline reading.
  * README.md: Added algorithms, timings and nth prime sections.
  * THANKS: List of people that contributed to primecount.
  * src/Pk.cpp: Partial sieve functions: P2(x, a), P3(x, a).
  * src/*.cpp: Documented all source files.
  * include/primecount.h: Updated API documentation.
  * images/*.png: Math formula images.
