
A Tour of NTL: Using NTL with GMP


GMP is the GNU Multi-Precision library. More information about it, as well as the latest version, is available from the GMP home page.

Briefly, GMP is a library for long integer arithmetic. It has hand-crafted assembly routines for a wide variety of architectures. For basic operations, like integer multiplication, it can be two to three (and sometimes a bit more) times faster than NTL. The speedup is most dramatic on x86 machines.

As of version 4.2, it is possible to link the GMP library with NTL so as to get most of the benefits of GMP while still maintaining complete backward compatibility. Building NTL with GMP takes a few extra minutes' work, and you certainly do not need to use NTL with GMP if you don't want to. As far as I know, GMP is only available on Unix systems and on Windows systems using Cygwin tools.

Downloading and building GMP

To download and build GMP on your machine, do the following:

Step 1. Download GMP from the GMP home page. You will get a file gmp-XXX.tar.gz.

Step 2. Unpack GMP as follows:

   % gunzip gmp-XXX.tar.gz
   % tar xf gmp-XXX.tar
This creates a directory gmp-XXX. Go there now:
   % cd gmp-XXX

Step 3. Build GMP as follows:

   % ./configure --disable-shared --prefix=<gmp_prefix>
   % make
   % make install
Here, <gmp_prefix> should be the name of a directory where you would like to store the GMP library components. This builds and installs GMP, creating files <gmp_prefix>/include/gmp.h and <gmp_prefix>/lib/libgmp.a.

The options --disable-shared and --prefix=<gmp_prefix> to configure are both optional. The first option disables the creation of shared libraries, which simplifies things just a bit (in particular, this documentation). If you don't pass the second option, then <gmp_prefix> defaults to /usr/local, and you need root permissions to run make install.

Executing make uninstall undoes the make install.

Executing make distclean removes everything created by configure and make.

Building and using NTL with GMP

When building NTL with GMP, you have to tell NTL that you want to use GMP, and where the include files and library are. The easiest way to do this is by passing the argument GMP=on to the configuration script when you are installing NTL. That is, you execute:

   % ./configure GMP=on GMP_PREFIX=<gmp_prefix>
where <gmp_prefix> is the name of the directory in which GMP was installed above.

If you need more fine-grained control, you can execute:

   % ./configure GMP=on GMP_INCDIR=-I<gmp_prefix>/include GMP_LIBDIR=-L<gmp_prefix>/lib
Alternatively, the following achieves more or less the same thing:
   % ./configure GMP=on CPPFLAGS=-I<gmp_prefix>/include LDFLAGS=-L<gmp_prefix>/lib

If you installed GMP in a standard system directory, then

   % ./configure GMP=on
does the job.

Instead of passing arguments to the configure script, you can also just edit the makefile by hand. The documentation in the makefile should be self-explanatory.

When compiling programs that use NTL with GMP, you need to link with the GMP library. If GMP is not installed in a standard place, this just means adding -L<gmp_prefix>/lib -lgmp to the compilation command. If you installed GMP in a standard system directory, then just -lgmp does the job.

NTL has been tested and works correctly with versions 2.0.2, 3.0.1, and 3.1 of GMP. The most recent of these, version 3.1, is generally faster. Using NTL with versions prior to 2.0.2, or with version 3.0, is not recommended.

When using NTL with GMP, as a user of NTL, you do not need to know or understand anything about the GMP library. So while there is detailed documentation available about how to use GMP, you do not have to read it.

Some implementation details

The way NTL uses GMP is a "quick and dirty", yet fairly effective hack. There are two ways one could incorporate GMP into NTL. One way is the "morally correct" way, and the other is the quick and dirty hack that was actually implemented.

The morally correct way would be to have an abstract interface for long integer arithmetic that could be implemented in one of several ways, in particular with either LIP (the long integer package used by classic NTL) or GMP. Although NTL provides a nice abstract interface for long integer arithmetic, it in fact subverts this abstraction in a number of places, so that taking the morally correct path would be both painstaking and, worse, error-prone.

The quick and dirty approach that I actually took was to convert "on the fly" between LIP and GMP representations. This makes the use of GMP completely invisible to higher layer software.
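
To give a feel for what an on-the-fly conversion involves, here is a minimal sketch in C++. It assumes, purely for illustration, that one representation stores a nonnegative integer as little-endian 30-bit digits and the other as little-endian 32-bit limbs. These digit sizes and the names repack, to_limbs32, and to_digits30 are hypothetical; this is not the code used inside NTL or GMP.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not NTL or GMP code): repack a little-endian sequence
// of in_bits-bit digits into out_bits-bit digits. Both widths must lie in
// 1..32, and every input digit is assumed to be smaller than 2^in_bits.
std::vector<std::uint32_t> repack(const std::vector<std::uint32_t>& in,
                                  unsigned in_bits, unsigned out_bits) {
    assert(in_bits >= 1 && in_bits <= 32 && out_bits >= 1 && out_bits <= 32);
    const std::uint64_t out_mask = (std::uint64_t{1} << out_bits) - 1;
    std::vector<std::uint32_t> out;
    std::uint64_t acc = 0;   // bit buffer; the lowest bits are the oldest
    unsigned acc_bits = 0;   // number of valid bits currently held in acc
    for (std::uint32_t d : in) {
        acc |= static_cast<std::uint64_t>(d) << acc_bits;
        acc_bits += in_bits;
        while (acc_bits >= out_bits) {          // emit every complete output digit
            out.push_back(static_cast<std::uint32_t>(acc & out_mask));
            acc >>= out_bits;
            acc_bits -= out_bits;
        }
    }
    if (acc_bits > 0) out.push_back(static_cast<std::uint32_t>(acc & out_mask));
    while (out.size() > 1 && out.back() == 0) out.pop_back();  // drop high-order zeros
    return out;
}

// Assumed 30-bit digit representation -> assumed 32-bit limb representation.
std::vector<std::uint32_t> to_limbs32(const std::vector<std::uint32_t>& digits30) {
    return repack(digits30, 30, 32);
}

// Assumed 32-bit limb representation -> assumed 30-bit digit representation.
std::vector<std::uint32_t> to_digits30(const std::vector<std::uint32_t>& limbs32) {
    return repack(limbs32, 32, 30);
}
```

A conversion of this kind touches every digit once, so its cost grows linearly with the length of the number, just like an addition.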

Of course, there is a penalty: converting between representations takes time. For operations like addition, conversion would take longer than performing the operation, and so it is not done. However, for computationally expensive operations like multiplication, the "overhead" is not so bad, at least for numbers that are not too small. To multiply two 256-bit numbers on a Pentium-II, the extra time required for the data conversions is just 35% of the time to do the multiplication in GMP, i.e., the "overhead" is 35%. Put differently, we could perform the multiply 26% faster if we used GMP directly, so the "opportunity cost" is 26%. That's not too bad. For 512-bit numbers, the corresponding opportunity cost is about 14%, and for 1024-bit numbers, it is less than 10%.
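
The two percentages are related by simple arithmetic, which the following C++ sketch spells out. The function names overhead and opportunity_cost are my own labels for the two quantities described above, and the formulas are my reading of that description: overhead is the extra time relative to the pure GMP time, and opportunity cost is the fraction of the hybrid time that would be saved by calling GMP directly.

```cpp
#include <cassert>

// Extra conversion time, as a fraction of the pure GMP running time.
double overhead(double hybrid_time, double pure_time) {
    assert(pure_time > 0.0 && hybrid_time >= pure_time);
    return (hybrid_time - pure_time) / pure_time;
}

// Fraction of the hybrid NTL/GMP running time that would be saved
// by using GMP directly.
double opportunity_cost(double hybrid_time, double pure_time) {
    assert(hybrid_time > 0.0 && hybrid_time >= pure_time);
    return (hybrid_time - pure_time) / hybrid_time;
}
```

With the Pentium-II multiplication timings for 256-bit numbers from the tables below (3.762 microseconds for NTL/GMP, 2.789 for pure GMP), overhead comes out at roughly 0.35 and opportunity_cost at roughly 0.26. In general, an overhead of r corresponds to an opportunity cost of r/(1+r).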

For smaller numbers, the opportunity cost is greater, but never much worse than about 50%.

Multiplication is the worst case scenario. Operations like division are slower, so that the corresponding "opportunity cost" is even smaller, and for really heavyweight operations like modular exponentiation, the opportunity cost is truly negligible even for quite small numbers. For example, the "opportunity cost" for 512-bit by 256-bit division on the Pentium-II is about 20%.

So by using this quick and dirty approach, I was able to get most of the benefits of GMP without too much effort and, more importantly, while maintaining complete backward compatibility and minimizing the chance of introducing bugs. Maybe someday I will find the time and courage to take the morally correct path, but that day is still some time off. In the meantime, NTL users can enjoy most of the speed benefits of GMP.

Besides multiplication, the following integer operations benefit from GMP: division, GCD, extended GCD, modular inverse, modular exponentiation, and square roots. Speeding up these basic operations of course has a ripple effect, speeding up many other operations throughout NTL (although not in any uniform fashion).

Here is some timing data. I measured the running time of multiplying two n-bit numbers, for n=64,128,256,512,1024,2048,4096. I made these timings with "classic NTL" (i.e., LIP only), "NTL with GMP", and "pure GMP". I used GMP version 3.0.1 in all cases, and performed the tests on the following platforms: Pentium-II, Pentium-III, PowerPC, and Alpha.

For each platform, the GNU gcc compiler was used.

The following tables present the timing information. There is a separate table for each platform. Each row in the table gives the running time for the three different codes (classic NTL, hybrid NTL/GMP, pure GMP). Of course, each operation was repeated many times, and an average was taken. Nevertheless, the timings should be taken as fairly rough estimates.
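
For readers who want to make similar measurements, here is a minimal C++ sketch of the "repeat many times and average" approach. It is not the harness used to produce the tables; the function name average_microseconds and the choice of std::chrono::steady_clock are assumptions made for this illustration.

```cpp
#include <cassert>
#include <chrono>

// Illustrative helper (not the harness behind the tables): call op() reps
// times and return the average wall-clock time per call, in microseconds.
template <class Op>
double average_microseconds(Op op, long reps) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < reps; ++i) op();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(reps);
}
```

The operation to be timed is passed in as a callable, for example a lambda that multiplies two preallocated numbers. The loop overhead is included in the result, so for very fast operations the figures are only rough estimates.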



Pentium-II
time (in microseconds) to multiply two n-bit numbers

   n: classic NTL  NTL/GMP  pure GMP
---------------------------------------
  64:     0.801     0.803     0.618
 128:     3.023     1.900     1.082
 256:     8.821     3.762     2.789
 512:    29.373    10.300     8.907
1024:    94.147    30.899    28.687
2048:   278.320    93.384    88.959
4096:   858.154   282.593   274.353




Pentium-III 
time (in microseconds) to multiply two n-bit numbers

   n: classic NTL  NTL/GMP  pure GMP
---------------------------------------
  64:     0.319     0.309     0.243
 128:     1.119     0.700     0.399
 256:     3.185     1.366     1.017
 512:    10.719     3.743     3.214
1024:    33.798    11.120    10.395
2048:    99.640    33.455    32.120
4096:   307.007   101.013    98.572



PowerPC
time (in microseconds) to multiply two n-bit numbers

   n: classic NTL  NTL/GMP   pure GMP
---------------------------------------
  64:     1.745     1.740     1.385
 128:     4.578     3.653     2.148
 256:    10.986     7.172     5.207
 512:    37.079    19.150    16.041
1024:   119.781    56.534    51.270
2048:   352.783   167.847   160.828
4096:  1107.178   513.916   493.774





Alpha
time (in microseconds) to multiply two n-bit numbers

   n: classic NTL  NTL/GMP   pure GMP
---------------------------------------
  64:     0.562     0.562     0.313
 128:     0.996     0.996     0.490
 256:     2.905     2.119     1.179
 512:     8.481     4.711     3.345
1024:    24.807    12.821    11.032
2048:    76.727    37.097    34.636
4096:   234.439   109.371   107.033



Note that on the 32-bit machines, for the n=64 timings, the classic NTL and hybrid NTL/GMP codes are the same. For the 64-bit machine (the Alpha), the same holds for n=64,128.

Below are the results of some timing tests for division with remainder on the Pentium-II.



Pentium-II
time (in microseconds) to compute a % b, where a has 2*n bits and b has n bits

   n: classic NTL  NTL/GMP   pure GMP
---------------------------------------
  64:     3.467     3.481     1.287
 128:     5.903     4.768     2.618
 256:    13.599     8.125     6.475
 512:    38.986    18.005    14.477
1024:   128.174    45.395    42.953
2048:   452.271   141.602   134.735
4096:  1682.129   488.281   477.295


Below are some timing tests for modular exponentiation on the Pentium-II.



Pentium-II
time (in microseconds) to compute a^b % c for n-bit integers a, b, c

   n: classic NTL    NTL/GMP   pure GMP
------------------------------------------
  64:    314.331    231.018    219.116
 128:   1394.043    823.975    819.092
 256:   6240.234   3291.016   3281.250
 512:  35312.500  14921.875  14882.812
1024: 228125.000  78281.250  78281.250


