A Tour of NTL: Using NTL with GMP
GMP is the GNU Multi-Precision library. You can get more information about it, as well as the latest version from here.
Briefly, GMP is a library for long integer arithmetic. It has hand-crafted assembly routines for a wide variety of architectures. For basic operations, like integer multiplication, it can be two to three (and sometimes a bit more) times faster than NTL. The speedup is most dramatic on x86 machines.
As of version 4.2, it is possible to link the GMP library with NTL so as to get most of the benefits of GMP, while still maintaining complete backward compatibility. Building NTL with GMP takes a few extra minutes' work, and you certainly do not need to use NTL with GMP if you don't want to. As far as I know, GMP is only available on Unix systems and on Windows systems using Cygwin tools.
To download and build GMP on your machine, do the following:
Step 1. Download GMP from here. You will get a file gmp-XXX.tar.gz.
Step 2. Unpack GMP as follows:
% gunzip gmp-XXX.tar.gz
% tar xf gmp-XXX.tar
This creates a directory gmp-XXX. Go there now:
% cd gmp-XXX
Step 3. Build GMP as follows:
% ./configure --disable-shared --prefix=<gmp_prefix>
% make
% make install
Here, <gmp_prefix> should be the name of a directory where you would like to store the GMP library components. This builds and installs GMP, creating files <gmp_prefix>/include/gmp.h and <gmp_prefix>/lib/libgmp.a.
The options --disable-shared and --prefix=<gmp_prefix> to configure are both optional. The first option disables the creation of shared libraries, which simplifies things just a bit (in particular, this documentation). If you don't pass the second option, then <gmp_prefix> defaults to /usr/local, and you have to have root permissions to run make install.
Executing make uninstall undoes the make install.
Executing make distclean removes everything created by configure and make.
When building NTL with GMP, you have to tell NTL that you want to use GMP, and where the include files and library are. The easiest way to do this is by passing the argument GMP=on to the configuration script when you are installing NTL. That is, you execute:
% ./configure GMP=on GMP_PREFIX=<gmp_prefix>
where <gmp_prefix> is the name of the directory in which GMP was installed above.
If you need more fine-grained control, you can execute:
% ./configure GMP=on GMP_INCDIR=-I<gmp_prefix>/include GMP_LIBDIR=-L<gmp_prefix>/lib
Alternatively, the following achieves more or less the same thing:
% ./configure GMP=on CPPFLAGS=-I<gmp_prefix>/include LDFLAGS=-L<gmp_prefix>/lib
If you installed GMP in a standard system directory, then
% ./configure GMP=on
does the job.
Instead of passing arguments to the configure script, you can also just edit the makefile by hand. The documentation in the makefile should be self-explanatory.
When compiling programs that use NTL with GMP, you need to link with the GMP library. If GMP is not installed in a standard place, this just means adding -L<gmp_prefix>/lib -lgmp to the compilation command. If you installed GMP in a standard system directory, then just -lgmp does the job.
NTL has been tested and works correctly with versions 2.0.2, 3.0.1, and 3.1 of GMP. The latest of these is generally faster. Using versions prior to 2.0.2, or version 3.0, is not recommended.
When using NTL with GMP, as a user of NTL, you do not need to know or understand anything about the GMP library. So while there is detailed documentation available about how to use GMP, you do not have to read it.
The way NTL uses GMP is a "quick and dirty", yet fairly effective hack. There are two ways one could incorporate GMP into NTL. One way is the "morally correct" way, and the other is the quick and dirty hack that was actually implemented.
The morally correct way would be to have an abstract interface for long integer arithmetic that could be implemented in one of several ways, so in particular, either with LIP or GMP. Although NTL provides a nice abstract interface for long integer arithmetic, it in fact subverts this abstraction at a number of places, so that taking the morally correct path would be both painstaking and, worse, error prone.
The quick and dirty approach that I actually took was to convert "on the fly" between LIP and GMP representations. This makes the use of GMP completely invisible to higher layer software.
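To give a feel for what converting "on the fly" between two long-integer representations involves, here is a minimal sketch of repacking digits from one radix into another. It is not NTL's actual conversion code: the function name repack, the use of little-endian digit arrays, and the widths in the usage note (30-bit digits on one side, 32-bit limbs on the other) are all assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Repack a nonnegative integer stored as little-endian digits of `from_bits`
// bits each into little-endian digits of `to_bits` bits each.
// Hypothetical helper for illustration only; not part of NTL, LIP, or GMP.
std::vector<std::uint32_t> repack(const std::vector<std::uint32_t>& in,
                                  unsigned from_bits, unsigned to_bits) {
    assert(from_bits >= 1 && from_bits <= 32);
    assert(to_bits >= 1 && to_bits <= 32);
    const std::uint64_t from_mask = (std::uint64_t{1} << from_bits) - 1;
    const std::uint64_t to_mask = (std::uint64_t{1} << to_bits) - 1;

    std::vector<std::uint32_t> out;
    std::uint64_t acc = 0;  // bit buffer: at most 31 pending bits plus one input digit
    unsigned acc_bits = 0;  // number of pending bits currently held in acc
    for (std::uint32_t digit : in) {
        // Append the next input digit above the bits that are still pending.
        acc |= (digit & from_mask) << acc_bits;
        acc_bits += from_bits;
        // Emit as many complete output digits as the buffer now holds.
        while (acc_bits >= to_bits) {
            out.push_back(static_cast<std::uint32_t>(acc & to_mask));
            acc >>= to_bits;
            acc_bits -= to_bits;
        }
    }
    if (acc_bits > 0) {
        out.push_back(static_cast<std::uint32_t>(acc));  // leftover high-order bits
    }
    // Normalize: drop high-order zero digits.
    while (!out.empty() && out.back() == 0) {
        out.pop_back();
    }
    return out;
}
```

Under these assumptions, repack(x, 30, 32) plays the role of the conversion in one direction and repack(y, 32, 30) the conversion back. Both passes are linear in the length of the number, which is why the cost matters for cheap operations and fades for expensive ones.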
Of course, there is a penalty: converting between representations takes time. For operations like addition, conversion would take longer than performing the operation, and so it is not done. However, for computationally expensive operations like multiplication, the "overhead" is not so bad, at least for numbers that are not too small. To multiply two 256-bit numbers on a Pentium-II, the extra time required for the data conversions is just 35% of the time to do the multiplication in GMP, i.e., the "overhead" is 35%. Put differently, we could perform the multiply 26% faster if we used GMP directly, so the "opportunity cost" is 26%. That's not too bad. For 512-bit numbers, the corresponding opportunity cost is about 14%, and for 1024-bit numbers, it is less than 10%.
For smaller numbers, the opportunity cost is greater, but never much worse than about 50%.
Multiplication is the worst-case scenario. Operations like division are slower, so the corresponding "opportunity cost" is even smaller: for example, it is about 20% for 512-bit by 256-bit division on the Pentium-II. For really heavyweight operations like modular exponentiation, the opportunity cost is truly negligible even for quite small numbers.
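The "overhead" and "opportunity cost" figures quoted above are two views of the same pair of timings: the overhead is the extra time relative to pure GMP, while the opportunity cost is that same extra time relative to the hybrid NTL/GMP code. The sketch below spells out the arithmetic; the function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// Extra time spent by the hybrid code, as a fraction of the pure GMP time.
double overhead(double hybrid_us, double pure_us) {
    assert(pure_us > 0.0);
    return (hybrid_us - pure_us) / pure_us;
}

// Fraction of the hybrid running time that would be saved by using GMP directly.
double opportunity_cost(double hybrid_us, double pure_us) {
    assert(hybrid_us > 0.0);
    return (hybrid_us - pure_us) / hybrid_us;
}

// The same opportunity cost, computed from the overhead h alone: h / (1 + h).
double opportunity_from_overhead(double h) {
    assert(h > -1.0);
    return h / (1.0 + h);
}

// Print both figures as whole percentages for one pair of timings.
void report(const char* label, double hybrid_us, double pure_us) {
    std::printf("%s: overhead %.0f%%, opportunity cost %.0f%%\n", label,
                100.0 * overhead(hybrid_us, pure_us),
                100.0 * opportunity_cost(hybrid_us, pure_us));
}
```

For instance, feeding in the Pentium-II figures for 256-bit multiplication from the tables in this document, report("256-bit multiply", 3.762, 2.789), gives an overhead of about 35% and an opportunity cost of about 26%, matching the numbers quoted in the text.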
So by using this quick and dirty approach, I was able to get most of the benefits of GMP, without too much effort, and more importantly, while maintaining complete backward compatibility, and also minimizing the chance of introducing bugs. Maybe someday I will find the time and courage to take the morally correct path, but that day is still some time off. In the meantime, NTL users can enjoy most of the speed benefits of GMP.
Besides multiplication, the following integer operations benefit from GMP: division, GCD, extended GCD, modular inverse, modular exponentiation, and square roots. Speeding up these basic operations of course has a ripple effect, speeding up many other operations throughout NTL (although not in any uniform fashion).
Here is some timing data. I measured the running time of multiplying two n-bit numbers, for n=64,128,256,512,1024,2048,4096. I made these timings with "classic NTL" (i.e., LIP only), "NTL with GMP", and "pure GMP". I used GMP version 3.0.1 in all cases, and performed the tests on the platforms named in the table headings below.
The following tables present the timing information. There is a separate table for each platform. Each row in the table gives the running time for the three different codes (classic NTL, hybrid NTL/GMP, pure GMP). Of course, each operation was repeated many times, and an average was taken. Nevertheless, the timings should be taken as fairly rough estimates.
Pentium-II
time (in microseconds) to multiply two n-bit numbers

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:        0.801        0.803        0.618
  128:        3.023        1.900        1.082
  256:        8.821        3.762        2.789
  512:       29.373       10.300        8.907
 1024:       94.147       30.899       28.687
 2048:      278.320       93.384       88.959
 4096:      858.154      282.593      274.353

Pentium-III
time (in microseconds) to multiply two n-bit numbers

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:        0.319        0.309        0.243
  128:        1.119        0.700        0.399
  256:        3.185        1.366        1.017
  512:       10.719        3.743        3.214
 1024:       33.798       11.120       10.395
 2048:       99.640       33.455       32.120
 4096:      307.007      101.013       98.572

PowerPC
time (in microseconds) to multiply two n-bit numbers

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:        1.745        1.740        1.385
  128:        4.578        3.653        2.148
  256:       10.986        7.172        5.207
  512:       37.079       19.150       16.041
 1024:      119.781       56.534       51.270
 2048:      352.783      167.847      160.828
 4096:     1107.178      513.916      493.774

Alpha
time (in microseconds) to multiply two n-bit numbers

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:        0.562        0.562        0.313
  128:        0.996        0.996        0.490
  256:        2.905        2.119        1.179
  512:        8.481        4.711        3.345
 1024:       24.807       12.821       11.032
 2048:       76.727       37.097       34.636
 4096:      234.439      109.371      107.033

Note that on the 32-bit machines, the classic NTL and hybrid NTL/GMP timings for n=64 are essentially the same, and on the 64-bit Alpha the same holds for n=64,128: for numbers this small, the two codes are the same.
Below are the results of some timing tests for division with remainder on the Pentium-II.
Pentium-II
time (in microseconds) to compute a % b, where a has 2*n bits and b has n bits

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:        3.467        3.481        1.287
  128:        5.903        4.768        2.618
  256:       13.599        8.125        6.475
  512:       38.986       18.005       14.477
 1024:      128.174       45.395       42.953
 2048:      452.271      141.602      134.735
 4096:     1682.129      488.281      477.295

Below are some timing tests for modular exponentiation on the Pentium-II.
Pentium-II
time (in microseconds) to compute a^b % c for n-bit integers a, b, c

    n:  classic NTL      NTL/GMP     pure GMP
---------------------------------------------
   64:      314.331      231.018      219.116
  128:     1394.043      823.975      819.092
  256:     6240.234     3291.016     3281.250
  512:    35312.500    14921.875    14882.812
 1024:   228125.000    78281.250    78281.250
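
As noted above, each timed operation was repeated many times and an average was taken. The harness actually used for these tables is not shown in this document; the following is only a minimal sketch of that general approach, using std::chrono from the C++ standard library. The function name average_microseconds is made up for this illustration.

```cpp
#include <cassert>
#include <chrono>

// Run `op` `reps` times and return the average wall-clock time per call,
// in microseconds. Hypothetical helper for illustration only.
template <class Op>
double average_microseconds(Op op, long reps) {
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < reps; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    // Convert the elapsed interval to a floating-point count of microseconds.
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(reps);
}
```

One would pass in a callable that performs the operation of interest (say, a multiplication of two preset n-bit operands) and choose reps large enough that the total running time dwarfs the clock resolution. Even so, results from a harness like this are rough estimates, since they include loop overhead and depend on caching and system load.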