A Tour of NTL: NTL Implementation and Portability
NTL is designed to be portable, fast, and relatively easy to use and extend.
To make NTL portable, no assembly code is used. This is highly desirable, as architectures are constantly changing and evolving, and maintaining assembly code is quite costly. By avoiding assembly code, NTL should remain usable, with virtually no maintenance, for many years.
However, NTL makes two requirements of its platform, neither of which is guaranteed by the C++ language definition, but which nevertheless appear to be essentially universal:
Relying on floating point may seem prone to errors, but with the guarantees provided by the IEEE standard, one can prove the correctness of the NTL code that uses floating point. Actually, NTL is quite conservative, and substantially weaker conditions are sufficient for correctness. In particular, NTL works with any mix of double precision and extended double precision operations (which arise, for example, with Intel x86 processors). NTL does require that the special quantities "infinity" and "not a number" are implemented correctly.
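As a concrete illustration (this is ordinary C++, not NTL code), the floating point properties the preceding paragraph relies on can be queried through std::numeric_limits; the small program below, written here only as a sketch, simply reports whether the platform advertises IEEE-conformant doubles with working "infinity" and "not a number" values.

   #include <iostream>
   #include <limits>

   int main()
   {
      // What the platform promises about its double type.
      std::cout << "IEEE 754 doubles: "
                << std::numeric_limits<double>::is_iec559 << "\n";
      std::cout << "has infinity:     "
                << std::numeric_limits<double>::has_infinity << "\n";
      std::cout << "has quiet NaN:    "
                << std::numeric_limits<double>::has_quiet_NaN << "\n";

      double inf  = std::numeric_limits<double>::infinity();
      double qnan = std::numeric_limits<double>::quiet_NaN();

      // Basic sanity checks: infinity dominates large finite values,
      // and a NaN compares unequal even to itself.
      std::cout << (inf > 1e308) << " " << (qnan != qnan) << "\n";
      return 0;
   }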
With this strategy, NTL represents arbitrary length integers using a 30-bit radix on 32-bit machines, and a 50-bit radix on 64-bit machines. If at some point in the future even larger word sizes are available, then NTL will still work correctly, but will unfortunately still use only a 50-bit radix, unless the precision of a double is also increased.
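The following toy routine is purely illustrative and is not NTL's internal representation; it only shows what decomposing an integer into radix-2^30 "digits" looks like, which is the kind of representation the paragraph above describes.

   #include <iostream>
   #include <vector>

   // Split an unsigned value into base-2^30 digits, low-order digit first.
   // Illustrative only: NTL keeps its digit vectors in its own internal
   // data structures, which client code never sees.
   std::vector<unsigned long> to_radix_2_30(unsigned long long x)
   {
      const unsigned long long RADIX = 1ULL << 30;   // 2^30
      std::vector<unsigned long> digits;
      do {
         digits.push_back((unsigned long)(x % RADIX));
         x /= RADIX;
      } while (x != 0);
      return digits;
   }

   int main()
   {
      for (unsigned long d : to_radix_2_30(123456789012345ULL))
         std::cout << d << "\n";
      return 0;
   }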
The general strategy used to implement large integer arithmetic is the one used in A. K. Lenstra's LIP library for arbitrary-length integer arithmetic. Indeed, NTL's integer arithmetic evolved from LIP, but over time almost all of this code has been rewritten to enhance performance as well as portability. LIP's philosophy of "portability plus performance" carries on in NTL.
To implement large integer arithmetic, several algorithmic strategies are available, and which is best depends on the particular platform you are using. These strategies can be selected either by editing the "config.h" header file or, on Unix systems, by letting the configuration wizard figure out which one is best.
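As a rough illustration of what such a selection looks like, the fragment below sketches the kind of strategy switches that have appeared in NTL's "config.h"; the exact flag names, their meanings, and their layout vary from version to version, so the file shipped with your copy of NTL is the authority.

   /*
    * Illustrative fragment only.  Strategy switches of this sort are
    * typically enabled by uncommenting the corresponding #define and
    * rebuilding the library.
    */

   /* #define NTL_LONG_LONG   */  /* use "long long" arithmetic in certain
                                     low-level routines                     */
   /* #define NTL_AVOID_FLOAT */  /* use an all-integer strategy instead of
                                     one based on floating point            */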
By avoiding assembly code for certain low-level operations, NTL pays a certain performance penalty. On typical platforms, NTL's long integer multiplication and division are about twice as slow as those of highly optimized, assembly-language-based libraries such as GMP. However, NTL usually more than makes up for this by making use of the best available higher-level algorithms.
However, as of version 4.2, NTL may be used with GMP, the GNU Multi-Precision library, for enhanced performance [more details].
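Whichever long integer implementation sits underneath, client code is unchanged. The sketch below (assuming a reasonably recent NTL in which the classes live in namespace NTL) multiplies two large integers using the ZZ class; the same source works whether NTL was built with its own routines or on top of GMP, since that choice is made at configuration time, not in client code.

   #include <NTL/ZZ.h>
   #include <iostream>

   using namespace std;
   using namespace NTL;

   int main()
   {
      ZZ a = power(to_ZZ(2), 521) - 1;   // a large Mersenne number
      ZZ b = power(to_ZZ(10), 100) + 7;

      ZZ c = a * b;                      // long integer multiplication

      cout << NumBits(c) << " bits\n";
      cout << c % to_ZZ(1000003) << "\n";
      return 0;
   }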
Long integer multiplication is implemented using the classical algorithm, crossing over to Karatsuba for very big numbers. Polynomial multiplication and division are carried out using a combination of the classical algorithm, Karatsuba, the FFT using small primes, and the FFT using the Schoenhage-Strassen approach. The choice of algorithm depends on the coefficient domain. Also, many algorithms employed throughout NTL are recent inventions of the author (Victor Shoup) and his colleagues Joachim von zur Gathen and Erich Kaltofen, as well as John Abbott and Paul Zimmermann.
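For example, multiplying two polynomials with integer coefficients using NTL's ZZX class looks like the sketch below; which of the algorithms just mentioned actually gets used is decided inside the library, not by the caller.

   #include <NTL/ZZX.h>
   #include <iostream>

   using namespace std;
   using namespace NTL;

   int main()
   {
      ZZX f, g, h;

      // f = 3*X^2 + 2*X + 1,  g = X^3 - 5
      SetCoeff(f, 2, 3); SetCoeff(f, 1, 2); SetCoeff(f, 0, 1);
      SetCoeff(g, 3, 1); SetCoeff(g, 0, -5);

      mul(h, f, g);                    // h = f*g

      cout << h << "\n";               // printed as a vector of coefficients
      cout << "deg(h) = " << deg(h) << "\n";
      return 0;
   }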
NTL is not a "perfect" library. Here are some limitations of NTL that a "perfect" library would not have:
However, as a compromise, as of version 4.2, one can use NTL with GMP, obtaining most of the benefits of GMP while maintaining complete backward compatibility [more details].