A Tour of NTL: Some Performance Data
Detailed performance measurements for basic integer arithmetic are available on a separate page.
This page discusses the performance of other, higher-level operations. Here are some timing figures from using NTL. The figures were obtained using an IBM RS6000 Workstation, Model 43P-133, which has a 133 MHz PowerPC Model 604 processor. The operating system is AIX and the compiler is xlC. The compiler options were -O2 -qarch=ppc and the NTL flags NTL_AVOID_FLOAT and NTL_TBL_REM were set. GMP was not used.
The first problem considered is the factorization of univariate polynomials modulo a prime p. As test polynomials, we take the family of polynomials defined in [V. Shoup, J. Symb. Comp. 20:363-397, 1995]. For every n, we define p to be the first prime greater than 2^(n-2)*PI, and the polynomial is
sum(a[n-i]*X^i, i = 0..n),
where a[0] = 1, and a[i+1] = a[i]^2 + 1. Here are some running times:
n | 64 | 128 | 256 | 512 | 1024 |
hh:mm:ss | 2 | 13 | 1:53 | 21:01 | 4:05:25 |
Also of interest is space usage. The n = 512 case used 4MB of main memory, and the n = 1024 case used 17MB of main memory.
Just for fun, I tried version 4.2 of NTL on a 700MHz Pentium-III, using GMP. This took just over 12 hours. For comparison, this took 12 days when I ran it on a Sparc station in 1994. Some of the improvement was due simply to faster clock speeds, but using GMP for the long arithmetic helped even more: it greatly increased the speed of the inner products computed in the "modular composition" operation.
Another test suite, this time using small primes, was used by Kaltofen and Lobo (Proc. ISSAC '94). One of their polynomials is a degree 10001 polynomial, modulo the prime 127. This polynomial was factored with NTL in just over 3 hours, using 17MB of memory.
The second problem considered is factoring univariate polynomials over the integers. This test suite comes from Paul Zimmermann. The polynomial P1(X) has degree 156, coefficients up to 424 digits, and 36 factors (12 of degree 2, 15 of degree 4, 9 of degree 8). The polynomial P2(X) has degree 196, coefficients up to 419 digits and 12 factors (2 of degree 2, 4 of degree 12 and 6 of degree 24). The polynomial P3(X) has degree 336, coefficients up to 597 digits and 16 factors (4 of degree 12 and 12 of degree 24). The polynomial P4(X) has degree 462, coefficients up to 756 digits, and two factors of degree 66 and 396. The polynomial P5(X) has degree 64, coefficients of up to 40 digits, and is irreducible. It is a so-called Swinnerton-Dyer polynomial, which is one of the most difficult types to factor (or prove irreducible). More details on this test suite are available.
Our running times (hh:mm:ss) for P1(X) through P5(X) were as follows:
1.7, 6.7, 14, 3:56, 2:17.
In all cases less than 5MB of main memory was used.
These times reflect an improvement over version 3.7a due to a simple trick of exploiting the structure of polynomials of the form g(x^k). It is a real hack, but many other factorizers "on the market" do this too, so it is only fair. Moreover, as these polynomials arise from "natural problems", and are not artificially constructed, it seems reasonable to look for and exploit such structure.
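Detecting the g(x^k) structure is straightforward: a polynomial f(x) can be written as g(x^k) exactly when k divides the exponent of every nonzero term, so the largest such k is the gcd of those exponents. The following is a minimal sketch of that detection step in plain Python. The function names are hypothetical, and this does not claim to reproduce how NTL's factorizer implements or uses the trick.

```python
from math import gcd


def deflate(coeffs):
    """Given coeffs[i] = coefficient of x^i in f, return (k, g) with f(x) = g(x^k), k maximal."""
    k = 0
    for i, c in enumerate(coeffs):
        if c != 0:
            k = gcd(k, i)         # gcd of the exponents of all nonzero terms
    if k <= 1:                    # no structure to exploit (or f is constant)
        return 1, list(coeffs)
    return k, list(coeffs[::k])   # keep only the coefficients of x^0, x^k, x^(2k), ...


def inflate(coeffs, k):
    """Inverse of deflate: substitute x^k for x in the polynomial given by coeffs."""
    out = [0] * ((len(coeffs) - 1) * k + 1)
    for i, c in enumerate(coeffs):
        out[i * k] = c
    return out
```

For instance, deflate([1, 0, 0, 2, 0, 0, 3]) recognizes 1 + 2*x^3 + 3*x^6 as g(x^3) with g = 1 + 2*x + 3*x^2. Since g has lower degree than f, it is cheaper to factor. Each factor h(x) of g gives a factor h(x^k) of f, which may itself still need to be factored further.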
Without this hack (as in version 3.7a) the times for P1(X), P2(X), and P3(X) (hh:mm:ss) were as follows:
15, 21, 1:12.
The times for P4(X) and P5(X) remain unchanged.
All of these times reflect significant improvements to the factorizer available since version 3.6a. As a comparison, here are the corresponding times in version 3.5a:
21, 23, 1:16, 1:37:10, 1:25:00.
In addition to the above polynomials, Zimmermann has recently
added a challenge polynomial P8(X) to his list
of challenge polynomials.
This is a particularly hard-to-factor polynomial, but using
new techniques, we were able to prove that it is irreducible
in time 1:41:00 using 128MB of memory on a 375MHz PowerPC
(model 43P-150).
This technique is based on a time/space tradeoff.
For example, using 64MB of memory, the running time for the
same calculation was 2:40:00.
NTL's lattice basis reduction code has been used to push the envelope
on breaking new lattice-based cryptosystems.
To date, NTL's lattice code has been used to break the
GGH cryptosystem [Goldreich, Goldwasser, Halevi, Crypto '97]
up to dimension 350.
These experiments were designed
and conducted by
Phong Nguyen.
Lattice basis reduction