However with the development of Perl v5, Perl acquired 'Objects'. To put it simply users can define their own special data types, and write custom routines to manipulate them either in low level languages (C and Fortran) or in Perl itself.
This has been fully exploited by the PerlDL developers. The 'PDL' module is
a complete Object-Oriented extension to Perl (although you don't have to
know what an object is to use it) which allows large N-dimensional data
sets, such as large images, spectra, time series, etc to be stored efficiently and manipulated en masse. For example with the PDL module we can write the perl code $a=$b+$c
, where $b
and $c
are large datasets (e.g.
2048x2048 images), and get the result in only a fraction of a second.
PDL variables (or 'piddles' as they have come to be known) support a wide range of fundamental data types - arrays can be bytes, short integers (signed or unsigned), long integers, floats or double precision floats. And because of the Object-Oriented nature of PDL new customised datatypes can be derived from them.
As well as the PDL modules, that can be used by normal perl programs, PerlDL comes with a command line perl shell, called 'perldl', which supports command line editing. In combination with the various PDL graphics modules this allows data to be easily played with and visualised.
$x
, which can hold numbers or strings, lists or arrays of scalars, e.g. @x
, and associative arrays/hashes of scalars, e.g. %x
.
perl v5 introduces to perl data structures and objects. A simple scalar
variable $x
now be a user-defined data type or full blown object (it actually holds a
reference (a smart ``pointer'') to this but that is not relevant for
ordinary use of perlDL)
The fundamental idea behind perlDL is to allow $x
to hold a
whole 1D spectrum, or a 2D image, a 3D data cube, and so on up to large
N-dimensional data sets. These can be manipulated all at once, e.g.
$a = $b + 2
does a vector operation on each value in the spectrum/image/etc.
You may well ask: ``Why not just store a spectrum as a simple perl @x
style list with each pixel being a list item?'' The two key answers to this
are memory and speed. Because we know our spectrum consists of pure numbers we can compactly
store them in a single block of memory corresponding to a C style numeric
array. This takes up a LOT less memory than the equivalent perl list. It is
then easy to pass this block of memory to a fast addition routine, or to
any other C function which deals with arrays. As a result perlDL is very
fast --- for example one can mulitiply a 2048*2048 image in exactly the
same time as it would take in C or FORTRAN (0.1 sec on my SPARC). A further
advantage of this is that for simple operations (e.g. $x += 2
) one can manipulate the whole array without caring about its
dimensionality.
I find when using perlDL it is most useful to think of standard perl
@x
variables as ``lists'' of generic ``things'' and PDL variables like
$x
as ``arrays'' which can be contained in lists or hashes. Quite often in my
perlDL scripts I have @x
contain a list of spectra, or a list of images (or even a mix!). Or perhaps
one could have a hash (e.g. %x
) of images... the only limit is memory!
perlDL variables support a range of data types - arrays can be bytes, short integers (signed or unsigned), long integers, floats or double precision floats.
use PDL; # in perl scripts: use the standard perlDL modules
There are also a lot of extension modules, e.g. PDL::Graphics::TriD. Most of these (but not all as sometimes it is not appropriate) follow a standard convention. If you say:
use PDL::Graphics::TriD;
You import everything in a standard list from the module. Sometimes you might want to import nothing (e.g. if you want to use OO syntax all the time and save the import tax). For these you say:
use PDL::Graphics::TriD '';
And the blank quotes '' are regonised as meaning 'nothing'. You can also specify a list of functions to import in the normal Perl way.
There is also an interactive shell, perldl
, see perldl.
$a = pdl [1..10]; # 1D array $a = pdl (1,2,3,4); # Ditto $b = pdl [[1,2,3],[4,5,6]]; # 2D 3x2 array $b = pdl 42 # 0-dimensional scalar $c = pdl $a; # Make a new copy
$d = byte [1..10]; # See "Type conversion" $e = zeroes(3,2,4); # 3x2x4 zero-filled array
$c = rfits $file; # Read FITS file
@x = ( pdl(42), zeroes(3,2,4), rfits($file) ); # Is a LIST of PDL variables!
The pdl()
function is used to initialise a PDL variable from a scalar, list, list
reference or another PDL variable.
In addition all PDL functions automatically convert normal perl scalars to PDL variables on-the-fly.
(also see ``Type Conversion'' and ``Input/Output'' sections below)
$a = $b + 2; $a++; $a = $b / $c; # Etc.
$c=sqrt($a); $d = log10($b+100); # Etc
$e = $a>42; # Vector condtional
$e = 42*($a>42) + $a*($a<=42); # Cap top
$a = $a / ( max($a) - min($a) );
$f = index($a, where($a > 10)); # where returns set of indices for 1D pdl.
print $a; # $a in string context prints it in a N-dimensional format
(and other perl operators/functions)
There are also predefined functions, which you will find on other manpages. The most important are the PDL/Primitive manpage and the PDL/Slices manpage.
'x'
is hijacked as the matrix multiplication operator. e.g.
$c = $a x $b
;
perlDL is row-major not column major so this is actually
c(i,j) = sum_k a(k,j) b(i,k)
- but when matrices are printed the results will look right. Just remember
the indices are reversed. e.g.:
$a = [ $b = [ [ 1 2 3 0] [1 1] [ 1 -1 2 7] [0 2] [ 1 0 0 1] [0 2] ] [1 1] ]
gives $c = [ [ 1 11] [ 8 10] [ 2 2] ]
Note: transpose()
does what it says and is a convenient way to
turn row vectors into column vectors. It is bound to the unary operator '~'
for convenience.
sub dotproduct { my ($a,$b) = @_; return sum($a*$b) ; } 1;
If put in file dotproduct.pdl would be autoloaded if you are using PDL::AutoLoader (see below).
Of course, this function is already available as the inner
function, see the PDL/Primitive manpage.
pdl()
is double. Conversions are:
$a = float($b); $c = long($d); # "long" is generally a 4 byte int $d = byte($a);
Also double(),
short(),
ushort().
These routines also automatically convert perl lists to allow the convenient shorthand:
$a = byte [[1..10],[1..10]]; # Create 2D byte array $a = float [1..1000]; # Create 1D float array
etc.
print $a;
$b = "Answer is = $a ";
0..99,0..99
.
Furthermore [Which modules!?!], the convention is that the center of the
pixel (0,0) IS at coordinate (0.0,0.0). Thus the above image ranges from
-0.5..99.5, -0.5..99.5
in real space. All perlDL graphics functions conform to this defintion and
hide away the unit-offsetness of, for example, the PGPLOT FORTRAN library.
Again following the usual convention coordinate (0,0) is displayed at the
bottom left when displaying an image. It appears at the top left when using
``print $a
'' etc.
$b = $a->slice("$x1:$x2,$y1:$y1,$z1:$z2"); # Take subsection
# Set part of $bigimage to values from $smallimage ($tmp = $bigimage->slice("$xa:$xb,$ya:$yb")) .= $smallimage;
$newimage = ins($bigimage,$smallimage,$x,$y,$z...) # Insert at x,y,z
$c = nelem ($a); # Number of pixels
$val = at($object, $x,$y,$z...) # Pixel value at position $val = $object->at($x,$y,$z...) # equivalent set($myimage, $x, $y, ... $value) # Set value in image
$b = xvals($a); # Fill array with X-coord values (also yvals(), zvals(), # axisvals($x,$axis) and rvals() for radial distance # from centre).
PDL::IO
modules implement several useful IO format functions. It would be too much
to give examples of each so you are referred to the individual manpages for
details.
There is an easy interface to this in the internal module PDL::Graphics::PGPLOT. (This calls routines in the separately available PGPLOT top-level module.)
The PDL::Graphics::IIS package provides allows one to display images in these (``IIS'' is the name of an ancient item of image display hardware whose protocols these tools conform to.)
One can also write PDL extensions as normal Perl modules.
perldl
provides a simple command line - if the latest Readlines/ReadKey modules
have beeen installed perldl
detects this and enables command line recall and editing. See the manpage
for details.
e.g.:
jhereg% perldl perlDL shell v1.23 ReadLines enabled Reading PDL/default.perldlrc... Found docs database /home/kgb/soft/dev/lib/perl5/site_perl/PDL/pdldoc.db Type 'help' for online help Loaded PDL v1.95_01 perldl> $x = rfits 'm51.fits' BITPIX = 16 size = 65536 pixels Reading 131072 bytes BSCALE = 1.0000000000E0 && BZERO = 0.0000000000E0
perldl> imag $x
Loaded PGPLOT Displaying 256 x 256 image
from 24 to 500 ...
You can also run it from the perl debugger (perl -MPDL -d -e 1
) if you want.
Miscellaneous shell features:
print
, e.g.
perldl> p ones 5,3
[ [1 1 1 1 1] [1 1 1 1 1] [1 1 1 1 1] ]
~/.perldlrc
and local.perldlrc
(in the current directory) are sourced if found. This allows the user to
have global and local PDL code for startup.
#
character is treated as a shell escape. This character is configurable by
setting the perl variable $PERLDL_ESCAPE. This could, for example, be set
in ~/.perldlrc
.
+ - * / > < >= <= << >> & | ^ == != <=> ** % ! ~ sin log abs atan2 sqrt cos exp
[All the unary functions (sin etc.) may be used with inplace()
- see ``Memory'' below.]
By using overloading one can make mathematical operators do whatever you please, and PDL has some built-in tricks which allow existing PDL functions to work unchanged, even if the underlying data representation is vastly changed! See the PDL/Objects manpage
NOTE: On some most systems it is better to configure perl (during the build
options) to use the system malloc()
function rather than
perl's built-in one. This is because perl's one is optimised for speed
rather than consumption of virtual memory - this can result in a factor of
two improvement in the amount of memory storage you can use. The Perl
malloc in 5.004 and later does have a number of compile-time options you
can use to tune the behaviour.
$a
is a big image (e.g. occupying 10MB) then the command
$a = $a + 1;
eats up another 10MB of memory. This is because the expression ``$a+1''
creates a temporary copy of $a
to hold the result, then
$a
is assigned a reference to that. After this, the original
$a
is destroyed so there is no permanent
memory waste. But on a small machine, the growth in the memory footprint
can be considerable. It is obviously done this way so ``$c=$a+1'' works as
expected.
Also if one says:
$b = $a; # $b and $a now point to same data $a = $a + 1;
Then $b
and $a
end up being different, as one
naively expects, because a new reference is created and $a
is
assigned to it.
However if $a
was a huge memory hog (e.g. a 3D volume)
creating a copy of it may not be a good thing. One can avoid this memory
overhead in the above example by saying:
$a++;
The operations ++,+=,--,-=
, etc. all call a special ``in-place'' version of the arithmetic
subroutine. This means no more memory is needed - the downside of this is
that if $b=$a then $b
is also incremented. To force a copy
explicitly:
$b = pdl $a; # Real copy
or, alternatively, perhaps better style:
$b = $a->copy;
log(),
return a result which is a
transformation of their argument. This makes for good programming practice.
However many operations can be done ``in-place'' and this may be required
when large arrays are in use and memory is at a premium. For these
circumstances the operator inplace()
is provided which
prevents the extra copy and allows the argument to be modified. e.g.:
$x = log($array); # $array unaffected log( inplace($bigarray) ); # $bigarray changed in situ
topdl()
first to make it safe. e.g.:
sub myfiddle { my $pdl = topdl(shift); $pdl->fiddle_foo(...); ... }
topdl()
does NOT perform a copy if a pdl variable is passed -
it just falls through - which is obviously the desired behaviour. The
routine is not of course necessary in normal user defined functions which
do not care about internals.