<<     >>     Contents     Help    

                                               19. Performance: Measurement & Tips

J lets you express your ideas tersely, but it is up to you to make sure they are good ideas.  Since each keystroke in a J sentence can summon up billions of machine cycles, you must make sure that you don't force the interpreter into a bad choice of algorithm.  This will be hard at first, when you are struggling to figure out what the interpreter is doing, never mind how it is doing it; fortunately the interpreter gives you tools to measure the performance of your code.

Timing Individual Sentances

If you run the JforC script with

load 'system\packages\misc\jforc.ijs'

it will define the verb Ts.  Ts stands for 'time and space' and it tells you how long it takes to run a given J sentence, and how much space the interpreter used during the execution.  For example:

   a3 =. i. 1000

   Ts '+/\ a3'

4.3581e_5 5248

We define a noun a3, and we calculate the running total of its items.  It took 0.00004 seconds to create the 1000-item running total, and used 5248 bytes.  We could have done the whole operation in one line with Ts '+/\ i. 1000', but the monad i. uses time and space too, so if we want to find out only what is used by +/\, we make sure that's all we measure.

We can use Ts to start to understand what can make J programs slow.  Let's define a verb to do the addition operation:

   sum =: dyad : 'x. + y.'"0

sum is an exact replacement for dyad +, having the same rank and function.  Replacing + with sum does not change the result of a sentence:

   +/\ i. 7

0 1 3 6 10 15 21

   sum/\ i. 7

0 1 3 6 10 15 21

But the performance is quite different, as we can measure:

   a10 =. i. 10

   1000 Ts '+/\ a10'

2.68191e_5 1280

Because +/\ is so fast, we give Ts a left argument to report the average time over 1000 runs.  If we just ran the sentence once, the result would be corrupted by small timing variations introduced by the operating system.  sum/\ is not so fast so we run it only once:

   Ts 'sum/\ a10'

0.00181867 3648

Quite a difference: in this running total sum seems to be about 50 times slower than .  Let's just try adding a list to itself (remember that u~ y is equivalent to y u y):

   1000 Ts '+~ a10'

2.68191e_5 896

   100 Ts 'sum~ a10'

0.00033021 2560

Yes, sum is definitely slower than +, though only by a factor of 10 or so this time.  Why should it be slower?  The answer is, Because it deals with atoms.  Since J verb-definitions are not compiled, but interpreted line-by-line on each execution, every single time we add two numbers with sum, the interpreter has to parse 'x. + y.' and perform the addition.  Why, it's a miracle that it only slows down by a factor of 10!  The lesson is that if you define verbs with small rank, the interpretive overhead will be significant.

Still, that doesn't fully explain why sum/\ is so much slower than +/\ .  Let's investigate further by increasing the size of the operand:

   a20 =. i. 20

   1000 Ts '+/\ a20'

2.68191e_5 1280

   Ts 'sum/\ a20'

0.00728641 3648

+/\ is unchanged when we move to a list of 20 items—the operation is so fast that time is being spent starting the verb rather than running it—but sum/\ slows down noticeably.  Interesting; let's try bigger and bigger operands:

   a40 =. i. 40

   1000 Ts '+/\ a40'

2.76572e_5 1408

   Ts 'sum/\ a40'

0.0299561 4160

 

   a100 =. i. 100

   1000 Ts '+/\ a100'

2.76572e_5 1664

   Ts 'sum/\ a100'

0.185741 5184

 

   a400 =. i. 400

   1000 Ts '+/\ a400'

3.77143e_5 3200

   Ts 'sum/\ a400'

3.00367 11328

Holy cow!  On a 400-item list, sum/\ is 80000 times slower than +/\!  What happened?

Recall what monad sum/\ is really doing.  It applies monad sum/ to the first item of the list; then to the list made of the first 2 items; then the list made of the first 3 items; and so on.  At each evaluation of monad sum/, the dyad sum verb is interleaved between the items and the result is evaluated right-to-left.  The problem is, the interpreter doesn't analyze sum to know that it is associative—that x sum (y sum z) is the same as (x sum y) sum z—so it doesn't know that it can use the result from one subset as an input to the operation for the next subset, and it winds up performing every single addition: for the 400th item it adds all 400 numbers together.  That's why its time increases as the square of the length of the list.

Monad +/\ is fast because the interpreter knows that dyad + is associative, and therefore it reuses the result from one subset as input to the next, producing each item of the result with a single addition.

Well then, can we give a hint to the interpreter that sum is associative?  Alas, no, but we have another trick up our sleeve.  Consider monad sum/\., which applies monad sum/ to successive suffixes.  If the interpreter is clever, it will notice that if it starts with the smallest suffix—the one made up of just the last item—and processes the suffixes in order of increasing size, it will always be evaluating x sum (previous suffix result), and right-to-left evaluation implies that the result of the previous suffix can always be used as the right operand to each application of monad sum, without needing any knowledge of associativity.  Let me tell you, this interpreter is nothing if not clever, and that's just what it does.  All we have to do is to convert our sum/\ into a variant of sum/\. .  The way to do that is simple: we reverse the order of the items, apply sum/\., and reverse the order again:

   sum/\.&.|. i. 7

0 1 3 6 10 15 21

This arises enough to be a standard J idiom: use it whenever you need to apply an associative verb on prefixes.  It's much faster:

   Ts 'sum/\.&.|. a400'

0.014805 59264

Still not as fast as +/\, but the suffix version uses time proportional to the number of items rather than the square of the number of items.

Compounds Recognized by the Interpreter

The interpreter recognizes a great many compounds and has special code to perform the compound functions.  For example, we have learned that u@:v y gives the same result as u v y, but it does not follow that the two forms are identical: +/@:, y is faster than +/ , y .  How do know what forms are handled with special code?

An appendix to the Dictionary gives a list of special code in the interpreter (press F1 to bring up help; then click on 'Dic' at the top of the page to bring up the Contents page; the appendices are listed at the end of the contents).  There we see that there is special code for f/@:, so we know to use that form.  Similarly, farther along we see that x i.&1@:< y has special coding, so we know to prefer that form over (x < y) i. 1 .  This list changes from release to release, so you should review it occasionally.

J's performance is very good even if you pay no attention whatsoever to which compounds have special coding, but if you're going to code a lot of J you might as well learn the interpreter's preferred idioms.

Use Large Verb-Ranks! and Integrated Rank Support

'Think big' is a watchword not just for program design, but for coding as well.  Starting a primitive has a small cost, but if you start a primitive for each atom of a large array, the cost will add up.  To reduce the time spent starting primitives, apply them to the biggest operands possible.  This means, Use as large a verb-rank as you can.  See what a difference a tiny change can make:

   a =. i. 100000 10

   Ts 'a -@+ a'

3.96384 4.19552e6

   Ts 'a -@:+ a'

0.12801 8.3895e6

These two verbs produce identical results, but -@+ is 30-fold slower than -@:+ on this large operand.  The reason is that -@+ has rank 0 (taken from the rank of +), while -@:+ has infinite rank.  Rank 0 means that each pair of atoms is fed individually through the verb. So, when -@+ is executed, two primitives are started for each pair of atoms, one to add and the other to change the sign. Execution of -@:+ requires only two primitive-starts for the entire array.

You do not need to worry much about the ranks at which individual primitives are applied, because of an important feature of J called integrated rank support.  When a verb with integrated rank support is used as the u in u"n, the resulting verb runs with a single primitive-start and the application of the verb on the proper cells is handled within the primitive.  So,

   100 Ts 'a + a'

0.0623949 4.19501e6

   100 Ts 'a +"0 a'

0.248846 4.19526e6

   100 Ts 'a +"1 a'

0.0681035 4.19526e6

   100 Ts 'a +"2 a'

0.0626361 4.1952e6

All these forms produce identical results.  The weak dependence of the speed on the rank is typical of a verb with integrated rank support.  Fastest execution is achieved when the verb is used alone, but the form u"n still runs fast, and the higher the rank, the less the loop-control overhead.  The Special Code page referred to in the previous section includes the long list of the primitives with integrated rank support.  You will see there that u/, u/\, and the like are also taken care of.

The practical effect of integrated rank support is that you don't need to worry much about using the largest possible rank for primitives.  In compounds and verbs that you write, you do need to keep the rank high:

   Ts '(<a:;1) { a'

0.00939758 525568

   Ts '1 {"1 a'

0.00952329 525184

Integrated rank support in dyad { gives the two forms equal performance.  Look what happens when we replace the { by a user-defined verb with the same function:

   from =. {

   Ts '(<a:;1) from a'

0.00953335 525760

   Ts '1 from"1 a'

0.365966 525696

from lacks integrated rank support, even though it is defined to have the same function as {, and it suffers when it is applied to each 1-cell.  This is a good reason for you to learn the J primitives and not replace them with mnemonic equivalents.

Shining a Light: The J Performance Monitor

A magnet makes it easy to pick up a needle, but it won't much help you find a needle in a haystack.  Likewise, being able to time and tune individual sentences will not suffice to let you improve the performance of a large J program.  A large program spends most of its time executing a small subset of its code, and any improvements you make to other areas are simply wasted effort.  I remember a case where a 20,000-line program was spending 30% of its time executing a single machine-language instruction—and that instruction turned out to be unnecessary!  What you need is a tool that will direct your attention to the areas where a speedup will really matter.

The J Performance Monitor will show you how much time is spent executing each line of your application.  You can run the Lab on the Performance Monitor to see all the facilities available, or you can jump right into timing your code with the simple sequence

   load 'jpm'

Do this once to load the tool.  Then, for each timing run, execute

   start_jpm_ 1e7

357142

The operand of start_jpm_ is the size in bytes of the trace buffer, and the result is the number of trace entries that can fit in the buffer.  A trace entry is added for each line executed, and for entry and exit of explicit definitions (i. e. verbs defined with verb define).

   run the code you want to time

   viewtotal_jpm_ ''

J will display a popup window with information about the time spent in each verb.  An example display is

+---------+------+--------+--------+-----+----+---+

|name     |locale|all     |here    |here%|cum%|rep|

+---------+------+--------+--------+-----+----+---+

|accpay   |base  |0.001435|0.000829| 57.8| 58 |1  |

|intrep   |base  |0.000213|0.000213| 14.8| 73 |1  |

|accint   |base  |0.000393|0.000147| 10.2| 83 |1  |

|stretch  |base  |0.000142|0.000142|  9.9| 93 |1  |

|intexpand|base  |0.000105|0.000105|  7.3|100 |1  |

|[total]  |      |        |0.001435|100.0|100 |   |

+---------+------+--------+--------+-----+----+---+

The columns contain the following information:

 

name  the name of the verb

locale  the locale the verb was running in (we will discuss locales in a later chapter)

all  the amount of time spent in this verb including time spent in verbs called by this verb

here  the amount of time spent in this verb but not including time spent in verbs called by this verb

here%  the here time as a percentage of total time

cum%  cumulative total of here%

rep  the number of times the verb was executed

You should focus your attention on the here column.  If you see a verb that is taking longer than you think it should, double-click on its name to look at the details of its execution.  Double-clicking on accpay will pop up another window showing

+--------+--------+---+----------------------------------+

|all     |here    |rep|accpay                            |

+--------+--------+---+----------------------------------+

|0.000041|0.000041|1  |monad                             |

|0.000040|0.000040|1  |[8] if. 4~:#y. do.                |

|0.000000|0.000000|0  |[9] 'imm frq int pay' return. end.|

|0.000054|0.000054|1  |[10] 'm f i p'=.y.                |

|0.000116|0.000116|1  |[11] len=.$p=.f#p%f               |

|0.000724|0.000131|1  |[12] j=.}.len accint f intrep i   |

|0.000322|0.000322|1  |[13] r=.j*+/\p%m}.1,(m-1)}.j      |

|0.000137|0.000137|1  |[14] (len$(-f){.1)#r              |

|0.001435|0.000841|1  |total monad                       |

+--------+--------+---+----------------------------------+

We see that line 13 takes the most time.  Clicking on the column heading will sort the lines using that column as a key, making it easy for you to concentrate on the individual lines that are taking the most time.

The J Performance Monitor makes it easy to give your code a good finish by pounding down the nails that are sticking up.  As of Release 5.01a there are a few quirks you need to work around: you cannot have a verb with the same name as a locale; you must close a detail window before you create a new one; and time spent in explicit modifiers is not correctly accounted for.

 


<<     >>     Contents     Help