The SuperWabaJump Toolkit Performance Guide
Introduction
In this guide I will try to assign some numbers to the relative cost of different elements of a
SuperWabaJump application. These numbers are very rough because they will depend on the Palm OS device used. A simple integer add will be assigned the cost of
1 time unit. A slower operation might be assigned a cost of
3 units, say. Some operations vary significantly in speed from one use to another. I will try to be representation. Also, some operations are disproportionately faster on a Palm OS 5.x based device using and ARM or XScale processor. I will try to note such differences.
SpeedTest
There is an example application supplied with the Jump distribution called SpeedTest. This application can be built with WabaJump,
SuperWaba or
SuperWabaJump. The SuperWabaJump build of this application is part of the SuperWabaJump toolkit distribution. Results from running this application and examination of the generated code were used to compile this guide.
General Issues
If there is one rule to follow it is to use local variables and parameters where possible. Accessing static or instance fields is never faster than local variables. In the very best case a static or instance field is as fast as local variable is in the worst case. The cost of accessing a local variable is often
0 because it can be combined with other operations, at worst it will be
1 time unit.
Jump Specific Issues
NullPointerException
In principle, whenever field or method of an object is accessed there is the possibility that the object is
null and that a NullPointerException will be thrown. This means that Jump must generate code to check the object for null, and if required throw the exception. All these checks for null-pointers can have a significant impact on the performance of the application even though the test itself costs only
3 time units.
Jump tries to get round this problem in several ways. In the simplest, brute force, method Jump provides a compiler option (
-n ) that turns off all null-pointer checking. Fast, effective, but not safe. Jump also tracks local variables and parameters that are objects, remembering which ones have been checked for null already. If the variable is known to be non-null via all routes to the current usage point then Jump will know that a test is not required. This analysis removes approximately 75%-80% of all null-pointer checks. So try to avoid null-pointer checks in loops by using an object before the loop.
The implicit
this parameter of intance methods is guaranteed not to be null. So null-pointer checks are never need to access instance fields or methods via
this. The cost of accessing byte, char, short, int or object fields in
1 -
2 time units, while double and long fields require
2 -
3 units. Accessing instance fields and methods via a local variable will also take this time plus the time for a null-pointer check, if necessary. The actual time varies somewhat depending on the context, particularly if the object has been used recently.
example:
void myMethod(AnotherObject obj)
{
this.count++; // no null-pointer check because this can't be null.
obj.field = 1; // must check for obj==null.
obj.anotherField = 'x'; // obj has beened checked already so it can't be null here.
obj.data[1] = '?'; // obj is OK, but the obj.data array must be checked for null.
obj.data[2] = '!'; // Jump can't tract fields for null, obj.data is checked again.
}
example:
void myMethod(AnotherObject obj)
{
if ( ok )
{
obj.field = 1; // must check for obj==null.
obj.anotherField = 'x'; // obj has beened checked already so it can't be null here.
}
obj.yetAnother = '?'; // obj might not have been tested for null so check here.
}
Note that arrays are objects too. The null-pointer checks apply equally to them.
IndexOutOfBoundsException
Every time an array is subscripted Jump must generate code to check that the subscript is within bounds. Jump does
not have any way to infer that a subscripting operation is in bounds, even if the subscript is constant. The cost of this check is
4 time units. However, Jump has a command line option (
-a ) that turns of all subscript bounds checking. For applications that do substantial array handling turning of array bounds checking can provide a significant boost in speed. The cost of subscripting a byte, char, short, int or object array is
2 time units, while double and long arrays take
3. As you can see, the bounds check can take longer than the wor it is checking.
Simple Integer Expressions
In general there is no advantage to declaring
local variables a smaller than integer because all internal operations must be performed at integer precision. This can lead to extra work to truncate the results or sign-extend the operands. There is some benefit to fields that are smaller because they use less memory.
Jump can optimize some expressions involving constants. However it is not as clever as it could be. For example
a = b+1 is better than
a = 1+b. In general Jump recognizes special case involving constants only when the constant is the right-hand operand of a sub-expression.
Add and subtract cost
1 time unit. Adding or subtracting a constant can be a little quicker, particularly if between 1 and 8.
Multiple costs
15 time units, but can be slower if either operand is outside the range -32768 to 32767. Many cases of multiplication by a constant are converted to faster combinations of shifts and adds.
Divide costs
15 time units, but can be substantially slower if either operand is outside the signed 16-bit range. Division by a power of two constant is optimized shift and some checking, take
5 time units - but an explicit shift is faster because no sign checking is involved.
Shifts cost about
2 -
4 time units. The time depends on the shift amount - the more bits shifted the longer the time. Constant shifts are a fraction faster.
Strings
String constants are extremely cheap to use. An assignment of a constant string to a local variable costs
1 time unit. Further,if two string constants in two different classes have the same value then Jump will have only one instance. There is no need to put all the string constants into one class as static final String to get the benefit of shared strings. Of course manipulation of string constants such as
substring will take just as long as for any dynamically created string.
Floating Point
There is very little in the way of optimization when using floating point operation, both
float and
double. The reason is that almost all of the time is spent in routines performing the low-level operations. None of the Palm OS devices have hardware support for floating point operations, everything is done in software, even on ARM and XScale based devices.
float add, subtract and multiple cost
60 time units. Divide costs
90.
double add and subtract costs
80 time units, while multiply costs
100 and divide
150.
Square root, trig, log, exp and pow methods take between
1300 and
6000 time units. But
Math.abs(double) takes only
9.
On Palm OS 5.x devices
float and
double arithmetic is about five times faster than the figures given above because the floating point support routines use native ARM code.
It is clear that floating point operations are substantially slower than integer ones.
Memory Usage
Each object has four bytes of overhead. Fields require 1, 2, 4 or eight bytes of memory each. Booleans are one byte each. Fields that are 2, 4 or 8 bytes long must be positioned on an even boundary, so there may be a pad byte between a boolean (say) and a following int. The total object size is then rounded up to a multiple of eight bytes. However, if a field is never accessed from the methods that are actually used then the field will be pruned from the class. So, if there are debug methods and fields, and the debug methods are never called then they and the unused fields with not have any impact on the application at all.
Dynamic Class Initialization Mode
Comparison Of The SuperWabaJump Toolkit and SuperWaba
The results below are for SpeedTest run on a Sony Clié N770C Palm OS 4.1 device using The SuperWabaJump Toolkit 4.01 Beta 4 with Jump 2.1.8 and the SuperWaba VM 4.02. Note that the empty loop time was subtracted from all tests that report in μs. So for example one loop of the integer add test took 1μs+1μs with Jump but 9μs+38μs for SuperWaba.
| Test | Jump 2.1.8 | SW 4.02 |
| Empty loop | 1 μs | 38 μs |
| integer |
| Add | 1 μs | 9 μs |
| Subtract | 1 μs | 9 μs |
| Multiply | 14 μs | 20 μs |
| Divide | 14 μs | 23 μs |
| Shift | 3 μs | 12 μs |
| long to int | 3 μs | 10 μs |
| float to int | 44 μs | 179 μs |
| double to int | 51 μs | 54 μs |
| int to byte | 1 μs | 6 μs |
| long |
| Add | 8 μs | 38 μs |
| Subtract | 8 μs | 38 μs |
| Multiply | 38 μs | 59 μs |
| Divide | 267 μs | 808 μs |
| Shift | 13 μs | 46 μs |
| int to long | 2 μs | 16 μs |
| float to long | 50 μs | 46 μs |
| double to long | 56 μs | 66 μs |
| float |
| Add | 53 μs | 48 μs |
| Subtract | 60 μs | 55 μs |
| Multiply | 60 μs | 55 μs |
| Divide | 82 μs | 106 μs |
| int to float | 50 μs | 40 μs |
| long to float | 55 μs | 58 μs |
| double to float | 46 μs | 49 μs |
| double |
| Add | 72 μs | 82 μs |
| Subtract | 78 μs | 88 μs |
| Multiply | 100 μs | 110 μs |
| Divide | 143 μs | 186 μs |
| int to double | 57 μs | 52 μs |
| long to double | 61 μs | 70 μs |
| float to double | 42 μs | 48 μs |
| transendental |
| Double abs | 9 μs | 582 μs |
| Double square root | 1309 μs | 2582 μs |
| Double sin | 2589 μs | 6722 μs |
| Double log | 2964 μs | 4632 μs |
| Double pow | 5849 μs | 10902 μs |
| basic |
| Time stamp | 340 ms | 3200 ms |
| Empty loop | 130 ms | 3120 ms |
| Simple arithmetic | 900 ms | 10870 ms |
| Hanoi 17 | 2890 ms | 29940 ms |
| Tak 5 | 2950 ms | 35180 ms |
| Virtual calls | 3240 ms | 33800 ms |
| Float arithmetic | 1660 ms | 2210 ms |
| Double arithmetic | 2410 ms | 3500 ms |
| Transendentals | 7800 ms | 19180 ms |
| int getter method | 1 μs | 100 μs |
| boolean getter method | 1 μs | 100 μs |
| static int getter method | 1 μs | 82 μs |
| static boolean getter method | 1 μs | 82 μs |
| testing static boolean | 1 μs | 84 μs |
| byte [] | 2 μs | 14 μs |
| short [] | 2 μs | 14 μs |
| int [] | 2 μs | 14 μs |
| long [] | 3 μs | 19 μs |
| float [] | 2 μs | 14 μs |
| double [] | 3 μs | 19 μs |
| compression |
| Compress string | 8049 μs | 59622 μs |
| Decompress string | 4139 μs | 30272 μs |
| wabamark |
| Sieve | 7140 ms | 39390 ms |
| Whetstone | 10710 ms | 25290 ms |
| Linpack | 10760 ms | 17350 ms |
| exceptions |
| throw | 32 μs | 60 μs |
| deep throw | 58 μs | 422 μs |
| garbage |
| Garbage | 42269 μs | 56192 μs |
| Allocate tree | 11220 ms | 34680 ms |
| Create list 1000 | 80 ms | 180 ms |
| GC with list | 100 ms | 270 ms |
| Create list 5000 | 440 ms | 920 ms |
| GC with list | 340 ms | 580 ms |
| finalize† | 100 ms | 320 ms |
| forName | 779 μs | 2802 μs |
| newInstance | 111 μs | 648 μs |
| String |
| valueOf(int) | 680 μs | 473 μs |
| valueOf(long) | 3619 μs | 11892 μs |
| valueOf(float) | 2969 μs | 1922 μs¶ |
| valueOf(double) | 4534 μs | 13852 μs |
| valueOf(char) | 275 μs | 264 μs |
| hashCode | 96 μs | 253 μs |
| Concatenate | 1365 μs | 2039 μs |
| length | 1 μs | 97 μs |
| charAt | 15 μs | 109 μs |
| substring | 118 μs | 965 μs |
| equals | 61 μs | 186 μs |
| compareTo | 27 μs | 190 μs |
| startsWith | 47 μs | 998 μs |
| endsWith | 50 μs | 1041 μs |
| indexOf(int) | 90 μs | 164 μs |
| indexOf(String) | 126 μs | 303 μs |
| lastIndexOf(int) | 178 μs | 1751 μs |
| toUpperCase | 1029 μs | 8282 μs |
| intern | 264 μs | 807 μs |
† both Jump and SW failed this test. However, the test is slightly faulty in that Jump's behavior is valid. SuperWaba failed because it does not support finalizers.
¶ SW failed this test. The result string was incorrect.
--
PeterDickerson? - 30 Jan 2004