r4 - 01 Feb 2004 - 09:39:16 - PeterDickerson?You are here: SuperWaba >  Codev Web  >  PortingGeneral > SuperWabaJumpPerformanceGuide

The SuperWabaJump Toolkit Performance Guide

Introduction

In this guide I will try to assign some numbers to the relative cost of different elements of a SuperWabaJump application. These numbers are very rough because they will depend on the Palm OS device used. A simple integer add will be assigned the cost of 1 time unit. A slower operation might be assigned a cost of 3 units, say. Some operations vary significantly in speed from one use to another. I will try to be representation. Also, some operations are disproportionately faster on a Palm OS 5.x based device using and ARM or XScale processor. I will try to note such differences.

SpeedTest

There is an example application supplied with the Jump distribution called SpeedTest. This application can be built with WabaJump, SuperWaba or SuperWabaJump. The SuperWabaJump build of this application is part of the SuperWabaJump toolkit distribution. Results from running this application and examination of the generated code were used to compile this guide.

General Issues

If there is one rule to follow it is to use local variables and parameters where possible. Accessing static or instance fields is never faster than local variables. In the very best case a static or instance field is as fast as local variable is in the worst case. The cost of accessing a local variable is often 0 because it can be combined with other operations, at worst it will be 1 time unit.

Jump Specific Issues

NullPointerException

In principle, whenever field or method of an object is accessed there is the possibility that the object is null and that a NullPointerException will be thrown. This means that Jump must generate code to check the object for null, and if required throw the exception. All these checks for null-pointers can have a significant impact on the performance of the application even though the test itself costs only 3 time units.

Jump tries to get round this problem in several ways. In the simplest, brute force, method Jump provides a compiler option ( -n ) that turns off all null-pointer checking. Fast, effective, but not safe. Jump also tracks local variables and parameters that are objects, remembering which ones have been checked for null already. If the variable is known to be non-null via all routes to the current usage point then Jump will know that a test is not required. This analysis removes approximately 75%-80% of all null-pointer checks. So try to avoid null-pointer checks in loops by using an object before the loop.

The implicit this parameter of intance methods is guaranteed not to be null. So null-pointer checks are never need to access instance fields or methods via this. The cost of accessing byte, char, short, int or object fields in 1 - 2 time units, while double and long fields require 2 - 3 units. Accessing instance fields and methods via a local variable will also take this time plus the time for a null-pointer check, if necessary. The actual time varies somewhat depending on the context, particularly if the object has been used recently.

example:

  void myMethod(AnotherObject obj)
  {
    this.count++;            // no null-pointer check because this can't be null.
    obj.field = 1;           // must check for obj==null.
    obj.anotherField = 'x';  // obj has beened checked already so it can't be null here.
    obj.data[1] = '?';       // obj is OK, but the obj.data array must be checked for null.
    obj.data[2] = '!';       // Jump can't tract fields for null, obj.data is checked again.
  }

example:

  void myMethod(AnotherObject obj)
  {
    if ( ok )
    {
      obj.field = 1;           // must check for obj==null.
      obj.anotherField = 'x';  // obj has beened checked already so it can't be null here.
    }
    obj.yetAnother = '?';      // obj might not have been tested for null so check here.
  }

Note that arrays are objects too. The null-pointer checks apply equally to them.

IndexOutOfBoundsException

Every time an array is subscripted Jump must generate code to check that the subscript is within bounds. Jump does not have any way to infer that a subscripting operation is in bounds, even if the subscript is constant. The cost of this check is 4 time units. However, Jump has a command line option ( -a ) that turns of all subscript bounds checking. For applications that do substantial array handling turning of array bounds checking can provide a significant boost in speed. The cost of subscripting a byte, char, short, int or object array is 2 time units, while double and long arrays take 3. As you can see, the bounds check can take longer than the wor it is checking.

Simple Integer Expressions

In general there is no advantage to declaring local variables a smaller than integer because all internal operations must be performed at integer precision. This can lead to extra work to truncate the results or sign-extend the operands. There is some benefit to fields that are smaller because they use less memory.

Jump can optimize some expressions involving constants. However it is not as clever as it could be. For example a = b+1 is better than a = 1+b. In general Jump recognizes special case involving constants only when the constant is the right-hand operand of a sub-expression.

Add and subtract cost 1 time unit. Adding or subtracting a constant can be a little quicker, particularly if between 1 and 8.

Multiple costs 15 time units, but can be slower if either operand is outside the range -32768 to 32767. Many cases of multiplication by a constant are converted to faster combinations of shifts and adds.

Divide costs 15 time units, but can be substantially slower if either operand is outside the signed 16-bit range. Division by a power of two constant is optimized shift and some checking, take 5 time units - but an explicit shift is faster because no sign checking is involved.

Shifts cost about 2 - 4 time units. The time depends on the shift amount - the more bits shifted the longer the time. Constant shifts are a fraction faster.

Strings

String constants are extremely cheap to use. An assignment of a constant string to a local variable costs 1 time unit. Further,if two string constants in two different classes have the same value then Jump will have only one instance. There is no need to put all the string constants into one class as static final String to get the benefit of shared strings. Of course manipulation of string constants such as substring will take just as long as for any dynamically created string.

Floating Point

There is very little in the way of optimization when using floating point operation, both float and double. The reason is that almost all of the time is spent in routines performing the low-level operations. None of the Palm OS devices have hardware support for floating point operations, everything is done in software, even on ARM and XScale based devices.

float add, subtract and multiple cost 60 time units. Divide costs 90.

double add and subtract costs 80 time units, while multiply costs 100 and divide 150.

Square root, trig, log, exp and pow methods take between 1300 and 6000 time units. But Math.abs(double) takes only 9.

On Palm OS 5.x devices float and double arithmetic is about five times faster than the figures given above because the floating point support routines use native ARM code.

It is clear that floating point operations are substantially slower than integer ones.

Memory Usage

Each object has four bytes of overhead. Fields require 1, 2, 4 or eight bytes of memory each. Booleans are one byte each. Fields that are 2, 4 or 8 bytes long must be positioned on an even boundary, so there may be a pad byte between a boolean (say) and a following int. The total object size is then rounded up to a multiple of eight bytes. However, if a field is never accessed from the methods that are actually used then the field will be pruned from the class. So, if there are debug methods and fields, and the debug methods are never called then they and the unused fields with not have any impact on the application at all.

Dynamic Class Initialization Mode

Comparison Of The SuperWabaJump Toolkit and SuperWaba

The results below are for SpeedTest run on a Sony Clié N770C Palm OS 4.1 device using The SuperWabaJump Toolkit 4.01 Beta 4 with Jump 2.1.8 and the SuperWaba VM 4.02. Note that the empty loop time was subtracted from all tests that report in μs. So for example one loop of the integer add test took 1μs+1μs with Jump but 9μs+38μs for SuperWaba.

Test Jump 2.1.8 SW 4.02
Empty loop 1 μs 38 μs
integer
Add 1 μs 9 μs
Subtract 1 μs 9 μs
Multiply 14 μs 20 μs
Divide 14 μs 23 μs
Shift 3 μs 12 μs
long to int 3 μs 10 μs
float to int 44 μs 179 μs
double to int 51 μs 54 μs
int to byte 1 μs 6 μs
long
Add 8 μs 38 μs
Subtract 8 μs 38 μs
Multiply 38 μs 59 μs
Divide 267 μs 808 μs
Shift 13 μs 46 μs
int to long 2 μs 16 μs
float to long 50 μs 46 μs
double to long 56 μs 66 μs
float
Add 53 μs 48 μs
Subtract 60 μs 55 μs
Multiply 60 μs 55 μs
Divide 82 μs 106 μs
int to float 50 μs 40 μs
long to float 55 μs 58 μs
double to float 46 μs 49 μs
double
Add 72 μs 82 μs
Subtract 78 μs 88 μs
Multiply 100 μs 110 μs
Divide 143 μs 186 μs
int to double 57 μs 52 μs
long to double 61 μs 70 μs
float to double 42 μs 48 μs
transendental
Double abs 9 μs 582 μs
Double square root 1309 μs 2582 μs
Double sin 2589 μs 6722 μs
Double log 2964 μs 4632 μs
Double pow 5849 μs 10902 μs
basic
Time stamp 340 ms 3200 ms
Empty loop 130 ms 3120 ms
Simple arithmetic 900 ms 10870 ms
Hanoi 17 2890 ms 29940 ms
Tak 5 2950 ms 35180 ms
Virtual calls 3240 ms 33800 ms
Float arithmetic 1660 ms 2210 ms
Double arithmetic 2410 ms 3500 ms
Transendentals 7800 ms 19180 ms
int getter method 1 μs 100 μs
boolean getter method 1 μs 100 μs
static int getter method 1 μs 82 μs
static boolean getter method 1 μs 82 μs
testing static boolean 1 μs 84 μs
byte [] 2 μs 14 μs
short [] 2 μs 14 μs
int [] 2 μs 14 μs
long [] 3 μs 19 μs
float [] 2 μs 14 μs
double [] 3 μs 19 μs
compression
Compress string 8049 μs 59622 μs
Decompress string 4139 μs 30272 μs
wabamark
Sieve 7140 ms 39390 ms
Whetstone 10710 ms 25290 ms
Linpack 10760 ms 17350 ms
exceptions
throw 32 μs 60 μs
deep throw 58 μs 422 μs
garbage
Garbage 42269 μs 56192 μs
Allocate tree 11220 ms 34680 ms
Create list 1000 80 ms 180 ms
GC with list 100 ms 270 ms
Create list 5000 440 ms 920 ms
GC with list 340 ms 580 ms
finalize† 100 ms 320 ms
forName 779 μs 2802 μs
newInstance 111 μs 648 μs
String
valueOf(int) 680 μs 473 μs
valueOf(long) 3619 μs 11892 μs
valueOf(float) 2969 μs 1922 μs¶
valueOf(double) 4534 μs 13852 μs
valueOf(char) 275 μs 264 μs
hashCode 96 μs 253 μs
Concatenate 1365 μs 2039 μs
length 1 μs 97 μs
charAt 15 μs 109 μs
substring 118 μs 965 μs
equals 61 μs 186 μs
compareTo 27 μs 190 μs
startsWith 47 μs 998 μs
endsWith 50 μs 1041 μs
indexOf(int) 90 μs 164 μs
indexOf(String) 126 μs 303 μs
lastIndexOf(int) 178 μs 1751 μs
toUpperCase 1029 μs 8282 μs
intern 264 μs 807 μs
† both Jump and SW failed this test. However, the test is slightly faulty in that Jump's behavior is valid. SuperWaba failed because it does not support finalizers.
¶ SW failed this test. The result string was incorrect.

-- PeterDickerson? - 30 Jan 2004

Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r4 < r3 < r2 < r1 | More topic actions
 
SuperWaba home
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding SuperWaba? Send feedback