This summer I’ve been an intern at Factual, and this is an experience report from the semiannual internal hackathon where Alan ‘amalloy’ Malloy and I experimented with using Alexander Yakushev’s Skummet fork of Clojure to emit lean(er) bytecode.
One of Clojure’s primary use cases is as a more palatable tool with which to interact with the rich Java ecosystem and existing Java libraries. Because of its facilities for such inter-operation, Clojure is even sometimes used to write performance sensitive code which would otherwise be written in Java. However there are limitations to the success with which this may be done.
While JVM byte code is statically typed, Clojure is a dynamically checked language which makes pervasive use of the Object type to delay type checking. To this end, Clojure will use JVM Object reflection to resolve instance fields and methods when performing interoperation. While correct for unknown types, because reflective access is slow compared to direct access for known types, it has long been possible to write type hints which advise Clojure about the runtime JVM type of a value and enable Clojure to use direct access and direct method invocation rather than reflective access.
However these hints are not types in the sense of a static type being a contract on the domain of values; they are merely hints for reflection elimination and place no contract on the domain of a hinted value.
This hinting behavior for reflection elimination comes at the cost of emitting
checkcast instructions. As the JVM is statically typed, one cannot simply swear that a value is of a type, a checking cast must be used. Clojure, when emitting non-reflective method calls and field accesses, does not statically know (and makes no attempt to prove) that the value or expression in play is in fact of the type which you may have tagged it. All local variables and function parameters which are not JVM primitives are stored as
Objects and so must be
checkcasted to the desired type on every use.
So are we stuck trading slow reflective access for
checkcast instructions (which are faster to be sure but cause method bloat when doing lots of interop on previously checked values)? Of course not! While Clojure does not currently have support for real contractual types, we can sure add it!
Now, clearly since Clojure does not currently have strict local types, we can’t just make tags strict. TEMJVM actually makes that mistake, and as a result cannot compile
clojure/core because among others,
clojure.core/ns-publics makes use of a type hint which while safe for nonstrict type tags is not correct in the context of strict tags. This has to be an additive, opt-in change.
So, what Alan and I did was create new special
fn metadata flag
^:strict. If a
fn body being compiled has the
^:strict tag, then and only then are are type tags treated as a contract rather than being advisory. This is a strictly additive change because stock Clojure will ignore the metadata and emit less efficient but still correct code.
So as an example, let’s consider the following
Eliding a bunch of implementation details for brevity, this
fn compiles on stock Clojure 1.7 to the following JVM bytecodes:
public final long invokePrim(java.lang.Object xs); 0 aload_1 [xs] 1 aconst_null 2 astore_1 [xs] 3 checkcast java.lang.Iterable  6 invokeinterface java.lang.Iterable.iterator() : java.util.Iterator  [nargs: 1] 11 astore_2 [iter] 12 lconst_0 13 nop 14 lstore_3 [tot] 15 aload_2 [iter] 16 checkcast java.util.Iterator  19 invokeinterface java.util.Iterator.hasNext() : boolean  [nargs: 1] 24 ifeq 51 27 lload_3 [tot] 28 aload_2 [iter] 29 checkcast java.util.Iterator  32 invokeinterface java.util.Iterator.next() : java.lang.Object  [nargs: 1] 37 invokestatic clojure.lang.RT.longCast(java.lang.Object) : long  40 invokestatic clojure.lang.Numbers.add(long, long) : long  43 lstore_3 [tot] 44 goto 15 47 goto 52 50 pop 51 lload_3 52 lreturn Local variable table: [pc: 15, pc: 52] local: tot index: 3 type: long [pc: 12, pc: 52] local: iter index: 2 type: java.lang.Object [pc: 0, pc: 52] local: this index: 0 type: java.lang.Object [pc: 0, pc: 52] local: xs index: 1 type: java.lang.Object // Method descriptor #77 (Ljava/lang/Object;)Ljava/lang/Object; // Stack: 5, Locals: 2 public java.lang.Object invoke(java.lang.Object arg0); 0 aload_0 [this] 1 aload_1 [arg0] 2 invokeinterface clojure.lang.IFn$OL.invokePrim(java.lang.Object) : long  [nargs: 2] 7 new java.lang.Long  10 dup_x2 11 dup_x2 12 pop 13 invokespecial java.lang.Long(long)  16 areturn
So here we have two methods. The first one, the
invokePrim, takes an
Object and returns a primitive
long since we long hinted our function. The
invoke method is a wrapper around the
invokePrim method which provides for “boxing” (wrapping in an
Object) the primitive result of calling
invokePrim. This allows our
fn to be used by code which wants and can use a
long, and code which doesn’t know/care and just wants an
Object back like a normal
fn would return.
So lets dig into the
xsoff the arguments stack. It's just typed
Objectbecause that's the parameter type.
nilto the local named
xs, thus clearing it. Note that in the locals table,
xshas the type
Object. This means that when we get
xsfrom the local, we have to
checkcastit again because we've lost type information by storing and loading it.
xswe loaded to
Iterablesince we don't really know what it is.
.iteratormethod to get an
Iteratorfrom our now guaranteed
iterlocal (discarding type information as above).
0from the class constant pool.
totlocal (primitive typed).
checkcastbecause in storing it we forgot that it's an
invokeinterfaceto see if there are elements left, producing a primitive
booleangoing to 21 (in this list) if
totfrom the local.
iterfrom the local.
invokeinterfaceto get the next value from the iterator, producing an
invokestaticto call the static
clojure.lang.RTmethod for converting
Objects to primitive
invokestaticto add the two primitive
longs on the stack.
So with the exception of the first
checkcast to make sure that the
Object we got as an argument that should be
Iterable is in fact an instance of
load are all provably uncalled for. The static types of these values is known because their Java signatures are known, and the only reason that we have to emit all these checks is that the Compiler throws that information away by storing these values in untyped (
Every Expr in
clojure.lang.Compiler already knows (or can state) its type either as tagged or inferred, and whether it has such a tag. However, these stated Java classes are lies! A function invocation (
IFn.invoke call site) is statically typed to return
Object (unless it’s a primitive call site, but we know that as well) no matter what the tag on the
IFn being invoked may say. For example
clojure.core/str is tagged
^String and does indeed return a
String, however after invoking the appropriate
IFn the JVM doesn’t know that there’s a
String on the stack because the
IFn interface discards that type information. It just knows it has an
Object. The fix is that we add an
Expr.needsCast method and implement it for every instance of
Expr in Compiler.java. So now when in strict mode, we know that unless
true, the value on the stack after
Expr.emit absolutely is of type
Expr.getJavaClass. Otherwise we cannot avoid the
We also have to change the behavior of locals so that we can emit locals with types other than
Object. By typing locals, we preserve their type information as tagged or inferred across loads and stores. This allows the
Expr representing a local use to report that it only needs a cast when the usage of the local doesn’t have the same tag as the type of the binding and we cannot statically show no cast is required.
With these changes, our modified Compiler.java can indeed produce and use strictly typed locals. So lets add our annotation…
And generate bytecode on our modified version of Skummet 1.7-RC1-r4 (again abbreviated).
public final long invokePrim(java.lang.Object); Code: 0: aload_1 1: aconst_null 2: astore_1 3: checkcast #30 // class java/lang/Iterable 6: invokeinterface #34, 1 // InterfaceMethod java/lang/Iterable.iterator:()Ljava/util/Iterator; 11: astore_2 12: lconst_0 13: nop 14: lstore_3 15: aload_2 16: invokeinterface #40, 1 // InterfaceMethod java/util/Iterator.hasNext:()Z 21: ifeq 43 24: lload_3 25: aload_2 26: invokeinterface #44, 1 // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object; 31: invokestatic #49 // Method clojure/lang/RT.longCast:(Ljava/lang/Object;)J 34: ladd 35: lstore_3 36: goto 15 39: goto 44 42: pop 43: lload_3 44: lreturn LocalVariableTable: Start Length Slot Name Signature 15 29 3 tot J 12 32 2 iter Ljava/util/Iterator; 0 44 0 this Ljava/lang/Object; 0 44 1 xs Ljava/lang/Object; public java.lang.Object invoke(java.lang.Object); Code: 0: aload_0 1: aload_1 2: invokeinterface #59, 2 // InterfaceMethod clojure/lang/IFn$OL.invokePrim:(Ljava/lang/Object;)J 7: new #13 // class java/lang/Long 10: dup_x2 11: dup_x2 12: pop 13: invokespecial #62 // Method java/lang/Long."":(J)V 16: areturn
The win compared to the original bytecode should be obvious. Sure enough in the
invokeStatic method we only emit the one
checkcast we absolutely have to have because the
xs argument could really be anything. The
iter locals are both statically typed, and so we can just load them and invoke the appropriate interfaces directly.
In some simple benchmarks, this optimization on this
fn translates to a 5%-10% performance improvement which isn’t too impressive. However other
clojure.core/str in our testing were able to get up to 20% performance improvements from strict locals.
This is the product of a two day hack. While Alan and I have been able to get it to work and emit working code, honestly this isn’t something we’re comfortable taking to production yet. Some clear wins such as being able to emit typed
fn arguments by popping arguments, checking them and then putting them in typed local bindings for use and being able to take advantage of types on closed over locals remain on the table.
What Didn’t (Seem to) Work
While Alan debugged Compiler.java and picked off some types related wins we hadn’t gotten yet, I worked on adding more inlining behavior to
clojure.core. Much of
core, especially the lexically early
fns are just thin wrappers around interop on
clojure.lang.RT, which does have reasonable interface types on most of its methods.
The hope was that with the typed locals work, preserving more type information across calls to the Clojure standard library and inlining the Clojure standard library where possible to interop calls with clearly inferable types, and we would be able to produce demonstrably faster code.
While in theory this should at least break even and probably even be a win, we haven’t managed to benchmark it well and show a clear win from the aggressive inlining work. In fact, the really interesting case of possibly aggressive inlining being an
into which is able to use a typed, static
Transient loop is impossible because
into is implemented in terms of
reduce, which takes the reducing
fn as a value and then dispatches via
clojure.lang.IReduce in order to get fast iteration over chunked seq. However, we can’t statically inline a call site through being taken as a value so that’s the end of the line for that idea.
After about a week of polishing, bugfixing and squashing, we’re happy to say that this behavior has been upstreamed into Skummet. While we expect that the patches we have developed will never be included into Clojure as-is if only due to high impact, we hope to see this behavior or something like it enter the mainline in a future version of Clojure.
- Reid McKenzie, Software Engineering Intern