Applicable symbols
Here are my notes on being puzzled by some Clojure code and diving into the implementation to figure it out. Although I figured it out the hard way, the exploration turned out to be interesting for me.
As I was reading Geoff Shannon’s blog post, I was thinking, “Oh, maybe this will turn out to be a mixup between compile time and run time”. (At least, in my experience writing macros, it can be challenging to keep that straight.)
Instead, what confused me was line 103 in the last example (here I’ve changed the symbol from '+ to 'x):
```clojure
(apply 'x [1 2]) ;=> 2
```
Huh? I don’t understand. Why does this return 2 instead of raising an error?
First, let’s factor out apply and confirm the same result for a direct application:
```clojure
('x 1 2) ;=> 2
```
Same result. Same mystery.
Let’s explore some examples:
```clojure
('x)       ;=> ArityException Wrong number of args (0) passed to: Symbol  clojure.lang.AFn.throwArity (AFn.java:429)
('x 1)     ;=> nil ???
('x 1 2)   ;=> 2 ???
('x 1 2 3) ;=> ArityException Wrong number of args (3) passed to: Symbol  clojure.lang.AFn.throwArity (AFn.java:429)
(type 'x)  ;=> clojure.lang.Symbol
```
There’s another mystery: the arity 1 case of ('x 1) returning nil.
To summarize so far: It appears that it is not an error to use a symbol in the function position of an application if the arity is 1 or 2.
I’m aware that Clojure collections are applicable. And I’m aware that Clojure tends toward nil punning. So the arity 1 case might be less surprising… except that neither 'x nor 1 is a collection, so this seems like it should raise an exception, not evaluate to nil. And I don’t understand the arity 2 case.[1]
What to do? I could ask for help. But let’s take this as an opportunity to go spelunking in some Clojure implementation code I’ve never seen before.
From looking at the source for clojure.lang.AFn, it seems that all of its invoke members call throwArity. It’s been some years since I did C++, and I’m not up to speed on Java. But if I understand correctly, AFn is a sort of default base class whose every invoke raises an arity error. Applicable things (functions, keywords, collections) probably derive from AFn and override the invoke members they support to do something other than error. Somehow such a class exists for the symbol 'x, and I’m guessing that it overrides the arity 1 and 2 variants of invoke.
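To make that guess concrete, here is a minimal Java sketch of the pattern. It assumes nothing about Clojure's real classes: FnBase and SymbolFn are hypothetical names, every arity of FnBase throws, and SymbolFn overrides only arities 1 and 2. For brevity it only searches java.util.Map, using the symbol's name as the key.

```java
import java.util.Map;

// Hypothetical stand-in for clojure.lang.AFn: every arity throws by default.
abstract class FnBase {
    public Object invoke() { throw arityError(0); }
    public Object invoke(Object a) { throw arityError(1); }
    public Object invoke(Object a, Object b) { throw arityError(2); }
    public Object invoke(Object a, Object b, Object c) { throw arityError(3); }

    protected IllegalArgumentException arityError(int n) {
        return new IllegalArgumentException(
            "Wrong number of args (" + n + ") passed to: " + getClass().getSimpleName());
    }
}

// Hypothetical applicable symbol: overrides only arities 1 and 2, so
// arities 0 and 3 still fall through to FnBase's throwing defaults.
// Simplification: only java.util.Map is searched, keyed by the symbol's name.
final class SymbolFn extends FnBase {
    private final String name;

    SymbolFn(String name) { this.name = name; }

    @Override
    public Object invoke(Object coll) {
        // Look this symbol up *in* the argument, with no explicit default.
        return invoke(coll, null);
    }

    @Override
    public Object invoke(Object coll, Object notFound) {
        if (coll instanceof Map) {
            Map<?, ?> m = (Map<?, ?>) coll;
            return m.containsKey(name) ? m.get(name) : notFound;
        }
        // Anything that isn't a map: quietly hand back the default.
        return notFound;
    }
}
```

Under those assumptions, calling it with a non-map reproduces the puzzling REPL results above: one argument gives null, two give back the second argument, and zero or three arguments throw.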
That seems plausible. So, how to find this in the Clojure source? Instead of searching the code on GitHub, let’s git clone the source locally and use Emacs rgrep to search for invoke. I’m sure there are Java source navigation tools, but they’re probably not worth choosing, installing, and learning for something this straightforward. Yes, rgrep returns 1200+ matches, but flipping through them quickly, I see some obvious patterns: huge swaths that seem OK to ignore, and a few names that pop out.
For example, in APersistentMap.java I see arity 1 and 2 definitions of invoke:
```java
public Object invoke(Object arg1) {
    return valAt(arg1);
}

public Object invoke(Object arg1, Object notFound) {
    return valAt(arg1, notFound);
}
```
That sure seems like how ({:key val} :key) and ({:key val} :key default) would work. valAt is the member function that looks up a key in the map and returns its value.
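The delegation can be sketched in plain Java. MapFn is a hypothetical stand-in backed by java.util.HashMap, not Clojure's APersistentMap; its two invoke overloads simply forward to two valAt overloads, the second of which returns the caller's default when the key is absent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical applicable map: mirrors the two invoke overloads quoted
// above by forwarding them to valAt.
final class MapFn {
    private final Map<Object, Object> entries = new HashMap<>();

    MapFn put(Object key, Object value) {
        entries.put(key, value);
        return this;
    }

    // Lookup that yields null (Clojure's nil) when the key is absent.
    Object valAt(Object key) {
        return entries.get(key);
    }

    // Lookup that yields the caller's default when the key is absent.
    Object valAt(Object key, Object notFound) {
        return entries.containsKey(key) ? entries.get(key) : notFound;
    }

    // Models ({:key val} :key)
    public Object invoke(Object key) {
        return valAt(key);
    }

    // Models ({:key val} :key default)
    public Object invoke(Object key, Object notFound) {
        return valAt(key, notFound);
    }
}
```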
OK. Next let’s try to find a class that might correspond to a symbol like 'x and examine its invoke member(s).
And bingo, here’s a file called Symbol.java, also defining arity 1 and 2 variants of invoke:
```java
public Object invoke(Object obj) {
    return RT.get(obj, this);
}

public Object invoke(Object obj, Object notFound) {
    return RT.get(obj, this, notFound);
}
```
So this seems to explain why arities 1 and 2 are special. Notice the argument order, too: the symbol passes its argument as the collection and passes itself (this) as the key. Next question: what is RT.get? In C++ this would mean a static get member of an RT class, and looking in RT.java, that seems to be the case here. get comes in both arity 1 and 2 flavors (not counting the first argument, coll: get is a static member, so the collection that a method would receive as an implicit this is passed explicitly). First, here’s arity 1:
```java
static public Object get(Object coll, Object key){
    if(coll instanceof ILookup)
        return ((ILookup) coll).valAt(key);
    return getFrom(coll, key);
}
```
Which of the two paths here does ('x 1) take? Because Symbol.invoke passes its argument as coll and itself as key, coll is 1 and key is 'x. A Long like 1 isn’t an instance of ILookup, so what’s happening is the call to getFrom, which is defined in RT.java as:
```java
static Object getFrom(Object coll, Object key){
    if(coll == null)
        return null;
    else if(coll instanceof Map) {
        Map m = (Map) coll;
        return m.get(key);
    }
    else if(coll instanceof IPersistentSet) {
        IPersistentSet set = (IPersistentSet) coll;
        return set.get(key);
    }
    else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
        int n = ((Number) key).intValue();
        if(n >= 0 && n < count(coll))
            return nth(coll, n);
        return null;
    }

    return null;
}
```
All of the conditional branches fail here: 1 isn’t null, a Map, or an IPersistentSet, and 'x isn’t a Number. So getFrom falls through and returns Java null, which is Clojure nil. First mystery solved.
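Here is a simplified, hypothetical Java re-creation of that branch order, handy for poking at in isolation. It is not Clojure's code: java.util.Set stands in for IPersistentSet (returning the key itself as an approximation of set.get), the ILookup check in get is skipped, and a Java String plays the role of the symbol 'x.

```java
import java.lang.reflect.Array;
import java.util.Map;
import java.util.Set;

// Hypothetical re-creation of RT.getFrom's arity-1 branch order.
final class GetFromSketch {
    private GetFromSketch() {}

    static Object getFrom(Object coll, Object key) {
        if (coll == null) {
            return null;
        } else if (coll instanceof Map) {
            return ((Map<?, ?>) coll).get(key);
        } else if (coll instanceof Set) {
            // Approximation of set.get(key): return the key if it is present.
            return ((Set<?>) coll).contains(key) ? key : null;
        } else if (key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
            // Indexed lookup into a String or Java array, bounds-checked.
            int n = ((Number) key).intValue();
            int count = (coll instanceof String) ? ((String) coll).length() : Array.getLength(coll);
            if (n >= 0 && n < count) {
                return (coll instanceof String) ? (Object) ((String) coll).charAt(n) : Array.get(coll, n);
            }
            return null;
        }
        // Every test above failed, e.g. coll = 1 (a Long) and key = 'x.
        return null;
    }
}
```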
How about arity 2? This is the case where a “not-found” value is supplied:
```java
static public Object get(Object coll, Object key, Object notFound){
    if(coll instanceof ILookup)
        return ((ILookup) coll).valAt(key, notFound);
    return getFrom(coll, key, notFound);
}

static Object getFrom(Object coll, Object key, Object notFound){
    if(coll == null)
        return notFound;
    else if(coll instanceof Map) {
        Map m = (Map) coll;
        if(m.containsKey(key))
            return m.get(key);
        return notFound;
    }
    else if(coll instanceof IPersistentSet) {
        IPersistentSet set = (IPersistentSet) coll;
        if(set.contains(key))
            return set.get(key);
        return notFound;
    }
    else if(key instanceof Number && (coll instanceof String || coll.getClass().isArray())) {
        int n = ((Number) key).intValue();
        return n >= 0 && n < count(coll) ? nth(coll, n) : notFound;
    }
    return notFound;

}
```
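Aha. ('x 1 2) is being treated as, “Look up 'x in 1, and if not found, return 2”. Since 1 isn’t anything getFrom knows how to search, it falls through to the final return notFound.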
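A small sketch of the whole arity-2 path, under stated assumptions: LookupSketch is a hypothetical stand-in for ILookup, ArityTwoSketch is a made-up name, and its getFrom keeps only the null and java.util.Map branches. The point is the shape: ask the collection itself first, then fall back, and hand back notFound whenever nothing matched.

```java
import java.util.Map;

// Hypothetical stand-in for Clojure's ILookup interface.
interface LookupSketch {
    Object valAt(Object key, Object notFound);
}

final class ArityTwoSketch {
    private ArityTwoSketch() {}

    // Same shape as RT.get(coll, key, notFound): let a lookup-capable
    // collection answer for itself, otherwise fall back to getFrom.
    static Object get(Object coll, Object key, Object notFound) {
        if (coll instanceof LookupSketch) {
            return ((LookupSketch) coll).valAt(key, notFound);
        }
        return getFrom(coll, key, notFound);
    }

    // Simplified getFrom: only the null and java.util.Map branches are kept.
    static Object getFrom(Object coll, Object key, Object notFound) {
        if (coll == null) return notFound;
        if (coll instanceof Map) {
            Map<?, ?> m = (Map<?, ?>) coll;
            return m.containsKey(key) ? m.get(key) : notFound;
        }
        // A Long such as 1 lands here, so the caller's default comes back.
        return notFound;
    }
}
```

With a String standing in for the symbol, ArityTwoSketch.get(1L, "x", 2L) models ('x 1 2) and returns the default, 2.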
Editorializing
It makes sense to support a symbol in the function position of an application to enable doing this:
```clojure
('key {'key 42})    ;=> 42
('key {'key 42} 42) ;=> 42
('key {} 42)        ;=> 42
```
Of course. That’s awesome.
But when the second element is not a collection? It should error. Looking up a key in an integer doesn’t make any sense; it is the result of a mistake. Clojure should tell you it’s an error. Returning something here is as bad as silent coercions in JavaScript.
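For contrast, here is a hypothetical stricter lookup along the lines argued for here. It is not Clojure's behavior, and StrictLookup and strictGet are made-up names. It keeps nil punning for nil itself (a debatable design choice), searches maps and sets, and throws for anything else. String and array indexing are left out for brevity, so this version would reject those too.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical: a lookup that refuses to search non-collections.
final class StrictLookup {
    private StrictLookup() {}

    static Object strictGet(Object coll, Object key, Object notFound) {
        // Design choice: nil still puns to the default.
        if (coll == null) return notFound;
        if (coll instanceof Map) {
            Map<?, ?> m = (Map<?, ?>) coll;
            return m.containsKey(key) ? m.get(key) : notFound;
        }
        if (coll instanceof Set) {
            return ((Set<?>) coll).contains(key) ? key : notFound;
        }
        // Looking up a key in, say, an integer is almost certainly a mistake.
        throw new IllegalArgumentException(
            "Cannot look up " + key + " in non-collection " + coll.getClass().getName());
    }
}
```

Under this sketch, the ('key {'key 42}) style of use still works, while the ('x 1 2) case becomes an error instead of quietly returning 2.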
Postscript
For what it’s worth, here’s how this works in rackjure:
```racket
#lang rackjure

('key {'key 42})           ;=> 42
('key {'key 42} #:else 42) ;=> 42
('key {} #:else 42)        ;=> 42

('key 1)
; applicable-dict: No dict? supplied
;   in: ('key 1)
;   Context:
;    /tmp/foo.rkt:1:1 [running body]
```
One difference is that arity 3 means assoc. If you want to supply an explicit default, you use an #:else keyword argument.
The main difference is that a nonsense application results in an error.
[1] Spoiler: By the end of this, I remember there’s an optional not-found argument.