Learning ClojureSeptember 3, 2010 at 9:19 pm | Posted in Clojure, Programming | 2 Comments
Recently I rediscovered an old love of mine. No, this didn’t happen at a high-school reunion. As part of my New Year’s resolution I had planned to learn two new programming languages in 2010. And somehow Clojure caught my eye. The language makes a lot of sense to me since I have always been (hence the old love) fond of functional programming languages. This started back in ’90 when I learned Emacs and soon discovered that I could do wonderful things using elisp. Like generating loads of boilerplate C++ code. Next I was working on a drawing and display application for chemical plants. For performance reasons the core was written in C, but I needed an extension language for quick proof of concepts. Someone on Usenet recommended Scheme to me. This worked great for that particular application. Functional programming had a major impact on my programming style, even when doing mostly Java, C, C++ and C# later. Of course C# nowadays has a lot of nice features (Lambda expressions, LINQ, etc.) that makes this easier.
Now to really learn a new language one should spent time implementing something not trivial. “Hello world” just won’t do. Unfortunately I don’t have the time to work on such a project. I did some experiments with Clojure on my Android phone, but soon found out that at it’s current incarnation it is too slow. This is caused by shortcomings on the Dalvik platform and might be improved in the (hopefully) near future.
In the meantime I try to learn the language by explaining it to others and toying with well defined little problems. One of the problems I run into was on StackOverflow: this question is about how to select n random values from a given range [begin, end], with the additional restriction that all the values should be unique. For example if I select 5 random values from the range is between 1 and 13 ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) I could end up with 4, 9, 2, 12, 10.
You can do this (since Clojure 1.2) in a very simple one-liner, but let me show my first implementation here:
(defn my-helper [n coll selected] (if (zero? n) selected (let [val (rand-nth coll) coll1 (remove (fn [item] (= item val)) coll) sel1 (cons val selected)] (my-helper (dec n) coll1 sel1)))) (defn n-select [n coll] (my-helper n coll nil)) (println (n-select 5 (range 1 15)))
Before I go through the code let me explain the algorithm first: I first create a sequence with a given range together with a new and initially empty sequence. Next I select random entries from the generated range, remove them from it, and add these entries to the new sequence. In short, I randomly move items from one sequence to the other, thus guaranteeing that the values in the new sequence are unique.
Since the algorithm is recursive (Clojure doesn’t have regular loops!) I have to test for an end condition. This is done in line 2, which test whether the number of remaining items to select (n) is already zero. If so, I return the new selection in line 3. If not, we continue with line 4 in which I initialize 3 variables: first val which is a random value from the collection coll. The new collection coll1 is the old collection with this value removed. And finally in line 6 we add this value which results in a new collection sel1. Line 7 calls the function again with these new values. This code is pretty easy to understand for anyone who ever implemented a recursive version of a factorial calculation in any language.
This first implementation can be somewhat shortened by using a shortcut for anonymous functions:
(defn my-helper [n coll selected] (if (zero? n) selected (let [val (rand-nth coll) coll1 (remove #(= % val) coll) sel1 (cons val selected)] (my-helper (dec n) coll1 sel1))))
On line 5 I have introduced #(= % val) The % character is the first parameter. As you can see there is no need anymore for the parameter block. The # is a shortcut for an anonymous function.
Since the assignments to col1 and sel1 are now quite short, let’s just put the code directly in the iterative function call. The other improvement is to use the recur command instead of this function call, since the function has tail-recursion. This avoids a stack overflow when using large values of n:
(defn my-helper [n coll selected] (if (zero? n) selected (let [val (rand-nth coll)] (recur (dec n) (remove #(= % val) coll) (cons val selected)))))
Not bad, but I still have a helper function. Let’s remove that one as well:
(defn n-select [n coll] (loop [n n coll coll selected nil] (if (zero? n) selected (let [val (rand-nth coll)] (recur (dec n) (remove #(= % val) coll) (cons val selected))))))
Instead of initializing the values in the n-select function, we have now initialized them directly in the loop form. In Clojure this form establishes a recursion point at the top of the loop
And finally, a simple one-liner that works since Clojure 1.2:
(defn n-select [n coll] (take n (shuffle coll)))
All the n-select implementations that I have shown here don’t generate a lazy sequence. This makes sense, since we select random entries from a sequence. The most important lesson to take away from this last implementation is that you always have to look at the functionality that sequences (and other data structures) already offer in Clojure. A lot is already available and many operations on sequences are lazy, which means that the sequence is only realized when you request data from it.
Please don’t hesitate to give feedback on this blog and code since I’m still learning every day about Clojure.