Clojure Mad Science: An Evil Threading Macro Experiment?

You are a mad scientist, and you’d like to perform an experiment on your mysterious test subjects. You want half of the subjects to be assigned to one of three experimental groups, and the other half of the subjects will serve as the control group.

The details of the experiments are not important for now, but suffice to say they will not be condoned by the American Psychological Association.

Got a solution already? Okay, okay, we’ll put this in more concrete, mathy terms, but everything after this sentence is really part of a solution, so don’t think you’re getting away with anything here, Doctor Scaryhausen.

Given a collection of length n, shuffle the elements randomly, grab the odd ones, and map those to values cycling among :x, :y, and :z. Don’t forget to include the even ones in your result map, mapped to :c (for the “control” group).

As a diligent student of Clojure’s sequence library, you might come up with a solution like this one:

    (defn group-assignments [subjects]
      (let [indexed-subjects (map-indexed vector (shuffle subjects))]
        (apply hash-map
          (concat
            (interleave
              (map last
                   (filter (fn [[i subject]] (odd? i))
                           indexed-subjects))
              (cycle [:x :y :z]))
            (interleave
              (map last
                   (filter (fn [[i subject]] (even? i))
                           indexed-subjects))
              (repeat :c))))))

This works just fine, and as expected, you’ll get a different result every time because of the shuffle:

    user=> (group-assignments ["frankenstein", "dracula", "the hulk", "wolfman", "spiderman",
                               "the mummy"])
    ;=> {"the mummy" :c, "wolfman" :y, "frankenstein" :z, "spiderman" :c, "dracula" :c,
    ;    "the hulk" :x}

However, it’s pretty hard to tell what’s going on here. The duplication is pretty gross: (interleave (map last (filter (fn [[i subject]], and that contributes to a feeling of just too many responsibilities for one function. It’s hard to read and hard to test, and that kind of thing makes the Wolfman, for one, pretty angry. Let’s clean that up:

    (defn subject-groups [pred indexed-subjects groups]
      (apply hash-map
        (interleave
          (map last (filter pred indexed-subjects))
          (cycle groups))))

    (defn assign-indexed-subjects [indexed-subjects]
      (let [odd-subject? #(odd? (first %))
            even-subject? (complement odd-subject?)]
        (into
          (subject-groups odd-subject? indexed-subjects [:x :y :z])
          (subject-groups even-subject? indexed-subjects [:c]))))

    (defn random-indexed-subjects [subjects]
      (map-indexed vector (shuffle subjects)))

    (defn group-assignments [subjects]
      (assign-indexed-subjects (random-indexed-subjects subjects)))

    user=> (group-assignments ["frankenstein", "dracula", "the hulk", "wolfman", "spiderman",
                               "the mummy"])
    ;=> {"the mummy" :c, "wolfman" :y, "frankenstein" :c, "spiderman" :z, "dracula" :x,
    ;    "the hulk" :c}

Much better. In the process of removing the duplication, we’ve refactored out a function subject-groups that has no knowledge of the algorithm we’re using to split up our subjects. That strategy gets injected as the function argument pred, for “predicate”. Why not spell the whole word out?

In short, because it’s a convention. In slightly less short, because if you’ve watched Uncle Bob Martin’s recent video on naming, you know that the convention itself is okay because the scope of that binding is pretty small.

The other good thing about splitting these responsibilities out is that subject-groups and assign-indexed-subjects, the most complex pieces of this solution, are now referentially transparent.

They are deterministic, so they can now be more easily understood, tested, and even memoized! It’s generally a good idea to partition side effects into small parts of a system in order to get these benefits.
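As a rough sketch of what that buys us, here is subject-groups called with a hand-written ordering instead of a shuffled one. The names fixed-order, odd-subject? (as a top-level function), and memoized-subject-groups are made up for this illustration; only subject-groups comes from the code above.

```clojure
;; subject-groups exactly as defined above.
(defn subject-groups [pred indexed-subjects groups]
  (apply hash-map
    (interleave
      (map last (filter pred indexed-subjects))
      (cycle groups))))

;; Hypothetical named predicate, so repeated calls pass the very same
;; function object (a fresh #(...) would be a different cache key each time).
(defn odd-subject? [[i _]] (odd? i))

;; Hypothetical hand-written ordering standing in for (shuffle subjects).
(def fixed-order
  [[0 "dracula"] [1 "wolfman"] [2 "the mummy"] [3 "the hulk"]])

;; Same arguments in, same map out, every time.
(= {"wolfman" :x, "the hulk" :y}
   (subject-groups odd-subject? fixed-order [:x :y :z]))
;=> true

;; memoize caches results per argument list, which is only safe
;; because the function is deterministic.
(def memoized-subject-groups (memoize subject-groups))

(= (memoized-subject-groups odd-subject? fixed-order [:x :y :z])
   (subject-groups odd-subject? fixed-order [:x :y :z]))
;=> true
```

The comparison uses = rather than a printed map because the print order of a hash map isn’t something to rely on.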

Now, because we have some time left before the sun sets and we’re able to actually perform our evil experiments, let’s look more critically at subject-groups.

    (defn subject-groups [pred indexed-subjects groups]
      (apply hash-map
        (interleave
          (map last (filter pred indexed-subjects))
          (cycle groups))))

For an experienced Lisper, this is relatively clear. Just four lines of implementation, not too much cleverness, standard Clojure sequence functions. But what if we represented this function in terms of the way the input subjects are transformed into a map of subjects pointing to groups?

If you’ve never seen Clojure’s threading macros (-> and ->>), then today’s your lucky day! The idea is that you have a series of transformations, represented by Clojure forms, inside the macro.

And that macro takes the result of one form and inserts it (in some way, depending on which macro) into the next form. As an example, the -> macro inserts into the second position (the first argument):

    user=> (-> 2       ; 2
               (* 2)   ; (* 2 2)
               (+ 10)  ; (+ (* 2 2) 10)
               (/ 2)   ; (/ (+ (* 2 2) 10) 2)
               (* 6))  ; (* (/ (+ (* 2 2) 10) 2) 6)
    ;=> 42

Using the ->> macro inserts into the last position:

    user=> (->> 2       ; 2
                (* 2)   ; (* 2 2)
                (+ 10)  ; (+ 10 (* 2 2))
                (/ 2)   ; (/ 2 (+ 10 (* 2 2)))
                (* 6))  ; (* 6 (/ 2 (+ 10 (* 2 2))))
    ;=> 6/7

Pretty simple, right? One edge case is that a bare symbol, like count, will be translated into (count) so that inserting into the form is meaningful, but we won’t need that detail here.
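If you’d rather not take the comments above on faith, you can ask Clojure what the macros expand to. This is just a quick REPL sketch; it uses clojure.walk/macroexpand-all from the standard library, and the walk alias is my own choice.

```clojure
(require '[clojure.walk :as walk])

;; -> puts the threaded value in the first argument position.
(walk/macroexpand-all '(-> 2 (* 2) (+ 10)))
;=> (+ (* 2 2) 10)

;; ->> puts the threaded value in the last argument position.
(walk/macroexpand-all '(->> 2 (* 2) (+ 10)))
;=> (+ 10 (* 2 2))

;; The bare-symbol edge case: count is treated as (count).
(walk/macroexpand-all '(-> [1 2 3] count))
;=> (count [1 2 3])

(-> [1 2 3] count)
;=> 3
```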

So let’s give this a shot with subject-groups. First, recall our current implementation:

    (defn subject-groups [pred indexed-subjects groups]
      (apply hash-map
        (interleave
          (map last (filter pred indexed-subjects))
          (cycle groups))))

Now let’s use the ->> macro:

    (defn subject-groups [pred indexed-subjects groups]
      (->> indexed-subjects
           (filter pred)
           (map last) ; Wait a minute, we're stuck!

Well crud. Do you see the problem? We’d like (interleave (cycle groups)) to be the next form (a transformer form?) to thread our data through, but that’ll give us a map keyed by group!
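To see the backwards map for yourself, here’s a small sketch. The names subject-keys and keyed-by-group are invented for the illustration; subject-keys stands in for whatever (map last (filter pred indexed-subjects)) would have produced.

```clojure
;; Hypothetical stand-in for the keys we have at the point where we got stuck.
(def subject-keys ["dracula" "wolfman"])

;; ->> builds (interleave (cycle [:x :y :z]) subject-keys),
;; so the groups come first in every pair.
(def keyed-by-group
  (->> subject-keys
       (interleave (cycle [:x :y :z]))
       (apply hash-map)))

(= {:x "dracula", :y "wolfman"} keyed-by-group)
;=> true

;; It gets worse with more than three subjects: groups repeat as keys,
;; so later subjects overwrite earlier ones.
(count (->> ["a" "b" "c" "d"]
            (interleave (cycle [:x :y :z]))
            (apply hash-map)))
;=> 3
```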

This is because the previous form always gets stuck in at the end of the next form with the ->> macro. At this point, I’ve often used a cop-out, bailing out of one threading form and moving on with a different one:

    (defn subject-groups [pred indexed-subjects groups]
      (let [keys (->> indexed-subjects
                      (filter pred)
                      (map last))]
        (-> keys
            (interleave (cycle groups)) ; Oh crap, stuck again!!

Gah, it’s starting to feel like Clojure is doing evil experiments on us! So now what? Well, it seems we need to switch back to ->>:

    (defn subject-groups [pred indexed-subjects groups]
      (let [keys (->> indexed-subjects
                      (filter pred)
                      (map last))
            keys-values (-> keys
                            (interleave (cycle groups)))]
        (->> keys-values
             (apply hash-map))))

OK, this gets the job done…

    user=> (subject-groups #(odd? (first %))
                           (map-indexed vector (range 10))
                           ["frankenstein", "dracula", "the hulk",
                            "wolfman", "spiderman", "the mummy"])
    ;=> {1 "frankenstein", 3 "dracula", 5 "the hulk", 7 "wolfman", 9 "spiderman"}

But are we really being clearer here than we were with our first version of subject-groups, without any threading macros? I’d argue that we’re not, and I think it’d be tough to convince me otherwise.

Thankfully, there does exist a way to hack in the ordering we really want and use one threading macro the whole way down:

    (defn subject-groups [pred indexed-subjects groups]
      (->> indexed-subjects
           (filter pred)
           (map last)
           (#(interleave % (cycle groups)))
           (apply hash-map)))

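This works as perfectly as Frankenstein’s neck bolts! Since we want our transforming data to be the first argument to interleave, but the last argument for the remainder of the transforming forms, we’ve wrapped interleave in an anonymous function that takes the threaded data as its only argument and hands it to interleave in the first position. The extra pair of parentheses around #(...) matters: it turns the form into a call to that function, so ->> appends the data as the function’s argument instead of splicing it into the function definition itself.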
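Here’s a tiny sketch of what goes wrong if the anonymous function isn’t wrapped in its own set of parentheses. The vectors [1 2] and [:a :b] are throwaway sample data.

```clojure
;; Without the extra parentheses, ->> appends [1 2] to the fn form itself.
;; The result is a function with an extra body expression, and it never gets called.
(fn? (->> [1 2]
          #(interleave % [:a :b])))
;=> true

;; With them, the form is a call, and [1 2] becomes the function's argument.
(->> [1 2]
     (#(interleave % [:a :b])))
;=> (1 :a 2 :b)
```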

Recall here that because we’re using a macro (->>), at runtime the code is nearly identical to our initial subject-groups solution: the only difference is this new swapped-argument interleave-like function that we’ve created. So we don’t pay a performance cost at runtime for using this version to express the data transformation.
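One way to convince yourself of this is to put the two versions side by side. In this sketch I’ve renamed them subject-groups-nested and subject-groups-threaded so both can live in the same namespace; those names, along with indexed, odd-subject?, and expansion, are mine and not part of the solution.

```clojure
(require '[clojure.walk :as walk])

(defn subject-groups-nested [pred indexed-subjects groups]
  (apply hash-map
    (interleave
      (map last (filter pred indexed-subjects))
      (cycle groups))))

(defn subject-groups-threaded [pred indexed-subjects groups]
  (->> indexed-subjects
       (filter pred)
       (map last)
       (#(interleave % (cycle groups)))
       (apply hash-map)))

;; Hypothetical sample input: no shuffle, so the result is predictable.
(def indexed
  (map-indexed vector ["frankenstein" "dracula" "the hulk" "wolfman"]))
(def odd-subject? #(odd? (first %)))

(= {"dracula" :x, "wolfman" :y}
   (subject-groups-nested odd-subject? indexed [:x :y :z])
   (subject-groups-threaded odd-subject? indexed [:x :y :z]))
;=> true

;; The threaded body, fully expanded. The outermost call is (apply hash-map ...),
;; just like the nested version; the anonymous function shows up as a fn* form
;; with a generated parameter name, so the exact printout varies.
(def expansion
  (walk/macroexpand-all
    '(->> indexed-subjects
          (filter pred)
          (map last)
          (#(interleave % (cycle groups)))
          (apply hash-map))))

(take 2 expansion)
;=> (apply hash-map)
```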

Looking back at our initial subject-groups implementation, I admit I’m not convinced that either one is definitively better than the other.

    (defn subject-groups [pred indexed-subjects groups]
      (apply hash-map
        (interleave
          (map last (filter pred indexed-subjects))
          (cycle groups))))

However, I do find the data transformation pattern encouraged by the threading macros to be a compelling way to visualize this process, and in many cases you don’t even need the argument-swapping hack to make it happen.

Pause…

Okay, you’re right, you’re not a mad scientist and therefore the example doesn’t apply to you directly, but the bind it puts you into from a threading macro perspective is very real world. The fact is that when data is being threaded through your forms, the macro depends on the position of the data being consistent throughout the flow.

This problem will also come up (for instance) anytime you’re using ->> to thread sequential data, and you need to conj something onto it, since conj takes the collection as its first argument.
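A minimal sketch of the conj case, assuming Clojure on the JVM (the exception class is JVM-specific) and throwaway sample data. The last form uses as->, a core macro available since Clojure 1.5 that binds the threaded value to a name; that’s a different technique from the one used above, shown only as another option.

```clojure
;; Naive attempt: ->> builds (conj 0 (map inc [3 1 2])), and 0 is not a collection.
(try
  (->> [3 1 2]
       (map inc)
       (conj 0))
  (catch ClassCastException e :not-a-collection))
;=> :not-a-collection

;; The same wrapper trick we used for interleave.
(->> [3 1 2]
     (map inc)
     (#(conj % 0)))
;=> (0 4 2 3)

;; as-> names the threaded value (xs here), so each form can place it anywhere.
(as-> [3 1 2] xs
  (map inc xs)
  (conj xs 0))
;=> (0 4 2 3)
```

Note that conj adds to the front here because (map inc ...) returns a sequence rather than a vector.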

I’m interested to hear how you feel about these two versions of subject-groups, and what you might do differently.

Colin Jones, Software Craftsman

Colin Jones is particularly interested in web security and functional programming, using languages like Clojure.