As you may have heard, Elixir's enumerators gain some new features in the upcoming 0.12.0 release. In this blog post I'll explain what's new, what it enables and how it works.

For those of you who use the development version of Elixir, these changes are already available. For the exact differences in code, you can look at the relevant pull request.

A recap of enumerators, and some terminology

The basic idea of enumerators is that you traverse some data structure or resource (for example, the lines of a file) by putting the thing that is traversed in control. That is, if you're reading from a file, the file-reading code contains the loop that reads each line and calls a function on it. Just calling a function isn't all that useful for most tasks, as there'd be no way to remember previous lines (ugly hacks aside), so an accumulator value is passed to the function and a new accumulator is returned by it.

For example, here's how you can compute the total length of the strings in a list l:

Enumerable.reduce(l, 0, fn x, acc -> String.length(x) + acc end)

Often the actual call to Enumerable.reduce/3 is hidden inside another function. Say that we want to define a sum function. The usual way is to write it like this:

def sum(coll) do
  Enumerable.reduce(coll, 0, fn x, acc -> x + acc end)
end

This could be called as Enum.map(1..10, &(&1 * &1)) |> sum() to get the sum of squares. Desugared, that is sum(Enum.map(1..10, &(&1 * &1))).

The general pattern is this:

def outer_function(coll, ...) do
  ...
  Enumerable.reduce(coll, initial_consumer_acc, consumer)
  ...
end

something_that_returns_an_enumerable(...) |> outer_function(...)

You'll notice the slightly uncommon terminology of "outer function" and "consumer" (normally called an "iteratee"). That's intentional: calling an iteratee a consumer better reflects that it consumes values.

Along the same lines, I call the reduce function for a specific enumerable a producer: it produces values, which are given to a consumer.
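To make the producer role concrete, here is a minimal sketch of what a classic (pre-0.12.0) producer for lists could look like. The module name ClassicListProducer is hypothetical and the code is an illustration of the idea, not the actual Enumerable implementation.

```elixir
defmodule ClassicListProducer do
  # Hypothetical module: a classic producer for lists.
  # The producer owns the loop; the consumer (fun) only sees one
  # element and the current accumulator at a time.
  def reduce([h | t], acc, fun), do: reduce(t, fun.(h, acc), fun)

  # Nothing left to produce: hand the final accumulator back.
  def reduce([], acc, _fun), do: acc
end

ClassicListProducer.reduce(["ab", "cde"], 0, fn x, acc -> String.length(x) + acc end)
#=> 5
```

Note that the consumer has no say in how far the traversal goes: the producer keeps calling it until the list is exhausted.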

The outer function is the function to which the enumerable is passed. Syntactically it looks like this is the consumer, but it's really a function that combines the producer and the consumer. For simple consumers (say fn x, acc -> String.length(x) + acc end) the consumer will often be written directly in the source text of the outer function, but let's try to keep those concepts distinct.

Two issues with classic Elixir enumerators

Enumerators are great, but they have their limitations. One issue is that it's not possible to define a function that returns at most 3 elements without either traversing all elements or using ugly tricks such as throw (with a try...catch construct in the outer function). The throw trick is used in Enum and Stream to implement functions such as Enum.take/2 and Stream.take_while/2. It works, but it's not what I'd call stylish.
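The throw trick described above can be sketched roughly as follows. This is an illustration only, not the actual source of Enum.take/2; the module name ThrowTake and the :take_done tag are made up for this example, and Enum.reduce/3 stands in for the classic reduce.

```elixir
defmodule ThrowTake do
  # Hypothetical helper, not the real Enum.take/2 source.
  def take(_coll, 0), do: []

  def take(coll, n) when is_integer(n) and n > 0 do
    try do
      Enum.reduce(coll, { [], n }, fn x, { acc, left } ->
        acc = [x | acc]
        # Once enough elements are collected, jump out of the traversal.
        if left == 1, do: throw({ :take_done, acc })
        { acc, left - 1 }
      end)
    catch
      # The consumer bailed out early with the collected elements.
      { :take_done, acc } -> :lists.reverse(acc)
    else
      # The collection ran out before n elements were seen.
      { acc, _left } -> :lists.reverse(acc)
    end
  end
end

ThrowTake.take(1..10, 3)
#=> [1, 2, 3]
```

The producer never learns that the consumer wanted to stop; the traversal is simply abandoned through non-local control flow.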

A bigger problem, one that doesn't have a workaround, is that there's no way to interleave two enumerables. That is, it's not possible to define a function that for two enumerables A and B returns a list [A1, B1, A2, B2, A3, ...] (where A1 is the first element of A) without first traversing both enumerables completely and then interleaving the collected values. Interleaving is important because it's the basis of a zip function. Without interleaving you cannot implement Stream.zip/2.

The underlying problem, in both cases, is that the producer is fully in control. The producer simply pushes out as many elements to the consumer as it wants and then says "I'm done". There's no way aside from throw/raise for a consumer to tell a producer "stop producing". There is definitely no way to tell a producer "stop for now but be prepared to continue where you left off later".

Power to the consumer!

At CodeMeshIO, José Valim and Jessica Kerr sat down and discussed this problem. They came up with a solution inspired by a Monad.Reader article (third article). It's an elegant extension of the old system, based on a simple idea: instead of returning only an accumulator at every step (for every produced value), the consumer returns a combination of an accumulator and an instruction to the producer. Three instructions are available:

  • :cont - Keep producing.
  • :halt - Stop producing.
  • :suspend - Temporarily stop producing.

A consumer that always returns :cont makes the producer behave exactly the same as in the old system. A consumer may return :halt to have the producer terminate earlier than it normally would.
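As a small illustration of :halt, here is a sketch of a take function that stops the producer early without throw. The module name HaltTake is hypothetical; the code assumes the three-argument Enumerable.reduce/3 with an instruction-tagged accumulator as described in this post.

```elixir
defmodule HaltTake do
  # Hypothetical helper showing :halt; not the real Enum.take/2 source.
  def take(_coll, 0), do: []

  def take(coll, n) when is_integer(n) and n > 0 do
    { _status, { acc, _left } } =
      Enumerable.reduce(coll, { :cont, { [], n } }, fn
        # Last element we want: keep it and tell the producer to stop.
        x, { acc, 1 } -> { :halt, { [x | acc], 0 } }
        # Otherwise keep the element and ask for more.
        x, { acc, left } -> { :cont, { [x | acc], left - 1 } }
      end)

    :lists.reverse(acc)
  end
end

HaltTake.take(1..10, 3)
#=> [1, 2, 3]
```

The producer answers with { :halted, acc } when the consumer stopped it and { :done, acc } when it ran out of elements first; this sketch treats both the same way.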

The real magic is in :suspend though. It tells a producer to return the accumulator and a continuation function.

{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, fn x, n ->
  if x == 3 do
    { :suspend, n }
  else
    { :cont, n + x }
  end
end)

After running this code n_ will be 3 (1 + 2) and cont will be a function. We'll get back to cont in a minute, but first take a look at some of the new elements here. The initial accumulator carries an instruction as well, so you could suspend or halt a producer immediately if you really wanted to. The value passed to the consumer (n) does not contain the instruction. The return value of the producer is also tagged with an atom. As with the instructions of consumers, there are three possible values:

  • :done - Completed normally.
  • :halted - Consumer returned a :halt instruction.
  • :suspended - Consumer returned a :suspend instruction.

Together with the other values returned, the possible return values from a producer are { :done, acc } | { :halted, acc } | { :suspended, acc, continuation }.
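To see how a producer can honour all three instructions, here is a minimal sketch of a list producer under the new scheme. The module name ListProducer is hypothetical and the code is meant as an illustration of the protocol, not as the actual Enumerable implementation for lists.

```elixir
defmodule ListProducer do
  # Hypothetical module sketching a producer for lists.

  # The consumer asked to stop: report :halted with its accumulator.
  def reduce(_list, { :halt, acc }, _fun), do: { :halted, acc }

  # The consumer asked to pause: return the accumulator plus a function
  # that resumes from the remaining list once given a new instruction.
  def reduce(list, { :suspend, acc }, fun) do
    { :suspended, acc, &reduce(list, &1, fun) }
  end

  # Nothing left to produce.
  def reduce([], { :cont, acc }, _fun), do: { :done, acc }

  # Produce one element; the consumer's return value is the next instruction.
  def reduce([h | t], { :cont, acc }, fun), do: reduce(t, fun.(h, acc), fun)
end

ListProducer.reduce([1, 2, 3], { :cont, 0 }, fn x, acc -> { :cont, x + acc } end)
#=> { :done, 6 }
```

The instruction is checked before the list is inspected, which is why a :halt or :suspend in the initial accumulator takes effect before any element is produced.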

Back to the continuation. A continuation is a function that given an accumulator returns a new producer result. In other words it's a way to swap out the accumulator but keep the same producer in the same state.
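Here is a short sketch of using a continuation, built on the same 1..5 reduction as before. It assumes the range producer's continuation can be called more than once; a producer backed by a resource such as a file may not allow that.

```elixir
consumer = fn x, n ->
  if x == 3, do: { :suspend, n }, else: { :cont, n + x }
end

{ :suspended, n_, cont } = Enumerable.reduce(1..5, { :cont, 0 }, consumer)

# Resume with the accumulator we were handed back: 3 + 4 + 5.
{ :done, total } = cont.({ :cont, n_ })

# Resume the same producer state with a different accumulator: 100 + 4 + 5.
{ :done, swapped } = cont.({ :cont, 100 })

# A continuation also accepts :halt, which ends the traversal right away.
{ :halted, stopped } = cont.({ :halt, n_ })

{ total, swapped, stopped }
#=> { 12, 109, 3 }
```

The element 3 itself is never added: the consumer had already received it when it chose to suspend, so the continuation picks up at 4.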

Implementing interleave

Using the power of suspension it is now possible to create an interleave function.

defmodule Interleave do
  def interleave(a, b) do
    step = fn x, acc -> { :suspend, [x|acc] } end
    af = &Enumerable.reduce(a, &1, step)
    bf = &Enumerable.reduce(b, &1, step)
    do_interleave(af, bf, []) |> :lists.reverse()
  end

  defp do_interleave(a, b, acc) do
    case a.({ :cont, acc }) do
      { :suspended, acc, a } ->
        case b.({ :cont, acc }) do
          { :suspended, acc, b } ->
            do_interleave(a, b, acc)
          { :halted, acc } ->
            acc
          { :done, acc } ->
            finish_interleave(a, acc)
        end
      { :halted, acc } ->
        acc
      { :done, acc } ->
        finish_interleave(b, acc)
    end
  end

  defp finish_interleave(a_or_b, acc) do
    case a_or_b.({ :cont, acc }) do
      { :suspended, acc, a_or_b } ->
        finish_interleave(a_or_b, acc)
      { _, acc } ->
        acc
    end
  end
end

Interleave.interleave([1,2], [:a, :b, :c, :d])
#=> [1, :a, 2, :b, :c, :d]

Let's go through this step by step. The main interleave function first partially applies Enumerable.reduce/3 to get function values that work just like the continuations. This makes things easier for do_interleave.

The do_interleave function first calls a (af from interleave) with { :cont, acc }. Because the step function always returns :suspend, the next available element of a gets added to the accumulator and a suspends immediately afterwards. Then the same is done for b. If either producer is done, all the remaining elements of the other get added to the accumulator list by finish_interleave.

Note that acc is sometimes used to mean a tuple like { :cont, x } and sometimes the accumulator value proper. It's a bit confusing, yes.

This example shows that, through a clever combination of an outer function (do_interleave) and an inner function (step), two producers can be interleaved.
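Since interleaving was described earlier as the basis of a zip function, here is a rough sketch of a two-enumerable zip built the same way. The module name ZipSketch is hypothetical and this is not how Stream.zip/2 is actually implemented; it only applies the suspension technique from interleave.

```elixir
defmodule ZipSketch do
  # Hypothetical module; a sketch of zip on top of :suspend.
  def zip(a, b) do
    # The consumer drops the old accumulator and hands back just the element.
    step = fn x, _acc -> { :suspend, { :ok, x } } end
    af = &Enumerable.reduce(a, &1, step)
    bf = &Enumerable.reduce(b, &1, step)
    do_zip(af, bf, []) |> :lists.reverse()
  end

  defp do_zip(a, b, pairs) do
    case a.({ :cont, :none }) do
      { :suspended, { :ok, x }, a } ->
        case b.({ :cont, :none }) do
          { :suspended, { :ok, y }, b } ->
            do_zip(a, b, [{ x, y } | pairs])
          _done_or_halted ->
            # b ran out: tell a to stop so it can clean up.
            a.({ :halt, :none })
            pairs
        end
      _done_or_halted ->
        # a ran out: tell b to stop so it can clean up.
        b.({ :halt, :none })
        pairs
    end
  end
end

ZipSketch.zip([1, 2, 3], [:a, :b])
#=> [{ 1, :a }, { 2, :b }]
```

Unlike interleave, the shorter enumerable decides where the result ends, so the other producer is explicitly halted instead of being drained.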

Conclusion

The new system of enumerators certainly makes things a bit more complicated, but it also adds power. I suspect many interesting and "interesting" functions can be built on top of it.