A Classy Approach to Recursion

October 11, 2016
Tagged: haskell, programming, dsl

Contents

There is a rich variety of techniques that have been put forward for working with DSLs in Haskell, and typed functional languages generally. The recursion scheme approach described in Bananas, Lenses, Envelopes, and Barbed Wire stands out as a particularly powerful and elegant one, with a Haskell implementation discussed in An Introduction to Recursion Schemes. This work introduced the category theory concepts of catamorphism and anamorphsim – in this context representing generalizations of folds and unfolds respectively – to the functional programming lexicon.

However the representation it imposes on DSL data types can be unwieldy to work with, especially for mundane manipulations that don’t fall within this recursion scheme framework. In an attempt to get the best of both worlds, I’ve experimented with an approach that allows one to stick with the most straightforward representation, while providing access to the equivalent form used by the recursion schemes through a type class. In this post I’ll review some essential elements of the prior art, and then show this type class mechanism.

I came up with this while working on a project to provide a foundation for a computer algebra system with core functionality managed through type classes, which would be extensible by instantiating new data types into these classes. This project is in a very early state, but it is nevertheless up on Github.

Review of catamorphisms in Haskell

If you haven’t seen this material before, I recommend the aforementioned tutorial, especially part 2, for a proper fleshed-out introduction. My review here is far from comprehensive, although it is self-contained.

For the sake of discussion lets consider this very simple DSL data type.

The data type could be anything that has a recursive definition. The idea is that we have expressions or arbitrary size and we’re going to show a nice way to perform recursive computations on them.

An example of the straightforward way to do this would be as follows.

There is nothing wrong with this basic approach, particularly when the example is so simple, but when the complexity begins to grow it can be extremely helpful to use a recursion scheme that takes care of all the recursive function calls.

First we rewrite our data type in a polymorphic form, making every constructor that referred to itself instead refer to a type variable.

The motivation for this is not to have to describe our computation all at once, but instead to be able to describe it as a single recursive step, which is then applied repeatedly. The polymorphic type variable in this definition can represent the return type of any particular function we may wish to create.

First, though, how do we represent an arbitrarily recursive expression with this type? Seemingly this would entail the infinite type

Expr (Expr (Expr ... ))

This obviously is not valid Haskell. But we can accomplish the same thing with the following data type.

It may take some practice to get used to but now we can represent our arbitrarily nested expressions just as before. We just need to wrap each node, or constructor, with this In constructor.

Now we’re ready to re-cast our computation. As stated, all we really need to do is define the computation at a particular nesting layer. Intuitively, you might think of it as applying to the deepest layer.

You can see that this definition is almost the same as the basic eval function defined above, except that it doesn’t have any recursive calls in the function body. The values brought into scope by the pattern match are already of the return type. We only need to describe what to do with these values, essentially assuming the deeper layers of the structure have already been evaluated.

Since every recursive function we define will be built in the same way from these “step” functions, we might as well give them a name.

Now the magic of this catamorphism scheme allows us to turn any such computation step into the full thing.

This line is quite compact. You’ll likely need to stare at it for a while thinking throught the steps to fully absorb it. First the out record accessor function from the definition of Fix, unwraps our Fix Expr into an Expr (Fix Expr). Then we map this whole computation over all the immediate children nodes of that structure, yielding an Expr a. And finally we perform the provided Algebra f a function one last time to obtain an a.

Worth noticing is that cata contains the self-reference that we avoided in evalStep, and in any other recursive function we might define using it.

Our full evaluation function is now simply:

There is also a dual notion of anamorphsim, genearlizing unfolds whereas catamorphisms are a generalization of folds. I write them together here to make clear how they are similar and complementary.

Having our cake and eating it too

In writing my library I found that, while sometimes the above approach was amazingly helpful, other times it would be easier to work with a vanilla data type. After all, if we aren’t doing this fancy recursion scheme stuff it can be a pain to work with these extra In constructors all over the place.

Well one straightforward way to have it both ways is to define both versions of our data type and relate them with a type class.

Lets adopt the following definitions. The parametric data type will be indicated by adding an underscore to every constructor.

Eventually, it may be nice to allow the user to specify their language just in the first non-parametric form and then have the second form generated by Template Haskell.

In any event, once we have these definitions we can relate them with a type class, essentially making an isomorphism with a little help from the recursion infrastructure we just introduced.

We can instantiate our example as:

And now we can operate even more painlessly. Specifically we can define a function that operates on a vanilla data type but that uses a function that acts on the polymorphic variant.

Now, applying this computation is just as easy as in the previous example, only there is no longer any need to be messing around with Fix directly.