Avoiding Pattern Matching Regaining open composability with refunctionalization
A few days ago, my colleague Martin Janiczek published a post on his blog called Defunctionalization in Elm. In the post, he talks about applying technique from James Koppel’s talk The Best Refactoring You’ve Never Heard Of.
The talk itself describes a sort of isomorphism between continuations and data and a way to refactor between the two. This is probably not as life changing information for experienced functional programmers as it might be for people with only imperative programming background but I recommend you to watch this talk especially because it establishes a good base and terminology for talking about this relationship.
The talk also describes a relation between CSP and iteration, which is what I wanted to talk about in the last part of my “Deeper Dive To (Tail) Recursion” series of write-ups that I haven’t finished as my focus shifted. You can find part 1 and part 2 in archive.
I’m personally not a fan of “over-usage” of sum types and pattern matching and I disagree that changes Martin describes in his post are better than the original code. But I haven’t planned to write a post about this until another friend of mine asked me if I can write something about the topic we’ve discussed together on several occations — Why I believe that the pattern matching is often misused and why I think that in many cases it’s desirable to avoid it completely.
I’m going to use the code from Martin’s article to describe what I believe is one common case of misuse of pattern matching and why I believe that the implementation avoiding its usage is more composable and scalable. The only reason why I’m basing this on Martin’s article is that this will offer the reader to see two opposite views on the same example implementation. That will also help the reader to see, that in fact, there is a trade-off when making design decisions like this one.
I’ll try my best to use consist terminology within the scope of this article. Just keep in mind that this is just a blog post and not an academic article though so expect simplifications.
Data are constructed from primitive types. Functions are not data.
A function is a pure1 mathematical computation.
A value is any thing that can be passed or returned by a function. It can be either a datum or a function2.
A sum type is a type in which final set of values is sum of declared cases:
type Bool = True | False
-- bool has True or False values -> 2 constructors
type Maybe a = Noting | Just a
-- maybe is `1 + (1*a)` where a is number of constructors for a
-- Maybe Bool is `1 + (1*2)` -> 3 (Nothing, Just True, Just False)
Enums and all primitive data are sums. Like a bit being either 0 or 1 but not both at the same time3.
A product type is a type where final set of possible values is a product of all individual subtypes:
type MyPair = MyPair Bool Bool
-- MyPair is a product of two Bools -> 2 * 2 -> 4
type alias WithBool a = (a, Bool)
-- WithBool is a * 2 where a is number of a constuctors
Records and tuples are a common way of constructing products.
Algebraic Data Type is a type constructed using sums, products and combinations of both.
Pattern matching is a language feature that allows to deconstruct value of a type by extracting parts of product and branching over sum.
Closed implementation is an implementation where all valid inputs are predetermined by types used within its interface.
Opened implementation is an implementation using interface that doesn’t restrict where and how the data it uses are constructed as long as some type contract is satisfied.
Getting Started
The actual implementation is a bit more involved but for the purpose of this post, we can simply mock it using types with these purposes:
Flags
is a type contained app configuration and contains data necessary for creating requestsStore
is a type in which we store data from serverConfig msg
contains configuration for of the module (in reality it’s record of functions, for simplification we’re going to use a single function)Msg
is a sum type of a different types of messages requests generate.
mocking all of those with types isomorphic to unit looks like this:
module Store exposing (..)
type Flags
= Flags
type Store
= Store
type alias Config msg =
Msg -> msg
type Msg
= Msg
This part of the Store module is not changing in between different implementations, but it’s important as the other code we’re about to write uses these types.
The fact that we’re mocking with type wit dummy values will make the rest of the code look a bit useless. You must believe me that all of this fits into working implementation and has its purpose.
Version with Sum Type and Pattern Matching
The defunctionalized version (the one proposed by Martin) follows like this:
type alias FetchConfig a b =
update : a -> Store -> Store
{ , done : Maybe b -> Cmd Msg
}
fetch_ : FetchConfig a b -> Config msg -> Flags -> Store -> ( Store, Cmd msg )
fetch_ { done } toMsg Flags Store =
Store, Cmd.map toMsg <| done Nothing )
(
type FetchAction
= FetchFoo
| FetchBar
fetch : FetchAction -> Config msg -> Flags -> Store -> ( Store, Cmd msg )
fetch action =
case action of
FetchFoo ->
fetch_
update = always
{ , done = always Cmd.none
}
FetchBar ->
fetch_
update = always
{ , done = always Cmd.none
}
fetchMany : List FetchAction -> Config msg -> Flags -> Store -> ( Store, Cmd msg )
fetchMany actions config flags store =
List.foldl
action ( store_, cmd ) ->
(\let
newStore, newCmd ) =
( fetch action config flags store_
in
newStore, Cmd.batch [ cmd, newCmd ] )
(
)store, Cmd.none )
( actions
I would like to highlight some properties of this approach:
FetchConfig msg
is sort of intermediate structure, a product of all the data that differ for each type of request.fetch_
is a helper function translating theFlagConfig msg
to another (not visible in this example) API.FetchAction
type is a point of coupling. Sum of all possible actions.fetch
does just a branching for different actions.fetchMany
folds actions and aggregates state and commands.
And this is how the usage looks of this API looks like:
module Main exposing (..)
import Store exposing (Flags, Store)
type Msg
= StoreMsg Store.Msg
caller : ( Store, Cmd Msg )
caller =
Store.fetchMany [ Store.FetchFoo, Store.FetchBar ] StoreMsg Flags Store
The caller
is a function that calls API of the Store module.
Seeing this I’m tempted to argue that putting the list argument as a last one
would be better in this case as then it would be possible to first configure the function by applying arguments
and than calling the function returned with different actions. Also in elm it would be possible to format
it a bit nicely with |>
like
caller : ( Store, Cmd Msg )
caller =
Store.FetchFoo, Store.FetchBar ]
[ |> fetchMany configForA Flags Store
But in the reality we use yet another abstraction that expects the last value to be state of the store so this API is better.
No matter how hard I try to look at the version with pattern matching over the sum type, I don’t see how it makes things simpler. In fact, I think the only benefit gained from doing all of this was just that in the process, Martin reconstructed the functionality which helps with understanding it.
Data constructors are, in essence, just constants and functions. The only thing which makes them different is that unlike regular constants and functions, their names start with an uppercase and not a lowercase letter. Well, there is another thing that makes them different. Pattern matching is basically branching over data constructors and thus is unique for cases when working with values other than functions and opaque aliases.
I think it’s fair to say that arguing for API build around data constructors is therefore same, or at least often same, as arguing for usage of pattern matching in the actual implementation. But there are some other (more or less) good arguments for using data over functions:
Decoupling by Refunctionalization
What I dislike about this defunctionalized implementation is that the fetch
function now couples all the individual usages together. In other words, we can say that the API is closed over the FetchAction
type.
Let’s see how it would look like if we replaced the FetchAction
sum by individual constants.
Instead of having FetchAction
type containing all the constants (constructors),
we’re going to have a bunch of constants of the same type.
First, we just define a type without worrying about details.
type FetchAction =
FetchAction
fetchFoo : FetchAction
fetchFoo = Debug.todo "implement me"
fetchBar : FetchAction
fetchBar = Debug.todo "implement me"
We expect these functions to do everything so there is no need for fetch
and fetch_
helpers.
Now when we know how our API should look like, let’s fill the implementation details.
Starting with FetchData
which is itself just a function:
type FetchAction msg =
FetchAction (Config msg -> Flags -> Store -> ( Store, Cmd msg ))
I’m wrapping a function to a constructor for extra clarity — to make it look more like a special value. Martin also mentions this in his post:
And now, because the
fetchMenu
type annotation no longer contains any parameterized msg types, it simplifies all types that touch it to the point where we don’t need toCmd.map
theMsg
at all!
We will need to give up this simplification in our version.
We need this polymorphism in our new API.
The implementation for new FetchAction
type might look like something like this:
fetchFoo : FetchAction msg
fetchFoo =
FetchAction <|
toMsg Flags Store -> ( store, Cmd.map toMsg Cmd.none )
\
fetchBar : FetchAction msg
fetchBar =
FetchAction <|
toMsg Flags Store -> ( store, Cmd.map toMsg Cmd.none ) \
If you have a difficult time dealing with the logic involving a lot of higher order function tricks, you might find this way of thinking useful.
Just forget about functions and data and focus on values. Everything is just a value which you can further reduce and group.
Whenever I have to deal with too many things at once, I try to look for some pattern.
If I see some repeating part like Foo -> Bar -> a -> List a
I know that I can reduce it in head to some Placeholder a
.
The only place where you really need to understand the detail is
when you bridge the level of abstraction to the level in which you need to concern yourself with individual pieces of this value.
This works well with full-blown continuations or higher order functions in general. The key is to understand how two different things can be viewed as the same thing on some level of abstraction.
In context of this article, the important thing to understand is that:
type Fruit
= Apple
| Orange
color : Fruit -> String
color fruit =
case fruit of
Apple ->
"green"
Oragne ->
"orange"
is on some level same as
-- In Elm we get `.color : Fruit -> String` for free
type alias Fruit =
color : String }
{
apple : Fruit
apple =
color = "green" }
{
orange : Fruit
orange =
color = "orange" } {
But both implementation are different on another level.
Defining Fruit
via sum creates a closed set of values.
Defining it as a product of properties creates an open set of values.
It’s even possible to define something like this:
type Fruit a
= Fruit a (a -> String)
getColor : Fruit a -> String
getColor (Fruit val getColor_) =
getColor_ val
type MyFruit
= Apple
| Orange
type alias SpecialFruit =
Fruit MyFruit
specialFruit : MyFruit -> SpecialFruit
specialFruit a =
myFruit ->
(\case myFruit of
Apple ->
"green"
Orange ->
"orange"
)|> Fruit a
type MyFruit2
= Rapsberry
type alias OtherFruit =
Fruit MyFruit2
repsberry : OtherFruit
repsberry =
always "red"
|> Fruit Rapsberry
caller : List String
caller =
List.map getColor [ specialFruit Orange, repsberry ]
Even though this last example seems odd for a simple case like this, it sort of merges the properties of the two previous implementations. This is how type classes are sometimes simulated in languages which don’t have them (like Elm).
Fruit a
acts like a class
of types. MyFruit
is then sort of instance
of this class.
Obviously, without first-class support for such abstraction, it’s usually impractical to work with it.
This is why in a language without higher order (ad hoc) polymorphism, it might be often
favorable to avoid data in favor of keeping value set opened.
In languages like Haskell or PureScript, it’s much easier to turn closed types to opened ones using things like Free or by utilizing type classes.
In fetchMany
, we only need a simple change. We no longer need to call fetch
function because our argument
now becomes this function itself. Also, since we’re boxing the function into the FetchAction
constructor,
we’re going to need to extract it first.
fetchMany : List (FetchAction msg) -> Config msg -> Flags -> Store -> ( Store, Cmd msg )
fetchMany actions config flags store =
List.foldl
FetchAction action) ( store_, cmd ) ->
(\(let
newStore, newCmd ) =
( action config flags store_
in
newStore, Cmd.batch [ cmd, newCmd ] )
(
)store, Cmd.none )
( actions
FetchAction
type now has to be parametrized the same way the fetch
function was in the previous version.
Look at the usage now:
module Main exposing (..)
import Store exposing (Flags, Store)
type Msg
= StoreMsg Store.Msg
caller : ( Store, Cmd Msg )
caller =
fetchMany [ Store.fetchFoo, Store.fetchBar ] StoreMsg Flags Store
See the difference? It’s just Store.FetchFoo
for “defunctionalized” version versus Store.fetchFoo
in the new one.
Is this more complicated in any way? I let you be the judge.
Extensibility
Since the second example doesn’t contain tight coupling to the same sum type, it can be quite easily extended by composition.
For instance we can generalize the Store
module to work with extensible record:
module Store exposing (..)
type Flags
= Flags
type alias Store r =
r | x : () }
{
type alias Config msg =
Msg -> msg
type Msg
= Msg
type FetchAction msg r
= FetchAction (Config msg -> Flags -> Store r -> ( Store r, Cmd msg ))
fetchFoo : FetchAction msg r
fetchFoo =
FetchAction <|
toMsg Flags store -> ( store, Cmd.map toMsg Cmd.none )
\
fetchBar : FetchAction msg r
fetchBar =
FetchAction <|
toMsg Flags store -> ( store, Cmd.map toMsg Cmd.none )
\
fetchMany : List (FetchAction msg r) -> Config msg -> Flags -> Store r -> ( Store r, Cmd msg )
fetchMany actions config flags store =
List.foldl
FetchAction action) ( store_, cmd ) ->
(\(let
newStore, newCmd ) =
( action config flags store_
in
newStore, Cmd.batch [ cmd, newCmd ] )
(
)store, Cmd.none )
( actions
And extend the store with custom data in the main module:
module Main exposing (..)
import Store exposing (Flags)
type Msg
= StoreMsg Store.Msg
type alias ExtendedStore =
x : (), y : () }
{
caller : ( ExtendedStore, Cmd Msg )
caller =
Store.fetchMany
Store.fetchFoo
[ , Store.fetchBar
, fetchBaz
]StoreMsg
Flags
x = (), y = () }
{
fetchBaz : FetchAction Msg ExtendedStore
fetchBaz =
Store.FetchAction <|
toMsg Flags store -> ( store, Cmd.none ) \
This would not be possible with defunctionalized version.
Usage Patterns with Higher Order Functions
Martin ends his post with showing the screenshot — a part of final diff. In this code section, he simplified code into an alias to data constructor. This code was also highlighted in the PR by a comment.
I don’t think this part is significant in any way though. Furthermore, I think it’s avoidable to do something like this in any shape or form.
Instead of emitting any type of intermediate command, I think the init
function should look like this.
init : (List (FetchAction msg) -> Cmd msg) -> ( (), Cmd msg )
init fetchMany =
, fetchMany [ Store.fetchFoo, Store.fetchBar ] ) ( ()
With this, we won’t need any special Msg
in parent module.
All we do is pass down the Store.fetchMany
with applied arguments.
Simple as that.
Conclusion
I hope I managed to demonstrates one reasonable use-case where avoiding pattern matching and data in favor of functions leads to more extensible implementation. This doesn’t mean that pattern matching is bad in general. In fact, it’s a useful tool for modeling closed APIs. I’m personally leaning towards an opinion that branching over closed data is generally good in high level and low level code but not that much in the abstract source in the middle of logic. In upper level, it’s often desirable to dispatch the control between the blocks of program. In lower level logic, implementation details are being extracted and acted upon. But it’s likely that most of the stuff in the middle should be designed with extensibility provided by open design.
A good example of open API is elm/html which is opened using node constructor.
It would be possible to define type Node
and functions operating with this closed sum type but it would be a poor design choice.
By designing all the code around closed structures, we’re in some sense making a full circle back to C interfaces with integer arguments used to change behavior of procedure. We just have slightly more expressive tool to accomplish that.
Some time ago, I’ve also created elm-continue, a package with even more generalized abstractions for working with continuations than what we used here. I just must warn you, that not everyone from elm community would likely approve of using it.
I was honestly unsure if I should write this post. I was a bit afraid that it will looks too much like a criticism of Martin’s original article which is not my point at all. I also know a lot of people who will likely disagree with my feelings towards pattern matching and “over-usage” of data. The reason why I wrote it in the end is the slight chance that someone will find this useful or interesting (looking at you, Zdenko) and that maybe I manage to avoid unnecessary negative feelings around this. This doesn’t mean you should avoid criticism. In the end, it’s just a tool of progress.
In any turing complete language (including functional ones), it’s possible to define partial functions. Like functions that never terminate.↩︎
This is true only in languages with higher order functions.↩︎
All hail the quantum universe.↩︎
Some languages like Unison are able to serialize functions. Unison specifically does so by sending the AST over the wire. Even much simpler techniques comes in mind. Languages with
eval
(like most of the dynamic languages and almost all lisps (excluding clojure-script) has such function). The primary concern here is security and executing arbitrary code from some unknown source is what is often called arbitrary code execution vunerability. Precisely because of the security, it’s always a good idea to close APIs exposed to untrusted 3rd parties.↩︎Elm type system allows you to use equality operator (
(==)
) over any type but blows up in runtime with functions. Languages with higher order polymorphism or operator overloading won’t be able to statically detect such cases and won’t allow you to use equality over functions. I’m not aware of any language in which equality over function is possible but I believe in Unison, it might be possible.↩︎By definition, it will still help you to just see an intermediate value but that might be enough for certain problems. Debugging is usually not an issue with step in debugging.↩︎