Anonymous types matter

Anonymous tuples and records for designing APIs compatible by structure, not by construction.

Posted on May 12, 2020

So I recently released another Elm library. I had wished for a library like this for a long time, but I was hoping someone else would come up with the same design idea eventually. A few weeks ago I finally found enough motivation to do the tedious work and code it. Usually I don’t write announcements for my releases, but in this particular case the design idea might be a bit controversial, so I think it’s worth an explanation. I want to keep this post relatively general, so it’s not an announcement post either. I just want to talk a bit about anonymous types.

This post will be primarily about Elm, but I’m going to compare it to Haskell and PureScript as well. It should also be simple to map these ideas to OCaml and F# (or any other language with a similar type system: ML, Rust, Idris...) if that’s your cup of tea.

What is an Anonymous Type?

The term “anonymous type” is only semi-technical. If you look up the Wikipedia article, though, you’ll find that some programming languages do include a feature named “anonymous types”, so hopefully there are some similarities between these implementations.

Anonymous types are a feature of C# 3.0, Visual Basic .NET 9.0, Oxygene, Scala, and Go that allows data types to encapsulate a set of properties into a single object without having to first explicitly define a type.

If you look closer at how these features work, you find that they essentially allow a programmer to construct a record, struct, or object value (depending on the language’s terminology and features) without the need to declare the corresponding type first.

For the purposes of this post, I’m going to generalize the definition so that any product type may (or may not) qualify as anonymous. In particular, I consider traditional tuple types from Haskell or ML to be anonymous types as well.

If I were to attempt to define what an anonymous type is, the definition would be something like this:

An anonymous type is a product type whose values can be constructed without an explicit type declaration. Since all the information about the shape of a product type can be inferred at the time of construction (which is not true of variants), the type can be deduced directly from the value. This should also already suggest that anonymity is provided by the language itself, not by its libraries.

This is not to be confused with type inference, which is the ability to deduce the type of a value. Type inference still requires the type to be declared; it just doesn’t require a type annotation on the value. For instance, in this Elm example:

type Object = Earth | Mars | Moon | Sun

destination = Mars

the type of destination is inferred as Object, but the declaration of the Object type is still required.
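The same distinction can be sketched in Haskell. This is only an illustration: the trip value and the number 225 are made up for the example. Without the data declaration the first binding would not compile, while the tuple needs no declaration at all.

```haskell
module Main where

-- A declared sum type: this declaration has to exist before
-- `destination` can type-check, even though its type is inferred.
data Object = Earth | Mars | Moon | Sun
  deriving (Show, Eq)

-- Inferred as Object; no annotation on the value is needed.
destination = Mars

-- A tuple is constructed without declaring any type first.
-- The pair and the number 225 are hypothetical, for illustration only.
trip = (destination, 225 :: Int)

main :: IO ()
main = print trip
```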

If you want to dive into the much more advanced topic of anonymous variants, check out The Cofree Comonad and the Expression Problem by Edward Kmett.

Anonymous Types in the Wild

I will be primarily talking about records and tuples as I don’t want to attempt to solve the expression problem here. Perhaps I should also mention that I’m not going to talk about extensible records or row polymorphism. This post is about plain simple boring records and tuples.

Since I don’t want to assume deep familiarity with all, or even any one, of the languages I’m going to mention, I will use simple code examples for demonstrations. I’ve chosen Haskell, Elm, and PureScript as these are good examples I’m familiar with myself.

Be aware that Elm has some (arguably confusing) naming inconsistency with Haskell and PureScript:

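| Language   | Data Type | Unboxed Data Type | Type Synonym |
|------------|-----------|-------------------|--------------|
| Haskell    | data      | newtype           | type         |
| PureScript | data      | newtype           | type         |
| Elm        | type      | type (inferred)   | type alias   |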

Anonymous Tuples

A tuple is a sequence of elements. You can think of it as a struct in which values are accessed by index. It has a fixed size, and each slot holds a value of a specific type.

In the examples below, I’m also defining a custom type just to make it obvious that a tuple can hold any type of value, even one that is not known to the standard library.

Haskell

This is the Haskell syntax for defining a tuple (a pair in this case):

newtype MyData = MyData { unMyData :: Int }

myPair = ("Foo", MyData 42)

If we run ghci and load the module, we can inspect the type of the value:

λ :l Tuples.hs
[1 of 1] Compiling Main             ( Tuples.hs, interpreted )
Ok, one module loaded.
λ :t myPair
myPair :: ([Char], MyData)

This is the behavior we would expect from an anonymous type. We’re given syntax to construct a tuple in any module, and the result is a value of a compatible type.

Elm

Elm has tuples very similar to Haskell’s:

-- Elm requires module definition
module Tuples exposing (myPair)

type MyData = MyData Int

myPair = ("Foo", MyData 42)

To check the type we can fire up the REPL again (use the elm repl command; be aware that the presence of an elm.json file is required):

---- Elm 0.19.1 ----------------------------------------------------------------
Say :help for help and :exit to exit! More at <https://elm-lang.org/0.19.1/repl>
--------------------------------------------------------------------------------
> import Tuples exposing (..)
> myPair
("Foo",MyData 42) : ( String, MyData )

This is the same situation as with Haskell.

PureScript

PureScript is interesting as it ships with only minimal language support and without a standard library. This design decision makes sense in a wider design context, but it also means that PureScript doesn’t contain support for tuples in the core language. Tuples are provided by an optional library instead.

I’m using spago as a build tool for PS (which is sort of similar to stack in the Haskell world) because we need to manage dependencies in this case.

module Tuples where

import Data.Tuple

newtype MyData = MyData Int

myPair = Tuple "Foo" (MyData 42)

Run spago repl to inspect the type:

[info] Installation complete.
PSCi, version 0.13.6
Type :? for help

import Prelude

> import Tuples
> :t myPair
Tuple String MyData

If we look into the definition of Tuple, we can see it’s a good old custom product type:

data Tuple a b = Tuple a b

Data.Tuple.Nested provides a bunch of aliases as well as a type-level and value-level operator (/\) for convenience. When importing this module, we can also define our pair as follows:

myPair :: String /\ MyData
myPair = "Foo" /\ MyData 42

In fact, this way of defining tuples is pretty close to the mathematical definition.

Since tuples in PureScript are not a first-class language construct, we definitely can’t consider them anonymous. Still, the PureScript implementation nicely demonstrates how it is possible to make up for the absence of an anonymous type in a language with parametric polymorphism:

  • Define a parametric custom type (e.g. Tuple a b)
  • Provide general functions to work with this type (fst, snd…)
  • Make sure everybody is using the same definition of the type (community standard)
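The first two steps can be sketched in Haskell, pretending for a moment that the language had no built-in tuples. The names Pair, first, and second are made up for this sketch and are not taken from any library; the third step is a social convention and can’t be shown in code.

```haskell
module Main where

-- Step 1: a parametric custom product type standing in for a tuple.
-- The name Pair is hypothetical, chosen only for this sketch.
data Pair a b = Pair a b
  deriving (Show, Eq)

-- Step 2: general functions for working with the type.
first :: Pair a b -> a
first (Pair a _) = a

second :: Pair a b -> b
second (Pair _ b) = b

-- A custom type, to show that the pair can hold any type of value.
newtype MyData = MyData Int
  deriving (Show, Eq)

myPair :: Pair String MyData
myPair = Pair "Foo" (MyData 42)

main :: IO ()
main = do
  putStrLn (first myPair)
  print (second myPair)
```

Note that every module wanting to build or inspect such a value has to import this exact definition, which is what separates it from a truly anonymous tuple.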

I also want to point out that the absence of built-in tuples is not much of a problem in PureScript in practice. PureScript’s product types are all about records, as we will see later. I personally still think that it’s good to have first-class tuples, as in some cases the positional nature of tuples works better than the named keys of records. That being said, in most cases records are nicer to work with.

Records

A record is another example of a product type which we might want to make anonymous. Let’s look at the three languages again.

Haskell

The lack of anonymous records has been causing headaches in Haskell for ages. Some folks have even gone as far as solving the parts that are solvable in user space (without compiler changes) in projects like record or superrecord.

Let’s have a look at this idiomatic Haskell code:

data User = User { name :: String, age :: Int }
data Project = Project { name :: String, description :: String }

This fails with an error.

Records.hs:2:26: error:
    Multiple declarations of ‘name’
    Declared at: Records.hs:1:20
                 Records.hs:2:26
  |
2 | data Project = Project { name :: String, description :: String }
  |                          ^^^^
Failed, no modules loaded.

Haskell records are just regular custom product types like data User = User String Int. This is why a record declaration uses data. Records are new data types, not type synonyms. Record syntax just provides extra getter and setter functions. These functions are where the multiple declarations problem happens: Haskell wants to generate two name accessors and these collide.
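A small sketch of this point, using only the User type from above (the values jane and older are made up). The record can be built positionally like any other product type, the field names double as ordinary accessor functions, and "setting" a field is done with record update syntax, which returns a new value.

```haskell
module Main where

data User = User { name :: String, age :: Int }
  deriving (Show, Eq)

-- Positional construction works exactly as with `data User = User String Int`.
jane :: User
jane = User "Jane Doe" 42

-- Record update syntax produces a modified copy.
older :: User
older = jane { age = age jane + 1 }

main :: IO ()
main = do
  -- The generated accessor is a plain function: name :: User -> String
  putStrLn (name jane)
  print (age older)
  -- Both construction styles yield the same value.
  print (jane == User { name = "Jane Doe", age = 42 })
```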

To fix this problem, we need to :set -XDuplicateRecordFields. However, if we do so, we lose the ability to use the getter and setter functions directly:

λ :t name

<interactive>:1:1: error:
    Ambiguous occurrence ‘name’
    It could refer to
       either the field ‘name’, defined at Records.hs:2:26
           or the field ‘name’, defined at Records.hs:1:20

So in order to make records usable, we need to enable other extensions like -XRecordWildCards.

Since there are so many extensions and recommended practices around Haskell’s records, I can’t possibly talk about, or even list, all of them. Feel free to refer to these articles for more information though:

Idiomatic Haskell records are definitely not anonymous though. Common usage of Hungarian notation is an ever-present reminder of Haskell’s relation to Microsoft Research [insert troll face].

Elm

In Elm, on the other hand, we can easily do this.

module Records exposing (..)

type alias User = { name : String, age : Int }
type alias Project = { name : String, description : String }

As you can see, these records are defined as synonyms (aliases) for an “already existing” record type. We, in fact, don’t even need to declare these aliases at all:

> foo = { asdf = "asdf", lkj = -1 }
{ asdf = "asdf", lkj = -1 }
    : { asdf : String, lkj : number }

If we define a record alias though, we get a value constructor for free:

> User
<function> : String -> Int -> User

Furthermore, Elm gives us polymorphic getters (and setters):

> .name
<function> : { b | name : a } -> a

.name as well as {record}.name will work with any record which has a name field, whatever its type might be. This is much like the fst or Tuple.first functions, but with named rather than positional keys. Don’t confuse this with maps with string-based keys (so common in dynamic languages). This is nothing like a map with string keys because records have a known shape. Querying a record for a key it doesn’t have is a type error.

Elm’s records are anonymous. They are also so-called extensible records. The article You won’t believe what these records can do! by Jonas Berdal goes deeper into this.
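For comparison, a rough Haskell analogue of such a getter can be sketched with the HasField class from GHC.Records. This is an assumption-laden illustration rather than idiomatic Haskell: it needs several GHC extensions, getName is a made-up helper, and unlike in Elm the records still have to be declared.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}
module Main where

import GHC.Records (HasField (..))

data User = User { name :: String, age :: Int }
data Project = Project { name :: String, description :: String }

-- Works with any record type that has a `name` field, whatever its type.
-- getName is a hypothetical helper, roughly playing the role of Elm's .name
getName :: HasField "name" r a => r -> a
getName = getField @"name"

main :: IO ()
main = do
  putStrLn (getName (User { name = "Jane Doe", age = 42 }))
  putStrLn (getName (Project { name = "Elm library", description = "A sketch" }))
```

As in Elm, asking for a field the record doesn’t have is rejected at compile time, because no matching HasField instance can be found.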

PureScript

I already said that records are a big deal in PureScript, so let’s just dive right into it. This is the idiomatic declaration of a specific record type.

module Records where

type User = { name :: String, age :: Int }
type Project = { name :: String, description :: String }

Unlike in Haskell (without a language extension) and like in Elm, this code is perfectly fine. Also, you can see that the record declaration uses the keyword for a synonym (similar to Elm).

We, of course, don’t need to declare synonyms unless we want to:

> foo = { asdf : "asdf", lkj : -1 }
> :t foo
{ asdf :: String
, lkj :: Int
}

In PureScript, aliases won’t give us function constructors (User : String -> Int -> User) like they did in Elm. Instead, there is a special syntax for declaring a value constructor, which we can use even without declaring a synonym:

:t { foo : _, bar : _ }
forall t1 t2.
  t1
  -> t2
     -> { bar :: t2
        , foo :: t1
        }

We also don’t get .name style getters, so we need to use the {record}.{field} syntax:

> { name : "Jane Doe"}.name
"Jane Doe"

PureScript records have even more powers thanks to the row polymorphism abilities of the language. You can check out the article Making Diffs of differently-typed Records in PureScript by Justin Woo if you’re keen to learn more.

Overview

Based on our findings, we can compile this overview of the three languages:

| Language   | Anonymous Tuples | Anonymous Records | Other Records Feature |
|------------|------------------|-------------------|-----------------------|
| Haskell    | yes              | no                | via extensions        |
| Elm        | yes              | yes               | extensibility         |
| PureScript | no               | yes               | row polymorphism      |

So Why Do Anonymous Types Matter?

Anonymous types fill certain needs in software design pretty well. This goes back to the open/closed principles I wrote about some time ago.

In all three languages, we have the whole spectrum of features that help to craft APIs with the right properties.

| Type           | Opened/Closed   | Description                                                                          |
|----------------|-----------------|--------------------------------------------------------------------------------------|
| Opaque type    | strictly closed | Type can’t be constructed and deconstructed outside of the module                    |
| Custom ADT     | closed          | Type is defined in a specific module which is required for working with the type     |
| Common ADT     | almost opened   | Like a custom ADT, just expected to be universally available (part of stdlib etc.)   |
| Anonymous type | opened          | Completely independent of its definition, compatible by structure                    |

Depending on the nature of the API, a different level of openness or closedness might be appropriate.

The three languages we have looked into, while being generally fairly similar, have some obvious differences. This is why the best API for a problem might often look a bit different across the three.

For example, the uncons function has a different API in PureScript because it’s desirable to return an anonymous product type. It also seems preferable to use a positional tuple over a record, provided the language has anonymous tuples (at least Elm, which has the choice, uses a tuple).

  • Elm (elm-community/list-extra): uncons : List a -> Maybe ( a, List a )
  • PureScript (purescript-lists) : uncons :: forall a. List a -> Maybe { head :: a, tail :: List a }
  • Haskell (base): uncons :: [a] -> Maybe (a, [a])

An example where both Elm and PureScript might favor a record (and Haskell is likely using just multiple arguments) is in functions with a somewhat more complicated API. In these cases, records can work as a substitute for named arguments (which Haskell does not have, but OCaml does). Using records with named fields brings additional semantic clarity, making it easier to understand the API. An example might be a function that performs HTTP requests.
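To make the idea concrete, here is a minimal Haskell sketch of a record used in place of named arguments. Everything in it is hypothetical: the Request type, its fields, and the describe function only stand in for a real HTTP API, and no request is actually performed. In Elm or PureScript the record would not need the data declaration, which is precisely the difference discussed in this post.

```haskell
module Main where

-- Hypothetical configuration record; field names are illustrative only.
data Request = Request
  { method  :: String
  , url     :: String
  , headers :: [(String, String)]
  , timeout :: Maybe Int
  }

-- Stand-in for a function that would perform the request.
-- It only renders a description, so the example stays self-contained.
describe :: Request -> String
describe r =
  method r ++ " " ++ url r ++ " (" ++ show (length (headers r)) ++ " headers)"

main :: IO ()
main =
  putStrLn
    ( describe
        ( Request
            { method = "GET"
            , url = "https://example.com"
            , headers = [("Accept", "application/json")]
            , timeout = Nothing
            }
        )
    )
```

At the call site each value is labeled by its field name, so the reader doesn’t have to remember the order of four positional arguments.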

Non-Empty List as Synonym

About a week ago, I published the non-empty-list-alias library for Elm. In my opinion, there are several benefits to defining a non-empty list in terms of (a, List a), especially in Elm.

Currently, it’s common to define a non-empty list as a custom type: type NonEmpty a = Cons a (List a). This is what all existing libraries do. It is also how Data.List.NonEmpty in Haskell’s base is defined.

I think the synonym definition is especially well suited to Elm because:

  1. Elm doesn’t have ad hoc polymorphism – Unlike in Haskell and PureScript, we can’t make NonEmpty an instance of any type class.
  2. elm/core doesn’t come with a definition of NonEmpty – There is no single implementation of this type everyone is expected to use.
  3. It’s easy enough to work directly with tuple – It’s virtually free to opt-in and out of such a library.
  4. elm-community/list-extra already provides uncons – this function is a compatible constructor with this definition.
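The idea translates to Haskell almost verbatim, which makes for a compact sketch. The function names below are chosen for this illustration and are not meant to mirror the Elm library’s actual API; uncons is the real Data.List.uncons from base.

```haskell
module Main where

import Data.List (uncons)

-- Non-empty list as a plain synonym for a pair of head and tail.
type NonEmpty a = (a, [a])

-- uncons already has exactly the right type to act as a constructor.
fromList :: [a] -> Maybe (NonEmpty a)
fromList = uncons

toList :: NonEmpty a -> [a]
toList (x, xs) = x : xs

-- Total, unlike head on ordinary lists.
neHead :: NonEmpty a -> a
neHead = fst

mapNE :: (a -> b) -> NonEmpty a -> NonEmpty b
mapNE f (x, xs) = (f x, map f xs)

main :: IO ()
main = do
  print (fromList [1, 2, 3 :: Int])
  print (fromList ([] :: [Int]))
  -- A bare tuple literal is already a valid NonEmpty value.
  print (toList (mapNE (+ 1) (1, [2, 3 :: Int])))
```

Because the type is only a synonym, code that never imports such a module can still produce and consume these values as ordinary pairs.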

That’s not to say there wouldn’t be any benefits in having a similar definition in a language like Haskell. There are folks who would prefer such a definition. I just think that in Elm the case is even stronger.

That being said, I’m well aware of the downsides of this approach:

  1. Less semantically expressive constructor (in pattern matching)
  2. Potentially encouraging even more fragmentation in implementations (or diversity, depends on where you stand).

The library also comes with a zipper module. Unlike NonEmpty, the Zipper type is opaque. The Zipper type contains a private data field that users are not supposed to be able to mess with, so an opaque type is the right choice in this case.

Both the NonEmpty and Zipper modules come with full-featured implementations, including but not limited to Functor, Applicative Functor, and Comonad functions.

If you want to learn more, there are some links:

Conclusion

I’m pretty sure there are, and always will be, some folks who won’t want to give up the semantically more expressive definition with custom constructors. And I think that’s fine as long as we all understand the trade-off. The alias to a pair and the custom ADT are isomorphic anyway, so it’s just a matter of practicality in the end. I’m personally often willing to sacrifice some expressiveness to make APIs a bit easier to work with.

Anonymous types do change the ergonomics of a language and of the APIs which exploit their power. For what it’s worth, I think that we should always think about what an appropriate level of openness or closedness is when designing APIs, especially if the language offers a bunch of them.

Since I'm not a fan of Disqus or any other commenting system, there is no discussion under this post.
However, I do like Reddit as a platform, so feel free to shout here:
r/elm/comments/givnx0/anonymous_types_matter_non_empty_list_as_a_list_a