4 Functions and one view
Let us define an Agda module:
> module Partition where
>
> open import Data.Bool
> open import Data.Nat
> open import Data.Product
In Agda, we can define the option-type as the following hybrid between Haskell and ML:
> data Option (A : Set) : Set where
> Nothing : Option A
> Just : (o : A) -> Option A
That is, option types are formed by either being Nothing
or by being Just A
for some element type A
. Likewise, we can define Lists as:
> data List (A : Set) : Set where
> [] : List A
> _∷_ : (x : A) -> (xs : List A) -> List A
That is, lists are either the empty list []
or they are formed as a cons operator, where we have an element x
of type A
, the head, and a tail-list. Together this forms a list.
We now define a really simple function on lists. We will ask if there is an element of the list that satisfies a predicate p
. That is, we ask “Exists an element satisfying the predicate?”
> exists : {A : Set} -> (A -> Bool) -> List A -> Bool
> exists p [] = false
> exists p (x ∷ xs) with p x
> ... | true = true
> ... | false = exists p xs
The function makes use of a with
statement of Agda. with p x
can in this case be seen as a case p x of
where the two possible variants are then handled to the left of the pipe bar |
. The three dots, ...
signifies that nothing else change in the parameters. Generally, the match in Agda is dependent, so the output of p x
could have constrained the other parameters (that is the other parameters depend on the output of the match).
This function is very simple, but it does not yield us a lot of information, should we want to know which element that actually existed. Essentially, we have become blind to some information. On the other hand, if we are only interested in the question of existence, then the function will not produce additional information, we later need to throw away.
Let us add a remedy to our alchemical formulae: We present the find
function:
> find : {A : Set} -> (A -> Bool) -> List A -> Option A
> find f [] = Nothing
> find f (x ∷ xs) with f x
> ... | true = Just x
> ... | false = find f xs
This guy is exists
-one-step-up. Here, when our scrutineer finds a matching element, we return the element itself. Thus, if we need the element for further work, we have it at our disposal. In practice, it may seem more expensive than exists
, but it is not: first, x
we return is a reference, so it has a fixed size. Furthermore, if we don’t use x
, then the compiler is free to declare all code involving producing x
as dead and remove it from the code. Essentially, a compiler like Haskell’s GHC will probably re-obtain the exists
version automatically by optimization.
But the sexiness of this function is still pretty low. We don’t know if there is only a single element matching, or if there are more elements matching. We add another ingredient, and present to you the filter
function:
> filter : {A : Set} -> (A -> Bool) -> List A -> List A
> filter f [] = []
> filter f (x ∷ xs) with f x
> ... | true = x ∷ filter f xs
> ... | false = filter f xs
Now, when the function return, we get back a list of all the guys matching the predicate. Thus, we can just pick up the head and in a lazy language, we would probably not make more work on the rest of the list if we chose to do so. It is yet a generalization in our stack. It provides a view with a lot more information than from before. We can for instance easily detect the case where there are 3 elements satisfying the predicate by pattern matching, and then handle that case specially, should we want to do so.
But we are not done yet. Everything was merely foreplay. Add the next level of abstraction and generality:
> partition : {A : Set} -> (A -> Bool) -> List A
> -> List A × List A
> partition d xs = part d xs [] []
> where
> part : ∀ {A : Set} -> (A -> Bool)
> -> List A -> List A -> List A
> -> List A × List A
> part d [] yes no = yes , no
> part d (x ∷ xs') yes no with d x
> ... | true = part d xs' (x ∷ yes) no
> ... | false = part d xs' yes (x ∷ no)
Partition use a discriminator function, d
, which splits the list into the yes
’s — those matching the predicate — and the no
’s; those that do not. It is interesting we can walk down back the chain easily:
> filter₁ : {A : Set} -> (A -> Bool) -> List A -> List A
> filter₁ p xs = proj₁ (partition p xs)
>
> find₁ : {A : Set} -> (A -> Bool) -> List A -> Option A
> find₁ p xs with filter₁ p xs
> ... | [] = Nothing
> ... | x ∷ xs' = Just x
>
> exists₁ : {A : Set} -> (A -> Bool) -> List A -> Bool
> exists₁ p xs with find₁ p xs
> ... | Nothing = false
> ... | Just _ = true
Which shows how partition
is an aggressive generalization of exists
.
Essentially, this is a question of information you have when using the different functions. With partition, you have the most information: you know exactly which elements matched, and which did not. Again, with enough optimization, it should be possible for a compiler to derive the faster versions if possible. The view this function provides happen to be better than the views the other functions provide. It is, in some sense, the correct implementation.
In Agda, views mean something more specific. A view there is a datatype, with the sole purpose of being evidence of some observation. If we compute some outcome, we shove the resulting information into a data type, the view and then we can later ask the question “Why?” by matching on that type. The Agda tutorials has more details, but the general idea of using a data structure to carry information is what motivated this blog post initially. I also find the relationship between the four functions interesting by itself.
You may wonder: Why do we even bother to have the other definitions around? Mostly, it has to do with optimization hints to the compiler. An implementation like find
tell the compiler we are only interested in a single element, first match and it can throw away other parts of the computation. In other words, it can be seen as a guide to the compiler about our current intent. It is also an intent-hint to the human reader. We make it explicit we don’t need certain parts of the computation, so the reader is free to ignore those.
In some languages, the only discriminator is the boolean test. So you are essentially forced into a model where either you only have exists
or you are to write a specialized version of the above whenever you need access to them. If your language miss Algebraic Data Types or has no construct which can stand in for it easily, then you are doomed to reimplement lots of stuff all the time. Your modularity suffer deeply as a result.
View comments