1. Agda musings on Hutton's razor

    So Ralf Lämmel, also known as “Professor Fish”, defined a small language on which he claimed full copyright. The language was used as an assignment for students hacking Haskell, who were to carry out two alterations on it. First, they were to give it a CPS-style semantics, and next, they were to define an analysis determining when expressions in the language were pure.

    So far so good, but I get a bit tongue-in-cheek when I see tasks like these, and I wanted to play with the idea. The first decision was to throw out Haskell as a tool. No, if we are going to have types, then let us get some real types. Dependent types. And let us require that all our programs terminate while we are at it. In other words, employ Agda2 to solve the task.

    We begin by aptly naming the module fish

    module fish where
    

    in honor of the grand professor who devised this magnificent language we are to present shortly. To define the language, we will need to pull in some Agda libraries. If there is anything I’ve missed in my Twelf hacking, it is the ability to use libraries defined by others when I am writing code.

    open import Function hiding (const)
    open import Data.Bool
    open import Data.Nat
    open import Data.Product
    open import Data.Sum
    open import Relation.Binary.PropositionalEquality
    

    At this point, we get to define the language in question. We begin with a local module in which to define it:

    module HRazor where
      data Exp : Set where
        Const : ℕ -> Exp
        Plus  : Exp -> Exp -> Exp
    

    What is this? This language lives in a HRazor module. While Professor Fish seems to have taken copyright on the language, we know that Graham Hutton already took a patent on it and dubbed it Hutton’s razor. We will shortly see that the language differs from the one Ralf eventually had in mind - but the Razor is at the heart of it.

    In Hutton’s Razor, we have expressions defined by Exp, and these have exactly two constructors. The first constructor, Const, takes a natural number and returns an expression; it signifies constants in the language. Note how Agda is a Unicode-aware language - an idea that will slowly creep up on you until you begin liking it. The other constructor, Plus, adds together two expressions. This is the full language.

    Is the language silly? Most decidedly not! Suppose you are researching some new idea in language theory. The Razor described above is about the smallest language you can define that does something meaningful. So your idea had better work for the Razor; otherwise, you might as well give up on getting it to work for harder problems.

    Here is the function for (denotational) evaluation of the language:

      eval : Exp -> ℕ
      eval (Const y) = y
      eval (Plus y y') = (eval y) + (eval y')
    

    which for most programmers of functional languages should be pretty straightforward. Essentially, we turn the expression tree into an evaluation in Agda.

    Ralf adds a construct

    Ralf’s assignment does not revolve around Hutton’s Razor. After all, Hutton has a patent (which he graciously grants every language researcher free use of). Rather, Ralf augments the Razor with an Exit construct:

    data Exp : Set where
      Const : ℕ -> Exp
      Plus  : Exp -> Exp -> Exp
      Exit  : Exp -> Exp
    

    The informal meaning of Exit is this: Exit e will evaluate the computation e and then abort the rest of the program, returning the value of e as the result. To reflect the idea that the program may now be in one of two possible states, either computing or exiting, we need a way to tag which state we are in. Haskell people have the Either a b type, but in Agda, this is a disjoint sum:

    R = ℕ ⊎ ℕ
    

    Ralf generously provided some helper functions defining an algebra for computing with the disjoint sum, so we translate these to Agda. Constant expressions are injected to the left in the sum:

    const : ℕ -> R
    const i = inj₁ i
    

    Addition inspects the two sides. It propagates right injections upwards, but computes on left injections. This is as Ralf does in his post.

    add : R -> R -> R
    add (inj₁ x) (inj₁ x') = inj₁ (x + x')
    add (inj₁ x) (inj₂ y) = inj₂ y
    add (inj₂ y) y' = inj₂ y
    

    Finally, if we want to exit, we convert a left injection to a right one, and just pass on the right injection.

    exit : R -> R
    exit (inj₁ x) = inj₂ x
    exit (inj₂ y) = inj₂ y
    

    The neat thing about defining an algebra like this for R-computations is that now the evaluation function is almost as simple as in the Razor:

    eval : Exp -> R
    eval (Const y) = const y
    eval (Plus y y') = add (eval y) (eval y')
    eval (Exit e) = exit (eval e)
    

    The only new thing is the Exit case which formalizes our informal definition above: compute eval e and exit with it.

    So what is the point?

    Well, this is Agda, so we can easily analyze the language we have defined for some very simple properties. Remember that I said Agda programs are required to terminate. This comes in very handy when we want to prove that our language terminates, because we have a definition of it in Agda.

    If Agda’s termination checker says that the Agda program terminates, so does our Ralf-augmented Razor. In other words, we immediately have the property. Neat. In the same vein, our language must be deterministic. These properties can be formalized in Agda and proven as well, but they are immediate.
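
    Determinism, for instance, holds simply because eval is a function. As a minimal sketch - the name deterministic is mine - here is how one could state and prove it with the propositional equality we imported at the top:

    deterministic : ∀ {e v v'} -> eval e ≡ v -> eval e ≡ v' -> v ≡ v'
    deterministic refl refl = refl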

    Task 1: Continuations

    Next, we CPS-transform our semantics. We could naively transform the language above into CPS and then provide that, but that would be a boring solution. Nay, the cool thing about CPS is that it makes the control flow of the program explicit. We can exploit this explicit control flow to optimize the handling of Exit expressions in the program.

    module NotNaive where
    

    Ok, so we throw everything into a submodule, NotNaive, to limit what is in scope at any point in time, so naming is not going to trip us up later.

      constK : ℕ -> (ℕ -> ℕ) -> ℕ
      constK i k = k i
    

    This is part of our algebra as before. If we have a constant, how do we handle it? We add another parameter, the k above, which is the kontinuation. In the case of the Razor extension, we can regard k as “the rest of the program”. It has type (ℕ -> ℕ), which sort of makes it a transformer on natural numbers. So if we are to deliver the natural number i to the kontinuation, we apply k i.

    Addition takes two values and delivers their sum to the kontinuation (which is still, intuitively, the rest of the program):

      addK : ℕ -> ℕ -> (ℕ -> ℕ) -> ℕ
      addK v1 v2 k = k (v1 + v2)
    

    The trick comes in the Exit case. We do the following

      exitK : ℕ -> (ℕ -> ℕ) -> ℕ
      exitK i k = i
    

    which is to not deliver the value to k but return it straight away. This means that we abort the evaluation of “the rest of the program” and just return the exit value.

    We now turn our attention towards the evaluation function of the language. Constants are handled in the obvious way by using the algebra directly, but the two other cases are handled differently:

      evalK : Exp -> (ℕ -> ℕ) -> ℕ
      evalK (Const y) k = constK y k
      evalK (Plus y y') k = evalK y (\v1 -> evalK y' (\v2 -> addK v1 v2 k))
      evalK (Exit y) k = evalK y (\v1 -> exitK v1 k)
    

    Take the Exit case first. We evaluate y as in the direct case. But then we need to supply a continuation which says what the rest of the program does. Our continuation assumes it gets passed a value v1, which is the result of the y computation. It then exits with this value. The Plus case is similar, but there are two continuations to juggle. We evaluate y and deliver it to a continuation that first evaluates y' and then delivers to the final continuation, using our addK function from above.

    To run the program:

    run2 : Exp -> ℕ
    run2 e = NotNaive.evalK e id
    

    note that we pass in the identity function id to “start up” the evaluation. This is in line with what Ralf did. Again, Agda’s termination checker immediately provides termination guarantees for the program.
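
    As a small sanity check - the name check is mine - we can even ask Agda to evaluate a program with an Exit in it at type-checking time and compare against the expected result:

    check : run2 (Plus (Const 1) (Exit (Const 2))) ≡ 2
    check = refl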

    Task 2: A Taste of dependent types

    The main reason I set out to do this, however, was to utilize the dependent types Agda has built in. There is something eternally pleasing in defining programs which do not need to compute to answer questions, and here we will see an example of it.

    The task is to create a function pure taking an expression and returning true if the expression is pure and false otherwise. Purity is defined as the absence of an Exit expression. New module:

    module DepAnalysis where
    

    Now, dependent types are more than what the following makes them out to be, I am aware. But as a first foray, it is quite mind-boggling in itself and serves as a great, simple introduction. If we define

      data Exp' : Bool -> Set where
    

    we have declared a dependent datatype Exp' which has a Bool value attached to it. This means we have two types of expressions: Exp' true and Exp' false. These are different types from the view of the type checker. You can read Bool -> Set as “if we supply a boolean value to Exp', it will yield a Set” - Set being the elements of the Agda “world” we are working in. So we can just assume in our world that if an expression is pure, it is Exp' true, and otherwise it is Exp' false.

        Const : ℕ -> Exp' true
    

    This constructor tells Agda that if we supply the Const constructor with a natural number, it will hand us a pure expression. Seems in order, sir! Next constructor:

        Exit  : ∀ {c} -> Exp' c -> Exp' false
    

    This can be read as “if you give me any expression, pure or not - I don’t care - then I will give you a non-pure expression”. This makes sense: there is an Exit in the expression, so it had better not be pure. It doesn’t matter what the internal expression is.

        Plus  : ∀ {c1 c2} -> Exp' c1 -> Exp' c2 -> Exp' (c1 ∧ c2)
    

    In the final case, Plus, we get to supply two expressions. These two expressions have c1 and c2 as their purity values. Agda will hand us back an expression which is pure if and only if both subexpressions are - due to the conjunction (logical and) in the returned index.

    What have we done here? We have created a dependent data type which will itself track whether an expression is pure or not. As we build up the abstract syntax tree, the system will make the necessary calculations. Notice that if the whole AST is known at compile time, then so is the purity of the expression - right away, as the sketch below shows.
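
    For example - the names ex1 and ex2 are mine - the purity index is computed by the type checker as we build the tree:

      ex1 : Exp' true
      ex1 = Plus (Const 1) (Const 2)

      ex2 : Exp' false
      ex2 = Plus (Const 1) (Exit (Const 2))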

    For completeness, we supply the required function pure. Here is the type:

      pure : {b : Bool} -> Exp' b -> Bool
    

    It introduces a new concept, namely implicit parameters. The type {b : Bool} names a parameter which is implicit. We can normally omit implicit parameters when writing functions because they can be extracted from another type in the specification (in this case from Exp' b, which contains the b). However, if we need access to the implicit parameter, we can positionally match it by wrapping it in braces like this:

      pure {x} e = x
    

    So the purity of the expression is already there; we just need to return it! I personally think this is awfully neat.
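
    A small sanity check - the name ans is mine - shows there is nothing left to compute:

      ans : pure (Plus (Const 1) (Exit (Const 0))) ≡ false
      ans = refl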

    I want to play more

    The Agda wiki contains tutorials, the Agda implementation, and all the other cool information on how to play with Agda. The development here is in a gist on github should you want to play with it. Note that the development I made hardly qualifies to show off the really cool features of Agda; it only (barely) scratches the surface of what is possible. My intuition is, however, that we will see much more dependently typed programming in the future, when everyone is programming functionally :)

    Commentary is as always welcome.


  2. Errlxperimentation: Clustering of Erlang Error Reports.

    If anyone wonders what I hacked over the holidays, this is it :)


    In the Erlang/OTP system, there is a registered process with the name error_logger. Whenever an error happens in a correctly configured OTP process, a report is sent to the error_logger. The logger is then responsible for taking the report and, in a pub-sub manner, sending it to any error handler currently configured.

    One of these error handlers, log_mf_h, is a multi-file log handler. It writes reports in binary term format to a set of files. Whenever a file reaches a certain size, it is rotated and a new file is written. The result is that we essentially have a ring buffer of all errors that occur in a running Erlang system.

    A tool named rb, the “Report Browser”, allows us to read in the binary logs and process them. We can list certain errors, “grep” among them, apply filters to the reports, and so on. It gives us a system for looking into a running Erlang system for errors — and handling them if we deem them to be problematic. This tool is very neat when you have a running system to inspect. You can basically leave it on for some time and then go look into the log_mf files for errors and crashes. Since the files are rotated on size, the only worry is that important errors get rotated out of the log. But if you have enough log space, this won’t happen; in a modern world, you can easily spare 100 megabytes for error logs. I have taken the liberty of writing a modified log_mf_h which compresses terms on the fly as they go to disk. This reduces the size of the error reports even more.


    The experiment

    An experiment that has been on my mind for some time was this: can I automatically cluster similar error reports, so that like reports end up together? If this is possible, I could then just pick a representative for each error cluster and use that as a basis for handling like errors. Even if you have thousands of reports, it may be that only a few of them are really different and a lot of them are redundant.

    I know nothing about clustering. But after a question on StackOverflow and a hint from a kind soul on the #erlang IRC channel pointing me in the direction of the paper Ephemeral Document Clustering for Web Applications (2000), CiteSeerX 10.1.1.59.131, I was set for an experiment. The algorithm implemented here is a variant, though, mostly to get something working fast, so it does not implement the full paper. We implement the ideas from there, and the paper can be consulted if anything below is unclear (or comment, and I'll try to explain).

    I altered rb into rerb, short for the riak_err report browser. The Basho guys have an excellent alternative to the sasl application built into OTP, see riak_err, and I used that as a starting point. I am not even sure riak_err is the right place for this, but at least this is what I did for now.


    Processing

    The first step is to convert an error report. They usually look something along the lines of [{badarg, MFA1}, MFa2, ..., MFaN] where each MFA is a tuple {Module, Function, [Arg1, ..., ArgN]} — and each MFa is a tuple {Module, Function, Arity} (arity being the number of parameters of the function). Note that Erlang errors are Erlang terms, so they are easily manipulated by our code. It does make them harder to read for newcomers, however.

    First, we linearize and canonicalize the input. We change the term to [badarg, {Module, Function, length([Arg1, ..., ArgN])}, MFa2, ..., MFaN]. In other words, we flatten the error term and change the occurrence of the MFA to an MFa. As an example, we may have the report [{badarg, {ets, insert, [table_foo, {123, something}]}}], and this is changed into [badarg, {ets, insert, 2}]. This ensures that we do not care about differences in arguments, but only about the arity of the function and the call stack.
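
    A minimal sketch of this step in Erlang — the function name canonicalize is mine, and it only handles the shape described above:

    %% Flatten the {Reason, MFA} head into the stack list and replace
    %% the argument list of the head MFA by its arity.
    canonicalize([{Reason, {M, F, Args}} | Stack]) when is_list(Args) ->
        [Reason, {M, F, length(Args)} | Stack].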

    Next, suppose an error report has the structure [A, B, C] for arbitrary terms A, B and C. We now form the bigrams ['#', A], [A, B], [B, C] and [C, '#']. The '#' is just an atom we use as padding for the boundary terms. For each of these we run the erlang:phash2/2 function to obtain a hash of each bigram: [{phash2(Bg), 1} || Bg <- Bigrams]. Note that we also smacked the hash into a tuple with a value of 1. This list of {Bigram, 1} pairs forms a vector in a (Euclidean) vector space where the bigram part names a dimension of the space. If we form bigrams like this for all error reports and take the union over all bigrams, we have a set of bigrams defining the basis of the space.
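
    A sketch of the bigram and hashing step — the function names are mine, and the hash range handed to erlang:phash2/2 is an arbitrary choice:

    %% Pad the report with '#' at both ends and pair up neighbours.
    bigrams(Terms) ->
        pairs(['#' | Terms] ++ ['#']).

    pairs([X, Y | Rest]) -> [[X, Y] | pairs([Y | Rest])];
    pairs(_)             -> [].

    %% Hash each bigram into a dimension of the vector space.
    vector(Terms) ->
        [{erlang:phash2(Bg, 1 bsl 27), 1} || Bg <- bigrams(Terms)].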

    Now we define a measure called similarity on two vectors A and B of the space by (A · B) / (||A|| * ||B||) = cos(θ), where · is the dot product and ||x|| is the norm of the vector x. This is the cosine of the angle θ between the two vectors A and B. If two vectors point in exactly the same direction, the angle is 0 and the cosine is 1, i.e., perfect similarity. If the two vectors are orthogonal to each other, the similarity is 0.
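
    A sketch of the similarity computation, assuming the sparse vectors are kept as orddicts from bigram hash to weight — that representation is my choice, not necessarily what the branch does:

    similarity(A, B) ->
        dot(A, B) / (norm(A) * norm(B)).

    %% Dot product over the dimensions present in both vectors.
    dot(A, B) ->
        orddict:fold(fun (K, V, Acc) ->
                         case orddict:find(K, B) of
                             {ok, W} -> Acc + V * W;
                             error   -> Acc
                         end
                     end, 0, A).

    norm(A) ->
        math:sqrt(orddict:fold(fun (_K, V, Acc) -> Acc + V * V end, 0, A)).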

    The similarity of a set of vectors is obtained by taking the minimum similarity between any two vectors from the set. To a certain extent this measures how “different” the cluster is, because taking the minimum means the most dissimilar pair decides.
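
    In the same sketched setting, the set similarity could look like this — the convention that a singleton cluster has similarity 1.0 is mine, and a real version must also handle duplicate vectors:

    cluster_similarity([_]) -> 1.0;
    cluster_similarity(Vectors) ->
        %% Minimum pairwise similarity; the most dissimilar pair decides.
        lists:min([similarity(A, B) || A <- Vectors, B <- Vectors, A < B]).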


    Clustering

    We form a forest of singleton clusters, placing each vector/error-report in its own cluster. Then we consider the similarity of each pair of clusters. The two clusters with the highest similarity are joined. The joining also forms a binary tree, a dendrogram, of the two clusters. Whenever we join two clusters, the original clusters are removed, so each join leaves us with one cluster fewer. We repeat this recursively until we have a single binary tree. This forms the cluster dendrogram.
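
    A sketch of the naive join loop. Representing leaves as {leaf, Vector} and inner dendrogram nodes as {node, Similarity, Left, Right} is my choice; join_similarity/2 applies the set similarity from above to the union of the leaves:

    %% Start with cluster([{leaf, V} || V <- Vectors]).
    cluster([Tree]) -> Tree;
    cluster(Trees) ->
        %% Join the most similar pair of clusters and recurse.
        {S, A, B} = lists:max([{join_similarity(X, Y), X, Y}
                               || X <- Trees, Y <- Trees, X < Y]),
        Rest = [T || T <- Trees, T =/= A, T =/= B],
        cluster([{node, S, A, B} | Rest]).

    join_similarity(X, Y) ->
        cluster_similarity(leaves(X) ++ leaves(Y)).

    leaves({leaf, V})       -> [V];
    leaves({node, _, L, R}) -> leaves(L) ++ leaves(R).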

    Listing the clusters is done by choosing a similarity cutoff, say 0.5, and then walking the tree until a similarity of 0.5 is reached. The subtree is then flattened into a cluster of like error reports.
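
    A sketch of the walk, reusing the node shape and leaves/1 from above; nodes below the cutoff are split, and anything at or above it is flattened into one cluster:

    clusters_at({node, S, L, R}, Cutoff) when S < Cutoff ->
        clusters_at(L, Cutoff) ++ clusters_at(R, Cutoff);
    clusters_at(Tree, _Cutoff) ->
        [leaves(Tree)].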


    Looking forward

    The code is in an experimental branch of my github repo, jlouis/riak_err/n-gram-analyzer. Note that I currently make no guarantees of its stability. I may still rebase it, so poke me if you want to base something on it. Currently you can only configure the riak_err config tree like:

    {riak_err,
     [{log_mf, true},
      {error_logger_mf_dir, "log/mf"},
      {error_logger_mf_maxbytes, 10240000},
      {error_logger_mf_maxfiles, 10}
     ]},
    

    and then run rerb:start(). followed by rerb:analyze() to analyze the reports. Listing the clusters is done by rerb:list({clusters, Similarity}) where the similarity is between 0.0 and 1.0. The code has not seen many wars yet, so it is probably quite unstable.

    Another important point is that the algorithm used is a simplification of what the paper linked above does. In particular, the algorithm is at least O(n^3), though I think one can fairly easily lower that to O(n^2 lg n). Also, there are some constants that can be shaved down to improve the clustering speed (it is not that fast at the moment since it walks lists all the way). Finally, it is not parallel, though some parts of it lend themselves to a fork/join style speedup.

    I’d also like to provide the clustering tree in raw form, so you can run arbitrary functions over it. I’ll probably implement a variant that, for each cluster, reports its size and picks three random representatives for it.


    Final words

    The clustering used here seems to work wonderfully well for the error reports I tried it on. It happily clusters up like error reports, and different clusters contain different error reports. The bigrams ensure that we grab the call path rather than just the presence of a function. We only operate on the stack trace, but we could operate on more of the error report.

    Also, I am not too sure riak_err is the right place for this. I did some alterations to log_mf_h and rb which:
    • Enable error reports above 16Kb in size. No limit function is employed yet, though it would be preferable; I could imagine a report with the state of a large heap being pushed.
    • Use compression on terms. Disk IO is slow compared to in-memory compression.
    • Fix some redundancy in the rb grep code by turning the code into a match compiler. This patch can probably flow into OTP after some more testing.
    • Use binary notation. The rb and log_mf_h code is probably so old that Erlang did not have binaries at the time.
    Some of the ideas here are inspired by analysis of stack traces in other languages, Java in particular. But perhaps some of these ideas are applicable to log files in general.

    Finally, any comments, positive or negative, are welcome. Suggestions for improving the ideas here are also much appreciated.