1. Suppose you have two functions:
    (* Type: int * int -> int *)
    fun f (x,y) = x + y
    
    (* Type: int -> int -> int *)
    fun g x y = x + y
    
    These functions do the same thing, namely add two numbers 'x' and 'y'. They have different types, however, and they are called in different ways. 'f' is the usual style in most languages: we take a pair and return the result. 'g' is curried: it is a function that takes an 'x' and produces a new function taking a 'y'; that produced function returns the result when given a 'y'. Interestingly, 'g' is more powerful than 'f'. If we write
    (* Type: int -> int *)
    g 3
    
    we get a so-called partial application of 'g'. This means we only supply the first parameter to the function. It allows us to optimize with respect to the first parameter, and it is a good way to define very general functions and then specialize them into what we need later on. Here is an example:
    (* Type: int -> int -> int *)
    fun h x =
     let val z = expensive_operation x
     in (fn y => y + z)
     end
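
    We have defined a function which carries out some expensive operation between getting its first and its second parameter. Here is a minimal sketch of the payoff; 'expensive_operation' is a hypothetical stand-in (assumed declared before 'h') for any costly function of type int -> int:

    (* Hypothetical stand-in for a costly computation; the print
       makes it visible how often it actually runs. *)
    fun expensive_operation x = (print "expensive!\n"; x * x)

    val foo = h 37    (* "expensive!" is printed exactly once *)
    val a = foo 1     (* reuses the cached 'z'; no recomputation *)
    val b = foo 2     (* likewise *)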
    
    An application such as 'val foo = h 37' computes the expensive operation only once; each later use of 'foo' reaps the benefits.

    Moscow ML is a quasi-implementation of the Standard ML programming language. It does not follow the standard completely, but it follows it closely enough to be usable and to let simple code flow between SML implementations without too many problems. What does not flow as easily is the module system, which Moscow ML treats somewhat differently from e.g. SML/NJ or MLton.

    Moscow ML works by taking SML source code through a lexer and a parser into an abstract syntax tree. Then the types of all expressions are inferred in an 'elaboration' phase. Next, modules are stripped and pattern matches are simplified via a pattern-match compiler. Now the code is translated into an internal language, Lambda, which is a lambda calculus plus some extra constructs. Finally, Lambda is translated into bytecode for a virtual machine. The Moscow ML runtime is written in C and interprets the bytecode output by the compiler. The engine is neat, but it suffers somewhat from age; several speedups are readily possible on this machine with relatively little effort.

    The naive implementation of the 'g' function from before follows the semantics directly: take the 'x', produce the intermediate function as a closure, and then invoke the closure on 'y'. Producing the closure is inefficient, however, and we don't always have to! In 'g x y' we know all the arguments, so we can avoid building the intermediate function. Moscow ML relies on a derivative of the ZINC abstract machine as its VM. ZINC was an experiment by Xavier Leroy in building a virtual machine for the Caml language. It is interesting because curried arguments do not build intermediate closures unless they have to. In other words, using the curried, more powerful variant of a function has no extra cost in this machine.

    I want to compile Moscow ML to LLVM bitcode. The idea is to compile from Lambda down to LLVM, which is not that different from what the OCaml compiler is currently doing. In the process, I would like to carry over the ZINC machine's good handling of curried arguments. I am taking some preliminary steps to get this done, but don't expect anything wild in the next couple of months. Getting this right takes a lot of time, since I am writing it as a hobby project for the time being. I might post some updates later when the project has progressed further.

    References -- if you know any reading which might be interesting, do not hesitate to write me:

    • The ZINC Experiment
    • How to make a fast curry (I love Simon Peyton Jones's paper names)
    • The Mosml-llvm code lives here

  2. In this series of posts, we want to parse expressions. In order to make it worthwhile and fun, we will begin by placing some constraints on the task.

    • We will be using the language Standard ML to describe the approach, though note that any functional language of a certain power can do this. Converting it to OCaml, Haskell, Erlang or a Lisp should be pretty straightforward once you get the hang of the idea. In some sense, Standard ML is the simplest of the (statically typed) functional languages in that it has almost no fancy tricks you can pull.
    • We will not be using regular expressions. Not that regular expressions are bad, but I want to show that this can be done easily without them. The goal here is to show that there are other approaches to parsing data than just throwing a regular expression at the problem.
    The language we will study has the following simple syntax structure, given in BNF form:
      e ::= x | n | e1 + e2 | e1 - e2 | e1 * e2
    
    The 'x' is a variable. Its value can range over integers (we won't be concerned with the actual range of these integers, although it is interesting to muse over). The 'n' is some integer constant. The term 'e1 + e2' is the addition of two sub-expressions. Likewise for subtraction and multiplication. Here is a typical expression:
      x + 3 * (y - 5)
    
    Note that the parentheses are not given in the syntax. We just assume their existence, and they work as you would expect: grouping terms together to force evaluation order.

    Writing down the syntax in Standard ML is straightforward:
    datatype exp = Var of string | Const of int | Add of exp * exp
               | Sub of exp * exp | Mul of exp * exp
    
    Hopefully it is clear what is going on here. We have translated the syntax into an ML datatype. To discriminate each construct, we have added the constructors 'Var', 'Const', 'Add', 'Sub', and 'Mul'.

    If we can somehow bring strings of the form 'x + 3 * (y - 5)' into this datatype (correctly!) -- and we know what 'x' and 'y' are -- then we are able to evaluate the expression. Here is how:

    fun eval_exp env exp =
      case exp of
        Var x        => env x
      | Const n      => n
      | Add (e1, e2) => (eval_exp env e1) + (eval_exp env e2)
      | Sub (e1, e2) => (eval_exp env e1) - (eval_exp env e2)
      | Mul (e1, e2) => (eval_exp env e1) * (eval_exp env e2)
    
    So we discriminate on the form of 'exp'. If it is a 'Var', we look up the variable in the environment 'env', which we will define shortly. If it is a 'Const', we just return the integer value. Finally, if it is a binary operator, we evaluate the two sub-expressions and apply the corresponding ML operator to the results.

    We still need to give a meaning to the environment 'env'. Since our variables are strings, we would like to represent it as a map from strings to integers (i.e., a function of type 'string -> int'). Fortunately, this is straightforward in Standard ML. Suppose we want to represent the mapping where 'x' maps to 2 and 'y' maps to 7:

    exception env_notfound of string
    fun test_env v =
      case v of
        "x" => 2
      | "y" => 7
      | _   => raise (env_notfound v)
    
    We define an exception for the case where we don't know about the variable and raise it if the variable is unknown. In Standard ML, '_' is a wildcard standing for "any other value."
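
    Since the environment is just a function, we can also build one from an association list. Here is a small sketch; the helper name 'mk_env' is ours:

    (* Build an environment function from an association list. *)
    fun mk_env pairs v =
      case List.find (fn (k, _) => k = v) pairs of
        SOME (_, n) => n
      | NONE        => raise (env_notfound v)

    val test_env' = mk_env [("x", 2), ("y", 7)]

    Note how partially applying 'mk_env' to the list yields exactly the 'string -> int' function the evaluator expects.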

    Let us round off by playing with what we have. If we manually parse the term 'x + 3 * (y - 5)', it becomes:
    val term = Add (Var "x", Mul (Const 3, Sub (Var "y", Const 5)))
    
    So if we try to run the term with our environment from above, we expect the result to be '2 + 3 * (7 - 5) = 2 + 3 * 2 = 2 + 6 = 8'. Here is what our Standard ML system says:
    - eval_exp test_env term;
    val it = 8 : int
    
    In the next part, we will write the parser that automates turning such a string into the datatype value of 'val term'.

  3. There is a common saying that Erlang's I/O layer is slow. Bullshit. The etorrent program can easily sustain 70 megabit with less than 5% CPU usage on my old machines. The reason is that etorrent mostly shuffles 16k binary blocks around. That means you have a pointer to each 16k block, and you are only a writev(2) call away from the disk. So I/O is not itself a problem.

    The problem people often hit is Erlang's string handling. In Erlang a string can either be represented as an immutable array of 8-bit characters, called a binary, or as a list of integers (i.e., a linked list of integers). The latter is mutable in the sense that you can generate new integers and update the linked list, from the front only. Whenever you mutate the list of integers, garbage is generated. Allocation will happen, and this directly translates into more garbage-collection pressure later on. Had you had destructible arrays in Erlang, it would have been very easy to circumvent the worst performance problems, but you will have to do with a limited zoo of types. In better languages, like OCaml, Standard ML or Haskell, your type zoo is much richer here. You can choose between several immutable and mutable variants, so the leverage is much greater.

    First trick: Erlang has the concept of an IO list. These are lists of binary/string chunks. They are fast because they lend themselves to a writev(2) at the bottom of the system. Thus you don't have to concatenate strings in the Erlang language; you can hand a quasi-concatenated list to the VM and let C do the heavy lifting for you (a small sketch follows at the end of this post).

    Second trick: Mutating data as strings is slow, but an abstract tree of tuples can be manipulated fast. If you think about the expression [Bin | IOList], where we add a binary to the front of an IO list, you'll know it happens in O(1) (constant) time. If we build a tuple {atom, D1, D2, ...}, we can build it in constant time. So the trick is to keep your data in a non-string form for as long as possible and only turn it into a string at the very last point in time.

    Third trick: Use the correct data structure. Ropes. Finger trees. M-way trees. Don't brute-force yourself over a string. Process the structure by pattern matching in a clever way so you minimize the amount of change needed. A good tree will need around logarithmic change where a list needs linear change.

    This leads to the next point: allocation. Another crap problem in Erlang is that you have few (traditional) ways to minimize allocation. Any kind of operation will allocate data if you are not wary of it. Hence you should try to build programs that do not gloss over data and allocate all the time. Mapping over lists via list comprehensions is a good example: it is easy to understand, but it will build a new list, and garbage collection will cost you. If you can do multiple things in one list comprehension, you win traversals and allocations.

    Whenever I hear about "slow string processing" in Erlang, I immediately think: "Another Ruby programmer trying to cram his language into Erlang". If you hit slowness, the trick in Erlang is to redefine the problem! Don't try to string-process yourself into oblivion. If the sole thing your problem concerns is string processing, go use Perl. Please. Or be clever and use OCaml. Pre-process the data in other languages and turn it into Erlang terms which Erlang can read fast. Erlang's power is its concurrency, at which it is really good. There was no reason to give it fast string processing.

    Also, don't get bitten by the mnesia bug, please. It is almost always the case that mnesia and ETS are the wrong way to go. I got bitten by it multiple times in etorrent, and now I am contemplating ripping all of it out again. Plain old data structures, baby! There are some nasty caveats to mnesia and ETS, and they are not related to the size limit of the database. For mnesia, be aware that a failed transaction will be re-run! Now take 100 transactions tripping over each other: they will never finish. Etorrent did that when peers should update their state. Instant snail-speed application. Mnesia is crap at doing joins; just forget it. ETS is also problematic: any store to or get from ETS means a data copy. Don't do as etorrent did and add large rows (128k+) to ETS. Your app will come to a grinding halt.

    Don't message big data structures around. Any message incurs a copy (in the default VM). If you need to work with a big data structure, shove it into a process and then send the Pid of that process around or register it globally. You can think of a Pid as "passing a pointer around" in this case.

    And finally: Measure, measure, measure! Be scientific about your performance problems. You need to know exactly what causes them before jumping to conclusions. And don't worry too much about being parallel at the beginning. You can always split things up at finer granularity later.
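
    To make the first trick concrete, here is a minimal sketch; the variable names are ours, 'Name' is assumed bound to a binary and 'Socket' to a connected socket, and gen_tcp:send/2 is used because it accepts an IO list directly:

    %% Build the output as an IO list: nothing is ever concatenated in
    %% Erlang; the VM receives a list of chunks it can write out at once.
    Greeting = [<<"Hello, ">>, Name, <<"!\n">>],
    Framed   = [<<"log: ">> | Greeting],   %% O(1) prepend, no copying
    ok = gen_tcp:send(Socket, Framed).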

  4. This is a little set of tools that I like to use now and then. I name them by their Ubuntu/Debian package names, although I think one can find them easily in any package system out there.
    • apg - The "automated password generator". It generates pronounceable passwords. It takes a lot of options as to how the passwords should look, what they must contain, etc. It is a handy tool for generating passwords which are hard to guess but still easy to memorize.
    • mp3gain - Music today is mastered at different loudness levels, so you will search for your volume button all the time because some tracks are much louder than others. This tool can analyze mp3 files and hint about the loudness level so programs can adjust accordingly. Or it can adjust the volume of the mp3 file in a lossless way (using a nice property of the MDCT in mp3s).
    • password-gorilla - A password safe written on top of "pwsafe" in Tcl. The main advantage of using this is that there are Windows and OS X programs able to read the format as well.
    • stow - A manager for the hell that is /usr/local. You create /usr/local/stow and proceed to compile programs with "--prefix=/usr/local" but run "make install prefix=/usr/local/stow/foo-1.2.3". Now you can cd into /usr/local/stow and call the "stow" command on "foo-1.2.3". This creates a symlink farm in /usr/local. Want to get rid of the program? "stow -D" it, followed by deletion. This is nice for handling seamless upgrades/downgrades of non-packaged software as well.
    • vorbisgain - Like mp3gain, just for Vorbis files.
    • graphviz - Hate drawing diagrams? Now you can program diagrams instead! Graphviz takes a description of a graph (not a plot!) and proceeds to lay out the graph for you. It uses some really good layout algorithms, so you often get extremely neat graphs if you know a bit about the tool. Most kinds of graphs can be drawn easily. There are also tools for post-processing drawn graphs if you want to do things a bit differently than what the tool thought, but you rarely need that. It has a Cairo rendering backend and can output to PDF, PS, PNG, SVG, etc. The real power of graphviz shows when your graph changes a lot: you just add vertices and edges, and the system figures out the layout itself. That is far easier than editing the graph in a drawing tool most of the time.

  5. Most system admins know about rsync. It is a really good tool for synchronizing between different machines because it only moves data that has changed. It lacks some things, however; one of them is interactivity. Unison is a tool written by Benjamin C. Pierce and a couple of other guys. Pierce is a type-theory researcher, and he likes writing OCaml programs, but you should not let that put you off Unison. It uses the rsync protocol down below ;) When it syncs two sites, it will ask you which way to propagate each change. It is also much more clever when it comes to shortcutting work. It can, for instance, detect that a file was moved, so it can use a local copy of that file in the sync process instead of transferring it again. My preferred usage is to sync directory contents between machines I work on. It often happens that I have worked on the machines in a way such that a two-way sync is needed, and Unison gives you that right away, while it would be a pain in rsync. Pierce now works on the Boomerang language, which is really interesting, but I'll save that for a later post.

  6. One problem that might occur in long-running programs is memory fragmentation of the VM heap. The problem is pretty simple: as you allocate and deallocate memory in the program, you might end up with small "holes" all over your heap which are too small for the new data you need. In a page-allocated VM-heap world, this has a serious cost in memory usage. There are several ways to avoid the problem.

    The first is to be aware of the problem and be smart when allocating memory. With the right amount of thought you can often get around the memory fragmentation, or at least minimize it.

    The second trick is to restart the application periodically. It is not as bad as it sounds on a UNIX system: you set up the application and fork() off a child to do the hard work. When the child has been running for some time, you kill it and fork() off another from the (non-fragmented) parent. Apache 1.x used this.

    The third trick is to allocate a big region of memory, keep a pointer to the first unallocated word, and then reset the pointer when you are done with your request and don't need the memory anymore. Interestingly, there has been considerable research into automating this idea; it is known as "region inference" in the automated setting, but here it is used manually, in e.g. C, to achieve the same thing. The Subversion project used a region handler some years ago when I looked at their code; I don't know if they still do. Poul-Henning Kamp uses the reset trick in Varnish as well.

    The fourth trick is to use a good garbage collector. Garbage collectors have the advantage that they are allowed to move data around, so the good ones periodically compact live data to one end of the heap and clear up fragmentation in the process. Note that most garbage collectors used in "scripting languages" like Python or Ruby are pretty weak. They usually avoid doing "real" garbage collection and opt for simple poor man's solutions. That is unfortunate, because it puts GC in a bad light. That, and Java-enterprise-behemoths abusing the VM heap :)
