1. A response to “Erlang - overhyped or underestimated”

    There is a blog post about Erlang which recently cropped up. It is well written and puts forth some critique of Erlang/OTP. Naturally, I have a bias: I write a lot of Erlang and I like the language. Anything less than a blog post of my own would not be fair. There is much to discuss, and a fleeting Twitter comment or a comment below the original post can’t convey the information needed.

    Erlang is like an exotic beautiful woman with no dressing sense.


    I love this comment from the article. There is truth in it: underneath the clothes of Erlang, beauty is hiding. Yet, I feel that one might have misunderstood the dress code in the exotic world, which is why I sat down, C-x C-f’ed a new buffer and began writing.

    1. Today’s mainstream developers who are used to C or Java like syntax wont find its Prolog-like syntax too friendly.


    This point, the syntax point, comes up all the time. The claim is that Erlang’s syntax is too far away from “mainstream” languages, whatever that means. I don’t think the critique is warranted, but since we have alternative languages like Efene and Reia, there are at least some people who think Erlang has a syntax problem, so it warrants discussion.

    Perhaps one should be nuanced and hammer through the difference between syntax and semantics. Syntax is, roughly, the set of rules for forming valid sentences in the programming language, whereas semantics is, roughly, the meaning of the language, what it denotes. In other words, syntax is the set of rules for transforming a valid input text into an internal parse tree, and semantics is the set of rules for executing that parse tree, that is, for running the program.

    There is a tendency to focus more on Erlang’s syntax than its semantics. I think this is partially wrong: the semantics shape the syntax and vice versa. I also have a hunch that people may claim a problem with the syntax of Erlang where the real issue is a misunderstanding of its semantics.

    Is Erlang’s Prolog-like syntax unfriendly? I don’t think so! Erlang has extremely few syntax rules and they are quite consistent. The grammar is less than 500 lines. Contrast this with the OCaml parser, which is three times as large. The parser for the Go programming language is well above that size too, and I cannot imagine the parsers for Java or C being any smaller.

    The main objection is familiarity: “It doesn’t look like Java!” I think the point is somewhat moot. Python doesn’t look like Java. Neither does Ruby or Perl. Still programmers have no trouble picking up those languages. Before the C-style languages became dominant, programmers wrote Pascal, COBOL and Algol whose syntax is far from what we expect.

    I expect far more people have trouble with the unfamiliarity of the semantics. If you come from an imperative setting, you need time to wrap your head around functional programming styles and idioms. Yet, unfamiliar semantics should pose no problem either: Python, Javascript, PHP and Java all execute very differently if you look at them modulo imperative execution.

    2. While the core language itself is small and easy to learn, the libraries within the language are inconsistent, incomplete and poorly documented.


    The Erlang/OTP libraries suffer from inconsistency, having been built over time whenever there was a need for a new function. This is indeed unfortunate, but note this: apart from a few libraries which implement their functionality directly in the Erlang VM kernel, most libraries are written in pure Erlang and can be replaced easily. If you hate the lists module, you can write your own lst.

    Some libraries are de facto deprecated, as it is known they have certain limitations. The way Erlang tends to work, however, is that older modules known to have trouble are removed fairly slowly from the language: there may be a user out there, however poor that module is.

    I cannot agree with the claim of bad documentation. Erlang has excellent documentation. There are man pages for every module, accessible with erl -man lists for instance (provided your distribution of Erlang is correctly built), and we have several online places where you can look up function definitions. In addition, many modules have user’s guides which you can use to get started quickly. There is even an efficiency guide so you know how to write efficient code, and there is a set of programming conventions with good advice on how to structure your programs.

    Usually the documentation of functions is rather good, I think. If you find something which you think is poorly documented, I’d advise you to make a patch against the documentation and discuss the change with others. Improving documentation is more important than ranting about its inadequacy.

    3. Only a few people have written production level codes and you rarely get to hear from them.


    I have written production level code in Erlang, namely a BitTorrent client good enough for everyday use. Our performance is currently as good as most other BitTorrent clients, CPU and memory-wise, even though we are writing the client in a “slow” language. I have also written programs professionally in Erlang - for the Web, but I am getting ahead of myself.

    I try to write about my experiences - in part to tell stories, in part to educate and encourage others to pick up the language. No programming language is a silver-bullet. But when your task is to write highly concurrent programs Erlang shines. And if you look at the usual protocols for distributed computing on the web, http-servers, xmpp-servers, ad-serving frameworks, payment processing systems, BitTorrent-clients, video-servers, and so on — you find that many of them are highly concurrent.

    Successful systems that just work will not catch headlines. A system that is chock-full of errors will. Many Erlang programs also run in companies with no open-source policy - don’t expect the programmers of those systems to even be able to talk about what they do. It is a competitive advantage to keep your mouth shut.

    4. I can’t imagine how you can organize large code-bases in Erlang or even work as team, and this doesn’t feel right to any OO programmer.


    Large Erlang programs are structured around the concepts of applications and releases. You write several isolated functional units, applications. Then you bundle these into a release which you can ship to the production environment. A typical application will provide an abstraction for something, be it running an HTTP server, talking to a database, controlling an external CPU-centered numerical program and so on.

    The organization of large programs hinges on API design. You want to design your program such that each application has a small API used by the other parts of the program. There may be thousands of lines of code behind the wall of the API, yet the interface to the rest of the world is small.

    The trick of OO languages is “abstraction is had by introducing another object”. If you take a look at the OO design patterns, you will find that often a new object is introduced to mediate and solve an abstraction problem. This is because the only way to abstract in those languages is to construct new objects (or classes, depending on language).

    In Erlang, the mantra is “abstraction is had by introducing another process”. In other words, you can usually solve abstraction problems by introducing a mediating middle-man process, storing state in a separate process, splitting a computation between several processes and so on. The OO property of isolation, so important to structuring large programs, is present in the isolation of processes: you can’t go rummaging around inside the memory heap space of another process, you must ask it gently and nicely.
    Naturally, this model has a design-pattern language as well and I know of several such. Remember this: “In OO-languages state is kept in objects; in Erlang, state is kept in processes”.

    5. Most of the performance matrices are one-sided, and are performed by people who have an interest in Erlang. I would love to see some independent analysis.


    In general, you should be wary of performance measurements made by someone who does not fully understand the platforms they are working with. It is hard to make a program perform better but it is extremely easy to make a program perform worse. To improve a program you must understand the rules of the game you are playing. The rules used to speed up node.js, for example, are much different from the rules used in Erlang. And those are different again from OCaml, Scala, Java, Python, …

    Also, workarounds for problems tend to be vastly different. A recent blog post of mine lays bare a curious property of node.js, but the seasoned Node programmer understands how to work around it. He or she may deploy multi-node, for instance, which fixes a lot of the problems by using a single accept() queue for several workers. This is a nice way to break the rules of the game to avoid a problem.

    Rather than thinking in terms of performance, I would argue you should think in terms of what your problem is. Erlang shines when a fully-preemptive, heavily concurrent process model is a good solution to your problem. It is powerful in that respect and it has the advantage that it is a very mature system.

    6. Its support for web-development is very primitive. With web frameworks like rails and grails, there is a lot of serious work for Erlang if it ever intends to go to that market.


    I don’t think this is true. Web frameworks like Rails or Grails only talk about half of the web world. Clients in modern systems tend to be JavaScript-heavy and only go to the server for their RPC Ajax-like requests. In this world, you need a lot less tooling on the server side. There are many web frameworks popping up for Erlang currently, but let me plug the nitrogen project.

    Yet, I think we will see much less need for web-frameworks as they were. We will need a new kind of framework which is much easier to work with server-side. And I think node.js shows the server doesn’t need a lot of stuff to be effective.

    You should also think about the emerging alternatives to RDBMS data storage. There are systems such as couchdb and riak, which can cleverly bypass some of the usual Model-View-Controller pattern. I think we are in for a change for the better in the way we do web development, and that Grails and Rails are a thing of the past if they don’t adapt to the new world (I am sure Rails will, but I have too little experience with Grails to know whether it will stagnate or not).

    7. Did I talk about Strings in Erlang? IO speed?


    This single item is worthy of a blog post in itself. First the strings.

    A string in most languages is a sequence of bytes, pointed to by reference. In some languages the string is the central data you pass around, and in some it is the only kind of data you can pass around. Haskell and Erlang most notably define a string to be a list of characters and a list of integers, respectively. There is much good and bad to be said about this choice - but it hurts people who don’t understand the difference.

    Most web systems manipulate strings. The string is the ubiquitous data format: it stores integers, it stores floats, it stores XML documents, JSON, and such. The string is easy to transfer over a TCP/IP stream. It is no surprise that many languages center around string manipulation and are very effective at it. Perl is perhaps the ultimate string processor (apart from crm114, naturally).

    The ubiquity of strings is also their Achilles heel. The type information they carry is weak; nonexistent, to be precise. To manipulate a string in any statically typed language (Java, OCaml, Haskell, etc.), you need to transform it into an internal format. You process the string into an object hierarchy or an algebraic data type, and then you throw the string away! The new representation has all the advantages: it is typed, it can carry additional information in object state, and it can make illegal states unrepresentable.

    You should never ever work directly with strings if performance matters. Even simple things like string comparisons may be fairly expensive (if the pointer comparison says different), whereas an atom comparison is not. The world of programming is more complex than just shoving every piece of data into a string.
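
    To make the "parse at the border, then throw the string away" idea concrete, here is a minimal sketch in JavaScript. JavaScript is dynamically typed, so this only shows the shape of the idea rather than the static guarantees discussed above. The helper name parseRequestLine and the request-line format are assumptions chosen for illustration; they are not taken from any particular library.

```javascript
// Hypothetical helper: parse an HTTP-style request line once, at the border.
function parseRequestLine(line) {
  const parts = line.trim().split(" ");
  if (parts.length !== 3) {
    throw new Error("malformed request line: " + line);
  }
  const [method, path, version] = parts;
  const match = /^HTTP\/(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error("unsupported version: " + version);
  }
  // Structured representation: the raw string is not needed after this point.
  return {
    method: method,
    path: path.split("/").filter(function (s) { return s.length > 0; }),
    version: { major: Number(match[1]), minor: Number(match[2]) }
  };
}

const request = parseRequestLine("GET /users/42 HTTP/1.1");
// From here on, the program works with fields, not with the raw string.
console.log(request.method, request.path, request.version.major);
```

    Malformed input is rejected once, at the edge, so the rest of the program never has to re-inspect the raw text.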

    Another weakness of the string is that the representation only answers to query by regular expression, recursive descent or LALR(1)-parsing. Some languages are very good at the former, regex query, but Erlang is not one of them since regular expressions are not built into the syntax and semantics.

    So the first virtue of the Erlang programmer: Convert a string as fast as possible into an erlang-term() and then manipulate the term. Only work with crappy weakly-typed strings at the border of your application. An Erlang application should not constrain itself to work with only a single data type, namely strings!

    The second virtue follows fast: If your string is large, use a binary() for effective storage and sharing. The binary representation, like the ByteString in Haskell, is as space efficient as C and it can be pattern matched if needed.

    The third virtue is: Know thine iolists. When you construct strings in Erlang, you are not to build a sequence of characters! You should be building a tree of small string-like fragments, binaries, other trees, lists and so on. The output functions know how to effectively walk the tree and stream it to the output device.
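
    The idea behind iolists can be sketched outside Erlang as well. The following JavaScript is an analogy only: JavaScript has no iolists, and the function name writeFragments is made up for this illustration. It shows how output can be assembled as a tree of small fragments and only walked when it is written, instead of being concatenated into one long string along the way.

```javascript
// Analogy only: a nested array of fragments standing in for an Erlang iolist.
// `write` is whatever sink the caller provides (a socket, a file, a buffer).
function writeFragments(tree, write) {
  if (Array.isArray(tree)) {
    for (const child of tree) {
      writeFragments(child, write);
    }
  } else {
    write(String(tree));
  }
}

// Build the output by nesting, never by concatenating.
const header = ["<h1>", "Hello", "</h1>"];
const page = ["<html>", [header, ["<p>", "World", "</p>"]], "</html>"];

const chunks = [];
writeFragments(page, function (s) { chunks.push(s); });
console.log(chunks.join(""));
```

    Reusing header inside page costs nothing beyond a reference; the fragments are only visited once, at output time.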

    The IO performance of Erlang is pretty good. Some early tests in Etorrent easily moved 700 megabit on a single 1.2 GHz Pentium M CPU, without any optimization at all.

    Yet, it is important to notice that IO in Erlang is abstracted by default, and this makes it a tad slower than it could be. The abstraction is rather nice and has to do with distribution. You can access a socket or file on another machine as if it were local. But this neat abstraction naturally has an overhead. Of course it is easy to build a primitive which throws away that abstraction if needed. And that will definitely run as fast as any other language.

  2. Differences between Node.js and Erlang

    Suppose we have a canonical ping/pong server written in Node,
    var http = require("http");
    http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      res.end("Hello, World\n");
    }).listen(8124, "127.0.0.1");
    // console.log replaces the long-deprecated sys.puts
    console.log("Server running at http://localhost:8124");
    
    We can run this server easily and test it from the command line:
    jlouis@illithid:~$ curl http://localhost:8124
    Hello, World
    
    And it does what we expect. Now suppose we do something silly. We make a tiny change to the Javascript code:
    var http = require("http");
    http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      res.end("Hello, World\n");
      while (true) { // Do nothing, and never return to the event loop
      }
    }).listen(8124, "127.0.0.1");
    console.log("Server running at http://localhost:8124");
    
    Now, the first invocation of our test works, but the second hangs:
    jlouis@illithid:~$ curl -m 10 http://localhost:8124
    Hello, World
    jlouis@illithid:~$ curl -m 10 http://localhost:8124
    curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
    
    This should not surprise anybody. What we have illustrated here should be common knowledge: Node does not multitask preemptively but asks each event handler to cooperate by yielding to the next one in turn.
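
    The same effect can be observed without a web server. The sketch below is an illustration with made-up names and an arbitrary loop size: it schedules a timer callback and then runs a synchronous loop. The callback cannot run until the loop has returned control to the event loop, no matter how short the requested delay is.

```javascript
const events = [];

// Ask for a callback "as soon as possible".
setTimeout(function () {
  events.push("timer fired");
  console.log(events);
}, 0);

// A synchronous loop that never yields to the event loop.
let x = 0;
while (x < 10000000) {
  x++;
}
events.push("loop finished");
```

    When the timer callback finally prints the array, "loop finished" comes first: the callback had to wait for the loop, just as the second curl request had to wait above.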
    The example was silly. Now, suppose we have a more realistic example where we do work, but it completes:
    var http = require("http");
    http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      var x = 0;
      while (x < 100000000) { // Busy work: count up
        x++;
      }
      res.end("Hello, World " + x + "\n");
    }).listen(8124, "127.0.0.1");
    console.log("Server running at http://localhost:8124");
    
    We introduce a loop which does some real work, and we make sure it is not dead code by using its result in the output. Our server will still respond, but it will take some time before it does so.
    Let us siege the server:
    jlouis@illithid:~$ siege -l -t3M http://localhost:8124 
    ** SIEGE 2.69
    ** Preparing 15 concurrent users for battle.
    The server is now under siege...
    [..]
    
    For three minutes, we hammer the server and get a CSV file, which we can then load into R and process.


    Erlang enters…

    For comparison, we take Mochiweb, an Erlang webserver. We do not choose it specifically for its speed or its behaviour. We choose it simply because it is written in Erlang and it will context switch preemptively.
    The relevant part of the Mochiweb internals is this:
    count(X, 0) -> X;
    count(X, N) -> count(X+1, N-1).
    
    loop(Req, _DocRoot) ->
      "/" ++ Path = Req:get(path),
      try
        case Req:get(method) of
          Method when Method =:= 'GET' ->
            X = count(0, 100000000),
            Req:respond({200, [], ["Hello, World ", integer_to_list(X), "\n"]});
          [..]
    
    It should be pretty straightforward. We implement the counter as a tail-recursive loop and we force its calculation by requesting it to be part of the output.
    erl -pa deps/mochiweb/ebin -pa ebin
    Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:2:2]
    [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
    1> application:start(erlang_test).
    {error,{not_started,crypto}}
    2> application:start(crypto).     
    ok
    3> application:start(erlang_test).
    ** Found 0 name clashes in code paths 
    ok
    4> 
    
    Notice that we get both my CPUs to work here automatically. But performance is not the point I want to make.
    Again, we lay siege to this system:
    jlouis@illithid:~$ siege -l -t3M http://localhost:8080 | tee erlang.log
    

    Enter R

    We can take these data and load them into R for visualization:
    > a <- read.csv("erlang.log", header=FALSE);
    > b <- read.csv("node.js.log", header=FALSE);
    > png(file="density.png")
    > plot(density(b$V3), col="blue", xlim=c(0,40), ylim=c(0, 0.35));
    > lines(density(a$V3), col="green")
    > dev.off()
    > png("boxplot.png")
    > boxplot(cbind(a$V3, b$V3))
    > dev.off()
    

    Discussion

    What have we seen here? We have a situation where Node.js has a much more erratic response time than Erlang. We see that while some Node.js responses complete very fast (a little more than one second) there are also responses which take 29.5 seconds to complete. The summary of the data is here for Node.js:
    > summary(b$V3)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1.040   6.328  13.580  13.940  20.940  29.590 
    
    And for Erlang:
    > summary(a$V3)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       9.87   11.21   12.24   12.21   13.16   15.32 
    

    The densities are (green is Erlang, blue is Node.js)
    [Figure: density plot]

    And for completion, a boxplot:
    [Figure: boxplot]


    This is a result of Erlang preemptively multitasking the different processes, so its responses all come in at around the same time. You can’t really use the mean for anything: Erlang ran on 2 CPUs whereas Node.js only ran on one. But the kernel density plot clearly shows how Erlang responds stably while the response times of Node.js are erratic.
    Does this mean Node.js is bad? No! Most node.js programs will not blindly loop like this. They will call into a database, make another web request or the like. When they do this, they will allow other requests to be processed in the event loop and the thing we have seen here is nonexistent. It does however show that if a Node.js request is expensive to process, it will block other requests from getting served. Contrast this with Erlang, where cheap requests will get through as soon as we switch context preemptively.
    It also hints that you need to plot histograms for your services (kernel density plots are especially nice for showing how the observations spread out). You may be serving all requests, but how long does it take to serve the slowest one? A user might not want to wait 30 seconds for a result, but he may accept 10 seconds.
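
    If a full density plot is too much, a few percentiles already tell you more than the mean. The following is a minimal sketch using the nearest-rank method; the function name percentile and the sample values are made up for illustration and are not the measurements from this post.

```javascript
// Nearest-rank percentile of an array of response times (in seconds).
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  // Sort a copy numerically; the default sort would compare as strings.
  const sorted = samples.slice().sort(function (a, b) { return a - b; });
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up response times, for illustration only.
const times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0];
console.log("median:", percentile(times, 50));
console.log("p90:", percentile(times, 90));
console.log("max:", percentile(times, 100));
```

    In this made-up sample the median looks harmless while the maximum is the 30-second outlier, which is exactly the kind of spread a single average hides.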


    Conclusion

    My main goal was to set out and exemplify a major difference in how a system like Node.js handles requests compared to Erlang. I think I have succeeded. It underpins the idea that you need to solve problems depending on platform. In Node.js, you will need to break up long-running jobs manually to give others a chance at the CPU (this is essentially cooperative multitasking). In Erlang, this is not a problem, and a single bad process can’t hose the system as a whole. On the other hand, I am sure there are problems for which Node.js shines and which will have to be worked around in Erlang.
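
    Breaking up a long-running job by hand might look like the sketch below. It is one possible approach under stated assumptions, not the only one: the function name countChunked and the chunk size are invented for this example, and the loop bound is smaller than the one used in the test above so that it finishes quickly. The counting loop runs in slices and hands control back to the event loop between slices with setImmediate.

```javascript
// Count to `limit` in slices of `chunk` iterations, yielding between slices.
function countChunked(limit, chunk) {
  return new Promise(function (resolve) {
    let x = 0;
    function step() {
      const stop = Math.min(x + chunk, limit);
      while (x < stop) {
        x++;
      }
      if (x < limit) {
        // Yield: pending I/O callbacks and other requests get to run first.
        setImmediate(step);
      } else {
        resolve(x);
      }
    }
    step();
  });
}

countChunked(10000000, 1000000).then(function (x) {
  console.log("Hello, World " + x);
});
```

    The price is that the programmer has to choose the slice size and restructure the loop, which is precisely the manual cooperation that a preemptive scheduler does for you.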

    EDIT: Minor spelling correction.

I am jlouis. Pro Erlang programmer. I hack Agda, Coq, Twelf, Erlang, Haskell, and (Oca/S)ML. I sometimes write blog posts. I enjoy beer and whisky. I have a rather kinky mind. I also frag people in Quake.