1. A response to “Erlang - overhyped or underestimated”

    There is a blog post about Erlang which recently cropped up. It is well written and puts forth some critique of Erlang/OTP. Naturally, I have a bias: I write a lot of Erlang and I like the language, and anything less than a blog post of my own would not be fair. There is much to discuss, and a fleeting Twitter comment or a comment below the original post can't convey the information needed.

    Erlang is like an exotic beautiful woman with no dressing sense.


    I love this comment from the article. There is truth in it: underneath the clothes of Erlang, beauty is hiding. Yet, I feel that one might have misunderstood the dress code in the exotic world, which is why I sat down, C-x C-f'ed a new buffer and began writing.

    1. Today’s mainstream developers who are used to C or Java like syntax won’t find its Prolog-like syntax too friendly.


    This point, the syntax point, comes up all the time. The claim is that Erlang's syntax is too far away from "mainstream" languages, whatever that means. I don't think the critique is warranted, but since we have alternative languages like Efene and Reia, at least some people think Erlang has a syntax problem, so it warrants discussion.

    Perhaps one should be nuanced and hammer home the difference between syntax and semantics. Syntax is, roughly, the set of rules for forming valid sentences in a programming language, whereas semantics is, roughly, the meaning of the language, what it denotes. In other words, syntax is the set of rules for transforming a valid input text into an internal parse tree; semantics is the set of rules for executing that parse tree, running the program.

    There is a tendency to focus more on Erlang's syntax than its semantics. I think this is partially wrong: the semantics shape the syntax and vice versa. I also have a hunch that people may claim a problem with the syntax of Erlang, where the point is really a misunderstanding of its semantics.

    Is Erlang's Prolog-like syntax unfriendly? I don't think so! Erlang has extremely few syntax rules and they are quite consistent. The grammar is less than 500 lines. Contrast that with the OCaml parser, which is three times as large. The parser for the Go programming language is well above that in size as well, and I cannot imagine the parsers for Java or C being any smaller.

    The main objection is familiarity: "It doesn't look like Java!" I think the point is somewhat moot. Python doesn't look like Java. Neither does Ruby or Perl. Still, programmers have no trouble picking up those languages. Before the C-style languages became dominant, programmers wrote Pascal, COBOL and Algol, whose syntax is far from what we expect today.

    I expect far more people have trouble with the unfamiliarity of the semantics. If you come from an imperative setting, you need time to wrap your head around functional programming styles and idioms. Yet, unfamiliar semantics should pose no problem either: Python, Javascript, PHP and Java all execute very differently if you look at them modulo imperative execution.

    2. While the core language itself is small and easy to learn, the libraries within the language are inconsistent, incomplete and poorly documented.


    The Erlang/OTP libraries suffer from inconsistency, having been built over time whenever there was a need for a new function. This is indeed unfortunate, but note this: apart from a few libraries which implement their functionality directly in the Erlang VM kernel, most libraries are written in pure Erlang and can be replaced easily. If you hate the lists module, you can write your own lst.
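    To make the point concrete, here is a minimal sketch of such a replacement module. The name lst and the chosen functions are mine, not part of OTP; nothing in the system forces you to use the stdlib lists module:

```erlang
-module(lst).
-export([map/2, reverse/1]).

%% map/2: apply F to every element, same contract as lists:map/2.
map(_F, []) -> [];
map(F, [H|T]) -> [F(H) | map(F, T)].

%% reverse/1: tail-recursive reversal via an accumulator.
reverse(L) -> reverse(L, []).

reverse([], Acc) -> Acc;
reverse([H|T], Acc) -> reverse(T, [H|Acc]).
```

    Drop it in your code path and lst:map/2 works exactly where lists:map/2 did.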

    Some libraries are de facto deprecated because they are known to have certain limitations. The way Erlang tends to work, however, is that older modules known to have trouble are removed fairly slowly from the language: there may be a user out there, however poor that module is.

    I cannot agree about the bad documentation. Erlang has excellent documentation. There are man pages for every module, accessible with erl -man lists for instance (provided your distribution of Erlang is correctly built), and there are several online places where you can look up function definitions. In addition, many modules have user guides you can use to get started quickly. There is even an efficiency guide so you know how to write efficient code, and a set of programming conventions with good advice on how to structure your programs.

    Usually the documentation of individual functions is rather good, I think. If you find something which you think is poorly documented, I'd advise you to make a patch against the documentation and discuss the change with others. Improving documentation is more important than ranting about its inadequacy.

    3. Only a few people have written production level codes and you rarely get to hear from them.


    I have written production-level code in Erlang, namely a BitTorrent client good enough for everyday use. Its performance is currently as good as most other BitTorrent clients, CPU- and memory-wise, despite the client being written in a "slow" language. I have also written programs professionally in Erlang - for the Web, but I am getting ahead of myself.

    I try to write about my experiences - in part to tell stories, in part to educate and encourage others to pick up the language. No programming language is a silver bullet. But when your task is to write highly concurrent programs, Erlang shines. And if you look at the usual protocols for distributed computing on the web - HTTP servers, XMPP servers, ad-serving frameworks, payment processing systems, BitTorrent clients, video servers, and so on - you find that many of them are highly concurrent.

    Successful systems that just work will not catch headlines. A system that is chock-full of errors will. Many Erlang programs also run in companies with no open-source policy - don't expect the programmers of those systems to even be able to talk about what they do. It is a competitive advantage to keep your mouth shut.

    4. I can’t imagine how you can organize large code-bases in Erlang or even work as team, and this doesn’t feel right to any OO programmer.


    Large Erlang programs are structured around the concepts of applications and releases. You write several isolated functional units, applications. Then you bundle these into a release which you can ship to the production environment. A typical application will provide an abstraction for something, be it running an HTTP server, talking to a database, controlling an external CPU-centered numerical program and so on.

    The organization of large programs hinges on API design. You want to design your program such that each application has a small API used by the other parts of the program. There may be thousands of lines of code behind the wall of the API, yet the interface to the rest of the world is small.
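    As an illustration of the principle, here is a hypothetical storage application whose entire public surface is three functions; the ETS table behind them is an implementation detail the rest of the program never sees and which could be swapped for a gen_server or a remote store without touching any caller:

```erlang
-module(storage).
-export([start/0, put/2, get/1]).

%% The table name is internal; callers only ever use the exported API.
start() ->
    ets:new(storage_tab, [named_table, public, set]).

put(Key, Value) ->
    true = ets:insert(storage_tab, {Key, Value}),
    ok.

get(Key) ->
    case ets:lookup(storage_tab, Key) of
        [{Key, Value}] -> {ok, Value};
        []             -> not_found
    end.
```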

    The trick of OO languages is "abstraction is had by introducing another object". If you take a look at the OO design patterns, you will find that a new object is often introduced to mediate and solve an abstraction problem. This is because the only way to abstract in those languages is to construct new objects (or classes, depending on the language).

    In Erlang, the mantra is "abstraction is had by introducing another process". In other words, you can usually solve abstraction problems by introducing a middle-man mediating process, storing state in a separate process, splitting a computation between several processes and so on. The OO property of isolation, so important to structuring large programs, is present in the isolation of processes: you can't go rummaging around inside the heap space of another process; you must ask it gently and nicely.
    Naturally, this model has a design-pattern language as well and I know of several such. Remember this: “In OO-languages state is kept in objects; in Erlang, state is kept in processes”.
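    A minimal sketch of the mantra (module and message names are mine): a counter whose state exists only as an argument of its receive loop, reachable solely by message passing:

```erlang
-module(counter).
-export([start/0, incr/1, value/1]).

start() -> spawn(fun() -> loop(0) end).

%% Asynchronous increment.
incr(Pid) -> Pid ! incr, ok.

%% Synchronous read: ask nicely and wait for the reply.
value(Pid) ->
    Pid ! {value, self()},
    receive {counter_value, N} -> N end.

%% The state N lives here and nowhere else; no other process can
%% reach into this heap and change it behind our back.
loop(N) ->
    receive
        incr          -> loop(N + 1);
        {value, From} -> From ! {counter_value, N}, loop(N)
    end.
```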

    5. Most of the performance metrics are one-sided, and are performed by people who have an interest in Erlang. I would love to see some independent analysis.


    In general, you should be wary of performance measurements made by people who do not fully understand the platforms they are working with. It is hard to make a program perform better, but it is extremely easy to make it perform worse. To improve a program you must understand the rules of the game you are playing. The rules used to speed up, e.g., node.js are much different from the rules used in Erlang. And those are different again from OCaml, Scala, Java, Python, …

    Also, workarounds for problems tend to be vastly different. A recent blog post of mine lays bare a curious property of node.js, but the seasoned Node programmer understands how to work around it. He or she may deploy multi-node for instance, which fixes a lot of the problems by using a single accept() queue for several workers. This is a nice way to break the rules of the game to avoid a problem.

    Rather than thinking in terms of performance, I would argue you should think in terms of what your problem is. Erlang shines when a fully preemptive, heavily concurrent process model is a good solution to your problem. It is powerful in that respect, and it has the advantage of being a very mature system.

    6. Its support for web-development is very primitive. With web frameworks like rails and grails, there is a lot of serious work for Erlang if it ever intends to go to that market.


    I don't think this is true. Web frameworks like Rails or Grails only cover half of the web world. Clients in modern systems tend to be Javascript-heavy and only go to the server for their RPC Ajax-like requests. In this world, you need a lot less tooling on the side of the server. There are many web frameworks popping up for Erlang currently, but let me plug the nitrogen project.

    Yet, I think we will see much less need for web-frameworks as they were. We will need a new kind of framework which is much easier to work with server-side. And I think node.js shows the server doesn’t need a lot of stuff to be effective.

    You should also think about the emerging alternatives to RDBMS data storage. There are systems such as couchdb and riak, which can cleverly bypass some of the usual Model-View-Controller pattern. I think we are in for a change for the better in the way we do web development, and that Grails and Rails are a thing of the past if they don't adapt to the new world (I am sure Rails will - but I have too little experience with Grails to know whether they will stagnate or not).

    7. Did I talk about Strings in Erlang? IO speed?


    This single item is worthy of a blog post in itself. First, the strings.

    A string in most languages is a sequence of bytes, pointed to by reference. In some languages the string is the central data you pass around, and in some it is the only kind of data you can pass around. Haskell and Erlang, most notably, define a string to be a list of characters and a list of integers, respectively. There is much good and bad to be said about this choice - but it hurts people who don't understand the difference.
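    You can see the list representation directly in a small demo module (the examples are mine). A string literal is just sugar for a list of integer character codes, so ordinary list functions apply to it:

```erlang
-module(str_demo).
-export([demo/0]).

demo() ->
    %% "abc" and [97,98,99] are the very same term.
    "abc" = [97, 98, 99],
    %% $h is the integer code of the character h.
    [$h | "ello"] = "hello",
    %% Plain list functions work on strings; this upcases ASCII letters.
    "HELLO" = lists:map(fun(C) -> C - 32 end, "hello"),
    ok.
```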

    Most web systems manipulate strings. The string is the ubiquitous data format: it stores integers, it stores floats, it stores XML documents, JSON, and such. The string is easy to transfer over a TCP/IP stream. It is no surprise that many languages center around string manipulation and are very effective at it. Perl is perhaps the ultimate string processor (apart from crm114, naturally).

    The ubiquity of strings is also their Achilles heel. The type information they carry is weak - nonexistent, to be precise. To manipulate a string in any statically typed language (Java, OCaml, Haskell, etc.), you need to transform it into an internal format. You process the string into an object hierarchy or an algebraic data type - and then you throw the string away! The new representation has all the advantages: it is typed, it can carry additional information in object state, and it can make illegal states unrepresentable.

    You should never work directly with strings if performance matters. Even simple things like string comparisons may be fairly expensive (when the pointer comparison says the strings differ), whereas an atom comparison is not. The world of programming is more complex than just shoving every piece of data into a string.

    Another weakness of the string is that the representation only answers to queries by regular expression, recursive descent or LALR(1) parsing. Some languages are very good at the former, regex queries, but Erlang is not one of them, since regular expressions are not built into its syntax and semantics.

    So the first virtue of the Erlang programmer: convert a string as fast as possible into an Erlang term() and then manipulate the term. Only work with crappy, weakly typed strings at the border of your application. An Erlang application should not constrain itself to work with only a single data type, namely strings!
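    A sketch of this virtue, using a made-up "name=value" line format (the module and format are mine). Parsing happens exactly once, at the edge, and the rest of the program only ever sees typed terms. Note that list_to_atom/1 should only be used on trusted input, since the atom table is not garbage collected:

```erlang
-module(border).
-export([parse_kv/1]).

%% "port=8080" -> {port, 8080}. The string stops here; from this point
%% on the program works with a typed tuple.
parse_kv(Line) ->
    [K, V] = string:split(Line, "="),
    Key = list_to_atom(K),               %% only safe for trusted input!
    case string:to_integer(V) of
        {Int, []} -> {Key, Int};
        _         -> {Key, V}
    end.
```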

    The second virtue follows fast: if your string is large, use a binary() for effective storage and sharing. The binary representation, like the ByteString in Haskell, is as space-efficient as C, and it can be pattern matched if needed.
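    As a sketch, here is a made-up length-prefixed wire format (module and framing are mine), showing both how binaries are built and how they are taken apart by pattern matching in the bit syntax:

```erlang
-module(bin_demo).
-export([frame/1, unframe/1]).

%% Prepend a 4-byte big-endian length to the payload.
frame(Payload) when is_binary(Payload) ->
    <<(byte_size(Payload)):32/big, Payload/binary>>.

%% Pattern matching does the inverse: Len is bound first, then used as
%% the size of the Payload segment. No copying of the payload bytes.
unframe(<<Len:32/big, Payload:Len/binary, Rest/binary>>) ->
    {Payload, Rest}.
```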

    The third virtue is: know thine iolists. When you construct strings in Erlang, you are not to build a sequence of characters! You should be building a tree of small string-like fragments: binaries, other trees, lists and so on. The output functions know how to efficiently walk the tree and stream it to the output device.
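    A tiny illustration (the module is mine): the greeting below is an iolist, a tree mixing lists, binaries and single characters. Our code never flattens or concatenates anything; iolist_to_binary/1 or the IO system does the walking:

```erlang
-module(iolist_demo).
-export([greeting/1]).

%% Build output by nesting fragments, not by concatenating strings.
%% Lists, binaries and character codes can be mixed freely.
greeting(Name) ->
    ["Hello, ", Name, <<", welcome to ">>, ["Erlang", $!], "\n"].
```

    Handing the result straight to file:write/2 or gen_tcp:send/2 streams the tree without ever building one flat string in memory.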

    The IO performance of Erlang is pretty good. I easily had some early tests in Etorrent moving 700 megabit per second on a single 1.2GHz Pentium M CPU - without any optimization at all.

    Yet it is important to note that IO in Erlang is abstracted by default, and this makes it a tad slower than it could be. The abstraction is rather nice and has to do with distribution: you can access a socket or file on another machine as if it were local. But this neat abstraction naturally has an overhead. Of course, it is easy to build a primitive which throws away that abstraction if needed, and that will definitely run as fast as in any other language.

  2. Differences between Node.js and Erlang

    Suppose we have a canonical ping/pong server written in Node,
    var sys = require("sys");
    var http = require("http");
    http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      res.end("Hello, World\n");
    }).listen(8124, "127.0.0.1");
    sys.puts("Server running at http://localhost:8124");
    
    We can run this server easily and test it from the command line:
    jlouis@illithid:~$ curl http://localhost:8124
    Hello, World
    
    And it does what we expect. Now suppose we do something silly. We make a tiny change to the Javascript code:
    var sys = require("sys");
    var http = require("http");
    http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      res.end("Hello, World\n");
      while(true) { // Do nothing
      }
    }).listen(8124, "127.0.0.1");
    sys.puts("Server running at http://localhost:8124");
    
    Now, the first invocation of our test works, but the second hangs:
    jlouis@illithid:~$ curl -m 10 http://localhost:8124
    Hello, World
    jlouis@illithid:~$ curl -m 10 http://localhost:8124
    curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
    
    This should not surprise anybody. What we have illustrated here should be common knowledge: Node is not preemptively multitasking; it asks each event to cooperate by yielding to the next one in turn.
    The example was silly. Now, suppose we have a more realistic example where we do work, but it completes:
    var sys = require("sys");
    var http = require("http");
    http.createServer(function (req, res) {
       res.writeHead(200, {"Content-Type": "text/plain"});
      x = 0;
      while(x < 100000000) { // Do nothing
        x++;
      }
       res.end("Hello, World " + x + "\n");
    }).listen(8124, "127.0.0.1");
    sys.puts("Server running at http://localhost:8124");
    
    We introduce a loop which does some real work, and we arrange for it not to be dead code by requiring its result in the output. Our server will still return, but it will take some time before it does.
    Let us siege the server:
    jlouis@illithid:~$ siege -l -t3M http://localhost:8124 
    ** SIEGE 2.69
    ** Preparing 15 concurrent users for battle.
    The server is now under siege...
    [..]
    
    For three minutes, we hammer the server and get a CSV file, which we can then load into R and process.


    Erlang enters…

    For comparison, we take Mochiweb, an Erlang web server. We do not choose it specifically for its speed or its behaviour; we choose it simply because it is written in Erlang and will context-switch preemptively.
    The relevant part of the Mochiweb handler is this:
    count(X, 0) -> X;
    count(X, N) -> count(X+1, N-1).
    
    loop(Req, _DocRoot) ->
      "/" ++ Path = Req:get(path),
      try
        case Req:get(method) of
          Method when Method =:= 'GET' ->
            X = count(0, 100000000),
            Req:respond({200, [], ["Hello, World ", integer_to_list(X), "\n"]});
          [..]
    
    It should be pretty straightforward. We implement the counter as a tail-recursive loop and we force its calculation by requesting it to be part of the output.
    erl -pa deps/mochiweb/ebin -pa ebin
    Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:2:2]
    [rq:2] [async-threads:0] [hipe] [kernel-poll:false]
    1> application:start(erlang_test).
    {error,{not_started,crypto}}
    2> application:start(crypto).     
    ok
    3> application:start(erlang_test).
    ** Found 0 name clashes in code paths 
    ok
    4> 
    
    Notice that both my CPUs are put to work here automatically. But performance is not the point I want to make.
    Again, we lay siege to this system:
    jlouis@illithid:~$ siege -l -t3M http://localhost:8080 | tee erlang.log
    

    Enter R

    We can take these data and load them into R for visualization:
    > a <- read.csv("erlang.log", header=FALSE);
    > b <- read.csv("node.js.log", header=FALSE);
    > png(file="density.png")
    > plot(density(b$V3), col="blue", xlim=c(0,40), ylim=c(0, 0.35));
    lines(density(a$V3), col="green")
    > dev.off()
    > png("boxplot.png")
    > boxplot(cbind(a$V3, b$V3))
    > dev.off()
    

    Discussion

    What have we seen here? We have a situation where Node.js has a much more erratic response time than Erlang. We see that while some Node.js responses complete very fast (a little more than one second) there are also responses which take 29.5 seconds to complete. The summary of the data is here for Node.js:
    > summary(b$V3)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1.040   6.328  13.580  13.940  20.940  29.590 
    
    And for Erlang:
    > summary(a$V3)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       9.87   11.21   12.24   12.21   13.16   15.32 
    

    The densities are (green is Erlang, blue is Node.js)
    density plot

    And for completion, a boxplot:
    boxplot


    This is a result of Erlang preemptively multitasking the different processes, so its responses all arrive at around the same time. You can't really use the mean for anything: Erlang ran on 2 CPUs whereas Node.js only ran on one. But the kernel density plot clearly shows how stably Erlang responds while the response times of Node.js are erratic.
    Does this mean Node.js is bad? No! Most node.js programs will not blindly loop like this. They will call into a database, make another web request or the like. When they do, they allow other requests to be processed in the event loop, and the effect we have seen here disappears. It does however show that if a Node.js request is expensive to process, it will block other requests from getting served. Contrast this with Erlang, where cheap requests get through promptly because contexts are switched preemptively.
    It also hints that you need to plot histograms for your services (kernel density plots are especially nice for showing how the observations spread out). You may be serving all requests, but how long does it take to serve the slowest one? A user might not want to wait 30 seconds for a result, but he may accept 10 seconds.


    Conclusion

    My main goal was to set out and exemplify a major difference in how a system like Node.js handles requests compared to Erlang. I think I have succeeded. It underpins the idea that you need to solve problems differently depending on the platform. In Node.js, you will need to break up long-running jobs manually to give others a chance at the CPU (this is essentially cooperative multitasking). In Erlang, this is not a problem - and a single bad process can't hose the system as a whole. On the other hand, I am sure there are problems for which Node.js shines and which will have to be worked around in Erlang.

    EDIT: Minor spelling correction.

  3. Tracing Erlang programs for fun and profit

    One of the neat things about Erlang is its instrumentation capability. You can instrument programs to tell you interesting things about what is happening in the program. This blog post is about a tool by Mats Cronqvist, redbug.

    Redbug can be downloaded from github and is part of Mats' eper suite of tools. Installing the tool is easy. I recommend setting the $ERL_LIBS environment variable to something. Mine is set to:

       jlouis@illithid:~$ env | grep ERL_LIBS
      ERL_LIBS=:/home/jlouis/lib/erlang/lib
    

    so I can just drop Erlang libraries I use on and off into that directory, and they will be picked up by any Erlang I run. It is not a good solution when you are building software with dependencies, but for smaller tools used only by you, like eper or Erlang QuickCheck, this mechanism is really good.

    Installing eper should be fairly simple.

    Redbug invocation

    Redbug can be called from the command line - there is a redbug shell script - or it can be called from the Erlang shell. The main invocation is like this,

       redbug:start(TimeOut, MessageCount, MS)
    

    where TimeOut is a timeout in milliseconds after which redbug ceases to operate, MessageCount sets a limit on how many reports redbug will make, and MS is a match spec matching trace points in the program. There are several possible ways to write MS, and I am only going to give some simple examples to get you started. The tool is self-documenting and you can call

       redbug:help().
    

    from a shell and get the whole story.

    The timeout and message-count limits are very useful. Erlang has a built-in tracer on which redbug is built. But contrary to the built-in tracer, redbug protects the running system through these limits: you can't hose the system by accidentally setting a specification which hoards all the resources on the Erlang node.

    A typical MS is written like {erlang,now,[return,stack]}. This states that we are tracing the erlang:now call of any arity. When it matches, we want the current stack printed and the return value of the call:

      22> redbug:start(100000, 2, {erlang,now,[return,stack]}).
     ok
     23> erlang:now().
    
     22:40:12 <{erlang,apply,2}> {erlang,now,[]}
     {1290,289212,474520}
      shell:eval_loop/3
      shell:eval_exprs/7
      shell:exprs/7
    
     22:40:12 <{erlang,apply,2}> {erlang,now,0} -> {1290,289212,474520}
     quitting: msg_count
     24> erlang:now().
     {1290,289219,57040}
    

    Notice the quitting: msg_count, which states that after two messages from redbug, it will cease any further tracing. In general, the MS can also be written like, e.g., {module,function,[return,{'_', 42}]}, stating that we accept any call matching module:function(_, 42) and want its return stack.

    A real-world bug hunt

    With redbug you generally don't have to add a lot of debug printing to your Erlang code. Rather, it is easier to probe a running system systematically. I was wondering why a recent patch in etorrent seemed to work incorrectly, so we go hunting:

    (etorrent@127.0.0.1)26> redbug:start(10000, 2, {etorrent_choker, split_preferred, [return]}).
    ok
    22:47:32  {etorrent_choker,split_preferred,[[]]}
    22:47:32  {etorrent_choker,split_preferred,1} -> {[],[]}
    quitting: msg_count
    

    This choker call should not be passed the empty list, so we look into the code and find that the rechoke_info builder right before it is acting oddly:

    (etorrent@127.0.0.1)29> redbug:start(10000, 30, {etorrent_choker, build_rechoke_info, [return]}).
    [..]
    22:49:22  {etorrent_choker,build_rechoke_info,2} -> []
    22:49:22  {etorrent_choker,build_rechoke_info,1} -> []
    

    So - both build_rechoke_info/1 and build_rechoke_info/2 return the empty list. Something is wrong inside that function. Since the function is looking up data in other modules, we trace each of the module lookups:

    (etorrent@127.0.0.1)29> redbug:start(10000, 10, {etorrent_table, get_peer_info, [return]}).
    [..]
    22:51:42  {etorrent_table,get_peer_info,[<0.5759.0>]}
    22:51:42  {etorrent_table,get_peer_info,1} -> {peer_info,leeching,17}
    

    Nope, that looks right, on to the next:

    (etorrent@127.0.0.1)30> redbug:start(10000, 15, {etorrent_rate_mgr, fetch_send_rate, [return]}).
    ok
    22:53:12  {etorrent_rate_mgr,fetch_send_rate,[4,<0.3919.0>]}
    22:53:12  {etorrent_rate_mgr,fetch_send_rate,2} -> none
    

    Oh, a return of none is wrong here! Why does it return none? The call looks fine, but we are looking up data in an ETS table…

    At this point, we can use another nice little Erlang tool, tv, the Table Viewer. We run:

    tv:start().
    

    find the problematic table and inspect an element, which turns out to contain the wrong information. Thus the hunt becomes figuring out why the wrong information was entered into the table in the first place.

    (etorrent@127.0.0.1)35> redbug:start(10000, 5, {ets,insert,[stack,{etorrent_send_state, {rate_mgr, {'_', undefined}, '_', '_'}}]}).
    

    Basically, we have now constrained the output to exactly the wrong types of calls, and the culprit function is easily found in the caller's stack.

    Further digging shows the problem to be a race at the gproc process table which can be fixed by asking gproc to await the appearance of a given key.


  4. On Erlang, State and Crashes

    There are two things which are ubiquitous in Erlang:

    • A Process has an internal state.
    • When the process crashes, that internal state is gone.

    These two facts pose some problems for new Erlang programmers. If my state is gone, then what should I do? The short answer is that some other process must hold the state and provide the backup, but this is hardly a fulfilling answer: it is turtles all the way down. That other process might die, and then another beast of a process must have the state, and this observation continues ad infinitum. So what is the Erlang programmer to do? This is my attempt at answering the question.

    State Classification

    The internal state of an Erlang process can naturally be classified. First, state differs in value. State related to the current computation, residing on the stack, may not be important at all after a process crash. It crashed for a reason, and chances are that the exact same state will bring down the process again with the same error. The same observation might apply to some internal state: it is like a scratchpad or a blackboard - when the next lecture starts, it can be erased because it has served its purpose.

    Next is static state. If a process governing a TCP/IP connection crashes and is restarted, it should probably connect to the same address/port pair. We call that kind of data configuration or static data. It is there, but it is not meant to change over the course of the application, or only to change rarely.

    Finally, our crude classification of state has dynamic data. This is the data we generate over the course of the running program: input from users, data created because other programs communicate with us, and so on. The class can be split into two major components: state we can compute from other data and state we cannot. The computable state is somewhat less of a problem - we can basically just recompute it after a crash - so the real problem is the other kind, the user- or program-supplied information.

    In other words, we have three major kinds of state: scratchpad, static and dynamic.

    The Error Kernel

    Erlang programs have a concept called the error kernel. The kernel is the part of the program which must be correct for the program's correct operation. Good Erlang design begins with identifying the error kernel of the system: what part must not fail, lest it bring down the whole system? Once you have the kernel identified, you seek to make it minimal. Whenever the kernel is about to do an operation which is dangerous and might crash, you "outsource" that computation to another process, a dumb slave worker. If the worker crashes and is killed, nothing really bad has happened, since the kernel keeps going.
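    A minimal sketch of the outsourcing idea (the helper module is mine, not an OTP API): the kernel runs dangerous work in a monitored throwaway process, so a crash comes back as an ordinary message rather than taking the kernel down:

```erlang
-module(outsource).
-export([safely/1]).

%% Run Fun in a throwaway worker. A normal result and a crash both
%% arrive as 'DOWN' messages; the calling (kernel) process never dies.
safely(Fun) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({ok, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {ok, Result}} -> {ok, Result};
        {'DOWN', Ref, process, Pid, Reason}       -> {error, Reason}
    end.
```

    In real systems this role is usually played by a supervised worker under a one_for_one supervisor, but the message-based containment is the same.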

    Identifying the kernel plugs the "turtles all the way down" hole. As soon as the kernel is hit, we assume correctness. But since the kernel is small, the trusted computing base of our program is likewise small. We only need to trust a small part of the program, and that part is also fairly simple.

    A visualization is this: A program is a patchwork of small squares. Some of the squares are red, and these are the "error kernel". Most (naively implemented) imperative programs are mostly red, save for a few squares. These are the squares where exceptions are handled explicitly and the error is correctly mitigated. The kernel is thus fairly large. In contrast, robustness-aware Erlang programs have few red squares - most of the patchwork is white. It is a design-goal to get as few red squares as possible. It is achieved by delegating dangerous work to the white areas so a crash does not affect the kernel.

    Handling the state classes

    Each class must be handled differently. First there is the scratchpad/blackboard class. If a process crashes, this class is interesting because it contains the stack trace and usually the data which tells a story, namely how and why the process crashed. We usually export this data via SASL's error logger, so we can look at a crash report and understand what went wrong. After all, the internal state is gone once the crash report is done and logged.

    Next, there is the static class. The simplest thing is to have another process feed in the static data. This can be done by, among others, the supervisor, by asking an ETS table, by asking gproc (if you use gproc in your system), by asking another process, or by discovery through the call application:get_env/2. It is important to note just how static the data is - you have a few options with differing advantages and disadvantages, and which one to choose depends on how much the data is going to change.
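    A sketch of static data rediscovered via application:get_env/2 (the application name myapp, its keys, and the module are all hypothetical): the process re-reads its configuration at every start, so a crash and restart loses nothing irreplaceable and the supervisor never has to pass the data in:

```erlang
-module(conn).
-export([start_link/0, init/0]).

start_link() ->
    {ok, spawn_link(?MODULE, init, [])}.

init() ->
    %% Fetched anew after every crash/restart. The static data lives in
    %% the application environment, not in this process's heap.
    {ok, Host} = application:get_env(myapp, host),
    {ok, Port} = application:get_env(myapp, port),
    loop(Host, Port).

loop(Host, Port) ->
    receive
        {get_peer, From} ->
            From ! {peer, Host, Port},
            loop(Host, Port)
    end.
```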

    Finally, the fully dynamic data is the nasty culprit. If you can recompute the data, you are lucky. As an example from my etorrent application: each peer has a dynamic table of which parts of a torrent file that peer has, so the controlling process keeps an internal table of this information. But if we crash and reconnect to the peer, the BitTorrent protocol will, by its nature, send us this information again. So that information is hardly worth keeping around. Other times, you can simply recalculate the information when your process restarts, and that is almost never a problem either.

    So what about the user supplied data? This is where the error kernel comes in. You need to protect data which you cannot reconstruct. You protect it by shoving it into the error kernel and keeping some simple state maintenance processes there to handle the state. A word of warning though: if your state is corrupted, processes basing their work on that state will do something wrong. To mitigate this, it is important to do some general sanity checking of your data. Make it a priority to check your data for invariants where you find them. And don't blindly trust non-error-kernel parts of the system.

    If a process crashes, you should definitely think about how much of its internal state you want to recycle. If you recycle everything, you risk hitting the exact same bug again and crashing again. Rather, there may be a benefit to only recycling parts of the internal state.

    The next step: Onion-layered Error kernels

    The next logical step up is to recognize that the error kernel is not discrete. You want to regard the error kernel as an onion. Whenever you peel off a layer, you get a step closer to the trusted computing base of the application. Your system design is then to push state maintenance down to the outermost layer in the onion where it still makes sense. This in effect protects one part of the application from the others. In Etorrent, we can download multiple torrent files at the same time. If one such torrent download fails, there is no reason it should affect the other torrent downloads. So we add a layer to the onion: state which is local to the torrent is kept in a separate supervisor tree - to contain the error if that part fails.

    The net effect is program robustness: A bug in the program will suddenly need perseverance. It has to penetrate several layers in the onion before it can take the full program down. And if the Erlang system is well designed, even the most grave bugs can only penetrate so far before the stopping power of the onion layers brings it to a halt.
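    As a sketch of one such onion layer (all module names here are hypothetical, not etorrent's actual code), a per-torrent supervisor can keep the torrent-local state and peers in their own subtree, so a crash in one torrent is restarted locally and never reaches the other torrents:

```erlang
%% Hypothetical sketch of one onion layer: each torrent gets its own
%% supervisor subtree. A crash inside the subtree restarts that
%% torrent's processes, but cannot touch the other torrents.
-module(torrent_sup).
-behaviour(supervisor).
-export([start_link/1]).
-export([init/1]).

start_link(TorrentId) ->
    supervisor:start_link(?MODULE, [TorrentId]).

init([TorrentId]) ->
    %% one_for_all within the torrent: if the torrent's local state
    %% keeper dies, restart the whole torrent subtree - but nothing
    %% outside of it.
    {ok, {{one_for_all, 5, 3600},
          [{state_keeper,
            {torrent_state, start_link, [TorrentId]},
            permanent, 5000, worker, [torrent_state]},
           {peer_pool,
            {peer_pool_sup, start_link, [TorrentId]},
            permanent, infinity, supervisor, [peer_pool_sup]}]}}.
```

    A top-level supervisor would start one such subtree per torrent; peeling one more layer inward, the peer pool is itself a supervisor, so a single misbehaving peer cannot take down the torrent's state keeper either.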

    Furthermore, it underpins a mantra of Erlang programs: Small bugs have small impact. They won't even penetrate the first layer. And they will hardly be a scratch in the fabric of computing.

    (Aside: Good computer security engineering uses the same onion-layered model. There are strong similarities between protecting a computer system against a well-armed intruder and protecting a program against an aggressive, persistent, dangerous and maiming bug. End of aside.)

    EDIT: smaller language changes where my first post was a bit drafty.


  5. Grace Hopper on multiple occasions explained the length of a nanosecond with a piece of wire. The length of the wire was exactly the distance light would travel in a nanosecond. See [1] for one such occasion.
    While Grace was using the method to describe the delay in satellite communication and why computers need to get smaller, it turns out it is more relevant than ever today. We are officially past the era of computing where people glued together libraries to form programs. Today, virtually every new application built is distributed.
    The problem is that if you think latency due to disk seeks is bad, then imagine reaching a machine at the other end of the world - the delay is high. From here, my delay to yahoo.com is around 200ms, much more than the average seek time on an old hard disk. There is a common trick in the high-performance computing world to battle latency: latency hiding. While waiting for a message to pass through the slow network, you do something else. Then, when the data from the message is needed to continue the program, you gamble that it has already arrived. The same trick has been used in operating systems for years while waiting on the disk to return data: run another program in the meantime.
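    A minimal sketch of latency hiding in message passing terms (the module, message shapes and timeout are made up for illustration): fire off the request, do useful local work while it is in flight, and only block in a selective receive when the answer is actually needed:

```erlang
%% Hypothetical sketch of latency hiding with message passing: send a
%% request, keep busy while the reply crosses the network, and gamble
%% that the reply has arrived by the time we need it.
-module(latency_hiding).
-export([fetch/2, do_other_work/0]).

fetch(Server, Key) ->
    Ref = make_ref(),
    Server ! {fetch, self(), Ref, Key},   % the request goes out ...
    LocalResult = do_other_work(),        % ... while we keep working
    receive
        {reply, Ref, Value} ->            % block only now; with luck
            {LocalResult, Value}          % the reply is already here
    after 5000 ->
            {LocalResult, timeout}
    end.

do_other_work() ->
    ok.
```

    The unique reference ensures the selective receive picks up exactly the reply to this request, even if other messages arrive in the mailbox in the meantime.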
    And this is why any modern programming language must tackle concurrent operations. From now on, most programs will be distributed. Client programs will be, because they live on mobile phone-sized systems and have to pull in data from multiple sources. Server programs will be, because distribution is key to scaling and redundancy. In short, any program will have a situation where getting data amounts to asking another system for it -- and then handling the latency inherent in the communication. The limit of a computer from here on out will not be the number of instructions it can retire successfully on its cores. Rather, it will be the amount of communication it can perform and how well it deals with it. On the memory bus. On the network. To the satellite.
    I am making a bet: distribution will be huge and will be solved by message passing concurrency. That is, I claim we already have the necessary tools in the toolbox to tackle the problem. There is not going to be a concurrent doomsday where the world curls up in a corner and deadlocks. There is not going to be a parallel doomsday revolution either. If the internet has taught us anything, it is that we can handle the problems which will come forth to rear their ugly heads (and breathe fire). The reason I am betting on message passing concurrency is that it is easy to program and fast enough for a 200ms round trip in most cases. The number of nanosecond wires in that round trip utterly dwarfs the price we pay to pass a message.
    I am making another bet as well: parallelism is not going to be as huge as we think it is. We have relied for years on our computing technology getting faster every other year -- and this low hanging fruit is not there anymore. But it disguised another trend with even greater consequences: computers keep getting smaller and smaller. My mobile phone has the computing power of a state-of-the-art computer from 2001-2002. Imagine that, in 8 years! It is not about splitting the computation up so it can run on a single machine with many cores anymore. It is about splitting the computation so it can run on many machines with many cores.
    Today is the 10/10/10 but the day is only special in the calendar. The change did not happen overnight. There has been a slow crawl towards a more distributed world for some years. But undoubtedly, the mobile devices will propel us with full force into the new world order.
    -- All Hail! Think concurrently!

  6. Some torrent files contain a humongous number of files. Thousands. This is one of the problems you have to cope with as a client-writer, and I plan to take care of it in both etorrent and combinatorrent. However, the solution I've adopted for etorrent is sinisterly beautiful, so I decided to write it down.
    The Problem
    The number of open files is limited on UNIX systems. This is to protect applications from each other and to prevent resource exhaustion on the system. A typical limit is 1024 files, or fewer in some cases. In etorrent, a file is governed by a process which plays the role of that file. Whenever you want to do a file operation on that file, you get hold of the process Pid and send it a message. Simple.
    Resources are limited by a timeout in these file processes. When a file has not been in use for 60 seconds, the process governing it terminates and frees up the resources on that file. It works reasonably well. The problem, however, is that some torrents have more files than the file descriptor limit. When we check the torrent's files upon starting up, we unfortunately open more than 1024 of them and hit the limit.
    The Solution
    The solution is deceptively simple. We add a janitor process to the game. Whenever a new file is opened the janitor gets informed and the file process enters itself into an ETS table. Whenever we do an operation on the file a timestamp is bumped in the ETS table. This goes on and on; if a process dies, a monitor in the janitor cleans out the entry from the ETS table.
    Now, whenever a new file is opened, we check the size of the table against a high watermark, 128 by default. If we are above it, we extract the full table and order it by last bump. Thus, the first elements in the resulting list are the processes which have gone unused the longest. We then ask enough of these to terminate to bring ourselves back under a low watermark - ensuring we won't be hitting the resource collection every time we add a new file system process to the game.
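    The eviction step can be sketched as a mostly pure function (the module name, message protocol and data shapes are my own for illustration, not etorrent's actual code): given the {Pid, LastBump} entries pulled from the ETS table, sort by timestamp and ask the least recently used processes to stop until we are back under the low watermark:

```erlang
%% Hypothetical sketch of the janitor's eviction step. Entries are
%% {Pid, LastUsedTimestamp} pairs pulled out of the ETS table. We sort
%% by timestamp and stop the least recently used file processes until
%% the count is back under the low watermark.
-module(file_janitor).
-export([maybe_evict/3]).

maybe_evict(Entries, High, Low) when length(Entries) > High ->
    ByAge = lists:keysort(2, Entries),        % oldest bump first
    NumToStop = length(Entries) - Low,
    {Victims, _Keep} = lists:split(NumToStop, ByAge),
    [Pid ! stop || {Pid, _Ts} <- Victims],    % ask them to terminate
    ok;
maybe_evict(_Entries, _High, _Low) ->
    ok.
```

    With the defaults described above, maybe_evict(Entries, 128, SomeLowerMark) only does work when more than 128 files are open, and then frees enough at once that we do not evict again on the very next open.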
    Updating the ETS table is expected to be rather cheap. The table is public, so each file governing process maintains its own entry. I don't think they will spend much time waiting for each other on the table. And if they do, there is always {write_concurrency, true} we can set on the table.

  7. Haskell vs. Erlang

    Since I wrote a bittorrent client in both Erlang and Haskell - etorrent and combinatorrent respectively - I decided to put up some bait. This might erupt in a language war and “My language is better than yours”, but I feel I am obligated to write something subjective. Here's to the woes of programming in Haskell and Erlang.

    Neither Haskell nor Erlang was a first language for me. I have programmed serious programs in C, Standard ML, Ocaml, Python, Java and Perl; tasted the cake of Go, Javascript, Scheme and Ruby; and have written substantial stuff in Coq and Twelf. I love static type systems, a bias that will rear its ugly head and breathe fire.

    I have written Haskell code seriously since 2005 and Erlang code seriously since 2007. I have programmed functionally since 1997 or so. My toilet reading currently is “Categories for the working mathematician” by Mac Lane. Ten years ago it was “ML for the working programmer” by Paulson.

    Enough about me.

    Caveats:

    With any language war material follows a disclaimer and a healthy dose of caveats. This is subjective. You have to live with it being subjective. My writing can’t be objective and colorful at the same time. And I like colors in my life. Also, it is no fun reading a table listing the comparison. Rather, I will try to make it into a good foil duel with attacks, parries, guards, pierces, bananas, and barbed wire.

    I built etorrent in Erlang first and combinatorrent in Haskell second. Hence, the second time around - with the sole goal of redoing the functionality of etorrent - work was much easier and could proceed much faster. The Erlang code is slightly fattier at 4.2K lines versus 3.6K lines of Haskell (SLOCs). The performance of the two clients is roughly equal, but more time was spent optimizing the Haskell code.

    My hypothesis is this: the Erlang VM is much more optimized at the IO layer than the current IO layer I use in GHC (specifically, the way incoming data is handled allocates more memory; this might change in the future due to a new IO layer). GHC kills the Erlang VM for everything else though, perhaps including message passing.

    Also, the quality of the Erlang code could be better, relative to the Haskell code.

    Enough!

    Enough with the caveats!

    Haskell cons:

    What weighs against using Haskell for the project? First is laziness. Sometimes you want your code to be strict and sometimes lazy. In combinatorrent, we do some statistics which we don’t really need to calculate unless we want to present them. Stuff like bytes uploaded and downloaded for instance. Since you do not necessarily ask for these statistics, the compiler is free to build up thunks of the calculation and you have a neat little space leak. This is a recurring problem until you learn how to harness the strictness annotations of Haskell. Then the problem disappears.

    IO in Haskell is somewhat weak if you naively assume a String is fast. But there is help from Bytestrings, attoparsec and low-level socket networking. Combinatorrent could use more help with getting the speed up here. I have substituted the lowest level of the IO layer some 2-3 times in combinatorrent. Contrast this with Erlang, where the original protocol parser and IO layer are still standing. If you want fast network IO in Haskell, you should be using bytestrings and network-bytestring. The simplicity of going over String is not worth it in my experience.

    The GHC compiler has more performance regressions than the Erlang VM. It should come as no surprise: GHC is acting as both a research vehicle and a compiler implementation. I want to stress, however, that this has not worried me a lot. When asking the GHC developers for help, the response has been fast and helpful, and in every case it was easy to fix or work around. Also, change is necessary if you want to improve.

    Haskell pros:

    Haskell has one very cool thing: static typing (remember the bias!). The type system of Haskell is the most advanced type system for a general purpose language in existence. The only systems which can beat it are theorem provers like Coq, and they are not general purpose programming languages (Morrisett and the YNot team might disagree though!). Static typing has some really cool merits. Bugs are caught fast and early; types ensure few corner cases in the programs (why check for null when it can't be represented?). The types are my program's skeleton, and the program inhabiting the types is the flesh. Getting the skeleton right yields small and succinct programs. The abstraction possibilities from this are unparalleled in any language I have seen (and I've seen a few).

    The GHC compiler provides programs which have excellent execution speed. You don’t need to worry a lot about speed when the compiler simply fixes most of the problems for you. This in turn means that you can write abstract code without worrying too much about the result. This yields vastly more general and simpler programs.

    One very big difference between the implementations is that of STM channels versus Erlang's message passing. In Erlang, each process has a mailbox of unbounded size. You send messages to the mailbox, identified by the process ID of the mailbox owner. In Haskell, we use STM channels for most communication. Thus, you send messages not to the PID of a process, but to a specific channel. This effectively changes some rules in the channel network configuration. In Erlang you must either globally register a process or propagate PIDs. In Haskell, channels are created and then propagated to the communicating parties. I find the Haskell approach considerably easier - but also note that in a statically typed language, channels are the way to go. The sum type for a PID mailbox would be cumbersome in comparison.
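    To illustrate the Erlang side (a toy example of my own, not combinatorrent or etorrent code): the two ways of wiring up communication are a globally registered name or a propagated Pid, and both deliver to the same unbounded mailbox:

```erlang
%% Hypothetical toy example of the two Erlang options mentioned in the
%% text: register a process under a well-known name, or hand its Pid
%% to whoever needs to talk to it.
-module(mailbox_demo).
-export([demo/0, loop/0]).

demo() ->
    Pid = spawn(?MODULE, loop, []),
    register(tracker, Pid),        % option 1: a globally known name
    tracker ! {hello, self()},
    Pid ! {hello, self()},         % option 2: use the propagated Pid
    ok.

loop() ->
    receive
        {hello, From} ->
            From ! hi,
            loop()
    end.
```

    Either way, the receiver cannot tell the two delivery routes apart; in contrast, an STM channel in Haskell carries a single message type, which is what makes the typed-channel approach pleasant.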

    Haskell has excellent library and data structure support. For instance you have access to priority search queues via Hackage. PSQueues are useful for implementing the piece histogram in a bittorrent client: knowing how rare a given piece is so you can seek to fetch the rarest first.

    Haskell can create (im-)mutable (un-)boxed arrays. These are useful in a bittorrent client in several places. Immutable arrays for storing knowledge about pieces is an example. Or bit-arrays for storing knowledge about the pieces a given peer has. Erlang has no easy access to these and no guarantee of the data representation.

    Bryan O’Sullivan's attoparsec library allows for incremental parsing. When you get a new chunk of data from the network, you feed it to attoparsec. It will either give you a parsed message and the remaining bytes, or it will hand you back a continuation. This continuation, if fed more input, will continue the parsing. For network sockets the incrementality is pure win.

    The GHC compiler has some awesome profiling tools, including a powerful heap profiler. Using this, the run-time and memory usage of combinatorrent was brought down.

    Finally, testing in Haskell is easy. QuickCheck and Test.Framework provide a --tests target built into the combinatorrent binary itself. Self tests are easy.

    Haskell mistakes:

    I made some mistakes when writing the Haskell client. For one, I relied on the CML library until I realized STM would do an equal or better job. The number of Haskell developers with STM experience compared to the CML head-count made the decision to change easy.

    Furthermore, I should have focused on laziness earlier in the process. The first combinatorrent releases leaked memory because of lazy thunk buildup. The later versions, after I understood laziness intuitively, do not leak.

    Erlang cons:

    In Erlang, dynamic typing is the norm. Rather than enforce typing, you can get warnings from a type analyzer tool, the dialyzer, if need be. Running this on the code is a good idea to weed out some problems quickly. When building etorrent I had much use of the dialyzer and used an at-the-time experimental extension: -spec() specifications. Yet, I think that 19 out of 20 errors in my Erlang programs were errors which a type system would have caught easily. This means you spend more time actually running the program and observing its behavior. Also note that dynamic typing hurts less in Erlang compared to other languages. A process is comprehensible in its own right, and that reduces the interface to the process communication - a much simpler task.

    Etorrent has less stability than combinatorrent and has erred more. Yet, this is no problem for a bittorrent client since the supervisor-tree in Erlang/OTP will automatically restart broken parts of the system. For a bittorrent client we can live with a death once a week or once a day without any troubles.

    You have no mutability in Erlang and you have far fewer options for data representation. This in turn makes certain algorithms rather hard to express, or you have to opt for a variant with larger space usage. There was no Cabal equivalent at the time I wrote the code, and thus fewer libraries to choose from.

    Among the built-in libraries, the HTTP library was more strict with respect to correctness. In turn, many trackers would not communicate with it and I had to provide a wrapper around the library. Today, this might have changed though. Haskell's HTTP library worked out of the box with no changes.

    Erlangs syntax, compared to Haskell, is ugly, clunky and cumbersome. Make no mistake though: Tanks are ugly, clunky and cumbersome. It does not make tanks less menacing.

    Erlang pros:

    One application: SASL. SASL is a system logger which will record in a ring buffer any kind of process death and process restart. I used this a lot when developing. I would load a couple of torrents in the client and go to bed. Next morning I would check the SASL log for any error that might have occurred and fix those bugs. This way of developing is good for a bittorrent client: utmost stability is not needed. We just need to get the number of errors below a certain threshold. Rather than waste time fixing a bug which only occurs once a year, we can concentrate on the things that matter.

    The IO layer in Erlangs VM is FAST! It is written in C, and it is optimized heavily because this is what Erlang does best. For file IO it uses asynchronous threads to circumvent having to wait on the kernel. For the network, it plugs into epoll() getting good performance in turn.

    The BEAM VM of Erlang is a beast of stability. Basically, it doesn't quit unless you nuke it from orbit. One of the smaller things I learned some weeks ago was a rudimentary flow control trick. Erlang schedules by counting reductions in an Erlang process, switching process context when the process has no more reductions in its time share. Sending a message never fails, but it costs reductions proportional to the queue size of the receiving process. Hence, many senders have a harder time overloading a single receiver. The trick is simple, easily implemented and provides some basic flow control. While not fail-safe, it ups the ante for when communication overload happens.

    Erlang has OTP, the Open Telecom Platform, which is a callback-framework for processes. You implement a set of callbacks and hand over control to the OTP-portion of your process. OTP then handles a lot of the ugly, gritty details leaving your part simple. OTP also provides the supervision of processes, restarting them if they err. Supervisor-processes form trees so they are in turn supervised. It isn’t turtles all the way down in an Erlang VM…

    Erlang executes fast enough for most things. Haskell gives you faster execution, but Erlang was more than adequate for a bittorrent client in the speed department. As an example of how this plays together with the IO layer, an early version of etorrent could sustain 700 megabit network load on a local network of 1 gigabit when seeding. The current version of etorrent can do the same as a seeder I suspect. Also, message passing in Erlang is blazing fast. It feels like a function call - a key to good Erlang I think.

    The Erlang shell can easily be used as a poor man's user interface. Etorrent simply responds to some functions in the shell, showing the status of the running system. I suspect GHCi can do the same, but I never got around to doing it and it doesn't seem as easy to pull off.

    I love the Erlang way of programming. You assume your code does the right thing and let it crash otherwise. If it crashes too often, you handle that case. Code is not littered with error handling for things that never happen, and should they happen occasionally, the supervisor tree saves the day.

    Erlang mistakes:

    Unfortunately, I made a number of mistakes in Etorrent. Most of these have to do with it being the first version. Fred P. Brooks hinted that you want to throw away the first version you build. And I did. I used ETS tables in places where they are not good. ETS is a table in which you can store any Erlang term and later retrieve it. ETS tables give you a way to circumvent the representation limitations in Erlang. But they are no silver bullet: when you pull out a term, you copy it to the process pulling it. When your terms are 4-8 megabytes in size, that hurts a lot.

    I relied far too much on mnesia, the database in Erlang. Mnesia basically uses software transactional memory, so locking is optimistic. When you have something like 80 writers wanting access to the same row in a table, the system starves. Also, there is no need for a bittorrent application to require a mnesia store. A simple timer-based serialization of key data to a file is more than adequate.

    I made several mistakes in the process model. I thought that choking was local to a torrent, while in reality it is global across all torrents currently being downloaded. Such reorganizations require quite some refactoring - and lacking static typing, they are somewhat more expensive than the corresponding Haskell refactorings.

    I thought autotools were a good idea. It is not. Autotools is the Maven of C programming.

    Finally, I skipped unit tests. In a dynamically typed environment you need lots and lots of these, but I decided against them early on. In hindsight this was probably a mistake. While unit-testing Erlang code is hard, it is by no means impossible.

    Future:

    The future brings exciting things with it. I will continue Combinatorrent development. I am almost finished with the Fast-extension (BEP 0006) for combinatorrent and have some more optimization branches ready as well. I still follow Erlang in general because it is an interesting language with a lot of cool uses. I do check that etorrent compiles on new releases of Erlang. If anyone shows interest in any of the client implementations, feel free to contact me. I will happily answer questions.

    There is no clear winner in the duel. I prefer Haskell, but I am biased and believe in static typing. Yet I like programming in Erlang - both languages are good from different perspectives.


  8. Helping with HaskellTorrent

    I have an ideology in which an Open Source project must be easy to hack for people other than the original author. Thus, I am trying to make this possible with HaskellTorrent. In particular, I keep several things unsolved so others might join the fray, should they want to. This is a list, in no particular order, in which I put forth some of the things that are left to be done.

    Do note that I tend to keep bugs off this list. Many projects use the bug tracker as a way to track what needs doing. I have a TODO.md file in the top-level dir which contains things. Some of these things are taken from this list.

    HAVE message pruning

    Torrent files work by splitting a file into pieces and then breaking pieces into blocks. The blocks are exchanged between peers until every block in a piece is downloaded. At that point, the SHA1 checksum of the piece can be used to verify the piece was downloaded correctly. If this is the case, a HAVE message is broadcast to all peers we are connected to. This notifies the other peers of the newly available piece so they can begin requesting it.

    However, there is an optimization opportunity. If the peer has already told us (usually by a HAVE message) that it has the piece, there is no reason to tell it about the availability. Thus, we can prune the sending of the HAVE for those clients. There is already an IntSet with the necessary information inside the PeerP process, so the change is fairly simple.

    Optimize Peer/PieceMgr communication

    Profiling shows we spend a considerable amount of time in the communication between the Peer and the Piece Manager. The Peer will, when downloading, try to keep a pipeline of block REQUEST messages going towards the other end; doing so mitigates the delay on the internet. Hence, it periodically asks the Piece Manager for new blocks to request. It does so by sending the number of blocks it wants together with an IntSet of what pieces the peer at the other end has.

    The Piece Manager is responsible for giving peers exclusive access to blocks. There is no reason to download the same blocks from multiple peers, so it keeps track of which blocks were requested. Also, it must only serve blocks that the given peer actually has. Another goal is that we would like to complete a piece as early as possible, so we try to complete pieces that are in progress first.

    Currently, the system works by looking at the pieces in progress and trying to serve from these. If all blocks on pieces in progress have been taken, we find a pending piece (at random) which is available at the peer. This one is then promoted to being in progress and the algorithm runs again.

    There are a couple of nice optimizations possible. First, if the peer is a seeder, it effectively has every piece. Thus, there is no reason to keep the IntSet around for those, nor is there any reason to carry out expensive IntSet intersections. Second, if the peer is not a seeder, it would be better to keep a bit array around rather than an IntSet. The IntSet accounts for a good amount of live memory in the Peer processes, and we expect there to be quite a few of those. Third, we could benefit from keeping a cache of the last piece the peer requested blocks from. Trying this cached element blindly in the Piece Manager is advantageous to us: we spend less time in the Piece Manager code. It is also nice for the peer: if a piece is cached at the other end, we might help by requesting as many blocks as possible from that piece. Disk IO is a point of contention in modern bittorrent client implementations.

    Always pick the rarest piece first

    We currently pick pending pieces at random. A better scheme is to know about their availability and then pick the rarest first. The rarest piece is the easiest for us to spread, so it maximizes our ability to give back, which in turn maximizes our ability to download fast. The right Haskell Cabal library for this is Data.PSQueue in the package PSQueue. If more pieces are eligible at the same rarity, we will pick one at random. This gives an excellent use of View Types in Haskell, should one be interested.

    To do this, there is a milestone to reach beforehand. When we receive knowledge of piece availability, either through a BITFIELD or a HAVE message, we should propagate that information to the Piece Manager. In the beginning, we can simply do nothing with it and throw it away. It will pave the way for a PSQueue implementation, however.

    Increase code quality

    Run hlint on the code. There is a target in the Makefile, which has recently been fixed. Find a GHC warning which is not yet enabled in the .cabal file and enable it, fixing the bugs that turn up. I guess some of these are more evil to fix than others, so start with the easier ones.

    Another area is tests. Running HaskellTorrent --tests will run the embedded test suite. This one fails on 64-bit architectures at the moment (I think; that is as far as I have narrowed it down). If you spot any area which you can figure out how to test, I would really like to discuss it with you. The code could do with a lot more testing than it has right now.

    Discuss use of attoparsec, attoparsec-iteratee and the event library

    I am seriously contemplating impaling all network performance bottlenecks once and for all by using this triple. If you are interested in hacking on these, I'll be happy to talk to you about it. It would push the bottleneck to the disk layer once and for all, I think. The cool thing is that there is code to gain inspiration from.

    Another nail in the coffin is to support the FAST extension in the new parser right away. It would pave the way for the rest of the client to understand this extension, so we would be able to get better and faster communication. It also plugs a hole in the original bittorrent specification.

    Use mmap() for Disk IO

    The right way to do disk IO is by use of mmap() on the files. Reading and writing files is not going to be hard, but we also need to get the hopenssl library to talk to the mmap()'ed backend so we get fast checksum calculations.


  9. Tufte applied to games

    I played a small bit of the old Infocom game “Wishbringer” the other day. And something struck me. This game is small. It has something like 50 locations, with minuscule text at each location. Of course this is because of the limitations of the hardware from the day the game was released, but it also made me think. There is so much game crammed into those 50 locations. Clever location reuse makes the game seem larger than it is. In fact, the majority of locations change considerably over the course of the game.

    Now contrast this with newer sandbox games of enormous size: Morrowind, Oblivion, Fallout 3, all by Bethesda. These games are absolutely huge, but in contrast, the world is a mostly static one with very few changes over the course of the game. Tufte, who was just appointed by the US government to visualize government spending, had this notion of “ink carrying meaning” versus mere “ink”. The ratio between the two tells you how much ink is wasted and how tightly information is packed on a piece of paper.

    Applying Tufte's observation to game worlds is interesting and fun. Games like Morrowind and Oblivion have a very low ratio, whereas an old game like Wishbringer has a very high one. Surprisingly, the game Arx Fatalis from 2002 has a rather high ratio. In Arx Fatalis, the game world is pretty small. But the game uses several tricks to circumvent this. The game world reuses locations for more than one thing. Also, the game world is opened up to you gradually as the game unfolds. The same idea is applied in the game “The Witcher”, where the majority of the game goes on inside a city. New major areas of the city open up in later acts, but the old areas are kept and change.

    Today, building huge worlds is not hard. You procedurally generate most of the content in the game through technologies like SpeedTree, for instance. Oblivion has hand-crafted dungeons, it would seem, but usually they are just there for the sake of being there, not for moving the story ahead. Contrast that with the random dungeons in Diablo 2. Usually, there is a chest at the end of those with good levelled loot. Being a loot game, Diablo 2 encourages you to walk the dungeon.

    I sincerely hope that in the future, more games will be like “Wishbringer”. Mass Effect 1 (I have not played the 2nd game) shows some of the way, but it also adds a lot of randomly generated open spaces you drive through in a vehicle. These are boring. Kill them.

    As luck would have it, small Flash games and similar platforms will make the game logic central again. I expect there to be some really interesting games with good “Tufte ratios” in the coming years. But the platform will be the web, the mobile phone, JavaScript or Flash.


  10. HaskellTorrent v0.0 released!

    Last weekend I released version 0.0 of the HaskellTorrent project. However, as modern development will have it, the interesting things happen on the main integration branch, master, on github: haskell-torrent. From the point of the 0.0 release until today, three main things have happened in the client.

    CPU optimizations

    A little bit of work with the profiler has shaved CPU usage down by a fairly large amount. I optimized the assertions on the piece database by using Data.IntSet rather than plain old lists. Right now the main cost centres are the piece database assertions (still) and deciding which blocks to download. The former is simple to get rid of: we don’t need to check the database invariants on every message, but can do so less frequently. As for the latter, I have a couple of ideas for shaving off CPU cycles there as well.
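    To illustrate the kind of change involved, here is a sketch with hypothetical names, not the actual haskell-torrent code. Membership in a plain list is O(n), while membership in a Data.IntSet is O(min(n, W)) for word size W, which is what makes the repeated assertion checks so much cheaper:

    ```haskell
    import qualified Data.IntSet as IS

    -- Hypothetical representation: pieces identified by Int indices.
    -- List version: linear scan on every membership check.
    havePieceList :: [Int] -> Int -> Bool
    havePieceList pieces pn = pn `elem` pieces

    -- IntSet version: near-constant-time membership on Int keys.
    havePieceSet :: IS.IntSet -> Int -> Bool
    havePieceSet pieces pn = pn `IS.member` pieces

    main :: IO ()
    main = do
      let ps = IS.fromList [0, 3, 7]
      print (havePieceSet ps 3)  -- True
      print (havePieceSet ps 5)  -- False
    ```

    Both functions answer the same question; the data structure swap is invisible to callers but shows up directly in the profiler.
    
    
    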

    Listen sockets

    HaskellTorrent now accepts incoming connections! It does so on port 1579, which has no special connotation apart from being one of the fairly low prime numbers. The etorrent project uses 1729, which has a more interesting history associated with it. Of course, you have to open the eventual NAT/PAT or firewall to get connections flowing in, should you want to test it.
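    The shape of such a listen loop, as a minimal sketch using the standard Network.Socket API from the network package (this is not the actual haskell-torrent code, and names like `listenPort` are mine):

    ```haskell
    import Control.Monad (forever)
    import Network.Socket

    listenPort :: PortNumber
    listenPort = 1579

    main :: IO ()
    main = do
      sock <- socket AF_INET Stream defaultProtocol
      setSocketOption sock ReuseAddr 1
      bind sock (SockAddrInet listenPort 0)  -- host address 0 = all interfaces
      listen sock 5
      forever $ do
        (conn, peer) <- accept sock
        putStrLn ("Incoming connection from " ++ show peer)
        -- in the real client, conn would be handed off to a peer process;
        -- here we just drop it again
        close conn
    ```

    The real client naturally forks off per-peer handling instead of closing the connection immediately.
    
    
    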

    test-framework

    Finally, we now use the excellent test-framework by Max Bolingbroke. The test integration was inspired a lot by Eric Kow’s blog post on the subject, and it also uses bits and pieces from Real World Haskell.
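    For readers unfamiliar with test-framework: it lets you mix QuickCheck properties and HUnit cases in one runner. A small self-contained sketch, not taken from haskell-torrent itself:

    ```haskell
    import Test.Framework (defaultMain, testGroup)
    import Test.Framework.Providers.HUnit (testCase)
    import Test.Framework.Providers.QuickCheck2 (testProperty)
    import Test.HUnit (assertEqual)

    -- A QuickCheck property: reversing twice is the identity.
    prop_reverseReverse :: [Int] -> Bool
    prop_reverseReverse xs = reverse (reverse xs) == xs

    main :: IO ()
    main = defaultMain
      [ testGroup "reverse"
          [ testProperty "reverse-reverse/id" prop_reverseReverse
          , testCase "reverse-singleton"
              (assertEqual "singleton unchanged" [1 :: Int] (reverse [1]))
          ]
      ]
    ```

    The runner prints a pass/fail summary much like the transcript below.
    
    
    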

    The bottom line is that now you can execute tests directly:

    jlouis@illithid:~/Projects/haskell-torrent$ make test
    runghc Setup.lhs build
    Preprocessing executables for HaskellTorrent-0.0...
    Building HaskellTorrent-0.0...
    runghc Setup.lhs test
    Test test-framework:
    reverse-reverse/id: [OK, passed 100 tests]
    Protocol/BCode:
    QC encode-decode/id: [OK, passed 100 tests]
    HUnit encode-decode/id: [OK]
    Protocol/Wire:
    Piece (-1) 1 ""
    QC encode-decode/id: [Failed]
    Falsifiable with seed 2776559770653812966, after 1 tests. Reason: Falsifiable
    
           Properties  Test Cases  Total
    Passed  2           1           3
    Failed  1           0           1
    Total   3           1           4
    

    though it does seem we need to do some work in order to correct the software :)


About Me
What this is about
I am jlouis. Pro Erlang programmer. I hack Agda, Coq, Twelf, Erlang, Haskell, and (Oca/S)ML. I sometimes write blog posts. I enjoy beer and whisky. I have a rather kinky mind. I also frag people in Quake.