1. One interesting view of Erlang is that it is not really about functional programming that much. With the right kind of glasses on, the functional programs are just what goes on inside the processes of an Erlang program. That may be interesting, but as an outside user of a process you cannot tell what the inside looks like.

    The program might be imperative for that matter. And it turns out that this isolation between the process and the messages it communicates is central to Erlang. The fact that such a process is an actor is not really that interesting either. You could have made a channel-based message passing system, which is closer to the heritage of the Pi-calculus (and before it, CSP), and I don't think it would have changed much on the outside. Inside, there are reasons for this choice: it eases the programming of an Erlang process and dictates a specific style of programming. But we cannot see it unless we know the concrete implementation of the process.

    It turns out that Erlang is not about RPC either. A tip I got from Steve Vinoski at this year's Erlang User Conference (2012) was to go back in time to some of the original RFCs - 707 and 708. These two RFCs, both written by J. E. White, contain deeper insight than one might expect. They are from before my time as well: 1975 and 1976. It turns out that sometimes we forget our past and our history.

    This leads to a discussion on protocols I had with Joe Armstrong. He and I think that protocols are a key part of systems, not just systems written in Erlang, and I would like to emphasize this point. When you have two or more processes communicating in an Erlang program, you have defined a protocol between those processes. As in RFC 707, both parties can act as client or server - initiating requests and receiving replies from the peer(s) they are communicating with. The similarities between Erlang message passing and those RFCs are deeply interesting. Getting a clear understanding of what your protocols mean is very valuable when designing your system. The processes in the system are not as interesting as the protocols. If the processes carry out the work, the protocols are there to orchestrate and coordinate.

    Your typical protocol consists of two parts. First, there is the syntax of the protocol. This explains which messages you are allowed to send, their format, and what you can receive. Second, there are the semantics: rules governing when a message is valid to send or receive, and an explanation of what a message means. Many protocols focus too much on the syntax and almost exclude the semantics. To make a better internet, we need to change that. And we can begin with our Erlang programs. By formulating the protocol semantics as well, one can often arrive at simpler and more succinct protocols with fewer moving parts. They tend to be easier to implement and easier to extend. To boot, their simple construction often makes for fewer errors in the code.

    Protocols are important because they standardize and abstract. First of all, they set a standard on which everyone builds: HTTP, TCP/IP, the BitTorrent wire protocol, DNS. The neat thing about the standard is that I can write a webserver in Erlang and have it communicate with a browser written mostly in C++. I do not have to worry about the implementation language, the concrete implementation design and a whole lot of other small things. The only reason it works is the standard. Second, the protocol abstracts details for me. The fact that I can choose to hide the implementation from the rest of the world makes it possible to interoperate in a seamless manner. It is a key feature which drove the internet to where it is.

    One particularly daunting protocol is IP (version 4 or 6). It can be implemented in, perhaps, 500 lines of Erlang code. Yet it underpins all communication you make on the internet. The cost/benefit ratio of those 500 lines of code is low. Tremendously low. IPv4, IPv6, 500 lines of Erlang is enough. Everything today uses this protocol as a basis - so it looks like we hit the jackpot and got the right kind of abstraction. Note that IP chooses to leave a lot of problems unsolved. It standardizes what can be solved instead. If your Erlang protocols can be as succinct and simple, then you have a good basis for building nice reusable systems.

    A way of looking at an Erlang program is to forget about the details inside the processes and describe the protocol of communication instead. You describe what will happen when a process receives a certain message. As an example, a BitTorrent client may have a message you can send to the IO layer: `{read, K, Tag}`. The semantics are that the IO subsystem will read piece `K` from the underlying stable storage and respond with either `{ok, Tag, Data}` or `{error, Tag, Reason}`. The tag acts as a unique reference so we can match up a particular read with its response. Tagging like this is also part of the call semantics of the standard OTP `gen_server` behaviour.
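
    As a minimal sketch, the read protocol could look like this in code. The module name `io_proto_sketch`, the map standing in for stable storage, the `no_such_piece` reason, and the choice to wrap the request as `{From, {read, K, Tag}}` so the IO process knows where to reply are all assumptions made for illustration, not how any particular client does it.

    ```erlang
    -module(io_proto_sketch).
    -export([start/1, read/3, demo/0]).

    %% Start a hypothetical IO process. Pieces is a map from piece index to
    %% binary data and merely stands in for stable storage.
    start(Pieces) ->
        spawn(fun() -> loop(Pieces) end).

    %% Send a read request. The caller's pid travels with the message (an
    %% assumption of this sketch) so the IO process knows where to reply.
    read(Io, K, Tag) ->
        Io ! {self(), {read, K, Tag}},
        ok.

    %% The protocol as seen from the inside: every read is answered with
    %% either {ok, Tag, Data} or {error, Tag, Reason}.
    loop(Pieces) ->
        receive
            {From, {read, K, Tag}} ->
                case maps:find(K, Pieces) of
                    {ok, Data} -> From ! {ok, Tag, Data};
                    error      -> From ! {error, Tag, no_such_piece}
                end,
                loop(Pieces)
        end.

    %% Two reads, matched to their replies by tag.
    demo() ->
        Io = start(#{0 => <<"piece zero">>}),
        Tag1 = make_ref(),
        Tag2 = make_ref(),
        ok = read(Io, 0, Tag1),
        ok = read(Io, 7, Tag2),
        R1 = receive {ok, Tag1, Data} -> {ok, Data} after 1000 -> timeout end,
        R2 = receive {error, Tag2, Reason} -> {error, Reason} after 1000 -> timeout end,
        {R1, R2}.
    ```

    Calling `io_proto_sketch:demo()` should return `{{ok, <<"piece zero">>}, {error, no_such_piece}}`. The caller only ever sees the three message shapes, so `loop/1` could be swapped for something reading real files without the client noticing.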

    Note that we have not said anything about how reading from stable storage is implemented. We have only described the behaviour of the process to the outside world. This is a nice abstraction, since we are now free to implement or reimplement the IO subsystem without ever changing the protocol. Also note that we are not implementing a function call, nor are we doing "RPC" as it has become. The IO subsystem is free to carry out other requests before ours. It is also allowed to serve another process in between our request and the response. Or it may choose to interleave several such requests. Or perhaps the caller may issue multiple read requests and then wait for the responses to arrive one by one. The IO subsystem might also send a message `{cached, K}` signifying that a piece was recently cached in memory, so serving that one will not require a disk seek. The IO subsystem may also crash, in which case there will never be a reply - a case we must handle with either a timeout or a monitor. None of these cases is captured by function call semantics.
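
    To make the difference from a function call concrete, here is a hedged sketch of a caller that issues several reads up front and then collects the answers in whatever order they arrive, with a monitor and a timeout covering the crash case. The module `read_many_sketch`, the `{From, {read, K, Tag}}` wrapping, the stand-in `io_loop/1` and the shape of the return values are assumptions of mine. It uses the `is_map_key/2` guard, so it assumes OTP 21 or later.

    ```erlang
    -module(read_many_sketch).
    -export([read_many/3, demo/0]).

    %% Issue one read per piece index up front, then collect the replies in
    %% whatever order they arrive. A monitor covers the case where the IO
    %% process dies and no reply will ever come.
    read_many(Io, Ks, Timeout) ->
        MRef = erlang:monitor(process, Io),
        Pending = maps:from_list([send_read(Io, K) || K <- Ks]),
        Result = collect(Pending, MRef, Timeout, #{}),
        erlang:demonitor(MRef, [flush]),
        Result.

    send_read(Io, K) ->
        Tag = make_ref(),
        Io ! {self(), {read, K, Tag}},
        {Tag, K}.

    collect(Pending, _MRef, _Timeout, Acc) when map_size(Pending) =:= 0 ->
        {ok, Acc};
    collect(Pending, MRef, Timeout, Acc) ->
        receive
            {ok, Tag, Data} when is_map_key(Tag, Pending) ->
                {K, Rest} = maps:take(Tag, Pending),
                collect(Rest, MRef, Timeout, Acc#{K => {ok, Data}});
            {error, Tag, Reason} when is_map_key(Tag, Pending) ->
                {K, Rest} = maps:take(Tag, Pending),
                collect(Rest, MRef, Timeout, Acc#{K => {error, Reason}});
            {cached, _K} ->
                %% Purely informational, so this caller just ignores it.
                collect(Pending, MRef, Timeout, Acc);
            {'DOWN', MRef, process, _Pid, Reason} ->
                {error, {io_down, Reason}, Acc}
        after Timeout ->
            %% The timeout counts from the last handled message, not from the start.
            {error, timeout, Acc}
        end.

    %% Stand-in IO process: announces a cached piece, then answers the read.
    io_loop(Pieces) ->
        receive
            {From, {read, K, Tag}} ->
                case maps:find(K, Pieces) of
                    {ok, Data} ->
                        From ! {cached, K},
                        From ! {ok, Tag, Data};
                    error ->
                        From ! {error, Tag, no_such_piece}
                end,
                io_loop(Pieces)
        end.

    demo() ->
        Io = spawn(fun() -> io_loop(#{1 => <<"a">>, 2 => <<"b">>}) end),
        Good = read_many(Io, [1, 2, 3], 1000),
        exit(Io, kill),
        Bad = read_many(Io, [1], 1000),
        {Good, Bad}.
    ```

    In `demo/0` the first call collects all three answers, while the second runs against a killed process and comes back as `{error, {io_down, Reason}, #{}}` instead of hanging. The caller never blocks on one particular reply, which is exactly what function call semantics cannot express.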

    A protocol should have room to "wiggle". If you look at the definition of the TCP protocol, you will come to the realization that it is specified at just the right level. It is not underspecified, so differing implementations are able to use it as a communication medium. But it is not overspecified either, so implementations are free to interpret the protocol in different ways. A good example is that it is possible to implement TCP as a stop-and-wait protocol without windows, and any conformant TCP implementation would understand it. This allows us to build a simple version of TCP, make it work, and then go improve it. Or it allows us to write a simple TCP/IP stack for a small embedded device where code size constraints reign.

    We can do the same with Erlang protocols. Build simple protocols, but allow for a certain amount of wiggling in their implementation. It makes our systems more extensible in the long run, and it opens up an avenue for improving the implementation later without having to redefine the whole system. Extensibility is usually achieved by making the protocol composite. JSON is winning because it is simple and because protocols designed on top of it are automatically extensible. The same is true in Erlang, since Erlang terms are by definition extensible.
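
    One way to read "composite" is that a message carries a structured term, and the receiver picks out the parts it understands. The sketch below assumes a made-up `{hello, Props}` message with a map of properties; the module name and the keys `version`, `compression` and `priority` are hypothetical.

    ```erlang
    -module(extensible_sketch).
    -export([handle_hello/1, demo/0]).

    %% Interpret a hypothetical hello message. Only the keys this version
    %% understands are read; anything else in the map is silently ignored,
    %% and missing keys fall back to a default.
    handle_hello({hello, Props}) when is_map(Props) ->
        Version = maps:get(version, Props, 1),
        Compression = maps:get(compression, Props, none),
        {ok, #{version => Version, compression => Compression}};
    handle_hello(_Other) ->
        {error, bad_hello}.

    demo() ->
        Old = {hello, #{version => 1}},
        %% A newer peer adds a priority key this receiver has never heard of.
        New = {hello, #{version => 2, compression => zlib, priority => high}},
        [handle_hello(Old), handle_hello(New)].
    ```

    Both messages are accepted: the old peer gets defaults, and the new peer's unknown `priority` key does no harm. Adding a field does not require touching receivers that do not care about it.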

    A good example of a bad extension design comes from the original BitTorrent protocol. In the handshake, there is a 64 bit value which is a bitstring of 64 possible extensions. The problem is that an extension then requires central coordination, since everyone has to agree that bit X means Y. The problem was fixed by using one bit to signify that a new kind of message is valid. This new message contains a JSON-like (bencoded) structure which in turn describes what extensions are understood. Now everyone can add extensions as they like in an ad-hoc fashion.
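
    For illustration, checking that one bit in the handshake is a single binary pattern match in Erlang. This is a sketch under assumptions: the module name is made up, and the bit position (bit 20 counted from the right of the 64 reserved bits, i.e. `16#10` in reserved byte 5) is how I read the extension protocol specification, BEP 10, so verify it against the spec before relying on it.

    ```erlang
    -module(bt_handshake_sketch).
    -export([parse/1, demo/0]).

    %% Handshake layout: length byte 19, the protocol string, 64 reserved
    %% bits, a 20 byte info hash and a 20 byte peer id.
    parse(<<19, "BitTorrent protocol", Reserved:64,
            InfoHash:20/binary, PeerId:20/binary>>) ->
        %% Assumed position of the "extension messages are valid" bit.
        Extended = (Reserved band (1 bsl 20)) =/= 0,
        {ok, #{extended => Extended, info_hash => InfoHash, peer_id => PeerId}};
    parse(_Other) ->
        {error, bad_handshake}.

    demo() ->
        Hash = binary:copy(<<1>>, 20),
        Peer = binary:copy(<<2>>, 20),
        With = <<19, "BitTorrent protocol", 0,0,0,0,0,16#10,0,0,
                 Hash/binary, Peer/binary>>,
        Without = <<19, "BitTorrent protocol", 0:64,
                    Hash/binary, Peer/binary>>,
        {ok, #{extended := E1}} = parse(With),
        {ok, #{extended := E2}} = parse(Without),
        {E1, E2, parse(<<"garbage">>)}.
    ```

    `demo/0` should return `{true, false, {error, bad_handshake}}`. Everything beyond that single bit is then negotiated inside the bencoded structure, which needs no central registry.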

    Another good example of bad design is the server protocol for the game Minecraft. It has undergone several iterations, but the newest one has a specific packet (0x11) for "Use Bed", and another (0x47) for a thunderbolt striking in the game world. All messages are flat where a tree structure ought to have been used. And packets do not have a "length" field, so in order to determine the length of a packet you need to do a partial decode. The latter also has the interesting consequence that if you don't understand a packet you can only abort - since it is impossible to skip ahead in the packet stream to the next packet.
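
    By contrast, here is a sketch of what a length field buys you. The frame layout (16 bit big-endian payload length, 8 bit type, payload), the type numbers and the module name are all invented for this example and have nothing to do with the real Minecraft wire format.

    ```erlang
    -module(framing_sketch).
    -export([encode/2, decode_all/1, demo/0]).

    %% Hypothetical frame: 16 bit payload length, 8 bit type, then the payload.
    encode(Type, Payload) when is_binary(Payload), byte_size(Payload) < 65536 ->
        <<(byte_size(Payload)):16, Type:8, Payload/binary>>.

    %% Decode every complete frame in the buffer. Returns the decoded
    %% messages plus whatever incomplete tail is left over.
    decode_all(Bin) ->
        decode_all(Bin, []).

    decode_all(<<Len:16, Type:8, Payload:Len/binary, Rest/binary>>, Acc) ->
        decode_all(Rest, [interpret(Type, Payload) | Acc]);
    decode_all(Incomplete, Acc) ->
        {lists:reverse(Acc), Incomplete}.

    interpret(1, Payload) -> {chat, Payload};
    interpret(2, Payload) -> {ping, Payload};
    %% Unknown type: the length field already let us step over the payload.
    interpret(Type, _Payload) -> {skipped, Type}.

    demo() ->
        Stream = <<(encode(1, <<"hi">>))/binary,
                   (encode(99, <<"from the future">>))/binary,
                   (encode(2, <<>>))/binary,
                   0, 5, 1, "ab">>,   % truncated frame: claims 5 bytes, has 2
        decode_all(Stream).
    ```

    `demo/0` should return `{[{chat, <<"hi">>}, {skipped, 99}, {ping, <<>>}], <<0,5,1,97,98>>}`. The unknown packet type 99 is skipped without aborting, and the truncated frame simply waits for more bytes.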

    The above is part of Jon Postel's principle/law: be conservative in what you do, be liberal in what you accept from others. That is, always conform to the protocol, especially when you send data - but if you receive something you don't understand, skip it. This assumes that the other party may speak a newer version of the protocol than you do, and you should design accordingly. A take-away, though, is that browsers parsing HTML were too lenient. If the HTML had an obvious parse error, then rather than producing an empty page the parser tried to fix the problem. The key here is that you should only be liberal as long as the message is still meaningful to you. Wrong HTML isn't. This, and the fact that HTML is not an extensible protocol medium, create so many problems today.

    Another important part of protocol design is reliability. If we send a message, are we guaranteed that it arrives? I almost always design my protocols for message delivery failure. Message delivery is not reliable. On a local machine we may think that the Erlang process we are sending to is really there. But if that other process just crashed, there will be no process to receive our message. And here is the problem: if that process received the message, operated on it and then crashed, we do not know whether or not our message was processed. In general, I tend to follow the principle of IP: it worked for the internet, so let me design my protocols around that principle as well.

    The only way to solve this problem is by designing protocols around the principle of "best effort delivery". If the `{read, K, Tag}` command fails to produce an answer within 10 seconds, we can assume the IO subsystem crashed and restarted. Since a read is idempotent (upon success), we can just issue the read request again. The read is in fact nullipotent, which is slightly stronger. The whole trick is that by accepting failure, your protocol design can cope with it.
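
    A hedged sketch of that retry logic follows. It assumes the IO subsystem is reachable under a registered name (so a restarted process is found again), that requests are wrapped as `{From, {read, K, Tag}}`, and that giving up after a fixed number of attempts is acceptable. The module, the registered name `retry_sketch_io` and the `flaky_io/2` stand-in are invented for the example, and the demo uses a 50 ms timeout rather than 10 seconds to keep it quick.

    ```erlang
    -module(retry_sketch).
    -export([read_with_retry/4, demo/0]).

    %% Retry a read until it is answered or we run out of attempts. Safe only
    %% because the read has no side effects on the IO subsystem.
    read_with_retry(_IoName, _K, _Timeout, 0) ->
        {error, gave_up};
    read_with_retry(IoName, K, Timeout, Retries) when Retries > 0 ->
        %% A fresh tag per attempt: a late reply to an earlier attempt will
        %% not be mistaken for this one (it stays in the mailbox, though).
        Tag = make_ref(),
        %% Sending to an unregistered name raises badarg, hence the catch.
        catch IoName ! {self(), {read, K, Tag}},
        receive
            {ok, Tag, Data}      -> {ok, Data};
            {error, Tag, Reason} -> {error, Reason}
        after Timeout ->
            read_with_retry(IoName, K, Timeout, Retries - 1)
        end.

    %% Stand-in IO process that silently drops the first Drop requests,
    %% simulating a subsystem that was down for a while.
    flaky_io(0, Pieces) ->
        receive
            {From, {read, K, Tag}} ->
                case maps:find(K, Pieces) of
                    {ok, Data} -> From ! {ok, Tag, Data};
                    error      -> From ! {error, Tag, no_such_piece}
                end,
                flaky_io(0, Pieces)
        end;
    flaky_io(Drop, Pieces) when Drop > 0 ->
        receive
            {_From, {read, _K, _Tag}} -> flaky_io(Drop - 1, Pieces)
        end.

    demo() ->
        Pid = spawn(fun() -> flaky_io(2, #{0 => <<"data">>}) end),
        true = register(retry_sketch_io, Pid),
        Recovered = read_with_retry(retry_sketch_io, 0, 50, 5),
        exit(Pid, kill),
        Dead = read_with_retry(retry_sketch_io, 0, 50, 2),
        {Recovered, Dead}.
    ```

    In `demo/0` the first read succeeds on the third attempt, and the second gives up because nothing is listening anymore. The result should be `{{ok, <<"data">>}, {error, gave_up}}`. A write would need more care, since repeating it is only safe if the protocol makes it idempotent.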

    Distributed systems have the problem to a far greater extent: the network can fail, and we cannot trust the machine at the other end to behave either. So we must design our protocol for failure. There is no other way. Note that almost all modern programs are distributed systems. While we expect the error rate to be fairly low, we cannot by any means guarantee that no errors will occur. Thus our design should assume a low but nonzero error rate, perhaps 1%. This means it is okay if error handling is rather expensive: it happens rarely. It just so happens in practice that distribution and unreliability go hand in hand. It would be foolish to make a design around full reliability. To me, any concurrency design which assumes no network or subsystem failure is outright dangerous. The whole system must function to give full availability, yet independent parts may fail. Probability will be against us in this situation: the larger the system, the less likely it is to run without trouble.
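
    A back-of-the-envelope calculation shows how quickly this bites. Assume, purely for illustration, that a system has $n$ parts, that each part fails independently with probability $p$ during some period, and that the system needs all of them. Then:

    $$
    P(\text{no part fails}) = (1 - p)^n
    $$

    With $p = 0.01$, ten parts give $0.99^{10} \approx 0.90$, while a hundred parts give $0.99^{100} \approx 0.37$. Real failures are rarely independent, so treat this as a rough model, but the trend is the point: a 1% error rate per part turns into trouble most of the time once the system is large.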

    In the kingdom of distribution, unreliability is reigning king. He who is an apostate embraces the fact: guarantees are not discrete anymore. There is a fuzzy factor and a risk of something failing, however small. You must account for this in your protocol design - local or distributed. The harness in Erlang, to tame the beast, is fault-tolerant code. The toolbox of supervision, linked processes and valid state isolation are all there to help you with handling unreliability in your protocol design.

    The fuzziness is currently changing the world of computing as we know it. Multicore is but one problem we face. The fact of distribution and failure is another dragon we have to slay. For instance, the CAP theorem is a direct consequence: consistency is not a discrete entity anymore.

    You need fault-tolerance in a modern world of distribution and protocols. There is no way around it.

About Me
I am jlouis. Pro Erlang programmer. I hack Agda, Coq, Twelf, Erlang, Haskell, and (Oca/S)ML. I sometimes write blog posts. I enjoy beer and whisky. I have a rather kinky mind. I also frag people in Quake.