1. In UNIX there is a specific error number which can be returned from system calls. This error, EAGAIN, is used by the OS kernel whenever it is in a complex state in which it is deemed too hard to resolve a proper answer for the userland application. The solution is almost a non-solution: you punt the context back to the user program and ask it to go again and retry the operation. The kernel then gets rid of the complex state, and the next time the program enters the kernel we can be in another state without the trouble.

    Here is an interesting psychological point: we can use our code to condition another person's brain to cook up a specific program that serves our purpose. That is, we can design our protocols such that they force users to adopt certain behaviour in their programs. One such trick is deliberate fault injection.

    Say you are serving requests through an HTTP server. Usually, people would imagine that 200 OK is what should always be returned on successful requests, but I beg to differ. Sometimes—say 1 in 1000 requests—we deliberately fail the request. We return a 503 Service Unavailable back to the user. This conditions the user to write error-handling code for this request early on. You can't use the service properly without handling this error, since it occurs too often. You can even add a "Retry-After" header and have the client go again immediately.
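    As a rough sketch, deliberately failing a small fraction of requests can be as simple as the following. This is a hedged example: the {Status, Headers, Body} handler convention and handle_request/1 are made up for illustration, not any particular framework's API.

    ```erlang
    %% Hedged sketch: deliberately fail roughly 1 in 1000 requests before
    %% doing any real work. handle_request/1 and the {Status, Headers, Body}
    %% return convention are hypothetical stand-ins.
    -module(fault_inject).
    -export([maybe_handle/1]).

    maybe_handle(Req) ->
        case rand:uniform(1000) of
            1 ->
                %% Deliberate fault: tell the client to retry shortly.
                {503, [{"Retry-After", "1"}], <<"Service Unavailable">>};
            _ ->
                handle_request(Req)
        end.

    %% Stand-in for the real request handler.
    handle_request(_Req) ->
        {200, [], <<"OK">>}.
    ```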

    This deliberate fault injection has many good uses.

    • First, it forces users of your service to adopt a more biological and fault-tolerant approach to computing. Given enough of this kind of conditioning, programmers will automatically begin adding error-handling code to their requests, because otherwise their code may not work.
    • Second, it gives you options in case of accidents: say your system is suddenly hit by an emergency which elevates the error rate to 10%. This has no practical effect, since your users are already able to handle the situation.
    • Third, you can break conflicts by rejecting one or both requests.
    • Fourth, you can solve some distribution problems by failing the request and having the client retry.
    • Fifth, simple round-robin load balancing is now useful. If you hit an overloaded server, you just return 503 and the client will retry another server.
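    On the client side, the fifth point above could look roughly like the following hedged sketch; request/2 is a hypothetical stand-in for a real HTTP client call.

    ```erlang
    %% Hedged sketch: client-side retry across a list of equivalent servers.
    %% request/2 is a hypothetical function performing one HTTP request
    %% against a given server, returning {StatusCode, Body}.
    -module(retry_client).
    -export([request_any/2]).

    request_any([], _Req) ->
        {error, all_servers_unavailable};
    request_any([Server | Rest], Req) ->
        case request(Server, Req) of
            {503, _Body} ->
                %% Overloaded or deliberately failed: try the next server.
                request_any(Rest, Req);
            {Status, Body} ->
                {ok, Status, Body}
        end.

    %% Stand-in for a real HTTP client call.
    request(_Server, _Req) ->
        {200, <<"OK">>}.
    ```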


    I have a hunch that Amazon Web Services uses this trick. Against S3, I've seen an error rate suspiciously close to 1/500. It could be their own way of implementing a chaos monkey, conditioning all their users to write code in a specific way.

    The trick is also applicable in a lot of other contexts. Almost every protocol has some point where you can deliberately inject faults in order to make clients behave correctly. It is very useful in testing as well: use QuickCheck to randomly generate requests and let a certain fraction be totally wrong. These wrong requests must then be rejected by the system. Otherwise something is wrong with it.
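    Here is a hedged sketch of what such a test could look like, written with PropEr (an open-source QuickCheck relative); my_service:handle/1 and the request shapes are made-up names for illustration.

    ```erlang
    %% Hedged sketch of a QuickCheck-style property, written against PropEr.
    %% my_service:handle/1 and the request shapes are hypothetical.
    -module(fault_props).
    -include_lib("proper/include/proper.hrl").
    -export([prop_garbage_is_rejected/0]).

    %% Roughly one in ten generated requests is garbage on purpose.
    request() ->
        frequency([{9, {get, binary()}},
                   {1, {bogus, binary()}}]).

    prop_garbage_is_rejected() ->
        ?FORALL(Req, request(),
            case my_service:handle(Req) of
                %% Rejections are always acceptable; a success is only
                %% acceptable for a well-formed request.
                {error, _Reason} -> true;
                {ok, _Result}    -> element(1, Req) =:= get
            end).
    ```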

    More generally, this is an example of computer programs being both formal and chaotic at the same time. One can definitely find interesting properties of biological processes to copy into computer systems. While it is nice to be able to prove that your program is correct, the real world is filled with bad code, faulty systems, breaking network switches and so on. Reacting to this by making your system robust to smaller errors is definitely going to be needed. Especially in the longer run, where programs will become even more complex and communicate even more with other systems; systems over which you have no direct control.

    You can see fault-injection as a type of mutation. The programs coping with the mutation are the programs which should survive in the longer run.

    Consider hacking the brains of your fellow programmers, and force them to write robust programs by conditioning their minds into doing so.

    Thanks to DeadZen for proof-reading and comments.


  2. I am no Alan Jay Perlis, nor am I really worthy.

    • Function parameters fornicate. If you have 7, they will quickly breed to 14.
    • Any "new" idea which a person thinks about has a 98% chance of having been researched better and more deeply before 1980. Thus most new ideas aren't.
    • Age rule for the young: If a concept is older than you and still is alive you must understand it. If it is in hibernation it may come back again. If there is no trace of it - some bozo is about to reinvent it.
    • Dynamic typing is a special case of Static typing.
    • Beware the scourge of boolean blindness.
    • Prefer persistence over ephemerality.
    • The program which can be formally reasoned about is usually the shortest, the correct and the fastest.
    • "We will fix it later" - later never occurs.
    • Project success is inversely proportional to project size.
    • Code not written sometimes has emergent behaviour in the system. Either by not having bugs or by executing invisible code infinitely fast in zero seconds.
    • Your portfolio of closed source projects doesn't exist.
    • Version control or doom.
    • Around the year 1999 the number of programmers increased 100-fold. The skill level didn't.
    • Program state is contagious. Avoid like the plague.
    • Business logic is a logic. Inconsistent logic?

    • 0.01: The fraction of human beings who can program
    • 0.001: The fraction of human beings who can program concurrently
    • 0.0001: The fraction of human beings who can program distributed systems



    • If your benchmark shows your code to be an order of magnitude faster than the established way, you are correct. For the wrong problem.
    • Debugging systems top-down is like peeling the onion inside-out.
    • A disk travelling on the back of army ants has excellent throughput but miserable latency. So have many Node.js systems.
    • Beware of the arithmetic mean. It is a statistic, and usually a lie.
    • Often, speed comes with a sacrifice of flexibility on the altar of complexity.
    • Sometimes correctness trumps speed. Sometimes it is the other way around.
    • Optimal may be exponentially more expensive to compute than the 99th percentile approximation.



    • The programmer is more important than the programming language
    • Programming languages without formal semantics are akin to a dumping ground. The pearls are few and far between.
    • The brain is more important than the optimizing compiler
    • The tools necessary for programs of a million lines of code are different than those for 1000 lines.
    • Specializing in old tools contains the danger of ending as an extinct dinosaur.
    • Like the introduction of 'null', Object Oriented Programming is a grave mistake.
    • The string is heaven because it can encode anything. The string is hell because it can encode anything.



    • Idempotence is your key to network programming.
    • Protocol design is your key to network programming.
    • Sun RPC is usually not the solution. Corollary: HTTP requests neither.
    • Your protocol must have static parts for structure and dynamic parts for extension.
    • Only trust systems you have control over and where you can change the behaviour.
    • If a non-programmer specifies a distributed system, they always violate the CAP theorem.
    • In a distributed system, the important part is the messages. What happens inside a given node is uninteresting. Especially what programming language it is written in.
    • A distributed system can have more failure scenarios than you can handle. Trying is doom.
    • The internet has a failure rate floor. If your system has a failure rate underneath it, you are error-free to the customer.
    • If your system does a million $100 requests a year, a failure rate of 10 requests per year is not worth fixing.
    • If your system employs FIFO queues, latency can build up. Bufferbloat is not only in TCP.
    • Beware the system overload situation. It is easier to reject requests than handle them. You need back-pressure to inform.




  3. A very common thing that crops up now and then is the question in the title: what is fastest for editing text, the keyboard or the mouse? The most often quoted answer is an older "Ask Tog" series of articles[1a, 1b, 1c]. They come up again and again in these discussions, and then the keyboardists battle it out against the mouse zealots.

    Since I have been working in most of the "grand" editors out there, Emacs and vi(m), for years, I think I do have something to say about this subject. Currently, I am writing this blog post, and doing most of my coding, in the acme(1) editor[2]. Acme is often seen as a very mouse-centered editor, but there is more to the game than just being a mouse editor.

    First of all, what keyboard shortcuts does acme(1) understand? It understands 5 commands in total. Let ^ stand for the control character. Then it understands ^A and ^E, which move the cursor to the start and end of the line respectively. It understands ^H, which deletes the character before the cursor (backspace), and ^W, which kills a whole word. Finally it understands ^U, which deletes from the cursor to the start of the line. The very reason for supporting these shortcuts is that they are very deeply rooted in UNIX. A lot of systems understand these commands, and when entering text at length they are very nice to have available. I guess I am a boring typist, because when I see I have written a word incorrectly, I often just kill the whole word and type it again. The shortcut ^W is quickly typed with the left hand on a QWERTY-style keyboard.

    Secondly, and I think this is a very important point, acme(1) has a command language stemming from the sam(1) editor. The mouse may be used often, but if you are to change every occurrence of 'foo' into 'bar' you just execute the command "Edit , s/foo/bar/g". This is almost like in vi. I don't think anybody would argue that, for a large piece of text, manually going through and editing it would be faster. The reason is that we are programming the editor. We are writing a program which carries out the task for us, and the cognitive overhead of doing so is smaller than being the change-monkey. In the command, the comma is shorthand for "all of the file's lines". What if we only wanted the change in the 2nd paragraph of the text? In acme(1) you can just select that text and then execute "Edit s/foo/bar/g", which narrows the editing to the selection only. As you go from "program" to "specific" editing, the mouse and the spatial user interface make it faster and faster.

    The [1c] reference has a task which is trying to prove a point. A piece of text needs, essentially, "Edit s/\|/e/g": replacing every '|' with an 'e'. The program above is clearly the fastest way to do it for large texts, and you don't even have to think about that program when you know the editor. But the time it takes to find each letter and replace it by hand is subject to the cognitive overhead the article talks about. It adds up when you are doing lots of these small edits all day.

    For editing source code, a peculiar thing happens. I often grab the mouse and then more or less stay on the mouse. Note that acme has the usual ability to cut and paste text with the mouse alone. You don't need the keyboard for this. It means that you can do a lot of text surgery with the mouse alone. Since you can select the end-of-line codepoint, you can easily reorder lines, including indentation. Often, renaming variables happens on the mouse alone. Also, the mouse has some hidden tricks. Double-clicking right next to a parenthesis '(' selects all text up to the matching ')'. The same goes for quotes. It allows you to quickly cut out parts of your structured code and replace them with other code.

    Then there is text search. When writing large bodies of programs, you will often end up searching for text more than editing text. Searching is a quest for the thing you need to find. Since the mouse in acme(1) has search on a right click by design, most text can be clicked to find the next occurrence you need to consider. A more complex invocation is through the "plumber", which understands the context of the text being operated upon. A line like "src/pqueue.erl:21:" is understood, on a right click, as "open the file src/pqueue.erl and go to line 21". Combine this with a command like "git grep -n foo" in a shell window and you can quickly find what you are looking for. I often use the shell as my search tool and then click on my target line. You can even ask grep to provide context to find the right spot to edit.

    Good editors can be programmed, and a mouse-centered editor is no exception. Apart from the sam(1) built-in command language, you can also write external UNIX programs to pipe text through. I have a helper for Erlang terms, called erlfmt, which will reindent any piece of Erlang nicely. I have the same for JSON structures, since they are often hard to read.

    The thing that makes acme(1) work, though, stems from an old idea by Niklaus Wirth and Jürg Gutknecht[3]: the Oberon operating system. In this operating system, the graphical user interface is a TUI, a textual user interface, in which spatiality plays a big role. Not unlike modern tiling window managers, the system lays out windows next to each other so they never overlap. But unlike the tiling window managers, the interface is purely textual. You can change the menu bars by writing another piece of text there if you want. The same is present in acme(1). You often end up changing your environment into how you want it to look. Since you can "Dump" and "Load" your current environment, each project often ends up with a setup file that holds the configuration for that particular environment. I essentially have one for each project I am working on. In many Erlang projects, there is a shell window where the menu (called the tag in acme(1)-speak) is extended with the command 'make'. This makes it easy to rebuild the project. And errors are reported as "src/file.erl:LINE:" like above, making error correction painless and fast.

    The key is that to make the mouse efficient, you need to build the environment around the mouse. That is, your system must support the mouse directly and make it possible to carry out many things on the mouse alone. It is rather sad to see that most modern editing environments shun so effective an editing tool and remove it entirely from the entering of text. But perhaps the new touch-style interfaces will change that again? Currently their problem seems to be that mobile phones and tablets are not self-hosting: we are not programming them on themselves. That probably has to happen before good programming user interfaces using touch become a possibility. I must admit, though, that the idea of actually touching the 'make' button you wrote down there yourself is alluring.


    [1a] http://www.asktog.com/TOI/toi06KeyboardVMouse1.html
    [1b] http://www.asktog.com/TOI/toi22KeyboardVMouse2.html
    [1c] http://www.asktog.com/SunWorldColumns/S02KeyboardVMouse3.html
    [2] Acme is part of the plan9 port: http://swtch.com/plan9port/
    [3] Note that the original Oberon Native System is living on in the Bluebottle/AOS/A2 system today, see http://en.wikipedia.org/wiki/Oberon_(operating_system) and http://en.wikipedia.org/wiki/Bluebottle_OS


  4. The following were the initial research requirements for Erlang when they set out to investigate a new language for telecom[0] (link at the bottom). They appear in the thesis written by Bjarne Däcker, and I think it would be fun to scribble down my thoughts on the different requirements. My view may very well differ from the original views, since I came into the world of Erlang pretty late.

    Handling of a very large number of concurrent activities

    In a telecom system, or in an internet webserver, many things happen concurrently with each other. While one person is initiating a call, another person may be talking on a line while a third caller is trying to set up a conference call between 4 parties. This requires you to be able to operate many things concurrently with each other.

    In a webserver, it is the same thing. While you are taking in new GET requests, somebody is doing a POST somewhere, while another client is getting data through a Server-Sent Events channel[1].

    Note that this is not a requirement for parallelism at all. The only requirement is that we can easily describe such concurrent activities. We don't care if it all executes on a single core.
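    To illustrate, here is a hedged sketch of "a process per ongoing activity", which says nothing about cores; spawning tens of thousands of these on a single scheduler is unremarkable, and parallelism is an optimization the VM may or may not apply.

    ```erlang
    %% Hedged sketch: one Erlang process per ongoing "call". Nothing here
    %% requires parallel execution; it only describes concurrent activities.
    -module(calls).
    -export([start_call/1, call_loop/1]).

    start_call(Id) ->
        spawn(?MODULE, call_loop, [Id]).

    call_loop(Id) ->
        receive
            {digit, D} ->
                io:format("call ~p got digit ~p~n", [Id, D]),
                call_loop(Id);
            hangup ->
                ok
        end.
    ```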

    Actions to be performed at a certain point in time or within a certain time

    For a telecom system, this is quite important. You must be able to handle timing quite precisely. In principle, you would like hard real-time, but in practice soft real-time is often enough.

    But note: this means that you will prefer low latency over system throughput. It is more important that the system begins responding within due time than that it can deliver gigabytes of throughput. Often, latency and throughput are at odds with one another. Getting latency down can hurt throughput and vice versa.

    It also means that your system must focus on being able to run many timers at once and handle all of them precisely. You may be woken up later than the 200 ms you specified, but never before.
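    A hedged sketch of what this looks like at the language level; the 200 ms figure is just the example from above.

    ```erlang
    -module(timers_sketch).
    -export([schedule_ping/0, wait_for_reply/1]).

    %% Hedged sketch: the runtime happily manages very many of these timers.
    schedule_ping() ->
        erlang:send_after(200, self(), ping).

    %% Wait at most 200 ms for a tagged reply. The 'after' clause may fire
    %% later than 200 ms on a loaded system, but never earlier.
    wait_for_reply(Tag) ->
        receive
            {reply, Tag, Result} -> {ok, Result}
        after 200 ->
            {error, timeout}
        end.
    ```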

    Systems distributed over several computers

    This is a requirement for robustness of the system. The interesting thing to note here is that there are two large categories of distributed systems: shared-nothing (SN) and those that are not. While it is highly desirable to have an SN system, it is not always possible. The problem occurs as soon as you need to share state between the given architectures. Many developers attempt to avoid sharing state, for good reasons. But for certain problems, you cannot avoid sharing data. This is where a language with seamless distribution shines.

    Sharing information is very important in a telecom system. A configuration change must eventually be distributed to all end points. If one node goes down, another node must be able to keep on operating. So a telecom system must share some information quickly and cannot be made as an entirely shared-nothing architecture.

    There are other areas where you need to track state, preferably across machines: Computer Game servers, Instant Messaging systems, and Databases are a few such examples. Do also note that every shared-nothing system eventually has a place which shares state. It can be a database deep in the backend which handles multiple requests. It can be a memcached instance. Or a file on disk, even. In any case, few systems share no state.

    Where seamless distribution really rocks is when you need in-memory state. If the disk turns out to be too slow, you need to materialize the thing you are operating on in memory and then periodically checkpoint the state to persistent storage. This is for the case where it becomes too expensive to take a request, load the state from disk, manipulate the state, and then store it back to disk.
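    A hedged sketch of that pattern: the state lives in a process, mutations happen in memory, and a timer periodically flushes a snapshot to disk. The persist/1 function, the file name and the one-minute interval are all stand-ins.

    ```erlang
    %% Hedged sketch: keep state in memory, checkpoint it periodically.
    %% persist/1 is a stand-in for real stable storage.
    -module(checkpointed).
    -export([start/1]).

    start(InitialState) ->
        spawn(fun() ->
                      erlang:send_after(60000, self(), checkpoint),
                      loop(InitialState)
              end).

    loop(State) ->
        receive
            {update, Fun} ->
                loop(Fun(State));                 %% mutate in memory only
            checkpoint ->
                ok = persist(State),              %% periodic flush to disk
                erlang:send_after(60000, self(), checkpoint),
                loop(State)
        end.

    persist(State) ->
        file:write_file("state.checkpoint", term_to_binary(State)).
    ```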

    Interaction with hardware

    In telecom, there are certain operations which are impossible to achieve in software alone. Part of the 3G protocol is to recalculate optimal phone-to-mast configurations once every millisecond. This is impossible to do in software on general-purpose chips. You need to handle it with FPGAs or even purpose-built chips.

    Back in the day, when Erlang was first developed, the problem was probably the need to handle ATM switching hardware from the software layer. It also suggests that efficient handling of binary protocol data is important.

    Very large software systems

    Of course, what constitutes very large is subject to change over the years. But it does yield some thoughts on how to construct a language. In very large software projects, you will have many programmers working on the same code base. They must be able to use each other's code easily. It must also be possible to evolve the code in one end of the system without affecting other ends.

    Compile speed is important; a recompile can't take too long in this setup. Also, it must be easy to construct interfaces that other programmers can use. Note that a major part is battling change over time in the software, where certain parts of the code get manipulated over a period of years. It creates its own slew of problems, since the code must still fit together.

    Another important point when programming-in-the-large is that you need a way to split up a program into packages and pieces. Otherwise, you can't really manage the complexity. You need a way to take different pieces, describe their dependencies and then assemble them into a working system. Preferably, you also want to be able to seamlessly upgrade one part of the software while keeping other parts constant. This suggests that you must be prepared to replace a package at some point in time, without needing to go back and change other parts of the software.
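    In OTP, this splitting shows up as applications: each piece declares its modules and the applications it depends on in an .app resource file, and releases assemble them. A minimal, hypothetical example (names and versions are made up):

    ```erlang
    %% Hedged sketch of an OTP application resource file (myapp.app).
    {application, myapp,
     [{description, "An example application"},
      {vsn, "1.0.0"},
      {modules, [myapp_app, myapp_sup, myapp_worker]},
      {registered, [myapp_sup]},
      {applications, [kernel, stdlib, crypto]},   %% explicit dependencies
      {mod, {myapp_app, []}}]}.
    ```

    OTP's release handling can then upgrade one such application while leaving the others running, which is one answer to the seamless-upgrade wish above.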

    Complex functionality such as feature interaction

    This requirement ties in with the shared-nothing approach from above. In certain systems, like telecom and computer game servers, the different features of the system will interact in intricate ways. You can't use a database for storing this, since the changes must be kept in main memory; otherwise it is too slow. In other words, it is important that the language allows you to write elaborate and complex solutions to problems where different parts of the system interact in non-trivial ways.

    This requirement is very far from the typical web server, where there is only a single interaction chain. A client will talk to a database. Most of the other things happen to be mere glue facilitating this main requirement.

    Continuous operation for many years

    Telecom systems are expected to have long lifetimes. The systems are expected to run for many years without being stopped for maintenance. Hence you need to handle continuous operation of the system. If a fault occurs, you must be able to inspect the fault while the system is running. You can't stop it and have a look at the stopped system. Furthermore, the concurrency constraints mean that you can't really halt the system, since other parts of the system will continue to operate normally.

    It also means that there has to be an upgrade path going forward. When Erlang was designed, it was not clear what kind of system architecture there would be in the future. There were MIPS, Digital Alpha, x86, HP PA-RISC, Sun SPARC, PowerPC and so on. And there were as many different software platforms: OS/2, Windows, UNIX in different incarnations, VxWorks, QNX, NeXT and so on. This may have been the deciding factor in making Erlang into a virtual machine, where ease of portability is more important than execution speed or hardware utilization.

    Software maintenance (reconfiguration, etc.) without stopping the system

    This is a requirement in internet networking equipment as well as in telecom systems. You can't stop a router when you decide to reconfigure it. Also, it means that configuration is not always a static thing you can keep in a configuration file. Some of the configuration may be dynamic in nature and be configured as you go along. Probably, this decision was what led to the incorporation of the mnesia database into Erlang.

    It also means that you need to introspect and upgrade the software while it is running. You can't stop operation just to get the system up again. Luckily, on the internet, we can often get away with some kind of service interruption, if done correctly. In a shared-nothing architecture, we can often roll servers one at a time and thus upgrade the service without anyone noticing. We can do database upgrades by rewriting client code so it can operate on multiple different schemas at a time, and then go upgrade the schema. In schemaless databases, we can even upgrade the schema lazily, in a read-repair fashion, as we read old records.

    Games like Guild Wars 2 employ rolling upgrades by running two versions of the software on the same machine. See for instance the Blue/Green deployment idea described by Martin Fowler, et al.[2]. The idea is that when they upgrade the game, they begin adding new players to the new version while keeping the old version running until the last player leaves the server. Of course they can hint the player to reconnect when the population becomes low. It does mean, however, that players can decide when they want to reconnect. If they are in the middle of something important in the game, they can wait a bit.

    But there are important things to think about here: how do you migrate the state of a player from the old version to the new one, and so on.

    Stringent quality and reliability requirements

    There are certain decisions in Erlang which support these requirements. First, the language uses garbage collection, which eliminates many bugs pertaining to memory management right away. Note that the way garbage collection is handled in Erlang means that GC pauses are usually extremely short-lived and thus never a problem for latency.

    Second, the language is very functional. Only a few parts operate in an imperative way, amongst them the messaging primitives and the ETS tables. The effect is the elimination of a lot of state bugs in the code, which are often problematic in imperative languages.

    Another decision is that integers are not bounded in size by default. There are no exceptional cases, and no overflow or underflow bugs can occur. It has been measured that quite a few bugs in code bases are due to these errors, and the price of correcting faults in large systems tends to be high due to the vast amounts of QA needed. By handling this in the virtual machine, you eliminate the cost of fixing these bugs altogether.

    The language prefers operating on functional structures in programs. This means your programs have few variables used for indexing into structure; you operate with maps and folds over large general structures. It also means your code flow avoids complex if-then-else mazes and instead has a single generic flow which processes data.

    Finally, programs are written in a certain style, OTP, which means that a lot of patterns are covered once and for all. As soon as you see an OTP-compliant system, you instinctively know how to absorb its inner workings. It helps quite a lot when you need to understand a system. OTP also encourages splitting up your system into multiple process contexts, which means that each part is easier to understand. You only need to understand the part itself and the process contexts it communicates with. Often, this limits the complexity of the system, since you can get away with analyzing only a subset of the whole.
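    The "patterns covered once and for all" point is easiest to see in a minimal gen_server: every OTP-compliant server shares this shape, so a reader only has to study the callback bodies. A small sketch (the counter itself is of course made up):

    ```erlang
    %% Hedged sketch of a minimal OTP gen_server. Only the callback bodies
    %% are specific to the application; the shape is always the same.
    -module(counter_server).
    -behaviour(gen_server).
    -export([start_link/0, increment/0, value/0]).
    -export([init/1, handle_call/3, handle_cast/2]).

    start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    %% Client API: an asynchronous cast and a synchronous call.
    increment() -> gen_server:cast(?MODULE, increment).
    value()     -> gen_server:call(?MODULE, value).

    %% Callbacks: the state is just an integer counter.
    init([]) -> {ok, 0}.

    handle_call(value, _From, N) -> {reply, N, N}.

    handle_cast(increment, N) -> {noreply, N + 1}.
    ```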

    OTP also encourages you to think in terms of system protocols. To an Erlang programmer, an API is often a protocol which describes how you must communicate with a subsystem. It differs from usual library APIs in the sense that it is not always just function calls. It may be asynchronous messages flowing back and forth. That is, the protocol may specify that you send certain messages and will get certain, often different, messages delivered to your mailbox. Erlang terms are symbolic, so you have very good ways to describe the contents of a message.

    Fault tolerance both to hardware failures and software errors

    Note the emphasis that you must be fault tolerant to hardware failure as well as software failure. In certain situations, the hardware breaks down partially, but can still operate on degraded service. If a link is faulty, or you cannot use a given telephony channel, then you may be able to route around the given problem.

    In my opinion, this is one of the places where Erlang fares best. In a highly distributed system, you have to sacrifice some failure scenarios. The reason is that handling all of them is too complex and takes too long. Some failure scenarios are even impossible to handle at all, and you are forced to aim differently.

    A system cannot be free of errors in hardware or software. The thing under your control is the error rate. Even a highly consistent single-machine system may break down, so the error rate can never be 0, just as in the distributed case. Everything you did not account for is a fault, and the system must be built to tolerate those. This is a fairly complex thing to handle, and Erlang is built with a toolbox allowing you to handle the nastier errors of the lot.

    In practice, you are lucky on the internet. There is a noise floor for errors. Suppose your system fails 1 in a million requests. Now suppose that a user uses your service a million times. On average, the poor guy should have a service disruption. But what if his ISP has a rate of 10 in a million? This is the noise floor in effect. People will just retry the request and if you can then give service, you are relatively safe.

    [0] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.88.1957
    [1] http://www.w3.org/TR/eventsource/
    [2] http://martinfowler.com/bliki/BlueGreenDeployment.html


  5. To most programmers, the computer is a universal machine in a specific sense. We know that if we can figure out a program for a problem, we can get any computer to solve that specific problem. We also know, by the rules of computability, that any (common) computer in our usual sense of the term is equivalent. They can solve the same problems and mostly differ in how fast they can solve them or how large a problem they can feasibly work on. We say that they are Turing complete. There are other kinds of computers which are less traditional, built on biology or quantum state for instance. They may or may not have a different computing model and hence other rules when it comes to what they can compute. But I won't touch on those, because they are not part of the point I want to make.

    The key concept is versatility. Given a computer, any computer, it can be programmed to solve problems in a specific class. To the programmer, it is highly logical that this is the case. We don't usually put much thought into just how amazing a feat it is. To us, the question is really how to achieve the output of a given program. We train on smaller problems by writing tic-tac-toe, chess games, hello world, twitter clones and so forth. We exercise mentally by writing 3D engines, raytracers, MP3 encoders, and parallel k-means clustering in a quest to figure out how a program can be written. This is also a very software-specific view of what a computer is.

    But this is in the eye of the beholder. The reason I write this blog post is due to a hypothesis which has been lurking in my head for a couple of days:

    To most non-programmers, the computer is non-universal. It is a specific machine for a specific purpose and you have to buy a new one once in a while to stay up to date.

    In other words, the computer is simply an advanced hammer. If you need a screwdriver, you need to go buy a screwdriver, that is another computer for that purpose.

    Now, I know that hardware changes and that newer computers are more powerful, have different input methods, better sensors and so on than the older hardware had. But this just muddles the mind. The point is that there may be a sizable portion of the human population who do not grasp the sheer versatility of the modern computer.

    If the software were right, I could take a smartphone, plug it into a monitor, and I would have a state-of-the-art computer from around 2006-2007. Also, I would not be limited to running iOS on Apple hardware only; I would be free to run whatever software I wanted on any phone out there. Again, commercial interests confuse people here.

    Another thought experiment: "How many people believe that to run Facebook, you need a new computer?" How about getting the equivalent of Apple's Siri on an Android phone? I ask because there are some out there who seem to believe that there is something magic to certain devices which enables them, as the only devices in the world, to carry out a mundane software-centric task.

    There is also a nice example from the CPU world. Intel at least planned to enable software upgrades of their CPU hardware. That is, you download a program which then in turn unlocks your hardware so it can do what it was originally built to do. Another incident was with Creative Labs back in the day where they used a driver to artificially limit certain old hardware - in turn forcing the customer to upgrade his hardware, even though it worked.

    All these incidents cement the idea that the computer is a concrete machine, so you have to buy a new one.

    But here is the problem: decision makers, politicians in particular, who don't understand this cannot make the right decisions. If you think there is any manufacturing needed in a future world, you are thinking wrongly. Micro-manufacturing becomes a possibility with the 3D printer. The breadth of what the computer can do means that prototyping is possible for virtually anyone. You just need to rent a garage and get to work.

    Except that if you don't understand what a computer is, you will have a very hard time grasping this. In the new world, the post-industrial age, it is cheap to design a new product or provide a new service. And these can utterly remove an older product or service very quickly.

    Daniel Lemire makes this point better than I: A post-industrial point of view

    And with that, I will stop my musing by posing questions:


    • Am I too pessimistic when I view fellow humans as unable to grasp what a computer is?
    • Am I too software-oriented? The hardware also plays a crucial role - is the hardware the primary driver or is the software?
    • Do I worry too much about the fact that politicians, at least in Denmark, are old with virtually no-one having technical skill or merit whatsoever?





  6. On Curiosity and its software

    I cannot help but speculate on how the software on the Curiosity rover has been constructed. We know that most of the code is written in C and that it comprises roughly 2.5 million lines of code[1]. One may wonder how it is possible to write such a complex system and have it work. This is the Erlang programmer's view.

    First some basics. The rover uses a radioactive power source which delivers power to it continuously. The power source also provides some heating to the rover in general, which is always nice given the extreme weather conditions present on Mars.

    The rover is mostly autonomous. It takes minutes to hours to send a message, and you can only transmit data in limited periods of the Mars day. The rover itself can talk with Earth, but that link is slow. It can also talk through satellites orbiting Mars, using them as an uplink. This is faster. The consequence is that the rover must act on its own. We cannot guide it by having a guy in a seat with a joystick back here on Earth.

    There are two identical computers on the rover. We note that NASA acts in the words of Joe Armstrong: "To have a reliable system you need two computers". One of these is always dormant, ready to take over if the other one dies for any reason. This is a classic takeover scenario, as seen in Erlang systems, the OpenBSD PF firewall and so on. The computers are BAE Systems RAD750 computers. They run a PowerPC ISA at modest speeds: 200 MHz, a 150 or 250 nm manufacturing process, and an impressive operating temperature range. The computer is radiation hardened and withstands lots of radiation. The memory is also hardened against radiation. It is not an easy task to be a computer on board Curiosity.

    The operating system is VxWorks. This is a classic microkernel. A modest guess is that the kernel is less than 10 kilolines of code and quite battle tested. In other words, this kernel is near bug-free. The key here is isolation. We isolate different parts of the rover. There are certain subsystems which are outright crucial to the survival of the rover, whereas a scientific instrument is merely there for observation. Hence we can apply a nice fact, namely that only parts of our 2.5 million lines of code need to be deeply protected against error. There will be some parts which we can survive without.

    NASA[2] uses every trick in the bag to ensure good code quality. Recursion is shunned, for instance, simply because C compilers cannot guarantee the stack won't explode. Loops are ensured to terminate, so that a static analyzer can find problems. Memory is mostly statically allocated to avoid sudden allocation calls and unpredictable performance. Also note that message passing is the preferred way of communicating between subsystems. Not mutexes. Not software transactional memory. Also, isolation is part of the coding guidelines. By using memory protection and singular ownership of data, we make it hard for subsystems to mess with each other. The Erlang programmer nods at these practices.

    The architecture of the Mars Pathfinder[3], which is the basis, turns out to be very Erlang-like. They have "Modules" which pass messages. They only wait on receiving messages; sends are void functions. They have a single event loop for receiving, probably much akin to an Erlang gen_server process. The different modules communicate by sending messages to each other, over a protocol. You can access the memory space of another module, but it is shunned by the JPL coding guidelines. This is a difference from Erlang, which disallows it entirely. The Mars Exploration Rovers (Spirit and Opportunity) have many more modules but share the same software basis. And Curiosity is no different; they essentially built on the older software. The thread count is in the hundreds, which also neatly reflects what it would probably be in an Erlang system of this kind.

    In Curiosity, they added "Components", which are groups of modules, in order to manage the complexity. Components are also needed to handle the fact that there are two redundant computers, and many other subsystems are also redundant for robustness. Interestingly, the Erlang designers also saw the need for such a thing; they just named them Applications. Nod.

    Functions check all their invariants: that input parameters satisfy a precondition, that a postcondition holds of the return value, and that various other invariants remain true, using assertions. The Erlang programmer nods again. Interestingly, there is a 60-line limit on functions so they can be printed on a single sheet of paper. The Erlang programmer prefers much shorter function bodies, but the idea still holds: make code simple and comprehensible.
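    For comparison, a hedged sketch of what pre- and postcondition checking looks like in Erlang style; the speed limit and the function itself are made up.

    ```erlang
    %% Hedged sketch: preconditions as guards, postconditions as assertions.
    -module(drive).
    -export([clamp_speed/1]).

    -define(MAX_SPEED, 40).                                  %% invariant: never exceed this

    clamp_speed(Speed) when is_number(Speed), Speed >= 0 ->  %% precondition as a guard
        Clamped = min(Speed, ?MAX_SPEED),
        true = Clamped =< ?MAX_SPEED,                        %% postcondition assertion
        Clamped.
    ```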

    Another interesting story is that in the past, one of the rovers had problems with priority inversion. They saved it by using a debug console to inject a correction into the rover. This is very much like what we often do in Erlang systems. We can alter running systems as we see fit and upgrade them on the fly. We can monitor the system as it runs and make sure it runs as we would like. The ability to hot-fix the system is valuable. Also, development is done with extensive tracing and analysis of the traces - i.e., Erlang QuickCheck / PropEr, error logging and the tracing facilities.

    It turns out that many of the traits of Erlang systems overlap with those of the rovers. But I don't think this is a coincidence. The software has certain different properties - the rovers are hard real-time whereas Erlang systems are soft real-time. But by and large, the need to write robust systems means that you need to isolate parts of the system from each other. It is also food for thought, because it looks like the method works. These traits are important for highly reliable software. Perhaps more so than static type checks and verification.

    The upshot is that of all the code lines in the rover, we probably do not have to trust them all to the maximal level of security. We can sandbox different parts and apply different levels of correctness checking to these parts. In other words, we can manage the errors and alleviate the risk by careful design. Thus for some modules, we can probably live with the fact that they might fail. Suppose that the uplink fails. We can probably restart it and have it survive. If not, we have another redundant uplink directly to Earth which is slower, but can be used to restore the other uplink. This layering means that multiple components have to fail for the mission to abort. A science experiment can probably fail as well without aborting the mission; we could just take another picture after having restarted the module. There is a trusted computing base, but hopefully it is small and needs little change. The base is also battle tested on three other rovers.

    The things that do not overlap have to do with soft real-time versus hard real-time. In Erlang we can miss a deadline and degrade service. It is bad, but we can do it. On a rover it can be disastrous, especially in the flight control software. Fire a rocket too late and you are in trouble. This explains why they use static allocation and fixed stack sizes over dynamic allocation. It also explains why they dislike recursion. On the other hand, we get to avoid manual memory management in Erlang. We also have the benefit of a very deterministic tail call optimization, so we can rely on its use.

    TL;DR - Some of the traits of the Curiosity rover's software closely resemble the architecture of Erlang. Are these traits basic for writing robust software?

    Sources:

    [0] Wikipedia, the Curiosity rover page

  7. One interesting view of Erlang is that it is not really that much about functional programming. With the right kind of glasses on, the functional programs are just what goes on inside the processes of the Erlang program. This may be interesting, but as an outside user of the process, you can't tell what the inside looks like.

    The program might be imperative for that matter. It turns out that this isolation between the process and the messages it communicates is central to Erlang. The fact that such a process is an actor is not really that interesting either. You could have made a channel-based message-passing system, more in the heritage of the pi-calculus (and before it, CSP), and I don't think it would have changed much on the outside. Inside, there are reasons for this choice, to ease the programming of an Erlang process and to dictate a specific style of programming. But we cannot see it unless we know the concrete implementation of the process.

    It turns out that Erlang is not about RPC either. A plug I got from Steve Vinoski at this year's Erlang User Conference (2012) was to go back in time to some of the original RFCs, 707 and 708. These two RFCs, both written by J. E. White, contain a deeper insight than one might expect. They are also from before my time, from 1975 and 1976. It turns out that sometimes we forget our past and our history.

    This leads to a discussion on protocols I had with Joe Armstrong. He and I think that protocols are a key part of systems, not just systems written in Erlang, and I would like to try to emphasize this point. When you have two or more processes communicating in an Erlang program, you have defined a protocol between those processes. Like in RFC 707, both parties can act as a client or a server, initiating requests and receiving replies from the peer(s) they are communicating with. There are some similarities between Erlang message passing and those RFCs which are deeply interesting. Getting to a clear understanding of what your protocols mean is very valuable when designing your system. The processes in the system are not as interesting as the protocols. The processes carry out work; the protocols are there to orchestrate and coordinate.

    Your typical protocol consists of two parts. First there is the syntax of the protocol. This explains which messages you are allowed to send, their format, and what you can receive. But second, there are the semantics: rules governing when a message is valid to send or receive, and an explanation of what a message means. Many protocols focus too much on the syntax and almost exclude the semantics. To make a better internet, we need to change that. And we can begin with our Erlang programs. By formulating the protocol semantics as well, one can often arrive at simpler and more succinct protocols with fewer moving parts. They tend to be easier to implement and easier to extend. To boot, their simple construction often makes for fewer errors in the code.

    Protocols are important because they standardize and abstract. First of all, they make a standard on which everyone is building: HTTP, TCP/IP, the BitTorrent wire protocol, DNS. The neat thing about the standard is that I can go write a webserver in Erlang and have it communicate with a browser written mostly in C++. And I do not have to worry about the implementation language, the concrete implementation design, and a whole lot of other small things. The only reason it works is the standard. Second, the protocol abstracts details for me. The fact that I can choose to hide the implementation from the rest of the world makes it possible to interoperate seamlessly. It is a key feature which drove the internet to where it is.

    One particularly daunting protocol is IP (version 4 or 6). It can be implemented in perhaps 500 lines of Erlang code. Yet it underpins all communication you make on the internet. The cost/benefit ratio of those 500 lines of code is low. Tremendously low. IPv4, IPv6: 500 lines of Erlang is enough. Everything today uses this protocol as a basis, so it looks like we hit the jackpot and got the right kind of abstraction. Note that IP chooses to leave a lot of problems unsolved. It standardizes what can be solved instead. If your Erlang protocols can be as succinct and simple, then you have a good basis for building nice reusable systems.

    A way of looking at an Erlang program is to forget about the details inside the processes. What you want to do is describe the protocol of communication instead. You can describe what will happen if a process receives a certain message. As an example, a BitTorrent client may have a message you can send to the IO layer: `{read, K, Tag}`. The semantics are that the IO subsystem will read piece `K` from the underlying stable storage and respond with either `{ok, Tag, Data}` or `{error, Tag, Reason}`. The tag acts as a unique identifier so we can match up a particular read with its response. And it is part of the standard OTP `gen_server` behaviour's call semantics to use tagging like this.

    Note that we have not said anything about the implementation of reading off of stable storage. We have only described the behaviour of the process to the outside world. This is a nice abstraction, since we are now free to implement or reimplement the IO subsystem without ever changing the protocol. Also note that we are not implementing a function call, nor are we doing "RPC" as it has become. The IO subsystem is free to carry out other requests before ours. It is also allowed to serve another process in between our request and the response. Or it may choose to interleave several such requests. Or perhaps the client may issue multiple read requests and then wait for the responses to arrive one by one. The IO subsystem might also send a message `{cached, K}`, signifying that a piece was recently cached in memory, so serving that one will not require a disk seek. The IO subsystem may also crash, in which case there will never be a reply, a case we must handle with either a timeout or a monitor. None of these possibilities are captured by function call semantics.
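    A hedged sketch of the client side of such a protocol, following the message shapes above: the monitor covers the crash case and the `after` clause covers plain silence. The 10 second figure anticipates the timeout discussion further down.

    ```erlang
    %% Hedged sketch: ask the IO subsystem for piece K, tolerating both a
    %% crashed server (monitor) and no answer at all (timeout).
    -module(io_client).
    -export([read_piece/2]).

    read_piece(IoPid, K) ->
        Tag = make_ref(),
        MRef = erlang:monitor(process, IoPid),
        IoPid ! {read, K, Tag},
        receive
            {ok, Tag, Data} ->
                erlang:demonitor(MRef, [flush]),
                {ok, Data};
            {error, Tag, Reason} ->
                erlang:demonitor(MRef, [flush]),
                {error, Reason};
            {'DOWN', MRef, process, IoPid, Reason} ->
                {error, {io_crashed, Reason}}
        after 10000 ->
            erlang:demonitor(MRef, [flush]),
            {error, timeout}
        end.
    ```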

    A protocol should have room to "wiggle". If you look at the definition of the TCP protocol, you will come to the realization that it is specified at just the right level. It is not underspecified, so differing implementations are able to use it as a communication medium. But it is not overspecified either, so implementations are free to interpret the protocol in different ways. A good example is that it is possible to implement TCP as a stop-and-go protocol without windows, and any conformant TCP implementation will understand it. This allows us to build a simple version of TCP, make it work, and then go improve it. Or it allows us to write a simple TCP/IP stack for a small embedded device where code size constraints reign.

    We can do the same with Erlang protocols. Build simple protocols, but allow for a certain amount of wiggling in their implementation. It makes our systems more extensible in the long run and opens up the avenue for improving the implementation later, without having to redefine the whole system. Extensibility is usually achieved by making the protocol composite. The reason JSON is winning is that it is simple and protocols designed on top of it are automatically extensible. In Erlang the same is true, since Erlang terms are by definition extensible.

    A good example of a bad extension design is the original BitTorrent protocol. In the handshake, there is a 64-bit value which is a bitstring of 64 possible extensions. The problem is that an extension now requires central coordination, since everyone has to agree that bit X means Y. The problem was fixed by using one bit to signify that a new kind of message is valid. This new message contains a JSON-like (bencoded) structure which in turn describes what extensions are understood. Now everyone can add extensions as they like, in an ad hoc fashion.

    Another good example of bad design is the Minecraft server protocol for the game of Minecraft. It has undergone several iterations, but the newest one has a specific packet (0x11) for "Use Bed", and another (0x47) for a thunderbolt striking in the game world. All messages are flat, where a tree structure ought to have been used. And packets do not have a "length" field, so in order to determine the length of a packet, you need to do a partial decode. The latter also has the interesting consequence that if you don't understand a packet, you can only abort, since it is impossible to skip ahead in the packet stream to the next packet.

    The above is part of Jon Postel's principle: be conservative in what you do, be liberal in what you accept from others. That is, always conform to the protocol, especially when you send data; but if you receive something you don't understand, then skip it. This assumes that the other party may speak a newer version of the protocol than you, and you should design accordingly. A take-away, though, is that browsers parsing HTML were too lenient. If the HTML had an obvious parse error, then rather than producing an empty page the parser tried to fix the problem. The key here is that you should only be liberal as long as the message is still meaningful to you. Wrong HTML isn't. This, and the fact that HTML is not an extensible protocol medium, creates many problems today.

    Another important part of protocol design is reliability. If we send a message, are we guaranteed that it arrives? I always design my protocols for message delivery failure. Message delivery is not reliable. On a local machine we may think that the Erlang process we are sending to is really there. But if that other process just crashed, there will not be a process to receive our message. And here is the problem: if that process received the message, operated on it and then crashed, we do not know whether or not our message was processed. In general, I tend to follow the principle of IP: it worked for the internet, so let me design my protocols around that principle as well.

    The only way to solve this problem is by designing protocols around the principle of "best effort delivery". If the `{read, K, Tag}` command fails to produce an answer within 10 seconds, we can assume the IO subsystem crashed and restarted. Since a read is idempotent (upon success) we can just issue the read request again. The read is in fact nullipotent, which is slightly stronger. The whole trick is that by accepting failure, your protocol design can cope with it.
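    Because the read is idempotent, the retry itself is trivial. A hedged sketch building on the `read_piece/2` function sketched earlier; `LookupFun` is a hypothetical way of finding the pid of the (possibly restarted) IO process.

    ```erlang
    %% Hedged sketch: best-effort delivery via bounded retries of an
    %% idempotent request. Gives up after N attempts.
    -module(io_retry).
    -export([read_piece_retry/3]).

    read_piece_retry(_LookupFun, _K, 0) ->
        {error, gave_up};
    read_piece_retry(LookupFun, K, N) ->
        case io_client:read_piece(LookupFun(), K) of
            {error, timeout}         -> read_piece_retry(LookupFun, K, N - 1);
            {error, {io_crashed, _}} -> read_piece_retry(LookupFun, K, N - 1);
            Other                    -> Other
        end.
    ```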

    Distributed systems have the problem to a far greater extent: the network can fail, and we cannot trust the machine at the other end to behave well either. So we must design our protocols for failure. There is no other way. Note that almost all modern programs are distributed systems. While we expect the error rate to be fairly low, we cannot by any means guarantee that no errors will occur. Thus our design should assume a low error rate, perhaps 1%. That means it is okay if error handling is rather expensive: it happens rarely. In practice, distribution and unreliability go hand in hand. It would be foolish to make a design that assumes full reliability. To me, any concurrency design which assumes no network or subsystem failure is outright dangerous: the whole system must function to give full availability, yet independent parts may fail. Probability will be against us here: the larger the system, the less time it will run without trouble.

    In the kingdom of distribution, unreliability is the reigning king. He who is an apostate embraces the fact: guarantees are not discrete anymore. There is a fuzzy factor and a risk of something failing, however small. You must account for this in your protocol design, local or distributed. The harness in Erlang, to tame the beast, is fault-tolerant code. The toolbox of supervision, linked processes and valid state isolation is all there to help you handle unreliability in your protocol design.

    The fuzziness is currently changing the world of computing as we know it. Multicore is but one problem we face. The fact of distribution and failure is another dragon we have to slay. For instance, the CAP theorem is a direct consequence: consistency is not a discrete entity anymore.

    You need fault-tolerance in a modern world of distribution and protocols. There is no way around it.


  8. One important aspect of Erlang programs is to identify where the stable state in your program is. Stable state is what you can trust, and what you can trust is what you can build on. Joe Armstrong defines one of the key aspects of an Erlang system as Stable Storage: a place where we can push data and be sure it won't change. If we verify data before pushing, we can trust those data a great deal.

    This is important. If our system partially crashes, as is the norm for Erlang programs, it may be necessary to reconstruct state. Stable storage provides the basis from which we can re-read data into memory. Even if recreating data is expensive, you may still want a cache so you can reconstruct your state faster from disk. A persistent store on disk is among the best ways to make sure data is there.

    In a BitTorrent client like eTorrent, for instance, we only worry about the file. If we download a piece of a file and that piece passes the BitTorrent SHA1 integrity check, we can regard that part as "safe", write it to stable storage, and never touch it again. I don't have to care about the internal state of the peers I am communicating with. I don't have to worry about any internal structure in memory. The on-disk partial download provides all the information needed to reconstruct the system from scratch, should I need it.

    Second, there may be state we don't really want to lose, but can afford to. We can't recreate user input, so we need that on stable storage as above. But we don't want to redo expensive work if we can avoid it. To fix this in Erlang, we create a process to keep the important data, and we let that process protect the data simply by validating and verifying any change to it. The process becomes a castle with the princess in it, and with a nasty dragon at the drawbridge. (Naturally, the princess and the dragon have exquisite meals together each night and like to dance tango. The nastiness and damsel-in-distress act is only kept up for fun, to lure unsuspecting knights to the party.)

    Third, we can exploit that sequential Erlang is a functional programming language. If we are in state S1 and we apply a function to obtain state S2, we have an interesting property: either we obtain S2, or we get an error. And since the data structures are persistent, we still have access to S1 as long as we keep a reference to it. This in effect gives us an atomic way of processing: either we get to the new state safely, or we cannot move to the new state because of an error. Each state becomes a safe haven in our processing. Since we can't mutate data, there is no way the processing to obtain S2 can corrupt S1. It allows us to build programs that are highly stable, because they ultimately work like a CPU: we have a state, and atomically we process a clock cycle to obtain a new state. There is no "in between".

    (Note: I must stress that modern CPUs are more advanced than this, but they try to uphold the illusion above.)
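    A tiny sketch of this atomicity, assuming a pure transition function: either the function yields S2, or we are left with an untouched S1.

        step(S1, Transition) ->
            try
                S2 = Transition(S1),
                {ok, S2}
            catch
                _Class:_Reason ->
                    {error, S1}   %% S1 has not been corrupted in any way
            end.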

    Fourth, we can exploit the isolation between processes. To get state, I must ask another process for it, and to ask another process for it, I must send it a message. It might never answer, so I must build my system around the idea that parts of it will fail occasionally. If it does answer, however, the data is now mine to do with as I please. It may be stale, but as long as I have it, I can use it. At that point I don't care too much about the fate of the other process, since I have a safe copy. This in turn can be used to build a system where we know where the stable state is at all times.
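    A sketch of that attitude, with an invented message protocol: ask another process for its state, but be prepared for the answer never arriving.

        fetch_state(Pid, Timeout) ->
            Ref = make_ref(),
            Pid ! {get_state, self(), Ref},
            receive
                {Ref, State} -> {ok, State}   %% now a safe copy of our own
            after Timeout ->
                {error, no_answer}            %% the other process may be gone
            end.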

    Fifth, we can exploit distributed Erlang. Have a couple of nodes and store the important data on several of them. Should one node crash, the other nodes still have the data. Memory plus network communication is often way faster than disk, not to mention that you get better parallel execution and faster recovery, since the data is already there in memory, ready to be served over the 10 gigabit link. The princess just phoned her girlfriends in Britain, France, Italy and Russia with the recipes for the next 100 meals (...and her work on homotopy type theory - princesses do have spare time for research, after all).
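    A deliberately naive sketch of that replication - the registered name and message format are invented here: push a copy to a keeper process on every node we know of. A real system would of course want acknowledgements and conflict handling on top.

        replicate(Nodes, Key, Value) ->
            [ {keeper, Node} ! {store, Key, Value} || Node <- Nodes ],
            ok.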

    See, the point is: when the system begins failing, how do we want it to crash? When you get out the chainsaw and slay the proverbial dragon (the tree in your garden which slightly, but not really, looks like a dragon), you don't want it to fall onto your nice house. You want it to crash differently, down on the lawn. The same goes for Erlang programs. We want them to crash with little impact on users, and such that our important data is still safe. And if it goes really wrong, we want the data persisted somewhere else: either on another node in the cluster, or on disk. We want the system to crash in ways that steer clear of the stable state.

    The key is that we begin thinking about crashing a priori, before it happens. We think about where we have stable state and which parts we don't worry about crashing. The secret behind BitTorrent clients is that they are easy: you can throw away everything except the pieces of the file that have been checked for integrity. Everything else can go crashing as it sees fit; we don't care. But when you apply the same kind of thinking to your own application, chances are you will reach the same conclusion: there is a small part of the system which needs protection, and you don't care about the rest.

    That is a hint on how to structure your Erlang program.

    PS. I should probably also write about how the loose coupling of Erlang processes fosters good architecture, but that is another post for another time :)

    (Edited a couple of times to fix wording - thanks DeadZen)

  9. Here is a thing to ponder: suppose you have a dynamically typed language - like Erlang. What exactly can you check at compile time? This came to mind after a Twitter message by Yaron Minsky about Ocaml, in which he posed the question of how people can live without types.

    An interesting point, made by Bob Harper at the Existential Type blog and before him by Dana Scott: dynamically typed languages are really unityped, in the sense that all terms t : T, or in words, "all terms t have the same type T". That is, we define a type which every term has. There is only a single type, so integers, strings, floats, and lists of tuple pairs of references and pids are all of type T. In Erlang we call T term() or any().

    So what can we actually check at compile time in a unityped language? It turns out we can do quite a few things:
    • A variable that is used but is never defined is a type error.
    • A variable that is defined but is never used is a warning. It may be benign, but often it isn't.
    • A local function call to the wrong name is an error.
    • A local function call with the right name but wrong arity (number of function arguments) is an error.
    • A number of lexical binding errors can be caught without the need for a type system.
    It is interesting how many programmers - even those working in unityped languages - do not even have the above list at their disposal. I've seen lots of languages that happily accept a program where a function is not defined at compile time. Erlang does not accept any of the above; in fact all of them are caught at compile time.
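    To make the list concrete, here is a small, made-up module the Erlang compiler will refuse to compile. The exact diagnostics vary between compiler versions, so the comments only describe the kind of complaint.

        -module(checks).
        -export([f/1]).

        f(X) ->
            Y = X + 1,     %% warning: Y is bound but never used
            g(Z),          %% error: variable Z is unbound
                           %% error: local function g/1 is undefined
            h(X, X).       %% error: h/2 is undefined (only h/1 exists)

        h(A) -> A.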

    Erlang also does another thing, through the powerful dialyzer tool by Sagonas et al. Since every Erlang term has type any(), we could ask: "Can we create a subtype hierarchy for any()?" It turns out that we can. integer() is a subtype of any(); so are atom() and char() (the Unicode codepoints). And so is [A], lists containing elements of type A; string() is then [char()]. But we can do even better: pos_integer() is a refinement of integer() whose domain is only the positive integers. The type 1 | 2 | 3 is the type of exactly one of these three possible values. For notational convenience, we define 0..255 to be 0 | 1 | ... | 255, which is also known as byte().
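    These subtypes are exactly what you write in type and spec annotations. A small fragment, with invented names:

        -type priority() :: 1 | 2 | 3.    %% exactly one of three values
        -type octet()    :: 0..255.       %% the same set of values as byte()
        -export_type([priority/0, octet/0]).

        -spec clamp(integer()) -> octet().
        clamp(N) when N < 0   -> 0;
        clamp(N) when N > 255 -> 255;
        clamp(N)              -> N.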

    The subtyping hierarchy we have formed is in some sense stronger than what Ocaml's type system provides. There is a limit to the refinement, however: we cannot in general capture that an integer value is exactly 3 | 7 | 37. The next step is to use a trick already present in Ocaml: type inference. If we run the inference engine on Erlang, we get a notion of the most specific type we can derive for an expression. The inference is conservative in the sense that if we can't derive anything specific, we just assign the type any(), which is perfectly valid in our unityped world. It is merely an indication that the piece of code is too complex to infer anything more precise about.

    An observation by Sagonas is that an Erlang program is already safe by virtue of the runtime system carrying dynamic type information. It checks for type errors at runtime and exits the process if such an error happens. A large part of Erlang is devoted to handling these kinds of errors gracefully, under the mentality of "let it crash". In fact, Erlang is among the few languages with a sound reactive approach to unforeseen errors in a system. The strong belief present in almost all other languages - that by some mixture of sheer luck and all-knowing divination you can foresee every kind of trouble your program might end up in - is ... well ... kind of irresponsible. So rather than trying to solve the problem of giving ordinary types to Erlang programs, Sagonas and Lindahl looked into another concept: success types.

    The grand idea of a success type is this: suppose we have a function f. If we explore all calls to f that make it return normally - that is, not with an exception or a fatal error - we can determine, through inference, the type B of the values f returns, and the type A of the inputs that make the function return normally. We say that f has the success type A -> B.

    If we have success types for expressions and functions, we can begin looking for things that will be problematic. For instance, if we have the sequence f(X), g(X), and we have figured out that f only returns when given an integer() and g only when given an atom(), we know that this expression will always fail. That is a guaranteed error in our program, and we can report it to the user.
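    As a concrete, made-up example of such a clash, dialyzer will flag h/1 below: the success type of f/1 only accepts an integer(), while g/1 only accepts an atom(), so the second call can never succeed.

        -module(clash).
        -export([h/1]).

        f(X) when is_integer(X) -> X + 1.
        g(X) when is_atom(X)    -> atom_to_list(X).

        %% dialyzer: the call to g/1 will never succeed, since f(X) has
        %% already constrained X to be an integer()
        h(X) ->
            f(X),
            g(X).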

    The consequence is that we have a static analysis tool which finds a different kind of error than a static semantics (a type checker) does. If we can't type an Ocaml program, we are pessimistic and reject it as erroneous. If we can't type a dialyzed Erlang program, we are optimistic and take it for granted that the programmer knows better than our puny tool. On the other hand, it means that when the dialyzer finds a problem via its success type analysis, it is bound to be an error. Hence the slogan "The dialyzer is never wrong".

    In practice, the dialyzer is so powerful that it can find many errors in programs. Coincidentally, the closer your program is to the style of Ocaml or Haskell, the more errors it finds :)

    The other, orthogonal part is the reactive approach to program error I mentioned above. The ideas can probably be traced back to Peter J. Denning and his work on virtual memory. The idea is to protect an operating system by compartmentalizing it. Each running program - the process - has its own memory space. Thus, by construction, it can't mess with the state of other processes in the system. The only way to make processes talk to each other is by messaging: communicating state by copying the message from one space to the other.

    Erlang extends this notion of separate processes into the OS process at a much finer granularity. A typical Erlang program has thousands of (userland) processes messaging each other to coordinate and orchestrate the program. Mistakes thus become less of a problem: only a small part of the volatile state space can be in error, namely the part tied to the failing process.

    It gives rise to a sane model of programming: you try to anticipate errors in your programs and handle them. But if you don't know how to handle a given error - or you don't think the case can occur at all - it is better to let the process crash. Another process will then be informed and can handle the crash gracefully. Usually your programs are built on a best-effort basis and laced sparingly with assertions about the sanity of internal state. As soon as that state is violated, you fail fast and crash.
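    A minimal sketch of that reflex, with invented names: the worker asserts on its input and crashes on anything unexpected, while the parent monitors it and decides what to do about the crash instead of the worker trying to handle every bad case itself.

        run(Input) ->
            {Pid, Ref} = spawn_monitor(fun() -> worker(Input) end),
            receive
                {'DOWN', Ref, process, Pid, normal} -> ok;
                {'DOWN', Ref, process, Pid, Reason} -> recover(Reason)
            end.

        %% The guard is the assertion: anything but a binary crashes the worker.
        worker(Input) when is_binary(Input) ->
            byte_size(Input).   %% stand-in for the real work

        recover(Reason) ->
            error_logger:error_msg("worker crashed: ~p~n", [Reason]),
            restart.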

    The secret is that you attack the error bar from below. You place it where you think you have caught all the errors in the system. Then you test the system and fix the few remaining errors, which places the bar just right from the perspective of implementation time versus failure rate. Errors that occur from here on out are probably in the class of benign errors: they may occur, but they won't make the system fail, and they will have little impact. In other words, it is about controlling the risk of a fatal error. In Erlang an error usually has to penetrate several layers of protection before it takes down the whole system and puts it into an unusable state, because Erlang programs are built so that they can withstand the unforeseen as well as the foreseen.

    It is almost impossible to handle all errors in a program, and many errors are simply not worth handling. Not all errors are created equal; some are weaker than others. But that will be a post for another time.