This post is about parallel computation from a very high-level view. I claim that Erlang is not, in particular, a parallel language. It was not created with the primary goal of speeding up computation by harnessing multiple cores, your GPU, and so on. If that is your goal, there are other languages which will probably fare much better for your needs (Haskell, for instance).
Note, however, that while Erlang is not a parallel language, its runtime is rather excellent at extracting parallelism from existing concurrent programs. So when we say Erlang is parallel, we mean that Erlang is parallel in a specific way! Recent years have seen much work in Erlang/OTP on making the runtime execute concurrent programs in parallel, and we are reaping the benefits. The reason can be found in the simple observation that an Erlang program has thousands of processes, which gives thousands of executable threads of control. Since you have more than one thread of control, and communication between them is largely asynchronous, you have every opportunity for a parallel speedup.
Parallelism vs Concurrency
A fine point indeed is that parallel computation and concurrency are different entities. Simon Marlow has a blog post in which he describes the details.
The gist of it, however, is rather simple: parallel computing is hunting for a program speedup when adding more execution cores to the system running the program. It is a property of the machine. Concurrent computing is writing a program with multiple threads of control such that each thread executes non-deterministically. It is a property of the language.
If you come with a view from the hardware upwards toward the computing logic, then surely you may fall into the trap of equating the two terms. The reason is that to implement parallel computation on current hardware and software, you use a concurrent semantics. Using the pthread library, for instance, means writing a concurrent program which then takes advantage of multiple processors if they are available. To program for the shared-memory architecture, you use mutexes and locks. Most other methods build on top of these primitives. Deep down at the hardware level, it is the presence of a compare-and-swap operation that is at the heart of the idea.
One may then ask: are we always going to use these primitives deep down when we implement parallel programs? Are we always going to build parallelism out of the concurrency primitives? If your answer is largely yes, then you probably hold the view from below, from the hardware and up.
The view from the logic and down is that the language has certain constructs available to it, of which some are parallel. How we are going to implement those features is postponed until later. That is, we squint our eyes in a way such that we cannot see the deeper details, but can concentrate on the general ideas. One could imagine different hardware on which another implementation would be more beneficial, for instance by running on special-purpose hardware for the task, an FPGA or a GPU.
The main point is that parallelism is a declarative notion in the logic. If you want an array processed in parallel, you are not going to explain how the underlying implementation should squeeze more computational work onto multiple cores. You are just stating that the semantics of your program allow it to do so. Most importantly, in parallel array processing you must be very specific about the order in which elements are processed. For some computations it doesn't matter. For other computations there is a notion of order, and you can't screw it up. This difference in semantics with respect to order is among the things that concern a language designer with the view from logic.
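To make the distinction concrete, here is a minimal sketch of an order-preserving parallel map in Erlang (the module and function names are mine, not from any library): the elements may be computed in any order and on any core, but the results are reassembled in the original order by matching on unique references.

    -module(pmap).
    -export([pmap/2]).

    %% Spawn one process per element; computation happens in any
    %% order, but results are gathered in the original list order by
    %% matching on unique references.
    pmap(F, Xs) ->
        Parent = self(),
        Refs = [begin
                    Ref = make_ref(),
                    spawn_link(fun() -> Parent ! {Ref, F(X)} end),
                    Ref
                end || X <- Xs],
        [receive {Ref, Result} -> Result end || Ref <- Refs].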
So there are essentially two views: from the hardware or from the logic. The perspective of logic, if you have toyed with formal semantics, is that we formally define concurrency primitives in a different way than parallelism primitives. That is, they are different entities. The perspective of hardware, on the other hand, conflates the ideas, because the only way to achieve parallelism is to use concurrency - especially on UNIX.
Erlang is not a parallel programming language!
Erlang is not a concurrent functional programming language, in the sense that concurrency was never the end goal: Erlang was built for robustness and fault tolerance. To implement robustness and fault tolerance, it was decided that concurrency was a good vehicle. Likewise for the decision to make the language functional. These choices were merely made in order to achieve the greater goal, and had to be made along the way.
Erlang is not a parallel language either! The goal is to provide robustness and fault tolerance. Going parallel does not directly facilitate that goal, and hence, from the perspective of language logic, there are very few primitives that have to do with the concept of parallelism. Contrast this with concurrency, of which there are several central primitives built into the language.
From the view of the hardware and the actual implementation of Erlang/OTP, much attention has been paid in recent years to making the Erlang VM able to execute processes in parallel. This has mainly been done because you can then increase hardware utilization and hope to gain a speedup in your programs. In other words, it is not a necessity, but a nice-to-have feature.
The main point to take away from this observation is that in Erlang, parallel computation is implicit. If you write your program in an idiomatic way, you will often find that there are automatic speedup gains from going to multiple cores. But you have little direct control of how things are going to be parallel. Rather, you design your program such that it can implicitly take advantage of multiple cores.
There is an inherent desire among many newly minted Erlang programmers to harness the full potential of their new K-core machine (with K > 1). The idea is simply that by using Erlang, the program will run faster because you stand a chance of getting the utilization of the cores up.
But this is not, and has never been, the primary focus of Erlang! And hence it is not a given that your program will run faster. To make it go fast, the program has to be written in a style that makes it possible for multiple cores to work on it at the same time, implicitly.
That said, Erlang was designed in a way that makes automatic use of multiple cores much simpler than in many other languages out there. In fact, in the common case you should be able to speed up an unaltered program just by adding more cores. Given the focus on robustness and fault tolerance, this decision sounds logical. If you have a system that works and is relatively free of bugs, then you should not have to go in and change the internals of the program to make it run faster. The danger is that your newly altered program is incorrect, and then it does not matter at all whether it runs faster or not.
One might state the goal of Erlang's parallel endeavours as: we hope to achieve a relatively good speedup when adding multiple cores, without you having to change the program. That is, if you add more cores, we expect some speedup, but not necessarily the best possible speedup. It is an attempt at a best-effort solution rather than a perfect one. Herein lies the gist of why Erlang is not a parallel programming language. It is not about maximizing the throughput and usage of the many cores, but about using them to automatically enhance an already written program.
Looking at Erlang's history, parallelism came to the language fairly late. While some believe the language is excellent in a multi-core era because it is (relatively) easy to make the runtime parallel, this was not the goal initially. By sheer luck, some of the design decisions made for Erlang make the language brilliant for running on multiple cores.
Finally, a word has to be said on the granularity of processes. What is the smallest piece of computation that can be made to work on multiple cores? In Erlang, the grain is finer than most other languages' concept of "threads", and processes are so cheap that it is easy to spawn a new one for a separate computation. But it is important to note that the grain is coarser than the Haskell concept of a "spark" (see Runtime support for multicore Haskell (PDF) for instance). The spark concept is spot-on for squeezing out parallelism at a much finer grain than what Erlang has. The essential trick for sparks in Haskell is that they share the Thread State Object (context). So you don't have to start new threads to service sparks, which makes them much cheaper to execute even when there are millions of them. As an explicit method of gaining parallelism, sparks are a very interesting concept.
In conclusion: Erlang does parallelism by adding it on top of concurrency. This means that parallelism is implicit rather than explicit in the program, and hence you want to program in a certain style to make sure your program goes fast. The rest of this post gives ideas for doing exactly that.
Utilization vs Speedup
It is very easy to get full utilization of a 128 core machine: Let 1 core do the actual work, and let the remaining 127 cores run an infinite loop.
All cores will now show 100% utilization, yet the program is not running any faster than before. We would love to have the program run 128 times faster, now that we have 128 times the computational power available. The actual *speedup* is what we are after.
As a programmer, you have two goals. The first goal is to get the utilization up. You can't hope for speedup if cores are sitting there, idling. *The utilization is an upper bound for speedup*. Next, the goal is to make the additional cores do something better than the endless loop above. That is, decrease the execution time of the program.
It is worth looking at Amdahl's law in its parallel variant. The central observation of the law is that *serial parts of a program will slow your program down*. A serial part can only run on a single core, and hence it forces the other cores into idling while they wait for it. And when we consider the parallel part, there is a critical path, called the span of the computation (see Work & Span for instance). Making the span smaller will also yield a faster program - but when optimizing for the span, the critical path may jump to another part of the computation.
Also of importance is Gustafson's law. Amdahl assumes a fixed workload size. If we can complete that workload in half the time, we have a speedup of two. But as the number of cores increases, so does the overhead of orchestrating their communication. Gustafson (et al.) proposed that we instead make workload size the scaling variable. That is, suppose we solve a workload of size 1 in 1 minute on 1 core. Perfect scaling would then mean being able to solve a workload of size 8 in 1 minute on 8 cores. Contrast this with Amdahl: a workload of size 1, in 1/8 minute, on 8 cores.
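For reference, the two laws in their standard form, with p denoting the parallel fraction of the work and N the number of cores (my notation, not from the post):

    % Amdahl's law: fixed workload; the serial fraction bounds the speedup
    S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

    % Gustafson's law: the workload grows with the number of cores
    S(N) = (1 - p) + pN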
Erlang and speedup
While Erlang is not an explicitly parallel language, it does not hurt to write your Erlang programs in a style which invites multiple cores to do work. In this section, we will give some general advice on how to do exactly that. To make Erlang execute your program in parallel, you must obviously structure the program to make it possible.
First, your process run-queue must have enough work that it can be divided among several cores. If there is only ever a single process that can run, that will limit the parallelism of your program.
Second, be aware of serializing bottlenecks. The granularity of execution is the process. So if a large number of workers all go to the same single process for their data, that single process becomes a contention point, and for that part of the code there will be no speedup. If queries to that single process dominate your program, you are in bad shape. The extension of this observation is rather simple: if you have 4 workers processing a queue, then your maximal speedup is 4. This may not be enough on a 16-core machine.
Third, synchronous calls block more than asynchronous casts. If possible, prefer asynchronously communicating processes over synchronous ones. Long chains of synchronous calls, each following the other, are serialization in disguise. They block out other callers while they are processing. They are also prone to creating deadlocks. Asynchronous calls will, in addition, tend to fill up mailboxes with work. It is much more efficient to handle 8 messages in a context-switch time slot than to handle 1 message and then switch right away on an empty queue. In some cases you can go pseudo-asynchronous by making, say, 1 out of 100 calls synchronous. This is an effective way to ensure that you don't overflow a slow receiver, as it forces in some amount of flow control.
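A hypothetical sketch of that pseudo-asynchronous idea, using the standard gen_server API (the server name, message format, and the 1-in-100 constant are illustrative):

    %% Mostly-asynchronous sends, with a synchronous call every
    %% ?SYNC_EVERY messages: the blocking call lets a slow receiver
    %% exert back-pressure before its mailbox grows without bound.
    -define(SYNC_EVERY, 100).

    send_event(Server, Event, N) when N rem ?SYNC_EVERY =:= 0 ->
        ok = gen_server:call(Server, {event, Event});
    send_event(Server, Event, _N) ->
        gen_server:cast(Server, {event, Event}).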
In Erlang a process is very cheap - far cheaper than a `pthread` thread in C. This leads to a programming methodology where you are not afraid of creating processes, much like you would create objects in OOP. A million processes is not a limiting factor. A good design choice in Erlang is simply to create a process per incoming request or event. Take an HTTP server, for instance: each GET request spawns a new process to handle it. Had processes been heavyweight, this approach would not have been viable at all. The idea is that the data you are going to operate on is morphed into a thread of control, so the data itself can go ask other parties for information, process itself, and so on. There is a strong similarity to OOP here, as each process plays the role of a living object (objects in OOP are "dead", only temporarily revived from their zombie state when the current executioner of control comes by).
This design effectively ensures that our run-queue is always well filled with work. Had we opted for a queue-pipeline approach, as is common in many other languages, the number of workers on each queue would have to be tuned to the number of cores. Furthermore, you would have to watch out for overloading a specific queue. In our morph-data-into-process design it is the run-queue we have to watch out for, but that is simpler, and there are tools for doing it (see for instance the Erlang Solutions esl/jobs application).
Note, however, that if these many small processes all request data from the same server-like process, then performance will dwindle and suffer. The server-like process is a serializer which we warned about earlier.
One idiom I commonly use is to have a protected ETS table with the option {read_concurrency, true} set. When the spawned processes want to read data, they go directly to the ETS table and fetch what they need. Writes are serialized through a central process. Of course, this scheme can be altered if you need a large number of writes, or general reader-writer access.
In newer Erlang/OTP releases (R14 and upwards, I believe), access to the ETS table is highly parallel in the runtime. Hence, your million processes will not have to wait on each other to read data out of the table.
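A minimal sketch of the idiom (the table name and message format are mine): the table is owned by a single writer process, while any process may read from it directly.

    %% The table is created by its owning (writer) process:
    init() ->
        ets:new(my_table, [named_table, protected, set,
                           {read_concurrency, true}]).

    %% Any process may read directly, without messaging the owner:
    lookup(Key) ->
        case ets:lookup(my_table, Key) of
            [{Key, Value}] -> {ok, Value};
            []             -> not_found
        end.

    %% Writes are serialized through the owner (e.g. a gen_server):
    handle_cast({put, Key, Value}, State) ->
        true = ets:insert(my_table, {Key, Value}),
        {noreply, State}.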
Another common trick is to increase the independence of computations. Whenever you can eliminate some sharing, perhaps by recalculating it or caching it somewhere, then you increase the independence of the computation. In Erlang semantics, processes have exclusive access to their own data. Hence, the independence of data is already really good. What you want to avoid is having to partner with other processes to solve a task -- if the partner is heavily contended.
The central point is to be aware of where serialization might occur in your code, and then to eliminate it. Otherwise you may see bad utilization and no speedup. There is a nice tool, Percept, which can profile the runnability of processes: when a process or port is able to run on a core.
Also, you should be aware of the following interesting gotchas:
I disabled SMP and now my code runs faster!
Yes, this is a common thing. First, when you benchmark your code, a grave but all too common mistake is to run code optimized for parallel execution on a single core. Parallel code often has overhead due to communication, and often it also impedes execution speed if you run it on a single core only.
You must benchmark against the fastest possible non-parallel version of your program and count the speedup relative to that. Anything else is simply not fair - it is a way to make your benchmark look better by cheating.
The overhead is also present when you take your program, written in Erlang, and run it on the non-SMP variant of the Erlang VM. It can often go faster for some workloads. The reason is that your program may not exploit parallelism that well, so the overhead of going truly parallel dominates the actual speedup gain. I have seen this with some of my Haskell programs as well.
There is a fairly high price for suddenly having multiple cores inside the VM doing things. You will have to synchronize in the runtime and even though your synchronization primitives are heavily optimized, they have overhead. Now, if the problem you are looking at is not even saturating a single core, then surely, it would have been better to run in a single-threaded manner. It is only when you could benefit from a second core that it will help you.
If, on the other hand, your problem maxes out the core, then adding a new core to the mix has some overhead, but it will make the computation scale further and you may experience a better throughput in the end.
So if you are well below a single core in utilization, disabling SMP is often a way to lower the amount of work your system has to do - and to make the code run faster.
My code has faster than perfect speedup!
Another common thing that can happen is super-linear speedup. That is, you may have 8 cores and expect a speedup close to 8 -- but all of a sudden you have a speedup of 12. This happens. The reason it happens is usually that when you add cores to your computer, you also alter other resources. Most notably, you often increase the amount of available cache. If you distribute the computation over multiple physical machines, you may also increase the amount of available RAM.
These factors can improve your computational speed by quite a lot if you can manage computations such that they become independent of each other. It directly reaps the benefit of the added caches as there will be no migration of data from one cache to another. If the computation can suddenly fit into the caches, then the speed can be much increased. Likewise, the same pattern can emerge if you add more RAM to your cluster. Suddenly the central data can be kept mostly in memory and you don't have to load data off the disk anymore.
Hiding latency - the secret weapon for fast parallel computation
If there is a single trick to fast parallel speedups, it has to be latency hiding. Note that problems come in three varieties: the simple, the impossible, and the interesting.
The simple problems are the embarrassingly parallel ones. They are easy to solve: split them into equally sized chunks and solve each part, then reintegrate the individual solutions into a final result. The idea of Map-Reduce lends itself to a solution method: if your problem is simple, then Map-Reduce solves it. The impossible problems are the class of problems for which you have little to no hope of speeding up the computation by adding more cores. A compiler, for instance, can compile multiple modules at once, but the steps of lexing, parsing, code generation, optimization, and so on have to happen in that order, in a serial fashion - a large span dominates. Some iterative numerical computations require a fully computed step N before they can tackle step N+1. These problems are so data-dependent that you have no hope of making those parts parallel. Essentially, the span of each iteration is chained to the next in a way that makes speedup impossible.
This leaves us with the interesting class of problems. In order to compute these, you need to have some amount of sharing/dependence in between the processes. And this is where the latency hiding trick comes into play.
When one process sends data to another process, there is latency involved. We must copy data into another mailbox, or perhaps to another computer. The bandwidth of the network and the size of the data put a lower bound on the latency we can expect when we copy. Now, the naive solution is to repeat a two-step process: compute this iteration, then exchange its results with other processes. But every time we exchange data, the processes are waiting and idle. If communication time dominates, our speedup will suffer.
The latency hiding idea almost doesn't even warrant a name: "do something else while the data transfers". That's it. Like an operating system is doing something else while a process waits for disk IO to complete, a process can do something else while waiting for data to transfer.
To Erlang programmers this means that the order in which you send and receive matters. If you reorder the receives such that the communication overhead is hidden, then your program will find the relevant message already in its mailbox when it is needed. The rule is: send early and receive late. As soon as you have data another process will need, you must make it available to that process. And when you need data, ask for it as late as possible, just before you need it. That way, it may already be there in the mailbox, ready for you to use.
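A small illustrative sketch of the rule (heavy_local_work/0 and combine/2 are placeholders): the send happens before the local work, and the receive after it, so the transfer overlaps with the computation.

    %% Send our contribution early, overlap the transfer with local
    %% work, and only block on the neighbour's data at the last moment.
    step(Neighbour, Partial) ->
        Neighbour ! {partial, self(), Partial},   % send early
        Local = heavy_local_work(),               % overlaps the transfer
        receive
            {partial, Neighbour, Theirs} ->       % receive late
                combine(Local, Theirs)
        end.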
The conclusion is that one must take the impact of latencies into consideration. In high-performance computing it is of primary importance. A worthy bet is that on modern computers, moving data around is often the bottleneck - not the raw speed of the CPU core. In a distributed system, the latencies may be much higher due to network copying, with a much higher impact on speed. The CPU cache also plays a crucial role in limiting the speed at which your computations will run.
Conclusion
We have given a brief overview of some parallel computation basics. We have argued that concurrency and parallelism are different beasts, and that Erlang is not particularly parallel. We have suggested what one should do to increase parallelism in Erlang programs. We have also hit on a small number of interesting gotchas in parallel computing, and we have discussed the concept of latency hiding.
There is much more to be said about Erlang and parallel computation. I particularly like that Erlang does not assume a cache-coherent architecture for instance. But that will be a post for another time since I am all out of blog space and this is already too large.
-
The reason this blog is not getting many updates is that I post over on medium.com for the time being. You can find me over there at:
https://medium.com/@jlouis666
where I mostly blog on the same things as here. The medium platform is a nicer venue for writing text and I like the layout better, which is why I now prefer it to blogger.com.
I might still be crossposting popular posts, but I can't promise anything.
-
Erlang's message passing
The programming language Erlang[0] has functionality to pass messages between processes. This feature is used to implement communication and synchronization between different concurrent activities in an Erlang system. Other languages implement somewhat the same functionality with slightly different semantics. But in most cases you can implement one of the semantic styles in the other, so they are not that different in practice. Haskell has STM and MVars, for instance; Golang and the Standard ML CML extension have channels; and so on.
The current major implementation of the Erlang language, called the BEAM VM, implements message passing via copying. This sets the implementation apart from almost any other language out there. Most implementations have chosen to pass values between processes by reference. The reason seems strikingly obvious: passing a reference is much faster than copying values between processes.
I want to stress that there is nothing in the Erlang semantics which mandate that you copy data around. You could equally well keep the current semantics and opt to just pass values by reference between processes. So if it turns out that we—in the long run—win by using references over copying, we can just change to do exactly that.
One might wonder why the BEAM VM opts to copy data around rather than pass it by reference. Of course, it boils down to the same thing as always: tradeoffs. There is much to be gained by copying.
Embracing copying
The secret to the success of the BEAM VM is its choice of copying messages around. The sacrifice you make is the ability to pass references. However, that sacrifice pays for itself in the long run in many ways, when you begin to think about the implications of picking a different strategy.
First of all, modern computers are good at copying data. In fact, they are extremely good at tossing data around; for modern, complicated programs, most of the time is spent copying data between caches anyway. A correctly structured Erlang program will usually only copy some 100 bytes per message, which is quickly transferred between processes. So it turns out that while there is an overhead to copying data around, the overhead is much smaller than people usually expect. The behaviour has been known since around the '90s, from the high allocation rates of functional languages in general - and copying a message is much like having a high allocation rate.
Second, there is one type of data in the BEAM VM which is passed by reference: binaries above a certain size (64 bytes). When you allocate, say, a 1 megabyte binary, it is allocated in a separate memory arena, and you only pass around a pointer to it. Note that binaries cannot contain pointers themselves, and furthermore, they are immutable. Since there are no pointers, you don't have to worry about binaries pointing to other data; they are always "leaf nodes" in an Erlang term. This, together with immutability, allows the VM to reuse the same binary in multiple processes. It also allows the VM to make sub-binaries that reference into the 1 megabyte chunk. If you use, e.g., binary:split/3 to split a binary into parts, you just get sub-binary references into the original binary. Other languages provide roughly the same primitives. Slices in Golang are like this, but they allow you to mutate the contents; this wouldn't work in Erlang, since data is immutable in general. Slices in Standard ML are more like the sub-binaries in the BEAM VM. In general this leads to excellent read-only memory reuse.
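Roughly how that looks in the shell (the sizes here are illustrative): the split parts are sub-binaries pointing into the original off-heap binary, so no payload bytes are copied.

    1> Big = binary:copy(<<"key=value;">>, 100000).   %% ~1 MB refc binary, off-heap
    2> Parts = binary:split(Big, <<";">>, [global]).  %% sub-binaries into Big, no copying
    3> hd(Parts).
    <<"key=value">>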
Copying means that you have the same operational behaviour even if you are sending a message to another distributed node() running on another machine, or if you have a NUMA architecture where you have to send the message into the memory bank of another CPU core. If modern CPUs lost cache coherency, it would pose no problem to the semantics of Erlang programs; they are fully forwards compatible in case it happens.
Copying means that you have full process isolation. When you send a message (locally) in Erlang the following happens:
- You allocate a target memory space in the sending process
- You copy the message to that memory space
- You take the external lock on the target process
- You link your memory space into the mailbox linked list
- You release the external lock on the target process
Full process isolation means that a process can have its own heap and stack. The most important consequence is that if a process crashes or incorrectly handles memory, only that process is affected. There is no way a library written by a junior programmer can wreak havoc through arbitrary memory corruption in the system. You can safely add modules and libraries and be sure they are limited in the amount of damage they can cause to the running system. For large programs, this feature is necessary.
Since each process has its own heap and stack, GC can run independently on every process. Since process heaps are usually small, the GC pause time is not noticeable. This property makes Erlang a true soft-realtime system. Also, GC can be preempted if a process has amassed a large heap - efficiently thwarting the ability of one process to hose the system as a whole.
The isolation property also means data independence. Data can't be shared by construction and so it is easy to operate on data in parallel. Since there is no data sharing, programmers are forced to think about how to pass data around in the program, which further improves independence and parallelism of the program. A copied message is a local cache.
Each Erlang node() has another arena called ETS - the Erlang Term Storage. ETS is a place where you can store Erlang terms and look them up by key later. It is equivalent to a tuple space[1] (though it isn't distributed). Processes store terms in ETS by copying the term to ETS, and they retrieve terms from ETS by copying them back into their own heap before operating on them. ETS is not garbage collected. As such, it works like a safe variant of manual memory management: there is no use-after-free, no double free, and so on - just memory leaks. And those can be handled by having a janitorial process clean up after other processes. The crux is that the non-GC property means you can easily have 400 gigabytes of data in ETS without having to worry about GC pause times. Again, the embrace of copying makes it all possible. A common trick in Erlang is to move some of a process's data into ETS to avoid amassing a large heap for the process. This then speeds up the GC times of the process into the realm of "not noticeable".
Cleverly, the copying scheme also avoids the complexity of realtime, pauseless garbage collection of large shared heaps. These beasts actually do exist[2], but they are not easy to implement, nor well understood. They were not understood at all when Erlang needed soft-realtime guarantees, in any case. So the copying compromise was chosen instead. The VM is now simpler to implement, which leads to fewer errors and a VM that is easier to comprehend.
I am inclined to say that copying message passing in the BEAM VM is a cornerstone of the success of Erlang. Many of the safety, realtime and robustness guarantees hinge on the copying message-passing design, together with data immutability and data persistence.
So as an Erlang programmer, I tend to see copying of messages as a good thing. Not a bad design decision that has to be worked around.
[0] And also in Elixir!
[1] http://en.wikipedia.org/wiki/Tuple_space
[2] http://michaelrbernste.in/2013/06/03/real-time-garbage-collection-is-real.html
-
So the discussion of the 10x programmer has come up again, this time because Shanley Kane wrote about the 10x programmer being a myth over on Medium[0]. Shanley makes the case that the original 1966 paper used as a basis for the argument is not really about high productivity. My take is more or less in alignment with Shanley's: the 10x programmer is a myth. What is far more likely is that in most cases there is a confounding variable which accounts for some programmers appearing more productive than others.
The danger of pursuing 10x productivity blindly is that you may end up using the only viable trick right at your disposal: increasing time spent working.
There are numerous irritating things about this "solution". The most important one is that it doesn't scale. At best it scales linearly, which means - everything else being equal - that it can only provide a factor of 2, perhaps 2.5 if we are really stretching it. Anything more than that requires the use of dubious substances and funny drugs.
There are many employers, however, who would love to see you scale along that axis, for one reason or another. It is quite vampiric, and I don't think it works over longer stretches of time. The only way to get more productivity is by using other tricks - especially if you are to find the 10x increase.
But I think that the only way to really understand the paper is by reading it yourself. So I did.
The 1966 myth paper
The underlying problem is that it is notoriously hard to make accurate controlled tests. The original paper, "EXPLORATORY EXPERIMENTAL STUDIES COMPARING ONLINE AND OFFLINE PROGRAMING PERFORMANCE", 20 December 1966[1], studies whether having on-line access to a machine helps productivity. The findings, compared to an off-line machine, are positive. In passing, the large diversity in programmers' completion times is mentioned. If you disagree with my reading here, I suggest you go read the paper yourself. It is always good to look at old research, and there is a good chance that you were not programming - nor born - when this paper was written.
The paper itself is actually pretty interesting. There are some amusing comments in it, pertaining to the viewpoint that having an on-line editor is detrimental to development - that is, that off-line programming forces the user to cook up better programs. Another amusing fact is that while debugging happened either on-line or off-line, actual programing (note the missing 'm') was always carried out off-line.
The analysis in the article is rather sound when it comes to deciding whether on-line or off-line debugging is more efficient. Even for the small set of participants, an ANOVA is performed. And even with the rather small sample, there is a significant advantage to on-line debugging (today, most people would probably not be surprised by this, but it was not a given back in the day).
The paper works with two groups. The second group, of 9 persons, consists of inexperienced programmers. They are not interesting to our 10x myth at all: since they are newcomers, the speed at which you absorb and learn is going to dominate. Specifically, new programmers - today - come with vastly different experience and skill sets, so naturally their diversity is high. Back in the day it may have been a different situation, I think. In any case, I won't account for these, as they are not as interesting as the remaining 12 participants:
"The "horrid" portion of the performance frequency distribution is the long tail at the high end, the positively skewed part in which one poor performer can consume as much time or cost as 5, 10, or 20 good ones. Validated techniques to detect and weed out these poor performers could result in vast savings of time, effort, and cost."
Given the article is from 1966, saving CPU hours on the system for debugging is as important as everything else. It is not only about getting rid of "horrid" programers; it is also a question of saving valuable machine time.
Another important point in the article is this quote:
"Programers primarily wrote their programs in JTS (JOVIAL Time-Sharing—a procedure-oriented language for time-sharing)."
This implies that some programers chose another language. Especially if you also take the following quote from later in the paper:
"a. A substantial performance factor designated as "programing speed," associated with faster coding and debugging, less CPU time, and the use of a higher-order language. b. A well-defined "program economy" factor marked by shorter and faster running programs associated to some extent with greater programing experience and with the use of machine language rather than higher-order language."
In other words, this strongly suggests a confounding variable: the choice of programming language. Given the general state of compilers and interpreters in 1966, I am willing to bet that the difference in performance contributing to the 10x myth is programming-language choice.
Very few will object to the fact that machine code is harder to write than higher-level programs in JOVIAL - especially at that point in time, when knowledge of a system's machine language was more specialized and required more intimate knowledge of the machine. I find it pretty amusing that this little fact has been hidden completely from the discussion about the paper. It is also odd that there is no complete table with the 12 users, their times, and their language choice. It would make the material stronger, since you would be able to run experiments on the data yourself.
If the 10x programmer exist, this paper isn't the one proving it. And if the 10x programmer exists, that programmer is writing Haskell :)
[0] https://medium.com/about-work/6aedba30ecfe
[1] http://www.dtic.mil/dtic/tr/fulltext/u2/645438.pdf
Written in Acme and converted with StackEdit.
-
A list of common problems
This serves as a gentle reminder list of things one should be aware of when doing Erlang advocacy. The list is quite haphazard, but there is a common point: each item ends in the conclusion "Erlang is slow". The idea is that in many situations there is an underlying confounding reason for this, and it has to do not with the language itself per se, but with bad practice.
The bait is wrong
Let us split the world of programming problems along two axes: "how fast do you want it" and "how much complex synchronization/concurrency does the problem need". The former has to do with fast computation: HPC, video encoding, data mining, and so on. The time it takes to deliver a result matters for these problems. The latter has to do with massive amounts of concurrency: web servers, chat systems, message buses, event coordination, and so on.
Problems largely fall into 4 quadrants:
- Type A: Slow, Sequential
- Type B: Fast, Sequential
- Type C: Slow, Concurrent
- Type D: Fast, Concurrent
The thing is: Erlang is interpreted. The reason is probably mostly historical, since it is easier to port an interpreter, and back in the day there were multiple architectures you had to run on to be in business. But it also makes heavyweight computation in the language horribly slow. So people try using Erlang for a type B problem. Hence, Erlang is slow.
I/O is incorrectly handled
Too many projects are not using the iolist() type. They should. If you produce output, then you should be able to accept an iolist() and operate on that. Requiring any other type is wrong for payload-style data in most cases. The problem is that the code will otherwise make lots of unnecessary internal copies of data, where it could just shove the data directly to an underlying socket.
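As an illustration, here is a hypothetical HTTP-style response built as an iolist: the tree of binaries and lists is handed directly to gen_tcp:send/2, which accepts iodata, so nothing is ever flattened or concatenated.

    %% The response is a tree of binaries and lists; gen_tcp:send/2
    %% accepts iodata, so the tree is never flattened in Erlang code.
    response(Status, Headers, Body) ->
        [<<"HTTP/1.1 ">>, Status, <<"\r\n">>,
         [[Name, <<": ">>, Value, <<"\r\n">>] || {Name, Value} <- Headers],
         <<"\r\n">>,
         Body].

    send_response(Socket, Status, Headers, Body) ->
        gen_tcp:send(Socket, response(Status, Headers, Body)).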
Another common problem is failing to recognize that you have been passed an IO-list. This makes for subtle bugs at times, where otherwise perfect code fails because the iolist structure changes.
There is also a large set of common problems which stem from setting the wrong options on files and/or sockets, and then claiming Erlang has slow I/O. The defaults won't give you a lot of speed, but they will give you high flexibility.
And finally, a very common mistake I see all the time is treating a TCP socket stream as a way to pass messages, without any kind of framing. This happens very often, sadly.
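The simplest fix is to let the VM do the framing. With the {packet, 4} socket option, each send is prefixed with a 4-byte length and each delivered message is one complete frame. This sketch, including the use of term_to_binary/1, is illustrative; both ends must agree on the option.

    %% With {packet, 4}, the VM prepends a 4-byte big-endian length on
    %% send and reassembles complete frames on receive: one message on
    %% the wire is one message in the mailbox.
    call(Host, Port, Request) ->
        {ok, Socket} = gen_tcp:connect(Host, Port,
                                       [binary, {packet, 4}, {active, once}]),
        ok = gen_tcp:send(Socket, term_to_binary(Request)),
        receive
            {tcp, Socket, Reply} -> {ok, binary_to_term(Reply)}
        after 5000 ->
            {error, timeout}
        end.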
All of these mistakes lead to one conclusion only: Erlang is slow.
Bad protocol implementations
When you have to support a new protocol, the first implementation is often written in anger, as quickly as possible, and with the goal of solving a completely different problem. This often makes for protocol implementations that do not scale. This in turn falls back on the language, because "it is the fault of the language". And hence, Erlang is slow.
Other times, the problem is with the protocol. Some protocols do not support pipelining, so you suffer the round-trip to the server on each request. Some protocols have horribly complicated encoding and decoding schemes which make a fast implementation impossible - even in this era, where bandwidth is readily available and you could apply compression on top of the stream once and for all. Some protocols are outright broken: they fail to recognize 30+ years of sane protocol design and thus redo all the mistakes, yet again. Implementing such a protocol makes Erlang slow.
OOP all the things
A large set of mistakes stems from the belief that you can implement "object-oriented" design in any language. Then you get a very layered module structure, sometimes with parameterized modules thrown in.
These designs are often highly non-idiomatic for a typical Erlang system, and they build layer upon layer, which makes the code slow as molasses. Hence, Erlang is slow.
We believe the hype
"Whatsapp can do 2.5 million connections from a single Erlang server". Yes, in a mistake where the server was overloaded, on FreeBSD, with a highly tuned VM. Chances are your windows-backed implementation without tuning can't even take 1000 simultaneous connections.
"Erlang can do 850000 transactions against VoltDB per second". Yes, but what was the environment again? Definitely not a small instance on Amazon.
"Erlang had 99.9999% uptime". Yes, but on what hardware and how many calls in that telecommunication system were allowed to fail? What SLA were used?
"Erlang is used by company X". Yes, and company X is not you.
When these kinds of claims are believed too readily, you get the impression that your system should be able to do the same. But there are a lot of differences between projects, and you may not be able to do the same unless the setup is exactly the same. The problem comes when Erlang fails to deliver: then it is perceived to be a problem with Erlang. Hence, Erlang is slow.
We cite the wrong studies
Yes, you can probably build an Erlang program 5 times faster than a C++ program, get the same speed, and have way fewer errors. But people are not writing C++ anymore. They are writing Python, Javascript, Java, Clojure, PHP, and C#. Those communities also claim to be 5 times faster. In effect, we look at the wrong studies nowadays. You can't really use these claims anymore, since the "competition" makes the exact same claims. Node.js even claims the ability to do "massive concurrency", so where is the difference now?
Actually, properly solving a concurrency problem requires you to understand some subtle points about errors and error propagation. In a shared-nothing, single-threaded Python program, this is not a problem. Hence, building Erlang programs is slow.
No split of protocol from concurrency
Many libraries seek to solve two problems. They want to speak a protocol toward some foreign subsystem, like any other language. But then they need concurrency, so they add their own kind of process pool on top of the library.
The result is many different pool implementations, all alike but subtly different. It is very hard to make a robust process pool in the presence of errors. In fact, you might want to QuickCheck your pool implementation for correctness. Yet this introduces multiple small, subtle errors in Erlang programs, since a program of major size often uses 3 or 4 different process pool implementations in the same system.
Luckily, this doesn't make Erlang slow. It just puts forth the observation that Erlang seems buggy, and flaky.
Wrong use of concurrency
My pet peeve here is the erlang-mysql-driver versus emysql. EMD has a process pool where it round-robins requests into the pool, so each process in the pool has its own queue in front of it. The emysql driver has a single long queue, and workers in the pool have no queues in front of them at all; they just pick work off the single long queue.
The problem occurs when you have a long-running job. In EMD, due to the round-robin behaviour, requests will queue up behind the worker processing the long-running job. Even if they could complete in sub-millisecond time, they have to wait. Emysql does not have this weakness, since another worker will pick up the small job.
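A hypothetical sketch of the single-queue approach (the process layout and message format are mine, not emysql's actual code): idle workers register with a dispatcher, and each job goes to whichever worker frees up first, so a long-running job only ever occupies one worker.

    %% A dispatcher process holds one queue of jobs and a list of idle
    %% workers. Workers report in with {ready, Pid} when free.
    dispatch(Jobs, Idle) ->
        receive
            {enqueue, Job} when Idle =/= [] ->
                [Worker | Rest] = Idle,
                Worker ! {job, Job},
                dispatch(Jobs, Rest);
            {enqueue, Job} ->
                dispatch(queue:in(Job, Jobs), Idle);
            {ready, Worker} ->
                case queue:out(Jobs) of
                    {{value, Job}, Remaining} ->
                        Worker ! {job, Job},
                        dispatch(Remaining, Idle);
                    {empty, _} ->
                        dispatch(Jobs, [Worker | Idle])
                end
        end.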
But to the newbie Erlang programmer who picks the wrong library, there will be some really odd latencies toward MySQL that they can't explain. Hence, Erlang is slow.
NIF Abuse
As it stands right now, 19 out of 20 NIF implementations are incorrect. One rule about NIFs is that they should not run for more than 1 ms; otherwise they mess up the schedulers or hurt the latency of the program's response times. Most NIF implementations ignore this.
As a NIF author you have to either respond asynchronously through an internal thread, or cooperate and be ready to yield. There is a call, enif_consume_timeslice(), which can help the NIF implementor, but few use it.
Usually, a NIF gets implemented because speed is wanted. But the problem is that NIFs can wreak a lot of havoc on your VM. And it takes knowledge to write them correctly, so that they run quickly.
Concurrency abuse
When people get concurrency in their hands, they want to use it. The first many libraries and programs one writes use scores of processes. Usually, this leads to programs which are way more complicated than they should be - with way more failure modes to boot. There is also overhead in passing messages around, so such programs run slower than they should. Hence, Erlang is slow.
Neglecting error handling
Another common case is code which does nothing to handle errors. Erlang provides some really cool mechanisms for handling errors in the large, but you need to use the tools to get the advantage. It does not magically appear in your code base.
This means your code needs to handle errors by monitoring and by understanding how to restart. In turn, this often makes for a program which runs slower. But it will be correct, even when things go wrong. My experience is that going for speed is not worth it unless you have a really good reason to do so. It is often better to set up more machines or scale out in another way.
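As a minimal example of the mechanism (the function is mine, not a library API), a monitor turns a crash in a worker into a value the caller must deal with explicitly:

    %% Run Fun in a fresh process under a monitor, and turn an
    %% abnormal exit into an error value the caller has to handle.
    run_monitored(Fun) ->
        {Pid, Ref} = spawn_monitor(Fun),
        receive
            {'DOWN', Ref, process, Pid, normal} -> ok;
            {'DOWN', Ref, process, Pid, Reason} -> {error, Reason}
        end.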
-
On using Acme as a day-to-day text editor
I've been using the Acme text editor from Plan9Port as my standard text editor for about 9 months now. Before that, I used Emacs and Vim quite a lot. I never really got the hang of either Sublime Text or TextMate - the latter because I couldn't run it on all operating systems, the former because it was too new to bother.
With Acme, you sacrifice almost everything. There is no configuration file. This is a plus: I have spent way too much time messing with configuration where I should have been messing with adaptation. The Acme defaults are designed to be sensible and easy to work with. The standard font choice works well, and even though it is not antialiased by default, I tend to like it nonetheless.
Other sacrifices are syntax highlighting, automatic indentation, specific language support, and so on. But you gain an editor that always works, with no upgrades bothering you while you work. The editor is built to be a window manager of sorts, and you use it as a hub to connect other software together. This hub becomes the main focus of everything you are doing.
What pleases me about the Acme editor is that it is simple. You can learn most of what it can do in a week, and then the power stems from combining those simple things. It is very much a Zen-style editor with few things going on. You will have to accept that choices have been made for you, and that you have to adapt to them. But with this editor I spend much more time working on code bases than trying to get my editor to behave.
Setting up acme
- Grab Plan 9 from User Space and ./INSTALL it somewhere. This gives you the base environment, but it does require some help to get running.
- Grab Plan 9 setup, which holds my small tools and shell scripts for manipulating files.
- I have some changes to $HOME/.profile:
    export BROWSER='chromium'
    PLAN9=/home/jlouis/P/plan9
    PATH=$PATH:$PLAN9/bin

    # Plumb files instead of starting new editor.
    EDITOR=E
    unset FCEDIT VISUAL

    # Get rid of backspace characters in Unix man output.
    PAGER=nobs

    # Default font for Plan 9 programs.
    font=$PLAN9/font/lucsans/euro.8.font

    # Equivalent variables for rc(1).
    home=$HOME
    prompt="$H=; "
    user=$USER

    export \
        BROWSER\
        ⋯
        PLAN9\
        ⋯
        font\
        home\
        prompt\
        user
- Acme is started once, through the acme-start.rc script. This also starts the plumber service. $HOME/lib/plumbing is linked, so I get some additional plumbing rules on top of the default ones - primarily quick access to GitHub stuff.
On mouse usage
Acme requires a good mouse to be really effective. Find a gaming mouse with a good DPI resolution, and then configure its acceleration and sensitivity settings to your liking. You shouldn't need to move the mouse much, yet the movement should be precise. Some mice have microprocessors which smooth movement, so that when you sweep a line, it is easier to stay on the line. It all depends on what mouse you have.
In a moded editor like vim, you are usually either in command mode for cursor movement or in insert mode for entering text. Acme is the same: either you have a hand on your mouse and are issuing commands, or you are inserting text into the buffer somewhere. Note that in a system like Acme, you can do a lot of tasks with the mouse alone. You can double-click next to a " character to select the whole string, or next to the pairs (), [] or {} to select their contents. A click at the start of a line selects the whole line. And so on. Since you can also select the \n character, you can easily move around large textual parts of the code. Cut, copy and paste are also on the mouse alone. So most of the (common) things you do in the vim command mode are done with the mouse.
You do have access to a command language, which comes from the sam(1) editor. Learning how that language works helps a lot. I do much of my surgery on files by writing commands that change the contents of a selection.
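For instance, a few Edit commands in sam's structural regular expression language (these examples are mine; see sam(1) for the details):

    Edit , x/old_name/ c/new_name/    change every occurrence in the file
    Edit , x/.*\n/ g/TODO/ p          print every line containing TODO
    Edit , | sort                     replace the window body with its sorted lines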
Is the mouse more efficient than the keyboard? Hell yes! The more complex an editing task is, the more effective the mouse is. And for most other simple things, the speed is about the same as the keyboard movement for me.
Working with acme
The key concept of Acme is that you use it as the main entry point for all the work you do. One of my screens is full-screen Acme, and it usually runs at least two major windows: scratch and win. The latter is a standard shell, so you can open files and run commands. I open files either by doing lc and then 3-clicking them, or by executing B <filename>. Remember that ^F completes filenames.
The scratch window is a file I continually write, containing helper commands, small snippets, GH references, and so on. I usually have a global one, and one per project I am working on. Usual contents:
- URLs to important web pages. 3-click them and Chromium takes you there.
- Github issues
- Git branch names enclosed in brackets: [jl/fix-eqc-test/37]
- Notes of importance, thoughts.
- Complex commands with notes on their usage, so they can be copied in and used quickly. For a gg foo command, you can just Snarf it and then use the Send command in the shell window to fire it off against the source code.
I usually keep done things in the scratch buffer as well, for documentation. Every 2-3 months I then move it to a file scratch.$(date +%Y%m%d) to remember it.
I make heavy use of the fact that Acme has a neat way to enter Unicode directly, so there is a lot of correct punctuation and a lot of things you won't usually see in ASCII. The editor uses a variable-width font by default, which is really good when reading and scratching stuff down. Though I also use the Font command when I need a fixed-width font at times.
Acme is a visual-spatial editing environment. By default, it doesn't hide information from you. At work, it is not uncommon that I have over 50 open buffers on a big 27-inch Mac display. You can do that easily with Acme, and since you have spatiality, you can remember where you put a window and get back to it easily. I usually run 4 columns:
- One with documentation and scratch pads.
- One which contains the main code I am working on right now.
- One with shells, erlang shells and code mixed.
- A narrow strip containing directory output to quickly get at a specific file. This strip also holds windows with error output.
Working with Erlang in acme
Most of what I do is Erlang. I sometimes work in different languages, but the operation is roughly the same.
- A shell is used to run rebar compile. I add [rebar compile] to the tag and then click at the right of [ to select it. A 2-click now recompiles. Other typical things in the tag could be make or make test and so on.
- The gg command is a shorthand for git grep -n. Need a specific thing? I gg it, and the filename comes up with a line number. 3-clicking it is understood by the plumber, which opens that file at that line.
- I tend to avoid using tools which can follow code paths, mainly because if you need such a tool, chances are the code itself is quite convoluted and nasty.
- I have a window which runs an Erlang console for the project I am working on. I often dynamically load code into the Erlang node and test it out. It is rare that I reboot the node unless I am doing startup-specific coding.
- Documentation: Edit , <erl -man lists in a dummy window serves the purpose.
- I often search for code. :/^keyfind will search for keyfind, but only at the start of a line. I keep such a line around in the tag for searches.
- The Edit , d command clears a window by selecting all its contents and deleting them.
- I often utilize shell commands: <date +%Y-%m-%d inserts the current date into the buffer, for instance. Selecting text and sending it through |sort will sort export lists and atom tables.
- Each project is written to a dump file with Dump acme.projectname. This way you can easily get back with Load acme.projectname, which restores your current window layout and more.
- I use the shell a lot when I write code. In practice I see the UNIX system as the IDE, and I use Acme to access that IDE. It works wonders.
-
A disclaimer, to start it all off, in the interest of fairness and honesty: I got a review copy of the book from No Starch Press and was asked to review it.
I remember, a couple of years ago on IRC (Internet Relay Chat), that Fred had started writing a series of articles on how Erlang worked. It all began in the small, with him writing about the language constructs, the data types, how to use the language, and so on. Every time he wrote a chapter, he would seek out the help of the channel for proofreading and for suggestions on how to improve it. I like Fred's work because he made a good introduction to the language, and I had a place to point whenever somebody wanted to learn about Erlang. Even better, it was a free resource, straight on the web, and you could just send somebody a link to a complicated part.
I am not sure how much Fred really envisioned it as a book. He started off from the idea of "Learn You a Haskell for Great Good!", which is a book written in the same light style. But as time passed, he kept writing and adding more chapters. It was also nice whenever Fred had covered another subject, since it eased the transition for new Erlang programmers: we had another source to point to when people wanted to consider those aspects of the language.
Since then, Fred has written chapters on concurrency, on distribution, and on general program design in Erlang. His style has always been to mix fact and fun, and he has avoided falling into the trap of writing a book full of boring tirades about the language. There is a readership for such a book as well, but that ground is already covered by the language definition and a short tutorial. Many people have enjoyed his book.
And now - now you can get the book in dead wood as well! No Starch Press made this possible and published all of Fred's amazing work, so you can go buy a book to read anywhere. It is an excellent introduction to Erlang, and you can look at the first chapter or two on the web to get a feel for his writing style. If you like it, you should definitely consider buying the book. While the style may be too slow for a seasoned functional programmer, it is the right fit if you have yet to experience functional programming. I know that many people still prefer a book when they are reading. The reasons for this are many, but reading off a low-resolution screen doesn't have the same feel as printed text. A side effect is that all of the writing has been cleaned up and made clearer for the book.
The major selling point of this book is that it contains it all. You will be exposed to sequential, concurrent, and distributed Erlang. Fred also covers parts which are often skipped in other books, like the dialyzer, Mnesia, and ETS tables. He covers most of the testing tools, like EUnit and Common Test. And in addition, Fred explains how OTP works and how to structure your OTP applications. Few other books offer that level of depth. Many of the chapters cover things I consider essential to efficient Erlang programming in the large: especially the dialyzer, releases, testing, and ETS are important for the professional Erlang programmer.
If you want to know it all, LYSE is a great book to buy. And you can buy it here:
http://learnyousomeerlang.com
-
In this post, I describe why Erlang is different from most other language runtimes. I also describe why it often forgoes throughput for lower latency.
TL;DR - Erlang is different from most other language runtimes in that it targets different values. This explains why it often seems to perform worse if you have few processes, but well if you have many.
From time to time the question of Erlang scheduling gets asked by different people. While this is an abridged version of the real thing, it can act as a way to describe how Erlang operates its processes. Do note that I am taking Erlang R15 as the base point here. If you are a reader from the future, things might have changed quite a lot—though it is usually fair to assume things only got better, in Erlang and other systems.
Towards the operating system, Erlang usually runs one thread per core in the machine. Each of these threads runs what is known as a scheduler. This is to make sure all cores of the machine can potentially do work for the Erlang system. The cores may be bound to schedulers through the +sbt flag, which means the schedulers will not "jump around" between cores. It only works on modern operating systems, so OSX can't do it, naturally. Binding means the Erlang system knows about the processor layout and the associated affinities, which is important due to caches, migration times and so on. Often the +sbt flag can speed up your system, at times by quite a lot.
The +A flag defines the number of async threads in the async thread pool. Drivers can use this pool to block on an operation, so that the schedulers can still do useful work while one of the pool threads is blocked. Most notably, the thread pool is used by the file driver to speed up file I/O - but not network I/O.
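You can ask a running system about this layout from the shell. These are all standard erlang:system_info/1 calls; the printed values below are just examples from an 8-core machine:

    1> erlang:system_info(schedulers_online).
    8
    2> erlang:system_info(scheduler_bind_type).
    unbound
    3> erlang:system_info(thread_pool_size).
    10

The last call reports the size of the +A pool, and the bind type changes when you pass +sbt.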
While the above describes a rough layout towards the OS kernel, we still need to address the concept of an Erlang (userland) process. When you call spawn(fun worker/0) a new process is constructed by allocating its process control block in userland. This usually amounts to some 600+ bytes, varying between 32-bit and 64-bit architectures. Runnable processes are placed in the run-queue of a scheduler and will thus run later, when they get a time-slice.
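You can see the small initial footprint of a process directly; process_info/2 is a standard call, and the number below is only an example (it includes the initial heap and stack on top of the process control block):

    1> Pid = spawn(fun() -> receive stop -> ok end end).
    2> process_info(Pid, memory).
    {memory,2632}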
Before diving into a single scheduler, I want to describe a little bit about how migration works. Every once in a while, processes are migrated between schedulers according to a quite intricate heuristic. The aim is to balance load over multiple schedulers so all cores get utilized fully. But the algorithm also considers whether there is enough work to warrant starting up new schedulers. If not, it is better to keep a scheduler turned off: the thread then has nothing to do, and in turn the core can enter power-save mode and get turned off. Yes, Erlang conserves power if possible. Schedulers can also work-steal from each other if they are out of work. For the details of this, see [1].
IMPORTANT: In R15, schedulers are started and stopped in a "lagged" fashion. What this means is that Erlang/OTP recognizes that starting or stopping a scheduler is rather expensive, so it only does this if really needed. Suppose there is no work for a scheduler. Rather than immediately taking it to sleep, it will spin for a little while in the hope that work arrives soon. If work arrives, it can be handled immediately with low latency. On the other hand, this means you cannot use tools like top(1) or the OS kernel to measure how efficiently your system is executing. You must use the internal calls in the Erlang system. Many people incorrectly assumed that R15 was worse than R14 for exactly this reason.
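One such internal call is the scheduler_wall_time statistic, available from R15. A minimal sketch of taking two samples and computing per-scheduler utilization between them (the calls are standard; the workload in between is up to you):

    1> erlang:system_flag(scheduler_wall_time, true).
    2> Ts0 = erlang:statistics(scheduler_wall_time).
    %% ... let the system run its workload for a while ...
    3> Ts1 = erlang:statistics(scheduler_wall_time).
    4> [{I, (A1 - A0) / (T1 - T0)} ||
           {{I, A0, T0}, {I, A1, T1}} <- lists:zip(lists:sort(Ts0), lists:sort(Ts1))].

Each result pairs a scheduler id with its true utilization over the interval, unaffected by the busy-wait spinning that fools top(1).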
Each scheduler runs two types of jobs: process jobs and port jobs. These are run with priorities, like in an operating system kernel, and are subject to the same worries and heuristics. You can flag processes to be high-priority, low-priority and so on. A process job executes a process for a little while. A port job runs pending work on a port. To the uninformed, a "port" in Erlang is a mechanism for communicating with the outside world. Files, network sockets and pipes to other programs are all ports. Programmers can add "port drivers" to the Erlang system in order to support new types of ports, but that does require writing C code. One scheduler will also poll network sockets to read in new data from them.
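Setting priorities uses standard calls; here is a small sketch (do_work/0 is a placeholder name):

    %% Spawn a process at high priority:
    Pid = spawn_opt(fun do_work/0, [{priority, high}]),
    %% Or change priority from inside a process itself:
    process_flag(priority, low)

Use high priority sparingly: it exists for the rare process that must run ahead of the rest, not as a general speed knob.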
Both processes and ports have a "reduction budget" of 2000 reductions. Any operation in the system costs reductions: function calls in loops, calling built-in functions (BIFs), garbage collecting the heap of that process[n1], storing to or reading from ETS, sending messages (the size of the recipient's mailbox counts; large mailboxes are more expensive to send to). This is quite pervasive, by the way. The Erlang regular expression library has been modified and instrumented even though it is written in C code. So when you run a long-running regular expression, reductions are counted against your process, and it will be preempted several times while the expression runs. Ports as well! Doing I/O on a port costs reductions, sending distributed messages has a cost, and so on. Much time has been spent ensuring that any kind of progress in the system has a reduction cost[n2].
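You can watch the counter tick with process_info/2, a standard call. A small shell sketch (the final difference varies from run to run):

    1> {reductions, R0} = process_info(self(), reductions).
    2> lists:sum(lists:seq(1, 10000)).
    50005000
    3> {reductions, R1} = process_info(self(), reductions).
    4> R1 - R0.

The difference is the cost of the loop in reductions; against a budget of 2000 per slice, a loop like this already amounts to several context switches' worth of work.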
In effect, this is what makes me say that Erlang is one of the few languages that actually does preemptive multitasking and gets soft-realtime right. It also values low latency over raw throughput, which is not common in programming language runtimes.
To be precise, preemption[2] means that the scheduler can force a task off execution. Anything based on cooperation cannot do this: Python Twisted, Node.js, LWT (OCaml) and so on. But more interestingly, neither Go (golang.org) nor Haskell (GHC) is fully preemptive. Go only switches context on communication, so a tight loop can hog a core. GHC switches upon memory allocation (which admittedly is a very common occurrence in Haskell programs). The problem in these systems is that hogging a core for a while (one might imagine doing an array operation in either language) will affect the latency of the system.
This leads to soft-realtime[3], which means that the system will degrade if we fail to meet a timing deadline. Say we have 100 processes in our run-queue, and the first one is doing an array operation which takes 50ms. In Go or Haskell/GHC[n3] this means that tasks 2-100 will take at least 50ms. In Erlang, on the other hand, task 1 would get 2000 reductions, which is well under 1ms, and would then be put at the back of the queue so tasks 2-100 get to run. Naturally this means that all tasks are given a fair share.
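A quick experiment makes the point tangible. This is a small sketch (not from the original post) that floods the run-queue with tight loops and then measures a message round-trip; run it with erl +S 1 to make the effect obvious on a single scheduler:

    -module(latency_demo).
    -export([run/0]).

    %% A tight, CPU-bound loop. Even this is preempted in Erlang,
    %% since every function call costs reductions.
    busy() -> busy(0).
    busy(N) -> busy(N + 1).

    run() ->
        Pids = [spawn(fun busy/0) || _ <- lists:seq(1, 99)],
        Echo = spawn(fun() -> receive {From, Msg} -> From ! Msg end end),
        T0 = os:timestamp(),
        Echo ! {self(), ping},
        receive ping -> ok end,
        io:format("round-trip: ~p us~n",
                  [timer:now_diff(os:timestamp(), T0)]),
        [exit(P, kill) || P <- Pids],   %% clean up the busy loops
        ok.

Despite 99 runnable busy-loops, the round-trip stays within a few milliseconds, because every loop is forced off the scheduler once its reduction budget runs out.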
Erlang is meticulously built around ensuring low-latency soft-realtime properties. The reduction count of 2000 is quite low and forces many small context switches. It is quite expensive to break up long-running BIFs so they can be preempted mid-computation. But all this also ensures that an Erlang system tends to degrade gracefully when loaded with more work. It also means that for a company like Ericsson, where low latency matters, there is no real alternative out there. You can't magically take another throughput-oriented language and obtain low latency; you will have to work for it. And if low latency matters to you, then frankly, not picking Erlang is in many cases an odd choice.
[1] "Characterizing the Scalability of Erlang VM on Many-core Processors" http://kth.diva-portal.org/smash/record.jsf?searchId=2&pid=diva2:392243
[2] http://en.wikipedia.org/wiki/Preemption_(computing)
[3] http://en.wikipedia.org/wiki/Real-time_computing
[n1] Process heaps are per-process so one process can't affect the GC time of other processes too much.
[n2] This section is also why one must beware of long-running NIFs. They do not preempt by default, nor do they bump the reduction counter. So they can introduce latency in your system.
[n3] Imagine a single core here; multicore sort of "absorbs" this problem up to the core count, but the problem still persists.
(Smaller edits made to the document on Mon 14 Jan 2013)
-
In UNIX there is a specific error number which can be returned from system calls. This error, EAGAIN, is used by the OS kernel whenever it is in a complex state where it is deemed too hard to produce a proper answer to the userland application. The solution is almost a non-solution: the kernel punts the context back to the user program and asks it to retry the operation. Meanwhile, the kernel gets rid of the complex state, and the next time the program enters the kernel, things may be in a state where the trouble is gone.
Here is an interesting psychological point: we can use our code to condition another person's brain to cook up a specific program that serves our purpose. That is, we can design our protocols such that they force users to adopt certain behaviours in their programs. One such trick is deliberate fault injection.
Say you are serving requests through an HTTP server. Usually, people imagine that 200 OK is what should always be returned on successful requests, but I beg to differ. Sometimes, say for 1 in 1000 requests, we deliberately fail the request and return a 503 Service Unavailable to the user. This conditions the user to write error-handling code for this request early on: you can't use the service properly without handling this error, since it occurs too often. You can even add a "Retry-After" header and have the client go again immediately.
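A minimal sketch of the idea in Erlang (handle/1 and serve/1 are made-up names; any web server layer would do, and random:uniform/1 is a standard call):

    handle(Req) ->
        case random:uniform(1000) of
            1 -> {503, [{"Retry-After", "1"}], <<"Service Unavailable">>};
            _ -> serve(Req)   %% the normal 200 OK path
        end.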
This deliberate fault injection has many good uses.
- First, it forces users of your service to adopt a more biological and fault-tolerant approach to computing. Given enough of this kind of conditioning, programmers will automatically begin adding error-handling code to their requests, because otherwise things simply won't work.
- Second, it gives you options in case of accidents: say your system is suddenly hit by an emergency which elevates the error rate to 10%. This has little visible effect, since your users are already able to handle the situation.
- Third, you can break conflicts by rejecting one or both requests.
- Fourth, you can solve some distribution problems by failing the request and have the client retry.
- Fifth, simple round-robin load balancing is now useful. If a client hits an overloaded server, it just gets a 503 back and retries against another server.
I have a hunch that Amazon Web Services uses this trick. Against S3, I've seen an error rate suspiciously close to 1/500. It could be their own way of implementing a chaos monkey, conditioning all their users to write code in a specific way.
The trick is also applicable in a lot of other contexts. Almost every protocol has some point where you can deliberately inject faults in order to make clients behave correctly. It is very useful in testing as well: use QuickCheck to randomly generate requests and let a certain fraction be totally wrong. These wrong requests must then be rejected by the system; otherwise something is wrong with it.
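A QuickCheck-style property capturing that last idea (PropEr understands the same macros; send/1 is a hypothetical client for the system under test, and the generators are deliberately simplistic):

    -include_lib("proper/include/proper.hrl").

    prop_rejects_garbage() ->
        ?FORALL(Req, oneof([valid_request(), garbage_request()]),
                case Req of
                    {valid, _}   -> true;  %% correctness of valid requests is checked elsewhere
                    {garbage, _} -> send(Req) =:= {error, rejected}
                end).

    valid_request()   -> {valid, binary()}.
    garbage_request() -> {garbage, binary()}.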
More generally, this is an example of computer programs being both formal and chaotic at the same time. One can definitely find interesting properties of biological processes to copy into computer systems. While it is nice to be able to prove that your program is correct, the real world is filled with bad code, faulty systems, breaking network switches and so on. Reacting to this by making your system robust to smaller errors is definitely going to be needed. Especially in the longer run, where programs will become even more complex and communicate even more with other systems; systems over which you have no direct control.
You can see fault-injection as a type of mutation. The programs coping with the mutation are the programs which should survive in the longer run.
Consider hacking the brains of your fellow programmers, and force them to write robust programs by conditioning their minds into doing so.
Thanks to DeadZen for proof-reading and comments.
-
I am no Alan Jay Perlis, nor am I really worthy.
- Function parameters fornicate. If you have 7, they will quickly breed to 14.
- Any "new" idea which a person thinks about has a 98% chance of having been researched better and more deeply before 1980. Thus most new ideas aren't.
- Age rule for the young: If a concept is older than you and still is alive you must understand it. If it is in hibernation it may come back again. If there is no trace of it - some bozo is about to reinvent it.
- Dynamic typing is a special case of Static typing.
- Beware the scourge of boolean blindness.
- Prefer persistence over ephemerality.
- The program which can be formally reasoned about is usually the shortest, the most correct and the fastest.
- "We will fix it later" - later never occurs.
- Project success is inversely proportional to project size.
- Code not written sometimes has emergent behaviour in the system. Either by not having bugs or by executing invisible code infinitely fast in zero seconds.
- Your portfolio of closed source projects doesn't exist.
- Version control or doom.
- Around the year 1999 the number of programmers increased 100-fold. The skill level didn't.
- Program state is contagious. Avoid like the plague.
- Business logic is a logic. Inconsistent logic?
- 0.01: the fraction of human beings who can program.
- 0.001: the fraction of human beings who can program concurrently.
- 0.0001: the fraction of human beings who can program distributed systems.
- If your benchmark shows your code an order of magnitude faster than the established way, you are correct. For the wrong problem.
- Debugging systems top-down is like peeling the onion inside-out.
- A disk travelling on the back of army ants has excellent throughput but miserable latency. So do many Node.js systems.
- Beware of the arithmetic mean. It is a statistic, and usually a lie.
- Often, speed comes with a sacrifice of flexibility on the altar of complexity.
- Sometimes correctness trumps speed. Sometimes it is the other way around.
- Optimal may be exponentially more expensive to compute than the 99th percentile approximation.
- The programmer is more important than the programming language.
- Programming languages without formal semantics are akin to a dumping ground. The pearls are few and far between.
- The brain is more important than the optimizing compiler.
- The tools necessary for programs of a million lines of code are different from those for 1000 lines.
- Specializing in old tools contains the danger of ending as an extinct dinosaur.
- Like the introduction of 'null', Object Oriented Programming is a grave mistake.
- The string is heaven because it can encode anything. The string is hell because it can encode anything.
- Idempotence is your key to network programming.
- Protocol design is your key to network programming.
- Sun RPC is usually not the solution. Corollary: neither are HTTP requests.
- Your protocol must have static parts for structure and dynamic parts for extension.
- Only trust systems you have control over and where you can change the behaviour.
- If a non-programmer specifies a distributed system, they always violate the CAP theorem.
- In a distributed system, the important part is the messages. What happens inside a given node is uninteresting. Especially what programming language it is written in.
- A distributed system can have more failure scenarios than you can handle. Trying is doom.
- The internet has a failure rate floor. If your system has a failure rate underneath it, you are error-free to the customer.
- If your system is doing a million $100 requests a year, a failure rate of 10 requests per year is not worth fixing.
- If your system employs FIFO queues, latency can build up. Bufferbloat is not only in TCP.
- Beware the system overload situation. It is easier to reject requests than to handle them. You need back-pressure to inform the senders.
-
A very common thing that crops up now and then is the question in the title: what is fastest for editing text, the keyboard or the mouse? The most often quoted answer is a series of older "Ask Tog" articles[1a, 1b, 1c]. They come up again and again in these discussions, and then the keyboardists battle it out against the mouse-zealots.
Since I have worked in most of the "grand" editors out there, Emacs and vi(m), for years, I think I do have something to say about this subject. Currently, I am writing this blog post, and doing most of my coding, in the acme(1) editor[2]. Acme is often seen as a very mouse-centered editor, but there is more to the game than just being a mouse editor.
First of all, what keyboard shortcuts does acme(1) understand? It understands 5 commands in total. Let ^ stand for the control character. Then it understands ^A and ^E, which move the cursor to the start and the end of the line respectively. It understands ^H, which deletes the character before the cursor (backspace), and ^W, which kills a whole word. Finally it understands ^U, which deletes from the cursor to the start of the line. The very reason for supporting these shortcuts is that they are deeply rooted in UNIX. A lot of systems understand these commands, and when entering text for hours on end they are very nice to have available. I guess I am a boring typist, because when I see I have written a word incorrectly, I often just kill the whole word and type it again. The shortcut ^W is quick to type with the left hand on a QWERTY-style keyboard.
Secondly, and I think this is a very important point, acme(1) has a command language stemming from the sam(1) editor. The mouse may be used often, but if you are to change every occurrence of 'foo' into 'bar' you just execute the command "Edit , s/foo/bar/g", almost like in vi. I don't think anybody would argue that, for a large piece of text, manually going through and editing it would be faster. The reason is that we are programming the editor: we are writing a program which carries out the task for us, and the cognitive load of doing so is smaller than being the change-monkey. In the command, the comma is shorthand for "all of the file's lines". What if we only wanted the change in the 2nd paragraph of the text? In acme(1) you can just select that text and then execute "Edit s/foo/bar/g", which narrows the editing to the selection only. As you go from "programmed" to "specific" editing, the mouse and the spatial user interface make things faster and faster.
The [1c] reference has a task which tries to prove a point. A piece of text needs, essentially, the execution of "Edit s/\|/e/g", replacing every '|' with an 'e'. The program above is clearly the fastest way to do it for large texts, and you don't even have to think about writing it when you know the editor. Finding and replacing each letter by hand, on the other hand, is subject to exactly the cognitive overhead the article talks about. It adds up when you are doing lots of these small edits all day.
For editing source code, a peculiar thing happens: I often grab the mouse and then I more or less stay on the mouse. Note that acme has the usual ability to cut and paste text with the mouse alone; you don't need the keyboard for this. It means that you can do a lot of text surgery with the mouse alone. Since you can select the end-of-line codepoint, you can easily reorder lines, including their indentation. Often, renaming variables happens on the mouse alone. There are also some tricks hidden in the mouse: clicking twice right next to a parenthesis '(' selects all text up to the matching ')', and the same goes for quotes. It allows you to quickly cut out parts of your structured code and replace them with other code.
Then there is text search. When writing large bodies of programs, you will often end up searching for text more than editing it. Since a right click in acme(1) searches by design, most text can be clicked to find the next specimen you need to consider. A more complex invocation goes through the "plumber", which understands the context of the text being operated upon. A line like "src/pqueue.erl:21:" is understood, on a right click, as "open the file src/pqueue.erl and go to line 21". Combine this with a command like "git grep -n foo" in a shell window and you can quickly find what you are looking for. I often use the shell as my search tool and then click on my target line. You can even ask grep to provide context to find the right spot to edit.
Good editors can be programmed, and a mouse-centered editor is no exception. Apart from the built-in sam(1) command language, you can also write external UNIX programs to pipe text through. I have a helper for Erlang terms, called erlfmt, which will reindent any piece of Erlang nicely. I have the same for JSON structures, since they are often hard to read.
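Such a filter can be tiny. Here is a minimal sketch of what a helper like this could look like (not the real erlfmt; this one handles a single Erlang term terminated by a dot, re-printing it indented via ~p):

    #!/usr/bin/env escript
    main(_) ->
        Text = read_all(),
        {ok, Tokens, _} = erl_scan:string(Text),
        {ok, Term} = erl_parse:parse_term(Tokens),
        io:format("~p.~n", [Term]).

    %% Slurp all of stdin into a string.
    read_all() ->
        case io:get_line(standard_io, '') of
            eof  -> [];
            Line -> Line ++ read_all()
        end.

In acme, select the term and execute "|erlfmt" in the tag: the selection is piped through the program and replaced by its output.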
The thing that makes acme(1) work, though, stems from an old idea by Niklaus Wirth and Jürg Gutknecht[3]: the Oberon operating system. In that operating system, the graphical user interface is a textual user interface in which spatiality plays a big role. Not unlike the modern tiling window managers, the system lays out windows next to each other so they never overlap. But unlike the tiling window managers, the interface is purely textual: you can change the menu bars by writing another piece of text there if you want. The same is present in acme(1). You often end up changing your environment into how you want it to look, and since you can "Dump" and "Load" your current environment, each project often ends up with a setup file holding the configuration for that particular environment. I essentially have one for each project I am working on. In many Erlang projects, there is a shell window where the menu (called the tag in acme(1)-speak) is extended with the command 'make'. This makes it easy to rebuild the project, and errors are reported as "src/file.erl:LINE:" like above, making error correction painless and fast.
The key is that to make the mouse efficient, you need to build the environment around the mouse. That is, your system must support the mouse directly and make it possible to carry out many things on the mouse alone. It is rather sad to see most modern editing environments shun so effective an editing tool and remove it entirely from the entering of text. But perhaps the new touch-style interfaces will change that again? Currently their problem seems to be that the mobile phones and tablets are not self-hosting: we are not programming them via themselves. That probably has to happen before good programming user interfaces using touch become a possibility. I must admit, though, that the idea of actually touching the 'make' button you wrote down there yourself is alluring.
[1a] http://www.asktog.com/TOI/toi06KeyboardVMouse1.html
[1b] http://www.asktog.com/TOI/toi22KeyboardVMouse2.html
[1c] http://www.asktog.com/SunWorldColumns/S02KeyboardVMouse3.html
[2] Acme is part of the plan9 port: http://swtch.com/plan9port/
[3] Note that the original Oberon Native System lives on in the Bluebottle/AOS/A2 system today, see http://en.wikipedia.org/wiki/Oberon_(operating_system) and http://en.wikipedia.org/wiki/Bluebottle_OS