A list of common problems
This serves as a gentle reminder list of things one should be aware of when doing Erlang advocacy. The list is quite haphazard, but there is a common thread: each item tends to end in the claim that "Erlang is slow". The idea is that in many situations there is an underlying confounding reason for that claim, and it is not tied to the language itself per se, but rather to bad practice.
The bait is wrong
Let us split the world of programming problems along two axes: "How fast do you want it" and "How much complex synchronization/concurrency does the problem need". The former has to do with fast computation. That is, HPC, video encoding, data mining and so on. The time it takes to deliver a result matters for these problems. The latter has to do with massive amounts of concurrency: web servers, chat systems, message buses, event coordination and so on.
Problems largely fall into 4 quadrants:
- Type A: Slow, Sequential
- Type B: Fast, Sequential
- Type C: Slow, Concurrent
- Type D: Fast, Concurrent
The thing is: Erlang is interpreted. The reason is probably mostly historical: interpreters are easier to port, and back in the day you had to run on multiple architectures to be in business. But it also makes heavyweight computation in the language horribly slow. So people try using Erlang for a type B problem. Hence, Erlang is slow.
I/O is incorrectly handled
Too many projects are not using the iolist() type. They should. If you have output, then you should be able to take an iolist() and operate on that. Requiring any other type is wrong for payload-style data in most cases. The problem is that the code will have lots of unnecessary internal copies of data, where it could just shove that data directly to an underlying socket.
Another common problem is failing to recognize that you are passed an IO-list. This makes for subtle bugs in code at times, where otherwise perfect code fails since the iolist structure changes.
There is also a large set of common problems which stem from setting the wrong options on files and/or sockets and then claiming Erlang has slow IO. The defaults won't give you a lot of speed, but they will give you high flexibility.
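And finally, a very common mistake I see all the time is treating a TCP socket stream as a way to pass messages without having any kind of framing. TCP delivers a stream of bytes and does not preserve message boundaries, so one send can arrive split across several reads or merged with the next one. This happens very often, sadly.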
All of these mistakes lead to one conclusion only: Erlang is slow.
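To make the framing point concrete, here is a minimal sketch of length-prefixed framing. It is written in Python only because the idea is independent of the language. The 4-byte big-endian header is an assumed wire format, and the names encode_frame and FrameDecoder are made up for this illustration. In Erlang, the {packet, 4} socket option gives you this kind of framing without hand-written code.

```python
import struct

# Assumed wire format: a 4-byte big-endian length, followed by the payload.
HEADER = struct.Struct(">I")


def encode_frame(payload):
    """Prefix the payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles complete messages from arbitrarily split stream chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add bytes read from the socket; return every message now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # body not fully received yet; wait for more bytes
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]  # drop the consumed frame
        return messages
```

The decoder does not care how the bytes were chopped up on the way. Feed it one byte at a time or three messages in one chunk, and the same messages come out.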
Bad protocol implementations
When you have to support a new protocol, the first implementation is often written in anger, as quickly as possible and with the goal of solving a completely different problem. This often makes for protocol implementations which do not scale. This in turn falls back on the language, because "this is the fault of the language". And hence, Erlang is slow.
Other times, the problem is with the protocol. Some protocols do not support pipelining and then you suffer the roundtrip to the server at each request. Some protocols have horribly complicated encoding and decoding schemes which make fast implementation impossible. This persists even in an era where bandwidth is readily available and you can apply compression on top of the stream once and for all. Some protocols are outright broken. They fail to recognize 30+ years of sane protocol design and thus they redo all the mistakes, yet again. Implementing such a protocol makes Erlang slow.
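The cost of missing pipelining is easy to put rough numbers on. The following is a deliberately simplified model, not a benchmark. It assumes the server handles requests one after another and it ignores bandwidth. The function names and the figures in the usage note are hypothetical.

```python
def sequential_time(n_requests, rtt, service_time):
    """No pipelining: every request pays a full round trip before the next starts."""
    return n_requests * (rtt + service_time)


def pipelined_time(n_requests, rtt, service_time):
    """Pipelining: requests are sent back to back, so the round trip is paid once."""
    return rtt + n_requests * service_time
```

With 1000 requests, a 500 microsecond round trip and 50 microseconds of server work per request, the model gives 550000 microseconds without pipelining and 50500 with it. No choice of implementation language recovers that difference.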
OOP all the things
A large set of mistakes stems from the belief that you can implement "Object Oriented" design in any language. Then you get a very layered module structure, sometimes with parameterized modules thrown in.
These designs are often highly non-idiomatic for a typical Erlang system and they build a layering upon layering which makes handling of code slow as molasses. Hence, Erlang is slow.
We believe the hype
"WhatsApp can do 2.5 million connections from a single Erlang server". Yes, in a mistake where the server was overloaded, on FreeBSD, with a highly tuned VM. Chances are your Windows-backed implementation without tuning can't even take 1000 simultaneous connections.
"Erlang can do 850000 transactions against VoltDB per second". Yes, but what was the environment again? Definitely not a small instance on Amazon.
"Erlang had 99.9999% uptime". Yes, but on what hardware and how many calls in that telecommunication system were allowed to fail? What SLA was used?
"Erlang is used by company X". Yes, and company X is not you.
When these kinds of things are believed too much, you get the impression that your system should be able to do the same. But there are a lot of differences between projects, and you may not be able to do the same unless the setup is exactly the same. The problem comes when Erlang then fails to deliver: it is perceived to be a problem with Erlang. Hence, Erlang is slow.
We cite the wrong studies
Yes, you can probably build an Erlang program 5 times faster than a C++ program, get the same speed and way fewer errors. But people are not writing C++ anymore. They are writing Python, JavaScript, Java, Clojure, PHP, and C#. They also claim to be 5 times faster. In effect, we look at the wrong studies nowadays. You can't really use these claims any more since the "competition" makes the exact same claims. Node.js is even claiming to have the ability to do "massive concurrency", so where is the new difference?
Actually, properly solving a concurrency problem requires you to understand some subtle points about errors and error propagation. In a shared-nothing single threaded Python program, this is not a problem. Hence, building Erlang programs is slow.
No split of protocol from concurrency
Many libraries seek to solve two problems. They want to speak a protocol toward some foreign subsystem, like any other language. But then they need concurrency so they add their own kind of process pool on top of the library.
The result is many different pool implementations, which are all alike, but subtly different. It is very hard to make a robust process pool in the face of errors. In fact, you might want to QuickCheck your pool implementation for correctness. Yet the duplication introduces multiple small subtle errors in Erlang programs, since a program of major size is often using 3 or 4 different process pool implementations in the same system.
Luckily, this doesn't make Erlang slow. It just puts forth the observation that Erlang seems buggy, and flaky.
Wrong use of concurrency
My pet peeve here is the erlang-mysql-driver versus emysql. EMD has a process pool where it round-robins requests into the pool. So each process in the pool has its own queue in front of it. The Emysql driver has a single long queue and workers in the pool have no queues in front of them at all. They just pick off work from the single long queue.
The problem occurs when you have a long-running job. In EMD, due to the round-robin behaviour, requests will queue up on the worker processing the long-running job. Even if they can complete in sub-millisecond times, they will have to wait. Emysql does not have this weakness, since another worker will pick the small job.
But to the newbie Erlang programmer who picks the wrong library, there will be some really odd latencies toward MySQL that they can't explain. Hence, Erlang is slow.
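The difference between the two queueing disciplines can be shown with a small deterministic model. This is not the code of either driver. It is only a sketch of "round-robin into per-worker queues" versus "one shared queue", written in Python because the effect has nothing to do with the language. The function names and the job list are made up for the illustration. Jobs are (arrival time, duration) pairs in milliseconds.

```python
import heapq


def round_robin_waits(jobs, n_workers):
    """Job i is assigned to worker i % n_workers and queues behind that worker."""
    free_at = [0] * n_workers  # time at which each worker becomes idle
    waits = []
    for i, (arrival, duration) in enumerate(jobs):
        worker = i % n_workers
        start = max(arrival, free_at[worker])
        free_at[worker] = start + duration
        waits.append(start - arrival)
    return waits


def shared_queue_waits(jobs, n_workers):
    """One FIFO queue; whichever worker frees up first takes the next job."""
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in jobs:
        start = max(arrival, heapq.heappop(free_at))
        heapq.heappush(free_at, start + duration)
        waits.append(start - arrival)
    return waits


# One 1000 ms job at t=0, then eight 1 ms jobs arriving 1 ms apart.
jobs = [(0, 1000)] + [(t, 1) for t in range(1, 9)]
print(max(round_robin_waits(jobs, 4)), max(shared_queue_waits(jobs, 4)))
```

With four workers, the worst wait under round-robin is 996 ms, because every fourth request lands behind the long job. With the shared queue no request waits at all, since the three idle workers absorb the short jobs.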
NIF Abuse
As it stands right now, 19/20 NIF implementations are incorrect. One rule about NIFs is that they should not run for more than 1ms. Otherwise they mess up the schedulers or affect the latency of the program's response times. Most NIF implementations ignore this fact.
As a NIF you have to either respond asynchronously through an internal thread, or you have to cooperate and be ready to yield. There is a call, enif_consume_timeslice(), which can help the NIF implementor, but few use it.
Usually, a NIF gets implemented because speed is wanted. But the problem is that NIFs can wreak a lot of havoc on your VM. And it takes knowledge to write them correctly and such that they will run quickly.
Concurrency abuse
When people get concurrency in their hands, they want to use it. The first libraries and programs people write often use scores of processes. Usually, this leads to programs which are way more complicated than they should be. And the programs have way more failure modes to boot. Also, there is overhead in passing messages around so the programs will run slower than they should. Hence, Erlang is slow.
Neglecting error handling
Another common case is code which does nothing to handle errors. Erlang provides some really cool mechanisms for handling errors in the large, but you need to use the tools to get the advantage. It does not magically appear in your code base.
This means your code needs to handle errors by monitoring and by understanding how to restart. In turn, this actually often makes for a program which will run slower. But it will be correct, even when things go wrong. My experience is that going for the speed is not worth it unless you have a really good reason to do so. It is often better to set up more machines or scale out in another way.