A list of common problems
This serves as a gentle reminder list of things one should be aware of
when doing Erlang advocacy. The list is quite haphazard, but there is
a common point to make, which is that "Erlang is slow". The idea is
that in many situations there is an underlying confounding reason for
that claim, and it has to do not with the language itself, but with
bad practice.
The bait is wrong
Let us split the world of programming problems along two axes: "How
fast do you want it" and "How much complex synchronization/concurrency
does the problem need". The former has to do with fast computation.
That is, HPC, video encoding, data mining and so on. The time it takes
to deliver a result matters for these problems. The latter has to do
with massive amounts of concurrency: web servers, chat systems,
message buses, event coordination and so on.
Problems largely fall into 4 quadrants:
- Type A: Slow, Sequential
- Type B: Fast, Sequential
- Type C: Slow, Concurrent
- Type D: Fast, Concurrent
Erlang excels at type C problems. It does well on type A problems
too, but most languages do. It is usually not good for type B
problems, and most people know this. The major crux is type D. I claim
there are very few problems which fit into this category. Most of the
time you have a problem where you want parallelism but have rather
simple coordination. A simple fork-join works. Or perhaps MPI or
OpenMP codes. So you really have a type B problem which, by virtue of
coincidence, got classified as type D.
The thing is: Erlang is interpreted. The reason is probably mostly
historical since it is easier to port interpreters and back in the day
there were multiple architectures on which you were to run to be in
business. But it also makes heavyweight computation in the language
horribly slow. So people try using Erlang for a type B problem. Hence,
Erlang is slow.
I/O is incorrectly handled
Too many projects are not using the iolist() type. They should. If
you have output, then you should be able to take an iolist() and
operate on that. Requiring any other type is wrong for payload-style
data in most cases. The problem is that the code will have lots of
unnecessary internal copies of data, where it could just shove that
data directly to an underlying socket.
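As a minimal sketch of what "take an iolist() and shove it at the
socket" can look like: the module and function names below are made up
for illustration, and the HTTP-style header is just a convenient
payload, not a complete response.

```erlang
-module(iolist_response).
-export([response/1, send/2]).

%% Build an HTTP-style response as an iolist(). Nothing is concatenated:
%% the body is referenced as-is, nested in a list next to the header parts.
response(Body) ->
    Length = integer_to_binary(iolist_size(Body)),
    [<<"HTTP/1.1 200 OK\r\n">>,
     <<"Content-Length: ">>, Length, <<"\r\n\r\n">>,
     Body].

%% gen_tcp:send/2 accepts iodata(), so the nested structure is handed to
%% the socket without building a flattened copy in Erlang code first.
send(Socket, Body) ->
    gen_tcp:send(Socket, response(Body)).
```

The caller can pass a binary, a string, or an arbitrarily nested mix of
both as Body, and none of it has to be flattened before it is sent.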
Another common problem is failing to recognize that you are passed an
iolist. This makes for subtle bugs in code at times, where otherwise
perfect code fails because the shape of the iolist changes.
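A small sketch of this kind of bug, with hypothetical function names:
the fragile version works for as long as every caller happens to pass a
binary, and breaks the day one of them passes a list.

```erlang
-module(iolist_accept).
-export([size_fragile/1, size_robust/1, same_bytes/2]).

%% Fragile: byte_size/1 only accepts binaries (and bitstrings), so this
%% raises badarg as soon as a caller passes a list or a nested iolist.
size_fragile(Payload) ->
    byte_size(Payload).

%% Robust: iolist_size/1 accepts any iodata(), however it is nested.
size_robust(Payload) ->
    iolist_size(Payload).

%% Two iolists can hold the same bytes in different shapes, so =:= on
%% the raw terms is the wrong comparison. Compare the flattened bytes.
same_bytes(A, B) ->
    iolist_to_binary(A) =:= iolist_to_binary(B).
```

The same trap applies to pattern matching: a function head that matches
on a binary will not match when the same bytes arrive as a list.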
There is also a large set of common problems which stem from setting
the wrong options on files and/or sockets and then claiming Erlang has
slow IO. The defaults won't give you a lot of speed, but they will
give you high flexibility.
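To make the point about options concrete, here is a sketch with
non-default options set. The module name, the 64 KiB chunk size and the
particular combination of options are my assumptions for illustration.
Which options are right depends on the workload, so treat this as a
starting point to measure from, not as a recommendation.

```erlang
-module(io_options).
-export([write_lines/2, read_all/1, listen/1]).

%% raw bypasses the file server process, binary avoids list-of-bytes
%% data, and delayed_write buffers small writes before they hit the OS.
write_lines(Path, Lines) ->
    {ok, Fd} = file:open(Path, [write, raw, binary, delayed_write]),
    ok = file:write(Fd, [[L, $\n] || L <- Lines]),
    ok = file:close(Fd).

%% read_ahead buffers reads, which helps when reading in small chunks.
%% A read error crashes with case_clause here; fine for a sketch.
read_all(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary, read_ahead]),
    Data = read_chunks(Fd, []),
    ok = file:close(Fd),
    Data.

read_chunks(Fd, Acc) ->
    case file:read(Fd, 65536) of
        {ok, Chunk} -> read_chunks(Fd, [Chunk | Acc]);
        eof -> iolist_to_binary(lists:reverse(Acc))
    end.

%% Socket side: binary data, flow control through {active, once}, and
%% Nagle's algorithm turned off with {nodelay, true}.
listen(Port) ->
    gen_tcp:listen(Port, [binary, {active, once},
                          {nodelay, true}, {reuseaddr, true}]).
```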
And finally, a very common mistake I see all the time is treating a
TCP socket stream as a way to pass messages without having any kind of
framing. This happens very often, sadly.
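A minimal sketch of framing, assuming a 4-byte big-endian length prefix
(the module and function names are hypothetical). TCP is a byte stream,
so a single read may contain half a message or several messages, and
the decoder has to cope with both.

```erlang
-module(framing).
-export([encode/1, decode/1]).

%% Prefix the payload with its length as a 32-bit big-endian integer.
encode(Payload) ->
    [<<(iolist_size(Payload)):32>>, Payload].

%% Split a receive buffer into the complete messages it contains plus
%% the leftover bytes. The leftover must be kept and prepended to the
%% next chunk read from the socket.
decode(Buffer) ->
    decode(Buffer, []).

decode(<<Len:32, Msg:Len/binary, Rest/binary>>, Acc) ->
    decode(Rest, [Msg | Acc]);
decode(Rest, Acc) ->
    {lists:reverse(Acc), Rest}.
```

If both ends are under your control, setting the {packet, 4} option on
a gen_tcp socket makes the runtime do this framing for you.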
All of these mistakes lead to one conclusion only: Erlang is slow.
Bad protocol implementations
When you have to support a new protocol, the first implementation is
often written in anger, as quickly as possible and with the goal of
solving a completely different problem. This often makes for protocol
implementations which do not scale. This in turn falls back on the
language, because "this is the fault of the language". And hence,
Erlang is slow.
Other times, the problem is with the protocol. Some protocols do not
support pipelining and then you suffer the roundtrip to the server at
each request. Some protocols have horribly complicated encoding and
decoding schemes which make a fast implementation impossible. This
happens even in an era where bandwidth is readily available and you
can apply compression on top of the stream once and for all. Some
protocols are outright broken. They fail to recognize 30+ years of
sane protocol design and thus they redo all the mistakes, yet again.
Implementing such a protocol makes Erlang slow.
OOP all the things
A large set of mistakes stems from the belief that you can implement
"Object Oriented" design in any language. Then you get a very layered
module structure, sometimes with parameterized modules thrown in.
These designs are often highly non-idiomatic for a typical Erlang
system, and they pile layer upon layer, which makes the code slow as
molasses. Hence, Erlang is slow.
We believe the hype
"Whatsapp can do 2.5 million connections from a single Erlang server".
Yes, by accident, when the server was overloaded, on FreeBSD, with a
highly tuned VM. Chances are your Windows-backed implementation
without tuning can't even take 1000 simultaneous connections.
"Erlang can do 850000 transactions against VoltDB per second". Yes,
but what was the environment again? Definitely not a small instance on Amazon.
"Erlang had 99.9999% uptime". Yes, but on what hardware and how many
calls in that telecommunication system were allowed to fail? What SLA
was used?
"Erlang is used by company X". Yes, and company X is not you.
When these kinds of things are believed too much, you get the
impression that your system should be able to do the same. But there
are a lot of differences between projects, and you may not be able to
do the same unless the setup is exactly the same. The problem comes
when Erlang fails to deliver. Then it is perceived to be a problem of
Erlang. Hence, Erlang is slow.
We cite the wrong studies
Yes, you can probably build an Erlang program 5 times faster than a
C++ program, get the same speed and way fewer errors. But people are
not writing C++ anymore. They are writing Python, JavaScript, Java,
Clojure, PHP, and C#. They also claim to be 5 times faster. In
effect, we look at the wrong studies nowadays. You can't really
utilize these claims any more since the "competition" makes the
exact same claims. Node.js is even claiming to have the ability to do
"massive concurrency", so where is the new difference?
Actually, properly solving a concurrency problem requires you to
understand some subtle points about errors and error propagation. In a
shared-nothing single threaded Python program, this is not a problem.
Hence, building Erlang programs is slow.
No split of protocol from concurrency
Many libraries seek to solve two problems. They want to speak a
protocol toward some foreign subsystem, like any other language. But
then they need concurrency so they add their own kind of process pool
on top of the library.
The result is many different pool implementations, which are all
alike, but subtly different. It is very hard to make a robust process
pool in the face of errors. In fact, you might want to QuickCheck
your pool implementation for correctness. The proliferation introduces
multiple small, subtle errors into Erlang programs, since a program of
major size often uses 3 or 4 different process pool implementations
in the same system.
Luckily, this doesn't make Erlang slow. It just puts forth the
observation that Erlang seems buggy, and flaky.
Wrong use of concurrency
My pet peeve here is the erlang-mysql-driver versus emysql. EMD
has a process pool where it round-robins requests into the pool. So
each process in the pool has its own queue in front of it. The Emysql
driver has a single long queue and workers in the pool have no queues
in front of them at all. They just pick off work from the single long
queue.
The problem occurs when you have a long-running job. In EMD, due to
the round-robin behaviour, requests will queue up on the worker
processing the long-running job. Even if they can complete in
sub-millisecond times, they will have to wait. Emysql does not have
this weakness, since another worker will pick the small job.
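The difference can be shown with a toy model. This is not the code of
either driver, just a sketch under simplifying assumptions of mine: all
jobs arrive at time 0, each job is a service time in milliseconds, and
the function names are invented.

```erlang
-module(pool_sim).
-export([round_robin/2, shared_queue/2]).

%% Both functions return each job's completion time, in arrival order.

%% Round-robin: job I is pinned to worker I rem N and waits behind
%% everything already queued on that particular worker.
round_robin(Jobs, N) ->
    Free0 = maps:from_list([{W, 0} || W <- lists:seq(0, N - 1)]),
    Indexed = lists:zip(lists:seq(0, length(Jobs) - 1), Jobs),
    {Done, _} = lists:mapfoldl(
        fun({I, T}, Free) ->
            W = I rem N,
            End = maps:get(W, Free) + T,
            {End, Free#{W := End}}
        end, Free0, Indexed),
    Done.

%% Shared queue: the next job goes to whichever worker frees up first.
%% The accumulator is the sorted list of times at which workers are free.
shared_queue(Jobs, N) ->
    {Done, _} = lists:mapfoldl(
        fun(T, [Earliest | Others]) ->
            End = Earliest + T,
            {End, lists:sort([End | Others])}
        end, lists:duplicate(N, 0), Jobs),
    Done.
```

With one 1000ms job followed by five 1ms jobs on two workers,
round_robin/2 gives [1000,1,1001,2,1002,3], so two tiny jobs are stuck
behind the long one. shared_queue/2 gives [1000,1,2,3,4,5]: the idle
worker drains all the small jobs while the other one is busy.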
But to the newbie Erlang programmer who picks the wrong library, there
will be some really odd latencies toward MySQL that they can't
explain. Hence, Erlang is slow.
NIF Abuse
As it stands right now, 19/20 NIF implementations are incorrect. One
rule about NIFs is that they should not run for more than 1ms.
Otherwise they mess up the schedulers or affect the latency of the
program's response times. Most NIF implementations ignore this fact.
As a NIF you have to either respond asynchronously through an internal
thread, or you have to cooperate and be ready to yield. There is a
call, enif_consume_timeslice(), which can help the NIF implementor,
but few use it.
Usually, a NIF gets implemented because speed is wanted. But the
problem is that NIFs can wreak a lot of havoc on your VM. And it takes
knowledge to write them so that they are both correct and fast.
Concurrency abuse
When people get concurrency in their hands, they want to use it. The
first libraries and programs people write tend to use scores of
processes. Usually, this leads to programs which are way more
complicated than they should be. And the programs have way more
failure modes to boot. Also, there is overhead in passing messages
around so the programs will run slower than they should. Hence, Erlang
is slow.
Neglecting error handling
Another common case is code which does nothing to handle errors.
Erlang provides some really cool mechanisms for handling errors in the
large, but you need to use the tools to get the advantage. It does not
magically appear in your code base.
This means your code needs to handle errors by monitoring and by
understanding how to restart. In turn, this actually often makes for a
program which will run slower. But it will be correct, even when
things go wrong. My experience is that going for the speed is not
worth it unless you have a really good reason to do so. It is often
better to set up more machines or scale out in another way.
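As a bare-bones sketch of "monitor and restart" (the module name and
the retry policy are hypothetical, and in a real system an OTP
supervisor is the standard tool for this job):

```erlang
-module(retry_worker).
-export([run/2]).

%% Run Fun in a separate, monitored process. If that process dies
%% before delivering a result, start a fresh one, at most Retries times.
run(Fun, Retries) when is_function(Fun, 0), Retries >= 0 ->
    Parent = self(),
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Tag, Fun()} end),
    receive
        {Tag, Result} ->
            %% Drop the monitor and flush the pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, _Reason} when Retries > 0 ->
            run(Fun, Retries - 1);
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.
```

Note the cost: every call pays for a process spawn, a monitor and a
message round trip. That is the "runs slower, but stays correct when
things go wrong" trade-off in miniature.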
If your system employs FIFO queues, latency can build up. Bufferbloat is not only in TCP.
Do Erlang mailboxes count as FIFO queues under this principle?
Yes, Erlang mailboxes may count as a FIFO queue in some situations. The key is that you can selectively match on messages which removes the strict FIFO order in some cases. In general though, any queue behavior has a sojourn time through the queue and this is to be added to the processing time. If you have a very deep queue then you have trouble since the sojourn time will grow. If you have 10 messages in front of a message and the average processing time is 10ms, you are looking at 100ms wait time before that message gets handled.
On the other hand, you do need some queue in order to absorb quickly arriving messages, i.e., for absorbing shocks.