Differences between Node.js and Erlang
Suppose we have a canonical ping/pong server written in Node:
var sys = require("sys");
var http = require("http");
http.createServer(function (req, res) {
  res.writeHead(200, {"Content-Type": "text/plain"});
  res.end("Hello, World\n");
}).listen(8124, "127.0.0.1");
sys.puts("Server running at http://localhost:8124");
We can run this server easily and test it from the command line:
jlouis@illithid:~$ curl http://localhost:8124
Hello, World
And it does what we expect. Now suppose we do something silly. We make a tiny change to the JavaScript code:
var sys = require("sys");
var http = require("http");
http.createServer(function (req, res) {
  res.writeHead(200, {"Content-Type": "text/plain"});
  res.end("Hello, World\n");
  while(true) { // Do nothing
  }
}).listen(8124, "127.0.0.1");
sys.puts("Server running at http://localhost:8124");
Now, the first invocation of our test works, but the second hangs:
jlouis@illithid:~$ curl -m 10 http://localhost:8124
Hello, World
jlouis@illithid:~$ curl -m 10 http://localhost:8124
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
This should not surprise anybody. What we have illustrated here should be common knowledge: Node does not multitask preemptively. Instead, each event handler must cooperate by returning control to the event loop so that the next event can be processed in turn.
The example was silly. Now, suppose we have a more realistic example where we do work, but it completes:
var sys = require("sys");
var http = require("http");
http.createServer(function (req, res) {
  res.writeHead(200, {"Content-Type": "text/plain"});
  var x = 0; // declared with var so it is local to each request
  while(x < 100000000) { // Busy loop: count to 100 million
    x++;
  }
  res.end("Hello, World " + x + "\n");
}).listen(8124, "127.0.0.1");
sys.puts("Server running at http://localhost:8124");
We introduce a loop which does some real work, and we keep it from being dead code by using its result in the output. Our server will still respond, but each response now takes some time.
Let us siege the server:
jlouis@illithid:~$ siege -l -t3M http://localhost:8124
** SIEGE 2.69
** Preparing 15 concurrent users for battle.
The server is now under siege...
[..]
For three minutes, we hammer the server and get a CSV file, which we can then load into R and process.
Erlang enters…
For comparison, we take Mochiweb, an Erlang webserver. We do not choose it specifically for its speed or its behaviour. We choose it simply because it is written in Erlang and it will context switch preemptively.
The relevant part of the Mochiweb internals is this:
count(X, 0) -> X;
count(X, N) -> count(X+1, N-1).
loop(Req, _DocRoot) ->
    "/" ++ Path = Req:get(path),
    try
        case Req:get(method) of
            Method when Method =:= 'GET' ->
                X = count(0, 100000000),
                Req:respond({200, [], ["Hello, World ", integer_to_list(X), "\n"]});
            [..]
It should be pretty straightforward. We implement the counter as a tail-recursive loop and we force its calculation by requesting it to be part of the output.
erl -pa deps/mochiweb/ebin -pa ebin
Erlang R14B02 (erts-5.8.3) [source] [64-bit] [smp:2:2]
[rq:2] [async-threads:0] [hipe] [kernel-poll:false]
1> application:start(erlang_test).
{error,{not_started,crypto}}
2> application:start(crypto).
ok
3> application:start(erlang_test).
** Found 0 name clashes in code paths
ok
4>
Notice that both of my CPUs are put to work here automatically. But performance is not the point I want to make.
Again, we lay siege to this system:
jlouis@illithid:~$ siege -l -t3M http://localhost:8080 | tee erlang.log
Enter R
We can take these data and load them into R for visualization:
> a <- read.csv("erlang.log", header=FALSE);
> b <- read.csv("node.js.log", header=FALSE);
> png(file="density.png")
> plot(density(b$V3), col="blue", xlim=c(0,40), ylim=c(0, 0.35));
> lines(density(a$V3), col="green")
> dev.off()
> png("boxplot.png")
> boxplot(cbind(a$V3, b$V3))
> dev.off()
Discussion
What have we seen here? We have a situation where Node.js has a much more erratic response time than Erlang. We see that while some Node.js responses complete very fast (a little more than one second), there are also responses which take 29.5 seconds to complete. The summary of the data is here for Node.js:
> summary(b$V3)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.040   6.328  13.580  13.940  20.940  29.590
And for Erlang:
> summary(a$V3)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   9.87   11.21   12.24   12.21   13.16   15.32
The densities are (green is Erlang, blue is Node.js)


Does this mean Node.js is bad? No! Most Node.js programs will not blindly loop like this. They will call into a database, make another web request or the like. While they wait for the result, the event loop is free to process other requests, and the effect we have seen here does not occur. It does show, however, that if a Node.js request is expensive to process, it will block other requests from being served. Contrast this with Erlang, where cheap requests will get through as soon as we switch context preemptively.
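The difference between waiting and computing can be seen in a small sketch. It is not part of the measurements above: `busyWait` and `timerDelayAfter` are hypothetical helper names, and the 200 ms figure is arbitrary. A zero-delay timer is scheduled and then some work runs in the same handler; the timer can only fire once that work has returned control to the event loop.

```javascript
// Hypothetical helper: burn CPU synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) {
    // spin without yielding to the event loop
  }
}

// Hypothetical helper: schedule a zero-delay timer, run `work` synchronously,
// and report how many milliseconds passed before the timer actually fired.
function timerDelayAfter(work, callback) {
  var scheduled = Date.now();
  setTimeout(function () {
    callback(Date.now() - scheduled);
  }, 0);
  work();
}

timerDelayAfter(function () { busyWait(200); }, function (delay) {
  // The timer was due immediately, but it had to wait for the busy loop.
  console.log("zero-delay timer fired after " + delay + " ms");
});
```

If the work were a database call or another web request instead, the handler would return right away and the timer would fire almost immediately.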
The experiment also hints that you should plot histograms of the response times for your services (kernel density plots are especially nice for showing how the observations spread out). You may be serving all requests, but how long does it take to serve the slowest one? A user might not want to wait 30 seconds for a result, but may accept 10 seconds.
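If R is not at hand, the same six-number summary can be computed directly. The following is a minimal sketch: `quantile` and `summarize` are names chosen here, the sample values are made up for illustration, and the quantile rule is linear interpolation between order statistics, which is what R uses by default.

```javascript
// Quantile by linear interpolation between neighbouring order statistics
// (R's default rule, "type 7"). Expects an ascending, non-empty array.
function quantile(sorted, p) {
  var h = (sorted.length - 1) * p;
  var lo = Math.floor(h);
  var hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Summarize response times (in seconds) the way R's summary() does.
function summarize(times) {
  var sorted = times.slice().sort(function (a, b) { return a - b; });
  var sum = sorted.reduce(function (acc, t) { return acc + t; }, 0);
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    mean: sum / sorted.length,
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1]
  };
}

// Made-up response times, for illustration only.
console.log(summarize([1.0, 2.0, 3.0, 4.0, 10.0]));
```

The `max` field is the one that answers the question about the slowest request; the mean alone would hide it.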
Conclusion
My main goal was to exemplify a major difference in how a system like Node.js handles requests compared to Erlang. I think I have succeeded. It underpins the idea that you need to solve problems in a way that fits the platform. In Node.js, you will need to break up long-running jobs manually to give others a chance at the CPU (this is essentially cooperative multitasking). In Erlang, this is not a problem, and a single bad process can't hose the system as a whole. On the other hand, I am sure there are problems for which Node.js shines and it will have to be worked around in Erlang.
EDIT: Minor spelling correction.
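Breaking up a long-running job by hand could look roughly like the sketch below. It is not the code that was benchmarked: `countInChunks` and the chunk size are assumptions made for illustration, and it relies on `setImmediate`, so it assumes a Node.js version that provides that function.

```javascript
// Count up to `total` in slices of `chunkSize`, returning to the event loop
// between slices so that other pending events get a turn.
function countInChunks(total, chunkSize, done) {
  var x = 0;
  function step() {
    var stop = Math.min(x + chunkSize, total);
    while (x < stop) {
      x++;
    }
    if (x < total) {
      setImmediate(step); // yield; the next slice runs on a later loop turn
    } else {
      done(x);
    }
  }
  step();
}

// Small numbers here; the loop in the post counted to 100000000.
countInChunks(1000000, 100000, function (x) {
  console.log("Hello, World " + x);
});
```

In a request handler, the response would be written inside the `done` callback. The chunk size is the knob: smaller slices keep other requests responsive at the cost of more scheduling overhead.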

11 comments:
Hmm... node.js takes 1 cpu, erlang take 2 cpu and has same mean time... erlang 2 times slower?
Jlouis said:
-----
On the other hand, I am sure there are problems for which Node.js shines and it will have to be worked around in Erlang.
-----
Yup. Like sharing the same library of code with the browser :)
If you need to do real calculations in Node you would have to invoke a worker with a callback for when the work has been finished.
@funny_falcon: You can't conclude that Erlang is 2 times slower. First, the counter-loop of Erlang can be written smarter. This drops the mean time to 7.5 seconds (for two CPUs). Second, the Erlang code is byte-code interpreted. If you native-compile with the HiPE optimizer, the mean time drops to 2 seconds.
@Peter: Node.js makes a lot of sense. Here you have a dynamic language known by many, with an optimizing compiler. Few dynamic languages have that!
And I agree with the worker-callback, though it means that you must, a priori, have guarded against long-running calculations. If the worker code is share-nothing, I wonder how hard it would be to just add a worker pool to Node.js for this kind of thing.
You could use something like multi-node (which spawns multiple node processes) to work around multiple concurrent requests that block. This lets the OS handle load balancing. Of course, that won't help with an infinite loop, but if you have that in your code you are kind of screwed anyway.
the whole idea of node is that it's single threaded and thus side effect free (kind of)
@blacktiger: Load balancing will help somewhat to mitigate, yes. Suppose we start 10 Node.js processes. Then if one of these infinite-loops, every 10th request is dead. If one is slow, all requests balanced to that node will come after it. But if all requests are slow, as in the example, the worst-case response time will improve and approach the Erlang graph.
@Marek: You surely do not mean side-effect here. Effects are numerous in programs. Some examples are: Updating a variable destructively (State), exceptions, sending a message to a concurrent process, handling IO.
Node.js programs are impossible to deadlock, technically (as there are no locks) - and they are usually not blocking due to IO. But note that an infinite loop in that library you use will hose your Node.js server.
In a preemptively multithreaded environment (Erlang is one), an infinitely looping process can at most take out one CPU (usually less, as the load increases and the looping process gets smaller time slots on the CPU before it is swtch'ed).
The node.js code makes no sense, it certainly goes against the principle of asynchronous event driven scheduling, of course any CPU bound task should run in an asynchronous worker or different process... This is the design choice of node.js developers, not a flaw.
I understand what you're trying to prove though.
@giorgio: I am not trying to prove that one of the tools is "better" than the other for some value of better. Rather, I wanted to highlight an important difference in the way the systems handle incoming requests and how they process them. This is the reason I twisted the CPU-bound knob so the difference is exaggerated.
The problem is determining when a job is CPU-bound enough to warrant asynchronous processing. If you set this value too low, you die of overhead: It would have been faster just to handle the request yourself. If you set it too high, the processor will suffer because it will let other requests wait longer. In practice, we will usually only face the problem with a much higher load - but that means the CPU-binding has to go on for smaller intervals to queue up.
The real way around the problem is to use something like multi-node which lets you have several forked workers handle a single incoming queue of requests. The exact best number of workers can probably be found by a lookup in the Erlang B or Erlang C formulae (not the prog. lang :). It will mean that a single slow request can't block other requests as much.
Of course, there is a price to pay for having multiple processes. Context switches are more expensive this way, and you certainly need more processes than you have CPUs. Furthermore, since each process lives in its own memory space, you will need to do something about inter-process communication between OS processes. Again, it may be against the shared-nothing idea of Node.js, but that just highlights yet another difference.
@jlouis, you're right, it's important to highlight that erlang & node.js handle requests and process them completely differently.
Sorry if my comment may have sounded too harsh :)