One key aspect of Erlang programs is identifying where your stable state lives. Stable state is what you can trust. What you can trust is what you can build on. Joe Armstrong defines one of the key aspects of an Erlang system as Stable Storage: a place where we can push data and be sure it won't change. If we verify data before pushing, we can trust that data a great deal.
This is important. If our system partially crashes, as is the norm for Erlang programs, it may be necessary to reconstruct state. Stable storage provides the basis from which we can re-read data into memory. Even if the data could be recreated, doing so may be expensive, so you may still want a cache on disk to reconstruct your state faster. A persistent store on disk is among the best ways to make sure data is there.
In a BitTorrent client like eTorrent, for instance, we only worry about the file. If we download a piece of a file and that piece passes the BitTorrent SHA1 integrity check, we can regard that part as "safe", write it to stable storage, and never touch it again. I don't have to care about the internal state of the peers I am communicating with. I don't have to worry about any internal structure in memory. The on-disk partial download provides all the information needed to reconstruct the system from scratch, should I need to.
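To make the "verify first, then push to stable storage" idea concrete, here is a minimal sketch. It is not how eTorrent actually does it; the module name `piece_store`, the function `store_piece/4` and its arguments are made up for illustration. It only uses `crypto:hash/2` and the standard `file` module.

```erlang
-module(piece_store).
-export([store_piece/4]).

%% Hypothetical helper: write Data at Offset in the file at Path,
%% but only if its SHA1 digest equals ExpectedSha1.
store_piece(Path, Offset, Data, ExpectedSha1) ->
    case crypto:hash(sha, Data) of
        ExpectedSha1 ->
            %% Verified: from here on the piece counts as stable state.
            %% [read, write] opens without truncating an existing file.
            {ok, Fd} = file:open(Path, [read, write, raw, binary]),
            try
                ok = file:pwrite(Fd, Offset, Data),
                ok = file:sync(Fd)
            after
                ok = file:close(Fd)
            end;
        _Other ->
            %% Not verified: nothing touches the disk.
            {error, bad_hash}
    end.
```

A bad piece never reaches the disk, so whatever is in the file can be trusted after a restart.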
Second, there may be state we don't really want to lose, but whose loss we can afford. We can't recreate user input, so we need that on stable storage like above. But we also don't want to redo expensive work if we can avoid it. To handle this in Erlang, we create a process to keep the important data, and we let that process protect the data simply by validating and verifying any change to it. The process becomes a castle with the princess in it. And with a nasty dragon at the drawbridge. (Naturally, the princess and the dragon have exquisite meals together each night and they like to dance tango. The nastiness and the damsel-in-distress act are only kept up for fun, to lure unsuspecting knights to the party.)
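A rough sketch of such a castle could look like the following. Everything here is an assumption made for illustration: the module name `castle`, the functions `start/2`, `update/2` and `read/1`, and the idea of passing the validation rule in as a fun. In a real system you would probably reach for a gen_server under a supervisor instead.

```erlang
-module(castle).
-export([start/2, update/2, read/1]).

%% Start a guard process holding Initial. Validate is a fun of one
%% argument that returns true for acceptable data.
start(Initial, Validate) when is_function(Validate, 1) ->
    spawn(fun() -> loop(Initial, Validate) end).

update(Pid, New) -> call(Pid, {update, New}).
read(Pid) -> call(Pid, read).

call(Pid, Request) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, {down, Reason}}
    after 5000 ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.

loop(State, Validate) ->
    receive
        {From, Ref, read} ->
            From ! {Ref, {ok, State}},
            loop(State, Validate);
        {From, Ref, {update, New}} ->
            %% The dragon: only validated data gets past the drawbridge.
            case safe_validate(Validate, New) of
                true ->
                    From ! {Ref, ok},
                    loop(New, Validate);
                false ->
                    From ! {Ref, {error, rejected}},
                    loop(State, Validate)
            end
    end.

%% A crashing validator counts as a rejection, so bad input cannot
%% take the guard process down with it.
safe_validate(Validate, New) ->
    try Validate(New) =:= true
    catch _:_ -> false
    end.
```

For example, with `Pid = castle:start(0, fun(X) -> is_integer(X) andalso X >= 0 end)`, the call `castle:update(Pid, 5)` is accepted while `castle:update(Pid, -1)` is rejected and the old value stays in place.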
Third, we can exploit that sequential Erlang is a functional programming language. If we are in state S1 and we apply a function to obtain state S2, we have an interesting property: either we obtain S2, or we get an error. But since the data is persistent (immutable), we still have access to S1 if we keep a reference to it. This in effect creates an atomic way of processing: either we get to the new state safely, or we can't move to the new state due to an error. This means that each state becomes a safe haven in our processing. Since we can't mutate data, there is no way the processing to obtain S2 can corrupt the state of S1. It allows us to build programs that are highly stable, as it ultimately works like a CPU: we have a state, and atomically we process a clock cycle to obtain a new state. There is no "in between".
(Note: I must stress that modern CPUs are more advanced than this, but they try to uphold the illusion above.)
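The step from S1 to S2 can be sketched in a few lines. The module name `step` and the function `apply_step/2` are hypothetical; the only thing the sketch relies on is that Erlang terms are immutable, so S1 is still intact when the transition function blows up.

```erlang
-module(step).
-export([apply_step/2]).

%% Try to move from state S1 to a new state by applying F.
%% Either we get the new state, or we get the error together with
%% the untouched S1. There is no half-updated state in between.
apply_step(F, S1) when is_function(F, 1) ->
    try F(S1) of
        S2 -> {ok, S2}
    catch
        Class:Reason -> {error, {Class, Reason}, S1}
    end.
```

Calling `step:apply_step(fun(S) -> maps:update_with(missing, fun(B) -> B + 5 end, S) end, S1)` on a map without the key `missing` hands back S1 exactly as it was, along with the reason for the failure.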
Fourth, we can exploit the isolation between processes. To get state, I must ask another process for it. To ask another process for it, I must send it a message. It might never answer. So I must build my system around the idea that processes will fail occasionally. If it answers, however, the data is now mine to do with as I please. It may be invalid because it is too old, but as long as I have it, I can do with it what I please. At that point, I don't care too much about the fate of the other process, since I have a safe copy. This in turn can be used to build a system where we know where the stable state is all the time.
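Here is a small sketch of that conversation, assuming a toy owner process and a made-up message format (`{get, From, Ref}`); the module name `ask` and both functions are hypothetical. The receive with an `after` clause is the part that encodes "it might never answer".

```erlang
-module(ask).
-export([start_owner/1, ask/2]).

%% A trivial owner process that answers requests for its state.
start_owner(State) ->
    spawn(fun() -> owner_loop(State) end).

owner_loop(State) ->
    receive
        {get, From, Ref} ->
            From ! {Ref, State},
            owner_loop(State);
        stop ->
            ok
    end.

%% Ask Pid for its state, giving up after Timeout milliseconds.
%% On success the caller holds its own copy of the data.
ask(Pid, Timeout) ->
    Ref = make_ref(),
    Pid ! {get, self(), Ref},
    receive
        {Ref, State} -> {ok, State}
    after Timeout ->
        {error, timeout}
    end.
```

Once `ask:ask(Pid, 1000)` has returned `{ok, Copy}`, the owner can stop or crash and `Copy` is unaffected. A later call simply times out, and the caller has to decide what to do about that.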
Fifth, we can exploit distributed Erlang. Have a couple of nodes. Store important data on multiple nodes. Now, should one node crash, the other nodes still have the data. And memory plus network communication is often way faster than disk. Not to mention that you can get better parallel execution and faster recovery, since the data is already there in memory, ready to be served on the 10 gigabit link. The princess just phoned her girlfriends in Britain, France, Italy and Russia with the recipes for the next 100 meals (...and her work on homotopy type theory - princesses do have spare time to do research after all).
See, the point is: when the system begins failing, how do we want it to crash? When you get the chainsaw and slay the proverbial dragon (the tree in your garden which slightly, but not really at all, looks like a dragon) you don't want it to fall down onto your nice house. You want it to crash differently, down on the lawn. The same goes for Erlang programs. We want them to crash so that the crash has little impact on users, but also such that our important data is still safe. And if it goes really wrong, we want data persisted somewhere else: either on another node in the cluster, or on disk. We want it to crash in ways which avoid the stable state.
The key is that we begin thinking about crashing a priori, before it happens. We think about where we have stable state and which parts we don't worry about crashing. The secret behind BitTorrent clients is that they are easy: you can throw away everything, sans the pieces of the file that have been checked for integrity. Everything else can just go crashing as it sees fit; we don't care. But when you take your own application and do the same kind of thinking, chances are that you will reach the same conclusion: there is a little bit of the system which needs protection, but you don't care about the rest.
That is a hint on how to structure your Erlang program.
PS. I should probably also write about how the loose coupling of Erlang processes fosters good architecture, but that is another post for another time :)
(Edited a couple of times to fix wording - thanks DeadZen)