In UNIX there is a specific error code, EAGAIN, which a system call can return. The kernel uses it whenever it is in a complex state where it is deemed too hard to produce a proper answer to the userland application. The solution is almost a non-solution: the kernel punts the context back to the user program and asks it to retry the operation. In the meantime the kernel can get rid of the complex state, so the next time the program enters the kernel, it may be in a different state where the trouble is gone.
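To make that retry pattern concrete, here is a minimal sketch in Python of the loop EAGAIN forces on userland code; the non-blocking socket and the sleep interval are illustrative choices of mine, not part of any particular API contract.

```python
# Sketch: the retry loop that EAGAIN pushes onto the caller.
# Reading from a non-blocking socket may fail with EAGAIN/EWOULDBLOCK;
# the caller is expected to simply go again later.
import errno
import socket
import time

def read_when_ready(sock: socket.socket, size: int = 4096) -> bytes:
    sock.setblocking(False)
    while True:
        try:
            return sock.recv(size)
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            # The kernel punted the problem back to us: wait briefly and retry.
            time.sleep(0.01)
```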
Here is an interesting psychological point: we can use our code to condition another person's brain into writing a program that serves our purpose. That is, we can design our protocols so that they force users to adopt certain behaviour in their programs. One such trick is deliberate fault injection.
Say you are serving requests through an HTTP server. Usually, people imagine that a successful request should always return 200 OK, but I beg to differ. Sometimes, say on 1 in 1000 requests, we deliberately fail the request and return a 503 Service Unavailable to the user. This conditions the user to write error-handling code for the request early on: you can't use the service properly without handling this error, since it occurs too often. You can even add a "Retry-After" header and have the client go again immediately.
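As a sketch of what that could look like on the server side (the handler, the exact 1/1000 rate and the Retry-After value are my own illustrative choices, not prescribed by the post):

```python
# Sketch of deliberate fault injection in an HTTP handler: roughly one in
# every 1000 requests is failed with 503 plus a Retry-After header, so
# clients are forced to handle that case from day one.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

FAULT_RATE = 1 / 1000  # probability of a deliberately injected failure

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if random.random() < FAULT_RATE:
            self.send_response(503)
            self.send_header("Retry-After", "1")  # ask the client to go again
            self.end_headers()
            return
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), Handler).serve_forever()
```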
This deliberate fault injection has many good uses.
- First, it forces users of your service to adopt a more biological, fault-tolerant approach to computing. Given enough of this kind of conditioning, programmers will automatically begin adding error-handling code to their requests, because otherwise their programs will not work.
- Second, it gives you options in case of accidents: say your system is suddenly hit by an emergency which elevates the error rate to 10%. This has little visible effect, since your users are already able to handle the situation.
- Third, you can break conflicts by rejecting one or both of the conflicting requests.
- Fourth, you can solve some distribution problems by failing the request and having the client retry.
- Fifth, simple round-robin load balancing is now useful. If a request hits an overloaded server, you just return 503 and the client will retry against another server (see the client-side sketch after this list).
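Here is the client-side counterpart referred to above: a sketch, with made-up hostnames and retry limits, of a client that retries against a round-robin pool of servers whenever it gets a 503.

```python
# Sketch: retry against a round-robin pool of servers when a 503 comes back.
# Hostnames, retry count and back-off values are illustrative only.
import itertools
import time
import urllib.error
import urllib.request

SERVERS = ["http://app1.example.com", "http://app2.example.com"]

def fetch(path: str, max_attempts: int = 5) -> bytes:
    pool = itertools.cycle(SERVERS)
    for _ in range(max_attempts):
        url = next(pool) + path
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 503:
                raise
            # Honour Retry-After if present, otherwise back off briefly,
            # then try the next server in the pool.
            time.sleep(float(e.headers.get("Retry-After", "0.1")))
    raise RuntimeError("all retries exhausted")
```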
I have a hunch that Amazon Web Services uses this trick. Against S3, I've seen an error rate suspiciously close to 1/500. It could be their own way of implementing a chaos monkey and conditioning all their users to write code in a specific way.
The trick is also applicable in a lot of other contexts. Almost every protocol has some point where you can deliberately inject faults in order to make clients behave correctly. It is very useful in testing as well: use QuickCheck to randomly generate requests and let a certain fraction of them be totally wrong. These wrong requests must then be rejected by the system; otherwise something is wrong with it.
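As a sketch of that testing idea, here is a property-based test using Python's Hypothesis as a stand-in for QuickCheck; validate_request is a hypothetical system under test, and the request shapes are invented for illustration.

```python
# Sketch: generate a mix of well-formed and deliberately corrupted requests
# and assert that the system rejects exactly the corrupted ones.
from hypothesis import given, strategies as st

def validate_request(payload: dict) -> bool:
    """Hypothetical system under test: accepts only well-formed requests."""
    return isinstance(payload.get("user"), str) and isinstance(payload.get("amount"), int)

well_formed = st.fixed_dictionaries({"user": st.text(min_size=1), "amount": st.integers()})
corrupted = st.fixed_dictionaries({"user": st.integers(), "amount": st.text()})

@given(st.one_of(well_formed, corrupted))
def test_bad_requests_are_rejected(payload):
    accepted = validate_request(payload)
    looks_valid = isinstance(payload.get("user"), str) and isinstance(payload.get("amount"), int)
    # The system must accept the valid requests and reject the wrong ones.
    assert accepted == looks_valid
```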
More generally, this is an example of computer programs being both formal and chaotic at the same time. One can definitely find interesting properties of biological processes worth copying into computer systems. While it is nice to be able to prove that your program is correct, the real world is filled with bad code, faulty systems, breaking network switches and so on. Reacting to this by making your system robust to smaller errors is definitely going to be needed, especially in the longer run, where programs will become even more complex and communicate even more with other systems; systems over which you have no direct control.
You can see fault injection as a type of mutation. The programs that cope with the mutation are the programs that should survive in the longer run.
Consider hacking the brains of your fellow programmers, and force them to write robust programs by conditioning their minds into doing so.
Thanks to DeadZen for proof-reading and comments.