I have been writing code lately on a BitTorrent client in Haskell. Here is the grand old history of that effort. Many years ago, I attempted to start a client, Conjure, written in Haskell. Sadly, it really did not make it to the point where it was really usable, for me at least.
Then a couple of years passed, I became involved with programming in Erlang. Erlang has a really nice Actor-based concurrency model. Since BitTorrent is highly concurrent and event-driven it was easy to start another implementation effort in Erlang. Thus etorrent was born. Sadly, I lost interest in etorrent - partially because I am partial to statically typed languages of which Erlang is not one.
Thus, we come to this december, where I decided to figure out the state of concurrency in Haskell. To really give GHC a whirl, we need to implement something which is fairly complex. Some people implement ray tracers when toying with programming languages, some solve projecteuler problems. I ... write BitTorrent clients.BitTorrent is a good maturity check. First, you need decent Disk I/O, decent network I/O and a fairly complete HTTP client. Second, the protocols are easy, but still requires some nontrivial parsing to occur. Finally, the system is fairly concurrent, so it fits nicely into concurrency models if the language has one.
Initial Thougths:
Initally, there is the choice of a concurrency framework. In Haskell, there are several such. The built-in Control.Concurrent is fairly low level and fairly simple. At the CS department at Copenhagen University, DIKU, there is currently a course running in CSP and some students decided to use Control.Concurrent.CHP by Neil Brown. I think it will suit them well for the course, but it irritated me that it mapped so weakly to the etorrent model. So Control.Concurrent.CML became the basic library for concurrency. This concurrency model is not entirely unfamiliar to me: it is written by John Reppy for ML.
The CML model is somewhat reminiscent of the Pi-calculus. In particular, we can transmit channels on channels. This gives a neat method for doing RPC-style operations. Send off a message together with a responder channel and then wait for a response on that channel. This means we are easily able to emulate most of what the Actor model has if needed.
Haskell-torrent design:
Haskell-torrent uses a basic design, where a fairly large number of processes communicate with each other to solve the problem of uploading and downloading files. We have a limitation of one single torrent right now, but that should be fixable in time. The process diagram at the header describes the basic layout. Each box is a single process, apart from the "Peer" box, which is multiple processes. The diamond-shapes describes external communication over the network.
Console: A process responsible for User communication. The idea is to have a simple console-interface on which to log messages (it is heavily used for debugging). The Console can in time also be used for parsing user input and affecting the system. It only responds to a "quit" command at the moment. Logging is done with a LogChannel on which the Console process syncs and prints out information.
Main: Initial program entry point. Spawns everything else and is the last thing that closes again.
Status: Keeps track of various torrent-specific statuses. How much is left to download? How many seeders are there? How fast is the torrent currently? The intention is to keep this kind of information around in that process.
Tracker: Communicates with the tracker. Is a specialized HTTP client. It talks to the Timer so it won't continually request the tracker for information. In due time, it should also be able to do UDP tracking. It is using bencoded documents as per the tracker spec. These bcoded documents are handled as in Conjure with the monadic parser generator Parsec.
PeerMgr: Handles a set of peers. Gets lists of other peers via the Tracker process and spawns a pool of them. In due time, this process will handle choking/unchoking of peers so we don't clog the TCP/IP stack by trying to talk to everybody at once.
Listener: Listens on a TCP port and spawns off Peers when they try to connect. Not implemented at all yet, but is fairly easy to get to work.
OMBox: A very simple implementation on top of CML of what is present in Control.Concurrent.SampleVar. This will let the PeerMgr be able to place things for the Peer to read later and it avoids a deadlock between those two which otherwise would occur. OMBox is currently not cleaned up properly. We need some mechanism for poisoning the communication channel so it closes down again gracefully.
Peer: The system contains zero or more peers. Each peer is really four processes. One controls the state of the peer and reacts on the events from the peer. Two are responsible for sending messages to the peer: one as a queue, and one blocks on the socket. Finally, a process receives messages on the socket and then sends them to the controller, blocking in the process.
PieceMgr: Keeps track of the current piece-state. It knows what pieces we have, what we are missing and what to request next. Peers will be using it for asking what to download next.
FS: Abstracts away the file system for the rest of the client. Is implemented as a communication process which peers ask for blocks to send.
Current State:
Currently, the client is pretty flaky. It is able to seed in a controlled lab-setting, but needs much work to be able to leech data from other clients. The TODO-list is fairly long and some of the things on it are pretty hard. But there are also many things which are fairly simple to fix. Feel free to grab anything and help out if it sounds interesting. I am deliberately keeping things simple and leaving doors open. It is my hope that even aspiring Haskell programmers might stand a chance at solving some of the things on the TODO list. I'll happily take any help I can get as in the long run, this is fairly hard to carry out as a single person. The code is on github in my haskell-torrent repository if this should indulge you.
