1. Since the last blog post, a number of new things have been developed for Haskell Torrent. Here is a list of the highlights:
    • John Gunderman documented a queue interface and helped me with haddock.
    • The code doing SHA1 Digests have been abstracted into its own module.
    • We now handle interest as intended in the bittorrent protocol. We update interest based on what peers tells us they have available.
    • We now pick pieces in Random order rather than linearly.
    • We correctly inform peers when we have retrieved a full message by sending them HAVE messages.
    • We can now track how many bytes are left on a torrent. This is needed for the tracker.
    • We have some preliminary rate calculation code, based on a 20 second window and running averages. It ensures a somewhat fair treatment of the rates of different peers.
    • The Choke Manager now uses the rate calculation code to choke peers based on their current speed.
    • The Readme file has been updated with what extensions and protocol-parts we support.
    The current goal:
    We currently aim for two things: BEP-003 support and a preview release (version 0.1). The BEP-003 is the support for the basic bittorrent protocol with no extensions at all present. This is the bare minimum needed to be able to call ourselves a bittorrent client. The list of missing items for BEP-003 is as follows:
    • When the torrent file is initializing it needs to calculate how many bytes there are left to download before completion. This number is reported to the tracker.
    • We need to make sure that the tracker is informed of Torrent completion in the right way.
    • The client will have to handle multi-file torrents. That is, torrents where there are more than one file. We can punt this to after the preview release if needed, because it is not that important for a preview of the client.
    • We must correctly change to a seeder mode when the torrent file in question completes. This is currently not happening.
    • We use a Pure Haskell SHA1 checksum code. Rather than doing this, we should outsource our digest calculation to OpenSSL through the hsopenssl package. The abstraction should make this easier. This is not strictly needed for BEP-003, but it is nice to have.
    The client will currently not handle the endgame in any special way. This means that a torrent will take considerably more time to complete. On one hand this would be nice to have before the preview release. On the other hand, I think it is better to bump it.
    On the Preview:
    The preview is not a working client. The purpose is to get the client tested in different environments by different people so we can find some bugs and get them sacked. I would like to have a client which has shown itself able to download files in non-lab environments, but perhaps with many bugs still left. This is also to get a test of the release engineering involved in the process. The release will also mark a milestone in the development: we can use it as a stepping stone and a breathing space while we decide what to work on next.
    The plan is to use hackage for the release. I am using the Haskell Platform here - but that means GHC6.10.4. We could use a couple of compilation tests on what will become the next Haskell Platform. I am afraid we use some stuff which changed. A quick heads up would be nice.
    Also, before the release, I would like some janitorial code work done. I have not enabled -Wall while developing, but around now might be the time that someone enables it and begins cleaning up dead code, limit module exports and similar janitor-work.
    Watch this blog. The moment the client has downloaded its first non-lab torrent will be worthy of an announce. I can't predict the future, but my gut feeling says it is going to be soon.
    (Note: The layout of this post is miserable. This is what you get in return for using blogspots built-in editor :)
    2

    View comments

  2. Implementing processes

    The Haskell Bittorrent project is evolving at a steady state these days. In the last couple of days, we have implemented most of the code relevant for carrying out choking of peers. Choking is the process by which you only communicate to a few peers at a time. Thus TCP/IP congestion can be avoided which drives up the download and upload rates. This post is not on this part however, which must wait a bit.

    A fellow dane, Thomas Christensen, heeded my call and did some hlint runs over the code. Hopefully, we’ll see more work from him (github). Alex Mason has added even more parsing stuff through Cereal to the code, fixed a number of bugs in the parser and improved its general state. His venture is described on his blog (here).

    Processes

    I will be talking about processes in this post. When haskell-torrent started, our processes were simply “things spawned in the IO monad”. The approach works, but it quickly becomes inadequate for several reasons. In Haskell and FP in general, a lot of power stems from the idea that we can write a large set of small building blocks and then compose them together to form increasingly larger and larger blocks as we go. When composing, we use a fairly small number of helpers — readily present in the standard libraries.

    When running in IO, we quickly end up with a lot of state. This can of course be passed around by “hand” and tail-calls. Unfortunately, this means we will end up using a lot of our precious coding time doing just exactly that. To optimize, we need a way to reflect away configuration and state when we don’t need it, and reify the information at certain points in the program where it is necessary to know the state.

    The ubiquitious tool in Haskell for this are Monads. Not a single monad like IO, but a for-the-case relevant monad built from monad transformers. A couple of days ago, I installed XMonad by Spencer Janssen, Don Stewart, Jason Creighton and many more. This is a window manager — but surprisingly, it needs to solve some problems similar to the one in haskell-torrent. By inspiration from XMonad, we define

    > newtype Process a b c = Process (ReaderT a (StateT b IO) c)
    >    deriving (Functor, Monad, MonadIO, MonadState b, MonadReader a, Typeable)
    

    That is, a Process is a type. It contains some configuration data a, an internal state b and is in the process of evaluating to a value of type c. We use a Reader transformer so we can ask for configuration data when we need it. We do not expect this configuration data to be altered when the process runs. A State Transformer takes care of the internal state of the process. Finally, we let the underlying monad be IO. This gives us access to CML and the outside world.

    We let GHC derive a large set of things automatically (using several extensions in the process). This gives an easier way to manipulate the state when the process is running.

    Running a Process is easy:

    > runP :: a -> b -> Process a b c -> IO (c, b)
    > runP c st (Process p) = runStateT (runReaderT p c) st
    

    which is exactly like in XMonad. Spawning a new process is also fairly easy:

    > spawnP :: a -> b -> Process a b () -> IO ThreadId
    > spawnP c st p = spawn proc
    >   where proc = do runP c st p
    >                   return ()
    

    In general different processes will have different configurations a. These will usually contain the channels on which the process can communicate. We then define a type class

    > class Logging a where
    >   getLogger :: a -> LogChannel
    > 
    > instance Logging LogChannel where
    >   getLogger = id
    

    of types that contain a logger channel. This means we can define a generic log function like this:

    > log :: Logging a => String -> Process a b ()
    > log msg = do
    >     logC <- asks getLogger
    >     liftIO $ logMsg logC msg
    

    Type classes are a magnificent tool when we want to coerce a general function on top of different types. Many of our configurations in the client will instance the Logging class and then the log function knows how to access the logger in the Reader.

    What does this buy us

    The advantage of doing this change on the code is twofold: First, the amount of parameter passing is considerably reduced. Reflection into the monad solves this problem. We are now able to compose easier. Function composition is considerably harder when there are many parameters abound. With the change, preliminary restructuring of the Peer process shows a much simpler flow. Also, there are now ample refactoring opportunities available with the change.

    Second, the monad means we can use locality much more to our advantage. A common idiom is to modify the state of the process or to retrieve the current state for query. This now happens locally at the point where it is needed. Before, we might have needed to pass a parameter through several functions and then use it.

    What next?

    There are a small number of things that needs to be addressed before we can claim that the client is a full bittorrent client:

    • The client needs to correctly handle the concept of interest. It must tell other clients if it is interested in the pieces they have at their disposal for transmission. I have some preliminary code for doing this.
    • The client needs to correctly tell the tracker how many bytes it uploaded and downloaded. This measure is needed on many private trackers as they require people to upload data back.
    • The client needs to be better at choosing the next eligible piece. Choosing one randomly is good enough.
    • The client needs to handle multi-file torrents. It is not as hard as it may sound — the only part of the system that needs to know about files is the code handling the file system. All other parts can just keep on transferring pieces.
    • For choking to work correctly, we must know how fast we are currently transferring to a peer. This is an interesting little problem if somebody feels their curiosity tickled :)
    • We currently take space proportional to torrent size due to our SHA1 calculation being slow and not use a file descriptor. Research into a faster SHA1 library would be really beneficial.
    • We need to accept incoming connections. The system only connects outward at the moment.

    And of course, it needs some testing in a non-lab setting. Currently it can seed and leech, but my setup is very simple: Opentracker and rtorrent on another computer.

    The main github repository for haskell-torrent is here.

    2

    View comments

  3. On Process Hierachies

    I have been in thinking mode the last couple of days. It all started when I decided to look at the problem of wrong in haskell-torrent. What will happen when something fails? Joe Armstrong has part of the answer by arguing that we must proactively expect a process to fail and then let some other process clean it up (paraphrased a lot by me, these are not his exact words).

    Take a look at this image,

    It is a hypergraph where each vertex corresponds to a process and each edge corresponds to an channel. The graph is undirected because in principle each process can both send and receive on the channel in question. In practice however, one can limit the communication as some processes will only transmit on a channel, and others will only receive (and some will do both).

    To simplify the graph, I omit local RPC channels. It is customary to implement RPC by creating a local channel and transmitting that channel over one of the hypergraph edges. Response can then work directly on the local channel, simplifying the problem of giving identity to processes in many cases.

    Termination

    A somewhat scary part of such a network is what happens when we want to terminate it or if something goes wrong in one of the processes. In particular, we have the problem that while Haskell is a pure, nice, protected Nirvana — the real world inside IO is a dank, dark place of twisted little passages (all alike). Thus it might very well be that we get to experience an exception first-hand in one of the above processes.

    I have toyed with the idea of copying the excellent CHP library by Neil Brown. Neil defines a concept of poison for channels. A channel that has been poisoned will always transmit poison. Poison thus propagates through the hypergraph and slowly but surely kills off each process. Neil has a description of the process on his blog. I like the idea because it uses the existing infrastructure of the hypergraph. It would be fairly easy to add support for writing poison handlers at each node.

    Process Hierarchies

    I have pondered on a different scheme for the last couple of days however. We can impose a location graph on the hypergraph above:

    This graph, a tree, describes the location or hierarchy of the processes in question. The processes named S0, S1, … and so forth are supervisors. Their responsibility is only one: Keep their children running according to a policy. Erlang programmers fluent in the OTP high level libraries present in the Erlang distribution will recognize the supervisor term.

    A normal (worker) process has two rules:

    • If the process terminates, it must inform its supervisor.
    • If the process gets an asynchronous KillThread exception, it must die.

    A supervisor process needs to obey the following scheme:

    • If a process under its supervision dies it must take affair according to its policy. This may mean that it kills off all other children and reports back to its supervisor for instance. Or it may mean that it simply restarts the process that got killed. What happens is dependent on the policy we configured the supervisor with.

    • If a supervisor is asked to shutdown by a KillThread exception, it must first kill all of its children.

    Note that with these rules, termination is almost free: Terminate the tree. Because everything is linked to the tree, termination will happen. We will terminate by asynchronous exceptions aggressively.

    An interesting difference from Erlang is that of communication, which is a certain sense is Dual in Haskell+CML: In Erlang, the process id is what you need to hold in order to transmit an (async.) message to the process. In our CML-based system the channel is the name you must hold in order to communicate. In other words, this lets us, in some circumstances, replace the process with a new one but keeping the channel. Note, though, that the interface is not completely clean: if the Haskell runtime figures out a thread is indefinitely blocked, it will in general throw an exception — but I have not yet read the CML code in detail so I do not know if this will happen. It depends on what MVar references a channel hold.

    Another view

    The hypergraph and location graph constitutes what is sometimes called a Bigraph. This is not to be confused with the term Bipartite graph which is a completely different term but sometimes also called a bigraph unfortunately. I somewhat resist the idea to use the same term for different concepts, but work done in this twi-graph concept presented in this post on bigraphs suggests that the term is quite used in this respect.

    The research seems different from what I need however. ITU has done extensive research in bigraphical programming languages, letting them describe various known calculi (Lambda, Pi, etc.). I just want to use the descriptive power of the bigraphs to discriminate location from communication.

    So to conclude: Does that give anybody out there a brilliant idea or does my design have an inherent flaw I should be aware of? If not, I'll probably try to sketch some code in Haskell for the Supervisor tree and processes.
    0

    Add a comment

Blog Archive
About Me
About Me
What this is about
What this is about
I am jlouis. Pro Erlang programmer. I hack Agda, Coq, Twelf, Erlang, Haskell, and (Oca/S)ML. I sometimes write blog posts. I enjoy beer and whisky. I have a rather kinky mind. I also frag people in Quake.
Popular Posts
Popular Posts
  • On Curiosity and its software I cannot help but speculate on how the software on the Curiosity rover has been constructed. We know that m...
  • In this, I describe why Erlang is different from most other language runtimes. I also describe why it often forgoes throughput for lower la...
  • Haskell vs. Erlang Since I wrote a bittorrent client in both Erlang and Haskell, etorrent and combinatorrent respectively, I decided to put ...
  • A response to “Erlang - overhyped or underestimated” There is a blog post about Erlang which recently cropped up. It is well written and pu...
  • The reason this blog is not getting too many updates is due to me posting over on medium.com for the time. You can find me over there at thi...
  • On using Acme as a day-to-day text editor I've been using the Acme text editor from Plan9Port as my standard text editor for about 9 m...
  • On Erlang, State and Crashes There are two things which are ubiquitous in Erlang: A Process has an internal state. When the process crashes,...
  • When a dog owner wants to train his dog, the procedure is well-known and quite simple. The owner runs two loops: one of positive feedback an...
  • This post is all about parallel computation from a very high level view. I claim Erlang is not a parallel language in particular . It is not...
  • Erlangs message passing In the programming language Erlang[0], there are functionality to pass messages between processes. This feature is...
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.