1. Helping with HaskellTorrent

    I have a ideology in which an Open Source project must be easy to hack for other people than the original author. Thus, I am trying to make this possible with HaskellTorrent. In particular, I keep several things in unsolved so others might join the fray, should they want to. This is a list, in no particular order, in which I put forth some things there are left to be done.

    Do note that I tend to keep bugs off this list. Many projects use the bug tracker as a way to track what needs doing. I have a TODO.md file in the top-level dir which contains things. Some of these things are taken from this list.

    HAVE message pruning

    Torrent files work by splitting a file into pieces and then breaking pieces into blocks. The blocks are then exchanged between peers until every block in a piece is downloaded. At that point, the SHA1 checksum of the piece can be used to verify the piece got downloaded correctly. If this is the case, a HAVE message is broadcasted to all peers we are connected to. This notifies other peers of the newly available piece so they can begin requesting it.

    However, there is an optimization oppurtunity. If the peer already has told us (usually by a HAVE message) that it got the piece already, there is no reason to tell it about the availability. Thus, we can prune the sending of the HAVE for those clients. There is already an IntSet with the necessary information inside the PeerP process with the information, so the change is fairly simple.

    Optimize Peer/PieceMgr communication

    Profiling shows we spend a considerable amount of time in the communication between the Peer and the Piece Manager. The Peer will, when downloading, try to keep a pipeline of block REQUEST messages going towards the other end. It mitigates the delay on the internet doing so. Hence, it periodically asks the PieceManager for new blocks to request. It does so by sending a number of blocks it wants together with an IntSet of what pieces the peer at the other end has.

    The Piece Manager is responsible for giving exclusive access to blocks to peers. There is no reason to download the same blocks at multiple peers, so it keeps track of what blocks where requested. Also, it must only serve blocks that the given peer can download. Another goal is that we would like to complete a piece as early as possible so we will try to complete from pieces that are in progress first.

    Currently, the system works by looking at the pieces in progress and then try to serve from these. If all blocks have been taken on pieces in progress, we find one that is pending (randomly) and available at the peer. This one is then promoted to being in progress and the algorithm runs again.

    There are a couple of nice optimizations possible. First, if the peer is a seeder, it effectively has every piece. Thus, there are no reason to keep the IntSet around for those, nor is there any reason to carry out expensive IntSet intersections. Second, if the peer is not a seeder, it would be better to keep a bit array around rather than an IntSet. It does count for a good amount of live memory in the Peer processes, and we expect there to be quite many of those. Third, we could benefit from keeping a cache of the last piece the peer requested blocks from. Trying this cache element blindly in the Piece Manager is advantageous to us: we spend less time in the Piece Manager code. It is also nice for the peer: if a piece is cached at the other end, we might help by requesting as many blocks as possible from that piece. Disk IO is a point of contention in modern bittorrent client implementations.

    Always pick the rarest piece first

    We currently pick pending pieces at random. A better scheme is to know about their availability and then pick them rarest first. The rarest piece is the easiest for us to spread, so it maximizes our ability to give back, which in turn maximizes our ability to download fast. The right Haskell cabal library for this is Data.PSQueue in the package PSQueue. If there are more pieces eligible at the same rarity, we will pick one at random. This gives an excellent use of View Types in Haskell, should one be interested.

    To do this, there is a milestone you need to reach beforehand. When we receive knowledge of piece availability, either through a BITFIELD or a HAVE message, we should propagate that information to the Piece Manager. In the beginning, we can just do nothing with it, and throw it away. It will pave the way for a PSQueue implementation however.

    Increase code quality

    Run hlint on the code. There is a target in the Makefile, which has recently been fixed. Find a GHC warning which is not yet enabled in the .cabal file and enable it, fixing the bugs that turn up. I guess some of these are more evil to fix than others, so start with the easier ones.

    Another area are tests. Running HaskellTorrent —tests will run the embedded test suite. This one fails on 64bit architectures at the moment (I think, this is the narrowing-down I’ve done). If you spot any area which you can figure out how to test, I would really like to discuss it with you. The code can do with a lot more testing than it use right now.

    Discuss use of attoparsec, attoparsec-iteratee and the event library

    I am seriously contemplating impaling all network performance bottlenecks once and for all by using this triple. If you are interested in hacking these, I’ll be happy to talk to you about it. It would push the bottleneck to the Disk layer once and for all, I think. The cool thing is that there are code to gain inspiration from.

    Another nail to the coffin is to support the FAST-extension in the new parser right away. It would pave the way for the rest of the client to understand this extension, so we would be able to get better and faster communication. It also plugs a bug in the original bittorrent specification.

    Use mmap() for Disk IO

    The right way to do disk IO is by use of mmap() on the files. Reading and writing files are not going to be hard, but we also need to get the hopenssl library to talk to the mmap()’ed backend so we get fast checksum calculations.

    1

    View comments

  2. Tufte applied to games

    I played a small bit of the old Infocom game “Wishbringer” the other day. And somehting struck me. This game is small. It has something like 50 locations, with miniscule text at each location. Of course this is because of the limitations of the hardware from the day the game was released, but it also made me think. There is so much game crammed into those 50 locations. Clever location reuse makes the game seem larger than it is. In fact, the majority of locations change considerably over the course of the game.

    Now contrast this with newer sandbox games of enormous size: Morrowind, Oblivion, Fallout 3, all by Bethesda. These games are absolutely huge, but in contrast, the world is a mostly static one with very few changes over the course of the game. Tufte, who just got appointed by the US government to visualize government spending, had this notion of “ink carrying meaning” and “ink”. The ratio between these gives you a number of how much ink is wasted and how tightly information is packed on a piece of paper.

    Applying Tuftes observation to game worlds is interesting and fun. Games like Morrowind and Oblivion has a very low ratio of value, whereas an old game like Wishbringer has a very high value. Surprisingly, the game Arx Fatalis from 2002 will have a rather high ratio. In Arx Fatalis, the game world is pretty small. But the game uses several tricks to circumvent this. The game world is reusing locations for more than one thing. Also, the game world is opened up to you gradually along with the game unfolding. The same idea is applied in the game “The Witcher” where the majority of the game is going on inside a city. New major areas of the city opens up in later acts, but the old areas are kept and changes.

    Today, building huge worlds is not hard. You procedurally generate most of the content in the game through different technologies like SpeedTree for instance. Oblivion has hand-crafted dungeons it would seem, but usually they are just there for being there and not for moving the story ahead. Contrast that with random dungeon in Diablo 2. Usually, there is a chest at the end of those with good levelled loot. Being a loot-game, Diablo 2 encourages you to walk the dungeon.

    I sincerely hope that in the future, more games will be like “Wishbringer”. Mass Effect 1 (Have not played the 2nd game) shows some of the way, but it also adds a lot of randomly generated open spaces driven by a car. These are boring. Kill.

    Luck has it, that small flash-games and similar platforms will make the game logic be central again. I expect there to be some really interesting games with good “Tufte ratios” in the coming years. But the platform will be Web, mobile phone, Javascript or Flash.

    5

    View comments

Blog Archive
About Me
About Me
What this is about
What this is about
I am jlouis. Pro Erlang programmer. I hack Agda, Coq, Twelf, Erlang, Haskell, and (Oca/S)ML. I sometimes write blog posts. I enjoy beer and whisky. I have a rather kinky mind. I also frag people in Quake.
Popular Posts
Popular Posts
  • On Curiosity and its software I cannot help but speculate on how the software on the Curiosity rover has been constructed. We know that m...
  • In this, I describe why Erlang is different from most other language runtimes. I also describe why it often forgoes throughput for lower la...
  • Haskell vs. Erlang Since I wrote a bittorrent client in both Erlang and Haskell, etorrent and combinatorrent respectively, I decided to put ...
  • A response to “Erlang - overhyped or underestimated” There is a blog post about Erlang which recently cropped up. It is well written and pu...
  • The reason this blog is not getting too many updates is due to me posting over on medium.com for the time. You can find me over there at thi...
  • On using Acme as a day-to-day text editor I've been using the Acme text editor from Plan9Port as my standard text editor for about 9 m...
  • On Erlang, State and Crashes There are two things which are ubiquitous in Erlang: A Process has an internal state. When the process crashes,...
  • When a dog owner wants to train his dog, the procedure is well-known and quite simple. The owner runs two loops: one of positive feedback an...
  • This post is all about parallel computation from a very high level view. I claim Erlang is not a parallel language in particular . It is not...
  • Erlangs message passing In the programming language Erlang[0], there are functionality to pass messages between processes. This feature is...
Loading
Dynamic Views theme. Powered by Blogger. Report Abuse.